Developing and enhancing the OpenCitations Corpus

In May 2017, the OpenCitations Enhancement Project was announced. This 18-month project, funded by the Alfred P. Sloan Foundation, will develop and enhance the OpenCitations Corpus. If you are an expert in Web Interface Design and Information Visualization and can demonstrate a commitment to increasing the openness of scholarly information, you can express early interest in joining the OpenCitations team!

"It is a scandal that mass access to citation data is still in the hands of a small group of closed-access players". –@dshotton #WikiCite

The OpenCitations Corpus (OCC) is an open scholarly citation database that freely and legally makes available accurate citation data (academic references) to assist scholars with their academic studies, and to serve knowledge to the wider public.

The OpenCitations Enhancement Project will make the OCC more useful to the academic community in two ways: by significantly expanding the volume of citation data held within the Corpus, and by developing novel data visualizations and query services over the stored data.

These objectives will be achieved in the following ways:

  • By establishing a powerful new physical server to hold the Corpus data and to offer adequate performance for query services;
  • By increasing the rate of data ingest into the Corpus, integrating with the server 30 small data-ingest computers (Raspberry Pi 3Bs) that work in parallel to harvest references. This will increase the current rate of Corpus data ingest some thirty-fold, to about half a million citation links per day;
  • By employing a post-doctoral computer science research engineer specifically to develop information visualisation interfaces and sense-making tools. These will provide smart ways of envisaging and comprehending the citation data stored within the OpenCitations Corpus, and will also ease the task of manual curation of the OCC.

This post-doctoral appointment will start in the autumn of 2017, once the new hardware has been commissioned and programmed.  

The OpenCitations team seeks a highly intelligent, skilled and motivated individual who is an expert in Web Interface Design and Information Visualization, and who can demonstrate a commitment to increasing the openness of scholarly information.

Applications are now open for a Research Fellowship on OpenCitations; the deadline is 7 August 2017.

The OCC is being continuously populated from the scholarly literature

As of July 13, 2017, the OCC has ingested the references from 194,445 citing bibliographic resources and contains information about 8,276,364 citation links to 4,817,851 cited resources.

By the end of the OpenCitations Enhancement Project, the OCC will have harvested approximately 190 million citation links from the reference lists of about 4.4 million scholarly articles (approximately 15% of Web of Science's coverage). This will be a significant initial step towards the comprehensive literature coverage sought for the OCC, and will establish the OpenCitations Corpus as a valuable, persistent and free-to-use global scholarly Linked Open Data service.

In so doing, the project team aims to empower the global community by liberating scholarly citation data from their current commercial shackles, publishing these data under a Creative Commons CC0 Public Domain Dedication so that third parties can build novel services over them.

Silvio Peroni of the University of Bologna is Principal Applicant on the successful Sloan Foundation application to fund the OpenCitations Enhancement Project, with Professor Dave De Roure, Director of the Oxford e-Research Centre, as Co-Applicant. Dr David Shotton, the Centre's Senior Research and Emeritus Fellow, is Consultant Co-Investigator on the project. Individuals with the relevant skills and background who would like to express early interest in joining the OpenCitations team may contact Silvio Peroni by e-mail: [email protected].

Source: Oxford e-Research Centre

Follow OpenCitations on Twitter: @opencitations
