A review of the COAR Interest Group "Controlled Vocabularies for Repository Assets"

Jochen Schirrwagen is a research fellow at Bielefeld University Library, Germany. He works for the OpenAIRE2020 project and is responsible for aggregating bibliographic metadata from data sources such as repositories, e-journals and CRIS. He also coordinates the advancement of the OpenAIRE guidelines for data source managers. This work raises a number of issues regarding the interoperability of metadata exchange formats, transfer protocols and vocabularies. This was motivation enough to set up and chair the COAR Interest Group on Controlled Vocabularies, which is also open to interested people whose organization is not a member or partner of COAR. The AIMS editorial team asked him the following questions:

AIMS: Could you summarise the objectives and background concept or ideas behind the COAR Interest Group "Controlled Vocabularies for Repository Assets"?

JS: The Interest Group aims to collect information and maintain a knowledge base on general, cross-domain and thematic vocabularies and application profiles that are used to describe items in Open Access repositories. Typical items are research publications, research data, learning objects and digital collections.

Special attention is paid to an application profile developed in the context of the EU DRIVER and OpenAIRE knowledge infrastructure. It is named by its namespace prefix "info:eu-repo" and defines 11 vocabularies describing common aspects of research outputs (like publication and object type, access right, classification scheme, date event, author identifier, funder and project identifier, identifier scheme for the described resource and identifier schemes for linked or referenced resources).
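To make the namespace convention concrete, here is a minimal Python sketch of how a harvester might recognise such terms in a metadata record. It assumes that a term is written as the prefix "info:eu-repo", followed by a vocabulary name and a value separated by slashes. The helper name split_term and the sample values are illustrative assumptions and do not come from the interview.

```python
# Minimal sketch: split an "info:eu-repo" term into its vocabulary name
# and the remaining value. The function name and the sample terms are
# illustrative assumptions.

PREFIX = "info:eu-repo/"


def split_term(value):
    """Return (vocabulary, remainder) for an info:eu-repo term, else None."""
    if not value.startswith(PREFIX):
        return None  # not part of the application profile
    parts = value[len(PREFIX):].split("/", 1)
    vocabulary = parts[0]
    remainder = parts[1] if len(parts) > 1 else ""
    return vocabulary, remainder


if __name__ == "__main__":
    samples = [
        "info:eu-repo/semantics/openAccess",   # assumed access right term
        "info:eu-repo/semantics/article",      # assumed publication type term
        "http://example.org/not-a-term",       # ignored by the sketch
    ]
    for sample in samples:
        print(sample, "->", split_term(sample))
```

A harvester could use such a check to separate controlled terms from free-text values before validating them against the vocabulary.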

While the "info:eu-repo" application profile has gained global importance in the scholarly information infrastructure, it has lacked international governance and sustainable organizational support. COAR, as an international body, and the Interest Group in particular can provide the proper governance structure and, moreover, keep the application profile viable beyond the lifetime of a single project.

Adding or changing vocabulary terms has a profound impact on the systems that implement such a vocabulary; therefore an international group of stakeholders is needed to review the existing vocabulary and agree on any future changes. COAR is the international body with the proper governance structure that can take up that role.

AIMS: What are the major highlights of the activities and deliverables achieved by the COAR Interest Group "Controlled Vocabularies for Repository Assets" so far?

JS: The group started in January 2014, and from the beginning the major focus was to organize the revision of the "info:eu-repo" vocabularies and their terms. To this end an authority group was set up within the Interest Group to perform the necessary review.

It soon turned out that a consolidation of the vocabularies was not enough: a modern web standard was also needed to put the vocabularies on a sound basis and to overcome conceptual issues of "info:eu-repo". These issues concerned the name of the application profile, whose European focus hindered global acceptance; the deprecation of the "info" URI registry, where the namespace is registered; and, last but not least, the lack of translated labels for use in regional repository networks. The solution was to rebrand the application profile as the "Set of COAR Controlled Vocabularies" and to choose SKOS as the standard format for describing the vocabularies.
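As a rough illustration of what SKOS offers here, the Python sketch below models a single concept with language-tagged preferred labels and writes it out as Turtle. The SKOS namespace, skos:Concept and skos:prefLabel are part of the W3C standard; the concept URI, the class name Concept and the label texts are hypothetical placeholders, not the published COAR terms.

```python
# Minimal sketch: one SKOS concept with multilingual preferred labels.
# The URI and the label texts are hypothetical placeholders.
from dataclasses import dataclass, field

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language tag -> label

    def label(self, lang, fallback="en"):
        """Return the label for lang, falling back to another language."""
        return self.pref_labels.get(lang, self.pref_labels.get(fallback))

    def to_turtle(self):
        """Serialise the concept as Turtle (labels must not contain quotes)."""
        lines = [f"<{self.uri}> a skos:Concept"]
        for lang, text in sorted(self.pref_labels.items()):
            lines.append(f'    skos:prefLabel "{text}"@{lang}')
        return f"@prefix skos: <{SKOS_NS}> .\n" + " ;\n".join(lines) + " .\n"


# Hypothetical concept; labels are illustrative translations only.
article = Concept(
    uri="http://example.org/vocab/resource_type/journal-article",
    pref_labels={"en": "journal article", "es": "artículo de revista"},
)

if __name__ == "__main__":
    print(article.to_turtle())
```

Because each label carries its own language tag, a regional repository network can display the same concept in its own language without changing the identifier that is exchanged between systems.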

The experience of the AIMS community members participating in the Interest Group was crucial to this decision. In fact VocBench, a web-based, multilingual editing and workflow tool that manages thesauri, authority lists and glossaries using SKOS, has been selected to support the COAR Controlled Vocabularies. The AIMS Community is also providing the technical support and infrastructure for publishing these vocabularies as SKOS and Linked Open Data.

Each vocabulary in "info:eu-repo" was compared with similar vocabularies and dictionaries. For example, the terms of the publication type vocabulary, the most comprehensive one, were compared with terms used by DCMI, the CERIF Semantic Vocabulary, the CASRAI Dictionary, the DataCite metadata kernel and others. A revised list of terms was suggested to the Interest Group, and labels have initially been translated into English, Spanish and Chinese.

A working document has been drafted and will be released in April 2015. It describes the methodology of the review and the COAR Controlled Vocabularies, and gives initial recommendations for other vocabularies governed by external authorities. The set of COAR Controlled Vocabularies comes at just the right time, as repository networks and scholarly initiatives such as OpenAIRE, EuroCRIS, JISC in the UK, La Referencia in Latin America and SHARE in the United States are discussing steps to ensure that metadata about research outputs is interoperable at a global level.

AIMS: What are the future activities planned for the IG as the research landscape, repository assets and technologies are changing?

JS: One of the major next steps will be the official release of the COAR Controlled Vocabularies and their communication to the repository community, repository platform vendors and scholarly initiatives. At the same time the group needs to make the efforts necessary to secure the continued technical hosting and editorial support of the vocabularies. In my opinion the best strategy will be to expand the authority group into an editorial board that represents information specialists from the repository networks and standardization initiatives. As a permanent body, it would ensure that ongoing changes in the repository landscape and research infrastructures are monitored and that any necessary updates are made.