UNESCO Thesaurus published with Semantic Web standards and Open-Source software

This essay highlights the successful deployment of SKOSMOS, SKOS Play, Fuseki, and the VocBench editing tool to manage and publish the UNESCO Thesaurus. The system leverages Semantic Web standards by relying on SKOS as the data exchange format, SPARQL as the online thesaurus query language, and dereferenceable URIs as identifiers.

___________________________________________________________________________________________________

In 2016, Sparna replaced the thesaurus management software and the thesaurus publication platform for the UNESCO Thesaurus with Open-Source tools (SKOSMOS, SKOS Play, Fuseki, and the VocBench editing tool), all relying on Semantic Web technologies.

VOCABULARY PUBLISHING SOLUTION

SKOSMOS, used as the UNESCO Thesaurus browser, is an easy-to-deploy, well-documented, open-source, web-based SKOS vocabulary browser that uses a SPARQL endpoint as its back end. It can be used as a publishing platform for controlled vocabularies such as thesauri, lightweight ontologies, classifications and authority files (see, e.g., the AGROVOC multilingual thesaurus operated by FAO, and the Finnish national thesaurus and ontology service Finto, operated by the National Library of Finland).
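
To illustrate what "a SPARQL endpoint as its back end" means in practice, here is a minimal sketch of how a client might prepare a query for the preferred labels of SKOS concepts in one language. The endpoint URL is a placeholder, not the actual UNESCO address, and the helper names are hypothetical; only the SKOS namespace and the standard SPARQL protocol conventions (a `query` parameter and a JSON results media type) are taken as given.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, assumed for illustration only.
ENDPOINT = "http://example.org/sparql"


def build_label_query(lang: str, limit: int = 10) -> str:
    """Return a SPARQL query listing concepts with their preferred label in `lang`."""
    # Reject anything that is not a plain language tag before embedding it.
    if not lang.replace("-", "").isalnum():
        raise ValueError(f"unexpected language tag: {lang!r}")
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept a skos:Concept ;\n"
        "           skos:prefLabel ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, build_label_query("fr", limit=5))
```

Passing `request` to `urllib.request.urlopen` would send the query; that step is left out because the endpoint above is not a real server.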

SKOSMOS provides a multilingual user interface for alphabetical and hierarchical browsing, for searching the data, and for visualizing concept hierarchies. All of the SKOS data is made available as Linked Data. A developer-friendly REST API is also available, providing access for using vocabularies in other applications such as annotation systems.
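
As a rough sketch of how an annotation system might use the REST API, the snippet below builds a concept search URL and extracts the hits from a JSON response. It assumes the `/rest/v1/search` method with `query`, `vocab` and `lang` parameters and a `results` list in the response, as described in the SKOSMOS REST API documentation; the base URL, the vocabulary identifier and the sample response are invented for illustration.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL and vocabulary id, assumed for illustration only.
BASE_URL = "http://example.org/skosmos"
VOCAB_ID = "myvocab"


def search_url(base_url: str, term: str, vocab: str, lang: str) -> str:
    """Build the URL of a concept search restricted to one vocabulary."""
    params = urlencode({"query": term, "vocab": vocab, "lang": lang})
    return f"{base_url}/rest/v1/search?{params}"


def extract_hits(response_text: str) -> list:
    """Return (uri, prefLabel) pairs from a search response body."""
    payload = json.loads(response_text)
    return [(hit["uri"], hit.get("prefLabel", "")) for hit in payload.get("results", [])]


# Hand-written sample response, not real data from any server.
SAMPLE_RESPONSE = json.dumps(
    {"results": [{"uri": "http://example.org/thesaurus/c1", "prefLabel": "Education", "lang": "en"}]}
)

url = search_url(BASE_URL, "education", VOCAB_ID, "en")
hits = extract_hits(SAMPLE_RESPONSE)
```

The returned concept URIs are what an annotation tool would store, so that annotations keep pointing at the concept even if its label changes.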

Important aspects for the UNESCO Thesaurus were the multilingual interface (English, French, Spanish, Russian) and the ability to customize the stylesheets, the logo, the help page, and the order of the fields on a concept display page. A direct link that triggers a search in the UNESDOC database from a concept page in SKOSMOS was added, thus easily linking the new thesaurus browser to the existing resource center.

Two additional components, SKOS Play and Fuseki, were used for a complete vocabulary publishing solution.

THESAURUS MANAGEMENT SOFTWARE

UNESCO and Sparna chose to deploy VocBench, an open-source SKOS-based thesaurus management solution from the Tor Vergata University in Rome. VocBench was also used to capture the country codes and language codes that correspond to certain concepts in the UNESCO Thesaurus, using a small UNESCO vocabulary publishing ontology.

VocBench was chosen mainly for its:

  • ability to properly handle collaborative multi-user maintenance of the thesaurus; this was an important aspect for UNESCO, which has remote contributors to the thesaurus in Russia and expects translations into Chinese and Arabic in the future;
  • editing workflow management;
  • multilingual thesaurus editing;
  • possibility to add custom attributes to the thesaurus concepts and terms;
  • validation workflow of the modifications.

VocBench relies on a middleware component called SemanticTurkey. A full deployment requires a total of four pieces of software (relational database, RDF triplestore, SemanticTurkey server, VocBench application server). Once you are familiar with the procedure, and with the valuable help of the community on the mailing list, everything works fine.

VocBench is already deployed by other international organizations, and the upcoming v3 of VocBench is funded by the ISA2 program of the European Union, which gives guarantees as to the maintenance of the application over the next few years.

VocBench is built on SKOS-XL from the bottom up and stores the thesaurus data in an RDF triplestore. GraphDB from Ontotext was chosen as the triplestore backend for VocBench.
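
For readers unfamiliar with SKOS-XL: it models each label as a resource of its own, linked to a concept by `skosxl:prefLabel` (or `altLabel`, `hiddenLabel`) and carrying its text in `skosxl:literalForm`, which is what allows attributes to be attached to individual terms. The sketch below shows how such labels can be flattened into plain SKOS labels, following the correspondence defined in the SKOS-XL specification. The triples are represented as simple Python tuples, and the example data and function name are invented; the source does not describe how the UNESCO data is actually exported, so this is illustrative only.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"

# Each SKOS-XL labelling property and its plain SKOS counterpart.
XL_TO_SKOS = {
    SKOSXL + "prefLabel": SKOS + "prefLabel",
    SKOSXL + "altLabel": SKOS + "altLabel",
    SKOSXL + "hiddenLabel": SKOS + "hiddenLabel",
}


def flatten_xl_labels(triples):
    """Derive plain SKOS label triples from SKOS-XL label resources."""
    # Map every label resource to its literal form, e.g. ("Education", "en").
    literal_forms = {s: o for s, p, o in triples if p == SKOSXL + "literalForm"}
    flat = []
    for s, p, o in triples:
        if p in XL_TO_SKOS and o in literal_forms:
            flat.append((s, XL_TO_SKOS[p], literal_forms[o]))
    return flat


# Invented example data: one concept with an English and a French label.
EXAMPLE = [
    ("ex:concept1", SKOSXL + "prefLabel", "ex:label1"),
    ("ex:label1", SKOSXL + "literalForm", ("Education", "en")),
    ("ex:concept1", SKOSXL + "altLabel", "ex:label2"),
    ("ex:label2", SKOSXL + "literalForm", ("Enseignement", "fr")),
]

flat_labels = flatten_xl_labels(EXAMPLE)
```

A real pipeline would more likely run an equivalent SPARQL update or use an RDF library; plain tuples are used here only to keep the example self-contained.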

LEVERAGE THE THESAURUS TO ACHIEVE INTEROPERABILITY

All in all, UNESCO has achieved its mission of transforming its thesaurus into open, reusable data. The thesaurus is now available for browsing by humans and in machine-readable formats. URIs make it open for linking from and to other knowledge organization systems on the web, thus enabling interoperability between the document databases of multiple organizations.
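
Serving the same concept URI to both humans and machines is usually done with HTTP content negotiation: the client states the format it wants in the `Accept` header. The following is a minimal sketch of the client side, assuming the server supports the listed media types; the concept URI and the helper name are hypothetical.

```python
from urllib.request import Request

# Standard media types for common RDF serializations and for HTML.
MEDIA_TYPES = {
    "html": "text/html",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
}


def negotiate(concept_uri: str, fmt: str) -> Request:
    """Prepare (but do not send) a request for one representation of a concept."""
    return Request(concept_uri, headers={"Accept": MEDIA_TYPES[fmt]})


# Hypothetical concept URI, used for illustration only.
CONCEPT_URI = "http://example.org/thesaurus/concept1"

for_browser = negotiate(CONCEPT_URI, "html")
for_machine = negotiate(CONCEPT_URI, "turtle")
```

Both requests target the same identifier; only the requested representation differs, which is what lets other vocabularies link to a single stable URI.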

The project is also a great success story for Open Source; the support from the community and the maintainers of both SKOSMOS and VocBench was essential to the quality of the result. Sparna and UNESCO contributed to both communities by providing translations, filing bug reports and testing new versions. The project shows how these tools enabled UNESCO to replace an entire thesaurus management platform with no licensing cost and no vendor or data lock-in.

The UNESCO Thesaurus team is currently working on integrating the new thesaurus within the various information systems. The next phase will be mapping the UNESCO Thesaurus to vocabularies such as the UN Thesaurus and Eurovoc.

Source: SPARNA – UNESCO Thesaurus ...

___________________________________________________________________________________________________
