UNESCO Thesaurus published with Semantic Web standards and Open-Source software
This essay highlights the successful deployment of SKOSMOS, SKOS Play, Fuseki, and the VocBench editing tool to manage and publish the UNESCO Thesaurus. The system leverages Semantic Web standards by relying on SKOS as the data exchange format, SPARQL as the online thesaurus query language, and dereferenceable URIs as identifiers.
In 2016, Sparna replaced the thesaurus management software and the publication platform of the UNESCO Thesaurus with open-source tools - such as SKOSMOS, SKOS Play, Fuseki, and the VocBench editing tool - all relying on Semantic Web technologies.
VOCABULARY PUBLISHING SOLUTION
SKOSMOS - used as the UNESCO Thesaurus browser - is an easy-to-deploy, well-documented, open-source, web-based SKOS vocabulary browser that uses a SPARQL endpoint as its backend. It can be used as a publishing platform for controlled vocabularies such as thesauri, lightweight ontologies, classifications and authority files (see, e.g., the AGROVOC multilingual thesaurus operated by FAO, or the Finnish national thesaurus and ontology service Finto, operated by the National Library of Finland).
SKOSMOS provides a multilingual user interface for browsing the data alphabetically or hierarchically, for searching it, and for visualizing concept hierarchies. All of the SKOS data is made available as Linked Data. A developer-friendly REST API is also available, so that other applications, such as annotation systems, can use the vocabularies.
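As an illustration of how another application might call the REST API, here is a minimal Python sketch. It assumes the `search` method of the SKOSMOS REST API (`/rest/v1/search` with `query`, `vocab` and `lang` parameters) and a JSON response carrying a `results` list. The base URL and the vocabulary identifier `thesaurus` are placeholders, not the actual UNESCO deployment values.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder base URL: replace with the address of a real SKOSMOS instance.
BASE_URL = "https://vocabularies.example.org/rest/v1"


def build_search_url(query, vocab, lang="en", base_url=BASE_URL):
    """Build the URL of a SKOSMOS search request (parameters are URL-encoded)."""
    params = urlencode({"query": query, "vocab": vocab, "lang": lang})
    return f"{base_url}/search?{params}"


def extract_labels(payload):
    """Return (uri, prefLabel) pairs from a decoded search response."""
    return [(r.get("uri"), r.get("prefLabel")) for r in payload.get("results", [])]


def search(query, vocab, lang="en"):
    """Run the search over HTTP and return the matching concepts."""
    request = Request(build_search_url(query, vocab, lang),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return extract_labels(json.load(response))
```

A call such as `search("educat*", "thesaurus", lang="fr")` would then return the URIs and French preferred labels of the matching concepts, provided the instance exposes a vocabulary under that identifier.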
Important aspects for the UNESCO Thesaurus were the multilingual interface (English, French, Spanish, Russian) and the ability to customize the stylesheets, the logo, the help page, and the order of the fields on a concept display page. A direct link that triggers a search in the UNESDOC database from a concept page in SKOSMOS was added, thus easily linking the new thesaurus browser to the existing resource center.
Two additional components were used for a complete vocabulary publishing solution:
- SKOS Play was used to generate downloadable PDF documents from the SKOS thesaurus: complete editions of the thesaurus with alphabetical index, hierarchical tree and translation tables, and KWIC indexes, each in French, English, Spanish and Russian. The documents are regenerated automatically each time a new version of the thesaurus is published;
- Fuseki with a customized SPARQL form was used as the frontend for public SPARQL querying of the thesaurus.
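To give an idea of what public SPARQL querying looks like for a client, the following Python sketch lists concepts with their preferred label in one language, using the standard SPARQL protocol over HTTP GET. The endpoint URL is a placeholder, and the query only assumes that the thesaurus is modeled with `skos:Concept` and `skos:prefLabel`, as any SKOS vocabulary is.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint: replace with the address of the actual SPARQL service.
ENDPOINT = "https://vocabularies.example.org/sparql"


def label_query(lang, limit=10):
    """Build a SPARQL query listing concepts and their labels in one language."""
    if not lang.isalpha():
        raise ValueError("lang must be a plain language code such as 'en'")
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "{lang}")
}}
ORDER BY ?label
LIMIT {int(limit)}"""


def parse_bindings(results):
    """Turn a SPARQL JSON result set into (concept URI, label) pairs."""
    return [(b["concept"]["value"], b["label"]["value"])
            for b in results["results"]["bindings"]]


def run_query(query, endpoint=ENDPOINT):
    """Send the query to the endpoint and return the parsed rows."""
    request = Request(endpoint + "?" + urlencode({"query": query}),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=10) as response:
        return parse_bindings(json.load(response))
```

For example, `run_query(label_query("fr", limit=5))` would return the first five concepts in alphabetical order of their French labels.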
THESAURUS MANAGEMENT SOFTWARE
UNESCO and Sparna chose to deploy VocBench, an open-source SKOS-based thesaurus management solution from the Tor Vergata University in Rome. VocBench features were also useful to capture corresponding country codes and language codes for certain concepts in the UNESCO thesaurus with a small UNESCO vocabulary publishing ontology.
VocBench was chosen mainly for its:
- ability to properly handle collaborative multi-user maintenance of the thesaurus; this was an important aspect for UNESCO, having remote contributors to the thesaurus in Russia, and translations in Chinese and Arabic coming in the future;
- editing workflow management;
- multilingual thesaurus editing;
- possibility to add custom attributes to the thesaurus concepts and terms;
- validation workflow of the modifications.
The middleware component on which VocBench relies is called SemanticTurkey. A VocBench installation requires four pieces of software in total (relational database, RDF triplestore, SemanticTurkey server, VocBench application server). Once you are familiar with the procedure, and with the valuable help of the community on the mailing list, everything works fine.
VocBench is already deployed by other international organizations, and the upcoming v3 of VocBench is funded by the ISA2 program of the European Union, which gives guarantees as to the maintenance of the application over the next few years.
LEVERAGE THE THESAURUS TO ACHIEVE INTEROPERABILITY
All in all, UNESCO has achieved its mission of transforming its thesaurus into open, reusable data. The thesaurus is now available for browsing by humans and in machine-readable formats. Its URIs make it open for linking from and to other knowledge organization systems on the web, thus enabling interoperability between the document databases of multiple organizations.
The project is also a great success story for Open Source; the support from the community and the maintainers of both SKOSMOS and VocBench was essential to achieving this level of quality. Sparna and UNESCO contributed to both communities by providing translations, filing bug reports and testing new versions. The project shows how these tools have enabled UNESCO to replace an entire thesaurus management platform with no licensing cost and no vendor or data lock-in.
The UNESCO Thesaurus team is currently working on integrating the new thesaurus within the various information systems. The next phase will be mapping the UNESCO Thesaurus to vocabularies such as the UN Thesaurus and Eurovoc.
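A dereferenceable concept URI can serve both audiences through HTTP content negotiation: a browser asks for HTML, while a program asks for an RDF serialization. The Python sketch below shows the client side under the assumption that the server honors the `Accept` header for these media types. The concept URI used in the example is a placeholder, not an actual UNESCO Thesaurus identifier.

```python
from urllib.request import Request, urlopen

# Common media types for a SKOS concept; which ones a given server supports
# is an assumption to check against the actual deployment.
FORMATS = {
    "html": "text/html",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
}


def negotiated_request(concept_uri, fmt="turtle"):
    """Prepare a request asking for one representation of the concept."""
    return Request(concept_uri, headers={"Accept": FORMATS[fmt]})


def fetch(concept_uri, fmt="turtle"):
    """Dereference the URI and return the representation as text."""
    with urlopen(negotiated_request(concept_uri, fmt), timeout=10) as response:
        return response.read().decode("utf-8")
```

For instance, `fetch("http://example.org/thesaurus/concept1", "turtle")` would return the Turtle description of that concept, which another system can then parse to follow or create links.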
Source: SPARNA – UNESCO Thesaurus ...
- 2016 Open Source Yearbook
- Focus On VocBench and GACS: achieving efficiency through semantic interoperability and cooperative maintenance