AGROVOC thesaurus: one of the biggest datasets in the Linguistic Linked Open Data cloud

The AGROVOC multilingual thesaurus is currently one of the biggest datasets in the Linguistic Linked Open Data (LLOD) cloud, a collaborative effort pursued by the OKFN Working Group on Linguistics to publish data for linguistics and natural language processing according to the Open Definition.

LLOD is inspired by the Linking Open Data (LOD) cloud diagram, and the resources included in LLOD are chosen according to the same criteria of openness, availability and interlinking:

  • Data should be openly licensed using Creative Commons licenses;
  • The elements in a dataset should be uniquely identified by means of a resolvable Uniform Resource Identifier (URI), so users can access more information using web browsers;
  • Existing vocabularies such as OWL, lemon and NIF help express linguistic resources;
  • Resolving an LLOD resource should return results using web standards such as HTML, RDF or JSON-LD (content negotiation may be used to serve different representations to different clients);
  • Data from multiple sources can be combined trivially, because linked graphs are a more flexible representation format for linguistic data;
  • Links to other resources help users discover new resources and provide semantics (common links express what you mean) and dynamicity (web data can be continuously improved).
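
The content negotiation mentioned in the list above can be sketched in a few lines of Python. This is only an illustration: the resource URI and the helper name build_request are hypothetical, and the request is prepared but not sent. The media types are the standard ones for HTML, RDF/XML, Turtle and JSON-LD.

```python
from urllib.request import Request

# Standard media types for the formats named in the list above.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
}


def build_request(uri: str, fmt: str) -> Request:
    """Prepare (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


# Hypothetical resource URI, used only for illustration.
EXAMPLE_URI = "http://example.org/resource/c_0001"

req = build_request(EXAMPLE_URI, "jsonld")
print(req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; a server that supports content negotiation then answers with the representation matching the Accept header.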

LLOD is going to give birth to a subcloud that gathers linguistic resources from a specific domain: the Linguistic Legal Linked Open Data cloud (LLLOD).

The AGROVOC multilingual thesaurus (available in 29 languages) is currently one of the biggest datasets in the LLOD cloud, with 35,874 concepts and 676,091 terms as of August 2018. There it is interlinked with other linguistic resources, which LLOD classifies by type.

The AGROVOC multilingual thesaurus is:

The conversion of AGROVOC into RDF was accomplished with VocBench, a collaborative ontology and RDF editing tool developed by the ART Group at the University of Rome Tor Vergata and by the Food and Agriculture Organization of the United Nations (FAO). In August 2018, AGROVOC editing was migrated to the VocBench3 environment (see: official VocBench3 documentation in progress). VocBench3 offers a web environment for maintaining thesauri, code lists and authority resources. It provides advanced collaboration features such as history, validation and a publication workflow, as well as multi-user management with role-based access control. As a free and open source RDF modelling platform, VocBench3 aims to improve on its previous incarnation, VocBench2, by setting new standards for flexibility, openness and expressive power. The current version of VocBench3 offers fine-grained RDF editing and supports several core modelling vocabularies and ontologies.

  • If you would like to help expand the AGROVOC content in your language using VocBench3 (current AGROVOC language coverage can be found here), please don’t hesitate to contact the AGROVOC team at [email protected], and we will do our best to provide you with all the necessary information.

AGROVOC can be freely accessed in several ways, including a SPARQL endpoint, online search and RDF dumps.
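
As a rough illustration of SPARQL access, the sketch below builds a query for concepts with a given preferred label and encodes it as a GET request URL, following the standard SPARQL protocol `query` parameter. The endpoint address, the helper names and the assumption that labels are exposed through plain skos:prefLabel are ours, not taken from the AGROVOC documentation, so check them before use. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint address; check the AGROVOC site for the current one.
ENDPOINT = "https://agrovoc.fao.org/sparql"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for use in a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_query(term: str, lang: str = "en", limit: int = 10) -> str:
    """Build a query for concepts whose preferred label equals `term`.

    Assumes labels are available as skos:prefLabel; `lang` is expected
    to be a plain language tag such as "en" or "fr".
    """
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept a skos:Concept ;\n"
        "           skos:prefLabel ?label .\n"
        f'  FILTER(lang(?label) = "{lang}" && '
        f'lcase(str(?label)) = "{escape_literal(term.lower())}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(query: str) -> str:
    """Encode the query as a SPARQL protocol GET URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


print(label_query("maize"))
print(query_url(label_query("maize")))
```

Fetching the resulting URL with an Accept header of application/sparql-results+json would normally return the matches as JSON.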

Did you know?

The overall purpose of Linked (Open) Data and the Web of Data/Semantic Web is that, starting from one data source, the user can get information from different sources connected by RDF links. The collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) provides an environment where applications can query that data, draw inferences using vocabularies, etc.
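
To make this concrete, here is a minimal sketch in plain Python, with graphs modelled as sets of (subject, predicate, object) triples. All identifiers with the ex: and other: prefixes are made up for illustration; only the SKOS property names are real. It merges two sources by set union, follows a skos:exactMatch link from one source into the other, and draws a simple inference: the transitive closure of skos:broader, which corresponds to skos:broaderTransitive in SKOS.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Two tiny, made-up graphs standing in for separate data sources.
thesaurus = {
    ("ex:maize", SKOS + "prefLabel", "maize"),
    ("ex:maize", SKOS + "broader", "ex:cereals"),
    ("ex:cereals", SKOS + "broader", "ex:plants"),
    ("ex:maize", SKOS + "exactMatch", "other:corn"),
}
other_source = {
    ("other:corn", SKOS + "prefLabel", "corn"),
}

# Combining linked graphs is a set union of their triples.
merged = thesaurus | other_source


def objects(graph, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def broader_transitive(graph, concept):
    """Collect every ancestor reachable through skos:broader."""
    seen, stack = set(), [concept]
    while stack:
        for parent in objects(graph, stack.pop(), SKOS + "broader"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Follow the link out of the thesaurus and read labels from the other source.
linked_labels = {
    label
    for target in objects(merged, "ex:maize", SKOS + "exactMatch")
    for label in objects(merged, target, SKOS + "prefLabel")
}

print(sorted(broader_transitive(merged, "ex:maize")))
print(sorted(linked_labels))
```

A real application would use an RDF library and a reasoner or SPARQL property paths rather than hand-written traversal; the point is only that links and shared vocabularies let one query span several sources.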


Keep up to date by signing up for AIMS News and following @AIMS_Community on Twitter.

And, thanks again for your interest!