New publications by AGROVOC team
Armando Stellato, Ahsan Morshed, Gudrun Johannsen, Yves Jaques, Caterina Caracciolo, Sachit Rajbhandari, Imma Subirats, Johannes Keizer (2011). A Collaborative Framework for Managing and Publishing KOS. 10th European Networked Knowledge Organisation Systems (NKOS) Workshop, Berlin (Germany).
Location in FAO CDR: http://www.fao.org/docrep/article/am814e.pdf
Abstract
In the Food and Agriculture Organization of the United Nations (FAO), the need to revamp its popular agriculture vocabulary AGROVOC using Semantic Web knowledge representation standards, combined with the need to provide a collaborative environment for development and maintenance, drove the realization of a dedicated AGROVOC thesaurus maintenance tool. With the progressive standardization of the AGROVOC knowledge model, following recent Simple Knowledge Organization System (SKOS) recommendations by the World Wide Web Consortium (W3C), and with the addition of more FAO-maintained vocabularies, the former “AGROVOC Concept Server Workbench” has become a general-purpose framework for thesauri and vocabulary development and is now reborn as “VocBench”. In this paper, we describe the path which led to its realization and its main features.
Ahsan Morshed, Benjamin Zapilko, Gudrun Johannsen, Philipp Mayr, and Johannes Keizer (2011). Evaluating approaches to automatically match thesauri from different domains for Linked Open Data. 10th European Networked Knowledge Organisation Systems (NKOS) Workshop, Berlin (Germany).
Location in FAO CDR: http://www.fao.org/docrep/article/am815e.pdf
Abstract
With the use of SKOS, the heterogeneous environment of various vocabularies worldwide can prospectively be technically harmonized and, in particular, the content of traditional databases can be made accessible and connectable for Semantic Web applications, i.e. as Linked Open Data. Vocabularies in SKOS format, and crosswalks between them, can play a relevant role in this context because they can serve as a bridging hub for interlinking different published and indexed data sets. However, in recent years the considerable effort spent on developing and evaluating automatic alignment techniques has focused mostly on ontologies (see activities from the OAEI and van Hage 2008), with the requirement that vocabularies in SKOS format would often have to be converted into OWL format.
Our case study presents how thesauri from different domains can be matched automatically and which matching approaches are most promising for this difficult task. To this end, we reprise approaches from Lauser et al. 2008. The Thesaurus for the Social Sciences (TheSoz) and the AGROVOC thesaurus are established KOS in their domains, but given their scope they seem to have very little conceptual overlap. Both thesauri are available in SKOS format and are freely available on the web. In order to detect possible linkages between the two thesauri and to expose them to the LOD cloud, this paper checks whether there are any good approaches for finding conceptual overlap in thesauri from remote domains (semi-)automatically. Different approaches for aligning ontologies and linking data sources on the web are therefore applied to both SKOS thesauri. The automatically generated matches, which should preferably be statements with the properties skos:exactMatch, skos:closeMatch or owl:sameAs, are then evaluated by domain experts.
The initial matching approach is based on a syntactic algorithm that combines Levenshtein distance and the Jaro measure. It can be adapted by following these steps:
- The selected thesauri are downloaded as SKOS resources from their respective websites.
- A single triple store is created, with all RDF/SKOS triples coming from the thesauri.
- One pair of thesauri (AGROVOC-X) is considered at a time, e.g., AGROVOC and TheSoz, and so on.
- For every possible pair of concepts (the first concept coming from AGROVOC, the second from the other thesaurus), the following steps are taken:
a. the preferred label only is considered;
b. the above similarity measure is applied;
c. the average of the similarity measure is computed;
d. a threshold is applied to tune the measure for finding matches.
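Steps a to d can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "the average" means the mean of a normalized Levenshtein similarity and the Jaro similarity, that labels are lowercased before comparison, and that a threshold of 0.9 is a reasonable starting point. The function names, the concept identifiers and the threshold value are all hypothetical.

```python
from itertools import product


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def jaro_similarity(a: str, b: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i, ca in enumerate(a):
        if a_matched[i]:
            while not b_matched[k]:
                k += 1
            if ca != b[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def match_labels(agrovoc, other, threshold=0.9):
    """Compare preferred labels of every concept pair (steps a to d).

    `agrovoc` and `other` map a concept identifier to its preferred label.
    Returns (agrovoc_id, other_id, score) for pairs scoring at or above the threshold.
    """
    candidates = []
    for (id_a, label_a), (id_b, label_b) in product(agrovoc.items(), other.items()):
        a, b = label_a.lower(), label_b.lower()  # case folding is an assumption
        score = (levenshtein_similarity(a, b) + jaro_similarity(a, b)) / 2
        if score >= threshold:
            candidates.append((id_a, id_b, score))
    return candidates


# Hypothetical identifiers and labels, for illustration only.
agrovoc_labels = {"agrovoc:c1": "Agriculture", "agrovoc:c2": "Maize"}
thesoz_labels = {"thesoz:c1": "agriculture", "thesoz:c2": "Migration"}
for id_a, id_b, score in match_labels(agrovoc_labels, thesoz_labels):
    print(id_a, "skos:exactMatch", id_b, round(score, 3))
```

Comparing every concept pair is quadratic in the size of the thesauri, so a real run over AGROVOC would likely need some pre-filtering of candidates. The resulting pairs are only candidate matches which, as the abstract notes, are then evaluated by domain experts.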