Automatic Term Relationship Cleaning and Refinement for AGROVOC

Abstract

AGROVOC is a multilingual thesaurus developed and maintained by the Food and Agriculture Organization of the United Nations. Like all thesauri, it contains some explicit semantics, which allow it to be transformed into an ontology or used as a resource for ontology construction. However, most thesauri, AGROVOC included, provide only very broad relationships that lack the semantic precision needed in an ontology, and many relationships in a thesaurus are incorrectly applied or defined too broadly. Accordingly, extracting ontological relationships from a thesaurus requires data cleaning and refinement of semantic relationships.
This paper presents a hybrid approach for (semi-)automatically detecting these problematic relationships and for suggesting more precisely defined ones. The system consists of three main modules: Rule Acquisition, Detection and Suggestion, and Verification. The Rule Acquisition module acquires refinement rules, both those specified by experts and those obtained through machine learning. The Detection and Suggestion module uses noun phrase analysis and WordNet alignment to detect incorrect relationships and to suggest more appropriate ones by applying the acquired rules. The Verification module is a tool for confirming the proposed relationships. We are currently applying the learning system to some semantic relationships to test our method.
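
As a rough illustration of how a Detection and Suggestion step might apply acquired rules, the following minimal Python sketch refines broad thesaurus links into more specific ones. It is not the system described in the paper: the `Relationship` class, the `SEMANTIC_CLASS` table (a hand-written stand-in for WordNet alignment), the `EXPERT_RULES` table, the refined relation names (`isA`, `partOf`, `producedBy`), and the sample terms are all hypothetical and chosen only for illustration. `BT` and `RT` are the conventional thesaurus codes for "broader term" and "related term".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    source: str    # term that carries the link
    relation: str  # broad thesaurus relation: "BT" (broader term) or "RT" (related term)
    target: str    # term the link points to


# Hypothetical stand-in for WordNet alignment: term -> coarse semantic class.
SEMANTIC_CLASS = {
    "milk": "substance",
    "cow milk": "substance",
    "cows": "animal",
    "udders": "body_part",
}

# Hypothetical expert-specified rules:
# (broad relation, source class, target class) -> refined relation.
EXPERT_RULES = {
    ("BT", "body_part", "animal"): "partOf",
    ("RT", "substance", "animal"): "producedBy",
}


def shares_head(source: str, target: str) -> bool:
    """Naive noun phrase analysis: the source is the target plus a leading modifier."""
    return source.lower().endswith(" " + target.lower())


def suggest(rel: Relationship) -> Optional[str]:
    """Return a candidate refined relation, or None if no rule applies."""
    # Assumed heuristic: "cow milk" BT "milk" suggests a taxonomic (isA) link.
    if rel.relation == "BT" and shares_head(rel.source, rel.target):
        return "isA"
    key = (
        rel.relation,
        SEMANTIC_CLASS.get(rel.source),
        SEMANTIC_CLASS.get(rel.target),
    )
    return EXPERT_RULES.get(key)


if __name__ == "__main__":
    for rel in [
        Relationship("cow milk", "BT", "milk"),
        Relationship("udders", "BT", "cows"),
        Relationship("milk", "RT", "cows"),
    ]:
        print(rel, "->", suggest(rel))
```

In this sketch, every suggestion is only a candidate: links for which `suggest` returns `None` stay as they are, and the returned labels would still be passed to a human reviewer, mirroring the role of the Verification module.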

Kawtrakul, Asanee and Imsombut, Aurawan and Thunyakijjanukit, Aree and Soergel, Dagobert and Liang, Anita and Sini, Margherita and Johannsen, Gudrun and Keizer, Johannes. Automatic Term Relationship Cleaning and Refinement for AGROVOC, 2005. In 5th Conference of the European Federation for Information Technology in Agriculture, Food and Environment, Vila Real (Portugal), July 25-28, 2005. [Conference paper]