Automatic Term Relationship Cleaning and Refinement for AGROVOC
Abstract
AGROVOC is a multilingual thesaurus developed and maintained by the Food and Agriculture Organization of the United Nations. Like all thesauri, it contains some explicit semantics, which allow it
to be transformed into an ontology or used as a resource for ontology construction. However, most thesauri, AGROVOC included, provide only very broad relationships that lack the semantic precision needed in
an ontology. Moreover, many relationships in a thesaurus are incorrectly applied or defined too broadly.
Accordingly, extracting ontological relationships from a thesaurus requires data cleaning and refinement of semantic relationships.
This paper presents a hybrid approach for (semi-)automatically detecting these problematic relationships and for suggesting more precisely defined ones. The system consists of three main modules: Rule Acquisition, Detection and Suggestion, and Verification. The Rule Acquisition module acquires refinement rules, both those specified by experts and those obtained through machine learning. The Detection and Suggestion
module uses noun phrase analysis and WordNet alignment to detect incorrect relationships and, by applying the acquired rules, to suggest more appropriate ones. The Verification module is a tool
with which the proposed relationships are confirmed. We are currently applying the learning system to a selection of
semantic relationships in order to test the method.
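As a rough illustration of the Detection and Suggestion step described above, the following minimal Python sketch combines a noun phrase heuristic with rule application. It is not the system's implementation: the `SEMANTIC_CLASS` table is a hypothetical stand-in for WordNet alignment, the single entry in `RULES` is an invented example of an expert-specified rule, and the relationship names `subclassOf` and `produces`, as well as the term pairs, are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RefinementRule:
    """Maps a pair of semantic classes to a more precise relationship."""
    subject_class: str
    object_class: str
    relationship: str


# Hypothetical toy table standing in for WordNet alignment:
# head noun -> coarse semantic class.
SEMANTIC_CLASS = {"cow": "animal", "milk": "substance", "flour": "substance"}

# Hypothetical expert-specified rule (assumption, not taken from the paper).
RULES = [RefinementRule("animal", "substance", "produces")]


def tokens(term: str) -> List[str]:
    """Lower-case and split a thesaurus term into words."""
    return term.lower().split()


def semantic_class(term: str) -> Optional[str]:
    """Look up the class of the term's head noun (assumed to be the last word)."""
    words = tokens(term)
    return SEMANTIC_CLASS.get(words[-1]) if words else None


def suggest_relationship(term: str, other: str) -> Optional[str]:
    """Suggest a refined relationship for a linked term pair, or None."""
    t, o = tokens(term), tokens(other)
    # Noun phrase heuristic: if `term` is `other` plus leading modifiers
    # (e.g. "rice flour" vs. "flour"), treat it as a subclass.
    if o and len(t) > len(o) and t[-len(o):] == o:
        return "subclassOf"
    # Otherwise apply the acquired rules to the aligned semantic classes.
    for rule in RULES:
        if (semantic_class(term) == rule.subject_class
                and semantic_class(other) == rule.object_class):
            return rule.relationship
    # No suggestion: the pair would be left for expert verification.
    return None
```

Under these assumptions, `suggest_relationship("rice flour", "flour")` yields `subclassOf` through the head-noun heuristic, while `suggest_relationship("dairy cow", "milk")` falls through to the rule table. A pair that matches neither returns `None`, which in this sketch corresponds to handing the relationship to an expert in the Verification step.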