Agrotagger is a keyword extractor used for indexing information resources. It uses the AGROVOC thesaurus as its set of allowable keywords and can extract keywords from Microsoft Office documents, PDF files and web pages.
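The core idea of a controlled-vocabulary extractor, where only thesaurus terms may be returned as keywords, can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical stand-in for AGROVOC (`VOCABULARY`). This stand-in maps lowercase labels, including alternative labels such as "corn", to a preferred label. The sketch uses plain n-gram string matching and frequency counting. It is not Agrotagger's actual implementation, which builds on KEA.

```python
import re
from collections import Counter

# Hypothetical miniature stand-in for AGROVOC: lowercase surface forms
# (preferred and alternative labels) mapped to a preferred label.
VOCABULARY = {
    "maize": "maize",
    "corn": "maize",
    "irrigation": "irrigation",
    "soil fertility": "soil fertility",
    "drought": "drought",
}


def extract_keywords(text, vocabulary, top_n=5):
    """Return the top_n most frequent preferred labels found in text.

    Only phrases present in the vocabulary can be returned, which is what
    makes this a controlled-vocabulary (rather than free) keyword extractor.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(term.split()) for term in vocabulary)
    counts = Counter()
    for i in range(len(tokens)):
        # Try every n-gram starting at token i, up to the longest term length.
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocabulary:
                counts[vocabulary[phrase]] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ("Drought reduces maize yields. Corn growers rely on irrigation, "
              "and irrigation depends on soil fertility management.")
    print(extract_keywords(sample, VOCABULARY))
```

In this example "maize" is counted twice, once directly and once through the alternative label "corn", while words such as "yields" are ignored because they are not in the vocabulary.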
Agrotagger began in 2010 as a collaboration with the Indian Institute of Technology Kanpur (IITK). Building on the popular Keyphrase Extraction Algorithm (KEA), the team created several versions. Some were based on AGROTAGS, a reduced subset of AGROVOC produced by partner ICRISAT, and others used the full set of AGROVOC concepts.
MIMOS, in collaboration with IITK and FAO, built an application on top of the IITK tagging service. The application stores the generated keywords as RDF triples and uses them to build a tag cloud of the most frequently extracted keywords.
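A minimal sketch of the idea of storing extracted keywords as triples and deriving a tag cloud from them is shown below. It makes several assumptions. Triples are kept as plain (subject, predicate, object) tuples rather than in a real RDF store. The predicate is the Dublin Core `dcterms:subject` property. Objects are plain keyword strings, whereas a real deployment would more likely use AGROVOC concept URIs. The function names and example document URIs are hypothetical, and this is not the MIMOS implementation.

```python
from collections import Counter

# Dublin Core "subject" property, assumed here as the predicate linking a
# document to one of its extracted keywords.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def keywords_to_triples(doc_uri, keywords):
    """Turn one document's keywords into (subject, predicate, object) triples."""
    return [(doc_uri, DCTERMS_SUBJECT, kw) for kw in keywords]


def tag_cloud(triples, min_size=10, max_size=32):
    """Map each keyword to a font size scaled linearly by how often it occurs."""
    counts = Counter(obj for _, pred, obj in triples if pred == DCTERMS_SUBJECT)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        tag: min_size + (count - lo) * (max_size - min_size) // span
        for tag, count in counts.items()
    }


if __name__ == "__main__":
    triples = (
        keywords_to_triples("http://example.org/doc/1", ["maize", "drought"])
        + keywords_to_triples("http://example.org/doc/2", ["maize", "irrigation"])
        + keywords_to_triples("http://example.org/doc/3", ["maize", "drought"])
    )
    print(tag_cloud(triples))
```

Here "maize", extracted from all three documents, gets the largest size. "irrigation", extracted once, gets the smallest. Linear scaling is just one simple choice, and logarithmic scaling is also common for tag clouds.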
In addition, FAO has collaborated with the Metadata Research Center of the University of North Carolina. The center includes AGROVOC, along with many other thesauri, in its indexing and browsing tool HIVE.
Several services are currently available. Each can be accessed either through a web interface for manual document upload or as a REST web service that can be invoked programmatically:
- MIMOS-hosted IITK Kanpur Agrotagger (to invoke this service programmatically, follow the same instructions as for the IITK Kanpur service)
In 2011 the Metadata Research Center at the University of North Carolina's School of Information and Library Science carried out two analyses of Agrotagger. Both aimed to compare automatic and manual indexing.
“An Evaluation of Automatically Assigned Subject Metadata using AgroTagger and HIVE”
An independent work by Jenni Clements.
The study compares the subject metadata assigned by two automatic indexing programs, AgroTagger and HIVE (Helping Interdisciplinary Vocabulary Engineering), with subject metadata assigned by information professionals. It aims to determine whether the automatically generated terms compare favorably with the professionally assigned terms and whether they are good enough to use in metadata records.
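The study's scoring method is not described here. One common way to compare automatically assigned terms against professionally assigned ones is exact-match precision, recall and F1. In this approach, the professional terms are treated as the reference set. The sketch below uses a hypothetical function name and made-up sample terms. It illustrates that general technique and is not the method used in the study.

```python
def compare_terms(automatic, manual):
    """Score automatic terms against manual (reference) terms by exact match.

    precision: share of automatic terms that also appear in the manual set
    recall:    share of manual terms that the automatic tool also assigned
    f1:        harmonic mean of precision and recall
    """
    auto = {t.strip().lower() for t in automatic}
    ref = {t.strip().lower() for t in manual}
    matched = auto & ref
    precision = len(matched) / len(auto) if auto else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"matched": sorted(matched), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    auto_terms = ["Maize", "Drought", "Fertilizers", "Yields"]
    manual_terms = ["maize", "drought resistance", "yields"]
    print(compare_terms(auto_terms, manual_terms))
```

With these sample terms, two of the four automatic terms match, giving a precision of 0.5 and a recall of 2/3. Exact matching treats "drought" and "drought resistance" as unrelated, which is one limitation of purely set-based scores compared with human judgments of relevance.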
A collective report by Jackie Chapman, Jesse Savage and Brian Young.
The project’s objective is to study the effectiveness of manually and automatically generated AGROVOC metadata terms by evaluating the relevance of terms based on user judgment. The pilot study seeks to establish a methodology for accomplishing the overall project objective, including the development of a survey instrument that can be used in future, larger-scale studies.