AgroTagger

AgroTagger is a keyword extractor that uses the AGROVOC thesaurus as its set of allowable keywords. It can extract keywords from Microsoft Office documents, PDF files and web pages.

Background

AgroTagger began in 2010 as a collaboration with the Indian Institute of Technology Kanpur (IITK). Building on the Keyphrase Extraction Algorithm (KEA), the team created several versions: some based on a reduced subset of AGROVOC known as AGROTAGS (produced by partner ICRISAT), and others using the full set of AGROVOC concepts.

MIMOS, in collaboration with IITK and FAO, built an application on top of the IITK tagging service that stores the generated keywords as RDF triples and uses them to build a tag cloud showing the most commonly extracted keywords.

In addition, FAO has collaborated with the Metadata Research Center of the University of North Carolina, which includes AGROVOC along with a host of other thesauri in its indexing and browsing tool, HIVE.

Finally, within the context of the agINFRA project, FAO assembled an AGROVOC-based indexing package using the Maui indexing framework.

AgroTagger Application

The application indexes web documents, creating RDF triples that link each document's URL to URIs of concepts in a SKOS thesaurus. Currently, the tagging is based on Maui and the thesaurus used is AGROVOC.
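To make the shape of that output concrete, here is a minimal sketch of how a document URL could be linked to thesaurus concept URIs and serialized as N-Triples. It is an illustration under stated assumptions, not AgroTagger's actual code: the function name `to_ntriples` is hypothetical, the choice of `dcterms:subject` as the linking predicate is an assumption (the source does not say which predicate is used), and the concept identifiers are placeholders.

```python
from urllib.parse import urlparse

# Assumption: dcterms:subject is the linking predicate. The source does not
# state which predicate the application actually emits.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def to_ntriples(document_url, concept_uris, predicate=DCTERMS_SUBJECT):
    """Return one N-Triples line per (document, concept) pair."""
    # Reject anything that is not an absolute URI before serializing.
    for uri in [document_url, *concept_uris]:
        parsed = urlparse(uri)
        if not parsed.scheme or not parsed.netloc:
            raise ValueError(f"not an absolute URI: {uri!r}")
    return [f"<{document_url}> <{predicate}> <{c}> ." for c in concept_uris]


if __name__ == "__main__":
    # Placeholder concept identifiers, for illustration only.
    concepts = [
        "http://aims.fao.org/aos/agrovoc/c_0001",
        "http://aims.fao.org/aos/agrovoc/c_0002",
    ]
    for line in to_ntriples("http://example.org/report.html", concepts):
        print(line)
```

Each output line is a single subject, predicate, object statement, so the result can be appended to a file and loaded into any triple store that accepts N-Triples.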

Web documents to be indexed can be passed to the application through a file containing a list of URLs, or through a file containing the output of an Apache Nutch Web Crawler.
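As a rough illustration of the first input option, the sketch below reads a list of URLs from text lines. The file layout is an assumption, since the source does not describe it: one URL per line, blank lines ignored, and lines starting with `#` treated as comments. The helper name `read_url_list` is hypothetical.

```python
import io


def read_url_list(lines):
    """Yield unique URLs from an iterable of text lines.

    Assumed layout (not taken from the AgroTagger documentation): one URL
    per line, blank lines ignored, '#' lines treated as comments.
    """
    seen = set()
    for raw in lines:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue
        if url in seen:
            continue  # skip duplicates so each document is indexed once
        seen.add(url)
        yield url


if __name__ == "__main__":
    sample = io.StringIO(
        "# seed list\n"
        "http://example.org/a.pdf\n"
        "\n"
        "http://example.org/b.html\n"
        "http://example.org/a.pdf\n"
    )
    print(list(read_url_list(sample)))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.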

The application is a Java application made up of three sub-applications that are executed sequentially. Bash scripts are provided to run it in a UNIX environment. Information about the Java sub-applications and examples of usage are available on the Java Applications page.

The application and accompanying documentation can be accessed here.

Services

There are currently several available services that can be accessed either as web interfaces for manual document upload or as REST web services that can be programmatically invoked:

AGROTAGS V3 subset of AGROVOC
Full AGROVOC thesaurus
University of North Carolina - Hive Indexer
MAUI including AGROVOC
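Programmatic invocation of a REST tagging service could look roughly like the sketch below. The endpoint URL and the `url` query parameter are hypothetical placeholders, because the source does not document the request format of any of the services listed above. The code only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameter name, for illustration only.
SERVICE_URL = "https://example.org/agrotagger/tag"


def build_tag_request(document_url, service_url=SERVICE_URL):
    """Build (but do not send) a GET request asking a service to tag a URL."""
    # urlencode percent-escapes the document URL so it is safe in a query.
    query = urllib.parse.urlencode({"url": document_url})
    return urllib.request.Request(f"{service_url}?{query}", method="GET")


if __name__ == "__main__":
    req = build_tag_request("http://example.org/report.pdf")
    print(req.get_method(), req.full_url)
    # To call a real service, replace SERVICE_URL with its documented
    # endpoint and pass the request to urllib.request.urlopen(req, timeout=30).
```

Consult the documentation of the specific service for its real endpoint, parameters and response format before adapting this.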
Installation packages:

Evaluations

In 2011, the Metadata Research Center at the University of North Carolina's School of Information and Library Science performed two analyses of AgroTagger, both aimed at comparing automatic and manual indexing.

"An Evaluation of Automatically Assigned Subject Metadata using AgroTagger and HIVE"
An independent work by Jenni Clements.
The study compares the subject metadata assigned by two automatic indexing programs, AgroTagger and HIVE (Helping Interdisciplinary Vocabulary Engineering), to subject metadata assigned by information professionals. It aims to determine if these automatically generated terms compare favorably with the professionally assigned terms and if they can be considered good enough to use in metadata records.
 
A collective report by Jackie Chapman, Jesse Savage and Brian Young.
The project’s objective is to study the effectiveness of manually and automatically generated AGROVOC metadata terms by evaluating the relevance of terms based on user judgment. The pilot study seeks to establish a methodology for accomplishing the overall project objective, including the development of a survey instrument that can be used in future, larger-scale studies.
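Comparisons between automatically and manually assigned terms, such as the one in the first study above, are commonly summarized with set-overlap measures. The sketch below computes precision, recall and F1 for one document. This is a generic illustration, not the method used in either study. The function name `compare_terms` and the sample terms are hypothetical.

```python
def compare_terms(automatic, manual):
    """Compare two term collections, ignoring case and surrounding spaces."""
    auto = {t.strip().lower() for t in automatic}
    gold = {t.strip().lower() for t in manual}
    shared = auto & gold
    # Precision: share of automatic terms that the indexer also chose.
    precision = len(shared) / len(auto) if auto else 0.0
    # Recall: share of the indexer's terms that were found automatically.
    recall = len(shared) / len(gold) if gold else 0.0
    # F1: harmonic mean of the two, defined as 0 when nothing overlaps.
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return {
        "shared": sorted(shared),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up terms for a single document, for illustration only.
    automatic = ["maize", "irrigation", "soil", "Yield"]
    manual = ["Maize", "soil fertility", "irrigation"]
    print(compare_terms(automatic, manual))
```

Exact matching is strict: near-synonyms or broader and narrower thesaurus terms count as misses, which is one reason user relevance judgments, as in the second study, can complement overlap scores.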