Status of the AgroTagger service and its integration into Drupal and DSpace

Yesterday, 1st September 2010, our group at FAO attended a presentation by Rahul Samaddar (IIT Kanpur) on the status of the AgroTagger service (more info on AgroTagger here).

Other meetings with Rahul are planned for the next two days. This is just a short summary of the first day.

The AgroTagger service is almost ready as a truly interoperable web service: it will be a Java application exposing a URL that accepts either text or the URL of a PDF and returns the selected Agrovoc terms (initially as a text string, later as XML and perhaps JSON). For the moment, the team has implemented a sort of undocumented API that they call from PHP to push a document to their AgroTagger application server and have it indexed.
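
To make the call pattern described above concrete, here is a minimal Python sketch that builds a request URL for either raw text or a PDF URL and parses a plain-text reply. The endpoint address, the parameter names `text` and `url`, and the comma-separated reply format are all assumptions made for illustration; the real API was not documented at the time of writing.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real service URL and parameters were not published.
SERVICE_URL = "http://example.org/agrotagger/tag"


def build_request_url(text=None, pdf_url=None, base=SERVICE_URL):
    """Return a GET URL carrying either raw text or the URL of a PDF."""
    if (text is None) == (pdf_url is None):
        raise ValueError("pass exactly one of text or pdf_url")
    # 'text' and 'url' are assumed parameter names, not the documented API.
    params = {"text": text} if text is not None else {"url": pdf_url}
    return base + "?" + urlencode(params)


def parse_plain_terms(body):
    """Split an assumed comma-separated reply into a list of Agrovoc terms."""
    return [term.strip() for term in body.split(",") if term.strip()]
```

A caller would fetch the URL returned by `build_request_url` with any HTTP client and pass the response body to `parse_plain_terms`. Once the service returns XML or JSON, only the parsing function would need to change.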

The current web page giving access to the service is:

http://agropedialabs.iitk.ac.in/Tagger/

The AgroTagger Drupal module that has been developed is not a client module (one that would index Drupal nodes), as we expected, but an interface to the AgroTagger service itself: it creates a page on the Drupal website where a user can send text or a PDF to the AgroTagger and see the resulting terms. Although this differs from what we expected, it is apparently easy to make this Drupal page act as a URL that accepts parameters and returns XML. The possibility of implementing a RESTful AgroTagger web service on any Drupal installation is therefore of great interest to us, and over the next few days we will work on an implementation of the service on our server.

Once the web service is available, developing a client script in any Drupal installation will not be difficult, even without a module.
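
As a rough idea of how small such a client script could be, the sketch below maps a set of nodes to suggested terms. It is written in Python rather than PHP for brevity, and the XML layout (`<terms><term>...</term></terms>`) is purely an assumption, since the response format has not been defined yet. The HTTP call is passed in as a function (`fetch_xml`, a hypothetical name) so that the sketch makes no claim about the actual transport.

```python
import xml.etree.ElementTree as ET


def parse_terms_xml(xml_text):
    """Extract term labels from an assumed <terms><term>...</term></terms> reply."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("term")
            if el.text and el.text.strip()]


def tag_nodes(nodes, fetch_xml):
    """Map each node id to the terms suggested for its body text.

    nodes: dict of node id -> body text.
    fetch_xml: callable that sends the text to the tagging service and
               returns the XML reply as a string (supplied by the caller).
    """
    return {node_id: parse_terms_xml(fetch_xml(body))
            for node_id, body in nodes.items()}
```

In a Drupal installation the same loop would run over node bodies and store the returned terms in a taxonomy vocabulary; that storage step is left out here.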

Another interesting achievement of the IIT team in Kanpur is their collaboration with ICRISAT on a DSpace AgroTagger plugin for automatic indexing: the plugin was used in the ICRISAT repository's DSpace installation, where it successfully indexed 2000 documents. The remaining 1000 documents were scanned images rather than native PDFs, so some semantics were lost, but the team still managed some lower-quality indexing by running OCR software and converting the output to PDF. An impressive result.

They are now waiting until the plugin is "standardized" for any DSpace installation, and they will then release it to the DSpace community.