"The history of VocBench, what we have been doing during the last 15 months and what the future holds"

Armando Stellato has been collaborating with the AIMS team for many years now. He is currently leading the development of VocBench, a web-based, multilingual editing and workflow tool for managing thesauri, authority lists and glossaries that use SKOS-XL. VocBench 2.0 was recently released, a perfect occasion to ask him some questions. He kindly gave us an exhaustive overview of the history and current status of the thesaurus management tool and talked about the next steps.

1. Could you tell us something about the history of VocBench? How was it conceived?

VocBench (VB), for those who don’t know it from its first years of activity, was born inside FAO as the “Agrovoc Concept Server”. Developed in the context of the EU-funded project Neon, it was a web-based collaborative framework for the maintenance of Agrovoc. The system was highly customized around Agrovoc definitions and vocabulary. Backed by the Protégé 3.0 API (extended with the OWL Plugin) over a relational database, it had neither a strong binding with RDF (SPARQL, triple-oriented APIs) nor native support for SKOS-XL, the RDF vocabulary which, in the meantime, was being adopted for Agrovoc.

However, these limitations must be seen in the context of the years in which it was being developed. VB is in fact older than SKOS-XL, and its development started even before the SKOS core language was released as a W3C draft document. Indeed, it is also thanks to the many use cases provided by FAO, and by Agrovoc in particular, that SKOS and SKOS-XL were developed (through the work of Margherita Sini as a member of the W3C Semantic Web Deployment Group).

I also recall the first time Johannes Keizer of the AIMS team talked to me about contributing to the development of the original Agrovoc Concept Server (in 2009, when it was almost finished). They had some performance issues, and Agrovoc, with its 5 to 6 million triples, was considered a large-scale RDF dataset. Today, it can still be a challenge for visualization and maintenance tools that use reasoning or provide advanced UIs requiring many queries, but loading and managing its data in a reasonable time is certainly no longer an issue for modern triple stores.

To give it proper credit, rather than viewing it only in terms of its limitations: VB1.0 had a very powerful and friendly user interface, developed through continuous interaction with a concrete and wide user community. It offered well-structured support for collaboration, with different groups of users and a powerful mapping mechanism between groups and enabled operations, good UI customization possibilities, and an interesting publication workflow that follows the various stages of management, from resource creation through approval to publication.

In the following years, VB gathered the interest of other FAO departments interested in developing and maintaining their thesauri, and of other institutions, such as the European Environment Agency (GEMET thesaurus). The team thus faced the problem of making it possible to manage different vocabularies, and of making that move reasonably easy. After rapid patches to cover the first needs, it soon became clear that the project had to evolve into a general-purpose thesaurus development tool. The name “VocBench” was thus coined, and the idea of restructuring and revamping its architecture soon followed.

2. Could you tell us what are the new technical and functional improvements that come with VocBench?

In 2012, the “VocBench 2.0 objective” was defined: a new architecture, with the possibility of extensions and plugins, native RDF management, and explicit support for and compliance with SKOS-XL.

The project was rather ambitious: a notable amount of effort had already been spent on VB1.0 while still remaining inside the boundaries of Agrovoc and of the modeling needs of the FAO group. Completely supporting an existing standard, and moving to an open source project, would dramatically widen the set of things to keep under control.

At that time, I was already collaborating with the AIMS group on VB and on the publication of Agrovoc as a LOD dataset. In those same years, at the University of Rome Tor Vergata, where I am currently a faculty member and researcher, we developed another RDF management framework, called Semantic Turkey. Semantic Turkey (ST from now on) had a simple UI deployed as a Firefox extension, and was able to manage OWL ontologies and SKOS and SKOS-XL concept schemes. We never had the strong community support of VB, never invested much in improving its usability, and never had the same user feedback VB had, but the backing RDF framework we developed already had everything that was missing from VB. In a certain sense, we immediately saw Semantic Turkey and VB as exactly complementary systems: a potential marriage that would bring us more easily and quickly to the established objectives.

I thus suggested (not without some concern!) defining a first VB2.0 release as a mere port of VB1.2 (the latest VB version at that time), which would show nothing new to the user but would enable extensibility and more general support for SKOS-XL by completely replacing its RDF management layer with the one offered by Semantic Turkey. After a few weeks examining requirements, possibilities and things to do, the group accepted the proposal. The path was laid!

3. VocBench is a community product with various institutions involved in its development. Which are these institutions and what is their role?

Well, FAO first of all, obviously, which has supported the project from its very start and is still contributing largely to its current development.

Then the University of Rome Tor Vergata, which contributed the Semantic Turkey RDF framework and is pushing ahead the development of both platforms.

As for our specific adoption of VocBench in maintaining many FAO resources, we greatly thank MIMOS Berhad, the national R&D center of Malaysia, which is supporting the hosting of all LOD-related services, including VocBench, our SPARQL endpoints, Web Services and much more.

Oh, and last-minute news: the documentation office of the European Union, which has already shown great interest in the adoption of VocBench, has recently set up a development/deployment team for adopting VB to manage the Eurovoc thesaurus. Their involvement will range from producing extensions to actively collaborating on the development of the core framework.

VocBench is not only a tool for editing multilingual thesauri, but is multilingual itself! We thus thank the various partners all over the world for the contributions received for updating the sheets containing the localized entries for its UI.

4. Initially VocBench was designed to edit the AGROVOC thesaurus; over the years, however, it has become applicable to diverse (SKOS-XL) environments and varied user groups. Could you enumerate the known users of VocBench and how it is used?

We started adopting VocBench 2.0 two months ago, and have not worked much on disseminating this result, so it is perhaps early to say. Speaking concretely about other FAO departments and other organizations already adopting it, or very close to doing so, I may cite:

  • European Documentation Office, for the maintenance of Eurovoc
  • Italian Senate (Teseo Thesaurus)
  • European Environment Agency (GEMET thesaurus). They were real pioneers, as they undertook all the work required to customize the old VB1.x to work with GEMET
  • Harvard University: Unified Astronomy Thesaurus
  • NetAge (Netherlands), an IT company proposing it to their customers

Inside FAO, we maintain:

  • Agrovoc
  • Land and Water Thesaurus
  • Biotech Glossary

5. What are the expected developments regarding VocBench in the near future?

We are happy to have reached this first objective. I’m particularly happy about the “development cut” we found to balance the introduction of new required features against the need to release it in a reasonable time.

Now we have a second design/development phase which is no less intense than the first, and surely more complex. In order to facilitate the development of extensions, and to scale out to new possibilities, we are improving the adoption of OSGi inside Semantic Turkey, and we recently added Spring support, as we were interested in its Aspect-Oriented Programming and Dependency Injection features. I won’t go into the details of this: as you may guess, this is food for developers, but it will pay us back in the longer term.

I still think that, with a developer-friendly approach, VB can gather the interest of more contributors, and I see a future where totally new functionalities will be developed by interested third parties. It will be a non-trivial path, as VocBench is not a desktop tool, and we are studying the implications of any extension point in the context of a web system accessed by several users, addressing scalability in terms of both performance and usability.

In any case, end users are not forgotten! The foreseen improvements will be immediately exploited for a new Ontology Alignment framework which is already being developed in the context of the EU-funded project SemaGrow. This will feature functionalities for facilitating manual alignment of resources, but also the inclusion of automatic alignment systems, which can be automatically configured and fine-tuned on the basis of available LOD metadata (a novelty with respect to the current state of the art).

Ontology Alignment support in a collaborative editing framework is not a trivial feature to realize, as it differs considerably from alignment plugins for ordinary desktop tools. The latter usually work pretty well, until you need to load huge datasets to be mapped. In a centralized system serving many clients, policies should be defined for allowing only certain groups of users to set up a mapping environment, as well as for letting other users access these environments. For this reason, we are thinking about various “alignment experiences”, which will vary the balance between immediacy of use and strength of support from the system.