ONKI Project: providing accessibility to controlled vocabularies, an interview with Osma Suominen

The AIMS editorial team invited Mr. Osma Suominen of the National Library of Finland to share his experience of the ONKI project. This use case is presented below in question-and-answer form.

Q1. Kindly tell us about yourself and your work related to Finto.fi and the ONKI Ontology Browser.

I'm working as an information systems specialist in the ONKI Project at the National Library of Finland. Previously I worked at the Semantic Computing Research Group (SeCo) at Aalto University, where the ONKI ontology library system was developed over the last decade as part of the FinnONTO projects. The ONKI system was conceived as a research prototype following the Living Lab approach, and over the years many organizations have started to use it. They use ONKI to access vocabularies as services and have integrated ONKI services into their information systems and metadata production processes. However, its future has been uncertain, because it was run and financed only in an academic context.

Now, in the ONKI project, the National Library of Finland, together with the Finnish Ministry of Education and Culture and the Ministry of Finance, is building the Finto service as a spin-off: a production implementation of the ONKI idea. We have a team of around eight people building the Finto service and further developing the ontologies, including the General Finnish upper ontology YSO and many domain-specific ontologies.

Q2. Could you briefly explain the ONKI vocabulary service and browser: what model was used to browse independent ontologies, and what was its main purpose?

The idea is to host up-to-date versions of controlled vocabularies, including thesauri, classifications and lightweight ontologies. These are made available for the benefit of people creating metadata, for example catalogers in libraries, museums and archives. People searching for information in databases that use these vocabularies also benefit from easy access to them, through search facilities and various forms of browsing such as alphabetical and hierarchical displays.
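To make the two display modes concrete, here is a minimal Python sketch, assuming a toy vocabulary in which every concept has one preferred label and at most one broader concept. The concept ids and labels are hypothetical; real vocabularies such as YSO are much larger and may give a concept several broader concepts, and this is not ONKI's actual implementation.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: concept id -> (preferred label, broader concept id or None).
CONCEPTS = {
    "c1": ("animals", None),
    "c2": ("mammals", "c1"),
    "c3": ("birds", "c1"),
    "c4": ("cats", "c2"),
    "c5": ("dogs", "c2"),
}


def alphabetical_display(concepts):
    """Return preferred labels sorted case-insensitively, as in an A-Z index."""
    return sorted((label for label, _ in concepts.values()), key=str.casefold)


def hierarchical_display(concepts):
    """Return indented lines showing each concept beneath its broader concept."""
    children = defaultdict(list)
    roots = []
    for cid, (label, broader) in concepts.items():
        if broader is None:
            roots.append(cid)
        else:
            children[broader].append(cid)

    def by_label(cid):
        return concepts[cid][0].casefold()

    lines = []

    def walk(cid, depth):
        lines.append("  " * depth + concepts[cid][0])
        # Sort siblings by label so the tree is stable and easy to scan.
        for child in sorted(children[cid], key=by_label):
            walk(child, depth + 1)

    for root in sorted(roots, key=by_label):
        walk(root, 0)
    return lines


print(alphabetical_display(CONCEPTS))
print("\n".join(hierarchical_display(CONCEPTS)))
```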

Perhaps the more novel part is providing these same vocabularies as open data and Linked Data. All vocabulary data is accessible via open APIs. The idea is that other information systems can rely on our API instead of implementing their own vocabulary storage and access components. The APIs are also easy to use in web widgets, such as autocomplete fields for quickly picking suitable terms.
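As an illustration of how a client might use such an API for autocomplete, the following Python sketch builds a search request and turns a JSON response into (label, URI) suggestions. The base URL, the `/search` path, the `vocab`/`query`/`lang` parameters, the trailing `*` for prefix matching and the `{"results": [...]}` response shape are all assumptions for this example, not a documented description of the ONKI or Finto API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; check the real API documentation of the service you integrate with.
BASE_URL = "https://vocab.example.org/rest/v1"


def build_search_url(vocab, query, lang="en"):
    """Build a GET URL for a prefix search (the trailing '*' is an assumed wildcard)."""
    params = urlencode({"vocab": vocab, "query": query + "*", "lang": lang})
    return f"{BASE_URL}/search?{params}"


def parse_suggestions(response_text, limit=10):
    """Extract (label, uri) pairs from an assumed {"results": [{"prefLabel", "uri"}]} payload."""
    data = json.loads(response_text)
    return [(hit["prefLabel"], hit["uri"]) for hit in data.get("results", [])[:limit]]


def suggest(vocab, query, lang="en"):
    """Fetch suggestions over the network (not called in the offline example below)."""
    with urlopen(build_search_url(vocab, query, lang)) as resp:
        return parse_suggestions(resp.read().decode("utf-8"))


# Offline example with a canned response in the assumed format.
sample = '{"results": [{"prefLabel": "cats", "uri": "http://example.org/c4"}]}'
print(build_search_url("yso", "cat"))
print(parse_suggestions(sample))
```

Keeping URL building and response parsing separate from the network call makes the parsing easy to test without a live service.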

In general, the vocabularies we host are treated as independent datasets. Currently we have 25 vocabularies in public view, and more are on the way. The General Finnish upper ontology YSO is special, however, because it serves as the basis for a large number of domain-specific ontologies (e.g. TERO for healthcare, LIITO for business, MAO/TAO for museum collections), which extend YSO in various speciality areas while retaining its general concepts. The KOKO ontology aggregates all of these into a single large cross-domain view.
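The relationship between YSO, the domain ontologies that extend it and the aggregated KOKO view can be pictured with a deliberately simplified Python sketch. It assumes each ontology is just a mapping from a concept to its broader concepts; the concept names and the naive union-based merge are hypothetical, and the real KOKO construction, which must reconcile overlapping concepts, is likely more involved.

```python
# Hypothetical concept ids; the "yso:" and "tero:" prefixes only show which ontology
# a concept comes from. Each entry maps a concept to its set of broader concepts.
UPPER = {
    "yso:diseases": set(),
    "yso:infectious_diseases": {"yso:diseases"},
}

DOMAIN = {
    # A specialised concept placed under a general concept from the upper ontology.
    "tero:hospital_infections": {"yso:infectious_diseases"},
}


def aggregate(*ontologies):
    """Merge ontologies into one cross-domain view by uniting their broader links."""
    merged = {}
    for onto in ontologies:
        for concept, broader in onto.items():
            merged.setdefault(concept, set()).update(broader)
    return merged


def ancestors(ontology, concept):
    """Return every broader concept reachable from `concept` (transitive closure)."""
    seen, stack = set(), list(ontology.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology.get(parent, ()))
    return seen


combined = aggregate(UPPER, DOMAIN)
print(sorted(ancestors(combined, "tero:hospital_infections")))
```

Because the domain concept points to a YSO concept as its broader concept, a query on the merged view still reaches the general YSO concepts above it.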

All the FinnONTO ontologies are mainly intended for applications that require rich metadata, be it museum collections, large-scale websites or library databases. The usage model is similar to that of many large thesauri such as LCSH or AGROVOC, but YSO and the other FinnONTO ontologies have been built with a more rigorous hierarchical structure than most similar vocabularies.

Q3. Could you give us examples of organizations using your service, and of how the service is used in managing information?

Some examples:

  • Viikki Science Library uses ONKI services for describing books and academic articles, and also as aids in their retrieval.
  • EnterpriseFinland, a national portal for entrepreneurs, uses the LIITO ontology and ONKI services to describe content on their site, including text articles and service descriptions.
  • Finnish Broadcasting Company YLE uses ONKI services and the KOKO vocabulary for describing content on their Swedish-language website.
  • National Institute for Health and Welfare THL uses the TERO ontology, imported from ONKI, to describe web pages in their content management system and also to describe the specialities and skills of their staff.
  • Brages Pressarkiv, a newspaper archive, uses ONKI services to describe newspaper articles.
  • The National Research Data project, which aims to store datasets created as part of academic research and to build a research data catalog, uses ONKI services to help create descriptive metadata for the datasets.

Many of these organizations access ONKI services through a JavaScript web widget that provides vocabulary services with very lightweight integration.

Q4. The AIMS team (FAO) seeks to encourage agricultural institutions to use AGROVOC or other RDF/Linked Data ontologies for tagging and indexing their collections. What advice would you offer for this aspect of information management?

My advice is to make the vocabularies easily accessible both to regular users and to developers. Providing a good browser is important, but bear in mind that collection managers in your target institutions probably spend most of their time in their own information systems, so try to influence what they see there. Provide easy-to-use APIs so that the web developers who build those systems can integrate your vocabulary and make it part of the user interface. Providing RDF/Linked Data is good, but unfortunately web developers may have a hard time grasping all the concepts required to use it, so a simplified REST-style API that returns JSON (or JSON-LD) is more helpful to them.
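To illustrate the gap between raw RDF and what a web developer would rather receive, here is a Python sketch that flattens the SKOS statements about one concept into a small JSON object for a single language. The SKOS property URIs are the standard ones; the concept URIs, labels, the tuple-based triple representation and the output field names are assumptions made for this example. A production API would read from a triple store and could add a JSON-LD `@context` so the same output remains valid Linked Data.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical statements about one concept: (subject, predicate, object, language tag or None).
TRIPLES = [
    ("http://example.org/c4", SKOS + "prefLabel", "cats", "en"),
    ("http://example.org/c4", SKOS + "prefLabel", "kissat", "fi"),
    ("http://example.org/c4", SKOS + "altLabel", "felines", "en"),
    ("http://example.org/c4", SKOS + "broader", "http://example.org/c2", None),
]


def simplify(triples, uri, lang):
    """Flatten one concept's SKOS statements into a small JSON-friendly dict for one language."""
    out = {"uri": uri, "prefLabel": None, "altLabels": [], "broader": []}
    for subject, predicate, obj, tag in triples:
        if subject != uri:
            continue
        if predicate == SKOS + "prefLabel" and tag == lang:
            out["prefLabel"] = obj
        elif predicate == SKOS + "altLabel" and tag == lang:
            out["altLabels"].append(obj)
        elif predicate == SKOS + "broader":
            out["broader"].append(obj)
    return out


print(json.dumps(simplify(TRIPLES, "http://example.org/c4", "en"), indent=2))
```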

You already seem to provide an online browser for AGROVOC and a SOAP web service. That is good, but I think REST is more popular with developers nowadays. A simple REST API can be called directly from front-end tools such as jQuery, while SOAP always requires server-side integration and can be a bit clunky to set up.
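For a feel of what each style asks of client code, the following Python sketch builds the same hypothetical term lookup both ways: as a single REST GET URL, and as a SOAP 1.1 envelope with its accompanying headers. The host, the service namespace and the `searchTerm` operation are invented for illustration and do not describe the actual AGROVOC web service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://vocab.example.org/ws"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)


def rest_request(term):
    """A REST lookup is a single GET URL that browser code can fetch directly."""
    return "https://vocab.example.org/rest/v1/search?" + urlencode({"query": term})


def soap_request(term):
    """A SOAP lookup needs an XML envelope sent with a POST and extra headers."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # "searchTerm" is a hypothetical operation name, not a real service's API.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}searchTerm")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}term").text = term
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{SERVICE_NS}/searchTerm"',  # SOAP 1.1 quotes the action URI
    }
    return headers, ET.tostring(envelope, encoding="unicode")


print(rest_request("maize"))
headers, xml_body = soap_request("maize")
print(headers)
print(xml_body)
```

The REST version is a plain URL that front-end code can fetch as is, whereas the SOAP version needs an XML body and custom headers sent with a POST before any result comes back.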

It also helps to make your vocabulary data as open as possible. Creative Commons Attribution or even CC0 licensing would ensure the greatest possible propagation of your vocabulary.

The software we use to run the Finto service is called ONKI Light, and it is open source (MIT License, available on Google Code). You are welcome to use it as an alternative browser and API provider for AGROVOC and other SKOS vocabularies!