Presentation and talks with the Outreach Division of the Secretariat

04.09.2010

Yesterday I gave a presentation to the staff of the "Information Acquisition and Processing" cluster of the Outreach Division of the UN Secretariat. After the presentation I had talks with Shinichi Kushima, the cluster chief, and we agreed to set up a joint project. Our discussion can be summarized as follows:

Our purpose is to initiate a project that builds on the developments of both teams and aims to raise the quality of information management in our areas of responsibility. Initially the project will be started by FAO’s Office of Knowledge Exchange, Research and Extension, but it is open to encompass the entire United Nations network of the Secretariat and its specialized agencies. The title of the project will be “Linked Open Data from the United Nations for the World”. It will aim to construct an advanced infrastructure (without any centralization) that makes information from FAO and the UN Secretariat easily accessible to the world. The project will have the following elements:

a) Publishing existing metadata and vocabularies as linked open data as quickly as possible. Existing structured data (metadata from databases and vocabularies) can be published without major investment and with little delay. This is essential to make our data reference points in the linked data environment.

b) Development of plugins for search engines based on the existing agency thesauri. These plugins will make it possible to run multilingual assisted searches with common full-text search engines such as the Google Search Appliance, Lucene and similar search engines. With these plugins, developments already ongoing at the UN in New York and at FAO in Rome (customization of the Google Search Appliance and the Multilingual Semantic Search Assistant) will be brought to production level.

c) Introduction of a common maintenance environment for vocabularies (thesauri, authority files, categorization schemes). This common maintenance environment will be based on the conceptserver workbench already developed by FAO. The workbench will be enhanced into a platform that can also be used for UNBIS and, later, for all UN agency vocabularies.

d) Development of production-level machine indexing to substitute for human indexing of agency publications. This machine indexing service will use the thesauri of the organizations. It will be based on the algorithms and software that have been developed for AgroTagger. As a first step, this will be adapted to also serve as a UNBIS tagger. Methodologies will be developed to adapt the system to any agency thesaurus and document corpus.

e) Based on the automatic indexing system, an openWebservice will be established that gives anyone in the world the possibility to use the vocabularies of the UN and its agencies to index content. This service will work like the existing openCalais application, but the data will be freely available to everyone.

f) The project will develop a methodology and framework for semantic interoperability of information from the UN agencies. As a first step, AGROVOC and the UNBIS thesaurus will be mapped, and equivalence declarations will be inserted into the published linked data so that crosswalks between the different document corpora become possible.

g) Institution of openArchives. The initiative aims to establish open archives containing all public UN and agency information, for the Secretariat and for all agencies. At FAO this development is already well advanced. Lessons learned and experiences will be documented as material for the other institutions. An analogous openArchive will be established by the Outreach Division of the Secretariat.
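
To make the equivalence declarations mentioned under f) more concrete, here is a minimal sketch of how a mapping between two thesauri could be written out as linked data and reused as a crosswalk. It assumes the SKOS property skos:exactMatch is used for the equivalences, which the project text does not specify. The concept URIs and the function names are hypothetical placeholders, not real AGROVOC or UNBIS identifiers.

```python
# Sketch only: emits equivalence declarations as N-Triples and builds a
# simple lookup table for crosswalks between two vocabularies.

# The SKOS exactMatch property (assumed here as the equivalence relation).
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical concept URIs; real AGROVOC / UNBIS URIs would go here.
MAPPINGS = [
    ("http://example.org/agrovoc/concept-1", "http://example.org/unbis/concept-a"),
    ("http://example.org/agrovoc/concept-2", "http://example.org/unbis/concept-b"),
]


def to_ntriples(mappings, symmetric=True):
    """Serialize each mapped pair as one N-Triples statement per direction."""
    lines = []
    for left, right in mappings:
        lines.append(f"<{left}> <{SKOS_EXACT_MATCH}> <{right}> .")
        if symmetric:
            # Declare the reverse direction explicitly so that each
            # published dataset carries its own outgoing link.
            lines.append(f"<{right}> <{SKOS_EXACT_MATCH}> <{left}> .")
    return "\n".join(lines)


def build_crosswalk(mappings):
    """Return a dict that translates a concept URI in either direction."""
    table = {}
    for left, right in mappings:
        table[left] = right
        table[right] = left
    return table


if __name__ == "__main__":
    print(to_ntriples(MAPPINGS))
    crosswalk = build_crosswalk(MAPPINGS)
    print(crosswalk["http://example.org/unbis/concept-a"])
```

With such a table, a document indexed with a concept from one thesaurus can be found through the equivalent concept of the other, which is the crosswalk between corpora described above.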