At FAO, we are committed to helping combat and eradicate world hunger. Information dissemination is an important and necessary tool in furthering this cause – we need to provide consistent, usable access to information for the community of people doing this very work. To support this goal, we have proposed the design and development of an Agricultural Ontology Service (AOS) that will function as a reference tool, structuring and standardising agricultural terminology in multiple languages for use by any number of different systems.
We prepared and presented a concept note titled The Agricultural Ontology Service (AOS) – A Tool for Facilitating Access to Knowledge, which was disseminated at the First International Semantic Web Working Symposium at Stanford from July 30 to August 1, 2001. A follow-up workshop to discuss the various issues presented in the concept note was organized at FAO Headquarters in Rome on November 14-15, 2001. Experts in the areas of Knowledge Representation, Ontologies, Databases and the Semantic Web were invited to make presentations and discuss their views on the project.
We present the deliberations and discussions conducted at the workshop along thematic categories or dimensions. These dimensions are motivated by the need to implement a system that supports the requirements presented in the AOS concept note. The broad categories or dimensions identified are:
For each dimension or category, we will summarize the discussions at the workshop. These discussions included the state of the art and research, and tools/algorithms needed to support the various functions related to that category.
One of the most critical problems in the development of the AOS system will be the creation of new domain ontologies. Also important is the fact that these ontologies will keep evolving over time, and hence it is important to design and develop tools and processes for maintenance and versioning.
The existing tools that support ontology building are primarily data-model specific, depending on models such as the E-R, frame, and object-oriented models. A layered approach to ontology building that might be suitable in the context of the AOS is discussed in the next section. Some existing ontology building tools that were identified are: the InfoSleuth Ontology Editor (E-R model), the OKBC Editor (frame model), Protégé, OntoEdit (free and commercial), and Ontology Builder. Some free UML-based ontology tools are: I-logix, Uniting Software Design and Tigris. Also, lists of third-party, open-source tools based on the RDF and DAML standards can be downloaded from http://www.semanticweb.org and http://www.daml.org. Some issues related to the building and maintenance of various versions of ontologies that tools need to handle are:
A “layered” approach to description logics for building the AOS is proposed, providing several layers of complexity in the description logic used for building AOS ontologies. The number of layers and the extent of each are yet to be determined, but could follow these guidelines:
It seems clear that existing language standards for constructing ontologies should be used, rather than inventing new languages specifically for the AOS. The Core layer could generally follow the RDF standards, and the Intermediate layer could follow DAML+OIL. However, there is little agreement on the details or standards available for the middle and domain layers.
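A minimal sketch of how these layers might relate in practice (the concept names, properties and restriction fields below are illustrative, not drawn from any actual AOS data): the Core layer holds plain RDF-style triples, while the Intermediate layer adds DAML-style restrictions on top of the same concepts without altering the Core data.

```python
# Hypothetical sketch of the layered approach: Core = plain RDF-style
# (subject, predicate, object) triples; Intermediate = DAML-style
# restrictions layered over the same concepts.

core_layer = {
    ("Wheat", "rdf:type", "rdfs:Class"),
    ("Wheat", "rdfs:subClassOf", "Cereal"),
    ("Wheat", "rdfs:label", "wheat"),
}

intermediate_layer = {
    "Wheat": {"daml:disjointWith": ["Rice"],
              "daml:restriction": {"onProperty": "growsIn",
                                   "minCardinality": 1}},
}

def describe(concept):
    """Merge what each layer says about a concept; richer layers are optional,
    so applications needing only the Core layer can ignore the rest."""
    triples = {(p, o) for (s, p, o) in core_layer if s == concept}
    restrictions = intermediate_layer.get(concept, {})
    return {"triples": triples, "restrictions": restrictions}

d = describe("Wheat")
```

The point of the layering is visible in `describe`: a Core-only client still gets a complete (if shallow) answer, while richer clients can consume the restrictions as well.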
We envisage that a large number of Domain Experts, spread across many sponsoring organizations and multiple domains, will be involved in ontology building efforts. We discussed ontology building tools and a layered approach to building ontologies in the previous section. There are groupware-based products that support collaborative document creation and editing. WebOnto and Ontolingua are tools that support collaborative ontology design and construction.
It is important to move beyond conventional relationships found in thesauri (BT/NT/RT/UF/TT/SA/SN) to a more formal and richer set of language primitives for building descriptions of concepts in the ontology. This would enable a higher degree of semantic representation, leading to improved information retrieval performance. The term “description logic” applies to such languages, which are formally defined and which have well understood computational properties. For example the BT/NT distinction commonly used in thesauri is used loosely and ambiguously to represent both superclass/subclass as well as part/whole relationships. In a formal description logic, superclass/subclass and part/whole would be given distinct and unambiguous language elements.
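The distinction can be illustrated with a small sketch (terms hypothetical): where a thesaurus collapses both relationships into BT, a formal model keeps them apart, so a query over one relationship is no longer polluted by the other.

```python
# Sketch with invented data: a thesaurus marks both relations as BT,
# while a description-logic-style model keeps them distinct.

thesaurus = [
    ("wheat", "BT", "cereals"),   # really superclass/subclass
    ("wheel", "BT", "cart"),      # really part/whole -- same marker!
]

ontology = [
    ("wheat", "subClassOf", "cereals"),
    ("wheel", "partOf", "cart"),
]

def superclasses(ont, term):
    """Only genuine subclass links; part/whole no longer leaks in."""
    return [o for (s, r, o) in ont if s == term and r == "subClassOf"]

def parts_of(ont, whole):
    return [s for (s, r, o) in ont if o == whole and r == "partOf"]
```

With the thesaurus alone, asking "what is wheat a kind of?" and "what are the parts of a cart?" would both have to traverse the same ambiguous BT links.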
However, caution is advised, since most of the main user groups identified are currently unfamiliar with description logics (Section 2.7). Some, such as catalogers, are more comfortable using standard thesaurus relationships; for others, such as end-users, detailed representational issues must either be completely hidden or cleverly displayed through some high-level data visualization interface. Some of the tradeoffs involved in using advanced description logics are listed here. Richer expressiveness refers to allowing more language elements:
In modern description logics, different language elements can be switched on or off as needed in modeling a particular domain, keeping in mind the potential price to be paid in using a particular element. This leads to a compromise to resolve the above expressiveness/tractability tradeoff that would enable different user groups to utilize AOS to whatever level of complexity their application and situation requires.
More work is needed to define issues related to supporting multiple languages (English, Spanish, Italian, etc). Adding support for multiple languages makes the complex process of ontology building even more complicated. Even within a single language such as Spanish, there is no standard, and throughout the world multiple dialects are used.
Approaches discussed to supporting multiple languages include using an interlingua (having a language-neutral intermediate language for building concept descriptions which are mapped to different languages) and alternatively having core concepts defined in English and then translated to other languages. The interlingua approach is more flexible but more difficult to build. It is possible that not all concepts in AOS would be available in all languages (relaxing the requirement to translate everything), but that would result in a heterogeneous system that could not support global search in any language. Perhaps only a subset, such as core concepts would be available in all languages. A more thorough review of existing approaches is needed before final recommendations can be made.
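A rough sketch of the interlingua approach, with hypothetical concept identifiers: each concept carries a language-neutral id, labels attach per language, and a fallback (here English, matching the alternative approach above) covers concepts not yet translated.

```python
# Interlingua sketch: language-neutral concept ids with per-language labels.
# Concept ids and translations are invented for illustration.

labels = {
    "c_0042": {"en": "wheat", "es": "trigo", "fr": "blé"},
    "c_0043": {"en": "rice",  "es": "arroz"},   # French label not yet provided
}

def label(concept_id, lang, fallback="en"):
    """Return a label in the requested language, falling back to English
    when the translation does not exist yet."""
    entry = labels.get(concept_id, {})
    return entry.get(lang) or entry.get(fallback)

def lookup(term, lang):
    """Global search in any language: resolve a term to its concept ids."""
    return [cid for cid, ls in labels.items() if ls.get(lang) == term]
```

The fallback makes the heterogeneity visible rather than fatal: a French search misses "rice" (there is no French label to match), but a French display of the concept still shows something.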
Given the manually intensive and costly nature of developing ontologies, an attempt should be made to investigate the re-use of existing KOSs and domain ontologies in the design and creation of ontologies in the AOS system.
Modules are discrete sections of the AOS that are domain specific, for instance the forestry and fishery domains. There may be several KOSs for a particular module of the ontology, for instance the CABI thesaurus and AGROVOC, which are very similar. Alternatively, an existing KOS such as AGROVOC may include concepts from different domains, such as forestry and fishery. We can either decide on a module and then identify the relevant KOSs, or use an existing KOS to decide on a particular module. Some tools for constructing ontologies are accompanied by a methodology and process to develop an ontology. One example of such a process is the one used in developing the
One approach for reducing the time and effort needed to create ontologies is to design algorithms and techniques to enhance existing KOSs into an ontology. Studying the various KOSs will show us what is needed in terms of a representation/upper-level ontology (Section 2.1.3). The constructs identified in this layer can then be used to represent the domain-specific concepts in the AOS ontology. We need to work both top-down and bottom-up to determine these requirements. In order to assess the degree of effort, the sources can be categorized using:
If too much effort is needed to re-use a KOS, then it may be beneficial to re-use only a part of the KOS or to build its concepts into the AOS from scratch without re-use. There have been attempts to transform and enhance database schemas into domain ontologies; ERWin, for example, is a product that generates an E-R model after analyzing database schemas. The other KOS types that need to be dealt with are thesauri, vocabularies, glossaries, subject headings, classification lists, etc. However, there is no known software at this time that transforms and enhances these other KOS types into a domain ontology.
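Since no such software exists, a first pass at enhancing a thesaurus might mechanically lift its records into draft ontology relations and queue the ambiguous ones for expert review. The following sketch (records and relation names are invented) illustrates the idea:

```python
# Sketch: lift thesaurus records into draft ontology relations.
# BT links are kept but flagged, since BT may hide either subClassOf
# or partOf and therefore needs human review (see Section on relations).

thesaurus_records = [
    {"term": "wheat", "BT": ["cereals"], "RT": ["flour"]},
    {"term": "cereals", "NT": ["wheat", "rice"]},
]

def lift(records):
    draft, review = [], []
    for rec in records:
        term = rec["term"]
        for broader in rec.get("BT", []):
            draft.append((term, "subClassOf?", broader))  # '?': unresolved
            review.append((term, broader))
        for related in rec.get("RT", []):
            draft.append((term, "relatedTo", related))
    return draft, review

draft, review = lift(thesaurus_records)
```

The output is deliberately a draft: the review queue is where the manual (and costly) expert effort is concentrated, instead of spreading it over the whole KOS.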
In this section, we discuss requirements of tools and techniques required for the incorporation of pre-existing domain ontologies into the AOS ontology. Issues related to distributed ontology interoperation will be discussed in another section. Some of the identified requirements are:
Since the ontology is to be used to access data and documents over the web, tools and techniques that help define and create associations/mappings between ontological elements and the underlying data are very critical for the success of the project.
It would be desirable to identify information resources (i.e. guide book, journal article, web site, video segment, still image, etc.) in a consistent fashion. This will benefit users who may be searching for a particular media type, or who want to know what form particular information is in. Such a facility exists in the Dublin Core standard, where a resource attribute is associated with metadata; in RDF, any valid URI is considered to be a resource. However, it would be difficult to formulate such a list as a permanent part of the AOS, and in many cases the assignment of a resource type would be ambiguous. Rather than making resource types a formal part of the AOS, it is suggested that an ontology of resource types be created and used as desired by catalogers. Such an ontology of resources could be modified and maintained over time. Rigid definitions would be required in order to use such resource definitions unambiguously.
The first set of tools are those that associate database schemas with concepts and attributes in the ontology. Commercial tools with some of this functionality include the object-relational mapping tools available in the IDEs associated with the J2EE suite of software, which help specify a mapping between EJB component object models and the underlying relational database schemas. Other examples are the tools available with the InfoSleuth/EDEN system and KAON Reverse from the University of Karlsruhe.
There are a wide variety of tools that help annotate textual documents with concepts from an ontology, for example the IKA class of software and OntoMat. There is no known software for annotating images with ontological concepts. Tools that generate web sites based on ontologies, e.g., Semantic Miner, and tools that learn ontological annotations of textual documents, e.g., TextToOnto, are also likely to be useful in this context.
There is interest in supporting natural language processing (NLP) as part of AOS at some level. Applications of NLP would include query processing (users make requests for information by stating queries in the form of natural language expressions), machine translation (conversion of texts from one language to another), and information extraction (extracting facts and other information from on-line texts). NLP could help identify the context of a user’s information request, and automatically select appropriate word senses for the terms being used. NLP is expected to emerge as an important technology in the coming years, and NLP systems would require a terminology resource such as AOS as a fundamental component. Adding NLP support capabilities to AOS would require Advanced-level description logic support, and would require the most complex and extensive data structures. Currently there is little agreement or standardization of what form these data structures would take. In addition to associating part-of-speech information with lexical terms, NLP extensions to AOS would also include inclusion of grammatical patterns associated with lexical entries, and ways of mapping syntactic patterns to semantic representations.
Whenever a user specifies an information request based on the ontology (which in turn is associated with/mapped to a large number of text and relational databases), the information request has to be: (a) decomposed into component information requests that the individual data sources can understand; and (b) the component results composed in a manner that satisfies the constraints of the original information request. In the context of the AOS system, which is organized as a federation of ontologies, there is also a need for tools and techniques to support distributed ontologies and the interoperation across them.
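Steps (a) and (b) can be sketched as follows; the sources and their capabilities are hypothetical stand-ins for the text and relational databases mapped to the ontology.

```python
# Sketch of (a) decomposing a request into per-source queries and
# (b) composing the partial results. Sources and fields are invented.

source_capabilities = {
    "crops_db":   {"crop", "yield"},
    "weather_db": {"region", "rainfall"},
}

def decompose(requested_fields):
    """(a) Route each requested field to a source that can answer it."""
    plan = {}
    for field in requested_fields:
        for source, fields in source_capabilities.items():
            if field in fields:
                plan.setdefault(source, []).append(field)
                break
    return plan

def compose(partial_results):
    """(b) Merge per-source answers back into a single record."""
    merged = {}
    for result in partial_results:
        merged.update(result)
    return merged

plan = decompose(["crop", "rainfall"])
answer = compose([{"crop": "wheat"}, {"rainfall": 620}])
```

A real federated query processor must additionally handle joins, constraint pushdown and unavailable sources; the sketch only shows the routing/merging skeleton.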
The distributed ontology will result in multiple ontology servers. Each server would perform a core set of standard interface functions such as:
However, each server should also be able to use inter-ontological relationships that capture the overlap across ontologies and translate queries expressed using terms in one ontology into terms from another ontology. Algorithms have been proposed that perform query re-writing across ontologies, and also compute loss of information accrued due to these translations. Some research projects which have come up with partial solutions for these problems are the ONION system at Stanford and the OBSERVER System.
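A simplified sketch of such query re-writing, in the spirit of ONION and OBSERVER; the inter-ontology mappings and their fidelity scores are invented stand-ins for a real loss-of-information measure.

```python
# Sketch: inter-ontology mappings translate query terms, and each mapping
# carries an (assumed) fidelity score from which a rough loss of
# information can be reported. All terms and figures are hypothetical.

mappings = {  # source term -> (target term, fidelity in 0..1)
    "forestry:Tree":   ("agrovoc:WoodyPlant", 0.9),  # broader target: some loss
    "forestry:Lumber": ("agrovoc:Timber", 1.0),      # exact synonym: no loss
}

def rewrite(query_terms):
    """Translate terms into the target ontology and estimate the loss
    accrued by the translation (0.0 means a lossless rewrite)."""
    translated, fidelity = [], 1.0
    for term in query_terms:
        target, f = mappings.get(term, (None, 0.0))
        if target is not None:
            translated.append(target)
            fidelity *= f
    return translated, round(1.0 - fidelity, 2)

terms, loss = rewrite(["forestry:Tree", "forestry:Lumber"])
```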
The above problem has been extensively researched in the context of multi-database query processing for over a decade. Some examples of distributed query processing are the Carnot, Mermaid and Interbase systems. The InfoSleuth/
An interesting perspective on the distributed query processing problem is the “push” or notification/subscription problem. Various algorithms and techniques have been proposed for document filtering, and triggers have been implemented in relational databases. The InfoSleuth/EDEN system implements an ontology-based subscription/notification mechanism in which users identify relevant ontological concepts in their profiles, and the appropriate data (documents, information) is forwarded to them as soon as it becomes available in the system.
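The subscription/notification idea can be sketched as follows; the user profiles and concept ids are invented, and the real InfoSleuth/EDEN mechanism is considerably more elaborate.

```python
# Sketch of ontology-based "push": users register concept ids in a
# profile, and newly arriving documents annotated with matching concepts
# are forwarded to them.

subscriptions = {
    "alice": {"c_forestry", "c_soil"},
    "bob":   {"c_fishery"},
}

def notify(document):
    """Return the users whose profile overlaps the document's concepts."""
    concepts = set(document["concepts"])
    return sorted(user for user, wanted in subscriptions.items()
                  if wanted & concepts)

recipients = notify({"title": "Soil erosion report", "concepts": ["c_soil"]})
```

Because matching happens at the concept level rather than on keywords, a document annotated with a concept would reach subscribers regardless of the language or wording of its text.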
Techniques that use learning algorithms to automate the annotation of text and image documents, and support data mining and knowledge discovery, can be classified into the following broad classes of software: (a) decision-tree based learning algorithms, e.g., C4.5; (b) neural-network based approaches; (c) statistical clustering based approaches, such as K-means and hierarchical clustering, which in turn are based on vector-space indexing of text documents, e.g., Latent Semantic Indexing; and (d) data mining approaches such as association rule mining.
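As a small illustration of class (c), the sketch below builds term-count vectors, compares them by cosine similarity, and performs the nearest-centroid assignment step that K-means iterates; the vocabulary, centroids and documents are toy values.

```python
# Sketch of the vector-space side of statistical clustering: documents as
# term-count vectors, cosine similarity, and one nearest-centroid
# assignment step of the kind K-means repeats until convergence.
import math

def vectorize(text, vocab):
    words = text.lower().split()
    return [words.count(term) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["wheat", "soil", "fish"]
centroids = {"crops": [1, 1, 0], "fisheries": [0, 0, 1]}  # toy clusters

def assign(text):
    """Assign a document to the cluster with the most similar centroid."""
    v = vectorize(text, vocab)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

cluster = assign("wheat grows in fertile soil")
```

In a full pipeline the cluster labels would then be linked back to ontological concepts, turning the unsupervised grouping into candidate annotations for review.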
We envisage the AOS System as a highly distributed system that consists of ontologies, knowledge bases, KOSs, query processors, and relational and textual data repositories available to the system on the Internet. The underlying communication and component infrastructure used by these systems to communicate and coordinate with each other is critical for the success of this system.
Recently there has been a lot of work in the area of distributed computing technologies such as the J2EE and .NET component models. The essential feature of these component architectures is that important functions like database mapping and access, transaction and concurrency control, etc. are managed by “containers” which are implemented in conformance with industry-wide standards and specifications. The components in these architectures encapsulate data and knowledge repositories and functionality such as query processing. The container is responsible for communication and message passing using RMI or some other communication protocol.
Some interesting alternatives to the object and component based infrastructures are the new emerging standards of internet-based web services such as WSDL, UDDI and SOAP. The AOS System architecture can be visualised as a collection of services offered by various components of the system and communicating by passing messages based on the SOAP protocol. Alternatively agent infrastructures such as the InfoSleuth agent system and the FIPA agent shell can also be explored for this purpose. It is important to note the various standards for encoding messages in a variety of markup languages, such as:
In this section, we discuss the back end technology used to store and manage data under the various KOSs on the one hand, and ontologies and knowledge on the other. It may be noted that the same technology (DBMS) may be used to store and manage both ontologies and data in the AOS system.
A database management system is needed as a basis for physical implementation of AOS in order to facilitate rapid operations such as search and retrieval, and to utilize the benefits of commercial database management systems such as transaction management, security, and integrity control. Object database management systems (ODBMS) offer interesting advantages for storing ontologies, as the object structure and taxonomic nature of ODBMS parallels the organization of an ontology. Relational database management systems (RDBMS) are better established commercially, and more widely supported. Techniques exist for mapping object structures into relational databases. However, the question of what database architecture is best for AOS is an implementation detail. AOS needs to be defined in terms of standards, and the physical implementation of these standards can be left to the choice of the local implementation. Some well known DBMS products are: Oracle, Sybase, MySQL, SQL Server (relational) and ObjectStore, Versant, Poet (object oriented).
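One simple way of mapping an ontology's graph structure onto a relational store is a single triple table. The sketch below uses SQLite and invented concepts, and deliberately leaves indexing and scaling choices to the local implementation, in line with the standards-first position above.

```python
# Sketch: storing ontology triples in a relational DBMS (SQLite) and
# walking subClassOf links transitively with repeated lookups.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE triple (
    subject TEXT, predicate TEXT, object TEXT)""")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [("Wheat", "subClassOf", "Cereal"),
     ("Cereal", "subClassOf", "Plant")])

def superclasses(term):
    """Collect all ancestors by following subClassOf edges upward."""
    out, frontier = [], [term]
    while frontier:
        current = frontier.pop()
        rows = conn.execute(
            "SELECT object FROM triple WHERE subject=? AND predicate=?",
            (current, "subClassOf")).fetchall()
        for (parent,) in rows:
            out.append(parent)
            frontier.append(parent)
    return out

ancestry = superclasses("Wheat")
```

The repeated-lookup traversal is exactly where an ODBMS, with its native taxonomic structure, would have an edge over this relational mapping; either way the choice stays behind the standard interface.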
Knowledge Management Systems consist of standard knowledge base technology as well as systems used to support emerging web standards such as RDF(S) and DAML+OIL. They also include systems designed to manage knowledge present in thesauri and vocabularies. Examples of systems of the first kind are KL-ONE systems such as LOOM, CLASSIC, BACK, etc. and RDF-based systems such as the ICS-FORTH system, SESAME, RDFDB, etc. Examples of systems that support the management of thesauri are LEXICON, MultiTes and Knowledge Map.
These systems are geared towards storage and management of content available on the web such as textual data, images and web pages. Well known CMSs available today are Verity, Documentum, Isis, Basis, etc. There are systems that support template and data driven websites such as those using web data stored in a database system and served using presentation templates such as JSPs and ASPs.
The User Interface of the AOS System is crucial, in the sense that it will determine the success or failure of the system. We try to identify the various user groups that will use the AOS and also explore technologies for the visualisation of knowledge, queries and results in the system.
The following categories of users were identified at the workshop:
Existing visualization tools can be used to browse and edit the ontology using 2D graphics (and possibly 3D/VRML-based interfaces) or graphical user interfaces based on conventional components such as outliners. Much more work is needed to develop appropriate interfaces that will assist end-users in navigating the ontology. Most users will be unfamiliar with the concept of browsing ontologies, and much of the detail of the ontology would need to be hidden (especially with intermediate and advanced description logics). Alternatively, visualization tools can be employed to enable users to interactively identify the portions of the ontology they are interested in and incrementally refine the query based on the results retrieved. If a large number of results are returned, innovative visualization techniques can be employed to present a categorized view of results based on the ontological concepts under which they can be classified.
Security considerations will depend on the needs of the organizations participating in the Agricultural Ontology Service project. Though the importance of this issue was recognized, it was not discussed in detail at the workshop.
The following priority list was generated from the breakout session:
The funding and organizational issues related to the project are discussed next.
It was decided that the project should stay within the focus that is described in the project proposal. This includes the agricultural domain but also other food-related domains like fishery and forestry. The AOS should aim to provide a global reference terminology for the agricultural domain and propose a commonly agreed conceptualisation.
This proposed common agreement requires that multiple parties working in the domain be able to join the project consortium. Partners can bring in their own KOSs. For example, CABI Publishing could participate in the project and bring in its thesaurus. The AOS is a collaborative effort with FAO acting as the focal point and secretariat for the project. The following roles were suggested as a possible participation scheme:
The project proposal should clearly identify possible participants and stakeholders in the domain. FAO could take the role of a facilitator or integrator. The project itself is a distributed effort. The intellectual property questions involved could not be fully clarified. Participants can benefit from the project through efficiency gains from collaboration, as they can draw on the shared results. Credibility is also enhanced by being backed by FAO. The project should be open to all participants that have proper authorization.
We need to determine whether a KOS can be used or not. There are several restrictions due to licensing, copyright, scope, partnership, etc. For instance if a given KOS is going to be used to build the
The project proposal should be refined and extended in the following year to provide a version that can be passed on to senior management and serve as a project proposal to the EU and other governmental bodies that provide project funding. The project itself should start at the end of 2002 and might take 3-4 years. This proposal should clarify the advantages of the project, address all project dimensions, identify business opportunities, and address the main stakeholders and their individual intellectual property rights. Participants should clearly understand from this document what goes into and what comes out of this project. The refined proposal should be finished in the 3rd quarter of 2002.
Project requirements will be laid out further by a revised version of the project proposal. This proposal should take a user-centric perspective considering user requirements and business objectives. Benefits should be made clear for each category of users:
User requirements should be considered for each category of users. Additionally, it should be continually assessed whether or not the proposal is still consistent with its mission. This reflection and further clarification of the project and user requirements could be achieved by implementing a prototype. This prototype could also be used to assess a cost/benefit ratio, and could consist of multiple targeted case studies. The technical aspects of the project should consider knowledge maintenance issues and address possibly evolving standards.
Initial funding exists to proceed. The project itself could be funded through the following funding categories:
The funding strategy should consider all costs of the project, e.g. personnel, expert assistants, administrative aspects etc.
This workshop was initiated by FAO with the expectation to establish collaborations and set up a team of experts for a project proposal. It demonstrated both interest in the