Crop Ontology: harmonizing semantics for phenotyping and agronomy data

The Crop Ontology project is the creation of the Generation Challenge Programme GCP of CGIAR,  which understood from its inception the importance of controlled vocabularies and ontologies for the digital annotation of data. The Crop Ontology was presented at the IGAD Pre-plenary meeting before the RDA 7th Plenary, by Bioversity International scientist and Crop Ontology project leader, Elizabeth Arnaud, on 28 February, 2016, at the University of Tokyo (Japan).


To improve crops and adapt them to changing environments, researchers need access to a broader pool of information on plant genetic diversity, which not only explains the interaction between a genotype and its environment, but also identifies its genetic basis and heritability of its adaptive traits. In order to facilitate the data integration and sharing for such data in multi-crop platforms, controlled vocabularies and ontologies should be used to annotate genetic and phenotypic data.

 

In this view, eight CGIAR centers and their national partners - under the leadership of Bioversity international - developed the Crop Ontology (CO).

In particular, the CO project is the creation of the Generation Challenge Programme (GCP), which understood - from its start - the importance of controlled vocabularies and ontologies for the digital annotation of data.

The CO - otherwise known also as a vocabulary for crop-related concepts - is a service of the Integrated Breeding Platform  (IBP) as well as a provider of controlled trait description for the Breeding Management System (BMS).

The IBP is a web-based solution (with CHADO phenotypic databases) for crop breeders, where registered users (at the forefront, breeders, geneticists) can access purpose built tools to manage their plant breeding programmes, obtain support and professional services, find new knowledge, access training resources and discuss pertinent issues with their peers in various communities of practice. Researchers can also submit/deposit trait names using the curation/annotation tool ‘add an ontology’ to increment the tool’s capacity.

The BMS is the centre-piece of the IBP and represents a suite of interconnected breeding software tools and related databases specifically designed to help breeders with project planning, data management, statistical analysis and decision-making in their integrated plant breeding programmes. 

The CO houses standard (harmonized and validated) lists of crop trait names, measurement methods and scales for breeders’ field books and crop information systems as well as variables for currently 20 crops, namely: cassava, banana, barley, chickpea, common bean, cowpea, groundnut, lentil, maize, oat, pearl millet, pigeon pea, potato, rice, sorghum, soybean, sweet potato, vitis,  wheat and yam. As an open-source tool, CO  provides scientists and breeders with a common language that describes crop phenotypes and interprets descriptions provided by farmers for the performance of the varieties they prefer.

In ontologies, all terms bear a particular, logically defined relationship to each other, allowing computational reasoning on data annotated with a structured vocabulary. The volume of agriculture-related information and terminology related to phenotype, breeding, germplasm, pedigree, traits, among others, is increasing exponentially. The concepts of the CO are being used to curate agronomic databases and describe the data.

The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases is important in comparative phenotypic and genotypic studies across species and gene-discovery experiments as it provides harmonized description of the data and therefore facilitates the retrieval of information.

As communicated at the IGAD Pre-plenary meeting (28 February, 2016), in the near future the CO will be completed by an Agronomy Ontology (AgrO) that currently compiles 350 variables selected out of the set of the variables produced by the International Consortium for Agricultural Systems Applications (ICASA).

All of the CO traits are used as the basis for the creation of the breeders’ fieldbooks of the BMS. The NextGeneration Breeding Databases developed by Boyce Thompson Institute for banana,  cassava, potato, sweet potato also embed the CO traits. Last but not least, the CO is currently being used for data annotation by the International Cassava Database, Wageningen University and Research Centre, and the CGIAR Research Program on Climate Change, Agriculture and Food Security’s repository of evaluation trials.

At the present time the CO contributes to the content of the reference ontologies of the NSF-awarded project Planteome (Planteome site beta version 1.0) to contribute improving the reference ontologies for plants by mapping the crop traits of CO to reference ontologies. Additionally, the CO is a partner of the pilot project Agroportal (released a beta version 1.0) developed by LIRRM, IRD and Bioversity.

The Crop Lead Centers and breeders from different regions produced Eighteen Breeder Trait Dictionaries (TD), in order to facilitate access to the data held within and/or across the databases. Dictionaries for breeders' fieldbooks and a CO to facilitate the harmonization of the data capture and powerful manipulations of the data through ontology-driven queries. This is a development that raised interest in CGIAR Centres and other communities, like the Gramene team developing the Plant Trait Ontology, ecologists and semantic web developers holding vast quantities of agriculture-related data.

A new Trait Dictionary template v5 (TDv5) was released in 2015 that now includes the ‘standard variable’ composed by a property, a method and a scale and needed to accurately annotate the measurements stored in the databases and a support the creation of standard electronic fieldbooks. This template is currently tested by INRA to describe the Wheat traits of Ephesis database.

At the technical level, the CO adopted the RDF language - a standard model for data interchange on the Semantic Web - facilitating data merging even if the underlying schemas differ. With this addition, the CO is capable of storing various formats of controlled vocabularies ranging from semantically rich ontologies to simpler tabular based structures such as the Comma-Separated Values (CSV). The Ontology curators are able to upload a full ontology in OBO format, create it online, add attribute information, and submit or delete terms from the CO.

In order to make reusable CO concepts by third party database or web site, an API has been developed (e.g. Agtrials, Phenotyping platforms, etc). 

(Source: An Online GCP CO for Annotating Trait Data Useful for Plant Breeders).

The CO site is hosted on Google App Engine and the versioned code is hosted on GitHub.

The Crop Ontology Curation Tool is still under development, so your feedback will help to improve it. Ensuing from the need to characterize crops using farmers’ opinions, as in participatory plant breeding or pre-breeding programs, the CO is developing further by looking at how to integrate citizen science.

Guidelines for the CO requirementsare available at the Crop Ontology wiki.

9-13 May,  2016 (Montpellier France): Crop Ontology Community – in preparation! Workshop on Harmonization, Semantic and Integration of Phenotypic and Agronomic Data.


Sources:

Crop Ontology Community Web Site

Crop Ontology

Bridging the phenotypic and genetic data useful for integrated breeding through a data annotation using the Crop Ontology developed by the crop communities of practice (article, 2012)

Multifunctional crop trait ontology for breeders' data: field book, annotation, data discovery and semantic enrichment of the literature (article, 2012)