GACS structural survey and hierarchy scenarios

The Global Agricultural Concept Scheme (GACS), version Beta 3.1, was released in May 2016. One of the known weaknesses of GACS in its current form is its concept hierarchy (BT/NT relationships). The hierarchy was formed by automatically merging parts of the concept hierarchies of the three source thesauri (AGROVOC, CAB Thesaurus and NAL Thesaurus), followed by only a small amount of manual cleanup. The result is a somewhat inconsistent hierarchy with over 600 top-level concepts, a relatively large amount of polyhierarchy, and a mix of different principles of hierarchical organization.

Cleaning up the hierarchy is an obvious task for the next development phase of GACS. In order to start that work, the community around GACS needs to establish goals and principles for the hierarchical organization. The GACS working group will then implement them using a combination of manual and automated methods.

In order to define the principles, we have created three alternative scenarios for the GACS hierarchy, representing different styles of hierarchical organization commonly used in thesauri or other concept schemes. By participating in the GACS Structural Survey, you can express your views about which of these scenarios would be most desirable for you as well as comment on other aspects of GACS. The survey is open until 20 November 2016.

Hierarchy scenarios

To illustrate the different hierarchy scenarios, a diverse selection of 30 GACS concepts was first chosen. The selection includes many types of concepts, but neither geographic locations nor taxonomic organisms (except for mammals), because there are conventional ways of arranging these kinds of concepts in a hierarchy: geographic locations by region or type, and organisms by taxonomy. Many, but not all, of the selected concepts are among the most used concepts in the main bibliographic databases maintained by the original GACS partners (FAO/AGRIS, CABI/CAB Abstracts, NAL/AGRICOLA) and are thus very important in the domain of agriculture.

The chosen concepts are, in alphabetical order:

animal diseases
cattle
chemical composition
fertilizers
fish culture
flavour enhancers
forestry
forests
genes
genetic variation
growth
hunting
laboratories
lignite
mammals
mathematical models
mortality
nitrogen
parasites
pastures
pathogens
pH
plant breeding
plant physiology
soil
soil science
styrene
temperature
wheat
zoologists

Each of the hierarchy diagrams below illustrates the location of these 30 concepts in a particular type of hierarchy. The selected concepts are shown in yellow, while the other concepts that form the hierarchy are light blue. Top concepts are in the leftmost column, except in scenario C, where they are in the first light blue column.

Note: The diagrams are also attached to this post as PDF files.

Scenario A: Focused top level hierarchy

In this scenario, the top levels of the hierarchy divide the concepts into high-level categories based on generic IS-A relationships. The top level is very focused, consisting of only three top concepts that are not intended for indexing.

GACS hierarchy A

Scenario B: Type-based top level hierarchy

In this scenario, the top level consists of a small number (10-20) of concepts that function as types (or facets) for the underlying concepts. As in scenario A, the top level is not used for indexing and lower hierarchy levels are organized based on generic IS-A relationships.

GACS hierarchy B

Scenario C: Shallow hierarchy of thematically classified concepts

In this scenario, the actual concept hierarchy (of BT/NT relationships) is intentionally shallow and consequently has a very large number of top-level concepts. These top-level concepts would in turn be classified into thematic groups (shown in dark blue), which are separate from the concepts and not intended for indexing documents.

GACS hierarchy C

References

All three hierarchies are based on existing examples of hierarchical organization found in various thesauri and other concept schemes, applied to GACS concepts.

Scenario A is based on the General Finnish Ontology YSO, whose top-level hierarchy is in turn inspired by DOLCE, a linguistic/cognitive upper ontology. Another thesaurus based on the DOLCE upper ontology is the EARTh Environmental Thesaurus.

Scenario B is heavily based on AGROVOC, which has a type-based (or faceted) upper level structure currently consisting of 25 top concepts.

Scenario C is a classified thesaurus structure that most resembles the CAB Classified Thesaurus (last published in 1999). The thematic groups/classes used for organizing the concepts are the ones that currently exist in GACS and originate in the UAT project. The hierarchical relations between concepts were manually pared down to a minimum by retaining only those hierarchical relations that exist in all three source thesauri. The result also resembles the NAL Thesaurus, although that thesaurus has only a single level of 17 thematic classes. It is similar as well to the UNESCO Thesaurus and the STW Thesaurus for Economics, both of which include a hierarchical thematic classification of concepts.
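The pruning rule used for scenario C (keep a hierarchical relation only if all three source thesauri have it) is essentially a set intersection. The Python sketch below assumes that each source hierarchy has already been mapped onto shared GACS concepts and is represented as a set of (narrower, broader) pairs. The function names and the sample relations are hypothetical and invented purely for illustration; they are not taken from the actual thesauri. It also shows why the resulting hierarchy is shallow: any concept that loses all of its broader relations becomes a top-level concept.

```python
# A (narrower, broader) pair, i.e. one BT/NT relation between two concepts.
Relation = tuple[str, str]


def common_relations(*hierarchies: set[Relation]) -> set[Relation]:
    """Keep only the BT/NT relations present in every source hierarchy."""
    if not hierarchies:
        return set()
    return set.intersection(*hierarchies)


def top_concepts(concepts: set[str], relations: set[Relation]) -> set[str]:
    """Concepts without any broader concept end up at the top level."""
    has_broader = {narrower for narrower, _ in relations}
    return concepts - has_broader


# Hypothetical, invented relations (NOT the real thesaurus contents),
# already mapped to shared concept labels.
agrovoc = {("cattle", "mammals"), ("wheat", "cereals"), ("fish culture", "aquaculture")}
cabt = {("cattle", "mammals"), ("wheat", "cereals"), ("fish culture", "animal husbandry")}
nalt = {("cattle", "mammals"), ("wheat", "grain crops"), ("fish culture", "aquaculture")}

common = common_relations(agrovoc, cabt, nalt)
concepts = {"cattle", "mammals", "wheat", "fish culture"}
top = top_concepts(concepts, common)

print(sorted(common))  # [('cattle', 'mammals')]
print(sorted(top))     # ['fish culture', 'mammals', 'wheat']
```

Only the relation shared by all three (invented) sources survives, so "wheat" and "fish culture" become top-level concepts. In a real scheme these would then be attached to thematic groups rather than to broader concepts.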