The Research Data Alliance (RDA) recently held a hands-on webinar on aggregating and mapping metadata.
The purpose of this webinar was to familiarise participants with aggregation, a process found in most science and humanities projects that involve some form of data gathering or metadata capture. Metadata aggregation allows content to be shared across distributed collections, supports one-stop searching, benefits users and increases the exposure of collections.
Most metadata aggregation models allow metadata to be pulled from many sources into a single location. The presentation was given by Mr. Dimitris Gavrilis of the Digital Curation Unit – IMIS, Athena Research Center.
The webinar offered an accessible introduction to the concept of aggregating and mapping metadata. Mr. Gavrilis's working definition was that metadata aggregation is a process in which metadata records from different sources and in different formats are aggregated into one common format. He then introduced the following key concepts: digital repository, digital record, metadata record, metadata schema, XML and RDF.
Regarding the metadata schema, Mr. Gavrilis explained through examples how to convert from one schema to another. Using the example of MORe (Metadata & Object Repository), he demonstrated the process of harvesting, transformation and enrichment at the object level.
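To make the idea of schema conversion concrete, the following is a minimal sketch of a field-level crosswalk in Python. It is not taken from the webinar or from MORe: the schema names (`dc`, `marc_like`), the field names and the helper functions are all hypothetical, chosen only to show how records in different formats could be mapped into one common format.

```python
# Hypothetical crosswalks: source field name -> field name in the common schema.
# The schema and field names are illustrative assumptions, not MORe's mappings.
CROSSWALKS = {
    "dc": {"title": "title", "creator": "author", "date": "year"},
    "marc_like": {"245a": "title", "100a": "author", "260c": "year"},
}


def map_record(record: dict, source_schema: str) -> dict:
    """Translate one source record into the common schema."""
    crosswalk = CROSSWALKS[source_schema]
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            # Keep fields that have no mapping instead of silently dropping them.
            unmapped[field] = value
    mapped["_source_schema"] = source_schema
    mapped["_unmapped"] = unmapped
    return mapped


def aggregate(sources) -> list:
    """Map every record of every (schema_name, records) pair into one list."""
    return [
        map_record(record, schema)
        for schema, records in sources
        for record in records
    ]
```

Calling `aggregate([("dc", dc_records), ("marc_like", marc_records)])` would return a single list of records that share the same field names. Keeping the unmapped fields alongside each record is a design choice made here so that the original information remains traceable after mapping.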
In metadata aggregation and mapping, the following data quality issues are common: missing information, loss of information due to the mapping process, and conceptually wrong mappings. Common quality metrics used to assess records include completeness, data accuracy, consistency, appropriateness and auditability (the ability to trace a record back to its original form). The most common quality intervention is metadata enrichment, which can be done through vocabulary matching and spatial tools.
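Two of these ideas, the completeness metric and enrichment through vocabulary matching, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the method shown in the webinar: the list of required fields, the tiny vocabulary and its `example.org` URIs are all hypothetical.

```python
# Hypothetical set of fields a "complete" record is expected to carry.
REQUIRED_FIELDS = ("title", "author", "year", "subject")

# Hypothetical controlled vocabulary: lower-cased label -> concept URI.
VOCABULARY = {
    "maize": "http://example.org/concept/maize",
    "soil": "http://example.org/concept/soil",
}


def completeness(record: dict) -> float:
    """Share of required fields that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)


def enrich(record: dict) -> dict:
    """Return a copy of the record with concept URIs for matching subjects."""
    enriched = dict(record)
    labels = [s.strip().lower() for s in record.get("subject", [])]
    # Exact, case-insensitive label matching; subjects without a match are skipped.
    enriched["subject_uris"] = [VOCABULARY[l] for l in labels if l in VOCABULARY]
    return enriched
```

A record with only a title and an author would score 0.5 under this definition of completeness. Real vocabulary matching is usually fuzzier than the exact lookup used here, which is kept deliberately simple.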
The presentation can be viewed here:
AGRIS use case
In the agricultural domain, the International System for Agricultural Science and Technology (AGRIS) is an example of metadata aggregation and mapping. AGRIS is supported by a community of data providers, partners and users. AGRIS ingests bibliographic metadata provided by the community and publishes it as open data; the metadata is captured either by
- pulling data from clients through harvesting, or
- having clients push data to AGRIS.
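The difference between the two capture modes can be sketched as follows. This is a simplified illustration, not AGRIS's actual implementation: the `Aggregator` class, its method names and the provenance field are hypothetical, and a real pull would typically go through a harvesting protocol rather than a plain Python callable.

```python
from typing import Callable, Iterable, List


class Aggregator:
    """Toy aggregator that accepts records by pull (harvest) or by push."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def _ingest(self, record: dict, provider: str, mode: str) -> None:
        # Store a copy and note where the record came from and how it arrived.
        stored = dict(record)
        stored["_provenance"] = {"provider": provider, "mode": mode}
        self.records.append(stored)

    def pull(self, fetch: Callable[[], Iterable[dict]], provider: str) -> int:
        """Aggregator-initiated: call the provider's fetch function and ingest the results."""
        count = 0
        for record in fetch():
            self._ingest(record, provider, "pull")
            count += 1
        return count

    def push(self, records: Iterable[dict], provider: str) -> int:
        """Provider-initiated: the provider hands its records to the aggregator."""
        count = 0
        for record in records:
            self._ingest(record, provider, "push")
            count += 1
        return count
```

In both cases the records end up in the same store; what differs is which side starts the transfer. Recording the provider and the mode with each record is an assumption made here to keep ingested records traceable.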
AGRIS uses various tools and technologies to consume metadata from content providers and accepts any metadata records that meet the Meaningful Bibliographic Metadata (M2B) standards. AGRIS’s data providers are international, and its users are often at varying stages of technological development.
The figure below summarizes the AGRIS data workflow, ingestion and processing.
The process by which AGRIS aggregates metadata was well explained here, with the data consumption workflows and data validation also covered. Bibliographic records are often static and usually devoid of sufficient information to answer a user’s query.
In 2011, AGRIS moved to RDF and Linked Open Data, which has made it possible to enrich AGRIS records by taking advantage of alignments between AGROVOC and other knowledge organization systems. To date, the AGRIS mashup shows a richly enhanced visualisation of core metadata linked with related content on the web.
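As a rough illustration of what expressing a bibliographic record in RDF involves, the sketch below serialises a title and a set of subject links as N-Triples lines using only the Python standard library. The record and concept URIs are hypothetical `example.org` placeholders, and the choice of the Dublin Core `title` and `subject` properties is an assumption made for this example, not a description of the AGRIS data model.

```python
# Dublin Core terms namespace; using it here is an assumption for illustration.
DCTERMS = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(record_uri: str, title: str, subject_uris: list) -> list:
    """Return one N-Triples line for the title and one per linked subject concept."""
    lines = [f"<{record_uri}> <{DCTERMS}title> {literal(title)} ."]
    for uri in subject_uris:
        # Subjects are links to concept URIs rather than plain strings,
        # which is what lets a record connect to related content elsewhere.
        lines.append(f"<{record_uri}> <{DCTERMS}subject> <{uri}> .")
    return lines
```

Because the subject is a URI rather than a free-text keyword, anything else on the web that refers to the same concept URI can be linked to the record, which is the basis for the kind of enrichment described above.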
- Research Data Alliance Webinar page
- Celli F, Malapela T, Wegner K et al. 2015 [version 1; referees: 2 approved] F1000Research 2015, 4:110 (doi: 10.12688/f1000research.6354.1)
- Celli F, Jaques Y, Anibaldi S, et al.: Pushing, Pulling, Harvesting, Linking: Rethinking bibliographic workflows for the semantic web. EFITA-WCCA-CIGR Conference, Turin, Italy, 24–27 June 2013. 2013
- Anibaldi S, Jaques Y, Celli F, et al.: Migrating bibliographic datasets to the Semantic Web: The AGRIS case. Semantic Web. 2015; 6(2): 113–120