In October 2016, the Wellcome Trust published a report, Interoperability Standards - Digital Objects in Their Own Right, on a topic relevant to the new CODATA effort on Coordinating Data Standards among Scientific Union Working Group. The report focuses on the wealth of content standards available in the life and biomedical sciences that help ensure digital research outputs are FAIR.
Standards are agreed-upon conventions for doing something, e.g. managing a process or delivering a service, and are established by community consensus or an authority.
Interoperability standards enable the operational processes underlying the exchange and sharing of information between different systems, to ensure all digital research outputs are Findable, Accessible, Interoperable and Reusable, according to the FAIR principles. A number of interoperability standards focus on the descriptions/metadata of digital objects (DOs), which can refer to datasets, code, algorithms, workflows, models, software, or journal articles.
Within interoperability standards there are content standards (e.g., Reporting guidelines/Checklists, Models/Formats/Syntaxes, Terminology artefacts/Semantics), which open DOs to transparent interpretation, verification and exchange. With a specific emphasis on enhancing the ability of machines to automatically find and use DOs, in addition to supporting their reuse by individuals throughout the DO life cycle, these standards are important to ensure all DOs are FAIR.
After a content standard is mature, it should be channeled to the appropriate stakeholder community, which should recommend the standard (in Data Policies/Data Management Plans) or use it to facilitate a high-quality data life cycle.
The report Interoperability Standards - Digital Objects in Their Own Right, released by the Wellcome Trust in October 2016, introduces the landscape of content standards in the life and biomedical sciences. It also highlights a number of issues that should be addressed to realize a vision of integrable content standards becoming part of the research data management enterprise of the future.
Above all, the report makes two recommendations:
- Recognize interoperability standards as digital assets/DOs in their own right, with their life cycle, associated research, development and educational activities;
- Research new or apply existing methods to develop, extend, refine and harmonize interoperability standards (and also related tools and educational materials) on specific domains, within and across disciplines.
Optimal interoperability is achieved when access to and use of data and other DOs are completely automated and available to both humans and machines. This requires standardized:
- identifiers (see, e.g., FORCE11 Resource Identification Initiative, Permid.org, Schema.datacite.org) and
- descriptions/metadata (including the accessibility level of the information and/or license type) for each DO.
These unique, resolvable and versionable identifiers and well-structured, standardized metadata would then need to be widely used and shared, i.e. implemented by the array of registries, catalogues, databases and services needed to find, store, manage and aggregate these DOs. This means that the long lifetime and sustainability of content standards are best supported by their wide acceptance and adoption (also through cross-walking), and by the continued participation of new groups.
Life and biomedical sciences increasingly require effective ways to find, access and (re)use data and related DOs (e.g. code, software). Existing mechanisms used by software repositories, languages and scientific domains are heterogeneous, and there is no common standard.
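As a rough illustration of what standardized identifiers and descriptions could look like in practice, the following Python sketch describes a DO with a resolvable identifier, a version, a license and an access level, and reports which required fields a metadata record is missing. The field names, the `REQUIRED_FIELDS` tuple and the example identifier are hypothetical and are not taken from the report or from any specific standard.

```python
from dataclasses import dataclass, asdict

# Hypothetical minimal set of descriptive fields; not prescribed by the report.
REQUIRED_FIELDS = ("identifier", "title", "version", "license", "access_level")


@dataclass
class DigitalObject:
    identifier: str    # unique, resolvable identifier (e.g. a DOI URL)
    title: str
    version: str
    license: str
    access_level: str  # e.g. "open" or "restricted"


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


example = DigitalObject(
    identifier="https://doi.org/10.0000/example",  # placeholder, not a real DOI
    title="Example dataset",
    version="1.0",
    license="CC-BY-4.0",
    access_level="open",
)

complete = asdict(example)
incomplete = {"identifier": "https://doi.org/10.0000/example", "title": "Example dataset"}
```

A registry or catalogue could run a check like `missing_fields(incomplete)` before accepting a record, so that every DO it indexes carries the same minimal description.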
The ‘Interoperability Standards - Digital Objects in Their Own Right’ report provides exemplars of community-driven metadata standards efforts and illustrates the value of synergies between more specific life and biomedical sciences and generic cross-domain metadata efforts.
An example of such efforts is CodeMeta, which brings together leaders of software and data repositories with academic researchers to develop a ‘crosswalk table’ that would translate between the diverse metadata schemas currently in use. This effort intersects and works with related initiatives, including (but not limited to) the Force11 Software Citation Working Group, the SSI and WSSSPE.
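The idea of a crosswalk table can be sketched in a few lines of Python: each source schema gets a mapping from its own field names to a shared vocabulary, and records are renamed accordingly. The schemas `repo_a` and `repo_b`, their field names and the target vocabulary below are invented for illustration and do not reproduce the actual CodeMeta crosswalk.

```python
# Hypothetical crosswalk: for each source schema, source field name -> common field name.
CROSSWALK = {
    "repo_a": {"title": "name", "creator": "author", "licence": "license"},
    "repo_b": {"software_name": "name", "developers": "author", "license_id": "license"},
}


def translate(record: dict, source: str) -> dict:
    """Rename the fields of a record from a source schema into the common vocabulary.

    Fields without a crosswalk entry are kept under an 'unmapped' key rather than dropped.
    """
    mapping = CROSSWALK[source]
    translated = {}
    unmapped = {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped[key] = value
    if unmapped:
        translated["unmapped"] = unmapped
    return translated


record_a = {"title": "Aligner", "creator": "A. Researcher", "licence": "MIT"}
record_b = {
    "software_name": "Aligner",
    "developers": "A. Researcher",
    "license_id": "MIT",
    "stars": 12,  # no equivalent in the common vocabulary
}
```

Keeping unmapped fields visible, instead of silently discarding them, is one possible design choice: it shows where the crosswalk still has gaps.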
Another initiative is Bioschemas (focused on different DOs, including tools, training material and datasets, but also organizations, events and more), which encourages the use of schema.org and coordinates its extension in the life science area. Bioschemas brings together members from a variety of communities, including ELIXIR and the ELIXIR-UK Node, Pistoia Alliance, Goblet, BioSharing, BBMRI and the EMBL Australia Bioinformatics Resource.
The data-related activities in Bioschemas are also carried out in synergy with the NIH BD2K bioCADDIE project - a community-driven effort creating a data discovery index for PubMed - DataMed. BD2K envisions the creation of the Commons - a shared virtual space for FAIR DOs, including interoperability standards.
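To give a sense of what schema.org markup for a dataset looks like, the Python sketch below serializes a small description as JSON-LD. It uses only generic schema.org `Dataset` properties (`name`, `description`, `identifier`, `license`) and does not attempt to follow any particular Bioschemas profile; the function name `dataset_jsonld` and all example values are placeholders.

```python
import json


def dataset_jsonld(name: str, description: str, identifier: str, license_url: str) -> str:
    """Serialize a dataset description as schema.org JSON-LD."""
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
    }
    return json.dumps(markup, indent=2)


snippet = dataset_jsonld(
    name="Example expression dataset",
    description="Placeholder description used for illustration only.",
    identifier="https://doi.org/10.0000/example",  # placeholder, not a real DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(snippet)
```

Markup of this kind is typically embedded in a resource's web page inside a `<script type="application/ld+json">` element, where search engines and registries can read it.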
The Force11 Data Citation Implementation Group - a set of diverse stakeholders and organizations (including DataCite and CODATA) behind the Joint Declaration of Data Citation Principles - has agreed to a set of minimal requirements for repositories to implement a landing page with metadata supporting data citation.
While very few community-developed content standards are known in other disciplines (as listed by the JISC DCC directory), over a thousand exist in the life, environmental and biomedical sciences. In these areas BioSharing (operating as an open WG under Force11 and the RDA) is building a comprehensive curated resource that maps this landscape and provides a list of recommended standards.
BioSharing ensures that standards are findable and accessible (according to the FAIR principles). BioSharing also provides the indicators necessary to monitor the development, evolution and integration (by interlinking) of standards, databases and data policies, and guides users in discovering these resources.
The FAIR-supporting EXCELERATE interoperability backbone brings the aforementioned ELIXIR together with other biological and medical research ESFRI infrastructures sharing the same data management and sharing principles.
Besides providing information about the landscape of content standards in the life, environmental and biomedical sciences, the report also presents:
- an extensive range of communities involved in these standardization efforts;
- several challenges that both producers and consumers of content standards need to be aware of;
- a list of key needs to tackle interoperability and sustainability of content standards.
Among the latter, the report particularly emphasizes the need for:
- a portal for discovery of standards, mapping the landscape;
- formal indicators and evaluation methods to measure standards usage and usability;
- open-source infrastructures, tools and services to overcome technical and social challenges throughout the standards life cycle;
- education, documentation, hackathons, training and course materials (and events) targeting both producers and consumers of standards, and set to create a new career path.