Assigning METADATA as method to support DIGITAL DATA CURATION in trusted repositories

14.09.2017


(Image sources: ©Open Source Guide to ESI; ©The University of Queensland ; © Boston College Libraries; ©SlidePlayer: Research Data Management Overview and Introduction)

GOOD METADATA is key for data access and re-use.

Figuring out precisely what metadata to create, apply, and capture in order to support ongoing Digital Data Curation processes is a complex task. Fortunately, many communities of practice have formalized metadata specifications and standards tailored to a number of different needs or functions.

Nevertheless, these specifications and standards are only effective, or F.A.I.R. (aiming to increase the Findability, Accessibility, Interoperability and Reusability of the data), if people know about them and use them in a coordinated manner.

This essay provides an overview of Digital Data Curation and the metadata that supports it, with links for further reading.

Digital Data Curation Lifecycle & Metadata on its support

Digital Data Curation involves maintaining, preserving and adding value to digital data or digital objects throughout their lifecycle.

Digital objects may be simple (discrete digital items such as text files, image files or sound files, along with their related identifiers and metadata) or complex (discrete digital objects made by combining a number of other digital objects, such as websites); see the DCC Glossary.

The DCC Curation Lifecycle Model provides a graphical, high-level overview of the stages required for successful curation and preservation of data or digital objects, from initial conceptualisation, capture or receipt through the iterative curation cycle.

This ideal lifecycle model, thought out at the start of a data management project and planned for throughout, presents the granular functionality of three sets of actions. It can be used to plan activities within your organisation and to ensure that all of the necessary steps in the curation lifecycle are covered.

FULL LIFECYCLE ACTIONS [support curation throughout the data's lifecycle]:

Description & Representation Information: Assign administrative, descriptive, technical, structural and preservation metadata [as a subset of technical or administrative metadata], using appropriate standards, to ensure adequate description and control over the long-term. Collect and assign representation information required to understand and render both the digital material and the associated metadata.
Preservation Planning
Community Watch & Participation
Curate & Preserve

SEQUENTIAL ACTIONS [need to be undertaken sequentially to ensure successful curation]:

Conceptualise
Create or Receive: Create data including administrative, descriptive, structural and technical metadata. Preservation metadata may also be added at the time of creation. Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and, if required, assign appropriate metadata.
Appraise & Select
Preservation Action: Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats.
Store
Access, Use & Reuse
Transform

OCCASIONAL ACTIONS [may be undertaken occasionally if a certain situation arises]:

Dispose
Reappraise
Migrate

Depending on the method chosen for Data Management & Curation, and according to the policies and procedures that underpin curation within an organisation, one can map the roles and responsibilities in delivering curation, and the framework of standards and technologies that supports it, at any point of the Curation Lifecycle Model.

This approach can help to identify gaps in digital curation planning, to undertake mitigation actions, and to document adequately all workflows and resources required to support specific digital curation processes. In other words, users may enter the digital curation lifecycle at any stage, depending on their current area(s) of need, for example when:

Considering curation from the point of ingest,
Refining the support offered during the conceptualisation and creation processes,
Improving data management in all its stages, including long-term preservation.
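
The gap-identification idea described above can be sketched in a few lines of Python. This is only an illustration, not part of the DCC model: the stage names are the sequential actions listed earlier, while the `planned_activities` mapping, its example entries, and the `find_gaps` helper are hypothetical names introduced here.

```python
from typing import Dict, List

# Sequential actions of the Curation Lifecycle Model, as listed in this essay.
SEQUENTIAL_ACTIONS: List[str] = [
    "Conceptualise",
    "Create or Receive",
    "Appraise & Select",
    "Preservation Action",
    "Store",
    "Access, Use & Reuse",
    "Transform",
]


def find_gaps(planned: Dict[str, List[str]]) -> List[str]:
    """Return the lifecycle stages with no planned activity, in lifecycle order."""
    return [stage for stage in SEQUENTIAL_ACTIONS if not planned.get(stage)]


# Hypothetical curation plan of an organisation (example entries are made up).
planned_activities: Dict[str, List[str]] = {
    "Create or Receive": ["Assign descriptive metadata at deposit"],
    "Store": ["Replicate data to two storage sites"],
    "Access, Use & Reuse": [],  # stage is listed, but nothing is planned yet
}

for stage in find_gaps(planned_activities):
    print("No activity planned for:", stage)
```

A stage counts as a gap both when it is missing from the plan and when it is present with an empty list of activities, so a plan that only names a stage without assigning work to it is still flagged.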

Some good practices in research data management:

Research data management at the University of Queensland
Data Management at MIT Libraries
The LibGuide of the University of Witwatersrand provides resources in support of Trusted Digital Repositories: Preservation, Curation and Data Management
Göttingen eResearch Alliance about eResearch related questions and Data Management issues

The ICPSR Digital Preservation Policy Framework reflects the seven attributes of a Trusted Digital Repository (TDR):

(1) Open Archival Information System (OAIS) Reference Model (2012) compliance
(2) Administrative responsibility
(3) Organizational viability
(4) Financial sustainability
(5) Technological and procedural suitability
(6) Systems security
(7) Procedural accountability.
The mapping of ICPSR's preservation process to OAIS is synthesized in Digital Preservation Requirements Applied to ICPSR.

Active digital data curation in a TDR reduces threats to long-term data value (including data quality) and mitigates the risk of digital obsolescence:

Data Management & Permanent Access to Digital Research Resources: learning from DANS Institute
Trustworthy preservation of digital objects in Institutional Repositories of Portugal

Metadata plays an important role in the discovery of information found in digital databases, repositories and library catalogs. It also provides access to the wide variety of resources made available through archival finding aids. Thus, metadata is considered to be the backbone of Digital Curation.

Usually taking the form of a structured set of descriptive elements, metadata assists in the identification, location, processing, tracking, preservation, sharing or re-use, and retrieval of data and information resources, while facilitating content and access management.

Without metadata, a digital resource may be irretrievable, unidentifiable or unusable.
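
To make the notion of "a structured set of descriptive elements" concrete, here is a minimal sketch of a metadata record and a completeness check. The element names are borrowed from Dublin Core, but the choice of which elements are required, the `missing_elements` helper, and the sample record (including its identifier) are assumptions made for illustration only, not a prescribed minimum.

```python
from typing import Dict, List

# Hypothetical minimum set of descriptive elements (names borrowed from Dublin Core).
REQUIRED_ELEMENTS: List[str] = ["identifier", "title", "creator", "date", "rights"]


def missing_elements(record: Dict[str, str]) -> List[str]:
    """List the required elements that are absent or blank in a metadata record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name, "").strip()]


# Made-up record: "creator" is blank and "rights" is not present at all.
record: Dict[str, str] = {
    "identifier": "example-0001",
    "title": "Soil moisture measurements",
    "creator": "",
    "date": "2017-09-14",
}

print(missing_elements(record))  # ['creator', 'rights']
```

A resource whose record fails such a check is exactly the kind that risks becoming unidentifiable or unusable later, which is why a check of this sort would typically run at deposit time.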

Towards alignment of different metadata structures

Although the common goal of descriptive, structural and administrative metadata is to help you work with your data or digital objects, at both technical and non-technical levels, your metadata structures may not match the structures used by others.

That can make it harder to communicate your data, making it less findable, accessible, interoperable and reusable (FAIR). To avoid these problems, many communities have established metadata standards or community specifications and recommendations for the minimum information that should be collected about data or a digital object in order for it to be re-used.

Some examples of metadata standards promoted/endorsed by different communities:

What are Metadata Standards; List of Metadata Standards; search Metadata Standards by Discipline (on the DCC portal)

Preservation Metadata & Technology Watch Report 13-3: Preservation Metadata (2nd edition) (on the Digital Preservation Coalition Portal)
United Nations Archives & Records Management Section (Policies, Standards, Guidelines)

However, these standards are only effective if people know about them and use them. Efforts to cope with this challenge include, for instance:

The Research Data Alliance (RDA) has developed a community-maintained version of the Catalogue of Metadata Standards and Tools, aimed at researchers and those who support them.

In a first phase of work, the RDA Metadata Standards Directory Working Group (MSDWG) took the aforementioned DCC Disciplinary Metadata Catalogue as its base and updated and extended the information contained within it.
Watch the recorded Webinar: Metadata as Standard: improving Interoperability through the Research Data Alliance (RDA).

The AIMS community of practice has developed LODE-BD (Linked Open Data enabled Bibliographical Data). Metadata normalization or alignment can be accomplished on top of the nine LODE-BD metadata groups, which are:

Title Information
Responsible Body
Physical Characteristics
Location
Subject
Description of Content
Intellectual Property
Usage
Relation
The nine LODE-BD metadata groups are consistent in encoding both the types of Entities and the Relationships between Entities.
Because it is mapped to Dublin Core (and other) metadata and encoding schemes designed to support bibliographical data on the Web, LODE-BD increases the flexibility and interoperability of the data. It can be seen as a one-size-fits-all approach for encoding meaningful LOD-ready bibliographical data that concentrates on the data, not on the scheme.
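
The alignment idea can be illustrated with a small crosswalk sketch: two records that describe the same resource with different field names are normalized onto shared metadata groups. The group names come from the LODE-BD list above; everything else (the two source names, their field names, the field-to-group assignments, and the `normalize` helper) is hypothetical and is not the official LODE-BD mapping.

```python
from typing import Dict, List

# Hypothetical crosswalks: local field name -> LODE-BD metadata group.
# These assignments are illustrative assumptions, not the published LODE-BD mapping.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "repository_a": {
        "dc:title": "Title Information",
        "dc:creator": "Responsible Body",
        "dc:subject": "Subject",
        "dc:rights": "Intellectual Property",
    },
    "repository_b": {
        "name": "Title Information",
        "author": "Responsible Body",
        "keywords": "Subject",
        "license": "Intellectual Property",
    },
}


def normalize(record: Dict[str, str], source: str) -> Dict[str, List[str]]:
    """Regroup a source record under the shared metadata groups."""
    crosswalk = CROSSWALKS[source]
    grouped: Dict[str, List[str]] = {}
    for field, value in record.items():
        group = crosswalk.get(field)
        if group is None:
            continue  # unmapped fields are skipped in this sketch
        grouped.setdefault(group, []).append(value)
    return grouped


# The same (made-up) resource described by two differently structured records.
record_a = {"dc:title": "Rice yield survey", "dc:creator": "Example Institute",
            "dc:rights": "CC BY 4.0"}
record_b = {"name": "Rice yield survey", "author": "Example Institute",
            "license": "CC BY 4.0", "internal_id": "42"}

print(normalize(record_a, "repository_a") == normalize(record_b, "repository_b"))
```

Once both records are expressed in the same groups, they can be compared or merged regardless of the scheme each repository started from, which is the sense in which the approach concentrates on the data and not on the scheme.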

Related:

See DCC Curation Lifecycle Model FAQ for more information.
Also, DCC Digital Curation 101 and 101 Lite courses introduce researchers and data custodians to the stages of the Curation Lifecycle Model.
Digital Preservation Handbook: a strategic overview of the broad issues & tasks involved in preserving digital resources
NISO Recommended Practice on Metadata Indicators for Accessibility and Licensing of E-Content