Big Metadata: prioritizing next steps to advance Metadata Research in Data Science

(Image sources: "Why you need metadata for Big Data success"; "Big Data vs. Metadata: What’s the Difference?", LinkedIn)

Have you heard of Big Metadata?

While Big Data offers undreamed-of possibilities for finding new data-driven solutions, Big Metadata can be understood as data that encompasses information about the relationships among data, projects, reports, processes, and more.

"Whether it is geographical information, statistics, weather data, research data, transport data, energy consumption data, or health data, the need to make sense of 'Big Data' is leading to innovations in technology, development of new tools and new skills", - Digital Single Market.

"Metadata provides granular info about a single file while Big Data gives you the ability to discover patterns and trends in ALL of your data... Big Data and Metadata have one very important thing in common: they’re each only as valuable as you make them", - Digital Asset Management Learning Center.

"Getting ready for Big Metadata means making certain you have a strategy for adoption of in-memory computing techniques in order to have processing power available", - Data Center Knowledge.

"Strong metadata management process eases Big Data woes", - TechTarget.

It is worth noting that some vocabularies describe datasets, others encode the actual data records, and some do both, using different classes with different properties. Choosing the right vocabulary for both the dataset metadata and the data itself is essential, and this choice also matters when selecting a tool for dataset management.

So far, most tools do not support a metadata model that caters to different layers of metadata. Data repository tools and dataset vocabularies will hopefully cover this soon. At the moment, the only dataset vocabulary that seems to cover both layers is the W3C RDF Data Cube vocabulary.
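To make the two layers more concrete, here is a minimal Python sketch that writes a small Turtle document using the W3C RDF Data Cube vocabulary (`qb:`). The dataset-level metadata sits on a `qb:DataSet`. Each data record is a `qb:Observation` that points back to the dataset through `qb:dataSet`. The `ex:` namespace, its properties (`refArea`, `year`, `yield`) and the sample values are hypothetical placeholders. The output has not been checked against every integrity constraint in the specification.

```python
# Sketch: two metadata layers in one document, using the W3C RDF Data Cube
# vocabulary (qb:). Dataset-level metadata sits on a qb:DataSet; each data
# record is a qb:Observation that links back to it via qb:dataSet.
# The ex: namespace, its properties (refArea, year, yield) and the sample
# values are hypothetical placeholders.

PREFIXES = (
    "@prefix qb:      <http://purl.org/linked-data/cube#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex:      <http://example.org/ns#> .\n"
)


def lit(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dataset_layer(title: str, publisher: str) -> str:
    """Layer 1: metadata about the dataset as a whole, plus its structure."""
    return (
        "\nex:dataset a qb:DataSet ;\n"
        f"    dcterms:title {lit(title)} ;\n"
        f"    dcterms:publisher {lit(publisher)} ;\n"
        "    qb:structure ex:dsd .\n"
        "\nex:dsd a qb:DataStructureDefinition ;\n"
        "    qb:component [ qb:dimension ex:refArea ] ,\n"
        "                 [ qb:dimension ex:year ] ,\n"
        "                 [ qb:measure ex:yield ] .\n"
        "\nex:refArea a qb:DimensionProperty .\n"
        "ex:year a qb:DimensionProperty .\n"
        "ex:yield a qb:MeasureProperty .\n"
    )


def observation_layer(rows) -> str:
    """Layer 2: the data records themselves, each tied to the dataset."""
    blocks = []
    for i, (area, year, value) in enumerate(rows, start=1):
        blocks.append(
            f"\nex:obs{i} a qb:Observation ;\n"
            "    qb:dataSet ex:dataset ;\n"
            f"    ex:refArea {lit(area)} ;\n"
            f'    ex:year "{year}"^^xsd:gYear ;\n'
            f"    ex:yield {value} .\n"
        )
    return "".join(blocks)


def to_turtle(title: str, publisher: str, rows) -> str:
    """Concatenate prefixes, the dataset layer and the observation layer."""
    return PREFIXES + dataset_layer(title, publisher) + observation_layer(rows)


if __name__ == "__main__":
    sample = [("Region A", 2016, 2.4), ("Region A", 2017, 2.7)]
    print(to_turtle("Example crop yields", "Example Org", sample))
```

Both layers share one document. A consumer that only cares about dataset descriptions can stop at `qb:DataSet`, while an analysis tool can query the observations and still reach their context through the `qb:dataSet` link.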


... Data Science can successfully address our most significant societal challenges, and more fully contribute to the greater good. 

To support this new and challenging scenario, this entry introduces an Open Access article:

Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata, by Jane Greenberg, in Journal of Data and Information Science, Volume 2, Issue 3, 2017-08-22 | DOI:

The paper:

* identifies factors that challenge Metadata Research in the digital ecosystem,
* defines Metadata and Data Science, and
* presents the constituent parts of a Metadata lingua franca connecting metadata to Data Science,

with the ultimate goal of encouraging the development of a more cohesive Metadata Research agenda in Data Science.

By the way, “Can you imagine Data Science without Metadata?” Below are the main highlights supporting this rhetorical question, extracted from the article cited above.


Article sections:

Highlights:

1. Introduction


Leading Data Science journals and conferences have been increasing their coverage of metadata research and development (R&D).

The US report The Federal Big Data Research and Development Strategic Plan (NITRD, 2016) stresses the need for research on metadata frameworks to ensure data trustworthiness, and identifies a myriad of metadata-related research topics, many of which are found in similar governmental and disciplinary reports worldwide (e.g. ERAC Secretariat, 2016; Ilevbare, Athanassopoulou, & Wooldridge, 2017).

2. Metadata and Data Science defined


Metadata is structured data supporting functions associated with an object, an object being any “entity, form, or mode”.

Metadata functions include data discovery, access, use, provenance tracking, authenticity and security verification, preservation management, and other activities throughout the data lifecycle (UK Data Archive).

The “types” of metadata (descriptive, technical, preservation, provenance, and usage metadata, as well as business and technical metadata, and process and operational metadata) connect metadata to the lifecycle of the digital object being represented or tracked.
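As an illustration of how metadata types map onto metadata functions, here is a minimal Python sketch of one record grouped by type. Its preservation section stores a SHA-256 fixity value, which can later be recomputed to verify that the data has not changed. This is one common way to support authenticity verification. The record layout and field names are hypothetical and are not drawn from any specific metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

# Sketch: one metadata record grouped by metadata "type", where the
# preservation section stores a SHA-256 fixity value that later supports
# authenticity verification. Field names are hypothetical and are not
# taken from any particular metadata standard.


def sha256_hex(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw data."""
    return hashlib.sha256(payload).hexdigest()


def describe(payload: bytes, title: str, creator: str,
             media_type: str = "text/csv",
             derived_from: Optional[str] = None) -> dict:
    """Build a toy record; comments note the function each section serves."""
    return {
        "descriptive": {"title": title, "creator": creator},       # discovery
        "technical": {"media_type": media_type,                   # use
                      "size_bytes": len(payload)},
        "provenance": {"derived_from": derived_from,              # tracking
                       "recorded_at": datetime.now(timezone.utc).isoformat()},
        "preservation": {"fixity_algorithm": "SHA-256",           # preservation
                         "fixity_value": sha256_hex(payload)},
    }


def verify_authenticity(payload: bytes, record: dict) -> bool:
    """Recompute the checksum and compare it with the stored fixity value."""
    return sha256_hex(payload) == record["preservation"]["fixity_value"]


data = b"area,year,yield\nRegion A,2016,2.4\n"
record = describe(data, "Example crop yields", "Example Org")
print(verify_authenticity(data, record))              # True
print(verify_authenticity(data + b"edited", record))  # False
```

Storing the checksum next to descriptive and provenance fields keeps the verification evidence inside the metadata layer itself. This reflects the article's framing of metadata as an integrated layer of an information system.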

Metadata can more universally be thought of as value-added language that serves as an integrated layer in an information system. 

Data Science is defined as the “scientific study of the creation, validation and transformation of data to create meaning” (The Data Science Association).

Data Science endeavors rely not only on data but also on accurate descriptions of the data, that is, on metadata.

3. Challenges to Metadata Research


Metadata research faces impediments in Data Science and other disciplines due to:

1.) The Utilitarian Nature of Metadata:
Metadata is generally viewed as a practical application relating to cataloging, indexing, database development, and the recording of digital transactions (Riley, Understanding metadata, NISO, 2017).

However, seeking pragmatic solutions with metadata is vital to nearly any digital undertaking.

2.) Historical and traditional perceptions of metadata:
“Metadata is a rather simple concept that doesn’t seem to require scientific study” (Visual business intelligence, a blog by Few, 2017).

4. Metadata Concepts Relating to Data Science

Big (Meta)Data 

Smart Metadata

Metadata Capital


1.) Big Data (The 5 Vs Everyone Must Know)

Understanding Big Data within the Data Science framework is warranted insofar as it helps define Big Metadata, which reflects the wide range of data lifecycle activities found across projects, settings, systems and processes.

Big Metadata may be extremely helpful for better understanding and tracking different data lifecycle scenarios. These range from simple ones (data creation, capture, storage, and preservation) to complex ones (data use, reuse, repurposing, and modification).
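One way to picture this is to record lifecycle events as metadata and then query them. The minimal Python sketch below does this. Its activity groups mirror the simple and complex examples in the paragraph above. The class design, dataset IDs and agents are hypothetical, and lineage is followed only through a single `source_id` link per event.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch: lifecycle events recorded as metadata, then queried to tell
# "simple" from "complex" lifecycle scenarios and to follow lineage.
# The activity sets mirror the examples in the text; dataset IDs, agents
# and the class design are hypothetical.
SIMPLE = {"creation", "capture", "storage", "preservation"}
COMPLEX = {"use", "reuse", "repurposing", "modification"}


@dataclass
class LifecycleEvent:
    dataset_id: str
    activity: str
    agent: str
    source_id: Optional[str] = None  # set when derived from another dataset


@dataclass
class LifecycleLog:
    events: List[LifecycleEvent] = field(default_factory=list)

    def record(self, event: LifecycleEvent) -> None:
        """Append an event, rejecting activities outside the known groups."""
        if event.activity not in SIMPLE | COMPLEX:
            raise ValueError(f"unknown activity: {event.activity}")
        self.events.append(event)

    def history(self, dataset_id: str) -> List[str]:
        """Activities recorded for one dataset, in the order they were logged."""
        return [e.activity for e in self.events if e.dataset_id == dataset_id]

    def is_complex(self, dataset_id: str) -> bool:
        """True if any recorded activity falls in the 'complex' group."""
        return any(a in COMPLEX for a in self.history(dataset_id))

    def lineage(self, dataset_id: str) -> List[str]:
        """Follow source_id links back towards the original dataset."""
        chain = [dataset_id]
        while True:
            sources = [e.source_id for e in self.events
                       if e.dataset_id == chain[-1] and e.source_id]
            if not sources or sources[0] in chain:  # stop at origin or a cycle
                return chain
            chain.append(sources[0])


log = LifecycleLog()
log.record(LifecycleEvent("raw-01", "creation", "field team"))
log.record(LifecycleEvent("raw-01", "storage", "repository"))
log.record(LifecycleEvent("clean-01", "repurposing", "analyst", source_id="raw-01"))
print(log.history("raw-01"))       # ['creation', 'storage']
print(log.is_complex("clean-01"))  # True
print(log.lineage("clean-01"))     # ['clean-01', 'raw-01']
```

Keeping the events in an append-only log, rather than overwriting a single status field, preserves the full history. Reconstructing reuse and repurposing chains depends on that history.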

2.) Metadata is inherently Smart Data because it:

  • provides context and meaning for data (e.g. interoperable metadata, metadata as an enabler or characteristic of the Semantic Web and Linked Data, accessibility, and ontologies); 
  • enables an action that draws on the data being represented or tracked. 

Principles of Smart Metadata:

  • Good quality: quality metadata is trusted metadata. Trust connects across all principles, although it primarily links with quality and preservation.
  • Accessible: smart metadata is accessible, along with the data being represented, to support data-driven activities.
  • Actionable: smart metadata can be ingested and understood by humans and/or machines.
  • Preserved: smart metadata is preserved in a useful manner, so that data patterns can be identified over time. Metadata must be preserved by a trusted, dependable source; this includes preserving the metadata vocabularies themselves, such as data dictionaries and attribute descriptions.

A related aspect of Smart Metadata is its alignment with smart technology, including smart mobile devices and appliances.

3.) Metadata Capital treats metadata as an asset whose value can be captured via metadata attributes (who, what, where, when, how, why, etc.). The value may be financial, intellectual, social, or defined in other ways.

5. Summary and Conclusions


“… it is logical to conclude that metadata innovation ought to have progressed in tandem with advances in Big Data and Data Science”.

Drawing on this broader context of metadata value, the article aims to encourage researchers to consider the significance of Metadata as a highly research-worthy topic within Data Science and the larger digital ecosystem.

P.S. And which (Meta)Data Topics are most pressing to pursue ... for You?

"Worldwide, data can be quickly generated, analyzed and used. Experts think that the wise use of data will be one of the most important tools for achieving the United Nations’ Sustainable Development Goals (UN SDGs)", - FOOD SECURITY CENTER.

"Data is much more than simply information: in expert hands, it is intelligence", - CGIAR Platform for Big Data in Agriculture 2017-2022

Related content: 

METADATA 2020 : towards Metadata as the Scholarly Community’s top priority
