Find the DATA you need ... more easily with Google Dataset Search!
Datasets and related information tend to be spread across multiple data repositories on the web. Governments, scientific publishers, researchers, and data providers (both individual providers and data repositories) publish data in fields ranging from agriculture, climate science, life science, and social science to high-energy physics and more. In many cases, information about these datasets is neither linked nor indexed by search engines, which makes data discovery frustrating or, in some cases, impossible. Easy access to datasets and their provenance on the web is critical for the reproducibility of research results (enabling scientists to build on others’ work) and for boosting the return on the investments made in producing the data.
To make datasets universally accessible and easier to discover through a single interface, Google launched a beta version of GOOGLE DATASET SEARCH in September 2018. It is now available alongside Google’s other specialized search engines.
Google DATASET SEARCH aims to create a data sharing ecosystem that encourages data publishers and users to follow best practices for producing, storing, consuming, citing, and discovering datasets.
SOME TECHNICAL ASPECTS...
To provide a federated search point for the millions of web pages that host datasets, the DATASET SEARCH function relies on structured data embedded in websites.
To embed metadata in the code of each web page that offers data, Google has adopted schema.org, the open standard for structured data. Its dataset markup builds on an effort recently standardized at W3C (the Data Catalog Vocabulary) and includes dataset descriptions such as who created the data, when it was created, and its terms of use. [Developers can contribute by expanding schema.org metadata for datasets, providing domain-specific vocabularies, and working on tools and applications that consume this rich metadata.]
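As a rough illustration of what such embedded metadata can look like, the Python sketch below builds a schema.org Dataset description as JSON-LD and wraps it in the script element that a page would carry. The helper names (build_dataset_jsonld, to_script_tag) and all example values are hypothetical; the property names (name, description, creator, dateCreated, license) are standard schema.org terms, but which of them a given search engine requires is defined in its own guidelines, not here.

```python
import json


def build_dataset_jsonld(name, description, creator, date_created, license_url):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the data.
        "creator": {"@type": "Organization", "name": creator},
        # When it was created (ISO 8601 date).
        "dateCreated": date_created,
        # Terms of use, given as a link to the license text.
        "license": license_url,
    }


def to_script_tag(jsonld):
    """Wrap JSON-LD in a script element suitable for embedding in HTML."""
    # Escape "</" so that no string value can close the script element early.
    body = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, for illustration only.
record = build_dataset_jsonld(
    name="Example crop yield survey",
    description="Illustrative survey of crop yields by region.",
    creator="Example Research Institute",
    date_created="2018-09-05",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(to_script_tag(record))
```

The printed element would be placed in the HTML of the page that offers the dataset, where crawlers can read it.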
To help data providers describe their datasets in a structured way, Google Search has published new guidelines. Structured descriptions enable Google and others to link this metadata with information describing locations, scientific publications, or even the Knowledge Graph, which facilitates data discovery for others.
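To give a sense of how "others" might consume such structured descriptions, here is a minimal Python sketch that pulls JSON-LD blocks out of a page and keeps those typed as a schema.org Dataset. It uses only the standard library; the names JsonLdExtractor and find_datasets and the sample page are hypothetical, and this is not a description of how Google's crawler works. It also assumes "@type" is a plain string, which real-world markup does not always guarantee.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup instead of failing the whole page.


def find_datasets(html):
    """Return every JSON-LD object in the page whose @type is "Dataset"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Dataset":
                found.append(item)
    return found


# Hypothetical sample page, for illustration only.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example crop yield survey",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body>Example landing page</body></html>"""

for dataset in find_datasets(page):
    print(dataset["name"], "|", dataset.get("license", "no license stated"))
```

Because the description travels with the page itself, any aggregator can read it this way, without a separate agreement with the data provider.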
“… search engines improve most quickly when a critical mass of users is there to provide data on what they’re doing” (Google launches new search engine to help scientists find the datasets they need | The Verge).
Nevertheless, before searching for data becomes as seamless as it should be, a number of technical challenges remain, such as: defining and identifying more consistently what constitutes a dataset; describing the content of datasets; and relating datasets to each other and propagating metadata among related datasets.
- AGRIS: providing access to agricultural research data exploiting open data on the web
- AGORA: Access to Global Online Research in Agriculture
Dive deeper:
- Google unveils search engine for open data | Nature
- Data on the Web? Here’s How | W3C Blog
- Data on the Web Best Practices | W3C
- Spatial Data on the Web Best Practices | W3C
- Making it easier to discover datasets | The Keyword, Google
- re3data.org (Registry of Research Data Repositories)
- Data Interoperability: The Land Portal experience of Open Data management (recorded GODAN Webinar)
- Discover open Land Linked Datasets shared by the Land Portal
- AgroPortal: a backbone for data integration and standardization in Agronomy
- FAIRsharing: Find, Register, Claim your Standard, Database, and Policy... FAIRsharing will do the rest!
- About Federated Research Data Infrastructures (FRDI): Knowledge Exchange Report (2017)
- Improving Geospatial Data Search
- Using Visual Search to Find Geographically Similar Features on Satellite Imagery
- A Data Citation Roadmap for Scholarly Data Repositories
- Unique Identifiers for individuals: Enter once | Reuse often: What does this mean to research institutions? (recorded ORCID Webinar)
- AfricaConnect2 makes it possible! ICT Infrastructure, stable Internet connectivity and support services crucial for Open Data sharing across disciplines
- Implement effective Open Data & Keep it alive with Open Data TOOLKIT
- SWIB18 – 10th Semantic Web in Libraries Conference (26-28 November, 2018)
Keep up to date by signing up for AIMS News and following @AIMS_Community on Twitter.
And thanks again for your interest!