This interview is complementary to the blog post: "OpenMinTed: new EU project for text and data mining. Behind the scenes."
Ranked the number one agricultural institute in Europe and number two in the world, the French National Institute for Agricultural Research (INRA) carries out mission-oriented research for high-quality and healthy food, competitive and sustainable agriculture, and a preserved and valorized environment.
INRA is strongly involved in the French Infrastructure for Bioinformatics (IFB), which is a node of ELIXIR (European life-sciences Infrastructure for biological Information).
INRA's main mission is to produce knowledge and make it accessible to the international community of researchers and practitioners in agriculture, as well as to policy makers and society. Another INRA mission is to develop innovations and know-how that serve society.
To serve both these missions, two INRA entities, the Scientific and Technical Information Department (DIST) and the Bibliome team of the research laboratory Applied Mathematics and Computer Science from Genomes to the Environment (MaIAGE), conduct research and develop text and data mining services for agriculture- and biology-related material.
The MaIAGE-Bibliome team and the DIST are leading the agriculture use cases in the OpenMinTeD project (WP4 and WP9), from requirements to implementation. They also participate in all working groups for the interoperability framework (WP5) and are actively involved in community engagement and training activities (WP2 and WP3).
Together with other partners, INRA is contributing to building a science-oriented and researcher-centred Open Mining e-INfrastructure for TExt and Data, which will enhance scientific publications and render them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.
To better understand the role of INRA in the OpenMinTeD project, AIMS invited Sophie Aubin (Information and Knowledge Management Officer, INRA) to answer a series of related questions. Sophie Aubin is particularly interested in sharing vocabularies in the agricultural domain in order to enrich textual documents and to enhance their retrieval and data analysis. She is also involved in the Wheat Data Interoperability Working Group of the Research Data Alliance (RDA).
With reference to INRA's presentation on the OpenMinTeD platform, INRA has the following role in the OpenMinTeD project: “The MaIAGE-Bibliome team and the DIST will:
1. lead the agriculture use cases from requirements to implementation in WP4 and WP9;
2. participate in all working groups for the interoperability framework in WP5;
3. be actively involved in community engagement and training activities, in WP2 and WP3”.
Has there already been progress on these points?
1. We have defined five use cases for the Agriculture & Biodiversity communities.
We held surveys and interviews to collect the needs of potential users of these future services. All use cases are designed to elicit requirements by bringing together the different stakeholders: content providers and scientific communities, text mining and infrastructure builders, legal experts, data and computing centers, industrial players and SMEs. The three use cases below are led by INRA; the other two are led by AgroKnow.
Microbial Ecosystem
This use case facilitates the discovery and exploration of content from publications and databases about microbial biodiversity, with a focus on positive food flora, i.e. microorganisms involved in fermentation (bread, wine, cheese), nutrient production (e.g. vitamin B12) or biopreservation.
Linking Wheat data with literature
This use case addresses the needs of the international community of researchers who work on bread wheat. Direct links from genetic, phenotypic and genomic data to scientific publications of interest will be computed by OpenMinTeD and closely integrated into the WheatIS GnpIS node.
Extracting gene regulation networks involved in seed development
This use case will assist researchers in plant breeding, especially those interested in plant reproduction and seed development, by providing a better understanding of the molecular mechanisms involved (how plants work). The OpenMinTeD application will be integrated into the FLAGdb++ software, used by researchers who need to explore and analyse genetic information and plant genomes (the database contains genomes from six plant species).
INRA will work closely with the use case leaders to prepare the data, choose components, design and run workflows, and integrate the text and data mining (TDM) results into the targeted applications and services in the long term. These tasks will contribute to the evaluation of the OpenMinTeD platform. INRA is in charge of implementing its own use cases as well as AgroKnow's two, which deal with the automatic classification of AGRIS content with specific taxonomies.
2. INRA participates in the working groups on metadata, language resources, licenses, and workflows and tools.
We are currently collecting the preferences and habits of the scientific communities represented in the project, as well as input from internal and external experts. Results from national and international initiatives such as OpenAIRE, CLARIN or META-SHARE are given priority. The collected needs are being translated into requirements for the platform design. The most suitable strategies (shared standards, hubs, mappings, etc.) will be tested and adopted to ensure the best possible interoperability between the components of the platform to be designed and implemented.
3. Watch the website at http://openminted.eu/about/ and follow us on Twitter @openminted_eu.
Like every partner, INRA participates in the dissemination of the project outputs by attending events all over the world.
WP3 activities have only just started.
How could INRA's mission-oriented research (for high-quality and healthy food, competitive and sustainable agriculture, and a preserved and valorized environment) be enriched through OpenMinTeD's ongoing activities, and vice versa?
OpenMinTeD's ongoing activities are strongly oriented towards interoperability issues for knowledge and language resources, tools and services, as well as their respective licenses. In that sense, OpenMinTeD is part of the institutional effort towards opening data and sharing scientific knowledge more widely with partners and other stakeholders in the agronomy, food and environmental domains.
Being able to access concepts and data buried in texts, and possibly to exploit them together with experimental and observation data, will contribute to a better understanding and to the creation of new knowledge within the institute. OpenMinTeD also acts as a catalyst for INRA's TDM activities, in terms of both internal and external visibility and technological improvements.
“Our main objective is to come up with an interoperability framework that will allow text mining research communities and service providers to deliver and consume text mining tools in a seamless and uniform way”.
Could open access (OA) repositories take advantage of this “interoperability framework”, and if so, how?
OA repositories will certainly be major beneficiaries of OpenMinTeD, as the project strongly relies on OA and intends to promote OA publications and repositories. On the technical side, open standards for publication metadata, language resources and tools will be integrated, assessed and improved. OpenMinTeD will allow content providers (publishers and repositories) and service providers (OpenAIRE, CORE, Europe PMC) to incorporate semantic metadata extraction and text mining into their own services.
Does this “interoperability framework” include or require any mechanisms related to Linked Data or Big Data?
OpenMinTeD is definitely Linked Data-aware, though not focused on it. The platform requirements currently being drafted should include conformance to Linked Data standards.
Volume, in the Big Data sense, is not really an issue, as the volumes of text to be processed cannot be considered Big Data. In TDM approaches, an important bottleneck for "small players" is the computing cost as soon as complex processing steps are chained, whether or not they are applied to large amounts of data. Soon, OpenMinTeD users will benefit from the technology and expertise of GRNET (the leading cloud computing provider in Greece, operating the largest public cloud in Europe and offering Infrastructure as a Service), which will make things easier and faster.
Could such an infrastructure connect different institutions working in agriculture, and if so, how?
Just like vocabularies and data, text mining workflows and related resources can bring people together as soon as they are shared and reusable. In particular, some of the data interoperability challenges in agriculture require text mining. We hope that the OpenMinTeD platform will enable the design and implementation of ambitious projects involving communities that will take advantage of all the promises text mining holds for science. Technical and legal issues are currently being tackled to this end in OpenMinTeD.
Thank you very much for taking the time to answer our questions.
Further information on the OpenMinTeD platform can be found in the complementary blog post: "OpenMinTed: new EU project for text and data mining. Behind the scenes."