Text and data mining (TDM) in agri-food research
A wealth of published research is publicly available, thanks to active Open Access initiatives and mandates that keep finding ways to open up even more research data. However, researchers still struggle to find specific elements of a publication that could support their own work, for example an image, a diagram or a dataset on a topic such as crop disease. Such components are currently embedded in various types of publications and cannot be identified, described and retrieved as individual entities.
Locating these components manually is a challenging and time-consuming process, especially when the results come from large bibliographic databases such as FAO AGRIS, which currently provides access to more than 8 million bibliographic records.
About the OpenMinTeD Project
OpenMinTeD (“Open Mining INfrastructure for TExt and Data”) is a Horizon 2020 project that aims to provide solutions for cases like this. More specifically, the project aims to enable the identification and retrieval of such components within research publications. In response to a user query with specific terms, value-added services based on text-mining mechanisms will preview related images and datasets located inside the publications. End users are expected to be able to click the preview of a related component (such as an image, a diagram or a dataset name) and be redirected to its specific location in the publication to study it in more detail. This is expected to significantly reduce the time needed to retrieve such individual components and, at the same time, make traditional search mechanisms more efficient.
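To make the idea of component-level retrieval more concrete, here is a minimal Python sketch. It is not the OpenMinTeD implementation: the `Component` record, the `search_components` function and the sample data are all hypothetical, and simple substring matching on captions stands in for the text-mining services the project plans to use. It only illustrates how a query could return individual figures or datasets together with their location inside a publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A single retrievable element of a publication (hypothetical model)."""
    publication_id: str  # made-up identifier of the source record
    kind: str            # "image", "diagram" or "dataset"
    caption: str         # text associated with the component
    page: int            # where the component sits in the publication


def search_components(components, query):
    """Return the components whose caption contains every query term.

    Matching is case-insensitive substring matching, a deliberately
    naive stand-in for real text-mining services.
    """
    terms = query.lower().split()
    return [
        c for c in components
        if all(term in c.caption.lower() for term in terms)
    ]


# Invented records, for illustration only.
SAMPLE = [
    Component("pub-001", "image", "Leaf rust symptoms on wheat", 4),
    Component("pub-001", "dataset", "Yield measurements per plot", 9),
    Component("pub-002", "diagram", "Life cycle of wheat leaf rust", 2),
]

hits = search_components(SAMPLE, "wheat rust")
for c in hits:
    # The publication id and page are what a preview would link to.
    print(f"{c.kind}: {c.caption} ({c.publication_id}, page {c.page})")
```

In this sketch each hit carries the publication identifier and page number, which is the information a preview would need in order to redirect the user to the right place in the publication.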
In the context of the project, a number of use cases will be validated, including the use of the AGRIS repositories, possibly multilingual content from other publishers, public databases, and RSS feeds from the identified sources. The methodology will rely on an external call for technologies on top of INRA’s text-mining services. Applying these technologies and enabling such features over the more than 8 million bibliographic records (including full text) of the AGRIS database is expected to significantly facilitate the work of agri-food researchers who depend on the available literature for their own research.
(This post was originally published on the Agro-Know blog.)