For much of its history, the scholarly publication industry (scientific journals, monographs, major reference works, and so forth) has had a distant relationship with the data that underlie the research findings it publishes. Data were either directly referenced in tables or else not mentioned at all. Many disciplines, particularly those in the humanities and social sciences, did not recognise the ‘data’ label as being relevant for their outputs. In any case, data were not being produced at a scale that particularly challenged the long-standing formats, methods of production, peer review and distribution.
During the twentieth century, technology began to impinge heavily on these practices. The Open Access movement extended to ‘Open Science’ and this, together with a series of retraction and reproducibility scandals, has incentivised scholarly publishers to re-examine their relationship with research data.
At industry level, the first real evidence of this is the 2007 Brussels Declaration, which deals in large part with Open Access in general but also sets out an early position towards research data, indicating widespread consensus that data will ideally be made freely accessible:
"Raw research data should be made freely available to all researchers. Publishers encourage the public posting of the raw data outputs of research. Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars."
This was followed in 2012 by the Joint STM-DataCite Declaration on the Linkability and Citability of Research Data, which signalled the beginning of the collaborative approach that is required in order for progress to be made. However, although some of the born-digital publishers have been actively engaging with research data (see, for instance, Public Library of Science’s data policy announced in 2014), progress has been much slower across the majority of players, for a number of reasons. These include the fact that when developing new services and/or policies, publishers tend to be led by journal ‘community’ (author, editorial board or learned society) demands. If these groups have not required engagement on research data, publishers are hesitant to instigate it unilaterally. There is intense awareness that any addition to the publication process – for authors, editors, reviewers, or typesetters – could result in confusion, irritation, the introduction of mistakes or actual ethical problems.
There has also been reluctance to commit to investment in services that may be unnecessary or unwanted, or that may even turn out to be counter-productive. There has been a lack of clarity about what steps to take, what degree of rigour or compliance checking is required, how to treat different subject areas, and how to work with other stakeholders, such as repositories. It is also difficult for the larger publishers to take large-scale decisions that affect global journal programmes and require coordination across multiple functions and sites. Meanwhile, smaller publishers tend to be held up by lack of bandwidth or concern about potential cost burdens.
Consequently, publishers have started to seek organisations and partners who can help with gaining knowledge, developing best practices, and receiving advice on how to roll them out. Apart from researchers themselves, and learned societies, the Research Data Alliance has increasingly emerged as such a partner.
When the RDA first launched, there was a flurry of activity in the research data and publishing space, which produced several useful outputs, either in their first or subsequent incarnations (e.g. the data publishing reference model, Scholix). Among the membership of these early groups were a number of individuals working in publishing, many of whom have attended a number of plenaries as individual RDA members. Over the past six years, some larger publishers have joined RDA as organisational members, and one of publishing’s industry bodies, the STM Association, is also an organisational member. However, until recently collaborations have been mostly at the personal level within the various working and interest groups.
In 2017, as part of the RDA-EU3 project, the RDA Industry Advisory Board (IAB) investigated how RDA could contribute to industrial awareness and understanding of, and interactions with, research data, with scientific publishing being one of the sectors under examination. Working through a series of consultations, meetings, interviews and a workshop, the IAB surfaced some interesting insights about how publishing viewed its relationship to research data (link to report here). Observations included the fact that “general principles, guidelines and proofs of concept have been much easier to implement than technical outputs.” At the same time, the report was able to show that the industries viewed RDA as a venue where standards and best practices could be developed together, and that they were keen to pursue more opportunities for industry and RDA to meet and investigate potential collaborations.
Most recently, the STM Association branded 2020 as ‘The STM Year of Research Data’. This has involved setting up a website (https://www.stm-researchdata.org) and stating some ambitious aims around the slogan ‘SHARE-LINK-CITE’, where ‘SHARE’ means increasing the number of journals with data policies and articles with Data Availability Statements (DAS); ‘LINK’ means increasing the number of journals that deposit their data links to the Scholix framework; and ‘CITE’ means increasing the citations to datasets in line with the Joint Declaration of Data Citation Principles. The RDA has been closely involved with these objectives, both directly – through attending and helping to publicise the series of in-person and virtual workshops included in the programme – and indirectly, as the facilitator and convenor of several of its key reference materials. As well as Scholix, these include the journal data policy framework developed by the RDA Data Policy Standardisation and Implementation Interest Group. Working with publishers and researcher groups, FAIRsharing.org (which also originated as an RDA Working Group) has co-curated a database of standards, policies and databases. As a result, its website provides an excellent overview of data repositories that journals can recommend to their authors. The Enabling FAIR Data Project has relied heavily on the RDA community for review, feedback, refinement and implementation. And the Belmont Forum Data Accessibility Statement Policy and Template has achieved recognition as a gold standard output co-designed by funders and publishers, with the underlying rigour of having been reviewed and discussed by RDA members.