Survey Open Access Repositories in the Agricultural Domain

The Knowledge and Capacity for Development Branch (OEKC) of the Food and Agriculture Organization of the United Nations (FAO) has been extensively involved, especially since 2000, in promoting the Open Access (OA) model within the scientific and scholarly community in food, agriculture, development, fisheries, forestry and natural resources. It did so first through the AGRIS network, an international initiative based on a collaborative network of institutions, and since 2007 through the Coherence in Information for Agricultural Research for Development (CIARD) initiative, which aims to make agricultural research information publicly available.

To continue addressing Open Access issues in a meaningful way, FAO conducted a survey in December 2009 and January 2010 on the state of the art of Open Access document repositories in the agricultural domain. The overall aim was to identify the weak and strong points of digital repositories, especially in the fields of semantics and technology.

The survey contained 30 questions divided into the following thematic groups:

Methodology

Since no directory yet exists that includes all repositories related to agriculture, it was not possible to survey them all. Still, an attempt was made to contact as many repositories dedicated to agriculture and related sciences as possible, in the following two ways.

  1. The link to the web survey was distributed to every organization that had set up a repository and with which OEKC had been in contact over the years. They were reached through mailing lists that OEKC has created and updated over a long period of time with the aim of communicating and sharing knowledge with institutes and communities involved in agriculture and/or agricultural information management.
  2. For repositories in the agricultural domain not yet present in the above-mentioned lists, it was decided to use OpenDOAR, an authoritative directory of academic open access repositories, as the source from which to extract repositories that are partially or fully related to agriculture and related sciences.

In the end the survey was sent to a total of 9 mailing lists used by OEKC and 150 institutions in OpenDOAR. The first messages were sent in December 2009. Reminders were sent by e-mail one week before the survey concluded.

As a result, 82 institutes answered our survey request, a sample that reflects the characteristics of the pool from which it was drawn.

The CIARD RING: obtaining a better overview

To ensure that this survey is continued over time and does not remain a one-time effort, a rich metadata record for each repository was added to the CIARD RING (Routemap to Information Nodes and Gateways), a global registry of web-based services that give access to any kind of information pertaining to agricultural research for development. To also reach the repositories that have not yet registered anywhere, it is very important that the CIARD RING itself and its directory are promoted in the agricultural community.

The more agriculture-related repositories register on the CIARD RING, the more and better information we can obtain on the number of existing agricultural repositories and their characteristics.

Survey Findings

Some general findings of the survey include:

  • The responses show an increasing number of repositories since 2007.
  • A high number of repositories are located in Europe, followed by the USA, Canada and South America. From Asia and the Pacific only 9 institutions/organizations participated, from Africa only 1.
  • More than half of the content available in the repositories is open access or publicly available.
  • 60% of the participating institutions have set up an open access policy.
  • Journal articles and technical reports are the most common type of documents available from the responding institutions' repositories, followed by theses, conference papers, books and book chapters.
  • In the field of subject indexing, 90% assign keywords and/or geographic or subject categories to their records. Of these, 40% use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.
  • Most of the digital repositories expose metadata only in Dublin Core.
  • AGROVOC is the most common thesaurus used by the participating institutions.
  • 62% of the answering repositories do not use any authority control at all for bibliographical data. Only 40% are using some sort of authority control, especially for journal titles.
  • DSpace is used by 30% of participating institutions, 16% chose to produce a local solution and 11% use EPrints.
  • OpenDOAR is the registry most commonly used together with ROAR, but 18% of the participants confirmed that they have not yet registered their digital repository.

Requests from the participants:

  • To enhance the interoperability among digital repositories in the agricultural domain;
  • To provide face-to-face and online capacity building activities on open access; and
  • To facilitate the use of AGROVOC in the most common digital repository software packages.

General Information

Name and type of institution responsible for repository?

Answer options Response Count
Governmental organization 6
International organization 7
Non governmental organization 7
Other 7
Research Institute 17
University 38

Table 1. Type of institution

  • 82 digital repositories completed the survey (list of participants).
  • 47% of the institutions behind the responding repositories are universities.
  • 21% are research institutes.
  • NGOs, international organizations and governmental organizations are each represented by 9%.
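The shares quoted above can be recomputed from the counts in Table 1. The following is a minimal Python sketch, assuming that the base for each percentage is the sum of the six categories (82); the dictionary and function names are illustrative only. The computed values come out close to, but not always identical to, the rounded figures in the text.

```python
# Response counts copied from Table 1.
INSTITUTION_COUNTS = {
    "Governmental organization": 6,
    "International organization": 7,
    "Non governmental organization": 7,
    "Other": 7,
    "Research Institute": 17,
    "University": 38,
}


def percentage_shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentage_shares(INSTITUTION_COUNTS)

# Print the categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share}%")
```

Sorting by share makes it easy to compare the computed values with the bullet points above.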

In which country or countries is your institution based?

Provenance Response Count
Europe 32
North America 17
South America 16
International Organizations 7
Asia 9
Africa 1

Table 2. Number of participants geographically

A large number of repositories are located in Europe, followed by the USA, Canada and South America. From Asia only 9 institutions participated, from Africa only 1.

Year that your digital repository became publicly accessible?

Answer Options Response Count
1993 1
1995 2
1996 1
1998 1
1999 2
2000 1
2001 4
2002 4
2003 5
2004 6
2005 6
2006 9
2007 13
2008 12
2009 9
No answer 6

Table 3. Year that your digital repository became publicly accessible

The years of foundation indicated by the repositories cover the period from 1993 to 2009. From 1993 until 2000, one or two repositories per year were created. From 2001 on, the number of repositories founded per year starts to increase substantially. A peak is reached in 2007 with 13 repositories; in 2009, 9 repositories were founded.
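The growth described here can be made explicit by accumulating the yearly counts from Table 3. This is a small Python sketch under the assumption that the 6 "No answer" responses are simply left out of the running total; the variable names are illustrative.

```python
from itertools import accumulate

# Counts copied from Table 3; "No answer" is kept separate.
LAUNCHES_BY_YEAR = {
    1993: 1, 1995: 2, 1996: 1, 1998: 1, 1999: 2, 2000: 1,
    2001: 4, 2002: 4, 2003: 5, 2004: 6, 2005: 6, 2006: 9,
    2007: 13, 2008: 12, 2009: 9,
}
NO_ANSWER = 6

years = sorted(LAUNCHES_BY_YEAR)

# Running total of repositories that had become public by the end of each year.
running_totals = list(accumulate(LAUNCHES_BY_YEAR[year] for year in years))
cumulative = dict(zip(years, running_totals))

# Year in which the largest number of repositories was launched.
peak_year = max(LAUNCHES_BY_YEAR, key=LAUNCHES_BY_YEAR.get)

for year in years:
    print(year, LAUNCHES_BY_YEAR[year], cumulative[year])
```

The last running total plus the "No answer" count gives the 82 responses of the survey, which is a quick consistency check on the table.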

Has your institution set-up an open-access policy?

60% of the repositories have set up an open access policy, and they especially mention the use of Creative Commons licenses. This does not mean that the remaining 40% provide only restricted access to their documents. Many of those who answered "no" state that their repository is completely or partially freely accessible, but that they simply have no formal institutional policy. In some cases a policy is under discussion; in others it is a question of encouraging open access without wanting to impose any obligation regarding access to documents.

Content

Type of digital repository & fields of interest

Answer Options Response Count
< 25% of documents is about Agriculture 18
< 5% of documents is about Agriculture 17
< 50% of documents is about Agriculture 1
> 50% of documents is about Agriculture 8
100% of documents is about Agriculture 34

Table 4. Type of digital repository

  • 34 repositories store exclusively information on agriculture and related sciences.
  • 8 are multidisciplinary repositories with more than 50% of documents on agriculture.
  • There are 18 that have less than 25% and 17 less than 5%.
  • Only one repository has less than 50%.

Table 5 represents the results of the question on fields of interest. It shows the nature of the information stored in the document repositories.

Answer Options Response Count
Agriculture - General/All 60
Animal Production and Health 21
Economics and Policy 20
Education and Extension 11
Engineering, Technology and Research 18
Farming Practices and Systems 21
Fisheries and Aquaculture 24
Food safety and Human nutrition 19
Food Security 21
Forestry 22
Geographical and Regional Information 14
Government, Administration and Legislation 8
Information Management 14
Natural Resources and Environment 30
Plant Production and Protection 23
Rural and Social Development 23
Other 15

Table 5. Fields of interest

Which type of documents are available in your digital repository?

Answer Options Response Count
Book Chapters 40
Books 42
Conference papers 48
Conference proceedings 37
E-learning objects 21
Journal articles 53
Other 25
Pre-prints 28
Technical reports 53
Theses 44
Working papers 38

Table 6. Type of documents

Grey literature, such as technical reports and working papers, is very well represented.

Another well covered category is peer-reviewed material such as journal articles and conference papers.

'Other' includes material such as statistics, newsletters, educational posters, annual reports, policy documents, reports, manuals, policies, scientific data of various types (spreadsheets, images, files produced by scientific analysis software, etc.), posters, conference presentations, scientific data sets, photographs, maps, data (including spatial data, survey data and data documentation), fact sheets, government reports, student projects and undergraduate theses.

In general, the repositories accept whatever material their institutional users produce.

Which languages are present in the documents of your repository?

Answer Options Response Count
Arabic 2
Catalan 2
Chinese 2
Czech 1
Danish 2
Dutch 4
English 70
French 18
Gaelic 1
German 6
Greek 1
Indonesian 1
Italian 2
Japanese 1
Lao 2
Norwegian 3
Persian 1
Portuguese 8
Russian 2
Spanish 32
Swedish 6
Turkish 1
Thai 1
Ukrainian 2
Vietnamese 1

Table 7. List of languages

English is by far the best represented language, with 85% of the repositories accepting papers in English. Spanish is present in 32 repositories, French in 18, Portuguese in 8, and German and Swedish in 6 each. Dutch is present in 4 repositories and Norwegian in 3. The remaining languages (Arabic, Catalan, Chinese, Danish, Italian, Lao, Russian, Ukrainian, Czech, Gaelic, Greek, Indonesian, Japanese, Persian, Turkish, Thai, Vietnamese) are each present in one or two repositories.

Which is the percentage of full text documents in your digital repository? 

Answer Options Response Count
< 25% of documents 18
< 5% of documents 17
< 50% of documents 1
> 50% of documents 8
100% of documents 34

Table 8. Percentage of full text documents

In more than half of the answering repositories, all the documents are available in full text.

In another quarter, more than half of the documents are accessible in full text. This leaves another quarter of repositories in which less than half of the documents are in full text.

The more documents are digitized, the better. Unfortunately, 50% of the repositories have still not managed to digitize all their material.

How many records are available in your repository?

Answer Options Response Count
> 50,000 8
> 25,000 2
> 10,000 2
> 5,000 8
> 1,000 4
> 100 5

Table 9. Number of records

Format and metadata

Which types of formats are supported by your digital repository?

Answer Options Response Count
HTML 38
JPEG 33
PPT 33
DOC 36
PDF 76
RTF 19
Other 16

Table 10. Type of formats

PDF is by far the most widely supported format: it was selected by 76 repositories and accounts for about 30% of all format selections. HTML, DOC, JPEG and PPT each account for roughly 13 to 15% of the selections. Other types of materials mentioned in comments:

  • XML
  • TIFF
  • XLS
  • SDMX
  • 99PX
  • CSV
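Because a repository could select several formats, a percentage for Table 10 can be taken against two different bases: the total number of selections, or the number of responding repositories. The Python sketch below computes both. It assumes that all 82 survey respondents answered this question, which the report does not state explicitly; the function names are illustrative.

```python
# Counts copied from Table 10 (a repository may select several formats).
FORMAT_COUNTS = {
    "HTML": 38, "JPEG": 33, "PPT": 33, "DOC": 36,
    "PDF": 76, "RTF": 19, "Other": 16,
}
RESPONDENTS = 82  # assumption: every survey respondent answered this question


def share_of_selections(counts):
    """Share of each option among all selections made, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def share_of_respondents(counts, respondents):
    """Share of responding repositories that selected each option, in percent."""
    return {name: round(100 * n / respondents, 1) for name, n in counts.items()}


by_selection = share_of_selections(FORMAT_COUNTS)
by_respondent = share_of_respondents(FORMAT_COUNTS, RESPONDENTS)

for name in FORMAT_COUNTS:
    print(f"{name}: {by_selection[name]}% of selections, "
          f"{by_respondent[name]}% of repositories")
```

The two bases give very different pictures for a multiple-choice question, so it helps to state which one a quoted percentage refers to.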

In which metadata formats is your metadata exported?

Answer Options Response Count
Dublin Core 47
AGRIS AP 10
MODS 10
MARC XML 5
IEEE/LOM 2
PubMED XML 2
E-Prints AP 2
EndNote 2
BibTeX 2
ASCII Citation 1
Other 2

Table 11. Metadata formats

Unqualified Dublin Core is mandatory when using the OAI-PMH to expose metadata.

In line with this requirement, 45% of the answering repositories use it.

45 data providers are offering both unqualified Dublin Core and richer metadata formats such as MODS, MARC XML or AGRIS-AP.
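To illustrate what exposing unqualified Dublin Core over OAI-PMH looks like in practice, the Python sketch below builds a `ListRecords` request with the `oai_dc` metadata prefix and reads the titles from a response. The base URL `https://repository.example.org/oai` and the sample record are hypothetical and only serve as an example; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(xml_text):
    """Return the text of every dc:title element found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{{{DC_NS}}}title")]


# Hypothetical, heavily shortened response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample technical report</dc:title>
          <dc:subject>rice</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A repository that also offers a richer format would accept a different `metadataPrefix` value in the same request; the prefixes it supports can be discovered with the `ListMetadataFormats` verb.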

Semantics

Do you assign keywords or subject categories to the bibliographical records?

Answer Options Response Count
No 7
Yes, but only freely assigned keywords 24
Yes, we index using geographic and thematic items 9
Yes, we use a controlled list of subjects 21
Yes, we use a thesaurus 20
I do not know 1

Table 12. Use of keywords or subject categories

90% assign keywords and/or geographic or subject categories to their records. 40% of these repositories use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.

If yes, which thesaurus are you using?

Answer Options Response Count
AGROVOC 23
ASFA 5
CABI Thesaurus 3
NALT 6
LCSH 3
OCDE 3
AGRIS Subject Categories 2
USDA Agricola 1
IRRI Rice Thesaurus 1
Thesaurus INRA 1
Other 6

Table 13. List of thesauri

Are you using any authority lists in your digital repository to control bibliographical data? 

62% of the answering repositories do not use any authority control at all for bibliographical data. Only 40% are using some sort of authority control, especially for journal titles. This is not surprising, considering that journal articles are the type of document most present in the repositories.

If yes, which bibliographical fields are controlled by authority lists?

Answer Options Response Count
Conference Names 7
Corporate Body Names 13
Journal Titles 20
Other 7

Table 14. Fields most commonly controlled by authority lists

Corporate body names and conference names, which are generally much-searched access points, are also mentioned.

Which field/s would you like to be able to control?

Answer Options Response Count
Conference Names 17
Corporate Body Names 21
Journal Titles 24
Personal Names 32
Series Title 20
Other (Geographic descriptors) 8

Table 15. Which field/s would you like to be able to control?

Personal names are an access point that 26% of the repositories would like to control. This makes sense, taking into consideration that the most common type of document present in repositories is the journal article.

Software

Which software do you use in your digital repository?

Answer Options Response Count
Drupal 3
DSpace 24
Eprints Software 9
Fedora Commons 2
Locally produced solution 11
WebAgris 6
Greenstone 3
Digital Commons 2
Digitool 1
ImpressCMS 1
PC-Axis 1
Other 11

Table 16. List of Software

DSpace is by far the most common software package used (30%).
16% chose to produce a local solution and 11% use EPrints.

Have you registered your digital repository with any of the following directories, portals or search engines?

Answer Options Response Count
OpenDOAR 35
ROAR 33
AGRIS 7
OAIster 5
No 15
I do not know 10

Table 17. List of Directories

OpenDOAR is the repository registry most commonly used together with ROAR. 18% of the participants confirmed that they have not yet registered their digital repository.

Management

Who can submit documents in your digital repository?

Answer Options Response Count
Only members from my institution 60
External authors/users 13
Imports metadata from other sources 1

Table 18. Type of submitters

80% of the repositories state that only members of the associated institution can submit documents.
17% also accept external authors.

Who catalogues the documents in your digital repository?

Answer Options Response Count
Librarians 45
Authors (self-archiving) 27
Digital repository staff 23
Administrative and support staff 13
Other 5

Table 19. Type of cataloguers

The cataloguing is done mostly by well-trained staff: librarians (40%) and digital repository staff (20%).

Another 12% is done by trained administrative and support staff members. 23% report self-archiving by the authors.

What is the feedback from users?

"E-mail contacts looking for more information is basically the only feedback we have."

"Extremely positive. Depositors like the exposure and the regular download counts we provide."

"Generally positive, in that we are one of the few providers of free full-text publications in our field."

"Jana Hawley, head of the Apparel, Textiles and Interior Design department, offers this comment: “My work in textile recycling is rather obscure, but was discovered by colleagues at Washington University in St. Louis and Hansei University in South Korea. Both resulted in invited lectures. The Washington University lecture partnered with Fashion Group International and the keynote lecture focused on the importance of textile recycling as part of the eco-fashion movement. The Hansei University lecture was the keynote lecture at the Eco-Fashion Conference. Without the open access platform through K-Rex, I doubt I would have had this opportunity.”"

"Limited - the submitters sometimes complain about slowness (=connection problems). The librarians prefer good standard metadata, other submitters prefer limited metadata to create records. Finally the biggest problem for the participants is redundancy. Sometimes they have to maintain the collections in different databases/catalogues: Library catalogue - OceanDocs - ASFA database. Automatic exchange of metadata is an important issue."

"Most of the users find their documents through Google. But we don't know if the main target group of OceanDocs are also using Google only. Does people working in the field go to the specific tools like AVANO, aquatic and oceanographic repositories? If not, we have a problem. What is all the extra metadata quality worth in such a situation?"

"Satisfactory; they find this tool very useful."

"So far, the authors are very happy with the repository."

"They love the simplicity of the system. The trick is how to get the necessary metadata without overkill. System has import and default metadata functions that users love as time savers."

"They would like more Lao Language documents, to have more digital files and also to improve overall quality of data inputted."

"We are personally approaching authors and explaining what is Eprints@IARI. They are supporting the initiative."

Feedback from participants

Comments

"I think there is a great opportunity to work on the association between statistics data and text content, an area currently underserved. SDMX and DDI are probably the base mechanisms to use."

"In my view there is a tendency to place too much emphasis on technology and complexity in developing digital repositories. Small organisations and end users need simple, robust and practical publishing tools, not rocket science."

"The larger question ..... is looking at is how to finance the production of future data.  If open access becomes the norm, those individuals, institutions or countries who fund data generation will subsidize those who do not for financial, philosophical or cultural reasons.  Obviously, this situation will eventually result in atrophy of data generation capabilities by those who share openly and a decrease in data availability to everyone, both those who share and those who do not."

"There is no excuse in 2009 for any public agency to publish useful documents under any terms more limiting than the Creative Commons Attribution license (ie. essentially in the public domain)."

"We are currently using AGRIS standards which are very good but also make things a bit complicated. A number of other Lao focused repositories have come on-line and they are much more simple to use."