Survey Open Access Repositories in the Agricultural Domain

The Knowledge and Capacity for Development Branch (OEKC) of the Food and Agriculture Organization of the United Nations (FAO) has been extensively involved, especially since 2000, in promoting the Open Access (OA) model within the scientific and scholarly community in food, agriculture, development, fisheries, forestry and natural resources. It did so first through the AGRIS network, an international initiative based on a collaborative network of institutions, and since 2007 through the Coherence in Information for Agricultural Research for Development (CIARD) initiative, which aims to make agricultural research information publicly available.

To be able to continue addressing Open Access issues in a meaningful way, FAO conducted a survey in December 2009 and January 2010 on the state of the art of Open Access document repositories in the agricultural domain. The overall aim was to identify the strengths and weaknesses of digital repositories, especially with regard to semantics and technology.

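The survey contained 30 questions divided into the thematic groups that structure the results below: General Information, Content, Format and metadata, Semantics, Software, and Management.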

Methodology

Since no directory yet exists that includes all repositories related to agriculture, it was not possible to survey them all. Still, an attempt was made to contact as many repositories dedicated to agriculture and related sciences as possible, in the following two ways.

  1. The link to the web survey was distributed to every organization that had set up a repository and with which OEKC had been in contact over the years. They were reached through mailing lists that OEKC has created and maintained over a long period in order to communicate and share knowledge with institutes and communities involved in agriculture and/or agricultural information management.
  2. For repositories in the agricultural domain not yet present in the above-mentioned lists, OpenDOAR, an authoritative directory of academic open access repositories, was used as the source from which to extract repositories that are partially or fully related to agriculture and related sciences.

In the end the survey was sent to a total of 9 mailing lists used by OEKC and 150 institutions in OpenDOAR. The first messages were sent in December 2009. Reminders were sent by e-mail one week before the survey concluded.

As a result, 82 institutes responded to the survey request, a sample that reflects the characteristics of the pool from which it was drawn.

The CIARD RING: obtaining a better overview

To ensure that this survey continues over time and does not remain a one-time effort, a rich metadata record for each repository was added to the CIARD RING (Routemap to Information Nodes and Gateways), a global registry of web-based services that give access to any kind of information pertaining to agricultural research for development. To also reach the repositories that have not yet registered anywhere, it is very important that the CIARD RING itself and its directory are promoted in the agricultural community.

The more agriculture-related repositories register on the CIARD RING, the more and better information can be obtained on the number of existing agricultural repositories and their characteristics.

Survey Findings

Some general findings of the survey include:

  • The responses show an increasing number of repositories since 2007.
  • The largest number of repositories is located in Europe, followed by the USA, Canada and South America. From Asia and the Pacific only 9 institutions/organizations participated, and from Africa only 1.
  • More than half of the content available in the repositories is open access or publicly available.
  • 60% of the participating institutions have set up an open access policy.
  • Journal articles and technical reports are the most common type of documents available from the responding institutions' repositories, followed by theses, conference papers, books and book chapters.
  • In the field of subject indexing, 90% assign keywords and/or geographic or subject categories to their records. Of these, 40% use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.
  • Most of the digital repositories expose metadata only in Dublin Core.
  • AGROVOC is the most common thesaurus used by the participating institutions.
  • 62% of the answering repositories do not use any authority control at all for bibliographical data. Only about 40% use some sort of authority control, especially for journal titles.
  • DSpace is used by 30% of participating institutions, 16% chose to produce a local solution and 11% use EPrints.
  • OpenDOAR is the registry most commonly used together with ROAR, but 18% of the participants confirmed that they have not yet registered their digital repository.

Requests from the participants:

  • To enhance the interoperability among digital repositories in the agricultural domain;
  • To provide face-to-face and online capacity building activities on open access; and
  • To facilitate the use of AGROVOC in the most common digital repository software packages.

General Information

Name and type of institution responsible for repository?

Answer options                   Response Count
Governmental organization        6
International organization       7
Non governmental organization    7
Other                            7
Research Institute               17
University                       38

Table 1. Type of institution

  • 82 digital repositories completed the survey.
  • 46% of the institutions behind the responding repositories are universities.
  • 21% are research institutes.
  • NGOs and international organizations each account for 9%, and governmental organizations for 7%.
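
The shares quoted for Table 1 can be recomputed directly from the response counts. The following is a minimal sketch, assuming each of the 82 respondents selected exactly one institution type and that shares are rounded to whole percentages; the names `institution_counts` and `percentage_shares` are illustrative only.

```python
# Response counts copied from Table 1.
institution_counts = {
    "University": 38,
    "Research Institute": 17,
    "International organization": 7,
    "Non governmental organization": 7,
    "Other": 7,
    "Governmental organization": 6,
}


def percentage_shares(counts):
    """Return each category's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentage_shares(institution_counts)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

Because each share is rounded independently, the rounded values need not add up to exactly 100%.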

In which country or countries is your institution based?

Provenance                     Response Count
Europe                         32
North America                  17
South America                  16
International Organizations    7
Asia                           9
Africa                         1

Table 2. Number of participants by region

The largest number of repositories is located in Europe, followed by North America (the USA and Canada) and South America. From Asia only 9 institutions participated, and from Africa only 1.

Year that your digital repository became publicly accessible?

Answer Options    Response Count
1993              1
1995              2
1996              1
1998              1
1999              2
2000              1
2001              4
2002              4
2003              5
2004              6
2005              6
2006              9
2007              13
2008              12
2009              9
No answer         6

Table 3. Year that your digital repository became publicly accessible

The founding years reported by the repositories span the period from 1993 to 2009. From 1993 until 2000, at most two repositories were created per year. From 2001 on, the number of repositories founded each year starts to increase substantially. A peak is reached in 2007 with 13 repositories; in 2009, 9 repositories were founded.
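
To make the growth pattern easier to see, the yearly counts from Table 3 can be turned into a running total. This is only an illustrative sketch: it leaves out the 6 "No answer" responses, and the variable names are assumptions rather than part of the survey.

```python
from itertools import accumulate

# (year, repositories opened) pairs copied from Table 3.
# The 6 "No answer" responses are left out.
opened_per_year = [
    (1993, 1), (1995, 2), (1996, 1), (1998, 1), (1999, 2), (2000, 1),
    (2001, 4), (2002, 4), (2003, 5), (2004, 6), (2005, 6), (2006, 9),
    (2007, 13), (2008, 12), (2009, 9),
]

years = [year for year, _ in opened_per_year]
running_totals = list(accumulate(count for _, count in opened_per_year))

# Map each year to the number of repositories opened up to and including it.
cumulative = dict(zip(years, running_totals))

# Year in which the largest number of repositories opened.
peak_year = max(opened_per_year, key=lambda pair: pair[1])[0]

for year in years:
    print(year, cumulative[year])
print("Peak year:", peak_year)
```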

Has your institution set up an open-access policy?

60% of the repositories have set up an open access policy, and these especially mention the use of Creative Commons licenses. This does not mean that the remaining 40% provide only restricted access to their documents. Many of those who answered "no" state that their repository is completely or partially freely accessible, but that they simply have no formal institutional policy. In some cases a policy is under discussion; in others the approach is to encourage open access without imposing obligations on anyone.

Content

Type of digital repository & fields of interest

Answer Options                              Response Count
< 25% of documents is about Agriculture     18
< 5% of documents is about Agriculture      17
< 50% of documents is about Agriculture     1
> 50% of documents is about Agriculture     8
100% of documents is about Agriculture      34

Table 4. Type of digital repository

  • 34 repositories store exclusively information on agriculture and related sciences.
  • 8 are multidisciplinary repositories with more than 50% of documents on agriculture.
  • There are 18 that have less than 25% and 17 less than 5%.
  • Only one repository has less than 50%.

Table 5 represents the results of the question on fields of interest. It shows the nature of the information stored in the document repositories.

Answer Options                                Response Count
Agriculture - General/All                     60
Animal Production and Health                  21
Economics and Policy                          20
Education and Extension                       11
Engineering, Technology and Research          18
Farming Practices and Systems                 21
Fisheries and Aquaculture                     24
Food safety and Human nutrition               19
Food Security                                 21
Forestry                                      22
Geographical and Regional Information         14
Government, Administration and Legislation    8
Information Management                        14
Natural Resources and Environment             30
Plant Production and Protection               23
Rural and Social Development                  23
Other                                         15

Table 5. Fields of interest

Which type of documents are available in your digital repository?

Answer Options            Response Count
Book Chapters             40
Books                     42
Conference papers         48
Conference proceedings    37
E-learning objects        21
Journal articles          53
Other                     25
Pre-prints                28
Technical reports         53
Thesis                    44
Working papers            38

Table 6. Type of documents

Grey literature, technical reports and working papers are very well represented.

Another well covered category is peer-reviewed material such as journal articles and conference papers.

'Other' includes material such as statistics, newsletters, educational posters, annual reports, policy documents, manuals, fact sheets, government reports, conference presentations, photographs, maps, student projects, undergraduate theses, and scientific data of various types (spreadsheets, images, files produced by scientific analysis software, spatial data, survey data, data documentation).

In general, the repositories accept whatever they can find, coming from the production of their institutional users.

Which languages are present in the documents of your repository?

Answer Options    Response Count
Arabic            2
Catalan           2
Chinese           2
Czech             1
Danish            2
Dutch             4
English           70
French            18
Gaelic            1
German            6
Greek             1
Indonesian        1
Italian           2
Japanese          1
Lao               2
Norwegian         3
Persian           1
Portuguese        8
Russian           2
Spanish           32
Swedish           6
Turkish           1
Thai              1
Ukrainian         2
Vietnamese        1

Table 7. List of languages

English dominates, with 85% of the repositories accepting documents in English. Spanish is present in 32 repositories, French in 18, Portuguese in 8, German and Swedish in 6 each, Dutch in 4 and Norwegian in 3. The remaining languages (Arabic, Catalan, Chinese, Czech, Danish, Gaelic, Greek, Indonesian, Italian, Japanese, Lao, Persian, Russian, Thai, Turkish, Ukrainian, Vietnamese) are each present in one or two repositories.

Which is the percentage of full text documents in your digital repository? 

Answer Options         Response Count
< 25% of documents     18
< 5% of documents      17
< 50% of documents     1
> 50% of documents     8
100% of documents      34

Table 8. Percentage of full text documents

In more than half of the answering repositories, all documents are available in full text.

In another quarter, more than half of the documents are accessible in full text. This leaves a quarter of repositories in which less than half of the documents are in full text.

The more documents are digitized, the better. Unfortunately, 50% of the repositories have still not managed to digitize all their material.

How many records are available in your repository?

Answer Options    Response Count
> 50,000          8
> 25,000          2
> 10,000          2
> 5,000           8
> 1,000           4
> 100             5

Table 9. Number of records

Format and metadata

Which types of formats are supported by your digital repository?

Answer Options    Response Count
HTML              38
JPEG              33
PPT               33
DOC               36
PDF               76
RTF               19
Other             16

Table 10. Type of formats

PDF is by far the most widely supported format: 76 repositories selected it, which is about 30% of all format selections. HTML, DOC, JPEG and PPT each account for roughly 13 to 15% of the selections, and RTF for about 8%. Other formats mentioned in comments:

  • XML
  • TIFF
  • XLS
  • SDMX
  • 99PX
  • CSV

In which metadata formats is your metadata exported?

Answer Options     Response Count
Dublin Core        47
AGRIS AP           10
MODS               10
MARC XML           5
IEEE/LOM           2
PubMED XML         2
E-Prints AP        2
EndNote            2
BibTeX             2
ASCII Citation     1
Other              2

Table 11. Metadata formats

Unqualified Dublin Core is mandatory when using the OAI-PMH to expose metadata.

In line with this requirement, 45% of the answering repositories export it.

45 data providers are offering both unqualified Dublin Core and richer metadata formats such as MODS, MARC XML or AGRIS-AP.
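
As an illustration of what exposing unqualified Dublin Core over OAI-PMH looks like in practice, the sketch below builds a `ListRecords` request with the `oai_dc` metadata prefix and reads values from a Dublin Core record. The base URL is a hypothetical placeholder and the sample record is invented for the example; neither comes from the survey. Only the `verb` and `metadataPrefix` parameters and the two XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: replace with a real repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# Namespaces used by unqualified Dublin Core records in OAI-PMH responses.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record, used here instead of a live HTTP request.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example technical report</dc:title>
  <dc:subject>rice</dc:subject>
  <dc:subject>irrigation</dc:subject>
  <dc:language>en</dc:language>
</oai_dc:dc>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dc_values(xml_text, element):
    """Return the text of every dc:<element> found in a Dublin Core record."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(f"dc:{element}", NS)]


print(list_records_url(BASE_URL))
print(dc_values(SAMPLE_RECORD, "subject"))
```

A repository that also offers a richer format would accept a different `metadataPrefix` value in the same request; the prefixes actually available vary per repository and can be discovered with the `ListMetadataFormats` verb.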

Semantics

Do you assign keywords or subject categories to the bibliographical records?

Answer Options                                       Response Count
No                                                   7
Yes, but only freely assigned keywords               24
Yes, we index using geographic and thematic items    9
Yes, we use a controlled list of subjects            21
Yes, we use a thesaurus                              20
I do not know                                        1

Table 12. Use of keywords or subject categories

90% assign keywords and/or geographic or subject categories to their records. 40% of these repositories use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.

If yes, which thesaurus are you using?

Answer Options              Response Count
AGROVOC                     23
ASFA                        5
CABI Thesaurus              3
NALT                        6
LCSH                        3
OCDE                        3
AGRIS Subject Categories    2
USDA Agricola               1
IRRI Rice Thesaurus         1
Thesaurus INRA              1
Other                       6

Table 13. List of thesauri

Are you using any authority lists in your digital repository to control bibliographical data? 

62% of the answering repositories do not use any authority control at all for bibliographical data. Only about 40% use some sort of authority control, especially for journal titles. This is not surprising, considering that journal articles are the most common type of document in the repositories.

If yes, which bibliographical fields are controlled by authority lists?

Answer Options          Response Count
Conference Names        7
Corporate Body Names    13
Journal Titles          20
Other                   7

Table 14. Fields most commonly controlled by authority lists

Corporate body names and conference names, both frequently searched access points, are also mentioned.

Which field/s would you like to be able to control?

Answer Options                    Response Count
Conference Names                  17
Corporate Body Names              21
Journal Titles                    24
Personal Names                    32
Series Title                      20
Other (Geographic descriptors)    8

Table 15. Which field/s would you like to be able to control?

Personal names are the access point that respondents most want to control, accounting for 26% of the answers. This makes sense considering that journal articles are the most common type of document in the repositories.

Software

Which software do you use in your digital repository?

Answer Options               Response Count
Drupal                       3
DSpace                       24
Eprints Software             9
Fedora Commons               2
Locally produced solution    11
WebAgris                     6
Greenstone                   3
Digital Commons              2
Digitool                     1
ImpressCMS                   1
PC-Axis                      1
Other                        11

Table 16. List of Software

DSpace is by far the most commonly used software package (30%).
16% chose to produce a local solution and 11% use EPrints.

Have you registered your digital repository with any of the following directories, portals or search engines?

Answer Options    Response Count
OpenDOAR          35
ROAR              33
AGRIS             7
OAIster           5
No                15
I do not know     10

Table 17. List of Directories

OpenDOAR is the repository registry most commonly used together with ROAR. 18% of the participants confirmed that they have not yet registered their digital repository.

Management

Who can submit documents in your digital repository?

Answer Options                         Response Count
Only members from my institution       60
External authors/users                 13
Imports metadata from other sources    1

Table 18. Type of submitters

80% of the repositories state that only members of the associated institution can submit documents.
17% also accept external authors.

Who catalogues the documents in your digital repository?

Answer Options                      Response Count
Librarians                          45
Authors (self-archiving)            27
Digital repository staff            23
Administrative and support staff    13
Other                               5

Table 19. Type of cataloguers

Cataloguing is done mostly by well-trained staff: librarians account for 40% of the answers and digital repository staff for 20%.

Another 12% of the answers name trained administrative and support staff, and about 24% report self-archiving by the authors.
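
For a question where several options can be ticked, percentages depend on the denominator. The sketch below assumes the figures for Table 19 are shares of all answers given (113 in total) rather than shares of repositories; that reading is an assumption, chosen because it matches the reported values most closely. The variable names are illustrative only.

```python
# Response counts copied from Table 19 (respondents could tick several options).
cataloguer_counts = {
    "Librarians": 45,
    "Authors (self-archiving)": 27,
    "Digital repository staff": 23,
    "Administrative and support staff": 13,
    "Other": 5,
}

# Denominator: all answers given, not the number of repositories.
total_responses = sum(cataloguer_counts.values())

# Share of each option among all answers, rounded to whole percent.
response_shares = {
    role: round(100 * count / total_responses)
    for role, count in cataloguer_counts.items()
}

print("Total answers:", total_responses)
for role, share in response_shares.items():
    print(f"{role}: {share}%")
```

Dividing by the number of responding repositories instead would give higher figures for every option, so it is worth stating which denominator is meant whenever such percentages are quoted.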

What is the feedback from users?

"E-mail contacts looking for more information is basically the only feedback we have."

"Extremely positive. Depositors like the exposure and the regular download counts we provide."

"Generally positive, in that we are one of the few providers of free full-text publications in our field."

"Jana Hawley, head of the Apparel, Textiles and Interior Design department, offers this comment: “My work in textile recycling is rather obscure, but was discovered by colleagues at Washington University in St. Louis and Hansei University in South Korea. Both resulted in invited lectures. The Washington University lecture partnered with Fashion Group International and the keynote lecture focused on the importance of textile recycling as part of the eco-fashion movement. The Hansei University lecture was the keynote lecture at the Eco-Fashion Conference. Without the open access platform through K-Rex, I doubt I would have had this opportunity.”

"Limited - the submitters sometimes complain about slowness (=connection problems). The librarians prefer good standard metadata, other submitters prefer limited metadata to create records. Finally the biggest problem for the participants is redundancy. Sometimes they have to maintain the collections in different databases/catalogues: Library catalogue - OceanDocs - ASFA database. Automatic exchange of metadata is an important issue."

"Most of the users find their documents through Google. But we don't know if the main target group of OceanDocs are also using Google only. Does people working in the field go to the specific tools like AVANO, aquatic and oceanographic repositories? If not, we have a problem. What is all the extra metadata quality worth in such a situation?"

"Satisfactory; they find this tool very useful."

"So far, the authors are very happy with the repository."

"They love the simplicity of the system. The trick is how to get the necessary metadata without overkill. System has import and default metadata functions that users love as time savers."

"They would like more Lao Language documents, to have more digital files and also to improve overall quality of data inputted."

"We are personally approaching authors and expalining what is [email protected]. They are supporting the initiative."

Feedback from participants

Comments

"I think there is a great opportunity to work on the association between statistics data and text content, an area currently underserved. SDMX and DDI are probably the base mechanisms to use."

"In my view there is a tendency to place too much emphasis on technology and complexity in developing digital repositories. Small organisations and end users need simple, robust and practical publishing tools, not rocket science."

"The larger question ..... is looking at is how to finance the production of future data.  If open access becomes the norm, those individuals, institutions or countries who fund data generation will subsidize those who do not for financial, philosophical or cultural reasons.  Obviously, this situation will eventually result in atrophy of data generation capabilities by those who share openly and a decrease in data availability to everyone, both those who share and those who do not."

"There is no excuse in 2009 for any public agency to publish useful documents under any terms more limiting than the Creative Commons Attribution license (ie. essentially in the public domain)."

"We are currently using AGRIS standards which are very good but also make things a bit complicated. A number of other Lao focused repositories have come on-line and they are much more simple to use."