Survey of Open Access Repositories in the Agricultural Domain
The Knowledge and Capacity for Development Branch (OEKC) of the Food and Agriculture Organization of the United Nations (FAO) has been extensively involved, especially since 2000, in promoting the Open Access (OA) model within the scientific and scholarly community in food, agriculture, development, fisheries, forestry and natural resources. It did so first through the AGRIS network, an international initiative based on a collaborative network of institutions, and since 2007 through the Coherence in Information for Agricultural Research for Development (CIARD) initiative to make agricultural research information publicly available.
To be able to continue addressing Open Access issues in a meaningful way, FAO conducted a survey in December 2009 and January 2010 on the state of the art of Open Access document repositories in the agricultural domain. The overall aim was to get an idea of the strong and weak points of digital repositories, especially in the areas of semantics and technology.
The survey contained 30 questions divided into the following thematic groups:
Methodology
Since no directory yet exists that includes all repositories related to agriculture, it was not possible to survey them all. Still, an attempt was made in the following two ways to contact as many repositories dedicated to agriculture and related sciences as possible.
- The link to the web survey was distributed to every organization with a repository with which OEKC had been in contact over the years. They were reached through mailing lists that OEKC has created and updated over a long period of time with the aim of communicating and sharing knowledge with institutes and communities involved in agriculture and/or agricultural information management.
- For repositories in the agricultural domain not yet present in the above-mentioned lists, it was decided to use OpenDOAR, an authoritative directory of academic open access repositories, as the source from which to extract repositories that are partially or fully related to agriculture and related sciences.
In the end the survey was sent to a total of 9 mailing lists used by OEKC and 150 institutions in OpenDOAR. The first messages were sent in December 2009. Reminders were sent by e-mail one week before the survey concluded.
As a result, 82 institutes responded to the survey request, a sample that reflects the characteristics of the pool from which it was drawn.
The CIARD RING: obtaining a better overview
To ensure that this survey continues over time and does not remain a one-time effort, a rich metadata record for each repository was added to the CIARD RING (Routemap to Information Nodes and Gateways), a global registry of web-based services that give access to any kind of information pertaining to agricultural research for development. To also reach the repositories that have not yet registered anywhere, it is very important that the CIARD RING itself and its directory are promoted in the agricultural community.
The more repositories related to agriculture register on the CIARD RING, the more and better information on the number of existing agricultural repositories and their characteristics we can get.
Survey Findings
Some general findings of the survey include:
- The responses show an increasing number of repositories since 2007.
- A high number of repositories is located in Europe, followed by the USA, Canada and South America. From Asia and the Pacific only 9 institutions/organizations participated, from Africa only 1.
- More than half of the content available in the repositories is open access or publicly available.
- 60% of the participating institutions have set up an open access policy.
- Journal articles and technical reports are the most common type of documents available from the responding institutions' repositories, followed by theses, conference papers, books and book chapters.
- In the field of subject indexing, 90% assign keywords and/or geographic or subject categories to their records. Of these, 40% use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.
- Most of the digital repositories expose metadata only in Dublin Core.
- AGROVOC is the most common thesaurus used by the participating institutions.
- 62% of the answering repositories do not use any authority control at all for bibliographical data. Only 40% use some sort of authority control, especially for journal titles.
- DSpace is used by 30% of participating institutions, 16% chose to produce a local solution and 11% use EPrints.
- OpenDOAR is the registry most commonly used together with ROAR, but 18% of the participants confirmed that they have not yet registered their digital repository.
Requests from the participants:
- To enhance the interoperability among digital repositories in the agricultural domain;
- To provide face-to-face and online capacity building activities on open access; and
- To facilitate the use of AGROVOC in the most common digital repository software packages.
Name and type of institution responsible for repository?
Answer options | Response Count |
Governmental organization | 6 |
International organization | 7 |
Non governmental organization | 7 |
Other | 7 |
Research Institute | 17 |
University | 38 |
Table 1. Type of institution
- 82 digital repositories completed the survey (list of participants).
- 47% of the institutions behind the responding repositories are universities.
- 21% are research institutes.
- NGOs, international organizations and governmental organizations are each represented by 9%.
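The shares quoted for Table 1 can be recomputed from the response counts. The following is a minimal sketch, assuming the denominator is the sum of all responses in the table; the function name `percentage_shares` is illustrative, and the recomputed values may differ by a point or so from the rounded figures quoted in the text.

```python
# Response counts copied from Table 1.
institution_counts = {
    "Governmental organization": 6,
    "International organization": 7,
    "Non governmental organization": 7,
    "Other": 7,
    "Research Institute": 17,
    "University": 38,
}


def percentage_shares(counts):
    """Return each category's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = percentage_shares(institution_counts)

# Print the categories from largest to smallest share.
for label, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{label}: {share}%")
```

The same helper can be applied to the counts of any other table in this report, as long as the choice of denominator (number of repositories or number of responses) is made explicit.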
In which country or countries is your institution based?
Provenance | Response Count |
Europe | 32 |
North America | 17 |
South America | 16 |
International Organizations | 7 |
Asia | 9 |
Africa | 1 |
Table 2. Number of participants geographically
A large number of repositories are located in Europe, followed by the USA, Canada and South America. From Asia only 9 institutions participated, from Africa only 1.
Year that your digital repository became publicly accessible?
Answer Options | Response Count |
1993 | 1 |
1995 | 2 |
1996 | 1 |
1998 | 1 |
1999 | 2 |
2000 | 1 |
2001 | 4 |
2002 | 4 |
2003 | 5 |
2004 | 6 |
2005 | 6 |
2006 | 9 |
2007 | 13 |
2008 | 12 |
2009 | 9 |
No answer | 6 |
Table 3. Year that your digital repository became publicly accessible
The launch years indicated by the repositories cover the period from 1993 to 2009. From 1993 to 2000, one or two repositories were created per year. From 2001 on, the number of repositories founded per year starts to increase substantially. A peak is reached in 2007 with 13 repositories; in 2009, 9 repositories were founded.
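The growth pattern in Table 3 can be checked with a few lines of code. This is only a sketch built on the counts in the table; the variable names are illustrative, and the 6 "No answer" responses are left out because they carry no year.

```python
from itertools import accumulate

# Launch-year counts copied from Table 3 ("No answer" responses are left out).
launches_per_year = {
    1993: 1, 1995: 2, 1996: 1, 1998: 1, 1999: 2, 2000: 1,
    2001: 4, 2002: 4, 2003: 5, 2004: 6, 2005: 6, 2006: 9,
    2007: 13, 2008: 12, 2009: 9,
}

years = sorted(launches_per_year)

# Running total of repositories that had gone public by the end of each year.
running_totals = dict(zip(years, accumulate(launches_per_year[y] for y in years)))

# Year with the largest number of newly launched repositories.
peak_year = max(launches_per_year, key=launches_per_year.get)

print("Peak year:", peak_year)
for year in years:
    print(year, launches_per_year[year], running_totals[year])
```

The running total makes the acceleration after 2001 easier to see than the yearly counts alone.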
Has your institution set-up an open-access policy?
60% of the repositories have set up an open access policy, and these especially mention the use of Creative Commons licenses. This does not mean that the remaining 40% provide only restricted access to their documents. Many of those answering "no" state that their repository is completely or partially freely accessible, but that they simply have no formal institutional policy. In some cases a policy is under discussion; in others open access is encouraged rather than made obligatory.
Type of digital repository & fields of interest
Answer Options | Response Count |
< 25% of documents is about Agriculture | 18 |
< 5% of documents is about Agriculture | 17 |
< 50% of documents is about Agriculture | 1 |
> 50% of documents is about Agriculture | 8 |
100% of documents is about Agriculture | 34 |
Table 4. Type of digital repository
- 34 repositories store exclusively information on agriculture and related sciences.
- 8 are multidisciplinary repositories with more than 50% of documents on agriculture.
- There are 18 that have less than 25% and 17 less than 5%.
- Only one repository has less than 50%.
Table 5 represents the results of the question on fields of interest. It shows the nature of the information stored in the document repositories.
Answer Options | Response Count |
Agriculture - General/All | 60 |
Animal Production and Health | 21 |
Economics and Policy | 20 |
Education and Extension | 11 |
Engineering, Technology and Research | 18 |
Farming Practices and Systems | 21 |
Fisheries and Aquaculture | 24 |
Food safety and Human nutrition | 19 |
Food Security | 21 |
Forestry | 22 |
Geographical and Regional Information | 14 |
Government, Administration and Legislation | 8 |
Information Management | 14 |
Natural Resources and Environment | 30 |
Plant Production and Protection | 23 |
Rural and Social Development | 23 |
Other | 15 |
Table 5. Fields of interest
Which type of documents are available in your digital repository?
Answer Options | Response Count |
Book Chapters | 40 |
Books | 42 |
Conference papers | 48 |
Conference proceedings | 37 |
E-learning objects | 21 |
Journal articles | 53 |
Other | 25 |
Pre-prints | 28 |
Technical reports | 53 |
Theses | 44 |
Working papers | 38 |
Table 6. Type of documents
Grey literature, such as technical reports and working papers, is very well represented.
Another well-covered category is peer-reviewed material such as journal articles and conference papers.
'Other' includes material such as statistics, newsletters, educational posters, annual reports, policy documents, reports, manuals, scientific data of various types (spreadsheets, images, files produced by scientific analysis software, etc.), posters, conference presentations, photographs, maps, data (including spatial data, survey data and data documentation), fact sheets, government reports, student projects and undergraduate theses.
In general, the repositories accept whatever material their institutional users produce.
Which languages are present in the documents of your repository?
Answer Options | Response Count |
Arabic | 2 |
Catalan | 2 |
Chinese | 2 |
Czech | 1 |
Danish | 2 |
Dutch | 4 |
English | 70 |
French | 18 |
Gaelic | 1 |
German | 6 |
Greek | 1 |
Indonesian | 1 |
Italian | 2 |
Japanese | 1 |
Lao | 2 |
Norwegian | 3 |
Persian | 1 |
Portuguese | 8 |
Russian | 2 |
Spanish | 32 |
Swedish | 6 |
Turkish | 1 |
Thai | 1 |
Ukrainian | 2 |
Vietnamese | 1 |
Table 7. List of languages
English dominates: 85% of the repositories accept papers in English. Spanish is present in 32 repositories, French in 18, Portuguese in 8, and German and Swedish in 6 each. Dutch is present in 4 repositories and Norwegian in 3. The remaining languages (Arabic, Catalan, Chinese, Danish, Italian, Lao, Russian, Ukrainian, Czech, Gaelic, Greek, Indonesian, Japanese, Persian, Turkish, Thai, Vietnamese) are present in one or two repositories each.
Which is the percentage of full text documents in your digital repository?
Answer Options | Response Count |
< 25% of documents | 18 |
< 5% of documents | 17 |
< 50% of documents | 1 |
> 50% of documents | 8 |
100 % of documents | 34 |
Table 8. Percentage of full text documents
In more than half of the answering repositories, all documents are available in full text.
In another quarter, more than half of the documents are accessible in full text. This leaves a quarter of the repositories in which less than half of the documents are in full text.
The more documents are digitized, the better. Unfortunately, 50% of the repositories have not yet managed to digitize all their material.
How many records are available in your repository?
Answer Options | Response Count |
> 50,000 | 8 |
> 25,000 | 2 |
> 10,000 | 2 |
> 5,000 | 8 |
> 1,000 | 4 |
> 100 | 5 |
Table 9. Number of records
Format and metadata
Which types of formats are supported by your digital repository?
Answer Options | Response Count |
HTML | 38 |
JPEG | 33 |
PPT | 33 |
DOC | 36 |
PDF | 76 |
RTF | 19 |
Other | 16 |
Table 10. Type of formats
PDF is by far the most widely supported format: with 76 responses it accounts for about 30% of all format selections, while HTML, DOC, JPEG and PPT each account for between 13% and 15%. Other formats mentioned in comments:
- XML
- TIFF
- XLS
- SDMX
- 99PX
- CSV
In which metadata formats is your metadata exported?
Answer Options | Response Count |
Dublin Core | 47 |
AGRIS AP | 10 |
MODS | 10 |
MARC XML | 5 |
IEEE/LOM | 2 |
PubMED XML | 2 |
E-Prints AP | 2 |
EndNote | 2 |
BibTex | 2 |
ASCII Citation | 1 |
Other | 2 |
Table 11. Metadata formats
Unqualified Dublin Core is mandatory when using the OAI-PMH to expose metadata.
In line with this requirement, 45% of the answering repositories use it.
45 data providers are offering both unqualified Dublin Core and richer metadata formats such as MODS, MARC XML or AGRIS-AP.
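To illustrate what exposing unqualified Dublin Core over OAI-PMH looks like in practice, the sketch below builds a `ListRecords` request with the `oai_dc` metadata prefix and reads titles out of a response. The endpoint URL, the helper names and the sample response are hypothetical and serve only as an illustration; no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; substitute the OAI-PMH base URL of a real repository.
BASE_URL = "https://repository.example.org/oai"

# Namespace of the unqualified Dublin Core elements.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given metadata prefix."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return the text of every dc:title element found in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", NAMESPACES)]


# Hand-written sample response, shortened for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Illustrative record title</dc:title>
          <dc:subject>rice</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A repository that also offers a richer format would be queried the same way with a different `metadataPrefix` value; the prefixes a given repository supports can be discovered with the `ListMetadataFormats` verb.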
Do you assign keywords or subject categories to the bibliographical records?
Answer Options | Response Count |
No | 7 |
Yes, but only freely assigned keywords | 24 |
Yes, we index using geographic and thematic items | 9 |
Yes, we use a controlled list of subjects | 21 |
Yes, we use a thesaurus | 20 |
I do not know | 1 |
Table 12. Use of keywords or subject categories
90% assign keywords and/or geographic or subject categories to their records. 40% of these repositories use only freely assigned keywords, 30% use controlled lists of subjects or geographic terms, and 30% use thesauri.
If yes, which thesaurus are you using?
Answer Options | Response Count |
AGROVOC | 23 |
ASFA | 5 |
CABI Thesaurus | 3 |
NALT | 6 |
LCSH | 3 |
OCDE | 3 |
AGRIS Subject Categories | 2 |
USDA Agricola | 1 |
IRRI Rice Thesaurus | 1 |
Thesaurus INRA | 1 |
Other | 6 |
Table 13. List of thesauri
Are you using any authority lists in your digital repository to control bibliographical data?
62% of the answering repositories do not use any authority control at all for bibliographical data. Only 40% use some sort of authority control, especially for journal titles. This is not surprising, since journal articles are the type of document most commonly held in the repositories.
If yes, which bibliographical fields are controlled by authority lists?
Answer Options | Response Count |
Conference Names | 7 |
Corporate Body Names | 13 |
Journal Titles | 20 |
Other | 7 |
Table 14. Fields most commonly controlled by authority lists
Corporate body names and conference names, which are also frequently searched access points, are mentioned as well.
Which field/s would you like to be able to control?
Answer Options | Response Count |
Conference Names | 17 |
Corporate Body Names | 21 |
Journal Titles | 24 |
Personal Names | 32 |
Series Title | 20 |
Other (Geographic descriptors) | 8 |
Table 15. Which field/s would you like to be able to control?
Personal names are an access point that 26% of the repositories would like to control. This makes sense, taking into consideration that the most common type of document present in the repositories is the journal article.
Which software do you use in your digital repository?
Answer Options | Response Count |
Drupal | 3 |
DSpace | 24 |
Eprints Software | 9 |
Fedora Commons | 2 |
Locally produced solution | 11 |
WebAgris | 6 |
Greenstone | 3 |
Digital Commons | 2 |
Digitool | 1 |
ImpressCMS | 1 |
PC-Axis | 1 |
Other | 11 |
Table 16. List of Software
DSpace is by far the most common software package used (30%).
16% chose to produce a local solution and 11% use EPrints.
Have you registered your digital repository with any of the following directories, portals or search engines?
Answer Options | Response Count |
OpenDOAR | 35 |
ROAR | 33 |
AGRIS | 7 |
OAIster | 5 |
No | 15 |
I do not know | 10 |
Table 17. List of Directories
OpenDOAR is the repository registry most commonly used together with ROAR. 18% of the participants confirmed that they have not yet registered their digital repository.
Management
Who can submit documents in your digital repository?
Answer Options | Response Count |
Only members from my institution | 60 |
External authors/users | 13 |
Imports metadata from other sources | 1 |
Table 18. Type of submitters
80% of the repositories state that only members of the associated institution can submit documents.
17% also accept external authors.
Who catalogues the documents in your digital repository?
Answer Options | Response Count |
Librarians | 45 |
Authors (self-archiving) | 27 |
Digital repository staff | 23 |
Administrative and support staff | 13 |
Other | 5 |
Table 19. Type of cataloguers
The cataloguing is done mostly by well-trained staff: librarians (40%) and digital repository staff (20%).
Another 12% is done by instructed administrative and support staff members. 23% report self-archiving by the authors.
What is the feedback from users?
"E-mail contacts looking for more information is basically the only feedback we have."
"Extremely positive. Depositors like the exposure and the regular download counts we provide."
"Generally positive, in that we are one of the few providers of free full-text publications in our field."
"Jana Hawley, head of the Apparel, Textiles and Interior Design department, offers this comment: “My work in textile recycling is rather obscure, but was discovered by colleagues at Washington University in St. Louis and Hansei University in South Korea. Both resulted in invited lectures. The Washington University lecture partnered with Fashion Group International and the keynote lecture focused on the importance of textile recycling as part of the eco-fashion movement. The Hansei University lecture was the keynote lecture at the Eco-Fashion Conference. Without the open access platform through K-Rex, I doubt I would have had this opportunity.”
"Limited - the submitters sometimes complain about slowness (=connection problems). The librarians prefer good standard metadata, other submitters prefer limited metadata to create records. Finally the biggest problem for the participants is redundancy. Sometimes they have to maintain the collections in different databases/catalogues: Library catalogue - OceanDocs - ASFA database. Automatic exchange of metadata is an important issue."
"Most of the users find their documents through Google. But we don't know if the main target group of OceanDocs are also using Google only. Does people working in the field go to the specific tools like AVANO, aquatic and oceanographic repositories? If not, we have a problem. What is all the extra metadata quality worth in such a situation?"
"Satisfactory; they find this tool very useful."
"So far, the authors are very happy with the repository."
"They love the simplicity of the system. The trick is how to get the necessary metadata without overkill. System has import and default metadata functions that users love as time savers."
"They would like more Lao Language documents, to have more digital files and also to improve overall quality of data inputted."
"We are personally approaching authors and expalining what is Eprints@IARI. They are supporting the initiative."
Feedback from participants
Comments
"I think there is a great opportunity to work on the association between statistics data and text content, an area currently underserved. SDMX and DDI are probably the base mechanisms to use."
"In my view there is a tendency to place too much emphasis on technology and complexity in developing digital repositories. Small organisations and end users need simple, robust and practical publishing tools, not rocket science."
"The larger question ..... is looking at is how to finance the production of future data. If open access becomes the norm, those individuals, institutions or countries who fund data generation will subsidize those who do not for financial, philosophical or cultural reasons. Obviously, this situation will eventually result in atrophy of data generation capabilities by those who share openly and a decrease in data availability to everyone, both those who share and those who do not."
"There is no excuse in 2009 for any public agency to publish useful documents under any terms more limiting than the Creative Commons Attribution license (ie. essentially in the public domain)."
"We are currently using AGRIS standards which are very good but also make things a bit complicated. A number of other Lao focused repositories have come on-line and they are much more simple to use."