Institutional Repository Development: A Case Study of KARI and KAINet
By Richard Kedemi
Kenya Agricultural Research Institute
This paper is based on KAINet’s and KARI’s experience and lessons learnt in developing digital repositories using AgriDrupal and AgriOceanDspace. KAINet is a network originally made up of institutions that took part in the Kenya AGRIS Pilot Project, supported by the Food and Agriculture Organization with funding from the UK Department for International Development (DFID). The institutions are KARI Headquarters, KARI – National Agricultural Research Laboratories (NARL), the Kenya Forestry Research Institute (KEFRI), the Ministry of Agriculture Library, and the Jomo Kenyatta University of Agriculture and Technology (JKUAT).
The KAINet website was originally developed in 2006 using the Typo3 Content Management System, with WebAGRIS running in the background for repository metadata management. WebAGRIS is a software tool for creating, managing and disseminating metadata compliant with the AGRIS Application Profile (AP). The KARI portal was developed using Drupal and integrated with WebAGRIS. In both cases, users searched a WebAGRIS installation embedded in another environment, which limited its performance and usability. The WebAGRIS platform was made up of Java Servlets and CGI-based application and templating systems. It served both KARI and KAINet well but was expensive to maintain. Updating the portals with new features and ensuring that they integrated well with WebAGRIS was often difficult.
Because of the limitations of these tools and the lessons learnt, there was a need to look at other tools for implementing digital repositories, in particular tools that seamlessly integrate website/portal functionality with content (full-text) and metadata management. The KAINet ICT team therefore followed the development of AgriDrupal and AgriOceanDspace with great interest, participated in testing, and offered feedback from time to time. In late 2011, it was decided to migrate KARI’s repository to AgriOceanDspace and KAINet’s repository to AgriDrupal. This would make it easier to adopt new technologies, improve online searchability, and integrate the two repositories seamlessly with other online services, enhancing the visibility and accessibility of content and making the metadata records easily harvestable by other service providers.
2. Why AgriOceanDspace and AgriDrupal
AgriDrupal is a normal Drupal installation with a customized configuration and special modules. Its use on the KAINet portal ensured that content and repository metadata reside on the same platform, increasing the usability and accessibility of both the content and the repository. By implementing AgriDrupal, KAINet benefited from the following features:
- a cataloguing interface that out-of-the-box provides the most commonly used metadata elements in bibliographic databases, in particular those defined by AGRIS AP, but is easily extendable to include any other element;
- special input interface for subject indexing with the AGROVOC thesaurus;
- exposure of records through the OAI-PMH protocol;
- import and export functionalities based on widely used formats (Dublin Core, AGRIS AP, CSV or RSS); AGROVOC can be used to index any content;
- flexible import functionality allowing periodic incremental import / harvesting of records;
- addition of new data providers, being easily adjustable to different output formats, provided that the basic metadata requirements are met.
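The OAI-PMH exposure and incremental harvesting mentioned in the list above can be illustrated with a minimal Python sketch. It is not taken from AgriDrupal itself: the base URL, the sample response and the function names are assumptions made for illustration. Only the request arguments (`verb`, `metadataPrefix`, `from`) and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The `from` argument is what lets a harvester ask only for records changed since its last visit.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_list_records_url(base_url, from_date=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request; from_date restricts it to newer records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Collect every dc:title found inside the records of a response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(OAI_NS + "record"):
        for title in record.iter(DC_NS + "title"):
            titles.append(title.text)
    return titles


# Hypothetical response, trimmed to the elements the sketch reads
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample maize trial report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real endpoint
    print(build_list_records_url("http://repository.example.org/oai", "2012-01-01"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out here to keep the sketch self-contained.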
KARI needed a robust tool that would decentralize publishing of metadata online considering its network of 30 sub-centres. By adopting AgriOcean Dspace, KARI benefited from the following features:
- high metadata standards (AGRIS AP, MODS), OAI-PMH compliance and controlled vocabularies (ASFA, AGROVOC);
- a very easy AGRIS AP batch metadata import feature, which makes migration of metadata in various formats (XML and EndNote) manageable;
- an easy-to-install Dspace version for both Windows and Linux, with an up-to-date, customizable layout, which is a departure from the standard Dspace installation.
AgriOcean Dspace is a normal Dspace installation with a customized configuration, integrated with AGRIS AP.
3. CIARD Initiative
The adoption of AgriDrupal and AgriOceanDspace also ensured that KAINet and KARI repositories conform to some of the recommendations by the Coherence in Information for Agricultural Research for Development (CIARD), a global initiative working to make agricultural research information publicly available and accessible to all. Both tools have adopted Resource Description Framework (RDF), linked data, Really Simple Syndication (RSS), and Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), thus enhancing visibility and accessibility of agricultural content in the repositories.
4. The Project
4.1. The core team
Migration from WebAGRIS-based repositories to AgriDrupal and AgriOceanDspace repositories involved several activities and required expertise in both WebAGRIS and the new tools. It was important to ensure that the WebAGRIS metadata could be easily migrated to the new repository software. A Drupal expert from KARI undertook the migration of the website and metadata and the customization of AgriDrupal for KAINet, as well as the installation, migration and customization of AgriOceanDspace for KARI. The KARI Library team undertook the validation of the metadata. The FAO team in charge of both AgriDrupal and AgriOceanDspace offered technical support.
4.2. Setting up and configuration of the repository
The KAINet repository was built to integrate the metadata exported from the institutional repositories of member institutions. The following supporting tools were installed and set up: PHP, Apache or IIS, and MySQL, followed by an upgrade of the Drupal installation to Drupal 6.19 or above.
To integrate AGRIS AP with the new repository, we installed the AGRIS AP and AGROVOC modules at the same time. This was done to avoid the installation crashes that can occur when multiple modules are installed. Metadata from the WebAGRIS-based institutional and KAINet repositories was then migrated to the AgriDrupal-based repository. Finally, the look and feel of the portal was customized to give it the KAINet brand.
The KARI e-repository required the installation of Java, Apache Tomcat and PostgreSQL, followed by AgriOcean Dspace. The most important step here is to get the supporting tools working properly before running the easy AgriOcean Dspace installation. Because several communities were to manage the metadata under the different KARI thematic research disciplines, several users were created with different rights and permissions to upload to and manage the repository.
Metadata was imported from WebAGRIS and the process was finalized with submission of the metadata into the right communities by the new users. The look and feel of the AgriOceanDspace search interface was customized to harmonize it with the KARI portal.
To ensure interoperability, KAINet and KARI implemented simple RSS feeds, a Web 2.0 tool that facilitates the exchange and sharing of information. The two installations also support interoperability through OAI-PMH.
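As an illustration of how another service might consume such a feed, the following Python sketch reads the items of an RSS 2.0 document with the standard library. The feed content, URLs and function name are hypothetical and do not reproduce the actual KAINet or KARI feeds; only the `channel`, `item`, `title` and `link` element names come from the RSS 2.0 format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed standing in for a repository's "recent additions" feed
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example repository: recent additions</title>
    <link>http://repository.example.org/</link>
    <description>Illustrative feed only</description>
    <item>
      <title>Sample record one</title>
      <link>http://repository.example.org/record/1</link>
    </item>
    <item>
      <title>Sample record two</title>
      <link>http://repository.example.org/record/2</link>
    </item>
  </channel>
</rss>"""


def read_feed_items(feed_xml):
    """Return the title and link of every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_feed_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

A partner portal could run a routine like this periodically to list newly published records alongside its own content.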
- To boost discoverability of new content as it gets published each day, particularly on the homepage
- To increase user engagement
- To decentralize the management of the repository
- To increase overall traffic
With the successful migration of the two repositories we believe that we are on track in achieving these objectives.
Content searching and browsing in both KARI and KAINet is now powered by powerful built-in free-text searching utilities, as well as standard author and subject searching options. Both repositories can be searched by title, article or subject, or through an advanced search. Simple searches on both portals cover both the metadata in the repositories and other content types (article content, video, podcasts, article responses, etc.).
7. Performance and scalability
KARI and KAINet are busy sites. The performance and scalability of the redesigned Drupal site was therefore a key requirement of the project. Thanks to the open source model of the AgriDrupal community, there is a Drupal 6 distribution for high-traffic sites that is optimized for performance and scalability.
8. Some Innovations
The use of AgriDrupal and AgriOceanDspace has resulted in some innovations on the KAINet and KARI portals, mainly benefiting from Drupal and Google applications. Google Analytics has been implemented on KAINet’s AgriDrupal installation to track visitors, learn what they are searching for and serve them better, while KARI’s AgriOceanDspace has a statistics function that enables us to track the usage of the tool. In addition, blogs and forums have been implemented on the KAINet site to support communication and interaction with stakeholders.
Several challenges were faced during the implementation of the project. The major ones were the absence of institutional policies that support open access, low awareness among content generators of copyright and intellectual property rights (IPR) issues relating to the digital environment, and inadequate technical infrastructure.
The absence of appropriate information management policies, together with copyright/IPR issues, made it difficult for information managers to collect full-text documents for the repositories from the content generators (research scientists). This is why, at the moment, the two repositories have more metadata than associated full-text documents. There is resistance towards providing access to full-text documents.
For content to be easily accessible and visible, institutional repositories should be published on the Internet. Unfortunately, because of inadequate technical infrastructure, especially the lack of reliable Internet connections, most institutions in Kenya maintain their repositories on local area networks or on a single personal computer (PC), which hinders the accessibility and visibility of their content globally.
10. Lessons Learnt
System compatibility: in both cases, being able to use the right versions of supporting tools, which have been tested and are supported by the AgriDrupal or AgriOceanDspace user communities, was key to success. For example, installing a lower version of PHP or MySQL than the ones tested with the systems would cause the system not to work.
Technical skills: the availability of technical ICT skills to make adjustments in some of the configuration files (my.ini, php.ini, config.ini and .htaccess), to improve the performance of the supporting resources, is extremely important. Skills, especially knowledge of working with templates and Cascading Style Sheets (CSS) files, are also required to customize the look and feel of the installations' interfaces to match those of the institutional portals.
Collaboration: collaboration among the various stakeholders in the institution is critical. Content generators (research scientists in this case), information managers (librarians and documentalists) and ICT specialists all bring different skills and knowledge to the initiative and need to work together. In addition, management needs to ensure that appropriate information management policies and strategies are put in place.
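The system compatibility lesson above could be supported by a small pre-installation check. The Python sketch below compares installed versions of supporting tools against minimum versions. The version numbers, dictionary names and function names are hypothetical placeholders chosen for illustration; they are not the requirements published for AgriDrupal or AgriOceanDspace and should be replaced with the versions documented by each community.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_unsupported(installed, minimum):
    """Return the names of tools that are missing or older than required."""
    problems = []
    for name, required in minimum.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(required):
            problems.append(name)
    return problems


# Placeholder minimums for illustration only, not real requirements
MINIMUM_VERSIONS = {"php": "5.2.0", "mysql": "5.0.15"}

if __name__ == "__main__":
    # Hypothetical versions found on a server
    installed = {"php": "5.1.6", "mysql": "5.1.40"}
    print(find_unsupported(installed, MINIMUM_VERSIONS))
```

Comparing tuples of integers rather than raw strings avoids the mistake of treating version 5.10 as older than 5.9.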
Figure 1. KAINet – Agridrupal
Figure 2. KARI - AgriOceanDspace
Building a repository is a major task that requires meticulous planning of various processes and resources such as hardware, software, skilled human resources and uninterrupted Internet connectivity with high bandwidth. Once established, the next task will be to win the trust of content owners to provide the content to populate the repositories. This requires creating awareness among academic and scientific communities about various benefits of open access and self-archiving. Collaborative efforts are needed to develop a new culture of disseminating content from research. Librarians and other information professionals have an important role to play in this respect.
ICT capacity among information managers needs to be built so that they can keep up with ever-changing ICT tools and technologies and promote, implement and support repositories. The disconnect between information managers and ICT specialists in most institutions creates another obstacle to the development and management of repositories. The two need each other for the successful development of institutional repositories.
Although there are several tools for managing and exchanging content, especially metadata, to improve accessibility and visibility, the right tools need to be chosen on the basis of a requirements analysis. The challenge for most information managers in agricultural institutions is how to ensure that they get the right tool. AgriDrupal and AgriOceanDspace are tailor-made to meet the requirements of storing, enriching and sharing agricultural information. However, these tools require good ICT skills. There is therefore also a need to build ICT capacity and skills in emerging open source tools to foster archiving, exchange and sharing of information. Institutional policies and strategies that support the use of open access software, together with greater awareness of Intellectual Property Rights (IPR) issues among content generators, are also very important. The ever-growing support community around both AgriDrupal and AgriOceanDspace, the CIARD Ring and FAO AIMS resources, and the simple user documentation make the tools easy to implement.