Open Repositories 2011: Changing Platforms

Open Repositories is an annual conference for everyone interested and involved in the world of repositories - mainly institutional or subject repositories providing open access to research-related publications. This year's conference (OR11) was held in Austin, Texas, and hosted by the University of Texas.

Imma Subirats and I agreed to present a joint paper to the OR11 conference which we called "Changing Platforms: Parallel case studies in repository platform migration".

I have been involved with implementing and managing repository systems for the University of London since 2006 - initially on the DSpace platform, but now almost exclusively with EPrints. Among the repositories we run is SAS-Space, a repository for the University of London School of Advanced Study (SAS). We effectively ended our long-standing association with DSpace in 2010, when we migrated SAS-Space to EPrints.

In addition to her work for FAO, Imma Subirats is also Chief Executive of the international E-LIS subject repository for Library and Information Science. Imma has also recently completed a repository platform migration for E-LIS; however, her migration was in the opposite direction, from EPrints to DSpace.

Imma and I described how the motivation for the two migration projects was very similar: in each case it was dictated by the availability of skills and resources. For E-LIS, specialist support comes from CILEA in Italy, which specialises in DSpace, and is a registered Duraspace provider. In the case of SAS-Space, the repository team at the University of London Computer Centre (ULCC) has worked on many EPrints projects since 2007, and is now one of the leading providers of EPrints services in the UK.

EPrints and DSpace are often contrasted with each other, and their differences are usually defined by their implementation (Apache/Perl in the case of EPrints; Tomcat/Java in the case of DSpace). Nevertheless they have many features in common. This is hardly surprising since both packages share a common lineage in earlier digital object management initiatives, notably Cogprints. Both platforms also adhere to the same open standards and support open access, and there are even developers who have contributed to both projects. One benefit of this common heritage is that both platforms share a similar model of managing digital object bitstreams and metadata, and for this reason exporting and importing between the two packages was not especially complicated.

There are, however, a number of significant differences between the platforms. The JISC RSP Repository Software survey contains a detailed breakdown of the features available in several leading repository packages, but there are many subtle differences that cannot always be identified by such a survey. It is only through using a software package for a period of time that one really gets to understand where its strengths and weaknesses lie. Comparatively few repository managers (or repository developers) have experience of using more than one package, or get to appreciate the significance of some key differences. Yet some of these features can make a substantial difference to the workload of repository managers, or to the accessibility of repository content.

For example, one of the critical issues E-LIS experienced in moving from EPrints to DSpace relates to the packages' workflow management. The E-LIS team is accustomed to a very flexible EPrints-based workflow that allows items to have their workflow status changed quite freely. DSpace, by contrast, has a unidirectional workflow model, so that items cannot (for example) be reverted from Live to Pending: if some kind of error is spotted, the item needs to be deleted and resubmitted. This is a significant divergence between two superficially similar repository platforms, and it potentially forces the many E-LIS editors around the world to change processes that they are familiar with.

Some other issues with DSpace arise from its default Web template. Many repository implementations - DSpace and EPrints - make few, if any, changes to the default Web templates for such things as Search Forms, Browse Views and Item Abstracts. In many places, the DSpace default templates are not intuitive for users; nor do they output valid HTML or CSS, which means that some intervention is essential before a DSpace installation can comply with Web Content Accessibility Guidelines. (Many government organisations insist on compliance with WCAG for Web sites and Web applications.)

For EPrints, perhaps the most obvious deficiency is that it does not support the Handle system for persistent URIs, so when moving SAS-Space away from DSpace we had to consider the implications of this. We decided that the benefits of the Handle system were not sufficient to justify custom development of this feature for EPrints. Instead, by carefully manipulating item IDs during the import, and setting some simple redirection rules in the Apache Web server, it was possible to ensure that all Handle URIs previously created by DSpace continue to function. However, no further Handle URIs will be created for new content.
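The redirection logic can be illustrated with a minimal sketch. It is not the actual SAS-Space configuration, which was implemented as Apache rules. It assumes that each item was imported with an EPrints ID equal to the numeric suffix of its old Handle, and that EPrints serves items at URLs of the form /ID/. The Handle prefix, the base URL and the function name are hypothetical placeholders.

```python
import re
from typing import Optional

# Hypothetical values for illustration only: the real Handle prefix and
# repository host name are not given in the text.
HANDLE_PREFIX = "10000"
EPRINTS_BASE = "https://repository.example.org"

# Matches legacy DSpace paths such as /handle/10000/123
_HANDLE_PATH = re.compile(r"^/handle/(?P<prefix>\d+)/(?P<suffix>\d+)/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return the EPrints URL for a legacy Handle path, or None if no match.

    Assumption: the import gave each item an EPrints ID equal to the
    numeric suffix of its old Handle, so no lookup table is needed.
    """
    match = _HANDLE_PATH.match(path)
    if match is None or match.group("prefix") != HANDLE_PREFIX:
        return None
    eprint_id = int(match.group("suffix"))
    return f"{EPRINTS_BASE}/{eprint_id}/"


if __name__ == "__main__":
    print(redirect_target("/handle/10000/123"))
```

Under these assumptions a single pattern-based rule covers every legacy URI, which is why preserving the IDs during import matters: without it, each old Handle would need its own entry in a mapping table.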

EPrints also does not implement "Communities" and "Collections" as explicitly as DSpace. Communities and Sub-Communities are easy enough to implement as an Institutional structure within EPrints (and the Divisions can be renamed "Communities", "Departments", etc.). But in order to implement Collections, we had to define an additional metadata field and browse view template.

Differences aside, the fact that both projects successfully transferred substantial collections of digital objects and metadata between systems should be seen as positive for the Open Repositories community. The ability to migrate large collections of digital objects, and their metadata, reliably and effectively between software platforms is an important freedom, made possible only by the availability of open standards for data and metadata, and high-quality open source applications like EPrints and DSpace.
