JHOVE: a widely-used open source digital preservation tool


JHOVE is a widely-used open source digital preservation tool, used for validating formats of digital objects. The Open Preservation Foundation has assumed responsibility for this project and is in the process of creating a new permanent and sustainable home for JHOVE.

 “I don’t know of any open source validator that is as efficient as JHOVE, able to handle about 12 formats, written in JAVA and as famous as it. There are surely some others, but one which includes PDF for free, I don’t know of any ... At this level, it is undeniably the opportunity for the whole digital archiving community to join efforts in order to maintain and improve the situation of this international tool” (Open Preservation Foundation). 

The JHOVE (JSTOR/Harvard Object Validation Environment) digital preservation tool was originally developed in 2003 by Harvard and JSTOR to automate the format-specific identification, validation, and characterization of digital resources. JHOVE was conceived to be integrated into the Ingest function of an OAIS (Open Archival Information System). It was made available under an open source license (the GNU Lesser General Public License), supports twelve file formats, and has been widely deployed internationally.

In 2014, JHOVE was added to the growing software portfolio of the Open Preservation Foundation (OPF), which provides a sustainable home for digital preservation software products and ensures their ongoing maintenance at an international level.

The OPF is now stewarding the JHOVE software in line with its Software Maturity Model, which facilitates the development and release of patches and new modules, and is coordinating road-mapping and future development activities.

The JHOVE Evaluation and Stabilisation Plan assesses the current state of the JHOVE project resources and describes the OPF's plans to maintain or preserve them.

Note that JHOVE, available from the OPF's GitHub repository, should not be confused with JHOVE2, which does similar things but has a different code base.

With JHOVE, data curators can verify the file formats of the digital objects (DOs) in their repositories. Such a verification consists of three functions: 

  • Identification: determination of the DO’s format;
  • Validation: checking whether the DO conforms to its format’s technical norms;
  • Characterization: providing a report of the DO’s salient properties. 

Identification and validation are linked, which means that even a trivial error found during validation can cause a DO to fail identification. Conformance to a format is determined at three levels: 

  1. well-formedness: a DO is well-formed if it meets the purely syntactic requirements for its format;
  2. validity: a DO is valid if it is well-formed and it meets the higher-level semantic requirements for format validity;
  3. consistency: a DO is consistent if it is valid and its internally extracted representation information is consistent with externally supplied representation information.
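
The three levels above nest inside one another, and the link between identification and validation can be illustrated with a small sketch. The code below is not JHOVE's implementation or API: the names `Conformance`, `conformance_level`, `identify_by_validation` and `toy_gif_checker` are hypothetical, and the GIF rule is a deliberately simplified stand-in for a real format module.

```python
from enum import IntEnum
from typing import Callable, Dict, Optional, Tuple


class Conformance(IntEnum):
    """Ordered conformance levels; each level presupposes the one below it."""
    NOT_WELL_FORMED = 0
    WELL_FORMED = 1
    VALID = 2
    CONSISTENT = 3


def conformance_level(well_formed: bool, valid: bool, consistent: bool) -> Conformance:
    """Map three check results onto the highest level actually reached."""
    if not well_formed:
        return Conformance.NOT_WELL_FORMED
    if not valid:
        return Conformance.WELL_FORMED
    if not consistent:
        return Conformance.VALID
    return Conformance.CONSISTENT


# Hypothetical checker type: takes raw bytes, returns (well_formed, valid).
Checker = Callable[[bytes], Tuple[bool, bool]]


def identify_by_validation(data: bytes, checkers: Dict[str, Checker]) -> Optional[str]:
    """Return the first format whose checker finds the data well-formed.

    Because identification depends on the checker succeeding, a small
    syntactic error is enough to leave the object unidentified (None).
    """
    for name, check in checkers.items():
        well_formed, _valid = check(data)
        if well_formed:
            return name
    return None


def toy_gif_checker(data: bytes) -> Tuple[bool, bool]:
    """Toy rule only: real GIF validation is far more involved."""
    well_formed = data[:6] in (b"GIF87a", b"GIF89a")
    valid = well_formed and data.endswith(b";")  # 0x3B is the GIF trailer byte
    return well_formed, valid
```

With this sketch, a byte string starting with `GIF89a` is identified as GIF, while one starting with `GIF88a` (a single wrong character) is not identified at all, which mirrors how a trivial validation error can prevent identification.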

JHOVE reports only full conformance to a profile; that is, it focuses on the semantics of a file rather than its content. A file that is well-formed but not valid has errors. 

Use of JHOVE is widespread in the digital preservation community. JHOVE is included in the POWRR (Preserving Digital Objects with Restricted Resources) tool grid, and it is highlighted on the PREMIS page "Tools for preservation metadata implementation" and in the Digital Preservation Handbook of the Digital Preservation Coalition.

In May 2016, JHOVE 1.14 was released. This version adds three new format modules: gzip, WARC and PNG. Among other features, it has a black box testing module and support for Unicode 7.0.0. 

JHOVE's design incorporates an API, which can be used on its own to create compatible tools and applications. Developers wishing to recompile the JHOVE source code will require Apache Ant.

The JHOVE website provides user and developer documentation and is currently under review to ensure it is up to date and accurate. Installation of JHOVE requires solid knowledge of command line interfaces and experience with manually editing configuration files. Familiarity with metadata outputs is also essential.
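
As a rough illustration of the command-line usage mentioned above, the following sketch wraps a JHOVE call in Python. It assumes a `jhove` launcher script is on the PATH and uses the `-m` (module) and `-h` (output handler) options; the module name `PDF-hul`, the handler name `xml`, the file name and the helper names `build_jhove_command` and `run_jhove` are assumptions for the example and should be checked against the documentation of the installed version.

```python
import shutil
import subprocess
from typing import List, Optional


def build_jhove_command(path: str, module: Optional[str] = None,
                        handler: str = "xml", launcher: str = "jhove") -> List[str]:
    """Assemble an argument list such as: jhove -m PDF-hul -h xml report.pdf"""
    cmd = [launcher]
    if module:
        # Restrict the run to one format module (assumed module name).
        cmd += ["-m", module]
    # Select the output handler, then name the file to examine.
    cmd += ["-h", handler, path]
    return cmd


def run_jhove(path: str, module: Optional[str] = None,
              launcher: str = "jhove") -> Optional[str]:
    """Run JHOVE and return its standard output, or None if it is not installed."""
    cmd = build_jhove_command(path, module=module, launcher=launcher)
    if shutil.which(launcher) is None:
        return None  # launcher not found on PATH
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `run_jhove("report.pdf", module="PDF-hul")` would return the report text for further processing; building the argument list separately makes the exact command easy to log or inspect before it is run.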

The SourceForge code repository includes a forum and also hosts a mailing list and the usual facilities for filing bug reports, feature requests and support requests.

On 11 October 2016, the OPF held the JHOVE Online Hack Day to improve the digital preservation community's knowledge of JHOVE errors: in particular, to create descriptions of errors, to identify example files, and to start to understand their preservation impact and what can be done about them.

For the JHOVE Online Hack Day, a collaborative Google document was created to organize the tasks contributed by the Document Interest Group and the JHOVE Product Board, which welcome additional suggestions from the community. 

If your organization uses or wants to use JHOVE, please consider becoming a JHOVE Software Supporter. JHOVE Software Supporters guide the roadmap and can receive free training and technical support.

Join The Open Preservation Foundation - an international, not-for-profit membership organization.