Experiences of publishing Getty Vocabularies into LOD

The AIMS Team invited Patricia Harpring, Managing Editor of the Getty Vocabulary Program at the Getty Research Institute in Los Angeles, to share more information on the publication of the Getty Vocabularies as Linked Open Data (LOD). The Getty Vocabulary Program produces the Art & Architecture Thesaurus (AAT) ®, the Getty Thesaurus of Geographic Names (TGN) ®, the Union List of Artist Names (ULAN) ®, and the Cultural Objects Name Authority (CONA) ®.

Patricia is involved in disseminating information and training about vocabularies and cataloging art. She is the co-editor of CCO (Cataloging Cultural Objects) and CDWA (Categories for the Description of Works of Art). She is also the author of Introduction to Controlled Vocabularies (Los Angeles, 2013), as well as of editorial rules for building vocabularies, numerous articles, papers, and presentations on cataloging art, controlled vocabularies, and data standards. 

Question 1: Briefly, what are the Getty Vocabularies?

A brief description is available here. I might add that Getty vocabularies are compiled resources that grow through contributions from various Getty projects and numerous outside institutions. Contributors to the Getty vocabularies include museums, libraries, archives, special collections, visual resources collections, bibliographic and documentation projects, and large translation projects. Contributions must meet the following criteria: they must be submitted by an authorized contributor; must be within scope of the vocabulary; must include the minimum information; and must be submitted in the prescribed format. By contributing data to the vocabulary, the contributor agrees to its contributed data becoming a permanent part of the vocabulary. Historically, the data has been licensed in various formats. Our newest form of releasing the Getty vocabularies is as Linked Open Data (LOD).

Question 2: Could you briefly summarize the work and plans for publishing the Getty Vocabularies as Linked Open Data?

As of 21 August 2014, the Getty has released the Art & Architecture Thesaurus (AAT) and the Getty Thesaurus of Geographic Names (TGN) as Linked Open Data. The data sets are available for download under the Open Data Commons Attribution License (ODC-By) v1.0.

The AAT is a reference of over 250,000 terms on art and architectural history, styles, and techniques. It is one of the Getty Research Institute’s four Getty Vocabularies, databases that serve as the premier resources for cultural heritage terms, artists’ names, and geographic information, reflecting over 30 years of collaborative scholarship. The TGN is a reference of over 2,000,000 names for places, both current and historical. It focuses particularly on places relevant for the cataloging and retrieval of information about art. The other two Getty Vocabularies, ULAN (containing personal and corporate names and biographies) and CONA (containing core information about a sample number of works of art and architecture), will be released as Linked Open Data over the coming year.

The process of releasing the Getty vocabulary data as LOD has involved months of analysis and work by our team of technical and editorial experts. For example, the Getty Vocabulary Program editors have been busy preparing for the AAT and TGN LOD releases in recent months. Inconsistencies in the data that were not apparent in previous forms of release had to be remedied in order to allow accurate linking in the LOD world. On the technical side, our team thoroughly analyzed which ontologies should be used to express the data and resolved other issues. For the AAT, wherever possible, the data elements were mapped to the following external standards: SKOS, SKOS-XL, and ISO 25964 for representing the thesaurus information; DC and DCT for common properties; BIBO and FOAF for sources and contributors; RDF, RDFS, OWL, and XSD for system properties; and R2RML for implementing the conversion of data from Oracle to N-Triples.

A new ontology, the GVP (Getty Vocabulary Program) ontology, was required to fill gaps and to include various classes, properties, and individuals (values) used in the mapping to add further detail. Some examples include the following: Broader Transitive; Term Characteristics; Sort Order; Historic Information; Provenance; and Associative Relationships. As we release each vocabulary as LOD, further revisions to the mapping and ontologies will be required.
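To make the kind of mapping described above more concrete, here is a minimal sketch in Python of how a single thesaurus term might be expressed with SKOS and SKOS-XL and written out as N-Triples. It is an illustration only, not the Getty's R2RML conversion: the record, the example.org URIs, and the function names are hypothetical, and only the SKOS, SKOS-XL, and RDF namespace URIs are the standard ones.

```python
# Illustrative only: the record, the example.org URIs, and the helper names
# are hypothetical and are not taken from the Getty data or tooling.
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def triple(subject, predicate, obj):
    """Format one N-Triples statement; obj must already be serialized."""
    return f"<{subject}> <{predicate}> {obj} ."


def term_to_ntriples(record, base="http://example.org/vocab/"):
    """Express one concept and its preferred term as SKOS / SKOS-XL triples."""
    concept = base + record["id"]
    label = base + "term/" + record["term_id"]
    literal = '"' + escape_literal(record["term"]) + '"@' + record["lang"]
    lines = [
        triple(concept, RDF_TYPE, f"<{SKOS}Concept>"),
        # Plain SKOS label: a literal attached directly to the concept.
        triple(concept, SKOS + "prefLabel", literal),
        # SKOS-XL label: the term is its own resource, so it can carry
        # further detail (sources, contributors, term characteristics).
        triple(concept, SKOSXL + "prefLabel", f"<{label}>"),
        triple(label, RDF_TYPE, f"<{SKOSXL}Label>"),
        triple(label, SKOSXL + "literalForm", literal),
    ]
    if record.get("parent_id"):
        parent = base + record["parent_id"]
        lines.append(triple(concept, SKOS + "broader", f"<{parent}>"))
    return lines


record = {"id": "c1", "term_id": "t1", "term": "example term",
          "lang": "en", "parent_id": "c0"}
ntriples = term_to_ntriples(record)
print("\n".join(ntriples))
```

Modeling the term as a SKOS-XL label resource, rather than only as a plain literal, is what leaves room for the additional per-term detail that an extension ontology can describe.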

For development of the Getty vocabularies in LOD, we have established an open community and we welcome collaboration. Our technical team works closely with an international team of advisers to resolve various issues as they arise.

The AAT and TGN datasets are currently available in various semantic views: JSON, RDF, N3/Turtle, and N-Triples. The data will be refreshed often to keep up to date with the databases, which change daily due to new contributions and other additions and edits. The ontology and other technical information about our LOD project are available on this page.
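As a rough illustration of how a downloaded N-Triples file might be consumed, the sketch below collects preferred labels per concept using only the Python standard library. The sample triples and example.org URIs are invented for the example, and the regular expression handles only the simple line shapes shown here; a real application would more likely use a full RDF library such as rdflib.

```python
import re

# Matches the simple N-Triples lines used in this sketch: an IRI subject, an
# IRI predicate, and either an IRI object or a quoted literal with an optional
# language tag. Not a full N-Triples grammar: no blank nodes, no datatypes,
# and escape sequences inside literals are not decoded.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>'
    r'|"(?P<o_lit>(?:[^"\\]|\\.)*)"(?:@(?P<lang>[A-Za-z0-9-]+))?)'
    r'\s*\.\s*$'
)

PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def pref_labels(lines):
    """Return {concept IRI: {language tag: label}} for skos:prefLabel triples."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None or match.group("p") != PREF_LABEL:
            continue
        if match.group("o_lit") is None:
            continue  # object is an IRI, not a literal label
        by_lang = labels.setdefault(match.group("s"), {})
        by_lang[match.group("lang")] = match.group("o_lit")
    return labels


# Hypothetical sample data, not taken from the AAT or TGN.
sample = """\
<http://example.org/vocab/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "example term"@en .
<http://example.org/vocab/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Beispielbegriff"@de .
<http://example.org/vocab/c1> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/vocab/c0> .
"""
labels = pref_labels(sample.splitlines())
print(labels)
```

Because N-Triples puts one complete statement on each line, a large file can be processed line by line in this way without loading the whole dataset into memory.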

Question 3: How have you tackled the licensing issues that come as a result of publishing these vocabularies as Linked Open Data?

The AAT, TGN, and ULAN have traditionally been licensed and released in various formats. These include a free online search form, as well as raw data files that are licensed from the Getty in relational tables and XML format and through Web services APIs. Even after the Getty vocabularies are released as LOD, we plan to continue providing the data in relational tables and XML releases, to meet the critical needs of many users. (If any format is discontinued in the future, users will be given advance notice.)

However, the Getty’s entering the world of LOD required a reconsideration of licensing possibilities. If data is to be open to the community for linking and discovery, traditional licensing and copyright practices for art information, images, and associated vocabularies must be adjusted. Data is considered open if the community is free to use, reuse, and redistribute the data, subject either to no restriction or to only the requirements of attribution or “share-alike.” Among the licenses most often applied to art information are Creative Commons and Open Data Commons licenses, each of which offers a range of levels of openness.

After analyzing the possibilities, and comparing our data and our priorities to others in the cultural heritage arena, the Getty decided upon the Open Data Commons Attribution License (ODC-By) v1.0 for the releases of AAT and TGN. Under this license users are free to do the following: To Share: Including to copy, distribute and use the database; To Create: Including to produce works from the database; and To Adapt: Including to modify, transform and build upon the database, so long as the user attributes the Getty in any public use of the database, or works produced from the database.

As background to these decisions, an atmosphere of openness and collaboration exists at the Getty. In recent months the Getty has launched the Open Content Program, which makes thousands of images of works of art available for download, and the Virtual Library, offering free online access to hundreds of Getty Publications backlist titles. The Getty vocabularies' release as LOD is another collaborative project between scholars, other content experts worldwide, and technology experts, and a further step towards our goal of making art and research resources as widely accessible and usable as possible.