Meaningful Bibliographic Metadata (M2B) -- Recommendations of a set of metadata properties and encoding vocabularies BETA VERSION
M2B is intended to assist content providers in selecting appropriate metadata properties for the creation, management and exchange of meaningful bibliographic information in open repositories. Its objectives include:
- To provide a set of common metadata properties;
- To encourage the use of authority data, controlled vocabularies, and syntax encoding standards;
- To recommend the use of URIs as names for things [1], especially for data values, when they are available.
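The third objective can be pictured with a minimal sketch of a data value that carries an optional URI alongside its label. The `Value` class, its field names, and the `example.org` URI are illustrative assumptions, not part of M2B.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Value:
    """A metadata value: a human-readable label plus an optional URI."""
    label: str
    uri: Optional[str] = None  # URI naming the thing, when one is available

    def is_identified(self) -> bool:
        # True when the value is named by a URI rather than a bare string.
        return self.uri is not None


# Hypothetical values; the example.org URI is a placeholder,
# not an entry in any real vocabulary.
plain_subject = Value(label="rice")
linked_subject = Value(label="rice", uri="http://example.org/vocab/c_0001")

print(plain_subject.is_identified())   # False
print(linked_subject.is_identified())  # True
```

Keeping the label next to the URI lets a repository display readable text while still exchanging an unambiguous name for the value.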
Conceptual model
To provide an overall picture and a common understanding of the entities and relationships involved in bibliographic descriptions, M2B has established a general conceptual model [2] (Figure 1) that provides a high level of abstraction focusing on the bibliographic resource entity. Major relations can be identified between a resource instance (e.g., an article or a report) and the agent(s) (e.g., a personal author or a research team) responsible for the creation of the content and the dissemination of the resource, as well as the thema(s) (i.e., the things that are the subjects or topics of the resource). As a result, three core entities are presented in the model: resource, agent, and thema. The model presented in Figure 2 shows the implications of the general conceptual model, with examples of possible relationships between and among the instances of different entities.
The models convey the following meanings (entity names are presented in italics):
- Basic entities and their relationships. The resource entity is the centre of every description here. The model does not enumerate sub-entities; for example, the sub-entities of resource would be the various resource types. Relationships are established between the resource entity and the two other major entities: agent and thema.
- Relationships between instances within the same entity. Relationships between instances of an entity also exist. For example, a resource may be related to another resource. An agent may be related to another agent. Such relationships are demonstrated in the model.
- Relationships between instances of different entities. Relationships between any pair of instances vary and can be found at different levels. The sample relationships illustrated in Figure 2 are demonstrative and may apply at different levels of the bibliographic resource entity. For example, an agent may provide the funding for the creation of an original work, for the translation of a work, or for the production of a new format of a translation.
- Control of values. Authority control is considered an important element of the model. The agents, regardless of their roles in relation to a resource, should be managed through name authority files. Concepts, topics, and geographic places as the themas of a resource should be controlled with value vocabularies. Given the context of this report, the model does not emphasize authority control of the titles of bibliographic resources, but controlling the uniform titles of resources is also a logical step.
More and more name authority files, controlled vocabularies, and resource datasets are becoming available as Linked Open Data (LOD). The model intentionally sets an extracted piece of the LOD cloud as the background for each entity, to remind the reader that such datasets already exist for each entity.
The conceptual model holds the key for sharing the common understanding of the important entities and relationships for bibliographic data. It can be used with different data models that have different implementation approaches.
Figure 2. The implication of the general concept
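The three core entities and their relationships can be sketched as simple data structures. This is only one possible reading of the conceptual model: the class layout, the field names, and the sample role and relation labels ("creator", "funder", "translation of") are assumptions made for illustration, and the instances are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Agent:
    name: str
    # (relation label, other agent): agent-to-agent relationships
    related_agents: List[Tuple[str, "Agent"]] = field(default_factory=list)


@dataclass
class Thema:
    label: str


@dataclass
class Resource:
    title: str
    # (role, agent): resource-to-agent relationships
    agents: List[Tuple[str, Agent]] = field(default_factory=list)
    # resource-to-thema relationships
    themas: List[Thema] = field(default_factory=list)
    # (relation label, other resource): resource-to-resource relationships
    related_resources: List[Tuple[str, "Resource"]] = field(default_factory=list)


def roles_of(resource: Resource) -> List[str]:
    """List the roles that agents play for this resource."""
    return [role for role, _ in resource.agents]


# Invented instances: a report, and a translation of it funded by another agent.
author = Agent("A. Author")
funder = Agent("Example Funding Body")
topic = Thema("food security")

report = Resource("Sample report", agents=[("creator", author)], themas=[topic])
translation = Resource(
    "Sample report (translation)",
    agents=[("funder", funder)],
    themas=[topic],
    related_resources=[("translation of", report)],
)

print(roles_of(report))        # ['creator']
print(roles_of(translation))   # ['funder']
```

Attaching the role to the resource-agent pair, rather than to the agent, reflects the point that the same agent can relate to a resource at different levels, for example by funding a translation rather than the original work.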
Groups of Common Properties
Common properties for describing bibliographic resources are identified and organized into nine groups, based on our comprehensive studies of several open repositories. About two dozen properties used for describing a bibliographic resource are included in Groups 1 to 8. Two sets of properties for describing relations between bibliographic resources or between agents are included in Group 9. In the following list of the groups, some selected properties are emphasized in italics. In the report, the word resource is used to represent bibliographic resource, the primary resource type to be described.
- 1. Title Information. Title is one of the most important and relevant access points for any resource. The information is usually supplied through a number of properties, including title and alternative title (covering parallel titles, translated titles, transliterated titles, etc.).
- 2. Responsible Body. This group contains the properties associated with any agent who is responsible for the creation and/or publication of the content of the resource, for example, the creator, contributor, and publisher or issuer of a resource.
- 3. Physical Characteristics. Properties that describe the appearance and the characteristics of the physical form of a resource are placed into this group. They are: date, identifier, language, format, and edition/version.
- 4. Location (physical location). For information exchange, it is important that a resource can be located and obtained. Properties that record location and availability information are placed in this group.
- 5. Subject. In contrast to the physical characteristics, the Subject group embraces the properties that describe or otherwise help the discovery of what the resource is about or denotes, in the form of subject term, classification/category, freely assigned keyword and geographic term.
- 6. Description of content. Two major types of descriptions that focus on the content of the resource rather than the physical object are considered in this group: a) any representative description of the content, usually in the form of abstract, summary, note, and table of contents and b) type or genre of the resource.
- 7. Intellectual property. Any property that deals with an aspect of intellectual property rights relating to access and use of a resource is included in this group, with special regard to rights, terms of use and access condition.
- 8. Usage. Properties that are related to the use of a resource, rather than the characteristics of the resource itself, are considered to belong to this group. Typical properties are: audience, literary indication, and education level.
- 9. Relation. This group has a different perspective for describing the resources from other groups that focus on describing the resource itself. Here, various relations between two resources or between two agents are the focus of description. Due to the significant number of such properties, no specific properties are listed under the Relation group in the following table.
These groups of information are listed together in Table 1, with the specific properties included in each group. Special attention should also be given to the additional recommendations on cardinality, value control, and important attributes. Table 1 comprises the following components in corresponding columns:
- A. Groups of properties
- B. Properties included in each group. Two special styles are used to signify the importance of the properties: two plus signs “++” (also in red colour) for the mandatory property; one plus sign “+” (also in blue colour) for the highly recommended property in the context of bibliographic information exchange. The rest are recommended or optional.
- C. Requirements of properties in the context of both non-analytical and analytical bibliographic descriptions, specified with (M)andatory; (H)ighly-(R)ecommended; (R)ecommended; and (O)ptional marked for either process.
- D. Recommendation on the control of values, indicating: (n)ot controlled; should use a name authority or a controlled vocabulary; or should follow a syntax encoding rule.
- E. Some important attributes associated with individual properties, with special regard to the language and scheme attributes. A scheme can be either a value encoding scheme or a syntax encoding scheme.
| A. Group | B. Property | C. Requirement (M, HR, R, O): Non-analytical | C. Requirement (M, HR, R, O): Analytical | D. Value Control | E. Important Attributes |
|---|---|---|---|---|---|
| 1. Title Information | title++ | M | M | n | language |
| | alternative title | O | O | n | |
| 2. Responsible Body | creator+ | HR | HR | n or Name authority (personal, corporate body, conference) | scheme |
| | contributor | O | O | n or Name authority | |
| | publisher/issuer+ | HR | R | n or Name authority | |
| 3. Physical Characteristics | date++ | M | M | Syntax encoding rule | scheme |
| | identifier+ | HR | HR | Syntax encoding rule | scheme |
| | language++ | M | M | Controlled list | scheme |
| | format/medium+ | HR | HR | Controlled list | scheme |
| | edition/version | R | R | n | |
| | source+ | HR | R | n | |
| 4. Location | location++ | M | M | n or Rule [Holding unit names may be managed through a controlled list] | |
| 5. Subject | subject term+ | HR | HR | Controlled vocabulary | language, scheme |
| | classification | O | O | Controlled vocabulary, Classification system | scheme |
| | [freely assigned] keyword | R | R | n | language |
| | geographic term | O | O | Controlled vocabulary | language, scheme |
| 6. Description of content | description/abstract (or note/summary/table of contents) | R | R | n | language |
| | type/form/genre | R | R | Controlled vocabulary | language, scheme |
| 7. Intellectual property | rights+; term of use; access condition | R | R | n [Rights holders may be managed through name authorities] | |
| 8. Usage | audience | O | O | Controlled list | scheme |
| | literary indication | O | O | Controlled list | scheme |
| | education level | O | O | Controlled list | scheme |
| 9. Relation | [relation between resources]+ | O | HR | Controlled resource IDs | |
| | [relation between agents] | O | O | n or Name authority | |
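As a rough illustration of how the requirement levels in column C might be applied, the sketch below checks a record for missing properties at a given level. The requirement values for the mandatory and highly recommended properties are transcribed from Table 1; the dictionary layout, the function name `missing_properties`, and the sample record are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (non-analytical, analytical) requirement levels transcribed from Table 1.
# Only properties that are M or HR in at least one column are listed here.
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "title": ("M", "M"),
    "creator": ("HR", "HR"),
    "publisher/issuer": ("HR", "R"),
    "date": ("M", "M"),
    "identifier": ("HR", "HR"),
    "language": ("M", "M"),
    "format/medium": ("HR", "HR"),
    "source": ("HR", "R"),
    "location": ("M", "M"),
    "subject term": ("HR", "HR"),
    "[relation between resources]": ("O", "HR"),
}


def missing_properties(record: Dict[str, str], level: str,
                       analytical: bool = False) -> List[str]:
    """Return properties required at `level` that the record lacks or leaves empty."""
    column = 1 if analytical else 0
    return [
        name
        for name, levels in REQUIREMENTS.items()
        if levels[column] == level and not record.get(name)
    ]


# Invented record for a non-analytical description.
record = {
    "title": "Sample report",
    "creator": "A. Author",
    "date": "2011",
    "language": "en",
}

print(missing_properties(record, "M"))
# ['location']
print(missing_properties(record, "HR"))
# ['publisher/issuer', 'identifier', 'format/medium', 'source', 'subject term']
```

Passing `analytical=True` switches to the second requirement column, where, for instance, publisher/issuer and source drop to recommended and the relation between resources becomes highly recommended.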
[1] See Berners-Lee, T. (2006). Linked Data - Design Issues.
[2] The conceptual model is built on a FRBR-based model previously developed by the FAO AIMS team, substantially extended and reconsidered for the current recommendations.