Semantic Model: Modelling Digital Objects

A compound object refers to one or more digital files that together constitute a collection item. According to the findings of the HOPE Content Providers Survey, 57.4 percent of the total number of collections, representing 42.9 percent of the metadata records being supplied, did not at the time of the survey have a one-to-one relationship between descriptions and digital objects. Moreover, five content providers, holding more than half of these, claimed that they were not able to change the granularity of the digital objects because they lacked the time and resources. Therefore, it was determined that the HOPE data model should accommodate the association of multiple digital files with one Descriptive Unit entity. This decision, however, meant that HOPE would not fulfill the Europeana Semantic Elements (ESE) requirement for a one-to-one correspondence between digital objects and metadata. HOPE thus relied upon the Europeana Data Model (EDM) to shape its approach.

In the HOPE system there is an added level of complexity. Not only can a digital object comprise multiple digital files, but the functional requirements specified that the HOPE data model should also store information on multiple versions of a single digital file. These versions could include: a low-resolution derivative file, a thumbnail file, and a transcription text file. (See: Semantic Model: Functional Requirements, section on Discovery-to-Delivery Needs of Designated Community.) This meant that it was necessary to distinguish between three levels of granularity for digital content: the compound object, the digital file, and the file version.

Compound Digital Object

In the HOPE data model a (compound) digital object is not an entity in its own right but the name given to one or more Digital Resource entities that are associated with one single Descriptive Unit. A digital object represents one single collection item, described by the Descriptive Unit entity, even if that item is represented by multiple images, i.e. the associated Digital Resource entities. EDM follows a similar approach, i.e. the digital object is a discrete unit of information that consists of all instances of ens:webResource that are gathered by an ore:aggregation.

The only information that content providers supply about the digital object as such may be a URL for a web page on the content provider's website which displays the digital object (i.e. a set of digital image files) together with the description of the corresponding collection item. The HOPE data model records this URL using the Landing Page element and requires the use of PIDs for this element. Since digital objects have a one-to-one relationship with the Descriptive Unit entity, and since the Landing Page also represents the metadata record, the Landing Page is recorded as a property of the Descriptive Unit entity.

Diagram 3-B - HOPE Compound Digital Object: Diagram showing the Digital Resource entity and related Descriptive Unit entity highlighting the Landing Page element.

Digital Representation

In order to accommodate the multiple images composing a single digital object and at the same time accommodate multiple versions of one image, the HOPE data model specifies an intermediate level called the digital representation. In the HOPE system, a digital representation is a single digital image or audiovisual/sound recording that provides a single, unique rendition of a collection item. In the HOPE data model, information about digital representations is captured in the Digital Resource entity.

The Digital Resource entity records a PID for each digital representation (created automatically by the Aggregator), as well as structural metadata (i.e. associated Descriptive Units, sequence information) and administrative metadata (i.e. Language, Type, Rights) that apply to the image or recording. The administrative metadata elements record type, language, and rights information for the digital resource itself, as distinct from the original object, though in practice the two are almost always the same. Type and Rights values are required by Europeana, and Europeana vocabularies are used to populate them.

Diagram 3-C - HOPE Digital Representation: Diagram showing the Digital Resource entity and related Descriptive Unit entity highlighting digital representation elements.

Digital (Derivative) File

A digital representation may include multiple versions of the same image. The HOPE data model accommodates metadata about three types of versions, or derivative types: a low-resolution derivative (so-called derivative 2), a thumbnail derivative (so-called derivative 3), and a transcription. A digital (derivative) file is an image, text, film, or sound recording encoded as a binary computer file and created for a specific use, such as printing, displaying, or previewing.

Information about these digital files is recorded as part of the Digital Resource entity using the Derivative 2, Derivative 3 and Transcription elements. These three elements record PIDs as well as local identifiers and resolve URLs for each digital file expressing the digital representation.

The digital derivative files may be stored in a local object repository or in the HOPE Shared Object Repository. Currently, HOPE requires content providers to supply Derivative 2 information in order to provide digital content to Europeana. The Derivative 3 element is also mandatory, though content providers are not required to supply it since the Aggregator can automatically create a thumbnail from the supplied Derivative 2. The supply of a manually or automatically created (i.e. OCRed) transcription of the digital representation is optional.
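
The supply rules for the three derivative elements can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the Aggregator's actual code: the class SuppliedDerivatives, the function complete_derivatives, and the make_thumbnail callable (standing in for whatever the Aggregator uses to generate thumbnails) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SuppliedDerivatives:
    """Hypothetical container for the identifiers a content provider supplies."""
    derivative2: Optional[str] = None    # low-resolution derivative
    derivative3: Optional[str] = None    # thumbnail
    transcription: Optional[str] = None  # optional transcription


def complete_derivatives(
    supplied: SuppliedDerivatives,
    make_thumbnail: Callable[[str], str],
) -> SuppliedDerivatives:
    """Apply the supply rules: Derivative 2 must be present, Derivative 3 is
    generated when missing, and the transcription is passed through as-is."""
    if supplied.derivative2 is None:
        raise ValueError("Derivative 2 must be supplied by the content provider")
    # Assumption: make_thumbnail stands in for the Aggregator's thumbnail step.
    derivative3 = supplied.derivative3 or make_thumbnail(supplied.derivative2)
    return SuppliedDerivatives(supplied.derivative2, derivative3, supplied.transcription)
```

A record that carries only a Derivative 2 identifier comes back with a generated Derivative 3, while a record without Derivative 2 is rejected.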

The Thumbnail element in the Descriptive Unit is populated using the 'primary' attribute in the Derivative 3 element of the Digital Resources. Only one thumbnail should be primary. If no thumbnail is marked as primary, then the system will select the Derivative 3 of the first instance of the Digital Resource entity associated with the Descriptive Unit. If there are several primary thumbnails, then the system will select among them. This primary thumbnail will be used for display with metadata records in results lists.
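
The selection rule can be illustrated with a minimal Python sketch. The class and function names are hypothetical. The text does not say how the system chooses when several thumbnails are flagged as primary, so the sketch assumes that the first flagged thumbnail in list order is taken.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Derivative3:
    """Hypothetical stand-in for the Derivative 3 (thumbnail) element."""
    pid: str
    primary: bool = False


@dataclass
class DigitalResource:
    """Hypothetical stand-in for a Digital Resource entity."""
    pid: str
    derivative3: Optional[Derivative3] = None


def select_primary_thumbnail(resources: Sequence[DigitalResource]) -> Optional[str]:
    """Return the PID of the thumbnail shown for the Descriptive Unit."""
    # Prefer a thumbnail flagged as primary. Assumption: when several are
    # flagged, the first one in list order is taken.
    for resource in resources:
        if resource.derivative3 is not None and resource.derivative3.primary:
            return resource.derivative3.pid
    # Fallback from the text: Derivative 3 of the first Digital Resource.
    if resources and resources[0].derivative3 is not None:
        return resources[0].derivative3.pid
    return None
```

The list passed in is assumed to hold the Digital Resources of a single Descriptive Unit in their stored order.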

Diagram 3-D - HOPE Digital File: Diagram showing the Digital Resource entity and related Descriptive Unit entity highlighting digital file elements.

To illustrate how this works in practice:

Example: The Content Provider supplies the Aggregator with information about a coin, which belongs to a HOPE Collection. The coin itself is the collection item. The Content Provider supplies a metadata record that includes the description of the coin and information about the digital images of the coin. The information that describes the coin is mapped into the Descriptive Unit entity. The metadata record contains information about two different digital representations of the coin: an image of the front side and an image of the back side. This information is mapped into two different Digital Resource entities, one for each image. In the HOPE data model, these two Digital Resource entities are associated with the Descriptive Unit entity using the 'is represented by' relationship. Together, these two instances of the Digital Resource entity constitute a digital object, representing the collection item. Each Digital Resource contains information about the different versions of the digital representation, i.e. a low-resolution derivative and a thumbnail of the front or back side image. The Derivative 2 and Derivative 3 elements contain a PID for each digital file, redirecting to the local object repository (LOR) where these digital files have been stored.
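
The coin example can be written out as a small Python data structure. This is a sketch for orientation only: the class names, field names, and PID strings are invented for illustration and are not taken from the HOPE Schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalResource:
    """One digital representation (here, one image of the coin)."""
    pid: str          # PID of the digital representation (placeholder value)
    derivative2: str  # PID of the low-resolution file (placeholder value)
    derivative3: str  # PID of the thumbnail file (placeholder value)


@dataclass
class DescriptiveUnit:
    """Description of the collection item itself."""
    title: str
    is_represented_by: List[DigitalResource] = field(default_factory=list)


coin = DescriptiveUnit(
    title="Coin",
    is_represented_by=[
        DigitalResource("pid:coin-front", "pid:coin-front-lowres", "pid:coin-front-thumb"),
        DigitalResource("pid:coin-back", "pid:coin-back-lowres", "pid:coin-back-thumb"),
    ],
)

# The digital object is not an entity of its own: it is simply the set of
# Digital Resources attached to the one Descriptive Unit.
digital_object = coin.is_represented_by
```

Note that no DigitalObject class appears in the sketch, which mirrors the point made earlier that the compound digital object is a name for a group of Digital Resources rather than an entity in its own right.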

In the HOPE data model, each Digital Resource must be associated with one and only one Descriptive Unit, but a Descriptive Unit may be associated with many Digital Resources or none at all (e.g. in the case of higher levels of description or of descriptions without available digital content). The concrete association between a single Descriptive Unit and one or more Digital Resource entities is recorded by the hope:isRepresentedBy and hope:represents relationships. These two elements together serve the role of the EDM ens:hasView property, which associates an ore:aggregation with zero or more instances of ens:webResource. Note that this is the only relationship in the HOPE data model that is bi-directional. The Aggregator creates this relationship based on the information provided by the content provider in the mapping worksheet.
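
The cardinality rule and the two directions of the relationship can be sketched as follows. The function name link_resources and the use of plain dictionaries are assumptions made for illustration. The sketch shows one way to derive both directions from a list of (Digital Resource, Descriptive Unit) pairs while rejecting a Digital Resource that is assigned more than once.

```python
from typing import Dict, Iterable, List, Tuple


def link_resources(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Build both directions of the relationship from
    (digital_resource_pid, descriptive_unit_pid) pairs."""
    represents: Dict[str, str] = {}               # Digital Resource -> Descriptive Unit
    is_represented_by: Dict[str, List[str]] = {}  # Descriptive Unit -> Digital Resources
    for resource_pid, unit_pid in pairs:
        # Each Digital Resource belongs to one and only one Descriptive Unit,
        # so a second assignment of the same resource is treated as an error.
        if resource_pid in represents:
            raise ValueError(f"{resource_pid} is already linked to {represents[resource_pid]}")
        represents[resource_pid] = unit_pid
        is_represented_by.setdefault(unit_pid, []).append(resource_pid)
    return represents, is_represented_by
```

A Descriptive Unit without digital content never appears in the input pairs and is therefore absent from the second dictionary, which corresponds to the "none at all" case.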

The HOPE data model likewise provides the opportunity to organize a series of Digital Resources that are associated with the same Descriptive Unit in a particular sequence. This may particularly apply to Compound Digital Objects where the series of images represents the logical order of the original resource (e.g. the separate digitized pages of a book). In order to support this feature, hope:isNextInSequence records the PID of the next Digital Resource in the ranking. It serves the same role as the EDM ens:isNextInSequence property.
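
Reading a page order back from these links amounts to walking a chain. The sketch below follows the reading given above, in which each Digital Resource records the PID of the resource that comes after it. The function name order_resources and the dictionary input are hypothetical, and the checks for a single starting point and for cycles are additions made for the sake of the example.

```python
from typing import Dict, List, Optional


def order_resources(next_pid: Dict[str, Optional[str]]) -> List[str]:
    """Order Digital Resource PIDs by following their 'next in sequence' links.

    next_pid maps each PID to the PID of the following resource, or to None
    for the last resource in the sequence."""
    successors = {pid for pid in next_pid.values() if pid is not None}
    # The first resource is the one that no other resource points to.
    heads = [pid for pid in next_pid if pid not in successors]
    if len(heads) != 1:
        raise ValueError("expected exactly one resource with no predecessor")
    ordered: List[str] = []
    seen = set()
    current: Optional[str] = heads[0]
    while current is not None:
        if current in seen:
            raise ValueError("sequence links form a cycle")
        seen.add(current)
        ordered.append(current)
        current = next_pid.get(current)
    return ordered
```

For the digitized pages of a book, the input would hold one entry per page image, and the result would be the reading order.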

The HOPE Schema records metadata about compound objects in a way that is analogous to EDM, in part to ensure a simple mapping from the HOPE Schema to the EDM format. Content providers map information on compound objects through the domain profiles, after which the HOPE Aggregator transforms compound objects from the domain profile to the HOPE Schema. HOPE also allows content providers to describe in plain language how sequence information is recorded in their data sets. For instance, data sets may record ranking numbers for each digital file of an object.

HOPE Digital Object Modelling as Best Practice

The HOPE data model nicely captures the complex relationship between compound digital objects and descriptive metadata, not only supporting the various files that make up a single item, but also supporting multiple quality variants of each file. Metadata on digital objects is stored at two levels. The first, the Descriptive Unit, not only holds metadata on the source object but also contains information on the digital object, namely the URL of the landing page on the content provider's website and the primary thumbnail used to represent the object in search results. The second, the Digital Resource, holds information on the individual digital images or recordings that make up the digital object; it includes various digital derivatives, structural information, and basic administrative data such as the rights, language, and type of digital object. HOPE's representation of digital objects is modeled on Open Archives Initiative Object Reuse and Exchange (OAI-ORE), which is also the basis of EDM.

Nonetheless HOPE's representation is also problematic in several respects. The first and most obvious is that the metadata on language, type, and rights is recorded in both the Digital Resource and Descriptive Unit. (See: Semantic Model: Representing Domains, section on HOPE Domain Profiles as Best Practice.) Theoretically, these are different; the elements in the Descriptive Unit hold information on the original source object, while parallel elements in the Digital Resource hold information on the digital object. But in practice these values are rarely different. Yet the Aggregator does not validate or harmonize these values. Such controls may simply lie outside the scope of the project.

The storage of Rights, Language, and Type values as part of the Digital Resource is also problematic. In fact this information rarely applies at the level of a single digital file, but rather at the level of a file group—i.e. a digital object in its entirety. For multi-page documents, HOPE essentially records rights, language, and type information for each page. Again, though such a practice may be useful in exceptional cases, it is by no means the norm. And again, it is unclear whether HOPE has put in place sufficient validation to prevent problems. There is, simply put, a mismatch in granularity between the information and the level at which it is stored.

The storage of normalized administrative metadata at the level of the Digital Resource had a simple motivation. All three normalized values are highly recommended for submission to Europeana. (See: Semantic Harmonization, section on Value Types Selected for Normalization.) Rather than compel content providers to normalize legacy metadata that had been created according to selected domain standards, HOPE supplemented the domain elements with the three new Europeana elements. In a sense, these elements sit over and above submitted descriptions, though their values may of course be mapped from existing data. As these values are needed only for Europeana and therefore only for digital content, HOPE has created a workaround by placing these elements at the level of the digital content.

In many senses the flaws in the HOPE data model reflect those in EDM. Seemingly redundant elements are presented at various levels, capturing legacy data and supplementing it with additional layers of meaning. There is little or no thought of validating or harmonizing similar information stored in different elements (and different entities). This remains the problem of content providers and not the harvester, though the ramifications are certainly felt by the users of the service. Yet it is worthwhile to note that in Europeana edm:hasType and dc:language are recorded as part of the metadata on the original item (or Cultural Heritage Object), rather than with the Digital Resource (or Web Resource). And edm:rights metadata is generally recorded as part of the Aggregation and values are inherited by individual Web Resources. Thus, the HOPE data model has also strayed to some extent from EDM.

Over time, HOPE has informally begun to address the problems attendant on its initial approach. Recently HOPE created a new Type element as part of the Descriptive Unit, mirroring Europeana. The element simply duplicates the value stored in the Digital Resource(s) but at a higher level. The same has also been suggested for the Digital Resource Language value, though not yet for Rights. These are positive steps, but it might be advisable to revisit the HOPE data model in its entirety and to reexamine the initial approach.

Related Resources

Europeana. ''Definition of the Europeana Data Model Elements, Version 5.2.3''. February 2012.

Europeana. ''Europeana Data Model Primer''. October 2011. (https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Te...)

HOPE: Heritage of the People's Europe. ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011. (http://www.peoplesheritage.eu/pdf/D2_2_Metadata%20Structure.pdf)

''Open Archives Initiative Object Reuse and Exchange (OAI-ORE)'' (http://www.openarchives.org/ore)


This section last updated July 2013. Content is no longer maintained.