Semantic Model: Common Data Model and XML Schema

The high-level HOPE data model is an abstraction of the HOPE Common Metadata Structure and serves as a high-level design of the HOPE Aggregator's metadata architecture. The main source of inspiration for the HOPE data model was the Europeana Data Model Specifications v5.2 (EDM) since several of the functional requirements are the same, i.e. accommodating cross-domain metadata, hierarchical descriptions, and compound objects. Moreover, the Europeana portal is one of the main discovery services targeted by the Aggregator, so the resemblance with EDM is expected to ensure a simple transformation to the EDM format.

The current EDM introduces a number of classes that distinguish between six non-information resources: agents, events, places, physical things, concepts, and time spans. These classes establish a semantic framework for the creation of common authority lists. Future releases of EDM will specify element sets for accommodating additional metadata about agents, events, places, etc. The implementation of published controlled vocabularies validating these classes is also being studied, in particular the use of the Virtual International Authority File (VIAF) for person and organization names. Implementing these classes in the HOPE data model not only ensures interoperability with Europeana's authority lists, but also allows HOPE to use authoritative schemes and vocabularies developed within the Europeana community for enriching the HOPE metadata.

Beyond this, since Europeana is also a cross-domain network, EDM provides specific solutions to enable the cross searching of diverse community-specific metadata. EDM has specified a set of generic properties which currently combine two different namespaces. On the one hand, EDM integrates the Europeana Semantic Elements (ESE)/Dublin Core element set as a series of properties. Within EDM, the ESE properties basically represent an object-centered approach, which directly links the features of the described object to the ore:proxy (the entity representing the description of the object). On the other hand, EDM has introduced an event-centered approach with a set of Europeana namespace (ENS) properties, which first groups the features of the described object per event in the history of the object, and then links the events to the ore:proxy. These properties are particularly relevant for event-centric metadata formats such as Lightweight Information Describing Objects (LIDO—the Europeana best practice encoding format for museum metadata).

Unlike ESE, EDM allows for hierarchical links between descriptive units using the dc:hasPart and dc:isPartOf properties. This feature is particularly relevant for accommodating multilevel archival finding aids, but also for describing formal or informal collections of museum objects and titles and issues of publications. The Archive Portal of Europe (APEnet) has already completed test mappings between Encoded Archival Description (EAD) and EDM, which were used for the specification and serialization of the HOPE archival profile. To accommodate bibliographic description, EDM will in the future also incorporate the Functional Requirements for Bibliographic Records (FRBR) entities such as Work, Expression, Manifestation, and Item.

Apart from EDM, a series of other metadata standards were incorporated in the HOPE data model: first, Qualified Dublin Core for recording descriptive metadata, which is also an important component of EDM; and second, the PREMIS Data Dictionary for Preservation Metadata for accommodating metadata about digital resources. The functional requirements for the HOPE Common Metadata Structure have been the source for additional HOPE-specific [[Glossary#Metadata_Element|metadata elements]] and attributes.

Entities of the HOPE Data Model

In the HOPE system, entities are the distinct, separate units of information that are exchanged between local systems, the HOPE Aggregator, and discovery services. In this respect, the transformation process consists of identifying these entities in the content providers' metadata and mapping them to the HOPE data model.

Diagram 3-A - HOPE High Level Data Model: Diagram showing the entities, sub-entities, basic metadata elements, and relationships of the HOPE data model.

The Descriptive Unit and Digital Resource entities are the core entities of the HOPE data model. They accommodate information about, respectively, the description of the collection item and the digital content representing the collection item.

  • ''Descriptive Unit:'' contains information about one or more collection items. For recording descriptive metadata, the HOPE data model uses by default Qualified Dublin Core metadata elements, but the entity accommodates information that is available for all kinds of collection items, irrespective of the domain they belong to (archive, library, visual, audiovisual). Domain-specific information is recorded in one of the five specific sub-entities of the Descriptive Unit entity: Archive Unit, Library Unit, Visual Unit, Audiovisual Unit, and Dublin Core Unit. A Descriptive Unit entity can describe a single collection item, such as an archival document, a publication, a photograph, or a movie. But it can also describe a series of items, such as an archival fonds, a series of monographs, a periodical, a collection of photographs, or a series of TV broadcasts.
  • ''Digital Resource:'' contains information about a digital representation, a single digital image or audio-visual/sound recording, that provides a single, unique rendition of a collection item.

These entities are based on the implementation of Open Archives Initiative Object Reuse and Exchange (OAI-ORE) entities and properties by EDM. The core of EDM is organized around three entities borrowed from the OAI-ORE standard: ore:aggregation (in EDM used to represent complex constructs of web resources and proxies); ens:webResource (in EDM used for web representations of the corresponding collection item); and ore:proxy (in EDM used to convey metadata about the corresponding collection item). In EDM, aggregations are basically used to distinguish between proxies and web resources about the same collection item, but coming from different sources (i.e. different Europeana content providers). Since this kind of disambiguation is not relevant for the HOPE system, the HOPE data model specifies only entities for the descriptive metadata (cf. ore:proxy) and the corresponding digital resource (cf. ens:webResource). The ore:aggregation entity has essentially been passed over.

The HOPE data model allows the association of Descriptive Unit entities in order to create hierarchies or sequences of Descriptive Units (See: Semantic Model: Representing Domains, section on Hierarchically Structured Descriptions). The HOPE data model likewise allows the association of Digital Resource entities in order to create a sequence of Digital Resources (See: Semantic Model: Modelling Digital Objects). A Descriptive Unit entity may be associated with one or more Digital Resources. A Digital Resource entity must be associated with one and only one Descriptive Unit.
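The association rules above can be sketched in a few lines of Python. This is a minimal illustration, not the HOPE implementation: the class names, field names (`parent`, `children`, `digital_resources`), and the `add_child` helper are assumptions chosen for readability.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class DescriptiveUnit:
    """Hypothetical stand-in for a HOPE Descriptive Unit."""

    pid: str
    title: str
    parent: Optional[DescriptiveUnit] = None
    # List order doubles as the sequence of sibling units.
    children: list[DescriptiveUnit] = field(default_factory=list)
    # Zero or more Digital Resources, kept in sequence order.
    digital_resources: list[DigitalResource] = field(default_factory=list)

    def add_child(self, child: DescriptiveUnit) -> None:
        """Attach a lower-level unit, building the hierarchy."""
        child.parent = self
        self.children.append(child)


@dataclass(eq=False)
class DigitalResource:
    """Hypothetical stand-in for a HOPE Digital Resource."""

    pid: str
    # Required argument: a resource belongs to exactly one Descriptive Unit.
    descriptive_unit: DescriptiveUnit

    def __post_init__(self) -> None:
        # Register with the owning unit so the sequence is preserved.
        self.descriptive_unit.digital_resources.append(self)


# Usage: an archival fonds containing one letter digitised as two pages.
fonds = DescriptiveUnit(pid="du-1", title="Example fonds")
letter = DescriptiveUnit(pid="du-2", title="Example letter")
fonds.add_child(letter)
DigitalResource(pid="dr-1", descriptive_unit=letter)
DigitalResource(pid="dr-2", descriptive_unit=letter)
```

Making `descriptive_unit` a required constructor argument is one way to express the "one and only one" rule: a Digital Resource cannot be created without its Descriptive Unit, while a Descriptive Unit may exist with no Digital Resources at all.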

In the Descriptive Unit entity, each instance of a Dublin Core element can hold a value that is related to one of the HOPE domain profiles (See: Semantic Model: Representing Domains). By creating one index for every Dublin Core element, HOPE enables cross-domain searching over all the instances of this element. For some elements, however, searching each domain separately is preferred. This applies particularly to domain-specific elements that correspond to dc:format and dc:type, such as material designation for the library domain or object name for the visual domain. Such domain-specific elements have been specified in their own domain sub-entities. Each sub-entity holds the elements unique to that domain and also inherits all the elements of the associated Descriptive Unit, so a sub-entity and its associated Descriptive Unit can be considered one entity. In order to retain the domain-specific characteristics of each metadata element, the HOPE data model specifies a set of attributes that record the domain-specific context for each element:

  • ''Label:'' contains a domain-specific label to allow discovery services to display the corresponding value with the correct domain label;
  • ''Encoding:'' contains a reference to the metadata standard from which the HOPE element has been derived. The metadata standards include EAD, MARC21 Bibliographic, LIDO, EN15907 for cinematographic works, Qualified Dublin Core, EDM, and PREMIS. Elements that are specific to HOPE refer to the HOPE namespace;
  • ''Cataloging:'' contains a reference to a content standard specifying the order, syntax, and form of the recorded metadata values. This attribute contains the name of the cataloging rules and the identification number of the element. The cataloging rules include the General International Standard Archival Description (ISAD(G)) and International Standard Bibliographic Description (ISBD) 2007 Consolidated.

Example: <dc:creator encoding="ead:origination - name of person" cataloguing="ISAD(G) 3.2.1" label="origination">Sir Joris Janssens</dc:creator>

In this way, the Dublin Core element set enables the integration of HOPE metadata in one information space while the attributes ensure that the domain-specific character of the metadata is retained.
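The idea of one shared index per Dublin Core element, with the attributes preserving the domain context, can be sketched as follows. This is an illustration under stated assumptions, not the Aggregator's indexing code: the `<records>` and `<record>` wrapper elements, the `domain` attribute, the second (library) record with its `encoding` value, and the `build_index` function are all invented for the example. Only the first dc:creator element is taken from the example above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"

# The archive record reuses the dc:creator example from the text.
# The library record and the wrapper elements are hypothetical.
SAMPLE = f"""
<records xmlns:dc="{DC_NS}">
  <record domain="archive">
    <dc:creator encoding="ead:origination - name of person"
                cataloguing="ISAD(G) 3.2.1"
                label="origination">Sir Joris Janssens</dc:creator>
  </record>
  <record domain="library">
    <dc:creator encoding="marc21:100 - main entry personal name"
                label="author">Example, Author</dc:creator>
  </record>
</records>
"""


def build_index(xml_text):
    """Group every Dublin Core value under its element name.

    Each entry keeps the domain-specific label and encoding so a
    discovery service could still display the value in context.
    """
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = "dc:" + element.tag[len(prefix):]
            index[name].append(
                {
                    "value": (element.text or "").strip(),
                    "label": element.get("label"),
                    "encoding": element.get("encoding"),
                }
            )
    return dict(index)


index = build_index(SAMPLE)
for entry in index["dc:creator"]:
    print(f'{entry["label"]}: {entry["value"]}')
```

Both values land in the same `dc:creator` index, which is what makes a cross-domain search possible, while the `label` attribute still tells the archive's "origination" apart from the library's "author".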

In order to improve cross searching, the HOPE data model borrows four more entities from EDM: 

  • ''Agent:'' people, individually or in a group;
  • ''Place:'' a geographical location;
  • ''Concept:'' ideas or notions;
  • ''Event:'' a set of coherent phenomena or cultural manifestations, bounded in time and space. 

Based on the functional requirements, the HOPE data model also includes two additional entities:

  • ''Content Provider:'' the institution supplying metadata and content to the HOPE Aggregator;
  • ''HOPE Theme:'' a thematic heading specific to the fields of social and labour history.

A Descriptive Unit entity can be associated with one or more Agent, Place, Concept, Event, or HOPE Theme entities. An Agent, Place, Concept, Event, or HOPE Theme entity can be associated with one or more Descriptive Unit entities. A Content Provider entity can be associated with one or more Descriptive Unit entities. A Descriptive Unit entity must be associated with one and only one Content Provider entity.
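These cardinalities can be checked mechanically. The sketch below is illustrative only: it assumes each Descriptive Unit is represented as a plain dictionary whose keys (`pid`, `content_providers`, `agents`, and so on) and whose PID values are invented for the example. It reports any unit that does not have exactly one Content Provider and inverts the many-to-many links so that each Agent, Place, Concept, Event, or HOPE Theme can list the units associated with it.

```python
from collections import defaultdict

# Hypothetical key names for the many-to-many associations.
LINKED_ENTITY_KINDS = ("agents", "places", "concepts", "events", "hope_themes")


def validate_unit(unit):
    """Return a list of rule violations for one Descriptive Unit."""
    problems = []
    providers = unit.get("content_providers", [])
    if len(providers) != 1:
        problems.append(
            f"{unit['pid']}: expected exactly one Content Provider, "
            f"found {len(providers)}"
        )
    return problems


def units_by_entity(units, kind):
    """Invert the links: entity PID -> PIDs of the units associated with it."""
    if kind not in LINKED_ENTITY_KINDS:
        raise ValueError(f"unknown entity kind: {kind}")
    reverse = defaultdict(list)
    for unit in units:
        for entity_pid in unit.get(kind, []):
            reverse[entity_pid].append(unit["pid"])
    return dict(reverse)


# Usage with invented identifiers.
units = [
    {"pid": "du-1", "content_providers": ["cp-1"], "agents": ["ag-1", "ag-2"]},
    {"pid": "du-2", "content_providers": ["cp-1"], "agents": ["ag-1"]},
    {"pid": "du-3", "content_providers": [], "places": ["pl-1"]},
]
problems = [p for unit in units for p in validate_unit(unit)]
agents = units_by_entity(units, "agents")
```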

In the HOPE data model, each of the eight primary entities is uniquely identified by a persistent identifier (PID); these identifiers serve as the glue that holds the system together. The six entities for which the information is supplied by the content provider (i.e. Descriptive Unit, Digital Resource, Agent, Place, Concept, and Event) also allow for the provision of a local identifier; this enables the identification of the analogous entity in the local system and can help the Aggregator create PIDs if needed. Due to their importance in managing data both within and between services, the provision of PIDs (or local identifiers) is among the few hard-coded requirements in the HOPE system.
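The identifier requirement can be read as a simple rule: use the supplied PID if there is one, otherwise fall back on the local identifier, and reject the entity if neither is present. The sketch below illustrates only that rule. The dictionary keys, the `resolve_identifier` function, and above all the `urn:example:` pattern are assumptions made for this example; how the Aggregator actually creates PIDs is not described in this section.

```python
def resolve_identifier(entity, provider_code):
    """Return the entity's PID, or a placeholder built from its local identifier.

    Raises ValueError when the entity carries neither identifier.
    """
    pid = (entity.get("pid") or "").strip()
    if pid:
        return pid
    local_id = (entity.get("local_id") or "").strip()
    if local_id:
        # Hypothetical minting rule, for illustration only.
        return f"urn:example:{provider_code}:{local_id}"
    raise ValueError("entity has neither a PID nor a local identifier")


# Usage with invented values.
with_pid = resolve_identifier({"pid": "pid-123"}, "prov")
from_local = resolve_identifier({"local_id": "ARCH-42"}, "prov")
```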

The HOPE XML Schema

The HOPE data model has been implemented in XML by means of an XML schema called the HOPE Metadata Schema. The HOPE Metadata Schema is an XML file specified using the W3C XML Schema language, which prescribes the structure, content, and semantics of XML documents. The HOPE Metadata Schema implements the HOPE data model by representing entities and their properties as XML elements and attributes. In essence, the HOPE Metadata Schema describes HOPE metadata records, that is, XML documents containing information about materials from the holdings of HOPE content providers. The HOPE Aggregator stores XML documents in a database and indexes their content to provide search and browsing functionality.

This solution was chosen over other technologies (e.g. a relational database) primarily because metadata records are harvested from the content providers and exported to discovery services in XML format. In order to avoid a double transformation, from XML to (for example) records in a relational database and back to XML again for dissemination, HOPE metadata records are stored as XML documents. Furthermore, the design of a relational database implementing the HOPE data model would have had to include many relationships between the different entities of the model. In such a case the database schema would have been hard to implement, and search performance might have suffered as the number of stored records increased.

Related Resources

''Dublin Core (DC)'' (http://dublincore.org)

Europeana. ''Definition of the Europeana Data Model Elements, Version 5.2.3''. February 2012.

Europeana. ''Europeana Data Model Primer''. July 2013. (https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/T…)

HOPE: Heritage of the People's Europe. ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011. (http://www.peoplesheritage.eu/pdf/D2_2_Metadata%20Structure.pdf)


This section last updated July 2013. Content is no longer maintained.