Semantic Harmonization

Users must be able to filter result sets by date, language, and media type or format. Users must also be able to browse and filter content by institution, collection, and relevant themes. Added contextual information and other enhancements of base metadata, e.g. timelines, themes, or tags, should be presented as part of the search and discovery interface.

The HOPE project has set few criteria for acceptance of digital content and descriptive metadata. The primary goal has been to gather content from multiple domains for dissemination to a wide range of discovery services which themselves place minimal barriers on the supply of content.  Nevertheless, a secondary goal is to forge the HOPE Social History Resource, a corpus of material which can be cross searched on the Social History Portal. With this second aim in mind, HOPE supports a base level of descriptive metadata normalization and has also drafted a set of non-mandatory best practice recommendations. These are the requirements proposed for the supply of descriptive metadata to the Social History Portal. These requirements are intended to support a high-level discovery-to-delivery experience for the target user groups.

Value Types Selected for Normalization

In order to facilitate the discovery of HOPE content, HOPE supports controlled access to a subset of HOPE elements. To this end, the Aggregator performs a series of normalization steps after ingest of the content provider's metadata. This is a semi-automated process, whereby the Aggregator checks whether the values in a particular set of HOPE elements are compliant with a corresponding data value standard, i.e. a controlled syntax, list, or vocabulary. In most cases, the content provider is asked to supply a table matching non-compliant values to their corresponding compliant values. The Aggregator then records these compliant values in the appropriate HOPE elements. In these cases, the 'original' value supplied by the content provider is maintained in the value space, while the normalized value is recorded as a normalized attribute. In the case of dates, HOPE simply requires content providers to supply dates in a standard form.
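
The mapping step described above can be sketched in Python. This is only an illustration under stated assumptions, not the Aggregator's actual implementation: the function name `normalize_value`, the sample provider table, and the returned dictionary layout are all hypothetical. The sketch keeps the provider's original value and records the compliant value separately, mirroring the split between the value space and the normalized attribute.

```python
from typing import Dict, Optional, Set

# Compliant values: the Europeana type values cited in this section.
COMPLIANT_TYPES: Set[str] = {"Text", "Image", "Audio", "Video"}

# Hypothetical provider-supplied table: non-compliant value -> compliant value.
PROVIDER_MAPPING: Dict[str, str] = {
    "photograph": "Image",
    "pamphlet": "Text",
    "oral history recording": "Audio",
}


def normalize_value(original: str, mapping: Dict[str, str],
                    compliant: Set[str]) -> Dict[str, Optional[str]]:
    """Return the original value together with its normalized form, if any."""
    if original in compliant:
        normalized: Optional[str] = original
    else:
        # Fall back to the provider's table; None flags a value that
        # still needs a mapping from the content provider.
        normalized = mapping.get(original)
    return {"value": original, "normalised": normalized}


if __name__ == "__main__":
    for raw in ("photograph", "Text", "leaflet"):
        print(normalize_value(raw, PROVIDER_MAPPING, COMPLIANT_TYPES))
```

A value that comes back with `normalised` set to `None` is one for which the content provider would still need to supply a mapping.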

HOPE normalizes the following value types:

  1. ''Types:'' This enables faceted browsing by object/document type in the Social History Portal. HOPE requires a normalized type value for the Type element in the Digital Resource entity and has recently added an additional mandatory Type element in the Descriptive Unit. In practice, these values are generally the same for a given item. Normalized type values are also a requirement for each Descriptive Unit submitted to Europeana.
    Because of its strong commitment to Europeana, HOPE has opted to use Europeana type values over the more widely used Dublin Core Metadata Initiative Type Vocabulary.

    List of Europeana Types based on DCMI Type values:

    DC Text=Text
    A resource consisting primarily of words for reading, e.g. books, letters, dissertations, poems, newspapers, articles, and reports.

    DC Image/Still Image=Image
    A visual representation other than text, specifically a static visual representation, e.g. paintings, drawings, graphic designs, plans, and maps.

    DC Sound=Audio
    A resource primarily intended to be heard, e.g. recorded music, speech, or sounds.

    DC Image/Moving Image=Video
    A visual representation other than text, specifically a series of visual representations imparting an impression of motion when shown in succession, e.g. animations, movies, short films, or television programs.

    ''Related non-normalized type elements:'' Each Descriptive Unit domain sub-entity has an additional type-like element that is not normalized by HOPE, including Genre of the Fonds (Archival Profile), General Material Designation (Library Profile), Object Name (Visual Profile), and Instantiation Type (Audiovisual Profile). These elements contain either free-text or domain-specific controlled values about the type of the object and are submitted as part of the descriptive metadata.

  2. ''Rights:'' This provides Social History Portal users with information on use restrictions that apply to the content. A normalized rights value is required in the Rights element of the Digital Resource. Normalized rights values for each Digital Resource are also recommended and supported by Europeana.
    HOPE has opted to use Europeana rights values, i.e. the ens:rights values, which specify a set of URLs to Creative Commons (CC) licenses or Europeana rights statements.

    List of CC licenses with URLs:
    Public Domain Mark (http://creativecommons.org/publicdomain/mark/1.0)
    CC0 (http://creativecommons.org/publicdomain/zero/1.0)
    CC BY (http://creativecommons.org/licenses/by/3.0)
    CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0)
    CC BY-NC (http://creativecommons.org/licenses/by-nc/3.0)
    CC BY-NC-SA (http://creativecommons.org/licenses/by-nc-sa/2.0)
    CC BY-ND (http://creativecommons.org/licenses/by-nd/2.0)
    CC BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/1.0)

    List of Europeana rights statements with URLs:
    Rights Reserved - Free Access (http://www.europeana.eu/rights/rr-f.html)
    Rights Reserved - Paid Access (http://www.europeana.eu/rights/rr-p.html)
    Rights Reserved - Restricted Access (http://www.europeana.eu/rights/rr-r.html)
    Unknown (http://www.europeana.eu/rights/unknown.html)

    ''Related non-normalized rights elements:'' The Rights element of the Descriptive Unit—including Conditions Governing Access (Archival Profile), Conditions Governing Use (Archival Profile), Credit Line (Visual Profile), Access Conditions (Audiovisual Profile), IPR Registration (Audiovisual Profile), Rights (Dublin Core Profile)—is not normalized by HOPE. This element contains free-text information about the rights restrictions on the original item and is submitted as part of the descriptive metadata.

  3. ''Language:'' This enables faceted browsing and search by language in the Social History Portal. Though not a mandatory element, when used the Language element of the Digital Resource requires a normalized value. Europeana requires a normalized value for language elements, specifically to record the official language of the country where the content provider is located.
    In addition, HOPE requires a normalized value in the Language Metadata element of each Descriptive Unit. This helps the Aggregator and the Social History Portal to identify the base language of cataloging for purposes of indexing or display. Translated values are also given a normalized Language Attribute value; the attribute is captured from description encoded using various domain standards, in elements such as Translated Title (Library Profile), Translated Subtitle (Library Profile), Translated Title (Audiovisual Profile), Translated Synopsis (Audiovisual Profile), but may also be provided for other translated text values. The use of the Language Attribute is highly recommended when applicable.
    HOPE uses ISO 639-3:2007, an international standard defining three‐letter codes for all known natural languages.

    Examples:
    English=eng
    German=deu
    Dutch=nld

    ''Related non-normalized language elements:'' The Language element of the Descriptive Unit—including Language of the Described Material (Archival Profile), Original Language (Audiovisual Profile), Language Used (Audiovisual Profile), and Subtitle Language (Audiovisual Profile)—is not normalized by HOPE. This element contains free-text information about the language of the original item and is submitted with other descriptive metadata.

  4. ''Script:'' This will allow the Aggregator to create special indexes for values that are encoded using non-Latin scripts. When applicable, the use of the script attribute is highly recommended as it facilitates multilingual indexing of HOPE descriptive metadata. Currently, non-Latin values also require a transliteration to Latin script in order to be appropriately cross-indexed and sorted with other language metadata by the HOPE Aggregator.
    HOPE uses ISO 15924, a standard which defines both a four-letter code and a numeric code for each script.

    Examples:
    Latin=Latn
    Arabic=Arab
    Cyrillic=Cyrl
    Yi=Yiii

  5. ''Dates:'' This enables browsing and search by year or exact date in the Social History Portal. A uniform date form might also support the display of collections on a timeline. Though not a mandatory element, when used the Date element of the Descriptive Unit—including Date of Creation (Archival Profile), Date of Publication (Library Profile), Date of Printing or Manufacture of Engraving (Library Profile), Object Production Date (Visual Profile), Publication Date (Audiovisual Profile), Broadcast Date (Audiovisual Profile), Date (Dublin Core Profile)—requires a normalized value. Normalized values are also used in Date elements found in HOPE authorities, such as HOPE Themes, Agents, and Events to help with  identification and potential disambiguation.
    For date forms HOPE relies on ISO 8601:2004, a standard covering the exchange of date and time-related data. ISO 8601:2004 also covers time intervals and recurring time intervals.

    Examples:
    2011 (calendar dates expressed in terms of calendar year)
    2011-03 (calendar dates expressed in terms of calendar year and calendar month)
    2011-03-29  (calendar dates expressed in terms of calendar year, calendar month, and calendar day)
    2010-05-21/2011-03-29 (time intervals, using solidus '/' as an interval designator)
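
The date forms listed above can be checked with a short Python sketch. It covers only the four forms shown in the examples (year, year-month, full calendar date, and an interval of two such values separated by a solidus), not the whole of ISO 8601:2004. The function names are hypothetical and the check is an assumption about how a provider might pre-validate dates, not a description of the Aggregator.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD (extended format only).
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_date(value: str) -> bool:
    """Check a single calendar date at year, month, or day precision."""
    match = _DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def is_valid_hope_date(value: str) -> bool:
    """Accept a single date or a start/end interval joined by '/'."""
    parts = value.split("/")
    if len(parts) > 2:
        return False
    return all(is_valid_date(part) for part in parts)
```

The sketch does not verify that the start of an interval precedes its end; that would be a natural additional check.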

(See: Semantic Model: Modelling Digital Objects, section on Digital Object Modelling as Best Practice; see also: Semantic Model: Representing Domains, section on HOPE Domain Profiles as Best Practice.)

Harmonizing Hierarchically Structured Descriptions

A final value which is normalized in HOPE is the Description Level element, which is stored as part of the Descriptive Unit domain sub-entities. As noted, the HOPE data model allows for each Descriptive Unit to specify a 'level of description', i.e. the level of arrangement of the descriptive unit. (See: Semantic Model: Representing Domains, section on Hierarchically Structured Descriptions.) Though the data model specifies a domain-specific element for each of the profiles, HOPE does not yet provide domain-specific vocabularies for populating these. As an alternative, the data model specifies a generic normalized value for each Descriptive Unit. This allows the Aggregator to distinguish between top-level collection descriptions and bottom-level item descriptions, which are linked with digital content. The generic values are:

  • ''Collection Record:'' This is the top-level record for each collection supplied by the content providers. Though still standard Descriptive Units, collection records are not part of the mapped data set submitted by the content provider. Instead collection metadata is created during the mapping by completing a standardized form, a step which can be seen as a form of data enrichment.<br>Collection records may include descriptions of item groupings or groupings of mid-level series units. Collection records may not include related digital content.
  • ''Series Records:'' These Descriptive Units are optional and include descriptions of item groupings or groupings of other mid-level units. Series records have no related digital content. Each series record must have one and only one parent, be it a collection record or another series record.
  • ''Item-Level Records:'' Include descriptions of single items. If the collection has been digitized, the digital content is linked at item level. Each digital object (and corresponding Digital Resources) must be linked to one and only one item record. Each item-level record must have one and only one parent, be it a collection record or a series record.
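
The structural rules in the list above can be expressed as a small consistency check. The following Python sketch is illustrative only: the `Unit` class and its field names are hypothetical simplifications and do not reflect the HOPE Schema. Storing a single `parent_id` per unit models the "one and only one parent" rule directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Unit:
    """Hypothetical, simplified stand-in for a HOPE Descriptive Unit."""
    unit_id: str
    level: str                       # "collection", "series", or "item"
    parent_id: Optional[str] = None
    has_digital_content: bool = False


def check_hierarchy(units: List[Unit]) -> List[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    by_id: Dict[str, Unit] = {u.unit_id: u for u in units}
    problems: List[str] = []
    for u in units:
        parent = by_id.get(u.parent_id) if u.parent_id else None
        if u.level == "collection":
            # Collection records sit at the top of the hierarchy.
            if u.parent_id is not None:
                problems.append(f"{u.unit_id}: collection record has a parent")
        elif parent is None:
            problems.append(f"{u.unit_id}: {u.level} record lacks a known parent")
        elif parent.level not in ("collection", "series"):
            problems.append(f"{u.unit_id}: parent must be a collection or series")
        # Digital content is linked at item level only.
        if u.level != "item" and u.has_digital_content:
            problems.append(f"{u.unit_id}: only item records may link digital content")
    return problems
```

For example, an item without a parent and a series carrying digital content would each produce one entry in the returned list.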

The normalized value is given based on the mapping specifications supplied by content providers for each collection. The mapping sheet allows the specification of different mappings for top-level collection records, mid-level series records, and item records.  The following table lists the normalized values for Description Level with examples of domain-specific levels of description that may be mapped to them.

Table 3-J - HOPE Level of Description By Domain: Table showing the normalized Description Level values required by the HOPE Schema next to the domain-specific levels of description identified by content providers.

In HOPE, the content provider must provide a controlled term for the Description Level for each descriptive unit. Below are sample HOPE XML instances of the level of description metadata for each domain, including the normalized values added via the mapping sheet:

Examples for Archival Descriptive Units:
<descriptionLevel normalised="collection"></descriptionLevel>
<descriptionLevel normalised="series">fonds</descriptionLevel>
<descriptionLevel normalised="series">subseries</descriptionLevel>
<descriptionLevel normalised="item">folder</descriptionLevel>
<descriptionLevel normalised="item">document</descriptionLevel>

Examples for Library Descriptive Units:
<descriptionLevel normalised="collection"></descriptionLevel>
<descriptionLevel normalised="series">periodical</descriptionLevel>
<descriptionLevel normalised="item">issue</descriptionLevel>

Examples for Audiovisual Descriptive Units:
<descriptionLevel normalised="collection"></descriptionLevel>
<descriptionLevel normalised="series">series</descriptionLevel>
<descriptionLevel normalised="item">program</descriptionLevel>
<descriptionLevel normalised="item">film</descriptionLevel>
<descriptionLevel normalised="item">sound recording</descriptionLevel>

Examples for Visual Descriptive Units:
<descriptionLevel normalised="collection"></descriptionLevel>
<descriptionLevel normalised="item">object</descriptionLevel>
<descriptionLevel normalised="item">image</descriptionLevel>
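
The instances above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is an assumption-laden illustration: the wrapping `<units>` element and the function name `read_levels` are hypothetical and exist only to make the sample well-formed and self-contained. It extracts the normalized value from the `normalised` attribute and the provider's domain-specific term from the element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element around sample instances from this section.
SAMPLE = """<units>
  <descriptionLevel normalised="collection"></descriptionLevel>
  <descriptionLevel normalised="series">fonds</descriptionLevel>
  <descriptionLevel normalised="item">folder</descriptionLevel>
</units>"""

# The three generic values defined for Description Level.
ALLOWED = {"collection", "series", "item"}


def read_levels(xml_text: str):
    """Return (normalized value, original term) pairs in document order."""
    root = ET.fromstring(xml_text)
    levels = []
    for element in root.iter("descriptionLevel"):
        normalised = element.get("normalised")
        if normalised not in ALLOWED:
            raise ValueError(f"unexpected normalised value: {normalised!r}")
        # Collection records carry no domain-specific term, hence the fallback.
        levels.append((normalised, (element.text or "").strip()))
    return levels
```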

In order to provide guidance on the consistent application of hierarchical structures in the system, HOPE has defined a set of best practice requirements for the supply of hierarchical descriptions as well as a set of hierarchical models for each domain. At the top of each multilevel description in every domain there is always a collection description which is not part of the data sets but is instead created at ingest time. It is expected that the metadata on mid- and lower-level units are part of the supplied data and will be mapped to the domain profiles using the mapping sheets.

  • Archival Collections: In the HOPE data model, the most granular level of an archival collection, the so-called 'item', can be either a document or a folder (i.e. group of documents). Above each document or folder the levels of description can be arrayed as follows:

    Document/Folder < Series < Fonds < Collection
    (Note: subfonds and subseries can also be added along with fonds or series throughout this section.)

    Document/Folder < Collection

  • Library Collections: In the HOPE data model, the most granular level of a library collection can be either a monograph or an issue. Above each monograph or issue, the levels of description can be arrayed as follows:

    Issue < Periodical < Collection

    Monograph < Series < Collection

    Monograph  < Collection

  • Audiovisual Collections: In the HOPE data model, the most granular level of an audiovisual collection can be either a film, a sound recording, or a program. Above each film, sound recording, or program, the levels of description can be arrayed as follows:

    Program < Series < Collection

    Film/Sound Recording/Program < Collection

  • Visual Collections: Visual collections basically consist of a set of single cultural heritage objects, i.e. photographs, prints, posters, badges, etc. In the HOPE data model, the most granular level of a visual collection is an image or object. Above each image or object, there is a collection, as shown below:

    Image/Object < Collection
    (Note: images and objects may include suites of images or object groups which are described collectively as items)
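
The hierarchical models above can be encoded as data and used to test whether a given chain of levels fits one of them. This Python sketch is a simplification under stated assumptions: the dictionary layout and the function name `matches_model` are hypothetical, level names are lower-cased, and the optional subfonds and subseries levels mentioned in the note on archival collections are not handled.

```python
from typing import Dict, List, Sequence, Set

# Each model lists the allowed level names from the item up to the collection.
MODELS: Dict[str, List[List[Set[str]]]] = {
    "archival": [
        [{"document", "folder"}, {"series"}, {"fonds"}, {"collection"}],
        [{"document", "folder"}, {"collection"}],
    ],
    "library": [
        [{"issue"}, {"periodical"}, {"collection"}],
        [{"monograph"}, {"series"}, {"collection"}],
        [{"monograph"}, {"collection"}],
    ],
    "audiovisual": [
        [{"program"}, {"series"}, {"collection"}],
        [{"film", "sound recording", "program"}, {"collection"}],
    ],
    "visual": [
        [{"image", "object"}, {"collection"}],
    ],
}


def matches_model(domain: str, path: Sequence[str]) -> bool:
    """Check a path of levels (item first, collection last) against a domain."""
    for model in MODELS.get(domain, []):
        if len(model) == len(path) and all(
            level in allowed for level, allowed in zip(path, model)
        ):
            return True
    return False
```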

As mentioned, HOPE has few hard-coded requirements for the supply of metadata. All together these include: the provision of PIDs or local identifiers for all submitted entities; a normalized value for Language Metadata and a normalized value for Description Level for all descriptive units; a normalized Type value and Rights value for each Digital Resource; the provision of a PID (or local identifier and URL) for a Derivative 2 of each Digital Resource and the PID (or URL) of a Landing Page for each non-collection descriptive unit. These, along with a few mandatory elements populated by the Aggregator itself, are seen as essential to ensuring the basic functionality of the Aggregator and to meeting the base requirements for supply of content to Europeana, the primary external discovery service. Notably, however, in HOPE there exist few domain-specific mandatory elements and those that stand are based on domain practice rather than the cross-searching needs of HOPE's target users.

To address this, the following section specifies three sets of best-practice requirements, one for each level of description identified. Requirements are separated and discussed by level of description for the following reasons: 1) the metadata recorded for each level of description bears a different relationship to the locally managed metadata; and 2) the records for each level of description will eventually serve a different purpose on the IALHI portal. This reasoning will be further developed within the section.

Metadata Recommendations for Collection Descriptive Units

HOPE's collection description is based on the Dublin Core Collection Description Profile and was created specifically for use in the Social History Portal. Collections are defined by content providers according to guidelines provided by HOPE. (See: Semantic Model: Representing Domains, section on Application of Domain Profiles to HOPE Collections.)  Metadata on collections is entered by the content provider during the mapping phase of content submission.  As the total number of collections for each content provider is relatively small and as this metadata is created specifically for HOPE (and is not expected to be stored and managed in local systems), the requirements for collection description are hard coded and relatively stringent. Collections are connected to series and/or to items but never directly to digital content. 

The collection record was created to provide a uniform access point to HOPE’s wide-ranging social history content. The purpose of the collection record is fourfold. First, collection records provide an overview of content available on the Social History Portal. Collections are of a granularity and number which allow them to be showcased on a map, arrayed on a timeline, or presented topically, alphabetically, or by institution. The HOPE Social History Resource currently includes 130 digital collections. Second, collection records help orient users by providing a relatively uniform set of basic information about each set of material submitted. Basic information includes the nature (both physical and intellectual) and structure of the underlying content and description. As such, collection descriptions are intended to sit atop the domain-specific mid- and item-level descriptions, which are steeped in their respective professional practice and often disorienting for end users to navigate between. Importantly, while in most cases it is infeasible to produce item-level descriptions in multiple languages, collection descriptions can easily be translated according to a defined language policy. Third, collection records serve as a navigation device allowing users to jump between highly granular item-level results and broader sets of related material, and between content and context. Queries on collections can likewise provide an alternative to 'noisy' item-level results sets. Fourth, collection records present and explain overarching administrative, access/use, and delivery policies. Collection description can help illuminate content as an administrative as well as an intellectual grouping.

Mandatory Elements

  • ''To support browsing and searching:'' Collection Title; Abstract; Date Created; Language (of items, if applicable); Item Type; Repository; HOPE Themes (added after import to Aggregator);
  • ''To support identification and selection:'' Size In Items (if applicable);
  • ''To provide information on overarching policies:'' Use Rights; Access Rights;

Recommended Elements

  • ''To support browsing and searching:'' Subject; Spatial Coverage;
  • ''To support identification and selection:'' Item Format; Temporal Coverage; non-normalized values which accompany normalized values for dates and physical description fields;
  • ''To provide practical information or overarching policies:'' Custodial History; Catalog or Index.
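
A check for the unconditional mandatory elements of a collection record might look like the following Python sketch. It is an illustration only: the flat dictionary representation and the function name `missing_mandatory` are assumptions. Elements marked 'if applicable' (Language, Size In Items) and HOPE Themes, which are added after import to the Aggregator, are deliberately left out of the list.

```python
from typing import Dict, List

# Mandatory collection elements that apply unconditionally at submission time.
MANDATORY_COLLECTION_ELEMENTS: List[str] = [
    "Collection Title",
    "Abstract",
    "Date Created",
    "Item Type",
    "Repository",
    "Use Rights",
    "Access Rights",
]


def missing_mandatory(record: Dict[str, str]) -> List[str]:
    """Return the mandatory element names that are absent or blank."""
    return [
        name
        for name in MANDATORY_COLLECTION_ELEMENTS
        if not (record.get(name) or "").strip()
    ]
```

An empty result means that the record carries a non-blank value for every element in the list.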

Metadata Recommendations for Mid-Level Descriptive Units

A series, as defined by HOPE in the most generic sense, is a set of items grouped together based on their creation/production or collection history. Series are represented by one or more levels of description in several domains, including hierarchical fonds descriptions and descriptions of published or produced series or serials. HOPE content providers currently store such mid-level metadata in their local collection management systems. As a rule, this metadata has been created and managed following professional content standards. Thus, few requirements are imposed on series description. Series are connected to collections, other series, and items but not to digital content.

Mid-level series records are needed to provide the creation, production, and collecting context for HOPE content. They also provide an important level of granularity for the search and browse interfaces.  As mid-level descriptive units are highly domain specific, it is best to discuss them accordingly.

Archival Fonds Description

Archival description comprises not one but several hierarchically structured mid-level descriptive records: Fonds, Subfonds, Series, Subseries. Such multilevel description is written as a single interconnected document with inheritance rules that pull description to the higher levels. As such, for Social History Portal end users it is important that the structure is not violated—that existing units are kept and presented together and that the fonds unit is present. Because these units of description exist for purposes other than to facilitate the presentation of materials, it is not recommended to alter multilevel descriptions. Instead, the collection record has been developed to assume higher-level search and browsing roles. Nevertheless, hierarchical description still serves several important functions on the Social History Portal very similar to those listed above under Collection Description: to provide context about the creation and collection of materials, to provide administrative and access information, to serve as an important level of granularity which allows users to move between general and specific—content and context.

Element requirements have been set in accordance with ISAD(G) specifications, and not specifically with reference to Social History Portal functionality. Inheritance means that lower-level archival units have fewer requirements than fonds.

Mandatory Elements

  • ''For fonds units:'' Call Number; Level of Description; Title; Origination (or Origination sub-elements); Date of Creation; Extent;
  • ''For non-fonds units:'' Call Number; Level of Description; Title.<br>If Extent, Date of Creation, or Origination are of a different value than those in fonds description, these should also be present;

Recommended Elements

  • ''For fonds units:'' Administrative/Biographical History; Archival History; Content Summary; System of Arrangement; Conditions Governing Access; Conditions Governing Use; Languages of the Described Materials (if applicable); and various Controlled Access Headings.
  • ''For non-fonds units:'' Content Summary; System of Arrangement.<br>Conditions Governing Access, Conditions Governing Use, and Languages of the Described Material should also be present if values are different from those in fonds description.

Library and Audiovisual Series Description

Library and audiovisual series are sets of items that are produced as a formal coherent work and generally share a uniform title and often include a uniform description. In practice, items belonging to a series are often described using a minimum of elements while more extensive description is pulled into the series description. Though treated as mid-level descriptive units by HOPE, they are generally more granular than fonds-level description. As a result, library and audiovisual series can be used in place of items to group and present search results, thus preventing the 'noise' that would result from the presentation of all items bearing a similar name and description. The Social History Portal specifications currently make no reference to this issue.

Here we propose mandatory elements which would allow library and audiovisual series records to be searched together with other item records, thus providing a potentially valuable option for the Social History Portal search interface:

Mandatory Elements

  • ''For library series:'' Book Number; Title; Type; if applicable, Language;
  • ''For audiovisual series:'' Inventory Number; Title; Type; if applicable, Language, Original Language, Subtitle Language, Language Used;

Recommended Elements

  • ''For library series:'' if applicable, Sub-Title, Statement of Responsibility, Edition Statement; if applicable, Author; if applicable, Contributor, Co-Author; Abstract or Subject/Keywords, as applicable; Publisher or Printer, Place of Publication or Place of Printing, Publication Frequency;
  • ''For audiovisual series:'' if applicable, Producer; if applicable, Director, Cast; Synopsis or Subject/Country of Reference; Distributor or Broadcaster, Publication Location.

Date and Extent (in terms of number of items) are also highly recommended. (Note, however, that these can be tricky. The dates of publication or production and extent of a whole formal series are often broader than the dates and extent of the actual holdings of a single institution and can thus be difficult to establish.)

Metadata Recommendations for Item-Level Descriptive Units

Items are the finest level of granularity in the context of the creation, production, or collection of the original materials. In HOPE, items are represented by various formats in each domain: archival documents and folders, monographs, periodical issues, images, objects, televised programs, musical recordings, films, and so on. Item-level metadata is generally, but not necessarily, managed in local collection management systems. Archives in particular follow idiosyncratic practice in the creation and management of item-level descriptions. It is expected that HOPE content providers currently have some item-level metadata, especially for library and audiovisual material, but that they will also dedicate time and resources to enhancing and harmonizing metadata to meet HOPE requirements. Thus, requirements have been set to optimize the discovery experience on the Social History Portal. In HOPE, items are connected to collections, series, and digital content. The requirement for digital content is specifically related to items.

Item-level records support users in the discovery, identification, selection, and retrieval of particular material. The purpose of item records in HOPE is fourfold. First, item records support simple and advanced search and filter functionality leading to the discovery of specific material. Second, they serve to help users navigate between granular and broad results sets—and between content and context. Third, they provide information on the identity, form, content, creation, and source of particular material. And fourth, item records give practical information related to the source of, access to, and use of materials with direct links to digitized copies of the material when available or further information on access and delivery options when not.

To facilitate the searching and navigation of collections within the Social History Portal:

Mandatory HOPE Schema Elements (listed with related elements from domain profiles)

  • ''To support browsing and searching:'' Title (Title); Type; Date (Date of Creation, Production, Issue, Publication, Broadcast); if applicable, Language (Language, Language of Material, Original Language, Subtitle Language, Language Used);
  • ''To provide practical information:'' Identifier (Resource Identifier, Call Number, Book Number, Object Number, Inventory Number); Rights (Rights, Conditions Governing Use, Credit Line, IPR Registration);

Recommended Elements (if not recorded at the series level)

  • ''To support browsing and searching:'' if applicable, other Title (Sub-Title, Statement of Responsibility, Edition Statement); Creator (Creator, Origination, Author, Object Production Name, Producer); if applicable, Contributor (Contributor, Co-Author, Director, Cast); Description (Description, Content Summary, Abstract or Table of Contents, Synopsis or Shotlist) or Subject/Spatial Coverage (Subject, Spatial Coverage, Keywords, Indexed Entities, Associated Entities, Depicted Entities) as applicable;
  • ''To support identification and selection:'' if applicable, Publication (Publisher or Printer, Distributor or Broadcaster, Place of Publication or Place of Printing, Publication Location); Extent or Dimensions, Original Length or Original Duration;
  • ''To provide practical information:'' as applicable, Rights (Conditions Governing Access, Access Conditions).

Links to local catalog records and digital content, when available, are already supported through the following required or Aggregator-supplied metadata:

  • ''To support identification, selection, and delivery and to provide practical information:'' Landing Page (Landing Page PID or Landing Page Resolve URL);
  • ''To support browsing, searching, identification and selection (for digitized content only):'' Derivative 3 (Derivative 3 PID or Derivative 3 Local Identifier and Resolve URL);
  • ''To support identification, selection, and delivery (for digitized content only):'' Derivative 2 (Derivative 2 PID or Derivative 2 Local Identifier and Resolve URL);
  • ''To support identification, selection, and delivery (for digitized content only):'' Digital Resource Sequence information (PID Next Descriptive Unit in Sequence or Local Identifier Next Descriptive Unit in Sequence).

Harmonizing Multilingual/Script Descriptions

The sheer work involved in changing the language or script of existing descriptive metadata, or in altering a long-standing transliteration practice, gives the impression that linguistic harmonization is 'too big to confront'. This has prevented HOPE from setting strict requirements on the language and script of metadata. In particular, HOPE has chosen not to establish a base language for the project but instead supports description in all submitted languages and scripts. As mentioned, HOPE uses Unicode UTF-8, which allows description in all world languages. As a bare minimum, HOPE mandates the use of the Language Metadata element and highly recommends the use of Language and Script Attributes to differentiate translated and non-Latin script values. (See: Semantic Model: Supporting Multilingual Description.)

Nevertheless, HOPE would like to underscore some basic language-related issues likely to impede target user access to the HOPE Social History Resource. To address these issues, HOPE recommends a few best practice measures:

''A mismatch between the language of the material and the language of cataloging used to describe it.''  A mismatch is often signaled by the need to use translated titles, particularly in library and audiovisual description where primary titles are generally recorded in original language. The description of multilingual content in a single institutional language is, in fact, standard practice in institutional catalogs or repositories and has come to be expected by end users. Nevertheless, the practice does not extend well when metadata is disseminated to a global audience through international or multilingual discovery services. In an early survey on target user habits, 67.7 percent of the sample group confirmed this, citing that in addition to their mother tongue they perform searches in the language of the material sought—though even more (85 percent) claimed to use English for searching.

Thus, when feasible, HOPE recommends that content providers supply an original-language Title and Description for each item-level and mid-level series record (the element labels for these vary according to the domain profile; see above). For collection records, HOPE strongly recommends the provision of metadata in the original language of the content, when possible. If more than one language is represented in the collection, then collection metadata may be recorded in one or more dominant languages or in an umbrella language. Parallel metadata may also be created in the content provider's language of cataloging. (If HOPE eventually chooses to establish a base language, all collection records should likewise be translated into this base language.)

''Non-Latin script description.'' Simply put, non-Latin script description poses problems for the common search, sort, and display of HOPE collections. If non-Latin script values are assigned the appropriate script attribute, then the HOPE Aggregator may store values in separate indexes which would be available to the Social History Portal for the development of multi-script functionality.
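As a rough illustration of how a script attribute might be assigned before values are submitted, the sketch below guesses the dominant script of a value from Unicode character names. The function name `guess_script`, the four-entry lookup table, and the use of ISO 15924 four-letter codes as attribute values are assumptions made for this example; they are not taken from the HOPE specification.

```python
import unicodedata
from collections import Counter

# Hypothetical lookup: first word of a Unicode character name -> ISO 15924 code.
# Only the scripts mentioned in this section are covered.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latn",
    "CYRILLIC": "Cyrl",
    "ARABIC": "Arab",
    "CJK": "Hani",
}


def guess_script(value):
    """Return the most frequent script code among the letters of value, or None."""
    counts = Counter()
    for ch in value:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        # The character name starts with the script name, e.g. "CYRILLIC SMALL LETTER A".
        prefix = unicodedata.name(ch, "").split(" ")[0]
        script = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script is not None:
            counts[script] += 1
    return counts.most_common(1)[0][0] if counts else None
```

The name-prefix heuristic is deliberately simple: it reports only the dominant script of a mixed-script value and ignores scripts outside the lookup table. A content provider with more varied holdings would probably want a library that exposes the full Unicode Script property.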

However, this future possibility aside, it is highly recommended to provide HOPE with Latin script transliterations for those elements which will be displayed and sorted on common results lists, specifically Creator and Title, as well as elements which may form the basis of faceted searching, namely Contributor or Subject (the element labels for these vary according to the domain profile). Here it should be noted that personal and organizational names are a key cross-language access point. HOPE's early survey on target user habits revealed that personal and organizational name queries were used by an overwhelming 80 percent of the sample group (by contrast, dates and geographical terms were each used by less than 50 percent). To facilitate cross searching on names, HOPE also encourages content providers to record names in their 'native form'. This will be discussed in more detail. (See: Semantic Enrichment.)

''A range of available transliteration schemes.'' The transliteration of non-Latin values can support the common indexing of HOPE content, but when various transliteration schemes are used, much of this benefit is lost. The four HOPE content providers who record transliterated values from Cyrillic rely on three transliteration schemes. Two use the ISO 9:1995 standard, while another employs the older ISO/R 9:1968. The latter provider also leaves out the diacritics on the 'e', while retaining them on the other letters. A fourth content provider uses the ALA-LC Romanization Tables, but entirely without diacritics.
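The loss of common indexing can be made concrete with a small sketch. It takes two hand-entered Latin forms of the same Cyrillic surname (Чехов), builds a diacritic-free, case-insensitive search key for each, and shows that the keys still differ. The helper name `fold_diacritics` and the choice of sample name are assumptions made for illustration only.

```python
import unicodedata


def fold_diacritics(value):
    """Drop combining marks, e.g. to build a diacritic-insensitive search key."""
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same surname as two different schemes would render it (entered by hand).
forms = {
    "ISO 9:1995": "Čehov",
    "ALA-LC": "Chekhov",
}

# Folding diacritics and case does not reconcile the two schemes.
keys = {scheme: fold_diacritics(form).casefold() for scheme, form in forms.items()}
```

Here `keys` holds two different strings, so a query built from one form will not match the other in a simple index. Folding also collapses distinct transliterated letters such as 'è', 'ë', and 'e' into one, which is why a provider who drops diacritics can no longer convert values back to the original script.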

ISO 9:1995 has strong features to recommend it, not least of which is a universal 1:1 mapping of letters, whereby each Cyrillic character is represented by one unique equivalent Latin character. Though this means that the system relies on multiple diacritics, it is also the only Cyrillic transliteration scheme that supports complete reverse transliteration (and thus in a sense serves to retain the original script values). It is intended to be completely language independent.

Chinese and Arabic scripts are each transliterated by only one content provider, so there is as yet no discernible variation. Arabic script is transliterated using ISO 233-2:1993, a relatively recent simplified transliteration scheme developed for the purposes of indexing bibliographic data. Chinese script is transliterated using ISO 7098:1982, also known as the Pinyin Scheme. Pinyin (in its updated form ISO 7098:1991) is not only the most up-to-date ISO Chinese transliteration standard, but is also the official scheme used by the Chinese government. Both are strong candidates for recommendation.

In general, when considering a transliteration scheme, the following features should be analyzed: the possibility for retro-conversion into the original script; lack of ambiguity; reliance on diacritics; and widespread use. For those in a position to select a transliteration scheme, HOPE currently recommends the most up-to-date ISO standards.
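The retro-conversion property of ISO 9:1995 can be sketched in a few lines. The table below covers only the 33 lowercase letters of the modern Russian alphabet, so it is a partial illustration rather than a full implementation of the standard; the names `ISO9_RUSSIAN`, `to_latin`, and `to_cyrillic` are assumptions made for this example.

```python
import unicodedata

# Partial ISO 9:1995 table: lowercase modern Russian letters only.
_RAW_TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ", "ъ": "ʺ",
    "ы": "y", "ь": "ʹ", "э": "è", "ю": "û", "я": "â",
}

# Normalize to NFC so every key and value is a single precomposed code point.
ISO9_RUSSIAN = {
    unicodedata.normalize("NFC", cyr): unicodedata.normalize("NFC", lat)
    for cyr, lat in _RAW_TABLE.items()
}

_TO_LATIN = str.maketrans(ISO9_RUSSIAN)
# Because the mapping is 1:1, inverting the table gives the reverse conversion.
_TO_CYRILLIC = str.maketrans({lat: cyr for cyr, lat in ISO9_RUSSIAN.items()})


def to_latin(value):
    """Transliterate lowercase Russian text; other characters pass through."""
    return unicodedata.normalize("NFC", value).translate(_TO_LATIN)


def to_cyrillic(value):
    """Reverse the transliteration produced by to_latin."""
    return unicodedata.normalize("NFC", value).translate(_TO_CYRILLIC)
```

With this table, `to_cyrillic(to_latin(value))` returns the original value for lowercase Russian text. The round trip holds only when the diacritics are kept and the value contains no genuine Latin letters, since those would be converted to Cyrillic on the way back.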

''The storage of transliterated values.'' The above leads to the question of whether to store parallel transliterated and non-transliterated values. Of the four HOPE content providers who transliterate from non-Latin scripts, only one saves the original script forms. As with the choice of transliteration schemes, it is clear that practice depends in part on the multi-script support of domain standards and national union catalogs. It is also clear that the use of a transliteration scheme that allows for retro-conversion, such as ISO 9:1995, makes the decision not to store original script values less onerous.

Nevertheless, it remains the case that search and discovery carried out on material in a non-Latin script is seriously impeded by the existence of transliterated metadata in a variety of forms. And again, as content is disseminated through more services to a wider audience, the reliance on local or outdated transliteration practice becomes inappropriate. The storage of original script values enables: 1) a direct and unambiguous search in the original language of the material; and 2) the possibility to re-transliterate data if and when transliteration schemes are updated or preferred standards change.

Thus, HOPE recommends that, if feasible within the current cataloging practice, content providers retain values in original script. In addition to the transliterated elements listed above (Title, Creator, Contributor, Subject), HOPE particularly recommends original script values for non-transliterated elements related to the content or coverage of the material, notably the Description, Provenance, Spatial Coverage, and any non-normalized Rights and Language values (the element labels for these vary according to the domain profile). Feasibility should be considered in light of the following:

  • ''Internal Context:'' in-house language/alphabet policies; professional standards followed (specifically, library and archival content and structural standards and authorities); copy cataloging practices; technical environment (e.g. support of character sets);
  • ''External Context:'' language/transliteration practices for external material potentially presented and searched with the material; language/transliteration practice followed by national institutions or similar profile institutions; union catalog, harvester, portal rules and guidelines;
  • ''Target User Context:'' typical research and search patterns in the field; technical environment.

In sum, HOPE suggests that when possible, titles and descriptions should be recorded in the original language of the material. For non-Latin script metadata, titles, names, and concepts should be transliterated for appropriate display in common results lists and added cross searching value, but these values should also be retained in original script. Values which relate to the content or coverage of, and access to, the resource should also be retained in original script. When choosing a transliteration scheme, as a rule HOPE supports the most up-to-date ISO schemes. It is clear that within the scope of a project like HOPE, multilingual challenges cannot be completely overcome. Many will also have to be confronted at the level of indexing and display. Nevertheless, with a few minor changes in practice, access to HOPE descriptive metadata can be greatly improved and cross searching enhanced.

HOPE's strength lies in the flexibility of its data model and its ability to capture metadata in a common schema while retaining domain-specific and linguistic nuances. HOPE metadata currently reaches the acceptance levels for most broad-based portals and discovery services. And yet HOPE still strives for the next level, the integration of content into a coherent corpus. During the initial phases, HOPE has accomplished this through the normalization of several key values. It has also attempted to guide content providers in the supply of data beyond the minimal requirements. It remains to discuss the key role of common authority files for cross searching HOPE's multi-domain multilingual collections.

Related Resources

Europeana. ''Europeana Semantic Elements Specification and Guidelines''. July 2013. (https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/T…)

''Europeana Professional'' (http://pro.europeana.eu)

HOPE: Heritage of the People's Europe. "Section: Data Model." ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011. (http://www.peoplesheritage.eu/pdf/D2_2_Metadata%20Structure.pdf)

HOPE: Heritage of the People's Europe. "Section: XML Schema." ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011. (http://www.peoplesheritage.eu/pdf/D2_2_Metadata%20Structure.pdf)

HOPE: Heritage of the People's Europe. "Appendix B: HOPE Audiovisual Profile (prototype)." ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011.

HOPE: Heritage of the People's Europe. ''HOPE Archival Profile Mapping Table, Version 1.5''. August 2012.

HOPE: Heritage of the People's Europe. ''HOPE Dublin Core Profile Mapping Table, Version 1.5''. August 2012.

HOPE: Heritage of the People's Europe. ''HOPE Library Profile Mapping Table, Version 1.5''. August 2012.

HOPE: Heritage of the People's Europe. ''HOPE Visual Profile Mapping Table, Version 1.5''. August 2012.

ISO (International Organization for Standardization). ''Language codes - ISO 639''. (https://www.iso.org/iso-639-language-codes.html)

International Council on Archives (ICA). ''ISAD(G): General International Standard Archival Description, Second Edition''. Ottawa: 2000. (https://www.ica.org/sites/default/files/CBPS_2000_Guidelines_ISAD%28G%2…)

ISO (International Organization for Standardization). ''ISO 8601: Data Elements and Interchange Formats - Information Interchange - Representation of Dates and Times, Third Edition''. 2004. (https://www.iso.org/standard/40874.html)

Unicode. ''ISO 15924 Registration Authority''. (http://www.unicode.org/iso15924)


This section last updated July 2013. Content is no longer maintained.