A review of the HOPE Content Providers Surveys conducted at the outset of the project reveals that HOPE content providers have thus far concerned themselves primarily with traditional cataloging and processing and with the compilation of descriptive metadata. A few store structural metadata, generally as links from metadata records to files in a directory structure on a file server or, in one case, in a fairly complex system of metadata and object storage tied together by an Excel-based integration file.
''Identifiers'' in some form are used by all content providers to store and manage local metadata and objects, but only three content providers had implemented globally persistent identification systems at the outset of the project. One used the German namespace of the national library URN system (urn:nbn:de:), with registration and resolver services hosted by the Deutsche Nationalbibliothek, for its metadata records. Another uses the Handle System for records in its DSpace repository, though to implement Handles for its HOPE digital objects and metadata it still has to register and administer a second naming authority. A third indicated that it used an unspecified PID system internally. The French content providers likewise indicated that they might need to adopt ARK identifiers in line with the practice of the Bibliothèque nationale de France. Use of PIDs was thus limited at the start of the project: three institutions initially expressed unwillingness to implement PIDs, and three others were undecided.
Digital object file names, on the other hand, served an important role as identifiers within the HOPE institutions, and file naming conventions were generally well developed. For the overwhelming majority of HOPE content providers, root file names were based on the standards of their institutional domain (library or archival) in an attempt to map digital objects and files to physical collections. Such solutions included straightforward references to physical archival units, inventory numbers or library call numbers, media and format specifications, or repository IDs. A few content providers used automatically generated 'meaningless' names, such as random numeric codes.
File and directory naming systems depended on several factors: whether the object is born digital or was digitized; how much flexibility the cataloging software allows; whether there are IT system or network limitations; whether the files are exchanged; and what options the back-up or storage system allows. In all cases the goal was the same: to uniquely identify files within the institution. Many institutions relied heavily on file names and directory paths as structural metadata within their digital object management systems. This is a dubious practice: the Open Archival Information System (OAIS) reference model suggests that such 'packaging information' is by nature transitory and cannot serve as content information or preservation description information.
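To illustrate the alternative, the following minimal Python sketch records structural metadata explicitly instead of inferring it from file names and directory paths. The class names `FileRecord` and `StructMap` and identifiers such as `f-0001` are hypothetical. The point is that order and labels are keyed on stable file identifiers, so moving or renaming a file (changing its packaging information) leaves the object's structure intact.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    file_id: str   # stable identifier assigned once, never derived from the path
    location: str  # current storage path: transitory "packaging information"


@dataclass
class StructMap:
    """Explicit structural metadata: order and labels keyed on file_id, not on paths."""
    object_id: str
    divisions: list[tuple[str, str]] = field(default_factory=list)  # (label, file_id)

    def add(self, label: str, file_id: str) -> None:
        """Append a division; list order is the reading order of the object."""
        self.divisions.append((label, file_id))

    def resolve(self, files: dict[str, FileRecord]) -> list[tuple[str, str]]:
        """Return (label, current location) pairs in the recorded order."""
        return [(label, files[fid].location) for label, fid in self.divisions]
```

If a file is later moved, only its `location` changes; the structural map itself needs no update.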
''Technical Metadata'' was scarce. Three content providers stored fixity information: one stored file checksums generated with the MD5 algorithm, while the other two did not specify the method or algorithm used. Beyond this, three content providers suggested that representation information was stored along with objects, but only one gave details, stating that it provided links to external viewers for audiovisual material. Not surprisingly, the content provider with the most extensive audiovisual collection and a dedicated audiovisual repository stored the most explicit technical information, including file size, file creation date, width, height, and bits per pixel. Finally, several content providers stored the location of masters on tape or other long-term storage media.
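For reference, generating an MD5 checksum at ingest and re-checking it later takes only a few lines. The following is a minimal Python sketch using the standard `hashlib` module; the function names `md5_checksum` and `verify_fixity` are illustrative and not part of any HOPE tool.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, recorded: str) -> bool:
    """Recompute the checksum and compare it with the value recorded at ingest."""
    return md5_checksum(path) == recorded.strip().lower()
```

MD5 is adequate for detecting accidental corruption, but it is not collision-resistant. `hashlib.sha256` is a drop-in alternative. Protection against deliberate tampering, however, requires digital signatures rather than checksums alone.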
It is notable that, although many tools are available to generate this information, no content provider mentioned recording the file format separately from the file extension. Nor did any explicitly mention capturing technical metadata related to the creation of the digital object. Finally, though a few content providers used checksums, none used digital signatures.
''Rights Metadata'' is recorded by about half of the content providers. Although no detailed explanation was given, it can be assumed that most store information on the copyright owner in a free-text field. Two also mentioned storing additional information on restrictions (donor restrictions or data protection), at least one of them in a dedicated metadata field. Only one content provider reported using Creative Commons (CC) licenses, and only for a special project. In sum, most content providers used a combination of technical means (e.g. watermarking, low-resolution copies, or other limits to online access) and basic free-text metadata to control access to and use of their collections.
''Digital Provenance Metadata'' was not routinely collected by most content providers. Two had implemented repositories that changed the file name upon submission. One of these packaged files in a proprietary format and created a new file name without saving the name of the submitted file; the other renamed the files but also stored the submitted file name. The in-house repositories of three content providers created audit trails of activity, and one of these supported full-scale versioning of both metadata and objects. For these three institutions, only activities that took place outside the repositories, such as the initial creation of digital objects, went untracked; the range of functionality supported by their repositories thus determined the scope of their digital provenance metadata. For the majority of content providers, little explicit digital provenance metadata was recorded. This picture may, however, be overstated. A combination of informal documentation, such as scan logs, and tacit knowledge of quality control, transformation, and migration policies and procedures likely exists and could be used to create standard metadata should the need arise. FileZilla or other FTP clients may also facilitate event tracking.
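The difference between the two renaming repositories above (discarding versus keeping the submitted file name) can be sketched as follows. This minimal Python example assumes a repository that assigns UUID-based names on ingest. The classes `Event` and `StoredFile` and the function `ingest` are hypothetical. The sketch keeps the submitted name and appends each action to a simple audit trail, roughly the minimal digital provenance that the better-equipped repositories recorded.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import PurePath


@dataclass
class Event:
    event_type: str  # e.g. "ingest", "derivative creation", "migration"
    timestamp: str   # ISO 8601, UTC
    agent: str       # person or software responsible for the action
    detail: dict = field(default_factory=dict)


@dataclass
class StoredFile:
    stored_name: str
    submitted_name: str  # kept so that the rename is not lost
    events: list[Event] = field(default_factory=list)

    def log(self, event_type: str, agent: str, **detail) -> None:
        """Append an audit-trail entry stamped with the current UTC time."""
        now = datetime.now(timezone.utc).isoformat()
        self.events.append(Event(event_type, now, agent, detail))


def ingest(submitted_name: str, agent: str) -> StoredFile:
    """Assign a system-generated name (keeping the extension) and record both names."""
    stored_name = uuid.uuid4().hex + PurePath(submitted_name).suffix
    record = StoredFile(stored_name, submitted_name)
    record.log("ingest", agent, submitted_name=submitted_name, stored_name=stored_name)
    return record
```

Later actions, such as derivative creation, are appended with `record.log(...)`, so the trail covers everything that happens inside the repository from submission onward.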
More worrying, perhaps, was the number of content providers that outsourced digitization and even repository functions. Eleven of the thirteen content providers noted that they outsourced digitization at least some of the time. This is not troubling in and of itself, but it does mean that they risk losing the technical and provenance data on the creation of digital masters and derivatives. Whereas information on the software and hardware used to create objects is relatively easy to retrieve for digitization undertaken in house, even after a lapse of some years, it may be nearly impossible to obtain from external vendors after the fact. Several institutions likewise noted dependence on an external or parent organization for their technical infrastructure. Though this is not necessarily a disadvantage (often quite the contrary), it may be an additional obstacle to the collection and storage of standardized administrative, and particularly digital provenance, metadata.
Recommendations for the HOPE Federated Repositories
As noted in Collecting Administrative Metadata, for the moment digital object management need only be undertaken insofar as it supports the HOPE service’s specific functions:
- To submit, store, and make available over the medium term digital masters and/or digital derivatives;
- To ensure the fixity and integrity of objects after submission to the system;
- To deliver objects in a form that can be rendered in the online environment and understood by the designated community;
- To clarify, record, and implement the access and use rights and restrictions over our content;
- And to store information in a manner that will not preclude later preservation activities.
Given the complex nature of the HOPE system and underlying services, it is best to approach general recommendations by identifying the function of each HOPE module and ensuring that the digital object management is undertaken to support this function.
The primary function of the ''HOPE Aggregator'' is to store and disseminate descriptive information and the related Dissemination Information Packages, in other words to deliver objects in a form that can be rendered in the online environment and understood and used by the designated community. It is therefore recommended that the Aggregator maintain sufficient reference, representation, and context information to allow a digital derivative object to be accessed, rendered, and used by the designated community, as well as the rights data to support such access and use.
Highly recommended, administrative and other metadata on:
- Identifiers that are globally unique and resolvable on the web ''(description, object, or file level)'';
- File format and size of access derivative ''(file level)'';
- Copyright, licenses, or other use restrictions ''(description or object level)'';
- Use, role, or variant of access derivative (e.g. derivative 2, preview, thumbnail) ''(object level)'';
- Original material type and language (this is generally counted as descriptive metadata but may also serve as representation information) ''(description level)'';
- Granularity of item (e.g. document, periodical issues, set or 'file/folder' of documents) ''(description level)'';
- Structural metadata ''(object or file level)'';
- Access restrictions ''(description or object level)''.
Also recommended, administrative metadata on:
- Physical characteristics which inhibit access ''(description or object level)'';
- Access facilitators (e.g. time coding) ''(object or file level)''.
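One way to put the highly recommended Aggregator list into practice is a simple completeness check over a record. The sketch below is a hypothetical Python illustration. The field names, and the nesting of a record into `description`, `object`, and `file` levels, are assumptions and not a HOPE schema. Because the list allows several items to be recorded at more than one level, a field counts as present if it appears at any of its permitted levels.

```python
# Hypothetical field names for the highly recommended items, each mapped to
# the level(s) at which the Aggregator list allows it to be recorded.
HIGHLY_RECOMMENDED: dict[str, set[str]] = {
    "identifier": {"description", "object", "file"},
    "file_format": {"file"},
    "file_size": {"file"},
    "rights": {"description", "object"},
    "variant": {"object"},
    "material_type": {"description"},
    "language": {"description"},
    "granularity": {"description"},
    "structure": {"object", "file"},
    "access_restrictions": {"description", "object"},
}


def missing_fields(record: dict[str, dict]) -> list[str]:
    """Return highly recommended fields not filled in at any permitted level.

    Falsy values (None, "", [], 0) are treated as missing.
    """
    return sorted(
        name
        for name, levels in HIGHLY_RECOMMENDED.items()
        if not any(record.get(level, {}).get(name) for level in levels)
    )
```

A record that supplies everything except rights and language information would be reported as missing `["language", "rights"]`.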
The primary function of the ''HOPE Shared Object Repository (SOR)'' is to ingest, store, and make available over the medium term digital master objects, as well as to support object transformation (i.e. derivative creation). It is therefore recommended that the Shared Object Repository store the reference information needed to manage files and objects, the representation information necessary to create derivatives in the formats required by portals, and the representation information needed to make objects accessible to the designated community. It is also recommended that the Shared Object Repository store fixity information to support routine quality control, and provenance information that tracks transformations within the system itself.
Highly recommended, administrative and other metadata on:
- Object identifiers for masters and derivatives that are globally unique ''(file and possibly object level)'';
- Fixity of masters ''(file level)'';
- Viewers and players that are not readily available to the average user, specifically AV players (links to external viewers or embed links would also suffice) for masters and derivatives ''(file or object level)'';
- File format of masters and derivatives ''(file level)'';
- File size for masters and derivatives ''(file level)'';
- Use, role, or variant of object (e.g. master, derivative 1, derivative 2, preview, thumbnail) ''(object level)'';
- Audit trail logging transactions with files from the point of submission ''(file and object level)'';
- Structural metadata ''(file and object level)'';
- Access restrictions ''(object level)''.
Also recommended, administrative metadata on:
- Format version and registry information for masters ''(file level)'';
- Fixity of derivatives ''(file level)'';
- Submitted master file name, if changed ''(file level)''.
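The file-level recommendations for the SOR can likewise be expressed as a small data model. This minimal Python sketch is not a HOPE specification. The `SORFile` class, its field names, and the two validation rules are assumptions that mirror the highly recommended items and the provenance goal stated above. The first rule requires masters to carry fixity. The second requires derivatives to point to the file they were derived from.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Use, role, or variant of a file, following the examples in the list above."""
    MASTER = "master"
    DERIVATIVE_1 = "derivative 1"
    DERIVATIVE_2 = "derivative 2"
    PREVIEW = "preview"
    THUMBNAIL = "thumbnail"


@dataclass
class SORFile:
    file_id: str                          # globally unique identifier
    role: Role
    file_format: str                      # e.g. a MIME type such as "image/tiff"
    file_size: int                        # in bytes
    md5: Optional[str] = None             # fixity: enforced below for masters only
    format_version: Optional[str] = None  # recommended for masters
    submitted_name: Optional[str] = None  # only if the repository renamed the file
    derived_from: Optional[str] = None    # file_id of the source of a derivative

    def __post_init__(self) -> None:
        # Assumed validation rules; adjust to local policy.
        if self.role is Role.MASTER and not self.md5:
            raise ValueError(f"{self.file_id}: masters must carry a checksum")
        if self.role is not Role.MASTER and not self.derived_from:
            raise ValueError(f"{self.file_id}: derivatives must name their source file")
```

Derivative fixity is left optional because the list above only recommends it, rather than highly recommending it.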
The ''HOPE-Compliant Local Object Repositories (LORs)'' generally serve a range of functions, which may encompass those above. In the case of non-SOR users, local repositories may support ingest and storage of master files and derivative creation as well as routine quality control. LORs may also support some of the Aggregator functions on their own local sites or may independently export content to portals. For these, they should follow the recommendations above.
It is highly recommended that all HOPE LORs be able to produce, when needed, the reference, representation, and context information (e.g. identifiers, file formats and sizes, viewers and players, structural metadata, language, type, and granularity) that is necessary to represent objects in an online environment for the designated community, as well as the rights information that supports access and use. LORs will also play a key role in any future preservation activities, so care should be taken to collect and store relevant technical metadata on the events in the digital life cycle of each object from the moment of its creation. (Note that there are currently no hard requirements or recommendations on whether and how such information is stored, only that LORs should be able to produce it if needed.)
Related Resource
Consultative Committee for Space Data Systems. ''Reference Model for an Open Archival Information System''. CCSDS 650.0-M-2 Magenta Book. Washington D.C.: NASA, 2012. (https://public.ccsds.org/pubs/650x0m2.pdf)