Technical Framework: Systems and Practices

The introduction of networked systems in social history institutions dates back to the 1990s. In the wake of large-scale library automation, such institutions were eager to convert their card indexes or paper-based finding aids into electronic catalogs, and to this end introduced specialist or standard collection management systems. Over time, other databases were gradually brought in to supplement these collection management systems. These allowed institutions to manage information about material of a single media type, collection, project, or exhibition, particularly for archival collections (where item-level data has traditionally been scarce) and visual collections (where descriptive standards remain scarce). Notably, almost 50 percent of the digital items slated for submission to the HOPE Social History Resource were described with idiosyncratic descriptive metadata, including approximately 85 percent of the archival items. Likewise, not a single museum or visual descriptive standard was used by HOPE content providers.

In some cases, another layer has been added recently, as systems were brought in to manage a burgeoning supply of digitized content. In other cases, digital content sits on file servers under loose controls, from where it is pulled into descriptive databases or pushed directly to websites. These home-grown, patchwork information systems are now ubiquitous in the cultural heritage domain, and even more so in social history institutions, where, given their relatively small-scale and heterogeneous collections, powerful enterprise management software has failed to take off.

The legacy of data structures and systems in use at social history institutions has not only led to an outdated and often expensive information architecture but has also at times obstructed the introduction of best practices. Despite the widespread acceptance of library and archival descriptive standards (and their respective XML schemas) and the growing importance of preservation standards, legacy systems are often difficult to adapt. Institutions that depend on in-house or open source solutions often lack the technical and professional know-how to keep abreast of changing practice. Those using proprietary solutions are locked into the data structures supported by their service providers. In all cases, strong institutional habits bind these organizations firmly to existing practice, however outdated.

It is therefore not surprising that none of the HOPE partner institutions has managed to implement a fully functional preservation repository along the lines of the Open Archival Information System (OAIS) model. (In fact, one of the clearest conclusions that can be drawn from the survey data is a general lack of consensus over the concept of 'digital repository'.) Across the partnership, and also within individual institutions, an interesting mix of proprietary, open source, and custom-built solutions co-exists. Manual processing and workarounds are often used to compensate for the lack of a truly integrated system architecture.

Digital Object Management Systems

Only three HOPE content providers have full-scale digital repositories: FMS's Westbrook Fortis, SSA's IMS Server-Client, and VGA's M-Box. All three are, to some extent, proprietary solutions. FMS manage their development in-house, while SSA and VGA depend on service providers to maintain and develop their systems. While none of the three systems supports the full range of preservation functions, all include some ingest, storage, and access functions (validation of size and/or formats, fixity checks, derivative creation, access controls, and collection and storage of technical, structural, and provenance metadata). Perhaps more surprisingly, all three hold descriptive metadata within the system rather than linking to existing collection management systems; since all three institutions are archives, granular metadata may not be available elsewhere. The fact that all three solutions are proprietary may limit their ability to interoperate with external services, and currently none of them does. In the case of Westbrook Fortis, data is also 'locked in' to a proprietary file format.
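Fixity checks of the kind listed above usually work by recording a checksum for each file at ingest and recomputing it later to detect silent corruption. The following is a minimal Python sketch under that assumption; the function names, the choice of SHA-256, and the file name are illustrative and are not taken from Westbrook Fortis, IMS, or M-Box.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks so large masters need little memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Compare a freshly computed checksum with the value recorded at ingest."""
    return compute_checksum(path, algorithm) == expected.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.tif"  # hypothetical digital master
        master.write_bytes(b"example image bytes")
        recorded = compute_checksum(master)  # stored alongside the metadata at ingest
        print(verify_fixity(master, recorded))
        master.write_bytes(b"corrupted bytes")  # simulate bit rot or an accidental overwrite
        print(verify_fixity(master, recorded))
```

In a real repository the recorded checksum would live in the technical or provenance metadata, and the check would run on a schedule rather than on demand.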

Other institutions, such as IISG, FES (Library), and OSA, have experimented with open source digital repository software (Fedora, MyCoRe, and DSpace respectively) for special projects and non-HOPE collections, in the case of OSA and IISG to handle born-digital content. In all cases, the software has been configured to handle some ingest, archival storage, and data management functions. Interestingly, none of these institutions has yet extended these solutions to manage its entire collection. The majority of HOPE content providers continue to store digital content on file servers or in collection management systems. Responses to the question "If no digital object repository is currently in place at your organization, how do you store and manage your digital content?" were largely variations on the same theme. Six content providers explicitly mentioned storing content on file servers; of these, three mentioned linking to the files from metadata records. Two store digital content directly in their collection management systems. Seven content providers routinely back up digital masters to tape, storage devices, or servers; several depend on a larger umbrella organization to perform this task.

Three content providers indicated that they have service level agreements with external providers, but more use proprietary collection management systems for at least a subset of their material: Adlib, Aleph, Alexandrie, Arkhéïa, Flora, FAUST, and Geac are used singly or in some combination by five institutions. Such solutions may hinder the standardization of metadata across partner institutions and obstruct effective bridging with external services. Three content providers use open source solutions supported by local governments and professional associations. CGIL use the Italian implementation of UNESCO's CDS-ISIS, called CDS-ISIS Teca. FES (Library) use Allegro-c, developed and supported by the Science and Culture Ministry of Lower Saxony and used widely throughout Germany. SSA use Nebis, a Swiss library union catalog. Two content providers have recently introduced international open source library platforms, Greenstone and Koha. Such open source solutions offer more flexibility in their service packages than proprietary solutions, though integrating them with other systems would still require staff time, expertise, and possibly a commitment to the community development process.

Digital Object Management Workflows

Turning to high-level digital object management workflows, it is clear that many of the smaller organizations, likely as a result of limited financial resources and staffing, rely heavily on service providers or umbrella organizations for ICT support and infrastructure, and many likewise outsource digitization. Not surprisingly, institutions with more service dependencies tend to have more straightforward internal workflows, with fewer loops, redundancies, and extra manual steps. Larger institutions and those with a more varied collection profile tend to have more idiosyncratic, flexible, and 'organically' developed internal workflows. In these cases, scanning is mostly run by in-house staff with in-house technical support, while large-scale digitization is outsourced to vendors. In contrast, institutions with digital repositories tend to have more standardized digitization and more uniform workflows. Diagrams 1-E, 1-F, and 1-G show the high-level digital object management workflows of the three French content providers.

Diagram 1-E - HOPE Content Provider Génériques' Digital Object Management Workflow: digitization is outsourced; description is managed through the proprietary software Arkhéïa with indexing and web publication through an open source extension Pleade; internal workflows are highly dependent on manual procedures.

Diagram 1-F - HOPE Content Provider MSH-Dijon's Digital Object Management Workflow: digitization is managed in-house, complicating workflows; description, indexing, and web publication are also based on Arkhéïa/Pleade; the university infrastructure means fewer manual procedures and more robust storage and backup.

Diagram 1-G - HOPE Content Provider BDIC's Digital Object Management Workflow: digitization is both internal and external; collection management, indexing, and web publication are managed through the integrated solution Flora, thus cutting steps; links from the French union catalogs, SUDOC and Calames, require synchronization with Flora; university infrastructure provides robust storage and data backup; work is focused around an Excel-based 'integration file' to manage disparate activities.

HOPE Recommendations

Given institutions' understandable investment in, and attachment to, legacy systems, HOPE can only give a gentle nudge towards best practice. As a first step, institutions should become familiar with the core functional entities (in OAIS terms) of a preservation repository system (ingest, archival storage, data management, and access) and with the processes each of these includes. By articulating the full range of functions more clearly, an institution may come to a better understanding of what a digital repository is and what it is not. Institutions are advised to prioritize functions according to current need and available resources and to introduce them gradually into their local architectures.
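To make the prioritization step concrete, an institution might keep a simple inventory of OAIS functions and record which are already in place. The sketch below is a hypothetical Python example: the process names are a subset of those described in the OAIS reference model (which also defines Administration and Preservation Planning, omitted here to match the four entities named above), and the numeric priority scale is an assumption, not part of OAIS.

```python
from dataclasses import dataclass

# The four OAIS functional entities named in the text, each with a few of the
# processes the reference model describes (a subset, for illustration only).
OAIS_FUNCTIONS: dict[str, list[str]] = {
    "ingest": ["receive submission", "quality assurance", "generate AIP"],
    "archival storage": ["receive data", "error checking", "disaster recovery"],
    "data management": ["administer database", "perform queries", "generate report"],
    "access": ["coordinate access activities", "generate DIP", "deliver response"],
}


@dataclass
class Gap:
    entity: str
    process: str
    priority: int  # hypothetical scale: lower number = more urgent


def find_gaps(implemented: set[str], priorities: dict[str, int]) -> list[Gap]:
    """List processes not yet in place, ordered by the institution's priorities."""
    gaps = [
        Gap(entity, process, priorities.get(entity, 99))  # unranked entities go last
        for entity, processes in OAIS_FUNCTIONS.items()
        for process in processes
        if process not in implemented
    ]
    return sorted(gaps, key=lambda g: (g.priority, g.entity, g.process))


if __name__ == "__main__":
    # Hypothetical institution: some functions already handled by existing systems.
    already_done = {"receive submission", "disaster recovery", "perform queries", "deliver response"}
    needs = {"archival storage": 1, "ingest": 2, "access": 3, "data management": 4}
    for gap in find_gaps(already_done, needs):
        print(gap.priority, gap.entity, "->", gap.process)
```

Even a list this simple can make visible which functions an existing collection management system already covers and which would need a dedicated repository component.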

When developing a repository system from scratch, HOPE recommends a loosely coupled, modular set of components, whether packaged as a single system or as a stack of applications. In general, HOPE advises the use of open source solutions as the strongest protection against data lock-in in its various forms. Currently, the open source solution Fedora (Flexible Extensible Digital Object Repository Architecture) meets these requirements well and is put forward for those with the technical capacity to develop and support it. Institutions that lack in-house technical expertise should consider outsourcing the development of Fedora or another open source package; this need not cost more than a typical service package. In general, institutions are advised to avoid monolithic, one-size-fits-all solutions and to focus instead on building a system from distinct elements, combining open source and proprietary components if necessary. Finally, for small institutions that lack funding and technical know-how, distributed or federated services shared among like-minded institutions may be a good alternative to profit-driven service providers.
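One way to read 'loosely coupled' is that each component depends only on a narrow interface rather than on a particular product. The following Python sketch assumes a hypothetical two-method storage contract (`put`/`get`); the class and function names are illustrative and do not reflect Fedora's or any vendor's actual API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Minimal contract any archival storage component must satisfy (hypothetical)."""

    def put(self, identifier: str, data: bytes) -> None: ...
    def get(self, identifier: str) -> bytes: ...


class InMemoryStorage:
    """Toy backend for testing; keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, identifier: str, data: bytes) -> None:
        self._objects[identifier] = data

    def get(self, identifier: str) -> bytes:
        return self._objects[identifier]


class FileSystemStorage:
    """Backend that writes each object to a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, identifier: str, data: bytes) -> None:
        (self.root / identifier).write_bytes(data)

    def get(self, identifier: str) -> bytes:
        return (self.root / identifier).read_bytes()


def ingest(storage: StorageBackend, identifier: str, data: bytes) -> str:
    """Store an object and return its SHA-256 checksum for the metadata record.

    Only the put/get contract is used, so backends can be swapped independently.
    """
    storage.put(identifier, data)
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for backend in (InMemoryStorage(), FileSystemStorage(Path(tmp) / "store")):
            ingest(backend, "item-001", b"scanned page")
            print(type(backend).__name__, backend.get("item-001") == b"scanned page")
```

Because `ingest` relies only on the interface, replacing the file-system backend with another (for example, one wrapping a repository platform) would not require changes to the ingest code.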

As it stands, intentions regarding the use of open source technologies are clear: the survey revealed a strong preference for open source applications and open formats. In practice, however, several content providers that have explicitly committed to using open source technologies to develop digital repositories (OSA, IISG, and Amsab-ISG) are still in the research and pilot phase, while those that have actually set up digital repositories have opted for proprietary systems. This may be a worrying trend, and HOPE recommends that institutions take a longer look at open source alternatives and at applications with a modular, loosely coupled architecture.

More problematic may be the fact that institutions have not clearly articulated the need for dedicated repositories as separate from collection management systems. HOPE content providers have either a collection management system or a digital repository but not both. For the most part, collection management systems, many proprietary but some open source, remain the focal point of institutional workflows and technical development. Until institutions begin to clearly distinguish digital object management from their more familiar collection management, Trusted Digital Repository best practices may remain elusive.

Related Resources

Bradley, Kevin, Junran Lei, and Chris Blackall. ''Towards an Open Source Repository and Preservation System: Recommendations on the Implementation of an Open Source Digital Archive and Preservation System and on Related Software Development''. Paris: UNESCO, 2007.

Consultative Committee for Space Data Systems. ''Reference Model for an Open Archival Information System''. CCSDS 650.0-M-2 Magenta Book. Washington D.C.: NASA, 2012. (https://public.ccsds.org/pubs/650x0m2.pdf)

Jantz, Ronald, and Michael J. Giarlo. "Digital Preservation: Architecture and Technology for Trusted Digital Repositories." ''D-Lib Magazine'' 11, no. 6 (June 2005). (http://www.dlib.org/dlib/june05/jantz/06jantz.html)


This section last updated July 2013. Content is no longer maintained.