DRS Preservation Policy and Practice

Harvard University Library (HUL) Digital Repository Service (DRS)

Preservation Policies and Practices

Rev. 4 – 2007-04-11

The Harvard University Library (HUL) operates the Digital Repository Service (DRS) as a preservation and access repository for library-like digital assets.

1. The primary obligation of the DRS is to manage digital assets and ensure their usability over time.

2. The emphasis on usability implies that the object of preservation is the underlying information content of the digital asset as exposed to patrons through specific behaviors.

3. The unit of preservation interest is the digital object, an expression of abstract information content that is tangibly manifested in one or more formatted digital files.

4. The baseline preservation function of the DRS includes system backup, disaster planning and recovery, monitoring and risk assessment, and the maintenance of the bit-level integrity of all managed digital objects. Note, however, that these activities alone may not be sufficient to ensure usability over time.

5. All managed digital objects receive the highest level of preservation service that is supportable given their formal characteristics, the degree to which those characteristics are documented in metadata, and current technical understanding of the digital environment.

6. Preservation services beyond the baseline functions, such as preservation planning and intervention, are performed on a “best effort” basis, subject to managerial considerations of staff availability, the allocation of resources for other HUL priorities, and cost.

7. To achieve necessary operational efficiencies as the scale of the managed assets grows over time, preservation monitoring, assessment, and planning are performed at the aggregate, rather than the unit, level. Preservation intervention, on the other hand, always occurs at the unit level.

8. The mechanism for aggregation is the content model, which defines a class of digital objects that share a particular arrangement of:

Format, role, and relational structure

Characterization of these technical properties by metadata

Patron expectations for behavior

9. Content models are periodically assessed with respect to their technical ability to facilitate usability over time. The results of these assessments are expressed in terms of a tripartite classification scheme:

Preferable – Content models most conducive to preserving the usability of underlying content and behavior over time

Acceptable – Content models somewhat conducive to preserving usability over time

Problematic – Content models least conducive to usability over time

The primary questions involved in this assessment are focused on:

a. Rendering availability – How easily available and widely deployed are appropriate processing tools?

b. Preservation confidence – Primarily expressed in terms of:

Imminence – Will preservation intervention be needed sooner rather than later?

Loss – When preservation intervention becomes necessary, is the potential for loss of information content or experiential behavior large or small?

Cost – When preservation intervention becomes necessary, is the degree of resource expenditure high or low?

This assessment is based in part on a consideration of the formats used in a particular content model. Consequently, all formats also undergo a similar assessment process using the same classification scheme. The granularity and completeness of metadata are also an important part of this assessment, as are behavioral considerations.

10. DRS staff issue periodic best practice recommendations for digital asset creation and acquisition. These recommendations provide guidance regarding the selection of content models most conducive to preservation.

11. Conformance to DRS best practice recommendations and the use of the HUL integrated infrastructure for digital content management and delivery will result in a higher level of preservation confidence than would otherwise be the case.
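As a rough illustration of principles 7 through 9, the following Python sketch represents a content model as a record of formats, roles, and expected behavior that carries one of the three assessment classes. It then groups objects by content model for aggregate-level monitoring and lists the individual objects that would be candidates for unit-level intervention. All class names, field names, and identifiers are hypothetical; this is not the DRS data model, and the policy does not prescribe how an assessment value is arrived at.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Assessment(Enum):
    """The tripartite classification scheme named in principle 9."""
    PREFERABLE = "Preferable"
    ACCEPTABLE = "Acceptable"
    PROBLEMATIC = "Problematic"


@dataclass(frozen=True)
class ContentModel:
    """Hypothetical record for a class of objects sharing format, role, and behavior."""
    name: str                 # illustrative label, not a DRS identifier
    formats: tuple[str, ...]  # formats of the constituent files
    roles: tuple[str, ...]    # roles those files play within the object
    behavior: str             # patron expectation for behavior
    assessment: Assessment    # result of the periodic assessment


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical unit of preservation: one object tied to one content model."""
    identifier: str
    content_model: ContentModel


def group_by_content_model(objects):
    """Aggregate view: map each content model name to the identifiers of its objects."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj.content_model.name].append(obj.identifier)
    return dict(groups)


def intervention_candidates(objects):
    """Unit view: identifiers of objects whose content model is assessed as Problematic."""
    return [
        obj.identifier
        for obj in objects
        if obj.content_model.assessment is Assessment.PROBLEMATIC
    ]
```

Monitoring and planning would operate on the output of `group_by_content_model`, while any intervention would be applied object by object to the list returned by `intervention_candidates`. Treating only Problematic models as candidates is a simplifying assumption of this sketch.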

Content models and behavior

The use of content models is necessary to clarify the varying planning and intervention efforts that must be carried out for the DRS to meet its preservation obligations for digital content that is otherwise indistinguishable solely on the basis of format. The inclusion of behavior as a component of content model definition recognizes that all digitally encoded information needs to be rendered, i.e., transformed into humanly sensible form, in order for that information to be usable. (Note that rendering is distinct from delivery, which involves merely providing a copy of the digitally encoded information, i.e., the “bits,” to a client-side agent.) While rendering always takes place on the client side, there may be significant pre-rendering server-side steps in the process, for example, AIP-to-DIP conversion. Client-side rendering can be performed by agents that are commonly deployed, e.g., web browsers, or by highly specialized agents particular to niche communities.

[Figure: AIP-to-DIP conversion paths from the server-side repository (DRS storage and management, then access) to the client-side agent. Each row reads: object type: AIP formats -> access service (conversion) -> DIP formats -> client-side agent.]

Image: TIFF -> IDS (Package) -> TIFF -> Image processing tool / Photoshop

Image: TIFF, JPEG -> IDS (Select) -> JPEG -> Web browser

Image: TIFF, JP2 -> IDS (Transform) -> HTML, JPEG -> Web browser

Page-turned object: XML, TIFF, ASCII -> PDS (Transform) -> HTML, GIF -> Web browser

BIL image stack: XML, TIFF -> ADS (Package) -> XML, TIFF -> Biomedical volumetric viewer

For example, these five instances of TIFF files will probably be “preserved” in several different ways, depending on the nature of the server-side AIP-to-DIP conversion (some conversions are transformative, others are not) and on the type of client-side agent responsible for rendering.
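The five paths in the figure can be written down as data to make the point concrete. The Python sketch below is one reading of the figure, with hypothetical type and field names. It assumes that the conversions labeled "Transform" are the transformative ones and that "Package" and "Select" are not; the figure itself does not state this explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPath:
    """One row of the figure: how an AIP reaches a client-side agent (illustrative)."""
    object_type: str
    aip_formats: tuple[str, ...]
    service: str        # access service named in the figure
    conversion: str     # "Package", "Select", or "Transform"
    dip_formats: tuple[str, ...]
    client_agent: str


PATHS = (
    DeliveryPath("Image", ("TIFF",), "IDS", "Package",
                 ("TIFF",), "Image processing tool"),
    DeliveryPath("Image", ("TIFF", "JPEG"), "IDS", "Select",
                 ("JPEG",), "Web browser"),
    DeliveryPath("Image", ("TIFF", "JP2"), "IDS", "Transform",
                 ("HTML", "JPEG"), "Web browser"),
    DeliveryPath("Page-turned object", ("XML", "TIFF", "ASCII"), "PDS", "Transform",
                 ("HTML", "GIF"), "Web browser"),
    DeliveryPath("BIL image stack", ("XML", "TIFF"), "ADS", "Package",
                 ("XML", "TIFF"), "Biomedical volumetric viewer"),
)


def is_transformative(path: DeliveryPath) -> bool:
    """Assumption of this sketch: only 'Transform' conversions change the content."""
    return path.conversion == "Transform"


def preservation_contexts(paths):
    """Distinct (conversion, client agent) pairs: a proxy for distinct preservation treatments."""
    return {(path.conversion, path.client_agent) for path in paths}
```

Every path stores a TIFF in its AIP, yet `preservation_contexts(PATHS)` yields more than one combination of conversion and rendering agent, which is why format alone does not determine the preservation approach.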
