DRS Preservation Policy and Practice


Harvard University Library (HUL) Digital Repository Service (DRS)

Preservation Policies and Practices

Rev. 4 – 2007-04-11

The Harvard University Library (HUL) operates the Digital Repository Service (DRS) as a preservation and access repository for library-like digital assets.

1. The primary obligation of the DRS is to manage digital assets and ensure their usability over time.

2. The emphasis on usability implies that the object of preservation is the underlying information content of the digital asset as exposed to patrons through specific behaviors.

3. The unit of preservation interest is the digital object, an expression of abstract information content that is tangibly manifested in one or more formatted digital files.

4. The baseline preservation function of the DRS includes system backup, disaster planning and recovery, monitoring and risk assessment, and the maintenance of the bit-level integrity of all managed digital objects. Note, however, that these activities alone may not be sufficient to ensure usability over time.

5. All managed digital objects receive the highest level of preservation service that is supportable given their formal characteristics, the degree to which those characteristics are documented in metadata, and current technical understanding of the digital environment.

6. Preservation services beyond the baseline functions, such as preservation planning and intervention, are performed on a “best effort” basis, subject to managerial considerations of staff availability, the allocation of resources for other HUL priorities, and cost.

7. To achieve necessary operational efficiencies as the scale of the managed assets grows over time, preservation monitoring, assessment, and planning are performed at the aggregate, rather than the unit, level. Preservation intervention, on the other hand, always occurs at the unit level.

8. The mechanism for aggregation is the content model, which defines a class of digital objects that share a particular arrangement of:

   - Format, role, and relational structure
   - Characterization of these technical properties by metadata
   - Patron expectations for behavior

9. Content models are periodically assessed with respect to their technical ability to facilitate usability over time. The results of these assessments are expressed in terms of a tripartite classification scheme:

   - Preferable – Content models most conducive to preserving the usability of underlying content and behavior over time
   - Acceptable – Content models somewhat conducive to preserving usability over time
   - Problematic – Content models least conducive to usability over time

The primary questions involved in this assessment are focused on:

a. Rendering availability – How easily available and widely deployed are appropriate processing tools?

b. Preservation confidence – Primarily expressed in terms of:

   - Imminence – Will preservation intervention be needed sooner rather than later?
   - Loss – When preservation intervention becomes necessary, is the potential for loss of information content or experiential behavior large or small?
   - Cost – When preservation intervention becomes necessary, is the degree of resource expenditure high or low?

This assessment is based in part on a consideration of the formats used in a particular content model. Consequently, all formats also undergo a similar assessment process using the same classification scheme. The granularity and completeness of metadata are also an important part of this assessment, as are behavioral considerations.

10. DRS staff issue periodic best practice recommendations for digital asset creation and acquisition. These recommendations provide guidance regarding the selection of content models most conducive to preservation.

11. Conformance to DRS best practice recommendations and the use of the HUL integrated infrastructure for digital content management and delivery will result in a higher level of preservation confidence than would otherwise be the case.
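As an illustration only, the content model of item 8, the classification scheme of item 9, and the aggregate-versus-unit distinction of item 7 could be represented along the following lines. This is a minimal sketch, not a description of how the DRS is implemented. The names ContentModel, Assessment, classify, and objects_needing_review, the 0-to-2 scoring scale, and the rule that the least favorable score determines the rating are all hypothetical assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    """The tripartite classification scheme from item 9."""
    PREFERABLE = "Preferable"
    ACCEPTABLE = "Acceptable"
    PROBLEMATIC = "Problematic"


@dataclass(frozen=True)
class ContentModel:
    """A class of digital objects sharing formats, metadata, and behavior (item 8)."""
    name: str
    formats: tuple         # formats in the model, simplified here to format names
    metadata_schema: str   # how the technical properties are characterized
    behavior: str          # what patrons expect to be able to do


@dataclass(frozen=True)
class Assessment:
    """Hypothetical scores, each from 0 (unfavorable) to 2 (favorable)."""
    rendering_availability: int  # are processing tools widely deployed?
    imminence: int               # 2 = intervention unlikely to be needed soon
    loss: int                    # 2 = small potential loss on intervention
    cost: int                    # 2 = low resource expenditure on intervention


def classify(assessment: Assessment) -> Rating:
    """Assumed rule: the least favorable score determines the rating."""
    worst = min(
        assessment.rendering_availability,
        assessment.imminence,
        assessment.loss,
        assessment.cost,
    )
    if worst >= 2:
        return Rating.PREFERABLE
    if worst == 1:
        return Rating.ACCEPTABLE
    return Rating.PROBLEMATIC


def objects_needing_review(objects, ratings):
    """Assessment is per content model; intervention targets individual objects.

    objects: iterable of (object_id, ContentModel) pairs
    ratings: dict mapping ContentModel to Rating
    """
    return [oid for oid, model in objects if ratings[model] is Rating.PROBLEMATIC]


# Hypothetical content models and holdings, for illustration only.
still_image = ContentModel("still-image", ("TIFF",), "example-schema", "view")
legacy = ContentModel("legacy-model", ("OLDFMT",), "example-schema", "view")

ratings = {
    still_image: classify(Assessment(2, 2, 2, 2)),
    legacy: classify(Assessment(0, 1, 1, 1)),
}
holdings = [("obj-1", still_image), ("obj-2", legacy), ("obj-3", legacy)]
flagged = objects_needing_review(holdings, ratings)
```

The point of the sketch is the division of labor: ratings are computed once per content model, however many objects belong to it, while the list of flagged identifiers is what a unit-level intervention would act on.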

Content models and behavior

The use of content models is necessary to clarify the varying planning and intervention efforts that must be carried out for the DRS to meet its preservation obligations for digital content that is otherwise indistinguishable solely on the basis of format. The inclusion of behavior as a component of content model definition recognizes that all digitally encoded information needs to be rendered, i.e., transformed into humanly sensible form, in order for that information to be usable. (Note that rendering is distinct from delivery, which involves merely providing a copy of the digitally encoded information, i.e., the “bits,” to a client-side agent.) While rendering always takes place on the client side, there may be significant pre-rendering server-side steps in the process, for example, AIP-to-DIP conversion. Client-side rendering can be performed by agents that are commonly deployed, e.g., web browsers, or by highly specialized agents particular to niche communities.

[Figure: five example content models that include TIFF files (three image models, a page-turned object, and a BIL image stack). For each, the figure shows the AIP formats held in DRS storage and management, the server-side access service (IDS, PDS, or ADS) that packages, selects, or transforms the AIP into a DIP, the DIP formats, and the client-side agent that renders the result (an image processing tool such as Photoshop, a web browser, or a biomedical volumetric viewer).]

For example, these five instances of TIFF files will probably be “preserved” in several different manners depending on the nature of the server-side AIP-to-DIP conversion, some of which are transformative while others are not, and the type of client-side agent responsible for rendering.
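The dependence of preservation treatment on content model rather than on format alone can be sketched as a small lookup. This is an illustrative sketch under stated assumptions, not the DRS design: the class DeliveryPath, the helper functions, and the parenthetical model labels are hypothetical, the entries are a simplified reading of the examples in the figure, and treating only "transform" conversions as transformative is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPath:
    """One route from stored AIP to client-side rendering (hypothetical model)."""
    content_model: str
    aip_formats: tuple
    conversion: str      # "package", "select", or "transform"
    dip_formats: tuple
    client_agent: str

    @property
    def transformative(self) -> bool:
        # Assumption: packaging and selection deliver stored bits unchanged,
        # while a transform produces a new encoding for delivery.
        return self.conversion == "transform"


# Illustrative entries only; labels in parentheses are invented for the example.
PATHS = [
    DeliveryPath("image (master only)", ("TIFF",), "package",
                 ("TIFF",), "image processing tool"),
    DeliveryPath("image (with deliverable)", ("TIFF", "JPEG"), "select",
                 ("JPEG",), "web browser"),
    DeliveryPath("page-turned object", ("XML", "TIFF", "ASCII"), "transform",
                 ("HTML", "GIF"), "web browser"),
    DeliveryPath("BIL image stack", ("XML", "TIFF"), "package",
                 ("XML", "TIFF"), "biomedical volumetric viewer"),
]


def paths_for_format(fmt):
    """All delivery paths whose AIP includes the given format."""
    return [p for p in PATHS if fmt in p.aip_formats]


def formats_to_monitor(path):
    """Formats whose obsolescence would affect usability along this path."""
    return set(path.aip_formats) | set(path.dip_formats)


tiff_paths = paths_for_format("TIFF")
transformed = [p.content_model for p in tiff_paths if p.transformative]
```

Every entry holds TIFF in its AIP, yet the formats and client agents that must remain viable differ from one path to the next, which is why format alone is not a sufficient basis for preservation planning.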
