Digital Preservation at HUL & DRS 2

DRS 2
one in a series of periodic updates
Harvard University Library
Andrea Goethals
October 21, 2009
DRS = Digital Repository Service
Agenda
1. DRS 2 context
2. DRS vs DRS 2
3. Current work: DRS 2.1
4. Next set of work: DRS 2.2
5. Questions & comments
1. DRS 2 context …
HUL’s Digital Preservation Program
- A continuation of HUL’s mission to provide current and future access to research materials and resources, with recognition that preserving access to digital content requires different strategies, tools and skills
- Centerpiece of the preservation program: the DRS

Shapers of the DRS
- Digital Preservation Community
  - Best practices, standards, lessons learned, experiments
  - Collaborative projects, member organizations, interest groups, meetings, conferences, correspondence, conversations, shared tool development
- Harvard needs
  - Increasing amount of digital content
  - New formats and genres, born-digital material
  - DRS growth has been fueled by large projects…
    - Require services to store, preserve, manage, make discoverable, etc.
    - Bring new requirements
  - Support changing user expectations
    - Print on demand, e-readers
DRS growth
[Chart: DRS size in TB (axis 0 to 120), June 2001 through June 2009]
10/1/09: 118 TB in the DRS (Counting all backups: 378 TB)
2. DRS vs DRS 2
DRS
- Set of professionally managed services
[Diagram: services spanning the content lifecycle, from creation/acquisition to use]
- preservation planning & activities, administration, management tools
- creation & format guidelines, training, ingest service
- storage & monitoring service
- delivery services, access restrictions, persistent names
DRS 2
- Same services, but much improved
[Diagram: the same four service groups as on the DRS slide, from creation/acquisition to use]
DRS 2
- Improvements
[Diagram: improvements to each service group, from creation/acquisition to use]
- revamped management tools, adding reporting, more preservation planning
- more guidelines, acceptance of more formats and metadata
- richer data model, more robust and scalable storage system, better monitoring and recovery processes
- additional access restrictions, redundant delivery servers, additional delivery services
3. Current work: DRS 2.1
DRS 2.1 Scope
1. Redesign of conceptual foundation
   - Modified data model
   - Content models
   - Object descriptors
   - New and different metadata schemas
2. Release to a QA environment
   - New and enhanced tools for creation and deposit of objects for depositor testing
Modified Data Model
- Current DRS: file level
  - All metadata is associated at the file level
    - Even if the same metadata applies to a group of files
  - All management has to be done at the individual file level
    - Non-intuitive and unwieldy
DRS 2: adding 2 more levels
- objects
- (files)
- bitstreams
Objects?
- Aggregations of files that together represent a coherent unit of content
  - All the files that make up a single digital book
  - All the master and use copies representing a single photograph
- Useful for management, reporting and searching
  - “How many PDS document objects do I have in the DRS?”
- Hook for new metadata
  - Administrative categories (projects, exhibits, collections, etc.)
  - Descriptive metadata, catalog records
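To make the three levels concrete, here is a minimal sketch of how objects, files and bitstreams might nest. The class and field names are illustrative assumptions, not the actual DRS 2 schema; the point is only that metadata shared by a group of files can be stated once at the object level, and that counting objects by type becomes a simple query.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    """A subset of a file that carries its own metadata."""
    label: str
    metadata: dict = field(default_factory=dict)


@dataclass
class File:
    """A single stored file; in the current DRS this is the only level."""
    name: str
    metadata: dict = field(default_factory=dict)
    bitstreams: list = field(default_factory=list)


@dataclass
class DRSObject:
    """An aggregation of files representing one coherent unit of content."""
    content_model: str  # hypothetical field holding the object type
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


def count_objects(objects, content_model):
    """Answer questions like 'How many PDS document objects do I have?'"""
    return sum(1 for obj in objects if obj.content_model == content_model)


book = DRSObject(
    content_model="PDS document",
    metadata={"collection": "example collection"},  # stated once for all pages
    files=[File("page1.tif"), File("page2.tif")],
)
photo = DRSObject(
    content_model="Still image",
    files=[File("master.tif"), File("use_copy.jpg")],
)
print(count_objects([book, photo], "PDS document"))
```

With file-level metadata only, the collection name would have to be repeated on every page file; here it lives on the object.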
Bitstreams?
- A subset of a file
- Hooks for metadata that apply to part but not all of the file
  - To characterize the audio portion of a video file
  - To describe the contents of a ZIP file
- Allow fine-grained description and management
- May save storage space
  - some types of content can remain compressed and still be described
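The ZIP case can be illustrated with Python's standard zipfile module, which reads a container's directory without extracting its members. This is only a sketch of the idea; the record layout and the function name describe_zip_members are assumptions, not how DRS 2 records bitstream metadata.

```python
import io
import zipfile


def describe_zip_members(zip_bytes):
    """Return one descriptive record per member without extracting anything."""
    records = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            records.append({
                "name": info.filename,
                "size": info.file_size,               # uncompressed size in bytes
                "compressed_size": info.compress_size,
                "crc32": info.CRC,
            })
    return records


# Build a small in-memory ZIP so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("letter.txt", "Dear reader, " * 100)
    archive.writestr("notes.xml", "<notes/>")

records = describe_zip_members(buffer.getvalue())
for record in records:
    print(record)
```

Each record plays the role of a bitstream description: the ZIP stays compressed in storage while its parts are still individually described.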
Content models
- Object types
- Define
  - valid file formats and relationships
  - known delivery and rendering applications
  - associated assessments and preservation plans
- Enforce conformity - we know what we have
- Tie directly to technology watches and preservation plans

DRS 2.1 content models – deposit & delivery
1. Still image
   - Image objects, delivered by IDS
2. PDS document
   - Page-turned documents, delivered by PDS
3. Document
   - Initially just PDF files, delivered by FDS
4. Opaque
   - Files in any format
5. Text
   - Text, XML, etc. delivered by FDS
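One way to picture how a content model enforces conformity is a small registry that each deposit is checked against. The sketch below takes the model names and delivery services from the list above; the file-extension sets, the ContentModel class and the invalid_files function are illustrative assumptions, not the real DRS format rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentModel:
    name: str
    delivery_service: Optional[str]          # None where no service is named
    allowed_extensions: Optional[frozenset]  # None means any format is accepted


# Extension sets below are assumptions made for illustration only.
CONTENT_MODELS = {
    "Still image": ContentModel("Still image", "IDS", frozenset({".tif", ".jp2", ".jpg"})),
    "PDS document": ContentModel("PDS document", "PDS", frozenset({".tif", ".jp2", ".jpg", ".txt", ".xml"})),
    "Document": ContentModel("Document", "FDS", frozenset({".pdf"})),
    "Opaque": ContentModel("Opaque", None, None),
    "Text": ContentModel("Text", "FDS", frozenset({".txt", ".xml"})),
}


def invalid_files(model_name, filenames):
    """Return the filenames whose extension the content model does not accept."""
    model = CONTENT_MODELS[model_name]
    if model.allowed_extensions is None:
        return []
    return [
        name for name in filenames
        if not any(name.lower().endswith(ext) for ext in model.allowed_extensions)
    ]


print(invalid_files("Document", ["thesis.pdf", "thesis.docx"]))  # ['thesis.docx']
print(invalid_files("Opaque", ["anything.bin"]))                 # []
```

Because every object declares a model, a format watch or preservation plan can be attached to the model rather than to individual files.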
Object descriptors
- A METS metadata file per object on the file system alongside content files
  - Descriptive, administrative, preservation, technical and structural metadata
  - Describes the object, all its files and bitstreams and related significant events
  - Gives the metadata the same secure storage as the content files
  - Self-contained, portable objects
Peering into a METS object descriptor
- For the object
  - MODS
  - PdsMD (for PDS document objects)
- For the object, each file and bitstream
  - PREMIS
  - HulAdminMD
- For each applicable file and bitstream
  - MIX
  - TextMD
  - DocumentMD
  - …
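As a rough picture of what such a descriptor contains, the sketch below builds a skeletal METS document with Python's standard xml.etree.ElementTree: object-level MODS in a dmdSec, one file entry per content file, and a structMap tying them together. It is a minimal illustration under stated assumptions, not the DRS 2 descriptor layout; the PREMIS, HulAdminMD and per-file technical sections are left out, and the IDs and function names are made up for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_descriptor(title, filenames):
    mets = ET.Element(q(METS_NS, "mets"))

    # Object-level descriptive metadata, wrapped as MODS.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # One file entry per content file, plus a structMap pointing at each file.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), TYPE="object")
    for index, name in enumerate(filenames, start=1):
        file_id = f"FILE{index}"
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), ID=file_id)
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): name})
        ET.SubElement(div, q(METS_NS, "fptr"), FILEID=file_id)
    return mets


descriptor = build_descriptor("Example book", ["page1.tif", "page2.tif"])
print(ET.tostring(descriptor, encoding="unicode"))
```

Writing this XML next to page1.tif and page2.tif is what makes the object self-contained: the metadata travels with, and is stored as securely as, the content.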
Deposit tools
- Currently:
  - BatchBuilder
  - DRS Loader
- DRS 2.1:
  - Enhanced BatchBuilder
  - New! File Information Tool Set (FITS)
  - New! Object Tool Set (OTS)
  - Enhanced DRS Loader
  - New! DRS Services
Enhanced BatchBuilder
- Will build batches of objects rather than batches of files
- Will automatically determine all technical metadata (using FITS)
- Will automatically create all object descriptors (using OTS)
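The flow described above can be sketched as a small pipeline: a batch is a set of objects, technical metadata is derived per file, and one descriptor is produced per object. The functions below are hypothetical stand-ins written for this illustration; they do not call FITS or OTS, and the extension table is an assumption.

```python
import os
import tempfile

# Hypothetical extension table standing in for real format identification.
FORMAT_BY_EXTENSION = {
    ".tif": "image/tiff",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}


def technical_metadata(path):
    """Stand-in for the FITS step: derive metadata from the file itself."""
    extension = os.path.splitext(path)[1].lower()
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "format": FORMAT_BY_EXTENSION.get(extension, "unknown"),
    }


def build_object(object_name, paths):
    """Stand-in for the OTS step: one descriptor covering all files of an object."""
    return {"object": object_name, "files": [technical_metadata(p) for p in paths]}


def build_batch(objects):
    """Build a batch of objects from a mapping of object name to file paths."""
    return [build_object(name, paths) for name, paths in objects.items()]


with tempfile.TemporaryDirectory() as workdir:
    page = os.path.join(workdir, "page1.tif")
    with open(page, "wb") as handle:
        handle.write(b"\x00" * 16)
    batch = build_batch({"book-001": [page]})

print(batch)
```

The depositor supplies only the grouping of files into objects; everything else in the descriptor is derived automatically.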

DRS Services
- New back-end service to centralize and control access to DRS objects
  - Simplifies front-end applications
  - Secures content and metadata
DRS 2.1 services
1. Object ingest
2. File delivery
June 2010: QA release to depositors
- Depositors will be able to test new workflows in QA
  - New BatchBuilder and DRS Loader to create and deposit objects into the DRS
  - Enhanced IDS, FDS and PDS to view the deposited content
4. Next set of work: DRS 2.2
DRS 2.2 Scope
- DRS Web Admin
  - Easier discovery, batch updates, reporting, etc.
  - Repository administration and monitoring
- Additional content models
  - Audio, Web Harvest, Dark PDS Document, various Google, MOA2 document, Biomedical Image, Target Image and Email
- Improved audio support
  - MP3, MP4/AAC
  - BatchBuilder support
- Rights and access management metadata
  - Rights metadata stored in DRS with content
  - Analysis of need for more granular access restrictions
June 2011: Production release
- Creation, deposit and management of objects
- All delivery services integrated with the DRS Services
- All DRS files will have been migrated to objects

Many people in OIS working on DRS 2
- Digital Library Projects Group
- Systems Operations Group
- Systems Development Group
- Metadata Analyst

More information

HUL’s Digital Preservation Program
http://hul.harvard.edu/ois/digpres/

DRS 2 Enhancements
http://hul.harvard.edu/ois/systems/drs/enhancements.html

andrea_goethals@harvard.edu
5. Questions & Comments