PREMIS - Preservation Metadata: Implementation Strategies

Preservation Metadata:
Implementation Strategies
(PREMIS)
Rebecca Guenther
Library of Congress
rgue@loc.gov
IS&T Archiving Conference
April 28, 2005
Overview of presentation
Background to PREMIS
PREMIS membership and charge
Preservation repositories implementation survey
PREMIS Core elements group
• Development of data dictionary
• Data model
Next steps
Implementation issues
OCLC/RLG Preservation Metadata Framework Working Group
OCLC/RLG Preservation Metadata Working Group
• Convened March 2000
• Looked at CEDARS, NLA, NEDLIB, OCLC
Preservation metadata framework (June 2002)
• Synthesized elements from existing sets
• Based on OAIS information model
• Elaboration of OAIS
• Set of “prototype” preservation metadata elements
http://www.oclc.org/research/projects/pmwg/pm_framework.pdf
PREMIS
June 2003: OCLC/RLG sponsored new working group: PREMIS
• Preservation Metadata: Implementation Strategies
Need
• Practical and implementable, not broadly theoretical
• Independent of specific implementation
Objectives
• Define “core” set of preservation metadata elements, with supporting data dictionary, applicable to broad range of digital preservation activities
• Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata
Membership
Priscilla Caplan, FCLA (Chair)
Rebecca Guenther, LC (Chair)
Michael Alexander, British Library
George Barnum, GPO
Charles Blair, U. of Chicago
Olaf Brandt, U. of Göttingen
Adam Farquhar, British Library
David Gewirtz, Yale
Kevin Glavash, MIT/DSpace
Cathy Hartman, U. of N. Texas
Helen Hodgart, British Library
Nancy Hoebelheinrich, Stanford
Roger Howard/Sally Hubbard, Getty Museum
Pam Kircher, OCLC
John Kunze, Calif. Digital Library
Brian Lavoie, OCLC liaison
Robin Dale, RLG liaison
Vicky McCargar, LA Times
Jerry McDonough, NYU/METS
Evan Owens, JSTOR
Erin Rhodes, NARA
Madi Solomon, Walt Disney Co.
Angela Spinazze, ATSPIN
Günter Waibel, RLG
Lisa Weber, NARA
Robin Wendler, Harvard
Hilde van Wijngaarden, KB
Andrew Wilson, NAA
Advisory Committee
Howard Besser, UCLA
Liz Bishoff, OCLC (via Colorado Digitization Program)
Gerard Clifton, National Library of Australia
Gail Hodge, CENDI
Steve Knight, National Library of New Zealand
Maggie Jones, Digital Preservation Coalition
Nancy McGovern, Cornell
Cliff Morgan, Wiley UK
Richard Rinehart, U. of California, Berkeley
Implementation Survey Report
State of the art in winter 2003/2004
28 libraries, 7 archives, 3 museums, and 11 other
13 different countries; 45% from U.S.
38% in planning; 33% development; 46% production
Survey findings
Little experience with digital preservation
• Most didn’t have active preservation strategy
• Many not yet in production
• Cannot assess adequacy of metadata
Lack of common vocabulary and conceptual framework
• Informed by OAIS reference model
• Difference of opinion as to meaning of OAIS compliance
Metadata
• Many recording rights, provenance, technical, administrative, descriptive and structural
Most repositories serve goals of both preservation and access
Trends
Store metadata redundantly in XML or relational database and with content data objects
Use METS for structural metadata and as container for descriptive and administrative; MIX for images
Use OAIS as framework and starting point
Maintain multiple versions (originals, some normalized or migrated) in repository with complete metadata for all versions
Choose multiple strategies for digital preservation
Core Elements
Mission: Define a core set of implementable preservation metadata elements.
• Information that supports and documents the digital preservation process;
• Information that supports the viability, renderability, understandability, identity and authenticity of digital objects over time.
• What most working preservation repositories are likely to need to know
• Core does not imply mandatory
• As rigorous as possible
• As much explanation as possible
• Implementation neutral -- “This is what you have to know”
• Values can be automatically supplied and processed -- no lengthy textual descriptions
Core Elements: Data Model
[Diagram: Intellectual Entities, Rights, Objects, Agents, Events]
Scope of data dictionary
Implementation independent
Descriptive metadata out of scope
Metadata about Agents is limited
Technical metadata applying to all or most format types
Media or hardware details are limited
Business rules are essential for working repositories, but not covered
Rights information for preservation actions, not access
Sample data dictionary entry
Semantic unit: size
Semantic components: None
Definition: The size in bytes of the file or bitstream stored in the repository.
Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.
Data constraint: Integer
Object category: Representation; File; Bitstream
Applicability: Not applicable (Representation); Applicable (File); Applicable (Bitstream)
Examples: 2038927
Repeatability: Not repeatable (File); Not repeatable (Bitstream)
Obligation: Optional (File); Optional (Bitstream)
Creation/Maintenance notes: Automatically obtained by the repository.
Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.
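The rationale in the sample entry (checking that the correct number of bytes came back from storage) can be sketched in a few lines of Python. This is only an illustration, not part of the PREMIS deliverables: the function name verify_size and its arguments are hypothetical, and the recorded value stands in for the size semantic unit.

```python
def verify_size(recorded_size: int, retrieved: bytes) -> bool:
    """Return True when the retrieved content has the recorded number of bytes.

    `recorded_size` plays the role of the `size` semantic unit (an integer
    count of bytes). The function itself is a hypothetical helper.
    """
    if recorded_size < 0:
        raise ValueError("size must be a non-negative integer number of bytes")
    return len(retrieved) == recorded_size


# A 12-byte object checked against a matching and a non-matching recorded size.
content = b"hello PREMIS"
print(verify_size(12, content))       # recorded size matches the retrieved bytes
print(verify_size(2038927, content))  # example value from the entry; does not match
```

Because the unit is fixed as bytes, the comparison needs no unit conversion, which is the point made in the usage notes.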
Semantic units pertaining to Objects
objectIdentifier
preservationLevel
objectCategory
objectCharacteristics
creatingApplication
originalName
storage
environment
signatureInformation
relationship
linkingEventIdentifier
linkingIntellectualEntityIdentifier
linkingPermissionStatementIdentifier
objectCharacteristics
compositionLevel
fixity
size
format
significantProperties
inhibitors
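Several of the objectCharacteristics values can be derived from the stored bytes, in line with the goal that values be automatically supplied. The Python sketch below is a hedged illustration only: the class and function names are hypothetical, the split of fixity into an algorithm and a digest is an assumption, the meaning given to a composition level of 0 is an assumption, and significantProperties and inhibitors are left out because they cannot be computed this way.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ObjectCharacteristics:
    # Field names mirror the semantic units on the slide; the types are assumptions.
    composition_level: int
    fixity_algorithm: str
    fixity_digest: str
    size: int
    format_name: str


def characterize(content: bytes, format_name: str) -> ObjectCharacteristics:
    """Derive the automatically obtainable values from the stored bytes."""
    return ObjectCharacteristics(
        composition_level=0,  # assumed here to mean no compression or encryption layer
        fixity_algorithm="SHA-256",
        fixity_digest=hashlib.sha256(content).hexdigest(),
        size=len(content),  # size in bytes, as in the sample data dictionary entry
        format_name=format_name,  # supplied by the caller, e.g. from a format registry
    )


record = characterize(b"abc", "text/plain")
print(record.size, record.fixity_algorithm, record.fixity_digest)
```

Recomputing the digest later and comparing it with the stored one is the usual way such a fixity value would be used.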
Semantic units pertaining to Events
eventIdentifier
eventType
eventDateTime
eventDetail
eventOutcome
eventOutcomeDetail
linkingAgentIdentifier
linkingObjectIdentifier
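The Event semantic units listed above map naturally onto a small record type, with the two linking identifiers tying an event to Agents and Objects. The following Python sketch assumes a particular shape for illustration: the class, the helper events_for_object, the choice to allow several linking identifiers per event, and all identifier values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Event:
    # One attribute per semantic unit on the slide; types are assumptions.
    event_identifier: str
    event_type: str
    event_date_time: datetime
    event_outcome: str
    event_detail: str = ""
    event_outcome_detail: str = ""
    linking_agent_identifiers: List[str] = field(default_factory=list)
    linking_object_identifiers: List[str] = field(default_factory=list)


def events_for_object(events: List[Event], object_id: str) -> List[Event]:
    """Return the events whose linking object identifiers include object_id."""
    return [e for e in events if object_id in e.linking_object_identifiers]


# Hypothetical identifiers and values, for illustration only.
log = [
    Event("event-1", "ingest", datetime(2005, 4, 28, tzinfo=timezone.utc), "success",
          linking_agent_identifiers=["agent-1"],
          linking_object_identifiers=["object-1"]),
    Event("event-2", "fixity check", datetime(2005, 4, 29, tzinfo=timezone.utc), "success",
          linking_agent_identifiers=["agent-1"],
          linking_object_identifiers=["object-2"]),
]
print([e.event_identifier for e in events_for_object(log, "object-1")])
```

Keeping only identifiers in the event, instead of embedding the agent or object, keeps the entities separate and lets the same agent be referenced by many events.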
Semantic units pertaining to Agents
agentIdentifier
agentName
agentType
Semantic units pertaining to Rights
permissionStatement
permissionStatementIdentifier
relatedObject
grantingAgent
grantingAgreement
permissionGranted
act
restriction
termOfGrant
permissionNote
Next steps
PREMIS deliverables (May 2005)
• Data dictionary and report
• XML schemas
• Draft for experimentation to remain stable for a year
• Revisions will be based on results of testing
Follow-up activities
• Testbeds for implementation and exchange
• Community outreach
• Establish maintenance activity
• Consider formal standardization
Implementation considerations
Schema use with specific implementations (e.g. METS)
Machine generation of metadata
Tools
Role of registries (format, environment)
Prospects for collaboration and exchanging information content
Rights and permissions
Emergence of best practices
Support needed from PREMIS maintenance activity
For More Information:
PREMIS Web Site
• www.oclc.org/research/projects/pmwg
“Implementing Metadata in Digital Preservation Systems: The PREMIS Activity” D-Lib (April 2004)
• www.dlib.org/dlib/april04/lavoie/04lavoie.html
RLG DigiNews October 2004 and December 2004 issues
• www.rlg.org/en/page.php?Page_ID=12081
Priscilla Caplan: pcaplan@ufl.edu
Rebecca Guenther: rgue@loc.gov