PREMIS - Preservation Metadata: Implementation

Understanding and Implementing the PREMIS Data Dictionary for Preservation Metadata
Rebecca Guenther, Network Development & MARC Standards Office, Library of Congress
Preservation Metadata
Preservation metadata includes:
• Provenance: Who has had custody/ownership of the digital object?
• Authenticity: Is the digital object what it purports to be?
• Preservation Activity: What has been done to preserve it?
• Technical Environment: What is needed to render and use it?
• Rights Management: What IPR must be observed?
Makes digital objects self-documenting across time
[Figure: content over time: 10 years on, 50 years on, forever]
PREMIS Data Dictionary

• May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
• March 2008: PREMIS Data Dictionary for Preservation Metadata, version 2.0 (version 2.1 Jan. 2011)
  • Includes PREMIS Data Dictionary, context/assumptions, data model, usage examples
  • XML schema to support implementation
• Data Dictionary:
  • Comprehensive view of information needed to support digital preservation
  • Guidelines/recommendations to support creation, use, management
  • Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation
http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
What does PREMIS cover?
• Administrative metadata that supports the digital preservation process
• Provides information to help manage a resource for preservation purposes
  • Technical characteristics
  • Information about actions on an object
  • Relationships (structural and derivative)
    • Structural: indicates how compound objects are put together
    • Derivative: results of common preservation actions
  • Rights metadata associated with preservation
• In OAIS terms:
  • Metadata as part of SIP, AIP or DIP
  • Fits into Preservation Description Information (Reference, Context, Provenance, Fixity)
What PREMIS is and is not

What PREMIS is:
• Common data model for organizing/thinking about preservation metadata
• A checklist for core metadata in a repository
• Guidance for local implementations
• Standard for exchanging information packages between repositories
What PREMIS is not:
• Out-of-the-box solution: need to instantiate as metadata elements in repository system
• All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
• Lifecycle management of objects outside repository
• Rights management: limited to permissions regarding actions taken within repository
PREMIS Data Model
[Figure: the PREMIS data model, with five entities: Intellectual Entities, Objects, Events, Agents, Rights Statements]
Intellectual Entities

• Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
• May include other Intellectual Entities (e.g., a website that includes a web page)
• **Has one or more digital representations**
• Previously not fully described in PREMIS DD, but will be in scope in version 3.0
Examples:
• Rabbit Run by John Updike (a book)
• “Maggie at the beach” (a photograph)
• The Library of Congress Website (a website)
• The Library of Congress: American Memory Home page (a web page)
Objects



• Discrete unit of information in digital form
• **Objects are what the repository actually preserves**
• Three types of Object:
  • FILE: named and ordered sequence of bytes that is known by an operating system
  • REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
  • BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
• Intellectual entity will become another level of object
Examples:
• chapter1.pdf (a file)
• chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book w/3 chapters)
• TIFF file containing header and 2 images (2 bitstreams (images), each with own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, …)
Object Example: book in two versions
Intellectual Entity: Da Vinci Code by Dan Brown
Representation 1: Page image version
File 1: page1.tiff
File 2: page2.tiff
File N: pageN.tiff
Representation 2: ebook version
File N+1: METS.xml
File 1: book.lit
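The relationship between an Intellectual Entity, its Representations, and their Files can be sketched as a small data structure. This is only an illustration, not a PREMIS schema: the class names, the `all_file_names` helper, and the choice of three page images (standing in for the slide's "N") are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """A FILE object: a named sequence of bytes known to an operating system."""
    name: str


@dataclass
class Representation:
    """A REPRESENTATION object: the files that together render an Intellectual Entity."""
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """A single intellectual unit with one or more digital representations."""
    title: str
    representations: List[Representation] = field(default_factory=list)

    def all_file_names(self) -> List[str]:
        # Flatten every representation into one list of file names.
        return [f.name for r in self.representations for f in r.files]


# Three page images is an arbitrary stand-in for the slide's "N" pages.
page_images = Representation(
    "Page image version",
    [File(f"page{i}.tiff") for i in range(1, 4)],
)
ebook = Representation("ebook version", [File("book.lit")])
book = IntellectualEntity("Da Vinci Code", [page_images, ebook])

print(book.all_file_names())
```

A repository that manages only files, not representations, could keep just the `File` level of this sketch.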
Events


• An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
• Helps document digital provenance. Can track history of Object through the chain of Events that occur during the Object's lifecycle
• Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession)
• Determining which Events should be recorded, and at what level of granularity, is up to the repository
Examples:
• Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file
• Ingest Event: transform an OAIS SIP into an AIP
• Migration Event: create a new version of an Object in an up-to-date format
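Tracking an Object's history through its chain of Events can be sketched as follows. This is a minimal illustration under stated assumptions: the `Event` fields loosely mirror the kinds of information an Event carries (identifier, type, date, outcome, linked agent and object), while the `history` helper, the identifiers, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Event:
    identifier: str       # hypothetical local identifier
    event_type: str       # e.g. "ingest", "validation", "migration"
    date_time: datetime
    outcome: str
    linking_agent: str    # the Agent that carried out the action
    linking_object: str   # the Object the action involved


def history(events: List[Event], object_id: str) -> List[Event]:
    """Return the Events linked to one Object, oldest first."""
    related = [e for e in events if e.linking_object == object_id]
    return sorted(related, key=lambda e: e.date_time)


# Illustrative records only; dates and identifiers are made up.
events = [
    Event("evt-2", "validation", datetime(2011, 1, 2, tzinfo=timezone.utc),
          "valid", "JHOVE version 1.0", "chapter1.pdf"),
    Event("evt-1", "ingest", datetime(2011, 1, 1, tzinfo=timezone.utc),
          "success", "repository system", "chapter1.pdf"),
    Event("evt-3", "fixity check", datetime(2011, 1, 3, tzinfo=timezone.utc),
          "pass", "repository system", "chapter2.pdf"),
]

print([e.event_type for e in history(events, "chapter1.pdf")])
```

Note that the Agent is reachable from the Object only by way of the Event, which matches the indirect association between Agents and Objects in the data model.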
Agents


• Person, organization, or software program/system associated with an Event or a Right (permission statement)
• Agents are associated only indirectly to Objects through Events or Rights
• Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification
Examples:
• Martha Anderson (a person)
• Library of Congress (an organization)
• Dark Archive in the Sunshine State implementation (a system)
• JHOVE version 1.0 (a software program)
Rights Statements


• An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.
• Not a full rights expression language; focuses exclusively on permissions that take the form:
  • Agent X grants Permission Y to the repository in regard to Object Z.
Example:
• Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.
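The "Agent X grants Permission Y in regard to Object Z" form lends itself to a simple lookup before the repository acts. The sketch below is an assumption-laden illustration: the `RightsStatement` class, the `is_permitted` helper, and the act value "replicate" are chosen for the example and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RightsStatement:
    agent: str       # Agent X, who grants the permission
    act: str         # Permission Y, the action the repository may take
    object_id: str   # Object Z, the object the permission applies to


def is_permitted(statements: Iterable[RightsStatement], act: str, object_id: str) -> bool:
    """True if some statement grants this act for this object."""
    return any(s.act == act and s.object_id == object_id for s in statements)


# "replicate" is an illustrative label for "make copies".
statements = [
    RightsStatement("Priscilla Caplan", "replicate", "metadata_fundamentals.pdf"),
]

print(is_permitted(statements, "replicate", "metadata_fundamentals.pdf"))
print(is_permitted(statements, "migrate", "metadata_fundamentals.pdf"))
```

A fuller model would also carry restrictions (such as the limit of three copies) and the term of the grant.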
Technical metadata pertaining to objects
• Object identifier
• Preservation level
• Significant properties
• Object characteristics
  • fixity
  • format
  • size
  • creating application
  • inhibitors
  • object characteristics extension
• Original name
• Storage
• Environment
  • software
  • hardware
• Digital signatures
• Relationships
• Linking event identifier
• Linking permission statement identifier
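Several of these values, such as size and fixity, can be generated directly from the file. The following sketch computes them with the Python standard library; the `describe_file` function and the shape of the returned dictionary are assumptions for illustration, and SHA-256 is just one possible digest algorithm.

```python
import hashlib
import os
import tempfile


def describe_file(path: str, algorithm: str = "sha256") -> dict:
    """Collect original name, size in bytes, and a fixity digest for one file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "originalName": os.path.basename(path),
        "size": os.path.getsize(path),
        "fixity": {
            "messageDigestAlgorithm": algorithm,
            "messageDigest": digest.hexdigest(),
        },
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "chapter1.pdf")
    with open(sample, "wb") as handle:
        handle.write(b"%PDF-1.4 sample bytes")
    record = describe_file(sample)

print(record["originalName"], record["size"])
```

Recomputing the digest later and comparing it with the stored value is the basis of a fixity check Event.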
Semantic units pertaining to Events: provenance and preservation activity
• Event identifier
• Event type (e.g. capture, creation, validation, migration, fixity check)
• Event dateTime
• Event detail
• Event outcome
• Event outcome detail
• Linking agent identifier
• Linking object identifier
Semantic units pertaining to Rights
• Rights Statement
  • Rights Statement Identifier
  • Rights Basis
  • Copyright Information
  • License Information
  • Statute Information
  • Rights Granted
    • act
    • restriction
    • termOfGrant
    • rightsGrantedNote
  • Linking Object Identifier
  • Linking Agent Identifier
• rightsExtension
Semantic units pertaining to Agents







• Agent Identifier
• Agent Name
• Agent Type
• Agent Note
• Agent Extension
• Linking Event Identifier
• Linking Rights Identifier
PREMIS timeline
[Timeline figure, 2002 to 2011. Milestones shown: Metadata Framework for Digital Preservation; PREMIS Working Group formed; PREMIS Data Dictionary released (2005); UK Digital Preservation Award; Maintenance Activity formed; PREMIS Editorial Committee formed; PREMIS 2.0 released (2008); PREMIS Implementation Fairs; PREMIS 2.1 released (2011)]
The State of PREMIS
• De facto standard for preservation metadata; in some countries mandated for cultural heritage repositories
• PREMIS implementations are appearing in many places, many contexts, many forms
• Some experimentation is leading to changes in the data dictionary and schema
• PREMIS Implementation Fairs: attempts to consolidate implementation experiences, issues, best practices
PREMIS Maintenance Activity
• Web site:
  • Permanent Web presence, hosted by Library of Congress
  • Central destination for PREMIS-related info, announcements, resources
  • Home of the PREMIS Implementers’ Group (PIG) discussion list
• PREMIS Editorial Committee:
  • Set directions/priorities for PREMIS development
  • Coordinate future revisions of Data Dictionary and XML schema
  • Promote implementation
http://www.loc.gov/standards/premis/
PREMIS Editorial Committee membership







• Rebecca Guenther, Chair (Library of Congress)
• Yair Brama (ExLibris)
• Karin Bredenberg (Riksarkivet, Swedish National Archives)
• Priscilla Caplan (Florida Center for Library Automation)
• Angela Dappert (British Library)
• Angela Di Iorio (Fondazione Rinascimento Digitale)
• Markus Enders (British Library)
• Karsten Huth (Sächsisches Staatsarchiv)
• David Lake (US National Archives and Records Administration)
• Brian Lavoie (OCLC)
• Sébastien Peyrard (Bibliothèque nationale de France)
• Robert Sharpe (Tessella)
• Sally Vermaaten (Statistics New Zealand)
• Robert Wolfe (MIT/DSpace)
• Kate Zwaard (US Government Printing Office)
PREMIS activities

• Integration with other standards and efforts
  • Survey of PREMIS in METS profiles (D-Lib Magazine, Sept. 2010): http://www.dlib.org/dlib/september10/vermaaten/09vermaaten.html
• Extensibility: Add elements about extensions as in METS
  • US intelligence community extending for security classification
• PREMIS Documentation
  • Understanding PREMIS: Priscilla Caplan (2009)
    • Gentle introduction to the PREMIS standard
    • Spanish, German and Italian translations
  • PREMIS Data Dictionary for Preservation Metadata version 2.0: translation in Japanese and Spanish
• Workflows and registries
  • PREMIS Tools to facilitate automated workflows: PREMIS in METS toolkit made available as open source
  • PREMIS controlled vocabularies in id.loc.gov
• PREMIS OWL Ontology in development, soon to be released



Some implementers …








• DAITSS (Florida): a preservation repository for the use of the libraries of the public universities of Florida
• Ex Libris Rosetta: a commercial digital preservation system supporting acquisition, validation, ingest, storage, management, preservation and dissemination of different types of digital objects
• National Digital Newspaper Program
• Archivematica: comprehensive open-source digital preservation system
• National Archives of Sweden, National Archives of Scotland
• Carolina Digital Repository: repository for material in electronic formats produced by members of the University of North Carolina at Chapel Hill community
• British Library electronic journal archiving project
For more information see:
• http://www.loc.gov/premis/premis-registry.html
What does it mean to implement PREMIS?







• You are keeping preservation metadata that is defined in the PREMIS data dictionary as information you need to know to preserve digital objects
• There can be a phased approach to implementation in terms of which PREMIS entities to implement
• Most values can be extracted from the object or generated by a repository
• You don’t have to control all levels of objects; some may only manage files, not representations or bitstreams
• If you aren’t already, you should be planning to track actions on objects for future preservation activities (PREMIS events)
• You may or may not store data using METS as a container, but it is useful as a standard exchange package (SIP or DIP)
• PREMIS conformance statement was developed and is available
PREMIS in METS toolbox
• Developed by Florida Center for Library Automation under contract with LC
• Uses PREMIS in METS guidelines
• A set of open-source tools to support the implementation of PREMIS, especially in the METS container format
• 3 components: validate, convert, describe
• Source code available: http://pimtoolbox.sourceforge.net
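Describe: uses the DAITSS description service
• Input: a file (/a/real/file)
• Processing: droid/jhove
• Output: <premis> document including <ext> (extension) metadata
Convert: between PREMIS and PREMIS in METS (PREMIS to PREMIS in METS, or PREMIS in METS to PREMIS)
• Input: <premis/>
• Processing: xslt
• Output: <mets> wrapping <premis>
Validate: PREMIS in METS document
• Input: <mets> containing <premis/>
• Processing: Schematron
• Output: confirmation or errors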
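To make the idea of "PREMIS in METS" concrete, the sketch below wraps a PREMIS event element inside a METS administrative metadata section using Python's standard XML library. It is not how the toolbox works (the toolbox converts with XSLT), and several details are assumptions to check against the schemas and the PREMIS in METS guidelines: the PREMIS version 2 namespace, the `PREMIS:EVENT` metadata type, the section ID "DP1", and the deliberately incomplete event, which would not pass schema validation as written.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed here; verify against the METS and PREMIS v2 schemas.
METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap_event_in_mets(premis_event: ET.Element, section_id: str) -> ET.Element:
    """Place one PREMIS event inside mets/amdSec/digiprovMD/mdWrap/xmlData."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    amd_sec = ET.SubElement(mets, f"{{{METS_NS}}}amdSec")
    digiprov = ET.SubElement(amd_sec, f"{{{METS_NS}}}digiprovMD", {"ID": section_id})
    md_wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "PREMIS:EVENT"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_event)
    return mets


# A deliberately minimal event: only the event type is filled in.
event = ET.Element(f"{{{PREMIS_NS}}}event")
ET.SubElement(event, f"{{{PREMIS_NS}}}eventType").text = "validation"

mets_doc = wrap_event_in_mets(event, "DP1")
print(ET.tostring(mets_doc, encoding="unicode"))
```

Going the other way, from PREMIS in METS back to PREMIS, amounts to locating the `xmlData` elements and extracting their PREMIS children.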
Tools continued
• Id.loc.gov
  • Preservation events
  • Preservation level role
  • Cryptographic hash functions
• Additional vocabularies to be included soon
Conclusions

• PREMIS Data Dictionary provides a critical piece of reliable digital preservation infrastructure comprised of technology, standards, and best practice
• PREMIS was produced from an international, cross-domain, consensus-building process and is applicable to any preservation effort
• PREMIS Data Dictionary is a building block with which effective, sustainable digital preservation strategies can be implemented
• PREMIS Data Dictionary and the Maintenance Activity are tightly focused on implementation
• Preservation metadata will be crucial for the future even if it doesn’t enhance current access
URLs, etc.

PREMIS Maintenance Activity:
http://www.loc.gov/standards/premis/

PREMIS Data Dictionary for Preservation Metadata:
http://www.loc.gov/standards/premis/v2/premis-2-1.pdf

PREMIS Implementation Registry
http://www.loc.gov/standards/premis/premis-registry.php

PREMIS Implementers Group list
http://listserv.loc.gov/listarch/pig.html