PREMIS: To Be or Not To Be in My METS

ALA Annual 2013 ALCTS PARS
Intellectual Access to Preservation Metadata
PREMIS:
To Be or Not To Be in My METS
The Preservation Journey at the University of Connecticut Libraries
Digital Preservation
“Digital preservation combines
Policies,
Strategies, and
Actions
that ensure access to digital content over time.”
--ALA/ALCTS/PARS Short Definition
TRAC & PREMIS

Policies:
Compliance (if not certification) with TRAC/CCSDS 652.0-M-1 (“Magenta Book”)
Requires a set of policies
Strategies:
Fixity checking:
“4.4.1.2: The repository shall actively monitor the integrity of AIPs.”
Remotely replicated copies
Actions:
Fixity checking, at ingest and over time
Record fixity “hashes” or “message digests” in PREMIS
Replace fixity failures with good copies
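The three actions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Libraries' actual workflow: the file names, the choice of SHA-256, the helper names (sha256_of, premis_fixity, check_and_repair), and the use of the PREMIS 2.x namespace are all assumptions, since the slides do not specify them.

```python
import hashlib
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: PREMIS 2.x namespace; the slides do not name a PREMIS version.
PREMIS_NS = "info:lc/xmlns/premis-v2"


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def premis_fixity(digest):
    """Wrap a digest in a PREMIS <fixity> element for recording at ingest."""
    fixity = ET.Element(f"{{{PREMIS_NS}}}fixity")
    ET.SubElement(fixity, f"{{{PREMIS_NS}}}messageDigestAlgorithm").text = "SHA-256"
    ET.SubElement(fixity, f"{{{PREMIS_NS}}}messageDigest").text = digest
    return fixity


def check_and_repair(master, replica, recorded):
    """Check the master against the recorded digest; restore from the replica on failure.

    Returns "pass", "repaired", or "unrecoverable".
    """
    if sha256_of(master) == recorded:
        return "pass"
    if sha256_of(replica) == recorded:
        shutil.copyfile(replica, master)  # replace the failed copy with a good one
        return "repaired"
    return "unrecoverable"


def demo():
    """Ingest a file, corrupt it, and run fixity checks (hypothetical file names)."""
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.tif"
        replica = Path(tmp) / "replica.tif"
        master.write_bytes(b"archival master bytes")
        shutil.copyfile(master, replica)       # stands in for a remote replica
        recorded = sha256_of(master)           # digest recorded at ingest
        results = [check_and_repair(master, replica, recorded)]
        master.write_bytes(b"corrupted")       # simulate bit rot
        results.append(check_and_repair(master, replica, recorded))
        results.append(check_and_repair(master, replica, recorded))
        return results


if __name__ == "__main__":
    print(demo())  # ['pass', 'repaired', 'pass']
```

In a real repository the recorded digest would be read back from the stored PREMIS record rather than held in a variable, and the replica would live on separate storage.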
Current Landscape

Currently, the University of Connecticut Libraries relies on a number of solutions that incorporate various “levels” of preservation:
CONTENTdm
Digital Commons
Archivists’ Toolkit
Archivematica
UCL’s AMFS (Archival Master File Service)
In 2011, a new team, the Second Generation Digital Library Services Working Group (2G), was created to investigate alternatives to these solutions that would apply a more consistent preservation mission across all of the Libraries’ digital collections.
Timeline of Events for 2G
Fall 2011
• Creation of the Second Generation Digital Library Services Working Group
• Initial Questions on Metadata
Spring & Summer 2012
• Metadata Standards & Normalization
• Metadata & Our Content Model: Where Does Metadata Live?
Fall 2012
• Islandora
• Role of Metadata & Islandora
• METS Profile Registration
Spring & Summer 2013
• Playing with metadata
• Re-conceptualize Islandora as a presentation layer
Fall 2013 & Beyond
• Investigations into Handling PREMIS Events, RDF & Linked Open Data
Fall 2011


In the search for alternatives to our current solutions, we selected the Fedora Digital Repository
Began investigating ingest scenarios
How will content creators submit content and associated metadata?
How will metadata be structured and organized? Will content and metadata be in shareable and/or proprietary formats?
Where do content and associated metadata live in Fedora?
Do we work with SIPs and what would these SIPs look like?
Do we follow others and rely on METS? How do we use METS?
Spring & Summer 2012

UCL’s Content Model (CM) for Fedora

Needed to decide whether to “lump” or “split” our architecture
Decision was made to “split” our content model into 3 different levels of related Fedora Digital Objects
UCL’s CM, Metadata and Content

Our “atomistic” CM means a more “atomistic” approach to metadata
Grouping Object Level: the highest level, which groups like objects
Container Object Level: the type of a specific resource, such as an image vs. a letter
Media Object Level: the actual digital content, such as the JPG or PDF
Metadata can live at any of our 3 levels, that is, as a data stream in any Fedora Digital Object at the grouping, container, or media object level.
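One way to picture the three related levels is as Fedora objects linked through RELS-EXT relationships. The sketch below is hypothetical: the PIDs are invented, and the choice of the isMemberOf predicate from Fedora's relations-external ontology is an assumption, since the slides do not say which predicates the UCL content model uses.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora 3 relations-external ontology namespace.
REL_NS = "info:fedora/fedora-system:def/relations-external#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("fedora", REL_NS)

# Hypothetical PIDs: media object -> container object -> grouping object.
HIERARCHY = [
    ("demo:media-1", "demo:container-1"),
    ("demo:container-1", "demo:grouping-1"),
]


def rels_ext(pid, parent_pid):
    """Serialize a RELS-EXT data stream stating that pid isMemberOf parent_pid."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    ET.SubElement(
        description,
        f"{{{REL_NS}}}isMemberOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{parent_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


def build_hierarchy():
    """Return a mapping of child PID to its RELS-EXT XML for the demo hierarchy."""
    return {child: rels_ext(child, parent) for child, parent in HIERARCHY}


if __name__ == "__main__":
    for pid, xml in build_hierarchy().items():
        print(pid)
        print(xml)
```

Because each level is its own Fedora Digital Object, descriptive, technical, or preservation data streams can be attached to whichever of the three objects they describe.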
Metadata Standards and Normalization (METS)


We wanted the ability to process a variety of metadata (technical, descriptive, preservation, administrative, structural) at ingest, so we needed a way to “normalize” ingested metadata in order to create and/or update data streams in the appropriate Fedora Digital Objects
We chose METS
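As a rough illustration of METS as a normalizing wrapper, the sketch below routes each kind of incoming metadata to a METS section. It is not the registered UCL METS profile: the routing table, the section IDs, and the MDTYPE/OTHERMDTYPE choices are assumptions, and the output is not validated against the METS schema (for example, it omits the structMap).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Assumed routing: metadata kind -> (parent section, wrapper, MDTYPE, OTHERMDTYPE).
# Order matters: dmdSec precedes amdSec, and techMD, rightsMD, digiprovMD
# appear in that order inside amdSec.
ROUTING = {
    "descriptive": (None, "dmdSec", "DC", None),
    "technical": ("amdSec", "techMD", "OTHER", "FITS"),
    "rights": ("amdSec", "rightsMD", "OTHER", "METSRIGHTS"),
    "preservation": ("amdSec", "digiprovMD", "PREMIS", None),
}


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(records):
    """Wrap each supplied metadata element in the METS section assigned by ROUTING."""
    mets = ET.Element(q("mets"))
    amd_sec = None
    for kind, (section, wrapper, mdtype, other) in ROUTING.items():
        payload = records.get(kind)
        if payload is None:
            continue
        if section == "amdSec":
            if amd_sec is None:
                amd_sec = ET.SubElement(mets, q("amdSec"), {"ID": "amdSec-1"})
            parent = amd_sec
        else:
            parent = mets
        holder = ET.SubElement(parent, q(wrapper), {"ID": f"{wrapper}-1"})
        attributes = {"MDTYPE": mdtype}
        if other:
            attributes["OTHERMDTYPE"] = other
        md_wrap = ET.SubElement(holder, q("mdWrap"), attributes)
        ET.SubElement(md_wrap, q("xmlData")).append(payload)
    return mets


if __name__ == "__main__":
    # Placeholder payloads; real input would be DC, FITS, or PREMIS XML.
    incoming = {
        "preservation": ET.Element("premis"),
        "descriptive": ET.Element("dc"),
    }
    print(ET.tostring(build_mets(incoming), encoding="unicode"))
```

A downstream transformation could then read each section of such a file and write it to the matching data stream in the appropriate Fedora Digital Object.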
Fall 2012

METS, aka the Uberset
What is the Uberset?
The role of the METS Uberset file
Our ideal role for metadata and in particular preservation metadata
Islandora
Seen early on as an administrative model and presentation layer for our Fedora Digital Repository
Spring & Summer 2013

Development of the METS Uberset file
What are the minimal requirements for metadata?
How do these minimal requirements apply across the 3 different levels of related Fedora Digital Objects?
What is the role of METSRights in relation to the other rights statements in the descriptive metadata?
What do we do with the technical metadata?
Where do we get our initial PREMIS data and where does that go in the METS Uberset file?
Spring & Summer 2013
Issues Encountered

Development of Islandora
Encountered conflicting content models
Decision to use Islandora as a presentation layer but not as an administrative model
Metadata in the METS Uberset file
Problem with the technical metadata from Archivematica
Problem with PREMIS as it was transformed from the METS Uberset file to a data stream
Inconsistent
Logical loop created
Other issues with METS Uberset files
Workarounds for the problems above meant that the METS Uberset files had to be rewritten several times
Large METS Uberset files caused transformation problems and slow transformation times
Too much repetition in METS Uberset files
Summer 2013

Decision to move away from the METS Uberset File as a tool to create and/or update data streams
From metadata “lumpers” to “splitters”
Notion of Metadata “Modules” as:
Flexible
Re-usable
Interoperable
Ability to add directly into FOXML and Fedora data streams
Unique specifications (best practices, standards, policies, forms, etc.)
Parts that can be packaged a number of different ways including as our METS Uberset file if needed
Fall 2013 & Our Next Steps

Refine specifications for metadata modules
Descriptive metadata (Simple Dublin Core, MODS for now)
Rights metadata (METSRights for now)
Administrative metadata (Fedora)
Structural metadata (Fedora, RELS-EXT, RELS-INT)
Technical metadata (via FITS)
Preservation metadata (normalized from technical metadata, with the one event, “ingest”, for now)
Investigate how to handle PREMIS beyond the ingest event
Investigate RDF and Linked Open Data
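For the single “ingest” event in the preservation module above, a PREMIS event record might look like the following sketch. It assumes PREMIS 2.x element names, a UUID event identifier, a hypothetical object PID, and the Data Dictionary's suggested eventType value “ingestion”; none of these choices are confirmed by the slides.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: PREMIS 2.x namespace.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def p(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def ingest_event(object_pid, outcome="success", when=None, event_type="ingestion"):
    """Build a PREMIS <event> element that records the ingest of one object."""
    when = when or datetime.now(timezone.utc)
    event = ET.Element(p("event"))

    identifier = ET.SubElement(event, p("eventIdentifier"))
    ET.SubElement(identifier, p("eventIdentifierType")).text = "UUID"
    ET.SubElement(identifier, p("eventIdentifierValue")).text = str(uuid.uuid4())

    ET.SubElement(event, p("eventType")).text = event_type
    ET.SubElement(event, p("eventDateTime")).text = when.isoformat()

    outcome_info = ET.SubElement(event, p("eventOutcomeInformation"))
    ET.SubElement(outcome_info, p("eventOutcome")).text = outcome

    # Link the event back to the Fedora object it describes (hypothetical PID type).
    link = ET.SubElement(event, p("linkingObjectIdentifier"))
    ET.SubElement(link, p("linkingObjectIdentifierType")).text = "PID"
    ET.SubElement(link, p("linkingObjectIdentifierValue")).text = object_pid
    return event


if __name__ == "__main__":
    example = ingest_event("demo:media-1")
    print(ET.tostring(example, encoding="unicode"))
```

Later events such as fixity checks or migrations could reuse the same function with a different event_type, which is one possible starting point for handling PREMIS beyond ingest.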
Thank you

Jennifer Eustis


Jennifer.eustis@lib.uconn.edu
David Lowe

David.lowe@lib.uconn.edu