Preservation Metadata: Implementation Strategies (PREMIS)
Rebecca Guenther
Library of Congress
rgue@loc.gov
IS&T Archiving Conference
April 28, 2005

Overview of presentation
• Background to PREMIS
• PREMIS membership and charge
• Preservation repositories implementation survey
• PREMIS Core Elements group
  • Development of data dictionary
  • Data model
• Next steps
• Implementation issues

Apr. 28, 2005 IS&T Archiving Conference 2005

OCLC/RLG Preservation Metadata Framework Working Group
• OCLC/RLG Preservation Metadata Working Group
  • Convened March 2000
  • Looked at CEDARS, NLA, NEDLIB, OCLC
• Preservation metadata framework (June 2002)
  • Synthesized elements from existing sets
  • Based on OAIS information model
  • Elaboration of OAIS
  • Set of “prototype” preservation metadata elements
• http://www.oclc.org/research/projects/pmwg/pm_framework.pdf

PREMIS
• June 2003: OCLC/RLG sponsored a new working group, PREMIS
  • Preservation Metadata: Implementation Strategies
• Need
  • Practical and implementable, not broadly theoretical
  • Independent of specific implementation
• Objectives
  • Define a “core” set of preservation metadata elements, with a supporting data dictionary, applicable to a broad range of digital preservation activities
  • Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata

Membership
• Priscilla Caplan, FCLA (Chair)
• Rebecca Guenther, LC (Chair)
• Michael Alexander, British Library
• George Barnum, GPO
• Charles Blair, U. of Chicago
• Olaf Brandt, U. of Gottingen
• Adam Farquhar, British Library
• David Gewirtz, Yale
• Kevin Glavash, MIT/DSpace
• Cathy Hartman, U. of N. Texas
• Helen Hodgart, British Library
• Nancy Hoebelheinrich, Stanford
• Roger Howard/Sally Hubbard, Getty Museum
• Pam Kircher, OCLC
• John Kunze, Calif. Digital Library
• Brian Lavoie, OCLC liaison
• Robin Dale, RLG liaison
• Vicky McCarger, LA Times
• Jerry McDonough, NYU/METS
• Evan Owens, JSTOR
• Erin Rhodes, NARA
• Madi Solomon, Walt Disney Co.
• Angela Spinazze, ATSPIN
• Gunter Waibel, RLG
• Lisa Weber, NARA
• Robin Wendler, Harvard
• Hilde van Wijngaarden, KB
• Andrew Wilson, NAA

Advisory Committee
• Howard Besser, UCLA
• Liz Bishoff, OCLC (via Colorado Digitization Program)
• Gerard Clifton, National Library of Australia
• Gail Hodge, CENDI
• Steve Knight, National Library of New Zealand
• Maggie Jones, Digital Preservation Coalition
• Nancy McGovern, Cornell
• Cliff Morgan, Wiley UK
• Richard Rinehart, U. of California, Berkeley

Implementation Survey Report
• State of the art in Winter 2003/2004
• 28 libraries, 7 archives, 3 museums, and 11 others
• 13 different countries; 45% from U.S.
• 38% in planning; 33% in development; 46% in production

Survey findings
• Little experience with digital preservation
  • Most didn’t have an active preservation strategy
  • Many not yet in production
  • Cannot assess adequacy of metadata
• Lack of common vocabulary and conceptual framework
  • Informed by OAIS reference model
  • Difference of opinion as to meaning of OAIS compliance
• Metadata
  • Many recording rights, provenance, technical, administrative, descriptive and structural metadata
• Most repositories serve goals of both preservation and access

Trends
• Store metadata redundantly in XML or relational database and with content data objects
• Use METS for structural metadata and as container for descriptive and administrative metadata; MIX for images
• Use OAIS as framework and starting point
• Maintain multiple versions (originals, some normalized or migrated) in repository with complete metadata for all versions
• Choose multiple strategies for digital preservation

Core Elements
Mission: Define a core set of implementable preservation metadata elements.
• Information that supports and documents the digital preservation process
• Information that supports the viability, renderability, understandability, identity and authenticity of digital objects over time
• What most working preservation repositories are likely to need to know
• Core does not imply mandatory
• As rigorous as possible
• As much explanation as possible
• Implementation neutral -- “This is what you have to know”
• Values can be automatically supplied and processed -- no lengthy textual descriptions

Core Elements: Data Model
[Diagram: the five entity types of the data model: Intellectual Entities, Rights, Objects, Agents, Events]
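
One way to picture the data model in a working repository is as separate record types that point at each other by identifier. The following is a minimal sketch, assuming an in-memory Python representation: the class names, the `events_for` helper, and all sample identifiers and values are hypothetical, while the field names mirror semantic units listed later in these slides (objectIdentifier, linkingEventIdentifier, agentIdentifier, and so on). Descriptive metadata about the Intellectual Entity is out of scope, so it carries only a placeholder identifier here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntellectualEntity:
    identifier: str  # hypothetical field; descriptive metadata is out of scope


@dataclass
class Agent:
    agentIdentifier: str
    agentName: str
    agentType: str


@dataclass
class DigitalObject:
    objectIdentifier: str
    objectCategory: str  # e.g. "representation", "file" or "bitstream"
    # Links to other entities are stored as identifiers, not embedded records
    linkingIntellectualEntityIdentifier: List[str] = field(default_factory=list)
    linkingEventIdentifier: List[str] = field(default_factory=list)
    linkingPermissionStatementIdentifier: List[str] = field(default_factory=list)


@dataclass
class Event:
    eventIdentifier: str
    eventType: str
    eventDateTime: str
    linkingAgentIdentifier: List[str] = field(default_factory=list)
    linkingObjectIdentifier: List[str] = field(default_factory=list)


@dataclass
class PermissionStatement:
    permissionStatementIdentifier: str
    permissionGranted: List[str] = field(default_factory=list)


def events_for(obj: DigitalObject, events: List[Event]) -> List[Event]:
    """Hypothetical helper: return the events that link back to the given object."""
    return [e for e in events if obj.objectIdentifier in e.linkingObjectIdentifier]


if __name__ == "__main__":
    agent = Agent("agent-001", "Example repository", "organization")
    obj = DigitalObject("obj-001", "file", linkingIntellectualEntityIdentifier=["ie-001"])
    ingest = Event("evt-001", "ingestion", "2005-04-28T10:00:00",
                   linkingAgentIdentifier=[agent.agentIdentifier],
                   linkingObjectIdentifier=[obj.objectIdentifier])
    obj.linkingEventIdentifier.append(ingest.eventIdentifier)
    print([e.eventType for e in events_for(obj, [ingest])])
```

Linking by identifier keeps each entity independently storable and exchangeable, at the cost of needing a lookup (as `events_for` does) to follow a relationship.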

Scope of data dictionary
• Implementation independent
• Descriptive metadata out of scope
• Metadata about Agents is limited
• Technical metadata applying to all or most format types
• Coverage of media or hardware details is limited
• Business rules are essential for working repositories, but not covered
• Rights information for preservation actions, not access

Sample data dictionary entry
Semantic unit: size
Semantic components: None
Definition: The size in bytes of the file or bitstream stored in the repository.
Rationale: Size is useful for ensuring the correct number of bytes from storage has been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.
Data constraint: Integer

Object category    Representation    File              Bitstream
Applicability      Not applicable    Applicable        Applicable
Examples                             2038927
Repeatability                        Not repeatable    Not repeatable
Obligation                           Optional          Optional

Creation/Maintenance notes: Automatically obtained by the repository.
Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.

Semantic units pertaining to Objects
• objectIdentifier
• preservationLevel
• objectCategory
• objectCharacteristics
• creatingApplication
• originalName
• storage
• environment
• signatureInformation
• relationship
• linkingEventIdentifier
• linkingIntellectualEntityIdentifier
• linkingPermissionStatementIdentifier

objectCharacteristics
• compositionLevel
• fixity
• size
• format
• significantProperties
• inhibitors
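
The rationale for size, together with fixity under objectCharacteristics, suggests a simple retrieval check: record the byte count and a message digest at ingest, then compare both after reading the file back from storage. The following is a minimal sketch, assuming local files and SHA-256 as the digest algorithm (the slides do not name one); the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_size(path: str) -> int:
    """Size in bytes, so no separate unit of measurement needs recording."""
    return os.path.getsize(path)


def message_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hex digest of the file, read in chunks so large files fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_retrieval(path: str, recorded_size: int, recorded_digest: str,
                     algorithm: str = "sha256") -> dict:
    """Compare a retrieved file with the size and fixity values recorded at ingest."""
    return {
        "size_ok": file_size(path) == recorded_size,
        "fixity_ok": message_digest(path, algorithm) == recorded_digest,
    }


if __name__ == "__main__":
    # A throwaway file stands in for a file held in the repository
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"hello world")
        recorded_size = file_size(path)  # values obtained automatically at ingest
        recorded_digest = message_digest(path)
        print(recorded_size, verify_retrieval(path, recorded_size, recorded_digest))
```

The size comparison is cheap and catches truncated reads; the digest comparison is more expensive but also catches corruption that leaves the length unchanged.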

Semantic units pertaining to Events
• eventIdentifier
• eventType
• eventDateTime
• eventDetail
• eventOutcome
• eventOutcomeDetail
• linkingAgentIdentifier
• linkingObjectIdentifier

Semantic units pertaining to Agents
• agentIdentifier
• agentName
• agentType

Semantic units pertaining to Rights
• permissionStatement
• permissionStatementIdentifier
• relatedObject
• grantingAgent
• grantingAgreement
• permissionGranted
• act
• restriction
• termOfGrant
• permissionNote

Next steps
• PREMIS deliverables (May 2005)
  • Data dictionary and report
  • XML schemas
  • Draft for experimentation to remain stable for a year
  • Revisions will be based on results of testing
• Follow-up activities
  • Testbeds for implementation and exchange
  • Community outreach
  • Establish maintenance activity
  • Consider formal standardization

Implementation considerations
• Schema use with specific implementations (e.g., METS)
• Machine generation of metadata
• Tools
• Role of registries (format, environment)
• Prospects for collaboration and exchanging information content
• Rights and permissions
• Emergence of best practices
• Support needed from PREMIS maintenance activity
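
Because values should be automatically supplied and processed, an event record is a natural candidate for machine generation at the moment a preservation action runs. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and using the Event semantic unit names as element names; the flat element layout and the `event_to_xml` function are hypothetical illustrations, not the PREMIS XML schemas listed among the deliverables.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def event_to_xml(event_id: str, event_type: str, outcome: str,
                 agent_id: str, object_id: str, detail: str = "") -> str:
    """Serialize one event using Event semantic unit names as tags.

    Hypothetical flat layout for illustration; not the PREMIS schema.
    """
    event = ET.Element("event")
    ET.SubElement(event, "eventIdentifier").text = event_id
    ET.SubElement(event, "eventType").text = event_type
    # Timestamp supplied automatically, in UTC, at the time the record is made
    ET.SubElement(event, "eventDateTime").text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    if detail:
        ET.SubElement(event, "eventDetail").text = detail
    ET.SubElement(event, "eventOutcome").text = outcome
    # Links to the responsible agent and the affected object, by identifier
    ET.SubElement(event, "linkingAgentIdentifier").text = agent_id
    ET.SubElement(event, "linkingObjectIdentifier").text = object_id
    return ET.tostring(event, encoding="unicode")


if __name__ == "__main__":
    print(event_to_xml("evt-002", "fixity check", "success",
                       "agent-001", "obj-001",
                       detail="digest compared with value recorded at ingest"))
```

Keeping the agent and object as identifier links, rather than copying their details into the event, lets the same Agent and Object records be shared across many events.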

For More Information
• PREMIS Web site
  • www.oclc.org/research/projects/pmwg
• “Implementing Metadata in Digital Preservation Systems: The PREMIS Activity,” D-Lib Magazine (April 2004)
  • www.dlib.org/dlib/april04/lavoie/04lavoie.html
• RLG DigiNews, October 2004 and December 2004 issues
  • www.rlg.org/en/page.php?Page_ID=12081
• Priscilla Caplan: pcaplan@ufl.edu
• Rebecca Guenther: rgue@loc.gov