Lifecycle Metadata for Digital Objects
October 16, 2006
Implementing metadata in a repository system

The OAIS model
  NASA/CCSDS
  Developed to preserve scientific data
    Assumed that creators were scientists
    Assumed that users would be limited
  Specifies a set of functions for a trusted digital repository

OAIS external relations

Where’s the (external) metadata?

Implications of externalities: pre-ingest, post-dissemination
  Pre-ingest: repository may not control
    Who provides the metadata?
    How is it obtained?
  Post-dissemination: repository must respond
    What environment will the metadata be used in?
    What metadata should be kept about usage?

Importance of external agreements
  SIP agreement
    Defining formats
    Providing tools to receive and access formats
    Providing specific metadata
    Providing for testing of automated ingest
  DIP agreement
    Not an explicit part of the model
    But crucial to define “designated user community” and its expectations
    At any given time, may define “plain-vanilla” context of general user population

OAIS (internal) functional model

Where’s the (internal) metadata?
  Metadata is generated at every step
    Ingest
    AIP bundling
    Ongoing data management
    Access events
    Repository management

Roles of metadata in a repository
  Regulation of ingest process
  Serving as warrant of genuineness
  Defining placement in repository
  Defining relations with other objects in repository
  Regulation of access permissions
  Regulation of preservation scheduling and actions
  Assisting in management of repository

Ingest
  Verifying what was received
    Automated test template
    Harvest of existing metadata per SIP agreement
  Preparing to put it away
    Aggregates? Single items?
    Additional metadata?
    Archival bond links to existing collections?
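
One way to picture the "automated test template" used at ingest is a check of each submitted item against the terms of the SIP agreement. The sketch below is illustrative only: the class names, field names, and sample formats are assumptions made for this example, not part of the OAIS model or of any particular repository system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SipAgreement:
    """Hypothetical SIP agreement terms: accepted formats and required metadata."""
    allowed_formats: frozenset
    required_fields: frozenset


@dataclass
class SubmittedItem:
    """Hypothetical record for one item received from a depositor."""
    filename: str
    file_format: str
    metadata: dict = field(default_factory=dict)


def check_item(item, agreement):
    """Return a list of problems; an empty list means the item passes."""
    problems = []
    if item.file_format not in agreement.allowed_formats:
        problems.append(
            f"{item.filename}: format {item.file_format!r} not covered by agreement"
        )
    for name in sorted(agreement.required_fields):
        # A field counts as missing if it is absent or blank.
        if not str(item.metadata.get(name, "")).strip():
            problems.append(f"{item.filename}: missing metadata field {name!r}")
    return problems


# Sample values below are invented for illustration.
agreement = SipAgreement(
    allowed_formats=frozenset({"application/pdf", "image/tiff"}),
    required_fields=frozenset({"title", "creator", "date"}),
)
good = SubmittedItem(
    "report.pdf",
    "application/pdf",
    {"title": "Annual report", "creator": "Records office", "date": "2006-10-16"},
)
bad = SubmittedItem("notes.doc", "application/msword", {"title": "Notes"})

for item in (good, bad):
    print(item.filename, check_item(item, agreement) or "accepted")
```

Returning a list of problems rather than a yes/no answer means the result can itself be stored as ingest metadata, which fits the point that metadata is generated at every step.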

Archival storage
  Taking care of digital objects
  Preserving them as received
    Importance of message digest
    Regular integrity-checking
  Preserving them otherwise than as received
    “Use copies” for frequently used materials
    Migration on demand

Storage within the repository
  Active file system and active database
    Used to contain archival digital objects
    Used to contain use copies
    Used to contain metadata
    Can also be used to contain index to all text data in repository (as inverted index)
  Offline file system and database (“dark archive”)
    Used to store objects and metadata securely

Data management
  Taking care of metadata
  Maximizing access
    Tracking and understanding usage
    Assisting with making new connections (analyzing usage data)
    Integrating possible feedback metadata

Database functions
  Internal
    Tracks ingest, repackaging, usage, and preservation activities
    Provides locator for objects
  External
    Provides searching on metadata fields
    Provides searching on object content for text objects
    Provides information for validation of access privileges

Database type and choice
  Relational
  Hierarchical
  “Native XML”
  “Supports XML”
  Hybrid (database structure, XML document access)

Access
  As conceived in original OAIS document: handled by people, offline
  In practice: automated as much as possible
    Who gets access?
    What kind of access?
    Recording access instances

Overall management
  Repository as a whole
  External relations in general
  Administration including periodic recertification
  Preservation planning
  SIP agreements and negotiation with depositors
  DIP agreements and interaction with users

Persistence and trustworthiness
  Most crucial element of the OAIS model: requirement for specifying cessation process
    Commitment to donor and user community
    Guarantee of continued service
    Explicit agreements with potential successor organizations
    Vital to user community that expects permanent guarantees (e.g., government)

Certification
  Repository excellence must be judged against standards
  May be audited by certifying body (this is still under discussion)
  Certification plan is current task of core group from RLG and NARA with interest from Cornell, Harvard, OCLC and others; draft checklist released spring 2006, final version in process
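
Returning to the message digest and regular integrity-checking listed under archival storage: a minimal sketch of such a check might look like the following. It assumes that a SHA-256 digest was recorded for each object at ingest and is kept as a simple path-to-digest mapping; the function names and that storage layout are hypothetical, and a real repository would more likely keep the digests in its database.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(recorded):
    """Given {path: digest recorded at ingest}, return paths that fail the check.

    An object fails if it is missing or if its current digest differs.
    """
    failures = []
    for path, expected in recorded.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures


def demo():
    """Record a digest, audit, alter the file, and audit again."""
    with tempfile.TemporaryDirectory() as folder:
        stored = Path(folder) / "object.bin"
        stored.write_bytes(b"archival object")
        recorded = {str(stored): sha256_of(stored)}  # digest taken "at ingest"
        before = audit(recorded)
        stored.write_bytes(b"archival object, silently altered")
        after = audit(recorded)
    return before, [Path(p).name for p in after]


if __name__ == "__main__":
    print(demo())
```

Run on a schedule, the list of failures becomes preservation metadata in its own right: a record of when each object was last verified as received.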