Lifecycle Metadata for Digital Objects
The Final Curtain
December 4, 2006

Dramatis Personae
• Mundee
• Harrison
• Kaczmarczik
• Sevcik
• Bibb
• Holt
• Addison
• Cofield
• Keenan

MIX - Metadata for Images in XML Schema
• Currently under development
• Schema for a set of technical data elements required to manage digital image collections
• Useful for digitized text (page images)

Profiling the Dynamic Web Page
What Is This Dynamic Business?!?!
The Deep End: The Database of All Databases
Dynamic Web Pages and the Metadata Sets Who Love Them
Why Dynamic Web Pages Die
Harvesters, Crawlers, and Extractors
Picking and Choosing Metadata
Decisions, Decisions

Sound Recording (Digitized)
• Use case: Student recitals recorded as analog, digitized for streaming access
• Challenge: Find schema that apply to musical performances and have usefulness for searching
• Metadata standards: MPEG-7, DC/MODS
Susan Harwood Kaczmarczik
December 4, 2006

Preserving ETDs
Major Issues--electronic theses and dissertations
Fonts--embedded--unrecognized--hacked?
Big list of Unicode: http://www.alanwood.net/unicode/fonts.html
Active features--links, fields, encryption
Solutions
PDF/A--too simple & still in development
Multi-page TIFF + "too big to fail"
Administrative
Degree candidacy elements

Digitized Moving Image: VHS
*High Points*
• Extension Schema: LOC AV Prototype
dmdSec
• MODS
amdSec
• techMD: VMD
• rightsMD: RMD
• sourceMD: VMD
• digiProvMD: PMD
*Problems Encountered*
• Getting started
• Overwhelming file sizes
• Copyright
• Confusing technical terminology related to video

DSpace SIP Profile for a Born Digital Audio Music File
Preservation Issues
- Formats and Guidelines
Controlled Vocabularies
- Library of Congress Subject Headings
- Getty Thesaurus of Geographic Names
- MARC Value List for Relators and Roles
- DCMI Type Vocabulary
- ISO 639-2
Extension Schemas
- MODS
- Creative Commons
- AUDIOMD - LC-AV Audio Metadata Extension Schema

Born Digital Still Images
• Similar lifecycle to digitized.
  MD not always stored.
• Primarily use NISO MIX format (includes EXIF, GPS).
• Images are numerical representations - different image formats compress differently - some need special MD.
• NISO MIX contains many fields that are seemingly unimportant but may be valuable as evidence.
• NISO MIX also includes many fields completely unintelligible to the layman, referenced or not.
• The previous two factors can spell trouble if the preservationist is not an expert! EXIF would help, but there is not a 1:1 ratio of information.
• Metadata is meant to help understand transformation, not to “step backwards” to recreate images, although this is possible with sufficient detail.
Addison
4 DEC 06

Born Digital Spoken Word Oral History Audio
Rules of Description - Name and Date formats
Getty Thesaurus of Geographic Names
LCSH
HASSET
.WAV
.MP3
.TXT
AUDIOMD & PREMIS
TEXTMD & PREMIS
TEI Encoding
METS

SIP Profile for Spreadsheets
Melissa Keenan
• Preservation issues:
  – Saving formulas
  – Proprietary format
  – Open Document Format for Office Applications (ISO/IEC 26300:2006)
• Metadata:
  • EAD (use case is archival)
  • MathML (complex formulas)
  • Automatically generated by Microsoft
  • Microsoft.Office.Tools.Excel
  • PREMIS
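
Sketch: METS section layout
The Digitized Moving Image: VHS slide lists a dmdSec carrying MODS and an amdSec with techMD, rightsMD, sourceMD, and digiProvMD. The Python sketch below shows roughly how that wrapper could be assembled with the standard library. It is an illustration under assumptions, not the profile the presenters built: the helper names (q, build_mets_skeleton), the ID values, and the use of the slide's abbreviations (VMD, RMD, PMD) as OTHERMDTYPE labels are all hypothetical. The METS schema itself spells the last subsection digiprovMD, so the code uses that spelling.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified METS tag name (hypothetical helper)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton() -> ET.Element:
    """Build an empty METS wrapper with the sections named on the slide."""
    mets = ET.Element(q("mets"))

    # Descriptive metadata: the slide pairs dmdSec with MODS.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, q("mdWrap"), MDTYPE="MODS")

    # Administrative metadata: four subsections in METS schema order.
    # The labels are the slide's abbreviations, used here only as
    # illustrative OTHERMDTYPE values (an assumption).
    amd = ET.SubElement(mets, q("amdSec"), ID="AMD1")
    subsections = [
        ("techMD", "VMD"),
        ("rightsMD", "RMD"),
        ("sourceMD", "VMD"),
        ("digiprovMD", "PMD"),
    ]
    for section, label in subsections:
        sec = ET.SubElement(amd, q(section), ID=f"{section}1")
        ET.SubElement(sec, q("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE=label)

    return mets


if __name__ == "__main__":
    print(ET.tostring(build_mets_skeleton(), encoding="unicode"))
```

Each mdWrap is left empty; in a real SIP it would hold an xmlData element containing the MODS record or the extension-schema metadata for the file.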