Lifecycle Metadata for Digital Objects The Final Curtain December 4, 2006

Dramatis Personae
• Mundee
• Harrison
• Kaczmarczik
• Sevcik
• Bibb
• Holt
• Addison
• Cofield
• Keenan
MIX - Metadata for Images in XML Schema
• Currently under development
• Schema for a set of technical data elements required to manage digital image collections
• Useful for digitized text (page images)
Profiling the Dynamic Web Page
• What Is This Dynamic Business?!?!
• The Deep End: The Database of All Databases
• Dynamic Web Pages and the Metadata Sets Who Love Them
• Why Dynamic Web Pages Die
• Harvesters, Crawlers, and Extractors
• Picking and Choosing Metadata
• Decisions, Decisions
Sound Recording (Digitized)
• Use case: Student recitals recorded as analog, digitized for streaming access
• Challenge: Find schemas that apply to musical performances and are useful for searching
• Metadata standards: MPEG-7, DC/MODS
Susan Harwood Kaczmarczik
December 4, 2006
Preserving ETDs
Major Issues--electronic theses and dissertations
• Fonts--embedded--unrecognized--hacked?
  – Big list of Unicode: http://www.alanwood.net/unicode/fonts.html
• Active features--links, fields, encryption
Solutions
• PDF/A--too simple & still in development
• Multi-page TIFF + "too big to fail"
Administrative
• Degree candidacy elements
Digitized Moving Image: VHS
*High Points*
• Extension Schema: LOC AV Prototype
• dmdSec
  – MODS
• amdSec
  – techMD: VMD
  – rightsMD: RMD
  – sourceMD: VMD
  – digiprovMD: PMD
*Problems Encountered*
• Getting started
• Overwhelming file sizes
• Copyright
• Confusing technical terminology related to video
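The dmdSec/amdSec layout on this slide can be sketched as a METS skeleton. This is a minimal illustration, not the presenter's actual profile: the section IDs and the helper names (`q`, `add_md_section`, `build_mets_skeleton`) are hypothetical, the LC AV extension schemas are declared only through `MDTYPE="OTHER"` with an `OTHERMDTYPE` label, and the `xmlData` wrappers are left empty. A complete METS document would also need a `structMap`, which is omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Return a namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def add_md_section(parent, tag, section_id, mdtype, other=None):
    """Add a metadata section holding an empty mdWrap/xmlData placeholder."""
    section = ET.SubElement(parent, q(tag), ID=section_id)
    attrs = {"MDTYPE": mdtype}
    if other:
        # Extension schemas are labeled via OTHERMDTYPE (labels are assumptions).
        attrs["OTHERMDTYPE"] = other
    wrap = ET.SubElement(section, q("mdWrap"), attrs)
    ET.SubElement(wrap, q("xmlData"))
    return section


def build_mets_skeleton():
    """Build the section layout from the slide; IDs are hypothetical."""
    mets = ET.Element(q("mets"))
    add_md_section(mets, "dmdSec", "DMD1", "MODS")
    amd = ET.SubElement(mets, q("amdSec"), ID="AMD1")
    add_md_section(amd, "techMD", "TMD1", "OTHER", "VMD")
    add_md_section(amd, "rightsMD", "RMD1", "OTHER", "RMD")
    add_md_section(amd, "sourceMD", "SMD1", "OTHER", "VMD")
    add_md_section(amd, "digiprovMD", "PMD1", "OTHER", "PMD")
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_mets_skeleton(), encoding="unicode"))
```

Note that the METS element is spelled `digiprovMD`, and that the four administrative sections appear in the order the slide lists them.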
DSpace SIP Profile for a Born Digital Audio Music File

Preservation Issues
- Formats and Guidelines

Controlled Vocabularies
- Library of Congress Subject Headings
- Getty Thesaurus of Geographic Names
- MARC Value List for Relators and Roles
- DCMI Type Vocabulary
- ISO 639-2

Extension Schemas
- MODS
- Creative Commons
- AUDIOMD - LC-AV Audio Metadata Extension Schema
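Controlled vocabularies such as these are useful mainly because values can be checked at ingest. As a rough sketch, the check below validates a value against the twelve terms of the DCMI Type Vocabulary. The function name `check_dcmi_type` is hypothetical and is not part of DSpace or any profile named here.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def check_dcmi_type(value):
    """Return value unchanged if it is a DCMI Type term, else raise ValueError."""
    if value not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not a DCMI Type Vocabulary term")
    return value


if __name__ == "__main__":
    # A born-digital audio music file would be typed as "Sound".
    print(check_dcmi_type("Sound"))
```

The match is deliberately case-sensitive, so "sound" or "Audio" is rejected rather than silently normalized.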
Born Digital Still Images
• Similar lifecycle to digitized images. Metadata (MD) is not always stored.
• Primarily use the NISO MIX format (includes EXIF, GPS).
• Images are numerical representations; different image formats compress differently, and some need special MD.
• NISO MIX contains many fields that are seemingly unimportant but may be valuable as evidence.
• NISO MIX also includes many fields completely unintelligible to the layman, referenced or not.
• The previous two factors can spell trouble if the preservationist is not an expert! EXIF would help, but there is not a 1:1 correspondence of information.
• Metadata is meant to help understand transformation, not to “step backwards” to recreate images, although this is possible with sufficient detail.
Addison 4 DEC 06
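To make "images are numerical representations" concrete, here is a small sketch that reads technical metadata straight from the header of a PNG file, one example format. The function name and dictionary keys are hypothetical and are not NISO MIX element names; mapping them onto MIX fields would be a separate step.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Parse the IHDR chunk that must follow the 8-byte PNG signature."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    (width, height, bit_depth, color_type,
     compression, filter_method, interlace) = struct.unpack(">IIBBBBB", data[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "compression_method": compression,
        "filter_method": filter_method,
        "interlace_method": interlace,
    }
```

Usage is `read_png_header(open(path, "rb").read(29))`. Other formats store the same kinds of values in different places, which is one reason a format-neutral schema is attractive.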
Born Digital Spoken Word
Oral History Audio
Rules of Description - Name and Date formats
Getty Thesaurus of Geographic Names
LCSH
HASSET
.WAV
.MP3
.TXT
AUDIOMD & PREMIS
TEXTMD & PREMIS
TEI Encoding
METS SIP Profile for Spreadsheets
Melissa Keenan
• Preservation issues:
– Saving formulas
– Proprietary format
– Open Document Format for Office Applications (ISO/IEC 26300:2006)
• Metadata:
– EAD (use case is archival)
– MathML (complex formulas)
– Automatically generated by Microsoft
   • Microsoft.Office.Tools.Excel
– PREMIS
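On the "saving formulas" issue: an OpenDocument spreadsheet is a ZIP package whose `content.xml` keeps each formula as a `table:formula` attribute on the cell, so the formulas stay recoverable as plain text. The sketch below lists them using only the Python standard library. The function name `extract_formulas` is hypothetical, and the sketch assumes an unencrypted package.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OpenDocument table module.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def extract_formulas(ods_file):
    """Return the formula strings stored in an .ods file (path or file object)."""
    with zipfile.ZipFile(ods_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    cell_tag = f"{{{TABLE_NS}}}table-cell"
    formula_attr = f"{{{TABLE_NS}}}formula"
    return [
        cell.get(formula_attr)
        for cell in root.iter(cell_tag)
        if cell.get(formula_attr) is not None
    ]
```

The returned strings could then be recorded alongside the PREMIS metadata so the calculation logic survives even if only cell values are rendered later.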