DRS 2 Metadata Migration

June 25, 2013
Agenda
• Introduction
• Preliminary results: content analysis
• Metadata options
• Next steps
• Questions
INTRODUCTION
Reasons for metadata migration
• Different data model
– File -> Object (a coherent set of content that is
considered a single intellectual unit for purposes of
description, use and/or management: for example a
particular book, web harvest, serial or photograph.)
• Different metadata schemas
– Many locally-defined -> community-standard
• Different packaging of metadata
– Use of METS in some cases -> consistent use of METS
Path to metadata migration
• Analysis (we are here)
– Metadata
– Content
– Users
• Prototype
– Proof-of-concept
• Migration plan
– Time estimates
– Sequence
– Schedule
• Develop tools
– Dashboard
– Object builders
• Metadata migration
Key feedback points
• Process options
• Technical options
Timing
• Next 3 months
What does it involve?
• Aggregate DRS1 files into objects
– Different object types = content models
• Generate an object descriptor per object
Document example
New object (content model = DOCUMENT)
• PDF file
• Descriptor file
Still image example
New object (content model = STILL IMAGE)
• Archival master image file
• Production master image file
• Deliverable image file
• Descriptor file
Aggregate DRS1 files into objects
• One content file per object
– Color profile
– Document
– Google document container 1
– Google document container 2
– Google document container 3
– Opaque container
– Text
Aggregate DRS1 files into objects
• Multiple content files per object
– Audio
– Web harvest
– Biomedical image
– PDS document
– Target image
– MOA2
– Still image
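The grouping rule implied by these two lists can be sketched in Python. This is a minimal illustration, not the migration tool: the `Drs1File` record and especially its `group_key` field (standing in for whatever DRS1 attribute actually ties an archival master to its production master and deliverable) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Content models whose objects hold exactly one content file (per the slide)
SINGLE_FILE_MODELS = {
    "COLOR PROFILE", "DOCUMENT", "GOOGLE DOCUMENT CONTAINER 1",
    "GOOGLE DOCUMENT CONTAINER 2", "GOOGLE DOCUMENT CONTAINER 3",
    "OPAQUE CONTAINER", "TEXT",
}


@dataclass
class Drs1File:
    file_id: str
    content_model: str
    group_key: str  # hypothetical: the DRS1 attribute that links related files


def aggregate(files):
    """Group DRS1 files into candidate DRS2 objects, keyed by (content model, group)."""
    objects = defaultdict(list)
    for f in files:
        if f.content_model in SINGLE_FILE_MODELS:
            key = (f.content_model, f.file_id)    # one file -> one object
        else:
            key = (f.content_model, f.group_key)  # related files -> one object
        objects[key].append(f.file_id)
    return dict(objects)


files = [
    Drs1File("f1", "DOCUMENT", "g1"),
    Drs1File("f2", "STILL IMAGE", "img-9"),  # archival master
    Drs1File("f3", "STILL IMAGE", "img-9"),  # production master
    Drs1File("f4", "STILL IMAGE", "img-9"),  # deliverable
]
print(aggregate(files))
# {('DOCUMENT', 'f1'): ['f1'], ('STILL IMAGE', 'img-9'): ['f2', 'f3', 'f4']}
```

Keying on the content model as well as the group keeps files of different types from being merged by accident if two groups happen to share a key.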
Generate object descriptors
• METS format
– Embedded schemas (PREMIS, MODS, MIX, etc.)
• Metadata sources
– DRS1 database
– DRS1 METS files where they exist
– Examining the content files
– Catalog records?
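A bare METS object descriptor with embedded MODS can be sketched with Python's standard `xml.etree.ElementTree`. The METS and MODS namespace URIs are the real Library of Congress ones; everything else (the `build_descriptor` helper, the IDs, the single `fileGrp`) is an illustrative assumption, and the sketch omits `amdSec` (PREMIS, MIX), `FLocat` and other parts a real DRS2 descriptor would need.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)


def q(ns, tag):
    """Qualified tag name in ElementTree's {namespace}local form."""
    return f"{{{ns}}}{tag}"


def build_descriptor(label, title, file_ids):
    """Return a minimal METS skeleton for one object as a string (illustrative)."""
    mets = ET.Element(q(METS, "mets"), {"LABEL": label})
    # Descriptive metadata: MODS embedded through mdWrap/xmlData
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title
    # One <file> entry per content file in the object
    file_grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"))
    for fid in file_ids:
        ET.SubElement(file_grp, q(METS, "file"), {"ID": fid})
    # structMap is required by METS; here one div points at every file
    struct_map = ET.SubElement(mets, q(METS, "structMap"))
    div = ET.SubElement(struct_map, q(METS, "div"), {"LABEL": label, "DMDID": "dmd1"})
    for fid in file_ids:
        ET.SubElement(div, q(METS, "fptr"), {"FILEID": fid})
    return ET.tostring(mets, encoding="unicode")


print(build_descriptor("Sample object", "Sample title", ["f1", "f2"]))
```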
PRELIMINARY RESULTS: CONTENT ANALYSIS
Preliminary content analysis
• Conceptually “built” objects for 13 of 14 content models (~36 million of 44 million files)
– All but still image
– Order helps!
[Figure: chart labeled PDS Document, MOA2, Still Image, Biomedical Image]
Preliminary content analysis
• 1,091,670 objects from 36,190,120 files
– ~33 files per object
• Relatively few surprises so far, but content analysis is not yet complete
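The per-object average and the share of DRS1 files covered so far follow directly from the figures above; the 44 million total is the slide's approximate figure, not an exact count.

```python
objects_built = 1_091_670
files_aggregated = 36_190_120
drs1_files_total = 44_000_000  # approximate ("~44 million" on the slide)

print(f"{files_aggregated / objects_built:.1f} files per object")        # 33.2 files per object
print(f"{files_aggregated / drs1_files_total:.0%} of DRS1 files covered")  # 82% of DRS1 files covered
```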
Content cleanup
• MOA2 files (8,024)
• Index maps (2,686)
• Entity files (1)
• Merged PDS descriptors (22,203)
Content cleanup
• Orphaned target images (5) and target description files (4)
• Orphaned audio files (71)
METADATA OPTIONS
DRS1 (per file)
• FILE INFO, e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
• METS (where it exists): Object Label, MODS, PDS info, etc.
DRS2 DESCRIPTOR (per object)
• OBJECT INFO: Object Label, object-level MODS, billingCode, ownerCode, owner-suppliedName, caption unit name, view text
• FILE INFO (for each file): accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
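One consequence of this comparison is that billingCode and ownerCode move from every DRS1 file record up to the single DRS2 object. A minimal sketch, assuming each DRS1 file record is a dict with those keys; raising an error when the files being aggregated disagree is a hypothetical design choice, not a stated rule.

```python
# Fields that move from every DRS1 file record up to the DRS2 object (per the slide)
OBJECT_LEVEL = ("billingCode", "ownerCode")


def split_record(drs1_files):
    """Split per-file DRS1 admin metadata into DRS2 object info and file info."""
    object_info = {}
    for name in OBJECT_LEVEL:
        values = {f[name] for f in drs1_files}
        if len(values) != 1:
            # Files being aggregated disagree: flag for review rather than guess
            raise ValueError(f"conflicting {name}: {sorted(values)}")
        object_info[name] = values.pop()
    # Everything else (accessFlag, role, quality, ...) stays at file level
    file_info = [{k: v for k, v in f.items() if k not in OBJECT_LEVEL} for f in drs1_files]
    return object_info, file_info
```

Failing loudly here surfaces aggregation mistakes early, since two files with different owners probably should not be in the same object.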
Objects
• Owner-supplied name (OSN) is required
• Needs to be generated during migration
• Four cases
– A METS file exists
– New object will be built from a single content file
– New object will be built from multiple content files
– No OSN (potential case)
• Proposal for most cases:
– Add a prefix or suffix to the owner-supplied name of the METS file or content file
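The four cases and the prefix/suffix proposal can be sketched as a single function. The `_obj` suffix and the "first name in sort order" rule for multi-file objects are hypothetical placeholders; the slide proposes an affix but does not name one.

```python
from typing import List, Optional

# Hypothetical affix: the slide proposes "a prefix or suffix" without naming one
OBJECT_SUFFIX = "_obj"


def object_osn(mets_osn: Optional[str], content_osns: List[str]) -> Optional[str]:
    """Derive an object owner-supplied name (OSN) for the four migration cases."""
    names = [n for n in content_osns if n]
    if mets_osn:                # case 1: a METS file exists
        return mets_osn + OBJECT_SUFFIX
    if len(names) == 1:         # case 2: object built from a single content file
        return names[0] + OBJECT_SUFFIX
    if names:                   # case 3: object built from multiple content files
        # hypothetical rule: use the first name in sort order so reruns are stable
        return sorted(names)[0] + OBJECT_SUFFIX
    return None                 # case 4: no OSN; flag for manual resolution
```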
Objects
• Other required object elements
– insertionDate
• date of earliest file?
– captionBehavior
• for existing objects, set based on billing code
• prospectively, set by depositor
– viewText
• available for all objects, not just PDS
• default to off
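Filling these required elements for an existing object might look like the sketch below. The billing-code table and its values are invented for illustration, since the actual billing-code rules are not given here.

```python
from datetime import date

# Hypothetical mapping: the actual billing-code rules are not given in the source
CAPTION_BEHAVIOR_BY_BILLING_CODE = {"BC-A": "captionOn", "BC-B": "captionOff"}


def required_object_elements(file_dates, billing_code, fallback="unspecified"):
    """Fill the required DRS2 object elements for a migrated object (sketch)."""
    return {
        # insertionDate: the slide suggests the date of the earliest file
        "insertionDate": min(file_dates).isoformat(),
        # captionBehavior: for existing objects, derived from the billing code
        "captionBehavior": CAPTION_BEHAVIOR_BY_BILLING_CODE.get(billing_code, fallback),
        # viewText: available for all objects, defaulting to off
        "viewText": False,
    }


print(required_object_elements([date(2005, 3, 1), date(2003, 7, 15)], "BC-A"))
```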
Objects
• Descriptive metadata
– Take MODS from existing METS as is or import
new
• From Aleph
• From Finding Aid
– If re-imported, update METS label or not?
– Import from OLIVIA based on owner supplied
name for the file?
Objects from existing METS
• Identifiers for Harvard metadata
– Identify finding aid identifiers
– Convert “Old HOLLIS” numbers
– Aleph IDs: include check digit or not?
– Convert plain IDs to URIs or actionable URNs
• Could DRS format such URIs for new DRS2 input?
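If DRS were to format such URIs, the core step is filling a per-identifier-type template. The `example.org` templates below are placeholders, not real Harvard resolvers, and the sketch deliberately leaves the Aleph check-digit question open by passing the ID through unchanged apart from trimming whitespace.

```python
from urllib.parse import quote

# Hypothetical URI templates: no resolver base is specified in the source
URI_TEMPLATES = {
    "aleph": "https://example.org/aleph/{id}",
    "findingaid": "https://example.org/findingaid/{id}",
}


def to_uri(id_type, plain_id):
    """Turn a plain identifier into an actionable URI using a per-type template."""
    template = URI_TEMPLATES[id_type]
    # Percent-encode so unusual characters in legacy IDs cannot break the URI
    return template.format(id=quote(plain_id.strip(), safe=""))
```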
Objects from existing METS
• PDS elements
– PDS owner text becomes caption unit name
– viewOcr function becomes viewText
– goto function will be automatically determined by
presence of structMap/div attributes
• Caption behavior
– for existing objects, set by billing code
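The PDS element mapping can be sketched as below. The dict keys (`ownerText`, `viewOcr`, `captionUnitName`, `goto`) are hypothetical spellings, and treating an `ORDERLABEL` attribute on any structMap `div` as the trigger for goto is an assumption, since the slide does not say which `div` attributes count.

```python
def map_pds_elements(pds, struct_divs):
    """Map DRS1 PDS settings to DRS2 object elements (illustrative sketch)."""
    return {
        # PDS owner text becomes the caption unit name
        "captionUnitName": pds.get("ownerText"),
        # the viewOcr function becomes viewText
        "viewText": bool(pds.get("viewOcr", False)),
        # goto is derived, not migrated: hypothetically enabled when any
        # structMap div carries an ORDERLABEL (e.g. a printed page number)
        "goto": any(div.get("ORDERLABEL") for div in struct_divs),
    }
```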
Files
• Run automated processes to identify, validate
and characterize file technical characteristics
• Extract technical metadata
Files
• isFirstGenerationinDrs
– Values: yes, no, unspecified
– Should we supply “yes” for archival masters
and/or top of derivation chain?
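The open question about supplying “yes” can be expressed as a policy switch. The role value `ARCHIVAL_MASTER` and the policy names are hypothetical; the sketch never infers “no”, on the assumption that missing evidence should stay “unspecified”.

```python
def first_generation_flag(role, is_top_of_chain, policy="archival_master"):
    """Return 'yes' or 'unspecified' for isFirstGenerationinDrs ('no' is never inferred)."""
    if policy == "archival_master":
        marked = role == "ARCHIVAL_MASTER"
    elif policy == "top_of_chain":
        marked = is_top_of_chain
    elif policy == "either":
        marked = role == "ARCHIVAL_MASTER" or is_top_of_chain
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Without evidence either way, leave the value unspecified rather than say "no"
    return "yes" if marked else "unspecified"
```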
Image Files
• Converting from local scheme to MIX
• Local field questions
– Methodology
– History
– Source
– Enhancements
Text files
• Converting from local scheme to textMD
• Descriptor_type will be absorbed into
different places in DRS2
• Extracted metadata can supply:
– markup_basis
– markup_language for specific schemas
– possibly other elements
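One way extracted metadata might supply markup_basis and markup_language is to parse the file and read the root element's namespace. This is a simplified sketch using the standard library; real characterization would also handle SGML, HTML and files whose namespace does not identify a schema.

```python
import xml.etree.ElementTree as ET


def markup_info(text):
    """Guess textMD-style markup fields from a text file's content (sketch)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML: leave both fields empty
        return {"markup_basis": None, "markup_language": None}
    # Namespaced tags look like '{namespace-uri}local'; use the URI as the language
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else None
    return {"markup_basis": "XML", "markup_language": namespace}
```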
Audio files
• Moving from local schema to AES57-2011:
Audio object structures for preservation and
restoration
Versioned metadata
• History will be tracked for key administrative
elements:
– Access flag
– Admin flag (new)
– Billing code
– Owner code
• What values to assign for required creation
date and agent for migrated content?
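A minimal model of this history tracking keeps every value with its creation date and agent. The agent string `drs1-migration` and the choice to pass the migration timestamp in explicitly are placeholders for the open question above, not decisions from the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Administrative elements whose history DRS2 will track (from the slide)
TRACKED = ("accessFlag", "adminFlag", "billingCode", "ownerCode")

# Hypothetical placeholder: the slide leaves the migration agent value open
MIGRATION_AGENT = "drs1-migration"


@dataclass
class Version:
    value: str
    created: datetime
    agent: str


@dataclass
class VersionedElement:
    history: list = field(default_factory=list)

    def set(self, value, agent, created=None):
        # Each change appends a new version; earlier values are never overwritten
        when = created or datetime.now(timezone.utc)
        self.history.append(Version(value, when, agent))

    @property
    def current(self):
        return self.history[-1].value if self.history else None


def migrate_tracked(drs1_record, migrated_at, agent=MIGRATION_AGENT):
    """Seed a one-version history for each tracked element present in a DRS1 record."""
    elements = {}
    for name in TRACKED:
        if name in drs1_record:  # adminFlag is new in DRS2, so DRS1 records lack it
            element = VersionedElement()
            element.set(drs1_record[name], agent, migrated_at)
            elements[name] = element
    return elements
```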
NEXT STEPS
Next steps
• Continue analysis and development of
technical requirements
• Build prototype
• September check-in on progress
• Create metadata migration plan
• Open meeting to review plan
OPEN FOR QUESTIONS