Digitization with Millennium & CONTENTdm

Stuart Hunt
IUG17
Anaheim
May 2009
Overview
• Background
• Digitisation
• Metadata
• Workflows
• Now
University of Warwick
• Royal Charter 1965
• Russell Group
• 16,000 FTE students
• 5000 staff
University Library
• Approx 1.1 million volumes
• 170 staff (110 FTE)
• Millennium 2003
• Approx 100,000 issues/renewals per yr
• Approx 28,000 new books per yr
• RLUK member
• OCLC member
Content
• Marandet Collection
• 4000+ French plays 1720 to 1900
• Acquired 1970s
• Guide published 1979
• Bibliographic records in Millennium, RLUK, COPAC, & WorldCat
• No IPR issues
Projects
• Revolutionary Drama (1789-1800)
– 339 plays
• Empire Period Drama (1801-1815)
– 123 plays
• JISC Digitisation Programme: Enriching Digital Resources
• ‘Exposing Marandet’
– 1500 plays/75,000 pages
Objectives
• Cross-searching
• Full-text searching
• Integration with existing & future systems
– Millennium
– Web
– Vertical search solution
Options
• Existing solutions
– Millennium
– In-house web publishing tool
• Separate product
– Digital collection management software
– CONTENTdm
• Solution would drive approach taken
Digital production
• Image files
– TIFF & JPEG derivative
– Full colour & greyscale
– Outsourced
• Text files/full-text transcripts
– OCR quality initially not acceptable
– Re-keying
– Outsourced
Media Management
• Tried & tested solution
• Quick & easy
• Link digital content
• Discovery-to-delivery (D2D) process simplified
• Existing bibs
• New bibs
• Use existing authentication if required
Media Management
• No full-text searching
• No cross-collection searching (unless in separate scope)
• Tied to MARC metadata
• Metadata enrichment difficult
• Image file format
• Not a total solution
CONTENTdm
• Full-text & cross-collection searching
• Not tied to MARC metadata
• Metadata enrichment simple
• Local Windows server
• Initial licence <50K images
• Upgraded to unlimited licence 2008
Local metadata context
• Separate bibs
– Print vs electronic
– Describes what is
– Supports better (future) FRBRisation
– Ease of maintenance
– Location & format based scoping
• 793 for local added entry/uniform title
– Collection name
Metadata option 1
• Create metadata within CONTENTdm
• Play-by-play
• Metadata already present in Millennium
Metadata option 1
• Assumes that metadata is already available
• Not scalable
• Poor use of resources
• Does not allow data to work harder or smarter
Metadata option 2
• Create metadata outside of Millennium
• Metadata not already present in Millennium
• Play-by-play
• Harvest from CONTENTdm into Millennium via XML Harvester
XML Harvester
• Single configuration file
• Needs to be edited for each separate resource
• Uses XSLT, not load table(s)
• Major changes (e.g. harvesting a different schema) may need to be done by III
Configuration file triggers
@XML_TYPE=DC (or MARCXML)
@OAI_FORMAT=oai_dc
@DBNAME=[Repository name]
@URL=[url for OAI-PMH]
@USEOAI=true (or false)
@OAISET=[Name of set]
@RECID_MARCTAG=001
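As a rough illustration of what these settings describe, the sketch below reads `@KEY=value` lines and builds a standard OAI-PMH ListRecords request from them. The line syntax is assumed from the slide, and the repository name, URL and set name are made-up placeholders. How the XML Harvester itself consumes the file is not shown in the slides, so this is not a description of III's implementation.

```python
from urllib.parse import urlencode


def parse_triggers(text):
    """Parse '@KEY=value' lines into a dict (syntax assumed from the slide)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@") or "=" not in line:
            continue  # skip blank lines and anything that is not a trigger
        key, value = line[1:].split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def list_records_url(settings):
    """Build an OAI-PMH ListRecords request URL from the parsed settings."""
    params = {"verb": "ListRecords", "metadataPrefix": settings["OAI_FORMAT"]}
    if settings.get("OAISET"):
        params["set"] = settings["OAISET"]  # 'set' is optional in OAI-PMH
    return settings["URL"] + "?" + urlencode(params)


# Hypothetical values, for illustration only.
EXAMPLE = """\
@XML_TYPE=DC
@OAI_FORMAT=oai_dc
@DBNAME=Example repository
@URL=http://repository.example.org/oai
@USEOAI=true
@OAISET=exampleset
@RECID_MARCTAG=001
"""

settings = parse_triggers(EXAMPLE)
url = list_records_url(settings)
print(url)
```

Pasting the printed URL into a browser is a quick way to check that a repository's OAI-PMH endpoint and set name respond before editing the harvester configuration.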
Harvested metadata
• Loaded through Data Exchange
• Significant re-editing
• Tags & indicators
• Diacritics
• Creating attached items or holdings records
Metadata option 3
• Batchload into CONTENTdm via delimited file from Create Lists
• Cross-walk MARC21 to DC
• Directory structure
MARC to Simple DC crosswalk
MARC          Dublin Core
Record#       dc:identifier
008/07-10     dc:language
100           dc:creator
245           dc:title
260|ab        dc:publisher
260|c         dc:date
300           dc:format
5XX           dc:description
6XX           dc:subject
700           dc:contributor
700|t         dc:relation
793           dc:source
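The crosswalk above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual tooling: the record layout (a list of tag and subfield pairs), the function names and the sample values are all assumptions. The 008 row is left out because fixed-field positions need separate handling, and splitting 700 into contributor (all subfields except |t) and relation (|t only) is one reading of the slide.

```python
from collections import defaultdict

# Variable fields that map whole to a single DC element (from the slide).
WHOLE_FIELD = {
    "100": "dc:creator",
    "245": "dc:title",
    "300": "dc:format",
    "793": "dc:source",
}


def text(subfields, keep=None, drop=""):
    """Join subfield values in order, optionally filtering by subfield code."""
    return " ".join(
        value
        for code, value in subfields
        if (keep is None or code in keep) and code not in drop
    )


def marc_to_dc(record_number, fields):
    """Map one record (list of (tag, [(code, value), ...])) to DC elements."""
    dc = defaultdict(list)

    def add(element, value):
        if value:  # skip empty results, e.g. a 700 with no |t
            dc[element].append(value)

    add("dc:identifier", record_number)
    for tag, subfields in fields:
        if tag in WHOLE_FIELD:
            add(WHOLE_FIELD[tag], text(subfields))
        elif tag == "260":
            add("dc:publisher", text(subfields, keep="ab"))
            add("dc:date", text(subfields, keep="c"))
        elif tag == "700":
            add("dc:contributor", text(subfields, drop="t"))
            add("dc:relation", text(subfields, keep="t"))
        elif tag.startswith("5"):
            add("dc:description", text(subfields))
        elif tag.startswith("6"):
            add("dc:subject", text(subfields))
    return dict(dc)


# Made-up sample record; the .b number is a placeholder.
sample = [
    ("100", [("a", "Author, Example.")]),
    ("245", [("a", "Example play :"), ("b", "comedy in one act")]),
    ("260", [("a", "Paris :"), ("b", "Example Publisher,"), ("c", "1790.")]),
    ("650", [("a", "French drama")]),
    ("700", [("a", "Writer, Another."), ("t", "Earlier work")]),
    ("793", [("a", "Example collection")]),
]
result = marc_to_dc(".b00000000", sample)
```

Keeping the mapping in a small table plus a few special cases makes it easy to adjust when a collection needs a different treatment of notes or added entries.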
MARC – DC Crosswalk
Additional DC elements
• dc:rights
• dc:type
• Transcript mapped to dc:description
Metadata workflow
• Create separate bibs for e-versions
• Export print records via Data Exchange
• MarcEdit to remove extraneous tags (907, etc)
• Insert 006, 007, 008/23, GMD, 533
• Re-import into Millennium as new bibs
• [856 CONTENTdm reference URL added]
Metadata workflow
• Review file of newly loaded bibs exported from Create Lists
• Cross-walked from MARC to DC
• Additional DC elements added
• Item level metadata added
• Loaded to CONTENTdm (CDM) as delimited files with directory structure
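The last two steps, adding extra DC elements and writing a delimited load file, might look like the sketch below. The tab delimiter, the "; " separator for repeated values, the column order and the sample values are assumptions for illustration; the slides say only that the files were delimited.

```python
import csv
import io

# Column order is illustrative; a real load file follows the collection's fields.
COLUMNS = ["dc:identifier", "dc:title", "dc:creator", "dc:date",
           "dc:rights", "dc:type"]


def to_delimited(records, columns=COLUMNS, extra=None):
    """Write records (dicts of element -> list of values) as tab-delimited text.

    `extra` holds elements added to every record, such as dc:rights and
    dc:type; values already present on a record take precedence.
    """
    extra = extra or {}
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        merged = {**extra, **record}
        # Repeated values are joined with "; " (an assumed convention).
        writer.writerow(["; ".join(merged.get(col, [])) for col in columns])
    return buffer.getvalue()


# Made-up sample data.
records = [{
    "dc:identifier": [".b00000000"],
    "dc:title": ["Example play"],
    "dc:creator": ["Author, Example"],
    "dc:date": ["1790"],
}]
extra = {"dc:rights": ["Example rights statement"], "dc:type": ["Text"]}

output = to_delimited(records, extra=extra)
lines = output.splitlines()
```

Writing to a string first makes the result easy to inspect before it is saved alongside the image directories.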
Metadata in CONTENTdm
• Compound objects
• Document level
• Page level
– Less rich than document level
• Hospitable to multiple schemas
• Deliberate attempt to stay close to DC
• Administrative metadata
– Later feature
Document level
• AACR in DC wrapper
• All descriptive metadata from bib (except LDR, 006, 007, 008, GMD)
• Authority control (names, subjects, uniform titles)
• Rights (dc:rights)
• Identifier (.b number)
• Mapped to DC for OAI harvesting
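To show what a harvester receives at document level, the sketch below pulls the Dublin Core elements out of an `oai_dc` record using only the Python standard library. The two namespace URIs are the standard OAI-PMH and DC ones; the sample record and its values are made up, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_elements(xml_text):
    """Return a dict of 'dc:name' -> list of values found in an oai_dc record."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS) and element.text:
            name = "dc:" + element.tag[len(DC_NS):]
            found.setdefault(name, []).append(element.text.strip())
    return found


# Made-up sample record; the .b number is a placeholder.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example play</dc:title>
  <dc:creator>Author, Example</dc:creator>
  <dc:subject>French drama</dc:subject>
  <dc:subject>Comedy</dc:subject>
  <dc:identifier>.b00000000</dc:identifier>
  <dc:rights>Example rights statement</dc:rights>
</oai_dc:dc>"""

parsed = dc_elements(SAMPLE)
```

Carrying the .b number in dc:identifier gives harvested records a route back to the source bibliographic record.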
Page level
• Basic descriptive metadata (creator, title, publisher, date)
• Rights (dc:rights)
• Identifier (.b number)
• Transcript (dc:description)
• No OAI harvesting at page level
– Local decision
Access & availability
• Availability across local → global continuum
• Metadata contribution
• Collection level descriptions
• OAI
• Collapse D2D
Metadata in WorldCat
• Local CDM server – not able to use Connexion Digital Import
• Bug between WorldCat and CDM for compound objects
• FRBRized display in worldcat.org potentially impedes discovery
Now
• ‘Exposing Marandet’ completes 9/2009
• Established service: 4 collections
– Ancien Régime Drama
– Revolutionary Drama
– Empire Period Drama
– Restoration Drama
• Integration with course delivery
• Metadata enrichment to/from CÉSAR
Links
• http://go.warwick.ac.uk/fac/arts/french/marandet/
• http://www.jisc.ac.uk/whatwedo/programmes/digitisation/enrichingdigi/marandet.aspx
• http://webcat.warwick.ac.uk
• http://contentdm.warwick.ac.uk
stuart.hunt@warwick.ac.uk