Digitization with Millennium & CONTENTdm
Stuart Hunt
IUG17 Anaheim May 2009

Overview
• Background
• Digitisation
• Metadata
• Workflows
• Now

University of Warwick
• Royal Charter 1965
• Russell Group
• 16,000 FTE students
• 5000 staff

University Library
• Approx 1.1 million volumes
• 170 staff (110 FTE)
• Millennium 2003
• Approx 100,000 issues/renewals per yr
• Approx 28,000 new books per yr
• RLUK member
• OCLC member

Content
• Marandet Collection
• 4000+ French plays 1720 to 1900
• Acquired 1970s
• Guide published 1979
• Bibliographic records in Millennium, RLUK, COPAC, & WorldCat
• No IPR issues

Projects
• Revolutionary Drama (1789-1800)
  – 339 plays
• Empire Period Drama (1801-1815)
  – 123 plays
• JISC Digitisation Programme: Enriching Digital Resources
• ‘Exposing Marandet’
  – 1500 plays/75,000 pages

Objectives
• Cross-searching
• Full-text searching
• Integration with existing & future systems
  – Millennium
  – Web
  – Vertical search solution

Options
• Existing solutions
  – Millennium
  – In-house web publishing tool
• Separate product
  – Digital collection management software
  – CONTENTdm
• Solution would drive approach taken

Digital production
• Image files
  – TIFF & JPEG derivative
  – Full colour & greyscale
  – Outsourced
• Text files/full-text transcripts
  – OCR quality initially not acceptable
  – Re-keying
  – Outsourced

Media Management
• Tried & tested solution
• Quick & easy
• Link digital content
• D2D process simplified
• Existing bibs
• New bibs
• Use existing authentication if required

Media Management
• No full-text searching
• No cross-collection searching (unless in separate scope)
• Tied to MARC metadata
• Metadata enrichment difficult
• Image file format
• Not a total solution

CONTENTdm
• Full-text & cross-collection searching
• Not tied to MARC metadata
• Metadata enrichment simple
• Local Windows server
• Initial licence <50K images
• Upgraded to unlimited licence 2008

Local metadata context
• Separate bibs
  – Print vs electronic
  – Describes what is
  – Supports better (future) FRBRisation
  – Ease of maintenance
  – Location & format based scoping
• 793 for local added entry/uniform title
  – Collection name

Metadata option 1
• Create metadata within CONTENTdm
• Play-by-play
• Metadata already present in Millennium

Metadata option 1
• Assumes that metadata is already available
• Not scalable
• Poor use of resources
• Does not allow data to work harder or smarter

Metadata option 2
• Create metadata outside of Millennium
• Metadata not already present in Millennium
• Play-by-play
• Harvest from CONTENTdm into Millennium via XML Harvester

XML Harvester
• Single configuration file
• Needs to be edited for each separate resource
• Uses XSLT not load table(s)
• Major changes (e.g. harvest different schema) may need to be done by III

Configuration file triggers
@XML_TYPE=DC (or MARCXML)
@OAI_FORMAT=oai_dc
@DBNAME=[Repository name]
@URL=[url for OAI-PMH]
@USEOAI=true (or false)
@OAISET=[Name of set]
@RECID_MARCTAG=001

XML Harvester

Harvested metadata
• Loaded through Data Exchange
• Significant re-editing
• Tags & indicators
• Diacritics
• Creating attached items or holdings records

Harvested metadata

Metadata option 3
• Batchload into CONTENTdm via delimited file from Create Lists
• Cross-walk MARC21 to DC
• Directory structure

MARC to Simple DC crosswalk
MARC field    DC element
Record#       dc:identifier
008/07-10     dc:language
100           dc:creator
245           dc:title
260|ab        dc:publisher
260|c         dc:date
300           dc:format
5XX           dc:description
6XX           dc:subject
700           dc:contributor
700|t         dc:relation
793           dc:source

MARC – DC Crosswalk

Additional DC elements
• dc:rights
• dc:type
• Transcript mapped to dc:description

Metadata workflow
• Create separate bibs for e-versions
• Export print records via Data Exchange
• MarcEdit to remove extraneous tags (907, etc)
• Insert 006, 007, 008/23, GMD, 533
• Re-import into Millennium as new bibs
• [856 CONTENTdm reference url added]

Metadata workflow
• Review file of newly loaded bibs exported from Create Lists
• Cross-walked from MARC to DC
• Additional DC elements added
• Item level metadata added
• Loaded to CDM as delimited files with directory structure

Metadata in CONTENTdm
• Compound objects
• Document level
• Page level
  – Less rich than document level
• Hospitable to multiple schemas
• Deliberate attempt to stay close to DC
• Administrative metadata
  – Later feature

Document level
• AACR in DC wrapper
• All descriptive metadata from bib (except LDR, 006, 007, 008, GMD)
• Authority control (names, subjects, uniform titles)
• Rights (dc:rights)
• Identifier (.b number)
• Mapped to DC for OAI harvesting

Page level
• Basic descriptive metadata (creator, title, publisher, date)
• Rights (dc:rights)
• Identifier (.b number)
• Transcript (dc:description)
• No OAI harvesting at page level
  – Local decision

Access & availability
• Availability across local → global continuum
• Metadata contribution
• Collection level descriptions
• OAI
• Collapse D2D

Metadata in WorldCat
• Local CDM server
  – Not able to use Connexion Digital Import
• Bug between WorldCat and CDM for compound objects
• FRBRized display in worldcat.org potentially impedes discovery

Now
• ‘Exposing Marandet’ completes 9/2009
• Established service 4 collections
  – Ancien Régime Drama
  – Revolutionary Drama
  – Empire Period Drama
  – Restoration Drama
• Integration with course delivery
• Metadata enrichment to/from CÉSAR

Links
• http://go.warwick.ac.uk/fac/arts/french/marandet/
• http://www.jisc.ac.uk/whatwedo/programmes/digitisation/enrichingdigi/marandet.aspx
• http://webcat.warwick.ac.uk
• http://contentdm.warwick.ac.uk

stuart.hunt@warwick.ac.uk
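
Appendix: crosswalk sketch

The MARC to Simple DC crosswalk and the delimited-file batchload described under Metadata option 3 can be illustrated with a minimal Python sketch. It is not the tooling used in the project. It assumes the Create Lists export has already been read into one dictionary per record, keyed by the MARC labels shown on the crosswalk slide. The function names, the "; " separator for repeated fields, the tab delimiter and the sample values are all hypothetical, and only a subset of the crosswalk is shown.

```python
import csv
import io

# Subset of the MARC -> Simple DC crosswalk from the slides.
# Using these labels as export column keys is an assumption.
CROSSWALK = [
    ("Record#", "dc:identifier"),
    ("100", "dc:creator"),
    ("245", "dc:title"),
    ("260|ab", "dc:publisher"),
    ("260|c", "dc:date"),
    ("300", "dc:format"),
    ("6XX", "dc:subject"),
    ("793", "dc:source"),
]


def crosswalk_record(marc_row, extra=None):
    """Map one exported record (dict keyed by MARC label) to DC elements.

    Repeated fields may be given as a list; they are joined with "; "
    (an assumed separator, not taken from the slides).
    """
    dc = {}
    for marc_label, dc_element in CROSSWALK:
        value = marc_row.get(marc_label, "")
        if isinstance(value, (list, tuple)):
            value = "; ".join(v.strip() for v in value if v.strip())
        dc[dc_element] = value.strip()
    # Additional DC elements (e.g. dc:rights, dc:type) added after mapping.
    dc.update(extra or {})
    return dc


def to_delimited(dc_records, delimiter="\t"):
    """Serialise mapped records as delimited text with a header row."""
    if not dc_records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(dc_records[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(dc_records)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up placeholder record, for illustration only.
    sample = {
        "Record#": "b1234567",
        "100": "Example, Author",
        "245": "Example play title",
        "6XX": ["French drama", "Theater"],
    }
    mapped = crosswalk_record(sample, extra={"dc:type": "Text"})
    print(to_delimited([mapped]))
```

Keeping the crosswalk as a plain list of pairs means a new mapping can be added without touching the conversion logic, which suits the slides' aim of staying close to DC while remaining hospitable to other schemas.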