Metadata in the Digitisation Process
Cultural unity and diversity of the Baltic Sea Region – common history, different languages, mixed culture
Helsinki, 21st–22nd October 2010
Tiina Ison, Senior Analyst,
National Library of Finland
Outline
1. Front End - National Digital Library and Long-Term Preservation (KDK/PAS)
2. Back End - Digitisation Production Process, METS Profiles
3. Descriptive Metadata
4. Administrative/Technical Metadata
5. Structural Metadata
6. Wrapping things together: METS Profile
7. Processes towards distributed work, crowdsourcing, annotation and ontologies
Infrastructure Initiatives:
National Digital Library
National Long-Term Preservation http://www.kdk2011.fi
Rights
Management
...
METS profiles
Libraries / Archives / Museums
BACK END SYSTEMS
In their digitisation production, memory institutions produce authentic, trustworthy digitised content and collections
OPM-KD Project 2007-2009, digitisation production reviewed http://www.kansalliskirjasto.fi/extra/vanhat_bulletinit/bulletin09/article6.html
Ministry of Education www.kdk2011.fi/fi/tietoa-hankkeesta www.minedu.fi
Architecture of the National Digital Library http://www.kdk2011.fi/images/stories/Kokonaisarkkitehtuuri-yleiskuva-fi_iso.jpg
Articles
Illustrations
Poems
LEVEL OF MARK UP
Structural metadata METS, ALTO
POST PROCESSING
DIGITAL RESOURCE
COMPREHENSIVE DIGITAL COLLECTIONS
Standards & OAI-PMH compliant METS SIP packages
Administrative/technical metadata MIX/PREMIS
METS EXPORT
Packages include:
SCANNING JPEG2000
Descriptive metadata MARC21/MODS
CATALOGUING
Newspapers
Serials
Books
Parchments
Notes
Maps
Audio
SOURCE MATERIAL
PHYSICAL COLLECTIONS
Two Bibliographic Records
OCR TXT as ALTO XML
JPEG (150 dpi)
METSXML
MARCXML
Catalogued Items
Un-catalogued Items – Minimal bibliographic record
Barcode IDs – Unique IDs for Physical Items
Ingest of bibliographic metadata into digitisation production
MARC21 conversion into MARCXML (MODS)
Two bibliographic records – physical and digital (linked via field 776)
Post cataloguing for minimal records
Enrichment of catalogue
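The pairing of a physical and a digital record through field 776 can be sketched in MARCXML. This is a minimal illustration only, assuming Python's standard library; the control numbers, title and note texts are invented, the helper name marc_record is hypothetical, and a production record would also carry a leader and many more fields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace
ET.register_namespace("marc", MARC_NS)


def marc_record(control_number, title, linked_control_number, link_note):
    """Build a minimal MARCXML record that points to its counterpart via 776.

    Hypothetical helper for illustration; not a complete MARC record.
    """
    record = ET.Element(f"{{{MARC_NS}}}record")

    # 001: control number of this record
    cf = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    cf.text = control_number

    # 245 $a: title
    f245 = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": "245", "ind1": "0", "ind2": "0"}
    )
    ET.SubElement(f245, f"{{{MARC_NS}}}subfield", {"code": "a"}).text = title

    # 776: additional physical form entry
    #   $i = relationship note, $w = control number of the linked record
    f776 = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": "776", "ind1": "0", "ind2": "8"}
    )
    ET.SubElement(f776, f"{{{MARC_NS}}}subfield", {"code": "i"}).text = link_note
    ET.SubElement(f776, f"{{{MARC_NS}}}subfield", {"code": "w"}).text = linked_control_number
    return record


# Invented identifiers: one record for the physical item, one for the digital copy.
physical = marc_record("phys-0001", "Example title", "digi-0001", "Online version:")
digital = marc_record("digi-0001", "Example title", "phys-0001", "Print version:")
print(ET.tostring(digital, encoding="unicode"))
```

Each record carries the other's control number in 776 $w, so the catalogue can navigate from the physical item to its digitised copy and back.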
CATALOGUING
SCANNING
An XML Schema designed for expressing technical metadata for digital still images
Technical Metadata for Digital Still Images - (NISO Z39.87 Data Dictionary)
MIX: Image width, Color space, color profile, Scanner metadata, Digital camera settings
Preservation Metadata/PREMIS (information about actions on the object, events, and the technical environment)
Rights Metadata (access restriction)
Persistent IDs
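As an illustration of the MIX fields listed above, the following sketch builds a small MIX fragment holding image width, height and colour space. It assumes the MIX 2.0 namespace and element names as I understand them; the pixel values and the helper name mix_fragment are invented, and the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/v20"  # assumed: MIX 2.0 namespace
ET.register_namespace("mix", MIX_NS)


def q(name):
    """Qualify a local element name with the MIX namespace."""
    return f"{{{MIX_NS}}}{name}"


def mix_fragment(width, height, color_space):
    """Return a simplified MIX element with basic image characteristics.

    Hypothetical helper; a complete MIX record also covers capture
    (scanner or camera) metadata, which is omitted here.
    """
    mix = ET.Element(q("mix"))
    basic = ET.SubElement(mix, q("BasicImageInformation"))
    chars = ET.SubElement(basic, q("BasicImageCharacteristics"))
    ET.SubElement(chars, q("imageWidth")).text = str(width)
    ET.SubElement(chars, q("imageHeight")).text = str(height)
    photo = ET.SubElement(chars, q("PhotometricInterpretation"))
    ET.SubElement(photo, q("colorSpace")).text = color_space
    return mix


# Invented values for a single scanned page.
fragment = mix_fragment(2480, 3508, "sRGB")
print(ET.tostring(fragment, encoding="unicode"))
```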
Navigation, use and access?
Logical Structure
Physical Structure
METS structMap – relationships between parts
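The difference between physical and logical structure can be sketched with two METS structMap elements over the same page files. This is a simplified example, assuming Python's standard library; the div TYPE values, file IDs and function names are illustrative and not taken from any published profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def physical_struct_map(page_file_ids):
    """Physical structure: one div per page, in scanning order."""
    smap = ET.Element(m("structMap"), {"TYPE": "PHYSICAL"})
    root = ET.SubElement(smap, m("div"), {"TYPE": "issue"})
    for order, file_id in enumerate(page_file_ids, start=1):
        page = ET.SubElement(root, m("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return smap


def logical_struct_map(title, articles):
    """Logical structure: one div per article, pointing at the pages it spans.

    `articles` is a list of (label, [file ids]) pairs.
    """
    smap = ET.Element(m("structMap"), {"TYPE": "LOGICAL"})
    root = ET.SubElement(smap, m("div"), {"TYPE": "issue", "LABEL": title})
    for label, file_ids in articles:
        article = ET.SubElement(root, m("div"), {"TYPE": "article", "LABEL": label})
        for file_id in file_ids:
            ET.SubElement(article, m("fptr"), {"FILEID": file_id})
    return smap


# Invented example: a two-page issue where one article runs across both pages.
pages = ["IMG0001", "IMG0002"]
print(ET.tostring(physical_struct_map(pages), encoding="unicode"))
print(ET.tostring(logical_struct_map("Example issue", [("Lead article", pages)]), encoding="unicode"))
```

Both maps reference the same file IDs, so navigation by page and navigation by article resolve to the same scanned images.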
POST PROCESSING
LEVEL OF MARK UP
Material types: books, serials, newspapers, audio, projects
Granularity - different levels of structural mark up, e.g. article, illustration, poem
Granularity - all material types: pages, footnotes, running titles, tables, advertisements, images (captions and categories)
Labour intensive
Phased approach in production
Crowdsourcing
METS profiles for different material types
• monographs, serials, newspapers, audio…
Export files:
JPEG2000 (lossless), PDF, OCR TXT as ALTO XML, JPEG (150 dpi), METSXML and MARCXML
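Since the OCR text is exported as ALTO XML, plain text can be recovered by reading the CONTENT attribute of each String element. The sketch below assumes Python's standard library and matches elements by local name so it does not depend on a particular ALTO version; the sample document is invented and heavily simplified (real ALTO files also carry coordinates and confidence values).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the namespace from an element tag, e.g. '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_xml):
    """Return the OCR text of an ALTO document as a list of text lines."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local(element.tag) == "TextLine":
            # Each word is a String element; its text lives in CONTENT.
            words = [
                child.get("CONTENT", "")
                for child in element
                if local(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Invented, simplified sample; not a schema-complete ALTO file.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="TB1">
    <TextLine><String CONTENT="Baltic"/><SP/><String CONTENT="Sea"/></TextLine>
    <TextLine><String CONTENT="Region"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_lines(SAMPLE))  # ['Baltic Sea', 'Region']
```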
The METS container, or wrapper, provides an OAI-PMH-compliant SIP package for delivering and exchanging digital objects across systems. It wraps descriptive, administrative and structural metadata, plus PREMIS.
• MODS and MARCXML for descriptive and bibliographic metadata
(http://www.loc.gov/standards/mods/)
(http://www.loc.gov/standards/marcxml/)
• MIX for image technical metadata (http://www.loc.gov/standards/mix/)
• PREMIS for preservation metadata (http://www.loc.gov/standards/premis/)
(standards portfolio)
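How the METS wrapper ties these standards together can be sketched as an empty skeleton: a dmdSec for MODS, an amdSec with MIX (techMD) and PREMIS (digiprovMD), a fileSec and a structMap. This is an assumption-laden outline in Python, not the actual profile used in production; the IDs, the USE value, the file names and the function names are invented, and the xmlData elements are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def md_section(parent, section, section_id, mdtype):
    """Add a metadata section with an empty mdWrap/xmlData placeholder."""
    sec = ET.SubElement(parent, m(section), {"ID": section_id})
    wrap = ET.SubElement(sec, m("mdWrap"), {"MDTYPE": mdtype})
    return ET.SubElement(wrap, m("xmlData"))


def mets_skeleton(object_id, files):
    """Build a METS skeleton; `files` is a list of (id, href, mimetype)."""
    mets = ET.Element(m("mets"), {"OBJID": object_id})

    md_section(mets, "dmdSec", "DMD1", "MODS")          # descriptive metadata
    amd = ET.SubElement(mets, m("amdSec"), {"ID": "AMD1"})
    md_section(amd, "techMD", "TECH1", "NISOIMG")       # MIX technical metadata
    md_section(amd, "digiprovMD", "PROV1", "PREMIS")    # preservation metadata

    file_sec = ET.SubElement(mets, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(mets, m("structMap"), {"TYPE": "PHYSICAL"})
    root_div = ET.SubElement(struct_map, m("div"), {"TYPE": "item"})

    for order, (file_id, href, mimetype) in enumerate(files, start=1):
        file_el = ET.SubElement(group, m("file"), {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(
            file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href}
        )
        page = ET.SubElement(root_div, m("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return mets


# Invented object ID and file names.
package = mets_skeleton("example-0001", [("IMG0001", "page_0001.jp2", "image/jp2")])
print(ET.tostring(package, encoding="unicode"))
```

A material-type profile (monographs, serials, newspapers, audio) would then constrain which sections are mandatory and which div TYPE values are allowed.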
Content and context as part of digitisation processes…
OCR Correction
Automatic and semi-automatic processes for data extraction …
Distributed work processes i.e. for:
• Mark up level
• OCR correction
• Controlled annotation
• Social tagging
THANK YOU