Tiina Ison

National Library of Finland

Metadata in the Digitisation Process

Cultural unity and diversity of the Baltic Sea Region – common history, different languages, mixed culture

Helsinki, 21st–22nd October 2010

Tiina Ison, Senior Analyst,

National Library of Finland

Outline

1. Front End - National Digital Library and Long-Term Preservation (KDK/PAS)

2. Back End - Digitisation Production Process, METS Profiles

3. Descriptive Metadata

4. Administrative/Technical Metadata

5. Structural Metadata

6. Wrapping things together: METS Profile

7. Processes towards distributed work, crowd sourcing, annotation and ontologies

1. Front End: National Digital Library and Long-Term Preservation Infrastructure

Infrastructure Initiatives:

National Digital Library

National Long-Term Preservation http://www.kdk2011.fi

Rights Management

...

METS profiles

Libraries / Archives / Museums

BACK END SYSTEMS

In their digitisation production, memory institutions produce authentic, trustworthy digitised content and collections

OPM-KD Project 2007-2009, digitisation production reviewed http://www.kansalliskirjasto.fi/extra/vanhat_bulletinit/bulletin09/article6.html

Ministry of Education www.kdk2011.fi/fi/tietoa-hankkeesta www.minedu.fi

Architecture of the National Digital Library (Kansallisen Digitaalisen Kirjaston Arkkitehtuuri) http://www.kdk2011.fi/images/stories/Kokonaisarkkitehtuuri-yleiskuva-fi_iso.jpg

2. Back End: Digitisation Production Processes, METS Profiles

PHYSICAL COLLECTIONS

SOURCE MATERIAL: Newspapers, Serials, Books, Parchments, Notes, Maps, Audio

CATALOGUING: Descriptive metadata MARC21/MODS; two bibliographic records

SCANNING: JPEG2000; Administrative/technical metadata MIX/PREMIS

POST PROCESSING: Structural metadata METS, ALTO

LEVEL OF MARK UP: Articles, Illustrations, Poems

METS EXPORT: Standards & OAI-PMH compliant METS SIP packages

Packages include: OCR TXT as ALTO XML, PDF, JPEG (150), METSXML, MARCXML

DIGITAL RESOURCE

COMPREHENSIVE DIGITAL COLLECTIONS
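
The export package contents listed above can be checked for completeness before delivery. The following is a minimal Python sketch of such a check. The file-name endings and the function name `missing_parts` are assumptions made for illustration, not the naming conventions actually used in production.

```python
# Expected package parts, keyed by the labels used on the slides.
# The file-name endings are hypothetical; real packages may be named differently.
EXPECTED_PARTS = {
    "JPEG2000": ".jp2",
    "PDF": ".pdf",
    "OCR TXT as ALTO XML": "_alto.xml",
    "JPEG (150)": ".jpg",
    "METSXML": "_mets.xml",
    "MARCXML": "_marc.xml",
}


def missing_parts(filenames):
    """Return the labels of expected parts that no file name matches."""
    lowered = [name.lower() for name in filenames]
    return [
        label
        for label, ending in EXPECTED_PARTS.items()
        if not any(name.endswith(ending) for name in lowered)
    ]


if __name__ == "__main__":
    package = ["0001.jp2", "0001.jpg", "0001_alto.xml", "item.pdf", "item_mets.xml"]
    print(missing_parts(package))  # ['MARCXML']
```

A check of this kind would run just before METS export, so that an incomplete package is caught before it reaches the receiving system.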

3. Descriptive Metadata

Catalogued Items

Un-catalogued Items – Minimal bibliographic record

Bar Code IDs – Unique IDs for Physical Items

Ingest of bibliographic metadata into digitisation production

MARC21 conversion into MARCXML (MODS)

Two bibliographic records – physical and digital (link 776)

Post cataloguing for minimal records

Enrichment of catalogue

CATALOGUING
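
The pairing of a physical and a digital record through MARC field 776 (additional physical form entry) can be sketched in MARCXML as below. This is a simplified illustration rather than the library's actual record structure. The control numbers, title and relationship notes are hypothetical, and the leader and most fields a real record needs are left out.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def _q(name):
    """Qualify a MARCXML element name with its namespace."""
    return f"{{{MARC_NS}}}{name}"


def make_record(control_number, title, linked_control_number, relation_note):
    """Build a simplified MARCXML record with a 776 link to its counterpart."""
    record = ET.Element(_q("record"))
    # 001: control number of this record
    ET.SubElement(record, _q("controlfield"), {"tag": "001"}).text = control_number
    # 245 $a: title
    title_field = ET.SubElement(
        record, _q("datafield"), {"tag": "245", "ind1": "0", "ind2": "0"}
    )
    ET.SubElement(title_field, _q("subfield"), {"code": "a"}).text = title
    # 776: $i carries the relationship note, $w the linked record's control number
    link = ET.SubElement(
        record, _q("datafield"), {"tag": "776", "ind1": "0", "ind2": "8"}
    )
    ET.SubElement(link, _q("subfield"), {"code": "i"}).text = relation_note
    ET.SubElement(link, _q("subfield"), {"code": "w"}).text = linked_control_number
    return record


# Hypothetical control numbers and title, for illustration only.
physical = make_record("phys-0001", "Example newspaper", "digi-0001", "Online version:")
digital = make_record("digi-0001", "Example newspaper", "phys-0001", "Print version:")

if __name__ == "__main__":
    print(ET.tostring(digital, encoding="unicode"))
```

Each record points at the other, so a catalogue user can move between the physical item and its digitised surrogate in either direction.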

4. Administrative/Technical Metadata

SCANNING

An XML Schema designed for expressing technical metadata for digital still images

Technical Metadata for Digital Still Images - (NISO Z39.87 Data Dictionary)

MIX: Image width, Color space, color profile, Scanner metadata, Digital camera settings

Preservation Metadata/PREMIS (information about actions on objects, on events, on the technical environment)

Rights Metadata (access restriction)

Persistent IDs
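
As an illustration of the image technical metadata named above, the sketch below builds a small MIX fragment holding image width, height and colour space. The element names follow the MIX 2.0 schema as best understood here and should be validated against the official schema before use. The function name `make_mix` and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/v20"
ET.register_namespace("mix", MIX_NS)


def _q(name):
    """Qualify a MIX element name with its namespace."""
    return f"{{{MIX_NS}}}{name}"


def make_mix(width, height, color_space):
    """Build a MIX fragment with basic image characteristics only."""
    mix = ET.Element(_q("mix"))
    info = ET.SubElement(mix, _q("BasicImageInformation"))
    characteristics = ET.SubElement(info, _q("BasicImageCharacteristics"))
    ET.SubElement(characteristics, _q("imageWidth")).text = str(width)
    ET.SubElement(characteristics, _q("imageHeight")).text = str(height)
    photometric = ET.SubElement(characteristics, _q("PhotometricInterpretation"))
    ET.SubElement(photometric, _q("colorSpace")).text = color_space
    return mix


if __name__ == "__main__":
    # Sample values are illustrative, not taken from a real scan.
    print(ET.tostring(make_mix(2480, 3508, "RGB"), encoding="unicode"))
```

Scanner and camera settings would go into further MIX sections that this sketch omits.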

5. Structural Metadata

Navigation, use and access?

Logical Structure

Physical Structure

METS structMap – relationships between parts

POST PROCESSING
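
The distinction between physical and logical structure can be sketched with two METS structMap elements: one listing pages in order, the other grouping the same files into logical parts such as articles. The `div` TYPE values, file IDs and function names below are assumptions, since METS leaves such values to the profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def _q(name):
    """Qualify a METS element name with its namespace."""
    return f"{{{METS_NS}}}{name}"


def physical_struct_map(page_file_ids):
    """One div per page, in reading order, each pointing at its image file."""
    struct_map = ET.Element(_q("structMap"), {"TYPE": "PHYSICAL"})
    issue = ET.SubElement(struct_map, _q("div"), {"TYPE": "issue"})
    for order, file_id in enumerate(page_file_ids, start=1):
        page = ET.SubElement(issue, _q("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, _q("fptr"), {"FILEID": file_id})
    return struct_map


def logical_struct_map(parts):
    """parts is a list of (type, label, file_ids) tuples, e.g. one per article."""
    struct_map = ET.Element(_q("structMap"), {"TYPE": "LOGICAL"})
    issue = ET.SubElement(struct_map, _q("div"), {"TYPE": "issue"})
    for part_type, label, file_ids in parts:
        div = ET.SubElement(issue, _q("div"), {"TYPE": part_type, "LABEL": label})
        for file_id in file_ids:
            ET.SubElement(div, _q("fptr"), {"FILEID": file_id})
    return struct_map


if __name__ == "__main__":
    pages = ["IMG0001", "IMG0002"]  # hypothetical file IDs
    articles = [("article", "Example article", ["IMG0001", "IMG0002"])]
    print(ET.tostring(physical_struct_map(pages), encoding="unicode"))
    print(ET.tostring(logical_struct_map(articles), encoding="unicode"))
```

In this sketch an article that runs over two pages appears once in the logical map and references both page files.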

6. Level of Structural Mark Up

LEVEL OF MARK UP

Material types: books, serials, newspapers, audio, projects

Granularity - different levels of structural mark up, e.g. article, illustration, poem

Granularity - all material types: pages, footnotes, running title, tables, advertisements, image (captions and categories)

Labour intensive

Phased approach in production

Crowd sourcing

7. Wrapping things together: METS Profiles

METS profiles for different material types

• monographs, serials, newspapers, audio…

Export files: JPEG2000 (lossless), PDF, OCR TXT as ALTO XML, JPEG (150 dpi), METSXML and MARCXML

The METS container or wrapper provides an OAI-PMH compliant SIP package for delivery and exchange of digital objects across systems. It wraps descriptive, administrative and structural metadata + PREMIS.

• MODS and MARCXML for descriptive and bibliographical metadata (http://www.loc.gov/standards/mods/, http://www.loc.gov/standards/marcxml/)

• MIX for image technical metadata (http://www.loc.gov/standards/mix/)

• PREMIS for preservation metadata (http://www.loc.gov/standards/premis/)

(standards portfolio)
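
How a METS document wraps the three kinds of metadata can be sketched as a skeleton with the main sections in schema order: dmdSec, amdSec, fileSec and structMap. The section IDs, the file group USE value and the function name `build_mets` are assumptions. The `xmlData` elements are left empty where the MODS, MIX and PREMIS payloads would go.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _m(name):
    """Qualify a METS element name with its namespace."""
    return f"{{{METS_NS}}}{name}"


def _wrapped_section(parent, section, section_id, mdtype):
    """Add a metadata section whose xmlData is an empty placeholder."""
    sec = ET.SubElement(parent, _m(section), {"ID": section_id})
    wrap = ET.SubElement(sec, _m("mdWrap"), {"MDTYPE": mdtype})
    ET.SubElement(wrap, _m("xmlData"))
    return sec


def build_mets(object_id, file_id, file_href):
    """Build a METS skeleton for a single-file object."""
    mets = ET.Element(_m("mets"), {"OBJID": object_id})
    # Descriptive metadata (MODS payload would go here)
    _wrapped_section(mets, "dmdSec", "DMD1", "MODS")
    # Administrative metadata: MIX image data and PREMIS provenance
    amd = ET.SubElement(mets, _m("amdSec"), {"ID": "AMD1"})
    _wrapped_section(amd, "techMD", "TECH1", "NISOIMG")
    _wrapped_section(amd, "digiprovMD", "PROV1", "PREMIS")
    # File inventory
    file_sec = ET.SubElement(mets, _m("fileSec"))
    group = ET.SubElement(file_sec, _m("fileGrp"), {"USE": "master"})
    file_el = ET.SubElement(group, _m("file"), {"ID": file_id, "ADMID": "TECH1"})
    ET.SubElement(
        file_el, _m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href}
    )
    # Structural metadata: a single page pointing at the file
    struct_map = ET.SubElement(mets, _m("structMap"), {"TYPE": "PHYSICAL"})
    page = ET.SubElement(struct_map, _m("div"), {"TYPE": "page", "DMDID": "DMD1"})
    ET.SubElement(page, _m("fptr"), {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    # Hypothetical identifiers and file location.
    doc = build_mets("example-object-1", "IMG0001", "file://images/0001.jp2")
    print(ET.tostring(doc, encoding="unicode"))
```

A profile for a given material type would then fix which sections are mandatory and which TYPE and USE values are allowed.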

8. Processes towards distributed work, crowd sourcing, annotation and ontologies

Content and context as part of digitisation processes…

OCR Correction

Automatic and semiautomatic processes for data extraction …

Distributed work processes, e.g. for:

• Mark up level

• OCR correction

• Controlled annotation

• Social tagging
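
One way OCR correction work could be prepared for distribution is to read the ALTO XML and pick out words with low recognition confidence. The sketch below assumes ALTO version 2 with the optional `WC` (word confidence) attribute present. The sample document, the threshold and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ALTO version 2 namespace; other ALTO versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Tiny hand-written sample, not real OCR output.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="TB1">
    <TextLine>
      <String CONTENT="Helsinki" WC="0.98"/><SP/>
      <String CONTENT="sanornat" WC="0.41"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def line_texts(root):
    """Return the text of each TextLine, with words joined by single spaces."""
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in line.findall("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


def correction_candidates(root, threshold=0.5):
    """Return words whose word confidence (WC) is below the threshold."""
    candidates = []
    for string in root.iterfind(".//alto:String", ALTO_NS):
        confidence = string.get("WC")
        if confidence is not None and float(confidence) < threshold:
            candidates.append(string.get("CONTENT"))
    return candidates


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(line_texts(root))             # ['Helsinki sanornat']
    print(correction_candidates(root))  # ['sanornat']
```

Words selected this way could be offered to volunteers first, so that correction effort goes where the OCR is least certain.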

THANK YOU
