Not as easy as it sounds! - American Library Association

Taming Metadata in the Wild
West
Liz Bishoff, Colorado Digitization Program
Cheryl Walters, Utah State University
Chuck Thomas, Florida State University
Elizabeth S. Meagher, University of Denver
©2002 Colorado Digitization Project
http://coloradodigital.coalliance.org
http://www.cdpheritage.org
©2002 Colorado Digitization Program
Who what when where—Western
Trails Digital Standards
• Western Trails 2001 IMLS funded grant
– Multi-state initiative to create a collection of
digital objects on topic of Western Trails
– 23 participating institutions, creating 20,000
digital objects
– Each institution would host its own digital
objects, create its own metadata, and use its
own metadata standards and its own database
or local system
– Each state would create a statewide
database
Who, what, when, where (cont’d)
• Interoperability of the 4 state databases was
through Z39.50, with a SiteSearch Web-Z
interface
• Based on the CDP experience
– Crosswalks from various databases
– Reviewed the CDP Best Practices
– Agreed to utilize Dublin Core as the common format
• Involved more than just the 4 states
– Utah Academic Library Consortium, New Mexico,
Arizona, Minnesota, Kansas, Nebraska, Colorado,
Wyoming
– 18 representatives from archives, museums,
historical societies in these states met over a 9
month period to develop the document
Why Western States Best
Practices?
• Improve user results/satisfaction
• Improve consistency across different cultural
heritage institutions
• Enhance potential for creating union catalogs
from multiple databases/ILS
• Provide guidance for cultural heritage
institutions on use of Dublin Core for digital
resources
• Support interoperability
• Support emerging standards--OAI
Taming Metadata in the Wild
West
Part 2: Writing the metadata
guidelines
Western Trails Metadata Task Forces
• Descriptive Elements
– Title, Creator, Subject, Description, Date.Original, Relation
– Contributor, Publisher, Language, Source, Coverage
• Technical Elements
– Date.Digital, Type, Format.Use, Holding.Institution
– Format.Creation, Identifier, Rights Management
Getting It Done
• Set up two electronic discussion lists
• Used Colorado Digitization Project’s metadata guidelines as
a base document
• Created initial working draft
– base document
– decisions made during WSDSG’s first meeting
– input from task force members
• Distributed draft to task force for revision
(additions, deletions, rewrites, etc.)
• Changes discussed via email & incorporated
• Result taken back to entire Western States Digital
Standards Group for review
Some concepts
• Define all terms used
• Avoid being library-centric
• Do not assume cataloging or metadata
experience
• Provide lots of examples
• Provide links to related thesauri,
standards, etc.
Problems, points of contention
• Figuring out exactly what data each
Dublin Core element should contain.
Not as easy as it sounds!
• Figuring out how to make guidelines
flexible & comprehensive enough to fit a
variety of situations, collaborative
ventures & partners, for now and in the
future.
The Coverage element
• Official DC definition:
“The extent or scope of the content of the resource”
• What does that mean exactly?
• How does it differ from the Subject element?
• Does not mean date/place of publication
• Our description of this element:
“Describes the spatial or temporal characteristics of the
intellectual content of the resource”
– For art objects and artifacts, this could be the place where the
object originated and the date or time period during which it
was made.
– Currently recommended only for maps, etc. or when place or
time period cannot be adequately described by Subject
element.
Source versus Relation element
• Looking at California State Library’s Metadata
Standards helped
– Source maps to MARC 534
• Note about original version
– Relation maps to MARC 787
• Note about a related title
• The lights came on for catalogers, who wanted
to provide similar MARC tag equivalents in the
guidelines; this was voted down as too
library-centric. There are other standards
besides MARC!
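The two mappings on this slide can be written down as a small crosswalk table. The sketch below is illustrative only: it holds just the two MARC tags named above, and the function name `marc_to_dc` and the input shape (a list of tag/text pairs) are assumptions, not part of the Western States guidelines.

```python
# Hypothetical crosswalk: only the two tags named on the slide are included.
MARC_TO_DC = {
    "534": "Source",    # note about the original version
    "787": "Relation",  # note about a related title
}


def marc_to_dc(fields):
    """Group MARC (tag, text) pairs under Dublin Core element names.

    Tags without a crosswalk entry are returned separately so that
    nothing is silently dropped.
    """
    mapped, unmapped = {}, []
    for tag, text in fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append((tag, text))
        else:
            mapped.setdefault(element, []).append(text)
    return mapped, unmapped


# Sample input values are invented for illustration.
mapped, unmapped = marc_to_dc([
    ("534", "Originally published 1905"),
    ("787", "Related title"),
    ("245", "Annual report"),
])
```

Returning the unmapped tags makes gaps in the crosswalk visible instead of hiding them, which matters when the source database uses fields the common format does not cover.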
Relation refinements explained
• Relation element has 12 possible refinements
• Meaning of each not always obvious
• Some of the differences not clear
– Relation.IsFormatOf versus
Relation.HasFormat
• We provided DC’s explanation of the
relationship between the resource and the
object described in the Relation field.
• Also gave a concrete example of each
Publisher element
• Not straightforward when object is
digitized version of previously published
item
– Is it the digitizing institution?
– Is it the publisher of the original version?
• Our guideline explains
– “The Publisher element contains information about
the digital publisher. Publisher information from
earlier stages in an object’s publishing history may
be listed in ... Source and Contributor.”
Date element
• What could possibly be confusing about “Date”?
– Date originally issued, published,
made, or created?
– Date digitized?
– Date of an associated event?
• We created two new refinements to distinguish
the most important dates:
– Date.Original: “Creation or modification dates for the
original resource from which the digital object was
derived or created.”
– Date.Digital: “Date of creation or availability of the
digital resource.”
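Keeping the two dates in separate fields also makes each one easy to check mechanically. The following is a minimal sketch, assuming dates are entered in the date-only W3C-DTF forms (YYYY, YYYY-MM, YYYY-MM-DD); the function name and the sample record are illustrative, and ranges such as "1905-1906" are deliberately not handled.

```python
import re
from datetime import date

# Date-only W3C-DTF forms: YYYY, YYYY-MM, YYYY-MM-DD (time-of-day forms omitted).
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value):
    """Return True if value is a valid date-only W3C-DTF string."""
    match = _W3CDTF_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


# Hypothetical record using the two refinements described above.
record = {"Date.Original": "1905", "Date.Digital": "2002-01-04"}
problems = [name for name, value in record.items() if not is_w3cdtf_date(value)]
```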
Enter initial articles in titles?
• Sounds innocuous but ...
– Affects sorting for display or reports
– Do you want the title “The toupee worn by....” to sort by
“Toupee” or “The”?
– MARC controls this via indicators; other formats don’t have
an equivalent
– If left out, creates a possible problem when migrating data
into or out of a MARC-format database
– One person dryly commented:
“there will probably be some sort of trouble no
matter what we decide.”
• Our guidelines recommend omitting initial articles.
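For systems that keep the article in the stored title, an alternative to omitting it at data entry is to compute a sort key at display time. This is a different technique from the guideline's recommendation, shown only to make the sorting problem concrete; the English-only article list and the function name `sort_key` are assumptions.

```python
# Illustrative English-only article list; a real list would depend on the
# languages present in the collection.
INITIAL_ARTICLES = ("a", "an", "the")


def sort_key(title):
    """Return a lowercase sort key with one leading article dropped."""
    stripped = title.strip()
    first, _, rest = stripped.partition(" ")
    if rest and first.lower() in INITIAL_ARTICLES:
        return rest.lstrip().lower()
    return stripped.lower()


titles = ["The toupee worn by....", "Acer florissanti", "Annual report"]
ordered = sorted(titles, key=sort_key)
```

With this key the toupee title files under "toupee" while the stored title keeps its article, so nothing is lost if the data later moves into a MARC-format database.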
Making guidelines “one size fits all”
• Tried to encourage users to think about
ramifications of their metadata decisions
• Reminded them to think about how data
may be migrated and shared in future
• Listed lots of different thesauri &
schemes to give users some choices
• Listed important info that metadata
creators should include in record.
– Example: the Format.Creation field
Improving quality & detail of data
• Format.Creation field guidelines describe
important technical data that users might
want to include:
– File size, quality (bit depth, resolution), extent
(playtime, etc.), compression, checksum value,
operating system, creation hardware & software,
etc.
• Format.Creation also gives links to
resources with more info about terms and
standards.
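Some of the technical data listed above, such as file size and checksum value, can be gathered automatically rather than typed by hand. The sketch below is one possible way to do that with the Python standard library; the function names, the choice of SHA-256, and the semicolon-separated layout of the field value are all assumptions, not something the guidelines prescribe.

```python
import hashlib
import os
import tempfile


def describe_file(path, algorithm="sha256"):
    """Return the size in bytes and a hex checksum for the file at path."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large master files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "size_bytes": os.path.getsize(path),
        "algorithm": algorithm,
        "checksum": digest.hexdigest(),
    }


def format_creation(info, resolution_dpi=None, software=None):
    """Assemble a Format.Creation value from measured and supplied details."""
    parts = [f"{info['size_bytes']} bytes"]
    if resolution_dpi:
        parts.append(f"{resolution_dpi} dpi")
    if software:
        parts.append(software)
    parts.append(f"{info['algorithm']} checksum {info['checksum']}")
    return "; ".join(parts)


# Demonstration on a throwaway file standing in for a scanned image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample image bytes")
    sample_path = tmp.name
try:
    info = describe_file(sample_path)
    field_value = format_creation(
        info, resolution_dpi=300, software="Adobe Photoshop version 5.5"
    )
finally:
    os.remove(sample_path)
```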
A rose by any other name...
• One user community’s “autograph
album” might be another’s “libri
amicorum”
• How can we accommodate many
potential controlled vocabularies?
• Does the public need or want to know
what vocabularies are used?
Flexible subject guidelines
– Allow and provide links to many different
thesauri
– Separate out different subject/genre
schemes
• Example: Put all the Lib of Congress subject
headings in one field; put all the genre terms
from Thesaurus for Graphic Materials in another.
– Identify thesauri used via scheme qualifier in
field label, not mixed in with data in field
itself which is searchable.
• Example: Label is Subject.MeSH so that “mesh”
does not become a searchable term.
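The effect of keeping the scheme in the label can be shown with a toy keyword index. This is a sketch under stated assumptions: the record layout (a dictionary of label to list of values), the `Subject.LCSH` label, the sample MeSH value, and the function names are all illustrative.

```python
import re

# Hypothetical record: the scheme lives in the field label, not in the data.
record = {
    "Title": ["Annual report"],
    "Subject.LCSH": ["Tuberculosis -- Patients -- Colorado."],
    "Subject.MeSH": ["Tuberculosis"],
}


def split_label(label):
    """Split 'Subject.MeSH' into ('Subject', 'MeSH'); scheme is None if absent."""
    element, _, scheme = label.partition(".")
    return element, scheme or None


def build_index(rec):
    """Index words from field values only; labels never enter the index."""
    index = {}
    for label, values in rec.items():
        for value in values:
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                index.setdefault(word, set()).add(label)
    return index


index = build_index(record)
```

Because only values are tokenized, a search for "mesh" finds nothing, while the label still tells a display layer which thesaurus each term came from.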
Taming Metadata in the Wild
West
Part 3: Applications
Metadata Application Depends On:
• Information available about the artifact
• Expertise of the researcher
• Complexity of records
• Expertise of the cataloger
• Data entry system and display
MARC to Dublin Core – DCBuilder
Original Museum Record
Museum Record after CDP
Title Acer florissanti
Creator
Contributor
Link http://planning.nps.gov/flfo/tax3_Detail.cfm?ID=13484004 [Access] [URI]
Publisher 1. Florissant Fossil Beds National Monument 2. National Park Service
Description Plant (Angiosperm, Dicotyledon) Family: Aceraceae
Date Digital 2000
Subject(s) Aceraceae -- Colorado
    Angiosperms, Fossil -- Colorado
    Dicotyledons, Fossil -- Colorado
    Florissant (Colo.)
Type 1. image [DCMI Type Vocabulary] 2. text [DCMI Type Vocabulary]
Source National Museum of Natural History, Smithsonian Institution USNM-333761
Languages eng [ISO 639-2]
Relation MacGinitie, D.D., Fossil Plants of the Florissant Beds, Colorado, Carnegie
Format Use 1. image/jpeg [IMT] [medium] 2. text/html [IMT] [medium]
Rights National Museum of Natural History, Smithsonian Institution
Project Florissant Fossil Beds National Monument
Metadata Record in ContentDM
Metadata Record in ContentDM
Continued
Metadata Elements - Public Display
Historical Society Metadata Record
Direct Input
Title Annual report of the Jewish Consumptives' Relief Society at Denver, Colo.
Creator Jewish Consumptives' Relief Society (U.S.)
Contributor
Link http://library.du.edu/About/collections/SpecialCollections/jcrs/annualreports.cfm
Publisher University of Denver. Penrose Library
Description The <3rd- > reports published <1907- > as regular numbered issues of:
    The Sanatorium, v. <1- > The 11th and 12th reports (covering 1914-15) issued
    in combined form as: The Sanatorium ; v. 10, nos. 3/4 (July-Sept./Oct.-Dec. 1916)
    Reports cover the year ending Dec. 31. Chiefly in English, with some Hebrew.
Date Original 1905-1906. [Issued] [W3C-DTF]
Date Digital 2002-01-04 [Created]
Subject(s) Jewish Consumptives' Relief Society (U.S.) -- Periodicals.
    Tuberculosis -- Patients -- Colorado.
    Sanatoriums -- Colorado -- Denver.
Type image [DCMI Type vocabulary]
Source 23-26 cm.
Languages eng [ISO 639-2]; heb [ISO 639-2]
Relation Beck Archives/Rocky Mountain Jewish History Society. Jewish
    Consumptives' Relief Society Collection. Special Collections Dept., Penrose
    Library, University of Denver, Denver, Colo.
Format Create jpg; 300 dpi; 145 files; Epson Expression 836 XL Scanner; Adobe
    Photoshop version 5.5.
Format Use image/jpg [Medium] [IMT]
Rights http://www.penlib.du.edu/specoll/copyri.html
Metadata Record in ContentDM
Taming Metadata in the Wild
West
Part 4: Accommodation of levels of
expertise
Local Metadata Routes
[Diagram labels: Legacy metadata; mappings & migrations; New (latest metadata standard); Local MetaBase; Content; Services; Constituents]
Z39.50 Access
[Diagram labels: Content w/out local database; Local MetaBase; Conversion Scripts; Z39.50 Connections; Heritage Colorado Metadata; Colorado Western Trails Metadata]
OAI Access
[Diagram labels: Content Provider (three instances); FSU Digital; Colorado Heritage; Mountain West DL; OAI-WT; OAI-DC; OAI-METS; Western Trails Services Provider]
Content Provider Challenges
• Implementing OAI
– Intermediate Brokers May Be Necessary
• Choosing Brokers & Harvesters
• Maintaining Current OAI Provider Support
• Awareness of Current Metadata Standards
• Mapping Local Metadata to Supported Schema
• Maintaining Current Transformation Procedure
– Examples
• Knowing Who Has Your Metadata
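Mapping local metadata to a supported schema usually means collapsing qualified local fields into the unqualified Dublin Core elements that OAI harvesters expect. The sketch below serializes a record as an `oai_dc` fragment with the Python standard library; the two namespace URIs are the published OAI and Dublin Core ones, while the `LOCAL_TO_DC` table, the record layout, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from local field labels to unqualified DC elements.
LOCAL_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Date.Original": "date",
    "Date.Digital": "date",
    "Format.Use": "format",
    "Format.Creation": "format",
    "Subject.LCSH": "subject",
}


def to_oai_dc(record):
    """Serialize a {label: [values]} record as an oai_dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for label, values in record.items():
        element = LOCAL_TO_DC.get(label)
        if element is None:
            continue  # unmapped local fields are skipped in this sketch
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_oai_dc({
    "Title": ["Acer florissanti"],
    "Date.Digital": ["2000"],
    "Project": ["Florissant Fossil Beds National Monument"],
})
```

Note that the refinement is lost in the output: both Date.Original and Date.Digital become a plain `dc:date`, which is one reason a transformation procedure has to be kept current as richer schemas become available.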
Service Provider Challenges
• Maintaining Current OAI Harvester Support
– Continuing support for older versions
• Awareness of Communities & Metadata Schema
– What to collect?
– Multiple views / repurposing
– Added value of relationships between objects/collections
– Link in a greater series of brokers?
• Maintaining Multiple Data About Same Objects?
– Examples
• Active Role as Harvester/Service Provider
– Contrast with more passive current OAI role
Thank You!
Liz Bishoff
Colorado Digitization Program
liz@bishoff.com
Cheryl Walters
Utah State University
Cheryl.Walters@usu.edu
Chuck Thomas
Florida State University Libraries
cthomas@fsu.edu
Elizabeth “Betty” Meagher
University of Denver
emeagher@du.edu