Taming Metadata in the Wild West

Liz Bishoff, Colorado Digitization Program
Cheryl Walters, Utah State University
Chuck Thomas, Florida State University
Elizabeth S. Meagher, University of Denver

©2002 Colorado Digitization Project http://coloradodigital.coalliance.org http://www.cdpheritage.org ©2002 Colorado Digitization Program

Who, what, when, where: Western Trails Digital Standards
• Western Trails 2001 IMLS funded grant
  - Multi-state initiative to create a collection of digital objects on topic of Western Trails
  - 23 participating institutions, creating 20,000 digital objects
  - Each institution would host their own digital object / each would create their own metadata / each would use their own metadata standards and their own database or local system
  - Each state would create a statewide database

Who, what, when, where (cont’d)
• Interoperability of the 4 state databases was through Z39.50, with a SiteSearch Web-Z interface
• Based on the CDP experience
  - Crosswalks from various databases
  - Reviewed the CDP Best Practices
  - Agreed to utilize Dublin Core as the common format
• Involved more than just the 4 states
  - Utah Academic Library Consortia, New Mexico, Arizona, Minnesota, Kansas, Nebraska, Colorado, Wyoming
  - 18 representatives from archives, museums, historical societies in these states met over a 9-month period to develop the document

Why Western States Best Practices?
• Improve user results/satisfaction
• Improve consistency across different cultural heritage institutions
• Enhance potential for creating union catalogs from multiple databases/ILS
• Provide guidance for cultural heritage institutions on use of Dublin Core for digital resources
• Support interoperability
• Support emerging standards (OAI)

Taming Metadata in the Wild West
Part 2: Writing the metadata guidelines

Western Trail Metadata Task Forces
• Descriptive Elements
  - Title, Creator, Subject, Description, Date.Original, Relation
  - Contributor, Publisher, Language, Source, Coverage
• Technical Elements
  - Date.Digital, Type, Format.Use, Holding.Institution
  - Format.Creation, Identifier, Rights Management

Getting It Done
• Set up two electronic discussion lists
• Used Colorado Digitization Project’s metadata guidelines as a base document
• Created initial working draft
  - base document
  - decisions made during WSDSG’s first meeting
  - input from task force members
• Distributed draft to task force for revision (additions, deletions, rewrites, etc.)
• Changes discussed via email & incorporated
• Result taken back to entire Western States Digital Standards Group for review

Some concepts
• Define all terms used
• Avoid being library-centric
• Do not assume cataloging or metadata experience
• Provide lots of examples
• Provide links to related thesauri, standards, etc.

Problems, points of contention
• Figuring out exactly what data each Dublin Core element should contain
  - Not as easy as it sounds!
• Figuring out how to make guidelines flexible & comprehensive enough to fit a variety of situations, collaborative ventures & partners, for now and in the future.

The Coverage element
• Official DC definition: “The extent or scope of the content of the resource”
• What does that mean exactly?
• How does it differ from Subject?
• Does not mean date/place of publication
• Our description of this element: “Describes the spatial or temporal characteristics of the intellectual content of the resource”
  - For art objects and artifacts, this could be the place where the object originated and the date or time period during which it was made.
  - Currently recommended only for maps, etc. or when place or time period cannot be adequately described by Subject element.

Source versus Relation element
• Looking at California State Library’s Metadata Standards helped
  - Source maps to MARC 534 (note about original version)
  - Relation maps to MARC 787 (note about a related title)
• The lights came on for catalogers, who wanted to provide similar MARC tag equivalents; voted down as too library-centric. There are other standards besides MARC!
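
The two mappings named above can be sketched as a tiny crosswalk table. This is an illustration only: the pairings Source/534 and Relation/787 come from the slide, while the `MARC_TO_DC` table, the `crosswalk` function, and the (tag, text) input shape are hypothetical names invented here. Since MARC equivalents were voted down for the guidelines, nothing below should be read as part of them.

```python
# Hypothetical crosswalk table holding only the two mappings named on the slide.
MARC_TO_DC = {
    "534": "Source",    # note about the original version
    "787": "Relation",  # note about a related title
}


def crosswalk(marc_fields):
    """Sort (tag, text) pairs into Dublin Core elements.

    Returns a dict of element name -> list of values, plus a list of the
    pairs whose tag is not in MARC_TO_DC, so nothing is silently dropped.
    """
    record = {}
    unmapped = []
    for tag, text in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append((tag, text))
        else:
            record.setdefault(element, []).append(text)
    return record, unmapped
```

Calling `crosswalk([("534", "note on the original"), ("245", "a title")])` would place the first value under Source and hand the second pair back as unmapped, which is where a fuller crosswalk would need more rows.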

Relation refinements explained
• Relation element has 12 possible refinements
• Meaning of each not always obvious
• Some of the differences not clear
  - Relation.IsFormatOf versus Relation.HasFormatOf
• We provided DC’s explanation of the relationship between the resource and the object described in relation field.
• Also gave concrete example of each

Publisher element
• Not straightforward when object is digitized version of previously published item
  - Is it the digitizing institution?
  - Is it the publisher of the original version?
• Our guideline explains
  - “The Publisher element contains information about the digital publisher. Publisher information from earlier stages in an object’s publishing history may be listed in ... Source and Contributor.”

Date element
• What could possibly be confusing about “Date”?
  - Date originally issued, published, made, or created?
  - Date digitized?
  - Date of an associated event?
• We created two new refinements to distinguish most important dates:
  - Date.Original: “Creation or modification dates for the original resource from which the digital object was derived or created.”
  - Date.Digital: “Date of creation or availability of the digital resource.”

Enter initial articles in titles?
• Sounds innocuous but ...
  - Affects sorting for display or reports
  - Do you want the title “The toupee worn by....” to sort by “Toupee” or “The”?
• MARC controls this via indicators; other formats don’t have them
• If left out, creates a possible problem when migrating data into or out of a MARC format database.
• One person dryly commented: “there will probably be some sort of trouble no matter what we decide.”
• Our guidelines recommend omitting initial articles.

Making guidelines “one size fits all”
• Tried to encourage users to think about ramifications of their metadata decisions
• Reminded them to think about how data may be migrated and shared in future
• Listed lots of different thesauri & schemes to give users some choices
• Listed important info that metadata creators should include in record.
  - Example: the Format.Creation field

Improving quality & detail of data
• Format.Creation field guidelines describe important technical data that users might want to include:
  - File size, quality (bit depth, resolution), extent (playtime, etc.), compression, checksum value, operating system, creation hardware & software, etc.
• Format.Creation also gives links to resources with more info about terms and standards.

A rose by any other name...
• One user community’s “autograph album” might be another’s “libri amicorum”
• How can we accommodate many potential controlled vocabularies?
• Does the public need or want to know what vocabularies are used?
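
Returning to the Format.Creation guidance above, a rough sketch of how some of the listed technical data (file size, bit depth, resolution, creation software, checksum value) might be gathered into one field value. Everything specific here is an assumption made for illustration: the function name `format_creation_value`, the "; " separator, the order of the parts, and the choice of SHA-256 as the checksum are not taken from the guidelines.

```python
import hashlib
import os


def format_creation_value(path, bit_depth=None, resolution_dpi=None, software=None):
    """Build one Format.Creation string from basic technical facts about a file.

    The separator, ordering, and checksum algorithm are illustrative
    assumptions, not part of any published guideline.
    """
    # Reads the whole file into memory; fine for a sketch, not for huge files.
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()

    parts = [f"{os.path.getsize(path)} bytes"]   # file size
    if bit_depth is not None:
        parts.append(f"{bit_depth}-bit")         # quality: bit depth
    if resolution_dpi is not None:
        parts.append(f"{resolution_dpi} dpi")    # quality: resolution
    if software is not None:
        parts.append(software)                   # creation software
    parts.append(f"SHA-256 {checksum}")          # checksum value
    return "; ".join(parts)
```

A call such as `format_creation_value("scan0001.tif", bit_depth=24, resolution_dpi=300)` (the file name is made up) would return a single string that a cataloger could paste into the field. Facts the file system cannot supply, such as the scanner model, still have to be entered by hand.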

Flexible subject guidelines
• Allow and provide links to many different thesauri
• Separate out different subject/genre schemes
  - Example: Put all the Lib of Congress subject headings in one field; put all the genre terms from Thesaurus for Graphic Materials in another.
• Identify thesauri used via scheme qualifier in field label, not mixed in with data in field itself which is searchable.
  - Example: Label is Subject.MeSH so that “mesh” does not become a searchable term.

Taming Metadata in the Wild West
Part 3: Applications

Metadata Application Depends On:
• Information available about the artifact
• Expertise of the researcher
• Complexity of records
• Expertise of the cataloger
• Data entry system and display

MARC to Dublin Core: DCBuilder

Original Museum Record

Museum Record after CDP
Title: Acer florissanti
Creator:
Contributor:
Link: http://planning.nps.gov/flfo/tax3_Detail.cfm?ID=13484004 [Access] [URI]
Publisher: 1. Florissant Fossil Beds National Monument 2. National Park Service
Description: Plant (Angiosperm, Dicotyledon) Family: Aceraceae
Date Digital: 2000
Subject(s):
  Aceraceae -- Colorado
  Angiosperms, Fossil -- Colorado
  Dicotyledons, Fossil -- Colorado
  Florissant (Colo.)
Type: 1. image [DCMI Type Vocabulary] 2. text [DCMI Type Vocabulary]
Source: National Museum of Natural History, Smithsonian Institution USNM-333761
Languages: eng [ISO 639-2]
Relation: MacGinitie, D.D., Fossil Plants of the Florissant Beds, Colorado, Carnegie
Format Use: 1. image/jpeg [IMT] [medium] 2. text/html [IMT] [medium]
Rights: National Museum of Natural History, Smithsonian Institution
Project: Florissant Fossil Beds National Monument

Metadata Record in ContentDM

Metadata Record in ContentDM Continued

Metadata Elements - Public Display

Historical Society Metadata Record

Direct Input
Title: Annual report of the Jewish Consumptives' Relief Society at Denver, Colo.
Creator: Jewish Consumptives' Relief Society (U.S.)
Contributor:
Link: http://library.du.edu/About/collections/SpecialCollections/jcrs/annualreports.cfm
Publisher: University of Denver. Penrose Library
Description: The <3rd- > reports published <1907- > as regular numbered issues of: The Sanatorium, v. <1- > The 11th and 12th reports (covering 1914-15) issued in combined form as: The Sanatorium ; v. 10, nos. 3/4 (July-Sept./Oct.-Dec. 1916) Reports cover the year ending Dec. 31. Chiefly in English, with some Hebrew.
Date Original: 1905-1906. [Issued] [W3C-DTF]
Date Digital: 2002-01-04 [Created]
Subject(s):
  Jewish Consumptives' Relief Society (U.S.) -- Periodicals.
  Tuberculosis -- Patients -- Colorado.
  Sanatoriums -- Colorado -- Denver.
Type: image [DCMI Type vocabulary]
Source: 23-26 cm.
Languages: eng [ISO 639-2]; heb [ISO 639-2]
Relation: Beck Archives/Rocky Mountain Jewish History Society. Jewish Consumptives' Relief Society Collection. Special Collections Dept., Penrose Library, University of Denver, Denver, Colo.
Format Create: jpg; 300 dpi; 145 files; Epson Expression 836 XL Scanner; Adobe Photoshop version 5.5.
Format Use: image/jpg [Medium] [IMT]
Rights: http://www.penlib.du.edu/specoll/copyri.html

Metadata Record in ContentDM

Taming Metadata in the Wild West
Part 4: Accommodation of levels of expertise

Local Metadata Routes
[Diagram labels: mappings & migrations; LEGACY METADATA; Local MetaBase; Services; NEW CONTENT; latest metadata standard; constituents]

Z39.50 Access
[Diagram labels: content w/out local database; Z39.50 Connections; Conversion Scripts; Local MetaBase; Heritage Colorado Metadata; Colorado Western Trails Metadata]

OAI Access
[Diagram labels: CONTENT PROVIDER (three boxes); FSU DIGITAL; COLORADO HERITAGE; MOUNTAIN WEST DL; OAI-WT; OAI-DC; OAI-METS; WESTERN TRAILS SERVICES PROVIDER]

Content Provider Challenges
• Implementing OAI
  - Intermediate Brokers May Be Necessary
• Choosing Brokers & Harvesters
• Maintaining Current OAI Provider Support
• Awareness of Current Metadata Standards
• Mapping Local Metadata to Supported Schema
• Maintaining Current Transformation Procedure
  - Examples
• Knowing Who Has Your Metadata

Service Provider Challenges
• Maintaining Current OAI Harvester Support
  - Continuing support for older versions
• Awareness of Communities & Metadata Schema
  - What to collect?
  - Multiple views / repurposing
  - Added value of relationships between objects/collections
  - Link in a greater series of brokers?
• Maintaining Multiple Data About Same Objects?
  - Examples
• Active Role as Harvester/Service Provider
  - Contrast with more passive current OAI role

Thank You!
Liz Bishoff, Colorado Digitization Program, liz@bishoff.com
Cheryl Walters, Utah State University, Cheryl.Walters@usu.edu
Chuck Thomas, Florida State University Libraries, cthomas@fsu.edu
Elizabeth “Betty” Meagher, University of Denver, emeagher@du.edu