Metadata standards
Guidelines, data structures, and file formats to facilitate reliability and quality of description

Outline
• Why create and follow metadata standards?
• What kinds of standards are there?
• How does this all work?
• How do standards evolve?

The world of standards
A standard is any agreed-upon means of doing something. Standards can be formally created and adopted or merely customary.
With standards, products and processes have a certain level of consistency and predictability that can make production and use more efficient.

Goals of metadata standards
Metadata standards enable more reliable description. For example, by agreeing to use separate fields to indicate first names and last names of resource creators, displays of search results by author can be properly alphabetized and more easily read, no matter if first name or last name comes first in the display.
Reliable description enables the sharing of data across different systems.

Types of standards
Elings and Waibel describe four types of metadata standards:
• Data structure (fields): MARC and EAD.
• Data content (values): AACR2 (RDA) and DACS.
• Data format: XML.
• Data exchange: Z39.50 and OAI.
These are useful categories, but sometimes standards may straddle them. You could say, for example, that MARC reflects AACR2 and not the other way around (although MARC defines data fields in a technical sense, AACR2 defines the content with which the fields are populated and to some degree conceptually determines the MARC fields; in practice these two become functionally intertwined).

Multiple standards at work
A cataloger uses AACR2 to determine:
• That a book’s title should be part of its description.
• The wording, spelling, capitalization, and punctuation of the title.
The cataloger uses MARC to record the title information in a consistent form that computers can process.

Multiple standards at work
Two computer networks can use Z39.50 to determine how to exchange their MARC catalog records.
The result? A user at Library A can search Library B’s catalog and not discern a difference in the way that information is structured and presented. It just works.

Developing and adopting standards
Organizations agree to adopt standards because the benefits of creating products or services that work together can be great.
However, developing standards and forging that agreement can be a difficult process.
Metadata content standards in particular can be complicated to use, and they leave plenty of room for interpretive flexibility.

Content standards: considerations
Why are content standards so complicated? Because documents are various!
Most content standards will try to implement a few basic guidelines supplemented by rules and options for special cases.
Ideally, the basic guidelines will be based on clearly articulated goals and principles.

Example: RDA goals
RDA has articulated a concrete set of descriptive goals and principles. A few goals:
• Enable description of any resource (not just printed materials).
• Align with the FRBR conceptual model (works, expressions, manifestations, items) and its objectives (finding, selecting, understanding, and so on).
• Create content descriptions that can be used in multiple encodings and displays.
• Retain backward compatibility with existing records.

Example: RDA principles
One principle is that descriptions should reflect “the resource’s representation of itself.”
This is a longstanding principle in library cataloging: where possible, description = transcription.
This can be linked to the objective of finding known items: the catalog description should match how the item is known to others, which is most likely from the item itself.

Example: RDA guidelines
This principle of transcription underlies the basic guideline for RDA titles, which is that the “title proper” or primary title should come from the preferred source of information, which for books is the title page.
While the wording comes from the title page, though, the capitalization and punctuation are standardized for all titles.

Example: RDA special cases
What if...
• Some introductory words on the title page seem like they’re not really part of the title (e.g., Walt Disney Presents Sleeping Beauty)?
• The title is given in two languages (e.g., Canadian Literature/Littérature canadienne)?
• There is a spelling mistake in the title?
• The document is a manifestation of a commonly known work but has a slightly different title than most manifestations (e.g., William Shakespeare’s Hamlet)?
• A subtitle appears under what seems to be the main title (e.g., Museum Informatics an introductory textbook)?
• The title is over one paragraph long?

Keeping standards relevant
Standards are immediately out of date, of course.
RDA has been in development since 2004, as part of a cooperative effort by U.S., U.K., Canadian, and Australian library associations. These are tremendous efforts!
Particular institutions, such as the Library of Congress, will issue their own rules for interpreting the standards, which smaller organizations (such as the University of Texas) may or may not choose to adopt.

Your mission
Complete your subject classification for next week: introduction, classified structure, alphabetical structure, and reflective essay.
A few notes on assignments, based on what I’ve seen in meeting with many of you, follow...

Sort like with like
Try to place like kinds of things together (processes, products, people), not just things that have some thematic relation.
Remember, a hierarchy in its strict form takes one kind of thing and goes from the most general category to the most specific.
this: Animals -> domesticated animals -> animals raised for food -> pigs
this: Agricultural processes -> farming -> factory farming
this: Effects -> effects of farming practices -> effects on animals -> overcrowding
not this: Animals -> pastures, pens, cages -> overcrowding
not this: Animals -> factory farming -> mercury poisoning

Levels of abstraction
Wrangling your concepts can be difficult when they are at different levels of abstraction. You may need to generate intermediate levels that weren’t explicit in your source documents.
Source concepts: meat eating, E. coli, cholesterol, sustainability

disadvantages of meat eating
    health risks
        health risks associated with meat eating
            high cholesterol
        health risks associated with industrial meat production
            bacterial contamination
                E. coli contamination
    unsustainable practices
        effects of industrial meat production
            consumption of resources
            pollution

Node labels or subfacet labels
Especially because your classifications are small, many of you may make use of labels that help clarify the principles of division used in your classified structure.
In most cases, you will not use these terms to describe documents, and they are not, strictly speaking, actual concepts in your classification. You don’t need to include them in your alphabetical representation.

Example
Computers
    <by form factor>
        Desktop
        Laptop
    <by operating system>
        MacOS
        Linux
        Windows
<by operating system> is just a structural label. It’s not a concept you’ll use to categorize documents.

Non-subject concepts
Don’t include document attributes that aren’t subjects, such as forms or genres (blogs, articles, books, diaries...). Really, I mean it.
You are creating a representation of a subject that can be used to organize documents; you are not describing the types of documents in which users might be interested.
Include in your classification: terms for concepts that relate to gardening, such as types of plants (grasses, cacti, shrubs).
Do not include in your classification: document types that list such plants (plant databases, seed catalogs).
However, you might use your classification to categorize a cactus database with the Cacti concept...

INF 384 C, Spring 2009
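
A rough sketch of the node-label idea from the Computers example: the classified structure keeps structural labels such as <by form factor>, while the alphabetical representation leaves them out. This is only an illustration, not part of the assignment; the `Node` class, the `is_label` flag, and the `alphabetical_terms` function are hypothetical names chosen for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a classified structure (hypothetical helper for illustration)."""
    name: str
    is_label: bool = False  # True for structural labels such as <by form factor>
    children: list[Node] = field(default_factory=list)


def alphabetical_terms(node: Node) -> list[str]:
    """Collect every real concept at or below `node`, skipping node labels, sorted A-Z."""
    terms = [] if node.is_label else [node.name]
    for child in node.children:
        terms.extend(alphabetical_terms(child))
    # Case-insensitive sort so that "MacOS" and "Linux" order as a reader expects.
    return sorted(terms, key=str.lower)


# The Computers example, with the two structural labels flagged.
computers = Node("Computers", children=[
    Node("<by form factor>", is_label=True, children=[
        Node("Desktop"),
        Node("Laptop"),
    ]),
    Node("<by operating system>", is_label=True, children=[
        Node("MacOS"),
        Node("Linux"),
        Node("Windows"),
    ]),
])

print(alphabetical_terms(computers))
```

The labels still shape the classified structure, since they hold the children that share a principle of division, but they never appear among the terms used to describe documents.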