Metadata standards and interoperability

The world of standards
A standard is any agreed-upon means of doing something. Standards can be formally created and adopted or merely customary. With standards, products and processes have a level of consistency and predictability that can make production and use more efficient.

Goals of metadata standards
Metadata standards enable more reliable and consistent description. For example, if everyone agrees to record the first names and last names of resource creators in separate fields, search results can be properly alphabetized by author and easily read, whether the display puts the first name or the last name first. Reliable description facilitates interoperability: the sharing of data across different systems.

Interoperability: for money as well as love
Interoperable records facilitate information access and exchange across contexts, not just in cultural heritage. A distributor like Amazon sells products from many different providers. To the extent that Amazon can get interoperable product records from its suppliers, its job is easier, and you can find that book, or pair of shoes, or compost bin.

Interoperable wine records?
• Astor Wines
• Wine.com
• Sherry-Lehmann
• 67 Wine

Types of standards
Elings and Waibel describe four types of metadata standards:
• Data structure (attributes, elements, or fields): Dublin Core; CDWA (museums), EAD (archives), MARC (libraries).
• Data content (values): CCO (museums), RDA (libraries), DACS (archives).
• Data format: XML and, once again, MARC (EAD is also built around XML).
• Data exchange: Z39.50 and OAI.
These are useful categories, but standards may straddle them.
You could say, for example, that MARC reflects RDA and not the other way around. Although MARC defines data fields in a technical sense, RDA defines the content with which the fields are populated and to some degree conceptually determines the MARC fields; in practice, the two become functionally intertwined.

Multiple standards at work
A cataloger uses RDA to determine:
• That a book’s title should be part of its description.
• The wording, spelling, capitalization, and punctuation of the title.
The cataloger uses MARC to record the title information in a consistent form that computers can process.

Two computer networks can use Z39.50 to determine how to exchange their MARC catalog records. The result? A user at Library A can search Library B’s catalog and not discern a difference in the way that information is structured and presented. It just works.

An archivist uses EAD to determine that an archival finding aid should include a scope and content note. The archivist uses DACS for guidance on what to include in the scope and content note and how to express that content.

The archivist uses EAD to include the scope and content note in a machine-accessible document. The result? A researcher can access finding aids from Archive A online, and these documents have similar content and structure. Other areas of the finding aid document might appear as links in the Scope and Content note.

A museum curator is documenting a new acquisition in proprietary museum database software.
• The collection management system includes a field for the “Work Type,” which is a core attribute from CDWA.
• Guidance for describing the work type is given in CCO.
• The Art and Architecture Thesaurus (AAT) includes vocabulary terms that can be used to describe the work type.
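The museum example above layers three things: a field defined by a data structure standard, guidance from a content standard, and terms from a controlled vocabulary. The following is a minimal Python sketch of the first and third layers working together. The field names, the three-term vocabulary, and the `check_record` helper are hypothetical stand-ins chosen for illustration; they are not taken from CDWA, CCO, or the AAT.

```python
# Hypothetical stand-in for a data structure standard: which fields exist.
STRUCTURE_FIELDS = {"work_type", "title", "creator"}

# Hypothetical stand-in for a controlled vocabulary of work type terms.
WORK_TYPE_VOCABULARY = {"painting", "sculpture", "photograph"}


def check_record(record):
    """Return a list of problems found in a record (empty list means OK)."""
    problems = []
    # Structure check: every field must be one that the structure defines.
    for field in record:
        if field not in STRUCTURE_FIELDS:
            problems.append(f"unknown field: {field}")
    # Value check: the work type must be a term from the vocabulary.
    work_type = record.get("work_type")
    if work_type is not None and work_type not in WORK_TYPE_VOCABULARY:
        problems.append(f"work_type not in vocabulary: {work_type}")
    return problems


good = {"work_type": "painting", "title": "Untitled"}
bad = {"work_type": "oil thing", "colour": "blue"}
print(check_record(good))
print(check_record(bad))
```

The sketch keeps the structure check and the value check separate on purpose: a record can use all the right fields and still contain values that no other system would recognize. Content guidance of the kind CCO gives (how to choose and phrase the value) is a matter of human judgment and is not modeled here.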
Later, the museum’s collection data is mapped from CDWA (data structure) to the Europeana Data Model (EDM) (data structure), for aggregation into Europeana and subsequent data reuse. In this mapping, the proprietary database format (data format) is translated to the EDM’s RDF/XML schema (data format).

Developing and adopting standards
Organizations agree to adopt standards because the benefits of creating products or services that work together can be great. However, developing standards and forging that agreement can be a difficult process. Metadata content standards can also be complicated to use, and they leave plenty of room for interpretive flexibility.

Content standards: considerations
Why are content standards so complicated? Because documents are various! Most content standards try to implement a few basic guidelines, supplemented by rules and options for special cases. Ideally, the basic guidelines are based on clearly articulated goals and principles.

Example: RDA goals
RDA has articulated a concrete set of descriptive goals and principles. A few goals:
• Enable description of any resource (not just printed materials).
• Align with the FRBR conceptual model (works, expressions, manifestations, items) and its objectives (finding, selecting, understanding, and so on).
• Create content descriptions that can be used in multiple encodings and displays.
• Retain backward compatibility with existing records.

Example: RDA principles
One principle is that descriptions should reflect “the resource’s representation of itself.” This is a longstanding principle in library cataloging: where possible, description = transcription. This can be linked to the objective of finding known items: the catalog description should match how the item is known to others, which is most likely from the item itself.
Example: RDA guidelines
This principle of transcription underlies the basic guideline for RDA titles: the “title proper,” or primary title, should come from the preferred source of information, which for books is the title page. Although the wording comes from the title page, the capitalization and punctuation are standardized for all titles.

Example: RDA special cases
What if...
• Some introductory words on the title page seem like they’re not really part of the title (e.g., Walt Disney Presents Sleeping Beauty)?
• The title is given in two languages (e.g., Canadian Literature/Littérature canadienne)?
• There is a spelling mistake in the title?
• The document is a manifestation of a commonly known work but has a slightly different title than most manifestations (e.g., William Shakespeare’s Hamlet)?
• A subtitle appears under what seems to be the main title (e.g., Museum Informatics an introductory textbook)?
• The title is over one paragraph long?

Keeping standards relevant
Standards are immediately out of date. Particular institutions, such as the Library of Congress, will issue their own rules for interpreting the standards, which smaller organizations (such as the University of Texas) may or may not choose to adopt.

Levels of interoperability
Different kinds of standards enable different kinds of interoperability. Let’s say someone gives you a metadata record to incorporate into your database of records that follow your schema. What can you do with it?
• Your computer can read the file: system interoperability.
• Your database understands the file format: syntax interoperability.
• The attributes match other records in the database: structural interoperability.
• The values in the fields are consistent with other records in the database: semantic interoperability.

Derivation
New schemas can be subsets, supersets, or direct translations of existing schemas:
• CDWA Lite is a subset of CDWA (removes some attributes).
• French Dublin Core is a translated version of Dublin Core (same attributes, different labels).
• Gateway to Educational Materials (GEM) adds elements to Dublin Core.

Application profiles
Application profiles mix attributes from different existing schemas or mix usage rules for attributes from different existing schemas. The application profile for the Digital Public Library of America (DPLA) uses elements from:
• Dublin Core.
• The Europeana Data Model (EDM).
• A “Basic Geo” schema created by the W3C (wgs84) for simple geographic information.
• The DPLA itself (published separately from the profile).

Crosswalks
Crosswalks are mappings from one schema to another. For example, a crosswalk might specify that the Title element in CDWA should be mapped to the Title element in Dublin Core. Crosswalks can map only schema elements that are semantically equivalent, or they can also map semantically “close” elements to each other.

Switching languages
Switches map multiple schemas to a single switching language. For example, multiple content schemas could all be mapped to Dublin Core. The Dublin Core content could then in turn be mapped to something else. (This is more efficient than mapping each individual schema directly to the final target.) Imagine a multilingual conversation in which everyone has a different native language but speaks French...

Frameworks
A framework is a basic set of concepts and specifications that are agreed upon by a particular group. For example, the Warwick Framework is an early specification that designates the idea of a “container” as an aggregation of metadata sets, or “packages.” Agreements on ideas like containers and packages facilitate the sharing of different sorts of units. (The DPLA, for example, relies on “service hubs” that aggregate metadata sets from individual contributing institutions.)

Registries
Registries publish information about metadata schemas.
Registries constitute reference information that facilitates the development of new application profiles, crosswalks, and so on. Example: the Open Metadata Registry.

Aggregated infrastructures
Some examples of systems that are enabled by all of this stuff:
• Europeana, the European cultural heritage data aggregation.
• The Digital Public Library of America (DPLA).
Europeana and the DPLA describe themselves primarily as platforms: they want you (really, they want you) to create applications and other cool stuff with the data (really metadata) that they aggregate and publish.

Schema assignment notes
Consider whether attributes should be:
• Mandatory or optional.
• Repeatable.
You might include general guidelines that apply to all attributes in your schema, as well as guidelines for each attribute. (Check the CDP best practice guidelines for an example.)
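
One way to make the mandatory/optional and repeatable decisions concrete is to record them next to each attribute and check sample records against them. The sketch below assumes a made-up three-attribute schema. The attribute names, the `Attribute` class, and the `validate` function are illustrative assumptions, not part of any published standard or of the CDP guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    mandatory: bool   # must appear at least once
    repeatable: bool  # may appear more than once


# Hypothetical schema, for illustration only.
SCHEMA = [
    Attribute("title", mandatory=True, repeatable=False),
    Attribute("creator", mandatory=False, repeatable=True),
    Attribute("date", mandatory=False, repeatable=False),
]


def validate(record):
    """Check a record (attribute name -> list of values) against SCHEMA."""
    errors = []
    known = {attr.name for attr in SCHEMA}
    # Attributes that the schema does not define at all.
    for name in record:
        if name not in known:
            errors.append(f"{name}: not defined in schema")
    # Mandatory and repeatable rules for each defined attribute.
    for attr in SCHEMA:
        values = record.get(attr.name, [])
        if attr.mandatory and not values:
            errors.append(f"{attr.name}: mandatory attribute is missing")
        if not attr.repeatable and len(values) > 1:
            errors.append(f"{attr.name}: attribute is not repeatable")
    return errors


record = {"creator": ["A. Author", "B. Author"], "date": ["1999", "2001"]}
for error in validate(record):
    print(error)
```

Storing every value as a list, even for non-repeatable attributes, is a design choice that keeps the repeatability check to a single length comparison. Running a few real records through a check like this is a quick way to find out whether a rule is too strict for the materials being described.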