Interoperability and standards

Metadata standards and interoperability
The world of standards
A standard is any agreed-upon means of doing
something.
Standards can be formally created and adopted or
merely customary.
With standards, products and processes have a certain
level of consistency and predictability that can make
production and use more efficient.
Goals of metadata standards
Metadata standards enable more reliable and consistent
description. For example, if everyone agrees to use separate fields
to indicate the first and last names of resource creators,
search results can be properly alphabetized by author and are
easier to read, whichever name comes first in the display.
Reliable description facilitates the sharing of data across
different systems—interoperability.
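A minimal Python sketch of the name example, assuming each creator is stored with two hypothetical fields, `given` and `family`. Sorting always uses the family name, while display order is a separate, reversible choice that does not touch the stored data.

```python
# Hypothetical creator records with first and last names in separate fields.
creators = [
    {"given": "Ada", "family": "Lovelace"},
    {"given": "Charles", "family": "Babbage"},
    {"given": "Grace", "family": "Hopper"},
]

def sort_by_family(names):
    """Alphabetize by family name, then by given name."""
    return sorted(names, key=lambda n: (n["family"].lower(), n["given"].lower()))

def display(name, family_first=False):
    """Render a name in either order; the stored fields stay the same."""
    if family_first:
        return f"{name['family']}, {name['given']}"
    return f"{name['given']} {name['family']}"

for n in sort_by_family(creators):
    print(display(n), "|", display(n, family_first=True))
```

Because the fields are separate, the same sorted list can be shown "Given Family" or "Family, Given" without re-parsing a single combined name string.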
Interoperability:
for money as well as love
Interoperable records facilitate information
access and exchange across contexts: not just for
cultural heritage.
A distributor like Amazon sells products from
many different providers. To the extent that
Amazon can get interoperable product records
from its suppliers, its job is easier—and you can
find that book, or pair of shoes, or compost bin.
Interoperability:
for money as well as love
Interoperable wine records?
• Astor Wines
• Wine.com
• Sherry-Lehmann
• 67 Wine
Types of standards
Elings and Waibel describe four types of metadata standards:
• Data structure (attributes, elements, or fields): Dublin Core; CDWA (museums),
EAD (archives), MARC (libraries).
• Data content (values): CCO (museums), RDA (libraries), DACS (archives).
• Data format: XML (and, yes, MARC; EAD is also built around XML).
• Data exchange: Z39.50 and OAI.
These are useful categories, but standards may straddle them. MARC, for
example, defines data fields in a technical sense, while RDA defines the content
with which those fields are populated and, to some degree, conceptually determines
which MARC fields are needed. You could even say that MARC reflects RDA rather
than the other way around; in practice the two become functionally intertwined.
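To make three of the four categories concrete, here is a hedged sketch in Python: the element names are a data structure (a few Dublin Core elements), the values are data content (invented here; a content standard would govern their form), and serializing them as XML is the data format. The `record` wrapper element and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Data structure: which elements exist (a few Dublin Core elements).
# Data content: the values themselves (hypothetical example values).
record = {"title": "Example title", "creator": "Doe, Jane", "date": "2020"}

# Data format: the same structure and content serialized as XML.
root = ET.Element("record")  # hypothetical wrapper element
for element, value in record.items():
    ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The same dictionary could be serialized in another format without changing its structure or content, which is why the categories are useful even when real standards blur them.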
Multiple standards at work
A cataloger uses RDA to determine:
• That a book’s title should be part of its description.
• The wording, spelling, capitalization, and punctuation
of the title.
The cataloger uses MARC to record the title information
in a consistent form that computers can process.
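Once RDA has settled what the title is and how it is worded, MARC field 245 (title statement) holds it. The sketch below uses a hypothetical helper and builds only a simplified, human-readable view of the field in "$" subfield notation, not the binary ISO 2709 transmission format; ISBD punctuation is reduced to the basics and the first indicator is fixed for illustration.

```python
def marc_245(title_proper, remainder=None, responsibility=None, nonfiling=0):
    """Return a simplified, human-readable MARC 245 (title statement) field.

    $a = title proper, $b = remainder of title, $c = statement of responsibility.
    The first indicator is fixed at 1 here for illustration; the second
    indicator counts nonfiling characters (e.g. 4 for a leading "The ").
    """
    if not 0 <= nonfiling <= 9:
        raise ValueError("nonfiling must be a single digit")
    field = f"245 1{nonfiling} $a {title_proper}"
    if remainder:
        field += f" : $b {remainder}"
    if responsibility:
        field += f" / $c {responsibility}"
    return field

print(marc_245("Hamlet", responsibility="William Shakespeare"))
# 245 10 $a Hamlet / $c William Shakespeare
```

The division of labor mirrors the slide: the string arguments are RDA decisions, and the function only places them into MARC's predictable slots.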
Multiple standards at work
Two computer networks can use Z39.50 to determine
how to exchange their MARC catalog records.
The result? A user at Library A can search Library B’s
catalog and not discern a difference in the way that
information is structured and presented. It just works.
Multiple standards at work
An archivist uses EAD to determine that an archival
finding aid should include a scope and content note.
The archivist uses DACS for guidance on what to
include in the scope and content note and how to
express that content.
Multiple standards at work
The archivist uses EAD to include the scope and content
note in a machine-accessible document.
The result? A researcher can access finding aids from
Archive A online, and these documents have similar
content and structure. Other areas of the finding aid
document might appear as links in the Scope and
Content note.
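A minimal sketch of the EAD side, assuming EAD's `<scopecontent>` element with a `<head>` and `<p>` paragraphs inside `<archdesc>`. It is deliberately simplified: it omits the EAD namespace and the required `<did>` and other parts of a full finding aid, and the note text is hypothetical. DACS would guide what the paragraphs say; EAD only supplies the container.

```python
import xml.etree.ElementTree as ET

def scope_and_content(head, paragraphs):
    """Wrap a (DACS-guided) note in EAD's <scopecontent> element."""
    note = ET.Element("scopecontent")
    ET.SubElement(note, "head").text = head
    for text in paragraphs:
        ET.SubElement(note, "p").text = text
    return note

archdesc = ET.Element("archdesc", level="collection")
archdesc.append(scope_and_content(
    "Scope and Content",
    ["Correspondence, diaries, and photographs documenting a hypothetical family, 1900-1950."],
))
print(ET.tostring(archdesc, encoding="unicode"))
```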
Multiple standards at work
A museum curator is documenting a new acquisition in
proprietary museum database software.
• The collection management system includes a field for
the “Work Type,” which is a core attribute from CDWA.
• Guidance for describing the work type is given in
CCO.
• The Art and Architecture Thesaurus (AAT) includes
vocabulary terms that can be used to describe the work
type.
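The three layers can be sketched as a field (from CDWA) that only accepts terms from a vocabulary (the AAT's role). The term list below is a tiny hypothetical stand-in, not actual AAT entries, which carry identifiers and qualifiers not shown here; the `work_type` key and the lowercasing step are also assumptions of this sketch, not CCO guidance.

```python
# Hypothetical stand-in for a controlled vocabulary such as the AAT.
WORK_TYPE_TERMS = {"paintings", "sculpture", "prints", "photographs"}

def set_work_type(record, term):
    """Record a Work Type only if the term is in the controlled vocabulary."""
    normalized = term.strip().lower()
    if normalized not in WORK_TYPE_TERMS:
        raise ValueError(f"{term!r} is not an approved work type term")
    record["work_type"] = normalized
    return record

acquisition = {"title": "Untitled landscape"}
set_work_type(acquisition, "Paintings")
print(acquisition)  # {'title': 'Untitled landscape', 'work_type': 'paintings'}
```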
Multiple standards at work
Later, collection data is mapped from CDWA (data
structure) to the Europeana Data Model (EDM) (data
structure), for aggregation into Europeana and
subsequent data reuse.
In this mapping, the proprietary database format (data
format) is translated to the EDM’s RDF/XML schema
(data format).
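A hedged sketch of that mapping step: a hypothetical export with CDWA-style field labels is crosswalked to Dublin Core properties (which EDM reuses) and written out as RDF/XML under an `edm:ProvidedCHO`. A real EDM record needs more classes and properties (for example an `ore:Aggregation`), and the field labels and `#object-1` identifier here are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def q(prefix, local):
    """Build an ElementTree qualified name such as {uri}title."""
    return f"{{{NS[prefix]}}}{local}"

# Hypothetical export from the proprietary database, with CDWA-style labels.
source = {"Titles or Names": "Untitled landscape", "Object/Work Type": "paintings"}

# Simplified crosswalk from those labels to properties used in EDM.
FIELD_MAP = {"Titles or Names": ("dc", "title"), "Object/Work Type": ("dc", "type")}

rdf = ET.Element(q("rdf", "RDF"))
cho = ET.SubElement(rdf, q("edm", "ProvidedCHO"), {q("rdf", "about"): "#object-1"})
for field_name, value in source.items():
    prefix, local = FIELD_MAP[field_name]
    ET.SubElement(cho, q(prefix, local)).text = value

print(ET.tostring(rdf, encoding="unicode"))
```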
Developing and adopting standards
Organizations agree to adopt standards because the benefits
of creating products or services that work together can be
great.
However, developing standards and forging that agreement
can be a difficult process.
Metadata content standards can be complicated to apply,
and they leave plenty of room for interpretation.
Content standards: considerations
Why are content standards so complicated? Because
documents are various!
Most content standards will try to implement a few
basic guidelines supplemented by rules and options for
special cases.
Ideally, the basic guidelines will be based on clearly
articulated goals and principles.
Example: RDA goals
RDA has articulated a concrete set of descriptive goals and
principles. A few goals:
• Enable description of any resource (not just printed materials).
• Align with the FRBR conceptual model (works, expressions,
manifestations, items) and its objectives (finding, selecting,
understanding, and so on).
• Create content descriptions that can be used in multiple
encodings and displays.
• Retain backward compatibility with existing records.
Example: RDA Principles
One principle is that descriptions should reflect “the resource’s
representation of itself.”
This is a longstanding principle in library cataloging: where
possible, description = transcription.
This can be linked to the objective of finding known items: the
catalog description should match how the item is known to
others, which is most likely from the item itself.
Example: RDA guidelines
This principle of transcription underlies the basic
guideline for RDA titles, which is that the “title proper”
or primary title should come from the preferred source
of information, which for books is the title page.
While the wording comes from the title page, though,
the capitalization and punctuation are standardized for
all titles.
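A simplified sketch of "transcribe the wording, standardize the capitalization," using a hypothetical helper. It only capitalizes the first word and any proper nouns the cataloger lists, and lowercases the rest; RDA's actual capitalization instructions (and local policy statements) cover many more cases, and this version does not handle punctuation attached to words.

```python
def transcribe_title(title_page_text, proper_nouns=()):
    """Keep the title page's wording but apply sentence-style capitalization."""
    keep = {p.lower(): p for p in proper_nouns}
    result = []
    for i, word in enumerate(title_page_text.split()):
        if word.lower() in keep:
            result.append(keep[word.lower()])   # proper noun keeps its form
        elif i == 0:
            result.append(word.capitalize())    # first word capitalized
        else:
            result.append(word.lower())         # everything else lowercased
    return " ".join(result)

print(transcribe_title("THE HISTORY OF TEXAS MUSEUMS", proper_nouns=["Texas"]))
# The history of Texas museums
```

The words and their order come straight from the source; only the case changes, which is the split the slide describes.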
Example: RDA special cases
What if...
• Some introductory words on the title page seem like they’re not really part of
the title (e.g., Walt Disney Presents Sleeping Beauty)?
• The title is given in two languages (e.g., Canadian Literature/Literature
Canadienne)?
• There is a spelling mistake in the title?
• The document is a manifestation of a commonly known work but has a
slightly different title than most manifestations (e.g., William Shakespeare’s
Hamlet)?
• A subtitle appears under what seems to be the main title (e.g., Museum
Informatics an introductory textbook)?
• The title is over one paragraph long?
Keeping standards relevant
Standards start going out of date as soon as they are published.
Particular institutions, such as the Library of Congress, will issue
their own rules for interpreting the standards, which smaller
organizations (such as the University of Texas) may or may not
choose to adopt.
Levels of interoperability
Different kinds of standards enable different kinds of
interoperability. Let’s say someone gives you a metadata record
to incorporate into your database, whose records follow your own schema.
What can you do with it?
• Your computer can read the file: system interoperability.
• Your database understands the file format: syntax
interoperability.
• The attributes match other records in the database: structural
interoperability.
• The values in the fields are consistent with other records in the
database: semantic interoperability.
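The four levels can be checked one after another, since each depends on the one before. This is a rough sketch under stated assumptions: "system" is approximated as decoding the bytes as UTF-8, "syntax" as parsing them as XML, and the local element list and type vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical local schema: allowed elements and a controlled list for "type".
LOCAL_ELEMENTS = {"title", "creator", "date", "type"}
LOCAL_TYPES = {"text", "image", "sound"}

def check_levels(raw_bytes):
    """Report which interoperability levels an incoming record reaches."""
    levels = {"system": False, "syntax": False, "structural": False, "semantic": False}
    try:
        text = raw_bytes.decode("utf-8")          # system: we can read the file
    except UnicodeDecodeError:
        return levels
    levels["system"] = True
    try:
        root = ET.fromstring(text)                # syntax: the format parses
    except ET.ParseError:
        return levels
    levels["syntax"] = True
    if not {child.tag for child in root} <= LOCAL_ELEMENTS:
        return levels                             # structural: attributes differ
    levels["structural"] = True
    types = {child.text for child in root if child.tag == "type"}
    levels["semantic"] = types <= LOCAL_TYPES     # semantic: values consistent
    return levels

incoming = b"<record><title>Harbor at dusk</title><type>Photograph</type></record>"
print(check_levels(incoming))
# {'system': True, 'syntax': True, 'structural': True, 'semantic': False}
```

Here the record fits the local structure, but "Photograph" is not one of the local type values, so it fails only at the semantic level.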
Derivation
New schemas are often derived as subsets, supersets, or direct
translations of existing schemas:
• CDWA Lite is a subset of CDWA (removes some attributes).
• French Dublin Core is a translated version of Dublin Core
(same attributes, different labels).
• Gateway to Educational Materials (GEM) adds elements to
Dublin Core.
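The three derivation patterns map neatly onto set operations. In this sketch the fifteen Dublin Core elements are real, but the subset, the added element names, and the partial French label table are hypothetical illustrations; they are not the actual CDWA Lite, GEM, or French Dublin Core element lists.

```python
# The fifteen Dublin Core elements.
DC = {"title", "creator", "subject", "description", "publisher", "contributor",
      "date", "type", "format", "identifier", "source", "language",
      "relation", "coverage", "rights"}

# Hypothetical derived schemas, for illustration only.
subset_schema = DC & {"title", "creator", "date", "type"}   # removes elements
superset_schema = DC | {"audience_grade", "pedagogy"}       # adds elements
french_labels = {"title": "titre", "creator": "créateur", "subject": "sujet"}

def relabel(record, labels):
    """Translation: same elements, different labels."""
    return {labels.get(k, k): v for k, v in record.items()}

print(subset_schema < DC, superset_schema > DC, sorted(superset_schema - DC))
# True True ['audience_grade', 'pedagogy']
print(relabel({"title": "Exemple", "date": "2020"}, french_labels))
# {'titre': 'Exemple', 'date': '2020'}
```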
Application profiles
Application profiles combine attributes, or usage rules for
attributes, drawn from different existing schemas.
The application profile for the Digital Public Library of America
(DPLA) uses elements from:
• Dublin Core.
• The Europeana Data Model (EDM).
• A “Basic Geo” schema created by the W3C (wgs84) for simple
geographic information.
• The DPLA itself (published separately from the profile).
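A sketch of how a profile can borrow elements from several namespaces and attach its own usage rules. The prefixed element names follow the schemas the slide lists, but which elements appear and whether each is required are invented here; the actual DPLA profile defines its own elements and obligations.

```python
# Hypothetical slice of an application profile: each element is borrowed
# from a source schema (its prefix) and given a local obligation.
PROFILE = {
    "dcterms:title": "required",
    "dcterms:creator": "optional",
    "edm:dataProvider": "required",
    "wgs84_pos:lat": "optional",
    "wgs84_pos:long": "optional",
}

def missing_required(record, profile=PROFILE):
    """List required profile elements that the record lacks."""
    return sorted(el for el, rule in profile.items()
                  if rule == "required" and el not in record)

def sources(profile=PROFILE):
    """Group profile elements by the schema (prefix) they come from."""
    grouped = {}
    for element in profile:
        grouped.setdefault(element.split(":", 1)[0], []).append(element)
    return grouped

record = {"dcterms:title": "Harbor at dusk", "wgs84_pos:lat": "29.76"}
print(missing_required(record))   # ['edm:dataProvider']
print(sorted(sources()))          # ['dcterms', 'edm', 'wgs84_pos']
```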
Crosswalks
Crosswalks are mappings from one schema to another.
For example, a crosswalk might specify that the Title element in
CDWA should be mapped to the Title element in Dublin Core.
Crosswalks can map only schema elements that are semantically
equivalent, or they can map semantically “close” elements to
each other.
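A minimal crosswalk sketch that records whether each mapping is exact or only semantically close, so the same table can run strictly or loosely. The source field labels are simplified, CDWA-style stand-ins, and the exact/close judgments are illustrative assumptions.

```python
# Hypothetical crosswalk from simplified CDWA-style labels to Dublin Core.
CROSSWALK = {
    "Title": ("title", "exact"),
    "Creator": ("creator", "exact"),
    "Creation Date": ("date", "close"),  # DC date covers any lifecycle date
    "Work Type": ("type", "close"),      # DC type usually holds broad DCMI Type terms
}

def apply_crosswalk(record, crosswalk=CROSSWALK, allow_close=True):
    """Map a source record to Dublin Core; return (mapped, unmapped_fields)."""
    mapped, unmapped = {}, []
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None or (target[1] == "close" and not allow_close):
            unmapped.append(field_name)
            continue
        mapped.setdefault(target[0], []).append(value)
    return mapped, unmapped

painting = {"Title": "Harbor at dusk", "Creation Date": "1899",
            "Inscription": "signed lower right"}
print(apply_crosswalk(painting))
print(apply_crosswalk(painting, allow_close=False))
```

With close matches allowed, the creation date lands in `date`; in strict mode it is reported as unmapped, along with the inscription, which has no target at all.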
Switching languages
A switching language maps multiple schemas to a single intermediary schema.
For example, multiple schemas could all be mapped to
Dublin Core, and the Dublin Core content could then in turn be
mapped to something else. (This is more efficient than building a separate
crosswalk from every schema to every other schema.)
Imagine a multilingual conversation in which everyone has a
different native language but speaks French...
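The efficiency claim is simple arithmetic. If each of n schemas needs a direct, one-way crosswalk to every other, that is n(n − 1) crosswalks; with a switching language (counted here as an extra schema, not one of the n), each schema needs one crosswalk in and one out, or 2n. The sketch also composes two mappings through Dublin Core; the MARC-to-DC pairs follow common practice, while the target schema and its labels are hypothetical.

```python
def pairwise_crosswalks(n):
    """One-way crosswalks if every schema maps directly to every other."""
    return n * (n - 1)

def hub_crosswalks(n):
    """Crosswalks if each schema maps to and from one switching language."""
    return 2 * n

for n in (3, 5, 10):
    print(n, pairwise_crosswalks(n), hub_crosswalks(n))

def compose(a_to_hub, hub_to_b):
    """Chain two crosswalks through the hub; elements with no path drop out."""
    return {src: hub_to_b[mid] for src, mid in a_to_hub.items() if mid in hub_to_b}

marc_to_dc = {"245": "title", "100": "creator", "520": "description"}
dc_to_local = {"title": "Name", "creator": "Maker"}  # hypothetical target schema
print(compose(marc_to_dc, dc_to_local))  # {'245': 'Name', '100': 'Maker'}
```

The hub pays off once there are more than three schemas, but anything the switching language cannot express (here, the summary in MARC 520) is lost along the way, just as nuance gets lost when everyone talks through a shared second language.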
Frameworks
A basic set of concepts and specifications that are agreed upon by
a particular group.
For example, the Warwick Framework is an early specification
that designates the idea of a “container” as an aggregation of
metadata sets, or “packages.”
Agreements on ideas like containers and packages facilitate the
sharing of different sorts of units. (The DPLA, for example, relies
on “service hubs” that aggregate metadata sets from individual
contributing institutions.)
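The Warwick Framework was a conceptual architecture rather than code, but its container/package idea can be sketched as two small data classes. The class and method names here are hypothetical; the point is only that one container aggregates several packages, each expressed in its own schema.

```python
from dataclasses import dataclass, field

@dataclass
class Package:
    """One metadata set, expressed in a single schema."""
    schema: str
    metadata: dict

@dataclass
class Container:
    """An aggregation of packages describing the same resource."""
    resource_id: str
    packages: list = field(default_factory=list)

    def add(self, package):
        self.packages.append(package)
        return self

    def schemas(self):
        return [p.schema for p in self.packages]

c = Container("object-1")
c.add(Package("Dublin Core", {"title": "Harbor at dusk"}))
c.add(Package("rights", {"statement": "In copyright"}))
print(c.schemas())  # ['Dublin Core', 'rights']
```

An aggregator that agrees on this shape can pass containers along without understanding every package inside them.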
Registries
Registries publish information about metadata schemas.
Registries constitute reference information that facilitates the
development of new application profiles, crosswalks, and so on.
Open Metadata Registry
Aggregated infrastructures
Some examples of systems that are enabled via all of this stuff:
• Europeana, the European cultural heritage data aggregation.
• The Digital Public Library of America (DPLA).
Europeana and the DPLA describe themselves primarily as
platforms: they want you (really, they want you) to create
applications and other cool stuff with the data (really metadata)
that they aggregate and publish.
Schema assignment notes
Consider whether attributes should be:
• Mandatory or optional.
• Repeatable.
You might include general guidelines that apply
to all attributes in your schema, as well as
guidelines for each attribute. (Check the CDP
best practice guidelines for an example.)
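Obligation and repeatability rules can be written down once and checked mechanically. This sketch assumes a hypothetical four-attribute schema and stores every value as a list, so that repetition is explicit; it also flags attributes the schema does not define.

```python
# Hypothetical schema rules: obligation and repeatability per attribute.
RULES = {
    "title": {"mandatory": True, "repeatable": False},
    "creator": {"mandatory": False, "repeatable": True},
    "subject": {"mandatory": False, "repeatable": True},
    "date": {"mandatory": True, "repeatable": False},
}

def validate(record, rules=RULES):
    """Return a list of problems; record values are lists of strings."""
    problems = []
    for name, rule in rules.items():
        values = record.get(name, [])
        if rule["mandatory"] and not values:
            problems.append(f"{name}: missing mandatory attribute")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"{name}: not repeatable but has {len(values)} values")
    for name in record:
        if name not in rules:
            problems.append(f"{name}: not defined in schema")
    return problems

rec = {"title": ["Harbor at dusk", "Dusk harbor"], "subject": ["harbors", "boats"]}
print(validate(rec))
# ['title: not repeatable but has 2 values', 'date: missing mandatory attribute']
```

Repeated subjects pass because `subject` is repeatable, while the second title and the missing date are both reported.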