Metadata standards
Guidelines, data structures, and file
formats to facilitate reliability and
quality of description
Outline
• Why create and follow metadata standards?
• What kinds of standards are there?
• How does this all work?
• How do standards evolve?
The world of standards
A standard is any agreed-upon means of doing
something.
Standards can be formally created and adopted or
merely customary.
With standards, products and processes have a certain
level of consistency and predictability that can make
production and use more efficient.
Goals of metadata standards
Metadata standards enable more reliable description. For
example, when everyone agrees to record the first names and
last names of resource creators in separate fields, search
results can be properly alphabetized by author and more
easily read, whether the first name or the last name comes
first in the display.
Reliable description enables the sharing of data across
different systems.
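The name example can be sketched in a few lines of Python. This is only an illustration: the `Creator` record, its field names, and the sample names are hypothetical and not drawn from any particular standard. The point is that separate fields give one sort order under either display convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creator:
    # Hypothetical record; the field names are illustrative only.
    first_name: str
    last_name: str


def sort_key(creator: Creator) -> tuple[str, str]:
    # Alphabetize by last name, then first name, ignoring case.
    return (creator.last_name.casefold(), creator.first_name.casefold())


def display(creator: Creator, last_first: bool = True) -> str:
    # The same stored fields support either display order.
    if last_first:
        return f"{creator.last_name}, {creator.first_name}"
    return f"{creator.first_name} {creator.last_name}"


# Sample data for illustration.
creators = [
    Creator("Virginia", "Woolf"),
    Creator("Jane", "Austen"),
    Creator("Charlotte", "Bronte"),
]

ordered = sorted(creators, key=sort_key)
last_first_list = [display(c) for c in ordered]
first_last_list = [display(c, last_first=False) for c in ordered]

print(last_first_list)
print(first_last_list)
```

If the names were stored as a single free-text string instead, the sort would depend on whichever order each cataloger happened to type.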
Types of standards
Elings and Waibel describe four types of metadata standards:
• Data structure (fields): MARC and EAD.
• Data content (values): AACR2 (RDA) and DACS.
• Data format: XML.
• Data exchange: Z39.50 and OAI.
These are useful categories, but some standards straddle them. You
could say, for example, that MARC reflects AACR2 and not the other
way around: MARC defines data fields in a technical sense, but AACR2
defines the content with which the fields are populated and, to some
degree, conceptually determines the MARC fields. In practice, the two
become functionally intertwined.
Multiple standards at work
A cataloger uses AACR2 to determine:
• That a book’s title should be part of its description.
• The wording, spelling, capitalization, and punctuation
of the title.
The cataloger uses MARC to record the title information
in a consistent form that computers can process.
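A rough sketch of this division of labor, in Python. The `Field` class and its `render` output are hypothetical simplifications, not the MARC transmission format or any library's API. The tag 245 (title statement) and the subfield codes `a` and `b` are real MARC conventions; the indicator values and the sample title (the subtitle example used elsewhere in these slides) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    # Hypothetical, simplified stand-in for one field of a MARC record.
    tag: str          # "245" is the MARC title statement
    indicators: str   # two indicator characters (values assumed here)
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        # Human-readable display only; not the MARC transmission format.
        parts = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {parts}"


# Content standard (AACR2): the cataloger has already decided the wording,
# capitalization, and punctuation. Nothing here computes those decisions.
title_proper = "Museum informatics :"
other_title_info = "an introductory textbook"

# Structure standard (MARC): where each piece of the title is recorded.
title_field = Field(
    tag="245",
    indicators="10",
    subfields=[("a", title_proper), ("b", other_title_info)],
)

print(title_field.render())
```

Note that the code never decides what the title is. That judgment belongs to the content standard and the cataloger; the structure only gives the result a predictable home.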
Multiple standards at work
Two computer networks can use Z39.50 to determine
how to exchange their MARC catalog records.
The result? A user at Library A can search Library B’s
catalog and not discern a difference in the way that
information is structured and presented. It just works.
Developing and adopting standards
Organizations agree to adopt standards because the benefits
of creating products or services that work together can be
great.
However, developing standards and forging that agreement
can be a difficult process.
Metadata content standards can be complicated to use, and
they leave plenty of room for interpretive flexibility.
Content standards: considerations
Why are content standards so complicated? Because
documents are various!
Most content standards will try to implement a few
basic guidelines supplemented by rules and options for
special cases.
Ideally, the basic guidelines will be based on clearly
articulated goals and principles.
Example: RDA goals
RDA has articulated a concrete set of descriptive goals and
principles.
A few goals:
• Enable description of any resource (not just printed materials).
• Align with the FRBR conceptual model (works, expressions,
manifestations, items) and its objectives (finding, selecting,
understanding, and so on).
• Create content descriptions that can be used in multiple
encodings and displays.
• Retain backward compatibility with existing records.
Example: RDA Principles
One principle is that descriptions should reflect “the resource’s
representation of itself.”
This is a longstanding principle in library cataloging: where
possible, description = transcription.
This can be linked to the objective of finding known items: the
catalog description should match how the item is known to
others, which is most likely from the item itself.
Example: RDA guidelines
This principle of transcription underlies the basic
guideline for RDA titles, which is that the “title proper”
or primary title should come from the preferred source
of information, which for books is the title page.
While the wording comes from the title page, though,
the capitalization and punctuation are standardized for
all titles.
Example: RDA special cases
What if...
• Some introductory words on the title page seem like they’re not really part of
the title (e.g., Walt Disney Presents Sleeping Beauty)?
• The title is given in two languages (e.g., Canadian Literature/Littérature
Canadienne)?
• There is a spelling mistake in the title?
• The document is a manifestation of a commonly known work but has a
slightly different title than most manifestations (e.g., William Shakespeare’s
Hamlet)?
• A subtitle appears under what seems to be the main title (e.g., Museum
Informatics an introductory textbook)?
• The title is over one paragraph long?
Keeping standards relevant
Standards are immediately out of date, of course.
RDA has been in development since 2004, as part of a
cooperative effort by U.S., U.K., Canadian, and Australian library
associations. These are tremendous efforts!
Particular institutions, such as the Library of Congress, will issue
their own rules for interpreting the standards, which smaller
organizations (such as the University of Texas) may or may not
choose to adopt.
Your mission
Complete your subject classification for next
week: introduction, classified structure,
alphabetical structure, and reflective essay.
A few notes on assignments, based on what I’ve
seen in meeting with many of you, follow...
Sort like with like
Try to place like kinds of things together (processes, products,
people), not just things that have some thematic relation.
Remember, a hierarchy in its strict form takes one kind of thing
and goes from the most general category to the most specific.
This: Animals -> domesticated animals -> animals raised for food -> pigs
This: Agricultural processes -> farming -> factory farming
This: Effects -> effects of farming practices -> effects on animals -> overcrowding
Not this: Animals -> pastures, pens, cages -> overcrowding
Not this: Animals -> factory farming -> mercury poisoning
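One way to check a chain for "like with like" is to tag each concept with its kind and flag any step where the kind changes. The sketch below does that in Python. The `KIND` tags are hypothetical and assigned by hand for the concepts in the examples above; you would choose your own kinds for your own subject.

```python
# Hypothetical "kind" tags, assigned by hand for the example concepts.
KIND = {
    "Animals": "thing",
    "domesticated animals": "thing",
    "animals raised for food": "thing",
    "pigs": "thing",
    "Agricultural processes": "process",
    "farming": "process",
    "factory farming": "process",
    "pastures, pens, cages": "place",
    "overcrowding": "effect",
    "mercury poisoning": "effect",
}


def mixed_steps(chain: list[str]) -> list[tuple[str, str]]:
    # Return each parent/child step whose two concepts are of different kinds.
    return [
        (parent, child)
        for parent, child in zip(chain, chain[1:])
        if KIND[parent] != KIND[child]
    ]


good = ["Animals", "domesticated animals", "animals raised for food", "pigs"]
bad = ["Animals", "factory farming", "mercury poisoning"]

print(mixed_steps(good))  # no mixed steps
print(mixed_steps(bad))   # both steps change kind
```

This only checks that a chain stays within one kind of thing. It does not check that each step actually moves from general to specific; that still takes your judgment.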
Levels of abstraction
Wrangling your concepts can be difficult when they are at different
levels of abstraction. You may need to generate intermediate levels
that weren’t explicit in your source documents.
Source concepts: meat eating, E. coli, cholesterol, sustainability
disadvantages of meat eating
    health risks
        health risks associated with meat eating
            high cholesterol
        health risks associated with industrial meat production
            bacterial contamination
                E. coli contamination
    unsustainable practices
        effects of industrial meat production
            consumption of resources
            pollution
Node labels or subfacet labels
Especially because your
classifications are small, many of you
may make use of labels that help
clarify the principles of division used
in your classified structure.
In most cases, you will not use these
terms to describe documents, and
they are not, strictly speaking, actual
concepts in your classification. You
don’t need to include them in your
alphabetical representation.
Example
Computers
    <by form factor>
        Desktop
        Laptop
    <by operating system>
        MacOS
        Linux
        Windows
<by operating system> is just a structural label. It’s not a
concept you’ll use to categorize documents.
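To make the distinction concrete, here is a small Python sketch of the Computers example. The `Node` class and its `is_label` flag are hypothetical; they are just one way to mark structural labels so that they stay in the classified structure but drop out of the alphabetical representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node. is_label marks structural labels such as
    # <by form factor>, which are not concepts in the classification.
    name: str
    is_label: bool = False
    children: list["Node"] = field(default_factory=list)


def concepts(node: Node) -> list[str]:
    # Collect concept names depth-first, skipping structural labels
    # but still descending into the concepts beneath them.
    found = [] if node.is_label else [node.name]
    for child in node.children:
        found.extend(concepts(child))
    return found


tree = Node("Computers", children=[
    Node("<by form factor>", is_label=True,
         children=[Node("Desktop"), Node("Laptop")]),
    Node("<by operating system>", is_label=True,
         children=[Node("MacOS"), Node("Linux"), Node("Windows")]),
])

alphabetical = sorted(concepts(tree), key=str.casefold)
print(alphabetical)
```

The labels still do their job in the tree, where they explain the principle of division, but only the real concepts end up in the alphabetical list.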
Non-subject concepts
Don’t include document attributes that aren’t subjects, such as forms or genres
(blogs, articles, books, diaries...). Really, I mean it.
You are creating a representation of a subject that can be used to organize
documents; you are not describing the types of documents in which users might
be interested.
Include in your classification: terms for concepts that relate to gardening, such
as types of plants (grasses, cacti, shrubs).
Do not include in your classification: Document types that list such plants
(plant databases, seed catalogs). However, you might use your classification to
categorize a cactus database with the Cacti concept...
INF 384 C, Spring 2009