Books and bones

Books
As you probably know, library catalog records for different editions of the same work have certain
variations. Subject headings are one site of such variation. Subject headings in catalog records are
selected from a controlled vocabulary, the Library of Congress Subject Headings (LCSH). A catalog
record can have multiple subject headings. (In contrast, a library resource has a single classification code;
the classification is a different metadata element and is used to determine physical placement of
resources.)
Here are some variations in the assignment of subject headings for a single book, the Communist
Manifesto. (These are from actual library catalog records.)
Edition 1: Communism
Edition 2: Socialism
Edition 3: Communism; Socialism
Edition 4: Socialism; Communism
Edition 5: Socialism; Communism; Electronic books
Edition 6: Marx, Karl, 1818-1883. Manifest der Kommunistischen Partei; Marx, Karl, 1818-1883;
Socialism; Communism; Political science / Political Ideologies / Communism & Socialism; Electronic
books
Edition 7: Communism; Communism -- Germany
And for you fiction fans, here are some variations in the assignment of subject headings for another book,
Uncle Tom’s Cabin (also from actual library catalog records).
Edition 1: Slavery -- United States -- Fiction
Edition 2: Slavery -- United States -- History -- Fiction; Political fiction; Didactic fiction
Edition 3: Uncle Tom (Fictitious character) -- Fiction; Master and servant -- Fiction; African Americans -- Fiction; Fugitive slaves -- Fiction; Plantation life -- Fiction; Slavery -- Fiction; Slaves -- Fiction;
Political fiction; Southern States -- Fiction; English fiction; United States; Political fiction; Didactic
fiction
Edition 4: African Americans -- History -- Fiction; Slavery -- United States -- Fiction; United States -- History -- 1815-1861 -- Fiction
Edition 5: Slavery -- United States -- Fiction; Abolitionists -- Fiction
Questions for discussion
• What is your initial reaction to such variation in the application of controlled vocabulary values?
• What effects does this kind of variation have on searching (e.g., to find all editions of the
Communist Manifesto, to find all books that the library has on the subject of communism, to find
books about Karl Marx, and so on)?
• What effects does this kind of variation have on understanding (e.g., to understand what the
difference might be between communism and socialism, to understand abolitionism, and so on)?
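To make the searching question concrete, here is a minimal Python sketch using the seven Communist Manifesto records listed above. It assumes a search that matches a subject heading exactly, which is a simplification; real catalog systems differ in how they index and match headings. The names RECORDS, editions_with_heading, and recall are hypothetical and chosen only for illustration.

```python
# Subject headings transcribed from the seven Communist Manifesto records above.
RECORDS = {
    1: ["Communism"],
    2: ["Socialism"],
    3: ["Communism", "Socialism"],
    4: ["Socialism", "Communism"],
    5: ["Socialism", "Communism", "Electronic books"],
    6: [
        "Marx, Karl, 1818-1883. Manifest der Kommunistischen Partei",
        "Marx, Karl, 1818-1883",
        "Socialism",
        "Communism",
        "Political science / Political Ideologies / Communism & Socialism",
        "Electronic books",
    ],
    7: ["Communism", "Communism -- Germany"],
}


def editions_with_heading(records, heading):
    """Return the edition numbers whose record carries exactly this heading.

    Assumption: exact string matching stands in for a subject search.
    """
    return sorted(ed for ed, headings in records.items() if heading in headings)


def recall(records, heading):
    """Fraction of all editions of the work that a search on this heading finds."""
    return len(editions_with_heading(records, heading)) / len(records)


communism_hits = editions_with_heading(RECORDS, "Communism")
socialism_hits = editions_with_heading(RECORDS, "Socialism")
marx_hits = editions_with_heading(RECORDS, "Marx, Karl, 1818-1883")

for label, hits in [
    ("Communism", communism_hits),
    ("Socialism", socialism_hits),
    ("Marx, Karl, 1818-1883", marx_hits),
]:
    print(f"{label}: editions {hits} ({len(hits)} of {len(RECORDS)})")
```

Under these assumptions, a search on "Communism" finds six of the seven editions (it misses Edition 2), a search on "Socialism" finds five, and a search on "Marx, Karl, 1818-1883" finds only Edition 6. No single heading retrieves every edition of the same work.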
Bones
The situation with cataloging books is not unique. Scientists make similar decisions when describing
specimens. Even when disciplinary communities agree on descriptive practices, assigning values from
controlled vocabularies to a specimen is often not straightforward. For example, Charles Goodwin, an
anthropologist, describes difficulties encountered by archeologists in using the Munsell color chart to
assign values to soil samples.1 The chart is designed to make the process of soil color identification as
objective and systematic as possible; the archeologist takes a sample on a trowel and places the sample
into appropriate holes on the chart (see picture below).
But this procedure, as described by Goodwin, involves significant judgment calls. The soil is matte and
the chart glossy, for one, which makes comparison more difficult; also, a sample can fall between shades,
and collectors will disagree about which to apply.
Such interpretive flexibility in the assignment of metadata, even when controlled vocabularies are used,
complicates potential re-use of scientific data. The Atici et al. article that you read for this week describes
such a situation. Three zooarcheologists independently undertook to analyze a dataset of 30,000 animal
bone specimens from excavations at Chogha Mish, Iran, during the 1960s and 1970s. While all three
analysts agreed that the data was of sufficient quality to warrant further analysis, the three took different
approaches in determining how to account for metadata decisions made by the original data collector. For
example, the article notes that the original data collector “was very conservative and certain in her
taxonomic identifications” using designations such as “Ovis/Capra/Gazelle,” “large-size mammal,” or
“medium artiodactyl” to “account for large samples in many of the assemblages from almost all the
periods” instead of more specific identifications. The article clarifies that “these are common
‘methodological categories’ that zooarchaeologists employ when they lack confidence or certainty in
identification,” so this was not an error on the part of the original collector, just a decision. The article
implies that all three analyses are equally correct in their interpretation of the original data.
Questions for discussion:
• Do you perceive the variation between zooarcheological dataset analyses differently than you
perceive the variation in descriptions in library catalog records?
• The variation in these zooarcheological analyses did not arise from insufficient “quality” or
“accuracy” in the application of vocabulary terms to the specimens. That is, the original data
collector made decisions, not mistakes (just like with the catalog records). How does this
interpretive flexibility affect “data curation” efforts to store, document, and potentially aggregate
and re-use scientific data?
• Does it matter if computers are collecting the data? (You might think about how computers track
your Web transactions, phone calls, physical location [given the GPS in your phone], and so on.)
1 Goodwin, Charles. (1994). Professional vision. American Anthropologist 96(3): 606-633.