Books and bones

Books

As you probably know, library catalog records for different editions of the same work have certain variations. Subject headings are one site of such variation. Subject headings in catalog records are selected from a controlled vocabulary, the Library of Congress Subject Headings (LCSH). A catalog record can have multiple subject headings. (In contrast, a library resource has a single classification code; the classification is a different metadata element and is used to determine physical placement of resources.)

Here are some variations in the assignment of subject headings for a single book, the Communist Manifesto. (These are from actual library catalog records.)

Edition 1: Communism
Edition 2: Socialism
Edition 3: Communism; Socialism
Edition 4: Socialism; Communism
Edition 5: Socialism; Communism; Electronic books
Edition 6: Marx, Karl, 1818-1883. Manifest der Kommunistischen Partei; Marx, Karl, 1818-1883; Socialism; Communism; Political science / Political Ideologies / Communism & Socialism; Electronic books
Edition 7: Communism; Communism -- Germany

And for you fiction fans, here are some variations in the assignment of subject headings for another book, Uncle Tom’s Cabin (also from actual library catalog records).
Edition 1: Slavery -- United States -- Fiction
Edition 2: Slavery -- United States -- History -- Fiction; Political fiction; Didactic fiction
Edition 3: Uncle Tom (Fictitious character) -- Fiction; Master and servant -- Fiction; African Americans -- Fiction; Fugitive slaves -- Fiction; Plantation life -- Fiction; Slavery -- Fiction; Slaves -- Fiction; Political fiction; Southern States -- Fiction; English fiction; United States; Political fiction; Didactic fiction
Edition 4: African Americans -- History -- Fiction; Slavery -- United States -- Fiction; United States -- History -- 1815-1861 -- Fiction
Edition 5: Slavery -- United States -- Fiction; Abolitionists -- Fiction

Questions for discussion

What is your initial reaction to such variation in the application of controlled vocabulary values?

What effects does this kind of variation have on searching (e.g., to find all editions of the Communist Manifesto, to find all books that the library has on the subject of communism, to find books about Karl Marx, and so on)?

What effects does this kind of variation have on understanding (e.g., to understand what the difference might be between communism and socialism, to understand abolitionism, and so on)?

Bones

The situation with cataloging books is not unique. Scientists make similar decisions when describing specimens. Even when disciplinary communities agree on descriptive practices, assigning values from controlled vocabularies to a specimen is often not straightforward. For example, Charles Goodwin, an anthropologist, describes difficulties encountered by archeologists in using the Munsell color chart to assign values to soil samples.1 The chart is designed to make the process of soil color identification as objective and systematic as possible; the archeologist takes a sample on a trowel and holds it behind the appropriate holes on the chart (see picture below). But this procedure, as described by Goodwin, involves significant judgment calls.
For one, the soil is matte and the chart glossy, which makes comparison more difficult; also, a sample can fall between shades, and collectors will disagree about which to apply. Such interpretive flexibility in the assignment of metadata, even when controlled vocabularies are used, complicates potential re-use of scientific data.

The Atici et al. article that you read for this week describes such a situation. Three zooarcheologists independently undertook to analyze a dataset of 30,000 animal bone specimens from excavations at Chogha Mish, Iran, during the 1960s and 1970s. While all three analysts agreed that the data was of sufficient quality to warrant further analysis, the three took different approaches in determining how to account for metadata decisions made by the original data collector. For example, the article notes that the original data collector “was very conservative and certain in her taxonomic identifications,” using designations such as “Ovis/Capra/Gazelle,” “large-size mammal,” or “medium artiodactyl” to “account for large samples in many of the assemblages from almost all the periods” instead of more specific identifications. The article clarifies that “these are common ‘methodological categories’ that zooarchaeologists employ when they lack confidence or certainty in identification,” so this was not an error on the part of the original collector, just a decision. The article implies that all three analyses are equally correct in their interpretation of the original data.

Questions for discussion

Do you perceive the variation among the zooarcheological dataset analyses differently than you perceive the variation in descriptions in library catalog records?

The variation in these zooarcheological analyses did not arise from insufficient “quality” or “accuracy” in the application of vocabulary terms to the specimens. That is, the original data collector made decisions, not mistakes (just as with the catalog records).
How does this interpretive flexibility affect “data curation” efforts to store, document, and potentially aggregate and re-use scientific data?

Does it matter if computers are collecting the data? (You might think about how computers track your Web transactions, phone calls, physical location [given the GPS in your phone], and so on.)

1 Goodwin, Charles. (1994) Professional vision. American Anthropologist 96(3): 606-633.