Subjects
Indexing, or assigning subject terms to documents

Subjects
When we say “I want some information about gardening” or “I read a great book about Andrew Jackson’s presidency,” we all know what those things mean, right? We are referring to gardening and Andrew Jackson’s presidency as subjects. These are concepts that describe what the document is about: its topic or its major themes.

But how do we determine a document’s subject?
What if I say, “Oh, I read that Andrew Jackson book. It uses Andrew Jackson’s presidency as an analytical focus, but I would say it’s really about democracy and federalism in the early United States.” Is the subject Andrew Jackson? Is it democracy and federalism in the United States? How do we know?

Oh no, it’s like the “work”
Yep. Similar to the idea of the “work” that we talked about earlier in the course, the idea of a “subject” seems intuitive and easy to define, but it’s actually difficult to pin down precisely. Remember the catalog records for Uncle Tom’s Cabin? Is that book “about” the history of slavery in the United States, or the character of Uncle Tom, or political fiction?

Why should we care?
So it’s hard to say for certain what the subject of a document is. What’s the problem? Like the idea of the work, we use the idea of the subject constantly when we are seeking, using, selecting between, evaluating, interpreting, and otherwise needing to describe documents. We can’t ignore subjects just because they are difficult.

Document-oriented definitions of the subject
As described by Fidel, the subject is often defined as a document attribute (the answer to the question “What is it about?”). Two ideas of the subject emerge from this line of thinking:
• Subjects exist as ideal forms, and they can be accurately identified in documents via some rational decision process (Hjorland’s “objective idealism”).
• There is no ideal form of subject; a subject depends on the way that people interpret the document, and it is hard to decide which interpretation is best (Hjorland’s “subjective idealism”).

Terminology interlude: indexing
“Indexing” is the assignment of subject terms to a document to represent its contents. “Subject cataloging” is similar, but the terms are known as “headings.” “Classification” is the assignment of a subject class, or category, to a document. Yeah, they’re kind of the same thing. But they have slightly different histories.

Terminology interlude: postcoordinate and precoordinate indexing
Precoordinate indexing means that complex terms have been enumerated in advance, and the indexer typically assigns the most specific appropriate term. Postcoordinate indexing means that multiple component terms may be assigned to indicate a complex term. The “coordination,” or combination, of terms occurs either when the document is indexed (precoordinate) or when it is searched for (postcoordinate).

Examples: postcoordinate and precoordinate indexing
Precoordinate indexing terms might include:
• Design of hypertext literature.
• Mushroom foraging in the Pacific Northwest.
Postcoordinate indexing terms might include:
• Design.
• Hypertext.
• Literature.
• Mushrooms.
• Foraging.
• Pacific Northwest.

Subject as inherent property
In this view, ideal subjects exist in some Platonic dimension. Because subjects do, in fact, exist independently of people, we can talk about the subject as an objective property that is inherent in documents. We can accurately and objectively identify subjects in documents by following logical processes of deduction.

Example: My sister’s book
Elusive Equality: Gender, Citizenship, and the Limits of Democracy in Czechoslovakia, 1918-1950
In the UT catalog, my sister’s book is about:
• Women -- Czechoslovakia -- Social conditions.
• Women’s rights -- Czechoslovakia.
But she thinks it’s about civil rights and democracy.
According to the “objective idealist” view of the subject, neither of these might be right, but the right description IS out there.

Subject as interpretive construct
The “subject” is subjective! Anyone’s interpretation is ok. Maybe my sister’s book is about women’s rights in Czechoslovakia, maybe it’s about civil rights and democracy. Can’t we just all get along? But can I say the book is about sea slugs or Andrew Jackson’s presidency or how bad things happen when you let those pesky wimmin get the vote and stuff?

Use-oriented definitions of the subject
In these definitions, the subject depends on how a document is or might be used. Two variations of this idea:
• “Request-oriented” ideas of the subject, or what users need from documents (Hjorland’s “pragmatic” view).
• The ultimate contribution of the document to knowledge (Hjorland’s “realist/materialist” view).

Subject as what you need
In “request-oriented indexing” as described by Fidel, or in Hjorland’s discussion of pragmatic views of the subject, the subject of a document is based on its relevance to user needs. If you are writing a paper on democracy and my sister’s book can help you, then her book is about democracy. For you. In that situation.

Subject as prediction of a document’s contribution
In Hjorland’s “realist/materialist” view, the subject describes a document’s contribution to its discipline, or to human knowledge. A subject determination is thus a kind of prediction about what a document’s importance will be to the field. This is “realist” because eventually there will be a sort of answer. Example: The Protocols of the Elders of Zion is really about antisemitism and hoaxes (that’s its ultimate contribution to knowledge), not a supposed Jewish conspiracy to rule the world. In the meantime, a subject determination is an argument for what might or should happen.

Genre, form, and subject
Do artistic works have subjects? Does it matter if the work is text (fiction, poetry) or not (images, music, film)?
Is the “subject” of Uncle Tom’s Cabin that it is a novel? It’s not about novels! Is Uncle Tom’s Cabin about slavery in the same way that a history book is about slavery?

Of-ness and aboutness
This photo is of a rose (it contains a rose). Is it about happiness? (It was tagged with “happiness” in Flickr.)

Subject analysis
Subject analysis involves the systematic determination of a document’s subject for the purpose of placing the document in an organized collection. Concepts that designate the subject are often assigned using some type of controlled vocabulary or classification scheme. In this case, we want to be consistent in how subjects are assigned, so that the subject has a consistent meaning in the context of the collection, at least.

Subject analysis process
There are a number of models, but the ISO standard for subject analysis involves three activities:
• Examining the document and identifying the subject.
• Identifying the primary concepts in the subject.
• Determining how to express those concepts in the vocabulary that is being used for indexing.

Example: subject analysis
An article is about possible economic effects on U.S. agriculture as the result of a policy decision by the EU to enable member countries to ban genetically engineered crops. Concepts in this subject description might be:
• Bans on genetically engineered crops.
• European Union agricultural import policies.
• United States agriculture industry.

Exhaustivity and specificity
If we want to attempt consistency in indexing, we need to determine:
• What makes a theme or topic important enough to be indexed (exhaustivity).
• The level of detail at which index terms are assigned (specificity).

Summary
• The subject, or what a document is about, is a complex concept that is difficult to define precisely.
• Some ideas of the subject are based on what a document says, others on the context of use.
• Subject analysis involves identifying what a document is “about,” expressing that subject as a set of concepts, and then selecting the index terms that best represent those concepts.

Your next assignment (already!)
You will be developing a small subject language with which to describe the subjects of documents in a particular (tiny) domain.
• Your first step: decide on a subject area to represent by 3 p.m. next Wednesday, February 26.
• More specific and technical subject areas are easiest: baking, woodworking, photography, tattooing. Look at some basic resources to get a sense of the domain.
• Your approach to the subject area will be mediated through a specific audience and purpose for your subject language.
• You are developing a structure to describe the subjects of documents. The components of your subject language will be subject concepts, not genre or form terms.
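
Optional illustration: postcoordinate indexing in code
If it helps to see how a small set of subject terms behaves once documents are indexed with it, here is a minimal Python sketch of postcoordinate indexing. It is only an illustration, not part of the assignment. The index terms are the postcoordinate examples from earlier in these slides; the document titles, the DOCUMENTS table, and the function names build_index and search are all made up for the example.

```python
from collections import defaultdict

# Hypothetical mini-collection: document title -> assigned index terms.
# The terms are the postcoordinate examples from the slides; the titles are invented.
DOCUMENTS = {
    "Designing hypertext fiction": {"Design", "Hypertext", "Literature"},
    "A forager's guide to northwest mushrooms": {
        "Mushrooms", "Foraging", "Pacific Northwest",
    },
    "Mushrooms in literature": {"Mushrooms", "Literature"},
}


def build_index(documents):
    """Invert the table: index term -> set of titles indexed with that term."""
    index = defaultdict(set)
    for title, terms in documents.items():
        for term in terms:
            index[term].add(title)
    return index


def search(index, *terms):
    """Combine (coordinate) terms at search time: keep only the titles
    that were indexed with every one of the requested terms."""
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


INDEX = build_index(DOCUMENTS)
print(sorted(search(INDEX, "Mushrooms", "Foraging")))
```

The indexer here only ever assigns simple terms; the complex subject “mushroom foraging” appears only when a searcher combines “Mushrooms” and “Foraging.” A precoordinate version would instead store the full phrase “Mushroom foraging in the Pacific Northwest” as a single term and match it whole.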