Subjects: Indexing, or assigning subject terms to documents

Subjects
When we say “I want some information about
gardening” or “I read a great book about Andrew
Jackson’s presidency” we all know what those
things mean, right?
We are referring to gardening and Andrew
Jackson’s presidency as subjects. These are
concepts that describe what the document is
about, its topic or its major themes.
But how do we determine a
document’s subject?
What if I say, “Oh, I read that Andrew Jackson
book. I don’t think it’s really about Andrew
Jackson, though. It’s really about democracy and
federalism in the early United States.”
Is the subject Andrew Jackson? Is it democracy
and federalism in the United States? How do we
know?
Oh no, it’s like the “work”
Similar to the idea of the “work” that we talked
about earlier in the course, the idea of a “subject”
seems intuitive and easy to define, but it’s
actually difficult to pin down precisely.
Remember the catalog records for the Protocols
of the Elders of Zion? Is that book “about” the
Jewish conspiracy to create a world government?
Is it “about” anti-semitism? Is it “about” hoaxes?
Why should we care?
So it’s hard to say for certain what the subject of
a document is. What’s the problem?
Like the idea of the work, we use the idea of the
subject constantly when we are seeking, using,
evaluating, and otherwise needing to describe
documents. The subject is a key attribute in
facilitating effective information retrieval. So we
kind of have to deal with it.
Document-oriented definitions
of the subject
As described by Fidel, subject is often defined as a document
attribute. (The answer to the question “what is it about?”)
Two ideas of the subject emerge from this line of thinking:
• Subjects exist as ideal forms, and they can be accurately identified
in documents via some rational decision process (Hjorland’s
“objective idealism”).
• There is no ideal form of subject; a subject depends on the way
that people interpret the document, and it is hard to decide which
interpretation is best (Hjorland’s “subjective idealism”).
Interlude: indexing
“Indexing” is the assignment of subject terms to a
document to represent its contents. “Subject cataloging”
is similar, but the terms are known as “headings.”
“Classification” is the assignment of a subject class, or
category, to a document.
Yeah, they’re kind of the same thing. Except they have
slightly different histories.
Interlude: postcoordinate and
precoordinate indexing
Precoordinate indexing means that complex terms have
been enumerated in advance, and the indexer assigns,
typically, the most specific appropriate term.
Postcoordinate indexing means that multiple component
terms may be assigned to indicate a complex term. The
“coordination,” or combination, occurs either when the
document is indexed (precoordinate) or when it is
searched for (postcoordinate).
Examples: postcoordinate and
precoordinate indexing
Precoordinate indexing terms might include:
• Design of hypertext literature.
• Mushroom foraging in the Pacific Northwest.
Postcoordinate indexing terms might include:
• Design.
• Hypertext.
• Literature.
• Mushrooms.
• Foraging.
• Pacific Northwest.
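To make the difference concrete, here is a minimal sketch in Python. The document ids and the tiny collection are invented for illustration, and the function names are hypothetical, not part of any real indexing system. It shows postcoordinate terms being combined at search time by intersecting sets, while a precoordinate heading is matched as one whole, pre-built string.

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> postcoordinate terms.
POSTCOORDINATE = {
    "doc1": {"Design", "Hypertext", "Literature"},
    "doc2": {"Mushrooms", "Foraging", "Pacific Northwest"},
    "doc3": {"Mushrooms", "Design"},
}

# Hypothetical precoordinate headings, enumerated in advance.
PRECOORDINATE = {
    "doc1": "Design of hypertext literature",
    "doc2": "Mushroom foraging in the Pacific Northwest",
}


def build_postings(index):
    """Invert document -> terms into term -> set of document ids."""
    postings = defaultdict(set)
    for doc_id, terms in index.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings


def postcoordinate_search(postings, *terms):
    """Coordinate the terms at search time by intersecting their postings."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def precoordinate_search(index, heading):
    """Match the whole enumerated heading; coordination was done in advance."""
    return {doc_id for doc_id, h in index.items() if h == heading}


postings = build_postings(POSTCOORDINATE)
print(postcoordinate_search(postings, "Mushrooms", "Foraging"))
print(precoordinate_search(PRECOORDINATE, "Design of hypertext literature"))
```

Note that the postcoordinate searcher can ask for any combination (say, “Mushrooms” and “Design”), while the precoordinate searcher can only find combinations that someone enumerated ahead of time.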
Subject as inherent property
In this view, ideal subjects exist in some Platonic
dimension.
Because subjects do, in fact, exist independently of
people, we can talk about the subject as an objective
property that is inherent in documents. We can
accurately and objectively identify subjects in
documents by following logical processes of deduction.
Example: My sister’s book
Elusive Equality: Gender,
Citizenship, and the Limits of
Democracy in
Czechoslovakia, 1918-1950
In the UT catalog, my sister’s book is
about:
• Women -- Czechoslovakia -- Social conditions.
• Women's rights -- Czechoslovakia.
But she thinks it’s about civil rights and
democracy.
According to the “objective idealist”
view of the subject, neither of these
might be right, but the right description
IS out there.
Subject as interpretive construct
The “subject” is subjective! Anyone’s interpretation is
ok!
Maybe my sister’s book is about women’s rights in
Czechoslovakia, maybe it’s about civil rights and
democracy. Can’t we just all get along?
But can I say the book is about sea slugs or Andrew
Jackson’s presidency or how bad things happen when
you let those pesky wimmin get the vote and stuff?
Use-oriented definitions
of the subject
In these definitions, the subject depends on how a
document is or might be used. Two variations of this
idea:
• “Request-oriented” ideas of the subject, or what users
need from documents (Hjorland’s “pragmatic” view).
• The ultimate contribution of the document to
knowledge (Hjorland’s “realist/materialist” view).
Subject as what you need
In “request-oriented indexing” as described by
Fidel, or in Hjorland’s discussion of pragmatic
views of the subject, the subject of a document is
based on its relevance to user needs.
If you are writing a paper on democracy and my
sister’s book can help you, then her book is
about democracy. For you. In that situation.
Subject as prediction of a
document’s contribution
In Hjorland’s “realist/materialist” view, the subject describes a
document’s contribution to its discipline, or to human knowledge.
A subject determination is thus a kind of prediction about what a
document’s importance will be to the field. This is “realist”
because eventually there will be a sort of answer. The Protocols
of the Elders of Zion is really about anti-semitism and hoaxes—
that’s its ultimate contribution to knowledge—not the Jewish
conspiracy to rule the world.
In the meantime, a subject determination is an argument for what
might or should happen.
Genre, form, and subject
Do artistic works have subjects? Does it matter if the
work is text (fiction, poetry) or not (images, music,
film)?
Is the “subject” of Uncle Tom’s Cabin that it is a novel?
It’s not about novels!
Is the subject “slavery”? Is Uncle Tom’s Cabin about
slavery in the same way that a history book is about
slavery?
Of-ness and aboutness
This photo is of a rose
(it contains a rose).
Is it about happiness?
(It was tagged with
“happiness” in Flickr.)
Subject analysis
Subject analysis involves the systematic determination
of a document’s subject for the purpose of placing the
document in an organized collection. Concepts that
designate the subject are often assigned using some
type of controlled vocabulary or classification scheme.
In this case, we want to be consistent in how subjects
are assigned, so that the subject has a consistent
meaning in the context of the collection, at least.
Subject analysis process
There are a number of models, but the ISO standard for
subject analysis involves three activities:
• Examining the document and identifying the subject.
• Identifying the primary concepts in the subject.
• Determining how to express those concepts in the
vocabulary that is being used for indexing.
Example: subject analysis
An article is about possible economic effects on U.S.
agriculture as the result of a policy decision by the EU
to enable member countries to ban genetically
engineered crops.
Concepts in this subject description might be:
• Bans on genetically engineered crops.
• European Union agricultural import policies.
• United States agriculture industry.
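The third activity, expressing concepts in the indexing vocabulary, can be sketched as a simple lookup. This is only a rough illustration: the vocabulary below is invented (it is not LCSH or any real thesaurus), and matching by substring is an assumption made to keep the example short. Real indexing relies on the indexer's judgment.

```python
# Hypothetical controlled vocabulary: entry phrase (lowercase) -> preferred term.
VOCABULARY = {
    "genetically engineered crops": "Genetically modified crops",
    "european union": "European Union",
    "import policies": "Import policy",
    "united states agriculture": "Agriculture -- United States",
}

# The concepts identified in the example article.
CONCEPTS = [
    "Bans on genetically engineered crops",
    "European Union agricultural import policies",
    "United States agriculture industry",
]


def express_concepts(concepts, vocabulary):
    """Map free-text concepts to preferred terms; also report concepts with no match."""
    terms = []
    unmatched = []
    for concept in concepts:
        text = concept.lower()
        found = [preferred for phrase, preferred in vocabulary.items() if phrase in text]
        if not found:
            unmatched.append(concept)
        for term in found:
            if term not in terms:  # keep each preferred term only once
                terms.append(term)
    return terms, unmatched


terms, unmatched = express_concepts(CONCEPTS, VOCABULARY)
print(terms)
print(unmatched)
```

Concepts that land in the unmatched list are the interesting cases: the indexer has to pick a broader term, a related term, or propose a new one.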
Exhaustivity and specificity
If we want to attempt consistency in indexing,
we need to determine:
• What makes a theme or topic important enough
to be indexed (exhaustivity).
• The level of detail at which index terms are
assigned (specificity).
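One way to picture these two settings is as two knobs on an indexing policy. The sketch below is an assumption-laden toy: the themes, their “share” of the document, and the broad-to-narrow term paths are all invented, and treating exhaustivity as a numeric cutoff is a simplification for illustration only.

```python
# Hypothetical themes: (path from broadest to narrowest term, share of the document).
THEMES = [
    (("Agriculture", "Crops", "Genetically engineered crops"), 0.6),
    (("Government policy", "Trade policy", "Import bans"), 0.3),
    (("Law", "European Union law"), 0.1),
]


def assign_terms(themes, min_share, max_depth):
    """Pick index terms under a simple policy.

    min_share models exhaustivity: a lower cutoff indexes more themes.
    max_depth models specificity: a larger depth allows narrower terms.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    terms = []
    for path, share in themes:
        if share >= min_share:
            # Use the narrowest term allowed by max_depth.
            terms.append(path[:max_depth][-1])
    return terms


# Low exhaustivity, low specificity: few, broad terms.
print(assign_terms(THEMES, min_share=0.25, max_depth=1))
# High exhaustivity, high specificity: more, narrower terms.
print(assign_terms(THEMES, min_share=0.05, max_depth=3))
```

Whatever the actual policy is, the point is that it is written down and applied the same way to every document in the collection.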
Summary
• The subject, or what a document is about, is a
complex concept that is difficult to define
precisely.
• Some ideas of the subject are based on what a
document says, others on the context of use.
• Subject analysis involves identifying what a
document is “about,” expressing that subject as a
set of concepts, and then selecting the index
terms that best represent those concepts.