Subject Analysis - Columbia University

Subject Metadata
Subject Analysis
• SUBJECT ANALYSIS: The process of ascertaining the “aboutness”
of a document by describing its topic, the discipline in which the
topic is treated, and the form of the document.
• Discipline: An area or a branch of knowledge. The discipline is
distinct from the thing being studied by the discipline. A broad field of
inquiry; the context in which any subject is treated
• Subject (Phenomena): Broadly, the things studied by disciplines
• Form: What the document is rather than what it contains
– Intellectual: method by which the document has been compiled:
history, biography, textbook, Festschrift
– Presentation: manner in which subject content has been
organized. Statistical compilation
– Physical form: Structure of the document as an artefact. Book,
video.
Definitions
• Subject analysis is the part of indexing or
cataloging that deals with
– the conceptual analysis of an item: what is it
about? what is its form/genre/format?
– translating that analysis into a particular
subject heading system
• Subject heading: a term or phrase used
in a subject heading list to represent a
concept, event, or name
Types of concepts to identify
• Topics
• Names of:
– Persons
– Corporate bodies
– Geographic areas
• Time periods
• Titles of works
• Form of the item
Subjects vs. forms/genres
• Subject: what the item is about
• Form: what the item is, rather than what it is
about
– Physical character (video, map, miniature book)
– Type of data it contains (statistics)
– Arrangement of information (diaries, indexes)
– Style, technique (drama, romances)
• Genre: works with common theme, setting, etc.
– Mystery fiction; Comedy films
What is a Controlled Vocabulary?
• From Wikipedia: A controlled vocabulary is a
carefully selected list of words and phrases …
The terms are chosen and organized by trained
professionals (including librarians and
information scientists) who possess expertise in
the subject area. Controlled vocabulary terms
can accurately describe what a given document
is actually about, even if the terms themselves
do not occur within the document’s text.
Controlled Vocabularies: Subject
Heading lists vs. Thesauri
• Thesauri
• Created largely in
indexing communities
• Made up of single terms
and bound terms
representing single
concepts (usually called
descriptors). Bound terms
occur when some
concepts can only be
represented by two or
more words (e.g. Type A
Personality)
• Subject heading lists
• Created largely in library
communities
• Consist of phrases and
other precoordinated
terms in addition to single
terms
Controlled Vocabularies: Subject
Heading lists vs. Thesauri
• Thesauri
• More strictly hierarchical.
Because they are made
up of single terms, each
term usually has only one
broader term
• Narrow in scope. Usually
made up of terms from
one specific subject area
• More likely to be
multilingual. Because
single terms are used, they
are easier to maintain in
multiple languages
• Subject heading lists
• Not strictly hierarchical.
Some headings may
have no broader and/or
narrower terms
• More general in scope,
covering a broad subject
area, or the entire scope
of knowledge
• Usually not multilingual
Translating key words & concepts
into controlled vocabulary
• Controlled vocabulary
– Thesauri (examples)
• Art & Architecture Thesaurus (AAT)
• Thesaurus for Graphic Materials I: Subject Terms (TGMI)
• Thesaurus for Graphic Materials II: Genre and Physical
Characteristic Terms (TGMII)
• Thesaurus of Geographic Names (TGN)
– Subject heading lists (examples)
• Library of Congress Subject Headings (LCSH)
• Sears List of Subject Headings
• Medical Subject Headings (MeSH)
Keywords vs. Controlled Terms
• System should allow for both
• Keywords give access using “nonstandard” terms
• Keywords include terms not yet in
vocabularies; places or names not indexed
Drawbacks to Controlled
Vocabulary
• Time to assign = $$
• Need for trained catalogers = $$
• Time lag to add relevant terms
• Time lag to delete outdated terms
– … so use both keywords and controlled terms
Why use controlled vocabulary?
Controlled vocabularies:
• identify a preferred way of expressing a
concept
• allow for multiple entry points (i.e., cross-references) leading to the preferred term
• identify a term’s relationship to broader,
narrower, and related terms
Function of keywords
Advantages:
• provide access to the words used in
bibliographic records
Disadvantages:
• cannot compensate for complexities of
language and expression
• cannot compensate for context
Keyword searching is enhanced by
assignment of controlled vocabulary!
Vocabulary Control
• Vocabulary control is used to improve the
effectiveness of information storage and
retrieval systems, Web navigation
systems, and other environments that
seek to both identify and locate desired
content via some sort of description using
language. The primary purpose of
vocabulary control is to achieve
consistency in the description of content
objects and to facilitate retrieval.
Need for Vocabulary Control
• The need for vocabulary control arises from two basic
features of natural language
• Two or more words or terms can be used to represent a single
concept
– Example:
• salinity/saltiness
• VHF/Very High Frequency
• Two or more words that have the same spelling can represent
different concepts
– Example:
• Mercury (planet)
• Mercury (metal)
• Mercury (automobile)
• Mercury (mythical being)
Principles of Controlled
Vocabularies
• There are four important principles of
vocabulary control that guide their design
and development.
• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms
where appropriate
• testing and validation of terms
Ambiguity
• Ambiguity occurs in natural language when a word or phrase (a homograph
or polyseme) has more than one meaning.
• A controlled vocabulary must compensate for the problems caused by
ambiguity by ensuring that each term has one and only one meaning.
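One common way to give each term a single meaning is to attach a parenthetical qualifier, as in the Mercury example earlier. The sketch below is only an illustration: the function name `senses_by_word` is hypothetical, and it assumes qualifiers are always written as `Word (qualifier)`.

```python
from collections import defaultdict

# Qualified terms taken from the Mercury example in these slides.
TERMS = [
    "Mercury (planet)",
    "Mercury (metal)",
    "Mercury (automobile)",
    "Mercury (mythical being)",
]


def senses_by_word(terms):
    """Group qualified terms by their bare word (hypothetical helper).

    A system could use this index to ask the user which sense is meant
    when an ambiguous word such as "mercury" is entered.
    """
    index = defaultdict(list)
    for term in terms:
        # Everything before " (" is treated as the bare word.
        word = term.partition(" (")[0]
        index[word.lower()].append(term)
    return dict(index)
```

Looking up `senses_by_word(TERMS)["mercury"]` returns all four qualified terms, each of which carries exactly one meaning.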
Synonymy
• A different problem occurs when a concept can be
represented by two or more synonymous or nearly
synonymous words or phrases. This is called synonymy.
This means that desired content may be scattered
around an information space or database because it can
be described by different but equivalent terminology
• A controlled vocabulary must compensate for the
problems caused by synonymy by ensuring that each
concept is represented by a single preferred term. The
vocabulary should list the other synonyms and variants
as non-preferred terms with USE references to the
preferred term.
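A USE reference can be pictured as a lookup from a non-preferred term to its preferred term. The following is a minimal sketch, not a description of any particular system. The table name `USE_REFERENCES`, the function `preferred_term`, and the choice of which synonym is preferred are all assumptions made for illustration.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term.
# Which synonym is "preferred" is assumed here; the slides do not say.
USE_REFERENCES = {
    "saltiness": "salinity",
    "very high frequency": "VHF",
}


def preferred_term(term: str) -> str:
    """Follow a USE reference if one exists, otherwise return the term as given."""
    cleaned = term.strip()
    return USE_REFERENCES.get(cleaned.lower(), cleaned)
```

With this table, a query for "Saltiness" is redirected to "salinity", so content described by either word is gathered under one heading.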
Type of vocabulary control
Controlled Lists
A list is a simple group of terms
Example:
Alabama
Alaska
Arkansas
California
Colorado
....
Frequently used in Web site pick lists
and pull down menus
What are these?
• Flying Horse
• King Fisher
• Royal Challenge
-- The meaning is not clear.
-- Need to eliminate ambiguity
What are these?
• Flying Horse
• King Fisher
• Royal Challenge
• Heineken
• Budweiser
• Miller-lite
• Bud-light
Drinks
• Flying Horse
• King Fisher
• Royal Challenge
• Taj Mahal
• Hayward’s 2000
• Heineken
• Corona
• Budweiser
• Miller-lite
• Bud-light
Synonym Rings
A synonym ring is a list of synonyms
or near synonyms that are used
interchangeably for retrieval
purposes
Synonym Rings
-- Examples
Synonym rings are
usually found as sets of
lists that allow users to
access all content
containing any of the
terms.
e.g., cholesterol:
Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
...
Synonym rings are used …
• Synonym rings are used to expand queries
for content objects.
– If a user enters any one of these terms as
a query to the system, all items are
retrieved that contain any of the terms in
the cluster.
An example from International SEMATECH;
a search for Silicon would look like this:
Synonym rings are used …
• Synonym rings are often used in systems where
the underlying content objects are left in their
unstructured natural language format,
– the control is achieved through the interface by
drawing together similar terms into these clusters.
• Synonym rings are used in conjunction with
search engines and provide a minimal amount of
control of the diversity of the language found in the
texts of the underlying documents.
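The query expansion described above can be sketched in a few lines. This is an illustrative example only: the ring contents come from the cholesterol slide, while `RINGS`, `expand_query`, and `search` are hypothetical names, and whole-word matching against plain strings stands in for a real search engine.

```python
import re

# One synonym ring, taken from the cholesterol example in these slides.
CHOLESTEROL_RING = {
    "cholesterol",
    "blood cholesterol",
    "serum cholesterol",
    "good cholesterol",
    "bad cholesterol",
    "ldl",
}
RINGS = [CHOLESTEROL_RING]


def expand_query(term, rings=RINGS):
    """Return every term in the ring containing `term`, or just the term itself."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            return set(ring)
    return {key}


def search(term, documents, rings=RINGS):
    """Return the documents that contain any term from the expanded query."""
    terms = expand_query(term, rings)
    hits = []
    for doc in documents:
        # Whole-word, case-insensitive match against the untouched text.
        if any(re.search(rf"\b{re.escape(t)}\b", doc, re.IGNORECASE) for t in terms):
            hits.append(doc)
    return hits
```

The documents are never indexed or altered. All of the control happens at query time, which is why a ring gives only minimal control over the language of the underlying texts.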
Search: Tilenol, Result: Tylenol
Synonym rings can be used for assigning
keywords in metadata fields
IBM Homepage source code:
<meta name="Keywords" content="ibm,
international business machines, internet, e-business, ebusiness, e-business on demand,
ebusiness on demand, on demand, ibm on
demand, on demand business, on demand
enterprise, on demand services, on-demand, ondemand, personal computer, personal system,
e-commerce, ecommerce, pc, workstation,
mainframe, unix, linux, technical support,
homepage, home page"/>
Where to find synonyms
Search logs
Dictionaries
Existing authority files
LC Name Authority File (NAF)
The Union List of Artist Names (ULAN)
The Getty Thesaurus of Geographic Names
(TGN)
Lexical databases, e.g., WordNet
http://www.cogsci.princeton.edu/~wn/
Taxonomies
A taxonomy is a set of preferred terms, all
connected by a hierarchy or polyhierarchy
Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in web navigation systems
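For web navigation, a taxonomy is often turned into a breadcrumb trail. The sketch below assumes that the chemistry example is a single chain in which each term has exactly one parent. A polyhierarchy would need a list of parents per term. The names `PARENT` and `breadcrumb` are hypothetical.

```python
# child -> parent, assuming each term in the example has one broader term.
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}


def breadcrumb(term, parent=PARENT):
    """Walk from a term up to the top of the hierarchy and format the path.

    Assumes the parent table contains no cycles.
    """
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return " > ".join(reversed(path))
```

Calling `breadcrumb("Nylon")` gives the full path from the top term down to the most specific one.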
United Nations Standard Products
and Services Code (UNSPSC)
Thesauri
A thesaurus is a controlled vocabulary
with multiple types of relationships
Example:
Rice
  UF Paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
Thesauri
Relationship types:
• Use/Used For – indicates preferred term
• Hierarchy – indicates broader and
narrower terms
• Associative – almost unlimited types of
relationships may be used
The thesaurus is the most complex format for
controlled vocabularies, and it is widely used.
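One possible way to hold a thesaurus entry in software is a small record with one list per relationship type. The sketch below models the Rice entry from the earlier slide. The class `Entry` and the function `display` are hypothetical and do not follow any specific thesaurus software.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One preferred term and its relationships (hypothetical structure)."""
    term: str
    used_for: list = field(default_factory=list)   # UF: non-preferred synonyms
    broader: list = field(default_factory=list)    # BT: broader terms
    narrower: list = field(default_factory=list)   # NT: narrower terms
    related: list = field(default_factory=list)    # RT: associative links


RICE = Entry(
    term="Rice",
    used_for=["Paddy"],
    broader=["Cereals", "Plant products"],
    narrower=["Brown rice"],
    related=["Rice straw"],
)


def display(entry):
    """Format an entry in the conventional UF/BT/NT/RT layout."""
    lines = [entry.term]
    for label, values in (
        ("UF", entry.used_for),
        ("BT", entry.broader),
        ("NT", entry.narrower),
        ("RT", entry.related),
    ):
        lines.extend(f"  {label} {value}" for value in values)
    return "\n".join(lines)
```

Printing `display(RICE)` reproduces the layout of the slide example, one relationship per line.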
National Monuments Record Thesauri-Archaeological Objects Thesaurus
Use of Controlled Vocabularies
in Information Storage and
Retrieval Systems
Dublin Core
Content data for some elements may be selected from a
controlled vocabulary, as indicated by best practice
guidelines
Content: Coverage, Description, Type, Relation, Source, Subject, Title
Intellectual Property: Contributor, Creator, Publisher, Rights
Instantiation: Date, Format, Identifier, Language
Example from LOM
(Learning Object metadata)
5.2 Learning Resource Type
Explanation: Specific kind of learning
object. The most dominant kind shall
be first.
NOTE: --The vocabulary terms are
defined as in the OED:1989 and as
used by educational communities of
practice.
Value Space (controlled terms), ordered:
exercise
simulation
questionnaire
diagram
figure
graph
index
slide
table
narrative text
exam
experiment
problem statement
self assessment
lecture
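A value space like this one can back a pick-list or a simple validation step when metadata records are created. The sketch below is a hedged illustration: the term list is copied from the slide, while `LEARNING_RESOURCE_TYPES` and `invalid_values` are hypothetical names, and case-insensitive comparison is an assumption rather than something the LOM standard is quoted as requiring.

```python
# Controlled terms for LOM 5.2 Learning Resource Type, as listed on the slide.
LEARNING_RESOURCE_TYPES = [
    "exercise", "simulation", "questionnaire", "diagram", "figure",
    "graph", "index", "slide", "table", "narrative text", "exam",
    "experiment", "problem statement", "self assessment", "lecture",
]


def invalid_values(values, vocabulary=LEARNING_RESOURCE_TYPES):
    """Return the supplied values that are not in the controlled list.

    Comparison ignores case and surrounding whitespace (an assumption).
    """
    allowed = {term.lower() for term in vocabulary}
    return [v for v in values if v.strip().lower() not in allowed]
```

An empty result means that every value in the record comes from the controlled list.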
Build in a pick-list for creating metadata records
Build in a thesaurus for automatic assignment of subject
terms
Build in a thesaurus to assist searching
Build in an illustrated thesaurus to assist
searching
Advantages and Disadvantages of
Particular Structures
• Lists:
– Simple to implement, use, and maintain
– Provide little or no guidance for the user
• Synonym Rings:
– Are constructed manually and are not used in
indexing
– Can be useful in retrieval as they allow
synonyms and near-synonyms to be treated
equally in searching.
Advantages and Disadvantages of
Particular Structures
• Taxonomies
– Good information about hierarchical relationships among terms
– Useful for both indexers and searchers who need to discover the
most appropriate, specific terms for their purposes
– There is no entry vocabulary (i.e., USE/USED FOR terms)
– Taxonomies do not indicate other types of relationships among
terms
• Thesauri
– Good information about hierarchical relationships among terms
– Good information about other (e.g., associative) relationships among terms
– Entry vocabulary to help users locate the correct terms
– Thesauri are useful for both indexers and searchers who need to
discover the most appropriate, specific terms for their purposes
– Thesauri are time-consuming and labor intensive to develop and
maintain
Typical applications of Lists, Synonym
Rings, Taxonomies, and Thesauri
• Lists
– Lists are frequently used to display small sets of terms that are to be
used for quite narrowly defined purposes such as a web pull-down list or
list of menu choices.
• Synonym Rings
– Synonym rings are frequently used behind-the-scenes to enhance
retrieval, especially in an environment in which the indexing uses an
uncontrolled vocabulary and/or there is no indexing, as when searching
full text.
• Taxonomies
– Taxonomies are often created and used in indexing applications and for
web navigation. Because of their (usually simple) hierarchical structure,
they are effective at leading users to the most specific terms available in
a particular domain.
• Thesauri
– Thesauri are the most typical form of controlled vocabulary developed
for use in indexing and searching applications because they provide the
richest structure and cross-reference environment. Thesauri can be
narrow in scope and cover a limited domain or they can be broad in
scope and widely applicable to many different types of content.
Subject Analysis
• Subject analysis is the abstracting and indexing
of an item’s conceptual content
• A two step process:
– ascertaining the subject
– translating the subject into controlled vocabulary
• Important considerations include: cataloger
objectivity, cataloger’s background knowledge,
and consistency in determining the content
Subject Analysis
• Finding (find a work whose subject is
known)
• Collocating (find what repository has on
subject)
• Evaluating (assist in making informed
decision)
• Navigating (provide users with links to
related terms)
Subject Analysis
• What is it about? (aboutness or subject)
• What is it for? (relevance or use)
• These can be the same question in some
instances, but often the subject of a work
can be quite separate from the use to
which the searcher may put it or the
reasons why the searcher considers it
relevant.
There are a number of methods for determining
the aboutness of an item
• The Purposive Method tries to determine the
author's purpose in creating the work.
• The Figure-Ground Method tries to determine
what is most central to the work (highly
subjective).
• The Objective Method counts references to
topics and presumes that commonly used topic
words are central (this is one of the methods
used by computers).
• The Appealing to Unity Method tries to
determine what holds the work together.
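The Objective Method lends itself to a very simple program: count the words and treat the most frequent ones as candidate topics. The sketch below is a deliberately naive illustration. The stopword list is a small, assumed sample, and `top_topic_words` is a hypothetical name.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "by", "for", "on"}


def top_topic_words(text, n=3):
    """Return the n most frequent non-stopword words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]
```

Counting says nothing about context or synonymy, so frequent words are only a starting point for a cataloger, not a finished subject analysis.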
This photograph is from the Library of Congress, and it was
taken by Marion Post Wolcott in March 1940
What is this scene about?
• photo of a town covered in snow at night
• from 1940
• is this about winter?
• small town America?
• the introduction of electric lights?
• the depression?
The answer is it is about all of those things, and probably more. But
it is a photo of a small town in the U.S. in the snow, it is a main
street, we see automobiles and houses but also commercial
buildings, footsteps in the snow, electric lights, and so on.
There is a fundamental difference between what an artifact is (a book
or a photograph), what it is of, and what it is about. But all of those
things usually get lumped together in subject headings and
classifications.
Summarization for Subject Analysis
• Summarization is the process of deciding
what an item is about and translating this
into index terms from a subject language.
• This process should examine three
distinct areas: the discipline in which the
item was produced, the specific subjects
or topics treated and the form of the item.
Summarization
• "Summarization" is the word used for a string of
terms that describe the aboutness of an artifact.
• Discipline | Topic {Facet} | Form
• The photograph could be described as:
• Sociology | Depression; Winter; American town |
Photograph
OR
• History | Winter; Small Town America |
Photograph
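The Discipline | Topic | Form pattern is easy to assemble mechanically once the three parts have been decided. A minimal sketch, in which the function name `summarization` and the use of "; " between topics are assumptions that follow the photograph example above:

```python
def summarization(discipline, topics, form):
    """Build a 'Discipline | Topic; Topic | Form' string from its parts."""
    return f"{discipline} | {'; '.join(topics)} | {form}"
```

For instance, `summarization("History", ["Winter", "Small Town America"], "Photograph")` yields the second description shown above. The intellectual work of choosing the discipline and topics remains with the cataloger.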
Subject Access Points
• Serve to identify the subject of particular archival
collections, series, subseries, or items, and
facilitate direct topical retrieval of these materials
• Subject headings allow the user to see the entire
scope of a repository’s holdings on a given topic
by causing these bibliographic records to
collocate, or appear side-by-side, under a
subject heading in the catalog
• When LCSH are used, the archival materials will
collocate with published material on the same
topic
Topical Subjects
• The topical subject matter to which the records
pertain is among the most important aspects of
the archival materials. Terms suggesting topics
that might be employed as access points may be
found in the following areas of the descriptive
record:
– Title Element (2.3)
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7,
Chapter 10)
Documentary Forms
• Terms that indicate the documentary form(s) or
intellectual characteristics of the records being described
(e.g., minutes, diaries, reports, watercolors,
documentaries) provide the user with an indication of the
content of the materials based on an understanding of
the common properties of particular document types. For
example, one can deduce the contents of ledgers
because they are a standard form of accounting record,
one that typically contains certain types of data.
Documentary forms are most often noted in the following
areas of the descriptive record:
– Title Element (2.3)
– Extent Element (2.5)
– Scope and Content Element (3.1)
Occupations
• The occupations, avocations, or other life
interests of individuals that are documented in a
body of archival material may be of significance
to users. Such information is most often
mentioned in the following areas of the
descriptive record:
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7,
Chapter 10)
Functions and Activities
• Terms indicating the function(s), activity(ies),
transaction(s), and process(es) that generated the
material being described help to define the context in
which records were created. Examples of such concepts
might be the regulation of hunting and fishing or the
conservation of natural resources. Functions and
activities are often noted in these areas of the descriptive
record:
– Title Element (2.3)
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7, Chapter 10)
Subject Analysis for Archival
Materials: Questions
• Concept of aboutness:
– How is it determined for archival materials?
– Is it of any use to information seekers?
– Should other concepts (occupation, form,
genre) take precedence over topicality?
• Means of providing subject access
– Should it be LCSH or other thesauri (or a
combination)?
Depth of Subject Analysis
• Summary Level
– Most library materials analyzed at this level.
The analysis of the collection will proceed as
though it were a single entity. Reduce analysis
to a single phrase that identifies its main
topical theme.
– Rarely appropriate for archival materials.
Depth and complexity of materials will be lost in
gross generalizations
Depth of Subject Analysis
• Depth Level
– Although rarely used in library cataloging,
usually will provide a more meaningful
approach to archival collections
– Break collection into appropriate components
and summarize each component individually
• Exhaustive Level
– Analyze every component of a collection. This
is very expensive and time-consuming, so will
be utilized only in special cases
Archival Management and the
Depth Level
• Consider amount of processing that is being
conducted on a collection at the point that
description occurs
– Summary level may be appropriate for a recently
acquired collection that is not yet processed and has
a preliminary record
– When processing is underway and the collection has
been arranged into series and subseries, the depth
level might be a better choice
– The exhaustive level is probably only appropriate
occasionally when some segment of records is
heavily used or considered to be of central
importance to the repository’s users
Subject Analysis for Archival
Materials
• Discipline
• Topic
– Provenance
• Creator, Function, Activity
– Cultural orientation
• Chronological
• Geographic
• Form
– Intellectual, e.g. historical sources
– Physical, e.g. diaries or correspondence
– Presentation, e.g. statistics
Library of Congress
Subject Headings (LCSH)
• Originally designed as a controlled vocabulary for
representing the subject and form of books and serials in
the LC collection
• Literary warrant: LC collection
• originally for use in LC catalogs
• now global standard for (i) library catalogs, (ii)
bibliographic databases
• Approximately 259,000 headings
• c.10,000 new headings added each year
• Approximately 36% of headings are followed by LC
Class numbers
LCSH Principles
• User and usage based
• Literary warrant
• Uniform headings
– Synonymous terms
– Spelling variants
– English vs. foreign language terms
– Scientific/technical vs. popular terms
– Currentness
• Unique headings
• Specific entry and co-extensivity
• Internal consistency
• Stability
• Precoordination: indexing terms are chosen and coordinated (“put
together as a string”) at the time of cataloging
LCSH Headings can be:
• Personal names
– Individuals
– Families, dynasties, etc
– Mythological, legendary or
fictitious characters
• Corporate bodies
• Historical events
• Names of animals
• Other proper names
• Languages
• Ideas, events
• Prizes, awards
• Holidays, days of the
week, etc.
• Ethnic groups, tribes,
nationalities, etc.
• Religious, philosophical
systems
• Geographic names
– Jurisdictional headings
– Geographic features
• You name it – it can be a
subject heading
LCSH Conventions for
Relationships
• UF: used for: specific see reference
• BT: broader term: specific see also reference
• NT: narrower term: specific see also reference
• SA: see also: general see also reference
• RT: related term: specific see also reference
Syndetic structure: references
• Equivalence relationships
• Hierarchical relationships
• Associative relationships
Equivalence or USE/UF references
• Link terms that are not authorized to their
preferred form
• Example:
Baby sitting
USE Babysitting
Categories of USE/UF
references
• Synonyms and near synonyms
– Dining establishments USE Restaurants
• Variant spellings
– Haematology USE Hematology
• Singular/plural variants
– Salsa (Cookery) USE Salsas (Cookery)
Categories of USE/UF
references
• Variant forms of expression
– Nonbank banks USE Nonbank financial
institutions
• Alternate arrangement of terms
– Dogs—Breeds USE Dog breeds
• Earlier forms of headings
– Restaurants, lunch rooms, etc. USE
Restaurants
Hierarchical references: broader
terms and narrower terms
• Link authorized headings
• Show reciprocal relationships
• Allow users to enter at any level and be
led to the next level of either more specific
or more general topics
Three types of hierarchical
references
• Genus/species (or class/class member)
Dog breeds
NT Shih tzus
Shih tzus
BT Dog breeds
• Whole/part
Foot
NT Toes
Toes
BT Foot
• Instance (or generic topic/proper-named
example)
Mississippi River
BT Rivers—United States
Rivers—United States
NT Mississippi River
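Because hierarchical references are reciprocal, every BT should have a matching NT in the other direction. The sketch below shows one way such a check might look. The tables `BT` and `NT` hold the three example pairs from this slide, with the subdivision typed as a double hyphen. The function `missing_reciprocals` is hypothetical.

```python
# term -> broader terms, and term -> narrower terms, from the examples above.
BT = {
    "Shih tzus": ["Dog breeds"],
    "Toes": ["Foot"],
    "Mississippi River": ["Rivers--United States"],
}
NT = {
    "Dog breeds": ["Shih tzus"],
    "Foot": ["Toes"],
    "Rivers--United States": ["Mississippi River"],
}


def missing_reciprocals(bt, nt):
    """List (heading, reference type, target) triples that should exist but do not."""
    problems = []
    for term, broader_terms in bt.items():
        for broader in broader_terms:
            # "term BT broader" requires "broader NT term".
            if term not in nt.get(broader, []):
                problems.append((broader, "NT", term))
    for term, narrower_terms in nt.items():
        for narrower in narrower_terms:
            # "term NT narrower" requires "narrower BT term".
            if term not in bt.get(narrower, []):
                problems.append((narrower, "BT", term))
    return problems
```

An empty list means the two tables mirror each other, as they do for the three examples here.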
Associative or related term
references
• Link two headings associated in some manner
other than hierarchy
• Currently made between
– Headings with overlapping meanings
• Carpets RT Rugs
– Headings for a discipline and the focus of that
discipline
• Ornithology RT Birds
– Headings for persons and their field of endeavor
• Physicians RT Medicine
Entry in LCSH
Automobiles (May Subd Geog)
[TL1-296.5]
UF Autos (Automobiles)
Cars (Automobiles)
Gasoline automobiles
Motorcars (Automobiles)
BT Motor vehicles
Transportation, Automotive
SA headings beginning with the word Automobile
NT A.C. Automobile
Abarth automobiles
Alfa Romeo automobile
Etc.
Entry in LCSH
Librarians (May Subd Geog)
[Z682 (Personnel)]
[Z720 (Biography)]
BT Information scientists
Library employees
RT Libraries
NT Academic librarians
Acquisitions librarians
Adult services librarians
Bisexual librarians
Etc.
Limitations of subject access for
primary sources
• Standard terminology can be too generic or
heterogeneous
• Terms change over time (e.g., place names,
archaic terms)
• Large number of terms needing to be assigned
• Lack of overlap in terms being assigned by
different describers
Alternatives to subject access
• Provenance
• Function
• Genre or form-of-material
• Geographic coordinates
• Date
Headings and Subdivisions Useful for Archival
Purposes: Correspondence
• Use for personal correspondence of individuals
• Assign the following combination of headings to
collections of personal correspondence:
– 600 X0 $a [name of the letter writer(s)] $v
Correspondence.
– 600 X0 $a [name of the addressee(s)] $v
Correspondence.
– 650 #0 $a [class of persons, or ethnic group] $v
Correspondence.
– 650 #0 $a [special topics discussed in the letters]
Example
• Title: Letters from John Smith,
metallurgist, to his student, John Doe,
concerning his research into zinc alloys.
– 600 10 $a Smith, John $v Correspondence.
– 600 10 $a Doe, John $v Correspondence.
– 650 #0 $a Metallurgists $z Maryland $v
Correspondence.
– 650 #0 $a Zinc alloys.
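The display form of these headings (tag, two indicators, then subfields) can be produced with a small formatting helper. This is only a sketch for printing headings the way the slides show them. It does not build real MARC records and does not use any MARC library. `marc_field` is a hypothetical name.

```python
def marc_field(tag, ind1, ind2, subfields):
    """Format a field as 'TAG II $code value ...' for display.

    `subfields` is a list of (code, value) pairs, kept in the order given.
    """
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {ind1}{ind2} {body}"


# Two of the headings from the John Smith example above.
writer = marc_field("600", "1", "0", [("a", "Smith, John"), ("v", "Correspondence.")])
topic = marc_field("650", "#", "0", [("a", "Zinc alloys.")])
```

Here `writer` and `topic` match the first and last headings of the example. The remaining two could be built the same way by passing their own subfield pairs.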
Example
• Title: The exchange of correspondence
between Irish American author, Mary O'Brien
and her publisher, Sam Brown, during her stay
in France in 1925-30.
– 600 10 $a O'Brien, Mary $v Correspondence.
– 600 10 $a Brown, Sam $v Correspondence.
– 650 #0 $a Authors, American $y 20th century $v
Correspondence.
– 650 #0 $a Publishers and publishing $z New York
(State) $v Correspondence.
– 651 #0 $a France $x Description and travel.
– 651 #0 $a France $x Civilization $y 1901-1945.
Headings and Subdivisions Useful for
Archival Purposes: History--Sources
• Assign the free-floating subdivision History–Sources
under historical headings for collections or discussions of
historical source materials.
• The subdivision –Sources is used directly after headings
and subdivisions that denote history or a historical event,
or have an obvious historical connotation.
• The subdivision –History–Sources, or –History–
[period subdivision]–Sources is used after other
headings to denote historical source materials.
• Since the correspondence or diaries of an individual
person may or may not be regarded as historical source
material, depending on the viewpoint of the reader, do
not add the subdivision –Sources or –History–Sources
to the headings assigned to works of this type
Headings and Subdivisions Useful for
Archival Purposes: Archives
• Archives are collections of documents or records relating to the
activities, business dealings, etc., of a person, family, corporation,
association, community, or nation.
• Use the free-floating subdivision –Archives as a form or topical
subdivision under types of corporate bodies and educational
institutions, classes of persons, and ethnic groups, and under
names of individual corporate bodies, educational institutions,
persons, and families, for collections or discussions of documentary
material, such as manuscripts, household records, diaries,
correspondence, photographs, memorabilia, etc., pertaining to these
persons or institutions.
• Code –Archives as a $v subfield if the work consists of collections
of documentary material. Code it as an $x subfield if the work
discusses the documentary material.
Examples
• Title: The personal archives of President Calvin
Coolidge.
– 600 10 $a Coolidge, Calvin, $d 1872-1933 $v Archives.
– 651 #0 $a United States $x Politics and government $y 1923-1929 $v Sources.
• Title: Documents of the State Department relating to the
history of Greece from 1950 to 1954.
– 651 #0 $a Greece $x History $y 1950-1967 $v Sources.
– 610 10 $a United States. $b Dept. of State $v Archives.
• Title: Papers of the Society of American Indians
[microform]
– 650 #0 $a Indians of North America $x History $v Sources.
– 610 20 $a Society of American Indians $v Archives.
Headings and Subdivisions Useful for
Archival Purposes: Archives
• Use the free-floating subdivision –Archives
under names of corporate bodies, including
individual educational institutions, provided that
the corporate body or educational institution is
an authoring party in the preparation of the
archive, not merely the institution that houses
the archive.
• If the collection is a formally organized archive
for which a name heading can be established,
use that heading, as appropriate, instead of the
subdivision –Archives under the name of the
corporate body.
Headings and Subdivisions Useful for
Archival Purposes: Manuscripts
• Because of the unique characteristics of manuscripts
and works about them, it is necessary for subject
catalogers to assign a complex of subject headings in
order to bring out various aspects, each of which
represents a possible means of retrieval.
• Included among these various aspects are the following:
the topical information presented in the manuscript; the
category of works to which the manuscript belongs, such
as missals; the illuminations present; the name of the
collection to which the manuscript belongs; etc.
• See SCM H1855 for details
• See SCM H1845 for specific instructions for genealogy
and local history collections
6XX (Subject Headings)
• 600 (Personal Name Subject Heading), 610 (Corporate Name
Subject Headings), 611 (Meeting Name Subject Heading)
• Use for subject access to the main entry.
• As a rule, put the name in the 100 field into the 600 field,
because often the archival and manuscript material (such as
letters or personal papers) is as much about the person as it
is authored by the person. There are, however, exceptions to
this, as when someone has written a book about someone or
something else, and it is not logical to put the author into a
600 field but into the 700 field.
• Choosing names: Include persons who are significant personal or
professional subjects of the collection, whether in the correspondence
or in other material. Also include a person if there is a large volume
of letters sent to that person, even if none were received from them.
6XX (Subject Headings)
• Although there is no limit to the number of
names one can include using this field, limit your
choices to only the most significant names
reflected by a collection.
• Use authorized forms of names.
• 600 1st indicator
– 0 Forename 100 0 $a Liberace
– 1 Surname 100 1 $a Chiang, Kai-shek
– 3 Family name 100 3 $a Dunlop family
6XX (Subject Headings)
• 655 (Genre/Form Heading)
• Terms indicating the genre, form, and/or physical
characteristics of the materials being described.
• 2nd indicator
– 0 LCSH
– 2 MeSH
– 7 Source specified in $2
• $2 Examples of thesauri used:
– aat (Art and Architecture Thesaurus)
– gmgpc Thesaurus for graphic materials: TGM II, Genre
and physical characteristic terms
– rbgenr (Genre Terms Created by the Bibliographic Standards
Committee of RBMS)
• 655 #7 $a Diaries $2 aat
6XX (Subject Headings)
• 656 (Index Term/Occupation)
• Contains terms giving occupations and avocations
reflected in the contents of the described materials. It is
NOT used to list the occupations of the creator, unless
they are significantly reflected in the materials
themselves.
• Major sources for occupational terms and $2 codes are:
– aat Art and Architecture Thesaurus
– lcsh Library of Congress Subject Headings
– dot Dictionary of Occupational Titles (U.S. Dept.
of Labor)
• 656 #7 $a Politicians. $2 lcsh
6XX (Subject Headings)
• 657 (Index Term/Function)
• An index term describing the activity
or function that generated the
described materials (e.g., property
assessment or voter registration).
• 657 #7 $a Annual inventory $x Ladies'
apparel. $2 [thesaurus code]
Getty Vocabularies
• Structure & content are based upon
standards (e.g., ISO, CDWA)
• Are compiled resources (not
comprehensive)
• Growth through collaboration,
inside Getty & outside
Getty Vocabularies
• Art & Architecture Thesaurus (AAT)
• Union List of Artist Names (ULAN)
• Getty Thesaurus of Geographic Names
(TGN)
Types of terms in vocabularies
• personal names: Painter of the Wedding
Procession (attributed to); Nikodemos (signed,
as potter)
• geographic names: Athens
• object names: storage vessels, Panathenaic
amphorae
• corporate names: J. Paul Getty Museum
• iconographic subjects and themes: Nike
Crowning the Victor, with Judge on right and
defeated opponent on left
• genre terms: Antiquities, ceremonies
• multilingual terms: Athínaí (Greek) = Athens
(English) = Athenae (Latin)
Types of terms in vocabularies
• personal names
– in the Union List of Artist Names you will find "Georgia O’Keeffe"
• geographic place names
– in the Getty Thesaurus of Geographic Names you will find "Botswana"
• corporate names
– in the Library of Congress Name Authority File you will find "Metropolitan
Museum of Art (New York, N.Y.)"
• object names
– in the Art & Architecture Thesaurus you will find "scroll paintings"
• iconographic subjects and themes
– in ICONCLASS you will find the "education of Cupid by Venus and Mercury"
• genre terms
– in the Thesaurus for Graphic Materials II: Genre and Physical Characteristic
Terms you will find "political cartoons"
• multilingual terms
– in the Multilingual Egyptological Thesaurus you will find the term "pottery" in
English, "Keramik" in German, and "céramique" in French
Getty Vocabularies
• data value standards that provide terminology for use in cataloging,
indexing and documentation practice. They are most effective when used
in combination with data structure standards (e.g., CDWA) and data content
standards (e.g., AACR2).
• thesauri built according to standards. They follow the rules and
conventions prescribed by standards organizations such as ISO, NISO, and
other codes of practice for thesaurus construction.
• designed for use in both indexing and retrieval. They are intended to
bridge the language of the indexer and that of the searcher. If the
vocabularies are available at the time of the search query, the searcher can
consult the vocabulary to see what likely terms are available for the query.
Getty Vocabularies
• facilitators for information-sharing among different types of
collections. For example, the AAT can be used to describe subject
matter for books in a library, works of art in a museum, records in an
archive, or images on the Web.
• application independent. The Vocabularies can be applied in the
electronic environment in a variety of applications (e.g., databases
and search engines) as well as in manual indexing systems, such as
a card file.
• evolving and growing tools. Work with contributors allows for ongoing community input and expansion of coverage in specialized
subject areas.
AAT
• The focus of the AAT is on art and architecture, as the title
suggests.
• However, the AAT can provide terminology for the
description, documentation, and retrieval of visual and
textual surrogates for art, and for related disciplines.
• The scope of the AAT is global, although currently it is
richest in terminology used for art of Western Europe
and North America.
• The AAT is growing and expanding coverage by
incorporating additional data from a variety of Getty
projects and external contributors. For example, a
working group from the National Museum of African Art
has added terminology for African styles/periods and
object names.
AAT
• The AAT includes terminology related to:
– works of art (e.g., painting, sculpture, mixed
media)
– architecture (e.g., the built and natural
environment)
– material culture (e.g., furniture, costume, and
equipment)
– forms and genre (e.g., document types, records)
– cultural traditions (e.g., events)
High-Backed Chair
for Miss Cranston's tea rooms
AAT terms in Italics
• What is it? high-backed chair
• What is it made of? oak, horsehair
• How was it made? upholstered, stained, pierced
• Who made it? Charles Rennie Mackintosh, architect
• When was it made? 1898-99
• What style is it? Arts and Crafts
• What is it part of? tea room
• What condition is it in? reupholstered
• How was it used? dining
• What is it about? anthropomorphic
• Where did it come from? Miss Cranston's Argyle Street Tea
Rooms
• Where is it? Glasgow School of Art, Glasgow
AAT does not include certain types
of terminology
• Personal Names: Charles Rennie Mackintosh
(ULAN)
• Corporate Names: Glasgow School of Art (Library
of Congress authority files)
• Geographic Place Names: Glasgow (TGN)
• Building Names: Miss Cranston's Argyle Street
Tearoom (local authority)
• Historic Events: Exhibition of Decorative Art,
London, 1923 (Library of Congress authority files)
• Iconographic themes: Venus and Cupid
(ICONCLASS)
Art & Architecture Thesaurus
• Contains around 34,000 concepts,
131,000 terms
• Records contain terms, notes,
relationships, bibliography
• Scope ranges from antiquity to present
• Global, but preponderance of Western concepts
• Terms describe Art, Architecture, Decorative Arts,
Material Culture, & Archival Materials
Elements of an AAT record
Note: The focus of each vocabulary record is a concept
(object, material, activity, style, attribute...), not a “term”
• parent concept: furnishings; mirrors; wall mirrors
• scope note: Tall, narrow mirrors intended to fill the pier, the
space between two windows...
• names/terms: pier glasses; pier mirrors; trumeaux
• related concepts: pier tables
• sources: Comstock, Helen. The Looking Glass in America, 1700-1825.
Page 17.
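Because the record is built around a concept rather than a single term, any of its names should lead to the same record. The following is a minimal sketch of that idea using the pier-glass example; the class `Concept`, the function `build_term_index`, and the choice of "pier glasses" as the preferred term are assumptions for illustration, not the Getty data model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str                                      # assumed preferred term
    variants: list[str] = field(default_factory=list)   # other names/terms
    scope_note: str = ""
    parents: list[str] = field(default_factory=list)    # broader concepts
    related: list[str] = field(default_factory=list)    # related concepts

    def all_terms(self) -> list[str]:
        return [self.preferred, *self.variants]


def build_term_index(concepts):
    """Map every term (case-insensitively) to the concept it names."""
    index = {}
    for concept in concepts:
        for term in concept.all_terms():
            index[term.casefold()] = concept
    return index


pier_glasses = Concept(
    preferred="pier glasses",
    variants=["pier mirrors", "trumeaux"],
    scope_note="Tall, narrow mirrors intended to fill the pier, "
               "the space between two windows...",
    parents=["furnishings", "mirrors", "wall mirrors"],
    related=["pier tables"],
)

INDEX = build_term_index([pier_glasses])
```

An indexer or searcher who enters "Trumeaux" or "pier mirrors" reaches the same concept, so `INDEX["trumeaux"].preferred` yields "pier glasses".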
TGN
• The TGN is a structured vocabulary containing
around 1,000,000 names and other information
about places.
• The TGN includes all continents and nations of
the modern political world, as well as historical
places.
• It includes physical features and administrative
entities, such as cities and nations.
• The emphasis in TGN is on places important for
art and architecture.
Getty Thesaurus of Geographic Names
Scope and range
• Records for 912,000 places, 1,106,000 names
• Names, coordinates, relationships, dates & bibliography
• Includes all continents and nations of modern political
world, historical places
• Includes physical features
• Includes inhabited places, other administrative and
political entities
• Emphasis on places important to art &
architectural history
Elements of a TGN record
Focus is concept: place
• names: Siena; Sena Julia
• parent place: Italy; Tuscany; Siena province
• geographic coordinates: 43 19 N, 011 21 E
• notes: Founded as Etruscan hill town; later was Roman city of
Sena Julia; thrived under Lombard kings; was medieval
self-governing commune; was seat of Ghibelline power ...
• place types: inhabited place; provincial capital
• dates: settled by Etruscans (flourished 6th cen. BCE)
• bibliography:
– Annuario Generale (1980)
– Dizionario Corografico Toscana (1977)
– Webster's Geographical Dictionary (1984)
– Hook, Siena (1979), 6 ff.
– TCI: Toscana (1984), 479 ff.
– Times Atlas of the World (1992), 183
– Canby, Historic Places (1984), II, 861
– Milanesi, Storia dell'Arte Senese (1969)
ULAN
• The ULAN is a structured vocabulary that
contains around 220,000 names and other
information about artists.
• The coverage of the ULAN is from Antiquity to
the present, and the scope is global.
• The scope of the ULAN includes any identified
individual or "corporate body" (i.e., a group of
people working together) involved in the design
or creation of art and architecture.
Union List of Artist Names
Scope and Range
• ULAN contains records for 120,000 ‘artists’,
293,000 names
• Records contain names, biographical
information, relationships, & bibliography
• Scope is from Antiquity to the present
• Coverage is global, preponderance Western artists
• Identified individuals or groups of individuals
working together (corporate bodies)
• Involved in the conception or production of visual arts
& architecture
Elements of a ULAN record
Focus is concept: Artist
• names: Dosso Dossi; Giovanni de Lutero; Dosso da Ferrara;
Giovanni di Niccolò
• roles: painter; draftsman
• geographic location: Ferrara (Italy); Venice (Italy)
• life dates: born ca. 1490, active from 1512, died 1542
• related people: student of: Lorenzo Costa di Ottavio, from 1507
• notes: Although early biographers, including Vasari, noted a birth
date of ca. 1475, modern scholars agree that he cannot have been
born much before 1490...
• bibliography: *Bénézit; Berenson; *Bolaffi; *Encyc. world art;
Gibbons, DOSSO AND BATT. DOSSI (1968); Grove Dict of Art
Cataloguing Cultural Objects
as a tool for subject cataloguers
Aims
• practical guidance for subject cataloguers,
indexers
• intra- and inter-indexer consistency
• user–indexer consistency
• retrieval effectiveness
Cataloguing Cultural Objects
as a tool for subject cataloguers
Challenges
1. what does “subject” mean? -- i.e., what kinds of
property of works should be indexed?
2. what kinds of method should be used to determine
the subject(s) of works, and ...
3. ... to select terms that represent those subjects?
4. what kinds of control should be imposed on the
lists of terms from which selection is made, and
how should such authority control be implemented?
5. what metadata elements should be established for
recording subject data?
Kinds of subject
Subjects, objects, images, texts
• subjects: e.g., people, things, events,
places, concepts
• objects (works) [in museums, archives]:
e.g., artworks, buildings, artifacts,
documents, collections
– descriptive cataloguing: what the objects are
– subject cataloguing: what subjects the objects
are of / about
Kinds of subject
• images [in visual resource collections]: visual
representations of objects, e.g.,
photographs, slides, digital files
– descriptive cataloguing: what the images are; what
objects the images are of
– subject cataloguing: what subjects the images are about
• texts [in libraries]: verbal representations of
objects, e.g., books, journal articles
– descriptive cataloguing: what the texts are
– subject cataloguing: what objects the texts are about;
what subjects the texts are about
CDWA Subject
• In CDWA, subject matter is analyzed
according to a method based on the work
of Erwin Panofsky
• Panofsky identified three main levels of
meaning in art:
– Pre-iconographic description
– Iconographic identification
– Iconographic interpretation or “iconology”
CDWA Subject
• Three sets of subcategories under the
category Subject Matter in CDWA reflect
this traditional art-historical approach to
subject analysis
• Simplified and practical for purposes of
retrieval
CDWA Subject
• CDWA levels of subject analysis
– Subject matter–Description. A description of the work
in terms of the generic elements of the image or
images depicted in, on, or by it
– Subject matter–Identification. The name of the subject
depicted in or on a work of art: its iconography.
Iconography is the named mythological, fictional,
religious, or historical narrative subject matter of a
work of art, or its non-narrative content in the form of
persons, places, or things
– Subject matter-Interpretation. The meaning or theme
represented by the subject matter or iconography of a
work of art.
Mantegna’s Adoration of the Magi
• Subject matter–Description: woman, baby, men,
vessels, coins, turbans, etc.
• Subject matter–Identification: Known
iconographic subject. Based on New Testament
(Matthew 2). Balthasar, Melchior, Caspar, Mary,
Jesus, Joseph
• Subject matter-Interpretation: Three Ages of
Man (Youth, Middle Age, Old Age); Three Races
of Man; Three Parts of the World
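The three CDWA levels can be recorded as a type attached to each subject term, so that description, identification, and interpretation stay distinguishable at retrieval time. The following is a minimal sketch using a selection of terms from the Mantegna example; the names `SubjectType`, `ADORATION`, and `group_by_level` are hypothetical and the list-of-pairs layout is an assumption, not a CDWA encoding.

```python
from collections import defaultdict
from enum import Enum


class SubjectType(Enum):
    DESCRIPTION = "description"        # generic elements of the image
    IDENTIFICATION = "identification"  # named iconographic subject
    INTERPRETATION = "interpretation"  # meaning or theme


# A selection of terms from the Mantegna slide, each tagged with its level.
ADORATION = [
    ("woman", SubjectType.DESCRIPTION),
    ("baby", SubjectType.DESCRIPTION),
    ("vessels", SubjectType.DESCRIPTION),
    ("Adoration of the Magi", SubjectType.IDENTIFICATION),
    ("Mary", SubjectType.IDENTIFICATION),
    ("Jesus", SubjectType.IDENTIFICATION),
    ("Three Ages of Man", SubjectType.INTERPRETATION),
]


def group_by_level(subjects):
    """Group (term, level) pairs into {level: [terms...]}, keeping order."""
    grouped = defaultdict(list)
    for term, level in subjects:
        grouped[level].append(term)
    return dict(grouped)
```

Grouping by level lets a system answer "what is the work of?" from the first two groups and "what is the work about?" from the third.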
Kinds of subject
Representation
• representational (figurative) works
– narrative subjects
• stories
• episodes in stories, i.e., events
– non-narrative subjects
• people, animals, plants
• objects, e.g., buildings
• activities; places; periods
• [work types: portraits, still lifes, landscapes, genre
scenes, architectural drawings ...]
Kinds of subject
• non-representational works
• abstract works
• buildings
• furniture
• decorative arts
– “subject” / content =
• meaning (symbolic, allegorical, thematic,
conceptual)
• form, composition
• function, purpose, use
Kinds of subject
Ofness and aboutness
• what is the work of?
– generically: description
• e.g., “Nude standing woman seen from front, holding dagger
in right hand”
– specifically: identification
• e.g., “The suicide of Lucretia”
• what is the work about?
– interpretation
• e.g., “virtuousness”
CCO recommendation #1
• subject data should be consistently given
for all works, not just for representational
ones
– (even if those data end up overlapping with
the content of other elements, e.g. Work
Type)
Subject analysis
Ofness
• who? what? where? when?
– people, objects/activities, places, times
• generic to specific
• left to right; top to bottom; foreground
to background ...
Subject analysis
Aboutness
• what is the meaning of the work?
• what is expressed by the work?
• what do the objects, events, etc., depicted in the
work symbolize?
• how may the image be interpreted?
• what was the intention of the work’s creator?
• how has the work been interpreted historically?
CCO recommendation #2
• take a methodical approach to subject
analysis
Term selection
What kinds of terms? How many terms?
• factors that can’t help but affect the
specificity of indexing:
– quality and quantity of available scholarly information
about the work
– extent of indexer’s knowledge of the work
– extent of indexer’s general pre-iconographic knowledge
– depth of indexer’s indexing expertise
– availability of time; money; human resources;
technology at institution’s disposal
Term selection
• factors that should also affect the specificity
of indexing
– needs of end-users: expert and non-expert
– characteristics of the collection
– relative importance of the work
– presence of unusual details in the work
– institutional policies
• number of terms to be assigned per work
• method of subject analysis to be used
– capabilities of system
• e.g., to link NTs to BTs, preferred terms to synonyms and
RTs, etc.
CCO recommendation #3a
• don’t be specific without the support of
scholarly evidence
– better to be general and accurate than
specific and wrong
CCO recommendation #3b
• use subject terms that have been
identified as “preferred” in established
authority files (controlled vocabularies)
Authority control
Four kinds of authority file
• Personal and Corporate Body Authority
– preferred forms of names of real people/bodies
(as artists, patrons, subjects of works)
• Geographic Place Authority
– preferred forms of names of real places
Authority control
• Concept Authority
– preferred forms of genre terms
• e.g. “still life,” “landscape”
– preferred forms of generic subject terms
• objects, materials, activities, agents, properties,
styles, periods treated as subjects
Authority control
• Subject Authority
– preferred forms of iconographical terms
• proper names, uniform titles, standard labels ...
• ... of characters, situations, events, themes, works
(e.g., buildings) ...
• ... in historical, mythological, religious, literary
contexts
Authority control
• cf. AAT: Art & Architecture Thesaurus
– terms for describing what objects / images are
– project began 1980; funded by CLR, NEH, Mellon, then Getty from 1985;
sponsored by ARLIS, CAA, SAH, etc.
– current: version 3.0-Web, at
http://www.getty.edu/research/conducting_research/vocabularies/aat/
• cf. ICONCLASS: Iconographic Classification System
– terms for describing what objects / images are of / about
– an iconographic classification system (not a vocabulary per se)
– a collection of circa 24,000 ready-made definitions (in English) of objects,
persons, events, situations, and abstract ideas that can be the subject of a work
of art (emphasis is on Western art)
– 1949: van de Waal (U. Leiden) began to develop ideas that led to ICONCLASS
– 1973-85: published in 17 vols.
– ICONCLASS Libertas Browser (KNAW, Amsterdam): web-accessible version, at
http://www.iconclass.nl/
ICONCLASS
• Iconclass was developed by Henri van de Waal (1910-1972),
Professor of Art History at the University of Leiden
• His ideas for a systematic overview of subjects, themes
and motifs in Western art, which later became the
Iconclass System, took form in the early 1950s.
• The complete Iconclass System was finished in the
years after 1972 by a large group of scholars and was
published between 1973 and 1985 by the Royal
Netherlands Academy of Arts and Sciences (KNAW) of
which Van de Waal was a member.
ICONCLASS
• Iconclass is a subject-specific classification
system; it is a hierarchically ordered collection of
definitions of objects, persons, events and
abstract ideas that can be the subject of an
image.
• Art historians, researchers and curators use it to
describe, classify and examine the subject of
images represented in various media such as
paintings, drawings and photographs.
ICONCLASS
• Numerous institutions across the world use
Iconclass to describe and classify their
collections in a standardized manner.
• In turn, users ranging from art historians to
museum visitors use Iconclass to search and
retrieve images from these collections.
• As a research tool, Iconclass is also used to
identify the significance of entire scenes or
individual elements represented within an
image.
The three main components of
Iconclass are
• Classification System: 28,000 hierarchically ordered
definitions divided into ten main divisions. Each definition
consists of an alphanumeric classification code
(notation) and the description of the iconographic subject
(textual correlate). The definitions are used to index,
catalogue and describe the subjects of images
represented in works of art, reproductions, photographs
and other sources.
• Alphabetical Index: 14,000 keywords used for locating
the notation and its textual correlate needed to describe
and/or index an image. This index is a valuable tool for
iconographers in the identification, search and retrieval
of subjects and scenes.
• Bibliography: 40,000 references to books and articles
of iconographical interest.
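The pairing of notation and textual correlate in a hierarchy can be sketched in a few lines. This is a simplified illustration, assuming that each added character of a basic notation narrows the subject; it ignores bracketed text, keys, and other notation features. The sample notations and their labels are paraphrased, illustrative assumptions and should be verified against Iconclass itself; `SAMPLE`, `ancestors`, and `descendants` are hypothetical names.

```python
# Illustrative sample only: notations and labels are assumptions to verify
# against the published Iconclass system before any real use.
SAMPLE = {
    "7": "Bible",
    "73": "New Testament",
    "73B": "birth and youth of Christ",
    "73B5": "story of the three Wise Men",
    "73B57": "adoration of the kings",
}


def ancestors(notation, scheme):
    """Return (notation, label) pairs from the broadest class down to
    the given notation, using successive prefixes found in the scheme."""
    path = []
    for end in range(1, len(notation) + 1):
        prefix = notation[:end]
        if prefix in scheme:
            path.append((prefix, scheme[prefix]))
    return path


def descendants(notation, scheme):
    """Return all narrower notations in the scheme, sorted."""
    return sorted(
        n for n in scheme if n.startswith(notation) and n != notation
    )
```

A search on a broad notation can then be widened to everything beneath it by querying for the notation together with its `descendants`.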
Authority control
Kinds of source of terminology
for local authority files
– distinguished by structure:
• hierarchical vs. non-hierarchical
– by object type:
• subjects vs. people/places
– by scope:
• domain-specific vs. interdisciplinary
– by purpose:
• authority control vs. end-user reference
CCO recommendation #4
• link the occurrences of subject terms in
work records to the authority records for
those terms
– (in authority files that implement synonym
control and hierarchical structure)
Record structure
Metadata element sets
• cf. CDWA: Categories for the Description of Works of
Art
– ed. Baca, Harpring
– funded by Getty, NEH, CAA
– 2000: version 2.0; on web at
http://www.getty.edu/research/conducting_research/standards/cdwa/
• cf. VRA Core Categories
– ed. Lanzi, Whiteside
– 2007: version 4.0; on web at
http://www.vraweb.org/projects/vracore4/index.html
Record structure
Subject metadata elements recommended
by CCO
• Description [free-text; non-repeatable]
• Subject [required; controlled; repeatable]
• Extent
– for designating the part of the work to which the
subject terms are applicable
• Subject Type
– for distinguishing between description,
identification, interpretation
CCO recommendation #5
• implement separate subject elements for
display and for retrieval
Example
• Statue of Hercules
(Lansdowne Herakles)
• Unknown Roman
sculptor; after the School
of Polykleitos
• about 125 CE
• marble
• height: 193.5cm
• J. Paul Getty Museum
(Los Angeles, CA)
• ©2004 J. Paul Getty
Trust.
Example
Description: Herakles standing in
contrapposto, holding his
attributes, the skin of the
Nemean lion and a club. This
statue was found in Tivoli ca.
1790, in the ruins of Hadrian’s
villa; it was in the collection of
the Marquess of Lansdowne
until 1951. It is related in
appearance to works
attributed to 4th-century BCE
Greek sculptors; however, the
work has an eclectic style that
is purely Roman.
Subject--Description:
religion/mythology; human
figure; male; nude; lion skin;
club
Subject--Identification: Hercules
(Greek/Roman hero); Nemean
Lion
Example of a Subject Authority record
Subject Names: Hercules (preferred); Herakles; Heracles; Ercole; Hercule;
Hércules
Hierarchical Position: Classical mythology--Greek heroic legends--Story of
Hercules--Hercules
Indexing Terms: Greek hero; king; strength; fortitude; perseverance;
Argos; Thebes
Note: Probably based on an actual historical figure, a king of ancient
Argos. The legendary figure was the son of Zeus and Alcmene ...
Related Subjects: Labors of Hercules; Love Affairs of Hercules; Zeus
(Greek god); Alcmene (Greek heroine); Hera (Greek goddess)
Dates: Story developed in Argos, but was taken over at early date by
Thebes; literary sources are late, though earlier texts may be
surmised. Earliest: -1000 Latest: 9999
Sources: ICONCLASS http://www.iconclass.nl/; Grant, Michael and John
Hazel. Gods and Mortals in Classical Mythology. Springfield, MA: G &
C Merriam Company, 1973. Page: 212 ff.
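Linking work records to authority records means a search on any variant name can reach works indexed under the preferred name. The following is a minimal sketch using the Hercules record and the Lansdowne Herakles; the classes, the identifier "subj-hercules", and the function `find_works` are hypothetical and only illustrate the linking idea.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectAuthority:
    auth_id: str                                         # hypothetical local ID
    preferred: str
    variants: list[str] = field(default_factory=list)
    hierarchy: list[str] = field(default_factory=list)   # broadest first


@dataclass
class WorkRecord:
    title: str
    subject_ids: list[str] = field(default_factory=list)  # links, not strings


AUTHORITIES = [
    SubjectAuthority(
        auth_id="subj-hercules",
        preferred="Hercules",
        variants=["Herakles", "Heracles", "Ercole", "Hercule", "Hércules"],
        hierarchy=["Classical mythology", "Greek heroic legends",
                   "Story of Hercules"],
    ),
]

WORKS = [
    WorkRecord("Statue of Hercules (Lansdowne Herakles)", ["subj-hercules"]),
]


def find_works(query, works, authorities):
    """Return titles of works linked to any authority whose preferred or
    variant name matches the query (case-insensitive, exact match)."""
    q = query.casefold()
    matching_ids = {
        a.auth_id
        for a in authorities
        if q in (name.casefold() for name in [a.preferred, *a.variants])
    }
    return [w.title for w in works if matching_ids & set(w.subject_ids)]
```

Because the work record stores a link rather than a name string, a query for "Ercole" finds the statue even though that form never appears in the work record.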
Opportunities
• integrity and longevity of data
• consistent, reliable access to data
• exchange, sharing, reuse of data
• interoperability of systems
• easy migration of data to new systems
• communication, cooperation, collaboration
Questions
• should indexers be expected to do
iconographical research to index aboutness?
• should cultural-historical questions about a
work’s unintended meanings be answered by
indexers?
• how may future users’ needs be predicted?
• what role for general knowledge-organization
schemes?