CONTROLLED VOCABULARY IN INFORMATION ORGANIZATION F. Miksa

CONTROLLED VOCABULARY IN
INFORMATION ORGANIZATION
F. Miksa
The University of Texas at Austin, School of Information
INF 389K: Lifecycle Metadata for Digital Objects
Professor Pat Galloway, Instructor
9 October 2006
1
Topics to be Covered
1. Background Concepts related to Information Organization
2. Definitions
3. Purposes of, and Special Considerations when Creating, Controlled Vocabularies
4. Elements of Controlled Vocabularies
5. Sample Tools of Controlled Vocabularies
2
1. General Concepts of Information Organization-I
1. Information organization: a broad term standing for the process of making information systems.
2. Information systems are created to provide access to:
   a. Informational objects (visual, audio, tactile) found in or with a variety of different
      • material media (stone, skins, paper, celluloid, electronic, etc.)
      • production states (unique, reproduced; eye-readable, non-eye-readable requiring special mechanisms for reading, etc.)
      • production methods (hand-created, mechanically or electronically produced, etc.)
      • symbol systems (language, graphic, etc.)
      • genres or kinds (books, articles, poems, tracts, pictures, spoken word sound recordings, music sound recordings, motion pictures, electronic databases, websites, email, etc.)
   b. Information inside informational objects (inside books, articles, music sound recordings, websites, databases, email, etc.) [i.e., “data strings”]
“Information” as it is used in this lecture refers to either or both of these things.
3
General Concepts of Information Organization-II
Information organization systems have been the product of modern social traditions, for example:
a. Bibliography (15th c. +)
b. Library cataloging (16th c. +)
c. Indexing & abstracting (late 19th c. +)
d. Documentation (1890s-1960s)
e. Archival organization (French Revolution +)
f. Records organization (1900+, especially post-WWII)
g. Museum organization (19th c. +, especially 1990s +)
h. Computerized information storage & retrieval (1950s +)
Convergence of information organization traditions
4
General Concepts of Information Organization-III
Information organization system components:
• Environments-contexts
• Content
• Users (needs, desires, habits in searching for information)
• File types: item files // surrogate files (or combinations of these)
• System vocabulary: the set of terms in a system that are available for searching and to which information is linked (i.e., each system has its own “vocabulary” used in searching). Note: system vs. entry vocabulary.
  Terms:
  – Words, codes, & other metadata that represent attributes of information or information objects (names, titles, concepts, other attributes, etc.) and by means of which that information is searched in a given system
  – Constitute metadata used in searching
  – Generated from or imposed on the information or information objects to which they refer
System vocabularies and the traditions of information organization have to do with how information objects have been represented.
5
Information Object Representation
A system’s vocabulary pertains to the attribute terms used for searching. Should it be “natural language” (NL), i.e., terms taken strictly as found in the information or information object, or should it be controlled (CV) in some manner?
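To make the NL/CV choice concrete, here is a minimal Python sketch. It assumes a toy collection of three hypothetical documents and a hypothetical entry vocabulary that maps variant words to one preferred term. A natural-language index keeps words strictly as found; a controlled index keeps only preferred terms, so the searcher's word is translated into the system vocabulary before matching.

```python
from collections import defaultdict

# Hypothetical documents: identifier -> words exactly as found in the object.
DOCS = {
    "d1": "Repairing cars at home",
    "d2": "A history of the automobile",
    "d3": "Motor vehicles and city planning",
}

# Hypothetical entry vocabulary: variant words -> the one preferred term the
# system vocabulary uses. Mapping single words is a simplification; a phrase
# such as "motor vehicles" would normally be matched as a whole.
ENTRY_VOCABULARY = {
    "automobiles": "automobiles",
    "automobile": "automobiles",
    "cars": "automobiles",
    "motor": "automobiles",
    "vehicles": "automobiles",
}


def words(text: str) -> list[str]:
    """Lowercase and split on whitespace (no stemming, no stop list)."""
    return text.lower().split()


def build_nl_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Natural-language index: every word, strictly as found."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words(text):
            index[word].add(doc_id)
    return dict(index)


def build_cv_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Controlled index: only preferred terms, reached via the entry vocabulary."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words(text):
            preferred = ENTRY_VOCABULARY.get(word)
            if preferred is not None:
                index[preferred].add(doc_id)
    return dict(index)


def search_nl(index: dict[str, set[str]], query: str) -> set[str]:
    """Match the searcher's word only against words as found."""
    return index.get(query.lower(), set())


def search_cv(index: dict[str, set[str]], query: str) -> set[str]:
    """Translate the searcher's word into the system vocabulary, then match."""
    word = query.lower()
    return index.get(ENTRY_VOCABULARY.get(word, word), set())


nl_index = build_nl_index(DOCS)
cv_index = build_cv_index(DOCS)
print(sorted(search_nl(nl_index, "cars")))  # ['d1']
print(sorted(search_cv(cv_index, "cars")))  # ['d1', 'd2', 'd3']
```

Here the preferred term is assigned automatically by table lookup; in practice an indexer usually assigns it, which is where the cost of controlled vocabulary comes from.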
6
Information Object Representation in
Library Cataloging
7
Library Cataloging—MARC Tagging
See MARC 21 Concise Format at
<http://www.loc.gov/marc/bibliographic/ecbdhome.html>
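As a rough illustration of how tagging exposes access points for searching, the sketch below reads a few fields from a bibliographic record held as plain Python data. Tags 100 (main entry, personal name), 245 (title statement), and 650 (subject added entry, topical term) are real MARC 21 tags. The record content, the dictionary layout, the tag-to-index mapping, and the access_points helper are hypothetical, and real records also carry a leader, indicators, and control fields.

```python
# A hypothetical, much-simplified bibliographic record: each MARC tag maps to a
# list of fields, and each field maps subfield codes to their values.
RECORD = {
    "100": [{"a": "Smith, Jane,", "d": "1950-"}],
    "245": [{"a": "Controlled vocabularies in practice /", "c": "Jane Smith."}],
    "650": [{"a": "Subject headings."}, {"a": "Thesauri."}],
}

# Assumed mapping from tag to the search index it feeds.
INDEX_FOR_TAG = {"100": "author", "245": "title", "650": "subject"}


def clean(value: str) -> str:
    """Drop trailing punctuation (such as ' /' or '.') carried in the subfield."""
    return value.rstrip(" ./,:;")


def access_points(record: dict[str, list[dict[str, str]]]) -> dict[str, list[str]]:
    """Collect subfield $a of each indexed field as a searchable term."""
    points: dict[str, list[str]] = {}
    for tag, fields in record.items():
        index = INDEX_FOR_TAG.get(tag)
        if index is None:
            continue
        for fld in fields:
            if "a" in fld:
                points.setdefault(index, []).append(clean(fld["a"]))
    return points


print(access_points(RECORD))
```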
8
ENCODED ARCHIVAL DESCRIPTION (EAD)
(See its Tag Library: http://www.loc.gov/ead/tglib/element_index.html)
9
FULL TEXT INDEXING (Natural Language)
(e.g., a Google search for the term “controlled vocabulary”)
(Queensland University of Technology)
10
2. Definitions
CONTROLLED VOCABULARY
1. A controlled vocabulary is an established list of
standardized terminology for use in indexing and retrieval
of information. An example of a controlled vocabulary is
subject headings used to describe library resources.
(Library & Archives Canada)
2. A list of terms that have been enumerated explicitly. This
list is controlled by and is available from a controlled
vocabulary registration authority. All terms in a controlled
vocabulary must have an unambiguous, non-redundant
definition. (ANSI/NISO Z39.19-2005)
3. “[O]rganized lists of words and phrases, or notation
systems, that are used to initially tag content, and then to
find it through navigation or search.” (Amy Warner)
4. “A controlled list of index terms is generally known as a
controlled vocabulary or as an authority list.” (F. W.
Lancaster, Vocabulary Control for Information Retrieval)
11
Definitions (cont’d)
AUTHORITY CONTROL/AUTHORITY WORK
1. Authority control is the means by which catalogers
maintain consistency of form, or a controlled
vocabulary, in catalog headings (names, places, titles,
subjects). (Moving Image Collections [MIC] website)
2. “[T]he consistent use and maintenance of the forms of
names, subjects, uniform titles, etc. used as headings
in a catalog.” An authority file is “a set of authority
records listing the chosen form of a heading and its
appropriate cross-references. Types of authority files
include name authority files, series authority files, and
subject authority files.” (University of Buffalo Library,
Central Technical Services website)
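A minimal sketch of what an authority file does: every variant form of a name leads to one chosen heading, and the record keeps the cross-references. The AuthorityRecord and AuthorityFile names, the matching rule, and the sample headings are illustrative assumptions, not the structure of any real format such as the MARC 21 Format for Authority Data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    """One authority record: the chosen heading plus its variant forms."""
    heading: str
    see_from: list[str] = field(default_factory=list)


class AuthorityFile:
    """A hypothetical in-memory authority file."""

    def __init__(self) -> None:
        self._records: dict[str, AuthorityRecord] = {}
        self._lookup: dict[str, str] = {}  # normalized form -> chosen heading

    @staticmethod
    def _key(form: str) -> str:
        # Ignore case, repeated spaces, and a trailing period when matching.
        return " ".join(form.lower().split()).rstrip(".")

    def add(self, record: AuthorityRecord) -> None:
        forms = [record.heading, *record.see_from]
        # Check every form first, so a conflict leaves the file unchanged.
        for form in forms:
            existing = self._lookup.get(self._key(form))
            if existing is not None and existing != record.heading:
                raise ValueError(f"{form!r} already leads to {existing!r}")
        for form in forms:
            self._lookup[self._key(form)] = record.heading
        self._records[record.heading] = record

    def authorize(self, form: str) -> Optional[str]:
        """Return the chosen heading for any known form, else None."""
        return self._lookup.get(self._key(form))

    def cross_references(self, heading: str) -> list[str]:
        """Render a 'see' reference from each variant to the chosen heading."""
        record = self._records[heading]
        return [f"{variant} see {record.heading}" for variant in record.see_from]


authority = AuthorityFile()
authority.add(AuthorityRecord(
    heading="Twain, Mark, 1835-1910",
    see_from=["Clemens, Samuel Langhorne, 1835-1910", "Clemens, Samuel L."],
))
print(authority.authorize("clemens, samuel l"))  # Twain, Mark, 1835-1910
```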
12
Definitions Compared
1. Controlled vocabulary is ordinarily spoken of in
terms of subject or topical indexing.
2. Authority work is ordinarily spoken of in terms
of the specially created headings constructed
in library catalogs, including those related to
authors and titles, as well as those related to
subjects (i.e., both subject headings and
classification call numbers).
3. Keeping a record of CV is ordinarily done in
the form of a thesaurus, whereas keeping a
record of names, titles, and subject headings
in authority work is ordinarily done in the form
of an authority file. Most such files of the latter
kind are open-ended and incomplete with
respect to listing all possible terms.
13
Purposes of Controlled Vocabulary (CV)
1. From the standpoint of the searcher:
   a. It disambiguates
      • equivalent terms
      • homographic terms
   b. It provides term relationships to aid system navigation
      • for assisting in query formulation or reformulation
      • for searching efficiency
2. From the standpoint of the information objects & data strings to which it refers:
   a. It links similar or like objects and data strings
   b. It gathers together similar or like objects and data strings
In short, CV accomplishes the act of “collocation.”
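The searcher-side purposes can be sketched as a lookup that runs in two directions: equivalent terms funnel into one preferred term, while a homograph fans out into several qualified terms among which the searcher must choose. Once objects are indexed under preferred terms, gathering them (collocation) is a single lookup. The terms, parenthetical qualifiers, item identifiers, and the resolve and collocate functions are all hypothetical.

```python
# Hypothetical vocabulary data.
EQUIVALENTS = {  # variant word -> its one preferred term
    "quicksilver": "Mercury (Chemical element)",
    "hydrargyrum": "Mercury (Chemical element)",
}
HOMOGRAPHS = {  # ambiguous word -> several qualified preferred terms
    "mercury": [
        "Mercury (Chemical element)",
        "Mercury (Planet)",
        "Mercury (Roman deity)",
    ],
}
# Collocation: each preferred term gathers the objects indexed under it.
INDEX = {
    "Mercury (Chemical element)": {"item-12", "item-40"},
    "Mercury (Planet)": {"item-7"},
    "Mercury (Roman deity)": {"item-33"},
}


def resolve(word: str) -> list[str]:
    """Return the preferred term(s) that a searcher's word may stand for."""
    key = word.lower()
    if key in EQUIVALENTS:
        return [EQUIVALENTS[key]]  # equivalence: many words -> one term
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])  # homography: one word -> several terms
    return []


def collocate(term: str) -> set[str]:
    """All objects gathered under one preferred term."""
    return set(INDEX.get(term, set()))


for word in ["Quicksilver", "Mercury"]:
    for term in resolve(word):
        print(f"{word} -> {term}: {sorted(collocate(term))}")
```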
14
Considerations when Creating a Controlled Vocabulary (CV)
1. Relationship to “automatic” indexing
2. Labor-intensive (even though it represents a value-added activity of information organization)
   a. Thus, expensive
   b. Thus, given human work, will contain errors
   c. Not all retrieval needs it
      • Retrieval as “mapping”
        vs.
      • Retrieval as question-answer
15
Elements of Controlled Vocabularies
Disambiguation of Equivalent Names & Titles
1. Names
   a. Persons, family names
      • With surnames
        – Single
        – Compound
      • Forenames only
   b. Corporate body names (inc. private & public sectors)
   c. Geographic names
2. Titles
   a. Variant titles
   b. Ambiguous titles
   c. Constructed titles
3. Concepts/Subject terms, etc.
4. Near-synonymy [synonym rings in Zeng, Kent State]
For examples of names and titles, see Miksa, Kinds of Access Points, or his Chapter 7 (Access Points: Kinds and Forms).
16
Relationships between Terms & Examples (from Zeng, Kent State [4])
1. Semantic Linking
2. Equivalency
3. Hierarchy
4. Associative
5. [Other]
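Equivalency, hierarchy, and associative relationships are commonly recorded in a thesaurus as the reciprocal pairs USE/UF, BT/NT, and RT/RT, the notation used in ANSI/NISO Z39.19. The sketch below is a hypothetical in-memory structure with made-up terms; its point is that every relationship is stored together with its reciprocal, and that hierarchy supports navigation such as expanding a query to all narrower terms.

```python
from collections import defaultdict

# Each relationship tag and its reciprocal (USE/UF, BT/NT, RT/RT).
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """A hypothetical in-memory thesaurus keyed by term."""

    def __init__(self) -> None:
        # term -> relationship tag -> set of related terms
        self.links: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def relate(self, term: str, tag: str, other: str) -> None:
        """Record a relationship together with its reciprocal."""
        reciprocal = RECIPROCAL[tag]  # look up first: a bad tag changes nothing
        self.links[term][tag].add(other)
        self.links[other][reciprocal].add(term)

    def related(self, term: str, tag: str) -> set[str]:
        """Terms linked to `term` by `tag` (empty if none)."""
        return set(self.links.get(term, {}).get(tag, set()))

    def narrower_all(self, term: str) -> set[str]:
        """Every narrower term at any depth below `term` (hierarchy)."""
        found: set[str] = set()
        stack = [term]
        while stack:
            for child in self.related(stack.pop(), "NT"):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


t = Thesaurus()
t.relate("Cars", "USE", "Automobiles")           # equivalency
t.relate("Automobiles", "BT", "Motor vehicles")  # hierarchy
t.relate("Sports cars", "BT", "Automobiles")     # hierarchy
t.relate("Automobiles", "RT", "Highways")        # associative
print(t.related("Automobiles", "UF"))            # {'Cars'}
print(sorted(t.narrower_all("Motor vehicles")))  # ['Automobiles', 'Sports cars']
```

Storing the reciprocal in the same call keeps the BT and NT views of the hierarchy from drifting apart as the vocabulary is edited.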
17
Structures of CVs (from Zeng, Kent State [3])
1. Lists
2. Synonym rings
3. Taxonomy
4. Thesaurus
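Of these structures, a synonym ring is the simplest to show in code: a set of terms treated as equivalent for retrieval, with no preferred term, typically used to expand a query. The rings, helper names, and matching rule below are hypothetical.

```python
# Hypothetical synonym rings: each ring groups terms treated as equivalent
# for retrieval. Unlike a thesaurus, no member is marked as preferred.
RINGS = [
    {"car", "automobile", "auto"},
    {"film", "movie", "motion picture"},
]

# Map every member to its ring so expansion is a single lookup.
RING_OF = {term: ring for ring in RINGS for term in ring}


def expand(query_term: str) -> set[str]:
    """Expand a search term to all members of its ring, or to itself alone."""
    term = query_term.lower()
    return set(RING_OF.get(term, {term}))


def matches(text: str, query_term: str) -> bool:
    """True if any expanded term occurs in the text (crude substring test)."""
    lowered = text.lower()
    return any(candidate in lowered for candidate in expand(query_term))


print(sorted(expand("Movie")))                             # ['film', 'motion picture', 'movie']
print(matches("A new motion picture about cars", "film"))  # True
```

The substring test is only for illustration; it would let “auto” match “autobiography”, which is one reason real systems tokenize text before matching.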
18
Establishing Terms and the Idea of “Warrant”
1. See Zeng, Kent State [2], specifically section 2.4:
   a. Literary warrant
   b. User warrant
   c. Organizational warrant
2. See the many writings of Claire Beghtol on the idea of “warrant”
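Warrant is the evidence that justifies admitting a term. In the usual sense, literary warrant comes from the term's use in the literature being indexed, user warrant from the words searchers actually use, and organizational warrant from the needs of the organization maintaining the vocabulary. The sketch below reduces each kind to a simple test; the thresholds, data, and function names are hypothetical, and real warrant decisions are editorial judgments rather than counts.

```python
# Hypothetical thresholds; an actual vocabulary would set its own policy.
MIN_DOCUMENTS = 2  # literary warrant: documents that use the term
MIN_SEARCHES = 3   # user warrant: logged searches that use the term


def count_uses(term: str, texts: list[str]) -> int:
    """Count texts containing the term (crude, case-insensitive substring test)."""
    needle = term.lower()
    return sum(needle in text.lower() for text in texts)


def assess(candidates: list[str], documents: list[str],
           search_log: list[str], mandated: set[str]) -> dict[str, list[str]]:
    """List which kinds of warrant each candidate term can claim."""
    report: dict[str, list[str]] = {}
    for term in candidates:
        kinds = []
        if count_uses(term, documents) >= MIN_DOCUMENTS:
            kinds.append("literary")
        if count_uses(term, search_log) >= MIN_SEARCHES:
            kinds.append("user")
        if term in mandated:  # organizational warrant: required by the organization
            kinds.append("organizational")
        report[term] = kinds
    return report


report = assess(
    candidates=["Metadata", "Digital preservation", "Ebooks"],
    documents=["Metadata for digital objects",
               "Lifecycle metadata in archives",
               "Digital preservation planning"],
    search_log=["metadata", "metadata standards", "digital preservation",
                "ebooks", "ebooks free", "ebooks"],
    mandated={"Digital preservation"},
)
print(report)
# {'Metadata': ['literary'], 'Digital preservation': ['organizational'], 'Ebooks': ['user']}
```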
19
Sources of CV terms
1. Library of Congress
a. LoC “Authorities” website
b. LoC Classification Web for Library of Congress Subject Headings (LCSH) and the Library of Congress Classification System (LCC)
c. Dewey Decimal Classification (DDC) website
d. LoC Cataloger’s Desktop for the Subject Cataloging Manual (SCM:SH)
20
Established Thesauri & Taxonomies
1. E.g., UNESCO Thesaurus
2. Queensland University of Technology list of sources
3. Library and Archives Canada (Thesauri and Controlled Vocabularies: Bibliography)
4. Resource Description Framework (RDF): Schema Web
21