Comparison of Keyword Searching Using FAST vs. Using LCSH
Presentation for the ALCTS CCS Program: FAST: A New System of Subject Access for Cataloging and Metadata
New Orleans, Saturday, June 24, 2006
by Arlene G. Taylor

The Database
OCLC (i.e., Ed O’Neill and team) created a test database of bibliographic records
Each record had both a set of LCSH headings and a set of FAST headings
Records were a subset of WorldCat records
The FAST headings were “translated” from the LCSH headings
Two indexes were created by OCLC’s research team: one to search FAST headings and one to search LCSH headings

© 2006 Arlene G. Taylor
The Project
Participants were students at the University of Pittsburgh in a Subject Analysis class
The project had two parts:
Part I: search both the LCSH and FAST indexes for newspapers in the student's home state
Part II: search four topics of interest in both the LCSH and FAST indexes
Students were asked to explain the differences found in the two indexes

Newspaper searches
Searches for “newspapers” plus the name of any state that has an authorized AACR2 abbreviation almost always give different results in the two indexes
A search retrieves both records for newspapers themselves and records for works about newspapers
Some records abbreviate the state, using the AACR2 abbreviations, but searchers almost always spell out the state name (some states, e.g., Ohio and Iowa, have no abbreviation)

Newspaper searches (cont.)
A record about newspapers may have an LCSH subject heading:
650 0 American newspapers $z Pennsylvania $z Bucks County
In FAST this is translated to:
650 7 American newspapers $2 fast
651 7 Pennsylvania $z Bucks County $2 fast
A keyword search for “Newspapers Pennsylvania” will retrieve the record in both the LCSH and FAST indexes.

Newspaper searches (cont.)
A record for a newspaper itself may have the LCSH heading:
651 0 Clearfield (Clearfield County, Pa.) $v Newspapers.
In FAST this is translated to:
651 7 Pennsylvania $z Clearfield (Clearfield County) $2 fast
655 7 Newspapers $2 fast
A keyword search for “Newspapers Pennsylvania” will retrieve the record only in the FAST index, because the LCSH heading contains the abbreviation “Pa.” rather than “Pennsylvania.”

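The retrieval difference can be illustrated with a simplified model of keyword searching: each index is treated as the set of words in a record's headings for that vocabulary, and a record is retrieved only when every query word appears somewhere in that set. This is a minimal sketch under that assumption, not OCLC's actual indexing program; `heading_words` and `retrieves` are hypothetical names.

```python
import re


def heading_words(headings: list[str]) -> set[str]:
    """Collect lowercased keywords from MARC-style display headings.

    Simplifying assumption: the tag/indicator prefix, subfield codes
    such as $z or $v, and the $2 source code are not indexed.
    """
    words: set[str] = set()
    for heading in headings:
        text = re.sub(r"^\d{3}\s+\d+\s+", "", heading)  # drop "651 7 "
        text = re.sub(r"\$2\s+\S+", " ", text)           # drop "$2 fast"
        text = re.sub(r"\$[a-z0-9]", " ", text)          # drop other subfield codes
        words.update(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return words


def retrieves(query: str, headings: list[str]) -> bool:
    """True if every query word occurs somewhere in the headings (implicit AND)."""
    index = heading_words(headings)
    return all(w.lower() in index for w in query.split())


lcsh = ["651 0 Clearfield (Clearfield County, Pa.) $v Newspapers."]
fast = [
    "651 7 Pennsylvania $z Clearfield (Clearfield County) $2 fast",
    "655 7 Newspapers $2 fast",
]

print(retrieves("Newspapers Pennsylvania", lcsh))  # False: only "Pa." appears
print(retrieves("Newspapers Pennsylvania", fast))  # True
```

Under the same model, the Bucks County heading “650 0 American newspapers $z Pennsylvania $z Bucks County” is retrieved from the LCSH side too, because the state is spelled out there.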
Part II of the project
While most students understood on some level why they got different results in Part I, few of them understood why their results differed in Part II.
Therefore, I took the 76 topics generated in Part II and searched them again to determine the results and the reasons for the differences.

Basic statistics
Number of searches: 76
Number of records found using the FAST index: 2,371
Number of records found using the LCSH index: 2,340
Number of records found using both indexes: 2,200
Number of records not found using the LCSH index (found only with FAST): 171
Number of records not found using the FAST index (found only with LCSH): 140

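The statistics are internally consistent if the two “not found” figures are read as set differences between the records each index retrieved. A short arithmetic check under that reading; the union of 2,511 distinct records is derived here and is not stated in the slides.

```python
# Counts reported on the "Basic statistics" slide.
found_fast = 2371  # records found using the FAST index
found_lcsh = 2340  # records found using the LCSH index
found_both = 2200  # records found using both indexes

fast_only = found_fast - found_both  # not found using the LCSH index
lcsh_only = found_lcsh - found_both  # not found using the FAST index
union = found_both + fast_only + lcsh_only  # distinct records found by either index

print(fast_only, lcsh_only, union)  # 171 140 2511
```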
Reasons for variation in searching results
Invalid LCSH (or not established) not translated to FAST
$x and/or $v in 600 and 610 fields not indexed in the LCSH index
Word indexed in FAST index because it was in a 650 field with 2nd indicator 7 and a $2 at the end, but the $2 contained a code for a vocabulary other than FAST
Some names (personal or corporate) not translated to FAST
Differences between LCSH and FAST

Invalid LCSH (or not established) not translated to FAST
When the FAST file we were working with was created, the rule was to convert LCSH (6xx, 2nd indicator 0) to FAST, but then to keep as FAST headings in the record only those headings that matched a FAST authority record
117 records found using the LCSH index were not found using the FAST index due to this “rule”
An example showing a result of searching for “information literacy” follows:

Search for “information literacy”:
650 0 Business $x Research.
650 0 Business $x Research $x Computer network resources.
650 0 Information retrieval $x Study and teaching.
650 0 Electronic information resource literacy $x Study and teaching.
650 7 Business $x Research $2 fast
650 7 Business $x Research $x Computer network resources $2 fast
650 7 Information retrieval $x Study and teaching $2 fast

Invalid LCSH (or not established) not translated to FAST (cont.)
“Electronic information resource literacy” is in the FAST authority file, but not with the subdivision “Study and teaching”
Currently the subdivision would be removed from the heading and a match would be made to the heading without the subdivision
A keyword search for “information literacy” in the future would therefore find this record through the FAST index as well as the LCSH index

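The two matching rules can be contrasted in a minimal sketch. It assumes headings are tuples of a main heading plus subdivisions and that the FAST authority file can be modeled as a set; `FAST_AUTHORITY`, `convert_then`, and `convert_now` are hypothetical names, the authority set holds only the headings visible in this example, and dropping trailing subdivisions one at a time is an assumed reading of “the subdivision removed.”

```python
from typing import Optional

Heading = tuple[str, ...]  # main heading followed by any subdivisions

# Hypothetical stand-in for the FAST authority file, limited to this example.
FAST_AUTHORITY: set[Heading] = {
    ("Business", "Research"),
    ("Business", "Research", "Computer network resources"),
    ("Information retrieval", "Study and teaching"),
    ("Electronic information resource literacy",),
}


def convert_then(heading: Heading) -> Optional[Heading]:
    """Rule in the test database: keep a heading only on an exact authority match."""
    return heading if heading in FAST_AUTHORITY else None


def convert_now(heading: Heading) -> Optional[Heading]:
    """Assumed current rule: drop trailing subdivisions until the heading matches."""
    for end in range(len(heading), 0, -1):
        if heading[:end] in FAST_AUTHORITY:
            return heading[:end]
    return None


lcsh = ("Electronic information resource literacy", "Study and teaching")
print(convert_then(lcsh))  # None
print(convert_now(lcsh))   # ('Electronic information resource literacy',)
```

Under the first rule the heading disappears from the FAST side, so “literacy” never reaches the FAST index; under the second, the shortened heading survives and the keyword search can match.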
$x and/or $v in 600 and 610 fields not indexed for the LCSH index
When the FAST and LCSH indexes we were working with were created, only subfields $a, $b, $c, $d (and $q in 600) in fields 600 and 610 (with 2nd indicator 0) were indexed for the LCSH index
72 records found using the FAST index were not found using the LCSH index due to this “rule”
An example showing a result of searching for “archives catalogs” follows:

Search for “archives catalogs”:
610 20 Baptist Missionary Society $x Archives $v Catalogs.
650 0 Baptists $x Missions $z West Indies.
650 0 Baptists $x Missions $z Africa.
650 0 Baptists $x Missions $z Asia.
610 27 Baptist Missionary Society. $2 fast
650 7 Archives $2 fast
650 7 Baptists $x Missions $2 fast
651 7 Africa $2 fast
651 7 Asia $2 fast
651 7 West Indies $2 fast
655 7 Catalogs $2 fast

$x and/or $v in 600 and 610 fields not indexed for the LCSH index (cont.)
Currently these subfields would be included in the LCSH index
A keyword search for “archives catalogs” in the future would therefore find this record through the LCSH index as well as the FAST index

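The subfield restriction can be sketched as a filter applied before indexing. This assumes fields are given in the display form used on these slides (tag, indicators, then text with `$` subfield codes) and that text before the first `$` is subfield $a; `parse_field` and `lcsh_index_text` are hypothetical names, and the allowed-subfield sets come from the previous slide.

```python
import re

# Subfields indexed for 600/610 LCSH headings at the time of the test.
NAME_SUBFIELDS_THEN = {"600": set("abcdq"), "610": set("abcd")}


def parse_field(field: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Split '610 20 Name $x Archives' into tag, indicators, and (code, value) pairs.

    Assumption: the text before the first '$' is subfield $a.
    """
    tag, indicators, rest = field.split(" ", 2)
    parts = re.split(r"\s*\$([a-z0-9])\s*", rest)
    subfields = [("a", parts[0].strip())]
    subfields += [(code, value.strip()) for code, value in zip(parts[1::2], parts[2::2])]
    return tag, indicators, subfields


def lcsh_index_text(field: str, current_rule: bool) -> str:
    """Text a field contributes to the LCSH keyword index under either rule."""
    tag, indicators, subfields = parse_field(field)
    if not indicators.endswith("0"):
        return ""  # 2nd indicator is not 0, so this is not an LCSH heading
    if tag in NAME_SUBFIELDS_THEN and not current_rule:
        allowed = NAME_SUBFIELDS_THEN[tag]
        subfields = [(code, value) for code, value in subfields if code in allowed]
    return " ".join(value for _, value in subfields)


field = "610 20 Baptist Missionary Society $x Archives $v Catalogs."
print(lcsh_index_text(field, current_rule=False))  # Baptist Missionary Society
print(lcsh_index_text(field, current_rule=True))   # Baptist Missionary Society Archives Catalogs.
```

Under the older rule the words “Archives” and “Catalogs” in $x and $v never reach the LCSH index, which is why only the FAST index (with its separate “Archives” and “Catalogs” headings) retrieved this record.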

Word indexed in FAST index because it was in a 650 field with 2nd indicator 7 and a $2 at the end
Not all terms in fields with 2nd indicator 7 and a $2 are FAST terms; some are from gsafd, nasa, ram, lctgm, etc.
40 records found using the FAST index were not found using the LCSH index due to this oversight
An example showing a result of searching for “dog training” follows:

Search for “dog training”:
650 0 Dog trainers $z Arkansas $z Blanchard Springs.
650 7 Animal training $z Arkansas $z Blanchard Springs $y 1950-1960. $2 lctgm
650 7 Dogs $z Arkansas $z Blanchard Springs $y 1950-1960. $2 lctgm
650 7 Photojournalism $z Arkansas $z Little Rock $y 1950-1960. $2 lctgm
650 7 Dog trainers $2 fast
651 7 Arkansas $z Little Rock $2 fast

Word indexed in FAST index because it was in a 650 field with 2nd indicator 7 and a $2 at the end (cont.)
The indexing program would now be refined to exclude fields with 2nd indicator 7 and a $2 unless $2 contains “fast”
A keyword search for “dog training” in the future would therefore not find this record through either the LCSH index or the FAST index

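The refinement amounts to checking the $2 source code, not just the 2nd indicator, before a field is sent to the FAST index. A minimal sketch, assuming display-form fields as on the slide and a plain all-words keyword match; `fast_index_fields` and `keyword_hit` are hypothetical names, not part of OCLC's software.

```python
import re


def fast_index_fields(fields: list[str], require_fast_code: bool) -> list[str]:
    """Choose which fields feed the FAST keyword index.

    require_fast_code=False mimics the test database: any field with
    2nd indicator 7 and a $2. True mimics the refinement: $2 must be "fast".
    """
    selected = []
    for field in fields:
        _tag, indicators, rest = field.split(" ", 2)
        source = re.search(r"\$2\s*(\S+)\s*$", rest)
        if not indicators.endswith("7") or source is None:
            continue
        if require_fast_code and source.group(1) != "fast":
            continue
        selected.append(field)
    return selected


def keyword_hit(query: str, fields: list[str]) -> bool:
    """True if every query word appears somewhere in the selected fields."""
    tokens = {t for f in fields for t in re.findall(r"[a-z]+", f.lower())}
    return all(word in tokens for word in query.lower().split())


record = [
    "650 0 Dog trainers $z Arkansas $z Blanchard Springs.",
    "650 7 Animal training $z Arkansas $z Blanchard Springs $y 1950-1960. $2 lctgm",
    "650 7 Dogs $z Arkansas $z Blanchard Springs $y 1950-1960. $2 lctgm",
    "650 7 Photojournalism $z Arkansas $z Little Rock $y 1950-1960. $2 lctgm",
    "650 7 Dog trainers $2 fast",
    "651 7 Arkansas $z Little Rock $2 fast",
]

print(keyword_hit("dog training", fast_index_fields(record, require_fast_code=False)))  # True
print(keyword_hit("dog training", fast_index_fields(record, require_fast_code=True)))   # False
```

The only source of “training” in this record is the lctgm heading “Animal training,” so once non-FAST vocabularies are filtered out the record drops from the FAST results, matching the outcome described above.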

Some names (personal or corporate) not translated to FAST
The program that translated LC 6xx headings to FAST compared names to the “FAST authority file” and validated only those that were matched in the file
20 records found using the LCSH index were not found using the FAST index due to this “rule”
An example showing a result of searching for “technical services” follows:

Search for “technical services”:
610 20 Kansas Real Estate Commission $x Auditing.
610 10 Kansas. $b State Board of Technical Professions $x Auditing.
610 10 Kansas. $b Board of Emergency Medical Services $x Auditing.
610 27 Kansas Real Estate Commission $2 fast
610 17 Kansas. $b State Board of Technical Professions $2 fast
650 7 Auditing $2 fast

Some names (personal or corporate) not translated to FAST (cont.)
The corporate name containing “technical” is in the FAST authority file, but the name containing “services” is not
A keyword search for “technical services” in the future would find this record through the FAST index as well as the LCSH index

Differences between LCSH and FAST
“Politics and government” as a subdivision in LCSH is changed to “Political science” in FAST
“Appropriations and expenditures” as a subdivision in LCSH is changed to “Expenditures, Public” in FAST
“Exhibitions” as a subdivision in LCSH is changed to “Exhibition catalogs” in FAST
“Columbia River Watershed” and “Pacific Coast (U.S.)” were translated to FAST with “United States” as a geographic heading
“Arabic” is a language element in LCSH and is also coded in the 008 field; this is considered redundant in FAST
“Library” as a subdivision in LCSH is changed to “Libraries” in FAST
“Study and teaching (Higher)” as a subdivision in LCSH is changed to “Higher education” in FAST

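Which keyword searches a substitution affects can be worked out by comparing the words on each side: a word that occurs in only one form of the heading can make a query match in one index but not the other. A minimal sketch, assuming a keyword index that lowercases words, ignores subfield codes, and treats “and” as a stopword (all assumptions, not a description of OCLC's indexing); `SUBDIVISION_CHANGES`, `keywords`, and `affected_words` are hypothetical names, and the FAST form for “Exhibitions” is taken from the later example record.

```python
import re

# LCSH subdivision -> FAST replacement, as listed on the slides.
SUBDIVISION_CHANGES = {
    "Politics and government": "Political science",
    "Appropriations and expenditures": "Expenditures, Public",
    "Exhibitions": "Catalogs $v Exhibition catalogs",
    "Library": "Libraries",
    "Study and teaching (Higher)": "Higher education",
}

STOPWORDS = {"and"}  # assumption: not indexed as a keyword


def keywords(heading: str) -> set[str]:
    """Lowercased keywords of a heading, ignoring subfield codes such as $v."""
    text = re.sub(r"\$[a-z0-9]", " ", heading)
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def affected_words(lcsh: str, fast: str) -> set[str]:
    """Words present on only one side (symmetric difference of keyword sets)."""
    return keywords(lcsh) ^ keywords(fast)


for lcsh, fast in SUBDIVISION_CHANGES.items():
    print(f"{lcsh!r}: {sorted(affected_words(lcsh, fast))}")
```

For the five substitutions listed, the computed words coincide with the words the slides name as affected by each change (for example, politics, government, political, and science for “Politics and government”).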
“Politics and government” as a subdivision in LCSH is changed to “Political science” in FAST
This change affects any keyword search using any of the words politics, government, political, or science
1 record found using the LCSH index was not found using the FAST index, and 27 records found using the FAST index were not found using the LCSH index due to this “rule”
Examples showing a result of searching for “government documents” and a result of searching for “religion and science” follow:

Search for “government documents”:
651 0 Egypt $x Politics and government $y 30 B.C.-640 A.D. $v Sources.
650 0 Legal documents $z Egypt $x History $v Sources.
648 7 30 B.C. - 640 A.D. $2 fast
650 7 Legal documents $2 fast
650 7 Political science $2 fast
651 7 Egypt $2 fast
655 7 History $2 fast
655 7 Sources $2 fast

Search for “religion and science”:
650 0 Islam and politics $z Algeria.
650 0 Religion and politics $z Algeria.
651 0 Algeria $x Politics and government.
650 7 Islam and politics $2 fast
650 7 Political science $2 fast
650 7 Religion and politics $2 fast
651 7 Algeria $2 fast

“Appropriations and expenditures” as a subdivision in LCSH is changed to “Expenditures, Public” in FAST
This change affects any keyword search using the word “appropriations” or the word “public”
23 records found using the FAST index were not found using the LCSH index due to this “rule”
An example is the search for “public service”:

Search for “public service”:
610 10 United States. $b Dept. of the Air Force $x Appropriations and expenditures.
610 10 United States. $b Defense Finance and Accounting Service. $b Denver Center $x Auditing.
610 17 United States. $b Defense Finance and Accounting Service. $b Denver Center $2 fast
610 17 United States. $b Dept. of the Air Force. $2 fast
650 7 Auditing $2 fast
650 7 Expenditures, Public $2 fast

“Exhibitions” as a subdivision in LCSH is changed to “Catalogs $v Exhibition catalogs” in FAST
This change affects any keyword search using the words exhibition, exhibitions, or catalogs
4 records found using the FAST index were not found using the LCSH index due to this “rule”
An example is the search for “archives catalogs”:

Search for “archives catalogs”:
610 10 United States. $b National Archives and Records Administration $x Photograph collections $v Exhibitions.
650 0 Photography $z United States $x History $y 20th century $v Exhibitions.
610 17 United States. $b National Archives and Records Administration $2 fast
648 7 1900 - 1999 $2 fast
650 7 Photograph collections $2 fast
650 7 Photography $2 fast
651 7 United States $2 fast
655 7 Catalogs $v Exhibition catalogs $2 fast
655 7 History $2 fast

“Columbia River Watershed” and “Pacific Coast (U.S.)” were translated to FAST with “United States” as a geographic heading
This change affects any search qualified by “United States” spelled out
2 records found using the FAST index were not found using the LCSH index due to this “rule”
An example is the search for “endangered species United States”:

Search for “endangered species United States”:
650 0 Endangered species $z Columbia River Watershed.
650 0 Logging $x Environmental aspects $z Columbia River Watershed.
610 20 Plum Creek Timber Company.
610 27 Plum Creek Timber Company $2 fast
650 7 Endangered species $2 fast
650 7 Logging $x Environmental aspects $2 fast
651 7 United States $z Columbia River Watershed $2 fast

“Arabic” is a language element in LCSH and is also coded in the 008 field, which makes it redundant in FAST
This change affects any search using the word “Arabic”
2 records found using the LCSH index were not found using the FAST index due to this “rule”
An example is the search for “arabic books”:

Search for “arabic books”:
008 990614s1960 ru 000 0 ara d
500 In Russian and Arabic.
650 0 Russian language $v Conversation and phrase books $x Arabic.
650 7 Russian language $2 fast
655 7 Conversation and phrase books $2 fast

“Library” as a subdivision in LCSH is changed to “Libraries” in FAST
This change affects any search using the word “library” or the word “libraries”
2 records found using the FAST index were not found using the LCSH index due to this “rule”
An example is the search for “medical libraries”:

Search for “medical libraries”:
650 0 Medicine $v Bibliography $v Catalogs.
610 20 Moody Medical Library $v Catalogs.
600 10 Blocker, T. G. $q (Truman Graves) $x Library $v Catalogs.
600 17 Blocker, T. G. $q (Truman Graves) $2 fast
610 27 Moody Medical Library. $2 fast
650 7 Libraries $2 fast
650 7 Medicine $2 fast
655 7 Bibliography $v Catalogs $2 fast
655 7 Catalogs $2 fast

“Study and teaching (Higher)” as a subdivision in LCSH is changed to “Higher education” in FAST
This change affects any search using the words study, teaching, or education
1 record found using the FAST index was not found using the LCSH index due to this “rule”
An example is the search for “education policy”:

Search for “education policy”:
650 0 Arctic regions $x Research $x Government policy $z Canada.
650 0 Research $z Arctic regions.
651 0 Arctic regions $x Study and teaching (Higher) $z Canada.
650 7 Education, Higher $2 fast
650 7 Research $2 fast
650 7 Research $x Government policy $2 fast
651 7 Arctic regions $2 fast
651 7 Canada $2 fast

Conclusions
A total of 62 records (about 3%) were affected by real differences between LCSH and FAST
The real differences affected 9 of the 76 searches (about 12%), but only 62 of the records in those 9 searches were affected; 472 records in the 9 searches were the same in both indexes

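The per-category counts given on the preceding slides can be checked against the totals in “Basic statistics.” A minimal sketch, assuming each record falls into exactly one category (the exact agreement of the sums is consistent with that); the category labels are shorthand for the slide titles, and 2,371 (the FAST-index total) is used as one possible base for the percentage.

```python
# Per-category counts from the slides:
# (found only via the LCSH index, found only via the FAST index).
indexing_and_conversion = {
    "Invalid or unestablished LCSH not translated": (117, 0),
    "$x/$v in 600/610 not indexed for LCSH": (0, 72),
    "Non-FAST $2 codes indexed as FAST": (0, 40),
    "Names not translated to FAST": (20, 0),
}
real_differences = {
    "Politics and government": (1, 27),
    "Appropriations and expenditures": (0, 23),
    "Exhibitions": (0, 4),
    "Columbia River Watershed / Pacific Coast (U.S.)": (0, 2),
    "Arabic": (2, 0),
    "Library": (0, 2),
    "Study and teaching (Higher)": (0, 1),
}

all_categories = {**indexing_and_conversion, **real_differences}
lcsh_only = sum(lcsh for lcsh, _ in all_categories.values())
fast_only = sum(fast for _, fast in all_categories.values())
real_total = sum(lcsh + fast for lcsh, fast in real_differences.values())

print(lcsh_only, fast_only, real_total)  # 140 171 62
print(f"{real_total / 2371:.1%} of FAST hits, {9 / 76:.0%} of searches")  # 2.6% of FAST hits, 12% of searches
```

The category sums reproduce the 140 and 171 “not found” figures exactly, and the real differences account for 62 records, which rounds to the “about 3%” and “about 12%” stated above.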

Thank you!
Arlene G. Taylor
ataylor@mail.sis.pitt.edu