LIS 605

Indexing & retrieval
Approaches to indexing
Key word indexing
Concept indexing
Social indexing
Non-text indexing
Keyword Indexing
Keyword indexing (1)
Entity-oriented - draw terms from entity itself
Example title: "How to succeed in graduate school"
Advantages:
• Quick
• Inexpensive
• No vocabulary lag
• Multiple access points
• Accuracy
• No intellectual effort needed
Keyword indexing (2)
Disadvantages:
• No control over synonyms, near synonyms
• No control over homographs
Keyword indexing (3)
Disadvantages:
• Dependent on authors for informative and accurate titles
"Artificial metalloenzymes based on the biotin−avidin technology: enantioselective catalysis and beyond"
"The golden peaches of Samarkand"
Keyword indexing (4)
Disadvantages:
• No control over word forms
Communicating in the library
or
Communications in libraries
Keyword indexing (5)
Disadvantages:
• No cross reference structure
Historical key word indexing methodologies
Uniterm cards
Edge-notched cards
Optical coincidence cards
Key word in context (KWIC)
Spatial indexing
Pre- versus post-coordinate indexing
Mortimer Taube
China—Folklore
China—History
China—Politics
France—Folklore
France—History
France—Politics
Germany—Folklore
Germany—History
Germany—Politics
Russia—Folklore
Russia—History
Russia—Politics
(12 terms)
China, France, Germany, Russia, Folklore, History, Politics
(7 terms)
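The term counts on this slide can be checked with a short sketch. It assumes the twelve pre-coordinated headings are every place combined with every topic, and it writes the subdivision dash as "--"; the variable names are illustrative only.

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
topics = ["Folklore", "History", "Politics"]

# Pre-coordinate: the indexer combines place and topic in advance,
# so every combination needs its own heading (4 x 3).
pre_coordinate = [f"{place}--{topic}" for place, topic in product(places, topics)]

# Post-coordinate: only the single terms are stored (4 + 3);
# the searcher combines them at query time.
post_coordinate = places + topics

print(len(pre_coordinate), "pre-coordinate headings")
print(len(post_coordinate), "post-coordinate terms")
```

The gap widens quickly: adding one more topic adds four pre-coordinated headings but only one post-coordinate term.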
Post-coordinate index searching
History of France → France + History
Two sets of documents: France, History
Boolean AND search yields intersection of the two sets: France AND History
Advantages to Taube's system
No need to develop a list of authorized terms—pulling terms from documents themselves
No need to articulate rules of punctuation for representing complex concepts (France—History)
No need to delineate citation order (France—history v. History—France)
No need to formulate rules for subheadings ("May subdivide geog.")
Uniterm cards
One card per term
Document no. 102
"Arrest statistics of the Arizona State Police"
Card "state": 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
Card "police": 11 102 23 91 85 96 87 68 49 115 107 79 60
Searching with uniterm cards
Query: looking for documents about state police
Card "state": 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
Card "police": 11 102 23 91 85 96 87 68 49 115 107 79 60
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.
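A minimal sketch of a uniterm search, treating each card as a set of document numbers and the query as a Boolean AND. Titles 102, 107 and 115 come from the slide; titles 31 and 11 are hypothetical, added only so that each card also holds a document the other lacks. The function names are illustrative.

```python
import re
from collections import defaultdict

titles = {
    102: "Arrest statistics of the Arizona State Police",
    107: "A short history of the Wisconsin State Police",
    115: "The modern police state",
    31: "State parks of Arizona",    # hypothetical title
    11: "Police training manual",    # hypothetical title
}

def build_uniterm_cards(titles):
    """One 'card' per term: the set of document numbers posted to it."""
    cards = defaultdict(set)
    for doc_no, title in titles.items():
        for word in re.findall(r"[a-z]+", title.lower()):
            cards[word].add(doc_no)
    return cards

def search(cards, *terms):
    """Boolean AND: document numbers that appear on every requested card."""
    postings = [cards.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

cards = build_uniterm_cards(titles)
hits = search(cards, "state", "police")
for doc_no in hits:
    print(doc_no, titles[doc_no])
```

Document 115 is retrieved even though it is about a police state rather than the state police: the cards record that both words occur, not how they relate. This is the false coordination that concept indexing tries to avoid.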
Edge-notched cards
One card per bibliographic item
Notch labels: pet-care, bears, pterodactyls
Card: Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45
Card: Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8
Pyramid coding for edge-notched cards
Coding the year 1947*
20 dots:
0 1 2 3 4 5 6 7 8 9   0 1 2 3 4 5 6 7 8 9
10 dots (pyramid repeated for each of the two digits):
9 5 2 0
8 4 1
7 3
6
*They hadn't heard of the Y2K problem yet.
Optical coincidence cards
Pre-printed cards with numbers for entire database
Card "fleas":
0  10  20  30  40  50  60  70  80  90
1  11  21  31  41  51  61  71  81  91
2  12  22  32  42  52  62  72  82  92
3  13  23  33  43  53  63  73  83  93
4  14  24  34  44  54  64  74  84  94
5  15  25  35  45  55  65  75  85  95
6  16  26  36  46  56  66  76  86  96
7  17  27  37  47  57  67  77  87  97
8  18  28  38  48  58  68  78  88  98
9  19  29  39  49  59  69  79  89  99
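Optical coincidence searching can be sketched with bit masks, assuming each card has one fixed position per document number and a hole is punched where the term applies. Stacking cards and looking for light shining through then corresponds to a bitwise AND. The postings for "fleas" and "dogs" below are hypothetical, as are the function names.

```python
def punch_card(doc_numbers, capacity=100):
    """Return a card as an integer bit mask with one 'hole' per document number."""
    card = 0
    for n in doc_numbers:
        if not 0 <= n < capacity:
            raise ValueError(f"document number {n} does not fit on the card")
        card |= 1 << n
    return card

def superimpose(*cards):
    """Stack cards: light passes only where every card has a hole (bitwise AND)."""
    light = cards[0]
    for card in cards[1:]:
        light &= card
    return light

def visible_positions(card):
    """List the document numbers whose holes are open."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]

fleas = punch_card([3, 17, 42, 88])   # hypothetical postings
dogs = punch_card([17, 25, 42, 90])   # hypothetical postings

print(visible_positions(superimpose(fleas, dogs)))
```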
Key Word in Context (KWIC) Index
[Slide callouts: stop word, stop word]
Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"
CONTEXT                     KEY WORDS                       POINTER
 ttems of remote users: an  analysis of the types of        15
 hit rates for monograph/A  comparison of OCLC and WLN      15
comparison of OCLC and WLN  hit rates for monographs and /  15
OCLC and WLN hit rates for  monographs and an analysi/      15
onographs/ A comparison of  OCLC and WLN hit rates for      15
arison of OCLC and WLN hit  rates for monographs and /      15
n analysis of the types of  records retrieved. A com/       15
 s of the types of records  retrieved. A comparison /       15
phs and an analysis of the  types of records retrieve/      15
  A comparison of OCLC and  WLN hit rates for monogra/      15
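A minimal sketch of how KWIC entries like those above might be generated: every non-stop word in the title becomes a key word, shown with a fixed-width window of text on each side, and the entries are sorted by key word. The stop list and the 26-character window are assumptions, and unlike the slide this sketch does not wrap the end of the title around to the beginning.

```python
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}  # assumed stop list

def kwic_entries(title, doc_no, width=26):
    """Return (key word, left context, key word onward, pointer) tuples, sorted by key word."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        key = word.strip('.,;:"').lower()
        if key in STOP_WORDS:
            continue  # stop words never become index entries
        left = " ".join(words[:i])[-width:]    # text before the key word, truncated on the left
        right = " ".join(words[i:])[:width]    # key word and following text, truncated on the right
        entries.append((key, left, right, doc_no))
    return sorted(entries)

title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")
entries = kwic_entries(title, 15)
for key, left, right, pointer in entries:
    print(f"{left:>26}  {right:<26}  {pointer}")
```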
Key Word Out of Context (KWOC) Index
aardvark      101
baggage       123
banyan        128, 159, 179
coconut       955, 654
driving       196, 488, 788
elementary    455, 785
elephant      128, 465, 783
garage        678, 398
hardware      849, 483, 399
meter         768
nadir         877
noxious       112
opium         289
opus          985, 159, 849
people        629, 458
quark         137, 492
radar         968, 295
radio         430, 206, 749
stereo        294, 837, 873
television    745, 727, 883
ultraviolet   958, 774
zebra         276
Vector space model (VSM)
Each document represented by a vector
[Figure: vector for document entitled "Assistive technology for libraries", plotted on axes "technology" and "libraries"]
Vector space model matching
Similarity between query and document vectors
[Figure: vectors for document 1, document 2, and the query, plotted on axes "technology" and "libraries"]
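One common way to measure similarity between a query vector and a document vector is the cosine of the angle between them; the slide does not name a measure, so cosine is an assumption here. The sketch uses the two axes from the figure ("technology" and "libraries") with hypothetical term counts.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (dicts of term -> weight)."""
    terms = set(a) | set(b)
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical term counts on the two axes shown on the slide.
query = {"technology": 1, "libraries": 1}
doc1 = {"technology": 3, "libraries": 1}
doc2 = {"technology": 1, "libraries": 4}

scores = {"document 1": cosine_similarity(query, doc1),
          "document 2": cosine_similarity(query, doc2)}
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Because the cosine depends only on direction, a long document and a short one with the same mix of terms score the same against the query.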
VSM term weighting
Assign high weights to terms that appear frequently in the document but infrequently in the database
Term          Freq. w/in document   No. of documents with term
conclusion    low                   high
information   high                  high
blind         high                  low
Query: "I'm looking for articles about assistive technology for the blind."
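The weighting rule in the table can be sketched with the familiar tf-idf formula (frequency in the document times the log of the inverse document frequency). The slide does not name a formula, so tf-idf is an assumption, and all counts below are hypothetical values chosen to match the low/high pattern in the table.

```python
import math

def tf_idf(freq_in_document, docs_with_term, total_docs):
    """Weight rises with frequency in the document and falls as more documents contain the term."""
    return freq_in_document * math.log(total_docs / docs_with_term)

TOTAL_DOCS = 1000  # hypothetical database size

# term: (frequency within document, number of documents with term), all hypothetical
counts = {
    "conclusion": (1, 900),    # low frequency, high document count
    "information": (12, 900),  # high frequency, high document count
    "blind": (12, 20),         # high frequency, low document count
}

weights = {term: tf_idf(tf, df, TOTAL_DOCS) for term, (tf, df) in counts.items()}
ranked = sorted(weights, key=weights.get, reverse=True)
for term in ranked:
    print(f"{term:<12} {weights[term]:.2f}")
```

Only "blind" is both frequent in the document and rare in the database, so it receives by far the largest weight.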
VSM refinements
Adding semantic and syntactical parsing.
Bill is going to the store to make a purchase.
Bill is going to purchase the store.
Bill is going to store his purchase.
Concept indexing
Rather than pulling terms from documents, assign concept identifier (e.g. France—History) to documents dealing with history of France
Requires intellectual effort
Takes more time than key word indexing so less economical
Avoids problems of false coordination and synonymy through use of vocabulary control
Vocabulary control (1)
One indexing term or phrase to represent a concept
– Unidentified flying objects not flying saucers
– Point user to correct term with "use" reference
– Reduces number of searches needed to find items about a particular topic
Vocabulary control (2)
One form of a word to represent the concept
– Dictionaries not dictionary
Vocabulary control (3)
One usage of a homographic term
– Fault (geologic) not fault (responsibility for error)
– Usage identified through scope note
– Consistency among indexers as well as one indexer over time
– Helps user to avoid false drops
Vocabulary control (4)
Syndetic structure
– Broader terms
– Narrower terms
– Related terms (see also)
– User can negotiate structure to find most appropriate term, as well as identify additional related terms of potential use in finding relevant documents
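The "use" reference from the vocabulary control slides can be sketched as a lookup table that maps non-preferred terms to the single authorized term. The two mappings are the examples from the slides; the table name and function are illustrative, not part of any particular thesaurus software.

```python
# Non-preferred term -> authorized term ("use" references from the slides).
USE_REFERENCES = {
    "flying saucers": "unidentified flying objects",
    "dictionary": "dictionaries",
}

def authorized_term(term):
    """Follow a 'use' reference if one exists; otherwise keep the term as entered."""
    key = " ".join(term.lower().split())  # normalize case and spacing
    return USE_REFERENCES.get(key, key)

for entered in ["Flying saucers", "dictionary", "Dictionaries"]:
    print(f"{entered!r} -> {authorized_term(entered)!r}")
```

Because every variant resolves to one term, a single search on the authorized term finds everything indexed under the concept.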
Social network indexing
• Tags
• Tag clouds
• User-created tags providing access to library resources
flickr
http://www.flickr.com/
Tags
architecture
Bohemian South
Country
Czech Republic
Europe
European
historical
medieval
old
Old Town
Other Keywords
River
Snow
town
Vltava
Tags
(177,583 photos)
Tag clouds
Geotagging
Librarian tagging
Library using flickr
Peace Palace Library (PPL)
Social bookmarking:
http://www.delicious.com
http://www.delicious.com/mauicclibrary
technology
The economic case for open access in academic publishing
Portable software for USB drives
CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch
University of Pennsylvania
http://www.library.upenn.edu/
PennTags
Item list with PennTags
Adding a PennTag
Add to PennTags
Non-text indexing
Indexing Music
Indexing music transcription
1 1 5 5 6 6 5
Indexing Music - melodic contour
* R U R U R D
* - / - / - \
Query by humming
Query by humming (2)
[Diagram: Hummed queries → (digital audio) → Pitch tracker → (melodic contour) → Query engine → Ranked list of matching melodies; MIDI songs → Melody database → Query engine]
Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Smith, Brian C. 1995. Query by humming: musical information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings.
http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html
Indexing Music - melodic contour
http://www.musipedia.org/
* R U R U R D
Contour query: RURURD
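A minimal sketch of deriving a melodic contour from a note sequence, reading R, U and D as repeat, up and down (the Parsons code convention) and * as the first note. The input is the 1 1 5 5 6 6 5 transcription from the earlier slide; the function names are illustrative.

```python
def melodic_contour(pitches):
    """Encode a pitch sequence as *, then R (repeat), U (up) or D (down) for each later note."""
    if not pitches:
        return ""
    code = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            code.append("U")
        elif current < previous:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

def contour_matches(query, contour):
    """True if the query contour occurs anywhere in the indexed contour."""
    return query.lstrip("*") in contour

contour = melodic_contour([1, 1, 5, 5, 6, 6, 5])
print(contour)
print(contour_matches("RURURD", contour))
```

Only the direction of each step is kept, so the same contour results whatever key the melody is hummed in, and a searcher does not need to hit exact pitches.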
Indexing images
Source: Trust Territory archives.
Indexing images - chair (1)
Indexing images - ?
Indexing images - chair (2)
Biometrics - face
Biometrics - differences
Biometrics - similarities
Look at ratios of distances between marker points
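A rough sketch of why ratios are used rather than raw distances: dividing every distance between marker points by one reference distance gives values that stay the same when the image is enlarged or reduced. The marker names and coordinates below are hypothetical and are not taken from any real biometric system.

```python
import math
from itertools import combinations

def distance_ratios(points):
    """Distances between every pair of marker points, divided by the first pair's distance."""
    names = sorted(points)
    distances = [math.dist(points[a], points[b]) for a, b in combinations(names, 2)]
    reference = distances[0]
    return [d / reference for d in distances]

# Hypothetical marker points (x, y) on a face image.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose": (50, 60),
    "mouth": (50, 80),
}
# The same face photographed at twice the size.
face_enlarged = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

ratios = distance_ratios(face)
ratios_enlarged = distance_ratios(face_enlarged)
print([round(r, 3) for r in ratios])
print(all(math.isclose(a, b) for a, b in zip(ratios, ratios_enlarged)))
```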
Indexing images
• Color
• Layout
• Shape
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
Indexing images by shape
http://shape.cs.princeton.edu/search.html
Search by Shape – Commercial Usage
http://www.youtube.com/watch?v=grShwnDXyUA
Search by Color Exercise
Images 1-5: Title? Artist?
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English