Relevance, Precision, and Recall

Measuring system performance
The library: A system view
Environment
Inputs: energy, money, materials, personnel, information
Transformational process
Outputs: products, services
Users
System performance measures
• recall
• precision
• relevance
Robert Taylor's four levels of question formation
Q1: The actual but unexpressed need for information (the visceral need)
Q2: The conscious, within-brain description of the need (the conscious need)
Q3: The formal statement of the need (the formalized need)
Q4: The question as presented to the information system (the compromised need)
Taylor, Robert S. 1968. Question-negotiation and information seeking in
libraries. College & Research Libraries 29(3): 178-194 (May 1968).
System-defined relevance
"My feet are killing me."
find health AND feet
90%: The health of the lumber industry in terms of cubic feet of lumber produced
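The mismatch in this example can be reproduced with a minimal sketch of Boolean AND matching. The helper names `tokens` and `matches_and` are hypothetical and chosen only for illustration; real retrieval systems tokenize and rank in more elaborate ways.

```python
import re

def tokens(text):
    """Lowercase a string and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_and(query_terms, document):
    """Return True when every query term occurs in the document (Boolean AND)."""
    return set(query_terms) <= tokens(document)

doc = ("The health of the lumber industry in terms of "
       "cubic feet of lumber produced")

# The document satisfies the query, although it is useless to a user with sore feet.
print(matches_and(["health", "feet"], doc))
```

The system judges only whether the query terms are present, which is why a document about lumber can count as a match for a question about aching feet.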
Information retrieval process
Question formulation
Relevancy determination
System: Which documents are relevant to the query?
User: Are these documents relevant to my needs?
Defining relevance
System-defined relevance vs. user-defined relevance
System-defined relevance: Objective. Often topical. Does it match the query?
User-defined relevance: Subjective. Situational. Is it useful?
User-defined relevance
"My feet are killing me."
The effect of lysergic acid diethylamide
ingestion on toenail fungus in cloned mice
Soothing remedies for aching feet
Controlling the body by controlling the mind: meditative techniques for dealing with pain
Determining topical relevance
• Analyze work as to what it is about
• Assign to the document one or more terms from a finite list of topics
• Users can then search on those topic indicators
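The three steps above can be sketched as a small subject index. The vocabulary in `TOPICS`, the document ids, and the function names are all assumptions made for illustration; they do not come from any particular catalog.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary (the "finite list of topics").
TOPICS = {"foot care", "pain management", "forestry"}

def build_subject_index(assignments):
    """Map each topic term to the set of document ids indexed under it."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            if term not in TOPICS:
                raise ValueError(f"{term!r} is not in the controlled vocabulary")
            index[term].add(doc_id)
    return index

def search_by_topic(index, term):
    """Return the ids of documents that were assigned the given topic term."""
    return set(index.get(term, set()))

# An indexer has analyzed each work and assigned terms from the list.
assignments = {
    "doc1": ["foot care", "pain management"],
    "doc2": ["pain management"],
    "doc3": ["forestry"],
}
index = build_subject_index(assignments)
print(sorted(search_by_topic(index, "pain management")))
```

Rejecting terms outside the list is what keeps the vocabulary finite, so users and indexers search and assign with the same set of topic indicators.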
Recall
Recall = (No. of relevant documents retrieved) / (Total no. of relevant documents in the file)
Precision
Precision = (No. of relevant documents retrieved) / (Total no. of documents retrieved from the file)
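Both ratios can be computed directly from two sets of document ids. The following is a minimal sketch; the function names and the sample ids are illustrative assumptions, and returning 0.0 for an empty denominator is a convention chosen here, not something the definitions prescribe.

```python
def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents in the file."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Relevant documents retrieved / total documents retrieved from the file."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

relevant = {1, 2, 3, 4}   # documents in the file judged relevant
retrieved = {1, 2, 5}     # documents the search returned

print(recall(retrieved, relevant))     # 2 of 4 relevant documents were found
print(precision(retrieved, relevant))  # 2 of 3 retrieved documents are relevant
```

Note that recall requires knowing every relevant document in the file, which is only practical in a controlled test collection.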
Precision vs. Recall
An inverse relationship
As the level of recall rises the
level of precision generally
declines and vice versa.
The Cranfield experiments (1957 & 1962)
Cyril Cleverdon, p.i.
Precision vs. Recall
Subject: sexual dimorphism
Word stemming: sex, sexy, sexes, sexier, sexual, sexiest
Recall rises, precision falls
Field-specific searches: DE,TI/sexual()dimorphism
Recall falls, precision rises
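The trade-off can be illustrated on a toy collection. This is a sketch under stated assumptions: the four titles and the relevance judgments are invented, `truncation_search` stands in for word stemming, and `phrase_search` only approximates a field-restricted adjacency search.

```python
def truncation_search(docs, stem):
    """Retrieve documents containing any word that begins with the stem."""
    return {i for i, title in docs.items()
            if any(word.startswith(stem) for word in title.lower().split())}

def phrase_search(docs, phrase):
    """Retrieve documents whose title contains the exact phrase."""
    return {i for i, title in docs.items() if phrase in title.lower()}

def recall(retrieved, relevant):
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

# Invented titles; documents 1 and 2 are assumed relevant to the subject.
docs = {
    1: "Sexual dimorphism in birds",
    2: "Dimorphism between the sexes in beetles",
    3: "Sexy advertising and consumer behavior",
    4: "Sexual harassment policy",
}
relevant = {1, 2}

broad = truncation_search(docs, "sex")            # retrieves all four documents
narrow = phrase_search(docs, "sexual dimorphism")  # retrieves document 1 only

print(recall(broad, relevant), precision(broad, relevant))
print(recall(narrow, relevant), precision(narrow, relevant))
```

The broad search finds every relevant document but half of what it returns is noise; the narrow search returns only relevant material but misses one relevant document.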
User-defined relevance
"Relevance appears to be a
subjective quality, unique
between the individual and a
given document supporting
the assumption that
relevance can only be judged
by the information user."
Miranda Pao
Years later
"My feet are still killing me."
The effect of lysergic acid diethylamide
ingestion on toenail fungus in cloned mice
Soothing remedies for aching feet
Controlling the body by controlling the mind: meditative techniques for dealing with pain
Factors affecting
relevance (1)
• Purpose of the information
• Situation of the user
• Level at which the information
source is written
– Journal of the Amer. Med. Assn.
– Healthy times
Factors affecting
relevance (2)
• Subject knowledge of the user
– Is the data new to the user?
– Does the information relate to the
user's prior knowledge?
• Values - ethical, social,
philosophical, political, religious,
legal
User-defined relevance
Subjectivity and fluidity make it difficult to use as a measuring tool for system performance
Incorporating user-defined
relevance into information
retrieval systems (1)
• User performs search
• System retrieves results
.
.
.
Incorporating user-defined
relevance into information
retrieval systems (2)
• System asks user if he/she would like to retrieve similar documents
– Search for other documents with similar word frequencies
– Search for other documents with same subject descriptors
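Both kinds of "similar document" search can be sketched briefly. Cosine similarity is one common way to compare word frequencies and is used here as an assumption; the source does not say which measure a given system applies. The function names and sample texts are illustrative.

```python
import math
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each alphabetic word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def shared_descriptors(a, b):
    """Subject descriptors that two catalog records have in common."""
    return set(a) & set(b)

liked = word_frequencies("quantum physics and reality")
candidate = word_frequencies("reality and quantum theory")
unrelated = word_frequencies("soothing remedies for aching feet")

print(cosine_similarity(liked, candidate))   # shares three of four words
print(cosine_similarity(liked, unrelated))   # shares no words
print(shared_descriptors(["Quantum theory", "Reality"], ["Reality", "Philosophy"]))
```

Word-frequency matching needs no human indexing, while descriptor matching depends on the terms an indexer assigned to each record.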
Search for other documents with same subject descriptors
Main Author: Gribbin, John R.
Title: In search of Schrodinger's cat : quantum physics and reality / by John Gribbin.
Subject(s):
Schrodinger, Erwin, 1887-1961.
Quantum theory History.
Reality.
Amazon.com
Assisting users in
determining relevancy
Document representations: title, abstract, indexing terms, citation data
How relevant are these?
Source: Barry, Carol L. 1998. Document representations and clues to document relevance.
Journal of the American Society for Information Science 49(14):1293-1303.
Document representation
research
Titles
Full text
How relevant are these?
Title: Getting good grades in graduate school
Title: How to impress your advisor in graduate school
Title: Writing a dissertation
Title: The well-written graduate paper
Getting good grades in graduate school
The best way to get good grades is to study hard…
How to impress your advisor in graduate school
Never show up late for a meeting with your advisor…
Writing a dissertation
The first thing to do is to pick a topic that truly interests you…
The well-written graduate paper
Before finalizing your topic do a preliminary search on…
Document representation
research
Titles, abstracts, citation data, indexing terms, and full text: how relevant are these?
Utility studies - Indications that
user found relevant materials
• Citation & abstract databases
– User requests citations be formatted for
printing
– User requests citations be sent by e-mail
– User downloads citations
• Full-text databases
– Pull up the full text
– Print the article
– Download the article to their Blackberry
Utility studies - Indications that user found relevant materials
Search (chocolate)
Short list
If the user stops here, he/she may not have found a relevant article.
Utility studies - Indications that
user found relevant materials
Search
Short list
Modifies search
View full citation data for article
View full text of article
Download or print article
Assume that user found article relevant
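The inference in this flow can be sketched against a search-session log. The action names in `STRONG_SIGNALS`, the log format, and the function name are hypothetical; actual transaction logs differ by vendor.

```python
# Hypothetical action names that a utility study might treat as evidence
# that the user found the article relevant.
STRONG_SIGNALS = {"download", "print", "email", "view_full_text"}

def presumed_relevant(session):
    """Return ids of articles on which the user took a strong-signal action.

    `session` is a list of (action, article_id) tuples in the order they
    occurred; article_id is None for actions not tied to one article.
    """
    return {article_id for action, article_id in session
            if action in STRONG_SIGNALS and article_id is not None}

session = [
    ("search", None),
    ("short_list", None),
    ("view_citation", "A17"),
    ("modify_search", None),
    ("view_full_text", "B42"),
    ("download", "B42"),
]
print(presumed_relevant(session))
```

Viewing a citation alone is not counted here, which mirrors the assumption that only downloading, printing, or similar actions indicate relevance; a session with no such action suggests the user may not have found a relevant article.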
Characteristics of searches
that produce relevant
materials
• Subject searching
• Utilization of Boolean operators
• Search modification
• Increased time in display activities
• Use of a greater number of databases
Cooper, Michael D. and Hui-Min Chen. 2001. Predicting the relevance of a library catalog search.
Journal of the American Society for Information Science and Technology 52(10):813-827.
Importance of abstract (1)
• Indication as to depth/scope of the article
– Authors studied leg-hair count variations of Drosophila in Kawainui Marsh
• Delineates methodology: indication of reliability and validity
– Random sampling in 40 sectors during March, June, September & December
• Gives indication as to content novelty
– Greater variation in June
Importance of abstract (2)
• Basis for research may indicate recency
– American housing market was selected because it is always robust.
• Delineation of results indicates "tangibility" (important, useful data)
– Authors concluded that American teenagers listen to rock music.
Types of abstracts
• Indicative
• Informative
• Critical (evaluative)
(Not common in
library databases)
Indicative abstract
Indicates what the document is about but
doesn't report findings
Title: A review of the current
literature on relevance.
Abstract: The author reviews the
current literature on relevance.
Informative abstract
Acts as a substitute for the document
Title: The effects of library school on
the mental health of library students
Abstract: The authors performed
longitudinal studies on 32 graduate
students in 8 library and information
science programs and found a
significant increase in aberrant
psychological traits over time.
(fictitious title and abstracts)
Abstract creation
• Author-produced
• Vendor-added
• Automated abstracting
Automated abstracting
1. Word counts
2. Remove stop words
3. Weight remaining words
according to frequency
4. Search for sentences with
highest density of most
frequently-occurring words
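The four steps above can be sketched as a short extractive summarizer. This is a minimal illustration, not a production abstracting method: the stop-word list, the `top_k` cutoff, the sentence splitter, and the sample text are all assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "is", "a", "to", "per", "over", "in", "of", "we", "are"}

def words(text):
    """Lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())

def split_sentences(text):
    """Naive split on whitespace that follows ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def automated_abstract(text, n_sentences=2, top_k=3):
    counts = Counter(words(text))                        # 1. word counts
    for stop in STOP_WORDS:                              # 2. remove stop words
        counts.pop(stop, None)
    top = {w for w, _ in counts.most_common(top_k)}      # 3. weight by frequency

    def density(sentence):
        toks = words(sentence)
        return sum(t in top for t in toks) / len(toks) if toks else 0.0

    sentences = split_sentences(text)
    # 4. keep the sentences with the highest density of frequent words
    best = sorted(sentences, key=density, reverse=True)[:n_sentences]
    return [s for s in sentences if s in best]           # restore document order

text = ("We found a significant seasonal variation in the number of cats. "
        "The weather in Fargo is cold. "
        "The highest number of cats are found in the summer, "
        "the lowest number of cats in the winter.")
for sentence in automated_abstract(text):
    print(sentence)
```

Because the sketch only selects existing sentences, the resulting abstract can read as disjointed, which is typical of frequency-based extraction.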
1. Word count
Title: Seasonal variations in the feral cat population of Fargo
Word          Count
the           81
is            68
a             56
to            42
cats          61
number        45
season        27
winter        11
summer        11
spring        11
fall          11
monthly       10
temperature   61
variation     12
food          10
availability  10
average       9
concept       7
per           8
over          9
immediate     5
implement     3
mortality     8
survival      9
2. Eliminate stop words
Title: Seasonal variations in the feral cat population of Fargo
Stop words eliminated: the (81), is (68), a (56), to (42), per (8), over (9)
Word          Count
cats          61
number        45
season        27
winter        11
summer        11
spring        11
fall          11
monthly       10
temperature   61
variation     12
food          10
availability  10
average       9
concept       7
immediate     5
implement     3
mortality     8
survival      9
3. Rank by frequency
Title: Seasonal variations in the feral cat population of Fargo
Word          Count
cats          61
temperature   61
number        45
seasonal      27
variation     12
winter        11
summer        11
spring        11
fall          11
monthly       10
food          10
availability  10
average       9
survival      9
mortality     8
concept       7
immediate     5
implement     3
4. Search for sentences with
highest density of high
frequency words
Title: Seasonal variations in the feral cat
population of Fargo
We found a significant seasonal variation in
the number of cats.
The highest number of cats are found in the
summer, the lowest number of cats in the
winter.
Automated abstract
... The Children's Internet Protection Act (CIPA) sets
conditions on public libraries' receipt of federal financial
assistance for Internet access. ... It would not have been
possible for the broadcasting station to limit the use of
federal funds to all non-editorializing activities. ... The
instant Court distinguished Velazquez, restricting its
holding to situations in which the grantee is "pit[ted] . . .
against the Government. ... " Justice Stevens asserted
that the filtering condition was unconstitutional because
it distorted the normal usage of library Internet terminals
as sources of a wide array of information. ... A condition
mandating Internet filters distorts this mission by
"deny[ing] patrons access to constitutionally protected
speech that libraries would otherwise provide. ...
Relevance and
information overload
In this age of information
overload, tools to aid the user
in determining relevance are
increasingly critical.