EVALUATION in searching
IR systems
Digital libraries
Reference sources
Web sources
© Tefko Saracevic, Rutgers University
Importance of evaluation
Integral part of searching
always there - wanted or not
informal or formal
Growing problem for all
the information explosion makes finding “good” stuff very difficult
Formal evaluation is part of professional job & skills
requires knowledge of evaluation criteria, measures, methods
more & more prized
Information systems
Considered here:
information retrieval (IR) systems, e.g. Dialog, Nexis …
sources included in digital libraries, e.g. Rutgers
reference services, e.g. in libraries or commercial on the Web
Web sources, e.g. as found in many domain sites
Many approaches, criteria, measures, methods are similar & can be adapted for a specific source or information system
Broad context
Evaluating the role that an information system plays as related to:
SOCIETY - community, culture, discipline ...
INSTITUTION - university, organization, company ...
INDIVIDUALS - users & potential users (nonusers)
These roles lead to broad but hard questions about which context to choose for evaluation
Context (cont.)
Social:
how well does an information system support social demands & roles?
• hardest to evaluate
Institutional:
how well does it support institutional/organizational mission & objectives?
• tied to the objectives of the institution
• also hard to evaluate
Individual:
how well does it support the information needs & activities of people?
• most evaluations are in this context
Approaches to evaluation
Many approaches exist
quantitative, qualitative …
effectiveness, efficiency ...
each has strong & weak points
Systems approach prevalent
Effectiveness: how well does a system perform that for which it was designed?
Evaluation related to objective(s)
Requires choices:
• which objective or function to evaluate?
Approaches … (cont.)
Economics approach:
Efficiency: at what costs?
Cost-effectiveness: cost for a given level of effectiveness (worked sketch below)
Ethnographic approach:
practices & effects within an organization or community
learning & using practices & comparisons
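A minimal worked sketch of the cost-effectiveness idea (system names and figures are hypothetical): express efficiency as cost per unit of effectiveness, here dollars per relevant item retrieved.

    # Minimal sketch: cost-effectiveness as cost per relevant item retrieved.
    # System names and figures are hypothetical illustrations.
    systems = {
        "System A": {"cost": 120.0, "relevant_retrieved": 40},
        "System B": {"cost": 90.0, "relevant_retrieved": 25},
    }
    for name, s in systems.items():
        cost_per_relevant = s["cost"] / s["relevant_retrieved"]
        print(f"{name}: ${cost_per_relevant:.2f} per relevant item")

Here the cheaper system overall (System B, $3.60 per relevant item) is less cost-effective than the more expensive one (System A, $3.00) - the distinction this criterion is meant to capture.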
Basic requirements for evaluation
Once a context is selected, all five of the following must be specified:
1. Construct
A system, process, or source
• e.g. a given IR function or system; a Web site; a digital library source
2. Criteria - to reflect objective(s)
e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs
3. Measure(s) - to reflect criteria
e.g. precision, recall, various Likert scales, $$$, ...
Requirements … (cont.)
4. Measuring instrument - e.g. judgments by users on relevance or on a scale; cost/function
5. Methodology - procedures for collecting & analyzing data
No evaluation can proceed unless ALL of these are specified!
Sometimes the specification of some is informal & implied, but they are always there.
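A minimal sketch of what specifying all five can look like (every value is a hypothetical placeholder), with a completeness check in the spirit of the rule above:

    # Minimal sketch: the five required specifications as an explicit checklist.
    # All values are hypothetical illustrations, not a prescribed design.
    evaluation_spec = {
        "construct": "a given IR system",
        "criteria": ["relevance", "satisfaction"],
        "measures": ["precision", "recall", "Likert scale 1-5"],
        "instrument": "relevance judgments by end users",
        "methodology": "sample of test questions; collect judgments; analyze",
    }
    missing = [part for part, value in evaluation_spec.items() if not value]
    # No evaluation can proceed unless ALL five are specified.
    assert not missing, f"evaluation cannot proceed; unspecified: {missing}"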
In IR: 1. Construct
In research: most evaluation is done on test collections & test questions
Text REtrieval Conference - TREC
• evaluation of algorithms, interactions
• reported in the research literature
In practice: at the use & user level, mostly done on operational collections & systems
e.g. Dialog, Nexis, various files
• evaluation & comparison of various procedures, commands, contents
• user proficiencies, characteristics
• evaluation of interactions
• reported in the professional literature
2. Criteria
Relevance: the basic & most used criterion
strengths, weaknesses
Relevance as an area of study
a basic notion in information science
User & use level: many others
utility, satisfaction, success, time, value, impact ...
Market evaluations:
those + quality, fitness-for-use, penetration ...
3. Measures
Precision & recall are the preferred measures
Problem with recall: how to know all the relevant items?
use of methodological “tricks”
some consider it metaphysical
Use & user level:
Likert scales & differentials for many criteria
• e.g. satisfaction on a scale of 1 to x (1 = not satisfied, x = satisfied)
observational measures
• e.g. overlap, consistency
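A minimal worked sketch of how the two preferred measures are computed for a single judged search (document IDs and judgments are hypothetical). Note that the recall line presupposes knowing every relevant document in the collection, which is exactly the problem flagged above.

    # Minimal sketch: precision & recall for one search.
    # Document IDs and relevance judgments are hypothetical.
    retrieved = {"d1", "d2", "d3", "d4", "d5"}   # what the search returned
    relevant = {"d2", "d4", "d6", "d7"}          # all documents judged relevant
    hits = retrieved & relevant                  # relevant documents retrieved
    precision = len(hits) / len(retrieved)       # 2/5 = 0.40
    recall = len(hits) / len(relevant)           # 2/4 = 0.50
    print(f"precision={precision:.2f} recall={recall:.2f}")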
4. Instruments
People are used as instruments
they judge relevance, rate on a scale ...
But which people?
users, surrogates, analysts, domain experts, librarians ...
How do relevance, utility ... judges affect results?
who knows?
Reliability of judgments:
about 50-60% for experts
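As a hedged illustration of how judgment reliability can be quantified (the judgments are hypothetical, and other agreement measures exist), the overlap of two judges' relevance sets can be computed as intersection over union:

    # Minimal sketch: overlap between two judges' relevance judgments.
    # One simple agreement measure; the judgments are hypothetical.
    judge_a = {"d1", "d2", "d4", "d7"}
    judge_b = {"d2", "d3", "d4", "d8"}
    overlap = len(judge_a & judge_b) / len(judge_a | judge_b)   # 2/6
    print(f"overlap = {overlap:.0%}")                           # prints 33%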
5. Methods
Includes design & procedures for observations, experiments, & analysis of results
Challenges:
Validity? Reliability? Reality?
• Collection - selection? size?
• Request - generation?
• Searching - conduct?
• Results - obtaining? judging? feedback?
• Analysis - conduct? tools?
• Interpretation - warranted? generalizable?
Criteria for evaluation of information sources
Includes digital library & Web sources
What?
Content: Subject? Topic? Level? Depth? Exhaustivity? Specificity? Organization?
Timeliness of content? Up-to-date? Revisions? Accuracy?
Why?
Purpose? Scope? Viewpoint?
For whom?
Intended audience? What need is satisfied? Appropriateness?
Criteria ...
Who done it?
Author(s), institution, company, publisher, creator:
• Authority? Reputation? Credibility? Persistence? Trustworthiness?
• Refereeing? Transparency?
How?
Content treatment:
• Readability? Style? Organization? Clarity?
Physical treatment:
• Format? Layout? Legibility? Visualization?
Where?
Availability? Accessibility?
Criteria ...
How?
Searching, navigation, browsing?
Feedback? Links?
Output: Organization? Features? Variations? Control?
Effort? Learning factors?
How much?
Price? Total costs? Cost-benefits?
In comparison to?
Other similar sources?
Conclusions
Evaluation is a complex task
but also an essential part of being an information professional
Traditional approaches & criteria still apply
but new ones are added or adapted to fit new sources & new methods of access & use
Evaluation skills are in growing demand, particularly because the Web is value-neutral
A great professional skill to sell!