Search strategy & tactics
Governed by effectiveness & feedback
© Tefko Saracevic, Rutgers University
Some definitions
• Search statement (query):
  – set of search terms with logical connectors and attributes; file and system dependent
• Search strategy (big picture):
  – overall approach to searching a question
    • selection of systems, files, search statements & tactics, sequence, output formats; cost and time aspects
• Search tactics (action choices):
  – choices & variations in search statements
    • terms, connectors, attributes
Some definitions (cont.)
• Cycle:
  – set of commands from start (begin) to viewing (type) results, or from one viewing command to the next
• Move:
  – modification of search strategies or tactics aimed at improving the results
Some definitions (cont.)
• Effectiveness:
  – performance as to objectives
    • to what degree did a search accomplish what was desired?
    • how well was it done in terms of relevance?
• Efficiency:
  – performance as to costs
    • at what cost, effort, and time?
• Both are KEY concepts & criteria for the selection of strategy and tactics, and for evaluation
Effectiveness criteria
• Search tactics are chosen & changed following some criterion of accomplishment:
  – none (no thought given)
  – relevance
  – magnitude
  – output attributes
  – topic/strategy
• Tactics are altered interactively
  – role & types of feedback
• Knowing what tactics may produce what results is key for the professional searcher
Relevance: key concept in IR
• An attribute/criterion reflecting the effectiveness of the exchange of information between people (users) & IR systems in communication contacts, based on valuation by people
• Some attributes:
  – in IR, user dependent
  – multidimensional or faceted
  – dynamic
  – measurable, somewhat
  – intuitively well understood
Types of relevance
• Several types considered:
  – Systems or algorithmic relevance:
    • relation between a query as entered and objects in the file of a system as retrieved, or failed to be retrieved, by a given procedure or algorithm. Comparative effectiveness.
  – Topical or subject relevance:
    • relation between the topic in the query & the topic covered by the retrieved objects, or objects in the file(s) of the system, or even in existence. Aboutness.
Types of relevance (cont.)
  – Cognitive relevance or pertinence:
    • relation between the state of knowledge & cognitive information need of a user and the objects provided or in the file(s). Informativeness, novelty ...
  – Motivational or affective relevance:
    • relation between the intents, goals & motivations of a user & objects retrieved by a system or in the file, or even in existence. Satisfaction ...
  – Situational relevance or utility:
    • relation between the task or problem-at-hand and the objects retrieved (or in the files). Relates to usefulness in decision-making, reduction of uncertainty ...
Effectiveness measures
• Precision:
  – the probability that, given that an object is retrieved, it is relevant; or the ratio of relevant items retrieved to all items retrieved
• Recall:
  – the probability that, given that an object is relevant, it is retrieved; or the ratio of relevant items retrieved to all relevant items in a file
• Precision is easy to establish; recall is not
  – the union of retrievals can serve as a "trick" to establish recall
Calculation
                       Items RETRIEVED                Items NOT RETRIEVED
Judged RELEVANT        a = relevant & retrieved       c = relevant & not retrieved
Judged NOT RELEVANT    b = not relevant & retrieved   d = not relevant & not retrieved

Precision = a / (a + b)
Recall    = a / (a + c)

High precision = maximize a, minimize b
High recall    = maximize a, minimize c
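The two measures follow directly from the a/b/c/d cells of the table above. A minimal sketch in Python; the counts used in the example are invented for illustration:

```python
def precision(a, b):
    """a = relevant & retrieved, b = not relevant & retrieved."""
    return a / (a + b)

def recall(a, c):
    """a = relevant & retrieved, c = relevant & not retrieved."""
    return a / (a + c)

# Hypothetical search: 30 items retrieved, 20 of them relevant,
# and 20 more relevant items left unretrieved in the file.
a, b, c = 20, 10, 20
print(precision(a, b))  # 20/30 ≈ 0.667
print(recall(a, c))     # 20/40 = 0.5
```

Note that `c` (relevant but not retrieved) is exactly the quantity a searcher cannot observe from one search, which is why recall is hard to establish.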
Precision-recall trade-off
• USUALLY: precision & recall are inversely related
  – higher recall usually means lower precision, & vice versa
[Figure: precision (0-100%) plotted against recall (0-100%), showing a downward-sloping trade-off curve.]
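One way to see the trade-off is to vary the cutoff on a ranked output list: a deeper cutoff retrieves more of the relevant items (recall rises) but dilutes the retrieved set (precision falls). A small sketch, with an invented relevance ranking:

```python
# 1 = relevant, 0 = not relevant, in ranked output order (hypothetical data)
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
total_relevant = sum(ranked)  # assume all relevant items appear in this list

for cutoff in (2, 5, 10):
    retrieved = ranked[:cutoff]
    rel_ret = sum(retrieved)
    p = rel_ret / cutoff          # precision at this cutoff
    r = rel_ret / total_relevant  # recall at this cutoff
    print(f"top {cutoff:2d}: precision={p:.2f} recall={r:.2f}")
```

Running this prints precision falling (1.00, 0.60, 0.40) as recall rises (0.50, 0.75, 1.00), the inverse relation the curve depicts.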
Search tactics
• What variations are possible?
  – Several 'things' in a query can be selected or changed that affect effectiveness:
1. LOGIC
  – choice of connectors among terms (AND, OR, NOT, W ...)
2. SCOPE
  – number of concepts linked with ANDs (A AND B vs. A AND B AND C)
3. EXHAUSTIVITY
  – for each concept, the number of related terms (OR connections): A OR B vs. A OR B OR C
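The effect of SCOPE (more ANDs) and EXHAUSTIVITY (more ORs) can be sketched against a toy in-memory "file"; the titles and the `hits` helper below are invented for illustration:

```python
docs = {
    1: "information retrieval systems evaluation",
    2: "online search tactics and strategy",
    3: "retrieval effectiveness and relevance",
    4: "database search performance",
}

def hits(*terms, op="AND"):
    """Return ids of docs matching all terms (AND) or any term (OR)."""
    match = all if op == "AND" else any
    return {d for d, text in docs.items()
            if match(t in text.split() for t in terms)}

# SCOPE: each added AND narrows the output
print(hits("retrieval"))                     # {1, 3}
print(hits("retrieval", "evaluation"))       # {1}

# EXHAUSTIVITY: each added OR broadens the output
print(hits("search", op="OR"))               # {2, 4}
print(hits("search", "retrieval", op="OR"))  # {1, 2, 3, 4}
```

The same pattern drives the effectiveness "laws" on the next slides: ANDs shrink the retrieved set, ORs grow it.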
Search tactics (cont.)
4. TERM SPECIFICITY
  – for each concept, the level in the hierarchy (broader vs. narrower terms)
5. SEARCHABLE FIELDS
  – choice of text terms & non-text attributes (titles only, limits)
6. FILE- OR SYSTEM-SPECIFIC CAPABILITIES
  – (ranking, target, sorting)
Effectiveness “laws”
SCOPE (more ANDs):                  output size down   recall down   precision up
EXHAUSTIVITY (more ORs):            output size up     recall up     precision down
USE OF NOTs:                        output size down   recall down   precision up
BROAD TERM USE (low specificity):   output size up     recall up     precision down
PHRASE USE (high specificity):      output size down   recall down   precision up
Recall, precision devices
BROADENING (higher recall):    NARROWING (higher precision):
Fewer ANDs                     More ANDs
More ORs                       Fewer ORs
Fewer NOTs                     More NOTs
More free text                 Less free text
Fewer controlled terms         More controlled terms
More synonyms                  Fewer synonyms
Broader terms                  Narrower terms
Less specific                  More specific
More truncation                Less truncation
Fewer qualifiers               More qualifiers
Fewer LIMITs                   More LIMITs
Citation growing               Building blocks
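Two of the broadening devices above, added synonyms (ORs) and truncation, can be sketched over an invented list of titles; the data and helper names are illustrative only:

```python
titles = [
    "computer networks",
    "computing in libraries",
    "computational search methods",
    "online information seeking",
]

def match_any(terms):
    """OR of synonyms: a title matches if it contains any of the terms."""
    return [t for t in titles
            if any(term in t.split() for term in terms)]

def match_prefix(prefix):
    """Truncation: 'comput*' matches any word starting with the prefix."""
    return [t for t in titles
            if any(w.startswith(prefix) for w in t.split())]

print(len(match_any(["search"])))             # 1
print(len(match_any(["search", "seeking"])))  # 2 -- synonym added, recall up
print(len(match_prefix("comput")))            # 3 -- truncated term, recall up
```

Each broadening move enlarges the retrieved set; applying the mirror-image narrowing devices shrinks it again, trading recall back for precision.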
Examples from a study
• 40 users; one question each
• 4 intermediaries; triadic HCI
• regular setting
• videotaped, logged
• 48 hrs of tape (72 min. average)
  – presearch: 16 min. average
  – online: 56 min. average
• User judgments: 6225 items
  – 3565 relevant or partially relevant
  – 2660 not relevant
• Many variables, measures & analyses
Feedback
Relevance feedback loops:
• Content relevance feedback
– judging relevance of items
• Term relevance feedback
– looking for new terms
• Magnitude feedback
– number of postings
Strategy feedback loops:
• Tactical review feedback
– review of strategy (DS)
• Terminology review feedback
– term review & evaluation
Data on feedback types
Total feedback loops: 885 (in 40 questions)

Feedback type         Count       Rank
Content rel. fdb.     354 (40%)   2
Term rel. fdb.        67 (8%)     3
Magnitude fdb.        396 (45%)   1
Tactic. rev. fdb.     56 (6%)     4
Termin. rev. fdb.     12 (1%)     5

Feedbacks initiated by:
User                  351 (40%)
Intermediary          534 (60%)   (mostly magnitude)
DIALOG commands
Total number: 1677 (in 40 questions)

By type        Count        In no. of questions
Select         1057 (63%)   40
Type           462 (28%)    40
Change db.     67 (4%)      24
Display sets   57 (3%)      22
Limit          19 (1%)      11
Expand         6 (1%)       6