

Search strategy & tactics

Governed by effectiveness & feedback

© Tefko Saracevic 1

Some definitions

Search statement (query):

– set of search terms with logical connectors and attributes - file and system dependent

Search strategy (big picture):

– overall approach to searching a question

 selection of systems, files, search statements & tactics, sequence, output formats; cost, time aspects

© Tefko Saracevic 2

Some definitions

(cont.)

Search tactics (action choices):

– choices & variations in search statements

 terms, connectors, attributes

Move:

– modifications of search strategies or tactics that are aimed at improving the results

Cycle (particularly applicable to systems such as DIALOG):

– set of commands from start (begin) to viewing (type) results, or from one viewing command to the next

© Tefko Saracevic 3

Some definitions

(cont.)

Effectiveness:

– performance as to objectives

 to what degree did a search accomplish what was desired?

 how well was it done in terms of relevance?

Efficiency:

– performance as to costs

 at what cost and/or effort, time?

Both are KEY concepts & criteria for the selection of strategy & tactics, and for evaluation

© Tefko Saracevic 4

Effectiveness criteria

• Search tactics chosen & changed following some criteria of accomplishment, such as:

– none - no thought given

– relevance (very often)

– magnitude (also very often)

– output attributes

– topic/strategy

• Tactics altered interactively

– role & types of feedback

Knowing which tactics produce which results is key for a professional searcher

© Tefko Saracevic 5

Relevance: key concept in IR

• Attribute/criterion reflecting the effectiveness of the exchange of information between people (users) & IR systems in communication contacts, based on valuation by people

• Some attributes:

– in IR - user dependent

– multidimensional or faceted

– dynamic

– measurable - somewhat

– intuitively well understood

© Tefko Saracevic 6

Types of relevance

• Several types considered:

Systems or algorithmic relevance:

 relation between a query as entered and the objects in the file of a system as retrieved, or failed to be retrieved, by a given procedure or algorithm. Comparative effectiveness.

Topical or subject relevance:

 relation between the topic in the query & the topic covered by the retrieved objects, or by objects in the file(s) of the system, or even in existence. Aboutness ...

© Tefko Saracevic 7

Types of relevance

(cont.)

Cognitive relevance or pertinence:

 relation between the state of knowledge & cognitive information need of a user and the objects provided or in the file(s). Informativeness, novelty ...

Motivational or affective relevance:

 relation between intents, goals & motivations of a user & objects retrieved by a system or in the file, or even in existence. Satisfaction ...

Situational relevance or utility:

 relation between the task or problem-at-hand and the objects retrieved (or in the files). Relates to usefulness in decision-making, reduction of uncertainty ...

© Tefko Saracevic 8

Effectiveness measures

• Precision:

– probability that, given that an object is retrieved, it is relevant; i.e., the ratio of relevant items retrieved to all items retrieved

• Recall:

– probability that, given that an object is relevant, it is retrieved; i.e., the ratio of relevant items retrieved to all relevant items in the file

• Precision is easy to establish; recall is not

 union of retrievals from several searches used as a “trick” to estimate recall

© Tefko Saracevic 9

Calculation

                     Items RETRIEVED                   Items NOT RETRIEVED

Judged RELEVANT      a = no. relevant & retrieved      c = no. relevant & not retrieved

Judged NOT RELEVANT  b = no. not relevant & retrieved  d = no. not relevant & not retrieved

Precision = a / (a + b)

Recall = a / (a + c)

High precision = maximize a, minimize b

High recall = maximize a, minimize c
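
A minimal sketch (Python, not part of the original slides; the counts are hypothetical) of computing the two measures from the contingency counts a, b and c above:

def precision(a, b):
    # Precision = a / (a + b): share of retrieved items that are relevant
    return a / (a + b) if (a + b) else 0.0

def recall(a, c):
    # Recall = a / (a + c): share of all relevant items that were retrieved
    return a / (a + c) if (a + c) else 0.0

# Hypothetical search: 40 relevant retrieved, 10 junk retrieved,
# 60 relevant items left behind in the file
a, b, c = 40, 10, 60
print(precision(a, b))  # 0.8 -> 80% of the output is relevant
print(recall(a, c))     # 0.4 -> only 40% of the relevant items were found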

© Tefko Saracevic 10

Interpretation:

PRECISION

• Precision = percent of relevant stuff you have in your answer

– or conversely percent of junk

– high precision = most stuff relevant

– low precision = a lot of junk

• Some users demand high precision

– do not want to wade through much stuff

– but it comes at a price: relevant stuff may be missed

 tradeoff

© Tefko Saracevic 11

Interpretation:

RECALL

• A file may have a lot of relevant stuff

• Recall = percent of that relevant stuff in the file that you retrieved

– conversely percent of stuff you missed

– high recall = you missed little

– low recall = you missed a lot

• Some users demand high recall

(e.g. PhD students doing dissertations)

– want to make sure that important stuff is not missed

– but will have to pay a price of wading through a lot of junk

 tradeoff

© Tefko Saracevic 12

Precision-recall trade-off

USUALLY: precision & recall are inversely related

– higher recall usually lower precision & vice versa

[Figure: precision–recall trade-off curve, with Precision (0–100 %) on one axis and Recall (0–100 %) on the other, sloping downward to show the inverse relationship]

© Tefko Saracevic 13

Interpretation:

TRADE-OFF

• It is like in life, usually:

– you win some, lose some

• Usually, but not always

 keep in mind these are probabilities

– when you have high precision, most stuff you got is relevant or on target, but you missed stuff that is also relevant – it was left behind

– when you have high recall, you did not miss much, but you also got a lot of junk to wade through

You use different tactics for high recall than for high precision

© Tefko Saracevic 14

Search tactics

• What variations possible?

– several ‘things’ in a query can be selected or changed that affect effectiveness

– each variation has a consequence in the output

 if I do X then Y will happen

1. LOGIC

– choice of connectors among terms ( AND, OR, NOT, W …)

2. SCOPE

– no. of terms linked by ANDs

(A AND B vs A AND B AND C)

© Tefko Saracevic 15

Search tactics (cont.)

3. EXHAUSTIVITY

– for each concept, no. of related terms linked by OR connections

(A OR B vs. A OR B OR C)

4. TERM SPECIFICITY

– for each concept, the level in the hierarchy

(broader vs. narrower terms)

5. SEARCHABLE FIELDS

– choice for text terms & non-text attributes

 e.g. titles only, limit as to years

6. FILE- OR SYSTEM-SPECIFIC CAPABILITIES

– e.g. ranking, sorting

© Tefko Saracevic 16

Effectiveness “laws”

SCOPE – adding more ANDs:

 Output size: down; Recall: down; Precision: up

EXHAUSTIVITY – adding more ORs:

 Output size: up; Recall: up; Precision: down

USE OF NOTs – adding more NOTs:

 Output size: down; Recall: down; Precision: up

BROAD TERM USE – low specificity:

 Output size: up; Recall: up; Precision: down

PHRASE USE – high specificity:

 Output size: down; Recall: down; Precision: up

© Tefko Saracevic 17
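
These "laws" can be illustrated with a toy example (Python; the inverted index, document ids and relevance judgments below are invented for illustration and are not from the slides). Boolean search statements are run as set operations, and output size, precision and recall are compared:

# Toy inverted index: term -> set of document ids (hypothetical data)
index = {
    "cats":     {1, 2, 3, 4, 5, 7},
    "felines":  {6, 9},              # synonym; doc 9 says "felines", not "cats"
    "behavior": {2, 3, 4, 6, 7, 9},
    "indoor":   {3, 4, 8},
}
relevant = {2, 3, 4, 9}              # documents judged relevant to the question

def report(label, retrieved):
    hits = retrieved & relevant
    prec = len(hits) / len(retrieved) if retrieved else 0.0
    rec = len(hits) / len(relevant)
    print(f"{label}: output={len(retrieved)} precision={prec:.2f} recall={rec:.2f}")

# Baseline search statement: A AND B
report("A AND B        ", index["cats"] & index["behavior"])
# SCOPE: one more AND -> output down, recall down, precision up
report("A AND B AND C  ", index["cats"] & index["behavior"] & index["indoor"])
# EXHAUSTIVITY: an OR-ed synonym -> output up, recall up, precision down
report("(A OR A') AND B", (index["cats"] | index["felines"]) & index["behavior"])
# prints: output 4 / 2 / 6, precision 0.75 / 1.00 / 0.67, recall 0.75 / 0.50 / 1.00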

Tactics: What to do?

• To increase precision:

– use precision devices

• To increase recall:

– use recall devices

• Each will also affect magnitude of output

• With experience, use of these devices will become second nature

© Tefko Saracevic 18

Recall, precision devices

BROADENING (higher recall):

Fewer ANDs

More ORs

Fewer NOTs

More free text

Fewer controlled terms

More synonyms

Broader terms

Less specific

More truncation

Fewer qualifiers

Fewer limits

Citation growing

NARROWING (higher precision):

More ANDs

Fewer ORs

More NOTs

Less free text

More controlled terms

Fewer synonyms

Narrower terms

More specific

Less truncation

More qualifiers

More limits

Building blocks
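
A small sketch (Python; the titles are invented) of one broadening device against one narrowing device: truncation ("comput?") matches every word that begins with "comput", while an exact phrase keeps only the most specific records:

import re

titles = [
    "Computing machinery and intelligence",
    "Computer-assisted instruction in libraries",
    "Computational models of search behavior",
    "The art of computer programming",
]

# Broadening: truncation -> any word starting with "comput"
broad = [t for t in titles if re.search(r"\bcomput\w*", t, re.IGNORECASE)]

# Narrowing: exact phrase "computer programming"
narrow = [t for t in titles if "computer programming" in t.lower()]

print(len(broad), len(narrow))   # 4 1 -> truncation retrieves more, the phrase far fewer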

© Tefko Saracevic 19

Other tactics

• Citation growing (sketched in code after this list):

– find a relevant document

– look for documents cited in it

– look for documents citing it

– repeat on newly found relevant documents

• Building blocks

– find documents with term A

– review – add term B & so on

• Using different types of feedback

– a most important tool
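
A rough sketch of the citation-growing loop described above (Python; the citation tables are made-up stand-ins for whatever citation index a searcher would actually use):

# Hypothetical citation data: paper id -> ids it cites / ids citing it
cites    = {"P1": {"P2", "P3"}, "P2": {"P4"}, "P3": set(), "P4": set(), "P5": {"P1"}}
cited_by = {"P1": {"P5"}, "P2": {"P1"}, "P3": {"P1"}, "P4": {"P2"}, "P5": set()}

def grow_citations(seed_docs, rounds=2):
    # Repeatedly add documents cited in, and citing, the ones found so far
    found = set(seed_docs)
    for _ in range(rounds):
        neighbors = set()
        for doc in found:
            neighbors |= cites.get(doc, set())      # its references (backward chaining)
            neighbors |= cited_by.get(doc, set())   # papers citing it (forward chaining)
        new = neighbors - found
        if not new:
            break
        # In a real search the searcher judges relevance here and keeps
        # only the relevant newcomers before the next round
        found |= new
    return found

print(sorted(grow_citations({"P1"})))   # ['P1', 'P2', 'P3', 'P4', 'P5']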

© Tefko Saracevic 20

Feedback in searching

• Any feedback implies loops

– completion of a process provides information for modifying, if needed, the next process

– information from the output is used to change the previous input or create new input

• In searching:

– some information taken from the output of a search is used to do something with the next query (search statement)

 examine what you got to decide what to do next in searching

– a basic tactic in searching

• Several feedback types used in searching

– each used for different decisions

© Tefko Saracevic 21

Feedback types

• Content relevance feedback

– judge relevance of items retrieved

– make decision what to do next

 switch files, change exhaustivity …

• Term relevance feedback

– find relevant documents

– examine what other terms used in those documents

– search using additional terms

 also called query modification & in some systems done automatically (a sketch follows this list)

• Magnitude feedback

– on the basis of size of output make tactical decisions

 often the size is so big that documents are not examined; instead, the next search is done to limit the size
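
A bare-bones sketch of term relevance feedback (Python; the query and document texts are invented, and the term selection is deliberately crude): terms that occur in several of the documents the user judged relevant, but are not yet in the query, become candidates for the next search statement:

from collections import Counter

query_terms = {"cats", "behavior"}
stopwords = {"and", "of", "the", "in"}

# Hypothetical texts of retrieved documents the user marked as relevant
relevant_docs = [
    "indoor cats show territorial behavior and grooming routines",
    "feline behavior studies of indoor cats and territorial marking",
]

counts = Counter()
for doc in relevant_docs:
    counts.update(set(doc.split()))     # count each term once per document

# Suggest terms appearing in more than one relevant document
suggested = [t for t, n in counts.most_common()
             if n > 1 and t not in query_terms and t not in stopwords]
print(suggested)                        # ['indoor', 'territorial'] (order may vary)

next_query = query_terms | set(suggested)   # expanded search statement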

© Tefko Saracevic 22

Feedback types (cont.)

• Tactical review feedback

– after a number of queries (search statements) in the same search, review tactics as to whether they are getting the desired outputs

 review terms, logic, limits …

– change tactics accordingly

• Strategic review feedback

– after a while (or after consultation with the user) review the “big picture” of what was searched and how

 sources, terms, relevant documents, need satisfaction, changes in question, query …

– do next searches accordingly

– used in reiterative searching

• There is a difference between reviewing strategy & tactics

– but they can be combined

© Tefko Saracevic 23

Bates’ berry-picking model of searching

“…moving through many actions towards a general goal of satisfactory completion of research related to information need.”

– query is shifting (continually)

 as search progresses queries are changing

 different tactics are used

– searcher (user) may move through a variety of sources

 new files, resources may be used

 strategy may change

© Tefko Saracevic 24

Berry-picking …

– new information may provide new ideas, new directions

 feedback is used in various ways

– question is not satisfied by a single set of answers, but by a series of selections & bits of information found along the way

 results may vary & may have to be provided in appropriate ways & means

© Tefko Saracevic 25
