Search strategy & tactics
Governed by effectiveness & feedback
© Tefko Saracevic, Rutgers University

Some definitions
• Search statement (query):
  – set of search terms with logical connectors and attributes; file- and system-dependent
• Search strategy (big picture):
  – overall approach to searching a question
    • selection of systems, files, search statements & tactics, sequence, output formats; cost and time aspects
• Search tactics (action choices):
  – choices & variations in search statements
    • terms, connectors, attributes

Some definitions (cont.)
• Cycle:
  – set of commands from start (begin) to viewing (type) results, or from one viewing command to the next
• Move:
  – modification of search strategy or tactics aimed at improving the results

Some definitions (cont.)
• Effectiveness:
  – performance as to objectives
    • to what degree did a search accomplish what was desired?
    • how well was it done in terms of relevance?
• Efficiency:
  – performance as to costs
    • at what cost, effort, time?
• Both are KEY concepts & criteria for selecting strategy and tactics, and for evaluation

Effectiveness criteria
• Search tactics are chosen & changed following some criterion of accomplishment:
  – none (no thought given)
  – relevance
  – magnitude
  – output attributes
  – topic/strategy
• Tactics are altered interactively
  – role & types of feedback
• Knowing what tactics may produce what results is key to the professional searcher

Relevance: key concept in IR
• Attribute/criterion reflecting the effectiveness of the exchange of information between people (users) & IR systems in communication contacts, based on valuation by people
• Some attributes:
  – in IR: user-dependent
  – multidimensional or faceted
  – dynamic
  – measurable, somewhat
  – intuitively well understood

Types of relevance
• Several types considered:
  – System or algorithmic relevance:
    • relation between a query as entered and objects in the file of a system as retrieved, or failed to be retrieved, by a given procedure or algorithm. Comparative effectiveness.
  – Topical or subject relevance:
    • relation between the topic in the query & the topic covered by the retrieved objects, or objects in the file(s) of the system, or even in existence. Aboutness.

Types of relevance (cont.)
  – Cognitive relevance or pertinence:
    • relation between the state of knowledge & cognitive information need of a user and the objects provided or in the file(s). Informativeness, novelty ...
  – Motivational or affective relevance:
    • relation between the intents, goals & motivations of a user & the objects retrieved by a system, or in the file, or even in existence. Satisfaction ...
  – Situational relevance or utility:
    • relation between the task or problem at hand and the objects retrieved (or in the files). Relates to usefulness in decision-making, reduction of uncertainty ...

Effectiveness measures
• Precision:
  – probability that, given that an object is retrieved, it is relevant; or the ratio of relevant items retrieved to all items retrieved
• Recall:
  – probability that, given that an object is relevant, it is retrieved; or the ratio of relevant items retrieved to all relevant items in a file
• Precision is easy to establish, recall is not
  – union of retrievals as a "trick" to establish recall

Calculation

                      Judged RELEVANT               Judged NOT RELEVANT
Items RETRIEVED       a  relevant & retrieved       b  not relevant & retrieved
Items NOT RETRIEVED   c  relevant & not retrieved   d  not relevant & not retrieved

Precision = a / (a + b)        Recall = a / (a + c)

High precision: maximize a, minimize b
High recall: maximize a, minimize c

Precision-recall trade-off
• USUALLY precision & recall are inversely related
  – higher recall usually means lower precision & vice versa
[Figure: precision (0-100%) plotted against recall (0-100%), showing a downward-sloping curve]

Search tactics
• What variations are possible?
  – Several 'things' in a query can be selected or changed that affect effectiveness:
1. LOGIC – choice of connectors among terms (AND, OR, NOT, W ...)
2. SCOPE – number of concepts linked by ANDs (A AND B vs. A AND B AND C)
3. EXHAUSTIVITY – for each concept, number of related terms in OR connections (A OR B vs. A OR B OR C)

Search tactics (cont.)
4. TERM SPECIFICITY – for each concept, level in the hierarchy (broader vs. narrower terms)
5. SEARCHABLE FIELDS – choice of text terms & non-text attributes (titles only, limits)
6.
FILE- OR SYSTEM-SPECIFIC CAPABILITIES (ranking, target, sorting)

Effectiveness "laws"
• SCOPE – more ANDs: output size down, recall down, precision up
• EXHAUSTIVITY – more ORs: output size up, recall up, precision down
• USE OF NOTs: output size down, recall down, precision up
• BROAD TERM USE – low specificity: output size up, recall up, precision down
• PHRASE USE – high specificity: output size down, recall down, precision up

Recall, precision devices

BROADENING (higher recall):      NARROWING (higher precision):
Fewer ANDs                       More ANDs
More ORs                         Fewer ORs
Fewer NOTs                       More NOTs
More free text                   Less free text
Fewer controlled terms           More controlled terms
More synonyms                    Fewer synonyms
Broader terms                    Narrower terms
Less specific                    More specific
More truncation                  Less truncation
Fewer qualifiers                 More qualifiers
Fewer LIMITs                     More LIMITs
Citation growing                 Building blocks

Examples from a study
• 40 users; one question each
• 4 intermediaries; triadic HCI
• regular setting; videotaped, logged
• 48 hrs of tape (72 min. average)
  – presearch: 16 min. average
  – online: 56 min. average
• User judgments: 6225 items
  – 3565 relevant or partially relevant
  – 2660 not relevant
• Many variables, measures & analyses

Feedback
Relevance feedback loops:
• Content relevance feedback – judging relevance of items
• Term relevance feedback – looking for new terms
• Magnitude feedback – number of postings
Strategy feedback loops:
• Tactical review feedback – review of strategy (DS)
• Terminology review feedback – term review & evaluation

Data on feedback types

Total feedback loops: 885 (in 40 questions)

                     No. (%)     Rank
Content rel. fdb.    354 (40%)   2
Term rel. fdb.        67 (8%)    3
Magnitude fdb.       396 (45%)   1
Tactic. rev. fdb.     56 (6%)    4
Termin. rev. fdb.     12 (1%)    5

Feedbacks initiated by:
User           351 (40%)
Intermediary   534 (60%) (mostly magnitude)

DIALOG commands

Total number: 1677 (in 40 questions)

By type:       No. (%)      In no. of questions
Select         1057 (63%)   40
Type            462 (28%)   40
Change db.       67 (4%)    24
Display sets     57 (3%)    22
Limit            19 (1%)    11
Expand            6 (1%)     6
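The contingency-table calculation of precision and recall above can be sketched in a few lines of Python; the counts a, b, c used here are invented for illustration, not data from the study.

```python
def precision(a: int, b: int) -> float:
    """Precision = a / (a + b): fraction of retrieved items that are relevant."""
    return a / (a + b)

def recall(a: int, c: int) -> float:
    """Recall = a / (a + c): fraction of relevant items that are retrieved."""
    return a / (a + c)

# Hypothetical counts from the 2x2 table:
# a = relevant & retrieved, b = not relevant & retrieved,
# c = relevant & not retrieved
a, b, c = 30, 10, 20
print(precision(a, b))  # 30 / 40 = 0.75
print(recall(a, c))     # 30 / 50 = 0.6
```

High precision means keeping b (irrelevant retrievals) small; high recall means keeping c (missed relevant items) small, matching the maximize/minimize rules in the Calculation slide.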
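The SCOPE and EXHAUSTIVITY "laws" can also be seen in a toy Boolean retrieval sketch: adding an AND term shrinks the retrieved set, while adding an OR term grows it. The documents and terms below are entirely hypothetical.

```python
# Toy file: document id -> set of index terms (invented for the example).
docs = {
    1: {"search", "strategy", "online"},
    2: {"search", "tactics"},
    3: {"relevance", "feedback", "search"},
    4: {"strategy", "evaluation"},
}

def retrieve_and(*terms):
    """Boolean AND: documents containing every term (SCOPE)."""
    return {d for d, words in docs.items() if all(t in words for t in terms)}

def retrieve_or(*terms):
    """Boolean OR: documents containing at least one term (EXHAUSTIVITY)."""
    return {d for d, words in docs.items() if any(t in words for t in terms)}

print(sorted(retrieve_and("search")))              # [1, 2, 3]
print(sorted(retrieve_and("search", "strategy")))  # [1]  more ANDs: output down
print(sorted(retrieve_or("strategy")))             # [1, 4]
print(sorted(retrieve_or("strategy", "tactics")))  # [1, 2, 4]  more ORs: output up
```

Smaller output from extra ANDs tends to raise precision at the cost of recall; larger output from extra ORs tends to do the reverse, as the "laws" table states.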