Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
Reporter: Hsan-Yu Lin

Outline
• Introduction
• Related Work
• Reformulation Strategies
• Reformulation Effectiveness Metrics
• Discussion and Conclusion

Introduction
• Query reformulation (refinement)
  – Users frequently modify a previous search query in the hope of retrieving better results
• Goal:
  – Look at the types of query reformulation users perform
  – Evaluate them using effectiveness metrics such as click data

Related Work
• Computer-Generated Reformulations
• Query Session Boundary Detection
  – Automatic new topic identification using multiple linear regression (Information Processing & Management 2006)
    • using time and common words
  – Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T '06)
    • using hierarchical clustering to find a better timeout value

Procedure
1. Create a taxonomy of query reformulation strategies, defined in a formal language
2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies
3. Analyze correlations between query reformulation strategies and effectiveness metrics

Reformulation Strategies
• Definitions:
  _ : space character
  P = {', -, .} : punctuation
  λ : empty string
  Σ = {[a-z], [0-9]} ∪ P : alphabet
  ci ∈ Σ : character
  wi ∈ Σ* : word
  zi ∈ (Σ ∪ {_})* : any string
• REFORM. 1: WORD REORDER
  – seattle pizza palace --> pizza seattle palace
• REFORM. 2: WHITESPACE AND PUNCTUATION
  – wal mart, tomatoprices --> walmart tomato prices
• REFORM. 3: REMOVE WORDS
  – yahoo stock price --> price yahoo
• REFORM. 4: ADD WORDS
  – eastlake home --> eastlake home price index
• REFORM. 5: URL STRIPPING
  – http www.yahoo.com --> yahoo
• REFORM. 6: STEMMING
  – running over bridges --> run over bridge
• REFORM. 7: FORM ACRONYM
  – personal computer --> pc
• REFORM. 8: EXPAND ACRONYM
  – pda --> personal digital assistant
• REFORM. 9: SUBSTRING
  – is there spyware on my computer --> is there spywa
• REFORM. 10: SUPERSTRING
  – nevada police rec --> nevada police records 2008
• REFORM. 11: ABBREVIATION
  – shortened dict --> short dictionary
• REFORM. 12: WORD SUBSTITUTION
  – Synonym: easter egg search --> easter egg hunt
  – Hyponym: crimson scarf --> red scarf
  – Hypernym: personal computer --> laptop
  – Meronym: finger --> hand
  – Holonym: automobile --> wheel
• REFORM. 13: SPELLING CORRECTION
  – reformualtion --> reformulation

Undetected Reformulations
• Categories of reformulations that are not included in the taxonomy:
  – Semantic Rephrasing
    • how to calculate nutritional values --> weight watchers calculator
  – Multi-Reformulations
    • lane county gabrage --> lane county garbage disposal (add words and spelling correction)
  – Classifier Rule Limitations
    • Spelling correction used a Levenshtein edit distance of 2
    • Limitations of the WordNet database

The Rule-based Classifier

Measures For Session Boundary Detection
• Test data:
  – 100 users in the AOL query logs for evaluation
  – Same queries were removed (40.8% of queries)
  – 9,091 query pairs
  – 2,483 reformulations and 6,608 new queries (27.3% reformulations)
• Aim for high precision, but not necessarily high recall
  – interested in inter-reformulation rather than intra-reformulation

Reformulation Effectiveness Metrics
• Data: AOL query logs (released on 08/03/2006)
• Queries: 36,389,567
  – 16,069,421 new queries
  – 14,861,326 same queries
  – 3,411,706 reformulations
• Metrics
  – Click Pattern
  – Click URL
  – Rank Change of Clicked Results

Click Pattern
• (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
• (SkipSkip) vs. (SkipClick)

Click URL

Rank Change and Median Time between Queries

Discussion
• Different reformulation strategies were effective depending on the user's action on the initial query
  – Word substitution
    • SkipSkip
    • ClickClick
  – Spelling correction
    • SkipClick
    • ClickSkip

Limitations
• Lack of Context
• Normalized Query Logs
• Ambiguous Queries
  – 'american airlines', 'delta airlines'
• Search Engine Effects

Conclusions
• Describes the human side of query reformulation and contributes to our understanding of users in search interaction
• Add/remove words, word substitution, acronym expansion, and spelling correction seem most effective
• Acronym formation and reordering words may be less beneficial to the user
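
Sketch: rule-based detection of a few reformulation strategies

The taxonomy and the rule-based classifier from the slides can be illustrated with a small Python sketch. It covers only a subset of the strategies (word reorder, whitespace and punctuation, remove/add words, substring, superstring, spelling correction). The rule order, the function names (`classify`, `strip_separators`, `levenshtein`), and the reading of substring/superstring as prefix truncation/extension are assumptions made for illustration; only the punctuation set P and the Levenshtein threshold of 2 come from the slides. This is not the authors' classifier.

```python
PUNCT = "'-."  # punctuation set P from the Definitions slide


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def strip_separators(q: str) -> str:
    """Drop spaces and punctuation so only letters and digits remain."""
    return "".join(ch for ch in q if ch != " " and ch not in PUNCT)


def classify(q1: str, q2: str) -> str:
    """Return the first matching label for the pair (q1 -> q2).

    Rules are tried in a fixed order (an assumption); a pair that matches
    no rule is treated as a new query.
    """
    w1, w2 = q1.split(), q2.split()
    if q1 == q2:
        return "same query"
    if sorted(w1) == sorted(w2):
        return "word reorder"
    if strip_separators(q1) == strip_separators(q2):
        return "whitespace and punctuation"
    if set(w2) < set(w1):
        return "remove words"
    if set(w1) < set(w2):
        return "add words"
    if q1.startswith(q2):
        return "substring"
    if q2.startswith(q1):
        return "superstring"
    if levenshtein(q1, q2) <= 2:  # threshold stated on the slides
        return "spelling correction"
    return "new query"


if __name__ == "__main__":
    for pair in [("seattle pizza palace", "pizza seattle palace"),
                 ("eastlake home", "eastlake home price index"),
                 ("reformualtion", "reformulation")]:
        print(pair, "->", classify(*pair))
```

Returning only the first matching rule is what makes multi-reformulations such as "lane county gabrage --> lane county garbage disposal" fall through undetected, which matches the limitation noted on the Undetected Reformulations slide.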
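
Sketch: tallying click patterns

The Click Pattern metric compares what the user did on the initial query with what they did on the reformulated query. A minimal sketch of the two comparisons listed on the Click Pattern slide follows. The input format (one pair of booleans per query pair) and the function names are hypothetical, and the sample data in the usage lines is made up purely to show the calls; it is not taken from the AOL logs.

```python
from collections import Counter


def click_pattern(clicked_first: bool, clicked_second: bool) -> str:
    """Name the pattern, e.g. 'SkipClick' = no click, then a click."""
    first = "Click" if clicked_first else "Skip"
    second = "Click" if clicked_second else "Skip"
    return first + second


def tally(pairs) -> Counter:
    """Count patterns over (clicked_first, clicked_second) pairs."""
    return Counter(click_pattern(a, b) for a, b in pairs)


def second_query_click_rate(counts: Counter) -> float:
    """(SkipClick + ClickClick) as a share of all four patterns."""
    total = sum(counts.values())
    clicked = counts["SkipClick"] + counts["ClickClick"]
    return clicked / total if total else 0.0


def click_rate_after_skip(counts: Counter) -> float:
    """SkipClick as a share of (SkipSkip + SkipClick)."""
    skipped_first = counts["SkipSkip"] + counts["SkipClick"]
    return counts["SkipClick"] / skipped_first if skipped_first else 0.0


if __name__ == "__main__":
    # Made-up toy input, one tuple per (initial query, reformulation) pair.
    toy_pairs = [(False, True), (False, False), (True, True), (True, False)]
    counts = tally(toy_pairs)
    print(counts)
    print(second_query_click_rate(counts), click_rate_after_skip(counts))
```

Computing these rates separately for each reformulation strategy would give the per-strategy comparison that the Discussion slide refers to.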