Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
Reporter Hsan-Yu Lin
Outline
• Introduction
• Related Work
• Reformulation Strategies
• Reformulation Effectiveness Metrics
• Discussion and Conclusion
Introduction
• Query reformulation (refinement)
– Users frequently modify a previous search query
in the hope of retrieving better results
• Goal:
– Look at the types of query reformulation users
perform
– Evaluate them using effectiveness metrics such as
click data
Related Work
• Computer-Generated Reformulations
Related Work
• Query Session Boundary Detection
– Automatic new topic identification using multiple
linear regression (Information Processing &
Management 2006)
• using time and common words
– Identification of User Sessions with Hierarchical
Agglomerative Clustering (ASIS&T ‘06)
• using hierarchical clustering to find a better timeout
value
Procedure
1. Create a taxonomy of query reformulation
strategies, defined using a formal language
2. Build an unsupervised rule-based classifier to
detect the different query reformulation
strategies
3. Analyze correlations between query
reformulation strategies and effectiveness
metrics
Reformulation Strategies
• Definitions:
– _ : space character
– P = {', −, .} : punctuation
– λ : empty string
– Σ = {[a-z], [0-9]} ∪ P : alphabet
– c_i ∈ Σ : character
– w_i ∈ Σ* : word
– z_i ∈ (Σ ∪ {_})* : any string
Reformulation Strategies
• REFORM. 1: WORD REORDER
– seattle pizza palace → pizza seattle palace
• REFORM. 2: WHITESPACE AND PUNCTUATION
– wal mart, tomatoprices → walmart tomato prices
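The two rules on this slide can be sketched as simple string checks. This is a minimal illustration, not the classifier from the study: the function names are hypothetical, and the ASCII punctuation set is an assumed stand-in for P from the definitions.

```python
# Assumption: ASCII stand-ins for the punctuation set P.
PUNCTUATION = set("',-.")

def is_word_reorder(q1, q2):
    """True if q2 has exactly the words of q1 in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and sorted(w1) == sorted(w2)

def squash(query):
    """Remove every whitespace and punctuation character from the query."""
    return "".join(c for c in query if not c.isspace() and c not in PUNCTUATION)

def is_whitespace_punctuation(q1, q2):
    """True if the queries differ only in whitespace and punctuation."""
    return q1 != q2 and squash(q1) == squash(q2)

print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))               # True
print(is_whitespace_punctuation("wal mart, tomatoprices", "walmart tomato prices"))  # True
```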
Reformulation Strategies
• REFORM. 3: REMOVE WORDS
– yahoo stock price → price yahoo
• REFORM. 4: ADD WORDS
– eastlake home → eastlake home price index
• REFORM. 5: URL STRIPPING
– http www.yahoo.com → yahoo
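A possible sketch of these three rules, assuming word order is ignored for remove/add words (as the first example suggests) and assuming a small, hypothetical list of URL prefixes and top-level domains. The regular expressions are illustrative, not the rules used in the study.

```python
import re

def is_remove_words(q1, q2):
    """True if q2 keeps only words from q1 and drops at least one (order ignored)."""
    s1, s2 = set(q1.split()), set(q2.split())
    return bool(s2) and s2 < s1

def is_add_words(q1, q2):
    """Adding words is the mirror image of removing words."""
    return is_remove_words(q2, q1)

def strip_url(query):
    """Drop a leading 'http'/'https' and 'www.', and a trailing top-level domain."""
    q = re.sub(r"^(https?(://|\s+))?(www\.)?", "", query.strip())
    # Assumption: only a handful of common top-level domains are handled.
    return re.sub(r"\.(com|org|net|edu|gov)$", "", q)

def is_url_stripping(q1, q2):
    """True if q2 is q1 with its URL decoration removed."""
    return q1 != q2 and strip_url(q1) == q2

print(is_remove_words("yahoo stock price", "price yahoo"))      # True
print(is_add_words("eastlake home", "eastlake home price index"))  # True
print(strip_url("http www.yahoo.com"))                          # yahoo
```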
Reformulation Strategies
• REFORM. 6: STEMMING
– running over bridges → run over bridge
• REFORM. 7: FORM ACRONYM
– personal computer → pc
• REFORM. 8: EXPAND ACRONYM
– pda → personal digital assistant
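The acronym rules can be illustrated with a first-letter check. This sketch covers only REFORM. 7 and 8 (stemming would need a stemmer and is left out), and it assumes an acronym is formed from the first character of every word; the function names are hypothetical.

```python
def acronym(phrase):
    """Concatenate the first character of each word in the phrase."""
    return "".join(word[0] for word in phrase.split())

def is_form_acronym(q1, q2):
    """True if q2 is the acronym of the multi-word query q1."""
    return len(q1.split()) > 1 and acronym(q1) == q2

def is_expand_acronym(q1, q2):
    """Expanding an acronym is the mirror image of forming one."""
    return is_form_acronym(q2, q1)

print(is_form_acronym("personal computer", "pc"))                 # True
print(is_expand_acronym("pda", "personal digital assistant"))     # True
```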
Reformulation Strategies
• REFORM. 9: SUBSTRING
– is there spyware on my computer → is there spywa
• REFORM. 10: SUPERSTRING
– nevada police rec → nevada police records 2008
• REFORM. 11: ABBREVIATION
– shortened dict → short dictionary
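Judging from the examples, these three rules look like prefix tests. The sketch below assumes that reading: substring and superstring compare whole queries, while abbreviation compares queries word by word. The names and the exact conditions are assumptions, not the formal definitions from the study.

```python
def is_substring(q1, q2):
    """True if q2 is a proper, non-empty prefix of q1."""
    return bool(q2) and q1 != q2 and q1.startswith(q2)

def is_superstring(q1, q2):
    """True if q1 is a proper, non-empty prefix of q2."""
    return is_substring(q2, q1)

def is_abbreviation(q1, q2):
    """True if each word pair is related by prefix, in either direction."""
    w1, w2 = q1.split(), q2.split()
    if w1 == w2 or len(w1) != len(w2):
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))

print(is_substring("is there spyware on my computer", "is there spywa"))   # True
print(is_superstring("nevada police rec", "nevada police records 2008"))   # True
print(is_abbreviation("shortened dict", "short dictionary"))               # True
```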
Reformulation Strategies
• REFORM. 12: WORD SUBSTITUTION
– Synonym: easter egg search → easter egg hunt
– Hyponym: crimson scarf → red scarf
– Hypernym: personal computer → laptop
– Meronym: finger → hand
– Holonym: automobile → wheel
• REFORM. 13: SPELLING CORRECTION
– reformualtion → reformulation
Undetected Reformulations
• Categories of reformulations which are not
included in the taxonomy:
– Semantic Rephrasing
• how to calculate nutritional values → weight watchers
calculator
– Multi-Reformulations
• lane county gabrage → lane county garbage disposal (add
words and spelling correction)
– Classifier Rule Limitations
• spelling correction used a Levenshtein edit distance of 2
• WordNet database limitation
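To illustrate the edit-distance limit, here is a standard dynamic-programming Levenshtein distance. The threshold test assumes "a distance of 2" means "at most 2"; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def is_spelling_correction(q1, q2, max_distance=2):
    """Assumption: a correction is any change within max_distance edits."""
    return q1 != q2 and levenshtein(q1, q2) <= max_distance

print(levenshtein("reformualtion", "reformulation"))  # 2
print(is_spelling_correction("lane county gabrage", "lane county garbage disposal"))  # False
```

The second call shows why the multi-reformulation example goes undetected by this rule alone: the added words push the distance well past the limit.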
Undetected Reformulations
The Rule-based Classifier
Measures For Session Boundary Detection
• Test data:
– 100 users in the AOL query logs for evaluation
– Identical (repeated) queries were removed (40.8% of queries)
– 9,091 query pairs
– 2,483 reformulations and 6,608 new queries
(27.3% reformulations)
Measures For Session Boundary Detection
• Aim for high precision, not necessarily high recall
– interested in inter-reformulation rather than intra-reformulation
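For reference, precision and recall over detected reformulation pairs can be computed as below. The pair identifiers are made-up toy data, not figures from the AOL logs.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted reformulation pairs against labeled ones."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical ids of query pairs flagged by a classifier vs. hand-labeled pairs.
predicted = {1, 2, 3}
actual = {2, 3, 4, 5}
print(precision_recall(predicted, actual))
```

A classifier tuned for precision flags fewer pairs, so most flagged pairs are true reformulations even if some real ones are missed.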
Reformulation Effectiveness Metrics
• Data: AOL query logs (released on 08/03/2006)
• Queries: 36,389,567
– 16,069,421 new queries
– 14,861,326 same queries
– 3,411,706 reformulations
• Metrics
– Click Pattern
– Click URL
– Rank Change of Clicked Results
Click Pattern
Click Pattern
• (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
• (SkipSkip) vs. (SkipClick)
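The two comparisons can be sketched as ratios over click-pattern counts. This assumes each query pair is reduced to two booleans (clicked after the first query, clicked after the reformulation); the function names and the toy data are hypothetical.

```python
from collections import Counter

def click_pattern(clicked_first, clicked_second):
    """Label a query pair as SkipSkip, SkipClick, ClickSkip or ClickClick."""
    first = "Click" if clicked_first else "Skip"
    second = "Click" if clicked_second else "Skip"
    return first + second

def pattern_counts(pairs):
    """Count the click patterns over (clicked_first, clicked_second) pairs."""
    return Counter(click_pattern(a, b) for a, b in pairs)

def click_share_after_reformulation(counts):
    """(SkipClick + ClickClick) as a share of all pairs."""
    total = sum(counts.values())
    return (counts["SkipClick"] + counts["ClickClick"]) / total if total else 0.0

def recovery_share(counts):
    """SkipClick as a share of pairs whose first query was skipped."""
    skipped = counts["SkipSkip"] + counts["SkipClick"]
    return counts["SkipClick"] / skipped if skipped else 0.0

# Toy data, not taken from the AOL logs.
counts = pattern_counts([(False, False), (False, True), (True, True), (False, True)])
print(click_share_after_reformulation(counts))  # 0.75
print(recovery_share(counts))
```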
Click URL
Rank Change and Median Time between Queries
Discussion
• Different reformulation strategies were
effective depending on the user's action (click or skip)
on the initial query
– Word substitution
• Skip → Skip
• Click → Click
– Spelling correction
• Skip → Click
• Click → Skip
Limitations
• Lack of Context
• Normalized Query Logs
• Ambiguous Queries
– ‘american airlines’, ‘delta airlines’
• Search Engine Effects
Conclusions
• The study describes the human side of query
reformulation and contributes to our
understanding of users in search interaction
• Add/remove words, word substitution,
acronym expansion, and spelling correction
seem most effective
• Acronym formation and reordering words may
be less beneficial to the user