Gupta R.

Context-Sensitive Query Auto-Completion
AUTHORS: NAAMA KRAUS AND ZIV BAR-YOSSEF
DATE OF PUBLICATION: NOVEMBER 2010
SPEAKER: RISHU GUPTA
Motivating Example
Desired Result
Current Result
I want to buy a good digital camera
digital camera reviews
digital camera buying guide
digital camera with wifi
digital camera deals
digital camera world
digital picture frame
digital copy
Most Challenging Auto-Completion Scenario
Challenge: query auto-completion predicts the user's intended query with only 12.8% probability.
• Goal: to predict the user's intended query reliably when the user has entered only one character.
• Advantages:
◦ Makes the search experience faster
◦ Reduces the load on servers in Instant Search
QAC Algorithms
• The user enters the prefix “x” of query “q”
• The algorithm returns a list of “K” completions, ordered by quality score
• A “hit” occurs if some completion “c” in the top-K completion list satisfies “c” = “q”
• A QAC algorithm should also work if “c” is semantically equal to “q”
• An efficient data structure is needed for fast lookup: a hash table or a trie
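The lookup step above can be sketched with a trie that ranks completions by query-log frequency, i.e. a MostPopularCompletion-style baseline. This is a minimal sketch, assuming the quality score is simply how often a query appears in the log; `QueryTrie`, `most_popular_completions` and `is_hit` are hypothetical names, not the authors' implementation.

```python
from collections import Counter
import heapq


class TrieNode:
    """One trie node; children are keyed by the next character."""

    def __init__(self):
        self.children = {}
        self.count = 0  # log frequency of the query ending here (0 = not a full query)


class QueryTrie:
    def __init__(self, query_log):
        self.root = TrieNode()
        for query, freq in Counter(query_log).items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
            node.count += freq

    def _collect(self, node, prefix, out):
        # Depth-first walk gathering every full query below `node`.
        if node.count:
            out.append((node.count, prefix))
        for ch, child in node.children.items():
            self._collect(child, prefix + ch, out)

    def most_popular_completions(self, prefix, k):
        """Top-k logged queries starting with `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no logged query has this prefix
        found = []
        self._collect(node, prefix, found)
        # Highest frequency first; ties broken alphabetically.
        top = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
        return [query for _, query in top]


def is_hit(completions, intended_query):
    """A hit: the intended query appears in the top-K completion list."""
    return intended_query in completions
```

For example, with the log `["digital camera reviews", "digital camera reviews", "digital copy", "digital camera deals", "dog food"]`, the prefix `"d"` with `k=2` yields `["digital camera reviews", "digital camera deals"]`. A production system would typically precompute the top-K list at each trie node instead of walking the whole subtree on every keystroke.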
Context-Sensitive Auto-Completion
How to compensate for the lack of information?
Observation:
• The user searches within some context.
• The user's context reflects the user's intent.
Context examples:
• Recent queries
• Recently visited pages
• Recent tweets
• etc.
Our focus – “Recent queries”
• Accessible by search engines
• 49% of searches are preceded by a different
query in the same session
• For simplicity, in this presentation we focus
on the most recent query
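Focusing on the most recent query means each query is paired with the query issued just before it in the same session. A minimal sketch of extracting such (context, query) pairs, assuming sessions are already segmented into time-ordered lists of query strings; `context_query_pairs` and `fraction_with_context` are hypothetical helpers, and this is only one plausible way to count a statistic like the 49% figure, not necessarily how it was computed.

```python
def context_query_pairs(session):
    """Pair each query with the query issued immediately before it.

    `session` is an already-segmented, time-ordered list of query strings.
    Exact resubmissions of the same query are skipped.
    """
    pairs = []
    for previous, current in zip(session, session[1:]):
        if previous != current:
            pairs.append((previous, current))
    return pairs


def fraction_with_context(sessions):
    """Share of queries preceded by a different query in the same session."""
    total = sum(len(s) for s in sessions)
    with_context = sum(len(context_query_pairs(s)) for s in sessions)
    return with_context / total if total else 0.0
```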
Recent Query Use Approaches
How to tackle this problem?
• Generalize the MostPopularCompletion algorithm
• Cluster similar queries (using techniques such as HMMs)
• NearestCompletion algorithm (assumption: the context is relevant to the query)
Problems with these approaches?
• None of these previous studies took the user input (prefix) into account in the prediction
• In 37% of the query pairs, the former query has not occurred in the log before
Nearest Completion:Measure of Similarity
Challenge:
• Choosing a similarity measure that is correlated and universally applicable
• Completions must be semantically related to the context query.
How to overcome this challenge?
Recommendation-Based Query Expansion
• Represent queries and contexts as high-dimensional term-weighted vectors and resort to cosine similarity.
• Idea: a rich representation of a query is constructed not from its search results, but rather from its recommendation tree.
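The expansion idea above can be sketched as follows: walk a query's recommendation tree breadth-first, add up the term vectors of the nodes it reaches, and compare two expanded queries by cosine similarity. This is a minimal sketch under stated assumptions: raw term counts stand in for the term weighting, nodes at depth d are down-weighted by `decay ** d`, and `recommend` is a hypothetical callable returning reformulations of a query. None of these details are taken from the paper.

```python
import math
from collections import Counter, deque


def term_vector(query):
    """Raw term-frequency vector (assumed stand-in for a real term weighting)."""
    return Counter(query.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def expand(query, recommend, depth=3, decay=0.5):
    """Sum term vectors over the recommendation tree rooted at `query`.

    `recommend(q)` (hypothetical) returns a list of reformulations of q.
    A node at tree depth d contributes with weight decay ** d (assumed scheme).
    """
    vec = Counter()
    seen = {query}
    frontier = deque([(query, 0)])
    while frontier:
        node, d = frontier.popleft()
        for term, w in term_vector(node).items():
            vec[term] += w * decay ** d
        if d < depth:
            for child in recommend(node):
                if child not in seen:
                    seen.add(child)
                    frontier.append((child, d + 1))
    return vec
```

With a toy recommender mapping `"canon"` to `["canon camera", "canon printer"]`, the bare vectors for `"canon"` and `"digital camera"` share no terms (cosine 0), while their expanded vectors overlap on `"camera"`, so the similarity becomes positive. This is the intended benefit of a richer representation for short queries.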
Recommendation-Based Query
• Outputs a list of recommendations, which are reformulations of the previous query.
• A problem occurs when none of the recommendations is compatible with the user's query.
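NearestCompletion sidesteps that problem by ranking only candidates that already match the typed prefix, ordered by similarity to the context query. A minimal sketch, assuming a plain term-count vectorizer by default; `nearest_completion` is a hypothetical name, and the `vectorize` parameter is where a richer, recommendation-expanded representation could be plugged in.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector (assumed default representation)."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def nearest_completion(context, prefix, candidates, k, vectorize=term_vector):
    """Top-k candidates starting with `prefix`, ranked by similarity to `context`."""
    ctx = vectorize(context)
    matching = [q for q in candidates if q.startswith(prefix)]
    # Most similar first; ties broken alphabetically.
    ranked = sorted(matching, key=lambda q: (-cosine(ctx, vectorize(q)), q))
    return ranked[:k]
```

For the context `"good digital camera"` and prefix `"d"`, the camera-related completions rise above `"digital copy"` and `"dog food"`, which a popularity-only ranking could not guarantee.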
Evaluation
Evaluation metric:
• MRR (Mean Reciprocal Rank): a standard IR measure to evaluate retrieval of a specific object at a high rank
• wMRR (Weighted MRR): weights sample pairs according to “prediction difficulty” (total # of candidate completions)
Evaluation framework:
• Evaluation set: a random sample of (context, query) pairs from the AOL log
• Prediction task: given the context query and the first character of the intended query, predict the intended query at as high a rank as possible
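The two metrics above can be computed as follows. This is a minimal sketch: the reciprocal rank is 1/rank of the intended query in the returned list (0 if it is missing), MRR averages it over all sample pairs, and wMRR takes a weighted average where, as an assumption, each weight is the number of candidate completions for that sample's prefix. The function names are hypothetical.

```python
def reciprocal_rank(completions, intended):
    """1/rank of the intended query in the completion list, or 0 if absent."""
    try:
        return 1.0 / (completions.index(intended) + 1)
    except ValueError:
        return 0.0


def mrr(results):
    """Mean reciprocal rank over (completion_list, intended_query) pairs."""
    return sum(reciprocal_rank(c, q) for c, q in results) / len(results)


def weighted_mrr(results, weights):
    """Weighted MRR; each weight is assumed to be the sample's candidate count."""
    total = sum(weights)
    return sum(w * reciprocal_rank(c, q) for (c, q), w in zip(results, weights)) / total
```

For instance, three samples whose intended queries appear at ranks 1 and 2 and not at all give reciprocal ranks 1, 0.5 and 0, so MRR = 0.5; weighting them by 1, 3 and 4 candidates gives wMRR = 2.5 / 8 = 0.3125, because the harder samples count for more.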
Analysis
NearestCompletion
• Fails when the context is irrelevant (it is difficult to predict whether the context is relevant)
MostPopularCompletion
• Fails when the intended query is not highly popular (long tail)
Solution: HybridCompletion
• HybridCompletion: a combination of MostPopularCompletion and NearestCompletion
• Its MRR is 31.5% higher than that of MostPopularCompletion.
Most Popular VS Nearest Completion
Relevant context: the MRR of NearestCompletion (with depth-3 traversal) is 48% higher than that of MostPopularCompletion.
Irrelevant context: NearestCompletion becomes destructive, and its MRR is 19% lower than that of MostPopularCompletion.
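How HybridCompletion Works
Produce lists:
• Produce the top k completions of NearestCompletion
• Produce the top k completions of MostPopularCompletion
Standardize:
• The two lists differ in units and scale, so each score is converted to a Z-score: $Z_{\mathrm{score}}(q) = (\mathrm{score}(q) - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the scores in the list
Hybrid score is a convex combination:
• $\mathrm{hybscore}(q) = \alpha \cdot Z_{\mathrm{simscore}}(q) + (1 - \alpha) \cdot Z_{\mathrm{popscore}}(q)$
• $0 \le \alpha \le 1$ is a tunable parameter: the prior probability that the context is relevant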
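The three steps above can be sketched as follows: standardize each list's scores to Z-scores, then rank the union of candidates by the convex combination. This is a minimal sketch, assuming population standard deviation for σ and assuming that a candidate missing from one list receives that list's lowest Z-score. The source does not say how such candidates are handled. `z_scores` and `hybrid_completion` are hypothetical names.

```python
import statistics


def z_scores(scores):
    """Standardize a {query: score} dict to zero mean and unit standard deviation."""
    values = list(scores.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std (an assumption)
    if sigma == 0:
        return {q: 0.0 for q in scores}
    return {q: (s - mu) / sigma for q, s in scores.items()}


def hybrid_completion(sim_scores, pop_scores, alpha, k):
    """Rank the union of both lists by alpha * Z_sim + (1 - alpha) * Z_pop.

    Assumption: a candidate absent from one list gets that list's minimum Z-score.
    """
    z_sim, z_pop = z_scores(sim_scores), z_scores(pop_scores)
    floor_sim, floor_pop = min(z_sim.values()), min(z_pop.values())
    candidates = set(z_sim) | set(z_pop)
    hyb = {
        q: alpha * z_sim.get(q, floor_sim) + (1 - alpha) * z_pop.get(q, floor_pop)
        for q in candidates
    }
    # Highest hybrid score first; ties broken alphabetically.
    return sorted(candidates, key=lambda q: (-hyb[q], q))[:k]
```

Setting `alpha=1.0` reproduces the NearestCompletion order and `alpha=0.0` the MostPopularCompletion order. Intermediate values trade the two off, matching the reading of α as the prior probability that the context is relevant. Standardizing first matters because a cosine similarity in [0, 1] and a raw query count are not directly comparable.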
MostPopular, Nearest, and Hybrid (2)
HybridCompletion is shown to be at least as good as NearestCompletion when the context is relevant, and almost as good as MostPopularCompletion when the context is irrelevant.
Examples
Conclusion
Query Auto-Completion
• MostPopularCompletion algorithm: based on popular queries (AOL query log)
Context-Sensitive Query Auto-Completion
• NearestCompletion algorithm
◦ Relevant context: based on the user's recent queries
◦ Recommendation-based algorithm: rich query representation
• HybridCompletion algorithm: convex combination of NearestCompletion and MostPopularCompletion
Future
• NearestCompletion: a more effective session segmentation technique
• Predicting the first query in a session still remains an open problem
• Use of other context resources, such as recently visited web pages or search history
• The quality evaluation measure should be more relaxed
• The rich query representation may be further fine-tuned