Context-Sensitive Query Auto-Completion
Authors: Naama Kraus and Ziv Bar-Yossef
Date of publication: November 2010
Speaker: Rishu Gupta

Motivating Example
User intent: "I want to buy a good digital camera"
Desired result: digital camera reviews, digital camera buying guide, digital camera with wifi, digital camera deals
Current result: digital camera world, digital picture frame, digital copy

Most Challenging Auto-Completion Scenario
Challenge: Query auto-completion predicts the user's intended query with only 12.8% probability.
Goal: Predict the user's intended query reliably when the user has entered only one character.
Advantages:
◦ Makes the search experience faster
◦ Reduces load on servers in Instant Search

QAC Algorithms
• The user enters a prefix "x" of a query "q".
• The algorithm returns a list of k completions, ordered by a quality score.
• A "hit" occurs if some completion "c" in the top-k list equals "q".
• A QAC algorithm should also give credit when "c" is semantically equal to "q".
• An efficient data structure (hash table or trie) is needed for fast prefix lookup.

Context-Sensitive Auto-Completion
How can we compensate for the lack of information?
Observation:
• The user searches within some context.
• The user's context reflects the user's intent.
Context examples:
• Recent queries
• Recently visited pages
• Recent tweets
• etc.
Our focus: recent queries
• Accessible by search engines
• 49% of searches are preceded by a different query in the same session
• For simplicity, this presentation focuses on the most recent query

Recent Query Use Approaches
How to tackle this problem?
• Generalize the MostPopularCompletion algorithm
• Cluster similar queries (using techniques like HMMs)
• NearestCompletion algorithm (assumption: the context is relevant to the query)
Problems with previous approaches:
• None of the previous studies took the user input (prefix) into account in the prediction.
• In 37% of the query pairs, the former query has not occurred in the log before.

Nearest Completion: Measure of Similarity
Challenge: Choose a similarity measure that is well correlated with relevance and universally applicable; completions must be semantically related to the context query.
How to overcome this challenge? Recommendation-based query expansion:
• Represent queries and contexts as high-dimensional term-weighted vectors and rank by cosine similarity.
• Idea: the rich representation of a query is constructed not from its search results, but rather from its recommendation tree.
• The recommender outputs a list of recommendations, which are reformulations of the previous query.
• A problem occurs when none of the recommendations is compatible with the user's query.

Evaluation
Evaluation metric:
• MRR (mean reciprocal rank): a standard IR measure for evaluating retrieval of a specific object at a high rank.
• wMRR (weighted MRR): sample pairs are weighted according to "prediction difficulty" (the total number of candidate completions).
Evaluation framework:
• Evaluation set: a random sample of (context, query) pairs from the AOL log.
• Prediction task: given the context query and the first character of the intended query, predict the intended query at as high a rank as possible.

Analysis
• NearestCompletion fails when the context is irrelevant (and it is difficult to predict whether the context is relevant).
• MostPopularCompletion fails when the intended query is not highly popular (the long tail).
Solution: HybridCompletion
• HybridCompletion combines MostPopularCompletion and NearestCompletion.
• Its MRR is 31.5% higher than that of MostPopularCompletion.
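The two baseline scorers described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy query_log, the function names, and the bag-of-words vectors are assumptions standing in for the AOL log and for the paper's recommendation-tree representation.

```python
# Minimal sketch (assumed, simplified) of MostPopularCompletion and NearestCompletion.
from collections import Counter
from math import sqrt

# Toy query log: (query, frequency) pairs standing in for the AOL log.
query_log = Counter({
    "digital camera reviews": 50,
    "digital camera deals": 30,
    "digital picture frame": 80,
    "digital copy": 120,
    "canon lens prices": 10,
})

def most_popular_completion(prefix, k=3):
    """Return the k most frequent logged queries that start with the prefix."""
    candidates = [(q, f) for q, f in query_log.items() if q.startswith(prefix)]
    candidates.sort(key=lambda qf: qf[1], reverse=True)
    return [q for q, _ in candidates[:k]]

def term_vector(text):
    """Very simple term-weighted vector (raw word counts).
    The paper builds a much richer vector from the query's recommendation tree."""
    return Counter(text.split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest_completion(prefix, context_query, k=3):
    """Rank prefix-matching completions by cosine similarity to the context query."""
    ctx = term_vector(context_query)
    candidates = [q for q in query_log if q.startswith(prefix)]
    candidates.sort(key=lambda q: cosine(term_vector(q), ctx), reverse=True)
    return candidates[:k]

if __name__ == "__main__":
    print(most_popular_completion("d"))                   # popularity only
    print(nearest_completion("d", "best camera to buy"))  # context-aware
```

Run on the motivating example, the popularity-only ranking surfaces "digital copy" and "digital picture frame", while the context-aware ranking promotes the camera-buying completions; this is exactly the gap the context-sensitive approach targets.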
Most Popular vs. Nearest Completion
• Relevant context: the MRR of NearestCompletion (with depth-3 traversal) is 48% higher than that of MostPopularCompletion.
• Irrelevant context: NearestCompletion becomes destructive, and its MRR is 19% lower than that of MostPopularCompletion.

How Does HybridCompletion Work?
• Produce lists: produce the top-k completions of NearestCompletion and the top-k completions of MostPopularCompletion.
• Standardize: the two lists differ in units and scale, so each score is converted to a Z-score, Zscore(q) = (score(q) − μ) / σ.
• Hybrid score is a convex combination: hybscore(q) = α · Zsimscore(q) + (1 − α) · Zpopscore(q), where 0 ≤ α ≤ 1 is a tunable parameter reflecting the prior probability that the context is relevant.

MostPopular, Nearest, and Hybrid (2)
HybridCompletion is shown to be at least as good as NearestCompletion when the context is relevant and almost as good as MostPopularCompletion when the context is irrelevant.

Examples

Conclusion
• Query auto-completion and context-sensitive query auto-completion.
• MostPopularCompletion: algorithm based on popular queries (AOL query log).
• NearestCompletion algorithm:
  ◦ Relevant context: based on the user's recent queries.
  ◦ Recommendation-based algorithm: rich query representation.
• HybridCompletion: a convex combination of NearestCompletion and MostPopularCompletion.

Future Work
• NearestCompletion: a more effective session segmentation technique.
• Predicting the first query in a session remains an open problem.
• Use of other context resources, such as recently visited web pages or search history.
• The quality evaluation measure should be more relaxed.
• The rich query representation may be further fine-tuned.
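As a recap of the HybridCompletion step described above, here is a minimal sketch of the z-score standardization and the convex combination hybscore(q) = α · Zsimscore(q) + (1 − α) · Zpopscore(q). The function names and the handling of completions that appear in only one of the two lists are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed implementation) of HybridCompletion scoring.
from statistics import mean, pstdev

def z_scores(scores):
    """Standardize a {completion: score} dict so both lists share units and scale."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values()) or 1.0   # guard against zero variance
    return {q: (s - mu) / sigma for q, s in scores.items()}

def hybrid_completion(sim_scores, pop_scores, alpha=0.5, k=3):
    """Combine NearestCompletion similarity scores and MostPopularCompletion
    popularity scores; alpha is the prior probability that the context is relevant."""
    z_sim = z_scores(sim_scores)
    z_pop = z_scores(pop_scores)
    candidates = set(z_sim) | set(z_pop)
    # Completions missing from one list get that list's minimum z-score (an assumption).
    floor_sim, floor_pop = min(z_sim.values()), min(z_pop.values())
    hyb = {q: alpha * z_sim.get(q, floor_sim) + (1 - alpha) * z_pop.get(q, floor_pop)
           for q in candidates}
    return sorted(hyb, key=hyb.get, reverse=True)[:k]

if __name__ == "__main__":
    sim = {"digital camera reviews": 0.29, "digital camera deals": 0.25, "digital copy": 0.0}
    pop = {"digital copy": 120, "digital picture frame": 80, "digital camera reviews": 50}
    print(hybrid_completion(sim, pop, alpha=0.7))  # higher alpha trusts the context more
```

With a high alpha the context-similar completions dominate the ranking; with a low alpha the list falls back toward the popularity order, which matches the intended behavior when the context is likely irrelevant.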