Extracting Query Facets From Search Results

Date : 2013/08/20
Source : SIGIR’13
Authors : Weize Kong and James Allan
Advisor : Dr. Jia-ling, Koh
Speaker : Wei, Chang

OUTLINE
• Introduction
• Approach
• Experiment
• Conclusion

WHAT IS A QUERY FACET ?
Definition : a query facet is a set of coordinate terms (terms that share a semantic relationship by being grouped under a more general "is a" relationship).
Example : a query facet (Mars rovers)

WHAT CAN WE DO WITH QUERY FACETS ?
• Flight type
  • Domestic
  • International
• Travel Class
  • First
  • Business
  • Economy

GOAL
Extract query facets from the top-k web search results $D = \{D_1, D_2, \ldots, D_k\}$.

OUTLINE
• Introduction
• Approach
  • Step 1 : Extracting candidate lists
  • Step 2 : Finding query facets from candidate lists
• Experiment
• Conclusion

PATTERN-BASED SEMANTIC CLASS EXTRACTION
Reference : Z. Dou, S. Hu, Y. Luo, R. Song, and J.-R. Wen. Finding dimensions for queries.
For example :
• In running text : There are many Mars rovers, such as Curiosity, Opportunity, and Spirit.
• In HTML markup :
  <ul>
    <li>first class</li>
    <li>business class</li>
    <li>economy class</li>
  </ul>

CANDIDATE LISTS
• All list items are normalized by converting text to lowercase and removing non-alphanumeric characters.
• Then, we remove stopwords and duplicate items in each list.
• Finally, we discard all lists that contain fewer than two items or more than 200 items.
The candidate lists are usually noisy and may not be relevant to the issued query. To address this problem, we use a supervised method.

NOTE : WHAT IS A SUPERVISED METHOD
Example :

LA-100
         Quiz 1   Quiz 2   Quiz 3   Final Exam
David    A-       B+       A-       ?
James    B        A        A        ?

LA-99 (Training Data)
         Quiz 1   Quiz 2   Quiz 3   Final Exam
John     A        B+       B-       B
Eric     A+       A        A+       A
Peter    B+       A-       A+       A+
Steve    A+       A+       B-       B+
Mark     C        A+       B+       B
Larry    B+       B+       B+       A

NOTE : WHAT IS SUPERVISED LEARNING
Training data (with features) → Training → Model
New data → Model → Prediction

PROBLEM DEFINITION
• Whether a list item is a facet term
• Whether a pair of list items is in one query facet

FEATURES

GRAPH

LOGISTIC-BASED CONDITIONAL PROBABILITY DISTRIBUTIONS

PARAMETER ESTIMATION
Maximize the log-likelihood with a gradient-based method (gradient descent on the negative log-likelihood).

INFERENCE
At this point training is finished. The graphical model does not force the labeling to produce a strict partitioning of the facet terms. For example, when $Z_{1,2} = 1$ and $Z_{2,3} = 1$, we may still have $Z_{1,3} = 0$, where $Z_{i,j} = 1$ means that items $i$ and $j$ are labeled as being in the same query facet.

REPHRASE THE OPTIMIZATION PROBLEM
The optimization is restricted to the set of all possible query facet sets that can be generated from L under the strict partitioning constraint.
This optimization problem is NP-hard, which can be proved by a reduction from the Multiway Cut problem. Therefore, we propose two algorithms, QF-I and QF-J, to approximate the solution.

QF-I
1. Select list items $t_i$ with $P(t_i) > w_{min}$ as facet terms.
2.

QF-J

RANKING QUERY FACETS
• score for a query facet :
• score for a facet term :

OUTLINE
• Introduction
• Approach
  • Step 1 : Extracting candidate lists
  • Step 2 : Finding query facets from candidate lists
• Experiment
  • Evaluation
  • Experiment Result
• Conclusion

DATA
The top 10 query facets generated by different models are used.

EVALUATION METRICS
A “∗” is used to distinguish between system-generated results and human-labeled results; the latter serve as ground truth.

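
NOTE : CHECKING STRICT PARTITIONING
The example on the INFERENCE slide ($Z_{1,2} = 1$, $Z_{2,3} = 1$, $Z_{1,3} = 0$) can be checked mechanically. The following is a minimal Python sketch, not the paper's inference procedure: it only tests whether a set of pairwise labels is consistent with some partition of the terms. The function name `is_strict_partition`, the 1-based indexing, and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations


def is_strict_partition(n_items, same_facet):
    """Return True if the pairwise labels are consistent with a partition.

    n_items    -- number of facet terms, indexed 1..n_items (assumed layout)
    same_facet -- dict mapping (i, j) with i < j to 1 (same facet) or 0;
                  pairs missing from the dict are treated as 0
    """
    # Union-find: merge every pair labeled 1 into one group.
    parent = list(range(n_items + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), label in same_facet.items():
        if label == 1:
            parent[find(i)] = find(j)

    # A strict partition requires every pair's label to agree with
    # whether the two items ended up in the same group.
    for i, j in combinations(range(1, n_items + 1), 2):
        in_same_group = find(i) == find(j)
        labeled_same = same_facet.get((i, j), 0) == 1
        if in_same_group != labeled_same:
            return False
    return True


# The labeling from the INFERENCE slide, and a transitive variant of it.
slide_example = {(1, 2): 1, (2, 3): 1, (1, 3): 0}
transitive_variant = {(1, 2): 1, (2, 3): 1, (1, 3): 1}

print(is_strict_partition(3, slide_example))       # False
print(is_strict_partition(3, transitive_variant))  # True
```

The slide's labeling fails the check because items 1 and 3 are linked through item 2 but labeled as belonging to different facets, which is why an extra step is needed to turn the model's labels into query facets.
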
CLUSTERING QUALITY

OVERALL QUALITY
• fp-nDCG is weighted by :
• rp-nDCG is weighted by :

FACET TERMS

CLUSTERING FACET TERMS

OVERALL

CONCLUSION
• We developed a supervised method based on a graphical model to recognize query facets from the noisy facet candidate lists extracted from the top-ranked search results.
• We proposed two algorithms for approximate inference on the graphical model.
• We designed a new evaluation metric for this task that combines recall and precision of facet terms with grouping quality.
• Experimental results showed that the supervised method significantly outperforms the unsupervised methods, suggesting that query facet extraction can be effectively learned.
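
NOTE : CLEANING CANDIDATE LISTS
The cleanup steps on the CANDIDATE LISTS slide (lowercase, strip non-alphanumeric characters, remove stopwords and duplicates, drop lists with fewer than two or more than 200 items) can be sketched in Python. This is a minimal sketch under several assumptions that the slides do not settle: the small `STOPWORDS` set is a placeholder, stopwords are removed word by word inside each item, non-alphanumeric characters are replaced by spaces rather than deleted, only ASCII letters and digits are kept, and the function name `clean_candidate_lists` is hypothetical.

```python
import re

# Placeholder stopword list; the slides do not say which list is used.
STOPWORDS = {"a", "an", "and", "of", "or", "the"}


def clean_candidate_lists(raw_lists, min_items=2, max_items=200):
    """Normalize list items and drop lists that are too short or too long."""
    cleaned = []
    for raw in raw_lists:
        items = []
        seen = set()
        for text in raw:
            # Lowercase, then turn every non-alphanumeric run into a space
            # (assumption: ASCII only, spaces kept as word separators).
            text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
            # Remove stopwords word by word (assumed interpretation).
            words = [w for w in text.split() if w not in STOPWORDS]
            item = " ".join(words)
            # Skip items that became empty and duplicates within this list.
            if item and item not in seen:
                seen.add(item)
                items.append(item)
        # Keep lists with at least min_items and at most max_items items.
        if min_items <= len(items) <= max_items:
            cleaned.append(items)
    return cleaned


raw = [
    ["First Class", "Business Class", "first class!", "Economy-Class"],
    ["The"],
    ["Curiosity", "Opportunity", "and Spirit"],
]
result = clean_candidate_lists(raw)
print(result)
```

In this example the second list is discarded because its only item is a stopword, and "first class!" is dropped as a duplicate of "First Class" after normalization.
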