Exploring the Query-Flow Graph with a Mixture Model for Query Recommendation
Lu Bai, Jiafeng Guo, Xueqi Cheng, Xiubo Geng, Pan Du
Institute of Computing Technology, CAS

Outline
• Introduction
• Our approach
• Experimental results
• Conclusion & Future work

Introduction
• Query recommendation
  – Generated from web query logs
  – Different types of information are considered, including search results, clickthrough data, and search sessions.
• Recently, the query-flow graph was introduced into query recommendation.
  [Figure: example query-flow graph over the queries Yahoo, Yahoo mail, Yahoo messenger, 360, Xbox 360, Xbox 720, Kinect, apple, and apple tree, with edge weights of 1 and 2.]
• Traditionally, a personalized random walk over the query-flow graph was used for recommendation.
• Dangling queries
  – No out-links
  – Nearly 9% of all queries
• Ambiguous queries
  – Mixed recommendation
    • Hard to read
  – Dominant recommendation
    • Cannot satisfy different needs
• Example recommendations
  – Query = apple: Yahoo, apple tree, Yahoo mail
  – Query = 360: Xbox 360, Xbox 720, Kinect

Our Work
• Explore the query-flow graph for better recommendation
  – Apply a novel mixture model over the query-flow graph to learn the intents of queries.
  – Perform an intent-biased random walk on the query-flow graph for recommendation.

Probabilistic model of generating query-flow graph
• Model the generation of the query-flow graph with a novel mixture model
• Assumptions
  – Queries are triggered by query intents.
  – Consecutive queries in one search session come from the same intent.
• Process of generating a directed edge $e_{ij}$
  – Draw an intent indicator $g = r$ from the multinomial distribution $\pi$.
  – Draw query nodes $q_i$ and $q_j$ from the same multinomial intent distribution $\theta_r$.
  – Draw the directed edge $e_{ij}$ from a binomial distribution $\phi_{ij,r}$.
• Likelihood function, with $N$ queries, $K$ intents, $w_{ij}$ the weight of edge $e_{ij}$, and $C(i)$ the set of queries that $q_i$ links to:

$$\Pr(G \mid \pi, \theta, \phi) = \prod_{i=1}^{N} \prod_{j:\, j \in C(i)} \left( \sum_{r=1}^{K} \pi_r \, \theta_{r,i} \, \theta_{r,j} \, \phi_{ij,r} \right)^{w_{ij}}$$

• The EM algorithm is used to estimate the parameters.
  – E step

$$q_{ij,r} = \frac{\pi_r \, \theta_{r,i} \, \theta_{r,j} \, \phi_{ij,r}}{\sum_{r'=1}^{K} \pi_{r'} \, \theta_{r',i} \, \theta_{r',j} \, \phi_{ij,r'}}$$

  – M step

$$\pi_r = \frac{\sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r}}{\sum_{r'=1}^{K} \sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r'}}$$

$$\phi_{ij,r} = \frac{w_{ij} \, q_{ij,r}}{w_{ij} \, q_{ij,r} + w_{ji} \, q_{ji,r}}$$

$$\theta_{r,i} = \frac{\sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r} + \sum_{k:\, i \in C(k)} w_{ki} \, q_{ki,r}}{\sum_{i'=1}^{N} \left( \sum_{j:\, j \in C(i')} w_{i'j} \, q_{i'j,r} + \sum_{k:\, i' \in C(k)} w_{ki'} \, q_{ki',r} \right)}$$

Intent-biased random walk
• Based on the learned query intents, we apply an intent-biased random walk for query recommendation.

$$A_{i,r} = (1 - \lambda) M + \lambda \, \mathbf{1} \, P_{i,r}, \qquad P_{i,r} = \rho \cdot e_i^{T} + (1 - \rho) \, \theta_r, \quad \rho \in [0, 1]$$

  – $A_{i,r}$: transition probability matrix
  – $M$: row-normalized weight matrix
  – $P_{i,r}$: preference vector
  – $e_i^{T}$: all entries are zero, except that the $i$-th is 1
  – $\theta_r$: a row vector, the query distribution of intent $r$
• Dangling queries: back off to their intents
• Ambiguous queries: recommend under each intent

Experiments
• Data set
  – A 3-month query log generated from a commercial search engine.
  – Sessions are split with a 30-minute threshold.
  – No stemming and no stop-word removal.
  – The largest connected graph is extracted for the experiments; it consists of 16,980 queries and 51,214 edges.
• Learning performance for different numbers of intents.
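
The E and M steps of the mixture model can be sketched in plain Python on a toy graph. This is only an illustrative sketch, not the authors' implementation. The function name `em_query_flow`, the variable names `pi` (intent prior), `theta` (per-intent query distribution), `phi` (edge-direction probability), and `q` (edge posterior), the random initialization, the fixed iteration count, and the toy edge weights are all assumptions.

```python
import random


def em_query_flow(edges, n_queries, n_intents, n_iter=50, seed=0):
    """Fit the mixture model by EM (illustrative sketch).

    edges maps (i, j) -> w_ij, the weight of the directed edge q_i -> q_j.
    Returns (pi, theta, phi, q).
    """
    rng = random.Random(seed)
    N, K = n_queries, n_intents

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Assumed initialization: uniform pi and phi, random theta.
    pi = [1.0 / K] * K
    theta = [normalized([rng.random() + 0.1 for _ in range(N)])
             for _ in range(K)]
    phi = {e: [0.5] * K for e in edges}
    q = {}

    for _ in range(n_iter):
        # E step: posterior q_{ij,r} of intent r for each edge.
        for (i, j) in edges:
            joint = [pi[r] * theta[r][i] * theta[r][j] * phi[(i, j)][r]
                     for r in range(K)]
            total = sum(joint)
            q[(i, j)] = ([x / total for x in joint] if total > 0
                         else [1.0 / K] * K)

        # M step for pi_r: weighted share of edges explained by intent r.
        mass = [sum(w * q[e][r] for e, w in edges.items()) for r in range(K)]
        pi = normalized(mass)

        # M step for theta_{r,i}: weighted out- plus in-edge mass of node i.
        for r in range(K):
            node_mass = [0.0] * N
            for (i, j), w in edges.items():
                node_mass[i] += w * q[(i, j)][r]
                node_mass[j] += w * q[(i, j)][r]
            if sum(node_mass) > 0:
                theta[r] = normalized(node_mass)

        # M step for phi_{ij,r}: share of i -> j among both directions.
        for (i, j), w in edges.items():
            for r in range(K):
                forward = w * q[(i, j)][r]
                backward = (edges[(j, i)] * q[(j, i)][r]
                            if (j, i) in edges else 0.0)
                denom = forward + backward
                phi[(i, j)][r] = forward / denom if denom > 0 else 0.5

    return pi, theta, phi, q


# Toy graph with made-up weights: two loose clusters joined by one edge.
toy_edges = {(0, 1): 2, (1, 2): 1, (2, 0): 1,
             (3, 4): 1, (4, 5): 2, (5, 3): 1, (2, 3): 1}
pi, theta, phi, q = em_query_flow(toy_edges, n_queries=6, n_intents=2)
```

EM only finds a local optimum, so different seeds may yield different intents. In practice one would likely monitor the log-likelihood and keep the best of several restarts.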
Experiments
• Learned query intents:

  lyrics            cars             poems
  ----------------  ---------------  ------------------
  lyrics            bmw              poems
  song lyrics       lexus            love poems
  lyrics com        audi             poetry
  a z lyrics        toyota           friendship poems
  music lyrics      acura            famous love poems
  azlyrics          nissan           love quotes
  lyric             infiniti         sad poems
  az lyrics         mercedes benz    quotes
  rap lyrics        volvo            mother s day poems
  country lyrics    mercedes         mothers day poems

• Dangling query suggestion
• Ambiguous query suggestion

  Query = hilton
  Baseline          Ours
  ----------------  ----------------
  marriott          [hotel]
  expedia           marriott
  holiday inn       holiday inn
  hyatt             sheraton
  hotel             hampton inn
  mapquest          embassy suites
  hampton inn       hotels com
  sheraton          [celebrity]
  hilton com        paris hilton
  hotels com        michelle wie
  embassy suites    nicole richie
  residence inn     jessica simpson
  choice hotels     pamela anderson
  marriot           daniel dipiero
  hilton honors     richard hatch

  Query = yamaha motor
  Baseline          Ours
  ----------------  ------------------
  mapquest          yamaha
  american idol     honda
  yahoo mail        suzuki
  home depot        kawasaki
  bank of america   yamaha motorcycles
  target            yamaha motorcycle

• Performance improvement based on user click behaviors

                      Baseline method   Our approach
  ------------------  ---------------   --------------
  Average Hit Number  4.09              4.21 (+2.9%)
  Average Hit Score   0.598             0.652 (+9.0%)
  Average Score       0.181             0.194 (+7.1%)

Conclusion and Future work
• Conclusion
  – We explore the query-flow graph with a novel probabilistic mixture model for learning query intents.
  – An intent-biased random walk is introduced to integrate the learned intents for recommendation.
• Future work
  – Learn query intents with more auxiliary information: clicks, URLs, words, etc.
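
To make the intent-biased random walk from the conclusion concrete, the following Python sketch runs power iteration on a walk whose restart distribution mixes the query itself with one intent's query distribution. It is a hedged illustration, not the authors' code. The function names, the restart weight `lam`, the mixing weight `rho`, their default values, and the toy graph are assumptions. Replacing a dangling query's empty transition row with the preference vector is also an assumption made here, so that every row sums to 1.

```python
def intent_biased_walk(weights, i, theta_r, lam=0.15, rho=0.5, n_iter=100):
    """Approximate the stationary scores of an intent-biased random walk.

    weights[k][j] is the edge weight from query k to query j;
    theta_r is the query distribution of one intent (sums to 1).
    """
    n = len(weights)

    # Preference vector: rho on the input query i, (1 - rho) on the intent.
    pref = [(1.0 - rho) * t for t in theta_r]
    pref[i] += rho

    row_sums = [sum(row) for row in weights]

    x = pref[:]
    for _ in range(n_iter):
        # Restart part of the transition: lam * preference vector.
        nxt = [lam * p for p in pref]
        for k in range(n):
            if row_sums[k] > 0:
                # Follow out-links in proportion to row-normalized weights.
                for j, w in enumerate(weights[k]):
                    if w:
                        nxt[j] += (1.0 - lam) * x[k] * w / row_sums[k]
            else:
                # Assumption: a dangling query backs off to the preference.
                for j in range(n):
                    nxt[j] += (1.0 - lam) * x[k] * pref[j]
        x = nxt
    return x


def recommend(weights, i, theta_r, top_n=3, **walk_kwargs):
    """Rank queries other than i by their walk score, highest first."""
    scores = intent_biased_walk(weights, i, theta_r, **walk_kwargs)
    ranked = sorted((j for j in range(len(scores)) if j != i),
                    key=lambda j: -scores[j])
    return ranked[:top_n]


# Toy graph with made-up weights: query 0 is ambiguous and links to two
# chains, 0 -> 1 -> 2 and 0 -> 3 -> 4; queries 2 and 4 are dangling.
toy_weights = [
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
theta_a = [0.2, 0.4, 0.4, 0.0, 0.0]  # hypothetical intent A
theta_b = [0.2, 0.0, 0.0, 0.4, 0.4]  # hypothetical intent B

under_a = recommend(toy_weights, 0, theta_a, top_n=2)
under_b = recommend(toy_weights, 0, theta_b, top_n=2)
```

Running the walk once per intent gives one recommendation list per intent for an ambiguous query. A dangling query still receives scores because the restart distribution carries its intent.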