Mixture model on query

Exploring the Query-Flow Graph
with a Mixture Model for
Query Recommendation
Lu Bai, Jiafeng Guo, Xueqi Cheng,
Xiubo Geng, Pan Du
Institute of Computing Technology , CAS
Outline
• Introduction
• Our approach
• Experimental results
• Conclusion & Future work
Introduction
• Query recommendation
– Generated from web query log
– Different types of information
are considered, including
search results, clickthrough
data, search sessions.
Introduction
• Recently, the query-flow graph was introduced
into query recommendation.
Yahoo → 360
Yahoo → Yahoo mail
360 → Xbox 360 → kinect
Kinect → Xbox 720
Yahoo messenger → Yahoo
Yahoo mail → Yahoo messenger
360 → Xbox 360 → Xbox 720
apple → Yahoo
apple → apple tree
[Figure: query-flow graph built from these sessions; every edge has weight 1 except 360 → Xbox 360, which has weight 2]
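The construction of the graph from sessions can be sketched as follows. This is a minimal illustration, assuming that an edge weight is simply the number of times one query is immediately followed by another in a session; lowercasing the queries and the function name `build_query_flow_graph` are assumptions made for this example.

```python
from collections import Counter

def build_query_flow_graph(sessions):
    """Count how often query a is immediately followed by query b."""
    weights = Counter()
    for session in sessions:
        # Assumption: queries are matched case-insensitively.
        queries = [q.strip().lower() for q in session]
        for a, b in zip(queries, queries[1:]):
            if a != b:  # assumption: ignore immediate repeats of a query
                weights[(a, b)] += 1
    return weights

# The example sessions from the slide.
sessions = [
    ["Yahoo", "360"],
    ["Yahoo", "Yahoo mail"],
    ["360", "Xbox 360", "kinect"],
    ["Kinect", "Xbox 720"],
    ["Yahoo messenger", "Yahoo"],
    ["Yahoo mail", "Yahoo messenger"],
    ["360", "Xbox 360", "Xbox 720"],
    ["apple", "Yahoo"],
    ["apple", "apple tree"],
]
graph = build_query_flow_graph(sessions)
```

Here `graph` maps each directed edge, such as `("360", "xbox 360")`, to its count. Other weighting schemes are possible; raw counts are used only to keep the sketch short.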
Introduction
• Traditionally, personalized random walk over the query-flow graph was used for recommendation.
• Dangling queries
– No out links
– Nearly 9% of all queries
• Ambiguous queries
– Mixed recommendation: hard to read
– Dominant recommendation: cannot satisfy different needs
Query = apple: Yahoo, apple tree, Yahoo mail
Query = 360: Xbox 360, Xbox 720, Kinect
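As a small illustration of the dangling-query problem, the sketch below lists the queries with no out links in a weighted edge dictionary. The edges are taken from the example sessions on the earlier slide; the dictionary layout and the function name `dangling_queries` are assumptions for this example.

```python
def dangling_queries(weights):
    """Return the queries that appear in the graph but have no out links."""
    nodes = {q for edge in weights for q in edge}
    has_out_link = {source for (source, _target) in weights}
    return sorted(nodes - has_out_link)

# Edges of the example graph: (source query, target query) -> weight.
weights = {
    ("yahoo", "360"): 1,
    ("yahoo", "yahoo mail"): 1,
    ("360", "xbox 360"): 2,
    ("xbox 360", "kinect"): 1,
    ("kinect", "xbox 720"): 1,
    ("yahoo messenger", "yahoo"): 1,
    ("yahoo mail", "yahoo messenger"): 1,
    ("xbox 360", "xbox 720"): 1,
    ("apple", "yahoo"): 1,
    ("apple", "apple tree"): 1,
}
dangling = dangling_queries(weights)
```

A plain random walk started at one of these queries has no edge to follow, which is why such queries need special handling.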
Our Work
• Explore query-flow graph for better
recommendation
– Apply a novel mixture model over query-flow
graph to learn the intents of queries.
– Perform an intent-biased random walk on the
query-flow graph for recommendation.
Probabilistic model of generating
query-flow graph
• Model the generation of the query-flow graph
with a novel mixture model
• Assumptions
– Queries are triggered by query intents.
– Consecutive queries in one search session are
from the same intent.
Probabilistic model of generating
query-flow graph
• Process of generating a directed edge $e_{ij}$
– Draw an intent indicator $g_{ij} = r$ from the multinomial distribution $\pi$.
– Draw query nodes $q_i$, $q_j$ from the same multinomial intent distribution $\theta_r$, respectively.
– Draw the directed edge $e_{ij}$ from a binomial distribution $\tau_{ij,r}$.
Likelihood function
\[
\Pr(G \mid \pi, \theta, \tau) = \prod_{i=1}^{N} \; \prod_{j:\, j \in C(i)} \left( \sum_{r=1}^{K} \pi_r \, \theta_{r,i} \, \theta_{r,j} \, \tau_{ij,r} \right)^{w_{ij}}
\]
Probabilistic model of generating
query-flow graph
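• EM algorithm is used to estimate parameters
– E step
\[
q_{ij,r} = \frac{\pi_r \, \theta_{r,i} \, \theta_{r,j} \, \tau_{ij,r}}{\sum_{r'=1}^{K} \pi_{r'} \, \theta_{r',i} \, \theta_{r',j} \, \tau_{ij,r'}}
\]
– M step
\[
\pi_r = \frac{\sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r}}{\sum_{r'=1}^{K} \sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r'}}
\]
\[
\tau_{ij,r} = \frac{w_{ij} \, q_{ij,r}}{w_{ij} \, q_{ij,r} + w_{ji} \, q_{ji,r}}
\]
\[
\theta_{r,i} = \frac{\sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r} + \sum_{k:\, i \in C(k)} w_{ki} \, q_{ki,r}}{\sum_{i=1}^{N} \left( \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r} + \sum_{k:\, i \in C(k)} w_{ki} \, q_{ki,r} \right)}
\]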
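The EM updates can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the random initialization, the fixed iteration count, the small smoothing constant `eps`, and all names (`em_query_intents`, `prior`, `theta`, `tau`, `resp`) are assumptions.

```python
import random

def em_query_intents(weights, num_intents, iterations=50, seed=0, eps=1e-12):
    """Fit the mixture model to a weighted edge dict {(i, j): w_ij}."""
    rng = random.Random(seed)
    nodes = sorted({q for edge in weights for q in edge})
    K = num_intents
    prior = [1.0 / K] * K
    theta = []
    for _ in range(K):  # random start (assumption; not stated on the slides)
        raw = {q: rng.random() + 0.1 for q in nodes}
        z = sum(raw.values())
        theta.append({q: v / z for q, v in raw.items()})
    tau = {e: [0.5 if (e[1], e[0]) in weights else 1.0] * K for e in weights}

    for _ in range(iterations):
        # E step: responsibility of each intent r for each edge (i, j).
        resp = {}
        for (i, j) in weights:
            joint = [prior[r] * theta[r][i] * theta[r][j] * tau[(i, j)][r]
                     for r in range(K)]
            z = sum(joint)
            resp[(i, j)] = [v / z for v in joint]

        # M step: intent prior from the weighted responsibilities.
        mass = [eps + sum(w * resp[e][r] for e, w in weights.items())
                for r in range(K)]
        prior = [m / sum(mass) for m in mass]

        # M step: a query collects mass from its out-edges and its in-edges.
        for r in range(K):
            node_mass = dict.fromkeys(nodes, eps)
            for (i, j), w in weights.items():
                node_mass[i] += w * resp[(i, j)][r]
                node_mass[j] += w * resp[(i, j)][r]
            z = sum(node_mass.values())
            theta[r] = {q: m / z for q, m in node_mass.items()}

        # M step: direction probability, forward mass over forward + backward.
        for (i, j), w in weights.items():
            for r in range(K):
                fwd = w * resp[(i, j)][r]
                bwd = weights.get((j, i), 0) * resp.get((j, i), [0.0] * K)[r]
                if fwd + bwd > 0:
                    tau[(i, j)][r] = fwd / (fwd + bwd)
    return prior, theta, tau

weights = {
    ("yahoo", "360"): 1, ("yahoo", "yahoo mail"): 1, ("360", "xbox 360"): 2,
    ("xbox 360", "kinect"): 1, ("kinect", "xbox 720"): 1,
    ("yahoo messenger", "yahoo"): 1, ("yahoo mail", "yahoo messenger"): 1,
    ("xbox 360", "xbox 720"): 1, ("apple", "yahoo"): 1,
    ("apple", "apple tree"): 1,
}
prior, theta, tau = em_query_intents(weights, num_intents=2)
```

EM only finds a local optimum, so the learned intents depend on the starting point; in practice one would likely run several restarts and keep the run with the highest likelihood.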


Intent-biased random walk
• Based on the learned query intents, we apply an intent-biased random walk for query recommendation.
\[
A_{i,r} = (1 - \lambda) M + \lambda \mathbf{1} P_{i,r}
\]
where $A_{i,r}$ is the transition probability matrix, $M$ is the row-normalized weight matrix, and $P_{i,r}$ is the preference vector:
\[
P_{i,r} = \rho \cdot e_i^{T} + (1 - \rho) \cdot \theta_r, \quad \rho \in [0, 1]
\]
where all entries of $e_i^{T}$ are zero except the i-th, which is 1, and $\theta_r$ is a row vector holding the query distribution of intent r.
– Dangling queries: back off to their intents
– Ambiguous queries: recommend under each intent
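A minimal sketch of the intent-biased walk, computed by power iteration, is given below. Several details are assumptions made for illustration: the jump probability is called `teleport` with a default of 0.15, the mass sitting on dangling queries is redistributed through the preference vector, and the two hand-written intent distributions stand in for learned ones.

```python
def intent_biased_walk(weights, query, intent_dist, rho=0.5, teleport=0.15,
                       iterations=100):
    """Score every query for one (input query, intent) pair."""
    nodes = sorted({q for edge in weights for q in edge})
    out_total = dict.fromkeys(nodes, 0.0)
    for (a, _b), w in weights.items():
        out_total[a] += w

    # Preference vector: rho on the input query, the rest on the intent.
    # intent_dist is expected to sum to 1 over the queries in the graph.
    pref = {q: (1 - rho) * intent_dist.get(q, 0.0) for q in nodes}
    pref[query] += rho

    score = dict(pref)
    for _ in range(iterations):
        nxt = dict.fromkeys(nodes, 0.0)
        # Follow out links with the row-normalized edge weights.
        for (a, b), w in weights.items():
            nxt[b] += (1 - teleport) * score[a] * w / out_total[a]
        # Assumption: dangling queries send all their mass to the preference vector.
        dangling = sum(score[q] for q in nodes if out_total[q] == 0)
        jump = teleport + (1 - teleport) * dangling
        for q in nodes:
            nxt[q] += jump * pref[q]
        score = nxt
    return score

weights = {
    ("yahoo", "360"): 1, ("yahoo", "yahoo mail"): 1, ("360", "xbox 360"): 2,
    ("xbox 360", "kinect"): 1, ("kinect", "xbox 720"): 1,
    ("yahoo messenger", "yahoo"): 1, ("yahoo mail", "yahoo messenger"): 1,
    ("xbox 360", "xbox 720"): 1, ("apple", "yahoo"): 1,
    ("apple", "apple tree"): 1,
}
# Hypothetical intent distributions, written by hand for the example.
xbox_intent = {"xbox 360": 0.4, "kinect": 0.3, "xbox 720": 0.3}
yahoo_intent = {"yahoo": 0.4, "yahoo mail": 0.3, "yahoo messenger": 0.3}
under_xbox = intent_biased_walk(weights, "360", xbox_intent)
under_yahoo = intent_biased_walk(weights, "360", yahoo_intent)
```

For the ambiguous query 360, running the walk once per intent gives one ranked list per intent. In this toy graph the Yahoo-related queries receive a non-zero score only under the Yahoo intent, because no link leads from 360 back to them.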
Experiments
• Data Set
– A 3-month query log generated from a
commercial search engine.
– Sessions are split with a 30-minute threshold.
– No stemming and no stop-word removal.
– The largest connected graph is extracted for
experiments; it consists of 16,980 queries
and 51,214 edges.
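One common way to split a log into sessions is sketched below. It assumes that the 30-minute rule means a gap of more than 30 minutes between two consecutive queries of the same user starts a new session, and that log records are `(user_id, timestamp, query)` tuples; both points are assumptions, since the slides do not give these details.

```python
from datetime import datetime, timedelta

def split_sessions(records, gap=timedelta(minutes=30)):
    """Group (user_id, timestamp, query) records into per-user sessions."""
    sessions = []
    last_seen = {}   # user_id -> timestamp of the user's previous query
    current = {}     # user_id -> the user's open session (a list of queries)
    for user, ts, query in sorted(records, key=lambda r: (r[0], r[1])):
        if user in last_seen and ts - last_seen[user] <= gap:
            current[user].append(query)
        else:
            current[user] = [query]
            sessions.append(current[user])
        last_seen[user] = ts
    return sessions

# Hypothetical log records for two users.
records = [
    ("u1", datetime(2011, 1, 1, 10, 0), "hilton"),
    ("u1", datetime(2011, 1, 1, 10, 10), "hilton honors"),
    ("u1", datetime(2011, 1, 1, 11, 0), "paris hilton"),
    ("u2", datetime(2011, 1, 1, 10, 5), "yamaha motor"),
]
sessions = split_sessions(records)
```

Each resulting session is an ordered list of queries, which is the input needed to count consecutive query pairs for the graph.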
Experiments
• Learning performance for different numbers of
intents.
Experiments
• Learned query intents:
lyrics            cars              poems
----------------  ----------------  --------------------
lyrics            bmw               poems
song lyrics       lexus             love poems
lyrics com        audi              poetry
a z lyrics        toyota            friendship poems
music lyrics      acura             famous love poems
azlyrics          nissan            love quotes
lyric             infiniti          sad poems
az lyrics         mercedes benz     quotes
rap lyrics        volvo             mother s day poems
country lyrics    mercedes          mothers day poems
Experiments
• Dangling query suggestion
Query = yamaha motor
Baseline            Ours
------------------  ------------------
mapquest            yamaha
american idol       honda
yahoo mail          suzuki
home depot          kawasaki
bank of america     yamaha motorcycles
target              yamaha motorcycle
• Ambiguous query suggestion
Query = hilton
Baseline            Ours
------------------  ------------------
marriott            [hotel]
expedia             marriott
holiday inn         holiday inn
hyatt               sheraton
hotel               hampton inn
mapquest            embassy suites
hampton inn         hotels com
sheraton            [celebrity]
hilton com          paris hilton
hotels com          michelle wie
embassy suites      nicole richie
residence inn       jessica simpson
choice hotels       pamela anderson
marriot             daniel dipiero
hilton honors       richard hatch
Experiments
• Performance improvement based on user click
behaviors
                      Baseline method   Our approach
--------------------  ----------------  ----------------
Average Hit Number    4.09              4.21 (+2.9%)
Average Hit Score     0.598             0.652 (+9.0%)
Average Score         0.181             0.194 (+7.1%)
Conclusion and Future work
• Conclusion
– We explore the query-flow graph with a novel
probabilistic mixture model for learning query
intents.
– An intent-biased random walk is introduced to
integrate the learned intents for recommendation.
• Future work
– Learn query intents with more auxiliary
information: clicks, URLs, words, etc.