Evaluating the Robustness of
Learning from Implicit Feedback
Filip Radlinski
Thorsten Joachims
Presentation by
Dinesh Bhirud
bhiru002@d.umn.edu
Introduction
• The paper evaluates the robustness of learning to rank documents from implicit feedback.
• What is implicit feedback?
– Relevance feedback obtained from search engine log files.
– It is easier to collect large amounts of such training data than to collect relevance feedback explicitly.
Osmot
• Osmot – a search engine developed at Cornell University that learns from implicit feedback.
• The name Osmot comes from the word “osmosis” – learning from the users by osmosis.
• Query chains – sequences of reformulated queries.
– Osmot learns a ranked retrieval function by observing query chains and monitoring user clicks.
High Level Block Diagram
Data generation → User behavior simulation (based on the original ranking function) → Preference generation → SVM learning → User behavior simulation (based on the learned ranking function)
Data Generation
• A set of W words is chosen, with word frequencies obeying Zipf’s law.
• T topics are picked by drawing N words per topic uniformly from W.
• Each document d is generated as follows (see the sketch after this slide):
– Pick kd binomially from [0, T].
– Repeat kd times:
• Pick a topic t.
• Pick L/kd words from topic t.
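A minimal sketch of this generation process, assuming numpy; the sizes W, T, N, L and the binomial parameter are illustrative placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes (not given on the slide): vocabulary W, topics T,
# words per topic N, words per document L, binomial parameter for k_d.
W, T, N, L, P_TOPIC = 1000, 20, 30, 100, 0.1

# Word frequencies obeying a Zipf-like law
freq = 1.0 / np.arange(1, W + 1)
freq /= freq.sum()

# Each topic is N words picked uniformly from the W-word vocabulary
topics = [rng.choice(W, size=N, replace=False) for _ in range(T)]

def generate_document():
    """Generate one document following the slide's recipe."""
    k = max(1, rng.binomial(T, P_TOPIC))   # pick k_d binomially from [0, T]; avoid k_d = 0
    words = []
    for _ in range(k):
        t = rng.integers(T)                # pick a topic t
        p = freq[topics[t]] / freq[topics[t]].sum()
        words.extend(rng.choice(topics[t], size=L // k, p=p))  # pick L/k_d words from t
    return words

doc = generate_document()
```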
Relevance
• 3 kinds of relevance
– Relevance with respect to topic
• Can be measured/known because document collection and
topics are synthetic
• Used for evaluating the ranking function.
– Relevance with respect to query
• Actual relevance score of a document with respect to a
query
• Used to rank documents
– Observed relevance
• Relevance of a document as judged by the user seeing only
the abstract.
• Used to simulate user behavior.
User behavior parameters
• Noise – accuracy of the user’s relevance estimate
– Affects the observed relevance (obsRel).
– obsRel is drawn from an incomplete Beta distribution, where α gives the noise level and β is selected so that the mode is at rel(d, q) (see the sketch below).
• Threshold – user selectivity over results (rT)
• Patience – number of results the user looks at before giving up (rP)
• Reformulation – how likely the user is to reformulate the query (Preform)
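A small sketch of how obsRel could be drawn, assuming numpy’s Beta sampler; the α values for each noise level are not given on the slide, so the one used here is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

def observed_relevance(rel, alpha):
    """Draw obsRel from a Beta distribution whose mode sits at rel(d, q).

    alpha (> 1) is the noise parameter; beta is solved from the mode
    condition mode = (alpha - 1) / (alpha + beta - 2).
    """
    rel = min(max(rel, 1e-6), 1.0 - 1e-6)          # keep the mode strictly inside (0, 1)
    beta = (alpha - 1.0) * (1.0 - rel) / rel + 1.0
    return rng.beta(alpha, beta)

# e.g. a noisy estimate of a document whose true relevance is 0.7
print(observed_relevance(0.7, alpha=2.0))
```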
User Behavior Model
While the question T is unanswered:
1.1 Generate query q (let d1, d2, ..., dn be the results for q)
1.2 Start with the first document, i.e. i = 1
1.3 While patience (rP) > 0:
1.3.1 If obsRel(di, q) > rT:
1.3.1.1 If obsRel(di+1, q) > obsRel(di, q) + c, continue looking further down the list
1.3.1.2 Else di is a good document, click on it.
If rel(di, T) is 1, the user is done.
Decrease patience rP.
1.3.2 Else decrease patience: rP = rP − (rT − obsRel(di, q))
1.3.3 Set i = i + 1
1.4 With probability (1 − Preform), the user gives up; otherwise the query is reformulated (a Python transcription follows).
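A rough Python transcription of this loop; obs_rel and rel are assumed helper functions, the constant c and the default parameter values are placeholders, and the reformulation step is left to the caller:

```python
def simulate_user(results, query, topic, obs_rel, rel, r_t=0.5, r_p=3.0, c=0.1):
    """Simulate one user scanning a ranked result list for a single query.

    Returns (answered, clicked_positions).
    """
    clicks, patience, i = [], r_p, 0
    while patience > 0 and i < len(results):
        d = results[i]
        score = obs_rel(d, query)
        if score > r_t:
            nxt = obs_rel(results[i + 1], query) if i + 1 < len(results) else None
            if nxt is not None and nxt > score + c:
                pass                      # the next abstract looks even better: keep scanning
            else:
                clicks.append(i)          # d looks good enough: click it
                if rel(d, topic) == 1:
                    return True, clicks   # the document answers the question: done
                patience -= 1
        else:
            patience -= (r_t - score)     # an unpromising result costs patience
        i += 1
    return False, clicks                  # unanswered; caller reformulates with prob. P_reform
```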
User Preference Model
• Based on the clickthrough log files, users’ preferences over documents for a given query q can be found.
• The clickthrough logs are generated by simulating users.
• From the preferences, feature values are calculated.
Feedback Strategies
Single Query Strategy
• Click >q Skip Above
– For query q, if document di is clicked, di is preferred over the documents dj, j < i, that were skipped (sketched below).
• Click 1st >q No-Click 2nd
– For query q, if the first document is clicked, it is preferred over the second document in the list.
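A small sketch of the “Click >q Skip Above” strategy, assuming a click log represented as the ranked document list plus the clicked positions:

```python
def click_skip_above(ranked_docs, clicked_positions):
    """Generate (preferred, less preferred) pairs for a single query.

    A clicked document is preferred over every document ranked above it
    that the user skipped (did not click).
    """
    clicked = set(clicked_positions)
    prefs = []
    for i in clicked_positions:
        for j in range(i):
            if j not in clicked:
                prefs.append((ranked_docs[i], ranked_docs[j]))
    return prefs

# D2 clicked in the ranking [D1, D2, D3] yields the preference D2 > D1
print(click_skip_above(["D1", "D2", "D3"], [1]))   # [('D2', 'D1')]
```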
Feedback Strategies
2-Query Strategy 1
• This strategy uses two queries in a query chain, but document rankings only from the later query.
• Given an earlier query q' and a later query q in a query chain:
• Click >q' Skip Above
– If document di is clicked in query q, di is preferred with respect to q' over all dj, j < i, in q’s list.
• Click 1st >q' No-Click 2nd
– If the first document for q is clicked, it is preferred with respect to q' over the second document in the list for q.
Feedback Strategies
2-Query Strategy 2
• This strategy uses two queries in a query chain, and document rankings for both are used.
• Given an earlier query q' and a later query q in a query chain:
• Click >q' Skip Earlier Query
– If document di is clicked in query q, di is preferred with respect to q' over the documents the user saw for the earlier query (sketched below).
• Click >q' Top Two Earlier Query
– If no document was clicked for the earlier query q', a document di clicked in q is preferred with respect to q' over the top two results of the earlier query.
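A sketch of the “Click >q' Skip Earlier Query” idea, assuming we know which documents the user saw (but did not click) for the earlier query:

```python
def click_skip_earlier_query(clicked_docs_in_q, seen_docs_in_q_prime):
    """Documents clicked in the later query q are preferred, with respect to
    the earlier query q', over the documents the user saw for q'."""
    return [(d, e) for d in clicked_docs_in_q for e in seen_docs_in_q_prime]

# e.g. D4 clicked for the later query, D1 and D3 seen but not clicked earlier
print(click_skip_earlier_query(["D4"], ["D1", "D3"]))  # [('D4', 'D1'), ('D4', 'D3')]
```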
Example
Results for a query chain with two queries (D2 was clicked for Q1 and D4 for Q2):

Q1: D1, D2, D3
Q2: D4, D5, D6

Preferences
• D2 >q1 D1
• D4 >q2 D5
• D4 >q1 D5
• D4 >q1 D1
• D4 >q1 D3
Features
• Document di is mapped to a feature vector Φ(di, q) with respect to query q.
• Two types of features are defined:
– Rank features Φrank(di, q)
– Term/document features Φterm(di, q)
• The full feature vector is the concatenation of the two:

$\Phi(d_i, q) = \begin{pmatrix} \Phi_{rank}(d_i, q) \\ \Phi_{term}(d_i, q) \end{pmatrix}$
Rank Features
• Rank features allow representation of the ranking given by the existing static retrieval function.
• A simple TFIDF-weighted cosine similarity metric (rel0) is used as that function.
• 28 rank features are used, for ranks 1, 2, ..., 10, 15, 20, ..., 100.
• A rank feature is set to 1 if the document is at or above the specified rank in the rel0 ranking.
Term Features
• Term features allow representation of fine-grained relationships between query terms and documents.
• If document d is clicked for query q, then for each word w ∈ q, Φterm(d, w) is set to 1.
• This forms a sparse feature vector, as only very few words appear in a query (a sketch combining both feature types follows).
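A sketch of how Φ(d, q) could be assembled as a sparse feature map; the rank of d under rel0 is assumed known, and the feature names are made up for illustration:

```python
# 28 rank cutoffs: 1..10 and 15, 20, ..., 100
RANK_CUTOFFS = list(range(1, 11)) + list(range(15, 101, 5))

def phi(doc_id, doc_rank, query_terms):
    """Build Φ(d, q) = [Φ_rank ; Φ_term] as a sparse dict of feature -> value."""
    features = {}
    # Rank features: 1 if d is at or above the cutoff in the rel0 (TFIDF) ranking
    for cutoff in RANK_CUTOFFS:
        if doc_rank <= cutoff:
            features[f"rank<={cutoff}"] = 1.0
    # Term features: one feature per (query word, document) pair
    for w in query_terms:
        features[f"term:{w}:{doc_id}"] = 1.0
    return features

# e.g. a document ranked 7th by rel0 for a two-word query
print(phi("d42", 7, ["svm", "ranking"]))
```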
Learning
• The retrieval function rel(di, q) is defined as

$rel(d_i, q) = \vec{w} \cdot \Phi(d_i, q)$

where $\vec{w}$ is the weight vector.
• Intuitively, the weight vector assigns a weight to each feature identified.
• The task of learning a ranking function is thus reduced to the task of learning an optimal weight vector.
How does w affect ranking?
• Points are ordered by their projections onto w.
• For w1 the ordering will be 1, 2, 3, 4.
• For w2 the ordering will be 2, 3, 1, 4.
• A weight vector w needs to be learnt that minimizes the number of discordant rankings.
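A tiny numeric illustration of ordering by projection onto w; the points and weight vectors here are made up, not taken from the slide’s figure:

```python
import numpy as np

# Four 2-D feature points, identified by the ids 1..4
points = {1: np.array([3.0, 1.0]),
          2: np.array([1.0, 3.0]),
          3: np.array([2.0, 2.0]),
          4: np.array([0.5, 0.5])}

def order_by(w):
    """Order the point ids by decreasing projection onto the weight vector w."""
    return sorted(points, key=lambda p: -(points[p] @ w))

print(order_by(np.array([1.0, 0.0])))   # -> [1, 3, 2, 4]
print(order_by(np.array([0.0, 1.0])))   # -> [2, 3, 1, 4]
```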
Learning Problem
The learning problem can be formalized as follows.
• Find a weight vector $\vec{w}$ such that the maximum number of the following inequalities is fulfilled:

$\forall (d_i, d_j) \in r_1 \text{ with } r_1(d_i, q) > r_1(d_j, q): \quad \vec{w} \cdot \Phi(d_i, q) > \vec{w} \cdot \Phi(d_j, q)$

• Without slack variables, this is an NP-hard problem.
SVM Learning
• The equivalent optimization problem is:

Minimize
$\frac{1}{2}\, \vec{w} \cdot \vec{w} + C \sum_{i,j} \xi_{ij}$

subject to
$\forall (q, i, j): \quad \vec{w} \cdot \Phi(d_i, q) \ge \vec{w} \cdot \Phi(d_j, q) + 1 - \xi_{ij}$

rearranging which we get the constraint
$\forall (q, i, j): \quad \vec{w} \cdot (\Phi(d_i, q) - \Phi(d_j, q)) \ge 1 - \xi_{ij}$

with
$\forall i, j: \xi_{ij} \ge 0 \qquad \text{and} \qquad \forall i \in [1, 28]: w_i \ge 0.01$
Re-ranking using the learnt model
• The SVM-light package is used.
• The model provides the α·y values for all support vectors.
• User behavior is again simulated, this time using the learnt ranking function.
• How does re-ranking work?
– First, a ranked list of documents is obtained using the original ranking function.
– This list is then re-ordered using the feature weights obtained from the learnt model (see the sketch below).
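A sketch of the re-ranking step, assuming the α·y values and the support vectors’ feature vectors have already been parsed out of the SVM-light model file; phi_of is a hypothetical helper returning Φ(d, q) for the current query:

```python
import numpy as np

def weight_vector(alpha_y, support_vectors):
    """Recover w as the sum over support vectors of (alpha_i * y_i) * Φ_i."""
    w = np.zeros(len(support_vectors[0]))
    for ay, sv in zip(alpha_y, support_vectors):
        w += ay * np.asarray(sv, dtype=float)
    return w

def rerank(w, original_ranking, phi_of):
    """Re-order the original ranked list by the learnt score w . Φ(d, q)."""
    return sorted(original_ranking, key=lambda d: -np.dot(w, phi_of(d)))
```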
Experiments
• Experiments study the behavior of the search engine while varying parameters such as:
– Noise in the users’ relevance judgements
– Ambiguity of words in topics and queries
– The threshold above which the user considers a document good
– The users’ trust in the ranking
– The users’ probability of reformulating the query
Results - Noise
[Chart: ranking function performance (expected relevance) over learning iterations 0–6 at four noise levels: low, medium, high, and maximum noise.]
Noise – My experiment
• I implemented the extraction of preferences and their encoding as features.
[Chart: ranking function performance at various noise levels (my implementation), expected relevance over learning iterations 0–6 for low, medium, and high noise.]
Topic and Word Ambiguity
[Chart: ranking function performance (expected relevance) over learning iterations 0–6 at three levels of word ambiguity: no ambiguous words, somewhat ambiguous, and more ambiguous.]
Probability of user reformulating query
[Chart: expected relevance over learning iterations 0–6 for give-up probabilities of 25%, 50%, 75%, and 100%.]
Thank You