Evaluating the Robustness of Learning from Implicit Feedback
Filip Radlinski, Thorsten Joachims
Presentation by Dinesh Bhirud
bhiru002@d.umn.edu

Introduction
• The paper evaluates the robustness of learning to rank documents based on implicit feedback.
• What is implicit feedback?
– Relevance feedback obtained from search engine log files.
– It is much easier to collect large amounts of such training data than to collect relevance feedback explicitly.

Osmot
• Osmot – a search engine developed at Cornell University that learns from implicit feedback.
• The name Osmot comes from the word "osmosis" – learning from the users by osmosis.
• Query chains – sequences of reformulated queries.
– Osmot learns a ranked retrieval function by observing query chains and monitoring user clicks.

High Level Block Diagram
Data generation → User behavior simulation (based on the original ranking function) → Preference generation → SVM learning → User behavior simulation (based on the learned ranking function)

Data Generation
• A set of W words is chosen, with word frequencies obeying Zipf's law.
• T topics are picked by drawing N words per topic uniformly from W.
• Each document d is generated as follows:
– Pick k_d binomially from [0, T].
– Repeat k_d times:
• Pick a topic t.
• Pick L/k_d words from topic t.

Relevance
• Three kinds of relevance:
– Relevance with respect to a topic
• Can be measured exactly, because the document collection and topics are synthetic.
• Used for evaluating the ranking function.
– Relevance with respect to a query
• The actual relevance score of a document with respect to a query.
• Used to rank documents.
– Observed relevance
• The relevance of a document as judged by a user who sees only the abstract.
• Used to simulate user behavior.

User Behavior Parameters
• Noise – the accuracy of the user's relevance estimate.
– Affects observed relevance, obsRel(d, q).
– obsRel is drawn from an incomplete Beta distribution, where α gives the noise level and β is selected so that the mode is at rel(d, q).
• Threshold – user selectivity over results (r_T).
• Patience – the number of results the user looks at before giving up (r_P).
• Reformulation – how likely the user is to reformulate the query (P_reform).

User Behavior Model
While question T is unanswered:
1.1 Generate query q (let d_1, d_2, ..., d_n be the results for q).
1.2 Start with the first document, i.e., i = 1.
1.3 While patience r_P > 0:
1.3.1 If obsRel(d_i, q) > r_T:
1.3.1.1 If obsRel(d_{i+1}, q) > obsRel(d_i, q) + c, continue looking further down the list.
1.3.1.2 Else d_i is a good document – click on it. If rel(d_i, T) is 1, the user is done; otherwise decrease patience r_P.
1.3.2 Else decrease patience: r_P = r_P − (r_T − obsRel(d_i, q)).
1.3.3 Set i = i + 1.
1.4 With probability (1 − P_reform), the user gives up; otherwise the user reformulates.
(A runnable sketch of this loop is given below, after the single-query feedback strategies.)

User Preference Model
• From the clickthrough log files, users' preferences among documents for a given query q can be extracted.
• Clickthrough logs are generated by simulating users.
• From the preferences, feature values are calculated.

Feedback Strategies – Single Query Strategy
• Click >q Skip Above
– For query q, if document d_i is clicked, d_i is preferred over all d_j, j < i.
• Click 1st >q No-Click 2nd
– For query q, if document 1 is clicked, it is preferred over the 2nd document in the list.
(Both strategies are sketched in code below.)
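Before moving on to query chains, here is a minimal runnable Python sketch of the user behavior loop above. It is an illustration only: the function and parameter names (simulate_user, obs_rel, and the default values) are mine, and the paper's simulator additionally draws obsRel from the incomplete Beta noise model described earlier.

import random

def simulate_user(results, obs_rel, rel, topic,
                  r_T=0.5,       # threshold: minimum observed relevance to click
                  r_P=5.0,       # patience budget
                  c=0.1,         # look-ahead margin
                  p_reform=0.7): # probability of reformulating instead of giving up
    """One pass over a ranked result list for one query.

    Returns (clicks, answered, reformulate)."""
    clicks, i, patience = [], 0, r_P
    while patience > 0 and i < len(results):
        d = results[i]
        if obs_rel(d) > r_T:
            # 1.3.1.1: if the next abstract looks clearly better, keep scanning.
            if i + 1 < len(results) and obs_rel(results[i + 1]) > obs_rel(d) + c:
                pass
            else:
                # 1.3.1.2: d looks good, click it.
                clicks.append(d)
                if rel(d, topic) == 1:
                    return clicks, True, False  # question answered
                patience -= 1
        else:
            # 1.3.2: unattractive results drain patience proportionally.
            patience -= r_T - obs_rel(d)
        i += 1
    # 1.4: out of patience; the user gives up with probability 1 - p_reform.
    return clicks, False, random.random() < p_reform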
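Likewise, a sketch of how the two single-query strategies above could be read off one query's click log. The names are illustrative, not Osmot's; following the paper's intent, "Skip Above" is taken here to generate pairs only against unclicked documents ranked above the click.

def single_query_preferences(results, clicked):
    """results: ranked doc ids for query q; clicked: set of clicked doc ids.
    Returns a list of (preferred_doc, less_preferred_doc) pairs for q."""
    prefs = []
    # Click >q Skip Above: a clicked document is preferred over
    # every unclicked document ranked above it.
    for i, d in enumerate(results):
        if d in clicked:
            prefs.extend((d, d_above) for d_above in results[:i]
                         if d_above not in clicked)
    # Click 1st >q No-Click 2nd: a clicked top result is preferred
    # over an unclicked second result.
    if len(results) > 1 and results[0] in clicked and results[1] not in clicked:
        prefs.append((results[0], results[1]))
    return prefs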
Feedback Strategies – 2-Query Strategy 1
• This strategy uses two queries in a query chain, but document rankings only from the later query.
• Given queries q′ and q in a query chain:
• Click >q′ Skip Above
– For query q′, if document d_i is clicked in query q, d_i is preferred over all d_j, j < i.
• Click 1st >q′ No-Click 2nd
– For query q′, if document 1 is clicked, it is preferred over the 2nd document in the list for q.

Feedback Strategies – 2-Query Strategy 2
• This strategy uses two queries in a query chain, with document rankings from both queries used.
• Given queries q′ and q in a query chain:
• Click >q′ Skip Earlier Query
– For query q′, if document d_i is clicked in query q, d_i is preferred over the documents seen in the previous query.
• Click >q′ Top Two Earlier Query
– If no document was clicked for query q′, then d_i is preferred over the top two documents in the previous query.

Example
Result lists for a query chain (Q1 earlier, Q2 later), assuming the user clicked D2 for Q1 and D4 for Q2:
Q1: D1, D2, D3
Q2: D4, D5, D6
Generated preferences:
• D2 >q1 D1
• D4 >q2 D5
• D4 >q1 D5
• D4 >q1 D1
• D4 >q1 D3

Features
• Each document d_i is mapped to a feature vector Φ(d_i, q) with respect to query q.
• Two types of features are defined:
– Rank features Φ_rank(d_i, q)
– Term/document features Φ_term(d_i, q)
• Φ(d_i, q) is the concatenation of Φ_rank(d_i, q) and Φ_term(d_i, q).

Rank Features
• Rank features represent the ranking given by the existing static retrieval function.
• A simple TFIDF-weighted cosine similarity metric (rel_0) is used.
• 28 rank features are used, for ranks 1, 2, ..., 10, 15, 20, ..., 100.
• A rank feature is set to 1 if the clicked document is at or above the specified rank.

Term Features
• Term features represent fine-grained relationships between query terms and documents.
• If document d is clicked for query q, then Φ_term(d, w) is set to 1 for each word w ∈ q.
• This yields a sparse feature vector, since only very few words occur in any one query.
(A sketch of the full feature mapping follows the SVM Learning slide below.)

Learning
• The retrieval function is defined as rel(d_i, q) = w · Φ(d_i, q), where w is a weight vector.
• Intuitively, the weight vector assigns a weight to each feature.
• The task of learning a ranking function is thus reduced to learning an optimal weight vector.

How does w affect ranking?
• Points are ordered by their projections onto w.
• For w1 the ordering is 1, 2, 3, 4; for w2 it is 2, 3, 1, 4.
• A weight vector w must be learned that minimizes the number of discordant rankings.

Learning Problem
The learning problem can be formalized as follows:
• Find a weight vector w such that the maximum number of the following inequalities is fulfilled:
∀(d_i, d_j) such that d_i is ranked above d_j in the target ranking r1 for query q:
w · Φ(d_i, q) > w · Φ(d_j, q)
• Without slack variables, this is an NP-hard problem.

SVM Learning
• The equivalent optimization problem is:
Minimize: (1/2) w · w + C · Σ_ij ξ_ij
Subject to: ∀(q, i, j): w · Φ(d_i, q) ≥ w · Φ(d_j, q) + 1 − ξ_ij
Rearranging gives the constraint:
∀(q, i, j): w · (Φ(d_i, q) − Φ(d_j, q)) ≥ 1 − ξ_ij
with ∀ij: ξ_ij ≥ 0 and ∀i ∈ [1, 28]: w_i ≥ 0.01.
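To make the feature mapping concrete, here is a hedged sketch of Φ(d, q) as described on the Features slides above: 28 binary rank features over the original rel_0 ranking, concatenated with sparse term features indexed by (query word, document) pairs. The indexing scheme and names are assumptions for illustration; Osmot's actual encoding may differ.

import numpy as np

# 28 rank cutoffs: 1..10, then 15, 20, ..., 100 (10 + 18 = 28 features).
RANK_CUTOFFS = list(range(1, 11)) + list(range(15, 101, 5))

def phi(doc, query_words, rank_in_rel0, term_index):
    """Feature vector Φ(d, q) = [Φ_rank ; Φ_term].

    rank_in_rel0: 1-based rank of doc under the original TFIDF ranking rel_0
    term_index:   dict mapping (word, doc) -> column in the term block
    """
    # Rank feature r is 1 if the document appears at or above rank r.
    rank_feats = np.array([1.0 if rank_in_rel0 <= r else 0.0
                           for r in RANK_CUTOFFS])
    # Sparse term block: one indicator per (query word, document) pair.
    term_feats = np.zeros(len(term_index))
    for w in query_words:
        if (w, doc) in term_index:
            term_feats[term_index[(w, doc)]] = 1.0
    return np.concatenate([rank_feats, term_feats])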
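The optimization above is what the authors solve with SVM-Light. As a self-contained illustration of the same pairwise objective, here is a plain-NumPy subgradient sketch over preference pairs, including the paper's w_i ≥ 0.01 constraint on the rank-feature weights; the learning rate and epoch count are arbitrary choices of mine, not values from the paper.

import numpy as np

def train_ranking_svm(pref_pairs, C=0.1, lr=0.01, epochs=200,
                      n_rank=28, w_floor=0.01):
    """pref_pairs: list of (phi_preferred, phi_other) feature-vector pairs.

    Minimizes (1/2)||w||^2 + C * sum max(0, 1 - w.(phi_i - phi_j))
    by subgradient descent; ranking then sorts by rel(d, q) = w . Φ(d, q)."""
    w = np.zeros(len(pref_pairs[0][0]))
    for _ in range(epochs):
        grad = w.copy()  # gradient of the (1/2)||w||^2 regularizer
        for phi_i, phi_j in pref_pairs:
            diff = phi_i - phi_j
            if w @ diff < 1.0:   # margin violated; hinge term is active
                grad -= C * diff
        w -= lr * grad
        # Project onto the constraint w_i >= 0.01 for the 28 rank features.
        w[:n_rank] = np.maximum(w[:n_rank], w_floor)
    return w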
Re-ranking Using the Learnt Model
• The SVM-Light package is used.
• The model provides the αy values for all support vectors.
• User behavior is simulated again, this time using the learned ranking function.
• How does re-ranking work?
– First, a ranked list of documents is obtained using the original ranking function.
– This list is then re-ordered using the feature weights obtained from the learned model.

Experiments
• Experiments study the behavior of the search engine while varying parameters such as:
– Noise in users' relevance judgements
– Ambiguity of words in topics and queries
– The threshold value above which a user considers a document good
– Users' trust in the ranking
– Users' probability of reformulating a query

Results – Noise
[Chart: "Ranking function performance at various noise levels" – Expected Relevance (70–100) vs. learning iterations (0–6), with curves for low, medium, high, and maximum noise.]

Noise – My Experiment
• Implemented the extraction of preferences and their encoding as features.
[Chart: "Ranking function performance at various noise levels (my implementation)" – Expected Relevance (70–86) vs. learning iterations (0–6), with curves for low, medium, and high noise.]

Topic and Word Ambiguity
[Chart: "Ranking function performance at different levels of word ambiguity" – Expected Relevance (70–100) vs. learning iterations (0–6), with curves for no ambiguous words, somewhat ambiguous words, and more ambiguous words.]

Probability of User Reformulating the Query
[Chart: Expected Relevance (70–100) vs. learning iterations (0–6), with curves for 25%, 50%, 75%, and 100% give-up probability.]

Thank You