Learning to rank Web Science 2013

Jaspreet Singh
• Optimizing search engines using click through
data. Thorsten Joachims, SIGKDD 2002.
• Large Scale learning to rank. D. Sculley.
Machine Learning
Optimizing search engines using click through data
• Explicit feedback vs Click through data
• Click through data as triplets (q, r, c), where q is the query, r is the
ranking presented to the user, and c is the set of links the user clicked on
Assuming that the user scanned the ranking from top to bottom, they must have
observed link 2 before clicking on link 3, and so decided not to click on it.
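The rule described above can be sketched in a few lines of Python: a clicked link is taken to be preferred over every higher-ranked link that was skipped. The function name `preference_pairs` and the link labels are illustrative assumptions, not part of the original slides.

```python
def preference_pairs(ranking, clicked):
    """Return (preferred, less_preferred) pairs implied by the clicks.

    A clicked link is assumed to be preferred over every link that was
    ranked above it and not clicked (the user saw it and skipped it).
    """
    clicked = set(clicked)
    pairs = []
    for pos, link in enumerate(ranking):
        if link not in clicked:
            continue
        for skipped in ranking[:pos]:
            if skipped not in clicked:
                pairs.append((link, skipped))
    return pairs


# Hypothetical ranking of seven links with clicks on links 1, 3 and 7.
ranking = ["link1", "link2", "link3", "link4", "link5", "link6", "link7"]
clicked = ["link1", "link3", "link7"]
pairs = preference_pairs(ranking, clicked)
```

Note that the pairs are only relative preferences: nothing is inferred about links ranked below the last click, and a click on link 1 alone yields no pair at all.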
Learning of retrieval functions
• Exact ordering of documents close to impossible
• Measure similarity between the optimal ordering and a given ordering using
Kendall’s τ
• For binary relevance, maximizing Kendall’s τ is equivalent to minimizing the
average rank of the relevant documents, and it is also related to average precision.
• For a fixed but unknown distribution Pr(q,r∗) of queries and target
rankings on a document collection D with m documents, the goal is to
learn a retrieval function f(q) for which the expected Kendall’s τ is maximal
• Maximizing the expected τ is equivalent to minimizing a risk functional in which −τ is the loss.
• Empirical risk minimization principle states that the learning algorithm
should choose a hypothesis which minimizes the empirical risk
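As a small worked example of the measure used in this section, Kendall's τ for two strict orderings of the same m documents is τ = (P − Q) / (P + Q), where P is the number of concordant pairs and Q the number of discordant pairs. The sketch below assumes both rankings contain exactly the same documents with no ties; the function name `kendall_tau` and the document labels are illustrative.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two strict orderings of the same documents."""
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(ranking_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


target = ["d1", "d2", "d3", "d4", "d5"]
learned = ["d3", "d2", "d1", "d4", "d5"]
tau = kendall_tau(target, learned)  # 7 concordant, 3 discordant pairs
```

Here three of the ten pairs are discordant, so τ = (7 − 3) / 10 = 0.4. Averaging −τ over the training queries gives the empirical risk that the learner tries to minimize.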
Rank SVM
• Is it possible to design an algorithm and a family of ranking functions F so
that finding the function f in F that maximizes τ is efficient, and so that
this function generalizes well beyond the training data?
• Usage of weight vectors to adjust rank: documents are ordered by the score
w·Φ(q, d), where Φ(q, d) is a feature vector describing the match between
query q and document d.
• Instead of maximizing τ directly, it is equivalent to minimize the number of
discordant pairs in the calculation of τ. This is equivalent to finding the
weight vector w so that the maximum number of the following inequalities is
fulfilled: w·Φ(q, d_i) > w·Φ(q, d_j) for every pair in which the target
ranking places d_i above d_j.
Rank SVM
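• Maximizing the number of fulfilled inequalities is NP-hard, so the problem is
approximated in the same way as in SVM classification
• Introduce non-negative slack variables and minimize their sum, with a
regularization parameter C trading off margin size against training error
• Implementation: SVM light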
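To make the relaxed problem concrete, the following sketch evaluates a Rank SVM style objective, 0.5·‖w‖² + C·Σ slack, where the slack of a preference pair is the hinge loss max(0, 1 − w·(x_i − x_j)). This is only an illustration of the objective under the assumption of a linear scoring function. It is not the SVM light implementation, and the names `rank_svm_objective`, `good_w`, and `bad_w` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_svm_objective(w, pairs, C=1.0):
    """0.5 * ||w||^2 + C * (sum of slacks over all preference pairs).

    Each pair is (x_preferred, x_other). Its slack is the hinge loss
    max(0, 1 - w . (x_preferred - x_other)), which is zero only when the
    preferred document outscores the other one by a margin of at least 1.
    """
    slack_total = 0.0
    for better, worse in pairs:
        diff = [a - b for a, b in zip(better, worse)]
        slack_total += max(0.0, 1.0 - dot(w, diff))
    return 0.5 * dot(w, w) + C * slack_total


# Two toy preference pairs over 2-dimensional feature vectors.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([2.0, 1.0], [1.0, 1.0])]
good_w = [1.0, -1.0]   # orders both pairs correctly with margin >= 1
bad_w = [-1.0, 1.0]    # orders both pairs the wrong way round
```

With these toy values the objective is 1.0 for `good_w` (no slack, only the regularization term) and 6.0 for `bad_w` (slacks of 3 and 2), which shows how violated preferences are penalized.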
• Meta search engine used to collect results from the best search engines
and combine them into a single list by union.
• To be able to compare the quality of different retrieval functions, the key
idea is to present two rankings at the same time. Then measure which
ranking has more clicks.
Ranking A
1. D1
2. D2
3. D3
Ranking B
1. D4
2. D5
3. D6
Combined ranking
1. D1
2. D4
3. D2
4. D5
5. D3
6. D6
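The six-item list above merges Ranking A and Ranking B by alternating between them. A minimal sketch of such a merge, together with a simplified click count per source ranking, could look as follows. The function names are hypothetical, ranking A is assumed to go first, and the click attribution is deliberately cruder than the comparison used in the paper.

```python
def combine_rankings(a, b):
    """Alternate between two rankings, skipping documents already shown."""
    combined, seen = [], set()
    ka = kb = 0  # how many entries of a and b have been consumed
    while ka < len(a) or kb < len(b):
        # Take from a when b is exhausted, or when a has not run ahead of b.
        take_a = kb >= len(b) or (ka < len(a) and ka <= kb)
        if take_a:
            doc = a[ka]
            ka += 1
        else:
            doc = b[kb]
            kb += 1
        if doc not in seen:
            seen.add(doc)
            combined.append(doc)
    return combined


def clicks_per_ranking(clicked, a, b):
    """Count how many clicked documents each source ranking contributed."""
    return sum(d in a for d in clicked), sum(d in b for d in clicked)


ranking_a = ["D1", "D2", "D3"]
ranking_b = ["D4", "D5", "D6"]
combined = combine_rankings(ranking_a, ranking_b)
```

Because the user sees a single list, neither ranking gets a presentation advantage, and the ranking whose documents attract more clicks is judged the better one.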
• Offline experiment: verify that the Ranking SVM can indeed learn a
retrieval function maximizing Kendall’s tau on partial preference feedback.
• Split the collected queries into training and test set and then train the
classifier using SVM light.
• Result: Ranking SVM can learn regularities in the preferences. The more
training queries, the lower the error.
• Online experiment: verifies that the learned retrieval function does
improve retrieval quality as desired.
• The learned retrieval function is compared against: Google, MSNSearch,
• Result: More links from the learned ranking clicked on.
• The key insight is that such click through data can provide training data in
the form of relative preferences
• The experimental results show that the Ranking SVM can successfully
learn an improved retrieval function from click through data. Without any
explicit feedback or manual parameter tuning, it has automatically
adapted to the particular preferences of a group of 20 users (112 queries).
• There is a trade-off between the amount of training data (i.e. a large group)
and maximum homogeneity (i.e. a single user)
Large scale learning to rank
• Pair-wise learning to rank methods such as Rank SVM give good
performance, but suffer from the computational burden of optimizing an
objective defined over O(n²) possible pairs for data sets with n examples.
• Removal of super-linear dependence on training set size by sampling pairs
from an implicit pair-wise expansion and applying efficient stochastic
gradient descent learners for approximate SVMs
• The main approach of this paper is to adapt the pair-wise learning to rank
problem into the stochastic gradient descent framework
Optimization and stochastic gradient descent
• The paper is restricted to solving the classic Rank SVM optimization
problem, first posed by Joachims: Minimize the hinge loss.
• Stochastic gradient descent is a gradient descent optimization method for
minimizing an objective function that is written as a sum of differentiable
functions
• The generalization ability of stochastic gradient descent relies only on the
number of stochastic steps taken, not the size of the data set
Indexed Sampling - GetRandomPair
• 2 level nested hashmap
• First level : query is key
• Second level: rank is key
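The two-level index described above can be sketched with nested dictionaries: the outer key is the query, the inner key is the rank label, and each leaf holds the examples with that query and rank. The names `build_index` and `get_random_pair` are illustrative, and sampling queries uniformly is an assumption of this sketch rather than a detail taken from the slides.

```python
import random
from collections import defaultdict


def build_index(examples):
    """Build index[query_id][rank] -> list of feature vectors.

    examples: iterable of (query_id, rank_label, feature_vector).
    """
    index = defaultdict(lambda: defaultdict(list))
    for query_id, rank, features in examples:
        index[query_id][rank].append(features)
    return index


def get_random_pair(index, rng=random):
    """Sample two examples from one query that have different rank labels."""
    # Only queries with at least two distinct ranks can yield a pair.
    # (Recomputed on each call for brevity; a real sampler would cache it.)
    eligible = [q for q, by_rank in index.items() if len(by_rank) >= 2]
    query_id = rng.choice(eligible)
    rank_a, rank_b = rng.sample(sorted(index[query_id]), 2)
    return ((rank_a, rng.choice(index[query_id][rank_a])),
            (rank_b, rng.choice(index[query_id][rank_b])))


# Hypothetical data: query "q2" has a single rank level and is never sampled.
examples = [("q1", 2, [1.0, 0.0]), ("q1", 1, [0.0, 1.0]),
            ("q1", 1, [0.2, 0.9]), ("q2", 1, [0.5, 0.5])]
index = build_index(examples)
pair = get_random_pair(index, random.Random(0))
```

Each call costs a few dictionary lookups, so candidate pairs can be drawn without ever materializing the full O(n²) set of pairs.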
Stochastic gradient descent
• Stochastic implies sampling
• Gradient descent is a step wise process to find the local minimum of a
function
• Rank SVM has a hinge loss function. The hinge loss is used for "maximum-margin" classification.
• Hence we need to minimize this function and get a good classifier.
• The hinge loss is a convex function, so many of the usual convex
optimizers used in machine learning can work with it. Hence we can use
stochastic gradient descent.
• Depending on how they perform updates to the weight vector there are
many SGD variations.
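Putting the pieces of this slide together, a single SGD step samples one pair, forms the difference of the two feature vectors, and takes a subgradient step on the regularized hinge loss. The sketch below is one simple variant with a constant learning rate. The function name `sgd_rank_svm` and the hyperparameters `lam` and `eta` are assumptions for illustration, not the specific learner evaluated in the paper.

```python
import random


def sgd_rank_svm(pairs, dim, steps=2000, lam=0.01, eta=0.1, seed=0):
    """Minimize lam/2 * ||w||^2 + hinge loss over sampled pairs with SGD.

    pairs: list of ((x_a, rank_a), (x_b, rank_b)) with rank_a != rank_b.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        (x_a, y_a), (x_b, y_b) = rng.choice(pairs)
        diff = [a - b for a, b in zip(x_a, x_b)]
        sign = 1.0 if y_a > y_b else -1.0  # +1 when x_a should score higher
        margin = sign * sum(wi * di for wi, di in zip(w, diff))
        # Gradient of the regularizer: shrink w slightly on every step.
        w = [wi * (1.0 - eta * lam) for wi in w]
        # Subgradient of the hinge loss: only active when the margin is < 1.
        if margin < 1.0:
            w = [wi + eta * sign * di for wi, di in zip(w, diff)]
    return w


# Toy data: a higher rank label means the document should be ranked higher.
examples = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.8, 0.1], 2), ([0.2, 0.9], 1)]
pairs = [(a, b) for a in examples for b in examples if a[1] != b[1]]
w = sgd_rank_svm(pairs, dim=2)
```

The number of steps is chosen independently of how many pairs exist, which is where the training-time savings come from. Variants differ mainly in the learning-rate schedule and in whether w is projected after each update.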
LETOR Experiment and Results
• LETOR: Learning to Rank for Information Retrieval
• Ranking performance: comparable if not better
• Training speed: 100 times faster
• Click through data can be used as partial relevance feedback
• We can learn a retrieval function that can improve mean average precision
• Learning retrieval functions can be done on a large scale using stochastic
gradient descent.