Ranking with High-Order and Missing Information
M. Pawan Kumar
Ecole Centrale Paris
Aseem Behl, Puneet Dokania, Pritish Mohapatra, C. V. Jawahar

PASCAL VOC “Jumping” Classification
[Pipeline figure: Processing, Features, Training, Classifier; a ✗ is marked on the figure]
Think of a classifier !!!

PASCAL VOC “Jumping” Ranking
[Pipeline figure: Processing, Features, Training, Classifier; a ✗ is marked on the figure]
Think of a classifier !!!

Ranking vs. Classification
[Figure: six images ordered Rank 1 to Rank 6. Average Precision values shown across the builds: 1, 0.92, 0.81. Accuracy values shown: 1, 0.67.]

Ranking vs. Classification
Ranking is not the same as classification
Average precision is not the same as accuracy
Should we use 0-1 loss based classifiers?
Or should we use AP loss based rankers?

Outline
• Optimizing Average Precision (AP-SVM)
• High-Order Information
• Missing Information
Yue, Finley, Radlinski and Joachims, SIGIR 2007

Problem Formulation
Single Input X
$\Phi(x_i)$ for all $i \in P$
$\Phi(x_k)$ for all $k \in N$

Problem Formulation
Single Output R
$R_{ik} = +1$ if $i$ is better ranked than $k$
$R_{ik} = -1$ if $k$ is better ranked than $i$

Problem Formulation
Scoring Function
$s_i(w) = w^T \Phi(x_i)$ for all $i \in P$
$s_k(w) = w^T \Phi(x_k)$ for all $k \in N$
$S(X, R; w) = \sum_{i \in P} \sum_{k \in N} R_{ik} (s_i(w) - s_k(w))$

Ranking at Test-Time
$R(w) = \arg\max_R S(X, R; w)$
[Figure: eight samples, numbered 1 to 8]
Sort samples according to individual scores $s_i(w)$

Learning Formulation
Loss Function
$\Delta(R^*, R(w)) = 1 - \text{AP of rank } R(w)$
Non-convex
Parameter cannot be regularized

Learning Formulation
Upper Bound of Loss Function
$\Delta(R^*, R(w)) = S(X, R(w); w) + \Delta(R^*, R(w)) - S(X, R(w); w)$
$\leq S(X, R(w); w) + \Delta(R^*, R(w)) - S(X, R^*; w)$ (since $R(w)$ maximizes $S(X, R; w)$)
$\leq \max_R \{ S(X, R; w) + \Delta(R^*, R) \} - S(X, R^*; w)$
Convex
Parameter can be regularized
$\min_w \|w\|^2 + C \xi$
s.t. $S(X, R; w) + \Delta(R^*, R) - S(X, R^*; w) \leq \xi$, for all $R$

Optimization for Learning
Cutting Plane Computation
$\max_R S(X, R; w) + \Delta(R^*, R)$
[Figure: eight samples, numbered 1 to 8]
Sort positive samples according to scores $s_i(w)$
Sort negative samples according to scores $s_k(w)$
Find best rank of each negative sample independently

Optimization for Learning
[Chart: Training Time and Cutting Plane Computation, 0-1 vs. AP. Labels shown: "AP 5x slower", "Slightly faster".]
Mohapatra, Jawahar and Kumar, NIPS 2014

Experiments
Images: PASCAL VOC 2011
Classes (10 ranking tasks): Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
Features: Poselets
Cross-validation

AP-SVM vs. SVM
PASCAL VOC ‘test’ Dataset
[Chart: Difference in AP]
Better in 8 classes, tied in 2 classes

AP-SVM vs. SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Chart: Difference in AP]
AP-SVM is statistically better in 3 classes
SVM is statistically better in 0 classes

Outline
• Optimizing Average Precision
• High-Order Information (HOAP-SVM)
• Missing Information
Dokania, Behl, Jawahar and Kumar, ECCV 2014

High-Order Information
• People perform similar actions
• People strike similar poses
• Objects are of same/similar sizes
• “Friends” have similar habits
• How can we use them for ranking / classification?

Problem Formulation
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
$\Psi(x, y) = [\Psi_1(x, y); \Psi_2(x, y)]$
$\Psi_1(x, y)$: Unary Features
$\Psi_2(x, y)$: Pairwise Features

Learning Formulation
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
$\Delta(y^*, y)$ = Fraction of incorrectly classified persons

Optimization for Learning
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
$\max_y w^T \Psi(x, y) + \Delta(y^*, y)$
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search

Classification
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
$\max_y w^T \Psi(x, y)$
Graph Cuts (if supermodular)
LP Relaxation, or exhaustive search

Ranking?
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
Use difference of max-marginals

Max-Marginal for Positive Class
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
Best possible score when person $i$ is positive
$mm^+(i; w) = \max_{y, y_i = +1} w^T \Psi(x, y)$
Convex in $w$

Max-Marginal for Negative Class
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
Best possible score when person $i$ is negative
$mm^-(i; w) = \max_{y, y_i = -1} w^T \Psi(x, y)$
Convex in $w$

Ranking
Input $x = \{x_1, x_2, x_3\}$
Output $y \in \{-1, +1\}^3$
Use difference of max-marginals
HOB-SVM
$s_i(w) = mm^+(i; w) - mm^-(i; w)$
Difference-of-Convex in $w$

Ranking
Why not optimize AP directly?
High Order AP-SVM: HOAP-SVM
$s_i(w) = mm^+(i; w) - mm^-(i; w)$

Problem Formulation
Single Input X
$\Phi(x_i)$ for all $i \in P$
$\Phi(x_k)$ for all $k \in N$

Problem Formulation
Single Output R
$R_{ik} = +1$ if $i$ is better ranked than $k$
$R_{ik} = -1$ if $k$ is better ranked than $i$

Problem Formulation
Scoring Function
$s_i(w) = mm^+(i; w) - mm^-(i; w)$ for all $i \in P$
$s_k(w) = mm^+(k; w) - mm^-(k; w)$ for all $k \in N$
$S(X, R; w) = \sum_{i \in P} \sum_{k \in N} R_{ik} (s_i(w) - s_k(w))$

Ranking at Test-Time
$R(w) = \arg\max_R S(X, R; w)$
[Figure: eight samples, numbered 1 to 8]
Sort samples according to individual scores $s_i(w)$

Learning Formulation
Loss Function
$\Delta(R^*, R(w)) = 1 - \text{AP of rank } R(w)$

Learning Formulation
Upper Bound of Loss Function
$\min_w \|w\|^2 + C \xi$
s.t. $S(X, R; w) + \Delta(R^*, R) - S(X, R^*; w) \leq \xi$, for all $R$

Optimization for Learning
Difference-of-convex program
Very efficient CCCP
Linearization step by Dynamic Graph Cuts (Kohli and Torr, ECCV 2006)
Update step equivalent to AP-SVM

Experiments
Images: PASCAL VOC 2011
Classes (10 ranking tasks): Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
Features: Poselets
Cross-validation

HOB-SVM vs. AP-SVM
PASCAL VOC ‘test’ Dataset
[Chart: Difference in AP]
Better in 4, worse in 3 and tied in 3 classes

HOB-SVM vs. AP-SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Chart: Difference in AP]
HOB-SVM is statistically better in 0 classes
AP-SVM is statistically better in 0 classes

HOAP-SVM vs. AP-SVM
PASCAL VOC ‘test’ Dataset
[Chart: Difference in AP]
Better in 7, worse in 2 and tied in 1 class

HOAP-SVM vs. AP-SVM
Folds of PASCAL VOC ‘trainval’ Dataset
[Chart: Difference in AP]
HOAP-SVM is statistically better in 4 classes
AP-SVM is statistically better in 0 classes

Outline
• Optimizing Average Precision
• High-Order Information
• Missing Information (Latent-AP-SVM)
Behl, Jawahar and Kumar, CVPR 2014

Fully Supervised Learning

Weakly Supervised Learning
Rank images by relevance to ‘jumping’

Two Approaches
• Use Latent Structured SVM with AP loss
  – Unintuitive Prediction
  – Loose Upper Bound on Loss
  – NP-hard Optimization for Cutting Planes
• Carefully design a Latent-AP-SVM
  – Intuitive Prediction
  – Tight Upper Bound on Loss
  – Optimal Efficient Cutting Plane Computation

Results

Questions?
Code + Data Available
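
Worked example for the "Ranking vs. Classification" slides. The sketch below computes average precision for a ranking of six images and contrasts it with accuracy. The two imperfect rankings are assumptions: they were chosen because they reproduce the values 0.92 and 0.81 shown on the slide, and the slide's own images are not recoverable. Accuracy is computed under a further assumption, namely that the top three ranks are predicted positive.

```python
def average_precision(labels_in_rank_order):
    """AP of a ranking given as 1 (relevant) / 0 (irrelevant), best rank first."""
    hits = 0
    precisions = []
    for rank, label in enumerate(labels_in_rank_order, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def accuracy_at_cutoff(labels_in_rank_order, cutoff):
    """Accuracy when the top `cutoff` ranks are predicted positive (assumed rule)."""
    correct = sum(
        1
        for rank, label in enumerate(labels_in_rank_order, start=1)
        if (rank <= cutoff) == (label == 1)
    )
    return correct / len(labels_in_rank_order)


# Hypothetical rankings of three relevant and three irrelevant images.
perfect = [1, 1, 1, 0, 0, 0]
ranking_a = [1, 1, 0, 1, 0, 0]
ranking_b = [1, 0, 1, 1, 0, 0]

ap_perfect = average_precision(perfect)
ap_a = average_precision(ranking_a)
ap_b = average_precision(ranking_b)
acc_a = accuracy_at_cutoff(ranking_a, cutoff=3)
acc_b = accuracy_at_cutoff(ranking_b, cutoff=3)

print(round(ap_perfect, 2), round(ap_a, 2), round(ap_b, 2))
print(round(acc_a, 2), round(acc_b, 2))
```

Under these assumptions both imperfect rankings have the same accuracy while their AP differs, which is the distinction the slide draws between the two measures.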
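
Worked example for the AP-SVM "Problem Formulation", "Ranking at Test-Time" and "Upper Bound of Loss Function" slides. This is a minimal sketch on a toy problem with three positives and three negatives; the score values are made up and stand in for $w^T \Phi(x)$. It checks two claims from the slides by brute force over all orderings: sorting by individual scores maximizes $S(X, R; w)$, and the hinge-style quantity $\max_R \{S(X, R; w) + \Delta(R^*, R)\} - S(X, R^*; w)$ is at least the AP loss of the predicted ranking. The exhaustive search is only a stand-in for illustration; it is not the efficient cutting-plane procedure described on the slides.

```python
import itertools


def average_precision(order, pos):
    """AP of an ordering of sample ids, where `pos` holds the relevant ids."""
    hits, total = 0, 0.0
    for rank, sample in enumerate(order, start=1):
        if sample in pos:
            hits += 1
            total += hits / rank
    return total / len(pos)


def ranking_matrix(order, pos, neg):
    """R[i, k] = +1 if positive i is ranked above negative k, else -1."""
    position = {sample: r for r, sample in enumerate(order)}
    return {(i, k): 1 if position[i] < position[k] else -1
            for i in pos for k in neg}


def joint_score(order, scores, pos, neg):
    """S(X, R; w) = sum over i in P, k in N of R_ik * (s_i - s_k)."""
    R = ranking_matrix(order, pos, neg)
    return sum(R[i, k] * (scores[i] - scores[k]) for i in pos for k in neg)


# Toy data (assumed): ids 0-2 are positives, ids 3-5 are negatives.
pos, neg = [0, 1, 2], [3, 4, 5]
scores = [2.0, 0.5, -0.3, 1.0, -1.0, 0.1]
samples = pos + neg

# Test-time prediction: sort by individual score, highest first.
predicted = sorted(samples, key=lambda s: -scores[s])

# Brute force over all 720 orderings to confirm sorting maximizes S.
all_orders = list(itertools.permutations(samples))
best_joint = max(joint_score(o, scores, pos, neg) for o in all_orders)
sort_is_optimal = abs(joint_score(predicted, scores, pos, neg) - best_joint) < 1e-9

# Any ordering that places every positive above every negative yields R*.
ground_truth = pos + neg
loss = 1.0 - average_precision(predicted, pos)
upper_bound = max(
    joint_score(o, scores, pos, neg) + 1.0 - average_precision(o, pos)
    for o in all_orders
) - joint_score(ground_truth, scores, pos, neg)

print(predicted, sort_is_optimal, upper_bound >= loss)
```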
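
Worked example for the max-marginal slides. The sketch below computes $mm^+(i; w)$, $mm^-(i; w)$ and the score $s_i(w) = mm^+(i; w) - mm^-(i; w)$ for three persons by exhaustive search, which the slides list as one option alongside graph cuts and LP relaxation. The form of the joint score is an assumption made for illustration: each person contributes a unary term `unary[i] * y_i`, and each pair contributes a non-negative weight when both persons take the same label. The numbers are made up.

```python
import itertools


def joint_labelling_score(y, unary, pairwise):
    """Assumed stand-in for w^T Psi(x, y): unary terms plus agreement rewards."""
    score = sum(u * yi for u, yi in zip(unary, y))
    score += sum(weight for (i, j), weight in pairwise.items() if y[i] == y[j])
    return score


def max_marginal(i, label, unary, pairwise):
    """Best joint score over all labellings with person i fixed to `label`."""
    n = len(unary)
    return max(
        joint_labelling_score(y, unary, pairwise)
        for y in itertools.product((-1, +1), repeat=n)
        if y[i] == label
    )


def max_marginal_difference_scores(unary, pairwise):
    """s_i = mm+(i) - mm-(i) for every person i."""
    return [
        max_marginal(i, +1, unary, pairwise) - max_marginal(i, -1, unary, pairwise)
        for i in range(len(unary))
    ]


# Hypothetical values: three persons in one image, all pairs weakly coupled.
unary = [1.0, -0.2, -1.0]
pairwise = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 0.5}

s = max_marginal_difference_scores(unary, pairwise)
ranking = sorted(range(len(s)), key=lambda i: -s[i])

# The best joint labelling, for comparison with the signs of s.
best_y = max(
    itertools.product((-1, +1), repeat=len(unary)),
    key=lambda y: joint_labelling_score(y, unary, pairwise),
)

print(s, ranking, best_y)
```

When the best labelling is unique, person $i$ is positive in it exactly when $s_i(w) > 0$, so the difference of max-marginals orders the persons in a way that stays consistent with the joint classification.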