Modeling Latent Variable Uncertainty for Loss-based Learning
M. Pawan Kumar, Ben Packer, Daphne Koller
http://cvc.centrale-ponts.fr    http://dags.stanford.edu

Aim: Accurate parameter estimation from weakly supervised datasets.

Latent Variable Models
x: input.
y: output; values known during training.
h: latent variables (LV); values unknown during training.
Example (object detection): given an image x, predict the image class y (e.g. y = "Deer") and the object location h.

Latent SVM
Linear prediction rule with parameter w.
Test: max_{y,h} w^T Ψ(x,y,h)
Train: min_w Σ_i Δ(y_i, y_i(w), h_i(w))
Ψ: joint feature vector.  Δ: loss function; measures risk.
✔ Employs a user-defined loss function (with restricted form).
✖ Does not model uncertainty in LV.

The EM Algorithm
P_θ(y,h|x) = exp(θ^T Ψ(x,y,h)) / Z
Test: max_{y,h} θ^T Ψ(x,y,h)
Train: max_θ Σ_i log Σ_{h_i} P_θ(y_i,h_i|x_i)
✔ Models uncertainty in LV.
✖ Does not model accuracy of LV prediction.
✖ Does not employ a user-defined loss function.

Overview
Two distributions for two tasks:
• P_θ(h_i|y_i,x_i) models the uncertainty in the LV.
• P_w(y_i,h_i|x_i), a delta distribution at (y_i(w), h_i(w)), models the predicted output and LV.
Ideally the two learned distributions should match exactly; limited representational power prevents an exact match.

Objective: Minimize Rao's Dissimilarity Coefficient
min_{θ,w} Σ_i [ Σ_h Δ(y_i,h, y_i(w), h_i(w)) P_θ(h|y_i,x_i) − β Σ_{h,h'} Δ(y_i,h, y_i,h') P_θ(h|y_i,x_i) P_θ(h'|y_i,x_i) ]
The first term encourages prediction with the correct output and a high-probability LV; the second term measures the diversity of the conditional distribution.
Property 1: If the loss function is independent of h, we recover latent SVM.
Property 2: If P_θ is modeled as a delta distribution, we recover iterative latent SVM.

Optimization
Block coordinate descent over (w,θ):
1. Fix the delta distribution; optimize the conditional distribution.
   Case I: the delta distribution predicts the correct output, y = y(w). Increase the probability of the predicted LV h(w).
   Case II: the delta distribution predicts an incorrect output, y ≠ y(w). Increase the diversity of the conditional distribution.
2. Fix the conditional distribution; optimize the delta distribution. This encourages predicting the correct output and a high-probability LV; a difference-of-convex upper bound of the expected loss yields an efficient concave-convex procedure similar to latent SVM.

Results
Ground-truth LV values are known at test time (used to compute the test losses).

Object Detection (HOG features). No object scale variation; latent space = all possible pixel positions.
[Bar charts over folds 1-5, LSVM vs. ours: average 0/1 test loss, statistically significant improvement; average overlap test loss, not statistically significant.]

Action Detection (Poselet features). Large object scale variation; latent space = top k person detections.
[Bar charts over folds 1-5, LSVM vs. ours: average 0/1 test loss, statistically significant improvement; average overlap test loss, statistically significant improvement.]

Code available at http://cvc.centrale-ponts.fr/personnel/pawan
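The latent SVM prediction rule max_{y,h} w^T Ψ(x,y,h) can be sketched by brute-force enumeration over a small discrete output and latent space. The feature map `psi`, the label set, and the latent values below are hypothetical toy stand-ins, not the paper's actual HOG or poselet features:

```python
import numpy as np

def predict(w, x, psi, labels, latent_vals):
    """Latent SVM inference: argmax over (y, h) of w . psi(x, y, h)."""
    best_pair, best_score = None, -np.inf
    for y in labels:
        for h in latent_vals:
            score = w @ psi(x, y, h)
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair, best_score

# Hypothetical joint feature map: one (value, bias) slot per label,
# with the latent variable h shifting the input.
def psi(x, y, h):
    phi = np.zeros(4)
    phi[2 * y] = x + h
    phi[2 * y + 1] = 1.0
    return phi

w = np.array([1.0, 0.0, -1.0, 0.5])
(y_hat, h_hat), s = predict(w, x=2.0, psi=psi, labels=[0, 1], latent_vals=[-1, 0, 1])
# (y_hat, h_hat) = (0, 1): label 0 with the latent shift that maximizes the score.
```

In practice the latent space (e.g. all pixel positions) is too large to enumerate naively, which is why efficient inference per feature type matters.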
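The EM model's conditional distribution P_θ(y,h|x) = exp(θ^T Ψ(x,y,h)) / Z is a softmax over all (y,h) pairs, computable with the usual max-subtraction for numerical stability. A minimal sketch, again with a hypothetical toy feature map:

```python
import numpy as np

def conditional(theta, x, psi, labels, latent_vals):
    """P_theta(y, h | x) = exp(theta . psi(x, y, h)) / Z over all (y, h) pairs."""
    pairs = [(y, h) for y in labels for h in latent_vals]
    scores = np.array([theta @ psi(x, y, h) for y, h in pairs])
    scores -= scores.max()          # stabilize before exponentiating
    p = np.exp(scores)
    p /= p.sum()                    # divide by the partition function Z
    return dict(zip(pairs, p))

def psi(x, y, h):                   # hypothetical toy feature map
    return np.array([x * (y == 0), x * (y == 1), float(h)])

dist = conditional(np.array([0.5, -0.5, 1.0]), x=1.0, psi=psi,
                   labels=[0, 1], latent_vals=[0, 1])
# dist maps each (y, h) pair to its probability; the values sum to 1.
```

Marginalizing this table over h for a fixed y_i gives the per-sample conditional P_θ(h|y_i,x_i) used in the dissimilarity objective.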
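The per-sample term of Rao's dissimilarity coefficient combines the expected loss of the prediction under the conditional distribution with a diversity penalty weighted by β. The following sketch evaluates that term for one sample over a tiny latent space, with a hypothetical 0/1-style loss:

```python
def dissimilarity_term(delta, p_cond, y_i, y_pred, h_pred, latent_vals, beta):
    """Per-sample Rao dissimilarity:
    sum_h Delta(y_i, h, y_pred, h_pred) P(h)  -  beta * sum_{h,h'} Delta(y_i, h, y_i, h') P(h) P(h')."""
    expected_loss = sum(p_cond[h] * delta(y_i, h, y_pred, h_pred)
                        for h in latent_vals)
    diversity = sum(p_cond[h] * p_cond[hp] * delta(y_i, h, y_i, hp)
                    for h in latent_vals for hp in latent_vals)
    return expected_loss - beta * diversity

# Hypothetical 0/1-style loss over (output, latent) pairs.
def delta(y, h, y2, h2):
    return float(y != y2 or h != h2)

p = {0: 0.7, 1: 0.3}                # conditional P(h | y_i, x_i) over two latent values
val = dissimilarity_term(delta, p, y_i=1, y_pred=1, h_pred=0,
                         latent_vals=[0, 1], beta=0.1)
# expected loss 0.3, diversity 0.42, so val = 0.3 - 0.1 * 0.42 = 0.258
```

Summing this term over all training samples and minimizing over (θ, w) is exactly the objective the block coordinate descent above optimizes.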