Poster - M. Pawan Kumar

Modeling Latent Variable Uncertainty for Loss-based Learning
M. Pawan Kumar
Ben Packer
Daphne Koller
http://cvc.centrale-ponts.fr
http://dags.stanford.edu
Aim
Accurate parameter estimation from weakly supervised datasets.

Latent Variable Models
x: input; y: output (values known during training)
h: latent variables (LV) (values unknown during training)
Example (Object Detection): x = image, y = "Deer"
• Predict the image class y
• Predict the object location h
Latent SVM
Linear prediction rule with parameter w
Test: max_{y,h} wᵀΨ(x,y,h)
Train: min_w Σ_i Δ(y_i, y_i(w), h_i(w))
Ψ: joint feature vector; Δ: loss function that measures risk
✔ Employs a user-defined loss function (with restricted form)
✖ Does not model uncertainty in LV
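The latent SVM test-time rule above can be sketched as a brute-force enumeration over (y, h). The feature map psi below is a hypothetical toy stand-in for the joint feature vector Ψ (the poster uses HOG features); only the argmax structure matches the method.

```python
import numpy as np

# Toy latent SVM inference: argmax over (y, h) of w^T Psi(x, y, h).
# psi is an illustrative assumption, not the poster's feature vector.
N_CLASSES, N_LOCATIONS, DIM = 3, 4, 6

rng = np.random.default_rng(0)
w = rng.normal(size=DIM)                  # learned parameter vector w

def psi(x, y, h):
    """Illustrative joint feature vector Psi(x, y, h)."""
    v = np.zeros(DIM)
    v[y] = x[h]                           # couple the class with location evidence
    v[(y + h) % DIM] += 1.0
    return v

def predict(x):
    """Test-time rule: max over (y, h) of w^T Psi(x, y, h)."""
    best, best_score = None, -np.inf
    for y in range(N_CLASSES):
        for h in range(N_LOCATIONS):
            score = w @ psi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best, best_score

x = rng.normal(size=N_LOCATIONS)          # toy per-location evidence
(y_hat, h_hat), s_hat = predict(x)
```

Exhaustive enumeration is only viable for tiny latent spaces; in practice the argmax is computed by the model-specific inference routine.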
The EM Algorithm
P_θ(y,h|x) = exp(θᵀΨ(x,y,h)) / Z
Test: max_{y,h} θᵀΨ(x,y,h)
Train: max_θ Σ_i log Σ_{h_i} P_θ(y_i,h_i|x_i)
✔ Models uncertainty in LV
✖ Does not model accuracy of LV prediction
✖ Does not employ a user-defined loss function

Overview
Two distributions for two tasks:
• P_θ(h_i|y_i,x_i): models uncertainty in LV
• P_w(y_i,h_i|x_i): delta distribution on (y_i(w),h_i(w)); models the predicted output and LV
Ideally, the two learned distributions should match exactly; limited representational power prevents an exact match.

Objective
Minimize Rao's Dissimilarity Coefficient:
min_{θ,w} Σ_i [ Σ_h Δ(y_i,h,y_i(w),h_i(w)) P_θ(h|y_i,x_i) − β Σ_{h,h'} Δ(y_i,h,y_i,h') P_θ(h|y_i,x_i) P_θ(h'|y_i,x_i) ]
The first term encourages a prediction with the correct output and a high-probability LV.

Property 1: If the loss function is independent of h, we recover latent SVM.
Property 2: If P_θ is modeled as a delta distribution, we recover iterative latent SVM.

Optimization
Block coordinate descent over (w,θ):
• Fix the delta distribution; optimize the conditional distribution.
  Case I: the delta distribution predicts the correct output, y = y(w): increase the probability of the predicted LV h(w).
  Case II: the delta distribution predicts an incorrect output, y ≠ y(w): increase the diversity of the conditional distribution.
• Fix the conditional distribution; optimize the delta distribution: a difference-of-convex upper bound on the expected loss gives an efficient concave-convex procedure similar to latent SVM.

Code available at
http://cvc.centrale-ponts.fr/personnel/pawan

Results
Known ground-truth LV values at test time.
Object Detection (HOG features): no object scale variation; latent space = all possible pixel positions.
[Figure: Average 0/1 Test Loss per fold (Folds 1–5), LSVM vs. Our: Statistically Significant]
[Figure: Average Overlap Test Loss per fold (Folds 1–5), LSVM vs. Our: Not Statistically Significant]
Action Detection (Poselet features): large object scale variation; latent space = top k person detections.
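The per-example dissimilarity objective (expected loss of the delta-distribution prediction under P_θ, minus β times the expected pairwise loss, i.e. the diversity of P_θ) can be sketched numerically. The posterior p and toy_loss below are illustrative assumptions, not the released code.

```python
import numpy as np

# Numerical sketch of the per-example dissimilarity coefficient.
def dissimilarity(p, y_true, y_pred, h_pred, loss, beta=0.1):
    """Sum_h loss(y_i,h, y(w),h(w)) p[h]
       - beta * Sum_{h,h'} loss(y_i,h, y_i,h') p[h] p[h']."""
    H = len(p)
    expected = sum(loss(y_true, h, y_pred, h_pred) * p[h] for h in range(H))
    diversity = sum(loss(y_true, h, y_true, hp) * p[h] * p[hp]
                    for h in range(H) for hp in range(H))
    return expected - beta * diversity

def toy_loss(y, h, y2, h2):
    # 0/1 class loss plus a normalized latent-position mismatch
    return float(y != y2) + abs(h - h2) / 4.0

p = np.array([0.1, 0.6, 0.2, 0.1])        # toy posterior P(h | y_i, x_i)
d_correct = dissimilarity(p, y_true=0, y_pred=0, h_pred=1, loss=toy_loss)
d_wrong = dissimilarity(p, y_true=0, y_pred=1, h_pred=1, loss=toy_loss)
```

With this toy loss, predicting the wrong class raises the first term by exactly 1 for every latent value, while the diversity term depends only on P_θ, so d_wrong = d_correct + 1.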
[Figure: Average 0/1 Test Loss per fold (Folds 1–5), LSVM vs. Our: Statistically Significant]
[Figure: Average Overlap Test Loss per fold (Folds 1–5), LSVM vs. Our: Statistically Significant]
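The block coordinate descent over (w,θ) can be sketched on a toy single-example problem. The finite-difference θ-step below is a crude stand-in for the actual concave-convex procedure, and the discrete model, loss, and step size are all illustrative assumptions.

```python
import numpy as np

# Schematic block coordinate descent over (w, theta) for one example.
H, Y = 4, 2                 # latent space size, number of outputs
y_true, beta = 0, 0.1

def loss(y, h, y2, h2):
    return float(y != y2) + abs(h - h2) / H

def posterior(theta):
    # P_theta(h | y_true, x): softmax over the latent space
    e = np.exp(theta - theta.max())
    return e / e.sum()

def objective(theta, y_pred, h_pred):
    p = posterior(theta)
    expected = sum(loss(y_true, h, y_pred, h_pred) * p[h] for h in range(H))
    diversity = sum(loss(y_true, h, y_true, hp) * p[h] * p[hp]
                    for h in range(H) for hp in range(H))
    return expected - beta * diversity

theta = np.zeros(H)
y_pred, h_pred = 1, 3       # start from an incorrect prediction
start = objective(theta, y_pred, h_pred)
for _ in range(20):
    # Step 1: fix the delta distribution, update the conditional distribution
    # (finite-difference gradient step; the poster uses a CCCP-style update).
    grad, eps = np.zeros(H), 1e-5
    base = objective(theta, y_pred, h_pred)
    for j in range(H):
        t = theta.copy()
        t[j] += eps
        grad[j] = (objective(t, y_pred, h_pred) - base) / eps
    theta -= 0.5 * grad
    # Step 2: fix the conditional distribution, update the delta distribution
    # by enumerating all (y, h) choices.
    y_pred, h_pred = min(((y, h) for y in range(Y) for h in range(H)),
                         key=lambda yh: objective(theta, *yh))
end = objective(theta, y_pred, h_pred)
```

On this toy problem the delta-distribution step immediately recovers the correct output (the 0/1 term dominates), and the θ-step then concentrates mass on low-loss latent values, decreasing the objective.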