Loss-based Visual Learning with Weak Supervision
M. Pawan Kumar
Joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura Azzabou, Pierre Carlier

SPLENDID: Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data
• Machine learning with weak and noisy annotations; applications in computer vision and medical imaging
• A collaboration between Equipe Galen, INRIA Saclay (Nikos Paragios) and DAGS, Stanford (Daphne Koller), 2012-2013
• 2 visits from INRIA to Stanford, 1 visit from Stanford to INRIA, 3 visits planned
• Target venues: ICML, MICCAI

Medical Image Segmentation
• MRI acquisitions of the thigh
• Segments correspond to muscle groups

Random Walks Segmentation
• Probabilistic segmentation algorithm, computationally efficient
• Interactive segmentation (L. Grady, 2006)
• Automated shape-prior-driven segmentation (L. Grady, 2005; Baudin et al., 2012)

Let x be the medical acquisition and y(i,s) the probability that voxel i belongs to segment s. The random walker solves

  min_y E(x,y) = yᵀL(x)y + w_shape ||y - y_0||²

where L(x) is a positive semi-definite Laplacian matrix, ||y - y_0||² is a shape prior on the segmentation, and w_shape is a hand-tuned parameter of the RW algorithm. The problem is convex.

In practice there are several Laplacians, L(x) = Σ_α w_α L_α(x), and several shape and appearance priors, Σ_β w_β ||y - y_β||². Hand-tuning this large number of parameters is onerous.

Parameter Estimation
Learn the best parameters from training data. The energy is linear in the parameters:

  E(x,y) = Σ_α w_α yᵀL_α(x)y + Σ_β w_β ||y - y_β||² = wᵀΨ(x,y)

where w is the set of all parameters and Ψ(x,y) is the joint feature vector of the input and the output. (A code sketch of this energy minimization appears after the Hard vs. Soft Segmentation slide below.)

Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID

Supervised Learning
Given a dataset of segmented MRIs, the ground-truth hard segmentation of sample x_k is

  z_k(i,s) = 1 if s is the ground-truth segment of voxel i, and 0 otherwise.

But the random walker outputs a probabilistic segmentation; is a hard 0-1 labeling the right learning target? The structured-output support vector machine (Taskar et al., 2003; Tsochantaridis et al., 2004) learns w by solving

  min_w Σ_k ξ_k + λ||w||²
  s.t. wᵀΨ(x_k,ŷ) - wᵀΨ(x_k,z_k) ≥ Δ(ŷ,z_k) - ξ_k for all ŷ

where wᵀΨ(x_k,ŷ) is the energy of a candidate segmentation, wᵀΨ(x_k,z_k) is the energy of the ground truth, and the loss Δ(ŷ,z_k) is the fraction of incorrectly labeled voxels.

This problem is convex, with several efficient algorithms. However, no parameter setting makes the random walker produce a 'hard' segmentation; we only need a correct 'soft' probabilistic segmentation.

Hard vs. Soft Segmentation
• Hard segmentation z_k: we should not require 0-1 probabilities.
• A soft segmentation y_k is compatible with z_k if binarizing y_k gives z_k; we write y_k ∈ C(z_k).
• Which y_k ∈ C(z_k) should we use? Ideally, the one provided by the best parameters, but those are unknown.
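Both prediction and the compatibility-constrained imputation used below reduce to minimizing the quadratic energy wᵀΨ(x,y). Here is a minimal sketch of the unconstrained minimizer, assuming NumPy and dense matrices; the function name rw_segment and the dense linear solve are illustrative choices, not the implementation used in the talk. Setting the gradient to zero gives the linear system (Σ_α w_α L_α + (Σ_β w_β) I) y = Σ_β w_β y_β.

```python
import numpy as np

def rw_segment(laplacians, w_alpha, priors, w_beta):
    """Minimize sum_a w_a * y^T L_a y + sum_b w_b * ||y - y_b||^2 over y.

    laplacians: list of (n, n) positive semi-definite graph Laplacians L_a(x)
    w_alpha:    non-negative Laplacian weights
    priors:     list of (n, S) prior soft segmentations y_b (rows sum to 1)
    w_beta:     non-negative prior weights
    Returns an (n, S) matrix of per-voxel segment probabilities.
    """
    n = laplacians[0].shape[0]
    # Gradient = 0  =>  (sum_a w_a L_a + (sum_b w_b) I) y = sum_b w_b y_b
    A = sum(wa * L for wa, L in zip(w_alpha, laplacians)) + sum(w_beta) * np.eye(n)
    b = sum(wb * yb for wb, yb in zip(w_beta, priors))
    return np.linalg.solve(A, b)
```

Because each Laplacian satisfies L·1 = 0 and the priors are row-stochastic, the rows of the solution automatically sum to one, so the result is a valid soft segmentation even without explicit simplex constraints.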
Learning with Hard Segmentation

  min_w Σ_k ξ_k + λ||w||²
  s.t. wᵀΨ(x_k,ŷ) - wᵀΨ(x_k,z_k) ≥ Δ(ŷ,z_k) - ξ_k

Learning with Soft Segmentation
Replace the ground-truth energy by the energy of the best compatible soft segmentation:

  min_w Σ_k ξ_k + λ||w||²
  s.t. wᵀΨ(x_k,ŷ) - min_{y_k ∈ C(z_k)} wᵀΨ(x_k,y_k) ≥ Δ(ŷ,z_k) - ξ_k

This is a latent support vector machine (Smola et al., 2005; Felzenszwalb et al., 2008; Yu et al., 2009).

Optimization
The latent SVM objective is a difference-of-convex problem, which we solve with the concave-convex procedure (CCCP):
• Step 1: Estimate the soft segmentations with the current parameters,
    y_k* = argmin_{y_k ∈ C(z_k)} wᵀΨ(x_k,y_k),
  using efficient optimization based on dual decomposition.
• Step 2: Update the parameters by solving the convex problem
    min_w Σ_k ξ_k + λ||w||²
    s.t. wᵀΨ(x_k,ŷ) - wᵀΨ(x_k,y_k*) ≥ Δ(ŷ,z_k) - ξ_k.
• Repeat until convergence. (A code sketch of this alternation is given after the final slide.)

Experiments

Dataset
• 30 MRI volumes of the thigh, each of dimensions 224 x 224 x 100
• 4 muscle groups + background
• 80% for training, 20% for testing

Parameters
• 4 Laplacians, 2 shape priors, 1 appearance prior (Baudin et al., 2012; Grady, 2005)

Baselines
• Hand-tuned parameters
• Structured-output SVM with the hard segmentation
• Structured-output SVM with a soft segmentation based on the signed distance transform

Results
• Small but statistically significant improvement over the baselines

Related and Future Work in SPLENDID: Loss-based Learning
The general setting has an input x, an annotation a, and hidden information h. Examples: h = the soft segmentation (this work); a = "jumping" (an image-level annotation).

Annotation mismatch: minimize the loss between the correct and the predicted annotation,

  min Σ_k Δ(correct a_k, predicted a_k)

• Small improvement using a small medical dataset (this work)
• Large improvement using a large vision dataset

Output mismatch: the hidden information is modeled using a distribution, and the loss compares the complete outputs,

  min Σ_k Δ(correct {a_k,h_k}, predicted {a_k,h_k})

• Inexpensive annotation: no experts required
• Richer models can be learnt
(Kumar, Packer and Koller, ICML 2012)

Questions?
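For completeness, here is a minimal sketch of the CCCP alternation from the Optimization section. The helpers impute and solve_ssvm are assumed callbacks standing in for the dual-decomposition imputation and the convex structured SVM solver, which are not shown; this is an illustrative skeleton, not the implementation behind the reported results.

```python
import numpy as np

def cccp_latent_svm(samples, impute, solve_ssvm, w0, max_iter=20, tol=1e-4):
    """Alternating minimization (CCCP) for the latent SVM above.

    samples:    list of (x_k, z_k) training pairs
    impute:     callback (w, x, z) -> y*, the soft segmentation minimizing
                w^T Psi(x, y) over the compatible set C(z)
    solve_ssvm: callback (samples, imputed) -> w, solving the convex
                structured SVM with the imputed segmentations held fixed
    w0:         initial parameter vector
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        # Step 1: impute the best compatible soft segmentation per sample.
        imputed = [impute(w, x, z) for x, z in samples]
        # Step 2: convex parameter update with the imputations fixed.
        w_new = solve_ssvm(samples, imputed)
        if np.linalg.norm(w_new - w) < tol:  # parameters have converged
            return w_new
        w = w_new
    return w
```

Each iteration cannot increase the difference-of-convex objective, so the loop reaches a local minimum; the quality of that minimum depends on the initialization w0.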