Filter-based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces
Vibhav Vineet*, Jonathan Warrell*, Philip H.S. Torr
http://cms.brookes.ac.uk/research/visiongroup/
*Joint first authors

Labelling problems
• Many vision problems can be expressed as dense image labelling problems: object segmentation, stereo, optical flow.

Overview
• Graph cuts have so far proved the method of choice for CRFs.
• Recently, message-passing methods have started to achieve equal performance with much faster run times, but only for pairwise CRFs.
• Some problems require higher-order information: co-occurrence terms and product label spaces.
• Our contribution is to develop fast message-passing methods for certain classes of higher-order information.

Importance of co-occurrence terms
• Context is an important cue for global scene understanding: an object that is hard to identify in isolation can be recognised as a keyboard through scene context.
• The keyboard, table and monitor often co-occur; co-occurrence was recently shown to improve accuracy in Ladický et al. (ECCV ’10).
[Slides courtesy A. Torralba]

Importance of P^N Potts terms
• P^N Potts terms enforce region consistency.
• Detector-based P^N potentials are formed by applying GrabCut to a bounding box to create a clique.
• This improves over using pairwise terms only.
[Figure: result without detections, set of detections, final result. Slide courtesy L. Ladický]

Importance of higher order terms
• We use higher-order information to improve object class segmentation, and also to improve joint object and stereo labelling using product label spaces.
[Figures: input image, object labels, disparity labels]

CRF formulation
• Standard CRF energy formulation.
• Pairwise CRF: data term + smoothness term.
• Higher-order CRF: data term + smoothness term + higher-order terms + co-occurrence term.

Inference
• The full energy (data, smoothness, higher-order and co-occurrence terms) can be minimised using graph-cuts-based methods.
• With the co-occurrence term, however, this is roughly 10 times slower than the pairwise-only model: relatively fast, but still computationally expensive.
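As a point of reference, a hedged sketch of the kind of energy these slides refer to (the notation below is assumed here rather than copied from the paper): the data, smoothness, higher-order and co-occurrence terms combine as

    E(\mathbf{x}) = \sum_i \psi_i(x_i) + \sum_{i<j} \psi_{ij}(x_i, x_j) + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c) + C(\Lambda(\mathbf{x}))

where \mathcal{C} is the set of higher-order cliques (e.g. detector-based P^N Potts cliques) and \Lambda(\mathbf{x}) is the set of labels present in the labelling \mathbf{x}.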
Our inference
• For the same energy (data, smoothness, higher-order and co-occurrence terms), we use a filter-based mean-field inference approach.
• Our method achieves an almost 10-40 times speed-up compared to graph-cuts-based methods; it is much faster due to efficient filtering.

Efficient inference in pairwise CRFs
• Krähenbühl et al. (NIPS ’11) propose an efficient method for inference in a pairwise CRF under two assumptions:
  • a mean-field approximation to the CRF;
  • pairwise weights that take the form of a linear combination of Gaussian kernels.
• They achieve an almost 5 times speed-up over graph cuts and also allow dense connectivity (a fully connected pairwise CRF).
[Slide courtesy P. Krähenbühl]

Mean-field based inference
• The mean-field approximation replaces the intractable distribution P with a distribution Q from a tractable family, chosen by minimising the KL-divergence between Q and P.
[Slide courtesy S. Nowozin]
• The mean-field update for the pairwise terms can be evaluated using Gaussian convolutions.
• We evaluate two approaches to the Gaussian convolution:
  • permutohedral lattice based filtering**
  • domain transform based filtering***
**Adams et al. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum, 2010.
***Gastal et al. Domain transform for edge-aware image and video processing. ACM Trans. Graph., 2011.

Q distribution
[Bar charts: Q distribution for the different classes at iterations 0, 1, 2 and 10 of mean-field inference]

Higher order mean-field update
• Marginal update in mean-field. [Figure: clique of variables taking labels 1, 2, 3]
• The update has high time complexity for general higher-order terms: O(L^|c|) for a clique c over L labels.
• We show how it can be computed efficiently for P^N Potts and co-occurrence terms.

P^N Potts example
• The P^N Potts potential enforces region-consistent labellings.
[Figure: label set of 3 labels, the corresponding Potts patterns, and a clique of 6 variables]
• Example: detector potentials.

Expectation update
• Sum across the possible states of the clique: either the whole clique takes label l, or it does not.
• By rearranging the expectation in this way, we reduce the time complexity from O(L^N) to O(NL).
• This can be extended to pattern-based potentials (Komodakis et al. CVPR ’09).

Global co-occurrence terms
• Co-occurrence models which objects belong together, e.g. Λ(x) = { aeroplane, tree, flower, building, boat, grass, sky } versus Λ(x) = { building, tree, grass, sky }.
• It associates a cost with each possible label subset.
• We use a second-order assumption for the cost function.
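As a hedged illustration of what a second-order cost over the label set can look like (the symbols c_l and c_{ll'} are assumptions for illustration, not taken from the slides):

    C(\Lambda(\mathbf{x})) = \sum_{l \in \Lambda(\mathbf{x})} c_l + \sum_{l, l' \in \Lambda(\mathbf{x}),\, l < l'} c_{l l'}

i.e. a per-label cost plus a pairwise cost for every pair of labels that appear together, which is the kind of cost the latent-variable model on the next slide can represent.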
Our model
• We define a cost over a set of latent binary variables Y = {Y_1, …, Y_L}, one per label: each latent variable switches a label on or off.
• The costs include unary and pairwise costs over the latent variables.
• Each latent variable node is connected to each image variable node.
[Figure: latent layer Y (on/off states) fully connected to the image variables X]

Global co-occurrence constraints
• Constraint on the model: if a latent variable is off, no image variable should take that label.
• Each violation of this constraint pays a cost K.
• Overall complexity: O(NL + L^2).
[Figure: latent layer Y and image variables X, with a constraint violation incurring cost K]

Product label space
• Assign an object label and a disparity label to each pixel.
• A joint energy function is defined over the product label space, with data, smoothness and higher-order terms.
• Inference in the product label space yields both the object class segmentation and the dense stereo reconstruction from the left and right camera images.

PascalVOC-10 dataset - qualitative
[Figure columns: image, ground truth, fully connected pairwise CRF*, alpha-expansion**, ours]
• Observe an improvement over the alternative methods.
*Krähenbühl et al. Efficient inference in fully connected CRFs with Gaussian edge potentials. NIPS, 2011.
**Ladický et al. Graph cut based inference with co-occurrence statistics. ECCV, 2010.

PascalVOC - quantitative

Algorithm                       | Time (s) | Overall | Av. Recall | Av. I/U
AHCRF + Cooc                    | 36       | 81.43   | 38.01      | 30.9
Dense pairwise                  | 0.67     | 71.43   | 34.53      | 28.40
Dense pairwise + Potts          | 4.35     | 79.87   | 40.71      | 30.18
Dense pairwise + Potts + Cooc   | 4.4      | 80.44   | 43.08      | 33.2

• Observe an improvement of 2.3% in I/U score over Ladický et al.*
• Achieve an 8-9x speed-up compared to the alpha-expansion-based method of Ladický et al.*
*Ladický et al. Graph cut based inference with co-occurrence statistics. ECCV, 2010.

Leuven dataset - qualitative
[Figure: left image, ground truth, ours; right image, ground truth, ours]

Leuven dataset - quantitative

Algorithm                     | Time (s) | Object (% correct) | Stereo (% correct)
GC + Range (1)                | 24.6     | 95.94              | 76.97
GC + Range (2)                | 49.9     | 95.94              | 77.31
GC + Range (3)*               | 74.4     | 95.94              | 77.46
Extended CostVol              | 4.2      | 95.20              | 77.18
Dense + HO (PLBF)             | 3.1      | 95.24              | 78.89
Dense + HO (DTBF)             | 2.1      | 95.06              | 78.21
Dense + HO + CostVol + DTBF   | 6.3      | 94.98              | 79.00

• Achieve a 12-35x speed-up compared to the alpha-expansion-based method of Ladický et al.*
*Ladický et al. Joint optimisation for object class segmentation and dense stereo reconstruction. BMVC, 2010.

Conclusion
• We provide efficient ways of incorporating higher-order terms into fully connected pairwise CRF models.
• We demonstrate improved efficiency compared to previous models with higher-order terms.
• We also demonstrate improved accuracy over previous approaches.
• Similar methods are applicable to a broad range of vision problems.
• Code is available for download: http://cms.brookes.ac.uk/staff/VibhavVineet/

EXTRA

Joint object-stereo model
• Introduce two different sets of variables: object variables X_i and disparity variables Y_i, with joint variables Z_i = [x_i, y_i].
• Messages are exchanged between the object and stereo variables.
• The joint energy function contains unary, pairwise and higher-order terms.

Marginal update for object variables
• Messages are passed from the disparity variables to the object variables.
• The filtering is done using the permutohedral lattice based filtering* strategy.
*Adams et al. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum, 2010.
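To make the filtering step concrete, here is a minimal sketch of a filter-based mean-field update for the pairwise (Potts) term. A plain Gaussian blur from SciPy stands in for the permutohedral-lattice / domain-transform filters, and the function name, weight and kernel width are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.ndimage import gaussian_filter

def mean_field(unary, n_iters=5, w_pair=3.0, sigma=5.0):
    """unary: (H, W, L) array of unary potentials (negative log-likelihoods)."""
    H, W, L = unary.shape
    Q = np.exp(-unary)
    Q /= Q.sum(axis=2, keepdims=True)            # initialise with a softmax of the unaries
    for _ in range(n_iters):
        # Message passing: filter each label's marginal with a Gaussian kernel
        # (the paper uses permutohedral-lattice / domain-transform filters instead).
        msg = np.stack([gaussian_filter(Q[:, :, l], sigma) for l in range(L)], axis=2)
        msg -= Q                                  # approximately remove the self-contribution (j != i)
        # Compatibility transform for a Potts model: penalise disagreeing labels.
        pairwise = w_pair * (msg.sum(axis=2, keepdims=True) - msg)
        # Local update and renormalisation.
        Q = np.exp(-unary - pairwise)
        Q /= Q.sum(axis=2, keepdims=True)
    return Q

In the joint object-stereo model above, an update of this form is run for each set of variables, with the cross messages (disparity to object and vice versa) filtered in the same way.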
Marginal update for disparity variables
• Messages are passed from the object variables to the disparity variables.
• The filtering is done using the domain transform based filtering* strategy (a 1-D sketch of this style of filter is given at the end of these extra slides).
*Gastal et al. Domain transform for edge-aware image and video processing. ACM Trans. Graph., 2011.

Mean-field vs. graph cuts
• We measure the I/U score on PascalVOC-10 segmentation while increasing the kernel standard deviation for mean-field and the window size for the graph-cuts method.
• Both achieve almost the same accuracy, but the graph-cuts time complexity becomes very high, making it infeasible to work with large neighbourhood systems.

Window sizes
• Comparison on matched energy: impact of adding more complex costs and increasing the window size.

Algorithm          | Model                   | Time (s) | Av. I/U
Alpha-exp (n=10)   | Pairwise                | 326.17   | 28.59
Mean-field         | Pairwise                | 0.67     | 28.64
Alpha-exp (n=3)    | Pairwise + Potts        | 56.8     | 29.6
Mean-field         | Pairwise + Potts        | 4.35     | 30.11
Alpha-exp (n=1)    | Pairwise + Potts + Cooc | 103.94   | 30.45
Mean-field         | Pairwise + Potts + Cooc | 4.4      | 32.17
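As referenced under the disparity update above, a minimal 1-D sketch of an edge-aware recursive filter in the domain-transform style. The recursion form, parameter names and guide signal here are assumptions for illustration, not the exact filter used in the paper; the full method would apply alternating horizontal and vertical passes over the per-label marginals.

import numpy as np

def domain_transform_1d(signal, guide, sigma_s=30.0, sigma_r=0.2):
    """signal: (N,) values to filter; guide: (N,) reference signal (e.g. image intensity)."""
    # Distance between neighbouring samples in the transformed domain:
    # large guide gradients -> large distance -> little smoothing across edges.
    d = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(guide))
    a = np.exp(-np.sqrt(2.0) / sigma_s)           # feedback coefficient
    weights = a ** d                              # per-step blending weights
    out = signal.astype(float)
    for i in range(1, len(out)):                  # left-to-right recursive pass
        out[i] = (1.0 - weights[i - 1]) * out[i] + weights[i - 1] * out[i - 1]
    for i in range(len(out) - 2, -1, -1):         # right-to-left recursive pass
        out[i] = (1.0 - weights[i]) * out[i] + weights[i] * out[i + 1]
    return out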