Efficient Inference for FullyConnected CRFs with Stationarity Yimeng Zhang, Tsuhan Chen CVPR 2012 Summary • Explore object-class segmentation with fullyconnected CRF models • Only restriction on pairwise terms is `spatial stationarity’ (i.e. depend on relative locations) • Show how efficient inference can be achieved by – Using a QP formulation – Using FFT to calculate gradients in complexity (linear in) O(NlogN) Fully-connected CRF model • General pairwise CRF model: • • • • Image I Class labeling, X: Label set, L: V = set of pixels, N_i = neighbourhood of pixel i, Z(I) = partition function, psi = potential functions Fully-connected CRF model • General pairwise CRF model: • In fully-connected CRF, for all i, N_i = V Unary Potential • Unary potential generates a score for each object class per pixel (TextonBoost) Pairwise Potential • Pairwise potential measures compatibility of the labels at each pair of pixels • Combines spatial and colour contrast factors Pairwise Potential • Colour contrast: • Spatial term: Pairwise Potential • Learning the spatial term MAP inference using QP relaxation • Introduce a binary indicator variable for each pixel and label • MAP inference expressed as a quadratic integer program, and relaxed to give the QP MAP inference using QP relaxation • QP relaxation has been proved to be tight in all cases (Ravikumar ICML 2006 [24]) • Moreover, it is convex whenever matrix of edgeweights is negative-definite • Additive bound for non-convex case • QP requires O(KN) variables, LP requires (K^2E) MAP inference using QP relaxation • Gradient • Derive fixed-point update by forming Lagrangian and setting its derivative to 0 Illustration of QP updates Efficiently evaluating the gradient • Required summation • Would be a convolution without the color term • With color term is requires 5D-filtering • Can be approximated by clustering into C color clusters, => C convolutions across Efficiently evaluating the gradient • Hence, for the case x_i = x_j, we need to evaluate • Instead, evaluate for C clusters (C = 10 to 15) • where • Finally, interpolate Update complexity • FFTs of each spatial filters can be calculated in advance (K^2 filters) • At each update, we require C FFTs calculating, O(CNlogN) • K^2 convolutions are needed, each requiring a multiplication, O(K^2CN) • Terms can be added in Fourier domain, => only KC inverse FFTs needed, O(KCNlogN) • Run-time per iteration < 0.1s for 213x320 pixels (+ downsampling by factor of 5) MSRC synthetic experiment • Unary terms randomized • Spatial distributions set to ground-truth MSRC synthetic experiment • Running times Sowerby synthetic experiment MSRC full experiment • Use TextonBoost unary potentials • Compare with several other CRFs with same unaries – Grid only – Grid + P^N (Kohli, CVPR 2008) – Grid + P^N + Cooccurrence (Ladickỳ, ECCV 2010) – Fully-connected + Gaussian spatial (Krähenbühl, NIPS 2011) MSRC full experiment • Qualitative comparison MSRC full experiment • Quantitative comparison – Overall – Per-class – Timing: 2-8s per image