Perceptual and Sensory Augmented Computing, Computer Vision WS 11/12

Computer Vision – Lecture 7
Graph-Theoretic Segmentation
17.11.2011

Bastian Leibe
RWTH Aachen
http://www.mmp.rwth-aachen.de
leibe@umic.rwth-aachen.de

Announcements
• Please don’t forget to register for the exam!
  – On the Campus system

Course Outline
• Image Processing Basics
• Segmentation
  – Segmentation and Grouping
  – Graph-Theoretic Segmentation
• Recognition
  – Global Representations
  – Subspace representations
• Local Features & Matching
• Object Categorization
• 3D Reconstruction
• Motion and Tracking

Recap: Gestalt Theory
• Gestalt: whole or group
  – The whole is greater than the sum of its parts
  – Relationships among parts can yield new properties/features
• Psychologists identified a series of factors that predispose a set of elements to be grouped (by the human visual system)

“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees.”
Max Wertheimer (1880-1943), Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923
http://psy.ed.asu.edu/~classics/Wertheimer/Forms/forms.htm

Recap: Gestalt Factors
• These factors make intuitive sense, but are very difficult to translate into algorithms.
Image source: Forsyth & Ponce

Recap: Image Segmentation
• Goal: identify groups of pixels that go together
Slide credit: Steve Seitz, Kristen Grauman

Recap: K-Means Clustering
• Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.
  1. Randomly initialize the cluster centers, $c_1, \dots, c_K$
  2. Given cluster centers, determine points in each cluster
     – For each point p, find the closest $c_i$. Put p into cluster i
  3. Given points in each cluster, solve for $c_i$
     – Set $c_i$ to be the mean of points in cluster i
  4. If the $c_i$ have changed, repeat Step 2
• Properties
  – Will always converge to some solution
  – Can be a “local minimum”: does not always find the global minimum of the objective function
    $$\sum_{\text{clusters } i} \; \sum_{\text{points } p \text{ in cluster } i} \|p - c_i\|^2$$
Slide credit: Steve Seitz

Recap: Expectation Maximization (EM)
• Goal
  – Find blob parameters θ that maximize the likelihood function:
    $$P(\text{data} \mid \theta) = \prod_x P(x \mid \theta)$$
• Approach:
  1. E-step: given current guess of blobs, compute ownership of each point
  2. M-step: given ownership probabilities, update blobs to maximize likelihood function
  3. Repeat until convergence
Slide credit: Steve Seitz

Recap: Mean-Shift Algorithm
• Iterative Mode Search
  1. Initialize random seed, and window W
  2. Calculate center of gravity (the “mean”) of W
  3. Shift the search window to the mean
  4. Repeat Step 2 until convergence
Slide credit: Steve Seitz

Recap: Mean-Shift Clustering
• Cluster: all data points in the attraction basin of a mode
• Attraction basin: the region for which all trajectories lead to the same mode
Slide by Y. Ukrainitz & B. Sarel
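
The iterative mode search recapped above can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it assumes a flat circular window of fixed radius, and the function name `mean_shift_mode`, the toy blobs, and all parameter values are hypothetical.

```python
import numpy as np

def mean_shift_mode(points, seed, radius, tol=1e-6, max_iter=100):
    """Move a flat circular window W from `seed` towards a nearby density mode."""
    center = np.asarray(seed, dtype=float)
    for _ in range(max_iter):
        # Step 2: center of gravity (the "mean") of the points inside W.
        in_window = np.linalg.norm(points - center, axis=1) <= radius
        if not in_window.any():
            break  # empty window: nothing to average
        new_center = points[in_window].mean(axis=0)
        # Step 4: stop once the window no longer moves.
        if np.linalg.norm(new_center - center) < tol:
            break
        center = new_center  # Step 3: shift the window to the mean
    return center

# Toy data (assumption): two well-separated 2D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2))
points = np.vstack([blob_a, blob_b])

mode_a = mean_shift_mode(points, seed=(0.8, -0.5), radius=1.5)
mode_b = mean_shift_mode(points, seed=(4.2, 5.6), radius=1.5)
```

Seeds that start in the same attraction basin end at (nearly) the same mode, which is what the clustering step merges.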

Recap: Mean-Shift Segmentation
• Find features (color, gradients, texture, etc.)
• Initialize windows at individual pixel locations
• Perform mean shift for each window until convergence
• Merge windows that end up near the same “peak” or mode
Slide credit: Svetlana Lazebnik

Back to the Image Segmentation Problem…
• Goal: identify groups of pixels that go together
• Up to now, we have focused on ways to group pixels into image segments based on their appearance…
  – Segmentation as clustering.
• We also want to enforce region constraints.
  – Spatial consistency
  – Smooth borders

Topics of This Lecture
• Graph theoretic segmentation
  – Normalized Cuts
  – Using texture features
  – Extension: Multi-level segmentation
• Segmentation as Energy Minimization
  – Markov Random Fields
  – Graph cuts for image segmentation
  – Applications

Images as Graphs
• Fully-connected graph
  – Node (vertex) for every pixel
  – Link between every pair of pixels, (p,q)
  – Affinity weight $w_{pq}$ for each link (edge)
    – $w_{pq}$ measures similarity
    – Similarity is inversely proportional to difference (e.g., in color and position…)
Slide credit: Steve Seitz

Segmentation by Graph Cuts
• Break Graph into Segments
  – Delete links that cross between segments
  – Easiest to break links that have low similarity (low weight)
    – Similar pixels should be in the same segments
    – Dissimilar pixels should be in different segments
[Figure: graph with affinity weights w, split into segments A, B, C]
Slide credit: Steve Seitz

Measuring Affinity
• Distance
  $$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \|x - y\|^2 \right\}$$
• Intensity
  $$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \|I(x) - I(y)\|^2 \right\}$$
• Color
  $$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, dist\big(c(x), c(y)\big)^2 \right\}$$
  (some suitable color space distance)
• Texture
  $$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \|f(x) - f(y)\|^2 \right\}$$
  (vectors of filter outputs)
Source: Forsyth & Ponce

Scale Affects Affinity
• Small σ: group only nearby points
• Large σ: group far-away points
[Figure panels: Small, Medium, Large σ]
Slide credit: Svetlana Lazebnik
Image Source: Forsyth & Ponce

Graph Cut
• Set of edges whose removal makes a graph disconnected
• Cost of a cut
  – Sum of weights of cut edges:
    $$cut(A, B) = \sum_{p \in A,\, q \in B} w_{p,q}$$
• A graph cut gives us a segmentation
  – What is a “good” graph cut and how do we find one?
Slide credit: Steve Seitz

Graph Cut
• Here, the cut is nicely defined by the block-diagonal structure of the affinity matrix.
  – How can this be generalized?
Image Source: Forsyth & Ponce

Minimum Cut
• We can do segmentation by finding the minimum cut in a graph
  – Efficient algorithms exist for doing this
• Drawback:
  – Weight of cut proportional to number of edges in the cut
  – Minimum cut tends to cut off very small, isolated components
[Figure labels: “Cuts with lesser weight than the ideal cut”, “Ideal Cut”]
Slide credit: Khurram Hassan-Shafique
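
The affinity and cut-cost definitions above can be tried on a toy example. The following is a sketch under stated assumptions, not part of the original slides: the intensity and distance affinities are multiplied into one weight, each cue gets its own scale (`sigma_i`, `sigma_d`), and the six-pixel 1D "image" is made up.

```python
import numpy as np

def affinity_matrix(intensity, positions, sigma_i, sigma_d):
    """Pairwise affinities from intensity difference and spatial distance."""
    d_int = intensity[:, None] - intensity[None, :]
    d_pos = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    # Assumption: combine the two cues by multiplying their exponentials.
    w = np.exp(-d_int**2 / (2 * sigma_i**2)) * np.exp(-d_pos**2 / (2 * sigma_d**2))
    np.fill_diagonal(w, 0.0)  # no self-links
    return w

def cut_cost(w, in_a):
    """cut(A, B): sum of weights of all links between A and B."""
    in_a = np.asarray(in_a, dtype=bool)
    return float(w[np.ix_(in_a, ~in_a)].sum())

# Toy 1D image (assumption): three dark pixels followed by three bright ones.
intensity = np.array([0.10, 0.12, 0.09, 0.90, 0.88, 0.91])
positions = np.arange(6, dtype=float).reshape(-1, 1)
w = affinity_matrix(intensity, positions, sigma_i=0.1, sigma_d=2.0)

cut_between_regions = cut_cost(w, [True, True, True, False, False, False])
cut_inside_region = cut_cost(w, [True, True, False, False, False, False])
```

Cutting between the dark and bright pixels removes only low-weight links, while a cut through the dark region has to remove high-weight links between similar pixels.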

Normalized Cut (NCut)
• A minimum cut penalizes large segments
• This can be fixed by normalizing for size of segments
• The normalized cut cost is:
  $$NCut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}$$
  assoc(A,V) = sum of weights of all edges in V that touch A
• The exact solution is NP-hard, but an approximation can be computed by solving a generalized eigenvalue problem.
J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000
Slide credit: Svetlana Lazebnik

Interpretation as a Dynamical System
• Treat the links as springs and shake the system
  – Elasticity proportional to cost
  – Vibration “modes” correspond to segments
    – Can compute these by solving a generalized eigenvector problem
Slide credit: Steve Seitz

NCuts as a Generalized Eigenvector Problem
• Definitions
  – $W$: the affinity matrix, $W(i,j) = w_{i,j}$
  – $D$: the diagonal matrix, $D(i,i) = \sum_j W(i,j)$
  – $x$: a vector in $\{1, -1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$
• Rewriting Normalized Cut in matrix form (the factor 1/4 arises because $(1+x)/2$ and $(1-x)/2$ are the indicator vectors of A and B):
  $$NCut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)} = \frac{1}{4}\left[ \frac{(1+x)^T (D-W)(1+x)}{k\, 1^T D 1} + \frac{(1-x)^T (D-W)(1-x)}{(1-k)\, 1^T D 1} \right], \qquad k = \frac{\sum_{x_i > 0} D(i,i)}{\sum_i D(i,i)}$$
  $$= \dots$$
Slide credit: Jitendra Malik

Some More Math…
Slide credit: Jitendra Malik

NCuts as a Generalized Eigenvalue Problem
• After simplification, we get
  $$NCut(A, B) = \frac{y^T (D - W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\}, \; y^T D 1 = 0.$$
  – This is hard, as y is discrete!
  – Relaxation: continuous y.
• This is a so-called Rayleigh Quotient
  – Solution given by the “generalized” eigenvalue problem
    $$(D - W)\, y = \lambda D y$$
  – Solved by converting to a standard eigenvalue problem
    $$D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} z = \lambda z, \quad \text{where } z = D^{\frac{1}{2}} y$$
• Subtleties
  – The optimal solution is the eigenvector with the second smallest eigenvalue
  – Gives a continuous result, which must be converted into discrete values of y
Slide credit: Alyosha Efros

NCuts Example
[Figure panels: Smallest eigenvectors, NCuts segments]
Image source: Shi & Malik

Discretization
• Problem: eigenvectors take on continuous values
  – How to choose the splitting point to binarize the image?
[Figure panels: Image, Eigenvector, NCut scores]
• Possible procedures
  a) Pick a constant value (0, or 0.5).
  b) Pick the median value as splitting point.
  c) Look for the splitting point that has the minimum NCut value:
     1. Choose n possible splitting points.
     2. Compute NCut value.
     3. Pick minimum.

NCuts: Overall Procedure
1. Construct a weighted graph G=(V,E) from an image.
2. Connect each pair of pixels, and assign graph edge weights
   W(i,j) = Prob. that i and j belong to the same region.
3. Solve $(D - W)\, y = \lambda D y$ for the eigenvectors with the smallest eigenvalues.
   – This yields a continuous solution.
4. Threshold eigenvectors to get a discrete cut
   – This is where the approximation is made (we are not solving the NP-hard problem exactly).
5. Recursively subdivide if NCut value is below a prespecified value.
NCuts Matlab code available at http://www.cis.upenn.edu/~jshi/software/
Slide credit: Jitendra Malik

Color Image Segmentation with NCuts
Slide credit: Steve Seitz
Image Source: Shi & Malik

Using Texture Features for Segmentation
• Texture descriptor is vector of filter bank outputs
J. Malik, S. Belongie, T. Leung and J. Shi. "Contour and Texture Analysis for Image Segmentation". IJCV 43(1), 7-27, 2001.
Slide credit: Svetlana Lazebnik
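
Steps 3 and 4 of the NCuts procedure, together with discretization option (c), can be sketched on a small affinity matrix. This is an illustrative sketch, not the Shi & Malik Matlab code: it uses a dense `numpy.linalg.eigh` solve (real images need sparse solvers), evenly spaced candidate splitting points, and a made-up 7-node graph with two tightly connected groups.

```python
import numpy as np

def ncut_value(w, in_a):
    """NCut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    in_a = np.asarray(in_a, dtype=bool)
    cut = w[np.ix_(in_a, ~in_a)].sum()
    return float(cut / w[in_a].sum() + cut / w[~in_a].sum())

def second_generalized_eigenvector(w):
    """Solve (D - W) y = lambda D y through the symmetric standard problem."""
    d = w.sum(axis=1)                        # diagonal of D, assumed > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # D^{-1/2} (D - W) D^{-1/2} = I - D^{-1/2} W D^{-1/2}
    l_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    z = eigvecs[:, 1]                         # second smallest eigenvalue
    return eigvals[1], d_inv_sqrt * z         # y = D^{-1/2} z

def best_split(w, y, n_candidates=10):
    """Try evenly spaced splitting points and keep the one with minimum NCut."""
    candidates = np.linspace(y.min(), y.max(), n_candidates + 2)[1:-1]
    score, threshold = min((ncut_value(w, y > t), t) for t in candidates)
    return score, threshold, y > threshold

# Toy graph (assumption): nodes 0-1 and nodes 2-6 form two tight groups.
w = np.full((7, 7), 0.01)
w[:2, :2] = 1.0
w[2:, 2:] = 1.0
np.fill_diagonal(w, 0.0)

lam, y = second_generalized_eigenvector(w)
score, threshold, in_a = best_split(w, y)
```

The sign of an eigenvector is arbitrary, so which group ends up as A may differ between runs or platforms; the partition itself is what matters.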

Using Texture Features for Segmentation
• Texture descriptor is vector of filter bank outputs.
• Textons are found by clustering.
• Affinities are given by similarities of texton histograms over windows given by the “local scale” of the texture.
Slide credit: Svetlana Lazebnik

Results with Color & Texture

Summary: Normalized Cuts
• Pros:
  – Generic framework, flexible to choice of function that computes weights (“affinities”) between nodes
  – Does not require any model of the data distribution
• Cons:
  – Time and memory complexity can be high
    – Dense, highly connected graphs ⇒ many affinity computations
    – Solving eigenvalue problem for each cut
  – Preference for balanced partitions
    – If a region is uniform, NCuts will find the modes of vibration of the image dimensions
Slide credit: Kristen Grauman

Extension: Multi-Level Segmentation
• It is often difficult to extract a single good segmentation
  – Idea: Extract a hierarchy of segmentations instead
[Figure: Example segmentations for several contrasts]
Source: [S. Todorovic, N. Ahuja, CVPR’06]

Multiscale Segmentation Tree
• Segmentations can be arranged in a tree
[Figure labels: Sample cutsets, Segmentation tree, Example segmentations]
Source: [S. Todorovic, N. Ahuja, CVPR’06]

Topics of This Lecture
• Graph theoretic segmentation
  – Normalized Cuts
  – Using color and texture features
  – Extension: Multi-level segmentation
• Segmentation as Energy Minimization
  – Markov Random Fields
  – Graph cuts for image segmentation
  – Applications

Markov Random Fields
• Allow rich probabilistic models for images
• But built in a local, modular way
  – Learn local effects, get global effects out
[Figure labels: Observed evidence, Hidden “true states”, Neighborhood relations]
Slide credit: William Freeman

MRF Nodes as Pixels
[Figure panels: Original image, Degraded image, Reconstruction from MRF modeling pixel neighborhood statistics]

MRF Nodes as Patches
[Figure labels: Image patches, Scene patches, Image, Scene, $\Phi(x_i, y_i)$, $\Psi(x_i, x_j)$]
Slide credit: William Freeman

Network Joint Probability
$$P(x, y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$
• x: Scene, y: Image
• $\Phi(x_i, y_i)$: Image-scene compatibility function (local observations)
• $\Psi(x_i, x_j)$: Scene-scene compatibility function (neighboring scene nodes)
Slide credit: William Freeman

Energy Formulation
• Joint probability
  $$P(x, y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$
• Maximizing the joint probability is the same as minimizing the negative log
  $$\log P(x, y) = \sum_i \log \Phi(x_i, y_i) + \sum_{i,j} \log \Psi(x_i, x_j)$$
  $$-E(x, y) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$
  so that $E(x, y) = -\log P(x, y)$ with $\phi = \log \Phi$ and $\psi = \log \Psi$.
• This is similar to free-energy problems in statistical mechanics (spin glass theory). We therefore draw the analogy and call E an energy function.
• φ and ψ are called potentials.
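
The equivalence between maximizing the joint probability and minimizing the energy can be checked by brute force on a tiny MRF. The following is only a sketch: the three-node chain, the binary labels, and the compatibility values (0.8/0.2 and 0.7/0.3) are made-up assumptions, and the joint is left unnormalized.

```python
import itertools
import math

def compat_obs(x_i, y_i):
    """Image-scene compatibility Phi(x_i, y_i): hidden state tends to match the observation."""
    return 0.8 if x_i == y_i else 0.2

def compat_nbr(x_i, x_j):
    """Scene-scene compatibility Psi(x_i, x_j): neighbors tend to agree."""
    return 0.7 if x_i == x_j else 0.3

def joint_unnormalized(x, y, edges):
    p = 1.0
    for x_i, y_i in zip(x, y):
        p *= compat_obs(x_i, y_i)
    for i, j in edges:
        p *= compat_nbr(x[i], x[j])
    return p

def energy(x, y, edges):
    """E(x, y) = -(sum of phi + sum of psi), with phi = log Phi and psi = log Psi."""
    phi = sum(math.log(compat_obs(x_i, y_i)) for x_i, y_i in zip(x, y))
    psi = sum(math.log(compat_nbr(x[i], x[j])) for i, j in edges)
    return -(phi + psi)

y = (1, 0, 1)                 # observed (noisy) values
edges = [(0, 1), (1, 2)]      # chain neighborhood
labelings = list(itertools.product((0, 1), repeat=len(y)))

most_probable = max(labelings, key=lambda x: joint_unnormalized(x, y, edges))
lowest_energy = min(labelings, key=lambda x: energy(x, y, edges))
```

Both searches return the same labeling, and the pairwise term smooths away the isolated 0 in the middle of the observation.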

Energy Formulation
• Energy function
  $$-E(x, y) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$
  (first sum: single-node potentials; second sum: pairwise potentials)
• Single-node potentials φ
  – Encode local information about the given pixel/patch
  – How likely is a pixel/patch to belong to a certain class (e.g. foreground/background)?
• Pairwise potentials ψ
  – Encode neighborhood information
  – How different is a pixel/patch’s label from that of its neighbor? (e.g. based on intensity/color/texture difference, edges)

Energy Minimization
• Goal:
  – Infer the optimal labeling of the MRF.
• Many inference algorithms are available, e.g.
  – Gibbs sampling, simulated annealing
  – Iterated conditional modes (ICM)
  – Variational methods
  – Belief propagation
  – Graph cuts
• Recently, Graph Cuts have become a popular tool
  – Only suitable for a certain class of energy functions
  – But the solution can be obtained very fast for typical vision problems (~1 MPixel/sec).

Graph Cuts for Optimal Boundary Detection
• Idea: convert MRF into source-sink graph
  – Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms)
[Figure labels: source s, sink t, n-links, hard constraints, a cut]
Slide credit: Yuri Boykov
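
The conversion of a binary labeling problem into a source-sink graph can be illustrated on a four-pixel chain. This is a sketch under stated assumptions rather than the construction from the slides: the energy is written directly as costs to be minimized (a cost for each label per pixel plus a constant penalty when neighbors disagree), the numbers are made up, and max-flow is computed with a plain Edmonds-Karp implementation instead of a specialized vision solver.

```python
from collections import deque
import itertools

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, nodes on the source side of a minimum cut)."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, cap in nbrs.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual.setdefault(v, {}).setdefault(u, 0)
            residual[u][v] += cap
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)  # nodes still reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def build_graph(unary, penalty):
    """unary[p] = (cost of label 's', cost of label 't'); pixels form a chain."""
    capacity = {"s": {}}
    for p, (cost_s, cost_t) in enumerate(unary):
        capacity["s"][p] = cost_t                  # t-link, cut when p ends on the sink side
        capacity.setdefault(p, {})["t"] = cost_s   # t-link, cut when p stays on the source side
    for p in range(len(unary) - 1):
        capacity[p][p + 1] = penalty               # n-links in both directions
        capacity[p + 1][p] = penalty
    return capacity

def energy(labels, unary, penalty):
    cost = sum(unary[p][0] if l == "s" else unary[p][1] for p, l in enumerate(labels))
    return cost + sum(penalty for a, b in zip(labels, labels[1:]) if a != b)

unary = [(1, 6), (2, 5), (4, 3), (6, 1)]  # hypothetical per-pixel label costs
penalty = 2
flow, source_side = min_cut(build_graph(unary, penalty), "s", "t")
cut_labels = tuple("s" if p in source_side else "t" for p in range(len(unary)))
brute_force = min(itertools.product("st", repeat=len(unary)),
                  key=lambda l: energy(l, unary, penalty))
```

The value of the minimum cut equals the minimum energy found by exhaustive search, and the source side of the cut gives the same labeling.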
n-link weights in the source-sink graph, where $\Delta I_{pq}$ is the intensity difference between neighboring pixels p and q:
$$w_{pq} = \exp\left( -\frac{\Delta I_{pq}^2}{2\sigma^2} \right)$$
[Boykov & Jolly, ICCV’01]

Simple Example of Energy
$$E(L) = \sum_p -D_p(L_p) + \sum_{pq \in N} w_{pq} \cdot \delta(L_p \neq L_q), \qquad L_p \in \{s, t\}$$
• Regional term: $\sum_p -D_p(L_p)$ (t-links, with weights $D_p(s)$ and $D_p(t)$)
• Boundary term: $\sum_{pq \in N} w_{pq} \cdot \delta(L_p \neq L_q)$ (n-links), with $w_{pq} = \exp\left( -\frac{\Delta I_{pq}^2}{2\sigma^2} \right)$
• (binary object segmentation)
[Figure labels: s, t, t-links, n-links, a cut]
Slide credit: Yuri Boykov

Adding Regional Properties
• Regional bias example
  – Suppose $I^s$ and $I^t$ are given “expected” intensities of object and background
    $$D_p(s) \propto \exp\left( -\|I_p - I^s\|^2 / 2\sigma^2 \right), \qquad D_p(t) \propto \exp\left( -\|I_p - I^t\|^2 / 2\sigma^2 \right)$$
• NOTE: hard constraints are not required, in general.
[Figure labels: s, t, $D_p(s)$, $D_p(t)$, $w_{pq}$, n-links, a cut]
Slide credit: Yuri Boykov
[Boykov & Jolly, ICCV’01]

Adding Regional Properties
• The “expected” intensities of object and background, $I^s$ and $I^t$, can be re-estimated
  – EM-style optimization
Slide credit: Yuri Boykov
[Boykov & Jolly, ICCV’01]

Adding Regional Properties
• More generally, regional bias can be based on any intensity models of object and background
  $$D_p(L_p) = -\log \Pr(I_p \mid L_p)$$
  given object and background intensity histograms
  – Note that in this form $D_p$ is a penalty (a negative log-likelihood), whereas in the example above $D_p(s)$ and $D_p(t)$ are largest when $I_p$ matches the expected intensity.
[Figure labels: s, t, $D_p(s)$, $D_p(t)$, $\Pr(I_p \mid s)$, $\Pr(I_p \mid t)$, $I_p$, I, a cut]
Slide credit: Yuri Boykov
[Boykov & Jolly, ICCV’01]

How to Set the Potentials? Some Examples
• Color potentials
  – e.g. modeled with a Mixture of Gaussians
    $$\phi(x_i, y_i; \theta_\phi) = \log \sum_k \theta_\phi(x_i, k)\, P(k \mid x_i)\, N(y_i; \bar{y}_k, \Sigma_k)$$
• Edge potentials
  – e.g. a “contrast sensitive Potts model”
    $$\psi(x_i, x_j, g_{ij}(y); \theta_\psi) = -\theta_\psi^T g_{ij}(y)\, \delta(x_i \neq x_j)$$
    where
    $$g_{ij}(y) = e^{-\beta \|y_i - y_j\|^2}, \qquad \beta = \left( 2 \cdot avg\left( \|y_i - y_j\|^2 \right) \right)^{-1}$$
• Parameters $\theta_\phi$, $\theta_\psi$ need to be learned, too!
Source for the potentials above: [Shotton & Winn, ECCV’06]

When Can s-t Graph Cuts Be Applied?
$$E(L) = \sum_p E_p(L_p) + \sum_{pq \in N} E(L_p, L_q), \qquad L_p \in \{s, t\}$$
• Regional term: $\sum_p E_p(L_p)$ (t-links)
• Boundary term: $\sum_{pq \in N} E(L_p, L_q)$ (n-links)
• s-t graph cuts can only globally minimize binary energies that are submodular.
  $$E(L) \text{ can be minimized by s-t graph cuts} \;\Leftrightarrow\; E(s,s) + E(t,t) \le E(s,t) + E(t,s)$$
  – Submodularity (“convexity”)
  [Boros & Hammer, 2002; Kolmogorov & Zabih, 2004]
• Non-submodular cases can still be addressed with some optimality guarantees.
  – Current research topic

Topics of This Lecture
• Graph theoretic segmentation
  – Normalized Cuts
  – Using color and texture features
• Hierarchical segmentation
  – Case study: segmentation-based recognition
• Segmentation as Energy Minimization
  – Markov Random Fields
  – Graph cuts for image segmentation
  – Applications

GraphCut Applications: “GrabCut”
• Interactive Image Segmentation [Boykov & Jolly, ICCV’01]
  – Rough region cues sufficient
  – Segmentation boundary can be extracted from edges
• Procedure
  – User marks foreground and background regions with a brush.
  – This is used to create an initial segmentation which can then be corrected by additional brush strokes.
[Figure labels: User segmentation cues, Additional segmentation cues]
Slide credit: Matthieu Bray

GrabCut: Data Model
[Figure labels: Foreground color, Background color, Global optimum of the energy]
• Obtained from interactive user input
  – User marks foreground and background regions with a brush
  – Alternatively, user can specify a bounding box
Slide credit: Carsten Rother

GrabCut: Coherence Model
• An object is a coherent set of pixels:
  $$\psi(x, y) = \gamma \sum_{(m,n) \in C} \delta(x_n \neq x_m)\, e^{-\beta \|y_m - y_n\|^2}$$
• How to choose γ?
[Plot: Error (%) over training set as a function of γ; the value 25 appears on the plot]
Slide credit: Carsten Rother

Iterated Graph Cuts
[Figure: foreground and background color models (Mixture of Gaussians) in the R-G color plane, the segmentation result, and the energy after each of iterations 1 to 4]
Slide credit: Carsten Rother

GrabCut: Example Results
• This is included in the newest version of MS Office!
Image source: Carsten Rother

Applications: Interactive 3D Segmentation
Slide credit: Yuri Boykov
[Y. Boykov, V. Kolmogorov, ICCV’03]

Improving Efficiency of Segmentation
• Problem: Images contain many pixels
  – Even with efficient graph cuts, an MRF formulation has too many nodes for interactive results.
• Efficiency trick: Superpixels
  – Group together similar-looking pixels for efficiency of further processing.
  – Cheap, local oversegmentation
  – Important to ensure that superpixels do not cross boundaries
• Several different approaches possible
  – Superpixel code available here: http://www.cs.sfu.ca/~mori/research/superpixels/
Image source: Greg Mori

Superpixels for Pre-Segmentation
[Figure labels: Graph structure, Speedup]

Summary: Graph Cuts Segmentation
• Pros
  – Powerful technique, based on probabilistic model (MRF).
  – Applicable for a wide range of problems.
  – Very efficient algorithms available for vision problems.
  – Becoming a de-facto standard for many segmentation tasks.
• Cons/Issues
  – Graph cuts can only solve a limited class of models
    – Submodular energy functions
    – Can capture only part of the expressiveness of MRFs
  – Only approximate algorithms available for multi-label case
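
The submodularity condition can be checked numerically for a single pairwise term. The sketch below is illustrative only: it tabulates a contrast-sensitive coherence penalty of the form γ·exp(−β‖y_m − y_n‖²) for one neighbor pair, and the helper names, the color values, and the settings `gamma=50.0`, `beta=5.0` are hypothetical.

```python
import numpy as np

def coherence_table(y_m, y_n, gamma, beta):
    """Pairwise energies E(L_m, L_n) for one neighbor pair with colors y_m, y_n."""
    diff = np.asarray(y_m, dtype=float) - np.asarray(y_n, dtype=float)
    penalty = gamma * float(np.exp(-beta * np.sum(diff**2)))
    # Equal labels cost nothing; different labels pay the contrast-sensitive penalty.
    return {("s", "s"): 0.0, ("s", "t"): penalty,
            ("t", "s"): penalty, ("t", "t"): 0.0}

def is_submodular(table):
    """E(s,s) + E(t,t) <= E(s,t) + E(t,s)."""
    return table[("s", "s")] + table[("t", "t")] <= table[("s", "t")] + table[("t", "s")]

similar = coherence_table((0.2, 0.2, 0.2), (0.25, 0.2, 0.2), gamma=50.0, beta=5.0)
different = coherence_table((0.2, 0.2, 0.2), (0.9, 0.9, 0.9), gamma=50.0, beta=5.0)

# A term that rewards neighbors for taking different labels violates the condition.
rewards_disagreement = {("s", "s"): 1.0, ("s", "t"): 0.0,
                        ("t", "s"): 0.0, ("t", "t"): 1.0}
```

Because the penalty is never negative, a term of this form always satisfies the condition; separating similar colors is expensive, and separating very different colors is cheap.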

Segmentation: Caveats
• We’ve looked at bottom-up ways to segment an image into regions, yet finding meaningful segments is intertwined with the recognition problem.
• Often want to avoid making hard decisions too soon
• Difficult to evaluate; when is a segmentation successful?
  – Often depends on the rest of the system pipeline.
Slide credit: Kristen Grauman

References and Further Reading
• Background information on Normalized Cuts can be found in Chapter 14 of
  D. Forsyth, J. Ponce, Computer Vision – A Modern Approach. Prentice Hall, 2003
• Try the NCuts Matlab code at http://www.cis.upenn.edu/~jshi/software/
• Try the GraphCut implementation at http://www.adastral.ucl.ac.uk/~vladkolm/software.html