Lecture 11: E-M and Mean-Shift
CAP 5415 Fall 2007

Review on Segmentation by Clustering

Each Pixel Is a Data Vector

Example (from Comaniciu and Meer)

Review of k-means
• Let's find three clusters in this data
• These points could represent RGB triplets in 3D

Review of k-means
• Begin by guessing where the "center" of each cluster is

Review of k-means
• Now assign each point to the closest cluster center

Review of k-means
• Now move each cluster center to the mean of the points assigned to it
• Repeat this process until it converges

Probabilistic Point of View
• We'll take a generative point of view
• How to generate a data point:
  1) Choose a cluster z from (1 ... N)
  2) Sample the point from the distribution associated with that cluster

1D Example

Called a Mixture Model
• p(x) = \sum_{k=1}^{N} p(z = k)\, p(x \mid z = k)
  – p(z = k) is the probability of choosing cluster k
  – p(x \mid z = k) is the probability of x given that the cluster is k
• z indicates which cluster is chosen

To make it a Mixture of Gaussians
• p(x) = \sum_{k=1}^{N} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)
• \pi_k = p(z = k) is called a mixing coefficient

Brief Review of Gaussians
• \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)

Mixture of Gaussians in Context of Our Previous Model
• Now, we have means and covariances

How does this help with clustering?
• Let's think about a different problem first
• What if we had a set of data points and we wanted to find the parameters of the mixture model?
• Typical strategy: optimize the parameters to maximize the likelihood of the data

Maximizing the likelihood
• Easy if we knew which cluster each point should belong to
• But we don't, so we compute each cluster's posterior probability using Bayes' rule:
  \gamma_k(x) = p(z = k \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}

Where this comes from
• Differentiate the log-likelihood with respect to \mu_k and set the result to zero, giving
  \mu_k = \frac{\sum_n \gamma_k(x_n)\, x_n}{\sum_n \gamma_k(x_n)}

EM Algorithm
• Computing the posteriors \gamma_k(x_n) is called the E-Step
• M-Step: using these estimates of the posteriors, maximize the likelihood with respect to the rest of the parameters
• Lots of interesting math and intuitions go into this algorithm that I'm not covering
• Take Pattern Recognition!

Back to clustering
• Now we have p(z = k \mid x) for every point
• This can be seen as a soft clustering
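To make the E-Step and M-Step concrete, here is a minimal sketch of EM for a 1D mixture of Gaussians. This is not the lecture's code: the function name, random initialization, and fixed iteration count are my own choices, and a production version would add safeguards such as a variance floor and a convergence test.

```python
import numpy as np

def em_gmm_1d(x, K, n_iters=100, seed=0):
    """Minimal EM for a 1D mixture of Gaussians.

    x: (N,) data array; K: number of clusters.
    Returns mixing weights pi, means mu, variances var,
    and the (N, K) responsibility matrix gamma.
    """
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    pi = np.full(K, 1.0 / K)                   # equal mixing weights
    mu = rng.choice(x, size=K, replace=False)  # random data points as means
    var = np.full(K, np.var(x))                # start from the overall variance

    for _ in range(n_iters):
        # E-Step: responsibilities gamma[n, k] = p(z = k | x_n) via Bayes' rule.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2.0 * np.pi * var)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-Step: re-estimate the parameters from the soft assignments.
        Nk = gamma.sum(axis=0)                       # effective cluster sizes
        mu = (gamma * x[:, None]).sum(axis=0) / Nk   # weighted means
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N                                  # new mixing coefficients
    return pi, mu, var, gamma
```

Running this on, say, np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 300)]) with K = 2 recovers the two components; taking the argmax over each row of gamma gives a hard clustering, while gamma itself is the soft clustering described above.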
Another Clustering Application
• In this case, we have a video and we want to segment out what's moving or changing
• From C. Stauffer and W. Grimson

Easy Solution
• Average a bunch of frames to get a "Background" image
• Compute the difference between the background and each frame

The difficulty with this approach
• The background changes
(From Stauffer and Grimson)

Solution
• Fit a mixture model to the background
• i.e., a background pixel could have multiple colors

Can use this to track in surveillance

Suggested Reading
• Chapter 14, David A. Forsyth and Jean Ponce, Computer Vision: A Modern Approach
• Chapter 3, Mubarak Shah, Fundamentals of Computer Vision

Advantages and Disadvantages

Mean-Shift
• Like EM, this algorithm is built on probabilistic intuitions
• To understand EM we had to understand mixture models
• To understand mean-shift, we need to understand kernel density estimation (Take Pattern Recognition!)

Basics of Kernel Density Estimation
• Let's say you have a bunch of points drawn from some distribution
• What's the distribution that generated these points?

Using a Parametric Model
• Could fit a parametric model (like a Gaussian)
• Why:
  – Can express the distribution with a small number of parameters (like a mean and a variance)
• Why not:
  – Limited in flexibility

Non-Parametric Methods
• We'll focus on kernel-density estimates
• Basic idea: use the data to define the distribution
• Intuition:
  – If I were to draw more samples from the same probability distribution, then those points would probably be close to the points that I have already drawn
  – Build the distribution by putting a little mass of probability around each data point

Example (from Tappen's thesis)

Formally
• \hat{p}(x) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i), where the kernel K integrates to one

Kernel
• Most common kernel: the Gaussian or normal kernel
• Another way to think about it:
  – Make an image, put a 1 (or more) wherever you have a sample
  – Convolve with a Gaussian

What is Mean-Shift?
• The density will have peaks (also called modes)
• If we started at a point and did gradient ascent, we would end up at one of the modes
• Cluster based on which mode each point belongs to

Gradient Ascent?
• Actually, no
• A set of iterative steps can be taken that will monotonically converge to a mode
  – No worries about step sizes
  – This is an adaptive gradient ascent
• Starting at y_1 = x, each iteration replaces y_j with the kernel-weighted mean of the data around it

Results
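As a sketch of the iteration just described, the code below seeks the modes of a Gaussian kernel density estimate and clusters points by which mode they converge to. The function names, the bandwidth, and the mode-merging tolerance of half a bandwidth are assumptions of this sketch, not details from the lecture.

```python
import numpy as np

def mean_shift_mode(x, points, bandwidth=1.0, n_iters=100, tol=1e-5):
    """Follow the mean-shift iteration from x to a mode of the KDE.

    Each step replaces y with the Gaussian-kernel-weighted mean of the
    data, i.e. y_new = sum_i w_i x_i / sum_i w_i with
    w_i = exp(-||y - x_i||^2 / (2 h^2)).
    """
    y = np.asarray(x, dtype=float).copy()
    for _ in range(n_iters):
        d2 = ((points - y) ** 2).sum(axis=1)        # squared distances to data
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))    # Gaussian kernel weights
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:         # reached a mode
            break
        y = y_new
    return y_new

def mean_shift_cluster(points, bandwidth=1.0):
    """Label each point by the mode it reaches (modes merged within h/2)."""
    modes, labels = [], np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        m = mean_shift_mode(p, points, bandwidth)
        for j, existing in enumerate(modes):
            if np.linalg.norm(m - existing) < bandwidth / 2:
                labels[i] = j
                break
        else:
            modes.append(m)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```

Note that, unlike plain gradient ascent, there is no step size to tune: the kernel-weighted mean is taken directly as the next iterate, which is what makes the procedure adaptive.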
Normalized Cuts
• Clustering approach based on graphs
• First, some background

Graphs
• A graph G(V, E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its endpoints
(From slides by Khurram Shafique)

Connected and Disconnected Graphs
• A graph G is connected if there is a path from every vertex to every other vertex in G
• A graph G that is not connected is called a disconnected graph
(From slides by Khurram Shafique)

Can represent a graph with a matrix
• One row per node; for nodes a, b, c, d, e the adjacency matrix W is

  [ 0 1 0 0 1 ]
  [ 1 0 0 0 0 ]
  [ 0 0 0 0 1 ]
  [ 0 0 0 0 1 ]
  [ 1 0 1 1 0 ]

(Based on slides by Khurram Shafique)

Can add weights to edges
• The weight matrix W becomes

  [ 0 1 3 ∞ ∞ ]
  [ 1 0 4 ∞ 2 ]
  [ 3 4 0 6 7 ]
  [ ∞ ∞ 6 0 1 ]
  [ ∞ 2 7 1 0 ]

(Based on slides by Khurram Shafique)

Minimum Cut
• A cut of a graph G is a set of edges S such that removing S from G disconnects G
• The minimum cut is the cut of minimum weight, where the weight of cut <A, B> is
  cut(A, B) = \sum_{u \in A, v \in B} w(u, v)
(Based on slides by Khurram Shafique)

Minimum Cut
• There can be more than one minimum cut in a given graph
• All minimum cuts of a graph can be found in polynomial time [1]
[1] H. Nagamochi, K. Nishimura, and T. Ibaraki, "Computing all small cuts in an undirected network," SIAM J. Discrete Math. 10 (1997) 469-481.
(Based on slides by Khurram Shafique)

How does this relate to image segmentation?
• When we compute the cut, we've divided the graph into two clusters
• To get a good segmentation, the weights on the edges should represent the pixels' affinity for being in the same group
(Images from Khurram Shafique)

Affinities for Image Segmentation: Brightness Features
• w_{ij} = \exp(-\|F(i) - F(j)\|^2 / \sigma_I^2) \cdot \exp(-\|X(i) - X(j)\|^2 / \sigma_X^2) for nearby pixels, 0 otherwise (F = intensity, X = position)
• Interpretation: high-weight edges for pixels that
  – Have similar intensity
  – Are close to each other

Min-Cut won't work though
• The minimum cut will often choose a cut with one small cluster
(Image from Shi and Malik)

We need a better criterion
• Instead of min-cut, we can use the normalized cut:
  Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}
• Basic idea: big clusters will increase assoc(A, V), thus decreasing Ncut(A, B)

Finding the Normalized Cut
• NP-hard problem
• Can find an approximate solution from the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem
  (D - W) y = \lambda D y, where D is diagonal with D_{ii} = \sum_j W_{ij}
• That splits the data into two clusters
• Can recursively partition the data to find more clusters (see the sketch at the end of this lecture)
• Code is available on Jianbo Shi's webpage

Results
• Figure from "Normalized cuts and image segmentation," Shi and Malik, 2000

So what if I want to segment my image?
• Ncuts is a very common solution
• Mean-shift is also very popular
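To make the eigenvector step concrete, here is a sketch of the two-way normalized cut, assuming NumPy and SciPy. The affinity construction follows the brightness-and-position form above, but the sigma values, the connection radius, and the sign-based split of the second eigenvector are illustrative choices, and the dense n x n affinity matrix limits this to very small images.

```python
import numpy as np
from scipy.linalg import eigh

def pixel_affinities(image, sigma_i=0.1, sigma_x=4.0, radius=5):
    """Brightness-and-position affinities for a grayscale image in [0, 1]."""
    h, w = image.shape
    coords = np.array([(r, c) for r in range(h) for c in range(w)], float)
    vals = image.reshape(-1)
    # Pairwise squared distances in position and in intensity.
    dx2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    di2 = (vals[:, None] - vals[None, :]) ** 2
    W = np.exp(-di2 / sigma_i ** 2) * np.exp(-dx2 / sigma_x ** 2)
    W[dx2 > radius ** 2] = 0.0   # connect only nearby pixels
    return W

def ncut_bipartition(W):
    """Split a graph in two using the normalized-cut relaxation.

    Solves (D - W) y = lambda * D y and thresholds the eigenvector
    with the second-smallest eigenvalue at zero.
    """
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)   # eigenvalues returned in ascending order
    return vecs[:, 1] > 0         # boolean cluster label per pixel
```

Recursively applying ncut_bipartition to the submatrix of W for each side yields more than two clusters, as described above; practical implementations use sparse matrices and Lanczos-style eigensolvers, as in the original paper.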