Lecture 11: E-M, Mean-Shift, and Normalized Cuts
CAP 5415, Fall 2007
Review: Segmentation by Clustering
• Each pixel is mapped to a data vector
Example
(From Comaniciu and Meer)
Review of k-means
• Let's find three clusters in this data; the points could represent RGB triplets in 3D
• Begin by guessing where the “center” of each cluster is
• Assign each point to the closest cluster center
• Move each cluster center to the center of the points assigned to it
• Repeat this process until it converges
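To make the loop concrete, here is a minimal NumPy sketch of the algorithm just described (the random initialization and the exact convergence test are my choices, not the lecture's):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means; `points` is an (n, d) array, e.g. RGB triplets."""
    rng = np.random.default_rng(seed)
    # Begin by guessing the cluster centers (here: random data points)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        # (assumes no cluster ends up empty)
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # repeat until convergence
            break
        centers = new_centers
    return centers, labels
```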
Probabilistic Point of View
• We'll take a generative point of view
• How to generate a data point:
  1) Choose a cluster z from {1, …, K}
  2) Sample the point from the distribution associated with that cluster
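As a sketch, this two-step generative recipe for a 1D mixture looks like the following (the particular mixture weights, means, and standard deviations are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])      # probability of choosing each cluster
mu = np.array([-4.0, 0.0, 3.0])     # per-cluster means
sigma = np.array([1.0, 0.5, 1.5])   # per-cluster standard deviations

# 1) choose a cluster z, then 2) sample x from that cluster's distribution
z = rng.choice(len(pi), size=1000, p=pi)
x = rng.normal(mu[z], sigma[z])
```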
1D Example: a Mixture Model
• The density is a weighted sum over the clusters:
      p(x) = \sum_{k=1}^{K} p(z = k) \, p(x | z = k)
  where p(z = k) is the probability of choosing cluster k, and p(x | z = k) is the probability of x given that the cluster is k
• z indicates which cluster is chosen
• To make it a Mixture of Gaussians, each p(x | z = k) is a Gaussian, and p(z = k) = \pi_k is called a mixing coefficient:
      p(x) = \sum_{k=1}^{K} \pi_k \, N(x | \mu_k, \sigma_k^2)
Brief Review of Gaussians
• N(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
Mixture of Gaussians
• p(x) = \sum_{k=1}^{K} \pi_k \, N(x | \mu_k, \Sigma_k)
In Context of Our Previous Model
• Now each cluster k has its own mean \mu_k and covariance \Sigma_k
How does this help with clustering?
• Let's think about a different problem first
• What if we had a set of data points and wanted to find the parameters of the mixture model?
• Typical strategy: optimize the parameters to maximize the likelihood of the data
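In the mixture-of-Gaussians case, the quantity to maximize is the log-likelihood (standard form; \theta collects all the \pi_k, \mu_k, \Sigma_k):

      \log L(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, N(x_i | \mu_k, \Sigma_k)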
Maximizing the likelihood
• Easy if we knew which cluster each point should belong to
• We don't, so we get each point's cluster probabilities by using Bayes' rule:
      p(z = k | x) = \frac{\pi_k N(x | \mu_k, \Sigma_k)}{\sum_j \pi_j N(x | \mu_j, \Sigma_j)}
Where this comes from
• Differentiate the log-likelihood with respect to \mu_k and set it to zero; the solution is a weighted average of the data:
      \mu_k = \frac{\sum_i p(z = k | x_i) \, x_i}{\sum_i p(z = k | x_i)}
EM Algorithm
• E-Step: compute the posteriors p(z = k | x_i) for every point (this is what Bayes' rule gave us)
• M-Step: using these estimates of the posteriors, maximize over the rest of the parameters (\pi_k, \mu_k, \Sigma_k)
• Lots of interesting math and intuitions go into this algorithm that I'm not covering
• Take Pattern Recognition!
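A minimal 1D sketch of the two steps, following the update rules above (the initialization choices are mine, not the lecture's):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, iters=50, seed=0):
    """Minimal EM for a 1D mixture of Gaussians; x is a 1D sample array."""
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                   # mixing coefficients
    mu = rng.choice(x, size=k, replace=False)  # initialize means at data points
    var = np.full(k, x.var())                  # initial variances
    for _ in range(iters):
        # E-step: responsibilities p(z = j | x_i) via Bayes' rule
        dens = np.array([pi[j] * norm.pdf(x, mu[j], np.sqrt(var[j]))
                         for j in range(k)])   # shape (k, n)
        resp = dens / dens.sum(axis=0)
        # M-step: re-estimate parameters from the softly-assigned points
        nk = resp.sum(axis=1)
        mu = (resp * x).sum(axis=1) / nk
        var = (resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk
        pi = nk / len(x)
    return pi, mu, var, resp
```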
Back to clustering
• Now we have p(z = k | x) for every data point
• These posteriors can be seen as a soft clustering: every point has a degree of membership in every cluster
Another Clustering Application
• In this case, we have a video and we want to segment out what's moving or changing
(From C. Stauffer and W. Grimson)
Easy Solution
• Average a bunch of frames to get a “background” image
• Compute the difference between the background and each frame to find the foreground
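A NumPy sketch of this baseline (the threshold value is arbitrary):

```python
import numpy as np

def background_difference(frames, thresh=30):
    """frames: (T, H, W) grayscale stack; returns a boolean foreground mask."""
    background = frames.mean(axis=0)             # average frames -> background
    return np.abs(frames - background) > thresh  # large deviation -> foreground
```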
The difficulty with this approach
• The background changes
(From Stauffer and Grimson)
Solution
• Fit a mixture model to the background, i.e., a background pixel can take on multiple colors
• Can use this to track objects in surveillance video
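OpenCV ships a background subtractor in this family (a descendant of the Stauffer-Grimson per-pixel mixture model); a minimal usage sketch, with a hypothetical input file:

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical video file
# Each pixel's history is modeled as a mixture of Gaussians
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Nonzero mask pixels fail to match any background component
    fg_mask = subtractor.apply(frame)
cap.release()
```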
Suggested Reading
• Chapter 14, David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”
• Chapter 3, Mubarak Shah, “Fundamentals of Computer Vision”
Advantages and Disadvantages
• Advantages: soft assignments; the covariances let clusters take different shapes and sizes
• Disadvantages: the number of clusters must be chosen in advance; EM only finds a local maximum of the likelihood
Mean-Shift
• Like EM, this algorithm is built on probabilistic intuitions
• To understand EM, we had to understand mixture models
• To understand mean-shift, we need to understand kernel density estimation
  (Take Pattern Recognition!)
Basics of Kernel Density Estimation
• Let's say you have a bunch of points drawn from some distribution
• What's the distribution that generated these points?
Using a Parametric Model
• Could fit a parametric model (like a Gaussian)
• Why: the distribution can be expressed with a small number of parameters (like a mean and variance)
• Why not: limited flexibility
Non-Parametric Methods
• We'll focus on kernel density estimates
• Basic Idea: use the data itself to define the distribution
• Intuition:
  – If I were to draw more samples from the same probability distribution, those points would probably be close to the points I have already drawn
  – Build the distribution by putting a little mass of probability around each data point
Example
(From Tappen's thesis)
Formally
• The kernel density estimate built from samples x_1, …, x_N is
      p(x) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i)
  where K is a kernel that integrates to one
• Most common kernel: the Gaussian (normal) kernel
• Another way to think about it:
  – Make an image, put 1 (or more) wherever you have a sample
  – Convolve with a Gaussian
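A direct NumPy sketch of this estimate in 1D (the bandwidth h is a free parameter):

```python
import numpy as np

def kde(samples, query, h=0.5):
    """Evaluate a Gaussian-kernel density estimate of `samples` at `query`."""
    # A little mass of probability around each data point, averaged
    diffs = query[:, None] - samples[None, :]
    kernels = np.exp(-0.5 * (diffs / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)
```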
What is Mean-Shift?
• The density will have peaks (also called modes)
• If we started at a point and did gradient ascent, we would end up at one of the modes
• Cluster the points based on which mode each point belongs to
Gradient Ascent?
• Actually, no.
• A set of iterative steps can be taken that will monotonically converge to a mode
  – No worries about step sizes
  – This is an adaptive gradient ascent
• With a Gaussian kernel of bandwidth h, each step moves the current estimate y_j to a weighted mean of the samples x_i:
      y_{j+1} = \frac{\sum_i x_i \exp(-\|y_j - x_i\|^2 / 2h^2)}{\sum_i \exp(-\|y_j - x_i\|^2 / 2h^2)}
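A sketch of this iteration for one starting point (Gaussian kernel; the tolerance and iteration cap are my choices):

```python
import numpy as np

def mean_shift_mode(x, start, h=1.0, tol=1e-5, max_iter=100):
    """Follow mean-shift steps from `start` to a mode of the KDE of x (n, d)."""
    y = start.astype(float)
    for _ in range(max_iter):
        # Weighted mean of the data, with weights falling off around y
        w = np.exp(-0.5 * np.sum((x - y) ** 2, axis=1) / h ** 2)
        y_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:  # converged to a mode
            return y_new
        y = y_new
    return y
```

Clustering then amounts to running this from every data point and grouping points whose iterations land on the same mode.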
Results
Normalized Cuts
• A clustering approach based on graphs
• First, some background
Graphs
• A graph G(V, E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its endpoints.
(From Slides by Khurram Shafique)
Connected and Disconnected Graphs
• A graph G is connected if there is a path from every vertex to every other vertex in G.
• A graph G that is not connected is called a disconnected graph.
(From Slides by Khurram Shafique)
Can represent a graph with a matrix
• One row per node (the graph has five nodes: a, b, c, d, e)

Adjacency Matrix: W
      [ 0 1 0 0 1 ]
      [ 1 0 0 0 0 ]
      [ 0 0 0 0 1 ]
      [ 0 0 0 0 1 ]
      [ 1 0 1 1 0 ]

(Based on Slides by Khurram Shafique)
Can add weights to edges

Weight Matrix: W
      [ 0 1 3 ∞ ∞ ]
      [ 1 0 4 ∞ 2 ]
      [ 3 4 0 6 7 ]
      [ ∞ ∞ 6 0 1 ]
      [ ∞ 2 7 1 0 ]

(Based on Slides by Khurram Shafique)
Minimum Cut
• A cut of a graph G is a set of edges S such that removing S from G disconnects G.
• The minimum cut is the cut of minimum weight, where the weight of cut ⟨A, B⟩ is
      cut(A, B) = \sum_{u \in A, v \in B} w(u, v)
(Based on Slides by Khurram Shafique)
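Using the weight matrix from the earlier slide (with the ∞ entries, which mean "no edge", set to 0 for cut purposes), the cut weight can be computed directly:

```python
import numpy as np

def cut_weight(W, A):
    """Weight of cut <A, B>: total weight of edges from A to its complement."""
    in_A = np.zeros(len(W), dtype=bool)
    in_A[list(A)] = True
    return W[np.ix_(in_A, ~in_A)].sum()

# Weight matrix from the slide, with inf (no edge) replaced by 0
W = np.array([[0, 1, 3, 0, 0],
              [1, 0, 4, 0, 2],
              [3, 4, 0, 6, 7],
              [0, 0, 6, 0, 1],
              [0, 2, 7, 1, 0]], dtype=float)
print(cut_weight(W, {0, 1}))  # cut separating the first two nodes: 9.0
```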
Minimum Cut
• There can be more than one minimum cut in a given graph
• All minimum cuts of a graph can be found in polynomial time¹
¹ H. Nagamochi, K. Nishimura and T. Ibaraki, “Computing all small cuts in an undirected network,” SIAM J. Discrete Math. 10 (1997), 469–481.
(Based on Slides by Khurram Shafique)
How does this relate to image segmentation?
• When we compute the cut, we've divided the graph into two clusters
• To get a good segmentation, the weight on the edges should represent the pixels' affinity for being in the same group
(Images from Khurram Shafique)
Affinities for Image Segmentation
• Brightness features: a standard choice (used by Shi and Malik) combines intensity similarity and spatial proximity:
      w_{ij} = \exp(-\|I_i - I_j\|^2 / \sigma_I^2) \cdot \exp(-\|x_i - x_j\|^2 / \sigma_X^2)  if \|x_i - x_j\| < r, and 0 otherwise
• Interpretation: high-weight edges for pixels that
  – Have similar intensity
  – Are close to each other
Min-Cut won't work though
• The minimum cut will often choose a cut with one small cluster
(Image From Shi and Malik)
We need a better criterion
• Instead of the min-cut, we can use the normalized cut:
      Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}
  where assoc(A, V) = \sum_{u \in A, t \in V} w(u, t) is the total connection from the nodes in A to all nodes
• Basic Idea: big clusters will increase assoc(A, V), thus decreasing Ncut(A, B)
Finding the Normalized Cut
• Minimizing the normalized cut exactly is an NP-hard problem
• Can find an approximate solution from the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem
      (D - W) y = \lambda D y
  where D is the diagonal degree matrix, D_{ii} = \sum_j w_{ij}
• Thresholding that eigenvector splits the data into two clusters
• Can recursively partition the data to find more clusters
• Code available on Jianbo Shi's webpage
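A minimal sketch of the spectral step with SciPy (thresholding at the median is one common way to split the eigenvector; Shi and Malik search over several thresholds):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Approximate two-way normalized cut of affinity matrix W (Shi & Malik)."""
    D = np.diag(W.sum(axis=1))  # degree matrix
    # Solve the generalized eigenproblem (D - W) y = lambda * D y
    vals, vecs = eigh(D - W, D)
    y = vecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return y > np.median(y)     # boolean cluster assignment
```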
Results
Figure from “Normalized cuts and image segmentation,” Shi and Malik, 2000
So what if I want to segment my image?
• Ncuts is a very common solution
• Mean-shift is also very popular