Mean-Shift : Theory and Applications

SE 263 - R. Venkatesh Babu
Mean-Shift : Theory
Many Slides from : Yaron Ukrainitz & Bernard Sarel
Organization
• What is Mean Shift ?
• Kernel
• Kernel profile
• Shadow Kernel
• Density Estimation Methods
• Mean Shift – formulation
• Intuitive Description
• Deriving the Mean Shift
• Mean shift properties
What is Mean Shift ?
A tool for:
Finding modes in a set of data samples drawn from an
underlying probability density function (PDF) in R^N.
PDF in feature space:
• Color space
• Scale space
• Actually, any feature space you can conceive
• …
What is Mean Shift ?
Data → Non-parametric density estimation → Discrete PDF representation
→ Non-parametric density GRADIENT estimation (mean shift) → PDF analysis
History
• K. Fukunaga and L.D. Hostetler, “The estimation of the gradient of a
density function, with applications in pattern recognition,” IEEE Trans.
Information Theory, vol. 21, pp. 32-40, 1975.
• Y. Cheng, “Mean Shift, Mode Seeking, and Clustering”, IEEE Trans.
PAMI, vol. 17, no. 8, pp. 790-799, 1995.
• D. Comaniciu and P. Meer, “Mean Shift: A Robust Approach Toward Feature
Space Analysis,” IEEE Trans. PAMI, vol. 24, no. 5, pp. 603-619, May 2002.
Kernel
The d-variate kernel K(x) is a bounded function with compact support
satisfying the following properties:
• Normalized: \int_{R^d} K(x)\,dx = 1
• Symmetric: \int_{R^d} x\,K(x)\,dx = 0
• Exponential weight decay: \lim_{\|x\| \to \infty} \|x\|^d K(x) = 0
• Uncorrelated: \int_{R^d} x x^T K(x)\,dx = c\,I
Kernel Density Estimation
Parzen Windows - General Framework
Given a set of data points \{x_1, \dots, x_n\} in d-dimensional space,
the multivariate kernel density estimator with kernel K(x) is

P(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i)

where H is the bandwidth matrix, usually chosen as H = h^2 I.
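As a concrete illustration, here is a minimal KDE sketch with H = h^2 I. The choice of a Gaussian kernel, the function name, and the toy data are my own assumptions, not fixed by the slides:

```python
import numpy as np

def kde(x, samples, h):
    """Kernel density estimate P(x) = (1/n) * sum_i K_h(x - x_i),
    with H = h^2 * I and a Gaussian kernel (an assumption; any
    kernel with the stated properties works)."""
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    diff = (x - samples) / h               # (x - x_i) / h for every sample
    sq = np.sum(diff**2, axis=1)           # ||(x - x_i) / h||^2
    norm = (2 * np.pi) ** (-d / 2) / h**d  # normalizes each kernel to 1
    return norm * np.mean(np.exp(-0.5 * sq))

# density of N(0, 1) at its mode, estimated from samples
rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 1))
print(kde(np.zeros(1), data, h=0.3))  # close to 1/sqrt(2*pi) ~ 0.399
```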
Multivariate Kernel from Univariate
Kernels
In practice one of two forms is used:

K^P(x) = \prod_{j=1}^{d} K_1(x_j)
(the same univariate kernel K_1 on each dimension: a product of univariate kernels)

K^S(x) = a_{k,d}\, K_1(\|x\|)
(a function of the vector length only: radially symmetric, obtained by rotating
the univariate kernel K_1, with a_{k,d} restoring normalization)
Kernel Profile
• We are interested in the special class of radially symmetric kernels satisfying

K(x) = c_{d,k}\, k(\|x\|^2), \quad x \in R^d,

where k(t), defined only for t \ge 0, is called the kernel profile.
• Kernel profile examples:
– Normal: k(t) = e^{-t/2}, \quad t \in [0, \infty)
(the normalization constant is absorbed into c_{d,k})
– Epanechnikov (triangular): k(t) = 1 - t for t \in [0, 1], and 0 otherwise
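The two profiles above can be written directly as code; treating the normalization as absorbed into c_{d,k}, so that only the unnormalized shape appears here, is an assumption about the convention:

```python
import numpy as np

# Profiles as functions of t = ||x||^2, unnormalized (constants are
# assumed to live in c_{d,k}).
def k_normal(t):
    # defined for t in [0, inf), strictly decreasing
    return np.exp(-0.5 * t)

def k_epanechnikov(t):
    # 1 - t on [0, 1], zero outside
    return np.where(np.asarray(t) <= 1.0, 1.0 - np.asarray(t), 0.0)

print(k_normal(0.0), k_epanechnikov(0.5), k_epanechnikov(2.0))
```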
Kernel Profile - Properties

K(x) = c_{d,k}\, k(\|x\|^2), \quad x \in R^d

• k is non-negative
• k is normalized to 1
• k is non-increasing: k(a) \ge k(b) if a < b
• k is continuous except at a finite number of points
Non-Parametric Density Estimation
Assumption : The data points are sampled from an underlying PDF
Data point density implies PDF value!
(Figure: real data samples overlaid on the assumed underlying PDF.)
Parametric Density Estimation
Assumption : The data points are sampled from an underlying PDF
PDF(x) = \sum_i c_i\, e^{-(x - \mu_i)^2 / (2\sigma_i^2)}

Estimate the parameters (c_i, \mu_i, \sigma_i) from the data.
(Figure: real data samples overlaid on the assumed underlying PDF.)
Kernel Density Estimation
Various Kernels
P(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - x_i)

A function of some finite number of data points x_1, \dots, x_n.

Examples:
• Epanechnikov kernel: K_E(x) = c\,(1 - \|x\|^2) if \|x\| \le 1, and 0 otherwise
• Uniform kernel: K_U(x) = c if \|x\| \le 1, and 0 otherwise
• Normal kernel: K_N(x) = c \exp\!\left(-\tfrac{1}{2}\|x\|^2\right)
Non-Parametric KDE
[Parzen Window]
The distribution at x: F(x) = P(X \le x).
For the density to exist, F must be differentiable: p(x) = F'(x).
Non-Parametric KDE
• The density at x:

p(x) = F'(x)
= \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h}
\approx \frac{F(x+h) - F(x-h)}{2h}
= \frac{1}{2h}\, P(x - h < X \le x + h)
\approx \frac{1}{2h} \cdot \frac{k_n}{n}

where k_n is the number of samples that fall in [x-h, x+h].
Non-Parametric KDE
Let the hypercube window R_n have dimension d = 2; then k_n(x) is simply
a count of the random samples that fall in the square with sides of length
h_n centered at x.
The new window function, equivalent to the 1-D indicator function, is

w(s) = 1 if |s_j| \le 1/2 for all j = 1, \dots, d, and 0 otherwise;

w(s) defines the boundary of a unit hypercube centered at the origin.
Non-Parametric KDE
Define k_n(x) as

k_n(x) = \sum_{i=1}^{n} w\!\left(\frac{x - x_i}{h_n}\right).

So w\!\left(\frac{x - x_i}{h_n}\right) = 1 iff x_i falls inside the hypercube of
side h_n centered at x, or equivalently |x - x_i| \le h_n / 2 in every coordinate.

The density estimate:

p_n(x) = \frac{k_n(x)}{n\, h_n^d}
= \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^d}\, w\!\left(\frac{x - x_i}{h_n}\right)
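The hypercube estimator can be sketched directly from these formulas; the function name and the uniform toy data are illustrative choices:

```python
import numpy as np

def parzen_hypercube(x, X, h):
    """p_n(x) = k_n(x) / (n * h^d): count the samples inside the
    hypercube of side h centered at x (the indicator window w),
    then divide by n times the window volume."""
    X = np.atleast_2d(X)
    n, d = X.shape
    inside = np.all(np.abs((x - X) / h) <= 0.5, axis=1)  # w((x - x_i)/h)
    return inside.sum() / (n * h**d)

# uniform samples on [0, 1): the true density is 1 everywhere
rng = np.random.default_rng(2)
X = rng.random((10000, 1))
print(parzen_hypercube(np.array([0.5]), X, h=0.2))  # close to 1.0
```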
Mean Shift- Original Formulation
[Fukunaga and Hostetler, 1975]
• Given a sample S = \{s_i : s_i \in R^n\} and a kernel K, the sample mean
using K at point x is

m(x) = \frac{\sum_{s \in S} K(s - x)\, s}{\sum_{s \in S} K(s - x)},

where K is a flat kernel: K(x) = 1 if \|x\| \le \lambda, and 0 otherwise.
The mean shift is given by m(x) - x.
Let x = m(x) and repeat the above procedure.
– This repeated movement of x is called the mean-shift algorithm.
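A minimal sketch of this original flat-kernel formulation; the function names and the toy sample are illustrative:

```python
import numpy as np

def flat_sample_mean(x, S, lam):
    """m(x): the mean of all samples within distance lam of x
    (flat kernel of radius lam). The mean shift is m(x) - x."""
    d = np.linalg.norm(S - x, axis=1)
    return S[d <= lam].mean(axis=0)

def flat_mean_shift(x, S, lam, tol=1e-6, max_iter=100):
    """Repeat x <- m(x) until x stops moving."""
    for _ in range(max_iter):
        m = flat_sample_mean(x, S, lam)
        if np.linalg.norm(m - x) < tol:
            break
        x = m
    return x

S = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
print(flat_mean_shift(np.array([0.3]), S, lam=1.0))  # converges near 0.1
```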
Generalization of MS Algorithm
[Cheng, PAMI, 1995]
• Non-flat kernels are allowed.
• Data points can be weighted.
• The shift can be performed on any subset X of the Euclidean space.
Shadow of a Kernel
[Cheng, PAMI, 1995]
A kernel H is said to be a shadow of a kernel K if the mean shift using K
is in the gradient direction, at every x, of the density estimate using H.
Shadow of a Kernel
[Cheng, PAMI, 1995]
The mean shift using kernel K, with K(x) = k(\|x\|^2) and data weights w(s), is

m(x) = \frac{\sum_{s \in S} K(s - x)\, w(s)\, s}{\sum_{s \in S} K(s - x)\, w(s)} - x.

The density estimate using the shadow candidate H(x) = h(\|x\|^2) is

q(x) = \sum_{s \in S} H(s - x)\, w(s),

and its gradient q'(x) points in the same direction as m(x) for all x iff

h'(r) = -c\,k(r) for all r and some c > 0.
Kernel-Shadow Pairs
For example, the Epanechnikov kernel is the shadow of the flat (uniform)
kernel, and the Gaussian kernel is its own shadow.
Intuitive Description
Objective: find the densest region of a distribution of identical billiard balls.

Place a window (region of interest) anywhere, compute the center of mass of
the points inside it, and shift the window by the mean-shift vector (from the
window center to the center of mass). Repeating this moves the window uphill
in point density until the center of mass coincides with the window center:
the window has converged on the densest region.
Kernel Density Estimation
Gradient

P(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - x_i)

Give up estimating the PDF itself; estimate ONLY its gradient.

Using the kernel form

K(x - x_i) = c\,k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)

(h is the size of the window), and defining g(x) = -k'(x), we get

\nabla P(x) = \frac{c}{n} \sum_{i=1}^{n} \nabla k_i
= \frac{c'}{n} \left[\sum_{i=1}^{n} g_i\right]
\left[\frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i} - x\right],

where g_i = g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right) and c' = 2c/h^2.
Mean Shift for Estimation of Local Maxima

• KDE of the pdf:

\hat f_{h,K}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n}
k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)

• Gradient at x:

\nabla \hat f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n}
(x_i - x)\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)

• Profile of a new kernel: g(x) = -k'(x)
• Define a new kernel G: G(x) = c_{g,d}\, g(\|x\|^2)
• Then the gradient at x can be written as

\nabla \hat f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}}
\left[\sum_{i=1}^{n} g_i\right]
\left[\frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i} - x\right],
\quad g_i = g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)
Mean Shift for Estimation of Local Maxima

• KDE of f with kernel G:

\hat f_{h,G}(x) = \frac{c_{g,d}}{n h^d} \sum_{i=1}^{n}
g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)

• Mean shift vector:

m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\,
g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}
{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x

• Gradient at x:

\nabla \hat f_{h,K}(x) = \frac{2 c_{k,d}}{h^2 c_{g,d}}\,
\hat f_{h,G}(x)\, m_{h,G}(x)

• The gradient is proportional to the mean shift:

m_{h,G}(x) = C \cdot \frac{\nabla \hat f_{h,K}(x)}{\hat f_{h,G}(x)}
Mean Shift for Estimation of Local
Maxima
• The directions of the mean-shift vector and the gradient are the same:

m_{h,G}(x) = C \cdot \frac{\nabla \hat f_{h,K}(x)}{\hat f_{h,G}(x)}, \quad C > 0.

• The gradient vector is directed toward the maximum increase of density,
and so is the mean shift.
• So the mean shift points toward a local maximum.
• m_{h,G}(x) + x is nearer to the local maximum than x.
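The proportionality can be checked numerically. This sketch (all names and parameters are my own) uses a Gaussian profile, for which g = -k' is again Gaussian up to a constant that cancels in the ratio, and compares a finite-difference gradient of the KDE against the mean-shift vector:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))  # samples from an underlying 2-D PDF
h = 0.8

def f_hat(x):
    # Gaussian-profile KDE f_{h,K}; constants are omitted since they
    # do not affect the gradient *direction*
    return np.mean(np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1)))

def mean_shift(x):
    # g_i weights (for the Gaussian profile, g = k/2; the 1/2 cancels)
    w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
    return (w[:, None] * X).sum(0) / w.sum() - x  # m_{h,G}(x)

x = np.array([1.0, -0.5])
eps = 1e-5
grad = np.array([(f_hat(x + eps * e) - f_hat(x - eps * e)) / (2 * eps)
                 for e in np.eye(2)])  # central-difference gradient
m = mean_shift(x)
cos = grad @ m / (np.linalg.norm(grad) * np.linalg.norm(m))
print(cos)  # should be very close to 1: m points along the gradient
```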
Mean Shift for Estimation of Local
Maxima
• Recursive mean shift for a local maximum:
– Compute the mean shift m_{h,G}.
– Translate the kernel window of G by m_{h,G} and re-compute the weighted mean.
– Stop iterating when the gradient is close to zero.
• Recursive formula for the weighted mean, with y_1 the initial window center:

y_{j+1} = m_{h,G}(y_j) + y_j =
\frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}
{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}
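The recursive weighted mean above can be sketched as follows, again with a Gaussian profile g (an assumption); the stopping test uses the shift magnitude as a proxy for the gradient being near zero:

```python
import numpy as np

def seek_mode(y, X, h, tol=1e-8, max_iter=500):
    """Iterate y_{j+1} = sum_i x_i g_i / sum_i g_i with a Gaussian
    profile g; stop when the shift is (nearly) zero."""
    for _ in range(max_iter):
        g = np.exp(-0.5 * np.sum(((y - X) / h) ** 2, axis=1))
        y_next = (g[:, None] * X).sum(0) / g.sum()
        if np.linalg.norm(y_next - y) < tol:
            break
        y = y_next
    return y

# two tight groups; starting near the first, we converge onto its mode
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [4.0, 4.0], [4.2, 4.0]])
print(seek_mode(np.array([0.5, 0.5]), X, h=0.5))  # near (0.07, 0.07)
```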
Computing The Mean Shift

\nabla P(x) = \frac{c}{n} \left[\sum_{i=1}^{n} g_i\right]
\left[\frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i} - x\right]

The first bracket is yet another kernel density estimate; the second is the
mean-shift vector.

Simple mean-shift procedure:
• Compute the mean-shift vector

m(x) = \frac{\sum_{i=1}^{n} x_i\,
g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}
{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x,
\quad g(x) = -k'(x)

• Translate the kernel window by m(x), and repeat.
Mean Shift Properties
• Automatic convergence speed: the mean-shift vector size depends on the
gradient itself.
• Near a maximum the steps are small and refined.
• Convergence is guaranteed only for infinitesimal steps, so a lower bound
on the step size is used as the stopping criterion.
• For the uniform kernel, convergence is achieved in a finite number of steps.
• The normal kernel exhibits a smooth trajectory, but is slower than the
uniform kernel.

Mean shift is, in effect, adaptive gradient ascent.
Real Modality Analysis
Tessellate the space
with windows
Run the procedure in parallel
Real Modality Analysis
The blue data points were traversed by the windows towards the mode
Real Modality Analysis
An example
Window tracks signify the steepest ascent directions
Mean Shift Strengths & Weaknesses
Strengths:
• An application-independent tool
• Suitable for real data analysis
• Does not assume any prior shape (e.g. elliptical) for the data clusters
• Can handle arbitrary feature spaces
• Only ONE parameter to choose, and h (the window size) has a physical
meaning, unlike in K-Means

Weaknesses:
• The window size (bandwidth selection) is not trivial
• An inappropriate window size can cause modes to be merged, or generate
additional “shallow” modes → use an adaptive window size
Clustering
Cluster : All data points in the attraction basin of a mode
Attraction basin : the region for which all trajectories lead to the same mode
Mean Shift: A Robust Approach Toward Feature Space Analysis, by Comaniciu and Meer
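A minimal clustering sketch along these lines: run the mean-shift procedure from every data point and group points whose trajectories reach the same mode. The Gaussian profile, the merge tolerance, and all names are illustrative, and this mode merging is much cruder than the pruning in the paper:

```python
import numpy as np

def seek_mode(y, X, h, iters=100):
    # fixed number of mean-shift iterations from starting point y
    for _ in range(iters):
        g = np.exp(-0.5 * np.sum(((y - X) / h) ** 2, axis=1))
        y = (g[:, None] * X).sum(0) / g.sum()
    return y

def mean_shift_cluster(X, h, merge_tol=0.5):
    """Cluster = all points in the attraction basin of one mode.
    Modes closer than merge_tol are treated as the same mode."""
    modes, labels = [], []
    for x in X:
        m = seek_mode(x.copy(), X, h)
        for j, mo in enumerate(modes):
            if np.linalg.norm(m - mo) < merge_tol:
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(labels), modes

# two well-separated groups -> two attraction basins -> two clusters
X = np.vstack([np.zeros((5, 2)), np.zeros((5, 2)) + [6, 6]])
labels, modes = mean_shift_cluster(X, h=1.0)
print(labels)
```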
Clustering
Synthetic Examples
Simple Modal Structures
Complex Modal Structures
Clustering
Real example. Feature space: L*u*v* representation.
(Figure: initial window centers, modes found, modes after pruning, and the
final clusters.)
Clustering
Real Example
L*u*v space representation
Clustering
Real example, shown in the 2D (L*u) space representation.
Note that not all trajectories in the attraction basin reach the same mode.
(Figure: trajectories and final clusters.)
Discontinuity Preserving Smoothing
Feature space: joint domain = spatial coordinates + color space

K(x) = C\; k_s\!\left(\left\|\frac{x^s}{h_s}\right\|^2\right)
k_r\!\left(\left\|\frac{x^r}{h_r}\right\|^2\right)

Meaning: treat the image as data points in the joint spatial and gray-level
(range) domain.
(Figure: image data slice, mean-shift vectors, and the smoothing result.)
Mean Shift: A Robust Approach Toward Feature Space Analysis, by Comaniciu and Meer
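A sketch of the idea on a 1-D signal, with Gaussian profiles for both k_s and k_r (an assumption, as are the names and parameters): samples close in position but far in value get negligible weight, so the averaging never crosses the step:

```python
import numpy as np

def ms_filter_1d(signal, hs, hr, iters=10):
    """Discontinuity-preserving smoothing sketch: each sample is a
    point (position, value) in the joint domain. The product kernel
    k_s(|x_s/h_s|^2) * k_r(|x_r/h_r|^2) weights only neighbours that
    are close in BOTH position and value, then each point is replaced
    by the range value its mean-shift trajectory converges to."""
    pos = np.arange(len(signal), dtype=float)
    out = []
    for i in range(len(signal)):
        x, v = pos[i], signal[i]
        for _ in range(iters):
            w = (np.exp(-0.5 * ((pos - x) / hs) ** 2)
                 * np.exp(-0.5 * ((signal - v) / hr) ** 2))
            x = (w * pos).sum() / w.sum()
            v = (w * signal).sum() / w.sum()
        out.append(v)  # keep the converged range (gray-level) value
    return np.array(out)

# noisy step edge: two flat regions separated by a jump of ~5
step = np.array([0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05])
print(np.round(ms_filter_1d(step, hs=2.0, hr=0.5), 2))
```

Each half is smoothed toward its own flat level while the edge between samples 3 and 4 stays sharp, which is exactly the "flat regions induce the modes" behavior.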
Discontinuity Preserving Smoothing
Flat regions induce the modes!
(Figure: a 1-D slice of the image, with y the spatial coordinate and z the
gray level.)
Discontinuity Preserving Smoothing
The effect of the window size in the spatial and range domains.
Discontinuity Preserving Smoothing
Example
Discontinuity Preserving Smoothing
Example
Segmentation
Example
Segmentation
Example
Segmentation
Example
Segmentation
Example