SE 263 - R. Venkatesh Babu

Mean-Shift: Theory
Many slides from: Yaron Ukrainitz & Bernard Sarel

Organization
• What is Mean Shift?
• Kernel
• Kernel profile
• Shadow kernel
• Density estimation methods
• Mean Shift - formulation
• Intuitive description
• Deriving the Mean Shift
• Mean Shift properties

What is Mean Shift?
A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.
The PDF lives in a feature space, for example:
• Color space
• Scale space
• Actually, any feature space you can conceive
• ...

What is Mean Shift?
Two routes from data to PDF analysis:
• Non-parametric density estimation: data → discrete PDF representation → PDF analysis.
• Non-parametric density GRADIENT estimation (Mean Shift): data → modes of the PDF directly, without estimating the full PDF.

History
• K. Fukunaga and L. D. Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition," IEEE Trans. Information Theory, vol. 21, pp. 32-40, 1975.
• Y. Cheng, "Mean Shift, Mode Seeking, and Clustering," IEEE Trans. PAMI, vol. 17, no. 8, pp. 790-799, 1995.
• D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis," IEEE Trans. PAMI, vol. 24, no. 5, May 2002.

Kernel
The d-variate kernel K(x) is a bounded function with compact support satisfying the following properties:
• Normalized: $\int_{R^d} K(x)\,dx = 1$
• Symmetric: $\int_{R^d} x\,K(x)\,dx = 0$
• Exponential weight decay: $\lim_{\|x\|\to\infty} \|x\|^d K(x) = 0$
• Uncorrelated: $\int_{R^d} x x^T K(x)\,dx = c\,I$

Kernel Density Estimation - Parzen Windows (General Framework)
Given a set of data points {x_1, ..., x_n} in d-dimensional space, the multivariate kernel density estimator with kernel K(x) is
$\hat P(x) = \frac{1}{n}\sum_{i=1}^{n} K_H(x - x_i)$
where H is the bandwidth matrix, usually chosen as H = h^2 I.

Multivariate Kernel from Univariate Kernels
In practice one of two forms is used:
• Product kernel - the same univariate kernel K_1 applied to each dimension: $K^P(x) = \prod_{i=1}^{d} K_1(x_i)$
• Radially symmetric kernel - a function of the vector length only, obtained by rotating the univariate kernel K_1: $K^S(x) = a_{k,d}\, K_1(\|x\|)$

Kernel Profile
• We are interested in the special class of radially symmetric kernels satisfying
$K(x) = c_{d,k}\, k\!\left(\|x\|^2\right), \quad x \in R^d$
• where k(t) is called the kernel profile (defined only for t ≥ 0).
• Examples of kernel profiles:
  - Normal: $k_N(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t/2}, \quad t \in [0, \infty)$
  - Epanechnikov (triangular profile): $k_E(t) = 1 - t, \quad t \in [0, 1]$

Kernel Profile - Properties
$K(x) = c_{d,k}\, k\!\left(\|x\|^2\right), \quad x \in R^d$
• k is non-negative
• k is normalized to 1
• k is non-increasing: k(a) ≥ k(b) if a < b
• k is continuous except at a finite number of points

Non-Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF.
Data point density implies PDF value!
[Figure: assumed underlying PDF vs. real data samples]

Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF of known form, e.g. a Gaussian mixture
$\mathrm{PDF}(x) = \sum_i c_i\, e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}}$
whose parameters are estimated from the data.
[Figure: assumed underlying PDF vs. real data samples]

Kernel Density Estimation - Various Kernels
$\hat P(x) = \frac{1}{n}\sum_{i=1}^{n} K(x - x_i)$
A function of some finite number of data points x_1, ..., x_n. Examples:
• Epanechnikov kernel: $K_E(x) = \begin{cases} c\,(1 - \|x\|^2) & \|x\| \le 1 \\ 0 & \text{otherwise} \end{cases}$
• Uniform kernel: $K_U(x) = \begin{cases} c & \|x\| \le 1 \\ 0 & \text{otherwise} \end{cases}$
• Normal kernel: $K_N(x) = c\, \exp\!\left(-\tfrac{1}{2}\|x\|^2\right)$
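To make the estimator above concrete, here is a minimal NumPy sketch (not from the slides; function names and the 1-D setting are illustrative assumptions, with H = h^2 I) that evaluates P̂(x) using the Epanechnikov and normal kernels just listed.

```python
import numpy as np

# Illustrative sketch; all names are hypothetical, not from the lecture.

def epanechnikov_kernel(u):
    # K_E(u) = c (1 - u^2) for |u| <= 1, 0 otherwise; c = 3/4 normalizes it in 1-D.
    u = np.atleast_1d(u)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def normal_kernel(u):
    # K_N(u) = c exp(-u^2 / 2); c = 1/sqrt(2*pi) normalizes it in 1-D.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h, kernel):
    # P_hat(x) = (1/n) sum_i K_h(x - x_i), with K_h(u) = K(u/h) / h (H = h^2 I in 1-D).
    x = np.atleast_1d(x)[:, None]          # query points as a column
    u = (x - samples[None, :]) / h         # pairwise scaled differences (x - x_i) / h
    return kernel(u).mean(axis=1) / h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1, 1.0, 300)])
    grid = np.linspace(-5, 5, 11)
    print(kde(grid, data, h=0.4, kernel=epanechnikov_kernel))
    print(kde(grid, data, h=0.4, kernel=normal_kernel))
```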
Non-Parametric KDE [Parzen Window]
The distribution at x: $F(x) = \Pr(X \le x)$.
For the density to exist, F must be differentiable:
$p(x) = \frac{dF(x)}{dx} = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h}$

Non-Parametric KDE
• The density at x, estimated from n samples with a finite window half-width h:
$\hat p(x) = \frac{\hat F(x+h) - \hat F(x-h)}{2h} = \frac{1}{2h}\cdot\frac{k_n(x)}{n}$
where $k_n(x)$ is the number of samples that fall in [x - h, x + h].

Non-Parametric KDE
Let the hypercube window $R_n$ have dimension d = 2; then $k_n(x)$ is simply a count of the random samples that fall in the square with sides of length $h_n$ centered at x.
The new window function, equivalent to the 1-D indicator function, is
$w(s) = \begin{cases} 1 & |s_j| \le \tfrac{1}{2},\ j = 1,\dots,d \\ 0 & \text{otherwise} \end{cases}$
i.e. w(s) defines the boundary of a unit hypercube centered at the origin.

Non-Parametric KDE
Define $k_n(x)$ as
$k_n(x) = \sum_{i=1}^{n} w\!\left(\frac{x - x_i}{h_n}\right)$
so $w\!\left(\frac{x - x_i}{h_n}\right) = 1$ iff $x_i$ falls in the hypercube of side $h_n$ centered at x.
The density estimate is then
$\hat p_n(x) = \frac{k_n(x)/n}{V_n} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n^d}\, w\!\left(\frac{x - x_i}{h_n}\right), \qquad V_n = h_n^d$

Mean Shift - Original Formulation [Fukunaga and Hostetler, 1975]
• Given a sample S = {s_i : s_i ∈ R^n} and a kernel K, the sample mean using K at point x is
$m(x) = \frac{\sum_{s \in S} K(s - x)\, s}{\sum_{s \in S} K(s - x)}$
where K is a flat kernel (1 inside a fixed-radius window, 0 outside).
• The mean shift is given by m(x) - x. Set x = m(x) and repeat the above procedure.
  - This repeated movement of x is called the mean shift algorithm.

Generalization of the MS Algorithm [Cheng, PAMI, 1995]
• Non-flat kernels are allowed.
• Points in the data can be weighted.
• The shift can be performed on any subset X of the Euclidean space.

Shadow of a Kernel [Cheng, PAMI, 1995]
Kernel H is said to be a shadow of kernel K if the mean shift using K is, at every x, in the gradient direction of the density estimate using H.

Shadow of a Kernel [Cheng, PAMI, 1995]
With profiles k and h (K(x) = k(‖x‖²), H(x) = h(‖x‖²)) and point weights w(s), the mean shift using kernel K is
$m(x) - x = \frac{\sum_{s \in S} k(\|s - x\|^2)\, w(s)\, s}{\sum_{s \in S} k(\|s - x\|^2)\, w(s)} - x$
and the gradient of the density estimate $q(x) = \sum_{s \in S} h(\|s - x\|^2)\, w(s)$ at x is
$\nabla q(x) = 2\sum_{s \in S} h'(\|s - x\|^2)\, w(s)\,(x - s)$
To have m(x) - x and ∇q(x) point in the same direction, we need h'(r) = -c·k(r) for all r and some c > 0.

Kernel-Shadow Pairs
[Figure: kernels and their shadows.] For example, the Epanechnikov kernel is the shadow of the flat (uniform) kernel, and the Gaussian kernel is its own shadow.

Intuitive Description
Objective: find the densest region of a distribution of identical billiard balls.
[Animation over several slides:] a region of interest (window) is placed over the data; the center of mass of the points inside it is computed; the mean shift vector points from the window center to that center of mass; the window is translated by this vector and the procedure is repeated. Step by step the window climbs toward the densest region, and at convergence the window center coincides with the center of mass, so the mean shift vector vanishes.

Kernel Density Estimation - Gradient
$\hat P(x) = \frac{1}{n}\sum_{i=1}^{n} K(x - x_i)$
Using the kernel form $K(x - x_i) = c\, k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$ (h is the size of the window), we give up estimating the PDF and estimate ONLY its gradient:
$\nabla \hat P(x) = \frac{c}{n}\sum_{i=1}^{n} \nabla k_i \;\propto\; \left[\sum_{i=1}^{n} g_i\right]\left[\frac{\sum_{i=1}^{n} g_i\, x_i}{\sum_{i=1}^{n} g_i} - x\right]$
where $g_i = g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$ and $g(x) = -k'(x)$.
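As a sketch of the quantity just derived, the following Python fragment computes one mean shift vector m(x), assuming an Epanechnikov profile k, so that g = -k' is the flat (uniform) profile; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def mean_shift_vector(x, samples, h):
    """One mean shift step m(x), illustrative sketch (names are hypothetical).

    For the Epanechnikov profile k_E(t) = 1 - t on [0, 1], g(t) = -k_E'(t) = 1 on
    [0, 1] and 0 elsewhere, so the weighted mean reduces to the plain mean of the
    samples inside the window of radius h around x (the flat-kernel case of
    Fukunaga & Hostetler).
    """
    x = np.asarray(x, dtype=float)
    d2 = np.sum((samples - x)**2, axis=1) / h**2   # ||(x - x_i)/h||^2 for every sample
    g = (d2 <= 1.0).astype(float)                  # g_i = g(||(x - x_i)/h||^2)
    if g.sum() == 0:
        return np.zeros_like(x)                    # empty window: no shift
    weighted_mean = (samples * g[:, None]).sum(axis=0) / g.sum()
    return weighted_mean - x                       # m(x) = weighted mean - x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.normal(loc=[2.0, -1.0], scale=0.7, size=(500, 2))
    x = np.array([0.0, 0.0])
    print(mean_shift_vector(x, pts, h=2.0))        # points roughly toward the cluster at (2, -1)
```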
Mean Shift for Estimation of Local Maxima
• KDE of the pdf: $\hat f_{h,K}(x) = \frac{c_{k,d}}{n h^d}\sum_{i=1}^{n} k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$
• Gradient at x: $\hat\nabla f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}}\sum_{i=1}^{n} (x - x_i)\, k'\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$
• Profile of a new kernel: $g(x) = -k'(x)$
• Define a new kernel G: $G(x) = c_{d,g}\, g\!\left(\|x\|^2\right)$
• Then the gradient at x factors as
$\hat\nabla f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}}\left[\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)\right]\left[\frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x\right]$

Mean Shift for Estimation of Local Maxima
• KDE of f with kernel G: $\hat f_{h,G}(x) = \frac{c_{g,d}}{n h^d}\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$
• Mean shift vector: $m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x$
• Gradient at x in these terms: $\hat\nabla f_{h,K}(x) = \hat f_{h,G}(x)\,\frac{2 c_{k,d}}{h^2 c_{g,d}}\, m_{h,G}(x)$
• So the mean shift is proportional to the gradient: $m_{h,G}(x) = \frac{1}{2} h^2 \frac{c_{g,d}}{c_{k,d}}\,\frac{\hat\nabla f_{h,K}(x)}{\hat f_{h,G}(x)}$

Mean Shift for Estimation of Local Maxima
• The directions of the mean shift and the gradient vector are the same: $m_{h,G}(x) = C\cdot\hat\nabla f_{h,K}(x)$ with C > 0.
• The gradient vector is directed toward the maximum increase of the density, and so is the mean shift.
• So the mean shift points toward a local maximum.
• $m_{h,G}(x) + x$ is nearer to the local maximum than x.

Mean Shift for Estimation of Local Maxima
• Recursive mean shift for a local maximum:
  - Compute the mean shift $m_{h,G}$.
  - Translate the kernel window of G by $m_{h,G}$ and re-compute the weighted mean.
  - Stop the iteration when the gradient is close to zero.
• Recursive formula for the weighted mean, with $y_1$ the initial window center:
$y_{j+1} = m_{h,G}(y_j) + y_j = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_j - x_i}{h}\right\|^2\right)}$

Computing the Mean Shift
$\nabla \hat P(x) \propto \left[\sum_{i=1}^{n} g_i\right]\left[\frac{\sum_{i=1}^{n} g_i\, x_i}{\sum_{i=1}^{n} g_i} - x\right]$
The first bracket is yet another kernel density estimation! Simple mean shift procedure:
• Compute the mean shift vector
$m(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x, \qquad g(x) = -k'(x)$
• Translate the kernel window by m(x).

Mean Shift Properties
• Automatic convergence speed - the mean shift vector size depends on the gradient itself.
• Near maxima, the steps are small and refined.
• Convergence is guaranteed for infinitesimal steps only, i.e. the procedure is only asymptotically convergent (therefore set a lower bound on the step size).
• For the uniform kernel, convergence is achieved in a finite number of steps.
• The normal kernel exhibits a smooth trajectory, but is slower than the uniform kernel.
Mean shift is an adaptive gradient ascent.

Real Modality Analysis
Tessellate the space with windows and run the procedure in parallel.
[Figure:] The blue data points were traversed by the windows towards the mode.
An example: window tracks signify the steepest ascent directions.

Mean Shift Strengths & Weaknesses
Strengths:
• An application-independent tool
• Suitable for real data analysis
• Does not assume any prior shape (e.g. elliptical) on the data clusters
• Can handle arbitrary feature spaces
• Only ONE parameter to choose
• h (the window size) has a physical meaning, unlike in k-means
Weaknesses:
• The window size (bandwidth selection) is not trivial
• An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes; use an adaptive window size

Clustering
Cluster: all data points in the attraction basin of a mode.
Attraction basin: the region for which all trajectories lead to the same mode.
(Mean Shift: A Robust Approach Toward Feature Space Analysis, Comaniciu and Meer)

Clustering - Synthetic Examples
[Figures: simple modal structures and complex modal structures]

Clustering - Real Example
Feature space: L*u*v representation.
[Figures: initial window centers, modes found, modes after pruning, final clusters]

Clustering - Real Example
[Figure: L*u*v space representation]

Clustering - Real Example
[Figure: 2D (L*u) space representation]
Not all trajectories in the attraction basin reach the same mode.
[Figure: final clusters]
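A minimal sketch of the recursive mean shift procedure and of clustering by attraction basin described above, assuming a flat kernel g and a simple distance-based mode merging heuristic (the h/2 merge threshold and all names are assumptions for illustration, not from the slides).

```python
import numpy as np

def seek_mode(y, samples, h, tol=1e-3, max_iter=100):
    # Iterate y_{j+1} = weighted mean of the samples around y_j until the shift is tiny.
    y = np.asarray(y, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((samples - y)**2, axis=1) / h**2
        g = (d2 <= 1.0).astype(float)              # flat kernel g (uniform window)
        if g.sum() == 0:
            break
        y_next = (samples * g[:, None]).sum(axis=0) / g.sum()
        if np.linalg.norm(y_next - y) < tol:       # lower bound on the step size
            y = y_next
            break
        y = y_next
    return y

def mean_shift_cluster(samples, h):
    # Run the procedure from every point; points converging to the same mode form a cluster.
    modes, labels = [], []
    for x in samples:
        m = seek_mode(x, samples, h)
        for k, mode in enumerate(modes):
            if np.linalg.norm(m - mode) < h / 2:   # crude mode merging by distance (assumed heuristic)
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    pts = np.vstack([rng.normal([0, 0], 0.3, (150, 2)),
                     rng.normal([3, 3], 0.4, (150, 2))])
    modes, labels = mean_shift_cluster(pts, h=1.0)
    print(modes)                                    # roughly [0, 0] and [3, 3]
    print(np.bincount(labels))
```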
Discontinuity Preserving Smoothing
Feature space: joint domain = spatial coordinates + color space,
$K(x) = C\, k_s\!\left(\left\|\frac{x^s}{h_s}\right\|^2\right) k_r\!\left(\left\|\frac{x^r}{h_r}\right\|^2\right)$
where $x^s$ is the spatial part, $x^r$ the range (color / gray level) part, and $h_s$, $h_r$ the corresponding bandwidths.
Meaning: treat the image as data points in the joint spatial and gray level domain.
[Figures: image data (slice), mean shift vectors, smoothing result]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, Comaniciu and Meer)

Discontinuity Preserving Smoothing
Flat regions induce the modes!
[Figure]

Discontinuity Preserving Smoothing
[Figure: the effect of window size in the spatial and range spaces]

Discontinuity Preserving Smoothing
[Figures: smoothing examples]

Segmentation
[Figures: segmentation examples]
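For illustration only, here is a small grayscale sketch of joint-domain mean shift filtering, assuming uniform spatial and range profiles and a brute-force per-pixel loop; hs and hr name the spatial and range bandwidths, and the code is a simplification of the full Comaniciu-Meer filtering algorithm rather than its reference implementation.

```python
import numpy as np

def mean_shift_filter(img, hs=8.0, hr=16.0, max_iter=5):
    """Discontinuity preserving smoothing of a grayscale image (illustrative sketch).

    Each pixel is a point (x, y, v) in the joint spatial-range domain; it is moved
    toward its mode by mean shift with a uniform window of radius hs in space and
    hr in range, and its gray value is replaced by the converged value.
    """
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            yi, yj, yv = float(i), float(j), float(img[i, j])
            for _ in range(max_iter):
                # spatial window (clipped to the image)
                i0, i1 = max(0, int(yi - hs)), min(H, int(yi + hs) + 1)
                j0, j1 = max(0, int(yj - hs)), min(W, int(yj + hs) + 1)
                patch = img[i0:i1, j0:j1].astype(float)
                ii, jj = np.mgrid[i0:i1, j0:j1]
                # uniform profiles k_s and k_r: indicators on the spatial and range windows
                mask = ((ii - yi)**2 + (jj - yj)**2 <= hs**2) & ((patch - yv)**2 <= hr**2)
                if not mask.any():
                    break
                yi, yj = ii[mask].mean(), jj[mask].mean()
                yv = patch[mask].mean()
            out[i, j] = yv
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    step = np.repeat([[60.0, 180.0]], 32, axis=0).repeat(16, axis=1)  # 32x32 step edge
    noisy = np.clip(step + rng.normal(0, 10, step.shape), 0, 255)
    smooth = mean_shift_filter(noisy, hs=4, hr=25, max_iter=5)
    print(noisy.std(), smooth.std())   # smoothing reduces within-region variance
```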