Vision Topics Seminar
Mean Shift
Egorov Svetlana
Based on: D. Comaniciu, P. Meer: Mean Shift Analysis and Applications, IEEE Int. Conf. Computer Vision (ICCV'99), Kerkyra, Greece, 1197-1203, 1999

Presentation plan
• Motivation and goal
• Intro: problem formulation, overview of previous methods
• The base paper on mean shift in detail
• Recent modifications and improvements
• Recent applications

Presentation goals
• Present the mean-shift algorithm as a common technique for two computer vision tasks:
  – Image filtering and discontinuity-preserving smoothing
  – Clustering/segmentation
• Highlight the pros, cons and tradeoffs of this method and compare it with previous methods.
• Review recent modifications and improvements.
• Present possible applications of the method, with emphasis on one specific case.

Segmentation methods – overview
• As presented in “Segmentation and low-level grouping” by Bill Freeman, MIT, the following segmentation methods exist:
• Background subtraction
  – Estimate the background using a moving average and subtract it from the current frame to extract the foreground.
• K-means clustering
  – The k-means algorithm clusters n objects into k partitions, k < n, based on their attributes.
• Mean-shift algorithm (the focus of this presentation).
• Normalized cuts

Mean-shift – motivation and intuitive description
• Given a distribution of points, mean shift is a procedure for finding the densest region.
• Example for a simple 2D case (see next slide):
  – Start from an arbitrary point in the distribution
  – The region of interest is a circle centered on this point
  – On each iteration, find the center of mass of the region of interest
  – Move the circle to the center of mass
  – Continue the iterations until convergence

Intuitive Description
[Figure sequence over several slides. Labels: Region of interest, Center of mass, Mean Shift vector. Objective: find the densest region. Data: distribution of identical billiard balls.]
From “Mean Shift Theory and Applications”, presentation for “Advanced Topics in Computer Vision” course, Weizmann Institute.

Mean-shift – algorithm formal definition
• The basic mean shift algorithm is formulated according to the following paper: D. Comaniciu, P. Meer: Mean Shift Analysis and Applications, IEEE Int. Conf. Computer Vision (ICCV'99), Kerkyra, Greece, 1197-1203, 1999
• Given: a set of n points in d-dimensional space: $\{x_i\}_{i=1..n}$
• Model: we assume a non-parametric statistical model, i.e. there is a probability density function (PDF) associated with the set of points, and no assumptions are made about its parameters.
• Goal: for any given point, find the closest local mode of the density function.

Non-parametric density gradient estimation
[Diagram labels: Data; Non-parametric Density Estimation; Discrete PDF Representation; Non-parametric Density GRADIENT Estimation (Mean Shift); PDF Analysis]
• Non-parametric: no assumption about the form of the PDF (e.g. normal distribution)
• The density gradient is estimated instead of the density itself.

Kernels
• The notion of a kernel is used in the PDF gradient estimation method (also known in statistics as the Parzen window method)
• A kernel is a non-negative, real-valued, integrable function K satisfying the following requirements

Kernel properties:
• Normalized: $\int_{\mathbb{R}^d} K(x)\,dx = 1$
• Symmetric: $\int_{\mathbb{R}^d} x\,K(x)\,dx = 0$
• Exponential weight decay: $\lim_{\|x\| \to \infty} \|x\|^d K(x) = 0$
• $\int_{\mathbb{R}^d} x x^T K(x)\,dx = cI$

Kernel – examples
In practice one of the following forms is used, where $k(\cdot)$ is the kernel profile and $x^{(j)}$ are the individual coordinates:
  $K(x) = c \prod_{j} k(x^{(j)})$  or  $K(x) = c\,k(\|x\|)$
Examples:
• Epanechnikov kernel: $K_E(x) = c\,(1 - \|x\|^2)$ if $\|x\| \le 1$, and 0 otherwise
• Uniform kernel: $K_U(x) = c$ if $\|x\| \le 1$, and 0 otherwise
• Normal kernel: $K_N(x) = c \exp\left(-\tfrac{1}{2}\|x\|^2\right)$

Mean-shift – algorithm (cont.)
• The multivariate kernel density estimate obtained with kernel K(x) and window radius h, computed at the point x:
  $\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$
• The optimum kernel, yielding the asymptotic minimum mean integrated square error (AMISE), is the Epanechnikov kernel
  $K_E(x) = \frac{1}{2} c_d^{-1} (d + 2)(1 - x^T x)$ if $x^T x < 1$, and 0 otherwise,
  where $c_d$ is the volume of the unit d-dimensional sphere

Mean-shift – algorithm (cont.)
• Density gradient estimate for the Epanechnikov kernel:
  $\hat{\nabla} f(x) = \frac{n_x}{n h^d c_d} \cdot \frac{d + 2}{h^2} \left( \frac{1}{n_x} \sum_{x_i \in S_h(x)} (x_i - x) \right)$
  where $S_h(x)$ is a sphere of radius h centered on x and containing $n_x$ data points.
• The sample mean shift is given by:
  $M_h(x) = \frac{1}{n_x} \sum_{x_i \in S_h(x)} x_i - x$
  The first term is the center of mass of the points within the sphere, when all the points are equally weighted.

Mean-shift – algorithm (cont.)
• Relation of the mean shift to f(x) and its gradient:
  $M_h(x) = \frac{h^2}{d + 2} \cdot \frac{\hat{\nabla} f(x)}{\hat{f}(x)}$, where $\hat{f}(x) = \frac{n_x}{n h^d c_d}$ is the density estimate obtained with the uniform kernel.
  The mean-shift vector has the same direction as the density gradient.

Mean-shift properties
• An estimate of the normalized gradient can be obtained by computing the sample mean shift in a uniform kernel centered on x.
• The mean shift vector has the direction of the gradient of the density estimate at x when this estimate is obtained with the Epanechnikov kernel.
• The mean shift vector always points in the direction of the maximum increase in the density and can define a path leading to a density mode.
• The mean shift procedure, obtained by successive computation of the mean shift vector $M_h(x)$ and translation of the window $S_h(x)$ by $M_h(x)$, is guaranteed to converge.

Processing in joint Spatial-Range Domain
• An image is typically represented as a 2-dimensional lattice of r-dimensional vectors (pixels)
  – r is 1 in the gray level case, 3 for color images, or r > 3 in the multi-spectral case (frequencies beyond the visible light range)
• The space of the lattice is the spatial domain
• The gray level, color, or spectral information is represented in the range domain.
• After normalization with the global parameters $\sigma_s$ and $\sigma_r$, the location and range vectors are concatenated into a joint spatial-range domain of dimension d = r + 2.

Processing in joint Spatial-Range Domain (cont.)
• The discussed method applies the mean shift procedure to the data points in the joint spatial-range domain.
• Each data point becomes associated with a point of convergence, which represents the local mode of the density in the d-dimensional space

Mean shift applications – Discontinuity preserving filtering
• The output of the mean shift filter for an image pixel is the range information carried by the point of convergence.
• Filtering procedure:
  – $\{x_j\}_{j=1...n}$: original image (normalized with $\sigma_s$ and $\sigma_r$)
  – $\{z_j\}_{j=1...n}$: filtered image

Computational complexity
• The lattice structure of the spatial domain is used for the efficient search of the points.
• This search can be limited to a rectangular window of size 2x2 in the normalized space, which corresponds to […] image pixels
• The arithmetic complexity of mean shift filtering is about […] ops per image pixel, where $k_c$ is the mean number of iterations to convergence.

Filtering – example
[Figure: original image and filtered image. Example from Comaniciu & Meer.]

Mean shift applications – Segmentation
• Segmentation divides the image into segments or clusters
• The arithmetic complexity of the segmentation is similar to that of the mean shift filtering.

Segmentation examples
[Figure: original image, segmented image and corresponding contours. Example from Comaniciu & Meer.]

Segmentation examples
[Figure: original image and segmented image. Example from Comaniciu & Meer.]

Mean Shift – recent modifications and improvements
• One of the recent modifications to the basic mean shift is P.A.M.S.: The path assigned mean shift algorithm: A new fast mean shift implementation for colour image segmentation, Pooransingh, A.; Radix, C.-A.; Kokaram, A.; 15th IEEE International Conference on Image Processing, 2008. ICIP 2008.
• According to this paper, the mean shift method is effective in high density regions but proves to be computationally expensive for multidimensional data sets. The goal of the method proposed in this paper is a fast mean shift capable of processing multidimensional data sets easily.

General mean-shift (GMS) method for YUV colour space (revised algorithm)
• The main computational load is the calculation of the mean shift vector, $m_c(U,V)$.
• The computational cost is $O(n^2)$, where n is the size of the data set.

Fast mean-shift methods
• A number of modifications were proposed to improve complexity:
  – Use of a single metric to represent each data point
  – Hierarchical clustering method: repeatedly applying the mean shift over increasingly large bandwidths, with each step using the results of the previous one for initialization.
  – Neighbourhood consistency algorithm:
    Step 1: Partition: the original data set is decomposed into a number of local subsets of similar size, and their centres are calculated.
    Step 2: Clustering: the mean shift is calculated for each sample rather than the whole data set, to find a single class for each sample

Path Assigned Mean Shift (PAMS) – main idea
• For any random start point, the mean shift vector always points to the mode point
• In the PAMS assignment, all points along the path toward the mode point are assigned to that final mode value.
  – Points already assigned to a mode are eliminated from the mean shift process and are not traversed in the future

Path Assigned Mean Shift Algorithm in the Colour Domain
• The complexity is reduced to $O(\varphi^2)$, where $\varphi$ is the total number of unassigned points per iteration.

Example illustrating GMS vs. PAMS
[Figure: General mean-shift compared with PAMS]

Comparison of segmentation results between different algorithms
[Figure panels: (a),(b) original; (c),(d) GMS; (e),(f) other fast mean shift method; (g),(h) PAMS]

Mean Shift – Recent Applications
• One of the recent mean shift applications is presented in the following paper: Region-based mean shift tracking: Application to face tracking, Vilaplana, V.; Marques, F.; 15th IEEE International Conference on Image Processing, 2008. ICIP 2008.
Refer to the Appendix for details.

Face tracking:
• Face tracking is a task required by applications such as video indexing, visual surveillance, human-computer interaction, or facial expression recognition. In these applications, it is necessary to detect the faces, track them from frame to frame and analyze the tracks, e.g. to understand the object’s behavior.
• Tracking methods are organized in three groups, based on the model selected to describe the shape
  – Point tracking
  – Kernel tracking
  – Silhouette tracking

Face tracking – example
[Figure: face tracking example. Examples from http://gps-tsc.upc.es/imatge/_Veronica/RegionBasedMeanShift.html]

Conclusions
• Mean shift is a useful method for low-level tasks such as filtering or segmentation. Minor details of the background are eliminated, while object discontinuities are preserved
• The method is non-parametric, i.e. it doesn’t assume any model for the underlying density function
• The method works in the joint spatial-range domain
• The M.S. method is guaranteed to converge
• The scaling factors ($\sigma_s$ and $\sigma_r$) have a major impact on algorithm performance and should be adjusted to the nature of the objects
• The basic M.S. is computationally expensive. Some efficient modifications, with improved complexity and the same quality, were proposed recently. One example is Path Assigned M.S.
• Another possible application of mean shift is face tracking. Consistent tracking can be achieved by combining mean shift with a partition of the image into regions.

References
[1] D. Comaniciu, P. Meer: Mean Shift Analysis and Applications, IEEE Int. Conf. Computer Vision (ICCV'99), Kerkyra, Greece, 1197-1203, 1999
[2] B. Freeman: Segmentation and low-level grouping, MIT.
[3] A. Pooransingh, C.-A. Radix, A. Kokaram: The path assigned mean shift algorithm: A new fast mean shift implementation for colour image segmentation, 15th IEEE International Conference on Image Processing (ICIP 2008), 2008
[4] V. Vilaplana, F. Marques: Region-based mean shift tracking: Application to face tracking, 15th IEEE International Conference on Image Processing (ICIP 2008), 2008
[5] D. Comaniciu, P. Meer: Mean Shift: A Robust Approach toward Feature Space Analysis, IEEE Trans. Pattern Analysis Machine Intell., Vol. 24, No. 5, 603-619, 2002
[6] “Mean Shift Theory and Applications”, PowerPoint slides for “Advanced Topics in Computer Vision” course, Weizmann Institute.

Appendix – Region based Face Tracking

Mean shift (revised)
• X: n-dimensional space
• S: a finite set, the sample data
• Kernel: $K(x) = k(\|x\|^2)$, where $k(\cdot)$ is the kernel profile
• $w : S \to [0, \infty)$: a weight function
• The sample mean with kernel K at a point x from X:
  $m(x) = \frac{\sum_{s \in S} K(s - x)\, w(s)\, s}{\sum_{s \in S} K(s - x)\, w(s)}$
• Mean shift is m(x) − x
• The repeated movement of data points to the sample mean is called the mean shift algorithm.

Mean shift (cont.)
• Let T be a finite set, and $m(T) = \{m(t) : t \in T\}$.
• The full mean shift procedure iterates and evolves T until it finds a fixed point T = m(T).
• The weights w(s) can be fixed or re-evaluated after each iteration, and may also be a function of the current set T.
• Kernels define an influence zone for each point x in T and can be scaled to modify their spatial extent.

Mean shift for tracking
• In object tracking, the evolving set T typically consists of just one point, the object centroid.
• A sample corresponds to the spatial coordinates of a pixel x and has an associated sample weight w(x), which defines how likely it is that the pixel with color I(x) belongs to an object model.
• The mean shift seeks the mode of the kernel density computed with these weights.
• Implementation requires defining:
  – The kernel (scale and shape)
  – An object model
  – The weight function
  – The shape of the final object

Kernel selection considerations
• The basic mean shift requires isotropic kernels (e.g. Epanechnikov or Gaussian) and assumes constant object scale and orientation during tracking
• However, objects may have complex shapes whose scale and orientation constantly change. This leads to the use of generalized kernels

Kernel selection considerations (cont.)
• The two main parameters for kernel selection are scale and shape; both should be adjusted to the tracked object
• Scale:
  – The kernel scale determines the size of the window where sample weights are examined and is a crucial parameter in the mean shift algorithm.
  – Changes in the object scale require adjusting the kernel bandwidth to track the object consistently.
• Shape:
  – In the basic formulation, radially symmetric kernels, which are isotropic in shape, are used. However, objects often have anisotropic structure and, therefore, anisotropic symmetric kernels like rectangles or ellipses are frequently used.

Object model and weight image
• The tracked object is modeled as a class conditional color distribution $P(I(x) \mid O)$ that estimates, for each pixel with color I(x), the probability of the color of the pixel, given that the pixel belongs to the tracked object O.
  – The object distribution is learned off-line from training images or during initialization.
  – The model is commonly built with histograms in a particular color space.
• The weight function measures, for each pixel, some feature related to its similarity to the object model.
  – Example: the object histogram is compared with a histogram of the colors observed within the current mean shift target window
  – To adapt to background variation, the background model is continuously recomputed.

Final shape definition
• The tracking output at each frame is usually the object centroid and a rectangle which has the size of the last iteration window. This rectangle is used as an estimate of the object extent.

Region-based mean shift for tracking
• The approach of Vilaplana & Marques combines mean shift with the use of regions.
• Regions are useful for computing the weight image and for defining the contours of the tracked objects precisely, and they provide a natural mechanism to initialize the search in the next frame.
• The algorithm works with pixels that lie within a subimage defined by a rectangular search window W and an image partition P.

Region based method – Kernel selection
• Kernel scale:
  – At each frame, the size of the rectangular search window is defined as the size of the bounding box of the object O found in the previous frame, scaled by a fixed (constant) factor.
  – The window size is the same for all iterations within a frame, except for occasional cases when the search window size is underestimated.

Region based method – Kernel selection (cont.)
• Shape:
  – The image partition P is fitted to the search window W to define the kernel shape. The kernel extent is defined by all the regions R in partition P that are completely included in W: $\bigcup \{R \in P : R \subseteq W\}$
  – At each iteration, the kernel scale changes according to the size of the tracked object, and its shape takes into account the color homogeneity observed in the image, since it is defined by the regions in the partition

Region based partition and kernel
[Figure: region-based partition and kernel. Example by Vilaplana & Marques.]

Object model and weight image
• The object is modeled as a class conditional color distribution computed with a histogram in the YCbCr color space.
  – YCbCr is a more efficient way of encoding RGB information
• Given a pixel x with color I(x), the probability of the pixel color given the object is $p(I(x) \mid O) = h_O(I(x))$, where $h_O$ is the object histogram.
• The histogram is generated from the object segmented in the first frame.

Object model and weight image (cont.)
• The weight w(x) associated with the pixel x is the probability that the pixel represents the object, given its color:
  $w(x) = P(O \mid I(x)) = \frac{p(I(x) \mid O)\, P(O)}{p(I(x))}$
  – P(O): probability that the pixel belongs to the object
  – p(I(x)): probability that the pixel has color I:
    $p(I(x)) = p(I(x) \mid O)\, P(O) + p(I(x) \mid B)\, p(B)$
    where p(B) is the probability that the pixel is part of the background
• Each region $R_i$ in the fitted partition is assigned a weight value, which is computed as the average of the individual weights of the pixels that form that region:
  $w(R_i) = \frac{1}{|R_i|} \sum_{x \in R_i} w(x)$

Object model and weight update
• The object model $p(I(x) \mid O)$ (i.e. the object histogram) is recomputed at each frame, using the object segmented in the previous frame
• p(I(x)), which depends on the background, is estimated by building a histogram $h_W$ of the pixels that are within the current search window W. It is recomputed at every iteration
  – This avoids tracking failure when the background scene changes.
• The value of P(O) is estimated as the ratio between the sizes, in pixels, of the object detected in the previous frame and of the kernel.

Final shape definition
• Once the mean shift converges, the fitted partition in the last search window is used to define the final shape of the object (three-step procedure):
  – Initial object mask
  – Shape matching
  – Final object mask
[Figure: example by Vilaplana & Marques]

Region-based mean shift tracking – Results
• Region-based mean shift tracking is compared with the basic mean shift, and the new method demonstrates superior performance.
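
Illustrative sketch
The weighted mean shift described in this appendix can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Vilaplana & Marques: it uses a flat circular kernel of radius h instead of the region-based kernel, it takes the pixel weight to be the Bayes-rule posterior w(x) = p(I(x)|O) P(O) / p(I(x)), and the function names, the tolerance `tol` and the iteration cap `max_iter` are hypothetical choices made for the example.

```python
import numpy as np


def bayes_weights(p_color_given_obj, p_color, p_obj):
    """Pixel weights w = p(I|O) * P(O) / p(I), set to 0 where p(I) == 0."""
    p_color_given_obj = np.asarray(p_color_given_obj, dtype=float)
    p_color = np.asarray(p_color, dtype=float)
    w = np.zeros_like(p_color_given_obj)
    seen = p_color > 0  # avoid division by zero for unseen colors
    w[seen] = p_color_given_obj[seen] * p_obj / p_color[seen]
    # Histogram estimates are noisy, so clip to a valid probability range.
    return np.clip(w, 0.0, 1.0)


def region_weights(w, labels):
    """Average of the pixel weights inside each region of a label image."""
    w = np.asarray(w, dtype=float).ravel()
    labels = np.asarray(labels).ravel()  # non-negative integer region ids
    sums = np.bincount(labels, weights=w)
    counts = np.bincount(labels)
    return sums / np.maximum(counts, 1)


def weighted_mean(points, weights, x, h):
    """Sample mean m(x) with a flat kernel: only points within radius h count."""
    inside = np.sum((points - x) ** 2, axis=1) <= h * h
    total = weights[inside].sum()
    if total == 0:  # empty or zero-weight window: stay in place
        return x
    return (weights[inside][:, None] * points[inside]).sum(axis=0) / total


def mean_shift_mode(points, weights, x0, h, tol=1e-3, max_iter=100):
    """Move x to the sample mean m(x) until the shift m(x) - x is below tol."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        m = weighted_mean(points, weights, x, h)
        if np.linalg.norm(m - x) < tol:
            return m
        x = m
    return x
```

In a tracking setting, `points` would hold pixel coordinates inside the search window, `weights` the values returned by `bayes_weights` (or the per-region averages from `region_weights` mapped back to pixels), and `x0` the object centroid from the previous frame. The flat kernel is chosen only because it keeps the sketch short.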