Human-Computer Interaction
Segmentation
Hanyang University
Jong-Il Park

Why segment images?
  - Form large chunks of pixels that can be dealt with together
      - for efficiency
      - because these might represent objects
  - Join up image tokens that together convey information

Grouping
  - Humans interpret image information collectively, in “groups”
  - E.g. the Müller-Lyer illusion

Applications
  - Shot boundary detection
      - summarize video by finding shot boundaries
      - obtain the “most representative” frame
  - Background subtraction
      - find “interesting bits” of an image by subtracting a known background
      - e.g. find a person in an office
      - e.g. find cars on a road
  - Interactive segmentation
      - user marks some foreground/background pixels
      - system cuts the object out of the image
      - useful for image editing, etc.

Technique: Shot Boundary Detection
  - Find the shots in a sequence of video
      - shot boundaries cause big differences between succeeding frames
  - Strategy:
      - compute interframe distances
      - declare a boundary where these are big
  - Possible distances
      - frame differences
      - histogram differences
      - block comparisons
      - edge differences

Technique: Background Subtraction
  - If we know the background, it is easy to find the “interesting bits”
  - Approach:
      - use a moving average to estimate the background image
      - subtract it from the current frame
      - large absolute values are interesting pixels
      - trick: use morphological operations to clean up pixels

Interactive segmentation
  - Goals
      - User cuts an object out of one image to paste into another
      - User forms a matte
          - weights between 0 and 1 to mix pixels with the background
          - to cope with, say, hair
  - Interactions
      - mark some foreground, background pixels with strokes
      - put a box around the foreground
  - Technical problem
      - allocate pixels to the foreground/background class
      - consistent with the interaction information
      - segments are internally coherent and different from one another

Superpixels
  - Pixels are too small and too detailed a representation
      - for recognition
      - for some kinds of reconstruction
  - Replace with superpixels
      - small groups of pixels that are
          - clumpy
          - like one another
          - a reasonable representation of the underlying pixels

Segmentation as clustering
  - Cluster together items (pixels, tokens, etc.) that belong together
  - Agglomerative clustering
      - attach each item to the cluster it is closest to
      - repeat
  - Divisive clustering
      - split a cluster along its best boundary
      - repeat

The watershed algorithm
  - An agglomerative clusterer with a special metric

Clustering pixels
  - Natural to use k-means
      - represent pixels with
          - intensity vector; color vector; vector of nearby filter responses
          - perhaps position

The Mean Shift Algorithm
  - Originally intended to find modes in scattered data
  - Strategy
      - start at a promising estimate of the mode
      - iterate until the estimate doesn’t change
          - fit a model of probability density to some points near the estimate
          - find the peak of this model
  - Model
      - smoothing kernel
      - the update takes a special form
          - shift the mode to a weighted mean of the nearby points
          - hence the name

Clustering with Mean Shift
  - Model data points as samples from a probability model
      - clusters are associated with modes
      - but it might be hard to find one mode per cluster
      - if there’s more than one mode per cluster, they should be close together
  - Apply mean shift to find modes
      - modes should form small, widely separated clusters
  - Now cluster the modes with (say) an agglomerative clusterer
      - easy, because there are small, widely separated clusters
  - A point belongs to the cluster that its closest mode belongs to

Mean Shift Segmentation
  - Cluster pixels using mean shift
      - each cluster is a segment
  - Represent pixels with color, position
      - important: color distances are not the same as position distances
      - choose one scale for each

Evaluating Segmenters
  - Collect “correct” segmentations from human labellers
      - these may not be perfect, but ...
  - Now apply your segmenter
  - Count
      - % of human boundary pixels close to your boundary pixels (recall)
      - % of your boundary pixels close to human boundary pixels (precision)

Segmentation codes
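
Sketch: shot boundary detection with histogram differences

The shot boundary strategy (compute interframe distances, declare a boundary where they are big) can be illustrated with one of the listed distances, histogram differences. This is a minimal sketch, not a reference implementation. It assumes frames are small grayscale images stored as nested lists of integers in 0..255. The function names, the bin count, and the threshold of 0.5 are all hypothetical choices made for the example.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalised intensity histogram of a frame (rows of ints in [0, max_value))."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[value * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (between 0 and 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def shot_boundaries(frames, threshold=0.5):
    """Indices i where a boundary is declared between frame i-1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]


# Toy sequence: three dark frames followed by three bright frames.
dark = [[10, 12], [11, 13]]
bright = [[200, 210], [205, 220]]
frames = [dark, dark, dark, bright, bright, bright]
boundaries = shot_boundaries(frames)
print(boundaries)  # [3]
```

In practice the threshold would have to be tuned per video, and the other distances on the slide (frame, block, and edge differences) slot into the same loop in place of histogram_distance.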
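
Sketch: background subtraction with a moving average

The background subtraction approach (estimate the background with a moving average, subtract, keep large absolute values) might look like the following. The slides say only "moving average"; an exponential moving average with weight alpha is assumed here as one common form. The function names, alpha = 0.1, and the threshold of 30 are hypothetical, and the morphological clean-up step is left out to keep the sketch short.

```python
def update_background(background, frame, alpha=0.1):
    """Exponential moving average: (1 - alpha) * background + alpha * frame."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """True where the absolute difference from the background exceeds threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy data: a flat background with slight noise, then a bright object at (1, 1).
background = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
empty = [[101, 99, 100], [100, 102, 98]]
with_object = [[101, 99, 100], [100, 200, 98]]

for frame in (empty, empty):
    background = update_background(background, frame)

mask = foreground_mask(background, with_object)
print(mask)  # [[False, False, False], [False, True, False]]
```

A morphological opening (erosion followed by dilation) applied to the mask would be the natural place for the clean-up trick mentioned on the slide.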
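
Sketch: clustering pixels with k-means

The "Clustering pixels" slide suggests k-means on per-pixel feature vectors. Below is a minimal sketch assuming each pixel is represented by an (R, G, B) color vector. The function name kmeans is hypothetical, and the initial centers are passed in explicitly (the slides do not say how to initialise).

```python
def kmeans(vectors, centers, iterations=20):
    """Plain k-means: alternate nearest-center assignment and center update."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to the center with the smallest
        # squared Euclidean distance.
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centers[k])),
            )
            for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # leave an empty cluster's center where it is
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy data: three reddish pixels and two bluish pixels.
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20), (10, 10, 250), (20, 15, 240)]
final_centers, pixel_labels = kmeans(pixels, [pixels[0], pixels[-1]])
print(pixel_labels)  # [0, 0, 0, 1, 1]
```

Appending position (x, y) to each vector, as the slide suggests, would only change how the input vectors are built; the relative scaling of color and position would then need choosing.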
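
Sketch: mean shift mode finding and clustering

The mean shift strategy (start at an estimate, repeatedly shift it to a weighted mean of the nearby points, stop when it no longer changes) can be sketched as follows. A Gaussian smoothing kernel is assumed; the slides do not name a specific kernel. The slides propose clustering the modes with an agglomerative clusterer; here a simpler stand-in is used, merging any mode that lands within half a bandwidth of one already found. All function names and parameter values are hypothetical.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Climb from start to a nearby mode of a Gaussian kernel density estimate."""
    estimate = list(start)
    for _ in range(max_iter):
        # Gaussian weight of every data point relative to the current estimate.
        weights = [
            math.exp(-math.dist(p, estimate) ** 2 / (2 * bandwidth ** 2))
            for p in points
        ]
        total = sum(weights)
        # The mean shift update: move to the weighted mean of the points.
        shifted = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(estimate))
        ]
        if math.dist(shifted, estimate) < tol:
            return shifted
        estimate = shifted
    return estimate


def mean_shift_clusters(points, bandwidth=1.0):
    """Run mean shift from every point; points sharing a mode share a label."""
    modes, labels = [], []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < bandwidth / 2:  # same mode, reuse label
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Toy data: two small, widely separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
found_modes, point_labels = mean_shift_clusters(points)
print(point_labels)  # [0, 0, 0, 1, 1, 1]
```

For segmentation, each point would be a pixel's color and position, with a separate scale (bandwidth) for each, as the "Mean Shift Segmentation" slide notes; this sketch uses a single bandwidth for simplicity.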
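
Sketch: boundary precision and recall

The evaluation recipe on the "Evaluating Segmenters" slide can be sketched directly from its two counts. This assumes boundaries are given as sets of (row, col) pixel coordinates and that "close" means within a fixed Chebyshev distance; the tolerance of 1 pixel and the function names are hypothetical. It is a simplification: each pixel is checked independently, with no one-to-one matching between the two boundary sets.

```python
def close_fraction(pixels, reference, tolerance=1):
    """Fraction of pixels within tolerance (Chebyshev) of some reference pixel."""
    if not pixels:
        return 0.0
    hits = sum(
        1
        for (r, c) in pixels
        if any(max(abs(r - rr), abs(c - cc)) <= tolerance for (rr, cc) in reference)
    )
    return hits / len(pixels)


def boundary_precision_recall(predicted, human, tolerance=1):
    """Precision: predicted pixels near human ones. Recall: the reverse."""
    precision = close_fraction(predicted, human, tolerance)
    recall = close_fraction(human, predicted, tolerance)
    return precision, recall


# Toy data: a vertical human boundary and a slightly offset, incomplete prediction
# that also contains one stray pixel.
human = {(0, 0), (1, 0), (2, 0), (3, 0)}
predicted = {(0, 1), (1, 1), (5, 5)}
precision, recall = boundary_precision_recall(predicted, human)
print(round(precision, 3), recall)  # 0.667 0.75
```

Two of the three predicted pixels lie near the human boundary (precision 2/3), and three of the four human pixels lie near a predicted pixel (recall 3/4).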