Similarity and Difference
Pete Barnum
January 25, 2006
Advanced Perception

Visual Similarity
- Color
- Texture

Uses for Visual Similarity Measures
- Classification: is it a horse?
- Image retrieval: show me pictures of horses.
- Unsupervised segmentation: which parts of the image are grass?

Histogram Example
(Slides from Dave Kauchak)

Cumulative Histogram
- Normal histogram vs. cumulative histogram
(Slides from Dave Kauchak)

Joint vs. Marginal Histograms
(Images from Dave Kauchak)

Adaptive Binning
- Clusters (signatures)

Higher-Dimensional Histograms
- Histograms generalize to any number of features: colors, textures, gradients, depth.

Distance Metrics
[Figure: a Euclidean distance of 5 units in x-y, and a gray-value distance of 50 levels — pixel distances need not match perceived similarity. Bin-by-bin and cross-bin comparisons are each shown with a "bad" and a "good" example.]

Distance Measures
- Heuristic: Minkowski-form; Weighted-Mean-Variance (WMV)
- Nonparametric test statistics: χ² (chi-square); Kolmogorov-Smirnov (KS); Cramér/von Mises (CvM)
- Information-theoretic divergences: Kullback-Leibler (KL); Jeffrey divergence (JD)
- Ground-distance measures: histogram intersection; quadratic form (QF); Earth Mover's Distance (EMD)

Heuristic Histogram Distances
- Minkowski-form distance:
  $D(I,J) = \left( \sum_i |f(i;I) - f(i;J)|^p \right)^{1/p}$
- Special cases:
  - $L_1$: absolute, cityblock, or Manhattan distance
  - $L_2$: Euclidean distance
  - $L_\infty$: maximum-value distance
(Slides from Dave Kauchak)

More Heuristic Distances
- Weighted-Mean-Variance (WMV): only includes minimal information about the distribution (per-feature mean and standard deviation):
  $D(I,J) = \frac{|\mu_r(I) - \mu_r(J)|}{|\sigma(\mu_r)|} + \frac{|\sigma_r(I) - \sigma_r(J)|}{|\sigma(\sigma_r)|}$
(Slides from Dave Kauchak)

Nonparametric Test Statistics
- χ² (chi-square): measures the underlying similarity of two samples:
  $D(I,J) = \sum_i \frac{\left(f(i;I) - \hat{f}(i)\right)^2}{\hat{f}(i)}, \qquad \hat{f}(i) = \frac{f(i;I) + f(i;J)}{2}$
- Kolmogorov-Smirnov (KS) distance: measures the underlying similarity of two samples; only for 1-D data.
- Cramér/von Mises (CvM): Euclidean distance between cumulative histograms; only for 1-D data.

Information Theory
- Kullback-Leibler (KL) divergence: the cost of encoding one distribution as another.
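The bin-by-bin measures above are short enough to state directly in code. The following is an illustrative sketch, not code from the slides; the function names and the assumption that histograms are equal-length lists of bin counts are mine.

```python
import math

def minkowski(h1, h2, p=2.0):
    """Minkowski-form distance L_p: sum |f(i;I) - f(i;J)|^p over bins, then take the 1/p root."""
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1.0 / p)

def chi_square(h1, h2):
    """Chi-square statistic against the per-bin mean f_hat(i) = (f(i;I) + f(i;J)) / 2."""
    d = 0.0
    for a, b in zip(h1, h2):
        f_hat = (a + b) / 2.0
        if f_hat > 0:                      # skip bins empty in both histograms (0/0)
            d += (a - f_hat) ** 2 / f_hat
    return d

def kl_divergence(p_hist, q_hist):
    """Kullback-Leibler divergence: extra cost of encoding p with a code built for q.
    Assumes both histograms are normalized and q is nonzero wherever p is nonzero."""
    return sum(a * math.log(a / b) for a, b in zip(p_hist, q_hist) if a > 0)
```

With `p=1` and `p=2`, `minkowski` reduces to the cityblock and Euclidean distances; identical histograms give distance 0 under all three measures.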
Information Theory (cont.)
- Jeffrey divergence (JD): just like KL, but symmetric and more numerically stable.

Ground Distance
- Histogram intersection: good for partial matches.
- Quadratic form (QF), a heuristic:
  $D(I,J) = \sqrt{(f_I - f_J)^T A (f_I - f_J)}$
- Earth Mover's Distance (EMD):
  $D(I,J) = \frac{\sum_{i,j} g_{ij} d_{ij}}{\sum_{i,j} g_{ij}}$
  where $g_{ij}$ is the flow (amount moved) from bin $i$ of $I$ to bin $j$ of $J$, and $d_{ij}$ is the ground distance between the two bins.

Summary
[Table comparing the distance measures; images omitted.]

Moving Earth
- Viewed as piles of earth, two distributions differ by how much earth must be moved, and how far, to turn one into the other:
  difference = (amount moved) × (distance moved), summed over all movements.

Linear Programming
- P has m clusters and Q has n clusters; over all movements, minimize Σ (distance moved) × (amount moved).

Constraints
1. Move "earth" only from P to Q: $g_{ij} \ge 0$
2. Cannot send more "earth" than there is: $\sum_j g_{ij} \le p_i$
3. Q cannot receive more "earth" than it can hold: $\sum_i g_{ij} \le q_j$
4. As much "earth" as possible must be moved: $\sum_{i,j} g_{ij} = \min\left(\sum_i p_i, \sum_j q_j\right)$

Advantages
- Uses signatures
- Nearness measure without quantization
- Partial matching
- A true metric (when the ground distance is a metric and the distributions have equal mass)

Disadvantages
- High computational cost
- Not effective for unsupervised segmentation, etc.

Examples
- Color (CIE Lab); color + XY
- Texture (Gabor filter bank)

Image Lookup
- $L_1$ distance
- Jeffrey divergence
- χ² statistic
- Quadratic-form distance
- Earth Mover's Distance

Concluding Thought
- Which distance measure is best? It depends on the application.
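In general the EMD requires solving the transportation linear program above, but for 1-D histograms of equal total mass with unit spacing between adjacent bins, the optimal flow has a closed form: push the running surplus to the next bin, which equals the sum of absolute differences of the cumulative histograms. A minimal sketch under those assumptions (function name mine):

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms of equal total mass,
    with ground distance d_ij = number of bins between positions i and j.
    On a line the optimal plan simply carries the running surplus rightward,
    so no linear program is needed.  For unit-mass histograms this equals the
    normalized EMD, since the total flow sum(g_ij) is then 1."""
    total = 0.0    # earth moved, weighted by distance
    surplus = 0.0  # running difference of the cumulative histograms
    for a, b in zip(p, q):
        surplus += a - b
        total += abs(surplus)
    return total
```

For example, moving one unit of earth across two bins costs 2, while swapping two adjacent unit piles costs 1. For general n-D signatures, or unequal masses, the full linear program (e.g. a transportation-simplex solver) is still required — which is the high computational cost noted above.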