INF 386, V-2003: Selected Themes from Digital Image Analysis
Lecture 7, 09.04.2003: Color and Texture based Image Retrieval
Fritz Albregtsen, Department of Informatics, University of Oslo

Why "CBIR"?
• Large databases of digital images are accessible.
— high volumes produced by scanners and digital cameras
— larger storage capacities at lower cost
— easy access to enormous image volumes via the internet
• Manual indexing by keywords is
— very time consuming
— unrewarding
— unlikely to specify all aspects of an image

Some systems - so far
• QBIC (Niblack et al., IBM): Querying images by content, using color, texture and shape. SPIE Conf. on Storage and Retrieval for Image and Video Databases, 1993.
• Chabot: Retrieval from a relational database of images. Computer, 1995.
• VIRAGE (Bach et al.): Virage image search engine: an open framework for image management. SPIE Conf. on Storage and Retrieval for Image and Video Databases, 1996.
• Photobook (Pentland et al.): Photobook: content-based manipulation of image databases. Int. Journ. of Computer Vision, 1996.
• Excalibur (Feder): Image recognition and content-based retrieval for the world wide web. Advanced Imaging, 1996.
• VisualSEEK (Smith): Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression. PhD Thesis, Columbia University, 1997.

Query characteristics
• Queries are formulated by combinations of low-level image features such as color, texture and shape.
• Specified explicitly by feature values or by feature ranges.
• Implicit specification by example.
• Spatial organization of features, giving absolute or relative location.
• Relevance feedback: allow the user to refine the search by indicating the relevance of the returned images.

Problems
• What features will generally describe the content of an image well?
• How to summarize the distribution of these features over an image?
• How to measure the dissimilarity between distributions of features?
• How to effectively display the results of a search?
• How to browse the images of a database in an intuitive and efficient way?

Selecting features
• The focus is often on color.
• The distribution of colors within the image is often a useful clue to the content of the image.
• Absolute or relative locations of different color distributions improve the result.
• One has to select some color representation
— color space (e.g. RGB, IHS, Lab, ...)
— representation of the distribution
• While color is a single-pixel property, texture describes the appearance of bigger regions.
— Statistical methods
— Structural methods
— MRF methods
— Filter-based methods
• For both color and texture, one has to select features that relate to perceptual similarity.

Describing distributions
• Describing CBIR features has to do with
— Perceptual significance
— Invariance (several aspects)
— Efficiency
• Distributions should be described in a way that reflects a human's appreciation of similarities and differences.
• At the same time, the distributions should be represented by a small data set, for efficiency, but rich enough to reproduce the essential information.
• Histograms may use regular or adaptive binning, with bin prototypes determined by e.g. vector quantization.
• Signatures represent a set of feature clusters, each represented by its mean (or mode) and the fraction of pixels that belong to the cluster.
• Clusters in signatures are defined individually for each image.
• A histogram can be seen as a signature where each bin is a cluster.
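As a concrete illustration of the signature idea above, the cluster representatives and weights can be obtained by vector-quantizing the pixel colors of a single image, e.g. with k-means. The following is a minimal sketch, not part of the lecture; the use of scikit-learn's KMeans and the choice of 8 clusters are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def color_signature(image, n_clusters=8):
        """Signature of one image: a list of (cluster mean color, pixel fraction) pairs."""
        pixels = image.reshape(-1, 3).astype(float)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
        # Fraction of pixels assigned to each cluster = weight of that cluster.
        weights = np.bincount(km.labels_, minlength=n_clusters) / len(pixels)
        return list(zip(km.cluster_centers_, weights))

    # Example on a random 64x64 "image" with 3 color channels:
    rng = np.random.default_rng(0)
    signature = color_signature(rng.random((64, 64, 3)))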
Distances and metrics
• A space is called a metric space if for any two of its elements x and y there is a number ρ(x, y), called the distance, that satisfies the following properties:
— ρ(x, y) ≥ 0 (non-negativity)
— ρ(x, y) = 0 if and only if x = y (identity)
— ρ(x, y) = ρ(y, x) (symmetry)
— ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality)
• Distances between two points x and µ in n-dimensional space:
1) Euclidean: D_E(x, µ) = ||x − µ|| = [Σ_{k=1..n} (x_k − µ_k)²]^{1/2}
2) "City block" / "Taxi" / "Absolute value": D_4(x, µ) = Σ_{k=1..n} |x_k − µ_k|
3) "Chessboard" / "Maximum value": D_8(x, µ) = max_k |x_k − µ_k|

Bin-by-bin dissimilarity
The distance between two distributions H = {h_i} and K = {k_i}. Useful when comparing e.g. histograms in image search and retrieval.
• Minkowski distance: d_Lp(H, K) = (Σ_i |h_i − k_i|^p)^{1/p}
L1 is often used to compute dissimilarity between color images; L2 and L∞ are often used for texture dissimilarity. L1-based retrieval may give many false negatives, as neighboring bins are not considered.
• Histogram intersection: d_∩(H, K) = 1 − Σ_i min(h_i, k_i) / Σ_i k_i
Attractive because it handles partial matches, when the area of one histogram is smaller than that of the other. When the areas are equal, it is equivalent to the normalized L1 distance.

Bin-by-bin dissimilarity - II
• Kullback-Leibler divergence: d_KL(H, K) = Σ_i h_i log(h_i / k_i)
Measures how inefficient it would be to code one histogram using the other as the code-book. Non-symmetric, and sensitive to binning.
• Jeffrey divergence: d_J(H, K) = Σ_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)], where m_i = (h_i + k_i)/2
A modification of K-L; symmetric and more robust to noise and binning.
• χ² statistics: d_χ²(H, K) = Σ_i (h_i − m_i)² / m_i
Measures how unlikely it is that one distribution was drawn from the population represented by the other.

Drawbacks of bin-by-bin
• Compares the contents of corresponding histogram bins h_i and k_i for all i, but not h_i and k_j for i ≠ j.
• K-L is justified by information theory, and χ² by statistics, but they do not necessarily match perceptual similarity well.
• Bin-by-bin measures are sensitive to bin size: coarse binning may not give sufficient discrimination, while too fine binning may place similar features in different bins.
• This can be fixed by using correspondences between bins: we need a cross-bin distance.
• Cross-bin dissimilarity measures always yield better results when bins get smaller.

Cross-bin measures
• Cross-bin distances use the ground distance d_ij, defined as the distance between the representative features for bin i and bin j.
• Quadratic-form distance: d_A(H, K) = [(h − k)^T A (h − k)]^{1/2}, where h and k are vectors listing all the entries in H and K. This is used for color in QBIC.
• Cross-bin information comes in via the similarity matrix A = [a_ij], where a_ij = 1 − d_ij / d_max. With this choice, it can be shown that d_A is a metric.
• The quadratic-form distance may give false positives, as it will overestimate the similarity of (color) distributions without a pronounced mode.

Cross-bin measures - II
• 1-D match distance: d_M(H, K) = Σ_i |ĥ_i − k̂_i|, where ĥ_i = Σ_{j≤i} h_j is the cumulative histogram of {h_i}, and similarly for {k_i}.
• The match distance is the L1 distance between the cumulative histograms.
• For histograms with equal areas, this is a special case of the EMD (see later).
• The 1-D match distance does not extend to higher dimensions, because the relation j ≤ i is not a total ordering in more than one dimension.
• The match distance may be extended to multi-dimensional histograms by graph matching.
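A minimal numpy sketch of the 1-D match distance above, computed directly as the L1 distance between the cumulative histograms (the function name and the toy histograms are illustrative only):

    import numpy as np

    def match_distance(h, k):
        """1-D match distance: L1 distance between the cumulative histograms."""
        h, k = np.asarray(h, dtype=float), np.asarray(k, dtype=float)
        return float(np.abs(np.cumsum(h) - np.cumsum(k)).sum())

    # Two 8-bin histograms with equal total mass, shifted by two bins:
    h = np.array([0, 4, 3, 1, 0, 0, 0, 0])
    k = np.array([0, 0, 0, 4, 3, 1, 0, 0])
    print(match_distance(h, k))   # 16.0 = total mass 8 moved 2 bins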
Cross-bin measures - III
• Kolmogorov-Smirnov statistics: d_KS(H, K) = max_i |ĥ_i − k̂_i|, where ĥ_i and k̂_i are the cumulative histograms.
• The K-S statistic is defined on the cumulative distributions, so no binning is actually required.
• Under the null hypothesis (data drawn from the same distribution), the distribution of the statistic can be calculated, giving the significance of the result.
• Like the match distance, it is defined only for one dimension.

Earth Mover's Distance (EMD)
• One of several measures of the minimum cost of matching elements between two histograms.
• Given two distributions, one seen as piles of earth in feature space and the other as a collection of holes in the same space, we need to solve the transportation problem: find the least amount of work needed to fill the holes with earth.
• This is the Monge-Kantorovich mass transfer problem (1781). The distance was first used in computer vision by Werman, Peleg and Rosenfeld (1985).
• EMD applies to histograms and signatures in any number of dimensions.
• It allows for partial matches.
• Generally: solve a linear optimization problem.
• If the ground distance is a metric and the total weights of the signatures are equal, the EMD is a true metric.

Special case of EMD
• The minimum cost distance between two one-dimensional distributions f(t) and g(t) is the L1 distance between their cumulative distribution functions:
∫_{−∞}^{∞} | ∫_{−∞}^{x} f(t) dt − ∫_{−∞}^{x} g(t) dt | dx
• If the feature space is one-dimensional, the ground distance is d(p_i, q_j) = |p_i − q_j|, and the total weights of the two signatures are equal, then
ψ(P, Q) = Σ_{k=1..m+n−1} |p̂_k − q̂_k| (r_{k+1} − r_k)
where r_1, r_2, ..., r_{m+n} is the sorted list of p_1, ..., p_m, q_1, ..., q_n, and
p̂_k = Σ_{i=1..m} [p_i ≤ r_k] w_pi,  q̂_k = Σ_{j=1..n} [q_j ≤ r_k] w_qj,
where [·] is 1 when its argument is true, and 0 otherwise.
• Here P = {(p_1, w_p1), ..., (p_m, w_pm)} is the first signature with m clusters, where p_i is the cluster representative and w_pi is the weight of the cluster; Q = {(q_1, w_q1), ..., (q_n, w_qn)} is the second signature with n clusters; and D = [d_ij] is the ground distance matrix, where d_ij is the ground distance between clusters p_i and q_j.
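For this one-dimensional, equal-weight case, ψ(P, Q) can be evaluated directly, without solving a linear program. The sketch below is a straightforward transcription of the formula above; representing a signature as a list of (position, weight) pairs is my own convention for the example.

    import numpy as np

    def emd_1d(P, Q):
        """psi(P, Q) for two 1-D signatures with equal total weight.

        P, Q: lists of (position, weight) pairs, e.g. [(0.1, 3.0), (0.7, 1.0)].
        """
        p_pos = np.array([p for p, _ in P])
        p_w = np.array([w for _, w in P])
        q_pos = np.array([q for q, _ in Q])
        q_w = np.array([w for _, w in Q])
        r = np.sort(np.concatenate([p_pos, q_pos]))      # r_1 <= ... <= r_(m+n)
        # p_hat_k (q_hat_k): total P-weight (Q-weight) at positions <= r_k.
        p_hat = np.array([p_w[p_pos <= rk].sum() for rk in r])
        q_hat = np.array([q_w[q_pos <= rk].sum() for rk in r])
        return float(np.sum(np.abs(p_hat - q_hat)[:-1] * np.diff(r)))

    # All mass shifted by 0.5; with total weight 2 the formula gives 2 * 0.5 = 1.0:
    print(emd_1d([(0.0, 1.0), (1.0, 1.0)], [(0.5, 1.0), (1.5, 1.0)]))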
Color Spaces
• RGB: Based on adding the three primaries. Commonly used in CRT displays.
• CMY(K): Subtractive system, commonly used in printing. Black is redundant, and used only for technical reasons.
• HSL and HSB: Intuitive spaces that allow users to specify colors easily. Separates luminance, which is good for coding and transmission.
• YIQ, YUV, YCbCr: Used for different standards of TV transmission (NTSC, PAL, digital TV).
• Opponent Colors: Used for modelling color perception. Based on the fact that some pairs of hues cannot coexist in a single color sensation.

Perceptually Uniform Spaces
• For image retrieval, it is important to measure differences between colors in a way that matches perceptual similarity.
• This is simplified by perceptually uniform color spaces.
• In 1976, the CIE standardized two such color spaces, Luv and Lab.
• L defines luminance; a, b and u, v define the chrominance.
• Both spaces are defined with respect to the CIE XYZ color space, using a reference white [X_n Y_n Z_n].
• XYZ is related to RGB by the linear conversions
[X Y Z]^T = M [R G B]^T and [R G B]^T = M^{-1} [X Y Z]^T, where
M = | 0.412453  0.357580  0.180423 |
    | 0.212671  0.715160  0.072169 |
    | 0.019334  0.119193  0.950227 |
M^{-1} = |  3.240479  −1.537150  −0.498535 |
         | −0.969256   1.875992   0.041556 |
         |  0.055648  −0.204043   1.057311 |

Luminance and Chrominance
• Luminance is the same for both spaces:
L* = 116 (Y/Y_n)^{1/3} − 16   if Y/Y_n > 0.008856
L* = 903.3 (Y/Y_n)            otherwise
• The chrominances are different:
— CIE Luv:
u* = 13 L* (u' − u'_n),  v* = 13 L* (v' − v'_n)
where u' = 4X / (X + 15Y + 3Z) and v' = 9Y / (X + 15Y + 3Z)
— CIE Lab:
a* = 500 (f(X/X_n) − f(Y/Y_n)),  b* = 200 (f(Y/Y_n) − f(Z/Z_n))
where f(t) = t^{1/3} if t > 0.008856, and f(t) = 7.787 t + 16/116 otherwise

Texture-Based Image Similarity
• Any set of texture features may be used.
• A texture may be represented by a vector of values, each corresponding to the energy in a specific scale and orientation subband.
• Spectral decomposition methods include
— Quadrature filters (Knutsson and Granlund 1983)
— Gabor filters (Farrokhnia and Jain 1991)
— Oriented pyramids of derivatives of a Gaussian (Perona 1991)
— Various wavelets (Daubechies 1991)
• The size of the texel may be larger than the support of the filter.
• Texture regions may be inhomogeneous because of foreshortening and variations in illumination.
• Natural textures are regular only in a statistical sense.
• Images often contain several textures.

Texture Image Preprocessing
• The size of the support of the filters may be a problem:
— They may be large enough to straddle boundaries between different textures. This gives a mixed description.
— They may be too small to see enough of the texture to give a reliable description. This gives descriptions of components of the texture.
• Textures exhibit variations.
• It is preferable to sift and summarize before using the texture features to compute texture signatures:
— eliminate vectors that describe mixtures
— average away variations between adjacent descriptors of similar patches.
• This should be done in texture vector space, not in the intensity image.
• This requires a generalization of the gradient to a vector function, using texture contrast to define significant regions as regions where the contrast is low after a few iterations of smoothing.
• Only texture features that lie in significant regions are used to compute the texture signature.

Comparing Dissimilarity
• A meaningful quality measure must be defined.
• Image retrieval is measured by precision, the number of relevant images retrieved relative to the total number of retrieved images, and recall, the number of relevant images retrieved relative to the total number of relevant images in the database.
• The relative importance of good recall vs. good precision differs according to the task at hand.
• Performance comparisons should account for the variety of parameters that can affect the behaviour of each measure used.
• There is a difference between a feature-by-feature approach and a systems approach.
• Processing steps that affect performance independently should be evaluated separately, to lower complexity and heighten insight.
• Ground truth should be available.
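Given such ground truth, precision and recall as defined above follow directly from the answer set of a query. A minimal sketch; the function name and the toy numbers are illustrative only.

    def precision_recall(retrieved, relevant):
        """Precision and recall for a single query.

        retrieved: ids of the images returned by the system (e.g. the top-k ranked).
        relevant:  ids of the images that are actually relevant (ground truth).
        """
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # 10 images returned, 6 of them relevant, 20 relevant images in the database:
    p, r = precision_recall(range(10), list(range(6)) + list(range(100, 114)))
    print(p, r)   # 0.6 0.3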
Some general results
• Bin-by-bin dissimilarity measures improve with an increasing number of bins up to a point; then performance degrades.
• Cross-bin dissimilarity measures perform better.
• Signatures carry less information than histograms, but perform better.
• Jeffrey divergence and χ² statistics give almost identical results.
• In color space, the L2 distance, by construction, matches the perceptual similarity between colors.
• In histogram space, L1 is better than L2, which is better than L∞.

Multidimensional Scaling
• Given a set of n images and the dissimilarities δ_ij between them, the MDS technique computes a configuration of points {p_i} in a low-dimensional Euclidean space R^d, so that the Euclidean distances d_ij = ||p_i − p_j|| between the points in R^d match the original dissimilarities δ_ij between the corresponding images as well as possible.
• Kruskal (1964) formulated this as the minimization of the stress
Ψ = [ Σ_{i,j} (f(δ_ij) − d_ij)² / Σ_{i,j} d_ij² ]^{1/2}
• There are two types of MDS:
— In metric MDS, f(δ_ij) is a monotonic, metric-preserving function.
— In non-metric MDS, f(δ_ij) is a weakly monotonic transformation that only preserves the rank-ordering of the δ_ij's.
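A minimal sketch of metric MDS on a precomputed dissimilarity matrix, using scikit-learn's MDS for the embedding and then evaluating Kruskal's stress with f taken as the identity; the toy data and parameter choices are assumptions made for illustration.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.manifold import MDS

    # Toy dissimilarities delta_ij between n = 5 "images" (symmetric, zero diagonal).
    rng = np.random.default_rng(0)
    features = rng.random((5, 8))              # stand-in image feature vectors
    delta = squareform(pdist(features))

    # Metric MDS: embed into R^2 so that Euclidean distances approximate delta.
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    points = mds.fit_transform(delta)

    # Kruskal's stress with f = identity:
    d = squareform(pdist(points))              # d_ij = ||p_i - p_j|| in R^2
    stress = np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2))
    print(stress)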