INF 5300, V-2004: Selected Themes from Digital Image Analysis
Lecture 10, Friday 28.05.2004
Repetition of Central Themes
Fritz Albregtsen, Department of Informatics, University of Oslo

Bi-level thresholding

• The histogram is assumed to be twin-peaked. Let $P_1$ and $P_2$ be the a priori probabilities of background and foreground ($P_1 + P_2 = 1$), with the two class distributions given by $b(z)$ and $f(z)$. The complete histogram is then given by $p(z) = P_1 \cdot b(z) + P_2 \cdot f(z)$.
• The probabilities of mis-classifying a pixel, given a threshold $t$, are
  $$E_1(t) = \int_{-\infty}^{t} f(z)\,dz, \qquad E_2(t) = \int_{t}^{\infty} b(z)\,dz$$
• The total error is
  $$E(t) = P_1 \int_{t}^{\infty} b(z)\,dz + P_2 \int_{-\infty}^{t} f(z)\,dz$$
• Differentiating with respect to the threshold $t$:
  $$\frac{\partial E}{\partial t} = 0 \;\Rightarrow\; P_1 \cdot b(T) = P_2 \cdot f(T)$$

Bi-level thresholding (continued)

• For Gaussian distributions:
  $$\frac{P_1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(T-\mu_1)^2}{2\sigma_1^2}} = \frac{P_2}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(T-\mu_2)^2}{2\sigma_2^2}}$$
• Two thresholds may be necessary!
• If the variances are equal:
  $$T = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2} \ln\frac{P_2}{P_1}$$
• If the a priori probabilities are equal: $T = (\mu_1 + \mu_2)/2$.

The method of Ridler and Calvard

• The initial threshold value $t_0$ is set equal to the average brightness.
• The threshold value for the $(k+1)$-th iteration is given by
  $$t_{k+1} = \frac{\mu_1(t_k) + \mu_2(t_k)}{2} = \frac{1}{2}\left[\frac{\sum_{z=0}^{t_k} z\,p(z)}{\sum_{z=0}^{t_k} p(z)} + \frac{\sum_{z=t_k+1}^{G-1} z\,p(z)}{\sum_{z=t_k+1}^{G-1} p(z)}\right]$$
• Note that $\mu_1(t)$ and $\mu_2(t)$ are the a posteriori mean values, estimated from overlapping and truncated distributions. The a priori $\mu_1$ and $\mu_2$ are unknown to us.
• The correctness of the estimated threshold depends on the extent of the overlap, as well as on the correctness of the $P_1 \approx P_2$ assumption.

The method of Otsu

• Maximizes the a posteriori between-class variance $\sigma_B^2(t)$, given by
  $$\sigma_B^2(t) = P_1(t)\left[\mu_1(t) - \mu_0\right]^2 + P_2(t)\left[\mu_2(t) - \mu_0\right]^2$$
• The expression for $\sigma_B^2(t)$ reduces to
  $$\sigma_B^2(t) = P_1(t)\mu_1^2(t) + P_2(t)\mu_2^2(t) - \mu_0^2 = \frac{\left[\mu_0 P_1(t) - \mu_1(t)\right]^2}{P_1(t)\left[1 - P_1(t)\right]}$$
• The optimal threshold $T$ is found by a sequential search for the maximum of $\sigma_B^2(t)$ over the values of $t$ where $0 < P_1(t) < 1$.

The method of Reddi

• The method of Reddi et al. is based on the same assumptions as the method of Otsu, maximizing the a posteriori between-class variance $\sigma_B^2(t)$.
• We may write $\sigma_B^2(t) = P_1(t)\mu_1^2(t) + P_2(t)\mu_2^2(t) - \mu_0^2$ as
  $$\sigma_B^2(t) = \frac{\left[\sum_{z=0}^{t} z\,p(z)\right]^2}{\sum_{z=0}^{t} p(z)} + \frac{\left[\sum_{z=t+1}^{G-1} z\,p(z)\right]^2}{\sum_{z=t+1}^{G-1} p(z)} - \mu_0^2$$
• Differentiating and setting $\partial\sigma_B^2(t)/\partial t = 0$, we find a solution for
  $$2T = \frac{\sum_{z=0}^{T} z\,p(z)}{\sum_{z=0}^{T} p(z)} + \frac{\sum_{z=T+1}^{G-1} z\,p(z)}{\sum_{z=T+1}^{G-1} p(z)}$$
• An exhaustive sequential search gives the same result as Otsu's method.
• Starting with a threshold $t_0 = \mu_0$, fast convergence is obtained, equivalent to the ad hoc technique of Ridler and Calvard.

A "minimum error" method

• Find the $T$ that minimizes the Kullback-Leibler distance between the observed histogram and the model distribution:
  $$J(t) = 1 + 2\left[P_1(t)\ln\sigma_1(t) + P_2(t)\ln\sigma_2(t)\right] - 2\left[P_1(t)\ln P_1(t) + P_2(t)\ln P_2(t)\right]$$
• As $t$ varies, the model parameters change. Compute $J(t)$ for all $t$ and find the minimum.
• The a posteriori model parameters will represent biased estimates. Correctness relies on a small overlap. Improved estimates of the parameters are possible.
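As a concrete illustration of the two search strategies above (not from the lecture), here is a minimal NumPy sketch that selects a threshold from a 256-bin gray-level histogram, both by the Ridler-Calvard iteration and by an exhaustive Otsu-style search for the maximum between-class variance. Function and variable names are our own.

```python
import numpy as np

def ridler_calvard(hist, max_iter=100):
    """Iterative threshold selection: t_{k+1} = (mu_1(t_k) + mu_2(t_k)) / 2."""
    z = np.arange(len(hist))
    t = int(np.round(np.sum(z * hist) / np.sum(hist)))   # t0 = average brightness
    for _ in range(max_iter):
        lo, hi = hist[:t + 1], hist[t + 1:]
        if lo.sum() == 0 or hi.sum() == 0:                # degenerate split, stop
            break
        mu1 = np.sum(z[:t + 1] * lo) / lo.sum()
        mu2 = np.sum(z[t + 1:] * hi) / hi.sum()
        t_new = int(np.round((mu1 + mu2) / 2))
        if t_new == t:                                    # converged
            break
        t = t_new
    return t

def otsu(hist):
    """Exhaustive search for the t maximizing the between-class variance."""
    p = hist / np.sum(hist)
    z = np.arange(len(p))
    mu0 = np.sum(z * p)                                   # global mean
    best_t, best_var = 0, -1.0
    for t in range(len(p) - 1):
        P1 = np.sum(p[:t + 1])
        if P1 <= 0.0 or P1 >= 1.0:                        # only 0 < P1(t) < 1
            continue
        mu1 = np.sum(z[:t + 1] * p[:t + 1]) / P1          # class means
        mu2 = np.sum(z[t + 1:] * p[t + 1:]) / (1.0 - P1)
        var_b = P1 * (mu1 - mu0) ** 2 + (1 - P1) * (mu2 - mu0) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t

# Example: bimodal histogram from two noisy gray-level populations.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(60, 10, 5000), rng.normal(170, 15, 5000)])
hist, _ = np.histogram(np.clip(pixels, 0, 255), bins=256, range=(0, 256))
print(ridler_calvard(hist), otsu(hist))
```

For a well-separated bimodal histogram the two estimates agree closely; the iteration is faster, while the exhaustive search needs no convergence argument.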
Uniform error thresholding

• The uniform error threshold is given by $E_1(t) = E_2(t)$ (see the error integrals defined under bi-level thresholding above).
• For a given threshold $t$, let $p(t)$ = the fraction of background pixels above $t$, and $q(t)$ = the fraction of object pixels with gray level above $t$.
• The uniform error threshold is then found when $p(t) = 1 - q(t)$, or equivalently when $\phi - 1 = 0$, where $\phi = p + q$.
• Find the solution by using
  $$\frac{b^2 - c}{a^2 - b} = \phi^2, \qquad a = \alpha p + (1-\alpha)q, \quad b = \alpha p^2 + (1-\alpha)q^2, \quad c = \alpha p^4 + (1-\alpha)q^4$$
  where $\alpha(t)$ is the background area.
• In a single pass through the image, a table may be formed, giving estimates of $a$, $b$, $c$ for all values of $t$.
• Select the gray level $t$ where $|\phi - 1|$ is a minimum.

Entropy-based methods

• For two distributions separated by a threshold $t$, the sum of the two class Shannon entropies is
  $$\psi(t) = -\sum_{z=0}^{t} \frac{p(z)}{P_1(t)} \ln\frac{p(z)}{P_1(t)} - \sum_{z=t+1}^{G-1} \frac{p(z)}{1 - P_1(t)} \ln\frac{p(z)}{1 - P_1(t)}$$
• Using
  $$H_t = -\sum_{z=0}^{t} p(z)\ln p(z), \qquad H_G = -\sum_{z=0}^{G-1} p(z)\ln p(z)$$
  the sum of the two entropies may be written as
  $$\psi(t) = \ln\left[P_1(t)\left(1 - P_1(t)\right)\right] + \frac{H_t}{P_1(t)} + \frac{H_G - H_t}{1 - P_1(t)}$$
• The discrete value $T$ of $t$ which maximizes $\psi(t)$ is the selected threshold.

Two-feature entropy

• For two distributions and a threshold pair $(s, t)$, where $s$ and $t$ denote gray level and average gray level, the entropies are
  $$H_1(s,t) = -\sum_{i=0}^{s}\sum_{j=0}^{t} \frac{p_{ij}}{P_{st}} \ln\frac{p_{ij}}{P_{st}}, \qquad H_2(s,t) = -\sum_{i=s+1}^{G-1}\sum_{j=t+1}^{G-1} \frac{p_{ij}}{1 - P_{st}} \ln\frac{p_{ij}}{1 - P_{st}}$$
  where $P_{st} = \sum_{i=0}^{s}\sum_{j=0}^{t} p_{ij}$.
• The sum of the two entropies is now
  $$\psi(s,t) = H_1(s,t) + H_2(s,t) = \ln\left[P_{st}(1 - P_{st})\right] + \frac{H_{st}}{P_{st}} + \frac{H_{GG} - H_{st}}{1 - P_{st}}$$
  where the total system entropy $H_{GG}$ and the partial entropy $H_{st}$ are given by
  $$H_{GG} = -\sum_{i=0}^{G-1}\sum_{j=0}^{G-1} p_{ij}\ln p_{ij}, \qquad H_{st} = -\sum_{i=0}^{s}\sum_{j=0}^{t} p_{ij}\ln p_{ij}$$
• The discrete pair $(S, T)$ which maximizes $\psi(s,t)$ gives the threshold values which maximize the loss of entropy, and thereby the gain in information, by introducing the two thresholds.

Exponential convex hull

• The "convex deficiency" is obtained by subtracting the histogram from its convex hull.
• This may work even if no "valley" exists.
• Upper concavity of histogram tail regions can often be eliminated by considering $\ln\{p(z)\}$ instead of the histogram $p(z)$.
• In the $\ln\{p(z)\}$ domain, upper concavities are produced by bimodality or shoulders, not by the tail of a normal or exponential distribution, nor by the extension of the histogram.
• Transform the histogram $p(z)$ by $\ln\{p(z)\}$, compute the convex hull, and transform the convex hull back to the histogram domain by $h_e(k) = \exp(h(k))$.
• The threshold is found by a sequential search for the maximum exponential convex hull deficiency.

Texture Analysis Methods

• Statistical methods are often based on accumulating second or higher order statistics (matrices), and on using feature vectors that describe these probability distributions directly, and therefore describe the image texture only indirectly.
• Structural methods are based upon the assumption that textures are composed of texels which are regular and repetitive. Both the texels and the placement rules have to be described.
• Structural-statistical methods characterize the texel by a feature vector and describe the probability distribution of these features statistically.

Gray Level Cooccurrence Matrices

• How is the matrix constructed?
• What size has it?
• What order is it?
• How can we make the statistics isotropic?
• What does it look like?
• What role does the pixel distance parameter play?
• What do the different static GLCM features measure?
• How many - and which of them - should we use?
• How can it be simplified?
• What is the relation to sum and difference histograms?
  (A small construction sketch follows after the next list.)

Gray Level Run Length

• How is the matrix constructed?
• What size has it?
• What order is it?
• How can we make the statistics isotropic?
• What does it look like?
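To make the GLCM questions above concrete, here is a minimal sketch (ours, not from the lecture) that builds a normalized, symmetric cooccurrence matrix for a given displacement and derives two of the classical features. The requantization to a few gray levels, the displacement parameters and the choice of features (contrast and entropy) are our own illustrative choices.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized gray level cooccurrence matrix for displacement (dx, dy)."""
    q = (img.astype(float) * levels / (img.max() + 1)).astype(int)  # requantize
    P = np.zeros((levels, levels))
    rows, cols = q.shape
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                P[q[y, x], q[y2, x2]] += 1
    if symmetric:                       # count each pair in both directions
        P = P + P.T
    return P / P.sum()

def glcm_features(P):
    """Two classical GLCM features: contrast (inertia) and entropy."""
    i, j = np.indices(P.shape)
    contrast = np.sum((i - j) ** 2 * P)
    entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
    return contrast, entropy

# Tiny example: a vertically striped 8x8 "texture".
img = np.tile(np.array([0, 0, 255, 255]), (8, 2))
P = glcm(img, dx=1, dy=0, levels=4)
print(np.round(P, 3))
print(glcm_features(P))
```

Averaging the matrices over several directions, or accumulating all directions into one matrix, is one simple way of making the statistics isotropic.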
Generalized Cooccurrence Matrices

• Davis et al. (1979) introduced generalized cooccurrence matrices (GCM).
• The GCM was based on local maxima of the gradient image of the texture.
• Cooccurrence of gradient magnitude and direction, using spatial constraint predicates instead of specific geometric distances.
• It could be a "cooccurrence of anything".

Cooccurrence of Gray Level Runs

• How can we combine the two methods?
• Can we produce adaptive - not static - features?

What is "shape"?

• A numerical description of the spatial configurations in the image.
• There is no generally accepted methodology of shape description.
• Location and description of high curvature points give essential information.
• Shape description of 2D planar objects is "easy".
• Shape is defined in an image, but its usefulness in a 3D world depends on how well the 3D -> 2D mapping is handled.
• Invariance is an important issue.

Assumptions

• We have a segmented, labeled image.
• Each object that is to be described has been identified.
• The image objects can be represented as
  — a binary image (whole regions)
  — a contour (region boundaries)
  — a run length code
  — a chain code
  — a quad tree
  — cartesian coordinates
  — polar coordinates
  — some other coordinates
  — coefficients of some transform
  — ...

Invariance of features

• Assume that we have an object, and that we want to extract some features to describe the object.
• We may wish that the features are:
  — Position invariant: independent of the position of the object within the image.
  — Scaling invariant: independent of the size of the object.
  — Rotation invariant: independent of the orientation of the object.
  — Warp invariant: independent of a deformation of the object.
• In most cases we want position invariant features.
• The others depend on the application.

Shape features

• Area from the number of pixels in the region.
• Area from the boundary contour (Green's theorem).
• Boundary from recursive splitting.
• Boundary from sequential polygonization.
• Perimeter from chain codes.

Statistical moments

The general form of a moment of order (p + q), evaluated over the complete image plane $\xi$, is:
$$m_{pq} = \int\!\!\int_{\xi} \psi_{pq}(x, y)\, f(x, y)\, dx\, dy \qquad (1)$$
where the weighting kernel or basis function is $\psi_{pq}$. This produces a weighted description of $f(x, y)$ over $\xi$. The choice of basis function depends on the application and on any desired invariant properties.

Non-orthogonal moments

The continuous two-dimensional (p + q)-th order Cartesian moment is defined as:
$$m_{pq} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x^p y^q f(x, y)\, dx\, dy \qquad (2)$$
It is assumed that $f(x, y)$ is a piecewise continuous, bounded function and that it can have non-zero values only in a finite region of the xy plane. Then, moments of all orders exist and the uniqueness theorem holds: the moment sequence $m_{pq}$ with basis $x^p y^q$ is uniquely defined by $f(x, y)$, and $f(x, y)$ is uniquely defined by $m_{pq}$. Thus, the original image can be described and reconstructed, if sufficiently high order moments are used.

The discrete version of the Cartesian moment, for an image consisting of pixels $P_{xy}$, is:
$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^p y^q P_{xy} \qquad (3)$$
Here $m_{pq}$ is a two-dimensional Cartesian moment, where M and N are the image dimensions and the monomial product $x^p y^q$ is the basis function.
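A minimal illustration of equation (3) (ours, not from the lecture), computing a few low-order Cartesian moments of a small binary image. The 1-based coordinate convention follows the equation; treating array axis 0 as x and axis 1 as y is our own choice.

```python
import numpy as np

def cartesian_moment(img, p, q):
    """Discrete Cartesian moment m_pq = sum_x sum_y x^p y^q P_xy, with x, y = 1..M, 1..N."""
    M, N = img.shape                       # axis 0 is x, axis 1 is y, both counted from 1
    x = np.arange(1, M + 1).reshape(-1, 1)
    y = np.arange(1, N + 1).reshape(1, -1)
    return np.sum((x ** p) * (y ** q) * img)

# A small binary object: a 3x4 block of ones inside an 8x8 image.
img = np.zeros((8, 8))
img[2:5, 3:7] = 1

m00 = cartesian_moment(img, 0, 0)          # area (number of object pixels)
xc = cartesian_moment(img, 1, 0) / m00     # centre of mass, x (see the next slide)
yc = cartesian_moment(img, 0, 1) / m00     # centre of mass, y
print(m00, xc, yc)
```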
Low order moments

• The zero order moment $m_{00}$ is defined as the total mass (or power) of the image. For a binary M × N image of an object, this gives the number of pixels in the object:
  $$m_{00} = \sum_{x=1}^{M} \sum_{y=1}^{N} P_{xy} \qquad (4)$$
• The two first order moments are used to find the centre of mass (COM) of an image. If this is applied to a binary image and the results are normalised by $m_{00}$, the result is the centre coordinates of the object:
  $$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (5)$$

Central moments

• The definition of a 2D discrete central moment is:
  $$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y), \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$$
• This corresponds to computing ordinary Cartesian moments after translating the object so that the centre of mass is at the origin.
• This means that central moments are invariant under translation.
• Central moments are not scaling or rotation invariant.

Moments of inertia

• The two second order central moments
  $$\mu_{20} = \sum_{x} \sum_{y} (x - \bar{x})^2 f(x, y), \qquad \mu_{02} = \sum_{x} \sum_{y} (y - \bar{y})^2 f(x, y)$$
  correspond to the "moments of inertia" relative to the coordinate directions, while the cross moment of inertia is given by
  $$\mu_{11} = \sum_{x} \sum_{y} (x - \bar{x})(y - \bar{y}) f(x, y)$$
• An elongated object having a random orientation will have moments of inertia that do not reflect the true shape of the object, as they are not orientation invariant.
• The three second order $\mu_{pq}$ can easily be made invariant to rotation.

Object orientation

• Orientation is defined as the angle (relative to the x-axis) of the axis through the centre of mass that gives the lowest moment of inertia.
• The orientation $\theta$, relative to the x-axis, is found by minimizing the sum
  $$I(\theta) = \sum_{\alpha} \sum_{\beta} \left(\beta - \bar{\beta}\right)^2 f(\alpha, \beta)$$
  where the rotated coordinates are given by $\alpha = x\cos\theta + y\sin\theta$, $\beta = y\cos\theta - x\sin\theta$.
• The orientation is then given by
  $$\theta = \frac{1}{2} \tan^{-1}\!\left(\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right)$$
  where $\theta \in [0, \pi/2]$ if $\mu_{11} > 0$, and $\theta \in [\pi/2, \pi]$ if $\mu_{11} < 0$.

Orientation invariant features

• The radius of gyration of an object:
  $$\hat{R} = \sqrt{\frac{\mu_{20} + \mu_{02}}{\mu_{00}}}$$
• The semimajor and semiminor axes of the object ellipse:
  $$(\hat{a}, \hat{b}) = \sqrt{\frac{2\left[\mu_{20} + \mu_{02} \pm \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}\right]}{\mu_{00}}}$$
• The numerical eccentricity of the ellipse:
  $$\epsilon = \sqrt{\frac{a^2 - b^2}{a^2}}$$

Normalization and invariants

• Changing the scale of $f(x, y)$ by $(\alpha, \beta)$ in the (x, y)-direction gives a new image
  $$f'(x, y) = f(x/\alpha, y/\beta)$$
• The transformed central moments $\mu'_{pq}$ can be expressed by the original $\mu_{pq}$:
  $$\mu'_{pq} = \alpha^{1+p} \beta^{1+q} \mu_{pq}$$
• For $\beta = \alpha$ we have $\mu'_{pq} = \alpha^{2+p+q} \mu_{pq}$. We get scaling invariant central moments by the normalization
  $$\eta_{pq} = \frac{\mu_{pq}}{(\mu_{00})^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1, \qquad \forall (p + q) \geq 2.$$
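The following sketch (ours, not from the lecture) collects the quantities above for a binary image given as a NumPy array: central moments, normalized moments and the orientation. Using arctan2 to resolve the quadrant is our own convenience; the slides give an explicit quadrant rule in terms of the sign of mu_11.

```python
import numpy as np

def central_moment(img, p, q):
    """mu_pq = sum (x - x_bar)^p (y - y_bar)^q f(x, y)."""
    x = np.arange(img.shape[0]).reshape(-1, 1)   # axis 0 treated as x
    y = np.arange(img.shape[1]).reshape(1, -1)   # axis 1 treated as y
    m00 = img.sum()
    xb = np.sum(x * img) / m00
    yb = np.sum(y * img) / m00
    return np.sum(((x - xb) ** p) * ((y - yb) ** q) * img)

def normalized_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^gamma with gamma = (p + q)/2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma

def orientation(img):
    """theta = 0.5 * arctan(2 mu_11 / (mu_20 - mu_02))."""
    mu11 = central_moment(img, 1, 1)
    mu20 = central_moment(img, 2, 0)
    mu02 = central_moment(img, 0, 2)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)   # quadrant resolved by arctan2

# An elongated, tilted binary object.
img = np.zeros((64, 64))
for i in range(40):
    img[10 + i, 12 + i // 2] = 1   # a thin line at roughly 26.6 degrees to axis 0
print(normalized_moment(img, 2, 0), normalized_moment(img, 0, 2))
print(np.degrees(orientation(img)))
```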
Hu's rotation invariance

1. Find the principal axes of the object and rotate the coordinates. This method can break down when images do not have unique principal axes.
2. The method of absolute moment invariants. This is a set of seven combined normalized central moment invariants, which can be used for scale, position, and rotation invariant pattern identification:

$$\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02}\\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2\\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2\\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2\\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]\\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})\\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}$$

• $\phi_7$ is skew invariant, to help distinguish mirror images.
• These moments are of finite order; therefore, they do not comprise a complete set of image descriptors. However, higher order invariants can be derived.
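A direct transcription (ours) of the seven invariants above, built on the same normalized central moments as in the previous sketch; the check rotates the object by 90 degrees, under which the invariants should be unchanged up to numerical precision.

```python
import numpy as np

def eta(img, p, q):
    """Normalized central moment eta_pq (see the previous slides)."""
    x = np.arange(img.shape[0]).reshape(-1, 1)
    y = np.arange(img.shape[1]).reshape(1, -1)
    m00 = img.sum()
    xb, yb = np.sum(x * img) / m00, np.sum(y * img) / m00
    mu = np.sum((x - xb) ** p * (y - yb) ** q * img)
    return mu / m00 ** ((p + q) / 2 + 1)

def hu_invariants(img):
    """Hu's seven moment invariants phi_1..phi_7 from the normalized moments."""
    n = {(p, q): eta(img, p, q) for p in range(4) for q in range(4) if 2 <= p + q <= 3}
    phi1 = n[2, 0] + n[0, 2]
    phi2 = (n[2, 0] - n[0, 2]) ** 2 + 4 * n[1, 1] ** 2
    phi3 = (n[3, 0] - 3 * n[1, 2]) ** 2 + (3 * n[2, 1] - n[0, 3]) ** 2
    phi4 = (n[3, 0] + n[1, 2]) ** 2 + (n[2, 1] + n[0, 3]) ** 2
    phi5 = ((n[3, 0] - 3 * n[1, 2]) * (n[3, 0] + n[1, 2])
            * ((n[3, 0] + n[1, 2]) ** 2 - 3 * (n[2, 1] + n[0, 3]) ** 2)
            + (3 * n[2, 1] - n[0, 3]) * (n[2, 1] + n[0, 3])
            * (3 * (n[3, 0] + n[1, 2]) ** 2 - (n[2, 1] + n[0, 3]) ** 2))
    phi6 = ((n[2, 0] - n[0, 2])
            * ((n[3, 0] + n[1, 2]) ** 2 - (n[2, 1] + n[0, 3]) ** 2)
            + 4 * n[1, 1] * (n[3, 0] + n[1, 2]) * (n[2, 1] + n[0, 3]))
    phi7 = ((3 * n[2, 1] - n[0, 3]) * (n[3, 0] + n[1, 2])
            * ((n[3, 0] + n[1, 2]) ** 2 - 3 * (n[2, 1] + n[0, 3]) ** 2)
            + (3 * n[1, 2] - n[3, 0]) * (n[2, 1] + n[0, 3])
            * (3 * (n[3, 0] + n[1, 2]) ** 2 - (n[2, 1] + n[0, 3]) ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])

# An L-shaped object; the two printed rows should agree.
img = np.zeros((32, 32))
img[8:24, 5:9] = 1
img[20:24, 5:20] = 1
print(hu_invariants(img))
print(hu_invariants(np.rot90(img)))
```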
Orthogonal moments

• Moments produced using orthogonal basis sets have the advantage of needing lower precision to represent differences to the same accuracy as the monomials.
• The orthogonality condition simplifies the reconstruction of the original function from the generated moments.
• Orthogonality means mutually perpendicular: two functions $y_m$ and $y_n$ are orthogonal over an interval $a \leq x \leq b$ if and only if
  $$\int_a^b y_m(x)\, y_n(x)\, dx = 0, \qquad m \neq n$$
• Here we are primarily interested in discrete images, so the integrals within the moment descriptors are replaced by summations.
• Two such (well established) orthogonal moments are the Legendre and Zernike moments.

Legendre moments

• The Legendre moments of order (m + n) are:
  $$\lambda_{mn} = \frac{(2m+1)(2n+1)}{4} \int_{-1}^{1}\!\int_{-1}^{1} P_m(x) P_n(y) f(x, y)\, dx\, dy \qquad (6)$$
  where $m, n = 0, 1, 2, ..., \infty$, $P_m$ and $P_n$ are the Legendre polynomials, and $f(x, y)$ is the continuous image function.
• For orthogonality to exist in the moments, the image function $f(x, y)$ is defined over the same interval as the basis set, where the n-th order Legendre polynomial is defined as:
  $$P_n(x) = \sum_{j=0}^{n} a_{nj}\, x^j \qquad (7)$$
  and the Legendre coefficients are given by:
  $$a_{nj} = (-1)^{(n-j)/2}\, \frac{1}{2^n}\, \frac{(n+j)!}{\left(\frac{n-j}{2}\right)!\left(\frac{n+j}{2}\right)!\, j!}, \qquad n - j \text{ even.} \qquad (8)$$
• For a discrete image with current pixel $P_{xy}$, the Legendre moments of order (m + n) are given by
  $$\lambda_{mn} = \frac{(2m+1)(2n+1)}{4} \sum_{x} \sum_{y} P_m(x) P_n(y) P_{xy} \qquad (9)$$

Complex Zernike moments

• The Zernike moment of order m and repetition n is
  $$A_{mn} = \frac{m+1}{\pi} \sum_{x} \sum_{y} f(x, y) \left[V_{mn}(x, y)\right]^*, \qquad x^2 + y^2 \leq 1$$
  where $m = 0, 1, 2, ..., \infty$, $f(x, y)$ is the image function, $*$ denotes the complex conjugate, and $n$ is an integer (positive or negative) depicting the angular dependence or rotation, subject to the conditions $m - |n|$ even and $|n| \leq m$.
• The Zernike moments are projections of the input image onto a space spanned by the orthogonal functions
  $$V_{mn}(x, y) = R_{mn}(x, y)\, e^{jn\theta}$$
  where $j = \sqrt{-1}$, and
  $$R_{mn}(x, y) = \sum_{s=0}^{(m-|n|)/2} (-1)^s \frac{(x^2 + y^2)^{(m/2)-s}\, (m-s)!}{s!\left(\frac{m+|n|}{2} - s\right)!\left(\frac{m-|n|}{2} - s\right)!}$$
  and x, y are defined over the interval [−1, 1].

Orthogonal radial polynomial

• The Zernike polynomials $V_{mn}(x, y)$, expressed in polar coordinates, are:
  $$V_{mn}(r, \theta) = R_{mn}(r)\, e^{jn\theta}$$
  where $(r, \theta)$ are defined over the unit disc and $R_{mn}$ is the orthogonal radial polynomial
  $$R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} (-1)^s F(m, n, s, r), \qquad F(m, n, s, r) = \frac{(m-s)!\, r^{m-2s}}{s!\left(\frac{m+|n|}{2} - s\right)!\left(\frac{m-|n|}{2} - s\right)!}$$
• The first radial polynomials are
  $$R_{00} = 1, \quad R_{11} = r, \quad R_{20} = 2r^2 - 1, \quad R_{22} = r^2, \quad R_{31} = 3r^3 - 2r, \quad R_{33} = r^3$$
• So for a discrete image, if $P(x, y)$ is the current pixel,
  $$A_{mn} = \frac{m+1}{\pi} \sum_{x} \sum_{y} P(x, y) \left[V_{mn}(x, y)\right]^*, \qquad x^2 + y^2 \leq 1$$

Image reconstruction

• The image within the unit circle may be reconstructed to an arbitrary precision by
  $$f(x, y) = \lim_{N \to \infty} \sum_{n=0}^{N} \sum_{m} A_{nm} V_{nm}(x, y)$$
  where the second sum is taken over all $|m| \leq n$, such that $n - |m|$ is even.
• The contribution of the Zernike moment of order m to the reconstruction is
  $$|I_m(x, y)| = \Big|\sum_{n} A_{mn} V_{mn}(r, \theta)\Big|$$
  where $x^2 + y^2 \leq 1$, $|n| \leq m$ and $m - |n|$ is even.
• Gibbs phenomena may appear in the reconstructed object. This is caused by the inability of a continuous function to recreate a step function: no matter how many finite high order terms are used, an overshoot of the function will occur. Outside of the original area of a binary object, "ripples" of overshoot of the continuous function may be visible.

Contour description

• Suppose we have an object S and that we are able to find the length of its contour.
• We partition the contour into M segments of equal length, and thereby find M equidistant points along the contour of S.
• The coordinates (x, y) of these M points are then put into a complex vector f:
  $$f(k) = x(k) + i\, y(k), \qquad k \in [0, M-1]$$
• We view the x-axis as the real axis and the y-axis as the imaginary one for a sequence of complex numbers (Granlund 1972).
• The description of the object contour is changed, but all the information is preserved.
• And we have transformed the contour problem from 2D to 1D.

Fourier coefficients

• We perform a forward Fourier transform
  $$F(u) = \frac{1}{M} \sum_{k=0}^{M-1} f(k) \exp\left(\frac{-2\pi i u k}{M}\right), \qquad u \in [0, M-1]$$
• F(0) now contains the centre of mass of the object, and the coefficients F(1), F(2), F(3), ..., F(M−1) will describe the object in increasing detail.
• These features depend on rotation, scaling and the starting point on the contour.
• We do not want to use all coefficients as features, but terminate at F(N), N < M.
• This corresponds to setting F(k) = 0 for k > N − 1.

Approximation

• When transforming back, we get an approximation to the original contour
  $$\hat{f}(k) = \sum_{u=0}^{N-1} F(u) \exp\left(\frac{2\pi i u k}{M}\right)$$
  defined for $k \in [0, M-1]$.
• We have only used N features to reconstruct each component of $\hat{f}(k)$, but k still runs from 0 to M−1.
• The number of points in the approximation is the same (M), but the number of coefficients (features) used to reconstruct each point is smaller (N < M).
• The first 10 – 15 descriptors are found to be sufficient for character description.
• The Fourier descriptors can be invariant to translation and rotation if the coordinate system is appropriately chosen.
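A small sketch (ours, not from the lecture) of the contour Fourier descriptors described above: the boundary points are packed into a complex vector, transformed with the FFT, truncated to N coefficients, and transformed back to give an approximation of the contour. The example contour is synthetic.

```python
import numpy as np

def fourier_descriptors(x, y):
    """Forward transform of the complex contour f(k) = x(k) + i y(k)."""
    f = x + 1j * y
    return np.fft.fft(f) / len(f)            # 1/M normalization as in the slides

def reconstruct(F, n_coeff):
    """Inverse transform keeping only the first n_coeff coefficients (F(k)=0 for k > N-1)."""
    F_trunc = np.zeros_like(F)
    F_trunc[:n_coeff] = F[:n_coeff]
    f_hat = np.fft.ifft(F_trunc) * len(F)     # undo the 1/M normalization
    return f_hat.real, f_hat.imag

# Example contour: M equidistant points on a rounded-square closed curve.
M = 128
t = np.linspace(0, 2 * np.pi, M, endpoint=False)
x = np.sign(np.cos(t)) * np.abs(np.cos(t)) ** 0.5
y = np.sign(np.sin(t)) * np.abs(np.sin(t)) ** 0.5
F = fourier_descriptors(x, y)
x_hat, y_hat = reconstruct(F, n_coeff=12)     # low-order approximation of the contour
print(np.max(np.abs(x - x_hat)), np.max(np.abs(y - y_hat)))
```

The truncation keeps the first N coefficient indices exactly as described in the slides; in practice one often keeps the lowest positive and negative frequencies (the first and last FFT bins) instead, which usually approximates a closed contour better.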
Why "CBIR"?

• Large databases of digital images are accessible.
  — high volumes produced by scanners and digital cameras
  — larger storage capacities for lower costs
  — easy access to enormous image volumes via the internet
• Manual indexing by keywords is
  — very time consuming
  — unrewarding
  — unlikely to specify all aspects of an image

Query characteristics

• Queries are formulated by combinations of low-level image features such as color, texture and shape.
• Specified explicitly by feature values or by a feature range.
• Implicit specification by example.
• Spatial organization of features, giving absolute or relative location.
• Relevance feedback: allow the user to refine the search by indicating the relevance of the returned images.

Problems

• What features will generally describe the content of an image well?
• How to summarize the distribution of these features over an image?
• How to measure the dissimilarity between distributions of features?
• How to effectively display the results of a search?
• How to browse the images of a database in an intuitive and efficient way?

Selecting features

• The focus is often on color.
• The distribution of colors within the image is often a useful clue to the content of the image.
• Absolute or relative locations of different color distributions improve the result.
• One has to select some color representation:
  — color space (e.g. RGB, IHS, Lab, ...)
  — representation of the distribution
• While color is a single-pixel property, texture describes the appearance of bigger regions:
  — statistical methods
  — structural methods
  — MRF methods
  — filter-based methods
• For both color and texture, one has to select features that relate to perceptual similarity.

Distances and metrics

• A space is called a metric space if for any two of its elements x and y there is a number $\rho(x, y)$, called the distance, that satisfies the following properties:
  — $\rho(x, y) \geq 0$ (non-negativity)
  — $\rho(x, y) = \rho(y, x)$ (symmetry)
  — $\rho(x, y) = 0$ if and only if $x = y$ (identity)
  — $\rho(x, z) \leq \rho(x, y) + \rho(y, z)$ (triangle inequality)
• Distances between two points x and μ in n-dimensional space:
  1) Euclidean: $D_E(x, \mu) = \| x - \mu \| = \left[\sum_{k=1}^{n} (x_k - \mu_k)^2\right]^{1/2}$
  2) "City block"/"Taxi"/"Absolute value": $D_4(x, \mu) = \sum_{k=1}^{n} |x_k - \mu_k|$
  3) "Chessboard"/"Maximum value": $D_8(x, \mu) = \max_k |x_k - \mu_k|$

Bin-by-bin dissimilarity

The distance between two distributions; useful when comparing e.g. histograms in image search and retrieval.

• Minkowski distance:
  $$d_{L_p}(H, K) = \left(\sum_i |h_i - k_i|^p\right)^{1/p}$$
  L1 is often used to compute dissimilarity between color images; L2 and L∞ are often used for texture dissimilarity. L1-based retrieval may give many false negatives, as neighboring bins are not considered.
• Histogram intersection:
  $$d_{\cap}(H, K) = 1 - \frac{\sum_i \min(h_i, k_i)}{\sum_i k_i}$$
  Attractive because it handles partial matches when the area of one histogram is smaller than that of the other. When the areas are equal, it is equivalent to the normalized L1 distance.

Bin-by-bin dissimilarity - II

• Kullback-Leibler divergence:
  $$d_{KL}(H, K) = \sum_i h_i \log\frac{h_i}{k_i}$$
  Measures how inefficient it would be to code one histogram using the other as the code-book. Non-symmetric, and sensitive to binning.
• Jeffrey divergence:
  $$d_J(H, K) = \sum_i \left(h_i \log\frac{h_i}{m_i} + k_i \log\frac{k_i}{m_i}\right), \qquad m_i = \frac{h_i + k_i}{2}$$
  A modification of K-L; symmetric and more robust to noise and binning.
• χ² statistics:
  $$d_{\chi^2}(H, K) = \sum_i \frac{(h_i - m_i)^2}{m_i}$$
  Measures how unlikely it is that one distribution was drawn from the population represented by the other.
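For reference, here is a compact sketch (ours) of the bin-by-bin measures above, for two histograms given as NumPy arrays. The small epsilon guarding the logarithms and divisions is our own choice and not part of the definitions.

```python
import numpy as np

EPS = 1e-12  # guard against log(0) and division by zero (our choice)

def minkowski(h, k, p=1):
    return np.sum(np.abs(h - k) ** p) ** (1.0 / p)

def intersection(h, k):
    return 1.0 - np.sum(np.minimum(h, k)) / np.sum(k)

def kullback_leibler(h, k):
    return np.sum(h * np.log((h + EPS) / (k + EPS)))

def jeffrey(h, k):
    m = (h + k) / 2.0
    return np.sum(h * np.log((h + EPS) / (m + EPS)) + k * np.log((k + EPS) / (m + EPS)))

def chi_square(h, k):
    m = (h + k) / 2.0
    return np.sum((h - m) ** 2 / (m + EPS))

# Two normalized 8-bin histograms.
h = np.array([0.05, 0.10, 0.30, 0.25, 0.15, 0.10, 0.04, 0.01])
k = np.array([0.02, 0.08, 0.20, 0.30, 0.20, 0.12, 0.05, 0.03])
for d in (minkowski, intersection, kullback_leibler, jeffrey, chi_square):
    print(d.__name__, round(float(d(h, k)), 4))
```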
Drawbacks of bin-by-bin

• They compare the contents of corresponding histogram bins, i.e. $h_i$ and $k_i$ for all i, but not $h_i$ and $k_j$ for $i \neq j$.
• K-L is justified by information theory, and χ² by statistics, but they do not necessarily match perceptual similarity well.
• Bin-by-bin measures are sensitive to bin size: coarse binning may not give sufficient discrimination, while too fine binning may place similar features in different bins.
• We need a cross-bin distance.

Cross-bin measures

• The drawbacks above can be fixed by using correspondences between bins, i.e. a cross-bin distance.
• Cross-bin dissimilarity measures always yield better results when bins get smaller.
• Cross-bin distances use the ground distance $d_{ij}$, defined as the distance between the representative features for bin i and bin j.
• Quadratic-form distance:
  $$d_A(H, K) = \sqrt{(\mathbf{h} - \mathbf{k})^T A\, (\mathbf{h} - \mathbf{k})}$$
  where h and k are vectors listing all the entries in H and K. This is used for color in QBIC.
• Cross-bin information comes in via a similarity matrix $A = [a_{ij}]$, where
  $$a_{ij} = 1 - \frac{d_{ij}}{d_{max}}$$
  With this choice, it can be shown that $d_A$ is a metric.
• The quadratic-form distance may give false positives, as it will overestimate the similarity of (color) distributions without a pronounced mode.

Cross-bin measures - II

• 1-D match distance:
  $$d_M(H, K) = \sum_i |\hat{h}_i - \hat{k}_i|$$
  where $\hat{h}_i = \sum_{j \leq i} h_j$ is the cumulative histogram of $\{h_i\}$, and similarly for $\{k_i\}$.
• The match distance is the L1 distance between the cumulative histograms.
• For histograms having equal areas, this is a special case of the EMD (later).
• The 1-D match distance does not extend to higher dimensions, because the relation $j \leq i$ is not a total ordering in more than one dimension.
• The match distance may be extended to multi-dimensional histograms by graph matching.

Cross-bin measures - III

• Kolmogorov-Smirnov statistics:
  $$d_{KS}(H, K) = \max_i |\hat{h}_i - \hat{k}_i|$$
  where $\hat{h}_i$ and $\hat{k}_i$ are cumulative histograms.
• The K-S statistic is defined on the cumulative distributions, so no binning is actually required.
• Under the null hypothesis (data drawn from the same distribution), the distribution of the statistic can be calculated, giving the significance of the result.
• Like the match distance, it is defined only for one dimension.

Earth Mover's Distance (EMD)

• One of several measures of the minimum cost of matching elements between two histograms.
• Given two distributions, one seen as piles of earth in feature space, the other as a collection of holes in the same space, we need to solve the transportation problem: finding the least amount of work needed to fill the holes with earth.
• This is the Monge-Kantorovich mass transfer problem (1781). The distance was first used in computer vision by Werman, Peleg and Rosenfeld (1985).
• EMD applies to histograms and signatures in any number of dimensions.
• It allows for partial matches.
• In general: solve a linear optimization problem.
• If the ground distance is a metric and the total weights of the signatures are equal, the EMD is a true metric.

Special case of EMD

• The minimum cost distance between two one-dimensional distributions f(t) and g(t) is the L1 distance between the cumulative distribution functions:
  $$\int_{-\infty}^{\infty} \left| \int_{-\infty}^{x} f(t)\, dt - \int_{-\infty}^{x} g(t)\, dt \right| dx$$
• Here $P = \{(p_1, w_{p_1}), ..., (p_m, w_{p_m})\}$ is the first signature with m clusters, where $p_i$ is the cluster representative and $w_{p_i}$ is the weight of the cluster; $Q = \{(q_1, w_{q_1}), ..., (q_n, w_{q_n})\}$ is the second signature with n clusters; and $D = [d_{ij}]$ is a ground distance matrix where $d_{ij}$ is the ground distance between clusters $p_i$ and $q_j$.
• If the feature space is one-dimensional, the ground distance is $d(p_i, q_j) = |p_i - q_j|$, and the total weights of the two signatures are equal, then
  $$\psi(P, Q) = \sum_{k=1}^{m+n-1} |\hat{p}_k - \hat{q}_k|\,(r_{k+1} - r_k)$$
  where $r_1, r_2, ..., r_{m+n}$ is the sorted list of $p_1, ..., p_m, q_1, ..., q_n$, and
  $$\hat{p}_k = \sum_{i=1}^{m} [p_i \leq r_k]\, w_{p_i}, \qquad \hat{q}_k = \sum_{j=1}^{n} [q_j \leq r_k]\, w_{q_j}$$
  where [·] is 1 when its argument is true, and 0 otherwise.
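A small sketch (ours) of the one-dimensional special case above: the match distance as the L1 distance between cumulative histograms, and the corresponding signature-based form. With equal total weights and the bin centres used as cluster representatives at unit spacing, the two should agree.

```python
import numpy as np

def match_distance(h, k):
    """1-D match distance: L1 distance between cumulative histograms."""
    return np.sum(np.abs(np.cumsum(h) - np.cumsum(k)))

def emd_1d(p, wp, q, wq):
    """1-D EMD for signatures with equal total weight and ground distance |p_i - q_j|."""
    r = np.sort(np.concatenate([p, q]))                     # sorted cluster positions
    p_hat = np.array([np.sum(wp[p <= rk]) for rk in r])     # cumulative weights
    q_hat = np.array([np.sum(wq[q <= rk]) for rk in r])
    return np.sum(np.abs(p_hat - q_hat)[:-1] * np.diff(r))

# Histograms over the same bins ...
h = np.array([0.1, 0.4, 0.3, 0.2])
k = np.array([0.3, 0.2, 0.2, 0.3])
print(match_distance(h, k))

# ... and the equivalent signatures (bin centres as cluster representatives).
centres = np.array([0.0, 1.0, 2.0, 3.0])
print(emd_1d(centres, h, centres, k))
```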
Comparing Dissimilarity

• A meaningful quality measure must be defined.
• Image retrieval is measured by precision, the number of relevant images retrieved relative to the number of retrieved images, and recall, the number of relevant images retrieved relative to the total number of relevant images in the database.
• The relative importance of good recall vs. good precision differs according to the task at hand.
• Performance comparisons should account for the variety of parameters that can affect the behaviour of each measure used.
• There is a difference between a feature-by-feature approach and a systems approach.
• Processing steps that affect performance independently should be evaluated separately, to lower complexity and heighten insight.
• Ground truth should be available.

Some general results

• Bin-by-bin dissimilarity measures improve with an increasing number of bins up to a point, then performance degrades.
• Cross-bin dissimilarity measures perform better.
• Signatures carry less information than histograms, but perform better.
• Jeffrey divergence and χ² statistics give almost identical results.
• In color space, the L2 distance, by construction, matches the perceptual similarity between colors.
• In histogram space, L1 is better than L2, which is better than L∞.
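As a closing illustration (ours), precision and recall as defined above, computed for a single query from the set of retrieved images and the set of relevant images; the image identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant, where hits = |retrieved & relevant|."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

# Hypothetical query result: 10 images returned, 8 relevant images exist in the database.
retrieved = ["img03", "img17", "img21", "img08", "img42",
             "img11", "img02", "img30", "img19", "img25"]
relevant = ["img03", "img08", "img11", "img19", "img25", "img44", "img51", "img60"]
print(precision_recall(retrieved, relevant))   # -> (0.5, 0.625)
```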