INF 5300, V-2005 Selected Themes from Digital Image Analysis Lecture 10 08.04.2005 Moment Based Shape Representation Fritz Albregtsen Department of Informatics University of Oslo What is “shape” ? • A numerical description of the spatial configurations in the image. • There is no generally accepted methodology of shape description. • Location and description of high curvature points give essential information. • Many useful, application-dependent heuristics. • Shape description of 2D planar objects is “easy”. • Shape is defined in an image, but its usefulness in a 3D world depends on how well the 3D -> 2D mapping is handled. • Invariance is an important issue. INF 5300, 2005, Lecture 10, page 1 of 42 INF 5300, 2005, Lecture 10, page 2 of 42 Considerations Assumptions • We have a segmented, labeled image. • Each object that is to be described has been identified. • The image objects can be represented as — binary image (whole regions) — contour (region boundaries) — through a run length code — through a chain code — through a quad tree — in cartesian coordinates — in polar coordinates — in some other coordinates — as coefficients of some transform — ... • Input representation form, boundaries or whole regions? • Object reconstruction ability? • Incomplete shape recognition ability? • Local/global description? • Mathematical or heuristic techniques? • Statistical or syntactic object description? • Robustness of description to translation, rotation, and scale transformations? • Shape description properties in different resolutions? — description changes discontinously. • Robustness against — noise — geometric sampling (discretization) — quantization INF 5300, 2005, Lecture 10, page 3 of 42 INF 5300, 2005, Lecture 10, page 4 of 42 Area • Features may be computed both from the region and its boundary. Recursive boundary splitting • Area is given by the number of pixels of which the region consists. In its original form (Douglas and Peucker 1973 ), the algorithm is as follows : • The surface integral over S (contour C) is given by Green’s theorem: 1) Draw a straight line between the first and the last point. A= Z Z dxdy = S Z xdy C s := 0.0; n := n + 1; pkt[n].x := pkt[1].x; pkt[n].y := pkt[1].y; for i:=2 step 1 until n do begin dy := pkt[i].y - pkt[i-1].y s := s + (pkt[i].x + pkt[i-1].x)/2 * dy; end; area := if (s > 0) then s else -s; This is probably the most frequently used method. The method is not sequential, and is not particularly fast. • The region can also be represented by n polygon vertices A= 2) For each intermediate point, compute the distance from the line. - The point lying farthest from the line becomes a new break point, provided that its distance from the line is greater than a given threshold. 1) and 2) are then repeated for the two new parts of the path. n−1 1 X (ik jk+1 − ik+1jk ) 2 k=0 where the sign of the sum reflects the polygon orientation. INF 5300, 2005, Lecture 10, page 5 of 42 Boundary polygonization • The polygonization method of Wall and Danielsson (1984) steps from point to point through an ordered sequence of points, and outputs the previous point as a new breakpoint if the area deviation per unit length of the approximating line segment exceeds a prespecified tolerance, TW D . • Using the previous breakpoint as the current origin, the area between the curve f (x) and the approximating line segment is accumulated by Perimeter from chain codes Freeman computed the perimeter as the length of the chain. The formula for the perimeter is √ (1) P = ne + 2no where ne is the number of even chain elements and no the number of odd chain elements. An even chain element indicates a vertical or horizontal connection between two boundary pixels, having length 1, while an odd chain element indicates a diagonal √ connection, which has length 2. Vossepoel and Smeulders improved Freeman’s method in estimating lengths of straight lines by using a corner count n c , defined as the number of occurrences of consecutive unequal chain elements in the Freeman chain code string. The length is given by P = 0.980ne + 1.406no − 0.091nc (2) f(x) si Ai x 1 Ai = Ai−1 + (yixi−1 − xiyi−1) , 2 INF 5300, 2005, Lecture 10, page 6 of 42 si = q where the weights were found by a least-square fitting for all straight lines with ne + no = 1000. x2i + yi2 • If | Ai | /si < T , i is incremented and (Ai, si) is recomputed. • Otherwise, the previous point is a new breakpoint. The origin is moved, and the previous value of si is stored. • This method is purely sequential and very fast. INF 5300, 2005, Lecture 10, page 7 of 42 The methods based on the chain coding compute the perimeter as the length of the chain, and often give an overestimated result. Kulpa derived a compensation factor for computing the length of straight lines. With this factor, Eq. (1) becomes √ √ π (3) P = (1 + 2)(ne + 2no ) 8 where the factor is approximately 0.948. Kulpa found that this compensation also gave good results for most of the blob-like objects met in practice. Dorst and Smeulders proved that Eq. (3) gave a consistent estimate for the length of a circular arc of π/4. INF 5300, 2005, Lecture 10, page 8 of 42 Invariance of features • Assume that we have an object, and that we want to extract some features to describe the object. Shape invariants • Shape descriptors depend on viewpoint, => object recognition may often be impossible if object or observer changes position. • We may wish that the features are: — Position invariant independent of the position of the object within the image. • Shape description invariance is important – shape invariants represent properties which remain unchanged under an appropriate class of transforms. • Stability of invariants is a crucial property which affects their applicability. The robustness of invariants to image noise and errors introduced by image sensors is of prime importance. — Scaling invariant independent of the size of the object. — Rotation invariant independent of the orientation of the object. — Warp invariant independent of a deformation of the object. • In most cases we want position invariant features. • The other depend on the application. INF 5300, 2005, Lecture 10, page 9 of 42 INF 5300, 2005, Lecture 10, page 10 of 42 Statistical moments Moments are applicable to many aspects of image processing; invariant pattern recognition, image encoding, pose estimation. Here we are using them to extract properties that have analogies in statistics or mechanics. The general form of a moment of order (p + q), evaluating over the complete image plane ξ is: Z Z ψpq (x, y)f (x, y)dxdy (4) mpq = ξ where the weighting kernel or basis function is ψpq . Non-orthogonal moments The continuous two-dimensional (p + q)-th order Cartesian moment is defined as: Z ∞Z ∞ mpq = xpy q f (x, y)dxdy −∞ (7) −∞ It is assumed that f (x, y) is a piecewise continuous, bounded function and that it can have non-zero values only in the finite region of the xy plane. Then, moments of all orders exist and the uniqueness theorem holds: This produces a weighted description of f (x, y) over ξ. The basis functions may have a range of useful properties that may be passed onto the moments, producing descriptions which can be invariant under rotation, scale, translation and orientation. To apply this to digital images, Equation 4 needs to be expressed in discrete form. The probability density function (of a continuous distribution) is different from that of the probability of a discrete distribution. For simplicity we assume that ξ is divided into square pixels of dimension 1 × 1, with constant intensity I over each square pixel. So if Px,y is a discrete pixel value then: Px,y = I(x, y)∆A where ∆A is the sample or pixel area equal to one. Thus, XX Mpq = ψpq (x, y)Px,y ; p, q = 0, 1, 2, ..., ∞ x (5) (6) y The moment sequence mpq with basis xpy q is uniquely defined by f (x, y) and f (x, y) is uniquely defined by mpq . Thus, the original image can be described and reconstructed, if sufficiently high order moments are used. The discrete version of the Cartesian moment for an image consisting of pixels Pxy , is: mpq = M X N X xpy q Px,y (8) x=1 y=1 mpq is a two dimensional Cartesian moment, where M and N are the image dimensions and the monomial product xpy q is the basis function. The choice of basis function depends on the application and on any desired invariant properties. INF 5300, 2005, Lecture 10, page 11 of 42 INF 5300, 2005, Lecture 10, page 12 of 42 Fast computation of moments Green’s theorem states that if a function g(x, y) is integrable over a simply connected region O and can be decomposed as a sum of the derivatives of two functions N and M Low order moments g(x, y) = • The zero order moment m00 is defined as the total mass (or power) of the image. • For a binary M × N image of an object, this gives the number of pixels in the object. m00 = N M X X (9) Px,y then we have Z Z ∂N (x, y) ∂M (x, y) − ∂x ∂y g(x, y)dxdy = O Z [M (x, y)dx + N (x, y)dy] (11) (12) L where L is the anticlockwise oriented boundary of the region O. Replacing g(x, y) by the moment kernel xp y q , we obtain Z Z Z Z 1 1 xp+1y q dy = xpy q+1 dx xpy q dxdy = mpq = p + 1 q + 1 L L O (13) x=1 y=1 • The two first order moments are used to find the Centre Of Mass (COM) of an image. If this is applied to a binary image and the results are normalised by m00, then the result is the centre co-ordinates of the object. m01 m10 ȳ = (10) x̄ = m00 m00 Li and Shen (1991) approximate Eq. 13 by X p+1 q X p q+1 1 1 xi yi ∆yi = xi yi ∆xi mpq = p+1 q+1 (xi ,yi )∈L (14) (xi ,yi )∈L where ∆yi = yi+1 − yi and ∆xi = xi+1 − xi , and (xi+1, yi+1) is the point following (xi, yi) on L. From one edge point to the next, the change of coordinate values x and y is at most 1. When x increases or decreases by 1, we have (x + 1)py q = p X Cpk xk y q ; k=0 (x − 1)py q = p X (−1)p−k Cpk xk y q where Cpk is the binomial coefficient ! p p! . Cpk = = k!(p − k)! k INF 5300, 2005, Lecture 10, page 13 of 42 (15) k=0 (16) INF 5300, 2005, Lecture 10, page 14 of 42 Our method Fast computation .. II (x ± 1)py q can be computed from xpy q . Li and Shen implement Eq. 15 by the method shown in the figure below. q y q xy (x 1) y q c d c d 2 (x 1) y 2 q xy q 3 (x 1) y 3 q xy .... .... xy Let O be a discrete region, and L be its 8-connected boundary. A boundary point is a pixel in the object, of which at least one of the four neighbors is in the background. We then have X X f (x, y) = (Fx (x, y)DY (x, y) + f (x, y)CY (x, y)) (17) O q L where .... p p q We have proposed a method which is computationally as effective as the method of Li and Shen, and achieves better accuracy. In our method, Green’s theorem is applied in a different way. .... (x 1) y q xp(y ± 1)q can be computed from xpy q in a similar way. In this way, xpiyiq is updated by only additions. To compute the 10 moments with order (p + q) ≤ 3, only 20 additions are needed to update xp+1y q when x is changed, and only 16 additions are needed to update xp+1y q when y is changed. The method of Li and Shen is simple to implement and effective in computation. To compute the 10 low order moments, it needs about 23L additions and 23 multiplications plus contour following. The disadvantage of this method is its inaccuracy for small objects and objects with complex shape. The inaccuracy is due to the approximation by Eq. 14. INF 5300, 2005, Lecture 10, page 15 of 42 Fx (x, y) = x X f (i, y) (18) i=1 and DY (x, y) and CY (x, y) are defined as 1 (x, y) is only a “right hand point” DY (x, y) = −1 (x, y) is only a “left hand point” 0 otherwise ( 1 (x, y) is a “left hand point” CY (x, y) = 0 otherwise. A “right hand point” is a boundary point at the right hand side of the object, and a “left hand point” is a boundary point at the left hand side of the object. A boundary point can simultaneously be both a “left” and a “right hand point”, in which case DY (x, y) = 0, CY (x, y) = 1. We propose to consider background to object transitions instead of the boundary points. Assuming that a pixel is a square with four sides, a background to object transition is a side of the boundary pixel at which the background and the object are divided. A boundary pixel may have one to four transitions. A background to object transition has four possible directions, as illustrated in the next figure. INF 5300, 2005, Lecture 10, page 16 of 42 Our method .. III Our method .. II 2 1 3 If we let upq = 0 x i= x y i= y dyi = 0 x i= x−1 y i= y dyi = −1 xi = x y i= y dyi = 0 xi = x y i= y dyi = 1 X q xp+1 i yi ∆yi (22) i Eq. 21 becomes p These four directions are enumerated as 0, 1, 2, and 3. A triplet (xi, yi , ∆yi) can be computed for each transition from the background to the object. For the i’th transition, (xi, yi) is recorded as the coordinate of the boundary point, except when the direction is 1, then xi is x-coordinate of boundary point minus 1, i.e. xi = x − 1. ∆yi is 0 when the direction is 0 or 2, it is −1 when the direction is 1, and 1 when the direction is 3. By using the transitions instead of the boundary points, it is not difficult to rewrite Eq. 17 as X X f (x, y) = Fx (xi, yi)∆yi (19) i (x,y)∈O which is a sum over all the transitions from the background to the object. If a boundary point is a “left hand point”, then a transition with direction 1 will be recorded; if a boundary point is a “right hand point”, then a transition with direction 3 will be recorded. Letting f (x, y) = xp y q , we have Fx (xi, yi ) = xi X x=1 p xp yiq = q xp+1 xpy q X 1 j−1 i yi yiq (20) + i i + C Bj xp−j+1 i p+1 2 j p j=2 where Cpj−1 is the combination number, Bj is the j’th Bernoulli number, and Eq. 19 now becomes " p X xp+1y q xpy q X 1 j−1 i i + i i + Cp Bj xp−j+1 yiq ∆yi. (21) mpq = i p+1 2 j j=2 i mpq = upq up−1,q X 1 j−1 + + C Bj up−j,q . p+1 2 j p j=2 (23) The moment mpq is a linear combination of ujq for j = 0, 1, 2, · · · p. The monomials ujq can be computed by the method of Li and Shen. The 10 low order moments can thereafter be computed as u0q 1 0 0 0 m0q u m 1/2 1/2 0 0 1q 1q . = m2q 1/6 1/2 1/3 0 u2q u3q 0 1/4 1/2 1/4 m3q Our algorithm to compute the geometric moments mpq for all the orders from 0 up to (P + Q) for any integers P and Q is now stated as follows. • The contour following is applied, and a triple (xi, yi, ∆yi), as defined in the above text and illustrated in the figure, is recorded for each transition from the background to the object. q • xp+1 i yi for all p and q are computed by the updating as described in Figure 1. • upq as defined in Eq. 22 is then computed, and the moments mpq is a linear combination of ujq for j = 0, 1, 2, · · · p as in Eq. 23. The result will be exactly the same as the one obtained by the double summation in Eq. 8. INF 5300, 2005, Lecture 10, page 17 of 42 INF 5300, 2005, Lecture 10, page 18 of 42 Central moments Comparison of methods • Algorithms for fast, but not exact computation of moments are given by B.-C. Li and J. Chen, Pattern Recognition 24, 807-813, 1991. • Faster and exact algorithms are given by L. Yang and F. Albregtsen, Pattern Recognition 29, 1061-1073, 1996. L. Yang, F. Albregtsen, and T. Taxt, Graphical Models and Image Processing, 59, 97-108, 1997. • The definition of a 2D discrete central moment is: XX µp,q = (x − x̄)p(y − ȳ)q f (x, y) x y where x̄ = m10 , m00 ȳ = m01 m00 • This corresponds to computing ordinary Cartesian moments after translating the object so that center of mass is in origo. • This means that central moments are invariant under translation. • Central moments are not scaling or rotation invariant. INF 5300, 2005, Lecture 10, page 19 of 42 INF 5300, 2005, Lecture 10, page 20 of 42 µpq from mpq • Central moments µpq can easily be obtained from ordinary moments mpq . A translation of an image f (x, y) by (∆x, ∆y) in the (x, y)-direction gives a new image 0 f (x, y) = f (x − ∆x, y − ∆y) • If we assume that ∆x = x̄ and ∆y = ȳ, then the moments of p + q ≤ 3 are : µ00 µ10 µ20 µ02 µ11 µ30 µ12 µ21 µ03 = = = = = = = = = m00 µ01 = 0 m20 − x̄m10 m02 − ȳm01 m11 − ȳm10 m30 − 3x̄m20 + 2x̄2m10 m12 − 2ȳm11 − x̄m02 + 2ȳ 2 m10 m21 − 2x̄m11 − ȳm20 + 2x̄2m01 m03 − 3ȳm02 + 2ȳ 2m01 • The general 3D central moments, µpqr , are generally expressed by the mpqr moments: µpqr = q X p X r X s=0 t=0 u=0 ! p s −1[D−d] q t ! r u ! ∆xp−s∆y q−t ∆z r−u mstu where D = (p + q + r); d = (s + t + u) and the binomial coefficients are given by ! v w = Moments of inertia • The two second order central moments µ20 = XX (x − x̄)2f (x, y) x µ02 = y XX x y (y − ȳ)2f (x, y) correspond to the “moments of inertia” relative to the coordinate directions, while the cross moment of inertia is given by µ11 = XX (x − x̄)(y − ȳ)f (x, y) x y • An elongated object having a random orientation will have moments of inertia that do not reflect the true shape of the object, as they are not orientation invariant. • The three second order µpq can easily be made invariant to rotation. v! , w<v w!(v − w)! INF 5300, 2005, Lecture 10, page 21 of 42 INF 5300, 2005, Lecture 10, page 22 of 42 Object orientation • Orientation is defined as the angle (relative to the x-axis) of the axis through the center of mass that gives the lowest moment of inertia. • Orientation, θ, relative to x-axis is found by minimizing the sum XX 2 I(θ) = β − β̄ f (α, β) α β where the rotated coordinates are given by α = x cos θ + y sin θ, β = y cos θ − x sin θ Bounding rectangle • “Image-oriented bounding box” The smallest rectangle around the object, having sides that are parallell to the edges of the image. Found by searching for min and max x and y within the object (xmin , xmax, ymin, ymax). Then we get I(θ) = x y We require that ∂I(θ) ∂θ XX x XX y [(y − ȳ) cos θ − (x − x̄) sin θ]2 f (x, y) = 0 and get 2f (x, y) [(y − ȳ) cos θ − (x − x̄) sin θ]×[−(y − ȳ) sin θ − (x − x̄) cos θ] = 0 XX x = y 2f (x, y) (x − x̄)(y − ȳ)(cos2 θ − sin2 θ) XX x y 2f (x, y) (x − x̄)2 − (y − ȳ)2 sin θ cos θ 2 sin θ cos θ 2 tan θ 2µ11 = = tan 2θ = (µ20 − µ02 ) cos2 θ − sin2 θ 1 − tan2 θ • Orientation is then given by θ= 1 2µ1,1 tan−1 2 µ2,0 − µ0,2 • “Object-oriented bounding box” Smallest rectangle around the object, having one side parallell to the orientation of the object (θ). The transformation α = x cos θ + y sin θ β = −x sin θ + y cos θ is applied to all pixels in the object (or its boundary). Then search for αmin , αmax, βmin , βmax. where θ ∈ [0, π/2] if µ11 > 0, and θ ∈ [π/2, π] if µ11 < 0. INF 5300, 2005, Lecture 10, page 23 of 42 INF 5300, 2005, Lecture 10, page 24 of 42 Orientation invariant features • The radius of gyration of an object is defined as the radius of a circle where we could concentrate all the mass of the object without altering the moment of inertia about its center of mass. • This feature is invariant to orientation. R̂ = r Normalization and invariants • Changing the scale of f (x, y) by (α, β) in the (x, y)-direction gives a new image µ20 + µ02 µ00 • The object ellipse is also a simple, invariant object feature. The object ellipse is defined as the ellipse whose least and greatest moments of inertia equal those of the object. • The semimajor and semiminor axes v h i u p u 2 µ20 + µ02 ± (µ20 − µ02 )2 + 4µ2 11 t (â, b̂) = µ00 0 f (x, y) = f (x/α, y/β) 0 • The transformed central moments µpq can be expressed by the original µpq 0 µpq = α1+pβ 1+q µpq 0 • For β = α we have µpq = α2+p+q µpq . We get scaling invariant central moments by the normalization p+q µpq , γ= + 1, ∀(p + q) ≥ 2. ηpq = (µ00)γ 2 • The numerical eccentricity of the ellipse = r a2 − b 2 a2 INF 5300, 2005, Lecture 10, page 25 of 42 Symmetry properties • The degree and direction of skewness can be determined using the two third order moments, µ30 and µ03 . • Character recognition based on symmetry properties is possible, using the first seven scale-normalised central moments (η11 , η20 , η02 , η21 , η12 , η30 , η03 ). • Symmetry about the x or y axes will produce η11 = 0. Symmetry about the y axis ⇐ η12 = 0, η30 = 0. Symmetry about the x axis ⇐ η03 = 0, η12 > 0. • Further, the following generalities are true: ηpq = 0 ∀p = 0, 2, 4, ...; q = 1, 3, 5, ... for objects symmetric about the x axis. However, objects asymmetrical about the x axis produce: and ηpq < 0 ∀p = 0, 2, 4, ...; q = 1, 3, 5, ... ηp0 > 0, η0p > 0 ∀p = 0, 2, 4, ...; f (x, y) > 0. • The sign of the normalized central moments give qualitative information about the symmetry. The magnitude of the central moments gives a quantitative description. INF 5300, 2005, Lecture 10, page 27 of 42 INF 5300, 2005, Lecture 10, page 26 of 42 Hu’s rotation invariance 1 Find principal axes of object, rotate coordinats. This method can break down when images do not have unique principal axes. 2 The method of absolute moment invariants. This is a set of seven combined normalized central moment invariants, which can be used for scale, position, and rotation invariant pattern identification. φ1 = η20 + η02 2 φ2 = (η20 − η02)2 + 4η11 φ3 = (η30 − 3η12)2 + (3η21 − η03)2 φ4 = (η30 + η12)2 + (η21 + η03)2 φ5 = (η30 − 3η12)(η30 + η12) (η30 + η12)2 − 3(η21 + η03)2 2 +(3η21 − η03)(η21 + η03) 3(η30 + η12) − (η21 + η03)2 2 2 φ6 = (η20 − η02) (η30 + η12) − (η21 + η03) + 4η11(η30 + η12)(η21 + η03) φ7 = (3η21 − η03)(η30 + η12) (η30 + η12)2 − 3(η21 + η03)2 2 +(3η12 − η30)(η21 + η03) 3(η30 + η12) − (η21 + η03)2 • φ7 is skew invariant, to help distinguish mirror images. • These moments are of finite order, therefore, they do not comprise a complete set of image descriptors. However, higher order invariants can be derived. INF 5300, 2005, Lecture 10, page 28 of 42 Contrast invariants • A change in contrast gives a new intensity 0 distribution f (x, y) = cf (x, y). The transformed moments are then 0 µpq = cµpq cancels both scaling and contrast. The normalization is given by µpq ηpq = µ00 0 0 µ00 µ20 + µ02 • A set of four moments that are invariant under general affine transforms µ20µ02 − µ211 µ400 µ230µ203 − 6µ30µ21 µ12µ03 + 4µ30µ312 µ03 − 3µ212 µ221 = µ10 00 µ20(µ21µ30 − µ212 ) − µ11 (µ30µ03 − µ21µ12 + µ02 (µ30µ12 − µ221) = µ700 = {µ320µ203 − 6µ220µ02µ21 µ03 + 9µ220µ02 µ212 +12µ20µ211 µ21 µ03 + 6µ20µ11 µ02 µ30µ03 − 18µ20µ11µ02 µ21µ12 −8µ311µ30µ03 − 6µ20µ202 µ30µ12 + 9µ20µ202 µ221 +12µ211µ02 µ30 µ12 − 6µ11µ202 µ30 µ21 + µ302 µ230 }/µ11 00 I1 = • Abo-Zaid et al. have defined a normalization that 0 Affine invariants I2 I3 (p+q) 2 I4 0 • If we use µ00 = cα2 µ00 , µ02 = cα4 µ02 , and µ20 = cα4 µ20 , 0 we easily see that ηpq is invariant to both scale and contrast. • This normalization also reduces the dynamic range of the moment features, so that we may use higher order moments without having to resort to logarithmic representation. • Abo-Zaid’s normalization cancels the effect of changes in contrast, but not the effect of changes in 0 intensity: f (x, y) = f (x, y) + b. In practice, we often experience a combination : 0 f (x, y) = cf (x, y) + b. • J. Flusser, Pattern Recognition 33, 1405-1410, 2000 gives an excellent overview of moment invariants. • O. Trier, A.K. Jain and T. Taxt, "Feature Extraction Methods for Character Recognition-A Survey", Pattern Recognition, Vol. 29, 3, pp. 641-662, April 1996, gives a very good and useful overview. INF 5300, 2005, Lecture 10, page 29 of 42 INF 5300, 2005, Lecture 10, page 30 of 42 Legendre moments Orthogonal moments • Moments produced using orthogonal basis sets have the advantage of needing lower precision to represent differences to the same accuracy as the monomials. • The orthogonality condition simplifies the reconstruction of the original function from the generated moments. • Orthogonality means mutually perpendicular: two functions ym and ym are orthogonal over an interval a ≤ x ≤ b if and only if: Z b ym (x)yn(x)dx = 0; m 6= n • The Legendre moments of order (m + n): Z Z (2m + 1)(2n + 1) 1 1 Pm(x)Pn(y)f (x, y)dxdy λmn = 4 −1 −1 (24) where m, n = 0, 1, 2, ..., ∞, Pm and Pn are the Legendre polynomials f (x, y) is the continuous image function. • For orthogonality to exist in the moments, the image function f (x, y) is defined over the same interval as the basis set, where the n-th order Legendre polynomial is defined as: Pn(x) = • Two such (well established) orthogonal moments are Legendre and Zernike. anj xj (25) j=0 and the Legendre coefficients are given by: a • Here we are primarily interested in discrete images, so the integrals within the moment descriptors are replaced by summations. n X anj = (−1)(n−j)/2 1 (n + j)! , 2n ( (n−j) )!( (n+j) )!j! 2 2 n − j = even. (26) • For a discrete image with current pixel Pxy , the Legendre moments of order (m + n) are given by (2m + 1)(2n + 1) X X Pm(x)Pn(y)Pxy (27) λmn = 4 x y and x, y are defined over the interval [−1, 1]. INF 5300, 2005, Lecture 10, page 31 of 42 INF 5300, 2005, Lecture 10, page 32 of 42 Complex Zernike moments • The Zernike moment of order m and repetition n is m+1 XX f (x, y) [Vmn(x, y)]∗ , Amn = π x y where Orthogonal radial polynomial • The Zernike polynomials Vmn(x, y), expressed in polar coordinates are: Vmn(r, θ) = Rmn(r)ejnθ where (r, θ) are defined over the unit disc and Rmn is the orthogonal radial polynomial (m−|n|)/2 x2 + y 2 ≤ 1 m = 0, 1, 2, ..., ∞; f (x, y) is the image function, ∗ denotes the complex conjugate, and n is an integer (positive or negative) depicting the angular dependence or rotation, subject to the conditions m − |n| = even, (m−|n|)/2 Rmn (x, y) = X s=0 (−1)s(x2 + y 2 )(m/2)−s (m − s)! − s ! m−|n| −s ! s! m+|n| 2 2 (−1)sF (m, n, s, r) s=0 where F (m, n, s, r) = s! |n| ≤ m • The Zernike moments are projections of the input image onto a space spanned by the orthogonal V functions Vmn(x, y) = Rmnejnθ √ where j = −1, and X Rmn(r) = m+|n| 2 (m − s)! r(m−2s) − s ! m−|n| −s ! 2 • The first radial polynomials are R00 = 1 , R11 = r R20 = 2r2 − 1 , R22 = r2 R31 = 3r3 − 2r , R33 = r3 • So for a discrete image, if P(x,y) is the current pixel, m +1 XX P (x, y) [Vmn(x, y)]∗ , x2 + y 2 ≤ 1 Amn = π x y INF 5300, 2005, Lecture 10, page 33 of 42 INF 5300, 2005, Lecture 10, page 34 of 42 Relating Zernike and Cartesian Orthogonal radial polynomial II • To calculate the Zernike moments, the image (or region of interest) is first mapped to the unit disc using polar coordinates, where the centre of the image is the origin of the unit disc. Those pixels falling outside the unit disc are not used in the calculation. • When mapping from Cartesian to polar coordinates care must be taken as to which quadrant the Cartesian coordinates appear in. • Translation invariance is achieved by moving the origin to the image’s COM. • Scale invariance is produced by altering each object so that its area (or pixel count for a binary image) is m00 = β, where β is a predetermined value. • For a binary image, both invariance properties can be achieved using r x y β + x̄, + ȳ , a = h(x, y) = f a a m00 As a result, |A11 | is set to zero and |A00| = βπ . This will be affected by the discrete implementation where the error in the mapping will decrease as the object size (or the pixel-resolution) increases. • Zernike polynomials are given by Vmn (r, θ) = Rmn (r)ejnθ and the radial polynomials are defined by P(m−|n|)/2 Rmn (r) = s=0 (−1)sF (m, n, s, r) • Re-arranging F (m, n, s, r) gives Fmn(r) = s! = s! m+|n| 2 (m − s)! r(m−2s) − s ! m−|n| −s ! 2 (m − s)! r(m−2s) ! m−2s−|n| ! 2 m−2s+|n| 2 • Substituting k = (m − 2s) and rearranging gives Rmn (r) = m X k=n Bmnk = Bmnk rk , (m − k)even, n ≥ 0 (−1)(m−k)/2 m+k ! 2 k+n k−n m−k ! 2 ! 2 ! 2 • This produces Zernike moment definitions (in continuous form): Z 2π Z 1 m m+1X Bmnk rk e−jnθ f (r, θ)rdrdθ; r ≤ 1 Amn = π 0 0 k=n which when translated to Cartesian coordinates is Z Z m m+1X Amn = Bmnk (x−jy)n(x2+y 2 )(k−n)/2f (x, y)dxdy; x2+y 2 ≤ 1 π x y k=n • The double integral can now be expressed in terms of a series of summed Cartesian moments, e.g. Z Z 0 1 1X B00k (x − jy)0(x2 + y 2 )(k)/2 f (x, y)dxdy = m00 Z00 = π π x y k=0 INF 5300, 2005, Lecture 10, page 35 of 42 INF 5300, 2005, Lecture 10, page 36 of 42 Moment noise sensitivity • Various invariant moment schemes have proved useful in recognition and reconstruction tests for images containing very little or no noise. • In the presence of noise, the Hu invariant moments begin to degrade. • Higher order moments are more vulnerable to white noise, thus making their use undesirable for pattern recognition. • When using Hu’s moments, even a slight discrepancy in the image (i.e. minor shape deformation or digitisation errors) can cause considerable confusion if trying to discriminate between two similar images. • However, noise simulation is very involved, and is highly dependent on the type of noise being simulated, its distribution, how it is applied etc. Care must be taken when generalising to alternative noise-related conditions. INF 5300, 2005, Lecture 10, page 37 of 42 Contour invariants Image reconstruction • The image within the unit circle may be reconstructed to an arbitrary precision by f (x, y) = lim N →∞ N X X AnmVnm(x, y) n=0 m where the second sum is taken over all |m| ≤ n, such that n − |m| is even. • The contribution of the Zernike moment of order m to the reconstruction is X AmnVmn(r, θ)| |Im(x, y)| = | n where x2 + y 2 ≤ 1, |n| ≤ m and m − |n| is even. • Gibbs phenomena may appear in the reconstructed object. This is caused by the inability of a continuous function to recreate a step function - no matter how many finite high order terms are used, an overshoot of the function will occur. Outside of the original area of a binary object, “ripples” of overshoot of the continuous function may be visible. INF 5300, 2005, Lecture 10, page 38 of 42 Contour description • Moments can be used to characterize shapes even if the region is represented by its boundary. • Suppose we have an object S and that we are able to find the length of its contour. • Assume a closed boundary as an ordered sequence z(i) of Euclidean distance between the centroid and all N boundary pixels. • We partition the contour into M segments of equal length, and thereby find M equidistant points along the contour of S. • No extra processing for spiral or concave contours. • Translation, rotation, and scale invariant one-dimensional normalized contour sequence moments can be estimated as N 1 X mr = [z(i)]r , N i=1 N 1 X µr = [z(i) − m1]r N i=1 • Normalized ordinary and central contour sequence moments: P r 1N N i=1 [z(i)] r m̄r = (µm r/2 = h ir/2 ) 2 PN 2 1 i=1 [z(i) − m1 ] N PN 1 [z(i) − m1 ]r µ̄r = (µµ2 )rr/2 = h N i=1 ir/2 PN 2 1 i=1 [z(i) − m1 ] N INF 5300, 2005, Lecture 10, page 39 of 42 • The coordinates (x, y) of these M points are then put into a complex vector f f (k) = x(k) + iy(k), k ∈ [0, M − 1] • We view the x-axis as the real axis and the y-axis as the imaginary one for a sequence of complex numbers (Granlund 1972). • The description of the object contour is changed, but all the information is preserved. • And we have transformed the contour problem from 2D to 1D. INF 5300, 2005, Lecture 10, page 40 of 42 Fourier-coefficients • We perform a forward Fourier transform M −1 1 X −2πiuk F (u) = f (x) exp M M k=0 for u ∈ [0, M − 1]. • F (0) now contains the center of mass of the object, and the coefficients F (1), F (2), F (3), ..., F (M − 1) will describe the object in increasing detail. • These features depend on rotation, scaling and starting point on the contour. • We do not want to use all coefficients as features, but terminate at F (N ), N < M . • This corresponds to setting F (k) = 0, k > N − 1 INF 5300, 2005, Lecture 10, page 41 of 42 Approximation • When transforming back, we get an approximation to the original contour N −1 X 2πiuk fˆ(k) = F (u) exp M u=0 defined for k ∈ [0, M − 1]. • We have only used N features to reconstruct each component of fˆ(k), but k still runs from 0 to M − 1. • The number of points in the approximation is the same (M ), but the number of coefficients (features) used to reconstruct each point is smaller (N < M ). • The first 10 – 15 descriptorsare found to be sufficient for character description. • The Fourier descriptors can be invariant to translation and rotation if the co-ordinate system is appropriately chosen. INF 5300, 2005, Lecture 10, page 42 of 42