INF 386, V-2003 Selected Themes from Digital Image Analysis
Lecture 3, 05.03.2003
Statistical Texture Analysis
Static or Adaptive?
Fritz Albregtsen
Department of Informatics, University of Oslo
INF 386, 2003, Lecture 3, page 1 of 32

What is texture?
• Visual textures are spatially extended visual patterns of more or less accurate repetitions of some basic texture elements, called texels.
• Each texel usually contains several pixels. Its characteristics and placement can be periodic, quasi-periodic or random. Thus, textures may have statistical or structural properties, or both.
• Texture features characterize the statistical or structural relationship between pixels (or texels), and provide measures of properties such as contrast, smoothness, coarseness, randomness, regularity, linearity, directionality, periodicity, and structural complexity.
• Morphometric features measure the size and shape of objects, independent of the gray level values of the pixels within the object.
• Densitometric features measure the distribution of gray levels or optical density within an object, but not the positions of the pixels.

Categories of Methods
A large number of texture analysis methods have been developed for automated analysis of visual texture. We can broadly divide them into three groups:
• Statistical methods are often based on accumulating second or higher order statistics (matrices), and on feature vectors that describe these probability distributions directly; they therefore describe the image texture only indirectly.
• Structural methods are based on the assumption that textures are composed of texels which are regular and repetitive. Both the texels and their placement rules have to be described.
• Structural-statistical methods characterize each texel by a feature vector and describe the probability distribution of these features statistically.

Surveys and Reviews
• R.M. Haralick, "Statistical and structural approaches to texture", Proc. IEEE, 67, 786-804, 1979.
• J.M.H. du Buf et al., "Texture feature performance for image segmentation", Pattern Recognition, 23, 291-309, 1990.
• P.P. Ohanian and R.C. Dubes, "Performance evaluation for four classes of textural features", Pattern Recognition, 25, 819-833, 1992.
• T.R. Reed and J.M.H. du Buf, "A review of recent texture segmentation and feature extraction techniques", CVGIP: Image Understanding, 57, 359-372, 1993.
• M. Tuceryan and A.K. Jain, "Texture analysis", in C.H. Chen et al. (eds.), "Handbook of Pattern Recognition and Computer Vision", World Scientific Publ., 235-276, 1993.
• T. Randen and J.H. Husøy, "Filtering for texture classification: A comparative study", IEEE Trans. PAMI, 21, 291-310, 1999.

Gray Level Cooccurrence Matrices
• The matrix element P(m, n) gives the second order statistical probability of going from gray level m to gray level n when moving a distance d in direction θ within the image (or a sub-image).
• Given an M × N image having G gray levels, let f(m, n) be the pixel value at (m, n).
Then we have

  P(i, j | Δx, Δy) = W · Q(i, j | Δx, Δy)

where

  W = 1 / ((M − Δx)(N − Δy))

  Q(i, j | Δx, Δy) = Σ_{n=1}^{N−Δy} Σ_{m=1}^{M−Δx} A

  A = 1 if f(m, n) = i and f(m + Δx, n + Δy) = j, 0 elsewhere.

• Alternative notation, given distance and direction (d, θ): P(i, j | d, θ).

GLCM Features
• Angular Second Moment (ASM):
  ASM = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {P(i, j)}²
• Entropy:
  ENTROPY = − Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} P(i, j) × log(P(i, j))
• Correlation:
  CORR = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} (i − μ_x)(j − μ_y) P(i, j) / (σ_x × σ_y)
• Contrast:
  CONTRAST = Σ_{n=0}^{G−1} n² { Σ_{i=1}^{G} Σ_{j=1}^{G} P(i, j) }, |i − j| = n
• Inverse Difference Moment (IDM):
  IDM = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} P(i, j) / (1 + (i − j)²)
• Sum of Squares, Variance:
  VARIANCE = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} (i − μ)² P(i, j)

GLCM Features
• Sum Average:
  AVER = Σ_{i=0}^{2G−2} i P_{x+y}(i)
• Sum Entropy:
  SENT = − Σ_{i=0}^{2G−2} P_{x+y}(i) log(P_{x+y}(i))
• Difference Entropy:
  DENT = − Σ_{j=0}^{G−1} P_{x−y}(j) log(P_{x−y}(j))
• Cluster Shade:
  SHADE = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {i + j − μ_x − μ_y}³ × P(i, j)
• Cluster Prominence:
  PROM = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {i + j − μ_x − μ_y}⁴ × P(i, j)
• Inertia:
  INERTIA = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {i − j}² × P(i, j)

Static GLCM features
• Static GLCM features are weighted sums of the cooccurrence matrix element values.
• Two general categories:
  1: weighting based on the GLCM value,
  2: weighting based on the GLCM position.
• Examples of the first category:
  ASM = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {P(i, j)}²
  ENTROPY = − Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} P(i, j) × log(P(i, j))
• Examples of the second category:
  IDM = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} P(i, j) / (1 + (i − j)²)
  INERTIA = Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} {i − j}² × P(i, j)
• Note that the shape of the feature function will depend on G.

Sum and difference histogram
• Sum and difference define the principal components of a second order probability density function.
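A minimal numpy sketch of the GLCM accumulation and a few of the static features defined above (function names are my own; the displacement is given as row/column offsets):

```python
import numpy as np

def glcm(f, dx, dy, G):
    """Accumulate Q(i,j | dx,dy) over all valid pixel pairs and
    normalize by W = 1/((M-dx)(N-dy)), as in the slide definition."""
    M, N = f.shape
    Q = np.zeros((G, G))
    for m in range(M - dx):
        for n in range(N - dy):
            Q[f[m, n], f[m + dx, n + dy]] += 1
    return Q / ((M - dx) * (N - dy))

def asm(P):
    # Angular Second Moment: sum of squared matrix elements
    return np.sum(P ** 2)

def entropy(P):
    # Entropy over the non-zero elements (0 log 0 taken as 0)
    nz = P[P > 0]
    return -np.sum(nz * np.log(nz))

def contrast(P):
    # Weight each element by the squared gray level difference (i-j)^2
    i, j = np.indices(P.shape)
    return np.sum((i - j) ** 2 * P)

def idm(P):
    # Inverse Difference Moment: weight 1/(1+(i-j)^2)
    i, j = np.indices(P.shape)
    return np.sum(P / (1.0 + (i - j) ** 2))
```

For a binary checkerboard and a horizontal displacement, every pixel pair alternates, so all probability mass lies on P(0,1) and P(1,0).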
• Define normalized sum and difference histograms for a domain D:

  P_s(i | Δx, Δy) = W · Card{(m, n) ∈ D, s_{Δx,Δy}(m, n) = i}
  P_d(j | Δx, Δy) = W · Card{(m, n) ∈ D, d_{Δx,Δy}(m, n) = j}

  where

  W = 1 / ((M − Δx)(N − Δy))
  s_{Δx,Δy}(m, n) = f(m, n) + f(m + Δx, n + Δy)
  d_{Δx,Δy}(m, n) = f(m, n) − f(m + Δx, n + Δy)

• There are 2G − 1 possible values in each histogram.
• The most frequently used features from the GLCM can be found exactly from P_s and P_d. For example, contrast from P_d,

  CON = Σ_{j=0}^{2G−2} j² P_d(j | Δx, Δy),

  equals contrast from the GLCM,

  CON = Σ_{n=0}^{G−1} n² { Σ_{i=1}^{G} Σ_{j=1}^{G} P(i, j) }, |i − j| = n.

Gray Level Run Length
• A "run" is a set of consecutive (8-neighbor), colinear pixels having the same gray level value.
• "Run length" = the number of pixels in a run.
• "Run length value" = the number of runs in an image.
• Each element p(i, j | θ) of a GLRLM gives the number of runs of gray level i and length j in a given direction θ.
• Let P(i, j | θ) be the elements of the normalized GLRLM, i.e.

  P(i, j | θ) = p(i, j | θ) / S,  S = Σ_{i=1}^{G} Σ_{j=1}^{R} p(i, j | θ)

  where S is the total number of runs in the image.
• The number of gray levels must be low.
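The sum and difference histograms above can be accumulated directly from two shifted views of the image; a sketch (the index shift of the difference histogram by G−1, so that all bins are non-negative, is my own storage choice):

```python
import numpy as np

def sum_diff_histograms(f, dx, dy, G):
    """Normalized sum and difference histograms for displacement (dx, dy).
    Sums lie in [0, 2G-2]; differences lie in [-(G-1), G-1] and are
    stored shifted by G-1 so bincount can be used."""
    a = f[:f.shape[0] - dx, :f.shape[1] - dy]
    b = f[dx:, dy:]
    s = (a + b).ravel()
    d = (a - b).ravel()
    W = 1.0 / s.size                       # 1/((M-dx)(N-dy))
    Ps = np.bincount(s, minlength=2 * G - 1) * W
    Pd = np.bincount(d + G - 1, minlength=2 * G - 1) * W
    return Ps, Pd
```

With this shifted storage, GLCM contrast is the second moment of the difference histogram about the center bin, which can be checked against the pixel-pair definition.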
• Let S be the total number of runs in the image:

  S = Σ_{i=1}^{G} Σ_{j=1}^{R} p(i, j | θ) = Σ_{i=1}^{G} g(i | θ) = Σ_{j=1}^{R} r(j | θ)

• A small example, G = 4, R = 4:

  p(i, j | θ):       run length, j
  gray level, i    1   2   3   4   g(i | θ)
        1          4   0   0   0      4
        2          1   0   1   0      2
        3          3   0   0   0      3
        4          3   1   0   0      4
  r(j | θ)        11   1   1   0   S = 13

Simplification GLRLM
• The expressions for the static GLRLM features:

  SRE = (1/S) Σ_{i=1}^{G} Σ_{j=1}^{R} p(i, j | θ) / j² = (1/S) Σ_{j=1}^{R} r(j | θ) / j²

  LRE = (1/S) Σ_{i=1}^{G} Σ_{j=1}^{R} j² p(i, j | θ) = (1/S) Σ_{j=1}^{R} j² r(j | θ)

  GLN = (1/S) Σ_{i=1}^{G} [ Σ_{j=1}^{R} p(i, j | θ) ]² = (1/S) Σ_{i=1}^{G} [g(i | θ)]²

  RLN = (1/S) Σ_{j=1}^{R} [ Σ_{i=1}^{G} p(i, j | θ) ]² = (1/S) Σ_{j=1}^{R} [r(j | θ)]²

  RP = (1/n) Σ_{i=1}^{G} Σ_{j=1}^{R} p(i, j | θ) = (1/n) Σ_{j=1}^{R} r(j | θ), where n is the number of pixels

  LGRE = (1/S) Σ_{i=1}^{G} Σ_{j=1}^{R} p(i, j | θ) / i² = (1/S) Σ_{i=1}^{G} g(i | θ) / i²

  HGRE = (1/S) Σ_{i=1}^{G} Σ_{j=1}^{R} i² p(i, j | θ) = (1/S) Σ_{i=1}^{G} i² g(i | θ)

• Note that all features may be computed without actually accumulating a 2D GLRL matrix.
• The feature weights contain run length or gray level, and never both of them at the same time.

GLCM and GLRLM
• GLCM: probability of pixel gray level pairs having a given spatial and intensity relation.
  - expresses pixel pair contrast, not texel size;
  - several matrices are needed to estimate texel properties (e.g. size, quasi-periodicity, orientation).
• GLRLM: probability of several connected, colinear pixels so close in gray level that they form "gray level runs".
  - does not capture the true shape aspects of the texels,
  - but comes much closer to doing this than the GLCM;
  - discards information on contrast between gray levels.

Generalized Cooccurrence Matrices
• Davis et al. (1979) introduced generalized cooccurrence matrices (GCM).
• The GCM was based on local maxima of the gradient image of the texture.
• Cooccurrence of gradient magnitude and direction, using spatial constraint predicates instead of specific geometric distances.
• Could be "cooccurrence of anything".
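The GLRLM marginals r(j | θ) and g(i | θ) and the features built from them can be sketched for horizontal runs (θ = 0); function names are my own:

```python
import numpy as np
from itertools import groupby

def run_marginals(f, G, R):
    """g[i] = number of runs with gray level i; r[j] = number of runs of
    length j (index = run length, r[0] unused), for horizontal runs.
    Runs longer than R are clamped to length R."""
    g = np.zeros(G)
    r = np.zeros(R + 1)
    for row in f:
        for level, run in groupby(row):
            length = min(len(list(run)), R)
            g[level] += 1
            r[length] += 1
    return g, r

def glrlm_features(g, r, n_pixels):
    """SRE, LRE, GLN, RLN and RP computed from the marginals alone,
    without accumulating the full 2D matrix."""
    S = r.sum()
    j = np.arange(len(r))
    sre = np.sum(r[1:] / j[1:] ** 2) / S     # short run emphasis
    lre = np.sum(j ** 2 * r) / S             # long run emphasis
    gln = np.sum(g ** 2) / S                 # gray level non-uniformity
    rln = np.sum(r ** 2) / S                 # run length non-uniformity
    rp = S / n_pixels                        # run percentage
    return sre, lre, gln, rln, rp
```

For the image [[0,0,1,1],[2,2,2,2]] there are three runs: two of length 2 and one of length 4, so S = 3 and LRE = (2·4 + 1·16)/3 = 8.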
Cooccurrence of Gray Level Runs
• 1D histograms → probability distribution of single pixel intensity.
• 2D GLCMs → probability distribution of the intensity of pixel pairs.
• 2D GLRLMs → probability distribution of the intensity of runs of pixels.
• 4D CGLRLMs → probability of neighboring pairs of runs of pixels.
• An increasing amount of information is involved
  ⇒ a better description of image texture,
  ⇒ more matrix bins to populate.

From 2D GLCM to 1D Histograms
• The sum and difference define the principal axes of the second order GLCM probability distribution function.
• Replace the 2D GLCM by 1D sum and difference histograms.
• The usual (Haralick) texture features (or some close approximations) associated with the 2D GLCM can be computed directly from the sum and difference histograms.
• This is widely used, mostly for computational reasons.

From 4D to 2D Matrices
• Two independent runs of (gray level, run length) = (i, j) and (k, l) may be viewed as two random variables with the same variance.
• A 4D CGLRLM probability matrix P(i, j, k, l) may be replaced by
  - one 2D sum run length matrix, P_s(ξ, ψ),
  - one 2D difference run length matrix, P_d(γ, δ).
• So for all neighboring runs in a given image:
  - compute the sum ξ and difference γ of the two gray levels, and the sum ψ and difference δ of the two run lengths,
  - accumulate the entries in the two 2D matrices,
  - finally, normalize the sum and difference run length matrices.
• This gives a complexity reduction of GR/8 (G = gray levels, R = maximum run length).
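The 4D-to-2D reduction can be sketched for a simplified case: pair each horizontal run with its successor on the same scan line (this pairing rule and the index shifts that keep all bins non-negative are my own assumptions; the slides do not fix the neighbor definition):

```python
import numpy as np
from itertools import groupby

def run_sum_diff_matrices(f, G, R):
    """Accumulate normalized 2D sum (Ps) and difference (Pd) run length
    matrices from pairs of neighboring horizontal runs. Gray level sums
    occupy rows 0..2G-2; length sums psi = j+l are stored at psi-2 since
    lengths start at 1; differences are shifted by G-1 and R-1."""
    Ps = np.zeros((2 * G - 1, 2 * R - 1))
    Pd = np.zeros((2 * G - 1, 2 * R - 1))
    for row in f:
        runs = [(lvl, min(len(list(grp)), R)) for lvl, grp in groupby(row)]
        for (i, j), (k, l) in zip(runs, runs[1:]):
            Ps[i + k, j + l - 2] += 1                  # (xi, psi)
            Pd[i - k + G - 1, j - l + R - 1] += 1      # (gamma, delta)
    total = Ps.sum()
    if total:
        Ps /= total
        Pd /= total
    return Ps, Pd
```

The storage cost is two (2G−1) × (2R−1) matrices instead of one G² R² table, which is where the complexity reduction comes from.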
Ad hoc Features
• As shown earlier, GLCM feature extraction is usually performed by computing a number of non-adaptive, pre-defined (ad hoc) weighted sums of matrix elements,

  F_k = Σ_{i,j} P(i, j) W_k(i, j),

  - either based on the value of each matrix element,
  - or based on the position of the element within the matrix;
  - the shape of the weight function depends on the quantization.
• The GLRLM feature weights only contain run length or gray level, never both of them at the same time.
• Complex matrix structures are not captured in any single feature.
• The features do not adapt to problem-specific matrix structures.

Adaptive Features - I
• Assume that the n-th image of class ωc gives a 2D matrix P_n(i, j | ωc).
• Calculate the average matrix over all N(ωc) images in each class ωc,

  P̄(i, j | ωc) = (1/N(ωc)) Σ_{n=1}^{N(ωc)} P_n(i, j),

• the class variance matrix,

  σ_P²(i, j | ωc) = (1/N(ωc)) Σ_{n=1}^{N(ωc)} (P_n(i, j) − P̄(i, j | ωc))²,

• the class difference matrix,

  ΔP(i, j | ω1, ω2) = P̄(i, j | ω1) − P̄(i, j | ω2),

• and finally the Mahalanobis class distance matrix,

  J_P(i, j | ω1, ω2) = 2 (P̄(i, j | ω1) − P̄(i, j | ω2))² / (σ_P²(i, j | ω1) + σ_P²(i, j | ω2)).

Adaptive Features - II
• We use the squared Mahalanobis class distance as weights.
• We use the disjoint positive/negative parts of the class difference matrices as the domains of the weighted summation.
• An image having a matrix P_k(i, j) then gives two adaptive features:

  F+ = Σ_{ΔP(i,j|ω1,ω2) ≥ 0} P_k(i, j) [J_P(i, j | ω1, ω2)]²

  F− = Σ_{ΔP(i,j|ω1,ω2) < 0} P_k(i, j) [J_P(i, j | ω1, ω2)]²

• This puts the highest weight on the most discriminatory parts of the matrices!
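The adaptive feature construction above can be sketched as follows (the small `eps` term, which guards against zero variance in bins no training image populates, is my own addition):

```python
import numpy as np

def adaptive_features(train1, train2, Pk, eps=1e-12):
    """Adaptive matrix features: train1/train2 are stacks of normalized
    matrices, one per training image of class omega_1/omega_2; Pk is the
    matrix of the image to be described."""
    m1, m2 = train1.mean(axis=0), train2.mean(axis=0)
    v1, v2 = train1.var(axis=0), train2.var(axis=0)
    dP = m1 - m2                                   # class difference matrix
    J = 2.0 * dP ** 2 / (v1 + v2 + eps)            # Mahalanobis class distance matrix
    Fp = np.sum(Pk[dP >= 0] * J[dP >= 0] ** 2)     # F+ over the positive partition
    Fm = np.sum(Pk[dP < 0] * J[dP < 0] ** 2)       # F- over the negative partition
    return Fp, Fm
```

For an image resembling class 1, most of its probability mass falls in the partition where the class difference is positive, so F+ dominates F−.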
Difference and Distance Matrices
• Here, two class difference matrices Δ have to be used,
  - one from the two sum run length matrices (class 1 and 2),
  - one from the two difference run length matrices (class 1 and 2):

  Δ_s(ξ, ψ | ω1, ω2) = P̄_s(ξ, ψ | ω1) − P̄_s(ξ, ψ | ω2)
  Δ_d(γ, δ | ω1, ω2) = P̄_d(γ, δ | ω1) − P̄_d(γ, δ | ω2)

  where P̄_k(·, · | ωn) is the average normalized sum (k = s) or difference (k = d) run length matrix for class ωn, n = 1, 2.
• Two Mahalanobis class distance matrices must be used,
  - J_Ps(ξ, ψ | ω1, ω2) for the sum run length matrices,
  - J_Pd(γ, δ | ω1, ω2) for the difference run length matrices.

The Four CGLRLM Features
• The four features from the sum and difference run length matrices for an image from class ωn, where n ∈ {1, 2}, are then given by

  Fs+ = Σ_{Δs(ξ,ψ|ω1,ω2) ≥ 0} P_s(ξ, ψ) [J_Ps(ξ, ψ | ω1, ω2)]²
  Fs− = Σ_{Δs(ξ,ψ|ω1,ω2) < 0} P_s(ξ, ψ) [J_Ps(ξ, ψ | ω1, ω2)]²
  Fd+ = Σ_{Δd(γ,δ|ω1,ω2) ≥ 0} P_d(γ, δ) [J_Pd(γ, δ | ω1, ω2)]²
  Fd− = Σ_{Δd(γ,δ|ω1,ω2) < 0} P_d(γ, δ) [J_Pd(γ, δ | ω1, ω2)]²

• Note that the weighted summations are performed over the two disjoint (+/−) partitions of each class difference matrix.

Brodatz Textures
(Image slides: the selected Brodatz textures.)

Results - Brodatz Textures
• From the 112 Brodatz textures we have selected the 10 most relevant, i.e. stochastic, isotropic, homogeneous and fine-grained textures.
• Each texture image was partitioned into 48 non-overlapping 75 × 75 pixel sub-images.
• Each sub-image was normalized to the same mean value and standard deviation (μ = 127.5 and σ = 50.0).
• The 48 sub-images were divided randomly but equally into a training set and a test set.
• Given 10 textures, we get 45 texture pairs.

Best single features:

  Method                     JB     ERR, training   ERR, test
  best GLCM average (CON)    2.89        9.6            9.4
  average of best GLCM       3.08        6.5            5.7
  best GLRLM average (RLN)   1.92       10.2           10.8
  average of best GLRLM      2.00        8.3            7.8
  adaptive CGLRLM            5.00        3.7            2.5

Best combinations of two features:

  Method                     JB     ERR, training   ERR, test
  average of best GLCM       5.5         4.9            3.7
  average of best GLRLM      4.4         6.2            4.9
  adaptive CGLRLM            6.6         3.2            1.9

TEM images of mouse liver cell nuclei
(Figure: examples of liver cell nuclei from normal (top) and noduli (bottom) samples. The borders between the 30% peripheral and 70% central parts are outlined as a thin white line.)

Liver cell results

  Method            Features   JB     Error (%)
  classical GLRLM   SRE        1.50      10
  classical GLRLM   LRE        1.21      10
  classical GLRLM   RLN        1.45      10
  classical GLRLM   RP         1.06      10
  adaptive CGLRLM   Fs+        0.87      10
  adaptive CGLRLM   Fs−        1.30       5
  adaptive CGLRLM   Fd+        1.35       5
  adaptive CGLRLM   Fd−        1.33      10

• The pairwise error probability ε_{1,2} is bounded by the Bhattacharyya distance J_B(ω1, ω2):

  (1/2) (1 − √(1 − 4 P(ω1) P(ω2) e^{−2 J_B(ω1,ω2)})) ≤ ε_{1,2} ≤ √(P(ω1) P(ω2)) e^{−J_B(ω1,ω2)}

(Figure: histogram, horizontal axis 0.8 to 1.6, vertical axis 0 to 25.)

Mouse liver cell nuclei in TEM
→ Class differences and distances are very localized!
→ Adaptive features give high class distances!
(Figures: Class Difference Matrix and Class Distance Matrix, normal vs. premalignant, run length difference matrices.)

Ovarian Cancer
(Figure: eight monolayer cell nuclei from a good prognosis sample (upper) and eight nuclei from a bad prognosis sample (lower).)

FIGO stage I Ovarian Cancer: peripheral and central segments of the cell nuclei
(Figures: class difference matrices from the center (left) and the periphery (right).)
• The positive/negative parts of the class difference are found in different locations in the matrices from the center (left) and the periphery (right).
• These subtle texture differences are very hard to see from the gray level images themselves.
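The Bhattacharyya error bounds quoted with the liver cell results can be evaluated numerically; a small sketch for given priors (function name is my own):

```python
import math

def bhattacharyya_error_bounds(JB, p1=0.5, p2=0.5):
    """Lower and upper bounds on the pairwise error probability
    for Bhattacharyya distance JB and class priors p1, p2."""
    lower = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * p1 * p2 * math.exp(-2.0 * JB)))
    upper = math.sqrt(p1 * p2) * math.exp(-JB)
    return lower, upper
```

For equal priors and J_B = 0 (indistinguishable classes) both bounds collapse to 0.5, i.e. chance level; as J_B grows, the interval shrinks toward zero error.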
FIGO stage I Ovarian Cancer: chromatin structure size and contrast differences
• A gray level difference of one between neighboring runs is less probable in cell nuclei from good prognosis samples than in bad prognosis samples.
• Larger gray level differences are more probable in good prognosis samples.
• There is a subvisual difference in texel size and contrast between the classes.

Gray Level Entropy Matrices
• The GLEM is a way of extracting higher order texture information.
• The GLEM element P(i, H | w) gives an estimate of the probability of finding a first order (histogram) entropy H in a window of size w × w centered on a pixel having gray level value i.
• The entropy value H is defined from the normalized gray level histogram p(g) within the window by

  H = − Σ_{g=1}^{G} p(g) log[p(g)],  p(g) ≠ 0   (1)

• The GLEM may be computed for a variety of window sizes w, and it is natural to presume that the probability distribution within the matrix will vary as w is altered. It is therefore important that P(i, H | w) is estimated only for locations where the whole window is inside the image or the image segment. Otherwise, the results from different window sizes would be mixed.

Complexity Graylevel Matrix
• The basic concept of the CGM is that the texture information is extracted from a local neighborhood of w × w pixels, w = 2k + 1, k ≥ 1.
• The local texture information is represented by the complexity value, which is the number of black-to-white transitions within the neighborhood when the center pixel value is used as a threshold.
• The complexity value c will vary from 0 for a homogeneous binary neighborhood to c = 2w(w − 1) = 8k² + 4k for the most complex binary neighborhood (12 for a checkered 3 × 3 pattern).
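The GLEM construction described above can be sketched as follows (the discretization of H into `n_bins` equal bins of [0, log G] is my own choice; the slides leave the binning of the entropy axis open):

```python
import numpy as np

def glem(f, w, G, n_bins=32):
    """Gray Level Entropy Matrix: for every w x w window fully inside the
    image (so that window sizes are never mixed), compute the first order
    entropy H of the window histogram and accumulate
    (center gray level, binned H). Assumes G >= 2."""
    k = w // 2
    M, N = f.shape
    Hmax = np.log(G)                     # entropy of a uniform histogram
    counts = np.zeros((G, n_bins))
    for m in range(k, M - k):
        for n in range(k, N - k):
            win = f[m - k:m + k + 1, n - k:n + k + 1]
            p = np.bincount(win.ravel(), minlength=G) / (w * w)
            nz = p[p > 0]
            H = -np.sum(nz * np.log(nz))
            b = min(int(H / Hmax * n_bins), n_bins - 1)
            counts[f[m, n], b] += 1
    return counts / counts.sum()
```

A constant image puts all mass in the zero-entropy bin of the row belonging to its single gray level.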
• Computing the local complexity value over the whole gray level image, we accumulate a 2D histogram N(i, j), giving the number of windows having center pixel gray level value i and complexity value j. Normalizing the 2D histogram N(i, j) we get the CGM:

  CGM(i, j) = N(i, j) / ((N_x − 2k)(N_y − 2k)),  i ∈ [0, 1, 2, ..., G − 1], j ∈ [0, 1, 2, ..., 2w(w − 1)]

  where the N_x × N_y image has G gray levels, and a sliding w × w window has been used.
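The CGM accumulation can be sketched as below, counting transitions between horizontally and vertically adjacent pixels (which gives exactly the maximum 2w(w−1)); thresholding with strict `>` at the center value is my own reading of the slides:

```python
import numpy as np

def complexity(win):
    """Complexity value of a w x w window: number of binary transitions
    between 4-adjacent pixels after thresholding at the center value."""
    w = win.shape[0]
    b = (win > win[w // 2, w // 2]).astype(int)    # threshold at center pixel
    horiz = np.sum(b[:, 1:] != b[:, :-1])          # w(w-1) horizontal pairs
    vert = np.sum(b[1:, :] != b[:-1, :])           # w(w-1) vertical pairs
    return horiz + vert

def cgm(f, w, G):
    """Accumulate and normalize the Complexity Graylevel Matrix over all
    w x w windows fully inside the Nx x Ny image."""
    k = w // 2
    M, N = f.shape
    mat = np.zeros((G, 2 * w * (w - 1) + 1))
    for m in range(k, M - k):
        for n in range(k, N - k):
            win = f[m - k:m + k + 1, n - k:n + k + 1]
            mat[f[m, n], complexity(win)] += 1
    return mat / ((M - 2 * k) * (N - 2 * k))
```

A checkered 3 × 3 window reaches the slide's stated maximum of 12, and a homogeneous window has complexity 0.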