INF 5300, V-2004: Selected Themes from Digital Image Analysis
Lecture 1, Monday 16.02.2004
Segmentation by Thresholding
Fritz Albregtsen
Department of Informatics, University of Oslo
INF 5300, 2004, Lecture 1, page 1 of 21

Segmentation
• Segmentation is one of the most important components of a complete image analysis system.
• Segmentation creates regions and objects in images.
• There are two categories of methods, based on two different principles, namely similarity and discontinuity.
• In region-based segmentation we find those pixels that are similar. Thresholding splits the image histogram; pixels belonging to the same class get the same label. These pixels are not necessarily neighbours.
• In edge-based segmentation we find the basic elements of edges, i.e. edge pixels (or even line pixels, corner pixels etc.), based on local estimates of the gradient. In the next steps we thin broad edges and join edge fragments together into edge chains. Then we have a (partial) region border.

Introduction to Thresholding
• Automatic thresholding is important in applications where speed or the physical conditions prevent human interaction.
• In bi-level thresholding, the histogram of the image is usually assumed to have one valley between two peaks, the peaks representing objects and background, respectively.
• Thresholding is usually a pre-process for various pattern recognition techniques.
• Thresholding may also be a pre-process for adaptive filtering, adaptive compression etc.

Parametric versus Non-parametric
• Parametric techniques:
  — Estimate the parameters of two distributions from the given histogram.
  — It may be difficult or impossible to establish a reliable model.
• Non-parametric techniques:
  — Separate the two gray level classes in an optimum manner according to some criterion:
    ∗ between-class variance
    ∗ divergence
    ∗ entropy
    ∗ preservation of moments
  — Non-parametric methods are more robust, and usually faster.
Automatic versus Interactive
• Automatic means that the user does not have to specify any parameters.
• There are no truly automatic methods; there are always built-in parameters.
• We distinguish between automatic methods and interactive methods.
• We also distinguish between supervised methods (with training) and unsupervised methods (clustering).

Global and Non-contextual?
• Global methods use a single threshold for the entire image.
• Local methods optimize a new threshold for each of a number of blocks or sub-images.
• Global methods put severe restrictions on
  — the gray level characteristics of objects and background
  — the uniformity of lighting and detection.
• The fundamental framework of the global methods is also applicable to local sub-images.
• Non-contextual methods rely only on the gray level histogram of the image.
• Contextual methods make use of the geometrical relations between pixels.

Bi-level thresholding
• The histogram is assumed to be twin-peaked. Let P1 and P2 be the a priori probabilities of background and foreground (P1 + P2 = 1), with the two distributions given by b(z) and f(z). The complete histogram is then
    p(z) = P1 · b(z) + P2 · f(z)
• The probabilities of mis-classifying a pixel, given a threshold t, are
    E1(t) = ∫_t^∞ b(z) dz,   E2(t) = ∫_{−∞}^t f(z) dz
• The total error is
    E(t) = P1 · ∫_t^∞ b(z) dz + P2 · ∫_{−∞}^t f(z) dz
• Differentiating with respect to the threshold t gives
    ∂E/∂t = 0  ⇒  P1 · b(T) = P2 · f(T)
• For Gaussian distributions this becomes
    (P1 / (√(2π) σ1)) · exp(−(T − µ1)² / 2σ1²) = (P2 / (√(2π) σ2)) · exp(−(T − µ2)² / 2σ2²)
• We get a quadratic equation:
    (σ1² − σ2²) · T² + 2(µ1σ2² − µ2σ1²) · T + σ1²µ2² − σ2²µ1² + 2σ1²σ2² ln(P1σ2 / P2σ1) = 0
• Two thresholds may be necessary!
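The quadratic in T can be solved directly when the Gaussian parameters are known (the parametric setting). A minimal sketch; the function name and parameter names are illustrative, and the equal-variance branch uses the simplified formula that the quadratic reduces to in that case:

```python
import numpy as np

def gaussian_thresholds(mu1, sigma1, P1, mu2, sigma2, P2):
    """Solve the quadratic in T arising from P1*b(T) = P2*f(T)
    for two Gaussian class distributions; may return two thresholds."""
    A = sigma1**2 - sigma2**2
    B = 2 * (mu1 * sigma2**2 - mu2 * sigma1**2)
    C = (sigma1**2 * mu2**2 - sigma2**2 * mu1**2
         + 2 * sigma1**2 * sigma2**2 * np.log((P1 * sigma2) / (P2 * sigma1)))
    if A == 0:
        # Equal variances: the quadratic degenerates to a single threshold
        # T = (mu1 + mu2)/2 + sigma^2/(mu1 - mu2) * ln(P2/P1).
        return [(mu1 + mu2) / 2 + sigma1**2 / (mu1 - mu2) * np.log(P2 / P1)]
    disc = np.sqrt(B**2 - 4 * A * C)
    return sorted([(-B - disc) / (2 * A), (-B + disc) / (2 * A)])
```

With unequal variances the solver returns two roots; typically only the one lying between the two means is a sensible threshold.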
Bi-level thresholding (continued)
• If the two variances are equal, σB² = σF² = σ², the quadratic reduces to a single threshold
    T = (µ1 + µ2)/2 + (σ² / (µ1 − µ2)) · ln(P2 / P1)
• If the a priori probabilities P1 and P2 are also equal,
    T = (µ1 + µ2)/2

The method of Ridler and Calvard
• Initial threshold value, t0, equal to the average brightness.
• Threshold value for the (k+1)-th iteration given by
    t_{k+1} = (µ1(t_k) + µ2(t_k)) / 2
            = (1/2) [ Σ_{z=0}^{t_k} z p(z) / Σ_{z=0}^{t_k} p(z) + Σ_{z=t_k+1}^{G−1} z p(z) / Σ_{z=t_k+1}^{G−1} p(z) ]
• µ1(t_k) is the mean of the gray values below the previous threshold t_k, and µ2(t_k) is the mean of the gray values above it.
• Note that µ1(t) and µ2(t) are a posteriori mean values, estimated from overlapping and truncated distributions. The a priori µ1 and µ2 are unknown to us.
• The correctness of the estimated threshold depends on the extent of the overlap, as well as on the correctness of the P1 ≈ P2 assumption.

The method of Otsu
• Maximizes the a posteriori between-class variance σB²(t), given by
    σB²(t) = P1(t) [µ1(t) − µ0]² + P2(t) [µ2(t) − µ0]²
• The sum of the within-class variance σW² and the between-class variance σB² is equal to the total variance σ0²:
    σW² + σB² = σ0²
• Maximizing σB² ⇔ minimizing σW².
• The expression for σB²(t) reduces to
    σB²(t) = P1(t) µ1²(t) + P2(t) µ2²(t) − µ0²
           = [µ0 P1(t) − P1(t) µ1(t)]² / ( P1(t) [1 − P1(t)] )
• The optimal threshold T is found by a sequential search for the maximum of σB²(t) over values of t where 0 < P1(t) < 1.
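The sequential search for the maximum of σB²(t) can be sketched in a few lines of NumPy. This is a minimal sketch assuming an 8-bit gray level image; the function name is illustrative, and P1(t)·µ1(t) is obtained directly as the cumulative first moment of the histogram:

```python
import numpy as np

def otsu_threshold(image, G=256):
    """Sequential search for the t maximizing the between-class variance
    sigma_B^2(t) = [mu0*P1(t) - P1(t)*mu1(t)]^2 / (P1(t)*(1 - P1(t)))."""
    hist = np.bincount(image.ravel(), minlength=G).astype(float)
    p = hist / hist.sum()            # normalized histogram p(z)
    z = np.arange(G)
    P1 = np.cumsum(p)                # a posteriori class probability P1(t)
    m = np.cumsum(z * p)             # cumulative first moment = P1(t)*mu1(t)
    mu0 = m[-1]                      # overall mean gray level
    valid = (P1 > 0) & (P1 < 1)      # search only where 0 < P1(t) < 1
    sigma_b2 = np.zeros(G)
    sigma_b2[valid] = (mu0 * P1[valid] - m[valid])**2 / (P1[valid] * (1 - P1[valid]))
    return int(np.argmax(sigma_b2))
```

On a clearly bimodal image the Ridler–Calvard iteration above converges to a nearby threshold at a fraction of the cost.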
The method of Reddi
• The method of Reddi et al. is based on the same assumptions as the method of Otsu, maximizing the a posteriori between-class variance σB²(t).
• We may write
    σB²(t) = P1(t) µ1²(t) + P2(t) µ2²(t) − µ0²
           = [Σ_{z=0}^{t} z p(z)]² / Σ_{z=0}^{t} p(z) + [Σ_{z=t+1}^{G−1} z p(z)]² / Σ_{z=t+1}^{G−1} p(z) − µ0²
• Differentiating σB²(t) and setting δσB²(t)/δt = 0, we find a solution for T:
    Σ_{z=0}^{T} z p(z) / Σ_{z=0}^{T} p(z) + Σ_{z=T+1}^{G−1} z p(z) / Σ_{z=T+1}^{G−1} p(z) = 2T
• This may be written as
    µ1(T) + µ2(T) = 2T
  where µ1 and µ2 are the mean values below and above the threshold.
• Exhaustive sequential search gives the same result as Otsu's method.
• Starting with the threshold t0 = µ0, fast convergence is obtained, equivalent to the ad hoc technique of Ridler and Calvard.

Maximizing inter-class variance for M thresholds
• The inter-class variance reaches a maximum when
    µ(0, t1) + µ(t1, t2) = 2t1
    µ(t1, t2) + µ(t2, t3) = 2t2
    ...
    µ(t_{M−1}, t_M) + µ(t_M, G) = 2t_M
  where µ(ti, tj) is the mean value between neighbouring thresholds ti and tj.
• Starting with an arbitrary set of initial thresholds t1, ..., t_M, we iteratively compute a new set of thresholds t1′, ..., t_M′ by
    t1′ = (1/2) (µ(0, t1) + µ(t1, t2))
    ...
    t_M′ = (1/2) (µ(t_{M−1}, t_M) + µ(t_M, G))
• The process is repeated until all thresholds are stable.
• This procedure has a very fast convergence.
• It gives the same numerical results as the exhaustive search technique of Otsu, but is orders of magnitude faster in multi-level thresholding!

A "minimum error" method
• Kittler and Illingworth (1985) assume a mixture of two Gaussian distributions (five unknown parameters). Find the T that minimizes the KL distance between the observed histogram and the model distribution:
    J(t) = 1 + 2 [P1(t) ln σ1(t) + P2(t) ln σ2(t)] − 2 [P1(t) ln P1(t) + P2(t) ln P2(t)]
• As t varies, the model parameters change. Compute J(t) for all t, and find the minimum.
• The criterion function has local minima at the boundaries of the gray scale.
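Evaluating J(t) over all t can be sketched as follows. This is a minimal sketch with illustrative names; skipping degenerate classes (empty, or with zero variance) is my own guard against the spurious minima at the boundaries of the gray scale:

```python
import numpy as np

def min_error_threshold(hist):
    """Evaluate the Kittler-Illingworth criterion
    J(t) = 1 + 2[P1 ln s1 + P2 ln s2] - 2[P1 ln P1 + P2 ln P2]
    for every t and return the minimizer."""
    p = hist.astype(float) / hist.sum()
    z = np.arange(len(p))
    J = np.full(len(p), np.inf)
    for t in range(len(p) - 1):
        P1 = p[:t+1].sum()
        P2 = 1.0 - P1
        if P1 <= 0 or P2 <= 0:
            continue
        mu1 = (z[:t+1] * p[:t+1]).sum() / P1
        mu2 = (z[t+1:] * p[t+1:]).sum() / P2
        s1 = np.sqrt(((z[:t+1] - mu1)**2 * p[:t+1]).sum() / P1)
        s2 = np.sqrt(((z[t+1:] - mu2)**2 * p[t+1:]).sum() / P2)
        if s1 <= 0 or s2 <= 0:       # degenerate class: spurious boundary minimum
            continue
        J[t] = (1 + 2 * (P1 * np.log(s1) + P2 * np.log(s2))
                  - 2 * (P1 * np.log(P1) + P2 * np.log(P2)))
    return int(np.argmin(J))
```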
• An unfortunate starting value for an iterative search may cause the iteration to terminate at a nonsensical threshold value.
• The a posteriori model parameters will be biased estimates; correctness relies on a small overlap between the distributions. Cho et al. (1989) have given an improvement.

Uniform error thresholding
• The uniform error threshold is given by E1(t) = E2(t).
• Suppose we knew the background area fraction α, and also which pixels belonged to object and background.
• For a given threshold t, let
    p(t) = fraction of background pixels with gray level above t
    q(t) = fraction of object pixels with gray level above t.
• The uniform error threshold is then found when p(t) = 1 − q(t), or equivalently φ − 1 = 0, where φ = p + q.
• Now define
    a = Prob (pixel gray level > t)
    b = Prob (two neighbouring pixels both > t)
    c = Prob (four neighbouring pixels all > t)
• Assuming that border effects may be neglected, we may find these probabilities by examining all 2 × 2 neighbourhoods throughout the image.
• Alternatively, the above probabilities may be written
    a = αp + (1 − α)q
    b = αp² + (1 − α)q²
    c = αp⁴ + (1 − α)q⁴
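Estimating a, b and c from the 2 × 2 neighbourhoods can be sketched as follows. A minimal sketch with illustrative names: it scans all overlapping neighbourhoods directly for one trial t, counting the 4 single pixels, the 6 pixel pairs and the one 4-tuple of each neighbourhood, rather than building the single-pass table described next:

```python
import numpy as np

def abc_estimates(image, t):
    """Estimate a = P(pixel > t), b = P(pair of neighbours both > t),
    c = P(all four pixels of a 2x2 neighbourhood > t),
    neglecting border effects."""
    f = image
    # Stack the four pixels of every overlapping 2x2 neighbourhood.
    blocks = np.stack([f[:-1, :-1], f[:-1, 1:], f[1:, :-1], f[1:, 1:]])
    k = (blocks > t).sum(axis=0)            # pixels above t per neighbourhood
    n = k.size
    a = k.sum() / (4 * n)                   # 4 single pixels per neighbourhood
    b = (k * (k - 1) // 2).sum() / (6 * n)  # C(k,2) of the 6 pairs exceed t
    c = (k == 4).sum() / n                  # one 4-tuple per neighbourhood
    return a, b, c
```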
Uniform error thresholding – II
• Now we note that
    (b² − c) / (a² − b)
      = [(α² − α)p⁴ + 2α(1 − α)p²q² + ((1 − α)² − (1 − α))q⁴] / [(α² − α)p² + 2α(1 − α)pq + ((1 − α)² − (1 − α))q²]
      = (p² − q²)² / (p − q)²
      = (p + q)² = φ².
• Select the gray level t where | φ − 1 | is a minimum.
• φ − 1 is a monotonically decreasing function, so a root-finding algorithm may be used instead of an exhaustive search.
• No assumptions are made about the underlying distributions, or about the a priori probabilities. Only estimates of a, b and c are needed for each trial value of t.
• Instead of one pass through the whole image for each trial value of t, the probabilities may be tabulated for all possible values of t in one initial pass.
• For a given 2 × 2 neighbourhood, the four pixels are sorted in order of increasing gray level, g1, g2, g3, g4. Then for all thresholds t < g1, the neighbourhood has four single pixels, six pairs and one 4-tuple above t. We may set up the scheme:

    Threshold      | singles | pairs | 4-tuples
    t < g1         |    4    |   6   |    1
    g1 ≤ t < g2    |    3    |   3   |    0
    g2 ≤ t < g3    |    2    |   1   |    0
    g3 ≤ t < g4    |    1    |   0   |    0

• In a single pass through the image, a table may be formed, giving estimates of a, b, c for all values of t.

Maximum correlation thresholding
• Brink (1989) maximized the correlation between the original gray level image f and the thresholded image g.
• The gray levels of the two classes in the thresholded image may be represented by the two a posteriori average values µ1(t) and µ2(t):
    µ1(t) = Σ_{z=0}^{t} z p(z) / Σ_{z=0}^{t} p(z)
    µ2(t) = Σ_{z=t+1}^{G−1} z p(z) / Σ_{z=t+1}^{G−1} p(z)
• The correlation coefficient has a very smooth behaviour; starting with the overall average gray level value, the optimal threshold may be found by a steepest ascent search for the value T which maximizes the correlation coefficient ρfg(t):
    ρfg(T) = max_{t=0,...,G−1} ρfg(t)

Entropy-based methods
• Kapur et al. (1985) proposed a thresholding algorithm based on Shannon entropy. For two distributions separated by a threshold t, the sum of the two class entropies is
    ψ(t) = − Σ_{z=0}^{t} (p(z)/P1(t)) ln(p(z)/P1(t)) − Σ_{z=t+1}^{G−1} (p(z)/(1 − P1(t))) ln(p(z)/(1 − P1(t)))
• Using
    Ht = − Σ_{z=0}^{t} p(z) ln p(z),   HG = − Σ_{z=0}^{G−1} p(z) ln p(z)
  the sum of the two entropies may be written as
    ψ(t) = ln [P1(t)(1 − P1(t))] + Ht / P1(t) + (HG − Ht) / (1 − P1(t))
• The discrete value T of t which maximizes ψ(t) is the selected threshold.

Two-feature entropy
• Abutaleb (1989) proposed a thresholding method based on 2-D entropy. For two distributions and a threshold pair (s, t), where s and t denote gray level and average gray level, the entropies are
    H1(st) = − Σ_{i=0}^{s} Σ_{j=0}^{t} (pij / Pst) ln(pij / Pst)
    H2(st) = − Σ_{i=s+1}^{G−1} Σ_{j=t+1}^{G−1} (pij / (1 − Pst)) ln(pij / (1 − Pst))
  where
    Pst = Σ_{i=0}^{s} Σ_{j=0}^{t} pij.
• The sum of the two entropies is now
    ψ(s, t) = H1(st) + H2(st) = ln [Pst(1 − Pst)] + Hst / Pst + (HGG − Hst) / (1 − Pst)
  where the total system entropy HGG and the partial entropy Hst are given by
    HGG = − Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} pij ln(pij),   Hst = − Σ_{i=0}^{s} Σ_{j=0}^{t} pij ln(pij)
• The discrete pair (S, T) which maximizes ψ(s, t) gives the threshold values which maximize the loss of entropy, and thereby the gain in information, by introducing the two thresholds.
• A much faster alternative is to treat the two features s and t separately.
• In most cases, this gives an appreciable improvement over the single-feature entropy method of Kapur et al. (1985).

Preservation of moments
• The observed image f is seen as a blurred version of a thresholded image g with gray levels z1 and z2.
• Find the threshold T such that if all below-threshold values in f are replaced by z1, and all above-threshold values are replaced by z2, then the first three moments are preserved.
• The i-th moment may be computed from the normalized histogram p(z) by
    mi = Σ_{j=0}^{G−1} pj (zj)^i,   i = 1, 2, 3.
• Let P1(t) and P2(t) denote the a posteriori fractions of below-threshold and above-threshold pixels in f. We want to preserve the moments
    mi′ = Σ_{j=1}^{2} Pj(t)(zj)^i = mi,   i = 1, 2, 3,   with P1(t) + P2(t) = 1
• Solving the four equations will give the threshold T.

Solving the equations
• In the bi-level case, the equations are solved as follows:
    cd = | m0  m1 |     c0 = (1/cd) | −m2  m1 |     c1 = (1/cd) | m0  −m2 |
         | m1  m2 |                 | −m3  m2 |                 | m1  −m3 |

    z1 = (1/2) [ −c1 − (c1² − 4c0)^{1/2} ]
    z2 = (1/2) [ −c1 + (c1² − 4c0)^{1/2} ]

    Pd = | 1   1  |     P1 = (1/Pd) | 1   1  |
         | z1  z2 |                 | m1  z2 |
• The optimal threshold, T, is then chosen as the P1-tile (or the gray level value closest to the P1-tile) of the histogram of f.
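These determinant formulas admit a direct closed-form implementation. A minimal sketch assuming a gray level histogram (so m0 = 1 after normalization); the function name is illustrative:

```python
import numpy as np

def moments_threshold(hist):
    """Moment-preserving bi-level threshold: compute z1, z2 and the
    below-threshold fraction P1, then pick T as the P1-tile of the histogram."""
    p = hist.astype(float) / hist.sum()
    z = np.arange(len(p), dtype=float)
    m0 = 1.0
    m1 = (p * z).sum()
    m2 = (p * z**2).sum()
    m3 = (p * z**3).sum()
    cd = m0 * m2 - m1 * m1                 # | m0 m1 ; m1 m2 |
    c0 = (-m2 * m2 + m1 * m3) / cd         # | -m2 m1 ; -m3 m2 | / cd
    c1 = (m0 * -m3 + m2 * m1) / cd         # | m0 -m2 ; m1 -m3 | / cd
    disc = np.sqrt(c1 * c1 - 4 * c0)
    z1 = 0.5 * (-c1 - disc)                # representative below-threshold level
    z2 = 0.5 * (-c1 + disc)                # representative above-threshold level
    P1 = (z2 - m1) / (z2 - z1)             # = (1/Pd) | 1 1 ; m1 z2 |
    # Gray level closest to the P1-tile of the cumulative histogram.
    return int(np.searchsorted(np.cumsum(p), P1))
```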
Exponential convex hull
• The "convex deficiency" is obtained by subtracting the histogram from its convex hull.
• This may work even if no "valley" exists.
• Upper concavity of the histogram tail regions can often be eliminated by considering ln{p(z)} instead of the histogram p(z).
• In the ln{p(z)} domain, upper concavities are produced by bimodality or shoulders, not by the tail of a normal or exponential distribution, nor by the extension of the histogram.
• Transform the histogram p(z) by ln{p(z)}, compute the convex hull h(k), and transform the hull back to the histogram domain by he(k) = exp(h(k)).
• The threshold is found by a sequential search for the maximum exponential convex hull deficiency.