Segmentation by Thresholding

Fritz Albregtsen
University of Oslo, Department of Informatics

Lecture 1 from "Digital Image Analysis, Selected Themes", INF 386, V-2003
03.03.2003


Segmentation

• Segmentation is one of the most important components of a complete image analysis system.
• Segmentation creates regions and objects in images.
• There are two categories of methods, based on two different principles, namely similarity and discontinuity.
• In region based segmentation we find those pixels that are similar.
• Thresholding splits the image histogram. Pixels belonging to the same class get the same label. These pixels are not necessarily neighbours.
• In edge based segmentation we find the basic elements of edges, i.e. edge pixels (or even line pixels, corner pixels etc.), based on local estimates of the gradient. In the next steps we thin broad edges and join edge fragments together into edge chains. Then we have a (partial) region border.


Introduction to Thresholding

• Automatic thresholding is important in applications where speed or the physical conditions prevent human interaction.
• In bi-level thresholding, the histogram of the image is usually assumed to have one valley between two peaks, the peaks representing objects and background, respectively.
• Thresholding is usually a pre-process for various pattern recognition techniques.
• Thresholding may also be a pre-process for adaptive filtering, adaptive compression etc.


Parametric versus Non-parametric

• Parametric techniques:
  — Estimate the parameters of two distributions from the given histogram.
  — It may be difficult or impossible to establish a reliable model.
• Non-parametric techniques:
  — Separate the two gray level classes in an optimum manner according to some criterion, e.g.
    ∗ between-class variance
    ∗ divergence
    ∗ entropy
    ∗ conservation of moments.
  — Non-parametric methods are more robust, and usually faster.


Automatic versus Interactive

• Automatic means that the user does not have to specify any parameters.
• Still, there are no truly automatic methods; there are always built-in parameters.
• There is a distinction between automatic methods and interactive methods.
• There is also a distinction between supervised (with training) and unsupervised (clustering) methods.


Global and Non-contextual?

• Global methods use a single threshold for the entire image.
• Local methods optimize a new threshold for each of a number of blocks or sub-images.
• Global methods put severe restrictions on
  — the gray level characteristics of objects and background,
  — the uniformity in lighting and detection.
• The fundamental framework of the global methods is also applicable to local sub-images.
• Non-contextual methods rely only on the gray level histogram of the image.
• Contextual methods make use of the geometrical relations between pixels.


Bi-level thresholding

• The histogram is assumed to be twin-peaked. Let $P_1$ and $P_2$ be the a priori probabilities of background and foreground ($P_1 + P_2 = 1$), and let the two class distributions be given by $b(z)$ and $f(z)$.
• The complete histogram is then given by

  p(z) = P_1 \, b(z) + P_2 \, f(z)

• The probabilities of mis-classifying a pixel, given a threshold $t$, are

  E_1(t) = \int_t^{\infty} b(z) \, dz, \qquad E_2(t) = \int_{-\infty}^{t} f(z) \, dz

• The total error is

  E(t) = P_1 \int_t^{\infty} b(z) \, dz + P_2 \int_{-\infty}^{t} f(z) \, dz

• Differentiating with respect to the threshold $t$:

  \frac{\partial E}{\partial t} = 0 \;\Rightarrow\; P_1 \, b(T) = P_2 \, f(T)
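To make the error minimization concrete before the Gaussian special case is worked out below, here is a minimal numeric sketch in Python (mine, not from the lecture): it tabulates $E(t)$ for two assumed Gaussian class densities and picks the minimizing threshold. The parameter values are illustrative assumptions, and the integrals are approximated by cumulative sums over the discrete gray levels.

```python
import numpy as np

# illustrative (assumed) class models: background b(z), foreground f(z)
P1, mu1, s1 = 0.6, 80.0, 12.0    # a priori prob., mean, std of background
P2, mu2, s2 = 0.4, 160.0, 20.0   # a priori prob., mean, std of foreground

def gauss(z, mu, s):
    return np.exp(-(z - mu) ** 2 / (2.0 * s ** 2)) / (np.sqrt(2.0 * np.pi) * s)

z = np.arange(256)
b, f = gauss(z, mu1, s1), gauss(z, mu2, s2)

# E(t) = P1 * int_t^inf b(z) dz + P2 * int_{-inf}^t f(z) dz
E = P1 * (1.0 - np.cumsum(b)) + P2 * np.cumsum(f)
T = int(np.argmin(E))

print("threshold minimizing E(t):", T)
# at the optimum, P1*b(T) = P2*f(T) should hold approximately
print("P1*b(T) =", P1 * b[T], "  P2*f(T) =", P2 * f[T])
```

For well-separated modes the brute-force minimum satisfies the analytic condition closely; the Gaussian case below makes the solution explicit.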
Bi-level thresholding (continued)

• For Gaussian distributions, the optimal threshold $T$ satisfies

  \frac{P_1}{\sqrt{2\pi}\,\sigma_1} \, e^{-(T-\mu_1)^2 / (2\sigma_1^2)} = \frac{P_2}{\sqrt{2\pi}\,\sigma_2} \, e^{-(T-\mu_2)^2 / (2\sigma_2^2)}

• We get a quadratic equation:

  (\sigma_1^2 - \sigma_2^2) \, T^2 + 2(\mu_1\sigma_2^2 - \mu_2\sigma_1^2) \, T + \sigma_1^2\mu_2^2 - \sigma_2^2\mu_1^2 + 2\sigma_1^2\sigma_2^2 \ln\frac{P_1\sigma_2}{P_2\sigma_1} = 0

• Two thresholds may be necessary!
• If the two variances are equal ($\sigma_B^2 = \sigma_F^2 = \sigma^2$):

  T = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2} \ln\frac{P_2}{P_1}

• If the a priori probabilities $P_1$ and $P_2$ are also equal:

  T = \frac{\mu_1 + \mu_2}{2}


The method of Ridler and Calvard

• The initial threshold value, $t_0$, is set equal to the average brightness.
• The threshold value for the $(k+1)$-th iteration is given by

  t_{k+1} = \frac{\mu_1(t_k) + \mu_2(t_k)}{2} = \frac{1}{2} \left[ \frac{\sum_{z=0}^{t_k} z\,p(z)}{\sum_{z=0}^{t_k} p(z)} + \frac{\sum_{z=t_k+1}^{G-1} z\,p(z)}{\sum_{z=t_k+1}^{G-1} p(z)} \right]

• $\mu_1(t_k)$ is the mean value of the gray values below the previous threshold $t_k$, and $\mu_2(t_k)$ is the mean value of the gray values above the previous threshold.
• Note that $\mu_1(t)$ and $\mu_2(t)$ are a posteriori mean values, estimated from overlapping and truncated distributions. The a priori $\mu_1$ and $\mu_2$ are unknown to us.
• The correctness of the estimated threshold depends on the extent of the overlap, as well as on the correctness of the $P_1 \approx P_2$ assumption.


The method of Otsu

• Maximizes the a posteriori between-class variance $\sigma_B^2(t)$, given by

  \sigma_B^2(t) = P_1(t) \left[\mu_1(t) - \mu_0\right]^2 + P_2(t) \left[\mu_2(t) - \mu_0\right]^2

• The sum of the within-class variance $\sigma_W^2$ and the between-class variance $\sigma_B^2$ is equal to the total variance $\sigma_0^2$:

  \sigma_W^2 + \sigma_B^2 = \sigma_0^2

• Maximizing $\sigma_B^2$ is therefore equivalent to minimizing $\sigma_W^2$.
• The expression for $\sigma_B^2(t)$ reduces to

  \sigma_B^2(t) = P_1(t)\,\mu_1^2(t) + P_2(t)\,\mu_2^2(t) - \mu_0^2 = \frac{\left[\mu_0\,P_1(t) - \mu(t)\right]^2}{P_1(t)\left[1 - P_1(t)\right]}

  where $\mu(t) = \sum_{z=0}^{t} z\,p(z) = P_1(t)\,\mu_1(t)$ is the first cumulative moment of the histogram.
• The optimal threshold $T$ is found by a sequential search for the maximum of $\sigma_B^2(t)$ over the values of $t$ where $0 < P_1(t) < 1$ (a sketch of this search, together with the Ridler and Calvard iteration, follows the minimum error method below).


The method of Reddi

• The method of Reddi et al. is based on the same assumptions as the method of Otsu, maximizing the a posteriori between-class variance $\sigma_B^2(t)$.
• We may write $\sigma_B^2 = P_1(t)\,\mu_1^2(t) + P_2(t)\,\mu_2^2(t) - \mu_0^2$, i.e.

  \sigma_B^2(t) = \frac{\left[\sum_{z=0}^{t} z\,p(z)\right]^2}{\sum_{z=0}^{t} p(z)} + \frac{\left[\sum_{z=t+1}^{G-1} z\,p(z)\right]^2}{\sum_{z=t+1}^{G-1} p(z)} - \mu_0^2

• Differentiating and setting $d\sigma_B^2(t)/dt = 0$, we find a solution for

  \frac{\sum_{z=0}^{T} z\,p(z)}{\sum_{z=0}^{T} p(z)} + \frac{\sum_{z=T+1}^{G-1} z\,p(z)}{\sum_{z=T+1}^{G-1} p(z)} = 2T

• This may be written as

  \mu_1(T) + \mu_2(T) = 2T

  where $\mu_1$ and $\mu_2$ are the mean values below and above the threshold.
• An exhaustive sequential search gives the same result as Otsu's method.
• Starting with a threshold $t_0 = \mu_0$, fast convergence is obtained, equivalent to the ad hoc technique of Ridler and Calvard.


Maximizing inter-class variance for M thresholds

• The inter-class variance reaches a maximum when

  \mu(0, t_1) + \mu(t_1, t_2) = 2t_1
  \mu(t_1, t_2) + \mu(t_2, t_3) = 2t_2
  \vdots
  \mu(t_{M-1}, t_M) + \mu(t_M, G) = 2t_M

  where $\mu(t_i, t_j)$ is the mean value between the neighbouring thresholds $t_i$ and $t_j$.
• Starting with an arbitrary set of initial thresholds $t_1, \ldots, t_M$, we iteratively compute a new set of thresholds $t_1', \ldots, t_M'$ by

  t_1' = \frac{1}{2}\left(\mu(0, t_1) + \mu(t_1, t_2)\right)
  \vdots
  t_M' = \frac{1}{2}\left(\mu(t_{M-1}, t_M) + \mu(t_M, G)\right)

• The process is repeated until all thresholds are stable.
• This procedure has a very fast convergence.
• It gives the same numerical results as the exhaustive search technique of Otsu, but is orders of magnitude faster in multi-level thresholding!


A "minimum error" method

• Kittler and Illingworth (1985) assume a mixture of two Gaussian distributions (five unknown parameters), and find the $T$ that minimizes the Kullback-Leibler distance between the observed histogram and the model distribution.
• The criterion function is

  J(t) = 1 + 2\left[P_1(t)\ln\sigma_1(t) + P_2(t)\ln\sigma_2(t)\right] - 2\left[P_1(t)\ln P_1(t) + P_2(t)\ln P_2(t)\right]

• As $t$ varies, the model parameters change. Compute $J(t)$ for all $t$ and find the minimum.
• The criterion function has local minima at the boundaries of the gray scale.
• An unfortunate starting value for an iterative search may therefore cause the iteration to terminate at a nonsensical threshold value.
• The a posteriori model parameters will represent biased estimates, so the correctness relies on a small overlap between the two distributions. Cho et al. (1989) have given an improvement.
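A minimal Python sketch of this criterion (mine, not the authors' code), assuming a normalized gray level histogram `p`; thresholds that give a degenerate class (zero probability or zero variance) are skipped, which also sidesteps the spurious minima at the boundaries of the gray scale. Note that $2\ln\sigma = \ln\sigma^2$, so the class variances can be used directly.

```python
import numpy as np

def min_error_threshold(p):
    """Evaluate the Kittler-Illingworth J(t) for every t and return the
    minimizing threshold; p is a normalized histogram (p.sum() == 1)."""
    G = len(p)
    z = np.arange(G, dtype=float)
    best_t, best_J = None, np.inf
    for t in range(G - 1):
        P1 = p[:t + 1].sum()
        P2 = 1.0 - P1
        if P1 <= 0.0 or P2 <= 0.0:
            continue
        mu1 = (z[:t + 1] * p[:t + 1]).sum() / P1
        mu2 = (z[t + 1:] * p[t + 1:]).sum() / P2
        v1 = ((z[:t + 1] - mu1) ** 2 * p[:t + 1]).sum() / P1
        v2 = ((z[t + 1:] - mu2) ** 2 * p[t + 1:]).sum() / P2
        if v1 <= 0.0 or v2 <= 0.0:        # degenerate class: skip this t
            continue
        # 2*[P1 ln s1 + P2 ln s2] equals P1 ln v1 + P2 ln v2
        J = (1.0 + P1 * np.log(v1) + P2 * np.log(v2)
             - 2.0 * (P1 * np.log(P1) + P2 * np.log(P2)))
        if J < best_J:
            best_t, best_J = t, J
    return best_t
```

Then comes the sketch of the between-class variance methods announced above.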
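This illustrative sketch (same assumptions: a normalized histogram `p` over a twin-peaked gray level distribution) implements Otsu's exhaustive search and the Ridler and Calvard iteration; on such data the two should agree.

```python
import numpy as np

def otsu_threshold(p):
    """Exhaustive search for the t maximizing sigma_B^2(t)."""
    z = np.arange(len(p), dtype=float)
    mu0 = (z * p).sum()
    P1 = np.cumsum(p)                  # P1(t)
    mu = np.cumsum(z * p)              # first cumulative moment mu(t)
    valid = (P1 > 0.0) & (P1 < 1.0)
    sigma_b2 = np.zeros(len(p))
    sigma_b2[valid] = ((mu0 * P1[valid] - mu[valid]) ** 2
                       / (P1[valid] * (1.0 - P1[valid])))
    return int(np.argmax(sigma_b2))

def ridler_calvard_threshold(p, eps=0.5):
    """Iterate t_{k+1} = (mu1(t_k) + mu2(t_k)) / 2, starting at the mean."""
    z = np.arange(len(p), dtype=float)
    t = (z * p).sum()                  # t_0: the average brightness
    while True:
        lo = z <= t
        mu1 = (z[lo] * p[lo]).sum() / p[lo].sum()
        mu2 = (z[~lo] * p[~lo]).sum() / p[~lo].sum()
        t_new = 0.5 * (mu1 + mu2)
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# quick check on a synthetic two-mode histogram
rng = np.random.default_rng(0)
s = np.concatenate([rng.normal(80, 12, 60000), rng.normal(160, 20, 40000)])
hist, _ = np.histogram(np.clip(s, 0, 255), bins=256, range=(0, 256))
p = hist / hist.sum()
print(otsu_threshold(p), ridler_calvard_threshold(p))
```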
Uniform error thresholding

• The uniform error threshold is given by $E_1(t) = E_2(t)$.
• Suppose we knew the background area fraction $\alpha(t)$, and also which pixels belonged to object and background.
• For a given threshold $t$, let

  p(t) = fraction of background pixels with gray level above t,
  q(t) = fraction of object pixels with gray level above t.

• The uniform error threshold is then found when

  p(t) = 1 - q(t), \quad \text{or equivalently} \quad \phi - 1 = 0, \;\; \text{where } \phi = p + q

• Now define

  a = Prob(pixel gray level > t)
  b = Prob(two neighbouring pixels both > t)
  c = Prob(four neighbouring pixels all > t)

• Assuming that border effects may be neglected, we may find these probabilities by examining all 2×2 neighbourhoods throughout the image.
• Alternatively, the above probabilities may be written

  a = \alpha p + (1 - \alpha) q
  b = \alpha p^2 + (1 - \alpha) q^2
  c = \alpha p^4 + (1 - \alpha) q^4


Uniform error thresholding - II

• Now we note that

  \frac{b^2 - c}{a^2 - b} = \frac{(\alpha^2 - \alpha)p^4 + 2\alpha(1-\alpha)p^2q^2 + \left[(1-\alpha)^2 - (1-\alpha)\right]q^4}{(\alpha^2 - \alpha)p^2 + 2\alpha(1-\alpha)pq + \left[(1-\alpha)^2 - (1-\alpha)\right]q^2} = \frac{(p^2 - q^2)^2}{(p - q)^2} = (p + q)^2 = \phi^2

• Select the gray level $t$ where $|\phi - 1|$ is a minimum.
• $\phi - 1$ is a monotonically decreasing function, so a root-finding algorithm may be used instead of an exhaustive search.
• No assumptions are made about the underlying distributions, or about the a priori probabilities; only the estimates of $a$, $b$ and $c$ for each trial value of $t$ are needed.
• Instead of one pass through the whole image for each trial value of $t$, the probabilities may be tabulated for all possible values of $t$ in one initial pass.
• For a given 2×2 neighbourhood, the four pixels are sorted in order of increasing gray level, $g_1, g_2, g_3, g_4$. Then for all thresholds $t < g_1$, the neighbourhood has four single pixels, six pairs and one 4-tuple above $t$. We may set up the scheme:

  Threshold      | Number of singles | Number of pairs | Number of 4-tuples
  t < g_1        |         4         |        6        |         1
  g_1 ≤ t < g_2  |         3         |        3        |         0
  g_2 ≤ t < g_3  |         2         |        1        |         0
  g_3 ≤ t < g_4  |         1         |        0        |         0

• In a single pass through the image, a table may thus be formed, giving estimates of $a$, $b$ and $c$ for all values of $t$ (a sketch of this tabulation follows the maximum correlation section below).


Maximum correlation thresholding

• Brink (1989) maximized the correlation between the original gray level image $f$ and the thresholded image $g$.
• The gray levels of the two classes in the thresholded image may be represented by the two a posteriori average values $\mu_1(t)$ and $\mu_2(t)$:

  \mu_1(t) = \sum_{z=0}^{t} z\,p(z) \Big/ \sum_{z=0}^{t} p(z), \qquad \mu_2(t) = \sum_{z=t+1}^{G-1} z\,p(z) \Big/ \sum_{z=t+1}^{G-1} p(z)

• The correlation coefficient has a very smooth behaviour. Starting from the overall average gray level value, the optimal threshold may therefore be found by a steepest ascent search for the value $T$ which maximizes the correlation coefficient $\rho_{fg}(t)$:

  \rho_{fg}(T) = \max_{0 \le t \le G-1} \rho_{fg}(t)
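A minimal Python sketch of this criterion (mine, not Brink's code), assuming a normalized histogram `p`; for simplicity it scans all t instead of performing the steepest ascent search.

```python
import numpy as np

def max_correlation_threshold(p):
    """Return the t maximizing the correlation between the image f and
    its two-level approximation g with gray values mu1(t) and mu2(t)."""
    G = len(p)
    z = np.arange(G, dtype=float)
    mu0 = (z * p).sum()
    var0 = ((z - mu0) ** 2 * p).sum()     # variance of the input image f
    best_t, best_rho = None, -np.inf
    for t in range(G - 1):
        P1 = p[:t + 1].sum()
        if P1 <= 0.0 or P1 >= 1.0:
            continue
        mu1 = (z[:t + 1] * p[:t + 1]).sum() / P1
        mu2 = (z[t + 1:] * p[t + 1:]).sum() / (1.0 - P1)
        # in this histogram formulation both cov(f, g) and var(g) reduce
        # to the between-class variance, so rho_fg(t) = sigma_B(t)/sigma_0
        sb2 = P1 * mu1 ** 2 + (1.0 - P1) * mu2 ** 2 - mu0 ** 2
        rho = np.sqrt(max(sb2, 0.0) / var0)
        if rho > best_rho:
            best_t, best_rho = t, rho
    return best_t
```

As the comment notes, $\rho_{fg}(t)$ reduces here to $\sigma_B(t)/\sigma_0$, so this formulation selects the same $T$ as the between-class variance criterion; the practical gain lies in the smooth criterion, which makes a steepest ascent search viable.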
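For the uniform error method above, the one-pass tabulation of $a$, $b$ and $c$ can be sketched as follows (my Python, under stated assumptions: an integer gray level image, non-overlapping 2×2 blocks for brevity, border effects ignored).

```python
import numpy as np

def uniform_error_threshold(img, G=256):
    """Tabulate a(t), b(t), c(t) from 2x2 neighbourhoods in one pass and
    return the t minimizing |phi - 1|, with phi^2 = (b^2-c)/(a^2-b)."""
    h, w = img.shape
    blk = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    g = np.sort(blk.transpose(0, 2, 1, 3).reshape(-1, 4), axis=1)
    n = g.shape[0]                        # number of 2x2 neighbourhoods

    # difference arrays, following the scheme above: a block with sorted
    # levels g1..g4 contributes 4 singles / 6 pairs / 1 four-tuple for all
    # t < g1, then loses (1 single, 3 pairs, the 4-tuple) at g1,
    # (1, 2, 0) at g2, (1, 1, 0) at g3 and (1, 0, 0) at g4
    ds, dp, dq = np.zeros(G + 1), np.zeros(G + 1), np.zeros(G + 1)
    ds[0], dp[0], dq[0] = 4 * n, 6 * n, n
    for col, (s_d, p_d, q_d) in enumerate([(1, 3, 1), (1, 2, 0),
                                           (1, 1, 0), (1, 0, 0)]):
        np.add.at(ds, g[:, col], -s_d)
        np.add.at(dp, g[:, col], -p_d)
        np.add.at(dq, g[:, col], -q_d)
    a = np.cumsum(ds)[:G] / (4 * n)       # estimate of Prob(pixel > t)
    b = np.cumsum(dp)[:G] / (6 * n)       # estimate of Prob(pair both > t)
    c = np.cumsum(dq)[:G] / n             # estimate of Prob(4-tuple > t)

    with np.errstate(divide="ignore", invalid="ignore"):
        phi = np.sqrt(np.clip((b * b - c) / (a * a - b), 0.0, None))
    phi = np.where(np.isfinite(phi), phi, np.inf)
    return int(np.argmin(np.abs(phi - 1.0)))
```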
Entropy-based methods

• Kapur et al. proposed a thresholding algorithm based on Shannon entropy. Define the partial and total entropies of the histogram:

  H_t = -\sum_{z=0}^{t} p(z) \ln p(z), \qquad H_G = -\sum_{z=0}^{G-1} p(z) \ln p(z)

• For two distributions separated by a threshold $t$, the sum of the two class entropies is

  \psi(t) = -\sum_{z=0}^{t} \frac{p(z)}{P_1(t)} \ln\frac{p(z)}{P_1(t)} - \sum_{z=t+1}^{G-1} \frac{p(z)}{1 - P_1(t)} \ln\frac{p(z)}{1 - P_1(t)}

• Using $H_t$ and $H_G$, the sum of the two entropies may be written as

  \psi(t) = \ln\left[P_1(t)\left(1 - P_1(t)\right)\right] + \frac{H_t}{P_1(t)} + \frac{H_G - H_t}{1 - P_1(t)}

• The discrete value $T$ of $t$ which maximizes $\psi(t)$ is the selected threshold (a sketch follows at the end of this part).


Two-feature entropy

• Abutaleb (1989) proposed a thresholding method based on 2-D entropy. For two distributions and a threshold pair $(s, t)$, where $s$ and $t$ denote gray level and average gray level, the entropies are

  H_1(st) = -\sum_{i=0}^{s} \sum_{j=0}^{t} \frac{p_{ij}}{P_{st}} \ln\frac{p_{ij}}{P_{st}}, \qquad H_2(st) = -\sum_{i=s+1}^{G-1} \sum_{j=t+1}^{G-1} \frac{p_{ij}}{1 - P_{st}} \ln\frac{p_{ij}}{1 - P_{st}}

• The sum of the two entropies is now

  \psi(s, t) = H_1(st) + H_2(st) = \ln\left[P_{st}\left(1 - P_{st}\right)\right] + \frac{H_{st}}{P_{st}} + \frac{H_{GG} - H_{st}}{1 - P_{st}}

  where $P_{st} = \sum_{i=0}^{s} \sum_{j=0}^{t} p_{ij}$, and the total system entropy $H_{GG}$ and the partial entropy $H_{st}$ are given by

  H_{GG} = -\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} p_{ij} \ln p_{ij}, \qquad H_{st} = -\sum_{i=0}^{s} \sum_{j=0}^{t} p_{ij} \ln p_{ij}

• The discrete pair $(S, T)$ which maximizes $\psi(s, t)$ gives the threshold values which maximize the loss of entropy, and thereby the gain in information, by introducing the two thresholds.
• A much faster alternative is to treat the two features $s$ and $t$ separately.
• In most cases, this gives an appreciable improvement over the single-feature entropy method of Kapur et al. (1985).


Preservation of moments

• The observed image $f$ is seen as a blurred version of an ideal thresholded image $g$ with gray levels $z_1$ and $z_2$.
• Find the threshold $T$ such that if all below-threshold values in $f$ are replaced by $z_1$, and all above-threshold values are replaced by $z_2$, then the first three moments are preserved.
• The $i$-th moment may be computed from the normalized histogram $p(z)$ by

  m_i = \sum_{j=0}^{G-1} p_j (z_j)^i, \quad i = 1, 2, 3 \qquad (\text{with } m_0 = 1 \text{ for a normalized histogram})

• Let $P_1(t)$ and $P_2(t)$ denote the a posteriori fractions of below-threshold and above-threshold pixels in $f$.
• We want to preserve the moments,

  m_i' = \sum_{j=1}^{2} P_j(t) (z_j)^i = m_i, \quad i = 1, 2, 3

  and we also have

  P_1(t) + P_2(t) = 1

• Solving the four equations will give the threshold $T$.


Solving the equations

• In the bi-level case, the equations are solved as follows:

  c_d = \begin{vmatrix} m_0 & m_1 \\ m_1 & m_2 \end{vmatrix}, \qquad
  c_0 = \frac{1}{c_d} \begin{vmatrix} -m_2 & m_1 \\ -m_3 & m_2 \end{vmatrix}, \qquad
  c_1 = \frac{1}{c_d} \begin{vmatrix} m_0 & -m_2 \\ m_1 & -m_3 \end{vmatrix}

  z_1 = \frac{1}{2} \left[ -c_1 - \left(c_1^2 - 4c_0\right)^{1/2} \right], \qquad
  z_2 = \frac{1}{2} \left[ -c_1 + \left(c_1^2 - 4c_0\right)^{1/2} \right]

  P_d = \begin{vmatrix} 1 & 1 \\ z_1 & z_2 \end{vmatrix}, \qquad
  P_1 = \frac{1}{P_d} \begin{vmatrix} 1 & 1 \\ m_1 & z_2 \end{vmatrix}

• The optimal threshold, $T$, is then chosen as the $P_1$-tile (or the gray level value closest to the $P_1$-tile) of the histogram of $f$ (a sketch follows at the end of this part).


Exponential convex hull

• The "convex deficiency" is obtained by subtracting the histogram from its convex hull.
• This may work even if no "valley" exists.
• Upper concavity in the histogram tail regions can often be eliminated by considering $\ln p(z)$ instead of the histogram $p(z)$ itself.
• In the $\ln p(z)$-domain, upper concavities are produced by bimodality or shoulders, not by the tail of a normal or exponential distribution, nor by the extension of the histogram.
• Transform the histogram $p(z)$ by $\ln p(z)$, compute the convex hull $h(k)$, and transform the hull back to the histogram domain by $h_e(k) = \exp(h(k))$.
• The threshold is found by a sequential search for the maximum exponential convex hull deficiency (a sketch follows below).
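A minimal Python sketch of the Kapur et al. criterion above (mine, not the authors' code), assuming a normalized histogram `p`; empty bins are taken to contribute zero to the entropies.

```python
import numpy as np

def kapur_threshold(p):
    """Return the t maximizing
    psi(t) = ln[P1(1-P1)] + Ht/P1 + (HG - Ht)/(1 - P1)."""
    q = np.where(p > 0.0, p, 1.0)        # ln(1) = 0 silences empty bins
    h = -p * np.log(q)                   # per-bin entropy contribution
    Ht = np.cumsum(h)                    # H_t for every t
    HG = Ht[-1]                          # total entropy H_G
    P1 = np.cumsum(p)                    # P_1(t)
    valid = (P1 > 0.0) & (P1 < 1.0)
    psi = np.full(len(p), -np.inf)
    psi[valid] = (np.log(P1[valid] * (1.0 - P1[valid]))
                  + Ht[valid] / P1[valid]
                  + (HG - Ht[valid]) / (1.0 - P1[valid]))
    return int(np.argmax(psi))
```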
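The moment-preserving recipe in "Solving the equations" also translates directly. This is an illustrative sketch assuming a normalized histogram `p` (so $m_0 = 1$) whose first three moments admit real solutions $z_1 < z_2$, as they do for a valid two-class histogram.

```python
import numpy as np

def moment_preserving_threshold(p):
    """Compute z1, z2 and P1 from the first three moments and return the
    gray level closest to the P1-tile of the cumulative histogram."""
    z = np.arange(len(p), dtype=float)
    m0 = 1.0                              # normalized histogram
    m1, m2, m3 = (z * p).sum(), (z**2 * p).sum(), (z**3 * p).sum()

    cd = m0 * m2 - m1 * m1                # det | m0 m1 ; m1 m2 |
    c0 = (m1 * m3 - m2 * m2) / cd         # det | -m2 m1 ; -m3 m2 | / cd
    c1 = (m1 * m2 - m0 * m3) / cd         # det | m0 -m2 ; m1 -m3 | / cd
    disc = np.sqrt(c1 * c1 - 4.0 * c0)    # assumed non-negative here
    z1, z2 = 0.5 * (-c1 - disc), 0.5 * (-c1 + disc)
    P1 = (z2 - m1) / (z2 - z1)            # det | 1 1 ; m1 z2 | / (z2 - z1)

    return int(np.searchsorted(np.cumsum(p), P1))  # the P1-tile of p
```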
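Finally, a sketch of the exponential convex hull search (my Python, not from the lecture). Flooring empty bins at a small `eps` before taking the log is an assumption made here to keep the hull finite; the upper hull is built with a monotone-chain scan.

```python
import numpy as np

def exp_hull_threshold(p, eps=1e-12):
    """Take the upper convex hull h(k) of ln p(k), map it back with
    exp(h(k)), and return the k with the largest hull deficiency."""
    G = len(p)
    logp = np.log(np.maximum(p, eps))
    hull = []                             # indices of upper-hull vertices
    for k in range(G):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            # drop j if it lies on or below the chord from i to k
            if (logp[j] - logp[i]) * (k - i) <= (logp[k] - logp[i]) * (j - i):
                hull.pop()
            else:
                break
        hull.append(k)
    # piecewise-linear hull in the log domain, then back-transform
    h = np.interp(np.arange(G), hull, logp[hull])
    deficiency = np.exp(h) - p            # exponential hull deficiency
    return int(np.argmax(deficiency))
```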