SEGMENTATION METHODS
IN 384, H-2001, 21/11 2001
Fritz Albregtsen

INTRODUCTION

• Thresholding is usually a pre-process for various pattern recognition techniques.
• Thresholding may also be a pre-process for adaptive filtering, adaptive compression etc.
• Automatic thresholding is important in applications where speed or the physical conditions prevent human interaction.
• In bi-level thresholding, the histogram of the image is usually assumed to have one valley between two peaks, the peaks representing objects and background, respectively.

Automatic versus Interactive

• Automatic means that the user does not have to specify any parameters.
• There are no truly automatic methods; there are always built-in parameters.
• Distinction between automatic methods and interactive methods.
• Distinction between supervised (with training) and unsupervised (clustering) methods.

Parametric versus Non-parametric

• Parametric techniques:
  – Estimate the parameters of two distributions from the given histogram.
  – It may be difficult or impossible to establish a reliable model.
• Non-parametric case:
  – Separate the two gray level classes in an optimum manner according to some criterion:
    ∗ between-class variance
    ∗ divergence
    ∗ entropy
    ∗ conservation of moments.
  – Non-parametric methods are more robust, and usually faster.

Global and Non-contextual?

• Global methods use a single threshold for the entire image.
• Local methods optimize a new threshold for a number of blocks or sub-images.
• Global methods put severe restrictions on
  – the gray level characteristics of objects and background
  – the uniformity in lighting and detection.
• The fundamental framework of the global methods is also applicable to local sub-images.
• Contextual methods make use of the geometrical relations between pixels.
• Non-contextual methods rely only on the gray level histogram of the image.

Bi-level thresholding

• The histogram is assumed to be twin-peaked. Let P1 and P2 be the a priori probabilities of background and foreground (P1 + P2 = 1). The two distributions are given by b(z) and f(z). The complete histogram is given by

  p(z) = P1 · b(z) + P2 · f(z)

• The probabilities of mis-classifying a pixel, given a threshold t:

  E1(t) = ∫_t^∞ b(z) dz,   E2(t) = ∫_−∞^t f(z) dz

• The total error is:

  E(t) = P1 · ∫_t^∞ b(z) dz + P2 · ∫_−∞^t f(z) dz

• Differentiate with respect to the threshold t:

  ∂E/∂t = 0  ⇒  P1 · b(T) = P2 · f(T)

• For Gaussian distributions this becomes

  (P1 / (√(2π)·σ1)) · exp(−(T − µ1)² / (2σ1²)) = (P2 / (√(2π)·σ2)) · exp(−(T − µ2)² / (2σ2²))

• We get a quadratic equation:

  (σ1² − σ2²)·T² + 2(µ1σ2² − µ2σ1²)·T + σ1²µ2² − σ2²µ1² + 2σ1²σ2² · ln(P1σ2 / (P2σ1)) = 0

• Two thresholds may be necessary!
• If the two variances are equal, σ_B² = σ_F² = σ²:

  T = (µ1 + µ2)/2 + (σ² / (µ1 − µ2)) · ln(P2/P1)

• If the a priori probabilities P1 and P2 are equal:

  T = (µ1 + µ2)/2

• The correctness of the estimated threshold depends on the extent of the overlap, as well as on the correctness of the P1 ≈ P2 assumption.

The method of Ridler and Calvard

• The initial threshold value, t0, is set equal to the average brightness.
• The threshold value for the (k+1)-th iteration is given by

  t_{k+1} = (µ1(t_k) + µ2(t_k)) / 2
          = (1/2) · [ Σ_{z=0}^{t_k} z·p(z) / Σ_{z=0}^{t_k} p(z) + Σ_{z=t_k+1}^{G−1} z·p(z) / Σ_{z=t_k+1}^{G−1} p(z) ]

• µ1(t_k) is the mean value of the gray values below the previous threshold t_k, and µ2(t_k) is the mean value of the gray values above the previous threshold.
• Note that µ1(t) and µ2(t) are the a posteriori mean values, estimated from overlapping and truncated distributions. The a priori µ1 and µ2 are unknown to us.
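As a concrete illustration of the Ridler and Calvard iteration, here is a minimal Python sketch (numpy assumed; the function name, the convergence tolerance and the assumption that both classes stay non-empty are mine, not part of the original slides):

  import numpy as np

  def ridler_calvard_threshold(hist, eps=0.5):
      # hist: 1-D array of gray level counts, index = gray level z
      p = hist.astype(float) / hist.sum()        # normalized histogram p(z)
      z = np.arange(p.size, dtype=float)
      t = (p * z).sum()                          # t0 = average brightness
      while True:
          below = z <= t
          mu1 = (p[below] * z[below]).sum() / p[below].sum()     # mean below t_k
          mu2 = (p[~below] * z[~below]).sum() / p[~below].sum()  # mean above t_k
          t_next = 0.5 * (mu1 + mu2)             # t_{k+1} = (mu1(t_k) + mu2(t_k)) / 2
          if abs(t_next - t) < eps:
              return t_next
          t = t_next

The loop stops when successive thresholds differ by less than the tolerance, which for a well-behaved twin-peaked histogram typically happens after only a few iterations.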
The method of Otsu

• Maximizes the a posteriori between-class variance σB²(t), given by

  σB²(t) = P1(t)·[µ1(t) − µ0]² + P2(t)·[µ2(t) − µ0]²

• We may write

  σB²(t) = P1(t)·µ1²(t) + P2(t)·µ2²(t) − µ0²

• The sum of the within-class variance σW² and the between-class variance σB² is equal to the total variance σ0²:

  σW² + σB² = σ0²

• Maximizing σB² is equivalent to minimizing σW².
• The expression for σB²(t) reduces to

  σB²(t) = [µ0·P1(t) − P1(t)·µ1(t)]² / (P1(t)·[1 − P1(t)])

• The optimal threshold T is found by a sequential search for the maximum of σB²(t) over the values of t where 0 < P1(t) < 1.

The method of Reddi

• The method of Reddi et al. is based on the same assumptions as the method of Otsu, maximizing the a posteriori between-class variance σB²(t).
• This may be written as

  σB²(t) = P1(t)·µ1²(t) + P2(t)·µ2²(t) − µ0²
         = [Σ_{z=0}^{t} z·p(z)]² / Σ_{z=0}^{t} p(z) + [Σ_{z=t+1}^{G−1} z·p(z)]² / Σ_{z=t+1}^{G−1} p(z) − µ0²

• Differentiating σB²(t) and setting δσB²(t)/δt = 0, we find a solution for T:

  Σ_{z=0}^{T} z·p(z) / Σ_{z=0}^{T} p(z) + Σ_{z=T+1}^{G−1} z·p(z) / Σ_{z=T+1}^{G−1} p(z) = 2T,

  i.e. µ1(T) + µ2(T) = 2T, where µ1 and µ2 are the mean values below and above the threshold.
• An exhaustive sequential search gives the same result as Otsu's method.
• Starting with a threshold t0 = µ0, fast convergence is obtained, equivalent to the ad hoc technique of Ridler and Calvard.

A "minimum error" method

• Assuming Gaussian distributions we have five unknown parameters. Kittler and Illingworth optimized a criterion function related to the average pixel classification error rate, based on estimated a posteriori parameters:

  J(t) = 1 + 2·[P1(t)·ln σ1(t) + P2(t)·ln σ2(t)] − 2·[P1(t)·ln P1(t) + P2(t)·ln P2(t)]

• As t is varied, the model parameters change.
• Compute J(t) for all t, and find the minimum value.
• The criterion function has local minima at the boundaries of the gray scale.
• The a posteriori model parameters will represent biased estimates, as the tails of the overlapping distributions are truncated. Thus, the correctness of the estimated threshold relies on this overlap being small.

Maximum correlation thresholding

• Brink (1989) maximized the correlation between the original gray level image f and the thresholded image g.
• The gray levels of the two classes in the thresholded image may be represented by the two a posteriori average values µ1(t) and µ2(t):

  µ1(t) = Σ_{z=0}^{t} z·p(z) / Σ_{z=0}^{t} p(z)
  µ2(t) = Σ_{z=t+1}^{G−1} z·p(z) / Σ_{z=t+1}^{G−1} p(z)

• The optimal threshold maximizes the correlation coefficient:

  ρ_fg(T) = max_{t=0}^{G−1} ρ_fg(t)

• The correlation coefficient has a very smooth behaviour, and starting with the overall average gray level value, the optimal threshold may be found by a steepest ascent search for the value T which maximizes the correlation coefficient ρ_fg(t).
• An unfortunate starting value for an iterative search may cause the iteration to terminate at a nonsensical threshold value.
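Going back to Otsu's criterion, a minimal Python sketch (numpy assumed; function and variable names are my own) of the exhaustive search for the maximum of σB²(t):

  import numpy as np

  def otsu_threshold(hist):
      # hist: 1-D array of gray level counts, index = gray level z
      p = hist.astype(float) / hist.sum()   # normalized histogram p(z)
      z = np.arange(p.size, dtype=float)
      mu0 = (p * z).sum()                   # total mean
      P1 = np.cumsum(p)                     # class probability P1(t)
      m1 = np.cumsum(p * z)                 # partial first moment = P1(t) * mu1(t)
      valid = (P1 > 0) & (P1 < 1)           # only thresholds with 0 < P1(t) < 1
      sigma_b2 = np.zeros_like(P1)
      sigma_b2[valid] = (mu0 * P1[valid] - m1[valid]) ** 2 / (P1[valid] * (1.0 - P1[valid]))
      return int(np.argmax(sigma_b2))       # T maximizing the between-class variance

The same function also illustrates the Reddi criterion, since the exhaustive search over σB²(t) returns the Otsu threshold.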
Preservation of moments

• Gives a threshold without iteration or search.
• The observed image f may be seen as a blurred version of the thresholded image g with gray levels z1 and z2.
• The method selects a threshold T such that if all below-threshold values in f are replaced by z1, and all above-threshold values are replaced by z2, then the first three moments are preserved.
• The i-th moment mi of the image f may be computed from the normalized histogram p(z) by

  mi = Σ_{j=0}^{G−1} pj·(zj)^i,  i = 1, 2, 3  (with m0 = 1 for the normalized histogram).

• Let P1(t) and P2(t) denote the a posteriori fractions of the below-threshold and above-threshold pixels in f. Then the moments of g are given by

  m'i = Σ_{j=1}^{2} Pj(t)·(zj)^i,  i = 1, 2, 3.

• We want to preserve the moments:

  m'i = Σ_{j=1}^{2} Pj(t)·(zj)^i = mi = Σ_{j=0}^{G−1} pj·(zj)^i,  i = 1, 2, 3,

  and we have P1(t) + P2(t) = 1.
• Solving these four equations will give the threshold T.

Entropy-based methods

• Kapur et al. proposed a thresholding algorithm based on Shannon entropy.
• For two distributions separated by a threshold t, the sum of the two class entropies is

  ψ(t) = − Σ_{z=0}^{t} [p(z)/P1(t)]·ln[p(z)/P1(t)] − Σ_{z=t+1}^{G−1} [p(z)/(1 − P1(t))]·ln[p(z)/(1 − P1(t))]

• Using

  Ht = − Σ_{z=0}^{t} p(z)·ln(p(z)),   HG = − Σ_{z=0}^{G−1} p(z)·ln(p(z)),

  the sum of the two entropies may be written as

  ψ(t) = ln[P1(t)·(1 − P1(t))] + Ht/P1(t) + (HG − Ht)/(1 − P1(t)).

• The discrete value T of t which maximizes ψ(t) is the selected threshold.

Solving the equations

• In the bi-level case, the moment-preserving equations are solved as follows:

  cd = | m0  m1 |
       | m1  m2 |

  c0 = (1/cd) · | −m2  m1 |
                | −m3  m2 |

  c1 = (1/cd) · | m0  −m2 |
                | m1  −m3 |

  z1 = (1/2)·[ −c1 − (c1² − 4·c0)^(1/2) ]
  z2 = (1/2)·[ −c1 + (c1² − 4·c0)^(1/2) ]

  Pd = | 1   1  |
       | z1  z2 |

  P1 = (1/Pd) · | 1   1  |
                | m1  z2 |

• The optimal threshold, T, is then chosen as the P1-tile (or the gray level value closest to the P1-tile) of the histogram of f.

Exponential convex hull

• This may work even where no "valley" exists.
• The "convex deficiency" is obtained by subtracting the histogram from its convex hull.
• Upper concavity of the histogram tail regions can often be eliminated by considering ln{p(z)} instead of the histogram p(z).
• In the ln{p(z)}-domain, upper concavities can be produced by bimodality and by shoulders, but not by the tail of a normal or exponential distribution, nor by an extension of the histogram.
• Transform the histogram p(z) by ln{p(z)}, compute the convex hull, and transform the convex hull back to the histogram domain by he(k) = exp(h(k)).
• The threshold is found by a sequential search for the maximum exponential convex hull deficiency.
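Returning to the entropy criterion of Kapur et al. above, a minimal Python sketch (numpy assumed; names are my own) of the search for the maximum of ψ(t):

  import numpy as np

  def kapur_threshold(hist):
      # hist: 1-D array of gray level counts, index = gray level z
      p = hist.astype(float) / hist.sum()        # normalized histogram p(z)
      plogp = np.where(p > 0, p * np.log(p), 0.0)
      P1 = np.cumsum(p)                          # P1(t)
      Ht = -np.cumsum(plogp)                     # H_t
      HG = Ht[-1]                                # H_G
      valid = (P1 > 0) & (P1 < 1)
      psi = np.full_like(P1, -np.inf)
      psi[valid] = (np.log(P1[valid] * (1 - P1[valid]))
                    + Ht[valid] / P1[valid]
                    + (HG - Ht[valid]) / (1 - P1[valid]))
      return int(np.argmax(psi))                 # T maximizing psi(t)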
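Similarly, a minimal Python sketch (numpy assumed; names are mine) of the bi-level moment-preserving computation given under "Solving the equations":

  import numpy as np

  def moment_preserving_threshold(hist):
      # hist: 1-D array of gray level counts, index = gray level z
      p = hist.astype(float) / hist.sum()
      z = np.arange(p.size, dtype=float)
      m0 = 1.0
      m1, m2, m3 = (p * z).sum(), (p * z**2).sum(), (p * z**3).sum()
      cd = m0 * m2 - m1 * m1                     # | m0 m1 ; m1 m2 |
      c0 = (m1 * m3 - m2 * m2) / cd              # | -m2 m1 ; -m3 m2 | / cd
      c1 = (m1 * m2 - m0 * m3) / cd              # | m0 -m2 ; m1 -m3 | / cd
      disc = np.sqrt(c1 * c1 - 4.0 * c0)
      z1, z2 = 0.5 * (-c1 - disc), 0.5 * (-c1 + disc)   # representative gray levels
      P1 = (z2 - m1) / (z2 - z1)                 # below-threshold fraction
      # T = gray level closest to the P1-tile of the cumulative histogram
      return int(np.searchsorted(np.cumsum(p), P1))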
SEGMENTATION

• Segmentation is one of the most important elements of a complete image analysis system.
• In the segmentation we extract the regions and objects that will later be described and recognized.
• We consider two categories of methods, based on two properties of the pixels in an image, namely similarity and discontinuity:
  – In region-based segmentation (thresholding, region growing, split-and-merge) we extract the pixels that resemble each other. We then have all the pixels of the object.
  – In edge-based segmentation we find basic elements:
    ∗ edge, line and corner points, ...
    In the next step these are thinned and linked into
    ∗ edges, lines, corners, ...
    We then have the outline of the objects.

Region-based segmentation

• Segmentation divides an image into regions.
• We can find (parts of) the outline of the regions in an image by edge detection.
• We can segment by thresholding.
• We will now find the regions directly.
• Let R be the entire image, and assume that it consists of the regions Ri, i ∈ {1, ..., n}.
• Together the regions make up the entire image:
  ∪_{i=1}^{n} Ri = R
• The regions are disjoint:
  Ri ∩ Rj = ∅,  i ≠ j
• A predicate P is satisfied by all the pixels in each region:
  P(Ri) = TRUE for i ∈ {1, ..., n}
• Two neighbouring regions differ with respect to the predicate P:
  P(Ri ∪ Rj) = FALSE for i ≠ j,
  where P(R) is a predicate over the pixels in the set R and ∅ is the empty set.

Region growing

• Sow some "seed points" in the image, and grow regions by including neighbouring pixels in the region if their properties (gray level, colour, texture, etc.) fall within a given tolerance.
• Problems:
  – choice of "seed points"
  – choice of properties and tolerances
  – stopping criteria
• Make a histogram of the properties.
• Choose points that have a frequently occurring feature vector.
• Given a priori knowledge: use it!
• The method is often application dependent.
• Textured objects become homogeneous if we use a suitable texture feature.

"Split and merge"

• Given a region Ri and a predicate P.
• Split = divide a region into two disjoint regions.
• Merge = join two regions into one.
• Split if P(Ri) = FALSE.
• Merge Rj and Rk if P(Rj ∪ Rk) = TRUE.
• Stop when no further splits or merges are possible.
• Special case:
  – split into four quadrants if P(Ri) = FALSE
  – merge neighbouring regions if P(Rj ∪ Rk) = TRUE

"Merging of regions"

• Given two regions R1 and R2 with perimeters P1 and P2.
  C = the length of the common boundary
  D = the length of the part of C where δ(amplitude) < ε1
• Merge if
  D / min(P1, P2) > ε2,  ε2 ≈ 1/2
  This prevents merging of regions of equal size.
• Further merging if
  D / C > ε3,  ε3 ≈ 3/4
• Works for intensity images with few objects and little texture.
• Textured regions become homogeneous if we use a suitable texture feature.
• Use region growing on texture feature images.

Hough transform

• Assume that we have performed an edge detection and thresholded the resulting image.
• We then have n points that, for example, describe an object.
• We want to find subsets of these points that lie on straight lines.
• Consider a point (xi, yi) and a straight line yi = a·xi + b.
• Each point (xi, yi) in the image domain corresponds to a straight line in the parameter domain (the ab-plane), given by
  b = −xi·a + yi

Hough transform II

• Divide the ab-plane into accumulator cells
  A[a, b],  a ∈ [amin, amax];  b ∈ [bmin, bmax]
• set all A[a,b] := 0;
  for each point (x,y) where grad(x,y) > T begin
    for all possible values of a begin
      b = -xa + y;
      A[a,b] := A[a,b] + 1;
    end
  end
• A value A[i, j] = M corresponds to M points in the xy-plane lying on the line y = ai·x + bj.
• This representation does not handle vertical lines!

Hough transform III

• Use the normal representation of a straight line, given by
  x cos θ + y sin θ = ρ
• Each point (xi, yi) in the xy-plane now gives rise to a sinusoid in the ρθ-plane.
• M collinear points lying on the line x cos θj + y sin θj = ρi give M curves that intersect in the point (ρi, θj) in the parameter plane.
• Local maxima => significant lines.

Hough transform IV

• Divide the ρθ-plane into accumulator cells
  A[ρ, θ],  ρ ∈ [ρmin, ρmax];  θ ∈ [θmin, θmax]
• The range of θ is ±90° from the x-axis:
  – a horizontal line has θ = 90°, ρ ≥ 0 (or θ = −90°, ρ ≤ 0)
  – a vertical line has θ = 0°, ρ ≥ 0
• The range of ρ is ±D·√2, where D is the side length of the image.
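A minimal Python sketch (numpy assumed; array sizes, discretization and names are my own choices) of the ρθ accumulator described in Hough transform III and IV:

  import numpy as np

  def hough_lines(edges, n_theta=180):
      # edges: 2-D 0/1 array from edge detection + thresholding
      h, w = edges.shape
      diag = int(np.ceil(np.hypot(h, w)))                 # |rho| <= image diagonal
      thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
      acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)   # A[rho, theta]
      ys, xs = np.nonzero(edges)                          # edge pixel coordinates
      for j, theta in enumerate(thetas):
          rho = xs * np.cos(theta) + ys * np.sin(theta)   # sinusoid of each point
          i = np.round(rho).astype(int) + diag            # shift so the index is >= 0
          np.add.at(acc, (i, j), 1)                       # one vote per edge pixel
      return acc, thetas

Local maxima of the accumulator then correspond to significant lines x cos θ + y sin θ = ρ.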
Hough transform, problems

• The length and position of a line segment cannot be found.
• Collinear line segments cannot be distinguished from each other.
• How do we find local maxima in the parameter plane that correspond to connected line segments in the image plane?
  – A high threshold => short line segments are lost.
  – A low threshold => detection of non-connected pixels.
• Solution:
  – Local thresholding.
  – Traverse the pixels that correspond to a maximum in the parameter plane, and check that every gap < tolerance.

Hough transform, local gradient

• Given a gradient magnitude image g(x, y) that contains a line segment.
• Simple algorithm:
  for all g(xi, yi) > T do
    for all θ do
      ρ := xi cos θ + yi sin θ
      increment A(ρ, θ)
• SIMPLIFIED: Given the gradient magnitude g(x, y) and the gradient components gx and gy, or the gradient direction
  φg(x, y) = arctan(gy / gx)
• Algorithm:
  for all g(xi, yi) > T do
    ρ := xi cos(φg(x, y)) + yi sin(φg(x, y))
    increment A(ρ, φg(x, y))

Hough transform, circles

• Parameterization of a circle in the xy-plane:
  (x − a)² + (y − b)² = c²
• We thus get a 3-dimensional parameter space to search.
• Simple procedure, given an accumulator A[i, j, k]:
  set all A[a,b,c] := 0;
  for each point (x,y) where grad(x,y) > T begin
    for all possible a and b begin
      c = sqrt((x-a)**2 + (y-b)**2);
      A[a,b,c] := A[a,b,c] + 1;
    end;
  end;
• Other, more elegant suggestions?
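For completeness, a Python version (numpy assumed; `max_radius` and the function name are my own) of the simple circle-accumulation procedure above:

  import numpy as np

  def hough_circles(edges, max_radius):
      # edges: 2-D 0/1 array from edge detection + thresholding
      h, w = edges.shape
      acc = np.zeros((w, h, max_radius + 1), dtype=np.int64)        # A[a, b, c]
      aa, bb = np.meshgrid(np.arange(w), np.arange(h), indexing="ij")
      ys, xs = np.nonzero(edges)
      for x, y in zip(xs, ys):
          c = np.round(np.hypot(x - aa, y - bb)).astype(int)        # radius implied by centre (a, b)
          keep = c <= max_radius
          np.add.at(acc, (aa[keep], bb[keep], c[keep]), 1)          # one vote per candidate centre
      return acc

Maxima of A[a, b, c] give candidate circles. One possible answer to the closing question is to reuse the local gradient direction, as was done for lines, so that each edge pixel votes only along a single ray of candidate centres instead of over the whole (a, b)-plane.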