INF 5300, V-2004
Selected Themes from Digital Image Analysis
Lecture 10
Friday 28.05.2004
Repetition of Central Themes
Fritz Albregtsen
Department of Informatics
University of Oslo

Bi-level thresholding
• Histogram is assumed to be twin-peaked.
Let $P_1$ and $P_2$ be the a priori probabilities of background and foreground ($P_1 + P_2 = 1$).
The two distributions are given by $b(z)$ and $f(z)$.
The complete histogram is given by
$$p(z) = P_1 \cdot b(z) + P_2 \cdot f(z)$$
• The probabilities of mis-classifying a pixel, given a threshold $t$:
$$E_1(t) = \int_{-\infty}^{t} f(z)\,dz, \qquad E_2(t) = \int_{t}^{\infty} b(z)\,dz$$
• The total error is:
$$E(t) = P_1 \cdot \int_{t}^{\infty} b(z)\,dz + P_2 \cdot \int_{-\infty}^{t} f(z)\,dz$$
• Differentiate with respect to the threshold $t$:
$$\frac{\partial E}{\partial t} = 0 \;\Rightarrow\; P_1 \cdot b(T) = P_2 \cdot f(T)$$
INF 5300, 2004, Lecture 10, page 1 of 52
Bi-level thresholding
• For Gaussian distributions
$$\frac{P_1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(T-\mu_1)^2}{2\sigma_1^2}} = \frac{P_2}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(T-\mu_2)^2}{2\sigma_2^2}}$$
• Two thresholds may be necessary!
• If the variances are equal
$$T = \frac{(\mu_1 + \mu_2)}{2} + \frac{\sigma^2}{(\mu_1 - \mu_2)} \ln\frac{P_2}{P_1}$$
• If the a priori probabilities are also equal
$$T = \frac{(\mu_1 + \mu_2)}{2}$$

The method of Ridler and Calvard
• Initial threshold value, $t_0$, equal to average brightness.
• Threshold value for the $(k+1)$-th iteration given by
$$t_{k+1} = \frac{\mu_1(t_k) + \mu_2(t_k)}{2} = \frac{1}{2}\left[\frac{\sum_{z=0}^{t_k} z\,p(z)}{\sum_{z=0}^{t_k} p(z)} + \frac{\sum_{z=t_k+1}^{G-1} z\,p(z)}{\sum_{z=t_k+1}^{G-1} p(z)}\right]$$
• Note that $\mu_1(t)$ and $\mu_2(t)$ are the a posteriori mean values, estimated from overlapping and truncated distributions. The a priori $\mu_1$ and $\mu_2$ are unknown to us.
• The correctness of the estimated threshold depends on the extent of the overlap, as well as on the correctness of the $P_1 \approx P_2$ assumption.
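The Ridler and Calvard iteration can be sketched in a few lines of Python. This is a minimal sketch, assuming the histogram is a list of pixel counts indexed by gray level; the function name `ridler_calvard`, the truncation of the threshold to an integer gray level, and the stopping rule (stop when the threshold no longer changes) are illustrative assumptions, not part of the lecture.

```python
def ridler_calvard(hist, max_iter=100):
    """Iterative threshold selection on a gray-level histogram (list of counts)."""
    total = sum(hist)
    # t0: the average brightness, truncated to a gray level.
    t = int(sum(z * h for z, h in enumerate(hist)) / total)
    for _ in range(max_iter):
        n1 = sum(hist[: t + 1])          # pixels at or below the threshold
        n2 = total - n1                  # pixels above the threshold
        if n1 == 0 or n2 == 0:
            break                        # a class is empty, so its mean is undefined
        mu1 = sum(z * hist[z] for z in range(t + 1)) / n1
        mu2 = sum(z * hist[z] for z in range(t + 1, len(hist))) / n2
        t_next = int((mu1 + mu2) / 2)    # midpoint of the two a posteriori means
        if t_next == t:
            break                        # converged
        t = t_next
    return t


# Synthetic twin-peaked histogram with 16 gray levels (hypothetical data).
hist = [0] * 16
hist[2], hist[3], hist[4] = 10, 20, 10       # dark mode around level 3
hist[11], hist[12], hist[13] = 10, 20, 10    # bright mode around level 12
threshold = ridler_calvard(hist)
```

For this symmetric example the two class means are 3 and 12, so the iteration settles between the modes right away.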
The method of Otsu
• Maximizes the a posteriori between-class variance $\sigma_B^2(t)$, given by
$$\sigma_B^2(t) = P_1(t)\left[\mu_1(t) - \mu_0\right]^2 + P_2(t)\left[\mu_2(t) - \mu_0\right]^2$$
• The expression for $\sigma_B^2(t)$ reduces to
$$\sigma_B^2(t) = P_1(t)\mu_1^2(t) + P_2(t)\mu_2^2(t) - \mu_0^2 = \frac{\left[\mu_0 P_1(t) - \mu(t)\right]^2}{P_1(t)\left[1 - P_1(t)\right]}$$
where $\mu(t) = \sum_{z=0}^{t} z\,p(z) = P_1(t)\mu_1(t)$ is the cumulative first-order moment up to $t$.
• Optimal threshold $T$ is found by a sequential search for the maximum of $\sigma_B^2(t)$ for values of $t$ where $0 < P_1(t) < 1$.

The method of Reddi
• The method of Reddi et al. is based on the same assumptions as the method of Otsu, maximizing the a posteriori between-class variance $\sigma_B^2(t)$.
• We may write $\sigma_B^2 = P_1(t)\mu_1^2(t) + P_2(t)\mu_2^2(t) - \mu_0^2$
$$\sigma_B^2(t) = \frac{\left[\sum_{z=0}^{t} z\,p(z)\right]^2}{\sum_{z=0}^{t} p(z)} + \frac{\left[\sum_{z=t+1}^{G-1} z\,p(z)\right]^2}{\sum_{z=t+1}^{G-1} p(z)} - \mu_0^2$$
• Differentiating $\sigma_B^2$ and setting $\partial\sigma_B^2(t)/\partial t = 0$, we find a solution for
$$\left[\frac{\sum_{z=0}^{T} z\,p(z)}{\sum_{z=0}^{T} p(z)} + \frac{\sum_{z=T+1}^{G-1} z\,p(z)}{\sum_{z=T+1}^{G-1} p(z)}\right] = 2T$$
• Exhaustive sequential search gives same result as Otsu's method.
• Starting with a threshold $t_0 = \mu_0$, fast convergence is obtained equivalent to the ad hoc technique of Ridler and Calvard.
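The exhaustive search for the maximum between-class variance, shared by the methods of Otsu and Reddi et al., can be sketched as follows. This is a minimal sketch, assuming a histogram given as a list of counts; the name `otsu_threshold` and the use of integer cumulative counts (to test $0 < P_1(t) < 1$ exactly) are illustrative choices. The variance is evaluated as $[\mu_0 P_1(t) - \mu(t)]^2 / (P_1(t)[1 - P_1(t)])$, with $\mu(t)$ the cumulative first moment $\sum_{z \le t} z\,p(z)$.

```python
def otsu_threshold(hist):
    """Return (t, sigma_B^2(t)) for the t maximizing the between-class variance."""
    total = sum(hist)
    mu0 = sum(z * h for z, h in enumerate(hist)) / total   # global mean
    best_t, best_var = None, -1.0
    n1 = 0      # cumulative pixel count up to t
    s1 = 0      # cumulative sum of z * count up to t
    for t, h in enumerate(hist):
        n1 += h
        s1 += t * h
        if n1 == 0 or n1 == total:
            continue                      # only consider 0 < P1(t) < 1
        p1 = n1 / total                   # P1(t)
        m = s1 / total                    # cumulative first moment mu(t)
        var_b = (mu0 * p1 - m) ** 2 / (p1 * (1.0 - p1))
        if var_b > best_var:              # ties keep the lowest t
            best_t, best_var = t, var_b
    return best_t, best_var


# Synthetic twin-peaked histogram (hypothetical data).
hist = [0] * 16
hist[2], hist[3], hist[4] = 10, 20, 10
hist[11], hist[12], hist[13] = 10, 20, 10
t_otsu, var_otsu = otsu_threshold(hist)
```

With well separated modes the criterion is flat over the empty valley, so any threshold between the modes is equally good; this sketch simply reports the first one.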
A “minimum error” method
• Find $T$ that minimizes the KL distance between observed histogram and model distribution.
$$J(t) = 1 + 2\left[P_1(t)\ln\sigma_1(t) + P_2(t)\ln\sigma_2(t)\right] - 2\left[P_1(t)\ln P_1(t) + P_2(t)\ln P_2(t)\right]$$
• As $t$ varies, model parameters change. Compute $J(t)$ for all $t$; find minimum.
• The a posteriori model parameters will represent biased estimates. Correctness relies on small overlap. Improved estimates of parameters are possible.

Uniform error thresholding
• The uniform error threshold is given by (with $E_1$ and $E_2$ the mis-classification probabilities defined under bi-level thresholding)
$$E_1(t) = E_2(t)$$
• For a given threshold $t$, let
$p(t)$ = fraction of background pixels with gray level above $t$
$q(t)$ = fraction of object pixels with gray level above $t$.
• The uniform error threshold is then found when
$$p(t) = 1 - q(t)$$
or equivalently $\varphi - 1 = 0$, where $\varphi = p + q$.
• Find solution by using
$$\frac{b^2 - c}{a^2 - b} = \varphi^2$$
where
$$a = \alpha p + (1 - \alpha) q, \qquad b = \alpha p^2 + (1 - \alpha) q^2, \qquad c = \alpha p^4 + (1 - \alpha) q^4$$
and $\alpha(t)$ = the background area.
• In a single pass through the image, a table may be formed, giving estimates of $a$, $b$, $c$ for all values of $t$.
• Select gray level $t$ where $|\varphi - 1|$ is a minimum.
Entropy-based methods
• For two distributions separated by a threshold $t$ the sum of the two class Shannon entropies is
$$\psi(t) = -\sum_{z=0}^{t} \frac{p(z)}{P_1(t)} \ln\frac{p(z)}{P_1(t)} - \sum_{z=t+1}^{G-1} \frac{p(z)}{1 - P_1(t)} \ln\frac{p(z)}{1 - P_1(t)}$$
• Using
$$H_t = -\sum_{z=0}^{t} p(z)\ln(p(z)), \qquad H_G = -\sum_{z=0}^{G-1} p(z)\ln(p(z))$$
the sum of the two entropies may be written as
$$\psi(t) = \ln\left[P_1(t)(1 - P_1(t))\right] + \frac{H_t}{P_1(t)} + \frac{H_G - H_t}{1 - P_1(t)}$$
• The discrete value $T$ of $t$ which maximizes $\psi(t)$ is now the selected threshold.

Two-feature entropy
• For two distributions and a threshold pair $(s, t)$, where $s$ and $t$ denote gray level and average gray level, the entropies are
$$H_1(st) = -\sum_{i=0}^{s}\sum_{j=0}^{t} \frac{p_{ij}}{P_{st}} \ln\frac{p_{ij}}{P_{st}}, \qquad H_2(st) = -\sum_{i=s+1}^{G-1}\sum_{j=t+1}^{G-1} \frac{p_{ij}}{1 - P_{st}} \ln\frac{p_{ij}}{1 - P_{st}}$$
where
$$P_{st} = \sum_{i=0}^{s}\sum_{j=0}^{t} p_{ij}$$
• The sum of the two entropies is now
$$\psi(s, t) = H_1(st) + H_2(st) = \ln\left[P_{st}(1 - P_{st})\right] + \frac{H_{st}}{P_{st}} + \frac{H_{GG} - H_{st}}{1 - P_{st}}$$
where the total system entropy $H_{GG}$ and the partial entropy $H_{st}$ are given by
$$H_{GG} = -\sum_{i=0}^{G-1}\sum_{j=0}^{G-1} p_{ij}\ln(p_{ij}), \qquad H_{st} = -\sum_{i=0}^{s}\sum_{j=0}^{t} p_{ij}\ln(p_{ij})$$
• The discrete pair $(S, T)$ which maximizes $\psi(s, t)$ gives the threshold values which maximize the loss of entropy, and thereby the gain in information by introducing the two thresholds.
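The single-threshold entropy criterion $\psi(t) = \ln[P_1(t)(1 - P_1(t))] + H_t/P_1(t) + (H_G - H_t)/(1 - P_1(t))$ can be evaluated for every gray level and maximized by a sequential search. The following is a minimal sketch, assuming a histogram given as a list of counts; the names `entropy_threshold` and `partial_entropy` are hypothetical, and empty bins are skipped using the convention $0 \ln 0 = 0$.

```python
import math


def partial_entropy(probabilities):
    """-sum p ln p over the given bins, skipping empty bins (0 ln 0 = 0)."""
    return -sum(v * math.log(v) for v in probabilities if v > 0)


def entropy_threshold(hist):
    """Return (t, psi(t)) for the t maximizing the sum of class entropies."""
    total = sum(hist)
    p = [h / total for h in hist]
    h_g = partial_entropy(p)                   # H_G, entropy of the whole histogram
    best_t, best_psi = None, -math.inf
    n1 = 0
    for t in range(len(hist) - 1):
        n1 += hist[t]
        if n1 == 0 or n1 == total:
            continue                           # need 0 < P1(t) < 1
        p1 = n1 / total
        h_t = partial_entropy(p[: t + 1])      # H_t
        psi = math.log(p1 * (1.0 - p1)) + h_t / p1 + (h_g - h_t) / (1.0 - p1)
        if psi > best_psi:
            best_t, best_psi = t, psi
    return best_t, best_psi


# Synthetic twin-peaked histogram (hypothetical data).
hist = [0] * 16
hist[2], hist[3], hist[4] = 10, 20, 10
hist[11], hist[12], hist[13] = 10, 20, 10
t_entropy, psi_max = entropy_threshold(hist)
```

In this example each class has the normalized distribution (0.25, 0.5, 0.25), whose entropy is 1.5 ln 2, so the maximum of ψ is 3 ln 2.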
Exponential convex hull
• “Convex deficiency” is obtained by
subtracting the histogram from its convex
hull.
• This may work even if no “valley” exists.
• Upper concavity of histogram tail regions
can often be eliminated by considering
ln{p(z)} instead of the histogram p(z).
• In the ln{p(z)}-domain, upper concavities
are produced by bimodality or shoulders,
not by tail of normal or exponential,
nor by extension of histogram.
• Transform histogram p(z) by ln{p(z)},
compute convex hull, and transform
convex hull back to histogram domain by
he(k) = exp(h(k)).
• Threshold is found by sequential search for
maximum exponential convex hull
deficiency.
Texture Analysis Methods
• Statistical methods are often based on
accumulating second or higher order
statistics (matrices), and using feature
vectors that describe these probability
distributions directly, and therefore
describe the image texture only indirectly.
• Structural methods are based upon an
assumption that textures are composed of
texels which are regular and repetitive.
Both texels and placement rules have to be
described.
• Structural-statistical methods characterize
the texel by a feature vector and describe
the probability distribution of these
features statistically
Gray Level Cooccurrence Matrices
• How is the matrix constructed?
• What size has it?
• What order is it?
• How can we make the statistics isotropic?
• What does it look like?
• What role does the pixel distance parameter play?
• What do the different static GLCM features measure?
• How many - and which of them - should we use?

Gray Level Run Length
• How is the matrix constructed?
• What size has it?
• What order is it?
• How can we make the statistics isotropic?
• What does it look like?
• How can it be simplified?
• What is the relation to sum and difference histograms?
Generalized Cooccurrence Matrices
• Davis et al. (1979) introduced generalized matrices (GCM).
• GCM was based on local maxima of the gradient image of the texture.
• Cooccurrence of gradient magnitude and direction, using spatial constraint predicates instead of specific geometric distances.
• Could be “cooccurrence of anything”.

Cooccurrence of Gray Level Runs
• How can we combine the two methods?
• Can we produce adaptive - not static - features?
What is “shape”?
• A numerical description of the spatial configurations in the image.
• There is no generally accepted methodology of shape description.
• Location and description of high curvature points give essential information.
• Shape description of 2D planar objects is “easy”.
• Shape is defined in an image, but its usefulness in a 3D world depends on how well the 3D -> 2D mapping is handled.
• Invariance is an important issue.

Assumptions
• We have a segmented, labeled image.
• Each object that is to be described has been identified.
• The image objects can be represented as
— binary image (whole regions)
— contour (region boundaries)
— through a run length code
— through a chain code
— through a quad tree
— in cartesian coordinates
— as coefficients of some transform
— in polar coordinates
— in some other coordinates
— ...
Invariance of features
• Assume that we have an object, and that we want to extract some features to describe the object.
• We may wish that the features are:
— Position invariant: independent of the position of the object within the image.
— Scaling invariant: independent of the size of the object.
— Rotation invariant: independent of the orientation of the object.
— Warp invariant: independent of a deformation of the object.
• In most cases we want position invariant features.
• The others depend on the application.

Shape Feature
• Area from the number of pixels in the region.
• Area from boundary contour (Green’s theorem).
• Boundary from recursive splitting.
• Boundary from sequential polygonization.
• Perimeter from chain codes.
Statistical moments
The general form of a moment of order $(p + q)$, evaluating over the complete image plane $\xi$ is:
$$m_{pq} = \iint_{\xi} \psi_{pq}(x, y)\, f(x, y)\,dx\,dy \qquad (1)$$
where the weighting kernel or basis function is $\psi_{pq}$.
This produces a weighted description of $f(x, y)$ over $\xi$.
The choice of basis function depends on the application and on any desired invariant properties.

Non-orthogonal moments
The continuous two-dimensional $(p + q)$-th order Cartesian moment is defined as:
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x, y)\,dx\,dy \qquad (2)$$
It is assumed that $f(x, y)$ is a piecewise continuous, bounded function and that it can have non-zero values only in the finite region of the $xy$ plane. Then, moments of all orders exist and the uniqueness theorem holds:
The moment sequence $m_{pq}$ with basis $x^p y^q$ is uniquely defined by $f(x, y)$ and $f(x, y)$ is uniquely defined by $m_{pq}$.
Thus, the original image can be described and reconstructed, if sufficiently high order moments are used.
The discrete version of the Cartesian moment for an image consisting of pixels $P_{xy}$, is:
$$m_{pq} = \sum_{x=1}^{M}\sum_{y=1}^{N} x^p y^q P_{xy} \qquad (3)$$
$m_{pq}$ is a two dimensional Cartesian moment, where $M$ and $N$ are the image dimensions and the monomial product $x^p y^q$ is the basis function.
Low order moments
• The zero order moment $m_{00}$ is defined as the total mass (or power) of the image.
• For a binary $M \times N$ image of an object, this gives the number of pixels in the object.
$$m_{00} = \sum_{x=1}^{M}\sum_{y=1}^{N} P_{xy} \qquad (4)$$
• The two first order moments are used to find the Centre Of Mass (COM) of an image. If this is applied to a binary image and the results are normalised by $m_{00}$, then the result is the centre co-ordinates of the object.
$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (5)$$

Central moments
• The definition of a 2D discrete central moment is:
$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$$
where
$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}$$
• This corresponds to computing ordinary Cartesian moments after translating the object so that the center of mass is at the origin.
• This means that central moments are invariant under translation.
• Central moments are not scaling or rotation invariant.
Moments of inertia
• The two second order central moments
$$\mu_{20} = \sum_{x}\sum_{y} (x - \bar{x})^2 f(x, y), \qquad \mu_{02} = \sum_{x}\sum_{y} (y - \bar{y})^2 f(x, y)$$
correspond to the “moments of inertia” relative to the coordinate directions, while the cross moment of inertia is given by
$$\mu_{11} = \sum_{x}\sum_{y} (x - \bar{x})(y - \bar{y}) f(x, y)$$

Object orientation
• An elongated object having a random orientation will have moments of inertia that do not reflect the true shape of the object, as they are not orientation invariant.
• Orientation is defined as the angle (relative to the x-axis) of the axis through the center of mass that gives the lowest moment of inertia.
• Orientation, $\theta$, relative to x-axis is found by minimizing the sum
$$I(\theta) = \sum_{\alpha}\sum_{\beta} \left(\beta - \bar{\beta}\right)^2 f(\alpha, \beta)$$
where the rotated coordinates are given by
$$\alpha = x\cos\theta + y\sin\theta, \qquad \beta = y\cos\theta - x\sin\theta$$
• Orientation is then given by
$$\theta = \frac{1}{2}\tan^{-1}\left[\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right]$$
where $\theta \in [0, \pi/2]$ if $\mu_{11} > 0$, and $\theta \in [\pi/2, \pi]$ if $\mu_{11} < 0$.
• The three second order $\mu_{pq}$ can easily be made invariant to rotation.
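The central moments and the orientation formula can be tried out on small test images. This is a minimal sketch, assuming the image is a list of rows indexed as `img[y][x]` with pixel indices starting at 0; the function names are hypothetical. It uses `math.atan2` to resolve the quadrant of $\tan^{-1}[2\mu_{11}/(\mu_{20} - \mu_{02})]$, which returns $\theta$ in $(-\pi/2, \pi/2]$; that convention is an assumption of this sketch and differs from the $[0, \pi]$ range quoted in the lecture.

```python
import math


def central_moment(img, p, q):
    """mu_pq of a gray-level image given as a list of rows, img[y][x]."""
    m00 = sum(v for row in img for v in row)
    x_bar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def orientation(img):
    """0.5 * atan(2 mu11 / (mu20 - mu02)), evaluated with atan2 (assumed convention)."""
    mu11 = central_moment(img, 1, 1)
    mu20 = central_moment(img, 2, 0)
    mu02 = central_moment(img, 0, 2)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


bar = [[1, 1, 1, 1, 1]]                          # object elongated along x
shifted_bar = [[0] * 7, [0, 0, 1, 1, 1, 1, 1]]   # same object, translated
diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # object along the line y = x

theta_bar = orientation(bar)
theta_diagonal = orientation(diagonal)
mu20_bar = central_moment(bar, 2, 0)
mu20_shifted = central_moment(shifted_bar, 2, 0)
```

The angle is measured in image coordinates, where the y index grows downwards. The translated bar has the same $\mu_{20}$ as the original, which illustrates the translation invariance of central moments.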
Orientation invariant features
• The radius of gyration of an object
$$\hat{R} = \sqrt{\frac{\mu_{20} + \mu_{02}}{\mu_{00}}}$$
• The semimajor and semiminor axes of the object ellipse
$$(\hat{a}, \hat{b}) = \sqrt{\frac{2\left[\mu_{20} + \mu_{02} \pm \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}\right]}{\mu_{00}}}$$
• The numerical eccentricity of the ellipse
$$\epsilon = \sqrt{\frac{a^2 - b^2}{a^2}}$$

Normalization and invariants
• Changing the scale of $f(x, y)$ by $(\alpha, \beta)$ in the $(x, y)$-direction gives a new image
$$f'(x, y) = f(x/\alpha, y/\beta)$$
• The transformed central moments $\mu'_{pq}$ can be expressed by the original $\mu_{pq}$
$$\mu'_{pq} = \alpha^{1+p}\beta^{1+q}\mu_{pq}$$
• For $\beta = \alpha$ we have $\mu'_{pq} = \alpha^{2+p+q}\mu_{pq}$.
We get scaling invariant central moments by the normalization
$$\eta_{pq} = \frac{\mu_{pq}}{(\mu_{00})^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1, \qquad \forall (p + q) \ge 2.$$
Hu’s rotation invariance
1 Find principal axes of object, rotate coordinates.
This method can break down when images do not have unique principal axes.
2 The method of absolute moment invariants.
This is a set of seven combined normalized central moment invariants, which can be used for scale, position, and rotation invariant pattern identification.
$$\phi_1 = \eta_{20} + \eta_{02}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$
• $\phi_7$ is skew invariant, to help distinguish mirror images.
• These moments are of finite order; therefore, they do not comprise a complete set of image descriptors. However, higher order invariants can be derived.
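The rotation invariance of Hu's first two invariants can be checked numerically on a small binary shape. This is a minimal sketch, assuming an image stored as a list of rows `img[y][x]`; the function names and the L-shaped test object are hypothetical. Only an exact 90 degree rotation is tested here, since arbitrary rotations of a discrete image involve resampling and give only approximate invariance.

```python
def central_moment(img, p, q):
    """mu_pq of an image given as a list of rows, img[y][x]."""
    m00 = sum(v for row in img for v in row)
    x_bar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def eta(img, p, q):
    """Normalized central moment eta_pq = mu_pq / mu_00^gamma, gamma = (p+q)/2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma


def hu_phi1_phi2(img):
    """First two absolute moment invariants phi_1 and phi_2."""
    e20, e02, e11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4.0 * e11 ** 2


shape = [[1, 1, 1, 1],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
# Rotate by 90 degrees: reverse the row order, then transpose.
rotated = [list(row) for row in zip(*shape[::-1])]

phi_original = hu_phi1_phi2(shape)
phi_rotated = hu_phi1_phi2(rotated)
```

Under a 90 degree rotation $\eta_{20}$ and $\eta_{02}$ swap and $\eta_{11}$ changes sign, so both invariants are unchanged up to rounding.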
Orthogonal moments
• Moments produced using orthogonal basis sets have the advantage of needing lower precision to represent differences to the same accuracy as the monomials.
• The orthogonality condition simplifies the reconstruction of the original function from the generated moments.
• Orthogonality means mutually perpendicular: two functions $y_m$ and $y_n$ are orthogonal over an interval $a \le x \le b$ if and only if:
$$\int_{a}^{b} y_m(x)\,y_n(x)\,dx = 0; \qquad m \ne n$$
• Here we are primarily interested in discrete images, so the integrals within the moment descriptors are replaced by summations.
• Two such (well established) orthogonal moments are Legendre and Zernike.
Legendre moments
• The Legendre moments of order $(m + n)$:
$$\lambda_{mn} = \frac{(2m + 1)(2n + 1)}{4}\int_{-1}^{1}\int_{-1}^{1} P_m(x)P_n(y)f(x, y)\,dx\,dy \qquad (6)$$
where $m, n = 0, 1, 2, ..., \infty$, $P_m$ and $P_n$ are the Legendre polynomials, and $f(x, y)$ is the continuous image function.
• For orthogonality to exist in the moments, the image function $f(x, y)$ is defined over the same interval as the basis set, where the $n$-th order Legendre polynomial is defined as:
$$P_n(x) = \sum_{j=0}^{n} a_{nj}x^j \qquad (7)$$
and the Legendre coefficients are given by:
$$a_{nj} = (-1)^{(n-j)/2}\,\frac{1}{2^n}\,\frac{(n + j)!}{\left(\frac{n-j}{2}\right)!\left(\frac{n+j}{2}\right)!\,j!}, \qquad n - j = \text{even}. \qquad (8)$$
• For a discrete image with current pixel $P_{xy}$, the Legendre moments of order $(m + n)$ are given by
$$\lambda_{mn} = \frac{(2m + 1)(2n + 1)}{4}\sum_{x}\sum_{y} P_m(x)P_n(y)P_{xy} \qquad (9)$$

Complex Zernike moments
• The Zernike moment of order $m$ and repetition $n$ is
$$A_{mn} = \frac{m + 1}{\pi}\sum_{x}\sum_{y} f(x, y)\left[V_{mn}(x, y)\right]^{*}, \qquad \text{where } x^2 + y^2 \le 1$$
$m = 0, 1, 2, ..., \infty$; $f(x, y)$ is the image function, $*$ denotes the complex conjugate, and $n$ is an integer (positive or negative) depicting the angular dependence or rotation, subject to the conditions
$$m - |n| = \text{even}, \qquad |n| \le m$$
• The Zernike moments are projections of the input image onto a space spanned by the orthogonal $V$ functions
$$V_{mn}(x, y) = R_{mn}\,e^{jn\theta}$$
where $j = \sqrt{-1}$, and
$$R_{mn}(x, y) = \sum_{s=0}^{(m-|n|)/2} \frac{(-1)^s (x^2 + y^2)^{(m/2)-s}\,(m - s)!}{s!\left(\frac{m+|n|}{2} - s\right)!\left(\frac{m-|n|}{2} - s\right)!}$$
and $x, y$ are defined over the interval $[-1, 1]$.
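The radial sum in the Zernike basis is easy to evaluate directly. The following is a minimal sketch written in terms of the radius $r = \sqrt{x^2 + y^2}$, so that $(x^2 + y^2)^{(m/2)-s} = r^{m-2s}$; the function name `radial_poly` and the `ValueError` for invalid index pairs are illustrative assumptions.

```python
from math import factorial


def radial_poly(m, n, r):
    """Zernike radial polynomial R_mn(r) for |n| <= m and m - |n| even."""
    n = abs(n)
    if n > m or (m - n) % 2 != 0:
        raise ValueError("requires |n| <= m and m - |n| even")
    total = 0.0
    for s in range((m - n) // 2 + 1):
        numerator = (-1) ** s * factorial(m - s)
        denominator = (factorial(s)
                       * factorial((m + n) // 2 - s)
                       * factorial((m - n) // 2 - s))
        total += numerator / denominator * r ** (m - 2 * s)
    return total


r20 = radial_poly(2, 0, 0.5)   # closed form: 2 r^2 - 1
r31 = radial_poly(3, 1, 0.5)   # closed form: 3 r^3 - 2 r
r40_at_edge = radial_poly(4, 0, 1.0)
```

The low-order results agree with the closed forms $2r^2 - 1$ and $3r^3 - 2r$, and every valid $R_{mn}$ equals 1 on the edge of the unit disc.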
Orthogonal radial polynomial
• The Zernike polynomials $V_{mn}(x, y)$, expressed in polar coordinates are:
$$V_{mn}(r, \theta) = R_{mn}(r)\,e^{jn\theta}$$
where $(r, \theta)$ are defined over the unit disc and $R_{mn}$ is the orthogonal radial polynomial
$$R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} (-1)^s F(m, n, s, r)$$
where
$$F(m, n, s, r) = \frac{(m - s)!}{s!\left(\frac{m+|n|}{2} - s\right)!\left(\frac{m-|n|}{2} - s\right)!}\; r^{(m-2s)}$$
• The first radial polynomials are
$R_{00} = 1$, $R_{11} = r$
$R_{20} = 2r^2 - 1$, $R_{22} = r^2$
$R_{31} = 3r^3 - 2r$, $R_{33} = r^3$
• So for a discrete image, if $P(x, y)$ is the current pixel,
$$A_{mn} = \frac{m + 1}{\pi}\sum_{x}\sum_{y} P(x, y)\left[V_{mn}(x, y)\right]^{*}, \qquad x^2 + y^2 \le 1$$

Image reconstruction
• The image within the unit circle may be reconstructed to an arbitrary precision by
$$f(x, y) = \lim_{N \to \infty}\sum_{n=0}^{N}\sum_{m} A_{nm}V_{nm}(x, y)$$
where the second sum is taken over all $|m| \le n$, such that $n - |m|$ is even.
• The contribution of the Zernike moment of order $m$ to the reconstruction is
$$|I_m(x, y)| = \left|\sum_{n} A_{mn}V_{mn}(r, \theta)\right|$$
where $x^2 + y^2 \le 1$, $|n| \le m$ and $m - |n|$ is even.
• Gibbs phenomena may appear in the reconstructed object. This is caused by the inability of a continuous function to recreate a step function: no matter how many finite high order terms are used, an overshoot of the function will occur. Outside of the original area of a binary object, “ripples” of overshoot of the continuous function may be visible.

Contour description
• Suppose we have an object $S$ and that we are able to find the length of its contour.
• We partition the contour into $M$ segments of equal length, and thereby find $M$ equidistant points along the contour of $S$.
• The coordinates $(x, y)$ of these $M$ points are then put into a complex vector $f$
$$f(k) = x(k) + i\,y(k), \qquad k \in [0, M - 1]$$
• We view the x-axis as the real axis and the y-axis as the imaginary one for a sequence of complex numbers (Granlund 1972).
• The description of the object contour is changed, but all the information is preserved.
• And we have transformed the contour problem from 2D to 1D.
Fourier-coefficients
• We perform a forward Fourier transform
$$F(u) = \frac{1}{M}\sum_{k=0}^{M-1} f(k)\exp\left(\frac{-2\pi i u k}{M}\right)$$
for $u \in [0, M - 1]$.
• $F(0)$ now contains the center of mass of the object, and the coefficients $F(1), F(2), F(3), ..., F(M - 1)$ will describe the object in increasing detail.
• These features depend on rotation, scaling and starting point on the contour.
• We do not want to use all coefficients as features, but terminate at $F(N)$, $N < M$.
• This corresponds to setting
$$F(k) = 0, \qquad k > N - 1$$
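The forward transform of a contour, and the effect of zeroing all coefficients with index above $N - 1$, can be sketched with the standard library only. This is a minimal sketch, assuming the contour is given as a list of `(x, y)` points that are already equidistant along the boundary; the function names and the square test contour are hypothetical.

```python
import cmath


def fourier_descriptors(points):
    """F(u) = (1/M) sum_k f(k) exp(-2 pi i u k / M), with f(k) = x(k) + i y(k)."""
    m = len(points)
    f = [complex(x, y) for x, y in points]
    return [sum(f[k] * cmath.exp(-2j * cmath.pi * u * k / m) for k in range(m)) / m
            for u in range(m)]


def reconstruct(coefficients, n_used):
    """Inverse transform using only F(0) .. F(n_used - 1); the rest are treated as zero."""
    m = len(coefficients)
    return [sum(coefficients[u] * cmath.exp(2j * cmath.pi * u * k / m)
                for u in range(n_used))
            for k in range(m)]


# Eight equidistant points on the boundary of a 2 x 2 square (hypothetical contour).
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
F = fourier_descriptors(square)
full = reconstruct(F, len(F))      # all coefficients: recovers the contour
coarse = reconstruct(F, 1)         # only F(0): every point collapses to the centroid
```

`F[0]` is the mean of the contour points, here 1 + 1i, and using all M coefficients returns the original points up to rounding.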
Approximation
• When transforming back, we get an approximation to the original contour
$$\hat{f}(k) = \sum_{u=0}^{N-1} F(u)\exp\left(\frac{2\pi i u k}{M}\right)$$
defined for $k \in [0, M - 1]$.
• We have only used $N$ features to reconstruct each component of $\hat{f}(k)$, but $k$ still runs from 0 to $M - 1$.
• The number of points in the approximation is the same ($M$), but the number of coefficients (features) used to reconstruct each point is smaller ($N < M$).
• The first 10 – 15 descriptors are found to be sufficient for character description.
• The Fourier descriptors can be invariant to translation and rotation if the co-ordinate system is appropriately chosen.

Why “CBIR”?
• Large databases of digital images are accessible.
— high volumes produced by scanners and digital cameras
— larger storage capacities for lower costs
— easy access to enormous image volumes via internet
• Manual indexing by keywords is
— very time consuming
— unrewarding
— unlikely to specify all aspects of image
Query characteristics
• Queries formulated by combinations of low-level image features such as color, texture and shape.
• Specified explicitly by feature values or by feature range.
• Implicit specification by example.
• Spatial organization of features, giving absolute or relative location.
• Relevance feedback: allow user to refine search by indicating relevance of returned images.
Problems
• What features will generally describe the
content of an image well?
• How to summarize the distribution of
these features over an image?
• How to measure the dissimilarity between
distributions of features?
• How to effectively display the results of a
search?
• How to browse images of a database in an
intuitive and efficient way?
Selecting features
• The focus is often on color.
• The distribution of colors within the image is often a useful clue to the content of the image.
• Absolute or relative locations of different color distributions improve result.
• One has to select some color representation
— color space (e.g. RGB, IHS, Lab, ...)
— representation of distribution
• While color is a single-pixel property, texture describes the appearance of bigger regions.
— Statistical methods
— Structural methods
— MRF methods
— Filter-based methods
• For both color and texture, one has to select features that relate to perceptual similarity.

Distances and metrics
• A space is called a metric space if for any of its two elements $x$ and $y$, there is a number $\rho(x, y)$, called the distance, that satisfies the following properties
— $\rho(x, y) \ge 0$ (non-negativity)
— $\rho(x, y) = \rho(y, x)$ (symmetry)
— $\rho(x, y) = 0$ if and only if $x = y$ (identity)
— $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ ($\Delta$ inequality)
• Distances between two points $x$ and $\mu$ in $n$-dimensional space
1) Euclidean
$$D_E(x, \mu) = \|x - \mu\| = \left[\sum_{k=1}^{n} (x_k - \mu_k)^2\right]^{1/2}$$
2) “City block” / “Taxi” / “Absolute value”
$$D_4(x, \mu) = \sum_{k=1}^{n} |x_k - \mu_k|$$
3) “Chessboard” / “Maximum value”
$$D_8(x, \mu) = \max_k |x_k - \mu_k|$$
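The three point distances can be compared on a single pair of points. This is a minimal sketch with hypothetical function names, assuming points are given as equal-length sequences of numbers.

```python
import math


def euclidean(x, mu):
    """D_E: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def city_block(x, mu):
    """D_4: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, mu))


def chessboard(x, mu):
    """D_8: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, mu))


x, mu = (0, 0), (3, 4)
d_e = euclidean(x, mu)     # 5.0
d_4 = city_block(x, mu)    # 7
d_8 = chessboard(x, mu)    # 4
```

For any pair of points the ordering $D_8 \le D_E \le D_4$ holds, which the example values 4, 5 and 7 illustrate.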
Bin-by-bin dissimilarity
The distance between two distributions.
Useful when comparing e.g. histograms in image search and retrieval.
• Minkowski distance:
$$d_{L_p}(H, K) = \left(\sum_{i} |h_i - k_i|^p\right)^{1/p}$$
$L_1$ often used to compute dissimilarity between color images.
$L_2$ and $L_\infty$ often used for texture dissimilarity.
$L_1$-based retrieval may give many false negatives, as neighboring bins are not considered.
• Histogram intersection:
$$d_{\cap}(H, K) = 1 - \frac{\sum_{i}\min(h_i, k_i)}{\sum_{i} k_i}$$
Attractive because it handles partial matches when area of one histogram is smaller than the other.
When areas are equal, it is equivalent to normalized $L_1$ distance.
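The Minkowski distance and the histogram intersection can be sketched as follows. This is a minimal sketch with hypothetical function names, assuming histograms are equal-length lists of non-negative numbers; passing `math.inf` as `p` selects the $L_\infty$ (maximum) distance. The example also checks the stated equivalence with the normalized $L_1$ distance, taken here as $\sum_i |h_i - k_i| / (2\sum_i k_i)$ for histograms of equal area.

```python
import math


def minkowski(h, k, p):
    """L_p distance between two histograms; p = math.inf gives the maximum difference."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(h, k))
    return sum(abs(a - b) ** p for a, b in zip(h, k)) ** (1.0 / p)


def histogram_intersection(h, k):
    """1 - sum_i min(h_i, k_i) / sum_i k_i."""
    return 1.0 - sum(min(a, b) for a, b in zip(h, k)) / sum(k)


H = [1, 2, 3]
K = [3, 2, 1]            # same total area as H
d_l1 = minkowski(H, K, 1)
d_l2 = minkowski(H, K, 2)
d_linf = minkowski(H, K, math.inf)
d_int = histogram_intersection(H, K)
normalized_l1 = d_l1 / (2 * sum(K))
```

Here the overlap is 4 out of an area of 6, so the intersection dissimilarity is 1/3, the same value as the normalized $L_1$ distance.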
Bin-by-bin dissimilarity - II
• Kullback-Leibler divergence:
$$d_{KL}(H, K) = \sum_{i} h_i \log\frac{h_i}{k_i}$$
Measures how inefficient it would be to code one histogram using the other as code-book.
Non-symmetric, and sensitive to binning.
• Jeffrey divergence:
$$d_J(H, K) = \sum_{i}\left(h_i \log\frac{h_i}{m_i} + k_i \log\frac{k_i}{m_i}\right), \qquad m_i = \frac{h_i + k_i}{2}$$
Is a modification of K-L; symmetric and more robust to noise and binning.
• $\chi^2$ statistics:
$$d_{\chi^2}(H, K) = \sum_{i}\frac{(h_i - m_i)^2}{m_i}$$
Measures how unlikely it is that one distribution was drawn from the population represented by the other.
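The three measures can be compared on a pair of normalized histograms. This is a minimal sketch with hypothetical function names; it assumes the convention $0 \log 0 = 0$ for empty bins and returns infinity for K-L when a bin is populated in H but empty in K, both of which are assumptions of this sketch rather than statements from the lecture.

```python
import math


def kl_divergence(h, k):
    """sum_i h_i log(h_i / k_i), with 0 log 0 = 0 (assumed convention)."""
    total = 0.0
    for a, b in zip(h, k):
        if a == 0:
            continue
        if b == 0:
            return math.inf      # H has mass where K has none
        total += a * math.log(a / b)
    return total


def jeffrey_divergence(h, k):
    """Symmetric variant of K-L, measured against the mean histogram m."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total


def chi_square(h, k):
    """sum_i (h_i - m_i)^2 / m_i, skipping bins that are empty in both histograms."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if m > 0:
            total += (a - m) ** 2 / m
    return total


H = [0.5, 0.5]
K = [0.25, 0.75]
kl_hk, kl_kh = kl_divergence(H, K), kl_divergence(K, H)
jd_hk, jd_kh = jeffrey_divergence(H, K), jeffrey_divergence(K, H)
chi2 = chi_square(H, K)
```

The two K-L values differ, while the Jeffrey divergence gives the same value in both directions.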
Drawbacks of bin-by-bin
• Compares contents of corresponding histogram bins $h_i$ and $k_i$ for all $i$, but not $h_i$ and $k_j$ for $i \ne j$.
• K-L is justified by information theory, and $\chi^2$ by statistics, but they do not necessarily match perceptual similarity well.
• This can be fixed by using correspondences between bins, and the cross-bin distance.
• Bin-by-bin is sensitive to bin size. Coarse binning may not give sufficient discrimination. Too fine binning may place similar features in different bins.
• Cross-bin dissimilarity measures always yield better results when bins get smaller.
• We need a cross-bin distance. Cross-bin distances use the ground distance $d_{ij}$, defined as the distance between the representative features for bin $i$ and bin $j$.

Cross-bin measures
• Quadratic-form distance
$$d_A(H, K) = \sqrt{(h - k)^T A\,(h - k)}$$
where $h$ and $k$ are vectors listing all the entries in $H$ and $K$. This is used for color in QBIC.
• Cross-bin information comes in via a similarity matrix
$$A = [a_{ij}], \qquad \text{where } a_{ij} = 1 - \frac{d_{ij}}{d_{max}}$$
With this choice, it can be shown that $d_A$ is a metric.
• Quadratic-form distance may give false positives, as it will overestimate similarity of (color) distributions without a pronounced mode.
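The way the similarity matrix brings in cross-bin information can be seen on a three-bin example. This is a minimal sketch, assuming one-dimensional bins with ground distance $d_{ij} = |i - j|$; the function name and the test histograms are hypothetical, and the clamp to zero before the square root only guards against rounding.

```python
import math


def quadratic_form_distance(h, k, ground):
    """sqrt((h-k)^T A (h-k)) with a_ij = 1 - d_ij / d_max."""
    d_max = max(max(row) for row in ground)
    diff = [a - b for a, b in zip(h, k)]
    n = len(diff)
    q = sum(diff[i] * (1.0 - ground[i][j] / d_max) * diff[j]
            for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))


# Ground distance between bin centres of a 1-D, three-bin histogram (assumed).
ground = [[abs(i - j) for j in range(3)] for i in range(3)]

H = [1, 0, 0]
K_near = [0, 1, 0]     # all mass moved to the neighbouring bin
K_far = [0, 0, 1]      # all mass moved two bins away
d_near = quadratic_form_distance(H, K_near, ground)
d_far = quadratic_form_distance(H, K_far, ground)
```

A bin-by-bin $L_1$ or $L_2$ distance rates `K_near` and `K_far` as equally far from `H`; the quadratic form gives 1 and √2, so the neighbouring bin counts as more similar.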
Cross-bin measures - II
• 1-D match distance
$$d_M(H, K) = \sum_{i} |\hat{h}_i - \hat{k}_i|$$
where $\hat{h}_i = \sum_{j \le i} h_j$ is the cumulative histogram of $\{h_i\}$, and similarly for $\{k_i\}$.
• The match distance is the $L_1$ distance between the cumulative histograms.
• For histograms having equal areas, this is a special case of the EMD (later).
• The 1-D match distance does not extend to higher dimensions, because the relation $j \le i$ is not a total ordering in more than one dimension.
• Match distance may be extended to multi-dimensional histograms by graph matching.

Cross-bin measures - III
• Kolmogorov-Smirnov statistics
$$d_{KS}(H, K) = \max_{i}\left(|\hat{h}_i - \hat{k}_i|\right)$$
where $\hat{h}_i$ and $\hat{k}_i$ are cumulative histograms.
• K-S statistics is defined on the cumulative distributions, so that no binning is actually required.
• Under the null hypothesis (data drawn from same distribution), the distribution of the statistics can be calculated, giving the significance of the result.
• Similar to match distance, it is defined only for one dimension.
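Both cumulative-histogram measures follow from a running sum. This is a minimal sketch with hypothetical function names, assuming one-dimensional histograms of equal length and equal area.

```python
from itertools import accumulate


def match_distance(h, k):
    """L1 distance between the cumulative histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))


def kolmogorov_smirnov(h, k):
    """Largest absolute difference between the cumulative histograms."""
    return max(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))


H = [1, 0, 0]
K_near = [0, 1, 0]     # mass shifted by one bin
K_far = [0, 0, 1]      # mass shifted by two bins
m_near, m_far = match_distance(H, K_near), match_distance(H, K_far)
ks_near, ks_far = kolmogorov_smirnov(H, K_near), kolmogorov_smirnov(H, K_far)
```

The match distance grows with the size of the shift (1 versus 2), whereas the K-S statistic is 1 in both cases because it records only the largest cumulative difference.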
Earth Mover’s Distance (EMD)
• One of several measures of the minimum cost of matching elements between two histograms.
• Given two distributions, one seen as piles of earth in feature space, the other as a collection of holes in the same space, we need to solve the transportation problem, finding the least amount of work needed to fill the holes with earth.
• The Monge-Kantorovich mass transfer problem (1781). This distance was first used in computer vision by Werman, Peleg and Rosenfeld 1985.
• EMD applies to histograms and signatures in any dimensions.
• It allows for partial matches.
• Generally: Solve linear optimization problem.
• If ground distance is a metric and total weights of signatures are equal, the EMD is a true metric.

Special case of EMD
• Minimum cost distance between two one-dimensional distributions $f(t)$ and $g(t)$ is the $L_1$ distance between the cumulative distribution functions
$$\int_{-\infty}^{\infty}\left|\int_{-\infty}^{x} f(t)\,dt - \int_{-\infty}^{x} g(t)\,dt\right| dx$$
• If feature space is one-dimensional, ground distance is $d(p_i, q_j) = |p_i - q_j|$, and the total weights of the two signatures are equal:
$$\psi(P, Q) = \sum_{k=1}^{m+n-1} |\hat{p}_k - \hat{q}_k|\,(r_{k+1} - r_k)$$
where $r_1, r_2, ..., r_{m+n}$ is the sorted list $p_1, p_2, ..., p_m, q_1, q_2, ..., q_n$, and
$$\hat{p}_k = \sum_{i=1}^{m} [p_i \le r_k]\, w_{p_i}, \qquad \hat{q}_k = \sum_{j=1}^{n} [q_j \le r_k]\, w_{q_j}$$
where $[\cdot]$ is 1 when its argument is true, and 0 otherwise.
Here $P = \{(p_1, w_{p_1}), ..., (p_m, w_{p_m})\}$ is the first signature with $m$ clusters, where $p_i$ is the cluster representative and $w_{p_i}$ is the weight of the cluster;
$Q = \{(q_1, w_{q_1}), ..., (q_n, w_{q_n})\}$ is the second signature with $n$ clusters;
and $D = [d_{i,j}]$ is a ground distance matrix where $d_{i,j}$ is the ground distance between clusters $p_i$ and $q_j$.
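The one-dimensional closed form $\psi(P, Q)$ can be computed without a linear-programming solver. This is a minimal sketch, assuming each signature is a list of `(position, weight)` pairs and that both signatures have the same total weight; the function name `emd_1d` and the example signatures are hypothetical.

```python
def emd_1d(P, Q):
    """Sum over sorted positions r_k of |p_hat_k - q_hat_k| * (r_{k+1} - r_k)."""
    r = sorted([p for p, _ in P] + [q for q, _ in Q])
    total = 0.0
    for k in range(len(r) - 1):
        p_hat = sum(w for p, w in P if p <= r[k])   # cumulative weight of P up to r_k
        q_hat = sum(w for q, w in Q if q <= r[k])   # cumulative weight of Q up to r_k
        total += abs(p_hat - q_hat) * (r[k + 1] - r[k])
    return total


# One unit of mass moved a distance of 3.
emd_shift = emd_1d([(0, 1.0)], [(3, 1.0)])

# Two half-unit piles at 0 and 2, one unit-sized hole at 1.
emd_split = emd_1d([(0, 0.5), (2, 0.5)], [(1, 1.0)])
```

In the second example each half unit travels a distance of 1, so the total work is 1, which the cumulative formula reproduces.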
Comparing Dissimilarity
• A meaningful quality measure must be
defined.
• Image retrieval is measured by precision,
which is the number of relevant images
retrieved relative to the number of
retrieved images, and recall, which is the
number of relevant images retrieved,
relative to the total number of relevant
images in the database.
• The relative importance of good recall vs.
good precision differs according to the task
at hand.
• Performance comparisons should account
for the variety of parameters that can affect
the behaviour of each measure used.
• Difference between feature-by-feature
approach, and a systems approach.
• Processing steps that affect performance
independently should be evaluated
separately, to lower complexity and
heighten insight.
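Precision and recall, as defined in the list above, reduce to two ratios over sets of image identifiers. This is a minimal sketch with hypothetical function names and made-up identifiers, assuming both the retrieved set and the set of relevant images are non-empty.

```python
def precision(retrieved, relevant):
    """Relevant images retrieved, relative to the number of retrieved images."""
    return len(set(retrieved) & set(relevant)) / len(set(retrieved))


def recall(retrieved, relevant):
    """Relevant images retrieved, relative to all relevant images in the database."""
    return len(set(retrieved) & set(relevant)) / len(set(relevant))


retrieved_ids = {1, 2, 3, 4}          # what the query returned (hypothetical)
relevant_ids = {2, 4, 5, 6, 7, 8}     # ground truth for the query (hypothetical)
query_precision = precision(retrieved_ids, relevant_ids)
query_recall = recall(retrieved_ids, relevant_ids)
```

Two of the four retrieved images are relevant, giving a precision of 0.5, while only two of the six relevant images were found, giving a recall of 1/3.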
Some general results
• Bin-by-bin dissimilarity measures improve
by increasing number of bins up to a point,
then performance degrades.
• Cross-bin dissimilarity measures perform
better.
• Signatures carry less information than
histograms, but perform better.
• Jeffrey divergence and χ2 statistics give
almost identical results.
• In color space, the L2 distance, by
construction, matches the perceptual
similarity between colors.
• In histogram space, L1 is better than L2,
which is better than L∞
• Ground truth should be available.