INF 269
Digital Image Analysis
Lecture 1
27/8 2002
The digitized image and its properties

Fritz Albregtsen
Department of Informatics
University of Oslo

Introduction

• We make a distinction between image processing and image analysis.
• The result of image processing is usually a new (and better) image, or a set of coefficients that through a suitable transform will produce a new image.
— Image enhancement
— Image restoration
— Image compression
— Image coding
• Image analysis extracts information and performs quantitative measurements.
— Image analysis should be able to handle a complex scene.
— Pattern recognition is limited to classification into an a priori described set of classes.
Imaging and image functions
• The world around us is 3D.
• Monocular projections give 2D images.
• We use the third dimension to display
intensity (2½D).
• Information about depth is lost.
• Stereo images are two projections
from different viewpoints.
• Finding corresponding points,
we may recover depth information.
• Holograms are another source
of depth and intensity information.
• 2D images are not always projections:
E.g. 2D slices in medical microscopy.
• Series of physical or non-invasive 2D slices
give 3D volumetric images.
Digitalization

• A continuous real world scene may be modeled by a function having continuous domain and range.
• Putting a (sampling) grid on the projected
image, the model domain becomes
discrete.
• Scaling and discretizing the range to fit the
digital sampling and storage, we have a
digital representation of the scene.
• The sampling points in the grid
correspond to pixels.
• The set of pixels covers the entire image,
regardless of the physical size of the
individual detectors.
• Pixels have an implicit position, given by their location in an array, and a pixel value: the measured intensity.
• A pixel may also be referred to as a point,
and the pixel value as the image function
value at that point.
The dimensions of digital images
• A digital image may be e.g. a
5-dimensional discrete function
f [x, y, z, t, λ]
where
— x and y are the first two dimensions,
representing the image plane
— z is the third dimension representing
volume
— t is the time dimension in image
sequences (motion pictures, video)
— λ is the wavelength (frequency)
dimension, e.g. RGB in colour images.
• Unless otherwise stated, we will assume
that f is 2D, i.e. f (x, y).
• A scalar function may describe a
monochromatic image.
• Vectors may describe a three-component
colour image.
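To make the dimensions concrete, here is a minimal numpy sketch of f[x, y, z, t, λ] as a 5-dimensional array; the array sizes are arbitrary examples, not anything fixed by the definition above:

```python
import numpy as np

# The general discrete image function f[x, y, z, t, lambda] as a 5D array:
# here a tiny 2-frame RGB volume of 4 slices (all sizes are arbitrary).
f = np.zeros((256, 256, 4, 2, 3), dtype=np.uint8)  # (x, y, z, t, wavelength)

# A 2D gray level image f(x, y) is the special case with the other
# dimensions collapsed:
f2d = f[:, :, 0, 0, 0]
print(f.ndim, f2d.shape)   # 5 (256, 256)
```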
Quantization
• The digital image function value is discretized by quantization into 2^b gray levels, ranging from 0 = black to 2^b − 1 = white.
• Often, gray level images are represented with b = 8 (= 1 byte) per pixel, giving 256 different gray levels.
• Colour images are often stored with b = 8 per colour, i.e. 3 bytes per pixel.
• Medical and other scientific gray level images may be stored with e.g. 2^16 = 65536 gray levels.
• Our visual system may handle ≈ 50 < 2^6 distinguishable gray levels.
• The range of light intensity levels that our visual system can adapt to is enormous, on the order of 10^10.
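As a small sketch (assuming a float image already normalized to [0, 1]), quantization to 2^b levels might look like this:

```python
import numpy as np

def quantize(image, b):
    """Quantize a float image in [0, 1] to 2**b gray levels (0 ... 2**b - 1)."""
    levels = 2 ** b
    # Scale to [0, levels - 1], round to the nearest level, clip to be safe.
    return np.clip(np.round(image * (levels - 1)), 0, levels - 1).astype(np.uint16)

# Example: requantize a smooth ramp to 8 and to 2 bits.
ramp = np.linspace(0.0, 1.0, 256).reshape(1, -1)
print(quantize(ramp, 8).max())        # 255
print(np.unique(quantize(ramp, 2)))   # [0 1 2 3]
```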
Spatial, temporal and
wavelength discretization
The dimensions must also be discretized:
• The spatial dimensions x and y are often a
power of 2 (e.g. 512 × 512, 1024 × 1024, ...).
• The resolution in the z-direction
is often lower than in x and y
(e.g. confocal microscopy).
• The resolution in the t-dimension
may vary greatly:
— In video it is a fixed rate per second.
— In medical MR it may be governed by the
heart rate.
— It may also be variable, e.g. triggered by
change (motion) detection.
• The resolution in wavelength is governed
by the problem at hand, and the available
technology.
— RGB-cameras
— LANDSAT: 7 channels
— Hyperspectral: more than 100 bands
Sampling and convolution
• The convolution of two 2D functions f and h is given by

g(x, y) = f(x, y) ∗ h(x, y)
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(a, b) h(x − a, y − b) da db
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x − a, y − b) h(a, b) da db

• Sampling of a 2D projected continuous image f(x, y) at a given location (λ, µ) may be described by a convolution

f(λ, µ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(x − λ, y − µ) dx dy

where the Dirac distribution is given by

∫_{−∞}^{∞} ∫_{−∞}^{∞} δ(x, y) dx dy = 1,   δ(x, y) = 0 ∀ (x, y) ≠ (0, 0)
• The complete image is then a linear
combination of such convolutions, one for
each point in the grid.
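A didactic Python sketch of the discrete counterpart of this convolution; the zero padding outside the image is an implementation assumption, not part of the definition above:

```python
import numpy as np

def convolve2d(f, h):
    """Direct 'same'-size 2D convolution g(x, y) = sum_a sum_b f(a, b) h(x - a, y - b).

    A didactic sketch with zero padding outside the image; not optimized.
    """
    M, N = f.shape
    m, n = h.shape
    pad_y, pad_x = m // 2, n // 2
    fp = np.pad(f, ((pad_y, pad_y), (pad_x, pad_x)))
    h_flipped = h[::-1, ::-1]  # convolution flips the kernel (correlation does not)
    g = np.empty_like(f, dtype=float)
    for y in range(M):
        for x in range(N):
            g[y, x] = np.sum(fp[y:y + m, x:x + n] * h_flipped)
    return g

# Example: a 3x3 mean filter as the kernel h.
f = np.random.rand(8, 8)
h = np.ones((3, 3)) / 9.0
print(convolve2d(f, h).shape)  # (8, 8)
```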
Real sampling
• Actually, the 2D continuous image is formed by a convolution of the ideal projection with the (symmetric) PSF of e.g. the camera aperture and lens system

h₀(r) = 1/(1 − ε²)² · [ 2J₁(πr/β₀)/(πr/β₀) − ε² · 2J₁(πεr/β₀)/(πεr/β₀) ]²

where β₀ = λ/D, ε ∈ [0, 1], D = diameter of the aperture, and ε = relative diameter of any central obscuration.
• This continuous image is then sampled by finite detectors (area A) in the focal plane. So actually, a pixel value is given by an integration

g(x, y) = ∫_{A(x,y)} f(x, y) ∗ h₀(x, y)
• Convolution is a linear
translation-invariant operation.
• The translation has to be small for this to
be true, because of the finite support.
• If noise is absent, the PSF may be inverted, and a sharp(ened) image obtained. Noise makes the problem non-trivial!
• The integration over the detector surface is not possible to invert.
Fourier transform pair
• To investigate image properties (analysis)
and to filter images (image processing)
we may decompose the image function
using a linear combination of orthonormal
functions.
• The Fourier transform uses harmonic functions (sine and cosine) to decompose an assumed periodic function f(x, y). (Are all images periodic?)
• Let f(x, y) be a continuous function of the real variables (x, y). The 2D Fourier transform F(f(x, y)) = F(u, v) is defined by

F(f(x, y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−2πi(xu+yv)} dx dy

• Given F(u, v), the image function f(x, y) can be obtained by the inverse Fourier transform

F⁻¹(F(u, v)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(u, v) e^{2πi(xu+yv)} du dv
Fourier transform
• F(u, v) is generally complex:

F(u, v) = ℜ(u, v) + i·ℑ(u, v) = |F(u, v)| e^{iφ(u,v)}

ℜ(u, v) is the real component of F(u, v).
ℑ(u, v) is the imaginary component of F(u, v).

|F(u, v)| = [ℜ²(u, v) + ℑ²(u, v)]^{1/2}

is the magnitude function, also called the Fourier spectrum.

φ(u, v) = tan⁻¹[ℑ(u, v)/ℜ(u, v)]

is the phase angle.
• So, F (u, v) can be interpreted as the
weights in a summed linear combination
of simple periodic patterns over a range of
frequencies (u, v) that will produce the
image f (x, y).
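A small numpy sketch of magnitude and phase (np.angle uses atan2 internally, which handles the quadrants that a bare tan⁻¹ would miss); the image here is just random test data:

```python
import numpy as np

# Magnitude (Fourier spectrum) and phase of a 2D DFT.
f = np.random.rand(64, 64)          # any gray level image as a float array
F = np.fft.fft2(f)                  # complex F(u, v)
magnitude = np.abs(F)               # |F(u, v)| = sqrt(Re^2 + Im^2)
phase = np.angle(F)                 # phi(u, v) = atan2(Im, Re)

# The image is recovered from magnitude and phase together:
f_back = np.fft.ifft2(magnitude * np.exp(1j * phase)).real
print(np.allclose(f, f_back))       # True
```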
Discrete Fourier transform
• When f(x, y) is a discrete image function, the Fourier transform becomes

F(u, v) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) e^{−2πi(ux/M + vy/N)}

• The inverse Fourier transform is then

f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) e^{2πi(ux/M + vy/N)}

• The sampling increments in the spatial and the frequency domains are related by

Δx = 1/(M Δu)   and   Δy = 1/(N Δv)
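A sketch relating the definition above to numpy's FFT; note that these slides put the 1/(MN) factor on the forward transform while np.fft puts it on the inverse, so we rescale to match:

```python
import numpy as np

# The DFT pair above via numpy's FFT, rescaled to the 1/(MN)-forward convention.
f = np.random.rand(4, 8)
M, N = f.shape

F = np.fft.fft2(f) / (M * N)          # forward, with the 1/(MN) factor
f_back = np.fft.ifft2(F) * (M * N)    # inverse, without numpy's own 1/(MN)
print(np.allclose(f, f_back.real))    # True
```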
Fourier transform properties

• Linearity
• Separability
• Translation
• Similarity
• Periodicity
• Symmetry
• Rotation
• Duality of convolution

Linearity, Separability, Translation

• Linearity

F[af₁(x, y) + bf₂(x, y)] = aF₁(u, v) + bF₂(u, v)

Real images are not strictly linear. They have limited size, and the number of gray levels is finite.

• Separability

For images with M = N, F(u, v) or f(x, y) can be obtained by 2 successive applications of the simple 1D Fourier transform or its inverse.

• Translation

F[f(x − a, y − b)] = F(u, v) e^{−2πi(au+bv)}

A shift of the origin in the image domain does not affect the magnitude function |F(u, v)|, only the phase angle φ(u, v).

• Similarity

F[f(ax, by)] = |ab|^{−1} F(u/a, v/b)
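A quick numerical check of the translation property; the circular shift is consistent with the periodicity the DFT assumes:

```python
import numpy as np

# Shifting the image changes only the phase of F(u, v), not its magnitude.
f = np.random.rand(32, 32)
f_shifted = np.roll(f, shift=(3, 5), axis=(0, 1))  # circular shift (a, b) = (3, 5)

F = np.fft.fft2(f)
F_shifted = np.fft.fft2(f_shifted)
print(np.allclose(np.abs(F), np.abs(F_shifted)))   # True: |F| is unchanged
```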
Periodicity, Symmetry, Rotation
• Periodicity
The discrete Fourier transform and its
inverse are periodic with period N
F (u, v) = F (u+N, v) = F (u, v+N ) = F (u+N, v+N )
⇒ Only one period is necessary
to reconstruct f (x, y).
• Symmetry
As f (x, y) is real valued,
F (−u, −v) = F ∗(u, v)
and we can use the result of the Fourier
transform in the first quadrant without
loss of generality.
If f (x, y) = f (−x, −y),
then F (u, v) is a real function.
• Rotation
Rotating f(x, y) by an angle θ rotates F(u, v) by the same angle.
Rotating F(u, v) rotates f(x, y) by the same angle.
Convolution theorem
Convolution in the spatial domain
⇕
multiplication in the frequency domain, i.e.

f(x, y) ∗ g(x, y) = F⁻¹[F(u, v) · G(u, v)]

and vice versa

F(u, v) ∗ G(u, v) = F[f(x, y) · g(x, y)]
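A sketch verifying the theorem numerically; the direct convolution is written as circular convolution, which is what the DFT version computes for periodic signals:

```python
import numpy as np

def circular_convolve(f, g):
    """Direct circular convolution of two equal-size 2D arrays (didactic, slow)."""
    M, N = f.shape
    out = np.zeros_like(f, dtype=float)
    for x in range(M):
        for y in range(N):
            for a in range(M):
                for b in range(N):
                    out[x, y] += f[a, b] * g[(x - a) % M, (y - b) % N]
    return out

f = np.random.rand(8, 8)
g = np.random.rand(8, 8)
# Pointwise multiplication in the frequency domain:
via_fft = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g)).real
print(np.allclose(circular_convolve(f, g), via_fft))  # True
```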
Shannon Sampling Theorem
• Problem: How many (equidistant) samples
should we obtain,
so that no information is lost in the
sampling process?
• For complete recovery of a function f the
following must be satisfied:
1) F must be band-limited, i.e. F(u, v) = 0 ∀ |u|, |v| > w, where w is the highest frequency in the image.
2) Sampling must be performed with an interval smaller than 1/(2w).
• If the conditions are not satisfied,
f is undersampled,
and the resulting image (or signal)
is corrupted by aliasing.
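A 1D sketch of aliasing (the 2D case is analogous); the specific frequencies are arbitrary examples:

```python
import numpy as np

# A 9 Hz sine sampled at only 10 samples/s; the sampling theorem
# requires more than 2w = 18 samples/s.
fs = 10
t = np.arange(0, 1, 1 / fs)
samples = np.sin(2 * np.pi * 9 * t)      # samples of the 9 Hz sine

# After sampling, 9 Hz is indistinguishable from |9 - 10| = 1 Hz:
alias = np.sin(2 * np.pi * 1 * t)
print(np.allclose(samples, -alias))      # True (aliased, with opposite sign)
```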
Colour images

• Humans detect colours as combinations of the primary colours red, green, and blue.
• Standard primary colours are 700 nm, 546.1 nm, and 435.8 nm.
• Tri-stimulus curves (x(λ), y(λ), z(λ)) have been defined (CIE 1931).
• For a given colour having a wavelength distribution c(λ), we may find the triplet (X, Y, Z) giving the amounts of primaries that will produce the colour C.
• The colour is then reduced to a point in the
positive octant of XYZ-space.
• y(λ) equals the human light sensitivity
function. Thus, Y equals luminance.
• The RGB space provides (2^b)^3 distinct colours.
• A more relevant colour space is IHS.
• Conversions between colour spaces are
simple, but be aware of pitfalls!
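As one example of such a conversion, here is a sketch of a common RGB → IHS formulation; definitions of IHS/HSI vary between texts, so treat the exact formulas as an assumption rather than the standard:

```python
import numpy as np

def rgb_to_ihs(r, g, b):
    """One common IHS (intensity, hue, saturation) conversion for r, g, b in [0, 1].

    Follows the frequently used formulation with intensity = (R + G + B) / 3.
    """
    i = (r + g + b) / 3.0
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b) if (r + g + b) > 0 else 0.0
    # Hue from the standard geometric derivation on the RGB cube.
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = np.degrees(np.arccos(num / den)) if den > 0 else 0.0
    if b > g:                        # hue lies in [180, 360) when B > G
        h = 360.0 - h
    return i, h, s

print(rgb_to_ihs(1.0, 0.0, 0.0))     # pure red: hue 0, full saturation
```

Note the pitfall visible in the code: hue is undefined for gray pixels (den = 0), which is one of the traps the slide warns about.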
Histograms

• The histogram value h(z) gives the number of times the gray value z occurs in a given image.
• The sum of the histogram corresponds to the size of the image.
• The normalized histogram

H(z) = h(z)/(N × M),   Σ_{z=0}^{G−1} H(z) = 1

gives the first order probability density of the pixel value.
• The histogram provides information on the use of the gray scale.
• One histogram may correspond to several different images, as no geometric information is involved.
• Local histogram maxima and minima may be removed by smoothing.
• Images may be standardized by forcing their histograms to be equal, or by forcing some first order statistics to be equal.
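A minimal numpy sketch of h(z) and H(z) for an 8-bit image (so G = 256); the image itself is random test data:

```python
import numpy as np

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
N, M = image.shape

h = np.bincount(image.ravel(), minlength=256)  # h(z): count per gray value z
H = h / (N * M)                                # H(z): first order probability

print(h.sum() == N * M)          # True: the histogram sums to the image size
print(np.isclose(H.sum(), 1.0))  # True: the normalized histogram sums to 1
```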
Distances and metrics

• A space is called a metric space if for any two of its elements x and y there is a number ρ(x, y), called the distance, that satisfies the following properties:
— ρ(x, y) ≥ 0 (non-negativity)
— ρ(x, y) = 0 if and only if x = y (identity)
— ρ(x, y) = ρ(y, x) (symmetry)
— ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality)
• Distances between two points x and µ in n-dimensional space:

1) Euclidean

D_E(x, µ) = ‖x − µ‖ = [ Σ_{k=1}^{n} (x_k − µ_k)² ]^{1/2}

2) "City block"/"Taxi"/"Absolute value"

D_4(x, µ) = Σ_{k=1}^{n} |x_k − µ_k|

3) "Chessboard"/"Maximum value"

D_8(x, µ) = max_k |x_k − µ_k|
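The three distances as a short Python sketch; the test vectors are arbitrary:

```python
import numpy as np

def d_euclidean(x, mu):
    return np.sqrt(np.sum((x - mu) ** 2))

def d_city_block(x, mu):       # D4, also called taxi/absolute value distance
    return np.sum(np.abs(x - mu))

def d_chessboard(x, mu):       # D8, also called maximum value distance
    return np.max(np.abs(x - mu))

x = np.array([3.0, 4.0])
mu = np.array([0.0, 0.0])
print(d_euclidean(x, mu), d_city_block(x, mu), d_chessboard(x, mu))  # 5.0 7.0 4.0
```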
Applications of distances

• The distance from a point to an object. Useful when making maps of distances from object borders.
• The distance between two distributions. Useful when comparing e.g. histograms in image search and retrieval.

— Minkowski distance:

d_{Lp}(H, K) = ( Σ_i |h_i − k_i|^p )^{1/p}

— Kullback-Leibler divergence:

d_{KL}(H, K) = Σ_i h_i log(h_i / k_i)

— Jeffrey divergence:

d_J(H, K) = Σ_i [ h_i log(h_i / m_i) + k_i log(k_i / m_i) ],   m_i = (h_i + k_i)/2

— χ² statistics:

d_{χ²}(H, K) = Σ_i (h_i − m_i)² / m_i

— Earth Mover's Distance
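A sketch of these histogram distances on normalized histograms; the small eps guard against empty bins is an implementation choice, not part of the definitions:

```python
import numpy as np

def minkowski(h, k, p=2):
    return np.sum(np.abs(h - k) ** p) ** (1.0 / p)

def kullback_leibler(h, k, eps=1e-12):
    # eps guards against log(0) and division by zero in empty bins
    return np.sum(h * np.log((h + eps) / (k + eps)))

def jeffrey(h, k, eps=1e-12):
    m = (h + k) / 2.0
    return np.sum(h * np.log((h + eps) / (m + eps)) +
                  k * np.log((k + eps) / (m + eps)))

def chi_square(h, k, eps=1e-12):
    m = (h + k) / 2.0
    return np.sum((h - m) ** 2 / (m + eps))

h = np.array([0.2, 0.5, 0.3])
k = np.array([0.3, 0.4, 0.3])
print(minkowski(h, k), kullback_leibler(h, k), jeffrey(h, k), chi_square(h, k))
```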
Adjacency, paths, and regions
• Any two pixels are 4-neighbors if their
D4 = 1.
• Any two pixels are 8-neighbors if their
D8 = 1.
• A path between two pixels P1 and Pn is a
sequence of pixels P1, P2, ..., Pn, where Pi+1
is a neighbor of Pi for i = 1, ..., n − 1.
• A region is a set of pixels in which there is a path between any pair of its pixels, and all pixels of the path also belong to the set.
• If there is a path between two pixels, they
are contiguous.
• A region is a set of pixels where each pixel
pair in the set is contiguous.
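A sketch of extracting the 4-connected region around a seed pixel by breadth-first search; the same-gray-value membership criterion is an assumption made for the example:

```python
from collections import deque
import numpy as np

def region_4connected(image, seed):
    """Pixels reachable from seed via paths of 4-neighbors with the same value."""
    M, N = image.shape
    value = image[seed]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # the D4 = 1 neighbors
            ny, nx = y + dy, x + dx
            if (0 <= ny < M and 0 <= nx < N
                    and (ny, nx) not in region and image[ny, nx] == value):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

img = np.array([[1, 1, 0],
                [0, 1, 0],
                [0, 1, 1]])
print(sorted(region_4connected(img, (0, 0))))  # the 4-connected '1' region
```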
Regions, background, holes
• Assume that Ri are disjoint regions
(sets of contiguous pixels).
• Assume that none of these touch the set B
of image boundary pixels.
• Let R be the union of all Ri.
• Let R^C be the set complement of R.
• The subset of R^C that is contiguous with the set B is called the background, and the rest of R^C is called holes.
• A region without holes is simply contiguous.
• A region with holes is multiply contiguous.
Objects, borders, edges
• Based on properties, a region may be
(part of) an object.
• Paradoxes may be avoided using D4 for
objects and D8 for background,
or vice versa.
• The inner border of a region R is the
set of pixels ∈ R that have one or more
neighbors outside R.
• The edge is a local property of a pixel and
its immediate neighborhood.
It is a vector, with magnitude and
direction.
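A sketch of the inner border definition above, using 4-neighbors (the choice of D4 here is an assumption; D8 works analogously):

```python
import numpy as np

def inner_border(mask):
    """Pixels in R (mask True) with at least one 4-neighbor outside R."""
    M, N = mask.shape
    border = np.zeros_like(mask)
    for y in range(M):
        for x in range(N):
            if not mask[y, x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                # Outside the image or outside R both count as "outside".
                if not (0 <= ny < M and 0 <= nx < N) or not mask[ny, nx]:
                    border[y, x] = True
    return border

R = np.zeros((5, 5), dtype=bool)
R[1:4, 1:4] = True
print(inner_border(R).sum())  # 8: the 3x3 block minus its center pixel
```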
Visual perception of images

Important aspects include:
• Contrast
• Acuity
• Object borders
• Colour
• Texture

Image quality I

• Both subjective and objective methods for assessing image quality are problem-dependent.
• Quantitative measures on an image f(x, y) may refer to a reference image g(x, y).
• The "error" is then

e(x, y) = g(x, y) − f(x, y)

• The RMS deviation between two images is

e_rms = [ (1/(N × M)) Σ_{x=1}^{N} Σ_{y=1}^{M} (g(x, y) − f(x, y))² ]^{1/2}

• The signal to noise ratio (SNR) is

(SNR)_ms = Σ_{x=1}^{N} Σ_{y=1}^{M} g²(x, y) / Σ_{x=1}^{N} Σ_{y=1}^{M} e²(x, y)

• The RMS value of the SNR is then

(SNR)_RMS = [ Σ_{x=1}^{N} Σ_{y=1}^{M} g²(x, y) / Σ_{x=1}^{N} Σ_{y=1}^{M} e²(x, y) ]^{1/2}
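These measures as a short numpy sketch; the noise level is an arbitrary test value:

```python
import numpy as np

def e_rms(g, f):
    """RMS deviation between reference g and image f."""
    return np.sqrt(np.mean((g - f) ** 2))

def snr_ms(g, f):
    """Mean square signal to noise ratio."""
    e = g - f
    return np.sum(g ** 2) / np.sum(e ** 2)

g = np.random.rand(32, 32)
f = g + 0.01 * np.random.randn(32, 32)    # g plus a little noise
print(e_rms(g, f), snr_ms(g, f), np.sqrt(snr_ms(g, f)))  # last: (SNR)_RMS
```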
Image quality II

• Alternatively: give the "peak" SNR

(SNR)_p = [ Σ_{x=1}^{N} Σ_{y=1}^{M} (max(g(x, y)) − min(g(x, y)))² / Σ_{x=1}^{N} Σ_{y=1}^{M} e²(x, y) ]^{1/2}

• SNR is often given in dB:

Λ = 10 log₁₀(SNR)

• "Objective" measures often sum over the whole image.
• A few big differences may give the same result as a lot of small ones.
• Perceived quality does not correlate well with such crude measures.
• It is better to segment the image according to local information content.

Noise in images
• White noise implies that the noise is independent of frequency, i.e. it has a constant power spectrum.
• Most photodetectors suffer from thermal noise, which can be modeled by a Gaussian distribution

p(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}

• Additive noise is signal independent:

f(x, y) = g(x, y) + η(x, y)

• Multiplicative noise is related to the image intensity itself:

f(x, y) = g(x, y) + η·g(x, y)

• Quantization noise is caused by too few - or misplaced - quantization levels.
• Impulse noise: individual pixels deviate significantly from their neighborhood.
• The term salt-and-pepper noise describes saturated (white and black) impulse noise.
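A sketch simulating the two most common cases, additive Gaussian (thermal) noise and salt-and-pepper noise; the noise parameters are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.random((64, 64))                 # clean image in [0, 1]

# Additive Gaussian noise: f = g + eta, with eta ~ N(0, sigma^2).
f_gauss = g + rng.normal(0.0, 0.05, g.shape)

# Salt-and-pepper: a fraction p of pixels saturates to black (0) or white (1).
p = 0.02
f_sp = g.copy()
mask = rng.random(g.shape) < p
f_sp[mask] = rng.choice([0.0, 1.0], size=mask.sum())

print(f_gauss.std(), mask.sum())         # noise spread; number of corrupted pixels
```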