INF 269 Digital Image Analysis
Lecture 1, 27/8 2002

The digitized image and its properties

Fritz Albregtsen
Department of Informatics, University of Oslo

Introduction

• We make a distinction between image processing and image analysis.
• The result of image processing is usually a new (and better) image, or a set of coefficients that through a suitable transform will produce a new image:
  — Image enhancement
  — Image restoration
  — Image compression
  — Image coding
• Image analysis extracts information and performs quantitative measurements.
  — Image analysis should be able to handle a complex scene.
  — Pattern recognition is limited to classification into an a priori described set of classes.

Imaging and image functions

• The world around us is 3D.
• Monocular projections give 2D images.
• We use the third dimension to display intensity (2½ D).
• Information about depth is lost.
• Stereo images are two projections from different viewpoints.
• By finding corresponding points, we may recover depth information.
• Holograms are another source of depth and intensity information.
• 2D images are not always projections: e.g. 2D slices in medical microscopy.
• Series of physical or non-invasive 2D slices give 3D volumetric images.

Digitization

• A continuous real world scene may be modeled by a function having continuous domain and range.
• Putting a (sampling) grid on the projected image, the model domain becomes discrete.
• Scaling and discretizing the range to fit the digital sampling and storage, we obtain a digital representation of the scene.
• The sampling points in the grid correspond to pixels.
• The set of pixels covers the entire image, regardless of the physical size of the individual detectors.
• Pixels have an implicit position, given by their location in an array, and a pixel value: the measured intensity.
• A pixel may also be referred to as a point, and the pixel value as the image function value at that point.

The dimensions of digital images

• A digital image may be e.g. a 5-dimensional discrete function f[x, y, z, t, λ], where
  — x and y are the first two dimensions, representing the image plane,
  — z is the third dimension, representing volume,
  — t is the time dimension in image sequences (motion pictures, video),
  — λ is the wavelength (frequency) dimension, e.g. RGB in colour images.
• Unless otherwise stated, we will assume that f is 2D, i.e. f(x, y).
• A scalar function may describe a monochromatic image.
• Vectors may describe a three-component colour image.

Quantization

• The digital image function value is discretized by quantization into 2^b gray levels, ranging from 0 = black to 2^b − 1 = white.
• Often, gray level images are represented with b = 8 (= 1 byte) per pixel, giving 256 different gray levels.
• Colour images are often stored with b = 8 per colour, i.e. 3 bytes per pixel.
• Medical and other scientific gray level images may be stored with e.g. 2^16 = 65536 gray levels.
• Our visual system may only handle ≈ 50 (< 2^6) distinguishable gray levels.
• The range of light intensity levels that our visual system can adapt to is enormous, on the order of 10^10.
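As a concrete illustration of the quantization step, here is a minimal sketch in Python/numpy; the array img and the helper name quantize are illustrative choices, not part of the course material.

```python
import numpy as np

def quantize(img, b):
    """Quantize an image with values in [0, 1] into 2**b gray levels.

    Returns integer levels in 0 .. 2**b - 1 (0 = black, 2**b - 1 = white).
    """
    levels = 2 ** b
    q = np.floor(img * levels)             # map [0, 1) onto 0 .. levels - 1
    return np.clip(q, 0, levels - 1).astype(np.uint16)

# Example: an 8-bit version keeps 256 levels, a 3-bit version only 8.
img = np.random.rand(4, 4)                 # stand-in for a continuous-range image
print(quantize(img, 8).max() <= 255)
print(np.unique(quantize(img, 3)))         # at most 8 distinct values
```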
Spatial, temporal and wavelength discretization

• The dimensions must also be discretized:
• The spatial dimensions x and y are often a power of 2 (e.g. 512 × 512, 1024 × 1024, ...).
• The resolution in the z-direction is often lower than in x and y (e.g. confocal microscopy).
• The resolution in the t-dimension may vary greatly:
  — In video it is a fixed rate per second.
  — In medical MR it may be governed by the heart rate.
  — It may also be variable, e.g. triggered by change (motion) detection.
• The resolution in wavelength is governed by the problem at hand, and the available technology:
  — RGB-cameras
  — LANDSAT: 7 channels
  — Hyperspectral: more than 100 bands

Sampling and convolution

• The convolution of two 2D functions f and h is given by

  g(x, y) = f(x, y) \ast h(x, y)
          = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b) \, h(x - a, y - b) \, da \, db
          = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x - a, y - b) \, h(a, b) \, da \, db

• Sampling of a 2D projected continuous image f(x, y) at a given location (λ, µ) may be described by a convolution

  f(\lambda, \mu) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, \delta(x - \lambda, y - \mu) \, dx \, dy

  where the Dirac distribution satisfies

  \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(x, y) \, dx \, dy = 1, \qquad \delta(x, y) = 0 \;\; \forall (x, y) \neq (0, 0)

• The complete image is then a linear combination of such convolutions, one for each point in the grid.

Real sampling

• Actually, the 2D continuous image is formed by a convolution of the ideal projection with the (symmetric) PSF of e.g. the camera aperture and lens system:

  h_0(r) = \frac{1}{(1 - \varepsilon^2)^2}
           \left[ \frac{2 J_1(\pi r / \beta_0)}{\pi r / \beta_0}
                - \varepsilon^2 \, \frac{2 J_1(\pi \varepsilon r / \beta_0)}{\pi \varepsilon r / \beta_0} \right]^2,
  \qquad \beta_0 = \lambda / D, \;\; \varepsilon \in [0, 1]

  where D is the diameter of the aperture and ε is the relative diameter of a central obscuration, if any.

• This continuous image is then sampled by finite detectors (area A) in the focal plane. So actually, a pixel value is given by an integration

  g(x, y) = \int_{A(x, y)} \left( f \ast h_0 \right)(x, y) \, dA

• Convolution is a linear, translation-invariant operation.
• The translation has to be small for this to be true, because of the finite support.
• If noise is absent, the PSF may be inverted, and a sharp(ened) image obtained.
• The integration over the detector surface is not possible to invert.
• Noise makes the problem non-trivial!

Fourier transform pair

• To investigate image properties (analysis) and to filter images (image processing) we may decompose the image function using a linear combination of orthonormal functions.
• The Fourier transform uses harmonic functions (sine and cosine) to decompose an assumed periodic function f(x, y). (Are all images periodic?)
• Let f(x, y) be a continuous function of the real variables (x, y). The 2D Fourier transform \mathcal{F}(f(x, y)) = F(u, v) is defined by

  F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, e^{-2\pi i (xu + yv)} \, dx \, dy

• Given F(u, v), the image function f(x, y) can be obtained by the inverse Fourier transform

  f(x, y) = \mathcal{F}^{-1}(F(u, v)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u, v) \, e^{2\pi i (xu + yv)} \, du \, dv

Fourier transform

• F(u, v) is generally complex:

  F(u, v) = \mathrm{Re}(u, v) + i \, \mathrm{Im}(u, v) = |F(u, v)| \, e^{i \varphi(u, v)}

  — Re(u, v) is the real component of F(u, v).
  — Im(u, v) is the imaginary component of F(u, v).
  — |F(u, v)| = \sqrt{\mathrm{Re}^2(u, v) + \mathrm{Im}^2(u, v)} is the magnitude function, also called the Fourier spectrum.
  — φ(u, v) = \tan^{-1}\left[ \mathrm{Im}(u, v) / \mathrm{Re}(u, v) \right] is the phase angle.

• So, F(u, v) can be interpreted as the weights in a summed linear combination of simple periodic patterns over a range of frequencies (u, v) that will produce the image f(x, y).
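The 2D convolution defined in the sampling slides above can be made concrete with a small discrete sketch, assuming numpy; the direct double loop mirrors the integral definition, and the function name conv2d and the test arrays are illustrative only.

```python
import numpy as np

def conv2d(f, h):
    """Direct 2D convolution g = f * h (full output), mirroring
    g(x, y) = sum_a sum_b f(a, b) h(x - a, y - b) for finite arrays."""
    M, N = f.shape
    P, Q = h.shape
    g = np.zeros((M + P - 1, N + Q - 1))
    for a in range(M):
        for b in range(N):
            # each sample of f deposits a shifted, scaled copy of h
            g[a:a + P, b:b + Q] += f[a, b] * h
    return g

f = np.zeros((5, 5)); f[2, 2] = 1.0                              # a discrete impulse
h = np.array([[0., 1., 0.], [1., 4., 1.], [0., 1., 0.]]) / 8.0   # small blur kernel (stand-in PSF)
print(conv2d(f, h))                                              # a copy of h appears at the impulse position
```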
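To make the magnitude/phase decomposition concrete, a hedged sketch using numpy's discrete FFT; the synthetic test image is purely illustrative.

```python
import numpy as np

img = np.outer(np.hanning(64), np.hanning(64))        # a smooth synthetic test image
F = np.fft.fft2(img)                                  # discrete Fourier transform F(u, v)
magnitude = np.abs(F)                                 # |F(u, v)|, the Fourier spectrum
phase = np.angle(F)                                   # phi(u, v), the phase angle

# Rebuilding the transform from magnitude and phase recovers f(x, y)
reconstructed = np.fft.ifft2(magnitude * np.exp(1j * phase)).real
print(np.allclose(reconstructed, img))                # True (up to floating point error)
```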
Discrete Fourier transform

• When f(x, y) is a discrete image function the Fourier transform becomes

  F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \, e^{-2\pi i (ux/M + vy/N)}

• The inverse Fourier transform is then

  f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v) \, e^{2\pi i (ux/M + vy/N)}

• The sampling increments in the spatial and the frequency domains are related by

  \Delta x = \frac{1}{M \, \Delta u} \qquad \text{and} \qquad \Delta y = \frac{1}{N \, \Delta v}

Fourier transform properties

• Linearity
• Separability
• Translation
• Similarity
• Periodicity
• Symmetry
• Rotation
• Duality of convolution

Linearity, Separability, Translation, Similarity

• Linearity:

  \mathcal{F}[a f_1(x, y) + b f_2(x, y)] = a F_1(u, v) + b F_2(u, v)

  Real images are not strictly linear: they have limited size, and the number of gray levels is finite.

• Separability: For images with M = N, F(u, v) or f(x, y) can be obtained by two successive applications of the simple 1D Fourier transform or its inverse.

• Translation:

  \mathcal{F}[f(x - a, y - b)] = F(u, v) \, e^{-2\pi i (au + bv)}

  A shift of the origin in the image domain does not affect the magnitude function |F(u, v)|, only the phase angle φ(u, v).

• Similarity:

  \mathcal{F}[f(ax, by)] = |ab|^{-1} \, F(u/a, v/b)

Periodicity, Symmetry, Rotation

• Periodicity: The discrete Fourier transform and its inverse are periodic with period N:

  F(u, v) = F(u + N, v) = F(u, v + N) = F(u + N, v + N)

  ⇒ Only one period is necessary to reconstruct f(x, y).

• Symmetry: As f(x, y) is real valued, F(−u, −v) = F*(u, v), and we can use the result of the Fourier transform in the first quadrant without loss of generality. If f(x, y) = f(−x, −y), then F(u, v) is a real function.

• Rotation: Rotating f(x, y) by an angle θ rotates F(u, v) by the same angle, and rotating F(u, v) rotates f(x, y) by the same angle.

Convolution theorem

• Convolution in the spatial domain corresponds to multiplication in the frequency domain, i.e.

  f(x, y) \ast g(x, y) = \mathcal{F}^{-1}[F(u, v) \cdot G(u, v)]

  and vice versa

  F(u, v) \ast G(u, v) = \mathcal{F}[f(x, y) \cdot g(x, y)]

Shannon Sampling Theorem

• Problem: How many (equidistant) samples should we obtain, so that no information is lost in the sampling process?
• For complete recovery of a function f the following must be satisfied:
  1) F must be band-limited, i.e. F(u, v) = 0 for all |u|, |v| > w,
  2) sampling must be performed with an interval smaller than 1/(2w), where w is the highest frequency in the image.
• If the conditions are not satisfied, f is undersampled, and the resulting image (or signal) is corrupted by aliasing.

Colour images

• Humans detect colours as combinations of the primary colours red, green, and blue.
• Standard primary colours are 700 nm, 546.1 nm, and 435.8 nm.
• Tri-stimulus curves (x(λ), y(λ), z(λ)) have been defined (CIE 1931).
• For a given colour having a wavelength distribution c(λ), we may find the triplet (X, Y, Z) giving the amounts of primaries that will produce the colour C.
• The colour is then reduced to a point in the positive octant of XYZ-space.
• y(λ) equals the human light sensitivity function. Thus, Y equals luminance.
• The RGB space provides (2^b)^3 distinct colours.
• A more relevant colour space is IHS (intensity, hue, saturation).
• Conversions between colour spaces are simple, but be aware of pitfalls!
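The colour-space pitfalls can be made concrete with a sketch of one common RGB-to-HSI conversion; the formulas follow the usual textbook form and are not necessarily the exact variant intended in the course.

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """One common RGB -> (H, S, I) conversion for scalar inputs in [0, 1].

    Illustrates the pitfalls: hue is undefined for gray pixels and
    saturation is undefined for black pixels, so both need conventions.
    """
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:                       # r == g == b: a gray pixel, hue undefined
        h = 0.0
    else:
        h = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
        if b > g:                      # hue lies in the lower half of the colour circle
            h = 360.0 - h
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))       # pure red: hue 0, fully saturated
print(rgb_to_hsi(0.5, 0.5, 0.5))       # gray: hue conventionally 0, saturation 0
```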
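Returning to the sampling theorem above, a tiny 1D illustration of aliasing, assuming numpy; the frequency and sampling rate are arbitrary illustrative choices.

```python
import numpy as np

f_signal = 9.0                                   # signal frequency (w = 9 cycles per unit time)
fs = 12.0                                        # sampling rate below the required 2w = 18
t = np.arange(0.0, 1.0, 1.0 / fs)                # sample instants
samples = np.sin(2 * np.pi * f_signal * t)

# The undersampled points are indistinguishable from a 3-cycle sinusoid:
# the 9-cycle content is aliased down to |f_signal - fs| = 3.
alias = np.sin(2 * np.pi * (f_signal - fs) * t)
print(np.allclose(samples, alias))               # True
```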
Histograms

• The histogram value h(z) gives the number of times the gray value z occurs in a given image.
• The sum of the histogram corresponds to the size of the image.
• The normalized histogram

  H(z) = \frac{h(z)}{N \times M}, \qquad \sum_{z=0}^{G-1} H(z) = 1

  gives the first order probability density of the pixel values.
• The histogram provides information on the use of the gray scale.
• One histogram may correspond to several different images, as no geometric information is involved.
• Local histogram maxima and minima may be removed by smoothing.
• Images may be standardized by forcing their histograms to be equal, or by forcing some first order statistics to be equal.

Distances and metrics

• A space is called a metric space if for any two of its elements x and y there is a number ρ(x, y), called the distance, that satisfies the following properties:
  — ρ(x, y) ≥ 0 (non-negativity)
  — ρ(x, y) = 0 if and only if x = y (identity)
  — ρ(x, y) = ρ(y, x) (symmetry)
  — ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality)
• Distances between two points x and µ in n-dimensional space:
  1) Euclidean:

     D_E(x, \mu) = \| x - \mu \| = \left[ \sum_{k=1}^{n} (x_k - \mu_k)^2 \right]^{1/2}

  2) "City block" / "taxi" / "absolute value":

     D_4(x, \mu) = \sum_{k=1}^{n} |x_k - \mu_k|

  3) "Chessboard" / "maximum value":

     D_8(x, \mu) = \max_k |x_k - \mu_k|

Applications of distances

• The distance from a point to an object: useful when making maps of distances from object borders.
• The distance between two distributions: useful when comparing e.g. histograms in image search and retrieval.
  — Minkowski distance:

    d_{L_p}(H, K) = \left( \sum_i |h_i - k_i|^p \right)^{1/p}

  — Kullback-Leibler divergence:

    d_{KL}(H, K) = \sum_i h_i \log \frac{h_i}{k_i}

  — Jeffrey divergence:

    d_J(H, K) = \sum_i \left( h_i \log \frac{h_i}{m_i} + k_i \log \frac{k_i}{m_i} \right), \qquad m_i = \frac{h_i + k_i}{2}

  — χ² statistics:

    d_{\chi^2}(H, K) = \sum_i \frac{(h_i - m_i)^2}{m_i}

  — Earth Mover's Distance

Adjacency, paths, and regions

• Any two pixels are 4-neighbors if their D_4 distance equals 1.
• Any two pixels are 8-neighbors if their D_8 distance equals 1.
• A path between two pixels P_1 and P_n is a sequence of pixels P_1, P_2, ..., P_n, where P_{i+1} is a neighbor of P_i for i = 1, ..., n − 1.
• A region is a set of pixels in which there is a path between any pair of its pixels, and all pixels of the path also belong to the set.
• If there is a path between two pixels, they are contiguous.
• Equivalently, a region is a set of pixels in which each pair of pixels is contiguous.

Regions, background, holes

• Assume that R_i are disjoint regions (sets of contiguous pixels).
• Assume that none of these touch the set B of image boundary pixels.
• Let R be the union of all R_i.
• Let R^C be the set complement of R.
• The subset of R^C that is contiguous with the set B is called the background, and the rest of R^C is called holes.
• A region without holes is simply contiguous.
• A region with holes is multiply contiguous.

Objects, borders, edges

• Based on properties, a region may be (part of) an object.
• Paradoxes may be avoided by using D_4 adjacency for objects and D_8 adjacency for the background, or vice versa.
• The inner border of a region R is the set of pixels in R that have one or more neighbors outside R.
• The edge is a local property of a pixel and its immediate neighborhood. It is a vector, with magnitude and direction.
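A hedged sketch of how the adjacency and region definitions above translate into code: breadth-first region labeling under 4- or 8-adjacency (numpy assumed; all names are illustrative).

```python
import numpy as np
from collections import deque

def label_regions(binary, use_8=False):
    """Label contiguous foreground regions by breadth-first search.

    With use_8=False two pixels are neighbors when D4 = 1 (4-adjacency);
    with use_8=True when D8 = 1 (8-adjacency).
    """
    if use_8:
        offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    for x, y in zip(*np.nonzero(binary)):
        if labels[x, y]:
            continue                                   # already part of a labeled region
        current += 1
        queue = deque([(x, y)])
        labels[x, y] = current
        while queue:
            cx, cy = queue.popleft()
            for dx, dy in offsets:
                nx, ny = cx + dx, cy + dy
                if (0 <= nx < binary.shape[0] and 0 <= ny < binary.shape[1]
                        and binary[nx, ny] and not labels[nx, ny]):
                    labels[nx, ny] = current
                    queue.append((nx, ny))
    return labels, current

# Two diagonal pixels: separate regions under 4-adjacency, one region under 8-adjacency.
img = np.array([[1, 0], [0, 1]])
print(label_regions(img, use_8=False)[1])   # 2
print(label_regions(img, use_8=True)[1])    # 1
```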
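Similarly, the normalized histogram and the χ² histogram distance defined above can be sketched as follows (numpy assumed; the small epsilon guarding empty bins is an added implementation detail, not from the slides).

```python
import numpy as np

def normalized_histogram(img, G=256):
    """H(z) = h(z) / (N*M): first order gray level probability estimate."""
    h, _ = np.histogram(img, bins=G, range=(0, G))
    return h / img.size

def chi2_distance(H, K, eps=1e-12):
    """Chi-square histogram distance with m_i = (h_i + k_i) / 2."""
    m = (H + K) / 2.0
    return np.sum((H - m) ** 2 / (m + eps))   # eps guards empty bins

a = np.random.randint(0, 256, (64, 64))
b = np.random.randint(0, 256, (64, 64))
Ha, Hb = normalized_histogram(a), normalized_histogram(b)
print(Ha.sum())                     # 1.0 by construction
print(chi2_distance(Ha, Hb))        # small for similar gray level usage
```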
Image quality I

• Both subjective and objective methods for assessing image quality are problem-dependent.
• Quantitative measures on an image f(x, y) may refer to a reference image g(x, y).
• The "error" is then e(x, y) = g(x, y) − f(x, y).
• The RMS deviation between two images is

  e_{rms} = \left[ \frac{1}{N \times M} \sum_{x=1}^{N} \sum_{y=1}^{M} \left( g(x, y) - f(x, y) \right)^2 \right]^{1/2}

• The signal to noise ratio (SNR) is

  (SNR)_{ms} = \frac{\sum_{x=1}^{N} \sum_{y=1}^{M} g^2(x, y)}{\sum_{x=1}^{N} \sum_{y=1}^{M} e^2(x, y)}

• The RMS value of the SNR is then

  (SNR)_{RMS} = \sqrt{ \frac{\sum_{x=1}^{N} \sum_{y=1}^{M} g^2(x, y)}{\sum_{x=1}^{N} \sum_{y=1}^{M} e^2(x, y)} }

Visual perception of images

Important aspects include:

• Contrast
• Acuity
• Object borders
• Colour
• Texture

Image quality II

• Alternatively: give the "peak" SNR

  (SNR)_p = \sqrt{ \frac{\sum_{x=1}^{N} \sum_{y=1}^{M} \left[ \max(g(x, y)) - \min(g(x, y)) \right]^2}{\sum_{x=1}^{N} \sum_{y=1}^{M} e^2(x, y)} }

• SNR is often given in dB:

  \Lambda = 10 \log_{10}(SNR)

• "Objective" measures often sum over the whole image.
• A few big differences may give the same result as a lot of small ones.
• Perceived quality does not correlate well with such crude measures.
• It is better to segment the image according to local information content.

Noise in images

• White noise implies that the noise is independent of frequency, i.e. it has a constant power spectrum.
• Most photodetectors suffer from thermal noise, which can be modeled by a Gaussian distribution

  p(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)}

• Additive noise is signal independent: f(x, y) = g(x, y) + η(x, y).
• Multiplicative noise is related to the image intensity itself: f(x, y) = g(x, y) + η(x, y) g(x, y).
• Quantization noise is caused by too few, or misplaced, quantization levels.
• Impulse noise: individual pixel values deviate significantly from those of their neighborhood.
• The term salt-and-pepper noise describes saturated (white and black) impulse noise.
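Finally, a sketch tying the noise models to the SNR measures above, assuming numpy; the ramp image and noise parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.outer(np.linspace(0, 255, 128), np.ones(128))      # reference image g(x, y), a simple ramp

# Additive, signal-independent Gaussian noise: f = g + eta
f = g + rng.normal(0.0, 10.0, g.shape)

# Salt-and-pepper: a small fraction of pixels saturate to black or white
mask = rng.random(g.shape) < 0.02
f[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))

e = g - f                                                  # error image e(x, y)
e_rms = np.sqrt(np.mean(e ** 2))                           # RMS deviation
snr_ms = np.sum(g ** 2) / np.sum(e ** 2)                   # (SNR)_ms
snr_db = 10 * np.log10(snr_ms)                             # Lambda = 10 log10(SNR)
print(e_rms, snr_db)
```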