Convolution and correlation

Neighborhood Operations
Date : 07/14/04
o the operation at a pixel depends on the gray-level values of a region of interest surrounding that pixel
o if we assume that a neighborhood is centered on a pixel, then it must be a square neighborhood with odd dimensions
o for point operations, once an input pixel has been processed its original value is no longer needed; this is not possible with neighborhood operators because, even after an output pixel has been calculated, the corresponding input pixel at that location is included in other neighborhoods
o neighborhood operations let us examine spatial variations in the image
o applications include blurring, sharpening, noise reduction, edge detection, feature identification, and measuring the similarity of two images
Many image processing operations can be modeled as a linear system.
1. Linear Systems Theory
o a linear system
  - x(t) => y(t) is linear if x1(t) + x2(t) => y1(t) + y2(t)
  - and ax(t) => ay(t)
o shift invariance
  - x(t - T) => y(t - T)
o convolution integral
  - we want an expression that relates the output signal to the input signal
  - use the properties of linearity and shift invariance to produce the convolution integral
  - the output signal is given by the convolution of the input signal with the impulse response
o convolution is commutative, associative, and distributive over addition
o 2D convolution
o discrete 2D convolution
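The properties above can be illustrated with a small sketch (hypothetical code, not part of these notes; the class name Convolve1D is invented for illustration). It implements discrete 1D convolution directly from the definition and shows that swapping the two operands gives the same result, i.e. that convolution commutes:

```java
// Hypothetical helper class: a minimal sketch of discrete 1D convolution,
// used here to illustrate the commutativity property from the list above.
public class Convolve1D {

    // Full discrete convolution: y[n] = sum over k of x[k] * h[n-k].
    // The result has length x.length + h.length - 1.
    public static double[] convolve(double[] x, double[] h) {
        double[] y = new double[x.length + h.length - 1];
        for (int n = 0; n < y.length; n++) {
            for (int k = 0; k < x.length; k++) {
                int m = n - k;                 // index into h
                if (m >= 0 && m < h.length) {
                    y[n] += x[k] * h[m];
                }
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] h = {0.5, 0.5};               // simple averaging impulse response
        double[] a = convolve(x, h);           // x * h
        double[] b = convolve(h, x);           // h * x
        System.out.println(java.util.Arrays.toString(a)); // [0.5, 1.5, 2.5, 1.5]
        System.out.println(java.util.Arrays.toString(b)); // identical: convolution commutes
    }
}
```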
For a linear system, when the input to the system is an impulse d(x,y) centered at the origin, the output h(x,y) is the system's impulse response:

    input d(x,y)  ->  [ linear system ]  ->  output h(x,y)

A system whose response remains the same irrespective of the position of the input pulse is called a space invariant system:

    input d(x - x0, y - y0)  ->  [ linear space invariant system ]  ->  output h(x - x0, y - y0)

A linear space invariant system can be completely described by its impulse response h(x,y) as follows:

    input f(x,y)  ->  [ linear space invariant system h(x,y) ]  ->  output g(x,y)

where f(x,y) and g(x,y) are the input and output images. The above system must satisfy the following relationship:

    a f1(x,y) + b f2(x,y)  =>  a g1(x,y) + b g2(x,y)

where a and b are constant scaling factors.
For such a system the output g(x,y) is the convolution of f(x,y) with the impulse response h(x,y), and for discrete functions:

    g(i,j) = sum_{x=1}^{n} sum_{y=1}^{m} f(x,y) h(i-x, j-y)    (1)

where h(i-x, j-y) = h(x,i,y,j) is the impulse response.
We can also define h(x,i,y,j) as a point spread function. In optics, h(x,y), the inverse Fourier transform of the optical transfer function, is called the point spread function.
This name is based on the optical phenomenon that the impulse corresponds to a point of light and that an
optical system responds by blurring the point, with the degree of blurring being determined by the quality of
the optical components.
1.1 How are operators defined? The point spread function of an operator is what we get if we apply the operator to a point source:

    O[point source] = point spread function
    O[d(x-i, y-j)] = h(x,i,y,j)

where d(x-i, y-j) is a point source of brightness 1 centered at point (i,j). This function expresses how much the input value at position (x,y) influences the output value at position (i,j). If the influence expressed by the point spread function is independent of the actual positions and depends only on the relative position of the influencing and the influenced pixels, we have a shift invariant point spread function:

    h(i-x, j-y) = h(x,i,y,j)
1.2 How does an operator transform an image? If the operator is linear, then when a point source is a times brighter, the result will be a times larger:

    O[a d(x-i, y-j)] = a h(x,i,y,j)

An image is a collection of point sources (the pixels); we may say that an image is the sum of these point sources. The effect of an operator characterized by the point spread function h(x,i,y,j) on an N x N input image f(x,y) is then the convolution

    g(i,j) = sum_{x=0}^{N-1} sum_{y=0}^{N-1} f(x,y) h(x,i,y,j)    (2)

We can express equation (2) in the shorthand form g = h*f, where * is the convolution operation.
If the columns are influenced independently from the rows of the image, then the point spread function is separable:

    h(x,i,y,j) = hc(x,i) hr(y,j)

Equation (2) can then be written as a cascade of two 1D transformations:

    g(i,j) = sum_{x=0}^{N-1} hc(x,i) sum_{y=0}^{N-1} f(x,y) hr(y,j)    (3)

Notice that the factor sum_{y=0}^{N-1} f(x,y) hr(y,j) actually represents the product of two N x N matrices, which must be another matrix of the same size. Let us define it as

    s(x,j) = sum_{y=0}^{N-1} f(x,y) hr(y,j) = f hr    (4)

so that

    g = hc^T f hr    (5)

where f and g are the input and output images and hr, hc are matrices expressing the point spread function of the operator.
A separable n x n kernel can be represented as a vector product of two orthogonal, one-dimensional kernels,
each of width n. Convolution with the kernel can be carried out using these one-dimensional components.
One of them is applied down the columns of an image, generating an intermediate result. The other kernel is
then applied along the rows of the intermediate image, producing the final result.
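The two-pass scheme just described can be sketched as follows (hypothetical code, not from the notes; class and method names are invented, borders are simply skipped for brevity). It separates a 3 x 3 box filter into a column pass followed by a row pass:

```java
// Hypothetical sketch of separable convolution as two 1D passes.
// Assumes a small square image; border pixels are left unprocessed.
public class SeparableDemo {

    // One 1D pass down the columns with a length-3 kernel (borders skipped).
    public static double[][] passColumns(double[][] f, double[] kc) {
        int n = f.length;
        double[][] out = new double[n][n];
        for (int i = 1; i < n - 1; i++)
            for (int j = 0; j < n; j++)
                out[i][j] = kc[0]*f[i-1][j] + kc[1]*f[i][j] + kc[2]*f[i+1][j];
        return out;
    }

    // One 1D pass along the rows of the intermediate result.
    public static double[][] passRows(double[][] f, double[] kr) {
        int n = f.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 1; j < n - 1; j++)
                out[i][j] = kr[0]*f[i][j-1] + kr[1]*f[i][j] + kr[2]*f[i][j+1];
        return out;
    }

    public static void main(String[] args) {
        // A 3x3 box filter is separable: [1,1,1]/3 down columns, then along rows.
        double[][] f = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 10, 11, 12},
            {13, 14, 15, 16}
        };
        double[] k = {1.0/3, 1.0/3, 1.0/3};
        double[][] g = passRows(passColumns(f, k), k);
        System.out.println(g[1][1]); // mean of the 3x3 neighborhood around (1,1): 6.0
    }
}
```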
If the digital image looks like this:

    f(x,y) = | f(0,0)     f(0,1)     ...  f(0,N-1)   |
             | f(1,0)     f(1,1)     ...  f(1,N-1)   |
             | ...        ...        ...  ...        |
             | f(N-1,0)   f(N-1,1)   ...  f(N-1,N-1) |
We can rewrite equation (2) as follows:

    g(i,j) = f(0,0) h(0,i,0,j) + f(1,0) h(1,i,0,j) + ... + f(N-1,0) h(N-1,i,0,j)
           + f(0,1) h(0,i,1,j) + f(1,1) h(1,i,1,j) + ... + f(N-1,1) h(N-1,i,1,j)
           + ... + f(0,N-1) h(0,i,N-1,j) + f(1,N-1) h(1,i,N-1,j) + ...
           + f(N-1,N-1) h(N-1,i,N-1,j)

or, in matrix form,

    g = H f    (6)

The right-hand side of this expression can be thought of as the dot product of the vector

    h_ij^T = [h(0,i,0,j), h(1,i,0,j), ..., h(N-1,i,0,j), h(0,i,1,j), h(1,i,1,j), ...,
              h(N-1,i,1,j), ..., h(0,i,N-1,j), ..., h(N-1,i,N-1,j)]

with the vector

    f^T = [f(0,0), f(1,0), ..., f(N-1,0), f(0,1), f(1,1), ..., f(N-1,1), ..., f(0,N-1), f(1,N-1), ..., f(N-1,N-1)]
This last vector is actually the image f(x,y) written as a vector by stacking its columns one under the other. If we imagine writing g(i,j) in the same way, then the vectors h_ij^T arrange themselves as the rows of a matrix H, where for i=0, j runs from 0 to N-1 to give the first N rows of the matrix, then for i=1, j runs again from 0 to N-1 to give the second N rows, and so on. H is the N^2 x N^2 matrix made up of N x N submatrices of size N x N each, arranged in the following way:
    H = | H_{0,0}    H_{0,1}    ...  H_{0,N-1}   |
        | H_{1,0}    H_{1,1}    ...  H_{1,N-1}   |
        | ...        ...        ...  ...         |
        | H_{N-1,0}  H_{N-1,1}  ...  H_{N-1,N-1} |

where the N x N submatrix H_{i,y} in block row i and block column y has entries h(x,i,y,j), with j = 0, ..., N-1 indexing its rows and x = 0, ..., N-1 indexing its columns.
According to the separability assumption, we can replace h(x,i,y,j) by hc(x,i) hr(y,j). We can then present matrix H as a Kronecker product:

    H = hc^T (x) hr^T    (where (x) denotes the Kronecker product)

in which case equation (6), g = H f, is equivalent to g = hc^T f hr.
2. What is the purpose of Image Processing?
o Given an image f, choose matrices hc and hr so that the output image g is "better" than f according to some subjective criteria. This is the problem of Image Enhancement.
o Given an image f, choose matrices hc and hr so that g can be represented by fewer bits than f without much loss of detail. This is the problem of Image Compression.
o Given an image g and an estimate of h(x,i,y,j), recover image f. This is the problem of Image Restoration.
o Given an image f, choose matrices hc and hr so that the output image g includes certain features of f. This is the problem of preparing an image for Automatic Vision.
3. Calculating a convolution
During convolution, we take each kernel coefficient in turn and multiply it by a value from the neighborhood of the image lying under the kernel. We apply the kernel to the image in such a way that the value at the top-left corner of the kernel is multiplied by the value at the bottom-right corner of the neighborhood. For a 3 x 3 kernel:

    g(x,y) = h(1,1) f(x-1,y-1)  + h(1,0) f(x-1,y)  + h(1,-1) f(x-1,y+1)
           + h(0,1) f(x,y-1)    + h(0,0) f(x,y)    + h(0,-1) f(x,y+1)
           + h(-1,1) f(x+1,y-1) + h(-1,0) f(x+1,y) + h(-1,-1) f(x+1,y+1)

The pixels associated with these kernel coefficients are sequenced in precisely the opposite direction.
In general, for a 3 x 3 kernel:

    g(x,y) = sum_{k=-1}^{1} sum_{j=-1}^{1} h(j,k) f(x-j, y-k)    (7)

For example, applying the kernel

    -1  0  1
    -2  0  2
    -1  0  1

to the image neighborhood

    72  53  60
    76  56  65
    88  78  82

gives

    g(x,y) = (-1 x 82) + (1 x 88) + (-2 x 65) + (2 x 76) + (-1 x 60) + (1 x 72) = 40
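The worked example above can be checked with a short sketch (hypothetical code, not part of the notes; the class and method names are invented). It pairs each kernel coefficient with the neighborhood pixel traversed in the opposite direction, as the text describes:

```java
// Sketch reproducing the worked example above. Convolution pairs the
// top-left kernel coefficient with the bottom-right neighborhood pixel:
// g = sum over j,k of h[j][k] * f[2-j][2-k].
public class ConvExample {

    public static int convolveAtCenter(int[][] h, int[][] f) {
        int g = 0;
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++)
                g += h[j][k] * f[2 - j][2 - k];   // opposite traversal order
        return g;
    }

    public static void main(String[] args) {
        int[][] h = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };       // kernel from the text
        int[][] f = { {72, 53, 60}, {76, 56, 65}, {88, 78, 82} }; // image neighborhood
        System.out.println(convolveAtCenter(h, f));               // 40, as in the text
    }
}
```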
Note that if we rotate the kernel by 180°, then both sequences run in the same direction; this is an implementation of correlation rather than convolution. Note that the distinction between convolution and correlation disappears when the kernel is symmetric under 180° rotation.
3.1 Convolution - Conclusions
o used for filtering and edge detection
o the output gray-level at a pixel is a weighted sum of the gray-levels of pixels in the neighborhood
o the weights are expressed in matrix form in a convolution kernel
o the kernel is typically small and the coefficients may be real numbers, and they may be negative
o kernel elements are paired with image elements as follows:
  - kernel elements are taken left-to-right, top-to-bottom
  - image elements (in the neighborhood) are taken right-to-left, bottom-to-top
4. Correlation
The correlation can be expressed as follows:

    g(x,y) = sum_{k=-1}^{1} sum_{j=-1}^{1} h(j,k) f(x+j, y+k)    (8)

This differs from convolution only in that the kernel indices j and k are added to, rather than subtracted from, the pixel coordinates x and y. This has the effect of pairing each kernel coefficient with the image pixel that lies directly beneath it.
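This pairing can be sketched with the same kernel and neighborhood used in the convolution example of section 3 (hypothetical code; the class name is invented). Because that particular kernel negates under 180° rotation, correlation yields the negated convolution result:

```java
// Sketch of correlation at the neighborhood centre: each kernel coefficient
// pairs with the pixel directly beneath it, g = sum of h[j][k] * f[j][k].
public class CorrExample {

    public static int correlateAtCenter(int[][] h, int[][] f) {
        int g = 0;
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++)
                g += h[j][k] * f[j][k];   // same traversal order for both
        return g;
    }

    public static void main(String[] args) {
        int[][] h = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };       // kernel from section 3
        int[][] f = { {72, 53, 60}, {76, 56, 65}, {88, 78, 82} }; // same neighborhood
        // -40: this kernel negates under 180° rotation, so correlation
        // gives the negative of the convolution result (which was 40).
        System.out.println(correlateAtCenter(h, f));
    }
}
```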
The correlation function given in (8) has the disadvantage of being sensitive to changes in the amplitude of f(x,y) and h(j,k) in brighter parts of an image. For example, doubling all values of f(x,y) doubles the value of g(x,y). It is customary to normalize g(x,y) by dividing by the sum of gray levels in the image neighborhood, i.e.

    g'(x,y) = g(x,y) / ( sum_k sum_j f(x+j, y+k) )    (9)

An approach frequently used to overcome this difficulty is to perform matching via the correlation coefficient, which is defined as:

    gamma(x,y) = sum_k sum_j [h(j,k) - h_bar][f(x+j, y+k) - f_bar(x,y)]
                 / sqrt( sum_k sum_j [f(x+j, y+k) - f_bar(x,y)]^2 * sum_k sum_j [h(j,k) - h_bar]^2 )    (10)

where h_bar is the average pixel value in the template, and f_bar(x,y) is the average pixel value in the image neighborhood.
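Equation (10) can be sketched as follows (hypothetical code, not from the notes; names are invented). Both the template and the neighborhood are centred on their means, and the product sum is normalised by the two root-sum-square deviations, so a neighborhood that is a scaled copy of the template scores 1:

```java
// Sketch of the correlation coefficient of equation (10) for a template h
// and an equally-sized image neighborhood f.
public class CorrCoeff {

    public static double gamma(double[][] h, double[][] f) {
        int n = h.length;
        double hMean = 0, fMean = 0;
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                hMean += h[j][k];
                fMean += f[j][k];
            }
        hMean /= n * n;
        fMean /= n * n;
        double num = 0, hVar = 0, fVar = 0;   // product sum and squared deviations
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                double dh = h[j][k] - hMean, df = f[j][k] - fMean;
                num += dh * df;
                hVar += dh * dh;
                fVar += df * df;
            }
        return num / Math.sqrt(hVar * fVar);
    }

    public static void main(String[] args) {
        double[][] h = { {1, 2}, {3, 4} };
        double[][] f = { {10, 20}, {30, 40} };  // f = 10*h: perfectly correlated
        System.out.println(gamma(h, f));        // 1.0
    }
}
```

Note that, unlike the raw correlation of (8), gamma is unchanged when all values of f are scaled by a constant, which is exactly the amplitude sensitivity that (9) and (10) are designed to remove.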
4.1 Correlation - Conclusions
o used for feature recognition
o the kernel is now called a template
o the template may be larger and usually contains a small image (with integer gray-scale elements)
o the algorithm is the same as convolution except for the pairing of weights with pixels:
  - kernel elements are taken left-to-right, top-to-bottom
  - image elements (in the neighborhood) are taken left-to-right, top-to-bottom
o you can get correlation using the convolution algorithm by rotating the kernel 180°
o convolution and correlation are the same for kernels that are symmetric under 180° rotation
o need to normalize the output
5. Computational problems with convolution and correlation
5.1 The first problem is one of representation.
o the weighted sum result may exceed the range supported by the number of bits per pixel
  - we could normalize the result (usually factored into the kernel coefficients)
  - or we could define the output image with more bits per pixel
o if any of the kernel coefficients are negative, it is possible for g(x,y) to be negative; in such cases, we must use a signed data type for the output image
5.2 The second problem concerns the borders of the image. The entire neighborhood is not defined when processing pixels that border the image.
1) set the border pixels to black: the simple solution is to ignore those pixels for which convolution is not possible
  - causes a black border in the output image
  - will affect the histogram of the output image
2) set the output border pixels to the gray-level of the input border pixels, i.e. copy the corresponding pixel value from the input image wherever it is not possible to carry out convolution
  - causes a border of unprocessed pixels in the output image
3) make the output image smaller than the input image and include only processed pixels
  - will affect the ability to arithmetically combine the input and output images, e.g. to compare them on a pixel by pixel basis
4) truncate the kernel, i.e. modify the kernel near the borders
  - complicates the algorithm for performing convolution or correlation
  - not always possible to come up with sensible truncated kernels
5) reflected indexing
  - let the image pixels outside the border of the image be a reflection of the row or column they are adjacent to
6) circular indexing
  - let the image pixels outside the border of the image be a repetition of the row or column they are adjacent to, making the input image periodic by assuming the first column comes immediately after the last
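Methods 5 and 6 can be sketched as index-mapping helpers (hypothetical code; the class and method names are invented). Each maps an out-of-range coordinate back into the valid range [0, n) before the image is sampled:

```java
// Sketch of border handling by index remapping.
public class BorderIndex {

    // Reflected indexing (method 5): coordinates just outside [0, n) are
    // mirrored back into range, e.g. -1 -> 0, -2 -> 1, n -> n-1.
    // Valid for overshoots of less than n pixels.
    public static int reflect(int i, int n) {
        if (i < 0) return -i - 1;
        if (i >= n) return 2 * n - i - 1;
        return i;
    }

    // Circular indexing (method 6): coordinates wrap around, making the
    // image periodic, e.g. -1 -> n-1, n -> 0.
    public static int wrap(int i, int n) {
        return ((i % n) + n) % n;
    }

    public static void main(String[] args) {
        System.out.println(reflect(-1, 5)); // 0
        System.out.println(reflect(5, 5));  // 4
        System.out.println(wrap(-1, 5));    // 4
        System.out.println(wrap(5, 5));     // 0
    }
}
```

Inside the convolution loop, f(x-j, y-k) would then become f(reflect(x-j, n), reflect(y-k, n)) or f(wrap(x-j, n), wrap(y-k, n)), so that every neighborhood is fully defined.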
Performance
o convolution and correlation are O(n^2) in the kernel width n
o if the kernel is separable, it can be represented by two orthogonal 1D kernels
o separable convolution is done in two passes:
  - apply one of the kernels down the columns of the image, generating an intermediate result
  - apply the other kernel across the rows of the intermediate image to generate the final result
o separable convolution is O(n)
o Java HotSpot technology replaces JIT compilation
Convolution in Java2D
o Kernel class

    int width = 3;
    int height = 3;
    float[] coeff = new float[width * height];
    for (int i = 0; i < coeff.length; i++)
        coeff[i] = 1.0f / coeff.length;
    Kernel kernel = new Kernel(width, height, coeff);

o this kernel is normalized (the sum of the coefficients equals 1)
o ConvolveOp implements BufferedImageOp; the edge condition is given to the constructor, and filter's second argument is an optional destination image

    ConvolveOp op = new ConvolveOp(kernel, ConvolveOp.EDGE_ZERO_FILL, null);
    BufferedImage outImage = op.filter(inImage, null);
o ConvolveOp drawbacks
  - does not support all border processing methods
  - does not support separable convolution
  - does not support rescaling of the output image (clamps output)
Efford's implementation
o StandardKernel.java
o NeighborhoodOp.java
o ConvolutionOp.java
ConvolutionTool application