Computer Vision

Robotics Engineering Master Degree
EMARO - European Master on Advanced Robotics

Fabio Solari (fabio.solari@unige.it)
Silvio P. Sabatini (silvio.sabatini@unige.it)
Manuela Chessa (manuela.chessa@unige.it)
URL: http://www.pspc.unige.it

RECOMMENDED TEXTS
• D. Forsyth and J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2002.
• Y. Ma, S. Soatto, J. Kosecka, S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, Springer-Verlag, New York, 2004. On-line: vision.ucla.edu/MASKS/
• R.C. Gonzalez and R.E. Woods, Digital Image Processing, Prentice-Hall, 2008.
• I. Pitas, Digital Image Processing Algorithms, Prentice Hall, New York, 1993.
• O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, The MIT Press, Cambridge, Mass., 1993.
• Material distributed by lecturers: http://www.pspc.unige.it/~fabio/teaching/Computer_Vision.html
SOFTWARE TOOL: MATLAB
• MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
• Very good online help.
• Good tool for learning and prototyping.

SOFTWARE TOOL: OPENCV
• OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. OpenCV is released under a BSD license; it is free for both academic and commercial use. It has C++, C, Python and soon Java interfaces running on Windows, Linux, Android and Mac.
• http://opencv.willowgarage.com/wiki/
• Good tool for implementing real-world applications.
INTRODUCTION
• Goal and objectives:
  – To introduce the fundamental problems of computer vision.
  – To introduce the main concepts and techniques used to solve them.
  – To enable participants to implement solutions for reasonably complex problems.

INTRODUCTION
• The sense of vision plays an important role in the life of primates: it allows them to infer spatio-temporal properties of the environment that are necessary to perform tasks crucial for survival.
• Thus, it is important to endow machines with a sense of vision so that they can interact with humans and the environment in a dynamic way.

INTRODUCTION
• But vision is deceptively easy:
  – vision is immediate for us
  – we perceive the visual world as external to ourselves, but it is a reconstruction within our brain
• Vision is computationally demanding.

INTRODUCTION
• To interact with the real world, we have to tackle the problem of inferring 3-D information about a scene from a set of 2-D images.
• In general, this problem falls into the category of so-called inverse problems, which are prone to be ill-conditioned and difficult to solve in their full generality unless additional assumptions are imposed.
• Before we address how to reconstruct 3-D geometry from 2-D images, we first need to understand how 2-D images are generated and processed.
ARTIFICIAL VISION
[Diagram: overlapping fields — Computer Vision, Image Processing, Pattern Recognition, Machine Vision]
• Computer vision: vision for action; it extracts dynamic 3D information about the environment.
• Machine vision: real-time implementation of the vision algorithms.
ARTIFICIAL VISION
[Diagram: processing pipeline — the camera maps the real world to images (matrices of pixels); edge detection and image segmentation extract visual features; object recognition maps visual features to specific objects and real-world properties]

ARTIFICIAL VISION
• From visual features to real-world properties:
  – optic flow, computed across successive frames (image1, image2, image3, ... at t=t1, t=t2, t=t3, ...), yields object speed (e.g. speeds v1 and v2 for different objects);
  – disparity, computed from the image pair of camera1 and camera2, yields object distance (near vs. far objects).
OUTLINE
• 1D digital signal processing
  – Convolution, Fourier Transform, Nyquist rate
• Image formation
  – Pinhole camera
  – Visual sensors
  – Digital image
• Digital image operations

1D SIGNAL PROCESSING
• The convolution operator can be thought of as a general moving average.
• Continuous domain:
  (f * g)(t) = ∫ f(τ) g(t − τ) dτ
• Discrete domain:
  (f * g)(m) = Σ_n f(n) g(m − n)
• Convolution can describe the effect of an LTI (Linear Time-Invariant) system on a signal.
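The discrete convolution sum above can be sketched directly in a few lines of Python (a minimal illustration; in practice one would use numpy.convolve or MATLAB's conv). The signal and kernel values are illustrative assumptions; convolving with [1/3, 1/3, 1/3] realizes the "moving average" view of convolution mentioned on the slide.

```python
# Direct implementation of (f * g)(m) = sum_n f(n) g(m - n).
def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out_len = len(f) + len(g) - 1
    out = [0.0] * out_len
    for m in range(out_len):
        for n in range(len(f)):
            if 0 <= m - n < len(g):
                out[m] += f[n] * g[m - n]
    return out

# A 3-point moving average is just convolution with [1/3, 1/3, 1/3].
signal = [0, 0, 3, 3, 3, 0, 0]
kernel = [1 / 3, 1 / 3, 1 / 3]
smoothed = convolve(signal, kernel)
```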
SIGNAL PROCESSING
• Basic steps of the convolution:
  1. Flip (reverse) one of the digital functions.
  2. Shift it along the time axis by one sample.
  3. Multiply the corresponding values of the two digital functions.
  4. Sum the products from step 3 to get one point of the digital convolution.
• Repeat steps 1-4 to obtain the digital convolution at all times that the functions overlap.

SIGNAL PROCESSING
• Frequency domain (we use the FFT):
  – Note that convolution in the time domain is equivalent to multiplication in the frequency domain (and vice versa):
    F{(f * g)(t)} = F{f} · F{g} = F(ω) G(ω)
[Figure: f and g in the time domain; their spectra F{f} and F{g}; the product F{·} in the frequency domain]
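The convolution theorem on the slide can be checked numerically: transforming two sequences with the FFT, multiplying pointwise, and inverse-transforming reproduces their (circular) convolution. A sketch assuming numpy is available; the random test signals are illustrative.

```python
# Numerical check of F{f * g} = F{f} . F{g} for circular convolution.
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# Convolution computed via the frequency domain ...
conv_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# ... equals the direct circular convolution sum (f * g)(m).
conv_direct = np.array([sum(f[n] * g[(m - n) % N] for n in range(N))
                        for m in range(N)])
max_err = np.max(np.abs(conv_fft - conv_direct))
```

For large signals the FFT route is far cheaper than the direct sum: O(N log N) instead of O(N²).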
SIGNAL PROCESSING
• Often, the result of the Fourier Transform needs to be expressed in terms of the delta function.
• Ideal sampling with an impulse train xδ(t):
  – Time domain: xs(t) = x(t) · xδ(t)
  – Frequency domain: Xs(f) = X(f) * Xδ(f)
[Figure: x(t), xδ(t) and the sampled signal xs(t), with the corresponding magnitude spectra |X(f)|, |Xδ(f)| and |Xs(f)|]
SIGNAL PROCESSING
• Sampling theorem: a bandlimited signal with no spectral components beyond fm can be uniquely determined by values sampled at uniform intervals of Ts ≤ 1/(2fm).
• The sampling rate fs = 2fm is called the Nyquist rate; sampling below it produces aliasing.

SIGNAL PROCESSING
• Amplitude quantizing: mapping samples of a continuous-amplitude waveform to a finite set of amplitudes.
[Figure: an original signal and its quantized version; the quantizer input-output (In/Out) staircase characteristic]
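Aliasing can be demonstrated in a few lines: a sinusoid sampled below its Nyquist rate produces samples indistinguishable from those of a lower-frequency sinusoid. The frequencies below are illustrative assumptions chosen so the alias is exact; a sketch assuming numpy.

```python
# A 6 Hz sinusoid needs fs >= 12 Hz; sampling at 8 Hz aliases it.
import numpy as np

f0 = 6.0                 # signal frequency (Hz); Nyquist rate is 12 Hz
fs = 8.0                 # sampling rate below the Nyquist rate
n = np.arange(16)
samples = np.sin(2 * np.pi * f0 * n / fs)

# The samples coincide with those of a sinusoid at f0 - fs = -2 Hz:
alias = np.sin(2 * np.pi * (f0 - fs) * n / fs)
max_diff = np.max(np.abs(samples - alias))
```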
IMAGE PROCESSING AND ANALYSIS
• Digital image processing concerns the transformation of an image to a digital format and its processing by digital computers.
  – Both the input and the output of a digital image processing system are digital images.
• Digital image analysis is related to the description and recognition of the digital image content.
  – Its input is a digital image and its output is a symbolic image description.

IMAGE FORMATION
• Digital image formation is the first step in any digital image processing application.
• The digital image formation system consists basically of the optical system, the sensor and the digitizer.
[Diagram: scene s → optical system → f → visual sensor → g → digitizer → i]
• The effect of the recording process is the addition of a noise contribution. The recorded image g is called a noisy image.
OPTICAL SYSTEM
• The optical system can be modeled as a linear shift-invariant system having a two-dimensional impulse response h(x,y).
• The input-output relation of the optical system is described by a 2-D convolution (both signals s and f represent optical intensities):
  f(x,y) = ∫∫ s(α,β) h(x − α, y − β) dα dβ

OPTICAL SYSTEM
• The two-dimensional impulse response h(x,y) is also called the PSF (point spread function), and its Fourier transform is called the transfer function.
• Linear motion blur: due to the relative motion, during exposure, between the camera and the object being photographed, H(w) ~ exp(jTvw) sinc(Tvw).
• Out-of-focus blur: H(w) ~ (1/(D|w|))^(3/2) cos(D|w|).
• Atmospheric turbulence blur: H(w) ~ exp(−σ²|w|²).
• No blur: h(x,y) = δ(x,y).
[Figure: example of motion blur]
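The 2-D convolution with the PSF can be sketched on a toy example: a single point source spread by a small uniform PSF (a crude stand-in for out-of-focus blur). The image and kernel sizes are illustrative assumptions; the sketch assumes numpy and computes only the 'valid' region.

```python
# f = s * h: a point source blurred by a 3x3 uniform PSF.
import numpy as np

s = np.zeros((7, 7))
s[3, 3] = 1.0                  # point source in the scene

h = np.full((3, 3), 1.0 / 9)   # uniform PSF (sums to 1)

H, W = s.shape
kh, kw = h.shape
f = np.zeros((H - kh + 1, W - kw + 1))
for y in range(f.shape[0]):
    for x in range(f.shape[1]):
        # correlation with the flipped kernel == convolution
        f[y, x] = np.sum(s[y:y + kh, x:x + kw] * h[::-1, ::-1])
```

The point is spread over the PSF footprint while total intensity is conserved, which is exactly why h is called the point spread function.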
PINHOLE CAMERA
• Images are two-dimensional patterns of brightness values. They are formed by the projection of 3D objects.
• The basic abstraction is the pinhole camera:
  – Lenses are required to ensure the image is not too dark.
  – Pinhole cameras work in practice.
[Figure: abstract camera model and animal eye]

PINHOLE CAMERA
• Pinhole perspective equation: a scene point P = (X, Y, Z) projects to the image point P' = (x', y'):
  x' = f' X / Z
  y' = f' Y / Z
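The perspective equations translate directly into code. A minimal sketch (the focal length and points are illustrative assumptions); note how two points on the same viewing ray project to the same pixel, the depth ambiguity behind the inverse-problem view of vision.

```python
# Pinhole projection: x' = f' X / Z, y' = f' Y / Z.
def project(point, f_prime):
    """Project a 3-D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f_prime * X / Z, f_prime * Y / Z)

# Two points on the same viewing ray give the same image point.
p_near = project((1.0, 2.0, 4.0), f_prime=0.05)
p_far = project((2.0, 4.0, 8.0), f_prime=0.05)
```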
PINHOLE CAMERA
• Perspective projection discards depth: a small near object and a big far object can project to the same image point P(x',y'), and, over time, a fast near object and a slow far object can produce the same image motion. Recovering 3-D structure is therefore an inverse problem, prone to be ill-conditioned.
[Figure: point P(X,Y,Z) projecting to P(x',y') — small/near vs. big/far objects; point P(X,Y,Z,t) projecting to P(x',y',t) — fast/near vs. slow/far objects]

VISUAL SENSOR AND DIGITIZER
• The visual sensor transduces the intensity signal into an electric signal.
• In the case of an analog device, the two-dimensional signal is still analog: sampling and digitization are performed by an A/D converter (in a frame grabber).
• The digitizer transforms the analog image g(x,y) into a digital image i(n1,n2), n1=1,...,N, n2=1,...,M.
• If q is the quantization step, the quantized image is allowed to have illumination at the levels kq, k=0,1,2,...
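The quantization model above (levels kq for step q) can be sketched in a few lines. The intensity values and step are illustrative assumptions; this uniform quantizer rounds down to the nearest level, one of several possible conventions.

```python
# Uniform quantization: each value is mapped to a level kq, k = 0, 1, 2, ...
def quantize(values, q):
    """Map each value to the nearest lower level kq."""
    return [q * int(v // q) for v in values]

row = [0.0, 3.7, 12.2, 255.0]
quantized = quantize(row, q=8)   # allowed levels: 0, 8, 16, ..., 248
```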
VISUAL SENSOR AND DIGITIZER
• Image quantization introduces an error.
• Grayscale images are quantized at 256 levels and require 1 byte (8 bits) for the representation of each pixel.
• Binary images have only two quantization levels, 0 and 1. They are represented with 1 bit per pixel.

DIGITAL IMAGE REPRESENTATION
[Figure: the same image shown as a two-dimensional N×M matrix of integers (subsampled), as a two-dimensional surface (subsampled), and as a "picture"]
DIGITAL IMAGE
• We can think of an image as a function, g, from R² to R: g(x,y) gives the intensity at position (x,y). Realistically, g: [a,b]×[c,d] → [0,1].
• The digital (discrete) image is obtained by:
  – Sampling the 2D space (domain) on a regular grid.
  – Quantizing the amplitude of each sample (rounding to the nearest integer).
• If our samples are step apart, we can write this as:
  i(h,k) = Quantize{ g(h·step, k·step) }

IMAGE SUBSAMPLING
[Figure: an image sub-sampled directly vs. low-pass filtered and then sub-sampled]
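The discretization i(h,k) = Quantize{ g(h·step, k·step) } can be sketched end to end: sample a continuous image function on a regular grid, then quantize each sample to an integer gray level. The function g, the step and the grid size are illustrative assumptions.

```python
# Sample a continuous image g(x, y) on a regular grid and quantize.
import math

def g(x, y):
    """A continuous 'image': intensity in [0, 1]."""
    return 0.5 + 0.5 * math.sin(x) * math.cos(y)

def digitize(g, step, n_rows, n_cols, levels=256):
    """i(h, k) = Quantize{ g(h * step, k * step) } as integer gray levels."""
    return [[min(levels - 1, int(g(h * step, k * step) * levels))
             for k in range(n_cols)]
            for h in range(n_rows)]

image = digitize(g, step=0.1, n_rows=4, n_cols=4)
```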
VISUAL SENSOR AND DIGITIZER
• In the case of a digital acquisition device, the two-dimensional signal is a digital image.
• We consider two types of cameras:
  – CCD (charge-coupled device).
  – CMOS (complementary metal oxide semiconductor).
• There are separate photo-sensors at regular positions and no scanning.
[Figure: a possible architecture of a CCD]

VISUAL SENSOR AND DIGITIZER
• CMOS:
  – Same sensor elements as CCD.
  – Each photo sensor has its own amplifier.
  – More noise (reduced by subtracting a 'black' image).
  – Lower sensitivity (lower fill rate).
  – Allows other components to be put on the chip ("smart pixels").
VISUAL SENSOR AND DIGITIZER
• If we consider a color image, its RGB (red, green, blue) values are represented by three matrices.
• Color camera:
  – Prism (with 3 sensors): the prism color camera separates the source light into 3 beams using a prism.
  – Filter mosaic: the filter mosaic is placed directly on the sensor; demosaicing is needed to obtain a full-color, "full"-resolution image.

LOG-POLAR VISION
• The log-polar image geometry, first introduced to model the space-variant topology of the human retina receptors in relation to the data compression it achieves, has become popular in the active vision community for the important algorithmic benefits it provides.
[Figure: a Cartesian image and its log-polar counterpart]

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• We consider the images a(i,j), b(i,j) and c(i,j) with i=1,...,N and j=1,...,M.
• Image addition/subtraction: c(i,j) = a(i,j) ± b(i,j)
• Multiplication of an image by a constant value: b(i,j) = const · a(i,j)
• Point nonlinear transformations of the form: b(i,j) = h( a(i,j) )
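The elementary point operations above can be sketched with numpy, where they become one-line array expressions. The images and the choice of h (a log mapping, often used to compress dynamic range) are illustrative assumptions.

```python
# Point operations: addition, scaling by a constant, nonlinear transform.
import numpy as np

a = np.array([[10., 20.], [30., 40.]])
b = np.array([[1., 2.], [3., 4.]])

c_sum = a + b          # c(i,j) = a(i,j) + b(i,j)
scaled = 2.0 * a       # b(i,j) = const * a(i,j)

# Point nonlinear transformation b(i,j) = h(a(i,j)), here h = log(1 + x).
log_mapped = np.log1p(a)
```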
ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Clipping:
  b(i,j) = cmax    if a(i,j) > cmax
  b(i,j) = a(i,j)  if cmin <= a(i,j) <= cmax
  b(i,j) = cmin    if a(i,j) < cmin
• Thresholding:
  b(i,j) = a1  if a(i,j) < T
  b(i,j) = a2  if a(i,j) >= T
• Contrast stretching:
  b(i,j) = ((a(i,j) − a_min)/(a_max − a_min)) (omax − omin) + omin

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Another set of elementary digital image processing operations is related to geometric transforms.
• Image translation: b(i,j) = a(i + k, j + h)
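Clipping, thresholding and contrast stretching map directly onto numpy primitives; a sketch on an illustrative 1-D row of intensities (the thresholds and output range are assumptions).

```python
# Clipping, thresholding and contrast stretching as array operations.
import numpy as np

a = np.array([0., 10., 100., 200., 255.])

# Clipping to [cmin, cmax] = [50, 150]
clipped = np.clip(a, 50, 150)

# Thresholding: a1 where a < T, a2 where a >= T
T, a1, a2 = 100, 0, 255
thresholded = np.where(a < T, a1, a2)

# Contrast stretching of [a_min, a_max] onto [omin, omax] = [0, 1]
omin, omax = 0.0, 1.0
stretched = (a - a.min()) / (a.max() - a.min()) * (omax - omin) + omin
```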
ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Image rotation: if the image point a(x,y) is rotated by an angle θ, its new coordinates (x',y') are given by:
  x' = x cos θ − y sin θ
  y' = x sin θ + y cos θ
• The rotated version b(x',y') of the image a(x,y) is therefore given by:
  b(x cos θ − y sin θ, x sin θ + y cos θ) = a(x,y)
• It must be noted that the digital coordinates are integers, so this operation can require image interpolation.
• It is worth noting that the actual algorithm maps pixels of the destination image back to the source image. The opposite is possible and easier to implement, but it creates pattern spikes in the rotated image.

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Scaling is another basic geometric transformation. With scale factors α and β it is described by:
  x' = α x
  y' = β y
• Image scaling or zooming is given by:
  b(x',y') = b(α x, β y) = a(x,y)
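The backward-mapping rotation described above can be sketched as follows: each destination pixel (x', y') is mapped back to the source with the inverse rotation, and nearest-neighbour rounding stands in for proper interpolation. A minimal illustration assuming numpy; the angle is in radians and the test image is an assumption.

```python
# Backward-mapping rotation about the image centre.
import numpy as np

def rotate(a, theta):
    """Rotate image a by theta radians, destination-to-source mapping."""
    H, W = a.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    out = np.zeros_like(a)
    c, s = np.cos(theta), np.sin(theta)
    for yp in range(H):
        for xp in range(W):
            # inverse rotation: destination pixel -> source coordinates
            x = c * (xp - cx) + s * (yp - cy) + cx
            y = -s * (xp - cx) + c * (yp - cy) + cy
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < W and 0 <= yi < H:
                out[yp, xp] = a[yi, xi]   # nearest-neighbour "interpolation"
    return out

img = np.zeros((5, 5))
img[2, 4] = 1.0
rot90 = rotate(img, np.pi / 2)
```

Because every destination pixel receives a value, no holes or pattern spikes appear, unlike the forward (source-to-destination) variant the slide warns about.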
HOMEWORK
• Reading assignments:
  – Ma et al. book: chapter 3, sections 3.1 and 3.2.
  – Forsyth and Ponce book: chapter 1, sections 1.1.1, 1.1.2, 1.3 and 1.4.
  – Gonzalez and Woods book: chapters 1 and 2.
  – MATLAB "getting started" documentation at http://www.mathworks.com.
  – OpenCV documentation at http://docs.opencv.org/
• Programming assignments:
  – To implement some examples in MATLAB.
• Next class: Tuesday, October 2