Computer Vision
Neural Networks, Computer Vision
Robotics Engineering Master Degree
EMARO - European Master on Advanced Robotics

Fabio Solari  fabio.solari@unige.it
Silvio P. Sabatini  silvio.sabatini@unige.it
Manuela Chessa  manuela.chessa@unige.it
URL: http://www.pspc.unige.it

RECOMMENDED TEXTS
• D. Forsyth and J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2002.
• Y. Ma, S. Soatto, J. Kosecka, S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, Springer-Verlag, New York, 2004. On-line: vision.ucla.edu/MASKS/
• R.C. Gonzalez and R.E. Woods, Digital Image Processing, Prentice-Hall, 2008.
• I. Pitas, Digital Image Processing Algorithms, Prentice Hall, New York, 1993.
• O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, The MIT Press, Cambridge, Mass., 1993.
• Material distributed by lecturers: http://www.pspc.unige.it/~fabio/teaching/Computer_Vision.html

SOFTWARE TOOL: MATLAB
• MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
• Very good online help.
• Good tool for learning and prototyping.

SOFTWARE TOOL: OPENCV
• OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. OpenCV is released under a BSD license: it is free for both academic and commercial use. It has C++, C, Python and (soon) Java interfaces, running on Windows, Linux, Android and Mac.
• http://opencv.willowgarage.com/wiki/
• Good tool for implementing real-world applications.

INTRODUCTION
• Goal and objectives:
  – To introduce the fundamental problems of computer vision.
  – To introduce the main concepts and techniques used to solve them.
  – To enable participants to implement solutions for reasonably complex problems.

INTRODUCTION
• The sense of vision plays an important role in the life of primates: it allows them to infer spatio-temporal properties of the environment that are necessary to perform tasks crucial for survival.
• Thus, it is important to endow machines with a sense of vision in order to have them interact with humans and the environment in a dynamic way.

INTRODUCTION
• But vision is deceivingly easy:
  – vision is immediate for us;
  – we perceive the visual world as external to ourselves, but it is a reconstruction within our brain.
• Vision is computationally demanding.

INTRODUCTION
• To interact with the real world, we have to tackle the problem of inferring 3-D information about a scene from a set of 2-D images.
• In general, this problem falls into the category of so-called inverse problems, which are prone to be ill-conditioned and difficult to solve in their full generality unless additional assumptions are imposed.
• Before we address how to reconstruct 3-D geometry from 2-D images, we first need to understand how 2-D images are generated and processed.
ARTIFICIAL VISION
• Related disciplines: Computer Vision, Image Processing, Pattern Recognition, Machine Vision.
  – Computer Vision: vision for action; dynamic 3-D information about the environment.
  – Machine Vision: real-time implementation of the vision algorithms.

ARTIFICIAL VISION
[Diagram: real world → camera → images (matrix of pixels) → edge detection, image segmentation, optic flow, disparity → visual features → object recognition, object speed, object distance → real-world properties of specific objects.]

ARTIFICIAL VISION
[Diagram: two cameras (camera1, camera2) viewing a near and a far object produce image1 and image2 at t=t1; comparing their pixels yields a disparity map, from which object distance is inferred.]

ARTIFICIAL VISION
[Diagram: an image sequence image1, image2, image3, ... at t=t1, t2, t3 of objects moving at speeds v1 and v2 yields the optic flow, from which object speed is inferred.]

OUTLINE
• 1D digital signal processing
  – Convolution, Fourier Transform, Nyquist rate
• Image formation
  – Pinhole camera
  – Visual sensors
  – Digital image
• Digital image operations

1D SIGNAL PROCESSING
• The convolution operator can be thought of as a general moving average.
  – Continuous domain: (f * g)(t) = ∫ f(τ) g(t − τ) dτ
  – Discrete domain: (f * g)(m) = Σ_n f(n) g(m − n)
• Convolution can describe the effect of an LTI (Linear Time-Invariant) system on a signal.

SIGNAL PROCESSING
• Basic steps of the convolution:
  1. Flip (reverse) one of the digital functions.
  2. Shift it along the time axis by one sample.
  3. Multiply the corresponding values of the two digital functions.
  4. Sum the products from step 3 to get one point of the digital convolution.
• Repeat steps 1-4 to obtain the digital convolution at all times that the functions overlap.
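The flip-shift-multiply-sum procedure above can be sketched in plain Python (an illustrative sketch, not course code; function and variable names are made up):

```python
# Discrete convolution via the flip-shift-multiply-sum steps:
# (f*g)(m) = sum_n f(n) g(m-n).

def conv1d(f, g):
    """Full discrete convolution of two finite sequences."""
    out_len = len(f) + len(g) - 1
    result = []
    for m in range(out_len):          # one shift per output sample
        acc = 0
        for n in range(len(f)):
            k = m - n                 # index into the flipped, shifted g
            if 0 <= k < len(g):
                acc += f[n] * g[k]    # multiply overlapping values
        result.append(acc)            # sum of products = one output point
    return result

print(conv1d([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4, 1.5]
```

The output agrees with MATLAB's `conv` and NumPy's `np.convolve` in "full" mode.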
SIGNAL PROCESSING
• Frequency domain (we use the FFT):
  – Note that convolution in the time domain is equivalent to multiplication in the frequency domain (and vice versa):
    F{(f * g)(t)} = F{f} · F{g} = F(ω) G(ω)
[Figure: two time-domain signals f and g and their convolution f * g; below, the corresponding spectra F{f} and F{g} and their product.]

SIGNAL PROCESSING
• Often, the result of the Fourier Transform needs to be expressed in terms of the delta function.
  – Time domain: the sampled signal x_s(t) is the product of x(t) with an impulse train.
  – Frequency domain: X_s(f) is the convolution of X(f) with an impulse train, so the spectrum |X(f)| is replicated at multiples of the sampling frequency.

SIGNAL PROCESSING
• Sampling theorem: a bandlimited signal with no spectral components beyond f_max can be uniquely determined by values sampled at uniform intervals T_s ≤ 1/(2 f_max).
• The sampling rate f_s = 2 f_max is called the Nyquist rate; sampling below it causes aliasing.

SIGNAL PROCESSING
• Amplitude quantizing: mapping samples of a continuous-amplitude waveform to a finite set of amplitudes.
[Figure: a sinusoid ("Original signal") and its staircase approximation ("Quantized signal"); inset: the quantizer In/Out characteristic.]

IMAGE FORMATION
• Digital image formation is the first step in any digital image processing application.
• The digital image formation system consists basically of the optical system, the sensor and the digitizer.

IMAGE PROCESSING AND ANALYSIS
• Digital image processing concerns the transformation of an image into a digital format and its processing by digital computers.
  – Both the input and the output of a digital image processing system are digital images.
• Digital image analysis is related to the description and recognition of the digital image content.
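The time/frequency equivalence above can be checked numerically (an illustrative sketch using NumPy, not part of the course material):

```python
# Verify F{f*g} = F{f}·F{g}: linear convolution in time equals
# multiplication of (zero-padded) spectra in frequency.
import numpy as np

f = np.array([1.0, 2.0, 3.0, 0.0])
g = np.array([0.0, 1.0, 0.5, 0.0])

# Direct (full) convolution in the time domain.
direct = np.convolve(f, g)

# Zero-pad both signals to the full output length so the circular
# convolution implied by the FFT matches the linear convolution above.
n = len(f) + len(g) - 1
via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

print(np.allclose(direct, via_fft))  # True
```

For long signals this FFT route is much faster than the direct sum, which is why the slides say "we use the FFT".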
  – Its input is a digital image and its output is a symbolic image description.

IMAGE FORMATION
[Diagram: scene s → optical system → f → visual sensor → g → digitizer → i.]
• The effect of the recording process is the addition of a noise contribution: the recorded image i is called the noisy image.

OPTICAL SYSTEM
• The optical system can be modeled as a linear shift-invariant system having a two-dimensional impulse response h(x,y).
• The input-output relation of the optical system is described by a 2-D convolution (both signals s and f represent optical intensities):
  f(x,y) = ∫∫ s(ξ,η) h(x − ξ, y − η) dξ dη

OPTICAL SYSTEM
• The two-dimensional impulse response h(x,y) is also called the PSF (point spread function), and its Fourier transform is called the transfer function.
• Linear motion blur: due to the relative motion, during exposure, between the camera and the object being photographed, H(ω) ~ exp(jTvω) sinc(Tvω).
• Out-of-focus blur: H(ω) ~ (1/(D|ω|))^(3/2) cos(D|ω|).
• Atmospheric turbulence blur: H(ω) ~ exp(−σ²|ω|²).
• No blur: h(x,y) = δ(x,y).

PINHOLE CAMERA
• Images are two-dimensional patterns of brightness values. They are formed by the projection of 3-D objects.
• The basic abstraction is the pinhole camera (an animal eye follows the same scheme):
  – Lenses are required to ensure the image is not too dark.
  – Pinhole cameras work in practice.
• Pinhole perspective equation: a scene point P = (X,Y,Z) projects to the image point P' = (x',y') with
  x' = f' X/Z,   y' = f' Y/Z

PINHOLE CAMERA
[Figure: perspective projection is ambiguous — a small near object and a big far object, or a fast near object and a slow far object, produce the same image; this is why reconstruction belongs to the inverse problems, which are prone to be ill-conditioned.]

VISUAL SENSOR AND DIGITIZER
• The visual sensor transduces the intensity signal into an electric signal.
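The pinhole perspective equations x' = f'·X/Z, y' = f'·Y/Z can be sketched as follows (an illustrative sketch; the function name and values are made up):

```python
# Pinhole perspective projection of a 3-D point onto the image plane.
import math

def project(point, focal):
    """Map a scene point P = (X, Y, Z) to P' = (f'X/Z, f'Y/Z)."""
    X, Y, Z = point
    return (focal * X / Z, focal * Y / Z)

# A small near object and a big far object project to the same image
# point: depth is lost, which is why 3-D reconstruction from 2-D
# images is an ill-conditioned inverse problem.
near = project((0.1, 0.2, 1.0), focal=0.05)
far = project((1.0, 2.0, 10.0), focal=0.05)
print(math.isclose(near[0], far[0]) and math.isclose(near[1], far[1]))  # True
```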
VISUAL SENSOR AND DIGITIZER
• In the case of an analog device, the two-dimensional signal is still analog: sampling and digitization are performed by an A/D converter (in a frame grabber).
• It transforms the analog image g(x,y) into a digital image i(n1,n2), n1 = 1,...,N, n2 = 1,...,M.
• If q is the quantization step, the quantized image is allowed to have illumination only at the levels kq, k = 0,1,2,...
• Image quantization introduces an error.

DIGITAL IMAGE REPRESENTATION
[Figure: the same image shown as a two-dimensional N×M matrix of integers (subsampled), as a two-dimensional surface (subsampled), and as a "picture".]
• Grayscale images are quantized at 256 levels and require 1 byte (8 bits) for the representation of each pixel.
• Binary images have only two quantization levels, 0 and 1. They are represented with 1 bit per pixel.

DIGITAL IMAGE
• We can think of an image as a function g from R² to R: g(x,y) gives the intensity at position (x,y). Realistically, g: [a,b]×[c,d] → [0,1].
• The digital (discrete) image is obtained by:
  – Sampling the 2-D space (domain) on a regular grid.
  – Quantizing the amplitude of each sample (rounding to the nearest integer level).
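The two steps above, sampling on a regular grid and quantizing to the levels kq, can be sketched as follows (an illustrative sketch, not course code; the toy image g and all names are made up):

```python
# Turn a continuous image g(x, y) into a digital one:
# sample on a regular grid, then quantize to multiples of q.
import math

def g(x, y):
    """A toy continuous image with intensities in [0, 1]."""
    return 0.5 + 0.5 * math.sin(x) * math.cos(y)

def digitize(g, rows, cols, step, q):
    """i(h, k) = Quantize{ g(h*step, k*step) }, levels kq."""
    return [[q * round(g(h * step, k * step) / q)
             for k in range(cols)]
            for h in range(rows)]

# 4x4 samples, grid spacing 0.5, 8-bit quantization step q = 1/255.
image = digitize(g, rows=4, cols=4, step=0.5, q=1 / 255)
# Every stored value is now (up to float rounding) a multiple of q.
```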
• If our samples are step apart, we can write this as:
  i(h,k) = Quantize{ g(h·step, k·step) }

IMAGE SUBSAMPLING
[Figure: an image subsampled directly versus low-pass filtered and then subsampled; direct subsampling shows aliasing artifacts.]

VISUAL SENSOR AND DIGITIZER
• In the case of a digital acquisition device, the two-dimensional signal is already a digital image.
• We consider two types of cameras:
  – CCD (charge-coupled device).
  – CMOS (complementary metal-oxide semiconductor).

VISUAL SENSOR AND DIGITIZER
• A possible architecture of a CCD: there are separate photo-sensors at regular positions and no scanning.
• CMOS:
  – Same sensor elements as CCD.
  – Each photo-sensor has its own amplifier.
  – More noise (reduced by subtracting a 'black' image).
  – Lower sensitivity (lower fill rate).
  – Allows other components to be put on the chip ("smart pixels").

VISUAL SENSOR AND DIGITIZER
• If we consider a color image, its RGB (red, green, blue) values are represented by three matrices.
• Color camera:
  – Prism (with 3 sensors): the prism color camera separates the source light into 3 beams using a prism.
  – Filter mosaic, placed directly on the sensor: demosaicing is needed to obtain a full-color, "full"-resolution image.

LOG-POLAR VISION
• The log-polar image geometry, first introduced to model the space-variant topology of the human retina receptors in relation to the data compression it achieves, has become popular in the active vision community for the important algorithmic benefits it provides.

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• We consider the images a(i,j), b(i,j) and c(i,j) with i = 1,…,N and j = 1,…,M.
• Image addition/subtraction: c(i,j) = a(i,j) ± b(i,j)
• Multiplication of an image by a constant value: b(i,j) = const · a(i,j)
• Point nonlinear transformations of the form: b(i,j) = h( a(i,j) )

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Clipping:
  b(i,j) = cmax     if a(i,j) > cmax
  b(i,j) = a(i,j)   if cmin <= a(i,j) <= cmax
  b(i,j) = cmin     if a(i,j) < cmin
• Thresholding:
  b(i,j) = a1       if a(i,j) < T
  b(i,j) = a2       if a(i,j) >= T
• Contrast stretching:
  b(i,j) = ((a(i,j) − a_min)/(a_max − a_min)) · (omax − omin) + omin

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Another set of elementary digital image processing operations is related to geometric transforms.
• Image translation: b(i,j) = a(i + k, j + h)

ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• The rotated version b(x',y') of the image a(x,y) is given by:
  b(x cos θ − y sin θ, x sin θ + y cos θ) = a(x,y)
• Scaling is another basic geometric transformation.
• It must be noted that the digital coordinates are integers, so this operation can require image interpolation.
• It is worth noting that the actual algorithm maps pixels of the destination image to the source image. The opposite mapping is possible and easier to implement, but it creates pattern spikes (holes) in the rotated image.
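The point operations above can be sketched pixel-wise on a grayscale image stored as a list of lists (an illustrative sketch; names and default levels are made up):

```python
# Elementary point operations: clipping, thresholding,
# and contrast stretching, applied independently to each pixel.

def clip(a, cmin, cmax):
    """Limit every pixel to the range [cmin, cmax]."""
    return [[min(max(p, cmin), cmax) for p in row] for row in a]

def threshold(a, T, a1=0, a2=255):
    """Map pixels below T to a1 and the rest to a2."""
    return [[a1 if p < T else a2 for p in row] for row in a]

def stretch(a, omin=0, omax=255):
    """Linearly map [a_min, a_max] onto the output range [omin, omax]."""
    lo = min(p for row in a for p in row)
    hi = max(p for row in a for p in row)
    return [[(p - lo) / (hi - lo) * (omax - omin) + omin
             for p in row] for row in a]

img = [[10, 120], [200, 250]]
print(clip(img, 50, 220))   # [[50, 120], [200, 220]]
print(threshold(img, 128))  # [[0, 0], [255, 255]]
print(stretch(img))         # darkest pixel -> 0.0, brightest -> 255.0
```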
ELEMENTARY DIGITAL IMAGE PROCESSING OPERATIONS
• Image rotation: if the image point a(x,y) is rotated by θ degrees, its new coordinates (x',y') are given by:
  [x']   [cos θ   −sin θ] [x]
  [y'] = [sin θ    cos θ] [y]
• Image scaling or zooming is given by:
  [x']   [α   0] [x]
  [y'] = [0   β] [y]
  b(αx, βy) = a(x,y)

HOMEWORK
• Reading assignments:
  – Ma et al. book: chapter 3, sections 3.1 and 3.2.
  – Forsyth and Ponce book: chapter 1, sections 1.1.1, 1.1.2, 1.3 and 1.4.
  – Gonzalez and Woods book: chapters 1 and 2.
  – MATLAB "getting started" documentation at http://www.mathworks.com.
  – OpenCV documentation at http://docs.opencv.org/
• Next class: Tuesday, October 2

• Programming assignments:
  – To implement some examples in MATLAB.
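As a starting point for the programming assignments, the destination-to-source rotation described in the slides can be sketched as follows (an illustrative sketch with made-up names, not the assigned solution): for each destination pixel we apply the inverse rotation and sample the source, so no holes appear in the result.

```python
# Image rotation with destination-to-source mapping and
# nearest-neighbor interpolation (digital coordinates are integers).
import math

def rotate(a, theta):
    """Rotate image a (list of lists) by theta radians about its origin."""
    rows, cols = len(a), len(a[0])
    b = [[0] * cols for _ in range(rows)]
    c, s = math.cos(theta), math.sin(theta)
    for yp in range(rows):
        for xp in range(cols):
            # Inverse rotation: (x, y) = R(-theta) (x', y').
            x = round(c * xp + s * yp)
            y = round(-s * xp + c * yp)
            if 0 <= x < cols and 0 <= y < rows:   # stay inside the source
                b[yp][xp] = a[y][x]
    return b

img = [[1, 2], [3, 4]]
print(rotate(img, 0.0))  # identity: [[1, 2], [3, 4]]
```

Rounding to the nearest source pixel is the crudest interpolation; bilinear interpolation over the four neighbors gives smoother results.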