Introduction to Robot Vision
Ziv Yaniv
Computer Aided Interventions and Medical Robotics, Georgetown University

Vision
The special sense by which the qualities of an object (as color, luminosity, shape, and size) constituting its appearance are perceived through a process in which light rays entering the eye are transformed by the retina into electrical signals that are transmitted to the brain via the optic nerve. [Merriam-Webster dictionary]

The Sensor
Examples: endoscope, Single Lens Reflex (SLR) camera, webcam, C-arm X-ray.

The Sensor Model: Pin-hole Camera, Perspective Projection
[Figure: pin-hole camera geometry – focal point, image plane, and the x, y, z camera axes]

Machine Vision
Goal: Obtain useful information about the 3D world from 2D images.
Model: images → regions, textures, corners, lines, … → 3D geometry, object identification, activity detection, … → actions

Machine Vision
Goal: Obtain useful information about the 3D world from 2D images.
• Low level (image processing):
  – image filtering (smoothing, histogram modification, …)
  – feature extraction (corner detection, edge detection, …)
  – stereo vision
  – shape from X (shading, motion, …)
  – …
• High level (machine learning/pattern recognition):
  – object detection
  – object recognition
  – clustering
  – …

Machine Vision
• How hard can it be?
[Example images illustrating why seemingly simple vision tasks are hard]

Robot Vision
1. Simultaneous Localization and Mapping (SLAM)
2. Visual Servoing

Robot Vision
1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and localize within this map.
[Figure: NASA stereo vision image processing, as used by the MER Mars rovers]
"Simultaneous Localization and Mapping with Active Stereo Vision", J. Diebel, K. Reuterswärd, S. Thrun, J. Davis, R. Gupta, IROS 2004.

Robot Vision
2. Visual Servoing – using visual feedback to control a robot:
a) Image-based systems: the desired motion is computed directly from the image.
   "An image-based visual servoing scheme for following paths with nonholonomic mobile robots", A. Cherubini, F. Chaumette, G. Oriolo, ICARCV 2008.
b) Position-based systems: the desired motion is computed from a 3D reconstruction estimated from the image.

System Configuration
• The difficulty of similar tasks in different settings varies widely:
  – How many cameras?
  – Are the cameras calibrated?
  – What is the camera-robot configuration?
  – Is the system calibrated (hand-eye calibration)?
[Figure: common camera-robot configurations and their coordinate systems]

System Characteristics
• The greater the control over the system configuration and environment, the easier it is to execute a task.
• System accuracy is directly dependent upon model accuracy – what accuracy does the task require?
• All measurements and derived quantitative values have an associated error.

Stereo Reconstruction
Compute the 3D location of a point in the stereo rig's coordinate system, given that:
• the rigid transformation between the two cameras is known;
• the cameras are calibrated – given a point in the world coordinate system we know how to map it to the image (a code sketch of this mapping follows below);
• the same point is localized in the two images.
[Figure: world coordinate system with camera 1, camera 2, and the transformation T between the two cameras]

Commercial Stereo Vision
• Polaris Vicra infra-red system (Northern Digital Inc.)
• MicronTracker visible-light system (Claron Technology Inc.)
[Figure: left and right images acquired by the Polaris Vicra infra-red stereo system]

Stereo Reconstruction
• Wide or short baseline – a trade-off between reconstruction accuracy and the difficulty of point matching.
[Figure: camera 1 and several alternative placements of camera 2, illustrating different baselines]
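The slides above assume calibrated cameras: "given a point in the world coordinate system we know how to map it to the image." Before that mapping is derived in detail below, here is a minimal NumPy sketch of what a calibrated camera provides: a 3×4 projection matrix applied to homogeneous points. The intrinsic values (focal length, principal point) are illustrative, not taken from the slides.

```python
import numpy as np

def project(M, P_world):
    """Map 3D points (N x 3, world coordinates) to pixels with a 3x4 projection matrix."""
    P_h = np.hstack([P_world, np.ones((len(P_world), 1))])  # to homogeneous [X, Y, Z, 1]
    p_h = (M @ P_h.T).T                                     # homogeneous pixels [u, v, w]
    return p_h[:, :2] / p_h[:, 2:]                          # perspective divide -> [u/w, v/w]

# Illustrative intrinsics: 800 px focal length, principal point at (320, 240), no skew.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
# Camera frame coincides with the world frame (R = I, C = 0), so M = K [I | 0].
M = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

print(project(M, np.array([[0.1, -0.05, 2.0]])))  # -> [[360. 220.]]
```

A second camera is simply another such matrix; stereo reconstruction, developed next, inverts this mapping by intersecting the rays back-projected from the two images.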
Camera Model
• Points P, p, and O, given in the camera coordinate system, are collinear. There is a number a for which O + aP = p, so aP = p and a = f/Z. Therefore:
\[ x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z} \]
[Figure: pin-hole geometry with focal point O, focal length f, 3D point P = [X, Y, Z], and image point p = [x, y, f]]
• In homogeneous coordinates this is a linear mapping:
\[
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

Camera Model
Transform the pixel coordinates from the camera coordinate system to the image coordinate system:
• The image origin (principal point) is at [x_0, y_0] relative to the camera coordinate system.
• We need to change from metric units to pixels, with scaling factors k_x, k_y:
\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} fk_x & 0 & x_0 & 0 \\ 0 & fk_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
• Finally, the image coordinate system may be skewed (skew s), resulting in:
\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} fk_x & s & x_0 & 0 \\ 0 & fk_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

Camera Model
• As our original assumption was that points are given in the camera coordinate system, a complete projection matrix is of the form:
\[
M_{3\times4} = K_{3\times3}\,[I_{3\times3} \mid 0_{3\times1}]
\begin{bmatrix} R & -RC \\ 0 & 1 \end{bmatrix}
= KR\,[I \mid -C],
\qquad
K = \begin{bmatrix} fk_x & s & x_0 \\ 0 & fk_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\]
where C is the camera origin in the world coordinate system. Writing M in terms of its rows:
\[
M_{3\times4} =
\begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} =
\begin{bmatrix} M_1^T \\ M_2^T \\ M_3^T \end{bmatrix}
\]
• How many degrees of freedom does M have?

Camera Calibration
• Given pairs of corresponding points p_i^T = [x_i, y_i, w_i] and P_i^T = [X_i, Y_i, Z_i, W_i] in homogeneous coordinates, with p ≅ MP, our goal is to estimate M.
[Figure: calibration object/world coordinate system, camera coordinate system, and image coordinate system with principal point]
• As the points are in homogeneous coordinates, the vectors p and MP are not necessarily equal: they have the same direction but may differ by a non-zero scale factor. Therefore:
\[ p \times MP = 0 \]

Camera Calibration
• After a bit of algebra we have:
\[
\begin{bmatrix}
0^T & -w_i P_i^T & y_i P_i^T \\
w_i P_i^T & 0^T & -x_i P_i^T \\
-y_i P_i^T & x_i P_i^T & 0^T
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = 0
\quad \Leftrightarrow \quad Am = 0
\]
• The three equations are linearly dependent:
\[ A_3 = -\frac{x_i}{w_i}A_1 - \frac{y_i}{w_i}A_2 \]
• Each point pair therefore contributes two equations.
• Exact solution: M has 11 degrees of freedom, requiring a minimum of n = 6 pairs.
• Least squares solution: for n > 6, minimize ||Am|| s.t. ||m|| = 1.

Obtaining the Rays
• The camera location in the calibration object's coordinate system, C, is given by the one-dimensional right null space of the matrix M (MC = 0).
• A 3D homogeneous point P = M^+ p is on the ray defined by p and the camera center [it projects onto p, since MM^+ p = Ip = p].
• These two points define our ray in the world coordinate system.
• As both cameras were calibrated with respect to the same coordinate system, the rays are expressed in that same system too.

Intersecting the Rays
• The two rays are:
\[ r_1(t_1) = a_1 + t_1 n_1, \qquad r_2(t_2) = a_2 + t_2 n_2 \]
[Figure: two skew rays with origins a_1, a_2 and directions n_1, n_2]
• In practice the two rays rarely intersect exactly, so we take the closest points on the two rays:
\[
t_1 = \frac{\left((a_2 - a_1) \times n_2\right)^T (n_1 \times n_2)}{\|n_1 \times n_2\|^2},
\qquad
t_2 = \frac{\left((a_2 - a_1) \times n_1\right)^T (n_1 \times n_2)}{\|n_1 \times n_2\|^2}
\]
• and report their midpoint, \(\frac{1}{2}[r_1(t_1) + r_2(t_2)]\) (calibration and intersection are combined in the sketch below).
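As a concrete companion to the last few slides, here is a minimal NumPy sketch, assuming pixel coordinates with w_i = 1 and a non-coplanar calibration object; the function names are illustrative. It builds the linear system Am = 0 from point pairs, minimizes ||Am|| subject to ||m|| = 1 via SVD, back-projects rays, and intersects them with the midpoint formula.

```python
import numpy as np

def calibrate_dlt(P_world, p_img):
    """Estimate the 3x4 projection matrix M from n >= 6 non-coplanar point pairs.

    Each pair contributes the two independent rows from the slides (w_i = 1);
    ||Am|| is minimized s.t. ||m|| = 1 via SVD.
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(P_world, p_img):
        P = np.array([X, Y, Z, 1.0])
        O = np.zeros(4)
        rows.append(np.hstack([O, -P, y * P]))  # [0^T, -w_i P^T, y_i P^T]
        rows.append(np.hstack([P, O, -x * P]))  # [w_i P^T, 0^T, -x_i P^T]
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)  # unit singular vector of the smallest singular value

def back_project(M, p):
    """Ray through pixel p: camera center (right null space of M) plus P = M^+ p."""
    C = np.linalg.svd(M)[2][-1]                  # M C = 0
    a = C[:3] / C[3]                             # camera center, inhomogeneous
    P = np.linalg.pinv(M) @ np.array([*p, 1.0])  # a second point on the ray
    n = P[:3] / P[3] - a
    return a, n / np.linalg.norm(n)

def triangulate(M1, p1, M2, p2):
    """Midpoint of the closest points on the two back-projected rays."""
    a1, n1 = back_project(M1, p1)
    a2, n2 = back_project(M2, p2)
    c = np.cross(n1, n2)                         # n1 x n2
    t1 = np.dot(np.cross(a2 - a1, n2), c) / np.dot(c, c)
    t2 = np.dot(np.cross(a2 - a1, n1), c) / np.dot(c, c)
    return 0.5 * ((a1 + t1 * n1) + (a2 + t2 * n2))
```

A quick sanity check is to generate n ≥ 6 non-coplanar world points, project them through two known matrices (e.g., with the projection sketch above), run calibrate_dlt for each camera, and verify that triangulate recovers the original points; M is only recovered up to scale, which the null space and pseudo-inverse computations are insensitive to.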
World vs. Model
• Actual cameras most often don't follow the ideal pin-hole model; they usually exhibit some form of distortion (barrel, pin-cushion, S).
• Sometimes the world changes to fit your model: improvements in camera/lens quality can improve model performance.
[Figure: old image-intensifier X-ray (pin-hole + distortion) replaced by flat-panel X-ray (pin-hole)]

Additional Material
• Code:
  – Camera Calibration Toolbox for Matlab (Jean-Yves Bouguet), http://www.vision.caltech.edu/bouguetj/calib_doc/
• Machine Vision:
  – "Multiple View Geometry in Computer Vision", R. Hartley, A. Zisserman, Cambridge University Press.
  – "Machine Vision", R. Jain, R. Kasturi, B. G. Schunck, McGraw-Hill.
• Robot Vision:
  – "Simultaneous Localization and Mapping: Part I", H. Durrant-Whyte, T. Bailey, IEEE Robotics and Automation Magazine, Vol. 13(2), pp. 99-110, 2006.
  – "Simultaneous Localization and Mapping (SLAM): Part II", T. Bailey, H. Durrant-Whyte, IEEE Robotics and Automation Magazine, Vol. 13(3), pp. 108-117, 2006.
  – "Visual Servo Control Part I: Basic Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 13(4), pp. 82-90, 2006.
  – "Visual Servo Control Part II: Advanced Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 14(1), pp. 109-118, 2007.