3D Computer Vision and Video Computing
CSC I6716, Spring 2011
Topic 1 of Part II: Camera Models
Zhigang Zhu, City College of New York
zhu@cs.ccny.cuny.edu

3D Vision

Closely Related Disciplines
- Image Processing: images to images
- Computer Graphics: models to images
- Computer Vision: images to models
- Photogrammetry: obtaining accurate measurements from images

What is 3-D (three-dimensional) Vision?
- Motivation: making computers see (the 3D world as humans do)
- Computer Vision: 2D images to 3D structure
- Applications: robotics / VR / image-based rendering / 3D video

Lectures on 3-D Vision Fundamentals
- Camera Geometric Models (1 lecture)
- Camera Calibration (2 lectures)
- Stereo (2 lectures)
- Motion (2 lectures)

Lecture Outline

- Geometric Projection of a Camera
  - Pinhole camera model
  - Perspective projection
  - Weak-Perspective Projection
- Camera Parameters
  - Intrinsic parameters: define the mapping from 3D to 2D
  - Extrinsic parameters: define viewpoint and viewing direction
- Basic Vector and Matrix Operations, Rotation
- Camera Models Revisited
  - Linear Version of the Projection Transformation Equation
  - Perspective Camera Model
  - Weak-Perspective Camera Model
  - Affine Camera Model
  - Camera Model for Planes
- Summary

Lecture Assumptions

Camera Geometric Models
- Knowledge about 2D and 3D geometric transformations
- Linear algebra (vector, matrix)
- This lecture is only about geometry

Goal: build up the relation between 2D images and 3D scenes
- 3D Graphics (rendering): from 3D to 2D
- 3D Vision (stereo and motion): from 2D to 3D
- Calibration: determining the parameters of the mapping

Image Formation

[Figure: a light (energy) source illuminates a surface in the 3D scene; a pinhole lens projects it onto the imaging plane. Pipeline: World -> Optics -> Sensor -> Signal -> 2D Image. Camera: spec & pose.]

Pinhole Camera Model

[Figure: pinhole lens, image plane, optical axis, focal length f.]

- The pinhole is the basis for most graphics and vision
  - Derived from the physical construction of early cameras
  - The mathematics is very straightforward
- The 3D world is projected onto a 2D image
  - Image is inverted and reduced in size
  - Image is a 2D plane: no direct depth information
- Perspective projection
  - f is called the focal length of the lens
  - For a given image size, changing f changes the FOV and the size of imaged figures

Focal Length, FOV

Consider the case with the object on the optical axis:

[Figure: image plane at distance f behind the viewpoint, object at distance z in front; with a larger f, part of the object falls out of view.]

- Optical axis: the direction of imaging
- Image plane: a plane perpendicular to the optical axis
- Center of projection (pinhole): also called focal point, viewpoint, or nodal point
- Focal length: distance from the focal point to the image plane
- FOV (Field of View): viewing angles in the horizontal and vertical directions
- Increasing f enlarges figures but decreases the FOV

Equivalent Geometry

Consider the case with the object on the optical axis:

[Figure: inverted image on the plane behind the pinhole; upright image on the projection plane in front of it.]

- More convenient with an upright image: projection plane at z = f
- Equivalent mathematically

Perspective Projection

Compute the image coordinates of p in terms of the world (camera) coordinates of P.

[Figure: scene point P(X, Y, Z) and its image p(x, y); axes X, Y, Z with origin O; image plane at Z = f.]

- Origin of the camera frame at the center of projection
- Z axis along the optical axis
- Image plane at Z = f; x // X and y // Y

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$

Reverse Projection

Given a center of projection and the image coordinates of a point, it is not possible to recover the 3D depth of the point from a single image.
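The projection equations and the focal length / FOV trade-off above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `project` and `fov_degrees` are hypothetical, and the FOV formula assumes an image of the given size centered on the optical axis.

```python
import math

def project(X, Y, Z, f):
    """Perspective projection onto the plane Z = f (camera coordinates)."""
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

def fov_degrees(image_size, f):
    """Full viewing angle for an image of the given size (same units as f).

    Assumes the image is centered on the optical axis.
    """
    return math.degrees(2.0 * math.atan(image_size / (2.0 * f)))

f = 2.0
P = (1.0, 2.0, 4.0)

# Two different scene points on the same ray through the center of projection.
p_near = project(*P, f)
p_far = project(*(3.0 * c for c in P), f)

print(p_near, p_far)                                 # both (0.5, 1.0)
print(fov_degrees(2.0, 1.0), fov_degrees(2.0, 2.0))  # larger f, smaller FOV
```

Both points land on the same image coordinates, which is the depth ambiguity stated on the Reverse Projection slide; doubling f doubles the image coordinates of a fixed point while narrowing the viewing angle.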
[Figure: P(X, Y, Z) can be anywhere along the line through the center of projection and p(x, y).]

- All points on this line have image coordinates (x, y).
- In general, at least two images of the same point, taken from two different locations, are required to recover depth.

Pinhole camera image

Amsterdam: what do you see in this picture?
- straight line
- size
- parallelism/angle
- shape
- shape of planes parallel to the image
- depth

Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html

Depth? Cues: stereo, motion, size, structure …
- We see spatial shapes rather than individual pixels
- Knowledge: top-down vision belongs to humans
- Stereo & motion are the most successful in 3D CV & applications
- You can see it but you don't know how…

Yet other pinhole camera images

Rabbit or Man?
- 2D projections are not the "same" as the real object as we usually see it every day!
- It's real!

Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches
Fine Art Center University Gallery, Sep 15 to Oct 26

Weak Perspective Projection

The average depth $\bar{Z}$ is much larger than the relative distance between any two scene points measured along the optical axis.

[Figure: scene point P(X, Y, Z) and its image p(x, y); image plane at Z = f.]

- A sequence of two transformations
  - Orthographic projection: parallel rays
  - Isotropic scaling: $f/\bar{Z}$
- Linear model
- Preserves angles and shapes

$$x = f\,\frac{X}{\bar{Z}}, \qquad y = f\,\frac{Y}{\bar{Z}}$$

Camera Parameters

[Figure: pose / camera with image frame (xim, yim), frame grabber, image point p, origin O; object / world with point P, Pw and axes Xw, Yw, Zw.]

Coordinate Systems
- Frame coordinates (xim, yim), in pixels
- Image coordinates (x, y), in mm
- Camera coordinates (X, Y, Z)
- World coordinates (Xw, Yw, Zw)

Camera Parameters
- Intrinsic parameters (of the camera and the frame grabber): link the frame coordinates of an image point with its corresponding camera coordinates
- Extrinsic parameters: define the location and orientation of the camera coordinate system with respect to the world coordinate system

Intrinsic Parameters (I)

[Figure: image point p = (x, y, f) with image origin (0, 0) at the image center; pixel (xim, yim) in the frame; image center (ox, oy); pixel size (sx, sy); directions of axes.]

From image to frame (depends on image center, directions of axes, and pixel size):

$$x = (x_{im} - o_x)\,s_x, \qquad y = (y_{im} - o_y)\,s_y$$

(written with the frame axes pointing the same way as the image axes; if an axis is flipped, the corresponding term changes sign)

From 3D to 2D (perspective projection):

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$

Intrinsic parameters
- (ox, oy): image center (in pixels)
- (sx, sy): effective size of the pixel (in mm)
- f: focal length

Intrinsic Parameters (II)

Lens distortions
- Modeled as simple radial distortions:

$$r^2 = x_d^2 + y_d^2, \qquad x = x_d\,(1 + k_1 r^2 + k_2 r^4), \qquad y = y_d\,(1 + k_1 r^2 + k_2 r^4)$$

- (xd, yd): distorted points; (x, y): undistorted points
- k1, k2: distortion coefficients
- A model with k2 = 0 is still accurate for a 500 x 500 CCD sensor with about 5 pixels of distortion at the outer boundary

Extrinsic Parameters

[Figure: world point Pw in the frame (Xw, Yw, Zw), camera point P, rotation R, translation T, frame coordinates (xim, yim).]

From world to camera:

$$\mathbf{P} = R\,\mathbf{P}_w + \mathbf{T}$$

Extrinsic parameters
- A 3-D translation vector, T, describing the relative locations of the origins of the two coordinate systems (what exactly is it?)
- A 3x3 rotation matrix, R, an orthogonal matrix that brings the corresponding axes of the two systems onto each other

Linear Algebra: Vector and Matrix

A point as a 2D/3D vector (superscript T: transpose)
- Image point: 2D vector, $\mathbf{p} = (x, y)^T$
- Scene point: 3D vector, $\mathbf{P} = (X, Y, Z)^T$
- Translation: 3D vector, $\mathbf{T} = (T_x, T_y, T_z)^T$

Vector operations
- Addition: translation of a 3D vector

$$\mathbf{P} = \mathbf{P}_w + \mathbf{T} = (X_w + T_x,\; Y_w + T_y,\; Z_w + T_z)^T$$

- Dot product (a scalar):

$$c = \mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}| \cos\theta$$

- Cross product (a vector): generates a new vector that is orthogonal to both of them

$$\mathbf{c} = \mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2)\,\mathbf{i} + (a_3 b_1 - a_1 b_3)\,\mathbf{j} + (a_1 b_2 - a_2 b_1)\,\mathbf{k}$$

Linear Algebra: Vector and Matrix (continued)

Rotation: 3x3 matrix
- Orthogonal: $R^{-1} = R^T$, i.e. $R R^T = R^T R = I$

$$R = [r_{ij}]_{3 \times 3} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} = \begin{bmatrix} \mathbf{R}_1^T \\ \mathbf{R}_2^T \\ \mathbf{R}_3^T \end{bmatrix}$$

- 9 elements with 3 + 3 constraints (3 orthogonality constraints: dot products between different rows are zero; 3 unit-vector constraints) => 3 DOF (degrees of freedom)
- How to generate R from three angles? (next few slides)

Matrix operations
- $R\,\mathbf{P}_w + \mathbf{T} = \;?$
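One way to explore the question above is to evaluate it numerically. This is a minimal sketch using plain Python tuples; the helper names (`dot`, `cross`, `world_to_camera`) and the example values of R and T are illustrative assumptions, not taken from the slides.

```python
import math

def dot(a, b):
    """Dot product of two vectors (a scalar)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors (a vector orthogonal to both)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def world_to_camera(R, T, Pw):
    """P = R Pw + T, computed row by row: (R_i . Pw) + T_i."""
    return tuple(dot(row, Pw) + t for row, t in zip(R, T))

# Illustrative extrinsics: 90-degree rotation about Z, shift of 5 along Z.
theta = math.pi / 2
R = [(math.cos(theta), -math.sin(theta), 0.0),
     (math.sin(theta),  math.cos(theta), 0.0),
     (0.0, 0.0, 1.0)]
T = (0.0, 0.0, 5.0)

P = world_to_camera(R, T, (1.0, 0.0, 0.0))  # approximately (0, 1, 5)

# Orthogonality checks: rows are unit vectors, mutually perpendicular,
# and the third row is the cross product of the first two.
row_norms = [dot(r, r) for r in R]
r1_dot_r2 = dot(R[0], R[1])
r1_cross_r2 = cross(R[0], R[1])
```

The row-by-row form makes the six constraints visible: three unit-length checks and three zero dot products leave three free parameters out of nine.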
- Points in the world are projected onto three new axes (of the camera system) and translated to a new origin

$$\mathbf{P} = R\,\mathbf{P}_w + \mathbf{T} = \begin{pmatrix} r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x \\ r_{21} X_w + r_{22} Y_w + r_{23} Z_w + T_y \\ r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z \end{pmatrix} = \begin{pmatrix} \mathbf{R}_1^T \mathbf{P}_w + T_x \\ \mathbf{R}_2^T \mathbf{P}_w + T_y \\ \mathbf{R}_3^T \mathbf{P}_w + T_z \end{pmatrix}$$

Rotation: from Angles to Matrix

Rotation around the axes
- Result of three consecutive rotations around the coordinate axes:

$$R = R_\alpha R_\beta R_\gamma$$

Notes:
- Only three rotations
- Each time around one axis
- Bring corresponding axes onto each other: Xw = X, Yw = Y, Zw = Z
- First step (e.g.): bring Xw to X

Rotation around the Zw axis

$$R_\gamma = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

- Rotate in the XwOYw plane
- Goal: bring Xw to X, but X is not in XwOYw
- So rotate until Yw ⊥ X
  - => X in XwOZw (since Yw ⊥ XwOZw)
  - => Yw in YOZ (since X ⊥ YOZ)
- Zw does not change
- Next: rotation around Yw

Rotation around the Yw axis

$$R_\beta = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$$

- Rotate in the XwOZw plane so that Xw = X
  - => Zw in YOZ (& Yw in YOZ)
- Yw does not change

Rotation around the Xw (X) axis

$$R_\alpha = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$$

- Rotate in the YwOZw plane so that Yw = Y, Zw = Z (& Xw = X)
- Xw does not change

Rotation: from Angles to Matrix (Appendix A.9 of the textbook)

Rotation around the axes
- Result of three consecutive rotations around the coordinate axes: $R = R_\alpha R_\beta R_\gamma$

Notes:
- Rotation directions
- The order of multiplications matters: α, β, γ
- Same R, 6 different sets of α, β, γ
- R is a non-linear function of α, β, γ
- R is orthogonal
- It's easy to compute the angles from R

$$R = \begin{bmatrix} \cos\beta\cos\gamma & -\cos\beta\sin\gamma & \sin\beta \\ \sin\alpha\sin\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & -\sin\alpha\cos\beta \\ -\cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma & \cos\alpha\sin\beta\sin\gamma + \sin\alpha\cos\gamma & \cos\alpha\cos\beta \end{bmatrix}$$

Rotation: Axis and Angle (Appendix A.9 of the textbook)

- According to Euler's theorem, any 3D rotation can be described by a rotation angle, θ, around an axis defined by a unit vector $\mathbf{n} = [n_1, n_2, n_3]^T$.
- Three degrees of freedom: why?

$$R = I\cos\theta + (1 - \cos\theta)\begin{bmatrix} n_1^2 & n_1 n_2 & n_1 n_3 \\ n_2 n_1 & n_2^2 & n_2 n_3 \\ n_3 n_1 & n_3 n_2 & n_3^2 \end{bmatrix} + \sin\theta\begin{bmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{bmatrix}$$

Linear Version of Perspective Projection

World to camera
- Camera: $\mathbf{P} = (X, Y, Z)^T$
- World: $\mathbf{P}_w = (X_w, Y_w, Z_w)^T$
- Transform: R, T

$$\mathbf{P} = R\,\mathbf{P}_w + \mathbf{T} = \begin{pmatrix} r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x \\ r_{21} X_w + r_{22} Y_w + r_{23} Z_w + T_y \\ r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z \end{pmatrix} = \begin{pmatrix} \mathbf{R}_1^T \mathbf{P}_w + T_x \\ \mathbf{R}_2^T \mathbf{P}_w + T_y \\ \mathbf{R}_3^T \mathbf{P}_w + T_z \end{pmatrix}$$

Camera to image
- Camera: $\mathbf{P} = (X, Y, Z)^T$
- Image: $\mathbf{p} = (x, y)^T$
- Not linear equations:

$$(x, y) = \left(f\,\frac{X}{Z},\; f\,\frac{Y}{Z}\right)$$

Image to frame
- Neglecting distortion
- Frame: $(x_{im}, y_{im})^T$

$$x = (x_{im} - o_x)\,s_x, \qquad y = (y_{im} - o_y)\,s_y$$

World to frame
- $(X_w, Y_w, Z_w)^T \rightarrow (x_{im}, y_{im})^T$
- Effective focal lengths: fx = f/sx, fy = f/sy (the three parameters f, sx, sy are not independent)

$$x_{im} - o_x = f_x\,\frac{r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x}{r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z}, \qquad y_{im} - o_y = f_y\,\frac{r_{21} X_w + r_{22} Y_w + r_{23} Z_w + T_y}{r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z}$$

Linear Matrix Equation of Perspective Projection

Projective space
- Add a fourth coordinate: $\mathbf{P}_w = (X_w, Y_w, Z_w, 1)^T$
- Define $(x_1, x_2, x_3)^T$ such that $x_1/x_3 = x_{im}$, $x_2/x_3 = y_{im}$

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3$$

3x3 matrix Mint
- Only intrinsic parameters
- Camera to frame

$$M_{int} = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$

3x4 matrix Mext
- Only extrinsic parameters
- World to camera

$$M_{ext} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{bmatrix} = \begin{bmatrix} \mathbf{R}_1^T & T_x \\ \mathbf{R}_2^T & T_y \\ \mathbf{R}_3^T & T_z \end{bmatrix}$$

Simple matrix product! Projective matrix M = Mint Mext
- $(X_w, Y_w, Z_w)^T \rightarrow (x_{im}, y_{im})^T$
- Linear transform from projective space to projective plane
- M is defined up to a scale factor: 11 independent entries

Three Camera Models

Perspective Camera Model
- Making some assumptions
  - Known center: ox = oy = 0
  - Square pixel: sx = sy = 1
- 11 independent entries <-> 7 parameters

$$M = \begin{bmatrix} f r_{11} & f r_{12} & f r_{13} & f T_x \\ f r_{21} & f r_{22} & f r_{23} & f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{bmatrix}$$

Weak-Perspective Camera Model
- Average distance $\bar{Z}$ >> range $\delta Z$
- Define the centroid vector $\bar{\mathbf{P}}_w$, so that $\bar{Z} = \mathbf{R}_3^T \bar{\mathbf{P}}_w + T_z$
- 8 independent entries

$$M_{wp} = \begin{bmatrix} f r_{11} & f r_{12} & f r_{13} & f T_x \\ f r_{21} & f r_{22} & f r_{23} & f T_y \\ 0 & 0 & 0 & \mathbf{R}_3^T \bar{\mathbf{P}}_w + T_z \end{bmatrix}$$

Affine Camera Model
- Mathematical generalization of weak perspective
- Doesn't correspond to a physical camera
- But has a simple equation and appealing geometry
- Doesn't preserve angles, BUT preserves parallelism
- 8 independent entries

$$M_{af} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & b_3 \end{bmatrix}$$

Camera Models for a Plane

Planes are very common in the man-made world.

A plane in the world

$$n_x X_w + n_y Y_w + n_z Z_w = d, \qquad \text{i.e.} \quad \mathbf{n}^T \mathbf{P}_w = d$$

- One more constraint for all points: Zw is a function of Xw and Yw

Special case: ground plane
- Zw = 0
- $\mathbf{P}_w = (X_w, Y_w, 0, 1)^T$
- 3D point -> 2D point

Projective model of Zw = 0 (8 independent entries)

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{bmatrix} f r_{11} & f r_{12} & f r_{13} & f T_x \\ f r_{21} & f r_{22} & f r_{23} & f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{bmatrix} \begin{pmatrix} X_w \\ Y_w \\ 0 \\ 1 \end{pmatrix} = \begin{bmatrix} f r_{11} & f r_{12} & f T_x \\ f r_{21} & f r_{22} & f T_y \\ r_{31} & r_{32} & T_z \end{bmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$

General form (8 independent entries)
- With nz = 1: $Z_w = d - n_x X_w - n_y Y_w$

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{bmatrix} f(r_{11} - n_x r_{13}) & f(r_{12} - n_y r_{13}) & f(d\,r_{13} + T_x) \\ f(r_{21} - n_x r_{23}) & f(r_{22} - n_y r_{23}) & f(d\,r_{23} + T_y) \\ r_{31} - n_x r_{33} & r_{32} - n_y r_{33} & d\,r_{33} + T_z \end{bmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$

- 2D (xim, yim) -> 3D (Xw, Yw, Zw)?

Applications and Issues

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3$$

Graphics / Rendering
- From 3D world to 2D image
- Changing viewpoints and directions
- Changing focal length
- Fast rendering algorithms

Vision / Reconstruction
- From 2D image to 3D model
- Inverse problem
- Much harder / unsolved
- Robust algorithms for matching and parameter estimation
- Need to estimate camera parameters first

Calibration
- Find intrinsic & extrinsic parameters
- Given image-world point pairs
- Probably a partially solved problem?
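Although the inverse problem is hard in general, the ground-plane model above (Zw = 0) is a case where a single image suffices, because the 3x3 plane matrix can be inverted. The sketch below illustrates this under the slides' simplifying assumptions (ox = oy = 0, sx = sy = 1); the function names and the example camera pose are hypothetical.

```python
def ground_plane_matrix(f, R, T):
    """3x3 matrix for the plane Zw = 0: columns 1, 2 and 4 of M."""
    return [[f * R[0][0], f * R[0][1], f * T[0]],
            [f * R[1][0], f * R[1][1], f * T[1]],
            [R[2][0], R[2][1], T[2]]]

def apply_h(H, u, v):
    """Map (u, v) through H in homogeneous coordinates, then normalize."""
    x1, x2, x3 = (row[0] * u + row[1] * v + row[2] for row in H)
    return (x1 / x3, x2 / x3)

def inverse_3x3(H):
    """Inverse via the adjugate; raises if H is (numerically) singular."""
    (a, b, c), (d, e, g), (h, i, j) = H
    det = a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)
    if abs(det) < 1e-12:
        raise ValueError("plane matrix is singular")
    adj = [[e * j - g * i, c * i - b * j, b * g - c * e],
           [g * h - d * j, a * j - c * h, c * d - a * g],
           [d * i - e * h, b * h - a * i, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

# Illustrative pose: camera 10 units above the ground, looking straight down
# (180-degree rotation about the X axis), focal length 2.
R = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
T = [0.0, 0.0, 10.0]
H = ground_plane_matrix(2.0, R, T)

image_pt = apply_h(H, 5.0, 3.0)                # ground point -> image point
ground_pt = apply_h(inverse_3x3(H), *image_pt)  # image point -> ground point
```

The round trip recovers (Xw, Yw) on the plane, and Zw = 0 is known, so the 3D point is determined. The matrix becomes singular when the center of projection lies on the plane itself.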
- 11 independent entries <-> 10 parameters: fx, fy, ox, oy, α, β, γ, Tx, Ty, Tz

$$M_{int} = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad M_{ext} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{bmatrix}$$

Camera Model Summary

- Geometric Projection of a Camera
  - Pinhole camera model
  - Perspective projection
  - Weak-Perspective Projection
- Camera Parameters (10 or 11)
  - Intrinsic parameters: f, ox, oy, sx, sy, k1: 4 or 5 independent parameters
  - Extrinsic parameters: R, T: 6 DOF (degrees of freedom)
- Linear Equations of Camera Models (without distortion)
  - General Projection Transformation Equation: 11 parameters
  - Perspective Camera Model: 11 parameters
  - Weak-Perspective Camera Model: 8 parameters
  - Affine Camera Model (generalization of weak perspective): 8 parameters
  - Projective transformation of planes: 8 parameters

Next

Determining the values of the extrinsic and intrinsic parameters of a camera: Calibration (Ch. 6)

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}$$

with $M_{int}$ and $M_{ext}$ as defined above.
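As a closing illustration, the ten parameters listed above can be assembled into the projective matrix M = Mint Mext and applied to a world point. This is a minimal sketch rather than a calibration routine: the function names and numeric values are hypothetical, R is built as the product of rotations about the X, Y and Z axes by α, β, γ, and lens distortion is neglected.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rotation(alpha, beta, gamma):
    """R = R_alpha R_beta R_gamma: rotations about the X, Y and Z axes."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    r_alpha = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    r_beta = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    r_gamma = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]
    return matmul(r_alpha, matmul(r_beta, r_gamma))

def projection_matrix(fx, fy, ox, oy, alpha, beta, gamma, T):
    """3x4 matrix M = M_int M_ext built from the 10 camera parameters."""
    R = rotation(alpha, beta, gamma)
    m_int = [[fx, 0.0, ox], [0.0, fy, oy], [0.0, 0.0, 1.0]]
    m_ext = [R[i] + [T[i]] for i in range(3)]  # [R | T]
    return matmul(m_int, m_ext)

def project(M, Pw):
    """World point (Xw, Yw, Zw) -> frame coordinates (xim, yim)."""
    hom = list(Pw) + [1.0]  # add the fourth coordinate
    x1, x2, x3 = (sum(m * p for m, p in zip(row, hom)) for row in M)
    return (x1 / x3, x2 / x3)

# Hypothetical parameter values, for illustration only.
M = projection_matrix(500.0, 500.0, 320.0, 240.0, 0.0, 0.0, 0.0, [0.0, 0.0, 0.0])
pixel = project(M, (1.0, 2.0, 10.0))

# M is defined only up to scale: a scaled copy projects identically.
M_scaled = [[3.0 * v for v in row] for row in M]
pixel_scaled = project(M_scaled, (1.0, 2.0, 10.0))
```

With identity rotation and zero translation the result reduces to xim = ox + fx X/Z and yim = oy + fy Y/Z, and the scaled matrix gives the same pixel, which is why M has 11 rather than 12 independent entries.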