Introduction

3D Computer Vision 3D Vision and Video Computing CSC I6716 Spring2011 Topic 1 of Part II Camera Models Zhigang Zhu, City College of New York zhu@cs.ccny.cuny.edu 3D Computer Vision and Video Computing  Closely Related Disciplines      Image Processing – images to mages Computer Graphics – models to images Computer Vision – images to models Photogrammetry – obtaining accurate measurements from images What is 3-D ( three dimensional) Vision?     3D Vision Motivation: making computers see (the 3D world as humans do) Computer Vision: 2D images to 3D structure Applications : robotics / VR /Image-based rendering/ 3D video Lectures on 3-D Vision Fundamentals     Camera Geometric Models (1 lecture) Camera Calibration (2 lectures) Stereo (2 lectures) Motion (2 lectures) 3D Computer Vision and Video Computing  Geometric Projection of a Camera     Pinhole camera model Perspective projection Weak-Perspective Projection Camera Parameters   Intrinsic Parameters: define mapping from 3D to 2D Extrinsic parameters: define viewpoint and viewing direction   Basic Vector and Matrix Operations, Rotation Camera Models Revisited  Linear Version of the Projection Transformation Equation      Lecture Outline Perspective Camera Model Weak-Perspective Camera Model Affine Camera Model Camera Model for Planes Summary 3D Computer Vision and Video Computing  Camera Geometric Models     Lecture Assumptions Knowledge about 2D and 3D geometric transformations Linear algebra (vector, matrix) This lecture is only about geometry Goal Build up relation between 2D images and 3D scenes -3D Graphics (rendering): from 3D to 2D -3D Vision (stereo and motion): from 2D to 3D -Calibration: Determning the parameters for mapping 3D Computer Vision and Video Computing Image Formation 3D Computer Vision Image Formation and Video Computing Light (Energy) Source Surface Imaging Plane Camera: Spec & Pose 3D Scene Pinhole Lens World Optics Sensor Signal 2D Image 3D Computer Vision Pinhole Camera Model and Video Computing Image Plane Optical Axis f Pinhole lens    Pin-hole is the basis for most graphics and vision  Derived from physical construction of early cameras  Mathematics is very straightforward 3D World projected to 2D Image  Image inverted, size reduced  Image is a 2D plane: No direct depth information Perspective projection  f called the focal length of the lens  given image size, change f will change FOV and figure sizes 3D Computer Vision Focal Length, FOV and Video Computing  Consider case with object on the optical axis: Image plane f z viewpoint      Optical axis: the direction of imaging Image plane: a plane perpendicular to the optical axis Center of Projection (pinhole), focal point, viewpoint, nodal point Focal length: distance from focal point to the image plane FOV : Field of View – viewing angles in horizontal and vertical directions 3D Computer Vision Focal Length, FOV and Video Computing  Consider case with object on the optical axis: Image plane z f Out of view       Optical axis: the direction of imaging Image plane: a plane perpendicular to the optical axis Center of Projection (pinhole), focal point, viewpoint, , nodal point Focal length: distance from focal point to the image plane FOV : Field of View – viewing angles in horizontal and vertical directions Increasing f will enlarge figures, but decrease FOV 3D Computer Vision Equivalent Geometry and Video Computing  Consider case with object on the optical axis: f z  More convenient with upright image: z f Projection plane z = f  Equivalent mathematically 3D Computer Vision Perspective Projection and Video Computing  Compute the image coordinates of p in terms of the world (camera) coordinates of P. y Y p(x, y) x P(X,Y,Z ) X 0 Z Z=f    Origin of camera at center of projection Z axis along optical axis Image Plane at Z = f; x // X and y//Y X x f Z Y y f Z 3D Computer Vision Reverse Projection and Video Computing  Given a center of projection and image coordinates of a point, it is not possible to recover the 3D depth of the point from a single image. P(X,Y,Z) can be anywhere along this line p(x,y) All points on this line have image coordinates (x,y). In general, at least two images of the same point taken from two different locations are required to recover depth. 3D Computer Vision and Video Computing Pinhole camera image Amsterdam : what do you see in this picture? straight line size parallelism/angle shape shape of planes depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam straight line size parallelism/angle shape shape of planes depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam straight line size parallelism/angle shape shape of planes depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam straight line size parallelism/angle shape shape of planes depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam straight line size parallelism/angle shape shape of planes depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam straight line size parallelism/angle shape shape  of planes parallel to image depth Photo by Robert Kosara, robert@kosara.net http://www.kosara.net/gallery/pinholeamsterdam/pic01.html 3D Computer Vision and Video Computing Pinhole camera image Amsterdam: what do you see? straight line size parallelism/angle shape shape  of planes parallel to image Depth ? stereo motion size structure … - We see spatial shapes rather than individual pixels - Knowledge: top-down vision belongs to human - Stereo &Motion most successful in 3D CV & application - You can see it but you don't know how… 3D Computer Vision and Video Computing Yet other pinhole camera images Rabbit or Man? Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches Fine Art Center University Gallery, Sep 15 – Oct 26 3D Computer Vision and Video Computing Yet other pinhole camera images 2D projections are not the “same” as the real object as we usually see everyday! Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches Fine Art Center University Gallery, Sep 15 – Oct 26 3D Computer Vision and Video Computing It’s real! 3D Computer Vision and Video Computing  Weak Perspective Projection Average depth Z is much larger than the relative distance between any two scene points measured along the optical axis y Y p(x, y) x P(X,Y,Z ) X 0 Z Z=f  A sequence of two transformations    Orthographic projection : parallel rays Isotropic scaling : f/Z Linear Model  Preserve angles and shapes X x f Z Y y f Z 3D Computer Vision Camera Parameters and Video Computing Pose / Camera xim Image frame (xim,yim) yim  y x p Coordinate Systems      Frame Grabber O Frame coordinates (xim, yim) pixels Image coordinates (x,y) in mm Camera coordinates (X,Y,Z) World coordinates (Xw,Yw,Zw) Camera Parameters   Object / World Zw P Pw Xw Yw Intrinsic Parameters (of the camera and the frame grabber): link the frame coordinates of an image point with its corresponding camera coordinates Extrinsic parameters: define the location and orientation of the camera coordinate system with respect to the world coordinate system 3D Computer Vision and Video Computing y (0,0) x p (x,y,f) O     Image center Directions of axes Pixel size ox xim o y yim Pixel (xim,yim) x  ( xim  ox ) s x y  ( yim  o y ) s y From 3D to 2D   Size: (sx,sy) From image to frame  Intrinsic Parameters (I) Perspective projection Intrinsic Parameters    (ox ,oy) : image center (in pixels) (sx ,sy) : effective size of the pixel (in mm) f: focal length X x f Z Y y f Z 3D Computer Vision and Video Computing (x, y)   Intrinsic Parameters (II) k1 , k2 (xd, yd) Lens Distortions Modeled as simple radial distortions     r2 = xd2+yd2 x  xd (1  k1r 2  k2r 4 ) (xd , yd) distorted points 2 4 y  y ( 1  k r  k r d 1 2 ) k1 , k2: distortion coefficients A model with k2 =0 is still accurate for a CCD sensor of 500x500 with ~5 pixels distortion on the outer boundary 3D Computer Vision Extrinsic Parameters and Video Computing xim (xim,yim) yim  O y x p From World to Camera Zw P  R Pw  T  Extrinsic Parameters   P Pw Xw T Yw A 3-D translation vector, T, describing the relative locations of the origins of the two coordinate systems (what’s it?) A 3x3 rotation matrix, R, an orthogonal matrix that brings the corresponding axes of the two systems onto each other 3D Computer Vision and Video Computing Linear  A point as a 2D/ 3D vector p   x   ( x, y)T     Algebra: Vector and Matrix Image point: 2D vector Scene point: 3D vector Translation: 3D vector  y   T: Transpose P  ( X , Y , Z )T T  (Tx , Ty , Tz )T Vector Operations  Addition:    Translation of a 3D vector Dot product ( a scalar):  P  Pw  T  ( X w  Tx , Yw  Ty , Z w  Tz )T a.b = |a||b|cosq c  a  b  aT b Cross product (a vector)  Generates a new vector that is orthogonal to both of them c  ab a x b = (a2b3 - a3b2)i + (a3b1 - a1b3)j + (a1b2 - a2b1)k 3D Computer Vision and Video Computing Linear  Rotation: 3x3 matrix  Orthogonal : R 1  RT , i.e. RRT  RT R  I    Algebra: Vector and Matrix  r11 R  rij  r21 33  r31   r12 r22 r32 r13  R1T   T  r23   R 2  r33  RT3    9 elements => 3+3 constraints (orthogonal/cross ) => 2+2 constraints (unit vectors) => 3 DOF ? (degrees of freedom, orthogonal/dot) How to generate R from three angles? (next few slides) Matrix Operations  R Pw +T= ? - Points in the World are projected on three new axes (of the camera system) and translated to a new origin  r11 X w  r12Yw  r13 Z w  Tx   R1T Pw  Tx      T P  RPw  T   r21 X w  r22Yw  r23 Z w  T y   R 2 Pw  T y     T  R P  T r X  r Y  r Z  T z   31 w 32 w 33 w z   3 w 3D Computer Vision and Video Computing  Rotation: from Angles to Matrix Rotation around the Axes  Result of three consecutive rotations around the coordinate axes O  Zw R  R R  R  Notes:    Only three rotations Every time around one axis Bring corresponding axes to each other   Xw = X, Yw = Y, Zw = Z First step (e.g.) Bring Xw to X Xw  Yw  3D Computer Vision and Video Computing Rotation: from Angles to Matrix  cos  R   sin   0  cos  0 0 0 1 Zw O Yw Rotation  around the Zw Axis       sin  Rotate in XwOYw plane Goal: Bring Xw to X But X is not in XwOYw Xw YwX X in XwOZw (Yw XwOZw)  Yw in YOZ ( X YOZ) Next time rotation around Yw 3D Computer Vision and Video Computing Rotation: from Angles to Matrix  cos  R   sin   0  cos  0 0 0 1 Zw Xw Rotation  around the Zw Axis     sin  Rotate in XwOYw plane so that YwX X in XwOZw (YwXwOZw)  Yw in YOZ (  XYOZ) Zw does not change O Yw 3D Computer Vision and Video Computing cos  R    0  sin   Zw Xw Rotation  around the Yw Axis    0  sin   1 0  0 cos   Rotation: from Angles to Matrix O  Rotate in XwOZw plane so that Xw = X  Zw in YOZ (& Yw in YOZ) Yw does not change Yw 3D Computer Vision and Video Computing cos  R    0  sin   0  sin   1 0  0 cos   Rotation  around the Yw Axis    Rotation: from Angles to Matrix O  Rotate in XwOZw plane so that Xw = X  Zw in YOZ (& Yw in YOZ) Yw does not change Yw Xw Zw 3D Computer Vision and Video Computing 0 1 R  0 cos  0 sin     sin   cos   0 Rotation  around the Xw(X) Axis    Rotation: from Angles to Matrix Rotate in YwOZw plane so that Yw = Y, Zw = Z (& Xw = X) Xw does not change  O Yw Xw Zw 3D Computer Vision and Video Computing 0 1 R  0 cos  0 sin  Rotation: from Angles to Matrix   sin   cos   Yw 0  O Xw Zw  Rotation  around the Xw(X) Axis    Rotate in YwOZw plane so that Yw = Y, Zw = Z (& Xw = X) Xw does not change 3D Computer Vision and Video Computing Rotation: from Angles to Matrix Appendix A.9 of the textbook  Rotation around the Axes  Result of three consecutive rotations around the coordinate axes O R  R R  R   Zw Notes:       Rotation directions The order of multiplications matters: ,, Same R, 6 different sets of ,, R Non-linear function of ,, R is orthogonal It’s easy to compute angles from R cos  cos   R   sin  sin  cos   cos  sin   cos  sin  cos   sin  sin   cos  sin  sin  sin  sin   cos  cos   cos  sin  sin   sin  cos  Xw  Yw   sin    sin  cos   cos  cos   3D Computer Vision and Video Computing Rotation- Axis and Angle Appendix A.9 of the textbook   According to Euler’s Theorem, any 3D rotation can be described by a rotating angle, q, around an axis defined by an unit vector n = [n1, n2, n3]T. Three degrees of freedom – why?  n12  R  I cos q  n2 n1  n3 n1  n1n2 n22 n3 n2 n1n3   0  n2 n3  (1  cos q )   n3  n2 n32   n3 0 n1 n2   n1  sin q 0  3D Computer Vision Linear Version and Video Computing  World to Camera       Camera: P = (X,Y,Z)T Image: p = (x,y)T Not linear equations Image to Frame     r11 X w  r12Yw  r13 Z w  Tx   R1T Pw  Tx      P  RPw  T   r21 X w  r22Yw  r23 Z w  T y   RT2 Pw  T y     T  r X  r Y  r Z  T  31 w 32 w 33 w z   R 3 Pw  Tz  Camera to Image   Camera: P = (X,Y,Z)T World: Pw = (Xw,Yw,Zw)T Transform: R, T of Perspective Projection Neglecting distortion Frame (xim, yim)T World to Frame   (Xw,Yw,Zw)T -> (xim, yim)T Effective focal lengths   fx = f/sx, fy=f/sy Three are not independent ( x, y )  ( f X Y , f ) Z Z x  ( xim  ox ) s x y  ( yim  o y ) s y r X  r Y  r Z T xim  ox   f x 11 w 12 w 13 w x r31 X w  r32Yw  r33 Z w  Tz yim  o y   f y r21 X w  r22Yw  r23 Z w  Ty r31 X w  r32Yw  r33 Z w  Tz 3D Computer Vision and Video Computing  Projective Space  Add fourth coordinate      Only extrinsic parameters World to camera 3x3 Matrix Mint      X  x1  w      x2   M int M ext  Yw  Z  x   3  w  1   xim   x1 / x3        yim   x2 / x3  x1/x3 =xim, x2/x3 =yim 3x4 Matrix Mext   Pw = (Xw,Yw,Zw, 1)T Define (x1,x2,x3)T such that   Linear Matrix Equation of perspective projection Only intrinsic parameters Camera to frame  r11 M ext  r21  r31 r12 r22 r32  f x M int   0  0 0  fy 0 r13 Tx  R1T  r23 T y   RT2 r33 Tz  RT3  Tx   Ty   Tz  ox  o y  1  Simple Matrix Product! Projective Matrix M= MintMext    (Xw,Yw,Zw)T -> (xim, yim)T Linear Transform from projective space to projective plane M defined up to a scale factor – 11 independent entries 3D Computer Vision and Video Computing  Perspective Camera Model  Making some assumptions     Known center: Ox = Oy = 0 Square pixel: Sx = Sy = 1   fr11 M   fr21  r31  fr12  fr22 r32  fr13  fTx   fr23  fTy  r33 Tz      11 independent entries <-> 7 parameters Weak-Perspective Camera Model   Average Distance Z >> Range dZ Define centroid vector Pw Z  Z  RT 3 Pw  Tz   Three Camera Models 8 independent entries Affine Camera Model    Mathematical Generalization of Weak-Pers Doesn’t correspond to physical camera But simple equation and appealing geometry     M wp     Doesn’t preserve angle BUT parallelism 8 independent entries fr11 fr21 0 fr12 fr22 0  a11 M af  a21  0 fr13  fTx   fr23  fTy   0 RT 3 Pw  Tz  a12 a22 0 a13 b1  a23 b2  0 b3  3D Computer Vision and Video Computing  Planes are very common in the Man-Made World nx X w  n yYw  nz Z w  d     One more constraint for all points: Zw is a function of Xw and Yw Zw=0 Pw =(Xw, Yw,0, 1)T 3D point -> 2D point Projective Model of a Plane   n T Pw  d Special case: Ground Plane   Camera Models for a Plane 8 independent entries General Form ?  8 independent entries  x1    fr11     x2    fr21 x   r  3   31  fr12  fr22 r32  Xw  fr13  fTx   Yw   fr23  fTy  Z 0 r33 Tz  w 1 3D Computer Vision and Video Computing  Camera Models for a Plane A Plane in the World nx X w  n yYw  nz Z w  d     Zw=0 Pw =(Xw, Yw,0, 1)T 3D point -> 2D point  x1    fr11     x2    fr21 x   r  3   31  fr12  fr22 r32 Projective Model of Zw=0   One more constraint for all points: Zw is a function of Xw and Yw Special case: Ground Plane   n T Pw  d 8 independent entries General Form ?   Xw  fr13  fTx   Yw   fr23  fTy  Z 0 r33 Tz  w 1 8 independent entries  x1    fr11     x2    fr21 x   r  3   31  fr12  fr22 r32  fTx  X w      fT23  Yw  Tz 1  3D Computer Vision and Video Computing  Camera Models for a Plane A Plane in the World nx X w  n yYw  nz Z w  d      x1    fr11     x2    fr21 x   r  3   31 Zw=0 Pw =(Xw, Yw,0, 1)T 3D point -> 2D point Projective Model of Zw=0   One more constraint for all points: Zw is a function of Xw and Yw Special case: Ground Plane   n T Pw  d  fr12  fr22 r32  Xw   fr13  fTx   Yw    fr23  fTy  Zw   r33 Tz   1  8 independent entries General Form ?   x1    f (r11  nx r13 )  f (r12  n y r13 )  f (dr13  Tx )  X w       x   f ( r  n r )  f ( r  n r )  f ( dr  T ) Y nz = 1  2  21 x 23 22 y 23 23 y  w  (r32  n y r33 ) (dr33  Tz ) 1  Z w  d  nx X w  n yYw  x3   (r31  nx r33 )  8 independent entries  2D (xim,yim) -> 3D (Xw, Yw, Zw) ? 3D Computer Vision and Video Computing  Graphics /Rendering  From 3D world to 2D image     Changing viewpoints and directions Changing focal length Fast rendering algorithms Vision / Reconstruction  From 2D image to 3D model      Applications and Issues     xim   x1 / x3       y x / x  im   2 3  Inverse problem Much harder / unsolved Robust algorithms for matching and parameter estimation Need to estimate camera parameters first Calibration      X  x1  w      x2   M int M ext  Yw  Z  x   3  w  1  Find intrinsic & extrinsic parameters Given image-world point pairs Probably a partially solved problem ? 11 independent entries  <-> 10 parameters: fx, fy, ox, oy, ,,, Tx,Ty,Tz  f x M int   0  0  r11 M ext  r21 r31 0  fy 0 r12 r22 r32 ox  o y  1  r13 Tx  r23 Ty  r33 Tz  3D Computer Vision and Video Computing  Geometric Projection of a Camera     Pinhole camera model Perspective projection Weak-Perspective Projection Camera Parameters (10 or 11)    Camera Model Summary Intrinsic Parameters: f, ox,oy, sx,sy,k1: 4 or 5 independent parameters Extrinsic parameters: R, T – 6 DOF (degrees of freedom) Linear Equations of Camera Models (without distortion)      General Projection Transformation Equation : 11 parameters Perspective Camera Model: 11 parameters Weak-Perspective Camera Model: 8 parameters Affine Camera Model: generalization of weak-perspective: 8 Projective transformation of planes: 8 parameters 3D Computer Vision Next and Video Computing  Determining the value of the extrinsic and intrinsic parameters of a camera Calibration (Ch. 6)     X  x1  w      x2   M int M ext  Yw  Z  x   3  w  1   f x M int   0  0  r11 M ext  r21 r31 0  fy 0 r12 r22 r32 ox  o y  1  r13 Tx  r23 Ty  r33 Tz 

Introduction

Related documents

Products

Support

Introduction

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib