Colorado School of Mines
Computer Vision
Professor William Hoff
Dept of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/

3D-2D Coordinate Transforms

3D to 2D Projections
• We have already seen how to project 3D points onto a 2D image
– We used the pinhole camera model
– Also the geometry of similar triangles
• Now we will look at how to model this using matrix multiplication
• This will help us better understand and model:
– Perspective projection
– Other projection types, such as weak perspective projection
– Special cases such as the projection of a planar surface

Intrinsic Camera Parameters
• Recall the intrinsic camera parameters, for a pinhole camera model
– Focal length $f$ and sensor element sizes $s_x, s_y$
  • Or, just focal lengths in pixels $f_x, f_y$
– Optical center of the image at pixel location $c_x, c_y$
• We can capture all the intrinsic camera parameters in a matrix $K$

\[
K = \begin{pmatrix} f/s_x & 0 & c_x \\ 0 & f/s_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\quad \text{or} \quad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\]

[Figure: pinhole camera with frame {C}, focal length $f$, and point ${}^{C}P(X,Y,Z)$; image plane {π} with pixel (1,1) and center at $(c_x, c_y)$]

3D to 2D Perspective Transformation
• We can project 3D points onto 2D with a matrix multiplication
• We treat the result as a 2D point in homogeneous coordinates. So we divide through by the last element.
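\[
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\tilde{x} = \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \sim \begin{pmatrix} X/Z \\ Y/Z \\ 1 \end{pmatrix}
\]

• Recall perspective projection is: $x = f X/Z + c_x$, $y = f Y/Z + c_y$
– If $f = 1$ and $(c_x, c_y) = (0,0)$, then this is the image projection of the point
– As we will see later, it is often useful to consider the projection of a point as if it were taken by a camera with $f = 1$ and $(c_x, c_y) = (0,0)$
– These are called “normalized image coordinates”

Complete Perspective Projection
• To project 3D points represented in the coordinate system attached to the camera, to the 2D image plane:

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= K \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
{}^{C}\!\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x_1/x_3 \\ x_2/x_3 \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\]

• To see this:

\[
\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
{}^{C}\!\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
= \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= \begin{pmatrix} f_x X + c_x Z \\ f_y Y + c_y Z \\ Z \end{pmatrix}
\sim \begin{pmatrix} f_x X/Z + c_x \\ f_y Y/Z + c_y \\ 1 \end{pmatrix}
\]

Extrinsic Camera Matrix
• If 3D points are in world coordinates, we first need to transform them to camera coordinates

\[
{}^{C}P = {}^{C}_{W}H \, {}^{W}P
= \begin{pmatrix} {}^{C}_{W}R & {}^{C}t_{Worg} \\ 0 & 1 \end{pmatrix} {}^{W}P
\]

• We can write this as an extrinsic camera matrix, that does the rotation and translation, then a projection from 3D to 2D

\[
M_{ext} = \begin{pmatrix} {}^{C}_{W}R & {}^{C}t_{Worg} \end{pmatrix}
= \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_X \\ r_{21} & r_{22} & r_{23} & t_Y \\ r_{31} & r_{32} & r_{33} & t_Z \end{pmatrix}
\]

Complete Perspective Projection
• Projection of a 3D point ${}^{W}P$ in the world to a point in the pixel image $(x_{im}, y_{im})$

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= K \, M_{ext} \, {}^{W}\!\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3
\]

– where $K$ is the intrinsic camera parameter matrix
– and $M_{ext}$ is the 3x4 matrix given by

\[
M_{ext} = \begin{pmatrix} {}^{C}_{W}R & {}^{C}t_{Worg} \end{pmatrix}
= \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_X \\ r_{21} & r_{22} & r_{23} & t_Y \\ r_{31} & r_{32} & r_{33} & t_Z \end{pmatrix}
\]

Example
• A camera is mounted on a vehicle. It observes a point P in the world at (X,Y,Z) = (16,0,-1). What pixel would P project to?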
• Assume $f = 512$ pix, $(c_x, c_y) = (256, 256)$

Back Projection
• If you have an image point, you can “back project” that point into the scene
• However, the resulting 3D point is not uniquely defined
– It is actually a ray emanating from the camera center, out through the image point, to infinity
– Any 3D point along that ray could have projected to the image point
• First, we convert the image point to “normalized” coordinates

\[
x_{norm} = K^{-1} x_{un\text{-}norm} = \begin{pmatrix} X/Z \\ Y/Z \\ 1 \end{pmatrix}
\]

[Figure: camera with $f = 1$, image point $x_{norm}$, and 3D point ${}^{C}P(X,Y,Z)$; $r = s\,x_{norm}$ is a ray emanating outward to infinity]

Example
• Assume that the cameraman image was taken using a camera with focal length = 600 pixels, with $(c_x, c_y)$ in the middle of the image.
– Find the unit vector in the direction of the man’s eye

Special Case
• Consider a plane observed from two viewpoints
• It can be shown that the image points are transformed using a homography (i.e., an arbitrary 3x3 transformation)

\[
\tilde{x}_1 \sim H \, \tilde{x}_0
\]

– $\tilde{x}_1$: 2D points in image 1 (homogeneous coords)
– $\tilde{x}_0$: 2D points in image 0 (homogeneous coords)

\[
\begin{pmatrix} x_a \\ x_b \\ x_c \end{pmatrix}
= \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x_0 \\ y_0 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix} = \begin{pmatrix} x_a/x_c \\ x_b/x_c \\ 1 \end{pmatrix}
\]

Weak Perspective
• Sometimes it is better to use an approximation to perspective projection, called “weak” projection or scaled orthography
• This works if the average depth $Z_{avg}$ to an object is much larger than the variation in depth within the object
– Instead of $x = f X/Z$, $y = f Y/Z$
– use $x = f X/Z_{avg}$, $y = f Y/Z_{avg}$

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & Z_{avg} \end{pmatrix}
{}^{C}\!\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x_1/x_3 \\ x_2/x_3 \\ 1 \end{pmatrix}
\]

• This makes the image coordinates (x,y) a linear function of the 3D coordinates (X,Y,Z)

Special Case
• Small planar patch
– Often we want to track a small patch on an object
– We want to know how the image of that
patch transforms as the object rotates
• Assume
– Size of patch small compared to distance -> weak perspective
– Rotation is small -> small angle approximation
– Patch is planar
• It can be shown that the patch undergoes an affine transformation

\[
\begin{pmatrix} x_B \\ y_B \\ 1 \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_A \\ y_A \\ 1 \end{pmatrix}
\]

Example
• A camera is mounted on a robot vehicle. The center of the camera is located at (X,Y,Z) = (0,+1,+2) in vehicle coordinates, and the camera is tilted downwards by 30 degrees

Example (continued)
• The vehicle is located at (X,Y,Z) = (+4,-4,+1) in world coordinates, with a rotation angle of 30 degrees with respect to the world.
• The camera observes four points on the ground, with world coordinates at (0,0,0), (1,0,0), (1,1,0), and (0,1,0).
• Generate a synthetic image of the points as seen from the camera, assuming the image is 200 pixels high by 300 pixels wide, with an effective focal length of 300 pixels
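
Sketch: projection and back projection in code
The matrix form of perspective projection (pixel point ~ K M_ext P) and the back projection step (x_norm = K^-1 x) from these slides can be tried out numerically. The following is only a minimal NumPy sketch, not a solution to the example above: the function names intrinsic_matrix, project, and back_project are hypothetical, the principal point is assumed to lie at the center of a 300 x 200 image, and the camera pose is assumed to be the identity (world frame equal to camera frame) because the slides do not state the axis conventions for the vehicle and camera rotations.

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 intrinsic matrix K from focal lengths and optical center (pixels)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, R_c_w, t_c_worg, P_w):
    """Project N world points (N x 3) to pixel coordinates (N x 2).

    Implements x ~ K [R | t] P, then divides by the last homogeneous element.
    """
    P_w = np.atleast_2d(np.asarray(P_w, dtype=float))
    t = np.asarray(t_c_worg, dtype=float).reshape(3, 1)
    M_ext = np.hstack([R_c_w, t])                         # 3x4 extrinsic matrix
    P_h = np.hstack([P_w, np.ones((P_w.shape[0], 1))])    # N x 4 homogeneous points
    x = (K @ M_ext @ P_h.T).T                             # N x 3 homogeneous image points
    return x[:, :2] / x[:, 2:3]

def back_project(K, pixel):
    """Return the unit vector (camera frame) along the ray through a pixel."""
    x_h = np.array([pixel[0], pixel[1], 1.0])
    x_norm = np.linalg.solve(K, x_h)                      # normalized image coordinates
    return x_norm / np.linalg.norm(x_norm)

# Assumed values: f = 300 pixels, optical center at the middle of a 300 x 200 image.
K = intrinsic_matrix(300.0, 300.0, 150.0, 100.0)
R = np.eye(3)       # assumed: no rotation between world and camera
t = np.zeros(3)     # assumed: world origin at the camera center
P = np.array([[1.0, 0.5, 10.0]])   # an arbitrary point 10 units in front of the camera

pix = project(K, R, t, P)
ray = back_project(K, pix[0])
print(pix)   # pixel coordinates of P
print(ray)   # unit ray; parallel to P because the camera sits at the origin
```

With these assumed values the point lands at x = 300(1/10) + 150 = 180 and y = 300(0.5/10) + 100 = 115, and back projecting that pixel recovers the direction of P but not its depth, which is the ambiguity described on the Back Projection slide. Replacing R and t with a real world-to-camera pose would give the general case.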