ECSE 6650 Computer Vision
Project 2: 3-D Reconstruction
Zhiwei Zhu, Ming Jiang, Chih-ting Wu

1. Introduction

The objective of this project is to investigate and implement methods to recover the 3D properties of an object from a pair of stereo images. The computation of a 3D reconstruction from a stereo pair generally consists of the following three steps: (1) rectification, (2) correspondence search, and (3) reconstruction.

Given a pair of stereo images, rectification determines a transformation of each image such that pairs of conjugate epipolar lines become collinear and parallel to the horizontal image axis. The importance of rectification is that it reduces the correspondence problem from a 2-D search to a 1-D search. In the correspondence problem, we need to determine, for each pixel in the left image, which pixel in the right image corresponds to it. The search is correlation-based. Since the images have been rectified, finding the correspondence of a pixel in the left image does not require searching the whole right image; instead, we only need to search along the same row in the right image. Due to different occlusions in the two images, some pixels have no correspondence. In the reconstruction step, by triangulating each pixel and its correspondence, we compute the 3D coordinates at that pixel.

2. Camera Calibration

2.1 Calibration Theory

Calibration is a technique to estimate the extrinsic and intrinsic parameters of the stereo system from 2D images; it provides the prior information needed to build the 3D structure directly. Under the full perspective projection camera model, each calibration point (x_i, y_i, z_i) projects onto an image plane point with coordinates (c_i, r_i) determined by the equation

\lambda \begin{pmatrix} c_i \\ r_i \\ 1 \end{pmatrix} = P \begin{pmatrix} x_i \\ y_i \\ z_i \\ 1 \end{pmatrix}, \qquad P = \begin{pmatrix} p_1^T & p_{14} \\ p_2^T & p_{24} \\ p_3^T & p_{34} \end{pmatrix}    (2-1)

P = W M    (2-2)

where \lambda is a scale factor, P is the homogeneous projection matrix,

W = \begin{pmatrix} f s_x & 0 & c_0 \\ 0 & f s_y & r_0 \\ 0 & 0 & 1 \end{pmatrix}

is the intrinsic matrix, and

M = (R \; T) = \begin{pmatrix} r_1 & t_x \\ r_2 & t_y \\ r_3 & t_z \end{pmatrix}

is the extrinsic matrix, with r_1, r_2, r_3 the rows of the rotation matrix R. Hence, equation (2-1) can be rewritten as

\lambda \begin{pmatrix} c_i \\ r_i \\ 1 \end{pmatrix} = \begin{pmatrix} f s_x r_1 + c_0 r_3 & f s_x t_x + c_0 t_z \\ f s_y r_2 + r_0 r_3 & f s_y t_y + r_0 t_z \\ r_3 & t_z \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ z_i \\ 1 \end{pmatrix}    (2-3)

For each pair of 2-D and 3-D points, i = 1 ... N (here c_i and r_i denote the image column and row coordinates of point i), we have the equations

p_1^T M_i + p_{14} - c_i p_3^T M_i - c_i p_{34} = 0    (2-4)

p_2^T M_i + p_{24} - r_i p_3^T M_i - r_i p_{34} = 0    (2-5)

where M_i = (x_i, y_i, z_i)^T. We can set up a system of linear equations

A V = \begin{pmatrix} M_1^T & 1 & 0^T & 0 & -c_1 M_1^T & -c_1 \\ 0^T & 0 & M_1^T & 1 & -r_1 M_1^T & -r_1 \\ \vdots & & & & & \vdots \\ M_N^T & 1 & 0^T & 0 & -c_N M_N^T & -c_N \\ 0^T & 0 & M_N^T & 1 & -r_N M_N^T & -r_N \end{pmatrix} \begin{pmatrix} p_1 \\ p_{14} \\ p_2 \\ p_{24} \\ p_3 \\ p_{34} \end{pmatrix} = 0    (2-6)

where 0^T = (0, 0, 0). In general, equation (2-6) has no exact solution, due to errors in the sampling process, and has to be approximated using a least squares approach, i.e. minimizing ||AV||^2 while imposing the additional normalization constraint ||p_3||^2 = 1:

\epsilon^2 = \|AV\|^2 + \lambda (1 - \|p_3\|^2)    (2-7)

Decomposing A into two matrices B and C, and V into Y and Z,

A = (B \; C), \qquad V = \begin{pmatrix} Y \\ Z \end{pmatrix}    (2-8)

where

B = \begin{pmatrix} M_1^T & 1 & 0^T & 0 & -c_1 \\ 0^T & 0 & M_1^T & 1 & -r_1 \\ \vdots & & & & \vdots \\ M_N^T & 1 & 0^T & 0 & -c_N \\ 0^T & 0 & M_N^T & 1 & -r_N \end{pmatrix}, \qquad C = \begin{pmatrix} -c_1 M_1^T \\ -r_1 M_1^T \\ \vdots \\ -c_N M_N^T \\ -r_N M_N^T \end{pmatrix}    (2-9)

with Y = (p_1^T, p_{14}, p_2^T, p_{24}, p_{34})^T and Z = p_3, equation (2-7) can be written as

\epsilon^2 = \|BY + CZ\|^2 + \lambda (1 - \|Z\|^2)    (2-10)

Taking the partial derivatives of \epsilon^2 with respect to Y and Z and setting them to zero yields

Y = -(B^T B)^{-1} B^T C Z    (2-11)

C^T (I - B (B^T B)^{-1} B^T) C Z = \lambda Z    (2-12)

The solution for Z is therefore an eigenvector of the matrix C^T (I - B (B^T B)^{-1} B^T) C. Given Z, we can then solve for Y. Substituting Y back gives ||BY + CZ||^2 = \lambda, which proves that the solution Z is the eigenvector corresponding to the smallest eigenvalue of this matrix.
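To make the estimation procedure of Section 2.1 concrete, the following is a minimal sketch in Python/NumPy of the constrained least-squares solution (2-11)-(2-12). The function name and the final sign-fixing step are our own additions, not part of the original implementation.

```python
import numpy as np

def calibrate_camera(pts3d, pts2d):
    """Estimate the 3x4 projection matrix P from N >= 6 pairs of
    3D points (x, y, z) and image points (c, r), Section 2.1."""
    N = len(pts3d)
    B = np.zeros((2 * N, 9))   # columns: p1 (3), p14, p2 (3), p24, p34
    C = np.zeros((2 * N, 3))   # columns: p3
    for i, ((x, y, z), (c, r)) in enumerate(zip(pts3d, pts2d)):
        M = np.array([x, y, z])
        B[2 * i, 0:3] = M;      B[2 * i, 3] = 1.0;      B[2 * i, 8] = -c
        B[2 * i + 1, 4:7] = M;  B[2 * i + 1, 7] = 1.0;  B[2 * i + 1, 8] = -r
        C[2 * i] = -c * M
        C[2 * i + 1] = -r * M
    # Z (= p3) is the eigenvector of C^T (I - B (B^T B)^-1 B^T) C
    # associated with the smallest eigenvalue, eq. (2-12).
    BtB_inv = np.linalg.inv(B.T @ B)
    D = C.T @ (np.eye(2 * N) - B @ BtB_inv @ B.T) @ C
    w, V = np.linalg.eigh(D)            # symmetric: eigenvalues ascending
    Z = V[:, 0]
    Y = -BtB_inv @ B.T @ C @ Z          # eq. (2-11)
    p1, p14, p2, p24, p34 = Y[0:3], Y[3], Y[4:7], Y[7], Y[8]
    P = np.vstack([np.r_[p1, p14], np.r_[p2, p24], np.r_[Z, p34]])
    # Fix the overall sign, assuming the object lies in front of the
    # camera (t_z > 0, as in Table 1 below).
    if P[2, 3] < 0:
        P = -P
    return P
```

The intrinsic matrix W and the extrinsic matrix M reported in Table 1 can then be recovered by decomposing the estimated P = W M.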
2.2 Camera Calibration Results

The left and right images, taken with the same camera, are shown in Figure 1.

Figure 1. The left and right images taken with the same camera

We manually measured the pixel coordinates of the grid corners on the calibration panels; these pixel coordinates form the 2D point set used for camera calibration. The 3D coordinates of these points are given. We then calibrated the camera from this set of 2D-3D point correspondences. The estimated camera intrinsic and extrinsic parameters are shown in Table 1.

Table 1: The estimated intrinsic and extrinsic parameters

                      Left Camera                         Right Camera

Intrinsic matrix W    [858.3853     0      317.3359]      [861.6958     0      314.2041]
                      [   0     -851.2035  274.9800]      [   0     -850.1881  282.0719]
                      [   0         0        1.0000]      [   0         0        1.0000]

Rotation matrix R     [-0.6096   0.7923   0.0235]         [-0.8353   0.5491   0.0281]
                      [ 0.0579   0.0174   0.9982]         [ 0.0466   0.0304   0.9984]
                      [-0.7905  -0.6099   0.0565]         [-0.5474  -0.8353   0.0510]

Translation T         [-1.2242  -6.5227  65.8155]         [ 4.6503  -6.2438  68.8479]

From the table we see that the estimated intrinsic parameters of the left and right views differ slightly from each other, even though the two images were taken with the same camera. When we measured the pixel coordinates of the grid corners in the left and right images, we could not measure them exactly; there is usually an error of around 2 or 3 pixels. This measurement error accounts for the deviation between the estimated left and right camera intrinsic parameters.

3. Fundamental Matrix and Essential Matrix

Both the fundamental and the essential matrix completely describe the geometric relationship between corresponding points of a stereo pair of cameras. The only difference between the two is that the fundamental matrix deals with uncalibrated cameras, while the essential matrix deals with calibrated cameras. In this project, we derived the fundamental matrix with the eight-point algorithm and then obtained the essential matrix from it.

3.1 Fundamental Matrix

Since the fundamental matrix F is a 3x3 matrix determined only up to an arbitrary scale factor, 8 equations are required to obtain a unique solution. We manually established correspondences between points on the calibration pattern in the two images and applied the eight-point algorithm to the matched points: given 8 or more corresponding points, we obtain a set of linear equations whose null space is non-trivial. For any pair of matching points u = (u, v, 1)^T and u' = (u', v', 1)^T, the epipolar geometry gives

u'^T F u = 0    (3-1)

or

(u' \; v' \; 1) \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = 0    (3-2)

The equation corresponding to a pair of points u = (u, v, 1)^T and u' = (u', v', 1)^T is therefore

uu' f_{11} + vu' f_{12} + u' f_{13} + uv' f_{21} + vv' f_{22} + v' f_{23} + u f_{31} + v f_{32} + f_{33} = 0    (3-3)

From all point matches, we obtain a set of linear equations of the form

A f = 0    (3-4)

where f is a nine-vector containing the entries of the matrix F, and A is the equation matrix. Since F is defined only up to scale, so is the solution vector f. This system of equations can be solved by singular value decomposition (SVD). Applying SVD to A yields the decomposition A = U S V^T, with U column-orthogonal, V orthogonal, and S a diagonal matrix containing the singular values \sigma_1 \geq \sigma_2 \geq ... \geq \sigma_9 \geq 0, which are positive or zero and arranged in decreasing order. In our case \sigma_9 is zero (8 equations for 9 unknowns), and thus the last column of V is the solution.
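As an illustration of Section 3.1, here is a minimal Python/NumPy sketch of the linear eight-point solution. The function name and the normalization by F[2,2] (which matches the F reported in Section 3.3, whose last entry is 1.0000) are our own choices.

```python
import numpy as np

def eight_point(ul, ur):
    """Estimate F such that ur_h^T F ul_h = 0 from N >= 8 matched
    pixel coordinates ul = [(u, v), ...], ur = [(u', v'), ...]."""
    assert len(ul) >= 8 and len(ul) == len(ur)
    A = []
    for (u, v), (up, vp) in zip(ul, ur):
        # One row of the linear system A f = 0, eq. (3-3).
        A.append([u * up, v * up, up, u * vp, v * vp, vp, u, v, 1.0])
    A = np.asarray(A)
    # f is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    return F / F[2, 2]   # fix the arbitrary scale, as in Section 3.3
```

A common refinement, not part of the linear solution above, is to enforce rank(F) = 2 by zeroing the smallest singular value of the estimated F; normalizing the input point coordinates also improves numerical conditioning.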
3.2 Essential Matrix

The essential matrix contains five parameters (three for the rotation and two for the direction of translation). It is obtained from the fundamental matrix by a transformation involving the intrinsic parameters of the pair of cameras associated with the two views. Thus, constraints on the essential matrix can be translated into constraints on the intrinsic parameters of the pair of cameras. From the fundamental matrix

F = W_l^{-T} E W_r^{-1}    (3-5)

the essential matrix is

E = W_l^T F W_r    (3-6)

3.3 Experiment Results

In the following, we show the computed fundamental matrix F and essential matrix E:

F = [ 0.0000  -0.0000   0.0013]
    [ 0.0000   0.0000  -0.0023]
    [-0.0026  -0.0039   1.0000]

E = [ 0.4173   1.8403   0.6147]
    [-4.9621   6.9256  -2.1074]
    [-0.5248   1.7849  -0.0329]

4. Compute R and T

Figure 2. Triangulation

If both the extrinsic and intrinsic parameters are known, we can compute the 3D location of a point from its projections p_l and p_r unambiguously via the triangulation algorithm [1]. The technique estimates the intersection of the two rays through p_l and p_r; however, due to image noise, the two rays may not actually intersect in space. The goal of the algorithm is then to identify the line segment that intersects both rays orthogonally, and to estimate the midpoint of that segment. Following this geometric solution, we can derive the relationship between the extrinsic parameters of the stereo system. For a point P in the world reference frame we have

P_r = R_r P + T_r    (4-1)

P_l = R_l P + T_l    (4-2)

From equations (4-1) and (4-2), we have

P_l = R_l P + T_l = R_l [R_r^{-1} (P_r - T_r)] + T_l = R_l R_r^{-1} P_r - R_l R_r^{-1} T_r + T_l    (4-3)

Since the relationship between P_l and P_r is given by P_l = R P_r + T, we equate the terms to get

R = R_l R_r^{-1} = R_l R_r^T    (4-4)

T = T_l - R_l R_r^T T_r = T_l - R T_r    (4-5)

The computed relative orientation R and translation T of the two cameras are:

R = [ 0.9449   0.0136  -0.3270]
    [-0.0165   0.9999   0.0047]
    [ 0.3270  -0.0009   0.9450]

T = [16.9770  -0.5262  -0.7741]^T
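Equations (3-6), (4-4), and (4-5) translate directly into code. Below is a minimal Python/NumPy sketch (the function names are ours), applied to the extrinsic parameters of Table 1; because the calibrated parameters are rounded to four digits, the result matches the R and T above only approximately.

```python
import numpy as np

def essential_from_fundamental(F, Wl, Wr):
    """E = Wl^T F Wr, eq. (3-6)."""
    return Wl.T @ F @ Wr

def relative_pose(Rl, Tl, Rr, Tr):
    """Relative pose between the cameras, eqs. (4-4) and (4-5),
    such that Pl = R @ Pr + T."""
    R = Rl @ Rr.T          # R = Rl Rr^-1, with Rr^-1 = Rr^T
    T = Tl - R @ Tr        # T = Tl - R Tr
    return R, T

# Extrinsic parameters from Table 1 (left and right cameras).
Rl = np.array([[-0.6096,  0.7923, 0.0235],
               [ 0.0579,  0.0174, 0.9982],
               [-0.7905, -0.6099, 0.0565]])
Tl = np.array([-1.2242, -6.5227, 65.8155])
Rr = np.array([[-0.8353,  0.5491, 0.0281],
               [ 0.0466,  0.0304, 0.9984],
               [-0.5474, -0.8353, 0.0510]])
Tr = np.array([4.6503, -6.2438, 68.8479])

R, T = relative_pose(Rl, Tl, Rr, Tr)   # close to the R, T of Section 4
```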
5. Rectification

5.1 Rectification by Construction of New Perspective Projection Matrices

Given a pair of stereo images, epipolar rectification determines a transformation of each image plane such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes (usually the horizontal one). The rectified images can be thought of as acquired by a new stereo rig, obtained by rotating the original cameras. The important advantage of rectification is that computing stereo correspondences becomes simpler, because the search is done along the horizontal lines of the rectified images.

In our implementation of the rectification process, we tried several methods. Although the algorithm given in the lecture notes can be proved analytically, we found that the right rectified image it produces has an obvious vertical shift, so corresponding image points cannot be located along the same row as the given point in the other image. We then switched to the algorithm given in the textbook. It turns out that this algorithm works well with the given image and camera data; however, the book does not give a detailed proof of it. Doubting the correctness of both algorithms, we finally implemented the algorithm presented in Rectification with Constrained Stereo Geometry, which has been proved analytically by A. Fusiello. Our implementation of this algorithm proved successful, as evidenced by the perfectly rectified images and by the properties of the epipolar lines and epipoles.

The only input to the algorithm is the pair of perspective projection matrices (PPMs) of the two cameras. The output is a pair of rectifying perspective projection matrices, which can be used to compute the rectified images. The idea behind rectification is to define two new PPMs, obtained by rotating the old ones around their optical centers until the focal planes become coplanar, thereby containing the baseline. This ensures that the epipoles are at infinity, and hence that the epipolar lines are parallel. To have horizontal epipolar lines, the baseline must be parallel to the new X axis of both cameras. In addition, for a proper rectification, conjugate points must have the same vertical coordinate; this is obtained by requiring that the new cameras have the same intrinsic parameters. Note that, the focal length being the same, the retinal planes are coplanar too. The optical centers of the new PPMs are the same as those of the old cameras, whereas the new orientation (the same for both cameras) differs from the old ones by suitable rotations. Therefore, the two resulting PPMs differ only in their optical centers, and they can be thought of as a single camera translated along the X axis of its own reference frame.

The algorithm consists of the following steps (a sketch of the resampling step follows this list):

1. Compute the intrinsic and extrinsic parameters of both cameras by calibration.
2. Determine the positions of the two optical centers. The position of each optical center c is constrained by the fact that its projection through the camera's PPM is zero, i.e. P (c^T, 1)^T = 0.
3. Compute the new coordinate system. Let c_1 and c_2 be the two optical centers. The new x axis is along the direction of the baseline c_1 - c_2; the new y axis is orthogonal to the new x axis and to the old z axis; the new z axis is orthogonal to the baseline and to the new y axis. The rotation part R of the new projection matrices is then built from these three axes.
4. Construct the new PPMs and the rectification matrices.
5. Apply the rectification matrices to the images.

5.2 Resampling: Bilinear Interpolation

Once the shift vector (u, v) has been updated after the previous iteration, the image J must be resampled into a new image J_new according to (-u, -v), in order to align the matching pixels in I and J_new and to provide a correct alignment error computation. The shift (u, v) is computed with subpixel precision, so applying it may take us to locations in between the actual pixel positions (see Figure 3).

Figure 3. Bilinear Interpolation

The grayscale value B_{x,y} at location (x, y) of each warped pixel in J_new can be computed in two stages. First we define

B_{x,0} = (1 - x) B_{0,0} + x B_{1,0}    (5-6)

B_{x,1} = (1 - x) B_{0,1} + x B_{1,1}    (5-7)

From these two equations, we get

B_{x,y} = (1 - y) B_{x,0} + y B_{x,1} = (1 - y)(1 - x) B_{0,0} + x (1 - y) B_{1,0} + y (1 - x) B_{0,1} + x y B_{1,1}    (5-8)
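A minimal Python/NumPy sketch of the resampling step is given below: bilinear_sample implements eq. (5-8), and warp_image applies a rectifying transform by inverse mapping (step 5 of Section 5.1). We assume here that the rectification matrix acts as a 3x3 homography H on homogeneous pixel coordinates; the function names are ours.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Grayscale value at the subpixel location (x, y), eq. (5-8).
    a and b are the fractional offsets within the cell whose four
    corner pixels play the roles of B_00, B_10, B_01, B_11."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    a, b = x - x0, y - y0
    return ((1 - b) * (1 - a) * img[y0, x0] + (1 - b) * a * img[y0, x0 + 1]
            + b * (1 - a) * img[y0 + 1, x0] + b * a * img[y0 + 1, x0 + 1])

def warp_image(img, H, out_shape):
    """Resample img through the inverse of the 3x3 rectifying
    transform H (assumed homography), with bilinear interpolation."""
    Hinv = np.linalg.inv(H)
    out = np.zeros(out_shape, dtype=float)
    for r in range(out_shape[0]):
        for c in range(out_shape[1]):
            p = Hinv @ np.array([c, r, 1.0])    # map back to the source
            x, y = p[0] / p[2], p[1] / p[2]
            if 0 <= x < img.shape[1] - 1 and 0 <= y < img.shape[0] - 1:
                out[r, c] = bilinear_sample(img, x, y)
    return out
```

Inverse mapping is used so that every pixel of the rectified image receives a value, avoiding the holes that forward mapping would leave.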
5.3 Rectification Results

Figure 4. Rectified left and right images

Figure 4 shows the rectified left and right images. We can see that for each image point in the left image, the corresponding point in the right image always lies in the same row, as indicated by the white lines in Figure 4.

6. Draw the Epipolar Lines

Epipoles before rectification:
Left:  1.0e+004 * (-1.8508, -0.0304)
Right: 1.0e+003 * (-1.8514,  0.2502)

Epipoles after rectification:
Left:  1.0e+018 * (-1.0271, 0.0000)
Right: 1.0e+018 * (-1.0068, 0.0000)

This means that after rectification both epipoles are at infinity along the x axis of the image coordinates, parallel to the baseline. This can be explained by the special form of the fundamental matrix after rectification: our experiments show that the rectified fundamental matrix has a null vector of [1 0 0]^T.

Epipolar lines corresponding to the corners of A5, in the form a*U + b*V + c = 0:

Before rectification:
-0.0006  -0.0198  3.7996
-0.0005  -0.0199  3.9630
 0.0000  -0.0198  4.9736
 0.0001  -0.0199  5.0736

After rectification:
-0.0000  -0.0200  3.5044
-0.0000  -0.0200  3.7241
-0.0000  -0.0200  4.9826

The lines are shown in the figures below.

[Figure: Epipolar lines before rectification]
[Figure: Epipolar lines after rectification]

7. Correlation-based Point Matching

After rectification, for each image point in the left image we can always find its corresponding point in the right image by searching along the same row. The rectification procedure therefore reduces the point matching from a 2D search to a 1D search, which improves both the matching speed and the matching accuracy. In the following, we discuss the correlation-based method used for the 1D correspondence search.

7.1 Correlation Method

For each left image pixel within the selected rectangular area, its correlation with a right image pixel is determined using a small correlation window of fixed size, over which we compute the sum of squared differences (SSD) of the pixel intensities:

c(d) = \sum_{k=-W}^{W} \sum_{l=-W}^{W} \Psi(I_l(i + k, j + l), I_r(i + k + d_1, j + l + d_2))    (7-1)

where (2W + 1) is the width of the correlation window, I_l and I_r are the intensities of the left and right image pixels respectively, [i, j] are the coordinates of the left image pixel, d = [d_1, d_2]^T is the relative displacement between the left and right image pixels, and \Psi(u, v) = (u - v)^2 is the SSD correlation function.

7.2 Disparity Map

Figure 5. Triangulation map

From Figure 5, we obtain the following equation:

Z = f b / (u_1 - u_2)    (7-2)

where u_1 - u_2 is the disparity, b is the baseline distance, and f is the focal length. After locating the corresponding points on the same row in the left and right images, we compute the disparity for each matched pair, one by one. Equation (7-2) also shows that the depth Z is inversely proportional to the disparity d = u_1 - u_2. In order to illustrate the disparity of the matched pairs, we first normalize the disparity values to the pixel intensity range 0-255, so that the disparities can be displayed as pixel intensities. As a result, the brighter an image point, the larger its disparity and the smaller its depth. Note that since we only consider well-correlated points on the maps, we set the disparity to 255 for poorly correlated points. Figure 6 shows the region selected in the left image for the correlation-based point matching, together with the corresponding cropped search region in the right image.

Figure 6. (a) Left rock image (b) Right rock image

From Figure 6, we can see that the rock is in the lower-left part of the image. The disparity maps calculated with different correlation threshold values are shown in Figure 7. In the disparity maps, we can distinguish the rock from the background very well based on the pixel intensities, because the rock appears brighter than the background; this also shows that the rock is closer to the camera than the background. Due to some mismatches between the left and right points, a few background points are as bright as the rock, but the effect is not significant: compared with the rock, most of the background points are darker.

Figure 7. The disparity maps with different threshold values: (a) 0.3, (b) 0.4, (c) 0.5, (d) 0.6
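The following is a minimal Python/NumPy sketch of the 1-D SSD search of Section 7.1. The function names, the window half-width W, the disparity range, and the sign convention (the right-image match lies to the left of the left-image pixel) are our assumptions; the correlation thresholding used for Figure 7 is omitted here.

```python
import numpy as np

def ssd(win_l, win_r):
    """SSD correlation of two (2W+1) x (2W+1) windows, eq. (7-1)."""
    d = win_l.astype(float) - win_r.astype(float)
    return np.sum(d * d)

def disparity_map(Il, Ir, W=5, max_disp=64):
    """1-D correspondence search along the rows of a rectified pair."""
    h, w = Il.shape
    disp = np.zeros((h, w))
    for i in range(W, h - W):
        for j in range(W, w - W):
            win_l = Il[i - W:i + W + 1, j - W:j + W + 1]
            best, best_d = np.inf, 0
            # Rectification makes d2 = 0: search along the same row only.
            for d in range(0, min(max_disp, j - W) + 1):
                win_r = Ir[i - W:i + W + 1, j - d - W:j - d + W + 1]
                c = ssd(win_l, win_r)
                if c < best:
                    best, best_d = c, d
            disp[i, j] = best_d
    return disp
```

The disparity image is then linearly rescaled to the 0-255 intensity range for display, as described in Section 7.2.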
8. 3-D Reconstruction

8.1 Reconstruction with Known Extrinsic and Intrinsic Parameters

If we know the relative orientation R and T and the intrinsic parameters W_l and W_r, we can reconstruct the 3D geometry in two ways: (1) by making the object frame coincide with the left camera frame, or (2) by the geometric solution.

8.1.1 Reconstruction by Triangulation

Assume the object frame coincides with the left camera frame, and let (c_l, r_l) and (c_r, r_r) be the left and right image points, respectively. The 3-D coordinates (x, y, z) can be solved from the perspective projection equations of the left and right images:

\lambda_l (c_l, r_l, 1)^T = W_l M_l (x, y, z, 1)^T    (8-1)

\lambda_r (c_r, r_r, 1)^T = W_r M_r (x, y, z, 1)^T    (8-2)

This equation system contains 5 unknowns (x, y, z, \lambda_l, \lambda_r) and 6 linear equations, so the solution can be obtained by least squares. Let P_l = W_l M_l and P_r = W_r M_r denote the projection matrices of the left and right images, respectively. Then

P_l (x, y, z, 1)^T - \lambda_l (c_l, r_l, 1)^T = 0    (8-3)

P_r (x, y, z, 1)^T - \lambda_r (c_r, r_r, 1)^T = 0    (8-4)

Combining equations (8-3) and (8-4), we have

\begin{pmatrix} P_{l11} & P_{l12} & P_{l13} & -c_l & 0 \\ P_{l21} & P_{l22} & P_{l23} & -r_l & 0 \\ P_{l31} & P_{l32} & P_{l33} & -1 & 0 \\ P_{r11} & P_{r12} & P_{r13} & 0 & -c_r \\ P_{r21} & P_{r22} & P_{r23} & 0 & -r_r \\ P_{r31} & P_{r32} & P_{r33} & 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ \lambda_l \\ \lambda_r \end{pmatrix} = - \begin{pmatrix} P_{l14} \\ P_{l24} \\ P_{l34} \\ P_{r14} \\ P_{r24} \\ P_{r34} \end{pmatrix}    (8-5)

The least-squares solution of the linear system A X = B is given by

X = (A^T A)^{-1} A^T B    (8-6)

The 3-D coordinates can thus be obtained from the two corresponding image points. The reconstructed 3D map is shown in Figure 8.

Figure 8. 3D reconstruction of the rock pile

The output 3D coordinates can be found in the uploaded file named "3D.txt".

9. Summary and Conclusion

In this project, we performed 3D reconstruction of a scene of a pile of rocks from two images of the scene taken from different viewpoints. There are three main steps in the 3D reconstruction: rectification, correspondence search, and reconstruction. The experimental results show that we successfully reconstructed the 3D scene of the rock that is visible in both images. The most difficult part was the rectification, but we finally utilized a new method to rectify the left and right images, and good results were achieved.

Code:
Version 1: Calibration, Rectification, Reconstruction
Version 2: Calibration, Rectification, Reconstruction
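To complement the code listing above, here is a minimal Python/NumPy sketch of the triangulation step of Section 8.1.1 (the function name is ours); it builds the 6x5 system (8-5) and solves it in the least-squares sense (8-6):

```python
import numpy as np

def triangulate(Pl, Pr, cl, rl, cr, rr):
    """Linear triangulation: solve the 6x5 system (8-5) in the
    unknowns (x, y, z, lambda_l, lambda_r) by least squares (8-6)."""
    A = np.zeros((6, 5))
    A[0:3, 0:3] = Pl[:, 0:3]               # left 3x3 block of P_l
    A[3:6, 0:3] = Pr[:, 0:3]               # left 3x3 block of P_r
    A[0:3, 3] = -np.array([cl, rl, 1.0])   # -lambda_l column
    A[3:6, 4] = -np.array([cr, rr, 1.0])   # -lambda_r column
    b = -np.concatenate([Pl[:, 3], Pr[:, 3]])
    # lstsq computes the least-squares solution (A^T A)^-1 A^T b.
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X[0:3]    # the 3-D point (x, y, z)
```

Applied to every matched pixel pair produced by the correlation search of Section 7, this yields the 3-D point cloud shown in Figure 8.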