Structure from motion Class 9 Read Chapter 5 Geometric Computer Vision course schedule (tentative) Lecture Exercise Sept 16 Introduction - Sept 23 Geometry & Camera model Camera calibration Sept 30 Single View Metrology Measuring in images (Changchang Wu) Oct. 7 Feature Tracking/Matching Correspondence computation Oct. 14 Epipolar Geometry F-matrix computation Oct. 21 Shape-from-Silhouettes Visual-hull computation Oct. 28 Stereo matching papers Nov. 4 Stereo matching (continued) Project proposals Nov. 11 Structured light and active range sensing Papers Nov. 18 Structure from motion and visual SLAM Papers Nov. 25 Multi-view geometry and self-calibration Papers Dec. 2 3D modeling, registration and range/depth fusion (Christopher Zach?) Papers Dec. 9 Shape-from-X and image-based rendering Papers Dec. 16 Final project presentations Final project presentations Today’s class • Structure from motion • factorization • sequential • bundle adjustment Factorization • Factorise observations in structure of the scene and motion/calibration of the camera • Use all points in all images at the same time Affine factorisation Projective factorisation Affine camera The affine projection equations are X j xij Pi y P y Yj ij i Z j 1 0001 1 X j xij Pi x Y j y y Z ij Pi j 1 X j x4 ~ x xij Pi xij Pi ~ y Yj y4 yij Pi yij Pi Z j x how to find the origin? or for that matter a 3D reference point? affine projection preserves center of gravity ~ xij xij xij i ~ yij yij yij i Orthographic factorization (Tomasi Kanade’92) The ortographic projection equations are where mij Pi M j , i 1,...,m, j 1,...,n X j ~ x Pi xij mij ~ , Pi y , M j Y j yij Pi Z j All equations can be collected for all i and j where m11 m m 21 m m1 m PM m12 m 22 mm2 P1 m1n m 2 n P2 , P , M M1 , M 2 ,..., M n m mn Pm Note that P and M are resp. 2mx3 and 3xn matrices and therefore the rank of m is at most 3 Orthographic factorization (Tomasi Kanade’92) Factorize m through singular value decomposition m UV T An affine reconstruction is obtained as follows ~ ~ P U, M V T Closest rank-3 approximation yields MLE! m11 m 21 min m m1 m12 m 22 mm2 m1n P1 m 2 n P2 M , M ,..., M n 1 2 m mn P m Orthographic factorization (Tomasi Kanade’92) Factorize m through singular value decomposition m UV T An affine reconstruction is obtained as follows ~ ~ P U, M V T A metric reconstruction is obtained as follows ~ 1 ~ P PA , M AM Where A is computed from T ~ xx x~ 1 x T T ~ x T 3 linear equations per view on PPii P C 1i 1 Ai Pi A 1P symmetric matrix C (6DOF) T ~ yy y~ 1 y T T ~ y T PPii P C Ai Pi A 1P1i 1 A can be obtained from C T ~ xx y~ 1 y T T ~ y T through Cholesky factorisation PPii P 0P0i 0 C Ai Pi A and inversion Examples Tomasi Kanade’92, Poelman & Kanade’94 Examples Tomasi Kanade’92, Poelman & Kanade’94 Examples Tomasi Kanade’92, Poelman & Kanade’94 Examples Tomasi Kanade’92, Poelman & Kanade’94 Perspective factorization The camera equations λ ij mij Pi M j , i 1,..., m, j 1,..., m for a fixed image i can be written in matrix form as where mi i Pi M m i mi1 , mi 2 ,..., mim , M M1 , M 2 ,..., M m i diag λ i1 , λ i 2 ,..., λ im Perspective factorization All equations can be collected for all i as where m PM m11 P1 m P 2 2 2 m , P ... ... m n n Pm In these formulas m are known, but i,P and M are unknown Observe that PM is a product of a 3mx4 matrix and a 4xn matrix, i.e. it is a rank-4 matrix Perspective factorization algorithm Assume that i are known, then PM is known. Use the singular value decomposition PM=U VT In the noise-free case =diag(s1,s2,s3,s4,0, … ,0) and a reconstruction can be obtained by setting: P=the first four columns of U. M=the first four rows of V. Iterative perspective factorization When i are unknown the following algorithm can be used: 1. Set lij=1 (affine approximation). 2. Factorize PM and obtain an estimate of P and M. If s5 is sufficiently small then STOP. 3. Use m, P and M to estimate i from the camera equations (linearly) mi i=PiM 4. Goto 2. In general the algorithm minimizes the proximity measure P(,P,M)=s5 Note that structure and motion recovered up to an arbitrary projective transformation Further Factorization work Factorization with uncertainty (Irani & Anandan, IJCV’02) Factorization for dynamic scenes (Costeira and Kanade ‘94) (Bregler et al. ‘00, Brand ‘01) (Yan and Pollefeys, ‘05/’06) practical structure and motion recovery from images • Obtain reliable matches using matching or tracking and 2/3-view relations • Compute initial structure and motion • Refine structure and motion • Auto-calibrate • Refine metric structure and motion Sequential Structure and Motion Computation Initialize Motion (P1,P2 compatibel with F) Extend motion (compute pose through matches seen in 2 or more previous views) Initialize Structure (minimize reprojection error) Extend structure (Initialize new structure, refine existing structure) Computation of initial structure and motion according to Hartley and Zisserman “this area is still to some extend a black-art” All features not visible in all images No direct method (factorization not applicable) Build partial reconstructions and assemble (more views is more stable, but less corresp.) 1) Sequential structure and motion recovery 2) Hierarchical structure and motion recovery Sequential structure and motion recovery • Initialize structure and motion from two views • For each additional view • Determine pose • Refine and extend structure • Determine correspondences robustly by jointly estimating matches and epipolar geometry Initial structure and motion Epipolar geometry Projective calibration m Fm 1 0 T 2 P1 I 0 P2 ex F eaT e compatible with F Yields correct projective camera setup (Faugeras´92,Hartley´92) Obtain structure through triangulation Use reprojection error for minimization Avoid measurements in projective space Determine pose towards existing structure M 2D-3D 2D-3D mi+1 mi 2D-2D new view x i Pi X(x 1 ,..., x i 1 ) Compute Pi+1 using robust approach (6-point RANSAC) Extend and refine reconstruction Compute P with 6-point RANSAC • Generate hypothesis using 6 points • Count inliers • Projection error d Pi Xx1 ,..., x i 1 , x i t ? • 3D error d Pi x i , X t3D ? -1 • Back-projection error d Fij x i , x j t ?, j i • Re-projection error d Pi Xx1 ,..., x i 1 , x i , x i t • Projection error with covariance d Pi Xx1 ,..., x i 1 , x i t • Expensive testing? Abort early if not promising • Verify at random, abort if e.g. P(wrong)>0.95 (Chum and Matas, BMVC’02) Calibrated structure from motion • Equations more complicated, but less degeneracies • For calibrated cameras: • 5-point relative motion (5DOF) Nister CVPR03 • 3-point pose estimation (6DOF) Haralick et al. IJCV94 D. Nistér, An efficient solution to the five-point relative pose problem, In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), Volume 2, pages. 195-202, 2003. R. Haralick, C. Lee, K. Ottenberg, M. Nolle. Review and Analysis of Solutions of the Three Point Perspective Pose Estimation Problem. Int’l Journal of Computer Vision, 13, 3, 331-356, 1994. 5-point relative motion (Nister, CVPR03) • Linear equations for 5 points • Linear solution space E = xX + yY + zZ + wW scale does not matter, choose w = 1 • Non-linear constraints detE = 0 > EE E ¡ 1 > 2 tr ace(EE 10 cubic polynomials )E = 0 5-point relative motion (Nister, CVPR03) • Perform Gauss-Jordan elimination on polynomials [n] represents polynomial of degree n in z -z -z -z Three points perspective pose – p3p (Haralick et al., IJCV94) All techniques yield 4th order polynomial Haralick et al. recommends using Finsterwalder’s technique as it yields the best results numerically 1841 1903 Minimal solvers Lot’s of recent activity using Groebner bases: • • • • Henrik Stewénius, David Nistér, Fredrik Kahl, Frederik Schaffalitzky: A Minimal Solution for Relative Pose with Unknown Focal Length, CVPR 2005. H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to minimal generalized relative pose problems. Omnivis 2005. D. Nistér, A Minimal solution to the generalised 3-point pose problem, CVPR 2004 Martin Bujnak, Zuzana Kukelova, Tomás Pajdla: A general solution to the P4P problem for camera with unknown focal length. CVPR 2008. • • • Brian Clipp, Christopher Zach, Jan-Michael Frahm and Marc Pollefeys, A New Minimal Solution to the Relative Pose of a Calibrated Stereo Camera with Small Field of View Overlap, ICCV 2009. Zuzana Kukelova, Martin Bujnak, Tomás Pajdla: Automatic Generator of Minimal Problem Solvers. ECCV 2008. … Changchang’s SfM code for iconic graph • uses 5-point+RANSAC for 2-view initialization • uses 3-point+RANSAC for adding views • performs bundle adjustment For additional images • use 3-point+RANSAC pose estimation Hierarchical structure and motion recovery • • • • Compute 2-view Compute 3-view Stitch 3-view reconstructions Merge and refine reconstruction F T H PM Stitching 3-view reconstructions Different possibilities 1. Align (P2,P3) with (P’1,P’2) arg min d A P2 , P'1 H -1 d A P3 , P'2 H -1 2. Align X,X’ (and C’C’) d X , HX' arg min d PH X' , x H arg min A H 3. Minimize reproj. error j -1 j H j arg min P,X j d P' HX j , x' j d PX j 4. MLE (merge) j j j j ,x j Refining structure and motion • Minimize reprojection error m n min D mki, P̂k M̂i P̂k ,M̂ i k 1 i 1 2 • Maximum Likelyhood Estimation (if error zero-mean Gaussian noise) • Huge problem but can be solved efficiently (Bundle adjustment) Non-linear least-squares X f (P) argmin P • Newton iteration • Levenberg-Marquardt • Sparse Levenberg-Marquardt X f (P) Newton iteration Taylor approximation Jacobian X J P f (P0 ) f (P0 ) J X f (P1 ) X f (P1 ) X f (P0 ) J e0 J -1 T J J J e0 J J J e0 T T Pi 1 Pi T -1 T J J J e0 T -1 J T -1J J T -1e0 normal eq. Levenberg-Marquardt Normal equations J J N J e0 T T Augmented normal equations N' J T e0 N' J J λdiag(J J) λ 0 10 3 success : λ i 1 λ i / 10 failure : λ i 10λ i T accept solve again l small ~ Newton (quadratic convergence) l large ~ descent (guaranteed decrease) T Levenberg-Marquardt Requirements for minimization • Function to compute f • Start value P0 • Optionally, function to compute J (but numerical ok, too) Sparse Levenberg-Marquardt • N 3 complexity for solving N'-1 J T e0 • prohibitive for large problems (100 views 10,000 points ~30,000 unknowns) • Partition parameters • partition A • partition B (only dependent on A and itself) Sparse bundle adjustment residuals: normal equations: with note: tie points should be in partition A Sparse bundle adjustment normal equations: modified normal equations: solve in two parts: Sparse bundle adjustment Jacobian of m n has sparse block structure D m ki , P̂k M̂ i k 1 i 1 P1 P2 P3 2 M U1 im.pts. view 1 U2 J W N JT J U3 WT 12xm 3xn (in general much larger) V Needed for non-linear minimization Sparse bundle adjustment • Eliminate dependence of camera/motion parameters on structure parameters Note in general 3n >> 11m I WV N 0 I 1 Allows much more efficient computations e.g. 100 views,10000 points, solve 1000x1000, not 30000x30000 Often still band diagonal use sparse linear algebra algorithms U-WV-1WT WT V 11xm 3xn Sparse bundle adjustment normal equations: modified normal equations: solve in two parts: Sparse bundle adjustment • Covariance estimation a U WV W T 1 b Y a Y V ab - a Y -1 Y WV -1 Related problems • On-line structure from motion and SLaM (Simultaneous Localization and Mapping) • Kalman filter (linear) • Particle filters (non-linear) Open challenges • Large scale structure from motion • Complete building • Complete city Next class: Multi-View Geometry and Self-Calibration