Structure-from-Motion

Structure-from-Motion • Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving camera. – Equivalently, we can think of the world as moving and the camera as fixed. • Like stereo, but the position of the camera isn’t known (and it’s more natural to use many images with little motion between them, not just two with a lot of motion). – We may or may not assume we know the parameters of the camera, such as its focal length. Structure-from-Motion • As with stereo, we can divide problem: – Correspondence. – Reconstruction. • Again, we’ll talk about reconstruction first. – So for the next few classes we assume that each image contains some points, and we know which points match which. Structure-from-Motion … Movie Reconstruction • A lot harder than with stereo. • Start with simpler case: scaled orthographic projection (weak perspective). – Recall, in this we remove the z coordinate and scale all x and y coordinates the same amount. Announcements • Quiz on April 22 in class (a week from Thursday). – Practice problems handed out Thursday. – Reviews at office hours, possibly in seminar room across the hall from my office. • Questions about PS6 First: Represent motion • We’ll talk about a fixed camera, and moving object. • Key point: Some matrix  x1   y1 P z  1 1  Points x2 y2 . . z2 1 . xn   yn  zn   1  s S   s 1, 2 2 ,1 2,2 s s 1, 3 2,3 The image u u . . . I  v v 1 2 I  SP 1 Then: s s 1,1 2 t t x y    u  v n n Structure-from-Motion • S encodes: – Projection: only two lines – Scaling, since S can have a scale factor. – Translation, by tx/s and ty/s. – Rotation: I  SP Rotation  r1,1   r2,1 r  3,1 r1, 2 r2, 2 r3, 2 r1,3   r2,3  P  r3,3  Represents a 3D rotation of the points in P. First, look at 2D rotation (easier) Matrix R acts on points by rotating them.  cos    sin   cos  R     sin  sin   x1  cos  y1 x2 y2 . sin    cos   . . xn   yn  • Also, RRT = Identity. RT is also a rotation matrix, in the opposite direction to R. Simple 3D Rotation  cos    sin   0  sin  cos 0 0  x  0  y   1  z 1 1 1 x y z 2 . . . 2 2 Rotation about z axis. Rotates x,y coordinates. Leaves z coordinates fixed. x y z n n n      Full 3D Rotation  cos  R    sin   0  sin  cos 0 0  cos   0  0 1   sin  0 sin   1 0  1 0  0 cos 0 cos   0  sin  0   sin   cos  • Any rotation can be expressed as combination of three rotations about three axes.  1 0 0   RR   0 1 0   0 0 1   T • Rows (and columns) of R are orthonormal vectors. • R has determinant 1 (not -1). Questions? Putting it Together 3D Translation  r Scale  1 0 0 t   r  1 0 0  s  0 1 0 t  r  0 1 0    0 0 1 t  Projection 0 1,1 x 2 ,1 y 3 ,1 z r r r 0 1, 2 2,2 3, 2 r r r 0 1, 3 2,3 3,3 0  0 P 0  1 3D Rotation s s   s s where 1,1 1, 2 s 2 ,1 2,2 s 1, 3 2,3 st   P st  x y (s , s , s )  (s , s , s )  0 1,1 1, 2 1, 3 2 ,1 2,2 2,3 (s , s , s )  (s , s , s ) 1,1 1, 2 1, 3 2 ,1 2,2 2,3 We can just write stx as tx and sty as ty. Affine Structure from Motion s s  s s where 1,1 1, 2 s 2 ,1 2,2 s t  P t 1, 3 x 2,3 y (s , s , s )  (s , s , s )  0 1,1 1, 2 1, 3 2 ,1 2,2 2,3 (s , s , s )  (s , s , s ) 1,1 1, 2 1, 3 2 ,1 2,2 2,3 Affine Structure-from-Motion: Two Frames (1) u  v u  v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 . . . u v u v 1 n 1 n 2 n 2 n  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y  x   y  z   1 1 1 1 x y z 1 2 2 2 . . . x  y z   1 n n n Affine Structure-from-Motion: Two Frames (2) x  y z  1 1 To make things easy, suppose: 1 1 x y z 1 2 2 2 x y z 1 3 3 3 x  0   y  0  z  0   1  1 4 4 4 1 0 0 1 0 1 0 1 0  0 1  1 Affine Structure-from-Motion: Two Frames (3) u  v u  v u v u v 1 1 1 1 . 1 2 . . u v u v 1 2 2 1 2 1 1 n 1 n 2 2 2 2 2 n 2 n  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t  x   y  z   1 1 x 1 1 y 1 2 x 1 2 y x y z 1 . 1 0 0 1 0 1 0 1 2 . 2 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 u v u v 1 3 1 3 2 3 2 3 u v u v 1 4 1 4 2 4 2 4  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y  0   0  0   1 x  y z   1 n n 2 n Looking at the first four points, we get: u  v u  v . 0  0 1  1 Affine Structure-from-Motion: Two Frames (5) u  v u  v 1 k 1 k 2 k 2 k  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y  x     y   z     1  k k k Once we know the motion, we can use the images of another point to solve for the structure. We have four linear equations, with three unknowns. Affine Structure-from-Motion: Two Frames (7) But, what if the first four points aren’t so simple? Then we define A so that: x  y  A z  1 1 1 1 x y z 1 2 2 2 x y z 1 3 3 3 x  0   y  0  z  0   1  1 4 4 4 1 0 0 1 0 1 0 1 0  0 1  1 This is always possible as long as the points aren’t coplanar. Affine Structure-from-Motion: Two Frames (8) u  v u  v Then, given: 1 1 1 1 2 2 1 We have: u v u v 1 1 1 1 2 2 1 1 1 And: 1 1 2 1 2 1 . 2 2 2 2 . 1 2 1 2 . . u v u v 1 2 2 2 2 2 1 n 1 n 2 2 2 2 n 2 n . 1 n 1 2 n 2 n 1 . u v u v n 2 u v u v . 1 2 1 u  v u  v . 1 2 2 1 u  v u  v u v u v . u v u v 1 n 1 n 2 n 2 n  s    s  s    s s s s s 2 ,1 2,2 2 1,1 1, 2 2 2 ,1 1 1 2 ,1 2 1,1 2 2 ,1 1 1, 2 1 2,2 2 1, 2 2 2,2 1 2 1, 3 2 2,2 s s s s t t t t 1 2,3 2 2 2,3 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t  x    y A A  z   1  x y z 1 . 1 2 2 2,3 1, 3 1 . 1, 3 2 s s s s 1 x y z 1 2,3 2 2,2 1, 2 1 1 1, 2 2  x   y  z   1 1 1, 3 2,2 2 2 ,1 1,1 1,1 1 1,1 1 s s s s 1 1, 2 2 ,1  s    s  s    s  s    s  s    s s s s s 1 1,1 1 x 1 y 2 x 2 y 1 x y 2 x 1 y 2 x 2 y 1 2 1 1 1 1 0  0 0  1 . 2 . . x  y z   1 n n 2 0 0 1 1 n n 2 0 1 0 1 x  y z   1 n 2 1 1 0 0 1 . 2 1 2    A   1 x 1 1 y t t t t n . . . x  y z   1 n n n Affine Structure-from-Motion: Two Frames (9) u  v u  v 1 1 Given: 1 1 2 1 2 1 u v u v . 1 2 . . u v u v 1 2 1 n 1 n 2 2 2 2 2 n 2 n  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s s s s s 1 1, 2 1 1, 3 1 2,2 1 2,3 2 1, 2 2 1, 3 2 2,2 2 2,3 t t t t 1 x 1 y 2 x 2 y    A   1 0  0 0  1 1 0 0 1 0 1 0 1 0 0 1 1 . Then we just pretend that: s  s s  s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y    A    1 is our motion, and solve as before. . . x  y z   1 n n n Affine Structure-from-Motion: Two Frames (10) This means that we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A. Since if a structure and motion explains the points: u  v u  v 1 1 1 1 2 1 2 1 u v u v 1 2 . . . u v u v 1 2 n 1 n 2 2 2 2 So does another of the form: 1 2 n 2 n u  v u  v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2  s    s  s    s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 . . . s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 u v u v 1 n 1 n 2 n 2 n s s s s t t t t 1 1, 3 1 2,3 2 1, 3 2 2,3   s     s    s      s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 1 x 1 y 2 x 2 y  x   y  z   1 x y z 1 1 1 1 1, 2 1 2,2 2 1, 2 2 2,2 . x  y z   1 . n 2 1 s s s s . 2 n 2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 n t t t t 1 x 1 y 2 x 2 y    A     x     y  A z      1 1 1 1 1 x y z 1 2 2 2 . . . x   y  z   1   n n n Affine Structure-from-Motion: Two Frames (11) u  v u  v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 . . . u v u v 1 n 1 n 2 n 2 n   s     s    s      s Note that A has the form: A corresponds to translation of the points, plus a linear transformation. 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y    A   a  a  A a  0 1,1 2 ,1 3 ,1   x     y  A z      1 1 1 1 1 a a a 0 1, 2 2,2 3, 2 x y z 1 2 . . x   y  z    1  . n 2 n 2 n a a a 0 1, 3 2,3 3,3 a a a 1 1, 4 2,4 3, 4       Let’s take an explicit example of this. Suppose we have a cube with vertices at: (0,0,0) (10,0,0), (0,0,10)….. Suppose we transform this by rotating it by 45 degrees about the y axis. Then the transformation matrix is: [sq2 0 –sq2 0; 0 1 0 0]. Now suppose instead we had a rectanguloid with corners at (10, 0, 0) (11,0,0), (10, 0, 10) …. We can transform this rectanguloid into the first cube by transforming it as: [10 0 0 -100; 0 1 0 0; 0 0 1 0; 0 0 0 1]. Then we can apply [sq2 0 –sq2 0; 0 1 0 0] to the resulting cube, to generate the same image. Or, we could have combined these two transformations into one: [sq2 0 –sq2 0; 0 1 0 0]* [10 0 0 -100; 0 1 0 0; 0 0 1 0; 0 0 0 1] = [10sq2 0 –sq2 0; 0 1 0 0] Affine Structure-from-Motion: A lot of frames (1) u  v u  v  .   .  .  u v  1 1 1 1 2 1 2 1 m 1 m 1 u v u v 1 2 . . . 1 2 2 2 u v u 1 n 1 n 2 n 2 v . . . u v . . . u v 2 m 2 n m 2 m 2 n . . I . m n  s    s  s    s           s  s   1 1,1 1 2 ,1 2 1,1 2 2 ,1 m 1,1 m 2 ,1 s s s 1 1, 2 1 2,2 2 1, 2 s . . . s s 2 2,2 m 1, 2 m 2,2 S s s s 1 1, 3 1 2,3 2 1, 3 s 2 2,3      x t   y  z   1  t  t  t t t 1 x 1 y 2 x 2 1 y 1 1 s s m 1, 3 m 2,3 x y z 1 2 . 2 . . x  y z   1 n n 2 n m x m y P First Step: Solve for Translation (1) • This is trivial, because we can pick a simple origin. – World origin is arbitrary. – Example: We can assume first point is at origin. • Rotation then doesn’t effect that point. • All its motion is translation. – Better to pick center of mass as origin. • Average of all points. • This also averages all noise. More explicitly, suppose sum(p) = (0,0,0,n)^T. Then, sum(R*P) = R*(sum(P)) = R*(0,0,0,n)^T = (0,0,0,n)^T. Sum(T*R*P) = T*(0,0,0,n)^T = (ntx,nty,ntz,n)^T. (Or just look at the 2x4 projection matrix). If we subtract tx or ty from every row, then the residual is (s11,s12,s13;s21,s22,s23)*P. I = s part of matrix + t part of matrix. r  1 0 0 t   r  1 0 0  s  0 1 0 t  r  0 1 0    0 0 1 t  0 1,1 x 2 ,1 y 3 ,1 z r r r 0 1, 2 2,2 3, 2 r r r 0 1, 3 2,3 3,3 0  0 P  0  1 First Step: Solve for Translation (2) x     0 n  i   WLOG :   y    0  i  1 i   0  z     i n u  i u k 1 i k n u  v u  v ~ I      u v  1 1 1 1 2 1 2 1 m 1 m n v  i v k 1 i 1 k n u v u 1 v 2 . . . u v 1 u u v v u u 1 v v 2 1 1 2 1 . . . 2 2 2 m m m 2 2 2 2 n m m 2 m 1 n 2 . . . u u v v 1 n 2 2 1 n 1 2 2 u u   v v  u u   v v  .   .  .   u u  v  v  m n m . . . m n m First Step: Solve for Translation (3)  u~ ~ v  u~ ~ v  .   .  .   u~  v~  1 1 2 u~ v~ u~ v~ . . . u~ v~ . . . u~ v~ 1 2 1 1 . . . 1 2 2 1 2 m 1 2 2 n 1 2 2 2 m m n . . . n 1 1,1 1 2 1,1 2 2 ,1 n m  s    s  s    s           s  s   2 ,1 n 2 m 1 n 2 1 1 u~ v~ u~ v~ m m 1,1 m 2 ,1 s 1 1, 2 s s s . . 1 2,2 2 1, 2 2 2,2 . s s m 1, 2 m 2,2 s   s  s   s  x  y   z   s  s  1 1, 3 1 2,3 2 1, 3 1 x y 1 z 2 2,3 1 2 . 2 2 m 1, 3 m 2,3 As if by magic, there’s no translation. . . x  y z  n n n Rank Theorem  u~ ~ v  u~ ~ v  .   .  .   u~  v~  1 1 2 u~ v~ u~ v~ . . . u~ v~ . . . u~ v~ 1 2 1 1 . . . 1 2 2 1 2 m 1 2 2 1 2 2 2 m m n . . ~ I . n 1 1,1 1 2 1,1 2 2 ,1 n m  s    s  s    s           s  s   2 ,1 n 2 m 1 n n 2 1 1 u~ v~ u~ v~ m m 1,1 m 2 ,1 s 1 1, 2 s s s . . 1 2,2 2 1, 2 2 2,2 . s s m 1, 2 m 2,2 S s   s  s   s  x  y   z   s  s  1 1, 3 1 2,3 2 1, 3 1 x y 1 z 2 2,3 m 1, 3 m 2,3 1 2 . 2 . . x  y z  n n 2 n P ~ I has rank 3. This means there are 3 vectors such that every ~ row of I is a linear combination of these vectors. These vectors are the rows of P. Solve for S • SVD is made to do this. ~ I  UDV D is diagonal with non-increasing values. U and V have orthonormal rows. Ignoring values that get set to 0, we have U(:,1:3) for S, and D(1:3,1:3)*V(1:3,:) for P. Linear Ambiguity (as before) ~ I = U(:,1:3) * D(1:3,1:3) * V(1:3,:) ~ I = (U(:,1:3) * A) * (inv(A) *D(1:3,1:3) * V(1:3,:)) Noise ~ • I has full rank. • Best solution is to estimate I that’s as near to ~ as possible, with estimate of I having rank 3. I • Our current method does this. Weak Perspective Motion  u~ ~ v  u~ ~ v  .   .  .   u~  v~  1 1 2 u~ v~ u~ v~ . . . u~ v~ . . . u~ v~ 1 2 1 1 . . . 1 2 2 1 2 m 1 2 2 1 2 2 2 m m n . . ~ I . n 1 1,1 1 2 1,1 2 2 ,1 n m  s    s  s    s           s  s   2 ,1 n 2 m 1 n n 2 1 1 u~ v~ u~ v~ m m 1,1 m 2 ,1 s 1 1, 2 s s s . . 1 2,2 2 1, 2 2 2,2 . s s m 1, 2 m 2,2 s   s  s   s  x  y   z   s  s  1 1, 3 1 2,3 2 1, 3 1 x y 1 z 2 2,3 m 1, 3 1 2 . 2 . . x  y z  n n 2 n P m 2,3 S ~ =(U(:,1:3)*A)*(inv(A) *D(1:3,1:3)*V(1:3,:)) I Row 2k and 2k+1 of S should be orthogonal. All rows should be unit vectors. (Push all scale into P). Choose A so (U(:,1:3) * A) satisfies these conditions. Related problems we won’t cover • Missing data. • Points with different, known noise. • Multiple moving objects. Final Messages • Structure-from-motion for points can be reduced to linear algebra. • Epipolar constraint reemerges. • SVD important. • Rank Theorem says the images a scene produces aren’t complicated (also important for recognition).

Structure-from-Motion

Related documents

Products

Support

Structure-from-Motion

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib