Announcements Structure-from-Motion • Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving camera. – Equivalently, we can think of the world as moving and the camera as fixed. • Like stereo, but the position of the camera isn’t known (and it’s more natural to use many images with little motion between them, not just two with a lot of motion). – We may or may not assume we know the parameters of the camera, such as its focal length. Structure-from-Motion • As with stereo, we can divide problem: – Correspondence. – Reconstruction. • Again, we’ll talk about reconstruction first. – So for the next few classes we assume that each image contains some points, and we know which points match which. Structure-from-Motion … Movie Reconstruction • A lot harder than with stereo. • Start with simpler case: scaled orthographic projection (weak perspective). – Recall, in this we remove the z coordinate and scale all x and y coordinates the same amount. First: Represent motion • We’ll talk about a fixed camera, and moving object. • Key point: Some matrix Points r r r t S xn x1 x2 r r r t y1 P z 1 1 y2 z2 1 . . . yn zn 1 1, 2 1, 3 x 2 ,1 2,2 2,3 y The image X X . . . I Y Y 1 2 I SP 1 Then: 1,1 2 X Y n n Structure-from-Motion • S encodes: – Projection: only two lines – Scaling, since S can have a scale factor. – Translation, by tx and ty. – Rotation: I SP Rotation r1,1 r2,1 r 3,1 r1, 2 r2, 2 r3, 2 r1,3 r2,3 P r3,3 Represents a 3D rotation of the points in P. First, look at 2D rotation (easier) Matrix R acts on points by rotating them. cos sin cos R sin sin x1 cos y1 x2 y2 . sin cos . . xn yn • Also, RRT = Identity. RT is also a rotation matrix, in the opposite direction to R. Simple 3D Rotation cos sin 0 sin cos 0 0 x 0 y 1 z 1 1 1 x y z 2 . . . 2 2 Rotation about z axis. Rotates x,y coordinates. Leaves z coordinates fixed. x y z n n n Full 3D Rotation cos R sin 0 sin cos 0 0 cos 0 0 1 sin 0 sin 1 0 1 0 0 cos 0 cos 0 sin 0 sin cos • Any rotation can be expressed as combination of three rotations about three axes. 1 0 0 RR 0 1 0 0 0 1 T • Rows (and columns) of R are orthonormal vectors. • R has determinant 1 (not -1). Putting it Together 3D Translation r Scale 1 0 0 t r 1 0 0 s 0 1 0 t r 0 1 0 0 0 1 t Projection 0 1,1 x 2 ,1 y 3 ,1 z r r r 0 1, 2 2,2 3, 2 r r r 0 1, 3 2,3 3,3 0 0 P 0 1 3D Rotation s s s s where 1,1 1, 2 s 2 ,1 2,2 s 1, 3 2,3 st P st x y (s , s , s ) (s , s , s ) 0 1,1 1, 2 1, 3 2 ,1 2,2 2,3 (s , s , s ) (s , s , s ) 1,1 1, 2 1, 3 2 ,1 2,2 2,3 We can just write stx as tx and sty as ty. Affine Structure from Motion s s s s where 1,1 1, 2 s 2 ,1 2,2 s t P t 1, 3 x 2,3 y (s , s , s ) (s , s , s ) 0 1,1 1, 2 1, 3 2 ,1 2,2 2,3 (s , s , s ) (s , s , s ) 1,1 1, 2 1, 3 2 ,1 2,2 2,3 Affine Structure-from-Motion: Two Frames (1) u v u v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 . . . u v u v 1 n 1 n 2 n 2 n s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y x y z 1 1 1 1 x y z 1 2 2 2 . . . x y z 1 n n n Affine Structure-from-Motion: Two Frames (2) x y z 1 1 To make things easy, suppose: 1 1 x y z 1 2 2 2 x y z 1 3 3 3 x 0 y 0 z 0 1 1 4 4 4 1 0 0 1 0 1 0 1 0 0 1 1 Affine Structure-from-Motion: Two Frames (3) u v u v u v u v 1 1 1 1 . 1 2 . . u v u v 1 2 2 1 2 1 1 n 1 n 2 2 2 2 2 n 2 n s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t x y z 1 1 x 1 1 y 1 2 x 1 2 y x y z 1 . 1 0 0 1 0 1 0 1 2 . 2 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 u v u v 1 3 1 3 2 3 2 3 u v u v 1 4 1 4 2 4 2 4 s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y 0 0 0 1 x y z 1 n n 2 n Looking at the first four points, we get: u v u v . 0 0 1 1 Affine Structure-from-Motion: Two Frames (7) But, what if the first four points aren’t so simple? Then we define A so that: x y A z 1 1 1 1 x y z 1 2 2 2 x y z 1 3 3 3 x 0 y 0 z 0 1 1 4 4 4 1 0 0 1 0 1 0 1 0 0 1 1 This is always possible as long as the points aren’t coplanar. Affine Structure-from-Motion: Two Frames (8) u v u v Then, given: 1 1 1 1 2 2 1 We have: u v u v 1 1 1 1 2 2 1 1 1 And: 1 1 2 1 2 1 . 2 2 2 2 . 1 2 1 2 . . u v u v 1 2 2 2 2 2 1 n 1 n 2 2 2 2 n 2 n . 1 n 1 2 n 2 n 1 . u v u v n 2 u v u v . 1 2 1 u v u v . 1 2 2 1 u v u v u v u v . u v u v 1 n 1 n 2 n 2 n s s s s s s s s 2 ,1 2,2 2 1,1 1, 2 2 2 ,1 1 1 2 ,1 2 1,1 2 2 ,1 1 1, 2 1 2,2 2 1, 2 2 2,2 1 2 1, 3 2 2,2 s s s s t t t t 1 2,3 2 2 2,3 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t x y A A z 1 x y z 1 . 1 2 2 2,3 1, 3 1 . 1, 3 2 s s s s 1 x y z 1 2,3 2 2,2 1, 2 1 1 1, 2 2 x y z 1 1 1, 3 2,2 2 2 ,1 1,1 1,1 1 1,1 1 s s s s 1 1, 2 2 ,1 s s s s s s s s s s s s 1 1,1 1 x 1 y 2 x 2 y 1 x y 2 x 1 y 2 x 2 y 1 2 1 1 1 1 0 0 0 1 . 2 . . x y z 1 n n 2 0 0 1 1 n n 2 0 1 0 1 x y z 1 n 2 1 1 0 0 1 . 2 1 2 A 1 x 1 1 y t t t t n . . . x y z 1 n n n Affine Structure-from-Motion: Two Frames (9) u v u v 1 1 Given: 1 1 2 1 2 1 u v u v . 1 2 . . u v u v 1 2 1 n 1 n 2 2 2 2 2 n 2 n s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s s s s s 1 1, 2 1 1, 3 1 2,2 1 2,3 2 1, 2 2 1, 3 2 2,2 2 2,3 t t t t 1 x 1 y 2 x 2 y A 1 0 0 0 1 1 0 0 1 0 1 0 1 0 0 1 1 . Then we just pretend that: s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y A 1 is our motion, and solve as before. . . x y z 1 n n n Affine Structure-from-Motion: Two Frames (10) This means that we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A. Since if a structure and motion explains the points: u v u v 1 1 1 1 2 1 2 1 u v u v 1 2 . . . u v u v 1 2 n 1 n 2 2 2 2 So does another of the form: 1 2 n 2 n u v u v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 . . . s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 u v u v 1 n 1 n 2 n 2 n s s s s t t t t 1 1, 3 1 2,3 2 1, 3 2 2,3 s s s s 1 1,1 1 2 ,1 2 1,1 2 2 ,1 1 x 1 y 2 x 2 y x y z 1 x y z 1 1 1 1 1, 2 1 2,2 2 1, 2 2 2,2 . x y z 1 . n 2 1 s s s s . 2 n 2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 n t t t t 1 x 1 y 2 x 2 y A x y A z 1 1 1 1 1 x y z 1 2 2 2 . . . x y z 1 n n n Affine Structure-from-Motion: Two Frames (11) u v u v 1 1 1 1 2 1 2 1 u v u v 1 2 1 2 2 2 2 2 . . . u v u v 1 n 1 n 2 n 2 n s s s s Note that A has the form: A corresponds to translation of the points, plus a linear transformation. 1 1,1 1 2 ,1 2 1,1 2 2 ,1 s s s s 1 1, 2 1 2,2 2 1, 2 2 2,2 s s s s 1 1, 3 1 2,3 2 1, 3 2 2,3 t t t t 1 x 1 y 2 x 2 y A a a A a 0 1,1 2 ,1 3 ,1 x y A z 1 1 1 1 1 a a a 0 1, 2 2,2 3, 2 x y z 1 2 . . x y z 1 . n 2 n 2 n a a a 0 1, 3 2,3 3,3 a a a 1 1, 4 2,4 3, 4