Structure-from-Motion

advertisement
Structure-from-Motion
• Determining the 3-D structure of the world, and/or the
motion of a camera using a sequence of images
taken by a moving camera.
– Equivalently, we can think of the world as moving and the
camera as fixed.
• Like stereo, but the position of the camera isn’t
known (and it’s more natural to use many images
with little motion between them, not just two with a lot
of motion).
– We may or may not assume we know the parameters of the
camera, such as its focal length.
Structure-from-Motion
• As with stereo, we can divide problem:
– Correspondence.
– Reconstruction.
• Again, we’ll talk about reconstruction
first.
– So for the next few classes we assume that
each image contains some points, and we
know which points match which.
Structure-from-Motion
…
Movie
Reconstruction
• A lot harder than with stereo.
• Start with simpler case: scaled
orthographic projection (weak
perspective).
– Recall, in this we remove the z coordinate
and scale all x and y coordinates the same
amount.
Announcements
• Quiz on April 22 in class (a week from
Thursday).
– Practice problems handed out Thursday.
– Reviews at office hours, possibly in
seminar room across the hall from my
office.
• Questions about PS6
First: Represent motion
• We’ll talk about a fixed camera, and moving object.
• Key point:
Some matrix
 x1

 y1
P
z
 1
1

Points
x2
y2 . .
z2
1
.
xn 

yn 
zn 

1 
s
S  
s
1, 2
2 ,1
2,2
s
s
1, 3
2,3
The image
u u . . .
I 
v v
1
2
I  SP
1
Then:
s
s
1,1
2
t
t
x
y



u

v
n
n
Structure-from-Motion
• S encodes:
– Projection: only two lines
– Scaling, since S can have a scale factor.
– Translation, by tx/s and ty/s.
– Rotation:
I  SP
Rotation
 r1,1

 r2,1
r
 3,1
r1, 2
r2, 2
r3, 2
r1,3 

r2,3  P

r3,3 
Represents a
3D rotation of
the points in P.
First, look at 2D rotation
(easier)
Matrix R acts
on points by
rotating them.
 cos

  sin 
 cos 
R  
  sin 
sin   x1

cos  y1
x2
y2
.
sin  

cos  
.
.
xn 

yn 
• Also, RRT = Identity. RT is also a rotation
matrix, in the opposite direction to R.
Simple 3D Rotation
 cos

  sin 
 0

sin 
cos
0
0  x

0  y


1  z
1
1
1
x
y
z
2
.
.
.
2
2
Rotation about z axis.
Rotates x,y coordinates. Leaves z coordinates fixed.
x
y
z
n
n
n





Full 3D Rotation
 cos

R    sin 
 0

sin 
cos
0
0  cos 

0  0
1   sin 
0 sin   1
0

1
0  0 cos
0 cos   0  sin 
0 

sin  
cos 
• Any rotation can be expressed as combination of three
rotations about three axes.
 1 0 0


RR   0 1 0 
 0 0 1


T
• Rows (and columns) of R are
orthonormal vectors.
• R has determinant 1 (not -1).
Questions?
Putting it Together
3D Translation  r
Scale
 1 0 0 t 
 r
 1 0 0 
s
 0 1 0 t 
r
 0 1 0 

 0 0 1 t 
Projection
0
1,1
x
2 ,1
y
3 ,1
z
r
r
r
0
1, 2
2,2
3, 2
r
r
r
0
1, 3
2,3
3,3
0

0
P
0

1
3D Rotation
s
s
 
s
s
where
1,1
1, 2
s
2 ,1
2,2
s
1, 3
2,3
st 
 P
st 
x
y
(s , s , s )  (s , s , s )  0
1,1
1, 2
1, 3
2 ,1
2,2
2,3
(s , s , s )  (s , s , s )
1,1
1, 2
1, 3
2 ,1
2,2
2,3
We can just write stx as
tx and sty as ty.
Affine Structure from Motion
s
s

s
s
where
1,1
1, 2
s
2 ,1
2,2
s
t
 P
t
1, 3
x
2,3
y
(s , s , s )  (s , s , s )  0
1,1
1, 2
1, 3
2 ,1
2,2
2,3
(s , s , s )  (s , s , s )
1,1
1, 2
1, 3
2 ,1
2,2
2,3
Affine Structure-from-Motion: Two
Frames (1)
u

v
u

v
1
1
1
1
2
1
2
1
u
v
u
v
1
2
1
2
2
2
2
2
.
.
.
u
v
u
v
1
n
1
n
2
n
2
n
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y
 x

 y
 z

 1
1
1
1
x
y
z
1
2
2
2
.
.
.
x

y
z 

1
n
n
n
Affine Structure-from-Motion: Two
Frames (2)
x

y
z

1
1
To make things
easy, suppose:
1
1
x
y
z
1
2
2
2
x
y
z
1
3
3
3
x  0
 
y  0

z  0
 
1  1
4
4
4
1
0
0
1
0
1
0
1
0

0
1

1
Affine Structure-from-Motion: Two
Frames (3)
u

v
u

v
u
v
u
v
1
1
1
1
.
1
2
.
.
u
v
u
v
1
2
2
1
2
1
1
n
1
n
2
2
2
2
2
n
2
n
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
 x

 y
 z

 1
1
x
1
1
y
1
2
x
1
2
y
x
y
z
1
.
1
0
0
1
0
1
0
1
2
.
2
1
1
1
1
2
1
2
1
u
v
u
v
1
2
1
2
2
2
2
2
u
v
u
v
1
3
1
3
2
3
2
3
u
v
u
v
1
4
1
4
2
4
2
4
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y
 0

 0
 0

 1
x

y
z 

1
n
n
2
n
Looking at the first four points, we get:
u

v
u

v
.
0

0
1

1
Affine Structure-from-Motion: Two
Frames (5)
u

v
u

v
1
k
1
k
2
k
2
k
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y
 x 
 
 y 
 z 
 
 1 
k
k
k
Once we know the motion, we can use the images of
another point to solve for the structure. We have four
linear equations, with three unknowns.
Affine Structure-from-Motion: Two
Frames (7)
But, what if the first four points aren’t so simple?
Then we define A so that:
x

y

A
z

1
1
1
1
x
y
z
1
2
2
2
x
y
z
1
3
3
3
x  0
 
y  0

z  0
 
1  1
4
4
4
1
0
0
1
0
1
0
1
0

0
1

1
This is always possible as long as the points
aren’t coplanar.
Affine Structure-from-Motion: Two
Frames (8)
u

v
u

v
Then,
given:
1
1
1
1
2
2
1
We have:
u
v
u
v
1
1
1
1
2
2
1
1
1
And:
1
1
2
1
2
1
.
2
2
2
2
.
1
2
1
2
.
.
u
v
u
v
1
2
2
2
2
2
1
n
1
n
2
2
2
2
n
2
n
.
1
n
1
2
n
2
n
1
.
u
v
u
v
n
2
u
v
u
v
.
1
2
1
u

v
u

v
.
1
2
2
1
u

v
u

v
u
v
u
v
.
u
v
u
v
1
n
1
n
2
n
2
n
 s
 
 s
 s
 
 s
s
s
s
s
2 ,1
2,2
2
1,1
1, 2
2
2 ,1
1
1
2 ,1
2
1,1
2
2 ,1
1
1, 2
1
2,2
2
1, 2
2
2,2
1
2
1, 3
2
2,2
s
s
s
s
t
t
t
t
1
2,3
2
2
2,3
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t

x



y
A
A

z


1

x
y
z
1
.
1
2
2
2,3
1, 3
1
.
1, 3
2
s
s
s
s
1
x
y
z
1
2,3
2
2,2
1, 2
1
1
1, 2
2
 x

 y
 z

 1
1
1, 3
2,2
2
2 ,1
1,1
1,1
1
1,1
1
s
s
s
s
1
1, 2
2 ,1
 s
 
 s
 s
 
 s
 s
 
 s
 s
 
 s
s
s
s
s
1
1,1
1
x
1
y
2
x
2
y
1
x
y
2
x
1
y
2
x
2
y
1
2
1
1
1
1
0

0
0

1
.
2
.
.
x

y
z 

1
n
n
2
0
0
1
1
n
n
2
0
1
0
1
x

y
z 

1
n
2
1
1
0
0
1
.
2
1
2



A


1
x
1
1
y
t
t
t
t
n
.
.
.
x

y
z 

1
n
n
n
Affine Structure-from-Motion: Two
Frames (9)
u

v
u

v
1
1
Given:
1
1
2
1
2
1
u
v
u
v
.
1
2
.
.
u
v
u
v
1
2
1
n
1
n
2
2
2
2
2
n
2
n
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
s
s
s
s
1
1, 2
1
1, 3
1
2,2
1
2,3
2
1, 2
2
1, 3
2
2,2
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y



A


1
0

0
0

1
1
0
0
1
0
1
0
1
0
0
1
1
.
Then we just pretend that:
s

s
s

s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y



A



1
is our motion,
and solve as
before.
.
.
x

y
z 

1
n
n
n
Affine Structure-from-Motion: Two
Frames (10)
This means that we can never determine the exact 3D
structure of the scene. We can only determine it up to
some transformation, A. Since if a structure and motion
explains the points:
u

v
u

v
1
1
1
1
2
1
2
1
u
v
u
v
1
2
.
.
.
u
v
u
v
1
2
n
1
n
2
2
2
2
So does
another of
the form:
1
2
n
2
n
u

v
u

v
1
1
1
1
2
1
2
1
u
v
u
v
1
2
1
2
2
2
2
2
 s
 
 s
 s
 
 s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
.
.
.
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
u
v
u
v
1
n
1
n
2
n
2
n
s
s
s
s
t
t
t
t
1
1, 3
1
2,3
2
1, 3
2
2,3
  s
 
  s
   s
  
  s
1
1,1
1
2 ,1
2
1,1
2
2 ,1
1
x
1
y
2
x
2
y
 x

 y
 z

 1
x
y
z
1
1
1
1
1, 2
1
2,2
2
1, 2
2
2,2
.
x

y
z 

1
.
n
2
1
s
s
s
s
.
2
n
2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
n
t
t
t
t
1
x
1
y
2
x
2
y



A


  x
 
  y
 A z
 

  1
1
1
1
1
x
y
z
1
2
2
2
.
.
.
x 

y 
z 

1  
n
n
n
Affine Structure-from-Motion: Two
Frames (11)
u

v
u

v
1
1
1
1
2
1
2
1
u
v
u
v
1
2
1
2
2
2
2
2
.
.
.
u
v
u
v
1
n
1
n
2
n
2
n
  s
 
  s
   s
  
  s
Note that A has
the form:
A corresponds to
translation of the
points, plus a
linear
transformation.
1
1,1
1
2 ,1
2
1,1
2
2 ,1
s
s
s
s
1
1, 2
1
2,2
2
1, 2
2
2,2
s
s
s
s
1
1, 3
1
2,3
2
1, 3
2
2,3
t
t
t
t
1
x
1
y
2
x
2
y



A


a

a

A
a

0
1,1
2 ,1
3 ,1
  x
 
  y
 A z
 

  1
1
1
1
1
a
a
a
0
1, 2
2,2
3, 2
x
y
z
1
2
.
.
x 

y 
z 
 
1 
.
n
2
n
2
n
a
a
a
0
1, 3
2,3
3,3
a
a
a
1
1, 4
2,4
3, 4






Let’s take an explicit example of this. Suppose we have a cube with
vertices at: (0,0,0) (10,0,0), (0,0,10)….. Suppose we transform this by
rotating it by 45 degrees about the y axis. Then the transformation matrix
is: [sq2 0 –sq2 0; 0 1 0 0]. Now suppose instead we had a rectanguloid
with corners at (10, 0, 0) (11,0,0), (10, 0, 10) …. We can transform this
rectanguloid into the first cube by transforming it as: [10 0 0 -100; 0 1 0 0;
0 0 1 0; 0 0 0 1]. Then we can apply [sq2 0 –sq2 0; 0 1 0 0] to the
resulting cube, to generate the same image. Or, we could have combined
these two transformations into one:
[sq2 0 –sq2 0; 0 1 0 0]* [10 0 0 -100; 0 1 0 0; 0 0 1 0; 0 0 0 1]
= [10sq2 0 –sq2 0; 0 1 0 0]
Affine Structure-from-Motion: A
lot of frames (1)
u

v
u

v
 .

 .
 .

u
v

1
1
1
1
2
1
2
1
m
1
m
1
u
v
u
v
1
2
.
.
.
1
2
2
2
u
v
u
1
n
1
n
2
n
2
v
.
.
.
u
v
.
.
.
u
v
2
m
2
n
m
2
m
2
n
.
.
I
.
m
n
 s
 
 s
 s
 
 s

 
 
 
 
 s
 s
 
1
1,1
1
2 ,1
2
1,1
2
2 ,1
m
1,1
m
2 ,1
s
s
s
1
1, 2
1
2,2
2
1, 2
s
.
.
.
s
s
2
2,2
m
1, 2
m
2,2
S
s
s
s
1
1, 3
1
2,3
2
1, 3
s
2
2,3




 x
t 
 y
 z

 1

t 
t 
t
t
t
1
x
1
y
2
x
2
1
y
1
1
s
s
m
1, 3
m
2,3
x
y
z
1
2
.
2
.
.
x

y
z 

1
n
n
2
n
m
x
m
y
P
First Step: Solve for Translation
(1)
• This is trivial, because we can pick a simple
origin.
– World origin is arbitrary.
– Example: We can assume first point is at origin.
• Rotation then doesn’t effect that point.
• All its motion is translation.
– Better to pick center of mass as origin.
• Average of all points.
• This also averages all noise.
More explicitly, suppose sum(p) = (0,0,0,n)^T. Then,
sum(R*P) = R*(sum(P)) = R*(0,0,0,n)^T = (0,0,0,n)^T.
Sum(T*R*P) = T*(0,0,0,n)^T = (ntx,nty,ntz,n)^T. (Or
just look at the 2x4 projection matrix). If we subtract tx
or ty from every row, then the residual is
(s11,s12,s13;s21,s22,s23)*P. I = s part of matrix + t
part of matrix.
r
 1 0 0 t 
 r
 1 0 0 
s
 0 1 0 t 
r
 0 1 0 

 0 0 1 t 
0
1,1
x
2 ,1
y
3 ,1
z
r
r
r
0
1, 2
2,2
3, 2
r
r
r
0
1, 3
2,3
3,3
0

0
P

0

1
First Step: Solve for Translation
(2)
x 
   0
n  i  
WLOG :   y    0 
i  1 i   0 
z   
 i
n
u 
i
u
k 1
i
k
n
u

v
u

v
~
I 




u
v

1
1
1
1
2
1
2
1
m
1
m
n
v 
i
v
k 1
i
1
k
n
u
v
u
1
v
2
.
.
.
u
v
1
u u
v v
u u
1
v v
2
1
1
2
1
.
.
.
2
2
2
m
m
m
2
2
2
2
n
m
m
2
m
1
n
2
.
.
.
u u
v v
1
n
2
2
1
n
1
2
2
u u 

v v 
u u 

v v 
. 

. 
. 

u u 
v  v 
m
n
m
.
.
.
m
n
m
First Step: Solve for Translation
(3)
 u~
~
v
 u~
~
v
 .

 .
 .

 u~
 v~

1
1
2
u~
v~
u~
v~
.
.
.
u~
v~
.
.
.
u~
v~
1
2
1
1
.
.
.
1
2
2
1
2
m
1
2
2
n
1
2
2
2
m
m
n
.
.
.
n
1
1,1
1
2
1,1
2
2 ,1
n
m
 s
 
 s
 s
 
 s

 
 
 
 
 s
 s
 
2 ,1
n
2
m
1
n
2
1
1
u~
v~
u~
v~
m
m
1,1
m
2 ,1
s
1
1, 2
s
s
s
.
.
1
2,2
2
1, 2
2
2,2
.
s
s
m
1, 2
m
2,2
s 

s 
s 

s  x
 y

 z


s 
s 
1
1, 3
1
2,3
2
1, 3
1
x
y
1
z
2
2,3
1
2
.
2
2
m
1, 3
m
2,3
As if by magic, there’s no translation.
.
.
x

y
z 
n
n
n
Rank Theorem
 u~
~
v
 u~
~
v
 .

 .
 .

 u~
 v~

1
1
2
u~
v~
u~
v~
.
.
.
u~
v~
.
.
.
u~
v~
1
2
1
1
.
.
.
1
2
2
1
2
m
1
2
2
1
2
2
2
m
m
n
.
.
~
I
.
n
1
1,1
1
2
1,1
2
2 ,1
n
m
 s
 
 s
 s
 
 s

 
 
 
 
 s
 s
 
2 ,1
n
2
m
1
n
n
2
1
1
u~
v~
u~
v~
m
m
1,1
m
2 ,1
s
1
1, 2
s
s
s
.
.
1
2,2
2
1, 2
2
2,2
.
s
s
m
1, 2
m
2,2
S
s 

s 
s 

s  x
 y

 z


s 
s 
1
1, 3
1
2,3
2
1, 3
1
x
y
1
z
2
2,3
m
1, 3
m
2,3
1
2
.
2
.
.
x

y
z 
n
n
2
n
P
~
I has rank 3.
This means there
are 3 vectors
such that every
~
row of I is a
linear
combination of
these vectors.
These vectors
are the rows of P.
Solve for S
• SVD is made to do this.
~
I  UDV
D is diagonal with non-increasing
values.
U and V have orthonormal rows.
Ignoring values that get set to 0, we
have U(:,1:3) for S, and
D(1:3,1:3)*V(1:3,:) for P.
Linear Ambiguity (as before)
~
I = U(:,1:3) * D(1:3,1:3) * V(1:3,:)
~
I = (U(:,1:3) * A) * (inv(A) *D(1:3,1:3) * V(1:3,:))
Noise
~
• I has full rank.
• Best solution is to estimate I that’s as near to
~ as possible, with estimate of I having rank 3.
I
• Our current method does this.
Weak Perspective Motion
 u~
~
v
 u~
~
v
 .

 .
 .

 u~
 v~

1
1
2
u~
v~
u~
v~
.
.
.
u~
v~
.
.
.
u~
v~
1
2
1
1
.
.
.
1
2
2
1
2
m
1
2
2
1
2
2
2
m
m
n
.
.
~
I
.
n
1
1,1
1
2
1,1
2
2 ,1
n
m
 s
 
 s
 s
 
 s

 
 
 
 
 s
 s
 
2 ,1
n
2
m
1
n
n
2
1
1
u~
v~
u~
v~
m
m
1,1
m
2 ,1
s
1
1, 2
s
s
s
.
.
1
2,2
2
1, 2
2
2,2
.
s
s
m
1, 2
m
2,2
s 

s 
s 

s  x
 y

 z


s 
s 
1
1, 3
1
2,3
2
1, 3
1
x
y
1
z
2
2,3
m
1, 3
1
2
.
2
.
.
x

y
z 
n
n
2
n
P
m
2,3
S
~ =(U(:,1:3)*A)*(inv(A) *D(1:3,1:3)*V(1:3,:))
I
Row 2k and
2k+1 of S should
be orthogonal.
All rows should
be unit vectors.
(Push all scale
into P).
Choose A so
(U(:,1:3) * A)
satisfies these
conditions.
Related problems we won’t cover
• Missing data.
• Points with different, known noise.
• Multiple moving objects.
Final Messages
• Structure-from-motion for points can be
reduced to linear algebra.
• Epipolar constraint reemerges.
• SVD important.
• Rank Theorem says the images a
scene produces aren’t complicated
(also important for recognition).
Download