
Computer Vision II Multiple View Geometry

Chapter 1
Mathematical Background: Linear Algebra

Multiple View Geometry
Summer 2019

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

updated May 6, 2019
Overview
1 Vector Spaces
2 Linear Transformations and Matrices
3 Properties of Matrices
4 Singular Value Decomposition
Vector Space (Vektorraum)
A set V is called a linear space or a vector space over the field R if it is closed under vector summation

+ : V × V → V

and under scalar multiplication

· : R × V → V,

i.e. αv1 + βv2 ∈ V ∀v1, v2 ∈ V, ∀α, β ∈ R. With respect to addition (+) it forms a commutative group (existence of a neutral element 0 and inverse elements −v). Scalar multiplication respects the structure of R: α(βu) = (αβ)u. Multiplication and addition respect the distributive laws:

(α + β)v = αv + βv  and  α(v + u) = αv + αu.

Example: V = R^n, v = (x1, . . . , xn)^T.

A subset W ⊂ V of a vector space V is called a subspace if 0 ∈ W and W is closed under + and · (for all α ∈ R).
Linear Independence and Basis
The spanned subspace of a set of vectors S = {v1, . . . , vk} ⊂ V is the subspace formed by all linear combinations of these vectors:

span(S) = { v ∈ V | v = ∑_{i=1}^k αi vi }.

The set S is called linearly independent if

∑_{i=1}^k αi vi = 0  ⇒  αi = 0 ∀i,

in other words: if none of the vectors can be expressed as a linear combination of the remaining vectors. Otherwise the set is called linearly dependent.

A set of vectors B = {v1, . . . , vn} is called a basis of V if it is linearly independent and if it spans the vector space V. A basis is a maximal set of linearly independent vectors.
Properties of a Basis
Let B and B' be two bases of a linear space V.
1  B and B' contain the same number of vectors. This number n is called the dimension of the space V.
2  Any vector v ∈ V can be uniquely expressed as a linear combination of the basis vectors in B = {b1, . . . , bn}:

   v = ∑_{i=1}^n αi bi.

3  In particular, all vectors bi' of another basis B' can be expressed as linear combinations of the vectors of B:

   bi' = ∑_{j=1}^n αji bj.

   The coefficients αji for this basis transform can be combined in a matrix A. Setting B ≡ (b1, . . . , bn) and B' ≡ (b1', . . . , bn') as the matrices of basis vectors, we can write: B' = BA ⇔ B = B'A^{-1}.
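The relation B' = BA can be checked on a small numeric sketch; the bases B and B2 ("B prime") below are made-up examples, not taken from the slides:

```python
import numpy as np

# Two bases of R^2, stored as columns of B and B2 ("B prime").
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # canonical basis
B2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])       # another basis (columns b1', b2')

# Basis-transform matrix A with B' = B A, i.e. A = B^{-1} B'.
A = np.linalg.solve(B, B2)

assert np.allclose(B @ A, B2)                  # B' = B A
assert np.allclose(B2 @ np.linalg.inv(A), B)   # B = B' A^{-1}
```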
Inner Product
On a vector space one can define an inner product (dot product; dt.: Skalarprodukt ≠ skalare Multiplikation):

⟨·, ·⟩ : V × V → R

which is defined by three properties:
1  ⟨u, αv + βw⟩ = α⟨u, v⟩ + β⟨u, w⟩ (linear)
2  ⟨u, v⟩ = ⟨v, u⟩ (symmetric)
3  ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 ⇔ v = 0 (positive definite)

The scalar product induces a norm

| · | : V → R,  |v| = √⟨v, v⟩

and a metric

d : V × V → R,  d(v, w) = |v − w| = √⟨v − w, v − w⟩

for measuring lengths and distances, making V a metric space. Since the metric is induced by a scalar product, V is called a Hilbert space.
Canonical and Induced Inner Product
On V = R^n, one can define the canonical inner product for the canonical basis B = I as

⟨x, y⟩ = x^T y = ∑_{i=1}^n xi yi,

which induces the standard L2-norm or Euclidean norm

|x|_2 = √(x^T x) = √(x1² + · · · + xn²).

With a basis transform A to the new basis B' given by I = B'A^{-1}, the canonical inner product in the new coordinates x', y' is given by:

⟨x, y⟩ = x^T y = (Ax')^T (Ay') = x'^T A^T A y' ≡ ⟨x', y'⟩_{A^T A}.

The latter product is called the induced inner product from the matrix A.
Two vectors v and w are orthogonal iff ⟨v, w⟩ = 0.
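The identity ⟨x, y⟩ = ⟨x', y'⟩_{A^T A} can be verified numerically; the random basis transform A below is an illustrative assumption (a random matrix is almost surely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # basis transform, assumed invertible
x2 = rng.standard_normal(3)       # coordinates x' in the new basis
y2 = rng.standard_normal(3)       # coordinates y' in the new basis

x, y = A @ x2, A @ y2             # back to canonical coordinates

lhs = x @ y                       # canonical inner product <x, y>
rhs = x2 @ (A.T @ A) @ y2         # induced inner product <x', y'>_{A^T A}
assert np.allclose(lhs, rhs)
```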
Kronecker Product and Stack of a Matrix
Given two matrices A ∈ R^{m×n} and B ∈ R^{k×l}, one can define their Kronecker product A ⊗ B by:

          ( a11 B  · · ·  a1n B )
A ⊗ B ≡ (   ⋮            ⋮   )  ∈ R^{mk×nl}.
          ( am1 B  · · ·  amn B )

In Matlab this can be implemented by C=kron(A,B).

Given a matrix A ∈ R^{m×n}, its stack A^s is obtained by stacking its n column vectors a1, . . . , an ∈ R^m:

       ( a1 )
A^s ≡ (  ⋮ )  ∈ R^{mn}.
       ( an )

These notations allow us to rewrite algebraic expressions, for example:

u^T A v = (v ⊗ u)^T A^s.
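The identity u^T A v = (v ⊗ u)^T A^s can be checked with numpy.kron in place of the Matlab kron quoted above; note that the stack A^s corresponds to column-major ("Fortran-order") flattening:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
u = rng.standard_normal(3)
v = rng.standard_normal(2)

As = A.flatten(order='F')    # stack A^s: columns of A on top of each other

lhs = u @ A @ v              # u^T A v
rhs = np.kron(v, u) @ As     # (v (x) u)^T A^s
assert np.allclose(lhs, rhs)
```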
Linear Transformations and Matrices
Linear algebra studies the properties of linear transformations between linear spaces. Since these can be represented by matrices, linear algebra studies the properties of matrices.
A linear transformation L between two linear spaces V and W is a map L : V → W such that:
• L(x + y) = L(x) + L(y) ∀x, y ∈ V
• L(αx) = αL(x) ∀x ∈ V, α ∈ R.
Due to linearity, the action of L on the space V is uniquely defined by its action on the basis vectors of V. In the canonical basis {e1, . . . , en} we have:

L(x) = Ax  ∀x ∈ V,

where

A = (L(e1), . . . , L(en)) ∈ R^{m×n}.

The set of all real m × n matrices is denoted by M(m, n). In the case m = n, the set M(m, n) ≡ M(n) forms a ring over the field R, i.e. it is closed under matrix multiplication and summation.
The Linear Groups GL(n) and SL(n)
There exist certain sets of linear transformations which form a group.
A group is a set G with an operation ◦ : G × G → G such that:
1  closed: g1 ◦ g2 ∈ G ∀g1, g2 ∈ G,
2  associative: (g1 ◦ g2) ◦ g3 = g1 ◦ (g2 ◦ g3) ∀g1, g2, g3 ∈ G,
3  neutral element: ∃e ∈ G : e ◦ g = g ◦ e = g ∀g ∈ G,
4  inverse: ∃g^{-1} ∈ G : g ◦ g^{-1} = g^{-1} ◦ g = e ∀g ∈ G.

Example: All invertible (non-singular) real n × n matrices form a group with respect to matrix multiplication. This group is called the general linear group GL(n). It consists of all A ∈ M(n) for which

det(A) ≠ 0.

All matrices A ∈ GL(n) for which det(A) = 1 form a group called the special linear group SL(n). The inverse of A is also in this group, as det(A^{-1}) = det(A)^{-1}.
Matrix Representation of Groups
A group G has a matrix representation (dt.: Darstellung) or can be realized as a matrix group if there exists an injective transformation

R : G → GL(n)

which preserves the group structure of G, that is, inverse and composition are preserved by the map:

R(e) = I_{n×n},  R(g ◦ h) = R(g)R(h)  ∀g, h ∈ G.

Such a map R is called a group homomorphism (dt.: Homomorphismus).
The idea of matrix representations of a group is that they allow us to analyze more abstract groups by looking at the properties of the respective matrix group. Example: The rotations of an object form a group, as there exists a neutral element (no rotation) and an inverse (the inverse rotation), and any concatenation of rotations is again a rotation (around a different axis). Studying the properties of the rotation group is easier if rotations are represented by the respective matrices.
The Affine Group A(n)
An affine transformation L : R^n → R^n is defined by a matrix A ∈ GL(n) and a vector b ∈ R^n such that:

L(x) = Ax + b.

The set of all such affine transformations is called the affine group of dimension n, denoted by A(n).
L defined above is not a linear map unless b = 0. By introducing homogeneous coordinates to represent x ∈ R^n by (x, 1)^T ∈ R^{n+1}, L becomes a linear mapping

L : R^{n+1} → R^{n+1};  ( x )    ( A  b ) ( x )
                        ( 1 ) ↦ ( 0  1 ) ( 1 ).

A matrix ( A b ; 0 1 ) with A ∈ GL(n) and b ∈ R^n is called an affine matrix. It is an element of GL(n + 1). The affine matrices form a subgroup of GL(n + 1). Why?
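A minimal sketch of the homogeneous-coordinate embedding (the rotation A and offset b below are made-up values). It also hints at the answer to the "Why?": the last row (0, . . . , 0, 1) is preserved under multiplication, so products of affine matrices are again affine:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])    # in GL(2): a 90-degree rotation
b = np.array([2.0, 1.0])

# Affine matrix acting on homogeneous coordinates (x, 1).
M = np.block([[A, b[:, None]],
              [np.zeros((1, 2)), np.ones((1, 1))]])

x = np.array([3.0, 4.0])
xh = np.append(x, 1.0)
assert np.allclose((M @ xh)[:2], A @ x + b)   # L(x) = Ax + b

# Composition of two affine maps is again affine:
M2 = M @ M
assert np.allclose(M2[2], [0.0, 0.0, 1.0])    # last row stays (0, 0, 1)
```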
The Orthogonal Group O(n)
A matrix A ∈ M(n) is called orthogonal if it preserves the inner product, i.e.:

⟨Ax, Ay⟩ = ⟨x, y⟩  ∀x, y ∈ R^n.

The set of all orthogonal matrices forms the orthogonal group O(n), which is a subgroup of GL(n). For an orthogonal matrix R we have

⟨Rx, Ry⟩ = x^T R^T R y = x^T y  ∀x, y ∈ R^n.

Therefore we must have R^T R = R R^T = I, in other words:

O(n) = {R ∈ GL(n) | R^T R = I}.

The above identity shows that for any orthogonal matrix R we have det(R^T R) = (det(R))² = det(I) = 1, such that det(R) ∈ {±1}.
The subgroup of O(n) with det(R) = +1 is called the special orthogonal group SO(n): SO(n) = O(n) ∩ SL(n). In particular, SO(3) is the group of all 3-dimensional rotation matrices.
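These defining properties of O(n) can be verified on a random orthogonal matrix, obtained here from a QR decomposition (a construction assumed for this sketch, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
# A random orthogonal matrix: the Q factor of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

assert np.allclose(Q.T @ Q, np.eye(3))          # Q^T Q = I
assert np.isclose(abs(np.linalg.det(Q)), 1.0)   # det(Q) in {+1, -1}

x = rng.standard_normal(3)
y = rng.standard_normal(3)
assert np.allclose((Q @ x) @ (Q @ y), x @ y)    # inner product preserved
```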
The Euclidean Group E(n)
A Euclidean transformation L from R^n to R^n is defined by an orthogonal matrix R ∈ O(n) and a vector T ∈ R^n:

L : R^n → R^n;  x ↦ Rx + T.

The set of all such transformations is called the Euclidean group E(n). It is a subgroup of the affine group A(n). Embedded by homogeneous coordinates, we get:

E(n) = { ( R T ; 0 1 ) | R ∈ O(n), T ∈ R^n }.

If R ∈ SO(n) (i.e. det(R) = 1), then we have the special Euclidean group SE(n). In particular, SE(3) represents the rigid-body motions (dt.: Starrkörpertransformationen) in R³.
In summary:

SO(n) ⊂ O(n) ⊂ GL(n),  SE(n) ⊂ E(n) ⊂ A(n) ⊂ GL(n + 1).
Range, Span, Null Space and Kernel
Let A ∈ R^{m×n} be a matrix defining a linear map from R^n to R^m. The range or span of A (dt.: Bild) is defined as the subspace of R^m which can be 'reached' by A:

range(A) = {y ∈ R^m | ∃x ∈ R^n : Ax = y}.

The range of a matrix A is given by the span of its column vectors.
The null space or kernel of a matrix A (dt.: Kern) is given by the subset of vectors x ∈ R^n which are mapped to zero:

null(A) ≡ ker(A) = {x ∈ R^n | Ax = 0}.

The null space of a matrix A is given by the vectors orthogonal to its row vectors. Matlab: Z=null(A).
The concepts of range and null space are useful when studying the solution of linear equations. The system Ax = b will have a solution x ∈ R^n if and only if b ∈ range(A). Moreover, this solution will be unique only if ker(A) = {0}. Indeed, if xs is a solution of Ax = b and xo ∈ ker(A), then xs + xo is also a solution: A(xs + xo) = Axs + Axo = b.
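The non-uniqueness argument can be replayed on a small rank-deficient example (the matrix A and right-hand side b below are illustrative choices); the null space is read off from the SVD, which is also how null spaces are commonly computed:

```python
import numpy as np

# A rank-deficient matrix: its second row is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Right-singular vector for the zero singular value spans ker(A) here
# (rank(A) = 1, so the kernel is one-dimensional).
U, s, Vt = np.linalg.svd(A)
xo = Vt[-1]
assert np.allclose(A @ xo, 0)

b = np.array([1.0, 2.0])      # b is in range(A), since b = A @ [1, 0]
xs = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(A @ xs, b)

# Solutions are not unique: xs + xo solves Ax = b as well.
assert np.allclose(A @ (xs + xo), b)
```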
Rank of a Matrix
The rank of a matrix (dt.: Rang) is the dimension of its range:

rank(A) = dim(range(A)).

The rank of a matrix A ∈ R^{m×n} has the following properties:
1  rank(A) = n − dim(ker(A)).
2  0 ≤ rank(A) ≤ min{m, n}.
3  rank(A) is equal to the maximum number of linearly independent row (or column) vectors of A.
4  rank(A) is the highest order of a nonzero minor of A, where a minor of order k is the determinant of a k × k submatrix of A.
5  Sylvester's inequality: Let B ∈ R^{n×k}. Then AB ∈ R^{m×k} and rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.
6  For any nonsingular matrices C ∈ R^{m×m} and D ∈ R^{n×n}, we have: rank(A) = rank(CAD).

Matlab: d=rank(A).
Eigenvalues and Eigenvectors
Let A ∈ C^{n×n} be a complex matrix. A non-zero vector v ∈ C^n is called a (right) eigenvector of A if:

Av = λv,  with λ ∈ C.

λ is called an eigenvalue of A. Similarly, v is called a left eigenvector of A if v^T A = λv^T for some λ ∈ C.
The spectrum σ(A) of a matrix A is the set of all its eigenvalues.
Matlab: [V,D]=eig(A); where D is a diagonal matrix containing the eigenvalues and V is a matrix whose columns are the corresponding eigenvectors, such that AV=VD.
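The Matlab call [V,D]=eig(A) has a direct NumPy counterpart; the example matrix below is an arbitrary choice:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

D_vals, V = np.linalg.eig(A)       # eigenvalues and right eigenvectors
D = np.diag(D_vals)
assert np.allclose(A @ V, V @ D)   # A V = V D, as in [V,D]=eig(A)

# Each column of V is an eigenvector: A v = lambda v.
for lam, v in zip(D_vals, V.T):
    assert np.allclose(A @ v, lam * v)
```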
Properties of Eigenvalues and Eigenvectors
Let A ∈ R^{n×n} be a square matrix. Then:
1  If Av = λv for some λ ∈ R, then there also exists a left eigenvector η ∈ R^n: η^T A = λη^T. Hence σ(A) = σ(A^T).
2  The eigenvectors of a matrix A associated with different eigenvalues are linearly independent.
3  All eigenvalues σ(A) are the roots of the characteristic polynomial det(λI − A) = 0. Therefore det(A) is equal to the product of all eigenvalues (some of which may appear multiple times).
4  If B = PAP^{-1} for some nonsingular matrix P, then σ(B) = σ(A).
5  If λ ∈ C is an eigenvalue, then its complex conjugate λ* is also an eigenvalue. Thus σ(A) = σ(A)* for real matrices A.
Symmetric Matrices
A matrix S ∈ R^{n×n} is called symmetric if S^T = S. A symmetric matrix S is called positive semi-definite (denoted by S ≥ 0 or S ⪰ 0) if x^T S x ≥ 0 ∀x. S is called positive definite (denoted by S > 0 or S ≻ 0) if x^T S x > 0 ∀x ≠ 0.
Let S ∈ R^{n×n} be a real symmetric matrix. Then:
1  All eigenvalues of S are real, i.e. σ(S) ⊂ R.
2  Eigenvectors vi and vj of S corresponding to distinct eigenvalues λi ≠ λj are orthogonal.
3  There always exist n orthonormal eigenvectors of S which form a basis of R^n. Let V = (v1, . . . , vn) ∈ O(n) be the orthogonal matrix of these eigenvectors, and Λ = diag{λ1, . . . , λn} the diagonal matrix of eigenvalues. Then we have S = V Λ V^T.
4  S is positive (semi-)definite if and only if all eigenvalues are positive (nonnegative).
5  Let S be positive semi-definite and λ1, λn the largest and smallest eigenvalue. Then λ1 = max_{|x|=1} ⟨x, Sx⟩ and λn = min_{|x|=1} ⟨x, Sx⟩.
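These properties can be checked numerically on a random symmetric positive semi-definite matrix S = B^T B (an illustrative construction, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
S = B.T @ B                        # symmetric and positive semi-definite

lam, V = np.linalg.eigh(S)         # real eigenvalues (ascending), orthonormal eigenvectors
assert np.allclose(S, V @ np.diag(lam) @ V.T)   # S = V Lambda V^T
assert np.allclose(V.T @ V, np.eye(4))          # V is orthogonal
assert np.all(lam >= -1e-10)                    # eigenvalues nonnegative

# Extremal eigenvalues bound the Rayleigh quotient <x, Sx> on |x| = 1.
x = rng.standard_normal(4)
x /= np.linalg.norm(x)
assert lam[0] - 1e-10 <= x @ S @ x <= lam[-1] + 1e-10
```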
Norms of Matrices
There are many ways to define norms on the space of matrices A ∈ R^{m×n}. They can be defined based on norms on the domain or codomain spaces on which A operates. In particular, the induced 2-norm of a matrix A is defined as

||A||_2 ≡ max_{|x|_2=1} |Ax|_2 = max_{|x|_2=1} √⟨x, A^T A x⟩.

Alternatively, one can define the Frobenius norm of A as:

||A||_f ≡ √(∑_{i,j} aij²) = √(trace(A^T A)).

Note that these norms are in general not the same. Since the matrix A^T A is symmetric and positive semi-definite, we can diagonalize it as A^T A = V diag{σ1², . . . , σn²} V^T with σ1² ≥ σi² ≥ 0. This leads to:

||A||_2 = σ1,  and  ||A||_f = √(trace(A^T A)) = √(σ1² + · · · + σn²).
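Both norms can be compared against the singular values for a random example matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values sigma_1 >= ... >= sigma_n

two_norm = np.linalg.norm(A, 2)          # induced 2-norm
fro_norm = np.linalg.norm(A, 'fro')      # Frobenius norm

assert np.isclose(two_norm, s[0])                      # ||A||_2 = sigma_1
assert np.isclose(fro_norm, np.sqrt(np.sum(s**2)))     # ||A||_f = sqrt(sum sigma_i^2)
assert np.isclose(fro_norm, np.sqrt(np.trace(A.T @ A)))
```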
Skew-symmetric Matrices
A matrix A ∈ R^{n×n} is called skew-symmetric or anti-symmetric (dt.: schiefsymmetrisch) if A^T = −A.
If A is a real skew-symmetric matrix, then:
1  All eigenvalues of A are either zero or purely imaginary, i.e. of the form iω with i² = −1, ω ∈ R.
2  There exists an orthogonal matrix V such that

   A = V Λ V^T,

   where Λ is a block-diagonal matrix Λ = diag{A1, . . . , Am, 0, . . . , 0}, with real skew-symmetric matrices Ai of the form:

   Ai = (  0   ai ; −ai  0 )  ∈ R^{2×2},  i = 1, . . . , m.

In particular, the rank of any skew-symmetric matrix is even.
Examples of Skew-symmetric Matrices
In Computer Vision, a common skew-symmetric matrix is given by the hat operator of a vector u ∈ R³:

     (  0   −u3   u2 )
û = (  u3    0  −u1 )  ∈ R^{3×3}.
     ( −u2   u1    0 )

This is a linear operator from the space of vectors R³ to the space of skew-symmetric matrices in R^{3×3}.
In particular, the matrix û has the property that

û v = u × v,

where × denotes the standard vector cross product in R³. For u ≠ 0, we have rank(û) = 2, and the null space of û is spanned by u, because û u = u^T û = 0.
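A minimal implementation of the hat operator and its stated properties (the helper name hat is my own, not notation from the slides):

```python
import numpy as np

def hat(u):
    """Map u in R^3 to the skew-symmetric matrix u^ with u^ v = u x v."""
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.5, 2.0])

assert np.allclose(hat(u) @ v, np.cross(u, v))   # u^ v = u x v
assert np.allclose(hat(u).T, -hat(u))            # skew symmetry
assert np.linalg.matrix_rank(hat(u)) == 2        # rank 2 for u != 0
assert np.allclose(hat(u) @ u, 0)                # u spans the null space
```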
The Singular Value Decomposition (SVD)
In the previous slides, we have studied many properties of matrices, such as rank, range, null space, and induced norms of matrices. Many of these properties can be captured by the so-called singular value decomposition (SVD).
SVD can be seen as a generalization of eigenvalues and eigenvectors to non-square matrices. The computation of the SVD is numerically well-conditioned. It is very useful for solving linear-algebraic problems such as matrix inversion, rank computation, linear least-squares estimation, projections, and fixed-rank approximations.
In practice, both singular value decompositions and eigenvalue decompositions are used quite extensively.
Algebraic Derivation of SVD
Let A ∈ R^{m×n} with m ≥ n be a matrix of rank(A) = p. Then there exist
• U ∈ R^{m×p} whose columns are orthonormal,
• V ∈ R^{n×p} whose columns are orthonormal, and
• Σ ∈ R^{p×p}, Σ = diag{σ1, . . . , σp}, with σ1 ≥ . . . ≥ σp,
such that

A = U Σ V^T.

Note that this generalizes the eigenvalue decomposition. While the latter decomposes a symmetric square matrix A with an orthogonal transformation V as

A = V Λ V^T,  with V ∈ O(n), Λ = diag{λ1, . . . , λn},

the SVD allows us to decompose an arbitrary (non-square) matrix A of rank p with two transformations U and V with orthonormal columns as shown above. Nevertheless, we will see that the SVD is based on the eigenvalue decomposition of symmetric square matrices.
Proof of SVD Decomposition 1
Given a matrix A ∈ R^{m×n} with m ≥ n and rank(A) = p, the matrix

A^T A ∈ R^{n×n}

is symmetric and positive semi-definite. Therefore it can be decomposed with non-negative eigenvalues σ1² ≥ . . . ≥ σn² ≥ 0 and orthonormal eigenvectors v1, . . . , vn. The σi are called singular values. Since

ker(A^T A) = ker(A)  and  range(A^T A) = range(A^T),

we have span{v1, . . . , vp} = range(A^T) and span{vp+1, . . . , vn} = ker(A). Let

ui ≡ (1/σi) Avi  ⇔  Avi = σi ui,  i = 1, . . . , p,

then the ui ∈ R^m are orthonormal:

⟨ui, uj⟩ = (1/(σi σj)) ⟨Avi, Avj⟩ = (1/(σi σj)) ⟨vi, A^T A vj⟩ = δij.
Proof of SVD Decomposition 2
Complete {ui}_{i=1}^p to a basis {ui}_{i=1}^m of R^m. Since Avi = σi ui, we have

A (v1, . . . , vn) = (u1, . . . , um) Σ̃,

where Σ̃ ∈ R^{m×n} contains diag{σ1, . . . , σp} in its upper-left block and zeros elsewhere. This is of the form A Ṽ = Ũ Σ̃, thus

A = Ũ Σ̃ Ṽ^T.

Now simply delete all columns of Ũ and all rows of Ṽ^T which are multiplied by zero singular values, and we obtain the form A = U Σ V^T, with U ∈ R^{m×p} and V ∈ R^{n×p}.
In Matlab: [U,S,V] = svd(A).
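The constructive proof can be mirrored with numpy.linalg.svd. The thin SVD (full_matrices=False) keeps n columns; for the full-rank random example below, p = n, so it coincides with the form A = U Σ V^T obtained after deleting zero singular values:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

# Thin SVD (Matlab: [U,S,V] = svd(A)); numpy returns V^T directly.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)   # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(3))       # orthonormal columns of U
assert np.allclose(Vt @ Vt.T, np.eye(3))     # orthonormal columns of V

# As in the proof: singular values are square roots of eigenvalues of A^T A.
lam = np.linalg.eigvalsh(A.T @ A)[::-1]      # descending order
assert np.allclose(s, np.sqrt(np.maximum(lam, 0)))
```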
A Geometric Interpretation of SVD
For A ∈ R^{n×n}, the singular value decomposition A = U Σ V^T is such that the columns U = (u1, . . . , un) and V = (v1, . . . , vn) form orthonormal bases of R^n. If a point x ∈ R^n is mapped to a point y ∈ R^n by the transformation A, then the coordinates of y in basis U are related to the coordinates of x in basis V by the diagonal matrix Σ: each coordinate is merely scaled by the corresponding singular value:

y = Ax = U Σ V^T x  ⇔  U^T y = Σ V^T x.

The matrix A maps the unit sphere into an ellipsoid with semi-axes σi ui. To see this, we call α ≡ V^T x the coefficients of the point x in the basis V, and those of y in basis U shall be called β ≡ U^T y. All points of the sphere fulfill |x|_2² = ∑_i αi² = 1. The above statement says that βi = σi αi. Thus for the points on the sphere we have

∑_i αi² = ∑_i βi²/σi² = 1,

which states that the transformed points lie on an ellipsoid oriented along the axes of the basis U.
The Generalized (Moore-Penrose) Inverse
For certain square matrices, namely those with det(A) ≠ 0, one can define an inverse matrix. The set of all invertible matrices forms the group GL(n). One can also define a (generalized) inverse, also called pseudo-inverse (dt.: Pseudoinverse), for an arbitrary (non-square) matrix A ∈ R^{m×n}. If its SVD is A = U Σ V^T, the pseudo-inverse is defined as:

A† = V Σ† U^T,  where Σ† = ( Σ1^{-1}  0 ; 0  0 ) ∈ R^{n×m},

where Σ1 is the diagonal matrix of non-zero singular values. In Matlab: X=pinv(A). In particular, the pseudo-inverse can be employed in a similar fashion as the inverse of square invertible matrices:

A A† A = A,  A† A A† = A†.

The linear system Ax = b with A ∈ R^{m×n} of rank r ≤ min(m, n) can have multiple or no solutions. xmin = A† b is, among all minimizers of |Ax − b|², the one with the smallest norm |x|.
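A sketch of the pseudo-inverse properties and the minimum-norm least-squares interpretation (the overdetermined system below is a made-up example):

```python
import numpy as np

# An overdetermined system Ax = b with no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

A_pinv = np.linalg.pinv(A)      # Matlab: X = pinv(A)
x_min = A_pinv @ b

assert np.allclose(A @ A_pinv @ A, A)            # A A+ A = A
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)  # A+ A A+ = A+

# x_min minimizes |Ax - b|^2; compare with the least-squares solver.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_min, x_ls)
```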
Chapter 2
Representing a Moving Scene

Multiple View Geometry
Summer 2019

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

updated May 8, 2019
Overview
1 The Origins of 3D Reconstruction
2 3D Space & Rigid Body Motion
3 The Lie Group SO(3)
4 The Lie Group SE(3)
5 Representing the Camera Motion
6 Euler Angles
The Origins of 3D Reconstruction
The goal to reconstruct the three-dimensional structure of the world from a set of two-dimensional views has a long history in computer vision. It is a classical ill-posed problem, because the reconstruction consistent with a given set of observations or images is typically not unique. Therefore, one will need to impose additional assumptions.
Mathematically, the study of geometric relations between a 3D scene and the observed 2D projections is based on two types of transformations, namely:
• Euclidean motion or rigid-body motion, representing the motion of the camera from one frame to the next.
• Perspective projection, to account for the image formation process (see pinhole camera, etc.).
The notion of perspective projection has its roots among the ancient Greeks (Euclid of Alexandria, ∼400 B.C.) and the Renaissance period (Brunelleschi & Alberti, 1435). The study of perspective projection led to the field of projective geometry (Girard Desargues 1648, Gaspard Monge 18th century).
The Origins of 3D Reconstruction
The first work on the problem of multiple view geometry was that of Erwin Kruppa (1913), who showed that two views of five points are sufficient to determine both the relative transformation (motion) between the two views and the 3D location (structure) of the points up to finitely many solutions.
A linear algorithm to recover structure and motion from two views based on the epipolar constraint was proposed by Longuet-Higgins in 1981. An entire series of works along these lines was summarized in several textbooks (Faugeras 1993, Kanatani 1993, Maybank 1993, Weng et al. 1993).
Extensions to three views were developed by Spetsakis and Aloimonos '87, '90, and by Shashua '94 and Hartley '95. Factorization techniques for multiple views and orthogonal projection were developed by Tomasi and Kanade 1992.
The joint estimation of camera motion and 3D location is called structure and motion or (more recently) visual SLAM.
Three-dimensional Euclidean Space
The three-dimensional Euclidean space E³ consists of all points p ∈ E³ characterized by coordinates

X ≡ (X1, X2, X3)^T ∈ R³,

such that E³ can be identified with R³. That means we talk about points (E³) and coordinates (R³) as if they were the same thing. Given two points X and Y, one can define a bound vector as

v = X − Y ∈ R³.

Considering this vector independent of its base point Y makes it a free vector. The set of free vectors v ∈ R³ forms a linear vector space. By identifying E³ and R³, one can endow E³ with a scalar product, a norm and a metric. This allows us to compute distances, curve lengths

l(γ) ≡ ∫₀¹ |γ̇(s)| ds  for a curve γ : [0, 1] → R³,

areas, and volumes.
Cross Product & Skew-symmetric Matrices
On R³ one can define a cross product

                            ( u2 v3 − u3 v2 )
× : R³ × R³ → R³,  u × v = ( u3 v1 − u1 v3 )  ∈ R³,
                            ( u1 v2 − u2 v1 )

which is a vector orthogonal to u and v. Since u × v = −v × u, the cross product introduces an orientation. Fixing u induces a linear mapping v ↦ u × v which can be represented by the skew-symmetric matrix

     (  0   −u3   u2 )
û = (  u3    0  −u1 )  ∈ R^{3×3}.
     ( −u2   u1    0 )

In turn, every skew-symmetric matrix M = −M^T ∈ R^{3×3} can be identified with a vector u ∈ R³. The operator ^ defines an isomorphism between R³ and the space so(3) of all 3 × 3 skew-symmetric matrices. Its inverse is denoted by ∨ : so(3) → R³.
Rigid-body Motion
A rigid-body motion (or rigid-body transformation) is a family of maps

gt : R³ → R³;  X ↦ gt(X),  t ∈ [0, T],

which preserve the norm and cross product of any two vectors:
• |gt(v)| = |v|, ∀v ∈ R³,
• gt(u) × gt(v) = gt(u × v), ∀u, v ∈ R³.
Since norm and scalar product are related by the polarization identity

⟨u, v⟩ = ¼ (|u + v|² − |u − v|²),

one can also state that a rigid-body motion is a map which preserves inner product and cross product. As a consequence, rigid-body motions also preserve the triple product

⟨gt(u), gt(v) × gt(w)⟩ = ⟨u, v × w⟩,  ∀u, v, w ∈ R³,

which means that they are volume-preserving.
Representation of Rigid-body Motion
Does the above definition lead to a mathematical representation of rigid-body motion?
Since it preserves lengths and orientation, the motion gt of a rigid body is sufficiently defined by specifying the motion of a Cartesian coordinate frame attached to the object (given by an origin and orthonormal oriented vectors e1, e2, e3 ∈ R³). The motion of the origin can be represented by a translation T ∈ R³, whereas the transformation of the vectors ei is given by new vectors ri = gt(ei).
Scalar and cross product of these vectors are preserved:

ri^T rj = gt(ei)^T gt(ej) = ei^T ej = δij,  r1 × r2 = r3.

The first constraint amounts to the statement that the matrix R = (r1, r2, r3) is an orthogonal (rotation) matrix: R^T R = R R^T = I, whereas the second property implies that det(R) = +1; in other words, R is an element of the group

SO(3) = {R ∈ R^{3×3} | R^T R = I, det(R) = +1}.

Thus the rigid-body motion gt can be written as:

gt(x) = Rx + T.
Exponential Coordinates of Rotation
We will now derive a representation of an infinitesimal rotation. To this end, consider a family of rotation matrices R(t) which continuously transform a point from its original location (R(0) = I) to a different one:

X_trans(t) = R(t) X_orig,  with R(t) ∈ SO(3).

Since R(t) R(t)^T = I, ∀t, we have

d/dt (R R^T) = Ṙ R^T + R Ṙ^T = 0  ⇒  Ṙ R^T = −(Ṙ R^T)^T.

Thus, Ṙ R^T is a skew-symmetric matrix. As shown in the section about the ^-operator, this implies that there exists a vector w(t) ∈ R³ such that

Ṙ(t) R^T(t) = ŵ(t)  ⇔  Ṙ(t) = ŵ(t) R(t).

Since R(0) = I, it follows that Ṙ(0) = ŵ(0). Therefore the skew-symmetric matrix ŵ(0) ∈ so(3) gives the first-order approximation of a rotation:

R(dt) = R(0) + dR = I + ŵ(0) dt.
Lie Group and Lie Algebra
The above calculations showed that the effect of any
infinitesimal rotation R ∈ SO(3) can be approximated by an
element from the space of skew-symmetric matrices
b | w ∈ R3 }.
so(3) = {w
Representing a Moving
Scene
Prof. Daniel Cremers
The Origins of 3D
Reconstruction
The rotation group SO(3) is called a Lie group. The space
so(3) is called its Lie algebra.
3D Space & Rigid
Body Motion
Def.: A Lie group (or infinitesimal group) is a smooth manifold
that is also a group, such that the group operations
multiplication and inversion are smooth maps.
As shown above: The Lie algebra so(3) is the tangent space at
the identity of the rotation group SO(3).
An algebra over a field K is a vector space V over K with
multiplication on the space V . Elements ŵ and v̂ of the Lie
algebra generally do not commute. One can define the Lie
bracket

[ . , . ] : so(3) × so(3) → so(3);   [ŵ, v̂] ≡ ŵv̂ − v̂ŵ.
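These definitions are easy to check numerically. The following NumPy sketch (not part of the original slides) implements the ^-operator and the Lie bracket, and also verifies the standard identity [ŵ, v̂] = (w × v)^:

```python
import numpy as np

def hat(w):
    """Map w in R^3 to the skew-symmetric matrix w^ in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def lie_bracket(A, B):
    """Lie bracket [A, B] = AB - BA on so(3)."""
    return A @ B - B @ A

w, v = np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0, 2.0])
# the bracket of two skew-symmetric matrices is again skew-symmetric,
# and equals the hat of the cross product:
assert np.allclose(lie_bracket(hat(w), hat(v)), hat(np.cross(w, v)))
```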
Sophus Lie (1841 - 1899)
Marius Sophus Lie was a Norwegian-born mathematician. He
created the theory of continuous symmetry, and applied it to
the study of geometry and differential equations. Among his
greatest achievements was the discovery that continuous
transformation groups are better understood in their linearized
versions (“Theorie der Transformationsgruppen” 1893). These
infinitesimal generators form a structure which is today known
as a Lie algebra. The linearized version of the group law
corresponds to an operation on the Lie algebra known as the
commutator bracket or Lie bracket. He became professor in
Christiania (Oslo) in 1882, moved to Leipzig in 1886
(succeeding Felix Klein), and returned to Christiania in 1898.
The Exponential Map
Given the infinitesimal formulation of rotation in terms of the
skew-symmetric matrix ŵ, is it possible to determine a useful
representation of the rotation R(t)? Let us assume that ŵ is
constant in time. The differential equation system

Ṙ(t) = ŵR(t),   R(0) = I,
has the solution

R(t) = exp(ŵt) = Σ_{n=0}^∞ (ŵt)ⁿ / n! = I + ŵt + (ŵt)²/2! + . . . ,
which is a rotation around the axis w ∈ R³ by an angle of t (if
‖w‖ = 1). Alternatively, one can absorb the scalar t ∈ R into
the skew-symmetric matrix ŵ to obtain R(t) = exp(v̂) with v̂ = ŵt.
This matrix exponential therefore defines a map from the Lie
algebra to the Lie group:

exp : so(3) → SO(3);   ŵ ↦ exp(ŵ).
The Logarithm of SO(3)
As in the case of real analysis one can define an inverse
function to the exponential map by the logarithm. In the context
of Lie groups, this will lead to a mapping from the Lie group to
the Lie algebra. For any rotation matrix R ∈ SO(3), there exists
a w ∈ R³ such that R = exp(ŵ). Such an element is denoted
by ŵ = log(R).
If R = (rij) ≠ I, then an appropriate w is given by:

|w| = cos⁻¹( (trace(R) − 1) / 2 ),   w/|w| = 1/(2 sin(|w|)) · (r32 − r23 , r13 − r31 , r21 − r12)⊤.
For R = I, we have |w| = 0, i.e. a rotation by an angle 0. The
above statement says: Any orthogonal transformation
R ∈ SO(3) can be realized by rotating by an angle |w| around
an axis w/|w| as defined above. We will not prove this statement.
Obviously the above representation is not unique since
increasing the angle by multiples of 2π will give the same
rotation R.
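The closed-form logarithm above translates directly into code. A small NumPy sketch (an illustration, not part of the slides; it handles only the generic case R ≠ I with angle in (0, π)):

```python
import numpy as np

def log_SO3(R):
    """Recover w in R^3 with exp(w^) = R, for R != I (angle in (0, pi))."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis

# rotation about the z-axis by 0.7 radians
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
assert np.allclose(log_SO3(R), [0.0, 0.0, theta])
```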
Schematic Visualization of Lie Group & Lie Algebra
Definition: A Lie group is a smooth manifold that is also a
group, such that the group operations multiplication and
inversion are smooth maps.
Definition: The tangent space to a Lie group at the identity
element is called the associated Lie algebra.
The mapping from the Lie algebra to the Lie group is called the
exponential map. Its inverse is called the logarithm.
Rodrigues’ Formula
We have seen that any rotation can be realized by computing
R = exp(ŵ). In analogy to the well-known Euler equation

e^{iφ} = cos(φ) + i sin(φ),   ∀φ ∈ R,

we have an expression for skew-symmetric matrices ŵ ∈ so(3):

exp(ŵ) = I + (ŵ/|w|) sin(|w|) + (ŵ²/|w|²) (1 − cos(|w|)).
This is known as Rodrigues’ formula.
Proof: Let t = |w| and v = w/|w|. Then

v̂² = vv⊤ − I,   v̂³ = −v̂,   . . .

and

exp(ŵ) = exp(v̂t) = I + (t − t³/3! + t⁵/5! − . . .) v̂ + (t²/2! − t⁴/4! + t⁶/6! − . . .) v̂²,

where the first bracket equals sin(t) and the second 1 − cos(t).
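Rodrigues’ formula gives a closed form for the matrix exponential, which a sketch like the following (an addition to the slides) can verify against `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

def rodrigues(w):
    """exp(w^) = I + (w^/|w|) sin|w| + (w^2/|w|^2)(1 - cos|w|)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    W = hat(w)
    return (np.eye(3)
            + (W / theta) * np.sin(theta)
            + (W @ W / theta**2) * (1.0 - np.cos(theta)))

w = np.array([0.3, -0.8, 0.5])
# closed form agrees with the general matrix exponential
assert np.allclose(rodrigues(w), expm(hat(w)))
```

In practice the closed form is preferred over a generic matrix exponential, since it needs only a few products and trigonometric evaluations.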
Representation of Rigid-body Motions SE(3)
We have seen that the motion of a rigid-body is uniquely
determined by specifying the translation T of any given point
and a rotation matrix R defining the transformation of an
oriented Cartesian coordinate frame at the given point. Thus
the space of rigid-body motions is given by the group of special
Euclidean transformations
SE(3) ≡ { g = (R, T ) | R ∈ SO(3), T ∈ R³ }.
In homogeneous coordinates, we have:

SE(3) ≡ { g = [ R T ; 0 1 ] | R ∈ SO(3), T ∈ R³ } ⊂ R⁴ˣ⁴.
In the context of rigid motions, one can see the difference
between points in E3 (which can be rotated and translated) and
vectors in R3 (which can only be rotated).
The Lie Algebra of Twists
Given a continuous family of rigid-body transformations

g : R → SE(3);   g(t) = [ R(t) T(t) ; 0 1 ] ∈ R⁴ˣ⁴,

we consider

ġ(t)g⁻¹(t) = [ ṘR⊤  Ṫ − ṘR⊤T ; 0 0 ] ∈ R⁴ˣ⁴.
As in the case of SO(3), the matrix ṘR⊤ corresponds to some
skew-symmetric matrix ŵ ∈ so(3). Defining a vector
v(t) = Ṫ(t) − ŵ(t)T(t), we have:

ġ(t)g⁻¹(t) = [ ŵ(t) v(t) ; 0 0 ] ≡ ξ̂(t) ∈ R⁴ˣ⁴.
The Lie Algebra of Twists
Multiplying with g(t) from the right, we obtain:

ġ = ġg⁻¹g = ξ̂g.

The 4 × 4 matrix ξ̂ can be viewed as a tangent vector along the
curve g(t). ξ̂ is called a twist.
As in the case of so(3), the set of all twists forms the tangent
space to the Lie group SE(3), which is the Lie algebra:

se(3) ≡ { ξ̂ = [ ŵ v ; 0 0 ] | ŵ ∈ so(3), v ∈ R³ } ⊂ R⁴ˣ⁴.
As before, we can define operators ∧ and ∨ to convert between
a twist ξ̂ ∈ se(3) and its twist coordinates ξ ∈ R⁶:

ξ̂ ≡ (v, w)∧ ≡ [ ŵ v ; 0 0 ] ∈ R⁴ˣ⁴,   [ ŵ v ; 0 0 ]∨ = (v, w)⊤ ∈ R⁶.
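The ∧ and ∨ operators for twists can be sketched as follows (an illustrative addition to the slides, with hypothetical function names `hat_se3` and `vee_se3`):

```python
import numpy as np

def hat_so3(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

def hat_se3(xi):
    """Twist coordinates xi = (v, w) in R^6 -> 4x4 twist matrix in se(3)."""
    v, w = xi[:3], xi[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = hat_so3(w)
    X[:3, 3] = v
    return X

def vee_se3(X):
    """Inverse of hat_se3: 4x4 twist matrix -> xi = (v, w) in R^6."""
    return np.concatenate([X[:3, 3], [X[2, 1], X[0, 2], X[1, 0]]])

xi = np.array([1.0, 2.0, 3.0, 0.1, 0.2, 0.3])
# the two operators are inverse to each other
assert np.allclose(vee_se3(hat_se3(xi)), xi)
```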
Exponential Coordinates for SE(3)
The twist coordinates ξ = (v, w)⊤ are formed by stacking the
linear velocity v ∈ R³ (related to translation) and the angular
velocity w ∈ R³ (related to rotation).
The differential equation system

ġ(t) = ξ̂g(t),   g(0) = I,   ξ̂ = const.
has the solution

g(t) = exp(ξ̂t) = Σ_{n=0}^∞ (ξ̂t)ⁿ / n!.

For w = 0, we have exp(ξ̂) = [ I v ; 0 1 ], while for w ≠ 0 one
can show:

exp(ξ̂) = [ exp(ŵ)  ((I − exp(ŵ))ŵv + ww⊤v)/|w|² ; 0 1 ].
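The closed form for w ≠ 0 can be checked against the generic matrix exponential of the 4 × 4 twist. A NumPy/SciPy sketch (an addition to the slides, following the formula above):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

def exp_se3(v, w):
    """Closed form of exp(xi^) for w != 0, as on the slide."""
    theta2 = w @ w
    R = expm(hat(w))
    T = ((np.eye(3) - R) @ hat(w) @ v + np.outer(w, w) @ v) / theta2
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = T
    return g

v, w = np.array([1.0, 0.0, 0.5]), np.array([0.0, 0.0, 0.8])
xi_hat = np.zeros((4, 4))
xi_hat[:3, :3], xi_hat[:3, 3] = hat(w), v
# closed form agrees with the generic matrix exponential of the twist
assert np.allclose(exp_se3(v, w), expm(xi_hat))
```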
Exponential Coordinates for SE(3)
The above shows that the exponential map defines a
transformation from the Lie algebra se(3) to the Lie group
SE(3):

exp : se(3) → SE(3);   ξ̂ ↦ exp(ξ̂).

The elements ξ̂ ∈ se(3) are called the exponential coordinates
for SE(3).
Conversely: For every g ∈ SE(3) there exist twist coordinates
ξ = (v, w) ∈ R⁶ such that g = exp(ξ̂).
Proof: Given g = (R, T ), we know that there exists w ∈ R³ with
exp(ŵ) = R. If |w| ≠ 0, the exponential form of g introduced
above shows that we merely need to solve the equation

((I − exp(ŵ))ŵv + ww⊤v) / |w|² = T
for the velocity vector v ∈ R3 . Just as in the case of SO(3), this
representation is generally not unique, i.e. there exist many
twists ξb ∈ se(3) which represent the same rigid-body motion
g ∈ SE(3).
Representing the Motion of the Camera
When observing a scene from a moving camera, the
coordinates and velocity of points in camera coordinates will
change. We will use a rigid-body transformation

g(t) = [ R(t) T(t) ; 0 1 ] ∈ SE(3)
to represent the motion from a fixed world frame to the camera
frame at time t. In particular we assume that at time t = 0 the
camera frame coincides with the world frame, i.e. g(0) = I. For
any point X 0 in world coordinates, its coordinates in the
camera frame at time t are:
X (t) = R(t)X 0 + T (t),
or in the homogeneous representation
X (t) = g(t)X 0 .
Concatenation of Motions over Frames
Given two different times t1 and t2 , we denote the
transformation from the points in frame t1 to the points in frame
t2 by g(t2 , t1 ):
X (t2 ) = g(t2 , t1 )X (t1 ).
Obviously we have:
X (t3) = g(t3 , t2)X (t2) = g(t3 , t2)g(t2 , t1)X (t1) = g(t3 , t1)X (t1),
and thus:
g(t3 , t1 ) = g(t3 , t2 )g(t2 , t1 ).
By transferring the coordinates of frame t1 to coordinates in
frame t2 and back, we see that:
X (t1 ) = g(t1 , t2 )X (t2 ) = g(t1 , t2 )g(t2 , t1 )X (t1 ),
which must hold for any point coordinates X (t1 ), thus:
g(t1 , t2)g(t2 , t1) = I   ⇔   g⁻¹(t2 , t1) = g(t1 , t2).
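The composition and inversion rules above are plain matrix algebra on the 4 × 4 homogeneous representations. A small NumPy sketch (an illustration with hypothetical frames, not part of the slides):

```python
import numpy as np

def rigid(R, T):
    """Build the 4x4 homogeneous representation of g = (R, T)."""
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = T
    return g

# hypothetical frame-to-frame motions: a rotation about z plus translations
c, s = np.cos(0.4), np.sin(0.4)
g21 = rigid(np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]),
            np.array([1.0, 0.0, 2.0]))
g32 = rigid(np.eye(3), np.array([0.0, -1.0, 0.5]))

g31 = g32 @ g21                      # g(t3,t1) = g(t3,t2) g(t2,t1)
g12 = np.linalg.inv(g21)             # g(t1,t2) = g(t2,t1)^{-1}
assert np.allclose(g12 @ g21, np.eye(4))

X1 = np.array([0.3, 0.7, 1.5, 1.0])  # a point in homogeneous coordinates
# composing the motions or applying them one after the other is the same
assert np.allclose(g31 @ X1, g32 @ (g21 @ X1))
```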
Rules of Velocity Transformation
The coordinates of point X 0 in frame t are given by
X (t) = g(t)X 0 . Therefore the velocity is given by
Ẋ (t) = ġ(t)X 0 = ġ(t)g −1 (t)X (t)
By introducing the twist coordinates

V̂(t) ≡ ġ(t)g⁻¹(t) = [ ŵ(t) v(t) ; 0 0 ] ∈ se(3),
we get the expression:

Ẋ(t) = V̂(t)X(t).

In simple 3D coordinates this gives:

Ẋ(t) = ŵ(t)X(t) + v(t).

The symbol V̂(t) therefore represents the relative velocity of
the world frame as viewed from the camera frame.
Transfer Between Frames: The Adjoint Map
Suppose that a viewer in another frame A is displaced relative
to the current frame by a transformation gxy : Y = gxy X (t).
Then the velocity in this new frame is given by:

Ẏ(t) = gxy Ẋ(t) = gxy V̂(t)X(t) = gxy V̂ gxy⁻¹ Y(t).
This shows that the relative velocity of points observed from
camera frame A is represented by the twist

V̂y = gxy V̂ gxy⁻¹ ≡ ad_gxy(V̂),

where we have introduced the adjoint map on se(3):

ad_g : se(3) → se(3);   ξ̂ ↦ g ξ̂ g⁻¹.
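The adjoint map is a similarity transformation of the twist matrix, and one can verify numerically that it maps twists to twists. A sketch (an addition to the slides, with a hypothetical displacement g_xy):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

def twist(v, w):
    X = np.zeros((4, 4))
    X[:3, :3], X[:3, 3] = hat(w), v
    return X

def ad(g, xi_hat):
    """Adjoint map ad_g: se(3) -> se(3), xi^ -> g xi^ g^{-1}."""
    return g @ xi_hat @ np.linalg.inv(g)

# a hypothetical displacement g_xy and twist V^
g_xy = np.eye(4)
g_xy[:3, :3] = expm(hat([0.2, 0.1, -0.3]))
g_xy[:3, 3] = [1.0, 2.0, 0.5]
V = twist(np.array([0.4, 0.0, 1.0]), np.array([0.0, 0.5, 0.0]))

Vy = ad(g_xy, V)
# the result is again a twist: bottom row stays zero,
# and the rotational block stays skew-symmetric
assert np.allclose(Vy[3, :], 0.0)
assert np.allclose(Vy[:3, :3] + Vy[:3, :3].T, 0.0)
```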
Summary

                  Rotation SO(3)                     Rigid-body SE(3)
Matrix repres.    R ∈ GL(3) : R⊤R = I, det(R) = 1    g = [ R T ; 0 1 ]
3-D coordinates   X = RX0                            X = RX0 + T
Inverse           R⁻¹ = R⊤                           g⁻¹ = [ R⊤ −R⊤T ; 0 1 ]
Exp. repres.      R = exp(ŵ)                         g = exp(ξ̂)
Velocity          Ẋ = ŵX                             Ẋ = ŵX + v
Adjoint map       ŵ ↦ RŵR⊤                           ξ̂ ↦ gξ̂g⁻¹
Alternative Representations: Euler Angles
In addition to the exponential parameterization, there exist
alternative mathematical representations to parameterize
rotation matrices R ∈ SO(3), given by the Euler angles. These
are local coordinates, i.e. the parameterization is only correct
for a portion of SO(3).
Given a basis (ŵ1 , ŵ2 , ŵ3) of the Lie algebra so(3), we can
define a mapping from R³ to the Lie group SO(3) by:

α : (α1 , α2 , α3) ↦ exp(α1 ŵ1 + α2 ŵ2 + α3 ŵ3).

The coordinates (α1 , α2 , α3) are called Lie-Cartan coordinates
of the first kind relative to the above basis.
The Lie-Cartan coordinates of the second kind are defined as:

β : (β1 , β2 , β3) ↦ exp(β1 ŵ1) exp(β2 ŵ2) exp(β3 ŵ3).

For the basis representing rotation around the z-, y -, x-axis,

w1 = (0, 0, 1)⊤,   w2 = (0, 1, 0)⊤,   w3 = (1, 0, 0)⊤,

the coordinates β1 , β2 , β3 are called Euler angles.
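The difference between the two kinds of coordinates can be made concrete: the first kind uses a single exponential of a linear combination, the second a product of three exponentials, and since the generators do not commute the two generally disagree for finite angles. A sketch (an illustration, not part of the slides):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

w1, w2, w3 = np.array([0., 0., 1.]), np.array([0., 1., 0.]), np.array([1., 0., 0.])

def first_kind(a1, a2, a3):
    """Lie-Cartan coordinates of the first kind (one exponential)."""
    return expm(a1 * hat(w1) + a2 * hat(w2) + a3 * hat(w3))

def euler_zyx(b1, b2, b3):
    """Second kind for the z-, y-, x-basis: the Euler angles."""
    return expm(b1 * hat(w1)) @ expm(b2 * hat(w2)) @ expm(b3 * hat(w3))

R2 = euler_zyx(0.3, -0.5, 0.2)
assert np.allclose(R2.T @ R2, np.eye(3))        # a valid rotation
# for generic (non-small) angles the two parameterizations differ:
assert not np.allclose(R2, first_kind(0.3, -0.5, 0.2))
```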
Chapter 3
Perspective Projection
Multiple View Geometry
Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
updated May 20, 2019 1/24
Overview
1 Historic Remarks
2 Mathematical Representation
3 Intrinsic Parameters
4 Spherical Projection
5 Radial Distortion
6 Preimage and Coimage
7 Projective Geometry
Some Historic Remarks
The study of the image formation process has a long history.
The earliest formulations of the geometry of image formation
can be traced back to Euclid (4th century B.C.). Examples of a
partially correct perspective projection are visible in the
frescoes and mosaics of Pompeii (1st century B.C.).
These skills seem to have been lost with the fall of the Roman
empire. Correct perspective projection emerged again around
1000 years later in early Renaissance art.
Among the proponents of perspective projection are the
Renaissance artists Brunelleschi, Donatello and Alberti. The
first treatise on the projection process, “Della Pittura” (1435),
was published by Leon Battista Alberti.
Apart from the geometry of image formation, the study of the
interaction of light with matter was advanced by artists like
Leonardo da Vinci in the 1500s and by Renaissance painters
such as Caravaggio and Raphael.
Perspective Projection in Art
Filippo Lippi, “The Feast of Herod: Salome’s Dance.”
Fresco, Cappella Maggiore, Duomo, Prato, Italy, c.1460-1464.
Perspective Projection in Art
Raphael, The School of Athens (1509)
Perspective Projection in Art
Dürer’s machine (1525)
Perspective Projection in Art
Satire by Hogarth 1753
Perspective Projection in Art
M.C. Escher, Another World 1947
Escher, Belvedere 1958
Mathematics of Perspective Projection
The above drawing shows the perspective projection of a point
P (observed through a thin lens) to its image p.
The point P has coordinates X = (X , Y , Z ) ∈ R3 relative to the
reference frame centered at the optical center, where the z-axis
is the optical axis (of the lens).
Mathematics of Perspective Projection
To simplify equations, one flips the signs of the x- and y -axes,
which amounts to considering the image plane to be in front of
the center of projection (rather than behind it). The perspective
transformation π is therefore given by

π : R³ → R²;   X ↦ x = π(X) = ( f X/Z , f Y/Z )⊤.
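The projection π is a one-liner in code. A minimal NumPy sketch (an addition to the slides; it assumes Z > 0):

```python
import numpy as np

def project(X, f=1.0):
    """pi: R^3 -> R^2, X -> (f X/Z, f Y/Z), assuming Z > 0."""
    X = np.asarray(X, dtype=float)
    return f * X[:2] / X[2]

# a point at depth 2 projects by dividing x and y by Z
assert np.allclose(project([2.0, 4.0, 2.0]), [1.0, 2.0])
# scaling the focal length scales the image coordinates
assert np.allclose(project([2.0, 4.0, 2.0], f=0.5), [0.5, 1.0])
```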
An Ideal Perspective Camera
In homogeneous coordinates, the perspective transformation is
given by:

Zx = Z (x, y, 1)⊤ = [ f 0 0 0 ; 0 f 0 0 ; 0 0 1 0 ] (X, Y, Z, 1)⊤ = Kf Π0 X,
where we have introduced the two matrices

Kf ≡ [ f 0 0 ; 0 f 0 ; 0 0 1 ]   and   Π0 ≡ [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ].
The matrix Π0 is referred to as the standard projection matrix.
Writing the (unknown) depth Z as λ > 0, we obtain:
λx = Kf Π0 X .
An Ideal Perspective Camera
From the previous lectures, we know that due to the rigid
motion of the camera, the point X in camera coordinates is
given as a function of the point in world coordinates X 0 by:
X = RX0 + T ,

or in homogeneous coordinates X = (X , Y , Z , 1)⊤:

X = gX0 = [ R T ; 0 1 ] X0.
In total, the transformation from world coordinates to image
coordinates is therefore given by
λx = Kf Π0 g X 0 .
If the focal length f is known, it can be normalized to 1 (by
changing the units of the image coordinates), such that:
λ x = Π0 X = Π0 g X 0 .
Intrinsic Camera Parameters
If the camera is not centered at the optical center, we have an
additional translation ox , oy and if pixel coordinates do not have
unit scale, we need to introduce an additional scaling in x- and
y -direction by sx and sy . If the pixels are not rectangular, we
have a skew factor sθ .
The pixel coordinates (x′, y′, 1) as a function of homogeneous
camera coordinates X are then given by:

λ (x′, y′, 1)⊤ = Ks Kf Π0 (X , Y , Z , 1)⊤,

where Ks ≡ [ sx sθ ox ; 0 sy oy ; 0 0 1 ], Kf ≡ [ f 0 0 ; 0 f 0 ; 0 0 1 ]
and Π0 ≡ [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ].
After the perspective projection Π0 (with focal length 1), we
have an additional transformation which depends on the
(intrinsic) camera parameters. This can be expressed by the
intrinsic parameter matrix K = Ks Kf .
The Intrinsic Parameter Matrix
All intrinsic camera parameters therefore enter the intrinsic
parameter matrix

K ≡ Ks Kf = [ fsx fsθ ox ; 0 fsy oy ; 0 0 1 ].
As a function of the world coordinates X 0 , we therefore have:
λx′ = K Π0 X = K Π0 g X0 ≡ Π X0.
The 3 × 4 matrix Π ≡ K Π0 g = (KR, KT ) is called a general
projection matrix.
Although the above equation looks like a linear one, we still
have the scale parameter λ. Dividing by λ gives:
x′ = π1⊤X0 / π3⊤X0 ,   y′ = π2⊤X0 / π3⊤X0 ,   z′ = 1,

where π1⊤, π2⊤, π3⊤ ∈ R⁴ are the three rows of the projection
matrix Π.
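The full chain λx′ = KΠ0gX0 is short to implement. A NumPy sketch (an illustrative addition with hypothetical parameter values):

```python
import numpy as np

def intrinsics(f, sx, sy, s_theta, ox, oy):
    """K = Ks Kf as on the slide."""
    return np.array([[f * sx, f * s_theta, ox],
                     [0.0,    f * sy,      oy],
                     [0.0,    0.0,         1.0]])

def project_pixels(K, g, X0):
    """lambda x' = K Pi_0 g X0; divide by the third row to get pixels."""
    Pi0 = np.hstack([np.eye(3), np.zeros((3, 1))])
    Pi = K @ Pi0 @ g                    # general projection matrix (3x4)
    x = Pi @ X0
    return x / x[2]                     # (x', y', 1)

# hypothetical intrinsics: 500-pixel focal length, principal point (320, 240)
K = intrinsics(f=500.0, sx=1.0, sy=1.0, s_theta=0.0, ox=320.0, oy=240.0)
g = np.eye(4)                           # camera frame = world frame
X0 = np.array([1.0, 0.5, 5.0, 1.0])
x = project_pixels(K, g, X0)
assert np.allclose(x, [320.0 + 500.0 * 1.0 / 5.0,
                       240.0 + 500.0 * 0.5 / 5.0, 1.0])
```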
The Intrinsic Parameter Matrix
The entries of the intrinsic parameter matrix

K = [ fsx fsθ ox ; 0 fsy oy ; 0 0 1 ]
can be interpreted as follows:
ox : x-coordinate of principal point in pixels,
oy : y -coordinate of principal point in pixels,
fsx = αx : size of unit length in horizontal pixels,
fsy = αy : size of unit length in vertical pixels,
αx /αy : aspect ratio σ,
fsθ : skew of the pixel, often close to zero.
Spherical Perspective Projection
The perspective pinhole camera introduced above considers a
planar imaging surface. Instead, one can consider a spherical
projection surface given by the unit sphere
S² ≡ {x ∈ R³ : |x| = 1}. The spherical projection πs of a 3D
point X is given by:

πs : R³ → S²;   X ↦ x = X / |X|.
The pixel coordinates x′ as a function of the world coordinates
X0 are:

λ x′ = K Π0 g X0,

except that the scalar factor is now λ = |X| = √(X² + Y² + Z²).
One often writes x ∼ y for homogeneous vectors x and y if
they are equal up to a scalar factor. Then we can write:
x′ ∼ Π X0 = K Π0 g X0.
This property holds for any imaging surface, as long as the ray
between X and the origin intersects the imaging surface.
Radial Distortion
bookshelf with regular lens
bookshelf with short focal lens
Radial Distortion
The intrinsic parameters in the matrix K model linear
distortions in the transformation to pixel coordinates. In
practice, however, one can also encounter significant
distortions along the radial axis, in particular if a wide field of
view is used or if one uses cheaper cameras such as
webcams. A simple effective model for such distortions is:
x = xd (1 + a1 r² + a2 r⁴),
y = yd (1 + a1 r² + a2 r⁴),
where x d ≡ (xd , yd ) is the distorted point, r 2 = xd2 + yd2 . If a
calibration rig is available, the distortion parameters a1 and a2
can be estimated.
Alternatively, one can estimate a distortion model directly from
the images. A more general model (Devernay and Faugeras
1995) is
x = c + f(r)(xd − c),   with f(r) = 1 + a1 r + a2 r² + a3 r³ + a4 r⁴.
Here, r = |xd − c| is the distance to an arbitrary center of
distortion c, and the distortion correction factor f(r) is an
arbitrary fourth-order polynomial. Its parameters are computed
from distortions of straight lines or simultaneously with the 3D
reconstruction (Zhang ’96, Stein ’97, Fitzgibbon ’01).
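The simple two-parameter model is a pointwise radial rescaling. A sketch (an addition to the slides, with hypothetical coefficient values):

```python
import numpy as np

def apply_radial(xd, yd, a1, a2):
    """Two-parameter model: x = xd (1 + a1 r^2 + a2 r^4), same for y."""
    r2 = xd**2 + yd**2
    s = 1.0 + a1 * r2 + a2 * r2**2
    return xd * s, yd * s

# hypothetical coefficients for illustration
x, y = apply_radial(0.5, -0.25, a1=0.1, a2=0.01)
r2 = 0.5**2 + 0.25**2
assert np.isclose(x, 0.5 * (1 + 0.1 * r2 + 0.01 * r2**2))
assert np.isclose(y, -0.25 * (1 + 0.1 * r2 + 0.01 * r2**2))
```

The point (0, 0) is a fixed point of the model, consistent with distortion growing with the distance from the center.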
Preimage of Points and Lines
The perspective transformation introduced above allows us to
define images for arbitrary geometric entities by simply
transforming all points of the entity. However, due to the
unknown scale factor, each point is mapped not to a single
point x, but to an equivalence class of points y ∼ x. It is
therefore useful to study how lines are transformed.
A line L in 3-D is characterized by a base point
X 0 = (X0 , Y0 , Z0 , 1)> ∈ R4 and a vector
V = (V1 , V2 , V3 , 0)> ∈ R4 :
X = X0 + µ V,
µ ∈ R.
The image of the line L is given by
x ∼ Π0 X = Π0 (X 0 + µV ) = Π0 X 0 + µΠ0 V .
All points x treated as vectors from the origin o span a 2-D
subspace P. The intersection of this plane P with the image
plane gives the image of the line. P is called the preimage of
the line.
A preimage of a point or a line in the image plane is the largest
set of 3D points that give rise to an image equal to the given
point or line.
Preimage and Coimage
Preimage P of a line L
Preimages can be defined for curves or other more
complicated geometric structures. In the case of points and
lines, however, the preimage is a subspace of R3 . This
subspace can also be represented by its orthogonal
complement, i.e. the normal vector in the case of a plane. This
complement is called the coimage. The coimage of a point or a
line is the subspace in R3 that is the (unique) orthogonal
complement of its preimage. Image, preimage and coimage
are equivalent because they uniquely determine one another:

image = preimage ∩ image plane,   preimage = span(image),
preimage = coimage⊥,   coimage = preimage⊥.
Preimage and Coimage of Points and Lines
In the case of the line L, the preimage is a 2D subspace,
characterized by the 1D coimage given by the span of its
normal vector ℓ ∈ R³. All points of the preimage, and hence all
points x of the image of L, are orthogonal to ℓ:

ℓ⊤x = 0.

The space of all vectors orthogonal to ℓ is spanned by the row
vectors of ℓ̂, thus we have:

P = span(ℓ̂).
In the case that x is the image of a point p, the preimage is a
line and the coimage is the plane orthogonal to x, i.e. it is
spanned by the rows of the matrix x̂.
In summary we have the following table:

         Image                    Preimage          Coimage
Point    span(x) ∩ im. plane      span(x) ⊂ R³      span(x̂) ⊂ R³
Line     span(ℓ̂) ∩ im. plane      span(ℓ̂) ⊂ R³      span(ℓ) ⊂ R³
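The orthogonality relations behind this table follow from properties of the cross-product matrix, which a short NumPy sketch (an addition to the slides) makes explicit:

```python
import numpy as np

def hat(x):
    """Cross-product matrix: hat(x) @ y = x cross y."""
    return np.array([[0, -x[2], x[1]],
                     [x[2], 0, -x[0]],
                     [-x[1], x[0], 0]], dtype=float)

# image of a point: x spans the preimage (a line through the origin);
# the coimage is spanned by the rows of hat(x), all orthogonal to x:
x = np.array([0.4, -0.2, 1.0])
assert np.allclose(hat(x) @ x, 0.0)

# image of a line with coimage normal l: every image point satisfies l^T x = 0;
# the rows of hat(l) span the preimage plane P = span(hat(l)), orthogonal to l:
l = np.array([1.0, 2.0, -3.0])
assert np.allclose(l @ hat(l), 0.0)
```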
Summary
In this part of the lecture, we studied the perspective projection
which takes us from the 3D (4D) camera coordinates to 2D
camera image coordinates and pixel coordinates. In
homogeneous coordinates, we have the transformations:

4D world coordinates --(g ∈ SE(3))--> 4D camera coordinates --(Kf Π0)-->
3D image coordinates --(Ks)--> 3D pixel coordinates.

In particular, we can summarize the (intrinsic) camera
parameters in the matrix K = Ks Kf.
The full transformation from world coordinates X 0 to pixel
coordinates x 0 is given by:
λx 0 = K Π0 g X 0 .
Moreover, for the images of points and lines we introduced the
notions of preimage (maximal point set which is consistent with
a given image) and coimage (its orthogonal complement). Both
can be used equivalently to the image.
Projective Geometry
In order to formally write transformations by linear operations,
we made extensive use of homogeneous coordinates to
represent a 3D point as a 4D-vector (X , Y , Z , 1) with the last
coordinate fixed to 1. This normalization is not always
necessary: One can represent 3D points by a general 4D
vector
X = (XW , YW , ZW , W ) ∈ R4 ,
remembering that merely the direction of this vector is of
importance. We therefore identify the point in homogeneous
coordinates with the line connecting it with the origin. This
leads to the definition of projective coordinates.
An n-dimensional projective space Pn is the set of all
one-dimensional subspaces (i.e. lines through the origin) of the
vector space Rn+1 . A point p ∈ Pn can then be assigned
homogeneous coordinates X = (x1 , . . . , xn+1 )> , among which
at least one x is nonzero. For any nonzero λ ∈ R, the
coordinates Y = (λx1 , . . . , λxn+1 )> represent the same point p.
Projective Geometry
If the two coordinate vectors X and Y differ by a scalar factor,
then they are said to be equivalent:
X ∼ Y.
The point p is represented by the equivalence class of all
multiples of X . Since all points are represented by lines
through the origin, there exist two alternative representations
for the two-dimensional projective space P²:
1 One can represent each point as a point on the 2D-sphere
S², where any antipodal points represent the same line.
2 One can represent each point p either as a point on the
plane of R² (homogeneous coordinates), modeling all
points with non-zero z-component, or as a point on the
circle S¹ (again identifying antipodal points), which is
equivalent to P¹.
Both representations hold for the n-dimensional projective
space Pⁿ, which can either be seen as an nD-sphere Sⁿ or as
Rⁿ with Pⁿ⁻¹ attached (to model lines at infinity).
Chapter 4
Estimating Point Correspondence
Multiple View Geometry
Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
updated May 23, 2019 1/22
Overview
1 From Photometry to Geometry
2 Small Deformation & Optical Flow
3 The Lucas-Kanade Method
4 Feature Point Extraction
5 Wide Baseline Matching
From Photometry to Geometry
In the last sections, we discussed how points and lines are
transformed from 3D world coordinates to 2D image and pixel
coordinates.
In practice, we do not actually observe points or lines, but
rather brightness or color values at the individual pixels. In
order to transfer from this photometric representation to a
geometric representation of the scene, one can identify points
with characteristic image features and try to associate these
points with corresponding points in the other frames.
The matching of corresponding points will allow us to infer 3D
structure. Nevertheless, one should keep in mind that this
approach is suboptimal: By selecting a small number of feature
points from each image, we throw away a large amount of
potentially useful information contained in each image. Yet,
retaining all image information is computationally challenging.
The selection and matching of a small number of feature points, on the other hand, allows tracking of 3D objects from a moving camera in real time, even with limited processing power.
Example of Tracking
[Figures: input frames 1 and 2; wire-frame reconstruction with texture map]
[Figures: tracked input sequence; textured reconstruction]
Identifying Corresponding Points
[Figures: input frame 1 and input frame 2]
To identify corresponding points in two or more images is one
of the biggest challenges in computer vision. Which of the
points identified in the left image corresponds to which point in
the right one?
Non-rigid Deformation
In what follows we will assume that objects move rigidly. However, in general, objects may also deform non-rigidly. Moreover, there may be partial occlusions:
[Figures: Image 1, Image 2, Registration (Cremers, Guetter, Xu, CVPR '06)]
Small Deformation versus Wide Baseline
In point matching one distinguishes two cases:
• Small deformation: The deformation from one frame to the other is assumed to be (infinitesimally) small. In this case the displacement from one frame to the other can be estimated by classical optic flow estimation, for example using the methods of Lucas/Kanade or Horn/Schunck. In particular, these methods allow one to model dense deformation fields (giving a displacement for every pixel in the image). But one can also track the displacement of a few feature points, which is typically faster.
• Wide baseline stereo: In this case the displacement is assumed to be large. A dense matching of all points to all points is in general computationally infeasible. Therefore, one typically selects a small number of feature points in each of the images and develops efficient methods to find an appropriate pairing of points.
Small Deformation
The transformation of all points of a rigidly moving object is given by:
x2 = h(x1) = (1 / λ2(X)) (R λ1(X) x1 + T).
Locally this motion can be approximated in several ways.
• Translational model:
h(x) = x + b.
• Affine model:
h(x) = Ax + b.
The 2D affine model can also be written as:
h(x) = x + u(x)
with
u(x) = S(x) p = [x y 1 0 0 0; 0 0 0 x y 1] (p1, p2, p3, p4, p5, p6)^T.
Optic Flow Estimation
The optic flow refers to the apparent 2D motion field observable between consecutive images of a video. It is in general different from the motion of objects in the scene: in the extreme case of motion along the camera axis, for example, there is no optic flow, while camera rotation generates an optic flow field even for entirely static scenes.
In 1981, two seminal works on optic flow estimation were published, namely the works of Lucas & Kanade and of Horn & Schunck. Both methods have become very influential, with thousands of citations. They are complementary in the sense that the Lucas-Kanade method generates sparse flow vectors under the assumption of constant motion in a local neighborhood, whereas the Horn-Schunck method generates a dense flow field under the assumption of spatially smooth motion. Despite more than 30 years of research, the estimation of optic flow fields is still a highly active research direction.
Due to its simplicity, we will review the Lucas-Kanade method.
The Lucas-Kanade Method
• Brightness Constancy Assumption: Let x(t) denote a moving point at time t, and I(x, t) a video sequence, then:
I(x(t), t) = const. ∀t,
i.e. the brightness of point x(t) is constant. Therefore the total time derivative must be zero:
d/dt I(x(t), t) = ∇I^T (dx/dt) + ∂I/∂t = 0.
This constraint is often called the (differential) optical flow constraint. The desired local flow vector (velocity) is given by v = dx/dt.
• Constant motion in a neighborhood: Since the above equation cannot be solved for v, one assumes that v is constant over a neighborhood W(x) of the point x:
∇I(x', t)^T v + ∂I/∂t (x', t) = 0   ∀x' ∈ W(x).
The Lucas-Kanade Method
The brightness is typically not exactly constant and the velocity is typically not exactly the same for the local neighborhood. Lucas and Kanade (1981) therefore compute the best velocity vector v for the point x by minimizing the least squares error
E(v) = ∫_{W(x)} (∇I(x', t)^T v + It(x', t))² dx'.
Expanding the terms and setting the derivative to zero, one obtains:
dE/dv = 2Mv + 2q = 0,
with
M = ∫_{W(x)} ∇I ∇I^T dx',  and  q = ∫_{W(x)} It ∇I dx'.
If M is invertible, i.e. det(M) ≠ 0, then the solution is
v = −M⁻¹ q.
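The closed-form solution v = −M⁻¹q above can be sketched in a few lines of NumPy. This is a minimal illustration, not an official implementation: the window half-width, the gradient scheme (np.gradient) and the simple frame difference as temporal derivative are all assumptions.

```python
import numpy as np

def lucas_kanade(I1, I2, x, y, r=7):
    """Estimate the flow vector v = -M^{-1} q at pixel (x, y).

    I1, I2: consecutive grayscale frames (float arrays); r: half-width
    of the window W(x). A sketch under the assumptions named above.
    """
    Iy, Ix = np.gradient(I1)                  # spatial gradients (axis 0 = y)
    It = I2 - I1                              # temporal derivative
    win = np.s_[y - r:y + r + 1, x - r:x + r + 1]
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    M = np.array([[ix @ ix, ix @ iy],         # structure tensor over W(x)
                  [ix @ iy, iy @ iy]])
    q = np.array([ix @ it, iy @ it])
    if abs(np.linalg.det(M)) < 1e-9:          # aperture problem: M singular
        return None
    return -np.linalg.solve(M, q)             # v = (vx, vy)
```

Applied to a Gaussian blob shifted by one pixel, this recovers a flow vector close to (1, 0) at the blob center.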
Estimating Local Displacements
• Translational motion (Lucas & Kanade '81):
E(b) = ∫_{W(x)} (∇I^T b + It)² dx' → min,
dE/db = 0  ⇒  b = · · ·
• Affine motion:
E(p) = ∫_{W(x)} (∇I(x')^T S(x') p + It(x'))² dx',
dE/dp = 0  ⇒  p = · · ·
When can Small Motion be Estimated?
In the formalism of Lucas and Kanade, one cannot always estimate a translational motion. This problem is often referred to as the aperture problem. It arises, for example, if the region in the window W(x) around the point x has entirely constant intensity (for example a white wall), because then ∇I(x) = 0 and It(x) = 0 for all points in the window.
In order for the solution of b to be unique, the structure tensor
M(x) = ∫_{W(x)} [Ix²  Ix Iy; Ix Iy  Iy²] dx'
needs to be invertible. That means that we must have det M ≠ 0.
If the structure tensor is not invertible but not zero, then one can estimate the normal motion, i.e. the motion in direction of the image gradient.
For those points with det M(x) ≠ 0, we can compute a motion vector b(x). This leads to the following simple feature tracker.
A Simple Feature Tracking Algorithm
Feature tracking over a sequence of images can now be done as follows:
• For a given time instance t, compute at each point x ∈ Ω the structure tensor
M(x) = ∫_{W(x)} [Ix²  Ix Iy; Ix Iy  Iy²] dx'.
• Mark all points x ∈ Ω for which the determinant of M is larger than a threshold θ > 0:
det M(x) ≥ θ.
• For all these points the local velocity is given by:
b(x, t) = −M(x)⁻¹ (∫ Ix It dx', ∫ Iy It dx')^T.
• Repeat the above steps for the points x + b at time t + 1.
Robust Feature Point Extraction
Even det M(x) ≠ 0 does not guarantee robust estimates of velocity: the inverse of M(x) may not be very stable if, for example, the determinant of M is very small. Thus locations with det M ≠ 0 are not always reliable features for tracking.
One of the classical feature detectors was developed by Moravec '80, Förstner '84, '87 and Harris & Stephens '88.
It is based on the structure tensor
M(x) ≡ Gσ ∗ ∇I∇I^T = ∫ Gσ(x − x') [Ix²  Ix Iy; Ix Iy  Iy²](x') dx',
where rather than simply summing over the window W(x) we perform a summation weighted by a Gaussian G of width σ. Harris and Stephens propose the following expression:
C(x) = det(M) − κ trace²(M),
and select points for which C(x) > θ with a threshold θ > 0.
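The Harris-Stephens response can be sketched as follows. For simplicity the Gaussian weighting Gσ is replaced here by a plain box window, and κ = 0.04 is a commonly used value; both are assumptions, not prescribed by the slides.

```python
import numpy as np

def harris_response(I, r=2, kappa=0.04):
    """Harris-Stephens response C(x) = det(M) - kappa * trace(M)^2,
    with the structure tensor summed over a (2r+1)x(2r+1) box window
    (a simplification of the Gaussian weighting G_sigma)."""
    Iy, Ix = np.gradient(I.astype(float))
    def box(A):
        # windowed sum over W(x): shift-and-add with edge padding
        P = np.pad(A, r, mode='edge')
        out = np.zeros_like(A)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += P[r + dy:r + dy + A.shape[0], r + dx:r + dx + A.shape[1]]
        return out
    Mxx, Mxy, Myy = box(Ix * Ix), box(Ix * Iy), box(Iy * Iy)
    det = Mxx * Myy - Mxy ** 2
    trace = Mxx + Myy
    return det - kappa * trace ** 2
```

On a synthetic step corner, the response is positive at the corner, negative along the straight edges (det ≈ 0, trace > 0) and zero in flat regions, which is exactly the behavior used for selection with C(x) > θ.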
Response of Förstner Detector
[Figure: detector response on an example image]
Wide Baseline Matching
Corresponding points and regions may look very different in different views. Determining correspondence is a challenge.
In the case of wide baseline matching, large parts of the image plane will not match at all because they are not visible in the other image. In other words, while a given point may have many potential matches, quite possibly it does not have a corresponding point in the other image.
Extensions to Larger Baseline
One of the limitations of tracking features frame by frame is that small errors in the motion accumulate over time and the window gradually moves away from the point that was originally tracked. This is known as drift.
A remedy is to match a given point back to the first frame. This generally implies larger displacements between frames.
Two aspects matter when extending the above simple feature tracking method to somewhat larger displacements:
• Since the motion of the window between frames is (in general) no longer translational, one needs to generalize the motion model for the window W(x), for example by using an affine motion model.
• Since the illumination will change over time (especially when comparing more distant frames), one can replace the sum-of-squared-differences by the normalized cross correlation, which is more robust to illumination changes.
Normalized Cross Correlation
The normalized cross correlation is defined as:
NCC(h) = ∫_{W(x)} (I1(x') − Ī1)(I2(h(x')) − Ī2) dx' / sqrt( ∫_{W(x)} (I1(x') − Ī1)² dx' · ∫_{W(x)} (I2(h(x')) − Ī2)² dx' ),
where Ī1 and Ī2 are the average intensities over the window W(x). By subtracting this average intensity, the measure becomes invariant to additive intensity changes I → I + γ.
Dividing by the intensity variances of each window makes the measure invariant to multiplicative changes I → γI.
If we stack the normalized intensity values of the respective windows into one vector, vi ≡ vec(Ii − Īi), then the normalized cross correlation is the cosine of the angle between them:
NCC(h) = cos ∠(v1, v2).
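For discrete windows, the cosine-of-the-angle formulation above can be sketched directly; this is a minimal illustration for two same-size patches, not a full matcher.

```python
import numpy as np

def ncc(P1, P2):
    """Normalized cross correlation of two same-size patches, i.e. the
    cosine of the angle between the mean-subtracted, vectorized patches."""
    v1 = (P1 - P1.mean()).ravel()
    v2 = (P2 - P2.mean()).ravel()
    return (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
```

A quick check of the claimed invariances: ncc(P, a·P + b) = 1 for any a > 0 and any offset b, while a contrast inversion gives −1.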
Special Case: Optimal Affine Transformation
The normalized cross correlation can be used to determine the optimal affine transformation between two given patches. Since the affine transformation is given by:
h(x) = Ax + d,
we need to maximize the cross correlation with respect to the 2 × 2 matrix A and the displacement d:
Â, d̂ = arg max_{A,d} NCC(A, d),
where
NCC(A, d) = ∫_{W(x)} (I1(x') − Ī1)(I2(Ax' + d) − Ī2) dx' / sqrt( ∫_{W(x)} (I1(x') − Ī1)² dx' · ∫_{W(x)} (I2(Ax' + d) − Ī2)² dx' ).
Efficiently finding appropriate optima, however, is a challenge.
Optical Flow Estimation with Deep Neural Networks
There exist numerous algorithms to estimate correspondence across images. Over the last years, neural networks have become popular for estimating correspondence.
Dosovitskiy, Fischer, Ilg, Haeusser, Hazirbas, Golkov, van der Smagt, Cremers and Brox, "FlowNet: Learning Optical Flow with Convolutional Networks", ICCV 2015.
Chapter 5
Reconstruction from Two Views: Linear Algorithms
Multiple View Geometry
Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
updated May 23, 2019 1/27
Overview
1 The Reconstruction Problem
2 The Epipolar Constraint
3 Eight-Point Algorithm
4 Structure Reconstruction
5 Four-Point Algorithm
6 The Uncalibrated Case
Problem Formulation
In the last sections, we discussed how to identify point correspondences between two consecutive frames. In this section, we will tackle the next problem, namely that of reconstructing the 3D geometry of cameras and points.
To this end, we will make the following assumptions:
• We assume that we are given a set of corresponding points in two frames taken with the same camera from different vantage points.
• We assume that the scene is static, i.e. none of the observed 3D points moved during the camera motion.
• We also assume that the intrinsic camera (calibration) parameters are known.
We will first estimate the camera motion from the set of corresponding points. Once we know the relative location and orientation of the cameras, we can reconstruct the 3D location of all corresponding points by triangulation.
Problem Formulation
[Figure: two camera views of a 3D scene]
Goal: Estimate camera motion and 3D scene structure from two views.
The Reconstruction Problem
In general 3D reconstruction is a challenging problem. If we are given two views with 100 feature points in each of them, then we have 200 point coordinates in 2D. The goal is to estimate
• 6 parameters modeling the camera motion R, T and
• 100 × 3 coordinates for the 3D points Xj.
This could be done by minimizing the projection error:
E(R, T, X1, . . . , X100) = Σ_j ( ‖x1^j − π(Xj)‖² + ‖x2^j − π(R, T, Xj)‖² ).
This amounts to a difficult optimization problem called bundle adjustment.
Before we look into this problem, we will first study an elegant solution to entirely get rid of the 3D point coordinates. It leads to the well-known 8-point algorithm.
Epipolar Geometry: Some Notation
[Figure: epipolar geometry of two views]
The projections of a point X onto the two images are denoted by x1 and x2. The optical centers of each camera are denoted by o1 and o2. The intersections of the line (o1, o2) with each image plane are called the epipoles e1 and e2. The intersections between the epipolar plane (o1, o2, X) and the image planes are called epipolar lines l1 and l2. There is one epipolar plane for each 3D point X.
The Epipolar Constraint
We know that x1 (in homogeneous coordinates) is the projection of a 3D point X. Given known camera parameters (K = 1) and no rotation or translation of the first camera, we merely have a projection with unknown depth λ1. From the first to the second frame we additionally have a camera rotation R and translation T followed by a projection. This gives the equations:
λ1 x1 = X,    λ2 x2 = RX + T.
Inserting the first equation into the second, we get:
λ2 x2 = R(λ1 x1) + T.
Now we remove the translation by multiplying with T̂ (T̂v ≡ T × v):
λ2 T̂ x2 = λ1 T̂ R x1.
And projection onto x2 gives the epipolar constraint:
x2^T T̂ R x1 = 0.
The Epipolar Constraint
The epipolar constraint
x2^T T̂ R x1 = 0
provides a relation between the 2D point coordinates of a 3D point in each of the two images and the camera transformation parameters. The original 3D point coordinates have been removed. The matrix
E = T̂ R ∈ R^{3×3}
is called the essential matrix. The epipolar constraint is also known as essential constraint or bilinear constraint.
Geometrically, this constraint states that the three vectors o1X, o2o1 and o2X form a plane, i.e. the triple product of these vectors (measuring the volume of the parallelepiped) is zero: in coordinates of the second frame, Rx1 gives the direction of the vector o1X; T gives the direction of o2o1, and x2 is proportional to the vector o2X, such that
volume = x2^T (T × Rx1) = x2^T T̂ R x1 = 0.
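The derivation above can be checked numerically: for any motion (R, T) and 3D points in front of both cameras, the projections must satisfy x2^T T̂ R x1 = 0 exactly. The particular motion and points in the check below are illustrative choices, not from the slides.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ such that v^ w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def epipolar_residuals(R, T, points):
    """Project 3D points (given in the first camera frame) into both views
    and return the residuals x2^T (T^ R) x1, which must all vanish."""
    E = hat(T) @ R                      # essential matrix E = T^ R
    res = []
    for X in points:
        x1 = X / X[2]                   # lambda_1 x1 = X
        X2 = R @ X + T
        x2 = X2 / X2[2]                 # lambda_2 x2 = R X + T
        res.append(x2 @ E @ x1)
    return res
```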
Properties of the Essential Matrix E
The space of all essential matrices is called the essential space:
E ≡ { T̂ R | R ∈ SO(3), T ∈ R³ } ⊂ R^{3×3}.
Theorem [Huang & Faugeras, 1989] (Characterization of the essential matrix): A nonzero matrix E ∈ R^{3×3} is an essential matrix if and only if E has a singular value decomposition (SVD) E = UΣV^T with
Σ = diag{σ, σ, 0}
for some σ > 0 and U, V ∈ SO(3).
Theorem (Pose recovery from the essential matrix): There exist exactly two relative poses (R, T) with R ∈ SO(3) and T ∈ R³ corresponding to an essential matrix E ∈ E. For E = UΣV^T we have:
(T̂1, R1) = (U RZ(+π/2) Σ U^T, U RZ^T(+π/2) V^T),    (1)
(T̂2, R2) = (U RZ(−π/2) Σ U^T, U RZ^T(−π/2) V^T).    (2)
In general, only one of these gives meaningful (positive) depth values.
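The pose-recovery formulas (1) and (2) translate directly into code. This is a sketch: forcing det(U) = det(V) = +1 is an implementation detail, so the returned pair may correspond to −E rather than E; in either case T̂ is skew-symmetric, R is a rotation, and T̂R reconstructs ±E.

```python
import numpy as np

def poses_from_essential(E):
    """The two pose candidates (T^, R) of the theorem above, from the SVD
    E = U Sigma V^T. Returns [(T^1, R1), (T^2, R2)]."""
    U, S, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U        # force U, V into SO(3)
    if np.linalg.det(Vt) < 0: Vt = -Vt
    Sigma = np.diag([S[0], S[1], 0.0])
    def Rz(a):                              # rotation about z by angle a
        return np.array([[np.cos(a), -np.sin(a), 0.0],
                         [np.sin(a), np.cos(a), 0.0],
                         [0.0, 0.0, 1.0]])
    return [(U @ Rz(s) @ Sigma @ U.T,       # T^ = U Rz(+-pi/2) Sigma U^T
             U @ Rz(s).T @ Vt)              # R  = U Rz^T(+-pi/2) V^T
            for s in (np.pi / 2, -np.pi / 2)]
```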
A Basic Reconstruction Algorithm
We have seen that the 2D coordinates of each 3D point are coupled to the camera parameters R and T through an epipolar constraint. In the following, we will derive a 3D reconstruction algorithm which proceeds as follows:
• Recover the essential matrix E from the epipolar constraints associated with a set of point pairs.
• Extract the relative translation and rotation from the essential matrix E.
In general, the matrix E recovered from a set of epipolar constraints will not be an essential matrix. One can resolve this problem in two ways:
1 Recover some matrix E ∈ R^{3×3} from the epipolar constraints and then project it onto the essential space.
2 Optimize the epipolar constraints in the essential space.
While the second approach is in principle more accurate, it involves a nonlinear constrained optimization. We will pursue the first approach, which is simpler and faster.
The Eight-Point Linear Algorithm
First we rewrite the epipolar constraint as a scalar product in the elements of the matrix E and the coordinates of the points x1 and x2. Let
E^s = (e11, e21, e31, e12, e22, e32, e13, e23, e33)^T ∈ R⁹
be the vector of elements of E and
a ≡ x1 ⊗ x2
the Kronecker product of the vectors xi = (xi, yi, zi)^T, defined as
a = (x1x2, x1y2, x1z2, y1x2, y1y2, y1z2, z1x2, z1y2, z1z2)^T ∈ R⁹.
Then the epipolar constraint can be written as:
x2^T E x1 = a^T E^s = 0.
For n point pairs, we can combine this into the linear system:
χ E^s = 0,    with χ = (a1, a2, . . . , an)^T.
The Eight-Point Linear Algorithm
According to
χ E^s = 0,    with χ = (a1, a2, . . . , an)^T,
we see that the vector of coefficients of the essential matrix E defines the null space of the matrix χ. In order for the above system to have a unique solution (up to a scaling factor and ruling out the trivial solution E = 0), the rank of the matrix χ needs to be exactly 8. Therefore we need at least 8 point pairs.
In certain degenerate cases, the solution for the essential matrix is not unique even if we have 8 or more point pairs. One such example is the case that all points lie on a line or on a plane.
Clearly, we will not be able to recover the sign of E. Since with each E there are two possible assignments of rotation R and translation T, we therefore end up with four possible solutions for rotation and translation.
Projection onto Essential Space
The numerically estimated coefficients E^s will in general not correspond to an essential matrix. One can resolve this problem by projecting it back to the essential space.
Theorem (Projection onto essential space): Let F ∈ R^{3×3} be an arbitrary matrix with SVD F = U diag{λ1, λ2, λ3} V^T, λ1 ≥ λ2 ≥ λ3. Then the essential matrix E which minimizes the Frobenius norm ‖F − E‖²_F is given by
E = U diag{σ, σ, 0} V^T,    with σ = (λ1 + λ2) / 2.
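The projection theorem is one line per step in code; a minimal sketch:

```python
import numpy as np

def project_to_essential(F):
    """Project an arbitrary 3x3 matrix F onto the essential space in the
    Frobenius norm, per the theorem above: replace the singular values
    (lambda1, lambda2, lambda3) by (sigma, sigma, 0) with
    sigma = (lambda1 + lambda2) / 2."""
    U, S, Vt = np.linalg.svd(F)          # S is sorted in decreasing order
    sigma = (S[0] + S[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt
```

The result has the characteristic spectrum (σ, σ, 0), and a matrix already in the essential space is mapped to itself.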
Eight Point Algorithm (Longuet-Higgins '81)
Given a set of n = 8 or more point pairs x1^i, x2^i:
• Compute an approximation of the essential matrix. Construct the matrix χ = (a1, a2, . . . , an)^T, where ai = x1^i ⊗ x2^i. Find the vector E^s ∈ R⁹ which minimizes ‖χE^s‖ as the ninth column of Vχ in the SVD χ = Uχ Σχ Vχ^T. Unstack E^s into a 3 × 3 matrix E.
• Project onto essential space. Compute the SVD E = U diag{σ1, σ2, σ3} V^T. Since in the reconstruction E is only defined up to a scalar, we project E onto the normalized essential space by replacing the singular values σ1, σ2, σ3 with 1, 1, 0.
• Recover the displacement from the essential matrix. The four possible solutions for rotation and translation are:
R = U RZ^T(±π/2) V^T,    T̂ = U RZ(±π/2) Σ U^T,
with a rotation by ±π/2 around z:
RZ^T(±π/2) = [ 0  ±1  0 ; ∓1  0  0 ; 0  0  1 ].
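The first two steps of the algorithm can be sketched as follows. This is an illustration only (the displacement-recovery step is omitted), and the column-major unstacking of E^s is the convention fixed by the definition of E^s above.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear eight-point step (Longuet-Higgins '81), a sketch.

    x1, x2: (n, 3) arrays of corresponding points in homogeneous
    (calibrated) coordinates, n >= 8. Returns E (up to sign), projected
    onto the normalized essential space.
    """
    # Each row of chi is a_i = x1_i (Kronecker) x2_i, so that a^T E^s = 0
    chi = np.stack([np.kron(p1, p2) for p1, p2 in zip(x1, x2)])
    # E^s = right singular vector of chi with smallest singular value
    _, _, Vt = np.linalg.svd(chi)
    E = Vt[-1].reshape(3, 3).T        # unstack column-major (E^s stacks columns)
    # Project onto the normalized essential space: singular values -> 1, 1, 0
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

On exact synthetic correspondences, the recovered E satisfies every epipolar constraint x2^T E x1 = 0 up to floating-point error.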
Do We Need Eight Points?
The above reasoning showed that we need at least eight points in order for the matrix χ to have rank 8 and therefore guarantee a unique solution for E. Yet, one can take into account the special structure of E. The space of essential matrices is actually a five-dimensional space, i.e. E only has 5 (and not 9) degrees of freedom.
A simple way to take into account the algebraic properties of E is to make use of the fact that det E = 0. If we now have only 7 point pairs, the null space of χ will have (at least) dimension 2, spanned by two vectors E1 and E2. Then we can solve for E by determining α such that:
det E = det(E1 + αE2) = 0.
Along similar lines, Kruppa proved in 1913 that one needs only five point pairs to recover (R, T). In the case of degenerate motion (for example planar or circular motion), one can resolve the problem with even fewer point pairs.
Limitations and Further Extensions
Among the four possible solutions for R and T, there is generally only one meaningful one (which assigns positive depth to all points).
The algorithm fails if the translation is exactly 0, since then E = 0 and nothing can be recovered. Due to noise this typically does not happen.
In the case of infinitesimal viewpoint change, one can adapt the eight-point algorithm to the continuous motion case, where the epipolar constraint is replaced by the continuous epipolar constraint. Rather than recovering (R, T), one recovers the linear and angular velocity of the camera.
In the case of independently moving objects, one can generalize the epipolar constraint. For two motions, for example, we have:
(x2^T E1 x1)(x2^T E2 x1) = 0
with two essential matrices E1 and E2. Given a sufficiently large number of point pairs, one can solve the respective equations for multiple essential matrices using polynomial factorization.
Structure Reconstruction
The linear eight-point algorithm allowed us to estimate the camera transformation parameters R and T from a set of corresponding point pairs. Yet, the essential matrix E and hence the translation T are only defined up to an arbitrary scale γ ∈ R⁺, with ‖E‖ = ‖T‖ = 1. After recovering R and T, we therefore have for point X^j:
λ2^j x2^j = λ1^j R x1^j + γ T,    j = 1, . . . , n,
with unknown scale parameters λi^j. We can eliminate one of these scales by applying x̂2^j:
λ1^j x̂2^j R x1^j + γ x̂2^j T = 0,    j = 1, . . . , n.
This corresponds to n linear systems of the form
( x̂2^j R x1^j ,  x̂2^j T ) (λ1^j, γ)^T = 0,    j = 1, . . . , n.
Structure Reconstruction
Combining the parameters λ = (λ1^1, λ1^2, . . . , λ1^n, γ)^T ∈ R^{n+1}, we get the linear equation system
M λ = 0
with
M ≡ [ x̂2^1 R x1^1   0   · · ·   0   x̂2^1 T ;
      0   x̂2^2 R x1^2   · · ·   0   x̂2^2 T ;
      · · · ;
      0   0   · · ·   x̂2^n R x1^n   x̂2^n T ],
i.e. M is block-diagonal in the columns x̂2^j R x1^j ∈ R³, with a last column stacking the vectors x̂2^j T.
The linear least squares estimate for λ is given by the eigenvector corresponding to the smallest eigenvalue of M^T M. It is only defined up to a global scale: this reflects the ambiguity that if the camera had moved twice the distance, and the scene were twice as large and twice as far away, the images would be the same.
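The depth-recovery step above can be sketched directly: build M, take the eigenvector of M^T M for the smallest eigenvalue, and normalize. The normalization γ = 1 at the end is an illustrative choice of the global scale.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ such that v^ w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def recover_depths(R, T, x1, x2):
    """Recover the depths lambda_1^j and the scale gamma from M lambda = 0,
    via the eigenvector of M^T M with smallest eigenvalue.
    x1, x2: (n, 3) homogeneous coordinates. A sketch of the linear step."""
    n = len(x1)
    M = np.zeros((3 * n, n + 1))
    for j in range(n):
        M[3 * j:3 * j + 3, j] = hat(x2[j]) @ R @ x1[j]   # coefficient of lambda_1^j
        M[3 * j:3 * j + 3, -1] = hat(x2[j]) @ T          # coefficient of gamma
    w, V = np.linalg.eigh(M.T @ M)
    lam = V[:, 0]                                        # smallest eigenvalue
    return lam / lam[-1]                                 # fix the scale: gamma = 1
```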
Example
[Figures: left image, right image; reconstruction (Author: Jana Košecká)]
Degenerate Configurations
The eight-point algorithm only provides unique solutions (up to a scalar factor) if all 3D points are in a "general position". This is no longer the case for certain degenerate configurations, for which all points lie on certain 2D surfaces which are called critical surfaces.
Typically these critical surfaces are described by a quadratic equation in the three point coordinates, such that they are referred to as quadratic surfaces.
While most critical configurations do not actually arise in practice, a specific degenerate configuration which does arise often is the case that all points lie on a 2D plane (such as floors, tables, walls, ...).
For the structure-from-motion problem in the context of points on a plane, one can exploit additional constraints, which leads to the so-called four-point algorithm.
Planar Homographies
Let us assume that all points lie on a plane. If X_1 ∈ R^3 denotes the point coordinates in the first frame, and these lie on a plane with normal N ∈ S^2, then we have:

N^T X_1 = d   ⟺   (1/d) N^T X_1 = 1.

In frame two, we therefore have the coordinates:

X_2 = R X_1 + T = R X_1 + T (1/d) N^T X_1 = (R + (1/d) T N^T) X_1 ≡ H X_1,

where

H = R + (1/d) T N^T ∈ R^{3×3}

is called a homography matrix. Inserting the 2D coordinates, we get:

λ_2 x_2 = H λ_1 x_1   ⟺   x_2 ∼ H x_1,

where ∼ means equality up to scaling. This expression is called a planar homography. H depends on camera and plane parameters.
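A minimal numerical sanity check of this construction (all values for R, T, N, d and the point are made up for illustration):

```python
import numpy as np

# Example rotation about the z-axis, translation, and plane parameters
a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1]])
T = np.array([0.2, 0.0, 0.1])
N = np.array([0.0, 0.0, 1.0])    # unit plane normal
d = 5.0                          # plane offset: N^T X_1 = d

X1 = np.array([1.0, -0.5, 5.0])  # a point on the plane
assert np.isclose(N @ X1, d)

H = R + np.outer(T, N) / d       # homography matrix H = R + (1/d) T N^T
X2 = R @ X1 + T                  # same point in the second frame

# For points on the plane, H maps X1 exactly to X2 ...
assert np.allclose(H @ X1, X2)

# ... and the projections satisfy x2 ~ H x1 (equality up to scaling),
# i.e. x2 and H x1 are parallel:
x1, x2 = X1 / X1[2], X2 / X2[2]
assert np.allclose(np.cross(x2, H @ x1), 0, atol=1e-12)
```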
From Point Pairs to Homography
For a pair of corresponding 2D points we therefore have

λ_2 x_2 = H λ_1 x_1.

By multiplying with x̂_2 we can eliminate λ_2 and obtain:

x̂_2 H x_1 = 0.

This equation is called the planar epipolar constraint or planar homography constraint.

Again, we can cast this equation into the form

a^T H^s = 0,

where we have stacked the elements of H into a vector

H^s = (H_11, H_21, ..., H_33)^T ∈ R^9,

and introduced the matrix

a ≡ x_1 ⊗ x̂_2 ∈ R^{9×3}.
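The stacked form can be checked numerically; the identity below holds for arbitrary H, x_1, x_2 (the sign depends on the stacking convention — here column-wise stacking of H is assumed):

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ w = np.cross(v, w)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 3))
x1, x2 = rng.standard_normal(3), rng.standard_normal(3)

Hs = H.flatten(order='F')                  # H^s = (H11, H21, ..., H33)^T
a = np.kron(x1.reshape(3, 1), hat(x2))     # a = x_1 ⊗ x̂_2 ∈ R^{9×3}

# a^T H^s equals -(x̂_2 H x_1); both sides vanish for a true correspondence
assert np.allclose(a.T @ Hs, -hat(x2) @ H @ x1)
```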
The Four Point Algorithm
Let us now assume we have n ≥ 4 pairs of corresponding 2D points {x_1^j, x_2^j}, j = 1, ..., n, in the two images. Each point pair induces a matrix a^j; we stack these into a larger matrix

χ ≡ (a^1, ..., a^n)^T ∈ R^{3n×9},

and obtain the system

χ H^s = 0.

As in the case of the essential matrix, the homography matrix can only be estimated up to a scale factor.
This gives rise to the four-point algorithm:
• For the point pairs, compute the matrix χ.
• Compute a solution H^s for the above equation by singular value decomposition of χ.
• Extract the motion parameters from the homography matrix H = R + (1/d) T N^T.
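A sketch of the linear estimation step on synthetic data (the scene parameters R, T, N, d and the four sample points are made up; decomposing H into R, N and T/d is a further step not shown here):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1]])
T = np.array([0.3, -0.1, 0.2])
N = np.array([0.0, 0.0, 1.0])
d = 5.0
H_true = R + np.outer(T, N) / d

# Four points on the plane N^T X = d, projected into both views
pts = [np.array([x, y, d]) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]]
pairs = [(X / X[2], (R @ X + T) / (R @ X + T)[2]) for X in pts]

# Stack the constraints a_j^T H^s = 0 into chi H^s = 0 and solve by SVD
chi = np.vstack([np.kron(x1.reshape(3, 1), hat(x2)).T for x1, x2 in pairs])
_, _, Vt = np.linalg.svd(chi)
H_est = Vt[-1].reshape(3, 3, order='F')   # undo the column-wise stacking

# H is only determined up to scale (and sign); normalize before comparing
H_est *= np.sign(H_est[0, 0]) * np.linalg.norm(H_true) / np.linalg.norm(H_est)
assert np.allclose(H_est, H_true, atol=1e-6)
```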
General Comments
Clearly, the derivation of the four-point algorithm is in close
analogy to that of the eight-point algorithm.
Rather than estimating the essential matrix E one estimates
the homography matrix H to derive R and T . In the four-point
algorithm, the homography matrix is decomposed into R, N
and T /d. In other words, one can reconstruct the normal of the
plane, but the translation is only obtained in units of the offset d
of the plane and the origin.
The 3D structure of the points can then be computed in the
same manner as before.
Since one uses the strong constraint that all points lie in a
plane, the four-point algorithm only requires four
correspondences.
There exist numerous relations between the essential matrix E = T̂ R and the corresponding homography matrix H = R + T u^T with some u ∈ R^3, in particular:

E = T̂ H,   H^T E + E^T H = 0.
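Both relations are easy to verify numerically (R, T, u are made-up example values):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

a = 0.3
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
T = np.array([0.5, 0.1, -0.2])
u = np.array([0.2, -0.3, 0.7])   # e.g. u = N/d for a planar scene

E = hat(T) @ R
H = R + np.outer(T, u)

# T̂ T u^T = 0, hence T̂ H = T̂ R = E
assert np.allclose(E, hat(T) @ H)
# H^T T̂ H is skew-symmetric since T̂^T = -T̂, so H^T E + E^T H = 0
assert np.allclose(H.T @ E + E.T @ H, 0)
```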
The Case of an Uncalibrated Camera
The reconstruction algorithms introduced above all assume that the camera is calibrated (K = I). The general transformation from a 3D point to the image is given by:

λ x′ = K Π_0 g X = (KR, KT) X,
with the intrinsic parameter matrix or calibration matrix:

     ⎛ f s_x   f s_θ   o_x ⎞
K =  ⎜   0     f s_y   o_y ⎟ ∈ R^{3×3}.
     ⎝   0       0      1  ⎠
The calibration matrix maps metric coordinates into image (pixel) coordinates, using the focal length f, the optical center o_x, o_y, the pixel sizes s_x, s_y and a skew factor s_θ. If these parameters are known, one can simply transform the pixel coordinates x′ to normalized coordinates x = K^{-1} x′ to obtain the representation used in the previous sections. This amounts to centering the coordinates with respect to the optical center, rescaling them by the focal length, and so on.
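For illustration, with made-up intrinsic parameters:

```python
import numpy as np

# Assumed example values for the intrinsics
f, sx, sy, s_theta, ox, oy = 500.0, 1.0, 1.0, 0.0, 320.0, 240.0
K = np.array([[f * sx, f * s_theta, ox],
              [0.0,    f * sy,      oy],
              [0.0,    0.0,         1.0]])

x = np.array([0.1, -0.05, 1.0])   # normalized (metric) coordinates
x_pix = K @ x                     # pixel coordinates (370, 215, 1)

# K^{-1} undoes the mapping, recovering the normalized coordinates
assert np.allclose(np.linalg.solve(K, x_pix), x)
```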
The Fundamental Matrix
If the camera parameters K cannot be estimated in a
calibration procedure beforehand, then one has to deal with
reconstruction from uncalibrated views.
By transforming all image coordinates x′ with the inverse calibration matrix K^{-1} into metric coordinates x, we obtain the epipolar constraint for uncalibrated cameras:

x_2^T T̂ R x_1 = 0   ⟺   x′_2^T K^{-T} T̂ R K^{-1} x′_1 = 0,

which can be written as

x′_2^T F x′_1 = 0,

with the fundamental matrix defined as:

F ≡ K^{-T} T̂ R K^{-1} = K^{-T} E K^{-1}.
Since multiplication by the invertible matrix K does not affect the rank, we know that F has an SVD F = U Σ V^T with Σ = diag(σ_1, σ_2, 0). In fact, any matrix of rank 2 can be a fundamental matrix.
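A small numerical check with made-up K, R, T:

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

a = 0.2
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1]])
T = np.array([1.0, 0.0, 0.1])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

E = hat(T) @ R
Kinv = np.linalg.inv(K)
F = Kinv.T @ E @ Kinv

# F inherits rank 2 from E: the last singular value is (numerically) zero
s = np.linalg.svd(F, compute_uv=False)
assert s[2] / s[0] < 1e-10

# Epipolar constraint in pixel coordinates for a sample 3D point
X1 = np.array([0.5, -0.3, 4.0])
X2 = R @ X1 + T
xp1, xp2 = K @ (X1 / X1[2]), K @ (X2 / X2[2])
assert abs(xp2 @ F @ xp1) < 1e-8
```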
Limitations
While it is straightforward to extend the eight-point algorithm such that one can extract a fundamental matrix from a set of corresponding image points, it is less straightforward how to proceed from there.
Firstly, one cannot impose a strong constraint on the specific
structure of the fundamental matrix (apart from the fact that the
last singular value is zero).
Secondly, for a given fundamental matrix F , there does not
exist a finite number of decompositions into extrinsic
parameters R, T and intrinsic parameters K (even apart from
the global scale factor).
As a consequence, one can only determine so-called
projective reconstructions, i.e. reconstructions of geometry and
camera position which are defined up to a so-called projective
transformation.
As a solution, one typically chooses a canonical reconstruction from the family of possible reconstructions.
Chapter 6
Reconstruction from Multiple Views
Multiple View Geometry
Summer 2019

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
updated May 23, 2019 1/43
Overview
1 From Two Views to Multiple Views
2 Preimage & Coimage from Multiple Views
3 From Preimages to Rank Constraints
4 Geometric Interpretation
5 The Multiple-view Matrix
6 Relation to Epipolar Constraints
7 Multiple-View Reconstruction Algorithms
8 Multiple-View Reconstruction of Lines
Multiple-View Geometry
In this section, we deal with the problem of 3D reconstruction
given multiple views of a static scene, either obtained
simultaneously, or sequentially from a moving camera.
The key idea is that the three-view scenario allows one to obtain more measurements for inferring the same number of 3D coordinates. For example, given two views of a single 3D point, we have four measurements (the x- and y-coordinates in each view), while the three-view case provides six measurements per point correspondence. As a consequence, the estimation of motion and structure will generally be more constrained when resorting to additional views.
The three-view case has traditionally been addressed by the
so-called trifocal tensor [Hartley ’95, Vieville ’93] which
generalizes the fundamental matrix. This tensor – as the
fundamental matrix – does not depend on the scene structure
but rather on the inter-frame camera motion. It captures a
trilinear relationship between three views of the same 3D point
or line [Liu, Huang ’86, Spetsakis, Aloimonos ’87].
Trifocal Tensor versus Multiview Matrices
Traditionally the trilinear relations were captured by
generalizing the concept of the Fundamental Matrix to that of a
Trifocal Tensor. It was developed among others by [Liu and
Huang ’86], [Spetsakis, Aloimonos ’87]. The use of tensors
was promoted by [Vieville ’93] and [Hartley ’95]. Bilinear,
trilinear and quadrilinear constraints were formulated in [Triggs
’95]. This line of work is summarized in the books:
Faugeras and Luong, “The Geometry of Multiple Views”, 2001
and
Hartley and Zisserman, “Multiple View Geometry”, 2001, 2003.
In the following, however, we stick with a matrix notation for the
multiview scenario. This approach makes use of matrices and
rank constraints on these matrices to impose the constraints
from multiple views. Such rank constraints were used by many
authors, among others in [Triggs ’95] and in [Heyden, Åström
’97]. This line of work is summarized in the book
Ma, Soatto, Kosecka, Sastry, “An Invitation to 3D Vision”, 2004.
Preimage from Multiple Views
A preimage of multiple images of a point or a line is the
(largest) set of 3D points that gives rise to the same set of
multiple images of the point or the line.
For example, given the two images `1 and `2 of a line L, the
preimage of these two images is the intersection of the planes
P1 and P2 , i.e. exactly the 3D line L = P1 ∩ P2 .
In general, the preimage of multiple images of points and lines
can be defined by the intersection:
preimage(x 1 , . . . , x m ) = preimage(x 1 ) ∩ · · · ∩ preimage(x m ),
preimage(`1 , . . . , `m ) = preimage(`1 ) ∩ · · · ∩ preimage(`m ).
The above definition allows us to compute preimages for any
set of image points or lines. The preimage of multiple image
lines, for example, can be either an empty set, a point, a line or
a plane, depending on whether or not they come from the
same line in space.
Preimage and Coimage of Points and Lines
Images of a point p on a line L:
• Preimages P_1 and P_2 of the image lines should intersect in the line L.
• Preimages of the two image points x_1 and x_2 should intersect in the point p.
• Normals ℓ_1 and ℓ_2 define the coimages of the line L.
Preimage and Coimage of Points and Lines
For a moving camera at time t, let x(t) denote the image
coordinates of a 3D point X in homogeneous coordinates:
λ(t)x(t) = K (t)Π0 g(t)X ,
where λ(t) denotes the depth of the point, K (t) denotes the
intrinsic parameters, Π0 the generic projection, and
       ⎛ R(t)  T(t) ⎞
g(t) = ⎝  0     1   ⎠ ∈ SE(3)

denotes the rigid body motion at time t.
Let us consider a 3D line L in homogeneous coordinates:

L = {X | X = X_0 + µ V, µ ∈ R} ⊂ R^4,

where X_0 = [X_0, Y_0, Z_0, 1]^T ∈ R^4 are the coordinates of the base point p_0 and V = [V_1, V_2, V_3, 0]^T ∈ R^4 is a nonzero vector indicating the line direction.
Preimage and Coimage of Points and Lines
The preimage of L with respect to the image at time t is a plane P with normal ℓ(t), where P = span(ℓ̂(t)). The vector ℓ(t) is orthogonal to all points x(t) of the line:

ℓ(t)^T x(t) = ℓ(t)^T K(t) Π_0 g(t) X = 0.

Assume we are given a set of m images at times t_1, ..., t_m, where

λ_i = λ(t_i),  x_i = x(t_i),  ℓ_i = ℓ(t_i),  Π_i = K(t_i) Π_0 g(t_i).

With this notation, we can relate the i-th image of a point p to its world coordinates X:

λ_i x_i = Π_i X,

and the i-th coimage of a line L to its world coordinates (X_0, V):

ℓ_i^T Π_i X_0 = ℓ_i^T Π_i V = 0.
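This notation can be illustrated with a small sketch (made-up motions, and K = I so that Π_i = [R_i, T_i]):

```python
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

# Three projection matrices Π_i = [R_i, T_i]; the first is [I, 0]
Pis = [np.hstack([rot_z(0.1 * i), np.array([[0.2 * i], [0.0], [0.1 * i]])])
       for i in range(3)]

# Point: λ_i x_i = Π_i X, with λ_i the depth in view i
X = np.array([0.5, -0.2, 4.0, 1.0])
for Pi in Pis:
    v = Pi @ X
    lam, x = v[2], v / v[2]            # depth and image point (z = 1)
    assert np.allclose(lam * x, Pi @ X)

# Line through X0 with direction V; its coimage ℓ_i is the normal of the
# preimage plane, obtainable as a cross product of two image points
X0 = np.array([1.0, 0.0, 5.0, 1.0])
V = np.array([0.0, 1.0, 0.5, 0.0])
for Pi in Pis:
    l = np.cross(Pi @ X0, Pi @ (X0 + 2.0 * V))
    assert np.allclose([l @ (Pi @ X0), l @ (Pi @ V)], 0, atol=1e-12)
```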
From Preimages to Rank Constraints
The above equations contain the 3D parameters of points and
lines as unknowns. As in the two-view case, we wish to
eliminate these unknowns so as to obtain relationships
between the 2D projections and the camera parameters.
In the two-view case, the elimination of the 3D coordinates led to the epipolar constraint for the essential matrix E or (in the uncalibrated case) the fundamental matrix F. The 3D coordinates (the depth values λ_i associated with each point) could subsequently be obtained from another constraint.
There exist different ways to eliminate the 3D parameters
leading to different kinds of constraints which have been
studied in Computer Vision.
A systematic elimination of these 3D parameters will lead to a complete set of conditions.
Point Features
Consider images of a 3D point X seen in multiple views:

        ⎛ x_1   0  ···   0  ⎞ ⎛ λ_1 ⎞   ⎛ Π_1 ⎞
I λ⃗ ≡  ⎜  0   x_2 ···   0  ⎟ ⎜ λ_2 ⎟ = ⎜ Π_2 ⎟ X ≡ Π X,
        ⎜  ⋮    ⋮   ⋱    ⋮  ⎟ ⎜  ⋮  ⎟   ⎜  ⋮  ⎟
        ⎝  0    0  ···  x_m ⎠ ⎝ λ_m ⎠   ⎝ Π_m ⎠

which is of the form

I λ⃗ = Π X,

where λ⃗ ∈ R^m is the depth scale vector, and Π ∈ R^{3m×4} the multiple-view projection matrix associated with the image matrix I ∈ R^{3m×m}.
Note that apart from the 2D coordinates in I, everything else in the above equation is unknown. As in the two-view case, the goal is to decouple the above equation into constraints which allow us to separately recover the camera displacements Π_i on the one hand and the scene structure λ_i and X on the other.
Point Features
Every column of I lies in a four-dimensional space spanned by
columns of the matrix Π. In order to have a solution to the
above equation, the columns of I and Π must therefore be
linearly dependent. In other words, the matrix
                 ⎛ Π_1  x_1   0  ···   0  ⎞
N_p ≡ (Π, I) =  ⎜ Π_2   0   x_2 ···   0  ⎟ ∈ R^{3m×(m+4)}
                 ⎜  ⋮    ⋮    ⋮   ⋱    ⋮  ⎟
                 ⎝ Π_m   0    0  ···  x_m ⎠
must have a nontrivial right null space. For m ≥ 2 (i.e. 3m ≥ m + 4), full rank would be m + 4. Linear dependence of the columns therefore implies the rank constraint:

rank(N_p) ≤ m + 3.
In fact, the vector u ≡ (X^T, −λ⃗^T)^T ∈ R^{m+4} lies in the right null space, as N_p u = 0.
Point Features
For a more compact formulation of the above rank constraint, we introduce the matrix

        ⎛ x̂_1   0  ···   0  ⎞
I^⊥ ≡  ⎜  0   x̂_2 ···   0  ⎟ ∈ R^{3m×3m},
        ⎜  ⋮    ⋮   ⋱    ⋮  ⎟
        ⎝  0    0  ···  x̂_m ⎠

which has the property of "annihilating" I: I^⊥ I = 0. Premultiplying the equation I λ⃗ = Π X with I^⊥, we obtain

I^⊥ Π X = 0.
Point Features
Thus the vector X is in the null space of the matrix

               ⎛ x̂_1 Π_1 ⎞
W_p ≡ I^⊥ Π = ⎜ x̂_2 Π_2 ⎟ ∈ R^{3m×4}.
               ⎜    ⋮     ⎟
               ⎝ x̂_m Π_m ⎠

To have a nontrivial solution, we must have

rank(W_p) ≤ 3.
If all images x_i are from a single 3D point X, then the matrix W_p should have only a one-dimensional null space. Given m images x_i ∈ R^3 of a point p with respect to m camera frames Π_i, we must have the rank condition
rank(W_p) = rank(N_p) − m ≤ 3.
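The rank constraints can be verified numerically on synthetic data (camera motions and the 3D point are made up):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

m = 3
Pis = [np.hstack([rot_z(0.2 * i), np.array([[0.3 * i], [0.1 * i], [0.0]])])
       for i in range(m)]
X = np.array([0.4, -0.1, 5.0, 1.0])            # point in general position
lams = [(Pi @ X)[2] for Pi in Pis]             # depths λ_i
xs = [(Pi @ X) / (Pi @ X)[2] for Pi in Pis]    # images x_i

# W_p = stack of x̂_i Π_i;  N_p = (Π, I) with block-diagonal image matrix I
Wp = np.vstack([hat(x) @ Pi for x, Pi in zip(xs, Pis)])
I_blk = np.zeros((3 * m, m))
for i, x in enumerate(xs):
    I_blk[3 * i:3 * i + 3, i] = x
Np = np.hstack([np.vstack(Pis), I_blk])

assert np.linalg.matrix_rank(Wp) == 3                    # rank(W_p) <= 3
assert np.allclose(Wp @ X, 0, atol=1e-10)                # X spans the null space
assert np.linalg.matrix_rank(Np) == m + 3                # rank(N_p) <= m + 3
assert np.allclose(Np @ np.concatenate([X, -np.array(lams)]), 0, atol=1e-10)
```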
Line Features
We can derive a similar rank constraint for lines. As we saw above, for the coimages ℓ_i, i = 1, ..., m, of a line L spanned by a base X_0 and a direction V we have:

ℓ_i^T Π_i X_0 = ℓ_i^T Π_i V = 0.

Therefore the matrix

        ⎛ ℓ_1^T Π_1 ⎞
W_l ≡  ⎜ ℓ_2^T Π_2 ⎟ ∈ R^{m×4}
        ⎜     ⋮     ⎟
        ⎝ ℓ_m^T Π_m ⎠

must satisfy the rank constraint
rank(W_l) ≤ 2,

since the null space of W_l contains the two vectors X_0 and V.
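Analogously for lines (made-up motions; each coimage ℓ_i is computed as the cross product of two image points on the line):

```python
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

m = 4
Pis = [np.hstack([rot_z(0.15 * i),
                  np.array([[0.2 * i], [-0.1 * i], [0.05 * i]])])
       for i in range(m)]
X0 = np.array([1.0, 0.5, 6.0, 1.0])   # base point of the line
V = np.array([0.2, 1.0, 0.1, 0.0])    # direction of the line

ls = [np.cross(Pi @ X0, Pi @ (X0 + V)) for Pi in Pis]   # coimages ℓ_i
Wl = np.vstack([l @ Pi for l, Pi in zip(ls, Pis)])      # shape (m, 4)

assert np.linalg.matrix_rank(Wl) == 2
assert np.allclose(Wl @ X0, 0, atol=1e-10)   # X_0 and V span the null space
assert np.allclose(Wl @ V, 0, atol=1e-10)
```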
Rank Constraints: Geometric Interpretation
In the case of a point X, we had the equation

                       ⎛ x̂_1 Π_1 ⎞
W_p X = 0,  with W_p = ⎜ x̂_2 Π_2 ⎟ ∈ R^{3m×4}.
                       ⎜    ⋮     ⎟
                       ⎝ x̂_m Π_m ⎠
Since all matrices x̂_i have rank 2, the number of independent rows in W_p is at most 2m. These rows define a set of 2m planes. Since W_p X = 0, the point X is in the intersection of all these planes. In order for the 2m planes to have a unique intersection, we need to have rank(W_p) = 3.
Rank Constraints: Geometric Interpretation
[Figure: preimage of two image points]

The rows of the matrix W_p correspond to the normal vectors of four planes. The (nontrivial) rank constraint states that these four planes have to intersect in a single point.
Rank Constraints: Geometric Interpretation
In the case of a line L in two views, we have the equation

                           ⎛ ℓ_1^T Π_1 ⎞
rank(W_l) ≤ 2,  with W_l = ⎝ ℓ_2^T Π_2 ⎠ ∈ R^{2×4}.
Clearly, a 2×4 matrix always satisfies rank(W_l) ≤ 2, so there is no intrinsic constraint on two images of a line: the preimage of two image lines always contains a line. We shall see that this is no longer true for three or more images of a line; only then does the above constraint become meaningful.
Rank Constraints: Geometric Interpretation
[Figure: preimage of two image lines]

For the case of a line observed in two images, the rank constraint is always fulfilled. Geometrically, this states that the two preimage planes always intersect in some 3D line.
The Multiple-view Matrix of a Point
In the following, the rank constraints will be rewritten in a more
compact and transparent manner. Let us assume we have m
images, the first of which is in world coordinates. Then we have
projection matrices of the form
Π_1 = [I, 0],  Π_2 = [R_2, T_2],  ...,  Π_m = [R_m, T_m] ∈ R^{3×4},
which model the projection of a point X into the individual
images.
In general, for uncalibrated cameras (i.e. K_i ≠ I), R_i will not be an orthogonal rotation matrix but rather an arbitrary invertible matrix.
Again, we define the matrix W_p as follows:

               ⎛ x̂_1 Π_1 ⎞
W_p ≡ I^⊥ Π = ⎜ x̂_2 Π_2 ⎟ ∈ R^{3m×4}.
               ⎜    ⋮     ⎟
               ⎝ x̂_m Π_m ⎠
The Multiple-view Matrix of a Point
The rank of the matrix W_p is not affected if we multiply by a full-rank matrix D_p ∈ R^{4×5} as follows:

           ⎛ x̂_1 Π_1 ⎞                    ⎛ x̂_1 x_1      x̂_1 x̂_1      0      ⎞
W_p D_p = ⎜ x̂_2 Π_2 ⎟ ⎛ x_1  x̂_1  0 ⎞ = ⎜ x̂_2 R_2 x_1  x̂_2 R_2 x̂_1  x̂_2 T_2 ⎟
           ⎜    ⋮     ⎟ ⎝  0    0   1 ⎠   ⎜      ⋮            ⋮           ⋮     ⎟
           ⎝ x̂_m Π_m ⎠                    ⎝ x̂_m R_m x_1  x̂_m R_m x̂_1  x̂_m T_m ⎠

This means that rank(W_p) ≤ 3 if and only if the submatrix

       ⎛ x̂_2 R_2 x_1   x̂_2 T_2 ⎞
M_p ≡ ⎜ x̂_3 R_3 x_1   x̂_3 T_3 ⎟ ∈ R^{3(m−1)×2}
       ⎜      ⋮            ⋮    ⎟
       ⎝ x̂_m R_m x_1   x̂_m T_m ⎠

has rank(M_p) ≤ 1.
The Multiple-view Matrix of a Point
The matrix

       ⎛ x̂_2 R_2 x_1   x̂_2 T_2 ⎞
M_p ≡ ⎜ x̂_3 R_3 x_1   x̂_3 T_3 ⎟ ∈ R^{3(m−1)×2}
       ⎜      ⋮            ⋮    ⎟
       ⎝ x̂_m R_m x_1   x̂_m T_m ⎠

is called the multiple-view matrix associated with a point p. It involves both the image x_1 in the first view and the coimages x̂_i in the remaining views.
In summary:
For multiple images of a point p, the matrices N_p, W_p and M_p satisfy:
rank(M_p) = rank(W_p) − 2 = rank(N_p) − (m + 2) ≤ 1.
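A quick numerical illustration of these rank relations (made-up motions):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

m = 4
Rs = [rot_z(0.2 * i) for i in range(m)]                     # R_1 = I
Ts = [np.array([0.3 * i, 0.1 * i, 0.0]) for i in range(m)]  # T_1 = 0
X = np.array([0.4, -0.1, 5.0])
xs = [(R @ X + T) / (R @ X + T)[2] for R, T in zip(Rs, Ts)]

Mp = np.vstack([np.column_stack([hat(x) @ R @ xs[0], hat(x) @ T])
                for x, R, T in zip(xs[1:], Rs[1:], Ts[1:])])
Wp = np.vstack([hat(x) @ np.hstack([R, T.reshape(3, 1)])
                for x, R, T in zip(xs, Rs, Ts)])

assert np.linalg.matrix_rank(Mp) == 1
assert np.linalg.matrix_rank(Wp) == np.linalg.matrix_rank(Mp) + 2
```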
Multiview Matrix: Geometric Interpretation
Let us look into the geometric information contained in the multiple-view matrix

       ⎛ x̂_2 R_2 x_1   x̂_2 T_2 ⎞
M_p ≡ ⎜ x̂_3 R_3 x_1   x̂_3 T_3 ⎟ ∈ R^{3(m−1)×2}.
       ⎜      ⋮            ⋮    ⎟
       ⎝ x̂_m R_m x_1   x̂_m T_m ⎠
The constraint rank(M_p) ≤ 1 implies that the two columns are linearly dependent. In fact we have λ_1 x̂_i R_i x_1 + x̂_i T_i = 0, i = 2, ..., m, which yields

     ⎛ λ_1 ⎞
M_p  ⎝  1  ⎠ = 0.
Therefore the coefficient capturing the linear dependence is simply the depth λ_1 of the point p with respect to the first camera. In other words, the multiple-view matrix captures exactly the information about a point p that is missing from a single image, but encoded in multiple images.
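Since M_p (λ_1, 1)^T = 0, the depth λ_1 can be recovered from the two columns of M_p by least squares; a sketch with made-up data (ground-truth depth 5):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

Rs = [rot_z(0.2 * i) for i in range(3)]
Ts = [np.array([0.3 * i, 0.1 * i, 0.0]) for i in range(3)]
X = np.array([0.4, -0.1, 5.0])            # depth in the first view: λ_1 = 5
xs = [(R @ X + T) / (R @ X + T)[2] for R, T in zip(Rs, Ts)]

# First and second column of M_p, stacked over views i = 2, ..., m
a = np.concatenate([hat(x) @ R @ xs[0] for x, R in zip(xs[1:], Rs[1:])])
b = np.concatenate([hat(x) @ T for x, T in zip(xs[1:], Ts[1:])])

lam1 = -(a @ b) / (a @ a)        # least-squares solution of a λ + b = 0
assert np.isclose(lam1, 5.0)
```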
Relation to Epipolar Constraints
For the multiple-view matrix

       ⎛ x̂_2 R_2 x_1   x̂_2 T_2 ⎞
M_p ≡ ⎜ x̂_3 R_3 x_1   x̂_3 T_3 ⎟ ∈ R^{3(m−1)×2}
       ⎜      ⋮            ⋮    ⎟
       ⎝ x̂_m R_m x_1   x̂_m T_m ⎠

to have rank(M_p) = 1, it is necessary that the pairs of vectors x̂_i T_i and x̂_i R_i x_1 be linearly dependent for all i = 2, ..., m. This gives the epipolar constraints

x_i^T T̂_i R_i x_1 = 0

between the first and the i-th image. (Proof: see next slide.)
Yet, we shall see that the multiview constraint provides more
information than the pairwise epipolar constraints.
Relation to Epipolar Constraints
In the previous slide, we claimed that the linear dependence of x̂_i T_i and x̂_i R_i x_1 gives rise to the epipolar constraint x_i^T T̂_i R_i x_1 = 0. In the following, we shall give a proof of this statement which provides an intuitive geometric understanding of this relationship.

Assume the two vectors x̂_i T_i and x̂_i R_i x_1 are dependent, i.e. there is a scalar γ such that

x̂_i T_i = γ x̂_i R_i x_1.
Since x̂_i T_i ≡ x_i × T_i is proportional to the normal of the plane spanned by x_i and T_i, and x̂_i R_i x_1 is proportional to the normal of the plane spanned by x_i and R_i x_1, the linear dependence is equivalent to saying that the three vectors x_i, T_i and R_i x_1 are coplanar.
This again is equivalent to saying that the vector x_i is orthogonal to the normal of the plane spanned by the vectors T_i and R_i x_1, i.e.
x_i^T (T_i × R_i x_1) = x_i^T T̂_i R_i x_1 = 0.
Analysis of the Multiple-view Constraint
For any nonzero vectors a_i, b_i ∈ R^3, i = 1, 2, ..., n, the matrix

⎛ a_1  b_1 ⎞
⎜ a_2  b_2 ⎟ ∈ R^{3n×2}
⎜  ⋮    ⋮  ⎟
⎝ a_n  b_n ⎠

is rank-deficient if and only if a_i b_j^T − b_i a_j^T = 0 for all i, j = 1, ..., n. We will not prove this statement. Applied to the rank constraint on M_p we get:
x̂_i R_i x_1 (x̂_j T_j)^T − x̂_i T_i (x̂_j R_j x_1)^T = 0,

which gives the trilinear constraint

x̂_i (T_i x_1^T R_j^T − R_i x_1 T_j^T) x̂_j = 0.
This is a matrix equation giving 3 × 3 = 9 scalar trilinear
equations, only four of which are linearly independent.
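The trilinear constraint can be checked numerically for a made-up three-view setup:

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

Rs = [rot_z(0.2 * k) for k in range(3)]
Ts = [np.array([0.3 * k, 0.1 * k, 0.05 * k]) for k in range(3)]
X = np.array([0.4, -0.1, 5.0])
xs = [(R @ X + T) / (R @ X + T)[2] for R, T in zip(Rs, Ts)]

x1 = xs[0]
i, j = 1, 2
# x̂_i (T_i x_1^T R_j^T - R_i x_1 T_j^T) x̂_j = 0  (a 3x3 matrix equation)
M = np.outer(Ts[i], Rs[j] @ x1) - np.outer(Rs[i] @ x1, Ts[j])
assert np.allclose(hat(xs[i]) @ M @ hat(xs[j]), 0, atol=1e-10)
```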
Analysis of the Multiple-view Constraint
From the equations

x̂_i R_i x_1 (x̂_j T_j)^T − x̂_i T_i (x̂_j R_j x_1)^T = 0,   ∀ i, j,

we see that as long as the entries in x̂_j T_j and x̂_j R_j x_1 are non-zero, it follows from the above that the two vectors x̂_i R_i x_1 and x̂_i T_i are linearly dependent. If on the other hand x̂_j T_j = x̂_j R_j x_1 = 0 for some view j, then we have the rare degenerate case that the point p lies on the line through the optical centers o_1 and o_j.
In other words: Except for degeneracies, the bilinear (epipolar)
constraints relating two views are already contained in the
trilinear constraints obtained for the multiview scenario.
Note that the equivalence between the bilinear and trilinear constraints on the one hand and the condition rank(M_p) ≤ 1 on the other only holds if the vectors in M_p are nonzero. In certain degenerate cases this is not fulfilled.
Uniqueness of the Preimage
We will now clarify how the bilinear and trilinear constraints help to assure the uniqueness of the preimage of a point observed in three images.

Let x_1, x_2, x_3 ∈ R^3 be the 2D point coordinates in three camera frames with distinct optical centers. If the three images satisfy the pairwise epipolar constraints

x_i^T T̂_ij R_ij x_j = 0,   i, j = 1, 2, 3,
then a unique preimage is determined except if the three lines associated with the image points x_1, x_2, x_3 are coplanar. Here T_ij and R_ij refer to the transformation between frames i and j.
Similarly, if these vectors satisfy all trilinear constraints

x̂_j (T_ji x_i^T R_ki^T − R_ji x_i T_ki^T) x̂_k = 0,   i, j, k = 1, 2, 3,
then a unique preimage is determined unless the three lines associated with the image points x_1, x_2, x_3 are collinear.
We will not prove these statements.
Degeneracies for the Bilinear Constraints
In the above example, the point p lies in the plane spanned by
the three optical centers which is also called the trifocal plane.
In this case, all pairs of lines do intersect, yet it does not imply
a unique 3D point p (a unique preimage). In practice this
degenerate case arises rather seldom.
Degeneracies for the Bilinear Constraints
In the above example, the optical centers lie on a straight line
(rectilinear motion). Again, all pairs of lines may intersect
without there being a unique preimage p.
This case is frequent in applications when the camera moves
in a straight line (e.g. a car on a highway). Then the epipolar
constraints will not allow a unique reconstruction.
Fortunately, the trilinear constraint assures a unique preimage
(unless p is also on the same line with the optical centers).
Uniqueness of the Preimage
Using the multiple-view matrix we obtain a more general and
simpler characterization regarding the uniqueness of the
preimage:
Given m vectors representing the m images of a point in m views, they correspond to the same point in 3D space if the rank of the matrix M_p relative to any of the camera frames is one. If the rank is zero, the point is only determined up to the line on which all the camera centers must lie.
In summary we get:

rank(M_p) = 2 ⇒ no point correspondence & empty preimage
rank(M_p) = 1 ⇒ point correspondence & unique preimage
rank(M_p) = 0 ⇒ point correspondence & preimage not unique
With these constraints we could decide which features to match
for establishing point correspondence over multiple frames.
Multiple-view Factorization of Point Features
The rank condition on the multiple-view matrix captures all the
constraints among multiple images of a point. In principle, one
could perform reconstruction by maximizing some global
objective function subject to the rank condition. This would
lead to a nonlinear optimization problem analogous to the
bundle adjustment in the two-view case.
Alternatively, one can aim for a similar separation of structure
and motion as done for the two-view case in the eight-point
algorithm. Such an algorithm shall be detailed in the following.
One should point out that this approach does not necessarily
lead to a practical algorithm as the spectral approaches do not
imply optimality in the context of noise and uncertainty.
Multiple-view Factorization of Point Features
Suppose we have m images x_1^j, …, x_m^j of each of n points p^j, and we want to estimate the unknown projection matrices Π_i = (R_i, T_i).

The condition rank(M_p) ≤ 1 states that the two columns of M_p are linearly dependent. For the j-th point p^j this implies

[ x̂_2^j R_2 x_1^j ; x̂_3^j R_3 x_1^j ; … ; x̂_m^j R_m x_1^j ] + α^j [ x̂_2^j T_2 ; x̂_3^j T_3 ; … ; x̂_m^j T_m ] = 0 ∈ R^{3(m−1)×1},
for some parameters α^j ∈ R, j = 1, …, n. Each row in the above equation can be obtained from λ_i^j x_i^j = λ_1^j R_i x_1^j + T_i by multiplying with x̂_i^j:

x̂_i^j R_i x_1^j + x̂_i^j T_i / λ_1^j = 0.
Therefore, α^j = 1/λ_1^j is nothing but the inverse of the depth of point p^j with respect to the first frame.
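This two-column structure can be verified numerically. Below is a minimal sketch (numpy and all synthetic numbers are assumptions for illustration, not part of the lecture): for exact correspondences the stacked matrix M_p has rank one, and the coefficient α recovered by least squares equals the inverse depth 1/λ_1.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(axis, angle):
    """Rotation matrix from axis-angle via Rodrigues' formula."""
    K = hat(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

lam1 = 4.0                                # depth of the point in frame 1
x1 = np.array([0.1, -0.05, 1.0])          # homogeneous image point, frame 1
X = lam1 * x1                             # 3D point in frame-1 coordinates

rng = np.random.default_rng(0)
Rs, Ts, xs = [], [], []
for i in range(3):                        # three further views
    R = rodrigues(rng.normal(size=3), 0.1 * (i + 1))
    T = np.array([0.5 * (i + 1), 0.1, 0.2])
    Y = R @ X + T                         # point in frame i
    Rs.append(R); Ts.append(T); xs.append(Y / Y[2])

# The two columns of M_p, stacked over the views i = 2..m
col1 = np.concatenate([hat(x) @ R @ x1 for x, R in zip(xs, Rs)])
col2 = np.concatenate([hat(x) @ T for x, T in zip(xs, Ts)])
Mp = np.column_stack([col1, col2])        # shape (3(m-1), 2) = (9, 2)

assert np.linalg.matrix_rank(Mp, tol=1e-8) == 1
alpha = -(col2 @ col1) / (col2 @ col2)    # least squares for col1 + alpha*col2 = 0
assert abs(alpha - 1.0 / lam1) < 1e-10    # alpha is the inverse depth
```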
Motion Estimation from Known Structure
Assume we know the depths of the points and thus their inverses α^j (i.e. known structure). Then the above equation is linear in the camera motion parameters R_i and T_i. Using the stacked notation R_i^s = [r_11, r_21, r_31, r_12, r_22, r_32, r_13, r_23, r_33]^⊤ ∈ R⁹ and T_i ∈ R³, we obtain the linear system of equations

P_i (R_i^s ; T_i) = 0 ∈ R^{3n},   where

P_i = [ x_1^{1⊤} ⊗ x̂_i^1 , α^1 x̂_i^1 ; x_1^{2⊤} ⊗ x̂_i^2 , α^2 x̂_i^2 ; … ; x_1^{n⊤} ⊗ x̂_i^n , α^n x̂_i^n ] ∈ R^{3n×12}

and (R_i^s ; T_i) ∈ R^{12} stacks R_i^s on top of T_i.
One can show that the matrix P_i ∈ R^{3n×12} has rank 11 if more than n = 6 points in general position are given. In that case the null space of P_i is one-dimensional, and the projection matrix Π_i = (R_i, T_i) is given by the null vector up to a scale factor. In practice one would use more than 6 points, obtain a full-rank matrix (due to noise) and compute the least-squares solution by a singular value decomposition (SVD).
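A sketch of this motion-estimation step (numpy and the synthetic data are assumptions for illustration): we build P_i from the known structure and the image points, take the null-space vector from the SVD, and undo the unknown scale and sign using det(R); for noise-free data this recovers (R_i, T_i) exactly.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(axis, angle):
    K = hat(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

rng = np.random.default_rng(1)
R_true = rodrigues(np.array([0.2, -0.1, 0.9]), 0.3)
T_true = np.array([0.4, -0.2, 0.1])

n = 8                                         # more than 6 points
X = rng.uniform([-1, -1, 3], [1, 1, 6], size=(n, 3))  # points in frame 1
x1 = X / X[:, 2:3]                            # image points in frame 1
alpha = 1.0 / X[:, 2]                         # known inverse depths
Y = X @ R_true.T + T_true
xi = Y / Y[:, 2:3]                            # image points in frame i

# Row blocks [x_1^{jT} ⊗ hat(x_i^j), alpha^j hat(x_i^j)] of P_i
P = np.vstack([np.hstack([np.kron(x1[j], hat(xi[j])),
                          alpha[j] * hat(xi[j])]) for j in range(n)])

_, _, Vt = np.linalg.svd(P)                   # P has a 1-dim null space
v = Vt[-1]
R_est = v[:9].reshape(3, 3, order="F")        # undo the column-major stacking
T_est = v[9:]
c = np.cbrt(np.linalg.det(R_est))             # fix scale and sign via det(R)
R_est, T_est = R_est / c, T_est / c
assert np.allclose(R_est, R_true, atol=1e-6)
assert np.allclose(T_est, T_true, atol=1e-6)
```

With noisy data one would additionally project R_est onto SO(3), e.g. via another SVD.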
Structure Estimation from Known Motion
In turn, if the camera motion Π_i = (R_i, T_i), i = 1, …, m, is known, we can estimate the structure (the depth parameters α^j, j = 1, …, n). The least-squares solution of the above equation is given by:
α^j = − ( Σ_{i=2}^m (x̂_i^j T_i)^⊤ x̂_i^j R_i x_1^j ) / ( Σ_{i=2}^m ‖x̂_i^j T_i‖² ),   j = 1, …, n.
In this way one can iteratively estimate structure and motion,
estimating one while keeping the other fixed.
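The closed-form depth estimate can likewise be checked on synthetic data (a sketch under the same assumptions as before — numpy and illustrative numbers):

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(axis, angle):
    K = hat(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

lam1 = 5.0
x1 = np.array([-0.2, 0.1, 1.0])               # image point in frame 1
X = lam1 * x1                                 # 3D point in frame 1

num = den = 0.0
for i in range(1, 4):                         # known motion of frames 2..4
    R = rodrigues(np.array([0.0, 1.0, 0.3]), 0.15 * i)
    T = np.array([0.3 * i, -0.1, 0.05])
    Y = R @ X + T
    xi = Y / Y[2]
    a = hat(xi) @ T                           # x̂_i T_i
    b = hat(xi) @ R @ x1                      # x̂_i R_i x_1
    num += a @ b
    den += a @ a

alpha = -num / den                            # least-squares depth estimate
assert abs(alpha - 1.0 / lam1) < 1e-10        # equals the inverse depth
```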
For initialization one could apply the eight-point algorithm to
the first two images to obtain an estimate of the structure
parameters αj .
While the equation for Πi makes use of the two frames 1 and i
only, the structure parameter estimation takes into account all
frames. This can be done either in batch mode or recursively.
As for the two-view case, such spectral approaches do not
guarantee optimality in the presence of noise and uncertainty.
Multiple-view Matrix for Lines
The matrix

W_l = [ ℓ_1^⊤ Π_1 ; ℓ_2^⊤ Π_2 ; … ; ℓ_m^⊤ Π_m ] ∈ R^{m×4}
associated with m images of a line in space satisfies the rank constraint rank(W_l) ≤ 2, because W_l X_0 = W_l V = 0 for the base point X_0 and the direction V of the line. To find a more compact representation, let us assume that the first camera is in world coordinates, i.e. Π_1 = (I, 0). The rank is not affected by multiplying with a full-rank matrix D_l ∈ R^{4×5}:
W_l D_l = [ ℓ_1^⊤ , 0 ; ℓ_2^⊤ R_2 , ℓ_2^⊤ T_2 ; … ; ℓ_m^⊤ R_m , ℓ_m^⊤ T_m ] · [ ℓ_1 , ℓ̂_1 , 0 ; 0 , 0 , 1 ]

        = [ ℓ_1^⊤ ℓ_1 , 0 , 0 ; ℓ_2^⊤ R_2 ℓ_1 , ℓ_2^⊤ R_2 ℓ̂_1 , ℓ_2^⊤ T_2 ; … ; ℓ_m^⊤ R_m ℓ_1 , ℓ_m^⊤ R_m ℓ̂_1 , ℓ_m^⊤ T_m ]
Multiple-view Matrix for Lines
Since multiplication with a full-rank matrix does not affect the rank, we have

rank(W_l D_l) = rank(W_l) ≤ 2.
Since the first column of W_l D_l is linearly independent from the remaining ones, the submatrix

M_l = [ ℓ_2^⊤ R_2 ℓ̂_1 , ℓ_2^⊤ T_2 ; … ; ℓ_m^⊤ R_m ℓ̂_1 , ℓ_m^⊤ T_m ] ∈ R^{(m−1)×4}
has the rank constraint:

rank(M_l) ≤ 1.
For a line projected into m images, we thus have a much stronger rank constraint than for a projected point: for a sufficiently large number of views m, the matrix M_l could in principle have rank up to four. The above constraint states that a meaningful preimage of the m observed lines can only exist if rank(M_l) ≤ 1.
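A quick numerical check of the line rank constraint (numpy and the synthetic line/cameras are assumptions for illustration): the coimages ℓ_i are computed as cross products of two points on the imaged line, and M_l indeed has rank one.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(axis, angle):
    K = hat(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

X0 = np.array([0.5, -0.3, 4.0])               # base point of the line, frame 1
V = np.array([1.0, 0.2, 0.1])                 # direction of the line
l1 = np.cross(X0, V)                          # coimage of the line in frame 1
l1 /= np.linalg.norm(l1)

rows = []
for i in range(1, 4):
    R = rodrigues(np.array([0.1, 0.9, -0.2]), 0.2 * i)
    T = np.array([0.2, 0.4 * i, -0.1])
    # coimage in frame i: orthogonal to two points on the transformed line
    li = np.cross(R @ X0 + T, R @ (X0 + V) + T)
    li /= np.linalg.norm(li)
    rows.append(np.append(li @ R @ hat(l1), li @ T))

Ml = np.vstack(rows)                          # shape (m-1, 4) = (3, 4)
assert np.linalg.matrix_rank(Ml, tol=1e-8) == 1
```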
Trilinear Constraints for a Line
Again, we can take a closer look at the meaning of the above rank constraint. Regarding the first three columns of M_l, it implies that the respective row vectors must be pairwise linearly dependent, i.e. for all i, j ≠ 1:

ℓ_i^⊤ R_i ℓ̂_1 ∼ ℓ_j^⊤ R_j ℓ̂_1,
which is equivalent to the trilinear equation

ℓ_i^⊤ R_i ℓ̂_1 R_j^⊤ ℓ_j = 0.
Proof: The above proportionality states that the three vectors R_i^⊤ ℓ_i, R_j^⊤ ℓ_j and ℓ_1 are coplanar. The lower equation is the equivalent statement that the vector R_i^⊤ ℓ_i is orthogonal to the normal of the plane spanned by R_j^⊤ ℓ_j and ℓ_1.
Interestingly, the above constraint only involves the camera
rotations, not the camera translations.
Trilinear Constraints for a Line
Taking into account the fourth column of the multiple-view
matrix Ml , the rank constraint implies the linear dependency
between the ith and the jth row. This is equivalent to the
trilinear constraint:
ℓ_j^⊤ T_j ℓ_i^⊤ R_i ℓ̂_1 − ℓ_i^⊤ T_i ℓ_j^⊤ R_j ℓ̂_1 = 0.
The proof follows from the general lemma on page 25.
The above constraint relates the first, the ith and the jth
images. From previous discussion, we saw that all nontrivial
constraints in the case of lines involve at least three images.
The two trilinear constraints above are equivalent to the rank constraint if the scalar ℓ_i^⊤ T_i ≠ 0, i.e. in non-degenerate cases.
In general, rank(M_l) ≤ 1 if and only if all its 2 × 2 minors (deutsch: Untermatrizen) have zero determinant. Since these minors only involve three images at a time, one can conclude that any multiview constraint on lines can be reduced to constraints which only involve three lines at a time.
Uniqueness of the Preimage
The key idea of the rank constraint on the multiple-view matrix
Ml was to assure that m observations of a line correspond to a
consistent preimage L. The uniqueness of the preimage in the
case of the trilinear constraints can be characterized as follows.
Lemma: Given three camera frames with distinct optical centers and any three vectors ℓ_1, ℓ_2, ℓ_3 ∈ R³ that represent three image lines. If the three image lines satisfy the trilinear constraints

ℓ_j^⊤ T_ji ℓ_k^⊤ R_ki ℓ̂_i − ℓ_k^⊤ T_ki ℓ_j^⊤ R_ji ℓ̂_i = 0,   i, j, k ∈ {1, 2, 3},

then their preimage L is uniquely determined except for the case in which the preimage of every ℓ_i is the same plane in space. This is the only degenerate case, and in this case the matrix M_l becomes zero.
Note that the above constraint combines the two trilinear
constraints introduced on the previous slides.
Uniqueness of the Preimage
Figure: No preimage — the lines L_2 and L_3 do not coincide.
Uniqueness of the Preimage
Figure: Uniqueness of the preimage — the lines L_2 and L_3 coincide.
Uniqueness of the Preimage
A similar statement can be made regarding the uniqueness of
the preimage of m lines in relation to the rank of the multiview
matrix Ml .
Theorem: Given m vectors ℓ_i ∈ R³ representing images of lines with respect to m camera frames. They correspond to the same line in space if the rank of the matrix M_l relative to any of the camera frames is 1. If its rank is 0 (i.e. the matrix M_l itself is zero), then the line is determined up to a plane on which all the camera centers must lie.
Overall we have the following cases:

rank(M_l) = 2 ⇒ no line correspondence
rank(M_l) = 1 ⇒ line correspondence & unique preimage
rank(M_l) = 0 ⇒ line correspondence & preimage not unique
Summary
One can generalize the two-view scenario to that of simultaneously considering m ≥ 2 images of a scene. The intrinsic constraints among multiple images of a point or a line can be expressed in terms of rank conditions on the matrices N, W and M.
The relationship among these rank conditions is as follows:
            Point                 Line
(Pre)image  rank(N_p) ≤ m + 3     rank(N_l) ≤ 2m + 2
Coimage     rank(W_p) ≤ 3         rank(W_l) ≤ 2
Jointly     rank(M_p) ≤ 1         rank(M_l) ≤ 1
These rank conditions capture the relationships among
corresponding geometric primitives in multiple images. They
impose the existence of unique preimages (up to degenerate
cases). Moreover, they give rise to natural factorization-based
algorithms for multiview recovery of 3D structure and motion
(i.e. generalizations of the eight-point algorithm).
Chapter 7
Bundle Adjustment &
Nonlinear Optimization
Multiple View Geometry
Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
updated May 23, 2019 1/23
Overview
1 Optimality in Noisy Real World Conditions
2 Bundle Adjustment
3 Nonlinear Optimization
4 Gradient Descent
5 Least Squares Estimation
6 Newton Methods
7 The Gauss-Newton Algorithm
8 The Levenberg-Marquardt Algorithm
9 Summary
10 Example Applications
Optimality in Noisy Real World Conditions
In the previous chapters we discussed linear approaches to
solve the structure and motion problem. In particular, the
eight-point algorithm provides closed-form solutions to
estimate the camera parameters and the 3D structure, based
on singular value decomposition.
However, if we have noisy data x̃_1, x̃_2 (correspondences not exact or even incorrect), then we have no guarantee
• that R and T are as close as possible to the true solution,
• that we will get a consistent reconstruction.
Statistical Approaches to Cope with Noise
The linear approaches are elegant because optimal solutions
to respective problems can be computed in closed form.
However, they often fail when dealing with noisy and imprecise
point locations. Since measurement noise is not explicitly
considered or modeled, such spectral methods often provide
suboptimal performance in noisy real-world conditions.
In order to take noise and statistical fluctuations into account, one can revert to a Bayesian formulation and determine the most likely camera transformation R, T and 'true' 2D coordinates x given the measured coordinates x̃, by performing a maximum a posteriori estimate:
arg max_{x,R,T} P(x, R, T | x̃) = arg max_{x,R,T} P(x̃ | x, R, T) P(x, R, T)
This approach will however involve modeling probability densities P on the fairly complicated space SO(3) × S² of rotation and translation parameters, as R ∈ SO(3) and T ∈ S² (3D translation with unit length).
Bundle Adjustment and Nonlinear Optimization
Under the assumption that the observed 2D point coordinates
x̃ are corrupted by zero-mean Gaussian noise, maximum
likelihood estimation leads to bundle adjustment:
E(R, T, X^1, …, X^N) = Σ_{j=1}^N ( ‖x̃_1^j − π(X^j)‖² + ‖x̃_2^j − π(R, T, X^j)‖² )
It aims at minimizing the reprojection error between the observed 2D coordinates x̃_i^j and the projections of the 3D coordinates X^j (given w.r.t. camera frame 1). Here π(R, T, X^j) denotes the perspective projection of X^j after rotation and translation.
For the general case of m images, we get:

E({R_i, T_i}_{i=1..m}, {X^j}_{j=1..N}) = Σ_{i=1}^m Σ_{j=1}^N θ_ij ‖x̃_i^j − π(R_i, T_i, X^j)‖²,

with T_1 = 0 and R_1 = I. Here θ_ij = 1 if point j is visible in image i, and θ_ij = 0 otherwise. The above problems are non-convex.
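As an illustration, the two-view energy can be written down in a few lines (numpy assumed; π here is taken as the projection onto the plane z = 1, and all data are synthetic assumptions): the cost vanishes for the true parameters and is positive for a wrong rotation.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(axis, angle):
    K = hat(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pi(X):
    """Perspective projection onto the image plane z = 1 (2D coordinates)."""
    return X[..., :2] / X[..., 2:3]

def bundle_cost(R, T, Xs, x1_obs, x2_obs):
    """Two-view bundle adjustment energy E(R, T, X^1, ..., X^N)."""
    e1 = x1_obs - pi(Xs)                      # reprojection error, view 1
    e2 = x2_obs - pi(Xs @ R.T + T)            # reprojection error, view 2
    return np.sum(e1**2) + np.sum(e2**2)

rng = np.random.default_rng(3)
Xs = rng.uniform([-1, -1, 3], [1, 1, 6], size=(10, 3))
R_true = rodrigues(np.array([0.0, 1.0, 0.2]), 0.1)
T_true = np.array([0.5, 0.0, 0.1])
x1_obs = pi(Xs)                               # noise-free observations
x2_obs = pi(Xs @ R_true.T + T_true)

assert bundle_cost(R_true, T_true, Xs, x1_obs, x2_obs) < 1e-12
assert bundle_cost(np.eye(3), T_true, Xs, x1_obs, x2_obs) > 1e-6
```

In practice this cost would be handed to a nonlinear least-squares solver, as discussed in the remainder of the chapter.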
Different Parameterizations of the Problem
The same optimization problem can be parameterized differently. For example, we can introduce x_i^j to denote the true 2D coordinate associated with the measured coordinate x̃_i^j:

E({x_1^j, λ_1^j}_{j=1..N}, R, T) = Σ_{j=1}^N ‖x_1^j − x̃_1^j‖² + ‖x̃_2^j − π(R λ_1^j x_1^j + T)‖².
Alternatively, we can perform a constrained optimization by minimizing a cost function (similarity to measurements):

E({x_i^j}_{j=1..N}, R, T) = Σ_{j=1}^N Σ_{i=1}^2 ‖x_i^j − x̃_i^j‖²

subject to (consistent geometry):

x_2^{j⊤} T̂ R x_1^j = 0,   x_1^{j⊤} e_3 = 1,   x_2^{j⊤} e_3 = 1,   j = 1, …, N.
Some Comments on Bundle Adjustment
Bundle adjustment aims at jointly estimating the 3D coordinates of points and the camera parameters — typically the rigid-body motion, but sometimes also intrinsic calibration parameters or radial distortion. Different models of the noise in the observed 2D points lead to different cost functions, zero-mean Gaussian noise being the most common assumption.
The approach is called bundle adjustment (dt.:
Bündelausgleich) because it aims at adjusting the bundles of
light rays emitted from the 3D points. Originally derived in the
field of photogrammetry in the 1950s, it is now used frequently
in computer vision. A good overview can be found in:
Triggs, McLauchlan, Hartley, Fitzgibbon, “Bundle Adjustment –
A Modern Synthesis”, ICCV Workshop 1999.
Typically it is used as the last step in a reconstruction pipeline
because the minimization of this highly non-convex cost
function requires a good initialization. The minimization of
non-convex energies is a challenging problem. Bundle
adjustment type cost functions are typically minimized by
nonlinear least squares algorithms.
Nonlinear Programming
Nonlinear programming denotes the process of iteratively
solving a nonlinear optimization problem, i.e. a problem
involving the maximization or minimization of an objective
function over a set of real variables under a set of equality or
inequality constraints.
There are numerous methods and techniques. Good overviews of respective methods can be found for example in Bertsekas (1999), "Nonlinear Programming", Nocedal & Wright (1999), "Numerical Optimization", or Luenberger & Ye (2008), "Linear and Nonlinear Programming".
Depending on the cost function, different algorithms are
employed. In the following, we will discuss (nonlinear) least
squares estimation and several popular iterative techniques for
nonlinear optimization:
• gradient descent,
• Newton methods,
• the Gauss-Newton algorithm,
• the Levenberg-Marquardt algorithm.
Gradient Descent
Gradient descent or steepest descent is a first-order
optimization method. It aims at computing a local minimum of a
(generally) non-convex cost function by iteratively stepping in
the direction in which the energy decreases most. This is given
by the negative energy gradient.
To minimize a real-valued cost E : R^n → R, the gradient flow for E(x) is defined by the differential equation:

dx/dt = −dE/dx(x),   x(0) = x_0.

Discretization: x_{k+1} = x_k − ε dE/dx(x_k),   k = 0, 1, 2, …
Gradient Descent
Under certain conditions on E(x), the gradient descent iteration

x_{k+1} = x_k − ε dE/dx(x_k),   k = 0, 1, 2, …

converges to a local minimum. For the case of convex E, this will also be the global minimum. The step size ε can be chosen differently in each iteration.
Gradient descent is a popular and broadly applicable method.
It is typically not the fastest solution to compute minimizers
because the asymptotic convergence rate is often inferior to
that of more specialized algorithms. First-order methods with
optimal convergence rates were pioneered by Yuri Nesterov.
In particular, highly anisotropic cost functions (with strongly
different curvatures in different directions) require many
iterations and trajectories tend to zig-zag. Locally optimal step
sizes in each iteration can be computed by line search. For
specific cost functions, alternative techniques such as the
conjugate gradient method, Newton methods, or the BFGS
method are preferable.
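A minimal sketch of the discretized gradient flow (numpy assumed; the cost and step size are illustrative choices), applied to an anisotropic quadratic of the kind that causes zig-zagging:

```python
import numpy as np

def grad_descent(grad, x0, eps=0.1, iters=300):
    """Discretized gradient flow: x_{k+1} = x_k - eps * dE/dx(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - eps * grad(x)
    return x

# Anisotropic quadratic E(x) = 0.5*(x_1^2 + 10*x_2^2): the curvatures
# differ by a factor of 10, the setting in which trajectories zig-zag.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

x_min = grad_descent(grad, [4.0, 1.0], eps=0.15)
assert np.linalg.norm(x_min) < 1e-6           # converged to the minimum 0
```

Note that the fixed step size must be small relative to the largest curvature (here eps < 2/10), which is exactly why ill-conditioned problems force many iterations.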
Linear Least Squares Estimation
Ordinary least squares or linear least squares is a method for estimating a set of parameters x ∈ R^d in a linear regression model. Assume that for each input vector b_i ∈ R^d, i ∈ {1, …, n}, we observe a scalar response a_i ∈ R, and that there is a linear relationship of the form

a_i = b_i^⊤ x + η_i
with an unknown vector x ∈ R^d and zero-mean Gaussian noise η ∼ N(0, Σ) with a diagonal covariance matrix of the form Σ = σ² I_n. Maximum likelihood estimation of x leads to the ordinary least squares problem:

min_x Σ_i (a_i − x^⊤ b_i)² = (a − Bx)^⊤ (a − Bx).
Linear least squares estimation was introduced by Legendre
(1805) and Gauss (1795/1809). When asking for which noise
distribution the optimal estimator was the arithmetic mean,
Gauss invented the normal distribution.
Linear Least Squares Estimation
For general Σ, we get the generalized least squares problem:

min_x (a − Bx)^⊤ Σ⁻¹ (a − Bx).

This is a quadratic cost function with positive definite Σ⁻¹. It has the closed-form solution:

x̂ = arg min_x (a − Bx)^⊤ Σ⁻¹ (a − Bx) = (B^⊤ Σ⁻¹ B)⁻¹ B^⊤ Σ⁻¹ a.
If there is no correlation among the observations, then the matrix Σ is diagonal. This case is referred to as weighted least squares:

min_x Σ_i w_i (a_i − x^⊤ b_i)²,   with w_i = σ_i⁻².
For the case of unknown matrix Σ, there exist iterative
estimation algorithms such as feasible generalized least
squares or iteratively reweighted least squares.
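The closed-form weighted least squares estimator from above, on synthetic data (numpy assumed; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 3, 200
x_true = np.array([1.0, -2.0, 0.5])
B = rng.normal(size=(n, d))                   # rows are the inputs b_i^T
sigma = rng.uniform(0.01, 0.1, size=n)        # per-observation noise level
a = B @ x_true + sigma * rng.normal(size=n)

# Weighted least squares closed form: (B^T W B)^{-1} B^T W a, W = Sigma^{-1}
W = np.diag(1.0 / sigma**2)
x_hat = np.linalg.solve(B.T @ W @ B, B.T @ W @ a)
assert np.linalg.norm(x_hat - x_true) < 0.05
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is both cheaper and numerically more stable.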
Iteratively Reweighted Least Squares
The method of iteratively reweighted least squares aims at
minimizing generally non-convex optimization problems of the
form
min_x Σ_i w_i(x) |a_i − f_i(x)|²,

with some known weighting function w_i(x). A solution is obtained by iterating the following problem:

x_{t+1} = arg min_x Σ_i w_i(x_t) |a_i − f_i(x)|².

For the case that f_i is linear, i.e. f_i(x) = x^⊤ b_i, each subproblem

x_{t+1} = arg min_x Σ_i w_i(x_t) |a_i − x^⊤ b_i|²
is simply a weighted least squares problem that can be solved
in closed form. Nevertheless, this iterative approach will
generally not converge to a global minimum of the original
(nonconvex) problem.
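A sketch of IRLS for a robust linear fit (assumptions: numpy, and the particular weight w_i(x) = 1/|r_i(x)|, which makes the reweighted quadratic cost mimic an L1 cost — this specific choice is an illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
B = np.column_stack([rng.uniform(-1, 1, n), np.ones(n)])  # [input, bias]
x_true = np.array([2.0, -0.5])
a = B @ x_true + 0.01 * rng.normal(size=n)
a[:10] += 5.0                                 # gross outliers

x = np.zeros(2)
for _ in range(30):                           # IRLS iterations
    r = a - B @ x                             # current residuals
    w = 1.0 / np.maximum(np.abs(r), 1e-6)     # weights ~ 1/|r| (L1-like)
    BW = B * w[:, None]
    # each subproblem is a weighted least squares, solved in closed form
    x = np.linalg.solve(BW.T @ B, BW.T @ a)

assert np.linalg.norm(x - x_true) < 0.1       # outliers are down-weighted
```

The floor of 1e-6 on |r| guards the division; as stated above, no convergence to a global minimum of the original nonconvex cost is guaranteed.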
Nonlinear Least Squares Estimation
Nonlinear least squares estimation aims at fitting observations
(ai , bi ) with a nonlinear model of the form ai ≈ f (bi , x) for some
function f parameterized with an unknown vector x ∈ Rd .
Minimizing the sum of squared errors

min_x Σ_i r_i(x)²,   with r_i(x) = a_i − f(b_i, x),

is generally a non-convex optimization problem.
The optimality condition is given by

Σ_i r_i ∂r_i/∂x_j = 0,   ∀ j ∈ {1, …, d}.
Typically one cannot solve these equations directly. Yet, there exist iterative algorithms for computing approximate solutions, including Newton methods, the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm.
Newton Methods for Optimization
Newton methods are second-order methods: in contrast to first-order methods like gradient descent, they also make use of second derivatives. Geometrically, Newton methods iteratively approximate the cost function E(x) quadratically and take a step to the minimizer of this approximation.
Let x_t be the estimated solution after t iterations. Then the Taylor approximation of E(x) in the vicinity of this estimate is:
E(x) ≈ E(x_t) + g^⊤ (x − x_t) + ½ (x − x_t)^⊤ H (x − x_t).
The first and second derivatives are denoted by the gradient g = dE/dx(x_t) and the Hessian matrix H = d²E/dx²(x_t). For this second-order approximation, the optimality condition is:
dE/dx = g + H(x − x_t) = 0   (∗)

Setting the next iterate to the minimizer x leads to:

x_{t+1} = x_t − H⁻¹ g.
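The Newton iteration in code (numpy assumed; the cost E(x) = x_1⁴ + x_2² is an illustrative choice — note that its Hessian is singular at the optimum, so convergence on this particular cost is only linear rather than quadratic):

```python
import numpy as np

def newton(grad, hess, x0, iters=20):
    """Newton iteration x_{t+1} = x_t - H^{-1} g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# E(x) = x_1^4 + x_2^2 (illustrative cost with minimum at the origin)
grad = lambda x: np.array([4.0 * x[0]**3, 2.0 * x[1]])
hess = lambda x: np.array([[12.0 * x[0]**2, 0.0],
                           [0.0, 2.0]])

x_min = newton(grad, hess, [1.0, 1.0])
assert np.linalg.norm(x_min) < 1e-3
```

The quadratic coordinate x_2 is solved in a single step, while the quartic coordinate shrinks by a constant factor 2/3 per iteration.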
Newton Methods for Optimization
In practice, one often chooses a more conservative step size γ ∈ (0, 1):

x_{t+1} = x_t − γ H⁻¹ g.
When applicable, second-order methods are often faster than
first-order methods, at least when measured in number of
iterations. In particular, there exists a local neighborhood
around each optimum where the Newton method converges
quadratically for γ = 1 (if the Hessian is invertible and Lipschitz
continuous).
For large optimization problems, computing and inverting the
Hessian may be challenging. Moreover, since this problem is
often not parallelizable, some second order methods do not
profit from GPU acceleration. In such cases, one can aim to
iteratively solve the extremality condition (∗).
In case that H is not positive definite, there exist quasi-Newton
methods which aim at approximating H or H −1 with a positive
definite matrix.
The Gauss-Newton Algorithm
The Gauss-Newton algorithm is a method to solve non-linear least-squares problems of the form:

min_x Σ_i r_i(x)².
It can be derived as an approximation to the Newton method. The latter iterates:

x_{t+1} = x_t − H⁻¹ g,
with the gradient g:

g_j = 2 Σ_i r_i ∂r_i/∂x_j,
and the Hessian H:

H_jk = 2 Σ_i ( ∂r_i/∂x_j · ∂r_i/∂x_k + r_i ∂²r_i/∂x_j∂x_k ).
Dropping the second order term leads to the approximation:
X
∂ri
Hjk ≈ 2
Jij Jik ,
with Jij =
.
∂xj
i
The Gauss-Newton Algorithm
The approximation

    H ≈ 2 J^T J,   with the Jacobian J = dr/dx,

together with g = 2 J^T r, leads to the Gauss-Newton algorithm:

    x_{t+1} = x_t + Δ,   with Δ = −(J^T J)^{-1} J^T r.

In contrast to the Newton algorithm, the Gauss-Newton algorithm does not require the computation of second derivatives. Moreover, the above approximation of the Hessian is by construction positive semidefinite (and positive definite whenever J has full column rank).

This approximation of the Hessian is valid if

    | r_i ∂²r_i/(∂x_j ∂x_k) | ≪ | ∂r_i/∂x_j · ∂r_i/∂x_k |.

This is the case if the residuum r_i is small or if it is close to linear (in which case the second derivatives are small).
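A minimal sketch of the resulting iteration, applied to a hypothetical exponential curve fit (the data are noise-free, so the residuals vanish at the optimum and Gauss-Newton converges quickly from a nearby start):

```python
import numpy as np

def gauss_newton(r, J, x0, max_iter=50, tol=1e-12):
    """Gauss-Newton: x <- x - (J^T J)^{-1} J^T r, no second derivatives needed."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Jx, rx = J(x), r(x)
        delta = -np.linalg.solve(Jx.T @ Jx, Jx.T @ rx)
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x

# Hypothetical example: fit y = a * exp(b * t) to data generated with a=2, b=0.5.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * t)
r = lambda x: x[0] * np.exp(x[1] * t) - y                       # residuals r_i(x)
J = lambda x: np.stack([np.exp(x[1] * t),                       # dr_i/da
                        x[0] * t * np.exp(x[1] * t)], axis=1)   # dr_i/db
x_hat = gauss_newton(r, J, x0=[1.8, 0.4])
```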
The Levenberg-Marquardt Algorithm
The Newton algorithm

    x_{t+1} = x_t − H^{-1} g,

can be modified (damped):

    x_{t+1} = x_t − (H + λ I_n)^{-1} g,

to create a hybrid between the Newton method (λ = 0) and a gradient descent with step size 1/λ (for λ → ∞).

In the same manner, Levenberg (1944) suggested to damp the Gauss-Newton algorithm for nonlinear least squares:

    x_{t+1} = x_t + Δ,   with Δ = −(J^T J + λ I_n)^{-1} J^T r.

Marquardt (1963) suggested a more adaptive component-wise damping of the form:

    Δ = −(J^T J + λ diag(J^T J))^{-1} J^T r,

which avoids slow convergence in directions of small gradient.
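Marquardt's damping is typically combined with an accept/reject schedule for λ. The halve-on-success / tenfold-on-failure factors below are a common convention, not prescribed by the slides, and the exponential fit is a hypothetical example:

```python
import numpy as np

def levenberg_marquardt(r, J, x0, lam=1e-3, max_iter=100, tol=1e-12):
    """Marquardt damping: delta = -(J^T J + lam * diag(J^T J))^{-1} J^T r,
    decreasing lam after a successful step and increasing it otherwise."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(r(x) ** 2)
    for _ in range(max_iter):
        Jx, rx = J(x), r(x)
        A = Jx.T @ Jx
        delta = -np.linalg.solve(A + lam * np.diag(np.diag(A)), Jx.T @ rx)
        x_new = x + delta
        cost_new = np.sum(r(x_new) ** 2)
        if cost_new < cost:          # accept: trust the quadratic model more
            x, cost, lam = x_new, cost_new, lam * 0.5
        else:                        # reject: damp more strongly
            lam *= 10.0
        if np.linalg.norm(delta) < tol:
            break
    return x

# Hypothetical example: same exponential fit, now from a poor initial guess
# where plain Gauss-Newton would overshoot badly.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * t)
r = lambda x: x[0] * np.exp(x[1] * t) - y
J = lambda x: np.stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)], axis=1)
x_hat = levenberg_marquardt(r, J, x0=[1.0, 0.0])
```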
Summary
Bundle adjustment was pioneered in the 1950s as a technique for structure and motion estimation in noisy real-world conditions. It aims at estimating the locations of N 3D points X^j and the camera motions (R_i, T_i), given noisy 2D projections x̃_i^j in m images.

The assumption of zero-mean Gaussian noise on the 2D observations leads to the weighted nonlinear least squares problem:

    E({R_i, T_i}_{i=1..m}, {X^j}_{j=1..N}) = Σ_{i=1}^m Σ_{j=1}^N θ_ij ‖ x̃_i^j − π(R_i, T_i, X^j) ‖²,

with θ_ij = 1 if point j is visible in image i, θ_ij = 0 else.
Solutions of this nonconvex problem can be computed by
various iterative algorithms, most importantly the
Gauss-Newton algorithm or its damped version, the
Levenberg-Marquardt algorithm. Bundle adjustment is typically
initialized by an algorithm such as the eight-point or five-point
algorithm.
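The cost function above can be sketched directly. The pinhole projection with identity intrinsics and the two-camera, one-point setup below are hypothetical simplifications, not the lecture's full model:

```python
import numpy as np

def project(R, T, X):
    """Pinhole projection pi(R, T, X), assuming identity intrinsics."""
    Xc = R @ X + T
    return Xc[:2] / Xc[2]

def ba_cost(poses, points, observations):
    """E = sum_ij theta_ij ||x_ij - pi(R_i, T_i, X_j)||^2; the dict
    `observations` maps (i, j) -> observed 2D point, encoding theta_ij = 1."""
    return sum(np.sum((x - project(*poses[i], points[j])) ** 2)
               for (i, j), x in observations.items())

# Hypothetical toy setup: two cameras observing a single 3D point.
X = np.array([0.1, -0.2, 2.0])
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([-0.5, 0.0, 0.0]))]
obs = {(i, 0): project(*poses[i], X) for i in range(2)}
cost = ba_cost(poses, [X], obs)   # zero at the true structure and motion
```

Perturbing the point or the poses raises the cost; iterative algorithms such as Gauss-Newton or Levenberg-Marquardt descend on exactly this landscape.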
Example I: From Internet Photo Collections...
Flickr images for search term “Notre Dame”
Snavely, Seitz, Szeliski, “Modeling the world from Internet
photo collections,” IJCV 2008.
...to Sparse Reconstructions
Snavely, Seitz, Szeliski, “Modeling the world from Internet
photo collections,” IJCV 2008.
Example II: Realtime Structure and Motion
Klein & Murray, “Parallel Tracking and Mapping (PTAM) for
Small AR Workspaces,” ISMAR 2007.
Chapter 8
Direct Approaches to Visual SLAM
Multiple View Geometry
Summer 2019

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

updated June 13, 2019 1/33
Overview
1 Direct Methods
2 Realtime Dense Geometry
3 Dense RGB-D Tracking
4 Loop Closure and Global Consistency
5 Dense Tracking and Mapping
6 Large Scale Direct Monocular SLAM
7 Direct Sparse Odometry
Classical Approaches to Multiple View Reconstruction
In the past chapters we have studied classical approaches to
multiple view reconstruction. These methods tackle the
problem of structure and motion estimation (or visual SLAM) in
several steps:
1 A set of feature points is extracted from the images – ideally points such as corners which can be reliably identified in subsequent images as well.

2 One determines a correspondence of these points across the various images. This can be done either through local tracking (using optical flow approaches) or by random sampling of possible partners based on a feature descriptor (SIFT, SURF, etc.) associated with each point.

3 The camera motion is estimated based on a set of corresponding points. In many approaches this is done by a series of algorithms such as the eight-point algorithm or the five-point algorithm followed by bundle adjustment.

4 For a given camera motion one can then compute a dense reconstruction using stereo reconstruction methods.
Shortcomings of Classical Approaches
Such classical approaches are indirect in the sense that they
do not compute structure and motion directly from the images
but rather from a sparse set of precomputed feature points.
Despite a number of successes, they have several drawbacks:
• From the point of view of statistical inference, they are
suboptimal: In the selection of feature points much
potentially valuable information contained in the colors of
each image is discarded.
• They invariably lack robustness: Errors in the point
correspondence may have devastating effects on the
estimated camera motion. Since one often selects very
few point pairs only (8 points for the eight-point algorithm,
5 points for the five-point algorithm), any incorrect
correspondence will lead to an incorrect motion estimate.
• They do not address the highly coupled problems of
motion estimation and dense structure estimation. They
merely do so for a sparse set of points. As a consequence,
improvements in the estimated dense geometry will not be
used to improve the camera motion estimates.
Toward Direct Approaches to Multiview Reconstruction
In the last few years, researchers have been promoting direct
approaches to multi-view reconstruction. Rather than
extracting a sparse set of feature points to determine the
camera motion, direct methods aim at estimating camera
motion and dense or semi-dense scene geometry directly from
the input images. This has several advantages:
• Direct methods tend to be more robust to noise and other
nuisances because they exploit all available input
information.
• Direct methods provide a semi-dense geometric
reconstruction of the scene which goes well beyond the
sparse point cloud generated by the eight-point algorithm
or bundle adjustment. Depending on the application, a
separate dense reconstruction step may no longer be
necessary.
• Direct methods are typically faster because the
feature-point extraction and correspondence finding is
omitted: They can provide fairly accurate camera motion
and scene structure in real-time on a CPU.
Feature-Based versus Direct Methods
Engel, Sturm, Cremers, ICCV 2013
Direct Methods for Multi-view Reconstruction
In the following, we will briefly review several recent works on
direct methods for realtime multiple-view reconstruction:
• the method of Stühmer, Gumhold, Cremers, DAGM 2010
computes dense geometry from a handheld camera in
real-time.
• the methods of Steinbrücker, Sturm, Cremers, 2011 and
Kerl, Sturm, Cremers, 2013 directly compute the camera
motion of an RGB-D camera.
• the method of Newcombe, Lovegrove, Davison, ICCV
2011 directly determines dense geometry and camera
motion from the images.
• the method of Engel, Sturm, Cremers, ICCV 2013 and
Engel, Schöps, Cremers, ECCV 2014 directly computes
camera motion and semi-dense geometry for a handheld
(monocular) camera.
• the method of Engel, Koltun, Cremers, PAMI 2018 directly
estimates highly accurate camera motion and sparse
geometry.
Realtime Dense Geometry from a Handheld Camera
Let gi ∈ SE(3) be the rigid body motion from the first camera to
the i-th camera, and let Ii : Ω → R be the i-th image. A dense
depth map h : Ω → R can be computed by solving the
optimization problem:
    min_h Σ_{i=2}^n ∫_Ω | I_1(x) − I_i( π g_i (h x) ) | dx + λ ∫_Ω |∇h| dx,

where x is represented in homogeneous coordinates and hx is the corresponding 3D point.
Like in optical flow estimation, the unknown depth map should
be such that for all pixels x ∈ Ω, the transformation into the
other images Ii should give rise to the same color as in the
reference image I1 .
This cost function can be minimized at framerate by
coarse-to-fine linearization solved in parallel on a GPU.
Stuehmer, Gumhold, Cremers, DAGM 2010.
Realtime Dense Geometry from a Handheld Camera
Input image, reconstruction, textured geometry; textured and untextured reconstructions.
Stuehmer, Gumhold, Cremers, DAGM 2010.
Dense RGB-D Tracking
The approach of Stühmer et al. (2010) relies on a sparse
feature-point based camera tracker (PTAM) and computes
dense geometry directly on the images. Steinbrücker, Sturm,
Cremers (2011) propose a complementary approach to directly
compute the camera motion from RGB-D images. The idea is
to compute the rigid body motion gξ which optimally aligns two
subsequent color images I1 and I2 :
    min_{ξ ∈ se(3)} ∫_Ω ( I_1(x) − I_2( π g_ξ (h x) ) )² dx.
Dense RGB-D Tracking
The above non-convex problem can be approximated as a
convex problem by linearizing the residuum around an initial
guess ξ0 :
    E(ξ) ≈ ∫_Ω ( I_1(x) − I_2( π g_{ξ0}(h x) ) − ∇I_2^T (dπ/dg_ξ)(dg_ξ/dξ) ξ )² dx.

This is a convex quadratic cost function which gives rise to a linear optimality condition:

    dE(ξ)/dξ = Aξ + b = 0.
To account for larger motions of the camera, this problem is
solved in a coarse-to-fine manner. The linearization of the
residuum is identical with a Gauss-Newton approach. It
corresponds to an approximation of the Hessian by a positive
definite matrix.
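This linearize-warp-solve scheme can be illustrated in 1D, with a pure translation ξ in place of the full se(3) warp. The Gaussian test signal is hypothetical and the setup is a deliberate toy, not the actual method:

```python
import numpy as np

def align_1d_translation(I1, I2, n_iter=20):
    """Direct alignment for a 1D shift xi: linearize
    I2(x + xi + d) ≈ I2(x + xi) + I2'(x + xi) * d and solve the
    resulting linear optimality condition at each iteration."""
    xi = 0.0
    x = np.arange(len(I1), dtype=float)
    for _ in range(n_iter):
        I2w = np.interp(x + xi, x, I2)          # warp I2 by current estimate
        g = np.gradient(I2w)                    # gradient of the warped image
        r = I1 - I2w                            # photometric residuum
        xi += np.sum(g * r) / np.sum(g * g)     # Gauss-Newton style update
    return xi

# Hypothetical example: I2 is I1 shifted by -2 pixels, so xi ≈ 2 aligns them.
x = np.arange(100, dtype=float)
I1 = np.exp(-((x - 50) ** 2) / 50.0)
I2 = np.exp(-((x - 52) ** 2) / 50.0)
shift = align_1d_translation(I1, I2)
```

In the real method the scalar xi becomes a 6-vector, the sum of squared scalars becomes the normal equations Aξ + b = 0, and the coarse-to-fine pyramid extends the basin of convergence.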
Steinbrücker, Sturm, Cremers 2011
Dense RGB-D Tracking
Steinbrücker, Sturm, Cremers 2011
Dense RGB-D Tracking
In the small-baseline setting, this image aligning approach
provides more accurate camera motion than the commonly
used generalized Iterated Closest Points (GICP) approach.
Prof. Daniel Cremers
Direct Methods
Realtime Dense
Geometry
Dense RGB-D
Tracking
Loop Closure and
Global Consistency
Dense Tracking and
Mapping
Large Scale Direct
Monocular SLAM
frame difference, sequence 1
frame difference, sequence 2
Direct Sparse
Odometry
Steinbrücker, Sturm, Cremers 2011
A related direct tracking approach was proposed for stereo
reconstruction in Comport, Malis, Rives, ICRA 2007. A
generalization which makes use of non-quadratic penalizers
was proposed in Kerl, Sturm, Cremers, ICRA 2013.
A Benchmark for RGB-D Tracking
Accurately tracking the camera is among the most central
challenges in computer vision. Quantitative performance of
algorithms can be validated on benchmarks.
Sturm, Engelhard, Endres, Burgard, Cremers, IROS 2012
A Benchmark for RGB-D Tracking
Sturm, Engelhard, Endres, Burgard, Cremers, IROS 2012
Combining Photometric and Geometric Consistency
Kerl, Sturm, Cremers, IROS 2013 propose an extension of the
RGB-D camera tracker which combines color consistency and
geometric consistency of subsequent RGB-D images.
Assuming that the vector r_i = (r_i^c, r_i^z) ∈ R² containing the color and geometric discrepancy for pixel i follows a bivariate t-distribution, the maximum likelihood pose estimate can be computed as:
    min_{ξ ∈ R^6} Σ_i w_i r_i^T Σ^{-1} r_i,

with weights w_i based on the Student t-distribution:

    w_i = (ν + 1) / (ν + r_i^T Σ^{-1} r_i).
This nonlinear weighted least squares problem can be solved
in an iteratively reweighted least squares manner by alternating
a Gauss-Newton style optimization with a re-estimation of the
weights wi and the matrix Σ.
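The alternation can be sketched for scalar residuals (i.e. Σ = 1), which reduces the weights to w_i = (ν + 1)/(ν + r_i²). The 1D location-estimation problem with one outlier is a hypothetical example:

```python
import numpy as np

def t_irls(r, J, x0, nu=5.0, n_iter=30):
    """Iteratively reweighted least squares with Student-t weights
    w_i = (nu + 1) / (nu + r_i^2): alternate a weighted Gauss-Newton
    step with a re-estimation of the weights."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        rx, Jx = r(x), J(x)
        w = (nu + 1.0) / (nu + rx ** 2)          # down-weight large residuals
        JW = Jx * w[:, None]
        x = x - np.linalg.solve(JW.T @ Jx, JW.T @ rx)
    return x

# Hypothetical example: estimate a location from noisy 1D data with an outlier.
data = np.array([1.0, 1.1, 0.9, 1.05, 10.0])     # last value is an outlier
r = lambda x: x[0] - data                         # residuals
J = lambda x: np.ones((len(data), 1))             # Jacobian of the residuals
x_hat = t_irls(r, J, x0=[0.0])
```

The outlier receives a weight near zero, so the estimate stays close to the inlier cluster around 1, while an unweighted least squares fit (the mean, ≈ 2.8) would be pulled far off.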
Loop Closure and Global Consistency
When tracking a camera over a longer period of time, errors
tend to accumulate. While a single room may still be mapped
more or less accurately, mapping a larger environment will lead
to increasing distortions: Corridors and walls will no longer be
straight but slightly curved.
A remedy is to introduce pose graph optimization and loop
closuring, a technique popularized in laser-based SLAM
systems. The key idea is to estimate the relative camera
motion ξ̂_ij for any camera pair i and j in a certain neighborhood.
Subsequently, one can determine a globally consistent camera
trajectory ξ = {ξi }i=1..T by solving the nonlinear least squares
problem
    min_ξ Σ_{i∼j} ( ξ̂_ij − ξ_i ∘ ξ_j^{-1} )^T Σ_ij^{-1} ( ξ̂_ij − ξ_i ∘ ξ_j^{-1} ),

where Σ_ij^{-1} denotes the uncertainty of measurement ξ̂_ij. This
problem can be solved using, for example, a
Levenberg-Marquardt algorithm.
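The structure of this problem can be illustrated with a hypothetical 1D pose graph, where poses are scalars, composition is addition, and a single loop-closure measurement corrects accumulated odometry drift (here the problem is even linear, so one weighted least squares solve suffices):

```python
import numpy as np

def pose_graph_1d(edges, n_poses, sigma):
    """Toy 1D pose graph: scalar poses x_i, relative measurements
    x_j - x_i ~ m_ij with std sigma; the gauge freedom is removed by a
    strongly weighted constraint x_0 = 0."""
    rows, rhs = [], []
    for i, j, m in edges:
        row = np.zeros(n_poses)
        row[i], row[j] = -1.0, 1.0
        rows.append(row / sigma)
        rhs.append(m / sigma)
    gauge = np.zeros(n_poses)
    gauge[0] = 1e6                      # pin x_0 = 0
    rows.append(gauge)
    rhs.append(0.0)
    return np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]

# Hypothetical example: odometry 0->1->2->3 drifts; the loop closure 3->0
# asserts a total displacement of 3.0 instead of the measured 3.1, so the
# 0.1 error is distributed evenly over the four edges.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.1), (3, 0, -3.0)]
x = pose_graph_1d(edges, n_poses=4, sigma=0.1)
```

On SE(3) or Sim(3) the residuals ξ̂_ij − ξ_i ∘ ξ_j^{-1} are nonlinear in the poses, which is why Levenberg-Marquardt is used instead of a single solve.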
Pose Graph Optimization and Loop Closure
Kerl, Sturm, Cremers, IROS 2013
Dense Tracking and Mapping
Newcombe, Lovegrove & Davison (ICCV 2011) propose an
algorithm which computes both the geometry of the scene and
the camera motion from a direct and dense algorithm.
They compute the inverse depth u = 1/h by minimizing a cost
function of the form
    min_u Σ_{i=2}^n ∫_Ω | I_1(x) − I_i( π g_i (x/u) ) | dx + λ ∫_Ω ρ(x) |∇u| dx,
for fixed camera motions gi . The function ρ introduces an
edge-dependent weighting assigning small weights in locations
where the input images exhibit strong gradients:
ρ(x) = exp (−|∇Iσ (x)|α ) .
The camera tracking is then performed with respect to the
textured reconstruction in a manner similar to Steinbrücker et
al. (2011). The method is initialized using feature point based
stereo.
Dense Tracking and Mapping
Newcombe, Lovegrove & Davison (ICCV 2011)
Large-Scale Direct Monocular SLAM
A method for real-time direct monocular SLAM is proposed in
Engel, Sturm, Cremers, ICCV 2013 and Engel, Schöps,
Cremers, ECCV 2014. It combines several contributions which
make it well-suited for robust large-scale monocular SLAM:
• Rather than tracking and putting into correspondence a
sparse set of feature points, the method estimates a
semi-dense depth map which associates an inverse depth
with each pixel that exhibits sufficient gray value variation.
• To account for noise and uncertainty each inverse depth
value is associated with an uncertainty which is
propagated and updated over time like in a Kalman filter.
• Since monocular SLAM is invariably defined up to scale
only, we explicitly facilitate scaling of the reconstruction by
modeling the camera motion using the Lie group of 3D
similarity transformations Sim(3).
• Global consistency is assured by loop closuring on Sim(3).
Tracking by Direct sim(3) Image Alignment
Since reconstructions from a monocular camera are only defined up to scale, Engel, Schöps, Cremers, ECCV 2014 account for rescaling of the environment by representing the camera motion as an element of the Lie group of 3D similarity transformations Sim(3), which is defined as:

    Sim(3) = { [ sR  T ; 0  1 ] | R ∈ SO(3), T ∈ R^3, s ∈ R^+ }.

One can minimize a nonlinear least squares problem

    min_{ξ ∈ sim(3)} Σ_i w_i r_i²(ξ),

where r_i denotes the color residuum across different images and w_i a weighting as suggested in Kerl et al., IROS 2013.

The above cost function can then be optimized by a weighted Gauss-Newton algorithm on the Lie group Sim(3):

    ξ^{(t+1)} = Δξ ∘ ξ^{(t)},   with Δξ = −(J^T W J)^{-1} J^T W r,   J = ∂r/∂ξ.
Large-Scale Direct Monocular SLAM
Engel, Schöps, Cremers, ECCV 2014
Large-Scale Direct Monocular SLAM
Engel, Schöps, Cremers, ECCV 2014
Towards Direct Sparse Odometry
Despite its popularity, LSD SLAM has several shortcomings:
• While the pose graph optimization makes it possible to impose global consistency, it merely performs a joint optimization of the extrinsic parameters associated with all keyframes. In contrast to a full bundle adjustment, it does not optimize the geometry. This is hard to do in realtime, in particular for longer sequences.
• LSD SLAM actually optimizes two different cost functions
for estimating geometry and camera motion.
• LSD SLAM introduces spatial regularity by a spatial
filtering of the inverse depth values. This creates
correlations among the geometry parameters which in turn
makes Gauss-Newton optimization difficult.
• LSD SLAM is based on the assumption of brightness
constancy. In real-world videos, brightness is often not
preserved. Due to varying exposure time, vignette and
gamma correction, the brightness can vary substantially.
While feature descriptors are often invariant to these
changes, the local brightness itself is not.
From Brightness Constancy to Irradiance Constancy
Brightness variations due to vignette, gamma correction and exposure time can be eliminated by a complete photometric calibration:

    I(x) = G( t V(x) B(x) ),
where the measured brightness I depends on the irradiance B,
the vignette V , the exposure time t and the camera response
function G (gamma function). G and V can be calibrated
beforehand, t can be read out from the camera.
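Given calibrated G and V and a known t, the model can be inverted to recover the irradiance B(x) = G^{-1}(I(x)) / (t V(x)). The gamma response and the vignette/irradiance values below are hypothetical:

```python
import numpy as np

def irradiance(I, t, V, G_inv):
    """Invert the image formation model I(x) = G(t * V(x) * B(x)):
    B(x) = G^{-1}(I(x)) / (t * V(x)). G_inv is the inverse response."""
    return G_inv(I) / (t * V)

# Hypothetical calibration: gamma response G(e) = e^(1/2.2), centered vignette.
gamma = 2.2
G = lambda e: e ** (1.0 / gamma)
G_inv = lambda i: i ** gamma
V = np.array([0.7, 0.9, 1.0, 0.9, 0.7])        # vignette falloff
B = np.array([0.2, 0.4, 0.6, 0.4, 0.2])        # true irradiance
t = 0.05                                        # exposure time
I = G(t * V * B)                                # measured brightness
B_rec = irradiance(I, t, V, G_inv)              # recovers B
```

Optimizing the photometric error on B instead of I makes the direct residual invariant to exposure and vignetting changes between frames.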
Engel, Koltun, Cremers, PAMI 2018
Windowed Joint Optimization
A complete bundle adjustment over longer sequences is
difficult to carry out in realtime because the number of 3D point
coordinates may grow very fast over time. Furthermore new
observations are likely to predominantly affect parameters
associated with neighboring structures and cameras. For a
given data set, one can study the connectivity graph, i.e. a
graph where each node represents an image and two nodes
are connected if they look at the same 3D structure.
Direct Sparse Odometry therefore reverts to a windowed joint
optimization, the idea being that from all 3D coordinates and
camera frames only those in a recent time window are
included. The remaining ones are marginalized out.
If one avoids spatial filtering and selects only a sparser subset of points, then the points can be assumed to be fairly independent. As a result the Hessian matrix becomes sparser and the Schur complement can be employed to make the Gauss-Newton updates more efficient.
Effects of Spatial Correlation on the Hessian
Engel, Koltun, Cremers, PAMI 2018
Effect of Spatial Correlation on the Hessian Matrix
geometry not correlated vs. geometry correlated
Engel, Koltun, Cremers, PAMI 2018
The Schur Complement Trick
Solving the Newton update step (called the normal equation)

    H x = [ H_αα  H_αβ ; H_αβ^T  H_ββ ] [ x_α ; x_β ] = [ g_α ; g_β ],

for the unknowns x_α and x_β is usually done by QR decomposition for large problems. In this case, however, H_ββ is typically block diagonal (and thus easy to invert). Left-multiplication with the matrix

    [ I  −H_αβ H_ββ^{-1} ; 0  I ]

leads to:

    [ S  0 ; H_αβ^T  H_ββ ] [ x_α ; x_β ] = [ g_α − H_αβ H_ββ^{-1} g_β ; g_β ],

where S = H_αα − H_αβ H_ββ^{-1} H_αβ^T is the Schur complement of H_ββ in H. It is symmetric, positive definite and block structured. The equation S x_α = g_α − H_αβ H_ββ^{-1} g_β is the reduced camera system.
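A direct numerical sketch of this elimination, using generic dense blocks instead of an actual bundle adjustment Hessian (the random system below is hypothetical):

```python
import numpy as np

def schur_solve(H_aa, H_ab, H_bb, g_a, g_b):
    """Solve [[H_aa, H_ab], [H_ab^T, H_bb]] [x_a; x_b] = [g_a; g_b] via the
    Schur complement S = H_aa - H_ab H_bb^{-1} H_ab^T. In bundle adjustment
    H_bb is block diagonal, so inverting it is cheap."""
    H_bb_inv = np.linalg.inv(H_bb)
    S = H_aa - H_ab @ H_bb_inv @ H_ab.T
    x_a = np.linalg.solve(S, g_a - H_ab @ H_bb_inv @ g_b)   # reduced camera system
    x_b = H_bb_inv @ (g_b - H_ab.T @ x_a)                   # back-substitution
    return x_a, x_b

# Hypothetical example: a small symmetric positive definite system.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
H = A @ A.T + 5 * np.eye(5)
g = rng.normal(size=5)
x_a, x_b = schur_solve(H[:2, :2], H[:2, 2:], H[2:, 2:], g[:2], g[2:])
```

The result agrees with solving the full system directly; the gain in practice is that only the much smaller reduced camera system needs a dense solve.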
Direct Sparse Odometry
Engel, Koltun, Cremers, PAMI 2018
Direct Sparse Odometry
Engel, Koltun, Cremers, PAMI 2018
Quantitative Evaluation
A quantitative comparison of Direct Sparse Odometry to the
state-of-the-art keypoint based technique ORB SLAM shows
substantial improvements in precision and robustness:
# of runs with a given error in translation, rotation and scale drift.
Engel, Koltun, Cremers, PAMI 2018
Chapter 9
Variational Methods: A Short Intro
Multiple View Geometry
Summer 2019

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

updated June 25, 2019 1/13
Overview
1 Variational Methods
2 Variational Image Smoothing
3 Euler-Lagrange Equation
4 Gradient Descent
5 Adaptive Smoothing
6 Euler and Lagrange
Variational Methods
Variational methods are a class of optimization methods. They are popular because they make it possible to solve many problems in a mathematically transparent manner. Instead of implementing a heuristic sequence of processing steps (as was commonly done in the 1980s), one clarifies beforehand what properties an 'optimal' solution should have.

Variational methods are particularly popular for infinite-dimensional problems and spatially continuous representations.
Particular applications are:
• Image denoising and image restoration
• Image segmentation
• Motion estimation and optical flow
• Spatially dense multiple view reconstruction
• Tracking
Advantages of Variational Methods
Variational methods have many advantages over heuristic
multi-step approaches (such as the Canny edge detector):
• A mathematical analysis of the considered cost function makes it possible to make statements on the existence and uniqueness of solutions.
• Approaches with multiple processing steps are difficult to modify. All steps rely on the input from a previous step, so exchanging one module for another typically requires re-engineering the entire processing pipeline.
• Variational methods make all modeling assumptions
transparent, there are no hidden assumptions.
• Variational methods typically have fewer tuning
parameters. In addition, the effect of respective
parameters is clear.
• Variational methods are easily fused – one simply adds
respective energies / cost functions.
Example: Variational Image Smoothing
Let f : Ω → R be a grayvalue input image on the domain Ω ⊂ R². We assume that the observed image arises from some 'true' image corrupted by additive noise. We are interested in a denoised version u of the input image f.
The approximation u should fulfill two properties:
• It should be as similar as possible to f .
• It should be spatially smooth (i.e. ’noise-free’).
Both of these criteria can be entered in a cost function of the
form
E(u) = Edata (u, f ) + Esmoothness (u)
The first term measures the similarity of f and u. The second
one measures the smoothness of the (hypothetical) function u.
Most variational approaches have the above form. They merely
differ in the specific form of the data (similarity) term and the
regularity (or smoothness) term.
Example: Variational Image Smoothing
For denoising a grayvalue image f : Ω ⊂ R2 → R, specific
examples of data and smoothness term are:
E_data(u, f) = ∫_Ω (u(x) − f(x))² dx,
and
E_smoothness(u) = ∫_Ω |∇u(x)|² dx,
where ∇ = (∂/∂x, ∂/∂y)ᵀ denotes the spatial gradient.
Minimizing the weighted sum of data and smoothness term
E(u) = ∫_Ω (u(x) − f(x))² dx + λ ∫_Ω |∇u(x)|² dx, λ > 0,
leads to a smooth approximation u : Ω → R of the input image.
Such energies, which assign a real value to a function, are
called functionals. How does one minimize functionals whose
argument is a function u(x) (rather than a finite number of
parameters)?
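The weighted energy above can be made concrete on a discrete pixel grid. The following sketch evaluates a discretized E(u), using forward differences for the gradient and plain sums in place of the integrals (a discretization choice of ours, not fixed by the slides):

```python
import numpy as np

def energy(u, f, lam):
    """Discrete E(u) = sum (u - f)^2 + lam * sum |grad u|^2 on a pixel grid."""
    data = ((u - f) ** 2).sum()
    ux = np.diff(u, axis=1)            # forward difference in x
    uy = np.diff(u, axis=0)            # forward difference in y
    smooth = (ux ** 2).sum() + (uy ** 2).sum()
    return data + lam * smooth
```

A constant image u = f has zero energy; any deviation from f or any spatial variation increases it.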
Functional Minimization & Euler-Lagrange Equation
• As a necessary condition for minimizers of a functional,
the associated Euler-Lagrange equation must hold. For a
functional of the form
E(u) = ∫ L(u, u') dx,
it is given by
dE/du = ∂L/∂u − d/dx (∂L/∂u') = 0
• The central idea of variational methods is therefore to
determine solutions of the Euler-Lagrange equation of a
given functional. For general non-convex functionals this is
a difficult problem.
• Another solution is to start with an (appropriate) function
u0 (x) and to modify it step by step such that in each
iteration the value of the functional is decreased. Such
methods are called descent methods.
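As a worked example, for the quadratic integrand L(u, u') = ½(u − f)² + (λ/2)|u'|² used in this chapter, the two ingredients of the Euler-Lagrange equation can be computed term by term:

```latex
\frac{\partial L}{\partial u} = u - f, \qquad
\frac{\partial L}{\partial u'} = \lambda u', \qquad
\frac{d}{dx}\frac{\partial L}{\partial u'} = \lambda u''
\quad\Longrightarrow\quad
\frac{dE}{du} = (u - f) - \lambda u'' = 0.
```

The negative of this expression, (f − u) + λu'', is exactly what drives the gradient descent evolution used for denoising.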
Gradient Descent
One specific descent method is called gradient descent or
steepest descent. The key idea is to start from an initialization
u(x, t = 0) and iteratively march in the direction of the negative
energy gradient.
For the class of functionals considered above, the gradient
descent is given by the following partial differential equation:
u(x, 0) = u₀(x),
∂u(x, t)/∂t = −dE/du = −∂L/∂u + d/dx (∂L/∂u').
Specifically, for L(u, u') = ½(u(x) − f(x))² + (λ/2)|u'(x)|² this
means:
∂u/∂t = (f − u) + λu''.
If the gradient descent evolution converges, i.e. ∂u/∂t = −dE/du = 0,
then we have found a solution of the Euler-Lagrange equation.
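A minimal sketch of this gradient descent for a 1D signal, using explicit Euler time stepping (the step size tau and the finite-difference scheme are our choices; tau must be small enough for the explicit scheme to be stable):

```python
import numpy as np

def denoise_gradient_descent(f, lam=1.0, tau=0.2, steps=500):
    """Iterate du/dt = (f - u) + lam * u'' starting from u(x, 0) = f(x).
    u'' is approximated by central differences with replicated boundaries."""
    u = f.astype(float).copy()
    for _ in range(steps):
        up = np.pad(u, 1, mode='edge')            # replicate boundary values
        u_xx = up[2:] - 2.0 * up[1:-1] + up[:-2]  # discrete second derivative
        u = u + tau * ((f - u) + lam * u_xx)
    return u
```

A constant signal is a fixed point of the iteration, while oscillating noise is damped toward the minimizer of the energy.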
Image Smoothing by Gradient Descent
[Figure: image smoothing by gradient descent, for E(u) = ∫(f − u)² dx + λ∫|∇u|² dx → min and for E(u) = ∫|∇u|² dx → min. Images: D. Cremers]
Discontinuity-preserving Smoothing
[Figure: smoothing with E(u) = ∫|∇u|² dx → min versus E(u) = ∫|∇u| dx → min. Images: D. Cremers]
Leonhard Euler
Leonhard Euler (1707 – 1783)
• Published 886 papers and books, most of these in the last
20 years of his life. He is generally considered the most
influential mathematician of the 18th century.
• Contributions: Euler number, Euler angle, Euler formula,
Euler theorem, Euler equations (for liquids),
Euler-Lagrange equations,...
• 13 children
Joseph-Louis Lagrange
Joseph-Louis Lagrange (1736 – 1813)
• born Giuseppe Lodovico Lagrangia (in Turin). Autodidact.
• At the age of 19: Chair for mathematics in Turin.
• Later worked in Berlin (1766-1787) and Paris (1787-1813).
• 1788: La Méchanique Analytique.
• 1800: Leçons sur le calcul des fonctions.
Chapter 10
Variational Multiview Reconstruction
Multiple View Geometry
Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich
Overview
1 Shape Representation and Optimization
2 Variational Multiview Reconstruction
3 Super-resolution Texture Reconstruction
4 Space-Time Reconstruction from Multiview Video
Shape Optimization
Shape optimization is a field of mathematics concerned with
formulating the estimation of geometric structures as
optimization problems.
Among the major challenges in this context is the question of
how to represent shape mathematically. The choice of
representation entails a number of consequences, in particular
regarding how efficiently one can store geometric structures
and how efficiently one can compute optimal geometry.
There exist numerous representations of shape, which can
loosely be grouped into two classes:
• Explicit representations: The points of a surface are
represented explicitly (directly), either as a set of points, a
polyhedron or a parameterized surface.
• Implicit representations: The surface is represented
implicitly by specifying which parts of the ambient space
are inside and which are outside a given surface.
Explicit Shape Representations
An explicit representation of a closed curve C ⊂ Rᵈ is a
mapping C : S¹ → Rᵈ from the circle S¹ to Rᵈ. Examples are
polygons or – more generally – spline curves:
C(s) = Σ_{i=1}^{N} C_i B_i(s),
where C₁, ..., C_N ∈ Rᵈ denote control points and
B₁, ..., B_N : S¹ → R denote a set of spline basis functions:
[Figure: spline basis functions; spline curve with control points.]
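A sketch of evaluating such a closed curve, choosing periodic linear ('hat') basis functions for concreteness (the slides leave the basis unspecified; higher-order B-splines would work the same way):

```python
import numpy as np

def hat_basis(i, s, N):
    """Periodic linear basis B_i on the circle, parameterized by s in [0, 1)."""
    t = (s * N - i) % N            # offset from knot i, in knot units
    t = np.minimum(t, N - t)       # shortest cyclic distance to knot i
    return np.maximum(0.0, 1.0 - t)

def spline_curve(C, s):
    """Evaluate C(s) = sum_i C_i B_i(s) for an array of parameters s."""
    N = len(C)
    return sum(hat_basis(i, s, N)[:, None] * C[i] for i in range(N))
```

With four control points, s = 0 returns C₁ exactly, and s halfway between two knots returns the midpoint of the two neighboring control points.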
Explicit Shape Representations
Splines can be extended from curves to surfaces or higher
dimensional structures. A spline surface S ⊂ Rd can be
defined as:
S(s, t) = Σ_{i,j} C_{i,j} B_i(s) B_j(t),
where C_{i,j} ∈ Rᵈ denote control points and
B₁, ..., B_N : [0, 1] → R denote a set of spline basis functions.
Depending on whether the surface is closed or open these
basis functions will have a cyclic nature (as below) or not:
[Figure: spline basis functions; spline surface with control points.]
Implicit Shape Representations
One example of an implicit representation is the indicator
function of the surface S, which is a function u : V → {0, 1}
defined on the surrounding volume V ⊂ R3 that takes on the
values 1 inside the surface and 0 outside the surface:
u(x) = { 1, if x ∈ int(S); 0, if x ∈ ext(S) }
Another example is the signed distance function φ : V → R
which assigns all points in the surrounding volume the (signed)
distance from the surface S:
φ(x) = { +d(x, S), if x ∈ int(S); −d(x, S), if x ∈ ext(S) }
Depending on the application it may be useful to know for
every voxel how far it is from the surface. Signed distance
functions can be computed in polynomial time. MatLab: bwdist.
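A brute-force sketch of computing a signed distance function on a small binary occupancy grid. Here the distance to the surface is approximated by the distance to the nearest voxel of the opposite region (a voxel-level approximation; in practice one would use an exact Euclidean distance transform such as MatLab's bwdist, which runs in polynomial time as noted above):

```python
import numpy as np

def signed_distance(u):
    """phi(x): +distance to nearest outside voxel if u(x) = 1 (inside),
    -distance to nearest inside voxel if u(x) = 0 (outside).
    Assumes both regions are non-empty; brute force, for small grids only."""
    inside = np.argwhere(u == 1).astype(float)
    outside = np.argwhere(u == 0).astype(float)
    phi = np.zeros(u.shape)
    for idx in np.ndindex(u.shape):
        # distance to the nearest voxel of the opposite region
        other = outside if u[idx] == 1 else inside
        d = np.sqrt(((other - np.array(idx)) ** 2).sum(axis=1)).min()
        phi[idx] = d if u[idx] == 1 else -d
    return phi
```

For a 3×3 square of ones centered in a 5×5 grid, the center voxel gets distance +2 and the corner of the grid gets −√2.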
Explicit Versus Implicit Representations
In general, compared to explicit representations, implicit
representations have the following strengths and weaknesses:
- Implicit representations typically require more memory in
order to represent a geometric structure at a specific
resolution. Rather than storing a few points along the
curve or surface, one needs to store an occupancy value
for each volume element.
- Moving or updating an implicit representation is typically
slower: rather than moving a few control points, one needs
to update the occupancy of all volume elements.
+ Methods based on implicit representations do not depend
on a choice of parameterization.
+ Implicit representations make it possible to represent
objects of arbitrary topology (i.e. the number of holes is arbitrary).
+ With respect to an implicit representation, many shape
optimization challenges can be formulated as convex
optimization problems and can then be optimized globally.
Multiview Reconstruction as Shape Optimization
How can we cast multiple view reconstruction as a shape
optimization problem? To this end, we will assume that the
camera orientations are given.
Rather than estimating the correspondence between all pairs
of pixels in the images, we will simply ask:
How likely is it that a given voxel x lies on the object surface S?
If the voxel x ∈ V of the given volume V ⊂ R³ were on the
surface then (up to visibility issues) the projection of that voxel
into each image should give rise to the same color (or local
texture). Thus we can assign to each voxel x ∈ V a so-called
photoconsistency function
ρ : V → [0, 1],
which takes on low values (near 0) if the projected voxels give
rise to the same color (or local texture) and high values (near
1) otherwise.
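The slides do not fix a particular photoconsistency measure; below is one simple variance-based sketch. The helper `project` is a hypothetical stand-in for projecting a voxel into an image and sampling its color; visibility is ignored:

```python
import numpy as np

def photoconsistency(x, images, project):
    """rho(x) in [0, 1): near 0 if the voxel projects to similar colors
    in all images, larger otherwise. `project(image, x)` is assumed to
    return the sampled color at the voxel's projection (hypothetical)."""
    colors = np.array([project(img, x) for img in images])
    var = colors.var(axis=0).mean()        # mean per-channel color variance
    return float(1.0 - np.exp(-var))       # monotone squashing into [0, 1)
```

Identical samples across all views give rho = 0; disagreeing colors push rho toward 1.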
A Weighted Minimal Surface Approach
The reconstruction from multiple views can now be formulated
as finding the maximally photoconsistent surface, i.e. a surface
Sopt with an overall minimal photoconsistency score:
S_opt = arg min_S ∫_S ρ(s) ds.    (1)
This seminal formulation was proposed among others by
Faugeras & Keriven (1998). Many good reconstructions were
computed by starting from an initial guess of S and locally
minimizing this energy using gradient descent. But can we
compute the global minimum?
The above energy has a central drawback:
The global minimizer of (1) is the empty set.
It has zero cost while all surfaces have a non-negative energy.
This shortcoming of minimal surface formulations is often
called the shrinking bias. How can we prevent the empty set?
Imposing Silhouette Consistency
Assume that we additionally have the silhouette Si of the
observed 3D object outlined in every image i = 1, . . . , n. Then
we can formulate the reconstruction problem as a constrained
optimization problem (Cremers, Kolev, PAMI 2011):
min_S ∫_S ρ(s) ds, such that π_i(S) = S_i ∀i = 1, ..., n.
Written in terms of the indicator function u : V → {0, 1} of the
surface S, this reads:
min_{u : V → {0,1}} ∫_V ρ(x)|∇u(x)| dx
s.t. ∫_{R_ij} u(x) dR_ij ≥ 1, if j ∈ S_i    (∗)
     ∫_{R_ij} u(x) dR_ij = 0, if j ∉ S_i,
where Rij denotes the visual ray through pixel j of image i.
Imposing Silhouette Consistency
Top view of the geometry and respective visual rays.
Any ray passing through the silhouette must intersect the object
in at least one voxel.
Any ray passing outside the silhouette must not intersect the
object in any voxel.
Cremers, Kolev, PAMI 2011
Convex Relaxation and Thresholding
By relaxing the binary constraint on u and allowing
intermediate values between 0 and 1, the optimization
problem (∗) becomes convex.
Proposition
The set
D := { u : V → [0, 1] | ∫_{R_ij} u(x) dR_ij ≥ 1 if j ∈ S_i, ∫_{R_ij} u(x) dR_ij = 0 if j ∉ S_i, ∀i, j }
of silhouette consistent functions is convex.
Proof.
For a proof we refer to Kolev, Cremers, ECCV 2008.
Thus we can compute solutions to the silhouette constrained
reconstruction problem by solving the relaxed convex problem
and subsequently thresholding the computed solution.
Reconstructing Complex Geometry
3 out of 33 input images of resolution 1024 × 768
Data courtesy of Y. Furukawa.
Reconstructing Complex Geometry
Estimated multiview reconstruction
Cremers, Kolev, PAMI 2011
Reconstruction from a Handheld Camera
2 of 28 input images
Estimated multiview reconstruction
Cremers, Kolev, PAMI 2011
Multi-view Texture Reconstruction
In addition to the dense geometry S, we can also recover the
texture T : S → R³ of the object from the images I_i : Ω_i → R³.
Rather than simply back-projecting the respective images onto
the surface, Goldlücke & Cremers, ICCV 2009, suggest solving
a variational super-resolution problem of the form:
min_{T : S → R³} Σ_{i=1}^{n} ∫_{Ω_i} (b ∗ T ∘ π_i⁻¹ − I_i)² dx + λ ∫_S ‖∇_S T‖ ds,
where b is a linear operator representing blurring and
downsampling and π_i denotes the projection onto the image
domain Ω_i.
Multi-view Texture Reconstruction
The super-resolution texture estimation is a convex
optimization problem which can be solved efficiently. It
generates a textured model of the object which cannot be
distinguished from the original:
One of 36 input images
textured 3D model
Goldlücke, Cremers, ICCV 2009, DAGM 2009
Multi-view Texture Reconstruction
The super-resolution approach exploits the fact that every
surface patch is observed in multiple images. This makes it
possible to invert the blurring and downsampling, providing a
high-resolution texture that is sharper than the individual input
images:
input image close-up
super-resolution texture
Goldlücke, Cremers, ICCV 2009, DAGM 2009
Space-Time Reconstruction from Multi-view Video
Although laser-based reconstruction is often more accurate
and more reliable (in the absence of texture), image-based
reconstruction has two advantages:
• One can extract both the geometry and the color of the objects.
• One can reconstruct actions over time filmed with multiple
synchronized cameras.
Oswald & Cremers, 4DMOD 2013, and Oswald, Stühmer,
Cremers, ECCV 2014, propose convex variational approaches
for dense space-time reconstruction from multi-view video.
1 of 16 input videos
Dense reconstructions over time
Oswald, Stühmer, Cremers, ECCV 2014
Toward Free-Viewpoint Television
Space-time action reconstructions as done in Oswald &
Cremers 2013 entail many fascinating applications, including:
• For video conferencing, one can transmit a full 3D model of
a speaker, which gives a stronger sense of presence and
immersion.
• For sports analysis one can analyze the precise motion of
a gymnast.
• For free viewpoint television, the spectator at home can
interactively choose from which viewpoint to follow an
action.
Textured action reconstruction for free-viewpoint television
Oswald, Cremers, 4DMOD 2013