Chapter 1: Mathematical Background: Linear Algebra
Multiple View Geometry, Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics, Technical University of Munich
(updated May 6, 2019)

Overview
1. Vector Spaces
2. Linear Transformations and Matrices
3. Properties of Matrices
4. Singular Value Decomposition

Vector Space (Vektorraum)

A set $V$ is called a linear space or a vector space over the field $\mathbb{R}$ if it is closed under vector summation
$$+ : V \times V \to V$$
and under scalar multiplication
$$\cdot : \mathbb{R} \times V \to V,$$
i.e. $\alpha v_1 + \beta v_2 \in V$ for all $v_1, v_2 \in V$ and all $\alpha, \beta \in \mathbb{R}$.

With respect to addition $(+)$ it forms a commutative group (existence of a neutral element $0$ and of inverse elements $-v$). Scalar multiplication respects the structure of $\mathbb{R}$: $\alpha(\beta u) = (\alpha\beta)u$. Multiplication and addition respect the distributive laws: $(\alpha + \beta)v = \alpha v + \beta v$ and $\alpha(v + u) = \alpha v + \alpha u$.

Example: $V = \mathbb{R}^n$, $v = (x_1, \ldots, x_n)^\top$.

A subset $W \subset V$ of a vector space $V$ is called a subspace if $0 \in W$ and $W$ is closed under $+$ and $\cdot$ (for all $\alpha \in \mathbb{R}$).

Linear Independence and Basis

The spanned subspace of a set of vectors $S = \{v_1, \ldots, v_k\} \subset V$ is the subspace formed by all linear combinations of these vectors:
$$\mathrm{span}(S) = \Big\{ v \in V \;\Big|\; v = \sum_{i=1}^k \alpha_i v_i \Big\}.$$
The set $S$ is called linearly independent if
$$\sum_{i=1}^k \alpha_i v_i = 0 \;\Rightarrow\; \alpha_i = 0 \;\;\forall i,$$
in other words: if none of the vectors can be expressed as a linear combination of the remaining ones. Otherwise the set is called linearly dependent.

A set of vectors $B = \{v_1, \ldots, v_n\}$ is called a basis of $V$ if it is linearly independent and if it spans the vector space $V$. A basis is a maximal set of linearly independent vectors.

Properties of a Basis

Let $B$ and $B'$ be two bases of a linear space $V$.
1. $B$ and $B'$ contain the same number of vectors. This number $n$ is called the dimension of the space $V$.
2. Any vector $v \in V$ can be uniquely expressed as a linear combination of the basis vectors in $B = \{b_1, \ldots, b_n\}$:
$$v = \sum_{i=1}^n \alpha_i b_i.$$
3. In particular, all vectors of $B$ can be expressed as linear combinations of the vectors of another basis $b_i' \in B'$:
$$b_i' = \sum_{j=1}^n \alpha_{ji} b_j.$$
The coefficients $\alpha_{ji}$ of this basis transform can be combined into a matrix $A$. Setting $B \equiv (b_1, \ldots, b_n)$ and $B' \equiv (b_1', \ldots, b_n')$ as the matrices of basis vectors, we can write
$$B' = BA \;\Leftrightarrow\; B = B'A^{-1}.$$
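The basis transform is easy to check numerically. A minimal MATLAB sketch with made-up example matrices: if $\alpha'$ are the coordinates of a vector with respect to the new basis $B'$, then $\alpha = A\alpha'$ are its coordinates with respect to the old basis $B$.

    % Two bases of R^3 (stored as column matrices), related by B' = B*A.
    B  = eye(3);                    % old basis (canonical)
    A  = [1 1 0; 0 1 1; 0 0 1];     % invertible coefficient matrix
    Bp = B * A;                     % new basis B'
    alpha_p = [1; 2; 3];            % coordinates w.r.t. B'
    v1 = Bp * alpha_p;              % the vector itself
    v2 = B * (A * alpha_p);         % same vector via old-basis coordinates
    disp(norm(v1 - v2))             % 0 up to round-off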
Inner Product

On a vector space one can define an inner product (dot product; German: Skalarprodukt, not to be confused with skalare Multiplikation)
$$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R},$$
which is defined by three properties:
1. $\langle u, \alpha v + \beta w \rangle = \alpha \langle u, v \rangle + \beta \langle u, w \rangle$ (linear)
2. $\langle u, v \rangle = \langle v, u \rangle$ (symmetric)
3. $\langle v, v \rangle \ge 0$, and $\langle v, v \rangle = 0 \Leftrightarrow v = 0$ (positive definite)

The scalar product induces a norm
$$|\cdot| : V \to \mathbb{R}, \quad |v| = \sqrt{\langle v, v \rangle},$$
and a metric
$$d : V \times V \to \mathbb{R}, \quad d(v, w) = |v - w| = \sqrt{\langle v - w, v - w \rangle},$$
for measuring lengths and distances, making $V$ a metric space. Since the metric is induced by a scalar product, $V$ is called a Hilbert space.

Canonical and Induced Inner Product

On $V = \mathbb{R}^n$, one can define the canonical inner product for the canonical basis $B = I_n$ as
$$\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i,$$
which induces the standard $L^2$-norm or Euclidean norm
$$|x|_2 = \sqrt{x^\top x} = \sqrt{x_1^2 + \cdots + x_n^2}.$$
With a basis transform $A$ to the new basis $B'$ given by $I = B'A^{-1}$, the canonical inner product in the new coordinates $x', y'$ is given by
$$\langle x, y \rangle = x^\top y = (Ax')^\top (Ay') = x'^\top A^\top A\, y' \equiv \langle x', y' \rangle_{A^\top A}.$$
The latter product is called the induced inner product from the matrix $A$. Two vectors $v$ and $w$ are orthogonal iff $\langle v, w \rangle = 0$.

Kronecker Product and Stack of a Matrix

Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{k \times l}$, one can define their Kronecker product $A \otimes B$ by
$$A \otimes B \equiv \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix} \in \mathbb{R}^{mk \times nl}.$$
In Matlab this can be implemented by C=kron(A,B).

Given a matrix $A \in \mathbb{R}^{m \times n}$, its stack $A^s$ is obtained by stacking its $n$ column vectors $a_1, \ldots, a_n \in \mathbb{R}^m$:
$$A^s \equiv \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \in \mathbb{R}^{mn}.$$
These notations allow us to rewrite algebraic expressions, for example
$$u^\top A\, v = (v \otimes u)^\top A^s.$$
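This last identity is easy to verify numerically: in MATLAB, the stack $A^s$ is simply A(:). A minimal sketch with random example data:

    % Verify u'*A*v == (v kron u)' * A(:) for random data.
    A = randn(3,2); u = randn(3,1); v = randn(2,1);
    lhs = u' * A * v;
    rhs = kron(v, u)' * A(:);    % A(:) stacks the columns of A
    disp(abs(lhs - rhs))         % ~1e-16, i.e. zero up to round-off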
Linear Transformations and Matrices

Linear algebra studies the properties of linear transformations between linear spaces. Since these can be represented by matrices, linear algebra studies the properties of matrices.

A linear transformation $L$ between two linear spaces $V$ and $W$ is a map $L : V \to W$ such that:
• $L(x + y) = L(x) + L(y)$ for all $x, y \in V$,
• $L(\alpha x) = \alpha L(x)$ for all $x \in V$, $\alpha \in \mathbb{R}$.

Due to linearity, the action of $L$ on the space $V$ is uniquely defined by its action on the basis vectors of $V$. In the canonical basis $\{e_1, \ldots, e_n\}$ we have
$$L(x) = Ax \quad \forall x \in V, \qquad \text{where } A = (L(e_1), \ldots, L(e_n)) \in \mathbb{R}^{m \times n}.$$
The set of all real $m \times n$-matrices is denoted by $\mathcal{M}(m, n)$. In the case $m = n$, the set $\mathcal{M}(m, n) \equiv \mathcal{M}(n)$ forms a ring over the field $\mathbb{R}$, i.e. it is closed under matrix multiplication and summation.

The Linear Groups GL(n) and SL(n)

There exist certain sets of linear transformations which form a group. A group is a set $G$ with an operation $\circ : G \times G \to G$ such that:
1. closed: $g_1 \circ g_2 \in G$ for all $g_1, g_2 \in G$,
2. associative: $(g_1 \circ g_2) \circ g_3 = g_1 \circ (g_2 \circ g_3)$ for all $g_1, g_2, g_3 \in G$,
3. neutral element: there exists $e \in G$ with $e \circ g = g \circ e = g$ for all $g \in G$,
4. inverse: for every $g \in G$ there exists $g^{-1} \in G$ with $g \circ g^{-1} = g^{-1} \circ g = e$.

Example: All invertible (non-singular) real $n \times n$-matrices form a group with respect to matrix multiplication. This group is called the general linear group $GL(n)$. It consists of all $A \in \mathcal{M}(n)$ for which $\det(A) \ne 0$.

All matrices $A \in GL(n)$ with $\det(A) = 1$ form a group called the special linear group $SL(n)$. The inverse of $A$ is also in this group, since $\det(A^{-1}) = \det(A)^{-1}$.

Matrix Representation of Groups

A group $G$ has a matrix representation (German: Darstellung), or can be realized as a matrix group, if there exists an injective transformation
$$R : G \to GL(n)$$
which preserves the group structure of $G$, that is, inverse and composition are preserved by the map:
$$R(e) = I_{n \times n}, \qquad R(g \circ h) = R(g)R(h) \quad \forall g, h \in G.$$
Such a map $R$ is called a group homomorphism (German: Homomorphismus).

The idea of matrix representations of a group is that they allow us to analyze more abstract groups by looking at the properties of the respective matrix group. Example: The rotations of an object form a group, as there exists a neutral element (no rotation) and an inverse (the inverse rotation), and any concatenation of rotations is again a rotation (around a different axis). Studying the properties of the rotation group is easier if rotations are represented by respective matrices.

The Affine Group A(n)

An affine transformation $L : \mathbb{R}^n \to \mathbb{R}^n$ is defined by a matrix $A \in GL(n)$ and a vector $b \in \mathbb{R}^n$ such that
$$L(x) = Ax + b.$$
The set of all such affine transformations is called the affine group of dimension $n$, denoted by $A(n)$.

$L$ defined above is not a linear map unless $b = 0$. By introducing homogeneous coordinates to represent $x \in \mathbb{R}^n$ by $\binom{x}{1} \in \mathbb{R}^{n+1}$, $L$ becomes a linear mapping
$$L : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}; \quad \binom{x}{1} \mapsto \begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix} \binom{x}{1}.$$
A matrix $\begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix}$ with $A \in GL(n)$ and $b \in \mathbb{R}^n$ is called an affine matrix. It is an element of $GL(n+1)$. The affine matrices form a subgroup of $GL(n+1)$. Why?

The Orthogonal Group O(n)

A matrix $A \in \mathcal{M}(n)$ is called orthogonal if it preserves the inner product, i.e.
$$\langle Ax, Ay \rangle = \langle x, y \rangle \quad \forall x, y \in \mathbb{R}^n.$$
The set of all orthogonal matrices forms the orthogonal group $O(n)$, which is a subgroup of $GL(n)$. For an orthogonal matrix $R$ we have
$$\langle Rx, Ry \rangle = x^\top R^\top R y = x^\top y \quad \forall x, y \in \mathbb{R}^n.$$
Therefore we must have $R^\top R = R R^\top = I$, in other words
$$O(n) = \{ R \in GL(n) \mid R^\top R = I \}.$$
The above identity shows that for any orthogonal matrix $R$ we have $\det(R^\top R) = (\det(R))^2 = \det(I) = 1$, such that $\det(R) \in \{\pm 1\}$.

The subgroup of $O(n)$ with $\det(R) = +1$ is called the special orthogonal group $SO(n)$: $SO(n) = O(n) \cap SL(n)$. In particular, $SO(3)$ is the group of all 3-dimensional rotation matrices.
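As a quick numerical illustration of these defining properties, consider a rotation about the z-axis (the angle 0.3 rad is an arbitrary example value):

    % Check O(3)/SO(3) membership of a z-axis rotation.
    t = 0.3;
    R = [cos(t) -sin(t) 0; sin(t) cos(t) 0; 0 0 1];
    disp(norm(R'*R - eye(3)))   % ~0: R is orthogonal
    disp(det(R))                % +1: hence R is in SO(3)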
The Euclidean Group E(n)

A Euclidean transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is defined by an orthogonal matrix $R \in O(n)$ and a vector $T \in \mathbb{R}^n$:
$$L : \mathbb{R}^n \to \mathbb{R}^n; \quad x \mapsto Rx + T.$$
The set of all such transformations is called the Euclidean group $E(n)$. It is a subgroup of the affine group $A(n)$. Embedded by homogeneous coordinates, we get
$$E(n) = \left\{ \begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix} \;\middle|\; R \in O(n),\; T \in \mathbb{R}^n \right\}.$$
If $R \in SO(n)$ (i.e. $\det(R) = 1$), we obtain the special Euclidean group $SE(n)$. In particular, $SE(3)$ represents the rigid-body motions (German: Starrkörpertransformationen) in $\mathbb{R}^3$.

In summary:
$$SO(n) \subset O(n) \subset GL(n), \qquad SE(n) \subset E(n) \subset A(n) \subset GL(n+1).$$

Range, Span, Null Space and Kernel

Let $A \in \mathbb{R}^{m \times n}$ be a matrix defining a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. The range or span of $A$ (German: Bild) is defined as the subspace of $\mathbb{R}^m$ which can be 'reached' by $A$:
$$\mathrm{range}(A) = \{ y \in \mathbb{R}^m \mid \exists x \in \mathbb{R}^n : Ax = y \}.$$
The range of a matrix $A$ is given by the span of its column vectors.

The null space or kernel of a matrix $A$ (German: Kern) is given by the subset of vectors $x \in \mathbb{R}^n$ which are mapped to zero:
$$\mathrm{null}(A) \equiv \ker(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \}.$$
The null space of a matrix $A$ is given by the vectors orthogonal to its row vectors. Matlab: Z=null(A).

The concepts of range and null space are useful when studying the solution of linear equations. The system $Ax = b$ will have a solution $x \in \mathbb{R}^n$ if and only if $b \in \mathrm{range}(A)$. Moreover, this solution will be unique only if $\ker(A) = \{0\}$. Indeed, if $x_s$ is a solution of $Ax = b$ and $x_o \in \ker(A)$, then $x_s + x_o$ is also a solution: $A(x_s + x_o) = Ax_s + Ax_o = b$.

Rank of a Matrix

The rank of a matrix (German: Rang) is the dimension of its range:
$$\mathrm{rank}(A) = \dim(\mathrm{range}(A)).$$
The rank of a matrix $A \in \mathbb{R}^{m \times n}$ has the following properties:
1. $\mathrm{rank}(A) = n - \dim(\ker(A))$.
2. $0 \le \mathrm{rank}(A) \le \min\{m, n\}$.
3. $\mathrm{rank}(A)$ is equal to the maximum number of linearly independent row (or column) vectors of $A$.
4. $\mathrm{rank}(A)$ is the highest order of a nonzero minor of $A$, where a minor of order $k$ is the determinant of a $k \times k$ submatrix of $A$.
5. Sylvester's inequality: Let $B \in \mathbb{R}^{n \times k}$. Then $AB \in \mathbb{R}^{m \times k}$ and
$$\mathrm{rank}(A) + \mathrm{rank}(B) - n \;\le\; \mathrm{rank}(AB) \;\le\; \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}.$$
6. For any nonsingular matrices $C \in \mathbb{R}^{m \times m}$ and $D \in \mathbb{R}^{n \times n}$, we have $\mathrm{rank}(A) = \mathrm{rank}(CAD)$.
Matlab: d=rank(A).
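The solvability and uniqueness criteria above can be probed directly with rank and null. A small MATLAB sketch with a rank-deficient example matrix:

    % A has rank 2; Ax = b is solvable iff b lies in range(A).
    A = [1 0 1; 0 1 1; 1 1 2];            % third row = sum of the first two
    b1 = A * [1; 2; 0];                   % in range(A) by construction
    disp(rank([A b1]) == rank(A))         % 1 (true): solvable
    b2 = [1; 0; 0];                       % not in range(A) for this example
    disp(rank([A b2]) == rank(A))         % 0 (false): no solution
    Z  = null(A);                         % kernel of A
    xs = pinv(A) * b1;                    % one particular solution
    disp(norm(A * (xs + 3*Z) - b1))       % ~0: the solution is non-unique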
Eigenvalues and Eigenvectors

Let $A \in \mathbb{C}^{n \times n}$ be a complex matrix. A non-zero vector $v \in \mathbb{C}^n$ is called a (right) eigenvector of $A$ if
$$Av = \lambda v, \quad \text{with } \lambda \in \mathbb{C}.$$
$\lambda$ is called an eigenvalue of $A$. Similarly, $v$ is called a left eigenvector of $A$ if $v^\top A = \lambda v^\top$ for some $\lambda \in \mathbb{C}$. The spectrum $\sigma(A)$ of a matrix $A$ is the set of all its eigenvalues.

Matlab: [V,D]=eig(A); where D is a diagonal matrix containing the eigenvalues and V is a matrix whose columns are the corresponding eigenvectors, such that AV=VD.

Properties of Eigenvalues and Eigenvectors

Let $A \in \mathbb{R}^{n \times n}$ be a square matrix. Then:
1. If $Av = \lambda v$ for some $\lambda \in \mathbb{R}$, then there also exists a left eigenvector $\eta \in \mathbb{R}^n$: $\eta^\top A = \lambda \eta^\top$. Hence $\sigma(A) = \sigma(A^\top)$.
2. The eigenvectors of a matrix $A$ associated with different eigenvalues are linearly independent.
3. All eigenvalues $\sigma(A)$ are the roots of the characteristic polynomial $\det(\lambda I - A) = 0$. Therefore $\det(A)$ is equal to the product of all eigenvalues (some of which may appear multiple times).
4. If $B = PAP^{-1}$ for some nonsingular matrix $P$, then $\sigma(B) = \sigma(A)$.
5. If $\lambda \in \mathbb{C}$ is an eigenvalue, then its conjugate $\bar{\lambda}$ is also an eigenvalue. Thus $\sigma(A) = \overline{\sigma(A)}$ for real matrices $A$.

Symmetric Matrices

A matrix $S \in \mathbb{R}^{n \times n}$ is called symmetric if $S^\top = S$. A symmetric matrix $S$ is called positive semi-definite (denoted by $S \ge 0$ or $S \succeq 0$) if $x^\top S x \ge 0$ for all $x$. $S$ is called positive definite (denoted by $S > 0$ or $S \succ 0$) if $x^\top S x > 0$ for all $x \ne 0$.

Let $S \in \mathbb{R}^{n \times n}$ be a real symmetric matrix. Then:
1. All eigenvalues of $S$ are real, i.e. $\sigma(S) \subset \mathbb{R}$.
2. Eigenvectors $v_i$ and $v_j$ of $S$ corresponding to distinct eigenvalues $\lambda_i \ne \lambda_j$ are orthogonal.
3. There always exist $n$ orthonormal eigenvectors of $S$ which form a basis of $\mathbb{R}^n$. Let $V = (v_1, \ldots, v_n) \in O(n)$ be the orthogonal matrix of these eigenvectors, and $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$ the diagonal matrix of eigenvalues. Then $S = V \Lambda V^\top$.
4. $S$ is positive (semi-)definite if all eigenvalues are positive (nonnegative).
5. Let $S$ be positive semi-definite and $\lambda_1, \lambda_n$ the largest and smallest eigenvalue. Then $\lambda_1 = \max_{|x|=1} \langle x, Sx \rangle$ and $\lambda_n = \min_{|x|=1} \langle x, Sx \rangle$.

Norms of Matrices

There are many ways to define norms on the space of matrices $A \in \mathbb{R}^{m \times n}$. They can be defined based on norms on the domain or codomain spaces on which $A$ operates. In particular, the induced 2-norm of a matrix $A$ is defined as
$$\|A\|_2 \equiv \max_{|x|_2 = 1} |Ax|_2 = \max_{|x|_2 = 1} \sqrt{\langle x, A^\top A x \rangle}.$$
Alternatively, one can define the Frobenius norm of $A$ as
$$\|A\|_f \equiv \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\mathrm{trace}(A^\top A)}.$$
Note that these norms are in general not the same. Since the matrix $A^\top A$ is symmetric and positive semi-definite, we can diagonalize it as $A^\top A = V\, \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}\, V^\top$ with $\sigma_1^2 \ge \sigma_i^2 \ge 0$. This leads to
$$\|A\|_2 = \sigma_1, \qquad \|A\|_f = \sqrt{\mathrm{trace}(A^\top A)} = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}.$$

Skew-symmetric Matrices

A matrix $A \in \mathbb{R}^{n \times n}$ is called skew-symmetric or anti-symmetric (German: schiefsymmetrisch) if $A^\top = -A$. If $A$ is a real skew-symmetric matrix, then:
1. All eigenvalues of $A$ are either zero or purely imaginary, i.e. of the form $i\omega$ with $i^2 = -1$, $\omega \in \mathbb{R}$.
2. There exists an orthogonal matrix $V$ such that $A = V \Lambda V^\top$, where $\Lambda$ is a block-diagonal matrix $\Lambda = \mathrm{diag}\{A_1, \ldots, A_m, 0, \ldots, 0\}$ with real skew-symmetric blocks of the form
$$A_i = \begin{pmatrix} 0 & a_i \\ -a_i & 0 \end{pmatrix} \in \mathbb{R}^{2 \times 2}, \quad i = 1, \ldots, m.$$
In particular, the rank of any skew-symmetric matrix is even.

Examples of Skew-symmetric Matrices

In computer vision, a common skew-symmetric matrix is given by the hat operator of a vector $u \in \mathbb{R}^3$:
$$\hat{u} = \begin{pmatrix} 0 & -u_3 & u_2 \\ u_3 & 0 & -u_1 \\ -u_2 & u_1 & 0 \end{pmatrix} \in \mathbb{R}^{3 \times 3}.$$
This is a linear operator from the space of vectors $\mathbb{R}^3$ to the space of skew-symmetric matrices in $\mathbb{R}^{3 \times 3}$. In particular, the matrix $\hat{u}$ has the property that
$$\hat{u} v = u \times v,$$
where $\times$ denotes the standard vector cross product in $\mathbb{R}^3$. For $u \ne 0$, we have $\mathrm{rank}(\hat{u}) = 2$, and the null space of $\hat{u}$ is spanned by $u$, because $\hat{u} u = u^\top \hat{u} = 0$.
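A minimal MATLAB sketch of the hat operator (the helper name hat is our own shorthand, not fixed by the lecture):

    % Hat operator: u in R^3 -> skew-symmetric 3x3 matrix.
    hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];
    u = [1; 2; 3]; v = [4; 5; 6];
    disp(norm(hat(u)*v - cross(u,v)))   % ~0: u_hat * v equals u x v
    disp(rank(hat(u)))                  % 2 for u ~= 0
    disp(norm(hat(u)*u))                % 0: the null space is spanned by u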
The Singular Value Decomposition (SVD)

In the preceding slides, we studied many properties of matrices, such as rank, range, null space, and induced norms. Many of these properties can be captured by the so-called singular value decomposition (SVD).

SVD can be seen as a generalization of eigenvalues and eigenvectors to non-square matrices. The computation of the SVD is numerically well-conditioned. It is very useful for solving linear-algebraic problems such as matrix inversion, rank computation, linear least-squares estimation, projections, and fixed-rank approximations. In practice, both singular value decompositions and eigenvalue decompositions are used quite extensively.

Algebraic Derivation of SVD

Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ be a matrix of $\mathrm{rank}(A) = p$. Then there exist
• $U \in \mathbb{R}^{m \times p}$ whose columns are orthonormal,
• $V \in \mathbb{R}^{n \times p}$ whose columns are orthonormal, and
• $\Sigma \in \mathbb{R}^{p \times p}$, $\Sigma = \mathrm{diag}\{\sigma_1, \ldots, \sigma_p\}$, with $\sigma_1 \ge \cdots \ge \sigma_p$,
such that $A = U \Sigma V^\top$.

Note that this generalizes the eigenvalue decomposition. While the latter decomposes a symmetric square matrix $A$ with an orthogonal transformation $V$ as
$$A = V \Lambda V^\top, \quad \text{with } V \in O(n),\; \Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_n\},$$
SVD allows us to decompose an arbitrary (non-square) matrix $A$ of rank $p$ with two transformations $U$ and $V$ with orthonormal columns, as shown above. Nevertheless, we will see that SVD is based on the eigenvalue decomposition of symmetric square matrices.

Proof of SVD Decomposition (1)

Given a matrix $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $\mathrm{rank}(A) = p$, the matrix $A^\top A \in \mathbb{R}^{n \times n}$ is symmetric and positive semi-definite. Therefore it can be decomposed with non-negative eigenvalues $\sigma_1^2 \ge \cdots \ge \sigma_n^2 \ge 0$ and orthonormal eigenvectors $v_1, \ldots, v_n$. The $\sigma_i$ are called singular values. Since
$$\ker(A^\top A) = \ker(A) \quad \text{and} \quad \mathrm{range}(A^\top A) = \mathrm{range}(A^\top),$$
we have $\mathrm{span}\{v_1, \ldots, v_p\} = \mathrm{range}(A^\top)$ and $\mathrm{span}\{v_{p+1}, \ldots, v_n\} = \ker(A)$. Let
$$u_i \equiv \frac{1}{\sigma_i} A v_i \;\Leftrightarrow\; A v_i = \sigma_i u_i, \quad i = 1, \ldots, p,$$
then the $u_i \in \mathbb{R}^m$ are orthonormal:
$$\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j} \langle A v_i, A v_j \rangle = \frac{1}{\sigma_i \sigma_j} \langle v_i, A^\top A v_j \rangle = \delta_{ij}.$$

Proof of SVD Decomposition (2)

Complete $\{u_i\}_{i=1}^p$ to a basis $\{u_i\}_{i=1}^m$ of $\mathbb{R}^m$. Since $A v_i = \sigma_i u_i$, we have
$$A (v_1, \ldots, v_n) = (u_1, \ldots, u_m)\, \tilde{\Sigma}, \qquad \tilde{\Sigma} = \begin{pmatrix} \mathrm{diag}\{\sigma_1, \ldots, \sigma_p\} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times n},$$
which is of the form $A\tilde{V} = \tilde{U}\tilde{\Sigma}$, thus $A = \tilde{U} \tilde{\Sigma} \tilde{V}^\top$.
Now simply delete all columns of $\tilde{U}$ and all rows of $\tilde{V}^\top$ which are multiplied by zero singular values, and we obtain the form
$$A = U \Sigma V^\top, \quad \text{with } U \in \mathbb{R}^{m \times p} \text{ and } V \in \mathbb{R}^{n \times p}.$$
In Matlab: [U,S,V] = svd(A).

A Geometric Interpretation of SVD

For $A \in \mathbb{R}^{n \times n}$, the singular value decomposition $A = U \Sigma V^\top$ is such that the columns of $U = (u_1, \ldots, u_n)$ and $V = (v_1, \ldots, v_n)$ form orthonormal bases of $\mathbb{R}^n$. If a point $x \in \mathbb{R}^n$ is mapped to a point $y \in \mathbb{R}^n$ by the transformation $A$, then the coordinates of $y$ in the basis $U$ are related to the coordinates of $x$ in the basis $V$ by the diagonal matrix $\Sigma$: each coordinate is merely scaled by the corresponding singular value:
$$y = Ax = U \Sigma V^\top x \;\Leftrightarrow\; U^\top y = \Sigma V^\top x.$$
The matrix $A$ maps the unit sphere into an ellipsoid with semi-axes $\sigma_i u_i$. To see this, call $\alpha \equiv V^\top x$ the coefficients of the point $x$ in the basis $V$, and $\beta \equiv U^\top y$ those of $y$ in the basis $U$. All points of the unit sphere fulfill $|x|_2^2 = \sum_i \alpha_i^2 = 1$. The above statement says that $\beta_i = \sigma_i \alpha_i$. Thus, for the points on the sphere we have
$$\sum_i \alpha_i^2 = \sum_i \beta_i^2 / \sigma_i^2 = 1,$$
which states that the transformed points lie on an ellipsoid oriented along the axes of the basis $U$.

The Generalized (Moore-Penrose) Inverse

For certain square matrices, namely those with $\det(A) \ne 0$, one can define an inverse matrix; the set of all invertible matrices forms the group $GL(n)$. One can also define a (generalized) inverse, also called pseudo inverse (German: Pseudoinverse), for an arbitrary (non-square) matrix $A \in \mathbb{R}^{m \times n}$. If its SVD is $A = U \Sigma V^\top$, the pseudo inverse is defined as
$$A^\dagger = V \Sigma^\dagger U^\top, \quad \text{where } \Sigma^\dagger = \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{n \times m},$$
where $\Sigma_1$ is the diagonal matrix of non-zero singular values. In Matlab: X=pinv(A).

In particular, the pseudo inverse can be employed in a similar fashion as the inverse of square invertible matrices:
$$A A^\dagger A = A, \qquad A^\dagger A A^\dagger = A^\dagger.$$
The linear system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$ of rank $r \le \min(m, n)$ can have multiple or no solutions. Among all minimizers of $|Ax - b|^2$, $x_{\min} = A^\dagger b$ is the one with the smallest norm $|x|$.
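A small MATLAB sketch of this minimum-norm property for an underdetermined example system:

    % Underdetermined system with infinitely many solutions.
    A = [1 0 1; 0 1 1];                           % rank 2, unknowns in R^3
    b = [2; 3];
    x_min = pinv(A) * b;                          % minimum-norm solution
    x_alt = x_min + 2 * null(A);                  % another solution (kernel shift)
    disp([norm(A*x_min - b), norm(A*x_alt - b)])  % both ~0: both solve Ax = b
    disp([norm(x_min), norm(x_alt)])              % x_min has the smaller norm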
Chapter 2: Representing a Moving Scene
Multiple View Geometry, Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics, Technical University of Munich
(updated May 8, 2019)

Overview
1. The Origins of 3D Reconstruction
2. 3D Space & Rigid Body Motion
3. The Lie Group SO(3)
4. The Lie Group SE(3)
5. Representing the Camera Motion
6. Euler Angles

The Origins of 3D Reconstruction

The goal to reconstruct the three-dimensional structure of the world from a set of two-dimensional views has a long history in computer vision. It is a classical ill-posed problem: the reconstruction consistent with a given set of observations or images is typically not unique. Therefore, one needs to impose additional assumptions. Mathematically, the study of the geometric relations between a 3D scene and its observed 2D projections is based on two types of transformations, namely:
• Euclidean motion or rigid-body motion, representing the motion of the camera from one frame to the next;
• perspective projection, accounting for the image formation process (see pinhole camera, etc.).

The notion of perspective projection has its roots among the ancient Greeks (Euclid of Alexandria, ~400 B.C.) and the Renaissance period (Brunelleschi & Alberti, 1435). The study of perspective projection led to the field of projective geometry (Girard Desargues 1648, Gaspard Monge 18th century).

The first work on the problem of multiple view geometry was that of Erwin Kruppa (1913), who showed that two views of five points are sufficient to determine both the relative transformation (motion) between the two views and the 3D location (structure) of the points, up to finitely many solutions. A linear algorithm to recover structure and motion from two views based on the epipolar constraint was proposed by Longuet-Higgins in 1981. An entire series of works along these lines was summarized in several textbooks (Faugeras 1993, Kanatani 1993, Maybank 1993, Weng et al. 1993).

Extensions to three views were developed by Spetsakis and Aloimonos ('87, '90), by Shashua ('94) and by Hartley ('95). Factorization techniques for multiple views and orthogonal projection were developed by Tomasi and Kanade (1992). The joint estimation of camera motion and 3D location is called structure and motion or (more recently) visual SLAM.

Three-dimensional Euclidean Space

The three-dimensional Euclidean space $\mathbb{E}^3$ consists of all points $p \in \mathbb{E}^3$ characterized by coordinates
$$X \equiv (X_1, X_2, X_3)^\top \in \mathbb{R}^3,$$
such that $\mathbb{E}^3$ can be identified with $\mathbb{R}^3$. That means we talk about points ($\mathbb{E}^3$) and coordinates ($\mathbb{R}^3$) as if they were the same thing. Given two points $X$ and $Y$, one can define a bound vector as
$$v = X - Y \in \mathbb{R}^3.$$
Considering this vector independent of its base point $Y$ makes it a free vector. The set of free vectors $v \in \mathbb{R}^3$ forms a linear vector space. By identifying $\mathbb{E}^3$ and $\mathbb{R}^3$, one can endow $\mathbb{E}^3$ with a scalar product, a norm and a metric. This allows us to compute distances, the length
$$l(\gamma) \equiv \int_0^1 |\dot{\gamma}(s)|\, ds$$
of a curve $\gamma : [0, 1] \to \mathbb{R}^3$, areas, and volumes.
Cross Product & Skew-symmetric Matrices

On $\mathbb{R}^3$ one can define a cross product
$$\times : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3; \quad u \times v = \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix} \in \mathbb{R}^3,$$
which is a vector orthogonal to $u$ and $v$. Since $u \times v = -v \times u$, the cross product introduces an orientation. Fixing $u$ induces a linear mapping $v \mapsto u \times v$ which can be represented by the skew-symmetric matrix
$$\hat{u} = \begin{pmatrix} 0 & -u_3 & u_2 \\ u_3 & 0 & -u_1 \\ -u_2 & u_1 & 0 \end{pmatrix} \in \mathbb{R}^{3 \times 3}.$$
In turn, every skew-symmetric matrix $M = -M^\top \in \mathbb{R}^{3 \times 3}$ can be identified with a vector $u \in \mathbb{R}^3$. The operator $\widehat{\;\cdot\;}$ defines an isomorphism between $\mathbb{R}^3$ and the space $so(3)$ of all $3 \times 3$ skew-symmetric matrices. Its inverse is denoted by $\vee : so(3) \to \mathbb{R}^3$.

Rigid-body Motion

A rigid-body motion (or rigid-body transformation) is a family of maps
$$g_t : \mathbb{R}^3 \to \mathbb{R}^3; \quad X \mapsto g_t(X), \quad t \in [0, T],$$
which preserve the norm and cross product of any two vectors:
• $|g_t(v)| = |v|$ for all $v \in \mathbb{R}^3$,
• $g_t(u) \times g_t(v) = g_t(u \times v)$ for all $u, v \in \mathbb{R}^3$.

Since norm and scalar product are related by the polarization identity
$$\langle u, v \rangle = \frac{1}{4}\left( |u + v|^2 - |u - v|^2 \right),$$
one can also state that a rigid-body motion is a map which preserves inner product and cross product. As a consequence, rigid-body motions also preserve the triple product
$$\langle g_t(u), g_t(v) \times g_t(w) \rangle = \langle u, v \times w \rangle \quad \forall u, v, w \in \mathbb{R}^3,$$
which means that they are volume-preserving.

Representation of Rigid-body Motion

Does the above definition lead to a mathematical representation of rigid-body motion? Since it preserves lengths and orientation, the motion $g_t$ of a rigid body is sufficiently defined by specifying the motion of a Cartesian coordinate frame attached to the object (given by an origin and orthonormal oriented vectors $e_1, e_2, e_3 \in \mathbb{R}^3$). The motion of the origin can be represented by a translation $T \in \mathbb{R}^3$, whereas the transformation of the vectors $e_i$ is given by new vectors $r_i = g_t(e_i)$. Scalar and cross product of these vectors are preserved:
$$r_i^\top r_j = g_t(e_i)^\top g_t(e_j) = e_i^\top e_j = \delta_{ij}, \qquad r_1 \times r_2 = r_3.$$
The first constraint amounts to the statement that the matrix $R = (r_1, r_2, r_3)$ is an orthogonal (rotation) matrix: $R^\top R = R R^\top = I$, whereas the second property implies that $\det(R) = +1$. In other words, $R$ is an element of the group
$$SO(3) = \left\{ R \in \mathbb{R}^{3 \times 3} \mid R^\top R = I,\; \det(R) = +1 \right\}.$$
Thus the rigid-body motion $g_t$ can be written as
$$g_t(x) = Rx + T.$$

Exponential Coordinates of Rotation

We will now derive a representation of an infinitesimal rotation. To this end, consider a family of rotation matrices $R(t)$ which continuously transform a point from its original location ($R(0) = I$) to a different one:
$$X_{\mathrm{trans}}(t) = R(t)\, X_{\mathrm{orig}}, \quad \text{with } R(t) \in SO(3).$$
Since $R(t)R(t)^\top = I$ for all $t$, we have
$$\frac{d}{dt}\left(R R^\top\right) = \dot{R} R^\top + R \dot{R}^\top = 0 \;\Rightarrow\; \dot{R} R^\top = -\left(\dot{R} R^\top\right)^\top.$$
Thus, $\dot{R} R^\top$ is a skew-symmetric matrix. As shown in the section about the $\widehat{\;\cdot\;}$-operator, this implies that there exists a vector $w(t) \in \mathbb{R}^3$ such that
$$\dot{R}(t) R^\top(t) = \hat{w}(t) \;\Leftrightarrow\; \dot{R}(t) = \hat{w}(t) R(t).$$
Since $R(0) = I$, it follows that $\dot{R}(0) = \hat{w}(0)$. Therefore the skew-symmetric matrix $\hat{w}(0) \in so(3)$ gives the first-order approximation of a rotation:
$$R(dt) = R(0) + dR = I + \hat{w}(0)\, dt.$$
Lie Group and Lie Algebra

The above calculations showed that the effect of any infinitesimal rotation $R \in SO(3)$ can be approximated by an element from the space of skew-symmetric matrices
$$so(3) = \{ \hat{w} \mid w \in \mathbb{R}^3 \}.$$
The rotation group $SO(3)$ is called a Lie group. The space $so(3)$ is called its Lie algebra.

Definition: A Lie group (or infinitesimal group) is a smooth manifold that is also a group, such that the group operations multiplication and inversion are smooth maps.

As shown above: The Lie algebra $so(3)$ is the tangent space at the identity of the rotation group $SO(3)$.

An algebra over a field $K$ is a vector space $V$ over $K$ with a multiplication on the space $V$. Elements $\hat{w}$ and $\hat{v}$ of the Lie algebra generally do not commute. One can define the Lie bracket
$$[\cdot, \cdot] : so(3) \times so(3) \to so(3); \quad [\hat{w}, \hat{v}] \equiv \hat{w}\hat{v} - \hat{v}\hat{w}.$$

Sophus Lie (1841-1899)

Marius Sophus Lie was a Norwegian-born mathematician. He created the theory of continuous symmetry and applied it to the study of geometry and differential equations. Among his greatest achievements was the discovery that continuous transformation groups are better understood in their linearized versions ("Theorie der Transformationsgruppen", 1893). These infinitesimal generators form a structure which is today known as a Lie algebra. The linearized version of the group law corresponds to an operation on the Lie algebra known as the commutator bracket or Lie bracket. 1882 Professor in Christiania (Oslo), 1886 Leipzig (succeeding Felix Klein), 1898 Christiania.

The Exponential Map

Given the infinitesimal formulation of rotation in terms of the skew-symmetric matrix $\hat{w}$, is it possible to determine a useful representation of the rotation $R(t)$? Let us assume that $\hat{w}$ is constant in time. The differential equation system
$$\dot{R}(t) = \hat{w} R(t), \qquad R(0) = I,$$
has the solution
$$R(t) = e^{\hat{w}t} = \sum_{n=0}^\infty \frac{(\hat{w}t)^n}{n!} = I + \hat{w}t + \frac{(\hat{w}t)^2}{2!} + \cdots,$$
which is a rotation around the axis $w \in \mathbb{R}^3$ by an angle of $t$ (if $\|w\| = 1$). Alternatively, one can absorb the scalar $t \in \mathbb{R}$ into the skew-symmetric matrix $\hat{w}$ to obtain $R(t) = e^{\hat{v}}$ with $\hat{v} = \hat{w}t$. This matrix exponential therefore defines a map from the Lie algebra to the Lie group:
$$\exp : so(3) \to SO(3); \quad \hat{w} \mapsto e^{\hat{w}}.$$
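In MATLAB the matrix exponential is available as expm, so the exponential map can be checked directly (hat is the helper from the earlier sketch; axis and angle are example values):

    % exp: so(3) -> SO(3) via the matrix exponential.
    hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];
    w = [0; 0; 0.3];               % 0.3 rad about the z-axis
    R = expm(hat(w));
    disp(norm(R'*R - eye(3)))      % ~0: R is orthogonal
    disp(det(R))                   % +1: R is a rotation
    disp(R(1:2,1:2))               % [cos -sin; sin cos] for 0.3 rad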
The Logarithm of SO(3)

As in real analysis, one can define an inverse of the exponential map, called the logarithm. In the context of Lie groups, this leads to a mapping from the Lie group to the Lie algebra. For any rotation matrix $R \in SO(3)$, there exists a $w \in \mathbb{R}^3$ such that $R = \exp(\hat{w})$. Such an element is denoted by $\hat{w} = \log(R)$. If $R = (r_{ij}) \ne I$, an appropriate $w$ is given by
$$|w| = \cos^{-1}\left( \frac{\mathrm{trace}(R) - 1}{2} \right), \qquad \frac{w}{|w|} = \frac{1}{2 \sin(|w|)} \begin{pmatrix} r_{32} - r_{23} \\ r_{13} - r_{31} \\ r_{21} - r_{12} \end{pmatrix}.$$
For $R = I$, we have $|w| = 0$, i.e. a rotation by an angle 0.

The above statement says: Any orthogonal transformation $R \in SO(3)$ can be realized by rotating by an angle $|w|$ around the axis $\frac{w}{|w|}$ as defined above. We will not prove this statement. Obviously, the above representation is not unique, since increasing the angle by multiples of $2\pi$ gives the same rotation $R$.

Schematic Visualization of Lie Group & Lie Algebra

Definition: A Lie group is a smooth manifold that is also a group, such that the group operations multiplication and inversion are smooth maps.

Definition: The tangent space to a Lie group at the identity element is called the associated Lie algebra. The mapping from the Lie algebra to the Lie group is called the exponential map. Its inverse is called the logarithm.

Rodrigues' Formula

We have seen that any rotation can be realized by computing $R = e^{\hat{w}}$. In analogy to the well-known Euler equation
$$e^{i\phi} = \cos(\phi) + i \sin(\phi), \quad \forall \phi \in \mathbb{R},$$
we have an expression for skew-symmetric matrices $\hat{w} \in so(3)$:
$$e^{\hat{w}} = I + \frac{\hat{w}}{|w|} \sin(|w|) + \frac{\hat{w}^2}{|w|^2} \big( 1 - \cos(|w|) \big).$$
This is known as Rodrigues' formula.

Proof: Let $t = |w|$ and $v = w/|w|$. Then $\hat{v}^2 = vv^\top - I$, $\hat{v}^3 = -\hat{v}$, ..., and
$$e^{\hat{w}} = e^{\hat{v}t} = I + \underbrace{\left( t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots \right)}_{\sin(t)} \hat{v} + \underbrace{\left( \frac{t^2}{2!} - \frac{t^4}{4!} + \frac{t^6}{6!} - \cdots \right)}_{1 - \cos(t)} \hat{v}^2.$$
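Both Rodrigues' formula and the logarithm above are straightforward to check numerically; a sketch with an arbitrary example vector $w$ with $|w| < \pi$ (hat as before):

    % Rodrigues' formula vs. expm, and recovery of w by the logarithm.
    hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];
    w = [0.1; 0.2; 0.3]; t = norm(w);
    R = eye(3) + hat(w)/t*sin(t) + (hat(w)^2/t^2)*(1 - cos(t));
    disp(norm(R - expm(hat(w))))     % ~0: Rodrigues matches the exponential
    theta = acos((trace(R) - 1)/2);  % |w|
    w_rec = theta/(2*sin(theta)) * [R(3,2)-R(2,3); R(1,3)-R(3,1); R(2,1)-R(1,2)];
    disp(norm(w_rec - w))            % ~0: the logarithm recovers w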
Representation of Rigid-body Motions SE(3)

We have seen that the motion of a rigid body is uniquely determined by specifying the translation $T$ of any given point and a rotation matrix $R$ defining the transformation of an oriented Cartesian coordinate frame at that point. Thus the space of rigid-body motions is given by the group of special Euclidean transformations
$$SE(3) \equiv \{ g = (R, T) \mid R \in SO(3),\; T \in \mathbb{R}^3 \}.$$
In homogeneous coordinates, we have
$$SE(3) \equiv \left\{ g = \begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix} \;\middle|\; R \in SO(3),\; T \in \mathbb{R}^3 \right\} \subset \mathbb{R}^{4 \times 4}.$$
In the context of rigid motions, one can see the difference between points in $\mathbb{E}^3$ (which can be rotated and translated) and vectors in $\mathbb{R}^3$ (which can only be rotated).

The Lie Algebra of Twists

Given a continuous family of rigid-body transformations
$$g : \mathbb{R} \to SE(3); \quad g(t) = \begin{pmatrix} R(t) & T(t) \\ 0 & 1 \end{pmatrix} \in \mathbb{R}^{4 \times 4},$$
we consider
$$\dot{g}(t)\, g^{-1}(t) = \begin{pmatrix} \dot{R} R^\top & \dot{T} - \dot{R} R^\top T \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{4 \times 4}.$$
As in the case of $SO(3)$, the matrix $\dot{R} R^\top$ corresponds to some skew-symmetric matrix $\hat{w} \in so(3)$. Defining a vector $v(t) = \dot{T}(t) - \hat{w}(t) T(t)$, we have
$$\dot{g}(t)\, g^{-1}(t) = \begin{pmatrix} \hat{w}(t) & v(t) \\ 0 & 0 \end{pmatrix} \equiv \hat{\xi}(t) \in \mathbb{R}^{4 \times 4}.$$
Multiplying with $g(t)$ from the right, we obtain
$$\dot{g} = \dot{g}\, g^{-1} g = \hat{\xi} g.$$
The $4 \times 4$ matrix $\hat{\xi}$ can be viewed as a tangent vector along the curve $g(t)$; $\hat{\xi}$ is called a twist. As in the case of $so(3)$, the set of all twists forms the tangent space, which is the Lie algebra
$$se(3) \equiv \left\{ \hat{\xi} = \begin{pmatrix} \hat{w} & v \\ 0 & 0 \end{pmatrix} \;\middle|\; \hat{w} \in so(3),\; v \in \mathbb{R}^3 \right\} \subset \mathbb{R}^{4 \times 4}$$
to the Lie group $SE(3)$. As before, we can define the operators $\wedge$ and $\vee$ to convert between a twist $\hat{\xi} \in se(3)$ and its twist coordinates $\xi \in \mathbb{R}^6$:
$$\hat{\xi} \equiv \begin{pmatrix} v \\ w \end{pmatrix}^{\wedge} = \begin{pmatrix} \hat{w} & v \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{4 \times 4}, \qquad \begin{pmatrix} \hat{w} & v \\ 0 & 0 \end{pmatrix}^{\vee} = \begin{pmatrix} v \\ w \end{pmatrix} \in \mathbb{R}^6.$$

Exponential Coordinates for SE(3)

The twist coordinates $\xi = \binom{v}{w}$ are formed by stacking the linear velocity $v \in \mathbb{R}^3$ (related to translation) and the angular velocity $w \in \mathbb{R}^3$ (related to rotation). The differential equation system
$$\dot{g}(t) = \hat{\xi} g(t), \quad \hat{\xi} = \text{const.}, \qquad g(0) = I,$$
has the solution
$$g(t) = e^{\hat{\xi} t} = \sum_{n=0}^\infty \frac{(\hat{\xi} t)^n}{n!}.$$
For $w = 0$, we have
$$e^{\hat{\xi}} = \begin{pmatrix} I & v \\ 0 & 1 \end{pmatrix},$$
while for $w \ne 0$ one can show
$$e^{\hat{\xi}} = \begin{pmatrix} e^{\hat{w}} & \dfrac{(I - e^{\hat{w}})\hat{w}v + ww^\top v}{|w|^2} \\ 0 & 1 \end{pmatrix}.$$

The above shows that the exponential map defines a transformation from the Lie algebra $se(3)$ to the Lie group $SE(3)$:
$$\exp : se(3) \to SE(3); \quad \hat{\xi} \mapsto e^{\hat{\xi}}.$$
The elements $\hat{\xi} \in se(3)$ are called the exponential coordinates for $SE(3)$.

Conversely: For every $g \in SE(3)$ there exist twist coordinates $\xi = (v, w) \in \mathbb{R}^6$ such that $g = \exp(\hat{\xi})$.
Proof: Given $g = (R, T)$, we know that there exists $w \in \mathbb{R}^3$ with $e^{\hat{w}} = R$. If $|w| \ne 0$, the exponential form of $g$ introduced above shows that we merely need to solve the equation
$$\frac{(I - e^{\hat{w}})\hat{w}v + ww^\top v}{|w|^2} = T$$
for the velocity vector $v \in \mathbb{R}^3$. Just as in the case of $SO(3)$, this representation is generally not unique, i.e. there exist many twists $\hat{\xi} \in se(3)$ which represent the same rigid-body motion $g \in SE(3)$.
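The closed-form expression for $w \ne 0$ can be compared against the generic matrix exponential; a sketch with example twist values (hat as before):

    % Twist exponential: compare expm with the closed form.
    hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];
    w = [0; 0; 0.5]; v = [1; 2; 3];
    xi_hat = [hat(w) v; zeros(1,4)];     % element of se(3)
    g = expm(xi_hat);                    % exp: se(3) -> SE(3)
    R = expm(hat(w));
    T = ((eye(3) - R)*hat(w)*v + w*(w'*v)) / norm(w)^2;
    disp(norm(g - [R T; 0 0 0 1]))       % ~0: closed form matches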
Representing the Motion of the Camera

When observing a scene from a moving camera, the coordinates and velocity of points in camera coordinates will change. We use a rigid-body transformation
$$g(t) = \begin{pmatrix} R(t) & T(t) \\ 0 & 1 \end{pmatrix} \in SE(3)$$
to represent the motion from a fixed world frame to the camera frame at time $t$. In particular, we assume that at time $t = 0$ the camera frame coincides with the world frame, i.e. $g(0) = I$. For any point $X_0$ in world coordinates, its coordinates in the camera frame at time $t$ are
$$X(t) = R(t) X_0 + T(t),$$
or in the homogeneous representation
$$X(t) = g(t) X_0.$$

Concatenation of Motions over Frames

Given two different times $t_1$ and $t_2$, we denote the transformation from the points in frame $t_1$ to the points in frame $t_2$ by $g(t_2, t_1)$:
$$X(t_2) = g(t_2, t_1) X(t_1).$$
Obviously we have
$$X(t_3) = g(t_3, t_2) X(t_2) = g(t_3, t_2)\, g(t_2, t_1) X(t_1) = g(t_3, t_1) X(t_1),$$
and thus
$$g(t_3, t_1) = g(t_3, t_2)\, g(t_2, t_1).$$
By transferring the coordinates of frame $t_1$ to coordinates in frame $t_2$ and back, we see that
$$X(t_1) = g(t_1, t_2) X(t_2) = g(t_1, t_2)\, g(t_2, t_1) X(t_1),$$
which must hold for any point coordinates $X(t_1)$, thus
$$g(t_1, t_2)\, g(t_2, t_1) = I \;\Leftrightarrow\; g^{-1}(t_2, t_1) = g(t_1, t_2).$$

Rules of Velocity Transformation

The coordinates of a point $X_0$ in frame $t$ are given by $X(t) = g(t) X_0$. Therefore the velocity is given by
$$\dot{X}(t) = \dot{g}(t) X_0 = \dot{g}(t)\, g^{-1}(t) X(t).$$
By introducing the twist coordinates
$$\hat{V}(t) \equiv \dot{g}(t)\, g^{-1}(t) = \begin{pmatrix} \hat{w}(t) & v(t) \\ 0 & 0 \end{pmatrix} \in se(3),$$
we get the expression
$$\dot{X}(t) = \hat{V}(t) X(t).$$
In simple 3D coordinates this gives
$$\dot{X}(t) = \hat{w}(t) X(t) + v(t).$$
The symbol $\hat{V}(t)$ therefore represents the relative velocity of the world frame as viewed from the camera frame.

Transfer Between Frames: The Adjoint Map

Suppose that a viewer in another frame $A$ is displaced relative to the current frame by a transformation $g_{xy}$: $Y = g_{xy} X(t)$. Then the velocity in this new frame is given by
$$\dot{Y}(t) = g_{xy} \dot{X}(t) = g_{xy} \hat{V}(t) X(t) = g_{xy} \hat{V} g_{xy}^{-1} Y(t).$$
This shows that the relative velocity of points observed from camera frame $A$ is represented by the twist
$$\hat{V}_y = g_{xy} \hat{V} g_{xy}^{-1} \equiv \mathrm{ad}_{g_{xy}}(\hat{V}),$$
where we have introduced the adjoint map on $se(3)$:
$$\mathrm{ad}_g : se(3) \to se(3); \quad \hat{\xi} \mapsto g \hat{\xi} g^{-1}.$$

Summary

                        Rotation $SO(3)$                                  Rigid-body motion $SE(3)$
Matrix representation   $R \in GL(3)$: $R^\top R = I$, $\det(R) = 1$      $g = \begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix}$
3D coordinates          $X = R X_0$                                       $X = R X_0 + T$
Inverse                 $R^{-1} = R^\top$                                 $g^{-1} = \begin{pmatrix} R^\top & -R^\top T \\ 0 & 1 \end{pmatrix}$
Exponential repr.       $R = \exp(\hat{w})$                               $g = \exp(\hat{\xi})$
Velocity                $\dot{X} = \hat{w} X$                             $\dot{X} = \hat{w} X + v$
Adjoint map             $\hat{w} \mapsto R \hat{w} R^\top$                $\hat{\xi} \mapsto g \hat{\xi} g^{-1}$

Alternative Representations: Euler Angles

In addition to the exponential parameterization, there exist alternative mathematical representations to parameterize rotation matrices $R \in SO(3)$, given by the Euler angles. These are local coordinates, i.e. the parameterization is only correct for a portion of $SO(3)$.

Given a basis $(\hat{w}_1, \hat{w}_2, \hat{w}_3)$ of the Lie algebra $so(3)$, we can define a mapping from $\mathbb{R}^3$ to the Lie group $SO(3)$ by
$$\alpha : (\alpha_1, \alpha_2, \alpha_3) \mapsto \exp(\alpha_1 \hat{w}_1 + \alpha_2 \hat{w}_2 + \alpha_3 \hat{w}_3).$$
The coordinates $(\alpha_1, \alpha_2, \alpha_3)$ are called Lie-Cartan coordinates of the first kind relative to the above basis. The Lie-Cartan coordinates of the second kind are defined as
$$\beta : (\beta_1, \beta_2, \beta_3) \mapsto \exp(\beta_1 \hat{w}_1) \exp(\beta_2 \hat{w}_2) \exp(\beta_3 \hat{w}_3).$$
For the basis representing rotations around the z-, y- and x-axes,
$$w_1 = (0, 0, 1)^\top, \quad w_2 = (0, 1, 0)^\top, \quad w_3 = (1, 0, 0)^\top,$$
the coordinates $\beta_1, \beta_2, \beta_3$ are called Euler angles.
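A sketch of the Lie-Cartan coordinates of the second kind for this z-y-x basis (example angles; hat as before). Note that the coordinates of the first kind with the same values give a different rotation:

    % Euler angles: sequential rotations about z, y and x.
    hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];
    b  = [0.1 0.2 0.3];                       % example angles in rad
    w1 = [0;0;1]; w2 = [0;1;0]; w3 = [1;0;0];
    R2nd = expm(b(1)*hat(w1)) * expm(b(2)*hat(w2)) * expm(b(3)*hat(w3));
    R1st = expm(b(1)*hat(w1) + b(2)*hat(w2) + b(3)*hat(w3));
    disp(norm(R2nd - R1st))                   % > 0: the parameterizations differ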
Chapter 3: Perspective Projection
Multiple View Geometry, Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics, Technical University of Munich
(updated May 20, 2019)

Overview
1. Historic Remarks
2. Mathematical Representation
3. Intrinsic Parameters
4. Spherical Projection
5. Radial Distortion
6. Preimage and Coimage
7. Projective Geometry

Some Historic Remarks

The study of the image formation process has a long history. The earliest formulations of the geometry of image formation can be traced back to Euclid (4th century B.C.). Examples of a partially correct perspective projection are visible in the frescoes and mosaics of Pompeii (1st century B.C.). These skills seem to have been lost with the fall of the Roman Empire. Correct perspective projection emerged again around 1000 years later in early Renaissance art.

Among the proponents of perspective projection are the Renaissance artists Brunelleschi, Donatello and Alberti. The first treatise on the projection process, "Della Pittura" (1435), was published by Leon Battista Alberti. Apart from the geometry of image formation, the study of the interaction of light with matter was propagated by artists like Leonardo da Vinci in the 1500s and by Renaissance painters such as Caravaggio and Raphael.

Perspective Projection in Art

[Figure] Filippo Lippi, "The Feast of Herod: Salome's Dance." Fresco, Cappella Maggiore, Duomo, Prato, Italy, c. 1460-1464.
[Figure] Raphael, "The School of Athens" (1509).
[Figure] Dürer's machine (1525).
[Figure] Satire by Hogarth (1753).
[Figure] M.C. Escher, "Another World" (1947). [Figure] M.C. Escher, "Belvedere" (1958).

Mathematics of Perspective Projection

[Figure: perspective projection of a point P, observed through a thin lens, to its image p.]

The point $P$ has coordinates $X = (X, Y, Z) \in \mathbb{R}^3$ relative to the reference frame centered at the optical center, where the z-axis is the optical axis (of the lens).

To simplify the equations, one flips the signs of the x- and y-axes, which amounts to considering the image plane to be in front of the center of projection (rather than behind it). The perspective transformation $\pi$ is therefore given by
$$\pi : \mathbb{R}^3 \to \mathbb{R}^2; \quad X \mapsto x = \pi(X) = \begin{pmatrix} f\,X/Z \\ f\,Y/Z \end{pmatrix}.$$

An Ideal Perspective Camera

In homogeneous coordinates, the perspective transformation is given by
$$Z x = Z \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = K_f \Pi_0 X,$$
where we have introduced the two matrices
$$K_f \equiv \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \Pi_0 \equiv \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
The matrix $\Pi_0$ is referred to as the standard projection matrix. Writing the depth $Z$ as a scalar $\lambda > 0$, we obtain
$$\lambda x = K_f \Pi_0 X.$$

From the previous lectures, we know that due to the rigid motion of the camera, the point $X$ in camera coordinates is given as a function of the point in world coordinates $X_0$ by
$$X = R X_0 + T,$$
or in homogeneous coordinates $X = (X, Y, Z, 1)^\top$:
$$X = g X_0 = \begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix} X_0.$$
In total, the transformation from world coordinates to image coordinates is therefore given by
$$\lambda x = K_f \Pi_0\, g\, X_0.$$
If the focal length $f$ is known, it can be normalized to 1 (by changing the units of the image coordinates), such that
$$\lambda x = \Pi_0 X = \Pi_0\, g\, X_0.$$
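A minimal MATLAB sketch of this projection chain for a single made-up world point:

    % Ideal perspective camera: project a world point to the image.
    f = 1;                               % normalized focal length
    Kf = diag([f f 1]);
    Pi0 = [eye(3) zeros(3,1)];           % standard projection matrix
    g = [eye(3) [0;0;2]; 0 0 0 1];       % example pose: translate 2 along z
    X0 = [1; 1; 3; 1];                   % world point in homogeneous coords
    x_h = Kf * Pi0 * g * X0;             % = lambda * (x; y; 1)
    x = x_h / x_h(3);                    % divide out the depth lambda = 5
    disp(x')                             % image coordinates (0.2, 0.2, 1)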
Intrinsic Camera Parameters

If the camera is not centered at the optical center, we have an additional translation $o_x, o_y$, and if pixel coordinates do not have unit scale, we need to introduce an additional scaling in x- and y-direction by $s_x$ and $s_y$. If the pixels are not rectangular, we have a skew factor $s_\theta$. The pixel coordinates $(x', y', 1)$ as a function of homogeneous camera coordinates $X$ are then given by
$$\lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}}_{\equiv K_s} \underbrace{\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{\equiv K_f} \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{\equiv \Pi_0} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}.$$
After the perspective projection $\Pi_0$ (with focal length 1), we have an additional transformation which depends on the (intrinsic) camera parameters. This can be expressed by the intrinsic parameter matrix $K = K_s K_f$.

The Intrinsic Parameter Matrix

All intrinsic camera parameters therefore enter the intrinsic parameter matrix
$$K \equiv K_s K_f = \begin{pmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}.$$
As a function of the world coordinates $X_0$, we therefore have
$$\lambda x' = K \Pi_0 X = K \Pi_0\, g\, X_0 \equiv \Pi X_0.$$
The $3 \times 4$ matrix $\Pi \equiv K \Pi_0 g = (KR, KT)$ is called a general projection matrix. Although the above equation looks like a linear one, we still have the scale parameter $\lambda$. Dividing by $\lambda$ gives
$$x' = \frac{\pi_1^\top X_0}{\pi_3^\top X_0}, \qquad y' = \frac{\pi_2^\top X_0}{\pi_3^\top X_0}, \qquad z' = 1,$$
where $\pi_1^\top, \pi_2^\top, \pi_3^\top \in \mathbb{R}^4$ are the three rows of the projection matrix $\Pi$.

The entries of the intrinsic parameter matrix $K$ can be interpreted as follows:
• $o_x$: x-coordinate of the principal point in pixels,
• $o_y$: y-coordinate of the principal point in pixels,
• $f s_x = \alpha_x$: size of unit length in horizontal pixels,
• $f s_y = \alpha_y$: size of unit length in vertical pixels,
• $\alpha_x / \alpha_y$: aspect ratio $\sigma$,
• $f s_\theta$: skew of the pixel, often close to zero.

Spherical Perspective Projection

The perspective pinhole camera introduced above considers a planar imaging surface. Instead, one can consider a spherical projection surface given by the unit sphere $\mathbb{S}^2 \equiv \{ x \in \mathbb{R}^3 \mid |x| = 1 \}$. The spherical projection $\pi_s$ of a 3D point $X$ is given by
$$\pi_s : \mathbb{R}^3 \to \mathbb{S}^2; \quad X \mapsto x = \frac{X}{|X|}.$$
The pixel coordinates $x'$ as a function of the world coordinates $X_0$ are again
$$\lambda x' = K \Pi_0\, g\, X_0,$$
except that the scalar factor is now $\lambda = |X| = \sqrt{X^2 + Y^2 + Z^2}$. One often writes $x \sim y$ for homogeneous vectors $x$ and $y$ if they are equal up to a scalar factor. Then we can write
$$x' \sim \Pi X_0 = K \Pi_0\, g\, X_0.$$
This property holds for any imaging surface, as long as the ray between $X$ and the origin intersects the imaging surface.
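A sketch of the pixel mapping $\lambda x' = K \Pi_0 g X_0$ with made-up intrinsics (planar case, $\lambda = Z$):

    % Map a world point to pixel coordinates with intrinsics K.
    K = [500 0 320; 0 500 240; 0 0 1];   % example focal length / principal point
    Pi0 = [eye(3) zeros(3,1)];
    g = eye(4);                          % camera coincides with the world frame
    X0 = [0.2; -0.1; 2; 1];              % world point, homogeneous
    x_h = K * Pi0 * g * X0;
    disp((x_h / x_h(3))')                % pixel coordinates (370, 215, 1)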
Radial Distortion

[Figure: a bookshelf photographed with a regular lens vs. with a short focal length lens, showing radial distortion.]

The intrinsic parameters in the matrix $K$ model linear distortions in the transformation to pixel coordinates. In practice, however, one can also encounter significant distortions along the radial axis, in particular if a wide field of view is used or if one uses cheaper cameras such as webcams. A simple but effective model for such distortions is
$$x = x_d \left( 1 + a_1 r^2 + a_2 r^4 \right), \qquad y = y_d \left( 1 + a_1 r^2 + a_2 r^4 \right),$$
where $x_d \equiv (x_d, y_d)$ is the distorted point and $r^2 = x_d^2 + y_d^2$. If a calibration rig is available, the distortion parameters $a_1$ and $a_2$ can be estimated. Alternatively, one can estimate a distortion model directly from the images. A more general model (Devernay and Faugeras 1995) is
$$x = c + f(r)(x_d - c), \quad \text{with } f(r) = 1 + a_1 r + a_2 r^2 + a_3 r^3 + a_4 r^4.$$
Here, $r = |x_d - c|$ is the distance to an arbitrary center of distortion $c$, and the distortion correction factor $f(r)$ is an arbitrary 4th-order expression. The parameters are computed from the distortion of straight lines or simultaneously with the 3D reconstruction (Zhang '96, Stein '97, Fitzgibbon '01).

Preimage of Points and Lines

The perspective transformation introduced above allows us to define images for arbitrary geometric entities by simply transforming all points of the entity. However, due to the unknown scale factor, each point is mapped not to a single point $x$, but to an equivalence class of points $y \sim x$. It is therefore useful to study how lines are transformed. A line $L$ in 3D is characterized by a base point $X_0 = (X_0, Y_0, Z_0, 1)^\top \in \mathbb{R}^4$ and a direction vector $V = (V_1, V_2, V_3, 0)^\top \in \mathbb{R}^4$:
$$X = X_0 + \mu V, \quad \mu \in \mathbb{R}.$$
The image of the line $L$ is given by
$$x \sim \Pi_0 X = \Pi_0 (X_0 + \mu V) = \Pi_0 X_0 + \mu \Pi_0 V.$$
All points $x$, treated as vectors from the origin $o$, span a 2D subspace $P$. The intersection of this plane $P$ with the image plane gives the image of the line. $P$ is called the preimage of the line. A preimage of a point or a line in the image plane is the largest set of 3D points that give rise to an image equal to the given point or line.

Preimage and Coimage

[Figure: preimage P of a line L.]

Preimages can be defined for curves or other more complicated geometric structures. In the case of points and lines, however, the preimage is a subspace of $\mathbb{R}^3$. This subspace can also be represented by its orthogonal complement, i.e. the normal vector in the case of a plane. This complement is called the coimage. The coimage of a point or a line is the subspace in $\mathbb{R}^3$ that is the (unique) orthogonal complement of its preimage. Image, preimage and coimage are equivalent because they uniquely determine one another:
$$\text{image} = \text{preimage} \cap \text{image plane}, \qquad \text{preimage} = \mathrm{span}(\text{image}),$$
$$\text{preimage} = \text{coimage}^\perp, \qquad \text{coimage} = \text{preimage}^\perp.$$

Preimage and Coimage of Points and Lines

In the case of the line $L$, the preimage is a 2D subspace, characterized by the 1D coimage given by the span of its normal vector $\ell \in \mathbb{R}^3$. All points of the preimage, and hence all points $x$ of the image of $L$, are orthogonal to $\ell$:
$$\ell^\top x = 0.$$
The space of all vectors orthogonal to $\ell$ is spanned by the row vectors of $\hat{\ell}$, thus we have
$$P = \mathrm{span}(\hat{\ell}).$$
In the case that $x$ is the image of a point $p$, the preimage is a line and the coimage is the plane orthogonal to $x$, i.e. it is spanned by the rows of the matrix $\hat{x}$.

In summary, we have the following table:

          Image                                        Preimage                                     Coimage
Point     $\mathrm{span}(x)\, \cap$ image plane        $\mathrm{span}(x) \subset \mathbb{R}^3$            $\mathrm{span}(\hat{x}) \subset \mathbb{R}^3$
Line      $\mathrm{span}(\hat{\ell})\, \cap$ image plane  $\mathrm{span}(\hat{\ell}) \subset \mathbb{R}^3$  $\mathrm{span}(\ell) \subset \mathbb{R}^3$
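Numerically, the coimage vector $\ell$ of an image line can be obtained from two image points on it via the cross product (a standard construction that the slide does not spell out); both points then satisfy $\ell^\top x = 0$. A small MATLAB sketch with example points:

    % Coimage of a line from two of its image points (homogeneous).
    x1 = [0.2; 0.1; 1]; x2 = [-0.3; 0.4; 1];
    l = cross(x1, x2);            % normal of the preimage plane
    disp([l'*x1, l'*x2])          % both ~0: the points lie on the line
    P = null(l');                 % preimage: 2D subspace orthogonal to l
    disp(size(P))                 % 3x2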
Preimage of Points and Lines

The perspective transformation introduced above allows us to define images for arbitrary geometric entities by simply transforming all points of the entity. However, due to the unknown scale factor, each point is mapped not to a single point x, but to an equivalence class of points y ~ x. It is therefore useful to study how lines are transformed. A line L in 3-D is characterized by a base point X_0 = (X_0, Y_0, Z_0, 1)^T in R^4 and a direction V = (V_1, V_2, V_3, 0)^T in R^4:

X = X_0 + mu V,   mu in R.

The image of the line L is given by

x ~ Pi_0 X = Pi_0 (X_0 + mu V) = Pi_0 X_0 + mu Pi_0 V.

All points x, treated as vectors from the origin o, span a 2-D subspace P. The intersection of this plane P with the image plane gives the image of the line. P is called the preimage of the line. A preimage of a point or a line in the image plane is the largest set of 3D points that give rise to an image equal to the given point or line.

Preimage and Coimage

[Figure: preimage P of a line L]

Preimages can be defined for curves or other more complicated geometric structures. In the case of points and lines, however, the preimage is a subspace of R^3. This subspace can also be represented by its orthogonal complement, i.e. the normal vector in the case of a plane. This complement is called the coimage. The coimage of a point or a line is the subspace in R^3 that is the (unique) orthogonal complement of its preimage. Image, preimage and coimage are equivalent because they uniquely determine one another:

image = preimage ∩ image plane,   preimage = span(image),
preimage = coimage^⊥,   coimage = preimage^⊥.

Preimage and Coimage of Points and Lines

In the case of a line L, the preimage is a 2D subspace, characterized by the 1D coimage given by the span of its normal vector ℓ in R^3. All points of the preimage, and hence all points x of the image of L, are orthogonal to ℓ:

ℓ^T x = 0.

The space of all vectors orthogonal to ℓ is spanned by the row vectors of \hat{ℓ}, thus we have:

P = span(\hat{ℓ}).

In the case that x is the image of a point p, the preimage is a line and the coimage is the plane orthogonal to x, i.e. it is spanned by the rows of the matrix \hat{x}.

In summary, we have the following table:

        Image                          Preimage                   Coimage
Point   span(x) ∩ image plane          span(x) ⊂ R^3              span(\hat{x}) ⊂ R^3
Line    span(\hat{ℓ}) ∩ image plane    span(\hat{ℓ}) ⊂ R^3        span(ℓ) ⊂ R^3
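Since the hat operator and these span relations recur throughout the following chapters, a small numerical sanity check may help. This is a sketch under the usual convention \hat{v} w = v × w; all names are mine:

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix v^ with hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Coimage of an image point x: the rows of hat(x) span the plane orthogonal to x.
x = np.array([0.2, -0.1, 1.0])
assert np.allclose(hat(x) @ x, 0)   # every row of hat(x) is orthogonal to x

# Coimage of a line with normal l: points x on the image of the line satisfy
# l^T x = 0, and the preimage plane is spanned by the rows of hat(l).
l = np.array([0.3, 0.5, -0.2])
assert np.allclose(hat(l) @ l, 0)   # rows of hat(l) are orthogonal to l
```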
Summary

In this part of the lecture, we studied the perspective projection which takes us from 3D world coordinates (4D in homogeneous representation) to 2D image coordinates and pixel coordinates. In homogeneous coordinates, we have the chain of transformations:

4D world coordinates --(g in SE(3))--> 4D camera coordinates --(Pi_0)--> 3D image coordinates --(K = K_s K_f)--> 3D pixel coordinates.

In particular, we can summarize the (intrinsic) camera parameters in the matrix K = K_s K_f. The full transformation from world coordinates X_0 to pixel coordinates x' is given by:

lambda x' = K Pi_0 g X_0.

Moreover, for the images of points and lines we introduced the notions of preimage (the maximal point set which is consistent with a given image) and coimage (its orthogonal complement). Both can be used equivalently to the image.

Projective Geometry

In order to formally write transformations as linear operations, we made extensive use of homogeneous coordinates to represent a 3D point as a 4D vector (X, Y, Z, 1)^T with the last coordinate fixed to 1. This normalization is not always necessary: one can represent 3D points by a general 4D vector

X = (X_W, Y_W, Z_W, W)^T in R^4,

remembering that merely the direction of this vector is of importance. We therefore identify the point in homogeneous coordinates with the line connecting it with the origin. This leads to the definition of projective coordinates:

An n-dimensional projective space P^n is the set of all one-dimensional subspaces (i.e. lines through the origin) of the vector space R^{n+1}. A point p in P^n can then be assigned homogeneous coordinates X = (x_1, ..., x_{n+1})^T, among which at least one x_i is nonzero. For any nonzero lambda in R, the coordinates Y = (lambda x_1, ..., lambda x_{n+1})^T represent the same point p.

If two coordinate vectors X and Y differ by a scalar factor, they are said to be equivalent: X ~ Y. The point p is represented by the equivalence class of all multiples of X. Since all points are represented by lines through the origin, there exist two alternative representations of the two-dimensional projective space P^2:

1 One can represent each point as a point on the 2D sphere S^2, where antipodal points are identified because they represent the same line.
2 One can represent the points p with nonzero z-component as points on the plane R^2 (setting z = 1 in homogeneous coordinates), and the remaining points (z = 0) as points on the circle S^1 (again identifying antipodal points), which is equivalent to P^1.

Both representations carry over to the n-dimensional projective space P^n, which can be seen either as an n-sphere S^n (with antipodal points identified) or as R^n with P^{n-1} attached (to model the lines at infinity).

Chapter 4 Estimating Point Correspondence
Multiple View Geometry Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

Overview
1 From Photometry to Geometry
2 Small Deformation & Optical Flow
3 The Lucas-Kanade Method
4 Feature Point Extraction
5 Wide Baseline Matching

From Photometry to Geometry

In the last sections, we discussed how points and lines are transformed from 3D world coordinates to 2D image and pixel coordinates. In practice, we do not actually observe points or lines, but rather brightness or color values at the individual pixels. In order to transfer from this photometric representation to a geometric representation of the scene, one can identify points with characteristic image features and try to associate these points with corresponding points in the other frames. The matching of corresponding points will allow us to infer 3D structure.

Nevertheless, one should keep in mind that this approach is suboptimal: by selecting a small number of feature points from each image, we throw away a large amount of potentially useful information contained in each image. Yet, retaining all image information is computationally challenging. The selection and matching of a small number of feature points, on the other hand, allows tracking of 3D objects from a moving camera in real time, even with limited processing power.

Example of Tracking

[Figures: input frames 1 and 2; wire-frame reconstruction with texture map]
[Figures: tracked input sequence; textured reconstruction]
Identifying Corresponding Points

[Figure: input frame 1, input frame 2]

To identify corresponding points in two or more images is one of the biggest challenges in computer vision. Which of the points identified in the left image corresponds to which point in the right one?

Non-rigid Deformation

In what follows, we will assume that objects move rigidly. In general, however, objects may also deform non-rigidly. Moreover, there may be partial occlusions.

[Figure: image 1, image 2, registration (Cremers, Guetter, Xu, CVPR '06)]

Small Deformation versus Wide Baseline

In point matching one distinguishes two cases:

• Small deformation: The deformation from one frame to the other is assumed to be (infinitesimally) small. In this case the displacement from one frame to the other can be estimated by classical optic flow estimation, for example using the methods of Lucas/Kanade or Horn/Schunck. In particular, these methods allow one to model dense deformation fields (giving a displacement for every pixel in the image). But one can also track the displacement of a few feature points only, which is typically faster.

• Wide baseline stereo: In this case the displacement is assumed to be large. A dense matching of all points to all points is in general computationally infeasible. Therefore, one typically selects a small number of feature points in each of the images and develops efficient methods to find an appropriate pairing of points.

Small Deformation

The transformation of all points of a rigidly moving object is given by:

x_2 = h(x_1) = (1 / lambda_2(X)) (R lambda_1(X) x_1 + T).

Locally this motion can be approximated in several ways.

• Translational model: h(x) = x + b.
• Affine model: h(x) = Ax + b.

The 2D affine model can also be written as h(x) = x + u(x) with

u(x) = S(x) p = [ x y 1 0 0 0 ; 0 0 0 x y 1 ] (p_1, p_2, p_3, p_4, p_5, p_6)^T.
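The 2×6 matrix S(x) can be written down directly. A minimal sketch of the affine model, with names chosen by me:

```python
import numpy as np

def S(x, y):
    """The 2x6 matrix S(x) such that u(x) = S(x) p for the 2D affine model."""
    return np.array([[x, y, 1, 0, 0, 0],
                     [0, 0, 0, x, y, 1]])

def affine_warp(x, y, p):
    """h(x) = x + u(x) with u(x) = S(x) p; p holds the six affine parameters."""
    return np.array([x, y]) + S(x, y) @ p
```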
Optic Flow Estimation

The optic flow refers to the apparent 2D motion field observable between consecutive images of a video. It is different from the motion of objects in the scene: in the extreme case of motion along the camera axis, for example, there is no optic flow, while camera rotation generates an optic flow field even for entirely static scenes.

In 1981, two seminal works on optic flow estimation were published, namely those of Lucas & Kanade and of Horn & Schunck. Both methods have become very influential, with thousands of citations. They are complementary in the sense that the Lucas-Kanade method generates sparse flow vectors under the assumption of constant motion in a local neighborhood, whereas the Horn-Schunck method generates a dense flow field under the assumption of spatially smooth motion. Despite more than 30 years of research, the estimation of optic flow fields is still a highly active research direction. Due to its simplicity, we will review the Lucas-Kanade method.

The Lucas-Kanade Method

• Brightness Constancy Assumption: Let x(t) denote a moving point at time t, and I(x, t) a video sequence; then

I(x(t), t) = const.  for all t,

i.e. the brightness of the point x(t) is constant. Therefore the total time derivative must vanish:

(d/dt) I(x(t), t) = ∇I^T (dx/dt) + ∂I/∂t = 0.

This constraint is often called the (differential) optical flow constraint. The desired local flow vector (velocity) is given by v = dx/dt.

• Constant motion in a neighborhood: Since the above equation cannot be solved for v, one assumes that v is constant over a neighborhood W(x) of the point x:

∇I(x', t)^T v + ∂I/∂t (x', t) = 0  for all x' in W(x).

The brightness is typically not exactly constant and the velocity is typically not exactly the same over the local neighborhood. Lucas and Kanade (1981) therefore compute the best velocity vector v for the point x by minimizing the least squares error

E(v) = ∫_{W(x)} |∇I(x', t)^T v + I_t(x', t)|^2 dx'.

Expanding the terms and setting the derivative to zero, one obtains

dE/dv = 2Mv + 2q = 0,  with  M = ∫_{W(x)} ∇I ∇I^T dx'  and  q = ∫_{W(x)} I_t ∇I dx'.

If M is invertible, i.e. det(M) ≠ 0, the solution is v = -M^{-1} q.
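A minimal sketch of the resulting estimator, assuming grayscale images as NumPy arrays, finite differences for the gradients, and a square window; boundary handling is omitted:

```python
import numpy as np

def lucas_kanade(I0, I1, x, y, half=7):
    """Estimate the flow vector v at integer pixel (x, y) by solving M v = -q
    over a (2*half+1)^2 window, as in Lucas & Kanade '81."""
    Ix = np.gradient(I0, axis=1)              # spatial gradients
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0                              # temporal derivative
    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    M = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])        # structure tensor over the window
    q = np.array([ix @ it, iy @ it])
    if np.linalg.det(M) < 1e-6:               # aperture problem: M not invertible
        return None
    return -np.linalg.solve(M, q)             # v = -M^{-1} q
```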
Estimating Local Displacements

• Translational motion (Lucas & Kanade '81):

E(b) = ∫_{W(x)} |∇I^T b + I_t|^2 dx' → min,   dE/db = 0  ⇒  b = ...

• Affine motion:

E(p) = ∫_{W(x)} |∇I(x')^T S(x') p + I_t(x')|^2 dx',   dE/dp = 0  ⇒  p = ...

When can Small Motion be Estimated?

In the formalism of Lucas and Kanade, one cannot always estimate a translational motion. This problem is often referred to as the aperture problem. It arises, for example, if the region in the window W(x) around the point x has entirely constant intensity (for example a white wall), because then ∇I(x) = 0 and I_t(x) = 0 for all points in the window. For the solution in b to be unique, the structure tensor

M(x) = ∫_{W(x)} [ I_x^2  I_x I_y ; I_x I_y  I_y^2 ] dx'

needs to be invertible, i.e. we must have det M ≠ 0. If the structure tensor is not invertible but not zero, then one can still estimate the normal motion, i.e. the motion in the direction of the image gradient. For those points with det M(x) ≠ 0, we can compute a motion vector b(x). This leads to the following simple feature tracker.

A Simple Feature Tracking Algorithm

Feature tracking over a sequence of images can now be done as follows:

• For a given time instant t, compute at each point x in Ω the structure tensor M(x) as above.
• Mark all points x in Ω for which the determinant of M is larger than a threshold θ > 0: det M(x) ≥ θ.
• For all these points, the local velocity is given by

b(x, t) = -M(x)^{-1} ( ∫ I_x I_t dx', ∫ I_y I_t dx' )^T.

• Repeat the above steps for the points x + b at time t + 1.

Robust Feature Point Extraction

Even det M(x) ≠ 0 does not guarantee robust estimates of velocity: the inverse of M(x) may not be very stable if, for example, the determinant of M is very small. Thus locations with det M ≠ 0 are not always reliable features for tracking. One of the classical feature detectors was developed by Moravec '80, Förstner '84, '87 and Harris & Stephens '88. It is based on the smoothed structure tensor

M(x) ≡ G_σ * (∇I ∇I^T) = ∫ G_σ(x - x') [ I_x^2  I_x I_y ; I_x I_y  I_y^2 ](x') dx',

where rather than simply summing over the window W(x) we perform a summation weighted by a Gaussian G_σ of width σ. Harris and Stephens propose the response function

C(x) = det(M) - κ trace^2(M),

and select points for which C(x) > θ with a threshold θ > 0.

Response of Förstner Detector

[Figure: detector response on an example image]
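A sketch of the Harris response; scipy's Gaussian filter stands in here for the convolution with G_σ, and the parameter values are common defaults rather than values from the lecture:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(I, sigma=1.5, kappa=0.04):
    """C(x) = det(M) - kappa * trace(M)^2 with M = G_sigma * (grad I grad I^T)."""
    Iy, Ix = np.gradient(I)                   # image gradients
    Mxx = gaussian_filter(Ix * Ix, sigma)     # Gaussian-weighted tensor entries
    Mxy = gaussian_filter(Ix * Iy, sigma)
    Myy = gaussian_filter(Iy * Iy, sigma)
    return (Mxx * Myy - Mxy**2) - kappa * (Mxx + Myy)**2

# Candidate feature points: response above a threshold theta, e.g.
# pts = np.argwhere(harris_response(I) > theta)
```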
Wide Baseline Matching

Corresponding points and regions may look very different in different views, so determining correspondence is a challenge. In the case of wide baseline matching, large parts of the image plane will not match at all because they are not visible in the other image. In other words, while a given point may have many potential matches, quite possibly it does not have a corresponding point in the other image at all.

Extensions to Larger Baseline

One of the limitations of tracking features frame by frame is that small errors in the motion accumulate over time and the window gradually moves away from the point that was originally tracked. This is known as drift. A remedy is to match a given point back to the first frame. This generally implies larger displacements between frames. Two aspects matter when extending the above simple feature tracking method to somewhat larger displacements:

• Since the motion of the window between frames is (in general) no longer translational, one needs to generalize the motion model for the window W(x), for example by using an affine motion model.
• Since the illumination will change over time (especially when comparing more distant frames), one can replace the sum-of-squared-differences by the normalized cross correlation, which is more robust to illumination changes.

Normalized Cross Correlation

The normalized cross correlation is defined as

NCC(h) = ∫_{W(x)} (I_1(x') - Ī_1)(I_2(h(x')) - Ī_2) dx' / sqrt( ∫_{W(x)} (I_1(x') - Ī_1)^2 dx' · ∫_{W(x)} (I_2(h(x')) - Ī_2)^2 dx' ),

where Ī_1 and Ī_2 are the average intensities over the window W(x). By subtracting the average intensity, the measure becomes invariant to additive intensity changes I → I + γ. Dividing by the intensity variances of each window makes the measure invariant to multiplicative changes I → γI.

If we stack the normalized intensity values of the respective windows into vectors v_i ≡ vec(I_i - Ī_i), then the normalized cross correlation is the cosine of the angle between them:

NCC(h) = cos ∠(v_1, v_2).
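For discrete windows, the normalized cross correlation reduces to a few lines; a minimal sketch:

```python
import numpy as np

def ncc(w1, w2):
    """Normalized cross correlation of two equally sized image windows.
    Equals the cosine of the angle between the mean-subtracted windows."""
    v1 = (w1 - w1.mean()).ravel()
    v2 = (w2 - w2.mean()).ravel()
    return (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
```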
Special Case: Optimal Affine Transformation

The normalized cross correlation can be used to determine the optimal affine transformation between two given patches. Since the affine transformation is given by h(x) = Ax + d, we need to maximize the cross correlation with respect to the 2×2 matrix A and the displacement d:

Â, d̂ = arg max_{A,d} NCC(A, d),

where

NCC(A, d) = ∫_{W(x)} (I_1(x') - Ī_1)(I_2(Ax' + d) - Ī_2) dx' / sqrt( ∫_{W(x)} (I_1(x') - Ī_1)^2 dx' · ∫_{W(x)} (I_2(Ax' + d) - Ī_2)^2 dx' ).

Efficiently finding appropriate optima, however, is a challenge.

Optical Flow Estimation with Deep Neural Networks

There exist numerous algorithms to estimate correspondence across images. Over the last years, neural networks have become popular for estimating correspondence.

[Figure] Dosovitskiy, Fischer, Ilg, Haeusser, Hazirbas, Golkov, van der Smagt, Cremers and Brox, "FlowNet: Learning Optical Flow with Convolutional Networks", ICCV 2015.

Chapter 5 Reconstruction from Two Views: Linear Algorithms
Multiple View Geometry Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

Overview
1 The Reconstruction Problem
2 The Epipolar Constraint
3 Eight-Point Algorithm
4 Structure Reconstruction
5 Four-Point Algorithm
6 The Uncalibrated Case

Problem Formulation

In the last sections, we discussed how to identify point correspondences between two consecutive frames. In this section, we will tackle the next problem, namely that of reconstructing the 3D geometry of cameras and points. To this end, we will make the following assumptions:

• We assume that we are given a set of corresponding points in two frames taken with the same camera from different vantage points.
• We assume that the scene is static, i.e. none of the observed 3D points moved during the camera motion.
• We also assume that the intrinsic camera (calibration) parameters are known.

We will first estimate the camera motion from the set of corresponding points. Once we know the relative location and orientation of the cameras, we can reconstruct the 3D location of all corresponding points by triangulation.

[Figure] Goal: estimate camera motion and 3D scene structure from two views.

The Reconstruction Problem

In general, 3D reconstruction is a challenging problem. If we are given two views with 100 feature points in each of them, then we have 200 point coordinates in 2D. The goal is to estimate

• 6 parameters modeling the camera motion R, T and
• 100 × 3 coordinates for the 3D points X_j.

This could be done by minimizing the projection error:

E(R, T, X_1, ..., X_100) = Σ_j ||x_1^j - π(X_j)||^2 + ||x_2^j - π(R, T, X_j)||^2.

This amounts to a difficult optimization problem called bundle adjustment.
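To make this energy concrete, here is a minimal sketch, assuming the generic projection π(X) = (X/Z, Y/Z) and 3D points given in the coordinates of the first camera; all names are mine:

```python
import numpy as np

def pi(X):
    """Generic projection: (X, Y, Z) -> (X/Z, Y/Z)."""
    return X[:2] / X[2]

def reprojection_error(R, T, Xs, x1s, x2s):
    """E(R, T, X_1..X_n): sum of squared reprojection errors in both views.
    Xs: (n,3) 3D points in the first camera frame; x1s, x2s: (n,2) observations."""
    E = 0.0
    for X, x1, x2 in zip(Xs, x1s, x2s):
        E += np.sum((x1 - pi(X))**2)           # residual in the first view
        E += np.sum((x2 - pi(R @ X + T))**2)   # residual in the second view
    return E
```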
Before we look into this problem, we will first study an elegant solution to entirely get rid of the 3D point coordinates. It leads to the well-known eight-point algorithm.

Epipolar Geometry: Some Notation

The projections of a point X onto the two images are denoted by x_1 and x_2. The optical centers of the cameras are denoted by o_1 and o_2. The intersections of the line (o_1, o_2) with the image planes are called the epipoles e_1 and e_2. The intersections of the epipolar plane (o_1, o_2, X) with the image planes are called the epipolar lines l_1 and l_2. There is one epipolar plane for each 3D point X.

The Epipolar Constraint

We know that x_1 (in homogeneous coordinates) is the projection of a 3D point X. Given known camera parameters (K = 1) and no rotation or translation of the first camera, we merely have a projection with unknown depth lambda_1. From the first to the second frame we additionally have a camera rotation R and translation T, followed by a projection. This gives the equations:

lambda_1 x_1 = X,   lambda_2 x_2 = RX + T.

Inserting the first equation into the second, we get:

lambda_2 x_2 = R(lambda_1 x_1) + T.

Now we remove the translation by multiplying with \hat{T} (where \hat{T} v ≡ T × v):

lambda_2 \hat{T} x_2 = lambda_1 \hat{T} R x_1.

Taking the scalar product with x_2 gives the epipolar constraint:

x_2^T \hat{T} R x_1 = 0.

The epipolar constraint provides a relation between the 2D point coordinates of a 3D point in the two images and the camera transformation parameters. The original 3D point coordinates have been removed. The matrix

E = \hat{T} R in R^{3×3}

is called the essential matrix. The epipolar constraint is also known as the essential constraint or bilinear constraint.

Geometrically, this constraint states that the three vectors o_1X, o_2o_1 and o_2X form a plane, i.e. the triple product of these vectors (measuring the volume of the parallelepiped they span) is zero: in coordinates of the second frame, Rx_1 gives the direction of the vector o_1X, T gives the direction of o_2o_1, and x_2 is proportional to the vector o_2X, such that

volume = x_2^T (T × R x_1) = x_2^T \hat{T} R x_1 = 0.

Properties of the Essential Matrix E

The space of all essential matrices is called the essential space:

E ≡ { \hat{T} R | R in SO(3), T in R^3 } ⊂ R^{3×3}.

Theorem [Huang & Faugeras, 1989] (characterization of the essential matrix): A nonzero matrix E in R^{3×3} is an essential matrix if and only if E has a singular value decomposition (SVD) E = U Σ V^T with Σ = diag{σ, σ, 0} for some σ > 0 and U, V in SO(3).

Theorem (pose recovery from the essential matrix): There exist exactly two relative poses (R, T) with R in SO(3) and T in R^3 corresponding to an essential matrix E in E. For E = U Σ V^T we have:

(\hat{T}_1, R_1) = ( U R_Z(+π/2) Σ U^T, U R_Z^T(+π/2) V^T ),
(\hat{T}_2, R_2) = ( U R_Z(-π/2) Σ U^T, U R_Z^T(-π/2) V^T ).

In general, only one of these gives meaningful (positive) depth values.
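A short numerical sanity check of the epipolar constraint on a synthetic configuration (all values made up):

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Synthetic camera motion: rotation about z by 10 degrees, translation T.
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
T = np.array([1.0, 0.2, 0.0])
E = hat(T) @ R                    # essential matrix E = T^ R

X = np.array([0.5, -0.3, 4.0])    # a 3D point in the first camera frame
x1 = X / X[2]                     # lambda_1 x_1 = X
Y = R @ X + T
x2 = Y / Y[2]                     # lambda_2 x_2 = R X + T

print(x2 @ E @ x1)                # approx. 0: the epipolar constraint holds
```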
A Basic Reconstruction Algorithm

We have seen that the 2D coordinates of each 3D point are coupled to the camera parameters R and T through an epipolar constraint. In the following, we will derive a 3D reconstruction algorithm which proceeds as follows:

• Recover the essential matrix E from the epipolar constraints associated with a set of point pairs.
• Extract the relative translation and rotation from the essential matrix E.

In general, the matrix E recovered from a set of epipolar constraints will not be an essential matrix. One can resolve this problem in two ways:

1 Recover some matrix E in R^{3×3} from the epipolar constraints and then project it onto the essential space.
2 Optimize the epipolar constraints in the essential space.

While the second approach is in principle more accurate, it involves a nonlinear constrained optimization. We will pursue the first approach, which is simpler and faster.

The Eight-Point Linear Algorithm

First we rewrite the epipolar constraint as a scalar product between a vector containing the elements of the matrix E and a vector containing the coordinates of the points x_1 and x_2. Let

E^s = (e_11, e_21, e_31, e_12, e_22, e_32, e_13, e_23, e_33)^T in R^9

be the (column-stacked) vector of elements of E, and a ≡ x_1 ⊗ x_2 the Kronecker product of the vectors x_i = (x_i, y_i, z_i)^T, defined as

a = (x_1 x_2, x_1 y_2, x_1 z_2, y_1 x_2, y_1 y_2, y_1 z_2, z_1 x_2, z_1 y_2, z_1 z_2)^T in R^9.

Then the epipolar constraint can be written as:

x_2^T E x_1 = a^T E^s = 0.

For n point pairs, we can combine these equations into the linear system

χ E^s = 0,  with χ = (a^1, a^2, ..., a^n)^T.

We see that the vector of coefficients of the essential matrix E defines the null space of the matrix χ. In order for the above system to have a unique solution (up to a scaling factor and ruling out the trivial solution E = 0), the rank of the matrix χ needs to be exactly 8. Therefore, we need at least eight point pairs.

In certain degenerate cases, the solution for the essential matrix is not unique even if we have eight or more point pairs. One such example is the case that all points lie on a line or on a plane.

Clearly, we will not be able to recover the sign of E. Since for each E there are two possible assignments of rotation R and translation T, we therefore end up with four possible solutions for rotation and translation.

Projection onto Essential Space

The numerically estimated coefficients E^s will in general not correspond to an essential matrix. One can resolve this problem by projecting the estimate back onto the essential space.

Theorem (projection onto the essential space): Let F in R^{3×3} be an arbitrary matrix with SVD F = U diag{λ_1, λ_2, λ_3} V^T, λ_1 ≥ λ_2 ≥ λ_3. Then the essential matrix E which minimizes the Frobenius norm ||F - E||_F^2 is given by

E = U diag{σ, σ, 0} V^T,  with σ = (λ_1 + λ_2) / 2.
Eight-Point Algorithm (Longuet-Higgins '81)

Given a set of n ≥ 8 point pairs x_1^i, x_2^i:

• Compute an approximation of the essential matrix. Construct the matrix χ = (a^1, a^2, ..., a^n)^T, where a^i = x_1^i ⊗ x_2^i. Find the vector E^s in R^9 which minimizes ||χ E^s|| as the ninth column of V_χ in the SVD χ = U_χ Σ_χ V_χ^T. Unstack E^s into a 3×3 matrix E.

• Project onto the essential space. Compute the SVD E = U diag{σ_1, σ_2, σ_3} V^T. Since in the reconstruction E is only defined up to a scalar, we project E onto the normalized essential space by replacing the singular values σ_1, σ_2, σ_3 with 1, 1, 0.

• Recover the displacement from the essential matrix. The four possible solutions for rotation and translation are:

R = U R_Z^T(±π/2) V^T,   \hat{T} = U R_Z(±π/2) Σ U^T,

with the rotation by ±π/2 about the z-axis:

R_Z^T(±π/2) = [ 0 ±1 0 ; ∓1 0 0 ; 0 0 1 ].
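A compact sketch of the algorithm in Python/NumPy. The Kronecker ordering follows a = x_1 ⊗ x_2 from above with E^s stacked column-wise; a cheirality (positive depth) check for selecting among the four candidate poses is omitted:

```python
import numpy as np

def eight_point(x1s, x2s):
    """x1s, x2s: (n,3) homogeneous image points, n >= 8.
    Returns the projected essential matrix E and the four candidate (R, T^)."""
    chi = np.stack([np.kron(x1, x2) for x1, x2 in zip(x1s, x2s)])  # (n, 9)
    _, _, Vt = np.linalg.svd(chi)
    Es = Vt[-1]                            # minimizer of ||chi E^s||
    E = Es.reshape(3, 3, order='F')        # unstack column-wise
    U, S, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U        # enforce U, V in SO(3)
    if np.linalg.det(Vt) < 0: Vt = -Vt
    E = U @ np.diag([1, 1, 0]) @ Vt        # project onto normalized essential space
    Rz = lambda s: np.array([[0, -s, 0], [s, 0, 0], [0, 0, 1]])  # R_Z(s * pi/2)
    poses = []
    for s in (+1, -1):
        R = U @ Rz(s).T @ Vt
        That = U @ Rz(s) @ np.diag([1, 1, 0]) @ U.T
        poses.append((R, That))            # two (R, T^) pairs, plus sign of E
    return E, poses
```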
Do We Need Eight Points?

The above reasoning showed that we need at least eight points in order for the matrix χ to have rank 8 and therefore to guarantee a unique solution for E. Yet, one can take the special structure of E into account: the space of essential matrices is actually a five-dimensional space, i.e. E has only 5 (and not 9) degrees of freedom. A simple way to exploit the algebraic properties of E is to use the fact that det E = 0. If we have only seven point pairs, the null space of χ has (at least) dimension 2, spanned by two vectors E_1 and E_2. Then we can solve for E by determining α such that

det E = det(E_1 + α E_2) = 0.

Along similar lines, Kruppa proved in 1913 that one needs only five point pairs to recover (R, T). In the case of degenerate motion (for example planar or circular motion), one can resolve the problem with even fewer point pairs.

Limitations and Further Extensions

Among the four possible solutions for R and T, there is generally only one meaningful one (the one which assigns positive depth to all points). The algorithm fails if the translation is exactly 0, since then E = 0 and nothing can be recovered. Due to noise, this typically does not happen.

In the case of infinitesimal viewpoint change, one can adapt the eight-point algorithm to the continuous motion case, where the epipolar constraint is replaced by the continuous epipolar constraint. Rather than recovering (R, T), one recovers the linear and angular velocity of the camera.

In the case of independently moving objects, one can generalize the epipolar constraint. For two motions, for example, we have:

(x_2^T E_1 x_1)(x_2^T E_2 x_1) = 0

with two essential matrices E_1 and E_2. Given a sufficiently large number of point pairs, one can solve the respective equations for multiple essential matrices using polynomial factorization.

Structure Reconstruction

The linear eight-point algorithm allowed us to estimate the camera transformation parameters R and T from a set of corresponding point pairs. Yet, the essential matrix E and hence the translation T are only defined up to an arbitrary scale γ in R^+, with ||E|| = ||T|| = 1. After recovering R and T, we therefore have for point X^j:

lambda_2^j x_2^j = lambda_1^j R x_1^j + γ T,   j = 1, ..., n,

with unknown scale parameters lambda_i^j. We can eliminate one of these scales by multiplying with \hat{x}_2^j:

lambda_1^j \hat{x}_2^j R x_1^j + γ \hat{x}_2^j T = 0,   j = 1, ..., n.

This corresponds to n linear systems of the form

( \hat{x}_2^j R x_1^j,  \hat{x}_2^j T ) (lambda_1^j, γ)^T = 0,   j = 1, ..., n.

Combining the parameters into λ⃗ = (lambda_1^1, lambda_1^2, ..., lambda_1^n, γ)^T in R^{n+1}, we get the linear system

M λ⃗ = 0,  with

M ≡ [ \hat{x}_2^1 R x_1^1  0  ...  0  \hat{x}_2^1 T ;
      0  \hat{x}_2^2 R x_1^2  ...  0  \hat{x}_2^2 T ;
      ...
      0  0  ...  \hat{x}_2^n R x_1^n  \hat{x}_2^n T ] in R^{3n×(n+1)}.

The linear least squares estimate for λ⃗ is given by the eigenvector corresponding to the smallest eigenvalue of M^T M. It is only defined up to a global scale: this reflects the ambiguity that if the camera moves twice the distance while the scene is twice as large and twice as far away, the images remain the same.

Example

[Figures: left image, right image, and reconstruction (author: Jana Košecká)]
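A sketch of this structure reconstruction step: build M from the recovered (R, T) and the point pairs and take the eigenvector of M^T M corresponding to the smallest eigenvalue. Names are mine:

```python
import numpy as np

def recover_depths(R, T, x1s, x2s):
    """Solve M lambda = 0 for (lambda_1^1, ..., lambda_1^n, gamma), up to scale.
    x1s, x2s: (n,3) homogeneous image points."""
    def hat(v):
        return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    n = len(x1s)
    M = np.zeros((3 * n, n + 1))
    for j, (x1, x2) in enumerate(zip(x1s, x2s)):
        M[3*j:3*j+3, j] = hat(x2) @ R @ x1     # coefficient of lambda_1^j
        M[3*j:3*j+3, -1] = hat(x2) @ T         # coefficient of gamma
    w, V = np.linalg.eigh(M.T @ M)             # eigenvalues in ascending order
    lam = V[:, 0]                              # eigenvector of smallest eigenvalue
    return lam / lam[-1]                       # normalize so that gamma = 1
```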
Degenerate Configurations

The eight-point algorithm only provides unique solutions (up to a scalar factor) if all 3D points are in a "general position". This is no longer the case for certain degenerate configurations, in which all points lie on certain 2D surfaces called critical surfaces. Typically these critical surfaces are described by a quadratic equation in the three point coordinates, which is why they are referred to as quadratic surfaces. While most critical configurations do not actually arise in practice, a specific degenerate configuration which does arise often is the case where all points lie on a 2D plane (such as floors, tables, walls, ...). For the structure-from-motion problem in the context of points on a plane, one can exploit additional constraints, which leads to the so-called four-point algorithm.

Planar Homographies

Let us assume that all points lie on a plane. If X_1 in R^3 denotes the point coordinates in the first frame, and these lie on a plane with normal N in S^2, then we have:

N^T X_1 = d  ⇔  (1/d) N^T X_1 = 1.

In frame two, we therefore have the coordinates:

X_2 = R X_1 + T = R X_1 + T (1/d) N^T X_1 = ( R + (1/d) T N^T ) X_1 ≡ H X_1,

where

H = R + (1/d) T N^T in R^{3×3}

is called a homography matrix. Inserting the 2D coordinates, we get:

lambda_2 x_2 = H lambda_1 x_1  ⇔  x_2 ~ H x_1,

where ~ means equality up to scaling. This expression is called a planar homography. H depends on the camera and plane parameters.

From Point Pairs to Homography

For a pair of corresponding 2D points we therefore have

lambda_2 x_2 = H lambda_1 x_1.

By multiplying with \hat{x}_2 we can eliminate lambda_2 and obtain:

\hat{x}_2 H x_1 = 0.

This equation is called the planar epipolar constraint or planar homography constraint. Again, we can cast this equation into the form

a^T H^s = 0,

where we have stacked the elements of H column-wise into a vector H^s = (H_11, H_21, ..., H_33)^T in R^9, and introduced the matrix a ≡ x_1 ⊗ \hat{x}_2 in R^{9×3}.

The Four-Point Algorithm

Let us now assume we have n ≥ 4 pairs of corresponding 2D points {x_1^j, x_2^j}, j = 1, ..., n, in the two images. Each point pair induces a matrix a^j; we stack these into a larger matrix

χ ≡ (a^1, ..., a^n)^T in R^{3n×9}

and obtain the system χ H^s = 0. As in the case of the essential matrix, the homography matrix can only be estimated up to a scale factor. This gives rise to the four-point algorithm:

• For the point pairs, compute the matrix χ.
• Compute a solution H^s for the above equation by singular value decomposition of χ.
• Extract the motion parameters from the homography matrix H = R + (1/d) T N^T.

General Comments

Clearly, the derivation of the four-point algorithm is in close analogy to that of the eight-point algorithm. Rather than estimating the essential matrix E, one estimates the homography matrix H to derive R and T. In the four-point algorithm, the homography matrix is decomposed into R, N and T/d. In other words, one can reconstruct the normal of the plane, but the translation is only obtained in units of the offset d between the plane and the origin. The 3D structure of the points can then be computed in the same manner as before.

Since one uses the strong constraint that all points lie in a plane, the four-point algorithm requires only four correspondences. There exist numerous relations between the essential matrix E = \hat{T} R and the corresponding homography matrix H = R + T u^T with some u in R^3, in particular:

E = \hat{T} H,   H^T E + E^T H = 0.
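A sketch of the homography estimation step of the four-point algorithm (the extraction of R, N and T/d from H is not shown):

```python
import numpy as np

def four_point_homography(x1s, x2s):
    """Estimate H (up to scale) from n >= 4 homogeneous point pairs
    via the planar constraint x2^ H x1 = 0."""
    def hat(v):
        return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    # Each pair contributes the 3x9 block x1^T kron hat(x2), i.e. a^T.
    rows = [np.kron(x1.reshape(1, 3), hat(x2)) for x1, x2 in zip(x1s, x2s)]
    chi = np.vstack(rows)                  # (3n, 9)
    _, _, Vt = np.linalg.svd(chi)
    Hs = Vt[-1]                            # null vector of chi
    return Hs.reshape(3, 3, order='F')     # unstack column-wise
```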
The Case of an Uncalibrated Camera

The reconstruction algorithms introduced above all assume that the camera is calibrated (K = 1). The general transformation from a 3D point to the image is given by:

lambda x' = K Pi_0 g X = (KR, KT) X,

with the intrinsic parameter matrix or calibration matrix

K = [ f s_x  f s_θ  o_x ; 0  f s_y  o_y ; 0  0  1 ] in R^{3×3}.

The calibration matrix maps metric coordinates into image (pixel) coordinates, using the focal length f, the optical center (o_x, o_y), the pixel sizes s_x, s_y and a skew factor s_θ. If these parameters are known, then one can simply transform the pixel coordinates x' to normalized coordinates x = K^{-1} x' to obtain the representation used in the previous sections. This amounts to centering the coordinates with respect to the optical center, etc.

The Fundamental Matrix

If the camera parameters K cannot be estimated in a calibration procedure beforehand, then one has to deal with reconstruction from uncalibrated views. By transforming all image coordinates x' with the inverse calibration matrix K^{-1} into metric coordinates x, we obtain the epipolar constraint for uncalibrated cameras:

x_2^T \hat{T} R x_1 = 0  ⇔  x'_2^T K^{-T} \hat{T} R K^{-1} x'_1 = 0,

which can be written as

x'_2^T F x'_1 = 0,

with the fundamental matrix defined as:

F ≡ K^{-T} \hat{T} R K^{-1} = K^{-T} E K^{-1}.

Since the invertible matrix K does not affect the rank of this matrix, we know that F has an SVD F = U Σ V^T with Σ = diag(σ_1, σ_2, 0). In fact, any matrix of rank 2 can be a fundamental matrix.

Limitations

While it is straightforward to extend the eight-point algorithm so that one can extract a fundamental matrix from a set of corresponding image points, it is less straightforward how to proceed from there. Firstly, one cannot impose a strong constraint on the specific structure of the fundamental matrix (apart from the fact that the last singular value is zero). Secondly, for a given fundamental matrix F, there does not exist a finite number of decompositions into extrinsic parameters R, T and intrinsic parameters K (even apart from the global scale factor).

As a consequence, one can only determine so-called projective reconstructions, i.e. reconstructions of geometry and camera position which are defined up to a so-called projective transformation. As a solution, one typically chooses a canonical reconstruction from the family of possible reconstructions.
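The relation F = K^{-T} E K^{-1} is a one-liner; a small sketch with a made-up calibration matrix:

```python
import numpy as np

def fundamental_from_essential(E, K):
    """F = K^{-T} E K^{-1}; pixel coordinates then satisfy x2'^T F x1' = 0."""
    Kinv = np.linalg.inv(K)
    return Kinv.T @ E @ Kinv

# Example calibration matrix (illustrative values only):
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
# Conversely, normalized coordinates are x = K^{-1} x', as used above.
```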
Chapter 6 Reconstruction from Multiple Views
Multiple View Geometry Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics
Technical University of Munich

Overview
1 From Two Views to Multiple Views
2 Preimage & Coimage from Multiple Views
3 From Preimages to Rank Constraints
4 Geometric Interpretation
5 The Multiple-view Matrix
6 Relation to Epipolar Constraints
7 Multiple-View Reconstruction Algorithms
8 Multiple-View Reconstruction of Lines

Multiple-View Geometry

In this section, we deal with the problem of 3D reconstruction given multiple views of a static scene, either obtained simultaneously or sequentially from a moving camera. The key idea is that the multiple-view scenario allows us to obtain more measurements for inferring the same number of 3D coordinates. For example, given two views of a single 3D point, we have four measurements (the x- and y-coordinate in each view), while the three-view case provides 6 measurements per point correspondence. As a consequence, the estimation of motion and structure will generally be more constrained when additional views are taken into account.

The three-view case has traditionally been addressed by the so-called trifocal tensor [Hartley '95, Vieville '93], which generalizes the fundamental matrix. This tensor, like the fundamental matrix, does not depend on the scene structure but rather on the inter-frame camera motion. It captures a trilinear relationship between three views of the same 3D point or line [Liu, Huang '86, Spetsakis, Aloimonos '87].

Trifocal Tensor versus Multiview Matrices

Traditionally, the trilinear relations were captured by generalizing the concept of the fundamental matrix to that of a trifocal tensor. It was developed among others by [Liu and Huang '86] and [Spetsakis, Aloimonos '87]. The use of tensors was promoted by [Vieville '93] and [Hartley '95]. Bilinear, trilinear and quadrilinear constraints were formulated in [Triggs '95]. This line of work is summarized in the books: Faugeras and Luong, "The Geometry of Multiple Views", 2001, and Hartley and Zisserman, "Multiple View Geometry", 2001, 2003.

In the following, however, we stick with a matrix notation for the multiview scenario. This approach makes use of matrices and rank constraints on these matrices to impose the constraints from multiple views. Such rank constraints were used by many authors, among others in [Triggs '95] and [Heyden, Åström '97]. This line of work is summarized in the book: Ma, Soatto, Kosecka, Sastry, "An Invitation to 3D Vision", 2004.

Preimage from Multiple Views

A preimage of multiple images of a point or a line is the (largest) set of 3D points that gives rise to the same set of multiple images of the point or the line. For example, given two images ℓ_1 and ℓ_2 of a line L, the preimage of these two images is the intersection of the planes P_1 and P_2, i.e. exactly the 3D line L = P_1 ∩ P_2. In general, the preimage of multiple images of points and lines can be defined by the intersection:

preimage(x_1, ..., x_m) = preimage(x_1) ∩ ... ∩ preimage(x_m),
preimage(ℓ_1, ..., ℓ_m) = preimage(ℓ_1) ∩ ... ∩ preimage(ℓ_m).

The above definition allows us to compute preimages for any set of image points or lines. The preimage of multiple image lines, for example, can be an empty set, a point, a line or a plane, depending on whether or not they come from the same line in space.
Preimage and Coimage of Points and Lines

[Figure: images of a point p on a line L]
• Preimages P_1 and P_2 of the image lines should intersect in the line L.
• Preimages of the two image points x_1 and x_2 should intersect in the point p.
• Normals ℓ_1 and ℓ_2 define the coimages of the line L.

For a moving camera at time t, let x(t) denote the image coordinates of a 3D point X in homogeneous coordinates:

lambda(t) x(t) = K(t) Pi_0 g(t) X,

where lambda(t) denotes the depth of the point, K(t) the intrinsic parameters, Pi_0 the generic projection, and

g(t) = [ R(t) T(t) ; 0 1 ] in SE(3)

the rigid body motion at time t. Let us consider a 3D line L in homogeneous coordinates:

L = { X | X = X_0 + mu V, mu in R } ⊂ R^4,

where X_0 = (X_0, Y_0, Z_0, 1)^T in R^4 are the coordinates of the base point p_0 and V = (V_1, V_2, V_3, 0)^T in R^4 is a nonzero vector indicating the line direction.

The preimage of L with respect to the image at time t is a plane P with normal ℓ(t), where P = span(\hat{ℓ}). The vector ℓ(t) is orthogonal to all points x(t) of the line:

ℓ(t)^T x(t) = ℓ(t)^T K(t) Pi_0 g(t) X = 0.

Assume we are given a set of m images at times t_1, ..., t_m, where

lambda_i = lambda(t_i),  x_i = x(t_i),  ℓ_i = ℓ(t_i),  Pi_i = K(t_i) Pi_0 g(t_i).

With this notation, we can relate the i-th image of a point p to its world coordinates X:

lambda_i x_i = Pi_i X,

and the i-th coimage of a line L to its world coordinates (X_0, V):

ℓ_i^T Pi_i X_0 = ℓ_i^T Pi_i V = 0.
From Preimages to Rank Constraints

The above equations contain the 3D parameters of points and lines as unknowns. As in the two-view case, we wish to eliminate these unknowns so as to obtain relationships between the 2D projections and the camera parameters. In the two-view case, eliminating the 3D coordinates led to the epipolar constraint for the essential matrix E or (in the uncalibrated case) the fundamental matrix F. The 3D coordinates (the depth values lambda_i associated with each point) could subsequently be obtained from another constraint.

There exist different ways to eliminate the 3D parameters, leading to different kinds of constraints which have been studied in computer vision. A systematic elimination of the 3D parameters will lead to a complete set of such conditions.

Point Features

Consider images of a 3D point X seen in multiple views:

I λ⃗ ≡ [ x_1 0 ... 0 ; 0 x_2 ... 0 ; ... ; 0 0 ... x_m ] λ⃗ = [ Pi_1 ; Pi_2 ; ... ; Pi_m ] X ≡ Pi X,

which is of the form I λ⃗ = Pi X, where λ⃗ = (lambda_1, ..., lambda_m)^T in R^m is the depth scale vector and Pi in R^{3m×4} is the multiple-view projection matrix associated with the image matrix I in R^{3m×m}. Note that apart from the 2D coordinates in I, everything else in the above equations is unknown. As in the two-view case, the goal is to decouple the above equation into constraints which allow us to separately recover the camera displacements Pi_i on the one hand and the scene structure lambda_i and X on the other hand.

Every column of I lies in the four-dimensional space spanned by the columns of the matrix Pi. In order to have a solution to the above equation, the columns of I and Pi must therefore be linearly dependent. In other words, the matrix

N_p ≡ (Pi, I) = [ Pi_1 x_1 0 ... 0 ; Pi_2 0 x_2 ... 0 ; ... ; Pi_m 0 0 ... x_m ] in R^{3m×(m+4)}

must have a nontrivial right null space. For m ≥ 2 (i.e. 3m ≥ m + 4), full rank would be m + 4. Linear dependence of the columns therefore implies the rank constraint:

rank(N_p) ≤ m + 3.

In fact, the vector u ≡ (X^T, -λ⃗^T)^T in R^{m+4} is in the right null space, as N_p u = 0.

For a more compact formulation of the above rank constraint, we introduce the matrix

I^⊥ ≡ [ \hat{x}_1 0 ... 0 ; 0 \hat{x}_2 ... 0 ; ... ; 0 0 ... \hat{x}_m ] in R^{3m×3m},

which has the property of "annihilating" I: I^⊥ I = 0. Premultiplying the above equation with I^⊥, we obtain

I^⊥ Pi X = 0.

Thus the vector X is in the null space of the matrix

W_p ≡ I^⊥ Pi = [ \hat{x}_1 Pi_1 ; \hat{x}_2 Pi_2 ; ... ; \hat{x}_m Pi_m ] in R^{3m×4}.

To have a nontrivial solution, we must have

rank(W_p) ≤ 3.
If all images x_i are from a single 3D point X, then the matrix W_p should have only a one-dimensional null space. Given m images x_i in R^3 of a point p with respect to m camera frames Pi_i, we must have the rank condition

rank(W_p) = rank(N_p) - m ≤ 3.

Line Features

We can derive a similar rank constraint for lines. As we saw above, for the coimages ℓ_i, i = 1, ..., m, of a line L spanned by a base point X_0 and a direction V we have:

ℓ_i^T Pi_i X_0 = ℓ_i^T Pi_i V = 0.

Therefore the matrix

W_l ≡ [ ℓ_1^T Pi_1 ; ℓ_2^T Pi_2 ; ... ; ℓ_m^T Pi_m ] in R^{m×4}

must satisfy the rank constraint

rank(W_l) ≤ 2,

since the null space of W_l contains the two vectors X_0 and V.

Rank Constraints: Geometric Interpretation

In the case of a point X, we had the equation

W_p X = 0,  with W_p = [ \hat{x}_1 Pi_1 ; \hat{x}_2 Pi_2 ; ... ; \hat{x}_m Pi_m ] in R^{3m×4}.

Since all matrices \hat{x}_i have rank 2, the number of independent rows in W_p is at most 2m. These rows define a set of 2m planes. Since W_p X = 0, the point X lies in the intersection of all these planes. In order for the 2m planes to have a unique intersection, we need rank(W_p) = 3.

[Figure: preimage of two image points. The rows of the matrix W_p correspond to the normal vectors of four planes. The (nontrivial) rank constraint states that these four planes have to intersect in a single point.]
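The point rank constraint directly yields a multiple-view triangulation method: stack the matrices \hat{x}_i Pi_i into W_p and compute its null vector. A sketch, assuming the projection matrices Pi_i are given as 3×4 arrays:

```python
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def triangulate(Pis, xs):
    """Pis: list of m projection matrices Pi_i (3x4); xs: (m,3) homogeneous images.
    Returns X in homogeneous coordinates as the right null vector of W_p."""
    Wp = np.concatenate([hat(x) @ Pi for Pi, x in zip(Pis, xs)])  # (3m, 4)
    _, _, Vt = np.linalg.svd(Wp)
    X = Vt[-1]                    # one-dimensional null space when rank(W_p) = 3
    return X / X[3]               # normalize the homogeneous coordinate
```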
[Figure: Preimage of two image lines. For a line observed in two images, the rank constraint is always fulfilled: geometrically, the two preimages (planes) always intersect in some 3D line.]

The Multiple-view Matrix of a Point

In the following, the rank constraints will be rewritten in a more compact and transparent manner. Let us assume we have m images, the first of which is in world coordinates. Then we have projection matrices of the form

\[
\Pi_1 = [I, 0], \quad \Pi_2 = [R_2, T_2], \quad \ldots, \quad \Pi_m = [R_m, T_m] \in \mathbb{R}^{3\times 4},
\]

which model the projection of a point X into the individual images. In general, for uncalibrated cameras (i.e. \(K_i \neq I\)), \(R_i\) will not be an orthogonal rotation matrix but rather an arbitrary invertible matrix. Again, we define the matrix \(W_p\) as

\[
W_p \equiv \mathcal{I}^\perp \Pi = \begin{pmatrix} \hat{x}_1 \Pi_1 \\ \hat{x}_2 \Pi_2 \\ \vdots \\ \hat{x}_m \Pi_m \end{pmatrix} \in \mathbb{R}^{3m\times 4}.
\]

The rank of \(W_p\) is not affected if we multiply by a full-rank matrix \(D_p \in \mathbb{R}^{4\times 5}\) as follows:

\[
W_p D_p =
\begin{pmatrix} \hat{x}_1 \Pi_1 \\ \hat{x}_2 \Pi_2 \\ \vdots \\ \hat{x}_m \Pi_m \end{pmatrix}
\begin{pmatrix} \hat{x}_1 & x_1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix}
\hat{x}_1 \hat{x}_1 & 0 & 0\\
\hat{x}_2 R_2 \hat{x}_1 & \hat{x}_2 R_2 x_1 & \hat{x}_2 T_2\\
\vdots & \vdots & \vdots\\
\hat{x}_m R_m \hat{x}_1 & \hat{x}_m R_m x_1 & \hat{x}_m T_m
\end{pmatrix}.
\]

This means that rank(W_p) ≤ 3 if and only if the submatrix

\[
M_p \equiv
\begin{pmatrix}
\hat{x}_2 R_2 x_1 & \hat{x}_2 T_2\\
\hat{x}_3 R_3 x_1 & \hat{x}_3 T_3\\
\vdots & \vdots\\
\hat{x}_m R_m x_1 & \hat{x}_m T_m
\end{pmatrix}
\in \mathbb{R}^{3(m-1)\times 2}
\]

satisfies rank(M_p) ≤ 1.

The matrix \(M_p\) is called the multiple-view matrix associated with a point p. It involves both the image \(x_1\) in the first view and the coimages \(\hat{x}_i\) in the remaining views.

In summary: for multiple images of a point p, the matrices \(N_p\), \(W_p\) and \(M_p\) satisfy

\[
\operatorname{rank}(M_p) = \operatorname{rank}(W_p) - 2 = \operatorname{rank}(N_p) - (m+2) \le 1.
\]

Multiview Matrix: Geometric Interpretation

Let us look into the geometric information contained in the multiple-view matrix \(M_p\). The constraint rank(M_p) ≤ 1 implies that the two columns are linearly dependent. In fact we have

\[
\lambda_1 \hat{x}_i R_i x_1 + \hat{x}_i T_i = 0, \quad i = 2, \ldots, m,
\]

which yields

\[
M_p \begin{pmatrix} \lambda_1 \\ 1 \end{pmatrix} = 0.
\]

Therefore the coefficient capturing the linear dependence is simply the distance \(\lambda_1\) of the point p from the first camera center. In other words, the multiple-view matrix captures exactly the information about a point p that is missing from a single image but encoded in multiple images.
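Continuing the synthetic sketch above (reusing hat, xs, Pis and m; all names illustrative), one can assemble M_p, confirm its rank, and read off the depth λ1 from its rank-one structure:

```python
# Multiple-view matrix M_p of the synthetic point, shape (3(m-1), 2).
Mp = np.vstack([np.column_stack([hat(xs[i]) @ Pis[i][:, :3] @ xs[0],
                                 hat(xs[i]) @ Pis[i][:, 3]])
                for i in range(1, m)])
print("rank(M_p) =", np.linalg.matrix_rank(Mp, tol=1e-10))   # expected: 1

# M_p (lambda_1, 1)^T = 0, so lambda_1 solves a @ lambda_1 + b = 0 in least squares.
a, b = Mp[:, 0], Mp[:, 1]
lambda1 = -(a @ b) / (a @ a)
print("recovered lambda_1 =", lambda1)   # expected: depth of X in frame 1, here 5.0
```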
Relation to Epipolar Constraints

For the multiple-view matrix \(M_p\) to have rank(M_p) = 1, it is necessary that the pair of vectors \(\hat{x}_i T_i\) and \(\hat{x}_i R_i x_1\) be linearly dependent for all i = 2, ..., m. This gives the epipolar constraints

\[
x_i^\top \hat{T}_i R_i x_1 = 0
\]

between the first and the i-th image (proof below). Yet, we shall see that the multiview constraint provides more information than the pairwise epipolar constraints.

We just claimed that the linear dependence of \(\hat{x}_i T_i\) and \(\hat{x}_i R_i x_1\) gives rise to the epipolar constraint \(x_i^\top \hat{T}_i R_i x_1 = 0\). The following proof provides an intuitive geometric understanding of this relationship.

Assume the two vectors \(\hat{x}_i T_i\) and \(\hat{x}_i R_i x_1\) are dependent, i.e. there is a scalar γ such that \(\hat{x}_i T_i = \gamma\, \hat{x}_i R_i x_1\). Since \(\hat{x}_i T_i \equiv x_i \times T_i\) is proportional to the normal of the plane spanned by \(x_i\) and \(T_i\), and \(\hat{x}_i R_i x_1\) is proportional to the normal of the plane spanned by \(x_i\) and \(R_i x_1\), the linear dependence is equivalent to saying that the three vectors \(x_i\), \(T_i\) and \(R_i x_1\) are coplanar. This in turn is equivalent to saying that the vector \(x_i\) is orthogonal to the normal of the plane spanned by \(T_i\) and \(R_i x_1\), i.e.

\[
x_i^\top (T_i \times R_i x_1) = x_i^\top \hat{T}_i R_i x_1 = 0.
\]

Analysis of the Multiple-view Constraint

For any nonzero vectors \(a_i, b_i \in \mathbb{R}^3\), i = 1, 2, ..., n, the matrix

\[
\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ \vdots & \vdots \\ a_n & b_n \end{pmatrix} \in \mathbb{R}^{3n\times 2}
\]

is rank-deficient if and only if \(a_i b_j^\top - b_i a_j^\top = 0\) for all i, j = 1, ..., n. We will not prove this statement. Applied to the rank constraint on \(M_p\) we get

\[
\hat{x}_i R_i x_1 (\hat{x}_j T_j)^\top - \hat{x}_i T_i (\hat{x}_j R_j x_1)^\top = 0,
\]

which gives the trilinear constraint

\[
\hat{x}_i \left( T_i x_1^\top R_j^\top - R_i x_1 T_j^\top \right) \hat{x}_j = 0.
\]

This is a matrix equation giving 3 × 3 = 9 scalar trilinear equations, only four of which are linearly independent.
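Again in the synthetic setting from the earlier sketches (illustrative only), both the pairwise epipolar constraints and one instance of the trilinear constraint can be verified numerically:

```python
# Pairwise epipolar constraints x_i^T T^_i R_i x_1 = 0 (reusing xs, Pis, hat, m).
for i in range(1, m):
    R_i, T_i = Pis[i][:, :3], Pis[i][:, 3]
    e = xs[i] @ hat(T_i) @ R_i @ xs[0]
    print(f"epipolar residual, views (1, {i+1}): {e:.2e}")     # expected: ~0

# One trilinear constraint between view 1 and views i, j (note T x1^T R^T = outer(T, R x1)).
i, j = 1, 2
Ri, Ti = Pis[i][:, :3], Pis[i][:, 3]
Rj, Tj = Pis[j][:, :3], Pis[j][:, 3]
tri = hat(xs[i]) @ (np.outer(Ti, Rj @ xs[0]) - np.outer(Ri @ xs[0], Tj)) @ hat(xs[j])
print("max |trilinear residual| =", np.abs(tri).max())         # expected: ~0
```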
From the equations

\[
\hat{x}_i R_i x_1 (\hat{x}_j T_j)^\top - \hat{x}_i T_i (\hat{x}_j R_j x_1)^\top = 0 \quad \forall\, i, j,
\]

we see that as long as the entries of \(\hat{x}_j T_j\) and \(\hat{x}_j R_j x_1\) are nonzero, the two vectors \(\hat{x}_i R_i x_1\) and \(\hat{x}_i T_i\) are linearly dependent. If, on the other hand, \(\hat{x}_j T_j = \hat{x}_j R_j x_1 = 0\) for some view j, then we have the rare degenerate case that the point p lies on the line through the optical centers \(o_1\) and \(o_j\).

In other words: except for degeneracies, the bilinear (epipolar) constraints relating two views are already contained in the trilinear constraints obtained for the multiview scenario.

Note that the equivalence between the bilinear and trilinear constraints on one hand and the condition rank(M_p) ≤ 1 on the other only holds if the vectors in \(M_p\) are nonzero. In certain degenerate cases this is not fulfilled.

Uniqueness of the Preimage

We will now clarify how the bilinear and trilinear constraints help to assure the uniqueness of the preimage of a point observed in three images. Let \(x_1, x_2, x_3 \in \mathbb{R}^3\) be the 2D point coordinates in three camera frames with distinct optical centers. If the three images satisfy the pairwise epipolar constraints

\[
x_i^\top \hat{T}_{ij} R_{ij} x_j = 0, \quad i, j = 1, 2, 3,
\]

then a unique preimage is determined, except if the three lines associated with the image points \(x_1, x_2, x_3\) are coplanar. Here \(T_{ij}\) and \(R_{ij}\) refer to the transformation between frames i and j. Similarly, if these vectors satisfy all trilinear constraints

\[
\hat{x}_j \left( T_{ji} x_i^\top R_{ki}^\top - R_{ji} x_i T_{ki}^\top \right) \hat{x}_k = 0, \quad i, j, k = 1, 2, 3,
\]

then a unique preimage is determined, unless the three lines associated with the image points \(x_1, x_2, x_3\) are collinear. We will not prove these statements.

Degeneracies for the Bilinear Constraints

[Figure: degenerate configuration in which the point p lies in the trifocal plane.] In this example, the point p lies in the plane spanned by the three optical centers, which is also called the trifocal plane. In this case, all pairs of lines do intersect, yet this does not imply a unique 3D point p (a unique preimage). In practice this degenerate case arises rather seldom.
[Figure: degenerate configuration with collinear camera centers.] In this example, the optical centers lie on a straight line (rectilinear motion). Again, all pairs of lines may intersect without there being a unique preimage p. This case is frequent in applications where the camera moves along a straight line (e.g. a car on a highway); the epipolar constraints then do not allow a unique reconstruction. Fortunately, the trilinear constraint assures a unique preimage (unless p also lies on the same line as the optical centers).

Uniqueness of the Preimage

Using the multiple-view matrix we obtain a more general and simpler characterization of the uniqueness of the preimage: given m vectors representing the m images of a point in m views, they correspond to the same point in 3D space if the rank of the matrix \(M_p\) relative to any of the camera frames is one. If the rank is zero, the point is determined only up to the line on which all the camera centers must lie. In summary:

rank(M_p) = 2 ⇒ no point correspondence & empty preimage
rank(M_p) = 1 ⇒ point correspondence & unique preimage
rank(M_p) = 0 ⇒ point correspondence & preimage not unique

With these constraints we can decide which features to match when establishing point correspondence over multiple frames.

Multiple-view Factorization of Point Features

The rank condition on the multiple-view matrix captures all the constraints among multiple images of a point. In principle, one could perform reconstruction by maximizing some global objective function subject to the rank condition. This would lead to a nonlinear optimization problem analogous to bundle adjustment in the two-view case.

Alternatively, one can aim for a similar separation of structure and motion as done for the two-view case in the eight-point algorithm. Such an algorithm is detailed in the following. One should point out that this approach does not necessarily lead to a practical algorithm, as spectral approaches do not imply optimality in the presence of noise and uncertainty.

Suppose we have m images \(x_1^j, \ldots, x_m^j\) of n points \(p^j\), and we want to estimate the unknown projection matrix Π. The condition rank(M_p) ≤ 1 states that the two columns of \(M_p\) are linearly dependent.
For the j-th point \(p^j\) this implies

\[
\begin{pmatrix}
\hat{x}_2^j R_2 x_1^j\\
\hat{x}_3^j R_3 x_1^j\\
\vdots\\
\hat{x}_m^j R_m x_1^j
\end{pmatrix}
+ \alpha^j
\begin{pmatrix}
\hat{x}_2^j T_2\\
\hat{x}_3^j T_3\\
\vdots\\
\hat{x}_m^j T_m
\end{pmatrix}
= 0 \;\in \mathbb{R}^{3(m-1)},
\]

for some parameter \(\alpha^j \in \mathbb{R}\), j = 1, ..., n. Each row of this equation can be obtained from \(\lambda_i^j x_i^j = \lambda_1^j R_i x_1^j + T_i\) by multiplying with \(\hat{x}_i^j\):

\[
\hat{x}_i^j R_i x_1^j + \hat{x}_i^j T_i / \lambda_1^j = 0.
\]

Therefore \(\alpha^j = 1/\lambda_1^j\) is nothing but the inverse of the depth of point \(p^j\) with respect to the first frame.

Motion Estimation from Known Structure

Assume we know the depths of the points and thus their inverses \(\alpha^j\) (i.e. known structure). Then the above equation is linear in the camera motion parameters \(R_i\) and \(T_i\). Using the stack notation \(R_i^s = (r_{11}, r_{21}, r_{31}, r_{12}, r_{22}, r_{32}, r_{13}, r_{23}, r_{33})^\top \in \mathbb{R}^9\) and \(T_i \in \mathbb{R}^3\), we obtain the linear equation system

\[
P_i \begin{pmatrix} R_i^s \\ T_i \end{pmatrix} \equiv
\begin{pmatrix}
x_1^{1\top} \otimes \hat{x}_i^1 & \alpha^1 \hat{x}_i^1\\
x_1^{2\top} \otimes \hat{x}_i^2 & \alpha^2 \hat{x}_i^2\\
\vdots & \vdots\\
x_1^{n\top} \otimes \hat{x}_i^n & \alpha^n \hat{x}_i^n
\end{pmatrix}
\begin{pmatrix} R_i^s \\ T_i \end{pmatrix} = 0 \;\in \mathbb{R}^{3n}.
\]

One can show that the matrix \(P_i \in \mathbb{R}^{3n\times 12}\) has rank 11 if more than n = 6 points in general position are given. In that case the null space of \(P_i\) is one-dimensional, and the projection matrix \(\Pi_i = (R_i, T_i)\) is given up to a scale factor. In practice one would use more than 6 points, obtain a full-rank matrix, and compute the solution by a singular value decomposition (SVD).

Structure Estimation from Known Motion

In turn, if the camera motion \(\Pi_i = (R_i, T_i)\), i = 1, ..., m, is known, we can estimate the structure (the inverse depth parameters \(\alpha^j\), j = 1, ..., n). The least-squares solution of the above equation is given by

\[
\alpha^j = -\frac{\sum_{i=2}^m (\hat{x}_i^j T_i)^\top \hat{x}_i^j R_i x_1^j}{\sum_{i=2}^m \| \hat{x}_i^j T_i \|^2}, \quad j = 1, \ldots, n.
\]

In this way one can iteratively estimate structure and motion, estimating one while keeping the other fixed. For initialization, one could apply the eight-point algorithm to the first two images to obtain an estimate of the structure parameters \(\alpha^j\). While the equation for \(\Pi_i\) makes use of frames 1 and i only, the structure parameter estimation takes all frames into account. This can be done either in batch mode or recursively. As in the two-view case, such spectral approaches do not guarantee optimality in the presence of noise and uncertainty.
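A minimal sketch of the two estimation steps, under the same synthetic conventions as before (hat and NumPy from the earlier snippets; the function names are hypothetical): estimate_alphas implements the least-squares formula for \(\alpha^j\), and estimate_motion recovers \((R_i, T_i)\) up to scale from the null space of \(P_i\) via SVD.

```python
def estimate_alphas(x1s, x_views, Rs, Ts):
    """Inverse depths alpha^j from known motion via the least-squares formula above.
    x1s[j]: image x_1^j in the first frame; x_views[i][j]: x_i^j for views i = 2..m."""
    alphas = []
    for j, x1 in enumerate(x1s):
        num = den = 0.0
        for xv, R, T in zip(x_views, Rs, Ts):
            hT = hat(xv[j]) @ T
            num += hT @ (hat(xv[j]) @ R @ x1)
            den += hT @ hT
        alphas.append(-num / den)
    return np.array(alphas)

def estimate_motion(x1s, xi_s, alphas):
    """(R_i, T_i) of view i up to scale from known inverse depths: stack rows
    (x_1^{jT} (x) hat(x_i^j), alpha^j hat(x_i^j)) into P_i and take its null vector."""
    rows = [np.hstack([np.kron(x1[None, :], hat(xi)), a * hat(xi)])
            for x1, xi, a in zip(x1s, xi_s, alphas)]
    P = np.vstack(rows)                        # shape (3n, 12), rank 11 for n > 6
    v = np.linalg.svd(P)[2][-1]                # right singular vector of smallest value
    return v[:9].reshape(3, 3, order="F"), v[9:]   # column-major stack R_i^s, then T_i
```

Iterating these two steps, initialized for example by a two-view estimate, yields the factorization scheme described above.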
Multiple-view Matrix for Lines

The matrix

\[
W_l = \begin{pmatrix} \ell_1^\top \Pi_1 \\ \ell_2^\top \Pi_2 \\ \vdots \\ \ell_m^\top \Pi_m \end{pmatrix} \in \mathbb{R}^{m\times 4}
\]

associated with m images of a line in space satisfies the rank constraint rank(W_l) ≤ 2, because \(W_l X_0 = W_l V = 0\) for the base point \(X_0\) and the direction V of the line. To find a more compact representation, let us again assume that the first camera is in world coordinates, i.e. \(\Pi_1 = (I, 0)\). The rank is not affected by multiplying with a full-rank matrix \(D_l \in \mathbb{R}^{4\times 5}\):

\[
W_l D_l =
\begin{pmatrix}
\ell_1^\top & 0\\
\ell_2^\top R_2 & \ell_2^\top T_2\\
\vdots & \vdots\\
\ell_m^\top R_m & \ell_m^\top T_m
\end{pmatrix}
\begin{pmatrix} \ell_1 & \hat{\ell}_1 & 0\\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix}
\ell_1^\top \ell_1 & 0 & 0\\
\ell_2^\top R_2 \ell_1 & \ell_2^\top R_2 \hat{\ell}_1 & \ell_2^\top T_2\\
\vdots & \vdots & \vdots\\
\ell_m^\top R_m \ell_1 & \ell_m^\top R_m \hat{\ell}_1 & \ell_m^\top T_m
\end{pmatrix}.
\]

Since multiplication with a full-rank matrix does not affect the rank, we have rank(W_l D_l) = rank(W_l) ≤ 2. Since the first column of \(W_l D_l\) is linearly independent of the remaining ones, the submatrix

\[
M_l = \begin{pmatrix}
\ell_2^\top R_2 \hat{\ell}_1 & \ell_2^\top T_2\\
\vdots & \vdots\\
\ell_m^\top R_m \hat{\ell}_1 & \ell_m^\top T_m
\end{pmatrix} \in \mathbb{R}^{(m-1)\times 4}
\]

must satisfy the rank constraint rank(M_l) ≤ 1.

For a line projected into m images we therefore have a much stronger rank constraint than for a projected point: for a sufficiently large number of views m, the matrix \(M_l\) could in principle have rank four. The above constraint states that a meaningful preimage of m observed lines can only exist if rank(M_l) ≤ 1. A numerical illustration follows after the discussion of the trilinear constraints below.

Trilinear Constraints for a Line

Again, we can take a closer look at the meaning of the above rank constraint. Applied to the first three columns of \(M_l\), it implies that the respective row vectors must be pairwise linearly dependent, i.e. for all i, j ≠ 1:

\[
\ell_i^\top R_i \hat{\ell}_1 \;\sim\; \ell_j^\top R_j \hat{\ell}_1,
\]

which is equivalent to the trilinear equation

\[
\ell_i^\top R_i \hat{\ell}_1 R_j^\top \ell_j = 0.
\]

Proof: the above proportionality states that the three vectors \(R_i^\top \ell_i\), \(R_j^\top \ell_j\) and \(\ell_1\) are coplanar. The equation is the equivalent statement that the vector \(R_i^\top \ell_i\) is orthogonal to the normal of the plane spanned by \(R_j^\top \ell_j\) and \(\ell_1\).

Interestingly, this constraint only involves the camera rotations, not the camera translations.

Taking into account the fourth column of the multiple-view matrix \(M_l\), the rank constraint also implies the linear dependence of the i-th and j-th rows. This is equivalent to the trilinear constraint

\[
\ell_j^\top T_j\; \ell_i^\top R_i \hat{\ell}_1 - \ell_i^\top T_i\; \ell_j^\top R_j \hat{\ell}_1 = 0.
\]

The proof follows from the general lemma on rank-deficient two-column matrices stated above. This constraint relates the first, the i-th and the j-th image. As discussed before, all nontrivial constraints in the case of lines involve at least three images.

The two trilinear constraints above are equivalent to the rank constraint provided the scalar \(\ell_i^\top T_i \neq 0\), i.e. in non-degenerate cases. In general, rank(M_l) ≤ 1 if and only if all of its 2 × 2 minors (dt.: Untermatrizen) have zero determinant. Since these minors involve only three images at a time, one can conclude that any multiview constraint on lines can be reduced to constraints which only involve three lines at a time.
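As the numerical illustration announced above (reusing Pis, m and hat from the point-feature sketches; the synthetic 3D line is arbitrary), one can build M_l from coimages and confirm the rank constraint:

```python
# Synthetic 3D line: base point X0 and direction V (homogeneous coordinates).
X0 = np.array([0.0, 0.0, 5.0, 1.0])
V = np.array([1.0, 0.5, 0.2, 0.0])

# Coimage ell_i: image line through the projections of two points on the line.
ells = []
for Pi in Pis:
    p0, p1 = Pi @ X0, Pi @ (X0 + V)
    ells.append(np.cross(p0, p1))

# M_l rows: (ell_i^T R_i hat(ell_1), ell_i^T T_i), shape (m-1, 4).
Ml = np.vstack([np.hstack([ells[i] @ Pis[i][:, :3] @ hat(ells[0]),
                           ells[i] @ Pis[i][:, 3]])
                for i in range(1, m)])
print("rank(M_l) =", np.linalg.matrix_rank(Ml, tol=1e-10))   # expected: 1
```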
Uniqueness of the Preimage

The key idea of the rank constraint on the multiple-view matrix \(M_l\) is to assure that m observations of a line correspond to a consistent preimage L. The uniqueness of the preimage in the case of the trilinear constraints can be characterized as follows.

Lemma: Given three camera frames with distinct optical centers and any three vectors \(\ell_1, \ell_2, \ell_3 \in \mathbb{R}^3\) representing three image lines. If the three image lines satisfy the trilinear constraints

\[
\ell_j^\top T_{ji}\; \ell_k^\top R_{ki} \hat{\ell}_i - \ell_k^\top T_{ki}\; \ell_j^\top R_{ji} \hat{\ell}_i = 0, \quad i, j, k \in \{1, 2, 3\},
\]

then their preimage L is uniquely determined, except for the case in which the preimage of every \(\ell_i\) is the same plane in space. This is the only degenerate case, and in this case the matrix \(M_l\) becomes zero. Note that the above constraint combines the two trilinear constraints introduced on the previous slides.

[Figure: No preimage. The lines L2 and L3 do not coincide.]

[Figure: Uniqueness of the preimage. The lines L2 and L3 coincide.]

A similar statement can be made regarding the uniqueness of the preimage of m lines in relation to the rank of the multiple-view matrix \(M_l\).

Theorem: Given m vectors \(\ell_i \in \mathbb{R}^3\) representing images of lines with respect to m camera frames, they correspond to the same line in space if the rank of the matrix \(M_l\) relative to any of the camera frames is 1. If its rank is 0 (i.e. the matrix \(M_l\) itself is zero), then the line is determined only up to a plane on which all the camera centers must lie.

Overall we have the following cases:

rank(M_l) = 2 ⇒ no line correspondence
rank(M_l) = 1 ⇒ line correspondence & unique preimage
rank(M_l) = 0 ⇒ line correspondence & preimage not unique
Summary

One can generalize the two-view scenario to that of simultaneously considering m ≥ 2 images of a scene. The intrinsic constraints among multiple images of a point or a line can be expressed in terms of rank conditions on the matrices N, W or M. The relationship among these rank conditions is as follows:

            (Pre)image             Coimage           Jointly
  Point     rank(N_p) ≤ m + 3      rank(W_p) ≤ 3     rank(M_p) ≤ 1
  Line      rank(N_l) ≤ 2m + 2     rank(W_l) ≤ 2     rank(M_l) ≤ 1

These rank conditions capture the relationships among corresponding geometric primitives in multiple images. They impose the existence of unique preimages (up to degenerate cases). Moreover, they give rise to natural factorization-based algorithms for multiview recovery of 3D structure and motion (i.e. generalizations of the eight-point algorithm).

Chapter 7
Bundle Adjustment & Nonlinear Optimization
Multiple View Geometry, Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics, Technical University of Munich

Overview

1 Optimality in Noisy Real World Conditions
2 Bundle Adjustment
3 Nonlinear Optimization
4 Gradient Descent
5 Least Squares Estimation
6 Newton Methods
7 The Gauss-Newton Algorithm
8 The Levenberg-Marquardt Algorithm
9 Summary
10 Example Applications

Optimality in Noisy Real World Conditions

In the previous chapters we discussed linear approaches to solve the structure and motion problem. In particular, the eight-point algorithm provides closed-form solutions to estimate the camera parameters and the 3D structure, based on singular value decomposition. However, if we have noisy data \(\tilde{x}_1, \tilde{x}_2\) (correspondences not exact or even incorrect), then we have no guarantee

• that R and T are as close as possible to the true solution,
• that we will get a consistent reconstruction.
Statistical Approaches to Cope with Noise

The linear approaches are elegant because optimal solutions to the respective problems can be computed in closed form. However, they often fail when dealing with noisy and imprecise point locations. Since measurement noise is not explicitly considered or modeled, such spectral methods often provide suboptimal performance in noisy real-world conditions.

In order to take noise and statistical fluctuations into account, one can revert to a Bayesian formulation and determine the most likely camera transformation R, T and 'true' 2D coordinates x given the measured coordinates \(\tilde{x}\), by performing a maximum a posteriori estimate:

\[
\arg\max_{x, R, T} P(x, R, T \mid \tilde{x}) = \arg\max_{x, R, T} P(\tilde{x} \mid x, R, T)\, P(x, R, T).
\]

This approach will, however, involve modeling probability densities P on the fairly complicated space SO(3) × S² of rotation and translation parameters, as R ∈ SO(3) and T ∈ S² (3D translation with unit length).

Bundle Adjustment and Nonlinear Optimization

Under the assumption that the observed 2D point coordinates \(\tilde{x}\) are corrupted by zero-mean Gaussian noise, maximum likelihood estimation leads to bundle adjustment:

\[
E(R, T, X_1, \ldots, X_N) = \sum_{j=1}^N \left\| \tilde{x}_1^j - \pi(X_j) \right\|^2 + \left\| \tilde{x}_2^j - \pi(R, T, X_j) \right\|^2.
\]

It aims at minimizing the reprojection error between the observed 2D coordinates \(\tilde{x}_i^j\) and the projected 3D coordinates \(X_j\) (w.r.t. camera 1). Here \(\pi(R, T, X_j)\) denotes the perspective projection of \(X_j\) after rotation and translation.

For the general case of m images, we get

\[
E\big(\{R_i, T_i\}_{i=1..m}, \{X_j\}_{j=1..N}\big) = \sum_{i=1}^m \sum_{j=1}^N \theta_{ij} \left\| \tilde{x}_i^j - \pi(R_i, T_i, X_j) \right\|^2,
\]

with \(T_1 = 0\) and \(R_1 = I\), where \(\theta_{ij} = 1\) if point j is visible in image i and \(\theta_{ij} = 0\) otherwise. The above problems are non-convex.

Different Parameterizations of the Problem

The same optimization problem can be parameterized differently. For example, we can introduce \(x_1^j\) to denote the true 2D coordinate associated with the measured coordinate \(\tilde{x}_1^j\):

\[
E\big(\{x_1^j, \lambda_1^j\}_{j=1..N}, R, T\big) = \sum_{j=1}^N \left\| x_1^j - \tilde{x}_1^j \right\|^2 + \left\| \tilde{x}_2^j - \pi(R \lambda_1^j x_1^j + T) \right\|^2.
\]

Alternatively, we can perform a constrained optimization by minimizing a cost function (similarity to measurements)

\[
E\big(\{x_i^j\}_{j=1..N}, R, T\big) = \sum_{j=1}^N \sum_{i=1}^2 \left\| x_i^j - \tilde{x}_i^j \right\|^2,
\]

subject to (consistent geometry)

\[
x_2^{j\top} \hat{T} R\, x_1^j = 0, \quad x_1^{j\top} e_3 = 1, \quad x_2^{j\top} e_3 = 1, \quad j = 1, \ldots, N.
\]
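To make the cost function concrete, here is a minimal sketch of evaluating the bundle adjustment energy; pi_proj, ba_cost, obs and vis are hypothetical names, and a real implementation would minimize this energy with the iterative methods discussed below:

```python
import numpy as np

def pi_proj(R, T, X):
    """Perspective projection pi(R, T, X): rotate and translate, then divide by depth."""
    p = R @ X + T
    return p[:2] / p[2]

def ba_cost(Rs, Ts, Xs, obs, vis):
    """Reprojection error sum_ij theta_ij |x~_i^j - pi(R_i, T_i, X_j)|^2.
    obs[i][j]: measured 2D point of landmark j in image i; vis[i][j]: visibility flag."""
    E = 0.0
    for i, (R, T) in enumerate(zip(Rs, Ts)):
        for j, X in enumerate(Xs):
            if vis[i][j]:
                r = obs[i][j] - pi_proj(R, T, X)
                E += r @ r
    return E
```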
Some Comments on Bundle Adjustment

Bundle adjustment aims at jointly estimating the 3D coordinates of points and the camera parameters, typically the rigid-body motion, but sometimes also intrinsic calibration parameters or radial distortion. Different models of the noise in the observed 2D points lead to different cost functions, zero-mean Gaussian noise being the most common assumption.

The approach is called bundle adjustment (dt.: Bündelausgleich) because it aims at adjusting the bundles of light rays emitted from the 3D points. Originally derived in the field of photogrammetry in the 1950s, it is now used frequently in computer vision. A good overview can be found in: Triggs, McLauchlan, Hartley, Fitzgibbon, "Bundle Adjustment – A Modern Synthesis", ICCV Workshop 1999.

Typically it is used as the last step in a reconstruction pipeline, because the minimization of this highly non-convex cost function requires a good initialization. The minimization of non-convex energies is a challenging problem. Bundle adjustment type cost functions are typically minimized by nonlinear least squares algorithms.

Nonlinear Programming

Nonlinear programming denotes the process of iteratively solving a nonlinear optimization problem, i.e. a problem involving the maximization or minimization of an objective function over a set of real variables under a set of equality or inequality constraints. There are numerous methods and techniques. Good overviews of the respective methods can be found, for example, in Bertsekas (1999), "Nonlinear Programming"; Nocedal & Wright (1999), "Numerical Optimization"; or Luenberger & Ye (2008), "Linear and Nonlinear Programming".

Depending on the cost function, different algorithms are employed. In the following, we will discuss (nonlinear) least squares estimation and several popular iterative techniques for nonlinear optimization:

• gradient descent,
• Newton methods,
• the Gauss-Newton algorithm,
• the Levenberg-Marquardt algorithm.

Gradient Descent

Gradient descent or steepest descent is a first-order optimization method. It aims at computing a local minimum of a (generally) non-convex cost function by iteratively stepping in the direction in which the energy decreases most, given by the negative energy gradient. To minimize a real-valued cost E : ℝⁿ → ℝ, the gradient flow for E(x) is defined by the differential equation

\[
\begin{cases}
x(0) = x_0,\\[2pt]
\dfrac{dx}{dt} = -\dfrac{dE}{dx}(x).
\end{cases}
\]

Discretization with step size ε gives

\[
x_{k+1} = x_k - \epsilon\, \frac{dE}{dx}(x_k), \quad k = 0, 1, 2, \ldots
\]
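A minimal sketch of the discretized gradient flow (the quadratic example and the fixed step size eps are illustrative choices):

```python
import numpy as np

def gradient_descent(grad, x0, eps=0.1, iters=100):
    """Gradient descent: x_{k+1} = x_k - eps * dE/dx(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - eps * grad(x)
    return x

# Example: E(x) = 0.5 x^T A x - b^T x with gradient A x - b (minimum at A^{-1} b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_min = gradient_descent(lambda x: A @ x - b, x0=np.zeros(2), eps=0.2, iters=200)
print(x_min, np.linalg.solve(A, b))   # both close to the true minimizer
```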
Under certain conditions on E(x), the gradient descent iteration converges to a local minimum. For the case of convex E, this will also be the global minimum. The step size can be chosen differently in each iteration.

Gradient descent is a popular and broadly applicable method. It is typically not the fastest way to compute minimizers, because its asymptotic convergence rate is often inferior to that of more specialized algorithms. First-order methods with optimal convergence rates were pioneered by Yuri Nesterov. In particular, highly anisotropic cost functions (with strongly different curvatures in different directions) require many iterations, and trajectories tend to zig-zag. Locally optimal step sizes for each iteration can be computed by line search. For specific cost functions, alternative techniques such as the conjugate gradient method, Newton methods, or the BFGS method are preferable.

Linear Least Squares Estimation

Ordinary least squares or linear least squares is a method for estimating a set of parameters x ∈ ℝᵈ in a linear regression model. Assume that for each input vector \(b_i \in \mathbb{R}^d\), i ∈ {1, ..., n}, we observe a scalar response \(a_i \in \mathbb{R}\), and that there is a linear relationship of the form

\[
a_i = b_i^\top x + \eta_i
\]

with an unknown vector x ∈ ℝᵈ and zero-mean Gaussian noise η ∼ N(0, Σ) with a diagonal covariance matrix of the form Σ = σ² Iₙ. Maximum likelihood estimation of x leads to the ordinary least squares problem

\[
\min_x \sum_i (a_i - x^\top b_i)^2 = (a - Bx)^\top (a - Bx).
\]

Linear least squares estimation was introduced by Legendre (1805) and Gauss (1795/1809). When asking for which noise distribution the optimal estimator is the arithmetic mean, Gauss invented the normal distribution.

For general Σ, we get the generalized least squares problem

\[
\min_x\; (a - Bx)^\top \Sigma^{-1} (a - Bx).
\]

This is a quadratic cost function with positive definite \(\Sigma^{-1}\). It has the closed-form solution

\[
\hat{x} = \arg\min_x\; (a - Bx)^\top \Sigma^{-1} (a - Bx) = (B^\top \Sigma^{-1} B)^{-1} B^\top \Sigma^{-1} a.
\]

If there is no correlation among the observed variables, the matrix Σ is diagonal. This case is referred to as weighted least squares:

\[
\min_x \sum_i w_i (a_i - x^\top b_i)^2, \quad \text{with } w_i = \sigma_i^{-2}.
\]

For the case of an unknown matrix Σ, there exist iterative estimation algorithms such as feasible generalized least squares or iteratively reweighted least squares.
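A minimal sketch of the closed-form solution (the synthetic data and all names are illustrative); the diagonal case reproduces weighted least squares:

```python
import numpy as np

def generalized_least_squares(B, a, Sigma):
    """Closed-form GLS estimate x^ = (B^T Sigma^-1 B)^-1 B^T Sigma^-1 a."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(B.T @ Si @ B, B.T @ Si @ a)

# Weighted least squares as the diagonal special case, w_i = sigma_i^{-2}.
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
sigmas = rng.uniform(0.01, 0.2, size=50)
a = B @ x_true + sigmas * rng.standard_normal(50)
print(generalized_least_squares(B, a, np.diag(sigmas**2)))   # close to x_true
```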
Iteratively Reweighted Least Squares

The method of iteratively reweighted least squares aims at minimizing generally non-convex optimization problems of the form

\[
\min_x \sum_i w_i(x)\, |a_i - f_i(x)|^2,
\]

with some known weighting function \(w_i(x)\). A solution is obtained by iterating the following problem:

\[
x_{t+1} = \arg\min_x \sum_i w_i(x_t)\, |a_i - f_i(x)|^2.
\]

For the case that \(f_i\) is linear, i.e. \(f_i(x) = x^\top b_i\), each subproblem

\[
x_{t+1} = \arg\min_x \sum_i w_i(x_t)\, |a_i - x^\top b_i|^2
\]

is simply a weighted least squares problem that can be solved in closed form. Nevertheless, this iterative approach will generally not converge to a global minimum of the original (non-convex) problem.

Nonlinear Least Squares Estimation

Nonlinear least squares estimation aims at fitting observations \((a_i, b_i)\) with a nonlinear model of the form \(a_i \approx f(b_i, x)\) for some function f parameterized by an unknown vector x ∈ ℝᵈ. Minimizing the sum-of-squares error

\[
\min_x \sum_i r_i(x)^2, \quad \text{with } r_i(x) = a_i - f(b_i, x),
\]

is generally a non-convex optimization problem. The optimality condition is

\[
\sum_i r_i \frac{\partial r_i}{\partial x_j} = 0, \quad \forall\, j \in \{1, \ldots, d\}.
\]

Typically one cannot solve these equations directly. Yet there exist iterative algorithms for computing approximate solutions, including Newton methods, the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm.

Newton Methods for Optimization

Newton methods are second-order methods: in contrast to first-order methods like gradient descent, they also make use of second derivatives. Geometrically, the Newton method iteratively approximates the cost function E(x) quadratically and takes a step to the minimizer of this approximation. Let \(x_t\) be the estimated solution after t iterations. Then the Taylor approximation of E(x) in the vicinity of this estimate is

\[
E(x) \approx E(x_t) + g^\top (x - x_t) + \frac{1}{2} (x - x_t)^\top H (x - x_t),
\]

where the first and second derivatives are denoted by the gradient \(g = dE/dx(x_t)\) and the Hessian matrix \(H = d^2E/dx^2(x_t)\). For this second-order approximation, the optimality condition is

\[
\frac{dE}{dx} = g + H(x - x_t) = 0. \qquad (*)
\]

Setting the next iterate to the minimizer x leads to

\[
x_{t+1} = x_t - H^{-1} g.
\]

In practice, one often chooses a more conservative step size γ ∈ (0, 1):

\[
x_{t+1} = x_t - \gamma H^{-1} g.
\]

When applicable, second-order methods are often faster than first-order methods, at least when measured in the number of iterations.
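A minimal sketch of the damped Newton iteration on a small example (the cost function and its hand-coded derivatives are illustrative):

```python
import numpy as np

def newton_minimize(grad, hess, x0, gamma=1.0, iters=20):
    """Damped Newton iteration x_{t+1} = x_t - gamma * H^{-1} g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - gamma * np.linalg.solve(hess(x), grad(x))
    return x

# Example: E(x) = (x0 - 1)^2 + 10 (x1 - x0^2)^2 with hand-derived gradient and Hessian.
grad = lambda x: np.array([2*(x[0] - 1) - 40*x[0]*(x[1] - x[0]**2),
                           20*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2 - 40*(x[1] - 3*x[0]**2), -40*x[0]],
                           [-40*x[0], 20.0]])
print(newton_minimize(grad, hess, x0=[0.0, 0.0]))   # converges to (1, 1)
```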
In particular, there exists a local neighborhood around each optimum in which the Newton method converges quadratically for γ = 1 (if the Hessian is invertible and Lipschitz continuous).

For large optimization problems, computing and inverting the Hessian may be challenging. Moreover, since this problem is often not parallelizable, some second-order methods do not profit from GPU acceleration. In such cases, one can aim to iteratively solve the optimality condition (∗). In case H is not positive definite, there exist quasi-Newton methods which aim at approximating H or \(H^{-1}\) with a positive definite matrix.

The Gauss-Newton Algorithm

The Gauss-Newton algorithm is a method to solve nonlinear least squares problems of the form

\[
\min_x \sum_i r_i(x)^2.
\]

It can be derived as an approximation to the Newton method. The latter iterates \(x_{t+1} = x_t - H^{-1} g\), with the gradient g given by

\[
g_j = 2 \sum_i r_i \frac{\partial r_i}{\partial x_j},
\]

and the Hessian H given by

\[
H_{jk} = 2 \sum_i \left( \frac{\partial r_i}{\partial x_j} \frac{\partial r_i}{\partial x_k} + r_i \frac{\partial^2 r_i}{\partial x_j\, \partial x_k} \right).
\]

Dropping the second-order term leads to the approximation

\[
H_{jk} \approx 2 \sum_i J_{ij} J_{ik}, \quad \text{with } J_{ij} = \frac{\partial r_i}{\partial x_j}.
\]

The approximation \(H \approx 2 J^\top J\) with the Jacobian \(J = \frac{dr}{dx}\), together with \(g = 2 J^\top r\), leads to the Gauss-Newton algorithm:

\[
x_{t+1} = x_t + \Delta, \quad \text{with } \Delta = -(J^\top J)^{-1} J^\top r.
\]

In contrast to the Newton algorithm, the Gauss-Newton algorithm does not require the computation of second derivatives. Moreover, the above approximation of the Hessian is by construction positive definite. This approximation of the Hessian is valid if

\[
\left| r_i \frac{\partial^2 r_i}{\partial x_j\, \partial x_k} \right| \ll \left| \frac{\partial r_i}{\partial x_j} \frac{\partial r_i}{\partial x_k} \right|,
\]

which is the case if the residuum \(r_i\) is small or if it is close to linear (in which case the second derivatives are small).

The Levenberg-Marquardt Algorithm

The Newton iteration \(x_{t+1} = x_t - H^{-1} g\) can be modified (damped),

\[
x_{t+1} = x_t - \left( H + \lambda I_n \right)^{-1} g,
\]

to create a hybrid between the Newton method (λ = 0) and gradient descent with step size 1/λ (for λ → ∞). In the same manner, Levenberg (1944) suggested damping the Gauss-Newton algorithm for nonlinear least squares:

\[
x_{t+1} = x_t + \Delta, \quad \text{with } \Delta = -\left( J^\top J + \lambda I_n \right)^{-1} J^\top r.
\]

Marquardt (1963) suggested a more adaptive, component-wise damping of the form

\[
\Delta = -\left( J^\top J + \lambda\, \mathrm{diag}(J^\top J) \right)^{-1} J^\top r,
\]

which avoids slow convergence in directions of small gradient.
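A compact sketch combining the Gauss-Newton step with Marquardt's damping (the greedy λ adaptation and the exponential-fit example are illustrative simplifications of practical implementations):

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, iters=50):
    """LM iteration with diag(J^T J) damping; lam is adapted by step acceptance."""
    x = np.asarray(x0, dtype=float)
    cost = lambda x: np.sum(residual(x)**2)
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        A = J.T @ J
        delta = -np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        if cost(x + delta) < cost(x):   # accept step, trust the model more
            x, lam = x + delta, lam * 0.5
        else:                           # reject step, damp more (toward gradient descent)
            lam *= 10.0
    return x

# Example: fit a * exp(b * t) to noisy data; unknowns x = (a, b).
t = np.linspace(0, 1, 30)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.default_rng(2).standard_normal(30)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(levenberg_marquardt(res, jac, x0=[1.0, 1.0]))   # approximately (2.0, 1.5)
```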
Summary

Bundle adjustment was pioneered in the 1950s as a technique for structure and motion estimation in noisy real-world conditions. It aims at estimating the locations of N 3D points \(X_j\) and the camera motions \((R_i, T_i)\), given noisy 2D projections \(\tilde{x}_i^j\) in m images. The assumption of zero-mean Gaussian noise on the 2D observations leads to the weighted nonlinear least squares problem

\[
E\big(\{R_i, T_i\}_{i=1..m}, \{X_j\}_{j=1..N}\big) = \sum_{i=1}^m \sum_{j=1}^N \theta_{ij} \left\| \tilde{x}_i^j - \pi(R_i, T_i, X_j) \right\|^2,
\]

with \(\theta_{ij} = 1\) if point j is visible in image i and \(\theta_{ij} = 0\) otherwise.

Solutions of this non-convex problem can be computed by various iterative algorithms, most importantly the Gauss-Newton algorithm or its damped version, the Levenberg-Marquardt algorithm. Bundle adjustment is typically initialized by an algorithm such as the eight-point or five-point algorithm.

Example I: From Internet Photo Collections...

[Figure: Flickr images for the search term "Notre Dame". Snavely, Seitz, Szeliski, "Modeling the world from Internet photo collections," IJCV 2008.]

...to Sparse Reconstructions

[Figure: sparse 3D reconstruction computed from the photo collection. Snavely, Seitz, Szeliski, "Modeling the world from Internet photo collections," IJCV 2008.]

Example II: Realtime Structure and Motion

[Figure: realtime tracking and mapping. Klein & Murray, "Parallel Tracking and Mapping (PTAM) for Small AR Workspaces," ISMAR 2007.]

Chapter 8
Direct Approaches to Visual SLAM
Multiple View Geometry, Summer 2019
Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence
Departments of Informatics & Mathematics, Technical University of Munich
Overview

1 Direct Methods
2 Realtime Dense Geometry
3 Dense RGB-D Tracking
4 Loop Closure and Global Consistency
5 Dense Tracking and Mapping
6 Large Scale Direct Monocular SLAM
7 Direct Sparse Odometry

Classical Approaches to Multiple View Reconstruction

In the past chapters we have studied classical approaches to multiple view reconstruction. These methods tackle the problem of structure and motion estimation (or visual SLAM) in several steps:

1 A set of feature points is extracted from the images, ideally points such as corners which can be reliably identified in subsequent images as well.
2 One determines a correspondence of these points across the various images. This can be done either through local tracking (using optical flow approaches) or by random sampling of possible partners based on a feature descriptor (SIFT, SURF, etc.) associated with each point.
3 The camera motion is estimated based on a set of corresponding points. In many approaches this is done by a series of algorithms such as the eight-point or five-point algorithm, followed by bundle adjustment.
4 For a given camera motion, one can then compute a dense reconstruction using stereo reconstruction methods.

Shortcomings of Classical Approaches

Such classical approaches are indirect in the sense that they do not compute structure and motion directly from the images, but rather from a sparse set of precomputed feature points. Despite a number of successes, they have several drawbacks:

• From the point of view of statistical inference, they are suboptimal: in the selection of feature points, much potentially valuable information contained in the colors of each image is discarded.
• They invariably lack robustness: errors in the point correspondence may have devastating effects on the estimated camera motion. Since one often selects very few point pairs only (8 points for the eight-point algorithm, 5 points for the five-point algorithm), any incorrect correspondence will lead to an incorrect motion estimate.
• They do not address the highly coupled problems of motion estimation and dense structure estimation; they merely do so for a sparse set of points. As a consequence, improvements in the estimated dense geometry will not be used to improve the camera motion estimates.

Toward Direct Approaches to Multiview Reconstruction

In the last few years, researchers have been promoting direct approaches to multi-view reconstruction. Rather than extracting a sparse set of feature points to determine the camera motion, direct methods aim at estimating camera motion and dense or semi-dense scene geometry directly from the input images. This has several advantages:

• Direct methods tend to be more robust to noise and other nuisances, because they exploit all available input information.
• Direct methods provide a semi-dense geometric reconstruction of the scene which goes well beyond the sparse point cloud generated by the eight-point algorithm or bundle adjustment. Depending on the application, a separate dense reconstruction step may no longer be necessary.
• Direct methods are typically faster because feature-point extraction and correspondence finding are omitted: they can provide fairly accurate camera motion and scene structure in real time on a CPU.

Feature-Based versus Direct Methods

[Figure: comparison of feature-based and direct methods. Engel, Sturm, Cremers, ICCV 2013.]

Direct Methods for Multi-view Reconstruction

In the following, we will briefly review several recent works on direct methods for realtime multiple-view reconstruction:

• the method of Stühmer, Gumhold, Cremers, DAGM 2010 computes dense geometry from a handheld camera in real time.
• the methods of Steinbrücker, Sturm, Cremers, 2011 and Kerl, Sturm, Cremers, 2013 directly compute the camera motion of an RGB-D camera.
• the method of Newcombe, Lovegrove, Davison, ICCV 2011 directly determines dense geometry and camera motion from the images.
• the methods of Engel, Sturm, Cremers, ICCV 2013 and Engel, Schöps, Cremers, ECCV 2014 directly compute camera motion and semi-dense geometry for a handheld (monocular) camera.
• the method of Engel, Koltun, Cremers, PAMI 2018 directly estimates highly accurate camera motion and sparse geometry.

Realtime Dense Geometry from a Handheld Camera

Let \(g_i \in SE(3)\) be the rigid-body motion from the first camera to the i-th camera, and let \(I_i : \Omega \to \mathbb{R}\) be the i-th image. A dense depth map \(h : \Omega \to \mathbb{R}\) can be computed by solving the optimization problem

\[
\min_h\; \sum_{i=2}^n \int_\Omega \left| I_1(x) - I_i\big(\pi\, g_i(h x)\big) \right| dx \;+\; \lambda \int_\Omega |\nabla h|\, dx,
\]

where x is represented in homogeneous coordinates and \(hx\) is the corresponding 3D point.

As in optical flow estimation, the unknown depth map should be such that, for all pixels x ∈ Ω, the transformation into the other images \(I_i\) gives rise to the same color as in the reference image \(I_1\). This cost function can be minimized at frame rate by coarse-to-fine linearization, solved in parallel on a GPU.

Stuehmer, Gumhold, Cremers, DAGM 2010.
[Figure: input image, reconstructed geometry, textured and untextured reconstructions. Stuehmer, Gumhold, Cremers, DAGM 2010.]

Dense RGB-D Tracking

The approach of Stühmer et al. (2010) relies on a sparse feature-point based camera tracker (PTAM) and computes dense geometry directly from the images. Steinbrücker, Sturm, Cremers (2011) propose a complementary approach to directly compute the camera motion from RGB-D images. The idea is to compute the rigid-body motion \(g_\xi\) which optimally aligns two subsequent color images \(I_1\) and \(I_2\):

\[
\min_{\xi \in \mathfrak{se}(3)} \int_\Omega \left( I_1(x) - I_2\big(\pi\, g_\xi(hx)\big) \right)^2 dx.
\]

This non-convex problem can be approximated as a convex problem by linearizing the residuum around an initial guess \(\xi_0\):

\[
E(\xi) \approx \int_\Omega \left( I_1(x) - I_2\big(\pi\, g_{\xi_0}(hx)\big) - \nabla I_2^\top\, \frac{d\pi}{dg}\, \frac{dg_\xi}{d\xi}\, \xi \right)^2 dx.
\]

This is a convex quadratic cost function, which gives rise to a linear optimality condition:

\[
\frac{dE(\xi)}{d\xi} = A\xi + b = 0.
\]

To account for larger motions of the camera, this problem is solved in a coarse-to-fine manner. The linearization of the residuum is identical to a Gauss-Newton approach; it corresponds to an approximation of the Hessian by a positive definite matrix. (Steinbrücker, Sturm, Cremers 2011)

[Figure: dense RGB-D tracking results. Steinbrücker, Sturm, Cremers 2011.]

In the small-baseline setting, this image-alignment approach provides more accurate camera motion than the commonly used generalized Iterated Closest Points (GICP) approach. [Figure: frame difference, sequences 1 and 2. Steinbrücker, Sturm, Cremers 2011.]

A related direct tracking approach was proposed for stereo reconstruction in Comport, Malis, Rives, ICRA 2007. A generalization which makes use of non-quadratic penalizers was proposed in Kerl, Sturm, Cremers, ICRA 2013.
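To convey the flavor of such direct alignment, here is a deliberately simplified sketch: one Gauss-Newton step for a purely translational 2D warp instead of the full se(3) parameterization \(\pi\, g_\xi(hx)\) (nearest-neighbor warping and all names are illustrative assumptions, not the published method):

```python
import numpy as np

def photometric_gn_step(I1, I2, p):
    """One Gauss-Newton step aligning I2 to I1 under the warp x -> x + p.
    Residual r(p) = I1(x) - I2(x + p), so dr/dp = -grad I2 at the warped points."""
    H, W = I1.shape
    gy, gx = np.gradient(I2)                        # image gradients of I2
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.rint(xs + p[0]).astype(int), 0, W - 1)
    yw = np.clip(np.rint(ys + p[1]).astype(int), 0, H - 1)
    r = (I1 - I2[yw, xw]).ravel()                   # photometric residuals
    J = -np.column_stack([gx[yw, xw].ravel(),
                          gy[yw, xw].ravel()])
    return p - np.linalg.solve(J.T @ J, J.T @ r)    # Gauss-Newton update of p
```

In a coarse-to-fine scheme, one would iterate this step over an image pyramid to handle larger camera motions, as described above.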
A Benchmark for RGB-D Tracking

Accurately tracking the camera is among the most central challenges in computer vision. The quantitative performance of algorithms can be validated on benchmarks.

[Figure: RGB-D benchmark setup and recorded sequences (Sturm, Engelhard, Endres, Burgard, Cremers, IROS 2012)]

Combining Photometric and Geometric Consistency

Kerl, Sturm, Cremers, IROS 2013 propose an extension of the RGB-D camera tracker which combines color consistency and geometric consistency of subsequent RGB-D images. Assuming that the vector $r_i = (r_{ci}, r_{zi}) \in \mathbb{R}^2$ containing the color and geometric discrepancy for pixel $i$ follows a bivariate t-distribution, the maximum likelihood pose estimate can be computed as

$$\min_{\xi \in \mathbb{R}^6} \sum_i w_i\, r_i^\top \Sigma^{-1} r_i,$$

with weights based on the Student t-distribution:

$$w_i = \frac{\nu + 1}{\nu + r_i^\top \Sigma^{-1} r_i}.$$

This nonlinear weighted least squares problem can be solved in an iteratively reweighted least squares manner by alternating a Gauss-Newton style optimization with a re-estimation of the weights $w_i$ and the matrix $\Sigma$.

Loop Closure and Global Consistency

When tracking a camera over a longer period of time, errors tend to accumulate. While a single room may still be mapped more or less accurately, mapping a larger environment will lead to increasing distortions: corridors and walls will no longer be straight but slightly curved.

A remedy is to introduce pose graph optimization and loop closure, a technique popularized in laser-based SLAM systems. The key idea is to estimate the relative camera motion $\hat\xi_{ij}$ for any camera pair $i$ and $j$ in a certain neighborhood. Subsequently, one can determine a globally consistent camera trajectory $\xi = \{\xi_i\}_{i=1..T}$ by solving the nonlinear least squares problem

$$\min_\xi \sum_{i \sim j} \big( \hat\xi_{ij} - \xi_i \circ \xi_j^{-1} \big)^\top \Sigma_{ij}^{-1} \big( \hat\xi_{ij} - \xi_i \circ \xi_j^{-1} \big),$$

where $\Sigma_{ij}$ denotes the uncertainty of measurement $\hat\xi_{ij}$. This problem can be solved using, for example, a Levenberg-Marquardt algorithm.

[Figure: pose graph optimization and loop closure (Kerl, Sturm, Cremers, IROS 2013)]

Dense Tracking and Mapping

Newcombe, Lovegrove & Davison (ICCV 2011) propose an algorithm which computes both the geometry of the scene and the camera motion in a direct and dense manner. They compute the inverse depth $u = 1/h$ by minimizing a cost function of the form

$$\min_u \; \sum_{i=2}^n \int_\Omega \Big| I_1(x) - I_i\Big(\pi g_i\Big(\frac{x}{u}\Big)\Big) \Big| \, dx \;+\; \lambda \int_\Omega \rho(x)\, |\nabla u| \, dx,$$

for fixed camera motions $g_i$.
The function $\rho$ introduces an edge-dependent weighting which assigns small weights to locations where the input image exhibits strong gradients:

$$\rho(x) = \exp\big( -|\nabla I_\sigma(x)|^\alpha \big).$$

The camera tracking is then performed with respect to the textured reconstruction in a manner similar to Steinbrücker et al. (2011). The method is initialized using feature point based stereo.

[Figure: dense tracking and mapping results (Newcombe, Lovegrove & Davison, ICCV 2011)]

Large-Scale Direct Monocular SLAM

A method for real-time direct monocular SLAM is proposed in Engel, Sturm, Cremers, ICCV 2013 and Engel, Schöps, Cremers, ECCV 2014. It combines several contributions which make it well-suited for robust large-scale monocular SLAM:

• Rather than tracking and putting into correspondence a sparse set of feature points, the method estimates a semi-dense depth map which associates an inverse depth with each pixel that exhibits sufficient gray value variation.

• To account for noise and uncertainty, each inverse depth value is associated with an uncertainty which is propagated and updated over time as in a Kalman filter.

• Since monocular SLAM is invariably defined up to scale only, we explicitly facilitate scaling of the reconstruction by modeling the camera motion using the Lie group of 3D similarity transformations Sim(3).

• Global consistency is assured by loop closure on Sim(3).

Tracking by Direct sim(3) Image Alignment

Since reconstructions from a monocular camera are only defined up to scale, Engel, Schöps, Cremers, ECCV 2014 account for rescaling of the environment by representing the camera motion as an element of the Lie group of 3D similarity transformations Sim(3), defined as

$$Sim(3) = \left\{ \begin{pmatrix} sR & T \\ 0 & 1 \end{pmatrix} \;\middle|\; R \in SO(3),\; T \in \mathbb{R}^3,\; s \in \mathbb{R}^+ \right\}.$$

One can minimize the nonlinear least squares problem

$$\min_{\xi \in \mathfrak{sim}(3)} \sum_i w_i\, r_i^2(\xi),$$

where $r_i$ denotes the color residuum across different images and $w_i$ a weighting as suggested in Kerl et al., IROS 2013.

The above cost function can then be optimized by a weighted Gauss-Newton algorithm on the Lie group Sim(3):

$$\xi^{(t+1)} = \Delta\xi \circ \xi^{(t)}, \qquad \Delta\xi = \big(J^\top W J\big)^{-1} J^\top W r, \qquad J = \frac{\partial r}{\partial \xi}.$$
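Both the t-distribution weighting of Kerl et al. and the weighted Gauss-Newton update above follow the same iteratively reweighted least squares pattern. A schematic sketch with scalar residuals (the bivariate case replaces $r_i^2/\sigma^2$ by $r_i^\top \Sigma^{-1} r_i$); the callback `residuals_and_jacobian` is an assumed, problem-specific hook, and the Lie-group composition is abbreviated here by vector addition:

```python
import numpy as np

def irls_tdist(residuals_and_jacobian, xi0, nu=5.0, n_iters=10):
    """Iteratively reweighted least squares with Student-t weights
    w_i = (nu + 1) / (nu + r_i^2 / sigma^2), alternating a weighted
    Gauss-Newton step with a re-estimation of the scale sigma.
    `residuals_and_jacobian(xi) -> (r, J)` supplies the (e.g.
    photometric) residuals and their derivatives w.r.t. xi."""
    xi = xi0.copy()
    for _ in range(n_iters):
        r, J = residuals_and_jacobian(xi)
        sigma2 = np.mean(r ** 2)                  # crude scale re-estimation
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)   # t-distribution weights
        # slide convention: delta = (J^T W J)^{-1} J^T W r
        # (the sign depends on how the residuum r is defined)
        delta = np.linalg.solve(J.T @ (w[:, None] * J), J.T @ (w * r))
        xi = xi + delta   # in the Sim(3)/SE(3) case: xi <- delta o xi
    return xi
```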
[Figure: large-scale direct monocular SLAM, semi-dense maps and trajectories (Engel, Schöps, Cremers, ECCV 2014)]

Towards Direct Sparse Odometry

Despite its popularity, LSD SLAM has several shortcomings:

• While the pose graph optimization allows one to impose global consistency, it merely performs a joint optimization of the extrinsic parameters associated with all keyframes. In contrast to a full bundle adjustment, it does not optimize the geometry. This is hard to do in realtime, in particular for longer sequences.

• LSD SLAM actually optimizes two different cost functions for estimating geometry and camera motion.

• LSD SLAM introduces spatial regularity by a spatial filtering of the inverse depth values. This creates correlations among the geometry parameters, which in turn makes Gauss-Newton optimization difficult.

• LSD SLAM is based on the assumption of brightness constancy. In real-world videos, brightness is often not preserved: due to varying exposure time, vignetting and gamma correction, the brightness can vary substantially. While feature descriptors are often invariant to these changes, the local brightness itself is not.

From Brightness Constancy to Irradiance Constancy

Brightness variations due to vignetting, gamma correction and exposure time can be eliminated by a complete photometric calibration:

$$I(x) = G\big( t\, V(x)\, B(x) \big),$$

where the measured brightness $I$ depends on the irradiance $B$, the vignette $V$, the exposure time $t$ and the camera response function $G$ (gamma function). $G$ and $V$ can be calibrated beforehand, and $t$ can be read out from the camera. Engel, Koltun, Cremers, PAMI 2018.

Windowed Joint Optimization

A complete bundle adjustment over longer sequences is difficult to carry out in realtime because the number of 3D point coordinates may grow very fast over time. Furthermore, new observations are likely to predominantly affect parameters associated with neighboring structures and cameras. For a given data set, one can study the connectivity graph, i.e. a graph where each node represents an image and two nodes are connected if they observe the same 3D structure.

Direct Sparse Odometry therefore reverts to a windowed joint optimization: from all 3D coordinates and camera frames, only those in a recent time window are included; the remaining ones are marginalized out.

If one avoids spatial filtering and selects only a sparser subset of points, then the points can be assumed to be fairly independent. As a result, the Hessian matrix becomes sparser and the Schur complement can be employed to make the Gauss-Newton updates more efficient.
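Returning to the photometric calibration above: once $G$ and $V$ are calibrated and $t$ is known, the model $I(x) = G(t\,V(x)\,B(x))$ can be inverted per pixel. A minimal sketch, assuming the inverse response $G^{-1}$ is given as a 256-entry lookup table (8-bit camera) and the vignette as a per-pixel attenuation map:

```python
import numpy as np

def photometric_correction(I, inv_response, vignette, t):
    """Recover the irradiance B from a measured image I under the model
    I(x) = G(t * V(x) * B(x)), i.e. B(x) = G^{-1}(I(x)) / (t * V(x)).
    `inv_response` is a lookup table for G^{-1} (assumed: 256 values for
    an 8-bit camera), `vignette` a per-pixel attenuation map in (0, 1],
    `t` the exposure time read out from the camera."""
    I = np.asarray(I, dtype=np.int64)
    energy = inv_response[I]           # t * V(x) * B(x)
    return energy / (t * vignette)     # irradiance B(x)
```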
Effect of Spatial Correlation on the Hessian Matrix

[Figure: sparsity pattern of the Hessian, geometry not correlated versus geometry correlated (Engel, Koltun, Cremers, PAMI 2018)]

The Schur Complement Trick

Solving the Newton update step (the normal equations)

$$Hx = \begin{pmatrix} H_{\alpha\alpha} & H_{\alpha\beta} \\ H_{\alpha\beta}^\top & H_{\beta\beta} \end{pmatrix} \begin{pmatrix} x_\alpha \\ x_\beta \end{pmatrix} = \begin{pmatrix} g_\alpha \\ g_\beta \end{pmatrix}$$

for the unknowns $x_\alpha$ and $x_\beta$ is usually done by QR decomposition for large problems. In this case, however, $H_{\beta\beta}$ is typically block diagonal (and thus easy to invert). Left-multiplication with the matrix

$$\begin{pmatrix} I & -H_{\alpha\beta} H_{\beta\beta}^{-1} \\ 0 & I \end{pmatrix}$$

leads to

$$\begin{pmatrix} S & 0 \\ H_{\alpha\beta}^\top & H_{\beta\beta} \end{pmatrix} \begin{pmatrix} x_\alpha \\ x_\beta \end{pmatrix} = \begin{pmatrix} g_\alpha - H_{\alpha\beta} H_{\beta\beta}^{-1} g_\beta \\ g_\beta \end{pmatrix},$$

where $S = H_{\alpha\alpha} - H_{\alpha\beta} H_{\beta\beta}^{-1} H_{\alpha\beta}^\top$ is the Schur complement of $H_{\beta\beta}$ in $H$. It is symmetric, positive definite and block structured. The equation $S x_\alpha = g_\alpha - H_{\alpha\beta} H_{\beta\beta}^{-1} g_\beta$ is called the reduced camera system.
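A small numerical sketch of this trick, assuming for simplicity that $H_{\beta\beta}$ is diagonal and passed as a vector (in bundle adjustment it is block diagonal, with one small block per 3D point):

```python
import numpy as np

def schur_solve(H_aa, H_ab, H_bb_diag, g_a, g_b):
    """Solve [[H_aa, H_ab], [H_ab^T, H_bb]] (x_a, x_b) = (g_a, g_b)
    via the Schur complement, exploiting that the point block H_bb is
    (block) diagonal and therefore cheap to invert."""
    H_bb_inv = 1.0 / H_bb_diag                               # diagonal inverse
    S = H_aa - (H_ab * H_bb_inv) @ H_ab.T                    # Schur complement
    rhs = g_a - (H_ab * H_bb_inv) @ g_b
    x_a = np.linalg.solve(S, rhs)                            # reduced camera system
    x_b = H_bb_inv * (g_b - H_ab.T @ x_a)                    # back-substitution
    return x_a, x_b
```

Only the (small) reduced camera system is solved densely; the geometry update $x_\beta$ then follows by cheap back-substitution, which is exactly why keeping the geometry parameters uncorrelated pays off.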
Direct Sparse Odometry

[Figure: Direct Sparse Odometry reconstructions (Engel, Koltun, Cremers, PAMI 2018)]

Quantitative Evaluation

A quantitative comparison of Direct Sparse Odometry to the state-of-the-art keypoint based technique ORB SLAM shows substantial improvements in precision and robustness:

[Figure: number of runs with a given error in translation, rotation and scale drift (Engel, Koltun, Cremers, PAMI 2018)]

Chapter 9: Variational Methods: A Short Intro (Multiple View Geometry, Summer 2019)
Prof. Daniel Cremers, Chair for Computer Vision and Artificial Intelligence, Departments of Informatics & Mathematics, Technical University of Munich

Overview

1 Variational Methods
2 Variational Image Smoothing
3 Euler-Lagrange Equation
4 Gradient Descent
5 Adaptive Smoothing
6 Euler and Lagrange

Variational Methods

Variational methods are a class of optimization methods. They are popular because they allow many problems to be solved in a mathematically transparent manner. Instead of implementing a heuristic sequence of processing steps (as was commonly done in the 1980s), one clarifies beforehand what properties an 'optimal' solution should have. Variational methods are particularly popular for infinite-dimensional problems and spatially continuous representations.

Particular applications are:
• Image denoising and image restoration
• Image segmentation
• Motion estimation and optical flow
• Spatially dense multiple view reconstruction
• Tracking

Advantages of Variational Methods

Variational methods have many advantages over heuristic multi-step approaches (such as the Canny edge detector):

• A mathematical analysis of the considered cost function allows one to make statements about the existence and uniqueness of solutions.

• Approaches with multiple processing steps are difficult to modify. All steps rely on the input from a previous step; exchanging one module for another typically requires re-engineering the entire processing pipeline.

• Variational methods make all modeling assumptions transparent; there are no hidden assumptions.

• Variational methods typically have fewer tuning parameters, and the effect of the respective parameters is clear.

• Variational methods are easily fused – one simply adds the respective energies / cost functions.

Example: Variational Image Smoothing

Let $f : \Omega \to \mathbb{R}$ be a grayvalue input image on the domain $\Omega \subset \mathbb{R}^2$. We assume that the observed image arises from some 'true' image corrupted by additive noise. We are interested in a denoised version $u$ of the input image $f$. The approximation $u$ should fulfill two properties:

• It should be as similar as possible to $f$.
• It should be spatially smooth (i.e. 'noise-free').

Both of these criteria can be combined in a cost function of the form

$$E(u) = E_{\text{data}}(u, f) + E_{\text{smoothness}}(u).$$

The first term measures the similarity of $f$ and $u$, the second one the smoothness of the (hypothetical) function $u$. Most variational approaches have the above form; they merely differ in the specific form of the data (similarity) term and the regularity (or smoothness) term.
Example: Variational Image Smoothing

For denoising a grayvalue image $f : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, specific examples of data and smoothness term are

$$E_{\text{data}}(u, f) = \int_\Omega \big( u(x) - f(x) \big)^2 dx$$

and

$$E_{\text{smoothness}}(u) = \int_\Omega |\nabla u(x)|^2 dx,$$

where $\nabla = (\partial/\partial x, \partial/\partial y)^\top$ denotes the spatial gradient. Minimizing the weighted sum of data and smoothness term,

$$E(u) = \int_\Omega \big( u(x) - f(x) \big)^2 dx + \lambda \int_\Omega |\nabla u(x)|^2 dx, \qquad \lambda > 0,$$

leads to a smooth approximation $u : \Omega \to \mathbb{R}$ of the input image. Such energies, which assign a real value to a function, are called functionals. How does one minimize functionals whose argument is a function $u(x)$ (rather than a finite number of parameters)?

Functional Minimization & Euler-Lagrange Equation

• As a necessary condition for minimizers of a functional, the associated Euler-Lagrange equation must hold. For a functional of the form

$$E(u) = \int L(u, u') \, dx,$$

it is given by

$$\frac{dE}{du} = \frac{\partial L}{\partial u} - \frac{d}{dx} \frac{\partial L}{\partial u'} = 0.$$

• The central idea of variational methods is therefore to determine solutions of the Euler-Lagrange equation of a given functional. For general non-convex functionals this is a difficult problem.

• Another approach is to start with an (appropriate) function $u_0(x)$ and modify it step by step such that in each iteration the value of the functional decreases. Such methods are called descent methods.

Gradient Descent

One specific descent method is called gradient descent or steepest descent. The key idea is to start from an initialization $u(x, t = 0)$ and iteratively march in the direction of the negative energy gradient. For the class of functionals considered above, gradient descent is given by the following partial differential equation:

$$u(x, 0) = u_0(x), \qquad \frac{\partial u(x, t)}{\partial t} = -\frac{dE}{du} = -\frac{\partial L}{\partial u} + \frac{d}{dx} \frac{\partial L}{\partial u'}.$$

Specifically for $L(u, u') = \frac{1}{2} \big( u(x) - f(x) \big)^2 + \frac{\lambda}{2} |u'(x)|^2$ this means:

$$\frac{\partial u}{\partial t} = (f - u) + \lambda u''.$$

If the gradient descent evolution converges, $\partial u / \partial t = -dE/du = 0$, then we have found a solution of the Euler-Lagrange equation.

Image Smoothing by Gradient Descent

[Figure: gradient descent evolutions for $E(u) = \int (f-u)^2 dx + \lambda \int |\nabla u|^2 dx$ and for $E(u) = \int |\nabla u|^2 dx$ (Author: D. Cremers)]

Discontinuity-preserving Smoothing

[Figure: smoothing with the quadratic regularizer $E(u) = \int |\nabla u|^2 dx$ versus the discontinuity-preserving regularizer $E(u) = \int |\nabla u| dx$ (Author: D. Cremers)]
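A minimal sketch of this gradient descent scheme for denoising, discretizing $u''$ (in 2D: the Laplacian $\Delta u$) with the standard 5-point stencil and replicating the border to realize Neumann boundary conditions; the step size `tau` must be small relative to $1/\lambda$ for the explicit scheme to remain stable:

```python
import numpy as np

def denoise(f, lam=5.0, tau=0.01, n_iters=500):
    """Explicit gradient descent on E(u) = int (u-f)^2 dx + lam * int |grad u|^2 dx,
    i.e. iterate du/dt = (f - u) + lam * Laplace(u).  The Laplacian uses
    the 5-point stencil; border replication gives Neumann boundaries.
    Stability of the explicit scheme requires roughly tau <= 1/(1 + 4*lam)."""
    u = f.astype(float).copy()
    for _ in range(n_iters):
        up = np.pad(u, 1, mode='edge')
        lap = (up[:-2, 1:-1] + up[2:, 1:-1] +
               up[1:-1, :-2] + up[1:-1, 2:] - 4.0 * u)
        u += tau * ((f - u) + lam * lap)
    return u
```

Larger `lam` yields stronger smoothing; replacing the quadratic regularizer by the total variation $\int |\nabla u|$ gives the discontinuity-preserving behavior shown in the figures above, at the cost of a non-smooth optimization problem.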
Leonhard Euler

Leonhard Euler (1707 – 1783)

• Published 886 papers and books, most of these in the last 20 years of his life. He is generally considered the most influential mathematician of the 18th century.
• Contributions: Euler number, Euler angles, Euler formula, Euler theorem, Euler equations (for liquids), Euler-Lagrange equations, ...
• 13 children.

Joseph-Louis Lagrange

Joseph-Louis Lagrange (1736 – 1813)

• Born Giuseppe Lodovico Lagrangia (in Turin). Autodidact.
• At the age of 19: chair for mathematics in Turin.
• Later worked in Berlin (1766–1787) and Paris (1787–1813).
• 1788: La Méchanique Analytique.
• 1800: Leçons sur le calcul des fonctions.

Chapter 10: Variational Multiview Reconstruction (Multiple View Geometry, Summer 2019)
Prof. Daniel Cremers, Chair for Computer Vision and Artificial Intelligence, Departments of Informatics & Mathematics, Technical University of Munich

Overview

1 Shape Representation and Optimization
2 Variational Multiview Reconstruction
3 Super-resolution Texture Reconstruction
4 Space-Time Reconstruction from Multiview Video

Shape Optimization

Shape optimization is a field of mathematics concerned with formulating the estimation of geometric structures by means of optimization methods. Among the major challenges in this context is the question of how to mathematically represent shape. The choice of representation entails a number of consequences, in particular regarding how efficiently one can store geometric structures and how efficiently one can compute optimal geometry.

There exist numerous representations of shape which can loosely be grouped into two classes:

• Explicit representations: The points of a surface are represented explicitly (directly), either as a set of points, a polyhedron or a parameterized surface.

• Implicit representations: The surface is represented implicitly by specifying the parts of the ambient space that are inside and outside a given surface.

Explicit Shape Representations

An explicit representation of a closed curve $C \subset \mathbb{R}^d$ is a mapping $C : S^1 \to \mathbb{R}^d$ from the circle $S^1$ to $\mathbb{R}^d$. Examples are polygons or – more generally – spline curves:

$$C(s) = \sum_{i=1}^N C_i B_i(s),$$

where $C_1, \ldots, C_N \in \mathbb{R}^d$ denote control points and $B_1, \ldots, B_N : S^1 \to \mathbb{R}$ denote a set of spline basis functions.

[Figure: basis functions; spline & control points]
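A small sketch evaluating such a closed curve; for simplicity the basis functions here are periodic uniform linear B-splines (piecewise linear interpolation on the circle), whereas practical systems typically use cubic bases, but the structure $C(s) = \sum_i C_i B_i(s)$ is the same:

```python
import numpy as np

def eval_closed_spline(control_points, s):
    """Evaluate C(s) = sum_i C_i B_i(s) for a closed curve with N
    control points and periodic, uniform *linear* B-spline bases,
    i.e. piecewise-linear interpolation on the circle; s in [0, 1)."""
    C = np.asarray(control_points, dtype=float)   # N x d control points
    N = len(C)
    t = np.asarray(s) * N                         # circle parameter -> knot index
    i0 = np.floor(t).astype(int) % N              # left control point (periodic)
    i1 = (i0 + 1) % N                             # right control point (periodic)
    w = (t - np.floor(t))[..., None]              # fractional position in the span
    return (1.0 - w) * C[i0] + w * C[i1]

# example: a square given by 4 control points, sampled at 8 curve positions
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
pts = eval_closed_spline(square, np.linspace(0, 1, 8, endpoint=False))
```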
Explicit Shape Representations

Splines can be extended from curves to surfaces or higher-dimensional structures. A spline surface $S \subset \mathbb{R}^d$ can be defined as

$$S(s, t) = \sum_{i,j} C_{i,j} B_i(s) B_j(t),$$

where $C_{i,j} \in \mathbb{R}^d$ denote control points and $B_1, \ldots, B_N : [0, 1] \to \mathbb{R}$ denote a set of spline basis functions. Depending on whether the surface is closed or open, these basis functions will have a cyclic nature or not.

[Figure: basis functions; spline surface & control points]

Implicit Shape Representations

One example of an implicit representation is the indicator function of the surface $S$, which is a function $u : V \to \{0, 1\}$ defined on the surrounding volume $V \subset \mathbb{R}^3$ that takes on the value 1 inside the surface and 0 outside:

$$u(x) = \begin{cases} 1, & \text{if } x \in \text{int}(S) \\ 0, & \text{if } x \in \text{ext}(S) \end{cases}$$

Another example is the signed distance function $\phi : V \to \mathbb{R}$, which assigns to all points in the surrounding volume the (signed) distance from the surface $S$:

$$\phi(x) = \begin{cases} +d(x, S), & \text{if } x \in \text{int}(S) \\ -d(x, S), & \text{if } x \in \text{ext}(S) \end{cases}$$

Depending on the application, it may be useful to know for every voxel how far it is from the surface. Signed distance functions can be computed in polynomial time (MatLab: bwdist).

Explicit Versus Implicit Representations

In general, compared to explicit representations, implicit representations have the following strengths and weaknesses:

− Implicit representations typically require more memory to represent a geometric structure at a given resolution: rather than storing a few points along the curve or surface, one needs to store an occupancy value for each volume element.

− Moving or updating an implicit representation is typically slower: rather than moving a few control points, one needs to update the occupancy of all volume elements.

+ Methods based on implicit representations do not depend on a choice of parameterization.

+ Implicit representations allow one to represent objects of arbitrary topology (i.e. with an arbitrary number of holes).

+ With respect to an implicit representation, many shape optimization challenges can be formulated as convex optimization problems and can then be optimized globally.
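The Python analogue of MatLab's bwdist is scipy's Euclidean distance transform. A sketch computing the signed distance function above from an indicator volume, accurate up to the voxel discretization:

```python
import numpy as np
from scipy import ndimage

def signed_distance(u):
    """Signed distance function from an indicator volume u : V -> {0,1}
    (1 inside, 0 outside), positive inside and negative outside, via two
    Euclidean distance transforms.  distance_transform_edt measures the
    distance to the nearest zero voxel, so each call covers one side of
    the surface (up to one voxel of discretization error)."""
    inside = np.asarray(u).astype(bool)
    d_in = ndimage.distance_transform_edt(inside)    # inside: distance to exterior
    d_out = ndimage.distance_transform_edt(~inside)  # outside: distance to object
    return d_in - d_out
```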
Multiview Reconstruction as Shape Optimization

How can we cast multiple view reconstruction as a shape optimization problem? To this end, we will assume that the camera orientations are given. Rather than estimating the correspondence between all pairs of pixels in the images, we simply ask: how likely is it that a given voxel $x$ lies on the object surface $S$?

If the voxel $x \in V$ of the given volume $V \subset \mathbb{R}^3$ were on the surface, then (up to visibility issues) the projection of that voxel into each image should give rise to the same color (or local texture). Thus we can assign to each voxel $x \in V$ a so-called photoconsistency function

$$\rho : V \to [0, 1],$$

which takes on low values (near 0) if the projected voxels give rise to the same color (or local texture) and high values (near 1) otherwise.

A Weighted Minimal Surface Approach

The reconstruction from multiple views can now be formulated as finding the maximally photoconsistent surface, i.e. a surface $S_{\text{opt}}$ with an overall minimal photoconsistency score:

$$S_{\text{opt}} = \arg\min_S \int_S \rho(s) \, ds. \qquad (1)$$

This seminal formulation was proposed among others by Faugeras & Keriven (1998). Many good reconstructions were computed by starting from an initial guess of $S$ and locally minimizing this energy using gradient descent. But can we compute the global minimum?

The above energy has a central drawback: the global minimizer of (1) is the empty set. It has zero cost, while all other surfaces have a non-negative energy. This shortcoming of minimal surface formulations is often called the shrinking bias. How can we prevent the empty set?

Imposing Silhouette Consistency

Assume that we additionally have the silhouette $S_i$ of the observed 3D object outlined in every image $i = 1, \ldots, n$. Then we can formulate the reconstruction problem as a constrained optimization problem (Cremers, Kolev, PAMI 2011):

$$\min_S \int_S \rho(s) \, ds, \quad \text{such that} \quad \pi_i(S) = S_i \quad \forall i = 1, \ldots, n.$$

Written in terms of the indicator function $u : V \to \{0, 1\}$ of the surface $S$, this reads:

$$\min_{u : V \to \{0,1\}} \int_V \rho(x)\, |\nabla u(x)| \, dx \qquad (*)$$

$$\text{s.t.} \quad \int_{R_{ij}} u(x) \, dR_{ij} \geq 1 \;\text{ if } j \in S_i, \qquad \int_{R_{ij}} u(x) \, dR_{ij} = 0 \;\text{ if } j \notin S_i,$$

where $R_{ij}$ denotes the visual ray through pixel $j$ of image $i$.

[Figure: top view of the geometry and respective visual rays (Cremers, Kolev, PAMI 2011)]

Any ray passing through the silhouette must intersect the object in at least one voxel. Any ray passing outside the silhouette may not intersect the object in any voxel.
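As a toy illustration of the photoconsistency function $\rho$ introduced above, the sketch below scores a voxel by the spread of the colors it projects to. Visibility handling is omitted, `project` is an assumed calibration helper mapping a 3D point into pixel coordinates of image $i$, and the mapping of the color spread to $[0, 1]$ is an arbitrary choice:

```python
import numpy as np

def photoconsistency(X, images, project):
    """Toy photoconsistency rho(X) in [0, 1] for a voxel center X:
    project X into every image, collect the observed colors and map
    their standard deviation to [0, 1].  Low values mean the views
    agree on the color; occlusion handling is deliberately omitted."""
    colors = []
    for i, I in enumerate(images):
        u, v = project(i, X)                       # assumed helper: X -> pixel (u, v)
        if 0 <= v < I.shape[0] and 0 <= u < I.shape[1]:
            colors.append(I[int(v), int(u)])
    if len(colors) < 2:
        return 1.0                                  # unobserved voxels are not trusted
    spread = np.std(np.asarray(colors, dtype=float), axis=0).mean()
    return float(1.0 - np.exp(-spread / 0.1))       # near 0 if the colors agree
```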
Convex Relaxation and Thresholding

By relaxing the binarity constraint on $u$ and allowing intermediate values between 0 and 1, the optimization problem $(*)$ becomes convex.

Proposition: The set

$$D := \left\{ u : V \to [0, 1] \;\middle|\; \int_{R_{ij}} u(x)\, dR_{ij} \geq 1 \text{ if } j \in S_i, \;\; \int_{R_{ij}} u(x)\, dR_{ij} = 0 \text{ if } j \notin S_i, \;\; \forall i, j \right\}$$

of silhouette-consistent functions is convex.

Proof: For a proof we refer to Kolev, Cremers, ECCV 2008.

Thus we can compute solutions to the silhouette-constrained reconstruction problem by solving the relaxed convex problem and subsequently thresholding the computed solution.

Reconstructing Complex Geometry

[Figure: 3 out of 33 input images of resolution 1024 × 768 (data courtesy of Y. Furukawa) and the estimated multiview reconstruction (Cremers, Kolev, PAMI 2011)]

Reconstruction from a Handheld Camera

[Figure: 2 of 28 images and the estimated multiview reconstruction (Cremers, Kolev, PAMI 2011)]

Multi-view Texture Reconstruction

In addition to the dense geometry $S$, we can also recover the texture $T : S \to \mathbb{R}^3$ of the object from the images $I_i : \Omega_i \to \mathbb{R}^3$. Rather than simply back-projecting the respective images onto the surface, Goldlücke & Cremers, ICCV 2009 suggest solving a variational super-resolution problem of the form

$$\min_{T : S \to \mathbb{R}^3} \; \sum_{i=1}^n \int_{\Omega_i} \big( b * (T \circ \pi_i^{-1}) - I_i \big)^2 dx \;+\; \lambda \int_S \|\nabla_S T\| \, ds,$$

where $b$ is a linear operator representing blurring and downsampling and $\pi_i$ denotes the projection onto the image domain $\Omega_i$.

The super-resolution texture estimation is a convex optimization problem which can be solved efficiently. It generates a textured model of the object which cannot be distinguished from the original:

[Figure: one of 36 input images and the textured 3D model (Goldlücke, Cremers, ICCV 2009, DAGM 2009)]
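The data term above compares a blurred and downsampled version of the texture against each input image. A flat 2D sketch of this forward operator and its residual, with a Gaussian blur standing in for $b$, integer subsampling for the downsampling, and the surface-to-image projection $\pi_i$ omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def superres_residual(T, I_low, factor=4, sigma=1.5):
    """Data-term residual ||b * T - I||^2 of the super-resolution model
    in a flattened 2D setting: T is the high-resolution texture estimate
    (assumed already pulled back to the image plane, i.e. pi_i omitted),
    b is modeled as a Gaussian blur followed by subsampling by `factor`,
    and I_low is the observed low-resolution image on the coarse grid."""
    blurred = gaussian_filter(T, sigma)          # blur: b * T
    downsampled = blurred[::factor, ::factor]    # sample on the coarse grid
    r = downsampled - I_low                      # per-pixel residual
    return 0.5 * np.sum(r ** 2)
```

Because both blur and subsampling are linear, the whole data term is a convex quadratic in $T$, which is what makes the super-resolution estimation efficiently solvable.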
The super-resolution approach exploits the fact that every surface patch is observed in multiple images. This allows one to invert the blurring and downsampling, providing a high-resolution texturing which is sharper than the individual input images:

[Figure: input image close-up versus super-resolution texture (Goldlücke, Cremers, ICCV 2009, DAGM 2009)]

Space-Time Reconstruction from Multi-view Video

Although laser-based reconstruction is often more accurate and more reliable (in the absence of texture), image-based reconstruction has two advantages:

• One can extract both the geometry and the color of the objects.

• One can reconstruct actions over time, filmed with multiple synchronized cameras.

Oswald & Cremers, 4DMOD 2013 and Oswald, Stühmer, Cremers, ECCV 2014 propose convex variational approaches for dense space-time reconstruction from multi-view video.

[Figure: 1 of 16 input videos and dense reconstructions over time (Oswald, Stühmer, Cremers, ECCV 2014)]

Toward Free-Viewpoint Television

Space-time action reconstructions as done in Oswald & Cremers 2013 entail many fascinating applications, including:

• For video conferencing, one can transmit a full 3D model of a speaker, which gives a stronger sense of presence and immersion.

• For sports analysis, one can analyze the precise motion of a gymnast.

• For free-viewpoint television, the spectator at home can interactively choose from which viewpoint to follow an action.

[Figure: textured action reconstruction for free-viewpoint television (Oswald, Cremers, 4DMOD 2013)]