Geometry Slides (part 4) - Weizmann Institute of Science

Geometry 4:
Multiview Stereo
Introduction to Computer Vision
Ronen Basri
Weizmann Institute of Science
Material covered
• Pinhole camera model, perspective projection
• Two view geometry, general case:
• Epipolar geometry, the essential matrix
• Camera calibration, the fundamental matrix
• Two view geometry, degenerate cases
• Homography (planes, camera rotation)
• A taste of projective geometry
• Stereo vision: 3D reconstruction from two views
• Multi-view geometry, reconstruction through
factorization
Structure from motion
• Input:
• a set of point tracks
• Output:
• 3D location of each point (shape)
• camera parameters (motion)
• Assumptions:
• Rigid motion
• Orthographic projection (no scale)
• Method: SVD factorization (Tomasi & Kanade)
Setup
• $I_1, I_2, \ldots, I_f$: a collection of images (video frames) depicting a rigid scene
• $p$ point tracks in those $f$ frames: $p_{ij} = (x_{ij}, y_{ij})^T$ is the location of $P_j$ in frame $i$
• Unknown 3D locations:
$P_j = (X_j, Y_j, Z_j)^T \in \mathbb{R}^3$, $j = 1, \ldots, p$
• Therefore,
$x_{ij} = \mathbf{r}_i^T P_j + c_i$
$y_{ij} = \mathbf{s}_i^T P_j + d_i$
where $\mathbf{r}_i^T, \mathbf{s}_i^T$ are the two top rows of a 3 × 3 rotation matrix
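The projection model above can be tried on toy data. The following is a minimal sketch, assuming NumPy; the helper names `rotation_from_axis_angle` and `orthographic_project` and the sample points are hypothetical and not part of the slides.

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def orthographic_project(R, c, d, P):
    """Project 3D points P (3 x p) into one frame: x = r^T P + c, y = s^T P + d."""
    r, s = R[0], R[1]          # the two top rows of the rotation matrix
    return r @ P + c, s @ P + d

# Hypothetical toy data: 4 points, one camera rotated 30 degrees about the Y axis.
P = np.array([[0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 2.0]])
R = rotation_from_axis_angle([0.0, 1.0, 0.0], np.deg2rad(30.0))
x, y = orthographic_project(R, c=0.5, d=-0.25, P=P)
```

The third row of the rotation never enters the projection, which is why depth along the viewing direction is lost under this model.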
Objective
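Find $\mathbf{r}_i, \mathbf{s}_i \in \mathbb{R}^3$ and $c_i, d_i \in \mathbb{R}$ that minimize
$$\sum_{i=1}^{f} \sum_{j=1}^{p} \left[ \left( (\mathbf{r}_i^T P_j + c_i) - x_{ij} \right)^2 + \left( (\mathbf{s}_i^T P_j + d_i) - y_{ij} \right)^2 \right]$$
Subject to
$\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$
$\mathbf{r}_i^T \mathbf{s}_i = 0$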
Eliminate translation
• We can eliminate translation by representing the location of each point relative to the centroid of all $p$ points
• Assume without loss of generality that the centroid of $P_1, \ldots, P_p$ coincides with the origin $\mathbf{0} \in \mathbb{R}^3$
• Translate each image point by setting
$x_{ij} \leftarrow x_{ij} - \bar{x}_i$
$y_{ij} \leftarrow y_{ij} - \bar{y}_i$
where $(\bar{x}_i, \bar{y}_i)$ denotes the centroid of the points $(x_{ij}, y_{ij})$, $j = 1, \ldots, p$, in frame $i$
Objective (no translation)
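Find $\mathbf{r}_i, \mathbf{s}_i \in \mathbb{R}^3$ that minimize
$$\sum_{i=1}^{f} \sum_{j=1}^{p} \left[ \left( \mathbf{r}_i^T P_j - x_{ij} \right)^2 + \left( \mathbf{s}_i^T P_j - y_{ij} \right)^2 \right]$$
Subject to
$\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$
$\mathbf{r}_i^T \mathbf{s}_i = 0$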
Measurement matrix
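$$M = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
\vdots & \vdots & & \vdots \\
x_{f1} & x_{f2} & \cdots & x_{fp} \\
y_{11} & y_{12} & \cdots & y_{1p} \\
\vdots & \vdots & & \vdots \\
y_{f1} & y_{f2} & \cdots & y_{fp}
\end{pmatrix}_{2f \times p}$$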
Transformation and shape matrices
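$$T = \begin{pmatrix}
\mathbf{r}_1^T \\ \vdots \\ \mathbf{r}_f^T \\ \mathbf{s}_1^T \\ \vdots \\ \mathbf{s}_f^T
\end{pmatrix}
= \begin{pmatrix}
r_{11} & r_{12} & r_{13} \\
\vdots & \vdots & \vdots \\
r_{f1} & r_{f2} & r_{f3} \\
s_{11} & s_{12} & s_{13} \\
\vdots & \vdots & \vdots \\
s_{f1} & s_{f2} & s_{f3}
\end{pmatrix}_{2f \times 3}$$
$$S = \begin{pmatrix}
X_1 & X_2 & \cdots & X_p \\
Y_1 & Y_2 & \cdots & Y_p \\
Z_1 & Z_2 & \cdots & Z_p
\end{pmatrix}_{3 \times p}$$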
Objective: matrix notation
Find $T$ and $S$ that minimize
$$\| M - TS \|_F$$
Subject to
$\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$
$\mathbf{r}_i^T \mathbf{s}_i = 0$
$M$ is $2f \times p$, $T$ is $2f \times 3$, $S$ is $3 \times p$
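To make the stacking concrete, here is a minimal sketch, assuming NumPy, that builds synthetic $T$, $S$ and $M$ with the layout described above. The sizes, random seed and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
f, p = 5, 8                                   # hypothetical: 5 frames, 8 points

# Shape matrix S (3 x p), centred so the centroid of the points is the origin.
S = rng.standard_normal((3, p))
S -= S.mean(axis=1, keepdims=True)

# One random orthogonal matrix per frame (via QR); its two top rows are
# orthonormal, which is all the orthographic model uses.
rows_r, rows_s = [], []
for _ in range(f):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    rows_r.append(Q[0])
    rows_s.append(Q[1])

T = np.vstack(rows_r + rows_s)                # 2f x 3: all r_i^T, then all s_i^T
M = T @ S                                     # 2f x p noise-free measurement matrix

# With noisy tracks the objective ||M - TS||_F is no longer zero.
M_noisy = M + 0.01 * rng.standard_normal(M.shape)
cost = np.linalg.norm(M_noisy - T @ S)        # Frobenius norm for 2D arrays
```

In the noise-free case $M$ has rank at most 3, which is the property the factorization exploits.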
𝑀 = 𝑇𝑆 + Noise
π‘₯11 π‘₯12
…
π‘₯𝑓1 π‘₯𝑓2
𝑦11 𝑦12
..
𝑦𝑓1 𝑦𝑓2
π‘Ÿ11 π‘Ÿ12
…
π‘Ÿπ‘“1 π‘Ÿπ‘“2
= 𝑠
11 𝑠12
…
𝑠𝑓1 𝑠𝑓2
.
.
.
.
.
.
.
.
.
.
.
.
π‘Ÿ13
…
π‘Ÿπ‘“3
𝑠13
…
𝑠𝑓3
π‘₯1𝑝
…
π‘₯𝑓𝑝
𝑦1𝑝
…
𝑦𝑓𝑝
𝑋1
π‘Œ1
𝑍1
2𝑓×3
2𝑓×𝑝
…
…
𝑋𝑝
π‘Œπ‘
𝑍𝑝
+ Noise
3×𝑝
TK-Factorization
𝑀 = 𝑇𝑆 + Noise
Step 1: find a rank-3 approximation to $M$ using SVD
$$M = U \Sigma V^T$$
where
$U$ is $2f \times 2f$, $U^T U = I$,
$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots)$, size $2f \times p$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$,
$V$ is $p \times p$, $V^T V = I$
TK-Factorization
𝑀 = π‘ˆΣ3 𝑉 𝑇
where Σ3 = diag(𝜎1 , 𝜎2 , 𝜎3 , 0, 0, … )
Note: this is a relaxation, only noise components
outside the 3D space are annihilated
Step 2: factorization
𝑇 = π‘ˆ Σ3
Ambiguity:
𝑆=
Σ3 𝑉 𝑇
𝑀 = (𝑇𝐴)(𝐴−1 𝑆)
for any non-singular, 3 × 3 matrix 𝐴
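Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the function name `rank3_factorize`, the synthetic data and the particular matrix `A` are hypothetical.

```python
import numpy as np

def rank3_factorize(M):
    """Return T_hat (2f x 3), S_hat (3 x p); T_hat @ S_hat is the best rank-3 fit to M."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(sigma[:3])                 # square roots of the 3 largest singular values
    T_hat = U[:, :3] * root                   # U_3 sqrt(Sigma_3)
    S_hat = root[:, None] * Vt[:3]            # sqrt(Sigma_3) V_3^T
    return T_hat, S_hat

rng = np.random.default_rng(1)
M_clean = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 12))   # exact rank 3
M = M_clean + 1e-3 * rng.standard_normal(M_clean.shape)                 # small noise
T_hat, S_hat = rank3_factorize(M)

# The factorization is not unique: any invertible 3 x 3 matrix A gives the same product.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
same_product = np.allclose((T_hat @ A) @ (np.linalg.inv(A) @ S_hat), T_hat @ S_hat)
```

Splitting the singular values evenly between the two factors is a convention; any other split is absorbed by the ambiguity matrix $A$.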
TK-Factorization
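Step 3: resolve the ambiguity using the constraints
$$\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1, \qquad \mathbf{r}_i^T \mathbf{s}_i = 0$$
Let $R_i = \begin{pmatrix} \mathbf{r}_i^T \\ \mathbf{s}_i^T \end{pmatrix}_{2 \times 3}$, note that $R_i R_i^T = I$
Let $T_i = \begin{pmatrix} \hat{\mathbf{r}}_i^T \\ \hat{\mathbf{s}}_i^T \end{pmatrix}_{2 \times 3}$ be the corresponding rows in the factor $T$ from Step 2 (the hats mark rows that are not yet orthonormal), then
$$R_i = T_i A$$
Find a 3 × 3 symmetric matrix $AA^T$ such that
$$T_i A A^T T_i^T = R_i R_i^T = I$$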
TK-Factorization
$$T_i A A^T T_i^T = R_i R_i^T = I$$
• The equation is linear in $AA^T$
• There are $3f$ equations in 6 unknowns
• Find $A$ by eigen-decomposition
$$AA^T = W \Delta W^T$$
so that
$$A = W \sqrt{\Delta}$$
• The solution is obtained up to a rotation ambiguity
$$T_i (AB)(B^T A^T) T_i^T = I$$
for any $B$ such that $BB^T = I$
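Step 3 can be sketched as a small linear least-squares problem in the 6 distinct entries of $Q = AA^T$, followed by an eigen-decomposition. This is an illustration only, assuming NumPy; `quad_row` and `resolve_ambiguity` are hypothetical names, the row layout (all r-rows, then all s-rows) follows the definition of $T$, and the clipping of eigenvalues is an added safeguard that the slides do not mention.

```python
import numpy as np

def quad_row(a, b):
    """Coefficients of a^T Q b in the unknowns (Q11, Q12, Q13, Q22, Q23, Q33)."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def resolve_ambiguity(T_hat):
    """Estimate A (up to a rotation B) from the 2f x 3 factor T_hat."""
    f = T_hat.shape[0] // 2
    rows, rhs = [], []
    for i in range(f):
        a, b = T_hat[i], T_hat[f + i]           # rows playing the roles of r_i and s_i
        rows += [quad_row(a, a), quad_row(b, b), quad_row(a, b)]
        rhs += [1.0, 1.0, 0.0]                  # unit norms and orthogonality
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])          # Q = A A^T, symmetric
    delta, W = np.linalg.eigh(Q)                # Q = W diag(delta) W^T
    delta = np.clip(delta, 0.0, None)           # safeguard: noise can make Q indefinite
    return W * np.sqrt(delta)                   # A = W sqrt(Delta)

# Synthetic check: distort orthonormal row pairs by a known 3 x 3 matrix, then undo it.
rng = np.random.default_rng(4)
f = 6
Qs = [np.linalg.qr(rng.standard_normal((3, 3)))[0] for _ in range(f)]
T_true = np.vstack([Q[0] for Q in Qs] + [Q[1] for Q in Qs])
A0 = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
T_hat = T_true @ np.linalg.inv(A0)
A = resolve_ambiguity(T_hat)
T_rec = T_hat @ A                               # row pairs are orthonormal again
```

The recovered `A` generally differs from `A0` by an orthogonal matrix, which is the rotation (and reflection) ambiguity mentioned above; only $AA^T$ is determined.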
TK-Factorization: Summary
1. Eliminate translation, construct $M$
2. Compute SVD($M$) to get a rank-3 approximation of $M$ and factorize $M \approx TS$ (a 3 × 3 ambiguity $A$ remains)
3. Resolve the ambiguity: estimate $AA^T$ by exploiting the orthonormality of each rotation, then factorize to obtain $A$
The final solution is obtained up to rotation and reflection
TK-Factorization: pros and cons
• Advantages:
• Breaks a difficult, non-linear optimization into simple optimization steps
• Works well in the presence of measurement errors (noise)
• Disadvantages:
• Assumes orthographic projection
• Requires complete tracks
Factorization with incomplete tracks
• Need a way to approximate by a low rank matrix with missing data
$$\min_{\mathrm{rank}(X) = 3} \| W \odot (X - M) \|_F$$
$W$ is a mask, $W_{ij} = 1$ wherever $M_{ij}$ is known (and 0 elsewhere)
• This problem is NP-hard
• Surrogate: minimize the nuclear norm, the sum of singular values $\sigma_1 + \sigma_2 + \sigma_3 + \cdots$
• The nuclear norm is convex; its minimization often achieves low rank
• Better iterative procedures exist
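As one illustration of an iterative procedure for the masked problem, the sketch below alternates between filling the unknown entries from the current estimate and truncating to rank 3. This simple heuristic is swapped in for illustration and is not necessarily the procedure the slides have in mind; it assumes NumPy, and all names and sizes are hypothetical.

```python
import numpy as np

def truncate_rank(X, r):
    """Best rank-r approximation of X in the Frobenius norm (via SVD)."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * sigma[:r]) @ Vt[:r]

def masked_error(X, M, W):
    """Frobenius norm of the elementwise product W * (X - M)."""
    return np.linalg.norm(W * (X - M))

def impute_low_rank(M, W, r=3, n_iter=200):
    """Alternate between filling unknown entries from X and truncating to rank r."""
    X = truncate_rank(W * M, r)               # start with unknown entries set to zero
    for _ in range(n_iter):
        filled = W * M + (1.0 - W) * X        # keep known entries, fill in the rest
        X = truncate_rank(filled, r)
    return X

rng = np.random.default_rng(2)
M_full = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 30))
W = (rng.random(M_full.shape) < 0.8).astype(float)    # roughly 80% of entries observed
M = W * M_full                                        # unknown entries stored as zero

err_start = masked_error(truncate_rank(W * M, 3), M, W)
X = impute_low_rank(M, W)
err_end = masked_error(X, M, W)
```

Each iteration cannot increase the masked error, but the scheme may converge slowly or stall in a local minimum when many entries are missing.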
Perspective multiview stereo
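• A point $P = (X, Y, Z)$ is projected to
$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}$$
(here $f$ denotes the focal length)
• A point rotated by $R$ and translated by $\mathbf{t}$ projects to
$$x = \frac{f(\mathbf{r}_1^T P + t_x)}{\mathbf{r}_3^T P + t_z}, \qquad y = \frac{f(\mathbf{r}_2^T P + t_y)}{\mathbf{r}_3^T P + t_z}$$
$\mathbf{r}_i^T$ denotes the rows of $R$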
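The two projection formulas can be checked numerically with a minimal sketch, assuming NumPy. The function name `perspective_project`, the parameter name `focal` and the sample points are hypothetical.

```python
import numpy as np

def perspective_project(P, R, t, focal):
    """Project 3D points P (3 x n) after rotating by R and translating by t."""
    Pc = R @ P + t[:, None]       # rows: r1^T P + t_x, r2^T P + t_y, r3^T P + t_z
    x = focal * Pc[0] / Pc[2]
    y = focal * Pc[1] / Pc[2]
    return x, y

P = np.array([[1.0, -2.0],
              [2.0,  0.5],
              [4.0,  5.0]])

# Identity rotation and zero translation reduce to x = fX/Z, y = fY/Z.
x, y = perspective_project(P, np.eye(3), np.zeros(3), focal=2.0)

# Moving the points 4 units further along the optical axis shrinks the image.
x_far, y_far = perspective_project(P, np.eye(3), np.array([0.0, 0.0, 4.0]), focal=2.0)
```

Unlike the orthographic model, the image coordinates now depend on depth, so the measurement matrix no longer factors into a product of two rank-3 matrices.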
Bundle adjustment
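• Given $p$ points in $f$ frames, $(x_{ij}, y_{ij})$, find camera matrices $C_i$ and positions $P_j$ that minimize
$$\sum_{i=1}^{f} \sum_{j=1}^{p} \left[ \left( \frac{f(\mathbf{r}_{i1}^T P_j + t_{ix})}{\mathbf{r}_{i3}^T P_j + t_{iz}} - x_{ij} \right)^2 + \left( \frac{f(\mathbf{r}_{i2}^T P_j + t_{iy})}{\mathbf{r}_{i3}^T P_j + t_{iz}} - y_{ij} \right)^2 \right]$$
where $\mathbf{r}_{ik}^T$ is the $k$-th row of $R_i$ and $(t_{ix}, t_{iy}, t_{iz})$ are the components of $\mathbf{t}_i$ (the $f$ inside the fractions is the focal length; the upper limit $f$ of the sum is the number of frames)
• Alternating optimization
• Given $R_i$ and $\mathbf{t}_i$, solve for $P_j$
• Given $P_j$, solve for $R_i$ and $\mathbf{t}_i$
• A very good initial guess is required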
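The bundle adjustment objective can be evaluated directly. The sketch below, assuming NumPy, only computes the cost for given cameras and points; it does not perform the optimization. The names `reprojection_cost` and `rot_y` and the synthetic scene are hypothetical.

```python
import numpy as np

def reprojection_cost(Rs, ts, P, obs_x, obs_y, focal):
    """Sum of squared reprojection errors over all frames and points.

    Rs: (f, 3, 3) rotations, ts: (f, 3) translations, P: (3, p) points,
    obs_x, obs_y: (f, p) observed image coordinates.
    """
    cost = 0.0
    for R, t, xs, ys in zip(Rs, ts, obs_x, obs_y):
        Pc = R @ P + t[:, None]                       # points in the camera frame
        cost += np.sum((focal * Pc[0] / Pc[2] - xs) ** 2)
        cost += np.sum((focal * Pc[1] / Pc[2] - ys) ** 2)
    return cost

def rot_y(theta):
    """Rotation by theta radians about the Y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical scene: 6 points seen from 3 slightly rotated cameras, 10 units away.
rng = np.random.default_rng(3)
P_true = rng.uniform(-1.0, 1.0, size=(3, 6))
Rs = np.stack([rot_y(a) for a in (0.0, 0.1, 0.2)])
ts = np.tile(np.array([0.0, 0.0, 10.0]), (3, 1))
focal = 1.5

cams = [R @ P_true + t[:, None] for R, t in zip(Rs, ts)]
obs_x = np.stack([focal * Pc[0] / Pc[2] for Pc in cams])
obs_y = np.stack([focal * Pc[1] / Pc[2] for Pc in cams])

cost_true = reprojection_cost(Rs, ts, P_true, obs_x, obs_y, focal)
cost_perturbed = reprojection_cost(Rs, ts, P_true + 0.05, obs_x, obs_y, focal)
```

In practice this cost would be handed to a non-linear least-squares solver, starting from the initial guess that the slide calls for.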
Bundler (Photo Tourism)
(Snavely et al.)
• Given images, identify feature points and describe them with SIFT descriptors
• Match SIFT descriptors; accept each match $p_i \leftrightarrow p_j$ whose score is at least twice that of any other match $p_i \leftrightarrow p_k$
• For every pair of images with sufficiently many matches, use RANSAC to recover essential matrices
• Starting with two images and adding one image at a time: use the essential matrix to recover depth and apply bundle adjustment
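The match-acceptance rule can be sketched as a ratio test on descriptor distances. This is one possible reading of the criterion, assuming NumPy and Euclidean distance as the (inverse) score; the function name `ratio_test_matches` and the 2D toy descriptors are hypothetical, and real SIFT descriptors are 128-dimensional.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=2.0):
    """Match rows of desc_a to rows of desc_b, keeping only unambiguous matches.

    A match i -> j is kept when the second-best distance is at least `ratio`
    times the best distance. desc_b needs at least two rows.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i, (best, second) in enumerate(order[:, :2]):
        if dists[i, second] >= ratio * dists[i, best]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors: the middle query is equally close to two candidates and is rejected.
desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_a = np.array([[0.1, 0.0], [5.0, 0.0], [0.0, 9.5]])
matches = ratio_test_matches(desc_a, desc_b)
```

Rejecting ambiguous matches up front reduces the number of outliers that RANSAC has to cope with in the next step.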
Simultaneous solutions
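• $E_{ij} = [\mathbf{t}_{ij}]_\times R_{ij}$: essential matrix between $I_i$ and $I_j$, $i, j = 1, \ldots, f$, available on a subset of image pairs
• Objective: recover camera orientation $R_i$ and location $\mathbf{t}_i$ relative to a global coordinate system
• First step: recover rotations:
$$\min_{\{R_i\}} \sum_{(i,j)} \| R_{ij} - R_i R_j^T \|_F^2$$
where the sum runs over the image pairs for which $R_{ij}$ is available
• This can be solved in various ways, for example
$$\min_{\{R_i\}} \sum_{(i,j)} \| R_{ij} R_j - R_i \|_F^2$$
which has a least squares solution if we ignore the orthonormality constraints for $R_i$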
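The relaxed rotation-recovery step can be sketched as a homogeneous linear system whose null space contains the stacked rotations, followed by a projection of each 3 × 3 block back onto a rotation. This is a minimal illustration, assuming NumPy and a connected set of image pairs; the function names and the choice to express everything relative to frame 0 are hypothetical.

```python
import numpy as np

def nearest_rotation(B):
    """Closest rotation matrix to B in the Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(B)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def recover_rotations(pairwise, f):
    """Solve R_ij R_j - R_i = 0 in the least-squares sense, then re-orthonormalize.

    pairwise: dict mapping (i, j) to the relative rotation R_ij = R_i R_j^T.
    Returns rotations expressed relative to frame 0 (the first one is I).
    """
    C = np.zeros((3 * len(pairwise), 3 * f))
    for k, ((i, j), R_ij) in enumerate(pairwise.items()):
        C[3 * k:3 * k + 3, 3 * j:3 * j + 3] = R_ij
        C[3 * k:3 * k + 3, 3 * i:3 * i + 3] = -np.eye(3)
    # The stacked rotations span the (approximate) 3-dimensional null space of C.
    _, _, Vt = np.linalg.svd(C)
    N = Vt[-3:].T                               # 3f x 3
    N = N @ np.linalg.inv(N[:3])                # fix the global frame: first block = I
    return [nearest_rotation(N[3 * i:3 * i + 3]) for i in range(f)]

# Synthetic check with 4 cameras and relative rotations on 5 of the 6 pairs.
rng = np.random.default_rng(5)
f = 4
R_true = []
for _ in range(f):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                         # make it a proper rotation
    R_true.append(Q)
pairs = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
pairwise = {(i, j): R_true[i] @ R_true[j].T for i, j in pairs}
R_est = recover_rotations(pairwise, f)
```

The estimates agree with the true rotations only up to one global rotation, which is why the check below compares relative rotations rather than the matrices themselves.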
Epipolar relation in global coordinates
• The epipolar line relation, $p^T E_{ij} q = 0$, can be written in a global coordinate system as follows
$$p^T R_i^T \left( [\mathbf{t}_i]_\times - [\mathbf{t}_j]_\times \right) R_j q = 0$$
• This generalizes the formula for the essential matrix (plug in $R_i = I$, $\mathbf{t}_i = \mathbf{0}$)
• Once camera orientations $R_i$ are known we can solve for camera locations (the equation is linear and homogeneous in the translation components)
• Solution suffers from shrinkage problems
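The special case mentioned above can be written out as a short check. Substituting $R_i = I$ and $\mathbf{t}_i = \mathbf{0}$ into the global relation gives:

```latex
p^T I^T \left( [\mathbf{0}]_\times - [\mathbf{t}_j]_\times \right) R_j \, q
  = -\, p^T [\mathbf{t}_j]_\times R_j \, q = 0
\quad\Longleftrightarrow\quad
p^T [\mathbf{t}_j]_\times R_j \, q = 0 .
```

This is the two-view relation with $E = [\mathbf{t}_j]_\times R_j$; the overall sign does not matter because the equation is homogeneous.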
Multiview reconstruction