Epipolar lines
[Figure: camera centers O and O′ joined by the baseline; the epipolar plane cuts each image plane along an epipolar line.]
$p'^{T} E\, p = 0$
Rectification
• Rectification: rotation and scaling of each
camera’s coordinate frame to make the
epipolar lines horizontal and equi-height,
by bringing the two image planes to be
parallel to the baseline
• Rectification is achieved by applying a
homography to each of the two images
Rectification
[Figure: homographies $H_l$ and $H_r$ applied to the two images; camera centers O and O′ lie on the baseline.]
$q'^{T} H_l^{-T} E\, H_r^{-1} q = 0$
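The rectified constraint follows from the original epipolar constraint by substitution. The sketch below assumes, as the formula implies, that the rectified points are $q = H_r p$ and $q' = H_l p'$ (these assignments are inferred from the equation, not stated on the slide):

```latex
\begin{align*}
q = H_r\,p,\quad q' = H_l\,p'
  \;&\Longrightarrow\; p = H_r^{-1} q,\quad p' = H_l^{-1} q' \\
p'^{T} E\, p = 0
  \;&\Longrightarrow\; \left(H_l^{-1} q'\right)^{T} E \left(H_r^{-1} q\right)
  = q'^{T} H_l^{-T} E\, H_r^{-1} q = 0
\end{align*}
```

Because the constraint is homogeneous, it is unaffected by the unknown scale of the homogeneous points.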
Cyclopean coordinates
• In a rectified stereo rig with baseline of length $b$,
we place the origin at the midpoint between the
camera centers.
• A point $(X, Y, Z)$ is projected to:
– Left image: $x_l = \frac{f(X - b/2)}{Z}$, $y_l = \frac{fY}{Z}$
– Right image: $x_r = \frac{f(X + b/2)}{Z}$, $y_r = \frac{fY}{Z}$
• Cyclopean coordinates:
$X = \frac{b(x_r + x_l)}{2(x_r - x_l)}$, $Y = \frac{b(y_r + y_l)}{2(x_r - x_l)}$, $Z = \frac{fb}{x_r - x_l}$
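As a sanity check, the projection and the cyclopean formulas can be written as a round trip. This is a minimal sketch: the function names `project` and `cyclopean` are hypothetical, and the sign convention is the one used in the formulas above.

```python
def project(X, Y, Z, f, b):
    """Project a 3-D point (cyclopean frame) into the left and right images."""
    left = (f * (X - b / 2) / Z, f * Y / Z)
    right = (f * (X + b / 2) / Z, f * Y / Z)
    return left, right


def cyclopean(left, right, f, b):
    """Recover (X, Y, Z) from a pair of corresponding image points."""
    (xl, yl), (xr, yr) = left, right
    d = xr - xl  # disparity
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    X = b * (xr + xl) / (2 * d)
    Y = b * (yr + yl) / (2 * d)
    Z = f * b / d
    return X, Y, Z


# Round trip: project a point, then reconstruct it.
left, right = project(0.3, -0.2, 4.0, f=500.0, b=0.1)
X, Y, Z = cyclopean(left, right, f=500.0, b=0.1)
```

Up to floating-point rounding, the reconstructed point equals the original one.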
Disparity
$x_r - x_l = \frac{fb}{Z}$
• Disparity is inversely proportional to depth
• Constant disparity ⟺ constant depth
• Larger baseline, more stable reconstruction of depth
(but more occlusions, correspondence is harder)
(Note that disparity is defined in a rectified rig in a
cyclopean coordinate frame)
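The remark about baseline length can be made concrete with a first-order error estimate. The symbol $d$ for disparity and $\Delta d$ for a small matching error are introduced here for illustration only:

```latex
\begin{align*}
d &= x_r - x_l = \frac{fb}{Z}
  \quad\Longrightarrow\quad Z = \frac{fb}{d} \\
\frac{\partial Z}{\partial d} &= -\frac{fb}{d^{2}} = -\frac{Z^{2}}{fb}
  \quad\Longrightarrow\quad |\Delta Z| \approx \frac{Z^{2}}{fb}\,|\Delta d|
\end{align*}
```

For a fixed disparity error, the depth error grows quadratically with depth and shrinks as the baseline $b$ (or the focal length $f$) grows.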
Random dot stereogram
• Depth can be perceived from a random dot pair of
images (Julesz)
• Stereo perception is based solely on local information
(low level)
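A random dot pair of this kind can be generated in a few lines. The following is a minimal sketch, not Julesz's original procedure: the function name `random_dot_pair`, the rectangular hidden region, and the way the uncovered strip is refilled are all illustrative assumptions.

```python
import random


def random_dot_pair(width, height, square, shift, seed=0):
    """Return (left, right) binary images as lists of rows.

    square = (row0, row1, col0, col1), half-open ranges.
    The square is displaced by `shift` pixels in the right image.
    """
    row0, row1, col0, col1 = square
    if shift <= 0 or col1 + shift > width:
        raise ValueError("shift must be positive and keep the square inside the image")
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    for r in range(row0, row1):
        # Copy the square to its displaced position.
        for c in range(col0, col1):
            right[r][c + shift] = left[r][c]
        # Fill the strip uncovered by the displacement with fresh dots.
        for c in range(col0, col0 + shift):
            right[r][c] = rng.randint(0, 1)
    return left, right


left, right = random_dot_pair(32, 16, square=(4, 12, 8, 20), shift=3, seed=1)
```

Neither image alone contains a visible square; it appears only as a region of constant disparity between the two.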
Moving random dots
Compared elements
• Pixel intensities
• Pixel color
• Small window (e.g. 3 × 3 or 5 × 5), often
using normalized correlation to offset gain
• Features and edges (less common)
• Mini segments
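Normalized correlation between two small windows can be sketched as follows. This is the zero-mean variant, which is an assumption beyond the slide: it compensates for an additive bias as well as for gain. The function name `ncc` and the flattened-list window format are illustrative.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size windows.

    Windows are given as flat lists of intensities (e.g. 9 values for 3 x 3).
    The result lies in [-1, 1]; 1 means a perfect match up to gain and bias.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat windows carry no signal


window = [1, 2, 3, 4, 5, 6, 7, 8, 10]
brighter = [2 * v + 5 for v in window]  # same pattern, different gain and bias
score = ncc(window, brighter)
```

A window and a rescaled copy of it score essentially 1, which is why the measure is robust to gain differences between the two cameras.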
Dynamic programming
• Each pair of epipolar lines is compared
independently
• Local cost, sum of unary term and binary term
– Unary term: cost of a single match
– Binary term: cost of change of disparity
(occlusion)
• Analogous to string matching (‘diff’ in Unix)
String matching
• Swing → String
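[Figure: alignment grid running from Start to End, with the letters of "String" along one axis and the letters of "Swing" along the other.]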
String matching
• Cost: #substitutions + #insertions + #deletions
[Figure: the "String" / "Swing" alignment grid.]
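The cost above is the classic edit distance, which can be computed by dynamic programming over the same grid. A minimal sketch, with unit cost for every operation (the function name `edit_distance` is illustrative):

```python
def edit_distance(src, dst):
    """Minimum number of substitutions, insertions and deletions."""
    n, m = len(src), len(dst)
    # T[i][j] = cost of turning src[:i] into dst[:j]
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        T[i][0] = i  # delete everything
    for j in range(m + 1):
        T[0][j] = j  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if src[i - 1] == dst[j - 1] else 1
            T[i][j] = min(
                T[i - 1][j - 1] + substitution,  # diagonal move
                T[i - 1][j] + 1,                 # deletion
                T[i][j - 1] + 1,                 # insertion
            )
    return T[n][m]


print(edit_distance("Swing", "String"))  # 2: substitute w -> t, insert r
```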
Dynamic Programming
• Shortest path in a grid
• Diagonals: constant disparity
• Moving along the diagonal – pay unary cost
(cost of pixel match)
• Move sideways – pay binary cost, i.e. disparity
change (occlusion, right or left)
• Cost prefers fronto-parallel planes. Penalty is
paid for tilted planes
Dynamic Programming
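[Figure: matching grid with the path beginning at the Start corner.]
$T_{ij} = \min\big(T_{i-1,j} + C_{i-1,j \to i,j},\; T_{i-1,j-1} + C_{i-1,j-1 \to i,j},\; T_{i,j-1} + C_{i,j-1 \to i,j}\big)$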
Complexity?
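Each grid cell is filled in constant time from its three predecessors, so matching two scanlines of lengths n and m costs O(nm) time (less if the search is restricted to a band of allowed disparities). The sketch below illustrates the recurrence on one pair of scanlines. The absolute intensity difference as unary cost, the constant occlusion cost, and the name `scanline_dp` are assumptions made for illustration.

```python
def scanline_dp(left, right, occlusion_cost=1.0):
    """Match two scanlines; return (total cost, list of matched index pairs)."""
    n, m = len(left), len(right)
    INF = float("inf")
    T = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    T[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # diagonal move: match pixel i-1 with pixel j-1
                c = T[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < T[i][j]:
                    T[i][j], back[i][j] = c, "match"
            if i > 0:  # left pixel has no partner (occluded in the right image)
                c = T[i - 1][j] + occlusion_cost
                if c < T[i][j]:
                    T[i][j], back[i][j] = c, "skip_left"
            if j > 0:  # right pixel has no partner (occluded in the left image)
                c = T[i][j - 1] + occlusion_cost
                if c < T[i][j]:
                    T[i][j], back[i][j] = c, "skip_right"
    # Trace the optimal path back from the end corner.
    matches, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return T[n][m], matches


cost, matches = scanline_dp([10, 20, 30, 40], [20, 30, 40, 50], occlusion_cost=5.0)
```

In this toy input the right scanline is the left one shifted by one pixel, so the cheapest path pays two occlusions and matches the remaining pixels at a constant disparity of one.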
Probability interpretation: Viterbi algorithm
• Markov chain
• States: discrete set of disparities
$P(d_1, \dots, d_n) = P_1(d_1) \prod_{i=2}^{n} P_i(d_i)\, P_{i-1,i}(d_{i-1}, d_i)$
• Log probabilities: product ⟹ sum
$-\log P(d_1, \dots, d_n) = -\log P_1(d_1) - \sum_{i=2}^{n} \big(\log P_i(d_i) + \log P_{i-1,i}(d_{i-1}, d_i)\big)$
• Maximum likelihood: minimize sum of negative logs
• Viterbi algorithm: equivalent to shortest path
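The shortest-path view can be sketched directly on negative log probabilities. In this minimal example, `unary[i][d]` stands for $-\log P_i(d)$ and `pairwise(a, b)` for $-\log P_{i-1,i}(a, b)$. Both names, and the use of a single pairwise function for all positions, are illustrative assumptions.

```python
def viterbi(unary, pairwise):
    """Return (minimum total cost, optimal state sequence) for a chain."""
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])  # best cost of any path ending in each state
    back = []              # back[i-1][d] = best predecessor of state d at step i
    for i in range(1, n):
        new_cost, ptr = [], []
        for d in range(k):
            best_prev = min(range(k), key=lambda a: cost[a] + pairwise(a, d))
            new_cost.append(cost[best_prev] + pairwise(best_prev, d) + unary[i][d])
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)
    # Pick the cheapest final state and follow the back pointers.
    d = min(range(k), key=lambda a: cost[a])
    total, path = cost[d], [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    path.reverse()
    return total, path


unary = [[0, 5], [5, 0], [0, 5]]
weak = viterbi(unary, lambda a, b: 1 * abs(a - b))     # (2, [0, 1, 0])
strong = viterbi(unary, lambda a, b: 10 * abs(a - b))  # (5, [0, 0, 0])
```

With a weak transition penalty the path follows the data; with a strong one it prefers constant disparity. The running time is O(n k²) for n pixels and k disparity states.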
Dynamic Programming: Pros and Cons
• Advantages:
– Simple, efficient
– Achieves global optimum
– Generally works well
• Disadvantages:
– Works separately on each epipolar line,
does not enforce smoothness across epipolar lines
– Prefers fronto-parallel planes
– Too local? (considers only immediate neighbors)
Markov Random Field
• Graph $G = (V, E)$. In our case the graph is a 4-connected grid representing one image
• States: disparity
• Minimize energy of the form
$E(\mathcal{D}) = \sum_{(p,q) \in E} V_{p,q}(d_p, d_q) + \sum_{p \in V} D_p(d_p)$
• Interpreted as negative log probabilities
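Evaluating this energy on a 4-connected grid is a direct double sum. A minimal sketch, assuming the same pairwise term for every edge; the names `mrf_energy`, `data_cost` and `smooth_cost` are illustrative:

```python
def mrf_energy(labels, data_cost, smooth_cost):
    """Energy of a labeling on a 4-connected grid.

    labels[r][c]       : disparity assigned to pixel (r, c)
    data_cost(r, c, d) : unary term D_p(d)
    smooth_cost(a, b)  : pairwise term V(a, b), shared by all edges
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for r in range(h):
        for c in range(w):
            energy += data_cost(r, c, labels[r][c])
            # Count each edge once: right and down neighbors only.
            if c + 1 < w:
                energy += smooth_cost(labels[r][c], labels[r][c + 1])
            if r + 1 < h:
                energy += smooth_cost(labels[r][c], labels[r + 1][c])
    return energy


potts = lambda a, b: 0 if a == b else 1
labels = [[0, 0], [0, 1]]
e = mrf_energy(labels, lambda r, c, d: float(d), potts)  # 1 unary + 2 edges = 3.0
```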
Iterated Conditional Modes (ICM)
• Initialize states (= disparities) for every pixel
• Update repeatedly each pixel by the most likely
disparity given the values assigned to its neighbors:
$\min_{d_p} \sum_{q \in \mathcal{N}(p)} V_{p,q}(d_p, d_q) + D_p(d_p)$
• Markov blanket: the state of a pixel only depends on
the states of its immediate neighbors
• Similar to Gauss-Seidel iterations
• Slow convergence to (often bad) local minimum
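The ICM update can be sketched as repeated raster-order sweeps over the grid. This is a minimal illustration: the name `icm`, the table layout `data_cost[r][c][d]`, and the stopping rule (stop after a sweep with no change) are assumptions.

```python
def icm(data_cost, smooth_cost, init, num_labels, max_sweeps=20):
    """Iterated Conditional Modes on a 4-connected grid."""
    h, w = len(init), len(init[0])
    labels = [row[:] for row in init]  # do not modify the caller's labeling
    for _ in range(max_sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                nbrs = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < h and 0 <= cc < w
                ]

                def local(d):
                    # Energy terms that involve pixel (r, c) only.
                    return data_cost[r][c][d] + sum(smooth_cost(d, q) for q in nbrs)

                best = min(range(num_labels), key=local)
                if local(best) < local(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


potts = lambda a, b: 0 if a == b else 1
data = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
data[1][1] = [0.0, 0.5]  # the center weakly prefers label 0
init = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
result = icm(data, potts, init, num_labels=2)
```

Each update uses the most recent values of the neighbors, which is the sense in which the scheme resembles Gauss-Seidel iterations. In the toy example the isolated center pixel is pulled to the label of its four neighbors.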
Graph cuts: expansion moves
• Assume $D(x)$ is non-negative and $V(x, y)$ is a
metric:
– $V(x, x) = 0$
– $V(x, y) = V(y, x)$
– $V(x, y) \le V(x, z) + V(z, y)$
• We can apply more semi-global moves using
minimal s-t cuts
• Converges faster to a better (local) minimum
α-Expansion
• In any one round, an expansion move allows each
pixel to either
– change its state to α, or
– maintain its previous state
Each round is implemented via max flow/min cut
• One iteration: apply expansion moves sequentially
with all possible disparity values
• Repeat till convergence
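The structure of the iteration can be illustrated on a tiny 1D chain. Note the substitution: in the sketch below each expansion move is solved by brute-force enumeration of all keep/switch choices instead of max flow/min cut, so it is only usable for a handful of pixels. All function names are illustrative.

```python
from itertools import product


def energy(labels, data_cost, V):
    """Energy of a labeling on a 1D chain of pixels."""
    e = sum(data_cost[p][d] for p, d in enumerate(labels))
    e += sum(V(labels[p], labels[p + 1]) for p in range(len(labels) - 1))
    return e


def expansion_move(labels, alpha, data_cost, V):
    """Best labeling in which every pixel keeps its state or switches to alpha.

    Brute force over 2^n choices; a real implementation would use min cut.
    """
    best, best_e = list(labels), energy(labels, data_cost, V)
    for switch in product((False, True), repeat=len(labels)):
        cand = [alpha if s else d for s, d in zip(switch, labels)]
        e = energy(cand, data_cost, V)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def alpha_expansion(init, num_labels, data_cost, V):
    labels, e = list(init), energy(init, data_cost, V)
    improved = True
    while improved:  # one pass of this loop = one iteration over all labels
        improved = False
        for alpha in range(num_labels):
            new_labels, new_e = expansion_move(labels, alpha, data_cost, V)
            if new_e < e:
                labels, e, improved = new_labels, new_e, True
    return labels, e


data = [[0, 3, 3], [3, 0, 3], [0, 3, 3], [3, 3, 0], [3, 3, 0]]
V = lambda a, b: min(abs(a - b), 2)  # truncated l1, a metric
labels, e = alpha_expansion([0, 0, 0, 0, 0], 3, data, V)
```

The loop stops when no expansion move lowers the energy, so the energy never increases from one round to the next.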
α-Expansion
• Every round achieves a globally optimal solution over
one expansion move
• Energy is monotonically non-increasing
between rounds
• At convergence the energy is optimal with respect to all
expansion moves, and within a scale factor of the
global optimum:
$E(\mathcal{D}_{expansion}) \le 2c\,E(\mathcal{D}^*)$, where $c = \frac{\max_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}{\min_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}$
α-Expansion (1D example)
[Figure sequence: neighboring pixels p and q, with current states $d_p$ and $d_q$, in the min-cut graph for an expansion move to α. Each slide highlights the edges severed by one cut:]
– Both pixels change to α: $D_p(\alpha)$, $D_q(\alpha)$, with $V_{pq}(\alpha, \alpha) = 0$
– Both pixels keep their states: $D_p(d_p)$, $D_q(d_q)$. But what about $V_{pq}(d_p, d_q)$?
– Both pixels keep their states, with the pairwise edge included: $V_{pq}(d_p, d_q)$, $D_p(d_p)$, $D_q(d_q)$
– p keeps $d_p$, q changes to α: $D_p(d_p)$, $D_q(\alpha)$, $V_{pq}(d_p, \alpha)$
– p changes to α, q keeps $d_q$: $D_p(\alpha)$, $D_q(d_q)$, $V_{pq}(\alpha, d_q)$
α-Expansion (1D example)
[Figure: a cut crossing all three edges $V_{pq}(d_p, \alpha)$, $V_{pq}(\alpha, d_q)$ and $V_{pq}(d_p, d_q)$.]
Such a cut cannot be obtained due to the triangle inequality:
$V_{pq}(\alpha, d_q) \le V_{pq}(d_p, d_q) + V_{pq}(d_p, \alpha)$
Common Metrics
• Potts model: $V(x, y) = 0$ if $x = y$, and $V(x, y) = 1$ if $x \ne y$
• $V(x, y) = |x - y|$
• $V(x, y) = \|x - y\|_2$
• Truncated $\ell_1$: $V(x, y) = |x - y|$ if $|x - y| < T$, and $V(x, y) = T$ otherwise
• Truncated squared difference is not a metric
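The metric conditions can be checked exhaustively on a small label set. A minimal sketch: the helper `is_metric`, the label range 0..5 and the threshold `T = 4` are illustrative choices.

```python
from itertools import product


def is_metric(V, labels):
    """Check V(x,x) = 0, symmetry and the triangle inequality on all triples."""
    for x, y, z in product(labels, repeat=3):
        if V(x, x) != 0 or V(x, y) != V(y, x):
            return False
        if V(x, y) > V(x, z) + V(z, y):
            return False
    return True


T = 4
labels = range(6)
potts = lambda x, y: 0 if x == y else 1
truncated_l1 = lambda x, y: min(abs(x - y), T)
truncated_sq = lambda x, y: min((x - y) ** 2, T)

print(is_metric(potts, labels))         # True
print(is_metric(truncated_l1, labels))  # True
print(is_metric(truncated_sq, labels))  # False: V(0,2) = 4 > V(0,1) + V(1,2) = 2
```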
Reconstruction with graph-cuts
[Figure: three panels showing the original image, the graph-cut result, and the ground truth.]
A different application: detect skyline
• Input: one image, oriented with sky above
• Objective: find the skyline in the image
• Graph: grid
• Two states: sky, ground
• Unary (data) term:
– State = sky, low if blue, otherwise high
– State = ground, high if blue, otherwise low
• Binary term for vertical connections:
– If the state of a node is sky, the node above should also be sky (set to
infinity if not)
– If the state of a node is ground, the node below should also be ground
• Solve with expansion move. This is a binary (two state) problem,
and so graph cut can find the global optimum in one expansion
move
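If only the unary term and the hard vertical constraint listed above are used, with no horizontal coupling, every column can be solved independently: the constraint forces each column to be sky above a single boundary row and ground below it. The sketch below uses this shortcut instead of a graph cut, so it illustrates the energy and not the min-cut solver. The per-pixel `blueness` score in [0, 1], the costs `1 - blueness` for sky and `blueness` for ground, and the name `skyline_rows` are all assumptions.

```python
def skyline_rows(blueness):
    """For each column, return the number of sky rows above the skyline.

    blueness[r][c] in [0, 1]; row 0 is the top of the image.
    Unary costs (assumed): sky costs 1 - blueness, ground costs blueness.
    """
    h, w = len(blueness), len(blueness[0])
    result = []
    for c in range(w):
        # Boundary k = 0: every pixel in the column is labeled ground.
        cost = sum(blueness[r][c] for r in range(h))
        best_k, best_cost = 0, cost
        for k in range(1, h + 1):
            b = blueness[k - 1][c]
            cost += (1.0 - b) - b  # pixel k-1 flips from ground to sky
            if cost < best_cost:
                best_k, best_cost = k, cost
        result.append(best_k)
    return result


image = [
    [0.9, 0.9],
    [0.8, 0.1],
    [0.1, 0.1],
    [0.2, 0.1],
]
print(skyline_rows(image))  # [2, 1]
```

Adding horizontal smoothness terms would couple the columns, and the per-column shortcut would no longer apply; that is the setting in which the graph-cut formulation is needed.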