correspondence, layered, active illumination

Stanford CS223B Computer Vision, Winter 2006
Stereo
Lecture 6
Stereo II
Professor Sebastian Thrun
CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Dense and Layered Stereo
Smoothing With Markov Random Fields
Sebastian Thrun
Stanford University
CS223B Computer Vision
A Last Word on Preprocessing….
Epipolar Rectified Images
Epipolar line
[Source: A. Fusiello, Verona, 2000]
Image Normalization



Even when the cameras are identical models, there can
be differences in gain and sensitivity.
The cameras do not see exactly the same surfaces, so
their overall light levels can differ.
For these reasons and more, it is a good idea to
normalize the pixels in each window:
Average pixel: $\bar{I} = \frac{1}{|W_m(x,y)|} \sum_{(u,v) \in W_m(x,y)} I(u,v)$
Window magnitude: $\| I \|_{W_m(x,y)} = \sqrt{\sum_{(u,v) \in W_m(x,y)} [I(u,v)]^2}$
Normalized pixel: $\hat{I}(x,y) = \frac{I(x,y) - \bar{I}}{\| I - \bar{I} \|_{W_m(x,y)}}$
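The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `normalize_window`, the row-major `image[v][u]` indexing, an odd window size `m` centered at (x, y), and the all-zero result for a textureless window are all assumptions made for this example.

```python
import math

def normalize_window(image, x, y, m):
    """Return the normalized m-by-m window centered at (x, y) as a flat list.

    Assumes image[v][u] indexing (row v, column u) and an odd window size m.
    """
    half = m // 2
    pixels = [image[v][u]
              for v in range(y - half, y + half + 1)
              for u in range(x - half, x + half + 1)]
    mean = sum(pixels) / len(pixels)                    # average pixel
    centered = [p - mean for p in pixels]
    magnitude = math.sqrt(sum(c * c for c in centered)) # magnitude of I - mean
    if magnitude == 0.0:
        # Textureless window: normalization is undefined, so return zeros.
        return [0.0] * len(pixels)
    return [c / magnitude for c in centered]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
# Same scene seen by a camera with a different gain (2x) and offset (+5).
brighter = [[2 * p + 5 for p in row] for row in image]

w1 = normalize_window(image, 1, 1, 3)
w2 = normalize_window(brighter, 1, 1, 3)
```

After normalization the two windows agree, which is the point of the step: differences in gain and overall light level between the cameras no longer affect the comparison.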
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Dense and Layered Stereo
Smoothing With Markov Random Fields
Correspondence
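[Figure: stereo pair with optical centers O1 and O2 (axes x, y, z), focal length f, image points p_{l,1} and p_{r,1}, scene point P1, and phantom points]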
Correspondence via Correlation
[Figure: rectified left and right images with a scanline; plot of SSD error against disparity]
(Same as max-correlation / max-cosine for normalized image patch)
Images as Vectors
[Figure: windows $w_L$ and $w_R$ in the left and right images]
Each window is a vector in an $m^2$-dimensional vector space.
Normalization makes them unit length.
Image Metrics
[Figure: window vectors $w_L$ and $w_R(d)$]
(Normalized) Sum of Squared Differences
$C_{SSD}(d) = \sum_{(u,v) \in W_m(x,y)} [\hat{I}_L(u,v) - \hat{I}_R(u-d,v)]^2 = \| w_L - w_R(d) \|^2$
Normalized Correlation
$C_{NC}(d) = \sum_{(u,v) \in W_m(x,y)} \hat{I}_L(u,v) \, \hat{I}_R(u-d,v) = w_L \cdot w_R(d) = \cos\varphi$
$d^* = \arg\min_d \| w_L - w_R(d) \|^2 = \arg\max_d w_L \cdot w_R(d)$
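The equivalence of the two metrics for normalized windows can be checked numerically. The sketch below works on 1-D scanlines for brevity; the helper names `unit_window` and `match_costs`, the synthetic scanline values, and the convention that the right-image window is centered at u - d are assumptions of this example.

```python
import math

def unit_window(scanline, center, half):
    """Zero-mean, unit-length window of width 2*half+1 around `center`."""
    w = scanline[center - half:center + half + 1]
    mean = sum(w) / len(w)
    c = [p - mean for p in w]
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c] if norm > 0 else [0.0] * len(c)

def match_costs(left, right, u, half, max_d):
    """Return lists (ssd, nc) indexed by disparity d = 0..max_d."""
    w_l = unit_window(left, u, half)
    ssd, nc = [], []
    for d in range(max_d + 1):
        w_r = unit_window(right, u - d, half)
        ssd.append(sum((a - b) ** 2 for a, b in zip(w_l, w_r)))
        nc.append(sum(a * b for a, b in zip(w_l, w_r)))
    return ssd, nc

# Synthetic scanlines: the left scanline is the right one shifted by 3 pixels,
# so left[u] == right[u - 3] wherever both are defined.
right = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0, 7, 3, 1, 8]
shift = 3
left = [0] * shift + right[:-shift]

ssd, nc = match_costs(left, right, u=8, half=2, max_d=5)
d_ssd = min(range(len(ssd)), key=ssd.__getitem__)  # arg min of SSD
d_nc = max(range(len(nc)), key=nc.__getitem__)     # arg max of correlation
```

For unit-length windows, $\| w_L - w_R(d) \|^2 = 2 - 2\, w_L \cdot w_R(d)$, so both searches select the same disparity; here both recover the shift of 3.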
Correspondence Using Correlation
Left
Disparity Map
Images courtesy of Point Grey Research
Correspondence By Features
LEFT IMAGE
line
corner
structure
RIGHT IMAGE
corner
line
structure

Search in the right image… the disparity (dx, dy) is the
displacement when the similarity measure is maximum
Stereo Correspondences
[Figure: left scanline and right scanline; pixel runs are labeled Match, Occlusion, and Disocclusion]
Search Over Correspondences
[Figure: search grid over the left scanline and right scanline, with occluded and disoccluded pixels marked]
Three cases:
– Sequential – cost of match
– Occluded – cost of no match
– Disoccluded – cost of no match
Stereo Matching with Dynamic Programming
[Figure: search grid over the left scanline and right scanline, with occluded and dis-occluded pixels and the terminal node marked]
Scan across the grid, computing the optimal cost for each node given its upper-left neighbors.
Backtrack from the terminal to get the optimal path.
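The grid scan and backtracking step can be sketched as follows. This is an illustrative version under stated assumptions, not the lecture's code: the match cost is taken to be the squared intensity difference, both no-match cases share one constant `occlusion_cost`, and the function name `dp_scanline` is hypothetical.

```python
def dp_scanline(left, right, occlusion_cost):
    """Match two scanlines; return (total cost, list of (i, j) matched pairs)."""
    n, m = len(left), len(right)
    INF = float("inf")
    # cost[i][j]: best cost of explaining left[:i] and right[:j].
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # sequential case: match left[i-1] with right[j-1]
                c = cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "match"
            if i > 0:            # no-match case: left pixel has no partner
                c = cost[i - 1][j] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_left"
            if j > 0:            # no-match case: right pixel has no partner
                c = cost[i][j - 1] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_right"
    # Backtrack from the terminal node (n, m) to recover the optimal path.
    matches, i, j = [], n, m
    while (i, j) != (0, 0):
        move = back[i][j]
        if move == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[n][m], matches

left = [10, 20, 30, 40, 50]
right = [20, 30, 40, 50, 60]
total, matches = dp_scanline(left, right, occlusion_cost=5)
```

Because every path through the grid moves only right, down, or diagonally, the returned matches are monotone in both indices, so they respect the ordering constraint by construction.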
[Figure: search grid over the left scanline and right scanline, with occluded and dis-occluded pixels and the start and end nodes marked]
Dynamic programming yields the optimal path through the grid. This is the best set of matches that satisfies the ordering constraint.
Dense Stereo Matching: Examples

input
View extrapolation results
depth image
novel view
[Matthies,Szeliski,Kanade’88]
Dense Stereo Matching

Some other view extrapolation results
input
depth image
novel view
Dense Stereo Matching

Compute certainty map from correlations
input
depth map
certainty map
DP for Correspondence


Does this always work?
When would it fail?
– Failure Example 1
– Failure Example 2
– Failure Example 3
Correspondence Problem 1

It is fundamentally ambiguous, even with stereo constraints.
[Figure from Forsyth & Ponce: the ordering constraint… and its failure]
Correspondence Problem 2

Correspondence fails for smooth surfaces

There is currently no good solution to the
correspondence problem
Correspondence Problem 3



Regions without texture
Highly Specular surfaces
Translucent objects
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Dense and Layered Stereo
Smoothing With Markov Random Fields
How Can We Improve Stereo?
Space-time stereo scanner uses unstructured light to aid in correspondence.
Result: dense 3D mesh (noisy)
Prof Marc Levoy @ Stanford
By James Davis,
Honda Research,
Now UCSC
rectified
Active Stereo (Structured Light)
Structured Light: 3-D Result
3D Snapshot
3D Model
By James Davis,
Honda Research
Time of Flight Sensor: Shutter
http://www.3dvsystems.com
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Layered Stereo
Smoothing With Markov Random Fields
Disclaimer

The Following Material Shall Not Be Required
For the Midterm Exam
Layered Stereo

Assign each pixel to one of several “layers” (objects, sprites)

Track each layer from frame to frame, compute its plane equation, and composite a mosaic

Re-compute the pixel assignment by comparing the original images to the sprites

Re-synthesize original or novel images from
collection of sprites

Advantages:
– can represent occluded regions
– can represent transparent and border (mixed) pixels
(sprites have alpha value per pixel)
– works on texture-less interior regions

Limitations:
– fails for high depth-complexity scenes
Fitting Planar Surfaces (with EM)
Expectation Maximization

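3D Model: $\theta = \{\theta_1, \theta_2, \ldots, \theta_J\}$
Planar surface in 3D: $\theta_j = \langle \alpha_j, \beta_j \rangle \in \mathbb{R}^3 \times \mathbb{R}$, with surface normal $\alpha_j$ and displacement $\beta_j$
Distance point-surface: $\mathrm{dist}(\theta_j, z_i) = \alpha_j \cdot z_i - \beta_j$
[Figure: planar surface in x, y, z coordinates with surface normal $\alpha$ and displacement $\beta$]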
Mixture Measurement Model

Case 1: Measurement $z_i$ caused by plane j
$p(z_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right\}$
Case 2: Measurement $z_i$ caused by something else
$p(z_i \mid \theta_*) = \frac{1}{z_{\max}} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \ln\frac{z_{\max}^2}{2\pi\sigma^2} \right\}$
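The rewriting of the uniform density in Case 2 as a Gaussian-style exponential can be verified in one line. The steps below use only the symbols $z_{\max}$ and $\sigma$ from the slide:

```latex
\frac{1}{\sqrt{2\pi\sigma^2}}
\exp\left\{ -\frac{1}{2} \ln\frac{z_{\max}^2}{2\pi\sigma^2} \right\}
= \frac{1}{\sqrt{2\pi\sigma^2}}
  \left( \frac{z_{\max}^2}{2\pi\sigma^2} \right)^{-1/2}
= \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \frac{\sqrt{2\pi\sigma^2}}{z_{\max}}
= \frac{1}{z_{\max}}
```

Writing both cases with the same prefactor is what allows them to be merged into a single exponential once correspondence variables are introduced.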
Measurement Model with Correspondences

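$p(z_i \mid \theta, c_1, \ldots, c_J, c_*) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ c_* \ln\frac{z_{\max}^2}{2\pi\sigma^2} + \sum_{j=1}^{J} c_j \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right] \right\}$
Correspondence variables C: $c_*, c_j \in \{0, 1\}$, with $c_* + \sum_{j=1}^{J} c_j = 1$
$\Rightarrow \; p(Z \mid \theta, C) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ c_{i*} \ln\frac{z_{\max}^2}{2\pi\sigma^2} + \sum_{j=1}^{J} c_{ij} \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right] \right\}$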
Expected Log-Likelihood Function
$p(Z \mid \theta, C) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ c_{i*} \ln\frac{z_{\max}^2}{2\pi\sigma^2} + \sum_{j=1}^{J} c_{ij} \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right] \right\}$
…after some simple math
$E_c[\ln p(Z, C \mid \theta)] = \sum_i \left[ \ln\frac{1}{(J+1)\sqrt{2\pi\sigma^2}} - \frac{1}{2} E[c_{i*}] \ln\frac{z_{\max}^2}{2\pi\sigma^2} - \frac{1}{2} \sum_{j=1}^{J} E[c_{ij}] \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right]$
[Slide annotations: probabilistic data association; mapping with known data association]
The EM Algorithm
Ec ln p( Z , C |  )

J
const   E[cij ]
i
j 1
( j  zi   j ) 2
2

E-step: given plane params, compute E[cij ]

M-step: given expectations, compute {a j ,  j }
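The two steps can be sketched in Python. To keep the example free of a linear-algebra library, it fits lines in 2-D rather than planes in 3-D, which is a simplification made here and not part of the lecture; the structure of the E-step and M-step is otherwise the same. The E-step weights follow from Bayes' rule under the measurement model with a uniform prior over correspondences. The M-step uses a weighted total-least-squares fit. The function name `em_fit_lines`, the data, and the initial guesses are hypothetical.

```python
import math

def em_fit_lines(points, lines, sigma, z_max, iterations=20):
    """EM for a mixture of J lines plus a uniform outlier term (2-D analog).

    points: list of (x, y); lines: initial (nx, ny, beta) with unit normal.
    """
    # Outlier term exp(-1/2 * ln(z_max^2 / (2 pi sigma^2))).
    outlier = math.exp(-0.5 * math.log(z_max ** 2 / (2 * math.pi * sigma ** 2)))
    for _ in range(iterations):
        # E-step: expected correspondences E[c_ij] given the line parameters.
        expect = []
        for (x, y) in points:
            g = [math.exp(-0.5 * ((nx * x + ny * y - b) / sigma) ** 2)
                 for (nx, ny, b) in lines]
            total = outlier + sum(g)
            expect.append([gj / total for gj in g])
        # M-step: weighted total-least-squares fit of each line.
        new_lines = []
        for j, old in enumerate(lines):
            w = [e[j] for e in expect]
            sw = sum(w)
            if sw < 1e-12:
                new_lines.append(old)  # no support: keep previous parameters
                continue
            mx = sum(wi * p[0] for wi, p in zip(w, points)) / sw
            my = sum(wi * p[1] for wi, p in zip(w, points)) / sw
            sxx = sum(wi * (p[0] - mx) ** 2 for wi, p in zip(w, points))
            syy = sum(wi * (p[1] - my) ** 2 for wi, p in zip(w, points))
            sxy = sum(wi * (p[0] - mx) * (p[1] - my) for wi, p in zip(w, points))
            phi = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of max spread
            nx, ny = -math.sin(phi), math.cos(phi)      # normal: min spread
            new_lines.append((nx, ny, nx * mx + ny * my))
        lines = new_lines
    return lines, expect

# Hypothetical data: two horizontal lines (y = 0 and y = 5) plus one outlier.
points = ([(float(x), 0.0) for x in range(10)]
          + [(float(x), 5.0) for x in range(10)]
          + [(3.0, 12.0)])
initial = [(0.0, 1.0, 1.0), (0.0, 1.0, 4.0)]
lines, expect = em_fit_lines(points, initial, sigma=0.5, z_max=20.0)
```

In this run the two displacements move from the initial guesses 1 and 4 to the true values 0 and 5, and the outlier at (3, 12) receives almost no weight from either line, so it is absorbed by the "something else" case.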
Choosing the “Right” Number of Planes: AIC
[Figure: fits with J = 0, 1, 2, 3, 4, 5 planes, with arrows labeled "increased data likelihood" and "increased prior probability"]
$\log p(J \mid d) = \text{const} + \log p(d \mid J) + \log p(J)$
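The trade-off in this formula can be illustrated with a small selection routine. Everything numeric here is hypothetical: the log-likelihood values are invented for illustration and are not results from the lecture, and the linear prior $\log p(J) = -\text{penalty} \cdot J$ is an assumption standing in for whatever complexity penalty is actually used.

```python
def select_num_planes(log_likelihoods, penalty_per_plane):
    """Return the J that maximizes log p(d|J) + log p(J), up to a constant.

    Assumes log p(J) = -penalty_per_plane * J (a hypothetical prior).
    """
    scores = {J: ll - penalty_per_plane * J
              for J, ll in log_likelihoods.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical log-likelihoods for J = 0..5; illustration only, not measured data.
log_lik = {0: -500.0, 1: -320.0, 2: -210.0, 3: -150.0, 4: -146.0, 5: -144.0}
best_J, scores = select_num_planes(log_lik, penalty_per_plane=10.0)
```

The likelihood keeps improving as planes are added, but beyond J = 3 the gain is smaller than the penalty, so the score peaks there.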
Determining Number of Surfaces
[Animation; recoverable step labels: first model component, E-step, M-step, add model components, E/M steps, prune model; J = 1, 2, 3]
Layered Stereo

Resulting sprite collection

Estimated depth map
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Dense and Layered Stereo
Smoothing With Markov Random Fields
Motivation and Goals
James Diebel
Network of Constraints (Markov Random Field)
Directions
Vertex Node
Edge Node
Face Node
James Diebel
MRF Approach to Smoothing

Potential function: contains a sensor-model term and a surface prior
$\Psi = \sum_i (x_i - x_{0i})^T \Sigma_i^{-1} (x_i - x_{0i}) + \sum_j \varphi_j (1 - n_1 \cdot n_2)$
The edge potential is important!
Minimize $\Psi$ by conjugate gradient
– Optimize systems with tens of thousands of parameters in just a couple of seconds
– Time to converge is O(N), between 0.7 sec (25,000 nodes in the MRF) and 25 sec (900,000 nodes)
Diebel/Thrun, 2006
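A much simplified version of this idea can be written out for a 1-D height profile. The sketch is not the Diebel/Thrun method: it swaps the normal-based edge potential for a quadratic penalty on neighboring differences, so the potential is quadratic and conjugate gradient reduces to solving a linear system. The names `apply_system` and `smooth_by_cg`, the weights, and the sample data are hypothetical.

```python
def apply_system(x, data_weight, smooth_weight):
    """Compute (data_weight * I + smooth_weight * L) x for a chain Laplacian L."""
    out = [data_weight * xi for xi in x]
    for j in range(len(x) - 1):  # one term per edge (j, j+1)
        diff = x[j + 1] - x[j]
        out[j] -= smooth_weight * diff
        out[j + 1] += smooth_weight * diff
    return out

def smooth_by_cg(measured, data_weight, smooth_weight, tol=1e-10):
    """Minimize dw * sum_i (x_i - x0_i)^2 + sw * sum_j (x_{j+1} - x_j)^2.

    Setting the gradient to zero gives (dw * I + sw * L) x = dw * x0,
    which is solved here by conjugate gradient.
    """
    b = [data_weight * z for z in measured]
    x = list(measured)  # start from the measurements
    r = [bi - ai for bi, ai in
         zip(b, apply_system(x, data_weight, smooth_weight))]
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(len(measured)):
        if rs < tol:  # squared residual norm small enough
            break
        ap = apply_system(p, data_weight, smooth_weight)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def roughness(x):
    """Sum of squared differences between neighbors (the smoothness term)."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))

# Hypothetical noisy profile with one spike.
measured = [0.0, 0.1, -0.1, 0.05, 1.0, 0.0, -0.05, 0.1]
smoothed = smooth_by_cg(measured, data_weight=1.0, smooth_weight=2.0)
```

A quadratic edge term like this one blurs the spike along with the noise, which is one way to see why the choice of edge potential matters.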
Possible Edge Potential Functions
Results: Smoothing
James Diebel
Movies…
Movies in Windows Media Player
Stereo Vision: Outline








Basic Equations
Epipolar Geometry
Image Rectification
Reconstruction
Correspondence
Active Range Imaging Technology
Dense and Layered Stereo
Smoothing With Markov Random Fields