Stereo

advertisement
3-D Computer Vision
CSc 83029
Stereo
CSc83029 3-D Computer Vision / Ioannis Stamos
Stereopsis
 Recovering 3D information (depth) from
two images.




The correspondence problem.
The reconstruction problem.
Epipolar constraint.
The 8-point algorithm.
CSc83029 3-D Computer Vision / Ioannis Stamos
The 2 problems of Stereo
The setting: Simultaneous acquisition of 2 images
(left, right) of a static scene.
 Correspondence: Which parts of the left and
right images are projections of the same scene
element?
 Reconstruction: Given:
 A number of corresponding points between the left
and right image,
 Information on the geometry of the stereo system,
Find:
3-D structure of observed objects.
CSc83029 3-D Computer Vision / Ioannis Stamos
Stereo Vision
depth map
CSc83029 3-D Computer Vision / Ioannis Stamos
A simple stereo system
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
Z
cl
cr
f
X-axis
Or
Ol
Left Camera
T
Right Camera
CSc83029 3-D Computer Vision / Ioannis Stamos
Triangulation
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
Z
cl
pl
X-axis
Ol
cr
pr
f
Or
T
Left Camera
Right Camera
Calibrated Cameras
CSc83029 3-D Computer Vision / Ioannis Stamos
Triangulation
cl
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
xl
xr
Z
pl
X-axis
cr
pr
f
Or
Ol
T
Left Camera
Right Camera
Calibrated Cameras
Similar triangles:
T  xl  xr T
T
Z f

Z
Z  f
d
, d  xl  xr
d:disparity (difference in retinal positions).
T:baseline.
Depth (Z) is inversely proportional to d (fixation at infinity)
Triangulation
cl
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
xl
xr
Z
pl
X-axis
cr
pr
f
Or
Ol
T
Left Camera
Right Camera
Calibrated Cameras
Similar triangles:
T  xl  xr T
T
Z f

Z
Z  f
d
, d  xl  xr
d:disparity (difference in retinal positions).
T:baseline.
Baseline T: accuracy/robustness of depth calculation.
Triangulation
cl
xl
pl
X-axis
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
Z xr cr
pr
f
Or
Ol
T
Left Camera
Right Camera
Calibrated Cameras
Similar triangles:
T  xl  xr T
T
Z f

Z
Z  f
d:disparity (difference in retinal positions).
T:baseline.
Small baselines: less accurate measurements.
d
, d  xl  xr
Triangulation
cl
xl
pl
X-axis
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
Z xr cr
pr
f
Or
Ol
T
Left Camera
Right Camera
Calibrated Cameras
Similar triangles:
T  xl  xr T
T
Z f

Z
Z  f
d:disparity (difference in retinal positions).
T:baseline.
Large baselines: occlusions/foreshortening.
d
, d  xl  xr
Parameters of Stereo System
Fixation
Point:Infinity.
Parallel
optical axes.
P
Z-axis
xl
cl
Z
pl
X-axis
cr
pr
f
Or
Ol
Left Camera
xr
T
Right Camera
1) Intrinsic parameters (i.e. f, cl, cr)
2) Extrinsic parameters: relative position and
orientation of the 2 cameras.
STEREO CALIBRATION PROBLEM
CSc83029 3-D Computer Vision / Ioannis Stamos
Stereo – Photometric Constraint
• Same world point has same intensity in both images.
• Lambertian fronto-parallel
• Issues (noise, specularities, foreshortening)
From Jana Kosecka
• Difficulties – ambiguities, large changes of appearance, due to change
Of viewpoint, non-uniquess
Correspondence Is Difficult
Ambiguity: there may be many possible 3D reconstructions.
CSc83029 3-D Computer Vision / Ioannis Stamos
Correspondence Is Difficult
No texture: difficult to find a unique match.
CSc83029 3-D Computer Vision / Ioannis Stamos
Correspondence Is Difficult
Foreshortening: the projection in each image is different.
CSc83029 3-D Computer Vision / Ioannis Stamos
Correspondence Is Difficult
Occlusions: there may not be a correspondence.
Assumptions: 1) Most scene points are visible from both views.
2) Corresponding image regions are similar.
Correspondence Is Difficult
Curved surfaces: triangulation produces incorrect position.
CSc83029 3-D Computer Vision / Ioannis Stamos
Correspondence is difficult: The Ordering
Constraint
Points appear in the same order
But it is not always the case ...
CSc83029 3-D Computer Vision / Ioannis Stamos
More Correspondence Problems
 Regions without texture
 Highly Specular surfaces
 Translucent objects
CSc83029 3-D Computer Vision / Ioannis Stamos
Methods For Correspondence
 Correlation based (dense correspondences).
 Feature based (such as edges/lines/corners).
CSc83029 3-D Computer Vision / Ioannis Stamos
Correlation-Based Methods
R(pl )
pl
Left Image
Right Image
1) For each pixel pl in the left image search in a region R(pl) in the
right image for corresponding pixel pr.
2) Use image windows of size (2W+1)x(2W+1).
3) Select the pixel pr that maximizes a correlation function.
HAVE TO SPECIFY: Region R, size W, and correlation function ψ.
Correlation-Based Methods
R(pl )
pl
Left Image
Right Image
For each pixel pl=[i,j] in the left image
For each displacement d=[d1,d2] in R(pl)
W
W
Compute c(d) 
  ( I l (i  k , j  l ), Ir (i  k  d1, j  l  d 2 ))
k  W l  W
The disparity of pl is the d that maximizes c(d)
HAVE TO SPECIFY: Region R, size W, and correlation function ψ.
Correlation-Based Methods
R(pl )
pl
Left Image
 (u, v )  uv
 ( u, v )  ( u  v ) 2
Right Image
CROSS-CORRELATION
SUM OF SQUARED DIFFERENCES
SSD
SSD is usually preferred: handles different intensity scales.
Normalized cross-correlation is better (but is more expensive).
Correspondence Is Difficult
Intensities in window may differ.
Normalized cross-correlation may help.
CSc83029 3-D Computer Vision / Ioannis Stamos
Image Normalization
 Even when the cameras are identical models,
there can be differences in gain and sensitivity.
 The cameras do not see exactly the same
surfaces, so their overall light levels can differ.
 For these reasons and more, it is a good idea to
normalize the pixels in each window:
I
I
1
Wm ( x , y )
Wm ( x , y )

 I (u, v)
Average pixel
( u ,v )Wm ( x , y )
2
[
I
(
u
,
v
)]

Window magnitude
( u ,v )Wm ( x , y )
ˆI ( x, y )  I ( x, y )  I
I  I W ( x, y )
Normalized pixel
m
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Comparing Windows:
Minimize
Sum of Squared
Differences
Maximize
Cross correlation
It is closely related to the SSD:
From Jana Kosecka
CSc83029 3-D Computer Vision / Ioannis Stamos
Region based Similarity Metrics
• Sum of squared differences
• Normalize cross-correlation
• Sum of absolute differences
From Jana Kosecka
CSc83029 3-D Computer Vision / Ioannis Stamos
NCC score for two widely
separated views
NCC score
From Jana Kosecka
CSc83029 3-D Computer Vision / Ioannis Stamos
Window size
W=3
W = 20
 Effect of window size
Better results with adaptive window
•
•
T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive
Window: Theory and Experiment,, Proc. International Conference on Robotics and
Automation, 1991.
D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion.
International Journal of Computer Vision, 28(2):155-174, July 1998
CSc83029 3-D Computer Vision / Ioannis Stamos
(S. Seitz)
Stereo results
 Data from University of Tsukuba
Scene
Ground truth
CSc83029 3-D Computer Vision / Ioannis Stamos
(Seitz)
Results with window correlation
Window-based matching
(best window size)
Ground truth
CSc83029 3-D Computer Vision / Ioannis Stamos
(Seitz)
Results with better method
State of the art
Ground truth
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
International Conference on Computer Vision, September 1999.
CSc83029 3-D Computer Vision / Ioannis Stamos
(Seitz)
Feature-Based Methods
Left Image
Right Image
Match sparse sets of extracted features.
A feature descriptor for a line could contain:
length l, orientation o, midpoint (x,y), average contrast c
An example similarity measure (w’s are weights):
S
1
2 3-D Computer Vision2/ Ioannis Stamos 2
w0 (ll  lr ) 2  w1 (CSc83029
1   2 )  w2 ( ml  mr )  w3 ( cl  cr )
Correspondence Using
Correlation
Left
Disparity Map
Images courtesy of Point Grey Research
CSc83029 3-D Computer Vision / Ioannis Stamos
Correspondence By Features
LEFT IMAGE
corner
line
structure
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Correspondence By Features
RIGHT IMAGE
corner
line
structure
 Search in the right image… the disparity (dx, dy) is the
displacement when the similarity measure is maximum
From Sebastian Thrun/Jana Kosecka
Comparison of Matching Methods
Correlation-Based
Feature-Based
 Dense depth maps.
 Need textured images
 Sensitive to
 Sparse depth maps.
 Insensitive to
foreshorening/illumination
changes
 Need close views
illumination changes.
 A-priori info used.
 Faster.
Problems: occlusions/spurious matches:
=>Introduce constraints in matching
(i.e. left-right consistency constraint)
CSc83029 3-D Computer Vision / Ioannis Stamos
Epipolar Constraint (Geometry)
P
Pl
EPIPOLAR
LINE
Image plane
πl
Scene point
Pr
EPIPOLAR
LINE
EPIPOLAR
PLANE
pr
pl
el
Ol
er
Image plane
πr
Or
Center of projection
Center of projection
Epipoles
CSc83029 3-D Computer Vision / Ioannis Stamos
Epipolar Constraint
P
Pl
EPIPOLAR
LINE
Image plane
πl
Scene point
Pr
EPIPOLAR
LINE
EPIPOLAR
PLANE
pr
pl
Ol
el
er
Image plane
πr
Or
Center of projection
Center of projection
Epipoles
Extrinsic parameters: Left/Right Camera Frames:
Pr=R(Pl-T), T=Or-Ol (1)
Epipolar Constraint
Ol, Or, pl =>
Scene point
P
Enough to define right E.L.
Pl
Pr
EPIPOLAR
LINE
Image plane
πl
EPIPOLAR
LINE
EPIPOLAR
PLANE
pr
pl
Ol
el
er
Image plane
πr
Or
Center of projection
Center of projection
Epipoles
Given pl, pr is constrained to lie on the Epipolar Line (E.L.).
For each left pixel pl, find the corresponding right E.L.
Searching for pr reduces to a 1-D problem.
Epipolar Constraint
Scene points
Image plane
el
er
Center of projection
Center of projection
Epipoles
All E.L.s go through epipoles.
Parallel image planes => epipoles at infinity.
Essential Matrix
Estimate the epipolar geometry: correspondence
between points and E.L.s.
Pl
Pl , Pl -T and T are coplanar
Pr  Pl  T
( Pl  T )T T  Pl  0
(1)
( RT Pr )T T  Pl   0
T
Link bw/ epipolar constraint and
extrinsic parameters of stereo
system.
Pr EPl  0
T
T  Pl  SPl
 0
S   TZ

 TY
Pr RT  Pl   0
T
 TZ
0
TX
TY 
 TX 

0 
Pr ( RS ) Pl  0
T
Essential Matrix
Perspective:
pl=[xl,yl,fl]T, pr=[xr,yr,zr]T
pl= fl/Zl Pl, pr=fr/Zr Pr
Pr EPl  0
T
Pr
Pl
pr Epl  0
T
ur
pl
ul
pr
el
Perspective
Projection
er
T
pl and pr are points in camera coordinate s
0
pr R  Tz
 T y
T
Tz
0
Ty
Tx
Tx
0

 pl  0

E
Epipolar lines are found by
ur  Epl
ul  E T pr
Essential matrix
Rank 2
CSc83029 3-D Computer Vision / Ioannis Stamos
Camera Models (linear versions)
Elegant decomposition.
No distortion!
Homogeneous
Coordinates
Measured Pixel
(xim, yim)
?
World Point
(Xw, Yw,Zw)
Fundamental Matrix
Ml (Mr) matrix of intrinsic parameters for left (right) camera.
Camera to pixel coordinates:
Pr
Pl
pl  M l pl
p r  M r pr
ur
pl
ul
pr
el
er
T
Epipolar lines:
ur  Fpl
ul  F T pr
Essential matrix equation
becomes:
pr M rT EM l1 pl  0
T
F
Fundamental matrix
F: pixel coordinates !
E: camera coordinates !
CSc83029 3-D Computer Vision / Ioannis Stamos
Conclusions
Essential Matrix
Fundamental Matrix
 Encodes information on
 Encodes information
extrinsic parameters.
 Has rank 2.
 Its 2 non-zero singular
values are equal.
on both the extrinsic
and intrinsic
parameters.
 Has rank 2.
CSc83029 3-D Computer Vision / Ioannis Stamos
Estimating the epipolar geometry
el
er
CSc83029 3-D Computer Vision / Ioannis Stamos
Estimating the epipolar geometry
el
er
Problem: Find the fundamental matrix from a set of image correspondences
p
i T
r
F pil  0
CSc83029 3-D Computer Vision / Ioannis Stamos
 p , p 
i
l
i
r
Estimating the epipolar geometry
el
min
F
er
 p
i T
r
i
F pl

2
i
With the respect to the constraint: Rank(F) = 2.
CSc83029 3-D Computer Vision / Ioannis Stamos
The 8-point algorithm
n>=8 correspondences
pi r F pil  0
T
Av  0
v: the 9 elements of F.
A: n x 9 measurement matrix.
Solve using SVD (solution up to a scale factor).
Enforce rank(F)=2 =>SVD on the computed F.
Be careful: numerical instabilities.
Epipolar Lines – Example
CSc83029 3-D Computer Vision / Ioannis Stamos
Example
Two views
Point Feature Matching
From Jana Kosecka
Example
Epipolar Geometry
Camera Pose
and
Sparse Structure Recovery
From Jana Kosecka
CSc83029 3-D Computer Vision / Ioannis Stamos
Locating the Epipoles from E & F
Accurate epipole localization:
1) Refining epipolar lines.
2) Checking for consistency.
3) Uncalibrated stereo.
el
er
F => el, er in pixel coordinates.
E => el, er in camera coordinates.
Fact: All epipolar lines pass through epipoles.
CSc83029 3-D Computer Vision / Ioannis Stamos
Image rectification
 Given general displacement how to warp the views
 Such that epipolar lines are parallel to each other
 How to warp it back to canonical configuration
(more details later)
CSc83029 3-D Computer Vision / Ioannis Stamos
(Seitz)
Epipolar rectification
• Rectified Image Pair
• Corresponding epipolar lines are aligned with the scan-lines
• Search for dense correspondence is a 1D search
CSc83029 3-D Computer Vision / Ioannis Stamos
Epipolar rectification
Rectified Image Pair
CSc83029 3-D Computer Vision / Ioannis Stamos
Rectification (Trucco, Ch. 7)
 Rotate left camera so that epipole goes to
infinity (known R, known epipoles)
 Apply same rotation to right camera
 Rotate right camera by R
 Adjust scale in both camera reference frames
CSc83029 3-D Computer Vision / Ioannis Stamos
Rectification
 Problem: Epipolar lines not parallel to scan
lines
P
Pl
Pr
Epipolar Plane
Epipolar Lines
p
Ol
p
l
el
er
r
Or
Epipoles
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Rectification
 Problem: Epipolar lines not parallel to scan
lines
P
Pl
Pr
Rectified Images
Epipolar Plane
Epipolar Lines
p
p
l
Ol
r
Or
Epipoles at infinity
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
3-D Reconstruction
CSc83029
3-Dby
Computer
/ Ioannis
Stamos
Reprinted from “Stereo by Intra- and Intet-Scanline
Search,”
Y. Ohta andVision
T. Kanade,
IEEE Trans.
on Pattern Analysis and Machine
Intelligence, 7(2):139-154 (1985).  1985 IEEE.
3-D Reconstruction
A Priori Knowledge
3-D Reconstruction from two views
 Intrinsic and extrinsic  Unambiguous (triangulation)
 Intrinsic only
 Up to unknown scaling factor
 No information
 Up to unknown projective
transformation
CSc83029 3-D Computer Vision / Ioannis Stamos
Projective Reconstruction
Euclidean reconstruction
Projective reconstruction
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Euclidean vs Projective
reconstruction
 Euclidean reconstruction – true metric properties of objects
lenghts (distances), angles, parallelism are preserved
 Unchanged under rigid body transformations
 => Euclidean Geometry – properties of rigid bodies under
rigid body transformations, similarity transformation
 Projective reconstruction – lengths, angles, parallelism are
NOT preserved – we get distorted images of objects – their
distorted 3D counterparts --> 3D projective reconstruction
 => Projective Geometry
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Check this out!
http://www.well.com/user/jimg/stereo/stereo_list.html
CSc83029 3-D Computer Vision / Ioannis Stamos
How can We Improve Stereo?
Space-time stereo scanner
uses unstructured light to aid
in correspondence
Result: Dense 3D mesh (noisy)
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Active Stereo: Adding Texture to Scene
By James Davis,
Honda
Research,
Now UCSC
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
rectified
Active Stereo (Structured Light)
CSc83029 3-D Computer Vision / Ioannis Stamos
From Sebastian Thrun/Jana Kosecka
Range Images (depth images, depth
maps, surface profiles, 2.5-D images)
•Sensors that produce depth directly.
•Pixel of a range image is the distance between a known reference frame
and a visible point in the scene.
•Representations:
•Cloud of Points (x,y,z)
•Rij form (spatial information is explicit)
CSc83029 3-D Computer Vision / Ioannis Stamos
Active Range Sensors
 Project energy or control sensor’s parameters.
 Laser, Radars (accurate)/Sonars(inaccurate).
 Active Focusing/Defocusing.
CSc83029 3-D Computer Vision / Ioannis Stamos
Triangulation
CSc83029 3-D Computer Vision / Ioannis Stamos
Triangulation
Light Stripe System.
Zc
Yc
Xc
Light Plane:
AX+BY+CZ+D=0
(in camera frame)
Image Point:
x=f X/Z, y=f Y/Z
(perspective)
Image
Triangulation: Z=-D f/(A x + B y + C f)
Move light stripeCSc83029
or object.
3-D Computer Vision / Ioannis Stamos
Time of Flight
CSc83029 3-D Computer Vision / Ioannis Stamos
Download