Colorado School of Mines
Computer Vision
Professor William Hoff
Dept of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/
Stereo Vision
Inferring 3D from 2D
• Model-based pose estimation
  – One (calibrated) camera viewing a known model -> can determine the pose of the model
• Stereo vision
  – Two (calibrated) cameras, with known relative pose between them, viewing an arbitrary scene -> can determine the positions of points in the scene
Stereo Vision
• A way of getting depth (3-D) information about a scene from
two (or more) 2-D images
– Used by humans and animals, now computers
• Computational stereo vision
– Studied extensively in the last 25 years
– Difficult; still being researched
– Some commercial systems available
• Good references
– Scharstein and Szeliski, 2002. “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.” International Journal of Computer Vision, 47(1-3), 7-42
– http://vision.middlebury.edu/stereo - extensive website with evaluations of
algorithms, test data, code
Example
(Figure: left image, right image, and reconstructed surface with image texture. Credit: Davi Geiger)
Example
• Notice how different parts of the two images align,
for different values of the horizontal shift (disparity)
Iright = im2double(imread('pentagonRight.png'));
Ileft = im2double(imread('pentagonLeft.png'));
% Disparity is d = xleft - xright, so Ileft(x,y) = Iright(x-d,y)
for d = -20:20
    d    % display the current disparity being tried
    Idiff = abs(Ileft(:, 21:end-20) - Iright(:, 21-d:end-20-d));
    imshow(Idiff, []);
    pause
end
Stereo Displays
• Stereograms were popular in the early
1900’s
• A special viewer was needed to display two
different images to the left and right eyes
http://www.columbia.edu/itc/mealac/pritchett/00routesdata/1700_1799/jaipur/jaipurcity/jaipurcity.html
Stereo Displays
• 3D movies were popular in the 1950’s
• The left and right images were displayed as
red and blue
http://j-walkblog.com/index.php?/weblog/posts/swimmers/
Stereo Displays
• Current technology for 3D movies and
computer displays is to use polarized
glasses
• The viewer wears eyeglasses which
contain circular polarizers of opposite
handedness
http://www.3dsgamenews.com/2011/01/3ds-to-feature-3d-movies/
Stereo Principle
• If you know
– intrinsic parameters of each camera
– the relative pose between the cameras
• If you measure
– An image point in the left camera
– The corresponding point in the right camera
• Each image point corresponds to a ray
emanating from that camera
• You can intersect the rays (triangulate) to
find the absolute point position
Stereo Geometry – Simple Case
• Assume the image planes are coplanar
• There is only a translation in the X direction between the two coordinate frames
• b is the baseline distance between the cameras
• For a point P(XL, YL, ZL):

  xL = f XL/ZL,   xR = f XR/ZR
  ZL = ZR = Z,    XL = XR + b

  d = xL − xR = f (XL − XR)/Z = f b/Z,   so   Z = f b/d

• Disparity d = xL − xR
Goal: a complete disparity map
• Disparity is the difference in position of corresponding points between the left and right images
http://vision.middlebury.edu/stereo
Reconstruction Error
• Given the uncertainty in the pixel projection of the point, what is the error in depth?
• Obviously the error in depth (ΔZ) will depend on:
  – Z, b, f
  – ΔxL, ΔxR
• Let’s find the expected value of the error, and the variance of the error
From http://www.danet.dk/sensor_fusion
Reconstruction Error
• First, find the error in disparity Δd from the error of locating the feature in each image, ΔxL and ΔxR:

  d = xL − xR

  – Taking the total derivative of each side:

      d(d) = d(xL) − d(xR)
      Δd = ΔxL − ΔxR

  – Assuming ΔxL, ΔxR are independent and zero mean:

      μ = E[Δd] = E[ΔxL] − E[ΔxR] = 0

  – and

      Var(Δd) = E[(Δd − μ)²] = E[(Δd)²]
              = E[(ΔxL − ΔxR)²]
              = E[(ΔxL)² − 2 ΔxL ΔxR + (ΔxR)²]
              = E[(ΔxL)²] − 2 E[ΔxL] E[ΔxR] + E[(ΔxR)²]
              = E[(ΔxL)²] + E[(ΔxR)²]

  So σd² = σL² + σR²
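The result σd² = σL² + σR² can be spot-checked with a quick Monte Carlo simulation (a Python sketch; the σ values below are arbitrary):

```python
import random

# Spot-check Var(dxL - dxR) = sigma_L^2 + sigma_R^2 for independent,
# zero-mean localization errors; sigma values are illustrative choices.
random.seed(0)
sigma_L, sigma_R, n = 1.0, 1.5, 200_000
dd = [random.gauss(0, sigma_L) - random.gauss(0, sigma_R) for _ in range(n)]
mean = sum(dd) / n
var = sum((v - mean) ** 2 for v in dd) / n
print(mean, var)  # mean near 0, variance near 1.0 + 2.25 = 3.25
```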
Reconstruction Error
• Next, we take the total derivative of Z = f b/d
  – If the only uncertainty is in the disparity d:

      ΔZ = −(f b/d²) Δd

• The mean error is μZ = E[ΔZ]
• The variance of the error is σZ² = E[(ΔZ − μZ)²]
Example
• A stereo vision system estimates the disparity of a point as
d=10 pixels
– What is the depth (Z) of the point, if f = 500 pixels and b = 10 cm?
– What is the uncertainty (standard deviation) of the depth, if the
standard deviation of locating a feature in each image = 1 pixel?
• How to handle uncertainty in both disparity and focal length?
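A sketch of the first two answers, applying the propagation formulas from the previous slides in Python:

```python
import math

# The slide's numbers: f = 500 pixels, b = 10 cm, measured disparity
# d = 10 pixels, feature-localization sigma of 1 pixel in each image.
f, b, d = 500.0, 10.0, 10.0
sigma_x = 1.0

Z = f * b / d                                 # depth, in cm
sigma_d = math.sqrt(sigma_x**2 + sigma_x**2)  # sigma_d^2 = sigma_L^2 + sigma_R^2
sigma_Z = (f * b / d**2) * sigma_d            # |dZ/dd| * sigma_d, first order

print(Z)        # -> 500.0 cm (5 m)
print(sigma_Z)  # -> about 70.7 cm
```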
Geometry – general case
• Cameras not aligned, but we still know relative pose
• Assuming f = 1, we have

  pL = (xL, yL, 1)ᵀ,   pR = (xR, yR, 1)ᵀ

• In principle, you can find P by intersecting the rays OLpL and ORpR
• However, they may not intersect
• Instead, find the midpoint of the segment perpendicular to the two rays

(Figure: point P(XL, YL, ZL) observed by the left and right cameras)
Triangulation (continued)
• The projection of P onto the left image is  ZL pL = ML P
• The projection of P onto the right image is  ZR pR = MR P
• where

  ML = [ 1 0 0 0
         0 1 0 0
         0 0 1 0 ]

  MR = [ r11 r12 r13 tx
         r21 r22 r23 ty
         r31 r32 r33 tz ]  =  [ R  t ]

  with R the rotation of the left camera frame with respect to the right camera, and t the origin of the left frame expressed in right-camera coordinates
Triangulation (continued)
• Note that pL and ML P are parallel, so their cross product should be zero
• Similarly for pR and MR P
• Point P should satisfy both:

  pL × (ML P) = 0
  pR × (MR P) = 0

• This is a system of four independent equations; we can solve for the three unknowns (XL, YL, ZL) using least squares
• The method also works for more than two cameras
Stereo Process
• Extract features from the left and right images
• Match the left and right image features, to get their disparity
in position (the “correspondence problem”)
• Use stereo disparity to compute depth (the reconstruction
problem)
http://vision.middlebury.edu/stereo/data/scenes2003/
• The correspondence problem is the most difficult
Characteristics of Human Stereo Vision
• Matching features must appear similar in the left and
right images
For example, we can’t fuse a left stereo
image with a negative of the right image…
http://cs.wellesley.edu/~cs332/
Characteristics of Human Stereo Vision
• Can only “fuse” objects within a limited
range of depth around the fixation
distance
• Vergence eye movements are needed to
fuse objects over larger range of depths
http://cs.wellesley.edu/~cs332/
Panum’s Fusional Area
• Panum's fusional
area is the range
of depths for
which binocular
fusion can occur
(without changing
vergence angles)
• It’s actually quite
small … we are
able to perceive a
wide range of
depths because
we are changing
vergence angles
http://webvision.med.utah.edu/imageswv/KallDepth7.jpg
Characteristics of Human Stereo Vision
• Cells in visual cortex are selective for stereo disparity
• Neurons that are selective for a larger disparity range have larger
receptive fields
• zero disparity: at
fixation distance
• near: in front of
point of fixation
• far: behind point
of fixation
http://cs.wellesley.edu/~cs332/
Characteristics of Human Stereo Vision
• Can fuse random-dot stereograms
Bela Julesz,
1971
• Shows
– Stereo system can function independently
– We can match “simple” features
– Highlights the ambiguity of the matching process
http://cs.wellesley.edu/~cs332/
Example
• Make a random dot stereogram
L = rand(400,400);
R = L;
% Shift center portion by 50 pixels
R(100:300, 150:350) = L(100:300, 100:300);
% Fill in part that moved with new random dots
R(100:300, 100:149) = rand(201, 50);
figure, imshow([L R]);   % view the pair side by side
Correspondence Problem – Most Difficult Part of Stereo Vision
• For every point in the left image, there are many possible
matches in the right image
• Locally, many points look similar -> matches are ambiguous
• We can use the (known) geometry of the cameras to help
limit the search for matches
• The most important constraint is the epipolar constraint
– We can limit the search for a match to be along a certain line in the
other image
Epipolar Constraint
With aligned cameras, search for corresponding point is 1D along
corresponding row of other camera.
Epipolar constraint for non-aligned stereo computation
If cameras are not aligned, a 1D search can still be determined for the
corresponding point. P1, C1, C2 determine a plane that cuts image I2 in a
line: P2 will be on that line.
Rectification
• If relative camera pose is known, it
is possible to rectify the images –
effectively rotate both cameras so
that they are looking perpendicular
to the line joining the camera
centers
Original image pair overlaid with several
epipolar lines
• This means that epipolar lines will be horizontal, and matching algorithms will be more efficient
Images rectified so that epipolar lines are
horizontal and in vertical correspondence
From Richard Szeliski, Computer Vision:
Algorithms and Applications, Springer, 2010
Correspondence Problem
• Even using the epipolar constraint, there are many possible
matches
• Worst case scenarios
• A white board (no features)
• A checkered wallpaper (ambiguous matches)
• The problem is underconstrained
• To solve, we need to impose assumptions about the real
world:
– Disparity limits
– Appearance
– Uniqueness
– Ordering
– Smoothness
Disparity limits
• Assume that valid disparities
are within certain limits
– Constrains search
• Why usually true?
• When is it violated?
Appearance
• Assume features should
have similar appearance in
the left and right images
• Why usually true?
http://vision.middlebury.edu/stereo/data/scenes2003/
• When is it violated?
Uniqueness
• Assume that a point in the left
image can have at most one
match in the right image
• Why usually true?
• When is it violated?

(Figure: left and right cameras with baseline b, as in the simple stereo geometry)
Ordering
• Assume features should be in
the same left to right order in
each image
• Why usually true?
• When is it violated?
Smoothness
• Assume objects have mostly smooth
surfaces, meaning that disparities
should vary smoothly (e.g., have a
low second derivative)
• Why usually true?
• When is it violated?
Methods for Correspondence
• Match points based on local similarity between images
• Two general approaches
• Correlation-based approaches
– Matches image patches using correlation
– Assumes only a translational difference between the two local patches (no
rotation, or differences in appearance due to perspective)
– A good assumption if patch covers a single surface, and surface is far away
compared to baseline between cameras
– Works well for scenes with lots of texture
• Feature-based approaches
– Matches edges, lines, or corners
– Gives a sparse reconstruction
– May be better for scenes with little texture
Correlation Approach
• Select a range of disparities to search
• For each patch in the left image, compute cross correlation score for
every point along the epipolar line
• Find maximum correlation score along that line
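A minimal version of this search, sketched in Python/NumPy with SAD in place of the cross-correlation score (the function, parameter names, and defaults are illustrative, not the course code):

```python
import numpy as np

def disparity_map(left, right, w=5, d_max=32):
    """Brute-force block matching on rectified images (a sketch, not an
    optimized implementation). For each pixel, slide the left patch along
    the same row of the right image over disparities 0..d_max and keep the
    disparity with the best score. SAD is used here for brevity instead of
    the normalized cross-correlation used in the slides."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.int32)
    for y in range(w, H - w):
        for x in range(w + d_max, W - w):
            patch = left[y-w:y+w+1, x-w:x+w+1]
            best, best_d = np.inf, 0
            for d in range(d_max + 1):          # search along the epipolar line
                cand = right[y-w:y+w+1, x-d-w:x-d+w+1]
                score = np.abs(patch - cand).sum()   # SAD
                if score < best:
                    best, best_d = score, d
            disp[y, x] = best_d
    return disp

# Synthetic check: the left image is the right image shifted by 4 pixels
rng = np.random.default_rng(0)
right = rng.random((40, 80))
left = np.roll(right, 4, axis=1)          # disparity d = xL - xR = 4
print(disparity_map(left, right, w=3, d_max=8)[20, 40])  # -> 4
```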
Matlab demo
• Parameters:
  – Size of template patch
  – Horizontal disparity search window
  – Vertical disparity search window

% Simple stereo system using cross correlation
clear all
close all

% Constants
W = 16;     % size of cross-correlation template is (2W+1 x 2W+1)
DH = 50;    % disparity horizontal search limit is -DH .. +DH
DV = 8;     % disparity vertical search limit is -DV .. +DV

Ileft = imread('left.png');
Iright = imread('right.png');
figure(1), imshow(Ileft, []), title('Left image');
figure(2), imshow(Iright, []), title('Right image');
pause;

% Calculate disparity at a set of discrete points
xborder = W+DH+1;
yborder = W+DV+1;
xTsize = W+DH;   % horizontal template size is 2*xTsize+1
yTsize = W+DV;   % vertical template size is 2*yTsize+1

(Figure: template patch from the left image, search region in the right image, and correlation scores with the peak in red)
Matlab demo (continued)
• Scan through the left image
  – Extract a template patch from the left image
  – Do normalized cross-correlation to match to the right image
  – Accept a match if the score is greater than a threshold

npts = 0;   % number of found disparity points
for x = xborder:W:size(Ileft,2)-xborder
    for y = yborder:W:size(Ileft,1)-yborder
        % Extract a template from the left image centered at x,y
        figure(1), hold on, plot(x, y, 'rd'), hold off;
        T = imcrop(Ileft, [x-W y-W 2*W 2*W]);
        %figure(3), imshow(T, []), title('Template');

        % Search for match in the right image, in a region centered at x,y
        % and of dimensions DW wide by DH high.
        IR = imcrop(Iright, [x-xTsize y-yTsize 2*xTsize 2*yTsize]);
        %figure(4), imshow(IR, []), title('Search area');

        % The correlation score image is the size of IR, expanded by W in
        % each direction.
        ccscores = normxcorr2(T,IR);
        %figure(5), imshow(ccscores, []), title('Correlation scores');

        % Get the location of the peak in the correlation score image
        [max_score, maxindex] = max(ccscores(:));
        [ypeak, xpeak] = ind2sub(size(ccscores), maxindex);
        hold on, plot(xpeak, ypeak, 'rd'), hold off;

        % If score too low, ignore this point
        if max_score < 0.85
            continue;
        end
Matlab demo (continued)
• Extract peak location, save disparity value
• Plot all points when done

        % These are the coordinates of the peak in the search image
        ypeak = ypeak - W;
        xpeak = xpeak - W;
        %figure(4), hold on, plot(xpeak, ypeak, 'rd'), hold off;

        % These are the coordinates in the full sized right image
        xpeak = xpeak + (x-xTsize);
        ypeak = ypeak + (y-yTsize);
        figure(2), hold on, plot(xpeak, ypeak, 'rd'), hold off;

        % Save the point in a list, along with its disparity
        npts = npts+1;
        xPt(npts) = x;
        yPt(npts) = y;
        dPt(npts) = xpeak-x;   % disparity is xright-xleft
        %pause
    end
end
figure, plot3(xPt, yPt, dPt, 'd');
Area-based matching
• Window size tradeoff
  – Larger windows are more unique
  – Smaller windows are less likely to cross discontinuities
• Similarity measures
  – CC (cross-correlation)
  – SSD (sum of squared differences) – equivalent to CC
  – SAD (sum of absolute differences)
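The three measures fit in a few lines. This Python sketch (function names and test patches are illustrative) also shows why normalized cross-correlation tolerates brightness and contrast changes that would inflate SSD and SAD:

```python
import numpy as np

# Illustrative patch similarity measures; inputs are same-sized float arrays.
def ssd(a, b):
    """Sum of squared differences (lower is better)."""
    return ((a - b) ** 2).sum()

def sad(a, b):
    """Sum of absolute differences (lower is better)."""
    return np.abs(a - b).sum()

def ncc(a, b):
    """Normalized cross-correlation (higher is better): invariant to
    affine intensity changes b = alpha*a + beta."""
    a0, b0 = a - a.mean(), b - b.mean()
    return (a0 * b0).sum() / (np.linalg.norm(a0) * np.linalg.norm(b0))

rng = np.random.default_rng(1)
p = rng.random((7, 7))
q = 2.0 * p + 0.5        # brightness/contrast change: NCC is unaffected
print(round(ncc(p, q), 6))   # -> 1.0
print(ssd(p, p), sad(p, p))  # identical patches score 0 under SSD and SAD
```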
Additional notes
• Stereo vision website
– http://vision.middlebury.edu/stereo
• Example commercial system
– http://www.ptgrey.com