Colorado School of Mines — Computer Vision
Professor William Hoff, Dept. of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/

Stereo Vision

Inferring 3D from 2D
• Model-based pose estimation
  – single (calibrated) camera, known model
  – can determine the pose of the model
• Stereo vision
  – two (calibrated) cameras, arbitrary scene
  – relative pose between the cameras is also known
  – can determine the positions of points in the scene

Stereo Vision
• A way of getting depth (3-D) information about a scene from two (or more) 2-D images
  – Used by humans and animals, and now by computers
• Computational stereo vision
  – Studied extensively over the last 25 years
  – Difficult; still being researched
  – Some commercial systems available
• Good references
  – Scharstein and Szeliski, 2002. "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms." International Journal of Computer Vision, 47(1-3), 7-42
  – http://vision.middlebury.edu/stereo – extensive website with evaluations of algorithms, test data, and code

Example
• Left image, right image, and reconstructed surface with image texture (Davi Geiger)

Example
• Notice how different parts of the two images align for different values of the horizontal shift (disparity)

    Iright = im2double(imread('pentagonRight.png'));
    Ileft  = im2double(imread('pentagonLeft.png'));
    % Disparity is d = xleft - xright
    % So Ileft(x,y) = Iright(x+d,y)
    for d = -20:20
        d    % display the current disparity
        Idiff = abs(Ileft(:, 21:end-20) - Iright(:, d+21:d+end-20));
        imshow(Idiff, []);
        pause
    end

Stereo Displays
• Stereograms were popular in the early 1900s
• A special viewer was needed to display two different images to the left and right eyes
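The pentagon alignment demo earlier on this page can be mirrored in a few lines of Python; a minimal sketch on toy 1D rows rather than images (the synthetic data and the function name are mine), following the demo's convention that left[x] = right[x + d] at the correct disparity:

```python
def best_shift(left, right, max_d):
    """Toy version of the alignment demo: try each horizontal shift d and
    keep the one minimizing the mean absolute difference, using the demo's
    convention that left[x] = right[x + d] at the correct disparity."""
    best_d, best_err = 0, float('inf')
    n = len(left)
    for d in range(-max_d, max_d + 1):
        # compare only positions where both rows are defined after the shift
        pairs = [(left[x], right[x + d]) for x in range(n) if 0 <= x + d < n]
        if not pairs:
            continue
        err = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```

The MATLAB snippet does the same thing for whole image columns, cropping both images so the shifted ranges stay in bounds.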
http://www.columbia.edu/itc/mealac/pritchett/00routesdata/1700_1799/jaipur/jaipurcity/jaipurcity.html

Stereo Displays
• 3D movies were popular in the 1950s
• The left and right images were displayed as red and blue
  http://j-walkblog.com/index.php?/weblog/posts/swimmers/

Stereo Displays
• Current technology for 3D movies and computer displays is to use polarized glasses
• The viewer wears eyeglasses which contain circular polarizers of opposite handedness
  http://www.3dsgamenews.com/2011/01/3ds-to-feature-3d-movies/

Stereo Principle
• If you know
  – the intrinsic parameters of each camera
  – the relative pose between the cameras
• and you measure
  – an image point in the left camera
  – the corresponding point in the right camera
• Each image point corresponds to a ray emanating from that camera
• You can intersect the rays (triangulate) to find the absolute point position

Stereo Geometry – Simple Case
• A point P(X_L, Y_L, Z_L) viewed by left and right cameras with image coordinates x_L, x_R
• Assume the image planes are coplanar
• There is only a translation in the X direction between the two coordinate frames, so Z_L = Z_R = Z
• b is the baseline distance between the cameras, so X_R = X_L - b
• Perspective projection gives
      x_L = f X_L / Z,   x_R = f X_R / Z
• The disparity is
      d = x_L - x_R = f (X_L - X_R) / Z = f b / Z
• Solving for depth:
      Z = f b / d

Goal: a complete disparity map
• Disparity is the difference in position of corresponding points between the left and right images
  http://vision.middlebury.edu/stereo

Reconstruction Error
• Given the uncertainty in pixel projection of the point, what is the error in depth?
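The simple-case result Z = f·b/d is one line of code; a hedged Python sketch (the function name is mine; f and d are in pixels, and Z comes out in whatever units b is in):

```python
def depth_from_disparity(f, b, d):
    """Depth from the aligned-camera stereo geometry: Z = f*b/d.
    f and d are in pixels; Z is in the same units as the baseline b."""
    if d <= 0:
        raise ValueError("point must have positive disparity")
    return f * b / d
```

With f = 500 pixels and b = 10 cm, a disparity of 10 pixels gives Z = 500 cm; halving the disparity doubles the depth, since depth is inversely proportional to disparity.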
• Obviously, the error in depth (ΔZ) will depend on:
  – Z, b, f
  – ΔxL, ΔxR
• Let's find the expected value of the error, and the variance of the error
  From http://www.danet.dk/sensor_fusion

Reconstruction Error
• First, find the error in disparity, Δd, from the errors in locating the feature in each image, ΔxL and ΔxR
      d = xL - xR
  – Taking the total derivative of each side:
      d(d) = d(xL) - d(xR)   =>   Δd = ΔxL - ΔxR
  – Assuming ΔxL and ΔxR are independent and zero mean:
      E[Δd] = E[ΔxL] - E[ΔxR] = 0
  – and, since independence makes E[ΔxL ΔxR] = E[ΔxL] E[ΔxR] = 0:
      Var[Δd] = E[(Δd - E[Δd])²]
              = E[(ΔxL - ΔxR)²]
              = E[ΔxL²] - 2 E[ΔxL] E[ΔxR] + E[ΔxR²]
              = E[ΔxL²] + E[ΔxR²]
  So σd² = σL² + σR²

Reconstruction Error
• Next, take the total derivative of Z = f b / d
  – If the only uncertainty is in the disparity d:
      ΔZ = -(f b / d²) Δd
• The mean error is μZ = E[ΔZ]
• The variance of the error is σZ² = E[(ΔZ - μZ)²]

Example
• A stereo vision system estimates the disparity of a point as d = 10 pixels
  – What is the depth (Z) of the point, if f = 500 pixels and b = 10 cm?
  – What is the uncertainty (standard deviation) of the depth, if the standard deviation of locating a feature in each image is 1 pixel?
• How would you handle uncertainty in both the disparity and the focal length?
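The numbers in the example can be worked directly from the two results above (σd² = σL² + σR² and |ΔZ| = (f b / d²)·|Δd|, so the sign of dZ/dd drops out of the standard deviation); a Python sketch:

```python
import math

# Worked version of the example: d = 10 px, f = 500 px, b = 10 cm,
# and a 1-pixel standard deviation for feature location in each image.
f, b, d = 500.0, 10.0, 10.0          # f, d in pixels; b in cm
sigma_xL = sigma_xR = 1.0            # feature localization std dev (pixels)

Z = f * b / d                                     # depth: 500 cm = 5 m
sigma_d = math.sqrt(sigma_xL**2 + sigma_xR**2)    # about 1.414 pixels
sigma_Z = (f * b / d**2) * sigma_d                # about 70.7 cm

print(Z, sigma_d, sigma_Z)
```

Note how large the relative depth uncertainty is (about 14% of Z) even with 1-pixel feature localization, because Z varies as 1/d.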
Geometry – General Case
• The cameras are not aligned, but we still know the relative pose
• Assuming f = 1, the (normalized) image points are
      pL = [xL, yL, 1]ᵀ,   pR = [xR, yR, 1]ᵀ
• In principle, you can find P(XL, YL, ZL) by intersecting the rays OL pL and OR pR
• However, the rays may not intersect exactly
• Instead, find the midpoint of the segment perpendicular to the two rays

Triangulation (continued)
• The projection of P onto the left image is ZL pL = ML P
• The projection of P onto the right image is ZR pR = MR P
• where P = (XL, YL, ZL, 1)ᵀ and
      ML = [ 1 0 0 0
             0 1 0 0
             0 0 1 0 ]
      MR = [ r11 r12 r13 tx
             r21 r22 r23 ty
             r31 r32 r33 tz ]
  with (R, t) the rigid transform taking left-camera coordinates to right-camera coordinates (R = R_L^R; t is determined by the left camera origin, t_Lorg)

Triangulation (continued)
• Note that pL and ML P are parallel, so their cross product should be zero; similarly for pR and MR P
• Point P should satisfy both
      pL × (ML P) = 0
      pR × (MR P) = 0
• This is a system of four independent equations; we can solve for the three unknowns (XL, YL, ZL) using least squares
• The method also works for more than two cameras

Stereo Process
• Extract features from the left and right images
• Match the left and right image features to get their disparity in position (the "correspondence problem")
• Use the stereo disparity to compute depth (the "reconstruction problem")
  http://vision.middlebury.edu/stereo/data/scenes2003/
• The correspondence problem is the most difficult

Characteristics of Human Stereo Vision
• Matching features must appear similar in the left and right images
  – For example, we can't fuse a left stereo image with a negative of the right image
  http://cs.wellesley.edu/~cs332/

Characteristics of Human Stereo Vision
• We can only "fuse" objects within a limited range of depth around the fixation distance
• Vergence eye movements are needed to fuse objects over a larger range of depths
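The four-equation least-squares solve can be sketched in pure Python. This is an illustrative implementation, not the slides' exact method: the function name and the normal-equations/Gaussian-elimination route are my choices; inputs are normalized image points and a relative pose (R, t) with P_R = R·P_L + t:

```python
def triangulate(pL, pR, R, t):
    """Least-squares triangulation from the cross-product constraints
    pL x (ML P) = 0 and pR x (MR P) = 0, where ML = [I | 0], MR = [R | t],
    and P_R = R*P_L + t maps left-camera to right-camera coordinates.
    pL, pR are normalized image points (x, y); returns P in the left frame."""
    xL, yL = pL
    xR, yR = pR
    r1, r2, r3 = R                     # rows of the rotation matrix
    tx, ty, tz = t
    # Each camera's cross product yields two independent linear equations
    # in P = (X, Y, Z); signs are flipped where convenient.
    A = [
        [1.0, 0.0, -xL],                          # X - xL*Z = 0
        [0.0, 1.0, -yL],                          # Y - yL*Z = 0
        [r1[i] - xR * r3[i] for i in range(3)],   # (r1 - xR*r3).P = xR*tz - tx
        [r2[i] - yR * r3[i] for i in range(3)],   # (r2 - yR*r3).P = yR*tz - ty
    ]
    c = [0.0, 0.0, xR * tz - tx, yR * tz - ty]
    # Solve the overdetermined 4x3 system A P = c via the normal equations,
    # building the 3x4 augmented matrix [A^T A | A^T c]
    M = [[sum(A[k][i] * A[k][j] for k in range(4)) for j in range(3)]
         + [sum(A[k][i] * c[k] for k in range(4))] for i in range(3)]
    # Gaussian elimination with partial pivoting, then back substitution
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            fac = M[r][i] / M[i][i]
            for j in range(i, 4):
                M[r][j] -= fac * M[i][j]
    P = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        P[i] = (M[i][3] - sum(M[i][j] * P[j] for j in range(i + 1, 3))) / M[i][i]
    return P
```

With noise-free points the least-squares answer is exact; with noisy points it plays the same role as the slides' midpoint-of-perpendicular-segment construction, and extra cameras just add more rows to A.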
http://cs.wellesley.edu/~cs332/

Panum's Fusional Area
• Panum's fusional area is the range of depths over which binocular fusion can occur (without changing vergence angles)
• It's actually quite small; we perceive a wide range of depths because we are constantly changing vergence angles
  http://webvision.med.utah.edu/imageswv/KallDepth7.jpg

Characteristics of Human Stereo Vision
• Cells in the visual cortex are selective for stereo disparity
• Neurons that are selective for a larger disparity range have larger receptive fields
  – zero disparity: at the fixation distance
  – near: in front of the point of fixation
  – far: behind the point of fixation
  http://cs.wellesley.edu/~cs332/

Characteristics of Human Stereo Vision
• We can fuse random-dot stereograms (Bela Julesz, 1971)
• This shows that
  – the stereo system can function independently (no object recognition is needed first)
  – we can match "simple" features
  – the matching process is highly ambiguous

Example
• Make a random-dot stereogram

    L = rand(400,400);
    R = L;
    % Shift center portion by 50 pixels
    R(100:300, 150:350) = L(100:300, 100:300);
    % Fill in the part uncovered by the shift
    R(100:300, 100:149) = rand(201, 50);

Correspondence Problem
• The most difficult part of stereo vision
• For every point in the left image, there are many possible matches in the right image
• Locally, many points look similar, so matches are ambiguous
• We can use the (known) geometry of the cameras to help limit the search for matches
• The most important constraint is the epipolar constraint
  – We can limit the search for a match to a certain line in the other image

Epipolar Constraint
• With aligned cameras, the search for the corresponding point is 1D, along the corresponding row of the other camera's image
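For comparison, the same random-dot stereogram can be built in Python; a sketch mirroring the MATLAB snippet above (0-based indexing, so the ranges differ slightly, and the function name is mine):

```python
import random

def random_dot_stereogram(h=400, w=400):
    """Python sketch of the MATLAB random-dot stereogram: copy the left
    image, shift a central square 50 pixels to the right, and refill the
    strip uncovered by the shift with fresh random dots."""
    left = [[random.random() for _ in range(w)] for _ in range(h)]
    right = [row[:] for row in left]
    for y in range(100, 300):
        # shifted patch: right columns 150..349 copy left columns 100..299
        for i in range(200):
            right[y][150 + i] = left[y][100 + i]
        # the uncovered strip (columns 100..149) gets new random dots
        for x in range(100, 150):
            right[y][x] = random.random()
    return left, right
```

Fusing the pair makes the central square appear to float at a different depth, even though neither image alone contains any visible square.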
Epipolar Constraint – Non-Aligned Cameras
• If the cameras are not aligned, a 1D search region can still be determined for the corresponding point
• P1, C1, and C2 determine a plane that cuts image I2 in a line: P2 will lie on that line

Rectification
• If the relative camera pose is known, it is possible to rectify the images
  – effectively, rotate both cameras so that they look perpendicular to the line joining the camera centers
• This means that the epipolar lines will be horizontal, and matching algorithms will be more efficient
  – (Figures: the original image pair overlaid with several epipolar lines; the images rectified so that the epipolar lines are horizontal and in vertical correspondence)
  From Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010

Correspondence Problem
• Even using the epipolar constraint, there are many possible matches
• Worst-case scenarios:
  – a white board (no features)
  – checkered wallpaper (ambiguous matches)
• The problem is under-constrained
• To solve it, we need to impose assumptions about the real world:
  – Disparity limits
  – Appearance
  – Uniqueness
  – Ordering
  – Smoothness

Disparity Limits
• Assume that valid disparities lie within certain limits
  – This constrains the search
• Why is this usually true? When is it violated?

Appearance
• Assume features have similar appearance in the left and right images
  http://vision.middlebury.edu/stereo/data/scenes2003/
• Why is this usually true? When is it violated?

Uniqueness
• Assume that a point in the left image can have at most one match in the right image
• Why is this usually true? When is it violated?
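The slides name the uniqueness constraint but not an algorithm for imposing it; one simple (hypothetical) scheme is a greedy pass over scored candidate matches, accepting candidates in order of decreasing score and never letting a left or right point participate twice:

```python
def enforce_uniqueness(candidates):
    """Hypothetical greedy scheme for the uniqueness constraint.
    candidates: list of (xL, xR, score) tuples; returns the accepted
    matches, each left and right point appearing at most once."""
    used_left, used_right, matches = set(), set(), []
    for xL, xR, score in sorted(candidates, key=lambda c: -c[2]):
        if xL not in used_left and xR not in used_right:
            matches.append((xL, xR, score))
            used_left.add(xL)
            used_right.add(xR)
    return matches
```

Note this is exactly where the constraint can fail in practice: a point visible in one image but occluded in the other has no valid match at all, yet a greedy scheme may still assign it one.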
Ordering
• Assume features appear in the same left-to-right order in each image
• Why is this usually true? When is it violated?

Smoothness
• Assume objects have mostly smooth surfaces, meaning that disparities should vary smoothly (e.g., have a low second derivative)
• Why is this usually true? When is it violated?

Methods for Correspondence
• Match points based on local similarity between the images
• Two general approaches:
• Correlation-based approaches
  – Match image patches using correlation
  – Assume only a translational difference between the two local patches (no rotation, and no differences in appearance due to perspective)
  – A good assumption if the patch covers a single surface, and the surface is far away compared to the baseline between the cameras
  – Work well for scenes with lots of texture
• Feature-based approaches
  – Match edges, lines, or corners
  – Give a sparse reconstruction
  – May be better for scenes with little texture

Correlation Approach
• Select a range of disparities to search
• For each patch in the left image, compute a cross-correlation score at every point along the epipolar line
• Find the maximum correlation score along that line

Matlab Demo
• Parameters:
  – size of the template patch
  – horizontal disparity search window
  – vertical disparity search window
• (Figures: template patch from the left image; search region in the right image; correlation scores, with the peak in red)

    % Simple stereo system using cross correlation
    clear all
    close all

    % Constants
    W = 16;   % size of cross-correlation template is (2W+1 x 2W+1)
    DH = 50;  % disparity horizontal search limit is -DH .. +DH
    DV = 8;   % disparity vertical search limit is -DV .. +DV

    Ileft = imread('left.png');
    Iright = imread('right.png');
    figure(1), imshow(Ileft, []), title('Left image');
    figure(2), imshow(Iright, []), title('Right image');
    pause;

    % Calculate disparity at a set of discrete points
    xborder = W+DH+1;
    yborder = W+DV+1;
    xTsize = W+DH;   % horizontal template size is 2*xTsize+1
    yTsize = W+DV;   % vertical template size is 2*yTsize+1

Matlab Demo (continued)
• Scan through the left image
  – Extract a template patch from the left image
  – Do normalized cross-correlation to match it to the right image
  – Accept a match only if the score is greater than a threshold

    npts = 0;   % number of found disparity points
    for x = xborder:W:size(Ileft,2)-xborder
        for y = yborder:W:size(Ileft,1)-yborder
            % Extract a template from the left image centered at x,y
            figure(1), hold on, plot(x, y, 'rd'), hold off;
            T = imcrop(Ileft, [x-W y-W 2*W 2*W]);
            %figure(3), imshow(T, []), title('Template');

            % Search for a match in the right image, in a region centered
            % at x,y of dimensions 2*xTsize+1 wide by 2*yTsize+1 high
            IR = imcrop(Iright, [x-xTsize y-yTsize 2*xTsize 2*yTsize]);
            %figure(4), imshow(IR, []), title('Search area');

            % The correlation score image is the size of IR, expanded by W
            % in each direction.
            ccscores = normxcorr2(T, IR);
            %figure(5), imshow(ccscores, []), title('Correlation scores');

            % Get the location of the peak in the correlation score image
            [max_score, maxindex] = max(ccscores(:));
            [ypeak, xpeak] = ind2sub(size(ccscores), maxindex);
            hold on, plot(xpeak, ypeak, 'rd'), hold off;

            % If the score is too low, ignore this point
            if max_score < 0.85
                continue;
            end

Matlab Demo (continued)
• Extract the peak location and save the disparity value
• Plot all points when done

            % These are the coordinates of the peak in the search image
            ypeak = ypeak - W;
            xpeak = xpeak - W;
            %figure(4), hold on, plot(xpeak, ypeak, 'rd'), hold off;

            % These are the coordinates in the full-sized right image
            xpeak = xpeak + (x - xTsize);
            ypeak = ypeak + (y - yTsize);
            figure(2), hold on, plot(xpeak, ypeak, 'rd'), hold off;

            % Save the point in a list, along with its disparity
            npts = npts + 1;
            xPt(npts) = x;
            yPt(npts) = y;
            dPt(npts) = xpeak - x;   % disparity is xright - xleft
            %pause
        end
    end
    figure, plot3(xPt, yPt, dPt, 'd');

Area-Based Matching
• Window size tradeoff
  – Larger windows are more distinctive
  – Smaller windows are less likely to cross depth discontinuities
• Similarity measures
  – CC (cross-correlation)
  – SSD (sum of squared differences); SSD is equivalent to CC
  – SAD (sum of absolute differences)

Additional Notes
• Stereo vision website
  – http://vision.middlebury.edu/stereo
• Example commercial system
  – http://www.ptgrey.com
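Area-based matching with the SAD measure can be sketched directly; a toy pure-Python version (the function name, parameter names, and search range are illustrative) that searches along a single row, as the epipolar constraint allows for aligned cameras:

```python
def sad_disparity(left, right, x, y, W=1, max_d=4):
    """Sketch of area-based matching with SAD: compare the (2W+1)x(2W+1)
    left patch centered at (x, y) against right-image patches along the
    same row (the epipolar line for aligned cameras), and return the
    disparity d = xleft - xright with the smallest sum of absolute
    differences."""
    best_d, best_sad = 0, float('inf')
    for d in range(max_d + 1):
        sad = 0
        for dy in range(-W, W + 1):
            for dx in range(-W, W + 1):
                sad += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
        if sad < best_sad:
            best_d, best_sad = d, sad
    return best_d
```

The window-size tradeoff above shows up directly in W: a larger W makes the patch more distinctive but more likely to straddle a depth discontinuity, where a single disparity cannot fit the whole window.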