Stanford CS223B Computer Vision, Winter 2006
Stereo
Lecture 6: Stereo II
Professor Sebastian Thrun
CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado

Stereo Vision: Outline
– Basic Equations
– Epipolar Geometry
– Image Rectification
– Reconstruction
– Correspondence
– Active Range Imaging Technology
– Dense and Layered Stereo
– Smoothing With Markov Random Fields

Sebastian Thrun Stanford University CS223B Computer Vision

A Last Word on Preprocessing…

Epipolar Rectified Images
[Figure: rectified image pair with an epipolar line marked. Source: A. Fusiello, Verona, 2000]

Image Normalization
– Even when the cameras are identical models, there can be differences in gain and sensitivity.
– The cameras do not see exactly the same surfaces, so their overall light levels can differ.
– For these reasons and more, it is a good idea to normalize the pixels in each window:

Average pixel:
$$\bar{I} = \frac{1}{|W_m(x,y)|} \sum_{(u,v) \in W_m(x,y)} I(u,v)$$
Window magnitude:
$$\|I\|_{W_m(x,y)} = \sqrt{\sum_{(u,v) \in W_m(x,y)} [I(u,v)]^2}$$
Normalized pixel:
$$\hat{I}(x,y) = \frac{I(x,y) - \bar{I}}{\|I - \bar{I}\|_{W_m(x,y)}}$$

Correspondence
[Figure: two cameras with optical centers O1 and O2 and focal length f, image points p_{l,1} and p_{r,1}, the scene point P1, and the phantom points produced by wrong matches.]

Correspondence via Correlation
[Figure: left and right rectified images, one scanline, and the SSD error plotted against disparity.]
Minimizing the SSD error is the same as max-correlation / max-cosine for normalized image patches.

Images as Vectors
[Figure: a left window w_L and a right window w_R.]
Each window is a vector in an m^2-dimensional vector space. Normalization makes them unit length.
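The normalization and window-matching idea above can be tried on a toy scanline. The following is a minimal sketch, not the lecture's implementation: the function names, the 1-D windows, the synthetic pixel values and the convention that a left pixel at u corresponds to a right pixel at u - d are all assumptions made for illustration. Pure Python is used only to keep the example self-contained.

```python
import math

def normalize_window(window):
    """Subtract the window mean and scale the result to unit length."""
    mean = sum(window) / len(window)
    centered = [p - mean for p in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # Textureless window: nothing to normalize, return zeros.
        return [0.0] * len(window)
    return [c / norm for c in centered]

def ssd(a, b):
    """Sum of squared differences between two window vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def correlation(a, b):
    """Dot product; equals the cosine for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def best_disparity(left_row, right_row, x, half, max_d):
    """Compare the left window centered at x with right windows at x - d."""
    w_left = normalize_window(left_row[x - half : x + half + 1])
    scores = {}
    for d in range(max_d + 1):
        lo = x - d - half
        if lo < 0:
            break  # window would leave the image
        w_right = normalize_window(right_row[lo : lo + 2 * half + 1])
        scores[d] = (ssd(w_left, w_right), correlation(w_left, w_right))
    d_ssd = min(scores, key=lambda d: scores[d][0])
    d_nc = max(scores, key=lambda d: scores[d][1])
    return d_ssd, d_nc, scores

# Synthetic scanlines (made-up values): the right row is the left row shifted
# by 3 pixels, with a gain and offset change that normalization should cancel.
left_row = [10, 12, 30, 80, 40, 15, 11, 60, 90, 20, 10, 10, 70, 35, 12, 10]
shift = 3
right_row = [2.0 * left_row[min(i + shift, len(left_row) - 1)] + 5.0
             for i in range(len(left_row))]

d_ssd, d_nc, scores = best_disparity(left_row, right_row, x=8, half=2, max_d=5)
print(d_ssd, d_nc)
```

Because the normalized windows have unit length, the SSD equals 2 - 2 * correlation for every candidate disparity, so the SSD minimum and the correlation maximum pick the same disparity.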
Image Metrics
(Normalized) Sum of Squared Differences:
$$C_{SSD}(d) = \sum_{(u,v) \in W_m(x,y)} [\hat{I}_L(u,v) - \hat{I}_R(u-d,v)]^2 = \|w_L - w_R(d)\|^2$$
Normalized Correlation:
$$C_{NC}(d) = \sum_{(u,v) \in W_m(x,y)} \hat{I}_L(u,v)\, \hat{I}_R(u-d,v) = w_L \cdot w_R(d) = \cos\theta$$
Best disparity:
$$d^* = \arg\min_d \|w_L - w_R(d)\|^2 = \arg\max_d\, w_L \cdot w_R(d)$$

Correspondence Using Correlation
[Figure: left image and disparity map. Images courtesy of Point Grey Research]

Correspondence By Features
[Figure: LEFT IMAGE with line, corner and structure features; RIGHT IMAGE with the corresponding corner, line and structure features.]
Search in the right image… the disparity (dx, dy) is the displacement when the similarity measure is maximum.

Stereo Correspondences
[Figure: left scanline and right scanline; segments labeled Match, Match, Occlusion, Match, Disocclusion.]

Search Over Correspondences
[Figure: left scanline against right scanline, with occluded pixels and disoccluded pixels marked.]
Three cases:
– Sequential – cost of match
– Occluded – cost of no match
– Disoccluded – cost of no match

Stereo Matching with Dynamic Programming
[Figure: grid with the left scanline on one axis and the right scanline on the other; Start and End (Terminal) corners; occluded and dis-occluded pixel moves.]
Scan across the grid, computing the optimal cost for each node given its upper-left neighbors. Backtrack from the terminal to get the optimal path.
Dynamic programming yields the optimal path through the grid. This is the best set of matches that satisfies the ordering constraint.

Dense Stereo Matching: Examples
View extrapolation results [Matthies, Szeliski, Kanade '88]
[Figure: input, depth image, novel view.]

Dense Stereo Matching
Some other view extrapolation results
[Figure: input, depth image, novel view.]

Dense Stereo Matching
Compute certainty map from correlations
[Figure: input, depth map, certainty map.]

DP for Correspondence
Does this always work? When would it fail?
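The scanline dynamic program from the preceding slides can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the lecture's code: the squared intensity difference as match cost, the single constant `occlusion_cost` for both no-match cases, the function name `dp_scanline` and the toy intensities are all hypothetical choices.

```python
def dp_scanline(left, right, occlusion_cost):
    """Match two scanlines under the ordering constraint.

    Each grid node (i, j) is reached by one of three moves:
    match left[i-1] with right[j-1], leave a left pixel unmatched,
    or leave a right pixel unmatched (the two no-match cases).
    """
    n, m = len(left), len(right)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * occlusion_cost
        move[i][0] = "skip_left"
    for j in range(1, m + 1):
        cost[0][j] = j * occlusion_cost
        move[0][j] = "skip_right"

    # Forward pass: optimal cost of each node given its three predecessors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2, "match"),
                (cost[i - 1][j] + occlusion_cost, "skip_left"),
                (cost[i][j - 1] + occlusion_cost, "skip_right"),
            ]
            cost[i][j], move[i][j] = min(candidates, key=lambda c: c[0])

    # Backtrack from the terminal node to recover the optimal path.
    matches = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[n][m], matches

# Toy scanlines (made-up intensities): left pixel 0 and right pixel 2
# have no counterpart in the other image.
left = [10, 50, 90, 30, 70]
right = [50, 90, 20, 30, 70]
total, matches = dp_scanline(left, right, occlusion_cost=100)
print(total, matches)
```

The matched index pairs increase in both coordinates, which is exactly the ordering constraint; a scene that violates that constraint cannot be represented by any path through this grid.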
– Failure Example 1
– Failure Example 2
– Failure Example 3

Correspondence Problem 1
It is fundamentally ambiguous, even with stereo constraints.
[Figure from Forsyth & Ponce: the ordering constraint… and its failure.]

Correspondence Problem 2
Correspondence fails for smooth surfaces.
There is currently no good solution to the correspondence problem.

Correspondence Problem 3
– Regions without texture
– Highly specular surfaces
– Translucent objects

How Can We Improve Stereo?
Space-time stereo scanner uses unstructured light to aid in correspondence.
Result: dense 3D mesh (noisy)
[Figure credits: Prof Marc Levoy @ Stanford. By James Davis, Honda Research, now UCSC.]

Active Stereo (Structured Light)
[Figure: rectified images.]

Structured Light: 3-D Result
[Figure: 3D snapshot and 3D model. By James Davis, Honda Research.]

Time of Flight Sensor: Shutter
[Figures from http://www.3dvsystems.com]
Disclaimer
The Following Material Shall Not Be Required For the Midterm Exam

Layered Stereo
– Assign pixel to different "layers" (objects, sprites)
– Track each layer from frame to frame, compute plane eqn. and composite mosaic
– Re-compute pixel assignment by comparing original images to sprites
– Re-synthesize original or novel images from collection of sprites

Layered Stereo
Advantages:
– can represent occluded regions
– can represent transparent and border (mixed) pixels (sprites have alpha value per pixel)
– works on texture-less interior regions
Limitations:
– fails for high depth-complexity scenes

Fitting Planar Surfaces (with EM)

Expectation Maximization
3D model: $\Theta = \{\theta_1, \theta_2, \ldots, \theta_J\}$
Planar surface in 3D: $\theta_j = \langle \alpha_j, \beta_j \rangle$, with surface normal $\alpha_j \in \mathbb{R}^3$ and displacement $\beta_j$
Distance point-surface: $\mathrm{dist}(\theta_j, z_i) = \alpha_j \cdot z_i - \beta_j$

Mixture Measurement Model
Case 1: Measurement $z_i$ caused by plane $\theta_j$
$$p(z_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right\}$$
Case 2: Measurement $z_i$ caused by something else
$$p(z_i \mid \theta_*) = \frac{1}{z_{\max}} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} \right\}$$

Measurement Model with Correspondences
$$p(z_i \mid \Theta, c_1, \ldots, c_J, c_*) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ c_* \ln \frac{z_{\max}^2}{2\pi\sigma^2} + \sum_{j=1}^{J} c_j \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right] \right\}$$
Correspondence variables C: $c_*, c_j \in \{0, 1\}$ with $c_* + \sum_{j=1}^{J} c_j = 1$
$$p(Z \mid \Theta, C) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ c_{i*} \ln \frac{z_{\max}^2}{2\pi\sigma^2} + \sum_{j=1}^{J} c_{ij} \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right] \right\}$$

Expected Log-Likelihood Function
Starting from $p(Z \mid \Theta, C)$ above, after some simple math:
$$E_c[\ln p(Z, C \mid \Theta)] = \sum_i \left[ \ln \frac{1}{(J+1)\sqrt{2\pi\sigma^2}} - \frac{1}{2} E[c_{i*}] \ln \frac{z_{\max}^2}{2\pi\sigma^2} - \frac{1}{2} \sum_{j=1}^{J} E[c_{ij}] \frac{(\alpha_j \cdot z_i - \beta_j)^2}{\sigma^2} \right]$$
In the last term, $E[c_{ij}]$ is the probabilistic data association, and $(\alpha_j \cdot z_i - \beta_j)^2 / \sigma^2$ is the mapping term with known data association.

The EM Algorithm
$$E_c[\ln p(Z, C \mid \Theta)] = \mathrm{const} - \sum_i \sum_{j=1}^{J} E[c_{ij}] \frac{(\alpha_j \cdot z_i - \beta_j)^2}{2\sigma^2}$$
E-step: given plane params, compute $E[c_{ij}]$
M-step: given expectations, compute $\{\alpha_j, \beta_j\}$

Choosing the "Right" Number of Planes: AIC
[Figure: models with J = 0, 1, 2, 3, 4, 5 planes; data likelihood increases in one direction and prior probability in the other.]
$$\log p(J \mid d) = \mathrm{const} + \log p(d \mid J) + \log p(J)$$

Determining Number of Surfaces
[Animation: first model component, E-step, M-step, add model components, E/M steps, prune model; shown for J = 1, 2, 3.]

Layered Stereo
Resulting sprite collection
[Figure]

Layered Stereo
Estimated depth map
[Figure]

Motivation and Goals
[Figures: James Diebel]

Network of Constraints (Markov Random Field)
[Figure labels: Directions, Vertex Node, Edge Node, Face Node. James Diebel]

MRF Approach to Smoothing
Potential function: contains a sensor-model term and a surface prior.
[Equation not recoverable from the extraction: a sum over nodes i of a sensor-model term in $(x_i - x_i^0)$, plus a sum over edges j of a prior term in the normals $n_1$, $n_2$.]
The edge potential is important!
Minimize by conjugate gradient
– Optimize systems with tens of thousands of parameters in just a couple of seconds
– Time to converge is O(N), between 0.7 sec (25,000 nodes in the MRF) and 25 sec (900,000 nodes)
Diebel/Thrun, 2006

Possible Edge Potential Functions
[Figure]

Results: Smoothing
[Figures: James Diebel]

Movies…
[Movies in Windows Media Player, not reproduced.]
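To make the "potential minimized by conjugate gradient" idea concrete, here is a deliberately simplified sketch. It is not the Diebel/Thrun potential: it assumes a 1-D chain of nodes, a quadratic sensor-model term, and a quadratic prior on neighboring differences, so that the minimizer solves a linear system. The weights, function names and noisy step data are all made up for illustration. A quadratic prior is only one possible edge potential, and it blurs sharp edges.

```python
def apply_system(x, data_weight, smooth_weight):
    """Multiply x by A = data_weight * I + smooth_weight * L (chain Laplacian L).

    A x = data_weight * x0 is the zero-gradient condition of the assumed potential
      0.5 * data_weight * sum_i (x_i - x0_i)^2
    + 0.5 * smooth_weight * sum_i (x_{i+1} - x_i)^2.
    """
    out = [data_weight * xi for xi in x]
    for i in range(len(x) - 1):
        diff = x[i + 1] - x[i]
        out[i] -= smooth_weight * diff
        out[i + 1] += smooth_weight * diff
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(apply, b, max_iters=100, tol=1e-18):
    """Solve A x = b for a symmetric positive definite operator `apply`."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs < tol:
            break
        ap = apply(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def smooth_chain(measured, data_weight, smooth_weight):
    """Minimize the assumed quadratic potential for a chain of measurements."""
    b = [data_weight * m for m in measured]
    return conjugate_gradient(
        lambda v: apply_system(v, data_weight, smooth_weight), b)

def roughness(x):
    return sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))

# Made-up noisy step signal standing in for raw range measurements.
noisy = [0.1, -0.2, 0.15, -0.05, 0.0, 1.1, 0.9, 1.05, 0.95, 1.0]
smoothed = smooth_chain(noisy, data_weight=1.0, smooth_weight=2.0)
print([round(v, 3) for v in smoothed])
```

Each conjugate-gradient iteration needs only one application of the system operator, which touches every node and edge once; that locality is what makes this style of solver practical for large meshes.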