Algorithms & Applications in Computer Vision
Lihi Zelnik-Manor
Lecture 9: Optical Flow

Last Time: Alignment
• Homographies
• Rotational panoramas
• RANSAC
• Global alignment
• Warping
• Blending

Today: Motion and Flow
• Motion estimation
• Patch-based motion (optic flow)
• Regularization and line processes
• Parametric (global) motion
• Layered motion models

Optical Flow
Where did each pixel in image 1 go to in image 2?
(Pierre Kornprobst's demo)

Introduction
• Given a video sequence with the camera and/or objects moving, we can better understand the scene if we recover the motions of the camera and the objects.

Scene Interpretation
• How is the camera moving?
• How many moving objects are there?
• Which directions are they moving in?
• How fast are they moving?
• Can we recognize their type of motion (e.g. walking, running, etc.)?

Applications
• Recovering camera ego-motion. Result by MobilEye (www.mobileye.com)
• Motion segmentation. Result by: L. Zelnik-Manor, M. Machline, M. Irani, "Multi-body Segmentation: Revisiting Motion Consistency", to appear, IJCV
• Structure from motion (input video → reconstructed shape). Result by: L. Zhang, B. Curless, A. Hertzmann, S.M. Seitz, "Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo", ICCV'03

Examples of Motion Fields
• Forward motion
• Rotation
• Horizontal translation
• Closer objects appear to move faster!

Motion Field & Optical Flow Field
• Motion field = the real-world 3D motion
• Optical flow field = the projection of the motion field onto the 2D image
• A 3D motion vector in the scene projects through the lens onto the CCD as a 2D optical flow vector $\vec{u} = (u, v)$.

When does it break?
• A stationary screen that displays motion: the motion field is zero but the optical flow is not.
• Homogeneous objects generate zero optical flow even when they move.
• A fixed sphere under a changing light source: the motion field is zero but the shading, and hence the optical flow, changes.
• Non-rigid texture motion.

The Optical Flow Field
Still, in many cases it does work...
• Goal: find for each pixel a velocity vector $\vec{u} = (u, v)$ which says:
– how quickly the pixel is moving across the image
– in which direction it is moving

Estimating Optical Flow
• Assume the image intensity $I$ is constant: the point at $(x, y)$ at time $t$ moves to $(x + dx, y + dy)$ at time $t + dt$ without changing its brightness.

Brightness Constancy Equation
$$I(x, y, t) = I(x + dx,\; y + dy,\; t + dt)$$
First-order Taylor expansion:
$$I(x + dx, y + dy, t + dt) \approx I(x, y, t) + I_x\,dx + I_y\,dy + I_t\,dt$$
Simplifying notation:
$$I_x\,dx + I_y\,dy + I_t\,dt = 0$$
Divide by $dt$ and denote $u = dx/dt$, $v = dy/dt$:
$$I_x u + I_y v = -I_t$$
(When $I_t = 0$ the gradient and the flow are orthogonal.)
Problem I: one equation, two unknowns.
Problem II: "the aperture problem".

The Aperture Problem
• For points on a line of fixed intensity we can only recover the normal flow, the component along the image gradient. Between time $t$ and time $t + dt$, where did the blue point move to? The local measurement cannot tell.
• We need additional constraints.

Use Local Information
Sometimes enlarging the aperture can help.

Local Smoothness: Lucas-Kanade (1981)
Assume a constant $(u, v)$ in a small neighborhood. Every pixel in the neighborhood contributes one equation $I_x u + I_y v = -I_t$, and stacking them gives
$$\begin{pmatrix} I_{x_1} & I_{y_1} \\ I_{x_2} & I_{y_2} \\ \vdots & \vdots \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = -\begin{pmatrix} I_{t_1} \\ I_{t_2} \\ \vdots \end{pmatrix} \quad\Longleftrightarrow\quad A\vec{u} = \vec{b}$$

Lucas-Kanade (1981)
Goal: minimize $\|A\vec{u} - \vec{b}\|^2$.
Method: least squares via the normal equations,
$$\underbrace{A^T A}_{2\times 2}\,\vec{u} = \underbrace{A^T \vec{b}}_{2\times 1} \quad\Rightarrow\quad \vec{u} = \left(A^T A\right)^{-1} A^T \vec{b}$$

How does Lucas-Kanade behave?
$$A^T A = \begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix}$$
We want this matrix to be invertible, i.e., to have no zero eigenvalues.
• Edge: $A^T A$ becomes singular. All gradients point along $(I_x, I_y)$, so $(I_y, -I_x)$ is an eigenvector with eigenvalue 0.
• Homogeneous region: $\nabla I \approx 0$, so $A^T A \approx 0$ and both eigenvalues are low.
• Textured region: gradients in many directions give two high eigenvalues, and $A^T A$ is well conditioned.
Summary: edges ($A^T A$ singular) and homogeneous regions (low gradients, $A^T A \approx 0$) are bad; high texture is good.
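To make the normal-equations step concrete, here is a minimal numpy sketch of Lucas-Kanade on a single window; the derivative filters and the eigenvalue threshold are illustrative assumptions, not part of the original slides:

```python
import numpy as np

def lucas_kanade_window(I0, I1, eps=1e-6):
    """Estimate one (u, v) for a small patch, assuming constant flow inside it.

    I0, I1: grayscale float patches at times t and t+dt.
    Returns (u, v), or None when A^T A is near-singular
    (edge or homogeneous region -- the aperture problem).
    """
    # Spatial and temporal derivatives (simple finite differences).
    Ix = np.gradient(I0, axis=1)
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0

    # Stack the per-pixel constraints  Ix*u + Iy*v = -It  into A u = b.
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # N x 2
    b = -It.ravel()                                  # N

    # Normal equations: u = (A^T A)^{-1} A^T b.
    AtA = A.T @ A
    if np.linalg.eigvalsh(AtA).min() < eps:          # check both eigenvalues
        return None
    return np.linalg.solve(AtA, A.T @ b)
```

Checking the smaller eigenvalue of $A^T A$ before solving is exactly the edge/homogeneous/texture analysis above, and it is the same quantity the Shi-Tomasi "good features" criterion thresholds later in the lecture.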
Iterative Refinement
• Estimate the velocity at each pixel using one iteration of Lucas-Kanade estimation: $\vec{u} = (A^T A)^{-1} A^T \vec{b}$.
• Warp one image toward the other using the estimated flow field (easier said than done).
• Refine the estimate by repeating the process.

Optical Flow: Iterative Estimation
• Initial guess $u^0 = 0$; estimate the update $\hat{u}$ and set $u^1 = u^0 + \hat{u}$.
• Warp to $f_1(x - u^1)$; estimate $\hat{u}$ again and set $u^2 = u^1 + \hat{u}$.
• Warp to $f_1(x - u^2)$; estimate $\hat{u}$ again and set $u^3 = u^2 + \hat{u}$.
• Continue until $f_1(x - u^k)$ is aligned with $f_2(x)$.

• Some implementation issues:
– Warping is not easy (ensure that errors in warping are smaller than the estimate refinement).
– Warp one image and take derivatives of the other, so you don't need to re-compute the gradient after each iteration.
– It is often useful to low-pass filter the images before motion estimation, both for better derivative estimation and for better linear approximations to image intensity.

Other Break-downs
• Brightness constancy is not satisfied → correlation-based methods.
• A point does not move like its neighbors (what is the ideal window size?) → regularization-based methods.
• The motion is not small, so the Taylor expansion doesn't hold; aliasing → multi-scale estimation.

Optical Flow: Aliasing
Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which 'correspondence' is correct? For a small actual shift the nearest match is correct (no aliasing); for a large shift the nearest match can be incorrect (aliasing).
To overcome aliasing: coarse-to-fine estimation.

Multi-Scale Flow Estimation
Build Gaussian pyramids of images $I_t$ and $I_{t+1}$. A displacement of u = 10 pixels at full resolution becomes u = 5 pixels one level up, u = 2.5 pixels two levels up, and u = 1.25 pixels three levels up, small enough for the linearization to hold.
• Run Lucas-Kanade at the coarsest level.
• Warp & upsample.
• Run Lucas-Kanade at the next level, and so on down the pyramid.
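A sketch of this coarse-to-fine recipe, under the assumption that some single-level dense estimator `estimate_flow(I0, I1) -> (u, v)` (e.g., windowed Lucas-Kanade run at every pixel) is supplied by the caller; scipy handles the blurring, warping, and upsampling:

```python
import numpy as np
from scipy import ndimage

def downsample(img):
    # One Gaussian-pyramid level: blur, then keep every second pixel.
    return ndimage.gaussian_filter(img, sigma=1.0)[::2, ::2]

def warp(img, u, v):
    # Resample img at (x + u, y + v) so it aligns with the reference frame.
    ys, xs = np.indices(img.shape, dtype=float)
    return ndimage.map_coordinates(img, [ys + v, xs + u], order=1, mode='nearest')

def coarse_to_fine_flow(I0, I1, estimate_flow, levels=4):
    pyr0, pyr1 = [I0.astype(float)], [I1.astype(float)]
    for _ in range(levels - 1):
        pyr0.append(downsample(pyr0[-1]))
        pyr1.append(downsample(pyr1[-1]))

    u = np.zeros_like(pyr0[-1])
    v = np.zeros_like(pyr0[-1])
    for L0, L1 in zip(reversed(pyr0), reversed(pyr1)):
        if u.shape != L0.shape:
            # Upsample the coarse flow and double it: at the finer level,
            # the same physical shift spans twice as many pixels.
            zoom = np.array(L0.shape) / np.array(u.shape)
            u = 2.0 * ndimage.zoom(u, zoom, order=1)
            v = 2.0 * ndimage.zoom(v, zoom, order=1)
        # Warp I1 toward I0 with the current flow, estimate the residual, add.
        du, dv = estimate_flow(L0, warp(L1, u, v))
        u, v = u + du, v + dv
    return u, v
```

Note that only the residual flow is estimated per level, which keeps each Lucas-Kanade step inside the small-motion regime where the Taylor expansion holds.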
Spatial Coherence Assumption [Black]
• Neighboring points in the scene typically belong to the same surface and hence typically have similar motions.
• Since they also project to nearby points in the image, we expect spatial coherence in image flow.

Formalize this Idea [Black]
Noisy 1D signal: noisy measurements $p(x)$ of an underlying signal.

Regularization: Membrane Model [Black]
Find the "best fitting" smoothed function $q(x)$. Minimize
$$E(q) = \underbrace{\sum_{x=1}^{N} \big(q(x) - p(x)\big)^2}_{\text{faithful to the data}} + \lambda \underbrace{\sum_{x=1}^{N-1} \big(q(x+1) - q(x)\big)^2}_{\text{spatial smoothness assumption}}$$

Bayesian Interpretation [Black]
The same energy arises from $\Pr(q \mid p) \propto \Pr(p \mid q)\,\Pr(q)$ with Gaussian models
$$p(x) = q(x) + \eta, \quad \eta \sim N(0, \sigma_1^2), \qquad q(x+1) - q(x) \sim N(0, \sigma_2^2)$$
$$\Pr(p \mid q) = \prod_{x=1}^{N} \frac{1}{\sigma_1\sqrt{2\pi}} \exp\!\left(-\frac{\big(p(x) - q(x)\big)^2}{2\sigma_1^2}\right), \qquad \Pr(q) = \prod_{x=1}^{N-1} \frac{1}{\sigma_2\sqrt{2\pi}} \exp\!\left(-\frac{\big(q(x+1) - q(x)\big)^2}{2\sigma_2^2}\right)$$
Taking the negative log recovers $E(q)$.

Discontinuities [Black]
What about a discontinuity in the signal? What is happening there? The quadratic smoothness term smooths right across it. What can we do?

Robust Estimation [Black]
Noise distributions are often non-Gaussian, with much heavier tails. Noise samples from the tails are called outliers.
• Sources of outliers (multiple motions):
– specularities / highlights
– jpeg artifacts / interlacing / motion blur
– multiple motions (occlusion boundaries, transparency)
In velocity space $(u_1, u_2)$, multiple motions show up as multiple clusters.

Occlusion [Black]
Occlusion, disocclusion, and shear all produce multiple motions within a finite region.
• Coherent motion: the residuals are possibly Gaussian.
• Multiple motions: the residuals are definitely not Gaussian.

Weak Membrane Model [Black]
Introduce a binary line process $l(x) \in \{0, 1\}$ that can switch the smoothness term off at a discontinuity, at a fixed penalty $\mu$:
$$E(q, l) = \sum_{x=1}^{N} \big(q(x) - p(x)\big)^2 + \lambda \sum_{x=1}^{N-1} \Big[ l(x)\big(q(x+1) - q(x)\big)^2 + \mu\big(1 - l(x)\big) \Big]$$

Analog Line Process [Black]
Relax the line process to $0 \le l(x) \le 1$ with a penalty function $\Psi(l(x))$; each value of $l$ selects one member of a family of quadratics:
$$E(q, l) = \sum_{x=1}^{N} \big(q(x) - p(x)\big)^2 + \lambda \sum_{x=1}^{N-1} \Big[ l(x)\big(q(x+1) - q(x)\big)^2 + \Psi\big(l(x)\big) \Big]$$
The infimum over $l$ defines a robust error function $\rho$, and the minima are the same:
$$E(q) = \sum_{x=1}^{N} \big(q(x) - p(x)\big)^2 + \lambda \sum_{x=1}^{N-1} \rho\big(q(x+1) - q(x), \sigma_2\big)$$

Least Squares Estimation [Black]
Standard least-squares estimation allows too much influence for outlying points:
$$E(m) = \sum_i \rho(x_i), \qquad \rho(x_i) = (x_i - m)^2, \qquad \text{influence } \psi(x_i) \propto (x_i - m)$$

Robust Regularization [Black]
Treat large spatial derivatives as outliers. Minimize
$$E(q) = \sum_{x=1}^{N} \rho\big(q(x) - p(x), \sigma_1\big) + \lambda \sum_{x=1}^{N-1} \rho\big(q(x+1) - q(x), \sigma_2\big)$$

Robust Estimation [Black]
Problem: least-squares estimators penalize deviations between data and model with a quadratic error function, and are therefore extremely sensitive to outliers.
Redescending error functions (e.g., Geman-McClure), whose influence functions fall back toward zero for large residuals, help to reduce the influence of outlying measurements.
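To make the robust-penalty discussion concrete, a small sketch of the Geman-McClure penalty, its redescending influence function, and a plain gradient-descent minimizer for the robust 1D regularization energy above; the step size and scale parameters are arbitrary assumptions, and practical schemes anneal the scales:

```python
import numpy as np

def rho_gm(x, sigma):
    """Geman-McClure penalty: ~quadratic for |x| << sigma, saturates at 1."""
    return x**2 / (x**2 + sigma**2)

def psi_gm(x, sigma):
    """Influence function d(rho)/dx: peaks near sigma, then redescends to 0,
    so large residuals (outliers, discontinuities) lose their influence."""
    return 2 * x * sigma**2 / (x**2 + sigma**2)**2

def robust_smooth(p, lam=1.0, s1=0.1, s2=0.1, step=0.1, iters=2000):
    """Minimize E(q) = sum rho(q - p, s1) + lam * sum rho(q[x+1] - q[x], s2)
    by gradient descent. Unlike the quadratic membrane model, discontinuities
    in p are preserved rather than smoothed over."""
    q = p.astype(float).copy()
    for _ in range(iters):
        grad = psi_gm(q - p, s1)
        g = lam * psi_gm(np.diff(q), s2)   # smoothness term on q[x+1] - q[x]
        grad[:-1] -= g                     # d/dq[x]   of each pairwise term
        grad[1:] += g                      # d/dq[x+1] of each pairwise term
        q -= step * grad
    return q
```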
Regularization: Horn and Schunck (1981)
Add a global smoothness term.
Smoothness error:
$$E_s = \iint_D \left(u_x^2 + u_y^2 + v_x^2 + v_y^2\right) dx\,dy$$
Error in the brightness constancy equation:
$$E_c = \iint_D \left(I_x u + I_y v + I_t\right)^2 dx\,dy$$
Minimize $E_c + \lambda E_s$; solve by calculus of variations.

Robust Estimation: Black & Anandan (1993)
Regularization can over-smooth across edges. Use "smarter" regularization: minimize
$$\iint_D \underbrace{\rho_1\big(I_x u + I_y v + I_t\big)}_{\text{brightness constancy}} + \lambda\,\underbrace{\big[\rho_2(u_x, u_y) + \rho_2(v_x, v_y)\big]}_{\text{smoothness}}\; dx\,dy$$

Optimization
• Gradient descent
• Coarse-to-fine (pyramid)
• Deterministic annealing

Example [Black]
Multi-resolution registration; optical flow results.

Parametric Motion Estimation
Global (parametric) motion models:
• 2D models: affine; quadratic; planar projective transform (homography)
• 3D models: instantaneous camera motion models; homography + epipole; plane + parallax

Motion Models [Szeliski]
• Translation: 2 unknowns
• Affine: 6 unknowns
• Perspective: 8 unknowns
• 3D rotation: 3 unknowns

Example: Affine Motion
For a panning camera or planar surfaces:
$$u(x, y) = p_1 + p_2 x + p_3 y, \qquad v(x, y) = p_4 + p_5 x + p_6 y$$
Substituting into the brightness constancy equation gives one linear constraint per pixel:
$$I_x(p_1 + p_2 x + p_3 y) + I_y(p_4 + p_5 x + p_6 y) + I_t \approx 0$$
Only 6 parameters to solve for → better results. Least-squares minimization over all pixels:
$$\mathrm{Err}(\vec{p}) = \sum \Big[ I_x(p_1 + p_2 x + p_3 y) + I_y(p_4 + p_5 x + p_6 y) + I_t \Big]^2$$

Last Lecture: Alignment / Motion Warping
• "Alignment": assuming we know the correspondences $(x_i, y_i) \leftrightarrow (x_i', y_i')$, how do we get the transformation? E.g., an affine model in absolute coordinates:
$$\begin{pmatrix} x_i' \\ y_i' \end{pmatrix} = \begin{pmatrix} m_1 & m_2 \\ m_3 & m_4 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$$
• Expressed in terms of the absolute coordinates of corresponding points; features are generally presumed to be detected separately in each frame.

Today: "Flow", "Parametric Motion"
• Two views presumed in temporal sequence: track, or analyze the spatio-temporal gradient.
• Sparse or dense in the first frame; search in the second frame.
• Motion models are expressed in terms of the position change $(u_i, v_i)$ at $(x_i, y_i)$.
Previous alignment model vs. the displacement model:
$$u(x, y) = a_1 + a_2 x + a_3 y, \quad v(x, y) = a_4 + a_5 x + a_6 y \;\Longleftrightarrow\; \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} a_2 & a_3 \\ a_5 & a_6 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} a_1 \\ a_4 \end{pmatrix}$$
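A minimal numpy sketch of the affine least-squares step: each pixel contributes one row of the linearized brightness-constancy constraint in the six parameters. This is one linearization only; the full scheme from the slides iterates it inside a coarse-to-fine loop:

```python
import numpy as np

def affine_motion(I0, I1):
    """Solve  min_p  sum [ Ix(p1 + p2 x + p3 y) + Iy(p4 + p5 x + p6 y) + It ]^2.

    Returns p = (p1..p6) with u = p1 + p2 x + p3 y and v = p4 + p5 x + p6 y.
    """
    Ix = np.gradient(I0, axis=1).ravel()
    Iy = np.gradient(I0, axis=0).ravel()
    It = (I1 - I0).ravel()
    ys, xs = np.indices(I0.shape)
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)

    # One row per pixel: [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] . p = -It
    A = np.stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y], axis=1)
    p, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return p
```

With only six unknowns constrained by every pixel, the system is massively over-determined, which is why the global model is so much more stable than per-pixel flow.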
Other 2D Motion Models [Szeliski]
Quadratic (instantaneous approximation to planar motion):
$$u = q_1 + q_2 x + q_3 y + q_7 x^2 + q_8 xy$$
$$v = q_4 + q_5 x + q_6 y + q_7 xy + q_8 y^2$$
Projective (exact planar motion):
$$x' = \frac{h_1 + h_2 x + h_3 y}{h_7 + h_8 x + h_9 y}, \qquad y' = \frac{h_4 + h_5 x + h_6 y}{h_7 + h_8 x + h_9 y}$$
and $u = x' - x$, $v = y' - y$.

3D Motion Models [Szeliski]
Instantaneous camera motion, with global parameters $\Omega_X, \Omega_Y, \Omega_Z, T_X, T_Y, T_Z$ and local parameter $Z(x, y)$:
$$u = -xy\,\Omega_X + (1 + x^2)\,\Omega_Y - y\,\Omega_Z + \frac{T_X - T_Z\,x}{Z}$$
$$v = -(1 + y^2)\,\Omega_X + xy\,\Omega_Y + x\,\Omega_Z + \frac{T_Y - T_Z\,y}{Z}$$
Homography + epipole, with global parameters $h_1, \ldots, h_9, t_1, t_2, t_3$ and local parameter $\gamma(x, y)$:
$$x' = \frac{h_1 x + h_2 y + h_3 + \gamma t_1}{h_7 x + h_8 y + h_9 + \gamma t_3}, \qquad y' = \frac{h_4 x + h_5 y + h_6 + \gamma t_2}{h_7 x + h_8 y + h_9 + \gamma t_3}$$
and $u = x' - x$, $v = y' - y$.
Residual planar parallax motion, with global parameters $t_1, t_2, t_3$ and local parameter $\gamma(x, y)$:
$$u = x' - x = \frac{\gamma}{1 + \gamma t_3}\,(t_3 x - t_1), \qquad v = y' - y = \frac{\gamma}{1 + \gamma t_3}\,(t_3 y - t_2)$$

Segmentation of Affine Motion
Input → segmentation result. Result by: L. Zelnik-Manor, M. Irani, "Multi-frame estimation of planar motion", PAMI 2000

Panoramas
Input video; motion estimation by Andrew Zisserman's group.

Stabilization
Result by: L. Zelnik-Manor, M. Irani, "Multi-frame estimation of planar motion", PAMI 2000

Discrete Search vs. Gradient Based [Szeliski]
Consider image $I$ translated by $(u_0, v_0)$:
$$I_0(x, y) = I(x, y), \qquad I_1(x + u_0, y + v_0) = I(x, y) + \eta_1(x, y)$$
$$E(u, v) = \sum_{x, y} \big(I_1(x + u, y + v) - I_0(x, y)\big)^2 = \sum_{x, y} \big(I(x + u - u_0, y + v - v_0) + \eta_1(x, y) - I(x, y)\big)^2$$
• The discrete search method simply searches for the best estimate.
• The gradient method linearizes the intensity function and solves for the estimate.

Correlation and SSD [Szeliski]
For larger displacements, do template matching (see the sketch after the tracker below):
– Define a small area around a pixel as the template.
– Match the template against each pixel within a search area in the next image.
– Use a match measure such as correlation, normalized correlation, or sum-of-squared differences.
– Choose the maximum (or minimum) as the match.
– Refine to a sub-pixel estimate (Lucas-Kanade).

Shi-Tomasi Feature Tracker [Szeliski]
1. Find good features (minimum eigenvalue of the 2×2 Hessian).
2. Use Lucas-Kanade to track with pure translation.
3. Use affine registration against the first feature patch.
4. Terminate tracks whose dissimilarity gets too large.
5. Start new tracks when needed.

Tracking result.

How well do these techniques work?
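To illustrate the discrete-search side of the comparison (the correlation/SSD slide above), a brute-force SSD block matcher for one template; the window and search radii are arbitrary assumptions:

```python
import numpy as np

def ssd_match(I0, I1, y0, x0, half=7, search=15):
    """Find the integer displacement of the (2*half+1)^2 block centered at
    (y0, x0) in I0 by exhaustive SSD search over I1. Returns (u, v)."""
    tpl = I0[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    best, best_uv = np.inf, (0, 0)
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            y, x = y0 + dv, x0 + du
            if (y - half < 0 or x - half < 0 or
                    y + half + 1 > I1.shape[0] or x + half + 1 > I1.shape[1]):
                continue  # candidate window falls off the image
            cand = I1[y - half:y + half + 1, x - half:x + half + 1]
            ssd = np.sum((cand - tpl) ** 2)
            if ssd < best:
                best, best_uv = ssd, (du, dv)
    return best_uv  # integer (u, v); refine to sub-pixel with Lucas-Kanade
```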
A Database and Evaluation Methodology for Optical Flow
Simon Baker, Daniel Scharstein, J.P. Lewis, Stefan Roth, Michael Black, and Richard Szeliski
ICCV 2007, http://vision.middlebury.edu/flow/

Limitations of Yosemite [Szeliski]
• Yosemite was the only sequence used for quantitative evaluation (image 7, image 8, ground-truth flow, flow color coding).
• Limitations: very simple and synthetic; small, rigid motion; minimal motion discontinuities/occlusions.
http://www.youtube.com/watch?v=0xRDFf7UtQE
• Current challenges: non-rigid motion; real sensor noise; complex natural scenes; motion discontinuities.
• We need more challenging and more realistic benchmarks.

Realistic Synthetic Imagery (Grove, Rock) [Szeliski]
• Randomly generate scenes with "trees" and "rocks".
• Significant occlusions, motion, texture, and blur.
• Rendered using Mental Ray and a "lens shader" plugin.

Modified Stereo Imagery (Moebius, Venus) [Szeliski]
• Recrop and resample ground-truth stereo datasets to have appropriate motion for optical flow.

Dense Flow with Hidden Texture [Szeliski]
• Paint the scene with textured fluorescent paint.
• Take two images: one in visible light, one in UV light.
• Move the scene in very small steps using a robot.
• Generate ground truth by tracking the UV images.

Experimental Results
http://vision.middlebury.edu/flow/

Conclusions [Szeliski]
• Difficulty: the data is substantially more challenging than Yosemite.
• Diversity: substantial variation in difficulty across the various datasets.
• Motion ground truth vs. interpolation: the best algorithms for one are not the best for the other.
• Comparison with stereo: the performance of existing flow algorithms appears weak.

Layered Scene Representations

Motion Representations [Szeliski]
• How can we describe this scene?

Block-Based Motion Prediction [Szeliski]
• Break the image up into square blocks.
• Estimate a translation for each block.
• Use this to predict the next frame and code the difference (MPEG-2).

Layered Motion [Szeliski]
• Break the image sequence up into "layers" and describe each layer's motion.

Layered Representation
For scenes with multiple affine motions:
• Estimate the dominant motion parameters.
• Reject pixels which do not fit.
• Iterate to convergence, then restart on the remaining pixels.

Layered Motion [Szeliski]
• Advantages: can represent occlusions / disocclusions; each layer's motion can be smooth; yields a video segmentation for semantic processing.
• Difficulties: how do we determine the correct number of layers? how do we assign pixels? how do we model the motion?

How do we estimate the layers? [Szeliski]
1. Compute coarse-to-fine flow.
2. Estimate affine motion in blocks (regression).
3. Cluster with k-means.
4. Assign pixels to the best-fitting affine region.
5. Re-estimate the affine motions in each region, and iterate.
(A code sketch of steps 2-3 appears at the end of these notes.)

Layer Synthesis [Szeliski]
• For each layer: stabilize the sequence with the affine motion, then compute the median value at each pixel.
• Determine occlusion relationships.

Results [Szeliski]

Recent Results: SIFT Flow

Recent GPU Implementation
• http://gpu4vision.icg.tugraz.at/
• Real-time flow exploiting a robust norm + regularized mapping.

Today: Motion and Flow
• Motion estimation
• Patch-based motion (optic flow)
• Regularization and line processes
• Parametric (global) motion
• Layered motion models

Slide Credits
• Trevor Darrell
• Rick Szeliski
• Michael Black
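Appendix: the code sketch referenced from the layer-estimation recipe. It covers steps 2-3 only (blockwise affine regression on a precomputed dense flow, then k-means over the parameter vectors); the block size, number of layers, and plain k-means loop are all illustrative assumptions:

```python
import numpy as np

def fit_affine(u, v, xs, ys):
    """Regress u = a1 + a2 x + a3 y and v = a4 + a5 x + a6 y to a flow patch.
    Global image coordinates xs, ys keep parameters comparable across blocks."""
    A = np.stack([np.ones(u.size), xs.ravel(), ys.ravel()], axis=1)
    au, *_ = np.linalg.lstsq(A, u.ravel(), rcond=None)
    av, *_ = np.linalg.lstsq(A, v.ravel(), rcond=None)
    return np.concatenate([au, av])          # 6-vector (a1..a6)

def propose_layers(u, v, block=16, K=3, iters=20, seed=0):
    """Steps 2-3: per-block affine regression, then k-means on the 6-D
    affine parameter vectors to propose K candidate motion layers."""
    ys, xs = np.indices(u.shape)
    params = []
    for by in range(0, u.shape[0] - block + 1, block):
        for bx in range(0, u.shape[1] - block + 1, block):
            sl = (slice(by, by + block), slice(bx, bx + block))
            params.append(fit_affine(u[sl], v[sl], xs[sl], ys[sl]))
    P = np.array(params)

    # Plain k-means over parameter space.
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), K, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((P[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = P[labels == k].mean(axis=0)
    return centers, labels   # candidate layer motions + block assignments
```

Steps 4-5 of the recipe would then reassign every pixel to the affine model with the lowest residual and re-fit each model on its own support.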