776 Computer Vision
Jan-Michael Frahm, Spring 2012

Feature point extraction
• Local image patches fall into three cases: homogeneous, edge, corner
• Find points for which the smallest eigenvalue of M is maximized

Comparing image regions
• Compare intensities pixel-by-pixel between I(x,y) and I′(x,y)
• Dissimilarity measure: Sum of Squared Differences
    SSD = Σ_{x,y} (I′(x,y) − I(x,y))²
• Similarity measure: Zero-mean Normalized Cross Correlation
    ZNCC = Σ_{x,y} (I′(x,y) − Ī′)(I(x,y) − Ī) / ( √(Σ_{x,y} (I′(x,y) − Ī′)²) · √(Σ_{x,y} (I(x,y) − Ī)²) )

Feature point extraction
• Approximate the SSD for a small displacement Δ
  o Image difference per pixel: I(x+Δ) − I(x) ≈ ∇I(x)ᵀΔ, so the squared difference is (∇I(x)ᵀΔ)²
• SSD over a window W:
    SSD(Δ) ≈ Σ_W (∇IᵀΔ)² = Δᵀ M Δ,   where M = Σ_W ∇I ∇Iᵀ = Σ_W [ I_x²  I_xI_y ; I_xI_y  I_y² ]

Harris corner detector
• Use a small local window to accumulate M
• Maximize "cornerness": det(M) − k·trace(M)²  (typically k ≈ 0.04)
• Only use local maxima; subpixel accuracy through second-order surface fitting
• Select the strongest features over the whole image and over each tile (e.g. 1000/image, 2/tile)

Simple matching
• For each corner in image 1, find the corner in image 2 that is most similar (using SSD or NCC), and vice versa
• Only compare geometrically compatible points
• Keep mutual best matches
• What transformations does this work for?

Feature matching: example
(figure: pairwise NCC scores between five corners in each image; mutual best matches are kept)
• What transformations does this work for? What level of transformation do we need?
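The Harris cornerness measure above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: it uses a plain box filter for the window sums (a Gaussian weighting is common in practice), central-difference gradients, and the usual k = 0.04.

```python
# Minimal sketch of the Harris cornerness det(M) - k*trace(M)^2,
# assuming a grayscale image as a NumPy array; box-filter windowing
# and k = 0.04 are illustrative choices.
import numpy as np

def harris_cornerness(img, k=0.04, win=3):
    img = img.astype(float)
    Iy, Ix = np.gradient(img)              # image gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):
        # sum over a win x win window around each pixel
        out = np.zeros_like(a)
        r = win // 2
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    # per-pixel M = [Sxx Sxy; Sxy Syy]; cornerness = det(M) - k*trace(M)^2
    return Sxx * Syy - Sxy * Sxy - k * (Sxx + Syy) ** 2

# synthetic test image: a bright quadrant whose corner sits at (10, 10)
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
R = harris_cornerness(img)
```

As the lecture's homogeneous/edge/corner classification predicts, the response is large and positive at the corner, negative along the edges (only one dominant gradient direction), and near zero in the flat region.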
Feature tracking
• Identify features and track them over video
  o Small difference between consecutive frames
  o Potentially large difference overall
• Standard approach: KLT (Kanade-Lucas-Tomasi)

Feature tracking
• Establish correspondences between identical salient points in multiple images

Good features to track
• Use the same window in feature selection as for tracking itself
• Compute motion assuming it is small, and differentiate
• Affine motion is also possible, but a bit harder (6×6 system instead of 2×2)

Example
• Simple displacement is sufficient between consecutive frames, but not to compare to the reference template

Example
• Synthetic example

Good features to keep tracking
• Perform affine alignment between first and last frame
• Stop tracking features with too large errors

Optical flow
• Brightness constancy assumption (small motion)
• 1D example; possibility for iterative refinement

Optical flow
• Brightness constancy assumption (small motion)
• 2D example: one constraint, I_x u + I_y v + I_t = 0, with two unknowns (u, v): the "aperture" problem
(figure: isophotes I(t) = I and I(t+1) = I)

The Aperture Problem
• Let M = Σ ∇I ∇Iᵀ and b = −[ Σ I_x I_t ; Σ I_y I_t ]
• Algorithm: at each pixel compute U = (u, v) by solving M U = b
• M is singular if all gradient vectors point in the same direction
  o e.g., along an edge
  o of course, trivially singular if the summation is over a single pixel or there is no texture
  o i.e., only normal flow is available (aperture problem)
• Corners and textured areas are OK
Motion estimation
Slide credit: S. Seitz, R. Szeliski

Optical flow
• How to deal with the aperture problem? (3 constraints if the color gradients are different)
• Assume neighbors have the same displacement
Slide credit: S. Seitz, R. Szeliski

(figures: SSD surfaces for a textured area, an edge, and a homogeneous area)
Slide credit: S. Seitz, R.
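The system M·U = b above can be solved directly for one window. A minimal sketch, assuming pure translation and a single window over the whole patch (a real KLT tracker iterates and warps, which is omitted here); the quadratic test pattern and the shift (0.5, 0.25) are illustrative choices.

```python
# Minimal sketch of the Lucas-Kanade 2x2 system: M U = b with
# M = sum(grad I grad I^T) and b = -[sum(Ix*It); sum(Iy*It)].
import numpy as np

def lk_flow(I0, I1):
    I0, I1 = I0.astype(float), I1.astype(float)
    Iy, Ix = np.gradient(I0)          # spatial gradients of frame 0
    It = I1 - I0                      # temporal derivative
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(M, b)      # flow (u, v)

# a textured (quadratic) pattern shifted by (0.5, 0.25) pixels;
# its gradients vary in direction, so M is well-conditioned
y, x = np.mgrid[0:21, 0:21].astype(float)
I0 = (x - 10.0) ** 2 + (y - 10.0) ** 2
I1 = (x - 10.5) ** 2 + (y - 10.25) ** 2
u, v = lk_flow(I0, I1)                # approximately (0.5, 0.25)
```

Replacing the quadratic with a linear ramp makes all gradients parallel, M singular, and `solve` fail: exactly the aperture problem described above.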
Szeliski

Lucas-Kanade
• Assume neighbors have the same displacement; least squares:
    min_{u,v} Σ_W (I_x u + I_y v + I_t)²

Revisiting the small motion assumption
• Is this motion small enough?
  o Probably not: it's much larger than one pixel (2nd-order terms dominate)
  o How might we solve this problem?
* From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003

Reduce the resolution!
* From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003

Coarse-to-fine optical flow estimation
(slides from Bradski and Thrun)
• A motion of u = 10 pixels between image I_{t−1} and image I becomes u = 5, 2.5, and 1.25 pixels at successive levels of the Gaussian pyramids of I_{t−1} and I
• Run iterative L-K at the coarsest level, then repeatedly warp & upsample and run iterative L-K at the next finer level

Gain-Adaptive KLT-Tracking
• Jointly estimate the displacement (dx, dy) and the video gain λ per frame
(figures: video with fixed gain vs. video with auto-gain)
• Data-parallel implementation on GPU [Sinha, Frahm, Pollefeys, Genc MVA'0]
• Simultaneous tracking and radiometric calibration [Kim, Frahm, Pollefeys ICCV'07]
  o But: not data parallel, hence hard for GPU acceleration
• Block-Jacobi iterations [Zach, Gallup, Frahm CVGPU'08]
  o Data parallel, very efficient on GPU

Gain Estimation [Zach, Gallup, Frahm CVGPU'08]
• Camera-reported (blue) and estimated (red) gains

Limits of the gradient method
• Fails when the intensity structure in the window is poor
• Fails when the displacement is large (typical operating range is motion of 1 pixel)
  o Linearization of brightness is suitable only for small displacements
• Also, brightness is not strictly constant in images
  o Actually less problematic than it appears, since we can pre-filter images to make them look similar
Slide credit: S. Seitz, R. Szeliski

Limitations of Yosemite
• Only sequence used for quantitative evaluation
• Limitations:
  o Very simple and synthetic
  o Small, rigid motion
  o Minimal motion discontinuities/occlusions
(figures: Image 7, Image 8, Ground-Truth Flow, Flow Color Coding)
Slide credit: S. Seitz, R.
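The coarse-to-fine scheme above can be sketched compactly in 1D. A minimal illustration, not the lecture's code: pure translation is assumed, the pyramid is built by crude pair-averaging, and the pyramid depth, iteration count, and Gaussian test signal are all illustrative choices.

```python
# Minimal 1D sketch of coarse-to-fine (pyramidal) Lucas-Kanade:
# estimate at the coarsest level, then warp & upsample level by level.
import numpy as np

def pyr_down(s):
    # crude factor-2 downsampling by averaging neighboring samples
    return 0.5 * (s[0::2] + s[1::2])

def lk_step(I0, I1):
    # one linearized L-K update: u = -sum(Ix*It) / sum(Ix^2)
    Ix = np.gradient(I0)
    It = I1 - I0
    return -np.sum(Ix * It) / np.sum(Ix * Ix)

def coarse_to_fine(I0, I1, levels=2, iters=5):
    p0, p1 = [I0], [I1]
    for _ in range(levels):
        p0.append(pyr_down(p0[-1]))
        p1.append(pyr_down(p1[-1]))
    u = 0.0
    for I0l, I1l in zip(reversed(p0), reversed(p1)):
        u *= 2.0                               # upsample the estimate
        grid = np.arange(len(I0l), dtype=float)
        for _ in range(iters):                 # iterative refinement
            I1w = np.interp(grid + u, grid, I1l)   # warp I1 by current u
            u += lk_step(I0l, I1w)
    return u

# Gaussian bump shifted by 3 pixels: larger than one linearized step
# handles well, but easily recovered through the pyramid
xs = np.arange(128, dtype=float)
I0 = np.exp(-(xs - 64.0) ** 2 / 50.0)
I1 = np.exp(-(xs - 67.0) ** 2 / 50.0)
u_est = coarse_to_fine(I0, I1)
```

At the coarsest level the 3-pixel shift has shrunk to 0.75 pixels, back inside the operating range of the gradient method noted above.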
Szeliski

Limitations of Yosemite
• Only sequence used for quantitative evaluation
• Current challenges:
  o Non-rigid motion
  o Real sensor noise
  o Complex natural scenes
  o Motion discontinuities
• Need more challenging and more realistic benchmarks
(figures: Image 7, Image 8, Ground-Truth Flow, Flow Color Coding)
Slide credit: S. Seitz, R. Szeliski

Realistic synthetic imagery (Grove, Rock)
• Randomly generate scenes with "trees" and "rocks"
• Significant occlusions, motion, texture, and blur
• Rendered using Mental Ray and a "lens shader" plugin
Slide credit: S. Seitz, R. Szeliski

Modified stereo imagery (Moebius, Venus)
• Re-crop and resample ground-truth stereo datasets to have appropriate motion for optical flow
Slide credit: S. Seitz, R. Szeliski

Dense flow with hidden texture
• Paint the scene with textured fluorescent paint
• Take 2 images: one in visible light, one in UV light
• Move the scene in very small steps using a robot
• Generate ground truth by tracking the UV images
(figures: Visible, UV, Setup, Lights, Image, Cropped)
Slide credit: S. Seitz, R. Szeliski

Experimental results
• Algorithms:
  o Pyramid LK: OpenCV-based implementation of Lucas-Kanade on a Gaussian pyramid
  o Black and Anandan: authors' implementation
  o Bruhn et al.: our implementation
  o MediaPlayer™: code used for video frame-rate upsampling in Microsoft MediaPlayer
  o Zitnick et al.: authors' implementation
Slide credit: S. Seitz, R. Szeliski

Conclusions
• Difficulty: data substantially more challenging than Yosemite
• Diversity: substantial variation in difficulty across the various datasets
• Motion GT vs. interpolation: the best algorithms for one are not the best for the other
• Comparison with stereo: performance of existing flow algorithms appears weak
Slide credit: S. Seitz, R. Szeliski

Motion representations
• How can we describe this scene?
Slide credit: S. Seitz, R.
Szeliski

Block-based motion prediction
• Break the image up into square blocks
• Estimate a translation for each block
• Use this to predict the next frame, and code the difference (MPEG-2)
Slide credit: S. Seitz, R. Szeliski

Layered motion
• Break the image sequence up into "layers"
• Describe each layer's motion
Slide credit: S. Seitz, R. Szeliski

Layered motion
• Advantages:
  o can represent occlusions / disocclusions
  o each layer's motion can be smooth
  o video segmentation for semantic processing
• Difficulties:
  o how do we determine the correct number of layers?
  o how do we assign pixels?
  o how do we model the motion?
Slide credit: S. Seitz, R. Szeliski

Layers for video summarization
Slide credit: S. Seitz, R. Szeliski

Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
Slide credit: S. Seitz, R. Szeliski

What are layers? [Wang & Adelson, 1994]
• intensities
• alphas
• velocities
Slide credit: S. Seitz, R. Szeliski

How do we form them?
Slide credit: S. Seitz, R. Szeliski

How do we estimate the layers?
1. compute coarse-to-fine flow
2. estimate affine motion in blocks (regression)
3. cluster with k-means
4. assign pixels to the best-fitting affine region
5. re-estimate affine motions in each region…
Slide credit: S. Seitz, R. Szeliski

Layer synthesis
• For each layer:
  o stabilize the sequence with the affine motion
  o compute the median value at each pixel
• Determine occlusion relationships
Slide credit: S. Seitz, R. Szeliski

Results
Slide credit: S. Seitz, R. Szeliski

Fitting
• We've learned how to detect edges, corners, blobs. Now what?
• We would like to form a higher-level, more compact representation of the features in the image by grouping multiple features according to a simple model
(photo: 9300 Harris Corners Pkwy, Charlotte, NC)
Slide credit: S. Lazebnik

Fitting
• Choose a parametric model to represent a set of features
  o simple model: lines
  o simple model: circles
  o complicated model: car
Source: K.
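The block-based motion prediction described above can be sketched as an exhaustive SSD search per block. A minimal illustration, not an MPEG-2 encoder: the block size, search range, and test frames are illustrative choices.

```python
# Minimal sketch of block-based motion estimation: for each block of
# frame 0, exhaustively search a window in frame 1 for the translation
# minimizing the SSD.
import numpy as np

def block_motion(f0, f1, block=8, search=4):
    H, W = f0.shape
    motion = {}
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            ref = f0[by:by+block, bx:bx+block].astype(float)
            best, best_ssd = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    if y0 < 0 or x0 < 0 or y0 + block > H or x0 + block > W:
                        continue        # candidate window out of bounds
                    cand = f1[y0:y0+block, x0:x0+block].astype(float)
                    ssd = np.sum((cand - ref) ** 2)
                    if ssd < best_ssd:
                        best_ssd, best = ssd, (dx, dy)
            motion[(bx, by)] = best     # translation per block
    return motion

# a bright square that moves 2 px right and 1 px down between frames
f0 = np.zeros((32, 32)); f0[8:16, 8:16] = 255.0
f1 = np.zeros((32, 32)); f1[9:17, 10:18] = 255.0
mv = block_motion(f0, f1)               # mv[(8, 8)] is (2, 1)
```

The encoder would then warp each block by its vector to predict the next frame and code only the residual.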
Grauman

Fitting: Issues (case study: line detection)
• Noise in the measured feature locations
• Extraneous data: clutter (outliers), multiple lines
• Missing data: occlusions
Slide credit: S. Lazebnik

Fitting: Overview
• If we know which points belong to the line, how do we find the "optimal" line parameters?
  o Least squares
• What if there are outliers?
  o Robust fitting, RANSAC
• What if there are many lines?
  o Voting methods: RANSAC, Hough transform
• What if we're not even sure it's a line?
  o Model selection
Slide credit: S. Lazebnik

Least squares line fitting
• Data: (x₁, y₁), …, (xₙ, yₙ)
• Line equation: yᵢ = m xᵢ + b
• Find (m, b) that minimize
    E = Σ_{i=1}^{n} (yᵢ − m xᵢ − b)²
• In matrix form, with Y = [y₁ … yₙ]ᵀ, X = [x₁ 1; … ; xₙ 1], B = [m b]ᵀ:
    E = ‖Y − XB‖² = YᵀY − 2(XB)ᵀY + (XB)ᵀ(XB)
    dE/dB = 2XᵀXB − 2XᵀY = 0
• Normal equations: XᵀX B = XᵀY, the least squares solution to XB = Y
Slide credit: S. Lazebnik

Problem with "vertical" least squares
• Not rotation-invariant
• Fails completely for vertical lines
Slide credit: S. Lazebnik
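The normal equations XᵀX B = XᵀY above translate directly into code. A minimal sketch with NumPy; the sample points lying exactly on y = 2x + 1 are an illustrative choice.

```python
# Minimal sketch of "vertical" least squares line fitting via the
# normal equations X^T X B = X^T Y, with B = [m, b].
import numpy as np

def fit_line(xs, ys):
    # design matrix with rows [x_i, 1]
    X = np.column_stack([xs, np.ones_like(xs)])
    Y = np.asarray(ys, dtype=float)
    B = np.linalg.solve(X.T @ X, X.T @ Y)   # solve the normal equations
    return B[0], B[1]                       # slope m, intercept b

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0                         # points exactly on y = 2x + 1
m, b = fit_line(xs, ys)                     # recovers m = 2, b = 1
```

Note that the design matrix X has no row that can represent a vertical line (infinite m), which is exactly the failure mode the last slide points out; total least squares on the line model ax + by = d avoids it.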