EECS 274 Computer Vision Local Invariant Features :Local features • • • • • • Matching with Harris Detector Scale-invariant Feature Detection Scale Invariant Image Descriptors Affine-invariant Feature Detection Object Recognition SIFT Features • Reading: S Chapter 4 Examples Features What local features to use? Aperture problem Ambiguity of 1-dimensional motion perception Stripes moved left 5 pixels Stripes moved upward 6 pixels Introduction Local invariant photometric descriptors () local descriptor Local : robust to occlusion/clutter + no segmentation Photometric : distinctive Invariant : to image transformations + illumination changes History - Recognition Color histogram [Swain 91] Each pixel is described by a color vector r g b Distribution of color vectors is described by a histogram not robust to occlusion, not invariant, not distinctive History - Recognition Eigenimages [Turk 91] • Each face vector is represented in the eigenimage space – eigenvectors with the highest eigenvalues = eigenimages .. . . v2 v1 v3 • The new image is projected into the eigenimage space – determine the closest face not robust to occlusion, requires segmentation, not invariant, discriminant History - Recognition Geometric invariants [Rothwell 92] • Function with a value independent of the transformation f ( x, y) f ( x, y) where ( x, y)t T ( x, y)t • Invariant for image rotation : distance of two points • Invariant for planar homography : cross-ratio local and invariant, not discriminant, requires sub-pixel extraction of primitives History - Recognition Problems : occlusion, clutter, image transformations, distinctiveness Solution : recognition with local photometric invariants [ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ] Approach () local descriptor 1) Extraction of interest points (characteristic locations) 2) Computation of local descriptors 3) Determining correspondences 4) Selection of similar images Matching with interest points • Extraction of interest points with the Harris detector • Comparison of points with cross-correlation • Verification with the fundamental matrix Moravec corner detector • Developed for Stanford Cart in 1977 Moravec corner detector Change of intensity for the shift [u,v]: E (u , v) w( x, y ) I ( x u , y v) I ( x, y ) 2 x, y Window function Shifted intensity Intensity Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1) Look for local maxima in min(E) Problems of Moravec detector • Noisy response due to a binary window function • Only a set of shifts at every 45 degree is considered • Only minimum of E is taken into account Harris corner detector (1988) solves these problems. Harris detector Based on the idea of auto-correlation Important difference in all directions interest point Interest points Geometric features repeatable under transformations 2D characteristics of the signal high informational content Comparison of different detectors [Schmid98] Harris detector Harris detector Auto-correlation function for a point ( x, y ) and a shift (u, v) (x, y) E (u, v) w( x, y )( I ( x, y ) I ( x u, y v)) 2 x y Discrete shifts can be avoided with the auto-correlation matrix u with I ( x u, y v) I ( x, y ) ( I x ( x, y ) I y ( x, y )) v E (u , v) x u w ( x , y ) I ( x , y ) I ( x , y ) y y x v w( x, y ) * ( I x ( x, y )) 2 u v x , y w( x, y ) * I x ( x, y ) I y ( x, y ) x, y uT Au I x2 A w* I I x y IxI y 2 Iy 2 w( x, y) * I ( x, y) I ( x, y) u v w ( x , y )( I ( x , y )) x y x, y 2 y x, y Comparison of detectors: Rotation repeatability = #good matches/mean(#points) [Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98] Comparison of detectors: Perspective repeatability = #good matches/mean(#points) [Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98] Harris corner detector Intensity change in shifting window: eigenvalue analysis E (u) E (u, v) uT Au I x2 A w* I I x y IxI y 2 Iy 1, 2 – eigenvalues of A Ellipse E(u,v) = const direction of the slowest change direction of the fastest change (max)-1/2 (min)-1/2 Shi and Tomasi use min(1, 2) to locate good features to track uncertainty ellipse Harris corner detector Classification of image points using eigenvalues of M: 2 edge 2 >> 1 Corner 1 and 2 are large, 1 ~ 2; E increases in all directions 1 and 2 are small; E is almost constant in all directions flat edge 1 >> 2 1 Harris detection • Auto-correlation matrix – captures the structure of the local neighborhood – measure based on eigenvalues of this matrix • 2 strong eigenvalues => interest point • 1 strong eigenvalue => contour • 0 eigenvalue => uniform region • Interest point detection – threshold on the eigenvalues – local maximum for localization Harris corner detector Measure of corner response: R det( A) k (trace( A)) 2 12 k (1 2 ) 2 (k – empirical constant, k = 0.04-0.06) Example Example Example Good features Using auto-correlation or Hessian matrix Local descriptors () local descriptor Descriptors characterize the local neighborhood of a point Local jet Convolution of image I with Gaussian derivatives I ( x, y ) G ( ) I ( x, y ) Gx ( ) I ( x, y ) * G ( ) y v( x, y ) I ( x, y ) * Gxx ( ) I ( x, y ) * G ( ) xy I ( x, y ) * G yy ( ) I ( x, y ) G ( ) G( x, y) I ( x x, y y)dxdy ( x, y)t t G (( x, y) , ) exp( ) 2 2 2 2 1 2 N-Jet, local jet • Invariance to image rotation : differential invariants [Koen87] L L L L L L L L i i x x y y Lxx Lx Lx 2 Lxy Lx Ly Lyy Lyy Li Lij L j L L L ii xx yy Lxx Lxx 2 Lxy Lxy Lyy Lyy Lij Lij ( L L L L L L L L ) ij jkl i k l jkk i l l L L L L L LL L iij j k k ijk i j l L L L L ij jkl i k l Lijk Li L j Lk where ij is the antisymmet ric epsilon te nsor Local descriptors Robustness to illumination changes In case of an affine transformation I1 (x) aI 2 (x) b or normalization of the image patch with mean and variance Li Lij L j 3/ 2 ( L L ) i i Lii 1/ 2 ( Li Li ) Lij L ji Li Li ij ( L jkl Li Lk Ll L jkk Li Ll Ll ) ( Li Li ) 2 L L L L L LL L ) ijk i j k iij j k k ( Li Li ) 2 ij L jkl Li Lk Ll ( Li Li ) 2 Lijk Li L j Lk 2 ( L L ) i i Determining correspondences () ? = () Vector comparison using the Mahalanobis distance dist M (p, q) (p q)T 1 (p q) Selection of similar images • In a large database – voting algorithm – additional constraints • Rapid acces with an indexing mechanism Voting algorithm () vector of local characteristics Voting algorithm 2 1 1 1 0 I 1 is the corresponding model image Additional constraints • Semi-local constraints – neighboring points should match – angles, length ratios should be similar 1 1 2 3 2 3 • Global constraints – robust estimation of the image transformation (homogaphy, epipolar geometry) Results database with ~1000 images Results Harris detector Interest points extracted with Harris (~ 500 points) Cross-correlation matching Initial matches (188 pairs) Global constraints Robust estimation of the fundamental matrix 99 inliers 89 outliers Summary of Harris detector • Very good results in the presence of occlusion and clutter – local information – discriminant greyvalue information – invariance to image rotation and illumination • Not invariance to scale and affine changes • Solution for more general view point changes – local invariant descriptors to scale and rotation – extraction of invariant points and regions Scale Invariant Feature Detection • Consider two images of the same scene, related by a scale change (i.e. zooming) • How are their scale space representations related? Scale Space Theory (Lindeberg ’98): normalized derivative s, t / 2 x , t / 2 y ( x' , y ' ) ( sx, sy ) I ( x, y ) I ( x' , y ' ) t ' s 2t L' ( x' , y ' , t ' ) s m ( 1) L( x, y, t ) is a free parameter for the task at hand Normalized derivatives have the same value at corresponding relative scales Harris detector + scale changes Scale Adapted Harris Detector Many corresponding points at which the scale factor corresponds to scale change between images Scale invariant Harris points • Multi-scale extraction of Harris interest points • Selection of points at characteristic scale in scale space Laplacian Chacteristic scale : - maximum in scale space - scale invariant Scale invariant interest points multi-scale Harris points selection of points at the characteristic scale with Laplacian invariant points + associated regions [Mikolajczyk & Schmid’01] Harris-Laplacian Feature Harris detector – adaptation to scale Evaluation of scale invariant detectors repeatability – scale changes SIFT: Overview • 1999 • Generates image features, “keypoints” – invariant to image scaling and rotation – partially invariant to change in illumination and 3D camera viewpoint – many can be extracted from typical images – highly distinctive Algorithm overview 1. Scale-space extrema detection – Uses difference-of-Gaussian function 2. Keypoint localization – Sub-pixel location and scale fit to a model 3. Orientation assignment – 1 or more for each keypoint 4. Keypoint descriptor – Created from local image gradients Scale space • Definition: L( x, y, ) G( x, y, ) I ( x, y) 1 ( x 2 y 2 ) / 2 2 e where G( x, y, ) 2 2 Scale space • Keypoints are detected using scale-space extrema in difference-of-Gaussian function D • D definition: D( x, y, ) (G ( x, y, k ) G ( x, y, )) I ( x, y ) L( x, y, k ) L( x, y, ) • Efficient to compute Relationship of D to G 2 2 • Close approximation to scale-normalized Laplacian of Gaussian, 2 2G • • G Diffusion equation: 2G Approximate ∂G/∂σ: G G( x, y, k ) G( x, y, ) k – giving, G ( x, y, k ) G ( x, y, ) 2G k G( x, y, k ) G( x, y, ) (k 1) 2 2G • When D has scales differing by a constant factor it already incorporates the σ2 scale normalization required for scale-invariance • e.g., k 2 Scale space construction 2k2σ 2kσ 2σ kσ σ 2kσ 2σ kσ σ Scale space • A collection of images obtained by progressively smoothing the input image • Analogous to gradually reducing image resolution • See Vedaldi’s implementation (http://www.vlfeat.org/ ) for details • Discretized scales ( s, o) 0 2( s / S o ) , DoG( o,s ) I o ( o,s 1) I o ( o,s ) 0 1.6, s 0,, S 1, o omin ,, omin O 1 s : # of scales per octave, O : # of octaves S 3, O log 2 (min( I w , I h )) Scale space images … … … … … … … … first octave second octave third octave fourth octave Difference-of-Gaussian images … … … … … … … … first octave second octave third octave fourth octave Frequency of sampling • There is no minimum • Best frequency determined experimentally Prior smoothing for each octave • Increasing σ increases robustness, but costs • σ = 1.6 a good tradeoff • Doubling the image initially increases number of keypoints Finding extrema • Sample point D(x,y,σ) is selected only if it is a minimum or a maximum of these points Extrema in this image DoG scale space Localization • 3D quadratic function is fit to the local sample points • Start with Taylor expansion with sample point as the origin D() D D 12 D T 2 T 2 – where ( x, y, )T • Take the derivative with respect to X, and D D ˆ 0 X set it to 0, giving X X D D • ˆ is the location of the keypoint • This is a 3x3 linear system 2 2 2 1 2 Localization D 2 D ˆ 0 X 2 X X 2D 2 2 D y 2D x 2D y 2D y 2 2D yx 2D D x D 2D y yx y x D 2 D x x 2 • Hessian and derivative approximated by finite differences, – example: D Dki ,j1 Dki ,j1 2 2 D Dki ,j1 2 Dki , j Dki ,j1 2 1 2 D ( Dki 11, j Dki 11, j ) ( Dki 11, j Dki 11, j ) y 4 • If X is > 0.5 in any dimension, process repeated Filtering • Contrast (use prev. equation): – If |D(X)| < 0.03, throw it out ˆ ) D 1 D Xˆ D( 2 T • Edge-iness: – Use ratio of principal curvatures to throw out poorly defined peaks D D H D – Curvatures come from Hessian: D – Ratio of Trace(H)2 and Determinant(H) xx xy xy yy Tr ( H ) Dxx Dyy Tr ( H ) 2 ( ) 2 Det ( H ) Dxx Dyy ( Dxy ) , ratio Det ( H ) ratio > (r+1)2/(r), throw it out (SIFT uses 2 – If r=10) Orientation assignment • Descriptor computed relative to keypoint’s orientation achieves rotation invariance • Precomputed along with mag. for all levels (useful in descriptor computation) m( x, y ) ( L( x 1, y ) L( x 1, y )) 2 ( L( x, y 1) L( x, y 1)) 2 ( x, y ) a tan 2(( L( x, y 1) L( x, y 1)) /( L( x 1, y ) L( x 1, y ))) • Multiple orientations assigned to keypoints from an orientation histogram – Significantly improve stability of matching Keypoint images Keypoint Selection Finding extrema with DoG Removing |D(X)| < 0.03 Removing |D(X)| < 0.03 Select canonical orientation • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram • Each key specifies stable 2D coordinates (x, y, scale, orientation) Descriptor • Descriptor has 3 dimensions (x,y,θ) • Orientation histogram of gradient magnitudes • Position and orientation of each gradient sample rotated relative to keypoint orientation Descriptor • Weight magnitude of each sample point by Gaussian weighting function • Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects) Descriptor • Best results achieved with 4x4x8 = 128 descriptor size • Normalize to unit length – Reduces effect of illumination change • Cap each element to 0.2, normalize again – Reduces non-linear illumination changes – 0.2 determined experimentally Object detection • Create a database of keypoints from training images • Match keypoints to a database – Nearest neighbor search PCA-SIFT • • • • Different descriptor (same keypoints) Apply PCA to the gradient patch Descriptor size is 20 (instead of 128) More robust, faster SIFT: Summary • • • • • • Scale space Difference-of-Gaussian Localization Filtering Orientation assignment Descriptor, 128 elements Object recognition • Definition: Identify an object and determine its pose and model parameters • Commercial object recognition – $4 billion/year industry for inspection and assembly – Almost entirely based on template matching • Upcoming applications – Mobile robots, toys, user interfaces – Location recognition – Digital camera panoramas, 3D scene modeling Invariant local features • Image content => local feature coordinates invariant to translation, rotation, scale • Technical details regarding – – – – Keypoint matching (using 1st and 2nd nearest neighbors) Efficient nearest neighbor indexing Clustering with Hough transform (model location, orientation, scale) Account for affine distortion Features Advantages of invariant local features • Locality: features are local, so robust to occlusion and clutter (no prior segmentation) • Distinctiveness: individual features can be matched to a large database of objects • Quantity: many features can be generated for even small objects • Efficiency: close to real-time performance Experimental evaluation Scale change (factor 2.5) Harris-Laplace DoG Viewpoint change (60 degrees) Harris-Affine (Harris-Laplace) Descriptors - conclusion • SIFT + steerable perform best • Performance of the descriptor independent of the detector • Errors due to imprecision in region estimation, localization Image retrieval … > 5000 images change in viewing angle Matches 22 correct matches Image retrieval … > 5000 images change in viewing angle + scale change Matches 33 correct matches Multiple panoramas from an unordered image set Image registration and blending Location recognition Robot localization • Joint work with Stephen Se, Jim Little Recognizing panoramas • Matthew Brown and David Lowe • Recognize overlap from an unordered set of images and automatically stitch together • SIFT features provide initial feature matching • Image blending at multiple scales hides the seams Panorama automatically assembled from 143 images Map continuously built over time Planar recognition • Planar surfaces can be reliably recognized at a rotation of 60° away from the camera • Affine fit approximates perspective projection • Only 3 points are needed for recognition 3D object recognition • Extract outlines with background subtraction 3D object recognition • Only 3 keys are needed for recognition, so extra keys provide robustness • Affine model is no longer as accurate Recognition under occlusion Comparison to template matching • Costs of template matching – 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations – Does not easily handle partial occlusion and other variation without large increase in template numbers – Viola & Jones cascade must start again for each qualitatively different template • Costs of local feature approach – 3000 evaluations (reduction by factor of 10,000) – Features are more invariant to illumination, 3D rotation, and object variation – Use of many small subtemplates increases robustness to partial occlusion and other variations More local features • • • • • • Harris Laplace Harris Affine Hessian detector Hessian Laplace Hessian Affine MSER (Maximally Stable Extremal Regions) • SURF (Speeded-Up Robust Feature)