Invariant features

EECS 274 Computer Vision Local Invariant Features :Local features • • • • • • Matching with Harris Detector Scale-invariant Feature Detection Scale Invariant Image Descriptors Affine-invariant Feature Detection Object Recognition SIFT Features • Reading: S Chapter 4 Examples Features What local features to use? Aperture problem Ambiguity of 1-dimensional motion perception Stripes moved left 5 pixels Stripes moved upward 6 pixels Introduction Local invariant photometric descriptors () local descriptor Local : robust to occlusion/clutter + no segmentation Photometric : distinctive Invariant : to image transformations + illumination changes History - Recognition Color histogram [Swain 91] Each pixel is described by a color vector r   g b   Distribution of color vectors is described by a histogram         not robust to occlusion, not invariant, not distinctive     History - Recognition Eigenimages [Turk 91] • Each face vector is represented in the eigenimage space – eigenvectors with the highest eigenvalues = eigenimages .. . . v2 v1 v3 • The new image is projected into the eigenimage space – determine the closest face not robust to occlusion, requires segmentation, not invariant, discriminant History - Recognition Geometric invariants [Rothwell 92] • Function with a value independent of the transformation f ( x, y)  f ( x, y) where ( x, y)t  T ( x, y)t • Invariant for image rotation : distance of two points • Invariant for planar homography : cross-ratio  local and invariant, not discriminant, requires sub-pixel extraction of primitives History - Recognition Problems : occlusion, clutter, image transformations, distinctiveness Solution : recognition with local photometric invariants [ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ] Approach () local descriptor 1) Extraction of interest points (characteristic locations) 2) Computation of local descriptors 3) Determining correspondences 4) Selection of similar images Matching with interest points • Extraction of interest points with the Harris detector • Comparison of points with cross-correlation • Verification with the fundamental matrix Moravec corner detector • Developed for Stanford Cart in 1977 Moravec corner detector Change of intensity for the shift [u,v]: E (u , v)   w( x, y )  I ( x  u , y  v)  I ( x, y )  2 x, y Window function Shifted intensity Intensity Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1) Look for local maxima in min(E) Problems of Moravec detector • Noisy response due to a binary window function • Only a set of shifts at every 45 degree is considered • Only minimum of E is taken into account Harris corner detector (1988) solves these problems. Harris detector Based on the idea of auto-correlation Important difference in all directions  interest point Interest points Geometric features repeatable under transformations 2D characteristics of the signal high informational content Comparison of different detectors [Schmid98] Harris detector Harris detector Auto-correlation function for a point ( x, y ) and a shift (u, v)  (x, y) E (u, v)   w( x, y )( I ( x, y )  I ( x  u, y  v)) 2 x y Discrete shifts can be avoided with the auto-correlation matrix u  with I ( x  u, y  v)  I ( x, y )  ( I x ( x, y ) I y ( x, y ))  v E (u , v)   x  u        w ( x , y ) I ( x , y ) I ( x , y ) y y  x  v     w( x, y ) * ( I x ( x, y )) 2   u v  x , y w( x, y ) * I x ( x, y ) I y ( x, y )  x, y  uT Au  I x2 A  w* I I  x y IxI y   2  Iy  2  w( x, y) * I ( x, y) I ( x, y) u   v  w ( x , y )( I ( x , y ))    x y x, y 2 y x, y  Comparison of detectors: Rotation repeatability = #good matches/mean(#points) [Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98] Comparison of detectors: Perspective repeatability = #good matches/mean(#points) [Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98] Harris corner detector Intensity change in shifting window: eigenvalue analysis E (u)  E (u, v)  uT Au  I x2 A  w* I I  x y IxI y   2  Iy  1, 2 – eigenvalues of A Ellipse E(u,v) = const direction of the slowest change direction of the fastest change (max)-1/2 (min)-1/2 Shi and Tomasi use min(1, 2) to locate good features to track uncertainty ellipse Harris corner detector Classification of image points using eigenvalues of M: 2 edge 2 >> 1 Corner 1 and 2 are large, 1 ~ 2; E increases in all directions 1 and 2 are small; E is almost constant in all directions flat edge 1 >> 2 1 Harris detection • Auto-correlation matrix – captures the structure of the local neighborhood – measure based on eigenvalues of this matrix • 2 strong eigenvalues => interest point • 1 strong eigenvalue => contour • 0 eigenvalue => uniform region • Interest point detection – threshold on the eigenvalues – local maximum for localization Harris corner detector Measure of corner response: R  det( A)  k (trace( A)) 2  12  k (1  2 ) 2 (k – empirical constant, k = 0.04-0.06) Example Example Example Good features Using auto-correlation or Hessian matrix Local descriptors () local descriptor Descriptors characterize the local neighborhood of a point Local jet Convolution of image I with Gaussian derivatives  I ( x, y )  G ( )     I ( x, y )  Gx ( )   I ( x, y ) * G ( )  y   v( x, y )   I ( x, y ) * Gxx ( )   I ( x, y ) * G ( )  xy    I ( x, y ) * G yy ( )       I ( x, y )  G ( )      G( x, y) I ( x  x, y  y)dxdy    ( x, y)t t G (( x, y) ,  )  exp(  ) 2 2 2 2 1 2 N-Jet, local jet • Invariance to image rotation : differential invariants [Koen87] L L         L L L L  L L i i x x y y        Lxx Lx Lx  2 Lxy Lx Ly  Lyy Lyy  Li Lij L j     L L  L ii xx yy         Lxx Lxx  2 Lxy Lxy  Lyy Lyy  Lij Lij      ( L L L L  L L L L )   ij jkl i k l   jkk i l l   L L L L L LL L     iij j k k ijk i j l       L L L L      ij jkl i k l     Lijk Li L j Lk      where  ij is the antisymmet ric epsilon te nsor Local descriptors Robustness to illumination changes In case of an affine transformation I1 (x)  aI 2 (x)  b or normalization of the image patch with mean and variance Li Lij L j     3/ 2 ( L L ) i i     Lii   1/ 2   ( Li Li )     Lij L ji     Li Li       ij ( L jkl Li Lk Ll  L jkk Li Ll Ll )    ( Li Li ) 2    L L L L L LL L )  ijk i j k  iij j k k    ( Li Li ) 2       ij L jkl Li Lk Ll   ( Li Li ) 2       Lijk Li L j Lk   2 ( L L )   i i   Determining correspondences () ? = () Vector comparison using the Mahalanobis distance dist M (p, q)  (p  q)T 1 (p  q) Selection of similar images • In a large database – voting algorithm – additional constraints • Rapid acces with an indexing mechanism Voting algorithm () vector of local characteristics Voting algorithm 2 1 1 1 0 I 1 is the corresponding model image Additional constraints • Semi-local constraints – neighboring points should match – angles, length ratios should be similar 1 1 2 3 2 3 • Global constraints – robust estimation of the image transformation (homogaphy, epipolar geometry) Results database with ~1000 images Results Harris detector Interest points extracted with Harris (~ 500 points) Cross-correlation matching Initial matches (188 pairs) Global constraints Robust estimation of the fundamental matrix 99 inliers 89 outliers Summary of Harris detector • Very good results in the presence of occlusion and clutter – local information – discriminant greyvalue information – invariance to image rotation and illumination • Not invariance to scale and affine changes • Solution for more general view point changes – local invariant descriptors to scale and rotation – extraction of invariant points and regions Scale Invariant Feature Detection • Consider two images of the same scene, related by a scale change (i.e. zooming) • How are their scale space representations related? Scale Space Theory (Lindeberg ’98):   normalized derivative s,    t  / 2  x ,   t  / 2  y ( x' , y ' )  ( sx, sy ) I ( x, y )  I ( x' , y ' ) t '  s 2t L' ( x' , y ' , t ' )  s m ( 1) L( x, y, t )  is a free parameter for the task at hand Normalized derivatives have the same value at corresponding relative scales Harris detector + scale changes Scale Adapted Harris Detector Many corresponding points at which the scale factor corresponds to scale change between images Scale invariant Harris points • Multi-scale extraction of Harris interest points • Selection of points at characteristic scale in scale space Laplacian Chacteristic scale : - maximum in scale space - scale invariant Scale invariant interest points multi-scale Harris points selection of points at the characteristic scale with Laplacian invariant points + associated regions [Mikolajczyk & Schmid’01] Harris-Laplacian Feature Harris detector – adaptation to scale Evaluation of scale invariant detectors repeatability – scale changes SIFT: Overview • 1999 • Generates image features, “keypoints” – invariant to image scaling and rotation – partially invariant to change in illumination and 3D camera viewpoint – many can be extracted from typical images – highly distinctive Algorithm overview 1. Scale-space extrema detection – Uses difference-of-Gaussian function 2. Keypoint localization – Sub-pixel location and scale fit to a model 3. Orientation assignment – 1 or more for each keypoint 4. Keypoint descriptor – Created from local image gradients Scale space • Definition: L( x, y,  )  G( x, y,  )  I ( x, y) 1  ( x 2  y 2 ) / 2 2 e where G( x, y,  )  2 2 Scale space • Keypoints are detected using scale-space extrema in difference-of-Gaussian function D • D definition: D( x, y,  )  (G ( x, y, k )  G ( x, y,  ))  I ( x, y )  L( x, y, k )  L( x, y,  ) • Efficient to compute Relationship of D to  G 2 2 • Close approximation to scale-normalized Laplacian of Gaussian,  2 2G • • G Diffusion equation:    2G Approximate ∂G/∂σ: G  G( x, y, k )  G( x, y, )  k   – giving, G ( x, y, k )  G ( x, y,  )   2G k   G( x, y, k )  G( x, y,  )  (k  1) 2 2G • When D has scales differing by a constant factor it already incorporates the σ2 scale normalization required for scale-invariance • e.g., k  2 Scale space construction 2k2σ 2kσ 2σ kσ σ 2kσ 2σ kσ σ Scale space • A collection of images obtained by progressively smoothing the input image • Analogous to gradually reducing image resolution • See Vedaldi’s implementation (http://www.vlfeat.org/ ) for details • Discretized scales  ( s, o)   0  2( s / S o ) , DoG( o,s )  I o ( o,s 1)  I o ( o,s )  0  1.6, s  0,, S  1, o  omin ,, omin  O  1 s : # of scales per octave, O : # of octaves S  3, O  log 2 (min( I w , I h )) Scale space images … … … … … … … … first octave second octave third octave fourth octave Difference-of-Gaussian images … … … … … … … … first octave second octave third octave fourth octave Frequency of sampling • There is no minimum • Best frequency determined experimentally Prior smoothing for each octave • Increasing σ increases robustness, but costs • σ = 1.6 a good tradeoff • Doubling the image initially increases number of keypoints Finding extrema • Sample point D(x,y,σ) is selected only if it is a minimum or a maximum of these points Extrema in this image DoG scale space Localization • 3D quadratic function is fit to the local sample points • Start with Taylor expansion with sample point as the origin D()  D  D   12  D  T 2 T 2 – where   ( x, y,  )T • Take the derivative with respect to X, and D  D ˆ 0   X set it to 0, giving X X  D  D • ˆ     is the location of the keypoint • This is a 3x3 linear system 2 2 2 1 2 Localization D  2 D ˆ 0  X 2 X X 2D   2  2  D  y 2D   x 2D y 2D y 2 2D yx 2D   D      x     D  2D   y       yx  y   x   D   2 D     x  x 2  • Hessian and derivative approximated by finite differences, – example: D Dki ,j1  Dki ,j1   2  2 D Dki ,j1  2 Dki , j  Dki ,j1   2 1  2 D ( Dki 11, j  Dki 11, j )  ( Dki 11, j  Dki 11, j )  y 4 • If X is > 0.5 in any dimension, process repeated Filtering • Contrast (use prev. equation): – If |D(X)| < 0.03, throw it out ˆ )  D  1 D Xˆ D(  2  T • Edge-iness: – Use ratio of principal curvatures to throw out poorly defined peaks D  D H  D  – Curvatures come from Hessian: D   – Ratio of Trace(H)2 and Determinant(H) xx xy xy yy Tr ( H )  Dxx  Dyy     Tr ( H ) 2 (   ) 2 Det ( H )  Dxx Dyy  ( Dxy )   , ratio   Det ( H )  ratio > (r+1)2/(r), throw it out (SIFT uses 2 – If r=10) Orientation assignment • Descriptor computed relative to keypoint’s orientation achieves rotation invariance • Precomputed along with mag. for all levels (useful in descriptor computation) m( x, y )  ( L( x  1, y )  L( x  1, y )) 2  ( L( x, y  1)  L( x, y  1)) 2  ( x, y )  a tan 2(( L( x, y  1)  L( x, y  1)) /( L( x  1, y )  L( x  1, y ))) • Multiple orientations assigned to keypoints from an orientation histogram – Significantly improve stability of matching Keypoint images Keypoint Selection Finding extrema with DoG Removing |D(X)| < 0.03 Removing |D(X)| < 0.03 Select canonical orientation • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram • Each key specifies stable 2D coordinates (x, y, scale, orientation) Descriptor • Descriptor has 3 dimensions (x,y,θ) • Orientation histogram of gradient magnitudes • Position and orientation of each gradient sample rotated relative to keypoint orientation Descriptor • Weight magnitude of each sample point by Gaussian weighting function • Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects) Descriptor • Best results achieved with 4x4x8 = 128 descriptor size • Normalize to unit length – Reduces effect of illumination change • Cap each element to 0.2, normalize again – Reduces non-linear illumination changes – 0.2 determined experimentally Object detection • Create a database of keypoints from training images • Match keypoints to a database – Nearest neighbor search PCA-SIFT • • • • Different descriptor (same keypoints) Apply PCA to the gradient patch Descriptor size is 20 (instead of 128) More robust, faster SIFT: Summary • • • • • • Scale space Difference-of-Gaussian Localization Filtering Orientation assignment Descriptor, 128 elements Object recognition • Definition: Identify an object and determine its pose and model parameters • Commercial object recognition – $4 billion/year industry for inspection and assembly – Almost entirely based on template matching • Upcoming applications – Mobile robots, toys, user interfaces – Location recognition – Digital camera panoramas, 3D scene modeling Invariant local features • Image content => local feature coordinates invariant to translation, rotation, scale • Technical details regarding – – – – Keypoint matching (using 1st and 2nd nearest neighbors) Efficient nearest neighbor indexing Clustering with Hough transform (model location, orientation, scale) Account for affine distortion Features Advantages of invariant local features • Locality: features are local, so robust to occlusion and clutter (no prior segmentation) • Distinctiveness: individual features can be matched to a large database of objects • Quantity: many features can be generated for even small objects • Efficiency: close to real-time performance Experimental evaluation Scale change (factor 2.5) Harris-Laplace DoG Viewpoint change (60 degrees) Harris-Affine (Harris-Laplace) Descriptors - conclusion • SIFT + steerable perform best • Performance of the descriptor independent of the detector • Errors due to imprecision in region estimation, localization Image retrieval … > 5000 images change in viewing angle Matches 22 correct matches Image retrieval … > 5000 images change in viewing angle + scale change Matches 33 correct matches Multiple panoramas from an unordered image set Image registration and blending Location recognition Robot localization • Joint work with Stephen Se, Jim Little Recognizing panoramas • Matthew Brown and David Lowe • Recognize overlap from an unordered set of images and automatically stitch together • SIFT features provide initial feature matching • Image blending at multiple scales hides the seams Panorama automatically assembled from 143 images Map continuously built over time Planar recognition • Planar surfaces can be reliably recognized at a rotation of 60° away from the camera • Affine fit approximates perspective projection • Only 3 points are needed for recognition 3D object recognition • Extract outlines with background subtraction 3D object recognition • Only 3 keys are needed for recognition, so extra keys provide robustness • Affine model is no longer as accurate Recognition under occlusion Comparison to template matching • Costs of template matching – 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations – Does not easily handle partial occlusion and other variation without large increase in template numbers – Viola & Jones cascade must start again for each qualitatively different template • Costs of local feature approach – 3000 evaluations (reduction by factor of 10,000) – Features are more invariant to illumination, 3D rotation, and object variation – Use of many small subtemplates increases robustness to partial occlusion and other variations More local features • • • • • • Harris Laplace Harris Affine Hessian detector Hessian Laplace Hessian Affine MSER (Maximally Stable Extremal Regions) • SURF (Speeded-Up Robust Feature)

Invariant features

Related documents

Products

Support

Invariant features

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib