Interest Points Detection
CS485/685 Computer Vision
Dr. George Bebis

Interest Points
• Local features associated with a significant change of an image property, or of several properties simultaneously (e.g., intensity, color, texture).

Why Extract Interest Points?
• Corresponding points (or features) between images enable the estimation of parameters describing geometric transforms between the images.

What if we don't know the correspondences?
• We need to compare feature descriptors of local patches surrounding the interest points.
• Lots of possibilities (this is a popular research area):
– Simple option: match square windows around the point.
– State-of-the-art approach: SIFT (David Lowe, UBC, http://www.cs.ubc.ca/~lowe/keypoints/).

Invariance
• Features should be detected despite geometric or photometric changes in the image.
• Given two transformed versions of the same image, features should be detected in corresponding locations.

How to achieve invariance?
1. The detector must be invariant to geometric and photometric transformations.
2. The descriptors must be invariant (if matching the descriptions is required).

Applications
• Image alignment
• 3D reconstruction
• Object recognition
• Indexing and database retrieval
• Object tracking
• Robot navigation

Example: Object Recognition
• Recognition despite occlusion and clutter.

Example: Panorama Stitching
• How do we combine two overlapping images?
– Step 1: extract features
– Step 2: match features
– Step 3: align images

What features should we use?
• Use features with gradients in at least two (significantly) different orientations, e.g., corners (measured via auto-correlation).

Corners
• Corners are easier to localize than lines when considering the correspondence problem (aperture problem).
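The "simple option" above — matching square windows around a point — can be sketched with a sum-of-squared-differences (SSD) search. This is a minimal pure-NumPy illustration, not code from the slides; the function name `ssd_match` and the toy image are my own:

```python
import numpy as np

def ssd_match(patch, image):
    """Return the top-left (row, col) of the window in `image` that best
    matches `patch` under sum-of-squared-differences, plus the SSD score."""
    ph, pw = patch.shape
    best_score, best_loc = np.inf, None
    for r in range(image.shape[0] - ph + 1):
        for c in range(image.shape[1] - pw + 1):
            win = image[r:r + ph, c:c + pw]
            score = float(np.sum((win - patch) ** 2))
            if score < best_score:
                best_score, best_loc = score, (r, c)
    return best_loc, best_score

# Tiny demo on a synthetic image: the patch is cut out at (5, 7),
# so an exact match (SSD = 0) should be found there.
rng = np.random.default_rng(0)
img = rng.random((20, 20))
loc, score = ssd_match(img[5:10, 7:12].copy(), img)
```

In practice the search is restricted to detected interest points rather than every pixel, which is exactly why interest point detection matters.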
• A point on a line is hard to match between frames t and t+1; a corner is easier.

Characteristics of good features
• Repeatability – the same feature can be found in several images despite geometric and photometric transformations.
• Saliency – each feature has a distinctive description.
• Compactness and efficiency – many fewer features than image pixels.
• Locality – a feature occupies a relatively small area of the image, making it robust to clutter and occlusion.

Main Steps in Corner Detection
1. For each pixel in the input image, apply the corner operator to obtain a cornerness measure for that pixel.
2. Threshold the cornerness map to eliminate weak corners.
3. Apply non-maximal suppression to eliminate points whose cornerness measure is not larger than the cornerness values of all points within a certain distance.

Corner Types
• Examples include L-junction, Y-junction, T-junction, Arrow-junction, and X-junction corner types.

Corner Detection Methods
• Contour based – extract contours and search for maximal curvature or inflexion points along the contour.
• Intensity based – compute a measure that indicates the presence of an interest point directly from the gray (or color) values.
• Parametric model based – fit a parametric intensity model to the image; can provide sub-pixel accuracy but is limited to specific types of interest points (e.g., L-corners).

A contour-based approach: Curvature Scale Space
• Assumes the object has been segmented.
• Uses a parametric contour representation (x(t), y(t)), smoothed with a Gaussian g(t,σ); curvature is tracked across scales σ.
G. Bebis, G. Papadourakis and S. Orphanoudakis, "Curvature Scale Space Driven Object Recognition with an Indexing Scheme based on Artificial Neural Networks", Pattern Recognition, Vol. 32, No. 7, pp. 1175-1201, 1999.

A parametric model approach: Zuniga-Haralick Detector
• Approximate the image function in the neighborhood of pixel (i,j) by a cubic polynomial.
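Steps 2 and 3 of the corner detection pipeline above can be sketched as follows. This is a minimal illustration operating on a precomputed cornerness map; the function name and toy example are my own:

```python
import numpy as np

def detect_corners(cornerness, threshold, radius=1):
    """Steps 2-3 above: threshold the cornerness map, then keep only points
    that are strict maxima within their (2*radius+1)^2 neighborhood."""
    h, w = cornerness.shape
    corners = []
    for r in range(h):
        for c in range(w):
            v = cornerness[r, c]
            if v < threshold:
                continue  # step 2: eliminate weak corners
            r0, r1 = max(0, r - radius), min(h, r + radius + 1)
            c0, c1 = max(0, c - radius), min(w, c + radius + 1)
            nbhd = cornerness[r0:r1, c0:c1]
            # step 3: non-maximal suppression (strict local maximum)
            if v >= nbhd.max() and (nbhd == v).sum() == 1:
                corners.append((r, c))
    return corners

# Demo: two strong isolated peaks survive; a weak peak is thresholded away.
m = np.zeros((10, 10))
m[2, 3], m[7, 6], m[5, 5] = 5.0, 4.0, 0.5
found = detect_corners(m, threshold=1.0)
```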
• Use SVD to find the polynomial coefficients k_i; the measure of "cornerness" is:
ZH(i,j) = -2 (k_2^2 k_6 - k_2 k_3 k_5 + k_3^2 k_4) / (k_2^2 + k_3^2)^{3/2}

Corner Detection Using Edge Detection?
• Edge detectors are not stable at corners:
– the gradient is ambiguous at the corner tip;
– the gradient direction is discontinuous near the corner.

Corner Detection Using Intensity: Basic Idea
• The image gradient has two or more dominant directions near a corner.
• Shifting a window in any direction should give a large change in intensity:
– "flat" region: no change in any direction;
– "edge": no change along the edge direction;
– "corner": significant change in all directions.

Moravec Detector (1977)
• Measure the intensity variation at (x,y) by shifting a small window (3x3 or 5x5) by one pixel in each of the eight principal directions (horizontally, vertically, and along the four diagonals).
• Calculate the intensity variation as the sum of squared differences of corresponding pixels in the original and shifted windows:
SW(\Delta x, \Delta y) = \sum_{(x_i,y_i) \in W} [ f(x_i + \Delta x, y_i + \Delta y) - f(x_i, y_i) ]^2, with \Delta x, \Delta y \in \{-1, 0, 1\}
• The "cornerness" of a pixel is the minimum intensity variation found over the eight shift directions:
Cornerness(x,y) = min{ SW(-1,-1), SW(-1,0), ..., SW(1,1) }
• Thresholding the (normalized) cornerness map and applying non-maximal suppression yields the final corners. Note the response to isolated points!
• Does a reasonable job of finding the majority of true corners, but:
– edge points not aligned with one of the eight principal directions are assigned a relatively large cornerness value;
– the response is anisotropic, since the intensity variation is only calculated at a discrete set of shifts (i.e., it is not rotationally invariant).

Harris Detector
• Improves the Moravec operator by avoiding the use of discrete directions and discrete shifts.
• Uses a Gaussian window instead of a square window (1 in the window, 0 outside).
C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proceedings of the 4th Alvey Vision Conference, pp. 147-151, 1988.
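The Moravec cornerness computation described above can be sketched in NumPy. This is a minimal illustration; the function name `moravec` and the toy white-square image are my own:

```python
import numpy as np

def moravec(image, half=1):
    """Moravec cornerness: minimum over the eight one-pixel shifts of the
    sum of squared window differences SW(dx, dy)."""
    img = image.astype(float)
    h, w = img.shape
    shifts = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]
    corner = np.zeros((h, w))
    for r in range(half + 1, h - half - 1):
        for c in range(half + 1, w - half - 1):
            win = img[r - half:r + half + 1, c - half:c + half + 1]
            sw = []
            for dr, dc in shifts:
                shifted = img[r + dr - half:r + dr + half + 1,
                              c + dc - half:c + dc + half + 1]
                sw.append(np.sum((shifted - win) ** 2))
            corner[r, c] = min(sw)  # cornerness = min over the 8 shifts
    return corner

# Demo: a white square on black. The square's corners should respond,
# while a point in the middle of an edge should not (the shift along the
# edge direction produces zero variation, and we take the minimum).
img = np.zeros((12, 12))
img[4:9, 4:9] = 1.0
C = moravec(img)
```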
"A Combined Corner and Edge Detector.“ Proceedings of the 4th Alvey Vision Conference: pages 147—151, 1988. Harris Detector (cont’d) • Using first-order Taylor expansion: Reminder: Taylor expansion f´(a) f´´(a) f ( x ) f(a) ( x a) ( x a)2 1! 2! f ( n ) (a ) ( x a)n O( x n 1 ) n! Harris Detector (cont’d) Since Harris Detector (cont’d) ( AW(x,y)= f ( xi , yi ) 2 ) x 2 x 2 matrix ( f ( xi , yi ) 2 ) y (Hessian or auto-correlation or second moment) Auto-correlation matrix f x2 AW x, y fx f y fx f y 2 f y Describes the gradient distribution (i.e., local structure) inside window! Does not depend on x, y Harris Detector (cont’d) • General case – use window function: SW (x, y ) w( xi , yi ) f ( xi , yi ) f ( xi x, yi y ) 2 xi , yi default window function w(x,y) : 1 in window, 0 outside 2 w ( x , y ) f x x, y AW w( x, y ) f x f y x, y w ( x , y ) f f x y f x2 x, y w( x, y ) x, y fx f y 2 w( x, y ) f y x, y fx f y f y2 Harris Detector (cont’d) • Harris uses a Gaussian window: w(x,y)=G(x,y,σI) where σI is called the “integration” scale window function w(x,y) : Gaussian 2 w ( x , y ) f x x, y AW w( x, y ) f x f y x, y w ( x , y ) f f x y f x2 x, y w( x, y ) x, y fx f y 2 w( x, y ) f y x, y fx f y f y2 Auto-correlation matrix (cont’d) Harris Detector (cont’d) Since M is symmetric, we have: 1 0 AW R R 0 2 1 We can visualize AW as an ellipse with axis lengths determined by the eigenvalues and orientation determined by R Ellipse equation: [x y ] AW x y const direction of the fastest change (min)-1/2 direction of the slowest change (max)-1/2 Harris Detector (cont’d) • Eigenvectors encode edge direction • Eigenvalues encode edge strength direction of the fastest change (min)-1/2 direction of the slowest change (max)-1/2 Distribution of fx and fy Distribution of fx and fy (cont’d) Harris Detector (cont’d) 2 “Edge” 2 >> 1 Classification of image points using eigenvalues of AW: 1 and 2 are small; SW is almost constant in all directions “Corner” 1 and 2 are large, 1 ~ 2 ; E increases in 
Harris Detector (cont'd)
• If λ1 > λ2, the smaller eigenvalue λ2 can itself serve as a cornerness measure.
• To avoid the eigenvalue computation, the following response function is used instead:
R(A) = det(A) - k \, trace^2(A)
• It can be shown that
R(A) = λ1 λ2 - k (λ1 + λ2)^2

Harris Detector (cont'd)
• Classification using R:
– "Corner": R > 0
– "Edge": R < 0
– "Flat" region: |R| is small
• k is a constant, usually between 0.04 and 0.06.

Harris Detector (cont'd)
• The derivatives are computed by convolution with Gaussian derivatives:
f_x(x_i, y_i) = \frac{\partial}{\partial x} G(x,y,σ_D) * f(x_i, y_i), \quad f_y(x_i, y_i) = \frac{\partial}{\partial y} G(x,y,σ_D) * f(x_i, y_i)
where σ_D is called the "differentiation" scale.

Harris Detector – Example
1. Compute the corner response R.
2. Find points with a large corner response: R > threshold.
3. Take only the points of local maxima of R.
4. Map the corners back onto the original image.

Invariance
• Is the Harris detector invariant to geometric and photometric changes?
• Features should be detected despite geometric or photometric changes in the image.

Geometric/Photometric Changes
• Geometric: rotation, scale, affine.
• Photometric: affine intensity change I(x,y) → a I(x,y) + b.

Harris Detector: Rotation Invariance
• Under rotation, the ellipse rotates but its shape (i.e., its eigenvalues) remains the same, so the corner response R is invariant to image rotation.
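The response function R(A) = det(A) - k trace²(A) can be sketched end-to-end. This is a minimal illustration, not the slides' implementation: a flat box window stands in for the Gaussian w(x,y), and the helper name `harris_response` is mine:

```python
import numpy as np

def harris_response(img, k=0.04, half=2):
    """Harris response R = det(A) - k * trace(A)^2 at every pixel.
    For brevity, a flat (2*half+1)^2 window stands in for the Gaussian."""
    img = img.astype(float)
    fy, fx = np.gradient(img)
    # Per-pixel products of derivatives, then summed over the window.
    prods = {'xx': fx * fx, 'xy': fx * fy, 'yy': fy * fy}
    sums = {}
    for key, p in prods.items():
        s = np.zeros_like(p)
        h, w = p.shape
        for r in range(half, h - half):
            for c in range(half, w - half):
                s[r, c] = p[r - half:r + half + 1,
                            c - half:c + half + 1].sum()
        sums[key] = s
    det = sums['xx'] * sums['yy'] - sums['xy'] ** 2
    tr = sums['xx'] + sums['yy']
    return det - k * tr ** 2

# Demo: R should be positive at a square's corner, negative in the middle
# of its edge, and zero in a flat region.
img = np.zeros((20, 20))
img[6:14, 6:14] = 1.0
R = harris_response(img)
```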
Harris Detector: Photometric Changes
• Affine intensity change:
– Only derivatives are used, so the detector is invariant to an intensity shift I → I + b.
– An intensity scaling I(x,y) → a I(x,y) scales R, so points near the threshold may change classification.
• Overall, only partially invariant to affine intensity changes.

Harris Detector: Scale Invariance
• Under scaling, a corner spread over several small windows may have all of its points classified as edges.
• The Harris detector is not invariant to scaling.

Harris Detector: Advantages
• Insensitive to:
– 2D shift and rotation;
– small illumination variations;
– small viewpoint changes.
• Low computational requirements.

Harris Detector: Repeatability
• In a comparative study of different interest point detectors, it was shown to be the most repeatable and most informative.
Repeatability = (# correspondences) / min(m1, m2)
• Best results: σ_I = 1, σ_D = 2.
C. Schmid, R. Mohr, and C. Bauckhage, "Evaluation of Interest Point Detectors", International Journal of Computer Vision, 37(2), pp. 151-172, 2000.

Harris Detector: Disadvantages
• Sensitive to:
– scale change;
– viewpoint change;
– significant changes in contrast.

How to handle scale changes?
• A_W must be adapted to scale changes.
• If the scale change is known, we can adapt the Harris detector to it (i.e., set σ_D accordingly).
• What if the scale change is unknown?

How to handle scale changes? (cont'd)
• One solution is an exhaustive search: extract interest points at multiple scales (i.e., vary σ_D) and use them all.
• But then there will be many points representing the same structure, which complicates matching!
• Also, point locations shift as the scale increases. (In the figures, the size of each circle corresponds to the scale at which the point was detected.)
• Alternatively, use scale selection to find the characteristic scale of each feature.
• The characteristic scale depends on the feature's spatial extent (i.e., its local neighborhood of pixels).
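The repeatability measure above, (# correspondences) / min(m1, m2), can be sketched as follows. This is a toy illustration with a known translation between the two images; the function name and point sets are mine:

```python
def repeatability(pts1, pts2, transform, tol=1.5):
    """Repeatability = (# correspondences) / min(m1, m2): the fraction of
    detected points in image 1 that reappear (within `tol` pixels) after
    mapping through the known geometric `transform` into image 2."""
    matched = 0
    for p in pts1:
        q = transform(p)
        if any((q[0] - r) ** 2 + (q[1] - c) ** 2 <= tol ** 2
               for r, c in pts2):
            matched += 1
    return matched / min(len(pts1), len(pts2))

# Demo: three of the four points survive a pure translation by (2, 2);
# min(m1, m2) = 3, so repeatability is 3/3 = 1.0.
pts1 = [(5, 5), (10, 3), (7, 8), (1, 1)]
pts2 = [(7, 7), (12, 5), (9, 10)]
rep = repeatability(pts1, pts2, lambda p: (p[0] + 2, p[1] + 2))
```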
• Only a subset of the points computed in scale space is selected!

Automatic Scale Selection
• Main idea: select points at which a local measure F(x,σ_n) is maximal over scales.
T. Lindeberg (1998), "Feature detection with automatic scale selection", International Journal of Computer Vision, 30(2), pp. 77-116.
(Slide from Tinne Tuytelaars, after Lindeberg et al.)

Automatic Scale Selection (cont'd)
• The characteristic scale is relatively independent of the image scale.
• The ratio of the scale values at which the maxima occur equals the scale factor between the images (e.g., scale factor 2.5).
• Scale selection thus allows finding a spatial extent that is covariant with the image transformation.

Invariant / Covariant
• A function f() is invariant under a transformation T() if its value does not change when the transformation is applied to its argument: if f(x) = y, then f(T(x)) = y.
• A function f() is covariant when it commutes with the transformation T(): if f(x) = y, then f(T(x)) = T(f(x)) = T(y).

Automatic Scale Selection (cont'd)
• What local measures could be used?
– They should be rotation invariant.
– They should have one stable, sharp peak.
• Typically, derivative-based functions are used (with s = σ), such as the scale-normalized Laplacian and the difference-of-Gaussian (DoG).
• The Laplacian yielded the best results.
• The DoG was second best.

Recall: Edge detection using the 1st derivative
• Convolution with the derivative of a Gaussian: \frac{d}{dx}(f * g) = f * \frac{dg}{dx}
• Edge = maximum of the first derivative.

Recall: Edge detection using the 2nd derivative
• \frac{d^2}{dx^2}(f * g) = f * \frac{d^2 g}{dx^2} (second derivative of Gaussian, i.e., Laplacian).
• Edge = zero crossing of the second derivative.

From edges to blobs
• Edge = ripple; blob = superposition of two ripples.
• Spatial selection: the magnitude of the Laplacian response achieves a maximum (in absolute value) at the center of the blob, provided the scale of the Laplacian is "matched" to the scale (i.e., spatial extent) of the blob.

Scale selection
• Find the characteristic scale of the blob by convolving it with Laplacian filters at several scales and looking for the maximum response.
• However, the Laplacian response decays as the scale increases (e.g., for a signal of radius 8, with increasing σ). Why does this happen?

Scale normalization
• The response of a derivative-of-Gaussian filter to a perfect step edge decreases in inverse proportion to σ.
• To keep the response the same (scale-invariant), the Gaussian derivative must be multiplied by σ.
• The Laplacian is the second Gaussian derivative, so it must be multiplied by σ².

Effect of scale normalization
• The unnormalized Laplacian response decays with scale, while the scale-normalized Laplacian response attains a clear maximum at the characteristic scale of the signal.

Scale selection: case of circle
• At what scale does the Laplacian achieve a maximum response for a binary circle of radius r?
• The Laplacian achieves its maximum at scale σ = r/\sqrt{2}.

Spatial extent determination
• The spatial extent is taken equal to the neighborhood used to localize the feature.
• The spatial extent r can be determined from the characteristic scale: r = \sqrt{2}\,σ.

Harris-Laplace Detector
(1) Compute interest points at multiple scales using the Harris detector.
– The scales are chosen as σ_n = k^n σ_0.
– At each scale, choose local maxima within a 3x3 window (s_n = σ_n).
K. Mikolajczyk and C. Schmid (2001), "Indexing based on scale invariant interest points", Int. Conference on Computer Vision, pp. 525-531.
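The scale-selection idea above — convolve with the scale-normalized Laplacian σ²·g'' at several scales and pick the maximum response — can be sketched in 1-D. This is an illustration under my own discretization choices (sampled kernels, a box "blob" of radius 8); for such a blob, the characteristic scale should come out near 8:

```python
import numpy as np

def norm_log_response_1d(signal, sigma):
    """Center response of the scale-normalized 1-D Laplacian (sigma^2 * g'')."""
    x = np.arange(-int(4 * sigma + 1), int(4 * sigma + 1) + 1, dtype=float)
    gauss = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    g2 = (x**2 - sigma**2) / sigma**4 * gauss   # second derivative of Gaussian
    kernel = sigma**2 * g2                      # sigma^2 scale normalization
    resp = np.convolve(signal, kernel, mode='same')
    return resp[len(signal) // 2]

# A 1-D box "blob" of radius 8 centered in the signal.
signal = np.zeros(201)
signal[100 - 8:100 + 8 + 1] = 1.0

# Sweep sigma and keep the scale with the largest |normalized response|.
sigmas = np.arange(2.0, 16.1, 0.5)
responses = [abs(norm_log_response_1d(signal, s)) for s in sigmas]
char_scale = sigmas[int(np.argmax(responses))]
```

Without the σ² factor the responses would simply decay with σ and the maximum would always sit at the smallest scale tried.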
Harris-Laplace Detector (cont'd)
(2) Select the points at which a local measure (the scale-normalized Laplacian) is maximal over scales (s_n = σ_n).

Example
• Points detected at each scale for two images that differ by a scale factor of 1.92:
– Few points are detected at the same location but on different scales.
– There are many correspondences between the levels whose scale ratio corresponds to the real scale change between the images.
• The size of the circles corresponds to the scale at which each point was detected.

Harris-Laplace Detector (cont'd)
• These points are invariant to scale, rotation, and translation.
• They are robust to illumination changes and limited viewpoint changes, and show good repeatability.

Efficient implementation using DoG
• The LoG can be approximated by the DoG:
G(x,y,kσ) - G(x,y,σ) \approx (k-1)\,σ^2 \nabla^2 G
• The DoG already incorporates the σ² scale normalization required for scale invariance.
• Given a Gaussian-blurred image L(x,y,σ) = G(x,y,σ) * I(x,y), the result of convolving an image with a difference-of-Gaussian is:
D(x,y,σ) = (G(x,y,kσ) - G(x,y,σ)) * I(x,y) = L(x,y,kσ) - L(x,y,σ)

Scale Space Construction
• Convolve the image with Gaussian filters at different scales.
• Generate difference-of-Gaussian images from the differences of adjacent blurred images.
• The convolved images are grouped by octave (an octave corresponds to a doubling of σ), e.g., σ, kσ, k²σ, ..., 2σ, 2kσ, 2k²σ.
– The value of k is selected so that we obtain a fixed number of blurred images per octave.
– This also ensures that we obtain the same number of difference-of-Gaussian images per octave.
David G. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV, 60(2), pp. 91-110, 2004.
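The octave construction above can be sketched as follows. This is a simplified illustration, not Lowe's implementation: each level is blurred directly from the input image rather than incrementally, and the helper names are my own:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur (pure NumPy, zero-padded 'same' borders)."""
    x = np.arange(-int(3 * sigma + 1), int(3 * sigma + 1) + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, 'same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, g, 'same'), 0, rows)

def dog_octave(img, sigma0=1.6, s=3):
    """One octave of the DoG pyramid: s+3 Gaussian-blurred images with
    scales sigma0 * k^i, k = 2^(1/s), and their s+2 adjacent differences."""
    k = 2.0 ** (1.0 / s)
    gaussians = [gaussian_blur(img, sigma0 * k**i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs

rng = np.random.default_rng(1)
img = rng.random((64, 64))
gaussians, dogs = dog_octave(img)
```

With s = 3 intervals per octave, k = 2^(1/3), so the sixth level reaches 2·k²·σ0 and the next octave starts from the image blurred at 2σ0 (in a full pyramid, also downsampled by 2).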
"Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004. 2kσ 2σ kσ σ Harris-Laplace Detector (cont’d) • • Convolve image with scale-normalized Laplacian at several scales Find maxima of normaliz Laplacian response in scale-space Using 3 x 3 windows, compare the pixel at X to its 26 neighbors Affine Invariance • Similarly to characteristic scale selection, detect the characteristic shape of the local feature. Affine Adaptation 1. Detect initial region with Harris Laplace 2. Estimate affine shape with M 3. Normalize the affine region to a circular one 4. Re-detect the new location and scale in the normalized image 5. Go to step 2 if the eigenvalues of the M for the new point are not equal [detector not yet adapted to the characteristic shape]