Distinctive Image Features from Scale-Invariant Keypoints

Ronnie Bajwa
Sameer Pawar

* Adapted from slides found online by Michael Kowalski, Lehigh University
Goal

Extract distinctive invariant features from an image that can be used to perform reliable object recognition and many other applications.
Key requirements for a good feature

• Highly distinctive, with a low probability of mismatch
• Easy to extract
• Invariant to:
  – scale and rotation
  – changes in illumination
  – slight changes in viewing angle
  – image noise
  – clutter and occlusion
Brief History

• Moravec (1981):
  – Corner detector.
• Harris (1988):
  – Selects locations that have large gradients in all directions (corners).
  – Invariant to image rotation and to slight affine intensity changes, but has problems with scale change.
• Zhang (1995):
  – Introduced a correlation window to match images over large-range motion.
• Schmid (1997):
  – Suggested the use of local feature matching for image recognition.
  – Used a rotationally invariant descriptor of the local image region.
  – Multiple feature matches accomplish recognition under occlusion and clutter.
Harris Corner Detector

The eigenvalues λ1 and λ2 of the local autocorrelation matrix measure the intensity change along the directions of slowest and fastest change; comparing them classifies each point (a minimal response-map sketch follows below):

• "Flat": λ2 ≈ λ1 ≈ 0
• "Edge": λ2 >> λ1
• "Corner": λ2 ≈ λ1, both large
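In practice the eigenvalues are rarely computed explicitly; the standard Harris response uses the determinant and trace of the structure tensor instead. A minimal sketch, not from the original slides; the `sigma` and `k` values are conventional defaults, not values the slides specify:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.0, k=0.04):
    """R = det(M) - k*trace(M)^2 for the structure tensor M at each pixel.

    R ~ 0 -> flat, R < 0 -> edge, R >> 0 -> corner.
    """
    ix = sobel(image, axis=1)          # horizontal gradient
    iy = sobel(image, axis=0)          # vertical gradient
    # Gaussian-weighted sums of gradient products (the entries of M)
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2         # lambda1 * lambda2
    trace = ixx + iyy                  # lambda1 + lambda2
    return det - k * trace ** 2
```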
SIFT Algorithm Overview

A filtered approach:

1. Scale-space extrema detection
   – Identify potential points that are invariant to scale and orientation.
   – Uses a difference-of-Gaussian function.
2. Keypoint localization
   – Improve the location estimate by fitting a quadratic.
   – Extrema are thresholded to filter out insignificant points.
3. Orientation assignment
   – An orientation is assigned to each keypoint based on the local gradients of neighboring pixels.
4. Keypoint descriptor construction
   – A feature vector is built from the gradients of the local neighborhood.
Keypoint Selection: Scale space

We express the image at different scales by filtering it with a Gaussian kernel.
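Written out (these are the standard definitions from Lowe's paper, not shown on the slide), the scale space L is the convolution of the input image I with a variable-scale Gaussian G:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2} + y^{2})/2\sigma^{2}}$$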
Keypoint Selection: why DoGs?

• Lindeberg (1994) and Mikolajczyk (2002) found that the maxima and minima of the scale-normalized Laplacian, σ²∇²G, provide the most stable scale-invariant features.
• We can use the scaled images to approximate this (see the equations below).
• Efficient to compute:
  – The smoothed images L are needed later anyway, so D can be computed by simple image subtraction.
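The approximation, as derived in Lowe's paper: the difference-of-Gaussian image D is a plain subtraction of adjacent smoothed images, and via the heat equation ∂G/∂σ = σ∇²G it approximates the scale-normalized Laplacian up to the constant factor (k − 1):

$$D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^{2}\nabla^{2}G$$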
Back to picking keypoints

1. Supersample the original image.
2. Compute smoothed images at different scales σ for the entire octave.
3. Compute DoG images from adjacent scales for the entire octave.
4. Isolate keypoints in each octave by detecting extrema in the DoG images compared to neighboring pixels.
5. Subsample the image at 2σ of the current octave and repeat the process (2-3) for the next octave.

A Python sketch of these steps follows below.
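A minimal sketch of the pyramid construction, assuming a grayscale float image and SciPy. The values s = 3 scales per octave and σ0 = 1.6 follow these slides and Lowe's paper, while the bilinear resampling and the use of absolute (rather than incremental) blurs are simplifications:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_dog_pyramid(image, num_octaves=4, s=3, sigma0=1.6):
    """Steps 1-5: supersample, blur an octave, subtract adjacent levels,
    then subsample the image whose blur is twice the octave's base."""
    k = 2.0 ** (1.0 / s)                            # scale step in an octave
    base = zoom(image.astype(float), 2.0, order=1)  # step 1: double size
    gaussians, dogs = [], []
    for _ in range(num_octaves):
        # step 2: s + 3 smoothed images L covering the whole octave
        octave = [gaussian_filter(base, sigma0 * k ** i) for i in range(s + 3)]
        gaussians.append(octave)
        # step 3: DoG images are differences of adjacent smoothed images
        dogs.append([b - a for a, b in zip(octave, octave[1:])])
        # step 5: the image with blur 2*sigma0 becomes the next octave's base
        base = octave[s][::2, ::2]
    return gaussians, dogs

# Step 4 (not shown) scans each DoG image and keeps pixels that are larger
# or smaller than all 26 neighbors in space and adjacent scales.
```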
Visual representation of the process

[Figures from the slides: the original (starting) image; the first octave of L images; the second, third, and fourth octaves; and the first-octave difference-of-Gaussian images.]
Scale space sampling

• How many fine scales per octave?
  – Extrema can be arbitrarily close together, but very close extrema are unstable.
• After subsampling, and before computing the smoothed images of the octave, a prior smoothing of σ = 1.6 is applied.
• To compensate for the loss of higher spatial frequencies, the original image is doubled in size.
Accurate Keypoint Localization

• Difference-of-Gaussian local extrema detection gives only approximate values for keypoints.
• Originally these approximations were used directly.
• For an improvement in matching and stability, a 3D quadratic function is fit to each extremum.
The Taylor Series Expansion

Take the Taylor series expansion of the scale-space function D(x, y, σ), up to the quadratic terms:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x}$$

– The origin is shifted to the sample point.
– $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from this sample point.
– To find the location of the extremum, take the derivative and set it to zero:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$$

A finite-difference sketch of this refinement follows below.
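A minimal sketch of the refinement, assuming `dog` is a 3-D stack of DoG images indexed as `dog[scale, y, x]` and `(s, y, x)` is a detected extremum; the gradient and Hessian come from central finite differences, which is how implementations of the paper typically compute them:

```python
import numpy as np

def refine_extremum(dog, s, y, x):
    """Return the sub-pixel offset x_hat and the value D(x_hat)."""
    p = dog[s-1:s+2, y-1:y+2, x-1:x+2].astype(float)  # 3x3x3 neighborhood
    grad = np.array([(p[2,1,1] - p[0,1,1]) / 2,       # dD/dsigma
                     (p[1,2,1] - p[1,0,1]) / 2,       # dD/dy
                     (p[1,1,2] - p[1,1,0]) / 2])      # dD/dx
    c = p[1,1,1]
    dss = p[2,1,1] - 2*c + p[0,1,1]
    dyy = p[1,2,1] - 2*c + p[1,0,1]
    dxx = p[1,1,2] - 2*c + p[1,1,0]
    dsy = (p[2,2,1] - p[2,0,1] - p[0,2,1] + p[0,0,1]) / 4
    dsx = (p[2,1,2] - p[2,1,0] - p[0,1,2] + p[0,1,0]) / 4
    dyx = (p[1,2,2] - p[1,2,0] - p[1,0,2] + p[1,0,0]) / 4
    hess = np.array([[dss, dsy, dsx],
                     [dsy, dyy, dyx],
                     [dsx, dyx, dxx]])
    offset = -np.linalg.solve(hess, grad)             # x_hat = -H^{-1} grad
    value = c + 0.5 * grad @ offset                   # D(x_hat)
    return offset, value
```

The returned `value` is exactly the quantity used by the contrast test on the next slide.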
Thresholding Keypoints (part 1)

• The function value at the extremum is used to reject unstable extrema with low contrast.
• Evaluate

$$D(\hat{\mathbf{x}}) = D + \frac{1}{2}\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}$$

• If the absolute value at the extremum location is less than 0.03, the extremum is discarded.
Thresholding Keypoints (part 2)

• The difference-of-Gaussian function has strong responses along edges:
  – Some locations along edges are poorly determined and become unstable when even small amounts of noise are added.
  – These locations have a large principal curvature across the edge but a small one along it.
  – Therefore we need to compute the principal curvatures at each location and compare the two.
Computing the Principal Curvatures

• Use the 2×2 Hessian matrix of D at the keypoint's location and scale:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$

  – The eigenvalues of H are proportional to the principal curvatures.
  – We are not concerned with the actual eigenvalues, just their ratio. Let the larger eigenvalue be $\alpha = r\beta$; then

$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \mathrm{Det}(H) = D_{xx}D_{yy} - (D_{xy})^{2} = \alpha\beta$$

$$\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^{2}}{\alpha\beta} = \frac{(r\beta + \beta)^{2}}{r\beta^{2}} = \frac{(r + 1)^{2}}{r}$$

A short sketch of this edge test follows below.
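A minimal sketch of the test on a single DoG image `d` at pixel `(y, x)`; the ratio threshold r = 10 is the value Lowe's paper uses, though it is not stated on these slides:

```python
def passes_edge_test(d, y, x, r=10.0):
    """Keep a keypoint only if its principal curvatures are comparable."""
    dxx = d[y, x+1] - 2*d[y, x] + d[y, x-1]
    dyy = d[y+1, x] - 2*d[y, x] + d[y-1, x]
    dxy = (d[y+1, x+1] - d[y+1, x-1] - d[y-1, x+1] + d[y-1, x-1]) / 4
    tr = dxx + dyy                    # alpha + beta
    det = dxx * dyy - dxy ** 2        # alpha * beta
    if det <= 0:                      # curvatures differ in sign: reject
        return False
    return tr ** 2 / det < (r + 1) ** 2 / r
```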
Stages of keypoint selection
Assigning an Orientation

• We finally have a keypoint that we are going to keep.
• The next step is to assign an orientation to the keypoint.
  – This is what makes the matching technique invariant to rotation.
Assigning an Orientation (cont.)

• The Gaussian-smoothed image L with the closest scale is chosen (for scale invariance).
• Points in a region around the keypoint are selected, and the magnitude and orientation of the gradient are calculated at each:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}}$$

$$\theta(x, y) = \tan^{-1}\bigl((L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))\bigr)$$

• An orientation histogram is formed with 36 bins. Each sample is added to the appropriate bin, weighted by its gradient magnitude and by a Gaussian circular window with a σ of 1.5 times the scale of the keypoint. A sketch follows below.
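A minimal sketch of the histogram, assuming `L` is the chosen Gaussian image, `(y, x)` and `scale` locate the keypoint, and `radius` is a hypothetical parameter bounding the sampled region (the slides do not give a window size):

```python
import numpy as np

def orientation_histogram(L, y, x, scale, radius=8):
    hist = np.zeros(36)                       # 36 bins of 10 degrees each
    sigma = 1.5 * scale                       # Gaussian weighting window
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            if not (0 < j < L.shape[0] - 1 and 0 < i < L.shape[1] - 1):
                continue
            dx = L[j, i+1] - L[j, i-1]        # pixel differences, as above
            dy = L[j+1, i] - L[j-1, i]
            mag = np.hypot(dx, dy)            # gradient magnitude m(x, y)
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            weight = np.exp(-((j - y)**2 + (i - x)**2) / (2 * sigma**2))
            hist[int(theta // 10) % 36] += weight * mag
    return hist
```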
Assigning an Orientation (cont.)

• The highest peak in the orientation histogram is found, along with any other peaks within 80% of the highest.
• The 3 histogram values closest to each peak are used to interpolate (by fitting a parabola) a more accurate peak position; a sketch follows below.
• At this point each keypoint has 4 dimensions: x location, y location, scale, and orientation.
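The parabola fit written out as code; the vertex formula below is the standard three-point interpolation, not something the slides spell out:

```python
def interpolate_peak(hist, b):
    """Sub-bin peak for bin b of a 36-bin circular histogram (10-degree bins)."""
    left, mid, right = hist[(b - 1) % 36], hist[b], hist[(b + 1) % 36]
    # vertex of the parabola through the three samples, in bin units
    offset = 0.5 * (left - right) / (left - 2.0 * mid + right)
    return ((b + offset) % 36) * 10.0         # refined orientation in degrees
```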
Keypoint Descriptor

• Calculated from a region around the keypoint, rather than from the keypoint alone, for robustness.
• As before, magnitudes and orientations are calculated for points in the region around the keypoint, using the L of the nearest scale.
• To ensure orientation invariance, the gradient orientations and the coordinates of the descriptor are rotated relative to the orientation of the keypoint.
• Provides invariance to changes in illumination and 3D camera viewpoint.
Visual Representation of Keypoint Descriptor
Keypoint Descriptor (cont.)

• The magnitude of each point is weighted by a Gaussian with a σ of one half the width of the descriptor window:
  – Prevents sudden changes in the descriptor due to small changes in the position of the window.
  – Gives less weight to gradients far from the keypoint.
• Samples are divided into 4×4 subregions around the keypoint. This allows a sample to shift by up to 4 positions while still being included in the same histogram.
Keypoint Descriptor (cont.)

• Avoiding boundary effects between histograms:
  – Trilinear interpolation is used to distribute the value of each sample's gradient into the adjacent histogram bins.
  – Each bin receives a weight of 1 − d, where d is the distance of the sample from the center of the bin.
• Vector normalization (see the sketch below):
  – Done at the end, to ensure invariance to (affine) illumination change.
  – The entire vector is normalized to unit length.
  – To combat non-linear illumination changes (e.g. camera saturation), values in the feature vector are thresholded to no larger than 0.2, and then the vector is renormalized.
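A minimal sketch of this final normalization step, assuming `vec` is the raw 128-element descriptor (4×4 subregions × 8 orientation bins):

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    vec = vec / max(np.linalg.norm(vec), 1e-12)   # unit length: affine light
    vec = np.minimum(vec, clip)                   # damp large gradients
    return vec / max(np.linalg.norm(vec), 1e-12)  # renormalize
```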
We now have features

• Up to this point we have:
  – Found rough approximations for features by looking at the difference-of-Gaussians.
  – Localized each keypoint more accurately.
  – Removed poor keypoints.
  – Determined the orientation of each keypoint.
  – Calculated a 128-element feature vector for each keypoint.
• What do we do now?
Matching Keypoints between images

• A tale of two images (or more):
  – One image is the training sample of what we are looking for.
  – The other image is the world picture that might contain instances of the training sample.
• Both images have features associated with them, across different octaves.
• How do we match features between the two?
Matching Keypoints between images (cont.)

• Nearest-neighbor algorithm (Euclidean distance):
  – Independently match all keypoints in all octaves in one image with all keypoints in all octaves in the other image.
  – How do we handle features that have no correct match in the other image (background noise, clutter)?
    • A global threshold on distance (did not perform well).
    • The ratio of the distance to the closest neighbor over the distance to the second-closest neighbor.
Matching Keypoints between images (cont.)

• Thresholding the ratio at 0.8 (a sketch follows below):
  – Eliminates 90% of false matches.
  – Eliminates less than 5% of correct matches.
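A minimal brute-force sketch of the ratio test, assuming `desc1` and `desc2` are descriptor arrays of shape `[n, 128]`; exhaustive search stands in here for the indexing scheme described on the next slide:

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]              # two nearest neighbors
        if dists[j1] < ratio * dists[j2]:           # keep distinctive matches
            matches.append((i, j1))
    return matches
```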
Efficient Nearest Neighbor indexing

• In high dimensions, no exact algorithm is known that beats exhaustive search.
  – A k-dimensional tree is O(2^k log n).
Efficient Nearest Neighbor indexing (cont.)

• Use an approximate algorithm called Best-Bin-First (a search of feature space):
  – Bins are searched in order of their closest distance to the query.
  – Uses a heap-based priority queue.
  – Returns the closest neighbor with high probability.
  – The search is cut off after checking the 200 closest nearest-neighbor candidates.
• This works well because we only accept matches where the nearest neighbor is closer than 0.8 times the distance to the second-nearest neighbor.
Clustering - Hough Transform

• Small or highly occluded objects:
  – 3 feature matches are sufficient for recognition.
  – Up to 99% of matches can be outliers.
  – RANSAC does not perform well with such a large fraction of outliers.
• Hough transform (a voting sketch follows below):
  – Uses each feature match to vote for all object poses consistent with that feature.
  – Predicts an approximate model of a similarity transformation.
  – Bin sizes must be large to account for large error bounds:
    • orientation: 30 degrees; scale: a factor of 2; location: 0.25 of the projected training image size.
    • Each keypoint votes for the two closest bins in each dimension, giving 16 entries per hypothesis (feature).
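A minimal sketch of the pose voting, assuming each match pairs a model keypoint with an image keypoint, each given as `(x, y, scale, orientation_in_degrees)`; `loc_bin` is a hypothetical parameter standing in for the 0.25-of-image-size location bin width:

```python
import math
from itertools import product
from collections import defaultdict

def hough_vote(matches, loc_bin):
    """Each match predicts a similarity transform (rotation, scale,
    translation) and votes for the two closest bins in every dimension,
    i.e. 2^4 = 16 entries per match."""
    def two_bins(c):                                   # the 2 closest bins
        b = math.floor(c)
        return (b, b + 1) if c - b >= 0.5 else (b - 1, b)

    votes = defaultdict(list)
    for m, i in matches:
        theta = math.radians(i[3] - m[3])              # predicted rotation
        s = i[2] / m[2]                                # predicted scale
        tx = i[0] - s * (m[0]*math.cos(theta) - m[1]*math.sin(theta))
        ty = i[1] - s * (m[0]*math.sin(theta) + m[1]*math.cos(theta))
        coords = ((i[3] - m[3]) % 360 / 30.0,          # 30-degree bins
                  math.log2(s),                        # factor-of-2 bins
                  tx / loc_bin, ty / loc_bin)          # coarse location bins
        for key in product(*(two_bins(c) for c in coords)):
            votes[key].append((m, i))
    return votes
```

Bins that accumulate 3 or more consistent entries become pose hypotheses for the geometric verification step below.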
Model Verification - Affine Transformation

• Each cluster found by the Hough transform is geometrically verified by fitting an affine transformation between the model and the image.
Results (Object Recognition)
Results (cont.)
Questions?