Object recognition using improved image feature extraction and matching method
Djalalov M.M. (SUE «UNICON.UZ»), Radjabov T.D. (TUIT)
In this paper, an improvement of the SIFT algorithm for image feature extraction
and matching is presented. We propose a new method for the image matching step of
the SIFT algorithm that is slightly faster for object recognition. For evaluation, a series of
simulations and experiments was conducted, in which our Improved-SIFT algorithm
showed better results.
1. Introduction
Image matching is a fundamental
aspect of many problems in computer vision, including object or scene recognition,
solving for 3D structure from multiple images, stereo correspondence, and motion
tracking. This paper describes image features that have many properties that make
them suitable for matching differing images
of an object or scene. The features are invariant to image scaling and rotation, and
partially invariant to change in illumination
and 3D camera viewpoint. They are well localized in both the spatial and frequency
domains, reducing the probability of disruption by occlusion, clutter, or noise. Large
numbers of features can be extracted from
typical images with efficient algorithms. In
addition, the features are highly distinctive,
which allows a single feature to be correctly
matched with high probability against a large
database of features, providing a basis for
object and scene recognition [1-3].
2. Original Scale Invariant Feature
Transform (SIFT) algorithm
Scale Invariant Feature Transform
(SIFT) was first presented by Lowe [4]. The
SIFT algorithm takes an image and transforms it into a collection of local feature vectors. Each of these feature vectors is supposed to be distinctive and invariant to any
scaling, rotation or translation of the image.
First, the feature locations are determined as the local extrema of the Difference
of Gaussians (DoG) pyramid, as given by
(3). To build the DoG pyramid, the input image is convolved iteratively with a Gaussian
kernel (2). This procedure is repeated as
long as down-sampling is possible. Each
collection of images of the same size is
called an octave. Together, all octaves build
the so-called Gaussian pyramid (1),
which is represented by a 3D function L(x, y,
σ):
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \qquad (1)$$

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/2\sigma^{2}} \qquad (2)$$

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (3)$$
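To make equations (1)–(3) concrete, the following minimal NumPy/SciPy sketch (ours for illustration, not the authors' Matlab code) builds one octave of the Gaussian and DoG pyramids; the function name and the defaults sigma = 1.6 and three intervals per octave follow Lowe's common settings and are assumptions, not values given in this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_dog_octave(image, sigma=1.6, intervals=3):
    # One octave of the Gaussian pyramid, eq. (1): each level is the
    # input blurred with sigma * k^i, where k = 2^(1/intervals).
    k = 2.0 ** (1.0 / intervals)
    n_levels = intervals + 3          # Lowe uses s + 3 images per octave
    gaussians = [gaussian_filter(image.astype(np.float64), sigma * k ** i)
                 for i in range(n_levels)]
    # DoG images, eq. (3): difference of adjacent Gaussian levels.
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs
```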
The local extrema (maxima or minima) of the DoG function are detected by comparing each
pixel with its 26 neighbours in scale-space (8 neighbours in the same scale, 9 corresponding
neighbours in the scale above and 9 in the scale below). The search for extrema excludes
the first and the last image in each octave because they do not have a scale above and a scale
below, respectively. To increase the number of extracted features, the input image is doubled in size before it is processed by the SIFT algorithm, which, however, increases the computational time significantly.
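A direct, deliberately unoptimized sketch of this 26-neighbour test, assuming `dogs` holds one octave's same-sized DoG images; the `prefilter` contrast value is an illustrative assumption:

```python
import numpy as np

def find_scale_space_extrema(dogs, prefilter=0.03):
    # Stack one octave's DoG images into a (scales, rows, cols) array.
    D = np.stack(dogs)
    keypoints = []
    # Skip the first and last scale: no scale below/above, respectively.
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                v = D[s, y, x]
                if abs(v) < prefilter:    # cheap low-contrast pre-filter
                    continue
                cube = D[s-1:s+2, y-1:y+2, x-1:x+2]  # 27 values incl. v
                # v is an extremum iff it is the max or min among its
                # 26 neighbours (8 same scale, 9 above, 9 below).
                if v == cube.max() or v == cube.min():
                    keypoints.append((s, y, x))
    return keypoints
```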
Scale-space extrema detection produces too many keypoint candidates, some of which
are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for accurate location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along
an edge.
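The edge-rejection part of this test can be sketched from the ratio of principal curvatures of D, following Lowe [4]; the helper below and the value r = 10 come from that paper, not from the text above:

```python
import numpy as np

def is_edge_like(D, s, y, x, r=10.0):
    # 2x2 spatial Hessian of the DoG image by finite differences.
    dxx = D[s, y, x+1] - 2 * D[s, y, x] + D[s, y, x-1]
    dyy = D[s, y+1, x] - 2 * D[s, y, x] + D[s, y-1, x]
    dxy = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1]
                  - D[s, y-1, x+1] + D[s, y-1, x-1])
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    # Reject saddle points (det <= 0) and keypoints whose ratio of
    # principal curvatures exceeds r (Lowe suggests r = 10).
    return det <= 0 or tr * tr / det >= (r + 1) ** 2 / r
```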
Figure 1. Diagram showing the blurred images at different scales, and the computation of the
difference-of-Gaussian images.
For each candidate keypoint, interpolation of nearby data is used to accurately determine
its position. The initial approach was to just locate each keypoint at the location and scale of the
candidate keypoint. The new approach calculates the interpolated location of the extremum,
which substantially improves matching and stability. The interpolation is done using the quadratic Taylor expansion of the Difference-of-Gaussian scale-space function, D(x, y, σ) with the candidate keypoint as the origin. This Taylor expansion is given by (4):
$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x} \qquad (4)$$
where D and its derivatives are evaluated at the candidate keypoint and $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset
from this point.
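In practice the offset is obtained by setting the derivative of (4) to zero, which gives $\hat{\mathbf{x}} = -(\partial^{2}D/\partial\mathbf{x}^{2})^{-1}\,\partial D/\partial\mathbf{x}$. A sketch with finite-difference derivatives (our illustrative helper, assuming D is a stacked (scales, rows, cols) DoG array):

```python
import numpy as np

def refine_keypoint(D, s, y, x):
    # First derivatives of D by central differences, eq. (4).
    grad = 0.5 * np.array([D[s, y, x+1] - D[s, y, x-1],
                           D[s, y+1, x] - D[s, y-1, x],
                           D[s+1, y, x] - D[s-1, y, x]])
    # Second derivatives (3x3 Hessian) by central differences.
    dxx = D[s, y, x+1] - 2*D[s, y, x] + D[s, y, x-1]
    dyy = D[s, y+1, x] - 2*D[s, y, x] + D[s, y-1, x]
    dss = D[s+1, y, x] - 2*D[s, y, x] + D[s-1, y, x]
    dxy = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1]
                  - D[s, y-1, x+1] + D[s, y-1, x-1])
    dxs = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1]
                  - D[s-1, y, x+1] + D[s-1, y, x-1])
    dys = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x]
                  - D[s-1, y+1, x] + D[s-1, y-1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    # Interpolated extremum offset (dx, dy, dsigma) from the keypoint.
    return -np.linalg.solve(H, grad)
```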
In the next step, each keypoint is assigned one or more orientations based on local image gradient directions. This is the key step in achieving invariance to rotation, as the keypoint descriptor
can be represented relative to this orientation. First, the Gaussian-smoothed image L(x, y, σ) at the keypoint's scale σ is taken so that all
computations are performed in a scale-invariant manner. For an image sample L(x, y) at scale σ,
the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences
as in (5) and (6):
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^{2} + \big(L(x, y+1) - L(x, y-1)\big)^{2}} \qquad (5)$$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \qquad (6)$$
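A vectorized sketch of (5) and (6); note that implementations typically use the two-argument arctangent in place of tan⁻¹ so the full 2π orientation range is recovered:

```python
import numpy as np

def gradient_magnitude_orientation(L):
    # Central pixel differences, as in eqs. (5) and (6); the results are
    # cropped by one pixel on each border where the differences exist.
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    m = np.hypot(dx, dy)              # gradient magnitude, eq. (5)
    theta = np.arctan2(dy, dx)        # orientation in radians, eq. (6)
    return m, theta
```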
Previous steps found keypoint locations at particular scales and assigned orientations to
them. This ensured invariance to image location, scale and rotation. Now we need to compute a
descriptor vector for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations such as illumination, 3D viewpoint, etc.
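The paper does not detail the descriptor construction, so as orientation for the reader, here is a compact sketch of Lowe's standard layout [4]: a 4×4 grid of 8-bin orientation histograms giving a 128-dimensional vector, normalized for illumination invariance. The Gaussian weighting and trilinear interpolation of the original are omitted for brevity, and all names are illustrative:

```python
import numpy as np

def keypoint_descriptor(m, theta, y, x, kp_orientation, width=16, bins=8):
    # Assumes the keypoint is at least width/2 pixels from the border.
    half = width // 2
    patch_m = m[y - half:y + half, x - half:x + half]
    # Rotate gradient orientations relative to the keypoint orientation.
    patch_t = (theta[y - half:y + half, x - half:x + half]
               - kp_orientation) % (2 * np.pi)
    desc = np.zeros((4, 4, bins))
    cell = width // 4
    for r in range(width):
        for c in range(width):
            b = int(patch_t[r, c] / (2 * np.pi) * bins) % bins
            desc[r // cell, c // cell, b] += patch_m[r, c]
    desc = desc.ravel()                   # 4 x 4 x 8 = 128 values
    desc /= np.linalg.norm(desc) + 1e-12  # normalize (illumination)
    desc = np.minimum(desc, 0.2)          # clamp large gradients (Lowe)
    return desc / (np.linalg.norm(desc) + 1e-12)
```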
3. Improved SIFT algorithm
From the algorithm description given above, it is evident that, in general, the SIFT algorithm can be understood as a local image operator which takes an input image and
transforms it into a collection of local features.
The feature matching between SIFT descriptors of two images includes the computation of the Euclidean distance between each descriptor of the first image and each descriptor of the second image in Euclidean space [5]. According to the nearest-neighbour procedure, for each feature ai in the model image feature set, the corresponding feature bi must be sought in the test image feature set. The corresponding feature is the one
with the smallest Euclidean distance to the feature ai. A pair of corresponding features (ai,
bi) is called a match M(ai, bi) [6].
If the ratio of the nearest neighbour's Euclidean distance to
the second nearest neighbour's Euclidean distance exceeds a predefined threshold, the
match is discarded.
Euclidean distance here means the distance between keypoints in feature space, given by (7).
All keypoints (features) from the two images are transformed into a multi-dimensional space
based on their gradients, orientations, magnitudes, locations, brightness, etc. Each feature
in feature space is represented by a feature vector:
$$D(a, b) = \sqrt{(a_{1}-b_{1})^{2} + (a_{2}-b_{2})^{2} + \cdots + (a_{n}-b_{n})^{2}} = \sqrt{\sum_{i=1}^{n}(a_{i}-b_{i})^{2}} \qquad (7)$$
where D(a, b) stands for the Euclidean distance between feature vector a and feature vector b;
matched points are eliminated if D(a, b) is larger than the set threshold.
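A minimal sketch of this baseline nearest-neighbour matching with the distance-ratio test; the value 0.8 is an illustrative threshold (Lowe uses a similar one), not a setting from this paper:

```python
import numpy as np

def match_euclidean(desc_a, desc_b, ratio=0.8):
    # desc_a, desc_b: (num_features, dim) descriptor arrays.
    matches = []
    for i, a in enumerate(desc_a):
        d = np.sqrt(((desc_b - a) ** 2).sum(axis=1))  # eq. (7) to all b
        nn1, nn2 = np.argsort(d)[:2]         # two nearest neighbours
        if d[nn1] < ratio * d[nn2]:          # distance-ratio test
            matches.append((i, int(nn1)))
    return matches
```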
Calculating the distance between all feature points is computationally expensive. We suggest finding dot products of these feature vectors (8), which is faster and more robust
than computing distances: distances between features can be similar and cause mismatches, whereas the angle between feature vectors is more discriminative. The dot product is computed and the inverse cosine
taken between feature vectors as in (9):
$$a \cdot b = \sum_{i=1}^{n} a_{i} b_{i} = a_{1}b_{1} + a_{2}b_{2} + \cdots + a_{n}b_{n} \qquad (8)$$

$$\theta = \arccos\!\left(\frac{a \cdot b}{\|a\|\,\|b\|}\right) \qquad (9)$$
We then check whether the nearest neighbour's angle is less than a predefined threshold:

$$\theta \leq \text{pred.ratio} \qquad (10)$$
In the original SIFT, only the nearest-neighbour distance is compared to the other distances and the smallest value is taken. In Improved-SIFT we compare the angles between feature vectors. We also use a distance ratio for outlier rejection, to reduce false matching, and take
only positive values:
$$A_{1} - A_{2} \geq \text{DisRatio} \qquad (11)$$
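A minimal sketch of how the proposed angle-based matching could be implemented under our reading of (8)–(11): descriptors are unit-normalized so the dot product equals cos θ, a nearest neighbour is accepted when its angle passes the threshold of (10), and the match survives outlier rejection only if the gap to the second-best angle exceeds DisRatio, one plausible reading of (11). Both threshold values below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def match_by_angle(desc_a, desc_b, angle_thresh=0.3, dis_ratio=0.05):
    # Unit-normalize so the dot product of two descriptors is cos(theta).
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    cos = np.clip(a @ b.T, -1.0, 1.0)  # all pairwise dot products, eq. (8)
    angles = np.arccos(cos)            # pairwise angles, eq. (9)
    matches = []
    for i in range(angles.shape[0]):
        nn1, nn2 = np.argsort(angles[i])[:2]
        a1, a2 = angles[i, nn1], angles[i, nn2]
        # Accept if the best angle passes the threshold of eq. (10) and
        # the gap to the second-best angle passes the test of eq. (11).
        if a1 <= angle_thresh and (a2 - a1) >= dis_ratio:
            matches.append((i, int(nn1)))
    return matches
```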
4. Simulation results
In order to verify whether the proposed method works, we conducted a series of
simulations for object recognition using the template matching method. We ran matching
simulations under different conditions, with images scaled, rotated, shifted, etc. The simulations were performed in Matlab on a Pentium Dual PC running at 2.20 GHz. The algorithms were tested
on 20 images in different cases. We modified the original SIFT matching Matlab code into Improved-SIFT with the dot-product and outlier-rejection steps.
Figure 2 illustrates the results for the scaled-image matching case.
Figure 2. a) Previous SIFT; b) Improved-SIFT; c) After outlier rejection. Simulation results for
matching of two images where the second image is a scaled version of the first.
In the first image (Fig. 2a), the result of the previous SIFT is shown; there are many mismatches.
In the second image (Fig. 2b), our Improved-SIFT is shown; there are fewer mismatches
than with the previous method because we compute the dot product. The third image (Fig. 2c) is the result of applying outlier rejection to our Improved-SIFT, where all mismatches are removed and
only correct matching points are displayed.
Figure 3 illustrates the results for the scaled-and-rotated image matching case.
Figure 3. a) Previous SIFT; b) Improved-SIFT; c) After outlier rejection. Simulation results for
matching of two images where the second image is a rotated and scaled version of the first.
In the scaled-and-rotated case, the results were the same as before: the proposed
method gives better matching results than the existing SIFT.
5. Conclusion
SIFT is a well-known algorithm for
image feature extraction, but its image
matching step sometimes does not work well. In
this paper we proposed the Improved-SIFT feature extraction and matching method for object recognition. As seen in the
simulation results, our method recognizes
and matches images more accurately than
the previous SIFT. In the future we will apply our
Improved-SIFT method in many other fields,
such as human face and facial expression
recognition, panoramic image stitching, etc.
References
1. Liang Cheng, Jianya Gong, Xiaoxia Yang, "Robust Affine Invariant Feature Extraction for Image Matching," IEEE Geoscience and Remote Sensing Letters, April 2008.
2. Liang Cheng, "A new method for remote sensing image matching by integrating affine invariant feature extraction and RANSAC," Image and Signal Processing (CISP), 2010 3rd International Congress, pp. 1605-1609, 2010.
3. Madzin, H., Zainuddin, R., "Feature Extraction and Image Matching of 3D Lung Cancer Cell Image," Soft Computing and Pattern Recognition (SOCPAR '09), International Conference, 2009.
4. David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
5. R. Jiayuan, W. Yigang, D. Yun, "Study on eliminating wrong match pairs of SIFT," Signal Processing (ICSP), 2010 IEEE 10th International Conference, pp. 992-995, 2010.
6. Omercevic, D., Drbohlav, O., Leonardis, A., "High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors," Computer Vision (ICCV 2007), IEEE 11th International Conference, pp. 1-8, Oct. 2007.