Object recognition using an improved image feature extraction and matching method

Djalalov M.M. (SUE «UNICON.UZ»), Radjabov T.D. (TUIT)

In this paper, an improvement of the SIFT algorithm for image feature extraction and matching is presented. We propose a new method for the image matching step of the SIFT algorithm that makes object recognition faster. For evaluation, a series of simulations and experiments was conducted, in which our Improved-SIFT algorithm showed faster and more accurate results.

1. Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features with many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to changes in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition [1-3].

2. Original Scale Invariant Feature Transform (SIFT) algorithm

The Scale Invariant Feature Transform (SIFT) was first presented by Lowe [4]. The SIFT algorithm takes an image and transforms it into a collection of local feature vectors. Each of these feature vectors is intended to be distinctive and invariant to any scaling, rotation, or translation of the image.

First, the feature locations are determined as the local extrema of the Difference of Gaussians (DoG) pyramid, given by (3). To build the DoG pyramid, the input image is convolved iteratively with a Gaussian kernel (2). This procedure is repeated as long as down-sampling is possible. Each collection of images of the same size is called an octave. Together, all octaves build the so-called Gaussian pyramid (1), which is represented by a 3D function L(x, y, σ):

$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$ (1)

$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} e^{-(x^{2} + y^{2})/2\sigma^{2}}$ (2)

$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$ (3)

The local extrema (maxima or minima) of the DoG function are detected by comparing each pixel with its 26 neighbors in scale-space (8 neighbors in the same scale, 9 corresponding neighbors in the scale above, and 9 in the scale below). The search for extrema excludes the first and the last image in each octave because they do not have a scale above and a scale below, respectively.
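As a concrete illustration of (1)-(3) and the 26-neighbor test, the sketch below builds one octave of the Gaussian and DoG pyramids and checks a candidate pixel. It is written in Python with NumPy/SciPy rather than the Matlab used for the experiments in Section 4; the function names, the base scale sigma = 1.6, and the factor k = sqrt(2) are illustrative assumptions, not values fixed by this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_dog_octave(image, sigma=1.6, k=np.sqrt(2), levels=5):
    """Build one octave of the Gaussian pyramid L (eq. 1) and the
    Difference-of-Gaussians pyramid D (eq. 3). Parameter values are
    illustrative assumptions."""
    gaussians = [gaussian_filter(image.astype(float), sigma * k ** i)
                 for i in range(levels)]
    dogs = [gaussians[i + 1] - gaussians[i] for i in range(levels - 1)]
    return np.stack(gaussians), np.stack(dogs)

def is_extremum(dogs, s, y, x):
    """Compare D at (x, y, scale s) with its 26 scale-space neighbors:
    8 in the same scale, 9 in the scale above, 9 in the scale below."""
    patch = dogs[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dogs[s, y, x]
    return center == patch.max() or center == patch.min()
```

Consistent with the exclusion noted above, a caller would test only the interior DoG levels (s from 1 to levels - 3), since the first and last images have no scale below or above.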
To increase the number of extracted features, the input image is doubled in size before it is processed by the SIFT algorithm; this, however, increases the computational time significantly. Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for accurate location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.

Figure 1. Diagram showing the blurred images at different scales, and the computation of the difference-of-Gaussian images.

For each candidate keypoint, interpolation of nearby data is used to accurately determine its position. The initial approach was simply to locate each keypoint at the location and scale of the candidate keypoint. The new approach calculates the interpolated location of the extremum, which substantially improves matching and stability. The interpolation is done using the quadratic Taylor expansion of the Difference-of-Gaussian scale-space function D(x, y, σ), with the candidate keypoint as the origin. This Taylor expansion is given by (4):

$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x}$ (4)

where D and its derivatives are evaluated at the candidate keypoint and $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from this point.

In the next step, each keypoint is assigned one or more orientations based on local image gradient directions. This is the key step in achieving invariance to rotation, as the keypoint descriptor can be represented relative to this orientation. First, the Gaussian-smoothed image L(x, y, σ) at the keypoint's scale σ is taken, so that all computations are performed in a scale-invariant manner. For an image sample L(x, y) at scale σ, the gradient magnitude m(x, y) and orientation θ(x, y) are precomputed using pixel differences, as in (5) and (6):

$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}}$ (5)

$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$ (6)

The previous steps found keypoint locations at particular scales and assigned orientations to them, which ensures invariance to image location, scale, and rotation. A descriptor vector must now be computed for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations, such as illumination and 3D viewpoint.

3. Improved SIFT algorithm

From the algorithm description given above, it is evident that, in general, the SIFT algorithm can be understood as a local image operator that takes an input image and transforms it into a collection of local features. Feature matching between the SIFT descriptors of two images involves computing the Euclidean distance between each descriptor of the first image and each descriptor of the second image in Euclidean space [5]. According to the nearest neighbor procedure, for each feature a_i in the model image feature set, the corresponding feature b_i must be looked for in the test image feature set. The corresponding feature is the one with the smallest Euclidean distance to a_i. A pair of corresponding features (a_i, b_i) is called a match M(a_i, b_i) [6]. If the ratio of the nearest neighbor's Euclidean distance to the second nearest neighbor's Euclidean distance exceeds a predefined threshold, the match is discarded. Euclidean distance here means the distance between keypoints in feature space, given by (7). All keypoints (features) from the two images are transformed into a multi-dimensional space based on their gradients, orientations, magnitudes, locations, brightness, etc., so that each feature is represented by a feature vector:

$D(a, b) = \sqrt{(a_{1} - b_{1})^{2} + (a_{2} - b_{2})^{2} + \cdots + (a_{n} - b_{n})^{2}} = \sqrt{\sum_{i=1}^{n}(a_{i} - b_{i})^{2}}$ (7)

where D(a, b) stands for the Euclidean distance between feature vectors a and b, and matched points are eliminated if D(a, b) is larger than the set threshold.
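The standard matching step just described can be sketched as follows. This is a minimal illustration of nearest-neighbor search under the Euclidean distance (7) with the distance-ratio test, not the paper's Matlab code; the threshold value 0.8 is an assumed placeholder.

```python
import numpy as np

def match_euclidean(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its nearest neighbor in
    desc_b by Euclidean distance (eq. 7); keep the match only if the
    nearest distance is well below the second nearest. The ratio
    threshold is an assumed value."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - a, axis=1)  # distance to every candidate
        j1, j2 = np.argsort(dists)[:2]              # nearest and second nearest
        if dists[j1] < ratio * dists[j2]:           # ratio test; otherwise discard
            matches.append((i, j1))
    return matches
```

The inner loop makes the cost visible: every descriptor of the first image is compared against every descriptor of the second, which motivates the cheaper comparison proposed next.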
Calculating the distance between all pairs of feature points is computationally expensive. We suggest instead computing the dot products of the feature vectors (8), which is faster and more robust than computing distances: distances between different features can be similar, so mismatching may occur, whereas the angles between the vectors differ. The dot product is computed and the inverse cosine of the angle between the feature vectors is taken, as in (9):

$a \cdot b = \sum_{i=1}^{n} a_{i} b_{i} = a_{1}b_{1} + a_{2}b_{2} + \cdots + a_{n}b_{n}$ (8)

$\theta_{a,b} = \arccos\frac{a \cdot b}{\|a\|\,\|b\|}$ (9)

We then check whether the nearest neighbor's angle satisfies the predefined ratio:

$\frac{\theta_{1}}{\theta_{2}} < \text{pred.ratio}$ (10)

In the previous SIFT, only the nearest neighbor distance is compared with the other distances and the smallest value is taken. In Improved-SIFT we compare the angles between feature vectors. We also use a distance ratio for outlier rejection, to reduce false matches, and take only positive values:

$\frac{A_{1}}{A_{2}} < \text{DisRatio}$ (11)
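Under our reading of (8)-(11), the proposed matching step can be sketched as follows: the feature vectors are normalized, all dot products (8) are computed in a single matrix product, angles are obtained via (9), and a nearest-to-second-nearest angle ratio plays the role of (10). The threshold value is again an assumed placeholder.

```python
import numpy as np

def match_improved(desc_a, desc_b, pred_ratio=0.8):
    """Angle-based matching: normalize the feature vectors, compute all
    dot products (eq. 8) at once, convert them to angles (eq. 9), and
    apply the angle-ratio check (eq. 10, as reconstructed above)."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    angles = np.arccos(np.clip(a @ b.T, -1.0, 1.0))  # theta for every pair
    matches = []
    for i in range(angles.shape[0]):
        j1, j2 = np.argsort(angles[i])[:2]               # two smallest angles
        if angles[i, j1] < pred_ratio * angles[i, j2]:   # angle-ratio test
            matches.append((i, j1))
    return matches
```

Computing all dot products as one matrix product is what makes this step cheaper in practice than evaluating (7) pair by pair; the outlier rejection of (11) would then be applied to the surviving matches.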
4. Simulation results

In order to verify whether the proposed method works, we conducted a series of simulations for object recognition using the template matching method. The simulations covered matching under different conditions, in which the images were scaled, rotated, shifted, etc. The simulations were run in Matlab on a PC with a Pentium Dual processor running at 2.20 GHz. The algorithms were tested on 20 images in different cases. We modified the original SIFT matching Matlab code into Improved-SIFT with the dot product and Outlier Rejection values. Figure 2 illustrates the results for the scaled image matching case.

Figure 2. a) Previous SIFT; b) Improved-SIFT; c) After Outlier Rejection. Simulation results for matching of two images, where the 2nd image is a scaled version of the 1st.

In the first image (Fig. 2a), the previous SIFT is shown, and there are many mismatches. In the second image (Fig. 2b), our Improved-SIFT is shown, and there are fewer mismatches than with the previous method, because we calculate the dot product. The third image (Fig. 2c) is the result of applying Outlier Rejection to our Improved-SIFT, where all mismatches are removed and only the correct matching points are displayed. Figure 3 illustrates the results for the scaled and rotated image matching case.

Figure 3. a) Previous SIFT; b) Improved-SIFT; c) After Outlier Rejection. Simulation results for matching of two images, where the 2nd image is a rotated and scaled version of the 1st.

In the scaled and rotated case, the results were the same as before: the proposed method gives better matching results than the existing SIFT.

5. Conclusion

SIFT is a well-known algorithm for image feature extraction, but its image matching step sometimes does not work well. In this paper, we proposed the Improved-SIFT feature extraction and matching method for object recognition. As seen in the simulation results, our method recognizes and matches images more accurately than the previous SIFT. In the future, we plan to apply our Improved-SIFT method in many other fields, such as human face and facial expression recognition, panoramic image stitching, etc.

References

1. Liang Cheng, Jianya Gong, Xiaoxia Yang. Robust Affine Invariant Feature Extraction for Image Matching. IEEE Geoscience and Remote Sensing Letters, April 2008.
2. Liang Cheng. A new method for remote sensing image matching by integrating affine invariant feature extraction and RANSAC. Image and Signal Processing (CISP), 2010 3rd International Congress, pp. 1605-1609, 2010.
3. Madzin H., Zainuddin R. Feature Extraction and Image Matching of 3D Lung Cancer Cell Image. Soft Computing and Pattern Recognition (SOCPAR '09), International Conference, 2009.
4. David G. Lowe. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
5. R. Jiayuan, W. Yigang, D. Yun. Study on eliminating wrong match pairs of SIFT. Signal Processing (ICSP), 2010 IEEE 10th International Conference, pp. 992-995, 2010.
6. Omercevic D., Drbohlav O., Leonardis A. High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors. Computer Vision (ICCV 2007), IEEE 11th International Conference, pp. 1-8, Oct. 2007.