Intro · Related work · Approach · Program results · Future work

Goals
› To aid in my research for my thesis, “Markerless Indoor Localization for the Mobile Environment”
› Gain a better understanding of feature descriptors and image matching

Challenges of image matching in the indoor environment
› Local Image Texture Features (LITFs) are sparse and tend to be clustered
› As LITFs become sparse, they become unreliable to match

Image matching has three related areas of work that need to be considered:
› Detecting image features
› Matching feature points
› Image matching

Image features are patches in an image that can be found consistently
LITF-based features are the most commonly used image features:
› Scale Invariant Feature Transform (SIFT) [1]
› Speeded Up Robust Features (SURF) [2]
› Oriented FAST and Rotated BRIEF (ORB) [3]
Line segment features
› Typically calculated by the Hough transform
› Used to find line intersections or vanishing points in images

[1] David G. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision, 2004.
[2] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision–ECCV 2006. Springer Berlin Heidelberg, 2006. 404-417.
[3] Rublee, Ethan, et al. “ORB: An Efficient Alternative to SIFT or SURF.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
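Since line segment features come up again later, here is a minimal sketch of Hough voting for lines (pure Python, illustrative only — the bin resolution and the toy edge points are assumptions, not part of this work):

```python
import math
from collections import defaultdict

def hough_lines(points, theta_steps=18, rho_step=1.0):
    """Each edge point (x, y) votes for every discretized line
    rho = x*cos(theta) + y*sin(theta) that could pass through it;
    peaks in the accumulator correspond to lines in the image."""
    acc = defaultdict(int)
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, round(rho / rho_step))] += 1
    return acc

# Toy edge points lying on the horizontal line y = 3
points = [(x, 3) for x in range(20)]
acc = hough_lines(points)
(t_bin, rho_bin), votes = max(acc.items(), key=lambda kv: kv[1])
theta_deg = 180 * t_bin / 18  # the winning bin is theta = 90°, rho = 3
```

A real detector would first run an edge detector and then take local maxima in the accumulator rather than the single global maximum.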
SIFT and SURF are the most commonly used LITFs
Robust to noise, scale and intensity variations, some affine deformations, and occlusions
SIFT:
› Creates a 4x4 descriptor grid around the key point
› Has been modified in many ways; one of the most commonly used variants is PCA-SIFT [1]
SURF:
› Is faster to compute than SIFT [2,3]
Both SIFT and SURF, however, are computationally expensive, and their descriptors require a large amount of memory to store

[1] Yan Ke and Rahul Sukthankar. “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors.” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-517 (2004)
[2] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision–ECCV 2006. Springer Berlin Heidelberg, 2006. 404-417.
[3] Heinly, Jared, Enrique Dunn, and Jan-Michael Frahm. “Comparative Evaluation of Binary Features.” Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 759-773.

ORB:
› Computationally less expensive than SIFT and SURF by one to two orders of magnitude [1,2]
› Requires less space than SIFT and SURF descriptors [2]
› Is robust to most of the image deformations that SIFT and SURF are [1,2]
› Has been run on a 7 Hz video stream on a mobile phone (1 GHz ARM processor, 512 MB RAM) [1]
› This makes it a promising LITF descriptor for my thesis

[1] Rublee, Ethan, et al. “ORB: An Efficient Alternative to SIFT or SURF.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[2] Heinly, Jared, Enrique Dunn, and Jan-Michael Frahm. “Comparative Evaluation of Binary Features.” Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 759-773.
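ORB's descriptor is a binary string (256 bits in the reference implementation), so matching reduces to Hamming distance — an XOR plus a bit count — which is the source of both the speed and space advantages above. A minimal sketch, with descriptors shortened to 8 toy bits held as Python ints:

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors held as ints:
    XOR leaves a 1 bit wherever the descriptors disagree, then count the 1s."""
    return bin(d1 ^ d2).count("1")

def best_match(query: int, database: list) -> tuple:
    """Brute-force nearest neighbour: (index, distance) of the database
    descriptor closest to the query descriptor."""
    distances = [hamming(query, d) for d in database]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

# Toy 8-bit descriptors; a real ORB descriptor is 256 bits (32 bytes)
database = [0b00000000, 0b10110100, 0b11111111]
index, dist = best_match(0b10110110, database)  # closest: 0b10110100
```

By contrast, SIFT's 128-dimensional float descriptor (512 bytes at 4 bytes per component) is compared with Euclidean distance, which is why binary descriptors both match faster and store smaller.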
Line segment features
› Are typically extracted by the Hough transform [1,2]
They are useful in many ways:
› Extracting planes in the image
› Determining where lines in the image intersect
› Line segment intersections are good features to match to the database; they can be used to determine the cross ratio of 4 collinear points
› Determining the vanishing point in the image, which helps determine the camera pose
› Can typically still be found in environments where LITFs are sparse

[1] Ballard, Dana H. “Generalizing the Hough Transform to Detect Arbitrary Shapes.” Pattern Recognition 13.2 (1981): 111-122.
[2] Hough, Paul V. C. “Method and Means for Recognizing Complex Patterns.” U.S. Patent No. 3,069,654. 18 Dec. 1962.

Feature matching is an important step in image matching
› Determines the best matching features between a query image and a database of images
Two types of feature matching were explored:
› Brute Force matching
› Locality Sensitive Hashing (LSH) [1]

[1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. “Locality-Sensitive Hashing Scheme Based on p-Stable Distributions.” Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253-262 (2004)

Brute Force matching
› The most accurate form of feature matching
› Exhaustively searches the database to find the best matches for the query key points
› Has a major flaw: it becomes prohibitively expensive as the database grows
› It is only suitable for small databases (500-10,000 feature points)

Locality Sensitive Hashing (LSH)
› A k Nearest Neighbor (kNN) approximation
› Hashes key points in a way that preserves locality [1]
› Since locality is preserved, distances between hashes are equivalent to distances between key points [1]
› Very efficient when matching images against a large database
› Only better than Brute Force matching when the database becomes large

Given a set of feature matches to a database, the best image match needs to be found
The problem is that there may be feature matches to multiple images in the database
› Feature matchers have the potential to incorrectly match feature points
Need to determine the correct match:
› One approach is a visual Bag of Words, a weak matching constraint
› Another is fitting the matched features to a model, a stronger matching constraint

Fitting matched features to a model is an effective and reliable way to determine image similarity
There are two common models that are used:
› Fundamental matrix
› Homography
Models can also be customized to the task at hand
› Hile et al. [1] use the floor plan of the building as the model, matching the floor plane in the image against it

[1] Hile, Harlan, and Gaetano Borriello. “Positioning and Orientation in Indoor Environments Using Camera Phones.” Computer Graphics and Applications, IEEE 28.4 (2008): 32-39.

BF matching
› The query image is matched to the database
› Finding the correct image:
› Images with fewer than 𝑘 matches are filtered out
› The fundamental matrix between the query image and each remaining db image is found
› The db image with the largest number of inliers 𝑖 is selected
› If 𝑖 ≥ 𝑡 then it is determined to be the correct image

LSH matching
› The query image is matched to the database
› Finding the correct image:
› First the matches are filtered: if a db image has more than one key point matching a single key point in the query image, the closest key point must be at least 60% closer than the second closest, or the match is discarded
› All images with fewer than 𝑘 matches are discarded
› A fundamental matrix is formed between the query image and each remaining db image
› The image with the largest number of inliers 𝑖 is selected; if 𝑖 ≥ 𝑡 it is determined to be the best match

Program results
The program was trained with 67 images of the third floor of Brown Building
› Images of 4 doorways were taken from 5-6 perspectives, with 2-3 steps between the perspectives
› Images traveling down the long north-south hallway, starting from each end, were taken with 2-3 steps between each image
So far the prototype has only been written in C++ and has only been run on a laptop, but it should be able to run on a mobile platform
It currently does not report false positives, but it does report false negatives when the texture becomes sparse
[Result figures, panels (a)-(c)]

A notable result occurred when the 51 images of the long north-south hallway were added:
› The initial matching phase’s accuracy improved noticeably
› This was unexpected because the BF and LSH matching algorithms do not use machine learning
› A possible reason is that the matchers have more data to use, making them more accurate

Future work
› Adapt this work to my thesis
› Implement the mobile version of the matching process
› Explore line segment features to see if they would work well in conjunction with LITF descriptors
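As a compact summary, the image-selection logic described in the matching slides can be sketched as follows. Feature matching and fundamental-matrix fitting are abstracted away: each candidate database image is reduced to a (name, match count, inlier count) tuple, where the inlier count would in practice come from fitting the fundamental matrix (e.g. with RANSAC). The names and threshold values are made up for illustration:

```python
K_MIN_MATCHES = 8    # k: minimum feature matches for an image to stay in play
T_MIN_INLIERS = 12   # t: inliers required to accept the winning image

def ratio_test(best_dist, second_dist, ratio=0.6):
    """LSH pre-filter from the slides: keep a match only when the closest
    key point is at least 60% closer than the second closest."""
    return best_dist < ratio * second_dist

def select_image(candidates, k=K_MIN_MATCHES, t=T_MIN_INLIERS):
    """candidates: list of (name, matches, inliers) per database image.
    Returns the name of the best match, or None if no image passes."""
    # Discard images with fewer than k feature matches
    survivors = [c for c in candidates if c[1] >= k]
    if not survivors:
        return None
    # Pick the image whose fundamental-matrix fit has the most inliers i ...
    name, _, inliers = max(survivors, key=lambda c: c[2])
    # ... and accept it only if i >= t
    return name if inliers >= t else None

candidates = [("doorway_a", 5, 5), ("hallway_n", 20, 15), ("doorway_b", 12, 9)]
best = select_image(candidates)
```

Returning None rather than the weak best candidate is what makes the prototype report false negatives instead of false positives when texture becomes sparse.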