• Intro
• Related work
• Approach
• Program results
• Future work

Goals
› To aid in my research for my thesis, “Markerless Indoor Localization for the Mobile Environment”
› Gain a better understanding of feature descriptors and image matching
• Challenges of image matching in the indoor environment:
› Local Image Texture Features (LITFs) are sparse and tend to be clustered
› As LITFs become sparse, they become unreliable
• Image matching has three related areas of work that need to be considered:
› Detecting image features
› Matching feature points
› Image matching



• Image features are patches in an image that can be found consistently
• LITF-based features are the most commonly used image features:
› Scale Invariant Feature Transform (SIFT) [1]
› Speeded Up Robust Features (SURF) [2]
› Oriented FAST and Rotated BRIEF (ORB) [3]
• Line segment features
› Typically calculated by the Hough transform
› Used to find line intersections or vanishing points in images
[1] Lowe, David G. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, 2004.
[2] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "SURF: Speeded Up Robust Features." Computer Vision–ECCV 2006. Springer Berlin Heidelberg, 2006. 404-417.
[3] Rublee, Ethan, et al. "ORB: An efficient alternative to SIFT or SURF." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
• SIFT and SURF are the most commonly used LITFs
• Both are robust to noise, scale and intensity variations, some affine deformations, and occlusions
• SIFT:
› Builds a 4x4 grid of orientation histograms around the key point, giving a 128-dimensional descriptor
› Has been modified in many ways; one of the most commonly used variants is PCA-SIFT [1]
• SURF:
› Is faster to compute than SIFT [2,3]
• Both SIFT and SURF, however, are computationally expensive, and their descriptors require a large amount of memory to store

[1] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive representation for local image descriptors." Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-517, 2004.
[2] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "SURF: Speeded Up Robust Features." Computer Vision–ECCV 2006. Springer Berlin Heidelberg, 2006. 404-417.
[3] Heinly, Jared, Enrique Dunn, and Jan-Michael Frahm. "Comparative evaluation of binary features." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 759-773.
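To make the memory point concrete, here is a minimal OpenCV sketch (assuming OpenCV 4.4+, where SIFT ships in the main module; the image path is a placeholder, not a file from the project) that extracts SIFT key points and prints how many bytes each 128-dimensional float descriptor occupies:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // Placeholder path; any indoor test image will do.
        cv::Mat img = cv::imread("query.jpg", cv::IMREAD_GRAYSCALE);
        if (img.empty()) return 1;

        // SIFT: 4x4 grid of 8-bin orientation histograms -> 128 floats per key point.
        cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        sift->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

        std::cout << "SIFT keypoints: " << keypoints.size()
                  << ", bytes per descriptor: "
                  << descriptors.cols * descriptors.elemSize1()  // 128 * 4 = 512
                  << std::endl;
        return 0;
    }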





• ORB is computationally less expensive than SIFT and SURF by one to two orders of magnitude [1,2]
• Its descriptors require less space than SIFT and SURF descriptors [2]
• It is robust to most of the image deformations that SIFT and SURF are [1,2]
• It has been run on a 7 Hz video stream on a mobile phone (1 GHz ARM processor, 512 MB RAM) [1]
• This makes it a promising LITF descriptor for my thesis
[1] Rublee, Ethan, et al. "ORB: An efficient alternative to SIFT or SURF." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[2] Heinly, Jared, Enrique Dunn, and Jan-Michael Frahm. "Comparative evaluation of binary features." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 759-773.
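A minimal sketch of ORB extraction with OpenCV; the cap of 500 key points and the file name are illustrative choices, not values taken from the prototype:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // Placeholder path for an indoor frame.
        cv::Mat img = cv::imread("frame.jpg", cv::IMREAD_GRAYSCALE);
        if (img.empty()) return 1;

        // ORB = oriented FAST keypoints + rotation-aware BRIEF binary descriptors.
        cv::Ptr<cv::ORB> orb = cv::ORB::create(500);   // keep at most 500 key points
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;                           // CV_8U, 32 bytes per key point
        orb->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

        std::cout << "ORB keypoints: " << keypoints.size()
                  << ", bytes per descriptor: " << descriptors.cols << std::endl;
        return 0;
    }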


• Line segments are typically extracted by the Hough transform [1,2]
• They are useful in many ways:
› Extracting planes in the image
› Determining where lines in the image intersect
  › Line segment intersections are good features to match to the database
  › They can be used to determine the cross ratio of 4 collinear points
› Determining the vanishing point in the image, which helps determine camera pose
› They can typically still be found in environments where LITFs are sparse
[1] Ballard, Dana H. "Generalizing the Hough transform to detect arbitrary shapes." Pattern Recognition 13.2 (1981): 111-122.
[2] Hough, Paul V. C. "Method and means for recognizing complex patterns." U.S. Patent No. 3,069,654. 18 Dec. 1962.
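A minimal sketch of line segment extraction with OpenCV's probabilistic Hough transform; the Canny thresholds and Hough parameters are illustrative, not tuned values:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // Placeholder path; hallway images give strong straight edges.
        cv::Mat img = cv::imread("hallway.jpg", cv::IMREAD_GRAYSCALE);
        if (img.empty()) return 1;

        // Edge map first, then the probabilistic Hough transform for segments.
        cv::Mat edges;
        cv::Canny(img, edges, 50, 150);

        std::vector<cv::Vec4i> segments;   // each entry: x1, y1, x2, y2
        cv::HoughLinesP(edges, segments, 1, CV_PI / 180,
                        80 /*votes*/, 30 /*min segment length*/, 10 /*max gap*/);

        std::cout << "Line segments found: " << segments.size() << std::endl;
        return 0;
    }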

• Feature matching is an important step in image matching
• It determines the best-matching features between a query image and a database of images
• Two types of feature matching were explored:
› Brute Force matching
› Locality Sensitive Hashing (LSH) [1]
[1] Datar, M., N. Immorlica, P. Indyk, and V. S. Mirrokni. "Locality-sensitive hashing scheme based on p-stable distributions." Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253-262, 2004.
• Brute Force matching is the most accurate form of feature matching
• It exhaustively searches the database to find the best matches for the query key points
• It has a major flaw: it becomes prohibitively expensive as the database grows
• It is only suitable for small databases (500-10,000 feature points)
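A minimal sketch of brute-force matching with OpenCV, assuming binary (e.g. ORB) descriptors so that Hamming distance is the right metric; the cross-check option is an illustrative filtering choice, not necessarily what the prototype does:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Exhaustively match every query descriptor against every db descriptor.
    std::vector<cv::DMatch> bruteForceMatch(const cv::Mat& queryDesc,
                                            const cv::Mat& dbDesc) {
        // Hamming distance for binary descriptors; cross-check keeps only
        // matches that are mutually best in both directions.
        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
        std::vector<cv::DMatch> matches;
        matcher.match(queryDesc, dbDesc, matches);
        return matches;
    }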

• LSH is a k-Nearest-Neighbor (kNN) approximation
• It hashes key points in a way that preserves locality [1]
• Since locality is preserved, the distance between hashes approximates the distance between key points [1]
• It is very efficient for matching images against a large database
• However, it is only better than Brute Force matching when the database becomes large

[1] Datar, M., N. Immorlica, P. Indyk, and V. S. Mirrokni. "Locality-sensitive hashing scheme based on p-stable distributions." Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253-262, 2004.
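A minimal sketch of approximate kNN matching with OpenCV's FLANN LSH index, again assuming binary descriptors; the table, key-size, and multi-probe parameters are common defaults, not tuned values:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Return the k nearest db descriptors for each query descriptor using LSH.
    std::vector<std::vector<cv::DMatch>> lshKnnMatch(const cv::Mat& queryDesc,
                                                     const cv::Mat& dbDesc,
                                                     int k = 2) {
        cv::FlannBasedMatcher matcher(
            cv::makePtr<cv::flann::LshIndexParams>(12 /*tables*/,
                                                   20 /*key size (bits)*/,
                                                   2  /*multi-probe level*/));
        std::vector<std::vector<cv::DMatch>> knnMatches;
        matcher.knnMatch(queryDesc, dbDesc, knnMatches, k);
        return knnMatches;   // k candidates per query key point
    }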


• Given a set of feature matches to a database, the best image match needs to be found
• The problem is that there may be feature matches to multiple images in the database
› Feature matchers have the potential to incorrectly match feature points
• The correct match needs to be determined:
› One approach is a visual Bag of Words, a weak matching constraint
› Another is fitting the matched features to a model, a stronger matching constraint


• Fitting matched features to a model is an effective and reliable way to determine image similarity
• There are two common models that are used:
› Fundamental matrix
› Homography
• Models can also be customized to the task at hand
› Hile et al. [1] use the building's floor plan as their model and match the floor plane in the image against it
[1] Hile, Harlan, and Gaetano Borriello. "Positioning and orientation in indoor environments using camera phones." Computer Graphics and Applications, IEEE 28.4 (2008): 32-39.
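A minimal sketch of the fundamental-matrix model fit using OpenCV's RANSAC estimator; the reprojection tolerance and confidence are illustrative values, and the point lists are assumed to come from one of the feature matchers above:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Fit a fundamental matrix to corresponding points and count RANSAC inliers.
    int countFundamentalInliers(const std::vector<cv::Point2f>& queryPts,
                                const std::vector<cv::Point2f>& dbPts) {
        if (queryPts.size() < 8 || queryPts.size() != dbPts.size())
            return 0;                                  // need >= 8 correspondences

        std::vector<uchar> inlierMask;
        cv::Mat F = cv::findFundamentalMat(queryPts, dbPts, cv::FM_RANSAC,
                                           3.0 /*px tolerance*/, 0.99 /*confidence*/,
                                           inlierMask);
        if (F.empty()) return 0;

        return cv::countNonZero(inlierMask);           // inliers that fit the model
    }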
BF matching
• The query image is matched to the database
• Finding the correct image:
› Images with fewer than k matches are filtered out
› The fundamental matrix between the query image and each remaining db image is found
› The db image with the largest number of inliers i is selected
› If i ≥ t, it is determined to be the correct image (see the selection sketch below)
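A sketch of the selection step just described, reusing the countFundamentalInliers() helper from the earlier model-fitting sketch; the per-image map of point correspondences and the k and t thresholds are placeholders, not the prototype's actual data structures:

    #include <opencv2/opencv.hpp>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Defined in the earlier model-fitting sketch.
    int countFundamentalInliers(const std::vector<cv::Point2f>& queryPts,
                                const std::vector<cv::Point2f>& dbPts);

    // Pick the db image with the most RANSAC inliers, subject to the k and t filters.
    std::string selectBestImage(
            const std::map<std::string,
                           std::pair<std::vector<cv::Point2f>,
                                     std::vector<cv::Point2f>>>& matchesPerImage,
            std::size_t k, int t) {
        std::string best;
        int bestInliers = 0;
        for (const auto& entry : matchesPerImage) {
            const auto& pts = entry.second;
            if (pts.first.size() < k) continue;        // too few matches: filter out
            int inliers = countFundamentalInliers(pts.first, pts.second);
            if (inliers > bestInliers) {
                bestInliers = inliers;
                best = entry.first;
            }
        }
        return (bestInliers >= t) ? best : std::string();   // "" means no match
    }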
LSH matching
• The query image is matched to the database
• Finding the correct image:
› First the images are filtered: if a db image has more than one key point matching a key point in the query image, the closest key point has to be 60% closer than the second-closest point (see the ratio-test sketch below)
› All images with fewer than k matches are discarded
› A fundamental matrix is fitted between the query image and each remaining db image
› The image with the largest number of inliers i, with i ≥ t, is determined to be the best match
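A sketch of that filtering step, interpreting the 60% rule as a Lowe-style ratio test over the k = 2 nearest neighbours returned by the LSH matcher; the 0.6 threshold is my reading of the slide, not a value confirmed by the prototype:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Keep a match only when its distance is at most `ratio` times the distance
    // to the second-best candidate for the same query key point.
    std::vector<cv::DMatch> ratioFilter(
            const std::vector<std::vector<cv::DMatch>>& knnMatches,
            float ratio = 0.6f) {
        std::vector<cv::DMatch> good;
        for (const auto& candidates : knnMatches) {
            if (candidates.size() >= 2 &&
                candidates[0].distance < ratio * candidates[1].distance) {
                good.push_back(candidates[0]);
            }
        }
        return good;   // these matches feed the k-match filter and the F-matrix fit
    }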
• The program was trained with 67 images of the third floor of Brown Building
• Images of 4 doorways were taken from 5-6 perspectives, with 2-3 steps between the perspectives
• Images traveling down the long north-to-south hallway, starting at each end, were taken with 2-3 steps between each image



• So far the prototype has only been written in C++ and has only been run on a laptop, but it should be able to run on a mobile platform
• It currently does not report false positives, but it does report false negatives when the texture becomes sparse

• A notable result occurred when the 51 images of the long north-south hallway were added
› The initial matching phase's accuracy improved noticeably
› This was unexpected because the BF and LSH matching algorithms do not appear to use machine learning
› A possible reason is that the matchers have more data to use, making them more accurate
• Adapt this work to my thesis
• Implement the mobile version of the matching process
• Explore line segment features to see if they would work well in conjunction with LITF descriptors
