Indexing Techniques
Mei-Chen Yeh
Last week
• Matching two sets of features
– Strategy 1:
• Convert to a fixed-length feature vector (Bag-of-words)
• Use a conventional proximity measure
– Strategy 2:
• Build point correspondences
Last week: bag-of-words
[Figure: frequency histogram over the codewords of a visual vocabulary]
Matching local features: building patch correspondences
[Figure: a patch in Image 1 and its unknown (?) match in Image 2]
To generate candidate matches, find patches that
have the most similar appearance (e.g., lowest SSD)
Slide credits: Prof. Kristen Grauman
Matching local features: building patch correspondences
[Figure: a patch in Image 1 and its unknown (?) match in Image 2]
Simplest approach: compare them all, take the closest
(or closest k, or within a thresholded distance)
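A minimal sketch of the "compare them all" strategy, assuming each patch has already been flattened into an equal-length list of pixel values and that SSD is the appearance distance. The names `ssd` and `match_patch` are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

Patch = Sequence[float]  # assumed: a patch flattened into a vector of pixel values


def ssd(a: Patch, b: Patch) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_patch(query: Patch, candidates: List[Patch], k: int = 1,
                max_dist: Optional[float] = None) -> List[Tuple[int, float]]:
    """Compare the query patch with every candidate (brute force).

    Returns up to k (index, SSD) pairs sorted by increasing SSD; if max_dist
    is given, candidates whose SSD exceeds the threshold are dropped first.
    """
    scored = sorted(((i, ssd(query, c)) for i, c in enumerate(candidates)),
                    key=lambda pair: pair[1])
    if max_dist is not None:
        scored = [pair for pair in scored if pair[1] <= max_dist]
    return scored[:k]


if __name__ == "__main__":
    image2_patches = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 0, 0, 1]]
    print(match_patch([1, 1, 0, 0], image2_patches, k=2))  # [(0, 2), (2, 2)]
```

Every query costs one distance evaluation per candidate, which is exactly the cost the indexing techniques below try to avoid.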
Slide credits: Prof. Kristen Grauman
Indexing local features
• Each patch / region has a descriptor, which is a point
in some high-dimensional feature space (e.g., SIFT)
[Figure: descriptors of database images plotted as points in the descriptor’s feature space]
Indexing local features
• When we see close points in feature space, we have
similar descriptors, which indicates similar local
content.
[Figure: descriptors of database images and of a query image in the descriptor’s feature space]
Problem statement
• With potentially thousands of features per
image, and hundreds to millions of images to
search, how can we efficiently find those that are
relevant to a new image?
[Figure: 50 thousand images (4m) vs. 110 million images?]
Slide credit: Nistér and Stewénius
Scalability matters!
The Nearest-Neighbor Search Problem
• Given
– A set S of n points in d dimensions
– A query point q
• Which point in S is closest to q?
[Figure: a query point q among the points of S]
Time complexity of linear scan: $O(dn)$
The Nearest-Neighbor Search Problem
• r-nearest neighbor
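– for any query q, if some point of S lies within distance r of q, returns a point p ∈ S
s.t. $\|p - q\| \le r$
• c-approximate r-nearest neighbor ($c > 1$)
– for any query q, if some point of S lies within distance r of q, returns a point p’ ∈ S
s.t. $\|p' - q\| \le cr$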
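To make the two definitions concrete, here is a small sketch assuming Euclidean distance and points stored as plain Python sequences. `r_near_neighbor` answers the exact query with the $O(dn)$ linear scan, and `is_valid_c_approx_answer` checks whether a returned index meets the c-approximate guarantee; both names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance ||p - q||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def r_near_neighbor(S: List[Point], q: Point, r: float) -> Optional[int]:
    """Linear scan, O(dn): index of some p in S with ||p - q|| <= r, else None."""
    for i, p in enumerate(S):
        if dist(p, q) <= r:
            return i
    return None


def is_valid_c_approx_answer(S: List[Point], q: Point, answer: Optional[int],
                             r: float, c: float) -> bool:
    """Does `answer` satisfy the c-approximate r-near-neighbor guarantee?"""
    if r_near_neighbor(S, q, r) is None:
        return True  # no point within r: the guarantee asks for nothing
    return answer is not None and dist(S[answer], q) <= c * r


if __name__ == "__main__":
    S = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
    q = (3.0, 3.0)
    print(r_near_neighbor(S, q, 1.0))                       # 1
    print(is_valid_c_approx_answer(S, q, 0, r=1.0, c=5.0))  # True: 4.24 <= 5
```

The relaxation is what makes fast methods possible: an index may return any point within $cr$ instead of having to certify one within $r$.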
Today
• Indexing local features
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing
Indexing local features: inverted file
• For text documents, an efficient way to find all
pages on which a word occurs is to use an
index.
• We want to find all images in which a feature
occurs.
– page ~ image
– word ~ feature
• To use this idea, we’ll need to map our
features to “visual words”.
Text retrieval vs. image search
• What makes the problems similar, different?
Visual words
e.g., SIFT descriptor space: each
point is 128-dimensional
• Extract some local features from a number of images …
Slide credit: D. Nister, CVPR 2006
Visual words
Each point is a local descriptor, e.g., a SIFT vector.
Example: quantize into 3 words
Visual words
• Map high-dimensional descriptors to tokens/words by
quantizing the feature space
• Quantize via clustering; let the cluster centers be the prototype “words”
[Figure: clusters in the descriptor’s feature space, one of them labeled Word #2]
• Determine which word to assign to each new image region by finding the closest cluster center.
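The quantize-then-assign procedure can be sketched with plain k-means, assuming descriptors are lists of floats. The farthest-first initialization, the fixed iteration count, and the helper names (`build_vocabulary`, `nearest_word`) are illustrative assumptions, not part of the slides; real vocabularies are trained on far more descriptors, usually with an optimized library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(x: Vector, centers: List[List[float]]) -> int:
    """Visual word of x: the index of the closest cluster center."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def build_vocabulary(descriptors: List[Vector], k: int,
                     iters: int = 20) -> List[List[float]]:
    """k-means clustering; the k cluster centers are the prototype words."""
    # Farthest-first initialization (an assumption): start from the first
    # descriptor, then repeatedly add the descriptor farthest from all centers.
    centers = [list(descriptors[0])]
    while len(centers) < k:
        far = max(descriptors, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in descriptors:
            clusters[nearest_word(x, centers)].append(x)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


if __name__ == "__main__":
    descs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
             (0, 20), (1, 20), (0, 21)]
    vocab = build_vocabulary(descs, k=3)
    # A new region is assigned to the word whose center is closest.
    print(nearest_word((0.5, 0.5), vocab), nearest_word((10.5, 10.5), vocab))
```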
Visual words
• Each group of patches belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003
Visual vocabulary formation
Issues:
• Sampling strategy: where to extract features? Fixed
locations or interest points?
• Clustering / quantization algorithm
• What corpus provides features (universal
vocabulary?)
• Vocabulary size, number of words
• Weight of each word?
Inverted file index
Why does the index give us
a significant gain in
efficiency?
The index maps each word to the ids of the images that contain it
Inverted file index
A query image is matched to database images that share visual words.
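A minimal sketch of the word-to-image index and of matching by shared words, assuming each image has already been reduced to a list of visual-word ids. Ranking by raw vote counts is a simplification (tf-idf weighting is the usual refinement), and `build_inverted_file` and `query` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_inverted_file(images: Dict[str, Iterable[int]]) -> Dict[int, Set[str]]:
    """Map each visual-word id to the set of ids of the images containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for image_id, words in images.items():
        for w in words:
            index[w].add(image_id)
    return index


def query(index: Dict[int, Set[str]], query_words: Iterable[int]) -> List[Tuple[str, int]]:
    """Rank database images by the number of distinct words shared with the query.

    Only the posting lists of the query's own words are visited, so images
    that share no word with the query are never touched.
    """
    votes: Counter = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()


if __name__ == "__main__":
    db = {"img1": [1, 2, 3], "img2": [3, 4], "img3": [5]}
    inv = build_inverted_file(db)
    print(query(inv, [3, 4, 4, 6]))  # [('img2', 2), ('img1', 1)]
```

This is where the efficiency gain comes from: the work per query grows with the lengths of the visited posting lists, not with the size of the database.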
tf-idf weighting
• Term frequency – inverse document frequency
• Describe the frequency of each word within an
image, and decrease the weights of words that
appear often in the database
– Discriminative regions (like the words economic, trade, …): weight w↗
– Common regions (like the words the, most, we, …): weight w↘
tf-idf weighting
$t_{id} = \dfrac{n_{id}}{n_d} \log \dfrac{N}{n_i}$
– $n_{id}$: number of occurrences of word i in document d
– $n_d$: number of words in document d
– $N$: total number of documents in the database
– $n_i$: number of documents word i occurs in, in the whole database
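The weighting can be computed directly from word counts. This sketch assumes each document (image) is a list of visual-word ids and uses the natural logarithm, since the base only rescales all weights; `tfidf` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[int]]) -> List[Dict[int, float]]:
    """t_id = (n_id / n_d) * log(N / n_i) for every word i in every document d."""
    N = len(docs)                        # total number of documents
    n_i: Counter = Counter()             # number of documents containing word i
    for words in docs:
        n_i.update(set(words))
    weights = []
    for words in docs:
        n_d = len(words)                 # number of words in document d
        counts = Counter(words)          # n_id for each word i in d
        weights.append({i: (n_id / n_d) * math.log(N / n_i[i])
                        for i, n_id in counts.items()})
    return weights


if __name__ == "__main__":
    # Word 0 appears in every image, so its weight drops to zero.
    print(tfidf([[0, 0, 1], [0, 2]]))
```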
Bag-of-Words + Inverted file
Bag-of-words representation
[Figure: local descriptors from training images are quantized in feature space into a vocabulary of visual words (Visual-word1, Visual-word2, Visual-word3, …); each image becomes a frequency histogram over these words; an inverted file lists, for each visual word VW1 … VWk, the images that contain it (Image i, Image j, Image k, …), and the lists are combined into a matching score. K: number of words in vocabulary.]
http://people.cs.ubc.ca/~lowe/keypoints/
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
Slide credit: Xin Yang
D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Visualize as a tree
Vocabulary Tree
(Perceptual and Sensory Augmented Computing · Visual Object Recognition Tutorial)
• Training: Filling the tree
[Nister & Stewenius, CVPR’06]
Slide credit: David Nister
Vocabulary Tree
• Recognition
[Figure: query descriptors pushed down the tree; matching database images retrieved]
– Or perform geometric verification
[Nister & Stewenius, CVPR’06]
Slide credit: David Nister
Think about the computational advantage of the
hierarchical tree vs. a flat vocabulary!
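One way to see the computational advantage is a small hierarchical k-means sketch in the spirit of the vocabulary tree. With branch factor b and depth L the tree has up to $b^L$ leaves (words), yet quantizing a descriptor compares it with only b children per level, about $b \cdot L$ distances instead of $b^L$ for a flat vocabulary of the same size. The farthest-first initialization, the stopping rule, and all names (`build_tree`, `quantize`, `Node`) are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Vector], centers: List[List[float]]) -> List[List[Vector]]:
    """Group points by their closest center."""
    groups: List[List[Vector]] = [[] for _ in centers]
    for x in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
        groups[j].append(x)
    return groups


def kmeans(points: List[Vector], k: int, iters: int = 10):
    """Farthest-first init (an assumption) followed by Lloyd iterations."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, assign(points, centers)


class Node:
    def __init__(self, center: Optional[List[float]] = None):
        self.center = center
        self.children: List["Node"] = []
        self.word_id: Optional[int] = None  # set on leaves only


def build_tree(points: List[Vector], branch: int, depth: int) -> Node:
    """Training: hierarchical k-means, splitting each node into `branch` clusters."""
    next_id = [0]

    def grow(node: Node, pts: List[Vector], level: int) -> None:
        if level == depth or len(pts) < branch:
            node.word_id = next_id[0]  # a leaf is one visual word
            next_id[0] += 1
            return
        centers, groups = kmeans(pts, branch)
        for c, g in zip(centers, groups):
            if g:  # skip empty clusters
                child = Node(c)
                node.children.append(child)
                grow(child, g, level + 1)

    root = Node()
    grow(root, points, 0)
    return root


def quantize(root: Node, x: Vector) -> Tuple[int, int]:
    """Descend greedily; return (word id, number of distance computations)."""
    node, comparisons = root, 0
    while node.children:
        comparisons += len(node.children)
        node = min(node.children, key=lambda ch: sq_dist(x, ch.center))
    return node.word_id, comparisons


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (100, 0), (100, 1), (110, 0), (110, 1)]
    tree = build_tree(pts, branch=2, depth=2)
    print(quantize(tree, (0.2, 0.3)))  # 2 levels x 2 children = 4 comparisons
```

The greedy descent is also approximate: a descriptor near a cluster boundary at an upper level can end up in a different leaf than the globally closest word.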
Hashing
Direct addressing
• Create a direct-address table with m slots
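[Figure: direct-address table T with slots 0–9; each actual key in K ⊆ U (universe of keys) occupies the slot indexed by the key itself, which stores the key and its satellite data; the remaining slots are empty]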
Direct addressing
• Search operation: O(1)
• Problem: The range of keys can be large!
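– 64-bit numbers: $2^{64}$ = 18,446,744,073,709,551,616 different keys
– SIFT: 128 × 8 bits = 1024 bits, i.e., $2^{1024}$ possible keys, while the set K of actual keys is tiny compared with the universe U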
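A sketch of direct addressing, assuming integer keys in `0..m-1`; the class name `DirectAddressTable` is hypothetical. Search is a single array access, but the table needs one slot per possible key, which is what rules it out for 64-bit or SIFT-sized keys.

```python
from typing import Any, List, Optional


class DirectAddressTable:
    """Direct addressing: slot k holds the satellite data for key k (0 <= k < m)."""

    def __init__(self, m: int):
        self.slots: List[Optional[Any]] = [None] * m  # one slot per possible key

    def insert(self, key: int, data: Any) -> None:
        self.slots[key] = data

    def search(self, key: int) -> Optional[Any]:
        return self.slots[key]  # O(1): a single array access

    def delete(self, key: int) -> None:
        self.slots[key] = None


if __name__ == "__main__":
    T = DirectAddressTable(10)
    T.insert(3, "satellite data for key 3")
    print(T.search(3), T.search(5))
    # The table needs one slot per possible key, which is hopeless for
    # 64-bit keys or for 1024-bit (128 x 8-bit) SIFT descriptors:
    print(2 ** 64)              # 18446744073709551616
    print(len(str(2 ** 1024)))  # 309 decimal digits
```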
Hashing
• O(1) average-case time
• Use a hash function h to compute the slot
from the key k
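[Figure: hash table T with slots 0 … m−1; keys k1, k3, k4, k5 from the actual keys K ⊆ U (universe of keys) are stored in slots h(k1), h(k4), and h(k3) = h(k5)]
– The slot h(k1) may not be k1 anymore!
– Different keys may share a bucket: h(k5) = h(k3)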
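Keys that share a bucket must be resolved somehow. A common textbook choice is chaining, sketched here for integer keys with the division method $h(k) = k \bmod m$; both choices are assumptions, since the slides fix neither a hash function nor a collision strategy, and `ChainedHashTable` is a hypothetical name.

```python
from typing import Any, List, Optional, Tuple


class ChainedHashTable:
    """Hash table with m slots; colliding keys share a bucket (a list)."""

    def __init__(self, m: int = 8):
        self.m = m
        self.buckets: List[List[Tuple[int, Any]]] = [[] for _ in range(m)]

    def h(self, key: int) -> int:
        """Division method (assumed): h(k) = k mod m."""
        return key % self.m

    def insert(self, key: int, value: Any) -> None:
        bucket = self.buckets[self.h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key: int) -> Optional[Any]:
        # Expected O(1 + n/m) under simple uniform hashing.
        for k, v in self.buckets[self.h(key)]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    T = ChainedHashTable(m=8)
    T.insert(3, "k3")
    T.insert(11, "k5")              # 11 mod 8 == 3 mod 8: same bucket
    print(T.h(3), T.h(11), T.search(11))
```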
Hashing
• A good hash function
– Satisfies the assumption of simple uniform
hashing: each key is equally likely to hash to any
of the m slots.
• How to design a hash function for indexing
high-dimensional data?
[Figure: a 128-d descriptor mapped by an unknown (?) hash function into hash table T]
Locality-sensitive hashing
• Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.
Locality-sensitive hashing (LSH)
• A family of hash functions is locality-sensitive if, for any
pair of points p, q, we have:
– Pr[h(p)=h(q)] is “high” if p is close to q
– Pr[h(p)=h(q)] is “low” if p is far from q
$\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x, y)$
Locality Sensitive Hashing
• A family H of functions $h: \mathbb{R}^d \to U$ is called $(r, cr, P_1, P_2)$-sensitive (with $c > 1$ and $P_1 > P_2$) if for any p, q:
– if $\|p - q\| \le r$ then $\Pr[h(p) = h(q)] \ge P_1$
– if $\|p - q\| \ge cr$ then $\Pr[h(p) = h(q)] \le P_2$
LSH Function: Hamming Space
• Consider binary vectors
– points from $\{0, 1\}^d$
– Hamming distance D(p, q) = # positions on which p
and q differ
Example: (d = 3)
D(100, 011) = 3
D(010, 111) = 2
LSH Function: Hamming Space
• Define hash function h as $h_i(p) = p_i$, where $p_i$ is
the i-th bit of p and i is chosen uniformly at random
Example: select the 1st dimension
h(010) = 0
h(111) = 1
$\Pr[h(010) \ne h(111)] = 2/3 = D(p, q)/d$
$\Pr[h(p) = h(q)] = 1 - D(p, q)/d$
Clearly, h is locality sensitive: the smaller D(p, q), the more likely p and q collide.
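The collision probability of bit sampling can be checked exhaustively by averaging over all d choices of i. This sketch assumes bit vectors are written as strings such as `"010"`; `hamming`, `h`, and `collision_probability` are hypothetical helper names, and indices are 0-based.

```python
def hamming(p: str, q: str) -> int:
    """Number of positions on which bit strings p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def h(i: int, p: str) -> str:
    """Bit-sampling hash h_i(p) = p_i (0-based index here)."""
    return p[i]


def collision_probability(p: str, q: str) -> float:
    """Pr[h_i(p) == h_i(q)] when i is drawn uniformly from the d positions."""
    d = len(p)
    return sum(h(i, p) == h(i, q) for i in range(d)) / d


if __name__ == "__main__":
    print(hamming("100", "011"), hamming("010", "111"))  # 3 2
    print(h(0, "010"), h(0, "111"))                      # 0 1
    # Both values equal 1/3 up to floating-point rounding.
    print(collision_probability("010", "111"), 1 - hamming("010", "111") / 3)
```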
LSH Function: Hamming Space
• A k-bit locality-sensitive hash function is
defined as $g(p) = [h_1(p), h_2(p), \ldots, h_k(p)]^T$
– Each hi(p) is chosen randomly
– Each hi(p) results in a single bit
Pr(similar points collide) ≥ $P_1^k$
Pr(dissimilar points collide) ≤ $P_2^k$
(the k functions $h_i$ are chosen independently)
Indyk and Motwani [1998]
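A short sketch of the k-bit function and of the arithmetic behind it, assuming bit strings and positions sampled with replacement; `make_g` and `collision_bounds` are hypothetical names, and the values P1 = 0.9, P2 = 0.5, k = 10 are illustrative only. Raising both probabilities to the k-th power shrinks the dissimilar-collision bound much faster than the similar one.

```python
import random
from typing import Callable, Tuple


def make_g(d: int, k: int, rng: random.Random) -> Callable[[str], Tuple[str, ...]]:
    """k-bit LSH function g(p) = [h_1(p), ..., h_k(p)] for bit strings of length d.

    Each h_i samples one randomly chosen bit position (with replacement).
    """
    positions = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in positions)


def collision_bounds(P1: float, P2: float, k: int) -> Tuple[float, float]:
    """(lower bound for similar, upper bound for dissimilar) = (P1**k, P2**k)."""
    return P1 ** k, P2 ** k


if __name__ == "__main__":
    g = make_g(d=8, k=3, rng=random.Random(0))
    print(g("10110010"))
    print(collision_bounds(0.9, 0.5, 10))  # roughly (0.349, 0.000977)
```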
LSH Function: $\mathbb{R}^2$ space
• Consider 2-d vectors
LSH Function: $\mathbb{R}^2$ space
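• The probability that a random hyperplane through the origin separates
two unit vectors u and v depends on the angle θ(u, v) between
them: $\Pr[\text{the hyperplane separates } u \text{ and } v] = \theta(u, v)/\pi$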
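The relation can be checked by Monte Carlo, assuming hyperplanes through the origin whose normals are drawn from a standard Gaussian (which makes their direction uniform). The names `hyperplane_bit`, `angle`, and `estimate_separation` are hypothetical; the vectors need not be unit length, since scaling does not change which side of a hyperplane they lie on.

```python
import math
import random
from typing import Sequence


def hyperplane_bit(v: Sequence[float], a: Sequence[float]) -> int:
    """Side of the hyperplane through the origin with normal a (1 or 0)."""
    return 1 if sum(x * y for x, y in zip(v, a)) >= 0 else 0


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between u and v in radians."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def estimate_separation(u, v, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[a random hyperplane separates u and v]."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(len(u))]  # random normal direction
        differ += hyperplane_bit(u, a) != hyperplane_bit(v, a)
    return differ / trials


if __name__ == "__main__":
    u, v = (1.0, 0.0), (1.0, 1.0)
    print(estimate_separation(u, v), angle(u, v) / math.pi)  # both close to 0.25
```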
LSH Pre-processing
• Each image is entered into L hash tables
indexed by independently constructed $g_1$,
$g_2$, …, $g_L$
• Preprocessing space: $O(LN)$ for N indexed items
LSH Querying
• For each hash table, return the bin indexed by
$g_i(q)$, $1 \le i \le L$.
• Perform a linear search on the union of the bins.
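Pre-processing and querying together can be sketched for binary codes with bit-sampling functions. The parameters (d = 8, k = 3, L = 20), the string representation, and the class name `HammingLSH` are assumptions for illustration. Each item id is stored once per table, which gives the $O(LN)$ space, and a query scans only the union of its L bins.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


class HammingLSH:
    """L hash tables, each keyed by an independently drawn k-bit function g_t."""

    def __init__(self, d: int, k: int, L: int, seed: int = 0):
        rng = random.Random(seed)
        # g_t samples k random bit positions (bit-sampling LSH).
        self.positions = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables: List[DefaultDict[Tuple[str, ...], List[int]]] = [
            defaultdict(list) for _ in range(L)]
        self.points: List[str] = []

    def _g(self, t: int, p: str) -> Tuple[str, ...]:
        return tuple(p[i] for i in self.positions[t])

    def insert(self, p: str) -> int:
        """Pre-processing: store the item's id in all L tables."""
        pid = len(self.points)
        self.points.append(p)
        for t, table in enumerate(self.tables):
            table[self._g(t, p)].append(pid)
        return pid

    def query(self, q: str) -> Optional[int]:
        """Union the L bins g_t(q), then linearly scan only those candidates."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._g(t, q), []))
        if not candidates:
            return None
        return min(candidates, key=lambda pid: hamming(self.points[pid], q))


if __name__ == "__main__":
    index = HammingLSH(d=8, k=3, L=20)
    for code in ["00000000", "11111111", "00001111"]:
        index.insert(code)
    print(index.query("00000001"))
```

Raising k makes each bin more selective; raising L increases the chance that a true near neighbor shares at least one bin with the query.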
W.-T. Lee and H.-T. Chen. Probing the local feature space of interest points, ICIP 2010.
Hash family
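$h_{a,b}(v) = \left\lfloor \dfrac{a \cdot v + b}{r} \right\rfloor$
The dot product $a \cdot v$ projects each vector v onto “a
line”, which is cut into segments of width r
a : random vector sampled from a Gaussian distribution
b : real value chosen uniformly from the range [0, r]
r : segment width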
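A sketch of this hash family and of its locality sensitivity, estimated by drawing many independent $(a, b)$ pairs. The helper names `make_hash` and `collision_rate`, the dimension, and the test distances are assumptions for illustration.

```python
import math
import random
from typing import Callable, Sequence


def make_hash(d: int, r: float, rng: random.Random) -> Callable[[Sequence[float]], int]:
    """h(v) = floor((a . v + b) / r) with Gaussian a and b ~ U[0, r]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, r)

    def h(v: Sequence[float]) -> int:
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / r)

    return h


def collision_rate(p, q, r: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of independently drawn hash functions under which h(p) == h(q)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(p), r, rng)
        hits += h(p) == h(q)
    return hits / trials


if __name__ == "__main__":
    origin = [0.0, 0.0, 0.0, 0.0]
    near = collision_rate(origin, [0.1, 0.0, 0.0, 0.0], r=1.0)
    far = collision_rate(origin, [10.0, 0.0, 0.0, 0.0], r=1.0)
    print(near, far)  # close points collide far more often than distant ones
```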
Building the hash table
Segment width = (max − min)/t
For each random projection,
we get t buckets.
Building the hash table
• Generate K projections
Combining them to get an
index in the hash table
How many buckets do
we get? $t^K$
Building the hash table
• Example
– 5 projections (K = 5)
– 15 segments (t = 15)
• $15^5$ = 759,375 buckets in total!
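The bucket construction can be sketched as follows, under stated assumptions: each projection's range [min, max] is taken over the indexed data, a value is assigned to one of t equal-width segments, and the K segment indices are combined as base-t digits into a single id in $[0, t^K)$. The base-t combination, the clamping of boundary values, and the names `segment_index`, `bucket_index`, and `build_table` are illustrative choices, not taken from the slides.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def segment_index(value: float, lo: float, hi: float, t: int) -> int:
    """Index (0..t-1) of the width-(hi-lo)/t segment of [lo, hi] containing value."""
    width = (hi - lo) / t
    if width == 0:
        return 0
    s = int((value - lo) // width)
    return min(max(s, 0), t - 1)  # clamp: value == hi (or outside) goes to an end


def bucket_index(projected: Sequence[float], lo: Sequence[float],
                 hi: Sequence[float], t: int) -> int:
    """Combine K segment indices as base-t digits into one id in [0, t**K)."""
    idx = 0
    for value, l, h in zip(projected, lo, hi):
        idx = idx * t + segment_index(value, l, h, t)
    return idx


def build_table(data: List[Sequence[float]], K: int, t: int,
                seed: int = 0) -> Dict[int, List[int]]:
    """Project every vector on K random Gaussian directions and bucket it."""
    rng = random.Random(seed)
    d = len(data[0])
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
    P = [[sum(a * x for a, x in zip(row, v)) for row in A] for v in data]
    lo = [min(col) for col in zip(*P)]  # per-projection min over the data
    hi = [max(col) for col in zip(*P)]  # per-projection max over the data
    table: Dict[int, List[int]] = defaultdict(list)
    for i, p in enumerate(P):
        table[bucket_index(p, lo, hi, t)].append(i)
    return table


if __name__ == "__main__":
    print(15 ** 5)  # 759375 buckets for K = 5 projections, t = 15 segments
    rng = random.Random(1)
    data = [[rng.random() for _ in range(8)] for _ in range(1000)]
    table = build_table(data, K=5, t=15)
    print(len(table), "non-empty buckets")
```

With 1000 points and 759,375 possible buckets, almost all buckets stay empty, which is one reason to look at how real patches are distributed over the buckets.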
Sketching the Feature Space
Natural image patches
(from the Berkeley Segmentation Database)
Noise image patches
(randomly generated noise patches)
Collect three sets of image patches of different sizes: 16×16, 32×32, 64×64.
Each set consists of 200,000 patches.
Patch distribution over buckets
Summary
• Indexing techniques are essential for
organizing a database and for enabling fast
matching.
• For indexing high-dimensional data
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing
Resources and extended readings
• LSH Matlab Toolbox
– http://www.cs.brown.edu/~gregory/download.html
• Yeh et al., “Adaptive Vocabulary Forests for
Dynamic Indexing and Category Learning,”
ICCV 2007.