Indexing Techniques
Mei-Chen Yeh

Last week
• Matching two sets of features
– Strategy 1: convert each set to a fixed-length feature vector (bag-of-words) and use a conventional proximity measure
– Strategy 2: build point correspondences
[Figure: bag-of-words representation, a histogram of codeword frequencies over the visual vocabulary]

Matching local features: building patch correspondences
• To generate candidate matches between Image 1 and Image 2, find patches that have the most similar appearance (e.g., lowest SSD).
• Simplest approach: compare them all and take the closest (or the closest k, or all matches within a thresholded distance).
Slide credits: Prof. Kristen Grauman

Indexing local features
• Each patch/region in a database image has a descriptor, which is a point in some high-dimensional feature space (e.g., 128-d for SIFT).
• When we see close points in feature space, we have similar descriptors, which indicates similar local content in the query and database images.

Problem statement
• With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image?
• 50 thousand images is already a sizable database; at 110 million images, scalability matters!
Slide credit: Nistér and Stewénius

The nearest-neighbor search problem
• Given a set S of n points in d dimensions and a query point q, which point in S is closest to q?
• Time complexity of a linear scan: O(dn).
• r-nearest neighbor: for any query q, return a point p ∈ S such that ‖p − q‖ ≤ r.
• c-approximate r-nearest neighbor: for any query q, return a point p′ ∈ S such that ‖p′ − q‖ ≤ cr.

Today
• Indexing local features
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing

Indexing local features: inverted file
• For text documents, an efficient way to find all pages on which a word occurs is to use an index.
• We want to find all images in which a feature occurs: page ~ image, word ~ feature.
• To use this idea, we'll need to map our features to "visual words".

Text retrieval vs. image search
• What makes the two problems similar, and what makes them different?

Visual words
• Extract local features from a number of images. Each point in the descriptor space is a local descriptor, e.g., a 128-dimensional SIFT vector.
• Example: quantize the space into 3 words.
Slide credit: D. Nister, CVPR 2006
• Map high-dimensional descriptors to tokens/words by quantizing the feature space: quantize via clustering, and let the cluster centers be the prototype "words".
• Determine which word to assign to each new image region by finding the closest cluster center.
• Each group of patches quantized to the same center belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003

Visual vocabulary formation
Issues:
• Sampling strategy: where to extract features? Fixed locations or interest points?
• Clustering / quantization algorithm
• What corpus provides the features (a universal vocabulary?)
• Vocabulary size, i.e., the number of words
• Weight of each word?

Inverted file index
• The index maps each visual word to the ids of the images in which it occurs.
• A query image is matched only against the database images that share visual words with it, which is why the index gives us a significant gain in efficiency. (A minimal sketch follows below.)
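To make this concrete, here is a minimal sketch of an inverted file over quantized images, in Python. The `quantized_images` input format and the simple shared-word voting score are illustrative assumptions, not part of the original slides:

```python
from collections import defaultdict

def build_inverted_file(quantized_images):
    """Map each visual-word id to the set of image ids containing it.
    quantized_images: dict {image_id: list of visual-word ids}."""
    inverted = defaultdict(set)
    for image_id, words in quantized_images.items():
        for w in words:
            inverted[w].add(image_id)
    return inverted

def query(inverted, query_words):
    """Score database images by how many visual words they share
    with the query; images sharing no word are never touched."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in inverted.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Toy usage: three database images with pre-quantized features.
db = {"img1": [3, 7, 7, 42], "img2": [7, 9], "img3": [1, 2]}
index = build_inverted_file(db)
print(query(index, [7, 42, 5]))  # [('img1', 2), ('img2', 1)]
```

In practice the votes are not counted uniformly; each word is weighted, e.g., by the tf-idf scheme introduced next.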
tf-idf weighting
• Term frequency – inverse document frequency.
• Describe the frequency of each word within an image, and decrease the weights of the words that appear often in the database:
– discriminative regions ("economic", "trade", …): weight w goes up;
– common regions ("the", "most", "we", …): weight w goes down.
• The weight of word i in document d combines both terms:
w(i, d) = (n(i, d) / n(d)) × log(N / N(i))
where n(i, d) is the number of occurrences of word i in document d, n(d) is the number of words in document d, N(i) is the number of documents word i occurs in across the whole database, and N is the total number of documents in the database.

Bag-of-Words + inverted file
[Figure: local descriptors extracted from training images (http://people.cs.ubc.ca/~lowe/keypoints/) are clustered into a vocabulary of K visual words; each image i, j, k, … becomes a histogram of visual-word frequencies (VW1 … VWK, where K is the number of words in the vocabulary), and an inverted file maps each visual word to the images containing it, from which matching scores are computed. See http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html]
Slide credit: Xin Yang

Vocabulary tree
D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
• The vocabulary is built by hierarchical k-means and can be visualized as a tree.
• Training: fill the tree by pushing every training descriptor down from the root, following the closest cluster center at each level. [Nister & Stewenius, CVPR'06]
• Recognition: quantize the query descriptors with the same tree, retrieve the highest-scoring database images, and optionally perform geometric verification on them. [Nister & Stewenius, CVPR'06]
Slide credit: David Nister
• Think about the computational advantage of the hierarchical tree vs. a flat vocabulary! (See the sketch below.)
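As a sketch of that advantage: quantizing a descriptor with a tree of branch factor b and depth L costs about b × L distance computations, while a flat vocabulary with the same b^L words costs b^L. The `Node` class and the tiny hand-built tree below are illustrative assumptions; the paper builds the tree with hierarchical k-means:

```python
import numpy as np

class Node:
    """One vocabulary-tree node: a cluster center plus children.
    Leaves carry a visual-word id."""
    def __init__(self, center, children=None, word_id=None):
        self.center = np.asarray(center, dtype=float)
        self.children = children or []
        self.word_id = word_id

def quantize(root, descriptor):
    """Descend the tree, following the closest child center at each
    level: roughly b * L distance computations instead of b ** L.
    With the paper's b = 10 and L = 6, that is about 60 comparisons
    per descriptor against a one-million-word vocabulary."""
    node = root
    while node.children:
        node = min(node.children,
                   key=lambda c: np.linalg.norm(descriptor - c.center))
    return node.word_id

# Toy 1-d tree with branch factor b = 2 and depth L = 2 (4 leaf words).
leaves = [Node([x], word_id=i) for i, x in enumerate([0.0, 1.0, 4.0, 5.0])]
root = Node([2.5], children=[Node([0.5], children=leaves[:2]),
                             Node([4.5], children=leaves[2:])])
print(quantize(root, np.array([4.2])))  # -> word 2
```

The same descent is used both when filling the tree with database descriptors during training and when quantizing query descriptors at recognition time.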
Hashing

Direct addressing
• Create a direct-address table T with m slots: each key k from the universe U of keys indexes slot k directly, and the slot stores the key together with its satellite data (only the actual keys K occupy slots).
• Search operation: O(1).
• Problem: the range of keys can be large!
– 64-bit numbers => 18,446,744,073,709,551,616 different keys
– SIFT: 128 * 8 bits

Hashing
• O(1) average-case search time.
• Use a hash function h to compute the slot from the key: key k1 is stored in slot h(k1), which may not be k1 anymore!
• Distinct keys may share a bucket (a collision), e.g., h(k5) = h(k3).
• A good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
• How do we design a hash function for indexing high-dimensional (e.g., 128-d) data?

Locality-sensitive hashing
• Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.

Locality-sensitive hashing (LSH)
• Hash functions are locality-sensitive if, for any pair of points p, q:
– Pr[h(p) = h(q)] is "high" if p is close to q;
– Pr[h(p) = h(q)] is "low" if p is far from q.
• Equivalently, for h drawn from the family F: Pr[h(x) = h(y)] = sim(x, y).
• A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
– if ‖p − q‖ ≤ r, then Pr[h(p) = h(q)] > P1;
– if ‖p − q‖ ≥ cr, then Pr[h(p) = h(q)] < P2.

LSH function: Hamming space
• Consider binary vectors, i.e., points from {0, 1}^d, under the Hamming distance D(p, q) = the number of positions on which p and q differ.
Example (d = 3): D(100, 011) = 3; D(010, 111) = 2.
• Define the hash function as h_i(p) = p_i, where p_i is the i-th bit of p and i is chosen at random.
Example: selecting the 1st dimension gives h(010) = 0 and h(111) = 1.
• Over the random choice of i, Pr[h(p) ≠ h(q)] = D(p, q)/d, so Pr[h(p) = h(q)] = 1 − D(p, q)/d; e.g., Pr[h(010) ≠ h(111)] = 2/3. Clearly, h is locality-sensitive.
• A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), …, hk(p)]^T, where each h_i(p) is chosen randomly and contributes a single bit. Using L such functions:
– Pr(similar points collide in at least one table) ≥ 1 − (1 − P1^k)^L;
– Pr(dissimilar points collide in a given table) ≤ P2^k.
Indyk and Motwani [1998]

LSH function: R^2 space
• Consider 2-d unit vectors.
• The probability that a random hyperplane separates two unit vectors depends on the angle between them: Pr[h(p) = h(q)] = 1 − θ(p, q)/π.

LSH pre-processing
• Each image is entered into L hash tables indexed by independently constructed functions g1, g2, …, gL.
• Preprocessing space: O(LN).

LSH querying
• For each hash table, return the bin indexed by g_i(q), 1 ≤ i ≤ L.
• Perform a linear search on the union of the bins.

W.-T. Lee and H.-T. Chen. Probing the local feature space of interest points, ICIP 2010.

Hash family
• The dot product a·v projects each vector v onto "a line", which is cut into segments of width r; the hash value is the segment index:
h(v) = ⌊(a·v + b) / r⌋
– a: a random vector sampled from a Gaussian distribution
– b: a real value chosen uniformly from the range [0, r]
– r: the segment width

Building the hash table
• Segment width: r = (max − min)/t, so each random projection gives t buckets.
• Generate K projections and combine them to get an index into the hash table. How many buckets do we get? t^K.
• Example: 5 projections (K = 5) with 15 segments each (t = 15) give 15^5 = 759,375 buckets in total! (A sketch follows at the end of these notes.)

Sketching the feature space
• Natural image patches (from the Berkeley segmentation database) vs. noise image patches (randomly generated).
• Collect three sets of patches of different sizes, 16x16, 32x32, and 64x64; each set consists of 200,000 patches.
[Figure: patch distribution over buckets]

Summary
• Indexing techniques are essential for organizing a database and for enabling fast matching.
• For indexing high-dimensional data:
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing

Resources and extended readings
• LSH Matlab Toolbox: http://www.cs.brown.edu/~gregory/download.html
• Yeh et al., "Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning," ICCV 2007.
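Finally, a minimal sketch of the dot-product hash family above, in Python. The parameter values, the single-table setup, and the dictionary-of-bins layout are illustrative assumptions (real LSH would use L independent tables and search the union of the retrieved bins):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_hash_fn(d, K, r):
    """One K-projection hash g(v): each row of A is a Gaussian random
    vector a, b is uniform in [0, r), and each projected value is cut
    into segments of width r; the K segment indices form the bucket key."""
    A = rng.normal(size=(K, d))
    b = rng.uniform(0.0, r, size=K)
    def g(v):
        return tuple(np.floor((A @ v + b) / r).astype(int))
    return g

def build_table(points, g):
    """Hash every database point into its bucket."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[g(p)].append(idx)
    return table

# Toy usage: 1000 random 128-d "descriptors" and one hash table.
X = rng.normal(size=(1000, 128))
g = make_hash_fn(d=128, K=5, r=4.0)
table = build_table(X, g)
q = X[0] + 0.01 * rng.normal(size=128)   # slightly perturbed query
candidates = table.get(g(q), [])         # linear-scan only this bin
print(len(candidates), 0 in candidates)  # point 0 collides w.h.p.
```

Only the retrieved bin is scanned linearly, which is exactly the querying procedure described above.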