Indexing Techniques
Mei-Chen Yeh

Last week
• Matching two sets of features
– Strategy 1: convert each set to a fixed-length feature vector (bag-of-words) and use a conventional proximity measure
– Strategy 2: build point correspondences
[Figure: bag-of-words representation, a histogram of codeword frequencies over the visual vocabulary]

Matching local features: building patch correspondences
• To generate candidate matches between Image 1 and Image 2, find patches that have the most similar appearance (e.g., lowest SSD).
• Simplest approach: compare them all and take the closest (or the closest k, or all matches within a thresholded distance).
Slide credits: Prof. Kristen Grauman

Indexing local features
• Each patch/region in a database image has a descriptor, which is a point in some high-dimensional feature space (e.g., 128-d for SIFT).
• When we see close points in feature space, we have similar descriptors, which indicates similar local content in the query and database images.

Problem statement
• With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image?
• 50 thousand images is already a sizable database; at 110 million images, scalability matters!
Slide credit: Nistér and Stewénius

The nearest-neighbor search problem
• Given a set S of n points in d dimensions and a query point q, which point in S is closest to q?
• Time complexity of a linear scan: O(dn).
• r-nearest neighbor: for any query q, return a point p ∈ S such that ‖p − q‖ ≤ r.
• c-approximate r-nearest neighbor: for any query q, return a point p′ ∈ S such that ‖p′ − q‖ ≤ cr.

Today
• Indexing local features
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing

Indexing local features: inverted file
• For text documents, an efficient way to find all pages on which a word occurs is to use an index.
• We want to find all images in which a feature occurs: page ~ image, word ~ feature.
• To use this idea, we'll need to map our features to "visual words".

Text retrieval vs. image search
• What makes the two problems similar, and what makes them different?

Visual words
• Extract local features from a number of images. Each point in the descriptor space is a local descriptor, e.g., a 128-dimensional SIFT vector.
• Example: quantize the space into 3 words.
Slide credit: D. Nister, CVPR 2006
• Map high-dimensional descriptors to tokens/words by quantizing the feature space: quantize via clustering, and let the cluster centers be the prototype "words".
• Determine which word to assign to each new image region by finding the closest cluster center.
• Each group of patches quantized to the same center belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003

Visual vocabulary formation
Issues:
• Sampling strategy: where to extract features? Fixed locations or interest points?
• Clustering / quantization algorithm
• What corpus provides the features (a universal vocabulary?)
• Vocabulary size, i.e., the number of words
• Weight of each word?

Inverted file index
• The index maps each visual word to the ids of the images in which it occurs.
• A query image is matched only against the database images that share visual words with it, which is why the index gives us a significant gain in efficiency. (A minimal sketch follows below.)
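To make this concrete, here is a minimal sketch of an inverted file over quantized images, in Python. The `quantized_images` input format and the simple shared-word voting score are illustrative assumptions, not part of the original slides:

```python
from collections import defaultdict

def build_inverted_file(quantized_images):
    """Map each visual-word id to the set of image ids containing it.
    quantized_images: dict {image_id: list of visual-word ids}."""
    inverted = defaultdict(set)
    for image_id, words in quantized_images.items():
        for w in words:
            inverted[w].add(image_id)
    return inverted

def query(inverted, query_words):
    """Score database images by how many visual words they share
    with the query; images sharing no word are never touched."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in inverted.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Toy usage: three database images with pre-quantized features.
db = {"img1": [3, 7, 7, 42], "img2": [7, 9], "img3": [1, 2]}
index = build_inverted_file(db)
print(query(index, [7, 42, 5]))  # [('img1', 2), ('img2', 1)]
```

In practice the votes are not counted uniformly; each word is weighted, e.g., by the tf-idf scheme introduced next.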
tf-idf weighting
• Term frequency – inverse document frequency.
• Describe the frequency of each word within an image, and decrease the weights of the words that appear often in the database:
– discriminative regions ("economic", "trade", …): weight w goes up;
– common regions ("the", "most", "we", …): weight w goes down.
• The weight of word i in document d combines both terms:
w(i, d) = (n(i, d) / n(d)) × log(N / N(i))
where n(i, d) is the number of occurrences of word i in document d, n(d) is the number of words in document d, N(i) is the number of documents word i occurs in across the whole database, and N is the total number of documents in the database.

Bag-of-Words + inverted file
[Figure: local descriptors extracted from training images (http://people.cs.ubc.ca/~lowe/keypoints/) are clustered into a vocabulary of K visual words; each image i, j, k, … becomes a histogram of visual-word frequencies (VW1 … VWK, where K is the number of words in the vocabulary), and an inverted file maps each visual word to the images containing it, from which matching scores are computed. See http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html]
Slide credit: Xin Yang

Vocabulary tree
D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
• The vocabulary is built by hierarchical k-means and can be visualized as a tree.
• Training: fill the tree by pushing every training descriptor down from the root, following the closest cluster center at each level. [Nister & Stewenius, CVPR'06]
• Recognition: quantize the query descriptors with the same tree, retrieve the highest-scoring database images, and optionally perform geometric verification on them. [Nister & Stewenius, CVPR'06]
Slide credit: David Nister
• Think about the computational advantage of the hierarchical tree vs. a flat vocabulary! (See the sketch below.)
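As a sketch of that advantage: quantizing a descriptor with a tree of branch factor b and depth L costs about b × L distance computations, while a flat vocabulary with the same b^L words costs b^L. The `Node` class and the tiny hand-built tree below are illustrative assumptions; the paper builds the tree with hierarchical k-means:

```python
import numpy as np

class Node:
    """One vocabulary-tree node: a cluster center plus children.
    Leaves carry a visual-word id."""
    def __init__(self, center, children=None, word_id=None):
        self.center = np.asarray(center, dtype=float)
        self.children = children or []
        self.word_id = word_id

def quantize(root, descriptor):
    """Descend the tree, following the closest child center at each
    level: roughly b * L distance computations instead of b ** L.
    With the paper's b = 10 and L = 6, that is about 60 comparisons
    per descriptor against a one-million-word vocabulary."""
    node = root
    while node.children:
        node = min(node.children,
                   key=lambda c: np.linalg.norm(descriptor - c.center))
    return node.word_id

# Toy 1-d tree with branch factor b = 2 and depth L = 2 (4 leaf words).
leaves = [Node([x], word_id=i) for i, x in enumerate([0.0, 1.0, 4.0, 5.0])]
root = Node([2.5], children=[Node([0.5], children=leaves[:2]),
                             Node([4.5], children=leaves[2:])])
print(quantize(root, np.array([4.2])))  # -> word 2
```

The same descent is used both when filling the tree with database descriptors during training and when quantizing query descriptors at recognition time.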
Hashing

Direct addressing
• Create a direct-address table T with m slots: each key k from the universe U of keys indexes slot k directly, and the slot stores the key together with its satellite data (only the actual keys K occupy slots).
• Search operation: O(1).
• Problem: the range of keys can be large!
– 64-bit numbers => 18,446,744,073,709,551,616 different keys
– SIFT: 128 * 8 bits

Hashing
• O(1) average-case search time.
• Use a hash function h to compute the slot from the key: key k1 is stored in slot h(k1), which may not be k1 anymore!
• Distinct keys may share a bucket (a collision), e.g., h(k5) = h(k3).
• A good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
• How do we design a hash function for indexing high-dimensional (e.g., 128-d) data?

Locality-sensitive hashing
• Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.

Locality-sensitive hashing (LSH)
• Hash functions are locality-sensitive if, for any pair of points p, q:
– Pr[h(p) = h(q)] is "high" if p is close to q;
– Pr[h(p) = h(q)] is "low" if p is far from q.
• Equivalently, for h drawn from the family F: Pr[h(x) = h(y)] = sim(x, y).
• A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
– if ‖p − q‖ ≤ r, then Pr[h(p) = h(q)] > P1;
– if ‖p − q‖ ≥ cr, then Pr[h(p) = h(q)] < P2.

LSH function: Hamming space
• Consider binary vectors, i.e., points from {0, 1}^d, under the Hamming distance D(p, q) = the number of positions on which p and q differ.
Example (d = 3): D(100, 011) = 3; D(010, 111) = 2.
• Define the hash function as h_i(p) = p_i, where p_i is the i-th bit of p and i is chosen at random.
Example: selecting the 1st dimension gives h(010) = 0 and h(111) = 1.
• Over the random choice of i, Pr[h(p) ≠ h(q)] = D(p, q)/d, so Pr[h(p) = h(q)] = 1 − D(p, q)/d; e.g., Pr[h(010) ≠ h(111)] = 2/3. Clearly, h is locality-sensitive.
• A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), …, hk(p)]^T, where each h_i(p) is chosen randomly and contributes a single bit. Using L such functions:
– Pr(similar points collide in at least one table) ≥ 1 − (1 − P1^k)^L;
– Pr(dissimilar points collide in a given table) ≤ P2^k.
Indyk and Motwani [1998]

LSH function: R^2 space
• Consider 2-d unit vectors.
• The probability that a random hyperplane separates two unit vectors depends on the angle between them: Pr[h(p) = h(q)] = 1 − θ(p, q)/π.

LSH pre-processing
• Each image is entered into L hash tables indexed by independently constructed functions g1, g2, …, gL.
• Preprocessing space: O(LN).

LSH querying
• For each hash table, return the bin indexed by g_i(q), 1 ≤ i ≤ L.
• Perform a linear search on the union of the bins.

W.-T. Lee and H.-T. Chen. Probing the local feature space of interest points, ICIP 2010.

Hash family
• The dot product a·v projects each vector v onto "a line", which is cut into segments of width r; the hash value is the segment index:
h(v) = ⌊(a·v + b) / r⌋
– a: a random vector sampled from a Gaussian distribution
– b: a real value chosen uniformly from the range [0, r]
– r: the segment width

Building the hash table
• Segment width: r = (max − min)/t, so each random projection gives t buckets.
• Generate K projections and combine them to get an index into the hash table. How many buckets do we get? t^K.
• Example: 5 projections (K = 5) with 15 segments each (t = 15) give 15^5 = 759,375 buckets in total! (A sketch follows at the end of these notes.)

Sketching the feature space
• Natural image patches (from the Berkeley segmentation database) vs. noise image patches (randomly generated).
• Collect three sets of patches of different sizes, 16x16, 32x32, and 64x64; each set consists of 200,000 patches.
[Figure: patch distribution over buckets]

Summary
• Indexing techniques are essential for organizing a database and for enabling fast matching.
• For indexing high-dimensional data:
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing

Resources and extended readings
• LSH Matlab Toolbox: http://www.cs.brown.edu/~gregory/download.html
• Yeh et al., "Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning," ICCV 2007.
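Finally, a minimal sketch of the dot-product hash family above, in Python. The parameter values, the single-table setup, and the dictionary-of-bins layout are illustrative assumptions (real LSH would use L independent tables and search the union of the retrieved bins):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_hash_fn(d, K, r):
    """One K-projection hash g(v): each row of A is a Gaussian random
    vector a, b is uniform in [0, r), and each projected value is cut
    into segments of width r; the K segment indices form the bucket key."""
    A = rng.normal(size=(K, d))
    b = rng.uniform(0.0, r, size=K)
    def g(v):
        return tuple(np.floor((A @ v + b) / r).astype(int))
    return g

def build_table(points, g):
    """Hash every database point into its bucket."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[g(p)].append(idx)
    return table

# Toy usage: 1000 random 128-d "descriptors" and one hash table.
X = rng.normal(size=(1000, 128))
g = make_hash_fn(d=128, K=5, r=4.0)
table = build_table(X, g)
q = X[0] + 0.01 * rng.normal(size=128)   # slightly perturbed query
candidates = table.get(g(q), [])         # linear-scan only this bin
print(len(candidates), 0 in candidates)  # point 0 collides w.h.p.
```

Only the retrieved bin is scanned linearly, which is exactly the querying procedure described above.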