Image Retrieval with Geometry-Preserving Visual Phrases
Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen, Cornell University

Similar Image Retrieval
Given a query image, search the image database and return a ranked list of relevant images.

Bag-of-Visual-Words (BoW)
Images are represented as histograms of visual words; the histogram length is the dictionary size.
Similarity of two images: cosine similarity of their histograms.

Geometry-Preserving Visual Phrases
Length-k phrase: k words in a certain spatial layout.
Bag of Phrases (BoP): a histogram over phrases (illustrated with length-2 phrases).

Phrases vs. Words
Figure: matches to an irrelevant image and to a relevant image using single words, length-2 phrases, and length-3 phrases.

Previous Works
Geometry verification: post-processing applied only to the top-ranked images returned by the BoW searching step.
Modeling relationships between words: the phrase dimension is exponential in the number of words per phrase, so previous works reduce the number of phrases.
Co-occurrences over the entire image [L. Torresani et al., CVPR 2009]: no spatial information.
Phrases within local neighborhoods [J. Yuan et al., CVPR 2007][Z. Wu et al., CVPR 2010][C. L. Zitnick, Tech. Report 2007]: no long-range interactions, weak geometry.
Selecting a subset of phrases [J. Yuan et al., CVPR 2007]: discards a large portion of the phrases.
Our work: all phrases, with linear computation time.

Approach Overview
1. Similarity measure: BoW, or Bag of Phrases as proposed in [Zhang and Chen, 09].
2. Large-scale retrieval: inverted files and min-hash exist for BoW; this paper develops inverted files and min-hash for BoP.

Co-occurring Phrases
Only the translation difference between word locations is considered [Zhang and Chen, 09].
Figure: two images with matched words (A, B, C, D, E, F); words that keep the same spatial layout form co-occurring phrases.

Co-occurring Phrase Algorithm [Zhang and Chen, 09]
For every pair of corresponding words, compute the location offset (x - x', y - y') between the two images and vote it into a quantized offset space.
Corresponding pairs that fall into the same offset bin share the same spatial layout, so the number of co-occurring phrases is obtained by counting the groups of pairs within each bin (5 co-occurring length-2 phrases in the illustrated example).
Figure: matched words in two images and the resulting offset space.

Relation with the Feature Vector
The number of co-occurring length-k phrases equals the inner product of the two bag-of-phrases feature vectors, <φ_k(x), φ_k(y)>.
Although the feature dimension is exponential in k, this inner product can be computed in O(M) time, the same as BoW, where M is the number of corresponding word pairs (in practice linear in the number of local features).

Inverted Index with BoW
Avoids comparing the query with every image: each word stores a posting list of image IDs, and each posting increments that image's entry in a score table.
Figure: inverted index postings and the score table over images I1 … In.

Inverted Index with Word Location
Store the word location together with each image ID in the posting lists.
Assuming the same word occurs at most once in an image, this uses the same memory as BoW.
To compute the number of co-occurring phrases, the score table builds an offset space for each candidate image.
Figure: BoW score table vs. BoP score table.

Inverted Files with Phrases
Each query word's postings vote into the offset space maintained for every candidate image; the final similarity score of an image is computed from its offset space.
Figure: inverted index, per-image offset spaces, and the final similarity scores.

Overview
Both BoW and BoP can be combined with inverted files or with min-hash; min-hash has lower storage and time complexity than inverted files.

Min-hash with BoW
The probability that a min-hash function collides (returns the same word) for two images equals the image similarity.

Min-hash with Phrases
Use the probability that k min-hash functions collide with geometrically consistent offsets (details are in the paper).
Figure: offset space built from the colliding min-hash words.

Other Invariances
Handled by adding dimensions to the offset space, e.g., the log of the scale ratio for scale invariance, at the cost of increased memory usage [Zhang and Chen, 10].
Figure: matched features with locations and scales (s, s') in images I and I'.

Variant: Matching
Local histogram matching.
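The offset-space counting in the Co-occurring Phrase Algorithm section is easy to prototype. Below is a minimal Python sketch (not the authors' code; the (word_id, x, y) feature format, the bin size, and the function name are illustrative assumptions) that counts co-occurring length-k phrases between two images by voting word-pair offsets into a quantized offset space and summing the per-bin combinations; with k = 1 it reduces to the plain BoW match count.

```python
from collections import defaultdict
from math import comb

def count_cooccurring_phrases(img_a, img_b, k=2, bin_size=1.0):
    """Count co-occurring length-k phrases between two images.

    Each image is a list of (word_id, x, y) local features; bin_size
    controls how coarsely the offset space is quantized.
    """
    # Index the words of image B so corresponding pairs are found quickly.
    words_b = defaultdict(list)
    for word, x, y in img_b:
        words_b[word].append((x, y))

    # Vote every corresponding word pair into the quantized offset space.
    offset_bins = defaultdict(int)
    for word, xa, ya in img_a:
        for xb, yb in words_b[word]:
            b = (round((xa - xb) / bin_size), round((ya - yb) / bin_size))
            offset_bins[b] += 1

    # Any k corresponding pairs sharing an offset bin form one co-occurring
    # length-k phrase, so a bin holding m pairs contributes C(m, k).
    return sum(comb(m, k) for m in offset_bins.values())

# Toy example: words A and B translate consistently; C and D have no match.
img1 = [("A", 0, 0), ("B", 2, 1), ("C", 5, 5)]
img2 = [("A", 1, 1), ("B", 3, 2), ("D", 0, 4)]
print(count_cooccurring_phrases(img1, img2, k=2))  # -> 1
```

The computation touches each corresponding pair once, which is the O(M) cost stated in the Relation with the Feature Vector section.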
Evaluation
1. BoW + inverted index vs. BoP + inverted index.
2. BoW + min-hash vs. BoP + min-hash.
Post-processing methods (geometry verification) are complementary to our work.

Experiments: Inverted Index
Oxford 5K dataset (55 queries) [Philbin et al., CVPR 2007], plus 1M Flickr distractor images.
Figure: example precision-recall curves for BoW and BoP; BoP gives higher precision at lower recall.

Comparison
Mean average precision (mAP): the mean of the AP over the 55 queries.
Figure: mAP versus vocabulary size (up to 1M words) for BoW, BoW+RANSAC, BoP, and BoP+RANSAC.
BoP outperforms BoW at similar computational cost.
BoP outperforms BoW+RANSAC, which is 10 times slower with RANSAC applied to the 150 top-ranked images.
The improvement is larger for smaller vocabulary sizes.

+ Flickr 1M Dataset
Figure: mAP of BoW and BoP as Flickr distractor images are added to the database.

Computational Complexity
BoW: 8.1G memory, 0.137s quantization, 0.89s search.
BoP: 8.5G memory, 0.215s quantization, 4.137s search.
BoW+RANSAC: 0.89s search plus RANSAC (4s on the top 300 images).

Experiments: Min-hash
University of Kentucky dataset; the BoW min-hash baseline follows [O. Chum et al., BMVC 2008].
Figure: retrieval score versus the number of min-hash functions (200, 500, 800) for BoW and BoP.

Conclusion
Encodes more spatial information into the BoW representation.
Can be applied to all images in the database at the searching step.
Same computational complexity as BoW.
Better retrieval precision than BoW+RANSAC.
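The conclusion's claim that BoP can be applied to every database image at the searching step corresponds to the Inverted Files with Phrases scheme. The sketch below (a minimal illustration, not the paper's implementation; the class name, bin size, and storage layout are assumptions) keeps word locations in the posting lists and lets each query word vote into an offset space per candidate image, from which the final scores are read off.

```python
from collections import defaultdict
from math import comb

class PhraseInvertedIndex:
    """Toy inverted index whose postings store word locations."""

    def __init__(self, k=2, bin_size=8.0):
        self.k = k
        self.bin_size = bin_size
        self.postings = defaultdict(list)  # word_id -> [(image_id, x, y), ...]

    def add_image(self, image_id, features):
        # features: list of (word_id, x, y); a word is assumed to occur at
        # most once per image, so memory matches a BoW index plus locations.
        for word, x, y in features:
            self.postings[word].append((image_id, x, y))

    def search(self, query_features):
        # One offset space per candidate image, filled by posting-list votes.
        offset_spaces = defaultdict(lambda: defaultdict(int))
        for word, qx, qy in query_features:
            for image_id, x, y in self.postings.get(word, []):
                b = (round((qx - x) / self.bin_size), round((qy - y) / self.bin_size))
                offset_spaces[image_id][b] += 1
        # Score each image by its number of co-occurring length-k phrases.
        scores = {img: sum(comb(m, self.k) for m in bins.values())
                  for img, bins in offset_spaces.items()}
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy usage: I1 shares two consistently placed words with the query.
index = PhraseInvertedIndex(k=2, bin_size=8.0)
index.add_image("I1", [("A", 10, 10), ("B", 30, 12)])
index.add_image("I2", [("A", 50, 40), ("C", 5, 5)])
print(index.search([("A", 12, 14), ("B", 32, 16)]))  # I1 ranks first
```

With k = 1 the same structure degenerates to ordinary BoW voting, which is why the memory usage and per-posting work stay close to those of the BoW inverted index.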