WISE: Large Scale Content-Based Web Image Search
Michael Isard
Joint with: Qifa Ke, Jian Sun, Zhong Wu
Microsoft Research Silicon Valley

Query by Images
“A picture is worth a thousand words.”
• What leaf?
• Artist?
• Higher resolution?
• ©: Who else is using this?
• Bank web-site?
• Email?
• Ad on eBay?
• ……

Partial-Duplicate Image Search
• Given a query image, find its partial duplicates in a database of web images

Two Major Challenges
• How to represent images
  – No text annotations or labels
  – Noise and modification
• How to efficiently index and query images
  – Large number of images (millions)
[Figure: query image looked up in an index]

Image Representation: Bag-of-Words [CVPR’09, ICCV’09]
1: Feature extraction: Bundle Features
  – Detection, normalization, description [Lowe 2004, Matas et al 2002, Winder et al 2007]
2: Quantization
  – Code-book assigns each descriptor a visual word ID (1 … 1,000,000)
3: Representation
  – Bag-of-words [Sivic&Zisserman’2003]
[Figure: descriptor histograms paired with their visual word IDs, e.g. 1, 206, 3506, 999,999, 1,000,000]

Matching query to database
• Use an index
  – Each visual word has a ‘posting list’
  – The list names every image containing the word
• At query time
  – Look up the posting list for each query word
  – Merge lists to find candidate images
• Partial match: not every word needs to be present

How much work to query?
• Disk-based index, bottleneck is random reads
  – One seek per posting list
• Also one seek per matching image
  – To fetch thumbnail etc.
• Keep as little information as possible in posting lists, to keep index size small

Index Pipeline
• Implemented on a large computer cluster
  – 256 nodes, using Dryad/DryadLINQ
[Figure: index pipeline. Components: Image Crawler, Content Chunk, Feature Extractor (Local Features, Bundled Features), Feature Quantizer (Visual Words), Indexer, Inverted Index (Visual Word Index), Media DB (Thumbnail, URL)]

Query Pipeline
[Figure: query pipeline. Components: GUI, Query Image, Feature Extractor (Bundled Features), Feature Quantizer (Visual Words), Index Server, Inverted Index, Search Results (chunkID, imgID), Media DB (Thumbnail, URL), Results]

Bag-of-Words: Limitations
• Quantization
  – Lost discriminative power
  – Sensitive to image variations and noise
  – Soft quantization [Philbin et al, CVPR 2008]
  – Hamming embedding [Jegou et al, ECCV 2008]
[Figure: descriptor histograms paired with their visual word IDs]

Geometric verification
• In practice, bag of words is too weak
• Does not exploit any geometry
• Post-process to check spatial layout of matching features
• Requires a disk seek per image
  – Only used as a re-ranking step on a shortlist of matched images

Geometric Re-ranking
• Re-rank top 300 images
[Figure: mAP vs. number of images (50,000; 200,000; 500,000; 1,000,000). Curves: baseline (bag-of-words), baseline + reranking]

Geometry in the index
• Previous works:
  – Jegou et al ECCV 2008
    • Try to match similar orientations and scales
  – Perdoch et al CVPR 2009
    • Match oriented features more effectively
• Still feature-by-feature
  – Global geometric consistency applied at
the end

Single Feature is Weak
[Figure: a single feature + neighboring features?]

Define Neighboring Features
• Previous works
  – kNN voting [Sivic&Zisserman 2003]
  – Higher-order spatial features [Liu et al][Yuan et al][Tirilly et al][Quack et al]
  – Post geometric spatial verification [Lowe’2004][Chum et al 2007][Nister 2006][Philbin et al 2007]……
  – Geometric Min-Hash [Chum et al 2009]
• Challenges
  – Repeatable
  – Partial matching
  – Scalable: simple enough to build into index

Define Neighboring Features
• DoG Features [Lowe 2004]
  – point features
  – repeatable
• MSER Features [Matas et al 2002]
  – region features
  – repeatable
• Region groups points?

Bundled Feature: Definition
• Bundled Feature = a set of DoG features bundled by an MSER region
[Figure: an MSER region and the DoG interest points inside it]
[Figure: bundled features]

Matching Bundles: Membership
• Query bundle $q = \{q_j\}$
• Matched bundle $p = \{p_i\}$
• Membership score: $M_m(q; p) = |q \cap p| = 4$
• Voting weight: $v(q_j) = M_m(q; p) = 4$
• $\mathrm{Sim}(I_1, I_2) = \sum_{q_j} v(q_j) = \sum_{q_j} 4 = 16$

Matching Bundles: Membership
• Query bundle $q = \{q_j\}$
• Matched bundles $p_1, p_2, p_3$
• Membership scores:
  – $M_m(q; p_1) = |q \cap p_1| = 2$
  – $M_m(q; p_2) = |q \cap p_2| = 1$
  – $M_m(q; p_3) = |q \cap p_3| = 2$
• Voting weight: $v(q_j) = \max_{p_k} \{ M_m(q; p_k) \mid q_j \in q \}$
  – $v(q_1) = 2$, $v(q_2) = 2$, $v(q_3) = \max(1, 2) = 2$, $v(q_4) = 2$
• $\mathrm{Sim}(I_1, I_2) = \sum_{q_j} v(q_j) = 8$

Matching Bundles: Geometric Constraint
• Example 1 (query vs. candidate)
  – Order in query image: 1 < 2 < 3 < 4
  – Order in target image: 1 < 3 < 4 < 5
  – Order inconsistency: 0 + 0 + 0 = 0
• Example 2 (query vs. candidate)
  – Order in query image: 1 < 2 < 3 < 4
  – Matching order: 5 > 2 > 1 < 3
  – Order inconsistency: 1 + 1 + 0 = 2
• Penalize inconsistent relative orders:
  $M_g(q; p) = -\sum_i \delta\left(O_q[p_i] > O_q[p_{i+1}]\right)$

Matching Bundles: Formulation
• Bundle matching score (membership term plus geometric constraint term):
  $M(q; p) = M_m(q; p) + \lambda M_g(q; p)$
• Image matching score:
  $v(q_j) = \max_{p_k} \{ M(q; p_k) \mid q_j \in q \}$
  $\mathrm{Sim}(I_1, I_2) = \sum_{\{q_j\}} v(q_j)$
• Repeatable
• Partial matching
• Scalable?

Inverted Index (without Bundles)
• Each visual word points to a list of postings
• Each posting holds an image ID (e.g. Image ID = 27)

Inverted Index with Bundles
• Each posting holds an image ID plus bundle bits
  – Bundle ID: 9 bits
  – X-Order: 5 bits
  – Y-Order: 5 bits
• Example postings for Image ID = 27 with bundles p1, p2, p3:
  – 27, [1,2,1]
  – 27, [2,2,2]
  – 27, [3,1,1]

Retrieval
[Figure: the query image I_q is looked up in the inverted index with bundle bits, with example postings such as 10, 1, [1,1,2]; 1, 1, [2,5,9]; 3, 1, [3,4,5]; 9, 2, [3,2,5]; 10, 1, [2,2,1]; 12, 1, [1,1,2]. The output is the top candidate images with bundles p1, p2, p3]

Experimental Settings
• Image database:
  – 1M web images from query-click log
• Ground truth partial duplicates
  – 780 known partial duplicate images in 19 groups
• Baseline bag-of-words
  – Visual word vocabulary size = 1M
  – Soft quantization factor = 4
  – 500 features per image

Partial Duplicate Example
[Figures: partial-duplicate image examples]

Example Query Results
[Figure: a query image and its results]

Challenging cases
[Figure: challenging query examples]

Evaluation: Precision-Recall
• A query returns N images
  – T: correct matches
  – A: expected matches
• $\mathrm{Recall} = T / A$
• $\mathrm{Precision} = T / N$
[Figure: precision vs. recall curve]

Comparison: Precision-Recall
[Figure: a query image with results from baseline bag-of-words (starting from the 13th) and from bundled features (starting from the 13th)]

More Precision-Recall Comparisons

Evaluation: mAP
• Average Precision (AP) for one query:
  – Area under the Precision-Recall curve
• mAP: mean of the APs from all testing queries

mAP: Baseline Bag-of-Words
mAP: Hamming Embedding (HE)
mAP: Bundle (Membership)
mAP: Bundle (both terms)
[Figure, built up over the four slides above: mAP vs. number of images (50,000; 200,000; 500,000; 1,000,000). Curves: baseline, HE, bundled (membership), bundled, bundled + HE. Annotations on the "both terms" slide: 26%, 40%]
mAP: Bundle + HE
[Figure: mAP vs. number of images (50,000; 200,000; 500,000; 1,000,000). Curves: baseline, HE, bundled (membership), bundled, bundled + HE. Annotation: 49%]

Bundle VS. Geometric Re-ranking
• Re-rank top 300 images
[Figure: mAP vs. number of images (50,000; 200,000; 500,000; 1,000,000). Curves: baseline (bag-of-words), bundle, baseline + reranking, bundle + reranking]

Bundle + Geometric Re-ranking
• Re-rank top 300 images
[Figure: same axes and curves as the previous slide. Annotations: 24%, 77%]

More Results

Failure Case

Demo Setup
• Client, Web Server, Index Servers, Document Server
• 6 million images

Demo
[Figure: query image and results]

Conclusion
• Bundle feature
  – More discriminative
  – Enforce spatial constraints while traversing index
  – Partial match
  – Scalable: built into index
• Bundle bits per posting: Bundle ID (9 bits), X-Order (5 bits), Y-Order (5 bits)

Thanks!
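The bundle bits and the bundle matching score described in the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual code: the field layout inside the 19 bits (Bundle ID in the high bits), the function names, the default weight `lam = 1.0`, the use of the X order only, and the restriction that a visual word appears at most once per bundle are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

BUNDLE_ID_BITS, X_BITS, Y_BITS = 9, 5, 5  # widths taken from the slides


def pack_bundle_bits(bundle_id: int, x_order: int, y_order: int) -> int:
    """Pack the three fields into one 19-bit integer (layout is an assumption)."""
    if not (0 <= bundle_id < 1 << BUNDLE_ID_BITS
            and 0 <= x_order < 1 << X_BITS
            and 0 <= y_order < 1 << Y_BITS):
        raise ValueError("field out of range")
    return (bundle_id << (X_BITS + Y_BITS)) | (x_order << Y_BITS) | y_order


def unpack_bundle_bits(bits: int) -> Tuple[int, int, int]:
    """Inverse of pack_bundle_bits: returns (bundle_id, x_order, y_order)."""
    y_order = bits & ((1 << Y_BITS) - 1)
    x_order = (bits >> Y_BITS) & ((1 << X_BITS) - 1)
    bundle_id = bits >> (X_BITS + Y_BITS)
    return bundle_id, x_order, y_order


def order_inconsistency(candidate_orders: List[int]) -> int:
    """Count consecutive pairs whose relative order is reversed.

    candidate_orders holds the candidate-side order of each matched feature,
    listed in increasing query-side order.
    """
    return sum(a > b for a, b in zip(candidate_orders, candidate_orders[1:]))


def bundle_score(query_words: List[int],
                 candidate_order: Dict[int, int],
                 lam: float = 1.0) -> float:
    """Membership count minus lam times the order inconsistency.

    query_words: visual words of the query bundle, sorted by X position.
    candidate_order: visual word -> X order inside the candidate bundle.
    lam: hypothetical weight on the geometric penalty.
    """
    matched = [candidate_order[w] for w in query_words if w in candidate_order]
    return len(matched) - lam * order_inconsistency(matched)


if __name__ == "__main__":
    # The two ordering examples from the geometric constraint slide.
    print(order_inconsistency([1, 3, 4, 5]))  # 0
    print(order_inconsistency([5, 2, 1, 3]))  # 2
    print(unpack_bundle_bits(pack_bundle_bits(3, 1, 1)))  # (3, 1, 1)
```

Because the score needs only the bundle ID and the order fields stored next to each image ID, it can be computed while the posting lists are being merged, without an extra disk seek per candidate image.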