WISE: Large Scale Content-Based
Web Image Search
Michael Isard
Joint with: Qifa Ke, Jian Sun, Zhong Wu
Microsoft Research Silicon Valley
Query by Images
“A picture is worth a thousand words.”
What leaf?
Artist? Higher resolution?
©: Who else is using this?
Bank website?
Email?
Ad on eBay?
…
Partial-Duplicate Image Search
• Given a query image, find its partial duplicates
from a database of web images
Two Major Challenges
• How to represent images
– No text annotations or labels
– Noise and modification
• How to efficiently index and query images
– Large number of images (millions)
Image Representation: Bag-of-Words
[CVPR’09, ICCV’09]
1: Feature extraction: Bundled Features
– Detection and Description [Lowe 2004, Matas et al 2002, Winder et al 2007]
2: Quantization
– Each descriptor is mapped to a visual word ID in a code-book (IDs 1 … 1,000,000)
3: Representation
– Bag-of-words histogram over visual word IDs, followed by Normalization [Sivic&Zisserman’2003]
[Figure: descriptor histograms quantized to visual word IDs such as 1, 206, 3506 and 999,999]
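The three steps can be sketched in a few lines of Python. This is a toy illustration, not the WISE implementation: it assumes brute-force nearest-neighbour quantization against a tiny 2-D code-book (real descriptors are high-dimensional, and a 1M-word vocabulary would need an approximate search), and it assumes L2 normalization, since the slide only says "Normalization". The names `nearest_word` and `bag_of_words` are hypothetical.

```python
import math


def nearest_word(descriptor, codebook):
    """Quantization: index of the code-book centre closest to descriptor (squared L2).

    Brute force over the whole code-book; an assumption made for clarity only.
    """
    best_id, best_dist = 0, float("inf")
    for word_id, centre in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best_id, best_dist = word_id, dist
    return best_id


def bag_of_words(descriptors, codebook):
    """Representation: visual-word histogram, then L2 normalization (assumed scheme)."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist
```

For example, with code-book `[[0, 0], [10, 10], [0, 10]]`, the descriptors `[1, 1]`, `[9, 9]` and `[0, 0.5]` map to words 0, 1 and 0, giving the normalized histogram `[2/√5, 1/√5, 0]`. Word IDs here are 0-based list indices.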
Matching query to database
• Use an index
– Each visual word has a ‘posting list’
– Lists every image containing the word
• At query time
– Look up the posting list for each query word
– Merge lists to find candidate images
• Partial match: don’t need every word to be present
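A minimal in-memory sketch of the posting-list idea, assuming each image has already been reduced to a set of visual word IDs. Counting hits per image stands in for the list merge (a disk-based index would merge sorted lists instead), and `min_matches` is a hypothetical parameter that makes partial matching explicit: an image is returned without containing every query word.

```python
from collections import Counter, defaultdict


def build_index(images):
    """images: image_id -> iterable of visual word IDs. Returns word -> sorted posting list."""
    postings = defaultdict(set)
    for image_id, words in images.items():
        for word in words:
            postings[word].add(image_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def query(index, query_words, min_matches=1):
    """Look up each query word's posting list and merge by counting hits per image."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, []):
            votes[image_id] += 1
    # Partial match: keep any image sharing at least min_matches words with the query.
    ranked = [(img, n) for img, n in votes.items() if n >= min_matches]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))
```

With images `{27: [3, 7, 42], 31: [7, 99], 40: [5]}`, the query `[7, 42, 100]` returns `[(27, 2), (31, 1)]`: word 100 has no posting list, yet both images still match partially.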
How much work to query?
• Disk-based index, bottleneck is random reads
– One seek per posting list
• Also one seek per matching image
– To fetch the thumbnail, etc.
• Keep as little information as possible in
posting lists, to keep index size small
Index Pipeline
• Implemented in a large computer cluster
– 256 nodes, using Dryad/DryadLINQ
Image Crawler → Content Chunk → Feature Extractor (Local Features → Bundled Features) → Feature Quantizer → Visual Words → Indexer → Inverted Index (Visual Word Index)
Crawler → Media DB (Thumbnail, URL)
Query Pipeline
Query Image → Feature Extractor → Bundled Features → Feature Quantizer → Visual Words → Index Server (Inverted Index) → Search Results (chunkID, imgID) → Media DB (Thumbnail, URL) → GUI Results
Bag-of-Words: Limitations
• Quantization
– Loses discriminative power
– Sensitive to image variations and noise
– Mitigated by soft quantization [Philbin et al, CVPR 2008]
– Mitigated by Hamming embedding [Jegou et al, ECCV 2008]
[Figure: descriptors quantized to visual word IDs (e.g. 1, 3506, 999,999) in a 1,000,000-word code-book]
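Both remedies can be sketched under simplifying assumptions: `soft_assign` returns the k nearest visual words without the distance-based weights that soft assignment usually attaches, and `he_match` treats two features as matching only if they share a visual word and their binary signatures (stored here as Python integers) differ in at most `threshold` bits. The default `k=4` mirrors the soft quantization factor listed in the experimental settings; the threshold is left to the caller, and all names are hypothetical.

```python
def soft_assign(descriptor, codebook, k=4):
    """Soft quantization: IDs of the k nearest code-book centres, nearest first."""
    dists = sorted(
        (sum((d - c) ** 2 for d, c in zip(descriptor, centre)), word_id)
        for word_id, centre in enumerate(codebook)
    )
    return [word_id for _, word_id in dists[:k]]


def hamming(sig_a, sig_b):
    """Number of differing bits between two integer-encoded binary signatures."""
    return bin(sig_a ^ sig_b).count("1")


def he_match(word_a, sig_a, word_b, sig_b, threshold):
    """Hamming embedding check: same visual word and signatures at most `threshold` bits apart."""
    return word_a == word_b and hamming(sig_a, sig_b) <= threshold
```

Soft assignment trades index size for recall (each feature lands in k posting lists), while Hamming embedding keeps one posting per feature but filters false matches inside a word.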
Geometric verification
• In practice, bag of words is too weak
• Does not exploit any geometry
• Post-process to check spatial layout of
matching features
• Requires a disk seek per image
– Only used as a re-ranking step on a shortlist of
matched images
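The shortlist strategy can be sketched as a function that re-scores only the top `top_n` results. `geometric_score` is a hypothetical callback (for example, a count of spatially consistent feature matches); in the disk-based setting each call would cost a seek, which is why only the shortlist is touched. The default of 300 is the re-ranking depth used in these experiments.

```python
def rerank(ranked_ids, geometric_score, top_n=300):
    """Re-order only the top_n shortlist by a geometric score; the tail keeps its BoW order.

    geometric_score is a hypothetical callback; each call stands for one disk seek plus a
    spatial-layout check, which is why it is applied to the shortlist only.
    """
    shortlist = ranked_ids[:top_n]
    # sorted() is stable even with reverse=True, so ties keep their BoW order.
    reranked = sorted(shortlist, key=geometric_score, reverse=True)
    return reranked + ranked_ids[top_n:]
```

Images ranked below `top_n` by bag-of-words can never be rescued, which is the limitation that motivates putting geometry into the index itself.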
Geometric Re-ranking
Re-rank top 300 images
[Figure: mAP (0.25 to 0.75) vs. number of images (50,000 to 1,000,000) for baseline (bag-of-words) and baseline + reranking]
Geometry in the index
• Previous works:
– Jegou et al ECCV 2008
• Try to match similar orientations and scales
– Perdoch et al CVPR 2009
• Match oriented features more effectively
• Still feature-by-feature
– Global geometric consistency applied at the end
Single Feature is Weak
Single feature + neighboring features?
Define Neighboring Features
• Previous works
– kNN voting [Sivic&Zisserman 2003]
– Higher-order spatial features
[Liu et al][Yuan et al][Tirilly et al][Quack et al]
– Post geometric spatial verification
[Lowe’2004][Chum et al 2007][Nister 2006][Philbin et al 2007]……
– Geometric Min-Hash [Chum et al 2009]
• Challenges
– Repeatable
– Partial matching
– Scalable: simple enough to build into index
Define Neighboring Features
DoG Features [Lowe 2004]
- point features
- repeatable
MSER Features [Matas et al 2002]
- region features
- repeatable
Can the regions group the points?
Bundled Feature: Definition
• Bundled Feature =
A set of DoG features bundled by an MSER region
[Figure: an MSER region and the DoG interest points inside it; example bundled features]
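One way to form bundles, sketched below, approximates each MSER region by an ellipse `(cx, cy, a, b, theta)` (centre, semi-axes, rotation) and assigns to it every DoG point that falls inside. Because MSER regions can overlap or nest, a point may belong to several bundles. The ellipse parametrization and the function names are assumptions for illustration.

```python
import math


def in_ellipse(point, ellipse):
    """True if point (x, y) lies inside ellipse (cx, cy, a, b, theta)."""
    x, y = point
    cx, cy, a, b, theta = ellipse
    dx, dy = x - cx, y - cy
    # Rotate the offset into the ellipse's own axes before testing.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def bundle_features(dog_points, mser_regions):
    """One bundle (list of DoG point indices) per MSER region; a point may fall in several."""
    return [
        [i for i, p in enumerate(dog_points) if in_ellipse(p, region)]
        for region in mser_regions
    ]
```

With points `(0, 0), (1, 0), (5, 5), (6, 5)`, two small ellipses around each pair and one large ellipse covering everything, the bundles are `[0, 1]`, `[2, 3]` and `[0, 1, 2, 3]`.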
Matching Bundles: Membership
Query bundle $q = \{q_j\}$ (shown in the figure)
Matched bundle $p = \{p_i\}$
Membership score:
$M_m(q;p) = |q \cap p| = 4$
Voting weight:
$v(q_j) = M_m(q;p) = 4$
$\mathrm{Sim}(I_1, I_2) = \sum_{q_j \in q} v(q_j) = 4 \times 4 = 16$
Matching Bundles: Membership
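Query bundle $q = \{q_j\}$ (shown in the figure)
Matched bundles $p_1, p_2, p_3$
Membership scores:
$M_m(q;p_1) = |q \cap p_1| = 2$
$M_m(q;p_2) = |q \cap p_2| = 1$
$M_m(q;p_3) = |q \cap p_3| = 2$
Voting weight, taken over the matched bundles $p_k$ that contain a match of $q_j$:
$v(q_j) = \max_{p_k} \{ M_m(q;p_k) \mid q_j \in q \}$
$v(q_1) = 2, \quad v(q_2) = 2, \quad v(q_3) = \max(1, 2) = 2, \quad v(q_4) = 2$
$\mathrm{Sim}(I_1, I_2) = \sum_{q_j \in q} v(q_j) = 8$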
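The membership vote can be checked against the worked example. The sketch assumes a single query bundle and represents each matched bundle p_k by the set of query features that have a match inside it. The assignment used below (p1 = {q1, q2}, p2 = {q3}, p3 = {q3, q4}) is a hypothetical one consistent with the slide's scores, since the figure itself is not available.

```python
def membership_similarity(query_bundle, matched_bundles):
    """Sim(I1, I2) = sum_j v(q_j), where v(q_j) = max |q ∩ p_k| over bundles p_k that match q_j.

    query_bundle: IDs of the features q_j in the query bundle q.
    matched_bundles: one set per candidate bundle p_k, holding the query-feature IDs
    that have a match inside p_k (so |q ∩ p_k| is the size of its overlap with q).
    """
    q = set(query_bundle)
    memberships = [(p, len(q & set(p))) for p in matched_bundles]  # M_m(q; p_k)
    total = 0
    for qj in q:
        # Voting weight: best membership score among the bundles containing q_j's match.
        total += max((m for p, m in memberships if qj in p), default=0)
    return total
```

A single bundle matching all four query features scores 4 × 4 = 16, while the three partial bundles score 8, as on the slides.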
Matching Bundles: Geometric Constraint
[Figure: a query bundle and a candidate bundle; matched features are numbered by their order along one axis]
Consistent candidate:
order in query img: 1 < 2 < 3 < 4
order in target img: 1 < 3 < 4 < 5
order inconsistency: 0 + 0 + 0 = 0
Inconsistent candidate:
order in query img: 1 < 2 < 3 < 4
matching order: 5 > 2 > 1 < 3
inconsistency: 1 + 1 + 0 = 2
Penalize inconsistent relative orders:
$M_g(q;p) = -\sum_i \delta\big(O_q(p_i) > O_q(p_{i+1})\big)$, where $\delta(\cdot)$ is 1 when its argument holds and 0 otherwise
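Counting the order inconsistency takes a single pass over adjacent pairs. In this sketch `matched_orders[i]` is the order, inside the candidate bundle, of the feature matched to the i-th query feature (query features taken in their own order), so the two examples give 0 and 2. It handles one axis only; the index stores both X- and Y-orders, and how the two axes are combined is not spelled out here.

```python
def order_inconsistency(matched_orders):
    """Count adjacent pairs whose candidate-bundle order decreases (one axis)."""
    return sum(1 for a, b in zip(matched_orders, matched_orders[1:]) if a > b)


def geometric_score(matched_orders):
    """M_g(q; p) = -(number of inconsistent adjacent orders)."""
    return -order_inconsistency(matched_orders)
```

Only relative orders are compared, so the score is unaffected by translation and scaling of the bundle.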
Matching Bundles: Formulation
• Bundle matching score:
$M(q;p) = M_m(q;p) + \lambda M_g(q;p)$
(membership term + weighted geometric-constraint term)
• Image matching score:
$v(q_j) = \max_{p_k} \{ M(q;p_k) \mid q_j \in q \}$
$\mathrm{Sim}(I_1, I_2) = \sum_{q_j \in q} v(q_j)$
- Repeatable
- Partial matching
- Scalable?
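Putting both terms together, the sketch computes M(q; p) = M_m(q; p) + λ M_g(q; p) for each candidate bundle and then the image score Sim(I1, I2). Assumptions: a single query bundle, each candidate bundle given as (query feature, candidate order) pairs listed in query order, and a caller-chosen weight `lam` (no value of λ is given here).

```python
def bundle_score(matched_orders, lam):
    """M(q; p) = M_m(q; p) + lam * M_g(q; p) for one candidate bundle p.

    matched_orders lists, in query order, the candidate-bundle order of each matched
    feature: M_m is the number of matches and M_g is minus the number of adjacent
    pairs whose order decreases.
    """
    m_m = len(matched_orders)
    m_g = -sum(1 for a, b in zip(matched_orders, matched_orders[1:]) if a > b)
    return m_m + lam * m_g


def image_score(query_bundle, candidates, lam):
    """Sim(I1, I2) = sum_j v(q_j), with v(q_j) = max M(q; p_k) over bundles matching q_j.

    candidates: one list per candidate bundle p_k of (query_feature_id, candidate_order)
    pairs, sorted by the query feature's own order.
    """
    scored = []
    for pairs in candidates:
        ids = {qid for qid, _ in pairs}
        orders = [order for _, order in pairs]
        scored.append((ids, bundle_score(orders, lam)))
    return sum(
        max((score for ids, score in scored if qj in ids), default=0.0)
        for qj in query_bundle
    )
```

With λ = 0 the score reduces to the membership-only vote; for the inconsistent example (matching orders 5, 2, 1, 3) and `lam=1`, M drops from 4 to 2 and Sim from 16 to 8.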
Inverted Index (without Bundles)
Visual word → Posting list of Image IDs
[Figure: e.g. Image ID = 27 appears in the posting list of each visual word it contains]
Inverted Index with Bundles
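Visual word → Posting list; each entry holds Image ID + Bundle Bits
Bundle Bits: Bundle ID (9 bits) | X-Order (5 bits) | Y-Order (5 bits)
Example for Image ID = 27 (bundles p1, p2, p3), entries written as Image ID, [Bundle ID, X-Order, Y-Order]:
– 27, [1,2,1] [2,2,2]
– 27, [3,1,1]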
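The 19 bundle bits (9 + 5 + 5) fit in a single integer. The sketch assumes the Bundle ID occupies the high bits and the Y-Order the low bits; only the field widths come from the slide, the field order is an assumption. With these widths a posting entry can address at most 512 bundles per image and 32 order positions per axis.

```python
BUNDLE_ID_BITS, X_ORDER_BITS, Y_ORDER_BITS = 9, 5, 5  # 19 bundle bits per posting entry


def pack_bundle_bits(bundle_id, x_order, y_order):
    """Pack [Bundle ID, X-Order, Y-Order] into one integer (bundle ID in the high bits)."""
    if not 0 <= bundle_id < 1 << BUNDLE_ID_BITS:
        raise ValueError("bundle_id does not fit in 9 bits")
    if not (0 <= x_order < 1 << X_ORDER_BITS and 0 <= y_order < 1 << Y_ORDER_BITS):
        raise ValueError("order does not fit in 5 bits")
    return (bundle_id << (X_ORDER_BITS + Y_ORDER_BITS)) | (x_order << Y_ORDER_BITS) | y_order


def unpack_bundle_bits(bits):
    """Inverse of pack_bundle_bits: returns (bundle_id, x_order, y_order)."""
    y_order = bits & ((1 << Y_ORDER_BITS) - 1)
    x_order = (bits >> Y_ORDER_BITS) & ((1 << X_ORDER_BITS) - 1)
    bundle_id = bits >> (X_ORDER_BITS + Y_ORDER_BITS)
    return bundle_id, x_order, y_order
```

For example, the entry `[1,2,1]` packs to `(1 << 10) | (2 << 5) | 1 = 1089`, and unpacking 1089 gives back `(1, 2, 1)`.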
Retrieval
Query Image $I_q$ → Inverted index with bundle bits → Top candidate images
[Figure: posting entries such as "10, 1, [1,1,2]", "3, 1, [3,4,5]" and "9, 2, [3,2,5]" matched against the query bundles p1, p2, p3]
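A hedged sketch of the traversal: every posting entry for a query word votes for the (image, bundle) pair it came from, and each image's score is the sum of per-feature membership votes. It simplifies the method: the whole query is treated as one bundle, the X/Y orders are carried but the geometric term is left out, and the posting-entry layout `(image_id, bundle_id, x_order, y_order)` is an assumption rather than the on-disk format.

```python
from collections import defaultdict


def retrieve(query_words, index, top_k=10):
    """Rank images by bundled membership votes gathered while traversing posting lists.

    query_words: list of (query_feature_id, visual_word) for ONE query bundle.
    index: visual_word -> list of (image_id, bundle_id, x_order, y_order) entries.
    """
    # Group posting hits by (image, bundle): which query features matched inside each bundle.
    matched = defaultdict(set)
    for qid, word in query_words:
        for image_id, bundle_id, _x, _y in index.get(word, []):
            matched[(image_id, bundle_id)].add(qid)
    # v(q_j) per image: max membership |q ∩ p_k| over that image's bundles matching q_j.
    votes = defaultdict(dict)
    for (image_id, _bundle), qids in matched.items():
        for qid in qids:
            votes[image_id][qid] = max(votes[image_id].get(qid, 0), len(qids))
    scores = {image_id: sum(v.values()) for image_id, v in votes.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The key property is that the bundle check happens during the single pass over posting lists, so no extra disk seek per candidate image is needed before ranking.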
Experimental Settings
• Image database:
– 1M web images from query-click log
• Ground truth partial duplicates
– 780 known partial duplicate images in 19 groups
• Baseline bag-of-words
– Visual word vocabulary size = 1 M
– Soft quantization factor = 4
– 500 features per image
Partial Duplicate Example
……
Example Query Results
Query
Challenging cases
Evaluation: Precision-Recall
• A query returns N images
– T: correct matches
– A: expected matches
• Recall = T / A
• Precision = T / N
[Figure: Precision vs. Recall curve]
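The two ratios, and the curve obtained by sweeping the cutoff N down a ranked result list, can be computed as follows. The function names are illustrative, and returning 0.0 for an empty list is a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Precision = T/N and Recall = T/A for one query's returned list."""
    relevant = set(relevant)
    t = sum(1 for img in retrieved if img in relevant)  # T: correct matches
    n, a = len(retrieved), len(relevant)               # N: returned, A: expected
    precision = t / n if n else 0.0
    recall = t / a if a else 0.0
    return precision, recall


def pr_curve(ranked, relevant):
    """(recall, precision) points for cutoffs N = 1 .. len(ranked)."""
    return [precision_recall(ranked[:n], relevant)[::-1] for n in range(1, len(ranked) + 1)]
```

For a ranked list `[1, 2, 3, 4]` with expected matches `{1, 3, 5}`, T = 2, so precision is 2/4 = 0.5 and recall is 2/3.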
Comparison: Precision-Recall
Query image:
Baseline bag-of-words (started from 13th)
Bundled features (started from 13th)
More Precision-Recall Comparisons
Evaluation: mAP
• Average Precision (AP) for one query:
– Area under the Precision-Recall curve
• mAP: mean of the APs over all test queries
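A common way to compute the area under the precision-recall curve is non-interpolated AP: average the precision at each rank where a correct match appears, dividing by the number of expected matches A. The slide does not say which AP variant was used, so treat this as one standard choice rather than the evaluation code behind the plots.

```python
def average_precision(ranked, relevant):
    """Non-interpolated AP: mean of precision at each rank where a relevant image appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / len(relevant)  # expected matches never retrieved contribute 0


def mean_average_precision(runs):
    """mAP: mean AP over all test queries; runs is a list of (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For the ranked list `[1, 2, 3, 4]` with expected matches `{1, 3, 5}`, AP = (1/1 + 2/3) / 3 = 5/9 ≈ 0.556.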
mAP: Baseline Bag-of-Words
[Figure: mAP (0.35 to 0.7) vs. number of images (50,000 to 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE]
mAP: Hamming Embedding (HE)
[Figure: mAP (0.35 to 0.7) vs. number of images (50,000 to 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE]
mAP: Bundle (Membership)
[Figure: mAP (0.35 to 0.7) vs. number of images (50,000 to 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE]
mAP: Bundle (both terms)
[Figure: mAP (0.35 to 0.7) vs. number of images (50,000 to 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE; annotated improvements: 26% and 40%]
mAP: Bundle + HE
[Figure: mAP (0.35 to 0.7) vs. number of images (50,000 to 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE; annotated improvement: 49%]
Bundle VS. Geometric Re-ranking
Re-rank top 300 images
[Figure: mAP (0.25 to 0.75) vs. number of images (50,000 to 1,000,000); curves: baseline (bag-of-words), bundle, baseline + reranking, bundle + reranking]
Bundle + Geometric Re-ranking
Re-rank top 300 images
[Figure: mAP (0.25 to 0.75) vs. number of images (50,000 to 1,000,000); curves: baseline (bag-of-words), bundle, baseline + reranking, bundle + reranking; annotated improvements: 24% and 77%]
More Results
Failure Case
Demo Setup
Client → Web Server → Index Servers and Document Server
6 million images
Demo
Query image
Results
Conclusion
• Bundled features
– More discriminative
– Enforce spatial constraints while traversing the index
– Partial match
– Scalable: built into the index (bundle bits: 9-bit Bundle ID, 5-bit X-Order, 5-bit Y-Order)
Thanks!