Image Retrieval with Geometry-Preserving Visual Phrases
Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen
Cornell University
Similar Image Retrieval
[Figure: a query image is matched against an image database to return a ranked list of relevant images.]
Bag-of-Visual-Words (BoW)
Each image is represented as a histogram of visual words (length: dictionary size).
Similarity of two images: cosine similarity of the histograms.
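As a minimal illustration of the BoW similarity above (my own sketch, not the authors' code; word IDs are assumed to come from an off-the-shelf quantizer):

```python
import numpy as np

def bow_histogram(word_ids, vocab_size):
    """Histogram of quantized visual-word IDs: the BoW vector of an image."""
    hist = np.zeros(vocab_size)
    for w in word_ids:
        hist[w] += 1
    return hist

def bow_similarity(h1, h2):
    """Cosine similarity between two BoW histograms."""
    denom = np.linalg.norm(h1) * np.linalg.norm(h2)
    return float(h1 @ h2 / denom) if denom > 0 else 0.0

# Two toy images described by word IDs from a 5-word dictionary:
sim = bow_similarity(bow_histogram([0, 1, 1, 4], 5),
                     bow_histogram([1, 1, 2, 4], 5))
```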
Geometry-Preserving Visual Phrases
Length-k phrase: k visual words in a certain spatial layout.
Bag of Phrases (BoP): histogram over all length-k phrases (illustrated with length-2 phrases).
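As a toy illustration (my own encoding, with a hypothetical 16-pixel offset grid), a length-2 phrase can be represented as the two word IDs plus their quantized relative offset:

```python
def length2_phrase(word_a, word_b, loc_a, loc_b, cell=16):
    """A length-2 geometry-preserving phrase: two word IDs plus their
    relative spatial layout, quantized into offset cells of `cell` pixels."""
    dx = (loc_b[0] - loc_a[0]) // cell
    dy = (loc_b[1] - loc_a[1]) // cell
    return (word_a, word_b, dx, dy)

# Words 12 and 7, with word 7 roughly 40 px to the right of word 12:
phrase = length2_phrase(12, 7, (100, 50), (140, 55))   # -> (12, 7, 2, 0)
```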
Phrases vs. Words
[Figure: feature matches on an irrelevant image pair and on a relevant image pair, shown for single words, length-2 phrases, and length-3 phrases.]
Previous Works
Encoding spatial information:
- Post-processing (geometry verification): applied only to the top-ranked images; the searching step itself still uses BoW.
- Modeling the relationships between words (phrases): the feature dimension is exponential in the number of words per phrase (already huge for length-2 phrases), so previous works reduce the number of phrases:
  - Co-occurrences over the entire image [L. Torresani et al., CVPR 2009]: no spatial information.
  - Phrases in local neighborhoods [J. Yuan et al., CVPR07][Z. Wu et al., CVPR10][C. L. Zitnick, Tech. Report 07]: no long-range interactions, weak geometry.
  - Select a subset of phrases [J. Yuan et al., CVPR07]: discards a large portion of the phrases.
- Our work: all phrases, linear computation time.
Approach Overview
1. Similarity Measure: from BoW to BoP [Zhang and Chen, 09]
2. Large Scale Retrieval: Inverted Files and Min-hash, extended from BoW to BoP (this paper)
Co-occurring Phrases
Only the translation difference between the two images is considered.
[Figure: two images containing words A–F; pairs of words that keep the same relative spatial layout in both images form co-occurring length-2 phrases.]
[Zhang and Chen, 09]
Co-occurring Phrase Algorithm
[Figure: every corresponding word pair between images I and I' votes into an offset space indexed by (Δx, Δy) = (x − x', y − y'); pairs falling into the same offset bin share the same spatial layout. In the example, the number of co-occurring length-2 phrases is 5.]
[Zhang and Chen, 09]
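A minimal sketch of the offset-space counting for length-2 phrases (illustrative code with a hypothetical 16-pixel bin size; it assumes, as a later slide does, that each word occurs at most once per image):

```python
from collections import defaultdict
from math import comb

def count_cooccurring_length2(words_I, words_J, cell=16):
    """words_I / words_J: {word_id: (x, y)} for images I and I'.
    Returns the number of co-occurring length-2 phrases."""
    # 1. Vote every corresponding word pair into the offset space.
    offset_bins = defaultdict(int)
    for w, (x, y) in words_I.items():
        if w in words_J:
            xp, yp = words_J[w]
            key = ((x - xp) // cell, (y - yp) // cell)   # quantized (Δx, Δy)
            offset_bins[key] += 1
    # 2. A bin with m corresponding pairs yields C(m, 2) length-2 phrases.
    return sum(comb(m, 2) for m in offset_bins.values())
```

For instance, one offset bin holding 3 corresponding pairs plus two bins holding 2 pairs each would give 3 + 1 + 1 = 5 co-occurring length-2 phrases.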
Relation with the Feature Vector
$\langle \phi_k(x), \phi_k(y) \rangle$ = # of co-occurring length-$k$ phrases, i.e., the inner product of the (implicit) bag-of-phrases feature vectors.
Feature dimension: $O(|W|^k\,|X|^{k-1}\,|Y|^{k-1})$, where $|W|$ is the vocabulary size and $|X|$, $|Y|$ are the numbers of spatial offset bins in $x$ and $y$.
Computing the inner product via the offset space takes $O(M)$, the same as BoW!
$M$: # of corresponding word pairs; in practice, linear in the number of local features.
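Written out as a short derivation (my notation, not necessarily the paper's; the at-most-one-occurrence assumption mirrors the word assumption made later for the inverted index):

```latex
% phi_k(I): the implicit bag-of-phrases vector, one dimension per
% length-k phrase p = (k word IDs, their quantized relative offsets).
\[
  \phi_k(I)_p \;=\; \#\{\text{occurrences of phrase } p \text{ in } I\}
\]
% With at most one occurrence per phrase and image, the inner product
% counts exactly the phrases shared with the same spatial layout:
\[
  \langle \phi_k(I), \phi_k(I') \rangle
  \;=\; \sum_p \phi_k(I)_p \, \phi_k(I')_p
  \;=\; \#\{\text{co-occurring length-}k\text{ phrases}\}.
\]
% Although the vector has O(|W|^k |X|^{k-1} |Y|^{k-1}) dimensions, the
% count is read off the offset space in O(M) time, where M is the
% number of corresponding word pairs.
```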
Inverted Index with BoW
Avoids comparing the query against every image.
[Figure: the inverted index maps each visual word to the images that contain it; for every query word, each image on that word's list gets +1 in a score table over I1 … In, and images are ranked by their final score.]
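A minimal sketch of inverted-index scoring for BoW (illustrative only; weighting and normalization are omitted):

```python
from collections import defaultdict

def build_bow_index(database):
    """database: {image_id: iterable of visual-word IDs}.
    Returns an inverted index: word_id -> list of image IDs containing it."""
    index = defaultdict(list)
    for image_id, word_ids in database.items():
        for w in set(word_ids):           # one posting per (word, image)
            index[w].append(image_id)
    return index

def bow_scores(query_words, index):
    """Score table: +1 for every word shared between the query and an image."""
    scores = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, []):
            scores[image_id] += 1
    return scores                         # rank images by descending score
```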
Inverted Index with Word Location
Each entry in the inverted index stores the word's location in the image along with the image ID.
Assuming the same word occurs only once in an image, the memory usage is the same as for BoW.
Score table: instead of accumulating a single count per image (BoW), accumulate an offset space per image and compute the number of co-occurring phrases from it (BoP).
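A sketch of the location-augmented inverted index (illustrative; it assumes, as the slide does, that a word occurs at most once per image):

```python
from collections import defaultdict

def build_gvp_index(database):
    """database: {image_id: {word_id: (x, y)}} -- each word at most once per image.
    Returns: word_id -> list of (image_id, x, y).
    Storing one location per posting keeps memory comparable to the BoW index."""
    index = defaultdict(list)
    for image_id, words in database.items():
        for w, (x, y) in words.items():
            index[w].append((image_id, x, y))
    return index
```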
Inverted Files with Phrases
[Figure: for a query word w_i at location (x, y), the inverted index returns the images containing w_i together with the stored locations; each match adds +1 to the corresponding bin of that image's offset space. The final similarity score of each image is the number of co-occurring phrases computed from its offset space, and images are ranked by these final scores.]
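Combining the two sketches above, a query can be scored against all database images in one pass over the inverted index (again illustrative: length-2 phrases only, hypothetical bin size):

```python
from collections import defaultdict
from math import comb

def gvp_scores(query_words, index, cell=16):
    """query_words: {word_id: (x, y)} of the query image.
    index: word_id -> list of (image_id, x, y), as built above.
    Returns {image_id: # of co-occurring length-2 phrases}."""
    # One offset space (a dict of bins) per candidate image.
    offset_spaces = defaultdict(lambda: defaultdict(int))
    for w, (x, y) in query_words.items():
        for image_id, xp, yp in index.get(w, []):
            key = ((x - xp) // cell, (y - yp) // cell)
            offset_spaces[image_id][key] += 1          # +1 vote in that bin
    # Final score: co-occurring length-2 phrases per image.
    return {img: sum(comb(m, 2) for m in bins.values())
            for img, bins in offset_spaces.items()}
```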
Overview
[Diagram: both BoW and BoP can be combined with the two large-scale search schemes, Inverted Files and Min-hash; Min-hash has less storage and time complexity.]
Min-hash with BoW
[Figure: a min-hash function m_fi applied to the word sets of images I and I'.]
Probability of a min-hash collision (same word) = image similarity.
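A toy illustration of the min-hash idea on BoW word sets (my code; the salted-hash construction is a stand-in for random permutations):

```python
import random

def make_minhash_funcs(n, seed=0):
    """n independent min-hash functions: m_f(S) = min over w in S of hash_f(w)."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n)]
    return [lambda S, s=s: min(hash((s, w)) for w in S) for s in salts]

def estimated_similarity(words_I, words_J, hash_funcs):
    """The fraction of min-hash collisions estimates the word-set overlap
    (Jaccard similarity) between the two images."""
    hits = sum(m(words_I) == m(words_J) for m in hash_funcs)
    return hits / len(hash_funcs)
```

With more hash functions the collision rate converges to the word-set overlap, so images can be compared without storing full histograms.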
Min-hash with Phrases
[Figure: min-hash functions m_fi and m_fj applied to images I and I'; the locations of the collided words are compared in the offset space (Δx, Δy) = (x − x', y − y').]
Probability of k min-hash collisions with consistent geometry (details are in the paper).
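As a rough, illustrative interpretation only (the paper's actual estimator differs in its details): a group of k min-hash collisions is counted only if the collided words agree on their spatial offset.

```python
from collections import defaultdict
from math import comb

def consistent_collisions(words_I, loc_I, words_J, loc_J, hash_funcs,
                          k=2, cell=16):
    """Count geometry-consistent min-hash collisions: collided words are
    binned by their quantized offset between I and I', and k collisions
    in the same bin count as one consistent group (a sketch only)."""
    collided = set()
    for m in hash_funcs:
        wi = min(words_I, key=lambda w: m({w}))   # min-hash word of I
        wj = min(words_J, key=lambda w: m({w}))   # min-hash word of I'
        if wi == wj:
            collided.add(wi)
    bins = defaultdict(set)
    for w in collided:
        (x, y), (xp, yp) = loc_I[w], loc_J[w]
        bins[((x - xp) // cell, (y - yp) // cell)].add(w)
    return sum(comb(len(ws), k) for ws in bins.values())
```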
Other Invariances
Add a dimension to the offset space (at the cost of increased memory usage), e.g. for scale: vote with (x̂, ŷ, log ŝ), where x̂ = x − x', ŷ = y − y', and ŝ = s / s' for feature scales s in image I and s' in image I'. [Zhang and Chen, 10]
Variant matching: local histogram matching.
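For scale, the offset key from the earlier sketches simply gains a log-scale dimension (illustrative; the bin widths are my assumptions):

```python
import math

def offset_key_with_scale(loc, scale, loc_p, scale_p, cell=16, log_cell=0.5):
    """Quantized offset including the log-scale difference, so matches must
    also agree on relative scale; the extra dimension enlarges the offset
    space and hence the memory usage."""
    (x, y), (xp, yp) = loc, loc_p
    return ((x - xp) // cell,
            (y - yp) // cell,
            int(math.log(scale / scale_p) // log_cell))
```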
Evaluation
1. BoW + Inverted Index vs. BoP + Inverted Index
2. BoW + Min-hash vs. BoP + Min-hash
Post-processing methods (geometry verification) are complementary to our work.
Experiments – Inverted Index
• 5K Oxford dataset (55 queries) [Philbin, J. et al. 07]
• 1M Flickr distractors
[Figure: example precision-recall curves for BoW and BoP.]
BoP gives higher precision at lower recall.
Comparison
Mean average precision (mAP): the mean of the AP over the 55 queries.
[Plot: mAP (0.450–0.700) vs. vocabulary size (0–1000K) for BoW, BoW+RANSAC, BoP, and BoP+RANSAC.]
• BoP outperforms BoW (with similar computation).
• BoP outperforms BoW+RANSAC (which is 10 times slower when RANSAC is run on the 150 top images).
• The improvement is larger for smaller vocabulary sizes.
+Flickr 1M Dataset
[Plot: mAP (0.4–0.65) vs. the number of database images (in thousands, up to the 1M Flickr distractors) for BoW and BoP.]
Computational Complexity

Method        Memory   Quantization (s)   Search (s)
BoW           8.1G     0.137              0.89
BoP           8.5G     0.215              4.137
BoW+RANSAC    -        -                  0.89 + RANSAC (4s on the top 300 images)

Runtime is given in seconds per query.
Experiments – Min-hash
University of Kentucky dataset
[Plot: retrieval score (2.80–3.30) vs. the number of min-hash functions (200, 500, 800) for BoW and BoP.]
Min-hash with BoW: [O. Chum et al., BMVC08]
Conclusion
• Encodes more spatial information into the BoW representation
• Can be applied to all images in the database at the searching step
• Same computational complexity as BoW
• Better retrieval precision than BoW+RANSAC