Towards Bridging Semantic Gap and Intention Gap in Image Retrieval

Hanwang Zhang1, Zheng-Jun Zha2, Yang Yang1, Shuicheng Yan1, Yue Gao1, Tat-Seng Chua1
1: National University of Singapore
2: Institute of Intelligent Machines, Chinese Academy of Sciences
1/33
2/33
[Figure: a semantic hierarchy of ontological concepts connects low-level visual features to high-level semantics.]
Semantic Gap Bridged? No!
3/33
[Figure: user feedback on low-level visual features attempts to capture user intention.]
Intention Gap Bridged? No!
4/33
5/33
6/33
7/33
[Figure: general framework for Content-based Image Retrieval.]
8/33
☞ 95,800 images are manually labeled with 33 attributes
☞ 2 to 26 attributes are automatically discovered for each concept node
☞ 15 to 58 attributes per concept
9/33
☞ Attributes bridge the semantic gap
[Figure: (1) concept level and (2) attribute level of the hierarchy.]
10/33
☞ A2SH well defines attributes → more informative
Which “Wing”?
11/33
☞ A2SH bridges the intention gap
[Figure: attribute feedback examples, e.g. (1) "Leg", "Skin"; (2) "Leg", "Tail".]
12/33
13/33
14/33
15/33
☞ Concept classifier: predicts whether an image belongs to concept c
16/33
☞ Concept classifier: predicts whether an image belongs to concept c
☞ Hierarchical one vs. all
[Figure: positive (+) and negative (−) training examples drawn from the hierarchy.]
☞ Exploit hierarchical relation
☞ Alleviate error propagation
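The hierarchical one-vs-all idea on this slide can be sketched as a greedy descent of the concept tree: at each node only the children's classifiers compete, which is what limits error propagation relative to a flat classifier. The tree and the toy template "classifiers" below are illustrative assumptions, not the paper's learned models.

```python
# Sketch of hierarchical one-vs-all semantic path prediction (assumed
# reading of the slide): descend from the root, at each node picking
# the child whose classifier scores highest on image feature x.

def predict_semantic_path(x, tree, scores, root="entity"):
    """Return the list of concepts from the root to a leaf."""
    path = [root]
    node = root
    while tree.get(node):                     # stop at a leaf concept
        node = max(tree[node], key=lambda c: scores[c](x))
        path.append(node)
    return path

# Toy hierarchy and hand-made "classifiers" (dot products with a
# template vector), purely for illustration.
tree = {"entity": ["animal", "vehicle"],
        "animal": ["dog", "bird"],
        "vehicle": ["car"]}
templates = {"animal": [1, 0, 0], "vehicle": [-1, 0, 0],
             "dog": [0, 1, 0], "bird": [0, -1, 0], "car": [0, 0, 1]}
scores = {c: (lambda x, t=t: sum(a * b for a, b in zip(x, t)))
          for c, t in templates.items()}

print(predict_semantic_path([0.9, 0.8, 0.0], tree, scores))
# -> ['entity', 'animal', 'dog']
```

Because only siblings are compared at each step, an image is never scored against concepts from unrelated subtrees, which is the hierarchical relation the slide refers to.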
17/33
☞ Attribute classifier: predicts the presence of an attribute a of concept c
☞ Nameable attributes: human nameable, hierarchical supervised learning
☞ Unnameable attributes: human unnameable, hierarchical unsupervised learning
☞ Together they offer a comprehensive description of the multiple facets of a concept
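Structurally, this means each concept keeps its own bank of per-attribute binary classifiers, and an image routed to a concept is described by that concept's attributes. A minimal sketch of that layout follows; the thresholded checks stand in for the learned classifiers and are purely hypothetical.

```python
# Sketch of per-concept attribute prediction: classifiers[c] maps each
# attribute of concept c to a scoring function, so the attribute
# description of an image depends on which concept it was routed to.

def attribute_vector(x, concept, classifiers):
    """Return {attribute: presence score} for the given concept."""
    return {a: score(x) for a, score in classifiers[concept].items()}

# Illustrative stand-ins: "dog" checks two feature dimensions of x.
classifiers = {"dog": {
    "leg":   lambda x: 1.0 if x[0] > 0.5 else 0.0,
    "furry": lambda x: 1.0 if x[1] > 0.5 else 0.0,
}}
print(attribute_vector([0.9, 0.2], "dog", classifiers))
# -> {'leg': 1.0, 'furry': 0.0}
```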
18/33
☞ Nameable attributes (e.g., "Ear", "Snout", "Eye", "Furry") are not discriminative enough.
☞ Discover new attributes for concepts that share many nameable attributes.
☞ 2 to 26 discovered attributes per concept.
D. Parikh, K. Grauman. "Interactively Building a Discriminative Vocabulary of Nameable Attributes", CVPR 2011.
19/33
☞ Concept classifiers → semantic path prediction
☞ Attribute classifiers → image representation along the semantic path
Hierarchical Semantic Representation
20/33
Images are represented by attributes in the context of concepts
Hierarchical semantic similarity
21/33
Same concept → close, different concepts → far
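One way to realize this "same concept → close" behavior is to accumulate attribute similarity only over the shared prefix of two images' semantic paths; the deeper the paths agree, the more terms contribute. This is a minimal sketch under that assumption (the paper's exact weighting and normalization differ), with toy attribute vectors.

```python
# Sketch of a hierarchical semantic similarity: images are compared by
# their attribute vectors at each concept their semantic paths share.

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def hierarchical_similarity(path1, attrs1, path2, attrs2):
    """attrs{1,2}: dict concept -> attribute vector of that image.
    Only the shared prefix of the two paths contributes."""
    sim = 0.0
    for c1, c2 in zip(path1, path2):
        if c1 != c2:              # paths diverge: stop accumulating
            break
        sim += cosine(attrs1.get(c1, []), attrs2.get(c2, []))
    return sim

a1 = {"animal": [1, 0], "dog": [0.9, 0.1]}
a2 = {"animal": [1, 0], "dog": [0.8, 0.2]}
a3 = {"vehicle": [0, 1], "car": [0.5, 0.5]}
same = hierarchical_similarity(["animal", "dog"], a1, ["animal", "dog"], a2)
diff = hierarchical_similarity(["animal", "dog"], a1, ["vehicle", "car"], a3)
print(same > diff)  # same concept -> close, different concepts -> far
```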
22/33
☞ Concept classifiers → semantic path prediction
☞ Attribute classifiers → image representation along the semantic path
Hierarchical Semantic Representation
☞ Hierarchical Semantic Similarity Function → semantic similarity between images
23/33
24/33
Hierarchical semantic similarity
Candidate images are retrieved by semantic indexing
[Figure: inverted index from concept c and its children child(c) to the image sets Ic of candidate images.]
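The c → child(c) → Ic diagram suggests an inverted index keyed by concept: each concept stores the ids of images whose semantic path passes through it, and candidates are fetched from the deepest concept on the query's path, backing off to the parent when too few are found. The back-off threshold below is an illustrative assumption.

```python
# Sketch of concept-based semantic indexing for candidate retrieval.
from collections import defaultdict

def build_index(paths):
    """paths: image id -> semantic path; index: concept -> image ids."""
    index = defaultdict(set)
    for img, path in paths.items():
        for concept in path:
            index[concept].add(img)
    return index

def candidates(index, query_path, min_hits=2):
    """Walk the query path from deepest concept up until enough
    candidate images are collected."""
    for concept in reversed(query_path):
        hits = index.get(concept, set())
        if len(hits) >= min_hits:
            return hits
    return set()

paths = {"i1": ["animal", "dog"], "i2": ["animal", "dog"],
         "i3": ["animal", "cat"], "i4": ["vehicle", "car"]}
index = build_index(paths)
print(sorted(candidates(index, ["animal", "dog"])))
# -> ['i1', 'i2']
print(sorted(candidates(index, ["animal", "cat"])))
# -> ['i1', 'i2', 'i3']  (backs off to "animal")
```

Restricting the gallery to these candidates is what makes the full hierarchical similarity affordable at query time.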
25/33
☞ A2SH: our method
☞ hBilinear: retrieves images by bilinear semantic metric (Deng et al., CVPR 2011)
☞ hPath: length (confidence) of the common semantic path of an image and the query
☞ hVisual: hPath + visual similarity
☞ fSemantic: flat semantic feature similarity
☞ fVisual: visual feature similarity
Training: 50%, Gallery: 50% (95,800 queries)
26/33
Method      Time (ms)
fVisual     1.18 × 10^4
fSemantic   3.62 × 10^3
hVisual     7.42 × 10^2
hBilinear   4.47 × 10^2
A2SH        70.6
27/33
[Figure: top retrieved images for fVisual, hBilinear, and A2SH; matched and semantically similar results are marked.]
28/33
☞ Image-level Feedback
[Figure: query and results refined by image-level feedback.]
29/33
☞ Attribute-level Feedback
[Figure: query with attribute feedback on "Leg" and "Cloth".]
Zhang et al. "Attribute Feedback", MM 2012.
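Attribute-level feedback can be sketched as a re-ranking step: the user marks attributes as wanted or unwanted, and each candidate's score is adjusted by its predicted attribute presence. This is a simplified reading of the slide, not the paper's exact update rule; the weight `w` and the toy scores are assumptions.

```python
# Sketch of attribute-level feedback re-ranking: boost candidates that
# show wanted attributes, penalize those showing unwanted ones.

def rerank(results, attr_scores, liked, disliked, w=1.0):
    """results: list of (image, base_score);
    attr_scores[img][a]: predicted presence of attribute a in [0, 1]."""
    def adjust(img, score):
        bonus = sum(attr_scores[img].get(a, 0.0) for a in liked)
        penalty = sum(attr_scores[img].get(a, 0.0) for a in disliked)
        return score + w * (bonus - penalty)
    return sorted(((i, adjust(i, s)) for i, s in results),
                  key=lambda t: t[1], reverse=True)

# Toy example mirroring the slide: the user wants "Leg" but not "Cloth".
attr = {"i1": {"Leg": 0.9, "Cloth": 0.8},
        "i2": {"Leg": 0.8, "Cloth": 0.1}}
out = rerank([("i1", 0.6), ("i2", 0.5)], attr,
             liked={"Leg"}, disliked={"Cloth"})
print([i for i, _ in out])
# -> ['i2', 'i1']
```

Unlike image-level feedback, a single attribute judgment constrains every candidate at once, which is why it converges within the fixed 2-minute budget used in the evaluation.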
30/33
Method   MAP@20 (fixed time: 2 min)
A2SH     0.25
HF       0.22
QPM      0.21
SVM      0.21
31/33
[Figure: initial results and feedback-refined results for QPM, HF, and A2SH; matched and semantically similar images are marked.]
32/33
☞ Attribute-augmented Semantic Hierarchy (A2SH): a semantic hierarchy with attributes
☞ A general framework for CBIR
☞ Effectiveness verified on 1.23 M images
☞ Semantic and intention gaps bridged
33/33
[Figure: confusion matrices ("selected" vs. "base") on the "mammal" subtree.]
☞ Only leaves have images; each concept's images are merged bottom-up
☞ 50% / 50% split into training and testing (gallery)
☞ 100 random images per leaf from testing are used as queries
☞ 100 random images from each leaf's training images are annotated with attributes
☞ Features: color, texture, edge, and multi-scale dense SIFT; LLC coding with max-pooling over a 2-level spatial pyramid; 35,903-d feature vector