Towards Bridging Semantic Gap and Intention Gap in Image Retrieval
Hanwang Zhang1, Zheng-Jun Zha2, Yang Yang1, Shuicheng Yan1, Yue Gao1, Tat-Seng Chua1
1: National University of Singapore
2: Institute of Intelligent Machines, Chinese Academy of Sciences
1/33

[Figure-only slide]
2/33

[Figure: semantic hierarchy linking low-level visual features to high-level semantics through an ontological layer of semantic concepts]
Semantic Gap. Bridged? No!
3/33

[Figure: gap between user intention and low-level visual features, with user feedback in between]
Intention Gap. Bridged? No!
4/33

[Figure-only slides]
5/33 6/33 7/33

General framework for Content-based Image Retrieval
[Framework figure, stages 1 and 2]
8/33

☞ 95,800 images are manually labeled with 33 attributes
☞ 2-26 attributes automatically discovered for each concept node
☞ 15-58 attributes per concept
9/33

☞ Attributes bridge the semantic gap
[Figure: concept and attribute layers, parts 1 and 2]
10/33

☞ A2SH gives attributes a well-defined, more informative meaning. Which "Wing"?
11/33

☞ A2SH bridges the intention gap
[Figure: feedback examples, 1: Leg, Skin; 2: Leg, Tail]
12/33

[Figure-only slides]
13/33 14/33 15/33

Concept classifier: predicts whether an image belongs to a concept c ∈ C
16/33

Concept classifier: predicts whether an image belongs to concept c, trained hierarchical one-vs-all
[Figure: positive (+) and negative (−) training examples drawn from the hierarchy around c]
☞ Exploit hierarchical relations
☞ Alleviate error propagation
17/33

Attribute classifier: predicts the presence of an attribute a of concept c
☞ Nameable attributes: human-nameable, learned by hierarchical supervised learning
☞ Unnameable attributes: not human-nameable, discovered by hierarchical unsupervised learning
☞ Together they offer a comprehensive description of the multiple facets of a concept
18/33

☞ Nameable attributes alone are not discriminative enough [Figure: Ear, Snout, Eye, Furry]
☞ Discover new attributes for concepts that share many nameable attributes
☞ 2-26 discovered attributes for each concept
D. Parikh, K. Grauman. "Interactively Building a Discriminative Vocabulary of Nameable Attributes", CVPR 2011.
19/33

Hierarchical Semantic Representation
☞ Concept classifiers: semantic path prediction
☞ Attribute classifiers: image representation along the semantic path
20/33

Images are represented by attributes in the context of concepts
Hierarchical semantic similarity (a minimal sketch follows slide 29 below)
21/33

Same concept close, different concepts far
22/33

Hierarchical Semantic Representation
☞ Concept classifiers: semantic path prediction
☞ Attribute classifiers: image representation along the semantic path
☞ Hierarchical Semantic Similarity Function: semantic similarity between images
23/33

[Figure-only slide]
24/33

Hierarchical semantic similarity
Candidate images are retrieved by semantic indexing (see the retrieval sketch at the end of this transcript)
[Figure: index over concept c, its children child(c), and image set I_c; candidate images]
25/33

☞ A2SH: our method
☞ hBilinear: retrieves images by a bilinear semantic metric (Deng et al., CVPR 2011)
☞ hPath: length (confidence) of the common semantic path of an image and the query
☞ hVisual: hPath + visual similarity
☞ fSemantic: flat semantic feature similarity
☞ fVisual: visual feature similarity
Training: 50%, Gallery: 50% (95,800 queries)
26/33

Method     fVisual      fSemantic    hVisual      hBilinear    A2SH
Time (ms)  1.18 x 10^4  3.62 x 10^3  7.42 x 10^2  4.47 x 10^2  70.6
27/33

[Figure: retrieval results of fVisual, hBilinear, and A2SH; matched vs. semantically similar images]
28/33

☞ Image-level Feedback
[Figure: query and image-level feedback examples]
29/33
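As a rough illustration of slides 16-21, here is a minimal sketch of semantic path prediction with hierarchical one-vs-all concept classifiers, attribute-based representation along the path, and a hierarchical similarity over the shared path. All names (ConceptNode, LinearScorer, predict_path, hierarchical_similarity) and the cosine-per-level scoring are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of A2SH-style semantic path prediction and hierarchical
# semantic similarity. Hypothetical names and scoring; the paper's
# classifiers and similarity function may differ.
import numpy as np

class LinearScorer:
    """Stand-in for a trained concept or attribute classifier."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
    def score(self, x):
        return float(self.w @ np.asarray(x, dtype=float))

class ConceptNode:
    def __init__(self, name, classifier, attribute_classifiers, children=()):
        self.name = name
        self.classifier = classifier                      # concept membership
        self.attribute_classifiers = list(attribute_classifiers)
        self.children = list(children)

def predict_path(root, x):
    """Walk the hierarchy top-down; at each node pick the child whose
    hierarchical one-vs-all classifier scores highest (slide 17)."""
    path, node = [root], root
    while node.children:
        node = max(node.children, key=lambda c: c.classifier.score(x))
        path.append(node)
    return path

def represent(path, x):
    """Represent an image by attribute responses in the context of each
    concept along its predicted semantic path (slides 18-21)."""
    return {n.name: np.array([a.score(x) for a in n.attribute_classifiers])
            for n in path}

def hierarchical_similarity(rep_q, rep_g, level_weights):
    """Compare attribute vectors on the shared portion of two semantic
    paths: deeper shared concepts pull same-concept images close and
    push different-concept images apart (slides 21-22)."""
    sim = 0.0
    for name, w in level_weights.items():
        if name in rep_q and name in rep_g:
            q, g = rep_q[name], rep_g[name]
            denom = np.linalg.norm(q) * np.linalg.norm(g) + 1e-12
            sim += w * float(q @ g) / denom
    return sim
```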
☞ Attribute-level Feedback
[Figure: query with attribute feedback, e.g., Leg, Cloth]
Zhang et al. "Attribute Feedback", ACM MM 2012.
30/33

MAP@20 at a fixed feedback time of 2 minutes:
Method  A2SH  HF    QPM   SVM
MAP@20  0.25  0.22  0.21  0.21
31/33

[Figure: initial results vs. feedback results of QPM, HF, and A2SH; matched vs. semantically similar images]
32/33

☞ Attribute-augmented Semantic Hierarchy (a semantic hierarchy with attributes)
☞ A framework for CBIR
☞ Effectiveness verified: semantic and intention gaps bridged, 1.23M images
33/33

Backup: experimental setup
[Figure: confusion matrices for the selected base concepts under "mammal"]
☞ Only leaves have images; each concept's images are merged bottom-up
☞ 50%/50% split into training and testing (gallery)
☞ 100 random images per leaf from the testing set are used as queries
☞ 100 random images from each leaf's training images are annotated with attributes
☞ Features: color, texture, edge, and multi-scale dense SIFT; LLC encoding with max-pooling over a 2-level spatial pyramid; 35,903-d feature vector
[Figure: reported values 0.93, 0.92, 0.77]
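To illustrate the candidate retrieval by semantic indexing on slide 25, here is a minimal sketch using a plain inverted index from concept name to image ids, answered from the deepest predicted concept and widened to ancestors. The index layout, the back-off behavior, and all names (SemanticIndex, candidates) are assumptions for illustration, not the paper's data structure.

```python
# Sketch of candidate retrieval via semantic indexing (slide 25).
# The index maps each concept c to its image set I_c; because images
# are indexed under every concept on their semantic path, I_c also
# covers the images of child(c). Hypothetical layout.
from collections import defaultdict

class SemanticIndex:
    def __init__(self):
        self.index = defaultdict(set)   # concept name -> image ids (I_c)

    def add(self, image_id, path_names):
        # Index an image under every concept on its predicted path.
        for name in path_names:
            self.index[name].add(image_id)

    def candidates(self, query_path_names, min_candidates=100):
        # Start at the deepest predicted concept; back off to ancestors
        # until the candidate pool is large enough.
        for name in reversed(query_path_names):
            pool = self.index.get(name, set())
            if len(pool) >= min_candidates:
                return pool
        return self.index.get(query_path_names[0], set())

# Usage: index gallery images by their predicted semantic paths, then
# rank only the returned candidates with the hierarchical similarity
# sketched after slide 29, instead of scanning the whole gallery.
idx = SemanticIndex()
idx.add("img_1", ["animal", "mammal", "dog"])
idx.add("img_2", ["animal", "mammal", "cat"])
print(idx.candidates(["animal", "mammal", "dog"], min_candidates=1))
```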