Contextual Image Search
Wenhao Lu, Jingdong Wang, Xian-Sheng Hua, Shengjin Wang, Shipeng Li
Tsinghua University, Beijing, P. R. China; Microsoft Research Asia, Beijing, P. R. China
MM 2011

Outline
System overview
Database construction
Contextual image search with text/image input
Experiment
Future work

System Overview
Text input (figure)
Image input (figure)

Database Construction
1. Feature extraction: MSER (Maximally Stable Extremal Regions) extracts stable regions from the image by considering how the area of a connected component changes with respect to changes in intensity.
2. SIFT descriptor

Contextual Image Search with Text Input
1. Context
   Textual contexts: page title / document title, local context
   Visual contexts: vision-based page segmentation algorithm (VIPS)
   VIPS vs. the traditional DOM tree: VIPS combines the DOM tree with visual information, using
      Tag cue: <HR>
      Color cue: background color
      Text cue
      Size cue
2. Contextual Query Augmentation
   Goal: remove possible ambiguities in the query
   Augmented query = query + textual context
   For each candidate augmented query, evaluate the relevance between the context and the augmented query with Okapi BM25.
   Okapi BM25 is computed on an extended context (using synonyms, stemming, and so on), with k = 2.0 and b = 0.75.
3. Image Search by Text
   The rank score includes a static score (e.g., of the Web page holding the image).

Contextual Reranking
Textually contextual reranking: words related to the augmented query are discarded.
Visually contextual reranking:
1. Filter out images whose semantic content may not be relevant to the query.
   (Relevance is computed between the image's local textual context and the query.)
2. Visual word weighting: find common patterns.
3. Compute the similarity between the visual contexts and an image, using the visual-word histogram vectors of i and k.

Overall Ranking
The combination weights are 0.2, 0.2, and 1.

Contextual Image Search with Image Input
1. Search to annotation: discovers candidate textual queries using the technique from "Annotating Images by Mining Image Search Results" (IEEE, 2008).
   First, find similar (duplicate) images.
   Second, mine the surrounding texts of the obtained duplicate images to get a list of candidate textual queries.
   Figure: visual features and semantic features.
2. Contextual query identification: compute a score for each candidate textual query against the context.

Experiment
Data: 15,000,000 images and their associated Web pages
Evaluation: 5 users, judgment levels 0 to 3
Figure: nDCG curves
Visual results for text input
Visual results for text input (textual reranking)
Visual results for text input (visual reranking)
Visual results for image input: textual query "Van Gogh"

Future Work
1. More general contextual image search, including mobile image search with wider contexts (e.g., position, time, and history).
2. Extend contextual image search to contextual video search by applying the proposed methodology and investigating additional video contexts.
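Contextual Query Augmentation scores each candidate augmented query against the page context with Okapi BM25 (k = 2.0, b = 0.75). The following is a minimal sketch of the standard BM25 formula, treating the augmented query as the query and a context as the document. The smoothed IDF variant, the toy contexts, the "jaguar" example, and the function name `bm25_score` are assumptions for illustration. This is not the authors' implementation, and it omits their synonym and stemming extension of the context.

```python
import math
from collections import Counter
from typing import List, Sequence

def bm25_score(query_terms: Sequence[str], doc_terms: Sequence[str],
               corpus: List[Sequence[str]], k1: float = 2.0, b: float = 0.75) -> float:
    """Standard Okapi BM25 score of one document (here: a context) for a query.

    k1 = 2.0 and b = 0.75 follow the slide. The smoothed IDF
    log(1 + (N - df + 0.5) / (df + 0.5)) is one common variant and is an
    assumption; it keeps every term's contribution non-negative.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs   # average context length
    tf = Counter(doc_terms)                         # term frequencies in this context
    score = 0.0
    for term in set(query_terms):                   # each distinct query term once
        df = sum(1 for d in corpus if term in d)    # number of contexts containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        f = tf[term]
        norm = f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * f * (k1 + 1) / norm
    return score

# Hypothetical example: the ambiguous query "jaguar" on a page about wildlife.
contexts = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "cat", "jungle", "rainforest"],
    ["apple", "fruit", "orchard"],
]
page_context = contexts[1]
candidates = [["jaguar", "car"], ["jaguar", "cat"]]  # hypothetical candidate augmented queries
best = max(candidates, key=lambda q: bm25_score(q, page_context, contexts))
print(best)  # ['jaguar', 'cat']
```

Both candidates share "jaguar". Only "cat" also occurs in the page context, so it adds a positive term and wins. This is the disambiguation effect the augmentation step aims for.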
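Visually contextual reranking weights visual words by common patterns and then compares an image's visual-word histogram with those of the visual contexts. The slides do not give the weighting or similarity formulas, so this sketch makes two labeled assumptions. First, a word's weight is the fraction of context images in which it occurs. Second, similarity is a weighted histogram intersection averaged over the context images. All function names are hypothetical.

```python
from typing import List, Sequence

def normalize(hist: Sequence[float]) -> List[float]:
    """L1-normalize a visual-word histogram (all-zero histograms are returned unchanged)."""
    total = sum(hist)
    return [x / total for x in hist] if total else list(hist)

def visual_word_weights(context_hists: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weighting: fraction of visual-context images in which each visual word occurs,
    so words shared by many context images (a common pattern) weigh more."""
    n_ctx = len(context_hists)
    n_words = len(context_hists[0])
    return [sum(1 for h in context_hists if h[w] > 0) / n_ctx for w in range(n_words)]

def weighted_intersection(h_i: Sequence[float], h_k: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Weighted histogram intersection between histogram vectors h_i and h_k."""
    return sum(w * min(a, b) for w, a, b in zip(weights, h_i, h_k))

def visual_context_score(image_hist: Sequence[float],
                         context_hists: Sequence[Sequence[float]]) -> float:
    """Average weighted similarity between one image and all visual-context images.
    context_hists must be non-empty."""
    ctx = [normalize(h) for h in context_hists]
    weights = visual_word_weights(ctx)
    h_i = normalize(image_hist)
    return sum(weighted_intersection(h_i, h_k, weights) for h_k in ctx) / len(ctx)

# Hypothetical 4-word vocabulary: three context images share words 0 and 1.
context_hists = [[3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 0, 1]]
score_a = visual_context_score([2, 1, 0, 0], context_hists)  # shares the common pattern
score_b = visual_context_score([0, 0, 5, 1], context_hists)  # mostly unrelated words
print(score_a > score_b)  # True
```

Normalizing before the intersection keeps scores comparable across images with different numbers of MSER regions. Any other weighted similarity (e.g., cosine) could be swapped in at the same place.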
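The experiment reports nDCG curves over graded judgments at levels 0 to 3. The slides do not state which nDCG variant was used. This sketch assumes the common exponential-gain form, DCG@k = sum over ranks i of (2^rel_i - 1) / log2(i + 1). It takes the ideal ordering from the same judged list. Averaging over the 5 users and over queries is not shown, and the example judgments are hypothetical.

```python
import math
from typing import Sequence

def dcg_at_k(rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with gain 2^rel - 1 and a log2(rank + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels: Sequence[int], k: int) -> float:
    """nDCG@k; the ideal ordering is taken from the same judged list (a simplification)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments (levels 0 to 3) for the top five results of one query.
ranked = [3, 2, 3, 0, 1]
for k in range(1, 6):
    print(k, round(ndcg_at_k(ranked, k), 3))
```

Plotting `ndcg_at_k` against k for each method gives curves of the kind shown on the experiment slide.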