Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos

Motivation
• Digital cameras and mobile phone cameras are spreading rapidly:
– More and more personal photos;
– Retrieving images from enormous collections of personal photos becomes a more and more important topic.
• How to retrieve them?

Prior Work: CBIR
• Content-Based Image Retrieval (CBIR):
– Users provide images as queries to retrieve personal photos.
• The paramount challenge -- the semantic gap:
– The gap between the low-level visual features and the high-level semantic concepts.
[Figure: a query image with a high-level concept is reduced to a low-level feature vector and compared against the feature vectors in the database; the semantic gap lies between the two levels.]

Prior Work: Image Annotation
• Image annotation is used to classify images w.r.t. high-level semantic concepts:
– Semantic concepts are analogous to the textual terms describing document contents.
• It is more convenient and desirable for the user to retrieve personal photos using textual queries.
• Annotation is an intermediate stage for textual query based image retrieval.
[Figure: database images are annotated with high-level concepts; at query time a textual query such as "Sunset" is compared against the annotation results to retrieve matching photos.]

Idea
• Leverage information from web images to retrieve consumer photos in a personal photo collection:
– Web images are accompanied by tags, categories, and titles; Google and Flickr exploit them to index web images;
– Raw consumer photos from digital cameras do not contain such semantic textual descriptions.
• No intermediate image annotation process.
[Figure: web images carry contextual information such as "building, people, family", "people, wedding", and "sunset"; consumer photos carry no such information.]

Framework
[Figure: a textual query, expanded via WordNet, drives automatic web image retrieval over a large collection of web images (with descriptive words); the resulting relevant/irrelevant images train a classifier, which ranks the raw consumer photos; relevance feedback refines the top-ranked consumer photos.]
• When a user provides a textual query, it is used to find relevant and irrelevant images in web image collections.
• Then, a classifier is trained based on these web images.
• Consumer photos can be ranked by the classifier's decision value.
• The user can also give relevance feedback to refine the retrieval results.

Automatic Web Image Retrieval
• For the user's textual query, first search for the query word in the semantic word trees built from WordNet.
• Web images containing the query word or its descendants within two levels are considered "relevant web images".
• Web images which do not contain these words are considered "irrelevant web images".
[Figure: an inverted file maps "boat" and its descendants (ark, barge, dredger, houseboat, ...) to the relevant web images; the remaining images are irrelevant.]

Classifier Training
[Figure: relevant web images plus sampled irrelevant web images form training sets (sample1, sample2, ...); decision stumps (ds) trained on each set are combined into the classifier f^s(x).]
• Construct 100 smaller training sets:
– Positive samples: the relevant web images;
– Negative samples: randomly sample a fixed number of irrelevant web images, repeated 100 times.
• Based on each training set, train decision stumps on each dimension.
• Finally, linearly combine all decision stumps based on their training errors (a code sketch follows after the next slide).

Relevance Feedback via Cross-Domain Regularized Regression
• Design a target linear classifier f^T(x) = w^T x such that:
– For user-labeled images x_1, ..., x_l, f^T(x) is close to +1 (labeled as positive) or -1 (labeled as negative);
– For the other images, f^T(x) is close to the source classifier's output f^s(x);
– A regularizer controls the complexity of the target classifier f^T(x).
• This problem can be solved with a least-squares solver, as sketched below.
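Concretely, the three requirements above can be combined into one regularized least-squares objective (a plausible reconstruction; the trade-off weights lam_u and lam_w are assumptions, since the slide only names the three terms):

min_w  sum_{i=1..l} (w^T x_i - y_i)^2 + lam_u * sum_j (w^T x_j - f^s(x_j))^2 + lam_w * ||w||^2

Setting the gradient to zero yields the normal equations, which a least-squares solver handles in closed form. A minimal numpy sketch:

```python
import numpy as np

def solve_target_classifier(X_l, y_l, X_u, fs_u, lam_u=0.5, lam_w=1.0):
    """Closed-form solve for w in the target classifier f^T(x) = w^T x.

    X_l : (l, D) user-labeled images, with labels y_l in {-1, +1}
    X_u : (u, D) the other (unlabeled) images
    fs_u: (u,)   source-classifier decision values f^s(x_j)
    lam_u, lam_w: assumed trade-off weights (not given on the slide)
    """
    D = X_l.shape[1]
    # Normal equations of the combined least-squares objective
    A = X_l.T @ X_l + lam_u * (X_u.T @ X_u) + lam_w * np.eye(D)
    b = X_l.T @ y_l + lam_u * (X_u.T @ fs_u)
    return np.linalg.solve(A, b)
```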
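Stepping back to "Classifier Training" above (and anticipating the next slide, which examines the source classifiers in detail), here is a minimal sketch of the decision-stump ensemble, assuming the fusion f_d(x) = sum_i gamma_id * h(s_id * (x_d - theta_id)) with a sigmoid h. The error-based weighting rule gamma = max(0, 0.5 - err) is an assumption; the slides only say the stumps are combined "based on their training errors".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_stump(x, y):
    """Fit one decision stump on a single feature dimension: find the
    threshold theta and polarity s in {-1, +1} that minimize the
    training error of sign(s * (x - theta))."""
    best_theta, best_s, best_err = 0.0, 1.0, np.inf
    for theta in np.unique(x):
        for s in (1.0, -1.0):
            pred = np.where(s * (x - theta) >= 0, 1.0, -1.0)
            err = np.mean(pred != y)
            if err < best_err:
                best_theta, best_s, best_err = theta, s, err
    return best_theta, best_s, best_err

def train_bag(X, y):
    """Train one stump per dimension on one bag (one of the 100 sampled
    training sets) and weight it by its training error."""
    stumps = []
    for d in range(X.shape[1]):
        theta, s, err = train_stump(X[:, d], y)
        gamma = max(1e-6, 0.5 - err)   # lower error -> larger weight (assumed rule)
        stumps.append((d, theta, s, gamma))
    return stumps

def decision_value(all_bags, x):
    """f^s(x): fuse every stump's signed margin through the sigmoid,
    summed over all bags and dimensions."""
    return sum(g * sigmoid(s * (x[d] - theta))
               for bag in all_bags
               for d, theta, s, g in bag)
```

One bag's stumps come from one of the 100 sampled training sets; f^s(x) sums the fused values over all bags.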
Source Classifiers
• Decision stump ensemble:
– Trained on each dimension for each bag;
– Decision values are fused after a sigmoid mapping h: f_d(x) = sum_i gamma_id * h(s_id * (x_d - theta_id));
– Pros:
• Non-linear;
• Easy to parallelize.
– Cons:
• Testing is time-consuming.

Accelerating Source Classifiers
• One possible solution: remove the sigmoid mapping:
– f_d(x) = sum_i gamma_id * s_id * (x_d - theta_id) = (sum_i gamma_id * s_id) * x_d - (sum_i gamma_id * s_id * theta_id);
– Assuming there are N bags and D dimensions, the testing complexity drops from O(ND) to O(D) (see the precomputation sketch at the end);
– Cons: the classifier becomes linear -- too weak.

Accelerating Source Classifiers (cont.)
• Another possible solution: use linear SVMs instead of the decision stump ensemble:
– Train one linear SVM classifier for each bag;
– Fuse the decision values with a sigmoid mapping.
• Pros:
– Fewer bags are likely to suffice for a satisfying retrieval precision;
– Although the testing complexity is still O(ND), there are far fewer "exp" function calls (ND -> N);
– Each individual classifier is just a vector dot product, which can be computed efficiently with SIMD instructions.

Comparison on Time Cost
[Figure: two slides of charts comparing the time cost of the original and accelerated source classifiers.]

Performance Comparison
[Figure: chart comparing retrieval performance.]

Relevance Feedback
• During relevance feedback, user-labeled positive images are assigned the target value +1 and negative images -0.1.

Error Rate Refinement during RF
• Assume there are M training data, of which E instances are incorrectly classified:
– err_rate = E / M.
• For a source classifier f^s(x), when the user labels one instance x as y in {-1, +1} (alpha is the weight given to the newly labeled instance):
– If f^s(x) = y (the classifier agrees with the user), then err_rate = E / (M + alpha);
– If f^s(x) = -y (it contradicts the user), then err_rate = (E + alpha) / (M + alpha).
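A minimal sketch of this update; treating the classifier's prediction as the sign of its decision value and defaulting alpha to 1 are assumptions (the slides leave both unspecified):

```python
import numpy as np

def refine_err_rate(E, M, fs_value, y, alpha=1.0):
    """Update a source classifier's error-rate estimate after the user
    labels instance x as y in {-1, +1}.

    E, M  : current error count and training-data count
    alpha : weight of the newly labeled instance (assumed default)
    """
    if np.sign(fs_value) == y:   # the classifier agrees with the user
        return E / (M + alpha)
    else:                        # the classifier contradicts the user
        return (E + alpha) / (M + alpha)
```

Classifiers that keep agreeing with the feedback see their err_rate shrink, presumably so that they gain weight when the decision values are fused.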
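Finally, returning to the first acceleration option above (removing the sigmoid mapping): the per-dimension sums collapse into one slope vector and one intercept that can be precomputed once, which is where the O(ND) -> O(D) testing cost comes from. A sketch, assuming gamma, s, and theta are stored as (N, D) arrays (the layout is an assumption):

```python
import numpy as np

def collapse_stumps(gamma, s, theta):
    """Precompute the sigmoid-free stump ensemble as one linear function:
    f(x) = sum_d [(sum_i g_id s_id) x_d - (sum_i g_id s_id th_id)]
         = a . x + b
    gamma, s, theta: (N, D) arrays over N bags and D dimensions."""
    a = (gamma * s).sum(axis=0)       # per-dimension slope, shape (D,)
    b = -(gamma * s * theta).sum()    # scalar intercept
    return a, b

def fast_decision(a, b, x):
    # One dot product per test image: O(D) instead of O(N*D).
    return a @ x + b
```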