Multimodal visual resource discovery
Stefan Rüger
http://mmir.doc.ic.ac.uk
© Imperial College London

Overview
• Multimedia content management
• Semantic gap and automated annotation
• Polysemy and relevance feedback
• Content-based visual search
• Video summarisation
• Lateral browsing

Research: Multimedia content management
"Information is of no use unless you can actually access it." [from trec.nist.gov]

Multimedia Content Management
• Archive
  – text, video, images, speech, music, web content, …
• Query
  – text, stills, sketch, speech, humming, examples, …
  – content-based
• Value-added services
  – browsing, summaries, story boards
  – document clustering, cluster summaries
• Putting the user in the loop

New types of search engines
[Diagram: query modes (text, stills, sketch, speech, sound, humming, motion) matched against document types (text, video, images, speech, music, sketches, multimedia). Examples: conventional text retrieval is to type "floods" and get BBC radio news; roar and get a wildlife documentary; hum a tune and get a music piece.]

Some applications
• Medicine
  – get diagnoses of cases with similar scans
• Law enforcement
  – copyright infringement
  – CCTV video retrieval
• Digital libraries
  – novel resource discovery methods
• Industrial applications
  – multimedia knowledge management

Semantic gap and automated annotation

The semantic gap
What an image "is" depends on the level of description:
1. 120,000 pixels with a particular spatial colour distribution
2. faces (Caucasian & African) and a vase-like object
3. victory, triumph, …
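Of the three levels above, only the first can be computed directly from pixel data. A minimal sketch of such a level-1 description (pure Python, with a hypothetical toy image; real systems quantise HSV rather than raw RGB):

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Level-1 description: quantise each RGB pixel into one of
    bins**3 colour cells and return the normalised cell counts."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step)
        for row in pixels for (r, g, b) in row
    )
    total = sum(counts.values())
    # normalise so the histogram sums to 1
    return {cell: n / total for cell, n in counts.items()}

# A toy 2x2 "image": two reddish pixels, one green, one blue
image = [[(250, 10, 10), (240, 20, 5)],
         [(10, 250, 10), (10, 10, 250)]]
hist = colour_histogram(image)
print(hist[(3, 0, 0)])  # → 0.5 (the red cell holds half the mass)
```

Levels 2 and 3, objects and abstract concepts, cannot be read off the pixels like this; they require trained models, which is what automated annotation addresses.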
Automated image annotation as a machine translation problem
[Diagram: image regions (water, grass, trees) are aligned with annotation words in the same way that "the beautiful sun" is aligned with "le soleil beau" in statistical machine translation.]

Automated image annotation as a machine learning problem
• Training set, e.g., 32,000 images, each annotated with keywords such as
  animal, creature, face, grass, nature, outdoors, outside, portrait, teeth, tiger, vertical, water, wild, wildlife
• Probabilistic models:
  – maximum entropy models
  – models for joint and conditional probabilities
  – evidence combination with Support Vector Machines
[PhD work in progress: A Yavlinsky, Nov 2003–, partly ORS funded]
[PhD work in progress: J Magalhães, Oct 2004–, funded by FCT]
[with Magalhães, SIGIR 2005]
[with Yavlinsky and Schofield, CIVR 2005]
[with Yavlinsky, Heesch and Pickering, ICASSP May 2004]

A simple Bayesian classifier
• use training data: images J with annotations w
• P(w|I) is the probability of word w given an unseen image I
• the model is estimated from the empirical distribution of (w, J) pairs

Automated image annotation: an example
Automated annotation: water, buildings, city, sunset, aerial

Bridging the semantic gap
• Region classification (grass, water, sky, ...)
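A minimal sketch of an empirical Bayesian annotation model in the spirit of the classifier above (hypothetical toy data; the real models use colour and texture features and more careful density estimates): score each word w by the kernel-weighted share of training images that carry w.

```python
import math
from collections import defaultdict

def annotate(features_I, training, bandwidth=1.0, top=3):
    """Empirical annotation model: P(w|I) is taken proportional to the
    kernel-weighted count of training images J annotated with word w.
    `training` is a list of (feature_vector, set_of_words) pairs."""
    def kernel(x, y):
        # Gaussian kernel on the Euclidean distance between feature vectors
        d2 = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-d2 / (2 * bandwidth ** 2))

    score = defaultdict(float)
    for features_J, words in training:
        k = kernel(features_I, features_J)
        for w in words:
            score[w] += k
    total = sum(score.values())
    ranked = sorted(score.items(), key=lambda kv: -kv[1])
    return [(w, s / total) for w, s in ranked[:top]]

# Toy training set: 2-D features standing in for colour/texture descriptors
training = [
    ((0.9, 0.1), {"tiger", "grass", "wildlife"}),
    ((0.8, 0.2), {"tiger", "wildlife"}),
    ((0.1, 0.9), {"water", "city", "sunset"}),
]
print(annotate((0.85, 0.15), training))
```

The unseen feature vector lies close to the two tiger images, so tiger-related words dominate the ranking; with a realistic training set of tens of thousands of images this is the shape of a nonparametric annotation baseline.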
• Infer complex concepts from simple models (grass + plates + people = barbecue)
[CultureGrid EU proposal, application Sept 2005]
[collaboration with AT&T Research, Cambridge 2001]

Polysemy and relevance feedback

Polysemy
[Image examples]

Relevance feedback
• endow the system with plasticity (parameters)
• the system needs to learn from the user, i.e., change the parameters
[funding co-application: relevance feedback using an eye-tracker, Jan 2005]
[with Heesch and Yavlinsky, CIVR 2003]
[with Heesch, ECIR 2003]

Content-based visual search

Content-based Visual Search
[invited talks: AMR 2005, National e-Science workshop 2005]
[invited panels: EWIMT 2004, AMR 2005]
[Choplin 1.3 open-source software, Q3 2005]
[iBase 1.2 open-source software, Q3 2005]
[with Howarth, Information Retrieval, submitted 2005]
[with Howarth, CIVR 2005]
[with Chawda, Craft, Cairns and Heesch, HCI 2005]
[with Howarth, ECIR 2005]
[with Heesch, Pickering, Howarth and Yavlinsky, JCDL 2004]
[with Yavlinsky, Heesch and Pickering, ICASSP 2004]
[with Howarth, CIVR 2004]
[with Pickering, Computer Vision and Image Understanding, 92(1), 2003]
[with Heesch, Pickering and Yavlinsky, TRECVID 2003]
[with Heesch and Yavlinsky, CIVR 2003]
[with Pickering, Heesch, O'Callaghan and Bull, TREC 2002]
[with Pickering and Sinclair, CIVR 2002]
[with Pickering, TREC 2001]

11 primitive features for photographs
• 4 colour histograms
  – global HSV, centre HSV, marginal RGB colour moments, colour structure descriptor
• 2 structure maps
  – convolution/wavelet map features on the grey-level image
• 3 texture descriptors
  – using signal processing, psychology and statistics
• 1 text index
  – bag of stemmed words (tf-idf)
• 1 localisation feature
  – thumbnail of the grey-level image

Evaluation conference: TRECVID
• TRECVID topic 102: find shots from behind the pitcher in a baseball game as he throws a ball that the batter swings at

TRECVID 2003 interactive search evaluation
             MAP     Rank (out of 36)
Median       0.19
Mean         0.18
S + RF + B   0.26    5
S + RF       0.26    4
S + B        0.23    8

One example feature class: texture
• regional property
  – contrast, coarseness, directionality, …
  – co-occurrence statistics
  – signal processing
  – based on psychological experiments
[with Howarth, IEE Proc on Vision, Image and Signal Processing, accepted 2005]
[with Howarth, Yavlinsky and Heesch, CLEF 2004]
[with Howarth, CIVR 2004]

ImageCLEF
• medical image collection: 8,725 images, 25 topics
• mean average precision 34.5%; ranked 3rd, within 1% of the 1st-placed run

Video summarisation

Video summary
• story-level segmentation
• keyframe summary
• videotext summary
• named entities
• full-text search
[technology being licensed by Imperial Innovations to industry]
[UK patent application 2004]
[finished PhD project: Pickering, now at HTB, London]
[with Wong and Pickering, CIVR 2003]
[with Lal, DUC 2002]
[Pickering: best UK CS student project 2000, national prize]

Lateral browsing

Lateral browsing: the NNk network
• precompute all possible nearest neighbours
  – instantiate a network over the images of the database
[with Heesch, CIVR 2005]
[with Heesch, CIVR 2004]
[with Heesch, ECIR 2004]

Nearest neighbours with respect to features
[Diagram: a query image shown with its nearest neighbours under the structure, colour and text features]

Browsing example
• TRECVID topic 102: find shots from behind the pitcher in a baseball game as he throws a ball that the batter swings at
[Alexander May: SET award 2004: best use of IT, national prize]

Properties
• exposure of semantic richness
• the user decides which image meaning is correct
• network precomputed -> fast and interactive
• supports search without query formulation
• 3 degrees of separation (for 32,000 images)
• scale-free, small-world graph

Conclusions: Digital Content Management
[Diagram: MM, IR, ML, IS, HCI, DB, KM as contributing fields]

Multimodal visual resource discovery
Stefan Rüger
http://mmir.doc.ic.ac.uk