CuZero: Interactive Video Search for Informed User
Shih-Fu Chang (joint work with Eric Zavesky)
Digital Video and Multimedia Lab, Columbia University
Oct. 2008

The Google Era Search Paradigm
“…type in a few words at most, then expect the engine to bring back the perfect results. More than 95 percent of us never use the advanced search features most engines include, …” – The Search, J. Battelle, 2003
• Simple keyword search is the primary search method
• Extension to visual search seems to be a natural, desirable step
• Examples: text search on Google, Flickr, YouTube
digital video | multimedia lab

The Role of the User
• Formulate query
• Result triage

An Example of Web Image Search
Text query: “Manhattan Cruise” over Google Image
• How to visualize large image/video sets?
• Why are these images returned? No word-document matching
• If this does not work, how do we choose better search terms?

Minor Changes in Keywords, Big Difference
Text query: “Cruise around Manhattan”

Pains of Frustrated Users
• Forced to take “one shot” searches, iterating queries with a trial-and-error approach...

Problem: User’s Inability in Forming Queries
• Difficult to choose words/concepts without in-depth knowledge of data and vocabulary

A Call for Research Attention
• Research needed to address user frustration
• A lot of work on content analytics

Solution: Keep User Informed and Engaged
• Help users stay “informed”
  • in formulating queries
  • in understanding search results
  • in changing search rapidly and flexibly
• Keep users “engaged”
  • Instant feedback with minimal disruption, as opposed to slow “trial-and-error”
• An attempt: CuZero, Zero-Latency Informed Search & Navigation

A lot of effort in visual tagging
• Multi-source tagging
• Social tagging
• Tagging by game
• Geo-context tagging
• Auto-tagging by content analysis
• Useful for keyword search
S.-F. Chang, Columbia U.

Tagging by Content Analytics
• Audio-visual features
• Machine learning models
• Web metadata
• Context (geo, user, etc.)
• Large pools of semantic tags, concept labels
• Semantic indexes
• Statistical models
[Figure: example concept labels: Anchor, Snow, Soccer, Building, Outdoor]

Example: LSCOM concept pool/models
374 concept detection models: objects, people, location, scenes, events, etc.
airplane, airplane_takeoff, airport_or_airfield, armed_person, building, car, cityscape, crowd, desert, dirt_gravel_road, entertainment, explosion_fire, forest, highway, hospital, insurgents, landscape, maps, military, military_base, military_personnel, mountain, nighttime, people-marching, person, powerplants, riot, river, road, rpg, shooting, smoke, tanks, urban, vegetation, vehicle, waterscape_waterfront, weapons, weather

A User Community Approach to Vocabulary Development
Large Scale Concept Ontology for Multimedia (IBM-Columbia-CMU joint effort in 2005-6)
• Ontology for tagging multi-source news video for analyst tasks
• Joint effort by government analysts, librarians, researchers
• Criteria: Useful, Observable, and Machine Detectable
• 834 concepts defined, extended to 2000+ by Cyc, 449 concepts annotated
• Labeled over TRECVID 2005 development set, 61,000 shots
• 30+ annotators at Columbia and CMU
• 33 million judgments
• Free for download (350+ downloads so far)

Results of sample concept detectors
[Figure: sample results for waterfront, bridge, crowd, explosion fire, US flag, military personnel]
• Does not really solve the computer vision problem
• But the plurality of the models makes it interesting for search
• Similar to speech retrieval from imperfect ASR

Sample detection results (TRECVID2008)
[Figure: Airplane flying, Classroom, Demonstration Or Protest, Cityscape, Singing]

Example: “Smoke” Detector
(Video Source: YouTube)
• Compared to other tags, content-based analysis provides precision for deep tagging
• Pinpoints the exact segment where the concept appears

A prototype
CuZero: Zero-Latency Informed Search & Navigation

Informed User: Instant Informed Query Formulation

Informed User for
Visual Search: Instant visual concept suggestion

Instant Concept Suggestion
• Lexical mapping (LSCOM): mapping keywords to concept definitions, synonyms, sense context, etc.
• Co-occurrent concepts (images), e.g. car, road
• Query-time concept mining (text)
[Figure: sample text shown on the slide: “Basketball courts and the American won the Saint Denis on the Phoenix Suns because of the 50 point for 19 in their role within the National Association of Basketball George Rizq led Hertha for the local basketball game the wisdom and sports championship of the president” and “Baghdad to attend the game I see more goals and the players did not offer great that Beijing Games as the beginning of his brilliance Nayyouf 10 this atmosphere the culture of sports championship”]

Query-Time Concept Mining
• Start from initial text query results
• Find dominant visual concepts by visual mining
[Figure: dominant concepts: person, meeting]

CuZero Real-Time Query Interface (demo)
• Query auto-completion & expansion
• Instant Concept Suggestion

Achieving Zero Latency Exploration
[Figure: concept formulation (“car”), concept formulation (“snow”)]
• Overlap (concept formulation) with (map rendering)
• Hide rendering latency during user interaction
• Coarse-to-fine concept map planning/rendering
• Speed optimization ongoing …

A prototype
CuZero: Zero-Latency Informed Search & Navigation

Address Information Overflow: Effective Visualization
• Linear browser restricts inspection flexibility
[Figure: CMU Informedia Concept Filter: only people, only outdoor]

Informed User: Rapid Exploration of Results
[Figure: Media Mill Rotor Browser]

CuZero visualization: Help users stay informed and engaged
• Create a concept map as search anchors
• Direct user control by sliding through the map
• Instant display for each location, without new query
[Figure: concept map anchors “sky”, “water”, “boat”]

Achieve Breadth-Depth Flexibility by instant concept map navigation (demo)
• Breadth: quick sliding in the concept map
• Depth: deep exploration of each result list
[Figure: concept map; deep exploration of single permutation]

Columbia CuZero Video Search Engine - addressing the search frustration
Query Formulation
• Support intuitive formation in arbitrary combination of semantic concepts and visual examples
[Figure: road, fire, smoke (Video Source: YouTube)]
Navigation and exploration
• Real-time refinement and update of query
Fast, lightweight environment
• Web-based client-server approach for instant deployment with large data archive
[Figure: dynamic refinement: Smoke, Road, Explosion, Running 1, Running 2]

Simple Extensions
• More flexibility in modifying concept map
• Construct composite “super” concepts from simple ones
• Construct personal search toolbox
• Apply concept template to streaming mode

Concept Map Refinement (current method requires multi-panel interaction)
1 User wants to replace ‘road’ with a related concept
2 Co-occurrence can suggest related concepts
3 Changing navigation map requires additional query reformulation and search execution
4 Will interrupt the browsing task and requires cumbersome text entry and manual concept selection

In-place Anchor Editing
• Allow fast lookup of related anchors or allow user to swap in a ‘back-up’ anchor
• May also suggest strong metadata relationships
  • Capture time
  • Author/channel
  • Geo location
[Figure: anchors Car, Road, Urban, Outdoor; metadata 8:00 PM, NBC]

Formulating a Super-Anchor
• Navigation map allows precise specification of anchor weights
• For search topics of high interest, user may have invested significant time in formulation/browsing
• With ideal location in navigation map, want to save this anchor configuration
• Apply on new search at later time
• Share with other users for collaborative filtering
• Eventually build a personalized search tool box
[Figure: anchors Car, Road, Explosion, Fire, Outdoor; Super Anchor (explosion + car + outdoor + road)]

Super-Anchor Filters

Super-Anchors: Automated Alerts
• Passive monitoring of live streams to trigger alert
• Analyst benefits from customized search parameters for specific target
• Also reuse super-anchors in new navigation map
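
The super-anchor and alert ideas above can be illustrated with a minimal sketch. It assumes (the slides do not say this) that a super-anchor is stored as a set of per-concept weights, that each shot carries concept detector scores in [0, 1], and that the combined score is a weight-normalized sum. The names SuperAnchor, score, and alerts are hypothetical and are not part of CuZero.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class SuperAnchor:
    """Hypothetical saved anchor configuration: concept name -> weight."""
    name: str
    weights: Dict[str, float]

    def score(self, concept_scores: Dict[str, float]) -> float:
        """Weight-normalized sum of detector scores (an assumed fusion rule).

        Concepts with no detector score for this shot count as 0.
        """
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        weighted = sum(w * concept_scores.get(c, 0.0)
                       for c, w in self.weights.items())
        return weighted / total


def alerts(anchor: SuperAnchor,
           stream: Iterable[Tuple[str, Dict[str, float]]],
           threshold: float) -> Iterator[Tuple[str, float]]:
    """Yield (shot_id, score) for each shot whose score reaches the threshold."""
    for shot_id, concept_scores in stream:
        s = anchor.score(concept_scores)
        if s >= threshold:
            yield shot_id, s


if __name__ == "__main__":
    # Weights are illustrative; in the slides they would come from the
    # saved location in the navigation map.
    anchor = SuperAnchor("explosion + car + outdoor + road",
                         {"explosion": 2.0, "car": 1.0,
                          "outdoor": 0.5, "road": 0.5})
    stream = [
        ("shot1", {"explosion": 0.9, "car": 0.8, "outdoor": 0.6, "road": 0.4}),
        ("shot2", {"car": 0.9, "road": 0.9}),
    ]
    for shot_id, s in alerts(anchor, stream, threshold=0.5):
        print(shot_id, round(s, 3))
```

Because the anchor is plain data, the same object could in principle be saved, shared with other users, or applied to a different database or stream, which is the reuse the slides describe.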
[Figure: a saved Super Anchor (explosion + car + outdoor + road) applied to an alternate database or streaming content, with a threshold; panels a, b, c; anchor labels include Snow and Super Anchor]

Deeper issues
• Is keyword-concept search the right paradigm?
  • Perhaps for certain vertical domains?
• How far can automatic content analytics go?
  • Scale up the number of concepts?
  • Find the right visual lexicon?
  • Are they accurate enough for search?

Summary
• Effective visual search requires attention to improving the user experience
  • More challenges than tagging and retrieval
• CuZero Design Principles
  • Keep user informed and engaged in query formulation and result visualization
  • Instant feedback
  • Support fusion of search modalities: visual concept, example, keyword/speech transcripts

More Information
• LSCOM lexicon and annotation
  • Includes 449 concepts (object, scene, event, etc.)
• Columbia DVMM Lab
  • 374 SVM-based concept detectors
  • Video search demos