Semantic Extraction and Semantics-Based Annotation and Retrieval for Video Databases
Authors: Yan Liu & Fei Li, Department of Computer Science, Columbia University
Presented by: Maleq Khan, November 13, 2002

Introduction
- The rapid growth and wide application of video databases call for fast video data retrieval upon user query.

Problem Statement
- Finding video clips in a large database quickly
- Semantic interpretation of visual content, e.g., "Find video shots where President Bush is stepping off an airplane"
- Extraction and representation of temporal information, e.g., "Find video shots where Purdue President Martin Jischke is shaking hands with President Bush after he stepped off an airplane"
- Representation of spatial information

Semantics Annotation
- Manual annotation is not feasible for a large database
- A video admits many different semantic interpretations
- Hence automatic annotation is needed

Background
- Video shot: an unbroken sequence of frames
- Key frame: a frame that represents the salient feature or content of a shot
- Video scene: a collection of semantically related and temporally adjacent shots

Background (continued)
- Story unit U: a collection of interesting objects in a shot
- Locale d: the background of a shot
- A ≡ U_i d_j: story unit U_i takes place in locale d_j
- Dialogue: an alternating shot pattern such as A B A B A B A ... or A B A C A B A B
- Action: a progressive sequence of shots with contrasting visual data content

VIMS (Video Information Management System)
- Pipeline: video data → segmentation (by color, motion, shape) → key frame computation → feature extraction
- Supports video browsing, query, retrieval, and production

Semantics-Based Query
- Image matching and content-based retrieval are based on visual similarity
- They cannot answer a semantics-based query such as "A red car is running by a tree"
- Extracting the temporal/spatial information hidden in the video is necessary for semantic description

Semantic Description Model
- [Diagram] Low-level features (color, motion, ...) feed object recognition, which produces object tags; object searching, temporal computation, and direction then yield a high-level description used for high-level retrieval
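The VIMS pipeline segments video into shots using color, motion, and shape features. As a concrete illustration of the color part, here is a minimal sketch of shot-boundary detection by color-histogram differencing; the function names, the 8-bin grayscale histogram, and the threshold value are my own illustrative choices, not taken from the paper.

```python
def histogram(frame, bins=8):
    """Normalized intensity histogram of a frame.

    frame: a flat list of pixel intensities in 0..255
    (a simplification of real color frames).
    """
    h = [0] * bins
    for p in frame:
        h[p * bins // 256] += 1
    n = len(frame)
    return [c / n for c in h]

def shot_boundaries(frames, threshold=0.5):
    """Mark a shot cut wherever the L1 distance between
    consecutive frame histograms exceeds the threshold."""
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)  # frame i starts a new shot
        prev = cur
    return cuts
```

For example, three dark frames followed by three bright frames produce a single cut at the first bright frame. Real systems compare color histograms per channel and use adaptive thresholds, but the structure is the same.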
- [Diagram] A sample database and its temporal diagram

Temporal Diagram
- Scene nodes hold objects with position and recording information
- Links between scenes are based on similarity in story
- Links to other scenes support browsing
- Links to other videos use bibliographic data

Object Tracking
- An object's position is identified by a bounding rectangle
- Motion is defined as the change of position relative to a still object
- If the viewing direction changes by an angle θ, multiply all position information by cos θ

Dynamic Tag Building
- An array stores semantic descriptions
- For a new query, search the tag first
- If not found, run the extraction procedure and add the new semantic description to the tag

Summary
- Automatic semantic extraction
- Object tracking
- Temporal diagram
- Automatic tag building

Comments
- Moving objects can be identified only if a relatively still reference object is given
- Cannot distinguish different kinds of motion
- The temporal diagram is not complete
- "Real-time computation for large digital library" is claimed, but no theoretical or experimental results are given
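The dynamic tag building step is essentially a cache of semantic descriptions built on demand: search the tag first, and only on a miss run the expensive extraction procedure and record its result. A minimal sketch in Python; the `SemanticTag` class and its method names are my own invention, wrapping an arbitrary extraction procedure supplied by the caller.

```python
class SemanticTag:
    """Dynamic tag: an array of (query, description) pairs,
    extended whenever a query misses."""

    def __init__(self, extract):
        self.extract = extract  # expensive semantic-extraction procedure
        self.tags = []          # stored (query, description) pairs

    def lookup(self, query):
        # Step 1: search the tag first.
        for q, desc in self.tags:
            if q == query:
                return desc
        # Step 2: not found -- run the procedure and add the
        # new semantic description to the tag.
        desc = self.extract(query)
        self.tags.append((query, desc))
        return desc
```

Repeating a query then costs only a tag search, which is what makes repeated semantics-based queries cheap after the first extraction.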