ENEE631 Spring’09 Lecture 19 (4/13/2009)
Video Content Analysis and Streaming
Instructor: Min Wu
Electrical and Computer Engineering Department, University of Maryland, College Park
bb.eng.umd.edu (select ENEE631 S’09)   minwu@eng.umd.edu

Overview and Logistics
Last time:
– General methodologies for motion analysis
– Optical flow equations
Today:
– Wrap up motion analysis
– Video content analysis: basic framework; temporal segmentation; compressed-domain processing
– A quick guide to video communications

Review: Optical Flow Equation
Orthogonal decomposition of the flow vector v
– Projection along the “normal direction” ~ v_n, i.e., along the direction of the image gradient ∇f
– Projection along the tangent direction ~ v_t, i.e., orthogonal to the image gradient ∇f
Optical flow equation (O.F.E.):  ∇f · v + ∂f/∂t = 0,  i.e.,  ||∇f|| v_n + ∂f/∂t = 0,  so  v_n = −(∂f/∂t) / ||∇f||
(Figure: normal and tangent directions of the flow vector; from Wang’s preprint, Fig. 6.2)

Ambiguity in Motion Estimation
One equation for two unknowns
– The tangent component of the motion vector is undetermined: the “aperture problem” (from Wang’s preprint, Fig. 6.3)
Aperture ~ small window over which the constant-intensity assumption is applied
– The MV can be estimated only if the aperture contains two or more different gradient directions (e.g., corners)
– Usually need additional constraints, e.g., spatial smoothness of the motion field
Indeterminate motion vector over constant regions (||∇f|| = 0)
– Reliable motion estimation is possible only for regions with brightness variations (e.g., edges or non-flat textures)

General Methodologies for Motion Estimation
Two categories: feature-based vs. intensity-based estimation
Feature based
– Step 1: establish correspondences between feature pairs
– Step 2: estimate the parameters of a chosen motion model by least-squares fitting of the correspondences
– Good for global/camera motion describable by parametric models
  Common models: affine, projective, … (Wang Sec. 5.5.2–5.5.4)
  Applications: image mosaicking, synthesis of multiple views
Intensity based
– Apply the optical flow equation (or its variations) to local regions
– Good for non-simple motion and multiple objects
– Applications: video coding, motion prediction and filtering

Motion Estimation Criteria
Criterion based on the displaced frame difference (DFD)
– E.g., in the block-matching approach (a MAD sketch follows below)
Criterion based on the optical flow equations
Other criteria and considerations
– Smoothness constraints
– Bayesian criterion
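To make the DFD-based criterion concrete, here is a minimal Python/NumPy sketch of the mean absolute difference (MAD) between a block in the current frame and its displaced counterpart in a reference frame. It is only an illustration of the criterion named on this slide; the array names, block size, and the assumption that the displaced block stays inside the frame are mine, not part of the lecture.

    import numpy as np

    def mad_dfd(cur, ref, x, y, dx, dy, B=16):
        """Mean absolute displaced frame difference for one BxB block.

        cur, ref : 2-D grayscale frames of the same shape
        (x, y)   : top-left corner of the block in the current frame
        (dx, dy) : candidate motion vector (displacement into the reference frame)
        Assumes the displaced block lies entirely inside `ref`.
        """
        block = cur[y:y + B, x:x + B].astype(np.float64)
        cand = ref[y + dy:y + dy + B, x + dx:x + dx + B].astype(np.float64)
        return np.mean(np.abs(block - cand))

A smaller MAD means the candidate displacement better explains the block under the constant-intensity assumption; MSE can be substituted when differentiability matters, as in the gradient-based search discussed next.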
Commonly Used Optimization Methods
For minimizing the previously defined motion estimation error function
Exhaustive search
– MAD often used for computational simplicity
– Guarantees global optimality, at the expense of computational complexity
– Fast algorithms exist for sub-optimal solutions
Gradient-based search (Appendix B of Wang’s book)
– MSE often used for mathematical tractability (differentiable)
– Iterative approach: refine the estimate along negative gradient directions of the objective function
– Generally converges to a local optimum; requires a good initial estimate
– The method used to estimate the gradient also affects accuracy and robustness

Various Motion Estimation Approaches
Pixel-based motion estimation (Wang Sec. 6.3)
– Estimate one MV for every pixel
– Use the relation from the optical flow equation to construct the M.E. criterion
– Add smoothness constraints on the motion field to deal with the aperture problem and avoid poor MV estimates
Block matching – correlation method (Wang Sec. 6.4.5)
Deformable block matching (Wang Sec. 6.5)
– Use a richer block-based motion model than pure translation, e.g., an affine/bilinear/projective mapping for each block (Sec. 5.5)
– A square block in the current frame is matched against a non-square region in the reference frame
Mesh-based motion estimation (Wang Sec. 6.6)
(An exhaustive-search block-matching sketch follows below.)
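As a companion to the exhaustive-search bullets above, here is a minimal full-search block-matching sketch in Python/NumPy using the MAD criterion. The search range, block size, and boundary handling are illustrative assumptions, not values prescribed by the lecture.

    import numpy as np

    def full_search_mv(cur, ref, x, y, B=16, R=7):
        """Exhaustive (full) search for the motion vector of one BxB block.

        Scans all integer displacements (dx, dy) in [-R, R] and returns the one
        minimizing the mean absolute difference against the reference frame.
        """
        H, W = cur.shape
        block = cur[y:y + B, x:x + B].astype(np.float64)
        best_mv, best_cost = (0, 0), np.inf
        for dy in range(-R, R + 1):
            for dx in range(-R, R + 1):
                xs, ys = x + dx, y + dy
                if xs < 0 or ys < 0 or xs + B > W or ys + B > H:
                    continue  # candidate block falls outside the reference frame
                cand = ref[ys:ys + B, xs:xs + B].astype(np.float64)
                cost = np.mean(np.abs(block - cand))
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
        return best_mv, best_cost

Within the search window this is globally optimal for the chosen criterion; fast methods (e.g., three-step or logarithmic search) evaluate far fewer candidates at the cost of possibly returning a sub-optimal vector.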
Video Content Analysis

Recall: MPEG-7 “Multimedia Content Description Interface”
– Not a video coding/compression standard like the previous MPEG standards
– Emphasizes how to describe video content for efficient indexing, search, and retrieval
Standardizes the description mechanism for content
– Descriptors, Description Schemes, and the Description Definition Language
– Commonly used visual descriptors: color, texture, shape, …
(Figure from MPEG-7 Document N4031, March 2001)

Introduction to Video Content Analysis
Teach the computer to “understand” video content
– Define features that the computer can measure and compare: color (RGB values or other color coordinates), motion (magnitude and direction), shape (contours), texture and patterns
– Give example correspondences so the computer can learn to connect features with higher-level semantics/concepts, using statistical classification and recognition techniques
Video understanding
1. Break a video sequence into chunks, each with consistent content ~ a “shot”
2. Group similar shots into scenes that represent certain events
3. Describe connections among scenes via story boards or scene graphs
4. Associate each shot/scene with representative features/semantics for future queries

Video Understanding (step 1)
– Break a video sequence into chunks, each with consistent content ~ a “shot”
(From Yeung-Yeo-Liu: STG, Princeton)

Video Understanding (step 2)
– Group similar shots into scenes
(From Yeung-Yeo-Liu: STG, Princeton)

Video Understanding (step 3)
– Describe connections among scenes via story boards or scene graphs
(From Yeung-Yeo-Liu: STG, Princeton)

Video Temporal Segmentation
A first step toward video content understanding
– Select “key frames” to represent each shot for indexing/retrieval
– The sequence of shot durations serves as a “signature” for a video
Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transitions: fade-out and fade-in; dissolve; wipe
Detecting transitions
– Detecting a cut is relatively easy: check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0 (1 – t/T) + f1 (t/T)
– Detecting wipes is more difficult: exploit transition patterns, or the linearity of the color histogram

Detect Dissolve via Linearity in Pixel Changes
– Dissolve: a linear combination of the two shots g and h
– Detect straight lines in DC-frame space via correlation detection on frame triplets
(Figure: pixel-value trajectories between g_k and h_k during a dissolve; from talks by Joyce-Liu, Princeton)

Examples of Wipes
(Figure: example wipe transitions)

Wipe Detection (1)
– Convert the 2-D problem to 1-D by projection, a common strategy in feature extraction and analysis in image processing
– Perform horizontal, vertical, and diagonal projections to detect diverse wipe types

Review: Color Histogram
Generalizes the luminance histogram
What is a color histogram?
– Count the number of pixels with the same color
– Plot color value vs. the corresponding pixel count
Gives an idea of the dominant colors and the color distribution
– Ignores the exact spatial location of each color value
– Useful in image and video analysis
A color histogram can be used to:
– Detect gradual shot transitions, especially fancy wipes
– Measure content similarity between images / video shots

Wipe Detection (2)
– Diverse and fancy wipes produce a linear change in the color histogram between the two shots G_k and H_k
(Figure: histogram-bin trajectories during a wipe; from talks by Joyce-Liu, Princeton)
Ref: Joyce & Liu, IEEE Trans. Multimedia, 2006.

Types of Transitions
– [above] Transition types offered by Adobe Premiere
– See also the transition demos provided by PowerPoint
– Video transition collection (Dr. Rob Joyce)
(From talks by Joyce-Liu, Princeton)
(A linearity-check sketch for dissolve/wipe detection follows below.)
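To illustrate the linearity test underlying dissolve and wipe detection, here is a minimal Python/NumPy sketch that checks whether a middle observation lies close to the straight line between two endpoints: feed it flattened DC frames for dissolves/fades, or color histograms for wipes. The least-squares formulation and the suggested threshold are my illustrative choices, not the exact procedure of the Joyce-Liu or Yeo-Liu papers.

    import numpy as np

    def linearity_residual(f0, fm, f1):
        """Deviation of fm from the line segment between f0 and f1.

        f0, fm, f1 : 1-D feature vectors (flattened DC frames or color histograms).
        Returns (alpha, resid): the least-squares weight alpha such that
        fm ~ (1 - alpha) * f0 + alpha * f1, and the relative fitting error.
        """
        f0, fm, f1 = (np.asarray(v, dtype=np.float64).ravel() for v in (f0, fm, f1))
        d = f1 - f0
        denom = float(np.dot(d, d))
        if denom == 0.0:  # identical endpoints: no transition to test
            return 0.0, float(np.linalg.norm(fm - f0)) / (np.linalg.norm(f0) + 1e-12)
        alpha = float(np.dot(fm - f0, d)) / denom      # projection onto the segment
        resid = float(np.linalg.norm(fm - (f0 + alpha * d))) / (np.linalg.norm(d) + 1e-12)
        return alpha, resid

    # Illustrative decision rule (threshold is a placeholder, not a published value):
    # declare a dissolve/wipe candidate when 0 < alpha < 1 and resid < 0.1.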
Compressed-Domain Processing
Does video analysis have to decompress the whole video?
– Use I & P frames only to reduce computation and to enhance robustness in scene change detection
  … I B B P B B P B B P B B I B B P …
Working in the compressed domain
– Process the video with only partial decoding (inverse VLC, etc.), without full decoding (IDCT), to save computation
– A low-resolution version provides enough information for transition detection => the “DC image”

DC Image (example from Joyce-Liu, Princeton)
– Put the DC coefficient of each block together to form a small “DC frame”
– It already contains most of the information needed about the video

Fast Extraction of DC Images from MPEG-1
I frame
– Put together the DC coefficient from each block (and apply proper scaling)
Predictive (P/B) frame
– Fast approximation of the reference block’s DC: the motion-compensated reference block generally straddles up to four coded 8x8 blocks P_1, …, P_4; weight each block’s DC by its overlap area h_i x w_i out of the 64 pixels:
      [DCT(P_ref)]_00 ≈ Σ_{i=1..4} (h_i w_i / 64) [DCT(P_i)]_00
– Add the DC of the motion-compensation residue (recall that the DCT is a linear transform):
      [DCT(P_cur)]_00 = [DCT(P_ref)]_00 + [DCT(P_diff)]_00
– See the Yeo-Liu paper for more derivations on the approximations (DC; DC + 2 AC)

Compressed-Domain Scene Change Detection
Compare nearby frames
– Take the pixel-wise difference of nearby DC frames
– Or take the pixel-wise difference every N frames to accumulate more change => useful for detecting gradual transitions
Observe the pixel-wise difference for different frame pairs
– Peaks at cuts, plateaus at gradual transitions
(Figure from the Yeo-Liu CSVT 1995 paper)

Scene Change Detection (cont’d)
– Identify candidate locations for gradual transitions
– Can further exploit the linearity of the DC frames => helps differentiate gradual transitions from motion
(Figure from the Yeo-Liu CSVT 1995 paper)
(A DC-image difference sketch follows below.)
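Here is a minimal Python/NumPy sketch of the frame-difference measure described above, computed directly on a sequence of DC images. The peak test and its margin are illustrative assumptions for this sketch, not the exact decision rule of the Yeo-Liu paper.

    import numpy as np

    def dc_frame_differences(dc_frames, step=1):
        """Mean absolute pixel-wise difference between DC frames `step` apart.

        dc_frames : sequence of 2-D DC images (one per decoded I/P frame).
        step = 1 highlights abrupt cuts; a larger step accumulates the change
        across a gradual transition (dissolve, fade, wipe).
        """
        dc = [np.asarray(f, dtype=np.float64) for f in dc_frames]
        return np.array([np.mean(np.abs(dc[k + step] - dc[k]))
                         for k in range(len(dc) - step)])

    def detect_cuts(diffs, ratio=3.0):
        """Flag cuts where the difference is a sharp, isolated peak.

        Frame pair k is declared a cut if diffs[k] exceeds both neighbors by a
        factor `ratio`; sustained plateaus are left as gradual-transition
        candidates for the linearity test.
        """
        return [k for k in range(1, len(diffs) - 1)
                if diffs[k] > ratio * max(diffs[k - 1], diffs[k + 1], 1e-12)]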
Summary on Video Temporal Segmentation
A first step toward video content understanding
Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transitions: fade-out and fade-in; dissolve; wipe
Detecting transitions: can be done on “DC images” without full decompression
– Detecting a cut is relatively easy ~ check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0 (1 – t/T) + f1 (t/T)
– Detecting wipes is more difficult ~ exploit transition patterns, or the linearity of the color histogram

Video Communications

MM + Data Comm. = Effective MM Communications?
Multimedia vs. generic data
– Perceptual no-difference vs. bit-by-bit accuracy
– Unequal importance within multimedia data
– High data volume and real-time requirements
Need to consider the interplay between source coding and transmission, and make use of multimedia-specific properties
E.g., wireless video needs a “good” compression algorithm that can:
– Support a scalable video compression rate (from about 10 kbps to several hundred kbps)
– Be robust to transmission errors and channel impairments
– Minimize end-to-end delay
– Handle missing frames intelligently

Error-Resilient Coding with Localized Synch Markers (from D. Lun @ HK PolyUniv. short course, 6/01)
To reduce error propagation
– Pipeline: input sequence -> H.263 encoder -> random channel noise -> MB error detection -> H.263 decoder -> error concealment -> output sequence
– (Figure: decoded output comparing H.263 with FRM vs. H.263 with LRM)

Issues in Video Communications/Streaming
Source coding aspects
– Rate-distortion tradeoff and bit allocation in an R-D optimal sense
– Scalable coding and Fine Granular Scalability (FGS)
– Multiple description coding
– Error-resilient source coding
Channel coding aspects (see ENEE626 for the general theory)
– Unequal Error Protection (UEP) channel codes
– Embedded modulation for achieving UEP
Joint source-channel approaches
– Jointly select source and channel coding parameters to optimize the end-to-end distortion
– Wisely map source codewords to channel symbols
– Take advantage of the channel’s non-uniform characteristics for UEP
Bandwidth resource determination, allocation, and adaptation

Reading References
Video temporal segmentation for content analysis
– Yeo & Liu, IEEE Trans. CSVT, Dec. 1995 (DC images & scene change detection)
– Joyce & Liu, IEEE Trans. Multimedia, 2006 (wipe detection)
Video communications
– Wang’s video textbook: Chapters 14 and 15
– Woods’ book: Chapter 12