
ENEE631 Spring’09
Lecture 19 (4/13/2009)
Video Content Analysis and Streaming
Spring ’09 Instructor: Min Wu
Electrical and Computer Engineering Department,
University of Maryland, College Park
 bb.eng.umd.edu (select ENEE631 S’09)
 minwu@eng.umd.edu
Overview and Logistics
Last Time:
– General methodologies on motion analysis
– Optical flow equations
Today:
– Wrap up motion analysis
– Video content analysis
Basic framework
Temporal segmentation; Compressed domain processing
– A quick guide on video communications
Review: Optical Flow Equation
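Orthogonal decomposition of the flow vector v
– Projection along the “normal direction” ~ v_n
i.e., along the direction of the image gradient ∇f
– Projection along the tangent direction ~ v_t
i.e., along the direction orthogonal to the image gradient ∇f
– The optical flow equation constrains only the normal component:
$\|\nabla f\| \, v_n + \partial f / \partial t = 0 \;\Rightarrow\; v_n = -\dfrac{\partial f / \partial t}{\|\nabla f\|}$
[Figure: flow vector v decomposed along the normal and tangent directions; from Wang’s Preprint Fig.6.2]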
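The normal-flow relation on this slide, v_n = -(∂f/∂t) / ||∇f||, can be checked numerically. The sketch below is illustrative only: the function name `normal_flow` and the sample derivative values are assumptions, not part of the lecture.

```python
import math


def normal_flow(fx, fy, ft):
    """Normal flow from the spatial gradient (fx, fy) and temporal derivative ft.

    Returns (v_n, unit_normal), or None where the gradient vanishes and the
    optical flow equation gives no information at all.
    """
    mag = math.hypot(fx, fy)
    if mag == 0.0:
        return None  # flat region: motion is indeterminate
    v_n = -ft / mag
    return v_n, (fx / mag, fy / mag)


# Hypothetical example: a vertical edge moving right at 2 pixels/frame.
# For f(x, y, t) = g(x - 2t) we have fx = g' and ft = -2 g'; take g' = 5.
result = normal_flow(5.0, 0.0, -10.0)
print(result)  # normal component 2.0 along the unit normal (1.0, 0.0)
```

Only the component along the gradient is recovered; any motion along the edge (the tangent component v_t) leaves fx, fy, and ft unchanged, which is the ambiguity the next slide discusses.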
Ambiguity in Motion Estimation
One equation for two unknowns
– Tangent direction of motion vector
is undetermined
– “Aperture problem”
From Wang's Preprint Fig.6.3
Aperture ~ small window over which to apply const. intensity assumption
MV can be estimated only if aperture contains 2+ different gradient
directions (e.g. corners)
– Usually need additional constraints
Spatial smoothness of motion field
Indeterminate motion vector over constant region (||∇f|| = 0)
– Reliable motion estimation only for regions with brightness
variations (e.g. edges or nonflat textures)
General Methodologies for Motion Estimation
Two categories: Feature vs. Intensity based estimation
Feature based
– Step-1 establish correspondences between feature pairs
– Step-2 estimate parameters of a chosen motion model by
least-square fitting of the correspondences
– Good for global/camera motion describable by parametric models
Common models: affine, projective, … (Wang Sec.5.5.2-5.5.4)
Applications: Image mosaicing, synthesis of multiple-views
Intensity based
– Apply optical flow equation (or its variation) to local regions
– Good for non-simple motion and multiple objects
– Applications: video coding, motion prediction and filtering
Motion Estimation Criteria
Criterion based on displaced frame difference
– E.g. in block matching approach
Criterion based on optical flow equations
Other criteria and considerations
– Smoothness constraints
– Bayesian criterion
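The criteria listed above can be written out as follows. This is a sketch in assumed notation, since the slide's own formulas did not survive: f_1 is the anchor frame, f_2 the target frame, B the block or region being matched, d the displacement (v the flow vector), and λ a smoothness weight.

```latex
% Displaced frame difference (DFD) criterion, e.g. for block matching
E_{\mathrm{DFD}}(\mathbf{d}) = \sum_{\mathbf{x} \in B}
    \left| f_2(\mathbf{x} + \mathbf{d}) - f_1(\mathbf{x}) \right|^{p},
\qquad p = 1 \ (\text{MAD}), \quad p = 2 \ (\text{MSE})

% Criterion based on the optical flow equation
E_{\mathrm{OF}}(\mathbf{v}) = \sum_{\mathbf{x} \in B}
    \left| \nabla f(\mathbf{x})^{T} \mathbf{v}
         + \frac{\partial f}{\partial t}(\mathbf{x}) \right|^{p}

% Adding a smoothness constraint on the motion field
E(\mathbf{v}) = E_{\mathrm{OF}}(\mathbf{v})
    + \lambda \sum_{\mathbf{x}}
      \left( \|\nabla v_x(\mathbf{x})\|^{2} + \|\nabla v_y(\mathbf{x})\|^{2} \right)
```

With p = 1 the DFD criterion is cheap to evaluate, which suits exhaustive search; with p = 2 it is differentiable, which suits gradient-based search.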
Commonly Used Optimization Methods
For minimizing the previously defined M.E. error function
Exhaustive search
– MAD often used for computational simplicity
– Guaranteed global optimality at the expense of computational complexity
– Fast algorithms for sub-optimal solutions
Gradient-based search (Appendix B of Wang’s book)
– MSE often used for mathematical tractability (differentiable)
– Iterative approach
refine an estimate along negative gradient directions of objective func.
– Generally converges to a local optimum
requires a good initial estimate
– The method used to estimate the gradient also affects accuracy & robustness
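The exhaustive-search option above can be sketched as a full-search block matcher that uses MAD as the error measure. This is a minimal illustration, not the textbook's code: the function names, the toy frames, and the convention that the motion vector points from the current block to its match in the reference frame are all assumptions.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, r):
    """Try every integer displacement in [-r, r]^2; return (error, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block falls outside the reference frame
            err = mad(cur, ref, bx, by, dx, dy, n)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best


# Toy data: a 4x4 textured object on a flat background, moved by (+2, +1).
SIZE = 16
ref = [[0] * SIZE for _ in range(SIZE)]
for y in range(5, 9):
    for x in range(4, 8):
        ref[y][x] = 100 + 10 * y + x
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(SIZE)]
       for y in range(SIZE)]

best = full_search(cur, ref, bx=6, by=6, n=4, r=3)
print(best)  # (0.0, -2, -1): the block came from 2 pixels left, 1 pixel up
```

The cost is (2r+1)^2 block comparisons per block, which is why fast sub-optimal searches are attractive in practice.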
Various Motion Estimation Approaches
Pixel-based motion estimation (Wang’s sec.6.3)
Estimate one MV for every pixel
Use relation from Optical Flow Equation to construct M.E. criterion
Add smoothness constraints on motion field to deal with aperture
problem and avoid poor estimation of MV
– Correlation method (Wang’s sec.6.4.5)
Deformable block-matching (Wang’s sec.6.5)
– Use a more general block-based motion model than the translational model
e.g., affine/bilinear/projective mapping for each block (sec.5.5)
square block in current frame match with non-square block in ref.
Mesh-based motion estimation (Wang’s sec.6.6)
Video Content Analysis
Figure from MPEG-7
Document N4031
(March 2001)
Recall: MPEG-7
“Multimedia Content Description Interface”
– Not a video coding/compression standard like previous MPEG
– Emphasizes how to describe the video content for efficient
indexing, search, and retrieval
Standardize the description mechanism of content
– Descriptor, Description Scheme & Description Definition
– Commonly used visual descriptors: Color, Texture, Shape, …
Introduction to Video Content Analysis
Teach computer to “understand” video content
Define features that computer can learn to measure and compare
Give example correspondences so that computer can learn
color (RGB values or other color coordinates)
motion (magnitude and directions)
shape (contours)
texture and patterns
build connections between feature & higher-level semantics/concepts
statistical classification and recognition techniques
Video understanding
Break a video sequence into chunks, each with consistent content ~ “shot”
Group similar shots into scenes that represent certain events
Describe connections among scenes via story boards or scene graphs
Associate shot/scene with representative feature/semantics for future query
Video Understanding (step-1)
From Yeung-Yeo-Liu:
STG (Princeton)
– Break a video sequence into chunks, each with consistent
content ~ “shot”
Video Understanding (step-2)
From Yeung-Yeo-Liu:
STG (Princeton)
– Group similar shots into scenes
Video Understanding (step-3)
From Yeung-Yeo-Liu:
STG (Princeton)
– Describe connections among scenes via story boards or
scene graphs
Video Temporal Segmentation
A first step toward video content understanding
– Select “key frames” to represent each shot for index/retrieval
– Sequence of shot duration as a “signature” for a video
Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transition: Fade out and Fade in; Dissolve; Wipe
Detecting transitions
– Detecting a cut is relatively easy
check frame-wise difference
– Detecting dissolve and fade by checking linearity
f(t) = f0 (1 – t/T) + f1 (t/T), for 0 ≤ t ≤ T
– Detecting wipe ~ more difficult
exploit transition patterns, or linearity of color histogram
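Cut detection by frame-wise difference can be sketched as a simple threshold test. The helper names, the mean-absolute-difference measure, and the threshold value are illustrative assumptions; a practical detector would typically adapt the threshold to local activity.

```python
def frame_diff(a, b):
    """Mean absolute pixel difference between two equal-size frames."""
    count = len(a) * len(a[0])
    total = sum(abs(p - q) for row_a, row_b in zip(a, b)
                for p, q in zip(row_a, row_b))
    return total / count


def detect_cuts(frames, threshold):
    """Return indices t where frame t differs sharply from frame t-1."""
    return [t for t in range(1, len(frames))
            if frame_diff(frames[t - 1], frames[t]) > threshold]


def flat_frame(value, rows=2, cols=2):
    """Hypothetical test frame filled with a single intensity."""
    return [[value] * cols for _ in range(rows)]


# Three frames of a dark shot followed by three frames of a bright shot.
frames = [flat_frame(10)] * 3 + [flat_frame(200)] * 3
cuts = detect_cuts(frames, threshold=30)
print(cuts)  # [3]: the new shot starts at frame index 3
```

A dissolve spreads the same total change over T frames, so each individual frame difference can stay below the threshold; that is why gradual transitions need the linearity test instead.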
Detect Dissolve via Linearity in Pixel Changes
[Figure: frame trajectory plotted in pixel space, with axes Pixel 1, Pixel 2, Pixel 3]
Dissolve: a linear combination of g and h
Detect straight lines in DC frame space
– correlation detection on triplets
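One way to test whether three frames lie on a straight line is to correlate the two successive difference vectors: during a dissolve they point in the same direction, so their normalized correlation is close to 1. The sketch below makes that concrete. It is an assumed formulation for illustration and not necessarily the exact statistic used by Joyce and Liu; frames are flattened to lists of (DC) pixel values.

```python
import math


def dissolve_frame(g, h, t, T):
    """Frame at time t of a dissolve from g to h lasting T steps."""
    return [(1 - t / T) * p + (t / T) * q for p, q in zip(g, h)]


def triplet_correlation(a, b, c):
    """Normalized correlation between the difference vectors (b - a) and (c - b)."""
    u = [q - p for p, q in zip(a, b)]
    v = [q - p for p, q in zip(b, c)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no change between frames: nothing to correlate
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical three-pixel "frames" g and h.
g = [10.0, 200.0, 50.0]
h = [180.0, 20.0, 90.0]
a, b, c = (dissolve_frame(g, h, t, 4) for t in (1, 2, 3))
corr_dissolve = triplet_correlation(a, b, c)

# Unrelated frame-to-frame changes do not line up.
corr_other = triplet_correlation([10, 200, 50], [12, 190, 60], [5, 205, 40])
print(corr_dissolve, corr_other)
```

Here `corr_dissolve` comes out at 1 up to rounding, while `corr_other` is negative.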
M. Wu: ENEE631 Digital Image Processing (Spring'09)
From talks by
Joyce-Liu (Princeton)
Lec.19 – Video Analysis & Comm
Examples of Wipes
Wipe Detection (1)
– Convert the 2-D problem to 1-D by projection
A common strategy in feature extraction and analysis in image processing
– Perform horizontal, vertical, and diagonal projections to detect diverse wipe types
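The 2-D to 1-D idea can be sketched by projecting the absolute frame difference onto the image axes. For a left-to-right wipe, the changed pixels form a vertical stripe, so the column projection localizes the wipe boundary while the row projection stays flat. The frame sizes, intensities, and function names below are hypothetical.

```python
def abs_diff(a, b):
    """Pixel-wise absolute difference of two equal-size frames."""
    return [[abs(p - q) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def column_projection(image):
    """Sum each column: a 1-D profile along the horizontal axis."""
    return [sum(col) for col in zip(*image)]


def row_projection(image):
    """Sum each row: a 1-D profile along the vertical axis."""
    return [sum(row) for row in image]


def wipe_frame(k, rows=4, cols=8, old=0, new=9):
    """Left-to-right wipe: columns left of position k already show the new scene."""
    return [[new if x < k else old for x in range(cols)] for _ in range(rows)]


diff = abs_diff(wipe_frame(2), wipe_frame(4))
cols_profile = column_projection(diff)
rows_profile = row_projection(diff)
print(cols_profile)  # [0, 0, 36, 36, 0, 0, 0, 0]
print(rows_profile)  # [18, 18, 18, 18]
```

Tracking the position of the peak in the column profile over time reveals the steady sweep of a horizontal wipe; a vertical wipe shows the same pattern in the row profile instead.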
Review: Color Histogram
Generalize from luminance histogram
What is color histogram?
– Count the # of pixels with the same color
– Plot color-value vs. corresponding pixel#
Gives an idea of the dominant color and color distribution
– Ignore the exact spatial location of each color value
– Useful in image and video analysis
Color histogram can be used to:
– Detect gradual shot transition esp. for fancy wipes
– Measure content similarity between images / video shots
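A color histogram and a simple similarity measure can be sketched as follows. The uniform quantization into 4 bins per RGB channel and the L1 distance are illustrative choices, not ones prescribed by the lecture; the function names are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of (r, g, b) pixels with 8-bit channels.

    Assumes a non-empty pixel list and a bin count that divides 256.
    """
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    return [count / len(pixels) for count in hist]


def histogram_distance(h1, h2):
    """L1 distance between normalized histograms: 0 (identical) to 2 (disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
image_a = [RED, RED, RED, BLUE]
image_b = [BLUE, RED, RED, RED]      # same colors, different spatial layout
image_c = [GREEN, GREEN, GREEN, GREEN]

dist_ab = histogram_distance(color_histogram(image_a), color_histogram(image_b))
dist_ac = histogram_distance(color_histogram(image_a), color_histogram(image_c))
print(dist_ab, dist_ac)  # 0.0 2.0
```

The zero distance between `image_a` and `image_b` shows the point made above: the histogram ignores where each color appears, which makes it robust to object and camera motion within a shot.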
From talks by
Joyce-Liu (Princeton)
Wipe Detection (2)
Diverse and fancy wipes
Linear change in color histogram
[Figure: color-histogram trajectory during a wipe, with axes Bin 1, Bin 2, Bin 3]
Ref: Joyce & Liu, IEEE Trans. Multimedia, 2006.
From talks by
Joyce-Liu (Princeton)
Types of Transitions
– [above] Transition types offered by Adobe Premiere
– See also transition demos provided by PowerPoint
Video transition collection (Dr. Rob Joyce)
Compressed-Domain Processing
Does video analysis have to decompress the whole video?
Use I & P frames only to reduce computation and enhance
robustness in scene change detection
… I bb P bb P bb P bb I b b P …
Working in compressed domain
– Process video by only doing partial decoding (inverse VLC,
etc.) without a full decoding (IDCT) to save computation
– Low-resolution version provides enough info for transition detection
=> “DC-image”
Example From
Joyce-Liu (Princeton)
DC Image
– Put the DC coefficient of each block together
– Already contains most of the information in the video
DC Frame
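For an 8x8 DCT, the DC coefficient of a block is 8 times the block mean, so up to scaling the DC image is the frame averaged over 8x8 blocks. The sketch below computes that pixel-domain equivalent to show what the DC image looks like; a real compressed-domain system would read the DC coefficients from the bitstream instead of averaging decoded pixels. The function name and the test frame are assumptions.

```python
def dc_image(frame, n=8):
    """Average each n x n block of `frame` (pixel-domain equivalent of the DC image).

    Assumes the frame dimensions are multiples of n.
    """
    block_rows, block_cols = len(frame) // n, len(frame[0]) // n
    return [[sum(frame[by * n + y][bx * n + x]
                 for y in range(n) for x in range(n)) / (n * n)
             for bx in range(block_cols)]
            for by in range(block_rows)]


# Hypothetical 16x16 frame: dark left half, bright right half.
frame = [[40 if x < 8 else 200 for x in range(16)] for _ in range(16)]
small = dc_image(frame)
print(small)  # [[40.0, 200.0], [40.0, 200.0]]
```

The result has 1/64 of the original pixels, which is the source of the computational savings in the transition detectors.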
Fast Extraction of DC Image From MPEG-1
I frame
– Put together DC coeff. from each block (and apply proper scaling)
Predictive (P/B) frame
– Fast approximation of reference block’s DC
– Adding DC of the motion compensation residue
recall DCT is a linear transform
See Yeo-Liu’s paper for more derivations on approximations (DC; DC+2AC)
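$[\mathrm{DCT}(P_{\mathrm{ref}})]_{00} \approx \sum_{i=1}^{4} \dfrac{h_i w_i}{64} \, [\mathrm{DCT}(P_i)]_{00}$
where P_1, …, P_4 are the blocks of the reference frame that the motion-compensated reference block P_ref overlaps, and h_i, w_i are the height and width of its overlap with P_i
$[\mathrm{DCT}(P_{\mathrm{cur}})]_{00} = [\mathrm{DCT}(P_{\mathrm{ref}})]_{00} + [\mathrm{DCT}(P_{\mathrm{diff}})]_{00}$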
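The DC approximation for a predicted block can be worked through numerically: weight the DC value of each overlapped reference block by its overlap area h_i w_i / 64, then add the DC of the motion-compensation residue. The numbers below (DC values, overlaps, residue) are made up for illustration, and the function names are assumptions.

```python
def approx_ref_dc(neighbor_dcs, overlaps, n=8):
    """Area-weighted DC of the reference block.

    neighbor_dcs: DC values of the (up to four) overlapped reference blocks.
    overlaps: matching (h_i, w_i) overlap heights and widths in pixels.
    """
    return sum(h * w * dc for dc, (h, w) in zip(neighbor_dcs, overlaps)) / (n * n)


def approx_cur_dc(neighbor_dcs, overlaps, residual_dc, n=8):
    """DC of the current block = approximate reference DC + residue DC
    (valid because the DCT is linear)."""
    return approx_ref_dc(neighbor_dcs, overlaps, n) + residual_dc


# A reference block shifted 3 pixels right and 2 pixels down from the block grid
# overlaps four blocks with these (height, width) overlaps; areas sum to 64.
overlaps = [(6, 5), (6, 3), (2, 5), (2, 3)]
neighbor_dcs = [80, 160, 80, 160]

ref_dc = approx_ref_dc(neighbor_dcs, overlaps)
cur_dc = approx_cur_dc(neighbor_dcs, overlaps, residual_dc=-4)
print(ref_dc, cur_dc)  # 110.0 106.0
```

Only the DC terms of the four neighbors are used, which is why this is an approximation: the exact DC of the shifted block also depends on their AC coefficients.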
Compressed-Domain Scene Change Detection
UMCP ENEE408G Slides (created by M.Wu © 2002)
– Take pixel-wise difference of nearby DC-frames
– Or take pixel-wise difference of every N frames to
accumulate more changes
=> useful for detecting gradual transitions
Observe the pixel-wise difference for different frame pairs
– Peaks @ cuts, and plateaus @ gradual transitions
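The effect of comparing frames N apart can be seen on a toy sequence containing one dissolve. With N = 1 each difference is small; with N larger than the transition length the difference sequence rises to a plateau. The sequence, the frame size, and the helper names below are assumptions made for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two flattened DC frames."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def diff_sequence(frames, n):
    """Difference between every frame t and frame t + n."""
    return [mean_abs_diff(frames[t], frames[t + n])
            for t in range(len(frames) - n)]


# Two-pixel DC frames: 6 frames of shot A (level 0), a dissolve through
# levels 25, 50, 75, then 6 frames of shot B (level 100).
levels = [0] * 6 + [25, 50, 75] + [100] * 6
frames = [[level, level] for level in levels]

adjacent = diff_sequence(frames, 1)
spaced = diff_sequence(frames, 6)
print(adjacent)  # never exceeds 25: easy to miss with a cut threshold
print(spaced)    # ramps up to a plateau at 100, then ramps down
```

An abrupt cut would instead produce a single tall peak in the adjacent-frame sequence, which is how the two transition types are told apart.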
Figure from Yeo-Liu CSVT'95 paper
Scene Change Detection (cont’d)
– Identify candidate places for gradual transitions
– Can further explore the linearity in DC frames
=> Helps differentiate gradual transitions from motion
Figure from Yeo-Liu
CSVT’95 paper
Summary on Video Temporal Segmentation
A first step toward video content understanding
Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transition: Fade out and Fade in; Dissolve; Wipe
Detecting transitions: can be done on “DC images”
w/o full decompression
– Detecting a cut is relatively easy ~ check frame-wise difference
– Detecting dissolve and fade by checking linearity
f(t) = f0 (1 – t/T) + f1 (t/T), for 0 ≤ t ≤ T
– Detecting wipe ~ more difficult
exploit transition patterns, or linearity of color histogram
Video Communications
MM + Data Comm. = Effective MM Communications?
Multimedia vs. Generic Data
– Perceptual no-difference vs. Bit-by-bit accuracy
– Unequal importance within multimedia data
– High data volume and real-time requirements
Need to consider the interplay between source coding and
transmission and make use of MM specific properties
E.g. wireless video needs a “good” compression algorithm to:
– Support scalable video compression rates (from 10 to several
hundred kbps)
– Be robust to the transmission errors and channel impairments
– Minimize end-to-end delay
– Handle missing frames intelligently
(From D. Lun @ HK PolyUniv. Short Course 6/01)
Error-Resilient Coding with Localized Synch Marker
To reduce error propagation
H.263 encoder
H.263 decoder
H.263 with FRM
H.263 with LRM
Issues in Video Communications/Streaming
Source coding aspects
Rate-Distortion tradeoff and bit allocation in R-D optimal sense
Scalable coding and Fine Granular Scalability (FGS)
Multiple description coding
Error resilient source coding
Channel coding aspects ~ see ENEE626 for general theory
– Unequal Error Protection (UEP) channel codes
– Embedded modulation for achieving UEP
Joint source-channel approaches
– Jointly select source and channel coding parameters to optimize
end-to-end distortion
– Wisely map source codewords to channel symbols
– Take advantage of channel’s non-uniform characteristics for UEP
Bandwidth resource determination, allocation & adaptation
Reading References
Video temporal segmentation for content analysis
– Yeo-Liu CSVT 12/1995 paper (DC-image & scene change detection)
– Joyce-Liu TMM 2006 paper (Wipe detection)
Video communications
– Wang’s video textbook: Chapter 14, 15.
– Woods’ book: Chapter 12
