ENEE631 Spring’09
Lecture 19 (4/13/2009)
Video Content Analysis and Streaming
Spring ’09 Instructor: Min Wu
Electrical and Computer Engineering Department,
University of Maryland, College Park
 bb.eng.umd.edu (select ENEE631 S’09)
 minwu@eng.umd.edu
Overview and Logistics

Last Time:
– General methodologies on motion analysis
– Optical flow equations

Today:
– Wrap up motion analysis
– Video content analysis
   Basic framework
   Temporal segmentation; compressed-domain processing
– A quick guide on video communications
Review: Optical Flow Equation

Orthogonal decomposition of the flow vector v
– Projection along the “normal direction” ~ v_n
   i.e., along the direction of the image gradient ∇f
– Projection along the tangent direction ~ v_t
   i.e., along the direction orthogonal to the image gradient ∇f

The O.F.E. written in terms of the normal component:

   ||∇f|| · v_n + ∂f/∂t = 0   ⇒   v_n = − (∂f/∂t) / ||∇f||

[Figure: flow vector decomposed into normal and tangent directions; from Wang’s Preprint Fig. 6.2]
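As a concrete illustration of the relation above, here is a minimal NumPy sketch (my own, not code from the course materials; the function name and the gradient-magnitude threshold are illustrative) that computes the normal-flow component from two consecutive grayscale frames:

```python
import numpy as np

def normal_flow(frame_prev, frame_cur, eps=1e-3):
    """Estimate the normal-direction flow component v_n = -f_t / ||grad f||.

    frame_prev, frame_cur: 2-D arrays (consecutive grayscale frames).
    Returns v_n (same shape), set to 0 where the spatial gradient is too small.
    """
    # Spatial gradients of the current frame (central differences)
    fy, fx = np.gradient(frame_cur.astype(np.float64))
    # Temporal gradient (difference between the two frames)
    ft = frame_cur.astype(np.float64) - frame_prev.astype(np.float64)

    grad_mag = np.sqrt(fx**2 + fy**2)
    vn = np.zeros_like(grad_mag)
    mask = grad_mag > eps                      # reliable only where ||grad f|| is non-negligible
    vn[mask] = -ft[mask] / grad_mag[mask]      # O.F.E. projected onto the gradient direction
    return vn
```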
Ambiguity in Motion Estimation

One equation for two unknowns
– Tangent-direction component of the motion vector is undetermined
– “Aperture problem”
   [Figure from Wang’s Preprint Fig. 6.3]
   Aperture ~ small window over which to apply the constant-intensity assumption
   MV can be estimated only if the aperture contains 2+ different gradient directions (e.g., corners)
– Usually need additional constraints (see the least-squares sketch below)
   Spatial smoothness of the motion field

 Indeterminate motion vector over a constant region (||∇f|| = 0)
– Reliable motion estimation only for regions with brightness variations (e.g., edges or non-flat textures)
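One common way to supply the missing constraint is to assume a single motion vector over a small window and solve the resulting over-determined set of optical-flow constraints by least squares. The sketch below is my own Lucas-Kanade-style illustration (the conditioning threshold and interface are assumptions); the 2×2 normal matrix is well conditioned only when the window contains two or more distinct gradient directions, which is exactly the corner condition stated above.

```python
import numpy as np

def window_motion(fx, fy, ft, cond_thresh=1e-2):
    """Least-squares motion vector for one window from optical-flow constraints.

    fx, fy, ft: spatial/temporal gradients sampled inside the window.
    Returns (vx, vy), or None when the aperture problem makes the system singular.
    """
    A = np.stack([fx.ravel(), fy.ravel()], axis=1)   # each row: [fx_i, fy_i]
    b = -ft.ravel()
    G = A.T @ A                                      # 2x2 normal matrix
    # Smallest eigenvalue ~ 0 means only one gradient direction (edge or flat patch)
    eigvals = np.linalg.eigvalsh(G)
    if eigvals[0] < cond_thresh * max(eigvals[1], 1e-12):
        return None                                  # aperture problem: tangent component undetermined
    v = np.linalg.solve(G, A.T @ b)
    return v[0], v[1]                                # (vx, vy)
```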
General Methodologies for Motion Estimation

Two categories: feature-based vs. intensity-based estimation

 Feature-based
– Step 1: establish correspondences between feature pairs
– Step 2: estimate the parameters of a chosen motion model by least-squares fitting of the correspondences (a least-squares sketch follows this list)
– Good for global/camera motion describable by parametric models
   Common models: affine, projective, … (Wang Sec. 5.5.2–5.5.4)
   Applications: image mosaicing, synthesis of multiple views

 Intensity-based
– Apply the optical flow equation (or a variation of it) to local regions
– Good for non-simple motion and multiple objects
– Applications: video coding, motion prediction and filtering
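For the feature-based path, the least-squares fit of step 2 under an affine model x' = a1·x + a2·y + a3, y' = a4·x + a5·y + a6 can be sketched as follows (my own illustration; the function name and array layout are assumptions):

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine motion model from feature correspondences.

    src_pts, dst_pts: (N, 2) arrays of matched (x, y) positions, N >= 3.
    Returns (A, t) such that x' ~ A @ x + t.
    """
    N = src_pts.shape[0]
    # Each correspondence contributes two linear equations in the 6 parameters
    M = np.zeros((2 * N, 6))
    rhs = dst_pts.reshape(-1)       # [x1', y1', x2', y2', ...]
    M[0::2, 0:2] = src_pts          # x' = a1*x + a2*y + a3
    M[0::2, 2] = 1.0
    M[1::2, 3:5] = src_pts          # y' = a4*x + a5*y + a6
    M[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    t = np.array([p[2], p[5]])
    return A, t
```

In practice the fit is usually made robust to mismatched feature pairs (e.g., by discarding correspondences with large residuals and refitting).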
Motion Estimation Criteria

Criterion based on displaced frame difference
– E.g. in block matching approach

Criterion based on optical flow equations

Other criteria and considerations
– Smoothness constraints
– Bayesian criterion
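Written out explicitly (these are the standard forms; the notation below is mine rather than copied from the slides), the first two criteria, over a region B with candidate displacement d or flow field v, are:

```latex
% Displaced-frame-difference (DFD) criterion over a region \mathcal{B}
% (p = 1 gives the MAD criterion, p = 2 the MSE criterion)
E_{\mathrm{DFD}}(\mathbf{d}) \;=\; \sum_{\mathbf{x}\in\mathcal{B}}
  \bigl| f(\mathbf{x}+\mathbf{d},\,t) - f(\mathbf{x},\,t-1) \bigr|^{p}

% Criterion based on the optical flow equation, for a flow field \mathbf{v}(\mathbf{x})
E_{\mathrm{OFE}}(\mathbf{v}) \;=\; \sum_{\mathbf{x}\in\mathcal{B}}
  \Bigl| \nabla f(\mathbf{x},t)\cdot\mathbf{v}(\mathbf{x})
         + \tfrac{\partial f}{\partial t}(\mathbf{x},t) \Bigr|^{2}
```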
Commonly Used Optimization Methods

For minimizing the previously defined M.E. error function

Exhaustive search
– MAD often used for computational simplicity
– Guaranteed global optimality at expense of computation complexity
– Fast algorithms for sub-optimal solutions

Gradient-based search (Appendix B of Wang’s book)
– MSE often used for mathematical tractability (differentiable)
– Iterative approach
   refine the estimate along the negative gradient direction of the objective function
– Generally converges to a local optimum
   requires a good initial estimate
– The method used to estimate the gradient also affects accuracy & robustness
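A minimal exhaustive-search sketch with the MAD criterion (my own illustration; the ±7-pixel search range and interface are arbitrary choices, not values prescribed in the lecture):

```python
import numpy as np

def block_match(cur_block, ref_frame, top, left, search_range=7):
    """Exhaustive-search block matching with the MAD criterion.

    cur_block: (B, B) block from the current frame, located at (top, left).
    ref_frame: reference frame (2-D array).
    Returns the motion vector (dy, dx) minimizing the mean absolute difference.
    """
    B = cur_block.shape[0]
    best_mad, best_mv = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + B > ref_frame.shape[0] or c + B > ref_frame.shape[1]:
                continue                            # candidate block falls outside the frame
            cand = ref_frame[r:r + B, c:c + B]
            mad = np.mean(np.abs(cur_block.astype(float) - cand.astype(float)))
            if mad < best_mad:
                best_mad, best_mv = mad, (dy, dx)
    return best_mv, best_mad
```

Fast sub-optimal algorithms visit only a subset of these candidate displacements instead of scanning all of them.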
Various Motion Estimation Approaches

Pixel-based motion estimation (Wang’s sec. 6.3)
– Estimate one MV for every pixel
– Use the relation from the optical flow equation to construct the M.E. criterion
– Add smoothness constraints on the motion field to deal with the aperture problem and avoid poor MV estimates

 Block-matching
– Correlation method (Wang’s sec. 6.4.5)

 Deformable block-matching (Wang’s sec. 6.5)
– Use a richer block-based motion model than the translational model
   e.g., affine/bilinear/projective mapping for each block (sec. 5.5)
   a square block in the current frame matches a non-square block in the reference frame

 Mesh-based motion estimation (Wang’s sec. 6.6)
Video Content Analysis
[Figure from MPEG-7 Document N4031 (March 2001)]
Recall: MPEG-7

“Multimedia Content Description Interface”
– Not a video coding/compression standard like the previous MPEG standards
– Emphasis on how to describe video content for efficient indexing, search, and retrieval

 Standardize the description mechanism for content
– Descriptor, Description Scheme & Description Definition Language
– Commonly used visual descriptors: color, texture, shape, …
Introduction to Video Content Analysis

Teach the computer to “understand” video content
– Define features that the computer can learn to measure and compare
   color (RGB values or other color coordinates)
   motion (magnitude and direction)
   shape (contours)
   texture and patterns
– Give example correspondences so that the computer can learn
   build connections between features & higher-level semantics/concepts
   statistical classification and recognition techniques

 Video understanding
1. Break a video sequence into chunks, each with consistent content ~ “shot”
2. Group similar shots into scenes that represent certain events
3. Describe connections among scenes via story boards or scene graphs
4. Associate each shot/scene with representative features/semantics for future query
Video Understanding (step-1)
[Figure from Yeung-Yeo-Liu: STG (Princeton)]
– Break a video sequence into chunks, each with consistent content ~ “shot”
Video Understanding (step-2)
[Figure from Yeung-Yeo-Liu: STG (Princeton)]
– Group similar shots into scenes
Video Understanding (step-3)
[Figure from Yeung-Yeo-Liu: STG (Princeton)]
– Describe connections among scenes via story boards or scene graphs
Video Temporal Segmentation

A first step toward video content understanding
– Select “key frames” to represent each shot for indexing/retrieval
– Sequence of shot durations as a “signature” for a video

 Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transition: fade-out and fade-in; dissolve; wipe

 Detecting transitions
– Detecting a cut is relatively easier
   check the frame-wise difference
– Detect dissolves and fades by checking linearity
   f(t) = f0 · (1 − t/T) + f1 · (t/T),  0 ≤ t ≤ T
– Detecting a wipe ~ more difficult
   exploit transition patterns, or the linearity of the color histogram (a sketch of the linearity test follows below)
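As a rough sketch of the linearity test for dissolves and fades (my own illustration, not the detector from the cited papers): under f(t) = f0·(1 − t/T) + f1·(t/T), consecutive frame differences are constant, so the second temporal difference stays near zero while the total change across the window is large.

```python
import numpy as np

def dissolve_score(frames):
    """Linearity score for a candidate dissolve/fade window.

    frames: list of 2-D arrays covering the candidate transition.
    Returns (accumulated_change, nonlinearity); a dissolve shows a large
    accumulated change but a small second-difference (nonlinearity) ratio.
    """
    F = np.stack([f.astype(np.float64) for f in frames])    # shape (T, H, W)
    d1 = np.diff(F, n=1, axis=0)                             # first temporal difference
    d2 = np.diff(F, n=2, axis=0)                             # second temporal difference
    accumulated_change = np.abs(F[-1] - F[0]).mean()
    nonlinearity = np.abs(d2).mean() / (np.abs(d1).mean() + 1e-9)
    return accumulated_change, nonlinearity
```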
Detect Dissolve via Linearity in Pixel Changes
[Figure: frames during a dissolve trace a straight line between g_k and h_k in (Pixel 1, Pixel 2, Pixel 3) space; from talks by Joyce-Liu (Princeton)]

Dissolve: a linear combination of g and h

Detect straight lines in DC frame space
– correlation detection on triplets
Examples of Wipes
Wipe Detection (1)
– Convert the 2-D problem to 1-D by projection
   A common strategy in feature extraction and analysis in image processing
– Perform horizontal, vertical, and diagonal projections to detect diverse wipe types (a projection sketch follows below)
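A minimal sketch of the projection idea (mine; the details are illustrative): project the absolute inter-frame difference onto the rows and columns; during a vertical or horizontal wipe the change concentrates in a narrow band of the 1-D profile that sweeps across the frame over time. A diagonal projection can be formed the same way by summing along diagonals.

```python
import numpy as np

def difference_projections(frame_prev, frame_cur):
    """Project the absolute inter-frame difference onto the two axes.

    Returns (row_proj, col_proj): 1-D profiles whose moving peak over time
    indicates a vertical or horizontal wipe, respectively.
    """
    diff = np.abs(frame_cur.astype(np.float64) - frame_prev.astype(np.float64))
    row_proj = diff.sum(axis=1)   # horizontal projection: one value per row
    col_proj = diff.sum(axis=0)   # vertical projection: one value per column
    return row_proj, col_proj
```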
Review: Color Histogram

Generalize from luminance histogram

What is a color histogram?
– Count the # of pixels with the same color
– Plot color-value vs. corresponding pixel#

Gives an idea of the dominant color and the color distribution
– Ignore the exact spatial location of each color value
– Useful in image and video analysis

Color histogram can be used to:
– Detect gradual shot transition esp. for fancy wipes
– Measure content similarity between images / video shots
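A minimal sketch of a coarse joint RGB histogram and a histogram-intersection similarity (the 8-bins-per-channel quantization and the intersection measure are illustrative choices, not a prescribed descriptor):

```python
import numpy as np

def color_histogram(img, bins_per_channel=8):
    """Coarse joint RGB histogram, normalized to sum to 1.

    img: (H, W, 3) uint8 image. Spatial positions are ignored by design.
    """
    q = (img.astype(np.int32) * bins_per_channel) // 256         # quantize each channel
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel**3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; close to 1 for visually similar frames/shots."""
    return np.minimum(h1, h2).sum()
```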
Wipe Detection (2)

Diverse and fancy wipes

 Linear change in the color histogram
[Figure: during a wipe, the color histograms trace a straight line from G_k to H_k in (Bin 1, Bin 2, Bin 3) space; from talks by Joyce-Liu (Princeton)]
Ref: Joyce & Liu, IEEE Trans. Multimedia, 2006.
Types of Transitions
[Figure: transition types offered by Adobe Premiere; from talks by Joyce-Liu (Princeton)]
– See also the transition demos provided by PowerPoint
– Video transition collection (Dr. Rob Joyce)
Compressed-Domain Processing

Does video analysis have to decompress the whole video?

Use I & P frames only to reduce computation and enhance
robustness in scene change detection
… I BB P BB P BB P BB I BB P …

Working in compressed domain
– Process video by only doing partial decoding (inverse VLC,
etc.) without a full decoding (IDCT) to save computation
– Low-resolution version provides enough info for transition
detection
=> “DC-image”
DC Image
– Put the DC of each block together
– Already contains most of the information in the video
[Figure: full frame vs. its DC frame; example from Joyce-Liu (Princeton)]
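For intuition (a sketch of my own): when the pixels are available, the DC image is simply the per-block mean, since the DC coefficient of an 8×8 orthonormal DCT is 8× the block mean; the point of the next slide is that essentially the same values can be obtained from the compressed stream without an IDCT.

```python
import numpy as np

def dc_image(frame, block=8):
    """Build the DC image: one value per 8x8 block (the block mean).

    frame: 2-D array with dimensions divisible by `block`.
    Returns an array of shape (H//block, W//block).
    """
    H, W = frame.shape
    blocks = frame.astype(np.float64).reshape(H // block, block, W // block, block)
    return blocks.mean(axis=(1, 3))    # spatial average of each block ~ its DCT DC term (up to scaling)
```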
Fast Extraction of DC Image From MPEG-1

I frame
– Put together DC coeff. from each block (and apply proper scaling)

Predictive (P/B) frame
– Fast approximation of the reference block’s DC
– Add the DC of the motion-compensation residue
   recall that the DCT is a linear transform
   see the Yeo-Liu paper for more derivations of the approximations (DC; DC + 2 AC)
[Figure: the motion-compensated reference block P_ref overlaps four 8×8 blocks P_1 … P_4, with overlap heights h_i and widths w_i (row offset R, column offset C)]

   [DCT(P_ref)]_00 ≈ Σ_{i=1..4} (h_i · w_i / 64) · [DCT(P_i)]_00

   [DCT(P_cur)]_00 = [DCT(P_ref)]_00 + [DCT(P_diff)]_00
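A sketch of the approximation above in code (my own rendering of the formula; the array conventions, integer motion vector, and omission of frame-boundary handling are assumptions): the reference block's DC is the area-weighted average of the DCs of the up-to-four 8×8 blocks it overlaps, and the current block's DC adds the DC of the coded residue.

```python
import numpy as np

def approx_cur_dc(dc_ref, block_row, block_col, mv, dc_residue, block=8):
    """Approximate the DC of a motion-compensated block from the reference DC image.

    dc_ref:        DC image of the reference frame, shape (H//8, W//8).
    block_row/col: block indices of the current block.
    mv:            (dy, dx) motion vector in pixels (integer-pel for this sketch,
                   assumed to keep the reference block inside the frame).
    dc_residue:    DC of the coded prediction residue for this block.
    """
    dy, dx = mv
    top = block_row * block + dy          # top-left pixel of the reference block
    left = block_col * block + dx
    r0, c0 = top // block, left // block  # index of the upper-left overlapped 8x8 block
    oy, ox = top % block, left % block    # offsets of the reference block inside it

    dc_pred = 0.0
    for i, h in ((0, block - oy), (1, oy)):        # vertical overlap heights h_i
        for j, w in ((0, block - ox), (1, ox)):    # horizontal overlap widths w_i
            if h > 0 and w > 0:
                dc_pred += (h * w) / (block * block) * dc_ref[r0 + i, c0 + j]
    return dc_pred + dc_residue           # DC(cur) ~ DC(ref) + DC(residue)
```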
Compressed-Domain Scene Change Detection

Compare nearby frames
– Take the pixel-wise difference of nearby DC frames
– Or take the pixel-wise difference every N frames to accumulate more change
   => useful for detecting gradual transitions

Observe the pixel-wise difference for different frame pairs
– Peaks @ cuts, and plateaus @ gradual transitions
[Figure from Yeo-Liu CSVT’95 paper]
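A sketch of a simple decision rule on the DC-frame difference sequence (the parameters are illustrative and not taken from the Yeo-Liu paper): isolated sharp peaks flag cuts, while a sustained plateau of moderate differences marks a candidate gradual transition.

```python
import numpy as np

def detect_cuts(dc_frames, cut_ratio=3.0):
    """Flag cut candidates from pixel-wise differences of consecutive DC frames.

    dc_frames: list of DC images (2-D arrays) for consecutive frames.
    A frame difference is a cut candidate when it dominates its neighbors'.
    """
    diffs = np.array([np.abs(b.astype(float) - a.astype(float)).mean()
                      for a, b in zip(dc_frames[:-1], dc_frames[1:])])
    cuts = []
    for k in range(1, len(diffs) - 1):
        neighbors = max(diffs[k - 1], diffs[k + 1], 1e-9)
        if diffs[k] > cut_ratio * neighbors:        # sharp isolated peak => cut
            cuts.append(k + 1)                      # new shot starts at frame k+1
    return cuts, diffs                              # plateaus in `diffs` suggest gradual transitions
```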
Scene Change Detection (cont’d)
– Identify candidate locations for gradual transitions
– Can further explore the linearity in DC frames
   => helps differentiate gradual transitions from motion
[Figure from Yeo-Liu CSVT’95 paper]
Summary on Video Temporal Segmentation

A first step toward video content understanding

Two types of transitions
– “Cut” ~ abrupt transition
– Gradual transition: Fade out and Fade in; Dissolve; Wipe

Detecting transitions: can be done on “DC images” w/o full decompression
– Detecting a cut is relatively easier ~ check the frame-wise difference
– Detect dissolves and fades by checking linearity
   f(t) = f0 · (1 − t/T) + f1 · (t/T),  0 ≤ t ≤ T
– Detecting a wipe ~ more difficult
   exploit transition patterns, or the linearity of the color histogram
Video Communications
MM + Data Comm. = Effective MM Communications?

Multimedia vs. Generic Data
– Perceptual no-difference vs. Bit-by-bit accuracy
– Unequal importance within multimedia data
– High data volume and real-time requirements

Need to consider the interplay between source coding and transmission, and make use of MM-specific properties

 E.g., wireless video needs a “good” compression algorithm to:
– Support a scalable video compression rate (from 10 to several hundred kbps)
– Be robust to transmission errors and channel impairments
– Minimize end-to-end delay
– Handle missing frames intelligently
(From D. Lun @ HK PolyUniv. Short Course 6/01)
Error-Resilient Coding with Localized Synch Marker

To reduce error propagation
[Block diagram with the following blocks: input sequence, H.263 encoder, random noise (channel), MB error detection, H.263 decoder, error concealment, LRM, output sequence]
[Figure: decoded results compared for H.263 with FRM vs. H.263 with LRM]
Issues in Video Communications/Streaming

Source coding aspects
– Rate-distortion tradeoff and bit allocation in an R-D optimal sense
– Scalable coding and Fine Granular Scalability (FGS)
– Multiple description coding
– Error-resilient source coding

 Channel coding aspects ~ see ENEE626 for general theory
– Unequal Error Protection (UEP) channel codes
– Embedded modulation for achieving UEP

Joint source-channel approaches
– Jointly select source and channel coding parameters to optimize
end-to-end distortion
– Wisely map source codewords to channel symbols
– Take advantage of channel’s non-uniform characteristics for UEP

Bandwidth resource determination, allocation & adaptation
Reading References

Video temporal segmentation for content analysis
– Yeo-Liu CSVT 12/1995 paper (DC-image & scene change detection)
– Joyce-Liu TMM 2006 paper (Wipe detection)

Video communications
– Wang’s video textbook: Chapters 14, 15
– Woods’ book: Chapter 12