video_coding2

Roadmap
Introduction
Intra-frame coding
Inter-frame coding
Object-based and scalable video coding*
– Why object-based?
motion segmentation, shape coding, R-D optimization
– scalability issues
Spatial/temporal/quality scalabilities
EE569 Digital Video Processing
Object-based Video Coding
Waveform-based coding discussed so far uses a simple source model
(e.g., H.261/263/264, MPEG-1/-2)
– Does not consider the semantic content (e.g. objects and their shape)
of the video
Object-based video coding identifies objects (or regions) in a
video and encodes them. Potential benefits may include
– Improved coding efficiency
– Improved visual quality (e.g., no blocking artifacts)
– Content description
– Content-based interactivity
Also called “content-dependent video coding”
– The buzzword for MPEG-4, but less successful than expected (so the
important question is why it does not work so well)
Essential Tasks in Object-based
Video Coding
Object/region segmentation
– Separate pixels based on their color, texture, motion
characteristics
– Closely related to motion detection and segmentation
– Intrinsically ill-defined and desperate for a breakthrough
2D shape modeling and coding
– Not all shapes are equally probable
– Subtle implications for video coding (hidden pitfalls)
2D texture modeling and coding
– Extension of existing block-based MCP into region-based
– Deformable textures (tradeoff between spatial and temporal
prediction)
Object/Region Segmentation
The major challenge in content/object-based
coding
Common approaches for segmentation in a still
image: gray-level thresholding, clustering, edge
detection, region growing, splitting and
merging
Object segmentation in video
– Motion information can be utilized, but how?
– Should we rely more on motion cues or on spatial cues?
Motion-based Segmentation
Motion-based segmentation: to segment an
image using motion information
– We can first estimate the motion field and then
segment the motion field
– However, estimation and segmentation are like
two sides of the same coin
A Mind-bothering Example
Frame 1
Frame 2
It is easy to convince yourself that the tree branches are moving.
But how do we know the sky is still? What if it were also moving
at the same speed? We would observe the same intensity patterns,
because the sky is a smooth region.
Implications for Video Coding
True motion representation might be useful to
computer vision and motion perception, but it is not
indispensable in video coding
The fundamental reason lies in the relationship
between motion representation and video coding:
how to tolerate the uncertainty in motion?
The same issue remains in object-based image
coding: how to tolerate the uncertainty in shape? (we
will discuss this in more detail later)
Simplified Segmentation: Change
Detection
To detect the changing parts in a video, from time ti to
time tj, we compute a difference image and threshold
the difference by T:
$$ d_{ij}(x, y) = \begin{cases} 1 & \text{if } |f(x, y, t_i) - f(x, y, t_j)| > T \\ 0 & \text{otherwise} \end{cases} $$
where f(x, y, ti) and f(x, y, tj) are the frames at times ti and tj
dij(x,y) can be further processed, e.g., to remove isolated
1’s, or to group 1’s that are close by to each other
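The thresholded difference and the removal of isolated 1's can be sketched in a few lines. This is a minimal illustration, assuming frames are small nested lists of gray levels and that "isolated" means no changed pixel among the 8 neighbours; the function names and the toy frames are hypothetical.

```python
def change_mask(frame_i, frame_j, threshold):
    """Return d_ij: 1 where |f(x,y,t_i) - f(x,y,t_j)| > threshold, else 0."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_i, row_j)]
        for row_i, row_j in zip(frame_i, frame_j)
    ]


def remove_isolated_ones(mask):
    """Clear every 1 that has no other 1 among its 8 neighbours (assumed rule)."""
    rows, cols = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    for y in range(rows):
        for x in range(cols):
            if mask[y][x] == 0:
                continue
            neighbours = sum(
                mask[yy][xx]
                for yy in range(max(0, y - 1), min(rows, y + 2))
                for xx in range(max(0, x - 1), min(cols, x + 2))
                if (yy, xx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


# Toy frames: a two-pixel change (kept) and a single-pixel change (removed).
frame_1 = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame_2 = [
    [10, 90, 95, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 80],
]
mask = change_mask(frame_1, frame_2, threshold=20)
cleaned = remove_isolated_ones(mask)
```

The threshold T trades missed changes against noise; the cleanup step only removes single-pixel detections and leaves connected groups untouched.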
Change Detection: Pros and Cons
Simple to implement; fast
Detects all changes
Detects even unwanted changes
Positive and negative changes detected
(occlusion)
Difficult to quantify motion
Requires a static reference frame
Change Detection: An Example
Monitor the traffic
Without a Static Reference Frame
Background extraction methods
– Ad-hoc median detector (your CA#6)
– To eliminate the impact of (small) moving
objects, use the “robust estimator” approach
to iteratively remove the outliers
– More sophisticated approaches involve the
modeling of background by mixture of
Gaussian distributions and graph-cut based
optimization
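The median idea can be illustrated with a per-pixel temporal median. This is only a sketch, assuming a short stack of gray-level frames stored as nested lists and a moving object that covers each pixel in fewer than half of the frames; it is not the robust-estimator or mixture-of-Gaussians approach, and the names are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over a list of equally sized frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(cols)]
        for y in range(rows)
    ]


# A bright object (200) slides across a flat background (50).
frames = [
    [[200, 50, 50, 50]],
    [[50, 200, 50, 50]],
    [[50, 50, 200, 50]],
    [[50, 50, 50, 200]],
    [[50, 50, 50, 50]],
]
background = median_background(frames)
```

Because the object occupies each pixel in only one of five frames, the median ignores it and returns the background value.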
Simplified Segmentation: Global
Motion Estimation
Planar homography (feature-based)
– Homogeneous coordinates
– Conditions for planar homography
– Homography estimation from feature
correspondence
Hierarchical model-based GME (feature-less)
– Directly minimize an energy function (the MSE of
MCP errors)
– Solve the optimization problem in a coarse-to-fine
fashion (more robust and efficient)
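Applying a planar homography in homogeneous coordinates amounts to a 3x3 matrix product followed by division by the third coordinate. The sketch below assumes a matrix H given as nested lists; the function name and the example matrices are hypothetical.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 matrix H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


# A pure translation is the special case with last row (0, 0, 1).
translation = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
moved = apply_homography(translation, 3, 4)

# A non-trivial last row produces the perspective division.
projective = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
warped = apply_homography(projective, 2, 2)
```

Estimating H from feature correspondences is the inverse problem and needs at least four point pairs; that step is not shown here.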
Plane Homography
Model-based GME
Target function for minimization
Solution: Gauss-Newton method
where
Bergen, J. R., Anandan, P., Hanna, K. J., and Hingorani, R. “Hierarchical Model-Based Motion
Estimation.” In Proc. of the Second European Conference on Computer Vision, pp. 237-252, 1992
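As a sketch of what such a target function and its Gauss-Newton solution typically look like: the symbols below are assumptions made for illustration (frames I_1 and I_2, a parametric warp w(x; a) with parameter vector a, residual r), not notation taken from the cited paper.

```latex
E(\mathbf{a}) = \sum_{\mathbf{x}} r(\mathbf{x};\mathbf{a})^{2},
\qquad
r(\mathbf{x};\mathbf{a}) = I_2\big(\mathbf{w}(\mathbf{x};\mathbf{a})\big) - I_1(\mathbf{x})

\mathbf{a} \leftarrow \mathbf{a} + \Delta\mathbf{a},
\qquad
\Delta\mathbf{a} = -\Big(\sum_{\mathbf{x}} \mathbf{g}\,\mathbf{g}^{T}\Big)^{-1} \sum_{\mathbf{x}} \mathbf{g}\, r,
\qquad
\mathbf{g}(\mathbf{x}) = \frac{\partial r(\mathbf{x};\mathbf{a})}{\partial \mathbf{a}}
```

In a coarse-to-fine scheme the same update is iterated at each pyramid level, with the estimate from the coarser level used as the starting point for the finer one.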
Multi-resolution GME
Numerical Example
Summary for Change Detection and
Global Motion Estimation
Motion segmentation becomes relatively easy
when either the camera is still or the
background objects lie on a plane
Recent advances include joint motion
segmentation and estimation using level-set
methods (PDE-based formulation)
Mansouri, A.-R.; Konrad, J., "Multiple motion segmentation with level sets," Image Processing,
IEEE Transactions on , vol.12, no.2, pp. 201-220, Feb 2003
2-D Shape Modeling and Coding
Bitmap coding: a binary map specifying whether
or not a pixel belongs to an object
– A special case of the general alpha-map
Contour coding: code only the contour of the
object or the region
– Chain codes
– Polygon approximation
– Spline approximation
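Of the contour coders listed, the chain code is the easiest to show in code. The sketch below assumes the common 8-direction (Freeman) numbering, counter-clockwise from east, with points given as (row, column) and rows increasing downward; that convention and the function names are assumptions.

```python
# 8-direction codes, counter-clockwise from east.
# Keys are (d_row, d_col) steps, with rows increasing downward (assumed convention).
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(contour):
    """Encode a list of 8-connected contour points as direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def decode_chain(start, codes):
    """Rebuild the contour points from the start point and the codes."""
    offsets = {code: step for step, code in DIRECTIONS.items()}
    points = [start]
    for code in codes:
        d_row, d_col = offsets[code]
        row, col = points[-1]
        points.append((row + d_row, col + d_col))
    return points


# Closed contour of a 3x3 square, traversed clockwise on screen.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
codes = chain_code(square)
```

Only the start point and one 3-bit symbol per contour step need to be stored, and runs of repeated symbols compress well with entropy coding.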
Image Matting (Soft segmentation)
$$ X(i, j) = \alpha(i, j) F(i, j) + [1 - \alpha(i, j)] B(i, j), \quad 0 \le \alpha(i, j) \le 1 $$
Not for coding but for interactive editing
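The matting equation is a per-pixel blend of a foreground F and a background B. A minimal sketch, assuming all three inputs are equally sized nested lists; the function name and the one-row example are hypothetical.

```python
def composite(alpha, foreground, background):
    """X = alpha * F + (1 - alpha) * B, evaluated per pixel."""
    result = []
    for a_row, f_row, b_row in zip(alpha, foreground, background):
        row = []
        for a, f, b in zip(a_row, f_row, b_row):
            if not 0.0 <= a <= 1.0:
                raise ValueError("alpha must lie in [0, 1]")
            row.append(a * f + (1.0 - a) * b)
        result.append(row)
    return result


# alpha = 0 shows the background, alpha = 1 the foreground, 0.5 an even mix.
alpha = [[0.0, 0.5, 1.0]]
foreground = [[200, 200, 200]]
background = [[100, 100, 100]]
blended = composite(alpha, foreground, background)
```

A binary alpha map (only 0 and 1) reduces this to the bitmap shape representation used in object-based coding.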
2-D Texture Modeling and Coding*
Shape-adaptive DCT
Shape-adaptive wavelet transform
Roadmap
Introduction
Intra-frame coding
– Review of JPEG
Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)
Scalable video coding
– 3D subband/wavelet coding and recent trend
Scalable vs. Multicast
What is scalable coding?
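Multicast: foreman.yuv is encoded into one bitstream per rate
(foreman128k.cod, foreman256k.cod, foreman512k.cod, foreman1024k.cod)
Scalable coding: foreman.yuv is encoded into a single bitstream
(foreman.cod) that is decodable at each of those rates (128, 256, 512, 1024)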
Spatial scalability
1 0 1 1 1 …0 1 0 1 0 0 0 …1 1 0 1 0 0
Temporal scalability
1 0 1 1 1 …0 1 0 1 0 0 0 …1 1 0 1 0 0
Frames 0, 4, 8, 12, …: 7.5 Hz
Frames 0, 2, 4, 6, 8, …: 15 Hz
Frames 0, 1, 2, 3, 4, 5, …: 30 Hz
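The three frame rates follow a dyadic pattern: each additional temporal layer doubles the frame rate. A small sketch, assuming three temporal layers and a 30 Hz full rate as in the slide; the function names are hypothetical.

```python
def frames_for_layers(num_frames, num_layers, total_layers=3):
    """Indices of the frames decodable from the first num_layers temporal layers."""
    if not 1 <= num_layers <= total_layers:
        raise ValueError("num_layers must be between 1 and total_layers")
    step = 2 ** (total_layers - num_layers)
    return list(range(0, num_frames, step))


def frame_rate(full_rate_hz, num_layers, total_layers=3):
    """Frame rate obtained when only num_layers temporal layers are decoded."""
    if not 1 <= num_layers <= total_layers:
        raise ValueError("num_layers must be between 1 and total_layers")
    return full_rate_hz / 2 ** (total_layers - num_layers)


base_frames = frames_for_layers(13, 1)      # every fourth frame
middle_frames = frames_for_layers(13, 2)    # every second frame
all_frames = frames_for_layers(13, 3)       # every frame
```

A decoder that stops after the first layer simply never sees the frames carried by the higher layers, so no transcoding is needed to lower the frame rate.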
SNR (Rate) scalability
1 0 1 1 1 …0 1 0 1 0 0 0 …1 1 0 1 0 0
PSNRavg = 30 dB, 35 dB, 40 dB
$$ \mathrm{PSNR}_{avg} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PSNR}_i $$
PSNRi: PSNR of frame i
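The average is taken over per-frame PSNR values in dB. The sketch below assumes 8-bit video (peak value 255) and per-frame mean squared errors as input; both function names are hypothetical.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for one frame, given its mean squared error (8-bit peak assumed)."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


def average_psnr(per_frame_psnr):
    """PSNRavg: arithmetic mean of the per-frame PSNR values."""
    return sum(per_frame_psnr) / len(per_frame_psnr)


example_avg = average_psnr([30.0, 35.0, 40.0])
```

Note that averaging PSNR over frames is not the same as computing one PSNR from the average MSE; the two differ whenever frame quality varies.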
Scalability via Bit-Plane Coding
$$ A = \pm\left(a_0 + a_1 2 + a_2 2^2 + \cdots + a_7 2^7\right) $$
with a sign bit, a0 the Least Significant Bit (LSB), and a7 the Most Significant Bit (MSB)
Example
A = 129: sign = +, a0a1a2…a7 = 10000001
sign = −, a0a1a2…a7 = 00110011: A = −(4+8+64+128) = −204
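Bit-plane coding gives scalability because the planes can be sent from the MSB down, and a decoder can stop after any plane. The sketch below reproduces the two examples and then decodes from a truncated set of planes; the function names and the truncation behaviour (missing planes read as 0) are assumptions.

```python
def to_bit_planes(value, num_bits=8):
    """Return (sign, bits) with bits[k] = a_k, so bits[0] is the LSB."""
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    if magnitude >= 1 << num_bits:
        raise ValueError("magnitude does not fit in num_bits")
    bits = [(magnitude >> k) & 1 for k in range(num_bits)]
    return sign, bits


def from_bit_planes(sign, bits, planes_received=None):
    """Rebuild A from the planes_received most significant planes (all by default)."""
    num_bits = len(bits)
    if planes_received is None:
        planes_received = num_bits
    kept = range(num_bits - planes_received, num_bits)
    return sign * sum(bits[k] << k for k in kept)


sign_a, bits_a = to_bit_planes(129)
sign_b, bits_b = to_bit_planes(-204)
coarse = from_bit_planes(sign_b, bits_b, planes_received=2)   # only a7 and a6
finer = from_bit_planes(sign_b, bits_b, planes_received=5)    # a7 down to a3
exact = from_bit_planes(sign_b, bits_b)
```

Each extra plane halves the worst-case error of the reconstruction, which is what makes the bitstream embedded.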
Why Is DPCM Bad for Scalability?
Frame number          1       2   3   …
Base layer            Ibase   P   P   P
Enhancement Layer 1   Ienh1   P   P   P
Enhancement Layer 2   Ienh2   P   P   P
suffer from drifting problem
suffer from coding efficiency loss
Fine Granular Scalability (FGS)
Base layer: 20 kbps
Enhancement layer: variable bit-rate
Efficiency gap: ~2 dB between H.264 with and without the FGS option
Foreman sequence (5 fps)
3D Wavelet/Subband Coding
Video as a 3D volume with axes x, y, and t
2D spatial WT + 1D temporal WT
Wavelet Video Coder
Original video frames 0-7
→ Temporal Wavelet Transform (temporal subbands H, LH, LLH, LLL)
→ Spatial Wavelet Transform
→ Embedded Quantization & Entropy Coding
[Taubman & Zakhor, 1994] [Ohm, 1994]
[Choi & Woods, 1999] [Hsiang & Woods, VCIP ’99] . . . and others
Motion-Adaptive 3D Wavelet Transform
Recall Haar transform
$$ s(n) = \frac{1}{2}\left(x(2n) + x(2n+1)\right), \quad d(n) = x(2n) - x(2n+1) $$
Lifting-based implementation
$$ d(n) = x(2n) - x(2n+1), \quad s(n) = x(2n) - \frac{1}{2} d(n) $$
Motion-adaptive Haar transform
$$ d^n = f^{2n} - W[f^{2n+1}], \quad s^n = f^{2n} - \frac{1}{2} W^{-1}[d^n] $$
W, W^{-1}: forward and backward motion vector
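The lifting form of the Haar transform can be checked numerically. The sketch below uses a predict step d(n) = x(2n) − x(2n+1) followed by an update step s(n) = x(2n) − d(n)/2; this sign convention is an assumption, chosen so that s(n) equals the average of each pair. No motion compensation is included, which corresponds to W being the identity.

```python
def haar_lifting_forward(x):
    """Predict then update: returns (s, d) with s the pair averages."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    half = len(x) // 2
    d = [x[2 * n] - x[2 * n + 1] for n in range(half)]   # predict step
    s = [x[2 * n] - d[n] / 2 for n in range(half)]       # update step
    return s, d


def haar_lifting_inverse(s, d):
    """Undo the update, then the predict step, to recover the signal."""
    x = []
    for s_n, d_n in zip(s, d):
        even = s_n + d_n / 2
        odd = even - d_n
        x.extend([even, odd])
    return x


signal = [4, 2, 7, 7, 1, 5]
low, high = haar_lifting_forward(signal)
restored = haar_lifting_inverse(low, high)
```

The inverse just reverses the two steps with opposite signs, so perfect reconstruction holds whatever operator is used inside the predict step. That property is what allows a motion-compensated warp to replace the plain frame difference.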
Lifting
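Analysis: the even and odd frames pass through the motion-compensated
predict (P) and update (U) steps, followed by the gains G0 and G1,
to give the low band and the high band
Synthesis: the low band and the high band pass through the inverse gains
G0^{-1} and G1^{-1} and then the U and P steps to return the even and odd frames
[Secker & Taubman, 2001] [Popescu & Bottreau, 2001]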
MC Wavelet Coding vs. H.264/AVC
Plot: luminance PSNR (dB), 20 to 38, vs. bit-rate (Mbps), 0.2 to 2.0
Curves: non-scalable H.264/AVC; scalable MC 5/3 wavelet
Sequence: Mobile CIF
H.264/AVC
• high complexity RD control
• CABAC
• PBBPBBP . . .
• 5 prev/3 future reference frames
• data courtesy of M. Flierl
[Taubman & Secker, VCIP 2003], courtesy D. Taubman
Wavelet Synthesis with Lossy
Motion Vector
Video in → MC Wavelet Transform → Embedded Encoding → Decoder → Inverse Wavelet Transform → Video out
Motion Estimator → motion d → Embedded Encoding → Decoder → motion d
Each embedded encoding stage minimizes J = D + λR
[Taubman & Secker, ICIP03]
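Minimizing the Lagrangian cost J = D + λR over a set of candidate operating points can be sketched as follows. The (rate, distortion) pairs are made-up numbers for illustration, and the function name is hypothetical.

```python
def best_operating_point(points, lam):
    """Return the (rate, distortion) pair that minimizes J = D + lam * R."""
    return min(points, key=lambda point: point[1] + lam * point[0])


# Hypothetical operating points: (rate in kbps, distortion as MSE).
candidates = [(100, 50.0), (200, 30.0), (400, 20.0), (800, 15.0)]

low_lambda = best_operating_point(candidates, lam=0.01)   # rate is cheap
mid_lambda = best_operating_point(candidates, lam=0.1)
high_lambda = best_operating_point(candidates, lam=1.0)   # rate is expensive
```

A small λ favours low distortion and a large λ favours low rate, so sweeping λ traces out the lower convex hull of the rate-distortion points.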
R-D Performance with Lossy
Motion Vector
Plot: video PSNR (dB), 24 to 40, vs. bit-rate (kbps), 0 to 1200; CIF Foreman
Curves: non-embedded single-rate; embedded wavelet coefficients with lossless motion;
embedded wavelet coefficients with lossy motion
[Taubman & Secker, VCIP 2003], courtesy D. Taubman
Surprising Success of ITU-T
Rec. H.263
What H.263 was developed for: analog videophone
. . . and what it was used for: Internet video streaming
What is Streaming Video?
•Download mode: no delay bound
•Streaming mode: delay bound
Figure: a source (cnn.com) streams over a data path through Internet
domains A, B, and C and Access SW nodes to Receiver 1 and Receiver 2 (RealPlayer)
Outline
• Challenges for quality video transport
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
• Summary
Time-varying Available Bandwidth
No bandwidth reservation
Figure: the source (cnn.com) streams at 56 kb/s through domains A and B
and Access SW nodes to a RealPlayer receiver; the available rate R on the
data path may be R >= 56 kb/s or R < 56 kb/s
Time-varying Delay
Delayed packets regarded as lost
Figure: the source (cnn.com) streams at 56 kb/s over a data path through
domains A and B and Access SW nodes to a RealPlayer receiver
Effect of Packet Loss
Loss of packets; no retransmission
Figure: data path from the source through domains A and B and Access SW
nodes to the receiver, comparing the case of no packet loss with the case of lost packets
Unicast vs. Multicast
Multicast
Unicast
Pros and cons?
Heterogeneity For Multicast
•Network heterogeneity
•Receiver heterogeneity
Figure: one source sends through Internet domains A, B, and C to
Receivers 1, 2, and 3 over links of different capacity (1 Mb/s, 256 kb/s,
64 kb/s; Ethernet, gateway, telephone networks)
What quality for each receiver?
Architecture for Video Streaming
Video Compression
Layer 0 → D → 64 kb/s
Layer 0 + Layer 1 → D → 256 kb/s
Layer 0 + Layer 1 + Layer 2 → D → 1 Mb/s
Layered video encoding/decoding.
D denotes the decoder.
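How a receiver might pick the layers it decodes can be sketched as below. The cumulative rates of 64 kb/s, 256 kb/s, and 1 Mb/s (taken as 1000 kb/s) for layers 0, 0+1, and 0+1+2 are an assumed reading of the figure, and the function name is hypothetical.

```python
# Cumulative rate (kb/s) needed to decode layers 0..k, assumed from the figure.
CUMULATIVE_RATES_KBPS = [64, 256, 1000]


def layers_to_decode(available_kbps, cumulative_rates=CUMULATIVE_RATES_KBPS):
    """Number of layers, starting with the base layer, that fit the bandwidth."""
    count = 0
    for rate in cumulative_rates:
        if rate > available_kbps:
            break
        count += 1
    return count


modem_layers = layers_to_decode(64)       # base layer only
dsl_layers = layers_to_decode(300)        # base + first enhancement layer
lan_layers = layers_to_decode(1000)       # all three layers
too_slow = layers_to_decode(56)           # not even the base layer fits
```

Layers must be taken in order because each enhancement layer is only useful on top of the layers below it.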
Application of Layered Video
IP multicast
Figure: the source sends layered video through Internet domains A, B,
and C to Receivers 1, 2, and 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s
(Ethernet, gateway, telephone networks)
Application-layer QoS Control
Congestion control (using rate control):
– Source-based, requires
rate-adaptive compression or
rate shaping
– Receiver-based
– Hybrid
Error control:
– Forward error correction (FEC)
– Retransmission
– Error resilient compression
– Error concealment
Congestion Control
• Window-based vs. rate control
Figure panels: window-based control and rate control (pros and cons?)
Source-based Rate Control
Video Multicast
• How to extend source-based rate control to multicast?
• Limitation of source-based rate control in multicast
• Trade-off between bandwidth efficiency and service
flexibility
Receiver-based Rate Control
IP multicast for layered video
Figure: the source sends layered video through Internet domains A, B,
and C to Receivers 1, 2, and 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s
(Ethernet, gateway, telephone networks)
Error Control
• FEC
– Channel coding
– Source coding-based FEC
– Joint source/channel coding
• Delay-constrained retransmission
• Error resilient compression
• Error concealment
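One of the simplest channel-coding FEC schemes adds a single XOR parity packet to each group of packets, which lets the receiver rebuild any one lost packet without retransmission. The sketch below assumes equal-length packets and at most one loss per group; the function names are hypothetical, and practical systems typically use stronger codes.

```python
def xor_parity(packets):
    """Parity packet: byte-wise XOR of equal-length packets."""
    length = len(packets[0])
    if any(len(packet) != length for packet in packets):
        raise ValueError("all packets must have the same length")
    parity = bytearray(length)
    for packet in packets:
        for index, byte in enumerate(packet):
            parity[index] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """received holds the group's packets with exactly one None for the lost one."""
    missing = [index for index, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("this scheme repairs exactly one lost packet")
    present = [packet for packet in received if packet is not None]
    repaired = list(received)
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]
parity_packet = xor_parity(group)
repaired_group = recover_missing([b"abcd", None, b"ijkl"], parity_packet)
```

The cost is one extra packet per group, so the group size sets the trade-off between bandwidth overhead and protection.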
Channel Coding
Delay-constrained Retransmission
Continuous Media Distribution Services
• Content replication (caching & mirroring)
• Network filtering/shaping/thinning
• Application-level multicast (overlay networks)
Caching
• What is caching?
• Why use caching? Does WWW mean World Wide Wait?
• Pros and cons?
Streaming Server
• Different from a web server
– Timing constraints
– Video-cassette-recorder (VCR) functions (e.g.,
fast forward/backward, random access, and
pause/resume).
• Design of streaming servers
– Real-time operating system
– Special disk scheduling schemes
Media Synchronization
• Why media synchronization?
• Example: lip-synchronization (video/audio)
Protocols for Streaming Video
• Network-layer protocol: Internet Protocol (IP)
• Transport protocol:
– Lower layer: UDP & TCP
– Upper layer: Real-time Transport Protocol (RTP) &
RTP Control Protocol (RTCP)
• Session control protocol:
– Real-Time Streaming Protocol (RTSP): RealPlayer
– Session Initiation Protocol (SIP): Microsoft
Windows MediaPlayer; Internet telephony
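To make the RTP layer concrete, the 12-byte fixed header defined in RFC 3550 can be packed with the standard `struct` module. This is a sketch only: it leaves out padding, header extensions, and CSRC entries, and the payload type 96 is an arbitrary value from the dynamic range chosen for illustration.

```python
import struct


def build_rtp_header(sequence_number, timestamp, ssrc, payload_type=96, marker=False):
    """Pack the 12-byte RTP fixed header (version 2, no padding/extension/CSRC)."""
    version = 2
    first_byte = version << 6                               # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        sequence_number & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def parse_rtp_header(header):
    """Unpack the fields written by build_rtp_header."""
    first_byte, second_byte, sequence_number, timestamp, ssrc = struct.unpack(
        "!BBHII", header[:12]
    )
    return {
        "version": first_byte >> 6,
        "marker": bool(second_byte >> 7),
        "payload_type": second_byte & 0x7F,
        "sequence_number": sequence_number,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = build_rtp_header(1, 3000, 0x1234ABCD, marker=True)
fields = parse_rtp_header(header)
```

The sequence number lets the receiver detect loss and reordering, and the timestamp drives playout and media synchronization; RTP itself is usually carried over UDP.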
Protocol Stacks
Summary
• Challenges for quality video transport
– Time-varying available bandwidth
– Time-varying delay
– Packet loss
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media