Roadmap
Introduction
Intra-frame coding
Inter-frame coding
Object-based and scalable video coding*
– Why object-based? Motion segmentation, shape coding, R-D optimization
– Scalability issues: spatial/temporal/quality scalability

EE569 Digital Video Processing

Object-based Video Coding
Waveform-based coding discussed so far (e.g., H.261/263/264, MPEG-1/-2) uses a simple source model
– It does not consider the semantic content of the video (e.g., objects and their shapes)
Object-based video coding identifies objects (or regions) in a video and encodes them. Potential benefits include
– Improved coding efficiency
– Improved visual quality (e.g., no blocking artifacts)
– Content description
– Content-based interactivity
Also called “content-dependent video coding”
– The buzzword for MPEG-4, but less successful than expected (so the important question is why it does not work so well)

Essential Tasks in Object-based Video Coding
Object/region segmentation
– Separate pixels based on their color, texture, and motion characteristics
– Closely related to motion detection and segmentation
– Intrinsically ill-defined and in need of a breakthrough
2D shape modeling and coding
– Not all shapes are equally probable
– Subtle implications for video coding (hidden pitfalls)
2D texture modeling and coding
– Extension of existing block-based MCP to region-based MCP
– Deformable textures (tradeoff between spatial and temporal prediction)

Object/Region Segmentation
The major challenge in content/object-based coding
Common approaches for segmentation in a still image: gray-level thresholding, clustering, edge detection, region growing, splitting and merging
Object segmentation in video
– Motion information can be utilized, but how?
– Should we rely more on motion cues or on spatial cues?
Motion-based Segmentation
Motion-based segmentation: segmenting an image using motion information
– We can first estimate the motion field and then segment the motion field
– However, estimation and segmentation are two sides of the same coin: each depends on the other

A Mind-bothering Example
[Figure: Frame 1 and Frame 2]
It is easy to convince yourself that the tree branches are moving, but how do we know the sky is still? What if it were also moving at the same speed? Because the sky is a smooth region, we would observe the same intensity patterns either way.

Implications for Video Coding
True motion representation may be useful for computer vision and motion perception, but it is not indispensable in video coding
The fundamental reason lies in the relationship between motion representation and video coding: how to tolerate the uncertainty in motion?
The same issue remains in object-based image coding: how to tolerate the uncertainty in shape?
(We will discuss this in more detail later.)

Simplified Segmentation: Change Detection
To detect the changing parts of a video between time t_i and time t_j, we compute a difference image and threshold the difference by T:
$d_{ij}(x,y) = \begin{cases} 1 & \text{if } |f(x,y,t_i) - f(x,y,t_j)| > T \\ 0 & \text{otherwise} \end{cases}$
[Figure: frames f(x, y, t_i) and f(x, y, t_j)]
d_ij(x, y) can be further processed, e.g., to remove isolated 1’s or to group 1’s that are close to each other

Change Detection: Pros and Cons
Simple to implement; fast
Detects all changes
Detects even unwanted changes
Positive and negative changes detected (occlusion)
Difficult to quantify motion
Requires a static reference frame

Change Detection: An Example
Monitoring traffic

Without a Static Reference Frame
Background extraction methods
– Ad-hoc median detector (your CA#6)
– To eliminate the impact of (small) moving objects, use the “robust estimator” approach to iteratively remove the outliers
– More sophisticated approaches model the background as a mixture of Gaussian distributions and use graph-cut based optimization

Simplified Segmentation: Global Motion Estimation
Planar homography (feature-based)
– Homogeneous coordinates
– Conditions for planar homography
– Homography estimation from feature correspondences
Hierarchical model-based GME (feature-less)
– Directly minimize an energy function (the MSE of MCP errors)
– Solve the optimization problem in a coarse-to-fine fashion (more robust and efficient)

Plane Homography

Model-based GME
Target function for minimization
Solution: Gauss-Newton method
Bergen, J. R., Anandan, P., Hanna, K. J., and Hingorani, R., “Hierarchical Model-Based Motion Estimation,” in Proc. of the Second European Conference on Computer Vision, pp. 237-252, 1992

Multi-resolution GME

Numerical Example

Summary for Change Detection and Global Motion Estimation
Motion segmentation becomes relatively easy to solve when either the camera is still or the background objects lie in a plane
Recent advances include joint motion segmentation and estimation using level-set methods (PDE-based formulation)
Mansouri, A.-R. and Konrad, J., “Multiple motion segmentation with level sets,” IEEE Transactions on Image Processing, vol. 12, no. 2, pp. 201-220, Feb. 2003

2-D Shape Modeling and Coding
Bitmap coding: a binary map specifying whether or not each pixel belongs to an object
– A special case of the general alpha-map
Contour coding: code only the contour of the object or region
– Chain codes
– Polygon approximation
– Spline approximation

Image Matting (Soft Segmentation)
$X(i,j) = \alpha(i,j)\,F(i,j) + [1 - \alpha(i,j)]\,B(i,j), \quad 0 \le \alpha(i,j) \le 1$
Not for coding but for interactive editing

2-D Texture Modeling and Coding*
Shape-adaptive DCT
Shape-adaptive wavelet transform

Roadmap
Introduction
Intra-frame coding
– Review of JPEG
Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)
Scalable video coding
– 3D subband/wavelet coding and recent trend

Scalable vs. Multicast
What is scalable coding?
[Figure: multicast encodes foreman.yuv into separate files foreman128k.cod, foreman256k.cod, foreman512k.cod, and foreman1024k.cod; scalable coding encodes foreman.yuv into a single foreman.cod that serves the rates 128, 256, 512, and 1024]

Spatial scalability
[Figure: bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0]

Temporal scalability
[Figure: bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0]
Frames 0, 4, 8, 12, …: 7.5 Hz
Frames 0, 2, 4, 6, 8, …: 15 Hz
Frames 0, 1, 2, 3, 4, 5, …: 30 Hz

SNR (Rate) scalability
[Figure: bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0, with PSNRavg = 30 dB, 35 dB, and 40 dB]
$\mathrm{PSNR}_{avg} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PSNR}_i$
PSNR_i: PSNR of frame i

Scalability via Bit-Plane Coding
$A = \pm\,(a_0 + a_1 2 + a_2 2^2 + \cdots + a_7 2^7)$
The sign is carried by a separate sign bit; a_0 is the Least Significant Bit (LSB) and a_7 is the Most Significant Bit (MSB)
Examples
– sign = +, a_0 a_1 a_2 … a_7 = 10000001: A = 1 + 128 = 129
– sign = -, a_0 a_1 a_2 … a_7 = 00110011: A = -(4 + 8 + 64 + 128) = -204

Why Is DPCM Bad for Scalability?
Frame number          1      2  3  …
Base layer            Ibase  P  P  P
Enhancement layer 1   Ienh1  P  P  P
Enhancement layer 2   Ienh2  P  P  P
[Figure annotations: suffer from drifting problem; suffer from coding efficiency loss]

Fine Granular Scalability (FGS)
[Figure: H.264 with/without FGS option, Foreman sequence (5 fps); base layer at 20 kbps plus a variable bit-rate enhancement layer; efficiency gap of about 2 dB]

3D Wavelet/Subband Coding
[Figure: video volume with axes x, y, t]
2D spatial WT + 1D temporal WT

Wavelet Video Coder
[Figure: original video frames 0 to 7 → Temporal Wavelet Transform (temporal subbands H, LH, LLH, LLL) → Spatial Wavelet Transform → Embedded Quantization & Entropy Coding]
[Taubman & Zakhor, 1994] [Ohm, 1994] [Choi & Woods, 1999] [Hsiang & Woods, VCIP ’99] . . . and others

Motion-Adaptive 3D Wavelet Transform
Recall the Haar transform
$s(n) = \frac{1}{2}\,\big(x(2n) + x(2n+1)\big), \quad d(n) = x(2n) - x(2n+1)$
Lifting-based implementation
$d(n) = x(2n) - x(2n+1), \quad s(n) = x(2n) - \frac{1}{2}\,d(n)$
Motion-adaptive Haar transform
$d^{n} = f^{2n} - W[f^{2n+1}], \quad s^{n} = f^{2n} - \frac{1}{2}\,W^{-1}[d^{n}]$
W, W^{-1}: forward and backward motion vectors

Lifting
[Figure: lifting structure with motion compensation. Analysis: the even and odd frames pass through blocks P and U and gains G0 and G1 to give the low band and the high band. Synthesis: the low band and the high band pass through G0^{-1}, G1^{-1}, U, and P to give back the even and odd frames.]
[Secker & Taubman, 2001] [Popescu & Bottreau, 2001]

MC Wavelet Coding vs. H.264/AVC
[Figure: luminance PSNR (dB) vs. bit-rate (Mbps) for the Mobile CIF sequence, comparing non-scalable H.264/AVC with scalable MC 5/3 wavelet coding]
H.264/AVC settings
• High-complexity RD control
• CABAC
• PBBPBBP . . .
• 5 previous / 3 future reference frames
• Data courtesy of M. Flierl
[Taubman & Secker, VCIP 2003], courtesy D. Taubman

Wavelet Synthesis with Lossy Motion Vector
[Figure: Video in → MC Wavelet Transform → Embedded Encoding → Decoder → Inverse Wavelet Transform → Video out; in parallel, Motion Estimator → Embedded Encoding → Decoder carries the motion; each embedded encoder minimizes J = D + λR]
[Taubman & Secker, ICIP03]

R-D Performance with Lossy Motion Vector
[Figure: video PSNR (dB) vs. bit-rate (kbps) for CIF Foreman; curves: non-embedded single-rate; embedded wavelet coefficients with lossless motion; embedded wavelet coefficients with lossy motion]
[Taubman & Secker, VCIP 2003], courtesy D. Taubman

Surprising Success of ITU-T Rec. H.263
What H.263 was developed for: analog videophone
. . . and what it was used for: Internet video streaming

What is Streaming Video?
• Download mode: no delay bound
• Streaming mode: delay bound
[Figure: a source (cnn.com) sends data over the Internet (Domains A, B, C) through access switches to Receiver 1 and Receiver 2 (RealPlayer)]

Outline
• Challenges for quality video transport
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
• Summary

Time-varying Available Bandwidth
• No bandwidth reservation
[Figure: source (cnn.com) to receiver (RealPlayer) through access switches and Domains A and B, with a 56 kb/s access link; along the data path the available rate varies between R >= 56 kb/s and R < 56 kb/s]

Time-varying Delay
• Delayed packets regarded as lost
[Figure: source (cnn.com) to receiver (RealPlayer) through access switches and Domains A and B, with a 56 kb/s access link]

Effect of Packet Loss
• No retransmission
[Figure: data path from source to receiver through access switches and Domains A and B, shown with no packet loss and with loss of packets]

Unicast vs. Multicast
[Figure: multicast and unicast delivery]
Pros and cons?

Heterogeneity for Multicast
• Network heterogeneity
• Receiver heterogeneity
[Figure: a source reaches Receiver 1, Receiver 2, and Receiver 3 across the Internet (Domains A, B, C, access switches, a gateway, Ethernet, and telephone networks) over links of 1 Mb/s, 256 kb/s, and 64 kb/s. What quality?]

Architecture for Video Streaming

Video Compression
[Figure: Layer 0, Layer 1, and Layer 2, each with its own decoder D; the decoder outputs are added to give video at 64 kb/s, 256 kb/s, and 1 Mb/s]
Layered video encoding/decoding. D denotes the decoder.
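As a rough illustration of how a receiver might use layered video, the sketch below picks how many layers to decode for a given available bandwidth. It assumes the three rates on the slide (64 kb/s, 256 kb/s, 1 Mb/s) are cumulative rates after decoding Layers 0, 0-1, and 0-2; that reading, the name `layers_for_bandwidth`, and the receiver bandwidths are illustrative assumptions, not part of the lecture.

```python
# Assumed cumulative rates in kb/s after decoding Layer 0, Layers 0-1, Layers 0-2.
# The values come from the slide labels; treating them as cumulative is an assumption.
CUMULATIVE_RATE_KBPS = [64, 256, 1000]


def layers_for_bandwidth(available_kbps, cumulative=CUMULATIVE_RATE_KBPS):
    """Return how many layers, counted from Layer 0, fit in the available bandwidth."""
    count = 0
    for rate in cumulative:
        if rate <= available_kbps:
            count += 1
        else:
            break  # higher layers are useless without the lower ones
    return count


# Hypothetical receivers with different access bandwidths (kb/s).
for receiver, bandwidth in [("Receiver 1", 1000), ("Receiver 2", 256), ("Receiver 3", 64)]:
    print(receiver, "decodes", layers_for_bandwidth(bandwidth), "layer(s)")
```

The loop stops at the first layer that does not fit because an enhancement layer only refines the layers below it, so skipping a layer and taking a higher one would not help.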
Application of Layered Video
[Figure: IP multicast of layered video from the source to Receiver 1, Receiver 2, and Receiver 3 across the Internet (Domains A, B, C, access switches, a gateway, Ethernet, and telephone networks) over links of 1 Mb/s, 256 kb/s, and 64 kb/s]

Application-layer QoS Control
Congestion control (using rate control):
– Source-based (requires rate-adaptive compression or rate shaping)
– Receiver-based
– Hybrid
Error control:
– Forward error correction (FEC)
– Retransmission
– Error resilient compression
– Error concealment

Congestion Control
• Window-based vs. rate control (pros and cons?)
[Figure: window-based control and rate control]

Source-based Rate Control

Video Multicast
• How to extend source-based rate control to multicast?
• Limitation of source-based rate control in multicast
• Trade-off between bandwidth efficiency and service flexibility

Receiver-based Rate Control
IP multicast for layered video
[Figure: the source multicasts layered video to Receiver 1, Receiver 2, and Receiver 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s]

Error Control
• FEC
– Channel coding
– Source coding-based FEC
– Joint source/channel coding
• Delay-constrained retransmission
• Error resilient compression
• Error concealment

Channel Coding

Delay-constrained Retransmission

Continuous Media Distribution Services
• Content replication (caching & mirroring)
• Network filtering/shaping/thinning
• Application-level multicast (overlay networks)

Caching
• What is caching?
• Why use caching? WWW means World Wide Wait?
• Pros and cons?

Streaming Server
• Different from a web server
– Timing constraints
– Video-cassette-recorder (VCR) functions (e.g., fast forward/backward, random access, and pause/resume)
• Design of streaming servers
– Real-time operating system
– Special disk scheduling schemes

Media Synchronization
• Why media synchronization?
• Example: lip-synchronization (video/audio)

Protocols for Streaming Video
• Network-layer protocol: Internet Protocol (IP)
• Transport protocol:
– Lower layer: UDP & TCP
– Upper layer: Real-time Transport Protocol (RTP) & RTP Control Protocol (RTCP)
• Session control protocol:
– Real-Time Streaming Protocol (RTSP): RealPlayer
– Session Initiation Protocol (SIP): Microsoft Windows MediaPlayer; Internet telephony

Protocol Stacks

Summary
• Challenges for quality video transport
– Time-varying available bandwidth
– Time-varying delay
– Packet loss
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
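
To make the packet-loss challenge and the FEC option from the error-control slides concrete, here is a minimal sketch of the simplest erasure code: one XOR parity packet protecting a group of equal-length packets. This is an illustrative assumption rather than the scheme used in the lecture; the helper names `make_parity` and `recover_missing` are hypothetical, and the sketch can repair only one lost packet per group.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """XOR all packets of a group into a single parity packet."""
    return reduce(xor_bytes, packets)


def recover_missing(received, parity):
    """Rebuild the one missing packet (marked None) from the others plus parity."""
    present = [p for p in received if p is not None]
    return reduce(xor_bytes, present, parity)


# Sender side: three media packets plus one parity packet.
packets = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(packets)

# Receiver side: the second packet was lost and there is no retransmission.
received = [packets[0], None, packets[2]]
recovered = recover_missing(received, parity)
print(recovered)
```

The receiver repairs the loss without asking the source to resend, which suits delay-bounded streaming, at the cost of one extra packet of bandwidth per group.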