Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Kai-Chao Yang, NTHU, Taiwan, 2007/8

Outline
- Introduction: problems, definition, functionality, goal, competition, applications, targets
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions

Introduction - problem
Non-scalable video streaming: heterogeneous clients need multiple separate video streams (figure: streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s).

Introduction - definition
Scalable video stream (figure): the stream is divided into sub-streams 1 to n. Reconstruction from sub-streams k1, k2, ..., ki yields quality ranging from low to high.
Scalability: removing parts of the video bit-stream to adapt it to the needs of end users, to varying terminal capabilities, or to varying network conditions.

Introduction - functionality
Functionality of SVC:
- Graceful degradation when the "right" parts of the bit-stream are lost
- Bit-rate adaptation to match the channel throughput
- Format adaptation for backward-compatible extension
- Power adaptation to trade runtime against quality

Introduction - mode
Example (figure): a base layer (0 1 1 0 1) and an enhancement layer of residuals (10010, 01101, 10010, 11001, 00101). The residuals are split from the most significant bit downward into enhancement layers 1 to 5.
Scalability modes:
- Fidelity reduction (SNR scalability)
- Picture size reduction (spatial scalability)
- Frame rate reduction (temporal scalability)
- Sharpness reduction (frequency scalability)
- Selection of content (ROI or object-based scalability)

Structure of SVC
Figure: the input video is spatially decimated to produce the lower layers. Each layer applies temporal scalable coding, base layer coding, and SNR scalable coding. Each lower layer provides prediction for the layer above it, and all layers are multiplexed into one bit-stream.
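The bit-plane example on the scalability-mode slide can be sketched in Python. This is a minimal illustration under three assumptions: the residuals are unsigned 5-bit values, there is one bit-plane per enhancement layer, and the most significant bit comes first. The original figure's exact layout is not fully recoverable, and the names `bit_planes` and `reconstruct` are hypothetical. SVC's own quality layers refine the residual with smaller quantization step sizes rather than with bit-planes; the sketch only shows the principle of truncating a stream to a coarser approximation.

```python
# Bit-plane layering of the five example residuals (illustrative assumption:
# plane 0 holds the most significant bits, each further plane refines them).

def bit_planes(values, nbits):
    """Split non-negative integers into nbits bit-planes, most significant first."""
    return [[(v >> (nbits - 1 - k)) & 1 for v in values] for k in range(nbits)]


def reconstruct(planes, nbits, k):
    """Rebuild the values from the first k bit-planes; missing planes are read as 0."""
    out = [0] * len(planes[0])
    for p in range(k):
        shift = nbits - 1 - p
        for i, bit in enumerate(planes[p]):
            out[i] |= bit << shift
    return out


residuals = [0b10010, 0b01101, 0b10010, 0b11001, 0b00101]
planes = bit_planes(residuals, 5)
for k in range(6):
    approx = reconstruct(planes, 5, k)
    error = sum(abs(r - a) for r, a in zip(residuals, approx))
    print(k, approx, error)
```

Each additional plane can only add bits. The reconstruction therefore approaches the residuals from below, and the error never increases as k grows.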
Temporal Scalability - hierarchical prediction structures
Figure: each number gives the coding order of the picture at that display position.
- Hierarchical B pictures (GOP of 8 pictures, display positions 0 to 16): 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
- Non-dyadic hierarchical prediction (GOP of 9 pictures, display positions 0 to 18): 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
- Hierarchical prediction with zero delay (display positions 0 to 16): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16, so the coding order equals the display order

Temporal Scalability - coding efficiency
Video coding experiment with H.264/MPEG-4 AVC: Foreman, CIF, 30 Hz, at 1320 kbps. The figure shows performance as a function of N.
- N=1: I P P P P P P P P
- N=2: I B0 P B0 P B0 P B0 P
- N=4: I B1 B0 B1 P B1 B0 B1 P
- N=8: I B2 B1 B2 B0 B2 B1 B2 P
Temporal scalability is available for N ≥ 2, because the B pictures can be discarded starting from the highest hierarchy level.
Cascaded QP assignment:
$$QP(P) = QP(B_0) - 3 = QP(B_1) - 4 = QP(B_2) - 5$$
(This slide is copied from the JVT-W132 talk.)

Spatial Scalability - structure
Figure: the input is spatially decimated for the lower layer. Each enhancement layer uses hierarchical MCP and intra prediction followed by base layer coding of texture and motion. Each layer is linked to the layer below by inter-layer prediction (intra, motion, residual). The lowest layer uses H.264/AVC MCP and intra prediction with an H.264/AVC-compatible coder, which produces an H.264/AVC-compatible base-layer bit-stream. All layers are multiplexed into the scalable bit-stream.

Spatial Scalability - properties
- Similar to MPEG-2, H.263, and MPEG-4
- Arbitrary resolution ratio
- The same coding order in all spatial layers
- Combination with temporal scalability
Figure: spatial layers 0 and 1, each with temporal levels 0 to 2 and intra pictures, connected by inter-layer prediction.

Spatial Scalability - inter-layer prediction
Enhancement-layer prediction signals are formed by:
- MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
- Up-sampling from the lower layer (spatial)
- The average of the two predictions above (temporal + spatial)
Three kinds of inter-layer prediction:
- Inter-layer motion prediction
- Inter-layer residual prediction
- Inter-layer intra prediction
Base mode MB: only the residual is transmitted, with no additional side information. Motion or intra data are inferred from the reference layer.
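The three enhancement-layer prediction signals listed above can be illustrated with a small Python sketch. The sketch makes three simplifying assumptions chosen for brevity: dyadic (2x) spatial scalability, nearest-neighbour up-sampling, and a plain SAD comparison to pick a candidate. None of these is the SVC up-sampling filter or the JSVM mode decision, and all function names are hypothetical.

```python
# Illustrative only: nearest-neighbour up-sampling and SAD-based selection are
# simplifying assumptions, not the filters or mode decision of SVC/JSVM.

def upsample2x(block):
    """Dyadic (2x) nearest-neighbour up-sampling of a 2-D list of samples."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def average(a, b):
    """Rounded sample-wise average of two equally sized predictions."""
    return [[(x + y + 1) // 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def choose_prediction(original, mcp_pred, base_block):
    """Return (name, block) of the candidate prediction with the lowest SAD."""
    spatial = upsample2x(base_block)
    candidates = {
        "temporal": mcp_pred,                            # MCP inside the enhancement layer
        "spatial": spatial,                              # up-sampled lower layer
        "temporal+spatial": average(mcp_pred, spatial),  # average of both
    }
    name = min(candidates, key=lambda k: sad(original, candidates[k]))
    return name, candidates[name]


original = [[10, 10, 20, 20], [10, 10, 20, 20], [30, 30, 40, 40], [30, 30, 40, 40]]
base = [[10, 20], [30, 40]]
mcp = [[12] * 4 for _ in range(4)]
print(choose_prediction(original, mcp, base)[0])  # spatial
```

Here the up-sampled base layer matches the block exactly, so the spatial candidate wins. A block with a good motion match but a poor base layer would instead select the temporal candidate.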
Spatial Scalability - inter-layer motion prediction
- base_mode_flag = 1 and the reference layer is inter-coded: the MB partitioning, reference indices, and MVs are derived from the reference layer. In the dyadic case (figure), an 8x8 region spanning (x1,y1) to (x2,y2) in the reference layer corresponds to a 16x16 MB spanning (2x1,2y1) to (2x2,2y2) in the enhancement layer, and the MVs are scaled accordingly.
- motion_pred_flag = 1: MV predictors are obtained from the reference layer.
- motion_pred_flag = 0: MV predictors are obtained by conventional spatial prediction.

Spatial Scalability - inter-layer residual prediction
- residual_pred_flag = 1
- Predictor: the corresponding 8x8 sub-MB in the reference layer, up-sampled block-wise with a bilinear filter on a transform-block basis

Spatial Scalability - inter-layer intra prediction
- base_mode_flag = 1 and the reference layer is intra-coded
- Prediction by up-sampling from the reference layer:
  - Luma: one-dimensional 4-tap FIR filter
  - Chroma: bilinear filter

Spatial Scalability - single-loop decoding
Past spatial scalable video coding:
- Inter-layer intra prediction requires complete decoding of the base layer.
- Multiple motion-compensation loops and deblocking filters are needed.
- Full decoding plus inter-layer prediction makes the complexity higher than that of simulcast.
Single-loop decoding:
- Inter-layer intra prediction is restricted to MBs whose co-located base-layer MB is intra-coded. As a result, motion compensation is needed only in the target layer.

Spatial Scalability - single-loop vs. multi-loop decoding
Figure: I, P, and B pictures with inter prediction, decoded with a single loop vs. multiple loops.
(This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf)

Spatial Scalability - generalized spatial scalability in SVC
- Arbitrary ratio
- Neither the horizontal nor the vertical resolution can decrease from one layer to the next.
- Cropping
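Inter-layer motion prediction maps positions and motion vectors between layers. In the dyadic case, an 8x8 base-layer region at (x1, y1) corresponds to a 16x16 macroblock at (2x1, 2y1). The Python sketch below extends this mapping to arbitrary ratios and cropping, under the following assumptions:
- Scaling is proportional, with round-half-away-from-zero rounding. The standard defines its own integer derivation, which this does not reproduce.
- The crop window `(cx, cy, cw, ch)` is a hypothetical parameterisation, covering the region of the enhancement picture that corresponds to the whole base picture.
- The QCIF (176x144) and CIF (352x288) sizes are example values only.

Positions outside the crop window return None, because such new regions have no co-located base-layer data.

```python
import math
from fractions import Fraction


def _round_half_away(f):
    """Round a Fraction to the nearest integer, with halves rounded away from zero."""
    if f >= 0:
        return math.floor(f + Fraction(1, 2))
    return -math.floor(-f + Fraction(1, 2))


def scale_mv(mv, base_size, enh_size):
    """Scale a base-layer motion vector to the enhancement-layer resolution.

    enh_size is the size of the region corresponding to the base picture (the crop
    window when cropping is used). Simplified proportional scaling, not the exact
    SVC derivation.
    """
    if enh_size[0] < base_size[0] or enh_size[1] < base_size[1]:
        raise ValueError("resolution may not decrease from one layer to the next")
    sx = Fraction(enh_size[0], base_size[0])
    sy = Fraction(enh_size[1], base_size[1])
    return (_round_half_away(mv[0] * sx), _round_half_away(mv[1] * sy))


def map_to_base(x, y, crop, base_size):
    """Map an enhancement-layer sample position to its co-located base-layer position.

    crop = (cx, cy, cw, ch): hypothetical window of the enhancement picture that
    corresponds to the whole base picture. Returns None outside the window.
    """
    cx, cy, cw, ch = crop
    if not (cx <= x < cx + cw and cy <= y < cy + ch):
        return None  # new region: no co-located base-layer sample
    return ((x - cx) * base_size[0] // cw, (y - cy) * base_size[1] // ch)


print(scale_mv((3, -5), (176, 144), (352, 288)))            # dyadic: (6, -10)
print(scale_mv((3, -5), (352, 288), (528, 432)))            # ratio 1.5: (5, -8)
print(map_to_base(32, 16, (0, 0, 352, 288), (176, 144)))    # (16, 8)
print(map_to_base(8, 8, (16, 16, 352, 288), (176, 144)))    # None: new region
```

The `ValueError` enforces the rule above that neither resolution may decrease from one layer to the next.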
With cropping, an enhancement layer can contain new regions or give higher quality to regions of interest.

Spatial Scalability - encoder control (JSVM)
Base layer: the coding parameters p_0 are optimized for the base layer alone:
$$p_0' = \arg\min_{p_0} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$$
Enhancement layer: the decisions for p_1 depend on p_0, and p_1' is optimized for the enhancement layer:
$$p_1' = \arg\min_{p_1 \mid p_0} \{ D_1(p_1 \mid p_0) + \lambda_1 R_1(p_1 \mid p_0) \}$$
Result: efficient base-layer coding but inefficient enhancement-layer coding.

Spatial Scalability - encoder control (optimization)
Base layer: the encoder takes enhancement-layer coding into account and eliminates base-layer decisions p_0 that disadvantage it:
$$p_0' = \arg\min_{p_0,\, p_1 \mid p_0} \{ (1-w)[D_0(p_0) + \lambda_0 R_0(p_0)] + w[D_1(p_1 \mid p_0) + \lambda_1 R_1(p_1 \mid p_0)] \}$$
Enhancement layer: no change.
- w = 0: JSVM encoder control
- w = 1: single-loop encoder control (the base layer is not controlled)

Quality Scalability - coarse-grain quality scalability (CGS)
- A special case of spatial scalability with identical picture sizes for base and enhancement layers
- Higher enhancement layers code the residual with smaller quantization step sizes
- Designed for only a few selected bit-rate points: the number of supported bit-rate points equals the number of layers
- Switching can only occur at IDR access units

Quality Scalability - medium-grain quality scalability (MGS)
- MGS = CGS + key pictures + refinement quality layers
- More enhancement layers are supported
- Refinement quality layers of the residual
- Key pictures for drift control
- Switching can occur at any access unit

Quality Scalability - drift control
- Drift: the effect caused by unsynchronized MCP at the encoder and decoder sides
- MCP in quality SVC trades coding efficiency against drift

Quality Scalability - MPEG-4 quality scalability with FGS
Figure: a base layer plus a refinement that may be lost or truncated.
- Only the base layer is stored and used for MCP of the following pictures; refinement data are not used for MCP
- Drift: drift-free
- Complexity: low
- Efficiency: efficient base layer but inefficient enhancement layer

Quality Scalability - MPEG-2 quality scalability (without FGS)
Figure: a base layer plus a refinement that may be lost or truncated.
- Only one reference picture is stored and used for MCP of the following pictures
- Drift: in both the base layer and the enhancement layer, so frequent intra updates are necessary
- Complexity: low
- Efficiency: efficient enhancement layer but inefficient base layer

Quality Scalability - 2-loop prediction
Figure: a base layer plus a refinement that may be lost or truncated.
- Several closed encoder loops run at different bit-rate points in a layered structure
- Drift: in the enhancement layer
- Complexity: high
- Efficiency: efficient base layer and moderately efficient enhancement layer

Quality Scalability - SVC concepts
Figure: a base layer plus a refinement that may be lost or truncated, with key pictures.
- Key pictures control the trade-off between coding efficiency and drift
- MPEG-4 FGS corresponds to all pictures being key pictures
- MPEG-2 quality scalability corresponds to no key pictures

Quality Scalability - drift control with hierarchical prediction
Figure: key pictures (P) with B1 and B2 pictures between them: P B2 B1 B2 P B2 B1 B2 P.
- Key pictures: the base layer is stored and used for MCP of the following pictures, so any drift is reset at the next key picture
- Other pictures (B1, B2): the enhancement layer is stored and used for MCP of the following pictures
- The GOP size adjusts the trade-off between enhancement-layer coding efficiency and drift

Combined Scalability - SVC encoder structure
Figure: each dependency layer performs its own temporal decomposition. The layers inside a dependency layer share the same motion/prediction information.

Combined Scalability - dependency and quality refinement layers
Figure: the scalable bit-stream consists of dependency layers D = 0, 1, 2, each containing quality refinement layers Q = 0, 1, 2.

Combined Scalability - example
Figure: two dependency layers (D0, D1), each with quality layers Q0 and Q1, over pictures with temporal levels T0 T2 T1 T2 T0.
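The T0 T2 T1 T2 T0 pattern above is the dyadic hierarchy from the temporal-scalability slides. The Python sketch below assumes a power-of-two GOP size. It assigns temporal levels, produces a depth-first coding order, and lists the pictures that remain when all NAL units above a given temporal level (temporal_id) are discarded. Depth-first order is only one valid choice; any order that codes each picture after its references would also work. It is used here because it reproduces the numbering on the hierarchical-B slide. All function names are hypothetical.

```python
def temporal_levels(gop_size):
    """Temporal level of display positions 0..gop_size in a dyadic hierarchy.

    Assumes gop_size is a power of two; positions 0 and gop_size are key pictures (T0).
    """
    levels = [0] * (gop_size + 1)
    step, level = gop_size, 0
    while step > 1:
        half = step // 2
        level += 1
        for pos in range(half, gop_size, step):  # midpoints between already-placed pictures
            levels[pos] = level
        step = half
    return levels


def coding_order(start, end):
    """Depth-first coding order of the pictures strictly between two coded pictures."""
    if end - start < 2:
        return []
    mid = (start + end) // 2
    return [mid] + coding_order(start, mid) + coding_order(mid, end)


def sequence_coding_order(num_gops, gop_size):
    """Coding order for an initial key picture followed by num_gops GOPs."""
    order = [0]
    for g in range(num_gops):
        start, end = g * gop_size, (g + 1) * gop_size
        order += [end] + coding_order(start, end)  # key picture first, then its B pictures
    return order


def frames_up_to(levels, max_tid):
    """Display positions kept when all pictures with temporal level > max_tid are dropped."""
    return [pos for pos, tid in enumerate(levels) if tid <= max_tid]


print(temporal_levels(4))  # [0, 2, 1, 2, 0]
order = sequence_coding_order(2, 8)
print([order.index(p) for p in range(17)])
# [0, 4, 3, 5, 2, 7, 6, 8, 1, 12, 11, 13, 10, 15, 14, 16, 9]
print(frames_up_to(temporal_levels(8), 1))  # [0, 4, 8]
```

Dropping each temporal level halves the frame rate. In a GOP of 8, keeping levels 0 and 1 leaves one picture in four.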
Combined Scalability - bit-stream format
Figure: NAL unit header, then NAL unit header extension, then NAL unit payload. The header extension carries the P, T, D, and Q fields together with several flags; the field widths drawn on the slide are 2, 6, 3, 3, 2 and 1, 1, 1, 1, 1, 3 bits.
- P (priority_id): indicates the importance of a NAL unit
- T (temporal_id): indicates the temporal level
- D (dependency_id): indicates the spatial/CGS layer
- Q (quality_id): indicates the MGS/FGS layer

Combined Scalability - bit-stream switching
- Inside a dependency layer: switching is possible everywhere
- Outside a dependency layer:
  - Switching up only at IDR access units
  - Switching down everywhere if multiple-loop decoding is used

Profiles of SVC - Scalable Baseline
- For conversational and surveillance applications requiring low decoding complexity
- Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
- Temporal and quality scalability: arbitrary
- No interlaced coding tools
- B-slices, weighted prediction, CABAC, and the 8x8 luma transform are supported in enhancement layers
- The base layer conforms to the Baseline profile of H.264/AVC

Profiles of SVC - Scalable High
- For broadcast, streaming, and storage
- Spatial, temporal, and quality scalability: arbitrary
- The base layer conforms to the High profile of H.264/AVC
Scalable High Intra:
- Scalable High with all pictures coded as IDR pictures

References
- H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Trans. CSVT, 2007.
- T. Wiegand, "Scalable Video Coding," Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
- T. Wiegand, "Scalable Video Coding," Digital Image Communication, course at Technical University of Berlin, 2006. Available at http://iphome.hhi.de/wiegand/dic.htm
- H. Schwarz, D. Marpe, and T. Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," Proc. ICIP'05.