Introduction to H.264 Video Standard
Anurag Jain, Texas Instruments

H.264 Background
• Jointly developed by ITU-T and ISO/IEC MPEG.
• Up to 50% more coding-efficient at the same visual quality compared with MPEG-4 ASP.
• Supports a wide range of applications (interlaced, progressive, low bit rate, studio-quality digital cinema, etc.).
• Multiple profiles (Baseline, Main, Extended, and the FRExt profiles such as High).
• Good results in interoperability tests make it suitable for wide deployment in a short span of time.

H.264 Encoder Block Diagram
[Figure: hybrid encoder with coding control, intra prediction, motion estimation, motion compensation, transform, quantization, inverse quantization, inverse transform, loop filter, frame store and entropy coding; quantized transform coefficients and motion vectors are entropy coded into the output bit stream.]
Annotations in the diagram:
• Intra prediction: 9 4x4 modes and 4 16x16 modes (13 modes in total).
• Quantization: finer step-size resolution for finer control of the bit rate.
• Transform: integer transform computable in 16-bit fixed-point arithmetic with no encoder/decoder mismatch.
• Motion estimation:
  – Seven block sizes and shapes
  – Multiple reference picture selection
  – 1/4-pel motion estimation accuracy
  – Referenced B-frames
• Entropy coding: single universal VLC plus context-adaptive VLC, or context-based adaptive binary arithmetic coding (CABAC).

Common Elements
Common elements with other standards:
• Macroblocks: 16x16 luma + 2 x 8x8 chroma samples
• Input: association of luma and chroma
• Conventional block motion displacement
• Motion vectors over picture boundaries
• Block transform
• Variable block-size motion
• I, P and B picture coding types

High Level Coding Tools
• Sequence and Picture Parameter Sets (SPS & PPS)
• Picture Order Count (POC)
• Decoded Picture Buffer (DPB)
• Slice group map (FMO, flexible macroblock ordering)
• Multiple slices and arbitrary arrangements (ASO, arbitrary slice order)
• Supplemental Enhancement Information (SEI)
• Hypothetical Reference Decoder (HRD)
• Video Usability Information (VUI)

High Level Tools: Coding Hierarchy
• A coded sequence contains one or more access units.
• An access unit is a set of NAL units that contains all the information necessary for decoding exactly one (primary)
coded picture.
• A coded picture is divided into slices (VCL NAL units).
• A slice contains a slice header and a set of macroblocks.
• A macroblock contains a 16x16 luma block and two chroma blocks.
• An I-slice contains only intra-coded macroblocks.
• A P-slice contains intra- and inter-coded macroblocks.
• An IDR (instantaneous decoding refresh) picture contains only I-slices (SI-slices are also allowed in the Extended profile).

Sequence Parameter Set
• Profile and level indicators
• Profile constraint indicators
• Sequence parameter set ID (0..31)
• Picture order count type and related information
• DPB (decoded picture buffer) information
• Picture size
• Frame/field coding flag
• Method of motion vector derivation for B-direct mode
• Frame cropping parameters
• VUI parameters (Annex E, video usability information)

Picture Parameter Set
• Picture parameter set ID (0..255)
• Sequence parameter set ID (0..31)
• Entropy coding mode flag (CABAC/CAVLC)
• Slice POC information presence flag
• Slice group map parameters
• Maximum number (1..16) of reference frames used for decoding slices
• Weighted prediction flags
• Initial quantization parameter (QP minus 26, range -26..+25)
• Chroma QP offset (-12..+12), also used by the loop filter
• Deblocking-filter control flag (whether slice headers carry loop-filter controls such as the alpha/beta table offsets)
• Constrained intra prediction flag (whether intra prediction may use pixels of inter-coded neighboring MBs)
• Redundant picture count presence flag

Slice Header
• Starting macroblock address
• Slice type (I, P, B, SI, SP)
• Frame number (frame_num)
• Picture parameter set ID (0..255)
• Interlaced frame/field coding, top/bottom field indicators
• IDR picture ID (0..65535)
• Slice POC parameters
• Redundant picture count (0..127; 0 for the primary coded picture)
• B-slice temporal or spatial direct mode indicator
• Maximum number (1..32) of reference pictures for decoding the current slice
• Reference picture reordering parameters
• Weighted prediction parameters
• DPB marking parameters (e.g. short-term and long-term reference
pictures)
• Slice delta QP (-26..+25)
• SP switch flag and SP/SI slice QP
• Loop-filter indicator (0: enabled; 1: disabled; 2: enabled, but not across slice boundaries)
• Loop-filter alpha/beta table offsets (-6..+6)
• Slice group change cycle (derives the number of MBs in slice group 0)

Slice Group Maps
[Figure: slice group maps, used for error resilience]
[Figure: ordering of slices within slice groups]

Low Level Coding Tools
• Motion-compensated prediction
• Additional intra modes for spatial prediction
• Transform: 4x4 integer transform (Baseline and Main profiles)
• Transform: 8x8 integer transform (High profile)
• Quantization: scalar quantization
• Entropy coding: CABAC/CAVLC
• In-loop deblocking filter

Enhanced MC (Inter Prediction)
• Every macroblock can be split into partitions of seven possible sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4) for improved motion estimation.
[Figure: current macroblock, partition or block with its neighbors A (left), B (above), C (above right) and D (above left), used for motion vector prediction]
• Motion compensation accuracy: 1/4 pixel
• Up to 5 reference frames for SDTV resolution at Level 3
• Weighted prediction
• Reference B pictures
• Trade-off between prediction accuracy and side information (motion data)

B Slice - Direct Mode
• Direct mode
• Forward/backward pair of bidirectional prediction: the prediction signal is a linear combination of two blocks determined by the forward and backward motion vectors pointing to two reference pictures.
• Spatial direct mode
• Temporal direct mode:
  mvL0 = tb x mvCol / td
  mvL1 = -(td - tb) x mvCol / td
  where mvCol is the MV used in the co-located partition of the subsequent (List 1 reference) picture, tb is the temporal distance from the List 0 reference to the current picture, and td is the temporal distance from the List 0 reference to the List 1 reference.
[Figure: current picture between its List 0 and List 1 references; the direct-mode partition derives mvL0 and mvL1 from mvCol of the co-located partition, scaled by tb and td]

B Slice: Multi-picture Reference Mode
• Generalized bidirectional prediction
• Multiple reference pictures mode
• Two forward references: suited to a region just before a scene change
• Two backward references: suited to a region just after a scene change
[Figure: current picture with previous and next pictures, showing three combinations: 2 forward MVs; traditional bidirectional (1 forward MV + 1 backward MV); 2 backward MVs]

H.264 Intra Prediction
• 9 modes for 4x4 blocks
• 4 modes for 16x16 intra prediction

Luma Sub-Pixel Interpolation

Chroma Sub-pel Calculation
• If (vx, vy) is the luma motion vector (in quarter-sample units), then xFracC = vx & 0x7 and yFracC = vy & 0x7.

Block Scanning Order in a MB
[Figure: block scanning order within a macroblock]
• A second transform stage extracts the remaining correlation among sub-blocks.

Transform & Quant
• Integer 4x4 DCT approximation; 8x8 integer transform in the High profile.
• The cost of the transformed differences (residual coefficients) of 4x4 blocks is computed with a 4x4 Hadamard transform for INTRA_16x16 coded macroblocks.
• Scalar quantization.
• All integer arithmetic.
[Figure labels: 4x4 luma/chroma AC; 8x8 luma/chroma; Hadamard]

Interlaced Coding
• Deblocking filter
• Frame/field adaptation:
  – Picture Adaptive Frame Field (PicAFF)
  – Macroblock Adaptive Frame Field (MBAFF)
• Field scan and zig-zag scan options
[Figure: zig-zag frame scan vs. field scan]

Entropy Coding
• Universal Variable Length Coding (UVLC) using Exp-Golomb codes
• Context Adaptive VLC (CAVLC)
• Context Adaptive Binary Arithmetic Coding (CABAC)

CAVLC
Coefficients in zig-zag order: 50 33 27 20 0 5 0 0 1 -1 0 0 0 0 0 0
• TotalCoeff = 7: number of non-zero coefficients
• TrailingOnes = 2: 1, -1
• Trailing-one signs = 1, 0 (reverse order): minus, plus
• Levels = 5, 20, 27, 33, 50 (reverse order): 7 - 2 = 5 levels
• TotalZeros = 3: number of zeros before the last non-zero coefficient
• RunBefore = 0, 2, 1: 0 zeros before -1, 2 before 1 and 1 before 5; all zeros are then accounted for, so no further runs are coded

Exp-Golomb Coding

Loop Filter
• For each block edge, checks whether the discontinuity is a genuine feature of the picture or a blocking artifact; only blocking artifacts are smoothed.
[Figure: 16x16 macroblocks showing the vertical and horizontal edges filtered, for luma and for chroma]

Profiles and Tools
H.264 Profiles and Tools: Graphical Representation
[Figure: overlapping tool sets of the profiles]
• Baseline profile: I slice, P slice, CAVLC, arbitrary slice order, flexible macroblock order, redundant slice
• Main profile: I slice, P slice, CAVLC, plus B slice, weighted prediction, CABAC
• Extended profile: all Baseline tools, plus B slice, weighted prediction, SP slice, SI slice, data partition

FRExt: Fidelity Range Extension
• Lossless representation
• More than 8 bits per sample (up to 12 bits)
• Higher resolution for color representation (4:2:2, 4:4:4)
• Source editing
functions such as alpha blending
• Very high bit rates (often with constant quality)
• Very high resolution
• Color-space transformation (YCgCo, YCbCr, RGB)
• RGB color representation
• Adaptive block transform sizes
• Quantization matrices

Coding Efficiency

Comparison of Standards
Feature / Standard: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
• Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
• Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
• Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/wavelet | 4x4 and 8x8 integer DCT; 4x4 and 2x2 Hadamard
• Quantization: scalar, step size of constant increment | scalar, step size of constant increment | vector quantization | scalar, step size increasing at a rate of 12.5%
• Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC
• Motion estimation & compensation: yes | yes | yes | yes, more flexible (up to 16 MVs per MB)
• Playback & random access: yes | yes | yes | yes

Comparison of Standards (cont'd)
Feature / Standard: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
• Pel accuracy: integer, ½-pel | integer, ½-pel | integer, ½-pel, ¼-pel | integer, ½-pel, ¼-pel
• Profiles: no | 5 | 8 | 3
• Reference picture: one | one | one | multiple
• Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
• Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
• Error robustness: synchronization and concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
• Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps-2 Mbps | 64 kbps-150 Mbps
• Compatibility with previous standards: n/a | yes | yes | no
• Encoder complexity: low | medium | medium | high

References
– Related groups
  • MPEG website: http://www.mpeg.org
  • JVT website: ftp://ftp.imtc-files.org/jvt-experts
  • MPEG Industry Forum: www.mpegif.org
– Test software
  • H.264/AVC JM
Software: http://bs.hhi.de/~suehring/tml/download
– Test sequences
  • http://ise.stanford.edu/video.html
  • http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
  • http://www.its.bldrdoc.gov/vqeg
  • ftp.tnt.uni-hannover.de/pub/jvt/sequences/
  • http://trace.eas.asu.edu/yuv/yuv.html

THANKS
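The Picture Parameter Set and Slice Header slides carry the QP as an offset from 26 (QP minus 26, then a slice delta QP), and the comparison table notes that the quantizer step size grows by a constant ratio per QP step. The sketch below assumes 8-bit video (QP 0..51), SliceQP = 26 + pic_init_qp_minus26 + slice_qp_delta, and the nominal step sizes commonly quoted for H.264 (0.625 at QP 0, doubling every 6 QP). The standard itself specifies integer scaling tables rather than these floating-point steps, and the function names are hypothetical.

```python
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)  # nominal steps for QP 0..5


def slice_qp(pic_init_qp_minus26, slice_qp_delta):
    """Luma QP of a slice from the PPS initial value and the slice delta."""
    qp = 26 + pic_init_qp_minus26 + slice_qp_delta
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    return qp


def qstep(qp):
    """Nominal quantizer step size: doubles every 6 QP steps."""
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


qp = slice_qp(pic_init_qp_minus26=2, slice_qp_delta=-4)
print(qp, qstep(qp), qstep(qp + 6))  # 24 10.0 20.0
```

Because six QP steps double the step size, each step multiplies it by about 2^(1/6) ≈ 1.12 on average, roughly the 12% growth per step noted in the table.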
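The temporal direct-mode formulas on the B Slice - Direct Mode slide use real-valued division. The Python sketch below assumes the fixed-point form of the H.264 temporal direct derivation (tx, a clipped DistScaleFactor, then mvL1 = mvL0 - mvCol) and takes tb and td as picture-order-count differences. Long-term references are not handled, and the function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to [lo, hi] (the Clip3 operator)."""
    return max(lo, min(hi, x))


def div_trunc(a, b):
    """Integer division truncating toward zero, as in C (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def temporal_direct(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mv_col.

    tb: POC distance from the List 0 reference to the current picture.
    td: POC distance from the List 0 reference to the List 1 reference.
    """
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    if td == 0:
        # Degenerate case: reuse mvCol for List 0 and a zero vector for List 1.
        return tuple(mv_col), (0, 0)
    tx = div_trunc(16384 + abs(div_trunc(td, 2)), td)
    scale = clip3(-1024, 1023, (tb * tx + 32) >> 6)  # roughly 256 * tb / td
    # Python's >> on negative ints is an arithmetic shift, as the fixed-point form assumes.
    mv_l0 = tuple((scale * c + 128) >> 8 for c in mv_col)
    mv_l1 = tuple(l0 - c for l0, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1


print(temporal_direct((8, -4), tb=1, td=2))  # ((4, -2), (-4, 2))
```

With the current picture midway between its references (tb = 1, td = 2), mvL0 is half of mvCol and mvL1 is minus half, matching the real-valued formulas; computing mvL1 as mvL0 - mvCol keeps the two vectors consistent after rounding.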
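The Luma Sub-Pixel Interpolation slide survives only as a title. A minimal sketch, assuming the 6-tap filter (1, -5, 20, 20, -5, 1) for half-sample positions and rounded averaging for quarter-sample positions, with 8-bit samples. It covers only positions interpolated along a single row or column (the centre half-sample position, which filters unrounded intermediate values, is left out), and the helper names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def clip1(x, bit_depth=8):
    """Clamp a sample to the valid range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, x))


def half_pel(six):
    """Half-sample value midway between six[2] and six[3].

    six holds six consecutive integer samples (E, F, G, H, I, J) along one row or column.
    """
    b1 = sum(t * s for t, s in zip(TAPS, six))
    return clip1((b1 + 16) >> 5)  # taps sum to 32: round, then divide by 32


def quarter_pel(p, q):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1


row = [10, 20, 30, 40, 50, 60]
h = half_pel(row)              # 35, halfway between 30 and 40
print(h, quarter_pel(30, h))   # 35 33
```

The negative outer taps sharpen the response compared with plain averaging, which is why the result must be clipped to the sample range.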
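The Chroma Sub-pel Calculation slide gives the fractional chroma offsets xFracC = vx & 0x7 and yFracC = vy & 0x7. For 4:2:0 video these eighth-sample fractions weight the four surrounding integer chroma samples bilinearly. The sketch below assumes that case and 8-bit samples; the function name is hypothetical.

```python
def chroma_pred(a, b, c, d, vx, vy):
    """Predict one chroma sample from its four integer neighbours.

    a: top-left, b: top-right, c: bottom-left, d: bottom-right sample.
    (vx, vy): luma motion vector in quarter-sample units, which for 4:2:0
    equals the chroma vector in eighth-sample units.
    """
    xf, yf = vx & 7, vy & 7  # xFracC, yFracC; & also works on negative ints
    return ((8 - xf) * (8 - yf) * a + xf * (8 - yf) * b
            + (8 - xf) * yf * c + xf * yf * d + 32) >> 6  # weights sum to 64


# The integer part of the chroma displacement is vx >> 3 (and vy >> 3).
print(chroma_pred(0, 64, 0, 64, vx=4, vy=0))  # 32 (halfway between 0 and 64)
```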
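The Transform & Quant slide describes an integer approximation of the 4x4 DCT. As a sketch, assuming the commonly published forward core matrix Cf, the core transform is Y = Cf · X · Cf^T, computed exactly in integers; the normalizing scale factors are folded into quantization and are not shown. The function names are hypothetical.

```python
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose of a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_4x4(x):
    """Forward 4x4 integer core transform Y = Cf * X * Cf^T (no scaling)."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
print(forward_core_4x4(flat)[0][0])  # 16: a flat residual block has only a DC term
```

Because Cf contains only 0, ±1 and ±2, a real implementation would use the add/shift butterfly form instead of full matrix products; the matrix form is used here only for readability.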
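The CAVLC slide walks through the symbols extracted from one 4x4 block. The Python sketch below recomputes them from the zig-zag-ordered coefficients. It derives only the symbol values, not the variable-length codewords or the context-dependent tables used to code them, and the function name is hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive CAVLC symbol values from 16 coefficients in zig-zag order."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]  # positions of non-zeros
    total_coeff = len(nz)

    # Trailing ones: up to three +/-1 values at the high-frequency end.
    signs = []  # 1 = minus, 0 = plus, in reverse scan order
    k = total_coeff - 1
    while k >= 0 and len(signs) < 3 and abs(coeffs[nz[k]]) == 1:
        signs.append(1 if coeffs[nz[k]] < 0 else 0)
        k -= 1

    # Remaining levels, also in reverse scan order.
    levels = [coeffs[nz[i]] for i in range(k, -1, -1)]

    # Zeros preceding the last non-zero coefficient.
    total_zeros = nz[-1] + 1 - total_coeff if nz else 0

    # run_before for each coefficient (reverse order) until no zeros are left;
    # the run of the lowest-frequency coefficient is never coded.
    runs, zeros_left = [], total_zeros
    for j in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[j] - nz[j - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "TotalCoeff": total_coeff,
        "TrailingOnes": len(signs),
        "TrailingSigns": signs,
        "Levels": levels,
        "TotalZeros": total_zeros,
        "RunBefore": runs,
    }


block = [50, 33, 27, 20, 0, 5, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0]
print(cavlc_symbols(block))
```

For the slide's block this yields TotalCoeff 7, TrailingOnes 2, signs [1, 0], levels [5, 20, 27, 33, 50], TotalZeros 3 and RunBefore [0, 2, 1].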
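The Exp-Golomb Coding slide has lost its codeword table. A minimal sketch of unsigned (ue) and signed (se) Exp-Golomb coding as used for H.264 header syntax: codeNum k is written as M leading zeros followed by the (M + 1)-bit binary form of k + 1, and signed values map to codeNum in the order 0, 1, -1, 2, -2, and so on. Bits are represented as '0'/'1' strings for readability, and the function names are hypothetical.

```python
def ue_encode(k):
    """Unsigned Exp-Golomb codeword for codeNum k >= 0."""
    if k < 0:
        raise ValueError("codeNum must be non-negative")
    info = bin(k + 1)[2:]                # binary of k + 1, leading '1' included
    return "0" * (len(info) - 1) + info  # prefix of M zeros


def se_encode(v):
    """Signed Exp-Golomb: map v to codeNum as 0, 1, -1, 2, -2, ..."""
    return ue_encode(2 * v - 1 if v > 0 else -2 * v)


def ue_decode(bits, pos=0):
    """Decode one ue(v) codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


print([ue_encode(k) for k in range(5)])  # ['1', '010', '011', '00100', '00101']
```

Since each codeword announces its own length through the zero prefix, a decoder can parse concatenated codewords with no separators, which is why a single universal table suffices for many header fields.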
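The Loop Filter slide says the filter must tell genuine picture edges from blocking artifacts. A sketch of that per-edge decision, assuming the usual H.264 test on the samples p1, p0 | q0, q1 across the edge: a line of samples is filtered only when the boundary strength is non-zero, the step |p0 - q0| is below alpha, and the activity on each side is below beta. In the standard, alpha and beta come from QP-indexed tables shifted by the slice alpha/beta offsets; the threshold values below are placeholders, and the function name is hypothetical.

```python
def edge_is_filtered(p, q, alpha, beta, bs):
    """Decide whether to filter one line of samples across a block edge.

    p = (p0, p1, ...) on one side, moving away from the edge;
    q = (q0, q1, ...) on the other side. bs is the boundary strength (0..4).
    A large step across the edge (>= alpha) is treated as real picture content.
    """
    return (bs != 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


ALPHA, BETA = 15, 4  # placeholder thresholds, not values from the standard's tables
print(edge_is_filtered((61, 60), (64, 65), ALPHA, BETA, bs=2))    # small step: True
print(edge_is_filtered((61, 60), (120, 121), ALPHA, BETA, bs=2))  # real edge: False
```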