CSE 489-02 & CSE 589-02 Multimedia Processing
Lecture 11: Video Coding
Spring 2009, New Mexico Tech
4/13/2015

History
(Figure; label: H.264/AVC)

MC-DCT Coding Framework
• Motion estimation/compensation based on previously decoded frames
  - Block-translation motion model
• Inter-coding: DCT-based coding of the prediction error (residue)
• Intra-coding: if motion estimation fails or synchronization is desired, the macroblock is encoded in intra mode
• Most international video coding standards are based on this framework:
  - Video teleconferencing: H.261, H.263, H.263++, H.264
  - Video archive and playback: MPEG-1, MPEG-2 (used in DVDs), MPEG-4

Hybrid MC-DCT Encoder
(Figure: block diagram. Encoder: input macroblock minus motion-compensated prediction → transform, quantization, entropy coding → encoded residual (to channel); motion estimation against the frame buffer (delay) produces motion vector and block-mode data (side info, to channel). Decoder: entropy decoding, inverse Q, inverse transform, plus the motion-compensated predictor → decoded macroblock (to display).)

Inter and Intra Coding
• Intra
  - The MB is encoded as is, without motion compensation
  - DCT followed by quantization, zig-zag scan, run-length coding and Huffman coding
• Inter
  - Block-matching motion estimation
  - The prediction residue from the best-match block is DCT encoded (similarly to intra mode)
  - The motion vector is differentially encoded

Intra-Coding Mode
• Encoder: input MB → DCT → Q → E → bit-stream; the quantized coefficients also pass through $Q^{-1}$ → IDCT to build the motion-compensation reference frame
• Decoder: bit-stream → $E^{-1}$ → $Q^{-1}$ → IDCT → display frame
• E denotes the entropy coder

Inter-Coding Mode
• Encoder: the residual $r_n = x_n - \hat{x}^{MC}_{n-1}$ between the input MB $x_n$ and its motion-compensated prediction $\hat{x}^{MC}_{n-1}$ goes through DCT → Q → E → bit-stream
• Reconstruction loop: $Q^{-1}$ → IDCT gives $\hat{r}_n$, and $\hat{x}_n = \hat{r}_n + \hat{x}^{MC}_{n-1}$
• $\hat{x}_n$ passes through a delay (D) to become the reference frame $\hat{x}_{n-1}$; motion estimation (ME) compares $x_n$ with $\hat{x}_{n-1}$, and motion compensation (MC) produces $\hat{x}^{MC}_{n-1}$

Video Sequence and Picture
(Figure: pictures 0-5; picture 0 is intra, pictures 1-5 are inter.)
• Intra picture (I-picture)
  - Encoded without referencing other pictures
  - All MBs are intra coded
• Inter picture (P-picture, B-picture)
  - Encoded by referencing other pictures
  - Some MBs are intra coded and some are inter coded

Group of Pictures
• A video stream is a sequence of groups of pictures: GOP GOP … GOP
• Example GOP: I B B P B B P …
  - Frame (display) order: 0 1 2 3 4 5 6
  - Encoding order (position of each frame in the bit-stream): 0 2 3 1 5 6 4
  - That is, the frames are sent as I0 P3 B1 B2 P6 B4 B5: each B-frame is predicted from both sides, so it is coded after the P-frame that follows it in display order

Coding of I-Slice
• Original block → DCT → transformed block → quantization (quantization matrix) → zig-zag scan → entropy coding → bit-stream
• Example zig-zag sequence of quantized coefficients: 15 0 -2 -1 -1 -1 0 …

Coding of P-Slice
(Figure: motion estimation between the original current frame and the reconstructed reference frame from the frame buffer produces motion vectors; motion compensation forms the prediction, and the current frame minus the prediction is the residual.)

Motion Estimation in H.261
• Macroblock
  - Luminance: 16x16 (four 8x8 Y blocks)
  - Chrominance: two 8x8 blocks (Cb, Cr)
• Motion estimation is performed only for the luminance component
• Motion vector range: [-15, 15]
(Figure: a 16x16 MB and its search area in the reference frame, extending 15 pixels on each side.)

Coding of Motion Vectors
• MV range [-15, 15]; integer-pixel ME search only
• Motion vectors are differentially and separably encoded:
  $MVD_x[n] = MV_x[n] - MV_x[n-1]$, $MVD_y[n] = MV_y[n] - MV_y[n-1]$
• 11-bit VLC for MVD
• Example:
  - MV = 2 2 3 5 3 1 -1 …
  - MVD = 0 1 2 -2 -2 -2 …
  - Binary: 1 010 0010 0011 0011 0011 …

Inter/Intra Switching
• Based on the energy of the prediction error
  - High energy (scene change, occlusions, uncovered areas, …): use intra mode
  - Low energy (stationary background, translational motion, …): use inter mode
• Intra energy: $VAR = \frac{1}{256}\sum_{(x,y)\in MB}\left(c[x,y]-\bar{c}\right)^2$
• Inter energy: $MSE = \frac{1}{256}\sum_{(x,y)\in MB}\left(c[x,y]-r[x+dx,\,y+dy]\right)^2$
• Here $c$ is the current macroblock, $\bar{c}$ its mean, $r$ the reference frame and $(dx, dy)$ the motion vector
(Figure: INTER and INTRA decision regions in the (MSE, VAR) plane, with thresholds at 64.)

Loop Filter
• Optional: can be turned on or off for each block; usually used together with MC
• Advantages
  - Decreases the prediction error by smoothing the prediction frame
  - Reduces high-frequency artifacts such as mosquito effects
• Disadvantage
  - Increases complexity and overhead

Quantization
• Uniform mid-rise quantizer for intra DC coefficients
• Uniform mid-tread quantizer with a double dead zone for inter DC and all AC coefficients
(Figure: input/output characteristics $\hat{X}$ vs. $X$ with step $Q$, for intra DC and for inter DC and all AC.)

H.263
• Standardization effort started in Nov 1993
• Aim: low bit-rate video communications, less than 64 kbps
  - Target PSTN and mobile networks: 10-32 kbps
  - Near-term: H.263 and H.263+, established late 1997
  - Long-term: H.26L, H.264, still under investigation
• Main properties: H.261 with many MPEG features, optimized for low bit rates
• Performance: 3-4 dB improvement over H.261 at less than 64 kbps; 30% bit-rate saving over MPEG-1

MPEG
• Coding and communication of moving pictures and associated audio for digital storage and archival
• MPEG: Moving Picture Experts Group
• MPEG family: MPEG-1 (Nov 1992), MPEG-2 (Nov 1994), MPEG-4 (Oct 1998), MPEG-7 (ongoing work)
• Main features of the MPEG video family:
  - Bi-directional ME/MC: I-frames, P-frames, B-frames
  - Structure: group of pictures (GOP), picture, slice, macroblock
  - Coding decisions

MPEG Goals and Applications
• MPEG-1
  - Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (e.g., CD-ROM)
  - Targets 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality
  - Does not support interlaced sources
  - Main target source: SIF YCrCb 4:2:0, 360 x 240, 30 fps (VCD)
• MPEG-2
  - The most commercially successful international coding standard
  - Wide range of bit rates: 4-80 Mbps; optimized for 4 Mbps
  - Targets high-resolution, high-quality video broadcast and playback

Requirements
• Coding of generic video at around 1.5 Mbps at reasonable (VHS) quality
• Random access capability, with frequent access points
• Fast forward and fast rewind capability
• Audio-video synchronization during play and access
• Simple decoder
• Flexibility of data format
• A certain degree of robustness to communication errors
• Possibility of a real-time encoder

From H.261 to MPEG-1
MPEG-1 adds a few new features compared to the pioneering H.261 codec:
• Flexible data sizes and frame rates
• A more flexible slice structure replaces the fixed GOB structure
• Data structure: the group of pictures (GOP) allows frequent access points
• Bi-directional motion compensation (B-frames)
• Half-pixel motion compensation
• More finely tuned VLCs for different purposes
• A quantization table (as in JPEG) replaces the single Q step size
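The Group of Pictures slide gives the encoding order as a bit-stream position for each displayed frame. The following minimal Python sketch derives that reordering. It assumes a GOP written in display order as a string such as "IBBPBBP" and that every B-frame is predicted from the nearest I/P anchor on each side. The function names are hypothetical.

```python
def coding_order(gop):
    """Display indices of the frames, listed in transmission (coding) order.

    Assumes every B-frame is predicted from the nearest I/P anchor before and
    after it, so the anchor that follows a run of B-frames must be sent first.
    """
    order, pending_b = [], []
    for idx, kind in enumerate(gop):
        if kind == "B":
            pending_b.append(idx)      # wait until the next anchor has been sent
        else:
            order.append(idx)          # I- or P-frame (anchor) goes out first ...
            order.extend(pending_b)    # ... followed by the B-frames it closes
            pending_b.clear()
    # Simplification: trailing B-frames would really wait for the next GOP's I-frame.
    order.extend(pending_b)
    return order


def coding_position(gop):
    """For each frame in display order, its position in the coded bit-stream."""
    position = [0] * len(gop)
    for pos, idx in enumerate(coding_order(gop)):
        position[idx] = pos
    return position


print(coding_order("IBBPBBP"))     # [0, 3, 1, 2, 6, 4, 5]
print(coding_position("IBBPBBP"))  # [0, 2, 3, 1, 5, 6, 4]
```

The second list reproduces the slide's "0 2 3 1 5 6 4". The first list is the same information read the other way, as the sequence of frames the decoder receives.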
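The I-slice pipeline ends with a zig-zag scan followed by entropy coding of the quantized coefficients. This sketch shows a generic zig-zag order for an 8x8 block and a (zero-run, level) run-length pass with an end-of-block marker. The example block is hypothetical, with values chosen so that its scan starts with the slide's "15 0 -2 -1 -1 -1 0". The real standards code the intra DC separately and map the pairs through their own VLC tables, which are not reproduced here.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals r + c = s in turn; odd diagonals run down-left
    # (row increasing), even diagonals run up-right (row decreasing).
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(coeffs):
    """(zero-run, level) pairs; trailing zeros collapse into an end-of-block marker."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


# Hypothetical quantized 8x8 block whose zig-zag scan starts 15 0 -2 -1 -1 -1 0 ...
block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[2][0], block[1][1], block[0][2] = 15, -2, -1, -1, -1
print(zigzag_scan(block)[:7])          # [15, 0, -2, -1, -1, -1, 0]
print(run_length(zigzag_scan(block)))  # [(0, 15), (1, -2), (0, -1), (0, -1), (0, -1), 'EOB']
```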
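The H.261 motion estimation slide specifies integer-pixel search over a [-15, 15] range for a 16x16 luminance macroblock. Below is a minimal full-search block-matching sketch. Frames are plain lists of rows. The matching cost, sum of absolute differences (SAD), is an assumption, because the slides do not name the criterion. Candidates that fall outside the reference frame are skipped.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, search=15):
    """Exhaustive integer-pel block matching within +/-search pixels.

    Returns (dx, dy, cost) of the lowest-SAD candidate lying fully inside the
    reference frame; ties favour the zero vector, then raster-scan order.
    """
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, 0, 0, n))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - n and 0 <= y <= h - n:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best


# Synthetic test: a textured 16x16 patch sits at (19, 14) in the reference frame
# and at (16, 16) in the current frame, i.e. a true motion vector of (3, -2).
ref = [[0] * 48 for _ in range(48)]
cur = [[0] * 48 for _ in range(48)]
for j in range(16):
    for i in range(16):
        ref[14 + j][19 + i] = cur[16 + j][16 + i] = j * 16 + i + 1
print(full_search(cur, ref, 16, 16))  # (3, -2, 0)
```

Full search evaluates all 31 x 31 = 961 candidate positions per macroblock. That cost is the reason practical encoders often use faster, suboptimal search patterns.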
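The motion-vector example on the "Coding of Motion Vectors" slide can be checked with a short sketch. It applies the differencing to one component at a time, since the coding is separable. The VLC table below is deliberately partial: it contains only the four codewords that appear in the slide's binary output and is not the full H.261 MVD table.

```python
# Partial VLC table: only the four codewords that appear in the slide's example.
MVD_VLC = {0: "1", 1: "010", 2: "0010", -2: "0011"}


def mv_differences(mvs):
    """MVD[n] = MV[n] - MV[n-1], applied to one component (x or y) at a time."""
    return [cur - prev for prev, cur in zip(mvs, mvs[1:])]


def mv_from_differences(first, mvds):
    """Decoder side: rebuild the vectors by accumulating the differences."""
    mvs = [first]
    for d in mvds:
        mvs.append(mvs[-1] + d)
    return mvs


def encode_mvds(mvds):
    return " ".join(MVD_VLC[d] for d in mvds)


mvx = [2, 2, 3, 5, 3, 1, -1]
mvd = mv_differences(mvx)
print(mvd)               # [0, 1, 2, -2, -2, -2]
print(encode_mvds(mvd))  # 1 010 0010 0011 0011 0011
```

Neighbouring macroblocks tend to move together, so the differences cluster around zero and receive the shortest codewords.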
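The inter/intra switching slide compares the intra energy VAR with the inter energy MSE over a 256-pixel macroblock, using a threshold of 64. This sketch computes both quantities as defined there. The exact decision boundary is only shown in a figure, so the rule in `choose_mode` is an assumption consistent with the slide's description: keep inter mode when the prediction error is small or no larger than the block's own variance.

```python
def _mean(block):
    return sum(map(sum, block)) / (len(block) * len(block[0]))


def intra_var(cur):
    """VAR: mean squared deviation of the current MB from its own mean."""
    m = _mean(cur)
    return sum((p - m) ** 2 for row in cur for p in row) / (len(cur) * len(cur[0]))


def inter_mse(cur, pred):
    """MSE between the current MB and its motion-compensated prediction."""
    n = len(cur) * len(cur[0])
    return sum((a - b) ** 2 for ra, rb in zip(cur, pred) for a, b in zip(ra, rb)) / n


def choose_mode(cur, pred, threshold=64.0):
    # ASSUMED rule: inter when MSE < threshold or MSE <= VAR, otherwise intra.
    var, mse = intra_var(cur), inter_mse(cur, pred)
    return "INTER" if mse < threshold or mse <= var else "INTRA"


flat = [[100] * 16 for _ in range(16)]
print(choose_mode(flat, flat))                           # INTER (perfect prediction)
print(choose_mode(flat, [[0] * 16 for _ in range(16)]))  # INTRA (e.g. a scene change)
```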
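The quantization slide contrasts a uniform mid-rise quantizer (intra DC) with a uniform mid-tread quantizer that has a double dead zone (inter DC and AC). Below is a generic sketch of both characteristics with a free step size. The step values and the centre-of-bin reconstruction are assumptions, not the exact H.261 rules.

```python
import math


def midrise_quantize(x, step):
    """Uniform mid-rise quantizer: bins [k*step, (k+1)*step), no zero output level."""
    k = math.floor(x / step)
    return k, (k + 0.5) * step        # reconstruct at the bin centre


def deadzone_quantize(x, step):
    """Uniform mid-tread quantizer whose zero bin (-step, step) is twice as wide
    as the other bins (a 'double dead zone')."""
    level = int(x / step)             # int() truncates toward zero
    if level == 0:
        return 0, 0.0
    sign = 1 if level > 0 else -1
    return level, sign * (abs(level) + 0.5) * step


print(midrise_quantize(0.3, 1))    # (0, 0.5)
print(deadzone_quantize(0.9, 1))   # (0, 0.0)
print(deadzone_quantize(-1.5, 1))  # (-1, -1.5)
```

The wide zero bin sends small inter and AC coefficients, which are mostly noise, to zero. This lengthens the zero runs that the run-length coder exploits.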
Bidirectional MC Properties
• Advantages
  - Higher coding efficiency: the frame rate can be increased significantly with few bits
  - More accurate motion estimation and compensation
  - No error propagation (in MPEG-1/2, B-frames are not used as references for other frames)
• Disadvantages
  - More memory buffer for frame storage (a minimum of 3 frames)
  - More end-to-end delay

H.264/AVC History
• The first video compression standards were introduced in the early 1990s:
  - H.261 (1990) and H.263 (1995) from ITU
  - MPEG-1 (1993) and MPEG-2 (1996) from ISO
• Since then, the technology has advanced rapidly:
  - H.263 was followed by H.263+, H.263++ and H.26L
  - MPEG-1/2 were followed by MPEG-4 Visual
  - But industry and research coders were still way ahead
• H.264/AVC is a joint project of ITU and ISO to create an up-to-date standard

Scope and Context
• Aimed at providing high-quality compression for various services:
  - IP streaming media (50-1500 kbps)
  - SDTV and HDTV broadcast and video-on-demand (1-8+ Mbps)
  - DVD
  - Conversational services (<1 Mbps, low latency)
• The standard defines:
  - Decoder functionality (but not the encoder)
  - File and stream structure
• Final result: a 2-fold improvement in compression (same fidelity at half the size) compared to H.263 and MPEG-2

Video Compression
• Motion compensation / prediction
  - Describe the current frame based on the previous frame
  - Output the description plus a residual image
  - Predicted frames are called "inter-frames"; some frames ("intra-frames") are encoded without prediction, as natural images
• Image transform
  - Concentrate the image energy in relatively few numeric coefficients
• Lossy coding
  - Compress coefficient values in a lossy manner
  - Try to keep the most important information

The H.263 Standard Coder
(Figure: original video → motion compensation → image transform → lossy coding → compressed video)

H.263 Motion Compensation
• The image is divided into 16x16 macroblocks
• Each macroblock is matched against nearby blocks in the previous frame (called the reference frame)
  - "Nearby" = within a 15-pixel horizontal/vertical range
  - Half-pixel accuracy (with bilinear pixel interpolation)
• The best match is used to predict the macroblock
• The relative displacement, or motion vector, is encoded and transmitted to the decoder
• The prediction errors of all blocks constitute the residual

Motion Compensation Example
(Figure: frames at T=1 (reference) and T=2 (current))

H.263 Image Transform
• The residual is divided into 8x8 blocks
• An 8x8 2-D discrete cosine transform (DCT) is applied to each block independently
• DCT coefficients describe spatial frequencies in the block:
  - High frequencies correspond to small features and texture
  - Low frequencies correspond to larger features
  - The lowest-frequency coefficient, called DC, corresponds to the average intensity of the block

8x8 DCT Example
(Figures, three slides)

H.263 Lossy Coding
• Transform coefficients are quantized:
  - Some less-significant bits are dropped
  - Only the remaining bits are encoded
• For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more
• For intra-frames, lower-frequency coefficients get more bits, to preserve larger features better
• The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy
• Finally, the bits are encoded using an entropy (lossless) code, traditionally a Huffman-style code

Changes in Motion Compensation (H.264)
• Quarter-pixel accuracy: a gain of 1.5-2 dB across the board over ½-pixel accuracy
• Variable block size: every 16x16 macroblock can be subdivided, and each sub-block is predicted separately
• Multiple and arbitrary reference frames, vs. only the previous frame (H.263) or the previous and next frames (MPEG)
• Anti-aliasing sub-pixel interpolation: removes some common artifacts in the residual

Variable Block-Size MC
• Motivation: the size of moving/stationary objects is variable
  - Many small blocks may take too many bits to encode
  - Few large blocks give poor prediction
• In H.264, each 16x16 macroblock may be:
  - Kept whole
  - Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16)
  - Divided into four 8x8 sub-blocks
• In the last case, each 8x8 sub-block may be divided once more into 2 or 4 smaller blocks (8x4, 4x8 or 4x4)
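H.263 motion compensation uses half-pixel accuracy with bilinear interpolation. This sketch samples a frame at half-pel positions, with coordinates given in half-pel units. The rounding (adding 1 or 2 before integer division) follows the form commonly used for H.263 half-pel prediction, but treat it as an assumption here. The helper names are hypothetical.

```python
def half_pel_sample(frame, x2, y2):
    """Sample the frame at (x2/2, y2/2), i.e. coordinates given in half-pel units."""
    x0, y0 = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2
    a = frame[y0][x0]
    if not fx and not fy:      # integer position: copy the pixel
        return a
    if not fy:                 # horizontal half-pel: average of left/right pixels
        return (a + frame[y0][x0 + 1] + 1) // 2
    if not fx:                 # vertical half-pel: average of upper/lower pixels
        return (a + frame[y0 + 1][x0] + 1) // 2
    # centre position: average of the four surrounding integer pixels
    return (a + frame[y0][x0 + 1] + frame[y0 + 1][x0] + frame[y0 + 1][x0 + 1] + 2) // 4


def half_pel_block(frame, x2, y2, n):
    """n x n prediction block whose top-left corner is at half-pel (x2, y2)."""
    return [[half_pel_sample(frame, x2 + 2 * i, y2 + 2 * j) for i in range(n)]
            for j in range(n)]


frame = [[10, 20], [30, 40]]
print([half_pel_sample(frame, x2, y2) for x2, y2 in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# [10, 15, 20, 25]
```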
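The H.263 Image Transform slide states that the DC coefficient of the 8x8 DCT corresponds to the block's average intensity. This direct (slow, O(N^4)) orthonormal 2-D DCT-II makes that concrete. For a flat block, all energy lands in the DC coefficient, which equals 8 times the mean for N = 8. The orthonormal scaling is an assumption, since standards differ in how they normalise the transform.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed from the direct formula."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


coeffs = dct_2d([[10] * 8 for _ in range(8)])
print(round(coeffs[0][0], 6))  # 80.0 (= 8 x mean intensity of 10)
```

Real codecs use fast factorised DCTs rather than this quadruple loop. The direct form is shown only because it matches the textbook definition term by term.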
H.264 Variable Block Sizes
(Figure)

Motion Scale Example
(Figures, three slides: frames at T=1 and T=2)

H.264 VBS Example
(Figure: frames at T=1 and T=2)

Arbitrary Reference Frames
• In H.263, the reference frame for prediction is always the previous frame
• In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction)
• In H.264, any frame may be used as a reference:
  - Encoder and decoder maintain synchronized buffers of available (previously decoded) frames
  - The reference frame is specified as an index into this buffer
• In bi-predictive mode, each macroblock may be:
  - Predicted from one of the two references
  - Predicted from both, using a weighted mean of the predictors

Intra Prediction
• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
• Macroblocks in intra-coded frames are predicted from previously coded ones above and/or to the left of the current block
  - Implemented to some extent in H.263++ and MPEG-4, but in the transform domain
• The macroblock may be divided into 16 4x4 sub-blocks, which are predicted in cascading fashion
• An encoded parameter specifies which neighbors are used for the prediction, and how

Intra-Prediction Example
(Figures: vertical, horizontal and main-diagonal prediction modes)

H.264 Image Transform
• Motivation: the DCT requires real-number operations, which may cause inaccuracies (encoder/decoder mismatch) in the inverse transform
• H.264 uses a very simple integer 4x4 transform
  - A (fairly crude) approximation to the 4x4 DCT
  - The transform matrix contains only ±1 and ±2
  - Can be computed with only additions, subtractions and shifts
• Results show negligible loss in quality (~0.02 dB)

Deblocking Filter
(Figure: non-deblocked image vs. deblocked image)
Courtesy: images from http://compression.ru/video/deblocking/

Entropy Coding
• Motivation: traditional coders use fixed variable-length codes
  - Essentially Huffman-style codes
  - Non-adaptive
  - Cannot encode symbols with probability > 0.5 efficiently, since at least one bit is required per symbol
• H.263 Annex E defines an arithmetic coder
  - Still non-adaptive
  - Uses multiple non-binary alphabets, which results in high computational complexity

Entropy Coding: CABAC
• Context-adaptive binary arithmetic coding (CABAC): a framework designed specifically for H.264
• Binarization: all syntax symbols are translated to bit-strings
• 399 predefined context models, used in groups
  - E.g., models 14-20 are used to code the macroblock type for inter-frames
• The model to use next is selected based on previously coded information (the context)

Comparison to MPEG-2, H.263, MPEG-4 Part 2
(Figure: quality as Y-PSNR [dB], 25-38, vs. bit rate [kbit/s], 0-3500, for the Tempete CIF 30 Hz sequence; curves for JVT/H.264/AVC, MPEG-4 Visual, MPEG-2 and H.263.)
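The Arbitrary Reference Frames slide notes that a bi-predicted macroblock may use a weighted mean of its two predictors. This is a generic integer sketch of such a weighted mean, not the exact H.264 weighted-prediction formula. It assumes the weights sum to 2**shift, applies rounding before the shift, and clips the result to 8 bits. With the default arguments it reduces to the plain rounded average.

```python
def weighted_bipred(p0, p1, w0=1, w1=1, shift=1):
    """Weighted mean of two predictor blocks: (w0*p0 + w1*p1 + round) >> shift.

    Assumes w0 + w1 == 2**shift so that the weights sum to one; the defaults
    give the plain rounded average (p0 + p1 + 1) >> 1. Output is clipped to 0..255.
    """
    if w0 + w1 != 1 << shift:
        raise ValueError("weights must sum to 2**shift")
    rnd = 1 << (shift - 1)
    return [[min(255, max(0, (w0 * a + w1 * b + rnd) >> shift))
             for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


print(weighted_bipred([[10, 20]], [[13, 20]]))     # [[12, 20]]
print(weighted_bipred([[10]], [[13]], 3, 1, 2))    # [[11]]
```

Unequal weights let the encoder favour the temporally closer reference, or follow a fade between scenes.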
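For the Intra Prediction slides, this sketch implements three of the nine H.264 4x4 prediction modes: vertical, horizontal and DC. The DC form shown assumes both the upper and left neighbours are available. The main-diagonal mode from the example slides is omitted for brevity. Choosing the mode by smallest SAD is an assumed, simplified encoder decision. The helper names are hypothetical.

```python
def predict_4x4(mode, above, left):
    """Prediction for a 4x4 block from its reconstructed neighbours.

    above: the 4 pixels directly above the block; left: the 4 pixels to its left.
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_mode(block, above, left):
    """Assumed encoder decision: the mode whose prediction leaves the smallest SAD."""
    return min(("vertical", "horizontal", "dc"),
               key=lambda m: sad(block, predict_4x4(m, above, left)))


above, left = [10, 20, 30, 40], [10, 10, 10, 10]
block = [[10, 20, 30, 41] for _ in range(4)]   # vertical stripes
print(best_mode(block, above, left))           # vertical
```

The decoder repeats the same prediction from the same reconstructed neighbours. It therefore needs only the mode index and the residual.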
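The H.264 Image Transform slide says the 4x4 integer transform uses only ±1 and ±2 and needs only additions, subtractions and shifts. This sketch computes the core forward transform W = C X C^T twice: once by matrix multiplication and once with an add/shift butterfly. The normalising scale factors, which H.264 folds into quantization, are omitted.

```python
# Core matrix of the H.264 4x4 forward integer transform (entries are +/-1, +/-2).
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def forward_core(x):
    """W = C X C^T; the normalising scale factors are left to quantization."""
    ct = [list(col) for col in zip(*C)]
    return matmul(matmul(C, x), ct)


def butterfly_1d(v):
    """The same 1-D transform using only additions, subtractions and shifts."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_butterfly(x):
    # Columns first (C X), then rows ((C X) C^T).
    cols = [butterfly_1d([x[r][c] for r in range(4)]) for c in range(4)]
    cx = [[cols[c][r] for c in range(4)] for r in range(4)]
    return [butterfly_1d(row) for row in cx]


x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(forward_core(x) == forward_butterfly(x))  # True
```

The arithmetic is exact integer arithmetic, so encoder and decoder produce bit-identical results. This removes the inverse-transform mismatch mentioned on the slide.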