Video Compression
Fall 2011
Hongli Luo

Video Compression
Image compression reduces spatial redundancy.
Video compression:
• Spatial redundancy exists within each frame, as in images.
• Temporal redundancy exists between frames and can also be exploited for compression.
Video compression therefore reduces spatial redundancy within a frame and temporal redundancy between frames.
Each video frame can be encoded differently, depending on whether spatial or temporal redundancy is exploited:
• Intraframe
• Interframe

Intraframe and Interframe
Intraframe:
• Each frame is encoded as an individual image.
• Uses image compression techniques, e.g., the DCT.
Interframe:
• Predictive encoding between frames in the temporal domain.
• Instead of coding the current frame directly, the difference between the current frame and a prediction based on previous frames is coded.
• Uses motion compensation.

Intraframe Coding
The frames are compressed using:
• Lossy compression, e.g., DCT, or subsampling and quantization.
• Lossless entropy compression, e.g., Huffman or arithmetic coding.
The MPEG/ITU standards compress intraframes much as the JPEG image standard does:
• Divide the image into 8 x 8 blocks.
• Apply the DCT to each block.
• Quantize the coefficients.
• Apply DPCM to the DC coefficients.
• Zigzag-scan and run-length code the AC coefficients.
• Apply Huffman or arithmetic coding.

Interframe Coding
How does a pixel value change from one frame to the next?
• No change, e.g., background.
• Slight changes due to quantization.
• Changes due to motion of the object.
• Changes due to motion of the camera.
• Changes due to environment and lighting.
No change: no need to code.
Changes due to motion of the object or camera:
• Predict how the pixel has moved.
• Encode the motion vector.

Video Compression with Motion Compensation
Consecutive frames in a video are similar, i.e., temporal redundancy exists.
Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
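The zigzag scan and run-length steps of the intraframe pipeline above can be sketched as follows. This is a minimal illustration in Python; the helper names and the example block values are not from any standard.

```python
# Zigzag scan of an 8 x 8 block: visit coefficients anti-diagonal by
# anti-diagonal from the DC corner, so low-frequency terms come first.
def zigzag_order(n=8):
    """Return the (row, col) visiting order for an n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):          # s = row + col: one anti-diagonal per s
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd diagonals run top-right to bottom-left, even ones the reverse
        order.extend(diag if s % 2 else reversed(diag))
    return order

def run_length_ac(block):
    """Run-length code the 63 AC coefficients as (zero_run, value) pairs."""
    coeffs = [block[r][c] for r, c in zigzag_order()][1:]  # skip the DC term
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block marker
    return pairs
```

Because quantization zeroes most high-frequency coefficients, the zigzag ordering produces long zero runs, which is what makes the run-length step effective.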
The difference between the current frame and other frame(s) in the sequence is coded; it has small values and low entropy, which is good for compression.
Steps of video compression based on Motion Compensation (MC):
1. Motion estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.

Video Compression Based on Motion Compensation
Each image is divided into macroblocks of size N x N. By default:
• For luminance images, N = 16.
• For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
Motion compensation is performed at the macroblock level.
The current image frame is referred to as the Target frame.
A match is sought between a macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s).
The displacement of the reference macroblock to the target macroblock is called the motion vector MV.
Assume the color of (x, y) in the target is the same as, or very similar to, that of (x0, y0) in the reference. The displacement, or motion vector, is d = (dx, dy):
(x, y) = (x0 + dx, y0 + dy)
d = (dx, dy) = (x − x0, y − y0), i.e., dx = x − x0 and dy = y − y0

Motion Estimation and Compensation
Motion estimation:
• For a macroblock of pixels in the current (Target) frame, find the most similar macroblock in a reference frame (previous or future frame), within a specified search area.
• The MV search is usually limited to a small immediate neighborhood: both horizontal and vertical displacements are in the range [−p, p].
Motion compensation:
• The target macroblock is predicted from the reference macroblock.
• The motion vectors are used to compensate the picture.

Simple Motion Example
Consider a simple block of a moving circle. Instead of coding the current frame, code the difference between the two frames. The difference needs fewer bits to encode.
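The estimation and compensation steps above can be sketched for one N x N macroblock. This is a minimal illustration assuming frames are 2-D lists of luma values; all names are hypothetical, and the exhaustive full search shown is only one of the possible search methods.

```python
# Full-search motion estimation: try every displacement (dx, dy) in
# [-p, p] and keep the one minimizing the sum of absolute differences.
def sad(target, ref, tx, ty, dx, dy, n):
    """SAD between the target MB at (tx, ty) and the reference MB at (tx+dx, ty+dy)."""
    return sum(abs(target[ty + y][tx + x] - ref[ty + dy + y][tx + dx + x])
               for y in range(n) for x in range(n))

def motion_search(target, ref, tx, ty, n, p):
    """Exhaustive MV search over [-p, p]; returns ((dx, dy), best SAD)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip displacements whose reference MB falls outside the frame
            if not (0 <= tx + dx <= w - n and 0 <= ty + dy <= h - n):
                continue
            cost = sad(target, ref, tx, ty, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

def predict_mb(ref, tx, ty, mv, n):
    """Motion-compensated prediction: the reference MB displaced by mv."""
    dx, dy = mv
    return [[ref[ty + dy + y][tx + dx + x] for x in range(n)] for y in range(n)]
```

The encoder then codes target minus prediction (the difference macroblock); the decoder rebuilds the target by adding the decoded difference back onto `predict_mb`'s output.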
From Multimedia CM0340, David Marshall.

Estimate Motion of Blocks
Estimate the motion of the object; encode the motion vectors and the difference picture.
From Multimedia CM0340, David Marshall.

Decode Motion of Blocks
Use the motion vector and the difference picture for decoding.
From Multimedia CM0340, David Marshall.

Motion Estimation and Compensation
Advantages:
• Motion estimation and compensation reduce video bit-rates significantly.
• After the first frame, only the motion vectors and difference macroblocks need be coded.
Disadvantages:
• Extra computational complexity: motion estimation is the most computationally expensive part of a video encoder.
• Reference pictures (previous or future frames) must be buffered.

Video Compression Standards
Image, video and audio compression standards have been specified by two major groups since 1985:
ISO (International Organization for Standardization):
• JPEG
• MPEG: MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21
ITU (International Telecommunication Union):
• H.261
• H.263
• H.264, by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG.

H.261
H.261 is an early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30 (hence it is also known as p x 64).
The standard requires that the delay of the video encoder be less than 150 ms, so that it can be used for real-time bidirectional video conferencing.

ITU Recommendations & H.261 Video Formats
H.261 belongs to the following set of ITU recommendations for visual telephony systems:
• H.221 - Frame structure for an audiovisual channel supporting 64 to 1,920 kbps.
• H.230 - Frame control signals for audiovisual systems.
• H.242 - Audiovisual communication protocols.
• H.261 - Video encoder/decoder for audiovisual services at p x 64 kbps.
• H.320 - Narrow-band audiovisual terminal equipment for p x 64 kbps transmission.

H.261 Frame Sequence
Two types of image frames are defined: intra-frames (I-frames) and inter-frames (P-frames).
I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence "intra".
P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.

Intra-frame (I-frame) Coding
Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x 8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8 x 8 blocks.
For each 8 x 8 block, a DCT is applied; the DCT coefficients then go through quantization, zigzag scan and entropy coding.

Inter-frame (P-frame) Predictive Coding
Figure 10.6 shows the H.261 P-frame coding scheme based on motion compensation:
• For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
• After the prediction, a difference macroblock is derived to measure the prediction error.
• Each of its 8 x 8 blocks goes through DCT, quantization, zigzag scan and entropy coding.

Inter-frame (P-frame) Predictive Coding (Cont'd)
P-frame coding encodes the difference macroblock, not the Target macroblock itself.
Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level. The MB itself is then encoded (treated as an intra MB); in this case it is termed a non-motion-compensated MB.
In fact, even the motion vector is not directly coded.
The difference, MVD, between the motion vectors of the preceding macroblock and the current macroblock is sent for entropy coding:
MVD = MVPreceding − MVCurrent    (10.3)

H.263
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN).
It aims at low bit-rate communications, at bit-rates of less than 64 kbps.
It uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding for the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction).

MPEG-1
MPEG: Moving Pictures Experts Group, established in 1988 for the development of digital video.
MPEG-1 adopts SIF (Source Input Format), derived from the CCIR 601 digital TV format.
MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
• 352 x 240 for NTSC video at 30 fps
• 352 x 288 for PAL video at 25 fps
It uses 4:2:0 chroma subsampling.

Motion Compensation in MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
• In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned the best matching MB from the previously coded I- or P-frame: the prediction.
• Prediction error: the difference between the MB and its matching MB, sent to the DCT and its subsequent encoding steps.
• The prediction is from a previous frame: forward prediction.
• An MB containing part of a ball in the Target frame may find no good matching MB in the previous frame because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.

Motion Compensation in MPEG-1 (Cont'd)
MPEG introduces a third frame type, B-frames, and its accompanying bidirectional motion compensation.
The MC-based B-frame coding idea is illustrated in Fig. 11.2: each MB from a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction.
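The bidirectional prediction idea can be sketched as follows: a B-frame macroblock is predicted from both the previous and the next anchor frame, each with its own motion vector, and the two predictions are combined. A minimal illustration; the plain integer averaging and all names are assumptions, not the standard's exact interpolation rules.

```python
# B-frame prediction: forward prediction from the previous anchor plus
# backward prediction from the future anchor, averaged together.
def mc_block(ref, tx, ty, mv, n):
    """Fetch the n x n block of ref at (tx, ty) displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[ty + dy + y][tx + dx + x] for x in range(n)] for y in range(n)]

def bidirectional_predict(prev_ref, next_ref, tx, ty, mv_fwd, mv_bwd, n):
    """Average the forward and backward motion-compensated predictions."""
    fwd = mc_block(prev_ref, tx, ty, mv_fwd, n)
    bwd = mc_block(next_ref, tx, ty, mv_bwd, n)
    return [[(f + b) // 2 for f, b in zip(frow, brow)]
            for frow, brow in zip(fwd, bwd)]
```

This is why the occluded-ball macroblock can still be predicted well: when the forward reference is useless, the backward reference (or the average) supplies the match.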
Group of Pictures (GOP): starts with an I-frame, followed by B- and P-frames.
This GOP has nine frames, with the structure: I B B P B B P B B.

MPEG-1 Frames
The coding mechanism is similar to H.261. Three types of frames:
• I-frames, coded in intra-frame mode.
• P-frames, coded with motion compensation using a previous I- or P-frame as reference.
• B-frames, coded with bidirectional motion compensation based on previous and/or future I- or P-frames.

B-frames
Advantages:
• Coding efficiency: most B-frames use fewer bits.
• Limited error propagation: B-frames are not used to predict future frames, so errors generated in them will not propagate further within the sequence.
Disadvantage:
• Frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the two anchor frames.

Other MPEG Standards
MPEG-2: for higher-quality video at a bit-rate of more than 4 Mbps. Originally designed as a standard for digital broadcast TV; also adopted for DVDs.
MPEG-3: originally for HDTV (1920 x 1080); folded into MPEG-2.
MPEG-4: very low bit-rate communication. The bit-rate for MPEG-4 video now covers a large range, from 5 kbps to 10 Mbps.
MPEG-7: its main objective is to serve the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries.
MPEG-21: a newer standard. The vision for MPEG-21 is to define a multimedia framework enabling transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities.

MPEG-4 Part 10 / H.264
The H.264 video compression standard, formerly known as "H.26L", was developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG.
Preliminary studies using software based on this new standard suggest that H.264 offers up to 30-50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 advanced simple profile.
The outcome of this work is actually two identical standards: ISO MPEG-4 Part 10 and ITU-T H.264.
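Because each B-frame is predicted from both its previous and its next anchor (I- or P-) frame, the anchors must reach the decoder before the B-frames that depend on them, so the coding (transmission) order differs from the display order. A sketch of that reordering for the I B B P B B P B B GOP described earlier; the handling of trailing B-frames, which in practice wait for the next GOP's I-frame, is simplified here.

```python
# Reorder display-order frame types into coding order: hold each run of
# B-frames until the anchor they are backward-predicted from is emitted.
def coding_order(gop):
    """Return display-order indices rearranged into coding order."""
    out, pending_b = [], []
    for i, ftype in enumerate(gop):
        if ftype == 'B':
            pending_b.append(i)       # held until the next anchor arrives
        else:                         # I- or P-frame anchor
            out.append(i)
            out.extend(pending_b)     # B-frames between the two anchors
            pending_b = []
    out.extend(pending_b)             # simplification: trailing B-frames
    return out
```

This reordering is also why the decoder needs the doubled frame buffers mentioned above: both anchors of the current B-frame run must be held in memory at once.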
H.264
H.264 is currently one of the leading candidates to carry High Definition TV (HDTV) video content in many potential applications.
H.264 is adopted by Apple QuickTime 7 and delivers high quality at remarkably low data rates.
It generates bit streams across a broad range of bandwidths and applications:
• 3G mobile devices, iPod
• Video on demand, video streaming (MPEG-4 Part 2)
• Video conferencing (H.263)
• HD for broadcast (MPEG-2)
• DVD (MPEG-2)