Video coding

Types of redundancies:
– Spatial: correlation between neighboring pixel values
– Spectral: correlation between different color planes or spectral bands
– Temporal: correlation between different frames in a video sequence

In video coding, temporal correlation is exploited in addition to spatial and spectral correlation, typically using motion compensation (predictive coding based on motion estimation).

Video standards review

H.261
• For video-conferencing/video phone
  – Low delay (real-time, interactive)
  – Slow motion in general
• For transmission over ISDN
  – Fixed bandwidth: p x 64 Kbps, p = 1, 2, …, 30
• Video format:
  – CIF (352x288, above 128 Kbps)
  – QCIF (176x144, 64-128 Kbps)
  – 4:2:0 color format, progressive scan
• Published in 1990
• Each macroblock can be coded in intra- or inter-mode
• Periodic insertion of intra-mode to eliminate error propagation due to network impairments

DCT coefficient quantization
• DC coefficient in intra-mode: uniform
• Others: uniform with deadzone (to avoid coding too many small coefficients, which are typically due to noise)

• MVs are coded differentially (DMV)
• DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
  – Symbol: (zero run-length, non-zero value range)
• Other information is also coded using VLC (Huffman coding)

MPEG-1
• Finalized in ~1991
• Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240, 30 fps)
  – Maximum: 1.856 Mbps, 768x576 pels
  – Progressive frames only
• Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
• Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
• MPEG-1 Audio
  – Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
  – MP3 = MPEG-1 layer 3 audio

MPEG-1 vs H.261
• Developed at about the same time
• Must enable random access (fast forward/rewind)
  – Using GOP structure with periodic I-picture and P-picture
• Not for interactive applications
  – Does not have as stringent a delay requirement
• Fixed rate (1.5 Mbps), good quality (VHS equivalent)
  – SIF video format (similar to CIF)
    • CIF: 352x288, SIF: 352x240
  – Using more advanced motion compensation
    • Half-pel accuracy motion estimation, range up to +/- 64
  – Using bi-directional temporal prediction
    • Important for handling uncovered regions

MPEG-1 GOP
[Figure: GOP structure] Encoding order: 1 4 2 3 8 5 6 7

[Figure: MPEG-1 coder]

H.263
• Targeted for visual telephone over PSTN or Internet
• Enables video phone over regular phone lines (28.8 Kbps) or wireless modem
• Developed later than H.261, so it can accommodate computationally more intensive options
  – Initial version (H.263 baseline): 1995
  – H.263+: 1997
  – H.263++: 2000
• Result: significantly better quality at lower rates
  – Better video at 18-24 Kbps than H.261 at 64 Kbps

H.263: some improvements over H.261
• Better motion estimation
  – half-pel accuracy motion estimation with bilinear interpolation filter
  – larger motion search range [-31.5,31], and unrestricted MV at boundary blocks
  – more efficient predictive coding for MVs (median prediction using three neighbors)
  – overlapping block motion compensation (option)
  – variable block size: 16x16 -> 8x8, 4 MVs per MB (option)
  – use bidirectional temporal prediction (PB picture) (option)
• 3-D VLC for DCT coefficients (runlength, value, EOB)
• Syntax-based
arithmetic coding (option; at 50% more computations)

H.263 and beyond
- Aimed particularly at video coding for low bit rates (typically 20-30 Kbps and above).
- The coding algorithm is similar to that used by H.261, with some changes to improve performance and error recovery.
- Main differences:
  - Half-pixel precision is used for motion compensation
  - Four negotiable options:
    - Unrestricted motion vectors
    - Syntax-based arithmetic coding
    - Advanced prediction
    - Forward and backward frame prediction (called PB-frames, similar to MPEG)
  - Five resolutions instead of two
- Further improvements in H.263+ and H.264

H.263 Example: MissAmerica

Description        Bitrate (Kbit/s)   Compr. Ratio   Average PSNR (dB)
Original, 30fps    9124               1:1            n/a
10fps, 20Kbps      21.83              139:1          29.79
10fps, 100Kbps     105.47             29:1           36.0

MPEG-2
• Finalized in 1994
  » Field-interlaced video
  » Levels and profiles
    • Profiles: define bit stream scalability and color space resolutions
    • Levels: define image resolutions and maximum bit-rate per profile
• A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Internet/intranet) as well as DVD video
• 4~8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
• 18-45 Mbps for HDTV applications
  – MPEG-2 video main profile at high level is the video coding standard used in HDTV
• Test in 11/91, Committee Draft 11/93
• Consists of various profiles and levels
• Backward compatible with MPEG-1
• MPEG-2 Audio
  – Supports 5.1 channels
  – MPEG-2 AAC: requires 30% fewer bits than MPEG-1 layer 3

MPEG-2 vs MPEG-1
• MPEG-1 only handles progressive sequences (SIF).
• MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF).
• More sophisticated motion estimation methods (frame/field prediction mode) are developed to improve estimation accuracy for interlaced sequences.
  - Frame Motion Vectors: one motion vector is generated per MB in each direction, which corresponds to a 16x16 pels luminance area.
  - Field Motion Vectors: two motion vectors per MB are generated for each direction, one for each of the fields. Each vector corresponds to a 16x8 pels luminance area.
• Different DCT modes and scanning methods are developed for interlaced sequences.
• MPEG-2 has various scalability modes.
• MPEG-2 has various profiles and levels, each combination targeted at a different application.

MPEG-2 scalability
• Data partition
  – All headers, MVs, first few DCT coefficients in the base layer
  – Can be implemented at the bit stream level
  – Simple
• SNR scalability
  – Base layer includes coarsely quantized DCT coefficients
  – Enhancement layer further quantizes the base layer quantization error
  – Relatively simple
• Spatial scalability
  – Complex
• Temporal scalability
  – Simple

[Figures: SNR scalability; spatial scalability; temporal scalability]

MPEG-2 profiles and levels
• Profiles: tools
• Levels: parameter range for a given profile
• Main profile at main level (MP@ML) is the most popular, used for digital TV
• Main profile at high level (MP@HL): HDTV
• 4:2:2 at main level (4:2:2@ML) is used for studio production

MPEG-4
New features
» Provides technologies to view, access, and manipulate objects rather than pixels
» Entire scene is decomposed into multiple objects
  – Object segmentation is the most difficult task!
  – But this does not need to be standardized ☺
» Each object is specified by its shape, motion, and texture (color)
  - Shape and texture both change over time (specified by motion)
  - Texture encoding is done with DCT (8x8 pixel blocks) or wavelets
» MPEG-4 assumes the encoder has a segmentation map available, and specifies how to code (actually decode!)
shape, motion and texture

[Figures: MPEG-4 example of scene composition; object-based coding; MPEG-4 block diagram]

MPEG-4
– Coding tools
  » Shape coding: binary or gray scale
  » Motion compensation: similar to H.263, overlapped mode is supported
  » Texture coding: block-based DCT, and wavelets for static texture
– Types of Video Object Planes (VOPs)
  » I-VOP: VOP is encoded independently of any other VOPs
  » P-VOP: predicted VOP using a previous VOP and motion compensation
  » B-VOP: bidirectionally interpolated VOP using other I-VOPs or P-VOPs
  » Similar concept to MPEG-2

Mesh Animation
• An object can be described by an initial mesh and MVs of the nodes in the following frames
• MPEG-4 defines coding of mesh geometry, but not mesh generation

Body and Face Animation
• MPEG-4 defines a default 3-D body model (including its geometry and possible motion) through body definition parameters (BDP)
• The body can be animated using the body animation parameters (BAP)
• Similarly, face definition parameters (FDP) and face animation parameters (FAP) are specified for a face model and its animation
• E.g. eye blink (FAP19)

Text-to-Speech Synthesis with Face Animation

Others…
• Sprite
  – Code a large background at the beginning of the sequence, plus affine mappings, which map parts of the background to the displayed scene at different time instances
  – Decoder can vary the mapping to zoom in/out, pan left/right
• Global motion compensation
  – Using 8-parameter projective mapping
  – Effective for sequences with large global motion
• Quarter-pixel motion estimation
• DivX:
  - based on MPEG-4
  - can reduce an MPEG-2 video (the same format used for DVD and pay per view) to 10 percent of its original size (so that a DVD can be recorded on a CD)
  - audio is normally coded using MP3

MPEG-7
MPEG-1/2/4 make content available, whereas MPEG-7 allows you to find the content you need!
– A content description standard
  » Video/images: shape, size, texture, color, movements and positions, etc.
  » Audio: key, mood, tempo, changes, position in sound space, etc.
– Applications:
  » Digital libraries
  » Multimedia directory services
  » Broadcast media selection
  » Editing, etc.
Example: draw an object and find objects with similar characteristics. Play a note of music and find similar types of music.

MPEG-21
Aims at standardizing interfaces and tools to facilitate the exchange of multimedia resources across heterogeneous devices, networks and users. More specifically, it standardizes requisite elements for packaging, identifying, adapting and processing these resources as well as managing their usage rights. This framework will benefit the entire consumption chain from creators and rights holders to service providers and consumers.

Basic unit of transaction in the MPEG-21 Multimedia Framework: the Digital Item, which packages resources along with identifiers, metadata, licenses and methods that enable interaction with the Digital Item.

Another key concept: the User, i.e. any entity that interacts in the MPEG-21 environment or makes use of Digital Items.

MPEG-21 can be seen as providing a framework in which one User interacts with another User and the object of that interaction is a Digital Item. Some example interactions include content creation, management, protection, archiving, adaptation, delivery and consumption.

MPEG-A
MPEG’s Multimedia Application Formats (MAF) provide the framework for integration of elements from several MPEG standards into a single specification that is suitable for specific, but widely usable applications. Typically, MAFs specify how to combine metadata with timed media information for a presentation in a well-defined format that facilitates interchange, management, editing, and presentation of the media.
The presentation may be ‘local’ to the system or may be via a network or other stream delivery mechanism.

MAF specifications shall integrate elements from different MPEG standards into a single specification that is useful for specific but very widely used applications. Examples are delivering music, pictures or home videos. MAF specifications may use elements from MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21.

Typically, MAF specifications include:
- The ISO File Format family for storage
- A simple MPEG-7 tool set for metadata
- One or more coding profiles for representing the media
- Tools for encoding metadata in either binary or XML form

MAFs may specify use of:
- MPEG-21 Digital Item Declaration Language for representing the structure of the media and the metadata
- Other MPEG-21 tools
- Non-MPEG coding tools (e.g., JPEG) for representation of "non-MPEG" media
- Elements from non-MPEG standards that are required to achieve full interoperability

MPEG-A: 2 examples

3on4:
- MP3 is one of the most widely used MPEG standards. Currently, ID3 simply appends simple metadata tags such as Artist, Album, Song Title, etc.
- MPEG-4 specifies what MPEG expects to be another very successful specification, the MPEG-4 File Format, while MPEG-7 specifies not only signal-derived metadata, but also archival metadata such as Artist, Album and Song Title.
- As such, MPEG-4 and MPEG-7 represent an ideal environment to support the current “MP3 music library” user experience, and, moreover, to extend that experience in new directions.

Jon4:
- Digital cameras -> libraries with thousands of digital photos
- Searching for photographs of interest can be difficult ->
- Need for provision of suitable metadata: photo content (e.g. the subject being photographed), author, shoot location, imaging parameters, etc., stored in a standardized format
- The EXIF standard (commonly adopted by camera manufacturers) does not support advanced metadata.
- MPEG-7 defines rich metadata descriptions for still images and audio, and also provides associated systems tools (file formats, etc.)
- As such, MPEG-7 and the MPEG-4 file format represent an ideal environment to support the current “Digital Photos Library” user experience

Summary (1/2)
• H.261:
  – First video coding standard, targeted for video conferencing over ISDN
  – Uses block-based hybrid coding framework with integer-pel MC
• H.263, H.264…
  – Improved quality at lower bit rate, to enable video conferencing/telephony below 54 Kbps (modems or internet access, desktop conferencing); half-pixel MC
• MPEG-1 video
  – Video on CD and video on the Internet (good quality at 1.5 Mbps)
  – Half-pixel MC and bidirectional MC
• MPEG-2 video
  – TV/HDTV/DVD (4-15 Mbps)
  – Extended from MPEG-1, considering interlaced video

Summary (2/2)
• MPEG-4
  – To enable object manipulation and scene composition at the decoder -> interactive TV/virtual reality
  – Object-based video coding: shape coding
  – Coding of synthetic video and audio: animation
• MPEG-7
  – To enable search and browsing of multimedia documents
  – Defines the syntax for describing the structural and conceptual content
• MPEG-21: beyond MPEG-7, considering intellectual property protection, etc.
• MPEG-A: integration of elements from different MPEG standards into a single specification that is useful for specific but very widely used applications
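
Worked example: checking the H.263 MissAmerica figures

The compression ratios quoted in the H.263 MissAmerica example can be checked with a little arithmetic. The sketch below assumes, since the slides do not state it, that the sequence is QCIF (176x144) in 4:2:0 format with 8 bits per sample, and that each ratio is measured against the uncompressed rate at the coded frame rate (10 fps) rather than at the original 30 fps. Under those assumptions the results appear to match the quoted values. The function names are illustrative only.

```python
def raw_bitrate_kbps(width, height, fps, bits_per_sample=8):
    """Uncompressed 4:2:0 bitrate in Kbit/s (1 Kbit = 1000 bits)."""
    luma = width * height
    # 4:2:0 carries two chroma planes, each subsampled by 2 in both directions.
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample * fps / 1000.0


def compression_ratio(raw_kbps, coded_kbps):
    """Ratio of uncompressed bitrate to coded bitrate."""
    return raw_kbps / coded_kbps


# Assumption: the test sequence is QCIF; the slides do not say so.
QCIF = (176, 144)

raw_30 = raw_bitrate_kbps(*QCIF, fps=30)  # rate of the 30 fps original
raw_10 = raw_bitrate_kbps(*QCIF, fps=10)  # rate after dropping to 10 fps

print(f"raw QCIF at 30 fps: {raw_30:.0f} Kbit/s")
for coded in (21.83, 105.47):  # coded bitrates quoted in the example
    ratio = compression_ratio(raw_10, coded)
    print(f"{coded:7.2f} Kbit/s -> about {ratio:.0f}:1")
```

Measured against the 30 fps original instead, the 21.83 Kbit/s stream would correspond to roughly 418:1, which suggests that the quoted ratios already factor out the frame-rate reduction.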