Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard
By Anthony Vetro, Fellow IEEE, Thomas Wiegand, Fellow IEEE, and Gary J. Sullivan, Fellow IEEE
Proceedings of the IEEE, 2011

The Emerging MVC Standard for 3D Video Services
Ying Chen, Ye-Kui Wang, Kemal Ugur, Miska M. Hannuksela, Jani Lainema, and Moncef Gabbouj
EURASIP Journal on Advances in Signal Processing, 2009

Outline
- Introduction
- Multiview Scenarios and Applications
- Standardization Requirements
- H.264/MPEG-4 AVC Basics
- Extending H.264/MPEG-4 AVC for Multiview
- Frame-Compatible Stereo Encoding Formats
- Conclusion and Future Work

Introduction
- The Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardized an extension of H.264/MPEG-4 AVC that is referred to as Multiview Video Coding (MVC)
- MVC provides a compact representation for multiple views of a video scene, such as stereo-paired video for 3-D viewing
- The stereo high profile of the MVC extension was selected by the Blu-ray Disc Association as the coding format for 3-D video with high-definition resolution
- System-level integration of MVC is more challenging, because the decoder output may contain more than one view and can consist of any combination of the views at any temporal level
- Various “frame-compatible” approaches for supporting stereo-view video as an alternative to MVC are also discussed

Multiview Scenarios and Applications
- Free-viewpoint video: the viewpoint can be changed interactively
  - Several candidate views are available to the viewer, and one of them is selected as the target view
  - The decoder focuses on decoding the target view
  - Efficient switching between different views
- 3-D TV: more than one view is decoded and displayed simultaneously
  - Stereoscopic video
    - Classic stereo systems that require special-purpose glasses
    - Auto-stereoscopic displays that do not require glasses
  - 3-D video
    - Multiple actual or rendered views of the scene are presented to
      the viewer, e.g., using ‘virtual reality’ glasses or an advanced auto-stereoscopic display, so that the view changes with head movements and the viewer has the feeling of immersion in the 3-D scene
  - Parallel processing of different views and flexible stream adaptation
- Teleconference applications
  - Both interactivity and virtual reality
- Rendering of 3-D TV content or view synthesis
  - Depth information is needed
- 2-D TV and HDTV applications still dominate the market
  - MVC content should provide a way for those 2-D decoders to generate a display from an MVC bitstream

Standardization Requirements
- High compression efficiency
  - Huge amount of data in MVC
  - Enable inter-view prediction
  - Efficient memory management of decoded pictures
  - Hierarchical temporal scalability was found to be efficient for MVC
  - Significant gain compared to independent compression of each view
- Random access
  - Ensure that any image can be accessed, decoded, and displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend
  - Insertion of intra-coded pictures
  - View-switching random access
- [Figure: Typical MVC prediction structure (IDR and anchor pictures)]
- Scalability
  - The ability of a decoder to access only a portion of a bitstream while still being able to generate effective video output, with reduced temporal or spatial resolution
  - Temporal scalability and view scalability
  - Adaptation to user preference, network bandwidth, and decoder complexity
- Decoder resource consumption
  - A number of views are to be decoded and displayed
  - An optimal decoder in terms of memory and complexity is very important to make real-time decoding possible
- Parallel processing
  - In 3-D TV, multiple views need to be decoded simultaneously
  - Reduces computation time to achieve real-time decoding
- Backward compatibility
  - A subset of the MVC bitstream corresponding to the “base view” needs to be decodable by
    an ordinary H.264/MPEG-4 AVC decoder, and the data representing other views should be encoded in a way that does not affect base-view decoding
- Ability to achieve a desired degree of quality consistency among views, or to select a preferential quality for encoding some views versus others
- Convey camera parameters along with the bitstream in order to support intermediate view interpolation at the decoder
- MVC shares some design principles with SVC, such as backward compatibility with H.264/AVC, temporal scalability, and network-friendly adaptation
- New mechanisms include view scalability, the inter-view prediction structure, coexistence of decoded pictures from multiple dimensions in the decoded picture buffer, multiple representations in the display, and parallel decoding at the decoder

H.264/MPEG-4 AVC Basics
- H.264/AVC covers two layers
  - Video coding layer (VCL): creates a coded representation of the source content
  - Network abstraction layer (NAL): formats these data and provides header information
- Network abstraction layer (NAL)
  - A coded H.264/MPEG-4 AVC video data stream is organized into NAL units, which are packets that contain an integer number of bytes
  - NAL unit layout: type, payload
  - VCL NAL units
    - Picture content (coded slices or slice data)
  - Non-VCL NAL units
    - Parameter sets
      - Sequence parameter set (SPS): sequence-level header information
      - Picture parameter set (PPS): infrequently changing picture-level header information
    - SEI messages
      - Do not affect the core decoding process
      - Assist the decoding process or subsequent processing (bitstream manipulation or display)
- The set of consecutive NAL units associated with a single coded picture is referred to as an access unit
- A set of consecutive access units with certain properties is referred to as a coded video sequence
- A coded video sequence represents an independently decodable part of a video stream and always starts with an instantaneous decoding refresh (IDR) access unit
- Video coding layer (VCL)
  - Reference picture buffering
    and the associated buffering memory control
    - The behavior of the decoded picture buffer (DPB) can be adaptively controlled by memory management control operation (MMCO) commands
    - The reference picture lists that are used for coding P or B slices can be arbitrarily constructed from the pictures available in the DPB via reference picture list modification (RPLM) commands

Extending H.264/MPEG-4 AVC for Multiview
- Bitstream structure
  - The compressed multiview stream includes a “base view” bitstream, which is coded independently from all other views in a manner compatible with decoders for a single-view profile of the standard
  - Coded pictures in the H.264/AVC-compliant base view have useful properties, such as temporal level, that are not indicated in the VCL NAL units of H.264/AVC; the prefix NAL unit was introduced to indicate those properties
  - Coded pictures from different views may use different SPSs that contain the view dependency information for inter-view prediction
  - New syntax elements include
    - view_id: identifier of each view
    - temporal_id: level in the temporal scalability hierarchy
    - priority_id: used for the simple one-path bitstream adaptation process
    - anchor_pic_flag: indicates whether a picture is an anchor picture or a non-anchor picture
    - idr_flag: indicates whether a picture is an IDR picture
    - inter_view_flag: indicates whether a decoded picture is used for inter-view reference
  - [Figure: Extension NAL unit type]
- Enabling inter-view prediction
  - Exploits both temporal and spatial redundancy
  - Builds on the flexible reference picture management capabilities that had already been designed into H.264/MPEG-4 AVC
  - Makes the decoded pictures from other views available in the reference picture lists for use by the inter-picture prediction processing
  - The MVC design does not allow the prediction of a picture in one view at a given time using a picture from another view at a different time
  - Inter-view prediction may be used for
    encoding the non-base view of an IDR picture
- Additional picture type: anchor picture
  - Similar to IDR pictures in that they do not use temporal prediction, but they do allow inter-view prediction
  - Any picture that follows an anchor picture is prohibited from using any picture that precedes the anchor picture as a reference for inter-picture prediction
  - Provides a clean random access point for access to a given view
- High-level syntax
  - Three important pieces of information are carried in the SPS extension
    - View identification
      - Total number of views
      - A listing of view identifiers (e.g., 0-2-1)
    - View dependency information
      - The number of inter-view reference pictures for list0/list1
      - The views that may be used for predicting a particular view (e.g., view 1 uses view 0 and view 2 as reference views)
      - Signaled separately for anchor and non-anchor pictures
    - Level index for operation points
      - Indicator of the resource requirements for a decoder that conforms to a particular level
      - An operation point is a specific temporal subset and a set of views, including those intended for output and the views that they depend on
      - Multiple level values can be signaled as part of the SPS extension, with each level associated with a particular operation point
      - The syntax indicates the number of views that are targeted for output as well as the number of views that are required for decoding particular operation points
- Profiles and levels
  - Profiles
    - Determine the subset of coding tools that must be supported by conforming decoders
    - Based on the high profile of H.264/MPEG-4 AVC
    - Multiview high profile
      - Supports multiple views
      - Does not support interlace coding tools
    - Stereo high profile
      - Two views
      - Supports interlace coding tools
  - Levels
    - Constraints on the bitstreams produced by MVC encoders, to establish bounds on the necessary decoder resources and complexity
    - Limit on the amount of frame memory
      required for the decoding of a bitstream
    - Maximum throughput in terms of macroblocks per second
    - Maximum picture size
    - Overall bit rate
- Coding performance
  - 20%-30% bit rate savings
  - 2-3 dB gains
- SEI messages for multiview video
  - Parallel decoding information SEI message
  - MVC scalable nesting SEI message
    - Indicates the scope of views or temporal levels to which the message applies
    - Reuses the syntax of H.264/AVC SEI messages for a specific set of views and temporal levels
  - View scalability information SEI message
    - The mapping between each operation point and the required NAL units
    - Signals the profile and level for each operation point, which is identified by the view_id
  - Multiview scene information SEI message
  - Multiview acquisition information SEI message
    - Signals camera parameters, which are helpful for view interpolation by a renderer
  - Non-required view component SEI message
    - Indicates that a particular view component is not needed for decoding
  - Operation point: a subset bitstream that is identified by the combination of required view_id and temporal_id values.
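
The operation point definition above can be illustrated with a small sketch. It is not the normative sub-bitstream extraction process of the standard; it only shows the idea of keeping the NAL units whose view_id belongs to the required views and whose temporal_id does not exceed a chosen level. The `NalUnit` class, the `view_deps` table, and both function names are hypothetical, and the dependency of view 2 on view 0 is an assumption made for the toy stream (only "view 1 uses view 0 and view 2" comes from the slides).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical stand-in for an MVC NAL unit header plus payload."""
    view_id: int
    temporal_id: int
    payload: bytes = b""


def required_views(target_views, view_deps):
    """Return the target views plus every view they depend on, directly or indirectly."""
    needed = set()
    stack = list(target_views)
    while stack:
        view = stack.pop()
        if view not in needed:
            needed.add(view)
            stack.extend(view_deps.get(view, ()))
    return needed


def extract_operation_point(nal_units, target_views, max_temporal_id, view_deps):
    """Keep NAL units of the required views whose temporal_id is at most max_temporal_id."""
    keep = required_views(target_views, view_deps)
    return [
        nal for nal in nal_units
        if nal.view_id in keep and nal.temporal_id <= max_temporal_id
    ]


# Toy stream with three views coded in the order 0-2-1 and three temporal levels.
# Assumed dependencies: view 1 references views 0 and 2; view 2 references view 0.
view_deps = {0: [], 2: [0], 1: [0, 2]}
stream = [NalUnit(v, t) for t in (0, 1, 2) for v in (0, 2, 1)]

# Output view 2 at temporal levels 0 and 1; view 0 is pulled in as a dependency.
sub = extract_operation_point(stream, target_views=[2], max_temporal_id=1,
                              view_deps=view_deps)
print(sorted({nal.view_id for nal in sub}))  # [0, 2]
```

Separating `required_views` from the filter mirrors the distinction the slides draw between views targeted for output and views required for decoding.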
  - Parallel coding of multiple views
    - In 3-D broadcasting use cases, the display needs to output many views simultaneously to support head-motion parallax
    - The parallel decoding information SEI message indicates that the video is encoded in such a way that a macroblock in a view 1 picture can only use reconstructed values of macroblocks that belong to certain rows in the view 0 picture
  - View dependency change SEI message
    - Signals changes in the view dependency structure
  - Operation point not present SEI message
    - Indicates operation points that are not present in the bitstream
    - Useful in streaming and networking scenarios
  - Base view temporal HRD SEI message
    - Associated with an IDR access unit
    - Signals information relevant to the hypothetical reference decoder (HRD) parameters associated with the base view
    - HRD: a virtual buffering algorithm that can be used to test the behavior of the coded bitstream and its effect on a real decoder

Frame-Compatible Stereo Encoding Formats
- Frame-compatible formats are a class of stereo video formats in which the two stereo views are essentially multiplexed into a single coded frame or sequence of frames
- Other common names include stereo interleaving and spatial/temporal multiplexing formats
- Basic principles
  - With a frame-compatible format, the left and right views are packed together in the samples of a single video frame
    - Half of the coded samples represent the left view and the other half represent the right view
    - Each coded view has half the resolution of the full coded frame
  - Temporal multiplexing
    - The left and right views are interleaved as alternating frames or fields of a coded sequence
  - Frame-compatible formats have received considerable attention from the broadcast industry because the coded video can be processed by encoders and decoders that were not specially designed to handle stereo video
  - Only the final display stage requires
    some customization for recognizing and properly rendering the video to enable a 3-D viewing experience
  - The drawback of representing the stereo signal in this way is that the spatial or temporal resolution is only half of that used for 2-D video with the same encoded resolution
  - A key additional issue with frame-compatible formats is distinguishing the left and right views
- Signaling
  - The signaling for a complete set of frame-compatible formats has been standardized within the H.264/MPEG-4 AVC standard as SEI messages
  - The frame packing arrangement (FPA) SEI message was specified in an amendment of the H.264/MPEG-4 AVC standard

Conclusion and Future Work
- Three-dimensional video has drawn significant attention recently
- The efficient representation and compression of stereo and multiview video is a central component of any 3-D or multiview system
- This paper reviewed the recent extensions to the H.264/MPEG-4 AVC standard that support 3-D stereo and multiview video
- The MVC standard supports stereo and multiview video by enabling inter-view prediction as well as temporal inter-picture prediction
- Another important development has been the efficient representation, coding, and signaling of frame-compatible stereo video formats
- As the market evolves and new types of displays and services are offered, additional new technologies and standards will need to be introduced
- Autostereoscopic displays will require the generation of a large number of views
- Solutions that consider the inclusion of depth map information for this purpose are a significant area of focus for future designs