Overview of the Stereo and
Multiview Video Coding Extensions
of the H.264/MPEG-4 AVC Standard
By Anthony Vetro, Fellow IEEE, Thomas Wiegand, Fellow IEEE, and
Gary J. Sullivan, Fellow IEEE
PROCEEDINGS OF THE IEEE 2011
The Emerging MVC Standard for 3D
Video Services
Ying Chen, Ye-Kui Wang, Kemal Ugur, Miska M. Hannuksela,
Jani Lainema, and Moncef Gabbouj
EURASIP Journal on Advances in Signal Processing 2009
Outline
 Introduction
 Multiview Scenarios and Applications
 Standardization Requirements
 H.264/MPEG-4 AVC Basics
 Extending H.264/MPEG-4 AVC for Multiview
 Frame-Compatible Stereo Encoding Formats
 Conclusion and Future Work
Introduction
 The Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the
ISO/IEC Moving Picture Experts Group (MPEG) standardized an extension of
H.264/MPEG-4 AVC that is referred to as Multiview Video Coding (MVC)
 MVC provides a compact representation for multiple views of a video scene, such as stereo-paired video for 3-D viewing
 The Stereo High profile of the MVC extension was selected by the Blu-ray Disc
Association as the coding format for 3-D video with high-definition resolution
 The system-level integration of MVC is more challenging because the decoder output may
contain more than one view and can consist of any combination of the views at any
temporal level
 Various “frame-compatible” approaches for supporting stereo-view video as an
alternative to MVC are also discussed
Multiview Scenarios and Applications
 Free-viewpoint video: the viewpoint can be changed interactively
 Several candidate views are available to the viewer, and one of them is selected as the
target view
 The decoder focuses on decoding the target view
Efficient switching between different views
 3-D TV: more than one view is decoded and displayed simultaneously
 Stereoscopic video
 Classic stereo systems that require special-purpose glasses
 Auto-stereoscopic displays that do not require glasses
 3-D video
 Multiple actual or rendered views of the scene are presented to the viewer, e.g. using
‘virtual reality’ glasses or an advanced auto-stereoscopic display, so that the view changes
with head movements and the viewer has the feeling of immersion in the 3-D scene
Parallel processing of different views and flexible stream adaptation
 Teleconference applications
 Both interactivity and virtual reality
 Rendering of 3-D TV content or view synthesis
 Depth information is needed
 2-D TV and HDTV applications still dominate the market
MVC content should provide a way for those 2-D decoders to generate a display from
an MVC bitstream
Standardization Requirements
 High compression efficiency
 Huge amount of data in MVC
 Enable Inter-view prediction
 Efficient memory management of decoded pictures
 Hierarchical temporal scalability was found to be efficient for MVC
 Significant gain compared to independent compression of each view
 Random access
 Ensure that any image can be accessed, decoded, and displayed by starting the
decoder at a random access point and decoding a relatively small quantity of data
on which that image may depend
 Insertion of intra-coded pictures
 View-switching random access
 Typical MVC prediction structure (figure not reproduced; labels: IDR, anchor)
 Scalability
 The ability of a decoder to access only a portion of a bitstream while still being able to
generate effective video output, at reduced temporal or spatial resolution
 Temporal scalability and view scalability
 Adaptation to user preference, network bandwidth, and decoder complexity
 Decoder resource consumption
 A number of views are to be decoded and displayed
 An optimal decoder in terms of memory and complexity is very important to make real-time decoding possible
 Parallel processing
 In 3-D TV, multiple views need to be decoded simultaneously
 Reduces computation time to achieve real-time decoding
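As a rough illustration of temporal scalability, the sketch below assigns a temporal level to each frame of a dyadic (hierarchical) prediction structure and then keeps only the frames at or below a chosen level. The function names and the power-of-two GOP size are assumptions made for this example, not part of the standard.

```python
def temporal_level(frame_index: int, gop_size: int) -> int:
    """Temporal level of a frame in a dyadic hierarchy.

    Assumption for this sketch: gop_size is a power of two and frames
    at multiples of gop_size form the lowest level (level 0).
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    max_level = gop_size.bit_length() - 1
    pos = frame_index % gop_size
    if pos == 0:
        return 0
    # Count trailing zero bits of the position inside the GOP.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_level - trailing_zeros


def temporal_subset(num_frames: int, gop_size: int, max_level: int) -> list:
    """Indices of the frames a decoder keeps when it drops higher levels."""
    return [i for i in range(num_frames)
            if temporal_level(i, gop_size) <= max_level]


if __name__ == "__main__":
    print([temporal_level(i, 8) for i in range(9)])
    print(temporal_subset(9, 8, 1))
```

Dropping the highest level halves the frame rate, which is how a decoder or network node can adapt to bandwidth or complexity limits without re-encoding.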
 Backward compatibility
 A subset of the MVC bitstream corresponding to the “base view” needs to be decodable by
an ordinary H.264/MPEG-4 AVC decoder, and other data representing other views
should be encoded in a way that will not affect base view decoding
 Achieving a desired degree of quality consistency among views, or selecting a
preferential quality for encoding some views versus others
 Conveying camera parameters along with the bitstream in order to support
intermediate view interpolation at the decoder
 MVC shares some design principles with Scalable Video Coding (SVC), such as backward compatibility
with H.264/AVC, temporal scalability, and network-friendly adaptation
 New mechanisms include view scalability, the inter-view prediction structure,
coexistence of decoded pictures from multiple dimensions in the decoded picture
buffer, multiple representations at the display, and parallel decoding at the decoder
H.264/MPEG-4 AVC Basics
 H.264/AVC covers
 Video coding layer (VCL) : creates a coded representation of the source content
 Network abstraction layer (NAL) : formats these data and provides header information
 Network Abstraction Layer (NAL)
 A coded H.264/MPEG-4 AVC video data stream is organized into NAL units, which are
packets that contain an integer number of bytes; each NAL unit carries a type indication followed by a payload
 VCL NAL units
 Picture content (coded slices or slice data)
 Non-VCL NAL units
 Parameter sets
 Sequence Parameter Set (SPS) : sequence-level header information
 Picture Parameter Set (PPS) : infrequently changing picture-level header information
 Supplemental enhancement information (SEI) messages
 Do not affect the core decoding process
 Assist the decoding process or subsequent processing (bitstream manipulation or display)
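The NAL unit organization described above can be illustrated with a minimal sketch that splits the one-byte H.264/AVC NAL unit header into its three fields and classifies the unit as VCL or non-VCL. The helper names and the small table of type names are choices made for this example; only a few common type values are listed.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, must be 0 in a valid stream
    nal_ref_idc: int         # 2 bits, nonzero if used for reference
    nal_unit_type: int       # 5 bits, identifies the payload kind


# A few common H.264/AVC NAL unit types (not an exhaustive table).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its header fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1 to 5 carry coded slice data in single-view H.264/AVC."""
    return 1 <= nal_unit_type <= 5


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x06):
        hdr = parse_nal_header(byte)
        name = NAL_TYPE_NAMES.get(hdr.nal_unit_type, "other")
        print(hex(byte), hdr, name, "VCL" if is_vcl(hdr.nal_unit_type) else "non-VCL")
```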
 The set of consecutive NAL units associated with a single coded picture is referred to
as an access unit
 A set of consecutive access units with certain properties is referred to as a coded video
sequence
 A coded video sequence represents an independently decodable part of a video
stream and always starts with an instantaneous decoding refresh (IDR) access unit
 Video Coding Layer (VCL)
 Reference picture buffering and the associated buffering memory control
 The behavior of the decoded picture buffer (DPB) can be adaptively controlled by memory
management control operation (MMCO) commands
 The reference picture lists that are used for coding of P or B slices can be arbitrarily constructed
from the pictures available in the DPB via reference picture list modification (RPLM) commands
Extending H.264/MPEG-4 AVC for
Multiview
 Bitstream Structure
 The multiview stream is compressed to include a “base view” bitstream, which is coded
independently from all other views in a manner compatible with decoders for
a single-view profile of the standard
 Some useful properties of the coded pictures in the H.264/AVC-compliant
base view, such as temporal level, are not indicated in the VCL NAL units of
H.264/AVC. To indicate those properties, the prefix NAL unit has been introduced
 Coded pictures from different views may use different SPSs that contain the view
dependency information for inter-view prediction
 New syntax elements include
 view_id : identifier of each view
 temporal_id : level in the temporal scalability hierarchy
 priority_id : used for the simple one-path bitstream adaptation process
 anchor_pic_flag : indicates whether a picture is an anchor picture or a non-anchor picture
 idr_flag : indicates whether a picture is an IDR picture
 inter_view_flag : indicates whether a decoded picture is used for inter-view reference
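To make the syntax elements above concrete, here is a minimal sketch that unpacks them from the three header-extension bytes that follow the first NAL unit header byte in prefix and MVC slice NAL units. The bit layout used here is an assumption based on the published MVC NAL unit header extension and should be checked against Annex H of the H.264 specification; in particular, the published syntax carries the IDR indication as `non_idr_flag`, the inverse of the `idr_flag` listed above. The function and class names are hypothetical.

```python
from typing import NamedTuple


class MvcHeaderExtension(NamedTuple):
    svc_extension_flag: int  # 0 is expected for MVC NAL units
    non_idr_flag: int        # inverse sense of an "IDR" flag
    priority_id: int
    view_id: int
    temporal_id: int
    anchor_pic_flag: int
    inter_view_flag: int


def parse_mvc_extension(ext: bytes) -> MvcHeaderExtension:
    """Unpack the 24 bits that follow the one-byte NAL unit header.

    Assumed layout (most significant bit first):
    1 svc_extension_flag, 1 non_idr_flag, 6 priority_id, 10 view_id,
    3 temporal_id, 1 anchor_pic_flag, 1 inter_view_flag, 1 reserved bit.
    """
    if len(ext) != 3:
        raise ValueError("expected exactly 3 extension bytes")
    bits = int.from_bytes(ext, "big")
    return MvcHeaderExtension(
        svc_extension_flag=(bits >> 23) & 0x1,
        non_idr_flag=(bits >> 22) & 0x1,
        priority_id=(bits >> 16) & 0x3F,
        view_id=(bits >> 6) & 0x3FF,
        temporal_id=(bits >> 3) & 0x7,
        anchor_pic_flag=(bits >> 2) & 0x1,
        inter_view_flag=(bits >> 1) & 0x1,
    )


if __name__ == "__main__":
    # Hand-built example: priority_id=5, view_id=2, temporal_id=3.
    print(parse_mvc_extension(bytes([0x45, 0x00, 0x9B])))
```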
Extension NAL unit type (figure not reproduced)
 Enabling Inter-View Prediction
 Exploit both temporal and spatial redundancy
 Reuses the flexible reference picture management capabilities that had already been
designed into H.264/MPEG-4 AVC
 Decoded pictures from other views are made available in the reference picture lists for
use by the inter-picture prediction processing
 The MVC design does not allow the prediction of a picture in one view at a given time using a
picture from another view at a different time
 Inter-view prediction may be used for encoding the non-base views of an IDR
picture
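The constraint on inter-view prediction can be sketched as a filter over the decoded picture buffer: temporal references come from the same view, and inter-view references must come from the same time instant. The `Picture` type, the `allowed_references` function and the way view dependencies are passed in are assumptions for this illustration, not the standard's reference list construction process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    view_id: int
    time: int  # time instant (access unit index) of the picture


def allowed_references(current, dpb, inter_view_refs):
    """Candidate reference pictures for `current`.

    dpb: decoded pictures currently available.
    inter_view_refs: view ids that `current`'s view may predict from.
    """
    # Temporal references: same view, different time instant.
    temporal = [p for p in dpb
                if p.view_id == current.view_id and p.time != current.time]
    # Inter-view references: other views, but only at the same time instant.
    inter_view = [p for p in dpb
                  if p.time == current.time and p.view_id in inter_view_refs]
    return temporal + inter_view


if __name__ == "__main__":
    dpb = [Picture(0, 0), Picture(0, 1), Picture(1, 0)]
    print(allowed_references(Picture(1, 1), dpb, {0}))
```

In the example, the view 0 picture at time 0 is excluded for the view 1 picture at time 1, because it belongs to another view at a different time.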
 Additional picture type: anchor picture
 Similar to IDR pictures in that they do not use temporal prediction, but they do allow inter-view prediction
 It is prohibited for any picture that follows an anchor picture to use any picture that precedes
the anchor picture as a reference for inter-picture prediction
 Provides a clean random access point for access to a given view
 High-Level Syntax
 Three important pieces of information are carried in the SPS extension
 View identification
Total number of views
A listing of view identifiers (e.g., 0-2-1)
 View dependency information
The number of inter-view reference pictures for list0/list1
The views that may be used for predicting a particular view (e.g., view 1 uses view 0 and view 2 as
reference views)
Signaled separately for anchor and non-anchor pictures
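The view dependency information can be used to work out which views must be decoded to output a given target view. The sketch below follows the dependencies transitively and returns the required views in the signaled view order. The dictionary representation and the function name are assumptions for illustration, and the dependency of view 2 on view 0 in the example is hypothetical (the text above only states that view 1 uses views 0 and 2).

```python
def views_required(target_view, dependencies, view_order):
    """Views needed to decode `target_view`, in signaled view order.

    dependencies: maps a view id to the view ids it may predict from.
    view_order: listing of view identifiers as signaled, e.g. [0, 2, 1].
    """
    needed = set()
    stack = [target_view]
    while stack:
        view = stack.pop()
        if view in needed:
            continue
        needed.add(view)
        # Follow inter-view references transitively.
        stack.extend(dependencies.get(view, ()))
    return [v for v in view_order if v in needed]


if __name__ == "__main__":
    deps = {0: [], 2: [0], 1: [0, 2]}  # view 2 -> view 0 is hypothetical
    print(views_required(1, deps, [0, 2, 1]))
    print(views_required(2, deps, [0, 2, 1]))
```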
 Level index for operation point
Indicator of the resource requirements for a decoder that conforms to a particular level
An operating point is a specific temporal subset and a set of views, including those intended for output and the views
that they depend on
Multiple level values could be signaled as part of the SPS extension, with each level being
associated with a particular operating point
The syntax indicates the number of views that are targeted for output as well as the number of
views that would be required for decoding particular operating points
 Profiles and Levels
 Profiles
 Determine the subset of coding tools that must be supported by conforming decoders
 Based on the high profile of H.264/MPEG-4 AVC
 Multiview high profile
 Supports multiple views
 Does not support interlace coding tools
 Stereo high profile
 Two views
 Supports interlace coding tools
 Levels
 Constraints on the bitstreams produced by MVC encoders, to establish bounds on the
necessary decoder resources and complexity
 Limit on the amount of frame memory required for the decoding of a bitstream
 The maximum throughput in terms of macroblocks per second
 Maximum picture size
 Overall bitrate
 Coding Performance
 20%-30% bitrate saving compared to independent compression of each view
 2-3 dB gains
 SEI Message for Multiview Video
 Parallel decoding information SEI message
 MVC scalable nesting SEI message
 Indicates the scope of views or temporal levels to which the message applies
 Reuses the syntax of H.264/AVC SEI messages for a specific set of views and temporal levels
 View scalability information SEI message
 The mapping between each operation point and the required NAL units
 Signals the profile and level for each operation point, which is identified by the view_id
 Multiview scene information SEI message
 Multiview acquisition information SEI message
 Signals camera parameters, which are helpful in view interpolation by a renderer
 Nonrequired view component SEI message
 Indicates that a particular view component is not needed for decoding
Operation point : a subset bitstream that is identified by the combination of required view_id
and temporal_id values.
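Since an operation point is identified by the required view_id values and a temporal_id, extracting it can be sketched as a simple filter over NAL units. The `NalUnit` record and `extract_operation_point` function are hypothetical names for this example; a real extractor would also have to handle parameter sets, SEI messages and prefix NAL units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    view_id: int
    temporal_id: int
    payload: bytes = b""


def extract_operation_point(nal_units, required_views, max_temporal_id):
    """Keep only NAL units that belong to the requested operation point."""
    return [n for n in nal_units
            if n.view_id in required_views and n.temporal_id <= max_temporal_id]


if __name__ == "__main__":
    stream = [NalUnit(v, t) for t in (0, 1, 2) for v in (0, 2, 1)]
    subset = extract_operation_point(stream, {0, 2}, max_temporal_id=1)
    print([(n.view_id, n.temporal_id) for n in subset])
```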
 Parallel coding of multiple views
 In 3-D broadcasting use cases, the display needs to output many views simultaneously to
support head-motion parallax
 The parallel decoding information SEI message indicates that the video is encoded in such a
way that a macroblock in a view 1 picture uses only reconstructed values of
macroblocks that belong to certain rows of the view 0 picture
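The benefit of restricting inter-view references to certain macroblock rows can be illustrated with a toy schedule: if row r of view 1 only needs rows up to r + row_delay of view 0, decoding of view 1 can start before view 0 is complete. The `row_delay` parameter, the unit row time and the function name are assumptions for this sketch; it models two decoders running in parallel, not the SEI message syntax.

```python
def view1_row_start_times(num_rows, row_delay, row_time=1):
    """Earliest start time of each macroblock row of view 1.

    Assumes view 0 decodes one row per `row_time` starting at time 0,
    and that row r of view 1 needs view 0 rows 0 .. r + row_delay.
    """
    view0_done = [(r + 1) * row_time for r in range(num_rows)]
    starts = []
    previous_end = 0
    for r in range(num_rows):
        needed_row = min(r + row_delay, num_rows - 1)
        # Wait for both the previous view 1 row and the needed view 0 row.
        start = max(previous_end, view0_done[needed_row])
        starts.append(start)
        previous_end = start + row_time
    return starts


if __name__ == "__main__":
    print(view1_row_start_times(4, row_delay=1))
    print(view1_row_start_times(4, row_delay=3))
```

With a delay of one row, view 1 starts after two rows of view 0; with a delay equal to the full picture height, it degenerates to decoding the views one after the other.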
 View dependency change SEI message
 Signal changes in the view dependency structure
 Operation point not present SEI message
 Indicates operation points that are not present in the bitstream
 Useful in streaming and networking scenarios
 Base view temporal HRD SEI message
 Associated with an IDR access unit
 Signal information relevant to the hypothetical reference decoder (HRD) parameters
associated with the base view
HRD: a virtual buffering algorithm that can be used to test the behavior of the coded
bitstream and its effect on a real decoder
Frame-Compatible Stereo Encoding
Formats
 Frame-compatible formats refer to a class of stereo video formats in which
the two stereo views are essentially multiplexed into a single coded frame
or sequence of frames
 Other common names include stereo interleaving or spatial/temporal
multiplexing formats
 Basic Principles
 With a frame-compatible format, the left and right views are packed together in
the samples of a single video frame
 Half of the coded samples represent the left view and other half represent the
right view
 Each coded view has half the resolution of the full coded frame
 Temporal multiplexing
 The left and right views are interleaved as alternating frames or fields of a coded
sequence
 Frame-compatible formats have received considerable attention from the
broadcast industry since the coded video can be processed by encoders and
decoders that were not specially designed to handle stereo video
 Only the final display stage requires some customization for recognizing and
properly rendering the video to enable a 3-D viewing experience
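A minimal sketch of spatial multiplexing, using the side-by-side arrangement as one example: each view is reduced to half its horizontal resolution and the two halves are placed in a single frame of the original size. Frames are plain lists of rows here, and the simple column dropping is an assumption for brevity; a practical system would low-pass filter before subsampling and interpolate at the display.

```python
def pack_side_by_side(left, right):
    """Pack two equally sized views into one frame (left half, right half)."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("views must have identical dimensions")
    # Keep every second column of each view, then concatenate the rows.
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def unpack_side_by_side(frame):
    """Split a packed frame back into its half-resolution views."""
    half = len(frame[0]) // 2
    left = [row[:half] for row in frame]
    right = [row[half:] for row in frame]
    return left, right


if __name__ == "__main__":
    left = [[1, 2, 3, 4], [5, 6, 7, 8]]
    right = [[10, 20, 30, 40], [50, 60, 70, 80]]
    packed = pack_side_by_side(left, right)
    print(packed)
    print(unpack_side_by_side(packed))
```

The packed frame has the same dimensions as one original view, which is why it passes through existing 2-D encoders and decoders unchanged.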
 The drawback of representing the stereo signal in this way is that spatial or
temporal resolution would be only half of that used for 2-D video with the same
encoded resolution
 A key additional issue with frame-compatible formats is distinguishing the left and
right views
 Signaling
 The signaling for a complete set of frame-compatible formats has been
standardized within the H.264/MPEG-4 AVC standard as SEI messages
 The frame packing arrangement (FPA) SEI message was specified in an amendment of
the H.264/MPEG-4 AVC standard
Conclusion and Future Work
 Three-dimensional video has drawn significant attention recently. The efficient
representation and compression of stereo and multiview video is a central
component of any 3-D or multiview system
 This paper reviewed the recent extensions of the H.264/MPEG-4 AVC standard that
support 3-D stereo and multiview video
 The MVC standard supports stereo and multiview video by enabling inter-view
prediction as well as temporal inter-picture prediction
 Another important development has been the efficient representation, coding, and
signaling of frame-compatible stereo video formats
 As the market evolves and new types of displays and services are offered, additional
new technologies and standards will need to be introduced
 The large number of views required by autostereoscopic displays will need to be generated
 Solutions that consider the inclusion of depth map information for this purpose are a significant area of
focus for future designs