Project Proposal on Topic: Scalable Extension of HEVC
UNDER THE GUIDANCE OF DR. K. R. RAO
COURSE: EE5359 MULTIMEDIA PROCESSING, SPRING 2015
Submitted By: Aanal Desai
UT ARLINGTON ID: 1001103728
EMAIL ID: aanal.desai@mavs.uta.edu
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS, ARLINGTON

Overview
• An increasing demand for video streaming to mobile devices such as smartphones, tablet computers, and notebooks, together with their broad variety of screen sizes and computing capabilities, stimulates the need for a scalable extension.
• Modern video transmission systems using the Internet and mobile networks are typically characterized by a wide range of connection qualities, which result from the adaptive resource-sharing mechanisms they use. In such diverse environments, with varying connection qualities and different receiving devices, flexible adaptation of once-encoded content is necessary [2].
• The objective of a scalable extension of a video coding standard is to allow the creation of a video bitstream that contains one or more sub-bitstreams, each of which can be decoded by itself with a complexity and reconstruction quality comparable to those achieved using single-layer coding with the same quantity of data as in the sub-bitstream [2].

Introduction
• SHVC provides approximately a 50% bandwidth reduction for the same video quality compared with the current H.264/AVC standard. SHVC further offers a scalable format that can be readily adapted to meet network conditions or terminal capabilities. Both bandwidth saving and scalability are highly desirable characteristics of adaptive video streaming applications in bandwidth-constrained wireless networks [3].
• The scalable extension of the H.264/AVC [4] video coding standard (H.264/SVC) [8] provided the means to readily adapt an encoded video stream to the receiving terminal's resource constraints or the prevailing network conditions.
• The JCT-VC is now developing the scalable extension (SHVC) [5] of HEVC in order to bring benefits similar to those of H.264/SVC in terms of matching terminal constraints and network resources, but with a significantly reduced bandwidth requirement [3].

Types of Scalabilities
• Temporal, spatial, and SNR scalabilities.
• Spatial scalability and temporal scalability describe cases in which a sub-bitstream represents the source content with a reduced picture size (spatial resolution) or a reduced frame rate (temporal resolution), respectively [1].
• With quality scalability, also referred to as signal-to-noise ratio (SNR) scalability or fidelity scalability, the sub-bitstream delivers the same spatial and temporal resolution as the complete bitstream, but with a lower reproduction quality and, thus, a lower bit rate [2].

High-Level Block Diagram of the Proposed Encoder
Fig. 1 [1]

Inter-layer prediction
• Inter-layer intra prediction: a block of the enhancement layer is predicted using the reconstructed (and up-sampled) base layer signal [2] (a sketch of this step follows this list).
• Inter-layer motion prediction: the motion data of a block are completely inferred from the (scaled) motion data of the co-located base layer blocks, or the (scaled) motion data of the base layer are used as an additional predictor for coding the enhancement layer motion [2].
• Inter-layer residual prediction: the reconstructed (and up-sampled) residual signal of the co-located base layer area is used for predicting the residual signal of an inter-picture coded block in the enhancement layer, while motion compensation is applied using enhancement layer reference pictures [2].
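To make the inter-layer intra prediction step concrete, the following is a minimal sketch in Python/NumPy (not the SHM reference software): the enhancement layer prediction for a block is formed by up-sampling the co-located base layer reconstruction. It assumes dyadic (2x) spatial scalability and substitutes simple bilinear interpolation for the normative SHVC re-sampling filters; the block coordinates and sizes are illustrative.

```python
import numpy as np

def upsample_2x_bilinear(block):
    """Up-sample a 2-D block by a factor of 2 in each dimension using
    bilinear interpolation (a stand-in for the normative SHVC filters)."""
    h, w = block.shape
    out = np.zeros((2 * h, 2 * w), dtype=np.float64)
    for y in range(2 * h):
        for x in range(2 * w):
            # Map the EL sample position back to BL coordinates.
            by = min(max((y + 0.5) / 2 - 0.5, 0), h - 1)
            bx = min(max((x + 0.5) / 2 - 0.5, 0), w - 1)
            y0, x0 = int(by), int(bx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = by - y0, bx - x0
            top = (1 - fx) * block[y0, x0] + fx * block[y0, x1]
            bot = (1 - fx) * block[y1, x0] + fx * block[y1, x1]
            out[y, x] = (1 - fy) * top + fy * bot
    return out

def inter_layer_intra_prediction(bl_recon, el_x, el_y, size):
    """Predict a size x size EL block at (el_x, el_y) by up-sampling the
    co-located area of the reconstructed BL picture (2x scalability)."""
    bl_block = bl_recon[el_y // 2:(el_y + size) // 2,
                        el_x // 2:(el_x + size) // 2]
    return np.clip(np.rint(upsample_2x_bilinear(bl_block)), 0, 255)

# Illustrative usage: a 16x16 EL block predicted from an 8x8 BL area.
bl_recon = np.random.randint(0, 256, (144, 176)).astype(np.float64)  # QCIF BL
pred = inter_layer_intra_prediction(bl_recon, el_x=32, el_y=16, size=16)
print(pred.shape)  # (16, 16)
```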
Intra-BL prediction
• To utilize reconstructed base layer information, two coding unit (CU) level modes, namely intra-BL and intra-BL skip, are introduced [1].
• Intra-BL prediction is the first scalable coding tool in which the enhancement layer prediction signal is formed by copying or up-sampling the reconstructed samples of the co-located area in the base layer [2].
• For an enhancement layer CU, the prediction signal is formed by copying or, for spatial scalable coding, up-sampling the co-located base layer reconstructed samples. Since the final reconstructed samples from the base layer are used, a multi-loop decoding architecture is essential [2].
• When a CU in the EL picture is coded using the intra-BL mode, the pixels in the collocated block of the up-sampled BL are used as the prediction for the current CU [1].
Fig. 2: Intra-BL mode [2]

Intra residual prediction
• In the intra residual prediction mode, the difference between the intra prediction reference samples in the EL and the collocated pixels in the up-sampled BL is used to produce a prediction, denoted as the difference prediction, based on the intra prediction mode. The generated difference prediction is then added to the collocated block in the up-sampled BL to form the final prediction [1].
Fig. 3 [1]

Weighted intra prediction
• In this mode, the (up-sampled) base layer reconstructed signal constitutes one component of the prediction. The other component is obtained by regular spatial intra prediction as in HEVC, using the samples from the causal neighborhood of the current enhancement layer block. The base layer component is low-pass filtered, the enhancement layer component is high-pass filtered, and the results are added to form the prediction [2].
• The weights for the base layer signal are set such that the low-frequency components are retained and the high-frequency components are suppressed; the weights for the enhancement layer signal are set the other way around. The weighted base and enhancement layer coefficients are added and an inverse DCT is computed to obtain the final prediction [2].
• In the implementation of [2], both the low-pass and the high-pass filtering are carried out in the DCT domain, as illustrated in Fig. 4: first, the DCTs of the base and enhancement layer prediction signals are computed, and the resulting coefficients are weighted according to spatial frequency [2].
Fig. 4: Weighted intra prediction mode [2]

Intra difference prediction
• In intra difference prediction, the (up-sampled) base layer reconstructed signal constitutes one component of the prediction. The intra prediction modes that are used for spatial intra prediction of the difference signal are coded using the regular HEVC syntax [2].
Fig. 5: Intra difference prediction mode [2]

Motion vector prediction
• The scalable video extension of HEVC in [2] employs several methods to improve the coding of enhancement layer motion information by exploiting the availability of base layer motion information [2].
• In the proposed scheme, collocated base layer MVs are used in both the merge mode and the AMVP mode for enhancement layer coding. The base layer MV is inserted as the first candidate in the merge candidate list and added after the temporal candidate in the AMVP candidate list. The MV at the center position of the collocated block in the base layer picture is used in both merge and AMVP modes [1] (see the sketch after this section).
• In HEVC, the motion vectors are compressed after being coded, and the compressed motion vectors are used in the TMVP derivation for pictures that are coded later [1].
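To illustrate how a collocated base layer MV can be folded into the enhancement layer candidate lists, the sketch below (Python, illustrative only, not the SHM reference software) fetches the BL MV at the position collocated with the center of the EL block, scales it by the spatial resolution ratio, and places it first in the merge list and after the temporal candidate in the AMVP list. The data structures, the 16x16 MV storage grid, and the helper names are assumptions made for this example; duplicate pruning and truncation to the normative list sizes are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class MV:
    x: int  # horizontal component, quarter-pel units
    y: int  # vertical component, quarter-pel units

def scaled_base_layer_mv(bl_mv_field: Dict[Tuple[int, int], MV],
                         el_x: int, el_y: int, el_w: int, el_h: int,
                         ratio_num: int = 2, ratio_den: int = 1) -> Optional[MV]:
    """Fetch the BL MV collocated with the center of the EL block and
    scale it by the spatial resolution ratio (dyadic 2x assumed)."""
    cx = (el_x + el_w // 2) * ratio_den // ratio_num
    cy = (el_y + el_h // 2) * ratio_den // ratio_num
    bl_mv = bl_mv_field.get((cx // 16, cy // 16))  # MVs stored on a 16x16 grid
    if bl_mv is None:          # collocated BL block is intra coded
        return None
    return MV(bl_mv.x * ratio_num // ratio_den,
              bl_mv.y * ratio_num // ratio_den)

def build_candidate_lists(bl_cand: Optional[MV], spatial: List[MV],
                          temporal: Optional[MV]):
    """BL MV goes first in the merge list and after the temporal
    candidate in the AMVP list, per the scheme described above."""
    merge = ([bl_cand] if bl_cand else []) + spatial + \
            ([temporal] if temporal else [])
    amvp = spatial + ([temporal] if temporal else []) + \
           ([bl_cand] if bl_cand else [])
    return merge, amvp

# Illustrative usage: BL MVs on a 16x16 grid, one 32x32 EL block at (64, 32).
bl_mv_field = {(2, 1): MV(4, -8)}  # covers the collocated center position
bl_cand = scaled_base_layer_mv(bl_mv_field, 64, 32, 32, 32)
merge, amvp = build_candidate_lists(bl_cand, [MV(6, -10), MV(2, 0)], MV(5, -7))
print(merge[0], amvp)
```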
References
[1] J. Chen, K. Rapaka, X. Li, V. Seregin, L. Guo, M. Karczewicz, G. Van der Auwera, J. Sole, X. Wang, C. Tu, Y. Chen, and R. Joshi, "Scalable video coding extension for HEVC", Proc. Data Compression Conference (DCC), 20-22 Mar. 2013.
[2] P. Helle, H. Lakshman, M. Siekmann, J. Stegemann, T. Hinz, H. Schwarz, D. Marpe, and T. Wiegand, "Scalable video coding extension of HEVC", Proc. Data Compression Conference (DCC), 20-22 Mar. 2013.
[3] J. Nightingale, Q. Wang, and C. Grecos, "Scalable HEVC (SHVC)-based video stream adaptation in wireless networks", Proc. 2013 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC): Services, Applications and Business Track, 2013.
[4] T. Wiegand et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, July 2003.
[5] T. Hinz et al, "An HEVC extension for spatial and quality scalable video coding", Proc. SPIE Visual Information Processing and Communication IV, vol. 8666, pp. 866605-1 to 866605-16, Feb. 2013.
[6] B. Oztas et al, "A study on the HEVC performance over lossy networks", Proc. 19th IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 785-788, Dec. 2012.
[7] J. Nightingale et al, "HEVStream: a framework for streaming and evaluation of high efficiency video coding (HEVC) content in loss-prone networks", IEEE Trans. Consum. Electron., vol. 58, no. 2, pp. 404-412, May 2012.
[8] H. Schwarz et al, "Overview of the scalable video coding extension of the H.264/AVC standard", IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, Sept. 2007.
[9] J. Nightingale et al, "Priority-based methods for reducing the impact of packet loss on HEVC encoded video streams", Proc. SPIE Real-Time Image and Video Processing 2013, Feb. 2013.
[10] T. Schierl et al, "Mobile video transmission using scalable video coding", IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1204-1217, Sept. 2007.
[11] J. Chen, K. Rapaka, X. Li, V. Seregin, L. Guo, M. Karczewicz, G. Van der Auwera, J. Sole, X. Wang, C. J. Tu, and Y. Chen, "Description of scalable video coding technology proposal by Qualcomm (configuration 2)", Joint Collaborative Team on Video Coding (JCT-VC), doc. JCTVC-K0036, Shanghai, China, Oct. 2012.
[12] ISO/IEC JTC1/SC29/WG11 and ITU-T SG 16, "Joint call for proposals on scalable video coding extensions of High Efficiency Video Coding (HEVC)", ISO/IEC JTC 1/SC 29/WG 11 (MPEG) doc. N12957 / ITU-T SG 16 doc. VCEG-AS90, Stockholm, Sweden, July 2012.
[13] A. Segall, "BoG report on HEVC scalable extensions", Joint Collaborative Team on Video Coding (JCT-VC), doc. JCTVC-K0354, Shanghai, China, Oct. 2012.
[14] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard", IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, Sept. 2007.
[15] D. Hong, W. Jang, J. Boyce, and A. Abbas, "Scalability support in HEVC", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, doc. JCTVC-F290, Torino, Italy, July 2011.
[16] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard", IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[17] J. Boyce, D. Hong, W. Jang, and A. Abbas, "Information for HEVC scalability extension", Joint Collaborative Team on Video Coding (JCT-VC), doc. JCTVC-G078, Nov. 2011.
[18] G. J. Sullivan et al, "Standardized extensions of High Efficiency Video Coding (HEVC)", IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1001-1016, Dec. 2013.
H.265/HEVC tutorial by M. Budagavi (m.budagavi@samsung.com): http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/budagaviiscas2014ppt.pdf