InterimPPT - The University of Texas at Arlington

A project proposal on

Residual DPCM for improving Inter Prediction in HEVC for Lossless Screen Content Coding

Under the guidance of Dr. K. R. Rao

For the fulfillment of the course Multimedia Processing (EE5359), Spring 2015

Submitted by

Siddu Basawaraj Pratapur

UTA ID: 1001053422

Table of Contents

• Objective of the Project
• Basic Concepts of Video Coding
  • Color Spaces
• H.265 / High Efficiency Video Coding
  • Introduction
  • Encoder and Decoder
• Introduction to Screen Content Coding
• Residual DPCM in HEVC Inter-prediction
  • General considerations and the HEVC coding structure
  • General method for inter RDPCM
  • Additional tools for inter RDPCM

Progress..

• Test Configurations
  • Intra-only configuration
  • Low-delay configurations
  • Random-access configuration
• Comparison Metrics
  • Peak Signal to Noise Ratio
  • Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR)
  • Implementation Complexity
• Test Sequences
• Implementation
  • Configuration profiles used for comparison
  • Parameters modified
  • Sample command line parameters for HM-16.4+SCM-4.0RC1
• Graphs for test sequence parameters
• References

List of Acronyms and Abbreviations

• AVC: Advanced Video Coding.
• CABAC: Context Adaptive Binary Arithmetic Coding.
• CTB: Coding Tree Block.
• CTU: Coding Tree Unit.
• CU: Coding Unit.
• CB: Coding Block.
• DCT: Discrete Cosine Transform.
• DBF: De-blocking Filter.
• HEVC: High Efficiency Video Coding.
• HM: HEVC Test Model.
• HP: Hierarchical Prediction.
• JCT: Joint Collaborative Team.
• JCT-VC: Joint Collaborative Team on Video Coding.
• JM: Joint Model (H.264 reference software).
• JPEG: Joint Photographic Experts Group.
• MV: Motion Vector.
• MC: Motion Compensation.
• ME: Motion Estimation.
• MPEG: Moving Picture Experts Group.
• PC: Prediction Chunking.
• PU: Prediction Unit.
• PB: Prediction Block.
• QP: Quantization Parameter.
• RDPCM: Residual Differential Pulse Code Modulation.
• SAO: Sample Adaptive Offset.
• TB: Transform Block.
• TU: Transform Unit.
• VCEG: Video Coding Experts Group.

Objective of the Project

• We propose the mathematical implementation of Inter Residual Differential Pulse Code Modulation (inter RDPCM) applied to motion compensated residuals in lossless screen content coding (SCC) scenarios.

• Two additional tools are proposed for inter RDPCM: Prediction Chunking (PC)[1] and Hierarchical Prediction (HP)[1].

• The simulation will be conducted using HM-16.4+SCM-4.0rc1 [2], with different video sequences [3], search ranges, block sizes and numbers of frames, using GPU multi-core computing.

Basic Concepts of Video Coding

Color Spaces:

• RGB color space – Each pixel is represented by three numbers indicating the relative proportions of red, green and blue.

• YCrCb color space – Y is the luminance component, a monochrome version of the color image. Y is a weighted average of R, G and B:

$$Y = k_r R + k_g G + k_b B$$

where $k_r$, $k_g$ and $k_b$ are the weighting factors.
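As a small worked example of the weighted average, the sketch below computes Y for one pixel. The slides do not give the weighting factors, so the ITU-R BT.601 values (0.299, 0.587, 0.114) are assumed here purely for illustration, and the function name `luma` is hypothetical.

```python
# Assumed weighting factors (ITU-R BT.601); the slides do not specify them.
K_R, K_G, K_B = 0.299, 0.587, 0.114


def luma(r, g, b):
    """Return Y = k_r*R + k_g*G + k_b*B for a single RGB pixel."""
    return K_R * r + K_G * g + K_B * b
```

Because the three assumed weights sum to one, a neutral gray pixel (R = G = B) keeps its value in Y.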

The popular patterns of sub-sampling [4] are:

• 4:4:4 – The three components Y, Cr and Cb have the same resolution; that is, for every 4 luminance samples there are 4 Cr and 4 Cb samples.

• 4:2:2 – For every 4 luminance samples in the horizontal direction, there are 2 Cr and 2 Cb samples. This representation is used for high quality video color reproduction.

• 4:2:0 – Cr and Cb each have half the horizontal and half the vertical resolution of Y. This is popularly used in applications such as video conferencing, digital television and DVD storage.
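The effect of each pattern on the amount of raw data can be sketched as follows. The function names are hypothetical, and the sketch assumes even frame dimensions and planar storage with one byte per sample.

```python
def chroma_plane_size(width, height, fmt):
    """Return (width, height) of each chroma plane for a sub-sampling pattern."""
    if fmt == "4:4:4":
        return width, height            # full chroma resolution
    if fmt == "4:2:2":
        return width // 2, height       # half horizontal resolution
    if fmt == "4:2:0":
        return width // 2, height // 2  # half horizontal and vertical
    raise ValueError("unsupported pattern: " + fmt)


def frame_bytes(width, height, fmt, bytes_per_sample=1):
    """Raw size of one frame: one luma plane plus two chroma planes."""
    cw, ch = chroma_plane_size(width, height, fmt)
    return (width * height + 2 * cw * ch) * bytes_per_sample
```

For a CIF frame (352x288), 4:2:0 needs half as many samples as 4:4:4, which is why it is common in bandwidth-constrained applications.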

Figure 1: 4:2:0 sub-sampling pattern [4]

Figure 2: 4:2:2 and 4:4:4 sub-sampling patterns [4]

H.265 / High Efficiency Video Coding

• High Efficiency Video Coding (HEVC) [5] is an international standard for video compression developed by a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T (International Telecommunication Union) VCEG (Video Coding Experts Group).

• The main goal of the HEVC standard is to significantly improve compression performance compared to existing standards (such as H.264/Advanced Video Coding [6]), in the range of 50% bit-rate reduction at similar visual quality [7].

• The macroblocks and blocks in H.264 are replaced by CTU, CU, TU and PU in H.265/HEVC.

Figure 3: Block Diagram of HEVC CODEC [10]

HEVC Encoder and Decoder

Figure 4: Block Diagram of the HEVC Encoder [6]

Figure 5: Block Diagram of the HEVC Decoder [11]

• Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence is coded using only intra-picture prediction.

• The encoder and decoder generate identical inter-picture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information.

• The residual signal, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information.

• The encoder duplicates the decoder processing loop so that both generate identical predictions for subsequent data. The quantized transform coefficients are therefore reconstructed by inverse scaling and then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result may be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization.

• The encoder's copy of the decoder output is stored in a decoded picture buffer to be used for the prediction of subsequent pictures.

Coding tree units and coding tree block (CTB) structure:

Figure 6: 64×64 CTBs split into CBs [13]

Coding units (CUs) and coding blocks (CBs):

One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU), as shown in Figure 7. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).

Figure 7: CUs split into CBs [13]
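The splitting of a CTB into CUs can be sketched as a quadtree recursion. This is only an illustration: `should_split` is a hypothetical callback standing in for the encoder's mode decision, and the 8×8 minimum CU size is an assumed default rather than a value taken from the slides.

```python
def split_ctb(x, y, size, should_split, min_size=8):
    """Recursively split a square block into CUs.

    Returns a list of (x, y, size) tuples, one per leaf CU.
    should_split(x, y, size) is a hypothetical decision callback.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four quadrants row by row (top-left first).
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctb(x + dx, y + dy, half, should_split, min_size))
        return cus
    return [(x, y, size)]
```

For example, a decision function that splits only the 64×64 root yields four 32×32 CUs, while one that never splits leaves the CTB as a single CU.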

Prediction units (PUs) and prediction blocks (PBs):

Figure 8: Partitioning of Prediction Blocks from Coding Blocks [13]

Transform units (TUs) and transform blocks (TBs):

Figure 9: Partitioning of Transform Blocks from Coding Blocks [13]

Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes.

Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements to improve its throughput speed (especially for parallel-processing architectures).

In-loop de-blocking filtering: A de-blocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the inter-picture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes, and is made more friendly to parallel processing.

Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the inter-picture prediction loop after the de-blocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.

Introduction to Screen Content Coding

• Screen content refers to images and videos which contain computer generated objects or screen shots from computer applications.

• This kind of content requires efficient compression solutions as its use is becoming more popular in emerging technologies such as desktop sharing, video walls in control rooms, wireless display and digital remote operating rooms for surgeries [15], [16].

• Screen content differs significantly from the camera captured content due to the presence of high frequency features such as sharp edges and high contrast areas.

• The presence of these features reduces the coding efficiency of classical hybrid block-based image and video codecs which use spatial transforms to compact the energy of signals into a few lower frequency coefficients.

Residual DPCM in HEVC Inter-prediction

General considerations and the HEVC coding structure

• The HEVC standard performs inter-prediction by means of block-based motion compensation which assumes that all the pixels inside a block move approximately with the same motion. This assumption leads to poor prediction performance along sharp edges.

• In screen content it is reasonable to expect that inter-prediction residuals still present some correlation along image edges, which can be exploited by performing a spatial DPCM along the edge direction. This intuition is the basis for the proposed inter RDPCM.

• Several directions may be considered; however, to limit the computational complexity, only horizontal and vertical ones are included since they are predominant in screen content.

General method for inter RDPCM

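Let r(i, j) be the elements of an M×N residual block of inter-predicted luma or chroma samples, where M and N are the block height and width, respectively. In the vertical inter RDPCM mode, each residual sample is predicted from the sample directly above it in the same column; the samples in the first row are left unpredicted.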

Figure 10: Hierarchical prediction (green line) - an additional step is applied on the top row after vertical RDPCM (red lines)[1].

The horizontal inter RDPCM mode is defined in a similar way.
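A minimal sketch of the forward step is given here, assuming the usual DPCM form in which the coded value is $\tilde{r}(i,j) = r(i,j) - r(i-1,j)$ for $i > 0$ and $\tilde{r}(0,j) = r(0,j)$ in the vertical mode, with the roles of rows and columns swapped in the horizontal mode. The symbol $\tilde{r}$ and the function name are illustrative and not taken from [1].

```python
def rdpcm_forward(block, vertical=True):
    """Apply forward RDPCM to a residual block given as a list of rows.

    Vertical mode: each sample is replaced by its difference from the
    sample above it; the first row is copied unchanged.
    Horizontal mode: difference from the sample to the left; the first
    column is copied unchanged.
    """
    rows, cols = len(block), len(block[0])
    out = [row[:] for row in block]
    for i in range(rows):
        for j in range(cols):
            if vertical and i > 0:
                out[i][j] = block[i][j] - block[i - 1][j]
            elif not vertical and j > 0:
                out[i][j] = block[i][j] - block[i][j - 1]
    return out
```

On a block containing a vertical edge, such as three identical rows `[5, 5, 0]`, the vertical mode leaves only the first row non-zero, which is the kind of residual correlation the method is meant to exploit.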

• At the decoder side, when vertical RDPCM is selected, the residuals r(i, j) to be added to the motion compensated prediction are obtained by summing the decoded RDPCM values down the current column, from the first row to row i. For horizontal RDPCM, the summation is performed across the current row.
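The decoder-side summation can be sketched as a running accumulation, assuming the reconstruction $r(i,j) = \sum_{k=0}^{i} \tilde{r}(k,j)$ for the vertical mode, where $\tilde{r}$ denotes the decoded RDPCM values. The notation and the function name are illustrative assumptions.

```python
def rdpcm_inverse(coded, vertical=True):
    """Reconstruct residuals from RDPCM values given as a list of rows.

    Each output sample is the running sum of the coded values along its
    column (vertical mode) or along its row (horizontal mode).
    """
    rows, cols = len(coded), len(coded[0])
    out = [row[:] for row in coded]
    for i in range(rows):
        for j in range(cols):
            if vertical and i > 0:
                out[i][j] = out[i - 1][j] + coded[i][j]
            elif not vertical and j > 0:
                out[i][j] = out[i][j - 1] + coded[i][j]
    return out
```

Since only integer additions are involved, the reconstruction is exact, which is what a lossless coding scenario requires.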

Additional tools for inter RDPCM:

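• Two observations motivate the additional tools: the decoder-side summation becomes longer as the sample position moves away from the block boundary, and the first row (or first column) of the block is left unpredicted. They lead to the two proposed tools, prediction chunking (PC) and hierarchical prediction (HP).

• The prediction chunking tool limits the residual DPCM prediction to groups of samples of a specified length L, called the chunking length. The RDPCM process is thus reset every L samples, which reduces the number of operations per sample at the decoder side. For vertical RDPCM with PC, a sample whose row index is a multiple of L is left unpredicted, and every other sample is predicted from the sample above it.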

• At the decoder, the residuals are reconstructed by summing the RDPCM values within the current chunk only. The start of the chunk is found with the floor operator ⌊·⌋, which returns the largest integer smaller than or equal to its argument. Equivalent expressions for forward and inverse inter RDPCM can be easily derived for the horizontal mode when using PC.
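A minimal sketch of vertical RDPCM with prediction chunking follows. It assumes the prediction is reset on rows where $i \bmod L = 0$ and that the decoder computes $r(i,j) = \sum_{k=L\lfloor i/L \rfloor}^{i} \tilde{r}(k,j)$; these expressions are inferred from the description above, and the function names are hypothetical.

```python
def pc_forward_vertical(block, chunk_len):
    """Vertical RDPCM that is reset every chunk_len rows."""
    out = [row[:] for row in block]
    for i in range(len(block)):
        if i % chunk_len != 0:  # rows starting a chunk stay unpredicted
            for j in range(len(block[0])):
                out[i][j] = block[i][j] - block[i - 1][j]
    return out


def pc_inverse_vertical(coded, chunk_len):
    """Reconstruct residuals by summing only within the current chunk."""
    rows, cols = len(coded), len(coded[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        start = chunk_len * (i // chunk_len)  # L * floor(i / L)
        for j in range(cols):
            out[i][j] = sum(coded[k][j] for k in range(start, i + 1))
    return out
```

With this form each reconstructed sample needs at most L additions regardless of the block height, which is the complexity reduction the PC tool targets.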

• Once RDPCM is performed on a block, the samples in the first column (for horizontal RDPCM) or in the first row (for vertical RDPCM) are not predicted. It is therefore beneficial to exploit the remaining redundancy by predicting these samples in the direction orthogonal to the main RDPCM direction. The HP tool performs an RDPCM along the first column of samples when horizontal RDPCM is selected as the best mode, or along the first row for vertical RDPCM. For vertical RDPCM, HP thus applies an additional horizontal DPCM step along the first row, as illustrated in Figure 10.
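The vertical case with HP can be sketched as below, assuming the first row is coded as $\tilde{r}(0,j) = r(0,j) - r(0,j-1)$ for $j > 0$ while the remaining rows use plain vertical RDPCM. This form is inferred from the description and the Figure 10 caption; the function names are hypothetical.

```python
def hp_forward_vertical(block):
    """Vertical RDPCM followed by a horizontal DPCM on the first row."""
    rows, cols = len(block), len(block[0])
    out = [row[:] for row in block]
    for i in range(1, rows):      # main vertical RDPCM
        for j in range(cols):
            out[i][j] = block[i][j] - block[i - 1][j]
    for j in range(1, cols):      # hierarchical step on the top row
        out[0][j] = block[0][j] - block[0][j - 1]
    return out


def hp_inverse_vertical(coded):
    """Undo the top-row DPCM first, then accumulate down each column."""
    rows, cols = len(coded), len(coded[0])
    out = [row[:] for row in coded]
    for j in range(1, cols):
        out[0][j] = out[0][j - 1] + coded[0][j]
    for i in range(1, rows):
        for j in range(cols):
            out[i][j] = out[i - 1][j] + coded[i][j]
    return out
```

The decoder has to rebuild the first row before the columns, because every column sum starts from a reconstructed top-row sample.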

Test Configurations

Intra-only configuration

• All the frames are encoded as independent I frames.

• There is no dependency between neighboring frames.

• Only spatial compression is applied.

Figure 19: Graphical presentation of Intra-only configuration [34].

Low-delay configuration

• Only the first frame in the video sequence is encoded as I frame.

• All subsequent pictures are encoded as generalized P and B (GPB) pictures.

• Both spatial and temporal compression occur.

Figure 20: Graphical presentation of Low-delay configuration [34]. Labels: IDR or intra picture at QPI; GPB (generalized P and B) pictures at QPB L1 = QPI+1, QPB L2 = QPI+2 and QPB L3 = QPI+3; horizontal axis: time.

Random-access configuration

• Only the first frame in the video sequence is encoded as I frame.

• All subsequent pictures are encoded as generalized P and B (GPB) pictures.

• The frames may be coded in an order different from the display order, with an open GOP.

Figure 21: Graphical presentation of Random-access configuration [34]. Labels: IDR or intra picture at QPI; GPB (generalized P and B) picture; referenced B pictures; non-referenced B pictures; QPB L1 = QPI+1, QPB L2 = QPI+2, QPB L3 = QPI+3 and QPB L4 = QPI+4; horizontal axis: time.

Comparison Metrics

Peak Signal to Noise Ratio

• Peak signal-to-noise ratio (PSNR) [35] [36] is the ratio between the maximum possible value (power) of a signal and the power of the distorting noise that affects the quality of its representation.

• PSNR is usually expressed on the logarithmic decibel scale.
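As an illustration, PSNR can be computed from the mean squared error (MSE) between the original and decoded samples as $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest sample value for the bit depth. The sketch below assumes flat sequences of sample values and an 8-bit default; the function name is hypothetical.

```python
import math


def psnr(original, decoded, bit_depth=8):
    """Return the PSNR in dB between two equal-length sample sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals, as in lossless coding
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(peak * peak / mse)
```

In a lossless scenario the decoded samples equal the originals, so the MSE is zero and the PSNR is unbounded; comparisons then rest on bit-rate and coding time.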

Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR)

• BD metrics give the average gain in PSNR or the average percent saving in bit-rate between two rate-distortion curves.

• However, BD-PSNR has a critical drawback: it does not take coding complexity into account [37].

Implementation Complexity

• The computational time for the various configuration profiles in the HM-16.4+SCM-4.0rc1 software will be compared; this serves as an indication of implementation complexity.

Test Sequences

Figure 22: Different video resolutions, ranging from mobile devices and tablets to advanced televisions [39].

• The following test sequences [23] of various resolutions are used for study of different configuration profiles of HEVC codecs:

Table 1 : List of test sequences for different resolutions

Implementation

Parameters modified

F - Number of frames to be encoded
Fr - Frame rate
Wdt - Width of the video sequence
Hgt - Height of the video sequence
Profile - encoder_intra_main / encoder_randomaccess_main / encoder_lowdelay_main

Testing Platform

Processor - Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
Number of cores - 4
Memory - 4.00 GB
Operating System - 64 bit Windows(TM) 7 Ultimate

Sample command line parameters for HM-16.4+SCM-4.0RC1

Encoding

C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\bin\vc9\Win32\Debug>TAppEncoder.exe -c C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\cfg\encoder_intra_main.cfg -wdt 352 -hgt 288 -fr 24 -f 90 -i C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\ElephantsDream_CIF_24fps.yuv >>C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Encoded_data\Log_Encoded_IM.txt

Decoding

C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\bin\vc9\Win32\Debug>TAppDecoder.exe -b C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Encoded_data\str_IM.bin -o C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Decoded_data\Elephant_Decoded.yuv >>C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Decoded_data\Elephant_Decoded_IM.txt
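When many sequences and profiles have to be run, the encoder command line can be assembled by a small script. The sketch below uses only the options that appear in the sample command above (-c, -wdt, -hgt, -fr, -f, -i); the helper names, file names and the idea of scripting the runs are assumptions for illustration, not part of the original setup.

```python
import subprocess


def build_encoder_args(cfg, yuv, width, height, fps, frames,
                       exe="TAppEncoder.exe"):
    """Assemble the encoder command line as an argument list."""
    return [exe, "-c", cfg,
            "-wdt", str(width), "-hgt", str(height),
            "-fr", str(fps), "-f", str(frames),
            "-i", yuv]


def run_and_log(args, log_path):
    """Run the command and append its console output to a log file."""
    # Mode "a" mirrors the ">>" redirection in the sample command.
    with open(log_path, "a") as log:
        subprocess.run(args, stdout=log, check=True)
```

A call such as `build_encoder_args("encoder_intra_main.cfg", "ElephantsDream_CIF_24fps.yuv", 352, 288, 24, 90)` reproduces the parameters of the CIF example with relative (hypothetical) paths.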

Bit-rate Comparison

Size of the binary file

%BD Bit-rate

BD-PSNR

Encoding time

Decoding time

Further work: Test sequences at higher mobile resolutions, such as 640x480, remain to be tested. Their results will be tabulated and the corresponding graphs plotted.

REFERENCES

[1] M. Naccari et al, “Improving Inter Prediction in HEVC with Residual DPCM for Lossless Screen Content Coding”, Picture Coding Symposium (PCS), San Jose, CA, pp. 361-364, 8-11 Dec. 2013.

[2] HM-16.4+SCM-4.0rc1 software: https://hevc.hhi.fraunhofer.de/trac/hevc/milestone/HM-16.4+SCM-4.0rc1/

[3] Video test sequences: https://media.xiph.org/video/derf/

[4] I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Wiley, 2002.

[5] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), Mar. 2013, available on http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243

[6] D. Marpe et al, “The H.264/MPEG4 advanced video coding standard and its applications”, IEEE Communications Magazine, Vol. 44, pp. 134-143, Aug. 2006.

[7] G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1649-1668, Dec. 2012.

[8] HEVC white paper-Ateme: http://www.ateme.com/an-introduction-to-uhdtv-and-hevc

[9] G. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal of selected topics in Signal Processing, Vol. 7, No. 6, pp. 1001-1016, Dec. 2013.

[10] HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html

[11] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T / ISO-IEC Document: JCTVC-J0292r1, July 2012.

[12] T. Wiegand et al, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, Jul. 2003.

[13] Uma Sagar Madhugiri Dayananda, “Study and Performance comparison of HEVC and H.264 video codecs”, M.S. Thesis, EE Dept., UTA, Arlington, TX, Dec. 2011.

[14] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), Mar. 2013, available on http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243

[15] T. Vermeir, “Use cases and requirements for lossless and screen content coding”, JCTVC-M0172, 13th JCT-VC meeting, Incheon, KR, Apr. 2013.

[16] J. Sole, R. Joshi and M. Karczewicz, “Requirements for wireless display applications”, JCTVC-M0315, 13th JCT-VC meeting, Incheon, KR, Apr. 2013.

[17] A. Gabriellini, D. Flynn, M. Mrak and T. Davies, “Combined Intra-Prediction for High-Efficiency Video Coding”, IEEE J. of Sel. Topics in Signal Processing, Vol. 5, No. 7, pp. 1282-1289, Nov. 2011.

[18] Software repository for HEVC: http://hevc.hhi.fraunhofer.de/

[19] HM Software Manual: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/

[20] Visual studio: http://www.dreamspark.com

[21] Tortoise SVN: http://tortoisesvn.net/downloads.html

[22] Multimedia processing course website: http://www.uta.edu/faculty/krrao/dip/

[23] K.R. Rao, D.N. Kim and J.J. Hwang, “Video Coding Standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1”, Springer, 2014.

[24] Vivienne Sze, Madhukar Budagavi and Gary J. Sullivan, “High Efficiency Video Coding (HEVC): Algorithms and Architectures”, Springer, 2014.

[25] G. Braeckman, S. M. Satti, H. Chen, P. Schelkens and A. Munteanu, "Lossy-to-lossless screen content coding using an HEVC base-layer," in Proceedings of IEEE International Conference on Signal Processing (DSP), Santorini, Greece, 1-3 July, 2013.

[26] M. Mrak and Ji-Zheng Xu, "Improving screen content coding in HEVC by transform skipping," in Proceedings of 20th European Signal Processing Conference (EUSIPCO), August 2012.

[27] M. Wien, H. Schwarz and T. Oelbaum, “Performance analysis of SVC,” Special issue on Scalable Video Coding (SVC), IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1194-1203, Sep. 2007.

[28] W. Zhu et al, "Screen Content Coding Based on HEVC Framework," IEEE Trans. on Multimedia, vol. 16, no. 5, pp. 1316-1326, Aug. 2014.

[29] I.E.G. Richardson, “Coding Video: A practical guide to HEVC and beyond”, Wiley, 11 May 2015.

[30] Aggelos K. Katsaggelos, “Fundamentals of Image and Video Coding”, Northwestern University, https://www.coursera.org/course/digital

[31] N.N. Mundgemane, A thesis proposal on “Multi-stage prediction scheme for Screen Content based on HEVC” M.S. Thesis, EE Dept., UTA, Arlington, TX, Sep. 2014, available on http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

[32] S. Kodpadi, A thesis proposal on “Fast algorithms for Screen Content Coding in HEVC”, M.S. Thesis, EE Dept., UTA, Arlington, TX, Sep. 2014, available on http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

[33] W. Zhu et al, "Compound image compression by multi-stage prediction," IEEE Trans. on Visual Communications and Image Processing (VCIP), pp. 1-6, 27-30 Nov. 2012.

[34] I.K. Kim et al, “CODING OF MOVING PICTURES AND AUDIO”, ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document: JCTVC-K1002-v1, 11th Meeting: Shanghai, CN, 10-19 Oct. 2012.

[35] White paper on PSNR-NI: http://www.ni.com/white-paper/13306/en/

[36] Website on PSNR: http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

[37] X. Li et al, “Rate-complexity-distortion evaluation for hybrid video coding”, IEEE International Conference on Multimedia and Expo (ICME), pp. 685-690, Jul. 2010.

[38] G. Bjontegaard, “Calculation of Average PSNR Differences between RD Curves”, document VCEG-M33, ITU-T SG 16/Q 6, Austin, TX, Apr. 2001.

[39] Different Video Resolutions: http://www.mediamerge.com/what-your-tech-team-needs-to-know-about-hd-video-projection/

[40] Video test sequences with screen content: http://trace.eas.asu.edu/yuv/
