Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel Programming and Fast Intra and Inter Predictions
BY
VINOOTHNA GAJULA
ID 1000803103
MS in Electrical Engineering
University of Texas at Arlington
Under the Guidance of Dr. K. R. Rao

TABLE OF ACRONYMS
ASO      Arbitrary slice ordering
AVC      Advanced video coding
CABAC    Context adaptive binary arithmetic coding
CAVLC    Context adaptive variable length coding
CBP      Coded block pattern
CIF      Common intermediate format
DCT      Discrete cosine transform
FAT      Fast adaptive termination
FMO      Flexible macroblock ordering
IEC      International Electrotechnical Commission
I-frame  Intra frame
ITU-T    International Telecommunication Union, Telecommunication Standardization Sector
ISO      International Organization for Standardization
JM       Joint model
JVT      Joint Video Team
MB       Macroblock
MPEG     Moving Picture Experts Group
MSE      Mean square error
NAL      Network abstraction layer
PSNR     Peak signal to noise ratio
QCIF     Quarter common intermediate format
QP       Quantization parameter
RDO      Rate distortion optimization
RS       Redundant slices
SATD     Sum of absolute transformed differences
SSD      Sum of squared differences
SSIM     Structural similarity index metric
VCEG     Video Coding Experts Group
VLC      Variable length coding

OBJECTIVE: In this project, the computational complexity and encoding time of the H.264 baseline profile are reduced in two ways: first, by encoding video frames in parallel instead of sequentially [1], [7], and second, by using the fast adaptive termination (FAT) algorithm in intra and inter prediction [2], [15]. In intra prediction, the FAT algorithm uses simple directional masks and the modes of neighboring blocks [2], [8], [15]. In inter prediction, mode decision and motion estimation adapt to the minimum rate-distortion (RD) cost of both skip and non-skip modes: an early skip-mode detection test is proposed for the skip mode, and a three-stage scheme is proposed to speed up the mode decision process for non-skip modes [3], [9], [15], [20].
INTRODUCTION: H.264, also known as MPEG-4 Part 10 / AVC (Advanced Video Coding), was published in 2003 jointly by the international standards bodies ITU-T (International Telecommunication Union) [17] and ISO/IEC (International Organization for Standardization / International Electrotechnical Commission), which developed it together as the Joint Video Team (JVT) [4]. It has many advantages over the previous coding standards MPEG-2 [13] and MPEG-4 [14], such as significantly better rate-distortion efficiency (a higher bit-rate reduction at comparable quality), better error resilience and a more network-friendly design.

H.264 - PROFILES: H.264 has three major profiles, namely baseline, main and extended, and in addition four high profiles, namely High, High 10 [11], High 4:2:2 [11] and High 4:4:4 [5], [11], as shown in figure 1 [5], [11].
-The baseline profile is applicable to real-time conversational services such as video conferencing and video phone [5], [11].
-The main profile is designed for digital storage media and television broadcasting [5], [11].
-The extended profile targets multimedia services over the internet [5], [11].
-The High, High 10, High 4:2:2 and High 4:4:4 profiles [11] belong to the fidelity range extensions and target applications such as content contribution, content distribution, studio editing and post-processing [5], [11].
Fig. 1: Various profiles of H.264 [5]

Encoder and Decoder of H.264: H.264 is a codec, i.e., a combination of an encoder and a decoder that complement each other to achieve the required compression and picture quality. The H.264 encoder converts the video into a compressed format (.264 files), and the decoder converts the compressed video back into an uncompressed format with small losses. An H.264 video encoder carries out prediction, transform and encoding processes to produce a compressed video stream, as shown in figure 2 [6].
Fig. 2: H.264 encoder block diagram [6]

After encoding, the coded video data is organized into network abstraction layer (NAL) units, each of which is effectively a packet containing an integer number of bytes. A NAL unit can carry a coded slice, which is a group of macroblocks (MBs); each macroblock carries its MB type, prediction information, coded block pattern (CBP), quantization parameter (QP) and residual coefficients, as explained in figure 3 [4].
Fig. 3: NAL unit interface between encoder and decoder [4]

The function of an H.264 video decoder is to reproduce the video sequence by carrying out the complementary functions of the encoder, i.e., decoding, inverse transformation and reconstruction, as shown in figure 3a [6].
Fig. 3a: H.264 decoder block diagram [6]

Prediction Modes: The prediction modes in H.264 can be categorized as intra prediction (I), inter prediction (P) and their combination.

INTRA PREDICTION MODE: An intra (I) macroblock is coded using references only to data within the current slice. I macroblocks may occur in any slice type. The luma component of an intra MB can be predicted in three ways, namely as one 16 × 16 block, as four 8 × 8 blocks (High profiles only) or as sixteen 4 × 4 blocks. A single prediction block is generated for each chroma component, as shown in table 1 [4]. The prediction modes for a 16 × 16 MB are given in table 2 and figure 4 [4], [5].
Table 1: Various intra prediction block sizes and properties [4]
Table 2: 16x16 luma prediction modes and properties [4]
Fig. 4: 16x16 luma prediction modes, all predicted from pixels H and V [4]
8x8 chroma prediction modes:
Mode 0 (DC): mean of the upper and left-hand samples (H + V) [5].
Mode 1 (horizontal): extrapolation from the left samples (V) [5].
Mode 2 (vertical): extrapolation from the upper samples (H) [5].
Mode 3 (plane): a linear "plane" function is fitted to the upper and left-hand samples H and V; this works well in areas of smoothly varying luminance [5].

The 4x4 luma prediction modes and their properties are given in table 3 [4], and they are shown pictorially in figure 5 [4].
Table 3: 4x4 luma prediction modes and properties [4]
Fig. 5: 4x4 luma prediction (intra prediction) modes in H.264 [1] (pixels A to M have already been coded and reconstructed and are used to form the prediction for the 4x4 block)

INTER PREDICTION MODE: Inter prediction is the process of predicting a block of luma and chroma samples in the current frame from samples of a previously coded and transmitted reference frame. A prediction region is first selected in the reference frame to generate a prediction block, which is subtracted from the original block of samples to form a residual. The residual is then transformed, coded and transmitted together with the information needed to locate the prediction region (the motion vector and the reference frame index) [4]. A macroblock can be partitioned in four ways, as shown in figure 6 [4]:
(a) one 16x16 partition,
(b) two 8x16 partitions,
(c) two 16x8 partitions, or
(d) four 8x8 partitions.
Each 8x8 partition can in turn be split into sub-partitions of 8x8, 8x4, 4x8 or 4x4 samples, as shown in figure 7 [4].
Fig. 6: Macroblock partitions: 16x16, 8x16, 16x8, 8x8 [4]
Fig. 7: Macroblock sub-partitions: 8x8, 4x8, 8x4, 4x4 [4]

Rate Distortion Optimization (RDO) [6], [8], [9], [20]: Once the prediction and the residual have been computed for every mode, the encoder must choose the best one. Rather than simply picking the mode with the smallest residual, the H.264/AVC encoder performs rate-distortion optimization (RDO) for each macroblock and selects the mode that minimizes a cost combining distortion and bit rate [2]. First, the macroblock parameters are set: the quantization parameter (QP) and the Lagrangian multiplier λ_MODE, calculated as [2]
$$\lambda_{MODE} = 0.85 \times 2^{(QP-12)/3} \qquad (1)$$
Then the cost that determines the best mode is calculated as [2]
$$\mathrm{Cost} = D + \lambda_{MODE} \times R \qquad (2)$$
where D is the distortion and R is the bit rate at the given QP. The distortion D is obtained as the SSD (sum of squared differences) between the original macroblock and its reconstructed block.
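A minimal Python sketch of equations (1) and (2) follows. It assumes that the distortion D is the SSD between the original and reconstructed blocks, as stated above, and it takes the bit count R of each candidate mode as a given input instead of measuring it with the entropy coder. The function names and the two-mode example are hypothetical and are not taken from the JM software.

```python
"""Hypothetical sketch of RDO mode decision using equations (1) and (2)."""


def lambda_mode(qp):
    """Lagrangian multiplier of equation (1): 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum(
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    )


def rd_cost(distortion, bits, qp):
    """Cost of equation (2): D + lambda_MODE * R."""
    return distortion + lambda_mode(qp) * bits


def best_mode(original, candidates, qp):
    """Return (mode, cost) of the candidate with the minimum RD cost.

    `candidates` maps a mode name to (reconstructed_block, bits); the bit
    counts are assumed inputs here, not produced by a real entropy coder.
    """
    costs = {
        mode: rd_cost(ssd(original, recon), bits, qp)
        for mode, (recon, bits) in candidates.items()
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    orig = [[100, 102], [98, 101]]
    cands = {
        "vertical": ([[100, 100], [100, 100]], 10),  # SSD 9, few bits
        "dc": ([[100, 102], [98, 101]], 40),         # SSD 0, many bits
    }
    print(best_mode(orig, cands, qp=28))
```

In this toy example the "vertical" candidate wins despite its larger distortion, because λ_MODE at QP 28 makes the extra bits of the "dc" candidate more expensive than its distortion savings.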
The bit rate R includes the number of bits needed for the mode information and the transform coefficients of the macroblock. Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations for one macroblock is
$$N_8 \times (16 \times N_4 + N_{16}) = 4 \times (16 \times 9 + 4) = 592$$
where N8 = 4 is the number of modes of an 8x8 chroma block, N4 = 9 is the number of modes of a 4x4 luma block, and N16 = 4 is the number of modes of a 16x16 luma block. The H.264/AVC encoder therefore carries out 592 RDO calculations to choose the best mode combination for each macroblock, which greatly increases the complexity of the encoder [16].

INPUT FORMATS: H.264 can compress planar and interleaved/packed raw video data (e.g., YUV or RGB). The input video is usually in one of the intermediate formats such as CIF (common intermediate format), QCIF (quarter common intermediate format), Sub-QCIF and 4CIF; in this project, CIF and QCIF are mostly used. The resolutions of the different formats are shown in table 4 [4], and the resolutions of CIF and QCIF for 4:2:0 sampling are shown in figure 8 [18].
Table 4: Different intermediate formats [4]
Fig. 8: CIF and QCIF resolutions (Y, Cb, Cr), 4:2:0 [8], [9]

QUALITY MEASUREMENT: A major challenge is determining the quality of the decoded image or video. Measuring visual quality with objective criteria gives accurate and repeatable results, but as yet no objective measurement system completely reproduces the judgment of the human visual system [11].

PSNR
Peak signal to noise ratio (PSNR) is measured on a logarithmic scale and depends on the mean squared error (MSE) between an original and a decoded (lossy) image or video frame, relative to the square of the highest possible signal value in the image, (2^n - 1)^2, where n is the number of bits per image sample [11]:
$$PSNR_{dB} = 10 \log_{10} \frac{(2^n - 1)^2}{MSE}$$
PSNR is easy to calculate and is the most widely used measure of quality.
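The PSNR formula above can be sketched in Python as follows. This is a minimal illustration, assuming a single sample plane (for example, the Y component of a frame) stored as lists of integer rows; the function names are hypothetical. It returns infinity for identical planes, since the MSE is then zero.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally sized sample planes (lists of rows)."""
    squared = [
        (o - d) ** 2
        for row_o, row_d in zip(original, decoded)
        for o, d in zip(row_o, row_d)
    ]
    return sum(squared) / len(squared)


def psnr_db(original, decoded, bits_per_sample=8):
    """PSNR in dB = 10 * log10((2^n - 1)^2 / MSE), with n = bits_per_sample."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf  # identical planes: MSE is zero, PSNR is unbounded
    peak_squared = (2 ** bits_per_sample - 1) ** 2
    return 10 * math.log10(peak_squared / error)


if __name__ == "__main__":
    ref = [[16, 32], [64, 128]]
    out = [[17, 31], [64, 130]]
    print(f"{psnr_db(ref, out):.2f} dB")
```

For video, the same function would typically be applied frame by frame and the results averaged over the sequence.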
SSIM
The structural similarity index (SSIM) measures the similarity between two images. SSIM is a full-reference metric: the reference image is assumed to be perfect, i.e., the original image without any artifacts. Hence SSIM must be measured against the original image, or the image closest to it [4].

OPTIMIZATION PROCESS OF BASELINE PROFILE: H.264 provides better compression than the previous standards, but it is computationally much more complex than any of the previous codecs and time consuming for real-time applications. To make H.264 more suitable for practical applications, the encoding time must be reduced. In this project, encoding time is reduced by applying the following methods simultaneously:
1. Parallel programming in the baseline profile [7],
2. A fast algorithm for intra mode selection [8], and
3. A fast algorithm for inter mode selection [9], [20].
The baseline profile is selected because of its ease of implementation. Its important features are:
a) I and P slice coding,
b) Enhanced error resilience tools: flexible macroblock ordering (FMO), arbitrary slice ordering (ASO) and redundant slices (RS), and
c) Context adaptive variable length coding (CAVLC).
The baseline profile is primarily used in low-cost applications that need robustness to data loss, such as video conferencing and videophone. The joint model (JM 18.0) implementation of the H.264 encoder is used in this project [10].

1. Parallel Programming in Baseline Profile [7]: Parallelism is obtained by encoding several frames at the same time. The strategy adopted for encoding frames in parallel is as follows:
Step 1. Split the frames to be encoded into two equal sets. Ex: If 30 frames are to be encoded, frames 1 to 15 form set 1 and frames 16 to 30 form set 2.
Step 2. Intra code the first frame of each set in parallel.
Ex: Frames 1 and 16 are intra coded together. Frame 1 is then used as the reference frame for frame 2, frame 16 as the reference frame for frame 17, and so on.
Step 3. Inter code frames 2 and 17 in parallel, using OpenMP to adapt the encoding algorithm. Repeat for frames 3 and 18, and so on, until all frames are encoded, as shown in figure 9.
Fig. 9: Parallel processing of frames to reduce encoding time [7] (set 1: frame 1 intra, frames 2 to 15 inter; set 2: frame 16 intra, frames 17 to 30 inter; corresponding frames of the two sets are encoded in parallel)

2. Fast Algorithm for Intra Mode Selection [8]:
Proposed intra mode selection algorithm for a 4x4 luma block [12], [8]: In figure 10, black dots indicate the positions of the pixels used to investigate the directional correlation in the 4x4 luma block, and arrows represent the direction of correlation associated with each mask. Since H.264/AVC intra prediction supports only eight directions besides DC mode, eight simple directional masks are used instead of a precise edge detector such as the Sobel operator [16]. The mode whose mask yields the minimum difference is selected as a candidate mode [8].
Fig. 10: The proposed directional masks for a 4x4 luma block. (a) Vertical, (b) Horizontal, (c) Diagonal down left, (d) Diagonal down right, (e) Vertical right, (f) Horizontal down, (g) Vertical left, (h) Horizontal up mask [8].
Fig. 11: Pixel indices and modes of adjacent blocks used in the proposed intra mode selection algorithm. (a) Indices used in (3) to (10) for a 4x4 luma block, (b) Modes of upper and left blocks for additional candidate modes [8].
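The mask test, together with the flatness measure S of equation (11) and the candidate-list rules of Steps 1, 2a and 2b that follow, can be sketched in Python. This is a hypothetical illustration, not the implementation of [8]: it assumes that a to p label the 4x4 block in raster order (our reading of Fig. 11(a), consistent with equations (3) to (10)), that the threshold T1 is supplied by the caller because its value is not stated, and that mode A and mode B arrive as H.264/AVC 4x4 mode numbers.

```python
"""Hypothetical sketch of the fast 4x4 intra candidate selection of [8]."""

# H.264/AVC 4x4 luma intra prediction mode numbers.
VERTICAL, HORIZONTAL, DC = 0, 1, 2
DIAG_DOWN_LEFT, DIAG_DOWN_RIGHT, VERTICAL_RIGHT = 3, 4, 5
HORIZONTAL_DOWN, VERTICAL_LEFT, HORIZONTAL_UP = 6, 7, 8

# (weight, pixel, pixel) terms of equations (3) to (10). Pixels a..p are
# assumed to be the 4x4 block in raster order:
#   a b c d / e f g h / i j k l / m n o p
MASKS = {
    VERTICAL: [(1, "a", "m"), (1, "b", "n"), (1, "c", "o"), (1, "d", "p")],
    HORIZONTAL: [(1, "a", "d"), (1, "e", "h"), (1, "i", "l"), (1, "m", "p")],
    DIAG_DOWN_LEFT: [(1, "c", "i"), (2, "d", "m"), (1, "h", "n")],
    DIAG_DOWN_RIGHT: [(1, "b", "l"), (2, "a", "p"), (1, "e", "o")],
    VERTICAL_RIGHT: [(1, "a", "n"), (2, "b", "o"), (1, "c", "p")],
    HORIZONTAL_DOWN: [(1, "a", "h"), (2, "e", "l"), (1, "i", "p")],
    VERTICAL_LEFT: [(1, "b", "m"), (2, "c", "n"), (1, "d", "o")],
    HORIZONTAL_UP: [(1, "e", "d"), (2, "i", "h"), (1, "m", "l")],
}


def label_pixels(block):
    """Map a 4x4 block (list of 4 rows of 4 ints) to the names 'a'..'p'."""
    flat = [value for row in block for value in row]
    return dict(zip("abcdefghijklmnop", flat))


def directional_diffs(block):
    """Diff of every directional mask; a smaller Diff means stronger correlation."""
    px = label_pixels(block)
    return {
        mode: sum(w * abs(px[x] - px[y]) for w, x, y in terms)
        for mode, terms in MASKS.items()
    }


def flatness(block):
    """S of equation (11): sum over the 16 pixels of |avg - p_i|."""
    flat = [value for row in block for value in row]
    avg = sum(flat) / 16
    return sum(abs(avg - p) for p in flat)


def candidate_modes(block, mode_a, mode_b, t1):
    """At most four RDO candidates following Steps 1, 2a and 2b.

    mode_a / mode_b are the modes of the upper / left blocks; t1 is the
    threshold T1. S equal to T1 is treated as Step 2b (an assumption).
    """
    ranked = sorted(directional_diffs(block).items(), key=lambda kv: kv[1])
    by_diff = [mode for mode, _ in ranked]  # ties keep mode-number order
    if flatness(block) > t1:
        # Step 2a: two best masks plus neighbouring modes, excluding DC.
        candidates = by_diff[:2]
        neighbours = [m for m in (mode_a, mode_b) if m != DC]
    else:
        # Step 2b: best mask, DC mode, plus neighbouring modes.
        candidates = by_diff[:1] + [DC]
        neighbours = [mode_a, mode_b]
    for mode in neighbours:
        if mode not in candidates:
            candidates.append(mode)
    return candidates


if __name__ == "__main__":
    stripes = [[10, 50, 10, 50]] * 4  # every column is constant
    print(directional_diffs(stripes)[VERTICAL])  # 0: perfect vertical correlation
    print(candidate_modes(stripes, DC, VERTICAL, t1=100))
```

Only the returned candidates would then go through the full RDO of equation (2), instead of all nine 4x4 modes.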
$$\mathrm{Diff} = |a-m| + |b-n| + |c-o| + |d-p| \quad \text{(vertical)} \qquad (3)$$
$$\mathrm{Diff} = |a-d| + |e-h| + |i-l| + |m-p| \quad \text{(horizontal)} \qquad (4)$$
$$\mathrm{Diff} = |c-i| + 2|d-m| + |h-n| \quad \text{(diagonal down left)} \qquad (5)$$
$$\mathrm{Diff} = |b-l| + 2|a-p| + |e-o| \quad \text{(diagonal down right)} \qquad (6)$$
$$\mathrm{Diff} = |a-n| + 2|b-o| + |c-p| \quad \text{(vertical right)} \qquad (7)$$
$$\mathrm{Diff} = |a-h| + 2|e-l| + |i-p| \quad \text{(horizontal down)} \qquad (8)$$
$$\mathrm{Diff} = |b-m| + 2|c-n| + |d-o| \quad \text{(vertical left)} \qquad (9)$$
$$\mathrm{Diff} = |e-d| + 2|i-h| + |m-l| \quad \text{(horizontal up)} \qquad (10)$$
where a to p denote the pixels of the 4x4 block used to investigate the directional correlation of the corresponding mask; their positions are shown in figure 11(a). Diff is used as the correlation criterion: the direction with the smaller Diff is the more strongly correlated one.

In addition, candidate modes are obtained from the mode information of adjacent blocks: mode A of the upper block and mode B of the left block, as shown in figure 11(b) [8]. Mode A and mode B are added to the candidate modes for the RDO procedure because the directions of H.264/AVC intra prediction are defined by the relation between the current block and the boundary pixels of the adjacent blocks, not by the direction within the current block alone. Thus one mode (when mode A and mode B are the same) or two modes (when they differ) are included in the RDO procedure [8].

To determine whether DC mode is included in the RDO procedure, the sum S of the absolute differences between the average of the current block and each of its pixels p_i is considered:
$$\mathrm{avg} = \frac{1}{16}\sum_{i=0}^{15} p_i, \qquad S = \sum_{i=0}^{15} |\mathrm{avg} - p_i| \qquad (11)$$
where p_i is each pixel of the current block.
Condition 1: If S is smaller than a threshold T1, RDO is carried out for at most 4 candidate modes, i.e., one mode from the proposed masks, at most two modes from adjacent blocks, and DC mode [8].
Condition 2: If S is larger than T1, RDO is performed for at most 4 candidate modes, i.e., two modes from the proposed masks (with the minimum and second minimum Diff) and at most two modes from adjacent blocks [8].

The proposed intra mode selection algorithm for a 4x4 luma block is summarized as follows:
Step 1 - For a 4x4 luma block, obtain avg and S by (11) [8].
Step 2a - If S is larger than the threshold T1, carry out the RDO procedure for at most 4 candidate modes: the two modes with the minimum and second minimum Diff by (3) to (10), and at most two modes from adjacent blocks. In this case, the DC mode of adjacent blocks is excluded from the RDO procedure [8].
Step 2b - If S is smaller than T1, carry out the RDO procedure for at most 4 candidate modes: the mode with the minimum Diff by (3) to (10), at most two modes from adjacent blocks, and DC mode [8].

Proposed intra mode selection algorithm for a 16x16 luma block [12], [8]:
Step 1 - Examine the sizes of the adjacent blocks: if both blocks (upper and left) are 16x16, go to Step 2; otherwise go to Step 4 [8].
Step 2 - Examine the modes of the adjacent blocks: if both modes are the same, go to Step 3; otherwise, select as the best 16x16 luma mode whichever of the two adjacent modes, mode A or mode B, gives the minimum SATD (sum of absolute transformed differences) [8].
Step 3 - If both adjacent modes are DC mode, go to Step 4; otherwise, select as the best 16x16 luma mode whichever of the adjacent mode and DC mode gives the minimum SATD [8].
Step 4 - Let ΔV be the vertical difference between the upper boundary pixels of the current block and the boundary pixels of the upper block, and ΔH the horizontal difference between the left boundary pixels of the current block and the boundary pixels of the left block [8]:
$$\Delta V = \sum_{i=0}^{15} |u(i) - q(i)|, \qquad \Delta H = \sum_{i=0}^{15} |l(i) - r(i)|$$
where u(i) are the boundary pixels of the upper block, q(i) the upper boundary pixels of the current block, l(i) the boundary pixels of the left block, and r(i) the left boundary pixels of the current block, as shown in figure 12.
Fig. 12: Calculation of ΔV and ΔH in a 16x16 luma block [2], [8].
Obtain the candidate modes from the two difference values: if |ΔV − ΔH| is smaller than 2 × T2, the candidate modes are DC mode and plane mode; otherwise, if ΔV − ΔH is positive, the candidates are DC mode and horizontal mode, and if it is negative, DC mode and vertical mode. T2 is a positive threshold, set equal to 32. Finally, select the best mode among the candidate modes as the one with the minimum SATD.

3. Fast Algorithm for Inter Mode Selection [9], [20]: The FAT algorithm for mode decision exploits the statistical similarity between the current macroblock and the predicted macroblock. The predicted mode is obtained from the neighboring spatial and temporal macroblocks. For accuracy, the rate-distortion cost is checked against two adaptive thresholds:
$$\text{Adaptive Threshold I:} \quad RD_{thres} = RD_{pred} \times (1 - 8\beta)$$
$$\text{Adaptive Threshold II:} \quad RD_{thres} = RD_{pred} \times (1 + 10\beta)$$
such that … (4)
where β is the modulator, and N and M are the numbers of rows and columns of the N × M macroblock. If the predicted mode is P8x8 (or smaller), the encoder checks whether the current macroblock is homogeneous; it is further partitioned into 8x4, 4x8 and 4x4 blocks only if it is not homogeneous.
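The two adaptive thresholds above, together with the mode histogram that supplies the predicted mode and RD_pred (described next), can be sketched in Python. This is a hypothetical illustration, not the code of [9] or [20]: because equation (4), which defines β, is elided in the text, β is passed in as a parameter, and the mode labels and function names are assumptions made for the example.

```python
from collections import Counter


def predict_mode_and_cost(neighbours):
    """Return (predicted_mode, rd_pred) from neighbouring macroblocks.

    `neighbours` holds (mode, rd_cost) pairs of the spatial and temporal
    neighbours. The predicted mode is the peak of the mode histogram (ties go
    to the mode encountered first); rd_pred is the average RD cost of the
    neighbours coded in that mode.
    """
    histogram = Counter(mode for mode, _ in neighbours)
    predicted, _ = histogram.most_common(1)[0]
    costs = [cost for mode, cost in neighbours if mode == predicted]
    return predicted, sum(costs) / len(costs)


def adaptive_thresholds(rd_pred, beta):
    """Adaptive Thresholds I and II: RD_pred*(1 - 8*beta), RD_pred*(1 + 10*beta)."""
    return rd_pred * (1 - 8 * beta), rd_pred * (1 + 10 * beta)


def termination_round(rd_cost, threshold_1, threshold_2):
    """1 if the cost passes the first-round test (Threshold I), 2 if it only
    passes the second-round test (Threshold II), None if it passes neither."""
    if rd_cost < threshold_1:
        return 1
    if rd_cost < threshold_2:
        return 2
    return None


if __name__ == "__main__":
    neighbours = [("P16x16", 100.0), ("SKIP", 50.0), ("P16x16", 120.0)]
    mode, rd_pred = predict_mode_and_cost(neighbours)
    t1, t2 = adaptive_thresholds(rd_pred, beta=0.01)  # beta chosen arbitrarily
    print(mode, rd_pred, t1, t2, termination_round(105.0, t1, t2))
```

Threshold I is the stricter of the two, so a mode accepted in the first round must beat the neighbours' average cost by a margin; Threshold II relaxes the test for the second round over the remaining modes.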
A mode histogram is obtained from the spatial and temporal neighboring macroblocks, and the best mode is selected as the index corresponding to the maximum value of the histogram. The average rate-distortion cost of the neighboring macroblocks coded in this best mode is then used as the prediction cost RD_pred for the current macroblock [9], [20].

FAT Algorithm [9], [20]: The algorithm is shown in figure 13 and explained below.
Step 1: If the current macroblock belongs to an I slice, check intra prediction using I4x4 or I16x16 and go to Step 10; otherwise go to Step 2.
Step 2: If the current macroblock is the first macroblock of a P slice, check the inter and intra prediction modes and go to Step 10; otherwise go to Step 3.
Step 3: Compute the mode histogram from the neighboring spatial and temporal macroblocks and go to Step 4.
Step 4: Select the predicted mode as the index corresponding to the maximum of the mode histogram, obtain the values of Adaptive Threshold I and Adaptive Threshold II, and go to Step 5.
Step 5: Always check the P16x16 mode and the skip-mode conditions; if the skip-mode conditions are satisfied, go to Step 10; otherwise go to Step 6.
Step 6: If the left, up, up-left and up-right macroblocks are all coded in skip mode, check the skip mode against Adaptive Threshold I: if its rate-distortion cost is less than Adaptive Threshold I, label the current macroblock as skip mode and go to Step 10; otherwise go to Step 7.
Step 7: First round: check the predicted mode. If the predicted mode is P8x8, go to Step 8; otherwise check the rate-distortion cost of the predicted mode against Adaptive Threshold I. If the RD cost is less than Adaptive Threshold I, go to Step 10; otherwise go to Step 9.
Step 8: If the current P8x8 block is homogeneous, no further partitioning is required; otherwise it is further partitioned into the smaller blocks 8x4, 4x8 and 4x4. If the RD cost of P8x8 is less than Adaptive Threshold I, go to Step 10; otherwise go to Step 9.
Step 9: Second round: check the remaining modes against Adaptive Threshold II. If the rate-distortion cost of a mode is less than Adaptive Threshold II, go to Step 10; otherwise continue checking all the remaining modes and then go to Step 10.
Step 10: Save the best mode and its rate-distortion cost.
Fig. 13: Flow chart for inter prediction [9], [20]

CONCLUSION: By implementing parallel programming in the baseline profile together with the FAT algorithm in the intra and inter prediction modes, and by evaluating it on numerous test sequences with quality measurements such as PSNR and SSIM, an optimized baseline profile will be obtained. The performance of the optimized H.264 baseline profile will then be compared with that of the standard H.264 baseline profile in terms of encoding speed, video quality and bit rate for various test sequences.

REFERENCES:
[1] H. Kalva, "Parallel programming for multimedia applications", Springer Science and Business Media, Florida Atlantic University, Florida, USA, Dec. 2010.
[2] J. Kim, D. Kim, and J. Jeong, "Complexity reduction algorithm for intra mode selection in H.264/AVC video coding", in J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454-465, Springer-Verlag, Berlin Heidelberg, 2006.
[3] J. Ren et al., "Computationally efficient mode selection in H.264/AVC video coding", IEEE Trans. on Consumer Electronics, vol. 54, pp. 877-886, May 2008.
[4] I. E. G. Richardson, "The H.264 advanced video compression standard", 2nd edition, Wiley, 2010.
[5] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next generation multimedia", Wiley, 2nd edition, Aug. 2010.
[6] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[7] T. Saxena, "Reducing the encoding time of H.264 baseline profile using parallel programming techniques", M.S. Thesis, EE, UTA, expected Dec. 2012.
[8] S. K. Muniyappa, "Implementation of complexity algorithm for intra mode selection in H.264/AVC video coding", M.S. Thesis, EE, UTA, Dec. 2011.
[9] A. Kulkarni, "Implementation of fast inter-prediction mode decision algorithm in H.264/AVC video encoder", M.S. Thesis, EE, UTA, May 2012.
[10] JM reference software, Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute. http://iphome.hhi.de/suehring/tml/
[11] G. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, 2004.
[12] F. Pan et al., "Fast intra mode decision algorithm for H.264/AVC video coding", in Proc. IEEE Int. Conf. Image Process., pp. 781-784, Singapore, Oct. 2004.
[13] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley, 2003.
[14] ISO/IEC 11172-5, "Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps", Nov. 1998.
[15] M. Jafari and S. Kasaei, "Fast intra- and inter-prediction mode decision in H.264 advanced video coding", International Journal of Computer Science and Network Security, vol. 8, no. 5, pp. 1-6, May 2008.
[16] T. Stockhammer, D. Kontopodis, and T. Wiegand, "Rate-distortion optimization for H.26L video coding in packet loss environment", in Proc. Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
[17] Draft ITU-T Recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Mar. 2003.
[18] YUV test video sequences: http://trace.eas.asu.edu/yuv/
[19] T. Wiegand et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits and Syst. for Video Technol., vol. 13, pp. 560-576, July 2003.
[20] D. Han, A. Kulkarni and K. R. Rao, "Fast inter-prediction mode decision algorithm for H.264 video encoder", ECTI-CON 2012, Cha Am, Thailand, May 2012.