Multimedia Processing Term Project

ERROR CONCEALMENT TECHNIQUES IN H.264 VIDEO TRANSMISSION OVER WIRELESS NETWORKS

Spring 2011
Dr. K. R. Rao
Murtaza Mustafa Zaveri (1000671952)
murtaza.zaveri@mavs.uta.edu
EE 5359 Multimedia Processing

Acknowledgement

I would like to acknowledge the guidance of Dr. K. R. Rao (Electrical Engineering Department at the University of Texas at Arlington) and his insightful support and inspiration throughout the various stages of this project. I sincerely appreciate the help and advice given by Dr. Rao, which went a long way in helping me understand the key underlying concepts of the project. Several of the resources available at the University of Texas at Arlington were of great assistance in the fruitful completion of this project. I would also like to thank my fellow students for their valuable and timely inputs.

List of figures and tables:

Figure 1: Position of H.264/MPEG-4 AVC standard
Figure 2: Block diagram of a H.264 encoder
Figure 3: Block diagram of a H.264 decoder
Figure 4: Illustration of H.264/MPEG4-AVC profiles
Figure 5: Typical situation in 3G/4G cellular telephony
Figure 6: Illustration of spatio-temporal error propagation
Figure 7: Illustration of error propagation
Figure 8: VCL/NAL layers of H.264
Figure 9: Sizes of various macroblocks
Figure 10: Weighted Averaging
Figure 11: Copy paste algorithm
Figure 12: Boundary matching algorithm
Figure 13: Block matching
Figure 14: Plot of the comparison between the weighted averaging and copy paste algorithms of error concealment using the suzie_qcif video sequence
Figure 15: Plot of the comparison between the weighted averaging and copy paste algorithms of error concealment using the foreman_qcif video sequence
Table 1: Comparison of Average PSNR values using the sequence suzie_qcif before and after error concealment
Table 2: Comparison of Average PSNR values using the sequence foreman_qcif before and after error concealment
Abbreviations and Acronyms:

PSNR - Peak Signal to Noise Ratio
AVC - Advanced Video Coding
JVT - Joint Video Team
VCEG - Video Coding Experts Group
MPEG - Moving Picture Experts Group
CAVLC - Context adaptive variable length coding
CABAC - Context adaptive binary arithmetic coding
QCIF - Quarter Common Intermediate Format
MMS - Multimedia Messaging Service
VCL - Video Coding Layer
NAL - Network Abstraction Layer
MB - Macroblock
MV - Motion Vector
SSIM - Structural Similarity Index Metric
SAD - Sum of Absolute Differences

Abstract:

The last few years have seen the birth and development of various wireless technologies, of which cellular telephony has been the most important. Initially, cellular telephony was conceived for voice communication; with the advent of the third and fourth generation (3G/4G) systems, it now provides a diverse variety of services, such as data, audio and video transmission. The H.264 [14] video compression standard is used for a large range of consumer video applications, such as television broadcasting, streaming multimedia and video conferencing. Wireless video transmission suffers from imperfections in the communication channel, which often result in packet losses and lead to frame loss or corrupted areas in the decoded frame. This kind of corruption tends to spread spatio-temporally in the current and consecutive frames. To curb the effects of this corruption, various error resilience and concealment techniques are currently used in the H.264 standard [1]. Two error concealment techniques are used to demonstrate how error concealment mitigates corruption in both the spatial and temporal domains, thereby improving the quality of wireless video transmission.
Also discussed is the use of metrics such as the peak signal to noise ratio (PSNR) to compare and evaluate how well the end video is reconstructed from the erroneous video received.

The H.264 standard:

H.264 / MPEG-4 (Part 10) Advanced Video Coding (commonly referred to as H.264/AVC) [14] is the newest entry in the series of international video coding standards. It is currently the most powerful and state-of-the-art standard, and was developed by a Joint Video Team (JVT) consisting of experts from ITU-T’s Video Coding Experts Group (VCEG) and ISO/IEC’s Moving Picture Experts Group (MPEG). As has been the case with past standards, its design provides the most current balance between coding efficiency, implementation complexity and cost, based on the state of VLSI design technology (CPUs, DSPs, ASICs, FPGAs, etc.). In the process, a standard was created that improved coding efficiency by a factor of at least about two (on average) over MPEG-2 [6], the most widely used video coding standard today, while keeping the cost within an acceptable range. In July 2004, a new amendment was added to this standard, called the Fidelity Range Extensions (FRExt, Amendment 1), which demonstrates even further coding efficiency against MPEG-2, potentially by as much as 3:1 for some key applications [6]. Figure 1 shows the evolution of the H.264 standard and illustrates its position.

Figure 1: Position of H.264/MPEG-4 AVC standard [9]

The basic source coding algorithm used in H.264 is a hybrid of inter-picture prediction, which exploits the temporal statistical dependencies, and transform coding of the prediction residual. Figures 2 and 3 show the block diagrams of the H.264 encoder and decoder respectively. The picture is essentially split into blocks. The first picture of a sequence or a random access point is typically “Intra” coded, i.e., without using information other than that contained in the picture itself.
Each sample of a block in an Intra frame is predicted using spatially neighboring samples of previously coded blocks.

Figure 2: Block diagram of a H.264 encoder [14]

Figure 3: Block diagram of a H.264 decoder [14]

H.264 Encoder profiles:

The seven prominent profiles used in H.264 are:
• Baseline profile
• Main profile
• Extended profile
• High Profile
• High 10 Profile
• High 4:2:2 Profile
• High 4:4:4 Profile

Figure 4 below illustrates the specific coding profiles of the H.264 standard.

Figure 4: Illustration of H.264/MPEG4-AVC profiles [8]

H.264 Encoder baseline profile:

This project was implemented on the baseline profile. The baseline profile is primarily designed for:
• Low processing power platforms
• Error prone transmission environments

Its features include:
• Low coding efficiency
• I- and P- slice coding
• Enhanced error resilience coding such as flexible macroblock ordering
• Context adaptive variable length coding (CAVLC)

Features not included in the baseline profile are as follows:
• B- slices, SI- or SP- slices
• Interlace coding tools
• Context adaptive binary arithmetic coding (CABAC)

Error Propagation:

A typical situation in mobile telephony is shown in figure 5 below:

Figure 5: Typical situation in 3G/4G cellular telephony [20]

A user with a mobile device solicits a video service. The video stream is obtained from the application server over the network and is transmitted over the wireless environment to the user. During the transmission, the video sequence may be degraded because the wireless environment is error prone. Because of the bandwidth limitation, this system works with low resolution (QCIF, 176 x 144) videos, so the loss of one packet means a big loss of information. As this is a real time process, it is not possible to perform retransmissions.
The only way to fix the errors produced by packet losses is to use error concealment methods at the mobile terminal.

Major categories for video services over wireless systems include:
• Multimedia messaging services (MMS)
• Packet-switched pre-coded streaming service
• Circuit-switched and packet-switched conversational services

Conversational services are especially susceptible to errors because their time-delay requirement is stringent.

As mentioned before, wireless video transmission suffers from imperfections in the communication channel, which often result in packet losses and lead to frame loss or corrupted areas in the decoded frame. This kind of corruption tends to spread spatio-temporally in the current and consecutive frames, as shown in figure 6. This is because H.264 employs predictive coding. H.264 is thus susceptible to error propagation due to channel noise, which in turn leads to a considerable degradation in the video quality [1].

Figure 6: Illustration of spatio-temporal error propagation [19]

An illustration of the spread of the error in the spatial and temporal domains is shown in the sequence of frames in figure 7 below:

Figure 7: Illustration of error propagation [17]

Error Resilience:

In addition to better coding efficiency, the H.264 standard places strong emphasis on error resiliency and adaptability to various networks. H.264/AVC has adopted a two-layer structure design containing a video coding layer (VCL), which is designed to obtain highly compressed video data, and a network abstraction layer (NAL), which formats the VCL data and adds corresponding header information for adaptation to various transportation protocols or storage media [19]. Figure 8 illustrates the VCL/NAL layers of H.264.

Figure 8: VCL/NAL layers of H.264 [19]

To perform video coding, a frame is divided into macroblocks (MBs).
For each MB, motion estimation finds the best match from the reference frame(s) by minimizing the difference between the current MB and the candidate MBs (from the reference frame). The sizes of the various MBs are shown in figure 9. The differences between the current MBs and their best matches are the residual MBs; these form a residual frame that is essentially the difference between the current frame and the corresponding motion compensated predicted frame. Simultaneously, motion vectors (MVs) are used to encode the locations of the MBs that have been used to predict each MB in the current frame. The residual frame is then transformed through the DCT or an integer transform, and quantized [15].

Figure 9: Sizes of various macroblocks [25]

Error Concealment:

The main task of error concealment is to replace missing parts of the video content by previously decoded parts of the video sequence in order to eliminate or reduce the visual effects of bit stream errors. Error concealment exploits the spatial and temporal correlations between neighboring image parts within the same frame or from past and future frames [2].

The various error concealment methods can be divided into two categories: error concealment methods in the spatial domain and error concealment methods in the temporal domain.

Spatial domain error concealment utilizes the spatial smoothness of the video image. Each missing pixel of the corrupted image part is interpolated from the intact surrounding pixels. Weighted averaging is an example of a spatial domain error concealment method [3].

Temporal domain error concealment utilizes the temporal smoothness between adjacent frames within the video sequence. The simplest implementation of this method is replacing the missing image part with the spatially corresponding part inside a previously decoded frame, which has maximum correlation with the affected frame.
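The simplest temporal method just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation: frames are plain lists of luma rows, the position of the damaged macroblock is assumed to be already known, and the immediately preceding decoded frame is assumed to be the one with maximum correlation. The names `Frame` and `conceal_copy_paste` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # luma samples, indexed as frame[row][col]


def conceal_copy_paste(cur: Frame, prev: Frame,
                       top: int, left: int, size: int = 16) -> Frame:
    """Return a copy of `cur` in which the damaged size x size block at
    (top, left) is replaced by the co-located block of `prev`.

    Assumption: `prev` is the previously decoded frame and has the same
    dimensions as `cur`.
    """
    height, width = len(cur), len(cur[0])
    out = [row[:] for row in cur]  # leave the input frame untouched
    for r in range(top, min(top + size, height)):
        for c in range(left, min(left + size, width)):
            out[r][c] = prev[r][c]  # same coordinates, previous frame
    return out
```

Since only a block copy is involved, the cost is negligible, but any motion between the two frames appears directly as a concealment error.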
Examples of temporal domain error concealment methods include the copy-paste algorithm, the boundary matching algorithm and the block matching algorithm [4].

The most effective error concealment methods, though, are hybrid methods that combine temporal and spatial concealment. These methods exploit both the spatial and temporal correlations of the video sequence, adaptively selecting spatial or temporal concealment according to the boundary match criterion [5].

Typical parameters used to evaluate the quality of reconstruction include the peak signal to noise ratio (PSNR) and the structural similarity index metric (SSIM) [18].

Spatial Error Concealment:

All error concealment methods in the spatial domain are based on the same idea: the pixel values within a damaged macroblock can be recovered by a specified combination of the pixels surrounding it. In this technique, the interpixel difference between adjacent pixels of an image is determined. The interpixel difference is defined as the average of the absolute differences between a pixel and its four surrounding pixels. This property is used to perform error concealment.

The first step in implementing spatial error concealment is to interpolate each pixel value within the damaged macroblock from the four nearest pixels in its four 1-pixel wide boundaries, as shown in figure 10. This method is known as ‘weighted averaging’ [21], because each missing pixel value is recovered as the average of the four pixels in the four 1-pixel wide boundaries of the damaged macroblock, weighted by the distance between the missing pixel and the four macroblock boundaries (top, bottom, left and right).
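The interpolation described above can be illustrated with a short Python sketch, together with a PSNR helper for scoring the result. It is a sketch under assumptions rather than the formula of [22]: each boundary pixel is weighted by the distance from the missing pixel to the opposite boundary (so that nearer boundary pixels count more), boundaries that fall outside the frame are skipped, and the damaged block is assumed to lie fully inside the frame. The names `Frame`, `conceal_weighted_average` and `psnr` are hypothetical; the PSNR definition is the usual 10·log10(peak² / MSE).

```python
import math
from typing import List

Frame = List[List[int]]  # luma samples, indexed as frame[row][col]


def conceal_weighted_average(frame: Frame, top: int, left: int,
                             size: int = 16) -> Frame:
    """Interpolate the damaged size x size block at (top, left) from its
    four 1-pixel wide boundaries (assumed weighting: distance to the
    opposite boundary)."""
    height, width = len(frame), len(frame[0])
    bottom, right = top + size, left + size  # first row/col past the block
    out = [row[:] for row in frame]
    for r in range(top, bottom):
        for c in range(left, right):
            d_top, d_bottom = r - top + 1, bottom - r
            d_left, d_right = c - left + 1, right - c
            num = den = 0
            if top - 1 >= 0:        # pixel just above the block
                num += d_bottom * frame[top - 1][c]
                den += d_bottom
            if bottom < height:     # pixel just below the block
                num += d_top * frame[bottom][c]
                den += d_top
            if left - 1 >= 0:       # pixel just left of the block
                num += d_right * frame[r][left - 1]
                den += d_right
            if right < width:       # pixel just right of the block
                num += d_left * frame[r][right]
                den += d_left
            if den:
                out[r][c] = int(num / den + 0.5)  # round to nearest
    return out


def psnr(ref: Frame, test: Frame, peak: int = 255) -> float:
    """PSNR in dB between two equally sized frames (inf if identical)."""
    count = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(ref, test)
              for a, b in zip(row_a, row_b)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

With this weighting, a block lost from a smooth linear gradient is recovered exactly, while sharp edges or texture inside the block are blurred, which is the expected behaviour of a purely spatial method.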
Figure 10: Weighted Averaging algorithm for spatial error concealment [22]

The formula used for weighted averaging is given in [22].

Temporal Error Concealment:

It is easier to conceal linear movements in one direction because pictures can be predicted from previous frames (the scene is almost the same). If there are movements in many directions or scene cuts, finding a similar part of a previous frame is more difficult, or even impossible.

Copy paste Algorithm:

This algorithm replaces the missing image part with the spatially corresponding part inside a previously decoded frame, which has maximum correlation with the affected frame [4], as shown in figure 11. The formula used in the copy paste algorithm is as follows:

Figure 11: Copy paste algorithm

Boundary matching [16]:

Let B be the area corresponding to a one pixel wide boundary of a missing block in the nth frame Fn. The motion vectors of the missing block, as well as those of its neighbors, are unknown. The coordinates [ˆx, ˆy] of the best match to B within the search area A in the previous frame Fn−1 have to be found. The equation used is given in [16].

The sum of absolute differences (SAD) is chosen as the similarity metric for its low computational complexity. The size of B depends on the number of correctly received neighbors M, the boundaries of which are used for matching, as shown in figure 12 below.

Figure 12: Boundary matching algorithm [16]

Block matching [16]:

Better results can be obtained by looking for the best match for the correctly received MB on the top, bottom, left or right side of the missing MB. The equation used is given in [16], where ‘AD’ represents the search area for the best match of MBD, with its center spatially corresponding to the start of the missing MB.
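The SAD search at the core of both matching methods above can be sketched as follows. This is a minimal sketch, not the formulation of [16]: it performs an exhaustive search over a square window, and the names `Frame`, `sad` and `best_match` as well as the `center` and `radius` parameters are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[row][col]
Pos = Tuple[int, int]    # (row, col) of a block's top-left sample


def sad(prev: Frame, cur: Frame, prev_pos: Pos, cur_pos: Pos,
        size: int) -> int:
    """Sum of absolute differences between the size x size block of
    `prev` at prev_pos and the block of `cur` at cur_pos."""
    (pr, pc), (cr, cc) = prev_pos, cur_pos
    return sum(abs(prev[pr + i][pc + j] - cur[cr + i][cc + j])
               for i in range(size) for j in range(size))


def best_match(prev: Frame, cur: Frame, block_pos: Pos, size: int,
               center: Pos, radius: int) -> Pos:
    """Position in `prev` whose block has the lowest SAD against the
    correctly received block of `cur` at block_pos, searched within
    +/- radius samples of `center` (clipped to the frame)."""
    height, width = len(prev), len(prev[0])
    best_cost: Optional[int] = None
    best_pos: Optional[Pos] = None
    r0, c0 = center
    for r in range(max(0, r0 - radius), min(height - size, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(width - size, c0 + radius) + 1):
            cost = sad(prev, cur, (r, c), block_pos, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (r, c)
    if best_pos is None:
        raise ValueError("search area lies outside the frame")
    return best_pos
```

For boundary matching, the same search would be run over the one pixel wide boundary B instead of a full block; for block matching, it would be run once per correctly received neighbor.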
The final position of the best match is given by an average over the positions of the best matches found for the neighboring blocks [16]. The MB sized area starting at the position [ˆx, ˆy] in Fn−1 is used to conceal the damaged MB in Fn. To reduce the necessary number of operations, only parts of the neighboring MBs can be used for the MV search, as shown in figure 13.

Figure 13: Block matching [16]

Results observed:

YUV File 1: suzie_qcif.yuv

Specifications:
QCIF sequence: suzie_qcif.yuv
Total number of frames: 150
Width: 176; Height: 144
Total number of frames used: 20
Frame rate: 30 frames/second

Quantization   Average bit    Original PSNR,    Average PSNR after         Average PSNR after
parameter      rate (kbps)    no errors (dB)    weighted averaging (dB)    copy paste (dB)
28             69.92          37.064            33.194                     30.23
24             138.05         39.531            33.502                     30.783
20             291.98         42.707            34.948                     31.984

Table 1: Comparison of Average PSNR values using the sequence suzie_qcif before and after error concealment

Figure 14: Plot of the comparison between the weighted averaging and copy paste algorithms of error concealment using the suzie_qcif video sequence (PSNR in dB versus bit rate in kbps; curves: no errors, weighted averaging, copy-paste)

YUV File 2: foreman_qcif.yuv

Specifications:
QCIF sequence: foreman_qcif.yuv
Total number of frames: 300
Width: 176; Height: 144
Total number of frames used: 20
Frame rate: 30 frames/second

Quantization   Average bit    Original PSNR,    Average PSNR after         Average PSNR after
parameter      rate (kbps)    no errors (dB)    weighted averaging (dB)    copy paste (dB)
28             180.65         36.976            33.141                     32.54
24             292.03         39.678            34.624                     33.233
20             492.14         42.861            36.413                     35.784

Table 2: Comparison of Average PSNR values using the sequence foreman_qcif before and after error concealment

Figure 15: Plot of the comparison between the weighted averaging and copy paste algorithms of error concealment using the foreman_qcif video sequence (PSNR in dB versus bit rate in kbps; curves: no errors, weighted averaging, copy-paste)

Conclusions:

The conclusions drawn from implementing the weighted averaging algorithm (spatial domain) and the copy-paste algorithm (temporal domain) are:
• Spatial error concealment using the weighted averaging algorithm is in general more effective than temporal error concealment using the copy-paste algorithm, as seen in Tables 1 and 2 and Figures 14 and 15.
• If the scene changes slowly and there are no fast movements between the frames, the difference in effectiveness between these two techniques is very small.
• The copy-paste algorithm may be more effective than the weighted averaging algorithm in rare cases, such as when the error is contained in a background that does not change over a sequence of frames.
• It was ultimately concluded that a hybrid error concealment technique, which adaptively switches between the spatial and temporal error concealment techniques, would be most effective for error concealment.

Future research:

Possible areas in which future work on this subject may be undertaken include the following:
• Further enhancing error detection by locating random and burst errors.
• Implementation of a hybrid error concealment technique which combines the advantages of both the spatial and temporal techniques of error concealment.
• Implementation of temporal error concealment using motion vector estimation.
• Error concealment in other codecs like AVS China [27] and Dirac Pro [26].

References:

[1] Y. Xu and Y. Zhou, "H.264 Video Communication Based Refined Error Concealment Schemes", IEEE Transactions on Consumer Electronics, vol. 50, issue 4, pp. 1135-1141, November 2004.
[2] M. Wada, "Selective Recovery of Video Packet Loss using Error Concealment", IEEE Journal on Selected Areas in Communication, vol. 7, issue 5, pp. 807-814, June 1989.
[3] Y. Chen, et al, "An Error Concealment Algorithm for Entire Frame Loss in Video Transmission", Microsoft Research Asia, Picture Coding Symposium, December 2004.
[4] H. Ha, C. Yim and Y. Y. Kim, "Packet Loss Resilience using Unequal Forward Error Correction Assignment for Video Transmission over Communication Networks", ACM digital library on Computer Communications, vol. 30, pp. 3676-3689, December 2007.
[5] X. Xiu, L. Zhuo and L. Shen, "A hybrid error concealment method based on H.264 standard", 8th International Conference on Signal Processing, vol. 2, April 2006.
[6] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5, pp. 454-474, November 2004.
[7] R. Schafer, T. Wiegand and H. Schwarz, "The emerging H.264/AVC standard", EBU Technical Review, Special Issue on Best of 2003, January 2003.
[8] T. Wiegand, et al, "Overview of the H.264/AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, June 2003.
[9] S. K. Bandyopadhyay, et al, "An error concealment scheme for entire frame losses for H.264/AVC", IEEE Sarnoff Symposium, pp. 1-4, March 2006.
[10] Y. Xu and Y. Zhou, "Adaptive Temporal Error Concealment Scheme for H.264/AVC Video Decoder", IEEE Transactions on Consumer Electronics, vol. 54, issue 4, pp. 1846-1851, November 2008.
[11] D. Levine, W. Lynch and T. Le-Ngoc, "Observations on Error Detection in H.264", 50th Midwest Symposium on Circuits and Systems, pp. 815-818, August 2007.
[12] B. Hrušovský, J. Mochná and S. Marchevský, "Temporal-spatial Error Concealment Algorithm for Intra-Frames in H.264/AVC Coded Video", 20th International Conference Radioelektronika, pp. 1-4, April 2010.
[13] W. Kung, C. Kim and C. Kuo, "Spatial and Temporal Error Concealment Techniques for Video Transmission Over Noisy Channels", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, issue 7, pp. 789-803, July 2006.
[14] S. Kwon, A. Tamhankar and K.R. Rao, "Overview of H.264 / MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[15] S. Kumar, et al, "Error Resiliency Schemes in H.264/AVC Standard", IEEE Military Communications Conference, pp. 1-6, October 2006.
[16] Ignacio Cort Todoli, "Performance of Error Concealment Methods for Wireless Video", Diploma Thesis, Vienna University of Technology, 2007.
[17] M.S. Koul, "Error Concealment And Performance Evaluation Of H.264/AVC Video Streams In A Lossy wireless Environment", Department of Electrical Engineering, University of Texas at Arlington, May 2008.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[19] H.264/AVC Reference Software Download: http://iphome.hhi.de/suehring/tml/download/
[20] L. Liu, S. Zhang, X. Ye and Y. Zhang, "Error Resilience Schemes of H.264/AVC for 3G Conversational Video", Proc. IEEE Conf. Computer and Information Technology, pp. 657-661, Sept. 2005.
[21] M.T. Sun and A.R. Reibman, Compressed Video over Networks, Marcel Dekker, New York, 2001.
[22] V.S. Kolkeri, "Error Concealment Techniques In H.264/AVC, for Video Transmission Over Wireless Networks", M.S. Thesis, Department of Electrical Engineering, University of Texas at Arlington, December 2009.
[23] YUV video sequences; website: http://trace.eas.asu.edu/yuv/
[24] MSU video quality measurement tool: http://compression.ru/video/quality_measure/video_measurement_tool_en.html
[25] "JVT Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T rec. H.264 - ISO/IEC 14496-10 AVC)", March 2003, JVT-G050, available on http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf.
[26] The Dirac web page: http://www.bbc.co.uk/rd/projects/dirac/technology.shtml
[27] AVS China software: ftp://159.226.42.57/public/avs_doc/avs_software
[28] Iain Richardson, "The H.264 advanced video compression standard", 2nd edition, pp. 30-60, John Wiley and Sons Ltd, June 2010.