Final Report - The University of Texas at Arlington

Multimedia Processing
Term project
ERROR CONCEALMENT TECHNIQUES IN H.264
VIDEO TRANSMISSION OVER WIRELESS
NETWORKS
Spring 2011
Dr. K. R. Rao
Murtaza Mustafa Zaveri (1000671952)
murtaza.zaveri@mavs.uta.edu
EE 5359
Multimedia Processing
Acknowledgement
I would like to acknowledge the guidance of Dr. K.R. Rao (Electrical Engineering Department at
the University of Texas at Arlington) and his insightful support and inspiration throughout the
various stages of this project. I sincerely appreciate the help and advice given by Dr. Rao which
went a long way in helping me understand the key underlying concepts of the project.
Several of the resources available at the University of Texas at Arlington were of great assistance
in the fruitful completion of this project. I would also like to thank my fellow students for their
valuable and timely inputs.
List of figures and tables:
Figure 1: Position of H.264/MPEG-4 AVC standard
Figure 2: Block diagram of a H.264 encoder
Figure 3: Block diagram of a H.264 decoder
Figure 4: Illustration of H.264/MPEG4-AVC profiles
Figure 5: Typical situation in 3G/4G cellular telephony
Figure 6: Illustration of spatio-temporal error propagation
Figure 7: Illustration of error propagation
Figure 8: VCL/NAL layers of H.264
Figure 9: Sizes of various macroblocks
Figure 10: Weighted Averaging
Figure 11: Copy paste algorithm
Figure 12: Boundary matching algorithm
Figure 13: Block matching
Figure 14: Plot of the comparison between the weighted averaging and copy paste algorithms of
error concealment using the suzie_qcif video sequence
Figure 15: Plot of the comparison between the weighted averaging and copy paste algorithms of
error concealment using the foreman_qcif video sequence
Table 1: Comparison of Average PSNR values using the sequence suzie_qcif before and after
error concealment
Table 2: Comparison of Average PSNR values using the sequence foreman_qcif before and after
error concealment
Abbreviations and Acronyms:
PSNR - Peak Signal to Noise Ratio
AVC - Advanced Video Coding
JVT - Joint Video Team
VCEG - Video Coding Experts Group
MPEG - Moving Picture Experts Group
CAVLC - Context adaptive variable length coding
CABAC - Context adaptive binary arithmetic coding
QCIF - Quarter Common Intermediate Format
MMS - Multimedia Messaging Service
VCL - Video Coding Layer
NAL - Network Abstraction Layer
MB - Macroblock
MV - Motion Vector
SSIM - Structural Similarity Index Metric
SAD - Sum of Absolute Differences
Abstract:
The last few years have seen the birth and development of various wireless technologies, of
which cellular telephony has been the most important. Cellular telephony was initially
conceived for voice communication; with the advent of the third and fourth generation (3G/4G)
systems, however, it now provides a wide variety of services, such as data, audio and video
transmission.
The H.264 [14] standard of video compression is used for a large range of consumer video
applications, such as television broadcasting, streaming multimedia and video conferencing.
Wireless video transmission suffers from imperfections in the communication channel, which
often result in packet losses and lead to frame loss or corrupted areas in the decoded frame.
This kind of corruption tends to spread spatio-temporally in the current and consecutive
frames.
In order to curb the effects of this corruption, various error resilience and concealment
techniques are currently used with the H.264 standard [1]. In this project, two error
concealment techniques are used to demonstrate how error concealment mitigates corruption in
both the spatial and temporal domains, thereby improving the quality of wireless video
transmission. Also discussed is the use of metrics like the peak signal to noise ratio (PSNR)
in order to compare and evaluate the quality of reconstruction of the end video from the
erroneous video received.
The H.264 standard:
H.264 / MPEG-4 (Part 10) Advanced Video Coding (commonly referred to as H.264/AVC) [14] is
the newest entry in the series of international video coding standards. It is currently the most
powerful and state-of-the-art standard, and was developed by a Joint Video Team (JVT)
consisting of experts from ITU-T’s Video Coding Experts Group (VCEG) and ISO/IEC’s
Moving Picture Experts Group (MPEG). As has been the case with past standards, its design
provides the most current balance between coding efficiency, implementation complexity
and cost, based on the state of VLSI design technology (CPUs, DSPs, ASICs, FPGAs, etc.). In the
process, a standard was created that improved coding efficiency by a factor of at least about two
(on average) over MPEG-2 [6], the most widely used video coding standard today, while
keeping the cost within an acceptable range. In July 2004, a new amendment was added to this
standard, called the Fidelity Range Extensions (FRExt, Amendment 1), which demonstrates even
higher coding efficiency relative to MPEG-2, potentially by as much as 3:1 for some key
applications [6]. Figure 1 shows the evolution of the H.264 standard and illustrates its position.
Figure 1: Position of H.264/MPEG-4 AVC standard [9]
The basic source coding algorithm used in H.264 is a hybrid of inter-picture prediction, which
exploits the temporal statistical dependencies, and transform coding of the prediction residual,
which exploits the spatial statistical dependencies. Figures 2 and 3 show the block diagrams of
the H.264 encoder and decoder respectively.
The picture is essentially split into blocks. The first picture of a sequence or a random access
point is typically “Intra” coded, i.e., without using information other than that contained in the
picture itself. Each sample of a block in an Intra frame is predicted using spatially neighboring
samples of previously coded blocks.
Figure 2: Block diagram of a H.264 encoder [14]
Figure 3: Block diagram of a H.264 decoder [14]
H.264 Encoder – Profiles
The seven prominent profiles used in H.264 are:
• Baseline profile
• Main profile
• Extended profile
• High Profile
• High 10 Profile
• High 4:2:2 Profile
• High 4:4:4 Profile
Figure 4 below illustrates the specific coding profiles of the H.264 standard.
Figure 4: Illustration of H.264/MPEG4-AVC profiles [8]
H.264 Encoder baseline profile:
This project was implemented on the baseline profile. The baseline profile is primarily designed
for:
• Low processing power platforms
• Error prone transmission environments
Its features include:
• Relatively low coding efficiency
• I- and P- slice coding
• Enhanced error resilience coding such as flexible macroblock ordering
• Context adaptive variable length coding (CAVLC)
Features not included in baseline profile are as follows:
• B- slices, SI- or SP- slices.
• Interlace coding tools.
• Context adaptive binary arithmetic coding (CABAC)
Error Propagation:
A typical situation in mobile telephony is as shown in figure 5 below:
Figure 5: Typical situation in 3G/4G cellular telephony [20]
A user with a mobile device requests a video service. The video stream is obtained from the
application server over the network and is transmitted over the wireless environment to the user.
During the transmission, the video sequence may be corrupted because the wireless channel is
error prone. Because of bandwidth limitations, the system works with low-resolution
(QCIF, 176 x 144) video, so the loss of a single packet means a large loss of information. Since
this is a real-time process, retransmissions are not possible. The only way to repair
the damage caused by packet losses is to use error concealment methods at the mobile terminal.
Major categories for video services over the wireless systems include:
• Multimedia messaging services (MMS)
• Packet-switched pre-coded streaming service
• Circuit-switched and packet-switched conversational services
Conversational services are especially susceptible to error as the time-delay requirement here is
stringent.
As mentioned before, wireless video transmission suffers due to the imperfections in the
communication channel, which often results in packet losses, and leads to frame loss or
corrupted areas in the decoded frame. This kind of corruption tends to spread spatio-temporally
in the current and consecutive frames as shown in figure 6. This is because H.264 employs
predictive coding. H.264 is thus susceptible to error propagation due to channel noise which in
turn leads to a considerable degradation in the video quality. [1]
Figure 6: Illustration of spatio-temporal error propagation [19]
The spread of the error in the spatial and temporal domains is illustrated by the sequence of
frames in figure 7 below:
Figure 7: Illustration of error propagation [17]
Error Resilience:
In addition to providing better coding efficiency, the H.264 standard places strong emphasis on
error resilience and on adaptability to various networks. H.264/AVC has adopted a two-layer
structure containing a video coding layer (VCL), which is designed to obtain highly
compressed video data, and a network abstraction layer (NAL), which formats the VCL data and
adds corresponding header information for adaptation to various transport protocols or
storage media [19]. Figure 8 illustrates the VCL/NAL layers of H.264.
Figure 8: VCL/NAL layers of H.264 [19]
To perform video coding, a frame is divided into macroblocks (MBs). For each MB, motion
estimation finds the best match in the reference frame(s) by minimizing the difference
between the current MB and the candidate MBs from the reference frame. The sizes of the
various MBs are shown in figure 9. The difference between each MB and its best match is a
residual MB, and the residual MBs together form a residual frame, which is essentially the
difference between the current frame and the corresponding motion compensated predicted
frame. Simultaneously, motion vectors (MVs) encode the locations of the MBs that have been
used to predict each MB in the current frame. The residual frame is then transformed, through a
DCT or integer transform, and quantized. [15]
Figure 9: Sizes of various macroblocks [25]
Error Concealment:
The main task of error concealment is to replace missing parts of the video content with previously
decoded parts of the video sequence in order to eliminate or reduce the visual effects of bit
stream errors. Error concealment exploits the spatial and temporal correlations between
neighboring image parts within the same frame or in past and future frames. [2]
The various error concealment methods can be divided into two categories: error concealment
methods in the spatial domain and error concealment methods in the temporal domain.
Spatial domain error concealment utilizes the spatial smoothness of the video image. Each
missing pixel of the corrupted image part is interpolated from the intact surrounding pixels.
Weighted averaging is an example of a spatial domain error concealment method. [3]
Temporal domain error concealment utilizes the temporal smoothness between adjacent frames
within the video sequence. The simplest implementation of this method is replacing the missing
image part with the spatially corresponding part inside a previously decoded frame, which has
maximum correlation with the affected frame. Examples of temporal domain error concealment
methods include the copy-paste algorithm, the boundary matching algorithm and the block
matching algorithm. [4]
The most effective error concealment methods, however, are hybrid methods that combine
temporal and spatial concealment. These methods exploit both the spatial and the temporal
correlations of the video sequence and adaptively select spatial or temporal concealment
according to a boundary match criterion. [5]
Typical parameters used to evaluate the quality of reconstruction include: peak signal to noise
ratio (PSNR) and structural similarity index metric (SSIM) [18].
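As an illustration of the first of these metrics, PSNR for 8-bit video is 10·log10(255² / MSE), where MSE is the mean squared error between the reference frame and the reconstructed frame. The Python sketch below is a minimal example and not the measurement tool used in this project; the function name `psnr` and the use of NumPy arrays for luma frames are assumptions made for illustration.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """PSNR in dB between two frames of equal size (8-bit samples assumed)."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)      # mean squared error over all pixels
    if mse == 0:
        return float("inf")              # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

An average PSNR over a sequence, as reported in the results, would then be the mean of the per-frame values.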
Spatial Error Concealment:
All error concealment methods in the spatial domain are based on the same idea: the pixel values
within a damaged macroblock can be recovered from a specified combination of the pixels
surrounding that macroblock. In this technique, the interpixel difference
between adjacent pixels of an image is determined. The interpixel difference is defined as the
average of the absolute differences between a pixel and its four surrounding pixels. This property
is used to perform error concealment.
The first step in implementing spatial error concealment is to interpolate each pixel value
within the damaged macroblock from the four nearest pixels in its four 1-pixel wide boundaries, as
shown in figure 10. This method is known as ‘weighted averaging’ [21], because each missing
pixel value is recovered as the average of the four pixels in the
four 1-pixel wide boundaries of the damaged macroblock, weighted according to the distances between the
missing pixel and the four macroblock boundaries (top, bottom, left and right).
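The interpixel difference defined above can be sketched in a few lines of Python. This is an illustrative example only: the function name is hypothetical, and restricting the computation to interior pixels (so that every pixel has four neighbors) is an assumption, since the text does not say how frame borders are treated.

```python
import numpy as np

def interpixel_difference(frame):
    """Mean over interior pixels of the average absolute difference
    between a pixel and its four neighbors (up, down, left, right)."""
    f = np.asarray(frame, dtype=np.float64)
    centre = f[1:-1, 1:-1]
    per_pixel = (np.abs(centre - f[:-2, 1:-1]) +   # neighbor above
                 np.abs(centre - f[2:, 1:-1]) +    # neighbor below
                 np.abs(centre - f[1:-1, :-2]) +   # neighbor to the left
                 np.abs(centre - f[1:-1, 2:])) / 4.0
    return per_pixel.mean()
```

A small value indicates a spatially smooth image, which is the case in which interpolation from the surrounding pixels can be expected to work well.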
Figure 10: Weighted Averaging algorithm for spatial error concealment [22]
The formula used for weighted averaging is as follows [22]:
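A commonly used form of weighted averaging, given here as an assumption and not necessarily the exact expression of [22], weights each of the four boundary pixels by the distance from the missing pixel to the opposite boundary, so that nearer boundary pixels contribute more:

p(i, j) = (d_R·p_L + d_L·p_R + d_B·p_T + d_T·p_B) / (d_L + d_R + d_T + d_B)

where p_L, p_R, p_T and p_B are the boundary pixels in the same row or column as the missing pixel and d_L, d_R, d_T and d_B are its distances to the left, right, top and bottom boundaries. The Python sketch below implements this form for one macroblock; the function name and arguments are hypothetical, and all four neighboring boundaries are assumed to have been received correctly.

```python
import numpy as np

def conceal_weighted_average(frame, r0, c0, n=16):
    """Interpolate the n x n block whose top-left corner is (r0, c0)
    from the four 1-pixel wide boundaries surrounding it."""
    out = np.asarray(frame, dtype=np.float64).copy()
    h, w = out.shape
    if r0 < 1 or c0 < 1 or r0 + n >= h or c0 + n >= w:
        raise ValueError("all four boundaries must lie inside the frame")
    top = out[r0 - 1, c0:c0 + n].copy()
    bottom = out[r0 + n, c0:c0 + n].copy()
    left = out[r0:r0 + n, c0 - 1].copy()
    right = out[r0:r0 + n, c0 + n].copy()
    for i in range(n):
        for j in range(n):
            d_t, d_b = i + 1, n - i      # distances to top and bottom boundaries
            d_l, d_r = j + 1, n - j      # distances to left and right boundaries
            out[r0 + i, c0 + j] = (d_r * left[i] + d_l * right[i] +
                                   d_b * top[j] + d_t * bottom[j]) / (
                                       d_l + d_r + d_t + d_b)
    return out
```

With this weighting a flat area or a linear intensity ramp is reconstructed exactly, while edges and texture inside the block are smoothed away.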
Temporal Error Concealment:
It is easier to conceal linear movements in one direction because pictures can be predicted from
previous frames (the scene is almost the same). If there are movements in many directions or
scene cuts, finding a part of previous frame that is similar is more difficult, or even impossible.
Copy paste Algorithm:
The copy-paste algorithm replaces the missing image part with the spatially corresponding part
of a previously decoded frame, which has maximum correlation with the affected frame [4], as
shown in figure 11. The formula used in the copy paste algorithm is as follows:
Figure 11: Copy paste algorithm
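In its simplest form, assumed here, the copy-paste rule is F_n(x, y) = F_{n-1}(x, y) for every pixel (x, y) of the missing block, where F_n is the damaged frame and F_{n-1} the previously decoded frame. A minimal Python sketch with hypothetical function and argument names:

```python
import numpy as np

def conceal_copy_paste(current, previous, r0, c0, n=16):
    """Fill the n x n block at (r0, c0) of the current frame with the
    co-located block of the previously decoded frame."""
    out = np.array(current, copy=True)
    prev = np.asarray(previous)
    out[r0:r0 + n, c0:c0 + n] = prev[r0:r0 + n, c0:c0 + n]
    return out
```

No search is involved, so the method is very cheap, but it is exact only where the content of the block has not moved between the two frames.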
Boundary matching [16]:
Let B be the area corresponding to a one pixel wide boundary of a missing block in the nth frame
F_n. Motion vectors of the missing block as well as those of its neighbors are unknown. The
coordinates [x̂, ŷ] of the best match to B within the search area A in the previous frame F_{n-1}
have to be found. The equation used is as follows:
The sum of absolute differences (SAD) is chosen as a similarity metric for its low computational
complexity. The size of B depends on the number of correctly received neighbors M, boundaries
of which are used for matching as shown in figure 12 below.
Figure 12: Boundary matching algorithm [16]
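Based on the description above, the search can be written as [x̂, ŷ] = arg min over (x, y) in A of the sum over (i, j) in B of |F_{n-1}(x + i, y + j) − F_n(i, j)|; this form is an assumption consistent with the text and may differ in notation from [16]. The Python sketch below implements it for the case in which all four neighbors were received (M = 4). The function names, the displacement-based search window of ±`search` pixels and the default sizes are hypothetical.

```python
import numpy as np

def ring_mask(n):
    """Boolean mask selecting the 1-pixel ring around an n x n block."""
    mask = np.ones((n + 2, n + 2), dtype=bool)
    mask[1:-1, 1:-1] = False
    return mask

def conceal_boundary_matching(current, previous, r0, c0, n=16, search=4):
    """Find the displacement whose boundary ring in the previous frame has
    minimum SAD against the ring around the missing block, then copy the
    block at that displacement. The block must not touch the frame border."""
    cur = np.asarray(current, dtype=np.float64)
    prev = np.asarray(previous, dtype=np.float64)
    h, w = cur.shape
    mask = ring_mask(n)
    target = cur[r0 - 1:r0 + n + 1, c0 - 1:c0 + n + 1]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = r0 + dy, c0 + dx
            if r < 1 or c < 1 or r + n + 1 > h or c + n + 1 > w:
                continue                 # candidate ring leaves the frame
            cand = prev[r - 1:r + n + 1, c - 1:c + n + 1]
            sad = np.abs(cand[mask] - target[mask]).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    dy, dx = best
    out = cur.copy()
    out[r0:r0 + n, c0:c0 + n] = prev[r0 + dy:r0 + dy + n, c0 + dx:c0 + dx + n]
    return out, best
```

When fewer neighbors are available, the mask would be restricted to the sides that were received.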
Block matching [16]:
Better results can be obtained by looking for the best match for the correctly received MB on top,
bottom, left or right side of the missing MB. The equation used is as follows:
where A_D represents the search area for the best match of the neighboring macroblock MB_D,
with its center spatially corresponding to the start of the missing MB.
The final position of the best match is given by an average over the positions of the best matches
found for the neighboring blocks, computed as follows:
The MB sized area starting at the position [x̂, ŷ] in F_{n-1} is used to conceal the damaged MB in
F_n. To reduce the necessary number of operations, only parts of the neighboring MBs can be
used for the MV search as shown in figure 13.
Figure 13: Block matching [16]
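The block matching procedure can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the implementation of [16]: all four neighboring MBs are assumed to be correctly received and fully inside the frame, whole neighboring MBs are used for the search, positions are expressed as displacements relative to each block, and the averaged displacement is rounded to the nearest pixel. All function names and parameters are hypothetical.

```python
import numpy as np

def best_match(block, previous, r0, c0, search):
    """Displacement (dy, dx) minimising the SAD between `block`, located
    at (r0, c0) in the current frame, and the previous frame."""
    rows, cols = block.shape
    h, w = previous.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = r0 + dy, c0 + dx
            if r < 0 or c < 0 or r + rows > h or c + cols > w:
                continue                 # candidate leaves the frame
            sad = np.abs(previous[r:r + rows, c:c + cols] - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

def conceal_block_matching(current, previous, r0, c0, n=16, search=4):
    """Average the best-match displacements of the top, bottom, left and
    right neighbors and copy the block found at that displacement."""
    cur = np.asarray(current, dtype=np.float64)
    prev = np.asarray(previous, dtype=np.float64)
    neighbours = [(r0 - n, c0), (r0 + n, c0), (r0, c0 - n), (r0, c0 + n)]
    moves = [best_match(cur[r:r + n, c:c + n], prev, r, c, search)
             for r, c in neighbours]
    dy = int(round(float(np.mean([m[0] for m in moves]))))
    dx = int(round(float(np.mean([m[1] for m in moves]))))
    out = cur.copy()
    out[r0:r0 + n, c0:c0 + n] = prev[r0 + dy:r0 + dy + n, c0 + dx:c0 + dx + n]
    return out, (dy, dx)
```

Using only a strip of each neighboring MB next to the missing block, as in figure 13, would reduce the number of operations per candidate.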
Results observed:
YUV File 1: suzie_qcif.yuv
Specifications:
QCIF sequence: suzie_qcif.yuv
Total number of frames: 150
Width: 176; Height: 144
Total number of frames used: 20
Frame rate: 30 frames/second
Quantization   Bit rate   Average original PSNR   Average PSNR after        Average PSNR after
parameter      (kbps)     (no errors) (dB)        weighted averaging (dB)   copy paste (dB)
28             69.92      37.064                  33.194                    30.23
24             138.05     39.531                  33.502                    30.783
20             291.98     42.707                  34.948                    31.984
Table 1: Comparison of Average PSNR values using the sequence suzie_qcif before and after
error concealment
Figure 14: Plot of the comparison between the weighted averaging and copy paste algorithms of
error concealment using the suzie_qcif video sequence (PSNR in dB versus bit rate in kbps;
curves: no errors, weighted averaging, copy-paste)
YUV File 2: foreman_qcif.yuv
Specifications:
QCIF sequence: foreman_qcif.yuv
Total number of frames: 300
Width: 176; Height: 144
Total number of frames used: 20
Frame rate: 30 frames/second
Quantization   Bit rate   Average original PSNR   Average PSNR after        Average PSNR after
parameter      (kbps)     (no errors) (dB)        weighted averaging (dB)   copy paste (dB)
28             180.65     36.976                  33.141                    32.54
24             292.03     39.678                  34.624                    33.233
20             492.14     42.861                  36.413                    35.784
Table 2: Comparison of Average PSNR values using the sequence foreman_qcif before and after
error concealment
Figure 15: Plot of the comparison between the weighted averaging and copy paste algorithms of
error concealment using the foreman_qcif video sequence (PSNR in dB versus bit rate in kbps;
curves: no errors, weighted averaging, copy-paste)
Conclusions:
The conclusions drawn from the implementation of the weighted averaging algorithm (spatial
domain error concealment) and the copy-paste algorithm (temporal domain error concealment)
are:
• Spatial error concealment using the weighted averaging algorithm is in general more
effective than temporal error concealment using the copy-paste algorithm,
as is seen in Tables 1 and 2 and Figures 14 and 15.
• If scene changes are slow and there are no fast movements between frames, then the
difference in effectiveness between these two techniques is very small.
• The copy-paste algorithm may be more effective than the weighted averaging algorithm
in rare cases, such as when the error is contained in a background that does not
change over a sequence of frames.
It was ultimately concluded that a hybrid error concealment technique which adaptively switches
between the spatial and temporal error concealment techniques would be most effective for error
concealment.
Future research:
Possible areas in which future work on this subject may be undertaken include the following:
• Further enhancing error detection by locating random and burst errors.
• Implementation of a hybrid error concealment technique which combines the advantages
of both the spatial and temporal techniques of error concealment.
• Implementation of temporal error concealment using motion vector estimation.
• Error concealment in other codecs like AVS China [27] and Dirac Pro [26].
References:
[1] Y. Xu and Y. Zhou, “H.264 Video Communication Based Refined Error Concealment
Schemes”, IEEE Transactions on Consumer Electronics, vol. 50, issue 4, pp. 1135–1141,
November 2004.
[2] M. Wada, “Selective Recovery of Video Packet Loss using Error Concealment,” IEEE
Journal on Selected Areas in Communication, vol. 7, issue 5, pp. 807-814, June 1989.
[3] Y. Chen, et al, “An Error Concealment Algorithm for Entire Frame Loss in Video
Transmission”, Microsoft Research Asia, Picture Coding Symposium, December 2004.
[4] H. Ha, C. Yim and Y. Y. Kim, “Packet Loss Resilience using Unequal Forward Error
Correction Assignment for Video Transmission over Communication Networks”, ACM digital
library on Computer Communications, vol. 30, pp. 3676-3689, December 2007.
[5] X. Xiu, L. Zhuo and L. Shen, "A hybrid error concealment method based on H.264 standard",
8th International Conference on Signal Processing, vol. 2, April 2006.
[6] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC Advanced Video Coding Standard:
Overview and Introduction to the Fidelity Range Extensions", SPIE Conference on Applications
of Digital Image Processing XXVII, vol. 5, pp. 454-474, November 2004.
[7] R. Schafer, T. Wiegand and H. Schwarz, “The emerging H.264/AVC standard,” EBU
Technical Review, Special Issue on Best of 2003, January 2003.
[8] T. Wiegand, et al, “Overview of the H.264/AVC Video Coding Standard” IEEE Transactions
on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, June 2003.
[9] S. K. Bandyopadhyay, et al, “An error concealment scheme for entire frame losses for
H.264/AVC”, IEEE Sarnoff Symposium, pp. 1-4, March 2006.
[10] Y. Xu and Y. Zhou, "Adaptive Temporal Error Concealment Scheme for H.264/AVC Video
Decoder", IEEE Transactions on Consumer Electronics, vol. 54, issue 4, pp. 1846 – 1851,
November 2008.
[11] D. Levine, W. Lynch and T. Le-Ngoc, "Observations on Error Detection in H.264", 50th
Midwest Symposium on Circuits and Systems, pp. 815-818, August 2007.
[12] B. Hrušovský, J. Mochná and S. Marchevský, "Temporal-spatial Error Concealment
Algorithm for Intra-Frames in H.264/AVC Coded Video", 20th International Conference
Radioelektronika, pp. 1-4, April 2010.
[13] W. Kung, C. Kim and C. Kuo "Spatial and Temporal Error Concealment Techniques for
Video Transmission Over Noisy Channels", IEEE Transactions on Circuits and
Systems for Video Technology, vol. 16, issue 7, pp. 789-803, July 2006.
[14] S. Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual
Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[15] S. Kumar, et al, “Error Resiliency Schemes in H.264/AVC Standard”, IEEE Military
Communications Conference, pp. 1-6, October 2006.
[16] Ignacio Cort Todoli, “Performance of Error Concealment Methods for Wireless Video”,
Diploma Thesis, Vienna University of Technology, 2007.
[17] M.S. Koul, “Error Concealment And Performance Evaluation Of H.264/AVC Video
Streams In A Lossy wireless Environment”, Department of Electrical Engineering, University of
Texas at Arlington, May 2008.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From
error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4,
pp. 600-612, Apr. 2004.
[19] H.264/AVC Reference Software Download:
http://iphome.hhi.de/suehring/tml/download/
[20] L. Liu, S. Zhang, X. Ye and Y. Zhang, “Error Resilience Schemes of H.264/AVC for 3G
Conversational Video”, Proc. IEEE Conf. Computer and Information Technology, pp. 657-661,
Sept. 2005.
[21] M.T. Sun, and A.R. Reibman, Compressed Video over Networks, Marcel Dekker, New
York, 2001.
[22] V.S. Kolkeri "Error Concealment Techniques In H.264/AVC ,for Video Transmission Over
Wireless Networks", M.S. Thesis, Department of Electrical Engineering, University of Texas at
Arlington, December 2009.
[23] YUV video sequences; website: http://trace.eas.asu.edu/yuv/
[24] MSU video quality measurement tool:
http://compression.ru/video/quality_measure/video_measurement_tool_en.html
[25] “JVT Draft ITU-T recommendation and final draft international standard of joint video
specification (ITU-T rec. H.264 – ISO/IEC 14496-10 AVC),” March 2003, JVT-G050 available
on http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf.
[26] The Dirac web page: http://www.bbc.co.uk/rd/projects/dirac/technology.shtml
[27] AVS China software: ftp://159.226.42.57/public/avs_doc/avs_software
[28] Iain Richardson, “The H.264 advanced video compression standard”, 2nd edition, pp 30-60,
published by John Wiley and Sons Ltd, June 2010.