A NEW ERROR CONCEALMENT SCHEME FOR WHOLE FRAME LOSS IN VIDEO TRANSMISSION

Jialue Fan, Xudong Zhang and Yu Chen
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China

ABSTRACT

In video streaming applications, a packet loss often leads to the loss of an entire video frame. Based on the correlations of motion vectors (MVs) and mode information in consecutive frames, this paper proposes a novel bi-directional error concealment scheme. Owing to its pixel-level processing, the algorithm achieves finer results, especially in high-motion scenes. Simulation results show that the proposed method outperforms existing methods in both PSNR and visual quality, and that it is very efficient in stopping temporal error propagation. Moreover, the SKIP mode judgment in the algorithm reduces the computational complexity effectively without significant loss of video quality.

Index Terms— Video signal processing, Error concealment

1. INTRODUCTION

With the rapid development of wireless communication, the demand for transmitting video over the Internet and wireless channels is increasing rapidly. However, practical communication channels are unreliable: the video stream is often corrupted by channel errors, packet loss and congestion. Since the majority of existing video coding standards adopt predictive coding and variable-length coding to obtain high compression performance, an incorrectly reconstructed macroblock (MB) can cause both spatial and temporal error propagation [1]. Therefore, compression and transmission provisions that limit the impact of erroneous or lost data are necessary. Many robust video coding techniques, including source coding, channel coding and post-decoding error concealment, have been studied [2]. In general, most decoder-side error concealment methods reconstruct the lost video content by assuming smoothness and continuity of the pixel values, using spatial, temporal or spatio-temporal interpolation. Unlike other techniques, error concealment neither increases the bit rate nor requires changes to the encoder, and it can be applied easily in existing video streaming applications, which makes it preferable for low bit rate, real-time video communications [3].

In video streaming applications over 3GPP networks, one video packet often carries a whole frame in order to save transmission bandwidth, so the loss of a packet results in the loss of the whole frame. In high bit rate applications, even when one video frame is divided into many video packets for transmission, traffic congestion can erase them simultaneously. As a result, the majority of a frame is missing and, for example, spatial error concealment becomes infeasible [4]. Hence, the entire coded picture is difficult to recover with classical concealment algorithms. S. Belfiore et al. proposed a multi-frame motion vector averaging algorithm (MMA) [4] that is capable of estimating the whole missing frame. Algorithms based on multi-frame optical flow estimation have also been proposed [5, 6, 7, 8]. In [5], an algorithm based on 4×4 block-level MV estimation (BMVE) is presented. In [6], the projection and re-estimation of MVs are separated into two stages. An estimator of the missing motion information for each intra-coded macroblock is introduced in [7]. A bi-directional temporal error concealment method (BTEC) is proposed in [8], which extrapolates MVs from the MVs of the previous and the next frame.
In this paper, we propose a new bi-directional error concealment algorithm for H.264/AVC to recover the loss of a whole frame. The algorithm exploits not only the correlations between adjacent MVs but also the correlations between adjacent coding modes. To achieve a smooth and fine result, estimation is carried out at the pixel level. Moreover, we introduce a bi-linear interpolation scheme for compensation.

2. BI-DIRECTIONAL TEMPORAL ERROR CONCEALMENT

Conventional temporal error concealment approaches are usually based only on the MVs of previously received frames. However, the missing frame is also correlated with its next frame, and the coding modes of consecutive frames are correlated as well. For example, background areas are often encoded in SKIP mode: if the co-located MBs in several adjacent frames are all encoded in SKIP mode, these MBs very probably describe static areas. Following this idea, our algorithm consists of four steps: judgment of SKIP mode, MV extrapolation, forward/backward estimation and bi-directional compensation.

We use an ordered triple (i, j, k) to denote the pixel at position (i, j) of frame k. Since we estimate MVs at the pixel level, we define MVs for each pixel instead of each MB, i.e., we let MV_x(i, j, k) and MV_y(i, j, k) be the MV components of (i, j, k), and let P(i, j, k) be the value of (i, j, k). We assume that frame t has been lost. In addition, the video stream is assumed to be compressed with an IPPP structure throughout the discussion.

2.1. Judgment of SKIP Mode

The first step determines whether an MB in the missing frame should be treated as SKIP mode. For each MB in frame t, if the co-located MBs in the previous two frames and in the next frame are all encoded in SKIP mode, we assume that the missing MB describes a static region or background. Hence, we let the missing MB be SKIP mode, and the pixels in this MB are compensated directly by copying the co-located pixels of the previous frame. Otherwise we proceed to MV extrapolation.

To evaluate the performance of this judgment, we measured its accuracy in statistical experiments. For each MB x_n, let x_{n−2}, x_{n−1} and x_{n+1} be its co-located MBs in the previous two frames and the next frame, and define the parameters P1 and P2 as follows:

P1: the percentage of MBs x_n that are SKIP mode when x_{n−2}, x_{n−1} and x_{n+1} are all SKIP mode.
P2: the percentage of MBs x_n that are either SKIP mode or 16×16 mode with zero MV when x_{n−2}, x_{n−1} and x_{n+1} are all SKIP mode.

We tested several video sequences, namely Miss America (M), Claire (CL), News (N), Susie (S), Carphone (CA) and Foreman (F), to obtain P1 and P2. The results are shown in Table 1: P1 and P2 exceed 90% in low-motion sequences and 70% in high-motion sequences, which verifies the judgment. In fact, P2 is more meaningful than P1 because the residue information of the missing MB is unavailable, so a 16×16 mode with zero MV is equivalent to SKIP mode in this case.

Table 1. Accuracy of the SKIP mode judgment (%)

        M      CL     N      S      CA     F
P1      92.4   94.8   95.9   90.5   80.4   70.7
P2      97.8   98.4   98.8   93.6   89.5   76.4
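To make the procedure concrete, the following Python sketch shows one possible realization of this judgment over per-MB mode maps. The mode-map representation, the SKIP label value and the function names are our own assumptions; the paper specifies only the decision rule.

```python
import numpy as np

SKIP = 0  # assumed numeric label for SKIP mode in the per-MB mode maps

def skip_mode_judgment(modes_prev2, modes_prev1, modes_next):
    """Mark the MBs of the lost frame t that are judged to be SKIP.

    Each argument is a 2-D array holding the coding mode of every MB of
    frames t-2, t-1 and t+1, respectively (same MB grid as frame t).
    """
    return ((modes_prev2 == SKIP) &
            (modes_prev1 == SKIP) &
            (modes_next == SKIP))

def conceal_skip_mbs(concealed, frame_prev, skip_map, mb_size=16):
    """Copy co-located pixels of frame t-1 into every SKIP-judged MB."""
    for r, c in zip(*np.nonzero(skip_map)):
        y, x = r * mb_size, c * mb_size
        concealed[y:y + mb_size, x:x + mb_size] = \
            frame_prev[y:y + mb_size, x:x + mb_size]
    return concealed
```

MBs that fail the test are left untouched here; they are handled by the MV extrapolation stage described next.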
2.2. MV Extrapolation (MVE)

The proposed pixel-level MVE method, which aims to select the best estimated MV for each pixel, is based on MV projection [5]. For each pixel (i, j, t−1), we project its inverted MV onto the position (i − MV_x(i, j, t−1), j − MV_y(i, j, t−1)) in frame t. Since MVs have quarter-pel accuracy in H.264, the inverted MV does not always point to an integer pixel. Therefore, we say that a pixel (m, n, t) is covered by (i, j, t−1) if and only if it is close enough to the projected position, i.e.,

|m − (i − MV_x(i, j, t−1))| ≤ 1/2,   (1)
|n − (j − MV_y(i, j, t−1))| ≤ 1/2.   (2)

For the pixels in non-SKIP MBs of frame t, the forward MVs are extrapolated as follows. Call a pixel multi-covered if it is covered by more than one pixel. It has been found in [8] that directly repeating the MVs of the previous frame works well in low-motion scenes, and that such scenes also contain only a small number of multi-covered pixels. Thus, if less than 50% of the pixels in non-SKIP MBs of frame t are multi-covered, their MVs are duplicated directly from the co-located MVs of frame t−1, and the forward MVE is finished. Otherwise we apply steps (i) and (ii):

(i) For a pixel covered by at least one pixel of frame t−1, the MV is estimated by averaging the MVs falling onto it. To avoid wild values, we define a threshold THR_mv and exclude all MVs with |MV_x| > THR_mv or |MV_y| > THR_mv from the average.

(ii) For a pixel not covered by any pixel of frame t−1, the MV of the co-located pixel in frame t−1 is duplicated.

After (i) and (ii), the forward MVE is finished. The backward MV of each pixel is then extrapolated in a similar way: for each pixel (i, j, t+1), we project its MV onto the position (i + MV_x(i, j, t+1), j + MV_y(i, j, t+1)) in frame t; the remaining procedure is the same as the forward MVE with t−1 replaced by t+1.
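As an illustration, the following Python sketch implements the forward MVE pass described above. It assumes per-pixel MV fields already converted from quarter-pel units to pixel units, pairs coordinate i with MV_x as the paper does, and approximates the coverage test of Eqs. (1)-(2) by nearest-integer rounding; all names are hypothetical and this is a sketch under those assumptions, not the authors' implementation. In particular, it applies the THR_mv exclusion up front rather than only inside step (i).

```python
import numpy as np

def extrapolate_mv(mv_x, mv_y, non_skip_mask, thr_mv=8.0, sign=-1.0):
    """Pixel-level MV extrapolation (forward pass for sign=-1.0).

    mv_x, mv_y : per-pixel MV fields of the reference frame, in pixel
                 units, with coordinate i paired with MV_x as in the paper.
    non_skip_mask : True for pixels of frame t inside non-SKIP MBs.
    Returns the estimated per-pixel MV fields for frame t.
    """
    h, w = mv_x.shape
    sum_x = np.zeros((h, w))
    sum_y = np.zeros((h, w))
    count = np.zeros((h, w), dtype=int)

    # Project every pixel of the reference frame onto frame t; a pixel
    # (m, n, t) counts as "covered" when it is the nearest integer
    # position to the projected point (rounding approximates Eqs. (1)-(2)).
    for i in range(h):
        for j in range(w):
            # Wild MVs are excluded up front, so they neither cover
            # pixels nor enter any average (threshold of step (i)).
            if abs(mv_x[i, j]) > thr_mv or abs(mv_y[i, j]) > thr_mv:
                continue
            m = int(round(i + sign * mv_x[i, j]))
            n = int(round(j + sign * mv_y[i, j]))
            if 0 <= m < h and 0 <= n < w:
                sum_x[m, n] += mv_x[i, j]
                sum_y[m, n] += mv_y[i, j]
                count[m, n] += 1

    # Low-motion shortcut: if fewer than 50% of the non-SKIP pixels are
    # multi-covered, simply repeat the co-located MVs of the reference.
    multi_covered = (count > 1) & non_skip_mask
    if multi_covered.sum() < 0.5 * non_skip_mask.sum():
        return mv_x.copy(), mv_y.copy()

    # Step (i): average the MVs falling onto each covered pixel.
    # Step (ii): uncovered pixels duplicate the co-located MV.
    covered = count > 0
    est_x = np.where(covered, sum_x / np.maximum(count, 1), mv_x)
    est_y = np.where(covered, sum_y / np.maximum(count, 1), mv_y)
    return est_x, est_y
```

Calling the same routine with sign=+1.0 on the MV fields of frame t+1 yields the backward pass, mirroring the substitution of t−1 by t+1 described above.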
2.3. Forward/backward estimation

The pixel values of the missing frame are computed in the forward/backward estimation step. Let f = (f_x, f_y) and b = (b_x, b_y) be the forward and backward estimated MVs of (i, j, t), and let P_f(i, j, t) and P_b(i, j, t) be the corresponding forward and backward estimates of its value. P_f(i, j, t) is recovered as

P_f(i, j, t) = P(i + f_x, j + f_y, t − 1).   (3)

Since (f_x, f_y) does not always point to an integer pixel, we use bi-linear interpolation to recover the value. As shown in Fig. 1, (i + f_x, j + f_y, t − 1) lies inside a square of four adjacent pixels with values a, b, c and d; let x and y be the distances from (i + f_x, j + f_y, t − 1) to the two sides of the square. Bi-linear interpolation then gives

P(i + f_x, j + f_y, t − 1) = (1 − x)(1 − y)a + x(1 − y)b + (1 − x)yc + xyd.   (4)

Fig. 1. Bi-linear interpolation.

Similarly, (b_x, b_y) is used to obtain P_b(i, j, t). Since (b_x, b_y) is estimated by backward extrapolation, it points to a pixel in frame t−1, not in frame t+1, so P_b(i, j, t) = P(i + b_x, j + b_y, t − 1). The backward bi-linear interpolation is the same as the forward one.

2.4. Bi-directional compensation

It has been found that weighted averaging of multiple concealment candidates achieves better performance than a single candidate [3]. Accordingly, P(i, j, t) is estimated as a combination of P_f(i, j, t) and P_b(i, j, t):

P̃(i, j, t) = w_f P_f(i, j, t) + w_b P_b(i, j, t),   (5)
s.t. w_f + w_b = 1,   (6)

where w_f and w_b are the weights of the forward and backward estimates. Typical compensation performance for different values of w_f is shown in Fig. 2. The maximum PSNR occurs when w_f is slightly smaller than w_b. This can be explained by the fact that (i, j, t−1) is sometimes pointed to by several MVs of pixels in frame t, yet under MV projection it covers only the nearest pixels of frame t, which slightly degrades the PSNR of the forward estimate. Since the optimal w_f is around 0.4 in low-motion sequences and 0.5 in high-motion sequences, we set w_f = 0.45 and w_b = 0.55 in our algorithm.

Fig. 2. PSNR performance for different values of the weight w_f (panels: Miss America, Claire, Foreman and Carphone; PSNR (dB) vs. w_f).

Moreover, since the error concealment operates at the pixel level, no deblocking filtering is needed after compensation; experiments show only negligible PSNR degradation without it. The detailed results are not reported here for brevity.
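Eqs. (3)-(6) reduce to a bi-linearly interpolated fetch from frame t−1 followed by a fixed-weight blend. Below is a minimal numpy sketch under the same assumptions as before; the helper names are hypothetical and the border handling by clamping is our own choice, since the paper does not discuss it.

```python
import numpy as np

def bilinear_fetch(frame, i, j):
    """Sample a frame at the fractional position (i, j) as in Eq. (4).

    a, b, c, d are the four pixels surrounding the point, and x, y are
    the fractional offsets toward b and c, respectively.
    """
    h, w = frame.shape
    i0 = min(max(int(np.floor(i)), 0), h - 2)  # clamp at the borders
    j0 = min(max(int(np.floor(j)), 0), w - 2)
    y, x = i - i0, j - j0
    a, b = frame[i0, j0], frame[i0, j0 + 1]
    c, d = frame[i0 + 1, j0], frame[i0 + 1, j0 + 1]
    return ((1 - x) * (1 - y) * a + x * (1 - y) * b
            + (1 - x) * y * c + x * y * d)

def bidirectional_compensation(frame_prev, f_x, f_y, b_x, b_y,
                               w_f=0.45, w_b=0.55):
    """Blend the forward and backward estimates, Eqs. (3), (5) and (6).

    f_* and b_* are the per-pixel estimated MV fields; both point into
    frame t-1, as noted in Sec. 2.3, so both fetches read frame t-1.
    """
    h, w = frame_prev.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            p_f = bilinear_fetch(frame_prev, i + f_x[i, j], j + f_y[i, j])
            p_b = bilinear_fetch(frame_prev, i + b_x[i, j], j + b_y[i, j])
            out[i, j] = w_f * p_f + w_b * p_b  # w_f + w_b = 1, Eq. (6)
    return out
```

With w_f = 0.45 and w_b = 0.55, this realizes the compensation step end to end on top of the extrapolated MV fields.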
3. EXPERIMENTAL RESULTS

We implemented the proposed algorithm in the H.264 JM 8.2 reference software. All test sequences are in QCIF size, the frame rate is 30 fps, one reference frame is used, the quantization parameter is QP = 28 and THR_mv is set to 8 pixels. Only the first frame is encoded as an I-frame; all the others are encoded as P-frames.

3.1. Comparison

We compare the proposed algorithm with MMA [4], BMVE [5], BTEC [8] and TR (temporal replacement). In each trial, one P-frame is dropped from a compressed sequence and then concealed by MMA, BMVE, BTEC, TR and the proposed algorithm, respectively, and the corresponding PSNR values are calculated. This process is repeated from the 11th frame to the 140th frame of the selected sequences. The results are shown in Table 2.

Table 2. Frame-by-frame comparison of the mean PSNR of each concealed frame (dB)

        MMA     BMVE    BTEC    TR      Proposed
M       37.86   37.85   38.30   36.80   38.64
CL      37.84   37.51   38.20   37.54   38.36
N       31.03   30.51   31.53   30.77   31.58
S       26.14   26.30   26.58   25.30   26.73
CA      27.56   27.00   28.00   27.59   28.01
F       29.92   30.11   30.81   27.71   31.14

From Table 2, the proposed algorithm outperforms the reference algorithms. For scenes with high motion, such as the Foreman sequence, it gains 1.22 dB, 1.03 dB and 0.33 dB on average over MMA, BMVE and BTEC, respectively. For scenes with low motion, such as the Miss America sequence, it still outperforms MMA, BMVE and BTEC by 0.78 dB, 0.79 dB and 0.34 dB. Hence, the proposed method provides relatively satisfactory performance in various scenarios.

For subjective evaluation, an error-free frame and three frames recovered by MMA, BMVE and the proposed algorithm are shown in Fig. 3. We chose the 65th frame of the Foreman sequence; in this scene, the man is raising his head and opening his mouth, so there is high motion in the face area. The pictures recovered by MMA and BMVE show obvious block artifacts in the face area, whereas the proposed algorithm provides a finer reconstruction.

Fig. 3. Subjective quality of the 65th frame of Foreman: (a) error free, (b) MMA, (c) BMVE, (d) Proposed.

Fig. 4 illustrates the recovery of PSNR after a loss at the 65th frame of the Foreman sequence. The proposed algorithm is more efficient in stopping error propagation than MMA, BMVE, BTEC and TR.

Fig. 4. The recovery of PSNR after a lost frame (Foreman; PSNR (dB) vs. frame number for the Proposed, MMA, BMVE, BTEC and TR methods).

3.2. Complexity

In our algorithm, the computational complexity is reduced, compared to the reference methods, by the SKIP mode judgment and by skipping the deblocking filtering. Here we study the reduction due to the SKIP mode judgment. Let P3 be the percentage of MBs that meet the condition of the SKIP mode judgment. The statistics of P3 are shown in Table 3: P3 is more than 40% in low-motion video sequences, which means that a high percentage of pixels are recovered without MVE. Accordingly, the computational complexity is greatly reduced for low-motion scenes, and even for high-motion scenes more than 10% of the complexity is saved.

Table 3. Complexity reduction by SKIP mode judgment (%)

        M      CL     N      S      CA     F
P3      48.7   68.8   60.5   29.3   20.5   10.5

4. CONCLUSION

In this paper, we have proposed a new bi-directional error concealment method whose main objective is to recover the loss of a whole frame. The algorithm exploits pixel-level MV estimation and bi-linear interpolation to reconstruct the lost frame, and exploits mode information to reduce the computational complexity. Analysis and experimental results show that the proposed algorithm outperforms existing methods in both PSNR and visual quality. In addition, it is efficient in stopping error propagation.

5. REFERENCES

[1] Y. Wang, S. Wenger, J. Wen, and A. Katsaggelos, "Error resilient video coding techniques," IEEE Signal Processing Magazine, vol. 17, pp. 61-82, July 2000.
[2] S. Hemami, "Robust video coding - an overview," Proc. IEEE ICASSP, vol. 5, pp. 901-904, 2005.
[3] M. E. Al-Mualla, N. Canagarajah, and D. R. Bull, "Multiple reference temporal error concealment," Proc. IEEE ISCAS, vol. 5, pp. 149-152, 2001.
[4] S. Belfiore, M. Grangetto, E. Magli, and G. Olmo, "An error concealment algorithm for streaming video," Proc. IEEE ICIP, vol. 3, pp. 649-652, 2003.
[5] P. Baccichet and A. Chimienti, "A low complexity concealment algorithm for the whole-frame loss in H.264/AVC," Proc. IEEE Workshop on Multimedia Signal Processing, 2004.
[6] Z. Wu and J. M. Boyce, "An error concealment scheme for entire frame losses based on H.264/AVC," Proc. IEEE ISCAS, 2006.
[7] E. Quacchio, E. Magli, G. Olmo, P. Baccichet, and A. Chimienti, "An error concealment scheme for entire frame losses based on H.264/AVC," Proc. IEEE ICASSP, vol. 2, pp. 329-332, 2005.
[8] Y. Chen, K. Yu, J. Li, and S. Li, "An error concealment algorithm for entire frame loss in video transmission," Proc. Picture Coding Symposium, 2004.