A NEW ERROR CONCEALMENT SCHEME FOR WHOLE FRAME LOSS IN VIDEO TRANSMISSION
Jialue Fan, Xudong Zhang and Yu Chen
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
ABSTRACT
In video streaming applications, packet loss usually leads to the loss of an entire video frame. Based on the correlations of motion vectors and mode information in consecutive frames, this paper proposes a novel bi-directional error concealment scheme. By operating at the pixel level, the algorithm achieves finer results, especially in high-motion scenes. Simulation results show that the proposed method outperforms existing methods in both PSNR and visual quality, and that it is very efficient in stopping temporal error propagation. Moreover, the SKIP mode judgment in the algorithm reduces the computational complexity effectively without significant loss of video quality.
Index Terms— Video signal processing, Error concealment
1. INTRODUCTION
With the rapid development of wireless communication, the demand for transmitting video over the Internet and wireless channels is growing fast. However, practical communication channels are not reliable: video streams are often corrupted by channel errors, packet loss, and congestion. Since the majority of existing video coding standards adopt predictive coding and variable-length coding to achieve high compression performance, an incorrectly reconstructed macroblock (MB) can lead to both spatial and temporal error propagation [1]. Therefore, compression and transmission provisions that can limit the impact of erroneous or lost data are certainly necessary.
Many robust video coding techniques, including source coding, channel coding, and error concealment at the decoder, have been studied [2]. In general, most decoder error concealment methods reconstruct the lost video content by assuming smoothness and continuity of the pixel values, using spatial, temporal, or spatio-temporal interpolation. Unlike other techniques, error concealment does not increase the bit rate, nor does it require any change to the encoder, and it can be easily applied in existing video streaming applications. It is therefore preferable in low bit rate real-time video communications [3]. In video streaming applications over 3GPP networks, one video packet carries a whole frame in order to save transmission bandwidth, so the loss of a packet results in the loss of the whole frame. In high bit rate applications, even when one video frame is divided into many video packets for transmission, traffic congestion can erase them simultaneously. As a result, most of a frame is missing and spatial error concealment becomes infeasible [4]. Hence, the entire coded picture is difficult to recover using classical concealment algorithms.
S. Belfiore et al. proposed a multi-frame motion vector (MV) averaging algorithm (MMA) [4] that is capable of estimating the whole missing frame. Algorithms based on multi-frame optical flow estimation have also been proposed [5, 6, 7, 8]. In [5], an algorithm based on 4×4 block level MV estimation (BMVE) is presented. In [6], the projection and re-estimation of MVs are separated into two stages. An estimator of the missing motion information for each intra-coded macroblock is introduced in [7]. A bi-directional temporal error concealment method (BTEC) is proposed in [8], which extrapolates MVs from the MVs of the previous and the next frames.
In this paper, we propose a new bi-directional error concealment algorithm for H.264/AVC to recover the loss of a whole frame. The algorithm exploits not only the correlations between adjacent MVs but also the correlations between adjacent coding modes. To achieve a smooth and fine result, estimation is performed at the pixel level. Moreover, we introduce a bi-linear interpolation scheme for compensation.
2. BI-DIRECTIONAL TEMPORAL ERROR CONCEALMENT
Conventional temporal error concealment approaches are usually based on the MVs of previously received frames. However, there are also correlations between the missing frame and its next frame, as well as between the coding modes of consecutive frames. For example, background areas are often encoded in SKIP mode; if the co-located MBs in several adjacent frames are all encoded in SKIP mode, there is a high probability that these MBs describe static areas. Following this idea, our algorithm can be separated into four steps: judgment of SKIP mode, MV extrapolation, forward/backward estimation, and bi-directional compensation.
We use a triple (i, j, k) to denote the pixel at position (i, j) of frame k. Since we estimate MVs at the pixel level, we define MVs for each pixel instead of for each MB, i.e., we let MVx(i, j, k) and MVy(i, j, k) be the MV components of (i, j, k), and let P(i, j, k) be the value of (i, j, k). We assume that frame t is lost. In addition, the video stream is assumed to be compressed with an IPPP structure in our discussion.
2.1. Judgment of SKIP Mode
The first step is to determine whether an MB in the missing frame should be treated as SKIP mode. For each MB in the frame t, if the co-located MBs in the previous two frames and the next frame are all encoded in SKIP mode, we assume that the missing MB describes a static region or background. Hence, we let the missing MB be encoded in SKIP mode, and the pixels in this MB are directly compensated by copying the co-located pixels in the previous frame. Otherwise we proceed to MV extrapolation.
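As a minimal illustration, the following Python sketch shows this SKIP mode judgment under our own assumptions about the decoder data layout (the `frames`, `modes` and `concealed` structures and the function name are hypothetical, not part of the reference software):

```python
MB = 16  # macroblock size in H.264 (luma)

def conceal_skip_mbs(frames, modes, t, concealed):
    """SKIP mode judgment for the lost frame t (a sketch).
    frames: dict frame-index -> H x W luma array (e.g. numpy);
    modes:  dict frame-index -> (H/16) x (W/16) array of mode strings;
    concealed: H x W output buffer for the recovered frame t.
    Returns the list of MB coordinates left for MV extrapolation."""
    needs_mve = []
    rows, cols = modes[t - 1].shape
    for r in range(rows):
        for c in range(cols):
            if all(modes[k][r, c] == 'SKIP' for k in (t - 2, t - 1, t + 1)):
                # Co-located MBs in t-2, t-1, t+1 are all SKIP: treat the
                # missing MB as static and copy pixels from frame t-1.
                concealed[r*MB:(r+1)*MB, c*MB:(c+1)*MB] = \
                    frames[t - 1][r*MB:(r+1)*MB, c*MB:(c+1)*MB]
            else:
                needs_mve.append((r, c))
    return needs_mve
```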
To evaluate the performance of this judgment, we measure its accuracy by statistical experiments. For each MB xn, let xn−2, xn−1 and xn+1 be its co-located MBs in the previous two frames and the next frame, and define the parameters P1 and P2 as follows:

P1: the percentage of cases in which xn is in SKIP mode when xn−2, xn−1 and xn+1 are all in SKIP mode.

P2: the percentage of cases in which xn is in SKIP mode, or in 16×16 mode with zero MV, when xn−2, xn−1 and xn+1 are all in SKIP mode.
We test several video sequences to obtain P1 and P2. The testing sequences are Miss America (M), Claire (CL), News (N), Susie (S), Carphone (CA) and Foreman (F). The experimental results are shown in Table 1.

From Table 1, P1 and P2 exceed 90% in low-motion sequences and 70% in high-motion sequences, which supports our judgment. In fact, P2 is more meaningful than P1 because the residue information of the missing MB is unavailable, so a 16×16 mode with zero MV is equivalent to SKIP mode in this case.
2.2. MV Extrapolation (MVE)
The proposed pixel level MVE method, which aims to select the best estimated MV, is based on MV projection [5]. For each pixel (i, j, t − 1), we project its inverted MV onto the pixel (i − MVx(i, j, t − 1), j − MVy(i, j, t − 1), t) in the frame t. Since MVs have quarter-pel accuracy in H.264, the inverted MV does not always point to an integer pixel. Therefore, we define that a pixel (m, n, t) is covered by (i, j, t − 1) if and only if it is close enough to the pixel (i − MVx(i, j, t − 1), j − MVy(i, j, t − 1), t), i.e.,

|m − (i − MVx(i, j, t − 1))| ≤ 1/2    (1)
|n − (j − MVy(i, j, t − 1))| ≤ 1/2    (2)
For the pixels in non-SKIP MBs in the frame t, we extrapolate the forward MVs in the following steps:
Table 1. The accuracy of the SKIP mode judgment

        M(%)   CL(%)   N(%)   S(%)   CA(%)   F(%)
P1      92.4   94.8    95.9   90.5   80.4    70.7
P2      97.8   98.4    98.8   93.6   89.5    76.4
First of all, in low motion scenes, it has been found in [8] that directly repeating the MVs of the previous frame works well. Call a pixel multi-covered if it is covered by more than one pixel. Scenes with low motion also have a small number of multi-covered pixels [8]. Thus, in our algorithm, if less than 50% of the pixels in non-SKIP MBs of the frame t are multi-covered, their MVs are directly duplicated from the co-located MVs in the frame t − 1, and forward MVE finishes. Otherwise we go to steps (i) and (ii):
(i) For a pixel that is covered by at least one pixel of the frame t − 1, its MV is estimated by averaging the MVs falling into it. To avoid wild values, we define a threshold THRmv: MVs with |MVx| > THRmv or |MVy| > THRmv are excluded from the average.

(ii) For a pixel that is not covered by any pixel of the frame t − 1, we duplicate the MV from the co-located pixel in the frame t − 1.
After (i) and (ii), forward MVE is finished. We then extrapolate the backward MV of each pixel in a similar way: for each pixel (i, j, t + 1), we project its MV onto the pixel (i + MVx(i, j, t + 1), j + MVy(i, j, t + 1), t) in the frame t. The remaining procedure is nearly the same as in the forward MVE, except that t − 1 is replaced by t + 1. A sketch of the forward pass is given below.
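The following Python sketch illustrates the forward MVE pass. It is a simplified reading of the steps above, not the reference implementation: the dense per-pixel MV fields `mvx_prev`/`mvy_prev` (in pixel units), the row/column axis convention, and the choice to reject wild MVs already at projection time are all our own assumptions:

```python
import numpy as np

def forward_mve(mvx_prev, mvy_prev, nonskip_mask, thr_mv=8.0):
    """Forward pixel-level MV extrapolation (sketch).
    mvx_prev, mvy_prev: per-pixel MV fields of frame t-1, in pixels
    (quarter-pel values divided by 4); nonskip_mask: True for pixels
    of frame t that lie in non-SKIP MBs."""
    h, w = mvx_prev.shape
    sum_x = np.zeros((h, w)); sum_y = np.zeros((h, w))
    count = np.zeros((h, w), dtype=int)

    # Project the inverted MV of each pixel of frame t-1 onto frame t;
    # eq. (1)-(2) reduce to rounding the target to the nearest pixel.
    for i in range(h):          # i: row, j: column (assumed convention)
        for j in range(w):
            if abs(mvx_prev[i, j]) > thr_mv or abs(mvy_prev[i, j]) > thr_mv:
                continue        # exclude wild MVs (threshold THRmv)
            m = int(round(i - mvy_prev[i, j]))
            n = int(round(j - mvx_prev[i, j]))
            if 0 <= m < h and 0 <= n < w:
                sum_x[m, n] += mvx_prev[i, j]
                sum_y[m, n] += mvy_prev[i, j]
                count[m, n] += 1

    # Low-motion shortcut: if less than 50% of the non-SKIP pixels are
    # multi-covered, duplicate the co-located MVs of frame t-1.
    multi_covered = (count > 1) & nonskip_mask
    if multi_covered.sum() < 0.5 * nonskip_mask.sum():
        return mvx_prev.copy(), mvy_prev.copy()

    # (i) covered pixels: average the MVs falling into them;
    # (ii) uncovered pixels: duplicate the co-located MV of frame t-1.
    mvx_t = np.where(count > 0, sum_x / np.maximum(count, 1), mvx_prev)
    mvy_t = np.where(count > 0, sum_y / np.maximum(count, 1), mvy_prev)
    return mvx_t, mvy_t
```

The backward pass is obtained analogously by projecting the MVs of frame t + 1 with the opposite sign.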
2.3. Forward/backward estimation
The pixel values of the missing frame are estimated by forward/backward estimation. Let f = (fx, fy) and b = (bx, by) be the forward and backward estimated MVs of (i, j, t), respectively, and let Pf(i, j, t) and Pb(i, j, t) be the forward and backward estimated values of (i, j, t). Pf(i, j, t) can be recovered as follows:

Pf(i, j, t) = P(i + fx, j + fy, t − 1)    (3)

Since (fx, fy) does not always point to an integer pixel, we use bi-linear interpolation to recover the value.
In Fig. 1, (i + fx, j + fy, t − 1) lies inside a square of four adjacent integer pixels. Let a, b, c and d be the values of these four pixels, and let x and y be the distances from (i + fx, j + fy, t − 1) to the two sides of the square, as illustrated in Fig. 1. Using bi-linear interpolation, we have:

P(i + fx, j + fy, t − 1) = (1 − x)(1 − y)a + x(1 − y)b + (1 − x)yc + xyd    (4)
Fig. 1. Bi-linear interpolation

Similarly, we can use (bx, by) to obtain Pb(i, j, t). As (bx, by) is an MV estimated using backward extrapolation, it points to a pixel in the frame t − 1, not in the frame t + 1. So Pb(i, j, t) = P(i + bx, j + by, t − 1). The backward bi-linear interpolation is the same as the forward method.
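To make equations (3) and (4) concrete, here is a small Python sketch of the forward estimation with bi-linear interpolation (a minimal version; the array layout and function names are our own assumptions):

```python
import numpy as np

def bilinear_sample(frame, y, x):
    """Sample frame (2-D array) at the fractional position (y, x)
    using the bi-linear interpolation of eq. (4)."""
    h, w = frame.shape
    y0 = min(max(int(np.floor(y)), 0), h - 2)
    x0 = min(max(int(np.floor(x)), 0), w - 2)
    dy = y - y0; dx = x - x0          # the distances y, x in eq. (4)
    a = frame[y0, x0];     b = frame[y0, x0 + 1]
    c = frame[y0 + 1, x0]; d = frame[y0 + 1, x0 + 1]
    return ((1 - dx) * (1 - dy) * a + dx * (1 - dy) * b
            + (1 - dx) * dy * c + dx * dy * d)

def forward_estimate(prev_frame, fy, fx, i, j):
    """Eq. (3): Pf(i, j, t) = P(i + fx, j + fy, t - 1), with the
    forward MV (fy, fx) of pixel (i, j) given in row/column order."""
    return bilinear_sample(prev_frame, i + fy, j + fx)
```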
2.4. Bi-directional compensation
It has been found that weighted averaging of multiple concealment candidates can achieve better performance than a single candidate [3]. Accordingly, P(i, j, t) can be estimated as a combination of Pf(i, j, t) and Pb(i, j, t):

P̃(i, j, t) = wf Pf(i, j, t) + wb Pb(i, j, t)    (5)
s.t. wf + wb = 1    (6)
where wf and wb are the weights of the forward and backward methods. The typical compensation performance for different values of wf is shown in Fig. 2.

Fig. 2. PSNR performance on different values of weight

From Fig. 2, we find that the maximum PSNR occurs when wf is a little less than wb. This can be explained by the fact that (i, j, t − 1) is sometimes pointed to by several MVs of pixels in the frame t, yet under MV projection it covers only the nearest pixels in the frame t, which slightly degrades the PSNR performance of the forward method. Since the optimum wf is around 0.4 in low-motion sequences and 0.5 in high-motion sequences, we set wf = 0.45 and wb = 0.55 in our algorithm (a sketch of this step is given below).

Moreover, as we exploit pixel level error concealment, it is not necessary to apply deblocking filtering after compensation. Experiments show that there is little PSNR degradation without deblocking filtering; the results are not reported here for brevity.
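A minimal sketch of the bi-directional compensation of equations (5) and (6), assuming the forward and backward estimates are already available as 2-D arrays (names are hypothetical):

```python
import numpy as np

def bidirectional_compensation(pf, pb, wf=0.45, wb=0.55):
    """Eq. (5)-(6): weighted average of the forward estimate pf and
    the backward estimate pb, with weights constrained to wf + wb = 1."""
    assert abs(wf + wb - 1.0) < 1e-9  # constraint (6)
    return np.clip(wf * pf + wb * pb, 0, 255).astype(np.uint8)
```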
3. EXPERIMENTAL RESULTS

We implemented the proposed algorithm in the H.264 JM 8.2 reference software. All testing sequences are in QCIF size, the frame rate is 30 fps, one reference frame is used, and the quantization parameter is QP = 28. THRmv is set to 8 pixels. Only the first frame is encoded as an I-frame; all the others are encoded as P-frames.
3.1. Comparison
We compare the proposed algorithm with the MMA [4], BMVE [5], BTEC [8] and TR (temporal replacement) methods. In the comparison, one P-frame at a time is dropped from a compressed sequence and concealed by MMA, BMVE, BTEC, TR and the proposed algorithm respectively, and the corresponding PSNR values are calculated. We repeated this process from the 11th frame to the 140th frame of the selected sequences. The results are shown in Table 2.
Table 2. A frame-by-frame comparison of mean PSNR for each concealed frame (dB)

        MMA     BMVE    BTEC    TR      Proposed
M       37.86   37.85   38.30   36.80   38.64
CL      37.84   37.51   38.20   37.54   38.36
N       31.03   30.51   31.53   30.77   31.58
S       26.14   26.30   26.58   25.30   26.73
CA      27.56   27.00   28.00   27.59   28.01
F       29.92   30.11   30.81   27.71   31.14
From Table 2, the proposed algorithm performs better than the reference algorithms. For scenes with high motion such as the Foreman sequence, it gains 1.22 dB, 1.03 dB and 0.33 dB over MMA, BMVE and BTEC on average, respectively. For scenes with low motion such as the Miss America sequence, it still outperforms MMA, BMVE and BTEC by 0.78 dB, 0.79 dB and 0.34 dB. Hence, the proposed method provides relatively satisfactory performance in various scenarios.
For subjective evaluation, one error-free frame and three frames recovered by MMA, BMVE and the proposed algorithm are shown in Fig. 3. We choose the 65th frame of the Foreman sequence; in this scene, the man is raising his head and opening his mouth, so there is high motion in the face area. The pictures recovered by MMA and BMVE have obvious block artifacts in the face area, while the proposed algorithm provides a finer reconstruction.
Fig. 3. Subjective quality of the 65th frame of Foreman: (a) error free, (b) MMA, (c) BMVE, (d) Proposed

Fig. 4 illustrates the recovery of PSNR after a loss occurs at the 65th frame of the Foreman sequence. We can see that the proposed algorithm is more efficient in stopping error propagation than MMA, BMVE, BTEC and TR.

Fig. 4. The recovery of PSNR after a lost frame

3.2. Complexity

In our algorithm, the computational complexity is reduced, compared to the reference methods, by the SKIP mode judgment and by skipping deblocking filtering. Here, we study the complexity reduction brought by the SKIP mode judgment. Let P3 be the percentage of MBs that meet the condition of the SKIP mode judgment. The statistical result for P3 is shown in Table 3.

Table 3. Complexity reduction by SKIP mode judgment

        M(%)   CL(%)   N(%)   S(%)   CA(%)   F(%)
P3      48.7   68.8    60.5   29.3   20.5    10.5

From Table 3, P3 is more than 40% in low-motion video sequences, which means that a high percentage of pixels are recovered without MVE. Accordingly, the computational complexity is greatly reduced for low-motion scenes, and we still save more than 10% of the complexity for high-motion scenes.

4. CONCLUSION

In this paper, we propose a new bi-directional error concealment method whose main objective is to recover the loss of a whole frame. Our algorithm exploits pixel level MV estimation and bi-linear interpolation to reconstruct the lost frame, and uses mode information to reduce computational complexity. Analysis and experimental results show that the proposed algorithm performs better than the existing methods in both PSNR and visual quality. In addition, it is efficient in stopping error propagation.

5. REFERENCES

[1] Y. Wang, S. Wenger, J. Wen, and A. Katsaggelos, "Error resilient video coding techniques," IEEE Signal Processing Magazine, vol. 17, pp. 61–82, July 2000.

[2] S. Hemami, "Robust video coding - an overview," Proc. IEEE ICASSP, vol. 5, pp. 901–904, 2005.

[3] M. E. Al-Mualla, N. Canagarajah, and D. R. Bull, "Multiple reference temporal error concealment," Proc. IEEE ISCAS, vol. 5, pp. 149–152, 2001.

[4] S. Belfiore, M. Grangetto, E. Magli, and G. Olmo, "An error concealment algorithm for streaming video," Proc. IEEE ICIP, vol. 3, pp. 649–652, 2003.

[5] P. Baccichet and A. Chimienti, "A low complexity concealment algorithm for the whole-frame loss in H.264/AVC," Proc. IEEE Workshop on Multimedia Signal Processing, 2004.

[6] Z. Wu and J. M. Boyce, "An error concealment scheme for entire frame losses based on H.264/AVC," Proc. IEEE ISCAS, 2006.

[7] E. Quacchio, E. Magli, G. Olmo, P. Baccichet, and A. Chimienti, "An error concealment scheme for entire frame losses based on H.264/AVC," Proc. IEEE ICASSP, vol. 2, pp. 329–332, 2005.

[8] Y. Chen, K. Yu, J. Li, and S. Li, "An error concealment algorithm for entire frame loss in video transmission," Proc. Picture Coding Symposium, 2004.