PAPER IDENTIFICATION NUMBER: 3231

Dual Frame Motion Compensation with Optimal Long-term Reference Frame Selection and Bit Allocation

Da Liu, Debin Zhao, Xiangyang Ji, and Wen Gao, Fellow, IEEE

Abstract: In dual frame motion compensation (DFMC), one short-term reference frame (STR) and one long-term reference frame (LTR) are utilized for motion compensation. The performance of DFMC is heavily influenced by the jump updating parameter and by the bit allocation for the reference frames. In this paper, the rate-distortion performance of motion compensated prediction in DFMC is first analyzed. Based on this analysis, an adaptive jump updating DFMC (JU-DFMC) with optimal LTR selection and bit allocation is proposed. Subsequently, an error resilient JU-DFMC is presented, based on the error propagation analysis of the proposed adaptive JU-DFMC. The experimental results show that the proposed adaptive JU-DFMC achieves better performance than the existing JU-DFMC schemes and the normal DFMC scheme, in which the two most recently decoded frames are used as the references. The performance of the adaptive JU-DFMC is significantly improved for video transmission over noisy channels when the specified error resilience functionality is introduced.

Index Terms: Video coding, motion compensation, dual frame motion compensation, bit allocation, error propagation, error resilience.

I. INTRODUCTION

Motion-compensated prediction in inter-prediction coding plays an important role in existing hybrid video codecs such as MPEG-4 [1], H.263 [2], and H.264/AVC [3]. For each inter-block in the current frame, the prediction signal is obtained from the reference frame via motion compensation. Subsequently, the difference between the current original frame and its prediction is compressed and transmitted. Multi-frame motion compensation [4,5,6,7,8] allows more than one reference frame to be used for motion-compensated prediction.
In most cases, it improves coding performance significantly. However, as the number of reference frames increases, the memory storage and the motion searching complexity increase dramatically.

Manuscript received July 1, 2008; revised July 26, 2009. This work was supported by National Science Foundation of China under Grant 60736043 and National Basic Research Program of China (973 Program, 2009CB320903). This paper was recommended by Associate Editor Eckehard Steinbach.
D. Liu and D. Zhao are with the Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China (e-mail: dliu@jdl.ac.cn, dbzhao@jdl.ac.cn).
X. Ji is with Broadband Networks & Digital Media Laboratory of Automation Department, Tsinghua University, Beijing, 100084, China (e-mail: xyji@mail.tsinghua.edu.cn).
W. Gao is with the Key Laboratory of Machine Perception, the School of Electronic Engineering and Computer Science, Peking University, No.5, Yiheyuan Road, Beijing 100871, China (e-mail: wgao@pku.edu.cn).
Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

Fig. 1. Dual frame motion compensation (diagram labels: LTR Buffer, STR Buffer, Current Frame).

Dual frame motion compensation (DFMC) is the special case of multi-frame motion compensation in which only two reference buffers are utilized, thus requiring a relatively modest increase in memory storage and motion searching complexity. In DFMC, as shown in Fig. 1, the first reference buffer contains the most recently decoded frame, called the short-term reference frame (STR), and the second one contains a reference frame from the past that is periodically updated, called the long-term reference frame (LTR). Generally, there are two types of approaches for DFMC [9].
The first approach is jump updating DFMC (JU-DFMC), in which the LTR remains static for N frames and then jumps forward to the frame located two frames before the frame to be encoded. For example, suppose that for the frames from time instant i-N+1 to i, the LTR is frame i-N-1. After frame i is encoded and the encoder moves on to frame i+1, the STR slides forward by one to frame i, and the LTR jumps forward by N to frame i-1. The LTR then remains fixed for N frames before jumping forward again. N is called the jump update parameter. The second approach is continuous updating DFMC (CU-DFMC), where the LTR for each current frame always has a fixed temporal distance D, called the continuous update parameter, to the current frame. As a result, every frame has a chance of serving as an STR and as an LTR.

A number of DFMC based approaches to improve video coding performance have been reported in the literature. In [10], a refreshing rule for the LTR was proposed by introducing scene change detection. In [11], the concept of the dual frame was simulated in a low bandwidth situation by block-partitioning prediction and the utilization of two time differential reference frames. In [12], it was found that using a high quality frame as a reference frame for the following frames benefits the overall performance. In [13] and [14], it was shown that PSNR is influenced by the extra bandwidth and the period given to the LTR. In [15], the update period of the LTR was set to ten frames, and the PSNR of the nine frames that follow the LTR frame was utilized to determine how many bits can be allocated to the LTR. In [16], simulated annealing was utilized to select the LTR; however, its computational complexity is relatively high.

For video transmission over noisy channels, in [9], [17], and [18], the recursive optimal per-pixel estimate (ROPE) algorithm was utilized to provide mode decision in dual frame coding.
These works pointed out that a fixed jump updating parameter for the LTR is not optimal for all sequences. Feedback was also utilized in dual frame coding in [19] to control the drift errors. In [20], the uneven allocation of error protection to the LTR was examined. It showed that assigning higher error protection to the LTRs is better than assigning equal error protection to all frames. In [21], a binary decision tree designed by the classification and regression trees (CART) algorithm was utilized to choose among various error concealment choices in dual frame coding. The trade-off between end-to-end delay and compression efficiency in dual frame motion compensation was investigated in [22] and [23].

Multihypothesis motion compensated prediction (MHMCP) was also utilized to enhance error resilience. In [24], each block is predicted from two reference blocks using two motion vectors. In [25], the error propagation model of MHMCP jointly considered coding efficiency and error resilience in predictor selection. Furthermore, reference picture interleaving and data partitioning were utilized in [26] to make MHMCP more resilient to channel errors. In [27], the error propagation impact in MHMCP was examined, and the rate-distortion performance with respect to the hypothesis number and coefficients was analyzed. In [28], two-hypothesis prediction and one-hypothesis prediction were adaptively used to decrease error propagation. In [29], all frames were divided into period frames and non-period frames. The period frames are separated by a fixed distance. For all non-period frames, only a previous period frame was utilized as the reference frame. However, adaptive period frame selection and bit allocation for different packet loss rates were not reported.

For different video sequences, adaptive LTR selection and bit allocation to improve coding efficiency have not been fully studied yet.
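The jump updating and continuous updating rules described earlier in this section can be made concrete with a small sketch. It only enumerates which frames serve as STR and LTR; the function names and the choice that the first jump happens at frame 2 (so that the first LTR is frame 0) are assumptions made for illustration and are not taken from the paper.

```python
def ju_dfmc_refs(t, n, first_jump=2):
    """Return (STR, LTR) frame indices for frame t under jump updating.

    The LTR jumps every n frames to the frame two positions before the
    frame being encoded at the jump, then stays fixed for n frames.
    first_jump is a hypothetical starting point for the jump schedule.
    """
    if t < first_jump:
        raise ValueError("frame t has no LTR two frames back yet")
    # Frame at which the most recent jump happened.
    jump_frame = first_jump + ((t - first_jump) // n) * n
    return t - 1, jump_frame - 2


def cu_dfmc_refs(t, d):
    """Return (STR, LTR) frame indices for frame t under continuous updating.

    The LTR always sits at a fixed temporal distance d from frame t.
    """
    if t < d:
        raise ValueError("frame t has no LTR at distance d yet")
    return t - 1, t - d


if __name__ == "__main__":
    for t in range(2, 9):
        print(t, ju_dfmc_refs(t, n=3), cu_dfmc_refs(t, d=2))
```

With N = 3, frames 2 to 4 share frame 0 as LTR, and frame 5 takes frame 3, which matches the example in the text (frames i-N+1 to i use frame i-N-1, and frame i+1 uses frame i-1, here with i = 4).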
In this paper, the rate distortion (R-D) performance for motion compensated prediction (MCP) in DFMC is analyzed. Based on the analysis, an adaptive JU-DFMC with optimal LTR selection and bit allocation is provided. Furthermore, for video transmission over noisy channels, the error propagation in the proposed adaptive JU-DFMC is analyzed first, and then an error resilient JU-DFMC is presented.

The rest of the paper is organized as follows. In Section II, the R-D performance analysis for MCP in DFMC is given. Based on the analysis, optimal LTR selection and bit allocation in the adaptive JU-DFMC are presented in Section III and Section IV, respectively. In Section V, based on the error propagation analysis for the proposed adaptive JU-DFMC, an error resilient JU-DFMC is presented for video transmission over noisy channels. The experimental results and discussions are provided in Section VI. Finally, Section VII concludes this paper.

II. RATE DISTORTION PERFORMANCE ANALYSIS FOR MCP IN DFMC

In this section, the R-D performance analysis for MCP is presented first. Then the prediction error variances in both JU-DFMC and CU-DFMC are formulated.

A. Power Spectral Density

The power spectral density (PSD) $\Phi(\omega_x,\omega_y)$ describes the power of a signal as a function of frequency and is obtained as

$$\Phi(\omega_x,\omega_y)=\int_{-\infty}^{\infty}R(\tau)\,e^{-2\pi i(\omega_x,\omega_y)\tau}\,d\tau, \tag{1}$$

where $(\omega_x,\omega_y)$ is a vector representing the two-dimensional spatial frequency, and $R(\tau)$ is an autocorrelation function that describes the correlation between different time points. The rate-distortion analysis of MCP [30] relates the PSD of the prediction error to the accuracy of motion compensation, captured by the probability density function (pdf) of the displacement error. The rate-distortion analysis was extended to multihypothesis prediction in [31].
In particular, as described in [23], the PSD for two-hypothesis prediction can be simplified to

$$\Phi_{ee}(\Lambda)=\Phi_{ss}(\Lambda)\left(\frac{6+\alpha_1+\alpha_2+2P_1(\Lambda)P_2(\Lambda)}{4}-P_1(\Lambda)-P_2(\Lambda)\right), \tag{2}$$

where $\Phi_{ee}(\Lambda)$ is the PSD of the prediction error, $\Phi_{ss}(\Lambda)$ is the power spectrum of the input video signal and is non-negative, $\Lambda=(\omega_x,\omega_y)$, and

$$P(\omega_x,\omega_y)=e^{-2\pi\sigma_{\Delta}^{2}(\omega_x^{2}+\omega_y^{2})}. \tag{3}$$

$P_1(\Lambda)$ and $P_2(\Lambda)$ are the displacement error pdfs from the first and the second hypotheses, respectively. $\sigma_{\Delta}^{2}$ is the displacement error variance (DEV), and it reflects the inaccuracy of the displacement vector used for the motion compensation [30]. $\alpha_i$ represents the spectral noise-to-signal power ratio in the ith hypothesis and is given in [31] as $\alpha_i=\Phi_{nn\_i}(\Lambda)/\Phi_{ss}(\Lambda)$. $\Phi_{nn\_i}(\Lambda)$ represents the PSD of the residual noise in the ith hypothesis and is given in [30] as $\Phi_{nn\_i}(\Lambda)=\max[0,\theta(1-\theta/\Phi_{ee\_i}(\Lambda))]$, where $\Phi_{ee\_i}(\Lambda)$ is the PSD of the prediction error in the ith hypothesis and $\theta$ is a parameter that generates the rate-distortion function by taking on all positive real values [30]. If $\Phi_{nn\_i}(\Lambda)=0$, it has no influence on $\Phi_{ee}(\Lambda)$. If $\Phi_{nn\_i}(\Lambda)=\theta(1-\theta/\Phi_{ee\_i}(\Lambda))$, then, since $\Phi_{nn\_i}(\Lambda)$ is nearly linearly proportional to $\Phi_{ee\_i}(\Lambda)$ in the short term, it can be simplified as $\Phi_{nn\_i}(\Lambda)=h(\Phi_{ee\_i}(\Lambda))$, where $h(\cdot)$ is the linear function that represents the relationship between $\Phi_{nn\_i}(\Lambda)$ and $\Phi_{ee\_i}(\Lambda)$. Considering both cases, (2) can be rewritten as

$$\Phi_{ee}(\Lambda)=\frac{h(\Phi_{ee\_1})+h(\Phi_{ee\_2})}{4}+\Phi_{ss}(\Lambda)\,\frac{3+P_1(\Lambda)P_2(\Lambda)-2P_1(\Lambda)-2P_2(\Lambda)}{2}. \tag{4}$$

The PSD $\Phi_{ee}(\omega_x,\omega_y)$ is utilized as the performance measurement. From [31],

$$\Delta R=\frac{1}{8\pi^{2}}\int_{-\pi}^{+\pi}\int_{-\pi}^{+\pi}\log_2\!\left(\frac{\Phi_{ee}(\omega_x,\omega_y)}{\Phi_{ss}(\omega_x,\omega_y)}\right)d\omega_x\,d\omega_y, \tag{5}$$

where $\Delta R$ represents the maximum bit-rate reduction (in bits/sample) achieved by optimum encoding of the prediction error, compared to optimum intra frame encoding of the signal for the same mean squared error (MSE) [32]. A negative $\Delta R$ corresponds to a reduced bit-rate compared to optimum intra frame coding. From (5), we can conclude that when the reconstructed frames from two prediction structures have the same MSE, the structure whose frame has the smaller PSD $\Phi_{ee}(\omega_x,\omega_y)$ has the smaller $\Delta R$ and therefore the better coding efficiency.

B. Prediction Error Variance

The PSD $\Phi_{ee}(\Lambda)$ is related to the displacement error pdf. From (3), the displacement error pdf is determined by the DEV $\sigma_{\Delta}^{2}$. In video coding, the displacement error is the distance between the actual motion vector and its estimate. The DEV reflects the inaccuracy of the motion compensation [30]. The prediction error variance (PEV) is the variance of the motion compensated error between the original value and the reference value; it also reflects the inaccuracy of the motion compensation. From [23], in the short term, the DEV $\sigma_{\Delta}^{2}$ is nearly directly proportional to the PEV $\sigma_{e}^{2}$. So $\sigma_{e}^{2}$ plays a key role in determining $\Phi_{ee}(\Lambda)$. In the following, $\sigma_{e}^{2}$ is analyzed.

Fig. 2. JU-DFMC structure (frames $f_{i-N-1}^{HQ}$, $f_{i-N}^{LQ}$, $f_{i-N+1}^{LQ}$, $f_{i-N+2}^{LQ}$, $f_{i-N+3}^{LQ}$, ..., $f_{i-2}^{LQ}$, $f_{i-1}^{HQ}$, $f_{i}^{LQ}$, $f_{i+1}^{LQ}$; the HQFs serve as LTR).

Fig. 3. The prediction error variance $\sigma_{e}^{2}$ in JU-DFMC (Mobile, QCIF; horizontal axis: frame number; vertical axis: variance; curves: prediction error variances from LTR and from STR).

In JU-DFMC, there are two kinds of frames. As shown in Fig. 2, one kind is the high quality frame (HQF), to which relatively more bits are allocated, such as the (i-1)th frame $f_{i-1}^{HQ}$ and the (i-N-1)th frame $f_{i-N-1}^{HQ}$. The other kind has relatively lower quality (LQF), such as the (i+k)th frame $f_{i+k}^{LQ}$ (k=-N, -N+1, ..., -2, 0, 1). To simplify the discussion, all LQFs are coded to have similar MSEs. Any frame (HQF or LQF) can be utilized as the STR for the next frame. Meanwhile, the HQF is utilized as the LTR for the following several frames.

One example of the PEV $\sigma_{e}^{2}$ in JU-DFMC is shown in Fig. 3. The frame at time instant 1 is encoded as an HQF and then utilized as the LTR for the following several frames. The frames at the other time instants are encoded as LQFs (in the figure, the bit rate of the LQFs is 420.62 kbps, and the bit allocation of the HQF is 3 times that of an LQF). In every encoded LQF, the prediction performance from the STR is similar, so $\sigma_{e}^{2}$ from the STR (dashed line) is similar. For the second LQF (at time instant 3) following the LTR, the prediction performance from the LTR is better, and thus $\sigma_{e}^{2}$ is smaller. As the following frames are coded, the prediction performance from the LTR degrades and $\sigma_{e}^{2}$ from the LTR increases (solid line).

In the CU-DFMC of this work, every frame is allocated approximately the same quality. The continuous update parameter D is set to 2, so for every frame the two most recently decoded frames are the STR and the LTR, respectively. Therefore, for every frame in CU-DFMC, the prediction performance from the LTR is nearly the same, and thus $\sigma_{e}^{2}$ from the LTR is nearly the same. The same conclusion holds for STR prediction. Generally, if the prediction performance is better, the PEV $\sigma_{e}^{2}$ is smaller. Since $\sigma_{e}^{2}$ is nearly directly proportional to $\sigma_{\Delta}^{2}$ [23], $\sigma_{\Delta}^{2}$ is smaller as well.

III. OPTIMAL LTR SELECTION IN JU-DFMC

In this section, the optimal LTR selection in JU-DFMC is presented. In the first subsection, the coding performance of the same LQF in CU-DFMC and JU-DFMC is compared. In the second subsection, the coding performance of the same HQF in CU-DFMC and JU-DFMC is compared. The optimal LTR selection in JU-DFMC is provided in the last subsection.

A. Coding Performance Comparison of LQF

Fig. 4. LQF i in JU-DFMC (frames $f_{i-4}^{HQ}$, $f_{i-3}^{LQ}$, $f_{i-2}^{LQ}$, $f_{i-1}^{LQ}$, $f_{i}^{LQ}$; the HQF serves as LTR).

Fig. 5. LQF i in CU-DFMC (frames $f_{i-4}^{LQ}$, $f_{i-3}^{LQ}$, $f_{i-2}^{LQ}$, $f_{i-1}^{LQ}$, $f_{i}^{LQ}$).

The frame at instant i can be encoded as an LQF (denoted as $f_{i}^{LQ}$) in JU-DFMC or in CU-DFMC (as shown in Fig. 4 and Fig. 5). Its power spectral densities in the two structures are denoted as $\Phi_{ee}^{J}(\Lambda)$ and $\Phi_{ee}^{CL}(\Lambda)$, respectively. To compare the performance when $f_{i}^{LQ}$ is encoded in the two DFMC coding structures, the reconstructed $f_{i}^{LQ}$ in both structures is assumed to have the same MSE. Then the difference $\breve{\Phi}_{ee}(\Lambda)$ between the power spectral densities of $f_{i}^{LQ}$ in the two structures can be derived from (4) as follows:

$$\breve{\Phi}_{ee}(\Lambda)=\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)=\frac{1}{4}\Bigl(h\bigl(\Phi_{ee\_1}^{J}(\Lambda)-\Phi_{ee\_1}^{CL}(\Lambda)\bigr)+h\bigl(\Phi_{ee\_2}^{J}(\Lambda)-\Phi_{ee\_2}^{CL}(\Lambda)\bigr)\Bigr)+\Phi_{ss}(\Lambda)P^{LQ}, \tag{6}$$

where $\Phi_{ee\_k}^{J}(\Lambda)$ and $\Phi_{ee\_k}^{CL}(\Lambda)$ are the PSDs in the kth hypothesis in JU-DFMC and CU-DFMC, respectively (here and hereafter, k=1 denotes taking the STR as the hypothesis and k=2 the LTR), and

$$P^{LQ}=P_1^{J}(\Lambda)\Bigl(\tfrac{1}{2}P_2^{J}(\Lambda)-1\Bigr)-P_2^{J}(\Lambda)-\Bigl(P_1^{CL}(\Lambda)\Bigl(\tfrac{1}{2}P_2^{CL}(\Lambda)-1\Bigr)-P_2^{CL}(\Lambda)\Bigr). \tag{7}$$

In (7), $P_1^{J}(\Lambda)$ and $P_2^{J}(\Lambda)$ represent the displacement error pdfs from the STR and the LTR in JU-DFMC, and $P_1^{CL}(\Lambda)$ and $P_2^{CL}(\Lambda)$ represent the displacement error pdfs from the STR and the LTR in CU-DFMC.

From (6), $\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)$ depends on $\Phi_{ss}(\Lambda)P^{LQ}$ and $h(\Phi_{ee\_k}^{J}(\Lambda)-\Phi_{ee\_k}^{CL}(\Lambda))$. The calculation of $\Phi_{ee\_k}^{J}(\Lambda)-\Phi_{ee\_k}^{CL}(\Lambda)$ is the same as that of $\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)$. Therefore, in the iterative formula (6), $P^{LQ}$ plays a dominant role in $\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)$.

In JU-DFMC and CU-DFMC, the reconstructed $f_{i}^{LQ}$ is assumed to have the same MSE. The STR in both structures has nearly the same MSE as the current reconstructed $f_{i}^{LQ}$, and the temporal distance from the STR to the current $f_{i}^{LQ}$ is the same (one frame), so the prediction performance from the STR is nearly the same in both structures.
From Subsection II-B, the prediction performance influences the DEV $\sigma_{\Delta}^{2}$, so $\sigma_{\Delta}^{2}$ from the STR is nearly the same in the two DFMC coding structures. According to (3), $P_1^{J}(\Lambda)$ is nearly the same as $P_1^{CL}(\Lambda)$. Then (7) can be further represented as

$$P^{LQ}=\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\bigl(P_2^{CL}(\Lambda)-P_2^{J}(\Lambda)\bigr). \tag{8}$$

In JU-DFMC (Fig. 4), the MSE of the LTR is smaller than that of the current reconstructed $f_{i}^{LQ}$. In CU-DFMC (Fig. 5), the MSE of the LTR is nearly the same as that of the current reconstructed $f_{i}^{LQ}$. When $f_{i}^{LQ}$ is among the first several LQFs following the LTR in JU-DFMC, the prediction performance from the LTR is better than when $f_{i}^{LQ}$ is encoded in CU-DFMC. From the analysis in Subsection II-B, if the prediction performance is better, the DEV $\sigma_{\Delta}^{2}$ is smaller, so $\sigma_{\Delta}^{2}$ from the LTR in JU-DFMC is smaller than that in CU-DFMC. According to (3), $P_2^{CL}(\Lambda)$ is smaller than $P_2^{J}(\Lambda)$. Since $\sigma_{\Delta}^{2}$ is always larger than 0, $P_1^{J}(\Lambda)<1$. According to the above analysis and (8), $P^{LQ}=(1-\tfrac{1}{2}P_1^{J}(\Lambda))(P_2^{CL}(\Lambda)-P_2^{J}(\Lambda))<0$. Since $P^{LQ}$ plays a dominant role in determining $\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)$, we get $\Phi_{ee}^{J}(\Lambda)-\Phi_{ee}^{CL}(\Lambda)<0$. This means that, compared with encoding $f_{i}^{LQ}$ using CU-DFMC, coding $f_{i}^{LQ}$ using JU-DFMC has better R-D performance (that is, it saves bits at the same reconstructed MSE) if $f_{i}^{LQ}$ is among the first several LQFs following the LTR. The R-D performance gain is represented as $\breve{\Phi}_{ee}(\Lambda)$.

B. Coding Performance Comparison of HQF

Fig. 6. HQF i in JU-DFMC (frames $f_{i-4}^{HQ}$, $f_{i-3}^{LQ}$, $f_{i-2}^{LQ}$, $f_{i-1}^{LQ}$, $f_{i}^{HQ}$; the HQF $f_{i-4}^{HQ}$ serves as LTR).

Fig. 7. HQF i in CU-DFMC (frames $f_{i-4}^{HQ}$, $f_{i-3}^{HQ}$, $f_{i-2}^{HQ}$, $f_{i-1}^{HQ}$, $f_{i}^{HQ}$).

The frame at instant i can also be encoded as an HQF (denoted as $f_{i}^{HQ}$) in JU-DFMC or in CU-DFMC (as shown in Fig. 6 and Fig. 7). Its PSDs in the two structures are denoted as $\Phi_{ee}^{J}(\Lambda)$ and $\Phi_{ee}^{CH}(\Lambda)$, respectively. To compare the performance when $f_{i}^{HQ}$ is encoded in the two DFMC coding structures, the reconstructed $f_{i}^{HQ}$ in both structures is assumed to have the same MSE. Then the difference $\hat{\Phi}_{ee}(\Lambda)$ between the power spectral densities of $f_{i}^{HQ}$ in the two structures can be derived from (4) as follows:

$$\hat{\Phi}_{ee}(\Lambda)=\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)=\frac{1}{4}\Bigl(h\bigl(\Phi_{ee\_1}^{CH}(\Lambda)-\Phi_{ee\_1}^{J}(\Lambda)\bigr)+h\bigl(\Phi_{ee\_2}^{CH}(\Lambda)-\Phi_{ee\_2}^{J}(\Lambda)\bigr)\Bigr)+\Phi_{ss}(\Lambda)P^{HQ}, \tag{9}$$

where $\Phi_{ee\_k}^{CH}(\Lambda)$ and $\Phi_{ee\_k}^{J}(\Lambda)$ (k=1 or 2) are the PSDs in the kth hypothesis in CU-DFMC and JU-DFMC, respectively, and

$$P^{HQ}=\Bigl(P_1^{CH}(\Lambda)\Bigl(\tfrac{1}{2}P_2^{CH}(\Lambda)-1\Bigr)-P_1^{J}(\Lambda)\Bigl(\tfrac{1}{2}P_2^{J}(\Lambda)-1\Bigr)\Bigr)+\bigl(P_2^{J}(\Lambda)-P_2^{CH}(\Lambda)\bigr). \tag{10}$$

In (10), $P_1^{J}(\Lambda)$ and $P_2^{J}(\Lambda)$ represent the displacement error pdfs from the STR and the LTR in JU-DFMC, and $P_1^{CH}(\Lambda)$ and $P_2^{CH}(\Lambda)$ represent the displacement error pdfs from the STR and the LTR in CU-DFMC.

$\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)$ depends on $\Phi_{ss}(\Lambda)P^{HQ}$ and $h(\Phi_{ee\_k}^{CH}(\Lambda)-\Phi_{ee\_k}^{J}(\Lambda))$ (k=1 or 2). The calculation of $\Phi_{ee\_k}^{CH}(\Lambda)-\Phi_{ee\_k}^{J}(\Lambda)$ is the same as that of $\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)$. Therefore, in the iterative formula (9), $P^{HQ}$ plays a dominant role in $\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)$.

The reconstructed $f_{i}^{HQ}$ in both structures is assumed to have the same MSE. In CU-DFMC (Fig. 7), the STR and the LTR have nearly the same MSE as the current reconstructed $f_{i}^{HQ}$. In JU-DFMC (Fig. 6), the STR has a larger MSE than the current reconstructed $f_{i}^{HQ}$; the LTR has nearly the same MSE as the current reconstructed $f_{i}^{HQ}$, but the long temporal distance weakens its influence. So the prediction performance from both the LTR and the STR in CU-DFMC is better than in JU-DFMC. From the analysis in Subsection II-B, if the prediction performance is better, the DEV $\sigma_{\Delta}^{2}$ is smaller, and according to (3), we have $0<P_1^{J}(\Lambda)<P_1^{CH}(\Lambda)$, $0<P_2^{J}(\Lambda)<P_2^{CH}(\Lambda)$, and $P_1^{J}(\Lambda)<1$. From the above analysis and (10), we get (see Appendix A)

$$P^{HQ}\approx\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\bigl(P_2^{J}(\Lambda)-P_2^{CH}(\Lambda)\bigr)<0. \tag{11}$$

Since $P^{HQ}$ plays a key role in determining $\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)$, we get $\Phi_{ee}^{CH}(\Lambda)-\Phi_{ee}^{J}(\Lambda)<0$. This means that, compared with coding $f_{i}^{HQ}$ using CU-DFMC, coding $f_{i}^{HQ}$ in JU-DFMC results in an R-D performance loss (that is, more bits are consumed at the same reconstructed MSE), denoted as $\hat{\Phi}_{ee}(\Lambda)$.

C. Optimal LTR Selection

Before describing the proposed LTR selection method, some frequently used notations in this section are given in Table I.

TABLE I. NOTATIONS
Variable: Definition
$\sigma_{\Delta\_JL}^{2}$: DEV from LTR in JU-DFMC
$\sigma_{\Delta\_CLL}^{2}$: DEV from LTR in CU-DFMC, in which every frame is LQF
$\sigma_{\Delta\_CHL}^{2}$: DEV from LTR in CU-DFMC, in which every frame is HQF
$\sigma_{e\_CLL}^{2}$: PEV from LTR in CU-DFMC, in which every frame is LQF
$\sigma_{e\_CHL}^{2}$: PEV from LTR in CU-DFMC, in which every frame is HQF
$\sigma_{e\_JL2}^{2}$: PEV in the second LQF following LTR (HQF) in JU-DFMC when the reference frame is LTR
$\sigma_{e\_JSA}^{2}$: average PEV in LQFs when the reference frame is STR

1) LTR Selection Strategy

In this subsection, based on the previous performance comparison, the optimal LTR selection in JU-DFMC is presented.

In CU-DFMC, the bit allocation in every frame can be changed, but the R-D performance is fixed. For example, although the quality of every LQF in Fig. 5 is lower than that of every HQF in Fig. 7, the R-D performance is stable. So the R-D performance in CU-DFMC is used as the baseline for comparison with the R-D performance in JU-DFMC. In JU-DFMC, the R-D performance of LQFs and HQFs is different. From Subsection III-A, compared with the coding of $f_{i}^{LQ}$ in CU-DFMC, the encoded $f_{i}^{LQ}$ in JU-DFMC has better R-D performance if $f_{i}^{LQ}$ is among the first several LQFs following the LTR. From Subsection III-B, compared with the coding of $f_{i}^{HQ}$ in CU-DFMC, the encoded $f_{i}^{HQ}$ in JU-DFMC has lower R-D performance. In JU-DFMC, as the frames following the LTR are coded, the R-D performance gain obtained when a frame is encoded as an LQF and the R-D performance loss incurred when it is encoded as an HQF are used to determine where LQF coding ends and the next HQF (LTR) begins.

When a frame is encoded as an LQF or as an HQF in JU-DFMC, the R-D performance gain and loss are denoted as $\breve{\Phi}_{ee}(\Lambda)$ and $\hat{\Phi}_{ee}(\Lambda)$, respectively. The difference between the R-D performance gain and loss can be obtained as (see Appendix B)

$$\breve{\Phi}_{ee}(\Lambda)-\hat{\Phi}_{ee}(\Lambda)\approx\frac{1}{2}h\bigl(\breve{\Phi}_{ee\_1}(\Lambda)-\hat{\Phi}_{ee\_1}(\Lambda)\bigr)+\Phi_{ss}(\Lambda)\bigl(P^{LQ}-P^{HQ}\bigr). \tag{12}$$

In (12), $\breve{\Phi}_{ee\_1}(\Lambda)$ and $\hat{\Phi}_{ee\_1}(\Lambda)$ are the R-D performance gain and loss when the STR is encoded as an LQF or as an HQF in JU-DFMC, respectively. The calculation of $\breve{\Phi}_{ee\_1}(\Lambda)-\hat{\Phi}_{ee\_1}(\Lambda)$ is the same as that of $\breve{\Phi}_{ee}(\Lambda)-\hat{\Phi}_{ee}(\Lambda)$; therefore, in the iterative formula (12), $(P^{LQ}-P^{HQ})$ plays a dominant role in $\breve{\Phi}_{ee}(\Lambda)-\hat{\Phi}_{ee}(\Lambda)$.

$P^{LQ}$ and $P^{HQ}$ are determined by the DEV $\sigma_{\Delta}^{2}$. In the short term, $P_2(\Lambda)$ has a nearly linear relationship with $\sigma_{\Delta}^{2}$. For simplicity of calculation, we suppose $P_2^{J}(\Lambda)\approx 1-k\times\sigma_{\Delta\_JL}^{2}$ and, consequently, $P_2^{CL}(\Lambda)\approx 1-k\times\sigma_{\Delta\_CLL}^{2}$ and $P_2^{CH}(\Lambda)\approx 1-k\times\sigma_{\Delta\_CHL}^{2}$, where $\sigma_{\Delta\_JL}^{2}$ is the DEV from the LTR in JU-DFMC; $\sigma_{\Delta\_CLL}^{2}$ is the DEV from the LTR in CU-DFMC in which every frame is an LQF; and $\sigma_{\Delta\_CHL}^{2}$ is the DEV from the LTR in CU-DFMC in which every frame is an HQF. Then $P^{LQ}$ and $P^{HQ}$ can be written as

$$P^{LQ}=\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\bigl(P_2^{CL}(\Lambda)-P_2^{J}(\Lambda)\bigr)\approx-\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\times k\times\bigl(\sigma_{\Delta\_CLL}^{2}-\sigma_{\Delta\_JL}^{2}\bigr), \tag{13}$$

$$P^{HQ}\approx\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\bigl(P_2^{J}(\Lambda)-P_2^{CH}(\Lambda)\bigr)\approx-\Bigl(1-\tfrac{1}{2}P_1^{J}(\Lambda)\Bigr)\times k\times\bigl(\sigma_{\Delta\_JL}^{2}-\sigma_{\Delta\_CHL}^{2}\bigr). \tag{14}$$

The DEV $\sigma_{\Delta}^{2}$ can therefore be used to compare the performance and determine the next LTR. In JU-DFMC, as frame j (j=i+1, i+2, i+3, ...) following the LTR is encoded, the prediction performance from the LTR decreases, and the DEV $\sigma_{\Delta\_JL}^{2}$ from the LTR increases. When $\sigma_{\Delta\_JL}^{2}$ from the LTR is less than $(\sigma_{\Delta\_CLL}^{2}+\sigma_{\Delta\_CHL}^{2})/2$, we have $(\sigma_{\Delta\_JL}^{2}-\sigma_{\Delta\_CHL}^{2})<(\sigma_{\Delta\_CLL}^{2}-\sigma_{\Delta\_JL}^{2})$. Then $P^{HQ}$ is larger than $P^{LQ}$, and $\hat{\Phi}_{ee}(\Lambda)$ is larger than $\breve{\Phi}_{ee}(\Lambda)$. The R-D performance gain when frame j is encoded as an LQF is larger than the R-D performance loss when frame j is encoded as an HQF. At this point, if frame j is encoded as an LQF, JU-DFMC will have better performance. Otherwise, when $\sigma_{\Delta\_JL}^{2}$ from the LTR is larger than $(\sigma_{\Delta\_CLL}^{2}+\sigma_{\Delta\_CHL}^{2})/2$, the R-D performance gain is smaller than the performance loss. At this point, if frame j is encoded as an LQF, the coding of the next HQF will result in more quality loss, and thus the performance of JU-DFMC will decrease.

From the above analysis, we select the DEV

$$\sigma_{\Delta\_T}^{2}=\bigl(\sigma_{\Delta\_CLL}^{2}+\sigma_{\Delta\_CHL}^{2}\bigr)/2 \tag{15}$$

as the point at which to terminate the LQF coding. When $\sigma_{\Delta\_JL}^{2}$ from the LTR is larger than $\sigma_{\Delta\_T}^{2}$, the next frame is encoded as an HQF.

2) Implementation in Video Coding

In the short term, the DEV $\sigma_{\Delta}^{2}$ is nearly directly proportional to the PEV $\sigma_{e}^{2}$ [23]. Then in JU-DFMC, $\sigma_{e\_T}^{2}=(\sigma_{e\_CLL}^{2}+\sigma_{e\_CHL}^{2})/2$ can be utilized instead of $\sigma_{\Delta\_T}^{2}$ as the point at which to terminate the coding of LQFs, where $\sigma_{e\_T}^{2}$ is the PEV from the LTR in JU-DFMC; $\sigma_{e\_CLL}^{2}$ is the PEV from the LTR in CU-DFMC in which every frame is an LQF; and $\sigma_{e\_CHL}^{2}$ is the PEV from the LTR in CU-DFMC in which every frame is an HQF. $\sigma_{e\_CLL}^{2}$ and $\sigma_{e\_CHL}^{2}$ cannot be obtained in JU-DFMC.
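The termination rule in (15) amounts to a threshold test on the variance measured from the LTR. The following is a minimal sketch of that test, assuming the per-frame variances from the LTR and the two CU-DFMC reference values are already available as numbers; the function names and the sample values are hypothetical and only illustrate the decision logic.

```python
def termination_threshold(var_cll, var_chl):
    """Midpoint threshold of (15): average of the two CU-DFMC reference variances."""
    return (var_cll + var_chl) / 2.0


def next_hqf_index(var_from_ltr, var_cll, var_chl):
    """Return the index of the frame to code as the next HQF, or None.

    var_from_ltr[k] is the variance measured from the LTR for the k-th
    frame coded after the LTR. Once it exceeds the threshold, the next
    frame (index k + 1) is coded as an HQF, as stated in the text.
    """
    threshold = termination_threshold(var_cll, var_chl)
    for k, variance in enumerate(var_from_ltr):
        if variance > threshold:
            return k + 1
    return None


if __name__ == "__main__":
    # Hypothetical variances that grow as the LTR gets older.
    measured = [0.10, 0.14, 0.19, 0.25, 0.32]
    print(next_hqf_index(measured, var_cll=0.30, var_chl=0.14))
```

The same comparison applies whether the inputs are DEVs or, as in the implementation discussed in this subsection, PEVs used as a proxy.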
HQFs in CU-DFMC and JU-DFMC are assumed to have the same MSE, so $\sigma_{e\_CHL}^{2}$ can be replaced by $\sigma_{e\_JL2}^{2}$, which is the PEV in the second LQF following the LTR (HQF) in JU-DFMC when the reference frame is the LTR. LQFs in CU-DFMC and JU-DFMC are also assumed to have the same MSE. If the quality of the reference frames is the same and the temporal distance between the two reference frames is only one frame, the PEVs from the two reference frames are similar, so $\sigma_{e\_CLL}^{2}$ can be replaced by the PEV of an LQF in JU-DFMC when the reference frame is an STR (LQF). For safety, $\sigma_{e\_CLL}^{2}$ is replaced by $\sigma_{e\_JSA}^{2}$, which is the average PEV of the LQFs (from the second LQF to the last encoded LQF) in the GOP when the reference frame is the STR. So finally in JU-DFMC, $\sigma_{e\_T}^{2}=(\sigma_{e\_JL2}^{2}+\sigma_{e\_JSA}^{2})/2$ is utilized instead of $\sigma_{e\_T}^{2}=(\sigma_{e\_CLL}^{2}+\sigma_{e\_CHL}^{2})/2$ as the point at which to terminate the coding of LQFs.

IV. BIT ALLOCATION FOR JU-DFMC

A. Performance Measurement of Bit Allocation

In determining the bit allocation, the PSD $\Phi_{ee}(\omega_x,\omega_y)$ is compared under the same bit rate. According to Parseval's relation,

$$\sigma_{e}^{2}=\frac{1}{4\pi^{2}}\int_{-\pi f_{sx}}^{+\pi f_{sx}}\int_{-\pi f_{sy}}^{+\pi f_{sy}}\Phi_{ee}(\omega_x,\omega_y)\,d\omega_x\,d\omega_y, \tag{16}$$

where $f_{sx}$ and $f_{sy}$ are the spatial sampling frequencies in the horizontal and vertical directions. From (16), if $\Phi_{ee}(\omega_x,\omega_y)$ is smaller, $\sigma_{e}^{2}$ is smaller. From [33],

$$R(D)=\frac{1}{2}\log_2\!\left(\frac{\sigma_{e}^{2}}{D}\right). \tag{17}$$

Under the same bit-rate, if $\sigma_{e}^{2}$ is smaller, D (MSE) is smaller, and the coding efficiency is better. Therefore, given the same bit rate, if the PSD $\Phi_{ee}(\omega_x,\omega_y)$ is smaller, the coding efficiency is better.

In JU-DFMC, one HQF and the following LQFs comprise a group of pictures (GOP). The total PSD of all frames in a GOP is utilized as the performance measure of bit allocation. Assuming the overall bit rate of a GOP is fixed, the bit allocation between the HQF and the LQFs can be changed. The changed bits in the LQFs are assumed to be evenly allocated to every LQF, so the change of MSE in every LQF is nearly the same. As the bit allocation between the HQF and the LQFs changes under the fixed overall bit rate of the GOP, the PSDs of the HQF and the LQFs change. When the total PSD is the smallest, the coding efficiency is the best, and the corresponding bit allocation ratio between the HQF bits and the average LQF bits is the best.

The total PSD can be simplified to a concise performance measure. Suppose that before the change of bit allocation, the PSD in frame i and the total PSD in the GOP are denoted as $\Phi_{ee_i}(\Lambda)$ and $\Phi_{ee_T}(\Lambda)$, respectively. After the change of bit allocation, the PSD in frame i and the total PSD in the GOP are denoted as $\Phi'_{ee_i}(\Lambda)$ and $\Phi'_{ee_T}(\Lambda)$, respectively. Assume a GOP has n frames. Then we have

$$\Phi_{ee_T}(\Lambda)=\sum_{i=1}^{n}\Phi_{ee_i}(\Lambda), \tag{18}$$

$$\Phi'_{ee_T}(\Lambda)=\sum_{i=1}^{n}\Phi'_{ee_i}(\Lambda). \tag{19}$$

With the change of bit allocation, the change of the total PSD is

$$\Phi'_{ee_T}(\Lambda)-\Phi_{ee_T}(\Lambda)=\sum_{i=1}^{n}\bigl(\Phi'_{ee_i}(\Lambda)-\Phi_{ee_i}(\Lambda)\bigr). \tag{20}$$

In (20), $\Phi'_{ee_i}(\Lambda)-\Phi_{ee_i}(\Lambda)$ (i=1,2,...,n) is the change of PSD in every frame and can be calculated by (see Appendix C)

$$\Phi'_{ee_i}(\Lambda)-\Phi_{ee_i}(\Lambda)=\frac{1}{4}h\bigl(\Phi'_{ee_i\_1}(\Lambda)-\Phi_{ee_i\_1}(\Lambda)\bigr)+\frac{1}{4}h\bigl(\Phi'_{ee_i\_2}(\Lambda)-\Phi_{ee_i\_2}(\Lambda)\bigr)+\frac{1}{2}\Phi_{ss}(\Lambda)\Bigl(\bigl(2-P'_{i2}(\Lambda)\bigr)\bigl(2-P'_{i1}(\Lambda)\bigr)-\bigl(2-P_{i2}(\Lambda)\bigr)\bigl(2-P_{i1}(\Lambda)\bigr)\Bigr). \tag{21}$$

In (21), $\Phi_{ee_i\_k}(\Lambda)$ and $\Phi'_{ee_i\_k}(\Lambda)$ (k=1 or 2) represent the PSD in the kth reference frame before and after the change of bit allocation, respectively. $(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ and $(2-P'_{i2}(\Lambda))(2-P'_{i1}(\Lambda))$ represent the values of this product before and after the change of bit allocation, respectively. $P_{i2}(\Lambda)$ and $P_{i1}(\Lambda)$ are the displacement error pdfs in frame i when the reference is the LTR and the STR in JU-DFMC, respectively.
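The product $(2-P_{i2})(2-P_{i1})$ in (21) follows from (4) by a short factorization. The sketch below is offered as a reading aid rather than as the derivation of Appendix C; it assumes that $\Phi_{ss}$ is unaffected by the bit allocation, that $h$ is linear through the origin so that $h(a)-h(b)=h(a-b)$, and it marks values after the change of bit allocation with a prime (a notation chosen here for illustration).

```latex
\begin{align*}
(2-P_{i2})(2-P_{i1}) &= 4 - 2P_{i1} - 2P_{i2} + P_{i1}P_{i2}
                      = \bigl(3 + P_{i1}P_{i2} - 2P_{i1} - 2P_{i2}\bigr) + 1, \\
\Phi_{ee_i} &= \tfrac{1}{4}\bigl(h(\Phi_{ee_i\_1}) + h(\Phi_{ee_i\_2})\bigr)
             + \tfrac{1}{2}\,\Phi_{ss}\bigl((2-P_{i2})(2-P_{i1}) - 1\bigr)
             \quad\text{(from (4))}, \\
\Phi'_{ee_i} - \Phi_{ee_i} &= \tfrac{1}{4}\,h\bigl(\Phi'_{ee_i\_1} - \Phi_{ee_i\_1}\bigr)
             + \tfrac{1}{4}\,h\bigl(\Phi'_{ee_i\_2} - \Phi_{ee_i\_2}\bigr)
             + \tfrac{1}{2}\,\Phi_{ss}\Bigl((2-P'_{i2})(2-P'_{i1}) - (2-P_{i2})(2-P_{i1})\Bigr).
\end{align*}
```

The constant $-1$ cancels in the difference, which is why only the product term survives alongside the two $h(\cdot)$ terms.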
The calculation of $\Phi'_{ee_i\_k}(\Lambda)-\Phi_{ee_i\_k}(\Lambda)$ is the same as that of $\Phi'_{ee_i}(\Lambda)-\Phi_{ee_i}(\Lambda)$, where primes denote values after the change of bit allocation. Therefore, in the iterative formula (21), $(2-P'_{i2}(\Lambda))(2-P'_{i1}(\Lambda))-(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ plays a dominant role in $\Phi'_{ee_i}(\Lambda)-\Phi_{ee_i}(\Lambda)$. From (20), $\sum_{i=1}^{n}\bigl((2-P'_{i2}(\Lambda))(2-P'_{i1}(\Lambda))-(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))\bigr)$ plays a dominant role in $\Phi'_{ee_T}(\Lambda)-\Phi_{ee_T}(\Lambda)$. This means that the change of $(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ plays a dominant role in the change of PSD in a frame, and the change of $\sum_{i=1}^{n}(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ plays a key role in the change of the total PSD in the GOP.

As the bit allocation between the HQF and the LQFs changes under the same overall bit rate of the GOP, if $\sum_{i=1}^{n}(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ increases (or decreases), the total PSD also increases (or decreases). When $\sum_{i=1}^{n}(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ is the smallest, the total PSD is the smallest and the coding efficiency is the best. Therefore, the value of $\sum_{i=1}^{n}(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ is utilized instead of the total PSD as the performance measure of bit allocation.

$(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ can be further simplified. According to (3), $P_{i2}(\Lambda)$ and $P_{i1}(\Lambda)$ are within (0, 1). If $P_{i2}(\Lambda)$ or $P_{i1}(\Lambda)$ increases, the value of $(2-P_{i2}(\Lambda))$ or $(2-P_{i1}(\Lambda))$ decreases. So $(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$ has an inverse relationship with $P_{i2}(\Lambda)P_{i1}(\Lambda)$, which can be calculated from (3) as follows:

$$P_{i2}(\Lambda)P_{i1}(\Lambda)=e^{-2\pi\sigma_{\Delta\_JLi}^{2}(\omega_x^{2}+\omega_y^{2})}\times e^{-2\pi\sigma_{\Delta\_JSi}^{2}(\omega_x^{2}+\omega_y^{2})}=e^{-2\pi(\sigma_{\Delta\_JLi}^{2}+\sigma_{\Delta\_JSi}^{2})(\omega_x^{2}+\omega_y^{2})}=e^{-2\pi\sigma_{\Delta\_Pi}^{2}(\omega_x^{2}+\omega_y^{2})}, \tag{22}$$

where

$$\sigma_{\Delta\_Pi}^{2}=\sigma_{\Delta\_JSi}^{2}+\sigma_{\Delta\_JLi}^{2}, \tag{23}$$

$\sigma_{\Delta\_JLi}^{2}$ is the DEV in frame i when the reference frame is the LTR in JU-DFMC, and $\sigma_{\Delta\_JSi}^{2}$ is the DEV in frame i when the reference frame is the STR in JU-DFMC. From (22), $\sigma_{\Delta\_Pi}^{2}$ has an inverse relationship with $P_{i2}(\Lambda)P_{i1}(\Lambda)$, and hence a direct relationship with $(2-P_{i2}(\Lambda))(2-P_{i1}(\Lambda))$.
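The relationships in (22) and (23) can be checked numerically. The sketch below evaluates (3) at one spatial frequency, confirms that the product of the two terms equals the term evaluated at the summed variance, and shows how $(2-P_{i2})(2-P_{i1})$ moves as the variance from the LTR grows. The function names, the chosen frequency, and the variance values are hypothetical.

```python
import math


def displacement_term(dev, wx, wy):
    """Evaluate P(wx, wy) of (3) for a displacement error variance dev."""
    return math.exp(-2.0 * math.pi * dev * (wx * wx + wy * wy))


def frame_measure(dev_ltr, dev_str, wx, wy):
    """Evaluate (2 - P_i2)(2 - P_i1), with P_i2 from the LTR and P_i1 from the STR."""
    p_ltr = displacement_term(dev_ltr, wx, wy)
    p_str = displacement_term(dev_str, wx, wy)
    return (2.0 - p_ltr) * (2.0 - p_str)


if __name__ == "__main__":
    wx, wy = 0.5, 0.5      # hypothetical spatial frequency
    dev_str = 0.05         # hypothetical DEV from the STR
    for dev_ltr in (0.05, 0.10, 0.20, 0.40):
        product = displacement_term(dev_ltr, wx, wy) * displacement_term(dev_str, wx, wy)
        combined = displacement_term(dev_ltr + dev_str, wx, wy)
        measure = frame_measure(dev_ltr, dev_str, wx, wy)
        print(dev_ltr, round(product, 6), round(combined, 6), round(measure, 6))
```

In this example the product and the combined term agree, and the measure grows with the variance from the LTR while the variance from the STR is held fixed, which is the direct relationship stated in the text.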
Therefore, $\sigma^{2}_{\Delta\_Pi}$ is utilized instead of $(2 - P_{i2}(\Lambda))(2 - P_{i1}(\Lambda))$ as the performance measure of bit allocation for frame i, and thus $\sum_{i=1}^{n}\sigma^{2}_{\Delta\_Pi}$ is utilized as the performance measure of bit allocation for the GOP. With the change of bit allocation between HQF and LQFs, when $\sum_{i=1}^{n}\sigma^{2}_{\Delta\_Pi}$ is the smallest, the overall power spectral density is the smallest and the bit allocation ratio between HQF bits and average LQF bits in a GOP is the best.

In the short term, the DEV $\sigma^{2}_{\Delta}$ is nearly directly proportional to the PEV $\sigma^{2}_{e}$ [23]. Then in every GOP, $\sigma^{2}_{\Delta\_JLi}$ and $\sigma^{2}_{\Delta\_JSi}$ are nearly directly proportional to $\sigma^{2}_{e\_JLi}$ and $\sigma^{2}_{e\_JSi}$, which are the PEVs in frame i when the reference frame is the LTR and the STR respectively. So $\sigma^{2}_{e\_Pi} = \sigma^{2}_{e\_JLi} + \sigma^{2}_{e\_JSi}$ and $\sum_{i=1}^{n}\sigma^{2}_{e\_Pi}$ can be used instead of $\sigma^{2}_{\Delta\_Pi}$ and $\sum_{i=1}^{n}\sigma^{2}_{\Delta\_Pi}$ as the performance measures of bit allocation for frame i and for a GOP respectively.

If the MSE in the LTR changes, the change of PEV $\sigma^{2}_{e\_JLi}$ in each frame i is nearly the same. If the change of MSE in different STRs is the same, the change of PEV $\sigma^{2}_{e\_JSi}$ in each frame i is nearly the same as well. Then with the change of MSE in the LTR and the STRs, the change of $\sigma^{2}_{e\_Pi}$ in each frame is nearly the same. Therefore, $\sigma^{2}_{e\_Pi}$ can be used instead of $\sum_{i=1}^{n}\sigma^{2}_{e\_Pi}$ as the performance measure of bit allocation for the GOP. Finally, $\sigma^{2}_{e\_P} = \sigma^{2}_{e\_JL2} + \sigma^{2}_{e\_JSA}$ is utilized as the performance measure of bit allocation for the GOP, in accordance with that in the optimal LTR selection.

B. Bit Allocation

Step 1: GOP level bit allocation

In the jth GOP, the GOP length is initialized as N for performing GOP level bit allocation. If j is equal to 1, N is set to 10. Otherwise, N is set to the actual GOP length of the previous GOP. Suppose the number of remaining frames waiting to be encoded is M, while the remaining bits are $R_r$. The target bit allocation $T(j)$ for the jth GOP is calculated as

$$T(j) = \frac{N \times R_r}{M}. \tag{24}$$

Step 2: Obtaining bit allocation ratio

After the GOP level bit allocation is obtained, the bit allocation ratio $R_a$ between HQF bits and average LQF bits in the jth GOP is calculated. If j is equal to 1, $R_a$ is initialized as 4. Otherwise, $R_a$ is updated from the previous GOP as follows.

After encoding the (j-1)th GOP, the bit allocation and MSEs in the HQF and LQFs are obtained and fixed. Assuming the overall bit-rate of the (j-1)th GOP is fixed, if the bit allocation between HQF and LQFs in the (j-1)th GOP changes (suppose the changed bit allocation in every LQF is the same) based on the fixed bit allocation, the changed MSE in the HQF and LQFs corresponding to the changed bit-rate can be approximately calculated from the derivative of the function $R(D)$ in (17),

$$\frac{dR}{dD} = \frac{1}{2\ln 2} \times \frac{D}{\sigma_e^2} \times \left(-\frac{\sigma_e^2}{D^2}\right) = -\frac{1}{2\ln 2} \times \frac{1}{D}, \tag{25}$$

then

$$\Delta D = -2\ln 2 \times D \times \Delta R, \tag{26}$$

where $\Delta D$ is the change of MSE, $\Delta R$ is the change of bit-rate, and $D$ is the MSE in the frame. After adding the changed MSE to the fixed MSE, and adding the changed bit-rate to the fixed bit-rate, the different MSEs corresponding to different bit allocations in the HQF and LQFs of the (j-1)th GOP can be calculated. The MSE of the reference frame is nearly directly proportional to the PEV $\sigma^{2}_{e}$ from the reference frame, thus if the MSEs in the HQF and LQFs are known, $\sigma^{2}_{e\_JL2}$ from the LTR and the average $\sigma^{2}_{e\_JSA}$ from the STRs can be calculated.

In the strategy of bit allocation change in this work, the bit allocation change step in the HQF is $R_H \times 1\%$, and the changed bit allocation in the HQF is within the range $(-R_H \times 10\%, +R_H \times 10\%)$, where $R_H$ is the actual bit allocation in the HQF after encoding the (j-1)th GOP. The bit allocation change in the HQF is compensated evenly from the LQFs. For each changed bit allocation case, $\sigma^{2}_{e\_P}$ $(= \sigma^{2}_{e\_JSA} + \sigma^{2}_{e\_JL2})$ is calculated. The bit allocation ratio which brings the smallest $\sigma^{2}_{e\_P}$ is reserved. Since the bit allocation ratios between HQF bits and average LQF bits differ little in adjacent GOPs, the reserved bit allocation ratio in the (j-1)th GOP is selected as the bit allocation ratio in the jth GOP.

Step 3: Frame level bit allocation

The bit allocation in the HQF, $T_H$, and the average bit allocation in LQFs, $T_{L\_ave}$, in the jth GOP are separately calculated using the reserved bit allocation ratio and the GOP level bit allocation,

$$T_H = \frac{R_a}{R_a + (N - 1) \times 1} \times T(j), \tag{27}$$

$$T_{L\_ave} = \frac{1}{R_a + (N - 1) \times 1} \times T(j). \tag{28}$$

Step 4: Quantization parameter determination

For determining the quantization parameter (QP) in the HQF and LQFs, the rate and quantization parameter models are similar to those in [34] with some modifications. For LQFs and HQFs, the quadratic rate control model is updated and stored separately. In every GOP, to maintain the smoothness of LQF quality, the average bit allocation in LQFs is utilized to calculate the average QP ($Q$) in LQFs, and then in different LQFs the QP is slightly adjusted based on the average QP. In this work, for the first LQF in the GOP, the QP is set to $Q+1$. For the middle LQFs (from the second LQF to the (2N/3)th LQF, where N is the actual GOP length of the previous GOP), the QP is set to $Q$. For the following LQFs, the QP is set to $Q-1$.

V. EXTENSION TO NOISY CHANNELS

Based on the proposed adaptive JU-DFMC, an error resilient JU-DFMC prediction structure is first given. Then adaptive LTR selection and bit allocation with respect to different packet loss rates are presented.

A. Error Resilient JU-DFMC

For video transmission over error prone channels, the end-to-end distortion model [35][36] is extended in this work to analyze the error propagation. In the end-to-end distortion model, the transmission error rates of two data partitions A (the header information) and C (transformed coefficients of the inter-coded blocks) are represented by $p_A$ and $p_C$, respectively. Let $f_n^i$ be the original value of pixel i in frame n, and let $\hat{f}_n^i$ and $\tilde{f}_n^i$ be the reconstructed values in the encoder and decoder, respectively. Suppose the pixel references pixel k in frame ref. Then the expected inter-mode end-to-end distortion in the decoder is represented in [35] as

$$\begin{aligned}
d(n,i) &= E\{(f_n^i - \tilde{f}_n^i)^2\} \\
&= (1 - p_A)E\{(\hat{f}_{ref}^k - \tilde{f}_{ref}^k)^2\} + (1 - p_A)(1 - p_C)E\{(f_n^i - \hat{f}_n^i)^2\} \\
&\quad + (1 - p_A)p_C E\{(f_n^i - \hat{f}_{ref}^k)^2\} + p_A\left(E\{(f_n^i - \hat{f}_{n-1}^i)^2\} + E\{(\hat{f}_{n-1}^i - \tilde{f}_{n-1}^i)^2\}\right) \\
&= (1 - p_A)d_{ep\_ref} + (1 - p_A)(1 - p_C)d_s + (1 - p_A)p_C d_{ec\_ref\_o} + p_A(d_{ec\_prev\_o} + d_{ep\_prev}).
\end{aligned} \tag{29}$$

In (29), $d(n,i)$ is the expected end-to-end distortion in the decoder, $d_{ep\_ref}$ is the error propagated distortion from the reference frame, $d_s$ denotes the source distortion, $d_{ec\_ref\_o}$ indicates the original referenced error-concealment distortion, $d_{ec\_prev\_o}$ denotes the original previous error-concealment distortion, and $d_{ep\_prev}$ denotes the error-propagated distortion from the previous frame.

From (29), in the expected end-to-end distortion $d(n,i)$, the share of the error propagated distortion $d_{ep}$ (including $d_{ep\_ref}$ and $d_{ep\_prev}$) is larger than that of the source distortion $d_s$ and the error-concealment distortions $d_{ec\_ref\_o}$ and $d_{ec\_prev\_o}$. Furthermore, the error propagated distortion $d_{ep}$ increases frame by frame in video decoding. Therefore, if there is an error in the current frame, the image quality in the following frames will be heavily affected.

To reduce the error propagation, an error resilient JU-DFMC structure is proposed. As shown in Fig. 8, for the HQF in each GOP, only the previous LTR (HQF) is utilized for prediction, and for LQFs, the prediction structure is unchanged. Then, when transmission errors occur in LQFs of the current GOP, the error will not be propagated to the following HQFs and GOPs.

Fig. 8 Error resilient JU-DFMC structure

B. Optimal LTR Selection and Bit Allocation in Error Resilient JU-DFMC

To deduce the average increase of error propagation in a GOP, we adopt [35]

$$d_{ep} = (1 - p_A)(1 - p_C)d_{ep\_ref} + (1 - p_A)p_C(d_{ec\_ref\_r} + d_{ep\_ref}) + p_A(d_{ec\_prev\_r} + d_{ep\_prev}), \tag{30}$$

where $d_{ep}$, $d_{ep\_ref}$ and $d_{ep\_prev}$ are the error propagated distortions in the current HQF, the previous LTR and the previous LQF, respectively. Since $d_{ep\_prev}$ only accounts for a small percentage of $d_{ep}$, we adopt $p_A \times d_{ep\_prev} \approx p_A \times d_{ep\_ref}$. So (30) can be rewritten as

$$d_{ep} \approx d_{ep\_ref} + (1 - p_A)p_C d_{ec\_ref\_r} + p_A d_{ec\_prev\_r}. \tag{31}$$

If the GOP length is N, the average increase of the error propagated distortion $d_{ep\_ave}$ in every frame of the GOP is

$$d_{ep\_ave} = \frac{d_{ep} - d_{ep\_ref}}{N} \approx \frac{(1 - p_A)p_C d_{ec\_ref\_r} + p_A d_{ec\_prev\_r}}{N}. \tag{32}$$

From (32), we can see that two factors can be adjusted to reduce the average increase of error propagated distortion. The first is to increase the GOP length N. The second is to decrease the error concealment distortion values $d_{ec\_ref\_r}$ and $d_{ec\_prev\_r}$, which requires increasing the bit allocation in the HQF. If more bits are allocated to the HQF, the prediction performance will be better and the error concealment distortion will be smaller.

In determining the LTR (HQF) selection and bit allocation in error resilient JU-DFMC, the method is the same as that in the proposed adaptive JU-DFMC. However, the PEV utilized in calculating the performance measure is computed not from the source distortion but from the expected end-to-end distortion $d(n,i)$ in (29).
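The effect of the GOP length on error propagation in (30) to (32) can be illustrated with a minimal sketch. The function and parameter names below are hypothetical and the numeric values are made up for illustration; only the formulas themselves come from the text.

```python
import math

def propagated_distortion(p_a, p_c, d_ep_ref, d_ep_prev, d_ec_ref_r, d_ec_prev_r):
    """Eq. (30): error propagated distortion in the current HQF."""
    return ((1 - p_a) * (1 - p_c) * d_ep_ref
            + (1 - p_a) * p_c * (d_ec_ref_r + d_ep_ref)
            + p_a * (d_ec_prev_r + d_ep_prev))

def average_increase(p_a, p_c, d_ec_ref_r, d_ec_prev_r, gop_len):
    """Eq. (32): average per-frame increase of propagated distortion."""
    return ((1 - p_a) * p_c * d_ec_ref_r + p_a * d_ec_prev_r) / gop_len

# Illustrative values only (assumptions, not measurements from the paper).
p_a, p_c = 0.05, 0.10
d_ep_ref, d_ec_ref_r, d_ec_prev_r = 12.0, 40.0, 60.0

# With d_ep_prev set equal to d_ep_ref, (30) collapses exactly to (31),
# so (d_ep - d_ep_ref) / N matches (32).
d_ep = propagated_distortion(p_a, p_c, d_ep_ref, d_ep_ref, d_ec_ref_r, d_ec_prev_r)
matches_eq32 = math.isclose((d_ep - d_ep_ref) / 10,
                            average_increase(p_a, p_c, d_ec_ref_r, d_ec_prev_r, 10))

# A longer GOP spreads the same increase over more frames.
per_frame = {n: average_increase(p_a, p_c, d_ec_ref_r, d_ec_prev_r, n) for n in (8, 10, 12)}
```

In this sketch the per-frame increase falls as N grows and as the concealment distortions shrink, which are the two levers named after (32).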
Since the distortion of the reference frame is nearly directly proportional to the PEV $\sigma^{2}_{e}$ from the reference frame, and the expected end-to-end distortions $d(n,i)_{LQ}$ in the LQF and $d(n,i)_{HQ}$ in the HQF are separately calculated by (29), the virtual PEV $\hat{\sigma}^{2}_{e\_JS}$ from $d(n,i)_{LQ}$ and $\hat{\sigma}^{2}_{e\_JL}$ from $d(n,i)_{HQ}$ are separately calculated as

$$\hat{\sigma}^{2}_{e\_JS} = \frac{d(n,i)_{LQ}}{d_{S\_LQ}} \times \sigma^{2}_{e\_JS}, \tag{33}$$

$$\hat{\sigma}^{2}_{e\_JL} = \frac{d(n,i)_{HQ}}{d_{S\_HQ}} \times \sigma^{2}_{e\_JL}, \tag{34}$$

where $d_{S\_LQ}$ and $d_{S\_HQ}$ are the source distortion $d_S$ in the LQF and the HQF, and $\sigma^{2}_{e\_JS}$ and $\sigma^{2}_{e\_JL}$ are the PEVs from the source distortion $d_{S\_LQ}$ and the source distortion $d_{S\_HQ}$.

Table II. Performance comparison of the proposed adaptive JU-DFMC with existing DFMC schemes (PSNR and gains in dB; average gains are listed on the first row of each sequence)

Sequence | Bit rate (kbps) | CU-DFMC | JU-DFMC [16] | Proposed adaptive JU-DFMC | Gain over CU-DFMC | Average gain | Gain over JU-DFMC [16] | Average gain
Mobile (qcif) | 249.12 | 30.63 | 31.32 | 32.32 | 1.69 | 1.31 | 1.00 | 0.72
Mobile (qcif) | 401.23 | 33.03 | 33.68 | 34.43 | 1.40 | | 0.75 |
Mobile (qcif) | 580.10 | 35.08 | 35.62 | 36.22 | 1.14 | | 0.60 |
Mobile (qcif) | 850.12 | 37.49 | 37.99 | 38.51 | 1.02 | | 0.52 |
Tempete (qcif) | 149.53 | 30.36 | 31.11 | 31.87 | 1.51 | 1.14 | 0.76 | 0.53
Tempete (qcif) | 310.27 | 33.83 | 34.43 | 34.99 | 1.16 | | 0.56 |
Tempete (qcif) | 507.21 | 36.56 | 37.10 | 37.48 | 0.92 | | 0.38 |
Tempete (qcif) | 673.68 | 38.14 | 38.68 | 39.09 | 0.95 | | 0.41 |
Waterfall (cif) | 139.92 | 32.11 | 33.04 | 34.14 | 2.03 | 1.72 | 1.10 | 0.86
Waterfall (cif) | 232.82 | 34.42 | 35.27 | 36.21 | 1.79 | | 0.94 |
Waterfall (cif) | 403.63 | 36.59 | 37.54 | 38.29 | 1.70 | | 0.75 |
Waterfall (cif) | 751.94 | 39.03 | 39.73 | 40.37 | 1.34 | | 0.64 |
Container (qcif) | 17.66 | 32.88 | 33.97 | 34.56 | 1.68 | 1.39 | 0.59 | 0.44
Container (qcif) | 27.69 | 34.81 | 35.79 | 36.19 | 1.38 | | 0.40 |
Container (qcif) | 46.47 | 36.86 | 37.88 | 38.24 | 1.38 | | 0.36 |
Container (qcif) | 80.19 | 38.91 | 39.64 | 40.04 | 1.13 | | 0.40 |
News (cif) | 100.36 | 34.48 | 35.12 | 35.44 | 0.96 | 0.98 | 0.32 | 0.32
News (cif) | 149.87 | 36.68 | 37.32 | 37.67 | 0.99 | | 0.35 |
News (cif) | 200.55 | 38.23 | 38.91 | 39.26 | 1.03 | | 0.35 |
News (cif) | 292.76 | 40.05 | 40.72 | 40.97 | 0.92 | | 0.25 |
Paris (qcif) | 135.22 | 32.02 | 32.91 | 33.23 | 1.21 | 1.22 | 0.32 | 0.35
Paris (qcif) | 200.83 | 34.61 | 35.52 | 35.82 | 1.21 | | 0.30 |
Paris (qcif) | 270.02 | 36.54 | 37.43 | 37.88 | 1.34 | | 0.45 |
Paris (qcif) | 317.80 | 37.86 | 38.65 | 38.99 | 1.13 | | 0.34 |
Foreman (qcif) | 70.47 | 33.92 | 34.28 | 34.54 | 0.62 | 0.73 | 0.26 | 0.30
Foreman (qcif) | 141.12 | 36.73 | 37.27 | 37.64 | 0.91 | | 0.37 |
Foreman (qcif) | 194.78 | 38.02 | 38.48 | 38.73 | 0.71 | | 0.25 |
Foreman (qcif) | 255.54 | 39.33 | 39.67 | 39.99 | 0.66 | | 0.32 |
Silent (cif) | 225.45 | 35.85 | 36.55 | 36.99 | 1.14 | 1.25 | 0.44 | 0.48
Silent (cif) | 349.76 | 38.02 | 38.93 | 39.23 | 1.21 | | 0.30 |
Silent (cif) | 539.24 | 40.08 | 40.82 | 41.60 | 1.52 | | 0.78 |
Silent (cif) | 708.35 | 41.85 | 42.59 | 42.99 | 1.14 | | 0.40 |
Hall (qcif) | 34.25 | 34.54 | 35.13 | 35.29 | 0.75 | 0.75 | 0.16 | 0.21
Hall (qcif) | 45.12 | 36.41 | 36.82 | 37.03 | 0.62 | | 0.21 |
Hall (qcif) | 68.82 | 38.32 | 39.13 | 39.36 | 1.04 | | 0.23 |
Hall (qcif) | 94.18 | 39.97 | 40.34 | 40.57 | 0.60 | | 0.23 |

Table III. The average percentage of blocks in each frame whose reference frame is LTR or STR

Sequence | Adaptive JU-DFMC: LTR (%) | Adaptive JU-DFMC: STR (%) | CU-DFMC: LTR (%) | CU-DFMC: STR (%)
Mobile (qcif) | 63.91 | 36.09 | 24.61 | 75.39
Tempete (qcif) | 49.85 | 50.15 | 8.20 | 91.80
Waterfall (cif) | 68.22 | 31.78 | 8.50 | 91.50
Container (qcif) | 52.87 | 47.13 | 1.12 | 98.88
News (cif) | 68.93 | 31.07 | 1.08 | 98.92
Paris (qcif) | 42.35 | 57.65 | 5.20 | 94.80
Foreman (qcif) | 37.41 | 62.52 | 14.20 | 85.80
Silent (cif) | 45.62 | 54.36 | 1.23 | 98.77
Hall (qcif) | 64.35 | 35.65 | 0.80 | 99.20

With the same method as in the proposed adaptive JU-DFMC, the performance measure $\sigma^{2}_{e\_T}$ of LTR selection and the performance measure $\sigma^{2}_{e\_P}$ of bit allocation in error resilient JU-DFMC are

$$\sigma^{2}_{e\_T} = \left(\hat{\sigma}^{2}_{e\_JL2} + \hat{\sigma}^{2}_{e\_JSA}\right)/2, \tag{35}$$

$$\sigma^{2}_{e\_P} = \hat{\sigma}^{2}_{e\_JL2} + \hat{\sigma}^{2}_{e\_JSA}. \tag{36}$$

In (35) and (36), $\hat{\sigma}^{2}_{e\_JL2}$ is the virtual PEV in the second LQF of the GOP when the reference frame is the LTR; $\hat{\sigma}^{2}_{e\_JSA}$ is the average virtual PEV in LQFs (from the second LQF to the last encoded LQF) of the GOP when the reference frame is the STR. However, in the rate-distortion mode decision, the distortion is not the overall end-to-end distortion $d(n,i)$ but the source distortion $d_S$.
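Because the error resilient scheme reuses the allocation procedure of Section IV.B with virtual PEVs, the pieces can be sketched together. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the numbers are made up, the rate change in `mse_change` is assumed to be expressed in bits per sample, and the integer floor used for the (2N/3)th LQF boundary is an assumption, since the text does not state how that index is rounded.

```python
import math

def gop_target_bits(gop_len, remaining_bits, remaining_frames):
    """Eq. (24): T(j) = N * R_r / M."""
    return gop_len * remaining_bits / remaining_frames

def mse_change(mse, delta_rate):
    """Eq. (26): first-order MSE change for a rate change (bits per sample assumed)."""
    return -2.0 * math.log(2.0) * mse * delta_rate

def split_gop_bits(target_bits, ratio, gop_len):
    """Eqs. (27)-(28): bits for the HQF and average bits per LQF for ratio R_a."""
    denom = ratio + (gop_len - 1)
    return ratio * target_bits / denom, target_bits / denom

def lqf_qp(position, avg_qp, prev_gop_len):
    """Step 4 rule; position is the 1-based index of the LQF in the GOP."""
    if position == 1:
        return avg_qp + 1
    if position <= (2 * prev_gop_len) // 3:  # floor is an assumption
        return avg_qp
    return avg_qp - 1

def virtual_pev(source_pev, end_to_end_distortion, source_distortion):
    """Eqs. (33)-(34): scale a source PEV by d(n,i) / d_S."""
    return end_to_end_distortion / source_distortion * source_pev

def performance_measures(pev_jl2, pev_jsa):
    """Eqs. (35)-(36): (LTR selection measure, bit allocation measure)."""
    total = pev_jl2 + pev_jsa
    return total / 2.0, total

# Illustrative numbers only (not taken from the paper's experiments).
target = gop_target_bits(gop_len=10, remaining_bits=1_200_000, remaining_frames=120)
hqf_bits, lqf_bits = split_gop_bits(target, ratio=4.0, gop_len=10)
pev_jl2 = virtual_pev(source_pev=20.0, end_to_end_distortion=30.0, source_distortion=24.0)
pev_jsa = virtual_pev(source_pev=12.0, end_to_end_distortion=45.0, source_distortion=30.0)
sel_measure, alloc_measure = performance_measures(pev_jl2, pev_jsa)
```

With the initial ratio of 4 and a GOP of 10 frames, the HQF receives 4/13 of the GOP budget and each LQF 1/13 on average, so the HQF bits plus nine average LQF shares add up to T(j).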
Fig. 9 Rate distortion curves for some sequences: (a) Mobile (qcif), (b) Waterfall (cif). Axes: PSNR (dB) versus bitrate (kbps). Curves: Adaptive JU-DFMC, JU-DFMC in [16], CU-DFMC+RC, CU-DFMC+JU_BA, SFMC+JU_BA.

Fig. 10 Rate distortion curves under different LTR intervals and different bit allocations: (a) Mobile (qcif), (b) Waterfall (cif). Axes: PSNR (dB) versus bitrate (kbps). Curves: Adaptive JU-DFMC, LTR_interval+2, LTR_interval-3, LTR_BA+7%, LTR_BA-9%.

Fig. 11 Frame by frame PSNR in test sequences Mobile and Waterfall under the same bit rate. Axes: PSNR (dB) versus frame number. Curves: Adaptive JU-DFMC, JU-DFMC [16], CU-DFMC+JU_BA, SFMC+JU_BA, CU-DFMC.

VI. EXPERIMENTAL RESULTS AND DISCUSSIONS

To evaluate the general performance of the proposed adaptive JU-DFMC, we integrated the proposed methods into the H.264/AVC reference software JM10.2. In the proposed adaptive JU-DFMC, the first P frame is set as the first LTR, and its allocated bits are initialized as four times the average bits of the following STRs. For the selection of the following LTRs and the corresponding bit allocation, the methods proposed in Section III and Section IV are adopted. In CU-DFMC, the two most recently decoded frames are used for motion compensation; LTRs are continuously updated and are not allocated any extra rate.
The test sequences are all encoded at 30 fps with 120 frames in total for each sequence. In motion estimation, the search range is ±16. The entropy coder is context-adaptive binary arithmetic coding (CABAC). Each row of macroblocks comprises a slice and is transmitted in a separate packet.

In Table II, the PSNRs of CU-DFMC (fixed QP in every frame), the JU-DFMC [16] and the proposed adaptive JU-DFMC are given. The performance gain of the proposed adaptive JU-DFMC under different bit rates is also presented. Compared with the JU-DFMC [16], the average PSNR gains obtained by the proposed adaptive JU-DFMC are 0.72, 0.53, 0.86, 0.44, 0.32, 0.35, 0.30, 0.48, and 0.21 dB in sequences Mobile, Tempete, Waterfall, Container, News, Paris, Foreman, Silent and Hall, respectively.

Table IV. Performance comparison of the error resilient JU-DFMC with other schemes (PSNR in dB at packet loss rates of 3%, 5%, 10% and 20%, followed by the original PSNR in dB without error for the same four settings)

Sequence (bit rate) | Scheme | PSNR 3% | PSNR 5% | PSNR 10% | PSNR 20% | Original 3% | Original 5% | Original 10% | Original 20%
Mobile (qcif) 650.35 kbps | 2HMC [26] | 31.49 | 30.21 | 28.23 | 26.14 | 35.32 | 35.32 | 35.32 | 35.32
Mobile (qcif) 650.35 kbps | CU-DFMC+DPRD [36] | 31.51 | 30.26 | 28.34 | 26.22 | 33.01 | 31.94 | 30.22 | 28.31
Mobile (qcif) 650.35 kbps | Adaptive JU-DFMC | 33.56 | 30.32 | 28.21 | 26.08 | 36.79 | 36.79 | 36.79 | 36.79
Mobile (qcif) 650.35 kbps | Error resilient JU-DFMC | 34.42 | 32.67 | 31.85 | 30.68 | 36.45 | 36.33 | 36.21 | 36.10
Tempete (qcif) 500.26 kbps | 2HMC [26] | 32.38 | 31.12 | 29.25 | 27.16 | 35.92 | 35.92 | 35.92 | 35.92
Tempete (qcif) 500.26 kbps | CU-DFMC+DPRD [36] | 32.41 | 30.95 | 29.46 | 27.52 | 33.51 | 32.46 | 31.16 | 29.62
Tempete (qcif) 500.26 kbps | Adaptive JU-DFMC | 32.47 | 31.14 | 29.21 | 27.09 | 37.33 | 37.33 | 37.33 | 37.33
Tempete (qcif) 500.26 kbps | Error resilient JU-DFMC | 33.56 | 32.86 | 31.76 | 30.58 | 37.03 | 36.92 | 36.79 | 36.67
Waterfall (cif) 480.78 kbps | 2HMC [26] | 33.04 | 32.08 | 30.46 | 28.42 | 36.75 | 36.75 | 36.75 | 36.75
Waterfall (cif) 480.78 kbps | CU-DFMC+DPRD [36] | 33.06 | 32.12 | 30.53 | 28.65 | 34.58 | 33.79 | 32.43 | 30.78
Waterfall (cif) 480.78 kbps | Adaptive JU-DFMC | 33.12 | 32.13 | 30.42 | 28.38 | 38.79 | 38.79 | 38.79 | 38.79
Waterfall (cif) 480.78 kbps | Error resilient JU-DFMC | 34.96 | 34.25 | 33.67 | 32.43 | 38.50 | 38.41 | 38.31 | 38.22
Container (qcif) 75.24 kbps | 2HMC [26] | 35.55 | 34.09 | 32.54 | 30.71 | 38.18 | 38.18 | 38.18 | 38.18
Container (qcif) 75.24 kbps | CU-DFMC+DPRD [36] | 35.56 | 34.12 | 32.57 | 30.87 | 36.88 | 35.56 | 34.12 | 32.68
Container (qcif) 75.24 kbps | Adaptive JU-DFMC | 35.63 | 34.14 | 32.52 | 30.62 | 39.84 | 39.84 | 39.84 | 39.84
Container (qcif) 75.24 kbps | Error resilient JU-DFMC | 36.82 | 35.92 | 34.69 | 33.87 | 39.53 | 39.43 | 39.32 | 39.20
News (cif) 300.57 kbps | 2HMC [26] | 36.31 | 34.96 | 33.50 | 31.51 | 39.98 | 39.98 | 39.98 | 39.98
News (cif) 300.57 kbps | CU-DFMC+DPRD [36] | 36.43 | 35.01 | 33.56 | 31.72 | 37.82 | 36.61 | 35.12 | 33.52
News (cif) 300.57 kbps | Adaptive JU-DFMC | 36.52 | 35.03 | 33.41 | 31.42 | 41.18 | 41.18 | 41.18 | 41.18
News (cif) 300.57 kbps | Error resilient JU-DFMC | 37.23 | 36.58 | 35.92 | 35.23 | 40.89 | 40.81 | 40.72 | 40.65
Paris (qcif) 200.83 kbps | 2HMC [26] | 31.94 | 30.78 | 29.17 | 27.84 | 33.94 | 33.94 | 33.94 | 33.94
Paris (qcif) 200.83 kbps | CU-DFMC+DPRD [36] | 32.01 | 30.92 | 29.26 | 27.94 | 32.92 | 32.02 | 30.56 | 29.04
Paris (qcif) 200.83 kbps | Adaptive JU-DFMC | 32.13 | 30.92 | 29.08 | 27.73 | 35.82 | 35.82 | 35.82 | 35.82
Paris (qcif) 200.83 kbps | Error resilient JU-DFMC | 33.32 | 32.78 | 32.36 | 31.82 | 35.38 | 35.30 | 35.19 | 35.09
Foreman (qcif) 150.52 kbps | 2HMC [26] | 34.26 | 33.13 | 30.67 | 28.01 | 36.55 | 36.55 | 36.55 | 36.55
Foreman (qcif) 150.52 kbps | CU-DFMC+DPRD [36] | 34.45 | 33.32 | 30.97 | 28.27 | 35.82 | 34.87 | 32.99 | 30.98
Foreman (qcif) 150.52 kbps | Adaptive JU-DFMC | 33.82 | 33.29 | 30.62 | 27.98 | 37.87 | 37.87 | 37.87 | 37.87
Foreman (qcif) 150.52 kbps | Error resilient JU-DFMC | 35.57 | 35.37 | 33.89 | 31.87 | 37.53 | 37.41 | 37.26 | 37.14
Silent (cif) 404.26 kbps | 2HMC [26] | 36.94 | 35.97 | 33.40 | 31.70 | 38.17 | 38.17 | 38.17 | 38.17
Silent (cif) 404.26 kbps | CU-DFMC+DPRD [36] | 37.01 | 36.02 | 33.56 | 31.92 | 37.12 | 36.34 | 35.12 | 33.72
Silent (cif) 404.26 kbps | Adaptive JU-DFMC | 37.19 | 36.04 | 33.32 | 31.67 | 40.02 | 40.02 | 40.02 | 40.02
Silent (cif) 404.26 kbps | Error resilient JU-DFMC | 37.84 | 37.32 | 36.27 | 35.12 | 39.74 | 39.66 | 39.54 | 39.45
Hall (qcif) 41.28 kbps | 2HMC [26] | 35.16 | 35.01 | 33.78 | 32.64 | 34.62 | 34.62 | 34.62 | 34.62
Hall (qcif) 41.28 kbps | CU-DFMC+DPRD [36] | 35.18 | 35.05 | 33.89 | 32.78 | 35.52 | 35.54 | 34.57 | 33.69
Hall (qcif) 41.28 kbps | Adaptive JU-DFMC | 35.23 | 35.09 | 33.73 | 32.61 | 36.49 | 36.49 | 36.49 | 36.49
Hall (qcif) 41.28 kbps | Error resilient JU-DFMC | 35.87 | 35.67 | 34.83 | 33.68 | 36.22 | 36.12 | 36.04 | 35.94

Table III shows the average percentage of blocks in every frame that utilize the LTR or the STR as the reference frame. Intra mode blocks are not included in these results. It can be seen that in the adaptive JU-DFMC, the percentage of blocks which utilize the LTR as the reference frame is higher than that in CU-DFMC.

The R-D curves for some sequences obtained with the proposed adaptive JU-DFMC (Adaptive JU-DFMC), the JU-DFMC in [16], and CU-DFMC with rate control [34] (CU-DFMC+RC) are shown in Fig. 9. Furthermore, the CU-DFMC structure with the proposed JU-DFMC bit allocation profile (CU-DFMC+JU_BA) and single reference frame motion compensation with the proposed JU-DFMC bit allocation profile (SFMC+JU_BA) are also presented in the figure. In CU-DFMC+JU_BA and SFMC+JU_BA, the bit allocation in every frame is the same as that in the proposed adaptive JU-DFMC scheme. However, for every frame in CU-DFMC+JU_BA, only the two most recently decoded frames are utilized for motion compensation. For every frame in SFMC+JU_BA, only the most recently decoded frame is utilized for motion compensation.

The performance of CU-DFMC+JU_BA is worse than that of the proposed adaptive JU-DFMC. This is because in CU-DFMC+JU_BA, some frames following the HQF do not utilize the HQF as a reference to improve coding performance. SFMC+JU_BA also performs worse than the proposed adaptive JU-DFMC. The reason is that only one frame is utilized for motion compensation.

If the proposed bit allocation method does not change, but the LTR interval is 2 frames larger or 3 frames smaller than the actual LTR interval proposed in adaptive JU-DFMC (named LTR_interval+2 and LTR_interval-3 respectively), the performance is given in Fig. 10. If the proposed LTR interval does not change, but the LTR bit allocation is 7% larger or 9% smaller than the actual LTR bit allocation proposed in adaptive JU-DFMC (named LTR_BA+7% and LTR_BA-9% respectively), the performance is given in Fig. 10 as well. The performance of LTR_interval+2, LTR_interval-3, LTR_BA+7% and LTR_BA-9% is slightly lower than that of the proposed adaptive JU-DFMC scheme.
It shows that the proposed adaptive LTR selection and bit allocation are effective.

Fig. 12 Average jump updating parameter of LTR. Axes: GOP length (in frames) versus packet loss rate (%). Curves: Mobile, Foreman.

The frame by frame PSNR comparison of the proposed adaptive JU-DFMC (Adaptive JU-DFMC), JU-DFMC in [16], CU-DFMC+JU_BA, SFMC+JU_BA and CU-DFMC (fixed QP in every frame) under the same bit-rate in sequences Mobile (255.71 kbps), Tempete (220.62 kbps), Waterfall (362.82 kbps) and Container (71.64 kbps) is shown in Fig. 11. It can be seen that the PSNR deterioration after the HQF in [16] is much more graceful than in the proposed scheme, but the PSNR in the majority of LQFs in the proposed scheme is better than that in [16] and CU-DFMC. The PSNR of HQFs in the proposed adaptive JU-DFMC is much larger than that of LQFs. This introduces objectionable pulses in quality over time. However, as mentioned in [14], when the PSNR of the sequence is higher than 30 dB, the pulsing is not perceptible. Furthermore, the pulsing can benefit some applications. In video surveillance, for example, the HQF provides higher image quality for the surveillance content. In video communication or multimedia, the HQF can be quantized by a larger QP when it is played back in the decoder to maintain image quality similar to that of LQFs, so that the overall quality fluctuation in the proposed scheme is less than that in [16]. The image quality in LQFs can likewise be enhanced using the previous and the following HQFs in the decoder.

The experimental results of the error resilient JU-DFMC in the decoder are repeated 100 times using the bit error sequences which are transmitted from the encoder via the error-prone channels [38].
Although the method in [26] is used without the need of live encoding, whereas the proposed error resilient JU-DFMC needs to adaptively adjust the coding parameters, the performance of 2HMC [26] is listed to provide useful information. CU-DFMC+DPRD [36] is also utilized for performance comparison here.

Fig. 13 Ratio of bit allocation in LTR. Axes: bit allocation ratio versus packet loss rate (%). Curves: Mobile, Foreman.

In Table IV, the experimental results of the proposed error resilient JU-DFMC (Error resilient JU-DFMC), CU-DFMC+DPRD [36], the proposed adaptive JU-DFMC (Adaptive JU-DFMC) and 2HMC [26] are listed and compared. Under the given bit-rate, the original PSNR of the test sequences without error is shown on the right side of Table IV. The PSNRs of the error resilient JU-DFMC and the other coding schemes measured under packet loss are listed on the left side of Table IV. 2HMC [26] has slightly lower performance than CU-DFMC+DPRD [36] because error propagation in [26] is alleviated, but not terminated. Adaptive JU-DFMC is slightly better than CU-DFMC+DPRD [36] at low packet loss rates. One reason is that many blocks in an LQF utilize the LTR as their reference frame: if errors occur in an LQF while the affected areas are not utilized as reference for the next LQF, the errors will not be propagated to the following frames. The other reason is that the original PSNR of the adaptive JU-DFMC is better than that of CU-DFMC+DPRD [36]. However, at high packet loss rates, the performance of the adaptive JU-DFMC is lower than that of CU-DFMC+DPRD [36]. This is because the adaptive JU-DFMC cannot terminate error propagation. Compared to CU-DFMC+DPRD [36], the proposed error resilient JU-DFMC achieves a maximum gain of 4.46 dB on the Mobile sequence at a packet loss rate of 20% and an average gain of 2.23 dB over all the sequences and packet loss rates.
This is because in CU-DFMC+DPRD [36], the error propagation is terminated by inserting intra mode blocks, but the R-D cost is huge, and sometimes the error cannot be accurately predicted in the encoder. Compared with the adaptive JU-DFMC, the proposed error resilient JU-DFMC achieves a maximum gain of 4.60 dB on the Mobile sequence at a packet loss rate of 20% and an average gain of 2.27 dB over all the sequences and packet loss rates. This shows that the proposed error resilient JU-DFMC is effective in eliminating error propagation. The elimination of error propagation comes from two factors. The first factor is that if an error occurs in the LQFs of the error resilient JU-DFMC (this kind of error accounts for the largest percentage), the error propagation is terminated at the last LQF of the GOP. The second factor is that even if an error occurs in an HQF, the average error propagation speed is smaller than in the other schemes since the HQF interval is large. However, the HQF is more important than LQFs. More protection of HQFs is needed, as indicated in [20].

In addition, Fig. 12 shows the average jump updating parameter of the LTR at different packet loss rates. Fig. 13 shows the ratio of bit allocation for the LTR under different packet loss rates compared with that without packet loss. It can be seen that with the increase of packet loss rate, the jump updating parameter and the bits in the HQF (LTR) increase adaptively.

VII. CONCLUSIONS

In this paper, an optimal LTR selection and bit allocation for JU-DFMC has been presented. Firstly, the rate-distortion performance of MCP for DFMC was analyzed. Then, based on the analysis, an adaptive JU-DFMC with the optimal LTR selection and bit allocation was given. The proposed adaptive JU-DFMC leads to better performance than previous JU-DFMC schemes. For video transmission over noisy channels, the error propagation of the proposed adaptive JU-DFMC was analyzed.
Furthermore, an error resilient JU-DFMC considering the LTR selection and the bit allocation was presented. The error resilient JU-DFMC obtains a much better performance in video transmission over noisy channels. The proposed schemes can be used in multimedia applications and video surveillance systems. They can also be utilized to guide the rental of extra bandwidth for video transmission over cognitive radio. In the future, the rate control for the proposed scheme will be further exploited.

ACKNOWLEDGMENT

The authors would like to thank the editor and the anonymous reviewers whose invaluable comments and suggestions led to a greatly improved manuscript.

APPENDIX

A. Proof of Equation (11)

We know $0 < P_1^{J}(\Lambda) < P_1^{CH}(\Lambda)$, $0 < P_2^{J}(\Lambda) < P_2^{CH}(\Lambda)$ and $P_1^{J}(\Lambda) < 1$. The difference between the DEVs from STRs in JU-DFMC and CU-DFMC is much smaller than the DEV from the STR in JU-DFMC, so $P_1^{CH}(\Lambda) - P_1^{J}(\Lambda) \ll P_1^{J}(\Lambda)$. From (10), we have

$$\begin{aligned}
P^{HQ} &\approx P_1^{J}(\Lambda)\left(\tfrac{1}{2}P_2^{CH}(\Lambda) - 1\right) - P_1^{J}(\Lambda)\left(\tfrac{1}{2}P_2^{J}(\Lambda) - 1\right) + \left(P_2^{J}(\Lambda) - P_2^{CH}(\Lambda)\right) \\
&= \tfrac{1}{2}P_1^{J}(\Lambda)\left(P_2^{CH}(\Lambda) - P_2^{J}(\Lambda)\right) + \left(P_2^{J}(\Lambda) - P_2^{CH}(\Lambda)\right) \\
&= \left(1 - \tfrac{1}{2}P_1^{J}(\Lambda)\right)\left(P_2^{J}(\Lambda) - P_2^{CH}(\Lambda)\right) < 0.
\end{aligned}$$

B. Proof of Equation (12)

Suppose $\overleftrightarrow{\Phi}_{ee}(\Lambda)$ and $\widehat{\Phi}_{ee}(\Lambda)$ are the R-D performance gain and loss when frame j is encoded as a LQF and as a HQF in JU-DFMC, respectively. The difference between the R-D performance gain $\overleftrightarrow{\Phi}_{ee}(\Lambda)$ and the R-D performance loss $\widehat{\Phi}_{ee}(\Lambda)$ can be obtained as

$$\begin{aligned}
\overleftrightarrow{\Phi}_{ee}(\Lambda) - \widehat{\Phi}_{ee}(\Lambda) &= \tfrac{1}{4}\Bigl(h\bigl(\Phi^{J}_{ee\_1}(\Lambda) - \Phi^{CL}_{ee\_1}(\Lambda)\bigr) + h\bigl(\Phi^{J}_{ee\_2}(\Lambda) - \Phi^{CL}_{ee\_2}(\Lambda)\bigr) \\
&\qquad - h\bigl(\Phi^{CH}_{ee\_1}(\Lambda) - \Phi^{J}_{ee\_1}(\Lambda)\bigr) - h\bigl(\Phi^{CH}_{ee\_2}(\Lambda) - \Phi^{J}_{ee\_2}(\Lambda)\bigr)\Bigr) + \Phi_{ss}(P^{LQ} - P^{HQ}) \\
&= \tfrac{1}{2}h\Bigl(\Phi^{J}_{ee\_1}(\Lambda) - \tfrac{1}{2}\bigl(\Phi^{CL}_{ee\_1}(\Lambda) + \Phi^{CL}_{ee\_2}(\Lambda)\bigr)\Bigr) - \tfrac{1}{2}h\Bigl(\tfrac{1}{2}\bigl(\Phi^{CH}_{ee\_1}(\Lambda) + \Phi^{CH}_{ee\_2}(\Lambda)\bigr) - \Phi^{J}_{ee\_2}(\Lambda)\Bigr) \\
&\qquad + \Phi_{ss}(P^{LQ} - P^{HQ}).
\end{aligned} \tag{B1}$$

For every frame in CU-DFMC, the prediction performance from the reference frames (LTR and STR) is roughly similar, so the PSD in every frame is roughly similar, $\Phi^{CL}_{ee\_1}(\Lambda) \approx \Phi^{CL}_{ee\_2}(\Lambda)$ and $\Phi^{CH}_{ee\_1}(\Lambda) \approx \Phi^{CH}_{ee\_2}(\Lambda)$. In JU-DFMC, the GOP length of the current GOP is nearly the same as that of the previous GOP. For the last several frames in the current GOP, in the coding of the STR, the prediction performance in the STR from its reference frames (its STR and LTR) is nearly the same as that in the coding of the LTR (HQF). So whether the STR is encoded as a LQF or a HQF, the PSD $\Phi^{J}_{ee\_1}(\Lambda)$ in the STR is nearly the same as the PSD $\Phi^{J}_{ee\_2}(\Lambda)$ in the LTR, $\Phi^{J}_{ee\_1}(\Lambda) \approx \Phi^{J}_{ee\_2}(\Lambda)$. So (B1) can be further rewritten as

$$\begin{aligned}
\overleftrightarrow{\Phi}_{ee}(\Lambda) - \widehat{\Phi}_{ee}(\Lambda) &\approx \tfrac{1}{2}h\bigl(\Phi^{J}_{ee\_1}(\Lambda) - \Phi^{CL}_{ee\_1}(\Lambda)\bigr) - \tfrac{1}{2}h\bigl(\Phi^{CH}_{ee\_1}(\Lambda) - \Phi^{J}_{ee\_1}(\Lambda)\bigr) + \Phi_{ss}(P^{LQ} - P^{HQ}) \\
&= \tfrac{1}{2}h\bigl(\overleftrightarrow{\Phi}_{ee\_1}(\Lambda) - \widehat{\Phi}_{ee\_1}(\Lambda)\bigr) + \Phi_{ss}(P^{LQ} - P^{HQ}).
\end{aligned} \tag{B2}$$

In (B2), $\overleftrightarrow{\Phi}_{ee\_1}(\Lambda)$ represents the performance gain when the STR is encoded as a LQF in JU-DFMC, and $\widehat{\Phi}_{ee\_1}(\Lambda)$ represents the performance loss when the STR is encoded as a HQF in JU-DFMC.

C. Proof of Equation (21)

Before and after the change of bit allocation, the PSD in frame i is denoted as $\Phi_{eei}(\Lambda)$ and $\Phi'_{eei}(\Lambda)$ respectively. From (4), we have

$$\begin{aligned}
\Phi'_{eei}(\Lambda) - \Phi_{eei}(\Lambda) &= \tfrac{1}{4}h\bigl(\Phi'_{eei\_1}(\Lambda) - \Phi_{eei\_1}(\Lambda)\bigr) + \tfrac{1}{4}h\bigl(\Phi'_{eei\_2}(\Lambda) - \Phi_{eei\_2}(\Lambda)\bigr) \\
&\quad + \Phi_{ss}(\Lambda)\Bigl(\bigl(\tfrac{1}{2}P'_{i1}(\Lambda)P'_{i2}(\Lambda) - P'_{i1}(\Lambda) - P'_{i2}(\Lambda)\bigr) - \bigl(\tfrac{1}{2}P_{i1}(\Lambda)P_{i2}(\Lambda) - P_{i1}(\Lambda) - P_{i2}(\Lambda)\bigr)\Bigr) \\
&= \tfrac{1}{4}h\bigl(\Phi'_{eei\_1}(\Lambda) - \Phi_{eei\_1}(\Lambda)\bigr) + \tfrac{1}{4}h\bigl(\Phi'_{eei\_2}(\Lambda) - \Phi_{eei\_2}(\Lambda)\bigr) \\
&\quad + \tfrac{1}{2}\Phi_{ss}(\Lambda)\Bigl((2 - P'_{i2}(\Lambda))(2 - P'_{i1}(\Lambda)) - (2 - P_{i2}(\Lambda))(2 - P_{i1}(\Lambda))\Bigr).
\end{aligned}$$

REFERENCES

[1] T. Sikora, “The MPEG-4 video standard verification model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 19–31, Feb. 1997. [2] G. Cote, B. Erol, M. Gallant, and F. Kossentini, “H.263+: Video coding at low bit rates,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 7, pp. 849–865, Nov. 1998. [3] T. Wiegand, “Joint final committee draft for joint video specification H.264,” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-D157, 2002. [4] D. Hepper, “Efficiency analysis and application of uncovered background prediction in a low bit rate image coder,” IEEE Trans. Commun., vol. 38, no. 9, pp. 1578–1584, Sept. 1990. [5] F. Dufaux and F. Moscheni, “Background mosaicking for low bit rate video coding,” in Proc. IEEE Int. Conf. Image Processing, Lausanne, Switzerland, Sept. 1996. [6] ISO/IEC JTC1/SC29/WG11 MPEG96/N1648, “Core experiment on sprites and GMC,” Apr. 1997. [7] M. Budagavi and J.D. Gibson, “Multiframe video coding for improved performance over wireless channels,” IEEE Trans. Image Process., vol. 10, no. 2, pp. 252-265, Feb. 2001. [8] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory motion-compensated prediction,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 70-84, Feb. 1999. [9] A. Leontaris and P. C. Cosman, “Video compression for lossy packet networks with mode switching and a dual-frame buffer,” IEEE Trans. Image Process., vol. 13, no. 7, pp. 885–897, July 2004. [10] ISO/IEC JTC1/SC29/WG11 MPEG96/M0654, “Core experiment of video coding with block-partitioning and adaptive selection of two frame memories (STFM/LTFM),” Dec. 1996. [11] T. Fukuhara, K. Asai, and T. Murakami, “Very low bit-rate video coding with block partitioning and adaptive selection of two time-differential frame memories,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 212-220, Feb. 1997. [12] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Dual frame motion compensation for a rate switching network,” Asilomar Conf. on Signals, Systems, and Comp., Nov. 2003. [13] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Dual frame motion compensation with uneven quality assignment,” IEEE Data Compression Conference 2004, pp. 262-271, Mar 2004. [14] V. Chellappa, P. C. Cosman, and G.M.
Voelker, “Dual frame motion compensation with uneven quality assignment,” IEEE Trans. Circuits Syst. Video Technol., Volume 18, no 2, pp. 249-256, Feb. 2008. [15] M. Tiwari and P. C. Cosman, “Dual frame video coding with pulsed quality and a lookahead window,” IEEE Data Compression Conference 2006, pp. 372-381, Mar 2006. [16] M. Tiwari and P. C. Cosman, “Selection of long-term reference frames in dual-frame video coding using simulated annealing,” IEEE Signal Processing Letters, Vol. 15, pp. 249-252, 2008. [17] A. Leontaris and P. C. Cosman, “Video compression with intra/inter mode switching and a dual frame buffer,” IEEE Data Compression Conference 2003, pp. 63-72, 2003. [18] A. Leontaris, V. Chellappa, and P. Cosman, “Optimal mode selection for a pulsed-quality dual frame video coder,” IEEE Signal Processing Letters, Vol. 11, No. 12, pp. 952-955, Dec. 2004. [19] A. Leontaris and P. C. Cosman, “Dual frame video encoding with feedback,” Asilomar Conf. on Signals, Systems, and Comp., Nov. 2003. [20] V. Chellappa, P. C. Cosman, and G. M. Voelker, “Source and channel coding trade-offs for a pulsed quality video encoder,” 40th Asilomar Conference on Signals, Systems and Computers, Nov. 2006. [21] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Error concealment for dual frame video coding with uneven quality assignment,” IEEE Data Compression Conference 2005, pp. 319-328, Mar. 2005. [22] A. Leontaris and P. C. Cosman, “End-to-end delay for hierarchical B-pictures and pulsed quality dual frame video coders,” in Proc. IEEE Int. Conf. Image Process., pp. 3133–3136, Oct. 2006. [23] A. Leontaris and P. C. Cosman, “Compression efficiency and delay tradeoffs for hierarchical B-pictureframes and pulsed-quality frames,” IEEE Trans. Image Process., vol. 16, no. 7, pp. 1726–1740, July 2007. [24] C.-S. Kim, R.-C. Kim, and S.-U. Lee, “Robust transmission of video sequence using double-vector motion compensation,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 9, pp. 
1011–1021, Sep. 2001.
[25] S. Lin and Y. Wang, “Error resilience property of multihypothesis motion compensated prediction,” IEEE Int. Conf. Image Process., pp. 545–548, 2002.
[26] Y.-C. Tsai, C.-W. Lin, and C.-M. Tsai, “H.264 error resilience coding based on multihypothesis motion-compensated prediction,” Signal Process.: Image Commun., vol. 22, no. 9, pp. 734–751, Oct. 2007.
[27] W.-Y. Kung, C.-S. Kim, and C.-C. Kuo, “Analysis of multihypothesis motion compensated prediction (MHMCP) for robust visual communication,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 146–153, Jan. 2006.
[28] M. Ma, Oscar C. Au, Liwei Guo, S.-H. Gary Chan, Xiaopeng Fan, and Ling Hou, “Alternate motion-compensated prediction for error resilient video coding,” J. Visual Commun. Image Representation, vol. 19, no. 7, Oct. 2008.
[29] I. Rhee and S. Joshi, “Error recovery for interactive video transmission over the internet,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 1033–1049, Jun. 2000.
[30] B. Girod, “The efficiency of motion-compensating prediction for hybrid coding of video sequences,” IEEE J. Sel. Areas Commun., vol. SAC-5, no. 8, pp. 1140–1154, Aug. 1987.
[31] B. Girod, “Efficiency analysis of multihypothesis motion-compensated prediction for video coding,” IEEE Trans. Image Process., vol. 9, no. 2, pp. 173–183, Feb. 2000.
[32] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[33] T. M. Cover and Joy A. Thomas, “Elements of information theory,” Chapter 13, Available: http://www.matf.bg.ac.yu/nastavno/viktor/Rate_Distortion_Theory.pdf
[34] Z. G. Li, F. Pan, K. P. Lim, G. Feng, X. Lin, and S. Rahardja, “Adaptive basic unit layer rate control for JVT,” presented at the 7th JVT Meeting, JVT-G012-r1, Thailand, Mar. 2003.
[35] Y. Zhang, Wen Gao, and Debin Zhao, “Joint data partition and Rate-Distortion optimized mode selection for H.264 error-resilient coding,” IEEE 8th Workshop on Multimedia Signal Process., pp. 248–251, Oct.
2006.
[36] Y. Zhang, Wen Gao, Yan Lu, Qingming Huang, and Debin Zhao, “Joint source-channel rate-distortion optimization for H.264 video coding over error-prone networks,” IEEE Trans. Multimedia, vol. 9, no. 3, pp. 445–454, Apr. 2007.
[37] S. Wenger, “Error patterns for Internet experiments,” ITU-T SG16, Doc. Q15-I-16r1, 1999. http://ftp3.itu.int/av-arch/video-site/9910_Red/q15i16r1.zip.
Da Liu received the M.Sc. degree in computer science from Harbin Institute of Technology, China, in 2004. He is currently working toward the Ph.D. degree in computer science at Harbin Institute of Technology. In 2005, he joined the Joint Research and Development Laboratory as a research assistant. His research interests include image/video coding, image processing, and computer vision.
Debin Zhao received the B.S., M.S., and Ph.D. degrees from the Harbin Institute of Technology (HIT), Harbin, China, in 1985, 1988, and 1998, respectively, all in computer science. He joined the Department of Computer Science, HIT, as an Associate Professor in 1993. He is currently a Professor with HIT and the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include video coding and transmission, multimedia processing, and pattern recognition. He has authored or coauthored over 200 publications. Dr. Zhao was the recipient of three National Science and Technology Progress Awards of China (Second Prize) and the Excellent Teaching Award from the Baogang Foundation.
Xiangyang Ji received the B.S. and M.S. degrees in computer science from Harbin Institute of Technology, Harbin, China, in 1999 and 2001, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology and the Graduate School of the Chinese Academy of Sciences, Beijing, in 2008. He has authored or coauthored over 40 conference and journal papers. He is now with the Broadband Networks & Digital Media Laboratory of the Automation Department, Tsinghua University.
His research interests include video/image coding, video streaming, and multimedia processing.
Wen Gao (M’92–SM’05–F’09) received the Ph.D. degree in electronics engineering from the University of Tokyo, Japan. He is a professor of computer science at Peking University, China. Before joining Peking University, he was a full professor of computer science at Harbin Institute of Technology from 1991 to 1995. From 1996 to 2005, he was with the Chinese Academy of Sciences (CAS), where he served as a professor, the managing director of the Institute of Computing Technology of CAS, the executive vice president of the Graduate School of CAS, and a vice president of the University of Science and Technology of China. He is the editor-in-chief of Journal of Computer (a journal of the Chinese Computer Federation), an associate editor of IEEE Transactions on Circuits and Systems for Video Technology, an associate editor of IEEE Transactions on Multimedia, an associate editor of IEEE Transactions on Autonomous Mental Development, an area editor of EURASIP Journal of Image Communications, and an editor of Journal of Visual Communication and Image Representation. He has chaired a number of prestigious international conferences on multimedia and video signal processing, and has also served on the advisory and technical committees of numerous professional organizations. He has published extensively, including four books and over 500 technical articles in refereed journals and conference proceedings in the areas of image processing, video coding and communication, pattern recognition, multimedia information retrieval, multimodal interface, and bioinformatics. He is a Fellow of the IEEE.