Dual Frame Motion Compensation with Optimal Long

advertisement
PAPER IDENTIFICATION NUMBER:3231
1
Dual Frame Motion Compensation with Optimal Long-term
Reference Frame Selection and Bit Allocation
Da Liu, Debin Zhao, Xiangyang Ji, and Wen Gao, Fellow, IEEE
Abstract—In dual frame motion compensation (DFMC), one
short-term reference frame (STR) and one long-term reference
frame (LTR) are utilized for motion compensation. The
performance of DFMC is heavily influenced by the jump updating
parameter and bit allocation for the reference frames. In this
paper, firstly the rate-distortion performance analysis of motion
compensated prediction in DFMC is presented. Based on this
analysis, an adaptive jump updating DFMC (JU-DFMC) with
optimal LTR selection and bit allocation is proposed.
Subsequently, an error resilient JU-DFMC is further presented
based on the error propagation analysis of the proposed adaptive
JU-DFMC. The experimental results show that the proposed
adaptive JU-DFMC achieves better performance over the existing
JU-DFMC schemes and the normal DFMC scheme in which the
temporally most recently decoded two frames are used as the
references. The performance of the adaptive JU-DFMC is
significantly improved for video transmission over noisy channels
when the specified error resilience functionality is introduced.
Index Terms—Video coding, motion compensation, dual frame
motion compensation, bit allocation, error propagation, error
resilience.
I. INTRODUCTION
M
otion-compensated prediction in inter-prediction coding
plays an important role in the existing hybrid video
codecs such as MPEG-4 [1], H.263 [2], and H.264/AVC [3].
For each inter-block in the current frame, its prediction signal is
obtained from the reference frame via motion compensation.
Subsequently, the difference between the current original frame
and its prediction is compressed and transmitted. Multi-frame
motion compensation [4,5,6,7,8] allows that more than one
reference frame can be used for the motion-compensated
prediction. In most cases, it improves coding performance
significantly. However, with the increase of the number of
reference frames, the memory storage and the motion searching
Manuscript received July 1, 2008; revised July 26, 2009. This work was
supported by National Science Foundation of China under Grant 60736043 and
National Basic Research Program of China (973 Program, 2009CB320903).
This paper was recommended by Associate Editor Eckehard Steinbach.
D. Liu and D. Zhao are with the Department of Computer Science, Harbin
Institute of Technology, Harbin, 150001, China. (e-mail: dliu@jdl.ac.cn,
dbzhao@jdl.ac.cn)
X. Ji is with Broadband Networks & digital Media Laboratory of
Automation Department, Tsinghua University, Beijing, 100084, China (e-mail:
xyji@mail.tsinghua.edu.cn)
W. Gao is with the Key Laboratory of Machine Perception, the School of
Electronic Engineering and Computer Science, Peking University, No.5,
Yiheyuan Road, Beijing 100871, China(e-mail:wgao@pku.edu.cn)
Copyright (c) 2009 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
LTR Buffer
STR Buffer
Fig. 1 Dual frame motion compensation
Current Frame
complexity increase dramatically.
Dual frame motion compensation (DFMC) is the special case
of multi-frame motion compensation in which only two
reference buffers are utilized, thus requiring a relatively modest
increase in memory storage and motion searching complexity.
In DFMC, as shown in Fig. 1, the first reference buffer contains
the most recently decoded frame, called short-term reference
frame (STR), and the second one contains a reference frame
from the past that is periodically updated, called long-term
reference frame (LTR).
Generally, there are two types of approaches for DFMC [9].
The first approach is jump updating DFMC (JU-DFMC), in
which LTR remains static for N frames, and jumps forward to
be the frame at a distance 2 back from the frame to be encoded.
For example, suppose for the frames from time instant i-N+1 to
i, the LTR is i-N-1. Then after encoding frame i, when the
encoder moves on to encoding frame i+1, the STR will slide
forward by one to frame i, and the LTR will jump forward by N
to frame i-1. After that, the LTR remains fixed for N frames,
and then jumps forward again. N is called the jump update
parameter. The second approach is continuous updating DFMC
(CU-DFMC), where the LTR for each current frame always has
a fixed temporal distance D, called the continuous update
parameter, to the current frame. As a result, every frame has a
chance serving as a STR and as a LTR.
A number of DFMC based approaches to improve the video
coding performance have been reported in the literatures. In
[10], a refreshing rule of LTR was proposed by introducing
scene changing detection. In [11], the concept of the dual frame
was simulated in a low bandwidth situation by the
block-partitioning prediction and the utilization of two time
differential reference frames. [12] has found that using a high
quality frame as a reference frame for the following frames will
benefit the overall performance. [13] and [14] have shown that
PSNR is influenced by the different extra bandwidth and the
period giving to the LTR. In [15], the update period of the LTR
was set to ten frames. The PSNR of nine frames that follow the
LTR frame was utilized to determine how many bits can be
allocated to the LTR. In [16], simulated annealing was utilized
to select the LTR; however, its computational complexity is
relatively high.
PAPER IDENTIFICATION NUMBER:3231
For video transmission over noisy channels, in [9] [17] and
[18], the recursive optimal per-pixel estimate (ROPE)
algorithm was utilized to provide mode decision in dual frame
coding. They had pointed out that the fixed jump updating
parameter of LTR was not optimal for all sequences. Feedback
was also utilized in dual frame coding in [19] to control the drift
errors. In [20], the uneven allocation of error protection to the
LTR was examined. It showed that assigning higher error
protection for the LTRs was better than assigning equal error
protection for all frames. In [21], a binary decision tree
designed by the classification and regression trees (CART)
algorithm was utilized to choose among various error
concealment choices in the dual frame coding. The trade-off of
end-to-end delay and compression efficiency in dual frame
coding motion compensation was investigated in [22] and [23].
Multihypothesis motion compensated prediction (MHMCP)
was also utilized to enhance error resilience. In [24], each block
is predicted from two reference blocks using two motion
vectors. In [25], the error propagation model of MHMCP
jointly considered the coding efficiency and error resilience in
predictor selection. Furthermore, the reference picture
interleaving and data partitioning was utilized in [26] to make
MHMCP more resilient to channel errors. In [27], the error
propagation impact in MHMCP was examined and the
rate-distortion performance considering the hypothesis number
and coefficients were analyzed. In [28], two-hypothesis
prediction and one-hypothesis prediction were adaptively used
to decrease error propagation. In [29], all frames were divided
into period frame and non-period frame. The period frame has
fixed distance between each other. For all non-period frames,
only a previous period frame was utilized as the reference
frame. However, the adaptive period frame selection and bit
allocation for different packet loss rate was not reported.
For different video sequences, adaptive LTR selection and
bit allocation to improve coding efficiency have not been fully
studied yet. In this paper, the rate distortion (R-D) performance
for motion compensated prediction (MCP) in DFMC is
analyzed. Based on the analysis, an adaptive JU-DFMC with
optimal LTR selection and bit allocation is provided.
Furthermore, for video transmission over noisy channels, the
error propagation in the proposed adaptive JU-DFMC is
analyzed first, then an error resilient JU-DFMC is presented.
The rest of the paper is organized as follows. In Section II,
the R-D performance analysis for MCP in DFMC is given.
Based on the analysis, optimal LTR selection and bit allocation
in the adaptive JU-DFMC are separately presented in Section
III and Section IV. In Section V, based on the error propagation
analysis for the proposed adaptive JU-DFMC, an error resilient
JU-DFMC is presented for video transmission over noisy
channels. The experimental results and discussions are
provided in Section VI. Finally, Section VII concludes this
paper.
II. RATE DISTORTION PERFORMANCE ANALYSIS FOR MCP IN
DFMC
In this section, the R-D performance analysis for MCP is
2
firstly presented. Secondly, the prediction error variances in
both JU-DFMC and CU-DFMC are formulated.
A. Power Spectral Density
Power spectral density (PSD) Φ(ω x , ω y ) describes the
power of a signal as a function of frequency and is achieved as
∞
Φ (ω x , ω y ) = ∫−∞
R (τ )e −2π i (ω x ,ω y )τ dτ ,
(1)
where (ω x , ω y ) is a vector representing the two-dimensional
spatial frequency, R(τ ) is an autocorrelation function that
describes the correlation between different time points.
The rate-distortion analysis of MCP [30] relates the PSD of
the prediction error to the accuracy of motion compensation
captured by the probability density function (pdf) of
displacement error. The rate-distortion analysis was extended
to the multihypothesis prediction in [31]. Especially, as
described in [23], the PSD for two hypotheses prediction can be
simplified to
⎛ 6 + α 1 + α 2 + 2 P1 (Λ ) P2 (Λ)
⎞
Φ ee (Λ ) = Φ ss (Λ ) ⎜
− P1 (Λ ) − P2 (Λ ) ⎟ , (2)
4
⎝
⎠
where Φ ee (Λ ) is the PSD of the prediction error, Φ ss (Λ ) is the
signal power spectrum of the input video signal and is
non-negative, Λ = (ω x , ω y ) , and
P(ω x , ω y ) = e −2πσ + (ω x +ω y ) .
2
2
2
(3)
P1 ( Λ ) and P2 (Λ ) are separately the displacement error pdfs
from the first and the second hypotheses. σ +2 is the
displacement error variance (DEV) and it reflects the
inaccuracy of the displacement vector used for the motion
compensation [30]. α i represents the spectral noise-to-signal
power ratio in the ith hypothesis and it has been given in [31]
as α i = Φ nn _ i (Λ) / Φ ss (Λ ) . Φ nn _ i (Λ) represents the PSD of
residual noise in the ith hypothesis and has been given in [30]
as Φ nn _ i (Λ ) = max[0,θ (1 − θ / Φ ee _ i (Λ ))] . Φ ee _ i (Λ ) is the PSD
of the prediction error in the ith hypothesis. θ is a parameter
that generates the rate-distortion function by taking on all
positive real values [30]. If Φ nn _ i (Λ ) = 0 , it has no influence on
Φ ee (Λ ) . If Φ nn _ i (Λ ) = θ (1 − θ / Φ ee _ i (Λ )) , since Φ nn _ i (Λ ) is
nearly linear proportional to Φ ee _ i (Λ ) in the short term, it can
be simplified as Φ nn _ i (Λ ) = h(Φ ee _ i (Λ )) . h( ) is the linear
function that represents the relationship between Φ nn _ i (Λ ) and
Φ ee _ i ( Λ ) . So considering both cases, (2) can be re-written as
Φee (Λ) =
h(Φee _1) + h(Φee _2)
4
+Φ ss (Λ)
3 + P1(Λ)P2(Λ) − 2P1(Λ) − 2P2(Λ)
.(4)
2
The PSD Φ ee (ω x , ω y ) is utilized as the performance
measurement. From [31],
∆R =
Φ ee (ω x , ω y )
1 + π +π
)dω x dω y ,
∫ ∫ log 2 (
Φ ss (ω x , ω y )
8π 2 −π −π
(5)
where ∆R represents the maximum bit-rate reduction (in
PAPER IDENTIFICATION NUMBER:3231
3
11
10
Prediction error variances from LTR
Prediction error variances from STR
7
1
3
5
7
Frame number
9
11
Fig. 3 The prediction error variance σ e2 in JU-DFMC
f i −HQN −1 fi −LQN f i −LQN +1 f i −LQN + 2 f i −LQN + 3 f i −LQ2 f i −HQ1
LTR
9
8
prediction structure is smaller than the other, then its ∆R will
be smaller, and its coding efficiency will be better.
B. Prediction Error Variance
Mobile(qcif)
12
Variancee
bits/sample) by optimum encoding of the prediction error,
compared to the optimum intra frame encoding of the signals
for the same mean squared error (MSE) [32]. A negative ∆R
corresponds to a reduced bit-rate compared to the optimum
intra frame coding. From (5), we can conclude that when the
reconstructed frames from two prediction structures have the
same MSE, if the PSD Φ ee (ω x , ω y ) in the frame from one
f i LQ
recently decoded two frames are separately STR and LTR.
Therefore for every frame in CU-DFMC, the prediction
performance from LTR is nearly the same and thus σ e2 from
LTR is nearly the same. The same conclusion can also be
obtained for STR prediction.
Generally, if the prediction performance is better, the PEV
σ e2 will be smaller. Since σ e2 is nearly directly proportional to
fi +LQ1
LTR
Fig. 2 JU-DFMC structure
The PSD Φ ee (Λ ) is related to the displacement error pdf.
From (3), the displacement error pdf is determined by the DEV
σ +2 . In video coding, the displacement error is obtained from
the distance between the actual motion vector and its estimated
one. The DEV reflects the inaccuracy of the motion
compensation [30]. The prediction error variance (PEV) is the
variance of the motion compensated error between the original
value and the reference value, it also reflects the inaccuracy of
the motion compensation. From [23], in the short term, the
DEV σ +2 is nearly directly proportional to the PEV σ e2 . So
σ e2 plays a key role in determining Φ ee (Λ ) . In the following,
we will give the analysis of σ e2 .
In JU-DFMC, there are two kinds of frames. As shown in Fig.
2, one kind of frames is called high quality frame (HQF) with
HQ
relatively more bits allocated, such as the (i-1)th frame f i -1
and the (i-N-1)th frame f i -HQ
N -1 . The other kind of frames has
f i LQ
+k
relatively lower quality (LQF), such as the (i+k)th frame
(k=-N, -N+1, …, -2, 0, 1). To simplify our discussion, all these
LQFs are coded to have similar MSEs. Any one frame (HQF or
LQF) can be utilized as STR for the next frame. Meanwhile, the
HQF is utilized as LTR for following several frames. One
example of the PEV σ e2 in JU-DFMC is shown in Fig. 3. The
frame at time instant 1 is encoded as a HQF, and then utilized as
the LTR for following several frames. The frames at other time
instants are encoded as LQFs (in the figure, bit rate in LQF is
420.62kbps, bit allocation in HQF is 3 times that in LQF). In
every encoded LQF, the prediction performance from STR is
similar, so σ e2 from STR (dashed line) is similar. For the
second LQF (at time instant 3) following the LTR, the
prediction performance from the LTR is better, thus σ e2 is
smaller. With the coding of the following frames, the prediction
performance from the LTR degrades while σ e2 from the LTR
increase (solid line).
In CU-DFMC of this work, every frame is allocated
approximately the same quality. The continuous update
parameter D is generalized as 2. For every frame, the most
σ +2 [23], σ +2 is smaller as well.
III. OPTIMAL LTR SELECTION IN JU-DFMC
In this section, the optimal LTR selection in JU-DFMC is
presented. In the first subsection, the coding performance of the
same LQF in CU-DFMC and JU-DFMC is compared. In the
second subsection, the coding performance of the same HQF in
CU-DFMC and JU-DFMC is compared. The optimal LTR
selection in JU-DFMC is provided in the last subsection.
A. Coding Performance Comparison of LQF
LQ
f i −HQ
f i −LQ
f i −LQ2 f i −LQ
4
3
1 fi
LTR
Fig. 4 LQF i in JU-DFMC
fi −LQ4 fi −LQ3 fi −LQ2 fi −LQ1 fi LQ
Fig. 5 LQF i in CU-DFMC
The frame at instant i can be encoded as a LQF (denoted as
f i LQ ) in JU-DFMC or in CU-DFMC (as shown in Fig. 4 and
Fig. 5), its power spectral densities are separately represented
J ( Λ ) and Φ CL ( Λ ) . For giving the performance
as Φ ee
ee
comparison when f i LQ is encoded in the two DFMC coding
structures, the reconstructed f i LQ s in both structures are
assumed to have the same MSE. Then the difference of power
sut ee (Λ ) between the two f LQ s in the two
spectral densities Φ
i
structures can be derived from (4) as follows:
sut ee (Λ) = Φ J (Λ) − Φ CL (Λ)
Φ
ee
ee
,(6)
1
J (Λ) − Φ CL (Λ)) + h(Φ J (Λ) − Φ CL (Λ))) + Φ (Λ)P LQ
= (h(Φ ee
ss
ee _1
ee _ 2
ee _ 2
_1
4
J
CL
where Φ ee
_ k ( Λ ) and Φ ee _ k ( Λ ) are separately the PSD in the
kth hypothesis (k=1 or 2 represent taking STR or LTR as
hypothesis here and after) in JU-DFMC and CU-DFMC, and
PAPER IDENTIFICATION NUMBER:3231
4
1
1
P LQ = P1J (Λ) ×( P2J (Λ) −1) − P2J (Λ) − (P1CL(Λ) × ( P2CL(Λ) −1) − P2CL(Λ)) .(7)
2
2
In (7), P1J (Λ ) and P2J (Λ ) separately represent the
displacement error pdfs from STR and LTR in JU-DFMC,
P1CL (Λ ) and P2CL (Λ ) separately represent the displacement
error pdfs from STR and LTR in CU-DFMC.
J ( Λ ) − Φ CL ( Λ ) depends on Φ ( Λ ) P LQ and
From (6), Φ ee
ee
ss
J
CL
J
CL
h(Φ ee
_ k ( Λ ) − Φ ee _ k ( Λ )) . The calculation of Φ ee _ k (Λ) − Φ ee _ k (Λ)
J ( Λ ) − Φ CL ( Λ ) . Therefore in iterative formula
is the same as Φ ee
ee
J ( Λ ) − Φ CL ( Λ ) .
(6), P LQ plays a dominant role in Φ ee
ee
In JU-DFMC and CU-DFMC, the reconstructed f i LQ s are
assumed to have the same MSE. STR in both structures has
nearly the same MSE as the current reconstructed f i LQ , and the
temporal distance from STR to the current f i LQ is the same
(one frame distance), then the prediction performance from
STRs in both structures is nearly the same. From Subsection
II-B, the prediction performance influences the DEV σ +2 , so in
the two DFMC coding structures, σ +2 from STR is nearly the
same. According to (3), P1J (Λ) is nearly the same as P1CL (Λ ) .
Then (7) can be further represented as
1
P LQ = (1 − P1J (Λ))( P2CL (Λ ) − P2J (Λ )) .
2
(8)
In JU-DFMC (in Fig. 4), the MSE of LTR is smaller than that of
the current reconstructed f i LQ . In CU-DFMC (in Fig. 5), the
MSE of LTR is nearly the same as that of the current
reconstructed f i LQ . When f i LQ is located in the several
previous LQFs following LTR in JU-DFMC, the prediction
performance from LTR is better than that when f i LQ is
encoded in CU-DFMC. From the analysis in Subsection II-B, if
the prediction performance is better, the DEV σ +2 will be
smaller, so σ +2 from LTR in JU-DFMC is smaller than that in
CU-DFMC. According to (3), P2CL (Λ ) is smaller than P2J (Λ) .
And the σ +2 is always larger than 0, then P1J (Λ ) < 1 . According
to
the
above
analysis
and
(8),
P LQ = (1−1/2P1J (Λ))(P2CL (Λ) − P2J (Λ)) < 0 . Since P LQ palys a
J ( Λ ) − Φ CL ( Λ ) , we can get
dominant role in determination of Φ ee
ee
J ( Λ ) − Φ CL ( Λ ) < 0 . This means that compared to encoding
Φ ee
ee
f i LQ using CU-DFMC, the coding of f i LQ using JU-DFMC
has better R-D performance (represented as bits saving in the
same reconstructed MSE) if f i LQ is located in the several
previous LQFs following LTR. The R-D performance gain is
sut ee (Λ ) .
represented as Φ
reconstructed f i HQ s in both structures are assumed to have the
same MSE. Then, the difference of power spectral densities
Φ ee (Λ ) between the two f i HQ s in the two structures can be
derived from (4) as follows:
J
Φee (Λ) = ΦCH
ee (Λ) −Φ ee (Λ)
, (9)
1
J
CH
J
HQ
= (h(ΦCH
ee _1(Λ) −Φ ee _1(Λ)) + h(Φ ee _2 (Λ) −Φ ee _2 (Λ))) +Φ ss (Λ)P
4
J
where Φ CH
ee _ k ( Λ ) and Φ ee _ k ( Λ ) (k=1 or 2) are separately the
PSD in the kth hypothesis in CU-DFMC and JU-DFMC, and
1
1
PHQ = ((P1CH(Λ)×( P2CH(Λ) −1) − P1J (Λ)×( P2J (Λ) −1)) + (P2J (Λ) − P2CH(Λ)) . (10)
2
2
In (10), P1J (Λ ) and
P2J ( Λ ) separately represent the
displacement error pdfs from STR and LTR in JU-DFMC,
P1CH ( Λ ) and P2CH (Λ ) separately represent the displacement
error pdfs from STR and LTR in CU-DFMC.
J
HQ and h(ΦCH (Λ) −ΦJ (Λ))
ΦCH
ee (Λ) −Φee (Λ) depends on Φss (Λ)P
ee _ k
ee _ k
J
(k=1 or 2). The calculation of Φ CH
ee _ k ( Λ ) − Φ ee _ k ( Λ ) is the same
J
HQ
as Φ CH
ee ( Λ ) − Φ ee ( Λ ) . Therefore in iterative formula (9), P
J
plays a dominant role in Φ CH
ee ( Λ ) − Φ ee ( Λ ) .
The reconstructed f i HQ s in both structures are assumed to
have the same MSE. In CU-DFMC (in Fig. 7), STR and LTR
have nearly the same MSE as the current reconstructed f i HQ .
In JU-DFMC (in Fig. 6), STR has larger MSE than the current
reconstructed f i HQ ; LTR has nearly the same MSE as the
current reconstructed f i HQ , but the long temporal distance
weakens its influence. So the prediction performance from
LTR and STR in CU-DFMC is separately better than those in
JU-DFMC. From the analysis in Subsection II-B, if the
prediction performance is better, the DEV σ +2 will be smaller,
and according to (3), we can have 0 < P1J (Λ ) < P1CH (Λ) ,
0 < P2J ( Λ ) < P2CH (Λ ) , and P1J ( Λ ) < 1 . From the above analysis
and (10), we can get (see Appendix A)
1
P HQ ≈ (1 − P1J (Λ )) × ( P2J (Λ ) − P2CH (Λ )) < 0 .
2
(11)
J
Since P HQ plays a key role in determining Φ CH
ee ( Λ ) − Φ ee ( Λ ) ,
J
we can get Φ CH
ee ( Λ ) − Φ ee ( Λ ) < 0 . This means that compared
with the coding of f i HQ using CU-DFMC, the coding of f i HQ
in JU-DFMC results in R-D performance loss (represented as
more bits consumption in the same reconstructed MSE),
denoted as Φ ee (Λ ) .
B. Coding Performance Comparison of HQF
The frame at instant i can also be encoded as a HQF (denoted
as f i HQ ) in JU-DFMC or in CU-DFMC (as shown in Fig. 6 and
Fig. 7), its PSDs are separately represented as
J (Λ )
Φ ee
and
HQ
Φ CH
ee ( Λ ) . For giving the performance comparison when f i
is encoded in the two DFMC coding structures, the
HQ
f i −HQ
f i −LQ3 f i −LQ2 f i −LQ
1 fi
4
LTR
Fig. 6 HQF i in JU-DFMC
HQ
HQ
HQ HQ
f i −HQ
4 f i − 3 f i − 2 f i −1 f i
Fig. 7 HQF i in CU-DFMC
PAPER IDENTIFICATION NUMBER:3231
C. Optimal LTR Selection
Before describing the proposed LTR selection method, some
frequently used notations in the section is given in Table I.
Table I. Notations
Variable
Definition
σ+2_JL
DEV from LTR in JU-DFMC
σ+2_CLL
σ +2_CHL
σ e2_CLL
σ e2 _ CHL
σ e2_ JL2
σ e2_ JSA
DEV from LTR in CU-DFMC, in which every frame is
LQF
DEV from LTR in CU-DFMC, in which every frame is
HQF
PEV from LTR in CU-DFMC, in which every frame is
LQF
PEV from LTR in CU-DFMC, in which every frame is
HQF
PEV in the second LQF following LTR (HQF) in
JU-DFMC when the reference frame is LTR
average PEV in LQFs when the reference frame is STR
5
consequently, P2CL (Λ) ≈ 1− k ×σ +2_CLL and P2CH (Λ) ≈ 1− k ×σ +2 _CHL ,
where σ +2_ JL is the DEV from LTR in JU-DFMC; σ +2 _ CLL is
the DEV from LTR in CU-DFMC, in which every frame is a
LQF; and σ +2_ CHL is also the DEV from LTR in CU-DFMC,
but in which every fame is a HQF. Then P LQ and P HQ can be
separately written as
1
1
PLQ = (1− P1J (Λ))×(P2CL(Λ) − P2J (Λ)) ≈−(1− P1J (Λ))×k ×(σ+2_CLL −σ+2_ JL) ,(13)
2
2
1
1
PHQ ≈ (1− P1J (Λ))×(P2J (Λ) − P2CH(Λ)) ≈−(1− P1J (Λ))×k ×(σ+2_JL −σ+2_CHL) .(14)
2
2
The DEV σ +2 can be used to compare the performance and
determine the next LTR. In JU-DFMC, with the encoding of
the frame j (j=i+1, i+2, i+3…) following LTR, the prediction
performance from the LTR decreases, and the DEV σ +2_ JL
from the LTR increases. When σ +2_ JL from LTR is less than
1) LTR Selection Strategy
(σ 2
In this subsection, based on the previous analysis of the
performance comparison, the optimal LTR selection in
JU-DFMC is presented. In CU-DFMC, the bit allocation in
every frame can be changed, but the R-D performance is fixed.
For example, although the quality of every LQF in Fig. 5 is
lower than that of every HQF in Fig. 7, the R-D performance is
stable. So the R-D performance in CU-DFMC is used to
compare with the R-D performance in JU-DFMC. In
JU-DFMC, the R-D performance in LQF and HQF is different.
From Subsection III-A, compared with the coding of f i LQ in
sut ee (Λ ) .
P HQ is larger than P LQ and Φ ee ( Λ ) is larger than Φ
CU-DFMC, the encoded f i LQ in JU-DFMC has better R-D
performance if f i LQ is located in the several previous LQFs
following LTR. From Subsection III-B, compared with the
coding of f i HQ in CU-DFMC, the encoded f i HQ in JU-DFMC
has lower R-D performance. In JU-DFMC, with the coding of
frames following LTR, the R-D performance gain when the
frame is encoded as a LQF and the R-D performance loss when
the frame is encoded as a HQF are utilized to determine the end
of LQF coding and beginning the next HQF (LTR).
When a frame is separately encoded as a LQF and a HQF in
JU-DFMC, the R-D performance gain and loss are separately
sut ee (Λ ) and Φ ee ( Λ ) . The difference of R-D
denoted as Φ
performance gain and loss can be obtained as (see Appendix B)
sut ee (Λ) − Φ ee (Λ) ≈ 1 h(Φ
sut ee _1 (Λ) − Φ ee _1 (Λ)) + Φ ( P LQ − P HQ ) . (12)
Φ
ss
2
s
u
t
In (12), Φ ee _1 (Λ ) and Φ ee _1 (Λ) are the R-D performance gain
and loss when STR is separately encoded as a LQF or a HQF in
sut ee _1 ( Λ ) − Φ ee _1 ( Λ ) is the same
JU-DFMC, the calculation of Φ
sut ee (Λ) −Φ ee (Λ) , therefore in iterative formula (12),
as Φ
sut ee ( Λ ) − Φ ee (Λ ) .
( P LQ − P HQ ) plays a dominant role in Φ
P LQ and P HQ are determined by the DEV σ +2 . In the short
term, P2 (Λ ) has nearly linear relationship with σ +2 . For the
simplicity of calculation, we suppose P2J (Λ ) ≈ 1 − k × σ +2_ JL ,
2
+_ CLL + σ +_ CHL ) / 2
, (σ +2 − σ +2 _ CHL ) < (σ +2 _ CLL − σ +2 ) . Then
The R-D performance gain when frame j is encoded as a LQF is
larger than the R-D performance loss when frame j is encoded
as a HQF. At this point, if the frame j is encoded as a LQF,
JU-DFMC will have better performance. Otherwise, when
σ +2_ JL from the LTR is larger than (σ +2 _ CLL + σ +2 _ CHL ) / 2 , the
R-D performance gain will be smaller than the performance
loss. At this point, if the frame j is encoded as a LQF, the coding
of the next HQF will result in more quality loss, thus the
performance of JU-DFMC will decrease.
From the above analysis, we select the DEV
σ +2_ T = (σ +2_ CLL + σ +2_ CHL ) / 2
(15)
as the point to terminate the LQF coding. When σ +2_ JL from
LTR is larger than σ +2_ T , the next frame is encoded as a HQF.
2) Implementation in Video Coding
In the short term, the DEV σ +2 is nearly directly
proportional to the PEV σ e2 [23]. Then in JU-DFMC,
σe2_T = (σe2_CLL +σe2_CHL)/ 2 can be utilized instead of σ +2_ T as the
point to terminate the coding of LQF, where σ e2_ T is the PEV
from LTR in JU-DFMC; σ e2 _ CLL is the PEV from LTR in
CU-DFMC, in which every frame is a LQF; σ e2 _ CHL is also
the PEV from LTR in CU-DFMC, in which every frame is a
HQF.
σ e2 _ CLL and σ e2 _ CHL cannot be obtained in JU-DFMC.
HQFs in CU-DFMC and JU-DFMC are assumed to have the
same MSE, then σ e2 _ CHL can be replaced by σ e2_ JL 2 , which is
the PEV in the second LQF following LTR (HQF) in
JU-DFMC when the reference frame is LTR. LQFs in
CU-DFMC and JU-DFMC are assumed to have the same MSE.
If the quality of the reference frame is the same, and the
temporal distance between the two reference frames is only one
PAPER IDENTIFICATION NUMBER:3231
6
n Φ eeT ( Λ ) − Φ eeT (Λ ) = ∑ (Φ eei (Λ ) − Φ eei (Λ )) .
frame, the PEVs from the two reference frames are similar, so
σ e2 _ CLL can be replaced by the PEV in LQF in JU-DFMC
when the reference frame is a STR (LQF). For safety, σ e2 _ CLL
is replaced by σ e2_ JSA , which is the average PEV in LQFs
(from the second LQF to the last encoded LQF) in the GOP
when the reference frame is STR.
So finally in JU-DFMC, σ e2_ T = (σ e2_ JL 2 + σ e2_ JSA ) / 2 is
utilized instead of σ e2_ T = (σ e2 _ CLL + σ e2_ CHL ) / 2 as the point to
terminate the coding of LQF.
In determing bit allocation, PSD Φ ee (ω x , ω y ) is compared
under the same bit rate. According to Parseval’s relation,
σ =
4π
+π f sx +π f sy
Φ ee (ω x , ω y )dω x dω y
2 ∫−π f sx ∫−π f sy
,
(16)
where terms f sx and f sy are the spatial sampling frequencies
in horizontal and vertical directions. From (16), if Φ ee (ω x , ω y )
is smaller, σ e2 will be smaller. And from [33],
σ2
1
R ( D) = log 2 ( e ) .
D
2
(17)
Under the same bit-rate, if σ e2 is smaller, D (MSE) will be
smaller, and the coding efficiency is better. Therefore, given
the same bit rate, if PSD Φ ee (ω x , ω y ) is smaller, the coding
efficiency will be better.
In JU-DFMC, one HQF and the following LQFs comprise a
group of pictures (GOP). The total PSD of all frames in a GOP
is utilized as performance measure of bit allocation. Assuming
the overall bit rate of a GOP is fixed, the bit allocation between
HQF and LQFs can be changed. The changed bits in LQFs are
assumed to be averagely allocated to every LQF, thus the
changed MSE in every LQF is nearly the same. With the
change of bit allocation between HQF and LQFs under the
overall bit rate of the GOP, the PSD in HQF and LQFs will
change, when the total PSD is the smallest, the coding
efficiency is the best, and then the bit allocation ratio between
HQF bits and average LQF bits is the best.
The total PSD can be simplified to a concise performance
measure. Suppose before the change of bit allocation, the PSD
in frame i and the total PSD in the GOP are separately denoted
as Φ eei (Λ ) and Φ eeT (Λ ) . After the change of bit allocation, the
PSD in frame i and the total PSD in the GOP are separately
denoted as Φ eei (Λ ) and Φ eeT (Λ ) . Assume a GOP has n frames.
Then we have
n Φ eeT ( Λ) = ∑ Φ eei (Λ ) ,
(18)
n Φ eeT ( Λ) = ∑ Φ eei (Λ ) .
(19)
i =1
Φ eei (Λ) − Φ eei (Λ)
1
1 .(21)
= h(Φ eei _1 (Λ) − Φ eei _1(Λ)) + h(Φ eei _ 2 (Λ) − Φ eei _ 2 (Λ))
4
4
1
+ Φ ss (Λ)((2 − Pi 2 (Λ))(2 − Pi1 (Λ)) − (2 − Pi 2 (Λ))(2 − Pi1 (Λ)))
2
In (21), Φ eei _ k (Λ) and Φ eei _ k (Λ) (k=1 or 2) represent the PSD
(2 − Pi 2 (Λ))(2 − Pi1 (Λ)) represent the values of (2 − Pi2 (Λ))(2 − Pi1 (Λ))
A. Performance Measurement of Bit Allocation
1
In (20), Φ eei (Λ ) − Φ eei (Λ) (i=1,2,…,n) is the change of PSD
in every frame and can be calculated by (see appendix C)
in the kth reference frame before and after the change of bit
(2 − Pi 2 ( Λ ))(2 − Pi1 ( Λ ))
and
allocation
respectively.
IV. BIT ALLOCATION FOR JU-DFMC
2
e
(20)
i=1
i=1
With the change of bit allocation, the change of total PSD is
before and after the change of bit allocation respectively.
Pi 2 ( Λ ) and Pi1 ( Λ ) are the displacement error pdfs in frame i
when the reference is separately a LTR and a STR in JU-DFMC.
The calculation of Φ eei _ k (Λ) − Φ eei _ k (Λ) is the same as
Φ eei (Λ ) − Φ eei (Λ) , therefore in iterative formula (21),
plays
a
(2 − Pi 2 ( Λ ))(2 − Pi1 ( Λ )) − (2 − Pi 2 (Λ ))(2 − Pi1 (Λ ))
dominant role in Φ eei (Λ ) − Φ eei (Λ ) . From (20),
n
plays a
∑ ((2 − Pi 2 ( Λ ))(2 − Pi1 (Λ )) − (2 − Pi 2 (Λ ))(2 − Pi1 (Λ )))
i =1
dominant role in Φ eeT (Λ) − Φ eeT (Λ ) . This means that the
change of (2 − Pi 2 (Λ))(2 − Pi1 (Λ)) plays a dominant role in the
change
of
PSD
in
a
frame,
and
the
change
of
n
∑ ((2 − Pi 2 (Λ ))(2 − Pi1 (Λ ))) plays a key role in the change of
i =1
total PSD in the GOP.
With the change of bit allocation between HQF and LQFs
under the same overall bit rate of the GOP, if
n
∑ ((2 − Pi 2 (Λ))(2 − Pi1 (Λ))) increases (or decreases), the value of
i =1
total PSD will also increases (or decreases). When
n
∑ ((2 − Pi 2 (Λ))(2 − Pi1 (Λ))) is the smallest, the total PSD is the
i =1
smallest, the coding efficiency is the best. Therefore, the value
n
of ∑ ((2 − Pi 2 (Λ ))(2 − Pi1 (Λ ))) is utilized instead of total PSD as
i =1
the performance measure of bit allocation.
(2 − Pi 2 ( Λ ))(2 − Pi1 (Λ )) can be further simplified. According
to (3), Pi 2 (Λ ) and Pi1 (Λ ) are within (0, 1). If Pi 2 (Λ ) or
Pi1 ( Λ ) increases, the value of (2 − Pi 2 ( Λ )) or (2 − Pi1 (Λ )) will
decrease. So (2 − Pi 2 (Λ ))(2 − Pi1 (Λ )) has an inverse relationship
with Pi 2 (Λ) Pi1 (Λ) , which can be calculated from (3) as follows:
Pi 2 (Λ ) Pi1 (Λ ) = e
=e
−2πσ +2 _ JLi (ω x2 + ω 2y )
×e
−2πσ +2 _ JSi (ω x2 +ω 2y )
−2π (σ +2 _ JLi +σ +2 _ JSi )(ω x2 + ω 2y )
=e
−2π (σ +2_ Pi )(ω x2 + ω 2y )
,(22)
where
σ +2_ Pi = σ +2_ JSi + σ +2_ JLi ,
(23)
PAPER IDENTIFICATION NUMBER:3231
7
σ +2_ JLi is the DEV in frame i when reference frame is a LTR in
JU-DFMC, σ +2_ JSi is the DEV in frame i when the reference
frame is a STR in JU-DFMC.
From (22), σ +2_ Pi has an inverse relationship with Pi2 (Λ)Pi1 (Λ) ,
then it has a direct relationship with (2 − Pi 2 (Λ ))(2 − Pi1 (Λ )) .
Therefore, σ +2_ Pi is utilized instead of (2 − Pi 2 (Λ ))(2 − Pi1 (Λ ))
as the performance measure of bit allocation for frame i and
n
thus ∑ σ +2_ Pi is utilized as the performance measure of bit
the jth GOP is calculated. If j is equal to 1, Ra is initialized as 4.
Otherwise, Ra is updated from the previous GOP as follows.
After encoding the (j-1)th GOP, the bit allocation and MSEs
in HQF and LQFs is obtained and fixed. Assuming the overall
bit-rate of the (j-1)th GOP is fixed, if the bit allocation between
HQF and LQFs in the (j-1)th GOP changes (suppose the
changed bit allocation in every LQF is the same) based on the
fixed bit allocation, the changed MSE in HQF and LQFs
corresponding to the changed bit-rate can be approximately
calculated from the derivation of function R( D) in (17),
i =1
σ2
dR
1
D
1
1
=
× 2 × (− e2 ) = −
× ,
2ln 2 D
dD 2ln 2 σ e
D
allocation for the GOP. With the change of bit allocation
n
between HQF and LQFs, when ∑ σ +2_ Pi is the smallest, the
i =1
overall power special density is the smallest, the bit allocation
ratio between HQF bits and average LQF bits a GOP is the best.
In the short term, the DEV σ +2 is nearly directly
proportional to the PEV σ e2 [23]. Then in every GOP, σ +2_ JLi
and σ +2_ JSi are nearly directly proportional to σ e2_ JLi and
σ e2_ JSi , which are the PEVs in frame i when reference frame is
LTR and STR respectively. So σ e2_ Pi = σ e2_ JLi + σ e2_ JSi and
n
n
i =1
i =1
∑ σ e2_ Pi can be used instead of σ +2_ Pi and ∑ σ +2_ Pi as the
performance measure of bit allocation for frame i and for a
GOP respectively.
If MSE in the LTR changes, the change of PEV σ e2_ JLi in
each frame i is nearly the same. If the change of MSE in
different STRs is the same, the change of PEV σ e2_ JSi in each
frame i is nearly the same as well. Then with the change of
MSE in LTR and STRs, the change of σ e2_ Pi in each frame is
nearly the same. Therefore, σ e2_ Pi can be used instead of
n
∑ σ
i =1
2
e _ Pi
as the performance measure of bit allocation for the
GOP.
Finally
σ e2_ P = σ e2_ JL 2 + σ e2_ JSA
is
utilized
as
the
(25)
then
∆D = −2ln 2 × D × ∆R ,
(26)
where ∆D is the change of MSE, ∆R is the change of bit-rate,
and D is the MSE in the frame.
After adding the changed MSE to the fixed MSE, and adding
the changed bit-rate to the fixed bit-rate, the different MSE
corresponding to different bit allocation in HQF and LQFs of
the (j-1)th GOP can be calculated.
The MSE of the reference frame is nearly directly
proportional to the PEV σ e2 from the reference frame, thus if
the MSEs in HQF and LQFs are known, σ e2_ JL 2 from LTR and
the average σ e2_ JSA from STRs can be calculated.
In the strategy of bit allocation change in the work, the bit
allocation change step in HQF is R H × 1% , and the changed bit
allocation in HQF is within the range ( −RH ×10%, +R H ×10% ),
where R H is the actual bit allocation in HQF after encoding the
(j-1)th GOP. The bit allocation change in HQF is averagely
compensated from LQFs . For each changed bit allocation case,
the σ e2_ P (σ e2_ JSA + σ e2_ JL 2 ) is calculated respectively. The bit
allocation ratio which brings the smallest σ e2_ P is reserved.
As in adjacent GOPs, the bit allocation ratios between HQF
bits and average LQF bits have little difference, so the reserved
bit allocation ratio in the (j-1)th GOP is selected as the bit
allocation ratio in the jth GOP.
performance measurement of bit allocation for the GOP in
accordance with that in the optimal LTR selection.
Step 3: Frame level bit allocation
B. Bit Allocation
The bit allocation in HQF T H and the average bit allocation
in LQFs T L _ ave in the jth GOP are separately calculated using
Step 1: GOP level bit allocation
the reserved bit allocation ratio and GOP level bit allocation,
In the jth GOP, GOP length is initialized as N for peforming
GOP level bit allocation. If j is equal to 1, N is set to 10.
Otherwise, N is set to the actual GOP length in the previous
GOP. Suppose the overall remaining frames waiting to be
encoded is M, while the remaining bits are R r . The target bit
allocation T ( j ) for the jth GOP is calculated as
T ( j) =
N
× Rr .
M
(24)
Step 2: Obtaining bit allocation ratio
After we obtained the GOP level bit allocation, the bit
allocation ratio Ra between HQF bits and average LQF bits in
TH =
Ra
× T ( j) ,
Ra + ( N − 1) × 1
T L _ ave =
1
× T ( j) .
Ra + ( N − 1) × 1
(27)
(28)
Step 4: Quantization parameter determination
For determining quantization parameter (QP) in HQF and
LQFs, the rate and quantization parameter models are similar to
those in [34] with some modifications. For LQFs and HQFs,
the quadratic rate control model is updated and stored
respectively. In every GOP, to maintain the smoothness of
LQFs quality, the average bit allocation in LQFs is utilized to
PAPER IDENTIFICATION NUMBER:3231
8
calculate the average QP ( Q ) in LQFs, and then in different
LQFs, the QP is a slightly adjusted based on the average QP. In
the work, for the first LQF in the GOP, the QP is set to Q +1.
For the middle LQFs (from the second LQF to the (2N / 3)th
LQF, where N is the actual GOP length of previous GOP), the
QP is set to Q . For the following LQFs, the QP is set to Q -1.
To reduce the error propagation, an error resilient JU-DFMC
structure is proposed. As shown in Fig. 8, for the HQF in each
GOP, only the previous LTR (HQF) is utilized for prediction,
and for LQFs, the prediction structure is unchanged. Then,
when transmission errors occur in LQFs of the current GOP,
the error will not be propagated to the following HQFs and
GOPs.
V. EXTENSION TO NOISY CHANNELS
B. Optimal LTR Selection and Bit Allocation in Error
Resilient JU-DFMC
To deduce the average increase of error propagation in a
GOP, we adopt [35]
Based on the proposed adaptive JU-DFMC, an error resilient
JU-DFMC prediction structure is firstly given. Then adaptive
LTR selection and bit allocation with respect to different packet
loss rates are presented.
A. Error Resilient JU-DFMC
For the video transmission over error prone channels, the
end-to-end distortion model [35][36] is extended in this work to
analyze the error propagation. In the end-to-end distortion
model, transmission error rates of two data partitions A (the
header information) and B (transformed coefficients of the
inter-coded blocks) are represented by p A and p B ,
respectively. Let f ni be the original value of pixel i in frame n,
and let fˆ ni and f ni be the reconstructed values in the encoder
and decoder, respectively. And suppose it references pixel k in
frame ref. Then expected inter-mode end-to-end distortion in
decoder is represented in [35] as
d(n,i) = E{( f ni − fni )2}
=(1− pA)E{( fˆrefk − frefk )2}+(1− pA)(1− pC)E{( f ni − fˆni )2}
+(1− pA)pCE{( f ni − fˆrefk )2}+ pA(E{( f ni − fˆni−1)2}+ E{( fˆni−1 − fni−1)2}) .(29)
= (1− pA)dep_ref +(1− pA)(1− pC)ds
+(1− pA)pCdec_ref _o + pA(dec_ prev_o +dep_ prev)
In (29), d (n, i) is the expected end-to-end distortion in decoder,
d ep _ ref is the error propagated distortion from the reference
frame, d s denotes the source distortion, d ec _ ref _ o indicates the
original referenced error-concealment distortion, d ec _ prev _ o
denotes the original previous error-concealment distortion,
d ep _ prev denotes the error-propagated distortion from previous
frame.
By the observation of (29), in the expected end-to-end
distortion d (n, i) , the percentage of error propagated distortion
d ep (including d ep _ ref and d ep _ prev ) is larger than the source
distortion d s , the error-concealment distortions d ec _ ref _ o and
d ec _ prev _ o . Furthermore, the error propagated distortion d ep
increases frame by frame in video decoding. Therefore, if there
is an error in the current frame, the image quality in the
following frames will be heavily affected.
d ep = (1 − p A )(1 − p C )d ep _ ref + (1 − p A ) p C (d ec _ ref _ r + d ep _ ref )
, (30)
+ p A (d ec _ prev _ r + d ep _ prev )
where d ep , d ep _ ref and d ep _ prev are the error propagated
distortions in the current HQF, the previous LTR and the
previous LQF, respectively. Since d ep _ prev only accounts for a
few
percentage
pA
LTR
LTR
Fig. 8 Error resilient JU-DFMC structure
d ep
,
we
adopt
p A × d ep _ prev ≈ p A × d ep _ ref . So (30) can be rewritten as
d ep ≈ d ep _ ref + (1 − p A ) p C d ec _ ref _ r + p Ad ec _ prev _ r .
(31)
If the GOP length is N, the average increase of the error
propagated distortion d ep _ ave in every frame of the GOP is
d ep _ ave =
d ep − d ep _ ref
N
≈
(1 − p A ) p C d ec _ ref _ r + p Ad ec _ prev _ r
N
. (32)
From (32), we can see that to reduce the average increase of
error propagated distortion, two factors can be adjusted. The
first is to increase of GOP length N. The second is to decrease
error concealment distortion values d ec _ ref _ r and d ec _ prev _ r ,
which needs to increase the bit allocation in the HQF. If more
bits are allocated to HQF, the prediction performance will be
better and the error concealment distortion will be smaller.
In determining the LTR (HQF) selection and bit allocation in
error resilient JU-DFMC, the method is the same as that in the
proposed adaptive JU-DFMC. But the PEV utilized in
calculating performance measure is not computed from source
distortion but the expected end-to-end distortion d (n, i) in (29).
Since the distortion of reference frame is nearly directly
proportional to the PEV σ e2 from the reference frame, and the
expected end-to-end distortion d (n, i ) LQ in LQF and d (n, i) HQ
in HQF are separately calculated by (29), the virtual PEV
σ e2_ JS from d (n, i) LQ and σ e2_ JL from d (n, i) HQ are
separately calculated as
σ e2_ JS =
σ e2_ JL =
LQ
LQ
LQ
LQ
HQ
f LQ
fi −HQ
fi LQ f i +LQ
N −1 f i − N f i − N +1 f i − N + 2 i − N + 3 f i − 2 f i −1
1
in
d (n, i ) LQ
d S _ LQ
d (n, i ) HQ
d S _ HQ
× σ e2_ JS ,
(33)
× σ e2_ JL ,
(34)
where d S _ LQ and d S _ HQ are the source distortion d S in
LQF and HQF, σ e2_ JS and σ e2_ JL are the PEV from the source
distortion d S _ LQ and the source distortion d S _ HQ . With the
PAPER IDENTIFICATION NUMBER:3231
9
Table II. Performance comparison of the proposed adaptive JU-DFMC with existing DFMC schemes
Sequence
Bit rate
(kbps)
PSNR in
CU-DFMC
(dB)
PSNR in
JU-DFMC
[16] (dB)
249.12
401.23
580.10
850.12
149.53
310.27
507.21
673.68
139.92
232.82
403.63
751.94
17.66
27.69
46.47
80.19
100.36
149.87
200.55
292.76
135.22
200.83
270.02
317.80
70.47
141.12
194.78
255.54
225.45
349.76
539.24
708.35
34.25
45.12
68.82
94.18
30.63
33.03
35.08
37.49
30.36
33.83
36.56
38.14
32.11
34.42
36.59
39.03
32.88
34.81
36.86
38.91
34.48
36.68
38.23
40.05
32.02
34.61
36.54
37.86
33.92
36.73
38.02
39.33
35.85
38.02
40.08
41.85
34.54
36.41
38.32
39.97
31.32
33.68
35.62
37.99
31.11
34.43
37.10
38.68
33.04
35.27
37.54
39.73
33.97
35.79
37.88
39.64
35.12
37.32
38.91
40.72
32.91
35.52
37.43
38.65
34.28
37.27
38.48
39.67
36.55
38.93
40.82
42.59
35.13
36.82
39.13
40.34
Mobile
(qcif)
Tempete
(qcif)
Waterfall
(cif)
Container
(qcif)
News
(cif)
Paris
(qcif)
Foreman
(qcif)
Silent
(cif)
Hall
(qcif)
PSNR in
Proposed
Adaptive
JU-DFMC
(dB)
32.32
34.43
36.22
38.51
31.87
34.99
37.48
39.09
34.14
36.21
38.29
40.37
34.56
36.19
38.24
40.04
35.44
37.67
39.26
40.97
33.23
35.82
37.88
38.99
34.54
37.64
38.73
39.99
36.99
39.23
41.60
42.99
35.29
37.03
39.36
40.57
Gain over
CU-DFMC
PSNR
1.69
1.40
1.14
1.02
1.51
1.16
0.92
0.95
2.03
1.79
1.7
1.34
1.68
1.38
1.38
1.13
0.96
0.99
1.03
0.92
1.21
1.21
1.34
1.13
0.62
0.91
0.71
0.66
1.14
1.21
1.52
1.14
0.75
0.62
1.04
0.6
Average
gain (dB)
1.31
1.14
1.72
1.39
0.98
1.22
0.73
1.25
0.75
Gain over
JU-DFMC[16]
PSNR
1. 00
0.75
0.60
0.52
0.76
0.56
0.38
0.41
1.10
0.94
0.75
0.64
0.59
0.40
0.36
0.4
0.32
0.35
0.35
0.25
0.32
0.30
0.45
0.34
0.26
0.37
0.25
0.32
0.44
0.30
0.78
0.4
0.16
0.21
0.23
0.23
Average
gain (dB)
0.72
0.53
0.86
0.44
0.32
0.35
0.30
0.48
0.21
Table III. The average percentage of blocks in each frame whose reference frame is LTR or STR
Sequence
Mobile (qcif)
Tempete (qcif)
Waterfall (cif)
Container (qcif)
News (cif)
Paris (qcif)
Foreman (qcif)
Silent (cif)
Hall (qcif)
Adaptive JU-DFMC
Reference frame is
Reference frame is
LTR(%)
STR (%)
63.91
36.09
49.85
50.15
68.22
31.78
52.87
47.13
68.93
31.07
42.35
57.65
37.41
62.52
45.62
54.36
64.35
35.65
same method as in the proposed adaptive JU-DFMC, the
performance measure σ e2_ T of LTR selection and performance
measure σ e2_ P of bit allocation in error resilient JU-DFMC are
σ
2
e _T
σ
=
2
e_ P
(σ 2
e _ JL 2
= σ e2_ JL 2
+ σ e2_ JSA ) / 2
+ σ e2_ JSA
.
,
(35)
(36)
CU-DFMC
Reference frame is
LTR (%)
24.61
8.20
8.50
1.12
1.08
5.20
14.20
1.23
0.80
Reference frame is
STR (%)
75.39
91.80
91.50
98.88
98.92
94.80
85.80
98.77
99.20
In (35) and (36), σ e2_ JL 2 is the virtual PEV in the second LQF
of the GOP when the reference frame is LTR; σ e2_ JSA is the
average virtual PEV in LQFs (from the second LQF to the last
encoded LQF) of the GOP when the reference frame is STR.
However, in the rate-distortion mode decision, the distortion
is not the overall end-to-end distortion d (n, i) but the source
distortion d S .
PAPER IDENTIFICATION NUMBER:3231
10
41
38
40
37
39
36
38
35
PSNR(dB)
PSNR(dB)
Mobile(qcif)
39
37
36
34
Adaptive JU-DFMC
33
JU-DFMC in [16]
32
CU-DFMC+RC
34
31
CU-DFMC+JU_BA
33
30
SFMC+JU_BA
32
29
Waterfall(cif)
Adaptive JU-DFMC
35
JU-DFMC in [16]
CU-DFMC+RC
CU-DFMC+JU_BA
SFMC+JU_BA
31
200
300
400
500
600
Bitrate(kbps)
700
800
900
100
200
300
400
500
Bitrate(kbps)
(a)
600
700
800
(b)
Fig. 9 Rate distortion curves for some sequences
Mobile(qcif)
38
40
37
39
36
35
Adaptive JU-DFMC
34
LTR_interval+2
33
LTR_interval-3
32
31
PSNR(dB)
PSNR(dB)
Waterfall(cif)
41
39
38
Adaptive JU-DFMC
37
LTR_interval+2
36
LTR_interval-3
LTR_BA+7%
35
LTR_BA+7%
LTR_BA-9%
34
30
LTR_BA-9%
33
200
300
400
500
600
Bitrate(kbps)
700
800
900
100
200
300
400
500
Bitrate(kbps)
(a)
600
700
800
(b)
Fig. 10 Rate distortion curves under different LTR intervals and different bit allocations
Waterfall(cif)
Mobile(qcif)
40
36
35
39
33
PSNR(dB)
PSNR(dB)
34
32
31
38
37
36
30
29
CU-DFMC
JU-DFMC [16]
SFMC+JU_BA
28
27
0
10
20
30
40
Adaptive JU-DFMC
CU-DFMC+JU_BA
CU-DFMC
JU-DFMC[16]
SFMC+JU_BA
35
Adaptive JU-DFMC
CU-DFMC+JU_BA
34
50
60
Frame_no
70
80
90
100
(a)
0
10
20
30
40
50
60
Frame_no
70
80
90
100
(b)
Fig. 11 Frame by frame PSNR in test sequences Mobile and Waterfall under the same bit rate
VI. EXPERIMENTAL RESULTS AND DISCUSSIONS
To evaluate the general performance of the proposed adaptive
JU-DFMC, we integrated the proposed methods into the
H.264/AVC reference software JM10.2. In the proposed
adaptive JU-DFMC, the first P frame is set as the first LTR, its
allocated bits are initialized as four times the average bits of the
following STRs. For the selection of the following LTR and the
corresponding bit allocation, the proposed methods in Section
III and Section IV are adopted. In the CU-DFMC, two most
recently decoded frames are used for motion compensation,
LTRs are continuously updated and not allocated any extra rate.
The test sequences are all encoded at 30 fps with 120 frames in
total for each sequence. In motion estimation, the search range
is ±16. The entropy coder is context-adaptive binary arithmetic
coding (CABAC). Each row of macroblocks comprises a slice
and is transmitted in a separate packet.
In Table II, the PSNRs of CU-DFMC (fixed QP in every
frame), the JU-DFMC [16] and the proposed adaptive
JU-DFMC are given respectively. The performance gain of the
proposed adaptive JU-DFMC under different bit rates is also
presented. Compared with the JU-DFMC [16], the average
PSNR gains obtained by the proposed adaptive JU-DFMC are
0.72, 0.53, 0.86, 0.44, 0.32, 0.35, 0.30, 0.48, and 0.21 dB in
sequences Mobile, Tempete, Waterfall, Container, News, Paris,
Foreman, Silent and Hall, respectively.
PAPER IDENTIFICATION NUMBER:3231
11
Table IV. Performance comparison of the error resilient JU-DFMC with other schemes
Sequence
Mobile
(qcif)
650.35kbps
Tempete
(qcif)
500.26kbps
Waterfall
(cif)
480.78kbps
Container
(qcif)
75.24kbps
News
(cif)
300.57kbps
Paris
(qcif)
200.83kbps
Foreman
(qcif)
150.52kbps
Silent
(cif)
404.26kbps
Hall
(qcif)
41.28kbps
Schemes
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
2HMC [26]
CU-DFMC +DPRD [36]
Adaptive JU-DFMC
Error resilient JU-DFMC
3%
31.49
31.51
33.56
34.42
32.38
32.41
32.47
33.56
33.04
33.06
33.12
34.96
35.55
35.56
35.63
36.82
36.31
36.43
36.52
37.23
31.94
32.01
32.13
33.32
34.26
34.45
33.82
35.57
36.94
37.01
37.19
37.84
35.16
35.18
35.23
35.87
PSNR of different packet loss
rates (dB)
5%
10%
20%
30.21
28.23
26.14
30.26
28.34
26.22
30.32
28.21
26.08
32.67
31.85
30.68
31.12
29.25
27.16
30.95
29.46
27.52
31.14
29.21
27.09
32.86
31.76
30.58
32.08
30.46
28.42
32.12
30.53
28.65
32.13
30.42
28.38
34.25
33.67
32.43
34.09
32.54
30.71
34.12
32.57
30.87
34.14
32.52
30.62
35.92
34.69
33.87
34.96
33.50
31.51
35.01
33.56
31.72
35.03
33.41
31.42
36.58
35.92
35.23
30.78
29.17
27.84
30.92
29.26
27.94
30.92
29.08
27.73
32.78
32.36
31.82
33.13
30.67
28.01
33.32
30.97
28.27
33.29
30.62
27.98
35.37
33.89
31.87
35.97
33.40
31.70
36.02
33.56
31.92
36.04
33.32
31.67
37.32
36.27
35.12
35.01
33.78
32.64
35.05
33.89
32.78
35.09
33.73
32.61
35.67
34.83
33.68
Table III shows the average percentage of blocks in every
frame that utilizes LTR or STR as reference frames. Intra mode
blocks are not included in the experimental results. It can be
seen that in the adaptive JU-DFMC, the percentage of blocks
which utilize LTR as reference frame is higher than that in
CU-DFMC.
The R-D curves in some sequences achieved from the
proposed adaptive JU-DFMC (Adaptive JU-DFMC),
JU-DFMC in [16], CU-DFMC with rate control [34]
(CU-DFMC+RC) are shown in Fig. 9. Furthermore,
CU-DFMC structure with proposed JU-DFMC bit allocation
profile (CU-DFMC+JU_BA), single reference frame motion
compensation with proposed JU-DFMC bit allocation profile
(SFMC+JU_BA) are also presented in the figure. In
CU-DFMC+JU_BA and SFMC+JU_BA, the bit allocation in
every frame is the same as that in the proposed adaptive
JU-DFMC scheme. But for every frame in CU-DFMC+JU_BA,
only the two recently decoded frames are utilized for motion
compensation. For every frame in SFMC+JU_BA, only one
recently decoded frame is utilized for motion compensation.
Original PSNR (dB)
3%
35.32
33.01
36.79
36.45
35.92
33.51
37.33
37.03
36.75
34.58
38.79
38.50
38.18
36.88
39.84
39.53
39.98
37.82
41.18
40.89
33.94
32.92
35.82
35.38
36.55
35.82
37.87
37.53
38.17
37.12
40.02
39.74
34.62
35.52
36.49
36.22
5%
35.32
31.94
36.79
36.33
35.92
32.46
37.33
36.92
36.75
33.79
38.79
38.41
38.18
35.56
39.84
39.43
39.98
36.61
41.18
40.81
33.94
32.02
35.82
35.30
36.55
34.87
37.87
37.41
38.17
36.34
40.02
39.66
34.62
35.54
36.49
36.12
10%
35.32
30.22
36.79
36.21
35.92
31.16
37.33
36.79
36.75
32.43
38.79
38.31
38.18
34.12
39.84
39.32
39.98
35.12
41.18
40.72
33.94
30.56
35.82
35.19
36.55
32.99
37.87
37.26
38.17
35.12
40.02
39.54
34.62
34.57
36.49
36.04
20%
35.32
28.31
36.79
36.10
35.92
29.62
37.33
36.67
36.75
30.78
38.79
38.22
38.18
32.68
39.84
39.20
39.98
33.52
41.18
40.65
33.94
29.04
35.82
35.09
36.55
30.98
37.87
37.14
38.17
33.72
40.02
39.45
34.62
33.69
36.49
35.94
The performance of CU-DFMC+JU_BA is worse than the
proposed adaptive JU-DFMC. This is because in
CU-DFMC+JU_BA, some frames following HQF do not
utilize the HQF as references to improve coding performance.
SFMC+JU_BA also has worse performance than the proposed
adaptive JU-DFMC. The reason is that only one frame is
utilized for motion compensation.
If the proposed bit allocation method does not change, but
the LTR interval is 2 frames larger or 3 frames less than the
actual LTR interval proposed in adaptive JU-DFMC
(separately named as LTR_interval+2 and LTR_interval-3), the
performance are given in Fig.10. If the proposed LTR interval
does not change, but LTR bit allocation is 7% larger or 9% less
than the actual LTR bit allocation proposed in adaptive
JU- DFMC ( separately named as LTR_BA+7% and
LTR_BA-9%), the performance are given in Fig.10 as well.
The performance of LTR_interval+2, LTR_interval-3,
LTR_BA+7% and LTR_BA-9% are slightly lower than that in
the proposed adaptive JU-DFMC scheme. It shows that the
PAPER IDENTIFICATION NUMBER:3231
12
1.4
11
Bit allocation ratio
GOP Length(In frames)
12
10
9
8
Mobile
Foreman
7
6
1.3
1.2
Mobile
Foreman
1.1
1
0
5
10
15
Packet loos rate(%)
20
Fig. 12 Average jump updating parameter of LTR
proposed adaptive LTR selection and bit allocation are
effective.
The frame by frame PSNR comparison of proposed adaptive
JU-DFMC (Adaptive JU-DFMC), JU-DFMC in [16],
CU-DFMC+JU_BA, SFMC+ JU_BA and CU-DFMC (fixed
QP in every frame) under the same bit-rate in sequences Mobile
(255.71 kbps), Tempete (220.62kbps), Waterfall (362.82 kbps)
and Container (71.64 kbps) is shown in Fig. 11. It can be seen
that the PSNR deterioration after the HQF in [16] is much more
graceful than the proposed scheme, but the PSNR in majority of
LQFs in the proposed scheme is better than that in [16] and
CU-DFMC. The PSNR in HQFs in the proposed adaptive
JU-DFMC is much larger than that of LQFs. This will
introduce objectionable pulses in quality over time. But as
mentioned in [14], when the PSNR of the sequence is higher
than 30dB, the pulsing is not perceptible. Furthermore, the
pulsing can benefit some applications, such as video
surveillance, the HQF can create higher image quality of video
surveillance content. In the video communication or
multimedia, the HQF can be quantized by a larger QP when it is
playbacked in decoder to maintain the similar image quality as
that in LQFs, then the overall quality fluctuation in the
proposed scheme is less than that in [16]. The image quality in
LQFs can likewise be enhanced using the previous and the
following HQFs in decoder.
The experimental results of the error resilient JU-DFMC in
decoder are repeated 100 times using the bit error sequences
which are transmitted from the encoder via the error-prone
channels [38]. Although the method in [26] is used without the
need of live encoding, whereas the proposed error resilient
JU-DFMC needs to adaptively adjust the coding parameters,
the performance of 2HMC [26] is listed to provide useful
information. CU-DFMC+DPRD [36] is also utilized for
performance comparison here. In Table IV, the experimental
results of the proposed error resilient JU-DFMC (Error resilient
JU-DFMC), CU-DFMC+DPRD [36], proposed adaptive
JU-DFMC (Adaptive JU-DFMC), 2HMC [26] are listed and
compared.
Under the given bit-rate, the original PSNR of proposed
sequences without error is shown on the right side of Table IV.
The PSNRs of the error resilient JU-DFMC and the other
coding schemes measured after packet loss rate are listed on the
left side of Table IV. 2HMC [26] has slightly lower
performance than CU-DFMC+DPRD [36], it is because error
0
5
10
15
Packet loos rate (%)
20
Fig. 13 Ratio of bit allocation in LTR
propagation in [26] is alleviated, but not terminated. Adaptive
JU-DFMC is a slightly better than CU-DFMC+DPRD [36] in
low packet loss rate. This is because lots of blocks in LQF
utilize LTR as their reference frame, if some errors occur in
LQF while the error occurred areas are not utilized as reference
for the next LQF, the error will not be propagated to following
frames. And the other reason is that the original PNSR of the
adaptive JU-DFMC is better than CU-DFMC +DPRD [36].
However, at high packet loss rate, the performance of the
adaptive JU-DFMC is less than that in DFMC+DPRD [36].
This is because the adaptive JU-DFMC can not terminate error
propagation.
Compared to CU-DFMC+DPRD [36], the proposed error
resilient JU-DFMC can achieve a maximum gain of 4.46 dB on
Mobile sequence at a packet loss rate of 20% and an average
gain of 2.23 dB on all the sequences under the different packet
loss rates. This is because in CU-DFMC+DPRD [36], the error
propagation is terminated by inserting intra mode blocks, but
the R-D cost is huge, and sometimes the error cannot be
accurately predicted in encoder. Compared with the adaptive
JU-DFMC, the proposed error resilient JU-DFMC can achieve
a maximum gain of 4.60 dB on Mobile sequence at packet loss
rate at 20% and an average gain of 2.27 dB on the all sequences
under the different packet loss rates. This shows the proposed
error resilient JU-DFMC is effective in eliminating error
propagation. The elimination of error propagation comes from
two factors, the first factor is that if an error occurred in LQFs
in the error resilient JU-DFMC (this kind of error takes the
largest percentage), the error propagation will be terminated in
the last LQF of the GOP. The second factor is that even error
occurs in HQF, the average error propagation speed is smaller
than the other schemes since HQF interval is large. However,
HQF is more important than LQFs. More protection to HQFs is
needed as indicated in [20].
In addition, Fig. 12 shows the average jump updating
parameter of LTR at different packet loss rates. Fig. 13 shows
the ratio of bit allocation for LTR under different packet loss
rates compared with that without packet loss rate. It can be seen
that with the increase of packet loss rate, the jump updating
parameter and bits in HQF (LTR) increase adaptively.
VII. CONCLUSIONS
In this paper, an optimal LTR selection and bit allocation for
JU-DFMC has been presented. Firstly the rate-distortion
PAPER IDENTIFICATION NUMBER:3231
performance of MCP for DFMC was analyzed. Then based on
the analysis, an adaptive JU-DFMC with the optimal LTR
selection and the bit allocation was given. The proposed
adaptive JU-DFMC can lead to better performance than
previous JU-DFMC schemes. For video transmission over
noisy channels, the error propagation of the proposed adaptive
JU-DFMC was analyzed. Furthermore, an error resilient
JU-DFMC considering the LTR selection and the bit allocation
was presented. The error resilient DFMC can obtain a much
better performance in video transmission over noisy channels.
The proposed schemes can be used in multimedia applications
and video surveillance systems. It can also be utilized to
instruct the rental of extra bandwidth for the video transmission
over cognitive radio. In the future, the rate control for the
proposed scheme will be further exploited.
ACKNOWLEDGMENT
The authors would like to thank the editor and the
anonymous reviewers whose invaluable comments and
suggestions led to greatly improved manuscript.
APPENDIX
A. Proof of Equation (11)
We know 0 < P1J (Λ ) < P1CH (Λ) and 0 < P2J (Λ ) < P2CH (Λ ) ,
J
P1 (Λ ) < 1 . And the difference between DEVs from STRs in
JU-DFMC and CU-DFMC is much smaller than the DEV from
STR in JU-DFMC, then P1CH (Λ) − P1J (Λ ) P1J (Λ ) . From (10),
we have
13
STR, the prediction performance in the STR from its reference
frames (its STR and LTR) is nearly the same as that in the
coding of LTR (HQF). So whenever STR is encoded as a LQF
J
or a HQF, the PSD Φ ee
_1 ( Λ ) in the STR is nearly the same as
J
J
J
the PSD Φ ee
_ 2 ( Λ ) in LTR, Φ ee _1 ( Λ ) ≈ Φ ee _ 2 ( Λ ) . So (B1) can
be further rewritten as
sut ee (Λ) −Φee (Λ)
Φ
1
J (Λ) −ΦCL (Λ)) − 1 h(ΦCH (Λ) −Φ J (Λ)) +Φ (P LQ − P HQ) ,(B2)
≈ h(Φee
ss
_1
ee _1
ee _1
ee _1
2
2
1 sut
= h(Φee _1(Λ) −Φee _1(Λ)) +Φss (P LQ − P HQ)
2
sut ee _1 (Λ ) represents the performance gain when the
in (B2), Φ
STR is encoded as LQF in JU-DFMC, Φ ee _1 (Λ) represents the
performance loss when the STR is encoded as HQF in
JU-DFMC.
C. Proof of Equation (21)
Before and after the change of bit allocation, the PSD in
frame i is separately denoted as Φ eei (Λ) and Φ eei (Λ) . From (4),
we have
Φ eei (Λ) − Φ eei (Λ)
1
1 = h(Φ eei _1(Λ) − Φ eei _1(Λ)) + h(Φ(Λ) eei _ 2 − Φ eei _ 2 (Λ)) + Φ ss (Λ)
4
4
1
1
×( Pi1 (Λ) Pi 2 (Λ) − Pi1 (Λ) − Pi 2 (Λ)) − ( Pi1 (Λ) Pi 2 (Λ) − Pi1 (Λ) − Pi 2 (Λ)))
2
2
1 1 = h(Φ eei _1(Λ) − Φ eei _1(Λ)) + h(Φ(Λ) eei _ 2 − Φ eei _ 2 (Λ))
4
4
1
+ Φ ss (Λ)((2 − Pi 2 (Λ))(2 − Pi1 (Λ)) − (2 − Pi 2 (Λ))(2 − Pi1 (Λ)))
2
1
1
P HQ ≈ (P1J (Λ) × ( P2CH (Λ) −1) − P1J (Λ) × ( P2J (Λ) −1)) + (P2J (Λ) − P2CH (Λ))
REFERENCES
2
2
1
= (P1J (Λ) × × (P2CH (Λ) − P2J (Λ)) + (P2J (Λ) − P2CH (Λ))
[1] T. Sikora, “The MPEG-4 video standard verification model,” IEEE Trans.
2
Circuits Syst. Video Technol., vol. 7, no 1, pp. 19–31, Feb. 1997.
1
[2] G.Cote, B.Erol, M. Gallant, and F. Kossentini, “H.263+: Video coding at
= (1 − P1J (Λ)) × (P2J (Λ) − P2CH (Λ)) < 0
2
low bit rates,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no 7,
B. Proof of Equation (12)
sut ee (Λ ) and Φ ee (Λ ) are the R-D performance gain
Suppose Φ
and loss when frame j is separately encoded as LQF and HQF
sut ee (Λ )
in JU-DFMC. The difference of R-D performance gain Φ
and R-D performance loss Φ ee (Λ) can be obtained as
sut ee(Λ) −Φee(Λ)
Φ
1
J (Λ) −ΦCL (Λ)) + h(Φ J (Λ) −ΦCL (Λ))
= (h(Φee
ee _1
ee _2
ee _2
_1
4
J
CH
J
LQ
HQ
− h(ΦCH
ee _1(Λ) −Φee _1(Λ)) − h(Φee _2(Λ) −Φee _2(Λ))) +Φss (P − P ) .(B1)
1
1
1
1
J (Λ) − (ΦCL (Λ) +ΦCL (Λ)))− h( (ΦCH (Λ) +ΦCH (Λ))
= h(Φee
ee _1
ee _2
ee _1
ee _2
_1
2
2
2 2
J
−Φee_2(Λ)) +Φss (P LQ − P HQ)
For every frame in CU-DFMC, the prediction performance
from reference frames (LTR and STR) is roughly similar, then
CL
the PSD in every frame is roughly similar, Φ CL
ee _1 ( Λ ) ≈ Φ ee _ 2 ( Λ )
CH
and Φ CH
ee _1 ( Λ ) ≈ Φ ee _ 2 ( Λ ) . In JU-DFMC, the GOP length of the
current GOP is nearly the same as that in the previous GOP. For
the last several frames in the current GOP, in the coding of its
pp.849–865, Nov. 1998.
T. Wiegand, “Joint final committee draft for joint video specification
H.264,” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG,
JVT-D157, 2002.
[4] D. Hepper, “Efficiency analysis and application of uncovered background
prediction in a low bit rate image coder,” IEEE Trans. Commun., vol. 38,
no 9, pp. 1578–1584, Sept. 1990.
[5] F. Dufaux and F. Moscheni, “Background mosaicking for low bit rate
video coding,” in Proc. IEEE Int. Conf. Image Processing, Lausanne,
Switzerland, Sept. 1996.
[6] ISO/IEC JTC1/SC29/WG11 MPEG96/N1648, “Core experiment on
sprites and GMC,” Apr. 1997.
[7] M. Budagavi and J.D. Gibson, “Multiframe video coding for improved
performance over wireless channels,” IEEE Trans. Image Process., vol.10,
no. 2, pp. 252-265, Feb. 2001.
[8] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory
motion-compensated prediction,’’ IEEE Trans. Circuits Syst. Video
Technol., vol. 9, no. 1, pp. 70-84, Feb. 1999.
[9] A. Leontaris and P. C. Cosman, “Video compression for lossy packet
networks with mode switching and a dual-frame buffer,” IEEE Trans.
Image Process., vol. 13, no. 7, pp. 885–897, July 2004.
[10] ISO/IEC JTC1/SC29/WG11 MPEG96/M0654, “Core experiment of
video coding with block-partitioning and adaptive selection of two frame
memories (STFM/LTFM),” Dec. 1996.
[11] T. Fukuhara , K. Asai, and T. Murakami, “Very low bit-rate video coding
with block partitioning and adaptive selection of two time-differential
frame memories,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no 1,
pp. 212-220, Feb. 1997.
[3]
PAPER IDENTIFICATION NUMBER:3231
[12] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Dual frame motion
compensation for a rate switching network,” Asilomar Conf. on Signals,
Systems, and Comp., Nov. 2003.
[13] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Dual frame motion
compensation with uneven quality assignment,” IEEE Data Compression
Conference 2004, pp. 262-271, Mar 2004.
[14] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Dual frame motion
compensation with uneven quality assignment,” IEEE Trans. Circuits
Syst. Video Technol., Volume 18, no 2, pp. 249-256, Feb. 2008.
[15] M. Tiwari and P. C. Cosman, “Dual frame video coding with pulsed
quality and a lookahead window,” IEEE Data Compression Conference
2006, pp. 372-381, Mar 2006.
[16] M. Tiwari and P. C. Cosman, “Selection of long-term reference frames in
dual-frame video coding using simulated annealing,” IEEE Signal
Processing Letters, Vol. 15, pp. 249-252, 2008.
[17] A. Leontaris and P. C. Cosman, “Video compression with intra/inter mode
switching and a dual frame buffer,” IEEE Data Compression Conference
2003, pp. 63-72, 2003.
[18] A. Leontaris, V. Chellappa, and P. Cosman, “Optimal mode selection for a
pulsed-quality dual frame video coder,” IEEE Signal Processing Letters,
Vol. 11, No. 12, pp. 952-955, Dec. 2004.
[19] A. Leontaris and P. C. Cosman, “Dual frame video encoding with
feedback,” Asilomar Conf. on Signals, Systems, and Comp., Nov. 2003.
[20] V. Chellappa, P. C. Cosman, and G. M. Voelker, “Source and channel
coding trade-offs for a pulsed quality video encoder,” 40th Asilomar
Conference on Signals, Systems and Computers, Nov. 2006.
[21] V. Chellappa, P. C. Cosman, and G.M. Voelker, “Error concealment for
dual frame video coding with uneven quality assignment,” IEEE Data
Compression Conference 2005, pp. 319-328, Mar. 2005.
[22] A. Leontaris and P. C. Cosman, “End-to-end delay for hierarchical
B-pictures and pulsed quality dual frame video coders,” in Proc. IEEE Int.
Conf. Image Process., pp. 3133–3136, Oct. 2006.
[23] A. Leontaris and P. C. Cosman, “Compression efficiency and delay
tradeoffs for hierarchical B-pictureframes and pulsed-quality frames,”
IEEE Trans. Image Process., vol. 16, no. 7, pp. 1726–1740, July 2007.
[24] C.-S. Kim, R.-C. Kim, and S.-U. Lee, “Robust transmission of video
sequence using double-vector motion compensation,” IEEE Trans.
Circuits Syst. Video Technol., vol. 11, no. 9, pp. 1011–1021, Sep. 2001.
[25] S. Lin, Y. Wang, “Error resilience property of multihypothesis motion
compensated prediction,” IEEE Int. Conf. Image Process., pp. 545–548,
2002.
[26] Y.-C. Tsai, C.-W. Lin, and C.-M. Tsai, “H.264 error resilience coding
based on multihypothesis motion-compensated prediction,” Signal
Process.: Image Commun., vol. 22, no. 9, pp. 734-751, Oct. 2007.
[27] W.-Y. Kung, C.-S. Kim, C.-C. Kuo, “Analysis of multihypothesis motion
compensated prediction (MHMCP) for robust visual communication,”
IEEE Trans. Circuits Syst. Video Technol., Volume 16, no 1, pp. 146–153,
Jan. 2006.
[28] M. Ma, Oscar C. Au, Liwei Guo, S.-H. Gary Chan, Xiaopeng Fan and
Ling Hou, “Alternate motion-compensated prediction for error resilient
video coding,” J. Visual Commun. Image representation, Volume 19, No.
7, Oct. 2008.
[29] I. Rhee, and S. Joshi, "Error recovery for interactive video transmission
over the internet," IEEE J. Sel. Areas Commun., vol. 18, no. 6,
pp.1033-1049, Jun. 2000.
[30] B. Girod, “The efficiency of motion-compensating prediction for hybrid
coding of video sequences,” IEEE J. Selecl. Areas Commun., vol. SAC-5,
no. 8, pp. 1140-1154, Aug. 1987.
[31] B. Girod, “Efficiency analysis of multihypothesis motion-compensated
prediction for video coding,” IEEE Trans. Image Process., vol. 9, no.2, pp.
173–183, Feb. 2000.
[32] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs,
NJ: Prentice-Hall, 1984.
[33] T. M. Cover, Joy A. Thomas, “Elements of information theory,” Chapter
13,
Available:
http://www.matf.bg.ac.yu/nastavno/viktor/Rate_Distortion_Theory.pdf
[34] Z. G. Li, F. Pan, K. P. Lim, G. Feng, X. Lin, and S. Rahardja, “Adaptive
basic unit layer rate control for JVT,” presented at the 7th JVT Meeting,
JVT-G012-rl Thailand, Mar. 2003.
14
[35] Y. Zhang, Wen Gao, Debin Zhao, “Joint data partition and
Rate-Distortion optimized mode selection for H.264 error-resilient
coding,” IEEE 8th Workshop on Multimedia Signal Process., pp.248-251,
Oct. 2006.
[36] Y. Zhang, Wen Gao, Yan Lu, Qingming Huang, Debin Zhao, “Joint
source-channel rate-distortion optimization for H.264 video coding over
error-prone networks,” IEEE Trans. Multimedia, Vol. 9, no 3, pp.445 –
454, Apr. 2007.
[37] S. Wenger, “Error patterns for Internet experiments,” ITU-T SG16,
Doc.Q15-I-16r1,
1999.
http://ftp3.itu.int/av-arch/video-site/9910_Red/q15i16r1.zip.
Da Liu received MSc degree in Computer Science from
Harbin Institute of Technology, China, in 2004. And now,
he is working for his PhD degree in Computer Science at
Harbin Institute of Technology. In 2005, he joined the Joint
Research and Development Laboratory as a research
assistant. His research interests include image/video coding,
image processing, and computer vision.
Debin Zhao received the B.S., M.S., and Ph.D. degrees
from the Harbin Institute of Technology (HIT), Harbin,
China, in 1985, 1988, and 1998, respectively, all in
computer science.
He joined the Department of Computer Science, HIT, as
an Associate Professor in 1993. He is currently a Professor
with HIT and the Institute of Computing Technology,
Chinese Academy of Sciences, Beijing. His research interests include video
coding and transmission, multimedia processing, and pattern recognition. He
has authored or coauthored over 200 publications. Dr. Zhao was the recipient of
three National Science and Technology Progress Awards of China (Second
Prize) and the Excellent Teaching Award from the Baogang Foundation.
Xiangyang Ji received the B.S. and M.S. degrees in
computer science from Harbin Institute of Technology,
Harbin, China, in 1999, and 2001, respectively, and the
Ph.D. degree in computer science from Institute of
Computing Technology, and the Graduate School of
Chinese Academy of Science, Beijing, in 2008. He has
authored or co-authored over 40 conference and journal
papers. He is now with Broadband Networks & Digital
Media Laboratory of Automation Department, Tsinghua University. His
research interests include video/image coding, video streaming, and multimedia
processing.
Wen Gao (M’92-SM’05-Fel’2009) received the Ph.D.
degree in electronics engineering from the University of
Tokyo, Japan. He is a professor of computer science in
Peking University, China. Before joining Peking University,
he was a full professor of computer science in Harbin
Institute of Technology from 1991 to 1995, and with CAS
(Chinese Academy of Sciences) from 1996 to 2005, he
served as a professor, the managing director of the Institute
of Computing Technology of CAS, the executive vice
president of the Graduate School of CAS, and a vice president of the University
of Science and Technology of China. He is the editor-in-chief of Journal of
Computer (a journal of Chinese Computer Federation), an associate editor of
IEEE Transactions on Circuits and Systems for Video Technology, an associate
editor of IEEE Transactions on Multimedia, an associate editor of IEEE
Transactions on Autonomous Mental Development, an area editor of EURASIP
Journal of Image Communications, and an editor of Journal of Visual
Communication and Image Representation. He chaired a number of prestigious
international conferences on multimedia and video signal processing, and also
served on the advisory and technical committees of numerous professional
organizations. He has published extensively including four books and over 500
technical articles in refereed journals and conference proceedings in the areas of
image processing, video coding and communication, pattern recognition,
multimedia information retrieval, multimodal interface, and bioinformatics. He
is a fellow of IEEE.
Download