Operational Rate-Distortion
information theory in optimization of
advanced digital video codec
Dragorad A. Milovanović
Zoran S. Bojković
DragoAM@Gmail.com
z.bojkovic@yahoo.com
University of Belgrade
TINKOS
25.09.2013.
CONTENTS
1. Rate-Distortion theory
1.1 Source coding and R-D function
1.2 Operational R-D framework
1.3 Formulation of efficient video coding
2. Operational control of standard-based encoder
2.1 Operational MPEG framework
2.2 Performance/efficiency of digital video codec
2.3 Bitrate control and joint optimization
1. Rate-Distortion theory
• Information transmission system (message, symbol encoding, entropy)
[Figure: source symbols S, coded at rate R and reconstructed as Ŝ]
• Source coding: perceptual signals and a distortion criterion $D \le D_{\max}$
Average distortion:
$$D(S,\hat S) = \sum_i \sum_j p(s_i,\hat s_j)\, d_{i,j},\qquad d_{i,j} = 0 \ \text{if}\ s_i = \hat s_j,\qquad d_{i,j} > 0 \ \text{if}\ s_i \ne \hat s_j$$
• Rate-distortion theory determines the minimum transmission bitrate R for a required video quality, i.e. for a given maximum distortion D.
• Mutual information is the information that the symbols S and the symbols Ŝ convey about each other.
Average mutual information:
$$I(S;\hat S) = H(S) - H(S\mid\hat S) = H(\hat S) - H(\hat S\mid S) = \sum_i \sum_j p(s_i,\hat s_j)\,\log_2\frac{p(s_i,\hat s_j)}{p(s_i)\,p(\hat s_j)}$$
• Channel coding: the channel capacity C is the maximum of the mutual information I between source and destination.
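Both quantities on this slide follow directly from a joint pmf $p(s_i,\hat s_j)$. The sketch below assumes a hypothetical binary source (equiprobable symbols) passed through a test channel that flips a symbol with probability ε = 0.1, with the Hamming measure $d_{i,j} = 0$ for $s_i = \hat s_j$ and 1 otherwise. It computes D and I(S;Ŝ) from the formulas above; for this case D = ε and I = 1 − H_b(ε) ≈ 0.531 bit.

```python
import numpy as np

def average_distortion(p_joint, d):
    """D(S, S^) = sum_i sum_j p(s_i, s^_j) * d[i, j]."""
    return float(np.sum(p_joint * d))

def mutual_information(p_joint):
    """I(S; S^) in bits from a joint pmf p(s_i, s^_j)."""
    p_s = p_joint.sum(axis=1, keepdims=True)      # marginal p(s_i), column vector
    p_hat = p_joint.sum(axis=0, keepdims=True)    # marginal p(s^_j), row vector
    mask = p_joint > 0                            # terms with p = 0 contribute 0
    ratio = p_joint[mask] / (p_s @ p_hat)[mask]   # p(s,s^) / (p(s) p(s^))
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

# Hypothetical example: binary source, crossover probability eps, Hamming distortion
eps = 0.1
p_joint = 0.5 * np.array([[1 - eps, eps], [eps, 1 - eps]])
d = 1.0 - np.eye(2)

D = average_distortion(p_joint, d)
I = mutual_information(p_joint)
print(D, round(I, 3))
```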
1.1 Source coding and R-D function
• For a given maximum average distortion $D_{\max}$, the rate-distortion function is the lower bound for the transmission bitrate R:
$$R \ge R(D) = \min_{p(\hat s \mid s):\ D(S,\hat S) \le D_{\max}} I(S;\hat S)$$
1. The Shannon lower bound $R_L(D) \le R(D)$ assumes that the distortion (reconstruction error) is statistically independent of the reconstruction.
2. R(D) is a non-increasing, convex function of D.
3. For a continuous source S, R(D) approaches infinity as D approaches zero.
4. For a discrete source S, the minimum rate required for lossless transmission equals the entropy rate, R(0) = H(S) (lossless coding).
Stochastic model of a Laplacian pdf source (variance $\sigma^2 = 1$): $D_L(R) = \frac{e}{\pi}\,\sigma^2\,2^{-2R}$
Stochastic model of a Gauss-Markov source (correlation $0 < \rho < 0.9$): $D_L(R) = (1-\rho^2)\,\sigma^2\,2^{-2R}$
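The two source models can be evaluated numerically. The sketch below simply codes the two $D_L(R)$ formulas above (ρ = 0.9 is an illustrative choice) and expresses them as SNR in dB; both follow the $2^{-2R}$ law, so each extra bit lowers the bound by a factor of 4, i.e. about 6.02 dB.

```python
import math

def d_laplacian_slb(rate, var=1.0):
    """Shannon lower bound D_L(R) = (e/pi) * var * 2^(-2R), Laplacian source."""
    return (math.e / math.pi) * var * 2.0 ** (-2.0 * rate)

def d_gauss_markov_slb(rate, rho, var=1.0):
    """D_L(R) = (1 - rho^2) * var * 2^(-2R), Gauss-Markov source."""
    return (1.0 - rho ** 2) * var * 2.0 ** (-2.0 * rate)

def snr_db(var, dist):
    """Signal-to-noise ratio in dB for source variance var and distortion dist."""
    return 10.0 * math.log10(var / dist)

for r in range(5):  # rate in bits/sample
    print(r, round(snr_db(1.0, d_laplacian_slb(r)), 2),
          round(snr_db(1.0, d_gauss_markov_slb(r, rho=0.9)), 2))
```

The Gauss-Markov column sits a constant $10\log_{10}(1/(1-\rho^2))$ dB above the memoryless case, which is the gain that predictive and transform coding try to realize.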
1.2 Operational (R,D) framework
• In a practical coding framework, the structure of the coder is fixed and a finite set of encoding modes is defined. In addition, it is usually difficult or simply impossible to find closed-form expressions for the R(D) and D(R) functions of general sources.
• Each choice of encoding parameters then leads to a pair of rate and distortion values, i.e. an operational point in the R-D plane. The lower bound of all these rate-distortion pairs is referred to as the operational R-D (ORD) function.
• Block diagram of a typical lossy source coding system:
  - block code $Q_N = \{\alpha_N, \beta_N, \gamma_N\}$ (encoder mapping α, decoder mapping β, lossless code γ; N consecutive input samples are coded independently)
  - bitrate R (average number of bits per source symbol)
  - additive distortion measure D (MSE between source and reconstructed symbols)
Operational R-D function
• For a given source S and code Q, the operational point (R,D) is defined by R = r(Q) and D = δ(Q).
• The R-D plane can be partitioned into a region of achievable rate-distortion points: a point (R,D) is achievable if there exists a code Q with r(Q) ≤ R and δ(Q) ≤ D. The function R(D) that describes this fundamental bound for a given source S is the operational R-D (ORD) function.
• The ORD boundary of the region of achievable rate-distortion points specifies:
A. the minimum rate R required to represent the source S with a distortion less than or equal to a given value D, or, alternatively,
B. the minimum distortion D that can be achieved if the source S is coded at a rate less than or equal to a given value R.
[Figure: region of achievable rate-distortion points (R,D) bounded by the operational R(D) function; axes R and D, with D = max, R = max, Rmin and Dmin marked]
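The boundary of the achievable region can be traced on a finite set of operational points. The sketch below assumes a hypothetical list of (R, D) pairs, one per encoding-parameter choice, and keeps only the points that no other point beats in both rate and distortion; these form the staircase-shaped operational R(D) boundary.

```python
def operational_rd_boundary(points):
    """Return the points (R, D) that are not dominated by any other point
    with R' <= R and D' <= D, sorted by increasing rate."""
    boundary = []
    best_d = float("inf")
    for r, d in sorted(points):      # increasing R; for equal R, lower D first
        if d < best_d:               # strictly better than every cheaper point
            boundary.append((r, d))
            best_d = d
    return boundary

# Hypothetical operational points (R in bits/sample, D as MSE)
points = [(1.0, 30.0), (1.5, 20.0), (2.0, 22.0), (2.5, 12.0),
          (3.0, 11.5), (3.0, 14.0), (4.0, 6.0)]
print(operational_rd_boundary(points))
```

Points such as (2.0, 22.0) are discarded because (1.5, 20.0) achieves lower distortion at lower rate.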
Quantization
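• Uniform scalar quantizer (Δ = const, D ≈ Δ²/12 at high rate, with an optimal entropy code γ)
• Non-uniform optimal quantizer (Lloyd-Max: reconstruction levels at the centroids of the pdf)
• Asymptotic performance $D_L(R) = \varepsilon_S^2\,\sigma^2\,2^{-2R}$ (Shannon lower bound)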
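Both quantizer types can be checked empirically. The sketch below assumes a unit-variance Gaussian source (200 000 random samples): it measures the MSE of a uniform quantizer against Δ²/12 and runs a sample-based Lloyd iteration (midpoint thresholds, centroid levels) for a 4-level non-uniform quantizer. For this source the 4-level Lloyd-Max quantizer is known to have levels ±0.4528, ±1.510 and MSE ≈ 0.1175.

```python
import numpy as np

def uniform_quantize(x, step):
    """Midtread uniform scalar quantizer: reconstruction = step * round(x / step)."""
    return step * np.round(x / step)

def lloyd_max(samples, levels, iters=100):
    """Sample-based Lloyd iteration for a non-uniform scalar quantizer.
    Thresholds are midpoints between levels; levels are cell centroids."""
    y = np.quantile(samples, (np.arange(levels) + 0.5) / levels)  # initial levels
    for _ in range(iters):
        t = 0.5 * (y[:-1] + y[1:])            # decision thresholds
        idx = np.searchsorted(t, samples)     # cell index of every sample
        for k in range(levels):
            cell = samples[idx == k]
            if cell.size:
                y[k] = cell.mean()            # centroid condition
    return y, 0.5 * (y[:-1] + y[1:])

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)              # assumed unit-variance Gaussian source

step = 0.1
mse_uniform = float(np.mean((x - uniform_quantize(x, step)) ** 2))
print(mse_uniform, step ** 2 / 12)

levels, thresholds = lloyd_max(x, 4)
mse_lm = float(np.mean((x - levels[np.searchsorted(thresholds, x)]) ** 2))
print(np.round(levels, 3), round(mse_lm, 4))
```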
Entropy coding (γ)
• Variable length code (VLC):
Huffman code minimizes the average code length
$L_s = \sum_i p(s_i)\,\mathrm{length}(s_i)$ [bits/symbol]
The average length of an optimal code is bounded below by the first-order entropy
$H_s = -\sum_i p(s_i)\log_2 p(s_i)$ [bits/symbol], with $H_s \le L_s < H_s + 1$ for a Huffman code
K = 2: $p(s_1) = P_1$, $p(s_2) = 1 - P_1$
$H_s = -P_1\log_2 P_1 - (1-P_1)\log_2(1-P_1)$ bits/symbol
$P_1 = 0.5$: maximum $H_s = 1$, redundancy $= \log_2 K - H_s = 0$
• Arithmetic coder (CABAC):
adaptive estimation of the statistical distribution $p(s_i)$
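The relation between the Huffman average length and the first-order entropy can be checked on a small alphabet. The sketch below builds Huffman codeword lengths with a binary heap (each merge of the two least probable groups adds one bit to every symbol in them) for a hypothetical 4-symbol pmf.

```python
import heapq
import itertools
import math

def huffman_code_lengths(probs):
    """Return the Huffman codeword length of every symbol."""
    counter = itertools.count()               # tie-breaker: heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)    # two least probable groups
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:               # one more bit for every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return lengths

def entropy(probs):
    """First-order entropy H_s in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.3, 0.2, 0.1]                  # hypothetical source pmf
lengths = huffman_code_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(lengths, round(avg_len, 3), round(entropy(probs), 3))
# → [1, 2, 3, 3] 1.9 1.846
```

The gap of about 0.05 bit/symbol is the redundancy of the Huffman code for this pmf; arithmetic coding can approach $H_s$ more closely.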
Predictive coding
• Differential coder
• Predictive coder (DPCM)
• Linear prediction $\hat S_n$:
prediction coefficients $p_i$
prediction error $U_n = S_n - \hat S_n$
reconstruction error $U'_n$
• Optimal linear prediction ($U_n$ orthogonal to $\hat S_n$)
Prediction error variance $\sigma_U^2 \ge \gamma_S^2\,\sigma_S^2$, where $\gamma_S^2$ is the spectral flatness measure (sfm) of the source
Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
N = 1: $p_{1,opt} = \rho_1$, $CG = 1/(1-\rho_1^2)$
N = 2: $p_{1,opt} = \rho_1(1-\rho_2)/(1-\rho_1^2)$, $p_{2,opt} = (\rho_2-\rho_1^2)/(1-\rho_1^2)$
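The N = 1 and N = 2 coefficients are the solutions of the normal (Yule-Walker) equations $\sum_j p_j \rho_{|i-j|} = \rho_i$. The sketch below solves them numerically for hypothetical correlation values (ρ₁ = 0.9, ρ₂ = 0.7) and returns the coding gain $\sigma_S^2/\sigma_U^2$, using $\sigma_U^2/\sigma_S^2 = 1 - \sum_i p_i\rho_i$ for the optimal predictor.

```python
import numpy as np

def optimal_predictor(rho):
    """Optimal order-N linear predictor from the normal (Yule-Walker) equations.
    rho[k] is the normalized autocorrelation at lag k, with rho[0] == 1."""
    n = len(rho) - 1
    R = np.array([[rho[abs(i - j)] for j in range(n)] for i in range(n)], dtype=float)
    r = np.array(rho[1:n + 1], dtype=float)
    p = np.linalg.solve(R, r)                 # optimal prediction coefficients p_i
    err_var = 1.0 - float(p @ r)              # sigma_U^2 / sigma_S^2
    return p, 1.0 / err_var                   # coefficients and coding gain CG

rho1, rho2 = 0.9, 0.7                         # hypothetical correlation values
p1, cg1 = optimal_predictor([1.0, rho1])
p2, cg2 = optimal_predictor([1.0, rho1, rho2])
print(p1, round(cg1, 3))                      # N = 1
print(p2, round(cg2, 3))                      # N = 2
```

The numerical solution reproduces the closed-form N = 1 and N = 2 expressions above, and the extra coefficient raises the coding gain from about 5.26 to about 7.92 for these values.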
Transform coding
• Linear transformation
A: transform matrix
B: inverse transform matrix
A orthogonal: $A^{-1} = A^T$, $A^T A = A A^T = I$
B orthonormal: $B = A^{-1} = A^T$ (energy preserving: the sum of the N coefficient variances equals the sum of the N sample variances, $N\sigma_S^2$)
• Optimal linear transformation: KLT (basis vectors are the eigenvectors of the autocovariance matrix $R_{SS}$; the coefficient variances are its eigenvalues)
Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
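Optimal bitrate allocation R between N quantizers, example N = 2 (R is the total rate of the two coefficients):
$$R_{SS} = \sigma_S^2\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix},\qquad A_{KLT} = \frac{1}{\sqrt 2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix},\qquad \sigma_0^2 = \sigma_S^2(1+\rho),\quad \sigma_1^2 = \sigma_S^2(1-\rho)$$
$$R_0 = \frac{R}{2} + \frac{1}{2}\log_2\frac{\sigma_0}{\sigma_1},\qquad R_1 = \frac{R}{2} + \frac{1}{2}\log_2\frac{\sigma_1}{\sigma_0}\qquad(\sigma_{q_0}^2 = \sigma_{q_1}^2)$$
$$CG = \frac{(\sigma_0^2 + \sigma_1^2)/2}{(\sigma_0^2\,\sigma_1^2)^{1/2}} = \frac{1}{\sqrt{1-\rho^2}}$$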
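The N = 2 example generalizes directly: the KLT basis comes from the eigenvectors of $R_{SS}$, the high-rate optimal allocation gives each coefficient $R/N + \frac{1}{2}\log_2(\sigma_i^2/\bar\sigma^2_{geo})$, and the coding gain is the ratio of the arithmetic to the geometric mean of the coefficient variances. The sketch below assumes the high-rate model (rates are not clipped at zero) and illustrative values ρ = 0.9, σ_S² = 1, R = 4 bits.

```python
import numpy as np

def klt_bit_allocation(r_ss, total_rate):
    """KLT of an autocovariance matrix R_SS plus high-rate optimal bit allocation.
    Rates are not clipped at zero, so a sufficiently high total_rate is assumed."""
    eigvals, eigvecs = np.linalg.eigh(r_ss)       # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]             # largest coefficient variance first
    var = eigvals[order]                          # coefficient variances sigma_i^2
    a_klt = eigvecs[:, order].T                   # rows are the KLT basis vectors
    geo_mean = np.exp(np.mean(np.log(var)))       # geometric mean of the variances
    rates = total_rate / len(var) + 0.5 * np.log2(var / geo_mean)
    coding_gain = np.mean(var) / geo_mean         # arithmetic / geometric mean
    return a_klt, var, rates, coding_gain

rho, var_s = 0.9, 1.0                             # illustrative values
r_ss = var_s * np.array([[1.0, rho], [rho, 1.0]])
a_klt, var, rates, cg = klt_bit_allocation(r_ss, total_rate=4.0)
print(np.round(var, 3), np.round(rates, 3), round(float(cg), 3))
```

For ρ = 0.9 the variances are 1.9 and 0.1, about 3.06 and 0.94 bits are assigned, and CG = 1/√0.19 ≈ 2.294, matching the closed form.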
1.3 Formulation of efficient video coding
• A standard-based codec requires an optimization procedure over the set of allowed operating parameters, as well as additional criteria that arise from real-time operation (complexity, delay).
• The goal of operational information theory is to find the set of encoder operating parameters that is optimal in the R(D) sense. An efficient optimization procedure based on fast algorithms, instead of a full search of the parameter space, is also required.
• The practical trade-off between the allowed distortion D and the available bitrate R in encoder design is based on a discrete optimization procedure that finds locally optimal operational (R,D) points.
Lagrange multiplier method
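• Formulation of the R-D problem: minimize the cost function D(R) with the constraint $R \le R_{\max}$
Necessary condition for the existence of an interior minimum: $\partial D(R)/\partial R = 0$; since D(R) is decreasing, this cannot be met for $R < R_{\max}$, so the constraint is active.
The solution: $R^* = R_{\max}$, $D_{\min} = D(R_{\max})$
• Unconstrained Lagrangian cost function:
$$\min_{R,\lambda} J(R,\lambda),\qquad J = D(R) + \lambda\,(R - R_{\max})$$
Necessary condition for the existence of a minimum:
$$\frac{\partial J(R,\lambda)}{\partial R} = 0,\qquad \frac{\partial J(R,\lambda)}{\partial \lambda} = 0,\qquad \lambda \ge 0$$
The solution is obtained by simultaneous iteration of R and λ:
$$\frac{\partial D(R)}{\partial R}\bigg|_{R^*} + \lambda^* = 0,\qquad R^* - R_{\max} = 0 \quad\Rightarrow\quad \lambda^* = -\frac{\partial D(R)}{\partial R}\bigg|_{R^* = R_{\max}}$$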
Geometrical interpretation
The operational R-D function is the convex hull that connects a subset of the locally optimal operational points (the Lagrange method can only reach points on this hull, so for a rate budget that falls between two hull points its solution is suboptimal).
The optimal operational point (D,R) obtained by the Lagrange method, min(D + λR) for constant λ, is the operational point on the convex hull touched by a line of slope −λ.
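The geometric statement can be reproduced on a discrete set of points. The sketch below reuses a hypothetical set of operational (R, D) points and, for a sweep of λ values, picks the point minimizing D + λR. Only the convex-hull vertices are ever selected: (3.0, 11.5) is a non-dominated point but lies above the hull, so no λ reaches it.

```python
def lagrangian_select(points, lam):
    """Return the operational point (R, D) minimizing J = D + lam * R."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])

# Hypothetical operational points (R in bits/sample, D as MSE)
points = [(1.0, 30.0), (1.5, 20.0), (2.0, 22.0), (2.5, 12.0),
          (3.0, 11.5), (3.0, 14.0), (4.0, 6.0)]

lambdas = [0.5 * k for k in range(1, 81)]            # lambda from 0.5 to 40
reached = {lagrangian_select(points, lam) for lam in lambdas}
print(sorted(reached))
```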
Optimal bit allocation
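• Formulation:
$$\min_{R_i}\sum_{i=1}^{N} D_i(R_i)\quad\text{with constraint}\quad \sum_{i=1}^{N} R_i \le R_{\max} = \text{const}$$
• Unconstrained Lagrangian cost function:
$$\min J = \min_{R_i}\Big[\sum_{i=1}^{N} D_i(R_i) + \lambda\Big(\sum_{i=1}^{N} R_i - R_{\max}\Big)\Big]$$
• Necessary condition for the existence of a minimum:
$$\frac{\partial J(R_k,\lambda)}{\partial R_k} = \frac{\partial D_k(R_k)}{\partial R_k} + \lambda = 0,\qquad \frac{\partial J(R_k,\lambda)}{\partial \lambda} = \sum_{i=1}^{N} R_i - R_{\max} = 0$$
• The solution is obtained by simultaneous iteration of $R_i$ and λ:
$$\frac{\partial D_k(R_k^*)}{\partial R_k} = -\lambda^* = \text{const for all } k,\qquad \sum_{i=1}^{N} R_i^* - R_{\max} = 0$$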
Joint hierarchical optimization
• Optimal image decomposition and bitrate allocation:
1. discrete version of the Lagrange multiplier method,
2. deterministic dynamic programming (forward/backward).
• The solution:
1. The image is decomposed into a pre-specified number of levels.
2. For the adopted value of the quality parameter λ = const, the operational point min(D + λR) is calculated at each level of decomposition for each partition and the specified set of quantizers.
3. At each level of decomposition a split/merge decision is made (principle of optimality) by comparing the Lagrangian costs of successive levels of decomposition (c1, c2: child partitions, p: parent): split if
$$D_{c1} + D_{c2} + \lambda\,(R_{c1} + R_{c2}) < D_p + \lambda R_p,$$
otherwise merge.
4. Binary (bisection) search determines the optimal λ* for a given bitrate $R_{\max}$, starting from an initial search interval $[\lambda_l, \lambda_h]$ with $R^*(\lambda_l) \ge R_{\max} \ge R^*(\lambda_h)$.
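Steps 2 and 3 can be illustrated with a 1-D binary-tree analogue of the quadtree. The sketch below is hypothetical in its cost model: each segment is coded by its mean, D is the squared error and R is a fixed 8 bits per segment. Children are pruned first (principle of optimality) and then compared with their parent using the split/merge inequality above.

```python
import numpy as np

BITS_PER_SEGMENT = 8.0   # hypothetical rate of coding one segment (its mean value)

def segment_cost(x):
    """Code a segment by its mean: D = sum of squared errors, R = fixed bits."""
    return float(np.sum((x - x.mean()) ** 2)), BITS_PER_SEGMENT

def prune(x, lam, depth):
    """Bottom-up split/merge decision; returns (D, R, list of segment lengths)."""
    d_p, r_p = segment_cost(x)
    if depth == 0 or len(x) < 2:
        return d_p, r_p, [len(x)]
    half = len(x) // 2
    d1, r1, s1 = prune(x[:half], lam, depth - 1)     # optimal left subtree
    d2, r2, s2 = prune(x[half:], lam, depth - 1)     # optimal right subtree
    if d1 + d2 + lam * (r1 + r2) < d_p + lam * r_p:  # split wins
        return d1 + d2, r1 + r2, s1 + s2
    return d_p, r_p, [len(x)]                        # merge wins

x_step = np.concatenate([np.zeros(8), 10.0 * np.ones(8)])
print(prune(x_step, lam=1.0, depth=3))     # low lambda: split at the edge
print(prune(x_step, lam=100.0, depth=3))   # high lambda: one merged segment
```

Wrapping `prune` in a bisection on λ, as in step 4, would select the decomposition that meets a given $R_{\max}$.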
2. Operational control of standard-based encoder
• A digital video encoder exploits the statistical redundancy of the source as well as the perceptual irrelevancy for the user.
• Block-adaptive hybrid transform-entropy encoder with motion estimation and compensation:
[Figure: hybrid encoder block diagram, with the scope of standardization marked]
2.1 Operational MPEG framework
ITU/MPEG process of standardization:
Encoding techniques and operational parameters:
Set of operational parameters
• The task of encoder control is to determine the values of the standardized syntax elements, and thus the bitstream b, for a given input sequence so that the distortion between the input sequence and its reconstruction is minimized subject to a set of constraints on the average and maximum bit rate.
[Figure: R(QP) and D(QP) curves]
• Let $B_c$ be the set of all conforming bitstreams that obey the given set of constraints. For a distortion measure D, the optimal bitstream in the rate-distortion sense is given by
$$b^* = \arg\min_{b \in B_c} D\big(s, s'(b)\big)$$
• Due to the huge parameter space and the encoding delay, it is impossible to apply this minimization directly. Instead, the overall minimization problem is split into a series of K smaller minimization problems (p is a subset of the operational parameters):
$$\min_{p \in P_k} D_k(p)\quad\text{subject to}\quad R_k(p) \le R_c,\qquad D = \sum_{i \in B} |s_i - \hat s_i|^{e}$$
(e = 1: sum of absolute differences, e = 2: sum of squared differences over block B)
• The constrained minimization problem can be reformulated as an unconstrained minimization, where $\lambda_{MODE} = c\,Q^2$ and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$:
$$\min J_{MOTION},\qquad J_{MOTION} = D(MAD) + \lambda_{MOTION}\,R(Q)$$
$$\min J_{MODE},\qquad J_{MODE} = D + \lambda_{MODE}\,R(Q)$$
Q denotes the quantization step size, which is controlled by the quantization parameter QP.
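The two Lagrangian decisions can be sketched for a single block. In the sketch below the constant c = 0.85 is an assumption (a value commonly used in test-model encoders), and the candidate motion vectors and modes with their distortions and bit counts are hypothetical numbers. It shows how the chosen mode moves from SKIP toward more expensive modes as Q decreases and λ_MODE shrinks.

```python
import math

C_MODE = 0.85   # assumed constant c in lambda_MODE = c * Q^2

def lagrange_multipliers(q_step):
    """lambda_MODE = c * Q^2 and lambda_MOTION = sqrt(lambda_MODE)."""
    lam_mode = C_MODE * q_step ** 2
    return lam_mode, math.sqrt(lam_mode)

def best_candidate(candidates, lam):
    """candidates: (label, distortion, bits); minimize J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical numbers for one block
motion = [((0, 0), 950.0, 2), ((1, -2), 700.0, 12), ((4, 6), 680.0, 24)]   # (mv, D, mv bits)
modes = [("SKIP", 5200.0, 1), ("INTER16x16", 1800.0, 60), ("INTRA4x4", 900.0, 180)]

for q in (10.0, 4.0, 2.0):
    lam_mode, lam_motion = lagrange_multipliers(q)
    mv = best_candidate(motion, lam_motion)[0]
    mode = best_candidate(modes, lam_mode)[0]
    print(q, mv, mode)
```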
2.2 Performance/efficiency of digital video codec
[Figure: decoded HD720 frames of test sequences 1/2/3 at BR = 512 kbps. H.265: QP=30, PSNR = 39.66 / 39.36 / 39.24 dB; H.263: QP=20, PSNR = 34.00 dB; QP=31, PSNR = 30.94 dB; QP=25, PSNR = 32.78 dB]
Coding gain BRCG, PSNR=const
• Three test sequences (1/2/3) with typical video-conferencing content were selected for the experiments (Vidyo, 1280x720, 60 fps, 10 s).
• Each test sequence was coded at 12 different bitrates. The ORD functions $PSNR_{YUV}(BR)$ are shown for the bitrates BR = 0.256, 0.384, 0.512, 0.850 and 1.500 Mbps.
• The combined $PSNR_{YUV}$ is calculated per picture as the weighted sum of the PSNR of the individual components:
$$PSNR_{YUV} = \frac{6\,PSNR_Y + PSNR_U + PSNR_V}{8},\qquad PSNR = 10\log_{10}\frac{(2^B - 1)^2}{MSE},\quad B = 8$$
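The weighted PSNR metric is straightforward to compute per picture. The sketch below implements the two formulas above for 8-bit planes; the tiny synthetic Y (16x16) and U, V (8x8) planes with constant errors of 1 and 2 are only a hypothetical check.

```python
import numpy as np

def psnr(ref, dec, bit_depth=8):
    """PSNR = 10 log10((2^B - 1)^2 / MSE) for one component plane."""
    mse = np.mean((ref.astype(np.float64) - dec.astype(np.float64)) ** 2)
    peak = (2 ** bit_depth - 1) ** 2
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak / mse))

def psnr_yuv(ref_planes, dec_planes, bit_depth=8):
    """Weighted PSNR_YUV = (6 * PSNR_Y + PSNR_U + PSNR_V) / 8 for one picture."""
    y, u, v = (psnr(r, d, bit_depth) for r, d in zip(ref_planes, dec_planes))
    return (6.0 * y + u + v) / 8.0

# Hypothetical 4:2:0 picture: error 1 on every luma sample, 2 on every chroma sample
ref_y = np.full((16, 16), 100, dtype=np.uint8)
ref_u = np.full((8, 8), 128, dtype=np.uint8)
ref_v = np.full((8, 8), 128, dtype=np.uint8)
dec = (ref_y + 1, ref_u + 2, ref_v + 2)
print(round(psnr(ref_y, dec[0]), 2), round(psnr_yuv((ref_y, ref_u, ref_v), dec), 2))
```

The samples are converted to float before subtraction so that 8-bit wraparound cannot corrupt the MSE.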
[Figure: bitrate reduction of HEVC vs. AVC based on subjective MOS performance for typical video-conferencing bitrates, sequences 1/2/3]
Coding gain PSNRCG, BR=const
• Variability of $PSNR_Y$ per frame (time) for BR = const
(BR ≈ 0.512 Mbps: $QP_{HEVC} = 30$, $QP_{AVC} = 32$, $QP_{H.263} = 20/31/25$)
[Figure: $PSNR_Y$ per frame for sequences 1/2/3]
Complexity of encoder/decoder
• The encoding and decoding times for the representative HD720 sequences (60 fps, 10 s) are shown. Times are given in units of 10 s (the sequence duration) to illustrate the ratio to real-time operation:
  - the HEVC encoding time exceeds 1000 times real time,
  - the decoding time exceeds 4 times real time,
on an Ultrabook (x86-64 Core i5 2/4 @ 1.7 GHz, 4 GB RAM).
[Figure: encoding and decoding times for sequences 1/2/3]
2.3 Bitrate control
• The objective of rate control is to regulate the MPEG-coded bitstream so that it satisfies given conditions (variable/constant bit-budget constraints, prevention of buffer overflow/underflow).
• Variable/constant bitrate (VBR/CBR) is obtained with a constant/variable quantization parameter QP in open/closed loop.
• A typical rate-control scheme consists of two basic operations:
1. bit allocation (R-D model), and
2. bit rate control (buffer occupancy measure).
• To achieve the target bit rate R, the rate-control scheme chooses an appropriate quantization parameter Q. For accuracy, the rate-quantization (R-Q) model is of key importance. Together with the distortion-quantization (D-Q) function, the R-Q function characterizes the rate-distortion (R-D) behavior of video encoding.
• The first step in deriving a rate-control formula is to approximate the R-Q function by an inverse-proportional curve, as shown in the figure.
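The inverse-proportional R-Q model can be fitted and inverted in a few lines. The sketch below assumes the one-parameter form R(Q) = X/Q, fits X by least squares from (Q, R) measurements, and then picks Q for a target rate; the measurements and the allowed QP range 1..31 are hypothetical.

```python
import numpy as np

def fit_inverse_rq(q, r):
    """Least-squares fit of R(Q) = X / Q; X acts as a complexity measure."""
    inv_q = 1.0 / np.asarray(q, dtype=float)
    r = np.asarray(r, dtype=float)
    return float(np.sum(r * inv_q) / np.sum(inv_q ** 2))

def q_for_target(x, r_target, q_min=1.0, q_max=31.0):
    """Invert the model, Q = X / R_target, and clip to the allowed range (assumed 1..31)."""
    return float(np.clip(x / r_target, q_min, q_max))

# Hypothetical measurements: kbits per frame at three quantizer values
x_hat = fit_inverse_rq([4, 8, 16], [250.0, 125.0, 62.5])
print(round(x_hat, 3), q_for_target(x_hat, 50.0))
```

In a closed loop, X would be re-estimated after each coded frame, and the target per frame would be adjusted from the buffer occupancy.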
Joint encoding (Det/Stat Mux)
• Deterministic multiplex of L video sequences, CBR-encoded with constant bitrates $R_i$ (variable $D_i$ and picture quality) in a fixed channel capacity $R_c$:
$$\sum_i R_i \le R_{channel},\qquad R_i = \text{const}\quad(CBR)$$
[Figure: CBR multiplex of sequences 1, 2, 3; per-sequence annotations 7836, 9331, 7427]
• Statistical multiplex of L+SMCG video sequences, VBR-encoded with variable bitrates $R_i$ (constant $D_i$ and picture quality). The criterion is the joint buffer occupancy measure $0 \le B_f(t) \le B_s$:
$$\sum_i R_i \le R_{channel},\qquad R_i \le R_{channel}\quad(VBR),\qquad R_i = \frac{X_i}{\sum_j X_j}\,R_{channel}$$
with $X_i$ the complexity measure of sequence i.
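The difference between the two multiplexing rules can be shown with a rate split. The sketch below assumes hypothetical complexity values $X_i$ for three programs and a 12 Mbps channel: the deterministic (CBR) multiplex gives every program the same fixed share, while the statistical (VBR) multiplex shares the channel in proportion to complexity, so both always fill exactly $R_{channel}$.

```python
def cbr_allocation(n_programs, r_channel):
    """Deterministic multiplex: equal, constant share per program."""
    return [r_channel / n_programs] * n_programs

def statmux_allocation(complexities, r_channel):
    """Statistical multiplex: R_i = X_i / sum_j X_j * R_channel."""
    total = sum(complexities)
    return [r_channel * x / total for x in complexities]

complexities = [2.0, 1.0, 1.0]   # hypothetical complexity X_i of three programs
print(cbr_allocation(3, 12.0), statmux_allocation(complexities, 12.0))   # Mbps
```

In a real statistical multiplexer the $X_i$ are re-estimated continuously, so the shares follow the content while the joint buffer stays within $0 \le B_f(t) \le B_s$.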