1 - Tinkos

Operational Rate-Distortion information theory in optimization of advanced digital video codec

Dragorad A. Milovanović (DragoAM@Gmail.com), Zoran S. Bojković (z.bojkovic@yahoo.com)
University of Belgrade
TINKOS, 25.09.2013.

CONTENTS
1. Rate-Distortion theory
   1.1 Source coding and R-D function
   1.2 Operational R-D framework
   1.3 Formulation of efficient video coding
2. Operational control of standard-based encoder
   2.1 Operational MPEG framework
   2.2 Performance/efficiency of digital video codec
   2.3 Bitrate control and joint optimization

1. Rate-Distortion theory

• Information transmission system (message, symbol encoding, entropy)
  [Figure: transmission chain with source symbols S, rate R and reconstructed symbols Ŝ]
• Source coding: perceptual signals and distortion criterion $D \le D_{max}$
  Average distortion:
  $$D(S,\hat S) = \sum_{i,j} p(s_i,\hat s_j)\, d_{ij}, \qquad d_{ij} = 0 \ \text{for}\ s_i = \hat s_j, \quad d_{ij} > 0 \ \text{for}\ s_i \ne \hat s_j$$
• Rate-distortion theory calculates the minimum transmission bitrate R for a required video quality D.
• Mutual information is the information that the symbols S and the symbols Ŝ convey about each other.
  Average mutual information:
  $$I(S;\hat S) = H(S) - H(S|\hat S) = H(\hat S) - H(\hat S|S) = \sum_{i,j} p(s_i,\hat s_j) \log_2 \frac{p(s_i,\hat s_j)}{p(s_i)\,p(\hat s_j)}$$
• Channel coding: the channel capacity C is the maximum of the mutual information I between source and destination.

1.1 Source coding and R-D function

• For a given maximum average distortion $D_{max}$, the rate-distortion function is the lower bound for the transmission bitrate:
  $$R_L(D) = \min_{D \le D_{max}} I(S;\hat S)$$
  1. The Shannon lower bound $R_L(D)$ assumes statistical independence between distortion and reconstruction.
  2. The R(D) function is a non-increasing and convex function of D.
  3. For a continuous source S, the function R(D) approaches infinity as D approaches zero.
  4. For a discrete source S, the minimum rate required for lossless transmission is equal to the entropy rate, R(0) = H(S) (lossless coding).
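The average distortion and the average mutual information defined in section 1 can be evaluated numerically for a small discrete example. The sketch below is only an illustration: the binary source, the flip probability `eps` and the Hamming distortion matrix are assumptions chosen for the example, not values taken from the slides.

```python
import math


def average_distortion(joint, d):
    """D(S, S^) = sum over i, j of p(s_i, s^_j) * d_ij."""
    return sum(joint[i][j] * d[i][j]
               for i in range(len(joint))
               for j in range(len(joint[0])))


def mutual_information(joint):
    """I(S; S^) in bits, computed from a joint pmf given as a nested list."""
    p_s = [sum(row) for row in joint]          # marginal p(s_i)
    p_r = [sum(col) for col in zip(*joint)]    # marginal p(s^_j)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                        # 0 * log(0) is taken as 0
                total += p * math.log2(p / (p_s[i] * p_r[j]))
    return total


def binary_entropy(p):
    """First-order entropy of a binary source with p(s1) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Assumed example: uniform binary source, reconstruction flipped with
# probability eps, Hamming distortion (d_ij = 0 if equal, 1 otherwise).
eps = 0.1
joint = [[0.5 * (1 - eps), 0.5 * eps],
         [0.5 * eps, 0.5 * (1 - eps)]]
hamming = [[0, 1],
           [1, 0]]

D = average_distortion(joint, hamming)
I = mutual_information(joint)
print(f"D = {D:.3f}, I = {I:.4f} bits, 1 - Hb(eps) = {1 - binary_entropy(eps):.4f} bits")
```

For this particular symmetric case the mutual information equals 1 − H_b(eps), which is the known rate-distortion function of a uniform binary source under Hamming distortion, so the example point lies on R(D).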
Stochastic models (Shannon lower bound on distortion):
• Laplacian pdf source (variance $\sigma^2 = 1$): $D_L(R) = \frac{e}{\pi}\,\sigma^2\, 2^{-2R}$
• Gauss-Markov source (correlation $0 < \rho < 0.9$): $D_L(R) = (1-\rho^2)\,\sigma^2\, 2^{-2R}$

1.2 Operational (R,D) framework

• In a practical coding framework, the structure of the coder is fixed and a finite set of encoding modes is defined. In addition, it is usually difficult or simply impossible to find closed-form expressions for the R(D) and D(R) functions of general sources.
• Each choice of encoding parameters then leads to a pair of rate and distortion values, an operational point in the R-D plane. The lower bound of all these rate-distortion pairs is referred to as the operational rate-distortion (ORD) function.
• Block diagram of a typical lossy source coding system:
  - block code $Q_N = \{\alpha_N, \beta_N, \gamma_N\}$ (blocks of N consecutive input samples are coded independently)
  - bitrate R (average number of bits per source symbol)
  - additive distortion measure D (MSE between source and reconstructed symbols)

Operational R-D function

• For a given source S and code Q, the operational point (R,D) is defined by R = r(Q) and D = δ(Q).
• A point (R,D) of the operational R-D plane is achievable if there is a code Q with r(Q) ≤ R and δ(Q) ≤ D, so the plane is partitioned into an achievable and a non-achievable region. The function R(D) that describes this fundamental bound for a given source S is the operational function ORD.
• The ORD boundary of the region of achievable rate-distortion points specifies:
  A. the minimum rate R that is required for representing the source S with a distortion less than or equal to a given value D or, alternatively,
  B. the minimum distortion D that can be achieved if the source S is coded at a rate less than or equal to a given value R.
  [Figure: region of achievable rate-distortion points (R,D) bounded by the operational R(D) function; axis marks Rmin, R = Max, Dmin, D = Max]

Quantization

• Uniform scalar quantizer (Δ = const, $D \approx \Delta^2/12$, optimal γ)
• Non-uniform optimal quantizer (Lloyd-Max, centroids of the pdf)
• Asymptotic performance: $D_L(R) = \sigma^2\, \varepsilon_S^2\, 2^{-2R}$ (Shannon lower bound)

Entropy coding (γ)

• Variable length code (VLC): the Huffman code minimizes the average code length $L_s = \sum_i p(s_i)\cdot \mathrm{length}(s_i)$ [bits per symbol].
  The average length of an optimal code is bounded below by the first-order entropy $H_s = -\sum_i p(s_i) \log_2 p(s_i)$ [bits per symbol].
  K = 2: $p(s_1) = P_1$, $p(s_2) = 1 - P_1$, $H_s = -P_1 \log_2 P_1 - (1-P_1) \log_2 (1-P_1)$ bits/symbol
  $P_1 = 0.5$: max $H_s = 1$, redundancy $= \log_2 K - H_s = 0$
• Arithmetic encoder (CABAC): adaptive estimation of the statistical distribution $p(s_i)$

Predictive coding

• Differential coder
• Predictive coder (DPCM)
• Linear prediction $\hat S_n$: prediction coefficients $p_i$, prediction error $U_n$, reconstruction error $U'_n$
• Optimal linear prediction ($U_n$ orthogonal to $\hat S_n$)
  Prediction error variance: $\sigma'^2 = \varepsilon_\alpha^2\, \sigma^2 \ge \gamma_S^2\, \varepsilon_\alpha^2\, \sigma_S^2$, where $\gamma_S^2$ is the spectral flatness measure (sfm)
  Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
  N = 1: $p_{1,opt} = \rho_1$, $CG = 1/(1-\rho_1^2)$
  N = 2: $p_{1,opt} = \rho_1 (1-\rho_2)/(1-\rho_1^2)$, $p_{2,opt} = (\rho_2 - \rho_1^2)/(1-\rho_1^2)$

Transform coding

• Linear transformation: A transformation matrix, B inverse matrix
  Orthogonal matrix A: $A^{-1} = A^T$, $A^T A = A A^T = I$
  Orthonormal transform: $B = A^{-1} = A^T$ (the average of the N coefficient variances equals the variance of s)
• Optimal linear transformation KLT (the basis vectors are the eigenvectors of the autocovariance matrix $R_{SS}$, the coefficient variances are its eigenvalues)
  Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
  Optimal bitrate allocation R between N quantizers, example N = 2:
  $$R_{SS} = \sigma_S^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad A_{KLT} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
  $$\sigma_0^2 = \sigma_S^2 (1+\rho), \qquad \sigma_1^2 = \sigma_S^2 (1-\rho)$$
  $$R_0 = R + \tfrac{1}{2} \log_2 \frac{\sigma_0}{\sigma_1}, \qquad R_1 = R + \tfrac{1}{2} \log_2 \frac{\sigma_1}{\sigma_0}, \qquad \sigma_{q0}^2 = \sigma_{q1}^2$$
  $$CG = \frac{\sigma_S^2}{(\sigma_0^2\, \sigma_1^2)^{1/2}} = \frac{1}{\sqrt{1-\rho^2}}$$

1.3 Formulation of efficient video coding

• A standard-based codec requires an optimization procedure over the set of allowed operating parameters, as well as additional criteria that arise from real-time operation (complexity, delay).
• The goal of operational information theory is to find the set of operating parameters of the encoder that is optimal in the R(D) sense.
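Returning to the N = 2 transform coding example given under "Transform coding": the bit split and the coding gain can be checked numerically. This is a minimal sketch under the high-rate model $D_i \propto \sigma_i^2\, 2^{-2R_i}$; the function name `klt2_allocation` and the sample values of ρ, σ² and R are assumptions made for illustration.

```python
import math


def klt2_allocation(rho, sigma2, rate):
    """KLT of a 2-sample block with correlation rho.

    Returns the coefficient variances, the per-coefficient rates, the
    per-coefficient distortions (up to the common factor eps^2) and the
    coding gain relative to coding the samples directly at the same rate.
    """
    var0 = sigma2 * (1.0 + rho)   # eigenvalue for basis vector (1, 1)/sqrt(2)
    var1 = sigma2 * (1.0 - rho)   # eigenvalue for basis vector (1, -1)/sqrt(2)

    # R_0 = R + 1/2 log2(sigma_0/sigma_1) = R + 1/4 log2(var0/var1)
    shift = 0.25 * math.log2(var0 / var1)
    r0, r1 = rate + shift, rate - shift

    # High-rate model: D_i ~ var_i * 2^(-2 R_i); both come out equal.
    d0 = var0 * 2.0 ** (-2.0 * r0)
    d1 = var1 * 2.0 ** (-2.0 * r1)

    # Arithmetic mean over geometric mean of the coefficient variances.
    gain = sigma2 / math.sqrt(var0 * var1)
    return (var0, var1), (r0, r1), (d0, d1), gain


rho, sigma2, rate = 0.9, 1.0, 2.0   # assumed example values
variances, rates, dists, gain = klt2_allocation(rho, sigma2, rate)
print("variances:", variances)
print("rates:", rates, "mean rate:", sum(rates) / 2)
print("distortions:", dists)
print("coding gain:", gain, "expected:", 1 / math.sqrt(1 - rho ** 2))
```

The model is only meaningful at high rates: for a small average rate R the second coefficient can be assigned a negative rate, which a practical allocation would clip to zero.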
• An efficient optimization procedure based on fast algorithms, rather than a full search of the parameter space, is also required.
• The practical trade-off between the allowed distortion D and the available bitrate R in designing an encoder is based on a discrete optimization procedure that finds locally optimal operational (R,D) points.

Lagrange multiplier method

• Formulation of the R-D problem:
  Cost function: $\min_R D(R)$ with constraint $R \le R_{max}$
  Necessary condition for the existence of a minimum: $\partial D(R) / \partial R = 0$
  The solution: because D(R) is non-increasing, the constrained minimum lies on the boundary, $R^* = R_{max}$, $D_{min} = D(R_{max})$
• Unconstrained Lagrangian cost function:
  $$\min_{R,\lambda} J(R,\lambda), \qquad J(R,\lambda) = D(R) + \lambda\,(R - R_{max}), \qquad \lambda \ge 0$$
  For a fixed λ the term $\lambda R_{max}$ is constant, so the minimization over R reduces to $\min (D + \lambda R)$.
  Necessary condition for the existence of a minimum:
  $$\frac{\partial J(R,\lambda)}{\partial R} = 0, \qquad \frac{\partial J(R,\lambda)}{\partial \lambda} = 0$$
  The solution is a simultaneous iteration of R and λ:
  $$\lambda^* = -\left.\frac{\partial D(R)}{\partial R}\right|_{R=R^*} \ge 0, \qquad R^* = R_{max}$$

Geometrical interpretation

• The operational R-D function is the convex border that connects a subset of the locally optimal operational points. The Lagrange method reaches only these connected points, so its solution can be suboptimal when the best operational point lies off the convex border.
• The optimal operational point (D,R), obtained as the solution of the Lagrange method $\min(D + \lambda R)$ for constant λ, is the operational point on the convex border touched by a line of slope −λ.

Optimal bit allocation

• Formulation:
  $$\min_{R_i} \sum_{i=1}^{N} D_i(R_i) \qquad \text{with constraint} \qquad \sum_{i=1}^{N} R_i = R_{max} = \text{const}$$
• Unconstrained Lagrangian cost function:
  $$\min J = \min_{R_i} \left[ \sum_{i=1}^{N} D_i(R_i) + \lambda \left( \sum_{i=1}^{N} R_i - R_{max} \right) \right]$$
• Necessary condition for the existence of a minimum:
  $$\frac{\partial J(R_k,\lambda)}{\partial R_k} = \frac{\partial D_k(R_k)}{\partial R_k} + \lambda = 0, \qquad \frac{\partial J(R_k,\lambda)}{\partial \lambda} = 0$$
• The solution is a simultaneous iteration of $R_i$ and λ (equal slopes for all k):
  $$\frac{\partial D_k(R_k)}{\partial R_k} = -\lambda^* = \text{const}, \qquad \sum_{i=1}^{N} R_i^* = R_{max}, \qquad \lambda^* \ge 0$$

Joint hierarchical optimization

• Optimal image decomposition and bitrate allocation:
  1. discrete version of the Lagrange multiplier method,
  2. deterministic dynamic programming (forward/backward).
• The solution:
  1. The image is decomposed to a pre-specified number of levels.
  2. For the adopted value of the quality parameter λ = const, the operational point $\min(D + \lambda R)$ is calculated on each level of decomposition, for each partition and the specified set of quantizers.
  3. At each level of decomposition a split/merge decision is made (principle of optimality) by comparing the Lagrangian costs of successive levels of decomposition. The two children c1, c2 are kept (split) if
     $$(D_{c1} + D_{c2}) + \lambda\,(R_{c1} + R_{c2}) < D_p + \lambda R_p$$
     and merged into the parent p otherwise.
  4. A binary search (or a Newton-type iteration) determines the optimal $\lambda^*$ for a given bitrate $R_{max}$ and the initial search interval $[\lambda_l, \lambda_u]$ with $R^*(\lambda_l) \ge R_{max} \ge R^*(\lambda_u)$.

2. Operational control of standard-based encoder

• A digital video encoder exploits the statistical redundancy of the source as well as the perceptual irrelevancy for the user.
• Block-adaptive hybrid transform-entropy encoder with motion estimation and compensation.
  [Figure: encoder block diagram with the scope of standardization marked]

2.1 Operational MPEG framework

[Figure: ITU/MPEG process of standardization]
[Figure: encoding techniques and operational parameters]

Set of operational parameters

• The task of encoder control is to determine the values of the standardized syntax elements, and thus the bitstream b, for a given input sequence so that the distortion between the input sequence and its reconstruction is minimized subject to a set of constraints on the average and maximum bit rate.
  [Figure: R(QP) and D(QP) curves]
• Let $B_c$ be the set of all conforming bitstreams that obey the given set of constraints. For a distortion measure D, the optimal bitstream in the rate-distortion sense is given by
  $$b^* = \arg\min_{b \in B_c} D\big(s, s'(b)\big)$$
• Due to the huge parameter space and the encoding delay, it is impossible to apply this minimization directly.
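The discrete Lagrangian bit allocation and the binary search for λ* described under "Optimal bit allocation" and "Joint hierarchical optimization" can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two coding units and their (R, D) operational points are made up for the example, and the search assumes that the lowest-rate choice of every unit fits within R_max.

```python
def best_point(points, lam):
    """Operational point (R, D) of one unit that minimizes D + lam * R."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])


def allocate(units, lam):
    """Independent minimization per unit for a fixed lambda."""
    choice = [best_point(points, lam) for points in units]
    total_rate = sum(r for r, _ in choice)
    total_dist = sum(d for _, d in choice)
    return choice, total_rate, total_dist


def search_lambda(units, r_max, lam_lo=0.0, lam_hi=1e6, iters=60):
    """Bisection on lambda.

    The total rate R*(lambda) is non-increasing in lambda, so the smallest
    lambda whose allocation still satisfies R*(lambda) <= r_max is bracketed
    by [lam_lo, lam_hi] and the interval is halved in every iteration.
    """
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        _, rate, _ = allocate(units, lam)
        if rate > r_max:
            lam_lo = lam   # too many bits: penalize rate more strongly
        else:
            lam_hi = lam   # feasible: try a smaller rate penalty
    return lam_hi, allocate(units, lam_hi)


# Hypothetical operational points (rate in bits, distortion) of two units.
units = [
    [(0, 100), (1, 40), (2, 15), (3, 6)],
    [(0, 50), (1, 30), (2, 20), (3, 14)],
]
lam, (choice, rate, dist) = search_lambda(units, r_max=4)
print("lambda* ~", lam)
print("chosen points:", choice, "total rate:", rate, "total distortion:", dist)
```

Because only points on the convex border are reachable for some λ, the returned allocation may leave part of the budget unused when no border point meets R_max exactly, which is the suboptimality noted in the geometrical interpretation.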
• Instead, the overall minimization problem is split into a series of K smaller minimization problems (p is a subset of the operational parameters):
  $$\min_{p \in P_k} D_k(p) \quad \text{subject to} \quad R_k(p) \le R_c, \qquad D = \sum_{i \in B} |s_i - \hat s_i|^e$$
• The constrained minimization problem can be reformulated as an unconstrained minimization:
  $$\min J_{MOTION}, \qquad J = D(\mathrm{MAD}) + \lambda_{MOTION}\, R(Q)$$
  $$\min J_{MODE}, \qquad J = D + \lambda_{MODE}\, R(Q)$$
  where $\lambda_{MODE} = c\,Q^2$ and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$. Q denotes the quantization step size, which is controlled by the quantization parameter QP.

2.2 Performance/efficiency of digital video codec

[Figure: decoded HD720 frames of test sequences 1, 2 and 3 at BR = 512 kbps. Panel captions in order of appearance: H.265, QP = 30, PSNR = 39.66 dB; H.263, QP = 20, PSNR = 34.00 dB; H.265, QP = 30, PSNR = 39.36 dB; H.263, QP = 31, PSNR = 30.94 dB; H.265, QP = 30, PSNR = 39.24 dB; H.263, QP = 25, PSNR = 32.78 dB]

Coding gain BR_CG, PSNR = const

• Three test sequences (1/2/3) with typical video conferencing content were selected for the experiments (Vidyo, 1280x720, 60 fps, 10 s).
• Each test sequence was coded at 12 different bitrates. The ORD functions PSNR_YUV(BR) are shown for the bitrates BR = 0.256, 0.384, 0.512, 0.850, 1.500 Mbps.
• The combined PSNR_YUV is calculated as the weighted sum of the per-picture PSNR of the individual components:
  $$PSNR_{YUV} = \frac{6 \cdot PSNR_Y + PSNR_U + PSNR_V}{8}$$
  where the individual components are computed as
  $$PSNR = 10 \log_{10} \frac{(2^B - 1)^2}{MSE}, \qquad B = 8$$
  [Figure, panels 1 to 3: bitrate reduction of HEVC vs. AVC based on subjective MOS performance for typical video conferencing bitrates]

Coding gain PSNR_CG, BR = const

• Variability of PSNR_Y per frame (time) for BR = const (BR ≈ 0.512 Mbps: QP_HEVC = 30, QP_AVC = 32, QP_H.263 = 20/31/25)
  [Figure, panels 1 to 3: PSNR_Y per frame]

Complexity of encoder/decoder

• The encoding and decoding times for the representative HD720 sequences (60 fps, 10 s) are shown. Times are expressed in units of 10 s, the sequence duration, so that they directly give the ratio to real-time operation:
  - the HEVC encoding time exceeds 1000 times real-time,
  - the decoding time exceeds 4 times real-time on an Ultrabook x86-64 Core i5 2/4 @ 1.7 GHz with 4 GB RAM.
  [Figure, panels 1 to 3: encoding and decoding times]

2.3 Bitrate control

• The objective of rate control is to regulate the MPEG coded bit stream so that it satisfies given conditions (variable/constant bit budget constraints, prevention of buffer overflow and underflow).
• Variable/constant (VBR/CBR) bitrate is controlled by a constant/variable quantization parameter QP in an open/closed loop.
• A typical rate-control scheme consists of two basic operations:
  1. bit allocation (R-D model), and
  2. bit rate control (buffer occupancy measure).
• To achieve the target bit rate R, the rate control scheme chooses an appropriate quantization parameter Q. Its accuracy depends on the rate-quantization (R-Q) model. Together with the distortion-quantization (D-Q) function, the R-Q function characterizes the rate-distortion (R-D) behavior of video encoding.
• The first step in the derivation of a rate control formula is to approximate the rate-quantization function R-Q by an inversely proportional curve.
  [Figure: R-Q curve approximated by an inverse proportional function]

Joint encoding (Det/Stat Mux)

• Deterministic multiplex of L video sequences, CBR encoded with constant bitrate $R_i$ (variable $D_i$ and picture quality) in a fixed channel capacity $R_c$:
  $$\sum_i R_i \le R_{channel}, \qquad R_i = \text{const} \ (\text{CBR})$$
  [Figure: bitrate $R_i$ of sequences 1, 2 and 3 with annotated $R_i$ statistics 7836, 9331 and 7427]
• Statistical multiplex of L + SMCG video sequences, VBR encoded with variable bitrate $R_i$ (constant $D_i$ and picture quality).
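The inverse proportional R-Q approximation mentioned in section 2.3 can be turned into a very small closed-loop controller. The sketch below is an assumption-laden illustration, not the rate control of any particular standard or reference encoder: the model form R(Q) = X / Q, the parameter name `x`, the Q range 1 to 31 and the stand-in "encoder" that obeys the same model are all hypothetical.

```python
def choose_q(target_bits, x, q_min=1.0, q_max=31.0):
    """Invert the assumed model R(Q) = X / Q and clamp Q to the allowed range."""
    q = x / target_bits
    return max(q_min, min(q_max, q))


def update_x(actual_bits, q):
    """Re-estimate the model parameter X from the picture just coded."""
    return actual_bits * q


def simulate(target_bits, true_x_per_picture, x0):
    """Closed-loop control over a sequence of pictures.

    The real encoder is replaced by a stand-in that produces true_x / Q bits,
    so the loop only demonstrates how the model estimate tracks the content.
    """
    x = x0
    log = []
    for true_x in true_x_per_picture:
        q = choose_q(target_bits, x)
        actual_bits = true_x / q          # stand-in for encoding one picture
        x = update_x(actual_bits, q)      # feedback: closed-loop adaptation
        log.append((q, actual_bits))
    return log


# Hypothetical numbers: target of 1000 bits per picture, content complexity
# doubling at the third picture, poor initial estimate of X.
log = simulate(target_bits=1000.0, true_x_per_picture=[8000.0, 8000.0, 16000.0], x0=4000.0)
for n, (q, bits) in enumerate(log, start=1):
    print(f"picture {n}: Q = {q:.1f}, bits = {bits:.0f}")
```

With a one-picture memory the controller reaches the target as soon as the content is stationary and overshoots for one picture after every change in complexity; practical schemes combine such a model with the buffer occupancy measure to bound that overshoot.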
• The criterion for statistical multiplexing is a joint buffer occupancy measure, $0 \le B_f(t) \le B_s$, with the channel bitrate shared in proportion to the weights $X_i$ of the sequences:
  $$\sum_i R_i \le R_{channel}, \qquad R_i = R_{channel} \cdot \frac{X_i}{\sum_i X_i} \ (\text{VBR})$$

References

[1] K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, Introduction to multimedia communications: applications, middleware, networking, Wiley, 2005.
[2] K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, Multimedia communication systems: techniques, standards, and networks, Prentice Hall, 2002.
[3] Y. Shoham, A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. ASSP, vol. 36, pp. 1445-1453, Sep. 1988.
[4] T. Berger, Rate-Distortion theory: A mathematical theory for data compression, Prentice-Hall, 1971.
[5] D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, Athena Scientific, 1996.
[6] R. Bellman, Dynamic Programming, Princeton University Press, 1957.
[7] D. A. Milovanovic, Z. S. Bojkovic, From information theory to standard codec optimization for digital visual multimedia, Seminar on Computer science and Applied mathematics, June 2013, Mathematical institute of the Serbian Academy of science and arts, and IEEE Chapter Computer Science (CO-16), Belgrade, Serbia.
[8] D. Milovanović, Z. Milićević, Z. Bojković, MPEG video deployment in digital television: HEVC vs. AVC codec performance study, 11th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services TELSIKS 2013, Nis, Serbia, Oct. 2013.
