Operational Rate-Distortion information theory in optimization of advanced digital video codec

Dragorad A. Milovanović, Zoran S. Bojković
DragoAM@Gmail.com, z.bojkovic@yahoo.com
University of Belgrade
TINKOS, 25.09.2013

CONTENTS
1. Rate-Distortion theory
1.1 Source coding and R-D function
1.2 Operational R-D framework
1.3 Formulation of efficient video coding
2. Operational control of standard-based encoder
2.1 Operational MPEG framework
2.2 Performance/efficiency of digital video codec
2.3 Bitrate control and joint optimization

1. Rate-Distortion theory

Information transmission system (message, symbol encoding, entropy).
[Figure: information transmission system with source S, rate R and reconstruction Ŝ]

Source coding: perceptual signals and distortion criterion $D \le D_{max}$.

Average distortion:
$$D(S,\hat{S}) = \sum_{i,j} p(s_i,\hat{s}_j)\, d_{ij}, \qquad d_{ij} = 0 \text{ for } s_i = \hat{s}_j, \quad d_{ij} > 0 \text{ for } s_i \ne \hat{s}_j$$

Rate-distortion theory calculates the minimum transmission bitrate R for a required video quality D.

Mutual information is the information that the symbols S and the symbols Ŝ convey about each other. Average mutual information:
$$I(S;\hat{S}) = H(S) - H(S|\hat{S}) = H(\hat{S}) - H(\hat{S}|S) = \sum_{s_i,\hat{s}_j} p(s_i,\hat{s}_j) \log_2 \frac{p(s_i,\hat{s}_j)}{p(s_i)\,p(\hat{s}_j)}$$

Channel coding: the channel capacity C is the maximum of the mutual information I between source and destination.

1.1 Source coding and R-D function

For a given maximum average distortion $D_{max}$, the rate-distortion function is a lower bound for the transmission bitrate:
$$R_L(D) = \min_{D \le D_{max}} I(S;\hat{S})$$
where the minimum is taken over all mappings from S to Ŝ whose average distortion D does not exceed $D_{max}$.

1. The Shannon lower bound $R_L(D)$ assumes statistical independence between distortion and reconstruction.
2. R(D) is a non-increasing and convex function of D.
3. For a continuous source S, R(D) approaches infinity as D approaches zero.
4. For a discrete source S, the minimum rate required for lossless transmission is equal to the entropy rate, R(0) = H(S) (lossless coding).
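The definitions of average distortion and average mutual information above can be checked numerically. The following is a minimal sketch, assuming an equiprobable binary source, a Hamming distortion matrix and a symmetric "test channel" that flips each symbol with probability eps; these choices and all function names are illustrative and do not come from the slides. For this particular source the test channel is known to attain R(D) = 1 - H_b(D), so the printed values trace a non-increasing, convex curve with R(0) = H(S) = 1 bit.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a probability vector (zero terms contribute nothing)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_distortion(joint, d):
    """D = sum_ij p(s_i, s^_j) * d_ij for a joint pmf given as a nested list."""
    return sum(joint[i][j] * d[i][j]
               for i in range(len(joint)) for j in range(len(joint[0])))

def mutual_information(joint):
    """I(S;S^) = sum_ij p_ij * log2(p_ij / (p_i * q_j)) in bits."""
    p_s = [sum(row) for row in joint]          # marginal of the source
    p_r = [sum(col) for col in zip(*joint)]    # marginal of the reconstruction
    return sum(pij * math.log2(pij / (p_s[i] * p_r[j]))
               for i, row in enumerate(joint)
               for j, pij in enumerate(row) if pij > 0)

def binary_symmetric_joint(eps):
    """Joint pmf of an equiprobable binary source and its copy flipped with probability eps."""
    return [[0.5 * (1 - eps), 0.5 * eps],
            [0.5 * eps, 0.5 * (1 - eps)]]

# Hamming distortion: d_ij = 0 if s_i == s^_j, 1 otherwise (illustrative choice).
HAMMING = [[0, 1], [1, 0]]

if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.1, 0.25, 0.5):
        joint = binary_symmetric_joint(eps)
        print(f"D = {average_distortion(joint, HAMMING):.2f}   "
              f"I(S;S^) = {mutual_information(joint):.4f} bits")
```

At eps = 0 the reconstruction is lossless and the mutual information equals the source entropy; at eps = 0.5 the reconstruction carries no information about the source.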
Stochastic model of a Laplacian pdf source (variance $\sigma^2 = 1$):
$$D_L(R) = \frac{e}{\pi}\,\sigma^2\, 2^{-2R}$$
Stochastic model of a Gauss-Markov source (correlation $0 < \rho < 0.9$):
$$D_L(R) = (1-\rho^2)\,\sigma^2\, 2^{-2R}$$

1.2 Operational (R,D) framework

In a practical coding framework, the structure of the coder is fixed and a finite set of encoding modes is defined. In addition, it is usually difficult or simply impossible to find closed-form expressions for the R(D) and D(R) functions of general sources. Each choice of encoding parameters then yields one pair of rate and distortion values, that is, one operational point in the R-D plane. The lower bound of all these rate-distortion pairs is referred to as the operational rate-distortion (ORD) function.

Block diagram of a typical lossy source coding system:
- block code $Q_N = \{\alpha_N, \beta_N, \gamma_N\}$ (blocks of N consecutive input samples are coded independently)
- bitrate R (average number of bits per source symbol)
- additive distortion measure D (MSE between source and reconstructed symbols)

Operational R-D function

For a given source S and code Q, the operational point (R,D) is defined by R = r(Q) and D = δ(Q). A rate-distortion point (R,D) is achievable if there is a code Q with r(Q) ≤ R and δ(Q) ≤ D; the achievable points form a region of the R-D plane. The function R(D) that describes the fundamental bound of this region for a given source S is the operational function ORD.

The ORD boundary of the region of achievable rate-distortion points specifies:
A. the minimum rate R that is required for representing the source S with a distortion less than or equal to a given value D or, alternatively,
B. the minimum distortion D that can be achieved if the source S is coded at a rate less than or equal to a given value R.

[Figure: region of achievable rate-distortion points (R,D) bounded by the operational R(D) function; axis marks R = Max, D = Max, Rmin, Dmin]

Quantization

- Uniform scalar quantizer ($\Delta$ = const, $D \approx \Delta^2/12$, optimal entropy coding $\gamma$)
- Non-uniform optimal quantizer (Lloyd–Max, centroids of the pdf)
- Asymptotic performance: $D_L(R) = \sigma^2\,\varepsilon_S^2\,2^{-2R}$ (Shannon lower bound)

Entropy coding (γ)

Variable length code (VLC): the Huffman code minimizes the average code length
$$L_s = \sum_i p(s_i)\,\mathrm{length}(s_i) \quad \text{[bits/symbol]}$$
Optimal code: the average length is bounded below by the first-order entropy
$$H_s = -\sum_i p^*(s_i)\,\log_2 p^*(s_i) \quad \text{[bits/symbol]}$$
Example for K = 2 symbols with $p(s_1) = P_1$, $p(s_2) = 1 - P_1$:
$$H_s = -P_1 \log_2 P_1 - (1-P_1)\log_2(1-P_1) \quad \text{[bits/symbol]}$$
For $P_1 = 0.5$ the entropy reaches its maximum $H_s = 1$ and the redundancy is $\log_2 K - H_s = 0$.

Arithmetic encoder (CABAC): adaptive estimation of the statistical distribution $p(s_i)$.

Predictive coding

- Differential coder
- Predictive coder (DPCM)
- Linear prediction $\hat{S}_n$: prediction coefficients $p_i$, prediction error $U_n$, reconstruction error $U'_n$
- Optimal linear prediction ($U_n$ orthogonal to $\hat{S}_n$)
- Prediction error variance: $\sigma'^2 = \varepsilon_\alpha^2\,\sigma^2 \ge \gamma_S^2\,\varepsilon_\alpha^2\,\sigma_S^2$, where $\gamma_S^2$ is the spectral flatness measure (sfm)
- Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
- N = 1: $p_{1,opt} = \rho_1$, $CG = 1/(1-\rho_1^2)$
- N = 2: $p_{1,opt} = \rho_1(1-\rho_2)/(1-\rho_1^2)$, $p_{2,opt} = (\rho_2-\rho_1^2)/(1-\rho_1^2)$

Transform coding

- Linear transformation: A is the transform matrix, B the inverse transform matrix
- Orthogonal A: $A^{-1} = A^T$, $A^T A = A A^T = I$
- Orthonormal B: $B = A^{-1} = A^T$ (the mean of the N coefficient variances equals the variance of s)
- Optimal linear transformation: KLT (eigenvectors of the auto-covariance matrix $R_{SS}$; its eigenvalues are the coefficient variances)
- Asymptotic performance: coding gain $CG = 1/\gamma_S^2$

Optimal allocation of the bitrate R between N quantizers, example for N = 2:
$$R_{SS} = \sigma_S^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad A_{KLT} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \sigma_0^2 = (1+\rho)\,\sigma_S^2, \quad \sigma_1^2 = (1-\rho)\,\sigma_S^2$$
$$R_0 = R + \frac{1}{2}\log_2\frac{\sigma_0}{\sigma_1}, \qquad R_1 = R + \frac{1}{2}\log_2\frac{\sigma_1}{\sigma_0}, \qquad \sigma_{q0}^2 = \sigma_{q1}^2$$
$$CG = \frac{\sigma_S^2}{\sqrt{\sigma_0^2\,\sigma_1^2}} = \frac{1}{\sqrt{1-\rho^2}}$$

1.3 Formulation of efficient video coding

A standard-based codec requires an optimization procedure over the set of allowed operating parameters, subject to additional criteria that arise from real-time operation (complexity, delay). The goal of operational information theory is to find the set of encoder operating parameters that is optimal in the R(D) sense. An efficient optimization procedure, based on fast algorithms rather than a full search of the parameter space, is also required.
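To make the notions of operational point and full search concrete, the sketch below enumerates every parameter combination for two blocks and keeps the lower boundary of the resulting (R, D) points. The per-block (rate, distortion) tables are hypothetical numbers chosen for illustration, and rate and distortion are assumed to be additive over blocks. The search visits K^N combinations for N blocks with K quantizers each, which is why a full search does not scale.

```python
from itertools import product

# Hypothetical (rate, distortion) choices per block: one entry per quantizer.
BLOCK_A = [(1, 9.0), (2, 4.0), (3, 2.0), (4, 1.5)]
BLOCK_B = [(1, 6.0), (2, 5.0), (3, 1.0), (4, 0.75)]

def operational_points(*blocks):
    """Full search: every parameter combination gives one (R, D) point (additive R and D)."""
    points = []
    for combo in product(*blocks):
        rate = sum(r for r, _ in combo)
        dist = sum(d for _, d in combo)
        points.append((rate, dist))
    return points

def ord_boundary(points):
    """For increasing rate, keep only the points that lower the distortion further."""
    boundary = []
    best = float("inf")
    for rate, dist in sorted(points):      # sorted by rate, then by distortion
        if dist < best:
            boundary.append((rate, dist))
            best = dist
    return boundary

if __name__ == "__main__":
    points = operational_points(BLOCK_A, BLOCK_B)
    print(len(points), "operational points")
    for rate, dist in ord_boundary(points):
        print(f"R = {rate}  D = {dist}")
```

Each boundary point answers both readings of the ORD function: the smallest distortion reachable at that rate, and the smallest rate that reaches that distortion.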
The practical trade-off between the allowed distortion D and the available bitrate R in the design of an encoder is based on a discrete optimization procedure that finds locally optimal operational (R, D) points.

Lagrange multiplier method

Formulation of the R-D problem, cost function:
$$\min_R D(R) \quad \text{with constraint} \quad R \le R_{max}$$
Since D(R) is non-increasing ($\partial D/\partial R \le 0$), the solution lies on the constraint boundary:
$$R^* = R_{max}, \qquad D_{min} = D(R_{max})$$
Unconstrained Lagrangian cost function:
$$\min_R J(R,\lambda), \qquad J(R,\lambda) = D(R) + \lambda R, \qquad \lambda \ge 0$$
Necessary condition for the existence of a minimum:
$$\frac{\partial J(R,\lambda)}{\partial R} = 0$$
The solution is a simultaneous iteration of R and λ:
$$\frac{\partial D(R^*)}{\partial R} + \lambda = 0, \quad R^*(\lambda) - R_{max} = 0 \qquad\Rightarrow\qquad \lambda^* = -\left.\frac{\partial D(R)}{\partial R}\right|_{R=R^*}, \quad R^*(\lambda^*) = R_{max}$$

Geometrical interpretation

The operational R-D function is the convex border that connects a subset of locally optimal operational points. The Lagrange method reaches only the points on this border, so it may return a suboptimal solution when the constrained optimum lies off the border.

The optimal operational point (D,R), obtained as the solution of min(D + λR) for constant λ, is the operational point at which a line of slope -λ touches the convex border.

Optimal bit allocation

Formulation:
$$\min_{R_i} \sum_{i=1}^{N} D_i(R_i) \quad \text{with constraint} \quad \sum_{i=1}^{N} R_i \le R_{max} = \text{const}$$
Unconstrained Lagrangian cost function:
$$\min_{R_i} J, \qquad J = \sum_{i=1}^{N} D_i(R_i) + \lambda \sum_{i=1}^{N} R_i, \qquad \lambda \ge 0$$
Necessary condition for the existence of a minimum:
$$\frac{\partial J}{\partial R_k} = \frac{\partial}{\partial R_k}\sum_{i=1}^{N} D_i(R_i) + \lambda = 0$$
The solution is a simultaneous iteration of $R_i$ and λ:
$$\frac{\partial D_k(R_k)}{\partial R_k} = -\lambda^* = \text{const}, \qquad \sum_{k=1}^{N} R_k^*(\lambda^*) - R_{max} = 0$$
that is, all N quantizers operate at the same slope of their $D_k(R_k)$ curves.

Joint hierarchical optimization

Optimal image decomposition and bitrate allocation:
1. discrete version of the Lagrange multiplier method,
2. deterministic dynamic programming (forward/backward).

The solution:
1. The image is decomposed into a pre-specified number of levels.
2. For the adopted value of the quality parameter λ = const, the operational point min(D + λR) is calculated on each level of decomposition for each partition and the specified set of quantizers.
3. At each level of decomposition a split/merge decision is made (principle of optimality) by comparing the Lagrangian costs of successive levels of decomposition. The two children c1, c2 are kept (split) if
$$(D_{c1} + D_{c2}) + \lambda\,(R_{c1} + R_{c2}) < D_p + \lambda R_p$$
and are merged into the parent p otherwise.
4. A binary search (or Newton method) determines the optimal $\lambda^*$ for a given bitrate $R_{max}$ and an initial search interval $(\lambda_l, \lambda_u)$ with
$$R^*(\lambda_u) \le R_{max} \le R^*(\lambda_l)$$

2. Operational control of standard-based encoder

A digital video encoder exploits the statistical redundancy of the source as well as the perceptual irrelevancy for the user.

[Figure: block-adaptive hybrid transform-entropy encoder with motion estimation and compensation; scope of standardization]

2.1 Operational MPEG framework

[Figure: ITU/MPEG process of standardization]
[Figure: encoding techniques and operational parameters]

Set of operational parameters

The task of the encoder control is to determine the values of the standardized syntax elements, and thus the bitstream b, for a given input sequence in such a way that the distortion between the input sequence and its reconstruction is minimized subject to a set of constraints on the average and maximum bit rate.

[Figure: R(QP) and D(QP) curves]

Let $B_c$ be the set of all conforming bitstreams that obey the given set of constraints. For a distortion measure D, the optimal bitstream in the rate-distortion sense is given by
$$b^* = \arg\min_{b \in B_c} D\big(s, s'(b)\big)$$
Due to the huge parameter space and the encoding delay, it is impossible to apply this minimization directly. Instead, the overall minimization problem is split into a series of K smaller minimization problems (p is a subset of the operational parameters):
$$\min_{p \in P_k} D_k(p) \quad \text{subject to} \quad R_k(p) \le R_c, \qquad D = \sum_{i \in B} |s_i - \hat{s}_i|^{e}$$
The constrained minimization problem can be reformulated as an unconstrained minimization,
$$\min J_{MOTION}, \qquad J_{MOTION} = D(\mathrm{MAD}) + \lambda_{MOTION}\, R(Q)$$
$$\min J_{MODE}, \qquad J_{MODE} = D + \lambda_{MODE}\, R(Q)$$
where
$$\lambda_{MODE} = c\,Q^2, \qquad \lambda_{MOTION} = \sqrt{\lambda_{MODE}}$$
Q denotes the quantization step size, which is controlled by the quantization parameter QP.
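The Lagrangian rule min(D + λR) and the search for λ* described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: each coding unit (a block, a macroblock mode set, a tree node) is represented by a hypothetical table of (rate, distortion) candidates, every unit is decided independently for a common λ, and a bisection shrinks the interval for λ until the rate budget is met. The function names, the candidate tables and the initial interval are assumptions.

```python
# Hypothetical (rate, distortion) candidates for two coding units.
UNITS = [
    [(1, 9.0), (2, 4.0), (3, 2.0), (4, 1.5)],
    [(1, 6.0), (2, 5.0), (3, 1.0), (4, 0.75)],
]

def best_point(points, lam):
    """Candidate of one unit that minimises the Lagrangian cost D + lam * R."""
    return min(points, key=lambda p: p[1] + lam * p[0])

def allocate(units, lam):
    """Independent Lagrangian decision per unit for a common multiplier lam."""
    choice = [best_point(points, lam) for points in units]
    total_rate = sum(r for r, _ in choice)
    total_dist = sum(d for _, d in choice)
    return choice, total_rate, total_dist

def search_lambda(units, r_max, lam_lo=0.0, lam_hi=1e6, iterations=50):
    """Bisection on lam. The total rate is non-increasing in lam, so the budget
    stays satisfied at lam_hi (assumes the minimum-rate choice fits into r_max)."""
    for _ in range(iterations):
        lam = 0.5 * (lam_lo + lam_hi)
        _, rate, _ = allocate(units, lam)
        if rate > r_max:
            lam_lo = lam      # too many bits: penalise rate more strongly
        else:
            lam_hi = lam      # budget met: try a smaller penalty
    return lam_hi

if __name__ == "__main__":
    for budget in (5, 4):
        lam = search_lambda(UNITS, budget)
        choice, rate, dist = allocate(UNITS, lam)
        print(f"Rmax = {budget}: lambda* ~ {lam:.3f}, R = {rate}, D = {dist}, choice = {choice}")
```

With these numbers the budget of 5 is met exactly, while for a budget of 4 the method settles at R = 3: the combination with R = 4 has lower distortion but lies above the convex border, so no value of λ selects it. This is the suboptimality noted in the geometrical interpretation.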
2.2 Performance/efficiency of digital video codec

Coded frames of the three HD720 test sequences at BR = 512 kbps:

Sequence   Codec   QP   BR [kbps]   PSNR [dB]
1          H.265   30   512         39.66
1          H.263   20   512         34.00
2          H.265   30   512         39.36
2          H.263   31   512         30.94
3          H.265   30   512         39.24
3          H.263   25   512         32.78

Coding gain BRCG, PSNR = const

Three test sequences (1/2/3) with typical video conferencing content were selected for the experiments (Vidyo, 1280x720, 60 fps, 10 s). Each test sequence was coded at 12 different bitrates. The ORD functions PSNRYUV(BR) are shown for the bitrates BR = 0.256, 0.384, 0.512, 0.850, 1.500 Mbps.

The combined PSNRYUV is calculated as the weighted sum of the per-picture PSNR values of the individual components:
$$PSNR_{YUV} = \frac{6\,PSNR_Y + PSNR_U + PSNR_V}{8}$$
where the individual components are computed as
$$PSNR = 10 \log_{10} \frac{(2^B - 1)^2}{MSE}, \qquad B = 8$$

[Figure: PSNRYUV(BR) curves for test sequences 1, 2, 3]
[Figure: bitrate reduction of HEVC vs. AVC based on subjective MOS performance for typical video conferencing bitrates, test sequences 1, 2, 3]

Coding gain PSNRCG, BR = const

[Figure: variability of PSNRY per frame (time) for BR = const, test sequences 1, 2, 3 (BR ≈ 0.512 Mbps: QP_HEVC = 30, QP_AVC = 32, QP_H.263 = 20/31/25)]

Complexity of encoder/decoder

The encoding and decoding times for the representative HD720 sequences (60 fps, 10 s) are shown. Times are reported in multiples of 10 s, the sequence duration, so that they directly show the ratio to real-time operation: the HEVC encoding time exceeds 1000 times real-time and the decoding time exceeds 4 times real-time on an Ultrabook (x86-64 Core i5, 2 cores/4 threads at 1.7 GHz, 4 GB RAM).

[Figure: encoding and decoding times for test sequences 1, 2, 3]

2.3 Bitrate control

The objective of rate control is to regulate the MPEG coded bit stream so that it satisfies given conditions (variable/constant bit budget constraints, prevention of buffer overflow and underflow).

A variable bitrate (VBR) results from a constant quantization parameter QP in open loop; a constant bitrate (CBR) requires a variable QP in closed loop.

A typical rate-control scheme consists of two basic operations:
1. bit allocation (R-D model), and
2. bit rate control (buffer occupancy measure).
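The two operations listed above can be illustrated with a small closed-loop simulation. It is a sketch under strong assumptions, not a standardized rate-control algorithm: the toy encoder spends X / q bits on a frame of complexity X, the complexity estimate is taken from the previously coded frame, and the frame budget is corrected in proportion to the deviation of the virtual buffer from half occupancy. All names, the gain and the numbers are hypothetical.

```python
def simulate_rate_control(complexities, bits_per_frame, buffer_size,
                          q_min=1.0, q_max=64.0, gain=0.5):
    """Return a list of (q, bits, fullness) per frame for a toy CBR control loop."""
    fullness = 0.5 * buffer_size       # start with a half-full virtual buffer
    x_est = complexities[0]            # assumption: complexity of frame 0 is known
    history = []
    for x in complexities:
        # 1. Bit allocation: nominal budget corrected by the buffer deviation.
        target = bits_per_frame - gain * (fullness - 0.5 * buffer_size)
        target = max(target, 0.1 * bits_per_frame)
        # 2. Rate control: invert the assumed model R = X / q for the quantizer step.
        q = min(max(x_est / target, q_min), q_max)
        bits = x / q                   # toy encoder: bits inversely proportional to q
        x_est = bits * q               # update the model from the coded frame
        # Buffer fills with coded bits and drains at the channel rate.
        fullness = min(max(fullness + bits - bits_per_frame, 0.0), buffer_size)
        history.append((q, bits, fullness))
    return history

if __name__ == "__main__":
    # Scene change after 10 frames: the complexity doubles.
    frames = [40000.0] * 10 + [80000.0] * 10
    for n, (q, bits, fullness) in enumerate(simulate_rate_control(frames, 4000.0, 16000.0)):
        print(f"frame {n:2d}: q = {q:6.2f}  bits = {bits:7.1f}  buffer = {fullness:8.1f}")
```

After the scene change the first frame overshoots its budget, the buffer rises, and the controller raises q until the frame size returns to the channel rate per frame.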
To achieve the target bit rate R, the rate control scheme chooses an appropriate quantization parameter Q. Its accuracy depends on the rate-quantization (R-Q) model. Together with the distortion-quantization (D-Q) function, the R-Q function characterizes the rate-distortion (R-D) behavior of video encoding. The first step in the derivation of a rate control formula is to approximate the rate-quantization function R-Q by an inversely proportional curve ($R \propto 1/Q$), as shown in the figure.

Joint encoding (Det/Stat Mux)

Deterministic multiplex of L video sequences, CBR encoded with constant bitrate $R_i$ (variable $D_i$ and picture quality) in a fixed channel capacity $R_c$:
$$\sum_i R_i \le R_{channel}, \qquad R_i = \text{const} \quad (CBR)$$
[Figure: bitrate panels for test sequences 1, 2, 3, annotated with the values 7836, 9331 and 7427]

Statistical multiplex of L + SMCG video sequences, VBR encoded with variable bitrate $R_i$ (constant $D_i$ and picture quality). The criterion is a joint buffer occupancy measure:
$$0 \le B_f(t) \le B_s$$
$$\sum_i R_i \le R_{channel}, \qquad R_i = \frac{X_i}{\sum_i X_i}\, R_{channel} \quad (VBR)$$
[Figure: statistical multiplex of video sequences 1, 2, 3, ... into one channel]