Data Compression and Source Coding V: Transform Coding
T. Linder
Queen's University, Winter 2016

Bit allocation in scalar quantization

Problem formulation: A block of r.v.'s $X_1, X_2, \ldots, X_k$ is scalar quantized. The $X_i$ might have different distributions. We want to minimize the overall MSE
$$ E\Big[\sum_{i=1}^k \big(X_i - Q_i(X_i)\big)^2\Big] $$
where $Q_i$ has $N_i$ levels, under the condition that overall no more than $B$ bits are used, i.e.,
$$ \sum_{i=1}^k \log N_i \le B $$

Define
$$ W_i(b) = \min_{Q:\, r(Q) \le b} E\big[(X_i - Q(X_i))^2\big] $$
(the MSE of the optimal $b$-bit quantizer for $X_i$). If $b_i$ bits are used to quantize $X_i$ optimally, the overall distortion is
$$ D(\mathbf{b}) = \sum_{i=1}^k W_i(b_i), \qquad \mathbf{b} = (b_1, \ldots, b_k) $$

Bit allocation problem: Given the constraint $\sum_{i=1}^k b_i \le B$, find $\mathbf{b} = (b_1, \ldots, b_k)$ minimizing $D(\mathbf{b})$.

Note: The $b_i$ are nonnegative integers. If the functions $W_i(b)$ are known, the bit allocation problem can (in principle) be solved by exhaustive search.

Simplifying assumption: Each $X_i$ has pdf $f_i$ and
$$ W_i(b) = \frac{1}{12}\,\|f_i\|_{1/3}\, 2^{-2b}, \qquad i = 1, \ldots, k $$
Thus we assume that the high-rate approximation to the optimal performance holds with equality.

Let $\sigma_i^2 = \mathrm{Var}(X_i)$ and $\tilde{X}_i = X_i/\sigma_i$. Then $\tilde{X}_i$ has unit variance and its pdf $\tilde{f}_i(x) = \sigma_i f_i(\sigma_i x)$ satisfies
$$ \|f_i\|_{1/3} = \|\tilde{f}_i\|_{1/3}\, \sigma_i^2 $$
Thus
$$ W_i(b) = \frac{1}{12}\,\|\tilde{f}_i\|_{1/3}\, \sigma_i^2\, 2^{-2b} = h_i \sigma_i^2\, 2^{-2b}, \qquad h_i = \frac{1}{12}\,\|\tilde{f}_i\|_{1/3} $$

Note: $h_i$ is invariant to scaling and depends only on the shape of the density. E.g., $h_i$ is the same for all normal r.v.'s.

We can solve the bit allocation problem analytically if we relax the condition that the $b_i$ are nonnegative integers.

Theorem 1 (optimal bit allocation)
$$ D(\mathbf{b}) = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2b_i} $$
is minimized subject to $\sum_{i=1}^k b_i \le B$ if and only if
$$ b_i = \bar{b} + \frac{1}{2}\log_2 \frac{\sigma_i^2}{\rho^2} + \frac{1}{2}\log_2 \frac{h_i}{H}, \qquad i = 1, \ldots, k $$
where
$$ \bar{b} = \frac{B}{k}, \qquad \rho^2 = \Big(\prod_{i=1}^k \sigma_i^2\Big)^{1/k}, \qquad H = \Big(\prod_{i=1}^k h_i\Big)^{1/k} $$

Observation: The optimal $\mathbf{b}$ must satisfy $\sum_{i=1}^k b_i = B$.

Proof: Assume $\sum_{i=1}^k b_i < B$. Let $b = B - \sum_{i=1}^k b_i$ and define $\hat{\mathbf{b}} = (b_1 + b, b_2, \ldots, b_k)$. Then
$$ D(\mathbf{b}) - D(\hat{\mathbf{b}}) = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2b_i} - \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2\hat{b}_i} = h_1 \sigma_1^2 \big(2^{-2b_1} - 2^{-2(b_1+b)}\big) > 0 $$
since $b > 0$. Thus $\mathbf{b}$ cannot solve the optimal bit allocation problem. $\Box$
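Before proving Theorem 1, a quick numerical sanity check (a minimal sketch, not part of the original development): the variances below are made up, and the shape factor uses the Gaussian value $h_g = \sqrt{3}\,\pi/2$ that appears later in these notes. The check confirms that the analytic allocation uses the full budget and that no budget-preserving perturbation of it improves $D(\mathbf{b})$.

```python
# Sanity check of Theorem 1 with illustrative (made-up) variances.
import numpy as np

rng = np.random.default_rng(0)

k = 4
B = 16.0                                    # total bit budget
sigma2 = np.array([4.0, 1.0, 0.25, 9.0])    # variances of X_1..X_k (made up)
h = np.full(k, np.sqrt(3) * np.pi / 2)      # Gaussian shape factor h_i

def D(b):
    """High-rate distortion D(b) = sum_i h_i sigma_i^2 2^{-2 b_i}."""
    return np.sum(h * sigma2 * 2.0 ** (-2.0 * b))

# Analytic optimum: b_i = B/k + (1/2)log2(sigma_i^2/rho^2) + (1/2)log2(h_i/H)
rho2 = np.exp(np.mean(np.log(sigma2)))      # geometric mean of the variances
H = np.exp(np.mean(np.log(h)))              # geometric mean of the shape factors
b_star = B / k + 0.5 * np.log2(sigma2 / rho2) + 0.5 * np.log2(h / H)

assert np.isclose(b_star.sum(), B)          # the budget is met with equality
for _ in range(1000):
    delta = rng.normal(size=k)
    delta -= delta.mean()                   # perturbation keeps the same budget
    assert D(b_star) <= D(b_star + 0.1 * delta) + 1e-12

print("optimal allocation:", b_star, " D(b*):", D(b_star))
```

Note that the zero-mean perturbations stay on the constraint hyperplane $\sum_i b_i = B$, which is exactly where Theorem 1 claims $\mathbf{b}^*$ is the minimizer.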
Detour: The Lagrange multiplier method

Assume $f: \mathbb{R}^k \to \mathbb{R}$ and $g: \mathbb{R}^k \to \mathbb{R}$ are continuously differentiable functions. We want to solve the following constrained minimization problem (CMP): minimize $f(x)$ subject to $g(x) = 0$.

Let $\nabla u = \big(\frac{\partial}{\partial x_1} u, \ldots, \frac{\partial}{\partial x_k} u\big)$ denote the gradient of $u: \mathbb{R}^k \to \mathbb{R}$.

Theorem 2 (Lagrange multiplier)
If $x^*$ is a solution of the CMP and $\nabla g(x^*) \ne 0$, then there exists $\lambda \in \mathbb{R}$, called the Lagrange multiplier, such that
$$ \nabla f(x^*) + \lambda \nabla g(x^*) = 0 $$

Remarks: The Lagrange multiplier theorem yields the equations
$$ \frac{\partial}{\partial x_j}\big[f(x) + \lambda g(x)\big]\Big|_{x = x^*} = 0, \quad j = 1, \ldots, k, \qquad g(x^*) = 0 $$
We have $k+1$ equations for $k+1$ unknowns $(\lambda, x^*)$. Assume the CMP has a solution and $\nabla g(x) \ne 0$ for all $x$ such that $g(x) = 0$. Then the theorem implies that if $(\lambda, x^*)$ uniquely solve these $k+1$ equations, then $x^*$ is a solution of the CMP. In general, the Lagrange multiplier theorem only gives a necessary condition for optimality.

Proof sketch of Lagrange multiplier theorem: Assume $x^*$ is a solution of the CMP, i.e.,
$$ g(x^*) = 0 \quad\text{and}\quad f(x^*) \le f(x) \ \text{for all } x \text{ such that } g(x) = 0 $$
We have to show that if $\nabla g(x^*) \ne 0$, then $\nabla f(x^*) + \lambda \nabla g(x^*) = 0$ for some $\lambda$.

Let $S = \{x : g(x) = 0\}$. Then $x^* \in S$ and $S$ is a $(k-1)$-dimensional smooth surface in $\mathbb{R}^k$. The tangent plane $T$ of $S$ at $x^*$ is the collection of all derivatives $x'(0)$ of smooth curves $x(t)$, $t \in \mathbb{R}$, such that $x(t) \in S$ and $x(0) = x^*$.

Lemma 3
If $\nabla g(x^*) \ne 0$, then the tangent plane of $S$ at $x^*$ is given by
$$ T = \{y : \nabla g(x^*)\, y = 0\} $$

Now let $y \in T$. By the lemma, there exists a smooth curve $x(t) \in S$ such that $x(0) = x^*$ and $x'(0) = y$. Then $u(t) = f(x(t))$ is differentiable, and since $x(t) \in S$ and $x^*$ solves the CMP,
$$ u(0) = f(x(0)) = f(x^*) \le f(x(t)) = u(t) $$
Thus $u(t)$ has a minimum at $t = 0$, so
$$ 0 = u'(t)\big|_{t=0} = \frac{d}{dt} f(x(t))\Big|_{t=0} = \nabla f(x^*)\, x'(0) = \nabla f(x^*)\, y $$

We have shown that
$$ \nabla g(x^*)\, y = 0 \ \implies\ \nabla f(x^*)\, y = 0 $$
This means that $\nabla f(x^*)$ is orthogonal to the subspace $T = \{y : \nabla g(x^*)\, y = 0\}$. Note that $T$ is $(k-1)$-dimensional because $\nabla g(x^*) \ne 0$. Hence its orthogonal complement
$$ T^\perp = \{x : x^t y = 0 \ \text{for all } y \in T\} $$
is 1-dimensional. Since $T^\perp$ includes both $\nabla g(x^*)$ and $\nabla f(x^*)$, we must have
$$ \nabla f(x^*) = -\lambda\, \nabla g(x^*) $$
for some $\lambda \in \mathbb{R}$. $\Box$

Illustration of the proof for $k = 2$:

[Figure: the constraint curve $S = \{x : g(x) = 0\}$ with its tangent line $T$ at $x^*$, a smooth curve $x(t)$ through $x^*$ with tangent vector $x'(0)$, and the gradient vectors $\nabla g(x^*)$ and $\nabla f(x^*)$.]

Let $x(t)$ be a smooth curve in $S$ such that $x(0) = x^*$; then $x'(0) \in T$. Assume $\nabla f(x^*)$ is not perpendicular to $T$; in the example of the figure, $\nabla f(x^*)\, x'(0) < 0$. Using a first order approximation, for small $t$,
$$ f(x(t)) = f(x(0)) + \frac{d}{dt} f(x(t))\Big|_{t=0}\, t + o(t) = f(x^*) + \nabla f(x^*)\, x'(0)\, t + o(t) $$
where $o(t)/t \to 0$ as $t \to 0$. Since $\nabla f(x^*)\, x'(0) < 0$, we obtain $f(x(t)) < f(x^*)$ for $t > 0$ small enough. Since $x(t) \in S$, this would contradict the optimality of $x^*$. Thus $\nabla f(x^*)\, x'(0) = 0$ must hold, i.e., $\nabla f(x^*) = -\lambda\, \nabla g(x^*)$ for some $\lambda$.
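Before returning to bit allocation, here is a toy illustration of the $k+1$ Lagrange equations (my own example, not from the notes): minimize $f(x,y) = x^2 + 2y^2$ subject to $g(x,y) = x + y - 1 = 0$. A few lines of sympy solve the stationarity conditions and the constraint jointly.

```python
# Toy Lagrange multiplier example: 3 equations, 3 unknowns (x, y, lambda).
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + 2 * y**2
g = x + y - 1

# grad(f + lambda*g) = 0 together with g = 0
eqs = [sp.diff(f + lam * g, v) for v in (x, y)] + [g]
sol = sp.solve(eqs, (x, y, lam), dict=True)
print(sol)   # [{x: 2/3, y: 1/3, lambda: -4/3}]
```

Here the solution $(x, y, \lambda) = (2/3,\, 1/3,\, -4/3)$ of the three equations is unique, so by the remark above it solves the CMP. The same pattern, with $k+1$ unknowns, is used next.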
Proof of bit allocation formula: We apply the Lagrange multiplier method with
$$ f(\mathbf{b}) = D(\mathbf{b}) = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2b_i}, \qquad g(\mathbf{b}) = \sum_{i=1}^k b_i - B $$
Define
$$ J(\mathbf{b}, \lambda) = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2b_i} + \lambda \Big(\sum_{i=1}^k b_i - B\Big) $$
Then for the optimum $\mathbf{b}$ there is a $\lambda$ such that
$$ \frac{\partial}{\partial b_j} J(\mathbf{b}, \lambda) = h_j \sigma_j^2 \big(-2 (\ln 2)\, 2^{-2b_j}\big) + \lambda = 0, \qquad j = 1, \ldots, k \qquad (*) $$
Fix $\lambda$ and solve $(*)$ for $\mathbf{b}$. Then determine $\lambda$ and $\mathbf{b} = \mathbf{b}(\lambda)$ so that the solution satisfies the constraint $\sum_{i=1}^k b_i = B$. If the solution $(\lambda, \mathbf{b})$ is unique, the resulting $\mathbf{b}$ is optimal.

Solving $(*)$ gives
$$ b_j = \frac{1}{2}\log_2 h_j + \frac{1}{2}\log_2 \sigma_j^2 + \underbrace{\frac{1}{2}\log_2 \frac{2 \ln 2}{\lambda}}_{\hat{\lambda}} $$
From the constraint $\sum_{j=1}^k b_j = B$, $\hat{\lambda}$ must satisfy
$$ B = \sum_{j=1}^k b_j = \frac{1}{2}\sum_{j=1}^k \log_2 h_j + \frac{1}{2}\sum_{j=1}^k \log_2 \sigma_j^2 + k \hat{\lambda} $$
which yields
$$ \hat{\lambda} = \frac{B}{k} - \frac{1}{2}\log_2 \Big(\prod_{i=1}^k h_i\Big)^{1/k} - \frac{1}{2}\log_2 \Big(\prod_{i=1}^k \sigma_i^2\Big)^{1/k} $$
Substituting into $b_j = \frac{1}{2}\log_2 h_j + \frac{1}{2}\log_2 \sigma_j^2 + \hat{\lambda}$ gives
$$ b_j = \frac{B}{k} + \frac{1}{2}\log_2 \frac{\sigma_j^2}{\big(\prod_{i=1}^k \sigma_i^2\big)^{1/k}} + \frac{1}{2}\log_2 \frac{h_j}{\big(\prod_{i=1}^k h_i\big)^{1/k}} $$
for $j = 1, \ldots, k$. Note that $\mathbf{b}$ is optimal since $(\lambda, \mathbf{b})$ is unique and $\frac{\partial}{\partial b_j} g(\mathbf{b}) = 1 \ne 0$ for all $j$. $\Box$

Remarks:
- One can show that $D(\mathbf{b}) = f(\mathbf{b})$ is a strictly convex function. This implies (without using the Lagrange multiplier method) that the CMP has a unique solution.
- A shorter, but less intuitive, proof of the bit allocation formula can be given using the concavity of the logarithm and Jensen's inequality.

Minimum distortion: Substituting the optimal allocation,
$$ D_{\mathrm{opt}} = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2b_i} = \sum_{i=1}^k h_i \sigma_i^2\, 2^{-2\bar{b}}\, 2^{-\log_2 \frac{\sigma_i^2}{\rho^2} - \log_2 \frac{h_i}{H}} = \sum_{i=1}^k h_i \sigma_i^2 \cdot \frac{\rho^2 H}{\sigma_i^2 h_i}\, 2^{-2\bar{b}} = k H \rho^2\, 2^{-2\bar{b}} $$

Note: The same distortion is achieved if $k$ random variables with common "shape factor" $H$ and variance $\rho^2$ are optimally scalar quantized at a rate of $\bar{b}$ bits each.

Dealing with the disregarded constraints
- $b_i \ge 0$ for all $i$: Analytic solutions exist for certain source models, but the resulting expressions are less explicit.
- $b_i$ must be an integer for all $i$: Hard in general. However, assuming the high-rate formula, it can be shown that a simple greedy bit allocation method yields the optimal solution.

Practical approach: Use the analytic solution, set the negative $b_i$'s to zero, and round the positive $b_i$'s to the nearest integer. If the rate constraint is exceeded, adjust the $b_i$'s in some heuristic fashion.

Transform coding: linear algebra review

Definitions: Let $A$ be a $k \times k$ matrix of real elements.
- If $A$ is symmetric and $x^t A x \ge 0$ for all $x \in \mathbb{R}^k$, then $A$ is called nonnegative definite.
- If $A$ is symmetric and $x^t A x > 0$ for all $x \ne 0$, then $A$ is called positive definite.
- $A$ is orthogonal if $A^t = A^{-1}$, i.e., $A A^t = I$, where $I$ is the $k \times k$ identity matrix.

Fact 1: If $A$ is symmetric and nonnegative (positive) definite, then all of its eigenvalues are nonnegative (positive).

Fact 2: If $A$ is symmetric and nonnegative definite, then it has $k$ eigenvalues (counting multiplicities) and corresponding $k$ mutually orthogonal eigenvectors.

Fact 3: If $A = \{a_{ij}\}$ is a $k \times k$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_k$ (counting multiplicities), then
$$ \mathrm{trace}(A) \triangleq \sum_{i=1}^k a_{ii} = \sum_{i=1}^k \lambda_i \qquad\text{and}\qquad \det(A) = \prod_{i=1}^k \lambda_i $$

Fact 4: If $A$ is orthogonal, then it is norm-preserving, i.e.,
$$ \|Ax\| = \|x\| \quad\text{for all } x \in \mathbb{R}^k $$
(Here $\|y\| = \big(\sum_{i=1}^k y_i^2\big)^{1/2}$ for any $y = (y_1, \ldots, y_k)^t$.)

Fact 5: If $A$ is orthogonal, then $|\det(A)| = 1$.

Fact 6: Let $X = (X_1, \ldots, X_k)^t$ be a vector of real random variables having finite variance. Then the $k \times k$ autocorrelation matrix $R_X = \{E(X_i X_j)\}$ is symmetric and nonnegative definite.
Proof: We proved this when we derived the optimal linear predictor coefficients.

Fact 7: If $X = (X_1, \ldots, X_k)^t$, $A$ is a $k \times k$ matrix, and $Y = AX$, then
$$ R_Y = A R_X A^t $$
Proof:
$$ R_Y = E[Y Y^t] = E[A X (A X)^t] = E[A X X^t A^t] = A\, E[X X^t]\, A^t = A R_X A^t \qquad \Box $$
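A throwaway numerical spot-check of Facts 3, 4, and 7 (a sketch; the random matrices are arbitrary stand-ins for a generic $R_X$ and a generic orthogonal $A$):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5

M = rng.normal(size=(k, k))
R = M @ M.T                                      # symmetric, nonnegative definite
eig = np.linalg.eigvalsh(R)
assert np.isclose(np.trace(R), eig.sum())        # Fact 3: trace = sum of eigenvalues
assert np.isclose(np.linalg.det(R), eig.prod())  # Fact 3: det = product of eigenvalues

A, _ = np.linalg.qr(rng.normal(size=(k, k)))     # a random orthogonal matrix
x = rng.normal(size=k)
assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))  # Fact 4

# Fact 7: the sample autocorrelation of Y = AX approaches A R_X A^t.
X = rng.multivariate_normal(np.zeros(k), R, size=500_000)    # rows are realizations
Y = X @ A.T
RY_hat = Y.T @ Y / len(Y)
assert np.allclose(RY_hat, A @ R @ A.T, atol=0.1)
```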
Fact 8: If $Y = AX$, then
$$ \det(R_Y) = \det(A)^2 \det(R_X) $$
If $A$ is orthogonal, then $\det(R_Y) = \det(R_X)$.
Proof: Since $R_Y = A R_X A^t$,
$$ \det(R_Y) = \det(A) \det(R_X) \det(A^t) = \det(A)^2 \det(R_X) $$
If $A$ is orthogonal, then $\det(A)^2 = 1$ by Fact 5, so $\det(R_Y) = \det(R_X)$. $\Box$

Transform coding with scalar quantization

[Figure: block diagram. The input $X = (X_1, \ldots, X_k)^t$ is multiplied by $A$ to give $Y$; each $Y_i$ is quantized by $Q_i$ to give $\hat{Y}$; the decoder forms $\hat{X} = A^{-1}\hat{Y}$.]

- $A$ is a $k \times k$ orthogonal matrix.
- $X = (X_1, \ldots, X_k)^t$ and $AX = Y = (Y_1, \ldots, Y_k)^t$. The $Y_i$ are called the transform coefficients.
- Each $Q_i$ is an $N_i$-level scalar quantizer.
- $\hat{Y} = (Q_1(Y_1), \ldots, Q_k(Y_k))^t$ and $A^{-1}\hat{Y} = \hat{X} = (\hat{X}_1, \ldots, \hat{X}_k)^t$.

The end-to-end MSE distortion in transform coding is
$$ D_{tc} = \sum_{i=1}^k E[(X_i - \hat{X}_i)^2] = E[\|X - \hat{X}\|^2] $$

Proposition 1
In transform coding,
$$ D_{tc} = E[\|Y - \hat{Y}\|^2] = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2] $$
Proof:
$$ \|Y - \hat{Y}\| = \|AX - A A^{-1}\hat{Y}\| \quad (\text{since } A^{-1}\hat{Y} = \hat{X}) \qquad = \|A(X - \hat{X})\| = \|X - \hat{X}\| \quad (\text{by Fact 4}) $$
Thus $\|X - \hat{X}\| = \|Y - \hat{Y}\|$, so $D_{tc} = E[\|X - \hat{X}\|^2] = E[\|Y - \hat{Y}\|^2]$. $\Box$

Note: The overall mean square reconstruction error is equal to the MSE distortion incurred by quantizing the transform coefficients. What is a good choice for the transform $A$?

Karhunen-Loeve (KL) transform

Order the eigenvalues of $R_X$ (Fact 1) as
$$ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 0 $$
and let $u_1, \ldots, u_k$ be the corresponding orthogonal eigenvectors (Fact 2). Normalize the $u_i$ to unit length: $\|u_i\| = 1$, $i = 1, \ldots, k$. Define
$$ U = [u_1\ u_2\ \ldots\ u_k] \qquad\text{and}\qquad T = U^t $$
$T$ is called the Karhunen-Loeve transform (KLT) matrix for $X$. Note: $T$ is orthogonal since $T T^t = U^t U = I$.

Remarks:
- In general $T$ is not unique (different orderings of the eigenvectors give rise to different $T$'s).
- In statistics, the transform coefficients $(Y_1, \ldots, Y_k)^t = TX$ are called the principal components of $X$.
- Since $Y = TX = U^t X$, we have $Y_i = u_i^t X$ and
$$ X = IX = U U^t X = U (Y_1, \ldots, Y_k)^t = \sum_{i=1}^k Y_i u_i $$
Thus the transform coefficient $Y_i$ represents the $i$th coordinate of $X$ with respect to the orthonormal basis $u_1, \ldots, u_k$.

Proposition 2
The transformation $Y = TX$ decorrelates $X$: $Y = (Y_1, \ldots, Y_k)^t$ satisfies
$$ E(Y_i Y_j) = \begin{cases} \lambda_i & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} $$
Proof: Since $Y = U^t X$, from Fact 7 we have $R_Y = U^t R_X U$. But $R_X u_i = \lambda_i u_i$, so
$$ R_X U = [R_X u_1\ R_X u_2\ \ldots\ R_X u_k] = [\lambda_1 u_1\ \lambda_2 u_2\ \ldots\ \lambda_k u_k] $$
Hence
$$ R_Y = U^t R_X U = \begin{bmatrix} u_1^t \\ u_2^t \\ \vdots \\ u_k^t \end{bmatrix} [\lambda_1 u_1\ \lambda_2 u_2\ \ldots\ \lambda_k u_k] = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix} \qquad \Box $$

Note: $E(Y_i^2) = \lambda_i$; thus the second moment of $Y_i$ is the $i$th eigenvalue of $R_X$.

The proof demonstrates that any $k \times k$ nonnegative definite $R$ with eigenvalues $\lambda_1, \ldots, \lambda_k$ can be written as
$$ R = U \Lambda U^t $$
where $U$ is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ is diagonal. (This is the spectral theorem, and it holds more generally, e.g., for symmetric matrices.)

We will show that the KLT is "optimal" under certain conditions.
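First, a small numpy sketch of Proposition 2 (the AR(1)-style Toeplitz correlation model below is an illustrative choice of mine, not from the notes):

```python
# The KLT T = U^t diagonalizes R_X; the diagonal of R_Y holds the eigenvalues.
import numpy as np

k, r = 8, 0.9
idx = np.arange(k)
RX = r ** np.abs(np.subtract.outer(idx, idx))   # R_X[i,j] = r^|i-j|

lam, U = np.linalg.eigh(RX)          # ascending eigenvalues, orthonormal columns
lam, U = lam[::-1], U[:, ::-1]       # reorder so lambda_1 >= ... >= lambda_k
T = U.T                              # KLT matrix

RY = T @ RX @ T.T                    # = diag(lambda_1, ..., lambda_k)
assert np.allclose(RY, np.diag(lam), atol=1e-10)
print("eigenvalues of R_X:", lam)
```

The printed eigenvalues are quite uneven for $r = 0.9$, which, as shown below, is exactly the situation where transform coding pays off.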
We need some auxiliary results.

Lemma 4 (Arithmetic-geometric mean inequality)
For any sequence of $k$ nonnegative real numbers $b_1, \ldots, b_k$,
$$ \Big(\prod_{i=1}^k b_i\Big)^{1/k} \le \frac{1}{k}\sum_{i=1}^k b_i $$
where equality holds if and only if $b_i = b$, $i = 1, \ldots, k$.
Proof: If $b_i = 0$ for some $i$, the inequality is trivial. Assuming $b_i > 0$ for all $i$, the inequality is equivalent to
$$ \frac{1}{k}\sum_{i=1}^k \ln b_i \le \ln\Big(\frac{1}{k}\sum_{i=1}^k b_i\Big) $$
which follows from Jensen's inequality and the concavity of the logarithm. The condition for equality also follows from Jensen. $\Box$

Lemma 5 (Hadamard's inequality)
Let $R$ be a nonnegative definite matrix with diagonal elements $r_{ii}$, $i = 1, \ldots, k$. Then
$$ \det(R) \le \prod_{i=1}^k r_{ii} $$
If $R$ is positive definite, then equality holds if and only if $R$ is diagonal.

Proof of Lemma 5: If $R$ is not positive definite, then $\det(R) = 0$. Since $r_{ii} \ge 0$ for all $i$, the inequality holds.

Assume $R$ is positive definite; then $r_{ii} > 0$ for all $i$. Let $s_i = \frac{1}{\sqrt{r_{ii}}}$, define $S = \mathrm{diag}(s_1, s_2, \ldots, s_k)$, and consider $\hat{R} = S R S$. Note that $\hat{R} = S R S$ implies $\hat{r}_{ij} = s_i r_{ij} s_j$, so that
$$ \hat{r}_{ii} = \frac{r_{ii}}{(\sqrt{r_{ii}})(\sqrt{r_{ii}})} = 1, \qquad i = 1, \ldots, k $$
and
$$ \det(\hat{R}) = \det(S)^2 \det(R) = \Big(\prod_{i=1}^k \frac{1}{r_{ii}}\Big) \det(R) $$
Thus
$$ \det(R) \le \prod_{i=1}^k r_{ii} \iff \det(\hat{R}) \le 1 $$
$\hat{R}$ is clearly symmetric and positive definite. Let $\mu_1, \ldots, \mu_k$ be its (positive) eigenvalues. Then from Fact 3 and the arithmetic-geometric mean inequality,
$$ \det(\hat{R})^{1/k} = \Big(\prod_{i=1}^k \mu_i\Big)^{1/k} \le \frac{1}{k}\sum_{i=1}^k \mu_i = \frac{1}{k}\,\mathrm{trace}(\hat{R}) = 1 \qquad (*) $$
This is equivalent to
$$ \det(\hat{R}) \le 1 \qquad (**) $$
i.e., $\det(R) \le \prod_{i=1}^k r_{ii}$.

Condition for equality: If $R$ is diagonal, then $\det(R) = \prod_{i=1}^k r_{ii}$. Conversely, if equality holds in $(*)$ and $(**)$, then $\mu_i = \mu$ for all $i = 1, \ldots, k$. Since $\hat{R}$ is symmetric and positive definite, by the spectral theorem
$$ \hat{R} = U \Lambda U^t $$
where $\Lambda = \mathrm{diag}(\mu_1, \ldots, \mu_k)$ and $U$ is orthogonal. If $\mu_i = \mu$ for all $i$, then $\Lambda = \mu I$ and
$$ \hat{R} = U (\mu I) U^t = \mu U U^t = \mu I $$
Thus $R = S^{-1} \hat{R} S^{-1} = S^{-1} (\mu I) S^{-1} = \mu S^{-2}$. Since $S^{-2} = \mathrm{diag}(s_1^{-2}, \ldots, s_k^{-2})$, we obtain that $R = \mu S^{-2}$ is diagonal. $\Box$
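A quick numerical illustration of Lemma 5 (a sketch; the random positive definite $R$ is just for demonstration):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 6
M = rng.normal(size=(k, k))
R = M @ M.T + 0.1 * np.eye(k)                 # positive definite

# Hadamard: det(R) <= product of diagonal elements
assert np.linalg.det(R) <= np.prod(np.diag(R))

# Equality for a diagonal positive definite matrix
D = np.diag(rng.uniform(0.5, 2.0, size=k))
assert np.isclose(np.linalg.det(D), np.prod(np.diag(D)))
```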
High-resolution optimality of the KLT

Recall that
$$ D_{tc} = E[\|X - \hat{X}\|^2] = E[\|Y - \hat{Y}\|^2] = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2] $$
where $Y = AX$ and $A$ is orthogonal.

Assumptions:
- $X = (X_1, \ldots, X_k)^t$ is a Gaussian random vector having zero mean such that $R_X$ is nonsingular.
- The high-resolution approximation to the optimal scalar quantizer distortion is taken to hold with equality.
- Each $Q_i$ is the optimal $b_i$-bit quantizer for $Y_i$.
- The $b_i$ are optimally allocated for the constraint $\sum_{i=1}^k b_i \le B$, where the $b_i$ need not be nonnegative integers.

Key observation: If $Y = AX$ and $X$ is Gaussian, then each $Y_i$ is Gaussian since $Y_i = \sum_{j=1}^k a_{ij} X_j$. Thus all the $Y_i$ have the same $h_i = h_g$ coefficient in the high-resolution formula:
$$ E[(Y_i - Q_i(Y_i))^2] = h_g \sigma_i^2\, 2^{-2b_i}, \qquad h_g = \frac{1}{12}\Big(\int_{-\infty}^{\infty} \Big(\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\Big)^{1/3} dx\Big)^3 $$
where $\sigma_i^2 = E(Y_i^2)$. (Evaluating the integral gives $h_g = \sqrt{3}\,\pi/2 \approx 2.72$.)

For any orthogonal $A$, the transform coding distortion with optimal bit allocation is, by the minimum-distortion formula,
$$ D_{tc,A} = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2] = k \Big(\prod_{i=1}^k h_g \sigma_i^2\Big)^{1/k} 2^{-2\bar{b}} = k h_g \Big(\prod_{i=1}^k \sigma_i^2\Big)^{1/k} 2^{-2\bar{b}} $$
$$ \ge k h_g \det(R_Y)^{1/k}\, 2^{-2\bar{b}} \quad (\text{from Hadamard's inequality, since the } \sigma_i^2 \text{ are the diagonal elements of } R_Y) $$
$$ = k h_g \det(R_X)^{1/k}\, 2^{-2\bar{b}} \quad (\text{from Fact 8}) $$
Equality holds iff $R_Y$ is diagonal. If $A = T$, the KLT, then $R_Y$ is diagonal from Proposition 2, so
$$ D_{tc,T} = k h_g\, 2^{-2\bar{b}} \det(R_X)^{1/k} $$
Thus $D_{tc,A} \ge D_{tc,T}$. $\Box$

Performance gain

Transform coding distortion: With the KLT and optimal bit allocation,
$$ D_{tc} = k h_g\, 2^{-2\bar{b}} \det(R_X)^{1/k} = k h_g\, 2^{-2\bar{b}} \Big(\prod_{i=1}^k \lambda_i\Big)^{1/k} $$
where $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $R_X$.

Scalar quantization without transform coding: Suppose equal variances $E(X_i^2) = \sigma_X^2$ for all $i$. If each $X_i$ is independently and optimally scalar quantized, the optimal bit allocation is $b_i = \bar{b} = B/k$, $i = 1, \ldots, k$. The resulting MSE is
$$ D_{PCM} = \sum_{i=1}^k E[(X_i - Q_i(X_i))^2] = k h_g \sigma_X^2\, 2^{-2\bar{b}} $$

Gain of transform coding: Since $E(X_i^2) = \sigma_X^2$ for all $i$,
$$ \sigma_X^2 = \frac{1}{k}\sum_{i=1}^k E(X_i^2) = \frac{1}{k}\,\mathrm{trace}(R_X) = \frac{1}{k}\sum_{i=1}^k \lambda_i $$
Thus
$$ \frac{D_{PCM}}{D_{tc}} = \frac{\frac{1}{k}\sum_{i=1}^k \lambda_i}{\big(\prod_{i=1}^k \lambda_i\big)^{1/k}} \ge 1 $$
by the arithmetic-geometric mean inequality. Equality holds if and only if all the $\lambda_i$ are equal, which happens if and only if $R_X = \mathrm{diag}(\sigma_X^2, \ldots, \sigma_X^2)$. We obtain that the gain is always greater than 1 unless the $X_i$ are uncorrelated (equivalently, independent, since $X$ is Gaussian). The more uneven the distribution of the $\lambda_i$, the more can be gained by transform coding.

Transform coding in practice
- The KLT depends on the source autocorrelation matrix $R_X$. It has to be recalculated based on estimates of $R_X$ if the source statistics change, which is computationally expensive.
- The Gaussian source assumption is essential in the KLT optimality proof and the performance gain calculation.
- The assumptions of high-resolution quantization and non-integer bit rates can be dropped: it can be shown, without any further conditions, that the KLT is the optimal transform for Gaussian random vectors if optimal scalar quantization and bit allocation are used.

The most popular fixed (source-independent) transform is the discrete cosine transform (DCT), given by the transform matrix $T = \{t_{mn}\}_{m,n=0}^{k-1}$:
$$ t_{mn} = \begin{cases} \sqrt{\dfrac{1}{k}}\,\cos\dfrac{\pi}{k} m \big(n + \tfrac{1}{2}\big), & m = 0,\ n = 0, \ldots, k-1 \\[6pt] \sqrt{\dfrac{2}{k}}\,\cos\dfrac{\pi}{k} m \big(n + \tfrac{1}{2}\big), & m = 1, \ldots, k-1,\ n = 0, \ldots, k-1 \end{cases} $$

Can be shown:
- The DCT is an orthogonal transform.
- The DCT of a vector $x = (x_0, \ldots, x_{k-1})$ can be obtained using the FFT: define the mirrored sequence $y = (x_0, \ldots, x_{k-1}, x_{k-1}, \ldots, x_0)$ and compute the FFT of $y$; the DCT coefficients of $x$ are obtained by scaling the first $k$ FFT coefficients of $y$. This reduces the complexity from $O(k^2)$ to $O(k \log k)$. (A sketch of this trick follows the list below.)

Advantages of the DCT:
- Fixed (signal-independent).
- Easy to compute (using the FFT).
- Very good energy compaction for highly correlated data.
- Good approximation to the KLT for Markov sources.
- Often achieves performance close to that of the KLT.
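The mirroring trick can be checked in a few lines (a sketch, assuming numpy; the half-sample phase factor used in the scaling step is one standard convention, stated here without derivation):

```python
import numpy as np

def dct_matrix(k):
    """Orthogonal DCT matrix T = {t_mn} exactly as defined above."""
    m = np.arange(k)[:, None]
    n = np.arange(k)[None, :]
    T = np.sqrt(2.0 / k) * np.cos(np.pi * m * (n + 0.5) / k)
    T[0, :] = np.sqrt(1.0 / k)          # m = 0 row
    return T

def dct_via_fft(x):
    """DCT of x from the FFT of the mirrored sequence y."""
    k = len(x)
    y = np.concatenate([x, x[::-1]])    # (x_0..x_{k-1}, x_{k-1}..x_0), length 2k
    Y = np.fft.fft(y)[:k]               # first k FFT coefficients
    c = 0.5 * (np.exp(-1j * np.pi * np.arange(k) / (2 * k)) * Y).real
    scale = np.full(k, np.sqrt(2.0 / k))
    scale[0] = np.sqrt(1.0 / k)
    return scale * c                    # match the orthogonal normalization

x = np.random.default_rng(3).normal(size=16)
assert np.allclose(dct_matrix(16) @ x, dct_via_fft(x))
```

The direct matrix product costs $O(k^2)$ operations per vector, while the FFT route costs $O(k \log k)$, which is what makes the DCT cheap for the block sizes used in practice.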
Transform coding of images

We defined transform coding for "one-dimensional" signals, but images are two-dimensional. Let $X = \{X_{i,j}\}_{i,j=0}^{k-1}$ denote a block (matrix) of pixels:
$$ X = \begin{bmatrix} X_{0,0} & X_{0,1} & \cdots & X_{0,k-1} \\ X_{1,0} & X_{1,1} & \cdots & X_{1,k-1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k-1,0} & X_{k-1,1} & \cdots & X_{k-1,k-1} \end{bmatrix} $$
(This can be a sub-block of a larger image.)

Let $A$ be a $k \times k$ orthogonal matrix and define the 2-D separable transform by
$$ Y = A X A^t = A (X A^t) $$
Interpretation: first the rows of $X$ are transformed by $A$, then the columns of the resulting matrix are transformed by $A$.

The inverse transform for $Y = A X A^t$ is given by
$$ X = A^t Y A $$
Let $A^t = [a_0\ a_1\ \ldots\ a_{k-1}]$ (so $a_i^t$ is the $i$th row of $A$). It is easy to show that
$$ X = \sum_{i,j=0}^{k-1} Y_{i,j}\, B_{i,j} $$
where the $k \times k$ matrix $B_{i,j}$ is given by $B_{i,j} = a_i a_j^t$. The $B_{i,j}$ are called the basis functions (matrices) for the 2-D separable transform $A$. It can be shown that the $B_{i,j}$ form an orthonormal basis for $M_k(\mathbb{R})$, the vector space of $k \times k$ real matrices.

[Figure: the 64 basis functions of the 8×8 2-D DCT.]

Generic transform image coder:
Encoder: original image → 2-D transform → quantizer → lossless encoder.
Decoder: lossless decoder → inverse quantizer → inverse 2-D transform → reconstructed image.

JPEG (Joint Photographic Experts Group) standard
- For color images, an RGB → YCbCr color space conversion is applied first. The luminance (Y) and chrominance (Cb and Cr) components are separately encoded; the chrominance components are usually subsampled by a factor of 2.
- DCT transform coding is applied on 8×8 pixel blocks.
- Transform coefficients are quantized using uniform quantizers; instead of explicit bit allocation, the step sizes are varied.
- The quantizer output is further compressed by lossless coding: run-length coding of the quantized coefficients (efficient since many coefficients become zero after quantization), and Huffman coding of the runs of zeros and the quantizer output indices.

Performance (color photos):
- fair to good: 0.25 bits/pixel (bpp)
- good to very good: 0.5 bpp
- excellent: 1.5 bpp
- indistinguishable from the original: above 2.5 bpp
Compare with the original $3 \times 8 = 24$ bpp.

[Figures: the same photo as the 24 bpp original and JPEG coded at 1 bpp, 0.5 bpp, and 0.25 bpp. At 0.25 bpp the blocking effects are clearly visible.]
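To tie the pieces together, here is a toy version of the pipeline on a single 8×8 block (purely illustrative, not the actual JPEG algorithm: real JPEG uses perceptually tuned step sizes, zig-zag scanning, and entropy coding, none of which is modeled here). Keeping only the few largest-magnitude DCT coefficients stands in for the quantizer:

```python
import numpy as np

def dct_matrix(k):
    """Orthogonal DCT matrix, same construction as in the 1-D sketch above."""
    m = np.arange(k)[:, None]
    n = np.arange(k)[None, :]
    T = np.sqrt(2.0 / k) * np.cos(np.pi * m * (n + 0.5) / k)
    T[0, :] = np.sqrt(1.0 / k)
    return T

k = 8
T = dct_matrix(k)
rng = np.random.default_rng(4)

# A smooth, highly correlated test block: a 2-D ramp plus a little noise.
i, j = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
X = (i + j).astype(float) + 0.1 * rng.normal(size=(k, k))

Y = T @ X @ T.T                                  # 2-D separable transform
keep = 6                                         # crude stand-in for quantization:
thresh = np.sort(np.abs(Y), axis=None)[-keep]    # keep the 6 largest coefficients
Y_hat = np.where(np.abs(Y) >= thresh, Y, 0.0)
X_hat = T.T @ Y_hat @ T                          # inverse transform X = A^t Y A

print("kept", keep, "of", k * k, "coefficients; per-pixel MSE:",
      float(np.mean((X - X_hat) ** 2)))
```

Because the block is smooth, almost all of its energy lands in a handful of low-frequency coefficients, so discarding the rest costs very little: this is the energy compaction that JPEG exploits.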
Subband coding
- Pass the signal through a filter bank and quantize the outputs (subbands) separately.
- Each subband is down-sampled according to the ratio of the bandwidths of the filter output and input.
- Use bit allocation among the subband quantizers to minimize the overall distortion.
- In image coding practice, often a cascade of lowpass-highpass filter pairs is used.

[Figure: 4-level subband decomposition with a lowpass-highpass analysis filter pair $(H_0, H_1)$; each filter output is down-sampled by 2, and the lowpass branch is split further.]

[Figure: reconstruction from the 4-level decomposition with the synthesis filter pair $(G_0, G_1)$; each subband is up-sampled by 2, filtered, and summed, starting from the lowest frequency subband.]

Application to image compression:
- Instead of uniform subbands, first break up the signal into high and low frequency subbands using a lowpass-highpass filter pair. Divide only the low frequency subbands into further high and low frequency subbands.
- Reconstruction reverses the process: begin with the lowest frequency subband and successively add back in the higher frequencies.
- For images, do this separately on rows and columns.
- "Multiresolution": a natural means of reconstructing larger and higher resolution images, working from lower to higher frequencies.
- One can use, e.g., quadrature mirror filters (QMF) or a wavelet decomposition (JPEG 2000). A minimal filter-bank sketch follows at the end of this section.

DCT vs. wavelet-based coding

[Figure: a single pass of subband decomposition of an image using the wavelet transform. The highpass components are almost invisible because most of the signal energy is concentrated in the low-low image subband ⇒ energy compaction.]

[Figures: two examples comparing JPEG at 0.25 bpp with JPEG2000 at 0.25 bpp. In the first example, the JPEG blocking effects are clearly visible, but some "high frequency" details show better on the JPEG picture; in the second example, JPEG2000 is clearly superior.]
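As a minimal stand-in for the QMF/wavelet filter banks mentioned above, the following sketch runs one 2-D analysis/synthesis level with the orthogonal Haar pair ($H_0 = (1,1)/\sqrt{2}$, $H_1 = (1,-1)/\sqrt{2}$) on a synthetic smooth test image and prints the energy split across the four subbands:

```python
import numpy as np

def analyze_1d(x):
    """Haar analysis along the last axis: filter and down-sample by 2."""
    lo = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)
    hi = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)
    return lo, hi

def synthesize_1d(lo, hi):
    """Inverse of analyze_1d: up-sample by 2 and apply the synthesis pair."""
    x = np.empty(lo.shape[:-1] + (2 * lo.shape[-1],))
    x[..., 0::2] = (lo + hi) / np.sqrt(2.0)
    x[..., 1::2] = (lo - hi) / np.sqrt(2.0)
    return x

rng = np.random.default_rng(5)
img = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)  # smooth-ish

lo, hi = analyze_1d(img)                    # split the rows ...
LL, LH = (s.T for s in analyze_1d(lo.T))    # ... then the columns of each half
HL, HH = (s.T for s in analyze_1d(hi.T))

total = np.sum(img ** 2)
for name, band in zip(("LL", "LH", "HL", "HH"), (LL, LH, HL, HH)):
    print(name, "energy fraction:", float(np.sum(band ** 2) / total))

# Perfect reconstruction: undo the column split, then the row split.
lo_rec = synthesize_1d(LL.T, LH.T).T
hi_rec = synthesize_1d(HL.T, HH.T).T
assert np.allclose(synthesize_1d(lo_rec, hi_rec), img)
```

On a smooth input the LL band carries nearly all of the energy (the filter pair is orthogonal, so the four fractions sum to 1), which is the energy compaction that subband bit allocation exploits; real coders iterate this split on the LL band and use longer QMF or wavelet filters in place of the Haar pair.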