Data Compression and Source Coding
V: Transform Coding

T. Linder
Queen's University
Winter 2016


Bit allocation in scalar quantization

Problem formulation: A block of r.v.'s $X_1, X_2, \ldots, X_k$ is scalar quantized. The $X_i$ might have different distributions.

We want to minimize the overall MSE
$$E\Big[\sum_{i=1}^k \big(X_i - Q_i(X_i)\big)^2\Big]$$
where $Q_i$ has $N_i$ levels, under the condition that overall no more than $B$ bits are used, i.e.,
$$\sum_{i=1}^k \log_2 N_i \le B.$$
Define
$$W_i(b) = \min_{Q:\,r(Q)\le b} E[(X_i - Q(X_i))^2]$$
(the MSE of the optimal $b$-bit quantizer for $X_i$).

If $b_i$ bits are used to quantize $X_i$ optimally, the overall distortion is
$$D(\mathbf{b}) = \sum_{i=1}^k W_i(b_i), \qquad \mathbf{b} = (b_1,\ldots,b_k).$$

Bit allocation problem: Given the constraint $\sum_{i=1}^k b_i \le B$, find $\mathbf{b} = (b_1,\ldots,b_k)$ minimizing $D(\mathbf{b})$.

Note: The $b_i$ are nonnegative integers. If the functions $W_i(b)$ are known, the bit allocation problem can (in principle) be solved by exhaustive search.

Simplifying assumption: Each $X_i$ has pdf $f_i$ and
$$W_i(b) = \frac{1}{12}\|f_i\|_{1/3}\,2^{-2b}, \qquad i = 1,\ldots,k.$$
Thus we assume that the high-rate approximation to optimal performance holds with equality.

Let $\sigma_i^2 = \operatorname{Var}(X_i)$ and $\tilde X_i = X_i/\sigma_i$. Then $\tilde X_i$ has unit variance and its pdf $\tilde f_i(x) = \sigma_i f_i(\sigma_i x)$ satisfies
$$\|f_i\|_{1/3} = \|\tilde f_i\|_{1/3}\,\sigma_i^2.$$
Thus
$$W_i(b) = \frac{1}{12}\|\tilde f_i\|_{1/3}\,\sigma_i^2\,2^{-2b} = h_i\sigma_i^2 2^{-2b}, \qquad\text{where } h_i = \frac{1}{12}\|\tilde f_i\|_{1/3}.$$

Note: $h_i$ is invariant to scaling and depends only on the shape of the density. E.g., $h_i$ is the same for all normal r.v.'s.
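As a quick numerical sanity check (not part of the slides), the Gaussian shape factor can be evaluated directly from its definition $h = \frac{1}{12}\big(\int \tilde f(x)^{1/3}dx\big)^3$ and compared with the classical closed form $\sqrt{3}\pi/2$ (the Panter-Dite constant). A minimal sketch, assuming NumPy:

```python
import numpy as np

# Shape factor h for the unit-variance Gaussian density:
# h = (1/12) * (integral of f(x)^(1/3) dx)^3
x = np.linspace(-20, 20, 200_001)            # wide grid; tails are negligible
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf
h_g = (np.trapz(f ** (1 / 3), x) ** 3) / 12

print(h_g)                      # approx. 2.7207
print(np.sqrt(3) * np.pi / 2)   # closed form, also approx. 2.7207
```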
We can solve the bit allocation problem analytically if we relax the condition that the $b_i$ are nonnegative integers.

Theorem 1 (optimal bit allocation)
$$D(\mathbf{b}) = \sum_{i=1}^k h_i\sigma_i^2 2^{-2b_i}$$
is minimized subject to $\sum_{i=1}^k b_i \le B$ if and only if
$$b_i = \bar b + \frac12\log_2\frac{\sigma_i^2}{\rho^2} + \frac12\log_2\frac{h_i}{H}, \qquad i = 1,\ldots,k$$
where
$$\bar b = \frac{B}{k}, \qquad \rho^2 = \Big(\prod_{i=1}^k\sigma_i^2\Big)^{1/k}, \qquad H = \Big(\prod_{i=1}^k h_i\Big)^{1/k}.$$

Observation: The optimal $\mathbf{b}$ must satisfy $\sum_{i=1}^k b_i = B$.

Proof: Assume $\sum_{i=1}^k b_i < B$. Let
$$b = B - \sum_{i=1}^k b_i$$
and define $\hat{\mathbf{b}} = (b_1 + b, b_2, \ldots, b_k)$. Then $\sum_{i=1}^k \hat b_i = B$ and
$$D(\mathbf{b}) - D(\hat{\mathbf{b}}) = \sum_{i=1}^k h_i\sigma_i^2 2^{-2b_i} - \sum_{i=1}^k h_i\sigma_i^2 2^{-2\hat b_i} = h_1\sigma_1^2\big(2^{-2b_1} - 2^{-2(b_1+b)}\big) > 0$$
since $b > 0$. Thus $\mathbf{b}$ cannot solve the optimal bit allocation problem. $\Box$
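Before proving the theorem, a small numerical sketch of the allocation formula (assuming NumPy; the values of $h_i$, $\sigma_i^2$, and $B$ are made up for illustration):

```python
import numpy as np

h = np.array([2.72, 2.72, 2.72, 2.72])    # shape factors h_i (Gaussian-like)
var = np.array([10.0, 4.0, 1.0, 0.25])    # variances sigma_i^2
B, k = 8.0, len(var)

b_bar = B / k
rho2 = np.prod(var) ** (1 / k)            # geometric mean of the variances
H = np.prod(h) ** (1 / k)                 # geometric mean of the h_i

b = b_bar + 0.5 * np.log2(var / rho2) + 0.5 * np.log2(h / H)
print(b, b.sum())                         # allocations; the sum equals B

D = np.sum(h * var * 2.0 ** (-2 * b))     # resulting distortion
print(D, k * H * rho2 * 2.0 ** (-2 * b_bar))  # matches k*H*rho^2*2^(-2*b_bar)
```

The last line previews the minimum-distortion expression derived later in this section.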
Detour: The Lagrange multiplier method

Assume $f:\mathbb{R}^k\to\mathbb{R}$ and $g:\mathbb{R}^k\to\mathbb{R}$ are continuously differentiable functions. We want to solve the following constrained minimization problem (CMP):

minimize $f(\mathbf{x})$ subject to $g(\mathbf{x}) = 0$.

Let $\nabla u = \big(\frac{\partial}{\partial x_1}u, \ldots, \frac{\partial}{\partial x_k}u\big)$ denote the gradient of a function $u:\mathbb{R}^k\to\mathbb{R}$.

Theorem 2 (Lagrange multiplier)
If $\mathbf{x}^*$ is a solution of the CMP and $\nabla g(\mathbf{x}^*) \ne 0$, then there exists $\lambda\in\mathbb{R}$, called the Lagrange multiplier, such that
$$\nabla f(\mathbf{x}^*) + \lambda\nabla g(\mathbf{x}^*) = 0.$$

Remarks:

- The Lagrange multiplier theorem yields the equations
$$\frac{\partial}{\partial x_j}\big[f(\mathbf{x}) + \lambda g(\mathbf{x})\big]\Big|_{\mathbf{x}=\mathbf{x}^*} = 0, \qquad j = 1,\ldots,k,$$
$$g(\mathbf{x}^*) = 0.$$
We have $k+1$ equations for the $k+1$ unknowns $(\lambda, \mathbf{x}^*)$.
- Assume the CMP has a solution and $\nabla g(\mathbf{x}) \ne 0$ for all $\mathbf{x}$ such that $g(\mathbf{x}) = 0$. Then the theorem implies that if $(\lambda, \mathbf{x}^*)$ uniquely solve these $k+1$ equations, then $\mathbf{x}^*$ is a solution of the CMP.
- In general, the Lagrange multiplier theorem only gives a necessary condition for optimality.
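For concreteness, here is a small worked example (not from the slides): minimize $f(x_1,x_2) = x_1^2 + x_2^2$ subject to $g(x_1,x_2) = x_1 + x_2 - 1 = 0$. The $k+1$ equations are
$$2x_1 + \lambda = 0, \qquad 2x_2 + \lambda = 0, \qquad x_1 + x_2 - 1 = 0,$$
with unique solution $x_1^* = x_2^* = \tfrac12$, $\lambda = -1$. Since $\nabla g = (1,1) \ne 0$ everywhere on the constraint set, $\mathbf{x}^* = (\tfrac12,\tfrac12)$ solves the CMP: it is the point of the line closest to the origin, as expected.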
Proof sketch of Lagrange multiplier theorem: Assume $\mathbf{x}^*$ is a solution of the CMP, i.e.,
$$g(\mathbf{x}^*) = 0 \quad\text{and}\quad f(\mathbf{x}^*) \le f(\mathbf{x}) \text{ for all } \mathbf{x} \text{ such that } g(\mathbf{x}) = 0.$$
We have to show that if $\nabla g(\mathbf{x}^*) \ne 0$, then for some $\lambda$
$$\nabla f(\mathbf{x}^*) + \lambda\nabla g(\mathbf{x}^*) = 0.$$

Let $S = \{\mathbf{x} : g(\mathbf{x}) = 0\}$. Then $\mathbf{x}^* \in S$ and $S$ is a $(k-1)$-dimensional smooth surface in $\mathbb{R}^k$.

The tangent plane $T$ of $S$ at $\mathbf{x}^*$ is the collection of all derivatives $\mathbf{x}'(0)$ of smooth curves $\mathbf{x}(t)$, $t\in\mathbb{R}$, such that $\mathbf{x}(t)\in S$ and $\mathbf{x}(0)=\mathbf{x}^*$.

Lemma 3
If $\nabla g(\mathbf{x}^*) \ne 0$, then the tangent plane of $S$ at $\mathbf{x}^*$ is given by
$$T = \{\mathbf{y} : \nabla g(\mathbf{x}^*)\,\mathbf{y} = 0\}.$$

Now let $\mathbf{y}\in T$. By the lemma, there exists a smooth curve $\mathbf{x}(t)\in S$ such that $\mathbf{x}(0)=\mathbf{x}^*$ and $\mathbf{x}'(0)=\mathbf{y}$.

Then $u(t) = f(\mathbf{x}(t))$ is differentiable, and since $\mathbf{x}(t)\in S$ and $\mathbf{x}^*$ solves the CMP,
$$u(0) = f(\mathbf{x}(0)) = f(\mathbf{x}^*) \le f(\mathbf{x}(t)) = u(t).$$
Thus $u(t)$ has a minimum at $t = 0$, so
$$0 = u'(t)\big|_{t=0} = \frac{d}{dt}f(\mathbf{x}(t))\Big|_{t=0} = \nabla f(\mathbf{x}^*)\,\mathbf{x}'(0) = \nabla f(\mathbf{x}^*)\,\mathbf{y}.$$
Illustration of proof for $k = 2$:

[Figure: the constraint curve $S = \{\mathbf{x} : g(\mathbf{x}) = 0\}$ with the tangent line $T$ at $\mathbf{x}^*$, the tangent vector $\mathbf{x}'(0)$, and the gradients $\nabla g(\mathbf{x}^*)$ and $\nabla f(\mathbf{x}^*)$.]

Let $\mathbf{x}(t)$ be a smooth curve in $S$ such that $\mathbf{x}(0) = \mathbf{x}^*$. We have $\mathbf{x}'(0)\in T$. Assume $\nabla f(\mathbf{x}^*)$ is not perpendicular to $T$; in this example, $\nabla f(\mathbf{x}^*)\,\mathbf{x}'(0) < 0$.

Using a first-order approximation, for small $t$
$$f(\mathbf{x}(t)) = f(\mathbf{x}(0)) + \frac{d}{dt}f(\mathbf{x}(t))\Big|_{t=0}t + o(t) = f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)\,\mathbf{x}'(0)\,t + o(t)$$
where $o(t)/t \to 0$ as $t \to 0$. Since $\nabla f(\mathbf{x}^*)\,\mathbf{x}'(0) < 0$, we obtain $f(\mathbf{x}(t)) < f(\mathbf{x}^*)$ for $t > 0$ small enough. Since $\mathbf{x}(t)\in S$, this would contradict the optimality of $\mathbf{x}^*$. Thus $\nabla f(\mathbf{x}^*)\,\mathbf{x}'(0) = 0$ must hold, i.e., $\nabla f(\mathbf{x}^*) = -\lambda\nabla g(\mathbf{x}^*)$ for some $\lambda$.

We have shown that
$$\nabla g(\mathbf{x}^*)\,\mathbf{y} = 0 \implies \nabla f(\mathbf{x}^*)\,\mathbf{y} = 0.$$
This means that $\nabla f(\mathbf{x}^*)$ is orthogonal to the subspace
$$T = \{\mathbf{y} : \nabla g(\mathbf{x}^*)\,\mathbf{y} = 0\}.$$
Note that $T$ is $(k-1)$-dimensional because $\nabla g(\mathbf{x}^*) \ne 0$. Hence its orthogonal complement
$$T^\perp = \{\mathbf{x} : \mathbf{x}^t\mathbf{y} = 0 \text{ for all } \mathbf{y}\in T\}$$
is 1-dimensional. Since $T^\perp$ contains both $\nabla g(\mathbf{x}^*)$ and $\nabla f(\mathbf{x}^*)$, we must have
$$\nabla f(\mathbf{x}^*) = -\lambda\nabla g(\mathbf{x}^*)$$
for some $\lambda\in\mathbb{R}$. $\Box$
Proof of bit allocation formula: We apply the Lagrange multiplier method with
$$f(\mathbf{b}) = D(\mathbf{b}) = \sum_{i=1}^k h_i\sigma_i^2 2^{-2b_i}, \qquad g(\mathbf{b}) = \sum_{i=1}^k b_i - B.$$
Define
$$J(\mathbf{b},\lambda) = \sum_{i=1}^k h_i\sigma_i^2 2^{-2b_i} + \lambda\Big(\sum_{i=1}^k b_i - B\Big).$$
Then for the optimum $\mathbf{b}$ there is a $\lambda$ such that
$$\frac{\partial}{\partial b_j}J(\mathbf{b},\lambda) = 0, \qquad j = 1,\ldots,k. \tag{$*$}$$
Fix $\lambda$ and solve $(*)$ for $\mathbf{b}$. Then determine $\lambda$ and $\mathbf{b} = \mathbf{b}(\lambda)$ so that the solution satisfies the constraint $\sum_{i=1}^k b_i = B$. If the solution $(\lambda,\mathbf{b})$ is unique, the resulting $\mathbf{b}$ is optimal.

Solving
$$\frac{\partial}{\partial b_j}J(\mathbf{b},\lambda) = h_j\sigma_j^2\big({-2}(\ln 2)2^{-2b_j}\big) + \lambda = 0$$
gives
$$b_j = \frac12\log_2 h_j + \frac12\log_2\sigma_j^2 + \underbrace{\frac12\log_2\frac{2\ln 2}{\lambda}}_{\hat\lambda} = \frac12\log_2 h_j + \frac12\log_2\sigma_j^2 + \hat\lambda.$$

From the constraint $\sum_{j=1}^k b_j = B$, $\hat\lambda$ must satisfy
$$B = \frac12\sum_{j=1}^k\log_2 h_j + \frac12\sum_{j=1}^k\log_2\sigma_j^2 + k\hat\lambda = \frac12\log_2\prod_{i=1}^k h_i + \frac12\log_2\prod_{i=1}^k\sigma_i^2 + k\hat\lambda.$$
Hence
$$\hat\lambda = \frac{B}{k} - \frac12\log_2\Big(\prod_{i=1}^k h_i\Big)^{1/k} - \frac12\log_2\Big(\prod_{i=1}^k\sigma_i^2\Big)^{1/k}.$$
Substituting into $b_j = \frac12\log_2 h_j + \frac12\log_2\sigma_j^2 + \hat\lambda$ yields
$$b_j = \frac{B}{k} + \frac12\log_2\frac{\sigma_j^2}{\big(\prod_{i=1}^k\sigma_i^2\big)^{1/k}} + \frac12\log_2\frac{h_j}{\big(\prod_{i=1}^k h_i\big)^{1/k}}$$
for $j = 1,\ldots,k$.

Note that $\mathbf{b}$ is optimal since $(\lambda,\mathbf{b})$ is unique and $\frac{\partial}{\partial b_j}g(\mathbf{b}) = 1$ for all $j$, so $\nabla g \ne 0$ everywhere. $\Box$

Remarks:

- One can show that $D(\mathbf{b}) = f(\mathbf{b})$ is a strictly convex function. This implies (without using the Lagrange multiplier method) that the CMP has a unique solution.
- A shorter, but less intuitive, proof of the bit allocation formula can be given using the concavity of the logarithm and Jensen's inequality.
Minimum distortion:
$$
\begin{aligned}
D_{\mathrm{opt}} &= \sum_{i=1}^k h_i\sigma_i^2 2^{-2b_i}
= \sum_{i=1}^k h_i\sigma_i^2\,2^{-2\left(\bar b + \frac12\log_2(\sigma_i^2/\rho^2) + \frac12\log_2(h_i/H)\right)}\\
&= \sum_{i=1}^k h_i\sigma_i^2\,2^{-\log_2(\sigma_i^2/\rho^2)}\,2^{-\log_2(h_i/H)}\,2^{-2\bar b}
= \sum_{i=1}^k \rho^2 H\,2^{-2\bar b} = kH\rho^2\,2^{-2\bar b}.
\end{aligned}
$$

Note: The same distortion is achieved if $k$ random variables with common "shape factor" $H$ and variance $\rho^2$ are optimally scalar quantized at a rate of $\bar b$ bits each.

Dealing with the disregarded constraints

- $b_i \ge 0$ for all $i$: Analytic solutions exist for certain source models, but the resulting expressions are less explicit.
- $b_i$ must be an integer for all $i$: Hard in general. However, assuming the high-rate formula, it can be shown that a simple greedy bit allocation method yields the optimal solution.

Practical approach: Use the analytic solution, set the negative $b_i$'s to zero, and round the positive $b_i$'s to the nearest integer. If the rate constraint is exceeded, adjust the $b_i$'s in some heuristic fashion (a code sketch follows below).
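A minimal sketch of this practical approach, assuming NumPy. The greedy adjustment step, which removes bits where doing so increases distortion the least, is one common heuristic, not the only possible choice:

```python
import numpy as np

def practical_allocation(h, var, B):
    """Analytic allocation, clamp negatives to zero, round to integers,
    then greedily remove bits while the budget B is exceeded."""
    k = len(var)
    rho2 = np.prod(var) ** (1 / k)
    H = np.prod(h) ** (1 / k)
    b = B / k + 0.5 * np.log2(var / rho2) + 0.5 * np.log2(h / H)
    b = np.round(np.maximum(b, 0)).astype(int)   # clamp and round
    while b.sum() > B:
        # Distortion increase if component i loses one bit; inf if b_i = 0.
        d_now = h * var * 2.0 ** (-2 * b)
        penalty = np.where(b > 0,
                           h * var * 2.0 ** (-2 * (b - 1)) - d_now,
                           np.inf)
        b[np.argmin(penalty)] -= 1               # cheapest bit to remove
    return b

print(practical_allocation(np.array([2.72] * 4),
                           np.array([10.0, 4.0, 1.0, 0.25]), B=6))
```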
Transform coding: linear algebra review

Definitions: Let $A$ be a $k\times k$ matrix of real elements.

- If $A$ is symmetric and $\mathbf{x}^tA\mathbf{x} \ge 0$ for all $\mathbf{x}\in\mathbb{R}^k$, then $A$ is called nonnegative definite.
- If $A$ is symmetric and $\mathbf{x}^tA\mathbf{x} > 0$ for all $\mathbf{x}\ne 0$, then $A$ is called positive definite.
- $A$ is orthogonal if $A^t = A^{-1}$, i.e., $AA^t = I$, where $I$ is the $k\times k$ identity matrix.

Fact 1: If $A$ is symmetric and nonnegative (positive) definite, then all of its eigenvalues are nonnegative (positive).

Fact 2: If $A$ is symmetric and nonnegative definite, then it has $k$ eigenvalues (counting multiplicities) and $k$ corresponding mutually orthogonal eigenvectors.

Fact 3: If $A = \{a_{ij}\}$ is a $k\times k$ matrix with eigenvalues $\lambda_1,\ldots,\lambda_k$ (counting multiplicities), then
$$\operatorname{trace}(A) \triangleq \sum_{i=1}^k a_{ii} = \sum_{i=1}^k\lambda_i \qquad\text{and}\qquad \det(A) = \prod_{i=1}^k\lambda_i.$$
Fact 4: If $A$ is orthogonal, then it is norm-preserving, i.e.,
$$\|A\mathbf{x}\| = \|\mathbf{x}\| \quad\text{for all } \mathbf{x}\in\mathbb{R}^k.$$
(Here $\|\mathbf{y}\| = \big(\sum_{i=1}^k y_i^2\big)^{1/2}$ for any $\mathbf{y} = (y_1,\ldots,y_k)^t$.)

Fact 5: If $A$ is orthogonal, then $|\det(A)| = 1$.

Fact 6: Let $\mathbf{X} = (X_1,\ldots,X_k)^t$ be a vector of real random variables having finite variance. Then the $k\times k$ autocorrelation matrix $R_X = \{E(X_iX_j)\}$ is symmetric and nonnegative definite.

Proof: We proved this when we derived the optimal linear predictor coefficients.

Fact 7: If $\mathbf{X} = (X_1,\ldots,X_k)^t$, $A$ is a $k\times k$ matrix, and $\mathbf{Y} = A\mathbf{X}$, then
$$R_Y = AR_XA^t.$$
Proof: We have
$$R_Y = E\begin{bmatrix} Y_1Y_1 & Y_1Y_2 & \ldots & Y_1Y_k\\ Y_2Y_1 & Y_2Y_2 & \ldots & Y_2Y_k\\ \vdots & \vdots & \ddots & \vdots\\ Y_kY_1 & Y_kY_2 & \ldots & Y_kY_k \end{bmatrix} = E[\mathbf{Y}\mathbf{Y}^t]$$
and therefore
$$R_Y = E[A\mathbf{X}(A\mathbf{X})^t] = E[A\mathbf{X}\mathbf{X}^tA^t] = AE[\mathbf{X}\mathbf{X}^t]A^t = AR_XA^t. \qquad\Box$$
Fact 8: If $\mathbf{Y} = A\mathbf{X}$, then
$$\det(R_Y) = \det(A)^2\det(R_X).$$
If $A$ is orthogonal, then $\det(R_Y) = \det(R_X)$.

Proof: Since $R_Y = AR_XA^t$,
$$\det(R_Y) = \det(A)\det(R_X)\det(A^t) = \det(A)^2\det(R_X).$$
If $A$ is orthogonal, then $\det(A)^2 = 1$ by Fact 5, so $\det(R_Y) = \det(R_X)$. $\Box$

Transform coding with scalar quantization

[Block diagram: $\mathbf{X} \to A \to \mathbf{Y} \to$ scalar quantizers $Q_1,\ldots,Q_k \to \hat{\mathbf{Y}} \to A^{-1} \to \hat{\mathbf{X}}$.]

- $A$ is a $k\times k$ orthogonal matrix.
- $\mathbf{X} = (X_1,\ldots,X_k)^t$ and $A\mathbf{X} = \mathbf{Y} = (Y_1,\ldots,Y_k)^t$. The $Y_i$ are called the transform coefficients.
- Each $Q_i$ is an $N_i$-level scalar quantizer.
- $\hat{\mathbf{Y}} = (Q_1(Y_1),\ldots,Q_k(Y_k))^t$ and $A^{-1}\hat{\mathbf{Y}} = \hat{\mathbf{X}} = (\hat X_1,\ldots,\hat X_k)^t$.
The end-to-end MSE distortion in transform coding is
$$D_{tc} = \sum_{i=1}^k E[(X_i - \hat X_i)^2] = E[\|\mathbf{X} - \hat{\mathbf{X}}\|^2].$$

Proposition 1
In transform coding
$$D_{tc} = E[\|\mathbf{Y} - \hat{\mathbf{Y}}\|^2] = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2].$$

Note: The overall mean square reconstruction error is equal to the MSE distortion incurred by quantizing the transform coefficients.

Proof of Proposition:
$$\|\mathbf{Y} - \hat{\mathbf{Y}}\| = \|A\mathbf{X} - AA^{-1}\hat{\mathbf{Y}}\| = \|A(\mathbf{X} - \hat{\mathbf{X}})\| \quad(\text{since } A^{-1}\hat{\mathbf{Y}} = \hat{\mathbf{X}})$$
$$= \|\mathbf{X} - \hat{\mathbf{X}}\| \quad(\text{by Fact 4}).$$
Thus $\|\mathbf{X} - \hat{\mathbf{X}}\| = \|\mathbf{Y} - \hat{\mathbf{Y}}\|$, so that
$$D_{tc} = E[\|\mathbf{X} - \hat{\mathbf{X}}\|^2] = E[\|\mathbf{Y} - \hat{\mathbf{Y}}\|^2]. \qquad\Box$$

What is a good choice for the transform $A$?
Karhunen-Loeve (KL) transform

Order the eigenvalues of $R_X$ (Fact 1) as
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 0$$
and let $\mathbf{u}_1,\ldots,\mathbf{u}_k$ be the corresponding orthogonal eigenvectors (Fact 2). Normalize the $\mathbf{u}_i$ to unit length: $\|\mathbf{u}_i\| = 1$, $i = 1,\ldots,k$.

Define $U = [\mathbf{u}_1\ \mathbf{u}_2\ \ldots\ \mathbf{u}_k]$ and let
$$T = U^t.$$
$T$ is called the Karhunen-Loeve transform (KLT) matrix for $\mathbf{X}$.

Note: $T$ is orthogonal since $TT^t = U^tU = I$.

Remarks:

- In general $T$ is not unique (different orderings of the eigenvectors give rise to different $T$'s).
- In statistics, the transform coefficients $(Y_1,\ldots,Y_k)^t = T\mathbf{X}$ are called the principal components of $\mathbf{X}$.
- Since $\mathbf{Y} = T\mathbf{X} = U^t\mathbf{X}$, we have $Y_i = \mathbf{u}_i^t\mathbf{X}$ and
$$\mathbf{X} = I\mathbf{X} = UU^t\mathbf{X} = U(Y_1,\ldots,Y_k)^t = \sum_{i=1}^k Y_i\mathbf{u}_i.$$
Thus the transform coefficient $Y_i$ represents the $i$th coordinate of $\mathbf{X}$ with respect to the orthonormal basis $\mathbf{u}_1,\ldots,\mathbf{u}_k$.
Proposition 2
The transformation $\mathbf{Y} = T\mathbf{X}$ decorrelates $\mathbf{X}$, i.e., $\mathbf{Y} = (Y_1,\ldots,Y_k)^t$ satisfies
$$E(Y_iY_j) = 0 \quad\text{for all } i \ne j.$$

Proof: Since $\mathbf{Y} = U^t\mathbf{X}$, from Fact 7 we have
$$R_Y = U^tR_XU.$$
But $R_X\mathbf{u}_i = \lambda_i\mathbf{u}_i$, so
$$R_XU = [R_X\mathbf{u}_1\ R_X\mathbf{u}_2\ \ldots\ R_X\mathbf{u}_k] = [\lambda_1\mathbf{u}_1\ \lambda_2\mathbf{u}_2\ \ldots\ \lambda_k\mathbf{u}_k].$$
Hence
$$R_Y = U^tR_XU = \begin{bmatrix}\mathbf{u}_1^t\\ \mathbf{u}_2^t\\ \vdots\\ \mathbf{u}_k^t\end{bmatrix}[\lambda_1\mathbf{u}_1\ \lambda_2\mathbf{u}_2\ \ldots\ \lambda_k\mathbf{u}_k] = \begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_k\end{bmatrix}.$$
Since $R_Y = \{E(Y_iY_j)\}$, this means
$$E(Y_iY_j) = \begin{cases}\lambda_i & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases} \qquad\Box$$

Note:

- $E(Y_i^2) = \lambda_i$. Thus the second moment of $Y_i$ is the $i$th eigenvalue of $R_X$.
- The proof demonstrates that any $k\times k$ nonnegative definite $R$ with eigenvalues $\lambda_1,\ldots,\lambda_k$ can be written as
$$R = U\Lambda U^t$$
where $U$ is orthogonal and $\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_k)$ is diagonal. (This is the spectral theorem, and it holds more generally, e.g., for symmetric matrices.) A numerical sketch of the KLT follows below.
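A minimal sketch (assuming NumPy; the AR(1)-style correlation matrix with coefficient 0.9 is just an illustrative choice) that builds the KLT by eigendecomposition and verifies Proposition 2:

```python
import numpy as np

k, r = 8, 0.9
# Example autocorrelation matrix: R_X[i, j] = r^|i - j|
RX = r ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))

eigvals, U = np.linalg.eigh(RX)          # eigh returns ascending eigenvalues
idx = np.argsort(eigvals)[::-1]          # reorder: lambda_1 >= ... >= lambda_k
eigvals, U = eigvals[idx], U[:, idx]
T = U.T                                  # KLT matrix

RY = T @ RX @ T.T                        # should equal diag(lambda_1, ..., lambda_k)
print(np.allclose(RY, np.diag(eigvals))) # True: transform coefficients decorrelated
print(np.allclose(T @ T.T, np.eye(k)))   # True: T is orthogonal
```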
We will show that the KLT is "optimal" under certain conditions. We need some auxiliary results.

Lemma 4 (Arithmetic-geometric mean inequality)
For any sequence of $k$ nonnegative real numbers $b_1,\ldots,b_k$,
$$\Big(\prod_{i=1}^k b_i\Big)^{1/k} \le \frac1k\sum_{i=1}^k b_i,$$
where equality holds if and only if $b_i = b$, $i = 1,\ldots,k$.

Proof: If $b_i = 0$ for some $i$, the inequality is trivial. Assuming $b_i > 0$ for all $i$, the inequality is equivalent to
$$\frac1k\sum_{i=1}^k \ln b_i \le \ln\Big(\frac1k\sum_{i=1}^k b_i\Big),$$
which follows from Jensen's inequality and the concavity of the logarithm. The condition for equality also follows from Jensen. $\Box$

Lemma 5 (Hadamard's inequality)
Let $R$ be a nonnegative definite matrix with diagonal elements $r_{ii}$, $i = 1,\ldots,k$. Then
$$\det(R) \le \prod_{i=1}^k r_{ii}.$$
If $R$ is positive definite, then equality holds if and only if $R$ is diagonal.
Proof of Lemma 5: If $R$ is not positive definite, then $\det(R) = 0$. Since $r_{ii} \ge 0$ for all $i$, the inequality holds.

Assume $R$ is positive definite. Then $r_{ii} > 0$ for all $i$. Let $s_i = \frac{1}{\sqrt{r_{ii}}}$ and define
$$S = \operatorname{diag}(s_1, s_2, \ldots, s_k).$$
Consider
$$\hat R = SRS$$
and note that
$$\det(\hat R) = \det(S)^2\det(R) = \det(R)\prod_{i=1}^k\frac{1}{r_{ii}}.$$
Thus
$$\det(R) \le \prod_{i=1}^k r_{ii} \iff \det(\hat R) \le 1.$$

Note that $\hat R = SRS$ with $S = \operatorname{diag}(s_1,\ldots,s_k)$ implies
$$\hat r_{ij} = s_ir_{ij}s_j, \quad\text{so that}\quad \hat r_{ii} = \frac{r_{ii}}{\sqrt{r_{ii}}\sqrt{r_{ii}}} = 1, \qquad i = 1,\ldots,k.$$

$\hat R$ is clearly symmetric and positive definite. Let $\mu_1,\ldots,\mu_k$ be the (positive) eigenvalues of $\hat R$. Then from Fact 3 and the arithmetic-geometric mean inequality,
$$\det(\hat R)^{1/k} = \Big(\prod_{i=1}^k\mu_i\Big)^{1/k} \le \frac1k\sum_{i=1}^k\mu_i = \frac1k\operatorname{trace}(\hat R) = 1. \tag{$*$}$$
This is equivalent to
$$\det(\hat R) \le 1, \tag{$**$}$$
i.e., $\det(R) \le \prod_{i=1}^k r_{ii}$.
Condition for equality: If $R$ is diagonal, then $\det(R) = \prod_{i=1}^k r_{ii}$. Conversely, if equality holds in $(*)$ and $(**)$, then $\mu_i = \mu$ for all $i = 1,\ldots,k$. Since $\hat R$ is symmetric and positive definite, by the spectral theorem
$$\hat R = U\Lambda U^t$$
where $\Lambda = \operatorname{diag}(\mu_1,\ldots,\mu_k)$ and $U$ is orthogonal. If $\mu_i = \mu$ for all $i$, then $\Lambda = \mu I$ and
$$\hat R = U(\mu I)U^t = \mu UU^t = \mu I.$$
Thus
$$R = S^{-1}\hat RS^{-1} = S^{-1}(\mu I)S^{-1} = \mu S^{-2}.$$
Since $S^{-2} = \operatorname{diag}(s_1^{-2},\ldots,s_k^{-2})$, we obtain that $R = \mu S^{-2}$ is diagonal. $\Box$

High-resolution optimality of the KLT

Recall that
$$D_{tc} = E[\|\mathbf{X} - \hat{\mathbf{X}}\|^2] = E[\|\mathbf{Y} - \hat{\mathbf{Y}}\|^2] = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2]$$
where $\mathbf{Y} = A\mathbf{X}$ and $A$ is orthogonal.

Assumptions:

- $\mathbf{X} = (X_1,\ldots,X_k)^t$ is a Gaussian random vector having zero mean such that $R_X$ is nonsingular.
- The high-resolution approximation to the optimal scalar quantizer distortion is valid with equality.
- Each $Q_i$ is the optimal $b_i$-bit quantizer for $Y_i$.
- The $b_i$ are optimally allocated for the constraint $\sum_{i=1}^k b_i \le B$, where the $b_i$ need not be nonnegative integers.
Key observation: If $\mathbf{Y} = A\mathbf{X}$ and $\mathbf{X}$ is Gaussian, then each $Y_i$ is Gaussian since $Y_i = \sum_{j=1}^k a_{ij}X_j$. Thus all the $Y_i$ have the same $h_i = h_g$ coefficient in the high-resolution formula:
$$E[(Y_i - Q_i(Y_i))^2] = h_g\sigma_i^2 2^{-2b_i}$$
where $\sigma_i^2 = E(Y_i^2)$ and
$$h_g = \frac{1}{12}\bigg(\int_{-\infty}^{\infty}\Big(\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\Big)^{1/3}dx\bigg)^3.$$

For any orthogonal $A$, the transform coding distortion with optimal bit allocation is
$$
\begin{aligned}
D_{tc,A} = \sum_{i=1}^k E[(Y_i - Q_i(Y_i))^2]
&= k\,h_g\Big(\prod_{i=1}^k\sigma_i^2\Big)^{1/k}2^{-2\bar b}\\
&\ge \det(R_Y)^{1/k}\,k\,h_g\,2^{-2\bar b} \quad\text{(from Hadamard's inequality)}\\
&= \det(R_X)^{1/k}\,k\,h_g\,2^{-2\bar b} \quad\text{(from Fact 8).}
\end{aligned}
$$
Equality holds iff $R_Y$ is diagonal.

If $A = T$, the KLT, then $R_Y$ is diagonal by Proposition 2, so
$$D_{tc,T} = k\,h_g\,2^{-2\bar b}\det(R_X)^{1/k}.$$
Thus
$$D_{tc,A} \ge D_{tc,T}.$$
Performance gain:

Scalar quantization without transform coding: Suppose equal variances $E(X_i^2) = \sigma_X^2$ for all $i$. If each $X_i$ is independently and optimally scalar quantized, the optimal bit allocation is
$$b_i = \bar b = \frac{B}{k}, \qquad i = 1,\ldots,k.$$
The resulting MSE is
$$D_{\mathrm{PCM}} = \sum_{i=1}^k E[(X_i - Q_i(X_i))^2] = k\,h_g\,\sigma_X^2\,2^{-2\bar b}.$$

Transform coding distortion: With the KLT and optimal bit allocation,
$$D_{tc} = k\,h_g\,2^{-2\bar b}\det(R_X)^{1/k} = k\,h_g\,2^{-2\bar b}\Big(\prod_{i=1}^k\lambda_i\Big)^{1/k}$$
where $\lambda_1,\ldots,\lambda_k$ are the eigenvalues of $R_X$.

Gain of transform coding:

Since $E(X_i^2) = \sigma_X^2$ for all $i$,
$$\sigma_X^2 = \frac1k\sum_{i=1}^k E(X_i^2) = \frac1k\operatorname{trace}(R_X) = \frac1k\sum_{i=1}^k\lambda_i.$$
Thus
$$\frac{D_{\mathrm{PCM}}}{D_{tc}} = \frac{\sigma_X^2}{\big(\prod_{i=1}^k\lambda_i\big)^{1/k}} = \frac{\frac1k\sum_{i=1}^k\lambda_i}{\big(\prod_{i=1}^k\lambda_i\big)^{1/k}} \ge 1.$$
The more uneven the distribution of the $\lambda_i$, the more can be gained by transform coding.

Equality holds if and only if all the $\lambda_i$ are equal, which happens if and only if $R_X = \operatorname{diag}(\sigma_X^2,\ldots,\sigma_X^2)$. We obtain that the gain is always greater than 1 unless the $X_i$ are independent. (A numerical sketch follows below.)
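The gain is the ratio of the arithmetic to the geometric mean of the eigenvalues of $R_X$, which is easy to evaluate numerically. A sketch, assuming NumPy, with the same illustrative AR(1)-style $R_X$ as before:

```python
import numpy as np

k, r = 8, 0.9
RX = r ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
lam = np.linalg.eigvalsh(RX)                 # eigenvalues of R_X

gain = lam.mean() / np.prod(lam) ** (1 / k)  # D_PCM / D_tc
print(gain)                                  # > 1; grows as r -> 1
print(10 * np.log10(gain), "dB")             # the gain expressed in dB
```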
Transform coding in practice

- The KLT depends on the source autocorrelation matrix $R_X$. It has to be recalculated from estimates of $R_X$ whenever the source statistics change, which is computationally expensive.
- The Gaussian source assumption is essential in the KLT optimality proof and the performance gain calculation.
- The assumptions of high-resolution quantization and non-integer bit rates can be dropped: it can be shown without any further conditions that the KLT is the optimal transform for Gaussian random vectors if optimal scalar quantization and bit allocation are used.

The most popular fixed (source-independent) transform is the discrete cosine transform (DCT), given by the transform matrix $T = \{t_{mn}\}_{m,n=0}^{k-1}$:
$$t_{mn} = \begin{cases}\sqrt{\dfrac{1}{k}}\,\cos\Big(\dfrac{\pi}{k}\,m\big(n+\tfrac12\big)\Big), & m = 0,\ n = 0,\ldots,k-1,\\[2mm] \sqrt{\dfrac{2}{k}}\,\cos\Big(\dfrac{\pi}{k}\,m\big(n+\tfrac12\big)\Big), & m = 1,\ldots,k-1,\ n = 0,\ldots,k-1.\end{cases}$$
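A minimal sketch, assuming NumPy, that builds the $k\times k$ DCT matrix directly from this formula and checks that it is orthogonal:

```python
import numpy as np

def dct_matrix(k: int) -> np.ndarray:
    """DCT matrix T = {t_mn} from the formula above (orthonormal DCT-II)."""
    m = np.arange(k).reshape(-1, 1)          # row index m
    n = np.arange(k).reshape(1, -1)          # column index n
    T = np.sqrt(2.0 / k) * np.cos(np.pi / k * m * (n + 0.5))
    T[0, :] = np.sqrt(1.0 / k)               # m = 0 row uses sqrt(1/k)
    return T

T = dct_matrix(8)
print(np.allclose(T @ T.T, np.eye(8)))       # True: the DCT is orthogonal
```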
Can be shown:

- The DCT is an orthogonal transform.
- The DCT of a vector $\mathbf{x} = (x_0,\ldots,x_{k-1})$ can be obtained using the FFT (see the sketch after this list):
  - Define the mirrored sequence $\mathbf{y} = (x_0,\ldots,x_{k-1},x_{k-1},\ldots,x_0)$.
  - Compute the FFT of $\mathbf{y}$. The DCT coefficients of $\mathbf{x}$ are obtained by scaling the first $k$ FFT coefficients of $\mathbf{y}$.
  - Complexity is reduced from $O(k^2)$ to $O(k\log k)$.

Advantages of the DCT:

- Fixed (signal-independent).
- Easy to compute (using the FFT).
- Very good energy compaction for highly correlated data.
- Good approximation to the KLT for Markov sources; often achieves performance close to that of the KLT.
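A sketch of the FFT-based DCT, assuming NumPy and the dct_matrix() helper defined above. The slides do not spell out the scaling factors; the ones below follow from expanding the FFT of the mirrored sequence, and the code checks the result against the direct matrix product:

```python
import numpy as np

def dct_via_fft(x: np.ndarray) -> np.ndarray:
    """DCT of x via the FFT of the mirrored sequence (x, reversed x)."""
    k = len(x)
    y = np.concatenate([x, x[::-1]])          # mirrored sequence, length 2k
    Y = np.fft.fft(y)[:k]                     # first k FFT coefficients
    m = np.arange(k)
    # Derived scaling: sum_n x_n cos(pi*m*(n+1/2)/k) = Re(exp(-i*pi*m/(2k)) * Y_m) / 2
    c = 0.5 * np.real(np.exp(-1j * np.pi * m / (2 * k)) * Y)
    scale = np.full(k, np.sqrt(2.0 / k))      # orthonormal DCT normalization
    scale[0] = np.sqrt(1.0 / k)
    return scale * c

x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(dct_via_fft(x), dct_matrix(8) @ x))   # True
```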
Transform coding of images

We defined transform coding for "one-dimensional" signals, but images are two-dimensional.

Let $X = \{X_{i,j}\}_{i,j=0}^{k-1}$ denote a block (matrix) of pixels:
$$X = \begin{bmatrix} X_{0,0} & X_{0,1} & \ldots & X_{0,k-1}\\ X_{1,0} & X_{1,1} & \ldots & X_{1,k-1}\\ \vdots & \vdots & \ddots & \vdots\\ X_{k-1,0} & X_{k-1,1} & \ldots & X_{k-1,k-1} \end{bmatrix}$$
(This can be a sub-block of a larger image.)

Let $A$ be a $k\times k$ orthogonal matrix and define the 2-D separable transform by
$$Y = AXA^t = A(XA^t).$$
Interpretation: first the rows of $X$ are transformed by $A$, then the columns of the resulting matrix are transformed by $A$.

The inverse transform for $Y = AXA^t$ is given by
$$X = A^tYA.$$

Let $A^t = [\mathbf{a}_0\ \mathbf{a}_1\ \ldots\ \mathbf{a}_{k-1}]$ (so $\mathbf{a}_i^t$ is the $i$th row of $A$). It is easy to show that
$$X = \sum_{i,j=0}^{k-1} Y_{i,j}B_{i,j}$$
where the $k\times k$ matrix $B_{i,j}$ is given by $B_{i,j} = \mathbf{a}_i\mathbf{a}_j^t$.

The $B_{i,j}$ are called basis functions (matrices) for the 2-D separable transform $A$. It can be shown that the $B_{i,j}$ form an orthonormal basis for $M_k(\mathbb{R})$, the vector space of $k\times k$ real matrices. (A numerical sketch follows.)
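A sketch of the 2-D separable transform and its inverse, assuming NumPy and dct_matrix() from above; the random block is a stand-in for real pixel data:

```python
import numpy as np

k = 8
A = dct_matrix(k)
X = np.random.default_rng(1).standard_normal((k, k))   # stand-in pixel block

Y = A @ X @ A.T                                        # forward 2-D transform
print(np.allclose(A.T @ Y @ A, X))                     # True: inverse recovers X

# Reconstruction from the basis matrices B_ij = a_i a_j^t:
a = A.T                                                # columns are a_0, ..., a_{k-1}
X_rec = sum(Y[i, j] * np.outer(a[:, i], a[:, j])
            for i in range(k) for j in range(k))
print(np.allclose(X_rec, X))                           # True
```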
Generic transform image coder

[Block diagram. Encoder: original image → 2-D transform → quantizer → lossless encoder. Decoder: lossless decoder → inverse quantizer → inverse 2-D transform → reconstructed image.]

Example: [Figure: the 64 basis functions $B_{i,j}$ for the $8\times 8$ 2-D DCT.]
JPEG (Joint Photographic Experts Group) standard

- DCT transform coding on $8\times 8$ pixel blocks.
- Transform coefficients are quantized using uniform quantizers; instead of explicit bit allocation, the step sizes are varied (see the sketch at the end of this section).
- Quantizer output is further compressed by lossless coding:
  - runlength coding of quantized coefficients (efficient since many coefficients become zero after quantization);
  - Huffman coding of runs of zeros and quantizer output indices.

For color images, an RGB → YCbCr color space conversion is applied first. The luminance (Y) and chrominance (Cb and Cr) components are separately encoded. The chrominance components are usually subsampled by a factor of 2.

Performance (color photos):

- fair to good: 0.25 bits/pixel (bpp)
- good to very good: 0.5 bpp
- excellent: 1.5 bpp
- indistinguishable: above 2.5 bpp

Compare with the original $3\times 8 = 24$ bpp.

[Figures: original image at 24 bpp and JPEG reconstructions at 1, 0.5, and 0.25 bpp.]

Note: At 0.25 bpp, the blocking effects are clearly visible.
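A minimal sketch of the quantization step, assuming NumPy and dct_matrix() from above: each 2-D DCT coefficient is uniformly quantized with its own step size. The step-size matrix here is a made-up illustration (coarser steps at higher frequencies), not the table from the JPEG standard:

```python
import numpy as np

k = 8
# Made-up step sizes, growing with frequency index i + j.
steps = 4.0 + 2.0 * np.add.outer(np.arange(k), np.arange(k))

def encode_block(block: np.ndarray) -> np.ndarray:
    A = dct_matrix(k)
    Y = A @ block @ A.T                      # 2-D DCT of the pixel block
    return np.round(Y / steps).astype(int)   # uniform quantization; many zeros

def decode_block(q: np.ndarray) -> np.ndarray:
    A = dct_matrix(k)
    return A.T @ (q * steps) @ A             # dequantize, inverse 2-D DCT
```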
Subband coding

- Pass the signal through a filter bank and quantize the outputs (subbands) separately. Each subband is down-sampled according to the ratio of the bandwidth of the filter output and input.
- Use bit allocation among the subband quantizers to minimize the overall distortion.
- In image coding practice, often a cascade of lowpass-highpass filter pairs is used.

[Block diagram: 4-level subband decomposition of $X_n$ with a lowpass-highpass analysis filter pair $(H_0, H_1)$; each filter output is down-sampled by 2, and the lowpass branch is split further at each level.]
[Block diagram: reconstruction of $\hat X_n$ from the 4-level decomposition; each subband is up-sampled by 2 and passed through the synthesis filter pair $(G_0, G_1)$, starting from the lowest frequency subband.]

Application to image compression:

- Instead of uniform subbands, first break the signal into high and low frequency subbands using a lowpass-highpass filter pair. Divide only the low frequency subbands into further high and low frequency subbands.
- Reconstruction reverses the process: begin with the lowest frequency subband and successively add back in the higher frequencies.
- For images, do this separately on rows and columns.
- "Multiresolution": a natural means of reconstructing larger and higher resolution images, working from lower to higher frequencies.
- One can use, e.g., quadrature mirror filters (QMF) or a wavelet decomposition (JPEG 2000). (A one-level sketch follows below.)
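A minimal sketch of a single analysis/synthesis stage, assuming NumPy and using the Haar filter pair, the simplest perfect-reconstruction lowpass-highpass pair; it stands in for the $(H_0, H_1)$/$(G_0, G_1)$ banks in the diagrams above:

```python
import numpy as np

def analyze(x: np.ndarray):
    low = (x[0::2] + x[1::2]) / np.sqrt(2)   # lowpass filter + down-sample by 2
    high = (x[0::2] - x[1::2]) / np.sqrt(2)  # highpass filter + down-sample by 2
    return low, high

def synthesize(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)      # up-sample by 2 + synthesis filters
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.random.default_rng(2).standard_normal(16)
low, high = analyze(x)
print(np.allclose(synthesize(low, high), x))  # True: perfect reconstruction
```

Cascading analyze() on the lowpass output yields the multi-level decomposition shown in the diagram.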
[Figure: single pass of the subband decomposition of an image using the wavelet transform.]

Note: The highpass components are almost invisible because most of the signal energy is concentrated in the low-low image subband. ⇒ Energy compaction.

DCT vs. wavelet-based coding

[Figures: two examples comparing JPEG at 0.25 bpp with JPEG2000 at 0.25 bpp.]

Note: The JPEG blocking effects are clearly visible, but some "high frequency" details show better on the JPEG picture.

Note: JPEG2000 is clearly superior here.