Image Compression
Jin-Zuo Liu
E-mail: r99942103@ntu.edu.tw
Graduate Institute of Communication Engineering
National Taiwan University, Taipei, Taiwan, ROC
Abstract
Image compression, the technology of reducing the amount of data required to
represent an image, is one of the most important applications in the field of digital
image processing. In this tutorial, we introduce the theory and practice of digital image
compression. We examine the most frequently used compression techniques and describe
the industry standards that make them useful.
1. Introduction to Digital Images
A digital image can be generated by capturing a scene with a camera or composited by
computer software. It consists of three matrices of integer-sampled values, each matrix
holding one component of the color tristimulus: red, green, and blue. In general, we do
not process the RGB components directly, because the perception of the Human Visual
System (HVS) cannot be efficiently exploited in the RGB color space. Instead, we
transform the color components from the RGB color space into a luminance-chrominance
color space. In digital image compression, we often adopt the YCbCr color space as the
three new primaries. Y is the luminance of the image, which represents the brightness,
while Cb and Cr are the chrominance components, which represent the color information.
Cb can be regarded as the difference between gray and blue, and Cr as the difference
between gray and red. To reconstruct the digital image for display, we transform the
color components from the YCbCr color space back into the RGB color space, since
display devices reproduce the image by directly mixing the red, green, and blue
primaries. The transform from RGB to YCbCr is defined in (1.1).
$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.334 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix} \tag{1.1}$$
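As an illustration, the following NumPy sketch applies (1.1) to an H×W×3 RGB array (a minimal sketch; the helper name rgb_to_ycbcr is ours):

import numpy as np

# Transform matrix and offset vector from (1.1)
M = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.334,  0.500],
              [ 0.500, -0.419, -0.081]])
OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    # Convert an H x W x 3 uint8 RGB image to YCbCr per (1.1).
    ycbcr = rgb.astype(np.float64) @ M.T + OFFSET
    return np.clip(ycbcr, 0, 255).astype(np.uint8)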
According to the HVS, the human eye is more sensitive to luminance than to
chrominance. Therefore, we can reduce the accuracy of the chrominance through
subsampling to achieve data compaction, and the difference is not easily perceived by
the human eye. There are four frequently used YCbCr subsampling formats, all of which
are shown in Fig. 1.1. It is worth noting that the name of a format is not always
related to its subsampling ratio.
Fig. 1.1 YCbCr subsampling formats: (a) 4:4:4 — Y, Cb, and Cr are all W×H; (b) 4:2:2 — Cb and Cr are subsampled to (W/2)×H; (c) 4:2:0 — Cb and Cr are subsampled to (W/2)×(H/2); (d) 4:1:1 — Cb and Cr are subsampled to (W/4)×H.
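As an illustration of 4:2:0 subsampling, the following NumPy sketch averages each 2×2 block of a chroma plane (averaging is a common implementation choice here, but plain decimation is also used in practice; the helper name is ours):

import numpy as np

def subsample_420(chroma):
    # Halve a chroma plane in both dimensions by averaging 2x2 blocks.
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(np.float64)
    return (c[0::2, 0::2] + c[0::2, 1::2] +
            c[1::2, 0::2] + c[1::2, 1::2]) / 4.0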
2. Basic Image Compression Model
As Fig. 2.1 shows, an image compression system consists of two distinct functional
components: an encoder and a decoder. The encoder performs compression, and the
decoder performs the complementary operation of decompression. In the encoding
process, the first stage transforms the image into a non-visual format that contains
less inter-pixel redundancy than the original input image. Most transform operations
employed in this stage are reversible. A quantizer is employed in the second stage in
order to reduce the perceptual redundancy of the input image. This process, however,
is irreversible, and it is not used in error-free compression. In the last stage of
the basic image compression model, a symbol encoder creates a fixed- or
variable-length code to represent the quantized output and maps the output in
accordance with that code. A variable-length code is used to represent the mapped and
quantized data set in most cases. The encoder assigns the shortest code words to the
most frequently occurring output values and thus reduces coding redundancy. This
operation is reversible. In the decoding process, the blocks are performed in reverse
order. The quantization stage has no exact inverse in the decoding process, because
quantization causes irreversible information loss.
Fig. 2.1 Basic image compression model: (a) encoding process: input image → transform coding → quantization → symbol encoding → bit stream; (b) decoding process: bit stream → symbol decoding → transform decoding → recovered image.
In practice there are many image compression algorithms. In order to evaluate their
performance, we must define some measures. The compression ratio (CR), which
quantifies the reduction in the quantity of data, is defined as

$$CR = \frac{n_1}{n_2} \tag{2.1}$$

where $n_1$ is the data quantity of the original image and $n_2$ is the data quantity
of the generated bitstream.
In addition, two well-known and widely used measures in the field of image
compression are the Root Mean Square Error (RMSE) and the Peak Signal-to-Noise Ratio
(PSNR), defined in (2.2) and (2.3), respectively:

$$RMSE = \sqrt{\frac{1}{WH}\sum_{x=0}^{W-1}\sum_{y=0}^{H-1}\left[f(x,y) - f'(x,y)\right]^2} \tag{2.2}$$

$$PSNR = 20\log_{10}\frac{255}{RMSE} \tag{2.3}$$

where $f(x,y)$ is the pixel value of the original image and $f'(x,y)$ is the pixel
value of the compressed image. $W$ and $H$ are the width and height of the original
image, respectively.
The higher the value of CR, the larger the reduction in data quantity. The lower the
RMSE and the higher the PSNR, the better the quality of the compressed image.
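These measures are straightforward to compute; a minimal NumPy sketch (helper names are ours):

import numpy as np

def rmse(f, g):
    # Root mean square error between original f and compressed g, per (2.2).
    d = f.astype(np.float64) - g.astype(np.float64)
    return np.sqrt(np.mean(d ** 2))

def psnr(f, g):
    # Peak signal-to-noise ratio in dB for 8-bit images, per (2.3).
    return 20.0 * np.log10(255.0 / rmse(f, g))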
3. Transform Coding
The reason for applying transform coding is to reduce the correlation between
neighboring pixels in the image through a coordinate rotation. First, we discuss the
Karhunen-Loeve Transform (KLT), the optimal transform coding method in the sense that
it reduces the correlation the most.
Suppose M is the total number of blocks in an image and N is the number of pixels
within each block. The pixels in each block can be expressed in the following 1-D
vector form:

$$x^{(m)} = \left[\,x_1^{(m)}\ x_2^{(m)}\ \cdots\ x_N^{(m)}\,\right]^t, \qquad m = 1, 2, \ldots, M. \tag{3.1}$$
The general orthogonal transform performed on the divided blocks is

$$y^{(m)} = V^t x^{(m)}, \qquad m = 1, 2, \ldots, M. \tag{3.2}$$
The optimal orthogonal transform reduces the correlation among the N pixels in each
block to the greatest extent. The definition of covariance can be applied here to
provide a good measurement of the correlation between pixels $x_i$ and $x_j$, or
between the transform coefficients $y_i$ and $y_j$:

$$C_{y_i y_j} = E_m\!\left[\left(y_i^{(m)} - \overline{y_i^{(m)}}\right)\left(y_j^{(m)} - \overline{y_j^{(m)}}\right)\right] \tag{3.3}$$

$$C_{x_i x_j} = E_m\!\left[\left(x_i^{(m)} - \overline{x_i^{(m)}}\right)\left(x_j^{(m)} - \overline{x_j^{(m)}}\right)\right] \tag{3.4}$$

where

$$y_i^{(m)} = \sum_{n=1}^{N} v_{ni}\, x_n^{(m)}, \qquad \overline{y_i^{(m)}} = \sum_{n=1}^{N} v_{ni}\, \overline{x_n^{(m)}}. \tag{3.5}$$

To simplify the calculation, we assume the pixel data have been made zero-mean, i.e., $x_i^{(m)}$ is replaced by

$$x_i^{(m)} - \overline{x_i^{(m)}}. \tag{3.6}$$

Therefore, the covariance measurements can be simplified and re-expressed as

$$C_{x_i x_j} = E_m\!\left[x_i^{(m)} x_j^{(m)}\right] \tag{3.7}$$

$$C_{y_i y_j} = E_m\!\left[y_i^{(m)} y_j^{(m)}\right]. \tag{3.8}$$
The covariance measurements can be written as the following covariance matrices:

$$C_{xx} = \begin{bmatrix} E_m\!\left[x_1^{(m)} x_1^{(m)}\right] & \cdots & E_m\!\left[x_1^{(m)} x_N^{(m)}\right] \\ \vdots & \ddots & \vdots \\ E_m\!\left[x_N^{(m)} x_1^{(m)}\right] & \cdots & E_m\!\left[x_N^{(m)} x_N^{(m)}\right] \end{bmatrix}, \qquad C_{yy} = \begin{bmatrix} E_m\!\left[y_1^{(m)} y_1^{(m)}\right] & \cdots & E_m\!\left[y_1^{(m)} y_N^{(m)}\right] \\ \vdots & \ddots & \vdots \\ E_m\!\left[y_N^{(m)} y_1^{(m)}\right] & \cdots & E_m\!\left[y_N^{(m)} y_N^{(m)}\right] \end{bmatrix} \tag{3.9}$$

and the covariance matrices can also be defined in vector form:

$$C_{xx} = E_m\!\left[x^{(m)} \left(x^{(m)}\right)^t\right] \tag{3.10}$$

$$C_{yy} = E_m\!\left[y^{(m)} \left(y^{(m)}\right)^t\right]. \tag{3.11}$$
By substituting $y^{(m)} = V^t x^{(m)}$ into $C_{yy}$, the following relation can be derived:

$$C_{yy} = E_m\!\left[V^t x^{(m)} \left(V^t x^{(m)}\right)^t\right] = V^t E_m\!\left[x^{(m)} \left(x^{(m)}\right)^t\right] V = V^t C_{xx} V. \tag{3.12}$$

$C_{yy}$ is a diagonal matrix, $C_{yy} = \mathrm{diag}[\lambda_1, \ldots, \lambda_N]$,
which results from the diagonalization of $C_{xx}$ by the matrix V. For the KLT,
$C_{xx}$ has N real eigenvalues $[\lambda_1, \ldots, \lambda_N]$ and N corresponding
eigenvectors $[v_1, \ldots, v_N]$, all of which are orthogonal to each other. The
condition $\lVert v_i \rVert = 1$ must be satisfied for the KLT.
The KLT is actually the orthogonal transform that optimizes the transformation
process by minimizing the average squared distortion D. The minimum D is given by

$$D_{\min}(V) = \varepsilon^2\, 2^{-2R} \left(\prod_{k=1}^{N} \sigma_{y_k}^2\right)^{1/N} \tag{3.13}$$

where ε is a correction constant determined by the image itself, R is the coding bit
rate, and $\sigma_{y_k}^2$ represents the variance (dispersion) of the k-th transform
coefficient. Theoretically, the following condition holds for an arbitrary covariance
matrix:

$$\det C_{yy} \le \prod_{k=1}^{N} \sigma_{y_k}^2. \tag{3.14}$$

We note that $C_{yy}$ is a diagonal matrix for the KLT, so equality holds, and we can
substitute (3.12) into the left-hand side of the above equation:

$$\det C_{yy} = \det\!\left(V^t C_{xx} V\right) = \det C_{xx} \det\!\left(V^t V\right) = \det C_{xx}. \tag{3.15}$$

From the above equation we can conclude that $\det C_{xx}$ is independent of the
transformation matrix V, and $D_{\min}(V)$ can be re-written accordingly:

$$D_{\min}(V) = \varepsilon^2\, 2^{-2R} \left(\det C_{yy}\right)^{1/N} = \varepsilon^2\, 2^{-2R} \left(\det C_{xx}\right)^{1/N}. \tag{3.16}$$

As a result, $D_{\min}(V)$ is independent of the transformation matrix V.
Suppose that we have a 3 by 2 transform block (three pixels wide and two pixels high):

$$X = \begin{bmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \end{bmatrix} \tag{3.17}$$

We can represent the (i, j) element of $C_{xx}$ as

$$E_m[x_i x_j] = \sigma^2 \rho_H^{\,h}\,\rho_V^{\,v} \tag{3.18}$$

where h is the horizontal distance between $x_i$ and $x_j$, and v is the vertical
distance between $x_i$ and $x_j$. In the case of (3.17), $E_m[x_1 x_6]$ equals
$\sigma^2\rho_H^2\rho_V$. Therefore, $C_{xx}$ can be obtained as
$$C_{xx} = \sigma^2\begin{bmatrix} 1 & \rho_H & \rho_H^2 & \rho_V & \rho_V\rho_H & \rho_V\rho_H^2 \\ \rho_H & 1 & \rho_H & \rho_V\rho_H & \rho_V & \rho_V\rho_H \\ \rho_H^2 & \rho_H & 1 & \rho_V\rho_H^2 & \rho_V\rho_H & \rho_V \\ \rho_V & \rho_V\rho_H & \rho_V\rho_H^2 & 1 & \rho_H & \rho_H^2 \\ \rho_V\rho_H & \rho_V & \rho_V\rho_H & \rho_H & 1 & \rho_H \\ \rho_V\rho_H^2 & \rho_V\rho_H & \rho_V & \rho_H^2 & \rho_H & 1 \end{bmatrix} \tag{3.19}$$

with rows and columns ordered $x_1, \ldots, x_6$.
The above equation can also be re-written as

$$C_{xx} = \sigma^2\begin{bmatrix} 1\cdot A & \rho_V A \\ \rho_V A & 1\cdot A \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & \rho_H & \rho_H^2 \\ \rho_H & 1 & \rho_H \\ \rho_H^2 & \rho_H & 1 \end{bmatrix}. \tag{3.20}$$
Once again, we can simplify (3.20) by using the Kronecker product and represent
$C_{xx}$ as

$$C_{xx} = \sigma^2\underbrace{\begin{bmatrix} 1 & \rho_V \\ \rho_V & 1 \end{bmatrix}}_{C_V} \otimes \underbrace{\begin{bmatrix} 1 & \rho_H & \rho_H^2 \\ \rho_H & 1 & \rho_H \\ \rho_H^2 & \rho_H & 1 \end{bmatrix}}_{C_H}. \tag{3.21}$$

As shown in (3.21), the auto-covariance function can be separated into a vertical
component $C_V$ and a horizontal component $C_H$, so that the computational
complexity of the KLT is greatly reduced.
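This separability is easy to verify numerically with NumPy's Kronecker product (a small sketch; the correlation values are arbitrary example numbers and σ² is taken as 1):

import numpy as np

rho_v, rho_h = 0.95, 0.95                  # example correlation coefficients
C_V = np.array([[1.0, rho_v],
                [rho_v, 1.0]])
C_H = np.array([[1.0, rho_h, rho_h**2],
                [rho_h, 1.0, rho_h],
                [rho_h**2, rho_h, 1.0]])

# 6x6 auto-covariance of the 2x3 block, per (3.21)
C_xx = np.kron(C_V, C_H)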
Since the KLT is related to and dependent on the image data being transformed, an
individual KLT must be computed for each image, which increases the computational
complexity. Moreover, the KLT matrix used by the encoder must be sent to the decoder,
which also increases the overall decoding time. These problems can be solved by
generalizing the KLT [B1].
We now discuss the limiting case of the KLT, the Discrete Cosine Transform (DCT).
Based on the concept of the KLT, the DCT, a generic transform that does not need to
be recomputed for each image, can be derived. Based on the derivation in (3.18), the
auto-covariance matrix of any transform block of size N can be represented by
$$C_{xx}^{MODEL} = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\ \rho & 1 & \rho & \cdots & \rho^{N-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{N-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1 \end{bmatrix} \tag{3.22}$$
which is in Toeplitz form. This means that the KLT is determined by a single
parameter ρ, but that parameter is still data dependent. Therefore, by setting the
parameter ρ to a value between 0.90 and 0.98 so that ρ → 1, we obtain the limiting
condition of the KLT. That is, as ρ approaches 1, the KLT is no longer strictly
optimal, but fortunately the KLT at this limiting condition can still effectively
remove the dependencies between pixels. After a detailed derivation we get

$$v_{ij} = \begin{cases} \sqrt{\dfrac{1}{N}} & (j = 1) \\[1ex] \sqrt{\dfrac{2}{N}}\cos\dfrac{(2i-1)(j-1)\pi}{2N} & (j \ne 1) \end{cases} \tag{3.23}$$

where i = 1, 2, ..., N and j = 1, 2, ..., N. Note that N is the dimension of the
selected block. This equation constitutes the basis set of the DCT, and the kernel of
the 2-D DCT is expressed as
$$F(u,v) = \frac{2C(u)C(v)}{N} \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\cos\!\left[\frac{(2i+1)u\pi}{2N}\right]\cos\!\left[\frac{(2j+1)v\pi}{2N}\right] \tag{3.24}$$

where $0 \le u, v \le N-1$ and

$$C(n) = \begin{cases} 1/\sqrt{2} & (n = 0) \\ 1 & (n \ne 0) \end{cases} \tag{3.25}$$
Because the basis images of the DCT are fixed and input independent, the
computational complexity of the DCT is much lower than that of the KLT. Although the
implementation of sinusoidal transforms such as the DCT is not as simple as that of
nonsinusoidal transforms, the DCT possesses comparable information packing ability.
On the whole, the DCT provides a good compromise between information packing ability
and computational complexity.
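As an illustration, (3.24) on an 8×8 block can be computed with SciPy (a minimal sketch; scipy.fft.dctn with norm='ortho' matches the 2C(u)C(v)/N scaling of (3.24)):

import numpy as np
from scipy.fft import dctn, idctn

def dct2(block):
    # 2-D DCT of an N x N block, per (3.24).
    return dctn(block.astype(np.float64), norm='ortho')

def idct2(coeffs):
    # Inverse 2-D DCT, recovering the block from its coefficients.
    return idctn(coeffs, norm='ortho')

For a typical 8×8 image block, dct2 concentrates most of the energy in the low-frequency (upper-left) coefficients.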
4. Quantization
In the previous section we introduced methods that reduce the correlation within an
image. Those methods are lossless: they merely represent the original image in
another form that is more suitable for processing, and they do not by themselves
reduce the quantity of image data. In quantization, we reduce the number of distinct
output values to a much smaller set, so that the objective of compression can be
achieved.
Quantizers can be classified into uniform and non-uniform quantizers. A uniform
quantizer divides the input domain into equally spaced intervals, and the output
(reconstruction) value is taken from the midpoint of the corresponding interval. One
simple uniform quantizer is

$$Q(x) = \left\lfloor \frac{x}{\Delta} + 0.5 \right\rfloor \tag{4.1}$$

where Δ is the quantization step size. If the quantization step size is 1, the
quantization output function is as shown in Fig. 4.1; the number of elements in the
output domain is finite. It is also easy to see that the larger the quantization step
size, the larger the loss of precision.
Fig. 4.1 The uniform quantizer (Δ = 1).
In non-uniform quantization, we determine the quantization step according to the
probability of the elements in the input domain. For example, if the probability of
small values is high, we assign a small quantization step to them; if the probability
of large values is low, we assign a large quantization step to them. Fig. 4.2 shows
one example of a non-uniform quantizer with these characteristics. As can be seen
from Fig. 4.2, the quantization step for small values is smaller, but their
probability is high, so the quantization error is reduced efficiently.
Fig. 4.2 The non-uniform quantizer (Δ = 1).
5. Huffman Coding
Huffman coding is a variable-length coding algorithm that has been widely adopted in
many image and video compression standards such as JPEG and MPEG. The Huffman
algorithm is proved to provide the shortest average codeword length per symbol. The
Huffman coding algorithm proceeds as follows.
Assume the input alphabet is {S_n} with corresponding probabilities {P_n}. Then
perform the following procedure:
Step 1: Order the symbols according to their probabilities.
(A) Alphabet set: S1, S2, ..., SN
(B) Probabilities: P1, P2, ..., PN
The symbols are arranged such that P1 ≥ P2 ≥ ... ≥ PN.
Step 2: Merge the last two symbols SN and SN-1 into a new symbol HN-1 with
probability PN + PN-1. The new set of symbols has N-1 members: S1, S2, ..., SN-2, HN-1.
Step 3: Repeat Step 2 until the final set has only one element.
Step 4: The codeword for each symbol Si is obtained by traversing the binary tree
from its root to the leaf node corresponding to Si.
Fig. 5.1 shows a simple example of Huffman coding. First, we order the symbols by
probability. Second, we combine the two least probable symbols, a3 and a5, into a new
symbol whose probability is the sum of the probabilities of the combined symbols.
This procedure is repeated until the final probability becomes 1.0. After that, we
assign the codewords to each symbol by walking backward through the merges, and the
codeword for each symbol is obtained.
Symbol   Probability   Pass 1   Pass 2   Pass 3   Pass 4   Code
a2       0.4           0.4      0.4      0.4      0.6      1
a6       0.3           0.3      0.3      0.3      0.4      00
a1       0.1           0.1      0.2      0.3               011
a4       0.1           0.1      0.1                        0100
a3       0.06          0.1                                 01010
a5       0.04                                              01011

Fig. 5.1 One Huffman coding example
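A compact implementation of this procedure uses a priority queue (a minimal Python sketch; depending on how ties are broken, individual codewords may differ from Fig. 5.1, but the average codeword length is the same):

import heapq

def huffman_codes(probs):
    # Heap entries are (probability, tie-breaker, tree); a tree is either
    # a symbol or a (left, right) pair of subtrees.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # least probable symbol
        p2, _, t2 = heapq.heappop(heap)   # second least probable symbol
        heapq.heappush(heap, (p1 + p2, nxt, (t1, t2)))  # merged symbol
        nxt += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: branch 0 and 1
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:                             # leaf: record the codeword
            codes[tree] = prefix or '0'
    walk(heap[0][2], '')
    return codes

# The example of Fig. 5.1:
print(huffman_codes({'a2': 0.4, 'a6': 0.3, 'a1': 0.1,
                     'a4': 0.1, 'a3': 0.06, 'a5': 0.04}))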
6. The JPEG Standard
JPEG is the first international digital image compression standard for still images,
and many of the later image and video compression standards are based on the basic
structure of JPEG. The JPEG standard defines two basic compression methods. One is
the DCT-based method, which is lossy; the other is the predictive method, which is
lossless. The DCT-based method is known as JPEG baseline, the most widely used image
compression format on the web and in digital cameras. In this section we introduce
both the lossy and the lossless JPEG compression methods.
6.1 The JPEG Encoder
Fig. 6.1 shows the DCT-based JPEG encoder. The YCbCr color transform and the
chrominance downsampling are not defined in the JPEG standard, but most JPEG
compression software performs them because they let the JPEG encoder reduce the data
quantity more efficiently. The downsampled YCbCr data passing through the JPEG
encoder is then encoded into the bitstream.
Fig. 6.1 The DCT-based encoder of the JPEG compression standard: RGB image → YCbCr color transform → chrominance downsampling (4:2:2 or 4:2:0) → 8×8 FDCT → quantizer (driven by a quantization table) → zigzag scanning and run-length coding of the AC coefficients, differential coding of the DC coefficients → Huffman encoding → bit-stream.
6.2 Differential Coding of the DC Components
Notably, we treat the DC components separately rather than handling the AC and DC
components in the same way, because a large correlation still exists between the DC
components of neighboring blocks. Assume the DC component of the i-th block is DCi
and the DC component of the previous block is DCi-1. One example is shown in Fig. 6.2.
Fig. 6.2 Differential coding of the DC components in neighboring blocks: Diffi = DCi - DCi-1.
We perform differential coding on the two DC components as

Diffi = DCi - DCi-1    (6.1)

with DC0 = 0; the DC of the current block is recovered as DCi = DCi-1 + Diffi.
Therefore, in the JPEG file, the first coefficient of each block is actually the
difference of DCs, and this difference is encoded together with the AC coefficients.
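A two-line NumPy sketch of (6.1) and its inverse (helper names are ours):

import numpy as np

def dc_differences(dc):
    # Differential coding per (6.1), with DC0 = 0.
    return np.diff(np.asarray(dc), prepend=0)

def dc_reconstruct(diff):
    # Inverse: DCi = DCi-1 + Diffi, i.e. a running sum.
    return np.cumsum(diff)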
6.3 Zigzag Scanning of the AC Coefficients
In the entropy encoder, we encode the quantized and differentially coded coefficients
into the bitstream. However, these coefficients form a 2-D signal, a structure that
is not suitable for the entropy encoder. Thus, the JPEG standard performs zigzag
scanning on the AC coefficients to reorder them from low frequency to high frequency.
The zigzag scan order is illustrated in Fig. 6.3. It is worth noting that zigzag
scanning is performed on the 63 AC coefficients only, excluding the DC coefficient.
Fig. 6.3 Zigzag scanning order
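The zigzag order can be generated programmatically; a minimal sketch that walks the anti-diagonals of an N×N block, alternating direction on each one:

def zigzag_order(n=8):
    # (row, col) visiting order of the JPEG zigzag scan.
    order = []
    for s in range(2 * n - 1):              # s = row + col, the anti-diagonal index
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                  # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

# Scanning an 8x8 block: the DC coefficient comes first, then the 63 AC coefficients:
# ac = [block[i][j] for (i, j) in zigzag_order(8)][1:]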
6.4 Run-Length Coding of the AC Coefficients
Because we assign larger quantization step sizes to the high-frequency components
based on HVS theory, most of the high-frequency components are truncated to zero when
coarse (strong) quantization is applied. To take advantage of this property, the JPEG
standard defines run-length coding (RLC), which converts the AC coefficients into a
more efficient format for the entropy encoder. The RLC step replaces the quantized
values by pairs

(RUNLENGTH, VALUE)    (6.2)

where RUNLENGTH is the number of zeros preceding a nonzero coefficient and VALUE is
that nonzero coefficient. For example, assume the quantized 63 AC coefficients are

57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..., 0    (6.3)

Then performing RLC on this vector yields

(0, 57) (0, 45) (4, 23) (1, -30) (0, -16) (2, 1) (EOB)    (6.4)

where EOB means that the remaining coefficients are all zeros.
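A sketch of this conversion (the real standard also emits a ZRL symbol for zero runs longer than 15, which is omitted here for brevity):

def run_length_code(ac):
    # Run-length code the 63 quantized AC coefficients per (6.2).
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append('EOB')                     # trailing zeros collapse into EOB
    return pairs

ac = [57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1] + [0] * 50
print(run_length_code(ac))                  # reproduces (6.4)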
6.5 Entropy Coding
The DC coefficients and the run-length coded AC coefficients finally undergo entropy
coding. Because it takes a large amount of computation to grow the optimal Huffman
tree for each input image, and such a dynamic Huffman tree would have to be included
in the bitstream, JPEG defines two default Huffman tables, one for the DC and one for
the AC coefficients. These tables save a large amount of computation when encoding
images, and the Huffman tree need not be included in the encoded bitstream because
the default tables defined in the JPEG standard are fixed. The codeword table for the
VALUE of the DC and AC coefficients is shown in Table 6.1.
Values                               Bits for the value                              Size
0                                    -                                               0
-1, 1                                0, 1                                            1
-3, -2, 2, 3                         00, 01, 10, 11                                  2
-7, -6, -5, -4, 4, 5, 6, 7           000, 001, 010, 011, 100, 101, 110, 111          3
-15, ..., -8, 8, ..., 15             0000, ..., 0111, 1000, ..., 1111                4
-31, ..., -16, 16, ..., 31           00000, ..., 01111, 10000, ..., 11111            5
-63, ..., -32, 32, ..., 63           000000, ..., 011111, 100000, ..., 111111        6
-127, ..., -64, 64, ..., 127         0000000, ..., 0111111, 1000000, ..., 1111111    7
-255, ..., -128, 128, ..., 255       ...                                             8
-511, ..., -256, 256, ..., 511       ...                                             9
-1023, ..., -512, 512, ..., 1023     ...                                             10
-2047, ..., -1024, 1024, ..., 2047   ...                                             11

Table 6.1 The codeword table for the VALUE of the coefficients
Each differentially coded DC coefficient is encoded as

(SIZE, VALUE)    (6.5)

where SIZE indicates how many bits are needed to represent the VALUE of the
coefficient, and VALUE is the actual binary encoding of the DC coefficient. SIZE is
Huffman coded, while VALUE is not.
For example, if we have four DC coefficients 150, 5, -6, and 3, then according to
Table 6.1 they are turned into

(8, 10010110) (3, 101) (3, 001) (2, 11)    (6.6)

After the value of SIZE is obtained, we encode it with the Huffman table defined in
Table 6.2, so we can finally encode all the DC coefficients as

(111110, 10010110) (100, 101) (100, 001) (011, 11)    (6.7)
Size   Codeword    Code length
0      00          2
1      010         3
2      011         3
3      100         3
4      101         3
5      110         3
6      1110        4
7      11110       5
8      111110      6
9      1111110     7
10     11111110    8
11     111111110   9

Table 6.2 The Huffman table for the SIZE information of the luminance DC coefficients
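The SIZE category and VALUE bits of Table 6.1 can be computed directly; a short sketch (helper names are ours; negative values are represented by the one's complement of their magnitude):

def jpeg_size(v):
    # SIZE category per Table 6.1: number of bits needed for |v|.
    return abs(v).bit_length()

def jpeg_value_bits(v):
    # VALUE bits per Table 6.1; e.g. -6 -> '001'.
    s = jpeg_size(v)
    if s == 0:
        return ''
    if v < 0:
        v += (1 << s) - 1                  # one's complement of |v|
    return format(v, '0{}b'.format(s))

for v in (150, 5, -6, 3):                  # reproduces (6.6)
    print(jpeg_size(v), jpeg_value_bits(v))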
The AC coefficients are encoded with a similar Huffman table, shown in Table 6.3. The
codeword format of the AC coefficients is defined as

(RUNLENGTH, SIZE, VALUE)    (6.8)

Run/Size     Code length   Code word
0/0 (EOB)    4             1010
15/0 (ZRL)   11            11111111001
0/1          2             00
...          ...           ...
0/6          7             1111000
...          ...           ...
0/10         16            1111111110000011
1/1          4             1100
1/2          5             11011
...          ...           ...
1/10         16            1111111110001000
2/1          5             11100
...          ...           ...
4/5          16            1111111110011000
...          ...           ...
15/10        16            1111111111111110

Table 6.3 The Run/Size Huffman table for the luminance AC coefficients
7. JPEG 2000
The JPEG 2000 compression standard is a wavelet-based standard which employs the
wavelet transform as its core transform algorithm. The standard was developed to
improve on the shortcomings of baseline JPEG. Instead of the low-complexity and
memory-efficient block DCT used in the JPEG standard, JPEG 2000 adopts the discrete
wavelet transform (DWT), which is based on a multi-resolution image representation.
The main purpose of this analysis is to obtain different approximations of a function
f(x) at different levels of resolution. Both the mother wavelet ψ(x) and the scaling
function φ(x) are important functions in multiresolution analysis. The DWT also
improves compression efficiency thanks to its good energy compaction and its ability
to decorrelate the image across larger scales.
7.1 Fundamental Building Blocks
The fundamental building blocks of JPEG 2000 are illustrated in Fig. 7.1. As in JPEG,
a core transform coding algorithm is first applied to the input image data. After
quantizing and entropy coding the transformed coefficients, the output codestream, or
so-called bitstream, is formed. Each of the blocks is described in more detail below.
Before the image data is fed into the transform block, the source image is decomposed
into three color components, either in RGB or in YCbCr format. The input image and
its components are then further decomposed by a tiling process, which particularly
benefits applications with a limited amount of memory relative to the image size. The
tile size can be arbitrarily defined, and it can be as large as the original image.
Next, the DWT is applied to each tile so that the tile is decomposed into different
resolution levels. The output subband coefficients are quantized and collected into
rectangular arrays of "code-blocks" before being coded with a context-dependent
binary arithmetic coder.
Fig. 7.1 JPEG 2000 fundamental building blocks: input image data → discrete wavelet transform (DWT) → uniform quantization → adaptive binary arithmetic coder → compressed bit-stream organization.
The quantization process employed in JPEG 2000 is similar to that of JPEG, in that
each subband can have its own step size. The difference is that JPEG 2000
incorporates a central deadzone in the quantizer. A quantizer step size Δb is
determined for each subband b based on the perceptual importance of that subband.
Then, each wavelet coefficient $y_b(u,v)$ in subband b is mapped to a quantized index
value $q_b(u,v)$ by the quantizer. This is performed according to

$$q_b(u,v) = \mathrm{sign}\!\left(y_b(u,v)\right)\left\lfloor \frac{\left|y_b(u,v)\right|}{\Delta_b} \right\rfloor \tag{7.1}$$

where the quantization step size is defined as

$$\Delta_b = 2^{R_b - \varepsilon_b}\left(1 + \frac{\mu_b}{2^{11}}\right). \tag{7.2}$$

Note that the quantization step size is represented using a total of two bytes: an
11-bit mantissa μb and a 5-bit exponent εb. The dynamic range Rb depends on the
number of bits used to represent the original image tile component and on the choice
of the wavelet transform. For reversible operation, the quantization step size is
required to be 1 (i.e., μb = 0 and εb = Rb).
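A sketch of this quantizer in NumPy (helper names are ours):

import numpy as np

def deadzone_quantize(y, delta):
    # JPEG 2000 deadzone quantizer per (7.1).
    return np.sign(y) * np.floor(np.abs(y) / delta)

def step_size(R, eps, mu):
    # Quantization step size per (7.2): 5-bit exponent eps, 11-bit mantissa mu.
    return 2.0 ** (R - eps) * (1.0 + mu / 2.0 ** 11)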
7.2 Wavelet Transform
A two-dimensional scaling function $\varphi(x,y)$ and three two-dimensional wavelets
$\psi^H(x,y)$, $\psi^V(x,y)$, and $\psi^D(x,y)$ are the critical elements of the
wavelet transform in two dimensions. These are separable products of a
one-dimensional scaling function $\varphi$ and the corresponding wavelet $\psi$:

$$\varphi(x,y) = \varphi(x)\varphi(y) \tag{7.3}$$

$$\psi^H(x,y) = \psi(x)\varphi(y) \tag{7.4}$$

$$\psi^V(x,y) = \varphi(x)\psi(y) \tag{7.5}$$

$$\psi^D(x,y) = \psi(x)\psi(y) \tag{7.6}$$

where $\psi^H$ measures the horizontal variations (horizontal edges), $\psi^V$
corresponds to the vertical variations (vertical edges), and $\psi^D$ detects
variations along the diagonal directions.
The two-dimensional DWT can be implemented using digital filters and downsamplers.
The block diagram in Fig. 7.2 shows the process of taking the one-dimensional FWT of
the rows of f(x, y) and the subsequent one-dimensional FWT of the resulting columns.
Three sets of detail coefficients, the horizontal, vertical, and diagonal details,
are produced.
By iterating the single-scale filter bank process, a multi-scale filter bank can be
generated. This is achieved by feeding the approximation output into the input of
another filter bank to produce a transform of arbitrary scale. As in the
one-dimensional case, an image f(x, y) is used as the input at the first scale. The
resulting outputs are four quarter-size subimages, Wφ, WψH, WψV, and WψD, shown in
the center quad-image of Fig. 7.3. Two iterations of the filtering process produce
the two-scale decomposition at the right of Fig. 7.3. Fig. 7.4 shows the synthesis
filter bank, which exactly reverses the forward decomposition process.
h (m)
h (n)
2
WD ( j, m, n)
Rows
2
Columns
h (  m)
W ( j  1, m, n)
2
WV ( j, m, n)
Rows
h (  n)
h (m)
2
2
WH ( j, m, n)
Rows
Columns
h (  m)
2
Rows
Fig. 7.2 The analysis filter bank of the two-dimensional FWT .
19
W ( j , m, n)
W ( j , m, n) WH ( j , m, n)
W ( j  1, m, n)
WV ( j , m, n) WD ( j , m, n)
Fig. 7.3 A two-level decomposition of the two-dimensional FWT .
WD ( j, m, n)
2
h (m)
Rows
WV ( j, m, n)
2
2
+
h (  m)
h (n)
Columns
Rows
WH ( j, m, n)
2
+
h (m)
W ( j , m, n)
2
2
+
Rows
h (  m)
W ( j  1, m, n)
h (  n)
Columns
Rows
Fig. 7.4 The synthesis filter bank of the two-dimensional FWT .
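In practice, this single-scale decomposition is available in the PyWavelets library (a sketch; the 'haar' wavelet is an arbitrary choice here, whereas JPEG 2000 actually specifies the CDF 9/7 and 5/3 filter banks):

import numpy as np
import pywt

image = np.random.rand(256, 256)            # stand-in for a tile component

# One analysis stage as in Fig. 7.2: approximation plus three detail subbands
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# Iterating on the approximation, as in Fig. 7.3:
coeffs = pywt.wavedec2(image, 'haar', level=2)

# The synthesis bank of Fig. 7.4 is the inverse:
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')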
8. Shape-Adaptive Image Compression
Both the JPEG and JPEG 2000 image compression standards can achieve high compression
ratios; however, they share the following disadvantages:
1. They cause severe blocking artifacts when the image is compressed at a low bit rate.
2. The image is compressed without consideration of the characteristics of individual
image segments.
Therefore, for many image and video coding algorithms, separating the contents into
segments or objects has become a key technique for improving image/video quality in
very low bit rate applications. Moreover, the ability to make visual objects
available in compressed form has become an important part of next-generation visual
coding standards such as MPEG-4, whose users can create arbitrarily shaped objects
and manipulate or composite them manually. For this reason, Huang proposed
Shape-Adaptive Image Compression (SAIC). Instead of taking the whole image as one
object and encoding it with transform coding, quantization, and entropy coding, the
SAIC algorithm segments the whole image into several objects, each with its own local
characteristics and color. Because of the high correlation of the color values within
each image segment, SAIC can achieve a better compression ratio and quality than
conventional image compression algorithms.
There are two aspects to coding an arbitrarily shaped visual object. The first is to
code the shape of the object; the second is to code its texture (the pixels contained
inside the object region). In this section we concentrate on the texture coding of
still objects.
The flowchart of shape-adaptive image compression is shown in Fig. 8.1 and consists
of three parts. First, the input image is segmented into a boundary part and an
internal texture part. The boundary part contains the boundary information of the
object, while the internal texture part contains the internal contents of the object,
such as the color values. The two parts are then transformed and encoded separately.
Finally, the two encoded data streams are combined into the full bit-stream, which
completes the shape-adaptive image compression process. The theory and operation of
each part are introduced in the following sections.
Fig. 8.1 The block diagram of shape-adaptive image compression: the input image is segmented; the boundary goes through boundary transform coding, then quantization and entropy coding, yielding the boundary descriptor; the internal texture goes through arbitrary-shape transform coding, then quantization and entropy coding, yielding the coefficients of the transform bases; the two results are combined into the bit-stream.
8.1 Shape-Adaptive Transform Coding
After the internal texture of an image segment is obtained, we can take the DCT of
the internal texture data. Because a great deal of research effort has gone into
coding rectangular images and video, such as discrete cosine transform (DCT) coding
and wavelet transform coding, the simplest approach is to pad values into the pixel
positions outside the image segment until the segment's bounding box is filled. After
this process, a block-based transform can be applied directly to the rectangular
region.
Although many padding algorithms have been proposed to improve the coding efficiency,
this method still has problems. Padding values inside the bounding box increases the
high-order transform coefficients, which are later truncated; in other words, the
number of coefficients in the transform domain is not identical to the number of
pixels in the image domain. This leads to serious degradation of the compression
performance. Furthermore, the padding process produces redundancy which then has to
be removed. Fig. 8.2 shows an example of padding zeros into the pixel positions
outside an image segment; the hatched region represents the image region. The padded
block can then be fed to a subsequent block-based transform such as the DCT or the
wavelet transform.
Fig. 8.2 Padding zeros into the pixel positions outside the image segment: an 8×8 block in which the hatched (segment) positions keep their pixel values and all positions outside the segment are set to 0.
Another method commonly used for shape-adaptive images is to transform with
arbitrarily-shaped orthogonal DCT bases. In practice, the height and width of an
arbitrarily-shaped image segment are usually not equal, so we redefine the DCT as in
(8.1) and (8.2).

Forward Shape-Adaptive DCT:

$$F(k) = \sum_{x=0}^{W-1}\sum_{y=0}^{H-1} f(x,y)\,\beta'_{x,y}(k), \qquad k = 1, 2, \ldots, M \tag{8.1}$$

Inverse Shape-Adaptive DCT:

$$f(x,y) = \frac{2}{HW}\sum_{k=1}^{M} F(k)\,\beta'_{x,y}(k), \qquad x = 0, \ldots, W-1,\ \ y = 0, \ldots, H-1 \tag{8.2}$$

where W and H are the width and height of the image segment, respectively, and M is
the number of DCT coefficients. F(k) is called the Shape-Adaptive DCT coefficient,
and the DCT basis is

$$\beta'_{x,y}(u,v) = \frac{2C(u)C(v)}{\sqrt{HW}}\cos\!\left[\frac{(2x+1)u\pi}{2W}\right]\cos\!\left[\frac{(2y+1)v\pi}{2H}\right], \qquad C(k) = \begin{cases} 1/\sqrt{2} & (k = 0) \\ 1 & \text{otherwise} \end{cases} \tag{8.3}$$

Because the number of points M is less than or equal to HW, the HW bases may not be
orthogonal over the segment. We can obtain M orthogonal bases by means of the
Gram-Schmidt process. Unfortunately, the complexity of the Gram-Schmidt process is up
to $O(n^2)$, so it is still not the best way to transform the coefficients in an SAIC
system.
Moreover, the coding time becomes extremely long when the number of points in an
image segment is large, so transforms using arbitrarily-shaped DCT bases are not
practical for video streaming or other real-time applications. Makai proposed a
shape-adaptive DCT (SA-DCT) algorithm which is based on predefined orthogonal sets of
DCT basis functions. The SA-DCT has transform efficiency similar to that of the
transform with arbitrarily-shaped bases, but at a much lower computational
complexity, especially when the number of points in an image segment is large.
Moreover, compared with traditional shape-adaptive transforms using padding, the
number of transformed coefficients of the SA-DCT is identical to the number of points
in the image segment. This object-based coding of images is backward compatible with
existing block-based coding standards.
8.2 Shape-Adaptive DCT Algorithm

Fig. 8.3 Successive steps of the SA-DCT performed on an arbitrarily shaped image segment: (a) the segmented 8×8 block; (b) the foreground columns shifted to the upper border; (c) the DC values of the column transforms concentrated on the upper border; (d), (e) the rows of coefficients shifted to the left border; (f) the final layout of the SA-DCT coefficients, with the DC coefficient in the upper-left corner.
As mentioned, the shape-adaptive DCT algorithm is based on predefined orthogonal sets
of DCT basis functions. The algorithm flushes and transforms the arbitrarily shaped
image segment in the horizontal and vertical directions separately. The overall
process is illustrated in Fig. 8.3. The image block in Fig. 8.3(a) is segmented into
two regions: foreground (dark) and background (light). The SA-DCT consists of two
parts: (i) the vertical SA-DCT transformation and (ii) the horizontal SA-DCT
transformation. To perform the vertical SA-DCT transformation of the foreground, the
length N of each column j of the foreground segment is first calculated, and the
columns are shifted and aligned to the upper border of the 8 by 8 reference block
(Fig. 8.3(b)). The variable-length (N-point) 1-D DCT transform matrix

$$\mathrm{DCT}\text{-}N(p,k) = c_0\cos\!\left[p\left(k+\frac{1}{2}\right)\frac{\pi}{N}\right], \qquad k, p = 0, \ldots, N-1 \tag{8.4}$$

containing the set of N DCT-N basis vectors is chosen. Note that $c_0 = 1/\sqrt{2}$
if p = 0 and $c_0 = 1$ otherwise; p denotes the p-th DCT basis vector. The N vertical
DCT coefficients $\mathbf{c}_j$ can then be calculated according to the formula

$$\mathbf{c}_j = \frac{2}{N}\,\mathrm{DCT}\text{-}N\cdot\mathbf{x}_j \tag{8.5}$$
In Fig. 8.3(b), the foreground segments are shifted to the upper border, and each
column is transformed using the DCT-N basis vectors. As a result, the DC values of
the segment columns are concentrated on the upper border of the reference block, as
shown in Fig. 8.3(c). Then, as depicted in Fig. 8.3(e), before the horizontal DCT
transformation the length of each row is calculated and the rows are shifted to the
left border of the reference block. A horizontal DCT adapted to the size of each row
is then applied using (8.4) and (8.5). Note that the horizontal SA-DCT transformation
is performed along the vertical SA-DCT coefficients with the same index.
Finally, the layout of the resulting DCT coefficients is obtained as shown in
Fig. 8.3(f). The DC coefficient is located in the upper-left corner of the reference
block. In addition, the number of SA-DCT coefficients equals the number of pixels
inside the image segment, and the resulting coefficients are concentrated around the
DC coefficient in accordance with the shape of the image segment.
The overall processing steps are as follows:
Step 1: The foreground pixels are shifted and aligned to the upper border of the
8 by 8 reference block.
Step 2: A variable-length 1-D DCT is performed in the vertical direction according to
the length of each column.
Step 3: The resulting 1-D DCT coefficients of Step 2 are shifted and aligned to the
left border of the 8 by 8 reference block.
Step 4: A variable-length 1-D DCT is performed in the horizontal direction according
to the length of each row.
As mentioned, the SA-DCT is backward compatible. Therefore, after performing the
SA-DCT, the coefficients are quantized and encoded by the same methods used for the
8 by 8 2-D DCT.
Because the contour of the segment is transmitted to the receiver first, the decoder
can perform the inverse shape-adaptive DCT as the reverse operation

$$\mathbf{x}_j = \mathrm{DCT}\text{-}N^{\,T}\cdot\mathbf{c}_j \tag{8.6}$$

in both the horizontal and vertical directions.
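A minimal Python sketch of Steps 1-4 (function names are ours; the 2/N scaling follows (8.5) as written, and the inverse pass would apply the transposed matrices per (8.6)):

import numpy as np

def dct_matrix(N):
    # DCT-N matrix per (8.4).
    p, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    M = np.cos(p * (k + 0.5) * np.pi / N)
    M[0, :] /= np.sqrt(2.0)                # c0 = 1/sqrt(2) for p = 0
    return M

def sa_dct(block, mask):
    # SA-DCT of an 8x8 block with a boolean foreground mask.
    coef = np.zeros(block.shape)
    cmask = np.zeros(mask.shape, dtype=bool)
    # Steps 1-2: shift each column's foreground to the top, apply DCT-N.
    for j in range(block.shape[1]):
        x = block[mask[:, j], j].astype(float)
        N = len(x)
        if N:
            coef[:N, j] = (2.0 / N) * dct_matrix(N) @ x   # (8.5)
            cmask[:N, j] = True
    # Steps 3-4: shift each row's coefficients to the left, apply DCT-N.
    out = np.zeros(block.shape)
    for i in range(coef.shape[0]):
        c = coef[i, cmask[i, :]]
        N = len(c)
        if N:
            out[i, :N] = (2.0 / N) * dct_matrix(N) @ c
    return out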
8.3 Quantization of the Arbitrarily-Shaped DCT Coefficients
Unlike the conventional DCT and compression techniques based on padding, the number
of arbitrarily-shaped DCT coefficients depends on the area of the arbitrarily-shaped
image segment. Therefore, the block-based quantization used in JPEG is not applicable
here. In JPEG, the 8×8 DCT coefficients are divided by a quantization matrix which
contains the corresponding quantization step size for each DCT coefficient, and
padding-based compression techniques quantize their coefficients in the same way.
Instead of a block-based quantizer, we define an extendable, segment-shape-dependent
quantization array Q(k) as a linearly increasing line:

$$Q(k) = Q_a k + Q_c \tag{8.7}$$

for k = 1, 2, ..., M. The two parameters Qa and Qc denote the slope and the intercept
of the line, respectively, and M is the number of DCT coefficients within the
arbitrarily-shaped segment. The resulting arbitrarily-shaped DCT coefficients are
individually divided by the corresponding quantization value Q(k) and rounded to the
nearest integer:

$$F_q(k) = \mathrm{Round}\!\left(\frac{F(k)}{Q(k)}\right) \tag{8.8}$$

where k = 1, 2, ..., M. The intercept Qc affects the quantization of the DC term,
while the slope Qa affects the rest of the coefficients. Because the high-frequency
components of image segments are not visually significant, they are usually discarded
in compression. By choosing Qa appropriately, we can reduce the high-frequency
components without a noticeable impact on human sight.
In our simulation, we quantize the DCT coefficients of six image segments by using
the quantization array Q(k) shown in Fig. 8.4, with the parameters Qa and Qc set to
0.06 and 8, respectively. The quantization of the DCT coefficients is carried out
according to (8.8), and the result is depicted in Fig. 8.5. We find that most of the
high-frequency components are reduced to zero, which benefits the subsequent entropy
coding stage.
26
45
40
35
30
25
20
15
10
5
0
100
200
300
400
500
600
Fig.8.3 Quantization array Q(k) with Qa = 0.06 and Qc = 8.
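A sketch of (8.7) and (8.8) in NumPy (helper name is ours):

import numpy as np

def quantize_sa_coeffs(F, Qa=0.06, Qc=8.0):
    # Quantize M shape-adaptive DCT coefficients per (8.7) and (8.8).
    k = np.arange(1, len(F) + 1)
    Q = Qa * k + Qc                        # linearly increasing step sizes
    return np.round(np.asarray(F, dtype=float) / Q).astype(int)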
8.4 Shape-Adaptive Entropy Coding
Before introducing shape-adaptive entropy coding, one problem must be solved. Because
the variation of the color around the boundary is large, truncating the corresponding
high-frequency components may corrupt the reconstructed image with unnecessary noise.
In Ref. [4], the authors propose a method to solve this problem. Since most of the
variation lies around the boundary region, we can divide the internal texture segment
into an internal part and a boundary-region part.
Fig. 8.5 The quantized DCT coefficients of the six image segments, (a)-(f).
The image segment is divided using morphological erosion. For two binary image sets A
and B, the erosion of A by B, denoted $A \ominus B$, is defined as

$$A \ominus B = \left\{\, z \mid (B)_z \subseteq A \,\right\} \tag{8.9}$$

where the translation of the set B by the point z = (z1, z2), denoted (B)z, is
defined as

$$(B)_z = \left\{\, c \mid c = b + z,\ b \in B \,\right\} \tag{8.10}$$

That is, the erosion of A by B is the set of all points z such that B, translated by
z, is contained in A. Based on this definition, we erode the shapes of the image
segments by a 5×5 disk-shaped structuring element to obtain the shape of the internal
region. We then subtract it from the original shape to obtain the shape of the
boundary region. The process is illustrated in Fig. 8.6.
Fig. 8.6 Dividing the image segment by morphological erosion: the original image is segmented and each shape is filled; the filled shape is eroded by the structuring element B to give the internal region, which is subtracted from the original shape to give the boundary region.
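This split can be done with SciPy's binary erosion (a sketch under the stated 5×5 disk assumption; the helper name is ours):

import numpy as np
from scipy.ndimage import binary_erosion

def split_segment(shape_mask, radius=2):
    # Build a 5x5 disk structuring element (radius 2).
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = x**2 + y**2 <= radius**2
    # Internal region per (8.9); the boundary is what erosion removes.
    internal = binary_erosion(shape_mask, structure=disk)
    boundary = shape_mask & ~internal
    return internal, boundary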
After transforming and quantizing the internal texture, we can encode the quantized
coefficients and combine the encoded internal texture coefficients with the encoded
boundary bit-stream. The full encoding process is shown in Fig. 8.7.
28
Image segments
DCT coefficients
Quantizing & encoding
M1
S.A.
DCT
boundary
encoding
10011110101 EOB
101010111111111001 EOB
0
M2
M3
bit stream of
combine
boundaries
1010101011011111
111000111000101011111 EOB
Bit-stream of image
100111101010
EOB 1010101111111110
segments
01 EOB 111000111000101011111 EOB
Fig. 8.6 Process of Shape-Adaptive Coding Method
References
[1] T. Yoshida and Y. Sakai, Image Compression Technology, 白執善 (Ed.), Chuan Hwa, 2004 (in Chinese).
[2] T. Acharya and A. K. Ray, Image Processing: Principles and Applications, John Wiley & Sons, New Jersey.
[3] C. Christopoulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 still image coding system: an overview," IEEE Trans. on Consumer Electronics.
[4] M. Rabbani and R. Joshi, "An overview of the JPEG2000 still image compression standard," Signal Processing: Image Communication.
[5] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. on Pattern Analysis and Machine Intelligence.
[6] M. J. Slattery and J. L. Mitchell, "The Qx-coder," IBM J. Res. Development.
[7] J.-J. Ding, P.-Y. Lin, J.-D. Huang, T.-H. Lee, and H.-H. Chen, "Morphology-Based Shape Adaptive Compression."