Image Compression

Jin-Zuo Liu
E-mail: r99942103@ntu.edu.tw
Graduate Institute of Communication Engineering
National Taiwan University, Taipei, Taiwan, ROC

Abstract
Image compression, the technology of reducing the amount of data required to represent an image, is one of the most important applications in the field of digital image processing. In this tutorial, we introduce the theory and practice of digital image compression. We examine the most frequently used compression techniques and describe the industry standards that make them useful.

1. Introduction to the Digital Image
A digital image can be generated by capturing a scene with a camera or by compositing with computer software. A digital image consists of three matrices of integer-sampled values, one for each component of the color tristimulus: red, green, and blue. In general, we do not process the RGB components directly, because the perception of the Human Visual System (HVS) cannot be efficiently exploited in the RGB color space. Instead, we transform the color components from the RGB color space into a luminance-chrominance color space. In digital image compression, we often adopt the YCbCr color space as the three new primaries. Y is the luminance of the image, representing its brightness, while Cb and Cr are the chrominance components, carrying the color information: Cb can be regarded as the difference between gray and blue, and Cr as the difference between gray and red. To reconstruct the digital image for display, we transform the color components from the YCbCr color space back into the RGB color space, because display devices reproduce the image by directly mixing the red, green, and blue primaries. The transform from RGB to YCbCr is defined in (1.1):

    [ Y  ]   [  0.299   0.587   0.114 ] [ R ]   [   0 ]
    [ Cb ] = [ -0.169  -0.334   0.500 ] [ G ] + [ 128 ]                    (1.1)
    [ Cr ]   [  0.500  -0.419  -0.081 ] [ B ]   [ 128 ]

According to the HVS, the human eye is more sensitive to luminance than to chrominance. Therefore, we can reduce the accuracy of the chrominance components through subsampling to achieve data compaction, and the difference is not easily perceived by the human eye. There are four frequently used YCbCr subsampling formats, shown in Fig. 1.1. It is worth noting that the name of a format is not always related to its subsampling ratio.

Fig. 1.1 The YCbCr subsampling formats. For a W x H luminance plane Y: (a) 4:4:4 keeps Cb and Cr at W x H; (b) 4:2:2 subsamples them to (W/2) x H; (c) 4:2:0 subsamples them to (W/2) x (H/2); (d) 4:1:1 subsamples them to (W/4) x H.
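To make the color transform concrete, the following is a minimal NumPy sketch of (1.1) together with a simple 4:2:0 subsampling step; the function names are ours, chosen for illustration only.

```python
import numpy as np

# RGB -> YCbCr transform matrix and offset taken from (1.1).
M_YCBCR = np.array([[ 0.299,  0.587,  0.114],
                    [-0.169, -0.334,  0.500],
                    [ 0.500, -0.419, -0.081]])
OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Apply (1.1) to an H x W x 3 uint8 RGB image."""
    ycbcr = rgb.astype(np.float64) @ M_YCBCR.T + OFFSET
    return np.clip(np.round(ycbcr), 0, 255).astype(np.uint8)

def subsample_420(ycbcr):
    """4:2:0 format: keep Y at full size, Cb and Cr at half size in both directions."""
    return ycbcr[:, :, 0], ycbcr[::2, ::2, 1], ycbcr[::2, ::2, 2]
```

(Production encoders low-pass filter the chrominance before decimating; plain decimation is used here only to keep the sketch short.)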
2. Basic Image Compression Model
As Fig. 2.1 shows, an image compression system consists of two distinct functional components: an encoder and a decoder. The encoder performs compression, and the decoder performs the complementary operation of decompression. In the encoding process, the first stage transforms the image into a non-visual format that contains less inter-pixel redundancy than the original input image; most transform operations employed in this stage are reversible. A quantizer is employed in the second stage in order to reduce the perceptual redundancy of the input image; this process is irreversible, so it is not used in error-free compression. In the last stage of the basic image compression model, a symbol encoder creates a fixed- or variable-length code to represent the quantized output and maps the output in accordance with that code. A variable-length code is used to represent the mapped and quantized data set in most cases: the encoder assigns the shortest code words to the most frequently occurring output values and thus reduces coding redundancy. This operation is reversible. In the decoding process, the blocks are performed in reverse order. Inverse quantization is not included in the overall decoding process, because quantization causes irreversible information loss.

Fig. 2.1 Basic image compression model: (a) encoding process (input image -> transform coding -> quantization -> symbol encoding -> bit stream); (b) decoding process (bit stream -> symbol decoding -> transform decoding -> recovered image).

In practice there are many image compression algorithms, and in order to evaluate their performance we must define some measures. The compression ratio (CR), which quantifies the reduction in the quantity of data, is defined as

    CR = n1 / n2                                                            (2.1)

where n1 is the data quantity of the original image and n2 is the data quantity of the generated bitstream. In addition, two well-known and widely used measures in the field of image compression are the root mean square error (RMSE) and the peak signal-to-noise ratio (PSNR), defined in (2.2) and (2.3), respectively:

    RMSE = sqrt( (1/(WH)) Σ_{x=0}^{W−1} Σ_{y=0}^{H−1} [f(x,y) − f′(x,y)]² )  (2.2)

    PSNR = 20 log₁₀ ( 255 / RMSE )                                          (2.3)

where f(x,y) is a pixel value of the original image and f′(x,y) is the corresponding pixel value of the compressed image; W and H are the width and height of the original image, respectively. The higher the CR, the larger the reduction in data quantity; the lower the RMSE and the larger the PSNR, the better the quality of the compressed image.
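The three measures are straightforward to compute; below is a short NumPy sketch assuming 8-bit images (the names are ours):

```python
import numpy as np

def compression_ratio(n1, n2):
    """CR of (2.1): original data quantity over compressed data quantity."""
    return n1 / n2

def rmse(f, f_rec):
    """RMSE of (2.2) between the original f and the reconstruction f_rec."""
    diff = f.astype(np.float64) - f_rec.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def psnr(f, f_rec):
    """PSNR of (2.3) in dB, with 255 as the peak value of an 8-bit image."""
    return 20.0 * np.log10(255.0 / rmse(f, f_rec))
```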
3. Transform Coding
The reason for applying transform coding is to reduce the correlation between neighboring pixels in the image through a coordinate rotation. We first discuss the Karhunen-Loeve transform (KLT), the optimal transform coding method in the sense that it removes the most correlation. Suppose M is the total number of blocks in an image and N is the number of pixels within each block; the pixels in each block can be expressed in the following 1-D vector form:

    x⁽ᵐ⁾ = [x₁⁽ᵐ⁾, x₂⁽ᵐ⁾, …, x_N⁽ᵐ⁾]ᵗ,   m = 1, 2, …, M                      (3.1)

The general orthogonal transform performed on the divided blocks is

    y⁽ᵐ⁾ = Vᵗ x⁽ᵐ⁾,   m = 1, 2, …, M.                                        (3.2)

The optimal orthogonal transform reduces the correlation among the N pixels in each block to the greatest extent. The definition of covariance provides a good measurement of the correlation between pixels x_i and x_j, or between the transform coefficients y_i and y_j:

    C_{y_i y_j} = E_m[ (y_i⁽ᵐ⁾ − ȳ_i⁽ᵐ⁾)(y_j⁽ᵐ⁾ − ȳ_j⁽ᵐ⁾) ]                   (3.3)
    C_{x_i x_j} = E_m[ (x_i⁽ᵐ⁾ − x̄_i⁽ᵐ⁾)(x_j⁽ᵐ⁾ − x̄_j⁽ᵐ⁾) ]                   (3.4)

where

    y_i⁽ᵐ⁾ = Σ_{n=1}^{N} v_{ni} x_n⁽ᵐ⁾,   ȳ_i⁽ᵐ⁾ = Σ_{n=1}^{N} v_{ni} x̄_n⁽ᵐ⁾.  (3.5)

To simplify the calculation, we assume that the blocks are zero-mean, so that

    x_i⁽ᵐ⁾ − x̄_i⁽ᵐ⁾ = x_i⁽ᵐ⁾.                                                (3.6)

Therefore, the covariance measurements can be simplified and re-expressed as

    C_{x_i x_j} = E_m[ x_i⁽ᵐ⁾ x_j⁽ᵐ⁾ ]                                        (3.7)
    C_{y_i y_j} = E_m[ y_i⁽ᵐ⁾ y_j⁽ᵐ⁾ ].                                       (3.8)

The covariance measurements can be collected into covariance matrices,

    C_xx = [ E_m[x₁x₁]  ⋯  E_m[x₁x_N] ]
           [    ⋮        ⋱     ⋮      ]   and similarly for C_yy,             (3.9)
           [ E_m[x_Nx₁] ⋯  E_m[x_Nx_N] ]

which can also be written in vector form:

    C_xx = E_m[ x⁽ᵐ⁾ (x⁽ᵐ⁾)ᵗ ]                                               (3.10)
    C_yy = E_m[ y⁽ᵐ⁾ (y⁽ᵐ⁾)ᵗ ].                                              (3.11)

By substituting y⁽ᵐ⁾ = Vᵗx⁽ᵐ⁾ into C_yy, the following relation can be derived:

    C_yy = E_m[ Vᵗx⁽ᵐ⁾ (Vᵗx⁽ᵐ⁾)ᵗ ] = Vᵗ E_m[ x⁽ᵐ⁾(x⁽ᵐ⁾)ᵗ ] V = Vᵗ C_xx V.     (3.12)

For the KLT, C_yy is a diagonal matrix, C_yy = diag[λ₁, …, λ_N], which results from the diagonalization of C_xx by the matrix V. C_xx has N real eigenvalues λ₁, …, λ_N, with N corresponding eigenvectors v₁, …, v_N that are mutually orthogonal; the condition ‖v_i‖ = 1 must also be satisfied for the KLT. The KLT is thus the orthogonal transform that optimizes the transformation process by minimizing the average squared distortion D. For a coding rate of R bits per coefficient, the minimum distortion is

    D_min(V) = ε² 2^{−2R} ( Π_{k=1}^{N} ȳ_k² )^{1/N}                          (3.13)

where ε is a correction constant determined by the image itself and ȳ_k² represents the variance (degree of spread) of the k-th transform coefficient. Theoretically, the following condition holds for an arbitrary covariance matrix:

    det C_yy ≤ Π_{k=1}^{N} ȳ_k².                                             (3.14)

We note that C_yy is a diagonal matrix for the KLT, so (3.14) holds with equality; substituting (3.12) into its left-hand side gives

    det C_yy = det(Vᵗ C_xx V) = det C_xx · det(VᵗV) = det C_xx.               (3.15)

From this equation we conclude that det C_xx is independent of the transformation matrix V, and D_min(V) can be rewritten accordingly:

    D_min(V) = ε² 2^{−2R} (det C_yy)^{1/N} = ε² 2^{−2R} (det C_xx)^{1/N}.     (3.16)

As a result, D_min(V) is independent of the transformation matrix V.

Suppose that we have a 3-by-2 transform block:

    X = [ x₁ x₂ x₃ ]
        [ x₄ x₅ x₆ ]                                                         (3.17)

We can represent the (i, j) element of C_xx as

    E_m[ x_i x_j ] = ρ_H^{Δh} ρ_V^{Δv}                                       (3.18)

where Δh is the horizontal distance between x_i and x_j, and Δv is the vertical distance between them. In the case of (3.17), E_m[x₁x₆] equals ρ_H²ρ_V. Therefore, C_xx can be obtained as

            x₁       x₂       x₃       x₄       x₅       x₆
    x₁ [  1        ρ_H      ρ_H²     ρ_V      ρ_Vρ_H   ρ_Vρ_H² ]
    x₂ [  ρ_H      1        ρ_H      ρ_Vρ_H   ρ_V      ρ_Vρ_H  ]
    x₃ [  ρ_H²     ρ_H      1        ρ_Vρ_H²  ρ_Vρ_H   ρ_V     ]             (3.19)
    x₄ [  ρ_V      ρ_Vρ_H   ρ_Vρ_H²  1        ρ_H      ρ_H²    ]
    x₅ [  ρ_Vρ_H   ρ_V      ρ_Vρ_H   ρ_H      1        ρ_H     ]
    x₆ [  ρ_Vρ_H²  ρ_Vρ_H   ρ_V      ρ_H²     ρ_H      1       ]

This equation can also be rewritten in block form as

    C_xx = [ C_H       ρ_V C_H ]
           [ ρ_V C_H   C_H     ]                                             (3.20)

where C_H is the 3 × 3 matrix [1 ρ_H ρ_H² ; ρ_H 1 ρ_H ; ρ_H² ρ_H 1]. Once again, we can simplify (3.20) further by using the Kronecker product and represent C_xx as

    C_xx = C_V ⊗ C_H,   with C_V = [ 1 ρ_V ; ρ_V 1 ].                        (3.21)

As shown in (3.21), the auto-covariance matrix separates into vertical and horizontal components C_V and C_H, so the computational complexity of the KLT is greatly reduced.

Since the KLT is related to and dependent on the image data to be transformed, an individual KLT must be computed for each image, which increases the computational complexity; moreover, the KLT matrix used by the encoder must be sent to the decoder, which increases the overall decoding time. These problems can be solved by generalizing the KLT [B1].
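As an illustration of how the matrix V can be found in practice, the sketch below estimates a KLT basis from a set of training blocks by eigendecomposition of the sample covariance, under the zero-mean simplification of (3.6). It is a sketch of the principle, not of any standardized encoder, and the names are ours.

```python
import numpy as np

def klt_basis(blocks):
    """Estimate the KLT basis from an M x N array of zero-mean blocks.

    Returns V whose columns are unit-norm eigenvectors of Cxx, ordered by
    decreasing eigenvalue, so that y = V.T @ x gives decorrelated
    coefficients (Cyy = V.T @ Cxx @ V is diagonal, as in (3.12)).
    """
    M = blocks.shape[0]
    Cxx = blocks.T @ blocks / M          # sample version of (3.10)
    eigvals, V = np.linalg.eigh(Cxx)     # Cxx is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]    # largest variance first
    return V[:, order]
```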
We now discuss the extreme situation of the KLT, which leads to the discrete cosine transform (DCT). Based on the concept of the KLT, the DCT, a generic transform that does not need to be recomputed for each image, can be derived. Based on the derivation in (3.18)-(3.21), the auto-covariance matrix of any transform block of size N can be modeled as

    C_xx^MODEL = [ 1        ρ        ⋯   ρ^{N−1} ]
                 [ ρ        1        ⋯   ρ^{N−2} ]
                 [ ⋮                 ⋱   ⋮       ]                           (3.22)
                 [ ρ^{N−1}  ρ^{N−2}  ⋯   1       ]

which is a Toeplitz matrix. This means that the KLT is decided by one particular parameter ρ, but ρ is still data dependent; for natural images it typically lies between about 0.90 and 0.98. By letting ρ → 1 we obtain the extreme condition of the KLT. At this extreme the KLT is no longer optimal for a particular image, but fortunately it can still effectively remove the dependency between pixels. After a detailed derivation we get

    v_ij = sqrt(1/N) cos[ (2i−1)(j−1)π / (2N) ],   j = 1
    v_ij = sqrt(2/N) cos[ (2i−1)(j−1)π / (2N) ],   j ≠ 1                     (3.23)

where i = 1, 2, …, N and j = 1, 2, …, N; note that N is the dimension of the selected block. This equation constitutes the basis set of the DCT, and the kernel of the 2-D DCT is expressed as

    F(u,v) = (2C(u)C(v)/N) Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i,j) cos[(2i+1)uπ/(2N)] cos[(2j+1)vπ/(2N)]   (3.24)

where 0 ≤ u, v ≤ N−1 and

    C(n) = 1/√2  (n = 0);   C(n) = 1  (n ≠ 0).                               (3.25)

Because the basis images of the DCT are fixed and input independent, the computational complexity of the DCT is much lower than that of the KLT. Although the implementation of sinusoidal transforms such as the DCT is not as simple as that of the nonsinusoidal transforms, the DCT does possess a comparable information packing ability. On the whole, the DCT provides a good compromise between information packing ability and computational complexity.

4. Quantization
In the previous section, we introduced methods that reduce the correlation of the image. These methods are all lossless, because they merely represent the original image in another form that is more convenient to process; they do not by themselves reduce the quantity of image data. In quantization, we reduce the number of distinct output values to a much smaller set, so that the objective of compression can be achieved.

Quantizers can be classified into uniform and non-uniform quantizers. A uniform quantizer divides the input domain into equally spaced intervals, and the output (reconstruction) value is taken as the midpoint of the corresponding interval. One simple uniform quantizer is

    Q(x) = ⌊ x/Δ + 0.5 ⌋                                                     (4.1)

where Δ is the quantization step size. If the quantization step size is 1, the quantization output function is as shown in Fig. 4.1; the number of elements in the output domain is finite. On the other hand, it is easy to see that the larger the quantization step size, the larger the loss of precision.

Fig. 4.1 The uniform quantizer (Δ = 1).

In non-uniform quantization, we determine the quantization step according to the probability of the elements in the input domain. For example, if the probability of small values is high, we assign a small quantization step to them; if the probability of large values is low, we assign a large quantization step to them. Fig. 4.2 shows one example of a non-uniform quantizer with these characteristics. As can be seen from Fig. 4.2, the quantization step for the small values, which occur with high probability, is smaller, so the overall quantization error is reduced efficiently.

Fig. 4.2 The non-uniform quantizer.
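Returning to (4.1), the uniform quantizer and its midpoint reconstruction can be sketched in a few lines of NumPy (the names are ours):

```python
import numpy as np

def uniform_quantize(x, delta):
    """Quantizer index of (4.1): Q(x) = floor(x / delta + 0.5)."""
    return np.floor(np.asarray(x, dtype=np.float64) / delta + 0.5)

def uniform_dequantize(q, delta):
    """Reconstruct at the midpoint of the selected interval."""
    return q * delta

x = np.array([-1.7, -0.3, 0.4, 2.6])
q = uniform_quantize(x, 1.0)             # [-2., 0., 0., 3.]
print(uniform_dequantize(q, 1.0))        # reconstruction error grows with delta
```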
5. Huffman Coding
Huffman coding is a variable-length coding algorithm. This data compression algorithm has been widely adopted in many image and video compression standards such as JPEG and MPEG, and it provably yields the shortest average codeword length per symbol among symbol-by-symbol prefix codes. The Huffman coding algorithm is as follows. Assume the input alphabet is composed of symbols S_n with corresponding probabilities P_n; then perform the following procedure:

Step 1: Order the symbols according to their probabilities:
(A) Alphabet set: S₁, S₂, …, S_N
(B) Probabilities: P₁, P₂, …, P_N
The symbols are arranged such that P₁ ≥ P₂ ≥ … ≥ P_N.
Step 2: Replace the last two symbols S_N and S_{N−1} with a new symbol H_{N−1} that has probability P_N + P_{N−1}. The new set of symbols has N−1 members: S₁, S₂, …, S_{N−2}, H_{N−1}.
Step 3: Repeat Step 2 until the final set has only one element.
Step 4: The codeword for each symbol S_i is obtained by traversing the binary tree from its root to the leaf node corresponding to S_i.

Fig. 5.1 shows one simple example of Huffman coding. First, we order the symbols by probability. Second, we combine the two least probable symbols, a₃ and a₅, to form a new symbol whose probability is the sum of the combined probabilities. The same procedure is repeated until the final probability becomes 1.0. After that, we assign the corresponding codewords to each symbol backward, and the codeword for each symbol is obtained.

    Symbol  Probability  Pass 1  Pass 2  Pass 3  Pass 4  Code
    a₂      0.4          0.4     0.4     0.4     0.6     1
    a₆      0.3          0.3     0.3     0.3     0.4     00
    a₁      0.1          0.1     0.2     0.3             011
    a₄      0.1          0.1     0.1                     0100
    a₃      0.06         0.1                             01010
    a₅      0.04                                         01011

Fig. 5.1 One Huffman coding example
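The procedure of Steps 1-4 can be sketched with a priority queue, as below. Tie-breaking between equal probabilities is arbitrary, so the individual bits may differ from Fig. 5.1, but the codeword lengths (and hence the average length) agree. The names are ours.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman codewords for a {symbol: probability} dictionary."""
    order = count()                             # tie-breaker for equal probabilities
    heap = [(p, next(order), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                        # Steps 2-3: merge the two rarest
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(order), (left, right)))
    codes = {}
    def walk(node, prefix):                     # Step 4: root-to-leaf traversal
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"         # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a1": 0.1, "a2": 0.4, "a3": 0.06,
                     "a4": 0.1, "a5": 0.04, "a6": 0.3}))
```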
6. The JPEG Standard
JPEG is the first international digital image compression standard for still images, and many of the later image and video compression standards are based on the basic structure of JPEG. The JPEG standard defines two basic compression methods: the DCT-based method, which is lossy, and the predictive method, which is lossless. The DCT-based method is known as baseline JPEG, which is the most widely used image compression format on the web and in digital cameras; here we focus on the DCT-based baseline method.

6.1 The JPEG Encoder
Fig. 6.1 shows the DCT-based JPEG encoder. The YCbCr color transform and the chrominance downsampling are not defined in the JPEG standard, but most JPEG compression software performs these steps because they allow the JPEG encoder to reduce the data quantity more efficiently. The downsampled YCbCr data passing through the JPEG encoder are then encoded into the output bitstream.

Fig. 6.1 The DCT-based encoder of the JPEG compression standard: the RGB image is color transformed to YCbCr and the chrominance is downsampled (4:2:2 or 4:2:0); each 8 x 8 block is transformed by the forward DCT and quantized using a quantization table; the DC coefficients are differentially coded and the AC coefficients are zigzag scanned and run-length coded; both streams are Huffman encoded into the bit-stream.

6.2 Differential Coding of the DC Coefficients
Notably, we treat the DC coefficients separately rather than handling the AC and DC coefficients in the same way, because a large correlation still exists between the DC coefficients of neighboring blocks. Assume the DC coefficient of the i-th block is DC_i and the DC coefficient of the previous block is DC_{i−1}; one example is shown in Fig. 6.2.

Fig. 6.2 Differential coding of the DC coefficients in neighboring blocks: Diff_i = DC_i − DC_{i−1}.

We perform differential coding on the two DC coefficients as follows:

    Diff_i = DC_i − DC_{i−1}                                                 (6.1)

We set DC₀ = 0, so the DC of the current block is recovered as DC_i = DC_{i−1} + Diff_i. Therefore, in the JPEG file, the first coefficient of each block is actually the difference of DCs, and this difference is encoded together with the AC coefficients.

6.3 Zigzag Scanning of the AC Coefficients
In the entropy encoder, we encode the quantized and differentially coded coefficients into the bitstream. However, these coefficients form a 2-D signal, a structure that is not suitable for the entropy encoder. Thus, the JPEG standard performs zigzag scanning on the AC coefficients to reorder them into a 1-D sequence running from low frequency to high frequency. The zigzag scan order is illustrated in Fig. 6.3. It is worth noting that zigzag scanning is performed on the 63 AC coefficients only, excluding the DC coefficient.

Fig. 6.3 Zigzag scanning order

6.4 Run-Length Coding of the AC Coefficients
Because we assign larger quantization step sizes to the high-frequency components based on the HVS theory, most of the high-frequency components are truncated to zero when coarse (strong) quantization is applied. To take advantage of this property, the JPEG standard defines run-length coding (RLC) to convert the AC coefficients into a format more efficient for the entropy encoder. The RLC step replaces the quantized values by pairs

    (RUNLENGTH, VALUE)                                                       (6.2)

where RUNLENGTH is the number of zeros preceding a nonzero coefficient, and VALUE is that nonzero coefficient. For example, assume the quantized 63 AC coefficients are

    57, 45, 0, 0, 0, 0, 23, 0, 30, 16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, …, 0    (6.3)

Then performing RLC on this vector yields

    (0,57) (0,45) (4,23) (1,30) (0,16) (2,1) EOB                             (6.4)

where EOB (end of block) means that all the remaining coefficients are zero.
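A sketch of this run-length step is shown below (the names are ours, and special cases of the real standard, such as the ZRL code for zero runs longer than 15, are omitted):

```python
def run_length_encode(ac):
    """(RUNLENGTH, VALUE) pairs of (6.2) for zigzag-ordered AC coefficients."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1                 # extend the current zero run
        else:
            pairs.append((run, v))   # a nonzero value ends the run
            run = 0
    pairs.append("EOB")              # trailing zeros collapse into end-of-block
    return pairs

ac = [57, 45, 0, 0, 0, 0, 23, 0, 30, 16, 0, 0, 1] + [0] * 50
print(run_length_encode(ac))
# [(0, 57), (0, 45), (4, 23), (1, 30), (0, 16), (2, 1), 'EOB'] -- matches (6.4)
```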
6.5 Entropy Coding
The DC coefficients and the run-length coded AC coefficients finally undergo entropy coding. Growing the optimal Huffman tree for each input image takes a large amount of computation, and such a dynamic Huffman tree would have to be included in the bitstream; therefore, JPEG defines two default Huffman tables, one for the DC and one for the AC coefficients. These Huffman coding tables save a large amount of computation when encoding images, and the Huffman tree need not be included in the encoded bitstream, because the default tables defined in the JPEG standard are fixed. The codeword table for the VALUE of the DC and AC coefficients is shown in Table 6.1.

    Values                            Bits for the value                        SIZE
    0                                 -                                         0
    −1, 1                             0, 1                                      1
    −3, −2, 2, 3                      00, 01, 10, 11                            2
    −7, …, −4, 4, …, 7                000, …, 011, 100, …, 111                  3
    −15, …, −8, 8, …, 15              0000, …, 0111, 1000, …, 1111              4
    −31, …, −16, 16, …, 31            00000, …, 01111, 10000, …, 11111          5
    −63, …, −32, 32, …, 63            000000, …, 011111, 100000, …, 111111      6
    −127, …, −64, 64, …, 127          0000000, …, 0111111, 1000000, …, 1111111  7
    −255, …, −128, 128, …, 255        …                                         8
    −511, …, −256, 256, …, 511        …                                         9
    −1023, …, −512, 512, …, 1023      …                                         10
    −2047, …, −1024, 1024, …, 2047    …                                         11

Table 6.1 The codeword table for the VALUE of the coefficients

Each differentially coded DC coefficient is encoded as

    (SIZE, VALUE)                                                            (6.5)

where SIZE indicates how many bits are needed to represent the VALUE of the coefficient, and VALUE is the actual binary encoding of the DC coefficient. SIZE is Huffman coded while VALUE is not. For example, if we have the four DC coefficients 150, 5, −6, 3, then according to Table 6.1 they are turned into

    (8, 10010110) (3, 101) (3, 001) (2, 11)                                  (6.6)

After the SIZE values are obtained, we encode them with the Huffman table defined in Table 6.2, so all the DC coefficients are finally encoded as

    (111110, 10010110) (100, 101) (100, 001) (011, 11)                       (6.7)

    SIZE  Code       Length
    0     00         2
    1     010        3
    2     011        3
    3     100        3
    4     101        3
    5     110        3
    6     1110       4
    7     11110      5
    8     111110     6
    9     1111110    7
    10    11111110   8
    11    111111110  9

Table 6.2 The Huffman table for the SIZE information of the luminance DC coefficients

The AC coefficients are encoded with a similar Huffman table, shown in Table 6.3. The codeword format of the AC coefficients is defined as

    (RUNLENGTH, SIZE, VALUE)                                                 (6.8)

    Run/Size     Code length  Code word
    0/0 (EOB)    4            1010
    15/0 (ZRL)   11           11111111001
    0/1          2            00
    …            …            …
    0/6          7            1111000
    …            …            …
    0/10         16           1111111110000011
    1/1          4            1100
    1/2          5            11011
    …            …            …
    1/10         16           1111111110001000
    2/1          5            11100
    …            …            …
    4/5          16           1111111110011000
    …            …            …
    15/10        16           1111111111111110

Table 6.3 The Run/Size Huffman table for the luminance AC coefficients
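The (SIZE, VALUE) mapping of Table 6.1 is easy to compute: SIZE is the bit length of the coefficient's magnitude, positive values are written in natural binary, and negative values use the bit-inverted pattern visible in the table. A small sketch (the names are ours):

```python
def size_and_value(coeff):
    """(SIZE, VALUE) of (6.5) following the pattern of Table 6.1."""
    if coeff == 0:
        return 0, ""
    size = abs(coeff).bit_length()
    if coeff > 0:
        value = format(coeff, "0{}b".format(size))
    else:
        # negative amplitude: binary of (2^SIZE - 1 + coeff), i.e. inverted bits
        value = format((1 << size) - 1 + coeff, "0{}b".format(size))
    return size, value

for c in (150, 5, -6, 3):
    print(size_and_value(c))
# (8, '10010110') (3, '101') (3, '001') (2, '11') -- matches (6.6)
```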
7. JPEG 2000
The JPEG 2000 compression standard is a wavelet-based standard which employs the wavelet transform as its core transform algorithm. The standard was developed to improve on the shortcomings of baseline JPEG: instead of the low-complexity and memory-efficient block DCT used in the JPEG standard, JPEG 2000 adopts the discrete wavelet transform (DWT), which is based on a multi-resolution image representation. The main purpose of multiresolution analysis is to obtain approximations of a function f(x) at different levels of resolution, and both the mother wavelet ψ(x) and the scaling function φ(x) are important functions to be considered in this analysis. The DWT also improves compression efficiency due to its good energy compaction and its ability to decorrelate the image across a larger scale.

7.1 Fundamental Building Blocks
The fundamental building blocks of JPEG 2000 are illustrated in Fig. 7.1. As in JPEG, a core transform coding algorithm is first applied to the input image data; after quantization and entropy coding of the transformed coefficients, the output codestream, or so-called bitstream, is formed. Each of the blocks is described in more detail below.

Before the image data are fed into the transform block, the source image is decomposed into three color components, in either RGB or YCbCr format. The input image and its components are then further decomposed by a tiling process, which particularly benefits applications that have a limited amount of available memory compared to the image size. The tile size can be arbitrarily defined, and it can be as large as the original image. Next, the DWT is applied to each tile so that the tile is decomposed into different resolution levels. The output subband coefficients are quantized and collected into rectangular arrays of "code-blocks" before they are coded by the context-dependent binary arithmetic coder.

Fig. 7.1 JPEG 2000 fundamental building blocks: input image data -> discrete wavelet transform (DWT) -> uniform quantization -> adaptive binary arithmetic coder -> compressed bit-stream organization.

The quantization process employed in JPEG 2000 is similar to that of JPEG in that each coefficient can have a different step size; the difference is that JPEG 2000 incorporates a central deadzone in the quantizer. A quantizer step size Δ_b is determined for each subband b based on the perceptual importance of that subband. Each wavelet coefficient y_b(u,v) in subband b is then mapped to a quantized index value q_b(u,v) by the quantizer, which is efficiently performed according to

    q_b(u,v) = sign( y_b(u,v) ) ⌊ |y_b(u,v)| / Δ_b ⌋                          (7.1)

where the quantization step size is defined as

    Δ_b = 2^{R_b − ε_b} (1 + μ_b / 2¹¹).                                      (7.2)

Note that the quantization step size is represented using a total of two bytes: an 11-bit mantissa μ_b and a 5-bit exponent ε_b. The dynamic range R_b depends on the number of bits used to represent the original image tile component and on the choice of the wavelet transform. For reversible operation, the quantization step size is required to be 1 (i.e., μ_b = 0 and ε_b = R_b).

7.2 Wavelet Transform
A two-dimensional scaling function φ(x,y) and three two-dimensional wavelets ψᴴ(x,y), ψⱽ(x,y), and ψᴰ(x,y) are the critical elements of the wavelet transform in two dimensions. These scaling and directional wavelet functions are separable products of a one-dimensional scaling function and the corresponding wavelet:

    φ(x,y)  = φ(x) φ(y)                                                       (7.3)
    ψᴴ(x,y) = ψ(x) φ(y)                                                       (7.4)
    ψⱽ(x,y) = φ(x) ψ(y)                                                       (7.5)
    ψᴰ(x,y) = ψ(x) ψ(y)                                                       (7.6)

where ψᴴ measures the horizontal variations (horizontal edges), ψⱽ corresponds to the vertical variations (vertical edges), and ψᴰ detects variations along the diagonal directions.

The two-dimensional DWT can be implemented using digital filters and downsamplers. The block diagram in Fig. 7.2 shows the process of taking the one-dimensional FWT of the rows of f(x,y) followed by the one-dimensional FWT of the resulting columns; three sets of detail coefficients, the horizontal, vertical, and diagonal details, are produced.

By iterating the single-scale filter bank process, a multi-scale filter bank can be generated. This is achieved by tying the approximation output to the input of another filter bank to produce a transform of arbitrary scale. For the two-dimensional case, an image f(x,y) is used as the first-scale input, and the resulting outputs are four quarter-size subimages, W_φ, W_ψᴴ, W_ψⱽ, and W_ψᴰ, shown in the center quad-image of Fig. 7.3. Two iterations of the filtering process produce the two-scale decomposition at the right of Fig. 7.3. Fig. 7.4 shows the synthesis filter bank, which is exactly the reverse of the forward decomposition process.

Fig. 7.2 The analysis filter bank of the two-dimensional FWT.

Fig. 7.3 A two-level decomposition of the two-dimensional FWT.

Fig. 7.4 The synthesis filter bank of the two-dimensional FWT.
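JPEG 2000 itself uses the 5/3 and 9/7 wavelet filter pairs, but the separable row/column structure of Fig. 7.2 is easiest to see with the two-tap Haar filters. The sketch below (our names, even-sized input assumed) performs one analysis level and then applies the deadzone quantizer of (7.1):

```python
import numpy as np

def haar_dwt2(f):
    """One level of the separable 2-D DWT of Fig. 7.2 with Haar filters.

    Returns the approximation and the horizontal, vertical, and diagonal
    detail subbands of an even-sized image.
    """
    f = f.astype(np.float64)
    lo = (f[:, 0::2] + f[:, 1::2]) / np.sqrt(2)     # filter the rows, downsample
    hi = (f[:, 0::2] - f[:, 1::2]) / np.sqrt(2)
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)   # then filter the columns
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)   # horizontal detail
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)   # vertical detail
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)   # diagonal detail
    return LL, LH, HL, HH

def deadzone_quantize(y, delta):
    """Deadzone quantizer of (7.1): q = sign(y) * floor(|y| / delta)."""
    return np.sign(y) * np.floor(np.abs(y) / delta)
```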
8. Shape-Adaptive Image Compression
Both the JPEG and the JPEG 2000 image compression standards can achieve high compression ratios; however, both have the following disadvantages when applied to image compression:
1. They cause severe blocking artifacts when the image is compressed at a low bit rate.
2. The image is compressed without considering the characteristics of individual image segments.
Therefore, for many image and video coding algorithms, separating the content into segments or objects has become a key technique for improving image/video quality in very low bit rate applications. Moreover, the functionality of making visual objects available in compressed form has become an important part of next-generation visual coding standards such as MPEG-4, in which users can create arbitrarily shaped objects and manipulate or composite these objects manually. For this reason, Huang proposed Shape-Adaptive Image Compression, abbreviated SAIC. Instead of taking the whole image as one object and encoding it with transform coding, quantization, and entropy coding, the SAIC algorithm segments the whole image into several objects, each with its own local characteristics and color. Because of the high correlation of the color values within each image segment, SAIC can achieve a better compression ratio and quality than conventional image compression algorithms.

There are two aspects to coding an arbitrarily shaped visual object: the first is to code the shape of the object, and the second is to code its texture (the pixels contained inside the object region). In this section we concentrate on the texture coding of still objects.

The flowchart of shape-adaptive image compression is shown in Fig. 8.1, and it comprises three parts. First, the input image is segmented into a boundary part and an internal texture part: the boundary part contains the boundary information of the object, while the internal texture part contains the internal contents of the object, such as the color values. Second, the two parts of the object are transformed and encoded separately. Finally, the two separately encoded data streams are combined into the full bit-stream, which completes the shape-adaptive image compression process. The theory and operation of each part are introduced in the following sections.

Fig. 8.1 The block diagram of shape-adaptive image compression: the segmented boundary is encoded via a boundary descriptor, and the internal texture via arbitrary-shape transform coding, each followed by quantization and entropy coding; the two encoded streams are combined into the bit-stream.
8.1 Shape-Adaptive Transform Coding
After the internal texture of an image segment is obtained, we can take the DCT of the internal texture data. Because a great deal of research effort has been devoted to coding rectangular images and video, for example with DCT coding and wavelet transform coding, the simplest approach is to pad values into the pixel positions outside the image segment until the bounding box is filled; a block-based transform can then be applied to the rectangular region directly. Although many padding algorithms have been proposed to improve the coding efficiency, this method still has problems. Padding values inside the bounding box increases the high-order transform coefficients, which are later truncated; in other words, the number of coefficients in the transform domain is not identical to the number of pixels in the image domain, and this leads to serious degradation of the compression performance. Furthermore, the padding process produces redundancy that has to be removed. Fig. 8.2 shows an example of padding zeros into the pixel positions outside the image segment; the nonzero region represents the image segment, and the padded block can then be passed to a posterior block-based transform such as the DCT or the wavelet transform.

    [   0    0  105   98    0    0   75   96 ]
    [   0    0   99  101   73   85   66   60 ]
    [   0  100   97   89   94   87   64   55 ]
    [   0    0   84   94   90   81   71   66 ]
    [   0    0   93   86   94   81   70    0 ]
    [   0    0    0   86   86   81   72    0 ]
    [   0    0    0   98   97   78    0    0 ]
    [   0    0    0  105  104    0    0    0 ]

Fig. 8.2 Padding zeros into the pixel positions outside the image segment

Another method commonly used to transform the coefficients of a shape-adaptive image segment is to use arbitrarily shaped orthogonal DCT bases. In practice, the height and width of an arbitrarily shaped image segment are usually not equal, so we redefine the DCT as in (8.1) and (8.2).

Forward shape-adaptive DCT:

    F(k) = Σ_{x=0}^{W−1} Σ_{y=0}^{H−1} f(x,y) φ′_{x,y}(k),   k = 1, 2, …, M      (8.1)

Inverse shape-adaptive DCT:

    f(x,y) = (2/(HW)) Σ_{k=1}^{M} F(k) φ′_{x,y}(k),   x = 0, …, W−1, y = 0, …, H−1   (8.2)

where W and H are the width and height of the image segment, respectively, and M is the length of the DCT coefficient vector. F(k) is called the shape-adaptive DCT coefficient, and the DCT basis, with each index k corresponding to one frequency pair (u, v), is

    φ′_{x,y}(u,v) = (2C(u)C(v)/√(HW)) cos[(2x+1)uπ/(2W)] cos[(2y+1)vπ/(2H)]      (8.3)

with C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise.

Because the number of points M is less than or equal to HW, the HW bases restricted to the segment may not be orthogonal. We can obtain M orthogonal bases by means of the Gram-Schmidt process; unfortunately, its complexity is on the order of O(n²), so it is still not the best way to transform the coefficients in an SAIC system. Moreover, the coding time becomes extremely long when the number of points in an image segment is large, so the transform with arbitrarily shaped DCT bases is not practical for video streaming or other real-time applications.

Makai proposed a shape-adaptive DCT (SA-DCT) algorithm based on predefined orthogonal sets of DCT basis functions. The SA-DCT has transform efficiency similar to that of the arbitrarily-shaped-basis DCT, but with a much lower computational complexity, especially when the number of points in an image segment is large. Moreover, compared with the traditional shape-adaptive transform based on padding, the number of transformed coefficients of the SA-DCT is identical to the number of points in the image segment, and this object-based coding of images is backward compatible with existing block-based coding standards.

8.2 Shape-Adaptive DCT Algorithm

Fig. 8.3 Successive steps of the SA-DCT performed on an arbitrarily shaped image segment: (a) the segmented 8 x 8 block; (b) foreground columns shifted to the upper border and transformed with DCT-N matrices of matching length; (c) the column DC values concentrated on the upper border; (d)-(e) rows shifted to the left border and transformed; (f) the final layout of the SA-DCT coefficients, with the DC coefficient at the upper left.

As mentioned, the shape-adaptive DCT algorithm is based on predefined orthogonal sets of DCT basis functions. The algorithm flushes and transforms the arbitrarily shaped image segment in the horizontal and vertical directions separately; the overall process is illustrated in Fig. 8.3. The image block in Fig. 8.3(a) is segmented into two regions, foreground (dark) and background (light). The shape-adaptive DCT consists of two parts: (i) the vertical SA-DCT transform and (ii) the horizontal SA-DCT transform. To perform the vertical SA-DCT of the foreground, the length N of each column j of the foreground segment is first calculated, and the columns are shifted and aligned to the upper border of the 8-by-8 reference block (Fig. 8.3(b)). The variable-length (N-point) 1-D DCT transform matrix DCT-N,

    DCT-N(p,k) = c₀ cos[ p (k + 1/2) π / N ],   k, p = 0, …, N−1              (8.4)

contains the set of N DCT-N basis vectors, where c₀ = 1/√2 if p = 0 and c₀ = 1 otherwise, and p denotes the p-th DCT basis vector. The N vertical DCT coefficients c_j of column x_j can then be calculated according to the formula

    c_j = (2/N) DCT-N · x_j                                                   (8.5)
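A direct sketch of (8.4)-(8.5) and of the inverse given as (8.6) below (the names are ours):

```python
import numpy as np

def dct_n(N):
    """The DCT-N matrix of (8.4): DCT-N[p, k] = c0 * cos(p * (k + 1/2) * pi / N),
    with c0 = 1/sqrt(2) for p = 0 and c0 = 1 otherwise."""
    p, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    c0 = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    return c0 * np.cos(p * (2 * k + 1) * np.pi / (2 * N))

def sa_dct_column(x):
    """Vertical SA-DCT of one length-N column per (8.5): c_j = (2/N) DCT-N x_j."""
    N = len(x)
    return (2.0 / N) * dct_n(N) @ np.asarray(x, dtype=np.float64)

def sa_idct_column(c):
    """Inverse per (8.6): x_j = DCT-N^t c_j (exact, since DCT-N DCT-N^t = (N/2) I)."""
    return dct_n(len(c)).T @ np.asarray(c, dtype=np.float64)
```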
In Fig. 8.3(b), the foreground columns have been shifted to the upper border, and each column is transformed using the DCT-N basis vectors of its own length. As a result, the DC values of the segment columns are concentrated along the upper border of the reference block, as shown in Fig. 8.3(c). Then, as depicted in Fig. 8.3(e), before the horizontal DCT transform, the length of each row is calculated and the rows are shifted to the left border of the reference block. A horizontal DCT adapted to the size of each row is then applied using (8.4) and (8.5); note that the horizontal SA-DCT transform is performed along the vertical SA-DCT coefficients with the same index. Finally, the layout of the resulting DCT coefficients is obtained as shown in Fig. 8.3(f). The DC coefficient is located at the upper-left corner of the reference block. In addition, we can see that the number of SA-DCT coefficients equals the number of pixels inside the image segment, and the resulting coefficients are concentrated around the DC coefficient in accordance with the shape of the image segment. The overall processing steps are as follows:

Step 1. The foreground pixels are shifted and aligned to the upper border of the 8-by-8 reference block.
Step 2. A variable-length 1-D DCT is performed in the vertical direction according to the length of each column.
Step 3. The resulting 1-D DCT coefficients of Step 2 are shifted and aligned to the left border of the 8-by-8 reference block.
Step 4. A variable-length 1-D DCT is performed in the horizontal direction according to the length of each row.

As mentioned, the SA-DCT is backward compatible: after performing the SA-DCT, the coefficients can be quantized and encoded by the same methods used for the 8-by-8 2-D DCT. Because the contour of the segment is transmitted to the receiver first, the decoder can perform the inverse shape-adaptive DCT as the reverse operation,

    x_j = DCT-Nᵗ · c_j                                                        (8.6)

in both the horizontal and vertical directions.

8.3 Quantization of the Arbitrarily Shaped DCT Coefficients
Unlike the conventional DCT and the padding-based compression techniques, the number of arbitrarily shaped DCT coefficients depends on the area of the arbitrarily shaped image segment, so the block-based quantization used in JPEG is not applicable here. In JPEG, the 8 x 8 DCT coefficients are divided by a quantization matrix that contains the corresponding quantization step size for each DCT coefficient, and the padding-based techniques quantize their coefficients in the same way. Instead of a block-based quantizer, we define an extendable, segment-shape-dependent quantization array Q(k) as a linearly increasing function:

    Q(k) = Q_a k + Q_c,   k = 1, 2, …, M                                      (8.7)

The two parameters Q_a and Q_c denote the slope and the intercept of the line, respectively, and M is the number of DCT coefficients within the arbitrarily shaped segment. The resulting arbitrarily shaped DCT coefficients are individually divided by the corresponding quantization value Q(k) and rounded to the nearest integer:

    F_q(k) = Round( F(k) / Q(k) ),   k = 1, 2, …, M                           (8.8)

Q_c affects the quantization of the DC term, while the slope Q_a affects the rest of the coefficients. Because the high-frequency components of image segments are not visually significant, they are usually discarded in compression; by choosing Q_a appropriately, we can reduce the high-frequency components without a visible impact on human perception.
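A sketch of this quantization rule (the names are ours):

```python
import numpy as np

def quantize_sa_dct(F, Qa=0.06, Qc=8.0):
    """Quantize M shape-adaptive DCT coefficients with (8.7)-(8.8):
    Q(k) = Qa * k + Qc and Fq(k) = round(F(k) / Q(k)), k = 1..M."""
    k = np.arange(1, len(F) + 1)
    Q = Qa * k + Qc                  # linearly increasing step size
    return np.round(np.asarray(F, dtype=np.float64) / Q).astype(int)
```

With the parameters used in the simulation below (Q_a = 0.06, Q_c = 8), the step size grows with the coefficient index, so the high-frequency coefficients are quantized coarsely and mostly become zero.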
In our simulation, we quantize the DCT coefficients of the six image segments in Fig. 8.3 using the quantization array Q(k) shown in Fig. 8.4, with the parameters Q_a and Q_c set to 0.06 and 8, respectively. The quantization of the DCT coefficients is carried out according to (8.8), and the result is depicted in Fig. 8.5: most of the high-frequency components are reduced to zero, which benefits the subsequent coding stage of the image compression process.

Fig. 8.4 Quantization array Q(k) with Q_a = 0.06 and Q_c = 8.

Fig. 8.5 The quantized DCT coefficients of the six image segments, (a)~(f).

8.4 Shape-Adaptive Entropy Coding
Before introducing the theory of shape-adaptive entropy coding, one problem must be solved. Because the variation of the color around the boundary is large, truncating the corresponding high-frequency components may corrupt the original image with unnecessary noise. In Ref. [4], the author proposes a method to solve this problem: since most of the variation is around the boundary, we divide the internal texture segment into an internal part and a boundary part.

The division of the image segment makes use of morphological erosion. For two binary image sets A and B, the erosion of A by B, denoted A ⊖ B, is defined as

    A ⊖ B = { z | (B)_z ⊆ A }                                                 (8.9)

where the translation of the set B by the point z = (z₁, z₂), denoted (B)_z, is defined as

    (B)_z = { c | c = b + z, for b ∈ B }                                      (8.10)

That is, the erosion of A by B is the set of all points z such that B, translated by z, is contained in A. Based on this definition, we erode the shape of each image segment by a 5 x 5 disk-shaped structuring element to obtain the shape of the internal region, and we subtract it from the original shape to obtain the shape of the boundary region. The process is illustrated in Fig. 8.6.

Fig. 8.6 Dividing the image segment by morphological erosion: the segmented and filled shapes are eroded by the disk B to give the internal region, which is subtracted from the original shape to give the boundary region.

After transforming and quantizing the internal texture, we encode the quantized coefficients and combine the encoded internal texture coefficients with the encoded boundary bit-stream. The full encoding process is shown in Fig. 8.7.

Fig. 8.7 The process of the shape-adaptive coding method: the SA-DCT coefficients of each image segment are quantized and encoded, terminated by EOB codes, and combined with the encoded boundary bit-stream into the bit-stream of the image segments.
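A sketch of the internal/boundary split of Fig. 8.6, assuming SciPy is available for the erosion (the names are ours):

```python
import numpy as np
from scipy.ndimage import binary_erosion

def split_segment(shape_mask, radius=2):
    """Split a binary segment mask into internal and boundary regions.

    Implements the erosion of (8.9) with a 5 x 5 disk-shaped structuring
    element (radius 2), then takes boundary = A minus (A eroded by B).
    """
    mask = np.asarray(shape_mask, dtype=bool)
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = x * x + y * y <= radius * radius
    internal = binary_erosion(mask, structure=disk)
    boundary = mask & ~internal
    return internal, boundary
```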
Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," Transactions on Pattern Analysis and Machine Intelligence. [6] M.J. Slattery, J.L. Mitchell, “The Qx-coder,” IBM J. Res. Development. [7] Jian-Jiun Ding, Pao-Yen, Lin, Jiun-De Huang, Tzu-Heng Lee, Hsin-Hui Chen, “Morphology-Based Shape Adaptive Compression” 29 30