A Comparative Study of DCT, LOT, and DWT-based Image Coders

by

Warit Wichakool

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2001

© Warit Wichakool, MMI. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author: Department of Electrical Engineering and Computer Science, May 23, 2001
Certified by: K.S. Thyagarajan, Principal Engineer, VI-A Company Thesis Supervisor
Certified by: David H. Staelin, Professor of Electrical Engineering, M.I.T. Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

Ten transform coding systems were compared in terms of their system complexity and peak signal-to-noise ratios (PSNR). The results showed that the relationship between PSNR and bit rate was affected by the combination of the quantizer and the encoder, but was not affected by the type of transform. The performance of the embedded zerotree wavelet (EZW) system was affected by the number of discrete wavelet transform (DWT) levels.
At a given bit rate, increasing the number of levels from two to three improved the PSNR of the EZW system by up to 4 dB at low bit rates, and increasing it from three to four gained another 1 dB. Four of the ten systems were studied further: the discrete cosine transform (DCT) baseline JPEG, the lapped orthogonal transform (LOT) version of baseline JPEG, the visual threshold wavelet with the run-length Huffman coder, and the EZW with the adaptive Huffman coder. The PSNR values for the Lena image at 0.5 bit/pixel for the four systems were 34.56, 34.43, 34.97, and 34.52 dB, respectively. In comparison with the DCT JPEG, the LOT JPEG provided 0.5 dB better PSNR and also reduced the image blockiness, but it introduced small ringing artifacts in areas with sharp edges. The visual threshold wavelet yielded better PSNR than the DCT system at the same bit rate, but the reconstructed image suffered from blurriness. Finally, the EZW system performed comparably to the DCT system. Although the reconstructed image exhibited no blockiness, it clearly lost some details.

VI-A Company Thesis Supervisor: K.S. Thyagarajan
Title: Principal Engineer

M.I.T. Thesis Supervisor: David H. Staelin
Title: Professor of Electrical Engineering

Acknowledgments

I would like to take this opportunity to express my gratitude toward the many people who helped me over the course of my thesis. First of all, I would like to thank K.S. Thyagarajan for his advice and guidance. I would also like to thank Professor Staelin for his guidance and comments on my thesis, and Henrique Malvar for providing the programs and references of the LOT system for my simulation. I would also like to thank all members of Digital Cinema at QUALCOMM INCORPORATED for their support during my research at the company. In addition, I would like to thank Peter Agboh and Songpon Deechongkit for their comments on my research.
In addition, I have to thank Yui (Siraprapha Sanchatjate), my family, and all my friends for all the mental support and encouragement they have given me throughout the years at MIT.

Contents

1 Introduction
2 Background
  2.1 Transform
    2.1.1 Discrete Cosine Transform
    2.1.2 Lapped Orthogonal Transform
    2.1.3 Discrete Wavelet Transform
  2.2 Quantization
    2.2.1 Optimal Uniform Quantizer
    2.2.2 JPEG Uniform Quantizer
    2.2.3 Visual Threshold Wavelet Quantizer
    2.2.4 Embedded Zerotree Wavelet Quantizer
  2.3 Entropy Coding
    2.3.1 Huffman Coding
    2.3.2 Adaptive Huffman Coding
    2.3.3 Run-Length Huffman Coding
3 Simulation Methods
  3.1 Part I: System Complexity
  3.2 Part II: System Performance
    3.2.1 Part II-A: Effect of Number of Levels on DWT Systems
    3.2.2 Part II-B: Effect of Transform
    3.2.3 Part II-C: Effect of Quantizer
    3.2.4 Part II-D: Effect of Entropy Coder
    3.2.5 Part II-E: Overall System Performance
  3.3 Part III: Visual Quality
  3.4 Test Images
4 Results and Discussions
  4.1 Part I: System Complexity
    4.1.1 Transform
    4.1.2 Quantization
    4.1.3 Entropy Coding
  4.2 Part II: System Performance
    4.2.1 Part II-A: Effect of Number of Levels on DWT Systems
    4.2.2 Part II-B: Effect of Transform
    4.2.3 Part II-C: Effect of Quantizer
    4.2.4 Part II-D: Effect of Entropy Coder
    4.2.5 Part II-E: Overall System Performance
  4.3 Part III: Visual Quality
5 Summary
A General Thesis Release Letter and Classification Review Letter

List of Figures

2-1 Basic transform coding system
2-2 Block diagram for a separable 2-D transform
2-3 Flowgraph conventions
2-4 1-D DCT basis functions
2-5 Fast DCT for M=8
2-6 Fast IDCT for M=8
2-7 General structure of the LOT
2-8 LOT basis functions
2-9 Type-I, fast LOT for a block size of 16
2-10 Type-I, fast ILOT for a block size of 16
2-11 Type-I, fast LOT for the finite length signal
2-12 2-level, 2-band analysis and synthesis for the 1-D DWT
2-13 Organization of 3-level DWT coefficients
2-14 Locations of parent-descendants of the 3-level EZW
2-15 Zig-zag scan of the run-length Huffman coder
2-16 Flowgraph of the run-length Huffman coder
4-1 Effect of number of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder
4-2 Efficiency test for the adaptive Huffman coder
4-3 bfrag1: original image at 8 bpp
4-4 bfrag2: DCT + JPEG + Run-Length Huffman at 0.50 bpp
4-5 bfrag3: 3-level DWT + EZW + Adaptive Huffman at 0.50 bpp
4-6 bfrag4: 4-level DWT + EZW + Adaptive Huffman at 0.50 bpp
4-7 bfrag5: 3-level DWT + Visual Threshold + Run-Length Huffman at 0.50 bpp
4-8 bfrag6: LOT + JPEG + Run-Length Huffman at 0.50 bpp
4-9 lena1: original image at 8 bpp
4-10 lena2: DCT + JPEG + Run-Length Huffman at 0.50 bpp
4-11 lena3: 3-level DWT + EZW + Adaptive Huffman at 0.50 bpp
4-12 lena4: 4-level DWT + EZW + Adaptive Huffman at 0.50 bpp
4-13 lena5: LOT + JPEG + Run-Length Huffman at 0.50 bpp
4-14 lena6: 3-level DWT + Visual Threshold + Run-Length Huffman at 0.50 bpp

List of Tables

2.1 Computational costs of the fast DCT
2.2 Computational costs of the type-I, fast LOT
2.3 Coefficients of 9/7-tap Villasenor biorthogonal filters
2.4 Computational costs of the DWT
2.5 Basis function amplitudes for 9/7-tap biorthogonal filters
2.6 Quantization levels for 9/7-tap biorthogonal filters
3.1 List of tested transform coding systems
3.2 List of systems for comparing the effect of number of DWT levels on the PSNR performance
3.3 List of systems for comparing the effect of transforms on the PSNR performance using the optimal uniform quantizer, the Huffman coder, and the run-length Huffman coder
3.4 List of systems for comparing the effect of quantizers on the PSNR performance using the run-length Huffman coder
3.5 List of systems for comparing the effect of entropy coders on the PSNR performance using Huffman and run-length Huffman coders
3.6 List of selected systems for comparing the PSNR performance
3.7 List of test images
4.1 Computational costs of transform algorithms
4.2 Normalized costs of transforming a 512x512 image
4.3 List of systems for comparing the effect of DWT levels on the PSNR performance of the DWT systems
4.4 Effect of DWT levels on the PSNR performance of the DWT systems using the optimal uniform quantizer and the run-length Huffman coder
4.5 Effect of DWT levels on the PSNR performance of the DWT systems using the optimal uniform quantizer and the run-length Huffman coder (cont.)
4.6 Effect of DWT levels on the PSNR performance of the DWT systems using the visual threshold quantizer and the run-length Huffman coder
4.7 Effect of DWT levels on the PSNR performance of the DWT systems using the visual threshold quantizer and the run-length Huffman coder (cont.)
4.8 Effect of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder
4.9 Effect of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder (cont.)
4.10 List of systems for comparing the effect of transforms on the PSNR performance
4.11 Effect of transforms on the PSNR performance using the optimal uniform quantizer and the Huffman coder
4.12 Effect of transforms on the PSNR performance using the optimal uniform quantizer and the Huffman coder (cont.)
4.13 Effect of transforms on the PSNR performance using the optimal uniform quantizer and the run-length Huffman coder
4.14 Effect of transforms on the PSNR performance using the optimal uniform quantizer and the run-length Huffman coder (cont.)
4.15 List of systems for comparing the effect of quantizers on the PSNR performance using the run-length Huffman coder
4.16 Effect of quantizers on the PSNR performance of the DCT systems using the run-length Huffman coder
4.17 Effect of quantizers on the PSNR performance of the DCT systems using the run-length Huffman coder (cont.)
4.18 Effect of quantizers on the PSNR performance of the LOT systems using the run-length Huffman coder
4.19 Effect of quantizers on the PSNR performance of the LOT systems using the run-length Huffman coder (cont.)
4.20 Effect of quantizers on the PSNR performance of the DWT systems using the run-length Huffman coder
4.21 Effect of quantizers on the PSNR performance of the DWT systems using the run-length Huffman coder (cont.)
4.22 List of systems for comparing the effect of entropy coders on the PSNR performance using the optimal uniform quantizer
4.23 Effect of entropy coders on the PSNR performance of the DCT systems using the optimal uniform quantizer
4.24 Effect of entropy coders on the PSNR performance of the DCT systems using the optimal uniform quantizer (cont.)
4.25 Effect of entropy coders on the PSNR performance of the LOT systems using the optimal uniform quantizer
4.26 Effect of entropy coders on the PSNR performance of the LOT systems using the optimal uniform quantizer (cont.)
4.27 Effect of entropy coders on the PSNR performance of the DWT systems using the optimal uniform quantizer
4.28 Effect of entropy coders on the PSNR performance of the DWT systems using the optimal uniform quantizer (cont.)
4.29 List of selected systems for the comparison in PSNR performance
4.30 Comparison of the PSNR performance of DCT, LOT, and DWT systems
4.31 Comparison of the PSNR performance of DCT, LOT, and DWT systems (cont.)
4.32 Result images and the corresponding PSNRs

Chapter 1

Introduction

Modern media are saturated with graphics such as images and movies. Constraints on bandwidth and memory space create trade-offs between the size and quality of images. One solution is to compress the image using a transform coding system. The basic transform coding system consists of three blocks: a transform block, a quantization block, and an entropy coding block. At a given bit rate, different systems yield different image quality. In addition, the systems differ in algorithmic complexity. Within a system, the complexity and the image quality may be influenced mainly by a single functional block or by a combination of all three blocks, so it is hard to determine the best system to use.

One of the most popular compression systems uses the discrete cosine transform (DCT) system recommended by the Joint Photographic Experts Group (JPEG). The simplest JPEG system is the baseline JPEG, which combines the DCT with a uniform scalar quantizer and a run-length Huffman coder [1], [9]. The DCT system provides high quality images at reasonable bit rates, but exhibits blockiness at low bit rates. This artifact is caused by the independent processing of transform blocks and the discontinuity of the DCT basis functions [7], [14]. It can be reduced by increasing the bit rate or by using more complex systems. However, the gain in quality may not justify the additional complexity of the system.

Another solution to reduce the blocking effect is to change the transform block to the lapped orthogonal transform (LOT).
This transform solves the problem by overlapping and modifying the DCT basis functions to eliminate the independent transformation of image blocks and the discontinuity of the basis functions [7]. The LOT is similar to the DCT because it is a block transform. Furthermore, the LOT uses the DCT bases as building blocks for its new basis functions [7]. However, there is no standard quantizer and entropy coder for the LOT system. Since the LOT is similar to the DCT, the LOT may benefit from the JPEG-like quantizer and encoder in the same way as the DCT. Therefore, the blocking effect can be reduced by changing the transform block to the LOT and updating the parameters of the quantizer and the encoder accordingly. This solution may be the simplest way to improve the visual quality of the image.

Recently, wavelet-based systems have been extensively studied and developed for image compression applications. Many of the candidates for the JPEG-2000 standard are wavelet-based [6]. The discrete wavelet transform (DWT) is a form of sub-band coding. One implementation of the DWT system employs a filter bank to separate the signal into a number of frequency bands. Each band is then quantized and encoded in a way that depends on the system. These filter banks operate on the whole image instead of on a block, as the DCT and the LOT do. As a result, the DWT should eliminate the image blockiness completely. In addition, the DWT system can gain compression from a special structure called a zerotree. Sample encoders based on this structure are the embedded zerotree wavelet (EZW) and the set partitioning in hierarchical trees (SPIHT). Both encoders use a bit-plane coding scheme and the zerotree structure to compress the information, and are claimed to provide higher PSNR than the baseline DCT JPEG at given bit rates [10], [11]. In addition, both the EZW and the SPIHT are embedded systems that can achieve the targeted bit rate exactly.
Furthermore, the encoders do not have to send a code table in the encoded file because the decoders generate the code table during the decoding process [11]. These properties allow both systems to perform progressive coding. Another wavelet system is the visual threshold system. It uses a predefined quantization level for each wavelet band [16]. This quantization scheme is similar to that of the baseline JPEG. Therefore, this system can be adapted to use the run-length coder of the baseline JPEG system with some modifications.

The choices above present alternatives for image compression. All transform coding systems may achieve the same PSNR and bit rate, but at different costs. Since each system consists of three functional blocks (the transform, the quantizer, and the entropy coder), its performance may depend on one particular block or on the combination of the blocks. It is therefore useful to compare the PSNR gain for each type of transform, quantizer, and entropy coder independently. The PSNR gain of the transform block can be compared by using the same quantizer and entropy coder for all transforms, so that the PSNR gain is influenced mainly by the transform block. Quantizers can be compared by restricting the choice of transform and entropy coder to be the same for all compared systems, and entropy coders can be compared in a similar manner. The results of these comparisons should reveal the optimal encoding scheme for each transform.

This thesis compared several transform coding systems in terms of their system complexity and peak signal-to-noise ratios (PSNR) at given bit rates. The tested transforms included the discrete cosine transform (DCT), the lapped orthogonal transform (LOT), and the discrete wavelet transform (DWT). The tested quantizers included the optimal uniform quantizer, the JPEG uniform quantizer, the visual threshold quantizer, and the embedded zerotree wavelet (EZW).
Finally, the tested entropy coders included the Huffman coder, the adaptive Huffman coder, and the run-length Huffman coder. The system complexity was compared in terms of the number of additions and multiplications. This thesis focused primarily on the transform block for the complexity comparison. However, comparisons of quantizers and entropy coders were also included. Selected combinations of these three components were used to study the influence of each component on the PSNR performance. Lastly, four systems were studied further to compare overall system performance, including the visual quality of reconstructed images. They were the baseline DCT JPEG, the baseline LOT JPEG, the visual threshold wavelet with the run-length Huffman coder, and the EZW with the adaptive Huffman coder. In all comparisons, seven standard test images were used, including Lena and Barbara.

This thesis is organized as follows. Chapter 2 presents the background and algorithm implementation of the functional blocks used in this thesis. Chapter 3 lists the comparison methods and the corresponding tested systems. In Chapter 4, the results are presented and discussed. Finally, the thesis is concluded and further studies are proposed in Chapter 5.

Chapter 2

Background

This chapter presents brief background and the algorithm implementation of all components of the still image compression systems used in this thesis. A simple transform coding system consists of three functional blocks: transform, quantization, and entropy coding. A block diagram of a simple encoder/decoder system is shown in Figure 2-1 below.

[Figure 2-1 diagram: (a) original image -> Transform -> Quantization -> Entropy Coding -> compressed image; (b) compressed image -> Entropy Decoding -> Dequantization -> Inverse Transform -> reconstructed image]

Figure 2-1: Block diagram of a basic transform coding system for still image compression. (a) The encoding part of the system.
(b) The decoding counterpart of the system.

An original image is passed through the transform block, which outputs transform coefficients. The quantizer then reduces the coefficients to a smaller set of numbers or symbols. Finally, the entropy coder translates those numbers or symbols into a stream of binary bits to be stored or transmitted. The components used in the simulations of this thesis are listed below.

• Transform:
  (a) Discrete Cosine Transform (DCT)
  (b) Lapped Orthogonal Transform (LOT)
  (c) Discrete Wavelet Transform (DWT)
• Quantization:
  (a) Optimal Uniform Quantizer
  (b) JPEG Uniform Quantizer
  (c) Visual Threshold Wavelet Quantizer
  (d) Embedded Zerotree Wavelet Quantizer (EZW)
• Entropy Coding:
  (a) Huffman
  (b) Adaptive Huffman
  (c) Run-Length Huffman

The quantizers and coders listed above may be specifically designed for particular transform coefficients. Some modifications are required in order to use the same quantizer or entropy coder for different transforms. The modifications are described in the background section of the corresponding algorithm.

2.1 Transform

The transform block is used to remove redundancy from the input signal, an image in this case [1], [4], [8]. The image signal is projected onto a particular set of basis functions. In general, the basis functions are orthonormal in order to minimize the redundancy in the representation. In this thesis, all transforms are either orthogonal or nearly orthogonal because of the advantages of fast algorithms. Furthermore, all transforms are separable, meaning that the transform can be performed on each dimension independently. This property reduces the complexity of the implementation significantly. The separable 2-D transform can be computed according to the diagram shown in Figure 2-2 below. Three transforms are used in this thesis: the DCT, the LOT, and the DWT. The details of each transform are discussed in the sections below.
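The row-transform and transpose procedure for a separable 2-D transform can be sketched in a few lines of Python. This is a minimal illustration and not the thesis implementation: NumPy, the function names, and the random orthogonal matrix that stands in for a 1-D transform are all assumptions made for the example, and a square block is assumed so that one matrix serves both dimensions.

```python
import numpy as np


def random_orthogonal(m, seed=0):
    """Hypothetical stand-in for a 1-D orthogonal transform (columns are basis functions)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    return q


def row_transform(image, a):
    """Project every row of `image` onto the basis functions stored in the columns of `a`."""
    return image @ a


def transform_2d(image, a):
    """Row transform, transpose, row transform, transpose."""
    return row_transform(row_transform(image, a).T, a).T


def inverse_2d(coeffs, a):
    """Same structure as the forward path, using the inverse (transposed) 1-D transform."""
    return row_transform(row_transform(coeffs, a.T).T, a.T).T
```

For an orthogonal matrix `a`, the forward path is equivalent to computing `a.T @ image @ a` directly, so the separable form needs only 1-D transforms of rows.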
In addition, the flowgraphs in this thesis follow the conventions shown in Figure 2-3 below.

[Figure 2-2 diagram: (a) input image -> Row Transform -> Transpose -> Row Transform -> Transpose -> 2-D transformed image; (b) 2-D transformed image -> Inverse Row Transform -> Transpose -> Inverse Row Transform -> Transpose -> reconstructed image]

Figure 2-2: Block diagram for a separable 2-D transform

[Figure 2-3 diagram: a butterfly with inputs a and b producing a+b and a-b, and a branch with multiplier c mapping a to c*a]

Figure 2-3: Flowgraph conventions

2.1.1 Discrete Cosine Transform

The discrete cosine transform (DCT) is an orthogonal block transform. It projects each block of the input signal onto a set of truncated, sampled cosine waveforms. The DCT coefficients can be computed as follows. Let M be the block length, k the index of the basis function, and n the sample index. The basis functions of the 1-D DCT are given by

$$a_{nk} = c[k] \sqrt{\frac{2}{M}} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{M}\right], \quad 0 \le k \le M-1, \tag{2.1}$$

where

$$c[k] = \begin{cases} \dfrac{1}{\sqrt{2}} & ;\ k = 0 \\ 1 & ;\ \text{otherwise.} \end{cases} \tag{2.2}$$

The first four basis functions of the 1-D DCT are shown in Figure 2-4 below.

[Figure 2-4 plots: panels (a) to (d), one basis function per panel, sample index 0 to 8 on the horizontal axis, amplitude -0.5 to 0.5 on the vertical axis]

Figure 2-4: First four basis functions of the 1-D DCT for M=8

The 2-D DCT is a separable transform, so it can be computed as previously shown in Figure 2-2 above. In the one-dimensional case, the DCT coefficients, X[k], can be calculated as follows. Let x[n] be an input signal indexed by n and M be the DCT block length. X[k] is given by

$$X[k] = c[k] \sqrt{\frac{2}{M}} \sum_{n=0}^{M-1} x[n] \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{M}\right], \quad 0 \le k \le M-1, \tag{2.3}$$

where c[k] is defined as in (2.2). The inverse discrete cosine transform (IDCT) is given by

$$\hat{x}[n] = \sqrt{\frac{2}{M}} \sum_{k=0}^{M-1} c[k] \hat{X}[k] \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{M}\right], \quad 0 \le n \le M-1, \tag{2.4}$$

where $\hat{x}[n]$ is the reconstructed signal and $\hat{X}[k]$ is the dequantized DCT coefficient. All other symbols are the same as those in the DCT algorithm shown above. In this thesis, the transform block has a length of 8 for the 1-D transform and a size of 8x8 for a 2-D signal.
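Equations (2.1) to (2.4) can be checked numerically with a direct matrix implementation. The sketch below is only an illustration under stated assumptions: it uses NumPy and function names chosen for this example, and it evaluates the definitions by matrix multiplication instead of the fast flowgraph algorithm used in the thesis.

```python
import numpy as np


def dct_basis(m=8):
    """M x M matrix whose entry (n, k) is a_nk from equation (2.1)."""
    n = np.arange(m)[:, None]  # sample index, one per row
    k = np.arange(m)[None, :]  # basis function index, one per column
    c = np.where(k == 0, 1.0 / np.sqrt(2.0), 1.0)  # c[k] from equation (2.2)
    return c * np.sqrt(2.0 / m) * np.cos((n + 0.5) * k * np.pi / m)


def dct(x):
    """Forward 1-D DCT of one block, equation (2.3)."""
    x = np.asarray(x, dtype=float)
    return dct_basis(len(x)).T @ x


def idct(coeffs):
    """Inverse 1-D DCT of one block, equation (2.4)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return dct_basis(len(coeffs)) @ coeffs
```

Because the basis matrix is orthonormal, `idct(dct(x))` returns `x` up to rounding error, and a constant block places all of its energy in the k = 0 coefficient.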
The fast algorithms for the DCT and the IDCT are shown in Figures 2-5 and 2-6, respectively. The DCT algorithms can be found in [6]. The computational costs of the 1-D and 2-D DCT algorithms used in this thesis are summarized in Table 2.1 below. The costs of the IDCT algorithms are the same as those of the DCT. The computational costs assume that the image dimension is RxC, where R is the number of rows and C is the number of columns. In addition, R and C are assumed to be divisible by 8 in this thesis.

[Figure 2-5 flowgraph: inputs x[0] to x[7], intermediate nodes m, n, and p, outputs X[0] to X[7]]

Figure 2-5: Flowgraph of the fast DCT for M=8. Ci and Si represent cosine and sine multipliers, respectively.

Table 2.1: Computational costs of the fast DCT, M=8

                      # Additions    # Multiplications
1-D, 8-point DCT      26             16
RxC, 2-D DCT          13RC/2         4RC

2.1.2 Lapped Orthogonal Transform

Although the DCT has been employed extensively in still image compression, the DCT system exhibits blocking artifacts at low bit rates. The lapped orthogonal transform (LOT) was developed to minimize this artifact. The blocking effect has two causes. First, the DCT bases are short, non-overlapped, and discontinuous at the edges. Second, the DCT system transforms each block independently. The LOT eliminates this artifact by forcing the bases to decay to zero at both edges and by overlapping them with the neighboring transform block [7], [14]. In this thesis, the overlapped part was half of the block length and started at the middle of the preceding block. The general structure of the LOT is shown in Figure 2-7.
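The overlap described above (blocks of 16 samples that advance by 8 samples) can be pictured with a short indexing sketch. The code is a hypothetical illustration in Python with NumPy, not part of the thesis programs, and it ignores the special handling of the first and last blocks of a finite signal.

```python
import numpy as np


def overlapped_blocks(x, block_len=16):
    """Split a 1-D signal into blocks that overlap by half of the block length."""
    x = np.asarray(x)
    hop = block_len // 2  # each block starts at the middle of the preceding block
    count = (len(x) - block_len) // hop + 1
    return np.stack([x[i * hop : i * hop + block_len] for i in range(count)])
```

With this layout, the second half of each block is the same set of samples as the first half of the next block, which is the region where the LOT basis functions of neighboring blocks overlap.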
In order to make the LOT an orthogonal transform, all the bases must form an orthonormal set. Furthermore, the overlapped portion must be orthogonal to all bases of the neighboring block [5]. It is not practical to use the original version of the LOT because no fast algorithm is available for it. However, the nearly orthogonal lapped transform has a fast algorithm. This type of LOT, called the type-I, fast LOT, uses the DCT bases as building blocks for its bases. This algorithm requires few additional computations beyond the existing DCT algorithm [7].

[Figure 2-6 flowgraph: inputs X[0] to X[7], intermediate nodes p, n, and m, outputs x[0] to x[7]]

Figure 2-6: Flowgraph of the fast IDCT for M=8. Ci and Si represent cosine and sine multipliers, respectively.

[Figure 2-7 diagram: the signal x[n] with sample positions from -2M+1 to 2M, the operator P^T applied to overlapping blocks in the direct LOT, and the operator P applied in the inverse LOT]

Figure 2-7: General structure of the LOT. In this structure, the overlapped portion is half of the block length. P^T and P are the transform and the inverse transform operators, respectively.

The LOT coefficients can be calculated as follows. Let M be the length of a transform block and V be the overlapped length, where M = 2V. The transform pair is given by

$$X = P^T x \tag{2.5}$$

and

$$x = P X, \tag{2.6}$$

where P is the basis function matrix of size MxV and x is an input vector of length M. The basis function matrix, P, for the fast algorithm is given by

$$P = P_0 Z, \tag{2.7}$$

where $P_0$ is an MxV matrix and Z is an orthogonal matrix of size VxV. The matrix $P_0$ is given by

$$P_0 = \frac{1}{2} \begin{bmatrix} D_e - D_o & D_e - D_o \\ J(D_e - D_o) & -J(D_e - D_o) \end{bmatrix}. \tag{2.8}$$

$D_e$ and $D_o$ are the even and odd functions of the DCT bases.
$J$ is an anti-identity matrix. Z is approximated in the fast algorithm as

$$Z \approx \begin{bmatrix} I & 0 \\ 0 & \tilde{Z} \end{bmatrix}, \tag{2.9}$$

where $\tilde{Z}$ is a matrix of size V/2 by V/2. The matrix $\tilde{Z}$ is defined as

$$\tilde{Z} = T_0 T_1 \cdots T_{V/2-2}, \tag{2.10}$$

where each matrix $T_i$ is a V/2 by V/2 matrix defined as

$$T_i = \begin{bmatrix} I & 0 & 0 \\ 0 & Y(\theta_i) & 0 \\ 0 & 0 & I \end{bmatrix}. \tag{2.11}$$

$Y(\theta_i)$ is a rotation matrix at position (i, i) of the matrix $T_i$. It is a 2x2 matrix defined as

$$Y(\theta_i) = \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{bmatrix}, \tag{2.12}$$

where $\theta_i$ is a rotation angle.

In this thesis, the type-I, fast LOT had a block length, M, of 16 and an overlapped length, V, of 8 because the 8-point fast DCT can be used as part of the LOT algorithm. The matrix $\tilde{Z}$ is equal to $T_0 T_1 T_2$. The three angles for the matrices $Y(\theta_i)$ are $[\theta_0, \theta_1, \theta_2] = [0.13\pi, 0.16\pi, 0.13\pi]$ for the optimal coding gain, and $[\theta_0, \theta_1, \theta_2] = [0.145\pi, 0.17\pi, 0.16\pi]$ for the QR-based quasi-optimal LOT [5]. This thesis used the angles for the optimal coding gain. The first four basis functions of the type-I, fast LOT for optimal coding gain with a block length of 16 samples and an overlapped portion of 8 samples are shown in Figure 2-8 below. The implementations of the type-I, fast LOT block transform and inverse transform are shown in Figures 2-9 and 2-10, respectively. The LOT programs can be found in [6]. In addition, all figures of the LOT system were reproduced with the permission of Henrique Malvar.

In the case of a finite length signal, the beginning block and the ending block must be modified to account for the non-existent overlapping part. This is done by reflecting a few samples of the beginning and the ending block outside the signal. The reflected parts are then treated as part of the signal.
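The construction of P in equations (2.7) to (2.12) can be sketched numerically as follows. This is a hedged illustration, not the thesis program: it assumes NumPy, assumes that the columns of D_e and D_o are the even-indexed and odd-indexed DCT basis functions in increasing order, assumes the rotation sign convention shown in the code, and builds P by explicit matrix products instead of the fast flowgraph. All function names are chosen for this example.

```python
import numpy as np


def dct_basis(m):
    """Columns are the 1-D DCT basis functions of length m."""
    n = np.arange(m)[:, None]
    k = np.arange(m)[None, :]
    c = np.where(k == 0, 1.0 / np.sqrt(2.0), 1.0)
    return c * np.sqrt(2.0 / m) * np.cos((n + 0.5) * k * np.pi / m)


def rotation(theta):
    """2x2 plane rotation Y(theta); the sign convention is an assumption."""
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])


def lot_matrix(v=8, angles=(0.13 * np.pi, 0.16 * np.pi, 0.13 * np.pi)):
    """Return the 2V x V matrix P = P0 Z; `angles` must hold V/2 - 1 values."""
    d = dct_basis(v)
    a = d[:, 0::2] - d[:, 1::2]  # De - Do
    j = np.fliplr(np.eye(v))     # anti-identity matrix J
    p0 = 0.5 * np.block([[a, a], [j @ a, -j @ a]])
    h = v // 2
    z_tilde = np.eye(h)
    for i, theta in enumerate(angles):
        t = np.eye(h)
        t[i:i + 2, i:i + 2] = rotation(theta)  # Y(theta_i) at position (i, i)
        z_tilde = z_tilde @ t
    z = np.block([[np.eye(h), np.zeros((h, h))],
                  [np.zeros((h, h)), z_tilde]])
    return p0 @ z
```

Under these assumptions, the columns of the resulting matrix are orthonormal, and the tail half of each basis function is orthogonal to the head half of every basis function, which corresponds to the overlap condition between neighboring blocks.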
For the boundary blocks in this thesis, only the first and last four samples of the signal were reflected in order to compute the 8-point DCT of the first and the last block. As a result, only the even DCT coefficients are non-zero, and the algorithm uses only the non-zero coefficients to compute the LOT coefficients. The type-I, fast LOT of a finite length signal is shown in Figure 2-11. The computational costs for the type-I, fast LOT are summarized in Table 2.2, where R and C are the image dimensions. In this thesis, both R and C are assumed to be divisible by 8. The computational costs for the whole image take into account the half length DCT of all edges of the transformed image.

[Figure 2-8 plots: panels (a) to (d), one basis function per panel, sample index 0 to 15 on the horizontal axis, amplitude -1 to 1 on the vertical axis]

Figure 2-8: First four basis functions of the type-I fast LOT, for M = 2V, V = 8

[Figure 2-9 flowgraph: two 8-point DCT stages applied to the input blocks x_r and x_{r+1}, followed by butterflies and rotations by the angles theta_i, producing X[0] to X[7]]

Figure 2-9: Flowgraph of the type-I fast LOT, for M = 2V, V = 8

2.1.3 Discrete Wavelet Transform

The discrete wavelet transform (DWT) uses the wavelet expansion to represent the signal in different time scales or spatial resolutions. The DWT is a series expansion of a finite length signal. The expansion consists of an approximation and a wavelet, or detail. The wavelet expansion can be implemented by a filter bank. The derivation and a detailed explanation can be found in [2], [13], and [12]. Filter bank algorithms have been well known for a long time. Therefore, the development cost for the DWT algorithm using the filter bank is low in practice. There are two stages in the DWT system. The transform process of the DWT is called an analysis stage.
The inverse DWT is called a synthesis stage. The analysis stage separates the input signal into a set of octave-band frequency bands, whereas the synthesis stage reconstructs the signal from the DWT coefficients.

Figure 2-10: Flowgraph of the type-I fast ILOT, for M = 2V, V = 8

The DWT coefficients can be calculated as follows. The input signal, x[n], is passed through two filters: a low-pass filter and a high-pass filter. Ideally, the cut-off frequency of both filters should be at $\pi/2$. The output signals of both filters are then decimated by a factor of two. To further divide the frequency band, the output of the first stage can be passed through the same set of filters. However, the distribution of DCT coefficients of a normal image consists of mostly low frequency components. Therefore, only the output of the low-pass filter from the first stage is used as the input to the second stage filter bank. At the second stage, the length of the input signal is half of the original length. The process can be repeated as long as the length of the input signal is longer than or equal to the length of the longest filter. Furthermore, the computational costs of the DWT increase with the number of levels, or stages, that the image is passed through. During the synthesis, the synthesis filters undo the analysis part by combining the highest level first, and then reconstructing the signal in the lower level. The structure of the 2-level analysis and synthesis DWT and the corresponding frequency bands are shown in Figure 2-12 below.

Figure 2-11: Flowgraph of the type-I, fast LOT for a finite length signal. The LOT runs from left to right without the factor 2 after $H_e$ and $JH_e$. The inverse LOT runs from right to left. E is a set of even DCT coefficients and O is a set of odd DCT coefficients.

In addition, the 2-D DWT is also separable. Therefore, the 2-D DWT can be calculated in each dimension independently, as previously shown in Figure 2-2 above. After the 2-D transformation of the image, the coefficients are organized by level and frequency band. An example of the organization of the 3-level DWT coefficients is shown in Figure 2-13 below.

Table 2.2: Computational costs of the type-I, fast LOT, M = 2V, V = 8

                                               # Additions         # Multiplications
Single block, 1-D type-I, fast LOT, M = 16     48                  28
RxC image, 2-D type-I, fast LOT                12RC - 18(R + C)    7RC - 4(R + C)

The DWT coefficients also depend on the type of filter used. There are two kinds of filters: orthogonal and biorthogonal. The orthogonal filters are asymmetric and have non-linear phase in the frequency response. The non-linear phase creates distortion or artifacts in the reconstructed image, which is undesirable for image processing. On the other hand, the biorthogonal filters are symmetric and have linear or zero phase in the frequency domain. Therefore, it is desirable to use linear phase FIR filters for image processing. This thesis used the 9/7-tap Villasenor biorthogonal filters [15]. The coefficients of both filters for the low-pass version are shown in Table 2.3 below.
Table 2.3: Coefficients of 9/7-tap Villasenor biorthogonal filters

Length    Coefficients
9         0.03828, -0.023849, -0.110624, 0.377402, 0.852699, 0.377402, -0.110624, -0.023849, 0.03828
7         -0.064539, -0.040689, 0.418092, 0.788486, 0.418092, -0.040689, -0.064539

The orthogonality in the biorthogonal system is preserved by the relationships between the analysis filters and the synthesis filters. The relationships among these four filters are

$$ g_a[n] = (-1)^n h_s[1 - n] \qquad (2.13) $$

and

$$ g_s[n] = (-1)^n h_a[1 - n], \qquad (2.14) $$

where $g_a$ is a high-pass analysis filter, $h_a$ is a low-pass analysis filter, $g_s$ is a high-pass synthesis filter, and $h_s$ is a low-pass synthesis filter [2].

Figure 2-12: Structure of 2-level, 2-band analysis and synthesis for the 1-D DWT and the frequency band to which the output of each filter bank corresponds. $g_a$ is a high-pass analysis filter. $h_a$ is a low-pass analysis filter. $g_s$ is a high-pass synthesis filter. $h_s$ is a low-pass synthesis filter.

In this thesis, the DWT algorithm was derived from the WaveLab802 MATLAB package from Stanford University. The programs can be found in [3]. Parts of the programs were translated into C to reduce the simulation time. The MATLAB programs were used for verification purposes.

The computational costs for the algorithm are given for two cases: the circular convolution algorithm and the minimum limit. The first is the circular convolution algorithm, which was used in this thesis. The second is the minimum limit, which has lower computational costs because it exploits the symmetry of the biorthogonal filters.

Figure 2-13: Organization of 3-level DWT coefficients

The computations, especially the multiplications, can be reduced significantly by reusing previous results. Given the filter length, F, only (F + 1)/2 values are distinct for odd-length symmetric filters.
Let the lengths of the two filters be $F_1$ and $F_2$, where $F = F_1 + F_2$. The computational costs for both cases are summarized in Table 2.4 below. These costs were calculated for odd-length biorthogonal wavelet filters only. However, the cost for even-length filters should be similar.

Table 2.4: The computational costs of the L-level DWT using biorthogonal filters for image size RxC. Define the variables $K_a = RC(F - 2)$, $K_b = RCF$, and $K_c = RC(F + 2)$.

         Thesis                                          Minimum
Level    Additions               Multiplications         Additions               Multiplications
1        K_a                     K_b                     K_a                     (1/2) K_c
2        (5/4) K_a               (5/4) K_b               (5/4) K_a               (5/8) K_c
3        (21/16) K_a             (21/16) K_b             (21/16) K_a             (21/32) K_c
4        (85/64) K_a             (85/64) K_b             (85/64) K_a             (85/128) K_c
L        (4/3)(1 - (1/4)^L) K_a  (4/3)(1 - (1/4)^L) K_b  (4/3)(1 - (1/4)^L) K_a  (2/3)(1 - (1/4)^L) K_c

2.2 Quantization

The quantization block is used to limit the range of the transform coefficients. This block gives the compression gain for the system. However, it introduces loss in the system. There are many quantization schemes used in lossy image compression. This thesis used four quantizers, listed as follows.

(a) Optimal Uniform Quantizer
(b) JPEG Uniform Quantizer
(c) Visual Threshold Uniform Quantizer
(d) Embedded Zerotree Wavelet Quantizer (EZW)

The following sections explain the algorithms, advantages, and disadvantages of the above quantizers. It should be noted that certain quantizers, such as the JPEG uniform quantizer, are specifically designed for block-based DCT coefficients. Some modifications were made in order to adapt these quantizers to work with LOT coefficients. The modifications are explained in more detail in the section on that particular quantizer.

2.2.1 Optimal Uniform Quantizer

This quantizer is designed for a specific set of transform coefficients in order to minimize the mean square error (MSE). It is constructed from the statistics of the transform coefficients. Theoretically, it should give the smallest MSE, which indicates the best performance according to the PSNR measure used in this thesis.
This quantizer was used in order to provide a fair comparison among all transform coding schemes because it is not tailored to any particular transform.

In order to apply this quantizer to the coefficients of the DCT, LOT, and DWT, three questions must be answered. First, how many quantizers are necessary? Second, which quantizer does each coefficient use? Finally, how should the bit resources be distributed among the quantizers? To answer these questions, let us look at the derivation and the organization of the transform coefficients.

The DCT and the LOT are block transforms whereas the DWT is not. The DCT and the LOT coefficients are arranged in blocks of size 8 by 8 in this thesis. Each coefficient in an 8x8 block corresponds to a different basis function of the 2-D transform. Furthermore, the coefficients that are located at the same position in different 8x8 blocks correspond to the same basis function. Therefore, it is reasonable to use the same quantizer for all coefficients of the same basis function. As a result, there are 64 different quantizers for the DCT and LOT transform coefficients because there are 64 distinct frequencies. In the case of the DWT, the coefficients are organized in frequency bands. Similarly, wavelet coefficients in different bands use different quantizers. Given an L-level DWT, there are 3L + 1 distinct quantizers.

The bit distribution problem can be solved by the rate-distortion theory as described in [1]. Given the average bit rate, B, the number of bits allocated to the ith quantizer, $b_i$, is

$$ b_i = B + \frac{1}{2} \log_2 \frac{\sigma_i^2}{\left( \prod_{k=1}^{N} \sigma_k^2 \right)^{1/N}}, \qquad (2.15) $$

where $\sigma_i^2$ is the variance of the coefficients in the ith group, and N is the total number of groups, i.e. 64 for the DCT and the LOT.
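The allocation rule in (2.15) can be illustrated with a short sketch. It assumes the standard rate-distortion form, in which each group receives the average rate plus half the base-2 logarithm of the ratio between its variance and the geometric mean of all group variances. The function name `allocate_bits` and the example variances are hypothetical, and the sketch neither rounds the result nor clips negative allocations, both of which a practical coder would have to handle.

```python
import math

def allocate_bits(variances, avg_bits):
    """Bits per group: avg_bits + 0.5 * log2(variance / geometric mean).

    Assumption: all variances are strictly positive.
    """
    n = len(variances)
    # log2 of the geometric mean equals the mean of the log2 variances.
    log_geo_mean = sum(math.log2(v) for v in variances) / n
    return [avg_bits + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]

# Made-up variances for four coefficient groups, largest first.
variances = [400.0, 100.0, 25.0, 4.0]
bits = allocate_bits(variances, avg_bits=2.0)

# The allocations average back to the requested rate.
mean_bits = sum(bits) / len(bits)
```

Groups with larger variance receive more bits, and the correction terms sum to zero, so the average rate B is preserved.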
Next, the width of each bin, or the quantization level, $q_i$, can be computed by

$$ q_i = k \, \frac{\max(|A_i|)}{2^{b_i}}, \qquad (2.16) $$

where $\max(|A_i|)$ is the maximum magnitude of the ith group, $b_i$ is the number of bits allocated to the ith group, and k is a scaling factor, which can be used to change the final bin width and adjust the bit rate. This scaling factor does not affect the relationship between PSNR and bit rate but enables the simulation to adjust the bit rate close to the desired value. Finally, the quantized coefficient, $\hat{c}_i$, is calculated by dividing each input coefficient by the corresponding quantization level, i.e.

$$ \hat{c}_i = \frac{c_i}{q_i}. \qquad (2.17) $$

The advantage of using this quantizer is that the quantizer is symmetric around zero, also called flat-zero [1]. This property benefits the system if the coefficients have zero mean because the distribution of the coefficients is then symmetrical about zero. There are encoders that can take advantage of this kind of distribution. Details of the coding schemes are presented in the next sections.

2.2.2 JPEG Uniform Quantizer

This quantizer is similar to the optimal uniform quantizer but it is specifically designed for block-based DCT coefficients. The JPEG standard recommends a quantization level for each frequency in the transformed block. These levels are derived from experimental results based on many images and subjects in order to give the best visual quality for a given bit rate [1], [9]. This thesis used the following quantization table for the black & white image.

$$ Q = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 59 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix} \qquad (2.18) $$

The top left-hand corner corresponds to the quantization level for the DC component. The bottom right-hand corner is the quantization level for the highest frequency component of the transform coefficients in 2-D.
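As a small illustration of the division in (2.17) applied with a JPEG-style table, the sketch below quantizes one row of coefficients using the first row of the table in (2.18). The coefficient values are made up, the helper names `quantize` and `dequantize` are hypothetical, and rounding to the nearest integer is an assumption: the text specifies only the division.

```python
def quantize(coeffs, levels):
    """Divide each coefficient by its quantization level and round.

    Assumption: round-to-nearest; the source text only states the division.
    """
    return [int(round(c / q)) for c, q in zip(coeffs, levels)]

def dequantize(indices, levels):
    """Reconstruct approximate coefficients from the quantizer indices."""
    return [i * q for i, q in zip(indices, levels)]

# First row of the quantization table in (2.18).
q_row = [16, 11, 10, 16, 24, 40, 51, 61]

# Hypothetical transform coefficients for one row of an 8x8 block.
coeffs = [-415.0, -30.0, -61.0, 27.0, 56.0, -24.0, -2.0, 0.0]

indices = quantize(coeffs, q_row)      # [-26, -3, -6, 2, 2, -1, 0, 0]
restored = dequantize(indices, q_row)  # [-416, -33, -60, 32, 48, -40, 0, 0]
```

The larger levels toward the right of the row map small high-frequency coefficients to zero, which is what later allows the run-length coder to work efficiently.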
The table in (2.18) gives the recommended quantization levels for the luminance component according to the JPEG standard [9]. Although this quantizer is specifically designed for DCT coefficients, the LOT coefficients might be able to use this quantization matrix because the LOT uses the DCT as a building block for its basis functions. Therefore, this thesis compared the performance of the DCT system and the LOT system using this quantizer. The quantized coefficient, $\hat{c}_i$, is computed by a division operation as in equation (2.17).

2.2.3 Visual Threshold Wavelet Quantizer

This quantizer is intended to preserve the visual quality of the reconstructed image for the DWT coding system. The development of this quantizer was similar to that of the JPEG uniform quantizer. The quantization levels depend on experimental results for the DWT coefficients. They also depend on the type of wavelet filter and the number of levels of the DWT structure. In this thesis, only the 9/7-tap biorthogonal filters were used. The DWT coefficients that belong to the same frequency band share the same quantizer. There are three bands at each DWT level, with the exception of the coarsest level, which has four bands. In other words, given the number of DWT levels, L, there are 3L + 1 different quantizers. The quantization level, $Q_{L,O}$, for each DWT orientation and each DWT level is given as follows. The subscript O refers to the orientation, which is the same as the frequency band {LL, LH, HL, HH}.

$$ Q_{L,O} = k \, \frac{2 a \, 10^{\,k_0 \left[ \log_{10} \left( \frac{2^L f_0 g_O}{r} \right) \right]^2}}{A_{L,O}} \qquad (2.19) $$

where

* $A_{L,O}$: the basis function amplitude
* $a$: empirical value = 0.495
* $k_0$: empirical value = 0.466
* $f_0$: empirical value = 0.401
* $g_O$: orientation parameter, $g_{HL} = g_{LH} = 1$, $g_{LL} = 1.501$, and $g_{HH} = 0.534$
* $r$: output resolution (pixel/degree)
* $L$: DWT level
* $k$: scaling factor for adjusting the quantization levels and the bit rate

In the case of the 9/7-tap biorthogonal filters, the values of $A_{L,O}$ and $Q_{L,O}$ are shown in Tables 2.5 and 2.6, respectively.
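The quantization level in (2.19) can be evaluated directly. The sketch below is only an illustration: the function name `visual_threshold_level` is hypothetical, and the formula is assumed to have the form Q = k * 2a * 10^(k0 * [log10(2^L * f0 * g_O / r)]^2) / A. The amplitudes passed in the example (0.62171 for the level-1 LL band and 0.727095 for the level-1 HH band) are the basis function amplitudes tabulated in this section.

```python
import math

# Empirical constants as listed in the text.
A_EMPIRICAL = 0.495   # a
K0 = 0.466            # k_0
F0 = 0.401            # f_0
G = {"LL": 1.501, "LH": 1.0, "HL": 1.0, "HH": 0.534}  # g_O

def visual_threshold_level(level, orientation, amplitude, r=32.0, k=1.0):
    """Quantization level for one DWT band.

    level       -- DWT level L (1 = finest)
    orientation -- one of "LL", "LH", "HL", "HH"
    amplitude   -- basis function amplitude A_{L,O}
    r           -- output resolution in pixel/degree
    k           -- scaling factor for adjusting the bit rate
    """
    x = math.log10((2 ** level) * F0 * G[orientation] / r)
    return k * 2.0 * A_EMPIRICAL * 10.0 ** (K0 * x * x) / amplitude

q_ll1 = visual_threshold_level(1, "LL", amplitude=0.62171)
q_hh1 = visual_threshold_level(1, "HH", amplitude=0.727095)
```

With r = 32 pixel/degree and k = 1, both values should land close to the corresponding tabulated quantization levels (14.049 and 58.756). Small differences are to be expected from rounding of the constants.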
Table 2.5: The basis function amplitudes, $A_{L,O}$, for 9/7-tap biorthogonal wavelet filters

                Level
Orientation     1           2           3           4
LL              0.62171     0.345374    0.18004     0.0914012
LH              0.672341    0.413174    0.227267    0.117925
HL              0.672341    0.413174    0.227267    0.117925
HH              0.727095    0.494284    0.286881    0.152145

Table 2.6: Quantization levels for 9/7-tap biorthogonal filters. The output resolution is set at 32 pixel/degree.

                Level
Orientation     1          2          3          4
LL              14.049     11.106     11.363     14.500
LH              23.028     14.685     12.707     14.156
HL              23.028     14.685     12.707     14.156
HH              58.756     28.408     19.540     17.864

The quantized coefficient is computed by dividing the DWT coefficient by the corresponding quantization level. According to the quantization table given above, the coefficients in the higher frequency bands are quantized more heavily than the low frequency coefficients. This property is similar to the quantization for the JPEG system. Models and development processes of this quantizer can be found in [16].

2.2.4 Embedded Zerotree Wavelet Quantizer

The embedded zerotree wavelet (EZW) quantizer combines bit-plane encoding and the zerotree structure to compress the data. The zerotree structure captures relationships among the magnitudes of the DWT coefficients across different levels. The bit-plane part extracts each plane of bits, and the encoder searches for zerotree structures and encodes them into symbols. The combination of the EZW and an adaptive arithmetic coder has been claimed to achieve higher compression than the baseline JPEG algorithm [11]. Due to time constraints and the unavailability of the arithmetic coder, the adaptive Huffman coder was used in this thesis. The implementation of the EZW is described below. More details can be found in [11].

The terms used in the EZW algorithm are given as follows.

parents-descendants: The relationship of the spatial location between the DWT coefficients of different levels.
Given the parent's coordinate (i, j), an L-level DWT, and an image size of RxC, the positions of the descendants are {(2i, 2j), (2i + 1, 2j), (2i, 2j + 1), (2i + 1, 2j + 1)} if (i, j) is not in the LL-band, and {(i + R/2^L, j), (i, j + C/2^L), (i + R/2^L, j + C/2^L)} if (i, j) is in the LL-band, where L is the number of DWT levels. Examples of parent-descendants of the 3-level DWT are shown in Figure 2-14.

significant: The DWT coefficient X[k] is significant if and only if |X[k]| >= T, where T is the current threshold.

insignificant: The DWT coefficient X[k] is insignificant if |X[k]| < T, where T is the current threshold.

zerotree root (Z): The DWT coefficient X[k] is a zerotree root if X[k] and all its descendants are insignificant.

isolated zero (I): The DWT coefficient X[k] is an isolated zero if X[k] is insignificant and at least one of its descendants is significant.

positive (P): The DWT coefficient X[k] is positive if X[k] is significant and positive.

negative (N): The DWT coefficient X[k] is negative if X[k] is significant and negative.

dominant list: List of DWT coefficients that have not been found significant.

subordinate list: List of DWT coefficients that have been found significant.

The process of the EZW can be divided into two steps: the dominant pass and the subordinate pass. The dominant pass compares all coefficients in the dominant list with the current threshold and searches for zerotree structures. The subordinate pass, on the other hand, transmits the next bit of the coefficients that have been found significant.

The coding begins with the dominant pass by selecting the initial threshold, $T_0$, such that max(|X[k]|) < 2$T_0$. The EZW starts comparing each coefficient from the highest level down to the lowest level. Within the same level, the EZW scans through all frequency bands in the following order: {LL, HL, LH, HH}. The output is coded as one of the following five symbols: {P, N, I, Z, EOB}. The first four symbols are defined above.
The EOB symbol is the end-of-block symbol, which is the same as in the JPEG system. It can be used to indicate the termination of the encoding process.

Figure 2-14: Locations of parent-descendants of the 3-level EZW. The parent coefficient a, in the LL-band, has b, c, and d as its descendants. The coefficient b is the parent of e, f, g, and h. All of them reside in the HL-band but in different levels.

If a coefficient is significant, it is removed from the dominant list and put in the subordinate list. Once all coefficients have been compared with the current threshold, the subordinate pass sends the next lower resolution bit of all coefficients in the subordinate list. The output of the subordinate pass is a binary bit, 0 or 1. After the subordinate pass finishes, the current threshold is divided by a factor of two. This threshold is used in the next dominant pass. More details about the EZW can be found in [11].
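The symbol assignment of a dominant pass can be sketched as follows. This is a simplified illustration rather than the implementation used in this thesis: the function names are hypothetical, the descendants of a coefficient are passed in as a flat list instead of being located through the parent-descendant map, and the initial threshold is assumed to be the largest power of two that does not exceed the maximum magnitude, which is one choice that satisfies max(|X[k]|) < 2T0.

```python
import math

def initial_threshold(coeffs):
    """Largest power of two T0 with T0 <= max|X[k]|, so that max|X[k]| < 2*T0.

    Assumption: at least one coefficient is non-zero.
    """
    peak = max(abs(c) for c in coeffs)
    return 2 ** math.floor(math.log2(peak))

def dominant_symbol(value, descendants, threshold):
    """Classify one coefficient as P, N, I, or Z for the current threshold."""
    if abs(value) >= threshold:
        return "P" if value > 0 else "N"
    if any(abs(d) >= threshold for d in descendants):
        return "I"   # isolated zero: some descendant is significant
    return "Z"       # zerotree root: the whole subtree is insignificant

# Made-up coefficients, for illustration only.
coeffs = [63, -34, 49, 10, 7, 13, -12, 7]
t0 = initial_threshold(coeffs)

symbols = [
    dominant_symbol(63, [49, 10, 7, 13], t0),    # significant and positive
    dominant_symbol(-34, [7, 13, -12, 7], t0),   # significant and negative
    dominant_symbol(10, [49, 7], t0),            # insignificant, child 49 is not
    dominant_symbol(-12, [7, 3], t0),            # insignificant subtree
]

# After the subordinate pass, the threshold is halved for the next pass.
next_threshold = t0 // 2
```

A full coder would also maintain the dominant and subordinate lists and skip the descendants of a zerotree root, which is where most of the compression comes from.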
According to the statistics, if the parent is insignificant, it is likely that all its descendants are also insignificant [11]. At a high enough threshold, many zerotree root symbols are likely to be found. As a result, a run-length encoder could take advantage of the long streams of zerotree root symbols. Furthermore, there are only seven symbols in the system, namely {P, N, I, Z, EOB, 0, 1}. The adaptive arithmetic encoder may be more efficient due to the small set of symbols [11].

The EZW may achieve high compression, but the zerotree searching and the recomputation of the code table are the bottlenecks of the algorithm used in this thesis. Future work could improve the EZW. An improved version of the EZW is called set partitioning in hierarchical trees (SPIHT), which can be found in [10]. Due to time constraints, the SPIHT was not included in this thesis.

2.3 Entropy Coding

The entropy coder translates values or symbols into a string of binary bits. The output can then be stored or transmitted in a digital format. The three encoders used in this thesis are

(a) Huffman
(b) Adaptive Huffman
(c) Run-length Huffman

The following sections describe each encoder briefly.

2.3.1 Huffman Coding

The Huffman coder is a variable length coder. It assigns each symbol a code word whose length depends on the symbol's probability, so that the expected code length is minimized. This coding method can give an expected code length close to the entropy of the system [1]. In this thesis, the Huffman coder is used for all transform coefficients. For the DCT and the LOT, there are two Huffman tables, one for the DC coefficients and the other for the AC coefficients. In the case of the DWT, one Huffman table is used for the coefficients in the LL-band, whereas a second table is used for the rest of the coefficients. Each Huffman table is computed directly from the statistics of the coefficients. Therefore, it is image specific.
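A Huffman table of this kind can be built directly from symbol counts. The sketch below is a generic textbook construction using Python's `heapq`, not the coder used in this thesis; the function name `huffman_code` and the sample stream of quantized values are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bit string} built from the symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n0, _, low = heapq.heappop(heap)    # least frequent subtree
        n1, _, high = heapq.heappop(heap)   # second least frequent subtree
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]

# Made-up quantized coefficients dominated by zeros.
quantized = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 2, 3]
table = huffman_code(quantized)
total_bits = sum(len(table[s]) for s in quantized)
```

Because the table is derived from the counts of one particular symbol stream, a decoder needs the same table, which is the sense in which such a coder is image specific.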
The Huffman coder was used to compare the effects of the transform and the quantizer because it is the common coder for all transforms.

2.3.2 Adaptive Huffman Coding

The adaptive Huffman coder is an adaptive version of the Huffman coder. The general concepts are the same as in Huffman coding, but the code table is updated as the coding progresses. This algorithm could improve the compression ability of the Huffman coder because the coder has updated statistics when each symbol is coded. However, the disadvantage of this algorithm is speed. The adaptation requires the re-computation of the code words. Therefore, it slows the system down, as mentioned in the EZW section.

2.3.3 Run-Length Huffman Coding

The run-length Huffman coder combines regular Huffman coding with modifications of the quantized coefficients to achieve higher compression. This coder is the standard coder used in the baseline JPEG. It separates the coefficients into two parts: DC and AC. Each type is coded with a different method. The general concepts can be described as follows.

The DC coefficients have a very large range. For example, for 8-bit data, the 8x8 block DCT can give DC coefficients in a range of [-2047, 2047]. This range requires 12 bits to represent. However, the values of the DC coefficients of adjacent blocks are very close to each other because most images are smooth and do not change abruptly. The DC coder uses differential pulse code modulation (DPCM) to compute the first difference, and then encodes these differences. The Huffman code word for the DC coefficients can be separated into two parts: the category and the residue. The category determines the range of the magnitude, while the residue determines the sign and the resolution of the coefficients [1], [9]. The Huffman table for the DC coefficients of the DCT JPEG can be found in [9].

According to the statistics, the quantized AC coefficients of the DCT are likely to be zeroes at high frequencies [1], [9].
The run-length coder counts the number of consecutive zeroes before encountering a non-zero coefficient. The scanning method used for the AC coefficients is called the zig-zag scan [1], [9]. The zig-zag scan is shown in Figure 2-15 below. The maximum zero run represented by a single code word is 16. The end-of-block (EOB) symbol is used if the rest of the scanned coefficients are zeroes. The run-length Huffman table for the AC coefficients can be found in [9]. The JPEG run-length Huffman coder is shown in Figure 2-16 below.

This coder is designed for the DCT coefficients. However, the LOT coefficients share a similar organization and similar properties. Therefore, the run-length coder can be adjusted for the LOT system. A few modifications must be made to accommodate the larger range of the LOT coefficients in both the DC and AC coefficients.

In the case of the DWT system, the coefficients in the LL-band are treated as the DC coefficients, and the other bands are treated as the AC coefficients. This coder was used with the optimal uniform quantizer and the visual threshold quantizer for the DWT system. The code table for the DWT system is computed from the statistics of the coefficients of the transformed image. Although it is not a universal encoder, it provides a method to compare the visual threshold system with the current JPEG system.

Figure 2-15: Zig-zag scan of the run-length Huffman coder

Figure 2-16: Flowgraph of the run-length Huffman coder

Chapter 3

Simulation Methods

This chapter explains the simulation and comparison processes of all transform coding systems used in this thesis. The simulation imitated all functional blocks in the systems, which consist of the transform, the quantizer, and the entropy coder. A list of systems used in this thesis is shown in Table 3.1 below.
Table 3.1: List of tested transform coding systems

#     Transform    Quantizer                    Entropy Coder
1     DCT          Optimal Uniform              Huffman
2     LOT          Optimal Uniform              Huffman
3     DWT          Optimal Uniform              Huffman
4     DCT          Optimal Uniform              Run-length Huffman
5     LOT          Optimal Uniform              Run-length Huffman
6     DWT          Optimal Uniform              Run-length Huffman
7     DCT          JPEG Uniform                 Run-length Huffman
8     LOT          JPEG Uniform                 Run-length Huffman
9     DWT          Visual Threshold             Run-length Huffman
10    DWT          Embedded Zerotree Wavelet    Adaptive Huffman

The simulation program can be configured to run all ten systems. Details of each component in the systems can be found in Chapter 2. However, modifications were made to make each type of quantizer and entropy coder work with the coefficients of each transform. Brief descriptions of each system are summarized as follows.

(a) The first three systems, (1-3), used the same type of quantizer and entropy coder. The quantizer and the entropy coder were computed according to the statistics of the coefficients and the quantized coefficients, respectively. Although these systems are not practical, they are helpful for investigating the effect of the quantizer and the coder on the PSNR performance. Lists of comparisons are shown in the next sections.

(b) The next three systems, (4-6), differed from the first three systems in the encoder block. The thesis used this difference to test the effects of the coder on the PSNR. In addition, the run-length Huffman coder for the DCT is the one used in the baseline JPEG standard as described in [9]. In the case of the LOT system, the code table was expanded to accommodate the larger range of coefficients. On the other hand, the run-length coder for the DWT system was computed from the statistics of the quantized coefficients because there was no standardized run-length coder for the DWT system.

(c) The next three systems, (7-9), used the same type of quantizer and encoder: the universal uniform quantizer and the run-length Huffman coder, respectively.
Together with the run-length coder, the DCT system imitated the baseline JPEG, while the LOT system attempted to simulate a baseline JPEG-like coder. The DWT system that used the visual threshold quantizer and the run-length coder represented the baseline JPEG-like coder for the wavelet.

(d) Finally, the EZW system uses a different approach to compress an image. One systematic way to compare this system with the other systems was to use the PSNR and the bit rate. This system was compared against the baseline JPEG-like coders to compare the PSNR performance of the last four systems.

The system evaluations were divided into three parts: the system complexity, the system performance in terms of the PSNR and bit rate, and the visual quality. Each comparison investigated the effects of one or more functional blocks in the system. Details of each comparison are given in the sections below.

3.1 Part I: System Complexity

The system complexity is an indicator of the cost of running the system. The system consists of three functional blocks: transform, quantizer, and entropy coder. The complexity of the transform was measured by the number of additions and multiplications. The number of operations includes all additions and multiplications for the transform of the whole image because only the DCT and the LOT are block transforms, whereas the DWT is not. The total costs show the amount of processing required for each transform. The complexity of the quantizers and encoders was measured by the cost of running the algorithms. In this thesis, emphasis was placed on the complexity of the transform block. The quantizer block and the entropy coder are briefly discussed to complete the system comparisons.

3.2 Part II: System Performance

This part of the simulation evaluated the performance in terms of bit rate and PSNR. The simulation compared different transform coding systems that differed in the choice of transform, quantizer, and entropy coder.
The purpose of this comparison was to evaluate the effect of each component of the system on the PSNR. The PSNR for an 8-bit image is given by

$$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2}, \qquad (3.1) $$

where N is the total number of pixels, $x_i$ is the original data, and $\hat{x}_i$ is the reconstructed signal. The PSNR is related to the mean square error (MSE). Although the PSNR may not represent the visual quality directly, it is a systematic way to compare the error of the reconstructed signal and can be computed exactly. The system performance comparison attempted to evaluate

(a) the effect of the number of levels on DWT systems,
(b) the effect of the transform,
(c) the effect of the quantizer,
(d) the effect of the entropy coder,
(e) and the overall system performance.

In order to evaluate these quantities, the comparisons were divided into multiple sections according to the above list. Details of each set of comparisons are explained below.

3.2.1 Part II-A: Effect of Number of Levels on DWT Systems

This part compared how the number of levels of the wavelet expansion affected the system performance. Comparisons were done to determine the trade-offs between the computational costs of the DWT and the PSNR gain of the DWT system. A list of systems in this part is shown in Table 3.2 below.

Table 3.2: List of systems for comparing the effect of the number of DWT levels on the PSNR performance

Transform      Quantizer           Entropy Coding
2-level DWT    Optimal Uniform     Huffman
3-level DWT    Optimal Uniform     Huffman
4-level DWT    Optimal Uniform     Huffman
2-level DWT    Visual Threshold    Run-Length Huffman
3-level DWT    Visual Threshold    Run-Length Huffman
4-level DWT    Visual Threshold    Run-Length Huffman
2-level DWT    EZW                 Adaptive Huffman
3-level DWT    EZW                 Adaptive Huffman
4-level DWT    EZW                 Adaptive Huffman

3.2.2 Part II-B: Effect of Transform

This part compared the PSNR performance of different transforms by restricting the choice of quantizer and entropy coder.
Because of these constraints, the PSNR performance should reflect the effectiveness of each transform in terms of its ability to extract the necessary information from the image. In this section, the optimal uniform quantizer was used because it was the only quantizer common to all three transforms. In addition, all transforms used either the Huffman or the run-length Huffman coder. As described in the background chapter, the optimal uniform quantizer gave the best PSNR performance for a given set of coefficients. A list of systems in this section is shown in Table 3.3 below.

Table 3.3: List of systems for comparing the effect of transforms on the PSNR performance using the optimal uniform quantizer, the Huffman coder, and the run-length Huffman coder

Transform      Quantizer          Entropy Coder
DCT            Optimal Uniform    Huffman
LOT            Optimal Uniform    Huffman
3-level DWT    Optimal Uniform    Huffman
DCT            Optimal Uniform    Run-length Huffman
LOT            Optimal Uniform    Run-length Huffman
3-level DWT    Optimal Uniform    Run-length Huffman

3.2.3 Part II-C: Effect of Quantizer

This part compared the effects of each quantizer on the PSNR performance. Comparisons were done by changing the quantizer block while keeping the transform and the entropy coder the same. This thesis compared the optimal uniform quantizer with the JPEG uniform quantizer for the DCT and the LOT. In the case of the DWT, the thesis compared the optimal uniform quantizer with the visual threshold uniform quantizer. In all cases, the only common encoder that could be used in all systems was the run-length Huffman coder. A list of the systems compared is shown in Table 3.4 below. In addition, this thesis compared the simulated baseline DCT JPEG with the actual baseline JPEG using XV version 3.10a on the Unix platform to validate the simulated DCT JPEG system.
Table 3.4: List of systems for comparing the effect of quantizers on the PSNR performance using the run-length Huffman coder

Transform      Quantizer          Entropy Coder
DCT            Optimal Uniform    Run-length Huffman
DCT            JPEG Uniform       Run-length Huffman
LOT            Optimal Uniform    Run-length Huffman
LOT            JPEG Uniform       Run-length Huffman
3-level DWT    Optimal Uniform    Run-length Huffman
3-level DWT    Visual Threshold   Run-length Huffman

3.2.4 Part II-D: Effect of Entropy Coder

This part compared the effect of the entropy coders on the PSNR. Comparisons were done by restricting the choice of transform and quantizer. Given the same symbols to be coded, the output bit rate should reflect the efficiency of the coder. Only two coders were compared in this thesis: the regular Huffman coder and the run-length Huffman coder. The implementation details are in the background chapter. A list of systems compared is shown in Table 3.5 below.

Table 3.5: List of systems for comparing the effect of entropy coders on the PSNR performance using Huffman and run-length Huffman coders

Transform      Quantizer         Entropy Coder
DCT            Optimal Uniform   Huffman
DCT            Optimal Uniform   Run-length Huffman
LOT            Optimal Uniform   Huffman
LOT            Optimal Uniform   Run-length Huffman
3-level DWT    Optimal Uniform   Huffman
3-level DWT    Optimal Uniform   Run-length Huffman

3.2.5 Part II-E: Overall System Performance

This part compared the current JPEG standard, which uses the baseline DCT JPEG, with the baseline LOT JPEG system, the visual threshold wavelet system, and the EZW system. The comparisons were intended to test the DWT systems against the existing JPEG standard and to determine the trade-offs among the four systems. The four selected systems are shown in Table 3.6 below.
Table 3.6: List of selected systems for comparing the PSNR performance

Transform      Quantizer          Entropy Coding
DCT            JPEG Uniform       Run-Length Huffman
LOT            JPEG Uniform       Run-Length Huffman
3-level DWT    Visual Threshold   Run-Length Huffman
4-level DWT    EZW                Adaptive Huffman

3.3 Part III: Visual Quality

This comparison discussed the visual artifacts of each system. This thesis was not intended to provide a way to evaluate the visual quality of the reconstructed image. Instead, the thesis pointed out the dominant artifacts of each system at low bit rates, because at high bit rates (1 bit/pixel or more) the reconstructed image exhibits no artifacts.

3.4 Test Images

Seven images were used in this simulation to reduce the image dependency of the comparisons. These are black-and-white (grayscale) images with a resolution of eight bits. A list of images is shown in Table 3.7 below.

Table 3.7: List of test images

Image            Dimension (pixels x pixels)
baboon           512x512
barbara2         720x576
bfrag(barbara)   512x512
boat             256x256
goldhill         720x576
lena             512x512
peppers          512x512

Chapter 4

Results and Discussions

This chapter presents the results and discussions of the simulated systems according to the list of system comparisons described in the last chapter.

4.1 Part I: System Complexity

The system complexity was evaluated in terms of the computational costs of the algorithms. The evaluation was done for all functional blocks: the transform, the quantizer, and the entropy coder. This thesis focused primarily on the transform block. However, the quantizer and the entropy coder were also evaluated to complete the system.

4.1.1 Transform

The transform complexities were calculated in terms of the number of additions and multiplications. The algorithm used for each transform is presented in the Background chapter. Given an RxC image, the computational costs of each transform are shown in Table 4.1. In the DWT system, F is the sum of the lengths of both filters, $F = F_1 + F_2$.
The DWT filters used in this thesis were the 9/7-tap biorthogonal filters. These costs were calculated for the odd-length biorthogonal filters only. However, the costs for the even-length biorthogonal filters should be similar to those of the odd-length filters. The values in Table 4.1 are the closed-form representations of the costs. In order to compare the systems more specifically, the costs of transforming a 512x512 image are summarized in Table 4.2 below. The DCT costs were normalized to 1.00.

Table 4.1: Computational costs of transform algorithms

Transform                           # Additions                                   # Multiplications
8x8 DCT                             $\frac{13}{2} RC$                             $4RC$
Type-I, fast LOT (M=16, V=8)        $12RC - 18(R + C)$                            $7RC - 4(R + C)$
L-level Circular Convolution DWT    $\frac{4}{3}(1 - (\frac{1}{4})^L) RC(F - 2)$   $\frac{4}{3}(1 - (\frac{1}{4})^L) RCF$
L-level Minimum Cost DWT            $\frac{4}{3}(1 - (\frac{1}{4})^L) RC(F - 2)$   $\frac{2}{3}(1 - (\frac{1}{4})^L) RC(F + 2)$

Table 4.2: Normalized costs of transforming a 512x512 image

Transform                             # Additions   # Multiplications
DCT                                   1.00          1.00
LOT                                   1.84          1.74
Circular Convolution, 2-Level DWT     2.69          5.00
Circular Convolution, 3-Level DWT     2.83          5.25
Circular Convolution, 4-Level DWT     2.86          5.31
Minimum cost, 2-Level DWT             2.69          2.82
Minimum cost, 3-Level DWT             2.83          2.95
Minimum cost, 4-Level DWT             2.86          2.98

According to the normalized costs above, the implementation of the DWT in this thesis was the most costly compared to those of the DCT and the LOT. The costs of the DWT were concentrated in the number of multiplications in the circular convolution process. Compared to the minimum cost DWT, the algorithm used in this thesis can be greatly improved. One solution is to take advantage of the symmetry of the biorthogonal filters. The number of multiplications can be reduced to almost one-half of the original cost by keeping a record of the previous multiplication results. There are also other algorithms that can reduce the computational costs of the DWT coefficients, such as the lifting algorithm [2]. On the other hand, the costs of the LOT were only about 75-85% more than those of the DCT system.
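The normalized costs in Table 4.2 can be checked numerically. The sketch below is an illustration rather than the thesis's own code. It assumes the closed forms of Table 4.1 read as follows: 13RC/2 additions and 4RC multiplications for the DCT; 12RC - 18(R + C) and 7RC - 4(R + C) for the LOT; (4/3)(1 - (1/4)^L)RC(F - 2) additions for both DWT variants; and (4/3)(1 - (1/4)^L)RCF or (2/3)(1 - (1/4)^L)RC(F + 2) multiplications for the circular convolution and minimum cost DWT respectively. It also assumes F = 9 + 7 = 16 for the 9/7-tap filters. All function names are hypothetical.

```python
def dct_cost(rows, cols):
    """8x8 block DCT: (additions, multiplications), as read from Table 4.1."""
    rc = rows * cols
    return 13 * rc / 2, 4 * rc


def lot_cost(rows, cols):
    """Type-I fast LOT: (additions, multiplications)."""
    rc = rows * cols
    return 12 * rc - 18 * (rows + cols), 7 * rc - 4 * (rows + cols)


def dwt_circular_cost(rows, cols, levels, filter_sum=16):
    """L-level DWT by circular convolution; filter_sum is F = F1 + F2."""
    scale = 4 * (1 - 0.25 ** levels) / 3
    rc = rows * cols
    return scale * rc * (filter_sum - 2), scale * rc * filter_sum


def dwt_minimum_cost(rows, cols, levels, filter_sum=16):
    """L-level DWT exploiting filter symmetry (the minimum cost case)."""
    scale = (1 - 0.25 ** levels) / 3
    rc = rows * cols
    return 4 * scale * rc * (filter_sum - 2), 2 * scale * rc * (filter_sum + 2)


def normalized(cost, reference):
    """Divide (additions, multiplications) by the reference pair."""
    return tuple(c / r for c, r in zip(cost, reference))


if __name__ == "__main__":
    reference = dct_cost(512, 512)
    print("LOT", normalized(lot_cost(512, 512), reference))
    for levels in (2, 3, 4):
        print(levels, "circular", normalized(dwt_circular_cost(512, 512, levels), reference))
        print(levels, "minimum ", normalized(dwt_minimum_cost(512, 512, levels), reference))
```

Under these assumptions the computed ratios agree with every entry of Table 4.2 to within about 0.01, which is the rounding precision of the table. The factor (4/3)(1 - (1/4)^L) is the geometric sum 1 + 1/4 + ... + (1/4)^(L-1), reflecting that each additional level operates on an LL band with one quarter as many samples.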
Another aspect of the algorithm comparisons was the similarity of the algorithms. The LOT algorithm reuses the DCT block transform algorithm. Therefore, the additional cost of implementing the LOT lies in the butterfly structure for the overlapped parts. In contrast, the DWT algorithm is completely different from the DCT algorithm. It does not use the block transform. As a result, changing the transform block from the DCT to the DWT is more expensive than changing it from the DCT to the LOT. In addition, the entropy coder may need modifications. Effectively, the whole system must be changed. The costs of changing to another system may or may not be acceptable for practical applications. The change may be made if the improvement can justify the cost. The effects of the other parts of the system are presented in the following sections.

4.1.2 Quantization

The complexity of the quantizers can be compared in a similar manner to the transform comparison. The quantizers were evaluated in terms of the cost of implementing their algorithms. In this thesis, all quantizers can be classified into three groups as shown below.

(a) Optimal uniform
(b) Universal uniform: JPEG uniform quantizer and visual threshold
(c) Embedded zerotree wavelet

The optimal uniform quantizer was used to compare the systems from the theoretical perspective only. This quantizer gives a fair comparison of different transforms because it maximizes the signal-to-noise ratio (SNR) of the quantized coefficients of a particular image. It is also useful because only the DCT system has a standard quantizer. To develop a quantizer for the LOT system and the DWT system, further research must be done separately and may take some time. Therefore, this optimal quantizer gives a fast way to compare the systems fairly.
Furthermore, this quantizer uses a recursive algorithm to compute the quantization levels, which is more computationally expensive than other types of quantizers. Therefore, this quantizer may not be attractive for practical systems.

On the other hand, the universal uniform quantizer uses a predefined set of quantization levels for all images. These levels are usually tested and verified to give good visual quality. The advantage of this type of quantizer is that the quantizer and the dequantizer can perform their functions properly regardless of the image, as long as they both agree on the quantization levels. The calculation of the quantized coefficient of the uniform quantizer, both the optimal one and the universal one, involves only a division operation, which can be implemented in either hardware or software.

Finally, the EZW quantizer combines bit-plane coding and zerotree searching in an efficient way. The compression is achieved by coding the tree structure as one symbol instead of coding the elements of the tree individually. Another advantage of the EZW quantizer is that the EZW algorithm can achieve a desired bit rate exactly, which might be useful for certain applications. Despite these advantages, the zerotree search algorithm used in this thesis was repetitive and slow, because once a zerotree was found for some threshold, the whole tree was tested again against each subsequent threshold until all elements in the tree were found significant or the bit budget was exhausted. Improvements are needed for the version of EZW in this thesis in order to compete with other quantizers and coders. Some algorithms have been developed to optimize the zerotree searching. For example, set partitioning in hierarchical trees (SPIHT) might be another alternative for the DWT system [10].

4.1.3 Entropy Coding

The entropy coder allows the computation of the bit rate of each system. This thesis did not emphasize the choice of a particular type of coder.
In order to complete the system, the coders used in this thesis are briefly compared below. There are three coders used in this thesis: Huffman, adaptive Huffman, and run-length Huffman. Compared to the Huffman coder, the adaptive Huffman coder is more costly in terms of computation. The adaptive Huffman coder required recomputation of the code words at certain intervals. The update of the code table increased the overhead of the EZW algorithm used in this thesis. Nevertheless, the adaptiveness of the coder may allow it to be locally optimized to code particular regions of the image. On the other hand, the run-length Huffman coder is easy to implement. This coder changes the definition of the symbols from single symbols to strings of symbols. This coder is particularly useful for systems that produce long strings of repeated numbers or characters, such as the zeros in the case of the DCT JPEG and the LOT JPEG. In the case of the DWT, the size of each band varies according to the DWT level. Therefore, the DWT system may require multiple sets of Huffman tables in order to compress the data efficiently. Future study may be done to find a universal run-length coder for the visual threshold system and the EZW system.

4.2 Part II: System Performance

The performance comparison of all systems was done in terms of the PSNR and the bit rate. The comparisons were divided into five parts as mentioned in the previous chapter. The results and discussions of those five parts are presented below.

4.2.1 Part II-A: Effect of Number of Levels on DWT Systems

As previously described in the background section, the DWT coefficients are affected by the number of DWT levels in the system. Specifically, the LL band of the system changes in size and value according to the number of filter bank stages. These comparisons investigated the effect of the number of levels on the PSNR performance for different combinations of quantizers and entropy coders.
Furthermore, these comparisons were used to find the optimal level for the DWT systems, which were then compared with the DCT and the LOT systems in the subsequent comparisons. Three combinations of quantizer and entropy coder were used in these comparisons. The systems are listed in Table 4.3 below. The optimal uniform quantizer used a scaling factor of 0.35, and the run-length Huffman coder limited the maximum category size to 11 and the maximum run size to 15. This coder is similar to the baseline JPEG run-length Huffman coder.

Table 4.3: List of systems for comparing the effect of DWT levels on the PSNR performance of the DWT systems

Transform      Quantizer          Entropy Coder
2-Level DWT    Optimal Uniform    Run-Length Huffman
3-Level DWT    Optimal Uniform    Run-Length Huffman
4-Level DWT    Optimal Uniform    Run-Length Huffman
2-Level DWT    Visual Threshold   Run-Length Huffman
3-Level DWT    Visual Threshold   Run-Length Huffman
4-Level DWT    Visual Threshold   Run-Length Huffman
2-Level DWT    EZW                Adaptive Huffman
3-Level DWT    EZW                Adaptive Huffman
4-Level DWT    EZW                Adaptive Huffman

The results of the simulation on all seven images are summarized in Tables 4.4 to 4.9 below. The average bit rates presented in the tables were the set {0.50, 0.75, 1.00, 1.25, 1.50, 1.75}. The PSNR values were linearly interpolated to these bit rates because, with the exception of the EZW system, the simulated systems could not be fixed at an exact bit rate. The graphs of the EZW systems for some images are shown in Figure 4-1 below.

According to the results, the optimal uniform quantizer and the visual threshold quantizer were unaffected by the number of DWT levels. In fact, as the output resolution of the visual threshold was varied from 32 to 16 or 64 pixels per degree, the PSNR performance remained unaffected. On the other hand, the PSNR performance of the EZW system was significantly improved as the number of levels was changed from two to three.
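The linear interpolation used to report the PSNR at the common bit rates can be sketched as follows. This is only an illustration of the idea, not the code used for the tables: the function name, the use of `numpy.interp`, and the refusal to extrapolate outside the measured range are assumptions, and the measured points in the usage example are hypothetical.

```python
import numpy as np

# The common bit rates (bit/pixel) at which the tables report PSNR.
TARGET_RATES = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75])


def psnr_at_target_rates(measured_rates, measured_psnr, targets=TARGET_RATES):
    """Linearly interpolate measured (bit rate, PSNR) points at the target rates."""
    rates = np.asarray(measured_rates, dtype=np.float64)
    psnr = np.asarray(measured_psnr, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)

    # numpy.interp expects increasing sample positions, so sort by bit rate.
    order = np.argsort(rates)
    rates, psnr = rates[order], psnr[order]

    # numpy.interp clamps outside the sampled range; reject that case instead
    # of silently reporting an extrapolated (constant) value.
    if targets.min() < rates[0] or targets.max() > rates[-1]:
        raise ValueError("target bit rates fall outside the measured range")

    return np.interp(targets, rates, psnr)


if __name__ == "__main__":
    # Hypothetical operating points of one coder on one image.
    rates = [0.42, 0.63, 0.88, 1.10, 1.37, 1.61, 1.90]
    psnr = [30.1, 32.0, 33.6, 34.7, 35.9, 36.8, 37.9]
    print(psnr_at_target_rates(rates, psnr))
```

Because the rate-distortion curve is concave rather than piecewise linear, values obtained this way are approximations whose accuracy depends on how closely the measured operating points bracket each target rate.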
The EZW improved the PSNR by about 0.5 to 4 dB as the level was

Table 4.4: Effect of DWT levels on the PSNR performance of the DWT systems using the optimal uniform quantizer and the run-length Huffman coder

Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
baboon     0.50       (23.35)              0.65                  0.65
           0.75       (25.95)              0.21                  0.21
           1.00       (27.21)              0.30                  0.26
           1.25       (28.86)              0.09                  0.10
           1.50       (30.11)              0.23                  0.22
           1.75       (31.63)              0.31                  0.22
barbara2   0.50       (27.82)              1.43                  1.56
           0.75       (30.85)              0.54                  0.63
           1.00       (32.94)              0.94                  1.00
           1.25       (34.99)              0.41                  0.49
           1.50       (36.35)              0.42                  0.49
           1.75       (37.67)              0.39                  0.45
bfrag      0.50       (28.08)              1.33                  1.46
           0.75       (32.06)              0.68                  0.76
           1.00       (33.97)              0.34                  0.38
           1.25       (35.13)              0.99                  0.98
           1.50       (37.66)              0.42                  0.48
           1.75       (38.97)              0.68                  0.65
btfrag     0.50       (27.75)              0.97                  0.98
           0.75       (30.79)              0.72                  0.88
           1.00       (33.53)              0.66                  0.64
           1.25       (35.46)              0.29                  0.23
           1.50       (38.51)              0.18                  0.17
           1.75       (39.94)              0.18                  0.08

Table 4.5: Effect of DWT levels on the PSNR performance of the DWT systems using the optimal uniform quantizer and the run-length Huffman coder (cont.)
Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
goldhill   0.50       (31.09)              0.72                  0.85
           0.75       (33.79)              0.46                  0.53
           1.00       (35.47)              0.49                  0.47
           1.25       (36.93)              0.19                  0.27
           1.50       (37.76)              0.53                  0.53
           1.75       (39.23)              0.20                  0.10
lena       0.50       (33.36)              1.61                  1.67
           0.75       (36.27)              0.85                  0.90
           1.00       (38.19)              0.31                  0.33
           1.25       (39.38)              0.26                  0.20
           1.50       (40.65)              0.11                  0.16
           1.75       (41.66)              0.47                  0.58
peppers    0.50       (31.94)              1.80                  1.91
           0.75       (35.44)              0.35                  0.37
           1.00       (36.59)              0.27                  0.29
           1.25       (37.76)              0.12                  0.14
           1.50       (38.76)              0.21                  0.24
           1.75       (40.04)              0.00                  -0.93

Table 4.6: Effect of DWT levels on the PSNR performance of the DWT systems using the visual threshold quantizer and the run-length Huffman coder

Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
baboon     0.50       (23.80)              0.20                  0.22
           0.75       (25.75)              0.18                  0.17
           1.00       (27.23)              0.11                  0.10
           1.25       (28.59)              0.13                  0.12
           1.50       (29.75)              0.06                  0.07
           1.75       (30.97)              0.06                  0.05
barbara2   0.50       (29.02)              0.42                  0.44
           0.75       (31.53)              0.39                  0.41
           1.00       (33.71)              0.25                  0.27
           1.25       (35.35)              0.27                  0.28
           1.50       (36.70)              0.11                  0.13
           1.75       (37.84)              0.28                  0.31
bfrag      0.50       (28.83)              0.59                  0.68
           0.75       (31.91)              0.50                  0.50
           1.00       (33.94)              0.33                  0.28
           1.25       (35.78)              0.22                  0.18
           1.50       (37.27)              0.34                  0.37
           1.75       (38.92)              0.14                  0.09
btfrag     0.50       (28.22)              0.24                  0.20
           0.75       (31.60)              0.32                  0.28
           1.00       (34.48)              0.07                  0.07
           1.25       (36.65)              0.16                  0.21
           1.50       (38.35)              0.13                  -0.04
           1.75       (37.75)              0.20                  0.13

Table 4.7: Effect of DWT levels on the PSNR performance of the DWT systems using the visual threshold quantizer and the run-length Huffman coder (cont.)
Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
goldhill   0.50       (32.30)              0.13                  0.13
           0.75       (34.26)              0.17                  0.14
           1.00       (35.73)              0.11                  0.14
           1.25       (37.05)              0.16                  0.09
           1.50       (38.27)              0.07                  -0.06
           1.75       (39.22)              0.10                  0.08
lena       0.50       (34.64)              0.32                  0.32
           0.75       (36.86)              0.19                  0.22
           1.00       (38.26)              0.22                  0.19
           1.25       (39.37)              0.31                  0.28
           1.50       (40.64)              0.14                  0.09
           1.75       (41.77)              0.17                  0.11
peppers    0.50       (33.97)              0.24                  0.19
           0.75       (35.63)              0.13                  0.05
           1.00       (36.82)              -0.03                 -0.05
           1.25       (37.71)              0.05                  0.03
           1.50       (38.50)              -0.03                 -0.05
           1.75       (39.34)              0.15                  -0.01

Table 4.8: Effect of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder

Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
baboon     0.50       (22.20)              1.20                  1.68
           0.75       (23.73)              1.01                  1.25
           1.00       (24.81)              2.04                  2.19
           1.25       (26.99)              0.65                  0.77
           1.50       (27.70)              0.77                  0.99
           1.75       (28.55)              1.15                  1.56
barbara2   0.50       (24.57)              3.02                  3.94
           0.75       (27.05)              3.20                  3.84
           1.00       (28.76)              3.28                  4.70
           1.25       (30.30)              3.30                  3.60
           1.50       (32.68)              2.53                  3.03
           1.75       (34.50)              2.37                  2.99
bfrag      0.50       (23.95)              3.32                  4.54
           0.75       (26.93)              3.65                  4.21
           1.00       (28.72)              3.55                  4.97
           1.25       (31.29)              3.59                  3.95
           1.50       (33.05)              2.93                  3.43
           1.75       (35.21)              2.53                  3.14
btfrag     0.50       (26.36)              2.69                  3.69
           0.75       (28.48)              2.99                  3.55
           1.00       (31.06)              3.52                  3.77
           1.25       (32.72)              2.80                  3.14
           1.50       (35.09)              2.03                  2.73
           1.75       (36.09)              3.30                  3.46

Table 4.9: Effect of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder (cont.)
Image      Bit Rate   2 Levels PSNR (dB)   3 Levels ΔPSNR (dB)   4 Levels ΔPSNR (dB)
goldhill   0.50       (27.44)              3.60                  4.38
           0.75       (29.79)              3.18                  4.24
           1.00       (31.48)              2.98                  3.41
           1.25       (32.95)              2.56                  3.01
           1.50       (34.55)              2.26                  3.08
           1.75       (35.50)              2.79                  3.01
lena       0.50       (28.50)              4.69                  6.02
           0.75       (31.67)              4.19                  4.72
           1.00       (33.92)              3.21                  3.93
           1.25       (35.91)              2.91                  3.29
           1.50       (37.01)              2.63                  2.89
           1.75       (38.59)              1.63                  1.92
peppers    0.50       (26.40)              3.60                  4.40
           0.75       (30.02)              3.18                  4.24
           1.00       (32.60)              2.98                  3.41
           1.25       (34.09)              2.56                  3.01
           1.50       (35.64)              2.26                  3.08
           1.75       (36.34)              2.79                  3.01

[Figure 4-1 plots: six panels (barbara2.raw, baboon.raw, bfrag.raw, goldhill.raw, lena.raw, peppers.raw), each showing PSNR against average bit rate (bit per pixel) for the 2-Level, 3-Level, and 4-Level systems.]

Figure 4-1: Effect of number of DWT levels on the PSNR performance of the DWT systems using the EZW quantizer and the adaptive Huffman coder

changed from two to three, and gained about 1 dB as the level was changed from three to four. This result clearly indicates that the number of levels affects the PSNR performance of the EZW system, but does not affect the DWT system with the uniform quantizer. Therefore, the number of DWT levels can be limited to three for the systems that use a uniform quantizer, such as the optimal uniform quantizer or the visual threshold. The EZW may gain an advantage over other systems by using wavelet expansions of four or more levels.
However, the trade-off between the cost of computing the additional DWT level and the gain in image quality must be considered.

4.2.2 Part II-B: Effect of Transform

This section compared the effect of the choice of the transform block on the PSNR performance. Comparisons were done by restricting the choice of quantizer and entropy coder. Because the only difference between the systems is the transform block, the differences in the PSNR should be mainly due to the transform block. The list of systems tested is shown in Table 4.10.

Table 4.10: List of systems for comparing the effect of transforms on the PSNR performance

Transform      Quantizer         Entropy Coder
DCT            Optimal Uniform   Huffman
LOT            Optimal Uniform   Huffman
3-Level DWT    Optimal Uniform   Huffman
DCT            Optimal Uniform   Run-Length Huffman
LOT            Optimal Uniform   Run-Length Huffman
3-Level DWT    Optimal Uniform   Run-Length Huffman

The DWT system with the optimal uniform quantizer used 3 levels because the results from the previous section showed that there was no significant improvement from increasing the DWT level to four. The optimal uniform quantizer used a scaling factor of 0.35 for all transforms. Finally, the run-length Huffman table for the DWT system was generated from the statistics of the quantized coefficients because there is no standard coder for the DWT system. The PSNR performance of all systems listed above is summarized in Tables 4.11 to 4.14 below.
Table 4.11: Effect of transforms on the PSNR performance using the optimal uniform quantizer and the Huffman coder

Image      Bit Rate   DCT PSNR (dB)   LOT ΔPSNR (dB)   DWT ΔPSNR (dB)
baboon     0.50       (23.66)         0.31             -0.20
           0.75       (25.29)         0.32             -0.17
           1.00       (26.82)         0.32             0.14
           1.25       (28.20)         0.30             -0.06
           1.50       (29.53)         0.25             -0.03
           1.75       (30.82)         0.22             -0.12
barbara2   0.50       (28.05)         0.70             -0.96
           0.75       (30.47)         0.47             -0.75
           1.00       (32.25)         0.59             -0.48
           1.25       (33.88)         0.61             -0.80
           1.50       (35.48)         0.59             -0.71
           1.75       (37.00)         0.49             -1.51
bfrag      0.50       (27.16)         0.98             -0.11
           0.75       (30.00)         1.23             0.28
           1.00       (32.70)         1.25             0.55
           1.25       (35.17)         0.80             -0.55
           1.50       (37.02)         0.54             -1.34
           1.75       (38.51)         0.38             -1.75
btfrag     0.50       (28.79)         0.55             0.16
           0.75       (31.56)         0.67             -0.67
           1.00       (33.92)         0.59             -0.87
           1.25       (35.82)         0.42             -0.94
           1.50       (37.43)         0.47             -1.24
           1.75       (38.92)         0.28             -0.94

According to the results in Tables 4.11 to 4.14, when the optimal uniform quantizer and the Huffman coder were used, the LOT performed the best of the three transforms across all images. The PSNR gain of the LOT over the DCT was as high as 1 dB in some cases. On the other hand, the 3-level DWT system had lower PSNR than the other two systems. For some images, the PSNR of the DWT system fell as much as 1.75 dB below that of the DCT system.

Table 4.12: Effect of transforms on the PSNR performance using the optimal uniform quantizer and the Huffman coder (cont.)
Image      Bit Rate   DCT PSNR (dB)   LOT ΔPSNR (dB)   DWT ΔPSNR (dB)
goldhill   0.50       (31.48)         0.41             -0.40
           0.75       (33.27)         0.37             -1.21
           1.00       (34.73)         0.29             -1.69
           1.25       (36.04)         0.33             -0.93
           1.50       (37.28)         0.33             -0.47
           1.75       (38.54)         0.20             -1.16
lena       0.50       (32.71)         0.59             0.15
           0.75       (35.00)         0.44             -0.54
           1.00       (36.61)         0.22             -0.33
           1.25       (37.81)         0.31             -0.70
           1.50       (39.00)         0.31             -0.47
           1.75       (39.99)         0.27             -0.77
peppers    0.50       (31.89)         0.41             -0.32
           0.75       (33.81)         0.26             -1.07
           1.00       (35.05)         0.26             -0.96
           1.25       (36.02)         0.23             -0.21
           1.50       (36.91)         0.31             -0.65
           1.75       (37.84)         0.27             0.06

Table 4.13: Effect of transforms on the PSNR performance using the optimal uniform quantizer and the run-length Huffman coder

Image      Bit Rate   DCT PSNR (dB)   LOT ΔPSNR (dB)   DWT ΔPSNR (dB)
baboon     0.50       (23.28)         0.17             0.73
           0.75       (25.01)         0.23             1.15
           1.00       (26.69)         0.21             0.81
           1.25       (28.16)         0.14             0.80
           1.50       (29.50)         0.07             0.83
           1.75       (30.80)         0.18             1.14
barbara2   0.50       (28.72)         0.49             0.52
           0.75       (31.29)         0.48             0.10
           1.00       (33.48)         0.48             0.40
           1.25       (35.31)         0.25             0.09
           1.50       (36.79)         0.30             -0.02
           1.75       (38.07)         0.11             -0.01
bfrag      0.50       (27.99)         1.17             1.42
           0.75       (31.82)         1.00             0.92
           1.00       (34.63)         0.65             -0.32
           1.25       (36.64)         0.59             -0.52
           1.50       (38.29)         0.35             -0.21
           1.75       (39.60)         0.21             0.05
btfrag     0.50       (29.16)         -1.46            -0.44
           0.75       (32.08)         -1.06            -0.57
           1.00       (34.97)         -0.99            -0.78
           1.25       (36.96)         -1.04            -1.21
           1.50       (38.72)         -0.76            -0.03
           1.75       (40.12)         -0.55            0.00

Table 4.14: Effect of transforms on the PSNR performance using the optimal uniform quantizer and the run-length Huffman coder (cont.)
Image      Bit Rate   DCT PSNR (dB)   LOT ΔPSNR (dB)   DWT ΔPSNR (dB)
goldhill   0.50       (32.06)         0.21             -0.24
           0.75       (34.10)         0.22             0.16
           1.00       (35.51)         0.21             0.45
           1.25       (36.77)         0.20             0.35
           1.50       (37.91)         0.07             0.39
           1.75       (38.80)         0.20             0.63
lena       0.50       (34.26)         0.08             0.72
           0.75       (36.55)         0.07             0.56
           1.00       (37.97)         -0.05            0.53
           1.25       (38.88)         0.07             0.76
           1.50       (39.76)         0.15             1.00
           1.75       (40.68)         0.18             1.45
peppers    0.50       (33.54)         -0.29            0.19
           0.75       (35.30)         -0.32            0.49
           1.00       (35.71)         -0.35            1.15
           1.25       (36.07)         -0.12            1.81
           1.50       (37.10)         -0.18            1.87
           1.75       (38.28)         0.00             1.76

When the coder was changed to the run-length Huffman coder, the PSNR results of all transforms were comparable. The LOT system gave a slightly higher PSNR than the DCT system, up to 0.5 dB in some cases. For the btfrag and peppers images, the DCT system gave a slightly higher PSNR than the LOT system, by about 1 dB in the case of the btfrag image. The 3-level DWT system performed better than the DCT system on all images except btfrag. The performance gain of the DWT system was as high as 1.80 dB for the peppers image.

This comparison shows a small positive gain in PSNR for the LOT system with the optimal uniform quantizer in comparison with the DCT system in most cases. On the other hand, the performance of the DWT depends on the type of encoder. The gain in the LOT system is too small to confirm that it is obtained from the transform block. In the case of the DWT system, the gain clearly does not depend on the transform block. The two comparison results above suggest that the DWT system gains PSNR performance at least partly from the entropy coder block. The influence of the quantizer block and the entropy coder block is shown in the next comparisons.

4.2.3 Part II-C: Effect of Quantizer

The following comparisons investigated the effect of the choice of the quantizer on the PSNR performance.
Specifically, the comparison focused on the optimal quantizer and the universal quantizer because practical systems usually use the universal quantizer. By constraining the choice of the transform and the entropy coder, the comparison isolated the effect on PSNR due to the quantizer alone. The systems tested are listed in Table 4.15. The comparison results are summarized in Tables 4.16 to 4.21 below. The optimal uniform quantizer used a scaling factor of 0.35 for all systems. The visual threshold quantizer in the DWT system used an output resolution of 32 pixels/degree. The output resolution did not significantly change the PSNR performance. This thesis also experimented with 16 pixels/degree and 64 pixels/degree, and the results were almost the same as those of the system that used 32 pixels/degree. Therefore, the system with the output resolution of 32 pixels/degree was used to represent the visual threshold system in all comparisons.

Table 4.15: List of systems for comparing the effect of quantizers on the PSNR performance using the run-length Huffman coder

Transform      Quantizer          Entropy Coder
DCT            Optimal Uniform    Run-Length Huffman
DCT            JPEG Uniform       Run-Length Huffman
LOT            Optimal Uniform    Run-Length Huffman
LOT            JPEG Uniform       Run-Length Huffman
3-Level DWT    Optimal Uniform    Run-Length Huffman
3-Level DWT    Visual Threshold   Run-Length Huffman

The EZW system was not compared in this section because the encoder used in the EZW system was not the same as that of the visual threshold system. However, the comparison was made at the system level in a later section.

According to the results, the optimal uniform quantizer systems had better PSNR than the universal uniform quantizer systems across most images, as expected. However, the difference between the two quantizers was small for all transforms, about 0.5 dB.
As a result, the universal uniform quantizer can be used in a practical system in place of the optimal uniform quantizer without losing much of the PSNR performance.

4.2.4 Part II-D: Effect of Entropy Coder

The effect of the choice of entropy coder on the PSNR performance was compared by applying different encoding schemes to the quantizer output. The coders compared in this thesis were the Huffman coder and the run-length Huffman coder. The complete list of the tested systems is shown in Table 4.22. The results of the compared systems are summarized in Tables 4.23 to 4.28. The optimal uniform quantizer in all systems used a scaling factor of 0.35.

According to the results, the run-length Huffman coder gave slightly better PSNR performance than the Huffman coder for the DCT system and the LOT system with the optimal uniform quantizer. The PSNR gain was about 0.5-1.0 dB. On the other hand, the visual threshold wavelet

Table 4.16: Effect of quantizers on the PSNR performance of the DCT systems using the run-length Huffman coder

Image      Bit Rate   Optimal Uniform PSNR (dB)   JPEG Uniform ΔPSNR (dB)
baboon     0.50       (23.28)                     0.38
           0.75       (25.01)                     0.17
           1.00       (26.69)                     -0.29
           1.25       (28.16)                     -0.61
           1.50       (29.50)                     -0.85
           1.75       (30.80)                     -1.02
barbara2   0.50       (28.72)                     0.25
           0.75       (31.29)                     0.00
           1.00       (33.48)                     -0.44
           1.25       (35.31)                     -0.78
           1.50       (36.79)                     -0.91
           1.75       (38.07)                     -0.94
bfrag      0.50       (27.99)                     0.19
           0.75       (31.82)                     -0.54
           1.00       (34.63)                     -0.86
           1.25       (36.64)                     -0.90
           1.50       (38.29)                     -0.93
           1.75       (39.60)                     -0.85
btfrag     0.50       (29.16)                     0.36
           0.75       (32.08)                     0.27
           1.00       (34.97)                     -0.36
           1.25       (36.96)                     -0.48
           1.50       (38.72)                     -0.65
           1.75       (40.12)                     -0.66

Table 4.17: Effect of quantizers on the PSNR performance of the DCT systems using the run-length Huffman coder (cont.)
Image      Bit Rate   Optimal Uniform PSNR (dB)   JPEG Uniform ΔPSNR (dB)
goldhill   0.50       (32.06)                     0.29
           0.75       (34.10)                     0.04
           1.00       (35.51)                     0.04
           1.25       (36.77)                     0.08
           1.50       (37.91)                     -0.17
           1.75       (38.80)                     -0.17
lena       0.50       (34.26)                     0.29
           0.75       (36.55)                     -0.09
           1.00       (37.97)                     -0.20
           1.25       (38.88)                     0.01
           1.50       (39.76)                     0.00
           1.75       (40.68)                     -0.05
peppers    0.50       (33.54)                     0.24
           0.75       (35.30)                     -0.03
           1.00       (35.71)                     0.57
           1.25       (36.07)                     0.98
           1.50       (37.10)                     0.65
           1.75       (38.28)                     0.11

Table 4.18: Effect of quantizers on the PSNR performance of the LOT systems using the run-length Huffman coder

Image      Bit Rate   Optimal Uniform PSNR (dB)   JPEG Uniform ΔPSNR (dB)
baboon     0.50       (23.45)                     0.52
           0.75       (25.24)                     0.24
           1.00       (26.90)                     -0.19
           1.25       (28.30)                     -0.44
           1.50       (29.58)                     -0.61
           1.75       (30.98)                     -0.94
barbara2   0.50       (29.21)                     0.48
           0.75       (31.77)                     0.08
           1.00       (33.96)                     -0.38
           1.25       (35.56)                     -0.49
           1.50       (37.09)                     -0.70
           1.75       (38.18)                     -0.61
bfrag      0.50       (29.15)                     0.51
           0.75       (32.83)                     -0.22
           1.00       (35.28)                     -0.47
           1.25       (37.64)                     -0.62
           1.50       (38.64)                     -0.56
           1.75       (39.81)                     -0.50
btfrag     0.50       (27.70)                     2.17
           0.75       (31.02)                     1.63
           1.00       (33.98)                     0.83
           1.25       (35.92)                     0.73
           1.50       (37.96)                     0.22
           1.75       (39.58)                     -0.05

Table 4.19: Effect of quantizers on the PSNR performance of the LOT systems using the run-length Huffman coder (cont.)
Image      Bit Rate   Optimal Uniform PSNR (dB)   JPEG Uniform ΔPSNR (dB)
goldhill   0.50       (32.26)                     0.45
           0.75       (34.33)                     0.16
           1.00       (35.72)                     0.10
           1.25       (36.97)                     -0.01
           1.50       (37.98)                     -0.03
           1.75       (39.00)                     -0.14
lena       0.50       (34.34)                     0.58
           0.75       (36.62)                     0.07
           1.00       (37.92)                     0.02
           1.25       (38.96)                     0.05
           1.50       (39.91)                     0.02
           1.75       (40.86)                     -0.07
peppers    0.50       (33.25)                     0.65
           0.75       (34.98)                     0.33
           1.00       (35.36)                     0.93
           1.25       (35.95)                     1.12
           1.50       (36.93)                     0.85
           1.75       (38.29)                     0.17

Table 4.20: Effect of quantizers on the PSNR performance of the DWT systems using the run-length Huffman coder

Image      Bit Rate   Optimal Uniform PSNR (dB)   Visual Threshold ΔPSNR (dB)
baboon     0.50       (24.00)                     0.00
           0.75       (26.16)                     -0.23
           1.00       (27.51)                     -0.17
           1.25       (28.96)                     -0.24
           1.50       (30.34)                     -0.53
           1.75       (31.94)                     -0.91
barbara2   0.50       (29.24)                     0.21
           0.75       (31.39)                     0.53
           1.00       (33.88)                     0.08
           1.25       (35.40)                     0.21
           1.50       (36.77)                     0.04
           1.75       (38.06)                     0.06
bfrag      0.50       (29.41)                     0.02
           0.75       (32.74)                     -0.33
           1.00       (34.31)                     -0.04
           1.25       (36.12)                     -0.12
           1.50       (38.08)                     -0.47
           1.75       (39.65)                     -0.59
btfrag     0.50       (28.72)                     -0.26
           0.75       (31.51)                     0.41
           1.00       (34.19)                     0.35
           1.25       (35.75)                     1.06
           1.50       (38.69)                     -0.22
           1.75       (40.12)                     -0.18

Table 4.21: Effect of quantizers on the PSNR performance of the DWT systems using the run-length Huffman coder (cont.)
Image      Bit Rate   Optimal Uniform PSNR (dB)   Visual Threshold ΔPSNR (dB)
goldhill   0.50       (31.82)                     0.63
           0.75       (34.26)                     0.17
           1.00       (35.96)                     -0.12
           1.25       (37.12)                     0.09
           1.50       (38.30)                     0.04
           1.75       (39.43)                     -0.12
lena       0.50       (34.98)                     -0.02
           0.75       (37.12)                     -0.06
           1.00       (38.50)                     -0.03
           1.25       (39.64)                     0.04
           1.50       (40.75)                     0.03
           1.75       (42.13)                     -0.36
peppers    0.50       (33.73)                     0.48
           0.75       (35.79)                     -0.04
           1.00       (36.86)                     -0.06
           1.25       (37.88)                     -0.12
           1.50       (38.97)                     -0.50
           1.75       (40.05)                     -0.55

Table 4.22: List of systems for comparing the effect of entropy coders on the PSNR performance using the optimal uniform quantizer

Transform      Quantizer         Entropy Coder
DCT            Optimal Uniform   Huffman
DCT            Optimal Uniform   Run-Length Huffman
LOT            Optimal Uniform   Huffman
LOT            Optimal Uniform   Run-Length Huffman
3-level DWT    Optimal Uniform   Huffman
3-level DWT    Optimal Uniform   Run-Length Huffman

system gained about 2-3 dB above the DWT system with the Huffman coder across most images.

The comparisons in the previous section showed that the choice between the optimal uniform quantizer and the universal uniform quantizer had very little effect on the PSNR performance. Therefore, these results should remain valid for the systems that use the universal uniform quantizer as well. The results suggest that the DWT system can gain a significant increase in PSNR when using the run-length Huffman coder with the universal uniform quantizer. Although this simulation computed the run-length Huffman table for the DWT system from the statistics of the coefficients, an optimized universal run-length coder should give similar or slightly lower PSNR performance for the DWT system with the uniform quantizer. An actual system combining the visual threshold wavelet and the run-length coder may achieve at least the same PSNR performance as the current baseline JPEG system.

4.2.5 Part II-E: Overall System Performance

This section compared the overall performance of the systems.
Each transform could be combined with any choice of quantizer and entropy coder to optimize its performance. According to the earlier comparisons, the universal uniform quantizer gave similar PSNR performance to the optimal uniform quantizer for all transforms. In addition, the universal quantizer represents the practical system, which is the main focus of most applications. As a result, the systems with the optimal uniform quantizer were excluded from this comparison.

Table 4.23: Effect of entropy coders on the PSNR performance of the DCT systems using the optimal uniform quantizer

Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
baboon    0.50            (23.66)            -0.38
          0.75            (25.29)            -0.28
          1.00            (26.82)            -0.12
          1.25            (28.20)            -0.04
          1.50            (29.53)            -0.03
          1.75            (30.82)            -0.01
barbara2  0.50            (28.05)             0.67
          0.75            (30.47)             0.82
          1.00            (32.25)             1.23
          1.25            (33.88)             1.43
          1.50            (35.48)             1.32
          1.75            (37.00)             1.07
bfrag     0.50            (27.16)             0.83
          0.75            (30.00)             1.82
          1.00            (32.70)             1.93
          1.25            (35.17)             1.46
          1.50            (37.03)             1.26
          1.75            (38.51)             1.10
btfrag    0.50            (28.79)             0.37
          0.75            (31.56)             0.51
          1.00            (33.91)             1.05
          1.25            (35.82)             1.14
          1.50            (37.43)             1.29
          1.75            (38.92)             1.20

Table 4.24: Effect of entropy coders on the PSNR performance of the DCT systems using the optimal uniform quantizer (cont.)
Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
goldhill  0.50            (31.48)             0.57
          0.75            (33.27)             0.83
          1.00            (34.73)             0.78
          1.25            (36.04)             0.73
          1.50            (37.28)             0.63
          1.75            (38.54)             0.26
lena      0.50            (32.71)             1.55
          0.75            (35.00)             1.55
          1.00            (36.61)             1.36
          1.25            (37.81)             1.07
          1.50            (39.00)             0.76
          1.75            (39.99)             0.69
peppers   0.50            (31.89)             1.66
          0.75            (33.81)             1.49
          1.00            (35.05)             0.65
          1.25            (36.02)             0.05
          1.50            (36.91)             0.20
          1.75            (37.84)             0.45

Table 4.25: Effect of entropy coders on the PSNR performance of the LOT systems using the optimal uniform quantizer

Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
baboon    0.50            (23.98)            -0.52
          0.75            (25.61)            -0.37
          1.00            (27.13)            -0.23
          1.25            (28.49)            -0.19
          1.50            (29.78)            -0.20
          1.75            (31.03)            -0.05
barbara2  0.50            (28.74)             0.47
          0.75            (30.95)             0.82
          1.00            (32.84)             1.12
          1.25            (34.50)             1.06
          1.50            (36.07)             1.02
          1.75            (37.49)             0.70
bfrag     0.50            (28.14)             1.01
          0.75            (31.23)             1.59
          1.00            (33.96)             1.32
          1.25            (35.97)             1.25
          1.50            (37.57)             1.07
          1.75            (38.89)             0.92
btfrag    0.50            (29.34)            -1.63
          0.75            (32.23)            -1.21
          1.00            (34.51)            -0.53
          1.25            (36.24)            -0.32
          1.50            (37.90)             0.06
          1.75            (39.20)             0.37

Table 4.26: Effect of entropy coders on the PSNR performance of the LOT systems using the optimal uniform quantizer (cont.)
Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
goldhill  0.50            (31.89)             0.37
          0.75            (33.63)             0.69
          1.00            (35.02)             0.70
          1.25            (36.37)             0.60
          1.50            (37.61)             0.37
          1.75            (38.75)             0.26
lena      0.50            (33.30)             1.04
          0.75            (35.44)             1.18
          1.00            (36.82)             1.10
          1.25            (38.12)             0.84
          1.50            (39.31)             0.59
          1.75            (40.26)             0.60
peppers   0.50            (32.29)             0.96
          0.75            (34.07)             0.90
          1.00            (35.31)             0.04
          1.25            (36.24)            -0.29
          1.50            (37.22)            -0.29
          1.75            (38.11)             0.18

Table 4.27: Effect of entropy coders on the PSNR performance of the DWT systems using the optimal uniform quantizer

Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
baboon    0.50            (23.47)             0.54
          0.75            (25.12)             1.04
          1.00            (26.96)             0.54
          1.25            (28.13)             0.82
          1.50            (29.50)             0.83
          1.75            (30.70)             1.24
barbara2  0.50            (27.09)             2.15
          0.75            (29.72)             1.67
          1.00            (31.77)             2.12
          1.25            (33.08)             2.32
          1.50            (34.77)             2.00
          1.75            (35.49)             2.57
bfrag     0.50            (27.04)             2.36
          0.75            (30.28)             2.46
          1.00            (33.26)             1.05
          1.25            (34.62)             1.50
          1.50            (35.69)             2.39
          1.75            (36.75)             2.90
btfrag    0.50            (28.95)            -0.23
          0.75            (30.90)            -0.61
          1.00            (33.04)             1.15
          1.25            (34.88)             0.87
          1.50            (36.19)             2.50
          1.75            (37.98)             2.14

Table 4.28: Effect of entropy coders on the PSNR performance of the DWT systems using the optimal uniform quantizer (cont.)

Image     Bit Rate (bpp)  Huffman PSNR (dB)  Run-Length Huffman ΔPSNR (dB)
goldhill  0.50            (31.12)             0.70
          0.75            (32.06)             2.20
          1.00            (33.04)             2.92
          1.25            (35.11)             2.01
          1.50            (36.81)             1.49
          1.75            (37.39)             2.05
lena      0.50            (32.86)             2.12
          0.75            (34.46)             2.65
          1.00            (36.28)             2.23
          1.25            (37.11)             2.53
          1.50            (38.53)             2.23
          1.75            (39.22)             2.91
peppers   0.50            (31.57)             2.16
          0.75            (32.75)             3.04
          1.00            (34.09)             2.76
          1.25            (35.81)             2.07
          1.50            (36.26)             2.71
          1.75            (37.89)             2.15

The systems compared in this section are listed in Table 4.29 below.
Table 4.29: List of selected systems for the comparison in PSNR performance

Transform    Quantizer         Entropy Coder
DCT          JPEG Uniform      Run-Length Huffman
LOT          JPEG Uniform      Run-Length Huffman
3-Level DWT  Visual Threshold  Run-Length Huffman
4-Level DWT  EZW               Adaptive Huffman

The DCT system was the simulated baseline JPEG system, which is one of the JPEG standards for lossy image compression. The LOT system in this comparison was similar to the baseline JPEG. The visual threshold system was the JPEG-like system for the DWT. The output resolution of this system was set to 32 pixels/degree. Following the earlier comparison in this thesis, the EZW system used the 4-level wavelet expansion because this number of levels increased the compression ability and PSNR performance of the system.

The results are summarized in Tables 4.30 and 4.31 below. According to the results, the LOT JPEG-like system performed better than the DCT system for all images. On average, the LOT JPEG system provided 0.5 dB higher PSNR than the baseline DCT JPEG system, and up to 1.5 dB higher PSNR in some cases. Therefore, it can be concluded that the LOT JPEG system generally provides better PSNR than the DCT JPEG system.

For the wavelet systems, the visual threshold system provided slightly better PSNR than the DCT system. However, the optimal run-length encoder was used instead of the universal coder, and this coder was the reason for the PSNR gain of the visual threshold system. Replacing it with an efficient universal coder should reduce the PSNR only slightly, so the visual threshold system should then perform comparably to the DCT JPEG system. On the other hand, the 4-level EZW system resulted in about 0.5 dB lower PSNR than the other systems. This PSNR loss may be caused by inefficiency of the coder.
In other words, the coder might use the bit resource inefficiently such that there are not enough bits left to send more image information.

Table 4.30: Comparison of the PSNR performance of DCT, LOT, and DWT systems

Image     Bit Rate (bpp)  DCT PSNR (dB)  LOT ΔPSNR (dB)  VT DWT ΔPSNR (dB)  EZW DWT ΔPSNR (dB)
baboon    0.50            (23.65)         0.32            0.35               0.03
          0.75            (25.18)         0.31            0.75              -0.19
          1.00            (26.41)         0.30            0.94               0.59
          1.25            (27.55)         0.32            1.17               0.21
          1.50            (28.66)         0.31            1.15               0.04
          1.75            (29.78)         0.26            1.25               0.33
barbara2  0.50            (28.98)         0.71            0.47              -0.46
          0.75            (31.29)         0.56            0.63              -0.40
          1.00            (33.04)         0.54            0.93               0.43
          1.25            (34.53)         0.54            1.09               0.04
          1.50            (35.88)         0.51            0.93              -0.18
          1.75            (37.13)         0.44            0.99               0.35
bfrag     0.50            (28.17)         1.49            1.25               0.32
          0.75            (31.28)         1.32            1.13              -0.14
          1.00            (33.77)         1.04            0.50              -0.08
          1.25            (35.74)         0.86            0.26              -0.50
          1.50            (37.36)         0.72            0.25              -0.88
          1.75            (38.75)         0.57            0.32              -0.40
btfrag    0.50            (29.53)         0.35           -1.07               0.53
          0.75            (32.34)         0.30           -0.42              -0.31
          1.00            (34.64)         0.16           -0.10               0.19
          1.25            (36.48)         0.17            0.33              -0.61
          1.50            (38.07)         0.11            0.40              -0.24
          1.75            (39.46)         0.07            0.49               0.09

Table 4.31: Comparison of the PSNR performance of DCT, LOT, and DWT systems (cont.)

Image     Bit Rate (bpp)  DCT PSNR (dB)  LOT ΔPSNR (dB)  VT DWT ΔPSNR (dB)  EZW DWT ΔPSNR (dB)
goldhill  0.50            (32.35)         0.37            0.10              -0.51
          0.75            (34.14)         0.34            0.28              -0.12
          1.00            (35.55)         0.27            0.29              -0.66
          1.25            (36.69)         0.27            0.52              -0.72
          1.50            (37.73)         0.22            0.60              -0.10
          1.75            (38.63)         0.23            0.69              -0.13
lena      0.50            (34.56)         0.37            0.41              -0.04
          0.75            (36.46)         0.23            0.59              -0.07
          1.00            (37.77)         0.18            0.71               0.09
          1.25            (38.89)         0.12            0.79               0.31
          1.50            (39.76)         0.16            1.02               0.15
          1.75            (40.63)         0.16            1.14              -0.12
peppers   0.50            (33.78)         0.12            0.43               0.18
          0.75            (35.27)         0.04            0.49               0.25
          1.00            (36.28)         0.01            0.52               0.08
          1.25            (36.05)         0.03            0.72               0.42
          1.50            (37.76)         0.02            0.71               0.68
          1.75            (38.39)         0.07            1.11               0.64

The efficiency of the adaptive Huffman coder in the EZW system was tested by comparing the theoretical bit rate needed to encode a file with the actual bit rate. The theoretical bit rate is the minimum bit rate needed to encode a particular EZW file.
The EZW encoded file contains two sections of information: the header section and the data section. The header section includes the image dimensions, the number of DWT levels, and the initial threshold. The data section contains the coefficient information, stored using the following six symbols: {P, N, T, Z, 0, 1}. Definitions of all symbols can be found in Chapter 2. The first four symbols are transmitted during the dominant pass, whereas the binary digits, 0 and 1, are sent during the subordinate pass. Most of the information is in the data section; therefore, the header section is ignored in this calculation. Assuming the symbols to be independent, the average entropy is the weighted sum of the entropy of the dominant pass and that of the subordinate pass. The entropy, $I(x)$, is given by

\[ I(x) = -\sum_{i=1}^{N} p_i(x) \log_2 p_i(x), \tag{4.1} \]

where $p_i(x) = N_{x_i}/N_D$, $N_{x_i}$ is the number of occurrences of the symbol $x_i$ in the file, $x_i$ can be one of the six symbols above, and $N_D$ is the total number of symbols in the dominant pass. The logarithm is taken to base 2 so that the entropy is expressed in bits per symbol. The entropy of the subordinate pass can be calculated in the same way. The average number of bits needed to represent each symbol in the dominant pass and in the subordinate pass was calculated. The theoretical total number of bits needed to encode the EZW file is therefore given by

\[ \text{Total bits} = I_D(x) N_D + I_S(x) N_S, \tag{4.2} \]

where $I_D(x)$ and $I_S(x)$ represent the entropy of the dominant pass and the subordinate pass, respectively, and $N_S$ represents the number of symbols in the subordinate pass. The theoretical bit rate can be calculated by dividing the theoretical total number of bits by the number of pixels in the image. The theoretical bit rate and the actual bit rate of the 4-level DWT with the EZW system are plotted against each other in Figure 4-2.
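As a rough illustration of Equations (4.1) and (4.2), the theoretical bit rate could be computed along the lines of the sketch below. It is not the program used in this thesis: the function names `entropy_bits` and `theoretical_bit_rate`, the toy symbol strings, and the pixel count are assumptions made for the example, and base-2 logarithms are assumed so that the entropy is in bits per symbol.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """First-order entropy of a symbol sequence, in bits per symbol."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # p_i = (occurrences of symbol i) / (total symbols in this pass)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def theoretical_bit_rate(dominant, subordinate, num_pixels):
    """Entropy-weighted total bits (Eq. 4.2) divided by the pixel count."""
    total_bits = (entropy_bits(dominant) * len(dominant)
                  + entropy_bits(subordinate) * len(subordinate))
    return total_bits / num_pixels


# Hypothetical toy streams: dominant-pass symbols and subordinate-pass bits.
dominant_pass = "TTTTPNZT"
subordinate_pass = "0110"
rate = theoretical_bit_rate(dominant_pass, subordinate_pass, num_pixels=64)
print(f"theoretical bit rate: {rate:.3f} bpp")
```

Comparing this value with the size of the actual encoded file divided by the number of pixels gives one point on an efficiency plot of the kind described here.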
[Figure 4-2 plot: panels baboon.raw, barbara2.raw, bfrag.raw, goldhill.raw, lena.raw, and peppers.raw; horizontal axis: theoretical bit rate (bpp); vertical axis: actual bit rate (bpp).]

Figure 4-2: Efficiency test for the adaptive Huffman coder using the 4-level DWT with the EZW system. The solid line is the 45-degree line. The second line shows the relationship between the theoretical bit rate and the actual bit rate.

In Figure 4-2, the 45-degree line indicates the theoretical bit rate that the coder can achieve, and the other line represents the actual bit rate. The gap between the two lines indicates the room for improvement of the adaptive Huffman coder. The results in Figure 4-2 show small gaps across all images. Therefore, given the definition of the symbols, the adaptive Huffman coder encoded the file efficiently. This encoding scheme is only one way to encode the symbols of the EZW system; other encoding schemes may provide better compression than the adaptive Huffman coder used in this thesis. However, the results show that the EZW system has the potential to achieve at least the same performance as the current JPEG system.

4.3 Part III: Visual Quality

Visual quality is the most important aspect of image compression. The results in this section show the artifacts in the reconstructed images at 0.5 bit/pixel for selected transform coding systems, in an attempt to identify the artifacts of each transform coding system. Two of the seven images, bfrag and lena, were tested and are presented in this thesis. According to the simulation results, the compressed image at 1 bit/pixel could not be distinguished from the original image. The artifacts become dominant as the bit rate decreases.
The PSNR values are shown in Table 4.32. For each image, the original image is listed first and the DCT JPEG system next as references. The other systems are listed in order of increasing PSNR. The images are shown in Figures 4-3 to 4-14. At 0.5 bit/pixel, the reconstructed images of the DCT JPEG system displayed strong blocking effects. At the same bit rate, the LOT JPEG system significantly reduced the image blockiness. However, the LOT system introduced ringing in areas of sharp edges, although it was not prominent. The ringing artifact of the LOT system can be compared with the blocking artifact of the DCT system to see which artifact is more objectionable to viewers. The DWT systems, on the other hand, suffered from blurriness for both the visual threshold system and the EZW system. The EZW system lost more details than the visual threshold system. As the number of DWT levels increased from three to four, more details were recovered, but the image was still not as sharp as those of the baseline DCT JPEG and the baseline LOT JPEG systems. The loss of details in the EZW system was caused by exhaustion of the bit budget at the encoder stage.
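For reference, a PSNR value of the kind reported in this section could be computed from an original and a reconstructed image as in the sketch below. It assumes the usual definition for 8-bit images, PSNR = 10 log10(255^2 / MSE), which may differ in detail from the simulation code; the function name `psnr` and the tiny test images are purely illustrative.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two images given as equal-sized lists of pixel rows."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, reconstructed):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 images whose pixels differ by one gray level everywhere.
reference = [[10, 20], [30, 40]]
decoded = [[11, 19], [31, 39]]
print(f"PSNR = {psnr(reference, decoded):.2f} dB")
```

A ΔPSNR entry is then simply the PSNR of the system under test minus the PSNR of the reference system at the same bit rate.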
Table 4.32: Result images and the corresponding PSNRs

Name    System                                 PSNR (dB)  ΔPSNR (dB)
bfrag1  Original                               N/A         N/A
bfrag2  DCT + JPEG + Run-Length Huffman        28.19       0.00
bfrag3  3-Level DWT + EZW + Adaptive Huffman   27.27      -0.92
bfrag4  4-Level DWT + EZW + Adaptive Huffman   28.50       0.31
bfrag5  3-Level DWT + VT + Run-Length Huffman  29.43       1.24
bfrag6  LOT + JPEG + Run-Length Huffman        29.67       1.48
lena1   Original                               N/A         N/A
lena2   DCT + JPEG + Run-Length Huffman        34.58       0.00
lena3   3-Level DWT + EZW + Adaptive Huffman   33.19      -1.73
lena4   4-Level DWT + EZW + Adaptive Huffman   34.51      -0.07
lena5   LOT + JPEG + Run-Length Huffman        34.92       0.34
lena6   3-Level DWT + VT + Run-Length Huffman  34.98       0.60

The system comparisons in this chapter showed that the relationship between PSNR and bit rate was mainly influenced by the combination of quantizer and entropy coder. The transform itself had no significant impact on the PSNR performance.

In the comparison between the DCT JPEG, the LOT JPEG, the visual threshold wavelet with the run-length Huffman coder, and the EZW with the adaptive Huffman coder, the LOT JPEG system gave the best performance in terms of both PSNR and visual quality. The LOT JPEG system provided about 0.5 dB better PSNR than the DCT JPEG system. In addition, it reduced blocking artifacts, although it introduced slight ringing in areas of sharp edges. Furthermore, the cost of changing from the DCT system to the LOT system was negligible because only the transform block was affected; only a few adjustments are needed to update the quantization levels and code tables. The computational cost of the LOT system was only 75-85% higher than that of the DCT system. Therefore, the LOT system is an economical choice for improving the existing DCT system.

On the other hand, the DWT system introduced new ways to compress the image. The PSNR performance of the DWT systems was comparable to that of the DCT JPEG system. The visual threshold quantizer worked well together with the optimal run-length Huffman coder.
The reconstructed image suffered from blurriness due to severe quantization of the high-frequency bands. The 3-level EZW system with the adaptive Huffman coder yielded the worst PSNR performance of all four systems. Nevertheless, it has the potential to improve by using a different coder, as mentioned earlier. This system exhibited no blockiness but, like the visual threshold system, suffered from blurriness in the reconstructed image. In the case of the EZW system, the blurriness occurred because the bit budget ran out.

The costs of the wavelet systems were mainly the computational costs of the wavelet coefficients. The algorithm used in this thesis was circular convolution, which is not an optimized algorithm. Other algorithms, such as the lifting algorithm, may reduce the computational costs. The quantization and coding costs are not the main problem for the visual threshold system. However, in the case of the EZW system, the zerotree searching algorithm increases the overhead of the system. A newer version of the zerotree algorithm, such as SPIHT, may work better. In addition, the EZW system in this thesis used the adaptive Huffman coder to translate one symbol at a time. This encoding scheme may not be optimal. According to the statistics, there were many zerotree root symbols in the encoded file, and a run-length coder can take advantage of this situation.
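To illustrate why a run-length stage could help, the sketch below collapses runs of the zerotree root symbol in a dominant-pass stream into (symbol, run length) pairs. It is only an illustration and not the coder used in this thesis; the letter T for the zerotree root, the function names `run_length_encode` and `run_length_decode`, and the toy stream are assumptions.

```python
from itertools import groupby


def run_length_encode(symbols, run_symbol="T"):
    """Collapse runs of `run_symbol` into (symbol, count) pairs.

    All other symbols are passed through one at a time with a count of 1.
    """
    encoded = []
    for symbol, group in groupby(symbols):
        length = sum(1 for _ in group)
        if symbol == run_symbol:
            encoded.append((symbol, length))
        else:
            encoded.extend((symbol, 1) for _ in range(length))
    return encoded


def run_length_decode(pairs):
    """Invert run_length_encode and return the original symbol string."""
    return "".join(symbol * count for symbol, count in pairs)


# Hypothetical dominant-pass stream dominated by zerotree roots.
stream = "PTTTTTTNTTTTZTTT"
pairs = run_length_encode(stream)
print(len(stream), "symbols ->", len(pairs), "pairs")
assert run_length_decode(pairs) == stream
```

In a complete coder the resulting pairs would still need to be entropy coded, for example with a Huffman table over symbols and run lengths, so the actual gain depends on the run-length statistics of the file.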
Figure 4-3: bfrag1: original image at 8 bpp
Figure 4-4: bfrag2: DCT + JPEG + Run-Length Huffman at 0.50 bpp
Figure 4-5: bfrag3: 3-level DWT + EZW + Adaptive Huffman at 0.50 bpp
Figure 4-6: bfrag4: 4-level DWT + EZW + Adaptive Huffman at 0.50 bpp
Figure 4-7: bfrag5: 3-level DWT + Visual Threshold + Run-Length Huffman at 0.50 bpp. The visual threshold quantizer used the output resolution of 32 pixels/degree.
Figure 4-8: bfrag6: LOT + JPEG + Run-Length Huffman at 0.50 bpp
Figure 4-9: lena1: original image at 8 bpp
Figure 4-10: lena2: DCT + JPEG + Run-Length Huffman at 0.50 bpp
Figure 4-11: lena3: 3-level DWT + EZW + Adaptive Huffman at 0.50 bpp
Figure 4-12: lena4: 4-level DWT + EZW + Adaptive Huffman at 0.50 bpp
Figure 4-13: lena5: LOT + JPEG + Run-Length Huffman at 0.50 bpp
Figure 4-14: lena6: 3-level DWT + Visual Threshold + Run-Length Huffman at 0.50 bpp. The visual threshold quantizer used the output resolution of 32 pixels/degree.

Chapter 5
Summary

Ten variants of three transform coding systems were compared in terms of their system complexity and peak signal-to-noise ratios (PSNR). Seven black-and-white images were used, including Lena and Barbara. The transforms tested included the DCT, the LOT, and the DWT. The quantizers tested included the optimal uniform quantizer, the JPEG uniform quantizer, the visual threshold quantizer, and the EZW quantizer. The entropy coders tested included the Huffman coder, the adaptive Huffman coder, and the run-length Huffman coder. It was found that the relationship between PSNR and bit rate was mainly influenced by the choice of quantizer and entropy coder. In addition, the EZW system can take advantage of the number of DWT levels to provide better compression. Finally, in terms of visual quality at the same bit rate, the LOT JPEG ranked first, the DCT JPEG and the visual threshold wavelet tied for second, and the EZW system ranked last.
However, the EZW system may be more desirable when an exact bit rate is required.

This simulation investigated only a few combinations of transform coding systems. Other kinds of quantizers and entropy coders, such as the arithmetic coder, remain to be compared. In the case of the LOT system, the quantizer and the entropy coder used in this thesis were specifically designed for the DCT JPEG system. Therefore, it may be possible to improve the LOT system by designing the quantizer and the encoder specifically for the LOT coefficients.

Compared to the DCT system, the wavelet system is relatively new in this field. Up to now, there is no standardized wavelet system for image compression. Many studies have been conducted on wavelet compression, but further studies should be done to improve both the computational costs and the PSNR performance of the DWT system. One possible study is to design the optimal, universal run-length Huffman coder for the visual threshold system. In the case of the EZW system, different encoding schemes, such as the run-length coder or the arithmetic coder, may be used to improve the PSNR of the system because there are many zerotree root symbols in the encoded file.

[Page 107 is missing from the Archives copy. This is the most complete version available.]

Bibliography

[1] Vasudev Bhaskaran and Konstantinos Konstantinides. Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, Boston, 1995.
[2] C.S. Burrus, Ramesh A. Gopinath, and Haitao Guo. Introduction to Wavelets and Wavelet Transforms: A Primer. Prentice Hall, Upper Saddle River, NJ, 1998.
[3] David Donoho et al. Wavelab 802 for MATLAB5.x. [Online Document], October 1999. Available HTTP: http://www-stat.stanford.edu/~wavelab/.
[4] Jae S. Lim. Two-Dimensional Signal and Image Processing. Prentice Hall, Upper Saddle River, NJ, 1990.
[5] Henrique S. Malvar. Signal Processing with Lapped Transforms. Artech House, Boston, MA, 1992.
[6] Henrique S. Malvar. Lapped transform algorithms. [Online Document], October 2000. Available HTTP: http://www.research.microsoft.com/~malvar/.
[7] Henrique S. Malvar and David H. Staelin. The LOT: Transform coding without blocking effects. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(4):553-559, April 1989.
[8] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2nd edition, 1999.
[9] William B. Pennebaker and Joan L. Mitchell. JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993.
[10] A. Said and W.A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243-250, June 1996.
[11] J.M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing, 41(12):3445-3462, December 1993.
[12] Gilbert Strang. Wavelets from filter banks. In Gordon Erlebacher, M. Yousuff Hussaini, and Leland M. Jameson, editors, Wavelets: Theory and Applications. Oxford University Press, New York, 1996.
[13] Gilbert Strang and Truong Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, Wellesley, MA, 1996.
[14] T.D. Tran and T.Q. Nguyen. A progressive transmission image coder using linear phase uniform filter banks as block transforms. IEEE Transactions on Image Processing, 8(11):1493-1507, November 1999.
[15] J.D. Villasenor, B. Belzer, and J. Liao. Wavelet filter evaluation for image compression. IEEE Transactions on Image Processing, 4(8):1053-1060, August 1995.
[16] Andrew B. Watson, Gloria Y. Yang, Joshua A. Solomon, and John D. Villasenor. Visual thresholds for wavelet quantization error. In Bernice E. Rogowitz and Jan P. Allebach, editors, Proceedings of SPIE: Human Vision and Electronic Imaging, volume 2657, pages 382-392, April 1996.