COLOR IMAGE COMPRESSION USING WAVELET TRANSFORM

by

STEVEN CARL MEADOWS, B.S.E.E.

A THESIS IN ELECTRICAL ENGINEERING

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

Approved

Accepted

Dean of the Graduate School

August, 1997

ACKNOWLEDGMENTS

My sincere appreciation goes to my graduate advisor Dr. Sunanda Mitra for all of her help and encouragement during my research. I also would like to thank Dr. Krile and Dr. Lakhani for serving on my graduate committee.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION TO IMAGE COMPRESSION
II. HUFFMAN CODING
III. ARITHMETIC CODING
IV. DISCRETE WAVELET TRANSFORM
V. EMBEDDED ZEROTREE WAVELET ALGORITHM
VI. RESULTS OF EZW COMPRESSION
VII. SUMMARY AND CONCLUSIONS
REFERENCES
APPENDIX
A. ANSI C IMPLEMENTATION OF HUFFMAN ENCODING AND DECODING
B. ANSI C IMPLEMENTATION OF ARITHMETIC ENCODING AND DECODING
C. EZW EXAMPLE

ABSTRACT

C language coding of image compression algorithms can be a difficult and tedious task. Image compression methods are usually composed of many stages of cascaded algorithms, and each algorithm may be developed independently. This thesis addresses the problem of interfacing new image compression algorithms with older and established algorithms such as entropy coding and the discrete wavelet transform. The thesis describes the ANSI C coding procedures and functions involved in implementing two entropy coding algorithms, Huffman coding and arithmetic coding. Wavelet theory is discussed as it applies to the discrete wavelet transform. The thesis also describes an ANSI C implementation of one of the newest wavelet coefficient coding techniques, embedded zerotree wavelets (EZW), developed by Jerome Shapiro. The EZW compression performance is compared with that of JPEG, the still-image standard currently adopted by the Joint Photographic Experts Group.

LIST OF TABLES

2.1 Huffman Code Assignment Procedure
3.1 Model of Set {a,e,i,o,u,!}
4.1 Initialized Filter Impulse Responses
6.1 Lena and Baboon Compressed Image Statistics
C.1 Processing of First Dominant Pass at T = 32
C.2 Processing of the First Subordinate Pass

LIST OF FIGURES

1.1 Communication System Model
1.2 Histogram of Pixel Values
1.3 Histogram of Pixel Differences
1.4 A Transform Coding System
2.1 Huffman Structure
3.1 Arithmetic Coding Process
3.2 Initialized "freq" and "cum_freq" Arrays
3.3 Initialized "char_to_index" and "index_to_char" Arrays
3.4 Updated Arrays
4.1 Discrete Wavelet Decomposition
4.2 Discrete Wavelet Reconstruction
5.1 Zerotree Structure
5.2 Position Coding Flowchart
6.1 Lena and Baboon Images Compressed with EZW and JPEG
6.2 EZW and JPEG Compression Plots
C.1 Example of 3-Scale DWT of an 8 x 8 Image

CHAPTER I
INTRODUCTION TO IMAGE COMPRESSION

Currently, there is a large proliferation of digital data. Multimedia is an evolving method of presenting many types of information. Multimedia combines text, pictures, sound, and animation in a digital format to relate an idea or story. In the future, multimedia may be as readily available as newspapers and magazines (which combine text and pictures in a printed format to relate information) are today.
With multimedia, as well as with other types of digital data, there is a need to reduce the costs of storage and transmission of the information. Reducing costs translates into reducing the amount of data needed to represent the information. Data compression fills this role. Data compression is the process of reducing the amount of data needed to represent information. Data compression is often referred to by the type of data being compressed: image compression compresses still images; video compression compresses animation combined with sound; etc. Data is presented to a user in an uncompressed format and is stored and transmitted in a compressed format. Therefore, data compression algorithms need to perform two functions, compression and decompression. In Figure 1.1, the encoder and decoder represent the compression and decompression processes, respectively, and the channel represents a storage device or transmission process. In the work presented in this thesis, the storage device or transmission process is assumed to be lossless. In other words, the compressed data will not be corrupted by the channel. This assumption is often realized in practice since most storage devices reliably preserve data and error correction protocols are incorporated into most digital transmission processes. This thesis focuses on the efficiency and performance of an encoder and decoder in an image compression system.

Image compression is generally divided into two categories: lossless and lossy. Lossless compression refers to compression without losing any image information; the decoded pixel values are the same as the encoded pixel values. A lossy compression system, however, corrupts the pixel values so that the uncompressed or reconstructed image is an approximation of the original image. Lossy compression has the advantage of compressing an image to a much higher compression ratio (CR) than lossless compression, since a lossy compressed image contains less information than a lossless compressed image. The CR is the amount of original data divided by the amount of compressed data (Equation 1.1):

CR = \frac{\text{amount of original data}}{\text{amount of compressed data}}.   (1.1)

The least amount of data needed to represent all of the image information is constrained by the amount of information contained in the values encoded. The amount of information contained in the values or symbols encoded can be quantitatively measured. The entropy (Equation 1.2) is the average amount of information per symbol of a source, where J is the number of different symbols and P(a_j) is the probability of symbol a_j:

H = -\sum_{j=1}^{J} P(a_j)\,\log P(a_j).   (1.2)

(Probability is a measure of likelihood of a symbol occurrence in a range from 0 to 1, with 1 being most likely.) When the base of the entropy logarithm is 2, the entropy is measured in binary units, or bits per symbol. The entropy of a source is the theoretical limit of the least number of bits required to code each symbol on average [1]. All of the entropy calculations in this thesis are measured in bits per symbol.

Losslessly coding a set of symbols is generally referred to as entropy coding. There are many sets of symbols which can represent an image. The most obvious set of symbols to represent an image is the set of pixel values. The entropy of the individual pixel values is calculated using Equation 1.2, where the probabilities of the individual pixel values are obtained from their relative frequencies of occurrence. This entropy is called the first-order entropy estimate of an image.
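As a concrete illustration of Equation 1.2, the first-order entropy estimate of an 8-bit gray-scale image can be computed directly from a histogram of the pixel values. The following minimal C sketch is an illustration only (it is not the program of Appendix A or B); it assumes the image is stored in memory as an array of unsigned bytes.

#include <math.h>
#include <stddef.h>

/* First-order entropy estimate (Equation 1.2) in bits per pixel.
   pixels: image data, one byte per pixel; n: number of pixels. */
double first_order_entropy(const unsigned char *pixels, size_t n)
{
    size_t hist[256] = {0};
    size_t i;
    double h = 0.0;

    for (i = 0; i < n; i++)              /* frequencies of occurrence */
        hist[pixels[i]]++;

    for (i = 0; i < 256; i++) {
        if (hist[i] > 0) {
            double p = (double)hist[i] / (double)n;   /* P(a_j)      */
            h -= p * log(p) / log(2.0);               /* -P log2 P   */
        }
    }
    return h;
}

The same kind of calculation can be applied to any other symbol set representing the image, for example a histogram of pixel differences, by widening the histogram to cover the range of the symbols.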
This entropy is labeled an image entropy estimate because if the pixel values are dependent on one another, the image entropy will be less than the indi\ idual pixel value entropy. When the entrop\ of pixel pairs is calculated, the dependency of the pixel values on their corresponding pairs is taken into accoimt. This entropy is called the second-order entropy estimate. WTien 2 the pixels are grouped together into groups of three, the group entrop\ compensates for the pixel dependencies within each group resulting in a third-order entropy estimate. The infinite order entropy estimate becomes the actual image entrop\'. If indi\ idual pixel values are statistically independent, the first-order entropy estimate becomes the actual image entropy [1]. All of the entropy calculations in this thesis will be first-order entropy estimates since the first-order estimate is the simplest to calculate. Variable length coding techniques, such as Huffman coding, code symbols one at a time with each symbol being represented by a variable number of bits. When pixel values are coded with a variable length coding method, the first-order entropy estimate is the lower bound on the pixel bit rate. However, when more than one pixel is coded at a time with a coding method such as arithmetic coding, the average bits per pixel (BPP) can be lower than the first-order entropy estimate [1]. Huffman and arithmetic coding techniques will be discussed in detail in Chapters II and III, respectively. If the pixel values are reversibly mapped to another set of values, the other set of values may have a much lower entropy than the original pixel values. As stated previously, many sets of values can represent an image. The problem in lossless image compression is to find the set of values, which are reversibly mapped from the pixel values, that has the lowest entropy. Predictive coding is a popular method to map the pixel values to a lower entropy set. With predictive coding, each pixel value is predicted from previous pixel values, and the difference between the original and predicted value is coded [1]. As a demonstration of the advantage of predictive coding over pixel coding, consider the standard Lena image. Lena contains 8 BPP resulting in 256 levels of gray per pixel and has dimensions of 512 x 512 pixels. The probabilities of the pixel values are their frequencies of occurrence (Figure 1.2) divided by the total number of pixels which is 262144. Using Equation 1.2. the first-order entropy estimate is 7.4030 BPP. Reducing the original 8 BPP to 7.4030 BPP achieves a CR of only 1.0806. Howe\er, if each pixel value is predicted b\ the pixel immediately preceding, which is the pixel to the left when scanning the image from left to right and top to bottom, the first-order entrop\ estimate of the differences between the actual and estimated pixel \ alues is 5.0221. This 3 value is less than the first-order estimate of the pixel values. This bit rate reduction achieves a CR of 1.5930. Notice that the probabilifies of small pixel differences (close to zero) are very high (Figure 1.3). These high probabilifies indicate that the pixel \ alues are slowly varying, and the image has many smooth features. The sharp spike in the histogram of the pixel differences is indicative of sets with low entropy. One can estimate the relative entropies of different sets of values based on their histograms. For example, if the histogram of pixel differences had a narrower and taller spike at zero, the entropy would be less. 
Similarly, if the histogram had a fatter and shorter spike at zero, the entropy would be more. With an improvement in CR from 1.08 to 1.59, predictive coding would store the Lena image with 78,031 fewer bytes than pixel coding.

Lossy compression is used to compress images to much higher CR's than lossless compression. As shown in the previous example, variable length coding of the pixel differences could at best compress the image to a CR of 1.59. Even with more complicated coding techniques, the best CR for lossless coded images is generally about 2. However, with lossy compression, CR's of 10, 20, or 40 are common depending on how much distortion is acceptable. The type of lossy coding discussed in this thesis is transform coding. With transform coding, a linear transformation is performed on the image so that more important information can be separated from less important information. This less important information can then be discarded. In other words, the transformation decorrelates the pixel data so that the most information can be packed into the least number of transform coefficients [1]. Common transforms are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). This thesis will focus on the DWT.

Transform coding proceeds as follows. First, the forward transformation is performed on the image (Figure 1.4). Next, the transform coefficients are quantized. Quantization is a method of approximating the transform coefficients so that the most important image information is retained. Information loss and distortion occur in the quantization stage. There are generally two types of quantization: scalar and vector. Vector quantization quantizes values as groups, while scalar quantization quantizes values individually. This thesis will focus on scalar quantization. One of the simplest forms of scalar quantization is threshold coding. With threshold coding, uniform ranges of transform coefficient values are placed into separate bins. The range of values assigned to each bin is determined by the value of the bin width. Equation 1.3 is the equation form of the quantization curve, where k is the value of the bin index, c is the value of the bin width, and T is the transform coefficient value [1]:

kc - \frac{c}{2} < T \le kc + \frac{c}{2}.   (1.3)

Next, the symbols representing these approximated coefficients are losslessly coded. Entropy coding should be the final stage of all lossy compression algorithms. The compression gained from the entropy coding stage alone may be a CR of less than 1.5, but the entropy coding stage will add some compression. The decoding process performs the inverse of the encoding stages except for the quantization stage. The inverse transform is performed on the approximated transform coefficients, which results in a distorted reconstructed image.

Transform coding contains many different stages, and each stage may be developed independently. The quantizer has been the focus of the most intense research. When a new quantizer is developed, it must be interfaced with the other stages. In order to interface a quantizer with an entropy coder, both processes must be well understood. With an ANSI C code implementation, coding a set of quantizer symbols can be difficult and tedious. Chapters II and III will demonstrate how to interface two common entropy coders, Huffman coding and arithmetic coding, with a symbol generating algorithm such as a quantizer. Chapter IV will discuss the development and implementation of the DWT.
Chapter V will discuss an implementation of the new scalar quantizer of wavelet coefficients, EZW. Chapter VI will demonstrate compression results of EZW and compare them with the compression results of the lossy compression algorithm JPEG for 24-bit color images and a gray-scale image.

Figure 1.1 Communication System Model (information source -> encoder -> channel -> decoder -> information user).

Figure 1.2 Histogram of Pixel Values (number of occurrences versus pixel value).

Figure 1.3 Histogram of Pixel Differences (number of occurrences versus pixel difference).

Figure 1.4 A Transform Coding System (encoder: input image -> forward transform -> quantizer -> symbol encoder -> compressed image; decoder: compressed image -> symbol decoder -> inverse transform -> decompressed image).

CHAPTER II
HUFFMAN CODING

Huffman coding [2] has become one of the most used methods for coding sets of symbols. As previously mentioned, Huffman coding is a type of variable length entropy coding where each symbol corresponds to a unique binary string of varying length. Huffman coding is uniquely decodable. In other words, when the symbols are encoded by concatenating the binary strings, this concatenated binary string can be decoded uniquely when read sequentially in the same order that it was written. No special symbols are required to delimit the binary strings. Extra storage is required to store the codebook which equates each unique symbol with its corresponding binary string. However, the amount of data in the codebook is usually insignificant compared to the amount of data required to code the source of symbols. Therefore, when the codebook is concatenated with the string of encoded symbols, the overall file size is not increased significantly.

In order to create a codebook and assign binary strings to symbols, the probabilities of the symbols must be known or estimated. The symbols with higher probabilities have shorter binary strings since they occur more often, and the symbols that occur less often have longer binary strings. The symbol probabilities can be established by creating a histogram of the symbols from the source. Since the Huffman coding algorithm only requires knowledge of the relative probabilities (how the probabilities compare with one another), the frequencies of occurrence (FOO) can be used directly by the algorithm. FOO are directly proportional to the symbol probabilities. First, a simple example to demonstrate the Huffman algorithm will be discussed. Next, an ANSI C code implementation of the algorithm will be described.

Suppose an image with 100 pixels is coded with the one-dimensional predictive coding technique described in Chapter I. Suppose also that only 6 unique symbols resulted from the coding (Table 2.1) [1]. The FOO sum to 100 since there are 100 pixels. First, the symbols are sorted with respect to their FOO. Next, the two symbols with the least probabilities are combined into one symbol whose probability is the sum of the probabilities of the two symbols. Then the symbols are sorted again, and the process is repeated until only two symbols are left. Each iteration is called a source reduction because the number of symbols is reduced by one. Each source reduction column contains the FOO and the binary strings (Table 2.1). After the fourth source reduction, the two symbols which are left are given the codes 0 and 1. One of these two codes is handed down to the two symbols which were combined in the fourth source reduction.
The symbol with the FOO of 60 was created from the two symbols with FOO of 30 in the fourth source reduction. These two symbols are distinguished by concatenating a 0 or 1 to the code 0 which was handed down. The symbol with the FOO of 40 is an original symbol which was not created by the algorithm. Therefore the code for symbol 0 is set at 1. Next, the symbols which were combined in the third source reduction are distinguished by concatenating a 0 or 1 to the handed down string. This process of assigning binary strings to symbols is continued until all of the original symbols have been assigned codes. As predicted, the symbols with the highest FOO have short codes and the symbols with the least FOO have longer codes. A symbol string such as (-1, 0, 3, 1) can now be coded in binary as 00101010011. Wlien this binary string is read from left to right, the symbols can be decoded uniquely. The ANSI C code implementation of Huffman coding uses linked lists of structures to create the binary codes and integer arrays to encode the binary strings. One should be familiar with the ANSI C language before reading further. [3] is a good reference for this programming language. The Huffman structure (Figure 2.1) consists of the symbol, FOO, code length, code number, and two structure pointers. Each unique symbol will have its own structure. The symbols which are being encoded must be translated some way into integer values in "symbol". The "occurrence" \ alue is the FOO of the symbol. The "codejength" and "codenumber" values determine the binary string equated to the symbol. "code_number" is the numerical value of the binar> string. If there are trailing zeros to the left of the binary string, the length of the string may not be apparent from the binary value. Therefore, "codejength" determines the length of the 9 string. Together, "codelength" and "codevalue" uniquely determine an\ binar\ string. The "next" pointer links the structures together in a sorted list. The "child" pointer links the source reduced symbol with one of the two symbols which were combined to form the source reduced symbol. For example, in the first source reduction column in Table 2.1, the symbols are sorted with respect to the FOO. If "huffman" structures represented the symbols, the list would be linked together with "next" pointers. The "3" and "-2" symbols were combined to form the last symbol in the list. The program must be able to retrace the steps of source reducfion, so the "child" pointer in the last symbol would point to the "3" symbol in the previous list. The "next" pointer in the "3" symbol would point to the "-2" symbol because of the order of the list. Therefore, by following the "child" and "next" pointers, the program can find the two symbols which were combined to form a source reduced symbol. The Huffman program was tested with the 8-bit gray scale standard Lena image. The program reduced the file size from 262,144 bytes to 167,948 bytes for a CR of 1.56 and a bit rate of 5.13BPP. The cumulative squared difference between the original and reconstructed pixel values was 0 since the coding was lossless. This compressed bit rate is close to the entropy bit rate calculated above for the same data. Huffman coding resulted in 5.13BPP while the first-order entropy estimate was calculated at 5.02BPP. Since no other variable length coding technique can produce a lower bit rate than Huffman coding, Huffman coding is considered optimal. 
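To make the "code_length"/"code_number" representation concrete, the short C sketch below appends one Huffman codeword to an output bit buffer, most significant code bit first. It is an illustrative fragment only, not the program listed in Appendix A; the type and function names are invented here, and the buffer is assumed to be zero-initialized and large enough.

/* Append one codeword to a bit buffer.  code_number holds the
   numerical value of the binary string and code_length its number
   of bits, as in the "huffman" structure of Figure 2.1. */
typedef struct {
    unsigned char *buf;   /* output bytes                    */
    long byte_pos;        /* index of the byte being filled  */
    int bit_pos;          /* next free bit position, 7 = MSB */
} bit_writer;

void put_code(bit_writer *w, int code_number, int code_length)
{
    int i;
    for (i = code_length - 1; i >= 0; i--) {
        int bit = (code_number >> i) & 1;
        if (bit)
            w->buf[w->byte_pos] |= (unsigned char)(1 << w->bit_pos);
        if (--w->bit_pos < 0) {       /* current byte full, advance */
            w->bit_pos = 7;
            w->byte_pos++;
        }
    }
}

With the codebook of Table 2.1, encoding the symbol string (-1, 0, 3, 1) amounts to the calls put_code(&w, 0, 2), put_code(&w, 1, 1), put_code(&w, 10, 5), and put_code(&w, 3, 3), which produce the bit string 00101010011.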
Table 2.1 Huffman Code Assignment Procedure [1]

             Original source                    Source reductions
  Symbol     FOO     Code          1            2            3           4
    0        40      1          40  1        40  1        40  1       60  0
   -1        30      00         30  00       30  00       30  00      40  1
    1        10      011        10  011      20  010      30  01
    2        10      0100       10  0100     10  011
    3         6      01010      10  0101
   -2         4      01011

struct huffman {
    short int symbol;
    int occurrence;
    short int code_length;
    int code_number;
    struct huffman *next;
    struct huffman *child;
};

Figure 2.1 Huffman Structure

CHAPTER III
ARITHMETIC CODING

Arithmetic coding is a relatively new lossless symbol coding technique. Arithmetic coding can code more than one symbol with a single code word, thereby allowing arithmetic coding to achieve a lower bit rate than any variable length coding technique. For many years, Huffman coding was considered the best symbol coding technique. Now, arithmetic coding is able to compress strings of symbols better than Huffman coding. The arithmetic coding algorithm is better suited to using adaptive statistical models. In other words, arithmetic coding can adapt to changing symbol probabilities from a source. With an adaptive statistical model, the symbol probabilities are determined while the symbols are being coded instead of being determined beforehand as with the Huffman algorithm. Arithmetic coding is also more computationally efficient than Huffman coding. Huffman decoding can be computationally expensive since, with each bit read from a compressed file, the decoder must scan through a look-up table containing the symbol codes. However, with the arithmetic compression program described in this thesis, coding and decoding are performed through integer multiplication and division, which are very fast on modern computers. Also, with arithmetic coding, symbols from different sources can easily be encoded mixed together without loss of compression efficiency. However, the arithmetic decoder must be aware of the order of the mixing in order to unmix the sources. This technique of mixing and unmixing sources is used extensively in the EZW algorithm described in Chapter V. Arithmetic coding is more complicated and difficult to implement than Huffman coding. The implementation described in this chapter and used in the EZW program was taken from [4]. A standardized, patent-free, off-the-shelf arithmetic software package is currently unavailable. However, IBM has developed a patented arithmetic coding implementation called a Q-coder, and the Joint Photographic Experts Group (JPEG) has agreed on a binary arithmetic coding technique for lossless image compression.

Arithmetic coding codes symbols by transmitting a value in a specified range. This range is dependent on the symbols encoded and the probabilities used to model the symbols. Consider a set of symbols with a fixed statistical model comprised of the five vowels and an exclamation point: {a,e,i,o,u,!}. Each symbol is assigned a subinterval in the interval [0,1) based on its corresponding probability (Table 3.1). The symbol "e" can be represented by any value in the range [0.2,0.5), such as 0.24. Any value in the range [0,1) will represent a unique symbol. When more than one symbol is coded, the range representing the symbols must be narrowed with each consecutive symbol (Figure 3.1). Consider the code word (e,a,i,i,!). After "e" is selected for coding, the range narrows from [0,1) to [0.2,0.5). This range is subdivided in the same manner as the initial range. The symbol subintervals will of course become smaller than the initial subintervals. The range is narrowed again when "a" is selected for coding.
The subinterval for a symbol again becomes an entire range which is subdivided (Figure 3.1). This process of narrowing the range of an encoded value for each symbol selected for coding continues until the terminator symbol,"!," is reached. The final range of the code word (e,a,i,i,!) is [0.23354,0.2336). The code word can be uniquely decoded with any value in this range. The code word can be coded using 0.23355 or 0.23355349876. The second choice will require more space to store, so the first choice is preferable. As the length of the code word increases, the narrower the range becomes, and more digits will be required to store the encoded value. Since more than one symbol is encoded with a single value, arithmetic coding is able to achieve a bit rate less than the first-order entropy estimate of an image as described in Chapter I. Decoding the code word proceeds similarly to the encoding process. The initial range, [0,1), and the symbol probabilities are known beforehand. The initial range is subdivided according to the symbol probabilities. The value of 0.23355 is observed to lie in the range [0.2, 0.5), so the first decoded symbol is "e." This range is subdivided in the same manner as the inifial range (Figure 3.1). The value of 0.23355 is observed to lie in the range [0.2,0.26), so "a" is the next decoded symbol. The decoder knows to stop 13 subdividing the ranges when the terminating symbol. "!," is decoded. As a result, the decoder obtains the code word (e,a,i,i,!). The ANSI C code implementation of arithmetic coding uses binary ranges and subdivisions rather than base 10 as in the above example. The binary implementation uses one main range of [0, 2'^) for 16-bit values. When this range is subdivided and narrowed by selecting a symbol to encode, a few most significant bits of the narrowed range can be determined. For example, if the narrowed range lies in the lower half of [0, 2'^), any 16-bit value in this narrowed range will have a MSB of 0. This 0 bit can be sent to the compressed output file. The high and low boundaries of the range which have a common 0 MSB can be left shifted one bit which effectively doubles their value which also doubles the range. There are other means to determine the most significant bits of a range of values. However, as each MSB is determined, the range will effectively double as in the above example. The ranges are subdivided and narrowed using the "cumfreq" array. With the above base 10 example, the ranges were subdivided using the probabilities of the symbols. The "cumfreq" array is closely related to the symbol probabilities. Each symbol corresponds to a unique array index. The value of each array element is the cumulation of the FOO of the symbols indexed ahead of the current symbol. For example, consider Figure 3.2. The "freq" and "cumfreq" arrays have been initialized for a set of four symbols indexed with indices 1-4. This example will be part of a running example used in this chapter. Each element of the "freq" array with a symbol index contains the value of the FOO of the corresponding symbol. All of the FOO are initialized to I. No symbol is indexed by the zero element, so this value can be set to 0. The value of "cum_freq[4]" is 0 since there are no symbols indexed ahead of "cum_freq[4]." All of the symbols are indexed ahead of "cum_freq[0]." so this element contains the cumulative FOO. The values of "cumfreq" are accumulated in reverse so that the zero element of "cum_freq" can be used for normalization purposes. 
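The heart of the encoder is the step that narrows the current coding range using "cum_freq". The C fragment below is a simplified sketch of that step in the style of the implementation taken from [4]; the renormalization loop that shifts finished bits out of "low" and "high", and the handling of underflow, are omitted. Symbol indices start at 1, so the interval assigned to a symbol is the slice of the total count cum_freq[0] lying between cum_freq[symbol] and cum_freq[symbol-1].

#define TOP_VALUE 0xFFFFL            /* largest 16-bit code value */

static long low  = 0L;               /* current coding interval   */
static long high = TOP_VALUE;

/* Narrow [low, high] for one symbol.  cum_freq[0] holds the total
   frequency count and cum_freq[i] the cumulative count of all
   symbols indexed after i (Figure 3.2). */
void encode_symbol(int symbol, const int cum_freq[])
{
    long range = (high - low) + 1;
    high = low + (range * cum_freq[symbol - 1]) / cum_freq[0] - 1;
    low  = low + (range * cum_freq[symbol])     / cum_freq[0];
    /* ...bits common to low and high would be output here, and the
       range doubled, as described above... */
}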
As each symbol is coded, "freq" and "cumfreq" are sorted with respect to FOO in descending order. The arrays are sorted so that the symbols can be decoded more quickh 14 and efficiently. When the arrays are sorted, the indices corresponding to the s> mbols change. The program should keep track of the indexes with which the s\ mbols correspond. Two arrays, "charjoindex" and "indexjochar." provide this ftinction (Figure 3.3). This program was inifially written for text compression, so the source symbols are sometimes referred to as characters and abbreviated "char." These two arrays are inifialized in logical ascending order. The two arrays remain invertible. In other words, if a character is translated to an index with "chartoindex," the same index can be used to translate back to the original character with "index_to_char." The indices are coded with the arithmetic algorithm, and these two arrays are used to translate back and forth between the original and coded symbols. The original symbols must be represented by non-negafive integers to provide valid indices for "chartoindex" (Figure 3.3). The statistical model used by the arithmetic coder is basically a histogram of the symbol occurrences. This model is labeled adaptive and adapts to changing symbol probabilities because the histogram is updated with each symbol encoded. [4] provides an example of a non-adaptive model with a static symbol histogram. The adaptive model is adequate for most purposes. To complete the four symbol example, suppose the character 2 is encoded (Figure 3.4). The first symbol index of "freq" was increased to a frequency count of 2 so that "freq" remains in sorted order. The other three arrays were adjusted accordingly. This small example should clarify the statistical model update process. The arithmetic program was tested with the 8-bit gray scale standard Lena image. The program reduced the file size from 262,144 bytes to 165,165 bytes for a CR of 1.59 and a bit rate of 5.04BPP. The cumulative squared difference between the original and reconstructed pixel values was 0 since the coding was lossless. This compressed bit rate is very close to the entropy bit rate and less than the Huffman coded bit rate calculated above for the same data. Huffman coding resulted in 5.13BPP while the first-order entropy esfimate was calculated at 5.02BPP. Execufing on a SPARC20 Sun machine, the arithmetic encoding and decoding programs required only about fi\ e seconds each to 15 code and decode the Lena image. The Huffman decoder alone required about one and a half minutes to decode the Lena image. As a result, arithmetic coding can achie\ e lower bit rates and more efficient execution than Huffman coding. 16 Table 3.1 Model of Set {a,e,i,o.u.I} [4] Symbol Probability Initial Subinterval a 0.2 [0,0.2) e 0.3 [0.2,0.5) i 0.1 [0.5,0.6) o 0.2 [0.6,0.8) u 0.1 [0.8,0.9) I 0.1 [0.9,1.0) After seeing a 0.5 0.26. 
Figure 3.1 Arithmetic Coding Process [4] (the interval [0,1) narrows to [0.2,0.5), [0.2,0.26), [0.23,0.236), [0.233,0.2336), and finally [0.23354,0.2336) as the symbols e, a, i, i, ! are coded).

Figure 3.2 Initialized "freq" and "cum_freq" Arrays:
freq[0]=0   cum_freq[0]=4
freq[1]=1   cum_freq[1]=3
freq[2]=1   cum_freq[2]=2
freq[3]=1   cum_freq[3]=1
freq[4]=1   cum_freq[4]=0

Figure 3.3 Initialized "char_to_index" and "index_to_char" Arrays:
char_to_index[0]=1   index_to_char[1]=0
char_to_index[1]=2   index_to_char[2]=1
char_to_index[2]=3   index_to_char[3]=2
char_to_index[3]=4   index_to_char[4]=3

Figure 3.4 Updated Arrays:
freq[0]=0   cum_freq[0]=5   char_to_index[0]=3   index_to_char[1]=2
freq[1]=2   cum_freq[1]=3   char_to_index[1]=2   index_to_char[2]=1
freq[2]=1   cum_freq[2]=2   char_to_index[2]=1   index_to_char[3]=0
freq[3]=1   cum_freq[3]=1   char_to_index[3]=4   index_to_char[4]=3
freq[4]=1   cum_freq[4]=0

CHAPTER IV
DISCRETE WAVELET TRANSFORM

The continuous-time wavelet transform (CTWT) was developed as an improvement over the familiar continuous-time Fourier transform (CTFT). The CTFT is used to extract the frequency content from continuous signals (Equation 4.1):

F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt.   (4.1)

Some disadvantages of the CTFT are as follows. The CTFT requires knowledge of the entire signal from -\infty to +\infty when only part of the signal may be known. Also, when the frequency characteristics of the signal change with time, the CTFT is unable to identify the time dependent signal frequencies. How can one represent signal frequencies with their dependence on time? Musical scores are written with this idea in mind. The musical notes on a page of sheet music represent sound frequencies at specific time intervals. The short-time Fourier transform (STFT) was developed to extract signal frequencies within certain time intervals (Equation 4.2) as determined by a window function, w(t-b) [5]:

STFT_f(\omega,b) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, w(t-b)\, dt.   (4.2)

The transformed signal is now time and frequency dependent. The basis functions for the STFT become windowed complex exponentials. The window function is often chosen as Gaussian so that the inverse transform will also use a Gaussian window function. The STFT is limited in the range of frequencies it can analyze because the window size remains fixed. The window function will extract only part of a cycle of a low frequency and large numbers of cycles of a high frequency. The window size should vary with frequency so that it can zoom out to measure low frequencies and zoom in to measure high frequencies.

A basis function for a transform which measures the full range of frequency content of a signal at a time instant would need to have compact support (a limited interval of non-zero values) in the time domain and frequency domain. This basis function would be time translatable (shift to measure the signal at different times) and frequency scalable (expand and contract to measure different signal frequencies). The wavelet functions,

\psi_{a,b}(t) = a^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right),  a \in R, b \in R,   (4.3)

were developed as these basis functions for the CTWT,

CTWT_f(a,b) = \int_{-\infty}^{\infty} \psi_{a,b}(t)\, f(t)\, dt  [6].   (4.4)

The scaling value, a, is analogous to frequency, and the translation value, b, is analogous to time. Before wavelets were developed, these basis functions were thought not to exist. CTWT_f(a,b) can be interpreted loosely as the "content" of f(t) near frequency a and time b [7]. The original function, f(t), can be reconstructed from CTWT_f(a,b) if \psi(t) satisfies the admissibility condition,

C_\psi = \int_{0}^{\infty} \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty,   (4.5)

where \Psi(\omega) denotes the Fourier transform of \psi(t).
The reconstrucfion formula becomes /(') = -TT /Jcr»T,(o,6)^„/0-V [6]^1// -ooO (4.6) ^ The two-dimensional (2D) discrete wavelet transform (DWT) which is performed on digital images was developed as an extension of the CTWT. The 2D DWT provides compact, uncorrelated, multiresolution representation of digital images. This chapter will describe the development of the 2D DWT. The wavelet transform terminology and descriptions will remain analogous to the familiar Fourier transform terminology for the sake of clarity. 20 In order to develop the DWT, the wavelet functions must first be descretized into a continuous-time wavelet series (CTWS). The wavelet functions in Equation 4.3 are modified with the following relations: a = a'"'; b = nbQa'"'; m,n eZ; a^ >\; b^ ^0. (4.7) Z is the set of integers. The values of a^ and b^ are arbitrarily set to 2 and 1, respectively. As m gets smaller, the scaling factor, a, increases which expands the wavelet function. The translation step size, b/n, also increases so that the step size is scaled the same as the scaling factor. With this discretization, low frequency and high frequency signals can be comparably analyzed. The wavelet ftmctions in Equation 4.3 become yfr'i//{2'"t-n); m,neZ. (4.8) These wavelet functions form an orthonormal basis of L^(R) (the function space of square integrable functions). The series coefficients of the CTWS are (CTWS,),„„=d„,„ = \42^y/[2'"t-n)f{t)dt,f{t)^L\R). (4.9) If there exist numbers A>0 and 5 < oo such that 2 4/ir ^ Z Z k\d J ^^ll/f' ^f{t)^L\R), (4.10) meZneZ where 00 1/11= \\f{t)U'- (-^ii* — 00 f(t) can be reconstructed [8] with the inverse of the CTWS, /(0 = ZI<,.V2^v(2-'-«). (4.12) meZneZ Multiresolution analysis derives from the ability to reconstructXO partially but not completely. When//) is represented in different resolutions, the constructed signal contains different frequency bands of the original/O- A low resolufion representafion of flj) is constructed with a scaling ftinction, (fit). This representation oifij) contains all frequency bands up to a certain cut off frequency which is similar to a low pass filtering 21 off[t). A low resolution vector space, Vj a iJ (R), is created to contain all functions within resolution^, j eZ. Asj increases, the low resolution \ector space Vj becomes more detailed. In other words, as7 increases, Vj contains functions with higher frequency content. As a result, concentric subspaces are formed with subspace I'j containing all subspaces Vj such that i<j, i.e., V / G Z , F,. eF^.,,, (4T3) limK = M F is dense in l'(R), lim V. = Hv. = {0} [91. Also, these Vj have the property that for each function f(t) e Vj, a contracted version is contained in the subspace Vj^•^, VjeZ,f{t) eV^^filt) GK,,, [6]. (4.14) A unique scaling function, (ff(t) GF^, is created whose translations, (p(t -n), n eZ. form an orthonormal basis for VQ. AS a consequence of Equation 4.14, an orthonormal basis for subspace Vj becomes ^J^(t) = ^[2^(f,{rt-n),neZ. (4.15) Let Aj become an operator on / ( / ) G L^ (R) which creates an orthogonal projection off{t) onto subspace Vj, Ajfit) = Z {f{uU^,{u))(P^,it). (4.16) « = -00 where 00 {f{u).(l>jM))= \f(u^,{u)du. (4.17) This projecfion is similar to projecfing a three dimensional vector onto a two-dimensional plane. The projected two-dimensional vector becomes the nearest representation in the two-dimensional plane of the three-dimensional vector. Similarly. Ajf{t) is the closest representation off{t) in subspace Vj, 22 Vg(/) GF^.. 
| | g ( 0 - / ( 0 | | > K / ( O - / ( O • (4.18) 4//^0 is similar to a lowpass filtered version off{t), and asy increases. Ajf{t) becomes more similar to the original XOAn important aspect of the Ajf{t) operafion is how the series coefficients in Equation 4.16 relate to/r). Let ^^ represent an operator onf{t) which forms the discrete inner products in Equafion 4.16, < / ( « ) = {fitl^jAO) = 4v{f{t),(t>{V{t-2-^n))), n^Z. (4.19) With the convolution operation defined as 00 / * g{^) = {fit) * g{t)\x) = jfit)g(x - t)dt, (4.20) -00 Equation 4.19 can be rewritten as 00 ^ ; / ( « ) = 4lJ \f{t)(t>{V{t-2-^n))dt = {f{tr42J(t>{-Vt)){2-^n), neZ.(4.21) -00 AJ f(n) can be thought of as a convolution off[t) with a flipped and dilated version of (/{t) uniformly sampled at intervals of 2'^ n . As the scaling ftinction becomes more contracted, which corresponds to higher frequencies, the sampling rate decreases. Figure 1 in [9] graphically demonstrates the low pass filter characteristics of a sample scaling function. As a result, Equation 4.21 represents a descretized low pass filtering off{t) which corresponds to a discrete approximation off{t). Since computers can only work with discrete signals, multiresolution analysis is done by computing the discrete approximations ofXO. ^j fi^^) •> ^t many resolutions. The first discretization offij) which contains the most information is set to AQf{n) (resolufion 0) for normalization purposes. The lower resolution discrete versions off{t) (At^fin), Atjfin), ^-^fi^), • • •) contain less information offij). There is a problem in converting the discrete approximations from one resolution to another without the need of confinuous functions. In theory, one should be able to calculate a lower resolution 23 discrete function from a higher resolution discrete function since the higher resolution function contains more signal information. In fact, this calculation is possible. Since (fijnit) is a member of function space Vj^^,(/)j„{t) G VJ a F^^,, (fjniO can be constructed with the orthonormal basis of F^v,, ^JAO = X {<f>jA^Uj.^A^))</>j.^AO • (4.22) The inner products in Equation 4.22 can be rewritten as 00 {h>-(")'^.>.,* (")) = 2^ V2 j^(2^ u - n)(l){V'' u - k)du . (4.23) -00 With a substitution of variables, — = 2^u-n. Equation 4.23 becomes 00 |V2^^(2-' t)(t>{t -{k- 2n))dt = (^_,o(0,^o..-2.(O) . (4.24) -00 Next, take the inner product ofj{f) with both sides of Equation 4.22, {f{t\<t>j,(t))= f,{<l>.,,MA.,-iM)){f(tUj.u(t)). (4.25) k=-<xi Define an impulse response of a discrete filter, //(&>), as V« G Z , h{n) = 4T'{<t>_,MAM)) (4-26) and H{co) as a mirror filter with an impulse response h{n) = h(-n). The V2"' factor is necessary to equate h(n) with the h{n) found in other wavelet papers. Equation 4.25 then becomes < / ( « ) = S V2/r(2« - k)A^,J{k) = {42h{k) * A'j,J{k)){2n). (4.27) A=-oo with the discrete convolution defined as g * / ( « ) = {gik) * f{k)){n) =J,g{nA=-00 24 k)f{k). (4.28) As a result, the discrete approximation Aj /(^n), at resolutiony, can be calculated from the discrete approximation at the next higher resolution by convolving A'j^^fin) with the mirror filter H(co) ,multiplied by v2 , and keeping every other sample. The filter H{o)) is a unique characterisfic of the scaling funcfion <^t). If the vector spaces Vj are thought of as concentric circles, then the vector spaces Wj would become the ring spaces between the adjacent circles. Let the vector space IVj be defined as the difference between the two low resolution vector spaces Vi and Vr^. 
In other words, the union of Wj and Vj becomes Vj+^, F^^, = Wj u Vj. The vector space Wj also has the interesting property of being orthogonal to Vj, WjLVj. As a consequence of orthogonality, every function in ^Vi can be written imiquely as a sum of a function in Vj and a fiinction in Wj, F^^, = W^ © Vj. Since vector spaces Vj contain low resolution functions with increasing frequency content as the resolution increases, functions contained in the difference between two adjacent low resolution vector spaces have a narrow range of frequency content. From the discussion above of the narrow frequency content of wavelet functions, one could expect that wavelet functions are members of vector space Wj. In fact, wavelet functions are used to construct an orthonormal basis of Wj, i.e.. \fn eZ, y/ „(t) = v2^ i//{2-' t -n) form an orthonormal basis for Wj. (4.29) Since Wj are non-overlapping, orthogonal vector spaces in L^(R), one can conclude. \/J,n G Z , i//in(t) form an orthonormal basis for L\R). Wavelet and scaling functions are closely related such that several constraining relations can be derived which aid in developing different wavelet and scaling functions. From Equafion 4.22, we obtain, ^,.(0 = Z(^-.,o("),^o,.-2„(")M..,.(0 = t^Kk-2n)^^^,,{t). With g(n) defined as an impulse response of a discretefilter.G(co), 25 (4.30) \fn eZ, gin) = 42^{i//_,,(u),(P,^{u)) . (4.31) A similar relafion to Equafion 4.30 can be written for the wavelet functions since ^7jt(0eF,.^,,i.e., V'j.nit)= l^{v^-ioi^)Ak-2ni^))^j.ikiO= Z>/2g(^-2«)^^.,,,(0. k=-m (4.32) A =-00 By integrating both sides of Equafion 4.30 and settingy,«=0, we obtain. \mdt= Y.Kk)\ \mdt\, -00 A = -00 V_Qn Y.h{k) = / (4.33) /t = -00 With the Fourier transform pair for sequences defined as 1 H{co) = Y,Kn)e-''-" <:> h{n) = — \H{co)e''''dco, 2n (4.34) from Equafion 4.33, we get //(0)=1. A well know relation whose derivafion can be found in [6] is H(^cof ^\H{co^Ti:f =\. (4.35) From this equafion, setting <y=0 yields H(^7i)=0. With Equafion 4.32, setting7>=0 yields y/it) = 2Y^g{k)(/>{2t-k). (4.36) A = -oo The Fourier domain representation of this equation becomes ^ico) = 2 £ k=-oo - f fr„\ CO (f)^ o Z giky ^1 -{fJA (co<^ gik) -e ^'^ O — \2J \' A = -ooV \2) frn\ CO ( CO =G — O \2) \2 (4.37) From Equations 4.30 and 4.32, the orthogonality condition, Vj±Wj, yields 00 V«,/7GZ, 00 0 = (^^.„(O,^,,,(O) = 2 2 ] X ^ ( ^ - 2 « ) g ( ' ^ - 2 j ^ ) ( ^ , . u ( 0 . ^ , . u , ( 0 ) - (4.38) A=-00 0= m=-co 2Y.hik-2n)gik-2p) * = - 0 0 Wlien n,p=0, this equation has a solution, Vr G Z, gik) = i-\)'hi-k 26 + 2r + 1) [6]. (4.39) In order to convert this equation to the Fourier domain, take the summation of both sides with multiplied exponentials, °0 00 00 5]g(A:)g-"^ = J^e''^h(-k + 2r + \)e-''^ ^ g'(--).2..i) X/j(^)^-'(--)'". (4,40) Gico) = e""^^'^'^ H{K - co)e-'''^^''^'^ = -Hi7i-co)e-"'^"-^'\ Inserting this relation into Equafion 4.37 and setting r=0 yields 4>(^) = V 2 / / [ ; r - y ] o [ y ] . (4.41) By setting &F=0 in the previous two equations, one can use the above relation H{7r)=0 to conclude «> 00 4^(0) = \i//it)dt = 0 and G(0) = X g(^) = 0. -00 (4.42) A=-oo Thus, the wavelet function and the impulse response of Gico) have zero mean. The discrete filters, H(co) and Gico), can be determined from the scaling and wavelet functions or vice-versa. However, the discrete filters and scaling and wavelet functions must correspond to the above constraining relations and several other constraining relations which have not been mentioned but are included in [6] and [9]. 
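These constraints are concrete enough to verify numerically for a specific filter. The short C sketch below is an illustration only; it uses the four-tap Daubechies coefficients that will be adopted later in this chapter, and it checks that the scaling filter sums to one (H(0)=1), that the wavelet filter formed from Equation 4.39 with r = 1 sums to zero (G(0)=0), and that the filter taps satisfy the orthonormality conditions.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double s3 = sqrt(3.0), d = 4.0 * sqrt(2.0);
    double c[4], g[4];                 /* c[k] = sqrt(2)*h(k)        */
    double sum_h = 0.0, sum_g = 0.0, norm = 0.0, shift;
    int k;

    c[0] = (1.0 + s3) / d;  c[1] = (3.0 + s3) / d;
    c[2] = (3.0 - s3) / d;  c[3] = (1.0 - s3) / d;

    for (k = 0; k < 4; k++) {
        g[k] = (k % 2 == 0 ? 1.0 : -1.0) * c[3 - k]; /* Eq. 4.39, r=1 */
        sum_h += c[k] / sqrt(2.0);     /* should be 1 (H(0) = 1)      */
        sum_g += g[k];                 /* should be 0 (G(0) = 0)      */
        norm  += c[k] * c[k];          /* should be 1 (orthonormality)*/
    }
    shift = c[2] * c[0] + c[3] * c[1]; /* should be 0 (even shifts)   */

    printf("sum h = %.6f  sum g = %.6f  norm = %.6f  shift = %.6f\n",
           sum_h, sum_g, norm, shift);
    return 0;
}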
The constraining relations have assisted mathematicians in developing many different wavelet functions. The complete DWT is based on decomposing Aj^^f{n), - J < j < -\. where J is the number of levels of decomposition, into two sequences Aj f{n) and Dj f{n). first sequence, ^^/(w), was discussed previously. The second sequence, Dj The f{n).w\\\ now be defined. Take the inner product offit) with both sides of Equafion 4.32, (/(O, ^,. (O) = Z ^gik - 2n){fit),^^.„,, it)). (4.43) /C=-Q0 With D"^ defined as an operator on fit) which produces the sequence V« G Z, D^Jfin) = {fit). y/^„ (/)) - 27 (4.44) and the sequence g(n) = gi-n) defined as the impulse response of discrete filter Gico). Equation 4.43 becomes 00 ^ ; / ( « ) = Z V2g(2« - k)A^.Jik) = i^gik) * <,/(^))(2«). (4.45) k=-oo This relation is comparable to Equafion 4.27. The sequence Dj f(n) can also be written as an expression comparable to Equation 4.21, D^fin) = 4V \fit)ii/{Vit-2-^n))dt = if{t)^4Vi{/{-Vt))i2-^n). (4.46) -00 Since the wavelet function has bandpass frequency characteristics [9], from this relation, D^'j fin) can be thought of as a bandpass filtered version offit) sampled at intervals of 2~^ n. D^'j fin) is referred to as a detail sequence, and Aj f(n) is referred to as a smooth sequence. From Equations 4.27 and 4.45, one can graphically represent the decomposition of Aj^^f(n) with a filter block diagram (Figure 4.1). The arrows entering and leaving Figure 4.1 represent the cascading of the filter diagram over J decompositions where - J < J < -\. Once the sequence A^fin) has been decomposed into J detail sequences and a smooth sequence, the original sequence can be reconstructed from the decomposed sequences. Since F^^, = Wj © Vj, (ffj^^„it) e F^.^, can be written uniquely as a sum of a function in Wj and a function in Vj. These two functions are the projections of ^VIT? on the respective vector spaces, Wj and Vj, i.e., ^j.uC) = Z («>..*(«)'«',>,.,.("))«>;.»(')+ Z (^;.*(").(*,.,.„(")K,('). k=-ao (4.47) k=-<x 00 °o ^J.^,niO= Z^/2/2(«-2^)^^.,(0+ A=-oo ^^gin-2k)i//^,it). A=-co Taking the inner product offit) with both sides of this relation yields <,/(«)= f,^Kn-2k)A^fik)+ jt = - 0 0 Y^42g{n-2k)D'jik). A=-00 28 (4.48) Inserting a zero between every sample of Aj fik) and Dj fik) results in an upsampling of the two sequences which is defined as Affi2k) = A'jik), D^f{2k) = D^fik). (4.49) Substituting m = 2k and the upsampled sequences into Equation 4.48 yields A'j^Jin) = i^h{m) * Aff{m)){n) + (V2g(m) * D'^fim)){n). Thus, A'^j^^fin) is reconstructed from the decomposed sequences Ajf(n) and (4.50) Djf{n) by taking the summation of the upsampled sequences convolved with the discrete filters 42Hico) and yl2Gico), respectively (Figure 4.2). Usually, discrete signals have finite length. There is a problem in translating finite length sequences into Equations 4.27, 4.45, and 4.50. Define A'f^^f{n) as a finite length sequence of length T, where Tis even, and Aj^^f{n) as a periodic extension of Af^^fin) with period T. If the sequences /?(«) and gin) are finite length and nonperiodic. A"! fin) and D"! fin) T will be periodic with a period of —. The requirement that Af^^fin) must have an even length translates into the requirement that the length of the original sequence, Af fin), must be divisible by 2*^, where J is the number of decompositions. As a result, the sum of the lengths of the two decomposed sequences Af fin) and Df fin), is the same as the length of Afjin). 
In other words, the DWT does not increase the number of samples of the original sequence. This compact representation of a sequence is an advantage over other progressive decomposition schemes [9]. The DWT lends itselfeasily to a familiar matrix notation. When Af^^f{n) is represented as a vector v^^' eR^, this vector is multiplied by a square, non-singular matrix, M+i- whose non-zero elements consist of the samples of 42hin) and -J2g{n). This matrix-vector multiplicafion, v' = A^y+iV^^'. results in a vector, v^ eR\ whose 29 elements consist of the two decomposed sequences Af fin) and Df f{n). The matrix notation of the DWT is best illustrated with an example. First, initialize the following finite length sequences: ^fi^) = «o,.. ^-{(^) = «-u, ^-t(^) = d_,^,. 0<n<7.0<k<3. (4.51) In Table 4.\,g{n) is derived from Equation 4.39 by setting r=\. Also, gin) and hin) were defined above as gi-n) and hi-n), respectively. Remembering that Atjin), and D^Jin) are periodic extensions of A^^ f{n), A%f{n), A^fin), and D'fj{n), respectively, one can represent Equations 4.27 and 4.45 with the matrix notation Vi,o' ^0,0 ^-1,1 ^0,1 «-l,2 «-l,3 ^3 ^-1,0 -Cn d-x,x -c. Ci «0,2 ^1 «0,3 (4.52) «0,4 -c. «0.5 -c. "-1,2 -^0 «0,6 ~ ' ^ 2 _ _«0,7 _ _^-13_ where the blank spaces in the square matrix represent zeros. Also, the matrix notation of Equation 4.50 becomes C3 ^0,0 «0,1 <^1 <^1 " -^0 -^2 V,,o' ^-1.1 ^0,2 <^1 C3 ^-1,2 «0,3 -^0 -C2 ^-1.3 «0,4 ^1 ^^3 ^-1,0 «0,5 -^0 -C2 d-u «0,6 <^1 ^3 _«0,7 _ -Co ~^2_ (4.53) d-u d.u_ The square matrix in Equafion 4.53 is the inverse and the transpose of the square matrix in Equafion 4.52. Thus, the square matrix in Equafion 4.52 is orthogonal, and its rows form an orthonormal basis for R^. 30 The filter coefficients of the DWT used in the EZW algorithm (described in the next chapter) are the DAUB4 coefficients named after Ingrid Daubechies [10]. The number 4 is associated with the name DAUB4 because hin) has a length of 4 similar to the above matrix example. Since the rows of the square matrix in Equation 4.52 form an orthonormal basis for R^, two independent equations for the four coefficients can be derived, cl+c^ +cl+cl =\, C^CQ+C^C, = 0 [11]. (4.54) In order for these four coefficients to have a unique solution, two more independent equations are required. The DAUB4 coefficients include vanishing moments for the first two moments of g(n), 00 Y.^'gin) = 0, p = 0,\. (4.55) W= -00 The vanishing moments yield two more independent equations, c, -c^ +c, -Co =0, OC3-IC2 +2c, -3co = 0 , (4.56) which give a imique solution for the coefficients, I + V3 3 + V3 3-V3 I-V3 As the number of coefficients increases, the number of vanishing moments also increases. The Daubechies filters have even lengths because when the filter length increases by two, the number of vanishing moments increases by one. With more vanishing moments of gin), the wavelet ftinction, i/4^t), becomes more regular or smooth with higher continuous derivatives [11]. In order for the DWT to operate on images, the DWT must be extended to two dimensions. As discussed above, the vector spaces Vj form multiresolution approximations of Z (7?). Similarly, define the vector spaces Vj as multiresolution approximations of L^iR^). Further, define the vector spaces Vj as separable multiresolution approximations of L'iR~) where Vj can be decomposed into a tensor 31 product of two one-dimensional vector spaces, Vj = Vj (S) Vj. These one-dimensional vector spaces are multiresolution approximations of L^iR). 
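Since the two-dimensional transform will be built from one-dimensional steps, it may help to see the one-dimensional decomposition of the matrix example written as a few lines of C. The sketch below is illustrative only (the function and variable names are not taken from the thesis code); it computes one decomposition level of an even-length, periodically extended sequence, in the spirit of Equations 4.27, 4.45, and 4.52, using the DAUB4 coefficients given above.

#include <math.h>

/* One level of the periodic one-dimensional DWT.
   a      : input sequence of even length n (indices wrap around,
            implementing the periodic extension);
   smooth : A_{-1}f, length n/2;  detail : D_{-1}f, length n/2. */
void dwt_step(const double *a, int n, double *smooth, double *detail)
{
    double s3 = sqrt(3.0), d = 4.0 * sqrt(2.0);
    double c[4];                       /* c[k] = sqrt(2)*h(k)        */
    int i, m;

    c[0] = (1.0 + s3) / d;  c[1] = (3.0 + s3) / d;
    c[2] = (3.0 - s3) / d;  c[3] = (1.0 - s3) / d;

    for (i = 0; i < n / 2; i++) {
        smooth[i] = 0.0;
        detail[i] = 0.0;
        for (m = 0; m < 4; m++) {
            int k = (2 * i + m) % n;   /* periodic extension         */
            double sign = (m % 2 == 0) ? 1.0 : -1.0;
            smooth[i] += c[m] * a[k];            /* sqrt(2)h filter  */
            detail[i] += sign * c[3 - m] * a[k]; /* sqrt(2)g filter  */
        }
    }
}

Cascading this step J times on the smooth output, and applying it first to the rows and then to the columns of an image, produces the separable two-dimensional decomposition developed next.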
The scaling fiinction of F" can be separated into two one-dimensional scaling fiinctions, ^ix,y) = (l)ix)(l>{y), (4.58) where (jix) is the one-dimensional scaling function for Vj. The vector spaces which lie in the differences between resolutions in two dimensions are defined as W'^. An orthonormal basis for WJ can be constructed using scaled and translated versions of three wavelet funcfions, ^\x,y), ^'^{x,y), and ^^ix,y). These wavelet functions are separable into scaling and wavelet functions of one dimension, ^'\x,y) = (l>{x)y/iy), '\'\x,y) = y/{x)(l>{y), ^^\x,y) = ii/{x)ii/{y). (4.59) With similar development as above, let A'j be the operator for the scaling function and Df , Df, and Df be the operators for the three wavelet functions such that A'j{n,m) = if{x,y)^V(l>i-Vx)(/>i-Vy))i2-^n,2-^m), Dffin,m) = (/(jc,>^)*2V(-2^x)^(-2^>;))(2-^«,2-^m), Dffin,m) = {f{x,y)^V Dffin,m) = {f{x,y)''Vy/i-Vx)y/i-2'y))i2-'n,2''m), y/i-V x)(l>i-V y))i2-^ n,2-^ m), ^"^'^^^ \/n,meZ. These operators perform descretized filtering offix,y) along the x and y axes. With similar development as above, these operators can be written as a decomposition of A'^^^fin,m) into four, two-dimensional sequences, A'jin,m) = i2hik)hil) * Afjik,l))i2n,2m). Dffin,m) = i2hik)gil) * AfjikJ))i2n,2m), Dffin.m) = i2gik)hil) * Afjik,l))i2n,2m). Dffin,m) = i2gik)gil) * Afjik,l))i2n,2m). 32 ^n^m G Z. These four two-dimensional sequences and Af^fin,m) are set to periodic extensions of finite digital images, similar to the above mentioned periodic extensions of onedimensional finite sequences. The period of the decomposed sequences becomes half of the period of Af.^f{n,m). In other words, the decomposed finite images have half the dimension size of Af^f{n,m) so that the four decomposed images, taken together. ha\e the same number of samples as Af^fin,m). When placed into matrix notation or finite sum notation, the two-dimensional transformation kernel of the decomposition of Af^fin,m) becomes separable in the n and m directions. When a two-dimensional transform is separable, the transform can be written as a cascade of two one-dimensional transforms in the two perpendicular directions [1]. Thus, the 2D DWT of a finite digital image can be represented as the cascade of a one-dimensional DWT on the rows and a one-dimensional DWT on the columns of the image. Wlien a digital image is decomposed one level, the four sub-images contain different types of frequency information of the rows and columns. The low resolution sub-image appears as a recognizable downsampled version of the original image. The other three sub-images are less recognizable with high frequency information of either the rows or columns or both. These three detail sub-images are sometimes referred to as error images since they represent the error or difference between the low resolution subimage and the original image. 33 ^Afjin) i2 V2G 12 ^DU{n) ^H >^2 ^A^fin) keep one sample out of two convolve with filter X X Figure 4.1 Discrete Wavelet Decomposition [9] - > Jd AUin) ' t2 V2// D]fin) ^ t2 V2G iy^<j(«)^ ' 2 put one zero between each sample X convolve with filter X Figure 4.2 Discrete Wavelet Reconstruction [9] Table 4.1 Initialized Filter Impulse Responses n V2/2(«) -3 0 -2 V2g(«) V2g(«) C3 0 -Co 0 C2 0 Ci -1 0 Ci 0 -C2 0 Co Co c? c. 1 Ci 0 -c. 0 2 C2 0 Ci 0 3 C3 0 -Co 0 42h{n) 34 CHAPTER V EMBEDDED ZEROTREE WAVELET ALGORITHM The embedded zerotree wavelet (EZW^ algorithm developed by Jerome Shapiro [12] is a quantization algorithm used in lossy image compression. 
The EZW algorithm uses two other independently developed algorithms, the DWT and arithmetic coding, to form a complete image compression system. The arithmetic coding and DWT algorithms have been described in detail in Chapters III and IV, respectively. EZW is a type of transform coder discussed in Chapter I because an image is transformed with the DWT prior to quantization. Also, since EZW quantizes the wavelet coefficients individually and not in blocks, EZW is considered a scalar quantizer. Because the image information is not distorted in the DWT or the arithmetic coding algorithm, all of the information loss occurs in the EZW algorithm where the wavelet coefficients are approximated. EZW is a low bit rate algorithm. In other words, EZW performs best at high compression levels. The reconstructed image quality compares well or better than other image compression programs at high compression levels such as 80:1 or 100:1. The algorithm makes the most of very little image data. The compression performance is dependent on the existence of zerotrees which are exponentially growing trees of insignificant wavelet coefficients. Zerotrees will be discussed later. At high compression levels, most of the wavelet coefficients are considered insignificant which results in large and numerous zerotrees. However, at low compression levels, most of the wavelet coefficients are considered significant and the size and number of zerotrees decreases. Therefore, at low compression levels such as 10:1 or 20:1 other compression programs such as JPEG have an advantage over EZW. The worth of EZW is dependent on the specific need of an image compression program. If one needs compressed images with near impercepfible distortion at low compression levels, JPEG compression would offer a good solution. However, if one needs highly compressed images where the image features remain recognizable but have some visible distortion, EZW will ha\e an advantage over JPEG. The distortion introduced by EZW is more visually pleasing than 35 the blocking effect distortion produced by JPEG. Results of EZW compression will be discussed further in Chapter VI. Besides improved compression performance, another advantage of EZW is its embedded code. EZW represents an image in a similar way that a computer represents a decimal number of infinite precision with finite length code. The larger the number of bits which represent a decimal number, the more accurate the representation will be. The lower resolution digits are embedded in the front of the code so that the more precise information is added on the end of the code. The code simply terminates when the desired precision has been reached. Similarly, all coarse representations of the image are embedded at the front of the EZW code and the more detailed image content is added on the end. The EZW algorithm stops executing or generating code when the desired image accuracy has been reached. As a result, a desired CR or distortion metric can be met exactly. This embedded code does not affect the compression efficiency. The EZW algorithm is an iterative algorithm so that it makes successive passes through the image information. Each pass produces more detailed information of the image. Different aspects of each iteration will be described and justified intuitively, but for a more thorough rationalization and mathematical justification, the reader is referred to [12]. In each iteration, the magnitudes of the wavelet coefficients, |jc|. are compared to a single threshold value T. 
The coefficients equal to or below the threshold, |x| ≤ T, are labeled insignificant, and the coefficients above the threshold, |x| > T, are defined as significant. The wavelet coefficients compactly represent the low frequency and high frequency information of an image in distinct spatial locations. The high frequency information, which is contained in the lower level sub-image coefficients, may represent very few image pixels but contributes significantly to the perceptual image quality. The low frequency information, which is contained in the high level sub-image coefficients, represents a large number of pixels compacted into a few coefficients. Therefore, large magnitude high and low frequency coefficients are considered equally important and are compared with the same threshold, T. The insignificant coefficients are coded as zero values. Once the significant coefficients are found, their magnitudes and positions must be coded. Magnitude coding will be discussed later.

Efficient position coding represents a formidable challenge. Since the wavelet coefficients are uncorrelated, it is difficult to predict where a significant coefficient will occur. However, it is easier to predict where the insignificant coefficients will lie because of the decaying nature of the wavelet coefficients across decomposition levels. Coefficients at lower levels tend to have statistically smaller magnitudes than coefficients at higher levels. Therefore, if a coefficient at a high level is found insignificant, the coefficients with the same spatial orientation at lower levels have a high probability of also being insignificant. The coefficients with the same spatial orientation at different levels form tree-like structures. These tree structures of insignificant coefficients are called zerotrees because they are coded as zero values.

The nodes of the zerotrees have parent-child relationships such that most parent nodes each have four children nodes (Figure 5.1). The two letters in each block in Figure 5.1 represent either the low (L) or high (H) frequency information of each decomposition for the rows and columns, respectively. The subscripts represent the levels of decomposition. A coefficient in a decomposition level above the first, except for the lowest resolution sub-image coefficients, will have four children with the same spatial orientation in the next lower level. The coefficients in the lowest resolution sub-image will have three children with the same spatial orientation, one in each of the other three sub-images on the same level. All of the children nodes connected below a parent are called the descendants of the parent. Similarly, all of the parent nodes connected above a child are called the ancestors of the child. A zerotree is formed when a parent and all of its descendants are found to be insignificant. This parent, the top node in the zerotree, is called the zerotree root. After a zerotree root symbol is coded in a compressed file, the decoder will know to automatically assign zeros to the zerotree root and all of its descendants. As a result, many insignificant coefficients can be predicted with one encoded symbol. With the insignificant coefficients coded efficiently with zerotrees, the positions of the significant coefficients also become coded efficiently.
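The parent-child relationships of Figure 5.1 can be expressed directly in terms of array indices. The sketch below is illustrative only: the function name, the argument layout, and the assumption of a row-major coefficient array arranged in nested quadrants are not taken from the thesis code.

/* Sketch (illustrative, not the thesis code): children of the wavelet
 * coefficient at (r, c) in a w x h coefficient array laid out as nested
 * quadrants, with the lowest resolution sub-image of size ll_w x ll_h
 * in the upper left corner.  Returns the number of children written to
 * child_r[] and child_c[]. */
int zerotree_children(int r, int c, int w, int h, int ll_w, int ll_h,
                      int child_r[4], int child_c[4])
{
    int k;

    if (r < ll_h && c < ll_w) {
        /* Lowest resolution sub-image: three children, one in each of
         * the other three sub-images on the same level. */
        child_r[0] = r;        child_c[0] = c + ll_w;
        child_r[1] = r + ll_h; child_c[1] = c;
        child_r[2] = r + ll_h; child_c[2] = c + ll_w;
        return 3;
    }
    if (2 * r >= h || 2 * c >= w)
        return 0;              /* finest level: no children */

    for (k = 0; k < 4; ++k) {  /* four children at the next finer level */
        child_r[k] = 2 * r + k / 2;
        child_c[k] = 2 * c + k % 2;
    }
    return 4;
}

Calling this function recursively on each child visits every descendant of a coefficient, which is how a zerotree test can be implemented.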
The alphabet used to encode the positions of significant coefficients, except in the lowest decomposition level, includes four symbols: positive significant (PS), negative significant (NS), isolated zero (IZ), and zerotree root (ZR). In the lowest level, the alphabet includes three symbols: PS, NS, and zero (Z). There is no possibility of a ZR in the lowest level, so the ZR symbol is not included in its alphabet. The significant coefficients are divided into PS and NS so that the magnitude coding does not have to keep track of the sign information.

The symbols are entropy coded with the adaptive arithmetic coder described in Chapter III. With the arithmetic coder, the statistical model is separate from the coder. The adaptive statistical model consists of a simple histogram. A different histogram is used to represent each separate alphabet of symbols encoded. Therefore, different symbol sources will be intermixed during the coding process. The decoder will be able to unmix the sources since the order of the mixing is known. The adaptive arithmetic coder has an advantage in coding the small alphabets used in this algorithm. Each of the symbols will occur regularly, so the statistical model only needs a short memory to keep track of symbol probabilities. With a short memory, the model will adapt quickly to changing symbol probabilities, which are usually non-stationary. A maximum frequency count of 256 was used to balance the need for an accurate model with the need for the model to adapt quickly.

The scanning of the image for significant and insignificant coefficients while coding the position information is called the dominant pass of the image. The order of the scanning is important because the decoder will read the information in the same order. The algorithm scans the image from the low resolution sub-images to the high resolution sub-images (from the high decomposition level to the low decomposition level) so that each parent is scanned before its children. The coefficients are assumed to be zero until they are found to be significant. If the coding stops in the middle of the dominant pass, only the lower resolution coefficients will have had the chance to be found significant. Since the low resolution wavelet coefficients can produce recognizable image features better than the high resolution coefficients, this ordering is justified. The sub-images are scanned one at a time in the following order after the lowest resolution sub-image (LL_N): upper right (HL_N), lower left (LH_N), and lower right (HH_N). The coefficients within each sub-image are scanned from left to right and from top to bottom. The ordering of the sub-images in each decomposition level and of the coefficients within each sub-image needs to be consistent but is otherwise arbitrary. As each coefficient is scanned, the algorithm goes through a sequence of decisions to determine which symbol to code (Figure 5.2). Once a coefficient is found to be significant, its position is not coded again in subsequent dominant passes. In other words, the dominant pass skips over coefficients which were found significant in previous dominant passes.
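The decision sequence of Figure 5.2 reduces to a few comparisons per coefficient. The fragment below is a minimal sketch of that decision, assuming significance is tested as |x| > T and that the caller supplies the bookkeeping flags; the enumeration values and the flag arguments are illustrative assumptions, not the symbol encoding actually used in the thesis program.

/* Sketch of the per-coefficient decision made during a dominant pass
 * (compare Figure 5.2).  Illustrative only. */
enum ezw_symbol { EZW_PS, EZW_NS, EZW_IZ, EZW_ZR, EZW_Z, EZW_NONE };

enum ezw_symbol dominant_symbol(double coeff, double threshold,
                                int already_significant,
                                int has_children,
                                int descendants_all_insignificant)
{
    double mag = (coeff < 0.0) ? -coeff : coeff;

    if (already_significant)            /* found significant in an earlier */
        return EZW_NONE;                /* dominant pass: skip, code nothing */

    if (mag > threshold)                /* significant: code sign and position */
        return (coeff >= 0.0) ? EZW_PS : EZW_NS;

    if (!has_children)                  /* finest level: 3-symbol alphabet, */
        return EZW_Z;                   /* IZ and ZR merge into Z           */

    return descendants_all_insignificant ? EZW_ZR : EZW_IZ;
}

Each returned symbol would then be handed to the adaptive arithmetic coder, using a separate histogram for the four-symbol and three-symbol alphabets as described above.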
When a coefficient is found to be significant, its magnitude is appended to a list called the subordinate list. With each iteration of EZW, the threshold value T is divided in half. Therefore, with each dominant pass, more coefficients are appended to the subordinate list. Based on the value of the current threshold, the decoder can estimate the coefficient magnitudes in the subordinate list during the dominant passes. For example, during the second dominant pass, the decoder will know that the newly significant magnitudes lie somewhere between T/2 and T, where T is the original threshold. As a result, the width of the magnitude uncertainty interval, (T/2, T], is equivalent to the current threshold, T/2, during the dominant pass. The decoder estimate of a magnitude is the center of its uncertainty interval. This provides a simple and effective magnitude estimate. Since this relationship between the uncertainty interval and the threshold should also hold during the first dominant pass, for the sake of consistency the initial uncertainty interval becomes (T, 2T]. This initial uncertainty interval places a constraint on the initial threshold with respect to the maximum wavelet coefficient magnitude,

    \frac{1}{2}\,\max|x| \le T < \max|x|.                                              (5.1)

Obviously, T must be less than the maximum magnitude, or else the first dominant pass would not find any significant coefficients. If T is less than half the maximum magnitude, then at least one magnitude will lie outside the initial uncertainty interval, (T, 2T], and the decoder estimate will be less accurate.

With each iteration of EZW, the uncertainty interval width (UIW) of the significant magnitudes found in the dominant pass is divided in half. However, all of the magnitudes in the subordinate list should maintain the same UIW because of the successive approximation principle. In other words, all of the significant magnitudes found should be approximated with the same range of uncertainty. The UIWs of the significant magnitudes found in previous dominant passes are reduced to the current UIW, before the next dominant pass begins, with an iteration called the subordinate pass. After the first dominant pass, the magnitudes in the subordinate list are compared with the centers of the uncertainty intervals, which are the decoder estimates. If a magnitude is above the decoder estimate, a "1" symbol is output. If a magnitude is below the decoder estimate, a "0" symbol is output. These symbols are then entropy coded with a separate two-symbol alphabet. The new uncertainty intervals will lie either above or below the centers of the old uncertainty intervals. As a result, the UIW of the subordinate list is divided in half by this process. Also, the UIW of the subordinate list will then be equivalent to the UIW of the magnitudes appended during the second dominant pass. When the subordinate pass is applied to the subordinate list after each dominant pass, the UIW of the entire subordinate list remains uniform.

Since the accuracy of large magnitude coefficients is more important to image reconstruction than the accuracy of small magnitude coefficients, the sorted order of the subordinate list should reflect the importance of the larger magnitudes. At the end of the first dominant pass, the subordinate list magnitudes are in order of appearance within the image. From the decoder's point of view, all of these magnitudes are the same, namely the center of the uncertainty interval. After the magnitudes are refined with the subordinate pass, their decoded values will change and may differ from one another. The subordinate list should then be sorted with respect to the decoded values so that the second subordinate pass will begin refining the largest magnitudes first (from the decoder's point of view). If the subordinate list is sorted each time after it is refined, the decoded subordinate list will remain in sorted descending order before each subordinate pass begins. As a result, the more important symbols, which refine the larger magnitudes, will be decoded before the symbols refining the smaller magnitudes. This aspect of the EZW iteration is useful in case the decoding stops during the subordinate pass.
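A minimal sketch of the refinement step described above follows. It assumes the subordinate list is held in parallel arrays of true magnitudes and current interval lower ends, and that emit_bit() hands each refinement symbol to the entropy coder; these names and the data layout are assumptions made for illustration, not the thesis implementation.

/* Sketch of one subordinate pass (not the thesis code).  Every magnitude
 * on the subordinate list shares the same uncertainty interval width
 * (uiw); the pass emits one refinement bit per entry and, in effect,
 * halves that width.  mag[] holds the true magnitudes, low[] the lower
 * end of each entry's current uncertainty interval. */
void subordinate_pass(const double mag[], double low[], int count,
                      double uiw, void (*emit_bit)(int))
{
    int i;
    double half = uiw / 2.0;

    for (i = 0; i < count; ++i) {
        double center = low[i] + half;   /* current decoder estimate */
        if (mag[i] >= center) {
            emit_bit(1);                 /* magnitude in the upper half */
            low[i] = center;             /* interval moves up */
        } else {
            emit_bit(0);                 /* magnitude in the lower half */
        }
        /* The new decoder estimate is low[i] + uiw/4, the center of the
         * halved interval; after the pass the list is re-sorted by these
         * decoded values, as described above. */
    }
}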
Each EZW iteration consists of a dominant pass, followed by a subordinate pass, followed by a sorting of the subordinate list with respect to the decoded magnitudes. The iterations continue until a target bit rate, CR, or distortion metric is reached. In the implementation shown in this chapter, a target CR stops the EZW process. A reconstructed image distortion metric such as the mean square error (MSE) should be non-increasing, or very nearly so, with respect to the bit rate. If the EZW algorithm is still unclear, an example is sometimes worth a thousand words; a detailed EZW example is included in Appendix C.

EZW is an innovative approach to quantizing wavelet coefficients so that a recognizable image can be reconstructed from very little data. It was of interest to see how EZW performed with a different set of wavelet coefficients. The wavelet transform used in [12] was based on the 9-tap symmetric quadrature mirror filters whose coefficients are given in [13]. However, the wavelet coefficients used in the implementation given in this chapter are the DAUB4 wavelet coefficients described in Chapter IV. The performance of EZW in both implementations was comparable. The results of standard gray-scale image compression in Chapter VI can be compared with the results in [12]. Also of interest was having a readily available EZW compression program to compare with other image compression algorithms developed in our image processing laboratory, such as the adaptive fuzzy clustering algorithm [14]. Since EZW is a well respected algorithm, the EZW compression program can be used in future research for comparison with new image compression schemes.

The implementation of EZW in ANSI C code compresses and uncompresses an 8-bit raw image of variable dimensions. The image can be rectangular, but the width and height of the image must be divisible by 2^J, where J is the level of wavelet decomposition, as discussed in Chapter IV. Different initial threshold inputs produce different reconstructed image distortions. The initial threshold must be optimized experimentally, as discussed in [12]. The initial threshold can be optimized for a large class of images, with a single initial threshold used to compress several images. In order to develop a more user friendly image compression system, the initial threshold input must somehow be eliminated. A user should not be expected to experiment with different thresholds to discover the one that gives the least distortion for a particular image. Further research is needed in this area. If the CRs were mapped to a distortion metric such as the peak signal-to-noise ratio (PSNR) for a large class of images, a user could input a target PSNR instead of a desired CR. The option of a desired reconstructed PSNR or CR would make the program easier to use. Calculating the reconstructed PSNR while the image is being coded would give the same ability to set a target PSNR, but that method may be too computationally intensive. The EZW program presented in this chapter represents a flexible and improvable starting point for research into EZW compression.

[Figure 5.1 Zerotree Structure: the sub-images LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, and HH1 of a three-level decomposition, with arrows indicating the parent-child links between decomposition levels.]

[Figure 5.2 Position Coding Flowchart: the sequence of decisions that maps an input coefficient to "do not code," PS, NS, IZ, or ZR.]
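Putting the passes together, the encoder loop sketched below alternates dominant and subordinate passes, sorting the subordinate list and halving the threshold each iteration until the target CR is met. Every routine named here (dominant_pass, subordinate_pass_all, sort_subordinate_list, bits_written) is a hypothetical placeholder standing in for the corresponding step described in this chapter, not a routine from the thesis program; a real encoder would also cut the bit stream in mid-pass so that the target is met exactly, as noted above.

/* Hypothetical placeholder interfaces for the steps described above;
 * these are not the routines used in the thesis program. */
void dominant_pass(double *coeffs, int w, int h, double T);
void subordinate_pass_all(double T);
void sort_subordinate_list(void);
long bits_written(void);

/* Sketch of the top-level EZW encoder loop: alternate dominant and
 * subordinate passes, sort the subordinate list, and halve the threshold
 * until the target compression ratio is reached. */
void ezw_encode(double *coeffs, int w, int h,
                double initial_threshold, double target_cr)
{
    double T = initial_threshold;
    long original_bits = (long)w * h * 8;        /* 8-bit source image */

    while (bits_written() == 0 ||
           (double)original_bits / (double)bits_written() > target_cr) {
        dominant_pass(coeffs, w, h, T);          /* PS/NS/IZ/ZR/Z symbols */
        subordinate_pass_all(T);                 /* refinement bits       */
        sort_subordinate_list();                 /* by decoded magnitude  */
        T /= 2.0;                                /* halve the threshold   */
    }
}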
CHAPTER VI

RESULTS OF EZW COMPRESSION

EZW compression was tested on two color images and one gray-scale image. The compression tests demonstrate the advantages of EZW over JPEG compression at high compression levels. The 24-bit color images were separated into three color planes (red, green, and blue) for EZW compression, since the EZW program can only compress 8-bit images. The RGB color space was the logical choice for the separation. The three reconstructed color planes were then combined into a reconstructed 24-bit color image. The JPEG compression program compressed the color images directly, without separation into color planes.

The distortion of the reconstructed images was quantitatively measured with the MSE and PSNR. The MSE and PSNR are both calculated from the cumulative square error (CSE),

    CSE = \sum_{j=0}^{M-1} \sum_{k=0}^{N-1} ( x_{j,k} - \hat{x}_{j,k} )^2,             (6.1)

where x_{j,k} is the pixel value of the original M by N image and \hat{x}_{j,k} is the pixel value of the reconstructed image. The MSE and PSNR become

    MSE = \frac{CSE}{MN},                                                              (6.2)

    PSNR = 10 \log_{10} \left( \frac{255^2}{MSE} \right).                              (6.3)

The reconstructed Lena and Baboon images (512 x 512) show a distinct difference in appearance between the EZW and JPEG compressed images (Figure 6.1). The distortion in the EZW compressed images is smooth, without many blocking artifacts. Some vertical and horizontal artifacts are due to the vertical and horizontal wavelet transforms. Since the discrete cosine transform (DCT) in JPEG is performed on 8 by 8 image blocks, these image blocks appear as artifacts in the highly compressed images. The EZW compressed Lena image at 100:1 CR and the EZW compressed Baboon image at 80:1 CR more accurately represent the original images than the respective JPEG compressed images (Table 6.1). This improved performance of EZW compression over JPEG compression is expected as the CR increases.

The gray-scale standard Lena image was compressed many times with EZW and JPEG to produce two plots (Figure 6.2). At a CR of 50, gray-scale Lena was EZW compressed with 60 different initial thresholds. The valid initial thresholds lie between 6026 and 12052. The initial thresholds given to the EZW program ranged from 6100 to 12000 at intervals of 100. The distortion generally decreases as the threshold increases, with the minimum distortion at 11900. Different images, and different CRs for this image, may have different initial threshold curves. This threshold curve is meant to give a general idea of how different initial thresholds can affect the reconstructed distortion. The minimum distortion threshold found in Figure 6.2a was then used to EZW compress gray-scale Lena at 36 CRs from 10 to 118. JPEG was also used to compress the same image at many compression ratios. The two curves (Figure 6.2b) demonstrate that EZW compression performance surpasses JPEG compression performance at a CR of about 42. This graph reasserts the idea that EZW performs well at high compression levels.
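Equations 6.1 through 6.3 translate directly into a few lines of C. The helper below is a minimal sketch for 8-bit images; the function name and interface are assumptions made for illustration, not the routine used to produce Table 6.1.

/* Cumulative square error, MSE, and PSNR of a reconstructed 8-bit image
 * (Equations 6.1-6.3).  Returns the PSNR in dB and, if mse_out is not
 * NULL, stores the MSE there.  Sketch only; not the thesis code. */
#include <math.h>

double psnr_8bit(const unsigned char *orig, const unsigned char *recon,
                 int width, int height, double *mse_out)
{
    double cse = 0.0;                    /* cumulative square error (6.1) */
    double mse;
    long i, n = (long)width * height;

    for (i = 0; i < n; ++i) {
        double d = (double)orig[i] - (double)recon[i];
        cse += d * d;
    }
    mse = cse / (double)n;               /* MSE = CSE / (M*N)       (6.2) */
    if (mse_out != NULL)
        *mse_out = mse;
    if (mse == 0.0)
        return 1e9;                      /* identical images */
    return 10.0 * log10(255.0 * 255.0 / mse);   /* PSNR in dB      (6.3) */
}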
[Figure 6.1 Lena and Baboon Images Compressed with EZW and JPEG: (a) original Lena image; (b)-(d) Lena EZW compressed at 60:1, 80:1, and 100:1 CR; (e)-(g) Lena JPEG compressed at 59.6:1, 78.0:1, and 96.9:1 CR; (h) original Baboon image; (i)-(k) Baboon EZW compressed at 40:1, 60:1, and 80:1 CR; (l)-(n) Baboon JPEG compressed at 40.8:1, 61.0:1, and 79.1:1 CR.]

Table 6.1 Lena and Baboon Compressed Image Statistics

    Image     Compression    CR      MSE      PSNR (dB)    BPP (bits/pixel)
    Lena      EZW            60      88.44    28.66        0.4
    Lena      EZW            80      115.1    27.52        0.3
    Lena      EZW            100     130.0    26.99        0.24
    Lena      JPEG           59.6    78.90    29.16        0.403
    Lena      JPEG           78.0    114.9    27.53        0.308
    Lena      JPEG           96.9    169.0    25.85        0.248
    Baboon    EZW            40      480.5    21.31        0.6
    Baboon    EZW            60      568.1    20.59        0.4
    Baboon    EZW            80      620.2    20.21        0.3
    Baboon    JPEG           40.8    420.2    21.90        0.588
    Baboon    JPEG           61.0    556.6    20.68        0.393
    Baboon    JPEG           79.0    684.1    19.78        0.304

[Figure 6.2 EZW and JPEG Compression Plots: (a) Threshold vs. MSE Plot for EZW Compression, initial threshold from 6000 to 12000; (b) EZW and JPEG Rate-Distortion Curves, MSE versus compression ratio from 0 to 120.]

CHAPTER VII

SUMMARY AND CONCLUSIONS

Two lossless compression programs and one lossy image compression program have been implemented and described in this thesis. The results of Huffman coding were compared with arithmetic coding in one example. Arithmetic coding will consistently compress better than Huffman coding without losing any information. This characteristic occurs because arithmetic coding does not have the same theoretical upper compression bound as variable length coders like Huffman coding. After the 2D DWT was described, this transform and arithmetic coding were applied to the new EZW quantization algorithm. EZW compression is a low bit-rate compression algorithm, and the results demonstrated EZW's better performance at low bit-rates over the standard JPEG compression for gray-scale images.

The advantage of EZW over JPEG in color image compression was less obvious. The major drawback in the method used to compress the color images with EZW was the absence of any elimination or reduction of the redundancy between color planes. The RGB color planes were simply separated, compressed equally, reconstructed, and recombined. The RGB color planes are correlated with one another. A transformation to a less correlated color space, in which the color plane with less visual information is quantized more coarsely than the other planes, should be attempted. There has been much research in optimal color image quantization [15]. More research is needed into how EZW can be applied to a better color image compression scheme.

The EZW program itself can also be improved. With more experiments with the initial threshold, the best initial threshold might be predicted by the EZW program so that the user input of an initial threshold could be eliminated. This would produce a more user friendly program, similar in ease of use to the JPEG compression program.

REFERENCES

1] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, New York: Addison-Wesley Publishing Company, 1992, pp 307-411.

2] Huffman, D. A., "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, vol. 40, no. 10, 1952, pp 1098-1101.

3] Stephen G.
Kochan, Programming in ANSI C, Indianapolis. Ind.: Sams Publishing, 1994. 4] Ian Witten, Radford Neal, and John Cleary, "Arithmetic Coding for Data Compression," Communications of the ACM. vol. 30, no. 6, June 1987. pp 520-540. 5] D. Gabor, "Theory of Communicafion," J. Inst. Elect. Eng. (London). Vol. 93. No. 3. 1946, pp 429-457. 6] H. J. Barnard, Image and Video Coding Using a Wavelet Decomposition, 1994, pp 727. 7] Ingrid Daubechies, Ten Lectures on Wavelets, Philadelphia, Pa.: Capital City Press. 1992, p. 2. 8] Ingrid Daubechies, "The Wavelet Transform, Time-Frequency Localization and Signal Analysis," IEEE Transactions on Information Theory, Vol. 36, No. 5, pp 9611005, September 1990. 9] Stephane G. Mallat, "A Theory for Multiresolufion Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7. pp 674-693 July 1989. 10] Ingrid Daubechies, Communications on Pure and Applied Mathematics, vol. 41, 1988, pp 909-996. II] Press, Teukolsky, Vettering, and Flannery, Numerical Recipes in C, Cambridge: Cambridge University Press, 1992, pp 591-606. 12] Jerome M. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Transactions on Signal Processing, Vol. 41, No. 12, December 1993, pp 3445-3462. 13] E. H. Adelson, E. Simoncelli, and R. Hingorani, "Orthogonal Pyramid Transforms for Image Coding," Proc. SPIE. vol. 845. Cambridge, MA, Oct. 1987. pp 50-58 14] S. Mitra and S. Pemmaraju, "Adaptive Vector Quantization using an ART-based Neuro-ftizzy Clustering Algorithm," Invited Paper presented at the International Conference on Neural Networks, June 3-6, Washington, D. C . June 3-6, 1996. 52 [15] Jean-Pierre Braquelaire and Luc Brun, "Comparison and Optimization of Methods of Color Image Quantization," IEEE Transactions on Image Processing- Vol. 6, No. 7, July 1997, pp 1048-1052. 53 APPENDIX A: ANSI C IMPLEMENTATION OF HUFFMAN ENCODING AND DECODING huffman.c /* functions used in Huffrnan algorithm */ #include <stdio.h> /* Huffrnan structure */ struct hufftnan { short int symbol; int occurrence; short int codelength; int codenumber; struct huffrnan *next; struct hufftnan * child; }; extern FILE *out; static int code_len[511],code_num[511]; /* code each symbol */ void code_symbol(int symbol) { int n,len=code_len[symbol]; void output_bit(int bit); for (n=l ;n<=len;++n) output_bit(code_num[symbol]&(l«(len-n))); } /* The "codebook" ftinction initializes "codelen" and "codenum" and sends the code book to a file. The "codejen" and codenum" arrays are used by the symbol coder to instantly look up the code length and code number for any symbol. Searching through a linked list for the code length and code number would be too time consuming. However, the symbol values must be translated into array indexes which are positive integers. Since the symbol values range from -255 to 255, adding 255 to the symbol values would translate them to a range from 0 to 510. The symbol values and code lengths were written as short integers (2 bytes) instead of integers (4 bytes) to save file space. 
*/ void code_book(struct huffrnan *listptr, int sym_count) { int num; short int sym,len; fwrite(&sym count,sizeof(int), 1 ,out): while(listptr!=NULL) { sym=listptr->symbol,num=listptr->code_number; len=listptt-->code_length; code_len[sym+255]=len; code_num[sym+255]=num; fwrite(&sym,sizeof(short int), 1 ,out); 54 fwrite(&len,sizeof(short int), 1 ,out); fwrite(&num,sizeof(int), 1 ,out); listptr=listptr->next; } } /* insert structure into sorted list */ void insert_occurrence(struct huffrnan list[],int listnum) { struct huffrnan *listptr,*listpp; Iistptr=list[0].next,listpp=list; while(listptr->occurrence<list[list_num].occurrence&&listptr->next!=NULL) listpp=listptr,listptr=listptr->next; if (listptr->occurrence >= list[list_num].occurrence) listpp->next=&list[list_num],list[list_num].next=listptr; else listptr->next=&list[list_num],list[list_num].next=NULL: } /* sort structures with respect to occurrence */ void sort_occurrence(struct huffrnan list[],int symcount) { int n; list[0].next=&list[ 1 ],list[ 1 ].next=NULL; for (n=2;n<=sym_count;++n) insert_occurrence(list,n); } /* perform source reduction */ void create_parent(struct huffrnan list[],int symcount) { struct huffrnan *listptr=list[0].next; Iist[sym_count].child=list[0].next; list[sym count].occurrence=listptr->occurrence,listptr=listptr->next; list[sym_count].occurrence+=listptr->occurrence,list[0].next=listptr->next; insert_occurrence(list,sym_count); } /* initialize structures */ void initialize_list(struct huffrnan *listptr) { while (listptr!= NULL) { listptr->child=NULL,listptr->code_number=0; listptr=listptr->next; } } /* The "assigncodes" ftinction recursively retraces the source reduction process and assigns codes to ever} symbol created. Since the original symbols which were created by the histogram process have the "child" pointer pointing to "NULL," the "assigncodes" function stops iterating when the "NULL" pointer is reached, "level" represents the number of bits in each code. With each recursion, "level" is incremented. The "code_number" value is passed back fr-om the parent to the to the tvvo symbols which were source reduced to form the parent. These two symbols left shift "codenumber," and add a 1 or 0 to the least significant bit. This process is continued until "codelength" and "codenumber" have been assigned to all of the original symbols. */ 55 void assign_codes(struct hufftnan *listptr,int level) { if(listptr=NULLj return; (listptr—'next)->code_number=listptr->code_number; listptr->code_length=level,listptr->code_number«= 1; if (listptr->child!-NULL) (listptr->child)->code_number=listptr->code_number: assign_codes(listptr->child,level-1); listptr=listptr->next,listptr->code_length=level; listptr->code_number«=l ,listptr->code_number|=l; if(listptr->child!=NULL) (listptr->child)->code_number=listptr->code_number; assign_codes(listptr->child,level+1); } /* The Hufftnan codes are generated from the histogram data and the code book is stored in a file. The "hufftnancode" function executes the Huffrnan coding process. First, the linked list of structures is sorted in ascending order with respect to "occurrence" with the "sortoccurrence" ftinction. The first uvo svmbols in the list will have the least probabilities of occurrence and the least "occurrence" values. These two symbols will be source reduced with the "create_parent" function. The symbol count is incremented so "create_parent" can create a new symbol structure. 
The list count, "listcount," is decremented because the size of the source reduced list will decrease by one. The new symbol's "occurrence" value will be the sum of the "occurrence" values of the symbols which were source reduced. The two symbols which are source reduced are taken out the sorted list, and the new symbol is inserted so that the list remains sorted. The "child" pointer of the new symbol structure points to the first of the two structures which were removed. This process is iterated until only two symbols remain in the list. Next, the code book is sent to the output file. The original symbols are re-sorted, so that an iterative loop can process them in a linked list. */ void hufftnan_code(struct huffrnan list[],int symcount) { int list_count=sym_count,n=sym_count; sort_occurrence(list,n); initialize_list(list[0] .next); while (list_count>2) -H-sym_count,create_parent(list,sym_count),—listcount; assign_codes(list[0].next, 1); sort_occurrence(list,n): code_book(list[0].next,n); } histogram, c /•create a histogram from file data */ #include <stdio.h> struct huffrnan { short int symbol; int occurrence; short int codelength; int codenumber; struct huffrnan *next; 56 struct huffman *child; }; /* create a histogram with a linked list of structures */ int hist_input(struct huffman list[],int sym,int nj { struct huffrnan *listptr,*listpp; if(n==0) { list[ 1 ].symbol=sym,list[ 1 ] .next=NULL,list[ 1 ].occurrence = 1; list[0].next=&list[l],++n; retum(n); } Iistptr=list[0].next,listpp=list; while (listptr->symbol < sym && listptr->next != NULL) listpp=listptr,listptr=listptr->next; if (listptr->symbol == sym) { ++listptr->occurrence; retum(n); } else if (listptr->symbol > sym) { ++n,list[n].symbol=sym,list[n].occurrence=l,list[n].next=listptr; listpp->next=& list[n]; retum(n); } else { ++n,listptr->next=&list[n],list[n].symbol=sym,list[n].occurrence=l; list[n].next=NULL; retum(n); } } /* The "histoutput" function sends the histogram data to an output file in the sorted order. The first structure element contains the pointer to the first structure element with symbol data, so the first structure element with symbol data can be any structure in the array. The final structure in the linked list has the "next" pointer pointing to "NULL," so an iterative loop can find the end of the list. The histogram data consists of the "symbol" value followed by the "occurrence" value. Each data pair is sent to the output file in floating point format. The Hufftnan program was compiled and ran on a SPARC20 Sun machine, and the histogram data was read and plotted with MATLAB on a PC. The floating point format was the only format which the N4ATLAB program could read correctly. */ void hist_output(FlLE *img_dest, struct hufftnan *listptr) { float n[2]; while (listptt- != NULL) { n[0]=(float)(listptr->symbol),n[l]=(float)(listptr->occurrence); fwrite(n,sizeof(float),2,img_dest),listptr=listptr->next; } } mhisthuff.c /•contains the main function for Hufftnan coding an 8-bit image */ #include <stdio.h> 57 sttTJCt huffman { short int symbol; int occurrence; short int codelength; int codenumber; struct hufftnan *next; struct hufftnan * child; }; FILE *out; /* The "main" function checks the validity of the command line arguments and controls the flow of the program. This function receives pixel information from an input image and codes the difference beU\ een each pixel and the pixel imediately preceeding using the Huffrnan algorithm. The program compresses images stored in the raw data format with 8 or 24 BPP. 
The program compresses a 24-bit color image b\ coding the three color planes separately. The image must be stored in the planar, non-interlaced format (ie. [RRRRR... GGGGG... BBBBB...]). The "-P" switch controls which color plane will be coded. For example, the command line option "-P 2" would compress the second color plane. The default value of "p" is 1, so the "-P" switch can be omitted for gray scale images. The "-H" and "-W" switches indicate the height and width of the image, respectively. The default value of the height and width is 512 since 512 is the common dimension of standard images. The string count, "strcount," is the total number of symbols to encode. The symbol count, "symcount," is the number of unique symbols. The string count is the area of the image, and the symbol count is determined by the "histinput" ftinction. */ main(int argc, char *argv[]) { FILE *in,*out2; int n=argc,p= 1 ,w=512,h=512,sym_count=0,str_count,current,past=0; struct hufftnan *big_list; void start_outputing_bits(void); void done_outputing_bits(void); void code_symbol(int symbol); void hufftnan_code(struct hufftnan list[],int symcount); int hist_input(struct hufftnan list[],int sym,int n); void hist_output(FILE *img_dest,struct hufftnan list[]); if ( argc = 1) /* if no arguments print usage */ { printf("usage: huff_code In Outl Out2 [-W nnn] [-H nnn] [-P n]\n"); exit(O); } while ( - n ) /* get switches */ { if(argv[n][0] = '-') { if(argv[n][l] = 'W') w = atoi( argv[n+l]); elseif(argv[n][l]=='H') h = atoi( argv[n+l]); elseif(argv[n][l]=='P) 58 p = atoi(arg\[n-l]); else { printf("ln\alid argument n"): printf( "usage: huffcode In Outl Out2 [-W nnn] [-H nnn] [-P n] n"); exit(O): } / / if( !(in=fopen(argv[l],"r"))) { printf("Unable to open input image\n"); exit(0); } if( !(out=fopen(argv[2]."w"))) { printf("Unable to open first output image^n"); exit(O); / if ( !(out2=fopen(argv[3],"w"))) { printf("Unable to open second output image'n"); exit(O); } if (!(big_list=(struct huffrnan *)malloc( 1000*sizeof(struct hufftnan)))) { printf("Not enough memory for biglist n"); exit(O); } / * Position the file pointer at the correct color plane with the "p" value.*/ fseek(in,(p-1 )* w*h,SEEK_SET); str_count=w*h; /* number of svmbols to encode */ fwrite(&str_count,sizeof(int), 1 ,out); * After the file pointers and other variables have been initialized, the histogram data is created, "current" represents the current pixel value and "past" represents the previous pixel value. The second argument passed to the "hist_input" fiinction is the symbol to be coded which is the difference between the current and previous pixel values, "biglist" is the pointer to the list of "huffman" structures, "histinput" returns the updated value of the symbol count. If the svmbol passed to "histinput" is different from other s\ mbols previously passed to the function, a new "hufftnan" structure is linked to "biglist" with the "next" pointer, the symbol count is incremented, and the "occurrence" structure value is initialized to one. If the symbol passed to "histinput" is the same as a previous s\mbol. the "occurrence" value of the symbol structure is incremented. While the histogram is being created, the structures are sorted with respect to the "symbol" values. */ printf("creating histogram. . .n"); for (n=l ;n<=str_count;-i-+n) { current=getc(in); s\ m_count=hist_input(big_list,current-past.s\ mcount): past=current; } hist_output(out2,big_list[0].next); /* send histogram data to file */ 59 printf("creating hufftnan code book. .. 
n"); huffrnan_code(bigJist.sym count); fseek(in,(p-l)*w*h,SEEK_SET); /* After the Huffrnan algorithm generates the symbol codes, the symbol string from the input file must be coded into the compressed file. The symbols are coded one bit at a time w ith the "outputbit" function. Similarly, when the symbols are decoded, the decoder reads the compressed file one bit at a time with the "inputbit" ftinction in the same order the bits were sent to the compressed file, "startoutputingbits" and "done_outputing_bits" begins and ends bit output process, respecti\ ely. The argument sent to "code_s\ mbol" is the symbol value translated to an array index. The "code_s\mbol" function sends the code number of each symbol to the compressed file from the most significant bit (MSB) to the least significant bit (LSB), because Huffrnan coding is uniquely decodeable onh from left to right. */ printf("sending huffrnan code string. . .\n"); past=0,start_outputing_bits(); for (n=0;n<str_count;^+n) { current=getc(in); code_symbol(current-past+2 55): past=current; } doneoutputingbitsO; exit(O): } bitoutput.c * output bits into file */ #include <stdio.h> extern FILE *out; static int buffer; static int b i t s t o g o ; void start_outputing_bits(void) { buffer=0; bits_to_go=8; } void output_bit(int bit) { buffer»=l; if(bit) buffer 1=0x80; b i t s t o g o -= 1; if(bits_to_go==0) { putc(buffer,out): bits_to_go=8: } } void done_outputing_bits(void) { putc(buffer»bits_to_go,out); 60 m huff dec.c /*decodes the huffrnan encoded file */ #include <stdio.h> FILE *in; static int code_len[51 l],code_num[511]; static int sym_vector[51 l].s\m_count; /* In the "decodesymbol" function, each bit read from the compressed file is counted by "n" and left shifted into "decodenum." For everv bit read, the code length and code number for all of the unique symbols are tested for equality with "n" and "decodenum." respectively. The iterative loop cycles through the array indices. Since not all array indices of "codelen" and "decodenum" correspond to unique symbols, the search should be limited to only the symbol indices. The "symvector" array contains the index values corresponding to the unique symbols in order from most probable sy mbol to least probable symbol. The iterative loop limits the search of symbols by cycling through the values of the "symvector" array. Even with this search limitation, the decoding process for the 512x512 Lena image required about one and a half minutes to execute running on a 75MHz SPARC20 Sun machine. */ int decode_symbol(void) { int n,m,k,decode_num=0; int inputbitO; for(n=l;;++n){ decode_num«= 1; decode_num|=input_bit(); for(m=0;m<sym_count;++m) { k=sy mvector [m]; if (n==code_len[k] && decode_num=code_num[k]) retum(k); } } } /* The "main" function decodes the compressed file and outputs the reconstructed image. The code book is read from the file, and the symbols are decoded. The information from the code book is stored in three array vectors: "codelen," "codenum," and "symvector." */ main(int argc. char *argv[]) { FILE *out; int n=argc,number,str_count,current.past=0; short int symbol,length: void start_inputing_bits(); /* if no arguments print usage */ if(argc== 1) { printf( "usage: huffdecode In Out n"); exit(O): } 61 * open input and output files */ if ( !(in=fopen(argv[l]."r"))) { prinrf("Unable to open first input image'n"); exit(O); } if ( !(out=fopen(argv[2],"w"))) { printf("Unable to open output image n"); exit(O); } start_inputing_bits(): printf("decoding huffrnan code string. . 
.^n"); for (n=0;n<511:++n) code_len[n]=0,code_num[n]=0; fread(&str_count,sizeof(int),l,in); /* the number of symbols encoded */ fread(&sym_count,sizeof(int),l,in); /* the number of unique symbols */ /* read the codebook */ for (n=0;n<sym_count;++n) { fread(&symbol,sizeof(short int),l ,in); fread(&length,sizeof(short int), 1 .in); fread(&number,sizeof(int), 1 ,in); code_len[sy mbol+255]=length; code_num[symbol+255]=number: sym_vector[sym_count-n-l]=symbol^255; } /* read the encoded symbols */ for (n=l;n<=str_count;-^n) { symbol=decode_symbol(): current=symboH-past-255; putc(current,out); past=current; } exit(O); } inputbits.c /* input bits from file */ -include <stdio.h> extern FILE *in; static int buffer: static int b i t s t o g o ; static int garbagebits; void startJnputing_bits(void) { b i t s t o g o = 0: garbagebits = 0: } 62 int input_bit(void) { intt; if(bits_to_go=0) { buffer=getc(in); if(buffer=EOF) { garbage_bits+=l; if(garbage_bits>14) { printf("Bad input file n"); exit(O); } } bits_to_go=8; } t-buffer&l; buffer»=l: b i t s t o g o -= 1; return t; } 63 APPENDIX B: ANSI C IMPLEMENTATION OF ARITHMETIC ENCODING AND DECODFNG [4] arithmeticcoding.h /* Declarations used for arithmetic encoding and decoding */ #define typedef #define #define #define #define Codevaluebits 16 /* Number of bits in a code value */ long code_value; /* Type of an arithmetic code value */ Topvalue (((long) 1 «Code_value_bits)-1) /* Largest code value */ Firstqtr (Top_value/4+l) /* Point after first quarter */ Half (2*First_qtr) /* Point after first half */ Thirdqtr (3*First_qtr) /* Point after third quarter */ m_encode.c /* encodes an 8-bit image using arithmetic coding */ #include <stdio.h> #defineNS511 FILE *out; int char_to_index[NS],index_to_char[NS+l]; main(int argc,char *argv[]) { FILE *in; int n=argc,w=256,h=256,sym count,ch,symbol; int cum_freqfNS],past,current; intdata[100]; void start_model(int cum_freq[]); void start_outputing_bits(void); void start_encoding(void); void encode_symbol(int symbol,int cum_freq[]); void update_model(int symbol,int cum_freq[]); void done_outputing_bits(void); void done_encoding(void); /* if no arguments print usage */ if ( argc = 1) { printf("usage: arithcode In Out [-W nnn] [-H nnn]\n"); exit(O); } /* get args */ while (~n ) { if ( argv[n][0] == '-') { if(argv[n][l]=='W) /* then this is a switch */ 64 w = atoi( argv[n+l] ); elseif(argv[n][l]=='H') h = atoi( argv[n+I] ); else { printf("lnvalid argumentVn"); printf( "usage: arithcode In Out [-Vv nnn] [-H nnn] n"): exit(O); } } } /* open input and output files */ if ( !(in=fopen(argv[l],"r"))) { printf("Unable to open first input imageVn"); exit(O); } if ( !(out-fopen(argv[2],"w"))) { printf("Unable to open output imageVn"); exit(O); } start_model(cum_freq); startoutputingbitsO; start_encoding(),past=0; /* set up other modules */ /* The image pixels are encoded in the following for loop. The difference between two adjacent pixels is stored in "ch." This difference is translated up 255 so that the range of "ch" becomes 0-510. This translation is necessary so that the source symbols become valid array indices which are non-negative integers. The number of symbols, "NS," is defined as 511. This value is used to set the maximum array indexes. An original symbol is translated into an index symbol. Next, the index symbol is encoded, and the statistical model is updated. The "past" value is set to "current" and the process is continued for all the image pixels. 
The "doneencoding" fiinction assures that the final value encoded is within the final range so that the last symbol can be decoded. * for (n=0;n<(w*h);++n) { current=getc(in); /* read the next character */ ch=current-past+255; /* translate to a non-negative integer */ symbol=char_to_index[ch]: /* translate to an index */ encode_symbol(symbol,cum_freq);/* encode that symbol */ update_model(symbol,cum_freq); /* update the model */ past=current; } doneencodingO; doneoutputingbitsO; /* send the last few bits */ exit(O); } arithmetic encode.c /* arithmetic encoding algorithm */ 65 #include <stdio.h> #include "arithmeticcoding.h" /* current state of the encoding */ static codevalue low, high; /* ends of the current code region */ static long bitstofollow; /* number of opposite bits to output after the next bit */ /* start encoding a stream of symbols */ void start_encoding(void) { low=0; /* full code range */ high=Top_value; bits_to_follow=0; /* no bits to follow next */ } /* output bits plus following opposite bits */ static void bit_plus_follow(int bit) { void output_bit(int bit); output_bit(bit); /* output the bit */ while (bits_to_follow>0) { output_bit(!bit); /* Output bitstofollow opposite bits.*/ bits tofoUow -= 1; /* Set bitstofollow to zero. */ } } /* The binary implementation of arithmetic coding subdivides one main range of [0, 2 16) for 16-bit values. When the range is subdivided and narrowed by selecting a symbol to encode, a few most significant bits of the narrowed range can be determined. For example, if the narrowed range lies in the lower half of [0, 2^16), any 16-bit value in this narrowed range will have a MSB of 0. This 0 bit can be sent to the compressed output file. The high and low boundaries of the range which have a common 0 MSB can be left shifted one bit which effectively doubles their value which also doubles the range. There are other means to determine the most significant bits of a range of values. However, as each MSB is determined, the range will effectively double as in the above example. */ /* encode a symbol */ void encodesymbol (int symbol,int cum_freq[]) { long range; /* size of the current code range*/ range=(long)(high-low)+l; /* The following two lines demonstrate the method used to subdivide the range. The array elements "cum_freq[symbol-l]" and "cum_freq[symbol]" will differ by at least 1. The new range ("low" subtracted from "high") will be a fraction of the old range calculated in the previous line. The size of the new range will be proportional to the frequency count of "symbol" ("cum_freq[symbol]" subtracted from "cum_freq[symbol-l]"). This frequency count is, of course, directly proportional to the probability of "symbol." The equations in the following two lines should be restricted from overflow and underflow conditions. An overflow condition will occur if the product, "range*cum_freq[symbol1]," is greater than 2^31-1 since the operation is signed 32-bit integer multiplication. The maximum "range" value is 2^16-1, and the maximum "cum_freq[symbol-l]" value is "cum_freq[0]" which is at most "Maxfrequency." With this condition. "Maxfrequency" is limited to 2"^15. An underflow condition will occur if "high" and "low" become the same integers. In this case, encoding and decoding will become impossible. If "range" is too small or "cum_freq[0]" is too large, an underflow will occur. 66 In the following for loop, "range" is limited to a minimum of 2^14 which is a quarter of the maximum range. 
If "cum_freq[symbol-l]" and "cum_freq[symbol]" have a difference of 1, "cum_freq[0]" could be at most 2^14 in order for "high" to be greater than "low." Therefore, in order to avoid both the underflow and overflow conditions, "Maxfrequency" is set at 2 14-1. */ high=low+(range*cum_freq[symbol-l])/cum_freq[0]-l; /* Narrow the code region to */ Iow=low+(range*cum_freq[symbol])/cum_freq[0]; /* that allotted to this symbol. */ /* After the range is narrowed in the preceding two lines, some MSB's must be determined so that the range can expand above the minimum range. An example of determining a MSB so that the range can be doubled was discussed previously. A MSB from a range lying in the upper or lower half of the 16bit region is simple to obtain. A 0 bit is sent for a range lying in the lower half and a 1 bit is sent for a range lying in the upper half. In both cases, the range is scaled by doubling the "high" and "low" values. In the case of the range lying in upper half, half of the region must be subtracted from "low" and "high" before they can be scaled. Sometimes the range may not lie in either the upper or lower half but still be smaller than the minimum range to prevent underflow. In that case, "low" will lie in the second quarter and "high" will lie in the third quarter of the 16-bit region. The range is expanded by subtracting one quarter of the region from "low" and "high" and doubling their values. For each consecutive occurrence of this case, "bits tofollow" is incremented. When the next MSB is found from a range lying in the upper or lower half of the region, a number of opposite bits from the current MSB must be sent. The number of opposite bits is equivalent to "bitstofollow." For example, suppose "low" is 0111101111111111 (31743) and "high" is 1000010000000000 (33792). The five MSB's of any number in this range ("high">x>"low") can be determined if the first MSB is known beforehand. The four MSB's after the first MSB are the binary opposite of the first MSB. These "high" and "low" values would cause the third condition to occur four consecutive times so that "bitstofollow" would equal 4. After the next MSB is sent, four opposite bits would follow. Each of the three conditions causes the range to double. Once the range is large enough and the MSB's which can be determined are determined, the "encodesymbol" ftinction is exited. */ /* loop to output bits */ for(;;) { if(high<Half) { */ bit_plus_follow(0); /* output 0 if in low half } */ else if (low>=Half) { /* output 1 if in high half bit_plus_follow(l); low -= Half; /* subtt-act offset to top */ high -= Half; } /* Output an opposite bit later */ else if (low>=First_qtt- && high<Third_qtt-) { /* if in middle half */ bits to follow+= 1; /* Subtract offset to middle */ low -= Firstqtr; high -= Firstqtr; } /* Otherwise exit loop */ else break; /* Scale up code range */ low = 2*low; high = 2*high+l; } /* finish encoding the sft-eam */ void done_encoding(void) { 67 bits_to_follow +=1; /* Output two bits that select the quarter */ if (low<First_qtr) bit_plus_follow(0); /* that the current code range contains. 
*/ else bit_plus_follow(l); } mdecode.c /* decodes an 8-bit image */ #include <stdio.h> #defineNS511 FILE *in; int index_to_char[NS+1 ],char_to_index[NS]; main(int argc,char *argv[]) { FILE *out; int n=argc,w=256,h=256,sym_count,ch,symbol; int cum_freq[NS],past,current; void start_model(int cum_freq[]); void start_inputing_bits(void); void start_decoding(void); int decode_symbol(int cum_freq[]); void update_model(int symbol, int cum_freq[]); /* if no arguments print usage */ if(argc== 1) { printfC'usage: arithdecode In Out [-W nnn] [-H nnn]\n"); exit(O); } /* get args */ while ( ~ n ) { if ( argv[n][0] == '-') /* then this is a switch */ { if(argv[n][l]=='W') w = atoi( argv[n+l]); elseif(argv[n][l]=='H') h = atoi(argv[n+l]); else { printf("Invalid argument\n"); printfC'usage: arithdecode In Out [-W nnn] [-H nnn]\n"); exit(O); } } } /* open input and output files */ if(!(in=fopen(argv[l],"r"))){ printf("Unable to open input imageVn"); 68 exit(O); } if( !(out=fopen(argv[2]."w'))) { printf("Unable to open output image\n"); exit(O); } start_model(cum_freq); ,* set up other modules */ startinputingbitsO; start_decodingO,past=0; /* The decoding process is very similar to the encoding process. The following for loop decodes the pixel values. An index symbol is decoded and tt-anslated back to an original symbol. The "past" value is set to "current," and the statistical model is updated. */ for (n=0;n<(w*h);-+n) { symbol=decode_symbol(cum_freq): ch=index_to_char[syTnbol]; current=ch+past-255: putc(current.out): past=current; update_model(symbol,cum_freq); } exit(O); /* loop through pixel values */ /* decode next symbol */ /* translate to a difference value*/ /* translate to a pixel value */ /* write out the pixel value */ /* update the model */ } arithmetic decode.c /* arithmetic decoding algorithm */ #include <stdio.h> #include "arithmetic codins.h" /* current state of the decoding */ static code_value value: static code_\ alue low, high; /* currently-seen code value *' /* ends of current code region */ /* The "startdecoding" fiinction fills "value" with the first sixteen bits which were ouput by the arithmetic coder. This value will be in the range of the first symbol encoded.*/ void start_decoding(void) { int i; int input_bit(void): value=0; for (i=l ;i<=Code_value_bits;i^^) { /* input bits to fill the code value * value = 2*value+input_bit(); } low=0: /* full code range */ high=Top_value; } /* decode the next sy mbol */ int decode_symbol(int cum_freq[]) { long range; /* size of current code region * 69 int cum: int symbol; int input_bit(void); range=(long)(high-low)-1; r* cumulative frequency calculated *' '* symbol decoded *' /* The following line translates "value" into a value in the range of the "cumfreq" array.* cum=(((long)(value-low)-l)*cum_freq[0]-l) range; /* Find cum freq for \alue * /* The symbol which corresponds to the cumulative frequency \ alue is found in the next line. * for (symbol=l ;cum_freq[sy mbol]>cum;symbol^-); /* then find sy mbol. */ /* With the symbol decoded, the range is narrowed and the MSB's are discarded similar to the coding process. Each time the range is doubled, the MSB of "value" is discarded and a least significant bit (LSB) is shifted into "value". The value of "value" will remain in the range of the next symbol to be decoded. The "inputbit" ftinction simply receives bits in the same order in which "outputbit' sent them to a file. The number of operations and execution time for the decoding process is comparable to the coding process. 
*/ high=low+(range*cum_freq[symbol-l])/cum_freq[0]-l: /* Narrow the code region */ Iow=low+(range*cum_freq[symbol])'cum_freq[0]; /* to that allotted to this for (;;) { /* Loop to get rid of bits */ /* symbol. */ if(high<Half) {} /* nothing */ /* Expand low half */ elseif(low>=Half) { /* Expand high half */ value -= Half; /* Subtract offset to top */ low -= Half; high - - Half; } else if (low>=First_qtt- && high<Third_qtr) {/* Expand middle half. */ value -= Firstqtr; /* Subtract offset to middle. */ low -= Firstqtr; high -= Firstqtr; } /* otherw ise exit loop */ else break; /* scale up code range */ low=2*low: high=2*high+l: /* move in next input bit */ value=2*value+input_bit(); } return s\mbol: } bitinput.c /* bit input routines */ -include <stdio.h> ^include "arithmeticcoding.h" extern FILE *in; /* the bit buffer */ static int buffer; static int b i t s t o g o ; /* Bits waiting to be input * /* Number of bits still in buffer */ 70 static int garbage_bits; /* initialize bit input * void start_inputing_bits(void) { bits_to_go = 0; garbagebits = 0; } * Number of bits past end-of-file */ /* Buffer starts out w ith no bits in it. */ * input a bit */ int input_bit(void) { intt; if (bits_to_go==0) { * Read the next byte if no bits are left in buffer. */ buffer=getc(in); if(buffer==EOF) { garbage_bits+=l; /* Return arbitrary bits after if (garbage_bits>Code_value_bits-2) {/* eof, but check for too */ fprintf(stderr,"Bad input file n"); /* many such. *' exit(O); } } bits_to_go=8; } /* Retum the next bit from the bottom of the byte. * t=buffer&l; buffer»=l; bits_to_go -= 1; return t; } bitoutput.c #include <stdio.h> extern FILE *out; /* the bit buffer */ static int buffer; static int b i t s t o g o ; /* initialize for bit output */ void start_outputing_bits(void) { buffer=0; bits_to_go=8; /* Bits buffered for output */ /* Number of bits free in buffer */ * buffer is empty to start with * } /* output a bit */ void output_bit(int bit) { buffer»=l: if(bit) buffer 1=0x80; b i t s t o g o -= 1; if(bits_to_go==0) { putc(buffer,out); /* put bit in top of buffer */ * Output buffer if it is now full */ 71 bits_to_go=8; } } * flust out the last bits */ void done_outputing_bits(void) { putc(buffer»bits_to_go,out); } adaptive_model.c #define Maxfrequency 16383 #defineNS511 static int freq[NS+l]; /* symbol frequencies */ extern int index_to_char[NS+l],char_to_index[NS]; /* The ranges are subdivided and narrowed using the "cumfreq" array which is closely related to the symbol probabilities. Each symbol corresponds to a unique array index. The value of each array element is the cumulation of the FOO of the symbols indexed ahead of the current symbol. Each element of the "freq" array with a symbol index contains the value of the FOO of the corresponding symbol. All of the FOO are initialized to 1. No symbol is indexed by the zero element, so this value can be set to 0. All of the symbols are indexed ahead of "cum_freq[0]," so this element contains the cumulative FOO. The values of "cum_freq" are accumulated in reverse so that the zero element of "cumfreq" can be used for normalization purposes. As each symbol is coded, "freq" and "cumfreq" are sorted with respect to FOO in descending order. The arrays are sorted so that the symbols can be decoded more quickly and efficiently. When the arrays are sorted, the indices corresponding to the symbols change. The program should keep track of the indexes with which the symbols correspond. Two arrays, "chartoindex" and "indextochar." provide this ftinction. 
This program was initially written for text compression, so the source symbols are sometimes reffered to as characters and abbreviated "char." These two arrays are initialized in logical ascending order. The two arrays remain invertible. In other words, if a character is translated to an index w ith "chartoindex," the same index can be used to translate back to the original character w ith "indextochar." The indices are coded with the arithmetic algorithm, and these two arrays are used to translate back and forth between the original and coded symbols. The original symbols must be represented by non-negative integers to provide valid indices for "chartoindex". */ /* initialize the model */ void start_model(int cum_freq[]) { int i; for (i=0;i<NS;i++) { /* Set up tables that translate between sy mbol */ char_toindex[i]=i+l; /* indexes and characters. * index_to_char[i+1 ]=i; } for (i=0;i<=NS;i++) { /* Set up initial initial frequency counts to be one */ freq[i]=l; /* for all symbols. */ cum_freq[i]=NS-i: } freq[0]=0; /* Freq[0] must not be the same as freq[l]. */ } /* The statistical model used by the arithmetic coder is basically a histogram of the symbol occurrences. This model is labeled adaptive and adapts to changing symbol probabilities because the histogram is 72 updated with each symbol encoded. After a symbol is coded, the arrays "chartoindex," "index to char.' "freq," and "cumfreq" are updated with the "updatemodel" fiinction. The frequency count for the symbol is incremented in the "freq" array. If the index for the symbol is changed by the sorting operation, the "char_to_index" and "index_to_char" arrays are updated. If the symbol frequency counts become to large, an underflow condition will occur. To prevent an underflow condition, the frequency counts are limited with the "Maxfrequency" value. If the cumulative frequency count reaches "Max frequency." all of the frequency counts are divided in half. */ /* update the model to account for a new symbol */ void update_model(int symbol,int cum_freq[]) { int i,cum; int c h i , chsymbol; if (cum_freq[0]==Max_frequency) { cum=0; /* See if frequency counts are at their maximum */ for (i=NS;i>=0;i-) {/* If so, halve all the counts (keeping them non-zero). freq[i]=(freq[i]+l)/2: cum_freq[i]=cum; cum+=freq[i]; } } for (i=symbol;freq[i]=freq[i-l];i-); /* Find symbol's new index.*/ if(i<symbol) { ch_i=index_to_char[i]; /* Update the transition tables if*/ ch_symbol=index_to_char[symbol]; /* symbol has moved. */ index_to_char[i]=ch_symbol; index_to_char[symbol]=ch_i; char_to_index[ch_i]=symbol; char_to_index[ch_symbol]=i; } freq[i]+=l; /* Increment the frequency count for the symbol and */ while (i>0) { /* update the cumulative frequencies. */ i-=i; cum_freq[i]+=l; } 73 */ APPENDIX C: EZW EXAMPLE [12] In this section, a simple example will be used to highlight the order of operations used in the EZW algorithm. Only the string of symbols will be showTi. The reader interested in the details of adaptive arithmetic coding is referred to Chapter III. Consider the simple 3-scale wavelet transform of an 8 x 8 image. The array of values is shown in Figure C.l. Since the largest coefficient magnitude is 63. we can choose our initial threshold to be anywhere in (31.5. 63]. Let TQ = 32. Table C.l shows the processing on the first dominant pass. The following comments refer to Table C.l: 1. The coefficient has magnitude 63 which is greater than the threshold 32. and is positive so a positive symbol is generated. 
During the first dominant pass, which used a threshold of 32, four significant coefficients were identified. These coefficients will be refined during the first subordinate pass. Prior to the first subordinate pass, the uncertainty interval for the magnitudes of all of the significant coefficients is the interval [32, 64). The first subordinate pass will refine these magnitudes and identify them as being either in the interval [32, 48), which will be encoded with the symbol "0," or in the interval [48, 64), which will be encoded with the symbol "1." Thus, the decision boundary is the magnitude 48. It is no coincidence that these symbols are exactly the first bit to the right of the MSB in the binary representation of the magnitudes. The order of operations in the first subordinate pass is illustrated in Table C.2.

The first entry has magnitude 63 and is placed in the upper interval, whose center is 56. The next entry has magnitude 34, which places it in the lower interval. The third entry, 49, is in the upper interval, and the fourth entry, 47, is in the lower interval. Note that in the case of 47, using the center of the uncertainty interval as the reconstruction value, when the reconstruction value is changed from 48 to 40 the reconstruction error actually increases from 1 to 7. Nevertheless, the uncertainty interval for this coefficient decreases from width 32 to width 16. At the conclusion of the processing of the entries on the subordinate list corresponding to the uncertainty interval [32, 64), these magnitudes are reordered for future subordinate passes in the order (63, 49, 34, 47). Note that 49 is moved ahead of 34 because, from the decoder's point of view, the reconstruction values 56 and 40 are distinguishable. However, the magnitude 34 remains ahead of the magnitude 47 because, as far as the decoder can tell, both have magnitude 40, and the initial order, which is based first on importance by scale, has 34 prior to 47.
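The refinement decision described above can be written compactly. The following sketch is not from the thesis code; it simply reproduces the four decisions of Table C.2 for the uncertainty interval [32, 64), with the interval passed in explicitly and the reconstruction value taken as the center of the chosen half.

/* A sketch (not from the thesis code) of one subordinate-pass refinement:
   the uncertainty interval [low, low + width) is split in half, one bit
   selects the half containing the magnitude, and the reconstruction value
   becomes the center of the chosen half. */
#include <stdio.h>

static int refine(int magnitude, int low, int width, int *reconstruction)
{
    int bit = (magnitude >= low + width/2);    /* 1 selects the upper half      */
    if (bit)
        low += width/2;
    *reconstruction = low + width/4;           /* center of the halved interval */
    return bit;
}

int main(void)
{
    /* First subordinate pass of the example: interval [32, 64), width 32. */
    int mags[4] = {63, 34, 49, 47};
    int i, rec;
    for (i = 0; i < 4; i++) {
        int bit = refine(mags[i], 32, 32, &rec);
        printf("magnitude %d -> symbol %d, reconstruction %d\n",
               mags[i], bit, rec);
    }
    return 0;                                  /* matches Table C.2             */
}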
The process continues on to the second dominant pass at the new threshold of 16. During this pass, only those coefficients not yet found to be significant are scanned. Additionally, those coefficients previously found to be significant are treated as zero for the purpose of determining if a zerotree exists. Thus, the second dominant pass consists of encoding the coefficient -31 in subband LH3 as negative significant and the coefficient 23 in subband HH3 as positive significant. The three coefficients in subband HL2 that have not been previously found to be significant (10, 14, -13) are each encoded as zerotree roots, as are all four coefficients in subband LH2 and all four coefficients in subband HH2. The second dominant pass terminates at this point since all other coefficients are predictably insignificant.

The subordinate list now contains, in order, the magnitudes (63, 49, 34, 47, 31, 23) which, prior to this subordinate pass, represent the three uncertainty intervals [48, 64), [32, 48), and [16, 32), each having equal width 16. The processing will refine each magnitude by creating two new uncertainty intervals for each of the three current uncertainty intervals. At the end of the second subordinate pass, the order of the magnitudes is (63, 49, 47, 34, 31, 23), since at this point the decoder could have identified 34 and 47 as being in different intervals. Using the center of the uncertainty interval as the reconstruction value, the decoder lists the magnitudes as (60, 52, 44, 36, 28, 20). The processing continues alternating between dominant and subordinate passes and can stop at any time.

 63  -34   49   10    7   13  -12    7
-31   23   14  -13    3    4    6   -1
 15   14    3  -12    5   -7    3    9
 -9   -7  -14    8    4   -2    3    2
 -5    9   -1   47    4    6   -2    2
  3    0   -3    2    3   -2    0    4
  2   -3    6   -4    3    6    3    6
  5   11    5    6    0    3   -4    4

Figure C.1 Example of 3-Scale DWT of an 8 x 8 Image

Table C.1 Processing of First Dominant Pass at T = 32

Comment   Subband   Coefficient Value   Symbol   Reconstruction Value
  (1)       LL3            63             PS               48
            HL3           -34             NS              -48
  (2)       LH3           -31             IZ                0
  (3)       HH3            23             ZTR               0
            HL2            49             PS               48
  (4)       HL2            10             ZTR               0
            HL2            14             ZTR               0
            HL2           -13             ZTR               0
            LH2            15             ZTR               0
  (5)       LH2            14             IZ                0
            LH2            -9             ZTR               0
            LH2            -7             ZTR               0
  (6)       HL1             7             Z                 0
            HL1            13             Z                 0
            HL1             3             Z                 0
            HL1             4             Z                 0
            LH1            -1             Z                 0
  (7)       LH1            47             PS               48
            LH1            -3             Z                 0
            LH1            -2             Z                 0

Table C.2 Processing of the First Subordinate Pass

Coefficient Magnitude   Symbol   Reconstruction Magnitude
        63                 1                56
        34                 0                40
        49                 1                56
        47                 0                40

PERMISSION TO COPY

In presenting this thesis in partial fulfillment of the requirements for a master's degree at Texas Tech University or Texas Tech University Health Sciences Center, I agree that the Library and my major department shall make it freely available for research purposes. Permission to copy this thesis for scholarly purposes may be granted by the Director of the Library or my major professor. It is understood that any copying or publication of this thesis for financial gain shall not be allowed without my further written permission and that any user may be liable for copyright infringement.

Agree (Permission is granted.)

Student's Signature                                   Date

Disagree (Permission is not granted.)

Student's Signature                                   Date