COLOR IMAGE COMPRESSION USING
WAVELET TRANSFORM
by
STEVEN CARL MEADOWS, B.S.E.E.
A THESIS
IN
ELECTRICAL ENGINEERING
Submitted to the Graduate Faculty
of Texas Tech University in
Partial Fulfillment of
the Requirements for
the Degree of
MASTER OF SCIENCE
IN
ELECTRICAL ENGINEERING
Approved
Accepted
Dean of the Graduate School
August, 1997
ACKNOWLEDGMENTS
My sincere appreciation goes to my graduate advisor Dr. Sunanda Mitra for all of
her help and encouragement during my research. I also would like to thank Dr. Krile and
Dr. Lakhani for serving on my graduate committee.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION TO IMAGE COMPRESSION
II. HUFFMAN CODING
III. ARITHMETIC CODING
IV. DISCRETE WAVELET TRANSFORM
V. EMBEDDED ZEROTREE WAVELET ALGORITHM
VI. RESULTS OF EZW COMPRESSION
VII. SUMMARY AND CONCLUSIONS
REFERENCES
APPENDIX
A. ANSI C IMPLEMENTATION OF HUFFMAN ENCODING AND DECODING
B. ANSI C IMPLEMENTATION OF ARITHMETIC ENCODING AND DECODING
C. EZW EXAMPLE
ABSTRACT

C language coding of image compression algorithms can be a difficult and tedious task. Image compression methods are usually composed of many stages of cascaded algorithms. Each algorithm may be developed independently. This thesis will address the problem of interfacing new image compression algorithms with older and established algorithms such as entropy coding and the discrete wavelet transform. The thesis will describe ANSI C coding procedures and functions involved in implementing two entropy coding algorithms, Huffman coding and arithmetic coding. Wavelet theory will be discussed as it applies to the discrete wavelet transform. The thesis will also describe an ANSI C coding implementation of one of the newest wavelet coefficient coding techniques, embedded zerotree wavelets (EZW), developed by Jerome Shapiro. The EZW compression performance will be compared with JPEG, the still-image compression standard currently adopted by the Joint Photographic Experts Group.
LIST OF TABLES

2.1 Assignment Procedure
3.1 Model of Set {a,e,i,o,u,!}
4.1 Initialized Filter Impulse Responses
6.1 Lena and Baboon Compressed Image Statistics
C.1 Processing of First Dominant Pass at T = 32
C.2 Processing of the First Subordinate Pass
LIST OF FIGURES

1.1 Communication System Model
1.2 Histogram of Pixel Values
1.3 Histogram of Pixel Differences
1.4 A Transform Coding System
2.1 Huffman Structure
3.1 Arithmetic Coding Process
3.2 Initialized "freq" and "cum_freq" Arrays
3.3 Initialized "char_to_index" and "index_to_char" Arrays
3.4 Updated Arrays
4.1 Discrete Wavelet Decomposition
4.2 Discrete Wavelet Reconstruction
5.1 Zerotree Structure
5.2 Position Coding Flowchart
6.1 Lena and Baboon Images Compressed with EZW and JPEG
6.2 EZW and JPEG Compression Plots
C.1 Example of 3-Scale DWT of an 8 x 8 Image
CHAPTER I
INTRODUCTION TO IMAGE COMPRESSION
Currently, there is a large proliferation of digital data. Multimedia is an evolving
method of presenting many types of information. Multimedia combines text, pictures,
sound, and animation in a digital format to relate an idea or story. In the future,
multimedia may be as readily available as newspapers and magazines (which combine
text and pictures in a printed format to relate information) are today. With multimedia as
well as with other types of digital data, there is a need to reduce the costs of storage and
transmission of the information. Reducing costs translates into reducing the amount of data
needed to represent the information. Data compression fills this role. Data compression
is the process of reducing the amount of data needed to represent information. Data
compression is often referred to by the type of data being compressed: image
compression compresses still images; video compression compresses animation
combined with sound; etc. Data is presented to a user in an uncompressed format and is
stored and transmitted in a compressed format. Therefore, data compression algorithms
need to perform two functions, compression and decompression. In Figure 1.1, the
encoder and decoder represent the compression and decompression processes,
respectively, and the channel represents a storage device or transmission process. In the
work presented in this thesis, the storage device or transmission process is assumed to be
lossless. In other words, the compressed data will not be corrupted by the channel. This
assumption is often realized in practice since most storage devices reliably preserve data
and error correction protocols are incorporated into most digital transmission processes.
This thesis focuses on the efficiency and performance of an encoder and decoder in an
image compression system.
Image compression is generally divided into two categories: lossless and lossy.
Lossless compression refers to compression without losing any image information. The
decoded pixel values will be the same as encoded pixel values. However, a lossy
compression system corrupts the pixel values so that the uncompressed or reconstructed
image is an approximation of the original image. Lossy compression has the advantage
of compressing an image to a much higher compression ratio (CR) than lossless
compression since a lossy compressed image contains less information than a lossless
compressed image. The CR is the amount of original data divided by the amount of compressed data (Equation 1.1),

CR = \frac{\text{amount of original data}}{\text{amount of compressed data}}.    (1.1)

The least amount of data needed to represent all of the image information is constrained by the amount of information contained in the values encoded. The amount of information contained in the values or symbols encoded can be quantitatively measured. The entropy (Equation 1.2),

H = -\sum_{j=1}^{J} P(a_j) \log_r P(a_j),    (1.2)

is the amount of information in r-ary units per symbol of a source, where J is the number of different symbols and P(a_j) is the probability of symbol a_j. (Probability is a measure of the likelihood of a symbol occurrence in a range from 0 to 1, with 1 being most likely.) When the base of the
logarithm is 2, the entropy is measured in binary units or bits per symbol. The entropy of a source is the theoretical limit of the least number of bits required to code each symbol on average [1]. All of the entropy calculations in this thesis are measured in bits per symbol. Losslessly coding a set of symbols is generally referred to as entropy coding.
There are many sets of symbols which can represent an image. The most obvious set of symbols to represent an image is the set of pixel values. The entropy of the individual pixel values is calculated using Equation 1.2, where the probabilities of the individual pixel values are obtained from their relative frequencies of occurrence. This entropy is called the first-order entropy estimate of an image. This entropy is labeled an image entropy estimate because if the pixel values are dependent on one another, the image entropy will be less than the individual pixel value entropy. When the entropy of pixel pairs is calculated, the dependency of the pixel values on their corresponding pairs is taken into account. This entropy is called the second-order entropy estimate. When
the pixels are grouped together into groups of three, the group entropy compensates for the pixel dependencies within each group, resulting in a third-order entropy estimate. The infinite-order entropy estimate becomes the actual image entropy. If individual pixel values are statistically independent, the first-order entropy estimate becomes the actual image entropy [1]. All of the entropy calculations in this thesis will be first-order entropy
estimates since the first-order estimate is the simplest to calculate. Variable length
coding techniques, such as Huffman coding, code symbols one at a time with each
symbol being represented by a variable number of bits. When pixel values are coded
with a variable length coding method, the first-order entropy estimate is the lower bound
on the pixel bit rate. However, when more than one pixel is coded at a time with a coding
method such as arithmetic coding, the average bits per pixel (BPP) can be lower than the
first-order entropy estimate [1]. Huffman and arithmetic coding techniques will be
discussed in detail in Chapters II and III, respectively.
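As a concrete illustration of Equation 1.2, the following ANSI C sketch computes the first-order entropy estimate of an 8-bit image from its histogram. The function name and the toy image are hypothetical, not part of the thesis code.

#include <stdio.h>
#include <math.h>

/* First-order entropy estimate (Equation 1.2 with a base-2 logarithm):
   probabilities are taken as relative frequencies of occurrence. */
double first_order_entropy(const unsigned char *pixels, long n)
{
    long hist[256] = { 0 };
    long i;
    double h = 0.0;

    for (i = 0; i < n; i++)
        hist[pixels[i]]++;                           /* histogram of pixel values */
    for (i = 0; i < 256; i++)
        if (hist[i] > 0) {
            double p = (double)hist[i] / (double)n;  /* P(a_j) */
            h -= p * log(p) / log(2.0);              /* -P(a_j) log2 P(a_j) */
        }
    return h;                                        /* bits per symbol (BPP) */
}

int main(void)
{
    unsigned char img[8] = { 10, 10, 10, 10, 20, 20, 30, 40 };
    printf("entropy = %.4f BPP\n", first_order_entropy(img, 8));
    return 0;
}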
If the pixel values are reversibly mapped to another set of values, the other set of
values may have a much lower entropy than the original pixel values. As stated
previously, many sets of values can represent an image. The problem in lossless image
compression is to find the set of values, which are reversibly mapped from the pixel
values, that has the lowest entropy. Predictive coding is a popular method to map the
pixel values to a lower entropy set. With predictive coding, each pixel value is predicted
from previous pixel values, and the difference between the original and predicted value is
coded [1]. As a demonstration of the advantage of predictive coding over pixel coding,
consider the standard Lena image. Lena contains 8 BPP resulting in 256 levels of gray
per pixel and has dimensions of 512 x 512 pixels. The probabilities of the pixel values
are their frequencies of occurrence (Figure 1.2) divided by the total number of pixels
which is 262144. Using Equation 1.2, the first-order entropy estimate is 7.4030 BPP. Reducing the original 8 BPP to 7.4030 BPP achieves a CR of only 1.0806. However, if each pixel value is predicted by the pixel immediately preceding, which is the pixel to the left when scanning the image from left to right and top to bottom, the first-order entropy estimate of the differences between the actual and estimated pixel values is 5.0221. This
value is less than the first-order estimate of the pixel values. This bit rate reduction
achieves a CR of 1.5930. Notice that the probabilities of small pixel differences (close to zero) are very high (Figure 1.3). These high probabilities indicate that the pixel values are slowly varying, and the image has many smooth features. The sharp spike in the histogram of the pixel differences is indicative of sets with low entropy. One can estimate the relative entropies of different sets of values based on their histograms. For example, if the histogram of pixel differences had a narrower and taller spike at zero, the entropy would be less. Similarly, if the histogram had a fatter and shorter spike at zero, the entropy would be more. With an improvement in CR from 1.08 to 1.59, predictive coding would store the Lena image with 78,031 fewer bytes than pixel coding.
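A minimal sketch of this previous-pixel prediction is given below. The treatment of the first pixel of each row (kept unpredicted) is an assumption, since the thesis does not state its boundary handling, and the function name is hypothetical.

#include <stdio.h>

/* Predict each pixel from the pixel to its left and record the difference.
   The first pixel in each row has no left neighbor and is stored as-is
   (an assumed boundary rule). */
void pixel_differences(const unsigned char *img, int rows, int cols, int *diff)
{
    int r, c;
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++) {
            long i = (long)r * cols + c;
            diff[i] = (c == 0) ? img[i] : (int)img[i] - (int)img[i - 1];
        }
}

int main(void)
{
    unsigned char row[6] = { 100, 101, 103, 103, 99, 98 };
    int diff[6];
    int i;
    pixel_differences(row, 1, 6, diff);
    for (i = 0; i < 6; i++)
        printf("%d ", diff[i]);   /* prints: 100 1 2 0 -4 -1 */
    putchar('\n');
    return 0;
}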
Lossy compression is used to compress images to much higher CR's than lossless
compression. As shown in the previous example, variable length coding of the pixel
differences could at best compress the image to a CR of 1.59. Even with more
complicated coding techniques, generally, the best CR for lossless coded images is about
2. However, with lossy compression, CR's of 10, 20, or 40 are common depending on
how much distortion is acceptable. The type of lossy coding discussed in this thesis is
transform coding. With transform coding, a linear transformation is performed on the
image so that more important information can be separated from less important
information. This less important information can then be discarded. In other words, the
transformation decorrelates the pixel data so that the most information can be packed into
the least number of transform coefficients [1]. Common transforms are the discrete
cosine transform (DCT) and the discrete wavelet transform (DWT). This thesis will
focus on the DWT.
Transform coding proceeds as follows: First, the forward transformation is
performed on the image (Figure 1.4). Next, the transform coefficients are quantized.
Quantization is a method of approximating the transform coefficients so that the most
important image information is retained. Information loss and distortion occur in the quantization stage. There are generally two types of quantization: scalar and vector.
Vector quantization occurs when values are quantized as groups. Scalar quantization
quantizes values individually. This thesis will focus on scalar quantization. One of the simplest forms of scalar quantization is threshold coding. With threshold coding, uniform ranges of transform coefficient values are placed into separate bins. The range of values which is assigned to each bin is determined by the value of the bin width. Equation 1.3 is the equation form of the quantization curve,

kc - \frac{c}{2} \le T < kc + \frac{c}{2},    (1.3)

where k is the value of the bin index, c is the value of the bin width, and T is the transform coefficient value [1]. Next, the symbols representing these approximated coefficients are losslessly coded. Entropy coding should be the final stage of all lossy compression algorithms. The compression gained from the entropy coding stage alone may be a CR of less than 1.5, but the entropy coding stage will add some compression. The decoding process performs the inverse of the encoding stages except for the quantization stage. The inverse transform is performed on the approximated transform coefficients which results in a distorted reconstructed image.
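Under the bin assignment of Equation 1.3, the bin index is simply the nearest integer multiple of the bin width. The sketch below assumes that reading; the exact inequality form of Equation 1.3 is a reconstruction, and the function names are hypothetical.

#include <stdio.h>
#include <math.h>

/* Threshold (uniform scalar) quantization, Equation 1.3:
   coefficient T falls in bin k when kc - c/2 <= T < kc + c/2. */
int quantize(double T, double c)
{
    return (int)floor(T / c + 0.5);   /* nearest multiple of the bin width */
}

double dequantize(int k, double c)
{
    return k * c;                     /* reconstruct the bin center */
}

int main(void)
{
    double c = 8.0, T = 21.7;
    int k = quantize(T, c);
    printf("T = %.1f -> bin %d -> approximation %.1f\n", T, k, dequantize(k, c));
    return 0;
}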
Transform coding contains many different stages. Each stage may be developed
independently. The quantizer has been the focus of the most intense research. When a
new quantizer is developed, it must be interfaced with the other stages. In order to
interface a quantizer with an entropy coder, both processes must be well understood.
With an ANSI C code implementation, coding a set of quantizer symbols can be difficult
and tedious. Chapters II and III will demonstrate how to interface two common entropy
coders, Huffman coding and arithmetic coding, with a symbol generating algorithm such as a quantizer. Chapter IV will discuss the development and implementation of the
DWT. Chapter V will discuss an implementation of the new scalar quantizer of wavelet
coefficients, EZW. Chapter VI will demonstrate compression results of EZW and
compare them with the compression results of the lossy compression algorithm JPEG for
24-bit color images and a gray-scale image.
[Figure 1.1 Communication System Model: information source -> encoder -> channel -> decoder -> information user]

[Figure 1.2 Histogram of Pixel Values: number of occurrences versus pixel value (0 to 255)]

[Figure 1.3 Histogram of Pixel Differences: number of occurrences versus pixel difference (-200 to 200)]

[Figure 1.4 A Transform Coding System: input image -> forward transform -> quantizer -> symbol encoder -> compressed image; compressed image -> symbol decoder -> inverse transform -> decompressed image]
CHAPTER II
HUFFMAN CODING
Huffman coding [2] has become one of the most used methods in coding sets of
symbols. As previously mentioned, Huffman coding is a type of variable length entropy coding where each symbol corresponds to a unique binary string of varying length.
Huffman coding is uniquely decodeable. In other words, when the symbols are encoded
by concatenating the binary strings, this concatenated binary string can be decoded
uniquely when read sequentially in the same order that it was written. No special
symbols are required to delimit the binary strings. Extra storage is required to store the
codebook which equates each unique symbol with its corresponding binary string.
However, the amount of data in the codebook is usually insignificant compared to the
amount of data required to code the source of symbols. Therefore, when the codebook is
concatenated with the string of encoded symbols, the overall file size is not increased
significantly.
In order to create a codebook and assign binary strings to symbols, the
probabilities of the symbols must be known or estimated. The symbols with higher
probabilities have shorter binary strings since they occur more often. The symbols that
occur less often have longer binary strings. The symbol probabilities can be established
by creating a histogram of the symbols from the source. Since the Huffman coding
algorithm only requires knowledge of the relative probabilities (how the probabilities
compare with one another), the frequencies of occurrence (FOO) can be used directly by
the algorithm. FOO are directly proportional to the symbol probabilities.
First, a simple example to demonstrate the Huffman algorithm will be discussed.
Next, an ANSI C code implementation of the algorithm will be described. Suppose an
image with 100 pixels is coded with the one-dimensional predictive coding technique
described in Chapter I. Suppose also that only 6 unique symbols resulted from the coding
(Table 2.1) [1]. The FOO sum to 100 since there are 100 pixels. First, the symbols are
sorted with respect to their FOO. Next, the two symbols with the least probabilities are
combined into one symbol whose probability is the sum of the probabilities of the two
symbols. Next, the symbols are sorted again and the process is repeated until only two
symbols are left. Each iteration is called a source reduction because the number of
symbols is reduced by one. Each source reduction column contains the FOO and the
binary strings (Table 2.1). After the fourth source reduction, the two symbols which are
left are given the codes 0 and 1. One of these two codes is handed down to the two
symbols which were combined in the fourth source reduction. The symbol with the FOO
of 60 was created from the two symbols with FOO of 30 in the fourth source reduction.
These two symbols are distinguished by concatenating a 0 or 1 to the code 0 which was
handed down. The symbol with the FOO of 40 is an original symbol which was not created by the algorithm. Therefore, the code for symbol 0 is set to 1. Next, the symbols
which were combined in the third source reduction are distinguished by concatenating a 0
or 1 to the handed down string. This process of assigning binary strings to symbols is
continued until all of the original symbols have been assigned codes. As predicted, the
symbols with the highest FOO have short codes and the symbols with the least FOO have
longer codes. A symbol string such as (-1, 0, 3, 1) can now be coded in binary as
00101010011. When this binary string is read from left to right, the symbols can be
decoded uniquely.
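The source reduction procedure can be sketched in ANSI C as below. This is a hypothetical illustration built around the six symbols of Table 2.1, not the thesis program; note that different tie-breaking choices during the reductions can produce a different (but equally optimal) set of codes than Table 2.1, with the same average code length.

#include <stdio.h>
#include <stdlib.h>

struct node {
    int symbol;            /* source symbol (unused in combined nodes) */
    int foo;               /* frequency of occurrence */
    struct node *zero;     /* child reached by appending a 0 */
    struct node *one;      /* child reached by appending a 1 */
};

static struct node *make(int symbol, int foo, struct node *z, struct node *o)
{
    struct node *n = malloc(sizeof *n);
    n->symbol = symbol; n->foo = foo; n->zero = z; n->one = o;
    return n;
}

/* Recursively print the code assigned to every original symbol. */
static void walk(const struct node *n, char *prefix, int depth)
{
    if (n->zero == NULL) {             /* leaf: an original symbol */
        prefix[depth] = '\0';
        printf("symbol %2d  FOO %2d  code %s\n", n->symbol, n->foo, prefix);
        return;
    }
    prefix[depth] = '0'; walk(n->zero, prefix, depth + 1);
    prefix[depth] = '1'; walk(n->one, prefix, depth + 1);
}

int main(void)
{
    int symbols[6] = { 0, -1, 1, 2, 3, -2 };
    int foo[6]     = { 40, 30, 10, 10, 6, 4 };
    struct node *live[6];
    struct node *parent;
    char prefix[16];
    int n = 6, i, a, b;

    for (i = 0; i < 6; i++)
        live[i] = make(symbols[i], foo[i], NULL, NULL);

    /* Each source reduction combines the two least frequent symbols
       into one symbol whose FOO is the sum of the two. */
    while (n > 1) {
        a = 0; b = 1;
        if (live[b]->foo < live[a]->foo) { a = 1; b = 0; }
        for (i = 2; i < n; i++) {
            if (live[i]->foo < live[a]->foo)      { b = a; a = i; }
            else if (live[i]->foo < live[b]->foo) { b = i; }
        }
        parent = make(0, live[a]->foo + live[b]->foo, live[a], live[b]);
        live[a] = parent;          /* replace one child with the new symbol */
        live[b] = live[--n];       /* drop the other from the active list */
    }

    walk(live[0], prefix, 0);      /* hand codes down through the reductions */
    return 0;
}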
The ANSI C code implementation of Huffman coding uses linked lists of
structures to create the binary codes and integer arrays to encode the binary strings. One
should be familiar with the ANSI C language before reading further. [3] is a good
reference for this programming language. The Huffman structure (Figure 2.1) consists of
the symbol, FOO, code length, code number, and two structure pointers. Each unique
symbol will have its own structure. The symbols which are being encoded must be
translated in some way into integer values in "symbol". The "occurrence" value is the FOO of the symbol. The "code_length" and "code_number" values determine the binary string equated to the symbol. "code_number" is the numerical value of the binary string. If the binary string has leading zeros, the length of the string may not be apparent from the binary value. Therefore, "code_length" determines the length of the string. Together, "code_length" and "code_number" uniquely determine any binary string.
The "next" pointer links the structures together in a sorted list. The "child" pointer links
the source reduced symbol with one of the two symbols which were combined to form
the source reduced symbol. For example, in the first source reduction column in Table
2.1, the symbols are sorted with respect to the FOO. If "huffman" structures represented
the symbols, the list would be linked together with "next" pointers. The "3" and "-2"
symbols were combined to form the last symbol in the list. The program must be able to retrace the steps of source reduction, so the "child" pointer in the last symbol would point
to the "3" symbol in the previous list. The "next" pointer in the "3" symbol would point
to the "-2" symbol because of the order of the list. Therefore, by following the "child"
and "next" pointers, the program can find the two symbols which were combined to form
a source reduced symbol.
The Huffman program was tested with the 8-bit gray scale standard Lena image.
The program reduced the file size from 262,144 bytes to 167,948 bytes for a CR of 1.56 and a bit rate of 5.13 BPP. The cumulative squared difference between the original and reconstructed pixel values was 0 since the coding was lossless. This compressed bit rate is close to the entropy bit rate calculated above for the same data. Huffman coding resulted in 5.13 BPP while the first-order entropy estimate was calculated at 5.02 BPP.
Since no other variable length coding technique can produce a lower bit rate than
Huffman coding, Huffman coding is considered optimal.
Table 2.1 Huffman Code Assignment Procedure [1]

         Original source                 Source reduction
Symbol    FOO    Code        1           2           3           4
  0       40     1          40  1       40  1       40  1       60  0
 -1       30     00         30  00      30  00      30  00      40  1
  1       10     011        10  011     20  010     30  01
  2       10     0100       10  0100    10  011
  3        6     01010      10  0101
 -2        4     01011
struct huffman {
    short int symbol;         /* integer value of the source symbol */
    int occurrence;           /* frequency of occurrence (FOO) */
    short int code_length;    /* length of the binary code string */
    int code_number;          /* numerical value of the binary code string */
    struct huffman *next;     /* next structure in the sorted list */
    struct huffman *child;    /* one of the two symbols combined in a source reduction */
};
Figure 2.1 Huffman Structure
CHAPTER III
ARITHMETIC CODING
Arithmetic coding is a relatively new lossless symbol coding technique.
Arithmetic coding can code more than one symbol with a single code word, thereby allowing arithmetic coding to achieve a lower bit rate than any variable length coding
technique. For many years, Huffman coding was considered the best symbol coding
technique. Now, arithmetic coding is able to compress strings of symbols better than
Huffman coding. The arithmetic coding algorithm is better suited to using adaptive
statistical models. In other words, arithmetic coding can adapt to changing symbol
probabilities from a source. With an adaptive statistical model, the symbol probabilities
are determined while the symbols are being coded instead of being determined
beforehand as with the Huffman algorithm. Arithmetic coding is also more
computationally efficient than Huffman coding. Huffman decoding can be
computationally expensive since, with each bit read from a compressed file, the decoder
must scan through a look-up table containing the symbol codes. However, with the arithmetic compression program described in this thesis, coding and decoding are performed through integer multiplication and division, which is very fast on modern
computers. Also with arithmetic coding, symbols from different sources can easily be
encoded mixed together without loss of compression efficiency. However, the arithmetic
decoder must be aware of the order of the mixing in order to unmix the sources. This
technique of mixing and unmixing sources is used extensively in the EZW algorithm
described in Chapter V.
Arithmetic coding is more complicated and difficult to implement than Huffman
coding. The implementation described in this chapter and used in the EZW program was
taken from [4]. A standardized, patent-free, off-the-shelf arithmetic software package is
currently unavailable. However, IBM has developed a patented arithmetic coding
implementation called a Q-coder, and the Joint Photographic Experts Group (JPEG) has
agreed on a binary arithmetic coding technique for lossless image compression.
Arithmetic coding codes symbols by transmitting a value in a specified range. This range is dependent on the symbols encoded and the probabilities used to model the symbols. Consider a set of symbols with a fixed statistical model comprised of the five vowels and an exclamation point: {a,e,i,o,u,!}. Each symbol is assigned a subinterval in the interval [0,1) based on its corresponding probability (Table 3.1). The symbol "e" can
be represented by any value in the range [0.2,0.5) such as 0.24. Any value in the range
[0,1) will represent a unique symbol. When more than one symbol is coded, the range
representing the symbols must be narrowed with each consecutive symbol (Figure 3.1).
Consider the code word (e,a,i,i,!). After "e" is selected for coding, the range narrows
from [0,1) to [0.2, 0.5). This range is subdivided in the same manner as the initial range.
The symbol subintervals will of course become smaller than the initial subintervals. The
range is narrowed again when "a" is selected for coding. The subinterval for a symbol
again becomes an entire range which is subdivided (Figure 3.1). This process of
narrowing the range of an encoded value for each symbol selected for coding continues
until the terminator symbol, "!," is reached. The final range of the code word (e,a,i,i,!) is
[0.23354,0.2336). The code word can be uniquely decoded with any value in this range.
The code word can be coded using 0.23355 or 0.23355349876. The second choice will
require more space to store, so the first choice is preferable. As the length of the code word increases, the range becomes narrower, and more digits will be required to store the encoded value. Since more than one symbol is encoded with a single value,
arithmetic coding is able to achieve a bit rate less than the first-order entropy estimate of
an image as described in Chapter I.
Decoding the code word proceeds similarly to the encoding process. The initial
range, [0,1), and the symbol probabilities are known beforehand. The initial range is
subdivided according to the symbol probabilities. The value of 0.23355 is observed to lie
in the range [0.2, 0.5), so the first decoded symbol is "e." This range is subdivided in the
same manner as the initial range (Figure 3.1). The value of 0.23355 is observed to lie in
the range [0.2,0.26), so "a" is the next decoded symbol. The decoder knows to stop
subdividing the ranges when the terminating symbol, "!," is decoded. As a result, the
decoder obtains the code word (e,a,i,i,!).
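The narrowing process of Figure 3.1 can be sketched directly with floating-point arithmetic. The following hypothetical C program encodes and decodes the code word (e,a,i,i,!) with the fixed model of Table 3.1 and arrives at the final range [0.23354, 0.2336) described above; the real implementation in [4] uses the integer ranges described next.

#include <stdio.h>
#include <string.h>

/* Fixed model of Table 3.1: symbol s occupies [cum[s], cum[s+1]). */
static const char *syms = "aeiou!";
static const double cum[7] = { 0.0, 0.2, 0.5, 0.6, 0.8, 0.9, 1.0 };

int main(void)
{
    const char *word = "eaii!";
    double low = 0.0, high = 1.0, range, value;
    size_t i;
    int s;

    /* Encoding: narrow [low, high) once per symbol. */
    for (i = 0; i < strlen(word); i++) {
        s = (int)(strchr(syms, word[i]) - syms);
        range = high - low;
        high = low + range * cum[s + 1];
        low  = low + range * cum[s];
        printf("after '%c': [%g, %g)\n", word[i], low, high);
    }
    value = (low + high) / 2.0;   /* any value inside the final range works */

    /* Decoding: find the subinterval containing the value, then rescale. */
    low = 0.0; high = 1.0;
    do {
        range = high - low;
        s = 0;
        while (value >= low + range * cum[s + 1])
            s++;
        printf("decoded '%c'\n", syms[s]);
        high = low + range * cum[s + 1];
        low  = low + range * cum[s];
    } while (syms[s] != '!');     /* '!' terminates the code word */
    return 0;
}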
The ANSI C code implementation of arithmetic coding uses binary ranges and subdivisions rather than base 10 as in the above example. The binary implementation uses one main range of [0, 2^16) for 16-bit values. When this range is subdivided and narrowed by selecting a symbol to encode, a few most significant bits of the narrowed range can be determined. For example, if the narrowed range lies in the lower half of [0, 2^16), any 16-bit value in this narrowed range will have a MSB of 0. This 0 bit can be sent to the compressed output file. The high and low boundaries of the range which have a common 0 MSB can be left shifted one bit, which doubles their values and thus doubles the range. There are other means to determine the most significant bits of a range of values. However, as each MSB is determined, the range will effectively double as in the above example.
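A simplified sketch of this bit-emission step is shown below. It covers only the two clear-cut cases (both bounds in the lower or in the upper half of [0, 2^16)) and omits the underflow case that the full implementation in [4] also handles; the sample range in main is hypothetical.

#include <stdio.h>

#define CODE_BITS 16
#define HALF (1L << (CODE_BITS - 1))

static void output_bit(int bit)
{
    putchar(bit ? '1' : '0');       /* stand-in for the real bit-packing code */
}

/* Emit every MSB that low and high already agree on; each shift doubles
   the remaining range, as described in the text. */
static void renormalize(long *low, long *high)
{
    for (;;) {
        if (*high < HALF) {
            output_bit(0);          /* range in lower half: MSB is 0 */
        } else if (*low >= HALF) {
            output_bit(1);          /* range in upper half: MSB is 1 */
            *low -= HALF;
            *high -= HALF;
        } else {
            break;                  /* bounds straddle the midpoint */
        }
        *low = 2 * *low;            /* left shift one bit */
        *high = 2 * *high + 1;
    }
}

int main(void)
{
    long low = 0x1800, high = 0x47FF;   /* a hypothetical narrowed range */
    renormalize(&low, &high);
    printf("\nremaining range: [0x%lX, 0x%lX]\n", low, high);
    return 0;
}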
The ranges are subdivided and narrowed using the "cum_freq" array. With the above base 10 example, the ranges were subdivided using the probabilities of the symbols. The "cum_freq" array is closely related to the symbol probabilities. Each symbol corresponds to a unique array index. The value of each array element is the sum of the FOO of the symbols indexed ahead of the current symbol. For example, consider Figure 3.2. The "freq" and "cum_freq" arrays have been initialized for a set of four symbols indexed with indices 1-4. This example will be part of a running example used in this chapter. Each element of the "freq" array with a symbol index contains the value of the FOO of the corresponding symbol. All of the FOO are initialized to 1. No symbol is indexed by the zero element, so this value can be set to 0. The value of "cum_freq[4]" is 0 since there are no symbols indexed ahead of "cum_freq[4]." All of the symbols are indexed ahead of "cum_freq[0]," so this element contains the cumulative FOO. The values of "cum_freq" are accumulated in reverse so that the zero element of "cum_freq" can be used for normalization purposes.
As each symbol is coded, "freq" and "cum_freq" are sorted with respect to FOO in descending order. The arrays are sorted so that the symbols can be decoded more quickly
and efficiently. When the arrays are sorted, the indices corresponding to the symbols change. The program should keep track of the indices with which the symbols correspond. Two arrays, "char_to_index" and "index_to_char," provide this function (Figure 3.3). This program was initially written for text compression, so the source symbols are sometimes referred to as characters and abbreviated "char." These two arrays are initialized in logical ascending order. The two arrays remain invertible. In other words, if a character is translated to an index with "char_to_index," the same index can be used to translate back to the original character with "index_to_char." The indices are coded with the arithmetic algorithm, and these two arrays are used to translate back and forth between the original and coded symbols. The original symbols must be represented by non-negative integers to provide valid indices for "char_to_index" (Figure 3.3).
The statistical model used by the arithmetic coder is basically a histogram of the
symbol occurrences. This model is labeled adaptive and adapts to changing symbol
probabilities because the histogram is updated with each symbol encoded. [4] provides
an example of a non-adaptive model with a static symbol histogram. The adaptive model
is adequate for most purposes. To complete the four symbol example, suppose the
character 2 is encoded (Figure 3.4). The first symbol index of "freq" was increased to a
frequency count of 2 so that "freq" remains in sorted order. The other three arrays were
adjusted accordingly. This small example should clarify the statistical model update
process.
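A sketch of this model update, patterned after the implementation in [4], is shown below. Run on the initialized arrays of Figures 3.2 and 3.3 with the character 2, it reproduces the updated arrays of Figure 3.4; treat it as an illustration, not the thesis code verbatim.

#include <stdio.h>

#define NO_OF_SYMBOLS 4

int freq[NO_OF_SYMBOLS + 1]          = { 0, 1, 1, 1, 1 };   /* Figure 3.2 */
int cum_freq[NO_OF_SYMBOLS + 1]      = { 4, 3, 2, 1, 0 };
int char_to_index[NO_OF_SYMBOLS]     = { 1, 2, 3, 4 };      /* Figure 3.3 */
int index_to_char[NO_OF_SYMBOLS + 1] = { 0, 0, 1, 2, 3 };

/* Increment the FOO of the coded symbol while keeping "freq" sorted
   in descending order. */
void update_model(int symbol)         /* symbol = index into freq[] */
{
    int i, ch_i, ch_symbol;

    for (i = symbol; freq[i] == freq[i - 1]; i--)
        ;                             /* highest slot with the same FOO */
    if (i < symbol) {                 /* swap the symbol translations */
        ch_i = index_to_char[i];
        ch_symbol = index_to_char[symbol];
        index_to_char[i] = ch_symbol;
        index_to_char[symbol] = ch_i;
        char_to_index[ch_i] = symbol;
        char_to_index[ch_symbol] = i;
    }
    freq[i] += 1;                     /* update the histogram */
    while (i > 0) {                   /* cumulative counts above i grow by 1 */
        i -= 1;
        cum_freq[i] += 1;
    }
}

int main(void)
{
    int i;
    update_model(char_to_index[2]);   /* encode the character 2 */
    for (i = 0; i <= NO_OF_SYMBOLS; i++)
        printf("freq[%d]=%d  cum_freq[%d]=%d\n", i, freq[i], i, cum_freq[i]);
    return 0;
}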
The arithmetic program was tested with the 8-bit gray scale standard Lena image. The program reduced the file size from 262,144 bytes to 165,165 bytes for a CR of 1.59 and a bit rate of 5.04 BPP. The cumulative squared difference between the original and reconstructed pixel values was 0 since the coding was lossless. This compressed bit rate is very close to the entropy bit rate and less than the Huffman coded bit rate calculated above for the same data. Huffman coding resulted in 5.13 BPP while the first-order entropy estimate was calculated at 5.02 BPP. Executing on a SPARC20 Sun machine, the arithmetic encoding and decoding programs required only about five seconds each to
code and decode the Lena image. The Huffman decoder alone required about one and a
half minutes to decode the Lena image. As a result, arithmetic coding can achieve lower bit rates and more efficient execution than Huffman coding.
Table 3.1 Model of Set {a,e,i,o,u,!} [4]

Symbol   Probability   Initial Subinterval
  a         0.2           [0, 0.2)
  e         0.3           [0.2, 0.5)
  i         0.1           [0.5, 0.6)
  o         0.2           [0.6, 0.8)
  u         0.1           [0.8, 0.9)
  !         0.1           [0.9, 1.0)
[Figure 3.1 Arithmetic Coding Process [4]: after seeing e, a, i, i, !, the range narrows from [0,1) to [0.2,0.5), [0.2,0.26), [0.23,0.236), [0.233,0.2336), and finally [0.23354,0.2336); at each step the current range is subdivided among a, e, i, o, u, ! in proportion to their probabilities.]
freq[0]=0    cum_freq[0]=4
freq[1]=1    cum_freq[1]=3
freq[2]=1    cum_freq[2]=2
freq[3]=1    cum_freq[3]=1
freq[4]=1    cum_freq[4]=0

Figure 3.2 Initialized "freq" and "cum_freq" Arrays
char_to_index[0]=1    index_to_char[1]=0
char_to_index[1]=2    index_to_char[2]=1
char_to_index[2]=3    index_to_char[3]=2
char_to_index[3]=4    index_to_char[4]=3

Figure 3.3 Initialized "char_to_index" and "index_to_char" Arrays
freq[0]=0    cum_freq[0]=5
freq[1]=2    cum_freq[1]=3    char_to_index[0]=3    index_to_char[1]=2
freq[2]=1    cum_freq[2]=2    char_to_index[1]=2    index_to_char[2]=1
freq[3]=1    cum_freq[3]=1    char_to_index[2]=1    index_to_char[3]=0
freq[4]=1    cum_freq[4]=0    char_to_index[3]=4    index_to_char[4]=3

Figure 3.4 Updated Arrays
CHAPTER IV
DISCRETE WAVELET TRANSFORM
The continuous-time wavelet transform (CTWT) was developed as an improvement over the familiar continuous-time Fourier transform (CTFT). The CTFT is used to extract the frequency content from continuous signals (Equation 4.1),

F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt.    (4.1)

Some disadvantages of the CTFT are as follows: The CTFT requires knowledge of the entire signal from -\infty to +\infty when only part of the signal may be known. Also, when the frequency characteristics of the signal change with time, the CTFT is unable to identify the time dependent signal frequencies. How can one represent signal frequencies with their dependence on time? Musical scores are written with this idea in mind. The musical notes on a page of sheet music represent sound frequencies at specific time intervals.
The short-time Fourier transform (STFT) was developed to extract signal frequencies within certain time intervals (Equation 4.2), as determined by a window function, w(t-b) [5],

STFT_f(\omega, b) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, w(t-b)\, dt.    (4.2)

The transformed signal is now time and frequency dependent. The basis functions for the STFT become windowed complex exponentials. The window function is often chosen as Gaussian so that the inverse transform will also use a Gaussian window function. The STFT is limited in the range of frequencies it can analyze because the window size remains fixed. The window function will extract only part of a cycle of a low frequency and large numbers of cycles of a high frequency. The window size should vary with frequency so that it can zoom out to measure low frequencies and zoom in to measure high frequencies.
A basis function for a transform which measures the full range of frequency content of a signal at a time instant would need to have compact support (a limited interval of non-zero values) in the time domain and frequency domain. This basis function would be time translatable (shift to measure the signal at different times) and frequency scaleable (expand and contract to measure different signal frequencies). The wavelet functions,

\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right), \quad a \in R,\ b \in R,    (4.3)

were developed as these basis functions for the CTWT,

CTWT_f(a,b) = \int_{-\infty}^{\infty} \psi_{a,b}(t)\, f(t)\, dt  [6].    (4.4)

The scaling value, a, is analogous to frequency and the translation value, b, is analogous to time. Before wavelets were developed, these basis functions were thought not to exist. CTWT_f(a,b) can be interpreted loosely as the "content" of f(t) near frequency a and time b [7]. The original function, f(t), can be reconstructed from CTWT_f(a,b) if \psi(t) satisfies the admissibility condition,

C_\psi = \int_0^{\infty} \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty.    (4.5)

\Psi(\omega) denotes the Fourier transform of \psi(t). The reconstruction formula becomes

f(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty}\!\int_0^{\infty} CTWT_f(a,b)\, \psi_{a,b}(t)\, \frac{da\, db}{a^2}  [6].    (4.6)

The two-dimensional (2D) discrete wavelet transform (DWT) which is performed on digital images was developed as an extension of the CTWT. The 2D DWT provides a compact, uncorrelated, multiresolution representation of digital images. This chapter will describe the development of the 2D DWT. The wavelet transform terminology and descriptions will remain analogous to the familiar Fourier transform terminology for the sake of clarity.
In order to develop the DWT, the wavelet functions must first be discretized into a continuous-time wavelet series (CTWS). The wavelet functions in Equation 4.3 are modified with the following relations:

a = a_0^m; \quad b = n b_0 a_0^m; \quad m, n \in Z; \quad a_0 > 1; \quad b_0 \ne 0.    (4.7)

Z is the set of integers. The values of a_0 and b_0 are arbitrarily set to 2 and 1, respectively. As m gets smaller, the scaling factor, a, increases, which expands the wavelet function. The translation step size, b_0 a_0^m, also increases so that the step size is scaled the same as the scaling factor. With this discretization, low frequency and high frequency signals can be comparably analyzed. The wavelet functions in Equation 4.3 become

\psi_{m,n}(t) = \sqrt{2^m}\, \psi(2^m t - n); \quad m, n \in Z.    (4.8)

These wavelet functions form an orthonormal basis of L^2(R) (the function space of square integrable functions). The series coefficients of the CTWS are

(CTWS_f)_{m,n} = d_{m,n} = \int_{-\infty}^{\infty} \sqrt{2^m}\, \psi(2^m t - n)\, f(t)\, dt, \quad f(t) \in L^2(R).    (4.9)

If there exist numbers A > 0 and B < \infty such that

A\|f\|^2 \le \sum_{m \in Z} \sum_{n \in Z} |d_{m,n}|^2 \le B\|f\|^2, \quad \forall f(t) \in L^2(R),    (4.10)

where

\|f\|^2 = \int_{-\infty}^{\infty} |f(t)|^2\, dt,    (4.11)

f(t) can be reconstructed [8] with the inverse of the CTWS,

f(t) = \sum_{m \in Z} \sum_{n \in Z} d_{m,n}\, \sqrt{2^m}\, \psi(2^m t - n).    (4.12)
Multiresolution analysis derives from the ability to reconstruct f(t) partially but not completely. When f(t) is represented in different resolutions, the constructed signal contains different frequency bands of the original f(t). A low resolution representation of f(t) is constructed with a scaling function, \phi(t). This representation of f(t) contains all frequency bands up to a certain cut off frequency, which is similar to a low pass filtering of f(t). A low resolution vector space, V_j \subset L^2(R), is created to contain all functions within resolution j, j \in Z. As j increases, the low resolution vector space V_j becomes more detailed. In other words, as j increases, V_j contains functions with higher frequency content. As a result, concentric subspaces are formed with subspace V_j containing all subspaces V_i such that i < j, i.e.,

\forall j \in Z, \quad V_j \subset V_{j+1},    (4.13)

\lim_{j \to \infty} V_j = \bigcup_{j \in Z} V_j \ \text{is dense in}\ L^2(R), \quad \lim_{j \to -\infty} V_j = \bigcap_{j \in Z} V_j = \{0\}  [9].

Also, these V_j have the property that for each function f(t) \in V_j, a contracted version is contained in the subspace V_{j+1},

\forall j \in Z, \quad f(t) \in V_j \iff f(2t) \in V_{j+1}  [6].    (4.14)

A unique scaling function, \phi(t) \in V_0, is created whose translations, \phi(t - n), n \in Z, form an orthonormal basis for V_0. As a consequence of Equation 4.14, an orthonormal basis for subspace V_j becomes

\phi_{j,n}(t) = \sqrt{2^j}\, \phi(2^j t - n), \quad n \in Z.    (4.15)

Let A_j be an operator on f(t) \in L^2(R) which creates an orthogonal projection of f(t) onto subspace V_j,

A_j f(t) = \sum_{n=-\infty}^{\infty} \langle f(u), \phi_{j,n}(u)\rangle\, \phi_{j,n}(t),    (4.16)

where

\langle f(u), \phi_{j,n}(u)\rangle = \int_{-\infty}^{\infty} f(u)\, \phi_{j,n}(u)\, du.    (4.17)

This projection is similar to projecting a three-dimensional vector onto a two-dimensional plane. The projected two-dimensional vector becomes the nearest representation in the two-dimensional plane of the three-dimensional vector. Similarly, A_j f(t) is the closest representation of f(t) in subspace V_j,

\forall g(t) \in V_j, \quad \|g(t) - f(t)\| \ge \|A_j f(t) - f(t)\|.    (4.18)

A_j f(t) is similar to a lowpass filtered version of f(t), and as j increases, A_j f(t) becomes more similar to the original f(t).
An important aspect of the A_j f(t) operation is how the series coefficients in
Equation 4.16 relate to f(t). Let A_j^d represent an operator on f(t) which forms the discrete inner products in Equation 4.16,

A_j^d f(n) = \langle f(t), \phi_{j,n}(t)\rangle = \sqrt{2^j}\, \langle f(t), \phi(2^j(t - 2^{-j}n))\rangle, \quad n \in Z.    (4.19)

With the convolution operation defined as

f * g(x) = (f(t) * g(t))(x) = \int_{-\infty}^{\infty} f(t)\, g(x - t)\, dt,    (4.20)

Equation 4.19 can be rewritten as

A_j^d f(n) = \sqrt{2^j} \int_{-\infty}^{\infty} f(t)\, \phi(2^j(t - 2^{-j}n))\, dt = \left(f(t) * \sqrt{2^j}\, \phi(-2^j t)\right)(2^{-j}n), \quad n \in Z.    (4.21)

A_j^d f(n) can be thought of as a convolution of f(t) with a flipped and dilated version of \phi(t), uniformly sampled at intervals of 2^{-j}n. As the scaling function becomes more contracted, which corresponds to higher frequencies, the sampling rate decreases. Figure 1 in [9] graphically demonstrates the low pass filter characteristics of a sample scaling function. As a result, Equation 4.21 represents a discretized low pass filtering of f(t) which corresponds to a discrete approximation of f(t).
Since computers can only work with discrete signals, multiresolution analysis is done by computing the discrete approximations of f(t), A_j^d f(n), at many resolutions. The first discretization of f(t), which contains the most information, is set to A_0^d f(n) (resolution 0) for normalization purposes. The lower resolution discrete versions of f(t) (A_{-1}^d f(n), A_{-2}^d f(n), A_{-3}^d f(n), ...) contain less information of f(t). There is a problem in converting the discrete approximations from one resolution to another without the need of continuous functions. In theory, one should be able to calculate a lower resolution
discrete function from a higher resolution discrete function since the higher resolution function contains more signal information. In fact, this calculation is possible. Since \phi_{j,n}(t) is a member of function space V_{j+1}, \phi_{j,n}(t) \in V_j \subset V_{j+1}, \phi_{j,n}(t) can be constructed with the orthonormal basis of V_{j+1},

\phi_{j,n}(t) = \sum_{k=-\infty}^{\infty} \langle \phi_{j,n}(u), \phi_{j+1,k}(u)\rangle\, \phi_{j+1,k}(t).    (4.22)

The inner products in Equation 4.22 can be rewritten as

\langle \phi_{j,n}(u), \phi_{j+1,k}(u)\rangle = 2^j \sqrt{2} \int_{-\infty}^{\infty} \phi(2^j u - n)\, \phi(2^{j+1} u - k)\, du.    (4.23)

With a substitution of variables, \frac{t}{2} = 2^j u - n, Equation 4.23 becomes

\int_{-\infty}^{\infty} \sqrt{2^{-1}}\, \phi(2^{-1} t)\, \phi(t - (k - 2n))\, dt = \langle \phi_{-1,0}(t), \phi_{0,k-2n}(t)\rangle.    (4.24)

Next, take the inner product of f(t) with both sides of Equation 4.22,

\langle f(t), \phi_{j,n}(t)\rangle = \sum_{k=-\infty}^{\infty} \langle \phi_{-1,0}(u), \phi_{0,k-2n}(u)\rangle\, \langle f(t), \phi_{j+1,k}(t)\rangle.    (4.25)

Define an impulse response of a discrete filter, H(\omega), as

\forall n \in Z, \quad h(n) = \sqrt{2^{-1}}\, \langle \phi_{-1,0}(u), \phi_{0,n}(u)\rangle    (4.26)

and \tilde H(\omega) as a mirror filter with an impulse response \tilde h(n) = h(-n). The \sqrt{2^{-1}} factor is necessary to equate h(n) with the h(n) found in other wavelet papers. Equation 4.25 then becomes

A_j^d f(n) = \sum_{k=-\infty}^{\infty} \sqrt{2}\, \tilde h(2n - k)\, A_{j+1}^d f(k) = \left(\sqrt{2}\, \tilde h(k) * A_{j+1}^d f(k)\right)(2n),    (4.27)

with the discrete convolution defined as

g * f(n) = (g(k) * f(k))(n) = \sum_{k=-\infty}^{\infty} g(n - k)\, f(k).    (4.28)

As a result, the discrete approximation A_j^d f(n), at resolution j, can be calculated from the discrete approximation at the next higher resolution by convolving A_{j+1}^d f(n) with the mirror filter \tilde H(\omega), multiplied by \sqrt{2}, and keeping every other sample. The filter H(\omega) is a unique characteristic of the scaling function \phi(t).
If the vector spaces V_j are thought of as concentric circles, then the vector spaces W_j would become the ring spaces between the adjacent circles. Let the vector space W_j be defined as the difference between the two low resolution vector spaces V_{j+1} and V_j. In other words, the union of W_j and V_j becomes V_{j+1}, V_{j+1} = W_j \cup V_j. The vector space W_j also has the interesting property of being orthogonal to V_j, W_j \perp V_j. As a consequence of orthogonality, every function in V_{j+1} can be written uniquely as a sum of a function in V_j and a function in W_j, V_{j+1} = W_j \oplus V_j. Since vector spaces V_j contain low resolution functions with increasing frequency content as the resolution increases, functions contained in the difference between two adjacent low resolution vector spaces have a narrow range of frequency content. From the discussion above of the narrow frequency content of wavelet functions, one could expect that wavelet functions are members of vector space W_j. In fact, wavelet functions are used to construct an orthonormal basis of W_j, i.e.,

\forall n \in Z, \quad \psi_{j,n}(t) = \sqrt{2^j}\, \psi(2^j t - n) \ \text{form an orthonormal basis for}\ W_j.    (4.29)

Since the W_j are non-overlapping, orthogonal vector spaces in L^2(R), one can conclude:

\forall j, n \in Z, \quad \psi_{j,n}(t) \ \text{form an orthonormal basis for}\ L^2(R).
Wavelet and scaling functions are closely related such that several constraining relations can be derived which aid in developing different wavelet and scaling functions. From Equation 4.22, we obtain

\phi_{j,n}(t) = \sum_{k=-\infty}^{\infty} \langle \phi_{-1,0}(u), \phi_{0,k-2n}(u)\rangle\, \phi_{j+1,k}(t) = \sum_{k=-\infty}^{\infty} \sqrt{2}\, h(k - 2n)\, \phi_{j+1,k}(t),    (4.30)

with g(n) defined as the impulse response of a discrete filter, G(\omega),

\forall n \in Z, \quad g(n) = \sqrt{2^{-1}}\, \langle \psi_{-1,0}(u), \phi_{0,n}(u)\rangle.    (4.31)

A similar relation to Equation 4.30 can be written for the wavelet functions since \psi_{j,k}(t) \in V_{j+1}, i.e.,

\psi_{j,n}(t) = \sum_{k=-\infty}^{\infty} \langle \psi_{-1,0}(u), \phi_{0,k-2n}(u)\rangle\, \phi_{j+1,k}(t) = \sum_{k=-\infty}^{\infty} \sqrt{2}\, g(k - 2n)\, \phi_{j+1,k}(t).    (4.32)

By integrating both sides of Equation 4.30 and setting j, n = 0, we obtain

\int_{-\infty}^{\infty} \phi(t)\, dt = \sum_{k=-\infty}^{\infty} h(k) \left[\int_{-\infty}^{\infty} \phi(t)\, dt\right], \quad \sum_{k=-\infty}^{\infty} h(k) = 1.    (4.33)

With the Fourier transform pair for sequences defined as

H(\omega) = \sum_{n} h(n)\, e^{-j\omega n} \iff h(n) = \frac{1}{2\pi} \int_{2\pi} H(\omega)\, e^{j\omega n}\, d\omega,    (4.34)

from Equation 4.33, we get H(0) = 1. A well known relation whose derivation can be found in [6] is

|H(\omega)|^2 + |H(\omega + \pi)|^2 = 1.    (4.35)

From this equation, setting \omega = 0 yields H(\pi) = 0. With Equation 4.32, setting j, n = 0 yields

\psi(t) = 2 \sum_{k=-\infty}^{\infty} g(k)\, \phi(2t - k).    (4.36)
The Fourier domain representation of this equation becomes
\Psi(\omega) = 2 \sum_{k=-\infty}^{\infty} g(k)\, \frac{1}{2}\, e^{-jk\omega/2}\, \Phi\!\left(\frac{\omega}{2}\right) = \left[\sum_{k=-\infty}^{\infty} g(k)\, e^{-jk\omega/2}\right] \Phi\!\left(\frac{\omega}{2}\right) = G\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right).    (4.37)
From Equations 4.30 and 4.32, the orthogonality condition, V_j \perp W_j, yields

\forall n, p \in Z, \quad 0 = \langle \phi_{j,n}(t), \psi_{j,p}(t)\rangle = 2 \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} h(k - 2n)\, g(m - 2p)\, \langle \phi_{j+1,k}(t), \phi_{j+1,m}(t)\rangle,    (4.38)

0 = 2 \sum_{k=-\infty}^{\infty} h(k - 2n)\, g(k - 2p).

When n, p = 0, this equation has a solution,

\forall r \in Z, \quad g(k) = (-1)^k\, h(-k + 2r + 1)  [6].    (4.39)

In order to convert this equation to the Fourier domain, take the summation of both sides multiplied by complex exponentials,

\sum_{k=-\infty}^{\infty} g(k)\, e^{-jk\omega} = \sum_{k=-\infty}^{\infty} e^{jk\pi}\, h(-k + 2r + 1)\, e^{-jk\omega} = -e^{-j\omega(2r+1)} \sum_{m=-\infty}^{\infty} h(m)\, e^{-j(\pi - \omega)m},    (4.40)

G(\omega) = -H(\pi - \omega)\, e^{-j\omega(2r+1)}.

Inserting this relation into Equation 4.37 and setting r = 0 yields

\Psi(\omega) = -e^{-j\omega/2}\, H\!\left(\pi - \frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right).    (4.41)

By setting \omega = 0 in the previous two equations, one can use the above relation H(\pi) = 0 to conclude

\Psi(0) = \int_{-\infty}^{\infty} \psi(t)\, dt = 0 \quad \text{and} \quad G(0) = \sum_{k=-\infty}^{\infty} g(k) = 0.    (4.42)
Thus, the wavelet function and the impulse response of G(\omega) have zero mean. The discrete filters, H(\omega) and G(\omega), can be determined from the scaling and wavelet functions or vice-versa. However, the discrete filters and the scaling and wavelet functions must correspond to the above constraining relations and several other constraining relations which have not been mentioned but are included in [6] and [9]. The constraining relations have assisted mathematicians in developing many different wavelet functions.
The complete DWT is based on decomposing A_{j+1}^d f(n), -J \le j \le -1, where J is the number of levels of decomposition, into two sequences A_j^d f(n) and D_j^d f(n). The first sequence, A_j^d f(n), was discussed previously. The second sequence, D_j^d f(n), will now be defined. Take the inner product of f(t) with both sides of Equation 4.32,

\langle f(t), \psi_{j,n}(t)\rangle = \sum_{k=-\infty}^{\infty} \sqrt{2}\, g(k - 2n)\, \langle f(t), \phi_{j+1,k}(t)\rangle.    (4.43)

With D_j^d defined as an operator on f(t) which produces the sequence

\forall n \in Z, \quad D_j^d f(n) = \langle f(t), \psi_{j,n}(t)\rangle    (4.44)

and the sequence \tilde g(n) = g(-n) defined as the impulse response of the discrete filter \tilde G(\omega), Equation 4.43 becomes

D_j^d f(n) = \sum_{k=-\infty}^{\infty} \sqrt{2}\, \tilde g(2n - k)\, A_{j+1}^d f(k) = \left(\sqrt{2}\, \tilde g(k) * A_{j+1}^d f(k)\right)(2n).    (4.45)

This relation is comparable to Equation 4.27. The sequence D_j^d f(n) can also be written as an expression comparable to Equation 4.21,

D_j^d f(n) = \sqrt{2^j} \int_{-\infty}^{\infty} f(t)\, \psi(2^j(t - 2^{-j}n))\, dt = \left(f(t) * \sqrt{2^j}\, \psi(-2^j t)\right)(2^{-j}n).    (4.46)

Since the wavelet function has bandpass frequency characteristics [9], from this relation, D_j^d f(n) can be thought of as a bandpass filtered version of f(t) sampled at intervals of 2^{-j}n. D_j^d f(n) is referred to as a detail sequence, and A_j^d f(n) is referred to as a smooth sequence. From Equations 4.27 and 4.45, one can graphically represent the decomposition of A_{j+1}^d f(n) with a filter block diagram (Figure 4.1). The arrows entering and leaving Figure 4.1 represent the cascading of the filter diagram over J decompositions where -J \le j \le -1.
Once the sequence A_0^d f(n) has been decomposed into J detail sequences and a smooth sequence, the original sequence can be reconstructed from the decomposed sequences. Since V_{j+1} = W_j \oplus V_j, \phi_{j+1,n}(t) \in V_{j+1} can be written uniquely as a sum of a function in W_j and a function in V_j. These two functions are the projections of \phi_{j+1,n}(t) on the respective vector spaces, W_j and V_j, i.e.,

\phi_{j+1,n}(t) = \sum_{k=-\infty}^{\infty} \langle \phi_{j+1,n}(u), \phi_{j,k}(u)\rangle\, \phi_{j,k}(t) + \sum_{k=-\infty}^{\infty} \langle \phi_{j+1,n}(u), \psi_{j,k}(u)\rangle\, \psi_{j,k}(t)    (4.47)

= \sum_{k=-\infty}^{\infty} \sqrt{2}\, h(n - 2k)\, \phi_{j,k}(t) + \sum_{k=-\infty}^{\infty} \sqrt{2}\, g(n - 2k)\, \psi_{j,k}(t).

Taking the inner product of f(t) with both sides of this relation yields

A_{j+1}^d f(n) = \sum_{k=-\infty}^{\infty} \sqrt{2}\, h(n - 2k)\, A_j^d f(k) + \sum_{k=-\infty}^{\infty} \sqrt{2}\, g(n - 2k)\, D_j^d f(k).    (4.48)

Inserting a zero between every sample of A_j^d f(k) and D_j^d f(k) results in an upsampling of the two sequences, which is defined as

A_j^u f(2k) = A_j^d f(k), \quad D_j^u f(2k) = D_j^d f(k),    (4.49)

with the odd-indexed samples set to zero. Substituting m = 2k and the upsampled sequences into Equation 4.48 yields

A_{j+1}^d f(n) = \left(\sqrt{2}\, h(m) * A_j^u f(m)\right)(n) + \left(\sqrt{2}\, g(m) * D_j^u f(m)\right)(n).    (4.50)

Thus, A_{j+1}^d f(n) is reconstructed from the decomposed sequences A_j^d f(n) and D_j^d f(n) by taking the summation of the upsampled sequences convolved with the discrete filters \sqrt{2} H(\omega) and \sqrt{2} G(\omega), respectively (Figure 4.2).
Usually, discrete signals have finite length. There is a problem in translating finite length sequences into Equations 4.27, 4.45, and 4.50. Define A_{j+1}^{dT} f(n) as a finite length sequence of length T, where T is even, and A_{j+1}^d f(n) as a periodic extension of A_{j+1}^{dT} f(n) with period T. If the sequences h(n) and g(n) are finite length and nonperiodic, A_j^d f(n) and D_j^d f(n) will be periodic with a period of T/2. The requirement that A_{j+1}^{dT} f(n) must have an even length translates into the requirement that the length of the original sequence, A_0^{dT} f(n), must be divisible by 2^J, where J is the number of decompositions. As a result, the sum of the lengths of the two decomposed sequences, A_j^{dT} f(n) and D_j^{dT} f(n), is the same as the length of A_{j+1}^{dT} f(n). In other words, the DWT does not increase the number of samples of the original sequence. This compact representation of a sequence is an advantage over other progressive decomposition schemes [9].
The DWT lends itself easily to a familiar matrix notation. When A_{j+1}^{dT} f(n) is represented as a vector v^{j+1} \in R^T, this vector is multiplied by a square, non-singular matrix, M_{j+1}, whose non-zero elements consist of the samples of \sqrt{2}\, h(n) and \sqrt{2}\, g(n). This matrix-vector multiplication, v^j = M_{j+1} v^{j+1}, results in a vector, v^j \in R^T, whose elements consist of the two decomposed sequences A_j^{dT} f(n) and D_j^{dT} f(n). The matrix notation of the DWT is best illustrated with an example. First, initialize the following finite length sequences:

A_0^{dT} f(n) = a_{0,n}, \quad A_{-1}^{dT} f(k) = a_{-1,k}, \quad D_{-1}^{dT} f(k) = d_{-1,k}, \quad 0 \le n \le 7, \ 0 \le k \le 3.    (4.51)

In Table 4.1, g(n) is derived from Equation 4.39 by setting r = 1. Also, \tilde g(n) and \tilde h(n) were defined above as g(-n) and h(-n), respectively. Remembering that A_0^d f(n), A_{-1}^d f(n), and D_{-1}^d f(n) are periodic extensions of A_0^{dT} f(n), A_{-1}^{dT} f(n), and D_{-1}^{dT} f(n), respectively, one can represent Equations 4.27 and 4.45 with the matrix notation

\begin{bmatrix} a_{-1,0} \\ a_{-1,1} \\ a_{-1,2} \\ a_{-1,3} \\ d_{-1,0} \\ d_{-1,1} \\ d_{-1,2} \\ d_{-1,3} \end{bmatrix} = \begin{bmatrix} c_0 & c_1 & c_2 & c_3 & & & & \\ & & c_0 & c_1 & c_2 & c_3 & & \\ & & & & c_0 & c_1 & c_2 & c_3 \\ c_2 & c_3 & & & & & c_0 & c_1 \\ c_3 & -c_2 & c_1 & -c_0 & & & & \\ & & c_3 & -c_2 & c_1 & -c_0 & & \\ & & & & c_3 & -c_2 & c_1 & -c_0 \\ c_1 & -c_0 & & & & & c_3 & -c_2 \end{bmatrix} \begin{bmatrix} a_{0,0} \\ a_{0,1} \\ a_{0,2} \\ a_{0,3} \\ a_{0,4} \\ a_{0,5} \\ a_{0,6} \\ a_{0,7} \end{bmatrix}    (4.52)
where the blank spaces in the square matrix represent zeros. Also, the matrix notation of
Equation 4.50 becomes
\begin{bmatrix} a_{0,0} \\ a_{0,1} \\ a_{0,2} \\ a_{0,3} \\ a_{0,4} \\ a_{0,5} \\ a_{0,6} \\ a_{0,7} \end{bmatrix} = \begin{bmatrix} c_0 & & & c_2 & c_3 & & & c_1 \\ c_1 & & & c_3 & -c_2 & & & -c_0 \\ c_2 & c_0 & & & c_1 & c_3 & & \\ c_3 & c_1 & & & -c_0 & -c_2 & & \\ & c_2 & c_0 & & & c_1 & c_3 & \\ & c_3 & c_1 & & & -c_0 & -c_2 & \\ & & c_2 & c_0 & & & c_1 & c_3 \\ & & c_3 & c_1 & & & -c_0 & -c_2 \end{bmatrix} \begin{bmatrix} a_{-1,0} \\ a_{-1,1} \\ a_{-1,2} \\ a_{-1,3} \\ d_{-1,0} \\ d_{-1,1} \\ d_{-1,2} \\ d_{-1,3} \end{bmatrix}    (4.53)
The square matrix in Equation 4.53 is the inverse and the transpose of the square matrix in Equation 4.52. Thus, the square matrix in Equation 4.52 is orthogonal, and its rows form an orthonormal basis for R^8.
The filter coefficients of the DWT used in the EZW algorithm (described in the next chapter) are the DAUB4 coefficients named after Ingrid Daubechies [10]. The number 4 is associated with the name DAUB4 because h(n) has a length of 4, similar to the above matrix example. Since the rows of the square matrix in Equation 4.52 form an orthonormal basis for R^8, two independent equations for the four coefficients can be derived,

c_0^2 + c_1^2 + c_2^2 + c_3^2 = 1, \quad c_2 c_0 + c_3 c_1 = 0  [11].    (4.54)

In order for these four coefficients to have a unique solution, two more independent equations are required. The DAUB4 coefficients include vanishing moments for the first two moments of g(n),

\sum_{n=-\infty}^{\infty} n^p\, g(n) = 0, \quad p = 0, 1.    (4.55)

The vanishing moments yield two more independent equations,

c_3 - c_2 + c_1 - c_0 = 0, \quad 0c_3 - 1c_2 + 2c_1 - 3c_0 = 0,    (4.56)

which give a unique solution for the coefficients,

c_0 = \frac{1 + \sqrt{3}}{4\sqrt{2}}, \quad c_1 = \frac{3 + \sqrt{3}}{4\sqrt{2}}, \quad c_2 = \frac{3 - \sqrt{3}}{4\sqrt{2}}, \quad c_3 = \frac{1 - \sqrt{3}}{4\sqrt{2}}.    (4.57)

As the number of coefficients increases, the number of vanishing moments also increases. The Daubechies filters have even lengths because when the filter length increases by two, the number of vanishing moments increases by one. With more vanishing moments of g(n), the wavelet function, \psi(t), becomes more regular or smooth with higher continuous derivatives [11].
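A one-level DAUB4 decomposition of a finite, periodically extended sequence can be sketched as below. This hypothetical routine mirrors the row structure of the matrix in Equation 4.52 (smooth outputs from \sqrt{2}\tilde h, detail outputs from \sqrt{2}\tilde g with g(k) = (-1)^k h(3-k)); it is an illustration under those assumptions, not the thesis program.

#include <stdio.h>
#include <math.h>

/* One DAUB4 decomposition step on in[0..n-1] (n even, periodic extension):
   smooth samples go to out[0..n/2-1], detail samples to out[n/2..n-1]. */
void daub4_step(const double *in, double *out, int n)
{
    double c[4];
    double s, d;
    int i, k, j;

    c[0] = (1.0 + sqrt(3.0)) / (4.0 * sqrt(2.0));   /* Equation 4.57 */
    c[1] = (3.0 + sqrt(3.0)) / (4.0 * sqrt(2.0));
    c[2] = (3.0 - sqrt(3.0)) / (4.0 * sqrt(2.0));
    c[3] = (1.0 - sqrt(3.0)) / (4.0 * sqrt(2.0));

    for (i = 0; i < n / 2; i++) {
        s = 0.0;
        d = 0.0;
        for (k = 0; k < 4; k++) {
            j = (2 * i + k) % n;                          /* wrap: period n */
            s += c[k] * in[j];                            /* sqrt(2)h row  */
            d += (k % 2 ? -c[3 - k] : c[3 - k]) * in[j];  /* sqrt(2)g row  */
        }
        out[i] = s;
        out[n / 2 + i] = d;
    }
}

int main(void)
{
    double a0[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    double a1[8];
    int i;
    daub4_step(a0, a1, 8);
    for (i = 0; i < 8; i++)
        printf("%s[%d] = %8.4f\n", i < 4 ? "smooth" : "detail", i % 4, a1[i]);
    return 0;
}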
In order for the DWT to operate on images, the DWT must be extended to two dimensions. As discussed above, the vector spaces V_j form multiresolution approximations of L^2(R). Similarly, define the vector spaces V_j^2 as multiresolution approximations of L^2(R^2). Further, define the vector spaces V_j^2 as separable multiresolution approximations of L^2(R^2) where V_j^2 can be decomposed into a tensor product of two one-dimensional vector spaces, V_j^2 = V_j \otimes V_j. These one-dimensional vector spaces are multiresolution approximations of L^2(R). The scaling function of V_j^2 can be separated into two one-dimensional scaling functions,

\Phi(x, y) = \phi(x)\, \phi(y),    (4.58)

where \phi(x) is the one-dimensional scaling function for V_j. The vector spaces which lie in the differences between resolutions in two dimensions are defined as W_j^2. An orthonormal basis for W_j^2 can be constructed using scaled and translated versions of three wavelet functions, \Psi^1(x,y), \Psi^2(x,y), and \Psi^3(x,y). These wavelet functions are separable into scaling and wavelet functions of one dimension,

\Psi^1(x,y) = \phi(x)\, \psi(y), \quad \Psi^2(x,y) = \psi(x)\, \phi(y), \quad \Psi^3(x,y) = \psi(x)\, \psi(y).    (4.59)

With similar development as above, let A_j^d be the operator for the scaling function and D_j^{d1}, D_j^{d2}, and D_j^{d3} be the operators for the three wavelet functions such that

A_j^d f(n,m) = \left(f(x,y) * 2^j\, \phi(-2^j x)\, \phi(-2^j y)\right)(2^{-j} n, 2^{-j} m),
D_j^{d1} f(n,m) = \left(f(x,y) * 2^j\, \phi(-2^j x)\, \psi(-2^j y)\right)(2^{-j} n, 2^{-j} m),
D_j^{d2} f(n,m) = \left(f(x,y) * 2^j\, \psi(-2^j x)\, \phi(-2^j y)\right)(2^{-j} n, 2^{-j} m),
D_j^{d3} f(n,m) = \left(f(x,y) * 2^j\, \psi(-2^j x)\, \psi(-2^j y)\right)(2^{-j} n, 2^{-j} m),
\forall n, m \in Z.    (4.60)
These operators perform discretized filtering of f(x,y) along the x and y axes. With similar development as above, these operators can be written as a decomposition of A_{j+1}^d f(n,m) into four two-dimensional sequences,

A_j^d f(n,m) = \left(2\, \tilde h(k)\, \tilde h(l) * A_{j+1}^d f(k,l)\right)(2n, 2m),
D_j^{d1} f(n,m) = \left(2\, \tilde h(k)\, \tilde g(l) * A_{j+1}^d f(k,l)\right)(2n, 2m),
D_j^{d2} f(n,m) = \left(2\, \tilde g(k)\, \tilde h(l) * A_{j+1}^d f(k,l)\right)(2n, 2m),
D_j^{d3} f(n,m) = \left(2\, \tilde g(k)\, \tilde g(l) * A_{j+1}^d f(k,l)\right)(2n, 2m),
\forall n, m \in Z.    (4.61)
These four two-dimensional sequences and A_{j+1}^d f(n,m) are set to periodic extensions of finite digital images, similar to the above mentioned periodic extensions of one-dimensional finite sequences. The period of the decomposed sequences becomes half of the period of A_{j+1}^d f(n,m). In other words, the decomposed finite images have half the dimension size of A_{j+1}^d f(n,m) so that the four decomposed images, taken together, have the same number of samples as A_{j+1}^d f(n,m). When placed into matrix notation or finite sum notation, the two-dimensional transformation kernel of the decomposition of A_{j+1}^d f(n,m) becomes separable in the n and m directions. When a two-dimensional transform is separable, the transform can be written as a cascade of two one-dimensional transforms in the two perpendicular directions [1]. Thus, the 2D DWT of a finite digital image can be represented as the cascade of a one-dimensional DWT on the rows and a one-dimensional DWT on the columns of the image, as sketched below.
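The following hypothetical sketch shows one decomposition level written as that row/column cascade; it reuses (and links against) daub4_step from the DAUB4 sketch earlier in this chapter, and the fixed dimension N is an assumption for brevity.

#include <string.h>

#define N 8   /* assumed image dimension, even */

void daub4_step(const double *in, double *out, int n);  /* defined above */

/* One 2D decomposition level: a 1D DWT on each row, then on each column.
   Afterwards img holds the four n/2 x n/2 sub-images (the smooth image in
   the top-left quadrant, the three detail images in the other quadrants). */
void dwt2d_step(double img[N][N], int n)
{
    double line[N], tmp[N];
    int r, c;

    for (r = 0; r < n; r++) {              /* transform the rows */
        daub4_step(img[r], tmp, n);
        memcpy(img[r], tmp, n * sizeof(double));
    }
    for (c = 0; c < n; c++) {              /* then transform the columns */
        for (r = 0; r < n; r++)
            line[r] = img[r][c];
        daub4_step(line, tmp, n);
        for (r = 0; r < n; r++)
            img[r][c] = tmp[r];
    }
}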
When a digital image is decomposed one level, the four sub-images contain different types of frequency information of the rows and columns. The low resolution sub-image appears as a recognizable downsampled version of the original image. The other three sub-images are less recognizable, with high frequency information of either the rows or columns or both. These three detail sub-images are sometimes referred to as error images since they represent the error or difference between the low resolution sub-image and the original image.
[Figure 4.1 Discrete Wavelet Decomposition [9]: A_{j+1}^d f(n) is convolved with the filters \sqrt{2}\tilde G and \sqrt{2}\tilde H, and one sample out of two is kept (downsampling by 2), yielding the detail sequence D_j^d f(n) and the smooth sequence A_j^d f(n).]

[Figure 4.2 Discrete Wavelet Reconstruction [9]: one zero is put between each sample of A_j^d f(n) and D_j^d f(n) (upsampling by 2), the results are convolved with the filters \sqrt{2}H and \sqrt{2}G, and the two outputs are summed to give A_{j+1}^d f(n).]
Table 4.1 Initialized Filter Impulse Responses

 n    √2 h(n)   √2 g(n)   √2 h̃(n)   √2 g̃(n)
-3      0         0         c_3       -c_0
-2      0         0         c_2        c_1
-1      0         0         c_1       -c_2
 0     c_0       c_3        c_0        c_3
 1     c_1      -c_2         0          0
 2     c_2       c_1         0          0
 3     c_3      -c_0         0          0
CHAPTER V
EMBEDDED ZEROTREE WAVELET ALGORITHM
The embedded zerotree wavelet (EZW) algorithm developed by Jerome Shapiro [12] is a quantization algorithm used in lossy image compression. The EZW algorithm
uses two other independently developed algorithms, the DWT and arithmetic coding, to
form a complete image compression system. The arithmetic coding and DWT algorithms
have been described in detail in Chapters III and IV, respectively. EZW is a type of
transform coder discussed in Chapter I because an image is transformed with the DWT
prior to quantization. Also, since EZW quantizes the wavelet coefficients individually
and not in blocks, EZW is considered a scalar quantizer. Because the image information
is not distorted in the DWT or the arithmetic coding algorithm, all of the information loss
occurs in the EZW algorithm where the wavelet coefficients are approximated.
EZW is a low bit rate algorithm. In other words, EZW performs best at high
compression levels. The reconstructed image quality compares as well as or better than that of other image compression programs at high compression levels such as 80:1 or 100:1. The
algorithm makes the most of very little image data. The compression performance is
dependent on the existence of zerotrees which are exponentially growing trees of
insignificant wavelet coefficients. Zerotrees will be discussed later. At high compression
levels, most of the wavelet coefficients are considered insignificant which results in large
and numerous zerotrees. However, at low compression levels, most of the wavelet
coefficients are considered significant and the size and number of zerotrees decreases.
Therefore, at low compression levels such as 10:1 or 20:1 other compression programs
such as JPEG have an advantage over EZW. The worth of EZW is dependent on the
specific need of an image compression program. If one needs compressed images with
near imperceptible distortion at low compression levels, JPEG compression would offer a
good solution. However, if one needs highly compressed images where the image
features remain recognizable but have some visible distortion, EZW will have an
advantage over JPEG. The distortion introduced by EZW is more visually pleasing than
the blocking effect distortion produced by JPEG. Results of EZW compression will be
discussed further in Chapter VI.
Besides improved compression performance, another advantage of EZW is its
embedded code. EZW represents an image in a similar way that a computer represents a
decimal number of infinite precision with finite length code. The larger the number of
bits which represent a decimal number, the more accurate the representation will be. The
lower resolution digits are embedded in the front of the code so that the more precise
information is added on the end of the code. The code simply terminates when the
desired precision has been reached. Similarly, all coarse representations of the image are
embedded at the front of the EZW code and the more detailed image content is added on
the end. The EZW algorithm stops executing or generating code when the desired image
accuracy has been reached. As a result, a desired CR or distortion metric can be met
exactly. This embedded code does not affect the compression efficiency.
The EZW algorithm is an iterative algorithm so that it makes successive passes
through the image information. Each pass produces more detailed information of the
image. Different aspects of each iteration will be described and justified intuitively, but
for a more thorough rationalization and mathematical justification, the reader is referred
to [12]. In each iteration, the magnitudes of the wavelet coefficients, |x|, are compared to a single threshold value T. The coefficients equal to or below the threshold, |x| ≤ T, are labeled insignificant, and the coefficients above the threshold, |x| > T, are defined as significant.
significant. The wavelet coefficients compactly represent the low frequency and high
frequency information of an image in distinct spatial locations. The high frequency
information which is contained in the lower level sub-image coefficients may represent
very few image pixels, but contributes significantly to the perceptual image quality. The
low frequency information which is contained in the high level sub-image coefficients
represents a large number of pixels compacted into a few coefficients. Therefore, large
magnitude high and low frequency coefficients are considered equally important and
compared with the same threshold, T. The insignificant coefficients are coded as zero
values.
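The test itself is trivial; a one-line ANSI C sketch (with illustrative names, not taken from the thesis program) makes the convention explicit:

#include <math.h>

/* Significance test of one dominant pass: |x| > T is significant,
   |x| <= T is insignificant and will be coded as zero. */
int is_significant(double x, double T)
{
    return fabs(x) > T;
}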
Once the significant coefficients are found, their magnitudes and positions must
be coded. Magnitude coding will be discussed later. Efficient position coding represents
a formidable challenge. Since the wavelet coefficients are uncorrelated, it is difficult to
predict where a significant coefficient will occur. However, it is easier to predict where
the insignificant coefficients will lie because of the decaying nature of the wavelet
coefficients across decomposition levels. Coefficients at lower levels tend to have
statistically smaller magnitudes than coefficients at higher levels. Therefore, if a
coefficient at a high level is found insignificant, the coefficients with the same spatial
orientation at lower levels have a high probability of also being insignificant. The
coefficients with the same spatial orientation at different levels form tree like structures.
These tree structures of insignificant coefficients are called zerotrees because they are
coded as zero values.
The nodes of the zerotrees have parent-child relationships such that most parent
nodes each have four children nodes (Figure 5.1). The two letters in each block in Figure
5.1 represent either the low (L) or high (H) frequency information of each decomposition
for the rows and columns, respectively. The subscripts represent the levels of
decomposition. A coefficient in a decomposition level above the first, except for the
lowest resolution sub-image coefficients, will have four children with the same spatial
orientation in the next lower level. The coefficients in the lowest resolution sub-image
will have three children with the same spatial orientation, one in each of the other three
sub-images on the same level. All of the children nodes connected to a parent are called
the descendants of the parent. Similarly, all of the parent nodes connected to a child are
called the ancestors of the child. A zerotree is formed when a parent and all of its
descendants are found to be insignificant. This parent or the top node in the zerotree is
called the zerotree root. After a zerotree root symbol is coded in a compressed file, the
decoder will know to automatically assign zeros to the zerotree root and all of its
descendants. As a result, many insignificant coefficients can be predicted with one
encoded symbol. With the insignificant coefficients coded efficiently with zerotrees, the
positions of the significant coefficients also become coded efficiently.
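For coefficients above the lowest decomposition level, the parent-child relation has a simple index form: the children of the coefficient at row r and column c of a detail sub-image occupy the 2 x 2 block starting at row 2r, column 2c of the corresponding sub-image one level finer. A C sketch of this mapping follows (the names are illustrative, and the thesis program's indexing may differ):

/* Fill child_r[] and child_c[] with the positions of the four children
   of the coefficient at (r, c); valid above the finest level and
   outside the lowest resolution sub-image. */
void children_of(int r, int c, int child_r[4], int child_c[4])
{
    int i, j, k = 0;
    for (i = 0; i < 2; ++i)
        for (j = 0; j < 2; ++j) {
            child_r[k] = 2 * r + i;
            child_c[k] = 2 * c + j;
            ++k;
        }
}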
The alphabet used to encode the positions of significant coefficients, except in the
lowest decomposition level, will include four symbols: positive significant (PS),
negative significant (NS), isolated zero (IZ), and zerotree root (ZR). In the lowest level,
the alphabet will include three symbols: PS, NS, and zero (Z). There is not a possibility
of a ZR in the lowest level, so the ZR symbol is not included in its alphabet. The
significant coefficients are divided into PS and NS so that the magnitude coding will not
have to keep track of the sign information.
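In C the two alphabets can be represented with a single enumeration (a labeling sketch; the symbol names follow the text, and the numeric values are arbitrary):

enum ezw_symbol {
    PS,   /* positive significant */
    NS,   /* negative significant */
    IZ,   /* isolated zero (insignificant, but not a zerotree root) */
    ZR,   /* zerotree root (not used in the lowest level) */
    Z     /* zero (lowest level only) */
};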
The symbols are entropy coded with the adaptive arithmetic coder described in
Chapter III. With the arithmetic coder, the statistical model is separate from the coder.
The adaptive statistical model consists of a simple histogram. A different histogram is
used to represent each separate alphabet of symbols encoded. Therefore, different
symbol sources will be intermixed during the coding process. The decoder will be able to
unmix the sources since the order of the mixing will be known. The adaptive arithmetic
coder has an advantage in coding the small alphabets used in this algorithm. Each of the
symbols will occur regularly, so the statistical model will only need to have a short
memory to keep track of symbol probabilities. With a short memory, the model will
adapt quickly to changing symbol probabilities which are usually non-stationary. A
maximum frequency count of 256 was used to balance the need for an accurate model
with the need for the model to adapt quickly.
The scanning of the image for significant and insignificant coefficients while
coding the position information is called the dominant pass of the image. The order of
the scanning is important because the decoder will read the information in the same order.
The algorithm scans the image from the low resolution sub-images to the high resolution
sub-images (from the high decomposition level to the low decomposition level) so that
each parent is scanned before its child. The coefficients are assumed to be zero until they
are found to be significant. If the coding stops in the middle of the dominant pass, only
the lower resolution coefficients will have the chance to be found significant. Since the
low resolution wavelet coefficients can produce recognizable image features better than
the high resolution coefficients, this ordering is justified. The sub-images are scanned
one at a time in the following order after the lowest resolution sub-image (LL_N): upper right (HL_N), lower left (LH_N), and lower right (HH_N). The coefficients within each sub-image are scanned from left to right and from top to bottom. The ordering of the sub-images in each decomposition level and the coefficients within each sub-image need to be
consistent but are otherwise arbitrary. As each coefficient is scanned, the algorithm goes
through a sequence of decisions to determine which symbol to code (Figure 5.2). Once a
coefficient is found to be significant, its position is not coded again in subsequent
dominant passes. In other words, the dominant pass skips over coefficients which were
found significant in previous dominant passes.
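A sketch of this scan order in C follows, assuming a hypothetical routine scan_subband() that raster-scans one sub-image of one level:

/* Dominant-pass scan order: the lowest resolution sub-image first,
   then HL, LH, HH of each level from the highest decomposition level
   down to the first (a sketch; scan_subband() is assumed). */
void dominant_scan(int levels)
{
    int lev;
    void scan_subband(const char *band, int lev);   /* assumed */

    scan_subband("LL", levels);         /* lowest resolution sub-image */
    for (lev = levels; lev >= 1; --lev) {
        scan_subband("HL", lev);        /* upper right */
        scan_subband("LH", lev);        /* lower left  */
        scan_subband("HH", lev);        /* lower right */
    }
}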
When a coefficient is found to be significant, its magnitude is appended to a list
called the subordinate list. With each iteration of EZW, the threshold value, T, is divided
in half. Therefore, with each dominant pass, more coefficients will be appended to the
subordinate list. Based on the value of the current threshold, the decoder can estimate the
coefficient magnitudes in the subordinate list during the dominant passes. For example,
during the second dominant pass, the decoder will know that the significant magnitudes
lie somewhere between T/2 and T, where T is the original threshold. As a result, the width of the magnitude uncertainty interval, (T/2, T], is equivalent to the current threshold, T/2, during the dominant pass. The decoder estimate of a magnitude is the center of an uncertainty interval. This estimate provides a simple and effective magnitude estimate. Since this relationship between the uncertainty interval and threshold will also hold during the first dominant pass, for the sake of consistency, the initial uncertainty interval will become (T, 2T]. This initial uncertainty interval places a constraint on the initial threshold with respect to the maximum wavelet coefficient
magnitude,

$$\tfrac{1}{2}\max|x| \leq T < \max|x|. \qquad (5.1)$$
Obviously, T must be less than the maximum magnitude or else the first dominant pass
would not find any significant coefficients. If T is less than half the maximum magnitude, then at least one magnitude will lie outside the initial uncertainty interval, (T, 2T], and the decoder estimate will be less accurate.
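One convenient choice that always satisfies (5.1) is the largest power of two below the maximum magnitude, as in the following sketch (the thesis program instead takes the initial threshold as user input):

#include <math.h>

/* Return a power-of-two threshold in [max|x|/2, max|x|). */
double initial_threshold(double max_mag)
{
    double T = pow(2.0, floor(log(max_mag) / log(2.0)));
    if (T >= max_mag)
        T /= 2.0;      /* keep T strictly below the maximum magnitude */
    return T;
}

For the 8 x 8 example of Appendix C, where the maximum magnitude is 63, this choice gives T = 32.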
With each iteration of EZW, the uncertainty interval width (UIW) of the
significant magnitudes found in the dominant pass is divided in half. However, all of the
magnitudes in the subordinate list should maintain the same UIW because of the
successive approximation principle. In other words, all of the significant magnitudes
found should be approximated with the same range of uncertainty. The UIW's of the
significant magnitudes found in previous dominant passes are reduced to the current
UIW, before the next dominant pass begins, with an iteration called the subordinate pass.
After the first dominant pass, the magnitudes in the subordinate list are compared with
the centers of the uncertainty intervals which are the decoder estimates. If a magnitude is
above the decoder estimate, a "1" symbol is output. If a magnitude is below the decoder
estimate, a "0" symbol is output. These symbols are then entropy coded with a separate
two symbol alphabet. The new uncertainty intervals will either be above or below the
centers of the old uncertainty intervals. As a result, the UIW of the subordinate list is
divided in half by this process. Also, the UIW of the subordinate list will be equivalent to
the UIW of the magnitudes appended during the second dominant pass. When the
subordinate pass is applied to the subordinate list after each dominant pass, the UIW of
the entire subordinate list will remain uniform.
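A sketch of one subordinate-pass decision for a single magnitude follows, with the uncertainty interval represented by its lower end and width (illustrative names; the thesis program entropy codes the resulting bit with a separate two symbol model):

/* Refine one magnitude: emit 1 if it lies in the upper half of its
   current uncertainty interval [*low, *low + *width), 0 otherwise,
   and halve the interval accordingly. */
int refine_magnitude(double mag, double *low, double *width)
{
    double center = *low + *width / 2.0;
    *width /= 2.0;              /* the UIW is halved each pass */
    if (mag > center) {
        *low = center;          /* keep the upper half */
        return 1;
    }
    return 0;                   /* keep the lower half */
}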
Since the accuracy of large magnitude coefficients is more important to image
reconstruction than the accuracy of small magnitude coefficients, the sorted order of the
subordinate list should reflect the importance of the larger magnitudes. At the end of the
first dominant pass, the subordinate list magnitudes are in order of appearance within the
image. From the decoder point of view, all of these magnitudes are the same, which is
the center of the uncertainty interval. After the magnitudes are refined with the
subordinate pass, their decoded values will change and may differ from one another. The
subordinate list should then be sorted with respect to the decoded values so that the
second subordinate pass will begin refining the largest magnitudes first (from the decoder
point of view). If the subordinate list is sorted each time after it is refined, the decoded
subordinate list will remain in sorted descending order before each subordinate pass
begins. As a result, the more important symbols which refine the larger magnitudes will
be decoded before the symbols refining the smaller magnitudes. This aspect of the EZW
iteration is useful in case the decoding stops during the subordinate pass.
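With the decoded values kept in an array, the re-sort can be done with the standard library qsort(), as in this sketch (the comparison must use the decoded values, which both the encoder and the decoder can compute, so the two remain synchronized):

#include <stdlib.h>

/* Comparison function for sorting decoded magnitudes in descending order. */
static int descending_cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* usage: qsort(decoded_list, list_length, sizeof(double), descending_cmp); */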
Each EZW iteration consists of a dominant pass followed by a subordinate pass
followed by a sorting of the subordinate list with respect to the decoded magnitudes. The
iterations continue until a target bit rate, CR, or distortion metric is reached. In the
implementation shown in this chapter, a target CR stops the EZW process. A
reconstructed image distortion metric such as mean square error (MSE) should be nearly, if not strictly, non-increasing with respect to the bit rate. If the EZW algorithm is still unclear at this point, an example is sometimes worth a thousand words. A very good
EZW example is included in Appendix C. EZW is an innovative approach to quantize
wavelet coefficients so that a recognizable image can be reconstructed with very little
data.
It was of interest to see how EZW performed with a different set of wavelet
coefficients. The wavelet transform used in [12] was based on the 9-tap symmetric
quadrature mirror filters whose coefficients are given in [13]. However, the wavelet
coefficients used in the implementation given in this chapter are the DAUB4 wavelet
coefficients described in Chapter IV. The performance of EZW in both implementations
was comparable. The results of standard image gray scale compression in Chapter VI can
be compared with the results in [12]. Also of interest was to have a readily available
EZW compression program to compare with other image compression algorithms
developed in our image processing lab such as the adaptive fuzzy clustering algorithm
[14]. Since EZW is a well respected algorithm, the EZW compression program can be
used in future research to compare with new image compression schemes.
The implementation of EZW in ANSI C code compresses and uncompresses an 8-bit raw image of variable dimensions. The image can be rectangular, but the width and
height of the image must be divisible by 2^J, where J is the level of wavelet
decomposition, as discussed in Chapter IV. With different initial threshold inputs,
different reconstructed image distortions will result. The initial threshold must be
optimized experimentally as discussed in [12]. The initial threshold can be optimized for
a large class of images with a single initial threshold used to compress several images. In
order to develop a more user friendly image compression system, the initial threshold
input must somehow be eliminated. A user should not be expected to experiment with
different thresholds to discover the one that gives the least distortion for a particular
image. Further research is needed in this area. If the CR's were mapped to a distortion
metric such as the peak signal-to-noise ratio (PSNR) for a large class of images, a user
could input a target PSNR instead of a desired CR. The option of a desired reconstructed
PSNR or CR could make the program easier to use. Calculating the reconstructed PSNR
while the image is being coded would give the same ability to set a target PSNR but that
method may be too computationally intensive. The EZW program presented in this
chapter represents a flexible and improvable starting point for research into EZW
compression.
Figure 5.1 Zerotree Structure
(Diagram of a 3-scale decomposition showing the sub-images LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, and HH1, with arrows linking each parent coefficient to its children in the next lower level.)
Figure 5.2 Position Coding Flowchart
(Flowchart of the decision sequence applied to each input coefficient: coefficients that need not be coded are skipped ("Do not code"), and the remaining coefficients are coded as PS, NS, IZ, or ZR.)
CHAPTER VI
RESULTS OF EZW COMPRESSION
EZW compression was tested on two color images and one gray-scale image. The
compression tests demonstrate the advantages of EZW over JPEG compression at high
compression levels. The 24-bit color images were separated into three color planes (red,
green, and blue) for EZW compression since the EZW program can only compress 8-bit
images. The RGB color space was the logical separation choice. The three reconstructed
color planes were then combined into a reconstructed 24-bit color image. The JPEG
compression program compressed the color images directly without separation into color
planes. The distortion of the reconstructed images was quantitatively measured with the
MSE and PSNR. The MSE and PSNR are both calculated from the cumulative square
error (CSE),
$$\mathrm{CSE} = \sum_{j=0}^{M-1} \sum_{k=0}^{N-1} \left( x_{j,k} - \hat{x}_{j,k} \right)^2, \qquad (6.1)$$

where $x_{j,k}$ is the pixel value of the original M by N image and $\hat{x}_{j,k}$ is the pixel value of the reconstructed image. The MSE and PSNR become

$$\mathrm{MSE} = \frac{\mathrm{CSE}}{MN}, \qquad \mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right).$$
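These measures translate directly into ANSI C; the following sketch computes the PSNR of a reconstructed 8-bit image against the original (illustrative names, not a routine from the thesis programs):

#include <math.h>

/* PSNR in dB of a reconstructed M x N 8-bit image. */
double psnr(const unsigned char *orig, const unsigned char *recon,
            int M, int N)
{
    double cse = 0.0, mse;
    long i, total = (long)M * N;
    for (i = 0; i < total; ++i) {
        double d = (double)orig[i] - (double)recon[i];
        cse += d * d;                /* cumulative square error */
    }
    mse = cse / (double)total;       /* mean square error */
    return 10.0 * log10(255.0 * 255.0 / mse);
}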
The reconstructed Lena and Baboon images (512x512 size) show a distinct
appearance difference between the EZW and JPEG compressed images (Figure 6.1). The
distortion in the EZW compressed images is smooth, without many blocking artifacts.
Some vertical and horizontal artifacts are due to the vertical and horizontal wavelet
transforms. Since the discrete cosine transform (DCT) in JPEG is performed on 8 by 8
image blocks, these image blocks appear as artifacts in the highly compressed images.
The EZW compressed Lena image with 100:1 CR and the EZW compressed Baboon
image with 80:1 CR more accurately represent the original respective images than the
respective JPEG compressed images (Table 6.1). This improved performance of EZW
compression over JPEG compression is expected as the CR increases.
The gray-scale standard Lena image was compressed many times with EZW and
JPEG to produce two plots (Figure 6.2). With a CR of 50, gray-scale Lena was EZW
compressed with 60 different initial thresholds. The valid initial thresholds lie between
6026 and 12052. The initial thresholds given to the EZW program were from 6100 to
12000 at intervals of 100. The distortion generally decreases as the threshold increases
with the minimum distortion at 11900. Different images and different CR's of this image
may have different initial threshold curves. This threshold curve was meant to give a
general idea of how different initial thresholds can affect the reconstructed distortion.
The minimum distortion threshold that was found in Figure 6.2a was used to
EZW compress gray-scale Lena at 36 CR's from 10 to 118. JPEG was also used to
compress the same image at many compression ratios. The two curves (Figure 6.2b)
demonstrate that EZW compression performance surpasses JPEG compression
performance at a CR of about 42. This graph reasserts the idea that EZW performs well
at high compression levels.
Figure 6.1 Lena and Baboon Images Compressed with EZW and JPEG
(a) Original Lena image; (b)-(d) EZW compressed at 60:1, 80:1, and 100:1 CR; (e)-(g) JPEG compressed at 59.6:1, 78.0:1, and 96.9:1 CR; (h) Original Baboon image; (i)-(k) EZW compressed at 40:1, 60:1, and 80:1 CR; (l)-(n) JPEG compressed at 40.8:1, 61.0:1, and 79.1:1 CR.
Table 6.1 Lena and Baboon Compressed Image Statistics

Image    Compression    CR      MSE      PSNR (dB)    BPP (bits/pixel)
Lena     EZW            60      88.44    28.66        0.4
Lena     EZW            80      115.1    27.52        0.3
Lena     EZW            100     130.0    26.99        0.24
Lena     JPEG           59.6    78.90    29.16        0.403
Lena     JPEG           78.0    114.9    27.53        0.308
Lena     JPEG           96.9    169.0    25.85        0.248
Baboon   EZW            40      480.5    21.31        0.6
Baboon   EZW            60      568.1    20.59        0.4
Baboon   EZW            80      620.2    20.21        0.3
Baboon   JPEG           40.8    420.2    21.90        0.588
Baboon   JPEG           61.0    556.6    20.68        0.393
Baboon   JPEG           79.0    684.1    19.78        0.304
Figure 6.2 EZW and JPEG Compression Plots
(a) Threshold vs. MSE Plot for EZW Compression (reconstructed MSE plotted against initial thresholds from 6000 to 12000)
(b) EZW and JPEG Rate-Distortion Curves (reconstructed MSE plotted against compression ratio from 0 to 120, with one curve for JPEG compression and one for EZW compression)
CHAPTER VII
SUMMARY AND CONCLUSIONS
Two lossless compression programs and one lossy image compression program
have been implemented and described in this thesis. The results of Huffman coding were
compared with arithmetic coding with one example. Arithmetic coding will consistently
compress better than Huffman coding without losing any information. This arithmetic
coding characteristic occurs because arithmetic coding does not have the same theoretical
upper compression bound as variable length coders like Huffman coding.
After the 2D DWT was described, this transform and arithmetic coding were
applied to the new EZW quantization algorithm. EZW compression is a low bit-rate
compression algorithm, and the results demonstrated EZW's better performance at low
bit-rates over the standard JPEG compression for gray-scale images. The advantage of
EZW over JPEG in color image compression was less obvious. The major drawback in the method used to compress the color images with EZW was that the redundancy between color planes was neither eliminated nor reduced. The RGB color planes
were simply separated, compressed equally, reconstructed, and recombined. The RGB
color planes are correlated with one another. A transformation to a more uncorrelated
color space, where the color plane with less visual information is quantized more coarsely
than the other planes, should be attempted. There has been much research in optimal
color image quantization [15]. More research is needed into how EZW can be applied to
a better color image compression scheme.
The EZW program itself can also be improved. With more experiments with the initial threshold, the best initial threshold may be predicted by the EZW program so that the user input of an initial threshold can be eliminated. This result would produce a more
user friendly program similar (in ease of use) to the JPEG compression program.
REFERENCES

[1] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, New York: Addison-Wesley Publishing Company, 1992, pp 307-411.

[2] Huffman, D. A., "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, vol. 40, no. 10, 1952, pp 1098-1101.

[3] Stephen G. Kochan, Programming in ANSI C, Indianapolis, Ind.: Sams Publishing, 1994.

[4] Ian Witten, Radford Neal, and John Cleary, "Arithmetic Coding for Data Compression," Communications of the ACM, vol. 30, no. 6, June 1987, pp 520-540.

[5] D. Gabor, "Theory of Communication," J. Inst. Elect. Eng. (London), vol. 93, no. 3, 1946, pp 429-457.

[6] H. J. Barnard, Image and Video Coding Using a Wavelet Decomposition, 1994, pp 7-27.

[7] Ingrid Daubechies, Ten Lectures on Wavelets, Philadelphia, Pa.: Capital City Press, 1992, p. 2.

[8] Ingrid Daubechies, "The Wavelet Transform, Time-Frequency Localization and Signal Analysis," IEEE Transactions on Information Theory, vol. 36, no. 5, September 1990, pp 961-1005.

[9] Stephane G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, July 1989, pp 674-693.

[10] Ingrid Daubechies, "Orthonormal Bases of Compactly Supported Wavelets," Communications on Pure and Applied Mathematics, vol. 41, 1988, pp 909-996.

[11] Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes in C, Cambridge: Cambridge University Press, 1992, pp 591-606.

[12] Jerome M. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Transactions on Signal Processing, vol. 41, no. 12, December 1993, pp 3445-3462.

[13] E. H. Adelson, E. Simoncelli, and R. Hingorani, "Orthogonal Pyramid Transforms for Image Coding," Proc. SPIE, vol. 845, Cambridge, MA, Oct. 1987, pp 50-58.

[14] S. Mitra and S. Pemmaraju, "Adaptive Vector Quantization Using an ART-based Neuro-fuzzy Clustering Algorithm," invited paper presented at the International Conference on Neural Networks, Washington, D.C., June 3-6, 1996.

[15] Jean-Pierre Braquelaire and Luc Brun, "Comparison and Optimization of Methods of Color Image Quantization," IEEE Transactions on Image Processing, vol. 6, no. 7, July 1997, pp 1048-1052.
APPENDIX A: ANSI C IMPLEMENTATION OF
HUFFMAN ENCODING AND DECODING
huffman.c
/* functions used in the Huffman algorithm */
#include <stdio.h>

/* Huffman structure */
struct huffman {
    short int symbol;
    int occurrence;
    short int code_length;
    int code_number;
    struct huffman *next;
    struct huffman *child;
};

extern FILE *out;
static int code_len[511], code_num[511];

/* code each symbol */
void code_symbol(int symbol) {
    int n, len = code_len[symbol];
    void output_bit(int bit);
    for (n = 1; n <= len; ++n)
        output_bit(code_num[symbol] & (1 << (len - n)));
}
/* The "codebook" ftinction initializes "codelen" and "codenum" and sends the code book to a file. The
"codejen" and codenum" arrays are used by the symbol coder to instantly look up the code length and
code number for any symbol. Searching through a linked list for the code length and code number would
be too time consuming. However, the symbol values must be translated into array indexes which are
positive integers. Since the symbol values range from -255 to 255, adding 255 to the symbol values would
translate them to a range from 0 to 510. The symbol values and code lengths were written as short integers
(2 bytes) instead of integers (4 bytes) to save file space. */
void code_book(struct huffrnan *listptr, int sym_count) {
int num;
short int sym,len;
fwrite(&sym count,sizeof(int), 1 ,out):
while(listptr!=NULL) {
sym=listptr->symbol,num=listptr->code_number;
len=listptt-->code_length;
code_len[sym+255]=len;
code_num[sym+255]=num;
fwrite(&sym,sizeof(short int), 1 ,out);
54
fwrite(&len,sizeof(short int), 1 ,out);
fwrite(&num,sizeof(int), 1 ,out);
listptr=listptr->next;
}
}
/* insert structure into sorted list */
void insert_occurrence(struct huffman list[], int list_num) {
    struct huffman *listptr, *listpp;
    listptr = list[0].next, listpp = list;
    while (listptr->occurrence < list[list_num].occurrence && listptr->next != NULL)
        listpp = listptr, listptr = listptr->next;
    if (listptr->occurrence >= list[list_num].occurrence)
        listpp->next = &list[list_num], list[list_num].next = listptr;
    else
        listptr->next = &list[list_num], list[list_num].next = NULL;
}

/* sort structures with respect to occurrence */
void sort_occurrence(struct huffman list[], int sym_count) {
    int n;
    list[0].next = &list[1], list[1].next = NULL;
    for (n = 2; n <= sym_count; ++n)
        insert_occurrence(list, n);
}

/* perform source reduction */
void create_parent(struct huffman list[], int sym_count) {
    struct huffman *listptr = list[0].next;
    list[sym_count].child = list[0].next;
    list[sym_count].occurrence = listptr->occurrence, listptr = listptr->next;
    list[sym_count].occurrence += listptr->occurrence, list[0].next = listptr->next;
    insert_occurrence(list, sym_count);
}

/* initialize structures */
void initialize_list(struct huffman *listptr) {
    while (listptr != NULL) {
        listptr->child = NULL, listptr->code_number = 0;
        listptr = listptr->next;
    }
}
/* The "assigncodes" ftinction recursively retraces the source reduction process and assigns codes to ever}
symbol created. Since the original symbols which were created by the histogram process have the "child"
pointer pointing to "NULL," the "assigncodes" function stops iterating when the "NULL" pointer is
reached, "level" represents the number of bits in each code. With each recursion, "level" is incremented.
The "code_number" value is passed back fr-om the parent to the to the tvvo symbols which were source
reduced to form the parent. These two symbols left shift "codenumber," and add a 1 or 0 to the least
significant bit. This process is continued until "codelength" and "codenumber" have been assigned to all
of the original symbols. */
55
void assign_codes(struct hufftnan *listptr,int level) {
if(listptr=NULLj
return;
(listptr—'next)->code_number=listptr->code_number;
listptr->code_length=level,listptr->code_number«= 1;
if (listptr->child!-NULL)
(listptr->child)->code_number=listptr->code_number:
assign_codes(listptr->child,level-1);
listptr=listptr->next,listptr->code_length=level;
listptr->code_number«=l ,listptr->code_number|=l;
if(listptr->child!=NULL)
(listptr->child)->code_number=listptr->code_number;
assign_codes(listptr->child,level+1);
}
/* The Huffman codes are generated from the histogram data and the code book is stored in a file. The
"huffman_code" function executes the Huffman coding process. First, the linked list of structures is sorted
in ascending order with respect to "occurrence" with the "sort_occurrence" function. The first two symbols
in the list will have the least probabilities of occurrence and the least "occurrence" values. These two
symbols will be source reduced with the "create_parent" function. The symbol count is incremented so
"create_parent" can create a new symbol structure. The list count, "list_count," is decremented because the
size of the source reduced list will decrease by one. The new symbol's "occurrence" value will be the sum
of the "occurrence" values of the symbols which were source reduced. The two symbols which are source
reduced are taken out of the sorted list, and the new symbol is inserted so that the list remains sorted. The
"child" pointer of the new symbol structure points to the first of the two structures which were removed.
This process is iterated until only two symbols remain in the list. Next, the code book is sent to the output
file. The original symbols are re-sorted, so that an iterative loop can process them in a linked list. */
void huffman_code(struct huffman list[], int sym_count) {
    int list_count = sym_count, n = sym_count;
    sort_occurrence(list, n);
    initialize_list(list[0].next);
    while (list_count > 2)
        ++sym_count, create_parent(list, sym_count), --list_count;
    assign_codes(list[0].next, 1);
    sort_occurrence(list, n);
    code_book(list[0].next, n);
}
histogram.c

/* create a histogram from file data */
#include <stdio.h>

struct huffman {
    short int symbol;
    int occurrence;
    short int code_length;
    int code_number;
    struct huffman *next;
    struct huffman *child;
};

/* create a histogram with a linked list of structures */
int hist_input(struct huffman list[], int sym, int n) {
    struct huffman *listptr, *listpp;
    if (n == 0) {
        list[1].symbol = sym, list[1].next = NULL, list[1].occurrence = 1;
        list[0].next = &list[1], ++n;
        return(n);
    }
    listptr = list[0].next, listpp = list;
    while (listptr->symbol < sym && listptr->next != NULL)
        listpp = listptr, listptr = listptr->next;
    if (listptr->symbol == sym) {
        ++listptr->occurrence;
        return(n);
    }
    else if (listptr->symbol > sym) {
        ++n, list[n].symbol = sym, list[n].occurrence = 1, list[n].next = listptr;
        listpp->next = &list[n];
        return(n);
    }
    else {
        ++n, listptr->next = &list[n], list[n].symbol = sym, list[n].occurrence = 1;
        list[n].next = NULL;
        return(n);
    }
}

/* The "hist_output" function sends the histogram data to an output file in the sorted order. The first
structure element contains the pointer to the first structure element with symbol data, so the first structure
element with symbol data can be any structure in the array. The final structure in the linked list has the
"next" pointer pointing to "NULL," so an iterative loop can find the end of the list. The histogram data
consists of the "symbol" value followed by the "occurrence" value. Each data pair is sent to the output file
in floating point format. The Huffman program was compiled and run on a SPARC20 Sun machine, and
the histogram data was read and plotted with MATLAB on a PC. The floating point format was the only
format which the MATLAB program could read correctly. */
void hist_output(FILE *img_dest, struct huffman *listptr) {
    float n[2];
    while (listptr != NULL) {
        n[0] = (float)(listptr->symbol), n[1] = (float)(listptr->occurrence);
        fwrite(n, sizeof(float), 2, img_dest), listptr = listptr->next;
    }
}
mhisthuff.c

/* contains the main function for Huffman coding an 8-bit image */
#include <stdio.h>
#include <stdlib.h>

struct huffman {
    short int symbol;
    int occurrence;
    short int code_length;
    int code_number;
    struct huffman *next;
    struct huffman *child;
};

FILE *out;

/* The "main" function checks the validity of the command line arguments and controls the flow of the
program. This function receives pixel information from an input image and codes the difference between
each pixel and the pixel immediately preceding it using the Huffman algorithm. The program compresses
images stored in the raw data format with 8 or 24 BPP. The program compresses a 24-bit color image by
coding the three color planes separately. The image must be stored in the planar, non-interlaced format
(i.e., [RRRRR... GGGGG... BBBBB...]). The "-P" switch controls which color plane will be coded. For
example, the command line option "-P 2" would compress the second color plane. The default value of "p"
is 1, so the "-P" switch can be omitted for gray scale images. The "-H" and "-W" switches indicate the
height and width of the image, respectively. The default value of the height and width is 512 since 512 is
the common dimension of standard images. The string count, "str_count," is the total number of symbols
to encode. The symbol count, "sym_count," is the number of unique symbols. The string count is the area
of the image, and the symbol count is determined by the "hist_input" function. */
main(int argc, char *argv[]) {
    FILE *in, *out2;
    int n = argc, p = 1, w = 512, h = 512, sym_count = 0, str_count, current, past = 0;
    struct huffman *big_list;
    void start_outputing_bits(void);
    void done_outputing_bits(void);
    void code_symbol(int symbol);
    void huffman_code(struct huffman list[], int sym_count);
    int hist_input(struct huffman list[], int sym, int n);
    void hist_output(FILE *img_dest, struct huffman *listptr);

    if (argc == 1) {    /* if no arguments print usage */
        printf("usage: huff_code In Out1 Out2 [-W nnn] [-H nnn] [-P n]\n");
        exit(0);
    }
    while (--n) {       /* get switches */
        if (argv[n][0] == '-') {
            if (argv[n][1] == 'W')
                w = atoi(argv[n+1]);
            else if (argv[n][1] == 'H')
                h = atoi(argv[n+1]);
            else if (argv[n][1] == 'P')
                p = atoi(argv[n+1]);
            else {
                printf("Invalid argument\n");
                printf("usage: huff_code In Out1 Out2 [-W nnn] [-H nnn] [-P n]\n");
                exit(0);
            }
        }
    }
    if (!(in = fopen(argv[1], "r"))) {
        printf("Unable to open input image\n");
        exit(0);
    }
    if (!(out = fopen(argv[2], "w"))) {
        printf("Unable to open first output image\n");
        exit(0);
    }
    if (!(out2 = fopen(argv[3], "w"))) {
        printf("Unable to open second output image\n");
        exit(0);
    }
    if (!(big_list = (struct huffman *)malloc(1000 * sizeof(struct huffman)))) {
        printf("Not enough memory for big_list\n");
        exit(0);
    }
    /* Position the file pointer at the correct color plane with the "p" value. */
    fseek(in, (p-1) * w * h, SEEK_SET);
    str_count = w * h;    /* number of symbols to encode */
    fwrite(&str_count, sizeof(int), 1, out);
    /* After the file pointers and other variables have been initialized, the histogram data is created.
       "current" represents the current pixel value and "past" represents the previous pixel value. The second
       argument passed to the "hist_input" function is the symbol to be coded, which is the difference between
       the current and previous pixel values. "big_list" is the pointer to the list of "huffman" structures.
       "hist_input" returns the updated value of the symbol count. If the symbol passed to "hist_input" is
       different from other symbols previously passed to the function, a new "huffman" structure is linked to
       "big_list" with the "next" pointer, the symbol count is incremented, and the "occurrence" structure
       value is initialized to one. If the symbol passed to "hist_input" is the same as a previous symbol, the
       "occurrence" value of the symbol structure is incremented. While the histogram is being created, the
       structures are sorted with respect to the "symbol" values. */
    printf("creating histogram. . .\n");
    for (n = 1; n <= str_count; ++n) {
        current = getc(in);
        sym_count = hist_input(big_list, current - past, sym_count);
        past = current;
    }
    hist_output(out2, big_list[0].next);    /* send histogram data to file */
    printf("creating huffman code book. . .\n");
    huffman_code(big_list, sym_count);
    fseek(in, (p-1) * w * h, SEEK_SET);
    /* After the Huffman algorithm generates the symbol codes, the symbol string from the input file must
       be coded into the compressed file. The symbols are coded one bit at a time with the "output_bit"
       function. Similarly, when the symbols are decoded, the decoder reads the compressed file one bit at a
       time with the "input_bit" function in the same order the bits were sent to the compressed file.
       "start_outputing_bits" and "done_outputing_bits" begin and end the bit output process, respectively. The
       argument sent to "code_symbol" is the symbol value translated to an array index. The "code_symbol"
       function sends the code number of each symbol to the compressed file from the most significant bit
       (MSB) to the least significant bit (LSB), because Huffman coding is uniquely decodable only from left
       to right. */
    printf("sending huffman code string. . .\n");
    past = 0, start_outputing_bits();
    for (n = 0; n < str_count; ++n) {
        current = getc(in);
        code_symbol(current - past + 255);
        past = current;
    }
    done_outputing_bits();
    exit(0);
}
bitoutput.c

/* output bits into file */
#include <stdio.h>

extern FILE *out;
static int buffer;
static int bits_to_go;

void start_outputing_bits(void) {
    buffer = 0;
    bits_to_go = 8;
}

void output_bit(int bit) {
    buffer >>= 1;
    if (bit) buffer |= 0x80;
    bits_to_go -= 1;
    if (bits_to_go == 0) {
        putc(buffer, out);
        bits_to_go = 8;
    }
}

void done_outputing_bits(void) {
    putc(buffer >> bits_to_go, out);
}
mhuffdec.c

/* decodes the huffman encoded file */
#include <stdio.h>
#include <stdlib.h>

FILE *in;
static int code_len[511], code_num[511];
static int sym_vector[511], sym_count;

/* In the "decode_symbol" function, each bit read from the compressed file is counted by "n" and left
shifted into "decode_num." For every bit read, the code length and code number for all of the unique
symbols are tested for equality with "n" and "decode_num," respectively. The iterative loop cycles through
the array indices. Since not all array indices of "code_len" and "code_num" correspond to unique
symbols, the search should be limited to only the symbol indices. The "sym_vector" array contains the
index values corresponding to the unique symbols in order from most probable symbol to least probable
symbol. The iterative loop limits the search of symbols by cycling through the values of the "sym_vector"
array. Even with this search limitation, the decoding process for the 512x512 Lena image required about
one and a half minutes to execute running on a 75MHz SPARC20 Sun machine. */
int decode_symbol(void) {
    int n, m, k, decode_num = 0;
    int input_bit(void);
    for (n = 1; ; ++n) {
        decode_num <<= 1;
        decode_num |= input_bit();
        for (m = 0; m < sym_count; ++m) {
            k = sym_vector[m];
            if (n == code_len[k] && decode_num == code_num[k])
                return(k);
        }
    }
}

/* The "main" function decodes the compressed file and outputs the reconstructed image. The code book is
read from the file, and the symbols are decoded. The information from the code book is stored in three
array vectors: "code_len," "code_num," and "sym_vector." */
main(int argc, char *argv[]) {
    FILE *out;
    int n = argc, number, str_count, current, past = 0;
    short int symbol, length;
    void start_inputing_bits();

    /* if no arguments print usage */
    if (argc == 1) {
        printf("usage: huff_decode In Out\n");
        exit(0);
    }
    /* open input and output files */
    if (!(in = fopen(argv[1], "r"))) {
        printf("Unable to open first input image\n");
        exit(0);
    }
    if (!(out = fopen(argv[2], "w"))) {
        printf("Unable to open output image\n");
        exit(0);
    }
    start_inputing_bits();
    printf("decoding huffman code string. . .\n");
    for (n = 0; n < 511; ++n) code_len[n] = 0, code_num[n] = 0;
    fread(&str_count, sizeof(int), 1, in);    /* the number of symbols encoded */
    fread(&sym_count, sizeof(int), 1, in);    /* the number of unique symbols */
    /* read the code book */
    for (n = 0; n < sym_count; ++n) {
        fread(&symbol, sizeof(short int), 1, in);
        fread(&length, sizeof(short int), 1, in);
        fread(&number, sizeof(int), 1, in);
        code_len[symbol + 255] = length;
        code_num[symbol + 255] = number;
        sym_vector[sym_count - n - 1] = symbol + 255;
    }
    /* read the encoded symbols */
    for (n = 1; n <= str_count; ++n) {
        symbol = decode_symbol();
        current = symbol + past - 255;
        putc(current, out);
        past = current;
    }
    exit(0);
}
inputbits.c

/* input bits from file */
#include <stdio.h>
#include <stdlib.h>

extern FILE *in;
static int buffer;
static int bits_to_go;
static int garbage_bits;

void start_inputing_bits(void) {
    bits_to_go = 0;
    garbage_bits = 0;
}

int input_bit(void) {
    int t;
    if (bits_to_go == 0) {
        buffer = getc(in);
        if (buffer == EOF) {
            garbage_bits += 1;
            if (garbage_bits > 14) {
                printf("Bad input file\n");
                exit(0);
            }
        }
        bits_to_go = 8;
    }
    t = buffer & 1;
    buffer >>= 1;
    bits_to_go -= 1;
    return t;
}
APPENDIX B: ANSI C IMPLEMENTATION OF
ARITHMETIC ENCODING AND DECODING [4]
arithmetic_coding.h

/* Declarations used for arithmetic encoding and decoding */
#define Code_value_bits 16                            /* Number of bits in a code value */
typedef long code_value;                              /* Type of an arithmetic code value */
#define Top_value (((long)1 << Code_value_bits) - 1)  /* Largest code value */
#define First_qtr (Top_value/4 + 1)                   /* Point after first quarter */
#define Half (2 * First_qtr)                          /* Point after first half */
#define Third_qtr (3 * First_qtr)                     /* Point after third quarter */
m_encode.c

/* encodes an 8-bit image using arithmetic coding */
#include <stdio.h>
#include <stdlib.h>
#define NS 511

FILE *out;
int char_to_index[NS], index_to_char[NS+1];

main(int argc, char *argv[]) {
    FILE *in;
    int n = argc, w = 256, h = 256, sym_count, ch, symbol;
    int cum_freq[NS], past, current;
    int data[100];
    void start_model(int cum_freq[]);
    void start_outputing_bits(void);
    void start_encoding(void);
    void encode_symbol(int symbol, int cum_freq[]);
    void update_model(int symbol, int cum_freq[]);
    void done_outputing_bits(void);
    void done_encoding(void);

    /* if no arguments print usage */
    if (argc == 1) {
        printf("usage: arith_code In Out [-W nnn] [-H nnn]\n");
        exit(0);
    }
    /* get args */
    while (--n) {
        if (argv[n][0] == '-') {    /* then this is a switch */
            if (argv[n][1] == 'W')
                w = atoi(argv[n+1]);
            else if (argv[n][1] == 'H')
                h = atoi(argv[n+1]);
            else {
                printf("Invalid argument\n");
                printf("usage: arith_code In Out [-W nnn] [-H nnn]\n");
                exit(0);
            }
        }
    }
    /* open input and output files */
    if (!(in = fopen(argv[1], "r"))) {
        printf("Unable to open first input image\n");
        exit(0);
    }
    if (!(out = fopen(argv[2], "w"))) {
        printf("Unable to open output image\n");
        exit(0);
    }
    start_model(cum_freq);          /* set up other modules */
    start_outputing_bits();
    start_encoding(), past = 0;
    /* The image pixels are encoded in the following for loop. The difference between two adjacent pixels
       is stored in "ch." This difference is translated up 255 so that the range of "ch" becomes 0-510. This
       translation is necessary so that the source symbols become valid array indices which are non-negative
       integers. The number of symbols, "NS," is defined as 511. This value is used to set the maximum
       array indexes. An original symbol is translated into an index symbol. Next, the index symbol is
       encoded, and the statistical model is updated. The "past" value is set to "current" and the process is
       continued for all the image pixels. The "done_encoding" function assures that the final value encoded
       is within the final range so that the last symbol can be decoded. */
    for (n = 0; n < (w*h); ++n) {
        current = getc(in);                 /* read the next character */
        ch = current - past + 255;          /* translate to a non-negative integer */
        symbol = char_to_index[ch];         /* translate to an index */
        encode_symbol(symbol, cum_freq);    /* encode that symbol */
        update_model(symbol, cum_freq);     /* update the model */
        past = current;
    }
    done_encoding();
    done_outputing_bits();                  /* send the last few bits */
    exit(0);
}
arithmetic_encode.c

/* arithmetic encoding algorithm */
#include <stdio.h>
#include "arithmetic_coding.h"

/* current state of the encoding */
static code_value low, high;       /* ends of the current code region */
static long bits_to_follow;        /* number of opposite bits to output after the next bit */

/* start encoding a stream of symbols */
void start_encoding(void) {
    low = 0;                       /* full code range */
    high = Top_value;
    bits_to_follow = 0;            /* no bits to follow next */
}

/* output bits plus following opposite bits */
static void bit_plus_follow(int bit) {
    void output_bit(int bit);
    output_bit(bit);               /* output the bit */
    while (bits_to_follow > 0) {
        output_bit(!bit);          /* output bits_to_follow opposite bits */
        bits_to_follow -= 1;       /* and set bits_to_follow to zero */
    }
}

/* The binary implementation of arithmetic coding subdivides one main range of [0, 2^16) for 16-bit
values. When the range is subdivided and narrowed by selecting a symbol to encode, a few most
significant bits of the narrowed range can be determined. For example, if the narrowed range lies in the
lower half of [0, 2^16), any 16-bit value in this narrowed range will have a MSB of 0. This 0 bit can be
sent to the compressed output file. The high and low boundaries of the range which have a common 0
MSB can be left shifted one bit, which effectively doubles their value and also doubles the range. There
are other means to determine the most significant bits of a range of values. However, as each MSB is
determined, the range will effectively double as in the above example. */

/* encode a symbol */
void encode_symbol(int symbol, int cum_freq[]) {
    long range;                    /* size of the current code range */
    range = (long)(high - low) + 1;
    /* The following two lines demonstrate the method used to subdivide the range. The array elements
       "cum_freq[symbol-1]" and "cum_freq[symbol]" will differ by at least 1. The new range ("low"
       subtracted from "high") will be a fraction of the old range calculated in the previous line. The size of
       the new range will be proportional to the frequency count of "symbol" ("cum_freq[symbol]" subtracted
       from "cum_freq[symbol-1]"). This frequency count is, of course, directly proportional to the
       probability of "symbol." The equations in the following two lines should be restricted from overflow
       and underflow conditions. An overflow condition will occur if the product, "range*cum_freq[symbol-1],"
       is greater than 2^31-1 since the operation is signed 32-bit integer multiplication. The maximum
       "range" value is 2^16-1, and the maximum "cum_freq[symbol-1]" value is "cum_freq[0]" which is at
       most "Max_frequency." With this condition, "Max_frequency" is limited to 2^15. An underflow
       condition will occur if "high" and "low" become the same integers. In this case, encoding and decoding
       will become impossible. If "range" is too small or "cum_freq[0]" is too large, an underflow will occur.
       In the following for loop, "range" is limited to a minimum of 2^14 which is a quarter of the maximum
       range. If "cum_freq[symbol-1]" and "cum_freq[symbol]" have a difference of 1, "cum_freq[0]" could
       be at most 2^14 in order for "high" to be greater than "low." Therefore, in order to avoid both the
       underflow and overflow conditions, "Max_frequency" is set at 2^14-1. */
    high = low + (range * cum_freq[symbol-1]) / cum_freq[0] - 1;  /* Narrow the code region to */
    low = low + (range * cum_freq[symbol]) / cum_freq[0];         /* that allotted to this symbol. */
    /* After the range is narrowed in the preceding two lines, some MSB's must be determined so that the
       range can expand above the minimum range. An example of determining a MSB so that the range can
       be doubled was discussed previously. A MSB from a range lying in the upper or lower half of the 16-bit
       region is simple to obtain. A 0 bit is sent for a range lying in the lower half and a 1 bit is sent for a
       range lying in the upper half. In both cases, the range is scaled by doubling the "high" and "low"
       values. In the case of the range lying in the upper half, half of the region must be subtracted from "low"
       and "high" before they can be scaled. Sometimes the range may not lie in either the upper or lower half
       but still be smaller than the minimum range needed to prevent underflow. In that case, "low" will lie in
       the second quarter and "high" will lie in the third quarter of the 16-bit region. The range is expanded by
       subtracting one quarter of the region from "low" and "high" and doubling their values. For each
       consecutive occurrence of this case, "bits_to_follow" is incremented. When the next MSB is found
       from a range lying in the upper or lower half of the region, a number of opposite bits from the current
       MSB must be sent. The number of opposite bits is equivalent to "bits_to_follow." For example,
       suppose "low" is 0111101111111111 (31743) and "high" is 1000010000000000 (33792). The five
       MSB's of any number in this range ("high" > x > "low") can be determined if the first MSB is known
       beforehand. The four MSB's after the first MSB are the binary opposite of the first MSB. These "high"
       and "low" values would cause the third condition to occur four consecutive times so that
       "bits_to_follow" would equal 4. After the next MSB is sent, four opposite bits would follow. Each of
       the three conditions causes the range to double. Once the range is large enough and the MSB's which
       can be determined are determined, the "encode_symbol" function is exited. */
    for (;;) {                               /* loop to output bits */
        if (high < Half) {
            bit_plus_follow(0);              /* output 0 if in low half */
        }
        else if (low >= Half) {              /* output 1 if in high half */
            bit_plus_follow(1);
            low -= Half;                     /* subtract offset to top */
            high -= Half;
        }
        else if (low >= First_qtr && high < Third_qtr) {  /* if in middle half */
            bits_to_follow += 1;             /* output an opposite bit later */
            low -= First_qtr;                /* subtract offset to middle */
            high -= First_qtr;
        }
        else break;                          /* otherwise exit loop */
        low = 2 * low;                       /* scale up code range */
        high = 2 * high + 1;
    }
}

/* finish encoding the stream */
void done_encoding(void) {
    bits_to_follow += 1;                     /* Output two bits that select the quarter */
    if (low < First_qtr) bit_plus_follow(0); /* that the current code range contains. */
    else bit_plus_follow(1);
}
m_decode.c

/* decodes an 8-bit image */
#include <stdio.h>
#include <stdlib.h>
#define NS 511

FILE *in;
int index_to_char[NS+1], char_to_index[NS];

main(int argc, char *argv[]) {
    FILE *out;
    int n = argc, w = 256, h = 256, sym_count, ch, symbol;
    int cum_freq[NS], past, current;
    void start_model(int cum_freq[]);
    void start_inputing_bits(void);
    void start_decoding(void);
    int decode_symbol(int cum_freq[]);
    void update_model(int symbol, int cum_freq[]);

    /* if no arguments print usage */
    if (argc == 1) {
        printf("usage: arith_decode In Out [-W nnn] [-H nnn]\n");
        exit(0);
    }
    /* get args */
    while (--n) {
        if (argv[n][0] == '-') {    /* then this is a switch */
            if (argv[n][1] == 'W')
                w = atoi(argv[n+1]);
            else if (argv[n][1] == 'H')
                h = atoi(argv[n+1]);
            else {
                printf("Invalid argument\n");
                printf("usage: arith_decode In Out [-W nnn] [-H nnn]\n");
                exit(0);
            }
        }
    }
    /* open input and output files */
    if (!(in = fopen(argv[1], "r"))) {
        printf("Unable to open input image\n");
        exit(0);
    }
    if (!(out = fopen(argv[2], "w"))) {
        printf("Unable to open output image\n");
        exit(0);
    }
    start_model(cum_freq);          /* set up other modules */
    start_inputing_bits();
    start_decoding(), past = 0;
    /* The decoding process is very similar to the encoding process. The following for loop decodes the
       pixel values. An index symbol is decoded and translated back to an original symbol. The "past" value
       is set to "current," and the statistical model is updated. */
    for (n = 0; n < (w*h); ++n) {             /* loop through pixel values */
        symbol = decode_symbol(cum_freq);     /* decode next symbol */
        ch = index_to_char[symbol];           /* translate to a difference value */
        current = ch + past - 255;            /* translate to a pixel value */
        putc(current, out);                   /* write out the pixel value */
        past = current;
        update_model(symbol, cum_freq);       /* update the model */
    }
    exit(0);
}
arithmetic_decode.c

/* arithmetic decoding algorithm */
#include <stdio.h>
#include "arithmetic_coding.h"

/* current state of the decoding */
static code_value value;           /* currently-seen code value */
static code_value low, high;       /* ends of current code region */

/* The "start_decoding" function fills "value" with the first sixteen bits which were output by the
arithmetic coder. This value will be in the range of the first symbol encoded. */
void start_decoding(void) {
    int i;
    int input_bit(void);
    value = 0;
    for (i = 1; i <= Code_value_bits; i++) {  /* input bits to fill the code value */
        value = 2 * value + input_bit();
    }
    low = 0;                                  /* full code range */
    high = Top_value;
}

/* decode the next symbol */
int decode_symbol(int cum_freq[]) {
    long range;                    /* size of current code region */
    int cum;                       /* cumulative frequency calculated */
    int symbol;                    /* symbol decoded */
    int input_bit(void);
    range = (long)(high - low) + 1;
    /* The following line translates "value" into a value in the range of the "cum_freq" array. */
    cum = (((long)(value - low) + 1) * cum_freq[0] - 1) / range;  /* find cum freq for value */
    /* The symbol which corresponds to the cumulative frequency value is found in the next line. */
    for (symbol = 1; cum_freq[symbol] > cum; symbol++);           /* then find symbol */
    /* With the symbol decoded, the range is narrowed and the MSB's are discarded similar to the coding
       process. Each time the range is doubled, the MSB of "value" is discarded and a least significant bit
       (LSB) is shifted into "value". The value of "value" will remain in the range of the next symbol to be
       decoded. The "input_bit" function simply receives bits in the same order in which "output_bit" sent
       them to a file. The number of operations and execution time for the decoding process is comparable to
       the coding process. */
    high = low + (range * cum_freq[symbol-1]) / cum_freq[0] - 1;  /* Narrow the code region */
    low = low + (range * cum_freq[symbol]) / cum_freq[0];         /* to that allotted to this symbol. */
    for (;;) {                                /* loop to get rid of bits */
        if (high < Half) {
            /* nothing: expand low half */
        }
        else if (low >= Half) {               /* expand high half */
            value -= Half;                    /* subtract offset to top */
            low -= Half;
            high -= Half;
        }
        else if (low >= First_qtr && high < Third_qtr) {  /* expand middle half */
            value -= First_qtr;               /* subtract offset to middle */
            low -= First_qtr;
            high -= First_qtr;
        }
        else break;                           /* otherwise exit loop */
        low = 2 * low;                        /* scale up code range */
        high = 2 * high + 1;
        value = 2 * value + input_bit();      /* move in next input bit */
    }
    return symbol;
}
bitinput.c

/* bit input routines */
#include <stdio.h>
#include <stdlib.h>
#include "arithmetic_coding.h"

extern FILE *in;

/* the bit buffer */
static int buffer;                 /* bits waiting to be input */
static int bits_to_go;             /* number of bits still in buffer */
static int garbage_bits;           /* number of bits past end-of-file */

/* initialize bit input */
void start_inputing_bits(void) {
    bits_to_go = 0;                /* buffer starts out with no bits in it */
    garbage_bits = 0;
}

/* input a bit */
int input_bit(void) {
    int t;
    if (bits_to_go == 0) {         /* read the next byte if no bits are left in buffer */
        buffer = getc(in);
        if (buffer == EOF) {
            garbage_bits += 1;     /* return arbitrary bits after eof, */
            if (garbage_bits > Code_value_bits - 2) {  /* but check for too many such */
                fprintf(stderr, "Bad input file\n");
                exit(0);
            }
        }
        bits_to_go = 8;
    }
    t = buffer & 1;                /* return the next bit from the bottom of the byte */
    buffer >>= 1;
    bits_to_go -= 1;
    return t;
}
bitoutput.c

/* bit output routines */
#include <stdio.h>

extern FILE *out;

/* the bit buffer */
static int buffer;                 /* bits buffered for output */
static int bits_to_go;             /* number of bits free in buffer */

/* initialize for bit output */
void start_outputing_bits(void) {
    buffer = 0;                    /* buffer is empty to start with */
    bits_to_go = 8;
}

/* output a bit */
void output_bit(int bit) {
    buffer >>= 1;                  /* put bit in top of buffer */
    if (bit) buffer |= 0x80;
    bits_to_go -= 1;
    if (bits_to_go == 0) {         /* output buffer if it is now full */
        putc(buffer, out);
        bits_to_go = 8;
    }
}

/* flush out the last bits */
void done_outputing_bits(void) {
    putc(buffer >> bits_to_go, out);
}
adaptive_model.c

#define Max_frequency 16383
#define NS 511

static int freq[NS+1];                  /* symbol frequencies */
extern int index_to_char[NS+1], char_to_index[NS];

/* The ranges are subdivided and narrowed using the "cum_freq" array, which is closely related to the
symbol probabilities. Each symbol corresponds to a unique array index. The value of each array element
is the cumulation of the frequencies of occurrence (FOO) of the symbols indexed ahead of the current
symbol. Each element of the "freq" array with a symbol index contains the value of the FOO of the
corresponding symbol. All of the FOO are initialized to 1. No symbol is indexed by the zero element, so
this value can be set to 0. All of the symbols are indexed ahead of "cum_freq[0]," so this element
contains the cumulative FOO. The values of "cum_freq" are accumulated in reverse so that the zero
element of "cum_freq" can be used for normalization purposes.
   As each symbol is coded, "freq" and "cum_freq" are sorted with respect to FOO in descending order.
The arrays are sorted so that the symbols can be decoded more quickly and efficiently. When the arrays
are sorted, the indices corresponding to the symbols change. The program must keep track of the indices
to which the symbols correspond. Two arrays, "char_to_index" and "index_to_char," provide this
function. This program was initially written for text compression, so the source symbols are sometimes
referred to as characters and abbreviated "char." These two arrays are initialized in logical ascending
order. The two arrays remain invertible. In other words, if a character is translated to an index with
"char_to_index," the same index can be used to translate back to the original character with
"index_to_char." The indices are coded with the arithmetic algorithm, and these two arrays are used to
translate back and forth between the original and coded symbols. The original symbols must be
represented by non-negative integers to provide valid indices for "char_to_index." */

/* initialize the model */
void start_model(int cum_freq[]) {
    int i;
    for (i = 0; i < NS; i++) {          /* Set up tables that translate between symbol */
        char_to_index[i] = i + 1;       /* indexes and characters.                     */
        index_to_char[i+1] = i;
    }
    for (i = 0; i <= NS; i++) {         /* Set up initial frequency counts to be one   */
        freq[i] = 1;                    /* for all symbols.                            */
        cum_freq[i] = NS - i;
    }
    freq[0] = 0;                        /* freq[0] must not be the same as freq[1].    */
}
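The initial state left by start_model is easiest to see on a scaled-down alphabet. The following
standalone snippet, using a hypothetical NS of 4 rather than the value defined above, reproduces only the
initialization arithmetic:

/* Standalone illustration of start_model's initial state, for a
   hypothetical scaled-down alphabet of NS = 4 symbols. */
#include <stdio.h>

int main(void) {
    enum { NS = 4 };
    int freq[NS+1], cum_freq[NS+1], i;
    for (i = 0; i <= NS; i++) {
        freq[i] = 1;                    /* every symbol starts with count 1 */
        cum_freq[i] = NS - i;           /* accumulated in reverse           */
    }
    freq[0] = 0;                        /* index 0 is unused                */
    for (i = 0; i <= NS; i++)
        printf("freq[%d]=%d  cum_freq[%d]=%d\n", i, freq[i], i, cum_freq[i]);
    /* Prints freq = 0 1 1 1 1 and cum_freq = 4 3 2 1 0, so cum_freq[0]
       already holds the total count used for normalization. */
    return 0;
}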
/* The statistical model used by the arithmetic coder is basically a histogram of the symbol occurrences.
This model is labeled adaptive, and adapts to changing symbol probabilities, because the histogram is
updated with each symbol encoded. After a symbol is coded, the arrays "char_to_index," "index_to_char,"
"freq," and "cum_freq" are updated with the "update_model" function. The frequency count for the
symbol is incremented in the "freq" array. If the index for the symbol is changed by the sorting operation,
the "char_to_index" and "index_to_char" arrays are updated. If the symbol frequency counts become too
large, an overflow condition will occur in the coding arithmetic. To prevent this condition, the frequency
counts are limited by the "Max_frequency" value. If the cumulative frequency count reaches
"Max_frequency," all of the frequency counts are divided in half. */
/* update the model to account for a new symbol */
void update_model(int symbol, int cum_freq[]) {
    int i, cum;
    int ch_i, ch_symbol;
    if (cum_freq[0] == Max_frequency) {     /* See if frequency counts are at their maximum. */
        cum = 0;
        for (i = NS; i >= 0; i--) {         /* If so, halve all the counts (keeping them non-zero). */
            freq[i] = (freq[i]+1)/2;
            cum_freq[i] = cum;
            cum += freq[i];
        }
    }
    for (i = symbol; freq[i] == freq[i-1]; i--)
        ;                                   /* Find symbol's new index. */
    if (i < symbol) {
        ch_i = index_to_char[i];            /* Update the translation tables */
        ch_symbol = index_to_char[symbol];  /* if the symbol has moved.      */
        index_to_char[i] = ch_symbol;
        index_to_char[symbol] = ch_i;
        char_to_index[ch_i] = symbol;
        char_to_index[ch_symbol] = i;
    }
    freq[i] += 1;                           /* Increment the frequency count for the symbol and */
    while (i > 0) {                         /* update the cumulative frequencies.               */
        i -= 1;
        cum_freq[i] += 1;
    }
}
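The search loop that finds the symbol's new index is compact enough that a worked illustration may help.
The following standalone snippet uses made-up counts, not thesis data:

/* Standalone illustration of the index-swap search in update_model.
   With "freq" kept in descending order, walking left while neighbouring
   counts are equal finds the leftmost slot with the same count, so a
   single swap preserves the ordering without a full sort. */
#include <stdio.h>

int main(void) {
    int freq[6] = {0, 3, 2, 2, 2, 1};   /* freq[0] is the unused sentinel */
    int symbol = 4, i;
    for (i = symbol; freq[i] == freq[i-1]; i--)
        ;                               /* stops at i = 2                 */
    printf("symbol at index %d moves to index %d\n", symbol, i);
    freq[i] += 1;                       /* now {0, 3, 3, 2, 2, 1}: still  */
    return 0;                           /* sorted in descending order     */
}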
APPENDIX C: EZW EXAMPLE [12]
In this section, a simple example will be used to highlight the order of operations
used in the EZW algorithm. Only the string of symbols will be shown. The reader
interested in the details of adaptive arithmetic coding is referred to Chapter III. Consider
the simple 3-scale wavelet transform of an 8 x 8 image. The array of values is shown in
Figure C.1. Since the largest coefficient magnitude is 63, we can choose our initial
threshold to be anywhere in (31.5, 63]. Let T0 = 32. Table C.1 shows the processing on
the first dominant pass. The following comments refer to Table C.1:
1. The coefficient has magnitude 63, which is greater than the threshold 32, and is
positive, so a positive significant symbol is generated. After decoding this symbol, the decoder
knows the coefficient is in the interval [32, 64), whose center is 48.
2. Even though the coefficient -31 is insignificant with respect to the threshold 32, it
has a significant descendant two generations down in subband LH1 with magnitude 47.
Thus, the symbol for an isolated zero is generated.
3. The magnitude 23 is less than 32, and all of its descendants, which include (3, -12, -14,
8) in subband HH2 and all coefficients in subband HH1, are insignificant. A zerotree
symbol is generated, and no symbol will be generated for any coefficient in subbands
HH2 and HH1 during the current dominant pass.
4. The magnitude 10 is less than 32, and all descendants (-12, 7, 6, -1) also have
magnitudes less than 32. Thus a zerotree symbol is generated. Notice that this tree has a
violation of the "decaying spectrum" hypothesis, since a coefficient (-12) in subband HL1
has a magnitude greater than its parent (10). Nevertheless, the entire tree has magnitudes
less than the threshold 32, so it is still a zerotree.
5. The magnitude 14 is insignificant with respect to 32. Its children are (-1, 47, -3,
2). Since its child with magnitude 47 is significant, an isolated zero symbol is generated.
6. Note that no symbols were generated from subband HH2, which would ordinarily
precede subband HL1 in the scan. Also note that since subband HL1 has no descendants,
the entropy coding can resume using a 3-symbol alphabet where the IZ and ZTR symbols
are merged into the Z (zero) symbol.
7. The magnitude 47 is significant with respect to 32. Note that for future
dominant passes, this position will be replaced with the value 0, so that for the next
dominant pass at threshold 16, the parent of this coefficient, which has magnitude 14, can
be coded using a zerotree root symbol.
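The decisions made in comments 1 through 7 can be summarized by a single four-way test. The following
minimal sketch is illustrative only; the flag standing in for the descendant scan is an assumption, not
the thesis routine from Chapter V:

/* Minimal sketch of the dominant-pass decision for one coefficient.
   has_significant_descendant stands in for a scan of the coefficient's
   zerotree descendants; it is not the thesis routine. */
typedef enum { PS, NS_SYM, IZ, ZTR } DominantSymbol;

DominantSymbol classify(int coeff, int threshold,
                        int has_significant_descendant) {
    if (coeff >= threshold)         return PS;     /* e.g.,  63 at T = 32 */
    if (coeff <= -threshold)        return NS_SYM; /* e.g., -34 at T = 32 */
    if (has_significant_descendant) return IZ;     /* e.g., -31 above 47  */
    return ZTR;                                    /* e.g.,  23 in HH3    */
}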
During the first dominant pass, which used a threshold of 32, four significant
coefficients were identified. These coefficients will be refined during the first
subordinate pass. Prior to the first subordinate pass, the uncertainty interval for the
magnitudes of all of the significant coefficients is the interval [32, 64). The first
subordinate pass will refine these magnitudes and identify them as being either in the interval
[32, 48), which will be encoded with the symbol "0," or in the interval [48, 64), which
will be encoded with the symbol "1." Thus, the decision boundary is the magnitude 48.
It is no coincidence that these symbols are exactly the first bit to the right of the MSB in
the binary representation of the magnitudes. The order of operations in the first
subordinate pass is illustrated in Table C.2.
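Each refinement therefore reduces to one comparison per coefficient. A minimal sketch of a single
subordinate-pass step on an uncertainty interval [low, high), with illustrative names rather than the
thesis routines:

/* Sketch of one subordinate-pass decision for a magnitude known to lie
   in [low, high); names are illustrative, not the thesis routines. */
int refine(int magnitude, int low, int high, int *reconstruction) {
    int mid = (low + high) / 2;             /* decision boundary, 48 here  */
    if (magnitude >= mid) {
        *reconstruction = (mid + high) / 2; /* upper center: 56 for 63, 49 */
        return 1;                           /* encoded as symbol "1"       */
    }
    *reconstruction = (low + mid) / 2;      /* lower center: 40 for 34, 47 */
    return 0;                               /* encoded as symbol "0"       */
}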
The first entry has magnitude 63 and is placed in the upper interval, whose center
is 56. The next entry has magnitude 34, which places it in the lower interval. The third
entry 49 is in the upper interval, and the fourth entry 47 is in the lower interval. Note that
in the case of 47, using the center of the uncertainty interval as the reconstruction value,
when the reconstruction value is changed from 48 to 40, the reconstruction error actually
increases from 1 to 7. Nevertheless, the uncertainty interval for this coefficient decreases
from width 32 to width 16. At the conclusion of the processing of the entries on the
subordinate list corresponding to the uncertainty interval [32, 64), these magnitudes are
reordered for future subordinate passes in the order (63, 49, 34, 47). Note that 49 is
moved ahead of 34 because, from the decoder's point of view, the reconstruction values 56
and 40 are distinguishable. However, the magnitude 34 remains ahead of magnitude 47
because, as far as the decoder can tell, both have magnitude 40, and the initial order,
which is based first on importance by scale, has 34 prior to 47.
The process continues on to the second dominant pass at the new threshold of 16.
During this pass, only those coefficients not yet found to be significant are scanned.
Additionally, those coefficients previously found to be significant are treated as zero for
the purpose of determining if a zerotree exists. Thus, the second dominant pass encodes
the coefficient -31 in subband LH3 as negative significant and the coefficient 23 in
subband HH3 as positive significant; the three coefficients in subband HL2 that have
not been previously found to be significant (10, 14, -13) are each encoded as zerotree
roots, as are all four coefficients in subband LH2 and all four coefficients in subband HH2.
The second dominant pass terminates at this point since all other coefficients are
predictably insignificant.
The subordinate list now contains, in order, the magnitudes (63, 49, 34, 47, 31,
23) which, prior to this subordinate pass, represent the three uncertainty intervals [48,
64), [32, 48), and [16, 32), each having equal width 16. The processing will refine each
magnitude by creating two new uncertainty intervals for each of the three current
uncertainty intervals. At the end of the second subordinate pass, the order of the
magnitudes is (63, 49, 47, 34, 31, 23), since at this point, the decoder could have
identified 34 and 47 as being in different intervals. Using the center of the uncertainty
interval as the reconstruction value, the decoder lists the magnitudes as (60, 52, 44, 36,
28, 20). The processing continues alternating between dominant and subordinate passes
and can stop at any time.
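In outline, this alternation can be written as a short loop. The three routines called below are assumed
names standing in for the implementations described in Chapter V:

/* Assumed routines, standing in for the implementations of Chapter V. */
void dominant_pass(int threshold);
void subordinate_pass(int threshold);
int  bit_budget_exhausted(void);

void ezw_encode(int initial_threshold) {
    int threshold = initial_threshold;       /* 32 in this example       */
    while (threshold >= 1 && !bit_budget_exhausted()) {
        dominant_pass(threshold);            /* significance map coding  */
        subordinate_pass(threshold);         /* refine known magnitudes  */
        threshold /= 2;                      /* 32, 16, 8, ...           */
    }
}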
     63  -34   49   10    7   13  -12    7
    -31   23   14  -13    3    4    6   -1
     15   14    3  -12    5   -7    3    9
     -9   -7  -14    8    4   -2    3    2
     -5    9   -1   47    4    6   -2    2
      3    0   -3    2    3   -2    0    4
      2   -3    6   -4    3    6    3    6
      5   11    5    6    0    3   -4    4

Figure C.1 Example of 3-Scale DWT of an 8 x 8 Image
Table C.1 Processing of First Dominant Pass at T = 32

Comment   Subband   Coefficient Value   Symbol   Reconstruction Value
  (1)      LL3             63             PS               48
           HL3            -34             NS              -48
  (2)      LH3            -31             IZ                0
  (3)      HH3             23             ZTR               0
           HL2             49             PS               48
  (4)      HL2             10             ZTR               0
           HL2             14             ZTR               0
           HL2            -13             ZTR               0
           LH2             15             ZTR               0
  (5)      LH2             14             IZ                0
           LH2             -9             ZTR               0
           LH2             -7             ZTR               0
  (6)      HL1              7             Z                 0
           HL1             13             Z                 0
           HL1              3             Z                 0
           HL1              4             Z                 0
           LH1             -1             Z                 0
  (7)      LH1             47             PS               48
           LH1             -3             Z                 0
           LH1              2             Z                 0
Table C.2 Processing of the First Subordinate Pass

Coefficient Magnitude   Symbol   Reconstruction Magnitude
         63                1                56
         34                0                40
         49                1                56
         47                0                40
PERMISSION TO COPY
In presenting this thesis in partial fulfillment of the requirements for a
master's degree at Texas Tech University or Texas Tech University Health Sciences
Center, I agree that the Library and my major department shall make it freely
available for research purposes. Permission to copy this thesis for scholarly
purposes may be granted by the Director of the Library or my major professor.
It is understood that any copying or publication of this thesis for financial gain
shall not be allowed without my further written permission and that any user
may be liable for copyright infringement.
Agree (Permission is granted.)

[Student's signature on file]

Date: 7-7-97

Disagree (Permission is not granted.)

Student's Signature

Date