UNIVERSAL CODING WITH DIFFERENT MODELERS
IN DATA COMPRESSION

by

Reggie ChingPing Kwan

A thesis submitted in partial fulfillment
of the requirements for the degree
of
Master of Science
in
Computer Science

MONTANA STATE UNIVERSITY
Bozeman, Montana

July 1987
APPROVAL

of a thesis submitted by

Reggie ChingPing Kwan

This thesis has been read by each member of the thesis committee and has been found to be satisfactory regarding content, English usage, format, citation, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies.

Date                                   Chairperson, Graduate Committee

Approved for the Major Department

Date                                   Head

Approved for the College of Graduate Studies

Date                                   Graduate Dean
STATEMENT OF PERMISSION TO USE

In presenting this thesis in partial fulfillment of the requirements for a master's degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. Brief quotations from the thesis are allowable without special permission, provided that accurate acknowledgment of source is made.

Permission for extensive quotation from or reproduction of this thesis may be granted by my major professor, or in his/her absence, by the Director of Libraries when, in the opinion of either, the proposed use of the material is for scholarly purposes. Any copying or use of the material in this thesis for financial gain shall not be allowed without my written permission.

Signature

Date
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1. INTRODUCTION TO DATA COMPRESSION
      A Basic Communication System
      The Need For Data Compression
      Data Compression and Compaction - A Definition
      Theoretical Basis
      Components of a Data Compression System
            Information Source
            Modeler
            Encoder
            Compressed File
            Decoder

2. UNIVERSAL CODING
      Coding Questions and Efficiency
      Criteria of a Good Compression System
      Codes with Combined Modelers and Coders
            Huffman Code for Zero-memory Source
            Huffman Code for Markov Source
            LZW Compression System
      Universal Coder
            Idea Origin - Elias Code
            The Algorithm
            The Precision Problem
            The Carry-over Problem

3. UNIVERSAL CODING WITH DIFFERENT MODELERS
      Is There a Universal Modeler?
      Nonadaptive Models
            Zero-memory Source
            Markov Source
      Adaptive Models
            Zero-memory Source
            Markov Source
            LZW Code

4. COMPRESSION RESULTS AND ANALYSIS
      Test Data and Testing Method
      Code Analysis
      Model Analysis
            The Best Case
            The Worst Case
      Conclusions
      Future Research

REFERENCES CITED

APPENDICES
      A. Glossary of Symbols and Entropy Expressions
      B. Definitions of Codes
LIST OF TABLES

Table
1. Kikian code
2. Compressed Kikian code
3. Probabilities of the Markov source in figure 7
4. The LZW string table
5. Compression results for a variety of data types
LIST OF FIGURES

Figure
1. A basic communication system
2. Coding of an information source
3. Code trees of Code A and Code B
4. Components of a data compression system
5. Reduction process
6. Splitting process
7. State diagram of a first-order Markov source
8. Encoding state diagram
9. Decoding state diagram
10. A compression example for the LZW algorithm
11. A decompression example of figure 10
12. Subdivision of the interval [0, 1] for source sequences
13. Subdivision of the interval [0, 1] with the probabilities in binary digits
14. The encoding process of the universal encoder
15. The decoding process of the universal decoder
16. An example of a model file
17. An example of a model file for a first-order Markov source
18. An example of an adaptive modeler for a zero-memory source
19. An example of an adaptive modeler for a first-order Markov source
20. An example of converting the LZW table into a modeler
21. Subclasses of codes
ABSTRACT

In data compression systems, most existing codes have integer code length because of the nature of block codes. One of the ways to improve compression is to use codes with noninteger length. We developed an arithmetic universal coding scheme to achieve the entropy of an information source with the given probabilities from the modeler. We show how the universal coder can be used for both the probabilistic and nonprobabilistic approaches in data compression. To avoid a model file being too large, nonadaptive models are transformed into adaptive models simply by keeping the appropriate counts of symbols or strings. The idea of the universal coder is from the Elias code, so it is quite close to the arithmetic code suggested by Rissanen and Langdon. The major difference is the way that we handle the precision problem and the carry-over problem. Ten to twenty percent extra compression can be expected as a result of using the universal coder. The process of adaptive modeling, however, may be forty times slower unless parallel algorithms are used to speed up the probability calculating process.
Chapter 1

INTRODUCTION TO DATA COMPRESSION
A Basic Communication System
Communications are used to provide terminal users access to computer systems that may be remotely located, as well as to provide a means for computers to communicate with other computers for the purpose of load sharing and accessing the data and/or programs stored at a distant computer. The conventional communication system is modeled by:
1. An information source
2. An encoding of this source
3. A channel over, or through, which the information is sent
4. A noise (error) source that is added to the signal in the channel
5. A decoding and hopefully a recovery of the original information from the contaminated received signal
6. A sink for the information
This model, shown in figure 1, is a simplified version of any communication system. In (3), we are particularly interested in sending information either from now to then (storage), or from here to there (transmission).
Figure 1. A basic communication system: (1) source, (2) encode, (3) channel, (4) decode, (5) sink.
The Need For Data Compression
There are three major reasons that we bother to code:
1. Error detection and correction
2. Encryption
3. Data compression
To recover from noise, parity check and Hamming (4,7) codes [1] are introduced to detect and correct errors respectively. To avoid unauthorized receivers, encryption schemes like the public key encryption system [2] are used. Finally, to reduce the data storage requirements and/or the data communication costs, there is a need to reduce the size of voluminous amounts of data. Thus, codes like Shannon-Fano [3], Huffman [3], LZW [4], and Arithmetic [5] and [6] are employed.
Data compression techniques are most useful for large archival files, I/O bound systems with spare CPU cycles, and transfer of large files over long distance communication links, especially when only low bandwidth is available.

This paper discusses several data compression systems and develops a universal coder for different modelers.
Data Compression and Compaction - A Definition

Data compression/compaction can be defined as the process of creating a condensed version of the source data by reducing redundancies in the data. In general, data compression techniques can be divided into two categories, nonreversible and reversible. In nonreversible techniques (usually called data compaction), the size of the physical representation of the data is reduced while a subset of "relevant" information is preserved, e.g. the CCC algorithm [7] and Rear-compaction [8]. In reversible techniques, all information is considered relevant, and compression/decompression recovers the original data, e.g. the LZW algorithm [4].
Theoretical Basis
The encoding of an information source can be illustrated by figure 2.
Figure 2. Coding of an information source: each symbol Si of the source alphabet S is mapped to a code word Xi, a sequence of symbols from the code alphabet X.
In fact, figure 2 is the component labeled (2) in figure 1. The simplest and most important coding philosophy in data compression is to assign short code words to those events with high probabilities and long code words to those events with low probabilities [3]. An event, however, can be a value, a source symbol or a group of source symbols to be encoded.
Tables 1 and 2 are used to illustrate the idea of achieving data compression when the probabilities are known. Consider the language Kikian of a very primitive tribe Kiki. The Kikian can only communicate in four words: # = kill, @ = eat, z = sleep, and & = think. The source alphabet is S = { #, @, z, & }. Assume an anthropologist studies Kikian and he encodes their daily activities with a binary code. Then the code alphabet is X = { 0, 1 }. Without knowing the probabilities of the tribe's activities, or source symbols, we constructed the code for the source symbols as table 1. With the probabilities of each activity, or symbol, being known, we can employ our coding philosophy to construct a new code, as shown in table 2.
Table 1. Kikian code.

Symbol    Code A
#         00
@         01
z         10
&         11
Table 2. Compressed Kikian code.

Symbol    Probability    Code B
#         0.125          110
@         0.25           10
z         0.5            0
&         0.125          111
The average length L of a code word using any code is defined as

L = Σ_{i=1}^{q} |C(Si)| * P(Si)     (1.1)

where q = number of source symbols,
      Si = source symbol i,
      C(Si) = code for Si,
      |C(Si)| = length of the code for symbol Si, and
      P(Si) = probability of symbol Si.
The average length LA of a code word using code A is obviously 2 bits, because |C(Si)| = 2 for all i and the probabilities sum to one. On the other hand, the average length LB of a code word using code B is 1.75 bits, calculated as

LB = 0.125 * 3 + 0.25 * 2 + 0.5 * 1 + 0.125 * 3.

That is, for our encoding system for the Kikian we have found a method using an average of only 1.75 bits per symbol, rather than 2 bits per symbol. Both codes, however, are instantaneous (see Appendix B), and therefore uniquely decodable. This is because all code words from both codes are formed by the leaves of their code trees, as in figure 3.
Figure 3. Code trees of Code A and Code B. (In Code A all four symbols are leaves at depth two; in Code B, z is a leaf at depth one, @ at depth two, and # and & at depth three.)
A typical day of the Kikian could be sleep, eat, sleep, kill, sleep, think, eat, and sleep. If a Kikian had a diary, z @ z # z & @ z would be the symbols appearing on a typical day. In the anthropologist's notebook it would be 1001100010110110 without considering the probabilities of different activities. With the probabilities being known, however, the daily activities of the Kikian can be represented by 01001100111100. In other words, to represent the same information, the daily activities of the Kikian, we can either use 16 bits or 14 bits.

Code B, in this case, is clearly better than code A, even though they are both uniquely decodable, which is the major criterion in data compression. Nevertheless, at this point we cannot conclude whether code B gives us the best compression for the given probabilities, though it turns out to be true.
To determine the best compression we can get for the given probabilities, let us look at the information content of each event first. Let p(Si) be the probability of the symbol Si. We can say that the information content of Si is

I(Si) = - log p(Si)     (1.2)

bits of information if log base two is used [3]. Code B uses only one bit to represent z and three bits to represent # because the information content of z is less than that of #:

I(z) = -log2 p(z) = -log2 0.5 = 1 bit
I(#) = -log2 p(#) = -log2 0.125 = 3 bits
Intuitively, if an event is highly unlikely to happen and it does happen, it deserves more attention, or it contains more information.
To determine if a code has the shortest average code length L, we have to compare L with the entropy H(S). If symbol Si occurs, we obtain an amount of information equal to

I(Si) = - log2 P(Si)  bits

The probability that Si occurs is P(Si), so the average amount of information obtained per symbol from the source is

Σ P(Si) I(Si)  bits

This quantity, the average amount of information per source symbol, is called the entropy H(S) of the source:

H(S) = - Σ_{i=1}^{q} P(Si) log2 P(Si)  bits     (1.3)

In code B, LB = H(S) = 1.75 bits, and therefore code B provides the best compression with the given probabilities according to Shannon's first theorem

Hr(S) <= L < Hr(S) + 1.     (1.4)
The proof of the above theorem can be found in any information or coding theory book. It is obvious that the lower bound of L is the entropy. The upper bound of L, on the other hand, can be explained by the nature of block codes. The length of any block code is always an integer, even though the information content is not necessarily an integer, because |C(Si)| in (1.1) is computed as |C(Si)| = ⌈log(1/P(Si))⌉, the information content rounded up to an integer.
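
To make the arithmetic above concrete, the short Python sketch below (our own illustration; only the probabilities and code words of tables 1 and 2 are taken from the text) evaluates the average code length (1.1) and the entropy (1.3) for code B and confirms that both come out to 1.75 bits, so code B sits exactly at the lower bound of (1.4).

    import math

    # Probabilities and code words for the Kikian source (table 2).
    prob = {"#": 0.125, "@": 0.25, "z": 0.5, "&": 0.125}
    code_b = {"#": "110", "@": "10", "z": "0", "&": "111"}

    # Average code length, equation (1.1): L = sum of |C(Si)| * P(Si).
    L = sum(len(code_b[s]) * prob[s] for s in prob)

    # Entropy, equation (1.3): H(S) = - sum of P(Si) * log2 P(Si).
    H = -sum(p * math.log2(p) for p in prob.values())

    print(L, H)    # both print 1.75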
Components of a Data Compression System

The fundamental components of a data compression system are:
1. Information source
2. Modeler
3. Encoder
4. Compressed file
5. Decoder.

In a digital image compression system, a predictor may be used with the modeler to capture hopefully not so obvious redundancies. For the purpose of our discussion, we will concentrate on modelers, encoders and decoders. Components of a data compression system are shown in figure 4.
Figure 4. Components of a data compression system: information source, modeler, encoder, model file, compressed file, and decoder (with its own modeler).
Information Source

For communication purposes, an information source can be thought of as a random process {s(i)} that emits outcomes of random variables S(i) in a never ending stream. The time index i then runs through all the integers. In addition to the random variables S(i), the process also defines all the finite joint variables (S(i), S(j), ...). The most frequently studied information sources are either independent or Markovian.

A source is said to be independent if p(Si) = p(Si|Sj) for j = 1, ..., q. Therefore, it is often called a zero-memory source.

A more general type of information source with q symbols is one in which the occurrence of a source symbol Si may depend on a finite number m of preceding symbols. Such a source (called Markovian, or an mth-order Markov source) is specified by giving the source alphabet S and the set of conditional probabilities

p(Si|Sj1, Sj2, ..., Sjm)    for i = 1, 2, ..., q;  jp = 1, 2, ..., q     (1.5)

For an mth-order Markov source, the probability of a symbol is known if we know the m preceding symbols.
Modeler

The modeler is responsible for capturing the "designated" redundancies of the information source. Designated redundancies can be viewed as the frequency measures used in the coding process.

In a straightforward data compression system like the Huffman code [3], the modeler simply counts the occurrence of each symbol Si, for i = 1, 2, ..., q. In other more sophisticated systems, e.g. the LZW algorithm [4], the modeler is responsible for putting new strings in the string table. Though that algorithm does not attempt to subdivide the process into modeling and coding, modeling and coding are actually done separately, as will be shown in chapter 3.

Depending on how the probabilities are calculated, we can distinguish between two cases: the adaptive and the nonadaptive models, discussed in detail in chapter 3. In a nonadaptive model, a model file is generated as in figure 4. The model file contains the necessary probabilities for both the encoder and the decoder.
Encoder
The encoder encodes the symbols with the probabilities given by the modeler. The basic data compression philosophy, that symbols or strings with higher probabilities are assigned the shorter code words, applies here. The encoder also ensures the codes are uniquely decodable for reversible compression.

In our data compression system, all the encoder needs from the modeler are the probabilities of all symbols and of the previous symbols. The whole encoding process will be adding and shifting.
Compressed File
The compressed file can be thought of as a file with a sequence of code words or code alphabet symbols. With X the set of code alphabet symbols, the compressed file x is simply an element of X*.
Decoder

The decoding process also involves the modeler, regardless of the nature of the modeler (adaptive or nonadaptive). If a nonadaptive modeler is used, a model file is required with the compressed file to provide all the necessary statistics.

In our data compression system, the decoder goes through a subtract and shift process to recover the original source.
Chapter 2

UNIVERSAL CODING
Coding Questions and Efficiency
The three major questions in coding theory are:
1. How good are the best codes?
2. How can we design "good" codes?
3. How can we decode such codes?
In data compression systems, the first question has been answered by Shannon's first theorem, described in the previous chapter. Shannon's first theorem shows that there exists a common yardstick with which we may measure any information source. The value of a symbol (or string in some cases) from an information source S may be measured in terms of an equivalent number of r-ary digits needed to represent one symbol/string from the source; the theorem says that the average value of a symbol/string from S is H(S).

Let the average length of a uniquely decodable r-ary code for the source S be L. By Shannon's first theorem, L cannot be less than Hr(S). Accordingly, we define E, the efficiency of the code, by

E = Hr(S) / L     (2.1)
We are trying to answer the other two questions in this thesis. In order to answer them, we need to study several existing data compression systems, because there are no absolute answers for the two questions. Before we do that, we shall look at what we mean by a "good" code in the next section.
Criteria of a Good Data Compression System

A good data compression system has to have the following three qualities:
1. the codes it produces have to be decodable.
2. the average code length should be as close to the entropy as possible, or E close to 1.0.
3. the coding (encoding and decoding) process is done in "acceptable" time.

Decodability can easily be achieved by constructing all the code words from the leaves of the code tree in the case of block codes. We shall develop ways to ensure decodability in later sections.

Later in the chapter, we describe a universal coding technique that guarantees the average code length to be the same as the entropy for the given probabilities provided by the modeler. That is to say, the efficiency of the code equals one according to (2.1). Besides looking at the efficiency of codes, we can also compare their compression ratios, defined by

PC = (|OF| - |CF|) / |OF|     (2.2)

where PC = percentage of compression,
      OF = original file, and
      CF = compressed file.
There is no absolute way to measure "acceptable" time. Some codes are relatively more acceptable than others, while it is clear that there is a trade-off between compression ratio and time. The longer the modeler analyzes the information source, the smaller the compressed file is going to be.
Codes with Combined Modelers and Coders
A lot of data compression systems do not separate their processes into modeling and coding. In this chapter, we look at several popular data compression systems and then discuss the universal coding scheme that we use.
Huffman Code for Zero-memory Source

The Huffman code [1] is one of the variable-length codes that enforces the data compression philosophy that the most frequent symbols have the shortest encodings. Consider the source S with the symbols S1, S2, ..., Sq and symbol probabilities p1, p2, ..., pq, and the code X with the symbols x1, x2, ..., xr. The generation of the Huffman code can be divided into two processes, namely reduction and splitting. The reduction process involves:
a. Let the symbols be ordered so that p1 >= p2 >= ... >= pq.
b. Combine the last r symbols into one symbol.
c. Rearrange the remaining source, containing q - r + 1 symbols, in decreasing order.
d. Repeat (b) and (c) until there are less than or equal to r symbols.

After the reduction process, we then go backward to perform the splitting process:
a. List the r code symbols in a fixed order.
b. Assign the r symbols to the final reduced source.
c. Go backward, with one of the symbols split into r symbols.
d. Append the r code symbols to the last r source symbols.
e. Repeat (c) and (d) until there are q source symbols.
Figures 5 and 6 are the reduction and splitting processes of an example with S = { S1, S2, S3, S4, S5 }, p1 = 0.4, p2 = 0.2, p3 = 0.2, p4 = 0.1, p5 = 0.1, and X = { 0, 1 }.

The Huffman code for S is formed where C(S1) = 1, C(S2) = 01, and so on. Since the lengths of the codes are not necessarily the same, the Huffman code is a perfect example of a variable-length code. The average code length of the Huffman code is

L = 1(0.4) + 2(0.2) + 3(0.2) + 4(0.1) + 4(0.1) = 2.2 bits/symbol.

The entropy of the above source is

H(S) = (0.4)log(1/0.4) + 2 * (0.2)log(1/0.2) + 2 * (0.1)log(1/0.1) = 2.12 bits/symbol.

The efficiency of the code is

E = 2.12 / 2.2 = 0.96.
Figure 5. Reduction process.

Figure 6. Splitting process. (The resulting code is C(S1) = 1, C(S2) = 01, C(S3) = 000, C(S4) = 0010, C(S5) = 0011.)
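
The code lengths used in the calculation above can be reproduced mechanically. The following Python sketch is our own illustration, not part of the thesis software: it builds a binary Huffman code for the source of figures 5 and 6 with a priority queue instead of the tabular reduction and splitting, so the individual code words may differ from figure 6, but the average length is the same 2.2 bits/symbol.

    import heapq

    def huffman_code(probs):
        # probs: symbol -> probability.  Returns symbol -> binary code word.
        # Heap entries are (probability, tie-breaker, {symbol: partial code}).
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)      # the two least probable groups
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c0.items()}
            merged.update({s: "1" + c for s, c in c1.items()})
            tie += 1
            heapq.heappush(heap, (p0 + p1, tie, merged))
        return heap[0][2]

    probs = {"S1": 0.4, "S2": 0.2, "S3": 0.2, "S4": 0.1, "S5": 0.1}
    code = huffman_code(probs)
    L = sum(len(code[s]) * p for s, p in probs.items())    # 2.2 bits/symbol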
Huffman Code for Markov Source

The best way to illustrate the behavior of an mth-order Markov source, defined in chapter 1, is through the use of a state diagram. In a state diagram we represent each of the q^m possible states of the source by a single circle, and the possible transitions from state to state by arrows, as in figure 7. As an example of a simple one-memory source, suppose we have an alphabet of three symbols, a, b, and c (S = { a, b, c }). Let the probabilities be:

p(a|a) = 1/3    p(b|a) = 1/3    p(c|a) = 1/3
p(a|b) = 1/4    p(b|b) = 1/2    p(c|b) = 1/4
p(a|c) = 1/4    p(b|c) = 1/4    p(c|c) = 1/2
By solving the following equations formed by the above probabilities,

pa * p(a|a) + pb * p(a|b) + pc * p(a|c) = pa
pa * p(b|a) + pb * p(b|b) + pc * p(b|c) = pb
pa * p(c|a) + pb * p(c|b) + pc * p(c|c) = pc
pa + pb + pc = 1

we get pa = 3/11, pb = 4/11, pc = 4/11.

Figure 7. State diagram of a first-order Markov source.
There are, of course, three states (3^1) in this case, labeled a, b, and c and indicated by the circles. Each directed line is a transition from one state to another state, whose probability is indicated by the number alongside the line.

For each state in the Markov system we can use the appropriate Huffman code obtained from the corresponding transition probabilities for leaving that state. The generation of the codes is exactly the same as for the zero-memory source at each state. By the same token, we can break down any order Markov source into q^m zero-memory states.
For state a we get the binary encoding

a = 1     pa = 1/3
b = 00    pb = 1/3
c = 01    pc = 1/3
La = 1.67.

For state b,

a = 10    pa = 1/4
b = 0     pb = 1/2
c = 11    pc = 1/4
Lb = 1.5.

For state c,

a = 10    pa = 1/4
b = 11    pb = 1/4
c = 0     pc = 1/2
Lc = 1.5.

The average code length of the above Markov source is

LM = pa * La + pb * Lb + pc * Lc = 1.55 bits/symbol.

Assuming the starting state is a, the encoding process can be shown by another state diagram, as in figure 8.
Figure 8. Encoding state diagram.

To encode, for instance, the source s = baabcccccc, the result will be 00101001100000. To decode, we use another state diagram, as shown in figure 9.

Figure 9. Decoding state diagram.
Now, let us calculate the efficiency of the Huffman code for the above Markov source. The entropy H(S) of an mth-order Markov source [1] is

H(S) = Σ p(Si, Sj1, Sj2, ..., Sjm) * log(1/p(Si|Sj1, Sj2, ..., Sjm))     (2.4)

The relevant probabilities are in table 3. The entropy is calculated using (2.4):

H(S) = Σ p(Sj, Si) * log(1/p(Si|Sj)) = 1.52 bits/symbol.

The efficiency is calculated using (2.1):

E = 1.52 / 1.55 = 0.98
Table 3. Probabilities of the Markov source in figure 7.

Sj    Si    P(Si|Sj)    P(Sj)    P(Si,Sj)
a     a     1/3         3/11     1/11
a     b     1/3         3/11     1/11
a     c     1/3         3/11     1/11
b     a     1/4         4/11     1/11
b     b     1/2         4/11     2/11
b     c     1/4         4/11     1/11
c     a     1/4         4/11     1/11
c     b     1/4         4/11     1/11
c     c     1/2         4/11     2/11
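
The figures above are easy to check numerically. The sketch below (the helper names are ours) finds the stationary state probabilities by repeatedly applying the transition probabilities of figure 7, then evaluates the entropy (2.4) and the average length of the per-state Huffman codes.

    import math

    # Transition probabilities p(next | state) for the source of figure 7.
    P = {"a": {"a": 1/3, "b": 1/3, "c": 1/3},
         "b": {"a": 1/4, "b": 1/2, "c": 1/4},
         "c": {"a": 1/4, "b": 1/4, "c": 1/2}}

    # Stationary probabilities: iterate until they settle at pa = 3/11, pb = pc = 4/11.
    p = {s: 1/3 for s in P}
    for _ in range(200):
        p = {s: sum(p[t] * P[t][s] for t in P) for s in P}

    # Entropy of the first-order Markov source, equation (2.4).
    H = sum(p[t] * P[t][s] * math.log2(1 / P[t][s]) for t in P for s in P)

    # Average length of the per-state Huffman codes given in the text.
    length = {"a": {"a": 1, "b": 2, "c": 2},
              "b": {"a": 2, "b": 1, "c": 2},
              "c": {"a": 2, "b": 2, "c": 1}}
    L = sum(p[t] * P[t][s] * length[t][s] for t in P for s in P)

    print(H, L)    # about 1.52 and 1.55 bits/symbol; the efficiency H/L is close to one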
LZW Compression System

The LZW compression algorithm [4] is a nonprobabilistic approach to data compression. The coding process is a string matching process which captures high usage patterns (strings), as suggested in [9] and [10]. Contrary to the previous two Huffman codes, LZW is a typical fixed-length code in which each code represents different numbers of source symbols.

Welch's modification [4] of Lempel and Ziv's idea in [9] and [10] makes it a lot easier to implement. We implemented the LZW data compression system according to the algorithm given in [4]. The results will be discussed in chapter 4.

The most important feature in the LZW algorithm is the string table, which holds all the strings we encounter in the source file. The only time that we output a code is when a particular symbol/string has been seen twice.
The following is the main idea of the encoding process:
1. Initialize the string table to contain single character strings.
2. W ← the first character in the source file.
3. K ← the next character in the source file.
4. If the concatenation WK is in the table,
      W ← WK,
   else
      output the code for W,
      put WK in the table,
      W ← K.
5. If not end of file, goto 3.
The basic idea of the decoding process can be stated simply as follows:
1. Initialize the string table to contain single character strings.
2. W ← decode the first code in the compressed file.
3. CODE ← the next code in the compressed file.
4. If CODE exists in the table,
      K ← decode CODE,
   else
      K ← W followed by the first character of W.
5. W ← K.
6. If there are more codes, goto step 3.
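
A compact Python rendering of the two outlines above is given below. It is our own sketch of the textbook LZW procedure, not the implementation used for the tests in chapter 4, and it spells out the bookkeeping the outline leaves implicit: the decoder outputs each decoded string and rebuilds the same string table as the encoder.

    def lzw_encode(source, alphabet):
        table = {ch: i + 1 for i, ch in enumerate(sorted(alphabet))}
        next_code = len(table) + 1
        w, out = source[0], []
        for k in source[1:]:
            if w + k in table:               # step 4: WK already in the table
                w = w + k
            else:
                out.append(table[w])         # output the code for W
                table[w + k] = next_code     # put WK in the table
                next_code += 1
                w = k
        out.append(table[w])                 # flush the last string
        return out

    def lzw_decode(codes, alphabet):
        table = {i + 1: ch for i, ch in enumerate(sorted(alphabet))}
        next_code = len(table) + 1
        w = table[codes[0]]
        out = [w]
        for code in codes[1:]:
            if code in table:
                k = table[code]
            else:                            # code just defined by the encoder
                k = w + w[0]
            out.append(k)
            table[next_code] = w + k[0]      # the decoder rebuilds the same table
            next_code += 1
            w = k
        return "".join(out)

    codes = lzw_encode("abaaacdababab", "abcd")   # [1, 2, 1, 7, 3, 4, 5, 11, 2]
    assert lzw_decode(codes, "abcd") == "abaaacdababab"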
Table 4 is the string table for the example in figure 10. The same table is reproduced by the decoding process in figure 11. The efficiency of the LZW code is a little harder to calculate because the code is a nonprobabilistic code. Though probabilities of symbols/strings are involved, they are not used in a direct sense. We know, however, that the lower bound of the code is the entropy, by Shannon's first theorem (1.4). Consider the 12-bit code suggested by Welch in [4]: in the beginning of the encoding process, single symbols are represented by 12 bits while their information content is

I(single symbol) = - log p(single symbol) = - log 1/128 = 7 bits.

This is because the table is initialized with all the ASCII characters and there are 128 of them. The algorithm assumes them to be equally probable, so their probability is 1/128.
Table 4. The LZW string table.

code    string
1       a
2       b
3       c
4       d
5       ab
6       ba
7       aa
8       aac
9       cd
10      da
11      aba
12      abab

Figure 10. A compression example for the LZW algorithm (input symbols abaaacdababab, output codes 1 2 1 7 3 4 5 11 2).

Figure 11. A decompression example of figure 10.
Since the LZW code and all the others we investigated are concatenation codes with integer code lengths, it leads us to believe that in order to construct a code with efficiency equal to one we have to use nonblock codes.
Universal Coder
Both the encoder and the decoder need the probabilities given by the modeler. Thus, a universal encoder needs only the probabilities to produce uniquely decodable codes. The universal decoder also needs the same probabilities to reproduce the source file. The Elias code [3] is the first code that tries to reflect the exact probabilities of the symbols/strings. Rissanen and Langdon in [5], [6], [10], and [11] generalize the concept of the Elias code to embrace other types of nonblock codes called arithmetic codes.
Idea Origin - Elias Code

The Elias code assigns each source sequence to a subdivision of the interval [0, 1]. To illustrate the coding process, consider a zero-memory source with two symbols a and b, occurring with probabilities 0.7 and 0.3 respectively. We may represent an arbitrary infinite sequence of possible source files as in figure 12. According to the figure, sequences starting with "a" are assigned to the interval [0, 0.7]; sequences starting with "aa" are assigned to the interval [0, 0.49]; and so on. To encode the sequence from the source we simply provide a binary expansion for each point in [0, 1], as in figure 13.
Figure 12. Subdivision of the interval [0, 1] for source sequences (|s| = 1: a → [0, 0.7], b → [0.7, 1.0]; |s| = 2: aa → [0, 0.49], ab → [0.49, 0.7], ba → [0.7, 0.91], bb → [0.91, 1.0]).

Figure 13. Subdivision of the interval [0, 1] with the probabilities in binary digits (00 → [0, 0.25], 01 → [0.25, 0.5], 10 → [0.5, 0.75], 11 → [0.75, 1.0]).
Thus, a source that starts with "ba" will have a compressed file that starts with "10". Decoding is clearly a process of successive subtraction. If the binary sequence starts with "1011", the point represented must lie between 0.6875 and 0.75; hence the first three symbols from the source must be "bab".
The Algorithm

With the idea from Elias, we developed an arithmetic code which is similar to the one described in [10]. The difference is the way that we handle the precision problem and the carry-over problem.

The algorithm that we developed accepts any type of input source as long as we know the size of the source alphabet S, and the symbols are arranged in lexicographic or any other fixed order. The basic idea of our code for any symbol Sj is the cumulative probability before Sj, P(Sj) = p(S1) + p(S2) + ... + p(Sj-1). The code can be defined recursively by the equations

C(s S1) = C(s),    C(s Si) = C(s) + P(Si).     (2.5)

Both the encoder and decoder require two binary registers, C and A. The former actually should be a buffer, and the latter is of width n (n will be discussed in the section on precision). The codes are held temporarily in the C register, while the cumulative probability is held in the addend A. C represents the exact probability of the source file, with an imaginary decimal point in front of the compressed file.
The following is the encoding algorithm:
1. Initialization: C ← all zeros, Sht ← 0, Previous ← 1, Leftover ← 0.
2. Si ← read a symbol from the source file.
3. A ← P(Si), Sht ← integer part of (-log Previous).
4. Leftover ← Leftover - noninteger part of (-log Previous).
5. If Leftover < 0 then Leftover ← Leftover + 1, Sht ← Sht + 1.
6. Shift A by Sht bits from the end of C. (|C| = 0 initially.)
7. C ← C + A, Previous ← p(Si).
8. Goto 2 until the end of the source file.
The encoding process is basically shifting and adding. By the same token, the decoding process should be subtracting and shifting:
1. Initialization: Previous ← 1, Leftover ← 0, C ← code from the compressed file.
2. i ← 2.
3. If C - Previous * P(Si) < 0
   then decode as symbol Si-1,
        C ← C - Previous * P(Si-1),
        Previous ← p(Si-1),
        Sht ← integer part of (-log Previous),
        Leftover ← Leftover - noninteger part of (-log Previous),
        if Leftover < 0 then Leftover ← Leftover + 1, Sht ← Sht + 1,
        shift C Sht bits to the left,
        goto (2);
   else
        i ← i + 1,
        goto (3).
The C register, during the decoding process, has to have at least as many bits as the A register. To demonstrate the coding processes, consider S = { a, b, c, d } and s = abaaacdababab, the example that we used in figure 10. Assuming it is a zero-memory source, the statistics of s are:

p(a) = 7/13,    P(a) = 0,
p(b) = 4/13,    P(b) = 7/13,
p(c) = 1/13,    P(c) = 11/13,
p(d) = 1/13,    P(d) = 12/13.
Figure 14 is the encoding process of a simpler example where S = { a, b }, X = { 0, 1 }, s = abbbbbbaab, and p(a) = 1/8 and p(b) = 7/8. Figure 15 is the decoding process of the same example.
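
Before turning to the precision and carry-over problems, it may help to see the Elias idea with the register arithmetic stripped away. The Python sketch below is our own illustration, using exact rational arithmetic (so precision is not an issue) and our own function names; it is not the shift-and-add procedure above, but it maps a source string to a subinterval of [0, 1] using the cumulative probabilities P(Si) and recovers the string by successive subtraction, as in figures 12 through 15.

    from fractions import Fraction

    def cumulative(probs):
        # P(Si) in the notation above: the probability mass before symbol Si.
        cum, total = {}, Fraction(0)
        for sym, p in probs.items():
            cum[sym] = total
            total += p
        return cum

    def encode(source, probs):
        cum = cumulative(probs)
        low, width = Fraction(0), Fraction(1)
        for sym in source:              # narrow the interval: add the scaled
            low += width * cum[sym]     # cumulative probability, then shrink
            width *= probs[sym]         # the width to p(sym)
        return low, width               # any point of [low, low + width) identifies s

    def decode(point, length, probs):
        cum = cumulative(probs)
        out = []
        for _ in range(length):         # successive subtraction and rescaling
            for sym, p in probs.items():
                if cum[sym] <= point < cum[sym] + p:
                    out.append(sym)
                    point = (point - cum[sym]) / p
                    break
        return "".join(out)

    probs = {"a": Fraction(1, 8), "b": Fraction(7, 8)}
    s = "abbbbbbaab"
    low, width = encode(s, probs)
    assert decode(low, len(s), probs) == s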
The Precision Problem

The algorithm in the last section omitted one very important thing: the number of bits that is required to represent A to ensure decodability.

Theorem: The number of bits that A needs is n, where n is the number of bits used in counting the occurrence of symbols/strings in the data compression system.

Proof: The smallest number between two addends is 1/n. To represent 1/n, we need exactly n bits in binary with an imaginary decimal point in front.

Remarks: Even though n bits are used, the bits per code are by no means n, because codes overlap each other unless the value of the addend is always 1/n. In order for that to happen, the previous symbol would always have to have probability 1/n, which is impossible unless |S| = n = |s|. Our idea is one of the ways of using floating point arithmetic to overcome the requirement for unlimited precision of the Elias code.
The Carry-over Problem

During the encoding process, the algorithms given in [6], [10], [11], and the one above may fail. When encoding a symbol with high probability, an addition is performed with a large number of 1's.
Figure 14. The encoding process of the universal encoder (S = { a, b }, X = { 0, 1 }, p(a) = 1/8, p(b) = 7/8, P(a) = 000, P(b) = 001; the resulting compressed file is 00001111000001).

Figure 15. The decoding process of the universal decoder.
If a long sequence of 1-bits has been shifted out of the C register, this will precipitate a "carry-over". If the sequence of 1-bits extends through the C register and into the already written code bits (in the compressed file), it becomes impossible to correctly perform the addition without rereading the compressed file.

Arbitrarily long sequences of 1-bits, and therefore arbitrarily long carry-overs, may be generated. In binary addition, the first 0-bit before a sequence of 1-bits will stop a carry-over. For this reason, inserting 0-bits into the code stream at strategic locations will prevent carry-overs from causing problems.

"Zero-stuffing" is used by Rissanen and Langdon [12] to check the output stream for sequences of 1-bits. When the length of one of these reaches a predefined value x, a 0 is inserted (stuffed) into the code stream just after the offending sequence. The stuffing is undone by the decoder by checking for x consecutive 1-bits and extracting the following bit of the sequence. In order that the decoder can correctly undo the zero-stuffing, 0-bits have to be inserted which are not necessarily needed for the carry-over problem. The way that we determine the value x can also cause problems. If x is too small, we could waste a lot of bits and consequently get poor compression. On the other hand, the size of x is restricted by the size of the C register. The worst thing about zero-stuffing is that we have to constantly check the C register for x consecutive 1-bits.

We use another solution: simply insert 0-bits at regular intervals in the code.
Instead of using only the register for C, we use an extra 4K buffer with the first bit (we actually use a byte) being the "carry-catcher". We always wait until the buffer is filled before we output it to the compressed file. Wasting one bit (byte) in 4096 saves a lot of time checking for x consecutive 1-bits. Moreover, long carry-overs are somewhat rare. Any that do occur tend to eliminate others from occurring, because a long sequence of 1-bits turns into a long sequence of 0-bits when a carry ripples through.

This method works well. A carry-over only happens after a large addend with a small shift, so it will not normally add any significant time to the coding process. Implementation is very simple. The insertions can be done with the normal file buffering: insert a 0-bit at the beginning of each buffer. The complexity of the process is further reduced by inserting an entire 0-byte at the beginning of each buffer. Byte-level addressing is often easier than bit-level addressing. The decoder then extracts and examines the entire byte rather than a bit.
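
The buffering can be sketched in a few lines. The class below is our own illustration (the names and byte-level details are assumptions, not the thesis code): every output buffer begins with a reserved zero byte, and a carry that ripples backwards out of the C register is added into the buffer, where the leading byte guarantees it stops before reaching bytes already written to the file. The decoder simply folds the first byte of each 4096-byte block back into its arithmetic instead of treating it as code.

    BUFSIZE = 4096

    class CarryCatcherOutput:
        def __init__(self, f):
            self.f = f
            self.buf = bytearray([0])           # the first byte is the carry-catcher

        def emit(self, byte):
            self.buf.append(byte & 0xFF)
            if len(self.buf) == BUFSIZE:        # flush a full 4K block, start a new one
                self.f.write(bytes(self.buf))
                self.buf = bytearray([0])

        def add_carry(self):
            # Propagate a carry backwards through the current buffer; it always
            # stops at the leading zero byte, so the file is never reread.
            i = len(self.buf) - 1
            while self.buf[i] == 0xFF:
                self.buf[i] = 0
                i -= 1
            self.buf[i] += 1

        def close(self):
            self.f.write(bytes(self.buf))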
Chapter 3

UNIVERSAL CODING WITH DIFFERENT MODELERS
Modeling an information source is without question the hardest part of any data compression system. The job of the modeler is to discover the redundancies or regular features in the data to be compressed. This information can be viewed as the probability distribution on the alphabet of the information source.

In order to use the universal coder, the coding process has to be divided into a more general form, as in figure 4. The modeler is responsible for providing appropriate probabilities for both the encoder and the decoder. Even though some of the data compression systems stay away from probabilities, like LZW [4] (an improvement of Lempel and Ziv's ideas in [9] and [10]) or the string matching techniques suggested by Rodeh [13], we can always convert their redundancy reduction techniques into our modeler. The conversion actually improves their codes compression-wise because we force our code to the lower bound of Shannon's first theorem (1.4) with the appropriate probabilities.
Is There a Universal Modeler?

Since the universal encoder and decoder need only the probabilities of the symbols/strings and the probabilities of the previous symbol/string, the most basic modeler should provide these probabilities. The easiest way to do that is to keep the counts of each symbol/string:

p(Si) = count(Si) / total count.     (3.1)
However, different information sources have totally different kinds of redundancies. Welch [4] suggested that there are four types of redundancies in commercial data, namely character distribution, character repetition, high-usage patterns and positional redundancy. To capture a particular kind of redundancy, we need to analyze the information source to decide what kind of redundancy capturing techniques have to be used. For instance, the probabilities we need in a first-order Markov source are p(Sx|Sy), which can be given by

p(Sx|Sy) = Count(Sx|Sy) / Count(y)     (3.2)

where Count(y) denotes the number of times class y occurs in the source s, and Count(Sx|Sy) denotes the number of times the next symbol at these occurrences is Sx.
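
Equation (3.2) is nothing more than counting symbol pairs. A minimal Python sketch follows (the helper names are ours; the source string is the example used later in this chapter, and because this sketch starts counting from the second symbol rather than assuming an initial state, its counts may differ slightly from those given there).

    from collections import Counter, defaultdict

    def markov_counts(source):
        count_y = Counter()                    # Count(y): occurrences of context y
        count_xy = defaultdict(Counter)        # Count(Sx|Sy): next-symbol counts
        for prev, nxt in zip(source, source[1:]):
            count_y[prev] += 1
            count_xy[prev][nxt] += 1
        return count_y, count_xy

    def p_cond(x, y, count_y, count_xy):
        return count_xy[y][x] / count_y[y]     # p(Sx|Sy) = Count(Sx|Sy) / Count(y)

    count_y, count_xy = markov_counts("abbbacccbaaabbbcccabcaac")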
Modelers can be divided into two categories, namely nonadaptive and adaptive. The following sections show how we can convert existing models to provide appropriate probabilities for the universal coder so as to improve compression.
Nonadaptive Models

Nonadaptive models basically scan the information source to get the appropriate probabilities. The probabilities are put in the model file so that both the encoder and the decoder can use them. Therefore, the decoding process cannot be performed with only the compressed file, and the compression ratio defined in (2.2) should be redefined as

PC = (|OF| - (|CF| + |MF|)) / |OF|.     (3.3)

The major disadvantage of nonadaptive models is the fact that they require two passes of the information source before the compressed file can be generated. Nevertheless, the probabilities provided by a nonadaptive model are usually better than those of an adaptive model, so better compression can be expected if we can keep the size of the model file down.
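
As a small illustration of the adjusted ratio, the file sizes below are made up; the point is only that the model file is charged against the nonadaptive model.

    def compression_ratio(original, compressed, model_file=0):
        # PC from (2.2) when model_file is 0, and from (3.3) otherwise.
        return (original - (compressed + model_file)) / original

    compression_ratio(10000, 6500)         # 0.35: adaptive model, no model file needed
    compression_ratio(10000, 6200, 700)    # 0.31: nonadaptive model pays for its preamble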
Zero-memory Source

For a zero-memory source, we have been using the Huffman code since the fifties. However, it has been shown by Shannon's first theorem that the worst a code will do is one extra bit more than the entropy. Though the Huffman code does not separate the modeling process from the coding, we can view the first step of the following algorithm of the Huffman code as modeling and the rest as the encoding process:
1. Scan the source file, counting the occurrence of each source symbol.
2. Arrange the symbols in descending order with respect to their probabilities.
3. Perform the reduction and splitting processes as in figures 5 and 6 to generate the code.
4. Build a finite state machine as in figure 8.
5. Encode the source file using the machine built in step 4.
For the universal coder, we need the probabilities of each symbol just as in the above algorithm. The modeler is responsible for providing (3.1) for all i by simply counting the occurrence of each symbol. In other words, the modeler for any zero-memory source is step 1 of the algorithm above. For example, with S = { a, b, c, d, e } and s = abacdbaeca, the model file can be represented by figure 16.

Figure 16. An example of a model file. (The model file above actually represents the statistics for figure 6.)
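
Producing such a zero-memory model file is a one-line counting job; a sketch for the example above (the variable names are ours):

    from collections import Counter

    s = "abacdbaeca"
    counts = Counter(s)                              # a:4, b:2, c:2, d:1, e:1
    model = {sym: n / len(s) for sym, n in counts.items()}
    # model holds p(a)=0.4, p(b)=0.2, p(c)=0.2, p(d)=0.1, p(e)=0.1,
    # the same statistics as figures 5, 6 and 16.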
Markov Source

To design a data compression system for a Markov source, the modeler has to provide the probabilities given in (3.2). From (3.2), we can calculate the information content of the source by

I(s) = Σ_{z=1}^{k} Count(z) log Count(z) - Σ_{z=1}^{k} Σ_{x∈S} Count(x|z) log Count(x|z).     (3.4)

Proof: Since p(x|z) = Count(x|z) / Count(z), and

p(s) = Π_{x∈S} p(x|Q1)^Count(x|Q1) * ... * Π_{x∈S} p(x|Qk)^Count(x|Qk),     (3.5)

we have

1/p(s) = Π_{x∈S} (Count(Q1)/Count(x|Q1))^Count(x|Q1) * ... * Π_{x∈S} (Count(Qk)/Count(x|Qk))^Count(x|Qk)

so that

I(s) = Σ_{z=1}^{k} Σ_{x∈S} Count(x|z) log Count(z) - Σ_{z=1}^{k} Σ_{x∈S} Count(x|z) log Count(x|z)
     = Σ_{z=1}^{k} Count(z) log Count(z) - Σ_{z=1}^{k} Σ_{x∈S} Count(x|z) log Count(x|z),

since Σ_{x∈S} Count(x|z) = Count(z).
Let us illustrate nonadaptive models with the example we used before, where the source s = abbbacccbaaabbbcccabcaac and the source alphabet S = { a, b, c }. By assuming the source to be a first-order Markov source, the modeler counts the symbols conditioned on the previous symbols. With the initial state as a we get the conditional counts

Count(a|a) = 3    Count(b|a) = 3    Count(c|a) = 3
Count(a|b) = 2    Count(b|b) = 4    Count(c|b) = 2
Count(a|c) = 2    Count(b|c) = 2    Count(c|c) = 4

to produce the model file in figure 17. By (3.4), the ideal code length with this model is

I(s) = 9 log 9 + 2 * 8 log 8 - (3 * 3 log 3 + 4 * 2 log 2 + 2 * 4 log 4) = 38.26 bits.

Figure 17. An example of a model file for a first-order Markov source.
Adaptive Models

The major disadvantage of the nonadaptive models is the fact that the information source has to be prescanned by the modeler to get the appropriate probabilities. More importantly, the model file containing the probabilities has to be transmitted to the decoding unit as a preamble to the code string. Since the model file is not needed in a data compression system using an adaptive modeler, as shown in figure 4, compression can sometimes be better, especially for a Markov source where the model file is large. In a first-order Markov source, the model file contains |S|^2 probabilities. When the order goes up, the number of probabilities in the model file goes up exponentially:

Number of probabilities needed = |S|^(order+1).     (3.6)

An ASCII file treated as a third-order Markov source already has 256^3 = 16777216 conditioning contexts, so to represent a higher order Markov source the model file could even be bigger than the source file.

To avoid prescanning the source and the preamble, we need to consider adaptive models in our data compression system. As a matter of fact, adaptive models also give the modeler a chance to adjust the probabilities as the source is being encoded.
Zero-memory Source

The Huffman code we looked at earlier is a classic example of a code using a nonadaptive model. The conversion of such a nonadaptive model to an adaptive one is quite straightforward. Equation (3.1) still holds, while the probabilities are calculated after each symbol is scanned. The modeler of an adaptive model does the following:
1. Initialize the source symbols to be equally probable.
2. Read in a symbol from the source file, and send the present probabilities to the encoder.
3. Update the probabilities with the current source symbol.
4. If not end of file, goto step 2.
Equation (3.1) has to be recalculated after each symbol is scanned because both the total count and the count of the present symbol change. Figure 18 is what an adaptive modeler should provide to the encoder and decoder.
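
A minimal sketch of such an adaptive modeler (the class and method names are our own) is given below. The encoder and the decoder each keep an identical copy and call it in the same order, so the two stay synchronized without any model file.

    class AdaptiveModel:
        def __init__(self, alphabet):
            # Step 1: start with every symbol equally probable (count of one each).
            self.counts = {s: 1 for s in alphabet}
            self.total = len(alphabet)

        def probabilities(self):
            # Step 2: the probabilities handed to the coder for the next symbol.
            return {s: c / self.total for s, c in self.counts.items()}

        def update(self, symbol):
            # Step 3: recompute (3.1) implicitly by bumping the counts.
            self.counts[symbol] += 1
            self.total += 1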
Markov Source

The adaptive modeler for a Markov source is not much different from that for a zero-memory source, as each state of the finite state machine of the Markov source is treated the same way as a zero-memory source. Combining figure 17 and figure 18, figure 19 gives an example for a first-order Markov source.
Figure 18. An example of an adaptive modeler for a zero-memory source (S = { a, b, c, d, e }, s = abacdbaeca).

Figure 19. An example of an adaptive modeler for a first-order Markov source (S = { a, b, c }, s = abbbacccbaaabbbcccabcaac).

LZW Code

Though the LZW compression algorithm [4] does not seem to fit our universal coder because it is a nonprobabilistic approach, we can use its idea in building a modeler that provides probabilities of symbols/strings for the encoder and the decoder. We can consider the string table generated by the LZW algorithm as the modeler, which
counts the symbols and strings.
We can build the modeler with a tree
k e e p i n g t h e c o u n t o f a l l s y m b o l s a n d s t r i n g s that t h e m o d e l e r h as
scanned.
built.
1.
W e e n c o d e o n e s y m b o l at at t i m e b y p a r s i n g t he tree w e
W e build the L Z W modeler as follows:
1. Initialize the count of lambda to q (|S|), and the count of all the symbols to one. Initialize the pointer W to lambda. Previous := 1.
2. Read one symbol Si from the source file, and send P(Si) and Previous to the encoder.
3. Previous := P(Si) / W. Update W.
4. Move W to point to Si.
5. Read one symbol Sj from the source file.
   If Sj is a son of W, then send P(Sj) and Previous to the encoder. Previous := P(Sj) / W. Update W. Move W to Sj.
   Else send P(Sj) and Previous to the encoder. Previous := P(Sj) / W. Update both W and Pj. Move W to lambda.
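One possible reading of these steps, sketched in Python (the class, the trie representation, and the handling of the else branch are our interpretation for illustration, not the thesis implementation):

    class Node:
        def __init__(self, count):
            self.count = count
            self.children = {}

    class LZWModeler:
        def __init__(self, alphabet):
            # The count of lambda (the root) starts at q = |S|;
            # every single symbol starts with a count of one.
            self.root = Node(len(alphabet))
            for symbol in alphabet:
                self.root.children[symbol] = Node(1)
            self.w = self.root                     # W: current position in the parse tree

        def encode_symbol(self, symbol):
            """Return P(symbol) in the context of W and advance the parse."""
            child = self.w.children.get(symbol)
            if child is not None:                  # the string W.symbol is already in the tree
                p = child.count / self.w.count
                self.w.count += 1
                child.count += 1
                self.w = child                     # keep extending the current string
            else:                                  # new string: add it as a son of W
                base = self.root.children[symbol]
                p = base.count / self.root.count   # charge the symbol at lambda
                self.root.count += 1
                base.count += 1
                self.w.children[symbol] = Node(1)
                self.w = self.root                 # restart the parse at lambda
            return p

For the example of figure 20, the probabilities would be produced by

    modeler = LZWModeler("abcd")
    probs = [modeler.encode_symbol(c) for c in "abaaacdababab"]

and the decoder, running the same modeler, stays synchronized because it sees exactly the same sequence of probabilities.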
Figure 20 shows the example of table 4 using such a modeler.

Figure 20. An example of converting the LZW string table into a modeler (S = { a, b, c, d }, s = abaaacdababab; the figure shows the counts before scanning, after scanning a, after scanning b, and after scanning the source s).

The beauty of such a modeler is that it converts a nonprobabilistic algorithm into a probabilistic one while preserving the same process of capturing redundancies. The major shortcoming of using such a modeler is the speed of both the encoding and decoding processes. Each time a symbol is encoded, we have to go through the whole process of calculating all the probabilities at one level. Since the LZW algorithm uses an adaptive modeler, the decoding unit simply uses the same modeler and the decoder described in chapter 2.
Chapter 4
COMPRESSION RESULTS AND ANALYSIS
Test Data and Testing Method
The test data for the universal coder with different modelers consisted of different types of redundancies: English text, gray-scale images, and floating point arrays. The floating point arrays we used are files which do not contain any "natural" redundancies. We use 0, 1 and 2 Taylor series approximations (predictors), otherwise no compression is expected no matter what code we use. Whenever a predictor is used, we use that particular predictor throughout one test of a particular file to compare the results.
As we have discussed in the previous chapter, building a model for a high-order Markov source is too expensive memory-wise. We test the compression of our universal coder and different modelers by compressing the same file with Huffman code and LZW code, and then with the universal coder with the zero-memory modeler and with the LZW modeler.

Table 5 shows the compression results of different codes with and without using the universal coding scheme. We can see the clear improvement in compression; the speed trade-off will be discussed in the section on model analysis.
Table 5. Compression results for a variety of data types.

                                           With our modeler and universal coder
    Data Type            Huffman    LZW    Zero-memory    LZW
    English text         25%        45%    33%            55%
    Floating arrays      5%         10%    28%            30%
    Gray-scale images    5%         35%    30%            50%
Code Analysis

The coding unit is based on the idea of the Elias code [3]. The code produced by the universal coder that we built can be called an arithmetic code, which in its original practical formulation is due to J. Rissanen [12]. We also constructed the Fast Arithmetic code [10], which is an implementation of arithmetic code for sources using binary alphabets and which uses no multiplication. The price paid for the resulting faster coding is a loss of at most 3% in compression performance. In practice, the Fast Arithmetic code may be a better code to use because it is about twice as fast as our universal code. However, the Fast Arithmetic code can only handle binary source alphabets, so for most cases the universal code has the clear advantage.
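For readers who have not seen the idea, the following Python fragment (illustrative only; it deliberately ignores the finite-precision and carry-over handling that our coder provides) shows the interval-narrowing step that an Elias-style arithmetic coder performs with the probabilities supplied by the modeler:

    def narrow(low, high, cum_before, p):
        # Shrink the current code interval [low, high) to the subinterval
        # owned by a symbol whose probability is p and whose cumulative
        # probability over all preceding symbols is cum_before.
        width = high - low
        return low + cum_before * width, low + (cum_before + p) * width

    # Encoding the string "ab" with P(a) = 0.7 and P(b) = 0.3:
    low, high = 0.0, 1.0
    low, high = narrow(low, high, 0.0, 0.7)    # symbol a -> [0.0, 0.7)
    low, high = narrow(low, high, 0.7, 0.3)    # symbol b -> [0.49, 0.7)
    # Any number inside the final interval identifies "ab"; its length, 0.21,
    # equals P(a) * P(b), which is why the code length approaches the entropy.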
There are also several advantages to using the universal code as opposed to other available codes. First, the universal code can be used with any model of the source. Other codes have historically been highly model dependent, and in fact the coding and the model are hard to separate. The universal code enables the designer of a data compression system to emphasize the modeling of the data. Modeling is where the difficulty in data compression really lies because of the variety of redundancies. The adaptability of the universal code allows the modular design of a data compression system. If any changes are made to the system, for example changing from floating point numbers to gray-scale images or from a nonadaptive approach to an adaptive one, only the modeler needs to be changed and not the coder. This flexibility allows for the design of one data compression system which is able to compress various types of data under various circumstances without the need of rebuilding the system for each type of data. It also allows models to use any type of source alphabet.

The most significant gain in the universal code is that it is a noninteger code and so can perform compression as low as the entropy. It is then easier for us to compare one model to another.
Model Analysis

Since our universal coder guarantees to generate codes with the shortest average code length, we can concentrate on finding appropriate models for particular sources. Remember the definition of the entropy: the best possible compression with the given probabilities. The word "given" is the key to a modeler. The several models we discussed in the previous chapter provide totally different kinds of probabilities.
It is not possible to conclude which is the best, but rather which is better for a particular source. Intuitively, it is safe to say that the LZW model works best for most sources because LZW works like a variable-order Markov process. Only the popular patterns are put in the string table. Before we study the best and worst case in data compression, let us consider a general information source.

Let the source alphabet S = { S1, S2, ..., Sq }, the information source s ∈ S*, and s is made up of w1, w2, ..., wi, ..., wm, where wi ∈ S and |s| = m.
The Best Case
The best case for any data compression system is when the file contains only the repetition of a single symbol of the source alphabet. First of all, let us look at how a nonadaptive model would do. In a zero-memory source or Huffman code, a single symbol Sx will have probability equal to one. Sx will be represented by one bit in Huffman code. The percentage compression of Huffman code by (2.2) is

    pcHuffman = 1 - [(m + number of bits in the model file) / (m * number of bits per source symbol)],

so

    pcHuffman is approximately 1 - (1 / number of bits per source symbol)

(for an 8-bit-per-symbol source this is roughly 87.5%).

By using the universal coder and the zero-memory modeler, the encoding unit should simply send the model file and the number of occurrences of symbol Sx to the decoder. This is exactly the way that a run-length code would handle the situation. Thus the compression is
even better than the Huffman code, though theoretically 0 bits are needed to encode such a source, because

    H(S) = 1 log 1 + (q - 1) * 0 log 0 = 0.

If we treat such a source as a Markov source, we should come up with the exact same result except for a bigger model file. This is because both the encoding and decoding processes stay in one state of the finite state machine, as in figures 8 and 9.
For adaptive models, the performance of both zero-memory sources and Markov sources is somewhat worse than that of the nonadaptive models in the best case. This is because we start off with equal probabilities for all symbols. It is interesting to see how the LZW modeler performs under this situation.
By the LZW algorithm, the source s = w1, w2, .., wi, .., wm, and s is partitioned into t1, .., ti, .., tr, where ti ∈ S*. The encoded file of s is the concatenation of the codes of t1, .., tr, that is to say C(s) = C(t1) C(t2) ... C(ti) ... C(tr). For the best case, wi = wj for i, j = 1, 2, ..., m. The partition is t1 = w1, t2 = w2w3, t3 = w4w5w6, and so on. For instance, S = { ASCII characters }, q = 128, and s = aaaaaaaaaaaaaaa, so |s| = 15. Then s is partitioned into a, aa, aaa, aaaa, aaaaa, so |t1| = 1, |t2| = 2, |t3| = 3, ..., and the code of s is C(s) = C(a) C(aa) C(aaa) C(aaaa) C(aaaaa). Thus s can be encoded by 5 codes. In general, if |s| = m and the number of codes needed to encode s is N, then

    r(r - 1) / 2 >= m >= (r - 1)(r - 2) / 2 and N = ⌈ sqrt(2m) ⌉.    (4.1)
If we use a 12-bit code as suggested in [4], the compressed file is about sqrt(2m) * 12 bits. By using the LZW modeler and the universal coder, we need (7 + 12) / 2 bits for each code on the average. In other words, we can save 2.5 bits per code, and the number of codes needed is sqrt(2m).
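A short Python check of these best-case figures (the file length m below is an arbitrary example; the 12-bit code is from [4] and the 9.5-bit average is the (7 + 12) / 2 figure above):

    import math

    m = 15000                                 # a file of one repeated symbol
    n_codes = math.ceil(math.sqrt(2 * m))     # equation (4.1)
    bits_lzw = 12 * n_codes                   # fixed 12-bit LZW codes
    bits_universal = 9.5 * n_codes            # LZW modeler plus universal coder
    print(n_codes, bits_lzw, bits_universal)  # 174 codes: 2088 versus 1653.0 bits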
The Worst Case
The worst case for any data compression system is a file with all the symbols and possible strings having exactly the same probabilities.
For nonadaptive models, p(Si) = p(Sj) for i, j = 1, 2, ..., q in a zero-memory source, and p(Sj | Yi) = p(Sj) for all j and i = 1, 2, ..., q. The best that the universal coder using a zero-memory modeler or a Markov source modeler can do is

    H(S) = q * (1/q) log q = log q,    (4.2)

    I(s) = m * H(S).    (4.3)

For q = 256 this is 8 bits per symbol, so an 8-bit-per-symbol file cannot be compressed at all. For adaptive models, equations (4.2) and (4.3) still hold because an adaptive model starts with equally probable symbols and the counts of symbols and strings never change much in the worst case.
The original LZW algorithm, however, produces a compressed file which is bigger than the source file because of the nature of its fixed-length code. This is the reason why the LZW algorithm gives negative compression (expansion) in some cases.
Conclusions

Different models produce different ratios of compression on different source files. However, the universal coding scheme described in this paper guarantees the shortest average code length for the given probabilities. More importantly, the universal coder can be used with any model of the source, and the conversion of most data compression systems into a modeler and a coder is quite easy and straightforward.
Nevertheless, using an adaptive model like LZW with the universal coder slows down the speed of the whole coding process by as much as 50 times compared with the original LZW code, while 10 to 20 percent extra compression is gained on the average. It is up to the user to decide if compression is more important than the speed of the coding process. This also leads to future research in speeding up the coding process for adaptive models, which are supposed to be faster than nonadaptive models.
Future Research

Future research in the whole area of data compression may be concentrated on modeling. Since no universal modeling technique can guarantee effective compression, it would be nice if there existed an analyzer which determines which modeler works best for any given source file.

Data compression systems which deal with digitized images complicate the problem even further because some images do not possess any "natural" redundancies. The modelers we look at in this paper capture only sequential redundancies, while in most images the value of a pixel depends on the surrounding pixels; predictors such as those described in [14] have to be used.
Furthermore, adaptive models using the universal coder perform quite a bit slower than nonadaptive models because of the recalculation of probabilities after each symbol has been scanned. However, parallel algorithms seem to fit very well with the probability calculations: speed is expected to increase dramatically if such algorithms are employed.
REFERENCES CITED
[1] Hamming, Richard W. Coding and Information Theory, Prentice-Hall, Englewood Cliffs, New Jersey, 1986.

[2] Rivest, R. L.; Shamir, A.; and Adleman, L. "On Digital Signatures and Public Key Cryptosystems", Communications of the ACM, Vol. 21, No. 2, February 1978, p. 267-284.

[3] Abramson, Norman. Information Theory and Coding, McGraw-Hill Book Company, New York, New York, 1963.

[4] Welch, Terry A. "A Technique for High-Performance Data Compression", IEEE Computer, June 1984, p. 8-19.

[5] Langdon, Glen G. "An Introduction to Arithmetic Coding", IBM J. Res. Develop., Vol. 28, No. 2, March 1984, p. 135-149.

[6] Rissanen, Jorma; and Langdon, Glen G. "Arithmetic Coding", IEEE Transactions on Information Theory, Vol. 27, No. 1, March 1979, p. 149-162.

[7] Campbell, Graham. "Two Bit/Pixel Full Color Encoding", ACM SIGGRAPH 1986 Proceedings, Vol. 20, No. 4, 1986, p. 127-135.

[8] Reghbati, K. "An Overview of Data Compression Techniques", IEEE Computer, April 1981, p. 71-75.

[9] Lempel, Abraham; and Ziv, Jacob. "On the Complexity of Finite Sequences", IEEE Transactions on Information Theory, Vol. 22, No. 1, January 1976, p. 75-81.

[10] Langdon, Glen G. "A Simple General Binary Source Code", IEEE Transactions on Information Theory, Vol. 28, No. 5, September 1982, p. 800-803.

[11] Langdon, Glen G.; and Rissanen, Jorma. "Compression of Black-White Images with Arithmetic Coding", IEEE Transactions on Information Theory, Vol. 29, No. 6, June 1981, p. 858-867.

[12] Rissanen, Jorma; and Langdon, Glen G. "Universal Modeling and Coding", IEEE Transactions on Information Theory, Vol. 27, No. 1, January 1981, p. 12-23.

[13] Rodeh, Michael; Pratt, Vaughan R.; and Even, Shimon. "Linear Algorithm for Data Compression via String Matching", Journal of the Association for Computing Machinery, Vol. 28, No. 1, January 1981, p. 16-24.

[14] Henry, Joel E. "Model Reduction and Predictor Selection in Image Compression", Master's thesis, Department of Computer Science, Montana State University, Bozeman, Montana, August 1986.
APPENDICES
APPENDIX A
Glossary of Symbols and Entropy Expression
General Symbols

S          source alphabet
Si         source symbol
q = |S|    number of source symbols
S^n        nth extension of the source alphabet
s          information source, an element of S*
|s|        size of an information source
C(Si)      code of the source symbol Si
C(s)       code of the information source s
X          code alphabet
Xi         code symbol
r          number of code symbols
li         number of code symbols in the code word Xi corresponding to Si
L          average length of a code word for S
Ln         average length of a code word for S^n

Entropy Expressions

L = sum over i = 1, ..., q of |C(Si)| * P(Si)        the average code length

I(Si) = - log P(Si)        the amount of information obtained when Si is received

H(S) = - sum over S of P(Si) log P(Si)        entropy of the zero-memory source S

I(s) = |s| * H(S)        amount of information of an information source
APPENDIX B
Definitions of codes
Figure 21. Subclasses of codes (block and nonblock codes; block codes divide into singular and nonsingular, nonsingular into uniquely decodable and not uniquely decodable, and uniquely decodable into instantaneous and not instantaneous).
Definitions:
A block code is a code which maps each of the symbols of the source alphabet S into a fixed sequence of symbols of the code alphabet X.

A block code is said to be nonsingular if all the words of the code are distinct.

The nth extension of a block code which maps the symbols Si into the code words Xi is the block code which maps the sequences of source symbols (Si1 Si2 ... Sin) into the sequences of code words (Xi1 Xi2 ... Xin).

A block code is said to be uniquely decodable if, and only if, the nth extension of the code is nonsingular for every finite n.

A uniquely decodable code is said to be instantaneous if it is possible to decode each word in a sequence without reference to succeeding code symbols.
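For illustration only (these examples are not in the original thesis), take the code alphabet X = { 0, 1 }: the code { 0, 10, 110, 111 } is instantaneous; the code { 0, 01, 011, 0111 } is uniquely decodable but not instantaneous, because the decoder must look at the next symbol to know where a word ends; and the code { 0, 1, 00, 11 } is nonsingular but not uniquely decodable, since the received string 00 could be one code word or two.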