IMAGE COMPRESSION

UNIVERSITY OF JOENSUU
Department of Computer Science
Lecture notes:
IMAGE COMPRESSION
Pasi Fränti
Abstract: The course introduces image compression methods for binary, gray-scale, color-palette, true-color and video images. Topics include Huffman coding, arithmetic coding, Golomb coding, run-length modeling, predictive modeling, statistical modeling, context modeling, progressive image decompression, transform-based modeling, wavelets, vector quantization, and fractal-based compression. Existing and forthcoming standards such as JBIG1, JBIG2, JPEG, JPEG-LS and JPEG-2000 will be covered. The main emphasis is on the compression algorithms in these standards.
Joensuu
9.9.2002
TABLE OF CONTENTS:
IMAGE COMPRESSION
"A picture takes more than thousand bytes"
1. INTRODUCTION ... 3
1.1 IMAGE TYPES ... 3
1.2 LOSSY VERSUS LOSSLESS COMPRESSION ... 4
1.3 PERFORMANCE CRITERIA IN IMAGE COMPRESSION ... 6
2 FUNDAMENTALS IN DATA COMPRESSION ... 8
2.1 MODELLING ... 8
2.2 CODING ... 16
3 BINARY IMAGES ... 26
3.1 RUN-LENGTH CODING ... 26
3.2 READ CODE ... 29
3.3 CCITT GROUP 3 AND GROUP 4 STANDARDS ... 30
3.4 BLOCK CODING ... 31
3.5 JBIG ... 33
3.6 JBIG2 ... 36
3.7 SUMMARY OF BINARY IMAGE COMPRESSION ALGORITHMS ... 37
4 CONTINUOUS TONE IMAGES ... 39
4.1 LOSSLESS AND NEAR-LOSSLESS COMPRESSION ... 39
4.2 BLOCK TRUNCATION CODING ... 49
4.3 VECTOR QUANTIZATION ... 51
4.4 JPEG ... 60
4.5 WAVELET ... 70
4.6 FRACTAL CODING ... 73
5 VIDEO IMAGES ... 80
LITERATURE ... 83
APPENDIX A: CCITT TEST IMAGES ... 86
APPENDIX B: GRAY-SCALE TEST IMAGES ... 87
1. Introduction
The purpose of compression is to code the image data into a compact form, minimizing both the number of bits in the representation and the distortion caused by the compression. The importance of image compression is emphasized by the huge amount of data in raster images: a typical gray-scale image of 512×512 pixels, each represented by 8 bits, contains 256 kilobytes of data. With the color information the amount of data is tripled. For video at 25 frames per second, even one second of color film requires approximately 19 megabytes of memory, so a typical PC hard disk (540 MB) can store only about 30 seconds of film. Thus, the necessity for compression is obvious.
There exist a number of universal data compression algorithms that can compress almost any kind of data, of which the best known are the family of Ziv-Lempel algorithms. These methods are lossless in the sense that they retain all the information of the compressed data. However, they do not take advantage of the 2-dimensional nature of image data. Moreover, only a small portion of the data can be saved by a lossless compression method, and thus lossy methods are more widely used in image compression. The use of lossy compression is always a trade-off between the bit rate and the image quality.
1.1 Image types
From the compression point of view, the images can be classified as follows:
- binary images
- gray-scale images
- color images
- video images
An illustration of this classification is given in Figure 1.1. Note that the groups of gray-scale, color, and video images are closely related to each other, but there is a gap between the binary images and the gray-scale images. This reflects the corresponding separation of the compression algorithms. The methods that are designed for gray-scale images can also be applied to color and video images. However, they usually do not apply to binary images, which are a distinct class of images from this point of view.
For comparison, Figure 1.1 also shows the class of textual data. The fundamental difference between images and e.g. English text is the 2-dimensionality of the image data. Another important property is that the gray scales form an ordered, numeric alphabet, which is not true for English text. It is not evident that any two subsequent symbols, e.g. 'a' and 'b', are close to each other, whereas the gray scales 41 and 42 clearly are. These properties distinguish image data from other data like English text.
Note also that the class of color-palette images appears on the borderline between image data and non-image data. This reflects the lack of an ordered alphabet in color-palette images, which makes them closer to other data. In fact, color-palette images are often compressed by universal compression algorithms, see Section 4.7.
[Figure 1.1 diagram: the classes binary images, gray-scale images, true colour images, colour palette images, video images and textual data, together with universal compression.]
Figure 1.1: Classification of the images from the compression point of view.
1.2 Lossy versus lossless compression
A compression algorithm is lossless (or information preserving, or reversible) if the decompressed image is identical with the original. Correspondingly, a compression method is lossy (or irreversible) if the reconstructed image is only an approximation of the original one. The information-preserving property is usually desirable, but it is not obligatory for all applications.
The motivation for lossy compression originates from the inability of the lossless algorithms
to produce as low bit rates as desired. Figure 1.2 illustrates typical compression performances
for different types of images and types of compression. As one can see from the example, the
situation is significantly different with binary and gray-scale images. In binary image
compression, very good compression results can be achieved without any loss in the image
quality. On the other hand, the results for gray-scale images are much less satisfactory. This
deficiency is emphasized because of the large amount of the original image data when
compared to a binary image of equal resolution.
[Figure 1.2 data, compressed size as a percentage of the original:
IMAGE: CCITT-3, TYPE: binary, METHOD: JBIG (lossless): 6.7 %
IMAGE: LENA, TYPE: gray-scale, METHOD: JPEG (lossless): 53.4 %
IMAGE: LENA, TYPE: gray-scale, METHOD: JPEG (lossy): 4.3 %]
Figure 1.2: Example of typical compression performance.
The fundamental question of lossy compression techniques is where to lose information. The simplest answer is that information should be lost wherever the distortion is least. This, of course, depends on how we define distortion. We will return to this matter in more detail in Section 1.3.
The primary use of images is for human observation. Therefore it is possible to take advantage
of the limitations of the human visual system and lose some information that is less visible to
the human eye. On the other hand, the desired information in an image may not always be
seen by human eye. To discover the essential information, an expert in the field and/or image
processing and analysis may be needed, cf. medical applications.
In the definition of lossless compression, it is assumed that the original image is in digital form. However, one must always keep in mind that the actual source may be an analog view of the real world. Therefore a loss in image quality already takes place in the image digitization, where the picture is converted from an analog signal to digital form. This can be performed by an image scanner, a digital camera, or any other suitable technique.
The principal parameters of digitization are the sampling rate (or scanning resolution), and
the accuracy of the representation (bits per sample). The resolution (relative to the viewing
distance), on the other hand, is dependent on the purpose of the image. One may want to
watch the image as an entity, but the observer may also want to enlarge (zoom) the image to
see the details. The characteristics of the human eye cannot therefore be utilized, unless the
application in question is definitely known.
Here we will ignore the digitization phase and assume that the images are already stored in digital form. These matters, however, should not be ignored when designing the entire image processing application. It is still worth mentioning that while the lossy methods seem to be the mainstream of research, there is still a need for lossless methods, especially in medical imaging and remote sensing (i.e. satellite imaging).
1.3 Performance criteria in image compression
The aim of image compression is to transform an image into compressed form so that the
information content is preserved as much as possible. Compression efficiency is the principal
parameter of a compression technique, but it is not sufficient by itself. It is simple to design a
compression algorithm that achieves a low bit rate, but the challenge is how to preserve the
quality of the reconstructed image at the same time. The two main criteria of measuring the
performance of an image compression algorithm thus are compression efficiency and
distortion caused by the compression algorithm. The standard way to measure them is to fix a
certain bit rate and then compare the distortion caused by different methods.
The third feature of importance is the speed of the compression and decompression process. In
on-line applications the waiting times of the user are often critical factors. In the extreme case,
a compression algorithm is useless if its processing time causes an intolerable delay in the
image processing application. In an image archiving system one can tolerate longer
compression times if the compression can be done as a background task. However, fast
decompression is usually desired.
Among other interesting features of compression techniques we may mention the robustness against transmission errors and the memory requirements of the algorithm. The compressed image file is normally an object of a data transmission operation. The transmission is in the simplest form between internal memory and secondary storage, but it can as well be between two remote sites via transmission lines. Data transmission systems commonly contain fault-tolerant internal data formats, so this property is not always obligatory. The memory requirements are often of secondary importance; however, they may be a crucial factor in hardware implementations.
From the practical point of view the last but often not the least feature is complexity of the
algorithm itself, i.e. the ease of implementation. Reliability of the software often highly
depends on the complexity of the algorithm. Let us next examine how these criteria can be
measured.
Compression efficiency
The most obvious measure of the compression efficiency is the bit rate, which gives the
average number of bits per stored pixel of the image:
\text{bit rate} = \frac{\text{size of the compressed file}}{\text{pixels in the image}} = \frac{C}{N} \quad \text{(bits per pixel)} \qquad (1.1)
where C is the number of bits in the compressed file, and N (=XY) is the number of pixels in
the image. If the bit rate is very low, compression ratio might be a more practical measure:
\text{compression ratio} = \frac{\text{size of the original file}}{\text{size of the compressed file}} = \frac{N \cdot k}{C} \qquad (1.2)
where k is the number of bits per pixel in the original image. The overhead information
(header) of the files is ignored here.
Distortion
Distortion measures can be divided into two categories: subjective and objective measures.
A distortion measure is said to be subjective, if the quality is evaluated by humans. The use of
human analysts, however, is quite impractical and therefore rarely used. The weakest point of
this method is the subjectivity at the first place. It is impossible to establish a single group of
humans (preferably experts in the field) that everyone could consult to get a quality evaluation
of their pictures. Moreover, the definition of distortion highly depends on the application, i.e.
the best quality evaluation is not always made by people at all.
In the objective measures the distortion is calculated as the difference between the original and
the reconstructed image by a predefined function. It is assumed that the original image is
perfect. All changes are considered as occurrences of distortion, no matter how they appear to
a human observer. The quantitative distortion of the reconstructed image is commonly
measured by the mean absolute error (MAE), mean square error (MSE), and peak-to-peak
signal to noise ratio (PSNR):
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - x_i \right| \qquad (1.3)

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - x_i \right)^2 \qquad (1.4)

\mathrm{PSNR} = 10 \cdot \log_{10} \frac{255^2}{\mathrm{MSE}}, \quad \text{assuming } k=8. \qquad (1.5)
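The following Python sketch computes the three objective measures for two equally sized pixel sequences (a real image would simply be flattened into such a sequence); the pixel values below are invented for illustration.

import math

def mae(x, y):
    """Mean absolute error (1.3) between original x and reconstruction y."""
    return sum(abs(b - a) for a, b in zip(x, y)) / len(x)

def mse(x, y):
    """Mean square error (1.4)."""
    return sum((b - a) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, peak=255):
    """Peak signal-to-noise ratio (1.5) in decibels, assuming k=8 so peak=255."""
    return 10 * math.log10(peak ** 2 / mse(x, y))

original      = [10, 12, 11, 250, 13]
reconstructed = [11, 12, 10, 245, 13]
print(mae(original, reconstructed))    # 1.4
print(mse(original, reconstructed))    # 5.4
print(psnr(original, reconstructed))   # about 40.8 dB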
These measures are widely used in the literature. Unfortunately these measures do not always
coincide with the evaluations of a human expert. The human eye, for example, does not
observe small changes of intensity between individual pixels, but is sensitive to the changes in
the average value and contrast in larger regions. Thus, one approach would be to calculate the
mean values and variances of some small regions in the image, and then compare them
between the original and the reconstructed image. Another deficiency of these distortion
functions is that they measure only local, pixel-by-pixel differences, and do not consider
global artifacts, like blockiness, blurring, or the jaggedness of the edges.
2 Fundamentals in data compression
Data compression can be seen as consisting of two separate components: modelling and coding. The modelling in image compression addresses the following issues:
- How, and in what order, is the image processed?
- What are the symbols (pixels, blocks) to be coded?
- What is the statistical model of these symbols?
The coding consists merely of the selection of the code table: which codes will be assigned to the symbols to be coded. The code table should match the statistical model as well as possible to obtain the best possible compression. The key idea of the coding is to apply variable-length codes so that more frequent symbols are coded with fewer bits than the less frequent symbols. The only requirement of coding is that it is uniquely decodable, i.e. any two different input files must result in different code sequences.
A desirable (but not necessary) property of a code is the so-called prefix property; i.e. no code
of any symbol can be a prefix of the code of another symbol. The consequence of this is that
the codes are instantaneously decodable; i.e. a symbol can be recognized from the code
stream right after its last bit has been received. Well-known prefix codes are Shannon-Fano
and Huffman codes. They can be constructed empirically on the basis of the source. Another
coding scheme, known as Golomb-Rice codes, is also a prefix code, but it presumes a certain
distribution of the source.
The coding is usually considered the easy part of compression. This is because the coding can be done optimally (corresponding to the model) by arithmetic coding! It is optimal not only in theory but also in practice, no matter what the source is. Thus the performance of a compression algorithm depends on the modelling, which is the key issue in data compression. Arithmetic coding, on the other hand, is sometimes replaced by sub-optimal codes like Huffman coding (or another coding scheme) because of practical aspects, see Section 2.2.2.
2.1 Modelling
2.1.1 Segmentation
The models of most lossless compression methods (for both binary and gray-scale images) are local in the way they process the image. The image is traversed pixel by pixel (usually in row-major order), and each pixel is separately coded. This makes the model relatively simple and practical (small memory requirements). On the other hand, the compression schemes are limited to the local characteristics of the image.
In the other extreme there are global modelling methods. Fractal compression techniques are
an example of modelling methods of this kind. They decompose the image into smaller parts
which are described as linear combinations of the other parts of the image, see Figure 2.1. The
global modelling is somewhat impractical because of the computational complexity of the
methods.
Block coding is a compromise between the local and global models. Here the image is
decomposed into smaller blocks which are separately coded. The larger the block, the better
the global dependencies can be utilized. The dependencies between different blocks, however,
are often ignored. The shape of the block can be uniform, and is often fixed throughout the
image. The most common shapes are square and rectangular blocks because of their practical
usefulness. In quadtree decomposition the shape is fixed but the size of the block varies.
Quadtree thus offers a suitable technique to adapt to the shape of the image with the cost of a
few extra bits describing the structure of the tree.
In principle, any segmentation technique can be applied in block coding. The use of more
complex segmentation techniques is limited because they are often computationally
demanding, but also because of the overhead required to code the block structure; as the shape
of the segmentation is adaptively determined, it must be transmitted to the decoder also. The
more complex the segmentation, the more bits are required. The decomposition is a trade-off
between the bit rate and good segmentation:
Simple segmentation:
+ Only small cost in bit rate, if any.
+ Simple decomposition algorithm.
- Poor segmentation.
- Coding of the blocks is a key issue.

Complex segmentation:
- High cost in the bit rate.
- Computationally demanding decomposition.
+ Good segmentation according to image shape.
+ Blocks are easier to code.
"pattern"
"shape"
Figure 2.1: Intuitive idea of global modelling.
2.1.2 Order of processing
The next questions after the block decomposition are:
- In what order are the blocks (or the pixels) of the image processed?
- In what order are the pixels inside a block processed?
In block coding methods the first question is interesting only if the inter-block correlations
(dependencies between the blocks) will be considered. In pixel-wise processing, on the other
hand, it is essential since the local modelling is inefficient without taking advantage of the
information of the neighboring pixels. The latter topic is relevant for example in coding the
transformed coefficients of DCT.
The most common order of processing is row-major order (top-to-bottom, and from left-to-right). If a particular compression algorithm considers only 1-dimensional dependencies (e.g. Ziv-Lempel algorithms), an alternative processing method would be the so-called zigzag scanning in order to utilize the two-dimensionality of the image data, see Figure 2.2.
A drawback of the top-to-bottom processing order is that the image is only partially "seen" during the decompression. Thus, after decompressing 10 % of the image pixels, little is known about the rest of the image. A quick overview of the image, however, would be convenient for example in image archiving systems where the image database is browsed often to retrieve a desired image. Progressive modelling is an alternative ordering of the image that avoids this deficiency.
The idea in the progressive modelling is to arrange for the quality of an image to increase
gradually as data is received. The most "useful" information in the image is sent first, so that
the viewer can begin to use the image before it is completely displayed, and much sooner than
if the image were transmitted in normal raster order. There are three basically different ways
to achieve progressive transmission:
- In transform coding, the low-frequency components of the blocks are transmitted first.
- In vector quantization, one begins with a limited palette of colors and gradually provides more information so that color details increase with time.
- In pyramid coding, a low-resolution version of the image is transmitted first, followed by gradually increasing resolutions until full precision is reached, see Figure 2.3 for an example.
These progressive modes of operation will be discussed in more detail in Section 4.
Figure 2.2: Zigzag scanning; (a) in pixel-wise processing; (b) in DCT-transformed block.
Figure 2.3: Early stages of transmission (0.1 %, 0.5 %, 2.1 %, 8.3 % of the data) of the image Camera (256×256×8); in sequential order (above); in progressive order (below).
2.1.3 Statistical modelling
Data compression in general is based on the following abstraction:
Data = information content + redundancy
(2.1)
The aim of compression is to remove the redundancy and describe the data by its information content. (An observant reader may notice some redundancy between the String Algorithms course and this course.) In statistical modelling the idea is to "predict" the symbols that are to
be coded by using a probability distribution for the source alphabet. The information content
of a symbol in the alphabet is determined by its entropy:
H(x) = -\log_2 p(x) \qquad (2.2)
where x is the symbol and p x is its probability. The higher the probability, the lower the
entropy, and thus the shorter codeword should be assigned to the symbol. The entropy H x
gives the number of bits required to code the symbol x in an average, in order to achieve the
optimal result. The overall entropy of the probability distribution is given by:
H = -\sum_{x=1}^{k} p(x) \log_2 p(x) \qquad (2.3)
where k is the number of symbols in the alphabet. The entropy gives the lower bound of compression that can be achieved (measured in bits per symbol), corresponding to the model. (The optimal compression can be realized by arithmetic coding.) The key issue is how to determine these probabilities. The modelling schemes can be classified into the following three categories:
- Static modelling
- Semi-adaptive modelling
- Adaptive (or dynamic) modelling
In static modelling the same model (code table) is applied to all input data ever to be coded. Consider text compression; if the ASCII data to be compressed is known to consist of English text, a model based on the frequency distribution of English text can be applied. For example, the probabilities of the most likely symbols in an ASCII file of English text are on average p(' ')=18 %, p('e')=10 %, p('t')=8 %. Unfortunately static modelling fails if the input data is not English text but, say, the binary data of an executable file. The advantage of static modelling is that no side information needs to be transmitted to the decoder, and that the compression can be done in one pass over the input data.
Semi-adaptive modelling is a two-pass method in the sense that the input data is processed twice. In the first pass the input data is analyzed and some statistical information about it (or the code table) is sent to the decoder. In the second pass the actual compression is done on the basis of this information (e.g. the frequency distribution of the data), which is now known by the decoder, too.
Dynamic modelling takes one step further and adapts to the input data "on-line" during the compression. It is thus a one-pass method. As the decoder does not have any prior knowledge of the input data, an initial model (code table) must be used for compressing the first symbols of the data. However, as the coding/decoding proceeds, the information from the symbols already coded/decoded can be exploited. The model for a particular symbol to be coded can be constructed on the basis of the frequency distribution of all the symbols that have already been coded. Thus both encoder and decoder have the same information, and no side information needs to be sent to the decoder.
Consider the symbol sequence of (a, a, b, a, a, c, a, a, b, a). If no prior knowledge is allowed,
one could apply the static model given in Table 2.1. In semi-adaptive modelling the frequency
distribution of the input data is calculated. The probability model is then constructed on the
basis of the relative frequencies of these symbols, see Table 2.1. The entropy of the input data
is:
Static model:

H = \frac{1}{10}\left(1.58 + 1.58 + \ldots + 1.58\right) = 1.58 \qquad (2.4)

Semi-adaptive model:

H = \frac{1}{10}\left(0.51 + 0.51 + 2.32 + 0.51 + 0.51 + 3.32 + 0.51 + 0.51 + 2.32 + 0.51\right) = \frac{7}{10} \cdot 0.51 + \frac{2}{10} \cdot 2.32 + \frac{1}{10} \cdot 3.32 = 1.16 \qquad (2.5)
Table 2.1: Example of the static and semi-adaptive models.

STATIC MODEL:
symbol   p(x)    H
a        0.33    1.58
b        0.33    1.58
c        0.33    1.58

SEMI-ADAPTIVE MODEL:
symbol   count   p(x)    H
a        7       0.70    0.51
b        2       0.20    2.32
c        1       0.10    3.32
In dynamic modelling the input data is processed as shown in Table 2.2. In the beginning, some initial model is needed since no prior knowledge of the input data is allowed. Here we assume equal probabilities. The probability of the first symbol ('a') is thus 0.33 and the corresponding entropy 1.58. After that, the model is updated by increasing the count of the symbol 'a' by one. Note that it is assumed that each symbol has occurred once before the processing, i.e. their initial counts equal 1. (This is for avoiding the so-called zero-frequency problem: if a symbol had no occurrences, its probability would be 0.00, yielding an entropy of infinity.)
At the second step, the symbol 'a' is again processed. Now the modified frequency distribution gives the probability 2/4 = 0.50 for the symbol 'a', resulting in an entropy of 1.00. As the coding proceeds, more accurate approximations of the probabilities are obtained, and in the final step the symbol 'a' has the probability 7/12 = 0.58, resulting in an entropy of 0.78. The sum of the entropies of the coded symbols is 14.5 bits, yielding an overall entropy of 1.45 bits per symbol.
The corresponding entropies of the different modelling strategies are summarized here:
Static modelling: 1.58 (bits per symbol)
Semi-adaptive modelling: 1.16
Dynamic modelling: 1.45
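The three figures above can be verified with a short Python sketch that codes the example sequence (a, a, b, a, a, c, a, a, b, a) under each modelling strategy and accumulates -log2 p for every symbol:

import math

sequence = list("aabaacaaba")          # the example sequence (a,a,b,a,a,c,a,a,b,a)
alphabet = ["a", "b", "c"]

# Static model: equal probabilities for the three symbols.
static = sum(-math.log2(1 / 3) for s in sequence) / len(sequence)

# Semi-adaptive model: probabilities from the actual frequencies of the sequence.
counts = {s: sequence.count(s) for s in alphabet}
semi = sum(-math.log2(counts[s] / len(sequence)) for s in sequence) / len(sequence)

# Dynamic model: initial count 1 for every symbol, updated after each coded symbol.
dyn_counts = {s: 1 for s in alphabet}
total_bits = 0.0
for s in sequence:
    p = dyn_counts[s] / sum(dyn_counts.values())
    total_bits += -math.log2(p)
    dyn_counts[s] += 1
dynamic = total_bits / len(sequence)

print(round(static, 2), round(semi, 2), round(dynamic, 2))   # 1.58 1.16 1.45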
Table 2.2: Example of the dynamic modelling. The numbers are the frequencies of the symbols.

step:    1.    2.    3.    4.    5.    6.    7.    8.    9.    10.
a        1     2     3     3     4     5     5     6     7     7
b        1     1     1     2     2     2     2     2     2     3
c        1     1     1     1     1     1     2     2     2     2
p(x)     0.33  0.50  0.20  0.50  0.57  0.13  0.56  0.60  0.18  0.58
H        1.58  1.00  2.32  1.00  0.81  3.00  0.85  0.74  2.46  0.78
It should be noted that in semi-adaptive modelling the information used in the model must also be sent to the decoder, which increases the overall bit rate. Moreover, dynamic modelling is inefficient in the early stage of the processing, but it quickly improves its performance as more symbols are processed and thus more information can be used in the modelling. The result of the dynamic model therefore gets much closer to that of the semi-adaptive model when the source is longer. Here the example was too short for the model to have had enough time to adapt to the input data.
The properties of the different modelling strategies are summarized as follows:
Static modelling:
+ One-pass method
+ No side information
- Non-adaptive
+ No updating of model during compression

Semi-adaptive modelling:
- Two-pass method
- Side information needed
+ Adaptive
+ No updating of model during compression

Dynamic modelling:
+ One-pass method
+ No side information
+ Adaptive
- Updating of model during compression
Context modelling:
So far we have considered only the overall frequency distribution of the source, but paid no
attention to the spatial dependencies between the individual pixels. For example, the
intensities of neighboring pixels are very likely to have strong correlation between each other.
Once again, consider ASCII data of English text, when the first five symbols have already
been coded: "The_q...". The frequency distribution of the letters in English would suggest that
the following letter would be blank with the probability of 18 %, or the letter 'e' (10 %), 't'
(8 %), or any other letter with decreasing probabilities. However, by consulting a dictionary under the letter Q, it can be found that more than 99 % of the words have the letter 'u' following the letter 'q' (e.g. quadtree, quantize, quality). Thus, the probability distribution highly depends on the context in which the symbol occurs. The solution is to use not only one but several different models, one for each context. The entropy of an N-level context model is the weighted sum of the entropies of the individual contexts:
H_N = \sum_{j=1}^{N} p(c_j) \left( -\sum_{i=1}^{k} p(x_i \mid c_j) \log_2 p(x_i \mid c_j) \right) \qquad (2.6)
where p x c is the probability of symbo x in a context c, and N is the number of different
contexts. In the image compression, the context is usually the value of the previous pixel, or
the values of two or more neighboring pixels to the west, north, northwest, and northeast of
the current pixel. The only limitation is that the pixels within the context template must have
been compressed and thus seen by the decoder, so that both the encoder and decoder have the
same information.
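As an illustration of equation (2.6), the sketch below computes the entropy of a small two-context model; the context and symbol probabilities are invented purely for the example.

import math

# Per-context symbol distributions p(x|c) and context probabilities p(c); the
# numbers below are hypothetical and serve only to illustrate the formula.
context_prob = {"uniform area": 0.8, "edge": 0.2}
symbol_prob = {
    "uniform area": {"white": 0.95, "black": 0.05},
    "edge":         {"white": 0.60, "black": 0.40},
}

def context_model_entropy(context_prob, symbol_prob):
    """Entropy of an N-level context model, equation (2.6): the weighted sum of
    the entropies of the individual contexts."""
    total = 0.0
    for c, pc in context_prob.items():
        h_c = -sum(p * math.log2(p) for p in symbol_prob[c].values() if p > 0)
        total += pc * h_c
    return total

print(context_model_entropy(context_prob, symbol_prob))   # about 0.42 bits per pixel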
The number of contexts equals the number of possible combinations of the neighboring pixels that are present in the context template. A 4-pixel template is given in Figure 2.4. The number of contexts increases exponentially with the number of pixels in the template. With one pixel in the template, the number of contexts is 2^8 = 256, but with two pixels the number is already 2^(2·8) = 65536, which is rather impractical because of the high memory requirement. One must also keep in mind that in semi-adaptive modelling all models must be sent to the decoder, so the question is crucial also for the compression performance.
Figure 2.4: Example of a four-pixel context template (the pixels NW, N, NE, and W of the current pixel).
A solution to this problem is to quantize the pixel values within the template to reduce the number of possible combinations. For example, by quantizing the values to 4 bits each, the total number of contexts in a one-pixel template is 2^4 = 16, and in a two-pixel template 2^(2·4) = 256. (Note that the quantization is performed only in the computer memory in order to determine which context model should be used; the original pixel values in the compressed file are untouched.)
Table 2.3: The number of contexts as a function of the number of pixels in the template.

Pixels within the template   No. of contexts   No. of contexts if quantized to 4 bits
1                            256               16
2                            65536             256
3                            16·10^6           4096
4                            4·10^9            65536
Predictive modelling:
Predictive modelling consists of the following three components:
- Prediction of the current pixel value.
- Calculating the prediction error.
- Modelling the error distribution.
The value of the current pixel x is predicted on the basis of the pixels that have already been coded (and thus seen by the decoder too). Referring to the neighboring pixels as in Figure 2.4, a possible predictor is:

\hat{x} = \frac{x_W + x_N}{2} \qquad (2.7)
The prediction error is the difference between the original and the predicted pixel values:

e = x - \hat{x} \qquad (2.8)
The prediction error is then coded instead of the original pixel value. The probability distribution of the prediction errors is concentrated around zero, while errors of large magnitude (in either direction) rarely appear; the distribution thus resembles a Gaussian (normal) distribution whose only parameter is the variance, see Figure 2.5. Now even a static model can be applied to estimate the probabilities of the errors. Moreover, the use of context modelling is no longer necessary. Methods of this kind are sometimes referred to as differential pulse code modulation (DPCM).
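A minimal sketch of this predictive (DPCM-style) modelling is given below, using the predictor (2.7) and the error (2.8); the tiny image, the use of integer division, and the handling of border pixels are only illustrative assumptions.

def predict(west, north):
    """Predictor (2.7): average of the west and north neighbors (integer division)."""
    return (west + north) // 2

def prediction_errors(image):
    """Prediction errors (2.8) for a small gray-scale image given as a list of rows.
    Border pixels, which lack a west or north neighbor, are skipped for simplicity."""
    errors = []
    for r in range(1, len(image)):
        for c in range(1, len(image[r])):
            predicted = predict(image[r][c - 1], image[r - 1][c])
            errors.append(image[r][c] - predicted)
    return errors

# A made-up 3x4 image; in smooth regions the errors cluster around zero.
image = [[100, 101, 102, 104],
         [101, 102, 103, 105],
         [103, 103, 104, 106]]
print(prediction_errors(image))   # [1, 1, 2, 1, 1, 2]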
Figure 2.5: Probability distribution function of the prediction errors.
2.2 Coding
As stated earlier, coding is considered the easy part of compression. Good coding methods have been known for decades, e.g. Huffman coding (1952). Moreover, arithmetic coding (1979) is well known to be optimal with respect to the model. Let us next study these methods.
2.2.1 Huffman coding
The Huffman algorithm creates a code tree (called a Huffman tree) on the basis of the probability distribution. The algorithm starts by creating for each symbol a leaf node containing the symbol and its probability, see Figure 2.6a. The two nodes with the smallest probabilities become siblings under a parent node, which is given a probability equal to the sum of its two children's probabilities, see Figure 2.6b. The combining operation is repeated, choosing the two nodes with the smallest probabilities and ignoring nodes that are already children. For example, at the next step the new node formed by combining a and b is joined with the node for c to make a new node with probability p=0.2. The process continues until there is only one node without a parent, which becomes the root of the tree, see Figure 2.6c.
The two branches from every non-leaf node are then labelled 0 and 1 (the order is not important). The actual codes of the symbols are obtained by traversing the tree from the root to the corresponding leaf nodes; the codeword is the concatenation of the branch labels along the path from the root to the leaf node, see Table 2.4.
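The tree construction can be sketched in a few lines of Python using a priority queue; with equal probabilities the ties may be broken differently than in Figure 2.6, so the exact codewords can differ from Table 2.4 even though the result is still a valid Huffman code.

import heapq
from itertools import count

def huffman_codes(probabilities):
    """Build a Huffman code table from a symbol -> probability mapping.
    The counter only keeps heap items comparable when probabilities are equal."""
    tie = count()
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # the two nodes with the smallest
        p2, _, codes2 = heapq.heappop(heap)   # probabilities become siblings...
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))  # ...under a parent node
    return heap[0][2]

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
print(huffman_codes(probs))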
Figure 2.6: Constructing the Huffman tree: (a) leaf nodes; (b) combining nodes;
(c) the complete Huffman tree.
Table 2.4. Example of symbols, their probabilities, and the corresponding Huffman codes.

Symbol   Probability   Codeword
a        0.05          0000
b        0.05          0001
c        0.1           001
d        0.2           01
e        0.3           10
f        0.2           110
g        0.1           111
2.2.2 Arithmetic coding
Arithmetic coding is known to be the optimal coding method with respect to the model. Moreover, it is extremely suitable for dynamic modelling, since there is no actual code table to be updated as in Huffman coding. The deficiency of Huffman coding is emphasized in the case of binary images. Consider binary images having the probability 0.99 for a white pixel and 0.01 for a black pixel. The entropy of the white pixel is -log2 0.99 = 0.015 bits. However, in Huffman coding the code length is bounded to be at least 1 bit per symbol. In fact, the only possible code for a binary alphabet is one bit for each symbol, thus no compression would be possible.
Let us next consider the fundamental properties of binary arithmetic. With n bits, at most 2^n different combinations can be represented; in other words, with n bits the code interval between zero and one can be divided into 2^n parts, each having the length 2^-n, see Figure 2.7. Let us assume A is a power of ½, that is, A = 2^-n. From the opposite point of view, an interval of length A can be coded using -log2 A bits (assuming A is a power of ½).
Figure 2.7: The interval [0, 1] is divided into 8 parts, each having the length 2^-3 = 0.125. Each interval can then be coded using -log2 0.125 = 3 bits.
The basic idea of arithmetic coding is to represent the entire input file as a small interval within the range [0, 1]. The actual code is the binary representation of this interval, taking -log2 A bits, where A is the length of the interval. In other words, arithmetic coding represents the input file with a single codeword.
Arithmetic coding starts by dividing the interval into subintervals according to the probability distribution of the source. The length of each subinterval equals the probability of the corresponding symbol. Thus, the sum of the lengths of all subintervals equals 1, filling the range [0, 1] completely. Consider the probability model of Table 2.1. Now the interval is divided into the subintervals [0.0, 0.7], [0.7, 0.9], and [0.9, 1.0], corresponding to the symbols a, b, and c, see Figure 2.8.
The first symbol in the sequence (a, a, b, ...) to be coded is 'a'. The coding proceeds by taking the subinterval of the symbol 'a', that is [0, 0.7]. This interval is then again split into three subintervals so that the length of each subinterval is relative to its probability. For example, the subinterval for the symbol 'a' is now of length 0.7 × 0.7 = 0.49, i.e. [0, 0.49]; for the symbol 'b' it is of length 0.2 × 0.7 = 0.14, placed above the first part, i.e. [0.49, 0.63]. The last subinterval, for the symbol 'c', is [0.63, 0.7]. The next symbol to be coded is 'a', so the next interval will be [0, 0.49].
Figure 2.8: Example of arithmetic coding of sequence 'aab' using the model of Table 2.1.
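The interval narrowing of Figure 2.8 can be followed with the small Python sketch below (floating-point arithmetic only, so the printed bounds may differ from the exact values in the last decimals):

def narrow_interval(low, high, symbol, model):
    """Narrow the current coding interval to the symbol's subinterval.
    `model` is a list of (symbol, probability) pairs giving the subinterval order."""
    width = high - low
    cumulative = 0.0
    for s, p in model:
        if s == symbol:
            return low + cumulative * width, low + (cumulative + p) * width
        cumulative += p
    raise ValueError("unknown symbol")

model = [("a", 0.7), ("b", 0.2), ("c", 0.1)]    # the model of Table 2.1
low, high = 0.0, 1.0
for symbol in "aab":
    low, high = narrow_interval(low, high, symbol, model)
    print(symbol, low, high)
# approximately: a [0, 0.7]  ->  a [0, 0.49]  ->  b [0.343, 0.441]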
The process is repeated for each symbol to be coded, resulting in a smaller and smaller interval. The final interval describes the source uniquely. The length of this interval is the cumulative product of the probabilities of the coded symbols:

A_{\mathrm{final}} = p_1 \cdot p_2 \cdot p_3 \cdots p_n = \prod_{i=1}^{n} p_i \qquad (2.9)
Due to the previous discussions this interval can be coded by

C(A) = -\log_2 \prod_{i=1}^{n} p_i = -\sum_{i=1}^{n} \log_2 p_i \qquad (2.10)
number of bits (assuming A is a power of ½). If the same model is applied for each symbol to be coded, the code length can be expressed in terms of the source alphabet:

C_A = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (2.11)

where m is the number of symbols in the alphabet, and p_i is the probability of that particular symbol in the alphabet. The important observation is that C_A equals the entropy! This means that the source can be coded optimally if A is a power of ½. This is the case when the length of the source approaches infinity. In practice, arithmetic coding is optimal even for rather short sources.
Optimality of arithmetic coding:
The length of the final interval is not exactly a power of ½, as was assumed. The final interval, however, can be approximated by any of its subintervals that meets the requirement A' = 2^-n. The approximation can thus be bounded by

\frac{A}{2} \le A' \le A \qquad (2.12)

yielding the upper bound of the code length:

C(A) \le -\log_2 \frac{A}{2} = -\log_2 A + 1 \le H + 1 \qquad (2.13)
The upper bound for the coding deficiency is thus 1 bit for the entire file. Note that Huffman coding can be simulated by arithmetic coding in which each subinterval division is restricted to a power of ½. There are no such restrictions in arithmetic coding (except in the final interval), which is the reason why arithmetic coding is optimal, in contrast to Huffman coding. The code length of Huffman coding has been shown to be bounded by
C A  H  ps1  log
2  log e
 H  ps1  0.086 (bits per symbol)
e
(2.14)
where p_{s_1} is the probability of the most probable symbol in the alphabet. In other words, the smaller the probability of the most probable symbol s_1, the better the relative performance of Huffman coding (with respect to the entropy H). This is often the case for multi-symbol alphabets; in binary images, however, the probability distribution is often very skewed, and it is not rare that the probability of the white pixel is as high as 0.99.
In principle, the problem of the skewed distribution of a binary alphabet can be avoided by blocking the pixels, for example by constructing a new symbol from 8 subsequent pixels in the image. The redefined alphabet thus consists of all the 256 possible pixel combinations. However, the probability of the most probable symbol (8 white pixels) is still too high, namely 0.99^8 = 0.92. Moreover, the number of combinations increases exponentially with the number of pixels in the block.
Implementation aspects of arithmetic coding:
Two variables are needed to describe the interval: A is the length of the interval, and C is its lower bound. The interval, however, very soon gets so small that it cannot be expressed by a 16- or 32-bit integer in computer memory. The following procedure is therefore applied. When the interval falls completely below the half point (0.5), it is known that the codeword describing the final interval starts with the bit 0. If the interval were above the half point, the codeword would start with the bit 1. In both cases, the starting bit can be output, and the processing can then be limited to the corresponding half of the full scale, which is either [0, 0.5] or [0.5, 1]. This is realized by zooming the corresponding half as shown in Figure 2.9.
Figure 2.9: Example of half point zooming.
Underflow can also occur if the interval decreases so that its lower bound is just below the half point but the upper bound is still above it. In this case the half-point zoom cannot be applied. The solution is the so-called quarter-point zooming, see Figure 2.10. The condition for quarter-point zooming is that the lower bound of the interval exceeds 0.25 and the upper bound does not exceed 0.75. It is then known that the following bit stream is either "01xxx" if the final interval is below the half point, or "10xxx" if the final interval is above the half point (here xxx refers to the rest of the code stream). In general, it can be shown that if the next bit due to a half-point zooming is b, it is followed by as many opposite bits of b as there were quarter-point zoomings before that half-point zooming.
Since the final interval completely covers either the range [0.25, 0.5], or the range [0.5, 0.75],
the encoding can be finished by sending the bit pair "01" if the upper bound is below 0.75, or
"10" if the lower bound exceeds 0.25.
Figure 2.10: Example of two subsequent quarter-point zoomings.
QM-coder:
QM-coder is an implementation of arithmetic coding which has been specially tailored for binary data. Speed has also been one of the primary aspects in its design. The main differences between QM-coder and the arithmetic coding described in the previous section are summarized as follows:
- The input alphabet of QM-coder must be in binary form.
- For gaining speed, all multiplications in QM-coder have been eliminated.
- QM-coder includes its own modelling procedures.
The fact that QM-coder is a binary arithmetic coder does not exclude the possibility of having a multi-symbol source. The symbols just have to be coded one bit at a time, using a binary decision tree. The probability of each symbol is the product of the probabilities of the node decisions.
In QM-coder the multiplication operations have been replaced by fast approximations or by
shift-left-operations in the following way. Denote the more probable symbol of the model by
MPS, and the less probable symbol by LPS. In other words, the MPS is always the symbol
which has the higher probability. The interval in QM-coder is always divided so that the LPS
subinterval is above the MPS subinterval. If the interval is A and the LPS probability estimate
is Qe, the MPS probability estimate should ideally be (1-Qe). The lengths of the respective
subintervals are then A  Qe and A  1  Qe . This ideal subdivision and symbol ordering is
shown in Figure 2.11.
Figure 2.11: Illustration of symbol ordering and ideal interval subdivision.
Instead of operating on the scale [0, 1], QM-coder operates on the scale [0, 1.5]. Zooming (or renormalization, as it is called in QM-coder) is performed every time the length of the interval gets below half of the scale, 0.75 (the details of the renormalization are bypassed here). Thus the interval length is always in the range 0.75 ≤ A < 1.5. Now, the following rough approximation is made:

A \approx 1 \;\Rightarrow\; A \cdot Q_e \approx Q_e \qquad (2.15)
If we follow this scheme, coding a symbol changes the interval as follows:

After MPS: C is unchanged;
A \leftarrow A(1 - Q_e) = A - A \cdot Q_e \approx A - Q_e \qquad (2.16)

After LPS:
C \leftarrow C + A(1 - Q_e) = C + A - A \cdot Q_e \approx C + A - Q_e
A \leftarrow A \cdot Q_e \approx Q_e \qquad (2.17)
Now all multiplications are eliminated, except those needed in the renormalization. However, the renormalization involves only multiplications by two, which can be performed by bit-shift operations.
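The following toy sketch follows the update rules (2.16)-(2.17) together with the renormalization condition A < 0.75. The carry handling and the output of the shifted-out code bits of the real QM-coder are omitted, and the probability estimate Qe is kept fixed, so this only illustrates the interval arithmetic.

def qm_update(c, a, q_e, symbol_is_mps):
    """One coding step with the multiplication-free approximations (2.16)-(2.17).
    c is the interval base, a its length (kept in [0.75, 1.5) by renormalization),
    and q_e the current LPS probability estimate."""
    if symbol_is_mps:
        a = a - q_e                # A*(1-Qe) approximated by A - Qe
    else:
        c = c + a - q_e            # the LPS subinterval lies above the MPS part
        a = q_e                    # A*Qe approximated by Qe
    # Renormalize by doubling (a left shift in an integer implementation) until
    # the interval length is back above 0.75; bit output is omitted in this sketch.
    while a < 0.75:
        a *= 2
        c *= 2
    return c, a

c, a = 0.0, 1.0
for mps in [True, True, False, True]:     # an arbitrary example symbol sequence
    c, a = qm_update(c, a, q_e=0.1, symbol_is_mps=mps)
    print(round(c, 4), round(a, 4))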
QM-coder also includes its own modelling procedures, which makes the separation between
modelling and coding a little bit unconventional, see Figure 2.12. The modelling phase
determines the context to be used and the binary decision to be coded. QM-coder then picks
up the corresponding probability, performs the actual coding and updates the probability
distribution if necessary. The way QM-coder handles the probabilities is based on a stochastic
algorithm (details are omitted here). The method also adapts quickly to local variations in the
image. For details see the "JPEG-book" by Pennebaker and Mitchell [1993].
[Figure 2.12 diagram: with conventional arithmetic coding, the modelling part processes the image, determines the context, determines the probability distribution and updates the model, while the coder only performs the arithmetic coding. With QM-coder, the modelling part only processes the image and determines the context, while the QM-coder itself determines the probability distribution, performs the arithmetic coding and updates the model.]
Figure 2.12: Differences between the optimal arithmetic coding (left), and
the integrated QM-coder (right).
2.2.3 Golomb-Rice codes
Golomb codes are a class of prefix codes which are suboptimal but very easy to implement. Golomb codes are used to encode symbols from a countable alphabet. The symbols are arranged in descending probability order, and non-negative integers are assigned to the symbols, beginning with 0 for the most probable symbol, see Figure 2.13. To encode an integer x, it is divided into two components, the most significant part x_M and the least significant part x_L:

x_M = \left\lfloor \frac{x}{m} \right\rfloor, \qquad x_L = x \bmod m \qquad (2.18)

where m is the parameter of the Golomb coding. The values x_M and x_L are a complete representation of x since:

x = x_M \cdot m + x_L \qquad (2.19)

The x_M is output using a unary code, and the x_L is output using a binary code (an adjusted binary code is needed if m is not a power of 2), see Table 2.5 for an example.
Rice coding is the same as Golomb coding except that only a subset of the parameter values may be used, namely the powers of 2. The Rice code with parameter k is exactly the same as the Golomb code with parameter m = 2^k. The Rice codes are even simpler to implement, since x_M can be computed by shifting x bit-wise right k times, and x_L by masking out all but the k low-order bits. Sample Golomb and Rice code tables are shown in Table 2.6.
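A small Python sketch of the encoding is given below; to keep it simple it is restricted to the Rice case m = 2^k (so that x_L is a plain k-bit binary number), which is enough to reproduce, e.g., the m = 4 column of Tables 2.5 and 2.6. For a general m the adjusted binary code mentioned above would be needed.

def rice_encode(x, k):
    """Rice code of a non-negative integer x with parameter k (m = 2**k):
    x_M in unary (x_M one-bits followed by a terminating zero), x_L as k bits."""
    x_m = x >> k                    # x_M = x div 2**k
    x_l = x & ((1 << k) - 1)        # x_L = x mod 2**k
    unary = "1" * x_m + "0"
    binary = format(x_l, "b").zfill(k) if k > 0 else ""
    return unary + binary

# Reproduces the m = 4 (k = 2) column of Tables 2.5 and 2.6.
for x in range(9):
    print(x, rice_encode(x, 2))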
Figure 2.13: Probability distribution function assumed by Golomb and Rice codes.
Table 2.5. An example of the Golomb coding with the parameter m=4.

x    xM   xL   Code of xM   Code of xL
0    0    0    0            00
1    0    1    0            01
2    0    2    0            10
3    0    3    0            11
4    1    0    10           00
5    1    1    10           01
6    1    2    10           10
7    1    3    10           11
8    2    0    110          00
:    :    :    :            :
Table 2.6. Golomb and Rice codes for the parameters m=1 to 5.

        Golomb:   m=1      m=2      m=3      m=4      m=5
        Rice:     k=0      k=1               k=2
x=0               0        00       00       000      000
1                 10       01       010      001      001
2                 110      100      011      010      010
3                 1110     101      100      011      0110
4                 11110    1100     1010     1000     0111
5                 111110   1101     1011     1001     1000
6                 :        11100    1100     1010     1001
7                 :        11101    11010    1011     1010
8                 :        111100   11011    11000    10110
9                 :        111101   11100    11001    10111
:                 :        :        :        :        :
3 Binary images
Binary images represent the simplest and most space-economic form of images and are of great interest when colors or gray scales are not needed. They consist of only two colors, black and white. The probability distribution of this input alphabet is often very skewed, e.g. p(white)=98 % and p(black)=2 %. Moreover, the images usually have large homogeneous areas of the same color. These properties can be taken advantage of in the compression of binary images.
3.1 Run-length coding
Run-length coding (RLC), also referred to as run-length encoding (RLE), is probably the best-known compression method for binary images. The image is processed row by row, from left to right. The idea is to block the subsequent pixels of the same color in each scan line. Each block, referred to as a run, is then coded by its color information and length, resulting in a code stream like

C_1, n_1, C_2, n_2, C_3, n_3, \ldots \qquad (3.1)

where C_i is the code due to the color information of the i'th run, and n_i is the code due to the length of the run. In binary images there are only two colors, thus a black run is always followed by a white run, and vice versa. Therefore it is sufficient to code only the lengths of the runs; no color information is needed. The first run in each line is assumed to be white. If the first pixel happens to be black, a white run of zero length is coded.
The run-length "coding" method is purely a modelling scheme resulting to a new alphabet
consisting of the lengths of the runs. These can be coded for example by using the Huffman
code given in Table 3.1. Separate code tables are used to represent the black and white runs.
The code table contains two types of codewords: terminating codewords (TC) and make-up
codewords (MUC). Runs between 0 and 63 are coded using single terminating codeword.
Runs between 64 and 1728 are coded by a MUC followed by a TC. The MUC represents
a run-length value of 64M (where M is an integer between 1 and 27) which is equal to, or
shorter than, the value of the run to be coded. The following TC specifies the difference
between the MUC and the actual value of the run to be coded. See Figure 3.1 for an example
of run-length coding using the code table of Table 3.1.
Figure 3.1: Example of one-dimensional run-length coding.
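The modelling part of the method, i.e. turning one scan line into run lengths that are then coded with the Huffman codes of Table 3.1, can be sketched as follows (0 denotes white and 1 black, an assumption made only for this example):

def run_lengths(row):
    """Split one scan line of a binary image (0 = white, 1 = black) into run lengths.
    The first run is assumed to be white; if the line starts with a black pixel,
    a zero-length white run is produced, as described above."""
    runs = []
    color = 0                      # runs alternate in color, starting with white
    i = 0
    while i < len(row):
        length = 0
        while i < len(row) and row[i] == color:
            length += 1
            i += 1
        runs.append(length)
        color = 1 - color
    return runs

print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))   # [3, 2, 4, 1]
print(run_lengths([1, 1, 0, 0]))                     # [0, 2, 2]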
Vector run-length coding:
The run-length coding efficiently codes large uniform areas in the images, even though the
two-dimensional correlations are ignored. The idea of the run-length coding can also be
applied two-dimensionally, so that the runs consist of mn-sized blocks of pixels instead of
single pixels. In this vector run-length coding the pixel combination in each run has to be
coded in addition to the length of the run. Wang and Wu [1992] reported up to 60 %
improvement in the compression ratio when using 84 and 44 block sizes. They also used
blocks of 41 and 81 with slightly smaller improvement in the compression ratio.
Predictive run-length coding:
The performance of run-length coding can be improved by using a prediction technique as a preprocessing stage (see also Section 2.1). The idea is to form a so-called error image from the original one by comparing the value of each original pixel to the value given by
a prediction function. If these two are equal, the pixel of the error image is white; otherwise it
is black. The run-length coding is then applied to the error image instead of the original one.
Benefit is gained from the increased number of white pixels; thus longer white runs will be
obtained.
The prediction is based on the values of certain (fixed) neighboring pixels. These pixels have
already been encoded and are therefore known to the decoder. The prediction is thus identical
in the encoding and decoding phases. The image is scanned in row-major order and the value
of each pixel is predicted from the particular observed combination of the neighboring pixels,
see Figure 3.2. The frequency of a correct prediction varies from 61.4 to 99.8 % depending on
the context; the completely white context predicts a white pixel with a very high probability,
and a context with two white and two black pixels usually gives only an uncertain prediction.
The prediction technique increases the proportion of white pixels from 94 % to 98 % for a set of test images; thus the number of black pixels is only one third of that in the original image. An improvement of 30 % in the compression ratio was reported by Netravali and Mounts [1980], and up to 80 % with the inclusion of the so-called re-ordering technique.
Table 3.1: Huffman code table for the run-lengths.

Terminating codewords:
n     white runs   black runs
0     00110101     0000110111
1     000111       010
2     0111         11
3     1000         10
4     1011         011
5     1100         0011
6     1110         0010
7     1111         00011
8     10011        000101
9     10100        000100
10    00111        0000100
11    01000        0000101
12    001000       0000111
13    000011       00000100
14    110100       00000111
15    110101       000011000
16    101010       0000010111
17    101011       0000011000
18    0100111      0000001000
19    0001100      00001100111
20    0001000      00001101000
21    0010111      00001101100
22    0000011      00000110111
23    0000100      00000101000
24    0101000      00000010111
25    0101011      00000011000
26    0010011      000011001010
27    0100100      000011001011
28    0011000      000011001100
29    00000010     000011001101
30    00000011     000001101000
31    00011010     000001101001
32    00011011     000001101010
33    00010010     000001101011
34    00010011     000011010010
35    00010100     000011010011
36    00010101     000011010100
37    00010110     000011010101
38    00010111     000011010110
39    00101000     000011010111
40    00101001     000001101100
41    00101010     000001101101
42    00101011     000011011010
43    00101100     000011011011
44    00101101     000001010100
45    00000100     000001010101
46    00000101     000001010110
47    00001010     000001010111
48    00001011     000001100100
49    01010010     000001100101
50    01010011     000001010010
51    01010100     000001010011
52    01010101     000000100100
53    00100100     000000110111
54    00100101     000000111000
55    01011000     000000100111
56    01011001     000000101000
57    01011010     000001011000
58    01011011     000001011001
59    01001010     000000101011
60    01001011     000000101100
61    00110010     000001011010
62    00110011     000001100110
63    00110100     000001100111

Make-up codewords:
n     white runs     black runs
64    11011          0000001111
128   10010          000011001000
192   010111         000011001001
256   0110111        000001011011
320   00110110       000000110011
384   00110111       000000110100
448   01100100       000000110101
512   01100101       0000001101100
576   01101000       0000001101101
640   01100111       0000001001010
704   011001100      0000001001011
768   011001101      0000001001100
832   011010010      0000001001101
896   011010011      0000001110010
960   011010100      0000001110011
1024  011010101      0000001110100
1088  011010110      0000001110101
1152  011010111      0000001110110
1216  011011000      0000001110111
1280  011011001      0000001010010
1344  011011010      0000001010011
1408  011011011      0000001010100
1472  010011000      0000001010101
1536  010011001      0000001011010
1600  010011010      0000001011011
1664  011000         0000001100100
1728  010011011      0000001100101
EOL   000000000001   000000000001
[Figure 3.2 data: the probabilities of a correct prediction for the 16 contexts are 99.76 %, 96.64 %, 62.99 %, 77.14 %, 83.97 %, 94.99 %, 87.98 %, 61.41 %, 71.05 %, 61.41 %, 86.59 %, 78.74 %, 70.10 %, 78.60 %, 95.19 %, 91.82 %.]
Figure 3.2: A four-pixel prediction function. The various prediction contexts (pixel combinations) are given in the left column; the corresponding prediction value in the middle; and the probability of a correct prediction in the rightmost column.
3.2 READ code
Instead of the lengths of the runs, one can code the location of the boundaries of the runs (the
black/white transitions) relative to the boundaries of the previous row. This is the basic idea in
the method called relative element address designate (READ). The READ code includes three
coding modes:
 Vertical mode
 Horizontal mode
 Pass mode
In vertical mode the position of each color change (white to black or black to white) in the
current line is coded with respect to a nearby change position (of the same color) on the
reference line, if one exists. "Nearby" is taken to mean within three pixels, so the vertical mode can take on one of seven values: -3, -2, -1, 0, +1, +2, +3. If there is no nearby change
position on the reference line, one-dimensional run-length coding - called horizontal mode - is
used. A third condition is when the reference line contains a run that has no counterpart in the
current line; then a special pass code is sent to signal to the receiver that the next complete run
of the opposite color in the reference line should be skipped. The corresponding codewords
for each coding mode are given in Table 3.2.
Figure 3.3 shows an example of coding in which the second line of pixels - the current line - is
transformed into the bit-stream at the bottom. Black spots mark the changing pixels that are to
be coded. Both end-points of the first run of black pixels are coded in vertical mode, because
that run corresponds closely with one in the reference line above. In vertical mode, each offset
is coded independently according to a predetermined scheme for the possible values. The
beginning point of the second run of black pixels has no counterpart in the reference line, so it
is coded in horizontal mode. Whereas vertical mode is used for coding individual change-points,
horizontal mode works with pairs of change-points. Horizontal-mode codes have three
parts: a flag indicating the mode, a value representing the length of the preceding white run,
and another representing the length of the black run. The second run of black pixels in the
reference line must be "passed", for it has no counterpart in the current line, so the pass code
is emitted. Both end-points of the next run are coded in vertical mode, and the final run is
coded in horizontal mode. Note that because horizontal mode codes pairs of points, the final
change-point shown is coded in horizontal mode even though it is within 3-pixel range of
vertical mode.
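The mode selection can be summarized with a small sketch. The following Python fragment is only an illustration of the decision rule described above, not the standardized encoder: a1 denotes the next changing pixel on the current line, and b1, b2 the first and second changing pixels on the reference line to its left/right as defined in the standard. The names and the helper function are assumptions made here for clarity.

    VERTICAL_CODES = {  # offset a1 - b1 -> codeword, as in Table 3.2
        0: "1", 1: "011", 2: "000011", 3: "0000011",
        -1: "010", -2: "000010", -3: "0000010",
    }

    def choose_read_mode(a1, b1, b2):
        """Pick the READ coding mode for the next changing pixel."""
        if b2 < a1:                  # run on the reference line has no counterpart
            return "pass", "0001"
        if abs(a1 - b1) <= 3:        # a nearby change position exists
            return "vertical", VERTICAL_CODES[a1 - b1]
        return "horizontal", "001"   # followed by Hw(white run) + Hb(black run)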
Table 3.2. Code table for the READ code. Symbol wl refers to the length of the white run and
bl to the length of the black run; Hw() and Hb() refer to the Huffman codes of Table 3.1.

 Mode:                  Codeword:
 Pass                   0001
 Horizontal             001 + Hw(wl) + Hb(bl)
 Vertical    +3         0000011
             +2         000011
             +1         011
              0         1
             -1         010
             -2         000010
             -3         0000010
Figure 3.3: Example of the two-dimensional READ code. The current line is coded, relative to
the reference line above it, as: vertical mode (-1: 010, 0: 1), horizontal mode (3 white,
4 black: 001 1000 011), pass (0001), vertical mode (+2: 000011, -2: 000010), and horizontal
mode (4 white, 7 black: 001 1011 00011). The generated code is
010 1 001 1000 011 0001 000011 000010 001 1011 00011.
3.3 CCITT group 3 and group 4 standards
The RLE and READ algorithms are included in two image compression standards, known as
CCITT1 Group 3 (G3) and Group 4 (G4). They are nowadays widely used in fax machines.
1
Consultative Committee for International Telegraphy and Telephone
The CCITT standard also specifies details like the paper size (A4) and the scanning resolution.
The two optional resolutions are 1728×1188 pixels per A4 page (200×100 dpi) for the low
resolution, and 1728×2376 pixels (200×200 dpi) for the high resolution. The G3 specification
covers binary documents only, although G4 includes provision for optional grayscale and
color images.
In the G3 standard every kth line of the image is coded by the 1-dimensional RLE method
(also referred to as Modified Huffman) and the 2-dimensional READ code (more accurately
referred to as Modified READ) is applied to the rest of the lines. In G3, the k-parameter is
set to 2 for low resolution and to 4 for high resolution images. In G4, k is set to infinity
so that every line of the image is coded by the READ code. An all-white reference line is
assumed at the beginning.
3.4 Block coding
The idea in block coding, as presented by Kunt and Johnsen [1980], is to divide the image
into blocks of pixels. A totally white block (all-white block) is coded by a single 0-bit. All
other blocks (non-white blocks) thus contain at least one black pixel. They are coded with
a 1-bit as a prefix followed by the contents of the block, bit by bit in row-major order, see
Figure 3.4.
The block coding can be extended so that all-black blocks are also considered. See Table 3.2
for the codewords of the extended block coding. The number of uniform blocks (all-white and
all-black blocks) depends on the size of the block, see Figure 3.5. The larger the block size,
the more efficiently the uniform blocks can be coded (in bits per pixel), but the fewer
uniform blocks there are to take advantage of.
In the hierarchical variant of the block coding the bit map is first divided into b×b blocks
(typically 16×16). These blocks are then divided into a quadtree structure of blocks in the
following manner. If a particular b×b block is all-white, it is coded by a single 0-bit.
Otherwise the block is coded by a 1-bit and then divided into four equal-sized subblocks,
which are recursively coded in the same manner. The subdivision is carried on until the block
reduces to a single pixel, which is stored as such, see Figure 3.6.
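The recursion of the hierarchical variant can be sketched in a few lines of Python. The sketch below is an illustration only: it assumes the binary image is stored as a list of rows with 0 = white and 1 = black, and that the top-level block size is a power of two; it follows the rule described above (a 0-bit for an all-white block, otherwise a 1-bit followed by the four recursively coded subblocks, with single pixels stored as such).

    def encode_block(img, x, y, size, out_bits):
        """Hierarchical block coding sketch for one size x size block at (x, y)."""
        if size == 1:
            out_bits.append(img[y][x])              # single pixel stored as such
            return
        rows = [img[j][x:x + size] for j in range(y, y + size)]
        if not any(any(row) for row in rows):       # all-white block (0 = white)
            out_bits.append(0)
            return
        out_bits.append(1)                          # non-white: split into quadrants
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                encode_block(img, x + dx, y + dy, half, out_bits)

    # Usage: bits = []; encode_block(image, 0, 0, 16, bits)   # one 16x16 top-level block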
The power of block coding can be improved by coding the bit patterns of the 2×2 blocks by
Huffman coding. Because the frequency distributions of these patterns are quite similar for
typical facsimile images, a static Huffman code can be applied. The Huffman coding gives an
improvement of ca. 10 % in comparison to the basic hierarchical block coding. Another
improvement is the use of the prediction technique in the same manner as was used with the
run-length coding. The application of the prediction function of Figure 3.2 gives an
improvement of ca. 30 % in the compression ratio.
Figure 3.4: Basic block coding technique. Coding of four sample blocks.
Figure 3.5: Number of block types (all-white, all-black, and mixed) as a function of the block
size for four test images (IMAGE 1, IMAGE 3, IMAGE 4, and "BLACK HAND").
Table 3.2. Codewords in the block coding method. Here xxxx refers to the content
of the block, pixel by pixel.

 Content of      Codeword in        Codeword in
 the block:      block coding:      extended block coding:
 All-white       0                  0
 All-black       1 1111             11
 Mixed           1 xxxx             10 xxxx
Image to be compressed: a 16×16 binary pattern (not reproduced here).
Code bits: 1, 0111, 0011 0111 1000, 0111 1111 1111, 0101 1010 1100.

Figure 3.6: Example of hierarchical block coding.
3.5 JBIG
JBIG (Joint Bilevel Image Experts Group) is the newest binary image compression standard
by CCITT and ISO2. It is based on context-based compression, where the image is compressed
pixel by pixel. The pixel combination of the neighboring pixels (given by the template)
defines the context, and in each context the probability distribution of the black and white
pixels is adaptively determined on the basis of the already coded pixel samples. The pixels
are then coded by arithmetic coding according to their probabilities. The arithmetic coding
component in JBIG is the QM-coder.
Binary images are a favorable source for context-based compression, since even a relatively
large number of pixels in the template results in a reasonably small number of contexts. The
templates included in JBIG are shown in Figure 3.7. The number of contexts in a 7-pixel
template is 2^7 = 128, and in the 10-pixel model it is 2^10 = 1024. (Note that a typical binary
image of 1728×1188 pixels consists of over 2 million pixels.) The larger the template, the more
accurate a probability model it is possible to obtain. However, with a large template the
adaptation to the image takes longer; thus the size of the template cannot be arbitrarily
large. The effect of the context size on the compression of the CCITT test image set is shown
in Figure 3.8.
Figure 3.7: JBIG sequential model templates: the 7-pixel template, the 10-pixel template, and
a two-line version of the 10-pixel template.
2
International Standards Organization
Figure 3.8: Sample compression ratios for context-based compression as a function of the
number of pixels in the context template (1 to 16), for an optimal template and for the
standard template.
JBIG includes two modes of processing: sequential and progressive. The sequential mode is
the traditional row-major order processing. In the progressive mode, a reduced resolution
version of the image (referred to as the starting layer) is compressed first, followed by the
second layer, and so on, see Figure 3.9. The lowest resolution is either 12.5 dpi (dots per
inch) or 25 dpi. In each case the resolution is doubled for the next layer. In this
progressive mode, the pixels of the previous layer can be included in the context template
when coding the next layer. In JBIG, four such pixels are included, see Figure 3.10. Note that
there are four variations (phases 0, 1, 2, and 3) of the same basic template model depending
on the position of the current pixel.
Figure 3.9: Different layers of JBIG (resolutions 25, 50, 100, and 200 dpi, from the lowest to
the highest resolution).
Figure 3.10: JBIG progressive model templates (phases 0, 1, 2, and 3).
Resolution reduction in JBIG:
In the compression phase, the lower resolution versions are calculated on the basis of the
next higher-resolution layer. The obvious way to halve the resolution is to group the pixels
into 2×2 blocks and take the color of the majority of these four pixels. Unfortunately, with
binary images it is not clear what to do when two of the pixels are black (1) and the other
two are white (0). Consistently rounding up or down tends to wash out the image very quickly.
Another possibility is to round the value in a random direction each time, but this adds
considerable noise to the image, particularly at the lower resolutions.
The resolution reduction algorithm incorporated in JBIG is illustrated in Figure 3.11. The
value of the target pixel is calculated as a linear function of the marked pixels. The
already-committed pixels at the lower resolution - the round ones - participate in the sum
with negative weights that exactly offset the corresponding positive weights. This means that
if the already-committed areas are each either uniformly white or uniformly black, they do not
affect the assignment of the new pixel. If black and white are equally likely and the pixels
are statistically independent, the expected value of the weighted sum is 4.5. The target pixel
is chosen to be black if the value is 5 or more, and white if it is 4 or less.
This method preserves the overall grayness of the image. However, problems occur with lines
and edges, because these deteriorate very rapidly. To address this problem, a number of
exception patterns can be defined which, when they are encountered, reverse the polarity of
the pixel that is obtained from thresholding the weighted sum as described above. An example
of such an exception pattern is shown in Figure 3.12.
Figure 3.11: Resolution reduction in JBIG: participating pixels marked (left);
pixel weights (right).
432594
216297
0
0 1
0
1
Figure 3.12: Example of an
exception pattern for resolution
reduction.
108148
5474
Figure 3.13: Example of JBIG resolution reduction for CCITT image 5.
3.6 JBIG2
The emerging standard JBIG2 enhances the compression of text images using a pattern
matching technique. The standard will have two encoding methods: pattern matching &
substitution (PM&S), and soft pattern matching (SPM). The image is segmented into pixel
blocks containing connected black pixels using any segmentation technique. The content of
the blocks is matched against the library symbols. If an acceptable match (within a given
error margin) is found, the index of the matching symbol is encoded. If no acceptable match
is found, the original bitmap is coded by a JBIG-style compressor. The compressed file
consists of the bitmaps of the library symbols, the locations of the extracted blocks as
offsets, and the content of the pixel blocks.
The PM&S encoding mode performs lossy substitution of the input block by the bitmap of the
matched character. This requires a very reliable matching procedure to avoid substitution
errors. The SPM encoding mode, on the other hand, is lossless and is outlined in Fig. 3.14.
Instead of performing substitution, the content of the original block is also coded in order
to allow lossless compression. The content of the pixel block is coded using a JBIG-style
compressor with the difference that the bitmap of the matching symbol is used as additional
information in the context model. The method applies the two-layer context template shown in
Fig. 3.15. Four context pixels are taken from the input block and seven from the bitmap of
the matching dictionary symbol. The dictionary is built adaptively during the compression by
conditionally adding the new pixel blocks into the dictionary. The standard defines mainly
the general file structure and the decoding procedure but leaves some freedom in the design
of the encoder.
Figure 3.14: Block diagram of JBIG2. (The image is segmented into pixel blocks; for each block
an acceptable match is searched for. If a match is found, the index of the matching symbol is
encoded and the original block is encoded using the 2-level context template; otherwise the
bitmap is encoded by a JBIG-style compressor and the new symbol is conditionally added to the
dictionary. The position of each block is encoded as an offset, and the process repeats until
the last block.)
Figure 3.15: Two-level context template for coding the pixel blocks: four context pixels are
taken from the original image and seven from the matching pixel block.
3.7 Summary of binary image compression algorithms
Figure 3.16 profiles the performance of several binary image compression algorithms,
including also three well-known universal compression programs (Compress, Gzip, Pkzip).
Note that all of these methods are lossless. Figure 3.17 gives a comparison of JBIG and JBIG2
for the set of CCITT test images.
COMPRESS = Unix standard compression software; GZIP = Gnu compression software;
PKZIP = Pkware compression software; RLE = Run-length coding [NM80];
2D-RLE = 2-dimensional RLE [WW92]; ORLE = Ordered RLE [NM80];
BLOCK = Hierarchical block coding [KJ80]; G3 = CCITT Group 3 [YA85];
G4 = CCITT Group 4 [YA85]; JBIG = ISO/IEC standard draft [PM93].

Figure 3.16: Compression efficiency of several binary image compression algorithms
for CCITT test image 3 (compression ratios range from about 8 to 23).
Figure 3.17: Comparison of JBIG and JBIG2 compressed file sizes (in bytes) for the set of
CCITT test images 1-8.
4 Continuous tone images
4.1 Lossless and near-lossless compression
4.1.1 Bit-plane coding
The idea of bit plane coding is to apply binary image compression to the bit planes of the
gray-scale image. The image is first divided into k separate bit planes, each representing
a binary image. These bit planes are then coded by any compression method designed for binary
images, e.g. context-based compression with arithmetic coding, as presented in Section 3.5.
The bit planes of the most significant bits are the most compressible, while the bit planes
of the least significant bits are nearly random and thus mostly incompressible.
Better results can be achieved if the binary codes are transformed into Gray codes before
partitioning the image into the bit planes. Consider an 8-bit image consisting of only two
pixel values, 127 and 128. Their corresponding binary representations are 0111 1111 and
1000 0000. Now, even if the image could be compressed to 1 bit per pixel by a trivial
algorithm, the bit planes are completely random (thus incompressible) since the values 127
and 128 differ in every bit position. Gray coding is a method of mapping a set of numbers
into a binary alphabet such that successive numerical values differ in only one bit of the
binary representation. Thus, when two neighboring pixels differ by one, only a single bit
plane is affected. Figure 4.1 shows the binary code (BC) and the corresponding Gray code
representations (GC) in the case of the 4-bit numbers 0 to 15. One method of generating the
Gray code representation from the binary code is the following logical operation:

  GC = BC ⊕ (BC >> 1)                                                         (4.1)

where ⊕ denotes the bit-wise exclusive OR operation, and >> denotes the bit-wise logical
right-shift operation. Note that the ith GC bit plane is constructed by performing an
exclusive OR on the ith and (i+1)th BC bit planes. The most significant bit planes of the
binary and the Gray codes are identical. Figures 4.2 and 4.3 show the bit planes of the
binary and Gray codes for the test image Camera.
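Equation (4.1) is easy to express in code. The following Python sketch converts pixel values between the binary and Gray code representations; the function names are chosen here only for illustration.

    def binary_to_gray(value):
        """Gray code of a pixel value: GC = BC xor (BC >> 1), Equation (4.1)."""
        return value ^ (value >> 1)

    def gray_to_binary(gray):
        """Inverse mapping: undo the xor cascade bit by bit."""
        value, shift = gray, 1
        while (gray >> shift) > 0:
            value ^= gray >> shift
            shift += 1
        return value

    # Neighboring values 127 and 128 now differ in a single bit plane:
    # binary_to_gray(127) == 0b01000000 and binary_to_gray(128) == 0b11000000.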
 value   Binary code   Gray code
   0        0000          0000
   1        0001          0001
   2        0010          0011
   3        0011          0010
   4        0100          0110
   5        0101          0111
   6        0110          0101
   7        0111          0100
   8        1000          1100
   9        1001          1101
  10        1010          1111
  11        1011          1110
  12        1100          1010
  13        1101          1011
  14        1110          1001
  15        1111          1000

Figure 4.1: Illustration of four-bit binary and Gray codes.
Suppose that the bit planes are compressed with the most significant plane (MSP) first and the
least significant plane (LSP) last. A small improvement in context-based compression is
achieved if the context template includes a few pixels from the previously coded bit plane.
Typically the bits included in the template are the bit of the same pixel that is to be coded,
and possibly the one above it. This kind of template is referred to as a 3D-template.
Figure 4.2: Binary and Gray code bit planes for test image Camera, bit planes 7 through 4.
Figure 4.3: Binary and Gray code bit planes for test image Camera, bit planes 3 through 0.
4.1.2 Lossless JPEG
Lossless JPEG (Joint Photographic Experts Group) processes the image pixel by pixel in
row-major order. The value of the current pixel is predicted on the basis of the neighboring
pixels that have already been coded (see Figure 2.4). The prediction functions available in
JPEG are given in Table 4.1. The prediction errors are coded either by Huffman coding or
arithmetic coding. The Huffman code table of lossless JPEG is given in Table 4.2. Here one
first encodes the category of the prediction error, followed by the binary representation of
the value within the corresponding category; see Table 4.3 for an example.
The arithmetic coding component in JPEG is the QM-coder, which is a binary arithmetic coder.
The prediction errors are coded in the same manner as in the Huffman coding scheme: the
category value followed by the binary representation of the value. Here the category values
are coded by a sequence of binary decisions as shown in Figure 4.4. If the prediction error is
not zero, the sign of the difference is coded after the "zero/non-zero" decision. Finally, the
value within the category is encoded bit by bit from the most significant bit to the least
significant bit. The probability modelling of the QM-coder ensures that the binary decisions
are encoded according to their corresponding probabilities. The details of the context
information involved in the scheme are omitted here.
The categories group the error magnitudes as follows: 0 = {0}, 1 = {1}, 2 = {2, 3},
3 = {4, ..., 7}, ..., 7 = {64, ..., 127}, 8 = {128, ..., 255}.

Figure 4.4: Binary decision tree for coding the categories.
Table 4.1: Predictors used in lossless JPEG.

 Mode:   Predictor:             Mode:   Predictor:
   0     Null (no prediction)     4     N + W - NW
   1     W                        5     W + (N - NW)/2
   2     N                        6     N + (W - NW)/2
   3     NW                       7     (N + W)/2
Table 4.2: Huffman coding of the prediction errors.

 Category:  Codeword:  Difference:                  Value within category:
    0        00         0                            -
    1        010        -1, 1                        0, 1
    2        011        -3, -2, 2, 3                 00, 01, 10, 11
    3        100        -7,...,-4, 4,...,7           000,...,011, 100,...,111
    4        101        -15,...,-8, 8,...,15         0000,...,0111, 1000,...,1111
    5        110        -31,...,-16, 16,...,31       :
    6        1110       -63,...,-32, 32,...,63       :
    7        11110      -127,...,-64, 64,...,127     :
    8        111110     -255,...,-128, 128,...,255   :
Table 4.3: Example of lossless JPEG for the pixel sequence (10, 12, 10, 7, 8, 8, 12) when
using prediction mode 1 (i.e. the predictor is the previous pixel value). The predictor for
the first pixel is zero.

 Pixel:              10        12     10     7      8     8    12
 Prediction error:  +10        +2     -2     -3     +1    0    +4
 Category:           4         2      2      2      1     0    3
 Bit sequence:       1011010   01110  01101  01100  0101  00   100100
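A minimal Python sketch of the Huffman variant for prediction mode 1 is given below. It assumes the usual JPEG convention that a negative difference is represented by the low-order bits of (value - 1); with this convention it reproduces the bit sequences of Table 4.3. The function names are chosen here for illustration.

    def category(e):
        """Category of a prediction error = number of bits in |e| (Table 4.2)."""
        return abs(e).bit_length()

    def value_bits(e, cat):
        """Bits identifying e within its category."""
        if cat == 0:
            return ""
        v = e if e >= 0 else e + (1 << cat) - 1
        return format(v, "0{}b".format(cat))

    CATEGORY_CODE = ["00", "010", "011", "100", "101", "110", "1110", "11110", "111110"]

    def encode_mode1(pixels):
        """Lossless JPEG, prediction mode 1 (predictor = previous pixel)."""
        bits, previous = [], 0
        for p in pixels:
            e = p - previous
            c = category(e)
            bits.append(CATEGORY_CODE[c] + value_bits(e, c))
            previous = p
        return bits

    # encode_mode1([10, 12, 10, 7, 8, 8, 12]) reproduces the bit sequences of Table 4.3.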
4.1.3 FELICS
FELICS (Fast and Efficient Lossless Image Compression System) is a simple yet efficient
compression algorithm proposed by Howard and Vitter [1993]. The main idea is to avoid the
use of computationally demanding arithmetic coding, and instead use a simpler coding scheme
together with a clever modelling method.
FELICS uses the information of two adjacent pixels when coding the current one: the pixel to
the left of the current pixel, and the one above it. Denote the values of the neighboring
pixels by L and H so that L is the smaller of the two. The probabilities of the pixel values
obey the distribution given in Figure 4.5.
Figure 4.5: Probability distribution of the intensity values, divided by the neighboring
values L and H into the below-range, in-range, and above-range parts.
The coding scheme is as follows. A code bit indicates whether the actual pixel value P falls
into the range [L, H]. If so, an adjusted binary code is applied; here the hypothesis is that
the in-range values are uniformly distributed. Otherwise the above/below-range decision
requires another code bit, and the value is then coded by Rice coding with adaptive
k-parameter selection.
Adjusted binary codes:
To encode an in-range pixel value P, the difference P-L must be encoded. Denote Δ = H - L;
thus the number of possible values in the range is Δ+1. If Δ+1 is a power of two, a binary
code with log2(Δ+1) bits is used. Otherwise the code is adjusted so that ⌊log2(Δ+1)⌋ bits are
assigned to some values and ⌈log2(Δ+1)⌉ bits to the others. Because the values near the middle
of the range are slightly more probable, the shorter codewords are assigned to those values.
For example, if Δ = 4 there are five values (0, 1, 2, 3, 4) and their corresponding adjusted
binary codewords are (111, 10, 00, 01, 110).
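The following Python sketch shows one way to build such an adjusted binary code. The exact bit assignment used by FELICS is not specified in the text, so the construction below (canonical codes, middle values first) is an assumption made here; it does, however, reproduce the example codewords given above.

    from math import ceil, log2

    def adjusted_binary_codes(delta):
        """Codes for the delta+1 in-range values 0..delta: values nearest the
        middle of the range get the short codewords, the extremes the long ones."""
        n = delta + 1
        k = ceil(log2(n)) if n > 1 else 0
        short = (1 << k) - n                 # number of (k-1)-bit codewords
        # order values by distance from the middle, larger value first on ties
        order = sorted(range(n), key=lambda v: (abs(v - n // 2), -v))
        codes, index = {}, 0
        for v in order[:short]:
            codes[v] = format(index, "0{}b".format(k - 1))
            index += 1
        index <<= 1                          # continue with k-bit codewords
        for v in order[short:]:
            codes[v] = format(index, "0{}b".format(k))
            index += 1
        return codes

    # adjusted_binary_codes(4) gives {2: '00', 3: '01', 1: '10', 4: '110', 0: '111'},
    # matching the codewords (111, 10, 00, 01, 110) for the values 0..4.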
Rice codes:
If the pixel value P exceeds the range [L, H], the difference P-(H+1) is coded using Rice
coding. According to the hypothesis of the distribution in Figure 4.5, the values have
exponentially decreasing probabilities as P increases. On the other hand, if P falls below the
range [L, H], the difference (L-1)-P is coded instead. The shapes of the distributions in the
above and below ranges are identical, thus the same Rice coding is applied. See Table 4.4 for
a summary of the FELICS code words, and Section 2.2.3 for the details of Rice coding.
For determining the Rice coding parameter k, Δ is used as the context. For each context Δ,
a cumulative total is maintained for each reasonable Rice parameter value k of the code
length that would have resulted if the parameter k had been used to encode all values
encountered so far in the context. The parameter with the smallest cumulative code length is
used to encode the next value encountered in the context. The allowed parameter values are
k = 0, 1, 2, and 3.
Table 4.4: Code table of FELICS; B = adjusted binary coding, and R = Rice coding.

 Pixel position:   Codeword:
 Below range       10 + R(L-P-1)
 In range          0 + B(P-L)
 Above range       11 + R(P-H-1)
4.1.4 JPEG-LS
JPEG-LS is based on the LOCO-I algorithm (Weinberger et al., 1998). The method uses the same
ideas as lossless JPEG with the improvements of utilizing context modeling and adaptive
correction of the predictor. The coding component is changed to Golomb codes with an adaptive
choice of the skewness parameter. The main structure of JPEG-LS is shown in Fig. 4.6. The
modeling part can be broken into the following three components:
 a. Prediction
 b. Determination of the context
 c. Probability model for the prediction errors
In the prediction, the next value x is predicted as x̂ based on the values of the already
coded neighboring pixels. The three nearest pixels (denoted a, b, and c), shown in Fig. 4.6,
are used in the prediction as follows:

  x̂ = min(a, b)    if c ≥ max(a, b)
  x̂ = max(a, b)    if c ≤ min(a, b)                                           (4.2)
  x̂ = a + b - c    otherwise
The predictor tends to pick a when there is a horizontal edge above the current location, and
b when a vertical edge exists to the left of it. The third choice (a+b-c) is based on the
presumption that there is a smooth plane around the pixel and uses this estimate as the
prediction. The prediction residual ε = x - x̂ is then input to the context modeler, which
decides the appropriate statistical model to be used in the coding.
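The predictor of Equation (4.2), and the default gradient quantization discussed in the next paragraphs, translate directly into code. The following Python sketch is an illustration only; the function names are chosen here, and the quantization thresholds are the default ones mentioned in the text.

    def med_predictor(a, b, c):
        """Median edge detector of Equation (4.2): a = left, b = above,
        c = above-left neighbor of the current pixel."""
        if c >= max(a, b):
            return min(a, b)     # horizontal edge above -> predict from the left
        if c <= min(a, b):
            return max(a, b)     # vertical edge on the left -> predict from above
        return a + b - c         # smooth plane assumption

    def quantize_gradient(g):
        """Default quantization of a local gradient into 9 regions (-4..4),
        used to form the context (q1, q2, q3)."""
        sign = -1 if g < 0 else 1
        g = abs(g)
        if g == 0:   return 0
        if g <= 2:   return sign * 1
        if g <= 6:   return sign * 2
        if g <= 20:  return sign * 3
        return sign * 4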
Figure 4.6: Block diagram of JPEG-LS. (The image samples enter the modeler, where the
gradients of the causal neighbors c, b, d and a of the current pixel x determine the context,
and a flat region activates the run mode. In the regular mode the fixed predictor with
adaptive correction produces the predicted values, and the prediction errors together with
their context statistics are passed to the Golomb coder; in the run mode a run counter feeds
the run lengths and statistics to the run coder. Both modes produce the compressed bit
stream.)
The context is determined by calculating three gradients between the four context pixels:
g1 = d - b, g2 = b - c, and g3 = c - a. Each difference is quantized into a small number of
approximately equiprobable connected regions in order to reduce the number of models.
Denote the quantized values of g1, g2, g3 by q1, q2, q3. The quantization regions are {0},
{1,2}, {3,4,5,6}, {7,8,...,20}, and {>20} by default for 8 bits per pixel images, but the
regions can be adjusted for particular images and applications. The number of models is
reduced further by assuming that the symmetric contexts Ci = (q1, q2, q3) and
Cj = (-q1, -q2, -q3) have the same statistical properties (with the difference of the sign).
The total number of models is thus:

  C = ( (2T + 1)^3 + 1 ) / 2                                                  (4.3)
where T is the number of non-zero regions. Using the default regions (T=4) this gives 365
models in total. Each context will have its own counters and statistical model as described
next.
The fixed predictor of (4.2) is fine-tuned by adaptive adjustment of the prediction value. An
optimal predictor would result in a prediction error ε = 0 on average, but this is not
necessarily the case. A technique called bias cancellation is used in JPEG-LS to detect and
correct systematic bias in the predictor. A correction value C' is estimated on the basis of
the prediction errors seen so far in the context, and then subtracted from the prediction
value. The correction value is estimated as the (rounded) average prediction error:

  C' = ⌈ D / N ⌉                                                              (4.4)

where D is the cumulative sum of the previous uncorrected prediction errors, and N is the
number of pixels coded so far. These values are maintained in each context. The practical
implementation is slightly different and the reader is recommended to look up the details in
the paper by Weinberger et al. (1998).
The distribution of the prediction errors is approximated by a Laplacian distribution, i.e. a
two-sided exponential decay centered at zero. The prediction errors are first mapped as
follows:

  M(ε) = 2|ε| - u(ε)                                                          (4.5)

where u(ε) = 1 for negative ε and 0 otherwise. This mapping re-orders the values into the
interleaved sequence 0, -1, +1, -2, +2, and so on. Golomb codes (or their special case, Rice
codes) are then used to code the mapped values, see Section 2.2.3. The only parameter of the
code is k, which defines the skewness of the distribution. In JPEG-LS, the value of k is
adaptively determined as follows:

  k = min{ k' : 2^k' · N ≥ A }                                                (4.6)

where A is the accumulated sum of magnitudes of the prediction errors (absolute values) seen
in the context so far. The appropriate Golomb code is then used as described by Weinberger et
al. (1998).
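Both the error mapping of Equation (4.5) and the parameter rule of Equation (4.6) translate directly into code. A small Python sketch, for illustration only:

    def map_error(e):
        """Mapping of Equation (4.5): 0, -1, +1, -2, +2, ... -> 0, 1, 2, 3, 4, ..."""
        return 2 * abs(e) - (1 if e < 0 else 0)

    def golomb_parameter(N, A):
        """Equation (4.6): the smallest k with 2^k * N >= A, where N counts the
        samples of the context and A their accumulated absolute prediction error."""
        k = 0
        while (N << k) < A:
            k += 1
        return k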
JPEG-LS also has a "run mode" for coding flat regions. The run mode is activated when a flat
region is detected by a = b = c = d (or equivalently q1 = q2 = q3 = 0). The number of repeated
successful predictions (ε = 0) is then encoded by a similar Golomb coding of the run length.
The coder returns to the regular mode when the next unsuccessful prediction (ε ≠ 0) is coded.
This technique is also referred to as alphabet extension.
The standard also includes an optional near-lossless mode, in which every sample value in
a reconstructed image is guaranteed to differ from the original value by at most a (small)
preset amount δ. This mode is implemented simply by quantizing the prediction error ε as
follows:

  Q(ε) = sign(ε) · ⌊ (|ε| + δ) / (2δ + 1) ⌋                                    (4.7)
The quantized values must then be used also in the context modeling, and taken into account
in the coding and decoding steps.
4.1.5 Residual coding of lossy algorithms
An interesting approach to lossless coding is the combination of lossy and lossless methods.
First, a lossy coding method is applied to the image. Then a residual between the original and
the reconstructed image is calculated and compressed by any lossless compression algorithm.
This scheme can also be considered as a kind of progressive coding. The lossy image serves as
a rough version of the image which can be quickly retrieved. Then the complete image can be
retrieved by adding the residual and the lossy parts together.
4.1.6 Summary of the lossless gray-scale compression
Figure 4.7 profiles the performance of several lossless gray-scale image compression
algorithms, including also three well-known compression programs (Compress, Gzip, Pkzip)
based on the Ziv-Lempel algorithms [1977] [1978].
The compression efficiency of bit plane coding is rather close to that of lossless JPEG. In
fact, coding the bit planes by JBIG (with the 3D-template) outperforms lossless JPEG for
images with a low number of bits per pixel in the original image, see Figure 4.8. For
example, the corresponding bit rates of lossless JPEG and JBIG for the bit planes were
3.83 and 3.92 for a set of 8 bpp images (256 gray scales). On the other hand, for 2 bpp images
(4 gray scales) the corresponding bit rates were 0.34 (lossless JPEG) and 0.24 (JBIG bit plane
coding). The results of the bit plane based JBIG coding are better when the precision of the
image is 6 bpp or lower; otherwise lossless JPEG gives slightly better compression results.
COMPRESS = Unix standard compression software; GZIP = Gnu compression software;
PKZIP = Pkware compression software; BP = Bit plane coding [RM92];
3D-BP = Bit plane coding with 3D-template [RM92];
FELICS = Fast and Efficient Lossless Image Compression System [HV92b];
JPEG = ISO and CCITT standard [PM93].

Figure 4.7: Compression efficiency of lossless image compression algorithms
for test image Lena (512×512×8); the bit rates range from 7.5 to 4.7 bits per pixel.
Figure 4.8: Compression (bit rate) versus precision (bits per pixel of the original image).
JBIG refers to the bit plane coding, and JPEG to the lossless JPEG. The results are for the
JPEG set of test images.
4.2 Block truncation coding
The basic idea of block truncation coding (BTC) is to divide the image into 4×4-pixel blocks
and quantize the pixels of each block to two values, a and b. For each block, the mean value
(x̄) and the standard deviation (σ) are calculated and encoded. Then a two-level quantization
is performed for the pixels of the block so that a 0-bit is stored for the pixels with values
smaller than the mean, and the rest of the pixels are represented by a 1-bit. The image is
reconstructed in the decoding phase from x̄ and σ, and from the bit plane, by assigning the
value a to the 0-value pixels and b to the 1-value pixels:

  a = x̄ - σ · √( q / (m - q) )                                                (4.8)

  b = x̄ + σ · √( (m - q) / q )                                                (4.9)

where m (=16) is the total number of pixels in the block and q is the number of 1-bits in
the bit plane. The quantization levels were chosen so that the mean and the variance of the
pixels in the block are preserved in the decompressed image; thus the method is also referred
to as moment preserving BTC. Another variant of BTC, called absolute moment BTC (AMBTC),
selects a and b as the mean values of the pixels within the two partitions:
  a = 1/(m-q) · Σ_{xi < x̄} xi                                                 (4.10)

  b = 1/q · Σ_{xi ≥ x̄} xi                                                     (4.11)

This choice of a and b does not necessarily preserve the second moment (variance) of the
block. However, it has been shown that the MSE-optimal representative for a set of pixels is
their mean value. See Figure 4.9 for an example of the moment preserving BTC.
In the moment preserving BTC the quantization data is represented by the pair (x̄, σ).
A drawback of this approach is that the quantization levels are calculated at the decoding
phase from the quantized values of (x̄, σ), which contain rounding errors; thus extra
degradation is caused by the coding phase. The other approach is to calculate the quantization
levels (a, b) already at the encoding phase and transmit them. In this way one can minimize
both the quantization error and the computation needed at the decoding phase.
The basic BTC algorithm does not consider how the quantization data - (x̄, σ) or (a, b) - and
the bit plane should be coded, but simply represents the quantization data by 8+8 bits, and
the bit plane by 1 bit per pixel. Thus, the bit rate of BTC is (8 + 8 + m)/m bits per pixel
(= 2.0 in the case of 4×4 blocks).
       Original              Bit-plane            Reconstructed
    2   9  12  15           0  1  1  1           2  12  12  12
    2  11  11   9           0  1  1  1           2  12  12  12
    2   3  12  15           0  0  1  1           2   2  12  12
    3   3   4  14           0  0  0  1           2   2   2  12

   x̄ = 7.94,  σ = 4.91,  q = 9,  a = 2.3,  b = 12.3

Figure 4.9: Example of the moment preserving BTC.
The major drawback of BTC is that it performs poorly in high contrast blocks, because two
quantization levels are not sufficient to describe these blocks. The problem can be attacked
by using variable block sizes. With large blocks one can decrease their total number and
therefore reduce the bit rate; on the other hand, small blocks improve the image quality.
One such approach is to apply quadtree decomposition. Here the image is segmented into
blocks of size m1×m1. If the standard deviation σ of a block is less than a predefined
threshold σth (implying a low contrast block), the block is coded by a BTC algorithm.
Otherwise it is divided into four subblocks and the same process is repeated until the
threshold criterion is met, or the minimal block size (m2×m2) is reached. The hierarchy of the
blocks is represented by a quadtree structure.
The method can be further improved by compressing the bit plane, e.g. by vector quantization.
This combined algorithm is referred to as BTC-VQ. The pair (a,b), on the other hand, can be
compressed by forming two subsample images, one from the a-values and another from the
b-values. These can then be compressed by any image compression algorithm in the same manner
as the mean value in the mean/residual VQ.
All the previous ideas can be collected together to form a combined BTC algorithm. Let us
next examine one such possible combination. See Table 4.5 for the elements of the combined
method, referred to as HBTC-VQ. Variable block sizes are applied with minimum and maximum
block sizes of 2×2 and 32×32. For a high quality of the compressed image the use of 2×2 blocks
is essential in the high contrast regions. The standard deviation (σ) is used as the threshold
criterion and is set to 6 for all levels. (For the 4×4 level the threshold value could be left
as an adjustable parameter.) The bit plane is coded by VQ using a codebook with 256 entries;
thus the compression effect is 0.5 bpp for every block coded by VQ. Two subsample images are
formed, one from the a-values and another from the b-values of the blocks. They are then coded
by FELICS, see Section 4.1.3. The result of the combined BTC algorithm is illustrated in
Figure 4.10.
Table 4.5: Elements of the combined BTC algorithm.

 Part:                 Method:
 Quantization          AMBTC
 Coding of (a,b):      FELICS
 Coding of bitplane:   VQ / 256 entries
 Block size:           32×32 ... 2×2

Figure 4.10: Magnifications of Lena when compressed by various BTC variants:
BTC (bpp = 2.00, mse = 43.76), AMBTC (bpp = 2.00, mse = 40.51), and
HBTC-VQ (bpp = 1.62, mse = 15.62).
4.3 Vector quantization
Vector quantization (VQ) is a generalization of the scalar quantization technique in which the
number of possible (pixel) values is reduced. Here the input data consists of M-dimensional
vectors (e.g. M-pixel blocks) instead of scalars (single pixel values). Thus with 8-bpp
gray-scale images the number of different vectors (blocks) is 256^M. The input space, however,
is not evenly occupied by these vectors. In fact, because of the high correlation between the
neighboring pixel values, some input vectors are very common while others hardly ever appear
in real images. For example, completely random patterns of pixels are rarely seen, but certain
structures (like edges, flat areas, and slopes) are found in almost every image.
Vector quantization partitions the input space into K non-overlapping regions so that the
input space is completely covered. A representative (codevector) is then assigned to each
cluster. Vector quantization maps each input vector to this codevector, which is the
representative vector of the partition. Typically the space is partitioned so that each vector
is mapped to its nearest codevector, minimizing a certain distortion function. The distortion
is commonly the (squared) Euclidean distance between the two vectors:

  d(X, Y) = Σ_{i=1..M} (Xi - Yi)^2                                            (4.12)
where X and Y are two M-dimensional vectors. The codevector is commonly chosen as the
centroid of the vectors in the partition, that is

  C = (c1, c2, ..., cM) = (X̄1, X̄2, ..., X̄M)                                   (4.13)

where X̄i is the average value of the ith component of the vectors belonging to the partition.
This selection minimizes the Euclidean distortion within the partition. The codebook of the
vector quantizer consists of all the codevectors. The design of a VQ codebook is studied in
the next section.
Vector quantization is applied in image compression by dividing the image into fixed size
blocks (vectors), typically 4×4, which are then replaced by the best match found from the
codebook. The index of the codevector is sent to the decoder using ⌈log2 K⌉ bits, see Figure
4.11. For example, in the case of 4×4 pixel blocks and a codebook of size 256, the bit rate is
⌈log2 256⌉ / 16 = 0.5 bpp, and the corresponding compression ratio is (16·8)/8 = 16.
Figure 4.11: Basic structure of vector quantization: the mapping function Q maps each input
vector of the training set to the index of a code vector in the codebook C, i.e. to a scalar
pointing into the codebook.
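The mapping just described amounts to a full nearest-neighbour search over the codebook. A minimal Python sketch is given below; blocks and codevectors are flattened lists of pixel values, and the names are chosen here for illustration.

    def encode_vq(blocks, codebook):
        """Map every input block to the index of its nearest codevector
        (full search, squared Euclidean distance of Equation (4.12))."""
        def distance(x, y):
            return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        return [min(range(len(codebook)), key=lambda k: distance(block, codebook[k]))
                for block in blocks]

    def decode_vq(indices, codebook):
        """The decoder is a simple table look-up."""
        return [codebook[k] for k in indices]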
4.3.1 Codebook organization
Even though the time complexity of codebook generation algorithms is usually rather high,
this is not critical since the codebook is typically generated only once as a preprocessing
stage. However, the search for the best match must be performed for every block in the image,
and a full search requires K distance calculations per block. For example, in the case of
4×4 blocks and K=256, 16·256 = 4096 multiplications are needed for each block. For an image
of 512×512 pixels there are 16384 blocks in total, thus the number of multiplications required
is over 67 million.
Tree-structured VQ:
In a tree-structured vector quantization the codebook is organized as an m-ary balanced tree,
where the codevectors are located in the leaf nodes, see Figure 4.12. The input vector is
compared with m predesigned test vectors at each stage or node of the tree. The nearest test
vector determines which of the m paths through the tree to select in order to reach the next
stage of testing. At each stage the number of candidate code vectors is reduced to 1/m of the
previous set of candidates. In many applications m=2 and we have a binary tree. If the
codebook size is K = m^d, then d = log_m K m-ary search stages are needed to locate the chosen
code vector. An m-ary tree has breadth m and depth d. The drawback of the tree-structured VQ
is that the best match is not necessarily found, because the search is made heuristically on
the basis of the search tree.
Figure 4.12: Tree-structured vector quantizer codebook.
The number of bits required to encode a vector is still ⌈log2 K⌉ regardless of the tree
structure. In the case of a binary tree, the indexes can be obtained by assigning binary
labels to the branches of the tree so that the index of a codevector is the concatenation of
the labels along the path from the root to the leaf node. These bit sequences can be
transmitted (encoded) in a progressive manner from the most significant bits of the blocks to
the least significant bits, so that in the first phase the bits of the first branches in the
tree structure are sent. Thus the decoder can display a first estimate of the image as soon as
the first bits of each block are received. This kind of progressive coding consists of as many
stages as there are bits in the code indexes. Note that the progressive coding is an option
that requires no extra bits in the coding.
Classified VQ:
In classified vector quantization, instead of having one codebook, several (possibly smaller)
codebooks are used. For each block, a classifier selects the codebook in which the search is
performed, see Figure 4.13. Typically the codebooks are classified according to the shape of
the block, so that codevectors having horizontal edges might be located in one codebook,
blocks having diagonal edges in another, and so on. The encoder has to send two indexes to
the decoder: the index of the class from which the codevector was taken, and the index of the
chosen codevector within that codebook. The classified VQ can be seen as a special case of
tree-structured VQ where the depth of the tree is 1.
There are two motivations for classified VQ. First, it allows a faster search since the
codebooks can be smaller than the codebook of a full search VQ. Second, the classified VQ can
also be seen as a type of codebook construction method where the codevectors are grouped
according to their type (shape). However, the classified VQ codebooks are in general no better
than full search codebooks. Consider a classified VQ where we have 4 classes, each having 256
codevectors. The total number of bits required to encode each block is
log2 4 + log2 256 = 2 + 8 = 10. With the same number of bits, a full-search VQ can express
2^10 = 1024 different codevectors. By choosing the union of the four subcodebooks of the
classified VQ, the result can never be worse than that of the classified VQ. Thus, the primary
benefit of classified VQ is that it allows a faster search (at the cost of extra distortion).
Figure 4.13: Classified vector quantizer: the classifier selects one of the M class codebooks,
and both the class index and the codevector index are transmitted to the decoder.
4.3.2 Mean/residual VQ
In mean/residual vector quantization the blocks are divided into two components: the mean
value of the pixels, and the residual in which the mean is subtracted from the individual
pixel values:

  ri = xi - x̄                                                                 (4.14)

It is easier to design a codebook for the mean-removed blocks than for the original blocks.
The range of the residual pixel values is [-255, 255], but they are concentrated around zero.
Thus, the mean/residual VQ is a kind of prediction technique. However, since the predictor is
the mean value of the same block, it must also be encoded. A slightly different variant is
interpolative/residual VQ, where the predictor is not the mean value of the block but the
result of a bilinear interpolation, see Figure 4.14. The predictor for each pixel is
interpolated not only on the basis of the mean value of the block, but also on the basis of
the mean values of the neighboring blocks. The details of the bilinear interpolation are
omitted here.
Figure 4.14: Predictors of two residual vector quantizers: mean/residual VQ (left) and
interpolative/residual VQ (right).
In addition to removing the mean value, one can normalize the pixel values by eliminating the
standard deviation also:

  ri = (xi - x̄) / σ                                                           (4.15)

where σ is the standard deviation of the block. The histogram of the resulting residual values
has zero mean and unit variance. What is left is the shape of the block, i.e. the correlations
between the neighboring pixels. This method can be called mean/gain/shape vector quantization,
since it separates these three components from each other. The shape is then coded by vector
quantization. Other coding methods, however, might be more suitable for the mean and the gain
(σ). For example, one could form a subsample image from the mean values of the blocks and
compress it by any image compression algorithm, e.g. lossless JPEG. Figure 4.16 shows sample
blocks taken from the test image Eye (Figure 4.15) when normalized by Equation (4.15).
Figure 4.15: Test image Eye (50×50×8).
Figure 4.16: 100 samples of normalized 4×4 blocks from the image of Figure 4.15.
4.3.3 Codebook design
The codebook is usually constructed on the basis of a training set of vectors. The training
set consists of sample vectors from a set of images. Denote the number of vectors in the
training set by N. The objective of the codebook construction is to design a codebook of K
vectors so that the average distortion with respect to the training set is minimized. See
Figure 4.17 for an example of 100 two-dimensional training vectors and one possible
partitioning of them into 10 sets.
Figure 4.17: Example of a training set of 100 two-dimensional vectors plotted in the vector
space (left); an example of partitioning the vectors into 10 regions (right).
Random heuristic:
The codebook can be selected by randomly choosing K vectors from the training set. This
method, even though it may seem quite irrational, is useful as the initial codebook for an
iterative method like the generalized Lloyd algorithm. It is also applicable as such, since
the actual coding is not a random process: for each image block to be coded, the best possible
codevector is chosen from the codebook.
Pairwise nearest neighbor:
The pairwise nearest neighbor (PNN) algorithm starts by including all the N training vectors
in the codebook. In each step, two codevectors are merged into one so that the increase in the
overall distortion is minimized. The algorithm is iterated until the number of codevectors is
reduced to K. The details of the algorithm are explained in the following.
The PNN algorithm starts by calculating the distortion between all pairs of training vectors.
The two nearest neighboring vectors are combined into a single cluster and represented by
their centroid. The input space now consists of N-1 clusters, one containing two vectors and
the rest containing a single vector. At each phase of the algorithm, the increase in average
distortion that would result if the two clusters and their centroids were replaced by the
merged cluster and its centroid is computed for every pair of clusters. The two clusters with
the minimum increase in the distortion are then merged, and the codevector of the merged
cluster is its centroid. See Figure 4.18 for an example of the PNN algorithm.
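A naive Python sketch of the PNN idea is given below. It recomputes the merge costs from scratch at every step, so it is far slower than a practical implementation, but it follows the description above; the names are chosen here for illustration.

    def pnn(training, K):
        """Pairwise nearest neighbour sketch: repeatedly merge the two clusters
        whose merging increases the total squared error the least."""
        clusters = [[list(v)] for v in training]      # start: one vector per cluster

        def centroid(cluster):
            return [sum(col) / len(cluster) for col in zip(*cluster)]

        def sse(cluster):
            c = centroid(cluster)
            return sum(sum((x - m) ** 2 for x, m in zip(v, c)) for v in cluster)

        while len(clusters) > K:
            pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda p: sse(clusters[p[0]] + clusters[p[1]])
                                            - sse(clusters[p[0]]) - sse(clusters[p[1]]))
            clusters[i] += clusters[j]
            del clusters[j]
        return [centroid(c) for c in clusters]        # the K codevectors

    # pnn([(1,1), (2,5), (3,2), (4,6), (5,4), (5,8), (6,6), (8,2), (8,9), (9,1)], 5)
    # reduces the training set of Figure 4.18 to five codevectors.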
Figure 4.18: Pairwise nearest neighbor algorithm for a training set with N=10, and the final
codebook with K=5. The training set consists of the two-dimensional vectors (1,1), (2,5),
(3,2), (4,6), (5,4), (5,8), (6,6), (8,2), (8,9), and (9,1); at each stage the figure shows the
current codevectors with their distortions and the best candidate pairs for merging.
Splitting algorithm:
The splitting algorithm takes the opposite approach to codebook construction compared to the
PNN method. It starts with a codebook containing a single codevector, which is the centroid of
the complete training set. The algorithm then produces increasingly larger codebooks by
splitting a codevector Y into two codevectors Y and Y+ε, where ε is a vector of small
Euclidean norm. One choice of ε is to make it proportional to the vector whose ith component
is the standard deviation of the ith component of the set of training vectors. The new
codebook after the splitting can never be worse than the previous one, since it is the same as
the previous codebook plus one new codevector. The algorithm is iterated until the size of the
codebook reaches K.
Generalized Lloyd algorithm:
The Generalized Lloyd Algorithm (GLA), also referred to as the LBG algorithm, is an iterative
method that takes a codebook as input (referred to as the initial codebook) and produces
a new, improved version of it (resulting in lower overall distortion). The hypothesis is that
the iterative algorithm finds the locally optimal codebook nearest to the initial one. The
initial codebook for the GLA can be constructed by any existing codebook design method, e.g.
by the random heuristic. Lloyd's necessary conditions of optimality are defined as follows:
 Nearest neighbor condition: For a given set of codevectors, each training vector is
mapped to its nearest codevector in respect to the distortion function.
 Centroid condition: For a given partition, the optimal codevector is the centroid of the
vectors within the partition.
On the basis of these two optimality conditions, Lloyd's algorithm is formulated as a two-phase iterative process:
1. Divide the training set into partitions by mapping each vector X to its nearest
codevector Y using the Euclidean distance.
2. Calculate the centroid of each region and replace the codevectors Y by the centroids of
their corresponding partitions.
Both stages of the algorithm satisfy the optimality conditions; thus the resulting codebook
after one iteration can never be worse than the original one. The iterations are continued
until no further decrease in the overall distortion is achieved. The algorithm does not
necessarily reach the global optimum, but converges to a local minimum.
Note that there is a third condition of optimality, stating that no training vector should be
equally distant from two different codevectors. However, this can be handled by defining the
distortion function (or a tie-breaking rule) so that such a situation never occurs; this third
optimality condition is therefore omitted in the discussion here.
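A compact Python sketch of the GLA is given below. For simplicity it starts from a random initial codebook and runs a fixed number of iterations, whereas the description above iterates until the overall distortion no longer decreases; the names are chosen here for illustration.

    import random

    def gla(training, K, iterations=20):
        """Generalized Lloyd algorithm sketch: alternate the nearest neighbor
        partition and the centroid update of Lloyd's conditions."""
        codebook = [list(v) for v in random.sample(training, K)]
        for _ in range(iterations):
            # 1. partition: map every training vector to its nearest codevector
            partitions = [[] for _ in codebook]
            for v in training:
                nearest = min(range(K),
                              key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])))
                partitions[nearest].append(v)
            # 2. update: replace each codevector by the centroid of its partition
            for k, part in enumerate(partitions):
                if part:                              # keep empty cells unchanged
                    codebook[k] = [sum(col) / len(part) for col in zip(*part)]
        return codebook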
4.3.4 Adaptive VQ
VQ can also be applied adaptively by designing the codebook on the basis of the image to be
coded. The codebook, however, must then be included in the compressed file. For example,
consider a codebook of 256 entries, each taking 4×4 = 16 bytes. The complete codebook thus
requires 256·16 = 4096 bytes of memory, increasing the overall bit rate by 0.125 bits per
pixel (in the case of a 512×512 image). In dynamic modelling, on the other hand, the
compression would start with an initial codebook, which would then be updated during the
compression on the basis of the already coded blocks. This would not increase the bit rate,
but the computational load required by e.g. the GLA might be too high if it were applied after
each block.
4.4 JPEG
JPEG (Joint Photographic Experts Group) was started in 1986 as a cooperative effort of both
the International Organization for Standardization (ISO) and the International Telegraph and
Telephone Consultative Committee (CCITT), and it was later joined by the International
Electrotechnical Commission (IEC). The purpose of JPEG was to create an image compression
standard for gray-scale and color images. Even if the name JPEG refers to the standardization
group, it has also been adopted as the name of the compression method. The JPEG standard
includes the following modes of operation:
 Lossless coding
 Sequential coding
 Progressive coding
 Hierarchical coding
The lossless coding mode is completely different from the lossy coding and was presented in
Section 4.1.2. The lossy baseline JPEG (sequential coding mode) is based on the discrete
cosine transform (DCT), and is introduced next.
4.4.1 Discrete cosine transform
The 1-D discrete cosine transform (DCT) is defined as

  C(u) = α(u) · Σ_{x=0..N-1} f(x) cos[ (2x+1)uπ / 2N ]                        (4.16)

Similarly, the inverse DCT is defined as

  f(x) = Σ_{u=0..N-1} α(u) C(u) cos[ (2x+1)uπ / 2N ]                          (4.17)

where

  α(u) = √(1/N)  for u = 0;   α(u) = √(2/N)  for u = 1, 2, ..., N-1           (4.18)

The corresponding 2-D DCT and the inverse DCT are defined as

  C(u,v) = α(u)α(v) · Σ_{x=0..N-1} Σ_{y=0..N-1} f(x,y)
           cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]                          (4.19)

and

  f(x,y) = Σ_{u=0..N-1} Σ_{v=0..N-1} α(u)α(v) C(u,v)
           cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]                          (4.20)
The advantage of the DCT is that it can be expressed without complex numbers. The 2-D DCT is
also separable (like the 2-D Fourier transform), i.e. it can be obtained by two subsequent
1-D DCTs in the same way as the Fourier transform. See Figure 4.19 for the basis functions of
the 1-D DCT, Figure 4.20 for the basis functions of the 2-D DCT, and Figure 4.21 for an
illustration of the 2-D DCT for 4×4 sample blocks.
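For illustration, Equation (4.19) can be evaluated directly as in the following Python sketch; a real codec would use a separable or fast (FDCT) implementation instead.

    from math import cos, pi, sqrt

    def alpha(u, N):
        return sqrt(1.0 / N) if u == 0 else sqrt(2.0 / N)

    def dct2(block):
        """Direct 2-D DCT of Equation (4.19) for an N x N block."""
        N = len(block)
        return [[alpha(u, N) * alpha(v, N) *
                 sum(block[x][y]
                     * cos((2 * x + 1) * u * pi / (2 * N))
                     * cos((2 * y + 1) * v * pi / (2 * N))
                     for x in range(N) for y in range(N))
                 for v in range(N)]
                for u in range(N)]

    # dct2([[10] * 4 for _ in range(4)])[0][0] gives 40.0, as for the FLAT block
    # of Figure 4.21; all the other coefficients of that block are zero.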
Figure 4.19: 1-D DCT basis functions for N=8 (u = 0, 1, ..., 7).
Figure 4.20: 2-D DCT basis functions for N=4. Each block consists of 4×4 elements,
corresponding to x and y varying from 0 to 3.
Figure 4.21: Example of the DCT for sample 4×4 blocks (FLAT, IMPULSE, LINE (horizontal),
EDGE (vertical), EDGE (horizontal), EDGE (diagonal), RANDOM TEXTURE, and SLOPE (horizontal)):
each original block is shown together with its transformed coefficients. For example, for the
FLAT block of constant value 10 the only non-zero coefficient is C(0,0) = 40.0.
4.4.2 Baseline JPEG
The image is first segmented into 8×8 blocks of pixels, which are then coded separately. Each
block is transformed to the frequency domain by the fast discrete cosine transform (FDCT). The
transformed coefficients are quantized and then entropy coded either by an arithmetic coder
(QM-coder with a binary decision tree) or by Huffman coding. See Figure 4.22 for the main
structure of the baseline JPEG; the corresponding decoding structure is given in Figure 4.23.
Neither the DCT nor the entropy coding loses any information from the image. The DCT only
transforms the image into the frequency domain so that it is easier to compress. The only
phase resulting in distortion is the quantization phase. The pixels in the original block are
represented by 8-bit integers, but the resulting transform coefficients are 16-bit real
numbers; thus the DCT itself would expand the file size if no quantization were performed.
The quantization in JPEG is done by dividing the transform coefficients ci (real number) by
the so-called quantization factor qi (integer between 1..255):
c 
ci  round  i 
 qi 
(4.21)
The result is rounded to the nearest integer, see Figure 4.24 for an example. The higher the quantization factor, the less accurate is the representation of the value. Even the lowest quantization factor (q=1) results in a small amount of distortion, since the original coefficients are real numbers but the quantized values are integers. The dequantization is defined by

$$r_i = \hat{c}_i \cdot q_i \qquad (4.22)$$
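As a minimal sketch of the quantization step (4.21) and the dequantization step (4.22), where the function names are ours and q_table stands for an 8×8 quantization matrix such as those of Table 4.7:

    import numpy as np

    def quantize(coeffs, q_table):
        # Equation (4.21): divide each coefficient by its quantization factor
        # and round to the nearest integer.
        return np.round(np.asarray(coeffs, float) / q_table).astype(int)

    def dequantize(quantized, q_table):
        # Equation (4.22): the decoder can only restore multiples of q_i,
        # which is the sole source of distortion in baseline JPEG.
        return quantized * q_table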
Figure 4.22: Main structure of JPEG encoder: the source image data is processed in 8×8 blocks by the FDCT, the quantizer and the entropy encoder, producing the compressed image data; table specifications control the quantizer and the entropy encoder.

Figure 4.23: Main structure of JPEG decoder: the compressed image data is processed by the entropy decoder, the dequantizer and the IDCT, producing the reconstructed image data; the corresponding table specifications are used by the entropy decoder and the dequantizer.
Figure 4.24: Example of quantization by a factor of 2: the original values 0 to 10 are mapped to the quantized values 0 to 5.
In JPEG, the quantization factor is not uniform within the block. Instead, the quantization is performed so that more bits are allocated to the low frequency components (which contain the most important information) than to the high frequency components. See Table 4.6 for examples of possible quantization matrices. The basic quantization tables of JPEG are shown in Table 4.7, where the first one is applied both in gray-scale image compression and for the Y component in color image compression (assuming YUV or YIQ color space). The second quantization table is for the chrominance components (U and V in the YUV color space).
The bit rate of JPEG can be adjusted by scaling the basic quantization tables up (to achieve lower bit rates) or down (to achieve higher bit rates). The relative differences between the q_i factors are retained.
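The scaling of the quantization tables can be sketched as follows (a simplified illustration; the function name and the clipping of the factors back to the allowed range 1..255 are our assumptions, not part of the standard):

    import numpy as np

    def scale_q_table(q_table, scale):
        # scale > 1 gives coarser quantization (lower bit rate),
        # scale < 1 gives finer quantization (higher bit rate).
        return np.clip(np.round(np.asarray(q_table) * scale), 1, 255).astype(int)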
Table 4.6: Possible quantization tables.

Uniform quantization:
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16
 16 16 16 16 16 16 16 16

More accurate quantization:
  1  2  2  4  4  8 16 16
  2  4  8  8  8 16 16 32
  4  4  8 16 16 16 32 32
  4  8 16 16 16 32 32 32
  8 16 16 32 32 32 32 32
  8 16 16 32 32 32 64 64
 16 16 32 32 32 32 64 64
 16 16 32 32 32 64 64 64

Less accurate quantization:
   8  64  64 128 256 256 256 256
  64 128 128 128 256 256 256 256
 128 256 256 256 256 256 256 256
 256 256 256 256 256 256 256 256
 256 256 256 256 256 256 256 256
 256 256 256 256 256 256 256 256
 256 256 256 256 256 256 256 256
 256 256 256 256 256 256 256 256

Table 4.7: JPEG quantization tables.

Luminance:
 16 11 10 16  24  40  51  61
 12 12 14 19  26  58  60  55
 14 13 16 24  40  57  69  56
 14 17 22 29  51  87  80  62
 18 22 37 56  68 109 103  77
 24 35 55 64  81 104 113  92
 49 64 78 87 103 121 120 101
 72 92 95 98 112 100 103  99

Chrominance:
 17 18 24 47 99 99 99 99
 18 21 26 66 99 99 99 99
 24 26 56 99 99 99 99 99
 47 66 99 99 99 99 99 99
 99 99 99 99 99 99 99 99
 99 99 99 99 99 99 99 99
 99 99 99 99 99 99 99 99
 99 99 99 99 99 99 99 99
The entropy coding in JPEG is either Huffman or arithmetic coding. Here the former is briefly discussed. The first coefficient (the DC-coefficient) is coded separately from the rest of the coefficients (the AC-coefficients). The DC-coefficient is coded by predicting its value on the basis of the DC-coefficient of the previously coded block. The difference between the original and the predicted value is then coded using a code table similar to the one applied in lossless JPEG (see Section 4.1.1). The AC-coefficients are then coded one by one in the order given by zig-zag scanning (see Section 2.1.2). No prediction is made, but a simple run-length coding is applied: each run of consecutive zero-valued coefficients is coded by its length. Huffman coding is then applied to the non-zero coefficients. The details of the entropy coding can be found in [Pennebaker & Mitchell 1993].
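The zig-zag scan and the run-length step can be sketched as follows (a simplified illustration that stops at the (run, value) pairs; baseline JPEG actually Huffman-codes (run, size) pairs and appends the amplitude bits, see [Pennebaker & Mitchell 1993]):

    # Zig-zag scanning order of an 8x8 block: the coefficients are visited
    # along the anti-diagonals, alternating direction (cf. Section 2.1.2).
    ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def runlength_ac(block):
        # Emit (length of zero run, value) pairs for the non-zero AC
        # coefficients; the trailing zeros become an end-of-block marker.
        ac = [block[r][c] for r, c in ZIGZAG][1:]   # skip the DC coefficient
        pairs, run = [], 0
        for v in ac:
            if v == 0:
                run += 1
            else:
                pairs.append((run, v))
                run = 0
        pairs.append('EOB')
        return pairs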
Table 4.8 gives an example of compressing a sample block by JPEG using the basic
quantization table. The result of compressing test image Lena by JPEG is shown in Figures
4.25 and 4.26.
4.4.3 Other options in JPEG
JPEG for color images:
RGB color images are compressed in JPEG by transforming the image first into YUV (or YIQ in the case of North America) and then compressing the three color components separately. The chrominance components are often sub-sampled so that a 2×2 block of the original pixels forms a new pixel in the sub-sampled image. The color component images are then upsampled to their original resolution in the decompression phase.
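The 2×2 subsampling and the corresponding upsampling can be sketched as follows (averaging and pixel replication are our choices for illustration; the standard leaves the filters to the implementation):

    import numpy as np

    def subsample_2x2(channel):
        # Average every 2x2 block of a chrominance channel into one pixel.
        h, w = channel.shape
        c = channel[:h - h % 2, :w - w % 2].astype(float)
        return (c[0::2, 0::2] + c[0::2, 1::2] +
                c[1::2, 0::2] + c[1::2, 1::2]) / 4

    def upsample_2x2(channel):
        # Pixel replication back to the original resolution (decoder side).
        return np.repeat(np.repeat(channel, 2, axis=0), 2, axis=1)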
Progressive mode:
Progressive JPEG is rather straightforward. Instead of coding the image block after block, the coding is divided into several stages. At the first stage the DC-coefficients of all blocks are coded. The decoder can get a rather good approximation of the image on the basis of the DC-coefficients alone, since they carry the average value of each block. At the second stage, the first significant AC-coefficients (determined by the zig-zag order) are coded. At the third stage the next significant AC-coefficients are coded, and so on. In total, there are 64 coefficients in each block, so the progressive coding can have at most 64 stages. In practice, the progressive coding can be changed back to the sequential order, for example already after the first stage. This is because the DC-coefficients are usually enough for the decoder to decide whether the image is worth retrieving.
Hierarchical mode:
The hierarchical coding mode of JPEG is a variant of progressive modelling, too. A reduced resolution version of the image is compressed first, followed by the higher resolution versions in increasing order. In each case the resolution is doubled for the next image, similarly as was done in JBIG.
Table 4.8: Example of a sample block compressed by JPEG.

Original block:
 139 144 149 153 155 155 155 155
 144 151 153 156 159 156 156 156
 150 155 160 163 158 156 156 156
 159 161 162 160 160 159 159 159
 159 160 161 162 162 155 155 155
 161 161 161 161 160 157 157 157
 162 162 161 163 162 157 157 157
 162 162 161 161 163 158 158 158

Transformed block:
 235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
 -22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
 -10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
  -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
  -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
   1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
  -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
  -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

Quantization matrix:
 16 11 10 16  24  40  51  61
 12 12 14 19  26  58  60  55
 14 13 16 24  40  57  69  56
 14 17 22 29  51  87  80  62
 18 22 37 56  68 109 103  77
 24 35 55 64  81 104 113  92
 49 64 78 87 103 121 120 101
 72 92 95 98 112 100 103  99

Quantized coefficients:
 15  0 -1  0  0  0  0  0
 -2 -1  0  0  0  0  0  0
 -1 -1  0  0  0  0  0  0
 -1  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0

Dequantized coefficients:
 240   0 -10   0   0   0   0   0
 -24 -12   0   0   0   0   0   0
 -14 -13   0   0   0   0   0   0
 -14   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0

Decompressed block:
 144 146 149 152 154 156 156 156
 148 150 152 154 156 156 156 156
 155 156 157 158 158 157 156 156
 160 161 161 162 161 159 157 155
 163 163 164 163 162 160 158 156
 163 164 164 164 162 160 158 157
 160 161 162 162 162 161 159 158
 158 159 161 161 162 161 159 158
Figure 4.25: Test image Lena compressed by JPEG at 1.00, 0.50 and 0.25 bpp (mse 17.26, 33.08 and 79.11, respectively); the original is 8.00 bpp with mse 0.00. Mse refers to mean square error.
Figure 4.26: Magnifications of Lena compressed by JPEG (same bit rates and mse values as in Figure 4.25).
4.5 JPEG2000
--- to be written later ---
4.5.1 Wavelet transform
The basic idea of the (discrete) wavelet transform is to decompose the image into smooth and detail components. The decomposition is performed in the horizontal and in the vertical direction separately. The smooth component represents the average color information and the detail component the differences of neighboring pixels. The smooth component is obtained using a low-pass filter and the detail component by a high-pass filter.
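One level of this smooth/detail split can be sketched with the simplest possible filters, the Haar wavelet, where the low-pass output is the average and the high-pass output the difference of each pixel pair (JPEG2000 itself uses longer filters; the function names are ours and the image dimensions are assumed to be even):

    import numpy as np

    def haar_rows(a):
        # 1-D Haar step on every row: pair averages (low pass) and pair
        # differences (high pass), both subsampled by two.
        a = a.astype(float)
        smooth = (a[:, 0::2] + a[:, 1::2]) / 2
        detail = (a[:, 0::2] - a[:, 1::2]) / 2
        return smooth, detail

    def haar_2d(image):
        # Horizontal pass followed by a vertical pass (via transposition)
        # gives the four subbands LL (smooth), LH, HL and HH (details).
        L, H = haar_rows(np.asarray(image))
        LL, LH = haar_rows(L.T)
        HL, HH = haar_rows(H.T)
        return LL.T, LH.T, HL.T, HH.T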
• Within each family of wavelets (such as the Daubechies family) there are wavelet subclasses distinguished by the number of coefficients and by the level of iteration. Wavelets are most often classified within a family by the number of vanishing moments. This is an extra set of mathematical relationships that the coefficients must satisfy, and it is directly related to the number of coefficients. For example, within the Coiflet wavelet family there are Coiflets with two vanishing moments and Coiflets with three vanishing moments. Several different wavelet families are illustrated in the figure below.
• The matrix is applied in a hierarchical algorithm, sometimes called a pyramidal algorithm. The wavelet coefficients are arranged so that the odd rows contain an ordering of wavelet coefficients that act as the smoothing filter, and the even rows contain an ordering of wavelet coefficients with different signs that act to bring out the data's detail. The matrix is first applied to the original, full-length vector. Then the vector is smoothed and decimated by half, and the matrix is applied again. Then the smoothed, halved vector is smoothed and halved again, and the matrix is applied once more. This process continues until a trivial number of "smooth-smooth-smooth..." data remain. That is, each matrix application brings out a higher resolution of the data while at the same time smoothing the remaining data. The output of the DWT consists of the remaining "smooth (etc.)" components and all of the accumulated "detail" components.
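The pyramidal iteration can be sketched for a one-dimensional signal as follows (again with Haar filters purely for illustration; the signal length is assumed to be divisible by 2^levels and the names are ours):

    import numpy as np

    def dwt_pyramid(signal, levels):
        # Repeatedly split the signal into a smooth half and a detail half;
        # the output is the final smooth part plus all accumulated details.
        s = np.asarray(signal, dtype=float)
        details = []
        for _ in range(levels):
            smooth = (s[0::2] + s[1::2]) / 2
            detail = (s[0::2] - s[1::2]) / 2
            details.append(detail)
            s = smooth        # the next level works on the halved smooth part
        return s, details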
Figure 4.??: Different families of wavelet functions.

Figure 4.??: Example of vertical and horizontal subband decomposition (filtering followed by subsampling; reconstruction by upsampling).

Figure 4.??: Illustration of the first and second iterations of wavelet decomposition.
4.5.2 Wavelet-based compression
• Filtering
• Quantizer
• Entropy coding
• Arithmetic coding
• Bit allocation
4.6 Fractal coding
Fractals can be considered as a set of mathematical equations (or rules) that generate fractal images: images that have similar structures repeating themselves inside the image. The image is the inference of the rules, and it has no fixed resolution like raster images. The idea of fractal compression is to find a set of rules that represents the image to be compressed, see Figure 4.25. Decompression is the inference of the rules. In practice, fractal compression tries to decompose the image into smaller regions which are described as linear combinations of the other parts of the image. These linear equations are the set of rules.
The algorithm presented here is the Weighted Finite Automata (WFA) proposed by Karel Culik II and Jarkko Kari [1993]. It is not the only existing fractal compression method, but it has been shown to work well in practice rather than being only a theoretical model. In WFA the rules are represented by a finite automaton consisting of states (Qi) and transitions from one state to another. Each state represents an image. The transitions leaving a state define how its image is constructed on the basis of the other images (states). The aim of WFA is to find an automaton A that represents the original image as well as possible.
Figure 4.25: Fractal compression (fractal compression maps the image to a set of rules; decompression is the inference of the rules).
The algorithm is based on a quadtree decomposition of the image, thus the states of the automaton are square blocks. The subblocks (quadrants) of the quadtree are addressed as shown in Figure 4.26. In WFA, each block (state) is described by the content of its four subblocks. This means that the complete image is also one state in the automaton. Let us next consider an example of an automaton (Figure 4.27), and the image it creates (Figure 4.28). Here we adopt a color space where 0 represents the white color (void), and 1 represents the black color (element). The values between 0 and 1 are different shades of gray.
The labels of the transitions indicate which subquadrant the transition is applied to. For example, the transition from Q0 to Q1 is used for the quadrant 3 (top rightmost quadrant) with the weight ½, and for the quadrants 1 (top leftmost) and 2 (bottom rightmost) with the weight of ¼. Denote the expression of the quadrant d in Qi by fi(d). Thus, the expression of the quadrants 1 and 2 in Q0 is given by ½Q0+¼Q1. Note that the definition is recursive: these quadrants in Q0 are the same image as the one described by the state itself, but only half of its size, plus one fourth of the image defined by the state Q1.
A 2^k × 2^k resolution representation of the image in Q0 is constructed by assigning to each pixel the value f0(s), where s is the k-length string of the pixel's address at the kth level of the quadtree. For example, the pixel values at the addresses 00, 03, 30, and 33 are given below:
$$f_0(00) = \frac{1}{2} f_0(0) = \frac{1}{2}\cdot\frac{1}{2} f_0(\varepsilon) = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{8} \qquad (4.14)$$

$$f_0(03) = \frac{1}{2} f_0(3) = \frac{1}{2}\left(\frac{1}{2} f_0(\varepsilon) + \frac{1}{2} f_1(\varepsilon)\right) = \frac{1}{8} + \frac{1}{4} = \frac{3}{8} \qquad (4.15)$$

$$f_0(30) = \frac{1}{2} f_0(0) + \frac{1}{2} f_1(0) = \frac{1}{2}\cdot\frac{1}{2} f_0(\varepsilon) + \frac{1}{2} f_1(\varepsilon) = \frac{1}{8} + \frac{1}{2} = \frac{5}{8} \qquad (4.16)$$

$$f_0(33) = \frac{1}{2} f_0(3) + \frac{1}{2} f_1(3) = \frac{1}{2}\left(\frac{1}{2} f_0(\varepsilon) + \frac{1}{2} f_1(\varepsilon)\right) + \frac{1}{2} f_1(\varepsilon) = \frac{1}{8} + \frac{1}{4} + \frac{1}{2} = \frac{7}{8} \qquad (4.17)$$
Note that fi(ε), i.e. fi applied to the empty string, evaluates to the final weight of the state, which is ½ in Q0 and 1 in Q1. The image at three different resolutions is shown in Figure 4.28.
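The recursive evaluation of the image function can be sketched directly from the automaton of Figure 4.27 (the weights and final weights below are read from the figure; the array layout and function names are our own):

    import numpy as np

    # Final weights of the two states Q0 and Q1.
    FINAL = np.array([0.5, 1.0])
    # W[d, i, j] = weight of the transition from Q_i to Q_j with label d.
    W = np.zeros((4, 2, 2))
    W[:, 0, 0] = 0.5                  # Q0 -> Q0, labels 0,1,2,3, weight 1/2
    W[1, 0, 1] = W[2, 0, 1] = 0.25    # Q0 -> Q1, labels 1,2, weight 1/4
    W[3, 0, 1] = 0.5                  # Q0 -> Q1, label 3, weight 1/2
    W[:, 1, 1] = 1.0                  # Q1 -> Q1, labels 0,1,2,3, weight 1

    def f(state, address):
        # f_i(empty string) is the final weight; otherwise follow the
        # transitions labelled by the first quadrant digit and recurse.
        if not address:
            return FINAL[state]
        d = int(address[0])
        return sum(W[d, state, j] * f(j, address[1:])
                   for j in range(len(FINAL)))

    # Reproduces equations (4.14)-(4.17): f(0, '00') = 1/8, f(0, '33') = 7/8.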
Figure 4.26: (a) The principle of addressing the quadrants:
 1 3
 0 2
(b) Example of addressing at the resolution of 4×4:
 11 13 31 33
 10 12 30 32
 01 03 21 23
 00 02 20 22
(c) The subsquare specified by the string 320.
Figure 4.27: A diagram for a WFA defining the linear grayness function of Figure 4.28. State Q0 (final weight ½) has a self-loop with labels 0,1,2,3 and weight ½, a transition to Q1 with labels 1,2 and weight ¼, and a transition to Q1 with label 3 and weight ½; state Q1 (final weight 1) has a self-loop with labels 0,1,2,3 and weight 1.
74
4
6
2
4
/8
/8
4/
8
3/
8
2
/8
1
/8
/8
/8
2x2
5/
8
4/
8
3
/8
2
/8
6/
8
5/
8
4
/8
3
/8
7/
8
6/
8
5
/8
4
/8
4x4
128 x 128
Figure 4.28: Image generated by the automata in Figure 4.27 at different resolutions.
The second example is the automaton given in Figure 4.29. Here the states are illustrated by the images they define. The state Q1 is expressed in the following way. The quadrant 0 is the same as Q0, whereas the quadrant 3 is empty, so no transition with the label 3 exists. The quadrants 1 and 2, on the other hand, are recursively described by the same state Q1. The state Q2 is expressed as follows. The quadrant 0 is the same as Q1, and the quadrant 3 is empty. The quadrants 1 and 2 are again recursively described by the same state Q2. Apart from the different color (shade of gray), the left part of the diagram (states Q3, Q4, Q5) is the same as the right part (states Q0, Q1, Q2). For example, Q4 has the shape of Q1 but the color of Q3. The state Q5 is described so that the quadrant 0 is Q4, quadrant 1 is Q2, and quadrant 2 is the same as the state itself.
Figure 4.29: A WFA generating the diminishing triangles (the states Q0 to Q5 are shown as the images they define, with labelled, weighted transitions).
WFA algorithm:
The aim of WFA is to find an automaton that describes the original image as closely as possible and is at the same time as small as possible. The distortion is measured by TSE (total square error). The size of the automaton can be approximated by the number of states and transitions. The minimized optimization criterion is thus:
$$d_k\!\left(f, f_A\right) + G \cdot \mathrm{size}(A) \qquad (4.18)$$
The parameter G defines how much emphasis is put on the distortion and how much on the bit rate. It is left as an adjustable parameter for the user. The higher G is, the smaller bit rates will be achieved at the cost of image quality, and vice versa. Typically G has values from 0.003 to 0.2.
The WFA algorithm compresses the blocks of the quadtree in two different ways:
• by a linear combination of the functions of the existing states
• by adding a new state to the automaton and recursively compressing its four subquadrants.
Whichever alternative yields the better result in minimizing (4.18) is then chosen. A small set of states (the initial basis) is predefined. The functions in the basis do not even need to be defined by a WFA. The choice of the functions can, of course, depend on the type of images one wants to compress. The initial basis in [Culik & Kari, 1993] resembles the codebook in vector quantization, which can be viewed as a very restricted version of the WFA fractal compression algorithm.
The algorithm starts at the top level of the quadtree, which is the complete image. It is then compressed by the WFA algorithm, see Figure 4.30. The linear combination of a certain block is chosen by the following greedy heuristic: the subquadrant k of a block i is matched against each existing state in the automaton. The best match j is chosen, and a transition from i to j is created with the label k. The match is made between normalized blocks so that their size and average value are scaled to be equal. The weight of the transition is the relative difference between the mean values. The process is then repeated for the residual, and it is continued until the reduction in the square error between the original block and the block described by the linear combination becomes too small. It is a trade-off between the increase in the bit rate and the decrease in the distortion.
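The greedy matching idea can be sketched as follows (a heavy simplification: the state images are assumed to be given as non-zero arrays of the same size as the target block, the weights are chosen here by least squares instead of the mean-value rule above, and the threshold min_gain stands in for the bit rate versus distortion trade-off):

    import numpy as np

    def greedy_combination(target, states, max_terms=4, min_gain=1.0):
        # Approximate 'target' as a weighted sum of state images by always
        # picking the state that best reduces the current residual.
        residual = np.asarray(target, float).ravel()
        combination = []                       # list of (state index, weight)
        for _ in range(max_terms):
            best = None
            for j, s in enumerate(states):
                s = np.asarray(s, float).ravel()
                w = residual @ s / (s @ s)     # least-squares weight
                err = np.sum((residual - w * s) ** 2)
                if best is None or err < best[2]:
                    best = (j, w, err)
            gain = np.sum(residual ** 2) - best[2]
            if gain < min_gain:                # not worth the extra bits
                break
            j, w, _ = best
            combination.append((j, w))
            residual = residual - w * np.asarray(states[j], float).ravel()
        return combination, residual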
Figure 4.30: Two ways of describing the image: (a) by a linear combination of the existing states of the initial basis (Q1 to Q6) with weights w1, w2, w3; (b) by creating a new state Q0, which is recursively processed.
WFA algorithm for wavelet transformed image:
A modification of the algorithm that yields good results is to combine the WFA with a wavelet
transformation [DeVore et al. 1992]. Instead of applying the algorithm directly to the original
image, one first makes a wavelet transformation on the image and writes the wavelet
coefficients in the Mallat form. Because the wavelet transform has not been considered earlier,
the details of this modification are omitted here.
Compressing the automaton:
The final bitstream of the compressed automaton consists of three parts:
• the quadtree structure of the image decomposition
• the transitions of the automaton
• the weights of the transitions
A bit in the quadtree structure indicates whether a certain block is described as a linear combination of the other blocks (a set of transitions), or by a new state in the automaton. Thus the states of the automaton are implicitly included in the quadtree. The initial states need not be stored.
The transitions are stored in an n×n matrix, where each non-zero cell M(i, j) = wij indicates that there is a transition from Qi to Qj with the weight wij. If there is no transition between i and j, wij is set to zero. The label (0, 1, 2, or 3) is not stored. Instead there are four different matrices, one for each subquadrant label. The matrices Mk(i, j) (k = 0, 1, 2, 3) are then represented as binary matrices Bk(i, j) so that Bk(i, j) = 1 if and only if wij ≠ 0; otherwise Bk(i, j) = 0. The consequence of this is that only the non-zero weights need to be stored.
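This storage layout can be sketched as follows (the names are ours; W uses the same per-label weight array layout as in the WFA evaluation sketch earlier):

    import numpy as np

    def split_transition_matrices(W):
        # W[d, i, j] is the weight of the transition from Q_i to Q_j with
        # label d. Return the four binary matrices B_d and the list of
        # non-zero weights, scanned label by label, row by row.
        B = (W != 0).astype(int)         # B_d(i, j) = 1 iff w_ij != 0
        weights = [W[d, i, j]
                   for d in range(W.shape[0])
                   for i in range(W.shape[1])
                   for j in range(W.shape[2])
                   if W[d, i, j] != 0]
        return B, weights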
The binary matrices are very sparse because for each state only a few transitions exist; therefore they can be efficiently coded by run-length coding. Some states are used in linear combinations more frequently than others, thus arithmetic coding using the column j as the context was considered in [Kari & Fränti, 1994]. The non-zero weights are then quantized, and a variable length coding (similar to the FELICS coding) is applied.
The results of WFA outperform those of JPEG, especially at very low bit rates. Figures 4.31 and 4.32 illustrate test image Lena compressed by WFA at the bit rates 0.30, 0.20, and 0.10 bpp. The automaton (at 0.20 bpp) consists of 477 states and 4843 transitions. The quadtree requires 1088 bits, the bit matrices 25850 bits, and the weights 25072 bits.
Figure 4.31: Test image Lena compressed by WFA at 0.30, 0.20 and 0.10 bpp (mse 49.32, 70.96 and 130.03, respectively); the original is 8.00 bpp with mse 0.00.
Figure 4.32: Magnifications of Lena compressed by WFA (same bit rates and mse values as in Figure 4.31).
5 Video images
Video images can be regarded as a three-dimensional generalization of still images, where the third dimension is time. Each frame of a video sequence can be compressed by any still image compression algorithm. A method where the frames are separately coded by JPEG is sometimes referred to as Motion JPEG (M-JPEG). A more sophisticated approach is to take advantage of the temporal correlations, i.e. the fact that subsequent frames resemble each other very much. This is the case in the latest video compression standard MPEG (Moving Picture Experts Group).
MPEG:
The MPEG standard consists of both video and audio compression. It also includes many technical specifications such as image resolution, video and audio synchronization, multiplexing of the data packets, network protocols, and so on. Here we consider only the video compression at the algorithmic level. The MPEG algorithm relies on two basic techniques:
• block-based motion compensation
• DCT-based compression
MPEG itself does not specify the encoder at all, but only the structure of the decoder and what kind of bit stream the encoder should produce. Temporal prediction techniques with motion compensation are used to exploit the strong temporal correlation of video signals. The motion is estimated by predicting the current frame on the basis of certain previous and/or future frames. The information sent to the decoder consists of the compressed DCT coefficients of the residual block together with the motion vector. There are three types of pictures in MPEG:
• Intra pictures (I)
• Predicted pictures (P)
• Bidirectionally predicted pictures (B)
Figure 5.1 demonstrates the position of the different types of pictures. Every Nth frame in the
video sequence is an I-picture, and every Mth frame a P-picture. Here N=12 and M=4. The
rest of the frames are B-pictures.
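With these figures (N=12, M=4), the assignment of picture types to frame positions can be sketched as follows (a simplified illustration; real encoders are free to choose other group structures):

    def picture_type(frame_index, N=12, M=4):
        # Every Nth frame is an I-picture, every Mth frame within the group
        # is a P-picture, and the remaining frames are B-pictures.
        if frame_index % N == 0:
            return 'I'
        if frame_index % M == 0:
            return 'P'
        return 'B'

    # Frames 0..12 give the sequence of Figure 5.1: I B B B P B B B P B B B I.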
Compression of the picture types:
Intra pictures are coded as still images by the DCT algorithm, similarly as in JPEG. They provide access points for random access, but only with moderate compression. Predicted pictures are coded with reference to a past picture: the current frame is predicted on the basis of the previous I- or P-picture, and the residual (the difference between the prediction and the original picture) is then compressed by DCT. Bidirectional pictures are coded similarly to the P-pictures, but the prediction can be made from both a past and a future frame, which can be I- or P-pictures. Bidirectional pictures are never used as a reference.
The pictures are divided into 16×16 macroblocks, each consisting of four 8×8 elementary blocks. The B-pictures are not always coded by bidirectional prediction; instead, four different prediction techniques can be used:
• Bidirectional prediction
• Forward prediction
• Backward prediction
• Intra coding.
The prediction method is chosen for each macroblock separately. Bidirectional prediction is used whenever possible. However, in the case of sudden camera movements, or at a breaking point of the video sequence, the best predictor can sometimes be given by the forward predictor (if the current frame is before the breaking point) or by the backward predictor (if the current frame is after the breaking point). The one that gives the best match is chosen. If none of the predictors is good enough, the macroblock is coded by intra coding. Thus, the B-pictures can consist of macroblocks coded like those in the I- and P-pictures.
The intra-coded blocks are quantized differently from the predicted blocks. This is because intra-coded blocks contain information in all frequencies and are very likely to produce a 'blocking effect' if quantized too coarsely. The predicted blocks, on the other hand, contain mostly high frequencies and can be quantized with coarser quantization tables.
Figure 5.1: Interframe coding in MPEG. The frame sequence is I B B B P B B B P B B B I; the P-pictures are obtained by forward prediction from the preceding I- or P-picture, and the B-pictures in between by bidirectional prediction.
Motion estimation:
The prediction block in the reference frame is not necessarily at the same coordinates as the block in the current frame. Because of motion in the image sequence, the most suitable predictor for the current block may exist anywhere in the reference frame. The motion estimation specifies where the best prediction (best match) is found, whereas motion compensation merely consists of calculating the difference between the reference and the current block.
The motion information consists of one vector for forward and backward predicted macroblocks, and of two vectors for bidirectionally predicted macroblocks. The MPEG standard does not specify how the motion vectors are to be computed; however, block matching techniques are widely used. The idea is to find in the reference frame a macroblock similar to the macroblock in the current frame (within a predefined search range). The candidate blocks in the reference frame are compared to the current one, and the one minimizing a cost function measuring the mismatch between the blocks is chosen as the reference block.
Exhaustive search, where all the possible motion vectors are considered, is known to give good results. Because full searches with a large search range have such a high computational cost, alternatives such as telescopic and hierarchical searches have been investigated. In the former, the result of motion estimation at a previous time is used as a starting point for refinement at the current time, thus allowing relatively narrow searches even for large motion vectors. In hierarchical searches, a lower resolution representation of the image sequence is formed by filtering and subsampling. At the reduced resolution the computational complexity is greatly reduced, and the result of the lower resolution search can be used as a starting point for a reduced-range search at full resolution.
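Full-search block matching can be sketched as follows (the sum of absolute differences as the cost function, a 16×16 block and a +/-7 pixel search range are common choices of ours, not requirements of the standard):

    import numpy as np

    def full_search(current, reference, top, left, block=16, search=7):
        # Find the motion vector (dy, dx) that minimizes the SAD between the
        # current macroblock and a candidate block in the reference frame.
        cur = current[top:top + block, left:left + block].astype(int)
        best = (0, 0, np.inf)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if (y < 0 or x < 0 or
                        y + block > reference.shape[0] or
                        x + block > reference.shape[1]):
                    continue
                cand = reference[y:y + block, x:x + block].astype(int)
                sad = int(np.abs(cur - cand).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
        return best[0], best[1]      # motion vector of the best match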
Literature
All
M. Rabbani, P.W. Jones, Digital Image Compression Techniques. Bellingham, USA, SPIE
Optical Engineering Press, 1991.
All
B. Furht (editor), Handbook of Multimedia Computing. CRC Press, Boca Raton, 1999.
All
C.W.Brown and B.J.Shepherd, Graphics File Formats: Reference and Guide. Manning
publications, Greenwich, 1995.
All
M. Nelson, Data Compression: The Complete Reference (2nd edition). Springer-Verlag, New
York, 2000.
All
J.A. Storer, M. Cohn (editors), IEEE Proc. of Data Compression Conference, Snowbird,
Utah, 2002.
1-3
I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes: Compressing and Indexing
Documents and Images. Van Nostrand Reinhold, New York, 1994.
1
R.C. Gonzalez, R.E. Woods, Digital Image Processing. Addison-Wesley, 1992.
1
A.Low, Introductory Computer Vision and Image Processing. McGraw-Hill, 1991.
1
P. Fränti, Digital Image Processing, University of Joensuu, Dept. of Computer Science.
Lecture notes, 1998.
1.3
M. Miyahara, K. Kotani and V.R. Algazi, Objective Picture Quality Scale (PQS) for Image
Coding, IEEE Transactions on Communications, Vol. 46 (9), 1215-1226, September 1998.
1.3
P. Fränti, "Blockwise distortion measure for statistical and structural errors in digital images",
Signal Processing: Image Communication, 13 (2), 89-98, August 1998.
2
J. Teuhola, Source Encoding and Compression, Lecture Notes, University of Turku, 1998.
2.1.3 T. Bell, J. Cleary, I. Witten, Text Compression. Prentice-Hall, Englewood Cliffs, New Jersey,
1990.
2.2
P.G. Howard, The Design and Analysis of Efficient Lossless Data Compression Systems.
Brown University, Ph.D. Thesis (CS-93-28), June 1993.
2.2.1 D. Huffman, A Method for the Construction of Minimum-Redundancy Codes. Proc. of the IRE, Vol. 40, 1098-1101, 1952.
2.2.2 J. Rissanen, G.G. Langdon, Arithmetic Coding. IBM Journal of Research and Development,
Vol. 23 (2), 149-162, March 1979.
2.2.2 W.B. Pennebaker, J.L. Mitchell, G.G. Langdon, R.B. Arps, An Overview of the Basic
Principles of the Q-coder adaptive Binary Arithmetic Coder. IBM Journal of Research and
Development, Vol. 32 (6), 717-726, November 1988.
3
R. Hunter, A.H. Robinson, International digital facsimile coding standards. Proceedings of the
IEEE, Vol. 68 (7), 854-867, July 1980.
3
Y. Yasuda, Overview of Digital Facsimile Coding Techniques in Japan. Proceedings of the
IEEE, Vol. 68 (7), 830-845, July 1980.
3 & 4.1 R.B. Arps and T.K. Truong , “Comparison of international standards for lossless still image
compression”. Proc. of the IEEE, 82, 889-899, June 1994.
3 & 4 P. Fränti, Block Coding in Image Compression, Ph.D. Thesis, University of Turku, Dept. of
Computer Science, 1994. (Research report R-94-12)
3.1
A.N. Netravali, F.W. Mounts, Ordering Techniques for Facsimile Coding: A Review.
Proceedings of the IEEE, Vol. 68 (7), 796-807, July 1980.
3.1
Y. Wao, J.-M. Wu, Vector Run-Length Coding of Bi-level Images. Proceedings Data
Compression Conference, Snowbird, Utah, 289-298, 1992.
3.3
CCITT, Standardization of Group 3 Facsimile Apparatus for Document Transmission, ITU
Recommendation T.4, 1980.
3.3
CCITT, Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus, ITU Recommendation T.6, 1984.
3.3
Y. Yasuda, Y. Yamazaki, T. Kamae, B. Kobayashi, Advances in FAX. Proceedings of the IEEE, Vol. 73 (4), 706-730, April 1985.
3.4
M. Kunt, O. Johnsen, Block Coding: A Tutorial Review. Proceedings of the IEEE, Vol. 68 (7), 770-786, July 1980.
3.4
P. Fränti and O. Nevalainen, "Compression of binary images by composite methods based on the block coding", Journal of Visual Communication and Image Representation, 6 (4), 366-377, December 1995.
3.5
ISO/IEC Committee Draft 11544, Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression, April 1992.
3.5
G.G. Langdon, J. Rissanen, Compression of Black-White Images with Arithmetic Coding. IEEE Transactions on Communications, Vol. 29 (6), 358-367, June 1981.
3.5
B. Martins and S. Forchhammer, Bi-level image compression with tree coding. IEEE Transactions on Image Processing, Vol. 7 (4), 517-528, April 1998.
3.5
E.I. Ageenko and P. Fränti, "Enhanced JBIG-based compression for satisfying objectives of engineering document management system", Optical Engineering, 37 (5), 1530-1538, May 1998.
3.5
E.I. Ageenko and P. Fränti, "Forward-adaptive method for compressing large binary images", Software Practice & Experience, 29 (11), 1999.
3.6
P.G. Howard, "Text image compression using soft pattern matching", The Computer Journal, 40 (2/3), 146-156, 1997.
3.6
P.G. Howard, F. Kossentini, B. Martins, S. Forchhammer and W.J. Rucklidge, The emerging JBIG2 standard. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8 (7), 838-848, November 1998.
4.1
R.B. Arps, T.K. Truong, Comparison of International Standards for Lossless Still Image Compression. Proceedings of the IEEE, Vol. 82 (6), 889-899, June 1994.
4.1.1
M. Rabbani and P.W. Melnychuck, Conditioning Context for the Arithmetic Coding of Bit Planes. IEEE Transactions on Signal Processing, Vol. 40 (1), 232-236, January 1992.
4.1.2
P.E. Tischer, R.T. Worley, A.J. Maeder and M. Goodwin, Context-based Lossless Image Compression. The Computer Journal, Vol. 36 (1), 68-77, January 1993.
4.1.2
N. Memon and X. Wu, Recent developments in context-based predictive techniques for lossless image compression. The Computer Journal, Vol. 40 (2/3), 127-136, 1997.
4.1.3
P.G. Howard, J.S. Vitter, Fast and Efficient Lossless Image Compression. Proceedings Data Compression Conference, Snowbird, Utah, 351-360, 1993.
4.1.4
M. Weinberger, G. Seroussi and G. Sapiro, "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS", Research report HPL-98-193, Hewlett Packard Laboratories. (submitted to IEEE Transactions on Image Processing)
4.1.4
X. Wu and N.D. Memon, "Context-based, adaptive, lossless image coding", IEEE Transactions on Communications, Vol. 45 (4), 437-444, April 1997.
4.1.5
S. Takamura, M. Takagi, Lossless Image Compression with Lossy Image Using Adaptive Prediction and Arithmetic Coding. Proceedings Data Compression Conference, Snowbird, Utah, 166-174, 1994.
4.1.6
J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, Vol. 23 (3), 337-343, May 1977.
4.1.6
J. Ziv, A. Lempel, Compression of Individual Sequences Via Variable-Rate Coding. IEEE Transactions on Information Theory, Vol. 24 (5), 530-536, September 1978.
4.2
E.J. Delp, O.R. Mitchell, Image Coding Using Block Truncation Coding. IEEE Transactions on Communications, Vol. 27 (9), 1335-1342, September 1979.
4.2
P. Fränti, O. Nevalainen and T. Kaukoranta, "Compression of Digital Images by Block Truncation Coding: A Survey", The Computer Journal, Vol. 37 (4), 308-332, 1994.
4.3
A. Gersho, R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, Dordrecht, 1992.
4.3
N.M. Nasrabadi, R.A. King, Image Coding Using Vector Quantization: A Review. IEEE Transactions on Communications, Vol. 36 (8), 957-971, August 1988.
4.3
Y. Linde, A. Buzo, R.M. Gray, An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications, Vol. 28 (1), 84-95, January 1980.
4.4
W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1993.
4.5
M. Vetterli and C. Herley, Wavelets and Filter Banks: Theory and Design. IEEE Transactions on Signal Processing, Vol. 40, 2207-2232, 1992.
4.5
R.A. DeVore, B. Jawerth, B.J. Lucier, Image Compression Through Wavelet Transform Coding. IEEE Transactions on Information Theory, Vol. 38 (2), 719-746, March 1992.
4.5
T. Ebrahimi, C. Christopoulos and D.T. Lee, Special Issue on JPEG-2000. Signal Processing: Image Communication, Vol. 17 (1), 1-144, January 2002.
4.5
D.S. Taubman, M.W. Marcellin, JPEG-2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Dordrecht, 2002.
4.6
Y. Fisher, Fractal Image Compression: Theory and Application. Springer-Verlag, New York, 1995.
4.6
K. Culik, J. Kari, Image Compression Using Weighted Finite Automata. Computers & Graphics, Vol. 17 (3), 305-313, 1993.
4.6
J. Kari, P. Fränti, Arithmetic Coding of Weighted Finite Automata. Theoretical Informatics and Applications, Vol. 28 (3-4), 343-360, 1994.
5
D.J. LeGall, The MPEG Video Compression Algorithm. Signal Processing: Image Communication, Vol. 4 (2), 129-139, April 1992.
Appendix A: CCITT test images
IMAGE 1
IMAGE 2
IMAGE 3
IMAGE 4
IMAGE 5
IMAGE 6
IMAGE 7
IMAGE 8
Appendix B: Gray-scale test images
BRIDGE (256×256)
CAMERA (256×256)
BABOON (512×512)
LENA (512×512)