Adaptive Linear Prediction Lossless Image Coding
Giovanni Motta1, James A. Storer1 and Bruno Carpentieri2
Abstract: The practical lossless digital image compressors that achieve the best compression ratios are also simple and fast algorithms, with low complexity both in memory usage and running time. Surprisingly, the compression ratio achieved by these systems cannot be substantially improved even by using image-by-image optimization techniques or more sophisticated and complex algorithms [6]. A year ago, B. Meyer and P. Tischer were able, with their TMW [2], to improve on some of the current best results (they do not report results for all test images) by using global optimization techniques and multiple blended linear predictors. Our investigation aims to determine the effectiveness of an algorithm that uses multiple adaptive linear predictors, locally optimized on a pixel-by-pixel basis. The results we obtained on a test set of nine standard images are encouraging: we improve over CALIC on some images.
Introduction
After the Call for Contributions for ISO/IEC JTC 1.29.12 (lossless JPEG), the field of greylevel lossless image compression received great attention from many researchers in the data compression community. Most of the contributions are very effective in compressing images while keeping the computational complexity and the memory requirements low. On the other hand, most of them rely on heuristics and, even if the compression ratio they achieve cannot easily be improved in practice, it is not completely clear whether they capture the real entropy of the image.
In [2] and [5] B. Meyer and P. Tischer proposed TMW, a lossless image coding algorithm that, by using linear predictors, achieves on some test images compression performance higher than CALIC [3], the best algorithm (in terms of compression ratio) known so far. TMW improves on the current best results by using global optimization and blended linear predictors; a TMW compressed file consists of two parts: a header that contains the parameters of the model, and the encoded data itself. Even though TMW has a computational complexity several orders of magnitude greater than CALIC, the results are in any case surprising because:
- Linear predictors are known to be ineffective in capturing fast transitions in image luminosity (edges) [6];
- Global optimization seemed unable to substantially improve the performance of lossless image compressors [6];
- CALIC was thought to achieve a data rate extremely close to the real entropy of the image.
In this paper, we discuss a series of experiments we made with an algorithm that uses
multiple adaptive linear predictors that are locally optimized on a pixel-by-pixel basis.
We address the problem of greylevel lossless image compression exclusively from the
point of view of the achievable compression ratio, without being concerned about
computational complexity or memory requirements.
1 Brandeis University, Computer Science Dept., Waltham MA-02454, {gim, storer}@cs.brandeis.edu.
2 Università di Salerno, Dip. di Informatica ed Applicazioni "R.M. Capocelli", I-84081 Baronissi (SA), Italy, bc@dia.unisa.it.
The preliminary results we obtained on a test set of nine standard images are
encouraging. We improve over CALIC on some test images and we believe that, with a
better encoding of the prediction error, our algorithm can be competitive with CALIC and
TMW.
Description of the Algorithm
Our algorithm is based on adaptive linear prediction and consists of two main steps, pixel
prediction and entropy coding; a pseudocode description is given in Figure 3. The input
image is scanned from top to bottom, left to right, and the luminosity of each pixel PIX(x,
y) is predicted according to a weighted sum of its neighbors (or context) and rounded to
the nearest integer value:
$$\widehat{PIX}(x, y) = \mathrm{int}\big(w_0 \, PIX(x, y-2) + w_1 \, PIX(x-1, y-1) + w_2 \, PIX(x, y-1) + w_3 \, PIX(x+1, y-1) + w_4 \, PIX(x-2, y) + w_5 \, PIX(x-1, y)\big)$$
Figure 1 shows the pixels that form the context of PIX(x, y). The context has a fixed
shape and only the weights are allowed to change.
Figure 1: Context of the pixel PIX(x, y).
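To make the prediction step concrete, the following Python fragment sketches the weighted sum over the six-pixel causal context of Figure 1; the array layout, function names, and rounding convention are our own illustration, not code from the paper.

import numpy as np

# Offsets (dx, dy) of the six causal neighbors of PIX(x, y),
# listed in the same order as the weights w0..w5 above.
CONTEXT = [(0, -2), (-1, -1), (0, -1), (1, -1), (-2, 0), (-1, 0)]

def predict(img, x, y, w):
    """Weighted sum of the causal context, rounded to the nearest integer."""
    ctx = [float(img[y + dy, x + dx]) for dx, dy in CONTEXT]
    return int(round(float(np.dot(w, ctx))))

def residual(img, x, y, w):
    """Prediction error ERR(x, y) = PIX(x, y) - predicted value."""
    return int(img[y, x]) - predict(img, x, y, w)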
After the prediction, an error ERR(x, y) (prediction error or residual) is calculated by subtracting the prediction from the current pixel:
$$ERR(x, y) = PIX(x, y) - \widehat{PIX}(x, y)$$
and finally the prediction error is entropy encoded and sent to the decoder.
If we encode the image in raster-scan order, with a top to bottom, left to right scan, the context is composed of previously encoded pixels, and the prediction error is sufficient for the decoder to make a faithful reconstruction of the original pixel value.
During the encoding process, the weights $w_0, \dots, w_5$ are adaptively changed and optimized on a per-pixel basis. Our intent is to determine the predictors' weights so that they model the local characteristics of the image being encoded. After several experiments, we decided to determine the predictor by minimizing the energy of the prediction error inside a small window $W_{x,y}(R_p)$ of radius $R_p$ centered on PIX(x, y):
$$\min_{w_0, \dots, w_5} E(x, y) = \min_{w_0, \dots, w_5} \sum_{PIX(x', y') \in W_{x,y}(R_p)} \big(ERR(x', y')\big)^2$$
Using a window of previously encoded pixels, we can use a backward prediction scheme, and the encoder has no need to send any side information to the decoder. On the other hand, backward prediction has a well-known major drawback: poor performance in the presence of edges. The radius $R_p$ of the window $W_{x,y}(R_p)$ (see Figure 2) is one of the essential features of our algorithm. Its size affects the prediction quality: if $R_p$ is too small, only a few samples are in the window and the predictor "overspecializes", making large errors in the presence of edges. On the other hand, too many samples in the window ($R_p$ too big) tend to generate predictors that are not specific enough to remove local variations in the image. In our experiments, we decided to keep $R_p$ constant and equal for all the images.
Figure 2: Window $W_{x,y}(R_p)$ of radius $R_p$ centered on PIX(x, y).
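As a small illustration of the quantity being minimized, the hypothetical helper below accumulates the squared prediction error over the causal portion of $W_{x,y}(R_p)$ (reusing the residual helper from the earlier sketch); the exact window bounds are one plausible reading of "previously encoded pixels", since the paper does not spell them out.

def window_energy(img, x, y, w, rp):
    """Sum of squared prediction errors over the already-encoded pixels
    that fall within a window of radius rp centered on (x, y)."""
    height, width = img.shape
    energy = 0.0
    for yy in range(max(2, y - rp), y + 1):
        # On the current row, only pixels strictly left of x are causal.
        x_stop = x if yy == y else min(width - 2, x + rp) + 1
        for xx in range(max(2, x - rp), x_stop):
            energy += residual(img, xx, yy, w) ** 2
    return energy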
To improve the prediction, the optimization is performed only on a subset of samples
collected in the window. The rationale is that we want the predictors' weights to be
representative of the relation existing between the context and the pixel being encoded.
By discarding samples that have a context "too different" from the one of the current
pixel, we can specialize the prediction and follow fine periodic patterns in the window.
Most algorithms in the literature use a simple pixel predictor and compensate for the weak prediction with sophisticated heuristics that model the error as a function of the context in which it occurs (see for example LOCO-I [1]). Our algorithm, instead, embeds the contextual encoding inside the error prediction step. The classification of the samples into clusters of pixels that have a similar context is performed by using the Generalized Lloyd Algorithm [9] (or LBG). This classification method, although not optimal in our
framework, is good enough to improve the performance of the basic adaptive predictor.
However, we are confident that a better classification would improve the performance of
our algorithm.
Once all the samples in the window are classified and a representative centroid is determined for each cluster, a cluster of pixels is selected according to the minimum distance between the context of the corresponding centroid and the context of the current pixel. Then, from a set of predictors, the one that achieves the lowest prediction error on the selected centroid is chosen. This predictor is finally refined by applying Gradient Descent optimization on the samples collected in the selected cluster.
for every pixel PIX(x,y) in the input image do begin
  Collect all the pixels and their contexts in Wx,y(Rp)
  Determine n centroids C1,...,Cn by applying the LBG to the contexts in Wx,y(Rp)
  Let K1,...,Kn be the corresponding clusters
  Classify each pixel/context in Wx,y(Rp) into one of the clusters K1,...,Kn
  Classify the context of the current pixel PIX(x,y); let k be the index of its cluster
  Let Pi = {w0,...,w5} be the predictor that achieves the smallest error on Ck among
    the set of predictors P1,...,Pn
  Apply Gradient Descent on the pixels in Kk to refine the predictor Pi
  Use the refined predictor P'i to predict PIX(x,y)
  Generate the prediction error ERR(x,y)
end
Figure 3: Pseudocode description of the adaptive prediction.
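A rough Python rendering of the classification step of Figure 3 is given below; the Lloyd/LBG clustering is condensed to a few plain k-means iterations, and all names, as well as the (context, pixel) sample layout, are illustrative assumptions rather than the authors' implementation.

import numpy as np

def lloyd_clusters(samples, n, iters=10, seed=0):
    """Plain Lloyd (LBG/k-means) iterations on (context, pixel) sample
    vectors collected in the window; returns centroids and labels."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = samples[rng.choice(len(samples), size=n, replace=False)].copy()
    for _ in range(iters):
        # Assign each sample to the nearest centroid (Euclidean distance).
        dist = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n):
            if np.any(labels == k):
                centroids[k] = samples[labels == k].mean(axis=0)
    return centroids, labels

def choose_predictor(predictors, centroid):
    """Pick the predictor with the smallest squared error on the selected
    centroid; we assume here that the last component of the centroid plays
    the role of the pixel value and the rest is its context."""
    ctx, pix = centroid[:-1], centroid[-1]
    errors = [(pix - float(np.dot(w, ctx))) ** 2 for w in predictors]
    return int(np.argmin(errors))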
At each step t of the optimization, the weights $w_i$ of the predictor are updated according to
$$w_i(t+1) = w_i(t) - \eta \, \frac{\partial E}{\partial w_i}$$
where $E$ is the error energy and $\eta$ is a small constant; the optimization stops when the difference between the previous and the current error energy falls below a fixed threshold. When only a few samples are in the
window, for example when PIX(x, y) is close to the top or to the left border, a default
fixed predictor is used in the prediction and the Gradient Descent optimization is not
applied. In our implementation, we used as a default the classic "planar predictor" [6]:
$$P_{def} = \{w_0 = 0,\; w_1 = -1,\; w_2 = 1,\; w_3 = 0,\; w_4 = 0,\; w_5 = 1\}$$
The algorithm uses at each step the refined predictor from the previous iterations, so initialization is not an issue, and the predictors P1,...,Pn can be initialized with random values without compromising the performance. We also experimented with initializing the predictors with the values used in JPEG-LS; this only resulted in a slightly faster convergence of the Gradient Descent optimization. Reinitializing the predictors at each step, instead of reusing the previously refined weights, results in a much slower convergence but doesn't seem to change the compression ratio.
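The refinement step can be sketched in a few lines; the learning rate, stopping threshold, and iteration cap below are arbitrary placeholders, not values from the paper.

import numpy as np

def refine(w, samples, eta=1e-6, tol=1e-3, max_iters=100):
    """Gradient descent on the squared-error energy E over the selected
    cluster; samples is a sequence of (context_vector, pixel) pairs."""
    ctx = np.array([c for c, _ in samples], dtype=float)
    pix = np.array([p for _, p in samples], dtype=float)
    w = np.array(w, dtype=float)
    prev_energy = np.inf
    for _ in range(max_iters):
        err = pix - ctx @ w                  # per-sample prediction errors
        energy = float(err @ err)            # E = sum of squared errors
        if prev_energy - energy < tol:       # stop when improvement stalls
            break
        prev_energy = energy
        w += eta * 2.0 * (ctx.T @ err)       # w_i <- w_i - eta * dE/dw_i
    return w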
Entropy Coding
As is common in the literature [10], we assume that for most images the prediction errors can be closely approximated by a Laplace distribution. In our case, adaptive linear prediction generates a skewed Laplacian distribution, centered on zero and with very long tails. We decided to use an Arithmetic Encoder [8] for the error entropy coding. Arithmetic encoding divides the coding step into the determination of a probabilistic model for the source and the entropy coding that uses that model. This results in a very general framework in which the modeling part can be easily customized to perform experiments. The minimum and maximum errors have absolute values that are much greater than the operative limits of the distribution; during the experiments, we observed that 95% of the errors are concentrated in an interval $[-\lambda, \dots, +\lambda]$ that is substantially narrower than $[Min, \dots, Max]$. While typical values for $\lambda$ are in the range $[8, \dots, 20]$, Min and Max assume, in general, values in the range $[-120, \dots, 120]$.
The Arithmetic Coder implementation that we used [11] has the limitation that the initial probabilities must always be greater than zero. As a consequence, when only a small number of samples is available to model the distribution, encoder efficiency can be compromised by the use of a coding range that is substantially greater than necessary. For this reason, we decided to experiment with two different models: one for the "typical errors", in the range $[-\lambda, \dots, +\lambda]$, and another for the errors outside of that range. Like Min and Max, the parameter $\lambda$ is determined by an off-line observation of all the errors and must be sent to the decoder. While sending those parameters has a cost that is irrelevant from the point of view of the compressed file size, it makes our algorithm an off-line procedure. Adaptive arithmetic encoders that rescale their range could possibly be used to overcome this problem. Errors are encoded by separating magnitude and sign. This is reasonable because the sign of the prediction error has little or no correlation with its magnitude, and two different probabilistic models can be used for the encoding.
Our implementation uses an arithmetic encoder with four different models:
ACM_P Parameters model, used only to transmit the header of the compressed file with all the global parameters (Min, Max, $\lambda$, $R_e$);
ACM_M Used to encode the magnitude of the typical errors. It has symbols in the range $[0, \dots, \lambda]$ plus an extra symbol ($\lambda+1$) that is used to send an escape signal to the decoder;
ACM_E An adaptive model that is used to encode the magnitude of the non-typical errors. It has symbols in the range $[\lambda+1, \dots, \max(|Max|, |Min|)]$;
ACM_S Used to encode the error sign. It has two symbols [0, 1] that represent positive and negative errors.
A sketch of how a single error is dispatched to these models is given below.
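The following hypothetical dispatch routine maps one prediction error to the (model, symbol) pairs to be arithmetic-coded; treating a zero error as carrying no sign is our assumption, not something the paper states.

def error_symbols(err, lam):
    """Map a prediction error to the (model, symbol) pairs to encode,
    where lam is the typical-range parameter (lambda in the text)."""
    symbols = []
    mag = abs(err)
    if mag <= lam:
        symbols.append(("ACM_M", mag))        # typical magnitude
    else:
        symbols.append(("ACM_M", lam + 1))    # escape symbol
        symbols.append(("ACM_E", mag))        # non-typical magnitude
    if mag != 0:                              # assumed: zero carries no sign
        symbols.append(("ACM_S", 0 if err > 0 else 1))
    return symbols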
Unlike the other three models, ACM_M is not automatically updated: the probability distribution for the magnitude of the prediction error ERR(x, y) is determined each time by observing the previously encoded error magnitudes in a window $W_{x,y}(R_e)$ of radius $R_e$; a sketch of this windowed estimate is given below. A gain in compression ratio could be achieved by properly modeling the error sign; our current implementation, however, uses a simpler and less effective adaptive model.
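The windowed estimate that drives ACM_M might look like the following sketch; the unit initial counts stand in for the coder's requirement of strictly positive probabilities, and the exact causal bounds are again our assumption.

def magnitude_counts(errors, x, y, re, lam):
    """Build the ACM_M symbol counts from previously encoded error
    magnitudes inside a causal window of radius re; errors is a list of
    rows where entries not yet coded are None."""
    counts = [1] * (lam + 2)                  # never zero: the coder in [11]
                                              # needs nonzero probabilities
    width = len(errors[0])
    for yy in range(max(0, y - re), y + 1):
        x_stop = x if yy == y else min(width, x + re + 1)
        for xx in range(max(0, x - re), x_stop):
            e = errors[yy][xx]
            if e is not None:
                counts[min(abs(e), lam + 1)] += 1   # clamp outliers to escape
    return counts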
Results and Discussion
Experiments were performed in order to assess the algorithm on a test set composed of nine greylevel images of 720 × 576 pixels, digitized with a resolution of 8 bits (256 grey levels) per pixel.
Figure 4: Comparisons with the entropy of the prediction error in LOCO-I.
The same test set is widely used for comparisons in most of the lossless data compression literature and can be downloaded from an ftp site [12]. Main results are expressed in terms of bits per pixel or by giving the size of the compressed file. This (inelegant) choice was necessary to evaluate the small variations that are usually rounded off when results are expressed in bits per pixel.
Figure 4 compares the entropy of the prediction error of the simple fixed predictor used in LOCO-I with the entropy of the prediction error achieved by our algorithm. The results were obtained by using 2 predictors and by optimizing the predictors in a window of radius $R_p = 10$. For comparison, we also indicate the overall performance of LOCO-I after its sophisticated context-based entropy coding.
# of predictors        1         2         4         6         8
Baloon             154275    150407    150625    150221    150298
Barb               227631    223936    224767    225219    225912
Barb2              250222    250674    254582    256896    258557
Board              193059    190022    190504    190244    190597
Boats              210229    208018    209408    209536    210549
Girl               204001    202004    202326    202390    202605
Gold               235682    237375    238728    239413    240352
Hotel              236037    236916    239224    240000    240733
Zelda              195052    193828    194535    195172    195503
Total (in bytes)  1906188   1893180   1904699   1909091   1915106
Table 1: Compressed file size vs. number of predictors. Results shown for a window of radius $R_p = 6$; the error is coded by using a single adaptive arithmetic encoder.
Rp                      6         8        10        12        14
Baloon             150407    149923    149858    150019    150277
Barb               223936    223507    224552    225373    226136
Barb2              250674    249361    246147    247031    246265
Board              190022    190319    190911    191709    192509
Boats              208018    206630    206147    206214    206481
Girl               202004    201189    201085    201410    201728
Gold               237375    235329    234229    234048    234034
Hotel              236916    235562    235856    236182    236559
Zelda              193828    193041    192840    192911    193111
Total (in bytes)  1893180   1884861   1881625   1884897   1887100
Table 2: Compressed file size vs. window radius $R_p$. The number of predictors used is 2; the prediction error is entropy encoded by using a single adaptive arithmetic encoder.
It is evident that our adaptive linear predictors are (understandably) much more powerful than the fixed predictor used in LOCO-I; however, even adaptive prediction does not have enough power to capture edges and sharp transitions, present for example in the picture "hotel".
Tables 1, 2 and 3 summarize the experiments we made in order to understand the sensitivity of the algorithm to its parameters. In these experiments, we measured the variations in the compressed file size when only one of the parameters changes.
In Table 1, the number of predictors is varied while keeping the window radius $R_p = 6$; conversely, in Table 2, the number of predictors is kept fixed at 2 and the performance is evaluated as the window size changes.
Both experiments described in Tables 1 and 2 were performed by using a very simple entropy coding scheme for the prediction error: a single adaptive arithmetic coder. As we also verified experimentally, the performance of a single adaptive arithmetic encoder is a good approximation of the first-order entropy of the encoded data.
Re                      6         8        10        12        14        16        18        20
baloon             147518    147227    147235    147341    147479    147620    147780    147885
barb               218411    216678    216082    215906    215961    216135    216370    216600
barb2              237523    234714    233303    232696    232455    232399    232473    232637
board              187058    186351    186171    186187    186303    186467    186646    186800
boats              203837    202168    201585    201446    201504    201623    201775    201943
girl               198050    197243    197013    197040    197143    197245    197356    197465
gold               232617    230619    229706    229284    229111    229026    229012    229053
hotel              231125    229259    228623    228441    228491    228627    228785    228949
zelda              190311    189246    188798    188576    188489    188461    188469    188500
Total (in bytes)  1846450   1833505   1828516   1826917   1826936   1827603   1828666   1829832
Table 3: Compressed file size vs. error window radius $R_e$. The number of predictors is 2 and $R_p = 10$. The prediction error is encoded as described in the "Entropy Coding" section.
                            baloon   barb   barb2  board  boats  girl   gold   hotel  zelda  Average
SUNSET CB9 [1]                2.89   4.64   4.71   3.72   3.99   3.90   4.60   4.48   3.79    4.08
LOCO-I [1]                    2.90   4.65   4.66   3.64   3.92   3.90   4.47   4.35   3.87    4.04
UCM [6]                       2.81   4.44   4.57   3.57   3.85   3.81   4.45   4.28   3.80    3.95
2 Pred., Rp=10, EC with Re    2.84   4.16   4.48   3.59   3.89   3.80   4.42   4.41   3.64    3.91
CALIC [6]                     2.78   4.31   4.46   3.51   3.78   3.72   4.35   4.18   3.69    3.86
TMW [2],[13]                  2.65   4.08   4.38    -     3.61    -     4.28    -      -      3.80
Table 4: Compression rate in bits per pixel achieved on the test set by some popular lossless image encoding algorithms. The number of predictors used in our results is 2, $R_p = 10$, and entropy encoding is performed as described in the "Entropy Coding" section.
Table 3 reports the conclusive experiments: the number of predictors is kept fixed at 2, $R_p = 10$, and performance is evaluated by encoding the prediction error as described in the "Entropy Coding" section. Results are reported for changes in the value of $R_e$.
Comparisons with some popular lossless image codecs (see Table 4 and Figure 5) show that the proposed algorithm achieves good performance on most images of the test set. The cases where we fall short of CALIC confirm that linear prediction, even in this form, is not adequate to model image edges. Also, unlike CALIC, our codec doesn't use any special mode to encode high-contrast image zones, so our results are penalized by images like "hotel" that have highly contrasted regions. A closer look at the prediction error magnitude and sign for "board" and "hotel", two images in the test set, shows that most of the edges in the original image are still present in the prediction error (Figures 6 and 7).
Conclusion
The preliminary results we obtained experimenting on a test set of nine standard images
are encouraging. With a better classification and selection of the contexts in the
prediction window and with a more sophisticated encoding of the prediction error, it may
be possible to achieve stable and better results on all the test images.
It is also likely that the computational complexity can be substantially reduced, without sacrificing performance, by using alternative methods for the optimization of the predictors. A further complexity reduction may be possible by substituting the arithmetic coder with more efficient entropy coders.
Acknowledgment
We wish to thank Martin Cohn for fruitful discussions.
Figure 5: Graphical representation of the data in Table 4.
Figures 6 and 7: Magnitude (left column) and sign (right column) of the prediction error
in two images of the Test Set. Images are "board" (top row) and "hotel" (bottom row).
Bibliography
[1] M.J. Weinberger, G. Seroussi and G. Sapiro, "LOCO-I: A Low Complexity,
Context-Based, Lossless Image Compression Algorithm", Proceedings IEEE
Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.
[2] B. Meyer and P. Tischer, "Extending TMW for Near Lossless Compression of
Greyscale Images", Proceedings IEEE Data Compression Conference (DCC),
Snowbird, Utah, Mar-Apr 1998.
[3] X. Wu and N. Memon, "Context-based, Adaptive, Lossless Image Codec", IEEE
Trans. on Communications, Vol.45, No.4, Apr 1997.
[4] X. Wu, W. Choi and N. Memon, "Lossless Interframe Image Compression via
Context Modeling", Proceedings IEEE Data Compression Conference (DCC),
Snowbird, Utah, Mar-Apr 1998.
[5] B. Meyer and P. Tischer, "TMW - a New Method for Lossless Image
Compression", International Picture Coding Symposium PCS97 conference
proceedings, Sep 1997.
[6] X. Wu, "An Algorithmic Study on Lossless Image Compression", Proceedings
IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.
[7] D. Speck, "Fast Robust Adaptation of Predictor Weights from Min/Max Neighboring Pixels for Minimal Conditional Entropy", Proceedings of the Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, pp. 234-242, Oct 30 - Nov 2, Pacific Grove, CA.
[8] I.H. Witten, R. Neal and J.G. Cleary, "Arithmetic Coding for Data Compression",
Communications of the ACM, Vol.30, No.6, Jun 1987, pp.520-540.
[9] Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. Communications, Vol. COM-28, pp. 84-95, Jan 1980.
[10] P.G. Howard, "The Design and Analysis of Efficient Lossless Data Compression
Systems", Ph.D. Dissertation, Department of Computer Science, Brown
University, June 1993.
[11] F. Wheeler, Adaptive Arithmetic Coding, Source Code from:
"http://ipl.rpi.edu/wheeler/ac/".
[12] X. Wu, Test Images, from: "ftp://ftp.csd.uwo.ca/pub/from_wu/images/".
[13] B. Meyer, TMW Code and new results, from:
"http://www.cs.monash.edu.au/~bmeyer/tmw".