Adaptive Linear Prediction Lossless Image Coding

Giovanni Motta (1), James A. Storer (1) and Bruno Carpentieri (2)

(1) Brandeis University, Computer Science Dept., Waltham MA-02454, {gim, storer}@cs.brandeis.edu
(2) Universita' di Salerno, Dip. di Informatica ed Applicazioni "R.M. Capocelli", I-84081 Baronissi (SA), Italy, bc@dia.unisa.it

Abstract: The practical lossless digital image compressors that achieve the best compression ratios are also simple and fast algorithms, with low complexity both in terms of memory usage and running time. Surprisingly, the compression ratio achieved by these systems cannot be substantially improved even by using image-by-image optimization techniques or more sophisticated and complex algorithms [6]. A year ago, B. Meyer and P. Tischer were able, with their TMW [2], to improve on some of the current best results (they do not report results for all test images) by using global optimization techniques and multiple blended linear predictors. Our investigation aims to determine the effectiveness of an algorithm that uses multiple adaptive linear predictors, locally optimized on a pixel-by-pixel basis. The results we obtained on a test set of nine standard images are encouraging: we improve over CALIC on some images.

Introduction

After the Call for Contributions for ISO/IEC JTC 1.29.12 (lossless JPEG), the field of greylevel lossless image compression received great attention from many researchers in the data compression community. Most of the contributions are very effective in compressing images while keeping the computational complexity and the memory requirements low. On the other hand, most of them rely on heuristics and, even though the compression ratios they achieve cannot in practice be easily improved, it is not completely clear whether they capture the real entropy of the image.

In [2] and [5], B. Meyer and P. Tischer proposed TMW, a lossless image coding algorithm that, by using linear predictors, achieves on some test images compression performance higher than CALIC [3], the best algorithm (in terms of compression ratio) known so far. TMW improves on the current best results by using global optimization and blended linear predictors; a TMW compressed file consists of two parts: a header that contains the parameters of the model, and the encoded data itself. Even though TMW has a computational complexity several orders of magnitude greater than CALIC's, the results are in any case surprising because:

- Linear predictors are known to be ineffective in capturing fast transitions in image luminosity (edges) [6];
- Global optimization seemed unable to improve substantially the performance of lossless image compressors [6];
- CALIC was thought to achieve a data rate extremely close to the real entropy of the image.

In this paper we discuss a series of experiments we made with an algorithm that uses multiple adaptive linear predictors, locally optimized on a pixel-by-pixel basis. We address the problem of greylevel lossless image compression exclusively from the point of view of the achievable compression ratio, without being concerned about computational complexity or memory requirements.

The preliminary results we obtained on a test set of nine standard images are encouraging. We improve over CALIC on some test images and we believe that, with a better encoding of the prediction error, our algorithm can be competitive with CALIC and TMW.
Description of the Algorithm

Our algorithm is based on adaptive linear prediction and consists of two main steps, pixel prediction and entropy coding; a pseudocode description is given in Figure 3. The input image is scanned from top to bottom, left to right, and the luminosity of each pixel PIX(x, y) is predicted by a weighted sum of its neighbors (its context), rounded to the nearest integer value:

$$\widehat{PIX}(x,y) = \mathrm{int}\big(w_0 \, PIX(x, y-2) + w_1 \, PIX(x-1, y-1) + w_2 \, PIX(x, y-1) + w_3 \, PIX(x+1, y-1) + w_4 \, PIX(x-2, y) + w_5 \, PIX(x-1, y)\big)$$

Figure 1 shows the pixels that form the context of PIX(x, y). The context has a fixed shape and only the weights are allowed to change.

Figure 1: Context of the pixel PIX(x, y).

After the prediction, an error ERR(x, y) (prediction error or residual) is calculated by subtracting the prediction from the current pixel:

$$ERR(x,y) = PIX(x,y) - \widehat{PIX}(x,y)$$

and finally the prediction error is entropy coded and sent to the decoder. If we encode the image in raster-scan order, with a top-to-bottom, left-to-right scan, the context is composed only of previously encoded pixels, so the prediction error is sufficient for the decoder to make a faithful reconstruction of the original pixel value.

During the encoding process, the weights w_0, ..., w_5 are adaptively changed and optimized on a per-pixel basis. Our intent is to determine predictor weights that model the local characteristics of the image being encoded. After several experiments, we decided to determine the predictor by minimizing the energy of the prediction error inside a small window W_{x,y}(R_p) of radius R_p centered on PIX(x, y):

$$\min_{w_0,\dots,w_5} E(x,y) \;=\; \min_{w_0,\dots,w_5} \sum_{PIX(x',y') \,\in\, W_{x,y}(R_p)} \big(ERR(x',y')\big)^2$$

By using a window of previously encoded pixels we can use a backward prediction scheme, and the encoder has no need to send any side information to the decoder. On the other hand, backward prediction has a well-known major drawback: poor performance in the presence of edges.

The radius R_p of the window W_{x,y}(R_p) (see Figure 2) is one of the essential features of our algorithm. Its size affects the prediction quality: if R_p is too small, only a few samples are in the window and the predictor "overspecializes", making big errors in the presence of edges. On the other hand, too many samples in the window (R_p too big) tend to generate predictors that are not specific enough to remove local variations in the image. In our experiments, we decided to keep R_p constant and equal for all the images.

Figure 2: Window W_{x,y}(R_p) of radius R_p centered on PIX(x, y).

To improve the prediction, the optimization is performed only on a subset of the samples collected in the window. The rationale is that we want the predictor's weights to be representative of the relation existing between the context and the pixel being encoded. By discarding samples that have a context "too different" from that of the current pixel, we can specialize the prediction and follow fine periodic patterns in the window.
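To make the prediction step concrete, the following is a minimal Python sketch of the weighted-sum predictor and of the windowed error energy E(x, y). The image is assumed to be a 2-D NumPy array of intensities; the function names, the border handling, and the exact causal shape of the window are our assumptions and are not taken from the paper.

```python
import numpy as np

# Offsets (dx, dy) of the six causal neighbors of PIX(x, y) (Figure 1),
# listed in the order of the weights w0..w5; y grows downward.
CONTEXT = [(0, -2), (-1, -1), (0, -1), (1, -1), (-2, 0), (-1, 0)]

def predict(img, x, y, w):
    """Predicted value: weighted sum of the six neighbors, rounded to an integer."""
    s = sum(wi * float(img[y + dy, x + dx]) for wi, (dx, dy) in zip(w, CONTEXT))
    return int(round(s))

def causal_window(img, x, y, rp):
    """Positions inside W_{x,y}(R_p) that precede (x, y) in raster order
    and whose six-pixel context lies entirely inside the image."""
    height, width = img.shape
    for yy in range(max(2, y - rp), min(height, y + 1)):
        for xx in range(max(2, x - rp), min(width - 1, x + rp + 1)):
            if yy == y and xx >= x:      # only previously encoded pixels
                break
            yield xx, yy

def error_energy(img, x, y, weights, rp):
    """E(x, y): energy of the prediction error over the window W_{x,y}(R_p)."""
    return sum((float(img[yy, xx]) - predict(img, xx, yy, weights)) ** 2
               for xx, yy in causal_window(img, x, y, rp))
```

Minimizing this energy over the six weights is what the Gradient Descent step described below does, restricted to the samples of the selected cluster.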
Most algorithms in the literature use a simple pixel predictor and compensate for the weak prediction with sophisticated heuristics that model the error as a function of the context in which it occurs (see for example LOCO-I [1]). Our algorithm, instead, embeds the contextual encoding inside the prediction step. The classification of the samples into clusters of pixels that have a similar context is performed with a Generalized Lloyd Algorithm [9] (or LBG). This classification method, although not optimal in our framework, is good enough to improve the performance of the basic adaptive predictor. However, we are confident that a better classification would further improve the performance of our algorithm.

Once all the samples in the window are classified and a representative centroid is determined for each cluster, a cluster of pixels is selected according to the minimum distance between the context of the corresponding centroid and the context of the current pixel. Then, among a set of predictors, the one that achieves the lowest prediction error on the selected centroid is chosen. This predictor is finally refined by applying Gradient Descent optimization to the samples collected in the selected cluster.

    for every pixel PIX(x,y) in the input image do
    begin
        Collect all the pixels and their contexts in Wx,y(Rp)
        Determine n centroids C1,...,Cn by applying the LBG algorithm to the contexts in Wx,y(Rp)
        Let K1,...,Kn be the corresponding clusters
        Classify each pixel/context in Wx,y(Rp) into one of the clusters K1,...,Kn
        Classify the context of the current pixel PIX(x,y); let k be the index of its cluster
        Let Pi = {w0,...,w5} be the predictor that achieves the smallest error on the
            centroid Ck among a set of predictors P1,...,Pn
        Apply Gradient Descent to the pixels in Kk to refine the predictor Pi
        Use the refined predictor P'i to predict PIX(x,y)
        Generate the prediction error ERR(x,y)
    end

Figure 3: Pseudocode description of the adaptive prediction.

At each step t of the optimization, until the difference between the previous and the current error falls below a fixed threshold, the weights w_i of the predictor are updated according to

$$w_i(t+1) = w_i(t) - \mu \frac{\partial E}{\partial w_i}$$

where E is the error energy and \mu a small constant.

When only a few samples are in the window, for example when PIX(x, y) is close to the top or to the left border, a default fixed predictor is used and the Gradient Descent optimization is not applied. In our implementation, we used as default the classic "planar predictor" [6]:

$$P_{def} = \{w_0 = 0,\; w_1 = -1,\; w_2 = 1,\; w_3 = 0,\; w_4 = 0,\; w_5 = 1\}$$

The algorithm uses at each step the refined predictors from the previous iterations, so initialization is not an issue and the predictors P1,...,Pn can be initialized with random values without compromising the performance. We also experimented with initializing the predictors with the values used in JPEG-LS; this only resulted in slightly faster convergence of the Gradient Descent optimization. Reinitializing the predictors, instead of reusing the previously refined weights, results in much slower convergence but does not seem to change the compression ratio.
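Because the error energy is quadratic in the weights, the Gradient Descent refinement has a simple closed-form gradient. Below is a minimal sketch of the update rule applied to the (context, pixel) samples of the selected cluster; each context row holds the six neighbor values in the order w0..w5. The step size mu, the stopping threshold, and the iteration cap are hypothetical values of our choosing, since the paper does not report them.

```python
import numpy as np

def refine_predictor(contexts, targets, w, mu=1e-7, threshold=1e-3, max_iter=1000):
    """Refine the weights by gradient descent on the error energy
    E = sum((p - c.w)^2) over the samples of a cluster, stopping when
    the decrease of E falls below a fixed threshold."""
    C = np.asarray(contexts, dtype=float)   # one six-pixel context per row
    p = np.asarray(targets, dtype=float)    # the pixel each context predicts
    w = np.asarray(w, dtype=float).copy()
    prev_e = np.inf
    for _ in range(max_iter):
        err = p - C @ w                     # residuals on the cluster
        e = float(err @ err)                # error energy E
        if prev_e - e < threshold:          # convergence test
            break
        w -= mu * (-2.0 * C.T @ err)        # w_i(t+1) = w_i(t) - mu * dE/dw_i
        prev_e = e
    return w

# Default "planar" predictor used near the borders: N + W - NW.
P_DEF = np.array([0.0, -1.0, 1.0, 0.0, 0.0, 1.0])
```

With too large a step size the iteration diverges, so in practice mu has to be scaled to the magnitude of the pixel values.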
Entropy Coding

As is common in the literature [10], we assume that for most images the prediction errors can be closely approximated by a Laplace distribution. In our case, adaptive linear prediction generates a skewed Laplacian distribution, centered on zero and with very long tails. We decided to use an arithmetic encoder [8] for the entropy coding of the error.

Arithmetic encoding divides coding into two steps: the determination of a probabilistic model for the source, and the entropy coding that uses that model. This results in a very general framework in which the modeling part can easily be customized for experiments.

The minimum and maximum errors have absolute values that are much greater than the operative limits of the distribution; during the experiments, we observed that 95% of the errors are concentrated in an interval [-Δ, ..., +Δ] that is substantially narrower than [Min, ..., Max]. While typical values of Δ are in the range [8, ..., 20], Min and Max assume, in general, values in the range [-120, ..., 120].

The arithmetic coder implementation that we used [11] has the limitation that the initial probabilities must always be greater than zero. As a consequence, when only a small number of samples is available to model the distribution, encoder efficiency can be compromised by the use of a coding range that is substantially greater than necessary. For this reason, we decided to experiment with two different models: one for the "typical errors", in the range [-Δ, ..., +Δ], and another for the errors outside that range. Like Min and Max, the parameter Δ is determined by an off-line observation of all the errors and must be sent to the decoder. While sending those parameters has a cost that is irrelevant from the point of view of the compressed file size, it makes our algorithm an off-line procedure. Adaptive arithmetic encoders that rescale their range could possibly be used to overcome this problem.

Errors are encoded by separating magnitude and sign. This is reasonable because the sign of the prediction error has little or no correlation with its magnitude, and two different probabilistic models can be used for the encoding. Our implementation uses an arithmetic encoder with four different models:

ACM_P  Parameters model, used only to transmit the header of the compressed file with all the global parameters (Min, Max, Δ, Re);
ACM_M  Used to encode the magnitude of the typical errors. It has symbols in the range [0, ..., Δ] plus an extra symbol (Δ+1) that is used to send an escape signal to the decoder;
ACM_E  An adaptive model that is used to encode the magnitude of the non-typical errors. It has symbols in the range [Δ+1, ..., max(abs(Max), abs(Min))];
ACM_S  Used to encode the error sign. It has two symbols [0, 1] that represent positive and negative errors.

Unlike the other three models, ACM_M is not automatically updated: the probability distribution for the magnitude of the prediction error ERR(x, y) is determined each time by observing the previously encoded error magnitudes in a window W_{x,y}(Re) of radius Re. A further gain in the compression ratio could be achieved by properly modeling the error sign; our current implementation, however, uses a simpler and less effective adaptive model.
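As a sketch of the escape mechanism between the two magnitude models, the coding of one error might look like the following. The encode(symbol) interface of the models is our assumption and does not correspond to the actual API of the coder in [11]; the choice of not coding a sign for zero errors is also ours.

```python
def encode_error(err, delta, acm_m, acm_e, acm_s):
    """Encode one prediction error: magnitude first, escaping from the
    'typical' model ACM_M to ACM_E when |err| > delta, then the sign."""
    mag = abs(err)
    if mag <= delta:
        acm_m.encode(mag)            # symbol in [0, ..., delta]
    else:
        acm_m.encode(delta + 1)      # escape symbol announces a non-typical error
        acm_e.encode(mag)            # symbol in [delta+1, ..., max(|Min|, |Max|)]
    if mag != 0:
        acm_s.encode(0 if err > 0 else 1)   # sign through ACM_S
```

Keeping the escape symbol inside ACM_M means the typical model pays only one extra alphabet symbol for the rare out-of-range errors.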
Results and Discussion

Experiments were performed in order to assess the algorithm on a test set composed of nine greylevel images of 720 x 576 pixels, digitized at a resolution of 8 bits (256 grey levels) per pixel.

Figure 4: Comparisons with the entropy of the prediction error in LOCO-I.

The same test set is widely used for comparisons in most of the lossless data compression literature and can be downloaded from an ftp site [12]. Main results are expressed in terms of bits per pixel or by giving the size of the compressed file. This (inelegant) choice was necessary to evaluate the small variations that are usually rounded off when results are expressed in bits per pixel.

Figure 4 compares the entropy of the prediction error of the simple fixed predictor used in LOCO-I with the entropy of the prediction error achieved by our algorithm. The results were obtained by using 2 predictors and by optimizing the predictors in a window of radius Rp = 10. For comparison, we also indicate the overall performance of LOCO-I, after its sophisticated context modeling and entropy coding.

# of predictors       1        2        4        6        8
Baloon           154275   150407   150625   150221   150298
Barb             227631   223936   224767   225219   225912
Barb2            250222   250674   254582   256896   258557
Board            193059   190022   190504   190244   190597
Boats            210229   208018   209408   209536   210549
Girl             204001   202004   202326   202390   202605
Gold             235682   237375   238728   239413   240352
Hotel            236037   236916   239224   240000   240733
Zelda            195052   193828   194535   195172   195503
Total (bytes)   1906188  1893180  1904699  1909091  1915106

Table 1: Compressed file size vs. number of predictors. Results are shown for a window of radius Rp = 6; the error is coded with a single adaptive arithmetic encoder.

Rp                    6        8       10       12       14
Baloon           150407   149923   149858   150019   150277
Barb             223936   223507   224552   225373   226136
Barb2            250674   249361   246147   247031   246265
Board            190022   190319   190911   191709   192509
Boats            208018   206630   206147   206214   206481
Girl             202004   201189   201085   201410   201728
Gold             237375   235329   234229   234048   234034
Hotel            236916   235562   235856   236182   236559
Zelda            193828   193041   192840   192911   193111
Total (bytes)   1893180  1884861  1881625  1884897  1887100

Table 2: Compressed file size vs. window radius Rp. The number of predictors is 2; the prediction error is entropy coded with a single adaptive arithmetic encoder.

It is evident that our adaptive linear predictors are (understandably) much more powerful than the fixed predictor used in LOCO-I; however, even adaptive prediction is not powerful enough to capture edges and sharp transitions, present for example in the picture "hotel".

Tables 1, 2 and 3 summarize the experiments we made in order to understand the sensitivity of the algorithm to its parameters. In these experiments, we measured the variations in the compressed file size when only one of the parameters changes. In Table 1, the number of predictors is varied while keeping the window radius fixed at Rp = 6; conversely, in Table 2, the number of predictors is kept fixed at 2 and the performance is evaluated as the window size changes. Both experiments described in Tables 1 and 2 were performed by using a very simple entropy coding scheme for the prediction error: a single adaptive arithmetic coder. As we also verified experimentally, the performance of a single adaptive arithmetic encoder is a good approximation of the first-order entropy of the encoded data.
Re                    6        8       10       12       14       16       18       20
Baloon           147518   147227   147235   147341   147479   147620   147780   147885
Barb             218411   216678   216082   215906   215961   216135   216370   216600
Barb2            237523   234714   233303   232696   232455   232399   232473   232637
Board            187058   186351   186171   186187   186303   186467   186646   186800
Boats            203837   202168   201585   201446   201504   201623   201775   201943
Girl             198050   197243   197013   197040   197143   197245   197356   197465
Gold             232617   230619   229706   229284   229111   229026   229012   229053
Hotel            231125   229259   228623   228441   228491   228627   228785   228949
Zelda            190311   189246   188798   188576   188489   188461   188469   188500
Total (bytes)   1846450  1833505  1828516  1826917  1826936  1827603  1828666  1829832

Table 3: Compressed file size vs. error window radius Re. The number of predictors is 2 and Rp = 10. The prediction error is encoded as described in the "Entropy Coding" section.

Image     SUNSET CB9 [1]  LOCO-I [1]  UCM [6]  2 Pred., Rp=10, EC with Re  CALIC [6]  TMW [2],[13]
Baloon         2.89           2.90       2.81            2.84                2.78         2.65
Barb           4.64           4.65       4.44            4.16                4.31         4.08
Barb2          4.71           4.66       4.57            4.48                4.46         4.38
Board          3.72           3.64       3.57            3.59                3.51         3.61
Boats          3.99           3.92       3.85            3.89                3.78          -
Girl           3.90           3.90       3.81            3.80                3.72          -
Gold           4.60           4.47       4.45            4.42                4.35         4.28
Hotel          4.48           4.35       4.28            4.41                4.18          -
Zelda          3.79           3.87       3.80            3.64                3.69          -
Average        4.08           4.04       3.95            3.91                3.86         3.80

Table 4: Compression rate in bits per pixel achieved on the test set by some popular lossless image encoding algorithms. The number of predictors used in our results is 2, Rp = 10, and entropy coding is performed as described in the "Entropy Coding" section. TMW results are not reported for all test images; its average is computed over the five reported images.

Table 3 reports the conclusive experiments: the number of predictors is kept fixed at 2, Rp = 10, and performance is evaluated by encoding the prediction error as described in the section "Entropy Coding". Results are reported for varying values of Re.

Comparisons with some popular lossless image codecs (see Table 4 and Figure 5) show that the proposed algorithm achieves good performance on most images of the test set. The cases where we fall short of CALIC confirm that linear prediction, even in this adaptive form, is not adequate to model image edges. Also, unlike CALIC, our codec doesn't use any special mode to encode high-contrast image zones, so our results are penalized by images like "hotel" that have highly contrasted regions. A closer look at the prediction error magnitude and sign for "board" and "hotel", two images in the test set, shows that most of the edges in the original image are still present in the prediction error (Figures 6 and 7).

Conclusion

The preliminary results we obtained experimenting on a test set of nine standard images are encouraging. With a better classification and selection of the contexts in the prediction window, and with a more sophisticated encoding of the prediction error, it may be possible to achieve stable and better results on all the test images. It is also likely that the computational complexity can be substantially reduced, without sacrificing performance, by using alternative methods for the optimization of the predictors. Further complexity reduction may be possible by substituting the arithmetic coder with more efficient entropy coders.

Acknowledgment

We wish to thank Martin Cohn for fruitful discussions.

Figure 5: Graphical representation of the data in Table 4.
Figures 6 and 7: Magnitude (left column) and sign (right column) of the prediction error in two images of the test set. Images are "board" (top row) and "hotel" (bottom row).

Bibliography

[1] M.J. Weinberger, G. Seroussi and G. Sapiro, "LOCO-I: A Low Complexity, Context-Based, Lossless Image Compression Algorithm", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.
[2] B. Meyer and P. Tischer, "Extending TMW for Near Lossless Compression of Greyscale Images", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1998.
[3] X. Wu and N. Memon, "Context-based, Adaptive, Lossless Image Codec", IEEE Trans. on Communications, Vol. 45, No. 4, Apr 1997.
[4] X. Wu, W. Choi and N. Memon, "Lossless Interframe Image Compression via Context Modeling", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1998.
[5] B. Meyer and P. Tischer, "TMW - a New Method for Lossless Image Compression", Proceedings International Picture Coding Symposium (PCS97), Sep 1997.
[6] X. Wu, "An Algorithmic Study on Lossless Image Compression", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.
[7] D. Speck, "Fast Robust Adaptation of Predictor Weights from Min/Max Neighboring Pixels for Minimal Conditional Entropy", Proceedings Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Oct 30 - Nov 2, pp. 234-242.
[8] I.H. Witten, R. Neal and J.G. Cleary, "Arithmetic Coding for Data Compression", Communications of the ACM, Vol. 30, No. 6, Jun 1987, pp. 520-540.
[9] Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. on Communications, Vol. COM-28, Jan 1980, pp. 84-95.
[10] P.G. Howard, "The Design and Analysis of Efficient Lossless Data Compression Systems", Ph.D. Dissertation, Department of Computer Science, Brown University, Jun 1993.
[11] F. Wheeler, Adaptive Arithmetic Coding, source code from: http://ipl.rpi.edu/wheeler/ac/.
[12] X. Wu, Test Images, from: ftp://ftp.csd.uwo.ca/pub/from_wu/images/.
[13] B. Meyer, TMW Code and New Results, from: http://www.cs.monash.edu.au/~bmeyer/tmw.