Adapted from K. Sayood, "Introduction to Data Compression," 4th edition, Morgan Kaufmann, 2012.

Ch. 5 Dictionary Techniques

Algorithms: LZ, LZ77 (or LZ1), LZ78 (or LZ2), LZW (Lempel-Ziv-Welch algorithm).
Applications: Unix compress command, V.42bis, PKZip, Zip, GIF, LHarc, PNG, gzip, and ARJ.
Application areas: text compression, modem communications, image compression.

Dictionary techniques incorporate the structure in the data in order to increase compression. They are either
1) Static
2) Dynamic (adaptive)

Idea: find the commonly occurring patterns and develop an index (a dictionary) for these. Dictionary techniques are most useful with sources that generate a relatively small number of patterns quite frequently, such as text sources and computer commands. The class of frequently occurring patterns (the size of the dictionary) must be much smaller than the number of all possible patterns.

DICTIONARY
Ex.: Consider four-character words: three characters from the lowercase English alphabet (26 letters) and one character from six punctuation marks (, ? . ! ; :).
Alphabet size = 32 (26 letters + 6 punctuation marks).
Number of four-character patterns = $32^4 = 2^{20} = 1048576$.
Need 20 bits (5 bits/character) to code each pattern.

Assume the 256 most likely patterns are placed into a dictionary.
Pattern in the dictionary: 1-bit flag 0 + 8 bits for the dictionary index (total 9 bits).
Pattern not in the dictionary: 1-bit flag 1 + 20 bits for the pattern (total 21 bits).

p = probability of a pattern from the dictionary
R = average number of bits per pattern

$R = 9p + 21(1 - p) = 21 - 12p$    (5.1)

For R < 20: $21 - 12p < 20$, so $12p > 1$, i.e. $p > 1/12 \approx 0.0833$ (p ≥ 0.084 is sufficient).

p should be as large as possible: carefully select the patterns that are most likely to occur as entries in the dictionary.

Static approach: dictionary developed before encoding.
Adaptive or dynamic approach: dictionary developed on the fly.

5.3 Static Dictionary
Most appropriate when considerable prior knowledge about the source is available. Ex.
Student records, bank statements, credit card statements.
Efficient for a specific application: an application-specific or data-specific static-dictionary-based coding scheme is the most efficient. However, a coding scheme designed for a specific application may not work well for a different application.

5.3.1 Digram Coding
Static dictionary coding.
Digrams: pairs of letters (ASCII characters).
Digram coding: a static dictionary technique that is less specific to a single application. The dictionary holds every letter of the source alphabet plus as many frequently occurring digrams as fit. The encoder reads two characters at a time: if the pair is in the dictionary its code is sent, otherwise the code of the first character is sent and the second character becomes the start of the next pair.

Ex. 5.3.1 / p. 119
Source with 5-letter alphabet A = {a, b, c, d, r}. Encode 'abracadabra'.

Table 5.1: A sample dictionary
Code  Entry    Code  Entry
000   a        100   r
001   b        101   ab
010   c        110   ac
011   d        111   ad

Parsing:  ab   r    ac   ad   ab   r    a
Codes:    101  100  110  111  101  100  000
Output:   101100110111101100000

A dictionary designed for LaTeX documents (Table 5.2) is not suitable for C programs (Table 5.3): Table 5.2 (LaTeX document, Ch. 5) and Table 5.3 (C programs) are different. In these tables, nl denotes a new line and ␣ denotes a space.
This motivates techniques that generate the dictionary so that it adapts to the characteristics of the source output.

5.4 Adaptive dictionary based technique
Lempel-Ziv 1977: LZ77 (LZ1)
Lempel-Ziv 1978: LZ78 (LZ2)
Lempel-Ziv-Welch: LZW
WAN data communication products use the LZ77 or LZ78 algorithm (see Table 7.4, p. 186, R. Hoffman, "Data compression in digital systems," Chapman & Hall, 1997).
Publishing: text, graphics, and print-ready images are compressed with LZW and other lossless algorithms (ibid., p. 292).

Ex. 5.4.4 LZW algorithm decoding
Encoder output sequence:
5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 (see Table 5)
The decoder starts with the same initial dictionary as the encoder (Table 3).

Table 3: Initial LZW dictionary (indices 1-5) and the first entries built during decoding (indices 6-10); ␣ denotes a space
Index  Entry    Index  Entry
1      ␣        6      wa
2      a        7      ab
3      b        8      bb
4      o        9      ba
5      w        10     a␣

Start with index 5, which corresponds to w. Decode w (already in the dictionary).
The next decoder input is index 2, which corresponds to 'a'. Decode 'a' and concatenate it with the current pattern to form 'wa'. This is not in the dictionary.
Add this as the 6th element of the dictionary and start a new pattern beginning with 'a'.
The next four inputs, 3 3 2 1, correspond to b, b, a, ␣ (space). These generate the entries ab (7), bb (8), ba (9), and a␣ (10).
The next input is 6, which is wa. Concatenate ␣ with w to form ␣w (11). The new pattern starts with w ('wa' is already in the dictionary).
The next input is index 8, which is bb. Concatenate 'wa' with 'b' to form wab (12).
Continue the construction (decoding) of the LZW dictionary in the same way.

Situation where LZW decoding breaks down
Source alphabet A = {a, b}. Encode the sequence ababababab...
Transmitted sequence: 1 2 3 5 ...

Table 5.10: Initial dictionary for abababab
Index  Entry
1      a
2      b

Table 5.11: Final dictionary for abababab
Index  Entry    Index  Entry
1      a        9      ababa
2      b        10     ababab
3      ab       11     babab
4      ba       12     bababa
5      aba      13     abababa
6      abab     14     abababab
7      bab      15     bababab
8      baba

Decoding: begin with the initial dictionary (Table 5.10). Inputs 1, 2 are decoded as a, b, which leads to the 3rd entry, ab. The next input is 3 (gives ab). This adds the 4th entry, ba, and starts the 5th entry, first as a… and then as ab… (see Tables 5.13 and 5.14). The next input is 5, which is not yet a complete entry in the dictionary.

Table 5.13: Constructing the fifth entry (stage one)
Index  Entry
1      a
2      b
3      ab
4      ba
5      a…

Table 5.14: Constructing the fifth entry (stage two)
Index  Entry
1      a
2      b
3      ab
4      ba
5      ab…

Because the fifth entry is known to begin with ab, the decoder can already output a and b. Appending the first decoded letter, a, to the pending pattern ab completes the fifth entry as aba, and the sixth entry begins with a….

Table 5.14: Completion of the fifth entry
Index  Entry
1      a
2      b
3      ab
4      ba
5      aba
6      a…

5.5 Applications
LZW is one of the most widely used compression algorithms.
See Problem 8, p. 140. Programs: diffim, huff_enc.
Unix compress command.
An LZW decoder has to contain an exception handler for the special case of decoding an index that does not yet have a corresponding complete entry in the decoder dictionary.
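The special case above can be made concrete with a short sketch. This is a minimal illustration in Python, not the textbook's program or the Unix compress implementation: the function names lzw_encode and lzw_decode are hypothetical, indices are 1-based as in Tables 5.10 and 5.11, the dictionary is allowed to grow without bound, and the input is assumed to use only letters of the given alphabet.

```python
def lzw_encode(text, alphabet):
    """Encode text as a list of 1-based dictionary indices (sketch)."""
    # Initial dictionary: one entry per source letter, indexed from 1.
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    pattern, out = "", []
    for ch in text:
        if pattern + ch in table:
            pattern += ch                      # keep extending the match
        else:
            out.append(table[pattern])         # emit longest known pattern
            table[pattern + ch] = len(table) + 1
            pattern = ch                       # new pattern starts here
    if pattern:
        out.append(table[pattern])             # flush the final pattern
    return out


def lzw_decode(indices, alphabet):
    """Rebuild the text; the decoder dictionary lags the encoder by one entry."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    if not indices:
        return ""
    prev = table[indices[0]]
    out = [prev]
    for idx in indices[1:]:
        if idx in table:
            entry = table[idx]
        elif idx == len(table) + 1:
            # Exception handler: the index names the entry still under
            # construction, which must be prev plus prev's first letter.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid LZW index {idx}")
        out.append(entry)
        table[len(table) + 1] = prev + entry[0]  # complete the pending entry
        prev = entry
    return "".join(out)


codes = lzw_encode("ababababab", "ab")
print(codes)                    # [1, 2, 3, 5, 4, 2]
print(lzw_decode(codes, "ab"))  # ababababab
```

When the decoder reaches index 5 its dictionary holds only four entries, so the elif branch builds aba from the previous output ab, which matches the completed fifth entry in the tables. Real implementations also bound the dictionary size; that is left out of this sketch.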
5.5.2 GIF (Image Compression)
Developed by CompuServe Information Service to encode graphical images (for details see pages 151, 152).
GIF is very popular for encoding all kinds of images, both computer-generated and natural images.
It is not very efficient for losslessly compressing images of natural scenes, photographs, satellite images, etc. (see Table 5.16; see also Tables 4.7 and 4.8).

Table 5.16: Comparison of GIF with arithmetic coding
Image    GIF      Arithmetic Coding    Arithmetic Coding
                  of Pixel Values      of Pixel Differences
Sena     51,085   53,431               31,847
Sensin   60,649   58,306               37,126
Earth    34,276   38,248               32,137
Omaha    61,580   56,061               51,393

References
1. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. IT-23, pp. 337-343, May 1977.
2. J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, vol. IT-24, pp. 530-536, Sept. 1978.
3. J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, pp. 928-951, 1982.
4. T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Comm., vol. COM-34, pp. 1176-1182, Dec. 1986.
5. T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, pp. 8-19, June 1984.
6. T. C. Bell, J. G. Cleary, and I. H. Witten, "Text Compression," Advanced Reference Series. Englewood Cliffs, NJ: Prentice Hall, 1990.
7. M. Nelson, "The Data Compression Book," New York: M&T Books, 1991.
8. G. Held and T. R. Marshall, "Data Compression," New York: Wiley, third edition, 1991.
9. P. Marchand, "Graphics and GUI's with MATLAB," Boca Raton, FL: CRC Press, 1996.
10. W. Kou, "Digital Image Compression Algorithms and Standards," Amsterdam, Kluwer Academic, 1995.
11. G. Louchard and W. Szpankowski, "Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile," DCC '95, Data Compression Conf., Snowbird, UT, March 1995.
12. R.
Horspool, "The effect of non-greedy parsing Lempel-Ziv compression methods," DCC '95, Data Compression Conf., Snowbird, UT, March 1995.
13. G. Louchard and W. Szpankowski, "On the Average Redundancy Rate of the Lempel-Ziv Code," DCC '96, Data Compression Conf., Snowbird, UT, April 1996.
14. J. A. Storer, "Lossless Image Compression Using Generalized LZ1-Type Methods," DCC '96, Data Compression Conf., UT, April 1996.
15. C. T. Chen and L. G. Chen, "A novel architecture for Lempel-Ziv based data compression," IEEE ICCE, Chicago, IL, June 1996.
16. D. Sheinwald, "On the Ziv-Lempel proof and related topics," Proc. IEEE, vol. 82, pp. 866-871, June 1994.
17. A. D. Wyner and J. Ziv, "The sliding window Lempel-Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872-877, June 1994.
18. Y. F. Hu and X. S. Wu, "The methods of improving the compression ratio of LZ77 family data compression algorithms," ICSP, Beijing, China, Oct. 1996.
19. V. G. Ruiz and I. Garcia, "A lossy data compressor based on the LZW algorithm," ICSPAT 96, pp. 1002-1006, Boston, MA, Oct. 1996.
20. S. A. Savari, "Redundancy of the Lempel-Ziv-Welch Code," Data Compression Conf. (DCC 97), Snowbird, UT, March 1997.
21. S. R. Kosaraju and G. Manzini, "Compression of low entropy strings with Lempel-Ziv algorithms," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
22. J. I. Lathrop and M. Strauss, "A universal upper bound on the performance of the Lempel-Ziv algorithm on maliciously-constructed data," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
23. D. Greene et al., "A progressive Ziv-Lempel algorithm for image compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
24. M. Cohn and H. Helfgott, "Asymmetry in Ziv-Lempel compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
25. S.
De Agostino, "A parallel decoder for LZ2 compression using the ID update heuristic," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
26. R. H. Wyman and P. Y. K. Cheung, "Bit plane differential LZW for the compression of video for variable bandwidth channels," IEEE ISCAS '97, Hong Kong, June 1997.
27. C. Su, C-F. Yan and J-C. Yo, "Hardware efficient updating technique for LZW codec design," IEEE ISCAS '97, Hong Kong, June 1997.
28. C. T. Chen and L. G. Chen, "High-Speed VLSI design of the LZ-based data compression," IEEE ISCAS '97, Hong Kong, June 1997.
29. G. Held, "Data and image compression: Tools and techniques," 4th Edition, New York, NY: Wiley, 1996.
30. P. Tischer, "A modified LZW data compression scheme," Australian Computer Science Commun., vol. 9, pp. 262-272, 1987.
31. R. Hoffman, "Data compression in digital systems," New York, NY: Chapman & Hall, 1997.
32. D. J. Craft, "ADLC and a pre-processor extension, BDLC, provides ultra fast compression for general-purpose bit-mapped image data," Data Compression Conf., p. 400, IEEE Computer Society Press, 1995. (ADLC: adaptive lossless data compression; BDLC: bit-mapped lossless data compression, an LZ77 variant.)
33. T. Kida et al., "Multiple pattern matching in LZW compressed text," IEEE DCC Conf., UT, Mar. 1998.
34. S. Even, "Four value adding algorithms," IEEE Spectrum, vol. 35, pp. 33-38, May 1998.
35. J. C. Kieffer, T. H. Park and Y. Xu, "Progressive lossless image coding via self referential partitions," IEEE ICIP, Chicago, IL, Oct. 1998.
36. C-Ho Cheung, C. S-Wai and P. Lai-Man, "Predictive lossy LZSS algorithm for fidelity constrained image coding," Intl. Forum cum Conf. on Info. Technology and Commun. at the dawn of the new Millennium, Bangkok, Thailand, Aug. 2000.
37. Y-K. Lai and K-C. Chen, "A novel VLSI architecture for Lempel-Ziv based data compression," IEEE ISCAS, Geneva, Switzerland, May 2000.
38. L. P. Deutsch, "Deflate compressed data format specification," Request for Comments (RFC) 1951, 1996, available at ftp://ftp.uu.net/pub/archiving/zip/doc/.
39. J. Miano, "Compressed image file formats: JPEG, PNG, GIF, XBM, BMP," Addison Wesley, 1999. (Software on disk.)
40. H. H. Shih, S. S. Narayanan and C.-C. Jay Kuo, "Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm," IEEE ISIMP 2001, Hong Kong, May 2001.
41. M. J. Weinberger and Ordentlich, "On-line decision making for a class of loss functions via Lempel-Ziv parsing," DCC 2000, Snowbird, UT, March 2000, http://www.cs.brandeis.edu/~dcc
42. Y. Reznik and W. Szpankowski, "On the average redundancy rate of the Lempel-Ziv code with K-error protocol," DCC 2000, Data Compression Conf.
43. S. De Agostino, "Work-optimal parallel decoders for LZ2 data compression," DCC 2000.
44. N. J. Brittain and M. R. El-Sakka, "Grayscale true two-dimensional dictionary based image compression," JVCIR, vol. 18, pp. 35-44, Feb. 2007. (2D-LZ.)
45. J. D. Gibson et al., "Digital compression for multimedia," San Diego, CA: Academic Press, 1998 (see Appendices E and F).
46. M. Aboy, R. Hornero, D. Abasolo, and D. Alvarez, "Interpretation of Lempel-Ziv complexity measure in the context of biomedical signal analysis," IEEE Transactions on Biomedical Engineering, 53(11):2282-2288, Nov. 2006.
47. N. Radhakrishnan and B. N. Gangadhar, "Estimating regularity in epileptic seizure time-series data," IEEE Engineering in Medicine and Biology Magazine, 17:89-94, 1998.
48. X.-S. Zhang, R. J. Roy, and E. W. Jensen, "EEG complexity as a measure of depth of anesthesia for patients," IEEE Transactions on Biomedical Engineering, 48(12):1424-1433, Dec. 2001.
49. D. Abasolo, R. Hornero, C. Gomez, M. Garcia, and M. Lopez, "Analysis of EEG background activity in Alzheimer's disease patients with Lempel-Ziv complexity and central tendency measure," Medical Engineering & Physics, 28(4):315-322, 2006.
50. H.
Zhang, Y. Zhu, and Z. Wang, "Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation," Medical and Biological Engineering and Computing, 38:553-557, 2000.
51. B. Li, J. Xu and F. Wu, "1D dictionary mode for Screen Content Coding," in Visual Communication and Image Processing Conference, pp. 189-192, Dec. 2014.
52. X. Guo et al., "Wyner-Ziv-based multiview video coding," IEEE Trans. on CSVT, vol. 18, pp. 713-714, June 2008.
53. J.-S. Kim and J.-G. Kim, "Reliability-based selective encoding in pixel-domain Wyner-Ziv residual video codec," Future Information Communication Technology and Applications, Lecture Notes in Electrical Engineering (LNEE), vol. 235, pp. 359-367, Sep. 2013.
54. J.-S. Kim, J.-G. Kim, H. Choi, and K.-D. Seo, "Pixel-domain Wyner-Ziv residual video coder with adaptive binary-to-Gray code converting process," Electronics Letters, vol. 49, no. 3, Jan. 2013.

Further Reading
1. Text Compression, by T. C. Bell, J. G. Cleary, and I. H. Witten (Advanced Reference Series, Prentice Hall, Englewood Cliffs, New Jersey, 1990). This provides an excellent exposition of dictionary-based coding techniques.
2. The Data Compression Book, by M. Nelson and J.-L. Gailly. This also does a good job of describing the Ziv-Lempel algorithms. There is also a very nice description of some of the software implementation aspects.
3. Data Compression, by G. Held and T. R. Marshall (Wiley, third edition, 1991). This contains a description of digram coding under the name "diatomic coding." The book also includes BASIC programs that help in the design of dictionaries.
4. The PNG algorithm is described in a very accessible manner in "PNG Lossless Compression," by G. Roelofs, in K. Sayood, editor, Lossless Compression Handbook, pages 371-390. Academic Press, 2003.
5.
A more in-depth look at dictionary compression is provided in "Dictionary-Based Data Compression: An Algorithmic Perspective," by S. C. Sahinalp and N. M. Rajpoot, in K. Sayood, editor, Lossless Compression Handbook, pages 153-168. Academic Press, 2003.