Adapted from K. Sayood, “Introduction to Data Compression,” 4th edition, Morgan Kaufmann, 2012
Ch. 5 Dictionary techniques
LZ, LZ77 (or LZ1), LZ78 (or LZ2), LZW
Lempel-Ziv-Welch algorithm
Applications
Unix compress command
V.42bis
PKZip, Zip
GIF
LHarc, PNG, gzip and ARJ
Text sources / computer commands
(Sources that generate a relatively small number of patterns quite frequently.)
Applications:
Text compression, modem communications, image compression.
Dictionary techniques incorporate the structure in the data in order to increase the amount of compression. The dictionary can be
1) Static
2) Dynamic (adaptive)
Idea: keep a list of commonly occurring patterns (the dictionary) and encode each such pattern by its index in the list.
These techniques are most useful with sources that generate a relatively small number of patterns quite frequently, such as text sources and computer commands. The class of frequently occurring patterns (the size of the dictionary) must be much smaller than the number of all possible patterns.
DICTIONARY
Ex: Consider four-character words, with three characters from the lowercase English alphabet (26 letters) and one character from six punctuation marks (, ? . ! ; :).
Alphabet size = 32 (26 letters + 6 punctuation marks)
Number of four-character patterns = $32^4 = 2^{20} = 1048576$
Without a dictionary, 20 bits (5 bits/character) are needed to code each pattern. Assume the 256 most likely patterns are placed into a dictionary.
Each codeword starts with a 1-bit flag:
0 (in the dictionary) + 8 bits for the index of the pattern in the dictionary (total 9 bits)
1 (not in the dictionary) + 20 bits for the pattern (total 21 bits)
p = probability of encountering a pattern from the dictionary
R = average number of bits per pattern
$R = 9p + 21(1-p) = 21 - 12p$ (5.1)
Break-even point: $20 = 21 - 12p$ gives $12p = 1$, so $p = 1/12 \approx 0.0833$.
Hence $R < 20$ requires $p > 1/12$, i.e., $p \geq 0.084$ to three decimal places.
p should be as large as possible, so carefully select the patterns that are most likely to occur as entries in the dictionary.
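The rate calculation above can be checked numerically. The following is a minimal sketch, not part of the source notes; the function and parameter names are illustrative assumptions, with defaults taken from the example (1-bit flag, 8-bit dictionary index, 20-bit raw pattern).

```python
def average_bits_per_pattern(p, dict_bits=8, raw_bits=20, flag_bits=1):
    """Average rate R = (flag + dict_bits) * p + (flag + raw_bits) * (1 - p)."""
    in_dict = flag_bits + dict_bits       # 9 bits in the example
    not_in_dict = flag_bits + raw_bits    # 21 bits in the example
    return in_dict * p + not_in_dict * (1.0 - p)


def break_even_probability(dict_bits=8, raw_bits=20, flag_bits=1):
    """Value of p at which the dictionary scheme costs exactly raw_bits per pattern."""
    in_dict = flag_bits + dict_bits
    not_in_dict = flag_bits + raw_bits
    # Solve in_dict * p + not_in_dict * (1 - p) = raw_bits for p.
    return (not_in_dict - raw_bits) / (not_in_dict - in_dict)


if __name__ == "__main__":
    for p in (0.0, break_even_probability(), 0.5, 1.0):
        rate = average_bits_per_pattern(p)
        print(f"p = {p:.4f}  R = {rate:.3f} bits/pattern")
```

With the default values the break-even probability evaluates to 1/12, in agreement with Equation (5.1); any larger p gives a rate below 20 bits per pattern.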
Static approach: dictionary developed before encoding.
Adaptive (or dynamic) approach: dictionary developed on the fly.
5.3 Static Dictionary
Most appropriate when considerable prior knowledge about the source is available.
Ex. Student records, bank statements, credit card statements
Efficient for a specific application
Application-specific or data-specific static-dictionary-based coding scheme is the
most efficient. The coding scheme designed for a specific application may not
work well for a different application.
5.3.1 Digram Coding
Digram coding is a static dictionary technique that is less specific to a single application.
Digrams: pairs of letters, e.g., pairs of ASCII characters.
Ex. 5.3.1 (p. 119)
Source: 5-letter alphabet A = {a, b, c, d, r}
Encode ‘abracadabra’
Table 5.1: A sample dictionary
Code   Entry   Code   Entry
000    a       100    r
001    b       101    ab
010    c       110    ac
011    d       111    ad
Encoded sequence: 101100110111101100000
101   100   110   111   101   100   000
ab    r     ac    ad    ab    r     a
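The encoding of ‘abracadabra’ can be reproduced with a short sketch. It assumes the usual digram-coding rule: read two characters; if the pair is in the dictionary, emit its code; otherwise emit the code of the first character and let the second character start the next pair. The names `TABLE_5_1` and `digram_encode` are illustrative, not from the source.

```python
# Dictionary of Table 5.1: the five single letters, then three digrams.
TABLE_5_1 = {
    "a": "000", "b": "001", "c": "010", "d": "011",
    "r": "100", "ab": "101", "ac": "110", "ad": "111",
}


def digram_encode(text, dictionary=TABLE_5_1):
    """Return the list of codes produced by digram coding of text."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # The pair is a dictionary entry: code both characters at once.
            codes.append(dictionary[pair])
            i += 2
        else:
            # Code the first character alone; the second starts the next pair.
            codes.append(dictionary[text[i]])
            i += 1
    return codes


if __name__ == "__main__":
    print(" ".join(digram_encode("abracadabra")))
```

For ‘abracadabra’ this yields the seven codes listed above, 21 bits in total, compared with 33 bits when each of the 11 letters is coded separately with 3 bits.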
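A dictionary designed for LaTeX documents (Table 5.2) is not suitable for C programs (Table 5.3); the two tables are different.
Table 5.2: LaTeX document (Ch. 5)
Table 5.3: C programs
In these tables, nl denotes a new line and a separate symbol denotes a space.
This motivates techniques that generate the dictionary adaptively, so that it matches the characteristics of the source output.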
5.4 Adaptive dictionary-based techniques (LZ77, LZ78, LZW)
Lempel-Ziv 1977: LZ77 (LZ1)
Lempel-Ziv 1978: LZ78 (LZ2)
Lempel-Ziv-Welch: LZW
WAN data communication products use the LZ77 or LZ78 algorithm (see Table 7.4, p. 186, Hoffman, “Data compression in digital systems,” Kluwer, 1995).
Publishing: text, graphics, and print-ready images are compressed with LZW and other lossless algorithms (ibid., p. 292).
Ex. 5.4.4 LZW algorithm decoding
Encoder output sequence:
5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 (see Table 5)
The decoder starts with the same initial dictionary as the encoder (Table 3).
Table 3: Initial LZW dictionary (indices 1-5) and the entries added while decoding the first six inputs (indices 6-10); ␢ denotes a space
Index   Entry   Index   Entry
1       ␢       6       wa
2       a       7       ab
3       b       8       bb
4       o       9       ba
5       w       10      a␢
The first input, index 5, corresponds to ‘w’, which is already in the dictionary; decode it.
The next decoder input is index 2, which corresponds to ‘a’. Decode ‘a’ and concatenate it with the current pattern to form ‘wa’. This is not in the dictionary, so add it as the 6th entry and start a new pattern beginning with ‘a’.
The next four inputs, 3 3 2 1, correspond to b b a ␢. These generate the entries ab (7), bb (8), ba (9), and a␢ (10).
The next input is 6, which decodes to ‘wa’. Concatenating the current pattern ␢ with ‘w’ gives ␢w, which becomes entry 11. The new pattern starts with ‘w’; extending it to ‘wa’ gives a pattern already in the dictionary, so no entry is added yet.
The next input is 8, which decodes to ‘bb’. Concatenating ‘wa’ with ‘b’ gives ‘wab’, which becomes entry 12.
Continue the construction (decoding) of the LZW dictionary in the same way.
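The first decoding steps can be traced with a small sketch. It rests on two assumptions that are not spelled out in the notes: the initial dictionary is ordered space, a, b, o, w (so index 1 is the space character), and every received index is already present in the dictionary, so the special case of an incomplete entry is deliberately left out. The function name `lzw_decode_trace` is illustrative.

```python
def lzw_decode_trace(indices, alphabet):
    """Decode 1-based LZW indices; return (decoded text, final dictionary).

    Sketch only: an index that is not yet in the dictionary raises KeyError.
    """
    dictionary = {i + 1: symbol for i, symbol in enumerate(alphabet)}
    next_index = len(dictionary) + 1
    previous = dictionary[indices[0]]
    decoded = [previous]
    for index in indices[1:]:
        entry = dictionary[index]
        # New entry: previous pattern plus the first symbol of the current one.
        dictionary[next_index] = previous + entry[0]
        next_index += 1
        decoded.append(entry)
        previous = entry
    return "".join(decoded), dictionary


# First eight inputs of the example; index 1 is assumed to be the space.
text, dictionary = lzw_decode_trace([5, 2, 3, 3, 2, 1, 6, 8],
                                    [" ", "a", "b", "o", "w"])

if __name__ == "__main__":
    print(repr(text))
    for index in range(6, 13):
        print(index, repr(dictionary[index]))
```

Printing entries 6 through 12 lets the dictionary built by the sketch be compared entry by entry with the walk-through above.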
Situation where LZW decoding breaks down
Table 5.10: Initial dictionary for abababab
Index   Entry
1       a
2       b
Table 5.11: Final dictionary for abababab
Index   Entry   Index   Entry
1       a       9       ababa
2       b       10      ababab
3       ab      11      babab
4       ba      12      bababa
5       aba     13      abababa
6       abab    14      abababab
7       bab     15      bababab
8       baba
Source alphabet A = {a, b}
Encoding the sequence ababababab... gives the transmitted sequence 1 2 3 5 ...
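The transmitted sequence can be reproduced with a minimal LZW encoder sketch. It assumes 1-based dictionary indices and an initial dictionary holding the alphabet in the given order; the name `lzw_encode` and its return shape are illustrative choices.

```python
def lzw_encode(text, alphabet):
    """Return (list of 1-based indices, final dictionary) for text.

    Assumes every symbol of text occurs in alphabet.
    """
    dictionary = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
    next_index = len(dictionary) + 1
    indices = []
    pattern = ""
    for symbol in text:
        candidate = pattern + symbol
        if candidate in dictionary:
            # Keep extending the current pattern while it is still known.
            pattern = candidate
        else:
            # Emit the longest known pattern and add the extended one.
            indices.append(dictionary[pattern])
            dictionary[candidate] = next_index
            next_index += 1
            pattern = symbol
    if pattern:
        indices.append(dictionary[pattern])
    return indices, dictionary


if __name__ == "__main__":
    indices, dictionary = lzw_encode("ababababab", "ab")
    print(indices)
```

For the ten-symbol input ‘ababababab’ the output begins 1 2 3 5: index 5 (aba) is sent immediately after the encoder creates it, which is exactly what causes the decoding difficulty in this example.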
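Decoding: begin with the initial dictionary (Table 5.10).
The first two inputs (1, 2) are decoded as (a, b), which leads to the 3rd entry, ab. The next input is 3, which gives ab and leads to the 4th entry, ba (see Table 5.14). The next input is 5, which is not yet in the dictionary: the decoder is still constructing that entry.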
5.5 Applications: LZW is one of the most widely used compression algorithms.
Table 5.13: Constructing the fifth entry (stage one)
Index   Entry
1       a
2       b
3       ab
4       ba
5       a…
Table 5.14: Constructing the fifth entry (stage two)
Index   Entry
1       a
2       b
3       ab
4       ba
5       ab…
Table 5.14: Completion of the fifth entry
Index   Entry
1       a
2       b
3       ab
4       ba
5       aba
6       a…
See Problem 8, p. 140.
Programs: diffim, huff_enc
(Unix compress command)
The LZW decoder must contain an exception handler for the special case of decoding an index that does not yet have a corresponding complete entry in the decoder dictionary.
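One common way to implement this exception handler is sketched below. It relies on the observation that an index can be missing only when it refers to the entry currently under construction, and that entry must then be the previous pattern followed by its own first symbol. The name `lzw_decode` and the use of 1-based indices are illustrative assumptions; the input list is assumed to be non-empty.

```python
def lzw_decode(indices, alphabet):
    """Decode a non-empty list of 1-based LZW indices into a string."""
    dictionary = {i + 1: symbol for i, symbol in enumerate(alphabet)}
    next_index = len(dictionary) + 1
    previous = dictionary[indices[0]]
    decoded = [previous]
    for index in indices[1:]:
        if index in dictionary:
            entry = dictionary[index]
        elif index == next_index:
            # Special case: the index points at the entry being built.
            # That entry starts with the previous pattern, so its last
            # symbol is the first symbol of the previous pattern.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW index {index}")
        dictionary[next_index] = previous + entry[0]
        next_index += 1
        decoded.append(entry)
        previous = entry
    return "".join(decoded)


if __name__ == "__main__":
    print(lzw_decode([1, 2, 3, 5], "ab"))
```

For the input 1 2 3 5 with alphabet {a, b}, the special case fires on index 5 and resolves it to aba, matching the completed fifth entry in the tables above.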
(See Tables 4.7 and 4.8)
Table 5.16: Comparison of GIF with arithmetic coding
Image    GIF      Arithmetic Coding   Arithmetic Coding
                  of Pixel Values     of Pixel Differences
Sena     51,085   53,431              31,847
Sensin   60,649   58,306              37,126
Earth    34,276   38,248              32,137
Omaha    61,580   56,061              51,393
5.5.2 GIF (Image Compression)
Developed by CompuServe Information Service to encode graphical images (for details see pages 151, 152). GIF is very popular for encoding all kinds of images, both computer-generated and natural. However, it is not very efficient for losslessly compressing images of natural scenes, photographs, satellite images, etc. (see Table 5.16 above): for all four test images, arithmetic coding of pixel differences produces a smaller file than GIF.
References
1. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. IT-23, pp. 337-343, May 1977.
2. J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, vol. IT-24, pp. 530-536, Sept. 1978.
3. J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, pp. 928-951, 1982.
4. T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Comm., vol. COM-34, pp. 1176-1182, Dec. 1986.
5. T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, pp. 8-19, June 1984.
6. T. C. Bell, J. G. Cleary, and I. H. Witten, "Text Compression," Advanced Reference Series. Englewood Cliffs, NJ: Prentice Hall, 1990.
7. M. Nelson, "The Data Compression Book," New York: M&T Books, 1991.
8. G. Held and T. R. Marshall, "Data Compression," New York: Wiley, third edition, 1991.
9. P. Marchand, "Graphics and GUI's with MATLAB," Boca Raton, FL: CRC Press, 1996.
10. W. Kou, "Digital Image Compression Algorithms and Standards," Amsterdam: Kluwer Academic, 1995.
11. G. Louchard and W. Szpankowski, "Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile," DCC '95, Data Compression Conf., pp. , Snowbird, UT, March 1995.
12. R. Horspool, "The effect of non-greedy parsing Lempel-Ziv compression methods," DCC '95, Data Compression Conf., pp. , Snowbird, UT, March 1995.
13. G. Louchard and W. Szpankowski, "On the Average Redundancy Rate of the Lempel-Ziv Code," DCC '96, Data Compression Conf., Snowbird, UT, April 1996.
14. J. A. Storer, "Lossless Image Compression Using Generalized LZ1-Type Methods," DCC '96, Data Compression Conf., UT, April 1996.
15. C. T. Chen and L. G. Chen, "A novel architecture for Lempel-Ziv based data compression," IEEE ICCE, Chicago, IL, June 1996.
16. D. Sheinwald, "On the Ziv-Lempel proof and related topics," Proc. IEEE, vol. 82, pp. 866-871, June 1994.
17. A. D. Wyner and J. Ziv, "The sliding window Lempel-Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872-877, June 1994.
18. Y. F. Hu and X. S. Wu, "The methods of improving the compression ratio of LZ77 family data compression algorithms," ICSP, Beijing, China, Oct. 1996.
19. V. G. Ruiz and I. Garcia, "A lossy data compressor based on the LZW algorithm," ICSPAT 96, pp. 1002-1006, Boston, MA, Oct. 1996.
20. S. A. Savari, "Redundancy of the Lempel-Ziv-Welch Code," Data Compression Conf. (DCC 97), Snowbird, UT, March 1997.
21. S. R. Kosaraju and G. Manzini, "Compression of low entropy strings with Lempel-Ziv algorithms," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
22. J. I. Lathrop and M. Strauss, "A universal upper bound on the performance of the Lempel-Ziv algorithm on maliciously-constructed data," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
23. D. Greene et al., "A progressive Ziv-Lempel algorithm for image compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
24. M. Cohn and H. Helfgott, "Asymmetry in Ziv-Lempel compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
25. S. De Agostino, "A parallel decoder for LZ2 compression using the ID update heuristic," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
26. R. H. Wyman and P. Y. K. Cheung, "Bit plane differential LZW for the compression of video for variable bandwidth channels," IEEE ISCAS '97, Hong Kong, June 1997.
27. C. Su, C-F. Yan and J-C. Yo, "Hardware efficient updating technique for LZW codec design," IEEE ISCAS '97, Hong Kong, June 1997.
28. C. T. Chen and L. G. Chen, "High-Speed VLSI design of the LZ-based data compression," IEEE ISCAS '97, Hong Kong, June 1997.
29. G. Held, "Data and image compression: Tools and techniques," 4th edition, New York, NY: Wiley, 1996.
30. P. Tischer, "A modified LZW data compression scheme," Australian Computer Science Commun., vol. 9, pp. 262-272, 1987.
31. R. Hoffman, "Data compression in digital systems," New York, NY: Chapman & Hall, 1997.
32. D. J. Craft, "ADLC and a pre-processor extension, BDLC, provides ultra fast compression for general-purpose bit-mapped image data," Data Compression Conf., p. 400, IEEE Computer Society Press, 1995. (ADLC: adaptive lossless data compression; BDLC: bit-mapped lossless data compression, an LZ77 variant.)
33. T. Kida et al., "Multiple pattern matching in LZW compressed text," IEEE DCC Conf., UT, Mar. 1998.
34. S. Even, "Four value adding algorithms," IEEE Spectrum, vol. 35, pp. 33-38, May 1998.
35. J. C. Kieffer, T. H. Park and Y. Xu, "Progressive lossless image coding via self referential partitions," IEEE ICIP, pp. , Chicago, IL, Oct. 1998.
36. C-Ho Cheung, C. S-Wai and P. Lai-Man, "Predictive lossy LZSS algorithm for fidelity constrained image coding," Intl. Forum cum Conf. on Info. Technology and Commun. at the dawn of the new Millennium, Bangkok, Thailand, Aug. 2000.
37. Y-K. Lai and K-C. Chen, "A novel VLSI architecture for Lempel-Ziv based data compression," IEEE ISCAS, Geneva, Switzerland, May 2000.
38. L. P. Deutsch, "Deflate compressed data format specification," Request for Comments (RFC) 1951, available at ftp://ftp.uu.net/pub/archiving/zip/doc/, 1996.
39. J. Miano, "Compressed image file formats: JPEG, PNG, GIF, XBM, BMP," Addison Wesley, 1999. (Software on disk.)
40. H. H. Shih, S. S. Narayanan and C.-C. Jay Kuo, "Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm," IEEE ISIMP 2001, Hong Kong, May 2001.
41. M. J. Weinberger and Ordentlich, "On-line decision making for a class of loss functions via Lempel-Ziv parsing," DCC 2000, Snowbird, UT, March 2000, http://www.cs.brandeis.edu/~dcc
42. Y. Reznik and W. Szpankowski, "On the average redundancy rate of the Lempel-Ziv code with K-error protocol," DCC 2000, Data Compression Conference.
43. S. De Agostino, "Work-optimal parallel decoders for LZ2 data compression," DCC 2000.
44. N. J. Brittain and M. R. El-Sakka, "Grayscale true two-dimensional dictionary based image compression," JVCIR, vol. 18, pp. 35-44, Feb. 2007. (2D-LZ.)
45. J. D. Gibson et al., "Digital compression for multimedia," San Diego, CA: Academic Press, 1998 (see Appendices E and F).
46. M. Aboy, R. Hornero, D. Abasolo, and D. Alvarez, "Interpretation of Lempel-Ziv complexity measure in the context of biomedical signal analysis," IEEE Transactions on Biomedical Engineering, 53(11):2282-2288, Nov. 2006.
47. N. Radhakrishnan and B. N. Gangadhar, "Estimating regularity in epileptic seizure time-series data," IEEE Engineering in Medicine and Biology Magazine, 17:89-94, 1998.
48. X.-S. Zhang, R. J. Roy, and E. W. Jensen, "EEG complexity as a measure of depth of anesthesia for patients," IEEE Transactions on Biomedical Engineering, 48(12):1424-1433, Dec. 2001.
49. D. Abasolo, R. Hornero, C. Gomez, M. Garcia, and M. Lopez, "Analysis of EEG background activity in Alzheimer’s disease patients with Lempel-Ziv complexity and central tendency measure," Medical Engineering & Physics, 28(4):315-322, 2006.
50. H. Zhang, Y. Zhu, and Z. Wang, "Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation," Medical and Biological Engineering and Computing, 38:553-557, 2000.
51. B. Li, J. Xu and F. Wu, "1D dictionary mode for Screen Content Coding," in Visual Communication and Image Processing Conference, pp. 189-192, Dec. 2014.
52. X. Guo et al., "Wyner-Ziv-based multiview video coding," IEEE Trans. on CSVT, vol. 18, pp. 713-714, June 2008.
53. J.-S. Kim and J.-G. Kim, "Reliability-based selective encoding in pixel-domain Wyner-Ziv residual video codec," Future Information Communication Technology and Applications, Lecture Notes in Electrical Engineering (LNEE), vol. 235, pp. 359-367, Sep. 2013.
54. J.-S. Kim, J.-G. Kim, H. Choi, and K.-D. Seo, "Pixel-domain Wyner-Ziv residual video coder with adaptive binary-to-Gray code converting process," Electronics Letters, vol. 49, no. 3, Jan. 2013.
Further Reading
1. T. C. Bell, J. G. Cleary, and I. H. Witten, Text Compression. Advanced Reference Series. Prentice Hall, Englewood Cliffs, New Jersey, 1990. This provides an excellent exposition of dictionary-based coding techniques.
2. M. Nelson and J.-L. Gailly, The Data Compression Book. This also does a good job of describing the Ziv-Lempel algorithms. There is also a very nice description of some of the software implementation aspects.
3. G. Held and T. R. Marshall, Data Compression. Wiley, third edition, 1991. This contains a description of digram coding under the name “diatomic coding.” The book also includes BASIC programs that help in the design of dictionaries.
4. The PNG algorithm is described in a very accessible manner in “PNG Lossless Compression,” by G. Roelofs. In K. Sayood, editor, Lossless Compression Handbook, pages 371-390. Academic Press, 2003.
5. A more in-depth look at dictionary compression is provided in “Dictionary-Based Data Compression: An Algorithmic Perspective,” by S. C. Sahinalp and N. M. Rajpoot. In K. Sayood, editor, Lossless Compression Handbook, pages 153-168. Academic Press, 2003.