ITC HAND-OUT FOR SE-ITA & B (2014-15)
Lossless and Lossy Compression
Lossless compression
Lossless data compression makes use of data compression algorithms that allow the exact
original data to be reconstructed from the compressed data. Example: the ZIP file format. This can
be contrasted with lossy data compression, which does not allow the exact original data to be
reconstructed from the compressed data.
Lossless compression is used when it is important that the original and the decompressed data be
identical, or when no assumption can be made about whether a certain deviation is uncritical.
However, the size of the compressed sequence cannot be less than the entropy of the source.
Typical examples are executable programs and source code. Some image file formats, notably
PNG, use only lossless compression, while others like TIFF and MNG may use either lossless or
lossy methods.
Lossless compression methods may be categorized according to the type of data they are
designed to compress. Some main types of targets for compression algorithms are text, images
and sound.
Most lossless compression programs use two different kinds of algorithms: one that generates
a statistical model for the input data, and another that maps the input data to bit strings using
this model in such a way that "probable" (i.e. frequently encountered) data produces shorter
output than "improbable" data. For example, modeling algorithms for text include LZ77 and
LZW, while encoding algorithms that produce the bit sequences include Huffman coding and
arithmetic coding.
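The two-part structure described above (statistical model plus coder) can be sketched with a minimal Huffman coder. This is an illustrative toy, not production code; the function name and the sample input are made up for the example:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Model: count symbol frequencies. Coder: repeatedly merge the two
    least frequent subtrees, so frequent symbols end up near the root."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate one-symbol input
        return {sym: "0" for sym in freq}
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        tiebreak += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[s] for s in "abracadabra")
# 'a' (5 occurrences) gets a 1-bit code; 'c' and 'd' (1 each) get 3 bits,
# so the 11 symbols encode to 23 bits instead of a fixed 3 bits each (33).
```

The resulting code is prefix-free, so the decoder can walk the bit string symbol by symbol without separators.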
Lossy compression
A lossy data compression method is one where compressing data and then decompressing it
retrieves data that may well be different from the original, but is "close enough" to be useful in
some way. The measure of the difference between the original and reconstructed data is termed
distortion. Hence, the aim is to get minimum distortion while compressing to the lowest possible rate.
Lossy data compression is used frequently on the Internet and especially in streaming media and
telephony applications. These methods are typically referred to as codecs in this context. Most
lossy data compression formats suffer from generation loss: repeatedly compressing and
decompressing the file will cause it to progressively lose quality.
Types of Lossy compression
There are two basic lossy compression schemes:
- In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded.
- In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.
In some systems the two techniques are combined, with transform codecs being used to
compress the error signals generated by the predictive stage.
Lossless vs. Lossy compression
Lossless
- None of the information is lost
- It only removes redundant information
- It is reversible
- Compression ratio is low
- Used for text and computer files (very sensitive data)
Lossy
- There is a loss of information
- It removes visually irrelevant data
- It is irreversible
- Compression ratio is high
- Used for compressing sound, images or videos (audio can be compressed at 10:1 with no noticeable loss of quality; video can be compressed immensely with little visible quality loss, e.g. 300:1; lossily compressed still images are often compressed to 1/10th their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection)
SPEECH COMPRESSION
No matter what language is being spoken, speech is generated using machinery that is not very
different from person to person. This machinery has to obey physical laws that substantially limit
the behavior of its outputs. Speech can therefore be analyzed in terms of a model, and the model
parameters can be extracted and transmitted to the receiver. At the receiver the speech is
synthesized using the model.
Speech is produced by forcing air first through an elastic opening, the vocal cords, then through
the laryngeal, oral, nasal and pharyngeal passages, and finally out through the mouth and nasal
cavity. First a sound is generated, which is then modulated into speech as it traverses the vocal
tract. In order to generate a fragment of speech we have to generate a sequence of excitation
signals and the corresponding sequence of vocal tract approximations.
Many speech compression schemes exist. Some of them are:
- Waveform coding
- Channel vocoder
- Linear predictive coder (LPC)
- Code excited linear prediction (CELP)
- Mixed excitation linear prediction (MELP)
Waveform Coding
Waveform coding treats the speech signal as ordinary data and aims to be approximately
lossless: the reconstructed signal is as close as possible to the original. Codecs using these
techniques generally have low complexity and give high quality at rates >= 16 kbps.
The simplest form of waveform coding is Pulse Code Modulation (PCM), which involves
sampling and quantizing the input waveform. Narrow-band speech is typically band-limited to
4 kHz and sampled at 8 kHz.
Many codecs try to predict the value of the next sample from the previous samples, since the
nature of the speech signal means there is strong correlation between neighboring samples. An
error signal is computed from the original and predicted signals. As this error signal is in most
cases small with respect to the original, it has lower variance than the original, and hence fewer
bits are required to encode it. This is the basis of Differential Pulse Code Modulation (DPCM)
codecs: they quantize the difference between the original and predicted (from past samples)
signals. Adaptive coding is an enhancement to DPCM in which the predictor and quantizer are
made adaptive so that they change to match the characteristics of the speech being coded. The
best-known codecs using this technique are the Adaptive DPCM (ADPCM) codecs.
It is also possible to encode in the frequency domain instead of the time domain. In Sub-Band
Coding (SBC), the original speech signal is divided into a number of frequency bands, or
sub-bands, each of which is coded independently using a time-domain technique such as an
ADPCM encoder. One advantage of doing this is that the sub-bands do not all influence the
perceptual quality of the signal in the same way: more bits can be used to encode the sub-bands
with a perceptually more important effect on quality, and fewer for those where the noise is less
perceptually important. Adaptive bit allocation schemes may be used to exploit these ideas
further. SBC produces good quality at bit rates ranging from 16 to 32 kbps, but the codecs are
complex compared with DPCM codecs.
As in spatial coding of video, the Discrete Cosine Transform (DCT) is also used in speech
coding. The type of coding employing this technique is Adaptive Transform Coding (ATC):
blocks of the speech signal are divided into a large number of frequency bands, and the number
of bits used to code each transform coefficient is adapted to the spectral properties of the speech.
Good signal quality is maintained with ATC at bit rates of about 16 kbps.
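The DPCM idea above can be sketched as follows: a first-order predictor (the previous reconstructed sample) with a uniform quantizer. This is an illustrative toy, not any particular standardized codec, and the step size of 4 is arbitrary:

```python
def dpcm_encode(samples, step=4):
    # Predict each sample as the previous *reconstructed* sample and
    # quantize only the (small-variance) prediction error.
    pred = 0
    residuals = []
    for s in samples:
        q = round((s - pred) / step)      # uniform quantizer
        residuals.append(q)
        pred = pred + q * step            # track the decoder's state
    return residuals

def dpcm_decode(residuals, step=4):
    pred, out = 0, []
    for q in residuals:
        pred = pred + q * step
        out.append(pred)
    return out

samples = [0, 3, 8, 14, 18, 20, 19, 15]
rec = dpcm_decode(dpcm_encode(samples))
# Because the encoder predicts from the reconstructed signal, the
# reconstruction error stays bounded by half the quantizer step.
```

An ADPCM codec would additionally adapt `step` (and the predictor) to the local signal statistics instead of keeping them fixed.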
Channel Vocoder
Each segment of input speech is analyzed using a bank of band-pass filters called the analysis
filters. The energy at the output of each filter is estimated at fixed time intervals and transmitted
to the receiver. A decision is also made as to whether the speech in that segment is voiced or
unvoiced. Voiced sounds tend to have a pseudo-periodic structure; the period of the fundamental
harmonic is called the pitch period, and the transmitter also forms an estimate of the pitch period
which is transmitted to the receiver. Unvoiced sounds tend to have a noise-like structure. At the
receiver, the vocal tract filter is implemented by a bank of band-pass filters identical to the filters
at the transmitter. The input to the filters is a noise source (for unvoiced segments) or a periodic
pulse generator (for voiced segments).
Linear Predictive Coder (LPC-10)
Instead of the vocal tract being modeled by a bank of filters, in LPC it is modeled as a single
linear filter. The input to the vocal tract filter is either the output of a random noise generator or
a periodic pulse generator. At the transmitter, a segment of speech is analyzed to make the
voiced/unvoiced decision and to determine the pitch period and the parameters of the vocal tract
filter. In LPC-10 the input speech is sampled at 8000 samples per second and broken into
180-sample frames, each of which is encoded using 54 bits. Therefore the rate is
(8000/180) x 54 = 2400 bits per second, i.e. 2.4 kbps.
Code excited linear prediction (CELP)
In CELP, instead of having a codebook of pulse patterns, we allow a variety of excitation
signals. Given a segment, the encoder obtains the vocal tract filter and then excites it with the
entries of a codebook. The difference between the original speech segment and the synthesized
speech is fed to a perceptual weighting filter, and the codebook entry generating the minimum
average weighted error is declared the best match.
Mixed excitation linear prediction (MELP)
MELP is the new federal standard for speech coding at 2.4 kbps. MELP uses an LPC filter to
model the vocal tract and a much more complex approach to the generation of the excitation
signal. The excitation is a multiband mixed excitation, containing both a filtered signal from a
noise generator and a contribution depending on the input signal. The first step in constructing
the excitation signal is pitch extraction. The input is also subjected to a multiband voicing
analysis using five filters with passbands 0-500, 500-1000, 1000-2000, 2000-3000 and
3000-4000 Hz. The goal of the analysis is to obtain the voicing strength for each band used in
the shaping filters.
IMAGE COMPRESSION
GIF: Graphics Interchange Format
It is a bitmap image format that was introduced by CompuServe in 1987 and has since come into
widespread usage on the World Wide Web due to its wide support and portability. The format
supports up to 8 bits per pixel for each image, allowing a single image to reference its own
palette of up to 256 different colors chosen from the 24-bit RGB color space. It also
supports animations and allows a separate palette of up to 256 colors for each frame. These
palette limitations make the GIF format less suitable for reproducing color photographs and other
images with continuous color, but it is well-suited for simpler images such as graphics or logos
with solid areas of color.
GIF images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression
technique to reduce the file size without degrading the visual quality.
GIFs are suitable for sharp-edged line art (such as logos) with a limited number of colors. This
takes advantage of the format's lossless compression, which favors flat areas of uniform color
with well-defined edges. It can also be used to store low-color data for games and for small
animations and low-resolution film clips. Since a single GIF image palette is limited to 256
colors, it is not usually used as a format for digital photography. Digital photographers use image
file formats capable of reproducing a greater range of colors, such as TIFF, RAW or JPEG.
JPEG
JPEG stands for Joint Photographic Experts Group. It was the first international standard in
image compression and is widely used today. JPEG can be lossy as well as lossless, but it is
more commonly lossy.
The JPEG compression scheme is divided into the following stages:
1. Preprocessing the image
2. Transformation using Discrete Cosine Transform (DCT) to blocks of pixels, thus removing
redundant image data.
3. Quantization of each block of DCT coefficients using weighting functions optimized for the
human eye.
4. Encoding the resulting coefficients (image data) using a Huffman variable word-length
algorithm to remove redundancies in the coefficients.
Step 1 - Preprocessing
The first step is to convert the red, green, blue color channels to YCbCr space. Next, the image is
partitioned into blocks of size 8 x 8 pixels.
See the example below.
Notice the highlighted pixels in block row 4, block column 28. We will use the elements of this
matrix to illustrate the mathematics of transformation and quantization steps. An enlargement of
this block appears below as well as the 64 pixels intensities that make up the block.
  5 176 193 168 168 170 167 165
  6 176 158 172 162 177 168 151
  5 167 172 232 158  61 145 214
 33 179 169 174   5   5 135 178
  8 104 180 178 172 197 188 169
 63   5 102 101 160 142 133 139
 51  47  63   5 180 191 165   5
 49  53  43   5 184 170 168  74
Pixel intensities of the block in block row 4, block column 28.
Enlargement of the example block.
The last part of preprocessing the image is to subtract 127 from each pixel intensity in each
block. This step centers the intensities about the value 0 and is done to simplify the mathematics
of the transformation and quantization steps. For our running example block, here are the new
values.
-122   49   66   41   41   43   40   38
-121   49   31   45   35   50   41   24
-122   40   45  105   31  -66   18   87
 -94   52   42   47 -122 -122    8   51
-119  -23   53   51   45   70   61   42
 -64 -122  -25  -26   33   15    6   12
 -76  -80  -64 -122   53   64   38 -122
 -78  -74  -84 -122   57   43   41  -53
Pixel intensity values less 127 in block row 4, block column 28.
Step 2 - Transformation
The preprocessing has done nothing that will make the coding portion of the algorithm more
effective. The transformation step is the key to increasing the coder's effectiveness. The JPEG
Image Compression Standard relies on the Discrete Cosine Transformation (DCT) to transform
the image. The DCT is the product C = U B U^T, where B is an 8 x 8 block from the
preprocessed image and U is a special 8 x 8 matrix.
The DCT tends to push most of the high intensity information (larger values) in the 8 x 8 block
to the upper left-hand corner of C, with the remaining values in C taking on relatively small
values. The DCT is applied to each 8 x 8 block. The image at left shows the DCT of our
highlighted block, while the image at right shows the DCT applied to each block of the
preprocessed image.
DCT of the example block.
DCT applied to each block.
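The transform C = U B U^T can be checked numerically. Here is a NumPy sketch, taking U to be the standard orthogonal 8-point DCT-II matrix, a common concrete choice for the "special 8 x 8 matrix" above; B is the centered example block printed earlier:

```python
import numpy as np

# Orthogonal 8-point DCT matrix: row 0 is the constant 1/(2*sqrt(2)),
# row i > 0 has entries 0.5*cos((2j + 1) * i * pi / 16).
i = np.arange(8)[:, None]
j = np.arange(8)[None, :]
U = 0.5 * np.cos((2 * j + 1) * i * np.pi / 16)
U[0, :] = 1 / (2 * np.sqrt(2))

def dct2(B):
    return U @ B @ U.T        # C = U B U^T

def idct2(C):
    return U.T @ C @ U        # U is orthogonal, so B = U^T C U

# The centered example block (pixel intensities less 127).
B = np.array([
    [-122,   49,   66,   41,   41,   43,   40,   38],
    [-121,   49,   31,   45,   35,   50,   41,   24],
    [-122,   40,   45,  105,   31,  -66,   18,   87],
    [ -94,   52,   42,   47, -122, -122,    8,   51],
    [-119,  -23,   53,   51,   45,   70,   61,   42],
    [ -64, -122,  -25,  -26,   33,   15,    6,   12],
    [ -76,  -80,  -64, -122,   53,   64,   38, -122],
    [ -78,  -74,  -84, -122,   57,   43,   41,  -53],
], dtype=float)
C = dct2(B)
# C[0, 0] is the DC coefficient: 8 times the block mean, here -27.5.
```

Since U is orthogonal, `idct2(dct2(B))` recovers B exactly (up to floating-point error); the lossiness of JPEG comes only from the quantization step that follows.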
The values (rounded to three digits) of the first row of the DCT of our example block:
-27.500 -213.468 -149.608 -95.281 -103.750 -46.946 -58.717 -21.544
DCT values in block row 4, block column 28.
The DCT does create a large number of near-zero values, but we need a quantization step to
better prepare the data for coding.
Step 3 - Quantization
The next step in the JPEG algorithm is the quantization step. Here we make decisions about the
values in the transformed image: elements near zero will be converted to zero, and other
elements will be shrunk so that their values are closer to zero. All quantized values are then
rounded to integers. Quantization is what makes the JPEG algorithm an example of lossy
compression. The DCT step is completely invertible: we applied the DCT to each block B by
computing C = U B U^T, and it turns out we can recover B by the computation B = U^T C U.
When we "shrink" values, it is still possible to recover them. However, converting small values
to 0 and rounding all quantized values are not reversible steps; we forever lose the ability to
recover the original image. We perform quantization in order to obtain integer values and to
convert a large number of the values to 0. The Huffman coding algorithm is much more
effective on quantized data, and the hope is that when we view the compressed image we
haven't given up too much resolution. For applications such as web browsing, the resolution
lost in order to gain storage space and transfer speed is acceptable.
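The handout does not print the quantization table it uses, but the standard JPEG luminance table shown below reproduces the quantized values of the example's first row, so it serves as a plausible stand-in for the "weighting functions optimized for the human eye":

```python
import numpy as np

# Standard JPEG luminance quantization table (quality 50). The weights
# grow toward the high-frequency corner, which the eye resolves poorly.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(C):
    # Shrink each coefficient by its weight, then round: the lossy step.
    return np.round(C / Q).astype(int)

def dequantize(Cq):
    # "Magnify" back; the rounding error is gone for good.
    return Cq * Q

row0 = np.array([-27.500, -213.468, -149.608, -95.281,
                 -103.750, -46.946, -58.717, -21.544])
# The example block's first DCT row quantizes to -2 -19 -15 -6 -4 -1 -1 0.
```

Dividing by larger weights at high frequencies is what converts so many coefficients to exactly zero, which is precisely what the coder exploits.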
The result of quantization applied to our example block (the accompanying image has been
enhanced to facilitate viewing of the low intensities):
 -2 -19 -15  -6  -4  -1  -1   0
 14   4  -2 -13   0   0  -1  -2
 -2  -2  -2   7  -1  -1   0   1
  2  -3  -2   2   0   0   1   0
  1   0   1  -1  -1   0   0   0
 -3   2   1  -1   0   0   0   0
  0   0   0  -1   1   0   0   0
  1   0  -1   0   0   0   0   0
Quantized DCT values in block row 4, block column 28.
Quantized DCT of the example block.
Step 4 - Encoding
The last step in the JPEG process is to encode the transformed and quantized image. The regular
JPEG standard uses an advanced version of Huffman coding. The original image has dimensions
160 x 240, so 160 x 240 x 8 = 307,200 bits are needed to store it to disk. If we apply Huffman
coding to the transformed and quantized version of the image, we need only 85,143 bits to store
the image to disk. The compression rate is about 2.217 bpp. This represents a savings of over
70% of the original amount of bits needed to store the image!
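The storage arithmetic above is easy to verify:

```python
# Bits-per-pixel and savings for the 160 x 240 grayscale example.
pixels = 160 * 240                    # 38,400 pixels
bits_original = pixels * 8            # 307,200 bits at 8 bpp
bits_compressed = 85_143              # after transform, quantization, Huffman
bpp = bits_compressed / pixels       # about 2.217 bits per pixel
savings = 1 - bits_compressed / bits_original   # about 0.72, i.e. over 70%
```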
Inverting the Process
The inversion process is quite straightforward. The first step is to decode the Huffman codes to
obtain the quantized DCT of the image. To undo the shrinking process, the elements in each
8 x 8 block are magnified by the appropriate amount. At this point we have blocks C', where the
original DCT before quantization was C = U B U^T. We next invert the DCT by computing
B' = U^T C' U for each block. The last step is to add 127 to each element in each block. The
resulting matrix is an approximation of the original image.
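Written as code, the inversion of one block looks like this. It is a sketch: U and Q are the DCT matrix and quantization table from the forward pass, passed in as arguments, and the clip to the valid 0-255 range is an addition for safety, not part of the handout's description:

```python
import numpy as np

def invert_block(Cq, U, Q):
    # Magnify the quantized coefficients, invert the DCT, and un-center.
    C_prime = Cq * Q                    # approximate DCT coefficients C'
    B_prime = U.T @ C_prime @ U         # B' = U^T C' U
    return np.clip(np.round(B_prime) + 127, 0, 255)
```

With Q all ones (no quantization) a block survives the round trip exactly; a real table reproduces only an approximation, as the compressed block below shows.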
The images below show the original image at left and the JPEG compressed image at right.
Recall that the image at right requires over 70% less storage space than the original image!
A digital grayscale image.
The compressed grayscale image.
Finally, we compare our example of block row 4, block column 28 with the compressed version
of the block.
  5 176 193 168 168 170 167 165
  6 176 158 172 162 177 168 151
  5 167 172 232 158  61 145 214
 33 179 169 174   5   5 135 178
  8 104 180 178 172 197 188 169
 63   5 102 101 160 142 133 139
 51  47  63   5 180 191 165   5
 49  53  43   5 184 170 168  74
Pixel intensities of the block.
Enlargement of the block.
Enlargement of the compressed block.
  0 172 198 190 149 159 149 179
 17 166 131 175 168 195 192 140
 12 167 177 255 140  39 123 203
 22 178 170 145   1  28 146 187
 11 103 183 190 160 207 189 155
 60   9  80 103 157 144 115 156
 60  48  67  11 176 196 139  19
 49  58  47   0 190 168 176  60
Pixel intensities of the compressed block.
Issues and Problems
The JPEG Image Compression Standard is a very effective method for compressing digital
images, but it does suffer from some problems. One problem is the decoupling that occurs before
we apply the DCT - partitioning the image into 8x8 blocks results in the compressed image
sometimes appearing "blocky". In the images below, we have zoomed in on the upper right hand
corners of the original image and the compressed image. You can see the block artifacts in the
compressed image.
Upper right hand corner of original image.
Upper right hand corner of compressed image.
The quantization step makes the JPEG Image Compression Standard an example
of lossy compression. For some applications, such as web browsing, the loss of resolution is
acceptable. For other applications, such as high resolution photographs for magazine
advertisements, the loss of resolution is unacceptable.