ES154 Communications and Media 2001-2002
Lecture 19
UNIVERSITY OF WARWICK
School of Engineering
Compression
D.D.Udrea
Aims and Objectives
- To explain the concept of data compression and data streaming
- To give some examples of compression algorithms (by no means exhaustive)
Types of compression
As we have discussed so far, the memory and bandwidth requirements of most multimedia sources
exceed the bandwidth and storage space available on current networks and devices. Therefore the
information needs to be compressed in order to reduce its bandwidth.
Compression can be classified according to several criteria:
One criterion is the amount of information being recovered after de-compression takes place:
- lossless compression: in which the entire information is recovered
- lossy compression: in which some details from the original data are not recovered.
A second criterion is the dependency on the type of information:
- independent: this type of compression can be applied to any string of bits, irrespective of whether
they belong to a text file or an image (e.g. entropy encoding)
- dependent: this type of compression takes into account specific characteristics of the media
source and the human perception of it to eliminate minor detail (e.g. source encoding)
A third criterion is a general one, considering solely the type of information at the source.
Examples of sources are text, images, audio and video.
Text compression
All three types of text (unformatted, formatted and hypertext) are represented as strings of
characters from a defined set. They may comprise letters, numbers and signs (called alphanumeric
characters) and formatting characters (called control characters). The control characters are used to
interpret the alphanumeric characters in a certain way.
We can deduce from this that the compression algorithms used for text should be lossless, since any loss
of information may lead to misinterpretation of the original text or its formatting.
One type of lossless encoding, which is also independent of the type of information being
compressed, is entropy encoding. Entropy encoding exploits the frequency (the number of times) with
which a certain symbol appears in the entire file of data.
A particular type of entropy encoding is Huffman encoding (with different variants). Basically,
Huffman encoding assigns to every symbol a code whose length is inversely related to the number of
times the symbol appears (i.e. frequently occurring symbols are coded with shorter codes, rarely
occurring symbols with longer codes). One important restriction on the way the code is defined is
that, when decoding, the codewords must not be mistaken for one another. Therefore the rule is that
no codeword may be the beginning (prefix) of a longer codeword.
For example, if the word BANANA is to be coded without compression, just as plain text:

text             B        A        N        A        N        A
ASCII (decimal)  66       65       78       65       78       65
ASCII (hex)      42       41       4E       41       4E       41
Binary           1000010  1000001  1001110  1000001  1001110  1000001
The number of bits for the word is 6 characters x 7 bits/character = 42 bits.
If the text is to be encoded, we consider the length of the codeword inversely proportional to the
number of occurrences (or frequency of occurrence) of a particular character.
For example, A appears 3 times and thus gets the shortest codeword; N appears 2 times, a longer
codeword; and B appears once, the longest codeword.
text               B    A  N   A  N   A
Binary compressed  001  1  01  1  01  1
The number of bits in the compressed version is 10.
We have also ensured that the decoding is unambiguous, since no codeword is a prefix of another
(all codewords end with a '1').
The compression achieved is 10/42 = 0.24,
or can be expressed as 42/10 = 4.2 : 1,
or, expressed as a percentage saving, (42 - 10)/42 = 76%.
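To make this concrete, here is a minimal Python sketch of the classic Huffman construction, which repeatedly merges the two least frequent subtrees using heapq. One caveat: the codeword table above (1, 01, 001, giving 10 bits) is a valid prefix code used for illustration, but the construction below assigns BANANA codewords of lengths 1, 2 and 2, giving 9 bits; the function name is ours, not from any library.

    import heapq
    from collections import Counter

    def huffman_code(text):
        """Build a prefix-free code: frequent symbols get shorter codewords."""
        freq = Counter(text)
        # Heap entries: (subtree frequency, tie-breaker, {symbol: codeword so far})
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)   # the two least frequent subtrees
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c1.items()}   # prepend a bit on merge
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1
        return heap[0][2]

    code = huffman_code("BANANA")
    encoded = "".join(code[ch] for ch in "BANANA")
    print(code)                          # e.g. {'A': '0', 'B': '10', 'N': '11'}
    print(len(encoded), "bits, versus 6 x 7 = 42 bits uncompressed")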
Dictionary encoding uses whole strings of characters, rather than individual characters, as the
basis for encoding. A particular example is LZW (Lempel-Ziv-Welch), in which the place of a word in
the dictionary, rather than the word itself, is transmitted. Thus, short words achieve low
compression, but long words achieve high compression.
The disadvantage is that both the compressor and the de-compressor must have (or must build
dynamically) the dictionary.
For example, if the text GO TO SCHOOL is to be coded, we have to build a dictionary:

1st place  GO          or, in alphabetical order:  1st place  GO
2nd place  TO                                      2nd place  SCHOOL
3rd place  SCHOOL                                  3rd place  TO
If we are to transmit the text in ASCII representation (7 bits/character, including spaces) we have
12 characters x 7 bits/character = 84 bits
If we are to transmit the text compressed, we should only send the place in the dictionary for each word.
Text                          GO TO SCHOOL
Compressed text               1 2 3      or, in alphabetical order: 1 3 2
Compressed text, binary form  01 10 11   or: 01 11 10

In total 3 words x 2 bits/word = 6 bits.
For a text of at most N different words,
N = 2^n
where N is the maximum number of words in the dictionary and n is the number of bits needed to code
each place.
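A minimal Python sketch of the dictionary idea, using a static, alphabetically ordered dictionary as in the example (full LZW builds its dictionary adaptively; the function name is illustrative):

    from math import ceil, log2

    def dict_encode(text):
        """Encode each word as its (1-based) place in an alphabetical dictionary."""
        words = text.split()
        dictionary = sorted(set(words))          # here: ['GO', 'SCHOOL', 'TO']
        n = ceil(log2(len(dictionary) + 1))      # bits per place (place 0 unused)
        places = [dictionary.index(w) + 1 for w in words]
        return dictionary, places, n

    dictionary, places, n = dict_encode("GO TO SCHOOL")
    print(places)                                # [1, 3, 2]
    print(len(places) * n, "bits, versus", len("GO TO SCHOOL") * 7, "bits in ASCII")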
Image compression
Images can be compressed in either a lossless or a lossy format; lossy compression is acceptable
because the eye is not very sensitive to certain colours and certain image features. However, if the
image represents a source of data (e.g. intensity levels from an apparatus or an identification
code), the format should be lossless.
There is a multitude of image formats, some of which include compression algorithms. Examples of
image formats mostly used in Web applications:
JPEG : high levels of compression on images with 16 or 24 bits of colour (lossy)
GIF : medium levels of compression on images with up to 256 colours (lossless)
PNG : supports larger numbers of colours and more flexible transparency (lossless)
TIFF : up to 24-bit images (supports lossless compression)
BMP : up to 24-bit images (supports lossless compression)
One type of algorithm used for lossless compression is run-length encoding. It is used, for example,
in transmitting digitised documents from scanners and fax machines.
To obtain compression, it is assumed that the original file contains long, continuous fields of
identical data. Therefore, instead of transmitting the value of the data each time it is
encountered, the value is transmitted only once, followed by a number representing the number of
times it is to be repeated.
In the example below, a binary image of 12 pixels, with alternating black and white fields, is to be
transmitted. The normal coding would be
1 1 1 1 1 0 0 0 0 1 1 1
The total number of bits is 12 pixels x 1 bit/pixel = 12 bits.
Instead of individual bits, the number of identical, consecutive bits is transmitted as
(bit value, number of times):
(1,5) (0,4) (1,3)
Moreover, if we make the rule that the start of an image is always a black pixel, one can drop the
actual pixel values from the transmission:
(5) (4) (3), or, in binary, 101 100 011, a total of 9 bits.
The total number of bits now depends on the number of transitions in the image rather than on its
size.
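A minimal Python sketch of run-length encoding applied to the line of pixels above (the function name is illustrative):

    def run_length_encode(pixels):
        """Collapse runs of identical values into (value, run length) pairs."""
        runs = []
        for p in pixels:
            if runs and runs[-1][0] == p:
                runs[-1][1] += 1              # extend the current run
            else:
                runs.append([p, 1])           # start a new run
        return [tuple(r) for r in runs]

    line = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    print(run_length_encode(line))            # [(1, 5), (0, 4), (1, 3)]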
Another class of compression algorithms is the transform algorithms. They are based on the fact
that a signal can be represented in time or space, as well as in frequency.
For example, each line in the image above can be represented as a spatial signal of pixel
amplitudes. Continuous or slowly varying signals result in low frequencies being present in the
signal, whereas frequent or sharp variations result in high frequencies being present.
By performing a transform on the original time or space signal (e.g. the Fourier Transform) one can
obtain the frequency representation of the signal. By performing an inverse transform (e.g. the
Inverse Fourier Transform) on the frequency representation, one can retrieve the original time or
space signal.
One type of transform that can be used is the cosine transform. When applied to the samples of a
signal it is called the Discrete Cosine Transform (DCT). The DCT forms the basis of numerous
compression algorithms, such as those used in the JPEG and MPEG formats.
If the original signal contains the samples
s = [ s(0) s(1) s(2) ... s(M-1) ]
the coefficients of the DCT can be calculated as
C(0) = \sqrt{\frac{1}{M}} \sum_{k=0}^{M-1} s(k)

C(u) = \sqrt{\frac{2}{M}} \sum_{k=0}^{M-1} s(k) \cos\!\left( \frac{(2k+1)\,u\,\pi}{2M} \right)  for u = 1, 2, ..., M-1
Then, instead of sending the samples of the original signal, one sends the coefficients of the cosine
transform.
The compression occurs in two ways. A lossless compression occurs if some frequencies are missing
from the signal and thus their coefficients are 0 and are not transmitted. A lossy compression occurs if
some frequency components are very small compared to others and are not transmitted. Further
compression occurs if the value of the coefficients is set to a lower precision (e.g. 2.55 rounded to 2.5) or
digitised to a lower number of bits.
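To make the formulas concrete, here is a minimal Python sketch of the 1-D DCT as written above (orthonormal scaling assumed), followed by a crude lossy step that zeroes coefficients below a threshold; the sample values and the threshold are arbitrary illustrations:

    from math import cos, pi, sqrt

    def dct(s):
        """1-D DCT as in the formula above (orthonormal scaling)."""
        M = len(s)
        C = [sqrt(1 / M) * sum(s)]                    # C(0)
        for u in range(1, M):                         # C(u), u = 1 .. M-1
            C.append(sqrt(2 / M) * sum(s[k] * cos((2 * k + 1) * u * pi / (2 * M))
                                       for k in range(M)))
        return C

    def drop_small(C, threshold=0.1):
        """Lossy step: coefficients small relative to the rest are not transmitted."""
        return [c if abs(c) > threshold else 0.0 for c in C]

    samples = [4, 3, 3, 2, 1, 0, 2, 7]                # arbitrary 8-sample signal
    print([round(c, 2) for c in drop_small(dct(samples))])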
Audio compression
Unlike text and images, audio and most video signals are originally analog. After digitisation, both
comprise a continuous stream of digital values, each representing the amplitude of a sample of the
signal taken at regular time intervals. The digitisation process uses PCM (pulse code modulation);
see Lecture 16.
PCM involves sampling the analog signal at (at least) twice its maximum frequency. Each sample is
then quantised and the binary code of the quantisation level is transmitted. For example, the
digital signal corresponding to an analog signal with sample amplitudes 4 3 3 2 ... is
100 011 011 010 ...
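A minimal Python sketch of the coding step, assuming each sample has already been quantised to an integer level (3 bits, i.e. 8 levels, as in the example):

    def pcm_encode(levels, bits=3):
        """Code already-quantised amplitude levels as fixed-width binary words."""
        return [format(v, f"0{bits}b") for v in levels]

    print(pcm_encode([4, 3, 3, 2]))   # ['100', '011', '011', '010'], as in the text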
Differential Pulse Code Modulation (D-PCM) is a derivative of standard PCM which exploits the fact
that, for most audio signals, the range of the differences in amplitude between successive samples
is smaller than the range of the samples themselves. In D-PCM, instead of coding the amplitude of
the signal, one codes the difference between successive amplitudes. Hence, either fewer bits are
needed for encoding, or the quantisation error decreases considerably if the same number of bits is
used.
In the example above, the amplitudes are
4 3 3 2 1 0 2 7 4 2 1
The signal has a dynamic range of max(7) - min(0) = 7, i.e. 8 = 2^3 levels, which can be PCM coded
with 3 bits.
The differences in amplitudes are
-1 0 -1 -1 -1 +2 +5 -3 -2 -1
The differences have a dynamic range of max(5) - min(0) = 5, i.e. 6 levels.
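A minimal Python sketch of this idea: code the first sample in full, then only the successive differences; the decoder recovers the samples by a running sum.

    def dpcm_encode(samples):
        """Code the first sample, then only differences between successive samples."""
        return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

    def dpcm_decode(diffs):
        """Recover the samples by a running sum of the differences."""
        out, total = [], 0
        for d in diffs:
            total += d
            out.append(total)
        return out

    amplitudes = [4, 3, 3, 2, 1, 0, 2, 7, 4, 2, 1]
    diffs = dpcm_encode(amplitudes)
    print(diffs)                  # [4, -1, 0, -1, -1, -1, 2, 5, -3, -2, -1]
    assert dpcm_decode(diffs) == amplitudes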
There are various standards which specify the bit rate of the compressed audio signal, the type of
compression code and the quality of the decompressed audio signals. They use a combination of PCM,
DCT or DFT (Discrete Fourier Transform) and quantisation.
The most common are the MPEG (Motion Picture Expert Group) standards, which also specify
parameters for video signals. Each new standard (or layer) brings increased compression and
increased signal quality, at the expense of increased complexity and cost and a higher delay in
processing the signal.
The table below shows a comparison between 3 MPEG standards:
MPEG layer  Application                    Compressed bit rate  Quality     Delay
Layer 1     Digital audio cassette         32-448 kb/s          HiFi        20 ms
Layer 2     Audio/video broadcasting       32-192 kb/s          near CD     40 ms
Layer 3     CD quality over slow channels  64 kb/s              CD quality  60 ms
Video compression
Video (with sound) features in a number of multimedia applications:
- interpersonal (videoconferencing and video telephony)
- interactive (access to stored videos)
- entertainment (digital TV and movies on demand)
The quality of the video used in these applications is determined by the digitisation format and the
frame rate used. In practice, the bandwidth resulting from the digitisation formats is substantially
larger than that of the transmission channels used. Hence, compression is necessary.
Moreover, as no single compression algorithm gives a high enough compression and a small enough
delay to satisfy the requirements for video, combinations of algorithms have to be used.
The MPEG standards (see the table above) define such combinations of algorithms.
The video compression techniques used by these algorithms rely heavily on the fact that successive
video frames are strongly correlated (related to each other); therefore the contents of a frame may
be used to predict some or all of the contents of subsequent frames, as in the sketch below.
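As a deliberately simplified stand-in for the motion-compensated prediction that the MPEG standards actually use, the Python sketch below transmits only the pixels that changed between two frames; real codecs predict whole blocks and motion vectors, not individual pixels.

    def frame_delta(prev, curr):
        """Transmit only (index, new value) pairs for pixels that changed."""
        return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

    def apply_delta(prev, delta):
        """Rebuild the current frame from the previous frame plus the delta."""
        frame = list(prev)
        for i, v in delta:
            frame[i] = v
        return frame

    frame1 = [0, 0, 5, 5, 5, 0, 0, 0]
    frame2 = [0, 0, 0, 5, 5, 5, 0, 0]     # the "object" moved one pixel right
    delta = frame_delta(frame1, frame2)
    print(delta)                          # [(2, 0), (5, 5)]: 2 of 8 pixels sent
    assert apply_delta(frame1, delta) == frame2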
An example of such algorithms is found in the MPEG-4 standard, which is mostly used in multimedia
applications over the Internet. The difference from other standards is that MPEG-4 has a number of
content-based functionalities (i.e. the algorithms take into account the exact content of the frames
in order to optimise the compression).
In this case, each scene is defined in terms of audio and visual objects and each object is treated
separately.
For example, a scene may contain several visual object planes (VOPs): the background, a car and a
person. Each VOP is then analysed in terms of shape, texture, motion, etc. and is coded separately
to make full use of the most appropriate algorithms.
Streaming
Streaming refers to a mode of operation in which continuously generated media (such as audio and video)
is passed directly to the destination as it is generated and, at the destination, is played out directly as it is
received, then discarded.
The main restriction in this case is that the source must be synchronised with the transmission
channel and the receiver. To overcome small variations in the transmission rate, especially on a
network which does not guarantee a bound on the delay, a playout memory buffer is employed at the
receiver end; it can store several seconds of audio/video and thus re-synchronise the transmission,
as in the sketch below.
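The Python sketch below assumes chunks of media arrive irregularly from the network and playback starts only after a few chunks have accumulated; the class and parameter names are illustrative.

    from collections import deque

    class PlayoutBuffer:
        """Absorb small variations in network delay by pre-buffering media chunks."""

        def __init__(self, prefill=3):
            self.queue = deque()
            self.prefill = prefill            # chunks to accumulate before playing
            self.playing = False

        def receive(self, chunk):
            """Called as chunks arrive (possibly irregularly) from the network."""
            self.queue.append(chunk)
            if len(self.queue) >= self.prefill:
                self.playing = True           # enough margin to ride out jitter

        def next_chunk(self):
            """Called at the regular playout rate; returns None on an under-run."""
            if self.playing and self.queue:
                return self.queue.popleft()   # play out, then discard
            return None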
Below is an example of a transmission of streamed media over the web.
When the client clicks on the link, the same procedure is followed as for any page. A TCP/IP connection
is established between the web server and the client. The server contacts the file server which holds the
media file itself and a meta file containing the description of the media file. The meta file is sent first,
enabling the receiving web browser to set up the appropriate media player. When the media file is sent, it
is sent directly to the media player, without the involvement of the browser itself.
References:
Halsall, Fred - Multimedia Communications: Applications, Networks, Protocols and Standards -
Addison-Wesley, 2001, ISBN 0-201-39818-4 (library shelfmark QA 78.51.H2)