Multimedia Compression

B90901134 陳威尹
Generic Compression Overview
Generic compression is also called entropy encoding, and it is lossless
compression. This kind of compression needs statistical knowledge of the data,
whether gathered adaptively during processing or collected in advance.
Basics of Information Theory
According to Shannon, the entropy of an information source S is defined as:

H(S) = Σi pi * log2(1/pi)

where pi is the probability that symbol Si in S will occur. The term log2(1/pi)
indicates the amount of information contained in Si, i.e., the number of
bits needed to code Si.
For example, in an image with a uniform distribution of gray-level intensities,
i.e. pi = 1/256 for each of the 256 levels, the number of bits needed to code
each gray level is log2(256) = 8 bits. The entropy of this image is 8.
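As a quick illustration (added here, not part of the original notes), a minimal
Python sketch of this formula; the uniform 256-level case reproduces the value
of 8 bits mentioned above, and the second call uses the symbol counts from the
Huffman example below.

import math

def entropy(counts):
    """Shannon entropy H(S) = sum_i p_i * log2(1/p_i), in bits per symbol."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c)

# Uniform distribution over 256 gray levels -> 8 bits per symbol.
print(entropy({level: 1 for level in range(256)}))          # 8.0

# Entropy of the Huffman example below (counts A..E = 15, 7, 6, 6, 5).
print(entropy({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))   # about 2.19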
Huffman Coding
Huffman coding is based on the frequency of occurrence of a data item (pixels
in images). The principle is to use fewer bits to encode the data that occurs
more frequently. Codes are stored in a Code Book, which may be constructed for
each image or for a set of images. In all cases the code book plus the encoded
data must be transmitted to enable decoding.
The Huffman algorithm is now briefly summarized:
1. Initialization: Put all nodes in an OPEN list, keep it sorted at all times (e.g.,
ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest frequencies/probabilities,
create a parent node of them.
(b) Assign the sum of the children's frequencies/probabilities to the parent
node and insert it into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and delete the children
from OPEN.
Symbol   Count   log2(1/p)   Code   Subtotal (# of bits)
A        15      1.38        0      15
B         7      2.48        100    21
C         6      2.70        101    18
D         6      2.70        110    18
E         5      2.96        111    15
TOTAL (# of bits): 87



The following points are worth noting about the above algorithm:
 Decoding is trivial as long as the coding table (the statistics) is sent
before the data. (There is a small overhead for sending this, negligible if
the data file is big.)
 Unique Prefix Property: no code is a prefix to any other code (all symbols
are at the leaf nodes) -> great for the decoder, unambiguous.
 If prior statistics are available and accurate, then Huffman coding is very
good.
In the above example, the total number of symbols is 39 (15 + 7 + 6 + 6 + 5),
so the average number of bits per symbol with Huffman coding is 87 / 39 ≈ 2.23.
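As an illustration (added here, not part of the original notes), a minimal
Python sketch of the algorithm summarized above. Tie-breaking between equal
frequencies may assign different code words than the table, but the code
lengths, and hence the 87-bit total, are the same.

import heapq

def huffman_code(freqs):
    """Build a Huffman code book from a {symbol: count} mapping by
    repeatedly merging the two lowest-frequency nodes."""
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a pair of subtrees (internal node).
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)          # lowest frequency
        f2, _, t2 = heapq.heappop(heap)          # second lowest
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):                      # label branches 0 and 1
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"          # single-symbol alphabet
    walk(heap[0][2], "")
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_code(freqs)
total = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print("total bits:", total)                      # 87, as in the table above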
Arithmetic Coding
Huffman coding and the like use an integer number (k) of bits for each symbol;
hence k is never less than 1. Sometimes, e.g., when sending a 1-bit image,
compression becomes impossible.
 Idea: Suppose the alphabet is {X, Y} with

prob(X) = 2/3
prob(Y) = 1/3

If we are only concerned with encoding length-2 messages, then we can map
all possible messages to intervals in the range [0..1]:

XX -> [0, 4/9)
XY -> [4/9, 6/9)
YX -> [6/9, 8/9)
YY -> [8/9, 1)
To encode a message, just send enough bits of a binary fraction that uniquely
specifies its interval.
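A minimal Python sketch of this idea (added here as an illustration; the
bit-emission rule below simply truncates the binary expansion of the interval's
midpoint until the remaining uncertainty fits inside the interval, which is
easy to follow but not necessarily the shortest possible code):

from fractions import Fraction

PROBS = {"X": Fraction(2, 3), "Y": Fraction(1, 3)}

def interval(message):
    """Map a message to its sub-interval [low, high) of [0, 1)."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        if sym == "Y":                     # X occupies the first 2/3 of the
            low += width * PROBS["X"]      # current interval, Y the last 1/3
        width *= PROBS[sym]
    return low, low + width

def encode(low, high):
    """Emit enough bits of a binary fraction to pin down [low, high):
    truncate the binary expansion of the midpoint until the remaining
    uncertainty (2^-k) is at most half the interval width."""
    bits, frac, scale = "", (low + high) / 2, Fraction(1)
    while scale > (high - low) / 2:
        scale /= 2
        frac *= 2
        bit = frac >= 1
        frac -= int(bit)
        bits += "1" if bit else "0"
    return bits

for msg in ("XX", "XY", "YX", "YY"):
    lo, hi = interval(msg)
    print(msg, f"[{lo}, {hi})", encode(lo, hi))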
Conclusion
Generic compression algorithms are used in general-purpose file compression
formats like Zip, Rar, gzip, bzip, etc., and they are usually the final stage
of content-specific compression; for example, JPEG uses Huffman or arithmetic
coding, Monkey’s Audio (ape) uses Rice coding, and Lossless Audio (La) uses
arithmetic coding.
Content-specific Compression
Generally, correlation means redundancy. Because a generic algorithm may not
find content-specific correlation, and a higher-order generic algorithm may
not be efficient enough, content-specific de-correlation is needed.
Whether lossy or lossless, multimedia file formats use a content-specific
pre-filter as the first step to reduce data redundancy.
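As a simple illustration of such a pre-filter (added here; the function names
are my own and this is only a sketch of the idea, not any particular codec's
predictor): a delta filter replaces each audio sample by its difference from
the previous one, so that correlated, slowly varying signals yield small
residuals that a generic entropy coder can then compress well. Lossless audio
codecs use more elaborate predictors, but the principle is the same.

def delta_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(residuals):
    """Exact inverse of delta_encode: running sum of the residuals."""
    prev, out = 0, []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

samples = [100, 102, 105, 107, 106, 104]      # toy, slowly varying signal
residuals = delta_encode(samples)             # [100, 2, 3, 2, -1, -2]
assert delta_decode(residuals) == samples     # lossless round trip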
Audio compression
Traditional lossless compression methods (Huffman, LZW, etc.) usually do not
work well on audio data (for the same reason as in image compression).
Psychoacoustics
Human hearing and voice
 Frequency range is about 20 Hz to 20 kHz, most sensitive at 2 to 4 kHz.
 Dynamic range (quietest to loudest) is about 96 dB.
 Normal voice range is about 500 Hz to 2 kHz:
   o Low frequencies are vowels and bass
   o High frequencies are consonants
Critical Bands
 The human auditory system has a limited, frequency-dependent resolution. A
perceptually uniform measure of frequency can be expressed in terms of the
width of the Critical Bands.
 The critical bandwidth is less than 100 Hz at the lowest audible frequencies,
and more than 4 kHz at the high end. Altogether, the audio frequency range can
be partitioned into 25 critical bands.
A new unit for frequency, the Bark (after Barkhausen), is introduced:

1 Bark = width of one critical band

For frequency < 500 Hz, it converts to freq / 100 Bark;
for frequency >= 500 Hz, it is 9 + 4 * log2(freq / 1000) Bark.
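A small Python sketch of this conversion (added here as an illustration, using
the piecewise approximation given above):

import math

def hz_to_bark(freq_hz):
    """Convert frequency in Hz to critical-band rate in Bark,
    using the piecewise approximation given above."""
    if freq_hz < 500:
        return freq_hz / 100.0
    return 9 + 4 * math.log2(freq_hz / 1000.0)

for f in (100, 500, 1000, 4000, 16000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")   # top of range ~25 Bark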
Sensitivity of human hearing in relation to frequency
 Experiment: Put a person in a quiet room. Raise the level of a 1 kHz tone
until it is just barely audible. Vary the frequency and plot the resulting
threshold-of-hearing curve.
Frequency Masking
 Experiment: Play a 1 kHz tone (the masking tone) at a fixed level (60 dB).
Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level
until it is just distinguishable.
 Vary the frequency of the test tone and plot the threshold at which it
becomes audible.
 Repeat for various frequencies of the masking tone.
Temporal masking
 If we hear a loud sound and then it stops, it takes a little while until we
can hear a soft tone nearby.
 Experiment: Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz
at 40 dB. The test tone can't be heard (it is masked). Stop the masking tone,
then stop the test tone after a short delay. Adjust the delay time to the
shortest delay at which the test tone can be heard (e.g., 5 ms). Repeat with
different levels of the test tone and plot the results.

The total masking effect combines both frequency and temporal masking; a
simplified sketch of such a combination follows.
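A minimal, hypothetical sketch of how a perceptual coder might combine these
curves (this is an illustrative simplification with made-up numbers, not the
model from the original notes or any specific standard): for each band, take
the effective masking threshold as the maximum of the threshold in quiet, the
frequency-masking threshold, and the temporal-masking threshold; quantization
noise below that level is assumed inaudible.

def combined_threshold(quiet, freq_mask, temp_mask):
    """Per-band effective masking threshold (dB), taken as the maximum of the
    threshold in quiet and the two masking thresholds. Illustrative only;
    real psychoacoustic models are considerably more involved."""
    return [max(q, f, t) for q, f, t in zip(quiet, freq_mask, temp_mask)]

# Hypothetical per-band levels in dB for a handful of critical bands.
quiet     = [25, 10,  5,  5, 15]
freq_mask = [ 0, 20, 35, 20,  0]   # raised near a loud masking tone
temp_mask = [ 0, 10, 25, 10,  0]   # lingers briefly after the masker stops
signal    = [20, 18, 30, 12, 18]

threshold = combined_threshold(quiet, freq_mask, temp_mask)
audible = [s > t for s, t in zip(signal, threshold)]
print(threshold)   # [25, 20, 35, 20, 15]
print(audible)     # only bands whose signal exceeds the threshold are audible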
Image compression
Compressing an image is significantly different than compressing raw binary
data. Of course, general purpose compression programs can be used to compress
images, but the result is less than optimal. This is because images have certain
statistical properties which can be exploited by encoders specifically designed for
them. Also, some of the finer details in the image can be sacrificed for the sake of
saving a little more bandwidth or storage space. This also means that lossy
compression techniques can be used in this area.
Lossless compression involves compressing data which, when
decompressed, will be an exact replica of the original data. This is the case when
binary data such as executables, documents etc. are compressed. They need to be
exactly reproduced when decompressed. On the other hand, images (and music
too) need not be reproduced 'exactly'. An approximation of the original image is
enough for most purposes, as long as the error between the original and the
compressed image is tolerable.
Error Metrics
Two of the error metrics used to compare the various image compression
techniques are the Mean Square Error (MSE) and the Peak Signal to Noise Ratio
(PSNR). The MSE is the cumulative squared error between the compressed and
the original image, whereas PSNR is a measure of the peak error. The
mathematical formulae for the two are
MSE = (1 / (M*N)) * Σx Σy [I(x,y) - I'(x,y)]^2
PSNR = 20 * log10 (255 / sqrt(MSE))
where I(x,y) is the original image, I'(x,y) is the approximated version (which is
actually the decompressed image) and M,N are the dimensions of the images. A
lower value of MSE means less error, and as seen from the inverse relation
between MSE and PSNR, this translates to a higher value of PSNR. Logically, a
higher value of PSNR is good because it means that the ratio of Signal to Noise is
higher. Here, the 'signal' is the original image, and the 'noise' is the error in
reconstruction. So, if you find a compression scheme having a lower MSE (and a
high PSNR), you can recognize that it is a better one.
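A minimal Python sketch of these two metrics (added here as an illustration;
the toy image and noise values are arbitrary), directly following the formulas
above for 8-bit grey-scale images:

import numpy as np

def mse(original, approx):
    """Mean squared error between two images of the same size (M x N)."""
    diff = original.astype(np.float64) - approx.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, approx, peak=255.0):
    """Peak signal-to-noise ratio in dB, using PSNR = 20*log10(peak/sqrt(MSE))."""
    m = mse(original, approx)
    return float("inf") if m == 0 else 20 * np.log10(peak / np.sqrt(m))

# Toy example: an 8-bit grey-scale image and a slightly perturbed copy.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(img + rng.normal(0, 2, img.shape), 0, 255).astype(np.uint8)
print(mse(img, noisy), psnr(img, noisy))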
The Outline
We'll take a close look at compressing grey scale images. The algorithms
explained can be easily extended to color images, either by processing each of the
color planes separately, or by transforming the image from RGB representation to
other convenient representations like YUV in which the processing is much easier.
The usual steps involved in compressing an image are:
1. Specifying the Rate (bits available) and Distortion (tolerable error)
parameters for the target image
2. Dividing the image data into various classes, based on their importance
3. Dividing the available bit budget among these classes, such that the
distortion is a minimum
4. Quantize each class separately using the bit allocation information derived
in step 3
5. Encode each class separately using an entropy coder and write to the file
Remember, this is how 'most' image compression techniques work. But there
are exceptions. One example is the Fractal Image Compression technique, where
possible self-similarity within the image is identified and used to reduce the
amount of data required to reproduce the image. Traditionally these methods have
been time consuming, but some recent methods promise to speed up the process.
Literature regarding fractal image compression can be found at <findout>.
Reconstructing the image from the compressed data is usually a faster process
than compression. The steps involved are:
1. Read in the quantized data from the file, using an entropy decoder. (reverse
of step 5)
2. Dequantize the data. (reverse of step 4)
3. Rebuild the image. (reverse of step 2)
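To make the quantize/dequantize steps concrete, here is a minimal sketch
(added as an illustration under simplified assumptions: a single class and a
uniform scalar quantizer, rather than the per-class bit allocation described
above):

import numpy as np

def quantize(block, bits):
    """Uniform scalar quantization of 8-bit values down to the given bit depth."""
    step = 256 // (2 ** bits)            # quantization step size
    return (block // step).astype(np.uint8)

def dequantize(indices, bits):
    """Map quantization indices back to representative levels (bin centres)."""
    step = 256 // (2 ** bits)
    return (indices * step + step // 2).astype(np.uint8)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

q = quantize(img, bits=4)                # quantization step (4 bits per pixel)
rec = dequantize(q, bits=4)              # reverse: rebuild approximate values
err = np.mean((img.astype(float) - rec.astype(float)) ** 2)
print("MSE after 4-bit quantization:", err)   # distortion introduced by quantizing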
References:
 Lossless Compression Algorithms
http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html
 Audio Compression
http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.4/Chap4.4.html
 Image Compression
http://www.debugmode.com/imagecmp/
 Monkey’s Audio
http://www.monkeysaudio.com/theory.html
 Lossless Audio (La)
http://www.lossless-audio.com/theory.htm
 Compression and speed of lossless audio formats
http://web.inter.nl.net/users/hvdh/lossless/main.htm
http://members.home.nl/w.speek/comparison.htm
 Wavelet compression
http://www.wordiq.com/definition/Wavelet_compression
 Psychoacoustics
http://www.wordiq.com/definition/Psychoacoustics
 MP3
http://www.wordiq.com/definition/MP3
 H.264
http://www.komatsu-trilink.jp/device/pdf11/UBV2003.pdf