Multimedia Compression
B90901134 陳威尹

Generic Compression Overview

Generic compression is also called entropy coding; it is a form of lossless compression. This kind of compression needs statistical knowledge of the data, whether gathered during processing or in advance.

Basics of Information Theory

According to Shannon, the entropy of an information source S is defined as:

    H(S) = sum over i of p_i * log2(1 / p_i)

where p_i is the probability that symbol S_i in S will occur. The term log2(1 / p_i) indicates the amount of information contained in S_i, i.e., the number of bits needed to code S_i. For example, in an image with a uniform distribution of gray-level intensities, i.e. p_i = 1/256, the number of bits needed to code each gray level is 8, and the entropy of the image is 8 bits.

Huffman Coding

Huffman coding is based on the frequency of occurrence of a data item (e.g., a pixel value in an image). The principle is to use a smaller number of bits to encode the data that occur more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding. The Huffman algorithm is briefly summarized below (a small sketch of the construction appears at the end of this section):

1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
   (a) From OPEN, pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
   (b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
   (c) Assign code 0 to one branch and 1 to the other, and delete the children from OPEN.

Symbol   Count   log2(1/p)   Code   Subtotal (# of bits)
------   -----   ---------   ----   --------------------
A          15      1.38       0              15
B           7      2.48       100            21
C           6      2.70       101            18
D           6      2.70       110            18
E           5      2.96       111            15
                               TOTAL (# of bits): 87

The following points are worth noting about the above algorithm:
- Decoding is trivial as long as the coding table (the statistics) is sent before the data. (There is a small overhead for sending it, negligible if the data file is big.)
- Unique prefix property: no code is a prefix of any other code (all symbols sit at leaf nodes), so decoding is unambiguous.
- If prior statistics are available and accurate, Huffman coding is very good.

In the above example, the average number of bits per symbol with Huffman coding is 87 / 39 = 2.23.

Arithmetic Coding

Huffman coding and the like use an integer number k of bits for each symbol, so k is never less than 1. Sometimes, e.g., when sending a 1-bit image, compression becomes impossible.

Idea: suppose the alphabet is {X, Y} with prob(X) = 2/3 and prob(Y) = 1/3. If we are only concerned with encoding length-2 messages, we can map all possible messages to intervals in the range [0..1]:

    XX -> [0, 4/9)    XY -> [4/9, 6/9)    YX -> [6/9, 8/9)    YY -> [8/9, 1)

To encode a message, just send enough bits of a binary fraction to uniquely specify its interval (a small sketch of this appears at the end of this section).

Conclusion

Generic compression algorithms are used in general-purpose file compression formats such as Zip, Rar, gzip, and bzip, and they are usually the final stage of content-specific compression: for example, JPEG uses Huffman or arithmetic coding, Monkey's Audio (APE) uses Rice coding, and Lossless Audio (La) uses arithmetic coding.

Content-specific Compression

Generally, correlation means redundancy. Because a generic algorithm may not find content-specific correlation, and a higher-order generic algorithm may not be efficient enough, content-specific de-correlation is needed. Whether lossy or lossless, multimedia file formats use a content-specific pre-filter as the first step to reduce data redundancy.
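To make the Huffman construction above concrete, here is a minimal Python sketch (my own illustration, not code from any of the referenced sources) that builds a code book for the A-E counts in the table. Depending on how ties between equal counts are broken, the individual codewords may differ from the table, but the code lengths and the 87-bit total are the same.

import heapq
import math

def huffman_code(freqs):
    # OPEN list as a min-heap keyed on frequency; the integer tie-breaker
    # keeps comparisons away from the nested-tuple tree nodes.
    heap = [(count, i, sym) for i, (sym, count) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # lowest frequency
        f2, _, right = heapq.heappop(heap)     # second lowest
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))  # parent = sum of children
        next_id += 1
    codes = {}
    def walk(node, prefix):                    # assign 0/1 along the two branches
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"        # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}   # counts from the table above
codes = huffman_code(counts)
total_bits = sum(counts[s] * len(codes[s]) for s in counts)        # 87
n = sum(counts.values())                                           # 39
entropy = sum(c / n * math.log2(n / c) for c in counts.values())   # about 2.19
print(codes)
print(total_bits, round(total_bits / n, 2), round(entropy, 2))     # 87 2.23 2.19

The entropy of this source (about 2.19 bits per symbol) is the lower bound, so the Huffman code's 2.23 bits per symbol is close to optimal, which is what the remark about accurate prior statistics refers to.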
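The interval mapping in the arithmetic-coding example above can be sketched the same way. This is a minimal illustration under one simple convention for "enough bits": emit the shortest bit string whose dyadic interval [m/2^k, (m+1)/2^k) lies entirely inside the message's interval. Practical arithmetic coders do this incrementally with scaled integer arithmetic; the exact fractions here are only for clarity.

import math
from fractions import Fraction

PROB = {"X": Fraction(2, 3), "Y": Fraction(1, 3)}   # probabilities from the example
CUM  = {"X": Fraction(0),    "Y": Fraction(2, 3)}   # where each symbol's slice of [0, 1) starts

def message_interval(message):
    # Narrow [0, 1) once per symbol; the final sub-interval identifies the message.
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * CUM[sym]
        width *= PROB[sym]
    return low, low + width

def interval_bits(low, high):
    # Shortest bit string whose dyadic interval fits inside [low, high).
    k = 0
    while True:
        k += 1
        scale = 2 ** k
        m = math.ceil(low * scale)
        if Fraction(m + 1, scale) <= high:
            return format(m, "0{}b".format(k))

for msg in ("XX", "XY", "YX", "YY"):
    lo, hi = message_interval(msg)
    print(msg, lo, hi, interval_bits(lo, hi))   # e.g. XY -> [4/9, 2/3) -> bits 100

The number of bits sent is tied to the width of the interval; as messages grow longer the intervals shrink in proportion to the message probabilities, so the average cost per symbol approaches the entropy instead of being stuck at a whole number of bits.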
Audio compression

Traditional lossless compression methods (Huffman, LZW, etc.) usually don't work well on audio, for the same reason they don't work well on images.

Psychoacoustics

Human hearing and voice
- The audible frequency range is about 20 Hz to 20 kHz, with the greatest sensitivity at 2 to 4 kHz.
- The dynamic range (quietest to loudest) is about 96 dB.
- The normal voice range is about 500 Hz to 2 kHz:
  o Low frequencies carry vowels and bass.
  o High frequencies carry consonants.

Critical Bands
The human auditory system has a limited, frequency-dependent resolution. A perceptually uniform measure of frequency can be expressed in terms of the widths of the Critical Bands: less than 100 Hz at the lowest audible frequencies, and more than 4 kHz at the high end. Altogether, the audio frequency range can be partitioned into 25 critical bands. A new unit of frequency, the Bark (after Barkhausen), is introduced:

    1 Bark = width of one critical band

For frequencies below 500 Hz the critical-band number is approximately freq / 100 Bark; above 500 Hz it is approximately 9 + 4 * log2(freq / 1000) Bark. (A small conversion sketch follows at the end of this section.)

Sensitivity of human hearing in relation to frequency
Experiment: put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold of hearing as a function of frequency.

Frequency Masking
Experiment: play a 1 kHz masking tone at a fixed level (60 dB). Play a test tone at a different frequency (e.g., 1.1 kHz) and raise its level until it is just distinguishable. Vary the frequency of the test tone and plot the threshold at which it becomes audible. Repeat for various frequencies of the masking tone.

Temporal Masking
If we hear a loud sound that then stops, it takes a little while before we can hear a nearby soft tone. Experiment: play a 1 kHz masking tone at 60 dB plus a test tone at 1.1 kHz at 40 dB; the test tone can't be heard (it is masked). Stop the masking tone, then stop the test tone after a short delay. Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms). Repeat with different levels of the test tone and plot the results. The total masking effect combines both frequency masking and temporal masking.
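As an illustration of the critical-band approximation above, here is a minimal Python sketch of the Hz-to-Bark conversion (my own illustration; the sample frequencies are arbitrary). The two pieces of the approximation meet at 500 Hz (5 Bark), and 16 kHz comes out at 25 Bark, in line with the roughly 25 critical bands mentioned above.

import math

def hz_to_bark(freq_hz):
    # Piecewise approximation from the text: linear below 500 Hz,
    # logarithmic above; both pieces give 5 Bark at 500 Hz.
    if freq_hz < 500:
        return freq_hz / 100.0
    return 9.0 + 4.0 * math.log2(freq_hz / 1000.0)

for f in (100, 500, 1000, 4000, 16000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")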
Image compression

Compressing an image is significantly different from compressing raw binary data. General-purpose compression programs can of course be used on images, but the result is less than optimal, because images have statistical properties that can be exploited by encoders specifically designed for them. Also, some of the finer details in the image can be sacrificed to save a little more bandwidth or storage space, which means lossy compression techniques can be used in this area.

Lossless compression produces data which, when decompressed, is an exact replica of the original. This is required for binary data such as executables and documents, which must be reproduced exactly when decompressed. Images (and music) do not need to be reproduced exactly: an approximation of the original image is enough for most purposes, as long as the error between the original and the compressed image is tolerable.

Error Metrics
Two of the error metrics used to compare image compression techniques are the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). The MSE is the cumulative squared error between the compressed and the original image, whereas PSNR is a measure of the peak error. The mathematical formulae for the two are

    MSE = (1 / (M * N)) * sum over x, y of (I(x,y) - I'(x,y))^2
    PSNR = 20 * log10 (255 / sqrt(MSE))

where I(x,y) is the original image, I'(x,y) is the approximated version (which is actually the decompressed image), and M, N are the dimensions of the images. A lower MSE means less error, and as the inverse relation between MSE and PSNR shows, this translates to a higher PSNR. Logically, a higher PSNR is good because it means the ratio of signal to noise is higher; here, the 'signal' is the original image and the 'noise' is the error in reconstruction. So a compression scheme with a lower MSE (and a higher PSNR) can be recognized as a better one.

The Outline
We'll take a close look at compressing grey-scale images. The algorithms explained can easily be extended to color images, either by processing each color plane separately or by transforming the image from RGB to another representation, such as YUV, in which the processing is much easier. The usual steps involved in compressing an image are:

1. Specify the rate (bits available) and distortion (tolerable error) parameters for the target image.
2. Divide the image data into various classes, based on their importance.
3. Divide the available bit budget among these classes so that the distortion is a minimum.
4. Quantize each class separately using the bit allocation derived in step 3.
5. Encode each class separately using an entropy coder and write the result to the file.

Remember, this is how 'most' image compression techniques work, but there are exceptions. One example is fractal image compression, where possible self-similarity within the image is identified and used to reduce the amount of data required to reproduce the image. Traditionally these methods have been time-consuming, but some recent methods promise to speed up the process. Literature regarding fractal image compression can be found at <findout>.

Reconstructing the image from the compressed data is usually a faster process than compression. The steps involved are:

1. Read the quantized data from the file using an entropy decoder (reverse of step 5).
2. Dequantize the data (reverse of step 4).
3. Rebuild the image (reverse of step 2).

Reference:
Lossless Compression Algorithms: http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html
Audio Compression: http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.4/Chap4.4.html
Image Compression: http://www.debugmode.com/imagecmp/
Monkey's Audio: http://www.monkeysaudio.com/theory.html
Lossless Audio (La): http://www.lossless-audio.com/theory.htm
Compression and speed of lossless audio formats: http://web.inter.nl.net/users/hvdh/lossless/main.htm and http://members.home.nl/w.speek/comparison.htm
Wavelet compression: http://www.wordiq.com/definition/Wavelet_compression
Psychoacoustics: http://www.wordiq.com/definition/Psychoacoustics
MP3: http://www.wordiq.com/definition/MP3
H.264: http://www.komatsu-trilink.jp/device/pdf11/UBV2003.pdf