Polytechnic University, Dept. Electrical and Computer Engineering
EE3414 Multimedia Communication System I
Spring 2006, Yao Wang
___________________________________________________________________________________

Homework 5 (Audio Coding) (Solution)

Reading Assignment: Z. N. Li and M. Drew, Fundamentals of Multimedia, Prentice Hall, 2004. Chapter 14: MPEG audio compression. Copies provided.

Written Assignment

1. When uniformly quantizing a signal, what is the improvement obtained by adding one more bit per sample?
About 6 dB in signal-to-quantization-noise ratio.

2. What are the three auditory properties that determine the lowest level of a sound that can be heard?
Threshold in quiet, frequency masking, and temporal masking.
Threshold in quiet is the lowest level (i.e., the threshold) that can be heard when only a single tone is present; this threshold is frequency-dependent.
Frequency masking means that when two tones are present, the louder tone masks the softer tone if the two are close in frequency, so that the threshold for the softer tone is raised above the threshold in quiet.
Temporal masking refers to the fact that when the louder tone stops playing, one cannot hear the softer tone immediately, even if it is above the masking level determined by the frequency masking effect. It takes a certain delay before the softer tone can be heard, and the delay depends on the frequency as well as the loudness of the softer tone.

3. Assume an audio signal is divided into 16 frequency bands, with the energy in the different bands as follows:
--------------------------------------------------------------------------------------
Band         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Level (dB)   0   8  12  10   6   2  20  60  14  20  15   2   3   5   3   1
--------------------------------------------------------------------------------------
Assume that when the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th band. How many bits would you need to code the 7th band and the 9th band, respectively?
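The solution to this problem rests on two rules: a band is coded only if its level is greater than the masking level, and each bit removed raises the quantization noise by about 6 dB (Problem 1). The following is a minimal Python sketch of that rule, not an MPEG bit-allocation procedure. It assumes a starting resolution of 8 bits/sample/band and exactly 6 dB per bit; the function name `bits_needed` and its parameters are hypothetical, introduced only for illustration.

```python
def bits_needed(level_db, mask_db, full_bits=8, db_per_bit=6):
    """Bits per sample for one band under the simple rule of Problem 3.

    Assumptions (illustrative only):
    - the unmasked signal uses `full_bits` bits per sample;
    - each bit removed adds `db_per_bit` dB of quantization noise;
    - quantization noise up to the masking level `mask_db` is inaudible.
    """
    if level_db <= mask_db:
        # The band is not above the masking level: nothing to code.
        return 0
    # Whole bits that can be dropped while the noise stays within the mask.
    bits_saved = int(mask_db // db_per_bit)
    return max(full_bits - bits_saved, 0)


# Levels and masking values taken from the problem statement.
print(bits_needed(level_db=20, mask_db=12))  # band 7 -> 6
print(bits_needed(level_db=14, mask_db=15))  # band 9 -> 0
```

Note that the sketch uses the rounded figure of 6 dB per bit, as the problem does; with the more precise 6.02 dB the integer division would save only 1 bit for a 12 dB mask.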
Suppose the original signal is represented with 8 bits/sample/band. Because the level of the 7th band is 20 dB, greater than the 12 dB masking level, we need to code it. Because the masking level is 12 dB, we can tolerate 12 dB of quantization noise, which at 6 dB per bit saves 2 bits. Since the original sample is represented by 8 bits, we need only 6 bits. For the 9th band, the signal level is 14 dB, lower than the 15 dB masking level, so we do not need to code the 9th band (i.e., we use 0 bits).

4. What are the three principal steps in perceptual audio coding? Draw a block diagram showing these three components.
(1) Subband filtering to derive sub-signals in different frequency bands; (2) masking level calculation based on the energy of the different subbands; (3) bit allocation among subbands based on the masking level of each band.

5. What are the differences between the three layers of MPEG-1 audio in terms of the techniques used and the audio quality produced at the same bit rate?
Layers 1 and 2 both use a filter bank that divides the original signal into 32 subbands of equal bandwidth, covering the range from 0 to the maximum frequency in the signal (half of the sampling frequency). Layer 3 further applies the MDCT to each of the 32 sub-signals to split each one into several (6 or 18) sub-sub-signals with smaller bandwidth. These sub-sub-signals are then grouped to form critical bands with non-uniform bandwidths, similar to those of the human auditory system.
Layer 1 scales and quantizes 12 samples of each subband with the same scale factor and bit allocation, whereas Layers 2 and 3 process 36 samples together, using three separate scale factors but the same bit allocation.
Layer 1 exploits only the frequency masking property of the human auditory system when allocating bits among sub-signals, whereas Layers 2 and 3 exploit both the frequency and temporal masking properties.
Layers 1 and 2 apply fixed-length binary encoding to the quantized sample indices, whereas Layer 3 employs Huffman coding.
For the same bit rate, the audio quality is lowest with Layer 1 and highest with Layer 3. To yield the same audio quality, Layer 1 requires the highest bit rate and Layer 3 the lowest. More specifically, to yield quality indistinguishable from stereo audio originally at 1.4 Mbps, Layer 1 requires 384 kbps, Layer 2 requires 192 kbps, and Layer 3 requires only 128 kbps.

6. What is MP3 audio? Suppose you are a salesperson in an electronics store. Explain how it works to a potential customer who is looking to buy an MP3 music player.
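As background for Problem 6, the bit rates quoted in Problem 5 can be restated as compression ratios, which may be easier to explain to a customer. The sketch below is a rough calculation only. It assumes that the 1.4 Mbps source is CD-quality stereo (44.1 kHz sampling, 16 bits per sample, 2 channels); that assumption and all variable names are illustrative and not taken from the assignment.

```python
# Assumed CD-audio parameters behind the "1.4 Mbps" figure.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed source bit rate in kbps (1 kbps = 1000 bits/s).
source_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

# Bit rates quoted in Problem 5 for quality indistinguishable from the source.
layer_kbps = {"Layer 1": 384, "Layer 2": 192, "Layer 3": 128}

# Compression ratio = source bit rate / coded bit rate.
ratios = {layer: source_kbps / kbps for layer, kbps in layer_kbps.items()}

for layer, ratio in ratios.items():
    print(f"{layer}: {layer_kbps[layer]} kbps, about {ratio:.1f}:1 compression")
```

Under these assumptions the ratios come out to roughly 4:1, 7:1, and 11:1 for Layers 1, 2, and 3, so an MP3 (Layer 3) player can hold roughly eleven times as much music as the same storage filled with uncompressed CD audio.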