HW5 solution - Department of Electrical & Computer Engineering

Polytechnic University, Dept. Electrical and Computer Engineering
EE3414 Multimedia Communication System I
Spring 2006, Yao Wang
___________________________________________________________________________________
Homework 5 (Audio Coding)
(Solution)
Reading Assignment:
Z. N. Li and M. Drew, Fundamentals of multimedia, Prentice Hall, 2004. Chapter 14: MPEG
audio compression. Copies provided.
Written Assignment
1. When uniformly quantizing a signal, what is the improvement obtained by adding one more bit
per sample?
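About 6 dB. Each additional bit halves the quantization step size, which improves the
signal-to-quantization-noise ratio (SNR) by 20 log10(2), or roughly 6 dB.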
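The 6 dB figure can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes a test signal uniformly distributed over [-1, 1) and a mid-rise uniform quantizer, and the names `quantize` and `snr_db` are illustrative only.

```python
import math
import random


def quantize(x, bits):
    """Uniform mid-rise quantizer on [-1, 1) with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor((x + 1.0) / step)
    index = min(max(index, 0), levels - 1)  # clamp to the valid index range
    return -1.0 + (index + 0.5) * step      # reconstruct at the cell midpoint


def snr_db(samples, bits):
    """Signal-to-quantization-noise ratio in dB for the given bit depth."""
    signal_power = sum(x * x for x in samples)
    noise_power = sum((x - quantize(x, bits)) ** 2 for x in samples)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

# Halving the step size halves the noise amplitude: 20*log10(2) dB.
theoretical_gain_db = 20.0 * math.log10(2.0)
measured_gain_db = snr_db(samples, 9) - snr_db(samples, 8)

print(f"theoretical gain per bit: {theoretical_gain_db:.2f} dB")
print(f"measured gain (8 -> 9 bits): {measured_gain_db:.2f} dB")
```

The measured gain depends on the signal statistics; for signals that do not fill the quantizer range evenly it can deviate somewhat from the theoretical value.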
2. What are the three auditory properties that determine the lowest level of a sound that can be
heard?
Threshold in quiet, frequency masking, temporal masking. Threshold-in-quiet tells us the lowest
level (i.e. the threshold) that can be heard when only a single tone is present, and the threshold
is frequency-dependent. Frequency masking tells us that when two tones are present, the louder
tone masks the softer tone if the two tones are close in frequency, so that the threshold for the
softer tone is raised beyond the threshold-in-quiet. Temporal masking refers to the fact that
when the louder tone stops playing, one cannot hear the softer tone immediately even if it is
above the masking level determined by the frequency masking effect. It takes a certain delay
before the softer tone can be heard, and the delay time depends on the frequency as well as the
loudness of the softer tone.
3. Assume an audio signal is divided into 16 frequency bands with energy in different bands as
follows
--------------------------------------------------------------------------------------
Band        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Level (dB)  0   8  12  10   6   2  20  60  14  20  15   2   3   5   3   1
--------------------------------------------------------------------------------------
Assume that if the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band
and 15 dB in the 9th. How many bits would you need to code the 7th band and the 9th band,
respectively? Suppose the original signal is represented with 8 bits/sample/band.
Because the energy of the 7th band is 20 dB, which is greater than the 12 dB masking level, we
need to code it. Because the masking level is 12 dB, we can tolerate 12 dB of quantization
noise; at about 6 dB per bit, this reduces the bit rate by 2 bits. Since the original sample
is represented with 8 bits, we need only 6 bits. For the 9th band, the signal energy is 14 dB,
lower than the 15 dB masking level, so we don’t need to code the 9th band (i.e., we use 0 bits).
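The same reasoning can be written as a small calculation. This is only a sketch of the simplified rule used in this problem (a band below its masking level gets 0 bits; otherwise each 6 dB of masking saves one bit); the function name `bits_needed` and its parameters are illustrative assumptions, not part of any standard.

```python
def bits_needed(level_db, mask_db, original_bits=8, db_per_bit=6):
    """Bits for one band under the homework's simplified masking rule."""
    if level_db <= mask_db:
        # The band is completely masked, so it need not be coded at all.
        return 0
    # Quantization noise up to the masking level is inaudible,
    # so one bit can be dropped for every 6 dB of masking.
    saved_bits = int(mask_db // db_per_bit)
    return max(original_bits - saved_bits, 0)


band7_bits = bits_needed(level_db=20, mask_db=12)
band9_bits = bits_needed(level_db=14, mask_db=15)
print("band 7:", band7_bits, "bits")
print("band 9:", band9_bits, "bits")
```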
4. What are the three principal steps in perceptual audio coding? Draw a block diagram showing
these three components.
Subband filtering to derive sub-signals in different frequency bands, masking level calculation
based on the energy of the different subbands, and bit allocation among subbands based on the
masking level of each band.
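The three steps can be strung together in a toy pipeline. The sketch below is only an illustration under strong assumptions: a naive DFT with equal-width bands stands in for the subband filter bank, the masking model (a fixed 15 dB drop per band of distance and a 0 dB threshold in quiet) is made up for the example, and all function names are hypothetical. A real coder uses a polyphase filter bank and a psychoacoustic model.

```python
import cmath
import math


def band_levels_db(frame, num_bands):
    """Step 1 (toy subband analysis): naive DFT, then energy per band in dB."""
    n = len(frame)
    bins_per_band = (n // 2) // num_bands
    energies = [0.0] * num_bands
    for k in range(bins_per_band * num_bands):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        energies[k // bins_per_band] += abs(coeff) ** 2
    # A tiny floor avoids log10(0) for empty bands.
    return [10.0 * math.log10(e + 1e-12) for e in energies]


def masking_thresholds_db(levels, drop_per_band_db=15.0, quiet_db=0.0):
    """Step 2 (assumed model): each band masks the others, weakening with distance."""
    thresholds = []
    for b in range(len(levels)):
        from_others = max(levels[j] - drop_per_band_db * abs(b - j)
                          for j in range(len(levels)) if j != b)
        thresholds.append(max(quiet_db, from_others))
    return thresholds


def allocate_bits(levels, thresholds, db_per_bit=6.0):
    """Step 3: spend bits only on bands above their threshold, about 6 dB per bit."""
    return [math.ceil((lvl - thr) / db_per_bit) if lvl > thr else 0
            for lvl, thr in zip(levels, thresholds)]


# Test frame: a loud tone in band 3 and a much softer tone in adjacent band 4.
n = 64
frame = [math.cos(2 * math.pi * 13 * t / n)
         + 0.01 * math.cos(2 * math.pi * 18 * t / n)
         for t in range(n)]

levels = band_levels_db(frame, num_bands=8)
thresholds = masking_thresholds_db(levels)
bits = allocate_bits(levels, thresholds)
print(bits)
```

In this example the soft tone falls below the threshold raised by its loud neighbour, so only the loud band receives bits.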
5. What are the differences between the three layers of MPEG-1 audio in terms of the techniques
used and the audio quality produced at the same bit rate?
Layer 1 and layer 2 both use a filter bank that divides the original signal into 32 subbands with
equal bandwidth in the range from 0 to the maximum frequency (half of the sampling
frequency) in the signal. Layer 3 further applies the MDCT to each of the 32 sub-signals to
split each sub-signal into several (6 or 18) sub-sub-signals with smaller bandwidth. These
sub-sub-signals are then grouped to form critical bands with non-uniform bandwidths similar to
those of the human auditory system. Layer 1 scales and quantizes 12 samples of each
subband with the same scale factor and bit allocation, whereas layers 2 and 3 process 36
samples simultaneously, using three separate scale factors but the same bit allocation. Layer 1
only exploits the frequency masking property of the human auditory system when performing bit
allocation among subsignals, whereas Layers 2 and 3 exploit both the frequency and temporal
masking properties. Layers 1 and 2 apply fixed-length binary encoding to the quantized sample
indices, whereas Layer 3 employs Huffman coding.
For the same bit rate, the audio quality is lowest with Layer 1 and highest with Layer 3. To
yield the same audio quality, Layer 1 requires highest bit rate, Layer 3 the lowest. More
specifically, to yield quality indistinguishable from stereo audio originally at 1.4 Mbps, Layer 1
requires 384 Kbps, Layer 2 requires 192 Kbps, Layer 3 requires only 128 Kbps.
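These bit rates translate into compression ratios as follows. The sketch assumes the 1.4 Mbps source is CD-quality stereo (44.1 kHz sampling, 16 bits per sample, 2 channels), which is not stated explicitly in the solution above.

```python
# Assumed source: CD-quality stereo, 44.1 kHz x 16 bits x 2 channels (about 1.4 Mbps).
source_kbps = 44.1 * 16 * 2

# Bit rates quoted in the solution for transparent quality.
layer_kbps = {"Layer 1": 384, "Layer 2": 192, "Layer 3": 128}

compression_ratio = {name: source_kbps / rate for name, rate in layer_kbps.items()}

for name, ratio in compression_ratio.items():
    print(f"{name}: {layer_kbps[name]} kbps, about {ratio:.1f}:1")
```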
6. What is MP3 audio? Suppose you are a salesperson in an electronics store. Explain how it
works to a potential customer who is looking to buy an MP3 music player.