Digitized Sound
Telecommunications 1
P. Mathys

Sampling of Waveforms
Computers cannot directly deal with continuous-time (CT) waveforms. A CT waveform needs to be sampled at regular time intervals before it can be stored and processed by a computer. The sampling operation converts the CT signal into a discrete-time (DT) signal or sequence.

The next slide shows a sinewave and its samples after sampling at a rate of Fs = 16000 Hz (16000 samples/sec). The samples are marked with “o” in the graph. Each sample is a real number, e.g., 0.70710678, and a sampled waveform is a sequence of such numbers.

Sampling of 1000 Hz Sinewave
[Figure: 1000 Hz sinewave sampled at 16000 samples/sec ==> Fs = 16000 Hz, 16 samples/period]

Sampling at Higher Rate
If we sample at a higher rate, we expect to
- need more memory,
- need larger files to store all samples,
- take longer to transmit the samples over a network,
- reduce the approximation error that results from sampling a CT signal.

[Figure: 1000 Hz sinewave sampled at 32000 samples/sec ==> Fs = 32000 Hz, 32 samples/period]

Sampling at Lower Rate
If we sample at a lower rate, we will use less storage space, and transmission, e.g., over the Internet, will be faster. But how much will the quality degrade? What is the minimum sampling rate that is needed?

[Figure: 1000 Hz sinewave sampled at 8000 samples/sec ==> Fs = 8000 Hz, 8 samples/period]

Using Different Sampling Rates
Sampling Rate Fs    File Size (1 sec, 16 bits)    Sound Sample
8,000 Hz            16 kB                         sin1000_8.wav
16,000 Hz           32 kB                         sin1000_16.wav
32,000 Hz           64 kB                         sin1000_32.wav

What Sampling Rate is Best?
The previous three examples of sampling a 1000 Hz tone at Fs=8000 Hz, Fs=16000 Hz, and Fs=32000 Hz show no difference in sound quality. But let’s look at another example where we sample a 5000 Hz tone at Fs=16000 Hz and Fs=8000 Hz.
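Before moving on to the 5000 Hz tone, the sampling of the 1000 Hz sinewave described above can be sketched in Python. This is only an illustration and not part of the original slides: the helper names `sample_sine` and `samples_per_period` are assumptions made for this sketch. It shows that the sample at n = 2 is the value 0.70710678 quoted earlier, and that the number of samples per period is Fs divided by the tone frequency.

```python
import math

def sample_sine(freq_hz, fs_hz, duration_s=1.0):
    """Return the samples x[n] = sin(2*pi*f*n/Fs) for n = 0 .. N-1.

    Hypothetical helper for illustration; N = Fs * duration.
    """
    num_samples = int(round(fs_hz * duration_s))
    return [math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
            for n in range(num_samples)]

def samples_per_period(freq_hz, fs_hz):
    """Number of samples taken during one period of the tone."""
    return fs_hz / freq_hz

if __name__ == "__main__":
    for fs in (8000, 16000, 32000):
        x = sample_sine(1000, fs)
        # Print the rate, the number of samples in 1 sec, the samples
        # per period, and the first few sample values.
        print(fs, len(x), samples_per_period(1000, fs),
              [round(v, 8) for v in x[:4]])
```

A higher rate produces proportionally more samples for the same one second of sound, which is why the file sizes in the table grow with Fs.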
5000 Hz Sinewave, Fs=16000 Hz
[Figure: 5000 Hz sinewave sampled at Fs=16000 Hz (sin5000_16.wav)]
16 samples in 1 msec, 3.2 samples per period.

5000 Hz Sinewave, Fs=8000 Hz
[Figure: samples at Fs=8000 Hz (sin5000_8.wav) and at Fs=16000 Hz (sin5000_16.wav)]
Leaving out every second sample (x), we have 8 samples (o) in 1 msec, 1.6 samples per period.

5000 Hz Sinewave
The 5000 Hz sinewave sounds different at Fs=8000 Hz and at Fs=16000 Hz. The reason is that the soundcard, which converts the samples back to a CT waveform, tries to find the “smoothest” (i.e., the lowest frequency) waveform that passes through all samples, as shown in the next slide. At Fs=8000 Hz that waveform is a 3000 Hz tone (8000 Hz - 5000 Hz).

5000 Hz Sinewave, Fs=8000 Hz
[Figure: green curve through the Fs=8000 Hz samples (sin5000_8.wav); blue dashed curve through the Fs=16000 Hz samples (sin5000_16.wav)]
The green curve is “smoother” than the blue (dashed) curve.

Aliasing
The effect whereby tone frequencies are altered because of a reduction in sampling rate is called aliasing. Aliasing affects all sampled sound sequences, whether they are pure tones, music, or speech.

Music examples:
- Original, Fs=44100 Hz: muss44.wav
- Without aliasing, Fs=11025 Hz: muss11.wav
- With aliasing, Fs=11025 Hz: muss11_ali.wav

Aliasing folds high-frequency components down to low frequencies. This is visible and audible especially well in the portions marked in red.
[Figure: with aliasing (muss11_ali44s.wav) compared with original (muss44s.wav); sound: muss11_ali_orig.wav]

To prevent aliasing from occurring, the sound file needs to be lowpass-filtered before the sampling rate is reduced.
[Figure: with aliasing (muss11_ali44s.wav) compared with no aliasing (muss11_44s.wav); sound: muss11_ali_noali.wav]

Nyquist Rate
Let B (in Hz) be the highest frequency contained in a sound waveform.
Sampling Theorem: To avoid distortion due to aliasing, a signal of bandwidth B must use a sampling rate Fs>2B.
The sampling rate of 2B samples/sec is called the Nyquist rate.
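The folding described above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the original slides: `alias_frequency` and `satisfies_sampling_theorem` are hypothetical helper names. The first function returns the lowest frequency whose samples match those of a given tone (up to a sign flip for sinewaves), and the second tests the condition Fs>2B.

```python
import math

def alias_frequency(freq_hz, fs_hz):
    """Lowest frequency (between 0 and Fs/2) that produces the same
    samples as freq_hz at sampling rate fs_hz, up to a sign flip."""
    k = math.floor(freq_hz / fs_hz + 0.5)  # nearest integer multiple of Fs
    return abs(freq_hz - k * fs_hz)

def satisfies_sampling_theorem(bandwidth_hz, fs_hz):
    """True if the sampling rate exceeds the Nyquist rate 2B."""
    return fs_hz > 2 * bandwidth_hz

if __name__ == "__main__":
    for fs in (16000, 8000):
        print("5000 Hz tone at Fs =", fs, "is heard as",
              alias_frequency(5000, fs), "Hz")

    # Sample by sample, a 5000 Hz sine at Fs = 8000 Hz equals a
    # sign-flipped 3000 Hz sine at the same rate.
    for n in range(8):
        a = math.sin(2.0 * math.pi * 5000 * n / 8000)
        b = -math.sin(2.0 * math.pi * 3000 * n / 8000)
        print(n, round(a, 6), round(b, 6))
```

For the 5000 Hz tone, Fs=16000 Hz satisfies Fs>2B and leaves the frequency unchanged, while Fs=8000 Hz does not and folds the tone down to 3000 Hz.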
Common Sampling Rates
Application                 Sampling Rate Fs [Hz]    Bandwidth B [Hz]
Telephony                   8000                     3000
Music Low Quality           11025                    5000
Music                       22050                    10000
Music Hi-Fi                 44100                    20000
DAT (Digital Audio Tape)    48000                    22000

Quantization
Sampling alone is not sufficient to convert a waveform to a format that a computer can handle. The problem is that each sample is a real number that requires infinite precision for processing and storing. Computers have finite word length and can only approximate real numbers.

For example, pi = 3.14159265358979323846264… is a real number. On a computer that can represent 5 decimal digits we would approximate pi as 3.1416. Typical computer word lengths are 8, 16, 32, and 64 bits (approximately 2, 4, 9, and 19 decimal digits).

The process of approximating a real number by a number with finite wordlength is called quantization. Unlike sampling, quantization is not a reversible process. However, by choosing a large enough wordlength, the quantization error can be made as small as desired.

Quantization: Examples
The next slides show the quantization of a sine wave to 2, 3, and 4 bits:
- 2 Bits (4 levels): 00, 01, 10, 11
- 3 Bits (8 levels): 000, 001, 010, 011, 100, 101, 110, 111
- 4 Bits (16 levels): 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

Quantization: Example (2 Bits)
[Figure: sinewave quantized to 2 bits; levels from top to bottom: 11, 10, 01, 00]
Digital sequence of samples (Fs=16000 Hz): 10,11,11,11,11,11,11,10,01,00,00,00,00,...
Sound files: 16 bits: q1000_16.wav, 2 bits: q1000_2.wav

Quantization is a non-linear operation that introduces new frequency components. Here the new components appear at odd multiples of the fundamental frequency.
[Figure: 16 bits (q1000_16.wav) compared with 2 bits (q1000_2.wav); sound: q1000h_16_2.wav]

Quantization: Example (3 Bits)
[Figure: sinewave quantized to 3 bits, shown against the 16-bit waveform; levels from top to bottom: 111, 110, 101, 100, 011, 010, 001, 000]
Digital sequence of samples (Fs=16000 Hz): 100,110,111,111,111,111,110,100,011,001,...
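The digital sequences listed for the 2-bit and 3-bit examples can be reproduced with a simple uniform quantizer. The Python sketch below rests on assumptions that the slides do not state: it divides the range from -1 to 1 into 2^k equal-width intervals numbered from the bottom, and it samples the 1000 Hz sinewave with a half-sample time offset, a choice that happens to yield the sequences shown. The names `quantize_code` and `sine_codes` are hypothetical.

```python
import math

def quantize_code(x, bits):
    """Map a sample x in [-1, 1] to a k-bit binary codeword.

    Assumed scheme: 2**bits equal-width intervals, numbered upward
    from 0 (most negative) to 2**bits - 1 (most positive).
    """
    levels = 2 ** bits
    index = int(math.floor((x + 1.0) / 2.0 * levels))
    index = max(0, min(levels - 1, index))  # keep x = +1 in the top interval
    return format(index, "0{}b".format(bits))

def sine_codes(bits, freq_hz=1000, fs_hz=16000, count=10, offset=0.5):
    """Codewords for the first `count` samples of a sinewave.

    The half-sample offset is an assumption, not taken from the slides.
    """
    codes = []
    for n in range(count):
        x = math.sin(2.0 * math.pi * freq_hz * (n + offset) / fs_hz)
        codes.append(quantize_code(x, bits))
    return codes

if __name__ == "__main__":
    print("2 bits:", ",".join(sine_codes(2, count=13)))
    print("3 bits:", ",".join(sine_codes(3, count=10)))
```

Each extra bit doubles the number of intervals, so the staircase follows the sinewave more closely as the wordlength grows.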
Sound files: 16 bits: q1000_16.wav, 3 bits: q1000_3.wav

Quantization: Example (4 Bits)
[Figure: sinewave quantized to 4 bits, shown against the 16-bit waveform; levels from top to bottom: 1111, 1110, 1101, 1100, 1011, 1010, 1001, 1000, 0111, 0110, 0101, 0100, 0011, 0010, 0001, 0000]
Digital sequence of samples (Fs=16000 Hz): 1001,1100,1110,1111,1111,1110,1100,1001,0110,...
Sound files: 16 bits: q1000_16.wav, 4 bits: q1000_4.wav

Quantization Error
The difference q(t)-y(t) between the quantized signal and the original signal (shown dotted in red in the previous slides) is called quantization error. The quantization error becomes larger if fewer bits are used. Quantization is a nonlinear process that introduces new and unwanted frequency components.

Signal to Quantization Noise Ratio
The signal to quantization noise ratio (SQNR) is a measure to judge the effect of quantization. SQNR is computed as:
SQNR = 10*log10(signal power / quantization error power)
where the logarithm is taken to the base 10 and SQNR is measured in dB (decibels).

If the signal power is 10 times higher than the quantization error power, then the SQNR is 10 dB; if it is 100 times higher, the SQNR is 20 dB; if it is 1000 times higher, the SQNR is 30 dB; etc. For quantization of a sinusoid to k bits we have
SQNR = 6.02 x k + 1.76 dB

Quantization of Sinewaves
Quantization    SQNR [dB]    Sound
16 bits         98.1         q1000_16.wav
8 bits          49.9         q1000_8.wav
4 bits          25.8         q1000_4.wav
3 bits          19.8         q1000_3.wav
2 bits          13.8         q1000_2.wav
1 bit           7.8          q1000_1.wav

Quantization of Speech/Music
Quantization    SQNR [dB]    Sound
16 bits         ~84          tlp_16.wav
8 bits          35.0         tlp_8.wav
6 bits          22.6         tlp_6.wav
4 bits          10.1         tlp_4.wav
2 bits          -3.5         tlp_2.wav

SQNR for 16-bit Quantization
16-bit quantization (e.g., for CD) yields approximately 90 dB SQNR. What does that mean?
10*log10(signal pwr/noise pwr) = 90 ==> signal pwr/noise pwr = 10^9
That is, the quantization noise power is only one billionth of the signal power.

Common Parameters for Sound
The table on the next slide shows common sampling rates Fs and quantizations Q (in bits/sample) for sound waveforms.
The uncompressed rate R (in bytes/sec) is computed using:
R = #channels x Fs x Q / 8

Common Parameters for Sound
Application            # of Channels    Fs in Hertz    Q in bits    Rate R in kB/sec
Speech (Telephony)     1                8000           8            8
Speech                 1                11025          8            11.025
Music Low Quality      1                11025          16           22.05
Speech High Quality    1                22050          16           44.1
Music, Stereo          2                22050          16           88.2
Music, Hi-Fi           1                44100          16           88.2
Music Hi-Fi Stereo     2                44100          16           176.4

The MPEG Standards
MPEG stands for Moving Picture Experts Group. This group works on standards for coding of moving images and sound. MPEG standards can be obtained from ISO (International Organization for Standardization) or, in the US, from ANSI (American National Standards Institute).

MPEG-1: Standard for compression and coding of relatively low resolution video (352x240 pixels, 30 frames/s) at 1.152 Mbits/sec (or 144 kB/sec) for CD-ROM.
MPEG-2: An extension of MPEG-1 for high-quality digital video using rates in the range 1.5 ... 15 Mbits/sec.
MPEG-3: Was once intended for HDTV (High Definition Television) applications, but HDTV is now part of MPEG-2. Thus MPEG-3 efforts were abandoned.
MPEG-4: Intended for very low bitrate coding of audio-visual programs, e.g., for interactive mobile multimedia applications at rates up to 64 kbits/sec.

MPEG-1 (and MPEG-2) specify a family of three audio coding schemes, called layer-1, layer-2, and layer-3. From layer-1 to layer-3, encoder complexity and performance (sound quality per bitrate) increase.
- Layer-1: From 32 kbit/sec to 448 kbit/sec
- Layer-2: From 32 kbit/sec to 384 kbit/sec
- Layer-3: From 32 kbit/sec to 320 kbit/sec

MP3: Typical Compression Ratios
- Layer-1: 4:1 (typ. rate 384 kbps)
- Layer-2: 6:1…8:1 (typ. rate 224 kbps)
- Layer-3: 10:1…12:1 (typ. rate 128 kbps)

MP3 Compression Techniques
Minimal Audition Threshold: The threshold of hearing is not uniform over frequency; the ear is most sensitive between 2 and 5 kHz. Sounds below the threshold need not be retained and coded.
Masking Effect: During strong sounds you do not hear the weakest sounds. Thus, using psychoacoustic modeling, not all sounds need to be coded.
Reservoir of Bytes: Some passages may not be codeable at a given rate without altering musical quality. MP3 then “borrows” bytes from other passages that can be coded at a lower rate.
Huffman Coding: Is used to code the information into variable-length codewords after compression.

Summary
- Sound waveforms need to be sampled and quantized for computers.
- Sampling converts continuous time to discrete time. Aliasing must be avoided.
- Quantization converts amplitude to a finite-wordlength value. Keep SQNR high.
- File size (uncompressed) in bytes is: #channels x Fs x bits/sample x time(sec) / 8
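The two numeric rules recapped here, the file-size formula and the SQNR of a quantized sinusoid, lend themselves to a quick check in code. The Python sketch below is illustrative only. All function names are hypothetical, and `measured_sqnr_db` assumes a uniform quantizer over the range -1 to 1 that reconstructs each sample at the center of its interval; the slides do not specify the quantizer, so the measured values need not match the tables exactly.

```python
import math

def uncompressed_rate(channels, fs_hz, bits_per_sample):
    """Rate R in bytes/sec: #channels x Fs x Q / 8."""
    return channels * fs_hz * bits_per_sample / 8

def file_size_bytes(channels, fs_hz, bits_per_sample, seconds):
    """Uncompressed size in bytes: rate R times duration."""
    return uncompressed_rate(channels, fs_hz, bits_per_sample) * seconds

def sqnr_rule_db(bits):
    """Rule for a sinusoid quantized to k bits: 6.02 x k + 1.76 dB."""
    return 6.02 * bits + 1.76

def measured_sqnr_db(bits, num_samples=100000):
    """Measure the SQNR of one finely sampled, full-scale sine period.

    Assumption: uniform quantizer on [-1, 1), reconstruction at the
    center of each interval.
    """
    levels = 2 ** bits
    step = 2.0 / levels
    signal_power = 0.0
    error_power = 0.0
    for n in range(num_samples):
        x = math.sin(2.0 * math.pi * (n + 0.5) / num_samples)
        index = int(math.floor((x + 1.0) / step))
        index = max(0, min(levels - 1, index))
        q = -1.0 + (index + 0.5) * step  # quantized value
        signal_power += x * x
        error_power += (q - x) ** 2
    return 10.0 * math.log10(signal_power / error_power)

if __name__ == "__main__":
    print("Hi-Fi stereo rate [bytes/sec]:", uncompressed_rate(2, 44100, 16))
    print("1 sec, mono, 8000 Hz, 16 bits [bytes]:",
          file_size_bytes(1, 8000, 16, 1))
    for k in (4, 8, 12):
        print(k, "bits: rule", round(sqnr_rule_db(k), 2),
              "dB, measured", round(measured_sqnr_db(k), 2), "dB")
```

Each additional bit adds about 6 dB of SQNR and costs proportionally more storage, which is the tradeoff behind the common parameter choices listed earlier.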