Digitized Sound Sampling of Waveforms

Digitized Sound
Telecommunications 1
P. Mathys
Sampling of Waveforms
• Computers cannot directly deal with continuous-time (CT) waveforms.
• A CT waveform needs to be sampled at regular time intervals before it can be stored and processed by a computer.
• The sampling operation converts the CT signal into a discrete-time (DT) signal or sequence.
Sampling of Waveforms
• The next slide shows a sinewave and its samples after sampling with rate Fs = 16000 Hz (16000 samples/sec).
• The samples are marked with “o” in the graph.
• Each sample is a real number, e.g., 0.70710678, and a sampled waveform is a sequence of such numbers.
Sampling of 1000 Hz Sinewave
16000 Samples/sec ==> Fs = 16000 Hz
16 Samples/Period
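As a rough illustration of the sampling operation, the following Python sketch evaluates a 1000 Hz sinewave at Fs = 16000 Hz. The function name `sample_sine` and its parameters are chosen here for illustration only and are not part of the course material.

```python
import math

def sample_sine(f0, fs, num_samples, amplitude=1.0):
    """Return num_samples samples of a sinewave of frequency f0 (Hz) taken at rate fs (Hz)."""
    return [amplitude * math.sin(2 * math.pi * f0 * n / fs) for n in range(num_samples)]

fs = 16000                        # sampling rate in Hz
f0 = 1000                         # tone frequency in Hz
samples_per_period = fs / f0      # number of samples in one period of the tone
x = sample_sine(f0, fs, 16)       # one full period of the sampled waveform

print(samples_per_period)
print(round(x[2], 8))             # sample taken at 45 degrees into the period
```

The third sample (index 2) falls at one eighth of a period, which is where the value 0.70710678 quoted on the slide comes from.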
Sampling at Higher Rate
If we sample at a higher rate, we expect to:
• Need more memory.
• Need larger files to store all samples.
• Take longer to transmit the samples over a network.
• Reduce the approximation error that results from sampling a CT signal.
Sampling at Higher Rate
32000 Samples/sec ==> Fs = 32000 Hz
32 Samples/Period
Sampling at Lower Rate
• If we sample at a lower rate, we will use less storage space, and transmission, e.g., over the Internet, will be faster.
• But how much will the quality degrade?
• What is the minimum sampling rate that is needed?
Sampling at Lower Rate
8000 Samples/sec ==> Fs = 8000 Hz
8 Samples/Period
Using Different Sampling Rates
Sampling Rate Fs    File Size (1 sec, 16 bits)    Sound Sample
8,000 Hz            16 kB                         sin1000_8.wav
16,000 Hz           32 kB                         sin1000_16.wav
32,000 Hz           64 kB                         sin1000_32.wav
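The file sizes above can be checked with a short sketch that builds one second of a 1000 Hz tone as a mono, 16-bit WAV file in memory using Python's standard `wave` module. The helper names `tone_wav_bytes` and `sample_data_bytes` are made up for this example, and the sizes printed count only the sample data, not the small WAV header.

```python
import io
import math
import struct
import wave

def tone_wav_bytes(f0, fs, seconds=1.0):
    """Return the bytes of a mono, 16-bit WAV file holding a sinewave of frequency f0."""
    num_samples = int(fs * seconds)
    peak = 32767                                   # largest 16-bit signed value
    frames = b"".join(
        struct.pack("<h", round(peak * math.sin(2 * math.pi * f0 * n / fs)))
        for n in range(num_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                          # 2 bytes = 16 bits per sample
        w.setframerate(fs)
        w.writeframes(frames)
    return buf.getvalue()

def sample_data_bytes(wav_bytes):
    """Size of the sample data (excluding the WAV header) in bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() * r.getsampwidth() * r.getnchannels()

for fs in (8000, 16000, 32000):
    size = sample_data_bytes(tone_wav_bytes(1000, fs))
    print(f"Fs = {fs:5d} Hz: {size / 1000:.0f} kB of sample data")
```

Doubling the sampling rate doubles the number of samples, and therefore the file size, for the same duration.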
What Sampling Rate is Best?
• The previous three examples of sampling a 1000 Hz tone at Fs=8000 Hz, Fs=16000 Hz, and Fs=32000 Hz show no difference in sound quality.
• But let’s look at another example where we sample a 5000 Hz tone at Fs=16000 Hz and Fs=8000 Hz.
5000 Hz Sinewave, Fs=16000 Hz
sin5000_16.wav
16 samples in 1 msec, 3.2 samples per period
5000 Hz Sinewave, Fs=8000 Hz
Fs=8000: sin5000_8.wav
Fs=16000: sin5000_16.wav
Leaving out every second sample (x) we have 8 samples (o) in 1 msec, 1.6 samples per period
5000 Hz Sinewave
• The 5000 Hz sinewave sounds different at Fs=8000 Hz and at Fs=16000 Hz.
• The reason is that the soundcard, which converts the samples back to a CT waveform, tries to find the “smoothest” (i.e., the lowest frequency) waveform that passes through all samples, as shown in the next slide.
5000 Hz Sinewave, Fs=8000 Hz
Green: sin5000_8.wav
Blue: sin5000_16.wav
Green curve is “smoother” than blue (dashed) curve.
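To make the "lowest frequency waveform through all samples" idea concrete, the sketch below computes which frequency a sampled tone is heard at. The function `alias_frequency` is a name introduced here for illustration. It relies on the standard fact that a tone at f0 and a tone at Fs - f0 produce the same samples up to a sign change.

```python
import math

def alias_frequency(f0, fs):
    """Lowest frequency in [0, fs/2] whose samples match those of f0 (up to sign) at rate fs."""
    f = f0 % fs
    return min(f, fs - f)

print(alias_frequency(5000, 16000))   # 5000: the tone is preserved
print(alias_frequency(5000, 8000))    # 3000: the tone is aliased

# Check sample by sample that 5000 Hz and 3000 Hz are indistinguishable at Fs = 8000 Hz.
fs = 8000
max_diff = max(
    abs(math.sin(2 * math.pi * 5000 * n / fs) + math.sin(2 * math.pi * 3000 * n / fs))
    for n in range(64)
)
print(max_diff < 1e-9)
```

At Fs=8000 Hz the samples of the 5000 Hz tone are therefore reconstructed as a 3000 Hz tone, which is why the two sound files differ.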
Aliasing
• The effect whereby tone frequencies are altered because of a reduction in sampling rate is called aliasing.
• Aliasing affects all sampled sound sequences, whether they be pure tones, music or speech. Music examples:

Original         Fs=44100 Hz    muss44.wav
w/o Aliasing     Fs=11025 Hz    muss11.wav
with Aliasing    Fs=11025 Hz    muss11_ali.wav
Aliasing
Aliasing folds high frequency components down to low frequencies. This is visible and audible especially well in the portions marked in red.
Sound sample: muss11_ali_orig.wav
With Aliasing: muss11_ali44s.wav
Original: muss44s.wav
Aliasing
To prevent aliasing from occurring, the sound file needs to be lowpass-filtered before the sampling rate is reduced.
Sound sample: muss11_ali_noali.wav
With Aliasing: muss11_ali44s.wav
No Aliasing: muss11_44s.wav
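The lowpass-filter-then-downsample step can be sketched as follows. The slides do not say which filter was used for the sound examples, so the windowed-sinc FIR design, the 5000 Hz cutoff, the 101 taps, and the 8000 Hz test tone below are all assumptions made for illustration.

```python
import math

def lowpass_fir(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc lowpass FIR (Hamming window), normalized to unit gain at DC."""
    fc = cutoff_hz / fs                      # normalized cutoff in cycles/sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(x, taps):
    """Direct convolution, keeping only outputs where the filter fully overlaps the input."""
    m = len(taps)
    return [sum(taps[j] * x[i + m - 1 - j] for j in range(m)) for i in range(len(x) - m + 1)]

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 44100
factor = 4                                   # 44100 Hz -> 11025 Hz
tone = [math.sin(2 * math.pi * 8000 * n / fs) for n in range(4000)]

aliased = tone[::factor]                     # no filtering: 8000 Hz folds down to 3025 Hz
filtered = fir_filter(tone, lowpass_fir(5000, fs))[::factor]

print(rms(aliased), rms(filtered))
```

Without the filter the 8000 Hz tone survives at full strength as a 3025 Hz alias. With the filter it is strongly attenuated before the sampling rate is reduced, so little energy is left to fold down.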
Nyquist Rate
• Let B (in Hz) be the highest frequency contained in a sound waveform.
• Sampling Theorem: To avoid distortion due to aliasing, a signal of bandwidth B must use a sampling rate Fs>2B.
• The sampling rate of 2B samples/sec is called the Nyquist rate.
Common Sampling Rates
Application                 Sampling Rate Fs [Hz]    Bandwidth B [Hz]
Telephony                   8000                     3000
Music, Low Quality          11025                    5000
Music                       22050                    10000
Music, Hi-Fi                44100                    20000
DAT (Digital Audio Tape)    48000                    22000
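As a quick check of the sampling theorem against these common rates, the following sketch compares each Fs with the Nyquist rate 2B. The row labels are read from the table on this slide, and the helper names are illustrative.

```python
# (application, sampling rate Fs in Hz, bandwidth B in Hz)
COMMON_RATES = [
    ("Telephony", 8000, 3000),
    ("Music, Low Quality", 11025, 5000),
    ("Music", 22050, 10000),
    ("Music, Hi-Fi", 44100, 20000),
    ("DAT", 48000, 22000),
]

def nyquist_rate(bandwidth_hz):
    """Nyquist rate 2B for a signal whose highest frequency is B."""
    return 2 * bandwidth_hz

def avoids_aliasing(fs, bandwidth_hz):
    """True if the sampling rate satisfies Fs > 2B."""
    return fs > nyquist_rate(bandwidth_hz)

for name, fs, b in COMMON_RATES:
    print(f"{name:20s} Fs={fs:6d}  2B={nyquist_rate(b):6d}  ok={avoids_aliasing(fs, b)}")
```

Every listed rate leaves some margin above 2B, which gives the lowpass filter room for its transition band.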
Quantization
• Sampling alone is not sufficient to convert a waveform to a format that a computer can handle.
• The problem is that each sample is a real number that requires infinite precision for processing and storing.
• Computers have finite word length and can only approximate real numbers.
Quantization
• For example, pi = 3.14159265358979323846264… is a real number. On a computer that can represent 5 decimal digits we would approximate pi as 3.1416.
• Typical computer word lengths are 8, 16, 32, and 64 bits (approximately 2, 4, 9, and 19 decimal digits).
Quantization
• The process of approximating a real number by a number with finite wordlength is called quantization.
• Unlike sampling (which can be undone if Fs>2B), quantization is not a reversible process.
• However, by choosing a large enough wordlength, the quantization error can be made as small as desired.
Quantization: Examples
The next slides show the quantization of a sine wave to 2, 3, and 4 bits:
• 2 Bits (4 levels): 00, 01, 10, 11
• 3 Bits (8 levels): 000, 001, 010, 011, 100, 101, 110, 111
• 4 Bits (16 levels): 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
Quantization: Example (2 Bits)
16 bits: q1000_16.wav
2 bits: q1000_2.wav
Quantization levels (top to bottom): 11, 10, 01, 00
Digital sequence of samples (Fs=16000 Hz):
10,11,11,11,11,11,11,10,01,00,00,00,00,...
Quantization: Example (2 Bits)
Quantization is a non-linear operation that introduces new frequency components. Here the new components appear at odd multiples of the fundamental frequency.
Sound sample: q1000h_16_2.wav
16 bits: q1000_16.wav
2 bits: q1000_2.wav
Quantization: Example (3 Bits)
16 bits: q1000_16.wav
3 bits: q1000_3.wav
Quantization levels (top to bottom): 111, 110, 101, 100, 011, 010, 001, 000
Digital sequence of samples (Fs=16000 Hz):
100,110,111,111,111,111,110,100,011,001,...
Quantization: Example (4 Bits)
16 bits: q1000_16.wav
4 bits: q1000_4.wav
Quantization levels (top to bottom): 1111, 1110, 1101, 1100, 1011, 1010, 1001, 1000, 0111, 0110, 0101, 0100, 0011, 0010, 0001, 0000
Digital sequence of samples (Fs=16000 Hz):
1001,1100,1110,1111,1111,1110,1100,1001,0110,...
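The digital sequences in the 2, 3, and 4 bit examples can be reproduced with a simple uniform quantizer, sketched below. Two things are assumptions rather than facts from the slides: the amplitude range [-1, 1] is split into 2^k equal steps numbered from the bottom, and the sinewave is sampled with a half-sample time offset. That offset was picked only because it happens to match the listed sequences.

```python
import math

def quantize(x, bits):
    """Map x in [-1, 1] to an integer level index 0 .. 2**bits - 1 using uniform steps."""
    levels = 2 ** bits
    index = int(math.floor((x + 1.0) / 2.0 * levels))
    return min(max(index, 0), levels - 1)    # clamp so that x = 1.0 maps to the top level

def quantized_codes(bits, count, f0=1000, fs=16000, offset=0.5):
    """Binary codewords for `count` sinewave samples; `offset` (in samples) is an assumption."""
    codes = []
    for n in range(count):
        x = math.sin(2 * math.pi * f0 * (n + offset) / fs)
        codes.append(format(quantize(x, bits), f"0{bits}b"))
    return codes

print(",".join(quantized_codes(2, 13)))
print(",".join(quantized_codes(3, 10)))
print(",".join(quantized_codes(4, 9)))
```

Each additional bit halves the step size, so the codewords follow the sinewave more closely.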
Quantization Error
• The difference q(t)-y(t) between the quantized signal and the original signal (shown dotted in red in the previous slides) is called quantization error.
• The quantization error becomes larger if fewer bits are used. Quantization is a nonlinear process that introduces new and unwanted frequency components.
Signal to Quantization Noise Ratio
• The signal to quantization noise ratio (SQNR) is a measure to judge the effect of quantization.
• SQNR is computed as:
  SQNR = 10*log(signal power / quantization error power)
  where the logarithm is taken to the base 10 and SQNR is measured in dB (decibels).
Signal to Quantization Noise Ratio
• If the signal power is 10 times higher than the quantization error power, then the SQNR is 10 dB; if it is 100 times higher, the SQNR is 20 dB; if it is 1000 times higher, the SQNR is 30 dB, etc.
• For quantization of a sinusoid to k bits we have
  SQNR = 6.02 x k + 1.76 dB
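Both SQNR formulas are easy to evaluate directly. The sketch below is only a calculator for the two expressions on these slides, and the function names are chosen here for illustration.

```python
import math

def sqnr_db(signal_power, quant_error_power):
    """SQNR in dB from the signal power and the quantization error power."""
    return 10 * math.log10(signal_power / quant_error_power)

def sqnr_sine_db(bits):
    """SQNR in dB for a sinusoid quantized to `bits` bits (6.02 x k + 1.76)."""
    return 6.02 * bits + 1.76

for ratio in (10, 100, 1000):
    print(f"power ratio {ratio:5d} -> {sqnr_db(ratio, 1):.1f} dB")

for k in (16, 8, 4, 3, 2, 1):
    print(f"{k:2d} bits -> {sqnr_sine_db(k):.1f} dB")
```

Each extra bit adds about 6 dB, i.e., it reduces the quantization error power by roughly a factor of four.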
Quantization of Sinewaves
Quantization    SQNR [dB]    Sound
16 bits         98.1         q1000_16.wav
8 bits          49.9         q1000_8.wav
4 bits          25.8         q1000_4.wav
3 bits          19.8         q1000_3.wav
2 bits          13.8         q1000_2.wav
1 bit           7.8          q1000_1.wav
Quantization of Speech/Music
Quantization    SQNR [dB]    Sound
16 bits         ~84          tlp_16.wav
8 bits          35.0         tlp_8.wav
6 bits          22.6         tlp_6.wav
4 bits          10.1         tlp_4.wav
2 bits          -3.5         tlp_2.wav
SQNR for 16-bit Quantization
• 16-bit quantization (e.g., for CD) yields approximately 90 dB SQNR.
• What does that mean?
  10*log(signal pwr/noise pwr) = 90
  ⇒ signal pwr/noise pwr = 10^9
• That is, the quantization noise power is only one billionth of the signal power.
Common Parameters for Sound
• The table on the next slide shows common sampling rates Fs and quantizations Q (in bits/sample) for sound waveforms.
• The uncompressed rate R (in bytes/sec) is computed using:
  R = #channels x Fs x Q / 8
Common Parameters for Sound
Application             # of Channels    Fs in Hertz    Q in bits    Rate R in kB/sec
Speech (Telephony)      1                8000           8            8
Speech                  1                11025          8            11.025
Music, Low Quality      1                11025          16           22.05
Speech, High Quality    1                22050          16           44.1
Music, Stereo           2                22050          16           88.2
Music, Hi-Fi            1                44100          16           88.2
Music, Hi-Fi Stereo     2                44100          16           176.4
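The rate column follows directly from R = #channels x Fs x Q / 8. The sketch below recomputes it. The pairing of application labels with rows is read from the table on this slide, and the function name is illustrative.

```python
def uncompressed_rate(channels, fs, q_bits):
    """Uncompressed data rate in bytes/sec: R = #channels x Fs x Q / 8."""
    return channels * fs * q_bits / 8

# (application, number of channels, Fs in Hz, Q in bits/sample)
ROWS = [
    ("Speech (Telephony)", 1, 8000, 8),
    ("Speech", 1, 11025, 8),
    ("Music, Low Quality", 1, 11025, 16),
    ("Speech, High Quality", 1, 22050, 16),
    ("Music, Stereo", 2, 22050, 16),
    ("Music, Hi-Fi", 1, 44100, 16),
    ("Music, Hi-Fi Stereo", 2, 44100, 16),
]

for name, channels, fs, q_bits in ROWS:
    rate_kb = uncompressed_rate(channels, fs, q_bits) / 1000
    print(f"{name:22s} {rate_kb:8.3f} kB/sec")
```

Multiplying R by the duration in seconds gives the uncompressed file size in bytes.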
The MPEG Standards
• MPEG stands for Moving Picture Experts Group. This group works on standards for coding of moving images and sound.
• MPEG standards can be obtained from ISO (International Organization for Standardization) or, in the US, from ANSI (American National Standards Institute).
The MPEG Standards
• MPEG-1: Standard for compression and coding of relatively low resolution video (352x240 pixels, 30 frames/s) at 1.152 Mbits/sec (or 144 kB/sec) for CD-ROM.
• MPEG-2: An extension of MPEG-1 for high-quality digital video using rates in the range 1.5 ... 15 Mbits/sec.
The MPEG Standards
• MPEG-3: Was once intended for HDTV (High Definition Television) applications, but HDTV is now part of MPEG-2. Thus MPEG-3 efforts were abandoned.
• MPEG-4: Intended for very low bitrate coding of audio-visual programs, e.g., for interactive mobile multimedia applications at rates up to 64 kbits/sec.
The MPEG Standards
• MPEG-1 (and MPEG-2) specify a family of three audio coding schemes, called layer-1, layer-2, and layer-3.
• From layer-1 to layer-3, encoder complexity and performance (sound quality per bitrate) increase.
  • Layer-1: From 32 kbit/sec to 448 kbit/sec
  • Layer-2: From 32 kbit/sec to 384 kbit/sec
  • Layer-3: From 32 kbit/sec to 320 kbit/sec
MP3: Typical Compression Ratios
• Layer-1: 4:1 (typ. rate 384 kbps)
• Layer-2: 6:1…8:1 (typ. rate 224 kbps)
• Layer-3: 10:1…12:1 (typ. rate 128 kbps)
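These ratios are consistent with taking uncompressed CD-quality stereo (2 channels x 44100 Hz x 16 bits) as the reference. The slides do not state the reference explicitly, so that choice is an assumption in the sketch below.

```python
# Assumed reference: uncompressed CD-quality stereo, in kbit/s (1411.2 kbit/s).
CD_BITRATE_KBPS = 2 * 44100 * 16 / 1000

def compression_ratio(coded_kbps, source_kbps=CD_BITRATE_KBPS):
    """Ratio of the uncompressed bitrate to the coded bitrate."""
    return source_kbps / coded_kbps

for layer, kbps in (("Layer-1", 384), ("Layer-2", 224), ("Layer-3", 128)):
    print(f"{layer}: {compression_ratio(kbps):.1f}:1 at {kbps} kbps")
```

Under that assumption, Layer-3 at 128 kbps comes out at roughly 11:1, inside the 10:1…12:1 range quoted above.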
MP3 Compression Techniques
• Minimal Audition Threshold: The hearing threshold is not flat across frequency; the ear is most sensitive between 2 and 5 kHz. Sounds below the threshold need not be retained and coded.
• Masking Effect: During strong sounds you do not hear the weakest sounds. Thus, using psychoacoustic modeling, not all sounds need to be coded.
MP3 Compression Techniques
• Reservoir of Bytes: Some passages may not be codeable at a given rate without altering musical quality. MP3 then “borrows” bytes from other passages that can be coded at a lower rate.
• Huffman Coding: Is used to code the information into variable length codewords after compression.
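To illustrate what variable length codewords buy, here is a generic Huffman coder in Python. It builds a code from the symbol counts of a made-up list of quantized values. It is a textbook sketch for illustration only and is not meant to reproduce the coding tables or bitstream format of the MP3 standard.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)      # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

data = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]   # hypothetical quantized values
code = huffman_code(data)
encoded = "".join(code[s] for s in data)

print(code)
print(len(encoded), "bits instead of", 2 * len(data), "with fixed 2-bit codewords")
```

Frequent values get short codewords and rare values get long ones, so the total bit count drops without losing any information.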
Summary
• Sound waveforms need to be sampled and quantized for computers.
• Sampling converts continuous time to discrete time. Aliasing must be avoided.
• Quantization converts amplitude to a finite wordlength value. Keep SQNR high.
• File size (uncompressed) in bytes is:
  #channels x Fs x bits/sample x time(sec) / 8