Audio

Audio Coding
Ketan Mayer-Patel
CS 294-9 :: Fall 2003
Overview of Today
• Sampling Techniques: PCM
– Linear
– μ-law
• Generic Coding Techniques: DPCM, ADPCM
• Psychoacoustic Coding: MPEG-1
• Speech Specific Techniques: Vocoding
Audio Signals
• Analog audio is basically voltage as a continuous function
of time.
• Unlike video which is 3D, audio is a 1D signal.
– Can capture without having to discretize the higher dimensions.
• Audio sampling basically boils down to quantizing signal
level to a set of values.
• Digital audio parameters:
– bits per sample
– sampling rate
– number of channels.
Sampling
• Pulse Amplitude Modulation (PAM)
– Each sample’s amplitude is represented by 1 analog value
• Sampling theory (Nyquist)
– If input signal has maximum frequency (bandwidth) f,
sampling frequency must be at least 2f
– With a low-pass filter to interpolate between samples, the
input signal can be fully reconstructed
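As a rough illustration of the 2f rule, the sketch below computes the frequency at which a pure tone appears once it has been sampled. The helper name `alias_frequency` and the 8 kHz rate are illustrative assumptions, not part of the lecture.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling, folded into 0..fs/2."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# At an 8 kHz sampling rate the Nyquist limit is 4 kHz.
print(alias_frequency(3000, 8000))  # below the limit: the tone keeps its frequency
print(alias_frequency(5000, 8000))  # above the limit: the tone folds back to 3 kHz
```

A 5 kHz tone sampled at 8 kHz is indistinguishable from a 3 kHz tone, which is why the input must be band-limited before sampling.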
PCM
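[Figure: a waveform quantized to 4-bit code words (0100 down to 0000, then 1001 to 1100); the gap between the waveform and the nearest level is the quantization error (“noise”)]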
• Pulse Code Modulation (PCM)
– Each sample’s amplitude represented by an integer code-word
– Each bit of resolution adds 6 dB of dynamic range
– Number of bits required depends on the amount of noise that is tolerated
$n = \frac{\mathrm{SNR} - 4.77}{6.02}$ (SNR in dB)
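A minimal sketch of the two ideas on this slide: picking a word length from a target SNR using the relation n = (SNR - 4.77) / 6.02, and mapping an amplitude to an integer code word. The function names and the symmetric rounding quantizer are assumptions made for illustration.

```python
import math

def bits_for_snr(snr_db):
    """Smallest whole number of bits n with 6.02 * n + 4.77 >= snr_db."""
    return max(1, math.ceil((snr_db - 4.77) / 6.02))

def quantize(x, bits):
    """Map x in [-1, 1] to the nearest of the evenly spaced integer levels."""
    max_code = 2 ** (bits - 1) - 1
    return max(-max_code, min(max_code, round(x * max_code)))

print(bits_for_snr(96))   # word length for a 96 dB target
print(quantize(0.5, 4))   # code word for half of full scale with 4 bits
```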
Linear PCM
• Uses evenly spaced quantization levels.
• Typically 16 bits per sample.
• Provides a large dynamic range.
• Difficult for humans to perceive quantization noise.
• Compact Discs
– 16-bit linear sampling
– 44.1 kHz sampling rate
– 2 channels
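The three digital audio parameters multiply out to the raw bit rate. A small sketch with the Compact Disc figures (the function name is hypothetical):

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd_rate = pcm_bit_rate(16, 44_100, 2)
print(cd_rate)  # 1411200 bits per second, about 1.4 Mb/s
```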
Non-linear Sampling
• If we try to use 8 bits per sample, dynamic
range is reduced significantly and
quantization noise can be heard.
• In particular, we end up with not enough
levels for the lower amplitudes.
• Solution is to sample more densely in the
lower amplitudes and less densely for the
higher amplitudes.
• Sort of like a log scale.
Non-linear Sampling Illustrated
[Figure: non-linear quantization curve, output level versus input level]
μ-law and A-law
• Non-linear sampling is called “companding”
• 8 bits companded provide dynamic range
equivalent to 12 bits.
• μ-law and A-law are companding standards
defined in G.711
• Difference is in exact shape of piece-wise
linear companding function.
μ-Law Companding
• Provides 14-bit quality (dynamic range) with an 8-bit
encoding
• Used in North American & Japanese ISDN voice service
• Simple to compute encoding
$f(x) = 127 \cdot \operatorname{sign}(x) \cdot \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1])
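The companding formula can be sketched directly in floating point. The value μ = 255 is an assumption here (it is the value commonly used for North American μ-law, but the slide does not state it), and `mu_law_expand` is simply the algebraic inverse of the formula, added for illustration.

```python
import math

MU = 255  # assumption: not given on the slide

def mu_law_compress(x, mu=MU):
    """f(x) = 127 * sign(x) * ln(1 + mu|x|) / ln(1 + mu), x clipped to [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    y = math.log1p(mu * abs(x)) / math.log1p(mu)
    return int(round(127 * math.copysign(y, x)))

def mu_law_expand(code, mu=MU):
    """Inverse of the formula above: recover an amplitude in [-1, 1]."""
    y = abs(code) / 127
    x = math.expm1(y * math.log1p(mu)) / mu
    return math.copysign(x, code)

for amplitude in (0.01, 0.1, 0.5, 1.0):
    code = mu_law_compress(amplitude)
    print(amplitude, code, round(mu_law_expand(code), 4))
```

Small amplitudes receive a disproportionate share of the 127 code values, which is the log-like behaviour described earlier.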
μ-Law Encoding
[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit μ-law encoding. Receiver: inverse table lookup → 14-bit decoding.]

Input Amplitude   Step Size   Segment   Quantization   Code Value
0-1               1           000       0000           0
1-3               2           000       0001           1
...               ...         ...       ...            ...
29-31             2           000       1111           15
31-35             4           001       0000           16
...               ...         ...       ...            ...
91-95             4           001       1111           31
95-103            8           010       0000           32
...               ...         ...       ...            ...
215-223           8           010       1111           47
223-239           16          011       0000           48
...               ...         ...       ...            ...
463-479           16          011       1111           63
μ-Law Decoding
[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit μ-law encoding. Receiver: inverse table lookup → 14-bit decoding.]

μ-Law Encoding   Multiplier   Decode Amplitude
0000000          1            0
0000001          2            2
...              ...          ...
0001111          2            30
0010000          4            33
...              ...          ...
0011111          4            93
0100000          8            99
...              ...          ...
0101111          8            219
0110000          16           231
...              ...          ...
0111111          16           471
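The encoding and decoding tables follow a regular pattern, so the lookup can also be computed. The sketch below is one way to do that for magnitudes only (the sign bit is left out). The bias of 33 and the cap of 8158 are assumptions chosen so that the arithmetic lands on the segment boundaries listed in the tables (1, 31, 95, 223, 479); the function names are hypothetical.

```python
BIAS = 33             # assumption: makes every segment start at a power of two
MAX_MAGNITUDE = 8158  # assumption: largest magnitude that fits in segment 111

def encode_magnitude(x):
    """Return the 7-bit code (3-bit segment, 4-bit quantization) for magnitude x."""
    biased = min(x, MAX_MAGNITUDE) + BIAS
    segment = biased.bit_length() - 6          # 33..63 -> 0, 64..127 -> 1, ...
    quantization = (biased >> (segment + 1)) & 0x0F
    return (segment << 4) | quantization

def decode_magnitude(code):
    """Return the midpoint of the input interval that maps to this code."""
    segment, quantization = code >> 4, code & 0x0F
    lower = (quantization + 16) << (segment + 1)   # biased lower edge
    return lower + (1 << segment) - BIAS           # add half a step, remove bias

for x in (0, 2, 30, 31, 100, 470):
    code = encode_magnitude(x)
    print(x, format(code, "07b"), decode_magnitude(code))
```

The decoded value is the midpoint of each input range, matching the Decode Amplitude column (for example, inputs 95-103 decode to 99).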
Difference Encoding
[Figure: sampled waveform on a 4-bit quantization scale (0100 down to 0000, then 1001 to 1100); successive samples differ by only a few levels]
• Differential-PCM (DPCM)
– Exploit temporal redundancy in samples
– Difference between 2 x-bit samples can be represented
with significantly fewer than x-bits
– Transmit the difference (rather than the sample)
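A minimal DPCM sketch, assuming the first difference is taken against an initial value of 0 and that differences are sent without further quantization (both are illustrative choices, not details from the slide):

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous, diffs = 0, []
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    previous, samples = 0, []
    for diff in diffs:
        previous += diff
        samples.append(previous)
    return samples

samples = [100, 102, 105, 104, 101]
diffs = dpcm_encode(samples)
print(diffs)               # after the first value, the differences are small
print(dpcm_decode(diffs))  # the original samples
```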
Slope Overload Problem
[Figure: rapidly changing waveform on the same 4-bit scale; the region where the coded differences cannot keep up with the signal is labelled “slope overload”]
• Differences in high frequency signals near the
Nyquist frequency cannot be represented with a
smaller number of bits!
– Error introduced leads to severe distortion in the higher
frequencies
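Slope overload can be reproduced with a DPCM coder whose differences are clamped to a fixed number of bits. This is a sketch under the assumption of a signed 4-bit difference (range -8 to 7); the function name and test signals are made up for illustration.

```python
def dpcm_clamped(samples, diff_bits):
    """DPCM where each difference is clamped to a signed diff_bits-wide range."""
    low, high = -(1 << (diff_bits - 1)), (1 << (diff_bits - 1)) - 1
    reconstructed, decoded = 0, []
    for sample in samples:
        # The encoder tracks the decoder's value so that errors do not accumulate.
        diff = max(low, min(high, sample - reconstructed))
        reconstructed += diff
        decoded.append(reconstructed)
    return decoded

slow = [0, 2, 4, 6, 8]     # small sample-to-sample changes
fast = [0, 20, 0, 20, 0]   # alternates every sample, like a tone near Nyquist
print(dpcm_clamped(slow, 4))
print(dpcm_clamped(fast, 4))
```

The slowly varying signal is reproduced exactly, while the fast one never reaches its peaks because each step is limited to 7.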
Adaptive DPCM (ADPCM)
• Use a larger step-size to encode differences
between high-frequency samples & a smaller
step-size for differences between low-frequency
samples
• Use previous sample values to estimate changes in
the signal in the near future
ADPCM
• To ensure differences are always small...
– Adaptively change the step-size (quanta)
– (Adaptively) attempt to predict next sample
value
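[Figure: ADPCM encoder block diagram. The predicted PCM sample is subtracted from the y-bit PCM sample and the difference quantizer emits the x-bit ADPCM “difference”; a dequantizer, step-size adjuster and predictor form the feedback path that produces the predicted PCM sample n+1.]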
IMA’s proposed ADPCM
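[Figure: IMA ADPCM encoder block diagram. PCM sample n–1, held in a register, is subtracted from the 16-bit PCM sample and the difference quantizer emits the 4-bit ADPCM difference; a dequantizer and step-size adjuster form the feedback path.]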
• Predictor is not adaptive and simply uses the last
sample value
• Quantization step-size increases logarithmically
with signal frequency
IMA Difference Quantization
[Figure: IMA ADPCM encoder block diagram with the difference quantizer highlighted; the 4-bit ADPCM difference is expressed in step-size units]

Quantization                                    Quantizer Output   Step-Size Multiples
difference < 1/4 step_size                      000                0.0
1/4 step_size < difference < 1/2 step_size      001                0.25
1/2 step_size < difference < 3/4 step_size      010                0.50
3/4 step_size < difference < step_size          011                0.75
step_size < difference < 5/4 step_size          100                1.0
5/4 step_size < difference < 3/2 step_size      101                1.25
3/2 step_size < difference < 7/4 step_size      110                1.5
7/4 step_size < difference                      111                1.75
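The quantization table amounts to counting how many quarter steps fit into the difference. A sketch, assuming the magnitude is coded in 3 bits as listed and the sign is carried separately (presumably in the fourth bit of the ADPCM difference); both function names are hypothetical.

```python
def quantize_difference(difference, step_size):
    """3-bit quantizer output: quarter steps contained in |difference|, capped at 7."""
    return min(7, int(abs(difference) // (step_size / 4)))

def dequantize(code, step_size):
    """Magnitude rebuilt by the decoder: (code / 4) multiples of the step size."""
    return (code / 4) * step_size

code = quantize_difference(5, 7)
print(format(code, "03b"), dequantize(code, 7))
```

With a step size of 7, a difference of 5 falls between 1/2 and 3/4 of a step, so the output is 010 and the decoder rebuilds 0.50 × 7 = 3.5.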
IMA Step-size Table
Index  Step Size   Index  Step Size   Index  Step Size   Index  Step Size   Index  Step Size
0      7           18     41          36     230         54     1282        72     7132
1      8           19     45          37     253         55     1411        73     7845
2      9           20     50          38     279         56     1552        74     8630
3      10          21     55          39     307         57     1707        75     9493
4      11          22     60          40     337         58     1878        76     10442
5      12          23     66          41     371         59     2066        77     11487
6      13          24     73          42     408         60     2272        78     12635
7      14          25     80          43     449         61     2499        79     13899
8      16          26     88          44     494         62     2749        80     15289
9      17          27     97          45     544         63     3024        81     16818
10     19          28     107         46     598         64     3327        82     18500
11     21          29     118         47     658         65     3660        83     20350
12     23          30     130         48     724         66     4026        84     22385
13     25          31     143         49     796         67     4428        85     24623
14     28          32     157         50     876         68     4871        86     27086
15     31          33     173         51     963         69     5358        87     29794
16     34          34     190         52     1060        70     5894        88     32767
17     37          35     209         53     1166        71     6484
Adaptive Step-size Selection
[Figure: IMA ADPCM encoder block diagram with the step-size adjuster expanded. The quantizer output selects an index adjustment by table lookup; the adjustment is added to the previous index (held in a register), the result is range limited to 0 to 88, and a step-size table lookup yields the new step-size.]

Quantization                                    Quantizer Output   Step-Size Table Index Adjustment   Step-Size Adjustment
difference < 1/4 step_size                      000                -1                                 × 0.91
1/4 step_size < difference < 1/2 step_size      001                -1                                 × 0.91
1/2 step_size < difference < 3/4 step_size      010                -1                                 × 0.91
3/4 step_size < difference < step_size          011                -1                                 × 0.91
step_size < difference < 5/4 step_size          100                2                                  × 1.21
5/4 step_size < difference < 3/2 step_size      101                4                                  × 1.46
3/2 step_size < difference < 7/4 step_size      110                6                                  × 1.77
7/4 step_size < difference                      111                8                                  × 2.14
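The index update and its effect on the step size can be sketched as follows. The observation that consecutive step-size table entries grow by a factor of roughly 1.1 is read off the table, not stated on the slides; the function name is hypothetical.

```python
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3-bit quantizer output

def next_index(index, code):
    """Apply the adjustment and keep the result inside the 0 to 88 range limit."""
    return max(0, min(88, index + INDEX_ADJUSTMENT[code]))

# If each table entry is about 1.1 times the previous one, an index
# adjustment of k scales the step size by about 1.1 ** k.
for adjustment in (-1, 2, 4, 6, 8):
    print(adjustment, round(1.1 ** adjustment, 2))
```

The printed factors (0.91, 1.21, 1.46, 1.77, 2.14) agree with the step-size adjustment column.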
IMA ADPCM Example
X     Δ     Step   Q     Adj   I    M      M × Step   Decode
150         7                  0                      150
155   5     7      010   -1    0    0.5    3.5        154
167   13    7      111   8     8    1.75   12         166
170   4     16     001   -1    7    0.25   4          170
250   80    14     111   8     15   1.75   24.5       195
250   55    31     111   8     23   1.75   54         249
250   1     66     000   -1    22   0.0    0          249
250   1     60     000   -1    21   0.0    0          249
200   -49   55     011   -1    20   0.75   -41        208

The X column continues with six more samples of 200 whose remaining columns are not filled in.

[Figure: IMA ADPCM encoder block diagram: input Xn, difference quantizer, step-size adjuster, dequantizer, and a register holding Xn–1]
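The worked example can be replayed in a few lines. This sketch follows the simplified arithmetic of the example (quarter-step multiples, decoded values rounded half up); it is not the bit-exact integer procedure of the IMA reference coder, and it assumes the first sample is transmitted as-is with a starting index of 0.

```python
import math

STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500,
    20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]

def encode_decode(samples):
    """Return (quantizer outputs, decoded samples) for a list of PCM samples."""
    decoded = [samples[0]]  # assumption: the first sample is sent uncompressed
    index, codes = 0, []
    for x in samples[1:]:
        step = STEP_SIZES[index]
        diff = x - decoded[-1]
        code = min(7, int(abs(diff) // (step / 4)))   # 3-bit magnitude
        delta = math.copysign((code / 4) * step, diff)
        decoded.append(math.floor(decoded[-1] + delta + 0.5))  # round half up
        codes.append(code)
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code]))
    return codes, decoded

codes, decoded = encode_decode([150, 155, 167, 170, 250, 250, 250, 250, 200])
print([format(c, "03b") for c in codes])
print(decoded)
```

The jump from 170 to 250 takes two samples to track (195, then 249), during which the index climbs from 7 to 23 and the step size grows from 14 to 66.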
Networking Considerations
• The IMA codec is reasonably robust to errors
– An interval with a low-level signal will correct any step-size error: every
quantizer output from 000 to 011 lowers the step-size table index by 1, so the
sender's and receiver's indices both reach the range limit of 0 and agree again
[Figure: IMA ADPCM decoder block diagram (dequantizer, step-size adjuster, register holding PCM sample n–1), shown next to the table of quantizer outputs and step-size table index adjustments]
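A small sketch of the recovery behaviour, assuming a transmission error has left the receiver's step-size index different from the sender's. The starting indices and the length of the quiet interval are arbitrary illustrative values.

```python
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]

def track_index(start_index, codes):
    """Follow the step-size table index through a sequence of quantizer outputs."""
    index = start_index
    for code in codes:
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code]))
    return index

quiet = [0] * 40  # a low-level interval: every quantizer output is 000
sender, receiver = track_index(20, quiet), track_index(35, quiet)
print(sender, receiver)  # both end at the range limit of 0
```

The range limit is what makes the two sides converge: without it, a constant offset between the indices would persist.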
Psychoacoustic Properties
[Figure: threshold of hearing. Sound level (0 to 100 dB) versus frequency (0.02 to 20 kHz); the region above the curve is audible, the region below is inaudible.]
• Human perception of sound is a function of frequency
and signal strength
– (MPEG exploits this relationship.)
Auditory Masking
[Figure: auditory masking. Sound level (0 to 100 dB) versus frequency (0.02 to 20 kHz); a strong masking tone raises the audibility threshold around it, so a nearby weaker masked tone falls in the inaudible region.]
• The presence of tones at certain frequencies makes
us unable to perceive tones at other “nearby”
frequencies
– Humans cannot distinguish between tones within 100
Hz at low frequencies and 4 kHz at high frequencies
MPEG Encoder Block Diagram
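[Figure: MPEG encoder block diagram. PCM audio samples (32, 44.1, 48 kHz) feed a mapping stage and a psychoacoustic model; the mapped samples pass through the quantizer and coding stage and then frame packing, which also takes ancillary data and emits the encoded bitstream.]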
Subband Filter
• Transforms signal from time domain to
frequency domain.
– 32 PCM samples yields 32 subband samples.
• Each subband corresponds to a freq. band evenly
spaced from 0 to Nyquist freq.
– Filter actually works on a window of 512
samples that is shifted over 32 samples at a time.
• Subband coefficients are analyzed with
psychoacoustic model, quantized, and coded.
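The even spacing of the 32 subbands is easy to make concrete. A sketch, using 48 kHz (one of the supported sampling rates) as the example; the function name is hypothetical.

```python
def subband_edges(sample_rate_hz, subbands=32):
    """Evenly spaced (low, high) band edges in Hz from 0 to the Nyquist frequency."""
    width = (sample_rate_hz / 2) / subbands
    return [(i * width, (i + 1) * width) for i in range(subbands)]

edges = subband_edges(48_000)
print(edges[0], edges[-1])  # (0.0, 750.0) (23250.0, 24000.0)
```

At 48 kHz each subband is 750 Hz wide, which is far coarser than the roughly 100 Hz resolution of hearing at low frequencies mentioned on the masking slide.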
Layer 1
• 384 samples per frame.
• Iterative bit allocation process:
– For each subband, determine MNR.
– Increase number of quantization bits for
subband with smallest MNR.
– Iterate until all bits used.
• Fixed allocation of bits among subbands for
a particular frame.
• Up to 448 kb/s
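The iterative bit allocation loop can be sketched as a greedy procedure. This is a simplification, not the allocation procedure of the standard: it assumes MNR = SNR - SMR (SMR being the signal-to-mask ratio from the psychoacoustic model), that each quantization bit adds about 6.02 dB of SNR, and that the bit budget is counted in bits per subband sample. All names and the sample SMR values are made up.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Repeatedly give one more bit to the subband with the smallest MNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits_per_band]
        if not candidates:
            break
        # MNR of band b under the assumptions above: 6.02 * bits[b] - smr_db[b]
        worst = min(candidates, key=lambda b: 6.02 * bits[b] - smr_db[b])
        bits[worst] += 1
    return bits

allocation = allocate_bits([30.0, 12.0, -5.0], total_bits=8)
print(allocation)
```

The third band starts with its signal already 5 dB below the masking threshold, so it receives no bits at all.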
Layer 2
• 1152 samples per frame.
• Iterative bit allocation.
• Subband allocation is dynamic.
• Up to 384 kb/s
Layer 3
• 1152 samples
– Up to 320 kb/s
• Each subband further analyzed using MDCT
to create 576 frequency lines.
– 4 different windowing schemes depending on
whether samples contain “attack” of new
frequencies.
• Lots of bit allocation options for quantizing
frequency coefficients.
• Quantized coefficients Huffman coded.
Vo-coding
• Concept: Develop a mathematical
model of the vocal cords & throat
– Derive/compute model parameters for
a short interval and transmit to the
decoder
– Use the parameters to synthesize
speech at the decoder
• So what is a good model?
– A “buzzer” in a “tube”!
– The buzzer is characterized by its
intensity & pitch
– The tube is characterized by its
formants
Vocoding - Basic Concepts
[Figure: spectrum of a speech signal, amplitude (0 to 75) versus frequency (kHz)]
• Formant — frequency maxima & minima in
the spectrum of the speech signal
• Vocoders group and code portions of the
signal by amplitude
“Buzzer” and “Tube” Model
“yadda yadda yadda”
• Vocoding principles:
– voice = formants + buzz pitch & intensity
– voice – estimated formants = “residue”
• Linear Predictive Coding (LPC)
– A sample is represented as a linear combination of p
previous samples
$y(n) = \sum_{k=1}^{p} a_k \, y(n - k) + G \, x(n)$
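The LPC relation can be run directly as a synthesis filter. A minimal sketch, assuming samples before the start of the signal are zero; the single coefficient and the impulse excitation are arbitrary illustrative inputs.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """y(n) = sum over k = 1..p of a_k * y(n - k), plus G * x(n)."""
    output = []
    for n, x in enumerate(excitation):
        predicted = sum(a * output[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        output.append(predicted + gain * x)
    return output

# One coefficient (p = 1) excited by a single pulse: a decaying response.
response = lpc_synthesize([0.5], 1.0, [1.0, 0.0, 0.0, 0.0])
print(response)  # [1.0, 0.5, 0.25, 0.125]
```

In the buzzer-and-tube picture, the excitation x(n) plays the role of the buzzer and the coefficients a_k shape it the way the tube does.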
LPC
• Decoder artificially generates speech via formant synthesis
– A mathematical simulation of the vocal tract as a series of bandpass
filters
– Encoder codes & transmits filter coefficients, pitch period, gain
factor, & nature of excitation
• Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
• Digital cellular standard GSM 06.10 (13 kbps)
– Code Excited Linear Predictive Coder (CELP)
• US Federal Standard 1016 (4.8 kbps)
– Linear Predictive Coder (LPC)
• US Federal Standard 1015 (2.4 kbps)
Networking Concerns
• Audio bandwidth is actually quite small.
• But human sensitivity to loss and noise is
quite high.
• Networking concerns:
– Loss concealment
– Jitter control
• Especially for telephony applications.