Speech Coding

Speech Coding
Using LPC
What is Speech Coding?
• Speech coding is the procedure of transforming a speech signal
into a more compact form for:
  – Transmission
  – Fitting the available bandwidth
  – Encryption
Uncompressed Speech signal
• Analog speech is a bandpassed signal between 200 and 3400 Hz.
• Uncompressed digital speech is a bit stream at 64 kbit/s
(8000 samples per second at 8 bits per sample).
• Transmission technology must transmit the signals from point A to point B:
  – with minimum degradation
  – using minimum bandwidth
Speech coding
• By coding we mean an efficient representation of the signal
  – COMPRESSION
• The main approaches:
  – waveform coding
  – smart quantizers
  – transform coding
  – parametric / hybrid coding
How each of these works:
• Waveform coders: try to find an efficient representation of the
waveform, directly.
• Transform coders: try to find an efficient representation in the
frequency domain (FFT, etc.).
• Parametric coders: try to find a small set of parameters that are an
efficient representation of the signal.
[Figure: exc. → H(ω) → speech]
Comparison of speech coders
LPC (Linear Predictive coding)
 LPC is a model for signal production: it is based
on the assumption that the speech signal is
produced by a very specific model.
Speech Production in Humans
• The speech signal is created by:
  – A pressure source (lungs), exciting ...
  – A filter (vocal tract: pharynx - mouth [soft palate, tongue] - nasal cavity)
For the DSP Engineer
• An excitation source
• A time-varying filter
[Figure: excitation → filter H(t, ω) → speech]
The model and its representation
• The LPC model looks at speech as:
  – Excitation:
    periodic (voiced), originating in the larynx
    noise (unvoiced), fricative, produced in the mouth
  – An all-pole filter representing the vocal tract
[Figure: excitation → all-pole filter H(ω)]
Block Diagram
Why the name
“Linear Predictive Coding”
 It is assumed that the new sample is the
weighted linear combination of previous
samples
$$s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\, e(n)$$
Z-Plane Representation
 In the z-plane we can write the model as a transfer
function:
$$H(z) = \frac{G}{1 - \sum_{i=1}^{p} a_i z^{-i}}$$
• Clearly this transfer function has only poles which is why it represents an all pole filter.
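To make the all-pole model concrete, here is a minimal Python sketch that runs the filter as a difference equation. It assumes the sign convention in which each output sample is the weighted sum of previous outputs plus the scaled excitation, and that samples before the start of the frame are zero. The name `lpc_synthesize` is illustrative.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """Run the excitation through the all-pole filter.

    coeffs[i] is the weight of the output i + 1 samples in the past;
    outputs before the start of the frame are taken as zero (an assumption).
    """
    p = len(coeffs)
    out = []
    for n, e_n in enumerate(excitation):
        prediction = sum(
            coeffs[i] * out[n - 1 - i] for i in range(p) if n - 1 - i >= 0
        )
        out.append(prediction + gain * e_n)
    return out


# A single pole at z = 0.5 driven by a unit impulse decays geometrically.
print(lpc_synthesize([0.5], 1.0, [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```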
Mathematical analysis
 Reminder: our problem is to find the LPC
parameters, for a given speech signal. This is
called the Inverse Problem.
 How do we find the set of parameters that
gives the best match to the signal?
What are these Parameters
 The Coefficients of the All Pole Filter
 Pitch of the speech
 How do we find the Coefficients:

  – Least squares
• Formulation:
  – Given a signal s(n);
  – Defining an error as:
$$e(n) = s(n) - \sum_{i=1}^{p} a_i\, s(n-i)$$
  – Find the set of a_i that will minimize the total squared error
    (equivalently, the mean square error over the window):
$$E = \sum_{n} e^2(n)$$
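The error criterion can be evaluated directly for any candidate set of coefficients. The sketch below assumes the error is the signal minus its linear prediction and that samples before the window are zero. The function names are illustrative.

```python
def prediction_error(signal, coeffs):
    """Return e(n) = s(n) minus the weighted sum of the previous len(coeffs) samples."""
    p = len(coeffs)
    errors = []
    for n, s_n in enumerate(signal):
        prediction = sum(
            coeffs[i] * signal[n - 1 - i] for i in range(p) if n - 1 - i >= 0
        )
        errors.append(s_n - prediction)
    return errors


def squared_error(signal, coeffs):
    """Return E, the sum of e(n) squared over the window."""
    return sum(e * e for e in prediction_error(signal, coeffs))


signal = [1.0, 0.5, 0.25, 0.125]
print(squared_error(signal, [0.5]))  # the true predictor leaves only the first sample
print(squared_error(signal, [0.0]))  # no prediction: E is just the signal energy
```

The least-squares solution is the coefficient set for which `squared_error` is smallest.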
Solution:
• Simply equate the derivative of E to zero:
$$\frac{\partial E}{\partial a_i} = 0, \quad i = 1 \ldots p$$
• Which gives us the Normal Equations:
$$\sum_{k=1}^{p} a_k \sum_{n} s(n-k)\, s(n-i) = \sum_{n} s(n)\, s(n-i), \quad i = 1 \ldots p$$
• These are no more than p linear equations in
p unknowns...
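The step from the zero-derivative condition to the normal equations can be spelled out as follows. The derivation assumes the error is defined as the signal minus its linear prediction; the opposite sign convention would flip the sign of the right-hand side.

```latex
\begin{aligned}
E &= \sum_n \Big( s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \Big)^2 \\
\frac{\partial E}{\partial a_i}
  &= -2 \sum_n \Big( s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \Big)\, s(n-i) = 0 \\
\Rightarrow\quad
\sum_{k=1}^{p} a_k \sum_n s(n-k)\, s(n-i) &= \sum_n s(n)\, s(n-i),
  \qquad i = 1 \ldots p
\end{aligned}
```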
Or in matrix form:
$$
\begin{bmatrix}
\sum_n s(n-1)s(n-1) & \sum_n s(n-1)s(n-2) & \cdots & \sum_n s(n-1)s(n-p) \\
\sum_n s(n-2)s(n-1) & \sum_n s(n-2)s(n-2) & \cdots & \sum_n s(n-2)s(n-p) \\
\vdots & \vdots & \ddots & \vdots \\
\sum_n s(n-p)s(n-1) & \sum_n s(n-p)s(n-2) & \cdots & \sum_n s(n-p)s(n-p)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix}
\sum_n s(n-1)s(n) \\ \sum_n s(n-2)s(n) \\ \vdots \\ \sum_n s(n-p)s(n)
\end{bmatrix}
$$
What is each element of the form
$$\sum_{n} s(n-k)\, s(n-i)\,?$$
• A correlation; in other words:
  – take the signal, multiply it by a shifted version, and sum.
• Since our signal is long and time-varying, we computed it on
short windows.
• Two variants:
  – autocorrelation method
  – covariance method
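As a sketch of the autocorrelation variant, the Python below computes the correlations for one window and arranges them into the matrix and right-hand side of the normal equations. It assumes the signal is zero outside the window, so each matrix entry depends only on the lag difference between row and column. The function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] is the frame multiplied by itself shifted by k, summed."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + k] for t in range(n - k))
        for k in range(max_lag + 1)
    ]


def normal_equations(frame, order):
    """Build the (matrix, rhs) pair of the normal equations for one window."""
    r = autocorrelation(frame, order)
    # Entry (i, k) only depends on |i - k|, which makes the matrix Toeplitz.
    matrix = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    rhs = r[1:order + 1]
    return matrix, rhs


matrix, rhs = normal_equations([1.0, 2.0, 3.0], 2)
print(matrix)  # [[14.0, 8.0], [8.0, 14.0]]
print(rhs)     # [8.0, 3.0]
```

The covariance method differs in that the sums run over a fixed range of n without assuming zeros outside the window, so the matrix is symmetric but not Toeplitz in general.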
Solving the Matrix
• The coefficients a(i) were found using the Levinson-Durbin recursion,
which exploits the Toeplitz structure of the autocorrelation-method matrix.
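A minimal Python sketch of the Levinson-Durbin recursion follows. It assumes the autocorrelation method (Toeplitz matrix) and the convention in which the predicted sample is the positively weighted sum of previous samples. The function name and return values are illustrative, not taken from the original project.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for a(1)..a(order).

    r holds autocorrelation values r[0..order].
    Returns (coeffs, residual_energy).
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation lags 0..order")
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [0.0] * (order + 1)  # a[0] is unused; a[i] multiplies s(n - i)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # signal is already perfectly predicted
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


coeffs, residual = levinson_durbin([14.0, 8.0, 3.0], 2)
print(coeffs, residual)
```

For the autocorrelation values used here, the result agrees with solving the 2 x 2 system directly (a(1) = 2/3, a(2) = -1/6). The recursion needs on the order of p squared operations rather than the p cubed of general elimination.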
Second Parameter
• Pitch was found by computing the correlation of the signal window with
itself (its autocorrelation).
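One common way to turn that correlation into a pitch period is to pick the lag with the largest autocorrelation inside a plausible range. The Python sketch below does this. The search range, the 8000 Hz sampling rate and the 160-sample frame are assumptions for illustration, not values stated in the slides.

```python
def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation in [min_lag, max_lag]."""
    n = len(frame)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        value = sum(frame[t] * frame[t + lag] for t in range(n - lag))
        if value > best_value:  # strict comparison keeps the shortest of equal peaks
            best_lag, best_value = lag, value
    return best_lag


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
frame = [1.0 if t % 40 == 0 else 0.0 for t in range(160)]  # toy voiced frame
period = estimate_pitch_period(frame, min_lag=20, max_lag=120)
print(period, SAMPLE_RATE / period)  # 40 samples, 200.0 Hz
```

A real coder would also need a voiced/unvoiced decision, for instance by comparing the peak against the frame energy; that part is not shown here.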
 Then these parameters were transmitted
Parameter                  Bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
Pitch period               6
Voiced/unvoiced switch     1
Total                      156
Overall bit rate           50 * 156 = 7800 bits / second

Bit rate for plain LPC vocoder
Parameter                  Bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
DCT coefficients           40 * 4 = 160
Total                      309
Overall bit rate           50 * 309 = 15450 bits / second

Bit rate for voice-excited LPC vocoder with DCT
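The arithmetic behind the two bit-rate tables can be checked with a short Python sketch. The bit allocations are the ones listed above. Reading the factor 50 as frames per second, and comparing against 64 kbit/s uncompressed speech, are assumptions made for illustration.

```python
FRAMES_PER_SECOND = 50          # assumed meaning of the factor 50 in the tables
UNCOMPRESSED_BITS_PER_SECOND = 64_000  # assumed reference: 8000 samples/s * 8 bits

plain_lpc = {
    "predictor coefficients": 18 * 8,
    "gain": 5,
    "pitch period": 6,
    "voiced/unvoiced switch": 1,
}
voice_excited_lpc = {
    "predictor coefficients": 18 * 8,
    "gain": 5,
    "DCT coefficients": 40 * 4,
}


def bit_rate(allocation, frames_per_second=FRAMES_PER_SECOND):
    """Return (bits per frame, bits per second) for a bit allocation."""
    bits_per_frame = sum(allocation.values())
    return bits_per_frame, bits_per_frame * frames_per_second


for name, allocation in [("plain", plain_lpc), ("voice-excited", voice_excited_lpc)]:
    per_frame, per_second = bit_rate(allocation)
    ratio = UNCOMPRESSED_BITS_PER_SECOND / per_second
    print(f"{name}: {per_frame} bits/frame, {per_second} bits/s, ratio {ratio:.1f}")
```

This gives 156 bits per frame and 7800 bits/s for the plain vocoder, and 309 bits per frame and 15450 bits/s for the voice-excited one.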
Conclusion
• Speech produced by the LPC method is not an exact
reproduction of the original, but it is intelligible.
• LPC can be used in speech recognition
systems.
• LPC was widely used in military communications because
of its low transmission bit rate.
• There are many variants of the basic
scheme: LPC-10, CELP, MELP, RELP,
VSELP, ASELP, LD-CELP...