ch7 (LPC).ppt

Linear Prediction
1
Linear Prediction (Introduction):

The objective of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both:

\hat{y}(n) = \sum_{j=0}^{q} b(j)\, x(n-j) + \sum_{i=1}^{p} a(i)\, y(n-i)

The factors a(i) and b(j) are called predictor
coefficients.
2
Linear Prediction (Introduction):

Many systems of interest to us are describable by a
linear, constant-coefficient difference equation :
\sum_{i=0}^{p} a(i)\, y(n-i) = \sum_{j=0}^{q} b(j)\, x(n-j)

If Y(z)/X(z)=H(z), where H(z) is a ratio of
polynomials N(z)/D(z), then
N(z) = \sum_{j=0}^{q} b(j)\, z^{-j} \quad\text{and}\quad D(z) = \sum_{i=0}^{p} a(i)\, z^{-i}

Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).
3
Linear Prediction (Types of System Model):

There are two important variants:

• All-pole model (in statistics, the autoregressive (AR) model): the numerator N(z) is a constant.

• All-zero model (in statistics, the moving-average (MA) model): the denominator D(z) is equal to unity.

• The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.
4
Linear Prediction (Derivation of LP equations):

Given a zero-mean signal y(n), in the AR model:

\hat{y}(n) = -\sum_{i=1}^{p} a(i)\, y(n-i)

The error is:

e(n) = y(n) - \hat{y}(n) = \sum_{i=0}^{p} a(i)\, y(n-i), \qquad a(0) = 1

To derive the predictor coefficients we use the orthogonality principle. The principle states that the desired coefficients are those which make the error orthogonal to the samples y(n-1), y(n-2), …, y(n-p).
5
Linear Prediction (Derivation of LP equations):

Thus we require that

\langle y(n-j)\, e(n) \rangle = 0 \qquad \text{for } j = 1, 2, \ldots, p

Or,

\left\langle y(n-j) \sum_{i=0}^{p} a(i)\, y(n-i) \right\rangle = 0, \qquad j = 1, \ldots, p

Interchanging the operations of averaging and summing, and representing \langle\,\cdot\,\rangle by a sum over n, we have

\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\, y(n-j) = 0, \qquad j = 1, \ldots, p

The required predictor coefficients are found by solving these equations.
6
Linear Prediction (Derivation of LP equations):

The orthogonality principle also states that the resulting minimum error is given by

E = \langle e^2(n) \rangle = \langle y(n)\, e(n) \rangle

Or,

\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\, y(n) = E

We can minimize the error over all time:

\sum_{i=0}^{p} a(i)\, r_{|i-j|} = 0, \qquad j = 1, 2, \ldots, p

\sum_{i=0}^{p} a(i)\, r_i = E

where r_k = \sum_{n} y(n)\, y(n-k) is the autocorrelation of y(n).
7
Linear Prediction (Applications):

Autocorrelation matching:

We have a signal y(n) with known autocorrelation r_{yy}(n). We model this with the AR system shown below:

[Block diagram: a white-noise excitation e(n) drives an all-pole filter with gain σ, producing the output z(n).]

H(z) = \frac{\sigma}{1 - \sum_{i=1}^{p} a_i z^{-i}}
8
Linear Prediction (Order of Linear Prediction):

The choice of predictor order depends on the analysis bandwidth. The rule of thumb is:

• For a normal vocal tract, there is an average of about one formant per kilohertz of bandwidth.
• One formant requires two complex-conjugate poles.
• Hence for every formant we require two predictor coefficients, i.e. two coefficients per kilohertz of bandwidth.

For example, a 4 kHz analysis bandwidth (Fs = 8 kHz) calls for roughly eight coefficients for the formants, in line with the typical orders of p = 8-10 listed at the end of this chapter.

9
Linear Prediction (AR Modeling of Speech Signal):

True model:

[Block diagram: a voiced excitation path (impulse-train generator controlled by pitch and gain, followed by the glottal filter G(z)) and an unvoiced excitation path (uncorrelated-noise generator with its own gain) feed a voiced/unvoiced (V/U) switch. The resulting volume-velocity signal u(n) passes through the vocal-tract filter H(z) and the lip-radiation filter R(z) to produce the speech signal s(n).]
10
Linear Prediction (AR Modeling of Speech Signal):

Using LP analysis:

[Block diagram: a voiced impulse-train generator (controlled by pitch and gain) and an unvoiced white-noise generator (with its own gain) feed a voiced/unvoiced (V/U) switch; the selected excitation drives a single all-pole (AR) filter H(z), whose output is the estimated speech signal s(n).]
11
3.3 LINEAR PREDICTIVE CODING MODEL FOR SPEECH RECOGNITION

[Block diagram: the excitation u(n), scaled by the gain G, drives the all-pole filter 1/A(z) to produce the speech signal s(n).]
12
3.3.1 The LPC Model
s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)

Convert this to an equality by including an excitation term:

s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\, u(n)

In the z-domain:

S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\, U(z)

H(z) = \frac{S(z)}{G\, U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)}
13
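As a rough illustration of this all-pole model, the following Python sketch (not from the original slides; the coefficient values, gain, and white-noise excitation are arbitrary assumptions) synthesizes s(n) by filtering u(n) with H(z) = G/A(z) using scipy.signal.lfilter.

```python
import numpy as np
from scipy.signal import lfilter

a = np.array([1.3, -0.5])      # assumed LPC coefficients a_1, a_2 (stable example)
G = 0.1                        # assumed gain term
u = np.random.randn(16000)     # excitation u(n): white noise (unvoiced case)

# A(z) = 1 - sum_i a_i z^-i, so the lfilter denominator vector is [1, -a_1, ..., -a_p].
s = lfilter([G], np.concatenate(([1.0], -a)), u)
```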
3.3.2 LPC Analysis Equations
s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)

The linear predictor:

\tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k)

The prediction error:

e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)

Error transfer function:

\frac{E(z)}{S(z)} = A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}
14
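A hedged sketch of the analysis direction: given LPC coefficients a_1..a_p, the prediction error e(n) is obtained by passing s(n) through the FIR inverse filter A(z). The helper name prediction_error is introduced here for illustration only.

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(s, a):
    """e(n) = s(n) - sum_k a_k s(n-k), i.e. s(n) filtered by A(z)."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # [1, -a_1, ..., -a_p]
    return lfilter(A, [1.0], s)
```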
3.3.2 LPC Analysis Equations
s_n(m) = s(n+m), \qquad e_n(m) = e(n+m)

We seek to minimize the mean squared error signal:

E_n = \sum_{m} e_n^2(m)

E_n = \sum_{m} \left[ s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \right]^2
15
Setting the partial derivatives to zero:

\frac{\partial E_n}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p

\sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} a_k \sum_{m} s_n(m-i)\, s_n(m-k) \qquad (*)

In terms of the short-term covariance

\phi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k)

we can write (*) as:

\phi_n(i,0) = \sum_{k=1}^{p} a_k\, \phi_n(i,k), \qquad i = 1, 2, \ldots, p

a set of p equations in p unknowns.
16
3.3.2 LPC Analysis Equations
The minimum mean-squared error can be expressed as:

E_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} a_k \sum_{m} s_n(m)\, s_n(m-k)

\;\;\;\;\; = \phi_n(0,0) - \sum_{k=1}^{p} a_k\, \phi_n(0,k)
17
3.3.3 The Autocorrelation Method
s_n(m) = \begin{cases} s(m+n)\, w(m), & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}

where w(m) is a window that is zero outside 0 \le m \le N-1.

The mean squared error is:

E_n = \sum_{m=0}^{N-1+p} e_n^2(m)

And:

\phi_n(i,k) = \sum_{m=0}^{N-1+p} s_n(m-i)\, s_n(m-k), \qquad 1 \le i \le p, \; 0 \le k \le p

\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+i-k), \qquad 1 \le i \le p, \; 0 \le k \le p
18
3.3.3 The Autocorrelation Method

\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+i-k), \qquad 1 \le i \le p, \; 0 \le k \le p

Since \phi_n(i,k) is only a function of i-k, the covariance function reduces to the simple autocorrelation function:

\phi_n(i,k) = r_n(i-k)
19
3.3.3 The Autocorrelation Method
Since the autocorrelation function is symmetric, i.e. r_n(-k) = r_n(k), the LPC equations become

\sum_{k=1}^{p} r_n(|i-k|)\, a_k = r_n(i), \qquad 1 \le i \le p

and can be expressed in matrix form as:

\begin{bmatrix}
r_n(0)   & r_n(1)   & r_n(2)   & \cdots & r_n(p-1) \\
r_n(1)   & r_n(0)   & r_n(1)   & \cdots & r_n(p-2) \\
r_n(2)   & r_n(1)   & r_n(0)   & \cdots & r_n(p-3) \\
\vdots   & \vdots   & \vdots   & \ddots & \vdots   \\
r_n(p-1) & r_n(p-2) & r_n(p-3) & \cdots & r_n(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} r_n(1) \\ r_n(2) \\ r_n(3) \\ \vdots \\ r_n(p) \end{bmatrix}
20
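A minimal sketch of the autocorrelation method as a linear-algebra problem, assuming NumPy/SciPy are available. The slides solve this Toeplitz system with the Levinson-Durbin recursion of Section 3.3.7; scipy.linalg.solve_toeplitz is used here only to make the Toeplitz structure explicit, and the helper name autocorr_lpc is illustrative.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def autocorr_lpc(frame, p):
    """Return LPC coefficients a_1..a_p for one frame using the autocorrelation method."""
    x = frame * np.hamming(len(frame))           # apply the analysis window w(m)
    r = np.array([np.dot(x[:len(x) - k], x[k:])  # r_n(k) for k = 0..p
                  for k in range(p + 1)])
    # Toeplitz matrix has first column [r(0)..r(p-1)], right-hand side [r(1)..r(p)].
    return solve_toeplitz(r[:p], r[1:p + 1])
```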
3.3.3 The Autocorrelation Method

[Slides 21-23: figures only.]
3.3.4 The Covariance Method
Change the interval over which the error is computed to 0 \le m \le N-1 and use the unweighted speech directly:

E_n = \sum_{m=0}^{N-1} e_n^2(m)

with \phi_n(i,k) defined as:

\phi_n(i,k) = \sum_{m=0}^{N-1} s_n(m-i)\, s_n(m-k), \qquad 1 \le i \le p, \; 0 \le k \le p

or, by a change of variables,

\phi_n(i,k) = \sum_{m=-i}^{N-i-1} s_n(m)\, s_n(m+i-k), \qquad 1 \le i \le p, \; 0 \le k \le p
24
3.3.4 The Covariance Method

In matrix form:

\begin{bmatrix}
\phi_n(1,1) & \phi_n(1,2) & \phi_n(1,3) & \cdots & \phi_n(1,p) \\
\phi_n(2,1) & \phi_n(2,2) & \phi_n(2,3) & \cdots & \phi_n(2,p) \\
\phi_n(3,1) & \phi_n(3,2) & \phi_n(3,3) & \cdots & \phi_n(3,p) \\
\vdots      & \vdots      & \vdots      & \ddots & \vdots      \\
\phi_n(p,1) & \phi_n(p,2) & \phi_n(p,3) & \cdots & \phi_n(p,p)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \phi_n(3,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}

The resulting covariance matrix is symmetric but not Toeplitz, and the system can be solved efficiently by Cholesky decomposition.
25
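A corresponding sketch of the covariance method, assuming the sum over m is restricted to p ≤ m ≤ N-1 so that every required sample lies inside the frame (a common practical convention; the slides write the sum over 0 ≤ m ≤ N-1 using samples from before the frame). The symmetric, non-Toeplitz matrix is solved with a Cholesky factorization as stated above; the helper name covariance_lpc is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def covariance_lpc(s, p):
    """LPC coefficients a_1..a_p for one frame s (no window applied)."""
    N = len(s)
    # phi[i, k] = sum_m s(m-i) s(m-k); the sum runs over m = p..N-1 here so that
    # every index stays inside the frame.
    phi = np.array([[np.dot(s[p - i:N - i], s[p - k:N - k])
                     for k in range(p + 1)] for i in range(p + 1)])
    c, low = cho_factor(phi[1:, 1:])        # symmetric, non-Toeplitz p x p matrix
    return cho_solve((c, low), phi[1:, 0])  # solve against the right-hand side phi_n(i, 0)
```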
3.3.6 Examples of LPC Analysis

[Slides 26-27: example figures only.]

3.3.7 LPC Processor for Speech Recognition

[Slide 28: block diagram only.]
3.3.7 LPC Processor for Speech Recognition
Preemphasis: typically a first-order FIR filter, used to spectrally flatten the signal.
Most widely used is the filter:

\tilde{H}(z) = 1 - \tilde{a} z^{-1}, \qquad 0.9 \le \tilde{a} \le 1.0

\tilde{s}(n) = s(n) - \tilde{a}\, s(n-1)
29
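A one-line sketch of the preemphasis step; the value ã = 0.95 is an assumed (typical) choice within the quoted 0.9-1.0 range, and the helper name preemphasize is illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def preemphasize(s, a=0.95):
    """Apply s~(n) = s(n) - a*s(n-1)."""
    return lfilter([1.0, -a], [1.0], s)
```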
3.3.7 LPC Processor for Speech Recognition
Frame blocking:

x_\ell(n) = \tilde{s}(M\ell + n), \qquad n = 0, 1, \ldots, N-1, \quad \ell = 0, 1, \ldots, L-1
30
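A sketch of frame blocking, assuming the N and M values listed in the table at the end of the chapter (N = 240, M = 80 at Fs = 8 kHz); the helper name frame_block is illustrative.

```python
import numpy as np

def frame_block(s_tilde, N=240, M=80):
    """Return an array of shape (L, N) holding frames x_l(n) = s~(M*l + n)."""
    L = 1 + (len(s_tilde) - N) // M          # number of complete frames
    return np.stack([s_tilde[l * M : l * M + N] for l in range(L)])
```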
3.3.7 LPC Processor for Speech Recognition

Windowing:

\tilde{x}_\ell(n) = x_\ell(n)\, w(n), \qquad 0 \le n \le N-1

Hamming window:

w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1

Autocorrelation analysis:

r_\ell(m) = \sum_{n=0}^{N-1-m} \tilde{x}_\ell(n)\, \tilde{x}_\ell(n+m), \qquad m = 0, 1, \ldots, p
31
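A sketch of the windowing and autocorrelation steps for a single frame, assuming a Hamming window and p = 10 as in the typical-values table; the helper name frame_autocorr is illustrative.

```python
import numpy as np

def frame_autocorr(x_l, p=10):
    """Hamming-window one frame x_l and return r(0..p)."""
    N = len(x_l)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))  # Hamming window
    xt = x_l * w
    return np.array([np.dot(xt[:N - m], xt[m:]) for m in range(p + 1)])
```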
3.3.7 LPC Processor for Speech Recognition

LPC analysis: find the LPC coefficients, the reflection (PARCOR) coefficients, the log area ratio coefficients, the cepstral coefficients, …

Durbin's method:

E^{(0)} = r(0)

k_i = \left[ r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, r(|i-j|) \right] \Big/ E^{(i-1)}, \qquad 1 \le i \le p \qquad (*)

\alpha_i^{(i)} = k_i

\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1

E^{(i)} = (1 - k_i^2)\, E^{(i-1)}

Note: the summation in (*) is omitted for i = 1.
32
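A direct transcription of Durbin's recursion above into Python (array indices shifted where needed); the input r is the autocorrelation vector r(0..p) from the previous step, and the function name durbin is illustrative.

```python
import numpy as np

def durbin(r):
    """Levinson-Durbin recursion: return (a_1..a_p, k_1..k_p, E^(p))."""
    p = len(r) - 1
    E = r[0]
    alpha = np.zeros(p + 1)                 # alpha[1..i] hold the order-i coefficients
    k = np.zeros(p + 1)
    for i in range(1, p + 1):
        # summation omitted for i = 1 (empty dot product)
        acc = r[i] - np.dot(alpha[1:i], r[np.abs(i - np.arange(1, i))])
        k[i] = acc / E
        prev = alpha.copy()
        alpha[i] = k[i]
        for j in range(1, i):
            alpha[j] = prev[j] - k[i] * prev[i - j]
        E = (1.0 - k[i] ** 2) * E
    return alpha[1:], k[1:], E              # a_m = alpha_m^(p)
```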
3.3.7 LPC Processor for Speech Recognition
a_m = \text{LPC coefficients} = \alpha_m^{(p)}, \qquad 1 \le m \le p

k_m = \text{PARCOR (reflection) coefficients}

g_m = \text{log area ratio coefficients} = \log\!\left(\frac{1 - k_m}{1 + k_m}\right)
33
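A small sketch converting the PARCOR coefficients k_m returned by the recursion above into log area ratios; it assumes |k_m| < 1, which the autocorrelation method guarantees.

```python
import numpy as np

def log_area_ratios(k):
    """g_m = log((1 - k_m) / (1 + k_m)); assumes |k_m| < 1."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))
```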
3.3.7 LPC Processor for Speech Recognition

LPC parameter conversion to cepstral coefficients:

c_0 = \ln \sigma^2 \qquad (\sigma^2 \text{ is the gain term of the LPC model})

c_m = a_m + \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \qquad 1 \le m \le p

c_m = \sum_{k=m-p}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \qquad m > p
34
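A sketch of the cepstral recursion above, computing c_1..c_Q from a_1..a_p with Q = 12 assumed as in the typical-values table; the gain term c_0 = ln σ² is omitted, and the helper name lpc_to_cepstrum is illustrative.

```python
import numpy as np

def lpc_to_cepstrum(a, Q=12):
    """Return c_1..c_Q from LPC coefficients a_1..a_p via the recursion above."""
    p = len(a)
    c = np.zeros(Q + 1)                       # c[1..Q]; c[0] unused in this sketch
    for m in range(1, Q + 1):
        acc = sum((k / m) * c[k] * a[m - k - 1]          # (k/m) c_k a_{m-k}
                  for k in range(max(1, m - p), m))
        c[m] = acc + (a[m - 1] if m <= p else 0.0)       # add a_m only for m <= p
    return c[1:]
```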
3.3.7 LPC Processor for Speech Recognition

Parameter weighting:

• Low-order cepstral coefficients are sensitive to the overall spectral slope.
• High-order cepstral coefficients are sensitive to noise.
• The weighting is done to minimize these sensitivities.

\log |S(e^{j\omega})| = \sum_{m=-\infty}^{\infty} c_m\, e^{-j\omega m}

\frac{\partial}{\partial \omega} \log |S(e^{j\omega})| = -\sum_{m=-\infty}^{\infty} (jm)\, c_m\, e^{-j\omega m}
35
3.3.7 LPC Processor for Speech Recognition

Differentiating the log spectrum

\log |S(e^{j\omega})| = \sum_{m=-\infty}^{\infty} c_m\, e^{-j\omega m}

weights each cepstral coefficient as \hat{c}_m = (-jm)\, c_m. The weighting actually used is:

\hat{c}_m = w_m\, c_m, \qquad 1 \le m \le Q

w_m = 1 + \frac{Q}{2} \sin\!\left(\frac{\pi m}{Q}\right), \qquad 1 \le m \le Q
36
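A sketch of the cepstral weighting (liftering) step above, assuming Q = 12 as in the typical-values table; the helper name weight_cepstrum is illustrative.

```python
import numpy as np

def weight_cepstrum(c, Q=12):
    """Apply w_m = 1 + (Q/2) sin(pi*m/Q) to c_1..c_Q."""
    m = np.arange(1, Q + 1)
    w = 1.0 + (Q / 2.0) * np.sin(np.pi * m / Q)
    return w * c[:Q]
```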
3.3.7 LPC Processor for Speech Recognition

Temporal cepstral derivative:

Fourier series representation of the time derivative:

\frac{\partial}{\partial t} \log |S(e^{j\omega}, t)| = \sum_{m=-\infty}^{\infty} \frac{\partial c_m(t)}{\partial t}\, e^{-j\omega m}

Approximate the derivative by an orthogonal polynomial fit over a finite-length window:

\frac{\partial c_m(t)}{\partial t} \approx \Delta c_m(t) = \mu \sum_{k=-K}^{K} k\, c_m(t+k)

where \mu is a normalization constant, or optionally:

\Delta c_m(t) = c_m(t+K) - c_m(t-K)

Finally, the observation vector is:

o_t = \left( \hat{c}_1(t), \hat{c}_2(t), \ldots, \hat{c}_Q(t),\; \Delta \hat{c}_1(t), \Delta \hat{c}_2(t), \ldots, \Delta \hat{c}_Q(t) \right)
37
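A sketch of the delta-cepstrum computation over a sequence of weighted cepstral vectors, with K = 3 as in the typical-values table; the normalization constant μ is not specified on the slides, so the usual least-squares regression normalizer 1/(2Σk²) is assumed here, and the helper name delta_cepstrum is illustrative.

```python
import numpy as np

def delta_cepstrum(c_hat, K=3):
    """Delta cepstra for a (frames x Q) array of weighted cepstral vectors."""
    T, Q = c_hat.shape
    mu = 1.0 / (2.0 * sum(k * k for k in range(1, K + 1)))   # assumed regression normalizer
    d = np.zeros_like(c_hat, dtype=float)
    for t in range(K, T - K):                                # edge frames left at zero
        d[t] = mu * sum(k * c_hat[t + k] for k in range(-K, K + 1))
    return d
```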
3.3.9 Typical LPC Analysis Parameters
N : number of samples in the analysis frame
M : number of samples shift between frames
p : LPC analysis order
Q : dimension of the LPC-derived cepstral vector
K : number of frames over which cepstral time derivatives are computed
38
Typical Values of LPC Analysis Parameters for Speech-Recognition Systems

parameter   Fs = 6.67 kHz    Fs = 8 kHz       Fs = 10 kHz
N           300 (45 msec)    240 (30 msec)    300 (30 msec)
M           100 (15 msec)    80 (10 msec)     100 (10 msec)
p           8                10               10
Q           12               12               12
K           3                3                3
39