unit5 - University of Kentucky

EE513
Audio Signals and Systems
LPC Analysis and Speech
Kevin D. Donohue
Electrical and Computer Engineering
University of Kentucky
Speech Generation
Speech can be divided into fundamental building blocks of sound referred to as phonemes. All speech sounds result from turbulence created by obstructed air flow.
The vocal cords create quasi-periodic obstructions of air flow, acting as a sound source at the base of the vocal tract. Phonemes associated with the vocal cords are referred to as voiced speech.
Single-shot turbulence from obstructed air flow through the vocal tract is primarily generated by the teeth, tongue, and lips. Phonemes associated with non-periodic obstructed air flow are referred to as unvoiced speech.
Taken from http://www.kt.tu-cottbus.de/speech-analysis/
Speech Production Models
The general speech model:
[Block diagram: a voiced speech source (quasi-periodic pulsed air) or an unvoiced speech source (air burst or continuous flow) drives the vocal tract filter, followed by the vocal radiator.]
The sources can be modeled as quasi-periodic impulse trains or random sequences of impulses.
The vocal tract filter can be modeled as an all-pole filter related to the tract resonances.
The radiator can be modeled as a simple gain with spatial direction (possibly with some filtering).
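As a rough illustration of this model, the sketch below drives one all-pole filter with either an impulse train (voiced) or white noise (unvoiced) and applies a plain gain as the radiator. The sampling rate, pitch period, filter coefficients, and gain are all assumed values chosen for illustration, not values from the lecture.

```matlab
fs = 8000;                      % assumed sampling rate in Hz
nSamp = 800;                    % 0.1 s of signal at fs
pitchPeriod = 80;               % assumed pitch period in samples (100 Hz pitch)

% Voiced source: impulse train (exactly periodic here, quasi-periodic in real speech)
voicedSrc = zeros(nSamp, 1);
voicedSrc(1:pitchPeriod:end) = 1;

% Unvoiced source: random sequence
unvoicedSrc = randn(nSamp, 1);

% Vocal tract: all-pole filter with one assumed resonance (stable pole pair)
A = [1, -1.5, 0.9];             % hypothetical denominator polynomial

% Radiator: modeled as a simple gain (assumed value)
radGain = 0.5;

voiced = radGain * filter(1, A, voicedSrc);
unvoiced = radGain * filter(1, A, unvoicedSrc);
```

Listening to both outputs with soundsc(voiced, fs) and soundsc(unvoiced, fs) should give a buzzy tone and a hiss colored by the same resonance.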
Vocal Tract Resonances
Vocal tract length corresponds to signal wavelength ($\lambda$). It can be obtained from resonant frequencies ($f$) estimated from recorded speech sounds and the speed of sound ($c$), using the equation:
$$\lambda = \frac{c}{f}$$
[Figure: first 3 resonances of a tube with 1 closed end, at 1/4 wavelength, 3/4 wavelength, and 5/4 wavelength. Image adapted from: hyperphysics.phy-astr.gsu.edu]
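A short worked example may help here. For a tube closed at one end, the k-th resonance fits (2k-1)/4 of a wavelength into the tube, so each formant gives a length estimate L = (2k-1)c/(4f). The formant values below are hypothetical round numbers, not measurements.

```matlab
c = 345;                         % speed of sound in m/s
formants = [500, 1500, 2500];    % hypothetical formant frequencies in Hz
k = 1:numel(formants);           % resonance number (1st, 2nd, 3rd)

% k-th resonance of a tube closed at one end: length = (2k-1)/4 wavelengths
tractLen = (2*k - 1) .* c ./ (4 * formants);   % one length estimate per formant, in meters
meanLen = mean(tractLen);                      % combined estimate in meters
```

With these assumed formants every estimate is 0.1725 m, i.e., a vocal tract of about 17 cm.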
Vocal Tract Resonances
The resonances of the vocal tract are called formants and
can be estimated from peaks of the spectrum where the effects
of pitch have been smoothed out (i.e. spectral envelope).
Low Order AR Modeling
If the voiced speech is characterized by an all-pole model of low order (i.e., about 10 for a sampling rate of 8 kHz), then the pole frequencies correspond to the resonances of the vocal tract:
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{G}{1 - a(1)z^{-1} - a(2)z^{-2} - \cdots - a(p)z^{-p}}$$
The inverse of the above transfer function (its denominator polynomial applied as an all-zero filter) computes the error between the current sample and the sample predicted from previous samples. Therefore, it is called a prediction error filter.
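The link between pole angles and resonant frequencies can be checked numerically. The sketch below builds a denominator polynomial from two assumed resonances and then recovers their frequencies from the pole angles; the sampling rate, frequencies, and pole radius are hypothetical.

```matlab
fs = 8000;                               % assumed sampling rate in Hz
fRes = [500, 1500];                      % hypothetical resonant frequencies in Hz
rad = 0.97;                              % hypothetical pole radius (inside the unit circle)

p = rad * exp(1j * 2*pi*fRes/fs);        % one pole per resonance
A = real(poly([p, conj(p)]));            % real denominator polynomial from conjugate pairs

r = roots(A);                            % recover the poles from the polynomial
fPole = angle(r) * fs / (2*pi);          % pole angle (rad/sample) converted to Hz
fEst = sort(fPole(fPole > 0));           % keep one pole from each conjugate pair
```

The pole radius sets how sharp each resonance is, while the angle alone sets its frequency.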
Example
Create an “auh” sound (as the “a” in about or “u” in hum) and use the linear prediction coefficient (LPC) command to model this sound as generated by a quasi-periodic sequence of impulses exciting an all-pole filter.
The LPC command finds a vector of filter coefficients such that the prediction error is minimized. (MATLAB's lpc returns them as the prediction error filter polynomial, which is [1, -a(1), ..., -a(M)] in the notation used here.)
Predict x[n] from previous samples:
$$\hat{x}[n] = a(1)x[n-1] + a(2)x[n-2] + \cdots + a(M)x[n-M]$$
Compute the prediction error sequence with:
$$e[n] = x[n] - a(1)x[n-1] - a(2)x[n-2] - \cdots - a(M)x[n-M]$$
Use Z-transforms to find the transfer function of the filter that recovers x[n] from the LPCs and the error sequence e[n].
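One way to carry out this step is sketched below, assuming zero initial conditions and the convention that the prediction is a weighted sum of past samples with weights a(m). Taking the Z-transform of the prediction error equation and solving for the ratio of output to error gives an all-pole filter:

```latex
\begin{aligned}
E(z) &= X(z)\left(1 - \sum_{m=1}^{M} a(m)\,z^{-m}\right) \\
\frac{X(z)}{E(z)} &= \frac{1}{1 - \sum_{m=1}^{M} a(m)\,z^{-m}} \\
x[n] &= e[n] + \sum_{m=1}^{M} a(m)\,x[n-m]
\end{aligned}
```

The last line is the difference equation of the recovering filter: each output sample is the error sample plus the prediction from the previous M outputs.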
LPC Derivation
Derive an algorithm to compute the LPC coefficients from a stream of data such that the mean squared prediction error is minimized.
Let $x(n)$ for $0 \le n \le N$ be the sequence of data points, $a(m)$ for $1 \le m \le M$ be the Mth-order LPC coefficients, and $\hat{x}(n)$ be the prediction estimate.
The mean squared error for the prediction is given by:
$$\mathrm{mse} = \frac{1}{N-M+1}\sum_{n=M}^{N}\left(x(n) - \hat{x}(n)\right)^2 = \frac{1}{N-M+1}\sum_{n=M}^{N} e(n)^2$$
LPC Computation
Put the prediction equations in matrix form:
$$
\mathbf{x}_p = \begin{bmatrix} x(M) \\ x(M+1) \\ x(M+2) \\ x(M+3) \\ \vdots \\ x(N) \end{bmatrix}, \quad
\mathbf{X}_D = \begin{bmatrix}
x(M-1) & x(M-2) & x(M-3) & \cdots & x(0) \\
x(M) & x(M-1) & x(M-2) & \cdots & x(1) \\
x(M+1) & x(M) & x(M-1) & \cdots & x(2) \\
x(M+2) & x(M+1) & x(M) & \cdots & x(3) \\
\vdots & \vdots & \vdots & & \vdots \\
x(N-1) & x(N-2) & x(N-3) & \cdots & x(N-M)
\end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a(1) \\ a(2) \\ a(3) \\ a(4) \\ \vdots \\ a(M) \end{bmatrix}
$$
Each row of $\mathbf{X}_D\mathbf{a} = \hat{\mathbf{x}}_p$ is a prediction of the corresponding sample in $\mathbf{x}_p$.
LPC Computation
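The mean squared error can be expressed as:
$$(N-M+1)\,\mathrm{mse} = \sum_{n=M}^{N} e(n)^2 = \left(\hat{\mathbf{x}}_p - \mathbf{x}_p\right)^T\left(\hat{\mathbf{x}}_p - \mathbf{x}_p\right) = \left(\mathbf{X}_D\mathbf{a} - \mathbf{x}_p\right)^T\left(\mathbf{X}_D\mathbf{a} - \mathbf{x}_p\right)$$
If the derivative is taken with respect to $\mathbf{a}$ and set equal to 0, that is $2\mathbf{X}_D^T\left(\mathbf{X}_D\mathbf{a} - \mathbf{x}_p\right) = \mathbf{0}$, the result is:
$$\mathbf{a} = \left(\mathbf{X}_D^T\mathbf{X}_D\right)^{-1}\mathbf{X}_D^T\mathbf{x}_p$$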
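The least-squares solution can be tried on a signal whose coefficients are known in advance. The sketch below is only an illustration under assumed values: the test signal is the impulse response of a hypothetical 2nd-order all-pole filter, so the exact answer is a = [1.5; -0.9]. Because MATLAB indexing starts at 1, sample x(n) of the derivation is stored in x(n+1).

```matlab
% Hypothetical test signal: x(n) = 1.5 x(n-1) - 0.9 x(n-2) + e(n), e(n) a unit impulse
x = filter(1, [1, -1.5, 0.9], [1; zeros(49, 1)]);   % samples x(0)..x(N)
N = length(x) - 1;
M = 2;                                   % model order

xp = x(M+1 : N+1);                       % future samples x(M)..x(N)
XD = zeros(N-M+1, M);                    % data matrix of past samples
for m = 1:M
    XD(:, m) = x(M+1-m : N+1-m);         % column m holds x(n-m) for n = M..N
end

a = (XD' * XD) \ (XD' * xp);             % solve the normal equations for the LPCs
mse = mean((xp - XD * a).^2);            % mean squared prediction error
```

For this noise-free signal the prediction is exact, so mse is zero to within rounding. On recorded speech the same lines return the coefficients that minimize mse without making it zero.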
LPC Computation
The transpose of the data matrix times itself results in the autocorrelation matrix:
$$
\mathbf{X}_D^T\mathbf{X}_D =
\begin{bmatrix}
x(M-1) & x(M) & x(M+1) & \cdots & x(N-1) \\
x(M-2) & x(M-1) & x(M) & \cdots & x(N-2) \\
x(M-3) & x(M-2) & x(M-1) & \cdots & x(N-3) \\
x(M-4) & x(M-3) & x(M-2) & \cdots & x(N-4) \\
\vdots & \vdots & \vdots & & \vdots \\
x(0) & x(1) & x(2) & \cdots & x(N-M)
\end{bmatrix}
\begin{bmatrix}
x(M-1) & x(M-2) & x(M-3) & \cdots & x(0) \\
x(M) & x(M-1) & x(M-2) & \cdots & x(1) \\
x(M+1) & x(M) & x(M-1) & \cdots & x(2) \\
x(M+2) & x(M+1) & x(M) & \cdots & x(3) \\
\vdots & \vdots & \vdots & & \vdots \\
x(N-1) & x(N-2) & x(N-3) & \cdots & x(N-M)
\end{bmatrix}
$$
The data matrix transpose times the future (p-vector) values becomes a sequence of autocorrelation values starting with the first lag:
$$
\mathbf{X}_D^T\mathbf{x}_p =
\begin{bmatrix}
x(M-1) & x(M) & x(M+1) & \cdots & x(N-1) \\
x(M-2) & x(M-1) & x(M) & \cdots & x(N-2) \\
x(M-3) & x(M-2) & x(M-1) & \cdots & x(N-3) \\
x(M-4) & x(M-3) & x(M-2) & \cdots & x(N-4) \\
\vdots & \vdots & \vdots & & \vdots \\
x(0) & x(1) & x(2) & \cdots & x(N-M)
\end{bmatrix}
\begin{bmatrix} x(M) \\ x(M+1) \\ x(M+2) \\ x(M+3) \\ \vdots \\ x(N) \end{bmatrix}
$$
Autocorrelation and LPC
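Define the autocorrelation of a sequence as:
$$r(k) = \sum_{n=0}^{N} x(n-k)\,x(n), \quad \text{where } x(n) = 0 \text{ for } n < 0 \text{ and } n > N$$
Note that the LPC coefficients are computed from the autocorrelation coefficients. Up to end effects in the summation limits, the products are the autocorrelation matrix and the autocorrelation vector:
$$
\mathbf{X}_D^T\mathbf{X}_D \approx
\begin{bmatrix}
r(0) & r(1) & r(2) & \cdots & r(M-1) \\
r(1) & r(0) & r(1) & \cdots & r(M-2) \\
r(2) & r(1) & r(0) & \cdots & r(M-3) \\
\vdots & \vdots & \vdots & & \vdots \\
r(M-1) & r(M-2) & r(M-3) & \cdots & r(0)
\end{bmatrix}, \quad
\mathbf{X}_D^T\mathbf{x}_p \approx
\begin{bmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(M) \end{bmatrix}
$$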
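The autocorrelation form can be sketched with base MATLAB functions only. The example assumes the same hypothetical 2nd-order test signal as a known reference, made long enough that it has decayed to nearly zero, so the end effects are negligible and the expected answer is a = [1.5; -0.9].

```matlab
% Hypothetical test signal: impulse response of 1/(1 - 1.5 z^-1 + 0.9 z^-2)
x = filter(1, [1, -1.5, 0.9], [1; zeros(399, 1)]);
M = 2;                                   % model order

% Autocorrelation values r(0)..r(M), stored in r(1)..r(M+1)
r = zeros(M+1, 1);
for k = 0:M
    r(k+1) = sum(x(1+k:end) .* x(1:end-k));   % sum over n of x(n) x(n-k)
end

R = toeplitz(r(1:M));                    % autocorrelation matrix built from r(0)..r(M-1)
a = R \ r(2:M+1);                        % solve R a = [r(1); ...; r(M)]
```

Because R is a symmetric Toeplitz matrix, the system can also be solved with the Levinson-Durbin recursion instead of a general solver, which is a common choice in LPC implementations.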
Script for Analysis
winlens = 50; % PSD window length in milliseconds
[y,fs] = audioread('../data/aaa3.wav'); % Read in wave file (audioread replaces the removed wavread)
winlen = round(winlens*fs/1000); % PSD window length in samples
[cb,ca] = butter(5,2*100/fs,'high'); % High-pass filter (100 Hz cutoff) to remove LF recording noise
yf = filtfilt(cb,ca,y);
[a,er] = lpc(yf,10); % Compute LPC coefficients with model order 10
predy = filter(a,1,yf); % Compute prediction error with all-zero filter
kd = 1; % Starting figure number
figure(kd) % Plot prediction error over the filtered signal
plot(predy); hold on; plot(yf,'g'); hold off
legend('Prediction error','Filtered signal')
title('Prediction error'); xlabel('Samples'); ylabel('Amplitude')
recon = filter(1,a,predy); % Compute reconstructed signal from error and all-pole filter
figure(kd+1) % Plot reconstructed signal
plot(recon,'b')
hold on
% Plot the original offset by one sample so it does not entirely overlap the perfectly reconstructed signal
plot(yf(2:end),'r')
hold off
xlabel('Samples'); ylabel('Amplitude')
title('Reconstructed Signal (blue) and Original (red)')
% By examining the error sequence, generate a simple impulse sequence to simulate its period
% (the loop below produces a 56-sample period; set it to match the period seen in the error sequence)
g = [];
for k=1:150
    g = [g, 1, zeros(1,55)];
end
% Run simulated error sequence through all-pole filter
sim = filter(1,a,g);
% Play the simulated sound, one second of silence, then the original
soundsc([(sim')/std(sim); zeros(fix(fs)*1,1); yf/std(yf)],fs)
% Plot pole diagram with the unit circle
figure(kd+2)
r = roots(a) % Poles of the vocal tract filter (displayed in the command window)
w = 0:.001:2*pi;
plot(real(r),imag(r),'xr',real(exp(1j*w)),imag(exp(1j*w)),'b')
title('Pole diagram of vocal tract filter')
xlabel('Real'); ylabel('Imaginary')
% Find resonant frequencies corresponding to poles
froots = (fs/2)*angle(r)/pi;
nf = find(froots > 0 & froots < fs/2); % Keep one pole from each complex conjugate pair
figure(kd+3)
% Examine average spectrum with formant frequencies
[pd,f] = pwelch(yf,hamming(winlen),fix(winlen/2),2*winlen,fs);
dbspec = 10*log10(pd); % pd is a power quantity, so dB is 10*log10
mxp = max(dbspec); % Find max and min points for graphing vertical lines
mnp = min(dbspec);
plot(f,dbspec,'b') % Plot PSD
hold on
% Overlay lines on plot where formant frequencies were estimated from LPCs
for k=1:length(nf)
    plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
end
hold off
title('PSD plot with formant frequencies (Black broken lines)')
xlabel('Hertz')
ylabel('dB')
% Get spectrum from the AR (LPC) parameters
[hz,fz] = freqz(1, a, 1024, fs);
figure(kd+4)
plot(fz,abs(hz))
title('Spectrum Generated by LPCs')
xlabel('Hertz')
ylabel('Amplitude')
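The script relies on the all-zero and all-pole filters being exact inverses of each other, which is why the reconstructed signal overlaps the original. A minimal toolbox-free check of that property is sketched below; the polynomial and the random test signal are assumptions for illustration, standing in for the output of lpc and the recorded vowel.

```matlab
A = [1, -1.5, 0.9];                  % hypothetical prediction error polynomial (as returned by lpc)
x = filter(1, A, randn(200, 1));     % synthetic signal generated by an all-pole model

e = filter(A, 1, x);                 % all-zero filter: prediction error sequence
xRec = filter(1, A, e);              % all-pole filter: reconstruct the signal from the error

maxErr = max(abs(xRec - x));         % should be at rounding-error level
```

The reconstruction is exact only when the same coefficients are used in both directions and the error sequence is left unmodified, as in the script.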
LPC Analysis Result
[Figure: PSD plot with formant frequencies (black broken lines); horizontal axis in Hertz (0 to 4000), vertical axis in dB. Annotations: the broken lines mark the pole frequencies of the LPC model, which come from the vocal tract shape; the frequency periodicities in the spectrum come from harmonics of the pitch frequency.]
Vocal Tract Filter
Implementations
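Direct form 1 for all-pole model:
$$x(n) = g_1 e(n) + a(1)x(n-1) + a(2)x(n-2) + \cdots + a(M)x(n-M)$$
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 - a(1)z^{-1} - a(2)z^{-2} - \cdots - a(M)z^{-M}}$$
[Block diagram: e(n) is scaled by g1 and enters a summer whose output is x(n); x(n) passes through a chain of M unit delays (z-1), and the delayed outputs are weighted by the coefficients a(1), a(2), a(3), ..., a(M) and fed back to the summer.]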
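The direct-form structure can be written out sample by sample and compared with the built-in filter function. The sketch assumes the convention that the a(m) are prediction weights (so the denominator polynomial is [1, -a(1), ..., -a(M)]); the coefficient and gain values are hypothetical.

```matlab
a = [1.5, -0.9];                 % hypothetical prediction coefficients a(1), a(2)
g1 = 0.5;                        % hypothetical gain
M = numel(a);
e = [1; zeros(49, 1)];           % unit impulse excitation

x = zeros(size(e));
for n = 1:numel(e)
    acc = g1 * e(n);             % scaled input sample
    for m = 1:M
        if n - m >= 1
            acc = acc + a(m) * x(n - m);   % feedback from delayed outputs
        end
    end
    x(n) = acc;
end

xRef = filter(g1, [1, -a], e);   % same filter computed with the built-in function
```

The explicit loop mirrors the delay chain in the diagram; in practice filter is the faster choice.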
Vocal Tract Filter
Implementations
Direct form 1, second-order sections:
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \left(\frac{g_1}{1 + c(1,2)z^{-1} + c(1,3)z^{-2}}\right)\left(\frac{g_2}{1 + c(2,2)z^{-1} + c(2,3)z^{-2}}\right)\cdots\left(\frac{g_{M/2}}{1 + c(M/2,2)z^{-1} + c(M/2,3)z^{-2}}\right)$$
[Block diagram: cascade of M/2 second-order sections from e(n) to x(n); section i has an input gain g_i and two unit delays (z-1) whose outputs are fed back through taps set by c(i,2) and c(i,3).]
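A second-order-section cascade can be obtained by pairing complex-conjugate poles, as sketched below with base MATLAB functions (the Signal Processing Toolbox function tf2sos does a similar job). The 4th-order polynomial is hypothetical, all section gains are taken as 1, and the sketch assumes an even order with complex-conjugate pole pairs.

```matlab
A = conv([1, -1.5, 0.9], [1, 0.4, 0.8]);   % hypothetical 4th-order denominator polynomial
p = cplxpair(roots(A));                    % poles sorted into complex-conjugate pairs
nSec = numel(p) / 2;                       % number of second-order sections

c = zeros(nSec, 3);                        % row i holds [1, c(i,2), c(i,3)]
for i = 1:nSec
    c(i, :) = real(poly(p(2*i-1 : 2*i)));  % quadratic formed from one conjugate pair
end

e = [1; zeros(99, 1)];                     % unit impulse excitation
x = e;
for i = 1:nSec
    x = filter(1, c(i, :), x);             % run the sections in cascade (gains g_i = 1)
end

xRef = filter(1, A, e);                    % direct-form output for comparison
```

Each section controls one resonance, so quantizing or adjusting its two coefficients disturbs only that pole pair, which is one commonly cited advantage over a single high-order direct form.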
Vocal Tract Filter
Implementations
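Lattice implementations are popular because of their good numerical error and stability properties. The filter is implemented in modular stages with coefficients directly related to the stability criterion and the tube resonances of the vocal tract (example of a 2nd-order system):
$$\hat{E}_i(z) = \hat{E}_{i+1}(z) - k_i z^{-1}\hat{X}_i(z)$$
$$\hat{X}_{i+1}(z) = k_i^{*}\hat{E}_i(z) + z^{-1}\hat{X}_i(z)$$
[Block diagram: two-stage lattice. The input e(n) = e2(n) passes through the stage with coefficients k1 and k1* to give e1(n), then through the stage with coefficients k0 and k0* to give e0(n) = x(n). The lower path carries x0(n), x1(n), and x2(n), with a unit delay (z-1) in each stage.]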
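The sketch below shows one way to obtain lattice coefficients from a denominator polynomial (a step-down recursion) and to run the all-pole lattice, checked against the built-in filter. It is an illustration under assumptions: the polynomial is hypothetical, the coefficients are real (so the conjugate has no effect), and the stages are indexed k(1)..k(M) rather than starting from k0 as in the diagram.

```matlab
A = [1, -1.5, 0.9];                % hypothetical denominator polynomial, A(1) = 1
M = length(A) - 1;

% Step-down recursion: reflection coefficients from the polynomial
k = zeros(M, 1);
c = A(2:end);                      % c(i) multiplies z^-i at the current order
for m = M:-1:1
    k(m) = c(m);
    if m > 1
        c = (c(1:m-1) - k(m) * c(m-1:-1:1)) / (1 - k(m)^2);
    end
end
isStable = all(abs(k) < 1);        % all |k| < 1 means all poles are inside the unit circle

% All-pole lattice filter
e = [1; zeros(49, 1)];             % unit impulse excitation
g = zeros(M+1, 1);                 % g(m+1) stores the delayed backward signal of stage m
x = zeros(size(e));
for n = 1:length(e)
    f = e(n);                      % forward signal entering the top stage
    gNew = zeros(M+1, 1);
    for m = M:-1:1
        f = f - k(m) * g(m);               % forward signal leaving stage m
        gNew(m+1) = k(m) * f + g(m);       % backward signal leaving stage m
    end
    gNew(1) = f;                   % stage 0 backward signal equals the output
    x(n) = f;
    g = gNew;                      % current backward signals become the delayed ones
end

xRef = filter(1, A, e);            % direct-form output for comparison
```

The stability test needs only the magnitudes of the k values, which is simpler than finding the roots of the polynomial.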
Example
a) Record a neutral vowel sound, estimate the formant frequencies, and estimate the size of the vocal tract based on a 345 m/s speed of sound, assuming an open-at-one-end tube model.
b) Use LPCs estimated from the neutral vowel sound to filter another sample of speech from the same speaker. Use them as an all-zero filter and then as an all-pole filter. Listen to the sound and describe what is happening.
c) Convert the LPC coefficients for the all-pole filter into second-order sections and implement the filter. Describe the advantages of this approach.
d) Modify the filter by maintaining the angles of the poles/zeros but moving their magnitudes closer to the unit circle. Listen to the sound and explain what is happening.
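For part d), one possible approach is to scale the k-th polynomial coefficient by gamma^k, which multiplies every root radius by gamma and leaves the angles unchanged. The sketch below uses a hypothetical polynomial and scale factor; the same scaled polynomial can be used as either the all-pole denominator or the all-zero numerator.

```matlab
A = [1, -1.5, 0.9];                   % hypothetical prediction error polynomial
gamma = 1.03;                         % assumed radius scale factor (> 1 moves roots outward)

m = 0:numel(A)-1;                     % power of z^-1 for each coefficient
Ag = A .* gamma.^m;                   % roots of Ag are gamma times the roots of A

p = roots(A);                         % original roots
pg = roots(Ag);                       % scaled roots, same angles
stillStable = all(abs(pg) < 1);       % an all-pole filter needs every radius below 1
```

Choosing gamma below 1 moves the roots away from the unit circle instead. The stability check matters whenever gamma is above 1.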
Homework (1)
a) Record a free vowel sound and estimate the size of your vocal tract based on the formant frequencies.
b) Compute the LPCs from a free vowel sound and use the LPCs to filter another segment of speech with -10 dB of white noise added. Use the LPCs as an all-zero filter and as an all-pole filter. Describe the sound of the filtered outputs and explain what is happening with each of the 2 filters.
c) Move the poles and zeros further away from the unit circle and repeat part b). Describe the effect on the filtered sound when the poles and zeros are moved away from the unit circle. Submit this description and the m-files used to process the data.