Digital Representation of Audio Information

EE599-020
Audio Signals and Systems
Speech Production
Kevin D. Donohue
Electrical and Computer Engineering
University of Kentucky
Related Web Sites
Speech modeling is a very popular topic. Many web sites
are devoted to education and research in this area. A
general search on speech production, modeling, synthesis,
analysis, and related terms will turn up many interesting
web sites. A few examples are given below:
http://www.asel.udel.edu/speech/tutorials/production/index.html
http://www.haskins.yale.edu/haskins/HEADS/production.html
http://www.kt.tu-cottbus.de/speech-analysis/
Speech Generation
Speech can be divided into
fundamental building blocks of sounds
referred to as phonemes. All sounds
result from turbulence through
obstructed air flow.
The vocal cords create quasi-periodic
obstructions of air flow as a sound
source at the base of the vocal tract.
Phonemes associated with the vocal
cords are referred to as voiced speech.
Single shot turbulence from obstructed
air flow through the vocal tract is
primarily generated by the teeth,
tongue and lips. Phonemes associated
with non-periodic obstructed air
flow are referred to as unvoiced
speech.
Taken from http://www.kt.tu-cottbus.de/speech-analysis/
Speech Production Models
The general speech model:
[Block diagram: a voiced speech source (quasi-periodic pulsed air) or an unvoiced speech source (air burst or continuous flow) drives the vocal tract filter, whose output passes through the vocal radiator.]
Sources can be modeled as quasi-periodic impulse trains or random
sequences of impulses.
The vocal tract filter can be modeled as an all-pole filter related to the tract
resonances.
The radiator can be modeled as a simple gain with spatial direction
(possibly with some filtering).
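The three modeling statements above can be sketched in a few lines of MATLAB. This is only an illustration: the sampling rate, pitch, formant frequencies, pole radius, and gain below are assumed values, not measurements from the course.

```matlab
fs = 8000;                          % assumed sampling rate (Hz)
N  = fs/2;                          % half a second of samples

% Hypothetical tract resonances (Hz) and pole radius for a neutral vowel
formants = [500 1500 2500];
radius   = 0.97;
poles = radius * exp(1i*2*pi*formants/fs);
a = real(poly([poles, conj(poles)]));   % all-pole vocal tract denominator

% Voiced source: impulse train with an assumed 100 Hz pitch
voicedSrc = zeros(1, N);
voicedSrc(1:fs/100:end) = 1;

% Unvoiced source: random sequence (white noise)
unvoicedSrc = randn(1, N);

gain = 0.5;                         % radiator modeled as a plain gain
voiced   = gain * filter(1, a, voicedSrc);
unvoiced = gain * filter(1, a, unvoicedSrc);

% soundsc([voiced, unvoiced], fs)   % uncomment to listen
```

Both outputs share the same resonances; only the excitation differs, which is the point of the source/filter split.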
Example
Create an “a” sound (as the “a” in about or “u” in hum) and use the
LPC command to model this sound as generated by a quasi-periodic sequence of impulses exciting an all-pole filter.
The LPC command finds a vector of filter coefficients (a) such that the
prediction error is minimized:
Predict x(n) from previous samples:
$$\tilde{x}(n) = a(1)\,x(n-1) + a(2)\,x(n-2) + \cdots + a(p)\,x(n-p)$$
Compute prediction error sequence with:
$$e(n) = x(n) - a(1)\,x(n-1) - a(2)\,x(n-2) - \cdots - a(p)\,x(n-p)$$
Use Z-transforms to find transfer function of filter that
recovers x(n) from the LPCs and error sequence e(n).
Identify the components related to source, vocal tract, and
radiator.
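One way to carry out the Z-transform step, sketched under the assumption that the prediction error is defined as e(n) = x(n) minus the weighted sum of the p previous samples with weights a(1), ..., a(p):

```latex
\begin{aligned}
E(z) &= X(z)\left(1 - \sum_{k=1}^{p} a(k)\, z^{-k}\right) \\
\frac{X(z)}{E(z)} &= \frac{1}{1 - \sum_{k=1}^{p} a(k)\, z^{-k}}
\end{aligned}
```

Under this reading, e(n) plays the role of the source, the all-pole filter plays the role of the vocal tract, and the radiator reduces to a gain absorbed into the scaling. MATLAB's lpc returns the error-filter polynomial [1, a(2), ..., a(p+1)], so with the sign convention assumed here its entries after the leading 1 would be the negatives of a(1), ..., a(p).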
Script for Analysis
[y,fs] = audioread('aaaaa.wav'); % Read in wave file
[cb,ca] = butter(5,2*60/fs,'high'); % Filter to remove LF recording noise
yf = filtfilt(cb,ca,y);
[a,er] = lpc(yf,10); % Compute LPC coefficients with model order 10
predy = filter(a,1,yf); % Compute prediction error with all zero filter
figure(1) ; plot(predy); title('Prediction error'); xlabel('Samples'); ylabel('Amplitude')
recon = filter(1,a,predy); % Compute reconstructed signal from error and all-pole filter
figure(2)
% Plot reconstructed signal
plot(recon,'b')
hold on
% Plot the original shifted by one sample so it does not entirely overlap the perfectly reconstructed signal
plot(yf(2:end),'r')
hold off
% By examining the error sequence, generate a simple impulse sequence to simulate its period (about 103 samples)
period = 103;
g = zeros(1,150*period);
g(1:period:end) = 1; % One impulse at the start of each period
% Run simulated error sequence through all pole filter
sim = filter(1,a,g);
soundsc([(sim')/std(sim); yf/std(yf)],fs) % Play sounds compare simulated with real
Script for Analysis
% Plot pole zero diagram
figure(3)
r = (roots(a))
w = [0:.001:2*pi];
plot(real(r),imag(r),'xr',real(exp(j*w)),imag(exp(j*w)),'b'); title('Pole diagram of vocal tract filter')
xlabel('Real'); ylabel('Imaginary')
% Find resonant frequencies corresponding to poles
froots = (fs/2)*angle(r)/pi;
nf = find(froots > 0 & froots < fs/2); % Find those corresponding to complex conjugate poles
figure(4)
% Examine average spectrum with formant frequencies
[pd,f] = pwelch(yf,hamming(2*1024),256,4*1024,fs); % Welch PSD estimate
dbspec = 10*log10(pd); % PSD is a power quantity, so use 10*log10 for dB
mxp = max(dbspec); % Find max and min points for graphing vertical lines
mnp = min(dbspec);
plot(f,dbspec,'b') % Plot PSD
hold on
% Overlay lines on plot where formant frequencies were estimated from LPCs
for k=1:length(nf)
plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
end
hold off
title('PSD plot with formant frequencies (Black broken lines)'); xlabel('Hertz'); ylabel('dB')
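The script reads resonant frequencies from the pole angles. The pole radii carry information too: a common approximation takes the 3 dB bandwidth of a resonance as -(fs/pi)*ln|r|. The sketch below applies both relations to a hypothetical filter built from known poles, so the sampling rate, frequencies, and radii are assumed values rather than results from the recording.

```matlab
fs = 8000;                                   % assumed sampling rate (Hz)
% Hypothetical poles: 500 Hz and 1500 Hz resonances
poles = [0.97*exp(1i*2*pi*500/fs), 0.95*exp(1i*2*pi*1500/fs)];
a = real(poly([poles, conj(poles)]));        % all-pole denominator

r = roots(a);
r = r(imag(r) > 0);                          % keep one pole per conjugate pair
freqHz = angle(r) * fs / (2*pi);             % same mapping as froots in the script
bwHz   = -log(abs(r)) * fs / pi;             % approximate 3 dB bandwidth

[freqHz, order] = sort(freqHz);
bwHz = bwHz(order);
disp([freqHz, bwHz])                         % one row per resonance: [Hz, bandwidth]
```

Poles closer to the unit circle give narrower, sharper resonances.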
LPC Analysis Result
[Figure: PSD plot with formant frequencies (black broken lines). Horizontal axis: Hertz (0 to 4000); vertical axis: dB. Annotations: pole frequencies of the LPC model from vocal tract shape; frequency periodicities from harmonics of the pitch frequency.]
Vocal Tract Filter Implementations
Direct form 1 for all-pole model:
$$x(n) = g_1 e(n) - a(2)\,x(n-1) - a(3)\,x(n-2) - \cdots - a(p+1)\,x(n-p)$$
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 + a(2) z^{-1} + a(3) z^{-2} + \cdots + a(p+1) z^{-p}}$$
[Block diagram: e(n) is scaled by g1 and summed with feedback from a chain of p unit delays (z^-1) on the output x(n), tapped through the coefficients a(2), a(3), a(4), ..., a(p+1).]
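A minimal sketch of the direct form recursion written as an explicit loop and checked against MATLAB's filter. It assumes the coefficient vector follows the lpc convention a = [1, a(2), ..., a(p+1)] with negative feedback; the coefficient values and gain are hypothetical.

```matlab
a  = [1 -1.2 0.81];        % hypothetical all-pole coefficients (pole radius 0.9)
g1 = 1;                    % assumed gain
e  = [1, zeros(1,19)];     % unit impulse as the excitation

p = numel(a) - 1;
x = zeros(size(e));
for n = 1:numel(e)
    acc = g1 * e(n);
    for k = 1:min(p, n-1)              % only use samples that already exist
        acc = acc - a(k+1) * x(n-k);   % feedback through a(2) ... a(p+1)
    end
    x(n) = acc;
end

xRef = filter(g1, a, e);   % built-in implementation of the same recursion
disp(max(abs(x - xRef)))
```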
Vocal Tract Filter Implementations
Direct form 1, second order sections:
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \left(\frac{g_1}{1 + c(1,2) z^{-1} + c(1,3) z^{-2}}\right)\left(\frac{g_2}{1 + c(2,2) z^{-1} + c(2,3) z^{-2}}\right)\cdots\left(\frac{g_{p/2}}{1 + c(p/2,2) z^{-1} + c(p/2,3) z^{-2}}\right)$$
[Block diagram: e(n) passes through a cascade of p/2 second-order all-pole sections; section i has gain g_i and two unit delays (z^-1) fed back through c(i,2) and c(i,3); the output of the last section is x(n).]
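A sketch of one way to split an all-pole filter into second-order sections by pairing each pole with its complex conjugate. It assumes an even model order with no real poles and unit section gains; the pole locations are hypothetical. Each row of c holds [1, c(i,2), c(i,3)] with positive signs in the denominator.

```matlab
fs = 8000;                                        % assumed sampling rate (Hz)
poles = 0.95 * exp(1i*2*pi*[500 1500 2500]/fs);   % hypothetical poles
a = real(poly([poles, conj(poles)]));             % order-6 all-pole denominator

r = roots(a);
r = r(imag(r) > 0);                % one pole from each conjugate pair
nSec = numel(r);
% (1 - r z^-1)(1 - conj(r) z^-1) = 1 - 2 Re(r) z^-1 + |r|^2 z^-2
c = [ones(nSec,1), -2*real(r), abs(r).^2];

e = [1, zeros(1,99)];              % unit impulse excitation
x = e;
for i = 1:nSec
    x = filter(1, c(i,:), x);      % cascade the sections
end

xRef = filter(1, a, e);            % single high-order direct form
disp(max(abs(x - xRef)))
```

Because each section controls a single resonance, coefficient rounding in one section does not disturb the other pole pairs.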
Vocal Tract Filter Implementations
Lattice implementations are popular because of their error and stability
properties. The filter is implemented in modular stages with coefficients
directly related to the stability criterion and the tube resonances of the vocal tract:
$$\hat{E}_i(z) = \hat{E}_{i+1}(z) - k_i z^{-1} \hat{X}_i(z)$$
$$\hat{X}_{i+1}(z) = k_i^{*} \hat{E}_i(z) + z^{-1} \hat{X}_i(z)$$
[Block diagram: a cascade of lattice stages. The forward path carries e(n) through e2(n), e1(n), and e0(n) to the output x(n); the backward path carries x0(n), x1(n), and x2(n) through unit delays (z^-1); stage i couples the two paths through k_i and k_i*.]
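A sketch of an all-pole lattice for real coefficients, checked against the direct form. The reflection coefficients come from the standard step-down recursion written out by hand; the pole locations are hypothetical, and the indexing is an assumption of this sketch: k(m) here plays the role of k_(m-1) in the stage equations above, with f standing for the forward signals e_i and g for the backward signals x_i.

```matlab
% Hypothetical stable all-pole filter, a = [1 a(2) ... a(p+1)]
pl = [0.9*exp(1i*0.3), 0.8*exp(1i*1.2)];
a  = real(poly([pl, conj(pl)]));
p  = numel(a) - 1;

% Step-down recursion: polynomial coefficients -> reflection coefficients
k  = zeros(1, p);
am = a(2:end);
for m = p:-1:1
    k(m) = am(m);
    if m > 1
        am = (am(1:m-1) - k(m)*am(m-1:-1:1)) / (1 - k(m)^2);
    end
end

% All-pole lattice filter driven by a unit impulse
e = [1, zeros(1,49)];
x = zeros(size(e));
g = zeros(1, p+1);                 % g(m+1) stores the delayed backward signal g_m(n-1)
for n = 1:numel(e)
    f = e(n);
    gNew = zeros(1, p+1);
    for m = p:-1:1
        f = f - k(m)*g(m);             % forward:  f_(m-1)(n) = f_m(n) - k_m g_(m-1)(n-1)
        gNew(m+1) = k(m)*f + g(m);     % backward: g_m(n) = k_m f_(m-1)(n) + g_(m-1)(n-1)
    end
    gNew(1) = f;                       % g_0(n) = f_0(n) = x(n)
    x(n) = f;
    g = gNew;
end

xRef = filter(1, a, e);
disp(max(abs(x - xRef)))
```

For a stable filter every |k(m)| is below 1, which is the stability check the lattice form makes easy.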
Example
a) Record a neutral vowel sound, estimate the formant
frequencies, and estimate the size of the vocal tract based
on a 341 m/s speed of sound, assuming an open-at-one-end tube model.
b) Use LPCs estimated from the neutral vowel sound to
filter another sample of speech from the same speaker.
Use them as an all-zero filter and then as an all-pole filter.
Listen to the sound and describe what is happening.
c) Convert the LPC coefficients for the all-pole filter into
second-order sections and implement the filter. Describe
the advantages of this approach.
d) Modify the filter by maintaining the angles of the
poles/zeros but moving their magnitudes closer to the unit
circle. Listen to the sound and explain what is happening.
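For part a), a tube closed at one end and open at the other resonates at odd multiples of a quarter wavelength, F_n = (2n-1)c/(4L), so each formant gives a length estimate L = (2n-1)c/(4F_n). A minimal sketch, using hypothetical formant values rather than measured ones:

```matlab
c = 341;                          % speed of sound (m/s), as given in the example
formantsHz = [500 1500 2500];     % hypothetical formants of a neutral vowel

n = 1:numel(formantsHz);
lengthM = (2*n - 1) * c ./ (4 * formantsHz);   % one estimate per formant
meanLengthCm = 100 * mean(lengthM);

fprintf('Estimated tract length: %.2f cm\n', meanLengthCm);
```

With measured formants the per-formant estimates will differ, and their spread shows how well the uniform tube assumption fits.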
Homework (1)
a) Record a free vowel sound and estimate the size of your
vocal tract based on the formant frequencies.
b) Compute the LPCs from a free vowel sound and use the
LPCs to filter another segment of speech with –10dB of
white noise added. Use the LPCs as an all-zero filter and
as an all-pole filter. Describe the sound of the filtered
outputs and explain what is happening between the 2
filters.
Extra credit (1 point): move the poles and zeros further away
from the unit circle and repeat part b). Describe the
effect on the filtered sound when poles and zeros are
moved away from the unit circle. Submit this description
and the m-files used to process the data.
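One way to change the pole or zero magnitudes without changing their angles is to scale the k-th coefficient by gamma^k, which multiplies every root by gamma. The sketch below uses hypothetical coefficients and an assumed scaling factor.

```matlab
a = [1 -1.2 0.81];               % hypothetical coefficients (root radius 0.9)
gamma = 1.05;                    % assumed factor: >1 moves roots inside the unit
                                 % circle outward, <1 moves them toward the origin

k = 0:numel(a)-1;
aScaled = a .* gamma.^k;         % A(z/gamma): every root is multiplied by gamma

rOld = roots(a);
rNew = roots(aScaled);
disp([abs(rOld), abs(rNew)])     % radii change from 0.9 to 0.945
disp([angle(rOld), angle(rNew)]) % angles are unchanged
```

Keep the scaled radii below 1 when the coefficients are used as an all-pole filter, otherwise the filter becomes unstable.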