EE599-020 Audio Signals and Systems
Speech Production
Kevin D. Donohue
Electrical and Computer Engineering
University of Kentucky

Related Web Sites
Speech modeling is a very popular topic, and many web sites are devoted to education and research in this area. A general search on speech production, modeling, synthesis, or analysis will turn up many interesting sites. A few examples are given below:
http://www.asel.udel.edu/speech/tutorials/production/index.html
http://www.haskins.yale.edu/haskins/HEADS/production.html
http://www.kt.tu-cottbus.de/speech-analysis/

Speech Generation
Speech can be divided into fundamental building blocks of sound referred to as phonemes. All speech sounds result from turbulence in obstructed airflow.
The vocal cords create quasi-periodic obstructions of airflow that act as a sound source at the base of the vocal tract. Phonemes produced by the vocal cords are referred to as voiced speech.
Single-shot turbulence from obstructed airflow through the vocal tract is generated primarily by the teeth, tongue, and lips. Phonemes produced by this non-periodic obstructed airflow are referred to as unvoiced speech.
[Figure taken from http://www.kt.tu-cottbus.de/speech-analysis/]

Speech Production Models
The general speech model:
Quasi-periodic pulsed air (voiced speech) or air burst / continuous flow (unvoiced speech) → Vocal tract filter → Vocal radiator
Sources can be modeled as quasi-periodic impulse trains (voiced) or random sequences of impulses (unvoiced).
The vocal tract filter can be modeled as an all-pole filter whose poles are related to the tract resonances.
The radiator can be modeled as a simple gain with spatial direction (possibly with some filtering).

Example
Create an "a" sound (as the "a" in "about" or the "u" in "hum") and use the LPC command to model this sound as being generated by a quasi-periodic sequence of impulses exciting an all-pole filter.
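The general speech model can be tried out before any recording is made. Below is a minimal Python sketch (standard library only, rather than the MATLAB used in the scripts of these notes). A periodic impulse train (voiced) or a random sequence (unvoiced) drives an all-pole vocal tract filter, built by cascading two-pole resonances. Several choices are illustrative assumptions, not values measured in these notes: the 8000 Hz sampling rate, the three formant frequencies and bandwidths, the 104-sample pitch period, and all function names. The radiator is taken as a unit gain.

```python
import math
import random

def impulse_train(period, n_periods):
    """Voiced source: one unit impulse every `period` samples (strictly periodic here)."""
    out = []
    for _ in range(n_periods):
        out.append(1.0)
        out.extend([0.0] * (period - 1))
    return out

def noise_source(n, seed=0):
    """Unvoiced source: a Gaussian random sequence standing in for turbulent flow."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def resonator(freq_hz, bw_hz, fs):
    """Denominator [1, c2, c3] of one conjugate pole pair (one formant).
    Uses the common approximation radius = exp(-pi * bandwidth / fs)."""
    r = math.exp(-math.pi * bw_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def poly_mul(p, q):
    """Multiply two polynomials in z^-1, i.e. cascade two filter sections."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

def all_pole(a, x, gain=1.0):
    """y(n) = gain*x(n) - a[1]y(n-1) - ... - a[p]y(n-p), with a[0] == 1.
    This is the same recursion as MATLAB filter(gain, a, x)."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

fs = 8000  # assumed sampling rate in Hz
# Hypothetical formants (Hz) and bandwidths (Hz); not measured from a recording.
a = [1.0]
for f, bw in [(500, 80), (1500, 100), (2500, 120)]:
    a = poly_mul(a, resonator(f, bw, fs))

# Radiator modeled as a unit gain (assumption), so the filter output is the "speech".
voiced = all_pole(a, impulse_train(period=104, n_periods=20))
unvoiced = all_pole(a, noise_source(len(voiced)))
```

Only the source changes between `voiced` and `unvoiced`; the vocal tract filter `a` is shared. This mirrors how the script below drives `filter(1,a,g)` with a simulated impulse sequence.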
The LPC command finds a vector of filter coefficients a = [1, a(2), ..., a(p+1)] (MATLAB indexing, with a(1) = 1) that minimizes the mean-squared prediction error.
Predict x(n) from the previous p samples:
$$\tilde{x}(n) = -a(2)\,x(n-1) - a(3)\,x(n-2) - \cdots - a(p+1)\,x(n-p)$$
Compute the prediction error sequence with:
$$e(n) = x(n) - \tilde{x}(n) = x(n) + a(2)\,x(n-1) + a(3)\,x(n-2) + \cdots + a(p+1)\,x(n-p)$$
Use Z-transforms to find the transfer function of the filter that recovers x(n) from the LPCs and the error sequence e(n). Identify the components related to the source, vocal tract, and radiator.

Script for Analysis
    [y,fs] = audioread('aaaaa.wav');          % Read in wave file (older MATLAB releases used wavread)
    [cb,ca] = butter(5,2*60/fs,'high');       % Highpass at 60 Hz to remove low-frequency recording noise
    yf = filtfilt(cb,ca,y);
    [a,er] = lpc(yf,10);                      % Compute LPC coefficients with model order 10
    predy = filter(a,1,yf);                   % Compute prediction error with the all-zero filter
    figure(1)
    plot(predy); title('Prediction error'); xlabel('Samples'); ylabel('Amplitude')
    recon = filter(1,a,predy);                % Reconstruct the signal from the error and the all-pole filter
    figure(2)                                 % Plot reconstructed signal
    plot(recon,'b')
    hold on
    % Plot the original shifted by one sample so it does not entirely overlap the perfectly reconstructed signal
    plot(yf(2:end),'r')
    hold off
    % From the error sequence, generate a simple impulse train that simulates its period (about 103 samples)
    g = repmat([1, zeros(1,103)], 1, 150);
    % Run the simulated error sequence through the all-pole filter
    sim = filter(1,a,g);
    soundsc([sim(:)/std(sim); yf/std(yf)],fs) % Play the simulated sound, then the real one, for comparison
    % Plot pole-zero diagram
    figure(3)
    r = roots(a)                              % Poles of the vocal tract filter 1/A(z) (displayed)
    w = 0:.001:2*pi;
    plot(real(r),imag(r),'xr',cos(w),sin(w),'b'); axis equal
    title('Pole diagram of vocal tract filter')
    xlabel('Real'); ylabel('Imaginary')
    % Find resonant frequencies corresponding to poles
    froots = (fs/2)*angle(r)/pi;
    nf = find(froots > 0 & froots < fs/2);    % Keep one pole of each complex-conjugate pair
    figure(4)                                 % Examine average spectrum with formant frequencies
    [pd,f] = pwelch(yf,hamming(2*1024),256,4*1024,fs);  % Averaged PSD (older releases used psd)
    dbspec = 10*log10(pd);                    % pd is a power spectrum, so use 10*log10 for dB
    mxp = max(dbspec);                        % Max and min points for drawing vertical lines
    mnp = min(dbspec);
    plot(f,dbspec,'b')                        % Plot PSD
    hold on
    % Overlay lines on the plot where formant frequencies were estimated from the LPCs
    for k=1:length(nf)
        plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
    end
    hold off
    title('PSD plot with formant frequencies (Black broken lines)'); xlabel('Hertz'); ylabel('dB')

LPC Analysis Result
[Figure: "PSD plot with formant frequencies (Black broken lines)", dB versus Hertz over 0 to 4000 Hz. Annotations mark the pole frequencies of the LPC model, which come from the vocal tract shape, and the frequency periodicities that come from the harmonics of the pitch frequency.]

Vocal Tract Filter Implementations
Direct form 1 for the all-pole model:
$$x(n) = g_1\,e(n) - a(2)\,x(n-1) - a(3)\,x(n-2) - \cdots - a(p+1)\,x(n-p)$$
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 + a(2)\,z^{-1} + a(3)\,z^{-2} + \cdots + a(p+1)\,z^{-p}}$$
[Figure: direct form 1 signal flow graph; e(n) is scaled by g1 and summed with past outputs taken from a chain of unit delays z^-1 weighted by a(2), a(3), a(4), ..., a(p+1), producing x(n).]

Direct form 1, second-order sections:
$$\frac{\hat{X}(z)}{\hat{E}(z)} = \frac{g_1}{1 + c(1,2)\,z^{-1} + c(1,3)\,z^{-2}} \cdot \frac{g_2}{1 + c(2,2)\,z^{-1} + c(2,3)\,z^{-2}} \cdots \frac{g_{p/2}}{1 + c(p/2,2)\,z^{-1} + c(p/2,3)\,z^{-2}}$$
[Figure: cascade of p/2 second-order sections from e(n) to x(n); section k scales its input by g_k and feeds back through two unit delays z^-1 weighted by c(k,2) and c(k,3).]

Lattice
Lattice implementations are popular because of their numerical-error and stability properties. The filter is implemented in modular stages whose coefficients k_i are directly related to the stability criterion (the all-pole filter is stable when |k_i| < 1 for every stage) and to the tube resonances of the vocal tract:
$$\hat{E}_i(z) = \hat{E}_{i+1}(z) - k_i\,z^{-1}\hat{X}_i(z)$$
$$\hat{X}_{i+1}(z) = k_i^{*}\,\hat{E}_i(z) + z^{-1}\hat{X}_i(z)$$
[Figure: lattice filter; e(n) enters the top stage and passes through forward signals e2(n), e1(n), e0(n) to the output x(n), while backward signals x2(n), x1(n), x0(n) pass through unit delays z^-1 and are weighted by reflection coefficients k2, k1, k0 and their conjugates k1*, k0*.]

Example
a) Record a neutral vowel sound, estimate the formant frequencies, and estimate the size of the vocal tract assuming a speed of sound of 341 m/s and a tube model that is open at one end.
b) Use LPCs estimated from the neutral vowel sound to filter another sample of speech from the same speaker.
   Use them first as an all-zero filter and then as an all-pole filter. Listen to the results and describe what is happening.
c) Convert the LPC coefficients of the all-pole filter into second-order sections and implement the filter. Describe the advantages of this approach.
d) Modify the filter by keeping the angles of the poles/zeros but moving their magnitudes closer to the unit circle. Listen to the sound and explain what is happening.

Homework (1)
a) Record a free vowel sound and estimate the size of your vocal tract based on its formant frequencies.
b) Compute the LPCs from a free vowel sound and use them to filter another segment of speech with –10 dB of white noise added. Use the LPCs as an all-zero filter and as an all-pole filter. Describe the sound of the filtered outputs and explain what is happening between the two filters.
Extra credit (1 point): move the poles and zeros farther away from the unit circle and repeat part b). Describe the effect on the filtered sound when the poles and zeros are moved away from the unit circle.
Submit this description and the m-files used to process the data.
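Homework part a) and Example part a) both rest on the tube model that is closed at the glottis and open at the lips (a quarter-wave resonator), whose resonances fall at F_n = (2n - 1)c/(4L). The minimal Python sketch below inverts that relation for each formant and averages the resulting lengths. It also converts a pole to a frequency with f = fs·angle/(2π), the same mapping as the `froots` line in the analysis script. The pole locations, the 8000 Hz rate, and the function names are hypothetical; c = 341 m/s is the value given in the example.

```python
import cmath
import math

SPEED_OF_SOUND = 341.0  # m/s, as given in the example

def pole_to_hz(pole, fs):
    """Resonant frequency of a pole: f = fs * angle / (2*pi), as froots computes in the script."""
    return fs * cmath.phase(pole) / (2.0 * math.pi)

def tube_length_from_formants(formants_hz, c=SPEED_OF_SOUND):
    """Tube open at one end: F_n = (2n-1) c / (4 L).
    Solve for L from each formant and return the average length in metres."""
    lengths = [(2 * n - 1) * c / (4.0 * f) for n, f in enumerate(formants_hz, start=1)]
    return sum(lengths) / len(lengths)

# Hypothetical LPC poles at fs = 8000 Hz (one per conjugate pair).
fs = 8000
poles = [0.97 * cmath.exp(1j * 2 * math.pi * f / fs) for f in (500, 1500, 2500)]
formants = [pole_to_hz(p, fs) for p in poles]
length_m = tube_length_from_formants(formants)
print(f"Estimated vocal tract length: {100 * length_m:.2f} cm")
```

For these made-up formants every estimate equals 341/2000 m, so the script prints about 17.05 cm. With real measurements the per-formant estimates disagree, and averaging is just one simple way to combine them.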
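Homework part b) uses A(z) from the LPCs in the same two ways as the analysis script: as an all-zero prediction-error filter and inverted as the all-pole filter 1/A(z). For readers without MATLAB, here is a minimal Python sketch (standard library only) of what `lpc` and `filter` do there. It uses the autocorrelation method with the Levinson-Durbin recursion and assumes MATLAB's sign convention a = [1, a(2), ..., a(p+1)]. The k_m values produced along the way are reflection coefficients; in this convention a stable all-pole filter has |k_m| < 1, the criterion quoted for the lattice stages, though sign conventions for k differ between texts. The function names and the test signal are hypothetical, and MATLAB's `lpc` may differ in numerical details.

```python
import math
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the (implicitly zero-padded) signal, no window."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Return (a, k, err): a = [1, a(2), ..., a(p+1)] with A(z) = sum a[j] z^-j,
    reflection coefficients k_1..k_p, and the final prediction-error power."""
    a, k_list, err = [1.0], [], r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        a_ext = a + [0.0]
        a = [a_ext[j] + k * a_ext[m - j] for j in range(m + 1)]  # order-m update
        k_list.append(k)
        err *= 1.0 - k * k
    return a, k_list, err

def fir(a, x):
    """All-zero prediction-error filter e(n) = sum a[k] x(n-k), like MATLAB filter(a, 1, x)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]

def all_pole(a, e):
    """All-pole synthesis x(n) = e(n) - sum_{k>=1} a[k] x(n-k), like MATLAB filter(1, a, e)."""
    x = []
    for n, en in enumerate(e):
        acc = en
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * x[n - k]
        x.append(acc)
    return x

# Hypothetical test signal: a decaying resonance plus a little noise.
rng = random.Random(1)
x = [math.sin(0.3 * n) * 0.995 ** n + 0.01 * rng.gauss(0, 1) for n in range(800)]

a, k, err = levinson_durbin(autocorr(x, 10), 10)  # model order 10, as in the script
e = fir(a, x)             # prediction error e(n)
x_rec = all_pole(a, e)    # reconstruction from e(n) through 1/A(z)
```

Because e(n) is produced by A(z) and then passed through 1/A(z), `x_rec` matches `x` to rounding error. This is the "perfectly reconstructed signal" the script overlays in figure(2).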
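Example part d) and the extra-credit question both move poles (or zeros) radially while keeping their angles. One way to do this without factoring the polynomial is to multiply coefficient a(k+1) by γ^k. The result is A(z/γ), so every root of A(z) is multiplied by γ. With γ < 1 the roots move toward the origin (broader resonances); with γ > 1 they move toward the unit circle (sharper resonances), and the all-pole filter becomes unstable once any pole radius reaches 1. This is a swapped-in technique, often called bandwidth expansion, and not necessarily the method the course intends. In MATLAB the direct route is `roots`, rescaling the root magnitudes, and `poly`. The Python function names below are hypothetical.

```python
import math

def scale_roots(a, gamma):
    """Coefficients of A(z/gamma): each root of A(z) = sum a[k] z^-k is multiplied
    by gamma, so root angles are kept and root radii are scaled by gamma."""
    return [ak * gamma ** k for k, ak in enumerate(a)]

def resonator(r, theta):
    """[1, -2 r cos(theta), r^2]: one conjugate root pair at radius r, angle theta."""
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def pair_radius_and_angle(a2):
    """Recover (radius, angle) of a complex-conjugate pair from [1, c2, c3]."""
    r = math.sqrt(a2[2])
    theta = math.acos(-a2[1] / (2.0 * r))
    return r, theta

a = resonator(0.9, 0.6)          # hypothetical pole pair at radius 0.9
closer = scale_roots(a, 1.05)    # radius 0.945: closer to the unit circle, sharper peak
farther = scale_roots(a, 0.9)    # radius 0.81: farther from the unit circle, broader peak
```

The same scaling applied to the all-zero filter moves its zeros, so one function covers both halves of the extra-credit question. Before using γ > 1, check that the largest pole radius times γ stays below 1.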