Fundamental frequency estimation
HCS 7367 Speech Perception Lab
Dr. Peter Assmann
Fall 2013

Fundamental frequency estimation
• TrackDraw uses a real cepstrum-based F0 estimation algorithm that works reasonably well.
• Example using the command line:
>> [y,fs]=audioread('wheel.wav');
>> F0=cepf0(y,fs)';
>> F0=medfilt1(F0,4);     % 4-point median smoothing of the F0 track
• Related MathWorks example (complex cepstrum): http://www.mathworks.com/help/signal/ug/estimating-fundamental-frequency-with-the-complex-cepstrum.html
[Figure: Cepstrum-based F0 estimation; F0 (Hz) vs. time (ms)]

Fundamental frequency estimation
• YIN
de Cheveigné, A., and Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111, 1917–1930.
http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
http://www.ircam.fr/pcm/cheveign/sw/yin.zip
http://www.ircam.fr/pcm/cheveign/sw/sf.zip
• YAAPT
Zahorian, S. A., and Hu, H. (2008). A spectral/temporal method for robust fundamental frequency tracking. J. Acoust. Soc. Am. 123, 4559–4571.
http://www.ws.binghamton.edu/zahorian/yaapt.htm
• Others
http://www.audiocontentanalysis.org/code/pitch-tracking/compute-pitch/

Fundamental frequency estimation
• MBSC: multi-band summary correlogram-based pitch detection
Tan, L. N., and Alwan, A. (2013). Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication 55, 841–856.
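The internals of TrackDraw's cepf0 are not shown here. As a rough sketch of the general real-cepstrum idea (not a reconstruction of cepf0), the block below windows one frame, takes the inverse FFT of the log magnitude spectrum, and picks the largest cepstral peak in a quefrency range corresponding to 60 to 400 Hz. The synthetic 125-Hz harmonic test frame, the 10-kHz sample rate, the Hann window and the search range are all assumptions chosen for illustration.

```matlab
% Minimal real-cepstrum F0 estimate for one frame (sketch, not TrackDraw's cepf0).
fs = 10000;                          % sample rate in Hz (assumed)
f0 = 125;                            % F0 of the synthetic test frame (assumed)
N = 512;                             % frame length: 51.2 ms at 10 kHz
t = (0:N-1)' / fs;
nharm = floor((fs/2 - 1) / f0);      % harmonics below the Nyquist frequency (39 here)
x = zeros(N, 1);
for k = 1:nharm                      % equal-amplitude harmonic complex
    x = x + cos(2*pi*k*f0*t);
end
w = 0.5 - 0.5*cos(2*pi*(0:N-1)' / (N-1));   % Hann window, computed directly
nfft = 1024;
X = fft(x .* w, nfft);               % zero-padded spectrum of the windowed frame
c = real(ifft(log(abs(X) + eps)));   % real cepstrum: inverse FFT of the log magnitude
qmin = round(fs/400);                % shortest period searched (F0 up to 400 Hz)
qmax = round(fs/60);                 % longest period searched (F0 down to 60 Hz)
[~, imax] = max(c(qmin+1:qmax+1));   % c(q+1) holds quefrency q (in samples)
q = qmin + imax - 1;                 % quefrency of the cepstral peak
F0est = fs / q;                      % F0 estimate in Hz
fprintf('Estimated F0: %.1f Hz\n', F0est);
```

A tracker would repeat this on successive frames; isolated octave errors in the resulting track are what a median filter such as medfilt1 is meant to smooth out.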
http://www.ee.ucla.edu/~spapl/code/MBSC_matlab.zip
>> [y,fs]=audioread('sent1.wav');
>> F0=mbsc(y,fs);
>> plot(F0,'.');
[Figure: MBSC F0 track for sent1.wav]

Term Projects
• Introduction
– Statement of problem + background information (10%)
• Method
– Enough detail for replication purposes (25%)
• Results
– Figures + written summary (25%)
• Discussion
– Discuss expected / explain unexpected findings (25%)
• Appendix
– Include all Matlab code (15%)

Ellipse plots
• Download the ellipse function: http://www.utdallas.edu/~assmann/hcs7367/ellipse.m
• Load the Hillenbrand data set:
[filenames,dur,F0s,F1s,F2s,F3s,F4s,F120,F220,F320,F150,F250,F350,F180,F280,F380] = ...
    textread('vowdata_no_header.txt','%s%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f%4.1f');

Ellipse plots
% Get rid of zero entries (do this before assigning codes, so the code vectors line up with the data)
ind=union(find(F1s==0),find(F2s==0));
ind=union(ind,find(F3s==0));
filenames(ind,:)=[]; F1s(ind)=[]; F2s(ind)=[]; F3s(ind)=[];

Ellipse plots
% assign vowel and group codes
talker_group = char('m','w','b','g');
vowel = char('ae','ah','aw','eh','er','ei','ih','iy','oa','oo','uh','uw');
hvd = char('had','hod','hawed','head','herd','hayed','hid','heed','hoed','hood','hud','whod');
filenames=char(filenames);
[nfiles,nchar]=size(filenames);
for ifile=1:nfiles,
    vowel_code(ifile) = strmatch(filenames(ifile,4:5),vowel);
    talker_group_code(ifile) = strmatch(filenames(ifile,1),talker_group);
end;

Ellipse plots
indg=find(talker_group_code==1);   % find row numbers for males
indv=find(vowel_code==1);          % find row numbers for /ae/
indgv=intersect(indg,indv);        % intersection
meanF1=mean(log(F1s(indgv)));      % mean of log(F1) for this subset
meanF2=mean(log(F2s(indgv)));      % mean of log(F2) for this subset

Ellipse plots
z=ellipse(log(F1s(indgv)),log(F2s(indgv)),1);
patch(z(:,1),z(:,2),[0.7 0.7 1],'EdgeColor','b'); hold on;
plot(log(F1s(indgv)),log(F2s(indgv)),'ob','MarkerSize',1,'MarkerFaceColor','b');
hm=text(meanF1,meanF2,hvd(1,:),'FontSize',9); set(hm,'Color','b'); hold on;

Ellipse plots
axis(log([250 1300 500 4000]))
set(gca,'Box','On','XTick',log([300:100:1300]),...
    'XTickLabel','300|400||600||800||1000||1200|');
set(gca,'YTick',log([500:500:4000]),...
    'YTickLabel','500|1000|1500|2000|2500|3000|3500|4000');
xlabel('F1 frequency (Hz)');
ylabel('F2 frequency (Hz)');
title('Hillenbrand vowels - W. Michigan (adults)');
[Figure: Hillenbrand vowels - W. Michigan (adults); ellipse, data points and label for "had"; F2 frequency (Hz) vs. F1 frequency (Hz)]

Ellipse plots
• Homework assignment: put the code above into a script and add a for-loop to plot ellipses, data points and labels for all 12 vowels.
• Make separate ellipse plots for males, females, boys and girls.
• Bonus points: plot your vowels in the vowel space. Be sure to log-transform the formant frequencies, and use large symbols:
>> plot(log(myF1),log(myF2),'rp','MarkerSize',20,'MarkerFaceColor','r');

Filters
• A filter is a device that alters the frequency content of a signal through mechanical, acoustical or electrical elements. Digital filters operate on discrete-time (sampled) signals.
• Filters are commonly used to pass a certain band of frequencies while suppressing or attenuating others.
• Low-pass filter – passes frequencies below the cutoff fc
• High-pass filter – passes frequencies above fc
• Bandpass filter – passes frequencies between fL and fH
• Bandstop filter – passes frequencies outside the band between fL and fH

Filters and convolution
• The Matlab function conv.m implements the mathematical operation of convolution, defined as:
$$ y(n) = h(n) * x(n) = \sum_{m} h(n-m)\, x(m) $$
• A digital filter's output y(n) is related to its input x(n) by convolution with its impulse response, h(n).
• Use conv.m in digital filtering applications where both the filter impulse response and the speech signal are finite in length.
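To make the convolution sum concrete, the loop below evaluates it term by term for the 3-point moving average filter (h = [1 1 1]/3) applied to 20 random samples, and compares the result with conv. It is a teaching sketch only; the variable names nx, nh, ny and yc are illustrative, and conv itself is the practical choice.

```matlab
% Direct evaluation of the convolution sum y(n) = sum_m h(n-m) x(m),
% checked against conv. MATLAB indices start at 1, hence the "+1" offsets.
x = rand(20,1);                      % input: 20 random numbers
h = [1 1 1] / 3;                     % 3-point moving average impulse response
nx = length(x);
nh = length(h);
ny = nx + nh - 1;                    % full convolution length (22 here)
y = zeros(ny, 1);
for n = 1:ny                         % one output sample per filter position
    for m = max(1, n-nh+1):min(n, nx)    % inputs that overlap the reversed filter
        y(n) = y(n) + h(n-m+1) * x(m);   % sum of cross-products
    end
end
yc = conv(h, x);                     % built-in result for comparison
fprintf('Max difference from conv: %g\n', max(abs(y - yc(:))));
```

The output is two samples longer than the input because the filter keeps producing outputs until it no longer overlaps the signal; away from the edges, each y(n) is simply the mean of three consecutive inputs.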
Filters and convolution
• Convolution involves the following steps:
– (1) reverse the impulse response in time;
– (2) sum the cross-products of filter and signal to generate one sample of the output;
– (3) shift the filter by one sample and repeat step 2. Continue until signal and filter no longer overlap.

Filters and convolution
• Signal x(n): generate a vector of 20 random numbers:
>> x=rand(20,1);
• Create the filter impulse response, h(n): design a 3-point moving average filter:
>> h = [1 1 1] / 3;
• Convolve the signal with the filter:
>> y = conv(h,x);
• Plot the input and output sequences and add a legend:
>> plot(x,'b');
>> hold on;
>> plot(y,'r');
>> legend('Input sequence','Filtered sequence');

Filters and convolution
• The 3-point moving average filter is an example of a finite impulse response (FIR) filter with a finite set of filter coefficients b(1), b(2), …, b(nb+1).
• It acts as a low-pass (smoothing) filter.

Filtering and convolution
• Another way to describe the operation of filtering is in terms of the z-transform:
$$ Y(z) = H(z)\,X(z) = \frac{b(1) + b(2)\,z^{-1} + \cdots + b(nb+1)\,z^{-nb}}{a(1) + a(2)\,z^{-1} + \cdots + a(na+1)\,z^{-na}}\, X(z) $$
• where X(z) is the z-transform of the speech signal,
• H(z) is the transfer function of the filter, and
• Y(z) is the z-transform of the filtered speech.
• The constants b(i) and a(i) are the filter coefficients; the order of the filter is the larger of nb and na.

Filtering and convolution
• When na = 0 (a is a scalar), the filter is a finite impulse response (FIR), all-zero, non-recursive, or moving average (MA) filter.
• When nb = 0 (b is a scalar), the filter is an infinite impulse response (IIR), all-pole, recursive, or autoregressive (AR) filter.
• When both na > 0 and nb > 0, the filter is an infinite impulse response (IIR), pole-zero, recursive, or autoregressive moving average (ARMA) filter.

Filtering and convolution
• The z-transform expressed as a difference equation (assuming a(1) = 1; filter.m normalizes the coefficients by a(1) otherwise):
$$ y(n) = b(1)\,x(n) + b(2)\,x(n-1) + \cdots + b(nb+1)\,x(n-nb) - a(2)\,y(n-1) - \cdots - a(na+1)\,y(n-na) $$
• The output samples of y are given by:
$$ y(1) = b(1)\,x(1) $$
$$ y(2) = b(1)\,x(2) + b(2)\,x(1) - a(2)\,y(1) $$
$$ y(3) = b(1)\,x(3) + b(2)\,x(2) + b(3)\,x(1) - a(2)\,y(2) - a(3)\,y(1) $$
…
• Digital filters of this form can be implemented using the function filter.m.
• Example: a simple low-pass filter
>> b = 1;
>> a = [1 -0.9];
>> x=[1; zeros(1023,1)];
>> y = filter(b,a,x);
[Figure: Frequency response; magnitude (dB) vs. frequency (kHz)]

Graphical techniques for filter design
• fdesign
• Filterbuilder GUI
• FDATool
• SPTool

Filter design
• Example: design a 5th-order elliptic low-pass filter with a cutoff of 1000 Hz, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.

Elliptic filter
>> help ellip
  ELLIP Elliptic or Cauer digital and analog filter design.
  [B,A] = ELLIP(N,Rp,Rs,Wn) designs an Nth order lowpass digital elliptic filter with Rp decibels of ripple in the passband and a stopband Rs decibels down. ELLIP returns the filter coefficients in length N+1 vectors B (numerator) and A (denominator). The cut-off frequency Wn must be 0.0 < Wn < 1.0, with 1.0 corresponding to half the sample rate. Use Rp = 0.5 and Rs = 20 as starting points, if you are unsure about choosing them.
  If Wn is a two-element vector, Wn = [W1 W2], ELLIP returns an order 2N bandpass filter with passband W1 < W < W2.
  [B,A] = ELLIP(N,Rp,Rs,Wn,'high') designs a highpass filter.
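The difference equation that filter.m evaluates can be checked with an explicit loop. The sketch below uses the one-pole low-pass example from the filter.m slide (b = 1, a = [1 -0.9], so y(n) = x(n) + 0.9 y(n-1)), compares the loop with filter, and then estimates the magnitude response from the FFT of the impulse response. The 10-kHz sample rate is an assumption chosen to match the 0 to 5 kHz axis of the frequency-response plot, and the loop assumes a(1) = 1.

```matlab
% One-pole low-pass: b = 1, a = [1 -0.9], i.e. y(n) = x(n) + 0.9*y(n-1).
% The loop below assumes a(1) = 1 (filter.m would normalize otherwise).
b = 1;
a = [1 -0.9];
x = [1; zeros(1023,1)];              % unit impulse, 1024 samples
y = zeros(size(x));
for n = 1:length(x)
    acc = 0;
    for k = 1:length(b)              % feed-forward terms: + b(k) x(n-k+1)
        if n-k+1 >= 1
            acc = acc + b(k) * x(n-k+1);
        end
    end
    for k = 2:length(a)              % feedback terms: - a(k) y(n-k+1)
        if n-k+1 >= 1
            acc = acc - a(k) * y(n-k+1);
        end
    end
    y(n) = acc;
end
yf = filter(b, a, x);                % same recursion via filter.m

% Magnitude response from the FFT of the impulse response
fs = 10000;                          % sample rate in Hz (assumed)
H = fft(yf);
nbins = length(H)/2 + 1;             % 0 Hz up to the Nyquist frequency
freq = (0:nbins-1)' * fs / length(H);
magdB = 20*log10(abs(H(1:nbins)));
magdB = magdB - magdB(1);            % normalize to 0 dB at DC
fprintf('Max loop/filter difference: %g\n', max(abs(y - yf)));
fprintf('Attenuation at %g Hz: %.1f dB\n', freq(end), magdB(end));
```

The result can be plotted with plot(freq/1000, magdB). As a check on the numbers: the gain at DC is 1/(1 - 0.9) = 10 and the gain at the Nyquist frequency is 1/(1 + 0.9), so the response falls by 20 log10(1/19), about 25.6 dB, from 0 Hz to fs/2.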
  [B,A] = ELLIP(N,Rp,Rs,Wn,'stop') is a bandstop filter if Wn = [W1 W2].

Elliptic filter
• Example: design a 5th-order elliptic low-pass filter with a 1000-Hz cutoff, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
>> rate=10000;                % sample rate in Hz (assumed; use the rate of your own signal)
>> N=5;                       % filter order
>> Rp=1;                      % decibels of ripple in the passband
>> Rs=60;                     % decibels of attenuation in the stopband
>> Wn=1000 / (rate/2);        % normalized cutoff frequency (1.0 = half the sample rate)
>> [b,a] = ellip(N,Rp,Rs,Wn);
>> x=[1; zeros(1023,1)];      % unit impulse
>> y = filter(b,a,x);         % impulse response of the filter
[Figure: Elliptic low-pass filter; magnitude (dB) vs. frequency (kHz)]

Bandpass filters
• Example: design an elliptic bandpass filter with N = 5 (ellip returns an order 2N = 10 bandpass filter) centered at 1000 Hz with a bandwidth of 400 Hz, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
>> N=5;                       % filter order parameter
>> Rp=1;                      % decibels of ripple in the passband
>> Rs=60;                     % decibels of attenuation in the stopband
>> Wn=[800 1200];             % lower and upper cutoff frequencies (Hz)
>> Wn=Wn / (rate/2);          % normalized cutoff frequencies
>> [b,a] = ellip(N,Rp,Rs,Wn);
>> x=[1; zeros(1023,1)];
>> y = filter(b,a,x);
[Figure: Bandpass filter; magnitude (dB) vs. frequency (Hz)]

Homework: LP, HP, and BP filters
1. Design a 5th-order elliptic low-pass filter with a 1000-Hz cutoff, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
2. Design a high-pass filter with these same parameters.
3. Design a band-pass filter centered at 1000 Hz with a 500-Hz bandwidth.
4. Filter the vowel /ae/ from your dataset. Plot the waveform and spectrum and listen to the filtered vowel.

Gammatone spectrograms
• Download the Gammatone spectrogram code: http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/
• If you use this in your work, include this reference:
D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource. http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/

Long-term average speech spectrum
>> help ltass
  LTASS: Computes long-term average speech spectrum via FFT.
  Usage: mag=ltass(w,rate,twin,thop,nfft);
    w: input waveform
    rate: sample rate in Hz (default 10000 Hz)
    twin: frame length in ms (default: 10 ms)
    thop: frame update in ms (default: 5 ms)
    nfft: FFT window length (default: 256 samples)

Long-term average speech spectrum
• Compute and plot the LTASS of the vowel /ae/:
>> twin=10;
>> thop=5;
>> nfft=256;
>> mag=ltass(y,rate,twin,thop,nfft);
>> freq=linspace(0,rate/2,length(mag));
>> plot(freq,mag);

Vowel identification experiment
• What we need:
– A Matlab script to implement a labelled, 12-response button box on the computer screen.
– A list of the stimuli to be included.
– A random number generator to scramble the order of the stimuli.
– A for-loop to cycle through the stimuli, play each file, record the listener's response and store the responses in a data file.
– A Matlab script to load the data file and score the responses, keeping track of correct and incorrect answers.

Download the playback script
http://www.utdallas.edu/~assmann/hcs7367/vowel_id.m
• Put the waveform files in a single directory.
• Change the directory name in vowel_id.m to match.
• Change the SNDREC program directory if necessary.
• Run the program and report any errors to me.

Scoring the results
• Results are stored in the Matlab data file you specified.
• Matlab data files have a .MAT extension.
• Start Matlab and type:
>> load mydata.mat
• Next we'll develop a scoring program. In the meantime you can score your results by hand:
>> [stimlist resplist]
LDhoed.wav hoed
TShod.wav hod
HChod.wav hod
TSheed.wav heed
AMhood.wav hood
CCherd.wav hood
...
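The ltass function above is course code whose internals are not shown. As a sketch of what a long-term average spectrum computation involves, the block below slides a Hann-windowed frame along the signal using the same twin, thop and nfft parameters, averages the power spectra of the frames, and converts the result to dB. Averaging power (rather than magnitude), the Hann window, the dB output and the 1000-Hz test tone standing in for speech are all assumptions, so the values may differ from those returned by ltass.

```matlab
% LTASS sketch: average the power spectra of overlapping windowed frames.
% Not the course's ltass.m; power averaging and dB output are assumptions.
rate = 10000;                        % sample rate in Hz (assumed)
twin = 10; thop = 5; nfft = 256;     % frame length (ms), frame update (ms), FFT length
t = (0:rate-1)' / rate;              % 1-s test signal
w = sin(2*pi*1000*t);                % 1000-Hz tone standing in for speech
nwin = round(twin * rate / 1000);    % 100 samples per frame
nhop = round(thop * rate / 1000);    % 50-sample frame update
win = 0.5 - 0.5*cos(2*pi*(0:nwin-1)' / (nwin-1));   % Hann window
nframes = floor((length(w) - nwin) / nhop) + 1;
pow = zeros(nfft/2 + 1, 1);
for i = 1:nframes
    seg = w((i-1)*nhop + (1:nwin)) .* win;   % one windowed frame (column)
    S = fft(seg, nfft);                       % zero-padded to nfft points
    pow = pow + abs(S(1:nfft/2 + 1)).^2;      % accumulate power, 0 Hz to rate/2
end
mag = 10*log10(pow / nframes);       % long-term average spectrum in dB
freq = linspace(0, rate/2, length(mag));
[~, ipk] = max(mag);
fprintf('LTASS peak near %.0f Hz\n', freq(ipk));
```

To apply it to a recorded vowel, replace w with the waveform and rate with its sample rate. Note that fft truncates a frame longer than nfft, so nfft should be at least the frame length in samples (twin*rate/1000).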
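While the scoring program is being developed, a minimal automated version of the hand-scoring step might look like the sketch below. It uses the six trials listed above. Storing stimlist and resplist as cell arrays, and treating the first two characters of each file name as a talker prefix, are assumptions about the data format (vowel_id.m may store them differently). The last line shows how randperm could supply the scrambled presentation order mentioned in the experiment checklist.

```matlab
% Scoring sketch for the six trials listed above (hypothetical cell-array format).
stimlist = {'LDhoed.wav'; 'TShod.wav'; 'HChod.wav'; 'TSheed.wav'; 'AMhood.wav'; 'CCherd.wav'};
resplist = {'hoed'; 'hod'; 'hod'; 'heed'; 'hood'; 'hood'};
ntrials = numel(stimlist);
correct = false(ntrials, 1);
for i = 1:ntrials
    [~, name] = fileparts(stimlist{i});    % strip the '.wav' extension, e.g. 'LDhoed'
    target = name(3:end);                  % assumes a 2-letter talker prefix
    correct(i) = strcmp(target, resplist{i});
end
pc = 100 * mean(correct);                  % percent correct
fprintf('%d of %d correct (%.1f%%)\n', sum(correct), ntrials, pc);
% A random presentation order for a new session could come from randperm:
order = randperm(ntrials);                 % e.g. play stimlist{order(1)} first
```

With these six trials, only CCherd.wav (heard as "hood") is scored incorrect, giving 5 of 6 correct. Keeping a per-trial logical vector rather than a running count makes it easy to later tabulate errors by vowel or by talker.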