Lab 7 slides

HCS 7367
Speech Perception Lab
Dr. Peter Assmann
Fall 2013

Fundamental frequency estimation
http://www.mathworks.com/help/signal/ug/estimating-fundamental-frequency-with-the-complex-cepstrum.html
TrackDraw uses a real cepstrum-based F0 estimation algorithm that works reasonably well. Example using the command line:
>> [y,fs]=audioread('wheel.wav');
>> F0=cepf0(y,fs)';
>> F0=medfilt1(F0,4);
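The slide does not show how cepf0 works. As a rough, hypothetical illustration of the real-cepstrum idea (not TrackDraw's implementation), the sketch below analyzes one frame: the real cepstrum is the inverse FFT of the log magnitude spectrum, and a periodic source produces a peak at the quefrency equal to the pitch period. The Hamming window, the 50-400 Hz search range and the synthetic 125-Hz test signal (an impulse train passed through one damped resonance) are all assumptions chosen for the example.

```matlab
% Hypothetical single-frame real-cepstrum F0 estimate (not TrackDraw's cepf0).
fs  = 10000;                              % sample rate (Hz), assumed
T0  = 80;                                 % true pitch period in samples (125 Hz)
N   = 512;                                % frame length (51.2 ms)
exc = zeros(N, 1);
exc(1:T0:end) = 1;                        % impulse-train excitation
s = filter(1, [1 -1.3 0.9], exc);         % one damped resonance (vocal-tract stand-in)

w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1)); % Hamming window
S = fft(s .* w);                          % spectrum of the windowed frame
c = real(ifft(log(abs(S) + eps)));        % real cepstrum

qmin = round(fs/400);                     % shortest period searched (400 Hz)
qmax = round(fs/50);                      % longest period searched (50 Hz)
[~, k] = max(c(qmin+1 : qmax+1));         % c(q+1) holds quefrency q samples
q0 = qmin + k - 1;                        % quefrency of the cepstral peak
F0 = fs / q0                              % F0 estimate in Hz
```

A frame-by-frame tracker would repeat this on overlapping frames and then smooth the track, which is what the medfilt1 call on the slide does for cepf0's output.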
Cepstrum-based F0 estimation
[Figure: F0 (Hz) vs. time (ms)]

Fundamental frequency estimation
YIN
de Cheveigné, A., and Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111, 1917–1930.
http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
http://www.ircam.fr/pcm/cheveign/sw/yin.zip
http://www.ircam.fr/pcm/cheveign/sw/sf.zip
YAAPT
Zahorian, S. A., and Hu, H. (2008). A spectral/temporal method for robust fundamental frequency tracking. J. Acoust. Soc. Am. 123, 4559–4571.
http://www.ws.binghamton.edu/zahorian/yaapt.htm
Others
http://www.audiocontentanalysis.org/code/pitch-tracking/compute-pitch/
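YIN is only cited here. To give a sense of what it computes, the sketch below implements three of its steps for one analysis frame, following de Cheveigné and Kawahara (2002): the difference function, the cumulative mean normalized difference, and an absolute threshold followed by a search for the local minimum. It omits the paper's parabolic interpolation and best-local-estimate steps and is not the authors' yin.zip code. The 200-Hz test signal, 10-kHz sample rate, 512-sample window and threshold of 0.1 (a commonly cited value) are assumptions for the example.

```matlab
% Sketch of YIN's core steps on a synthetic frame (not the yin.zip code).
fs = 10000;                          % sample rate (Hz), assumed
f0_true = 200;                       % fundamental of the test signal (Hz)
t = (0:1023)' / fs;
x = sin(2*pi*f0_true*t) + 0.5*sin(2*pi*2*f0_true*t);   % periodic test frame

W = 512;                             % integration window (samples)
tau_max = 200;                       % longest lag searched (50 Hz at 10 kHz)
d = zeros(tau_max, 1);
for tau = 1:tau_max                  % difference function d(tau)
    diffs = x(1:W) - x(1+tau:W+tau);
    d(tau) = sum(diffs.^2);
end

% Cumulative mean normalized difference: d'(tau) = d(tau) / ((1/tau) * sum_{j<=tau} d(j))
dprime = d .* (1:tau_max)' ./ cumsum(d);

threshold = 0.1;                     % absolute threshold (assumed value)
tau_est = find(dprime < threshold, 1);   % first lag that dips below the threshold
while tau_est < tau_max && dprime(tau_est+1) < dprime(tau_est)
    tau_est = tau_est + 1;           % walk down to the local minimum
end
f0_est = fs / tau_est                % F0 estimate in Hz
```

The normalization is what lets YIN use a fixed threshold: d'(tau) starts at 1 and only falls well below 1 near multiples of the period, so the first dip below the threshold favors the true period over its multiples.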
Fundamental frequency estimation
MBSC
Multi-band summary correlogram-based pitch detection
Tan, L. N., and Alwan, A. (2013). Multi-Band Summary Correlogram-based Pitch Detection for Noisy Speech. Speech Communication 55, 841–856.
http://www.ee.ucla.edu/~spapl/code/MBSC_matlab.zip
>> [y,fs]=audioread('sent1.wav');
>> F0=mbsc(y,fs);
>> plot(F0,'.');
[Figure: MBSC F0 track for sent1.wav, F0 (Hz)]

Term Projects
• Introduction
– Statement of problem + background information (10%)
• Method
– Enough detail for replication purposes (25%)
• Results
– Figures + written summary (25%)
• Discussion
– Discuss expected / explain unexpected findings (25%)
• Appendix
– Include all Matlab code (15%)
Ellipse plots
• Download the ellipse function:
http://www.utdallas.edu/~assmann/hcs7367/ellipse.m
• Load the Hillenbrand data set:
[filenames,dur,F0s,F1s,F2s,F3s,F4s,F120,F220,F320,F150,F250, ...
    F350,F180,F280,F380] = textread('vowdata_no_header.txt', ...
    ['%s' repmat('%4.1f',1,15)]);

% Get rid of zero entries (before assigning codes, so the code vectors
% stay aligned with the formant vectors)
ind=union(find(F1s==0),find(F2s==0)); ind=union(ind,find(F3s==0));
filenames(ind,:)=[];
F1s(ind)=[];
F2s(ind)=[];
F3s(ind)=[];

% assign vowel and group codes
talker_group = char('m','w','b','g');
vowel = char('ae','ah','aw','eh','er','ei','ih','iy','oa','oo','uh','uw');
hvd = char('had','hod','hawed','head','herd','hayed','hid','heed','hoed','hood','hud','whod');
filenames=char(filenames); [nfiles,nchar]=size(filenames);
for ifile=1:nfiles,
    vowel_code(ifile) = strmatch(filenames(ifile,4:5),vowel);
    talker_group_code(ifile) = strmatch(filenames(ifile,1),talker_group);
end;

% select one talker group and one vowel
indg=find(talker_group_code==1); % find row numbers for males
indv=find(vowel_code==1); % find row numbers for /ae/
indgv=intersect(indg,indv); % intersection
meanF1=mean(log(F1s(indgv))); % log mean of F1 for this subset
meanF2=mean(log(F2s(indgv))); % log mean of F2 for this subset

% plot ellipse, data points and label
z=ellipse(log(F1s(indgv)),log(F2s(indgv)),1);
patch(z(:,1),z(:,2),[0.7 0.7 1],'EdgeColor','b');
hold on;
plot(log(F1s(indgv)),log(F2s(indgv)),'ob','MarkerSize',1,'MarkerFaceColor','b');
hm=text(meanF1,meanF2,hvd(1,:),'FontSize',9);
set(hm,'Color','b');

% axes on a log scale, labelled in Hz
axis(log([250 1300 500 4000]))
set(gca,'Box','On','XTick',log([300:100:1300]),...
    'XTickLabel',{'300','400','','600','','800','','1000','','1200',''});
set(gca,'YTick',log([500:500:4000]),...
    'YTickLabel',{'500','1000','1500','2000','2500','3000','3500','4000'});
xlabel('F1 frequency (Hz)');
ylabel('F2 frequency (Hz)');
title('Hillenbrand vowels - W. Michigan (adults)');
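The internals of ellipse.m are not shown on the slides. One common way to draw such an ellipse, sketched here as a hypothetical stand-in and not a copy of ellipse.m, is to scale the eigenvectors of the 2 x 2 covariance matrix of (log F1, log F2) by the square roots of the eigenvalues and map a unit circle through them; every resulting point lies k standard deviations (Mahalanobis distance k) from the mean. The data below are synthetic random values invented for the test, not Hillenbrand measurements.

```matlab
% Hypothetical covariance-ellipse sketch (not the downloaded ellipse.m).
k = 1;                                   % ellipse radius in standard deviations
npts = 50;
X = [log(700) + 0.10*randn(npts,1), ...  % synthetic log F1 values
     log(1800) + 0.08*randn(npts,1)];    % synthetic log F2 values

mu = mean(X, 1);                         % centroid (log F1, log F2)
C = cov(X);                              % 2 x 2 covariance matrix
[V, D] = eig(C);                         % principal axes (V) and variances (D)

theta = linspace(0, 2*pi, 100);
unitc = [cos(theta); sin(theta)];        % unit circle, 2 x 100
z = (k * V * sqrt(D) * unitc)' + repmat(mu, 100, 1);   % 100 x 2 ellipse points
```

With the real data, X would be [log(F1s(indgv)) log(F2s(indgv))], and z can be passed to patch(z(:,1),z(:,2),[0.7 0.7 1],'EdgeColor','b') exactly as in the script above.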
[Figure: "Hillenbrand vowels - W. Michigan (adults)": ellipse, data points and "had" label on log-scaled axes; F1 frequency (Hz), 300-1200, vs. F2 frequency (Hz), 500-4000]
• Homework assignment: put the code above into a script and add a for-loop to plot ellipses, data points and labels for all 12 vowels.
• Make separate ellipse plots for males, females, boys and girls.
• Bonus points: plot your vowels in the vowel space. Be sure to log-transform the formant frequencies, and use large symbols:
>> plot(log(myF1),log(myF2),'rp','MarkerSize',20, 'MarkerFaceColor','r');
Filters
• A filter is a device that alters the frequency content of a signal through mechanical, acoustical or electrical elements. Digital filters operate on discrete-time (sampled) signals.
• Filters are commonly used to allow a certain band of frequencies to pass through while others are suppressed or attenuated.
– Low-pass filter: passes frequencies below fc
– High-pass filter: passes frequencies above fc
– Bandpass filter: passes frequencies between fL and fH
– Bandstop filter: passes frequencies outside fL and fH
Filters and convolution
• The Matlab function conv.m implements the mathematical operation of convolution, defined as:
$$y(n) = h(n) * x(n) = \sum_{m=-\infty}^{\infty} h(n-m)\,x(m)$$
• A digital filter's output y(k) is related to its input x(k) by convolution with its impulse response, h(k).
Filters and convolution
• Use conv.m in digital filtering applications where both the filter length and the number of speech samples are finite.
• Convolution involves the following steps:
– (1) reverse the impulse response in time;
– (2) sum the cross-products of the overlapping filter and signal samples to generate one sample of the output;
– (3) shift the filter by one sample and repeat step 2. Continue until signal and filter no longer overlap.
$$y(n) = h(n) * x(n) = \sum_{m=-\infty}^{\infty} h(n-m)\,x(m)$$
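The three steps can be checked numerically. This sketch reverses h, slides it along a zero-padded copy of x one sample at a time, sums the cross-products at each shift, and compares the result with conv. The asymmetric impulse response h = [0.5 0.3 0.2] and the short input x are arbitrary examples; an asymmetric h is used so that the time reversal in step 1 actually matters (the 3-point moving average is symmetric).

```matlab
% Flip-and-slide convolution, checked against conv.
h = [0.5 0.3 0.2];                   % example impulse response (asymmetric)
x = [1 2 3 4 5];                     % example input sequence
nh = length(h);
hr = fliplr(h);                      % step 1: reverse the impulse response
xp = [zeros(1, nh-1), x, zeros(1, nh-1)];   % zero-pad so partial overlaps count
ny = length(x) + nh - 1;             % full output length
y = zeros(1, ny);
for n = 1:ny                         % step 3: shift the filter one sample at a time
    seg = xp(n : n+nh-1);            % signal samples under the reversed filter
    y(n) = sum(hr .* seg);           % step 2: sum of cross-products
end
err = max(abs(y - conv(h, x)))       % difference from conv (rounding level)
```

The output has length(x) + length(h) - 1 samples, which is also the length returned by conv.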
Filters and convolution
• Signal x(k)
– Generate a vector of 20 random numbers:
x=rand(20,1);
• Create the filter impulse response, h(k)
– Design a 3-point moving average filter:
h = [1 1 1] / 3;
• Convolve the signal with the filter:
y = conv(h,x);
• Plot the input and filtered sequences:
plot(x, 'b');
hold on;
plot(y, 'r');
• Add a legend:
legend('Input sequence', 'Filtered sequence');
Filters and convolution
• The 3-point moving average filter is an example of a finite impulse response (FIR) filter with a finite set of filter coefficients b(1), b(2), …, b(nb+1).
• It acts as a low-pass (smoothing) filter.
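Both properties can be checked directly. For an FIR filter, filter(h,1,x) returns the first length(x) samples of conv(h,x). The moving average's frequency response, computed here with a zero-padded FFT of h, is 1 at 0 Hz, falls to zero at one third of the sample rate and is 1/3 at half the sample rate, which is the low-pass (smoothing) behavior. The FFT length of 1200 is an arbitrary choice that puts a bin exactly at one third of the sample rate.

```matlab
% FIR filtering with filter.m vs. conv.m, and the moving average's gain.
h = [1 1 1] / 3;                 % 3-point moving average coefficients
x = rand(20, 1);                 % random test signal, as on the slide
y_conv = conv(h, x);             % full convolution: 20 + 3 - 1 = 22 samples
y_conv = y_conv(:);              % force a column for the comparison
y_filt = filter(h, 1, x);        % FIR filter (a = 1): 20 samples
err = max(abs(y_filt - y_conv(1:20)))   % identical up to rounding

nfft = 1200;
H = fft(h, nfft);                % frequency response at nfft points
gain_dc  = abs(H(1))             % 0 Hz
gain_fs3 = abs(H(nfft/3 + 1))    % one third of the sample rate
gain_nyq = abs(H(nfft/2 + 1))    % half the sample rate
```

The last two samples of conv(h,x) are the filter's "tail" after the input ends; filter.m drops them because it returns an output the same length as its input.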
Filtering and convolution
• Another way to describe the operation of filtering is in terms of the z-transform:
$$Y(z) = H(z)\,X(z) = \frac{b(1) + b(2)z^{-1} + \cdots + b(nb+1)z^{-nb}}{a(1) + a(2)z^{-1} + \cdots + a(na+1)z^{-na}}\,X(z)$$
• where X(z) is the z-transform of the speech signal,
• H(z) is the transfer function of the filter, and
• Y(z) is the z-transform of the filtered speech.
• The constants b(i) and a(i) are the filter coefficients; the order of the filter is the larger of nb and na.
• When na = 0 (a is a scalar) the filter is a Finite Impulse Response (FIR), all-zero, non-recursive, or moving average (MA) filter.
• When nb = 0 (b is a scalar) the filter is an Infinite Impulse Response (IIR), all-pole, recursive, or autoregressive (AR) filter.
• When both na > 0 and nb > 0, the filter is an Infinite Impulse Response (IIR), pole-zero, recursive, or autoregressive moving average (ARMA) filter.
Filtering and convolution
• The z-transform expressed as a difference equation (with a_1 = 1; filter.m normalizes the coefficients by a(1)):
$$y(n) = b_1 x(n) + b_2 x(n-1) + \cdots + b_{nb+1}\,x(n-nb) - a_2\,y(n-1) - \cdots - a_{na+1}\,y(n-na)$$
• The output samples of y are given by:
$$\begin{aligned} y(1) &= b_1 x(1) \\ y(2) &= b_1 x(2) + b_2 x(1) - a_2 y(1) \\ y(3) &= b_1 x(3) + b_2 x(2) + b_3 x(1) - a_2 y(2) - a_3 y(1) \end{aligned}$$
• Digital filters of this form can be implemented using the function filter.m
• Example: a simple low-pass filter
» b = 1;
» a = [1 -0.9];
» x=[1; zeros(1023,1)];
» y = filter(b,a,x);
…
Frequency response
[Figure: frequency response of the example low-pass filter; magnitude (dB) vs. frequency (kHz)]
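A direct, sample-by-sample implementation of the difference equation can be compared with filter.m. This is a sketch for illustration only (filter.m should be used in practice); it normalizes by a(1) as filter.m does and uses the example b = 1, a = [1 -0.9]. For that filter the impulse response is 0.9^n, so its first four samples are 1, 0.9, 0.81 and 0.729.

```matlab
% Difference equation y(n) = sum_k b(k) x(n-k+1) - sum_{k>=2} a(k) y(n-k+1)
b = 1;
a = [1 -0.9];
b = b / a(1);  a = a / a(1);      % normalize so that a(1) = 1, as filter.m does
x = [1; zeros(1023, 1)];          % unit impulse
N = length(x);
y = zeros(N, 1);
for n = 1:N
    acc = 0;
    for k = 1:length(b)           % feedforward (numerator) terms
        if n - k + 1 >= 1
            acc = acc + b(k) * x(n - k + 1);
        end
    end
    for k = 2:length(a)           % feedback (denominator) terms
        if n - k + 1 >= 1
            acc = acc - a(k) * y(n - k + 1);
        end
    end
    y(n) = acc;
end
y_ref = filter(b, a, x);
err = max(abs(y - y_ref))         % agreement with filter.m
first4 = y(1:4)'                  % 1, 0.9, 0.81, 0.729
```

Because each output sample feeds back into later ones, the impulse response never becomes exactly zero, which is why a filter with na > 0 is called infinite impulse response.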
Graphical techniques for filter design
• Fdesign
• Filterbuilder GUI
• FDATool
• SPTool
Filter design
• Example: design a 5th-order elliptic low-pass filter with a cutoff of 1000 Hz, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.

Elliptic filter
» help ellip
ELLIP Elliptic or Cauer digital and analog filter design.
[B,A] = ELLIP(N,Rp,Rs,Wn) designs an Nth order lowpass digital
elliptic filter with Rp decibels of ripple in the passband and a stopband Rs
decibels down. ELLIP returns the filter coefficients in length N+1 vectors
B (numerator) and A (denominator).
The cut-off frequency Wn must be 0.0 < Wn < 1.0, with 1.0 corresponding
to half the sample rate. Use Rp = 0.5 and Rs = 20 as starting points, if you
are unsure about choosing them.
If Wn is a two-element vector, Wn = [W1 W2], ELLIP returns an
order 2N bandpass filter with passband W1 < W < W2.
[B,A] = ELLIP(N,Rp,Rs,Wn,'high') designs a highpass filter.
[B,A] = ELLIP(N,Rp,Rs,Wn,'stop') is a bandstop filter if Wn = [W1 W2].

Elliptic filter
• Example: design a 5th-order elliptic low-pass filter with no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
» rate=10000;         % sample rate in Hz (assumed; use your signal's rate)
» N=5;                % filter order
» Rp=1;               % # of decibels of ripple in the passband
» Rs=60;              % # of decibels of attenuation in stopband
» Wn=1000 / (rate/2); % normalized filter cutoff frequency
» [b,a] = ellip(N,Rp,Rs,Wn);
» x=[1; zeros(1023,1)];
» y = filter(b,a,x);
Elliptic filter
[Figure: elliptic filter frequency response; magnitude (dB), 0 to -100, vs. frequency (kHz), 0-5]

Bandpass filters
• Example: design a 5th-order elliptic bandpass filter centered at 1000 Hz with a bandwidth of 400 Hz; no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
» rate=10000;         % sample rate in Hz (assumed; use your signal's rate)
» N=5;                % filter order (ellip returns an order-2N bandpass filter)
» Rp=1;               % # of decibels of ripple in the passband
» Rs=60;              % # of decibels of attenuation in stopband
» Wn=[800 1200];      % specify lower and upper filter cutoff frequencies
» Wn=Wn / (rate/2);   % normalized filter cutoff frequencies
» [b,a] = ellip(N,Rp,Rs,Wn);
» x=[1; zeros(1023,1)];
» y = filter(b,a,x);
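The examples compute each filter's impulse response with x=[1; zeros(1023,1)] and filter. A minimal way to turn such an impulse response into a magnitude plot in dB, using only base Matlab functions, is to take its FFT and keep the bins from 0 Hz to half the sample rate. So that the sketch runs without the Signal Processing Toolbox, it uses the first-order example b = 1, a = [1 -0.9] and an assumed sample rate of 10000 Hz; substitute the b, a and rate used with ellip. The toolbox function freqz provides the same information.

```matlab
% Magnitude response (dB) from a filter's impulse response via the FFT.
rate = 10000;                    % sample rate in Hz (assumed)
b = 1;  a = [1 -0.9];            % stand-in for the b, a returned by ellip
x = [1; zeros(1023, 1)];         % unit impulse, as on the slides
y = filter(b, a, x);             % impulse response
nfft = length(y);                % 1024-point FFT
Y = fft(y);
half = 1:nfft/2 + 1;             % bins from 0 Hz to rate/2
mag = 20*log10(abs(Y(half)) + eps);   % magnitude in dB
freq = (half - 1) * rate / nfft;      % bin frequencies in Hz
```

For this first-order filter the gain is 1/(1 - 0.9) = 10 (20 dB) at 0 Hz and 1/1.9 (about -5.6 dB) at half the sample rate. To draw the plot: plot(freq/1000, mag); xlabel('Frequency (kHz)'); ylabel('Magnitude (dB)');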
Bandpass filters
[Figure: elliptic bandpass filter frequency response; magnitude (dB), 0 to -100, vs. frequency (Hz), 0-5000]

Homework: LP, HP, and BP filters
1. Design a 5th-order elliptic low-pass filter with 1000 Hz cutoff, no more than 1 dB of ripple in the passband, and at least 60 dB attenuation in the stopband.
2. Design a high-pass filter with these same parameters.
3. Design a band-pass filter centered at 1000 Hz with a 500-Hz bandwidth.
4. Filter the vowel /ae/ from your dataset. Plot the waveform and spectrum and listen to the filtered vowel.

Gammatone spectrograms
• Download the Gammatone spectrogram code:
http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/
• If you use this in your work, include this reference:
D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource. http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/
Long-term average speech spectrum
>> help ltass
LTASS: Computes long-term average speech spectrum via FFT.
Usage: mag=ltass(w,rate,twin,thop,nfft);
w: input waveform
rate: sample rate in Hz (default 10000 Hz)
twin: frame length in ms (default: 10 ms)
thop: frame update in ms (default: 5 ms)
nfft: FFT window length (default: 256 samples)
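The ltass.m source is not shown. Based only on its help text, a hypothetical equivalent might split the waveform into frames of twin ms every thop ms, window each frame, take an nfft-point FFT and average the power spectra across frames. The Hamming window, power averaging and dB conversion are assumptions, not documented behavior of ltass.m. The test signal is a 1000-Hz tone with a little added noise, so the peak of the averaged spectrum should fall within one FFT bin (about 39 Hz at 10 kHz with nfft = 256) of 1000 Hz.

```matlab
% Hypothetical LTASS sketch (not the course's ltass.m).
rate = 10000;                      % sample rate in Hz (ltass default)
twin = 10; thop = 5; nfft = 256;   % frame length (ms), hop (ms), FFT length
t = (0:rate-1)' / rate;
w = sin(2*pi*1000*t) + 0.01*randn(size(t));   % 1-s test signal (stand-in for speech)

flen = round(twin * rate / 1000);  % frame length in samples (100)
hop  = round(thop * rate / 1000);  % frame hop in samples (50)
win  = 0.54 - 0.46*cos(2*pi*(0:flen-1)'/(flen-1));   % Hamming window
nframes = floor((length(w) - flen) / hop) + 1;
psum = zeros(nfft/2 + 1, 1);
for ifr = 1:nframes
    idx = (ifr-1)*hop + (1:flen)';       % sample indices of this frame
    S = fft(w(idx) .* win, nfft);        % windowed frame, zero-padded to nfft
    psum = psum + abs(S(1:nfft/2+1)).^2; % accumulate power from 0 to rate/2
end
mag = 10*log10(psum / nframes + eps);    % average power spectrum in dB
freq = linspace(0, rate/2, length(mag)); % same frequency axis as on the slide
[~, kmax] = max(mag);
peak_freq = freq(kmax)
```

The output has nfft/2 + 1 points, which matches the freq=linspace(0,rate/2,length(mag)) line used with ltass.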
Long-term average speech spectrum
• Compute and plot the LTASS of the vowel /ae/
– twin=10;
– thop=5;
– nfft=256;
– mag=ltass(y,rate,twin,thop,nfft);
– freq=linspace(0,rate/2,length(mag));
– plot(freq,mag);

Vowel identification experiment
• What we need:
– A Matlab script to implement a labelled, 12-response button box on the computer screen.
– A list of the stimuli to be included.
– A random number generator to scramble the order of the stimuli.
– A for-loop to cycle through the stimuli and play each file, record the listener's response and store the responses in a data file.
– A Matlab script to load the data file and score the responses, keeping track of correct and incorrect answers.
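For the random number generator and trial loop listed above, randperm returns a random permutation of the trial indices. This is a minimal sketch, not the vowel_id.m script: the file names are the ones visible in the scoring example, and playback and response collection are left as a comment.

```matlab
% Randomized presentation order for a list of stimuli (sketch).
stimlist = {'LDhoed.wav'; 'TShod.wav'; 'HChod.wav'; ...
            'TSheed.wav'; 'AMhood.wav'; 'CCherd.wav'};
ntrials = numel(stimlist);
order = randperm(ntrials);           % random permutation of 1..ntrials
presented = stimlist(order);         % stimuli in presentation order
for trial = 1:ntrials
    fname = presented{trial};
    fprintf('Trial %d: %s\n', trial, fname);
    % play fname and record the listener's button response here (not shown)
end
```

Saving order (or presented) with the responses makes it possible to match each response to its stimulus when scoring.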
Download the playback script
http://www.utdallas.edu/~assmann/hcs7367/vowel_id.m
• Put the waveform files in a single directory
• Change directory name to match in vowel_id.m
• Change SNDREC program directory if necessary
• Run the program and report any errors to me
Scoring the results
• Results are stored in the Matlab data file specified.
• Matlab data files have a .MAT extension.
• Start Matlab and type:
» load mydata.mat
• Next we'll develop a scoring program. In the meantime you can score your results by hand:
» [stimlist resplist]
LDhoed.wav hoed
TShod.wav  hod
HChod.wav  hod
TSheed.wav heed
AMhood.wav hood
CCherd.wav hood
...
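Until the scoring program is available, the hand-scoring step can be sketched in a few lines. The sketch assumes that stimlist and resplist are character matrices like those displayed above, and that each file name is a two-letter talker code followed by the hVd word and '.wav' (true for the six example rows, which serve as test data). A response is scored correct when it matches that word.

```matlab
% Scoring sketch: compare the hVd word in each file name with the response.
stimlist = char('LDhoed.wav','TShod.wav','HChod.wav','TSheed.wav','AMhood.wav','CCherd.wav');
resplist = char('hoed','hod','hod','heed','hood','hood');
n = size(stimlist, 1);
correct = false(n, 1);
for i = 1:n
    fname = strtrim(stimlist(i, :));     % remove char-matrix padding
    target = fname(3:end-4);             % drop 2-letter talker code and '.wav'
    correct(i) = strcmp(target, strtrim(resplist(i, :)));
end
pct_correct = 100 * mean(correct)        % percent correct
```

For the six rows shown, five responses are correct (CCherd.wav was answered "hood"), giving 83.3% correct.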
Download