Source/Filter theory for vowels

The source/filter theory of speech acoustics
Sound sources:
- vocal fold vibration (complex periodic sound)
- frication (turbulent noise)
- stop release burst (impulse source)
Filter: vocal tract resonances
vowel acoustics
voice source – complex periodic wave
vocal tract filter – resonant tube
[ ə ] Fn = (2n-1)c/4L
[ ɑ ] ??
[ i ] ??
A sequence of glottal pulses: one pulse about every 6.7 ms (1/150 s),
so the glottal frequency is 150 Hz (150 pulses per second)
The “glottal frequency” is also called the fundamental frequency
of voicing and is abbreviated F0.
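The relation between pulse spacing and F0 can be checked with a short sketch. The function name `f0_from_period_ms` is made up for illustration; it only assumes that frequency is the reciprocal of the period.

```python
def f0_from_period_ms(period_ms):
    """Fundamental frequency in Hz for a given glottal period in milliseconds."""
    return 1000.0 / period_ms

# A period of 1/150 s (about 6.67 ms) corresponds to 150 Hz.
exact_period_ms = 1000.0 / 150
print(round(f0_from_period_ms(exact_period_ms), 1))

# The rounded slide value of 6.6 ms gives a slightly higher frequency.
print(round(f0_from_period_ms(6.6), 1))
```

The small mismatch for 6.6 ms comes only from rounding the period.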
The spectrum of the sequence of glottal pulses.
It has a series of peaks – these are called “harmonics”
Higher frequency harmonics generally have lower
amplitude than lower frequency harmonics.
The frequency of the lowest harmonic (the first harmonic)
is equivalent to the fundamental frequency of the voice.
1st harmonic = F0 = 150 Hz (in this example)
The higher harmonics are integer multiples of the
fundamental – thus the 10th harmonic is at 1500 Hz.
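As a quick check of the integer-multiple relation, here is a minimal sketch that lists the first ten harmonics for the 150 Hz example. The helper name `harmonic_frequencies` is hypothetical.

```python
def harmonic_frequencies(f0_hz, count):
    """Return the first `count` harmonics: n * F0 for n = 1..count."""
    return [n * f0_hz for n in range(1, count + 1)]

harmonics = harmonic_frequencies(150, 10)
print(harmonics[0])   # 1st harmonic, equal to F0
print(harmonics[-1])  # 10th harmonic
```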
This is the spectrum of the voice SOURCE. It has
yet to be filtered by a vocal tract!
Review schwa acoustics.
- Vocal tract is a uniform tube (same diameter for the
length of the tube)
- This tube has certain resonant frequencies
- We can calculate the resonances given the length
of the vocal tract (assume 17.5 cm for now)
and the speed of sound (assume 35,000 cm/s)
- Fn = (2n-1)c/4L
F1 = 500 Hz
F2 = 1500 Hz
F3 = 2500 Hz
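The schwa calculation can be reproduced with a few lines of Python. This is only a sketch of the formula Fn = (2n-1)c/4L using the tube length and speed of sound assumed here; the function name is made up for illustration.

```python
def quarter_wave_resonances(length_cm, count=3, c_cm_per_s=35000.0):
    """Resonances of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]

schwa_formants = quarter_wave_resonances(17.5)
print(schwa_formants)
```

Changing `length_cm` shows how a shorter vocal tract raises every resonance in proportion.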
The filtered voice source - [ ə ]
Same set of harmonics as in the unfiltered source, but
now the harmonics near 500, 1500 and 2500 Hz are louder.
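With F0 = 150 Hz, only one of the three resonances falls exactly on a harmonic, so the boosted harmonics are the ones closest to each resonance. The sketch below just finds those nearest harmonics. It does not model the filter's shape or the amount of boost.

```python
F0_HZ = 150
RESONANCES_HZ = [500, 1500, 2500]  # schwa resonances from the uniform tube

# Harmonics of the source up to 3000 Hz.
harmonics = [n * F0_HZ for n in range(1, 21)]

# For each resonance, pick the harmonic with the smallest frequency distance.
nearest = {r: min(harmonics, key=lambda h: abs(h - r)) for r in RESONANCES_HZ}

for resonance, harmonic in nearest.items():
    print(resonance, harmonic, harmonic // F0_HZ)
```

The third printed column is the harmonic number.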
Not all vowels sound like schwa.
Today we’ll look at the “tube models” approach to
modeling vowel differences
Next time we’ll look at the “perturbation theory”
A tube model of the vowel [ɑ] (low back unrounded)
narrow back tube
wide front tube
back tube: Fn = (2n-1)c/(4 lb), where lb is the back tube length
front tube: Fn = (2n-1)c/(4 lf), where lf is the front tube length
A nomogram of the two tube model: resonant frequencies
for different back tube lengths.
If the back tube is 3 cm long, the front tube is 13 cm long.
F1 = c/(4*13) = 673 Hz
F2 = 3c /(4*13) = 2019 Hz
F3 = c/ (4*3) = 2917 Hz
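The three values above come from pooling the quarter-wave resonances of the two tubes and sorting them by frequency. A minimal sketch, treating the tubes as acoustically independent (no coupling between them) and using made-up function names:

```python
C_CM_PER_S = 35000.0

def quarter_wave_resonances(length_cm, count=3):
    """Fn = (2n-1)c/4l for a tube closed at one end, open at the other."""
    return [(2 * n - 1) * C_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]

def two_tube_formants(back_cm, front_cm, count=3):
    """Lowest `count` resonances from the back and front tubes combined."""
    pooled = quarter_wave_resonances(back_cm) + quarter_wave_resonances(front_cm)
    return sorted(pooled)[:count]

formants = two_tube_formants(back_cm=3.0, front_cm=13.0)
print([round(f) for f in formants])
```

Here F1 and F2 both belong to the long front tube, and F3 is the first resonance of the short back tube.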
In the vowel [ ɑ ] the front and back tubes are about equal in length (about 8 cm each),
so the lowest resonance of each tube is about the same: F1 ≈ F2 = c/(4*8) = 1094 Hz
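The nomogram can be approximated numerically by sweeping the back tube length while holding the total length fixed. The sketch below assumes a 16 cm total (the 3 cm + 13 cm of the earlier example) and ignores coupling between the tubes, so it is an idealization rather than a reproduction of the plotted curves.

```python
C_CM_PER_S = 35000.0
TOTAL_CM = 16.0  # assumption: back + front length, as in the 3 cm / 13 cm case

def quarter_wave_resonances(length_cm, count=3):
    """Fn = (2n-1)c/4l for a tube closed at one end, open at the other."""
    return [(2 * n - 1) * C_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]

def nomogram_row(back_cm):
    """Lowest three resonances of the two-tube model for one back length."""
    front_cm = TOTAL_CM - back_cm
    pooled = quarter_wave_resonances(back_cm) + quarter_wave_resonances(front_cm)
    return sorted(pooled)[:3]

rows = {back_cm: nomogram_row(back_cm) for back_cm in range(2, 16, 2)}
for back_cm, freqs in rows.items():
    print(back_cm, [round(f) for f in freqs])
```

At a back length of 8 cm the two lowest resonances coincide, which is the F1/F2 convergence described for [ɑ].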
The prediction of the tube model is
basically correct: we find that F1 and F2
in [ ɑ ] are very close to each other.
A tube model for the vowel [ i ]
back tube
front tube
back tube (closed at both ends): Fn = nc/(2 lb)
front tube (closed at one end, open at the lips): Fn = (2n-1)c/(4 lf)
A Helmholtz resonator
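f = (c/2π) √(Ac / (Ab lb lc))
where Ac and lc are the area and length of the constriction,
and Ab and lb are the area and length of the back cavity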
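To get a feel for the Helmholtz formula, the sketch below evaluates it for one set of dimensions. The areas and the constriction length are hypothetical values chosen only for illustration; they are not given on the slides. Only the 9 cm back cavity length and c = 35,000 cm/s come from the surrounding material.

```python
import math

def helmholtz_frequency(constriction_area_cm2, constriction_length_cm,
                        back_area_cm2, back_length_cm, c_cm_per_s=35000.0):
    """f = (c / 2*pi) * sqrt(Ac / (Ab * lb * lc))."""
    denominator = back_area_cm2 * back_length_cm * constriction_length_cm
    return (c_cm_per_s / (2 * math.pi)) * math.sqrt(
        constriction_area_cm2 / denominator)

# Hypothetical dimensions: Ac = 0.4 cm^2, lc = 2 cm, Ab = 8 cm^2, lb = 9 cm.
f1 = helmholtz_frequency(0.4, 2.0, 8.0, 9.0)
print(round(f1))
```

With these illustrative numbers the result lands a little under 300 Hz. A narrower or longer constriction lowers it further.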
[i] with constriction 11 cm from the glottis
F1 = 300 Hz (Helmholtz)
F2 = c/(2*9) ≈ 1900 Hz (lb = 9 cm)
F3 = c/(4*4) ≈ 2200 Hz (lf = 4 cm)
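The F2 and F3 figures for [i] are rounded. A short sketch with the same tube lengths (9 cm back, 4 cm front) and c = 35,000 cm/s gives the unrounded values. The function names are made up for illustration.

```python
C_CM_PER_S = 35000.0

def half_wave_resonance(length_cm, n=1):
    """Fn = nc/2l, for a tube closed at both ends (the back cavity)."""
    return n * C_CM_PER_S / (2 * length_cm)

def quarter_wave_resonance(length_cm, n=1):
    """Fn = (2n-1)c/4l, for a tube closed at one end (the front cavity)."""
    return (2 * n - 1) * C_CM_PER_S / (4 * length_cm)

f2 = half_wave_resonance(9.0)     # lowest back-cavity resonance
f3 = quarter_wave_resonance(4.0)  # lowest front-cavity resonance
print(round(f2), round(f3))
```

F1 is not covered here because it comes from the Helmholtz resonance, not from either tube's standing-wave formula.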
This tube model prediction is approximately
correct for [ i ].
F3 = 2500 Hz
F2 = 1900 Hz
F1 = 400 Hz