ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function Today Perturbation Theory: A different way to estimate vocal tract resonant frequencies, useful for consonant transitions Syllable-Final Consonants: Formant Transitions Vocal Tract Transfer Function Uniform Tube (Quarter-Wave Resonator) During Vowels: All-Pole Spectrum Q Bandwidth Nasal Vowels: Sum of two transfer functions gives spectral zeros Topic #1: Perturbation Theory Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940) A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation. Conservation of Energy Under Perturbation Conservation of Energy Under Perturbation “Sensitivity” Functions Sensitivity Functions for the QuarterWave Resonator (Lips Open) 0 x L • Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it. /AA/ /ER/ /IY/ /W/ Sensitivity Functions for the HalfWave Resonator (Lips Rounded) 0 x L • Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it. /L,OW/ /UW/ Formant Frequencies of Vowels From Peterson & Barney, 1952 Topic #2: Formant Transitions, Syllable-Final Consonant Events in the Closure of a Nasal Consonant Formant Transitions Vowel Nasalization Nasal Murmur Formant Transitions: A Perturbation Theory Model “the mom” Formant Transitions: Labial Consonants “the bug” Formant Transitions: Alveolar Consonants “the supper” “the tug” Formant Transitions: Post-alveolar Consonants “the shoe” “the zsazsa” “the gut” Formant Transitions: Velar Consonants “sing a song” Topic #3: Vocal Tract Transfer Functions Transfer Function “Transfer Function” T(w)=Output(w)/Input(w) In speech, it’s convenient to write T(w)=UL(w)/UG(w) UL(w) = volume velocity at the lips UG(w) = volume velocity at the glottis T(0) = 1 Speech recorded at a microphone = pressure PR(w) = R(w)T(w)UG(w) R(w) = jrf/r = “radiation characteristic” r = density of air r = distance to the microphone f = frequency in Hertz Transfer Function of an Ideal Uniform Tube Ideal Terminations: Reflection coefficient at glottis: zero velocity, g=1 Reflection coefficient at lips: zero pressure, g=-1 Obviously, this is an approximation, but it gives… T(w) = 1/cos(wL/c) w12w22w32… = …(w+w3)(w+w2)(w+w1)(w-w1)(w-w2)(w-w3)… wn = npc/L – pc/2L Fn = nc/2L – c/4L Transfer Function of an Ideal Uniform Tube Peaks are actually infinite in height (figure is clipped to fit the display) Transfer Function of a Non-Ideal Uniform Tube Almost ideal terminations: At glottis: velocity almost zero, g≈1 At lips: pressure almost zero, g≈-1 T(w) = 1/(j/Q +cos(wL/c)) … at Fn=nc/2L – c/4L,… T(2pFn) = -jQ 20log10|T(2pFn)| = 20log10Q Transfer Function of a Non-Ideal Uniform Tube Transfer Function of a Vowel: Height of First Peak is Q1=F1/B1 T(w) = ∞ (2pFn)2+(pBn)2 P (jw+j2pFn+pBn)(jw-j2pFn+pBn) n=1 T(2pF1) ≈ (2pF1)2/(j4pF1pB1) = -jF1/B1 Call Qn = Fn/Bn T(2pF1) ≈ -jQ1 20log10|T(2pF1)| ≈ 20log10Q1 Transfer Function of a Vowel: Bandwidth of a Peak is Bn ∞ T(w) = P n=1 (2pFn)2+(pBn)2 (jw+j2pFn+pBn)(jw-j2pFn+pBn) T(2pF1+pB1) ≈ (2pF1)2/((j4pF1)(pB1+pB1)) = -jQ1/2 At f=F1+0.5Bn, |T(w)|=0.5Qn 20log10|T(w)| = 20log10Q1 – 3dB Amplitudes of Higher Formants: Include the Rolloff (2pFn)2+(pBn)2 ∞ T(w) = P (jw+j2pF +pB )(jw-j2pF +pB ) n=1 n n n n At f above F1 T(2pf) ≈ (F1/f) T(2pF2) ≈ (-jF2/B2)(F1/F2) 20log10|T(2pF2)| ≈ 20log10Q2 – 20log10(F2/F1) 1/f Rolloff: 6 dB per octave (per doubling of frequency) Vowel Transfer Function: Synthetic Example L1 = 20log10(500/80)=16dB L2 = 20log10(1500/240) – 20log10(F2/F1) = 16dB – 9.5dB B2 = 240Hz B1 = 80Hz L3 = 20log10(2500/600) – 20log10(F3/F1) – 20log10(F3/F2) B3 = 600Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau) F4 peak completely swamped by rolloff from lower formants Shorthand Notation for the Spectrum of a Vowel T(s) = P n=1 ∞ snsn* (s-sn)(s-sn*) s = jw sn = -pBn+j2pFn sn* = -pBn-j2pFn snsn* = |sn|2 = (2pFn)2+(pBn)2 T(0) = 1 20log10|T(0)| = 0dB Another Shorthand Notation for the Spectrum of a Vowel T(s) = ∞ 1 P (1-s/sn)(1-s/sn*) n=1 Topic #4: Nasalized Vowels Vowel Nasalization Nasalized Vowel Nasal Consonant Nasalized Vowel PR(w) = R(w)(UL(w)+UN(w)) UN(w) = Volume Velocity from Nostrils PR(w) = R(w)(TL(w)+TN(w))UG(w) = R(w)T(w)UG(w) T(w) = TL(w) + TN(w) Nasalized Vowel T(s) = TL(s)+TN(s) 1 = (1-s/sLn)(1-s/sLn*) + 1 (1-s/sNn)(1-s/sNn*) 2(1-s/sZn)(1-s/sZn*) = (1-s/sLn)(1-s/sLn*)(1-s/sNn)(1-s/sNn*) 1/sZn = ½(1/sLn+1/sNn) sZn = nth spectral zero T(s) = 0 if s=sZn The “Pole-Zero Pair” 20log10T(w) = 20log10(1/(1-s/sLn)(1-s/sLn*)) + 20log10((1-s/sZn)(1-s/sZn*)/(1-s/sNn)(1-s/sNn*)) = original vowel log spectrum + log spectrum of a pole-zero pair Additive Terms in the Log Spectrum Transfer Function of a Nasalized Vowel Pole-Zero Pairs in the Spectrogram Nasal Pole Zero Oral Pole Summary Perturbation Theory: Formant Transitions Labial closure: loci near 250, 1000, 2000 Hz Alveolar closure: loci near 250, 1700, 3000 Hz Velar closure: F2 and F3 come together (“velar pinch”) Vocal Tract Transfer Function Squeeze near a velocity peak: formant goes down Squeeze near a pressure peak: formant goes up T(s) = P snsn*/(s-sn)(s-sn*) T(w=2pFn) = Qn = Fn/Bn 3dB bandwidth = Bn Hertz T(0) = 1 Nasal Vowels: Sum of two transfer functions gives a spectral zero between the oral and nasal poles Pole-zero pair is a local perturbation of the spectrum