ACOUSTICAL THEORY OF SPEECH PRODUCTION
Robert A. Prosek, Ph.D.
CSD 301

Acoustical Theory
• There is nothing more practical than a good theory
• The linear source-filter theory is one of the best in our field
  • Based on Gunnar Fant’s “Acoustic Theory of Speech Production”
• The theory expresses articulatory-acoustic relationships

Acoustical Theory
• The source is vocal fold vibration
  • for some consonants, the source is more complex
  • can be in the vocal tract or a combination of both
• The filter is the vocal tract
  • extending from the vocal folds to the lips or nares
  • like all filters, the vocal tract is frequency dependent

Acoustical Theory
• The source and the filter are assumed to be independent
  • this is an assumption made for convenience
  • it implies that you can change the output of the vocal folds without changing the vocal tract
  • and vice versa

Vowels
• Modeled as a tube closed at one end and open at the other
  • the closure is a membrane with a slit in it
  • the tube has uniform cross-sectional area
  • the membrane represents the source of energy (vocal folds)
  • the energy travels through the tube
  • the tube generates no energy on its own
  • the tube represents an important class of resonators
  • odd-quarter-wavelength relationship

Vowels (2)
• There is an infinite number of resonances for this tube
  • we need only consider the first three or four
  • the model is valid only up to about 5 kHz
• The model was developed by Chiba and Kajiyama in 1941
  • based on pipe organs, about which a great deal was known

Vowels (3)
• If c = 35000 cm/s, and
• l = 17.5 cm
• What are the first three resonances?
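The question above can be worked through with the odd-quarter-wavelength relationship for a uniform tube closed at one end and open at the other, F_n = (2n - 1)c / (4l). The sketch below is an illustration rather than part of the original slides: the function name `quarter_wave_resonances` is hypothetical, and the tube is assumed to be uniform and lossless.

```python
def quarter_wave_resonances(c_cm_per_s, length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses F_n = (2n - 1) * c / (4 * l). The function name is illustrative only.
    """
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm) for n in range(1, count + 1)]


# Values from the slide: c = 35000 cm/s, l = 17.5 cm
resonances = quarter_wave_resonances(35000, 17.5)
for n, freq in enumerate(resonances, start=1):
    print(f"F{n} = {freq:.0f} Hz")
```

Under these assumptions the first three resonances come out to 500, 1500, and 2500 Hz, spaced 1000 Hz apart.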
• The simple tube closed at one end and open at the other, with the above length, is a reasonable approximation of /ʌ/ produced by a male talker
• Some points to note:

Vowels (4)
• A curved tube (vocal tract) and a straight tube (model) behave identically acoustically out to 5 kHz
  • this is because the curve affects only acoustic signals with short wavelengths
• The resonances are equally spaced if the tube has uniform cross-sectional area
• Remember: all of the energy comes from the source (vocal fold vibration for vowels)
• Changing the length of the tube changes the resonance frequencies
  • Influenced by age and sex
  • l = 14.5 cm for females
  • l = 8.75 cm for children

Vowels (5)
• A one-vowel model isn’t very useful
• Different vowels are modeled, acoustically, by different vocal tract shapes
• Phonetically, how are vowels distinguished?
• If we place a constriction in the tube (vocal tract)
  • the resonances change
  • if you change the articulation, you change the vocal tract shape, and with it the resonance frequencies, amplitudes and bandwidths

Vowels (6)
• The output energy of a vowel is the product of
  • the source energy
  • the size and shape of the resonator
  • the radiation characteristic
• Glottal source characteristics for vowels
  • vocal fold vibration is periodic
    • what does this imply for the spectrum?
  • f0 or F0 is used to indicate the vocal fundamental frequency
  • the amplitude of the harmonics decreases by 12 dB/octave

Vowels (7)
• Filter characteristics for vowels
  • the vocal tract is a dynamic filter
  • it is frequency dependent
  • it has, theoretically, an infinite number of resonances
  • each resonance has a center frequency, an amplitude and a bandwidth
  • for speech, these resonances are called formants
  • formants are numbered in succession from the lowest
    • F1, F2, F3, etc. (center frequencies)
    • A1, A2, A3, etc. (amplitudes)
    • B1, B2, B3, etc. (bandwidths)
  • the formants together form the transfer function
    • input-output relationship
  • formants become physically evident only when energized

Vowels (8)
• Radiation characteristic
  • acoustic effect when a sound leaves a small area and enters a large one
  • the effect is to raise the slope of the spectrum by +6 dB/octave
• Acoustic Phonetic Relationships for Vowels
  • F1 is inversely related to tongue height
  • F2 is directly related to tongue advancement
  • Lip rounding lowers all formant frequencies

Vowels (9)
• Perturbation Theory
  • Volume velocity variations reflect the way air particles vibrate at a particular point in the vocal tract
  • At some points, vibration is minimal (nodes); at others, maximal (antinodes)
  • For F1, the antinode is at the open end and the node is at the closed end
  • For F2, there are two antinodes and two nodes
  • For F3, there are three antinodes and three nodes
  • etc.

Vowels (10)
• Perturbation Theory (continued)
  • if a constriction (a local change in cross-sectional area, a perturbation) is applied
    • the acoustic effect depends on proximity to a node or an antinode
    • near an antinode the formant frequency lowers
    • near a node the formant frequency rises
  • lip constrictions lower all formant frequencies
  • laryngeal constrictions raise all formant frequencies

Vowels (11)
• Amplitude relationships
  • amplitudes depend on formant frequencies
  • if F1 is lowered (raised), A1 lowers (rises)
  • if two formant frequencies move closer together, then both peaks increase in amplitude
  • how do you raise or lower formant frequencies?
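The glottal source slope and the radiation characteristic from the vowel slides can be combined into a single net spectral tilt. The sketch below assumes idealized straight-line slopes in dB per octave and leaves the vocal tract filter out entirely, so it shows only the source-plus-radiation tilt on which the formant peaks are superimposed. The function names and the 100 Hz fundamental are assumptions made for illustration.

```python
import math

SOURCE_SLOPE_DB_PER_OCTAVE = -12.0    # glottal source spectrum (from the slides)
RADIATION_SLOPE_DB_PER_OCTAVE = 6.0   # radiation characteristic (from the slides)


def level_re_fundamental_db(harmonic_number, slope_db_per_octave):
    """Level of a harmonic relative to the fundamental for a straight-line slope."""
    octaves_above_f0 = math.log2(harmonic_number)
    # Adding 0.0 turns a floating-point -0.0 at the fundamental into 0.0
    return slope_db_per_octave * octaves_above_f0 + 0.0


def radiated_source_spectrum(f0_hz, n_harmonics):
    """(frequency, level) pairs for the source after radiation, with no formants applied."""
    net_slope = SOURCE_SLOPE_DB_PER_OCTAVE + RADIATION_SLOPE_DB_PER_OCTAVE
    return [
        (k * f0_hz, level_re_fundamental_db(k, net_slope))
        for k in range(1, n_harmonics + 1)
    ]


# Hypothetical f0 of 100 Hz, chosen only for illustration
for freq, level in radiated_source_spectrum(100.0, 8):
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```

With these idealized slopes the net tilt is -6 dB/octave: the harmonic at 200 Hz sits 6 dB below the fundamental, and the harmonic at 400 Hz sits 12 dB below it.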
Vowels (12)
• Source-Filter Interactions
  • Some vocal tract shapes may affect vocal fold vibration
  • Singers’ formant
  • High impedance constrictions require greater subglottal air pressure
  • Vocal tract - vocal fold coupling during open phase of vibratory cycle

Consonants (1)
• The linear source-filter theory can be used to describe the acoustics of consonants as well as vowels
• For consonants, however, the source is not always at the level of the vocal folds
  • some sources are in the vocal tract
  • these sources are aperiodic
  • durations and amplitudes also are different from vowels
• Nonetheless, source-filter theory gives us a series of expectations for the acoustic characteristics of consonants

Consonants (2)
• Fricatives
  • Modeled as a tube with a very severe constriction
  • The air exiting the constriction is turbulent
  • The Reynolds number gives the conditions for turbulence
    • Re = vh/ν
    • Notice that turbulence can be generated in two ways
  • Zeros or antiformants can be found in the spectrum
  • Because of the turbulence, there is no periodicity unless accompanied by voicing
  • What does an aperiodic spectrum look like?

Consonants (3)
• When a fricative constriction is tapered
  • the back cavity is involved
  • this resembles a tube closed at both ends
  • Fn = nc/(2l)
  • such a situation occurs primarily for articulation disorders

Consonants (4)
• Nasal consonants
  • Velopharyngeal port is open and the oral cavity is completely blocked at some point
  • The side-branch resonator produces antiformants (zeros)
  • The overall vocal tract is longer than for vowels
  • What effect does this have on the spectrum?
  • Oral formants, nasal formants, nasal antiformants
  • Nasal murmur

Consonants (5)
• Stops
  • The tube model is not altered very much for stops
  • However, the time domain becomes critical
  • There is a complete closure of the vocal tract somewhere
  • Pressure builds up behind the closure
  • Rapid release
  • The articulation results in a burst and transitions

Consonants (6)
• Other consonants are variations of these
  • Affricates
  • Liquids
  • Glides
  • Diphthongs
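The two formulas on the fricative slides, the Reynolds number and the resonances of a tube closed at both ends, can be evaluated with a short sketch. Several things here are assumptions, not statements from the slides: v and h are read as flow velocity and constriction height, the kinematic viscosity of air is taken as roughly 0.15 cm²/s, and the numeric inputs (velocity, constriction height, back-cavity length) are hypothetical values chosen only for illustration. The speed of sound, c = 35000 cm/s, is reused from the vowel slides.

```python
AIR_KINEMATIC_VISCOSITY_CM2_PER_S = 0.15  # approximate value for air; an assumption


def reynolds_number(velocity_cm_per_s, height_cm,
                    viscosity_cm2_per_s=AIR_KINEMATIC_VISCOSITY_CM2_PER_S):
    """Re = v * h / nu, reading v as flow velocity and h as constriction height."""
    return velocity_cm_per_s * height_cm / viscosity_cm2_per_s


def half_wave_resonances(c_cm_per_s, length_cm, count=3):
    """Resonances (Hz) of a tube closed at both ends: F_n = n * c / (2 * l)."""
    return [n * c_cm_per_s / (2 * length_cm) for n in range(1, count + 1)]


# Hypothetical inputs: 3000 cm/s flow through a 0.1 cm constriction
print(f"Re = {reynolds_number(3000.0, 0.1):.0f}")

# Hypothetical 10 cm back cavity, with c = 35000 cm/s as on the vowel slides
for n, freq in enumerate(half_wave_resonances(35000, 10.0), start=1):
    print(f"F{n} = {freq:.0f} Hz")
```

Because Re is proportional to both v and h, raising either one raises Re. For the tube closed at both ends, the resonances fall at whole-number multiples of c/(2l) rather than at the odd multiples of c/(4l) seen for the vowel tube.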