The Acoustics and Perception of American English Vowels
Hillenbrand: Vowels 1

(Figure: Formant patterns for the “noncentral” monophthongal vowels of American English, i.e., omitting /ʌ/ and /ɚ/, based on Peterson & Barney averages.)

Let’s Look at Another Way to Visualize Formant Data for Vowels: The “Standard” F1-F2 Plot
(Figure: Formant data for men, “standard” F1-F2 plot.)

Notice that, for a given vowel, the formant values for women are shifted up and to the right relative to those for men, indicating higher values for both F1 and F2. This is due to the shorter vocal tracts of women vs. men. The same relationship holds between the formant values of children and those of women, and for the same reason: children have shorter vocal tracts than women.

One More (Apparently Screwy) Way to Visualize Vowel Formant Data: The Acoustic Vowel Diagram
Note that in the acoustic vowel diagram (1) the axes are swapped, with F2 on the x axis and F1 on the y axis, and (2) the numbers go backwards, increasing from right to left and from top to bottom. Why would anyone do such a screwy thing?

(Figure: Conventional F1-F2 plot vs. acoustic vowel diagram.)

Peterson & Barney Averages (for Men Only) Plotted on an Acoustic Vowel Diagram
Formant data are being plotted, but the result strongly resembles an articulatory vowel diagram, with the x axis corresponding to tongue advancement (i.e., front vs. back) and the y axis corresponding to tongue height. This gives us a convenient way to interpret formant data in articulatory terms.

(Figure: Hypothetical vowel formant data for a deaf talker compared with a normal-hearing talker.)
1. What is the articulatory explanation for the differences in formant frequencies?
2. What effect might this have on the intelligibility of the vowels spoken by the deaf talker?
Data shown above are hypothetical, but this is exactly the sort of thing that has been observed in the speech of deaf talkers.
For example, Monsen (1978) showed that (a) the formant values of deaf talkers tend to be centralized relative to those of normal-hearing (NH) talkers, and (b) the degree of centralization is a good predictor of speech intelligibility.

Peterson & Barney (1952)
Study conducted at Bell Labs. The first large-scale acoustic study carried out with the (then) recently invented sound spectrograph.
1. Recordings: 10 vowels (/i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɚ/) in /hVd/ context (“heed,” “hid,” “head,” “had,” etc.); 76 talkers (33 men, 28 women, 15 children)
2. Measurements: f0, F1-F3
3. Listening study: 70 listeners identified each test signal as one of ten words (“heed,” “hid,” “head,” “had,” etc.)

Listening Test Results
Simple: the signals were highly intelligible. Overall identification accuracy = 94.5%.
Error rates varied somewhat across vowels. For example:
Error rate very low for: /i/ (0.1%), /ɚ/ (0.4%), /u/ (0.8%)
Higher for: /ɑ/ (13.0%, confused with /ɔ/), /ɛ/ (12.1%, confused with /æ/)
Details aside, the simple message is that vowel identity was transmitted quite accurately to the listeners. What information do listeners use to recognize vowels? To answer this, we need to look at the acoustic data.

Peterson & Barney (1952) General American English Vowel Formant Data
1. Lots of overlap among adjacent vowels
2. Note the positions of /æ/ and /ɛ/: /æ/ is a lower (and less fronted) vowel than /ɛ/

For the most part, the men occupy the lower left portion of each ellipse, the children occupy the upper right portion, and the women cluster toward the center. This is mainly due to differences in vocal-tract length.
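The vocal-tract-length explanation can be made concrete with the textbook uniform-tube approximation: a tube closed at one end (the glottis) and open at the other (the lips) resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L. This is a minimal sketch, not taken from the lecture; it assumes a neutral (schwa-like) vowel shape, a speed of sound of about 35,000 cm/s, and illustrative tract lengths of 17.5 cm and 14.5 cm. The function name `tube_formants` is hypothetical.

```python
# Uniform-tube (quarter-wavelength) approximation of formant frequencies.
# Assumptions (illustrative, not from the lecture): a neutral vowel shape,
# speed of sound c = 35,000 cm/s, and example vocal-tract lengths.

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, moist air


def tube_formants(length_cm: float, n_formants: int = 3) -> list[float]:
    """Return F1..Fn (Hz) for a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    longer = tube_formants(17.5)   # hypothetical adult male tract length
    shorter = tube_formants(14.5)  # hypothetical adult female tract length
    for n, (f_long, f_short) in enumerate(zip(longer, shorter), start=1):
        # Every formant scales by the same factor, 17.5 / 14.5
        print(f"F{n}: {f_long:6.0f} Hz -> {f_short:6.0f} Hz (ratio {f_short / f_long:.3f})")
```

Shortening the tube raises every formant by the same proportion, which is the simple reason women’s and children’s vowels shift up and to the right on an F1-F2 plot. Real vocal tracts are not uniform tubes, so the scaling in real speech is only approximate.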
There is quite a bit of variability across individual talkers, though. (Data from Peterson & Barney, 1952.)

(Figure: Same data as the previous figure, but plotted on a single graph.)

Hillenbrand, Getty, Clark & Wheeler (1995) Michigan (Northern Cities) Vowel Formant Data
1. Lots of overlap among adjacent vowels
2. /æ/ and /ɛ/ almost on top of one another, and out of order relative to Peterson & Barney (1952)

Peterson & Barney (Mostly Mid-Atlantic) vs. Hillenbrand et al. (Upper Midwest/Northern Cities)
1. /æ/ is raised and fronted in the Northern Cities data
2. Back vowels are fronted (especially /ɑ/) and lower in the Northern Cities data
3. High vowels (/i/, /ɪ/, /u/, /ʊ/) are not quite as high in the Northern Cities data

Question: How well can vowels be separated based on F1 and F2 alone? This is the kind of question that can be answered with a statistical pattern recognition algorithm.

How a Pattern Recognizer Works
(Figure: Training and testing stages of a pattern recognizer.)
In training, the recognizer learns the distribution of formant values for each vowel category from labeled tokens; in testing, each token is assigned to the category it most closely matches.

So, how well can vowels be separated based on F1 and F2 alone? Answer: pretty well, but not nearly well enough to explain human listener data.

Pattern classification results from Hillenbrand & Gayvert (1993):
                             Automatic classification    Human listeners
Peterson & Barney vowels             74.9%                    94.4%
Hillenbrand et al. vowels            68.2%                    95.4%

It must be that listeners are using information other than F1 and F2 to recognize vowels. What are the possibilities?
F3: It helps some (especially for /ɚ/), but not enough: automatic classification improves to about 80-85%, better but still well below human listeners.
f0: Ditto. It helps some, but not enough: automatic classification improves to about 80-85%, better but still well below human listeners.
F3 and f0 together: Better still (~89-90%), but still below human listeners.

What does this mean? It appears that listeners are recognizing vowels based on information other than f0 and F1-F3. What are the possibilities? Two candidates:
1. Duration
2. Patterns of spectral change over time

American English Vowels Have Different Typical Durations
/i/ > /ɪ/
/u/ > /ʊ/
/æ/ > /ɛ/
/ɑ/ > /ʌ/
/ɔ/ > /ɑ/
Do Listeners Use Duration in Vowel Identification?

(Figure: Example signals in the four conditions: original duration, neutral duration, short duration, long duration.)

Logic: If duration plays no role in vowel recognition, the 4 signal types ought to be equally intelligible; i.e., artificially modifying duration will not affect what vowel is heard. On the other hand, if duration plays a role in vowel perception, the original-duration (OD) signals ought to be more intelligible than any of the duration-modified signals. Also, there are specific kinds of changes in vowel identity that we would expect. For example:
Shortened /i/ ought to be heard as /ɪ/
Lengthened /ɪ/ ought to be heard as /i/
Shortened /æ/ ought to be heard as /ɛ/
Lengthened /ɛ/ ought to be heard as /æ/
Shortened /u/ ought to be heard as /ʊ/
Lengthened /ʊ/ ought to be heard as /u/
Shortened /ɑ/ ought to be heard as /ʌ/
Lengthened /ʌ/ ought to be heard as /ɑ/

RESULTS
Original Duration: 96.0%
Neutral Duration: 94.1%
Short Duration: 91.4%
Long Duration: 90.9%

CONCLUSIONS
1. Duration has a measurable but fairly small overall effect on vowel perception.
2. Vowel shortening (-2 SDs): ~5% drop in overall intelligibility
3. Vowel lengthening (+2 SDs): ~5% drop in overall intelligibility
4. Vowels most affected: /ɑ/-/ɔ/-/ʌ/ and /æ/-/ɛ/
5. Vowels not affected: /i/-/ɪ/ and /u/-/ʊ/

(Figure: Effects of duration on vowel perception: original duration, long duration, short duration.)

The Role of Spectral Change in Vowel Perception
Notice that some vowels, especially /æ/ and /ɪ/, show a fair amount of change in formant frequencies throughout the course of the vowel. Is it possible that these formant movements are perceptually significant?

More examples: note especially the rise in F2 for /ʊ/ and /ʌ/.

Here’s another way to visualize patterns of formant-frequency change in vowels: this figure shows formant frequencies measured at the beginning of the vowel and a second time at the end of the vowel. (The phonetic symbol is plotted at the second measurement.) Note that some vowels (e.g., /i/ and /u/) are pretty steady over time, but others have formants that change quite a bit throughout the course of the vowel (e.g., /e/, /o/, /ʌ/, /ʊ/, /æ/, /ɪ/).

NAT: Naturally spoken /hæd/
OF: Synthesized, preserving the original formant contours
FF: Synthesized with flattened formants

Key comparison is OF vs. FF: If the formant movements don’t matter, flattening the formant contours will not affect the vowel percept, and the recognition rates for OF and FF should be very similar. On the other hand, if the formant movements are important, the FF signals will be less intelligible than the OF signals.
Conclusion: Spectral change patterns do matter.

What can we conclude from all this about how listeners recognize which vowel was spoken?
1. Primary cues: F1 and F2. Relationships among the formants matter, not absolute formant frequencies.
2. Cues that are of secondary importance, but definitely play a role in vowel perception: f0, F3 (especially for /ɚ/), spectral change patterns, and vowel duration.
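The training and testing stages of a pattern recognizer, discussed above in connection with Hillenbrand & Gayvert (1993), can be sketched with a nearest-centroid classifier on F1 and F2. This is a deliberately simple stand-in, not necessarily the classifier used in that study. The training tokens are illustrative values approximating commonly cited adult male and female averages; treat them as example inputs, not reference data. The functions `train`, `classify`, and `accuracy` are hypothetical names.

```python
from collections import defaultdict
from math import dist

# Illustrative (vowel, F1 Hz, F2 Hz) training tokens: one "male-like" and one
# "female-like" token per vowel. Example inputs only, not reference data.
TRAINING_TOKENS = [
    ("i", 270, 2290), ("i", 310, 2790),
    ("ɪ", 390, 1990), ("ɪ", 430, 2480),
    ("ɛ", 530, 1840), ("ɛ", 610, 2330),
    ("æ", 660, 1720), ("æ", 860, 2050),
    ("ɑ", 730, 1090), ("ɑ", 850, 1220),
    ("ɔ", 570, 840), ("ɔ", 590, 920),
    ("ʊ", 440, 1020), ("ʊ", 470, 1160),
    ("u", 300, 870), ("u", 370, 950),
    ("ʌ", 640, 1190), ("ʌ", 760, 1400),
    ("ɚ", 490, 1350), ("ɚ", 500, 1640),
]


def train(tokens):
    """Training stage: compute the mean (F1, F2) of each vowel category."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum F1, sum F2, count]
    for label, f1, f2 in tokens:
        entry = sums[label]
        entry[0] += f1
        entry[1] += f2
        entry[2] += 1
    return {label: (s1 / n, s2 / n) for label, (s1, s2, n) in sums.items()}


def classify(centroids, f1, f2):
    """Testing stage: pick the vowel whose centroid is closest in F1-F2 space (Hz)."""
    return min(centroids, key=lambda label: dist((f1, f2), centroids[label]))


def accuracy(centroids, tokens):
    """Proportion of labeled tokens that the classifier labels correctly."""
    correct = sum(classify(centroids, f1, f2) == label for label, f1, f2 in tokens)
    return correct / len(tokens)


if __name__ == "__main__":
    centroids = train(TRAINING_TOKENS)
    print(classify(centroids, 300, 2500))  # a token near the /i/ centroid
    print(f"resubstitution accuracy: {accuracy(centroids, TRAINING_TOKENS):.2f}")
```

Even on its own training tokens this classifier mislabels at least one token, a small-scale echo of the overlap between adjacent vowels. A proper evaluation would test on held-out tokens, and a more realistic recognizer would model each category's variance (e.g., a Gaussian classifier) rather than using raw Euclidean distance in Hz, which lets the wider-ranging F2 dominate. Adding F3, f0, duration, or formant-change features would mean extending each token's feature tuple.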
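The short and long conditions in the duration experiment were defined as 2 SDs below and above typical duration. The arithmetic can be sketched as below, assuming (not confirmed by the lecture) that the mean and SD are computed separately for each vowel category. The mean and SD values are hypothetical placeholders, and `stretch_factor` is only the ratio of target to original duration; it does not describe the signal-processing method actually used to modify the stimuli.

```python
from dataclasses import dataclass


@dataclass
class DurationStats:
    mean_ms: float  # mean duration of one vowel category (hypothetical values below)
    sd_ms: float    # standard deviation of that category's durations


def target_durations(stats: DurationStats, n_sd: float = 2.0) -> dict[str, float]:
    """Short and long target durations, n_sd standard deviations below/above the mean."""
    return {
        "short": stats.mean_ms - n_sd * stats.sd_ms,
        "long": stats.mean_ms + n_sd * stats.sd_ms,
    }


def stretch_factor(original_ms: float, target_ms: float) -> float:
    """Factor by which a token must be time-scaled to reach the target duration."""
    return target_ms / original_ms


if __name__ == "__main__":
    # Hypothetical statistics for an intrinsically long vowel; not measured data.
    stats = DurationStats(mean_ms=300.0, sd_ms=50.0)
    targets = target_durations(stats)
    print(targets)
    print(stretch_factor(original_ms=310.0, target_ms=targets["short"]))
```

With these placeholder numbers, a shortened long vowel ends up with a duration more typical of its short counterpart, which is why shortened /i/ might be expected to drift toward /ɪ/.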
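The beginning-versus-end measurements and the OF vs. FF manipulation can both be illustrated on a sampled formant track. The sketch below measures spectral change as a late-minus-early difference and builds a flattened copy by holding one frame's value for the whole vowel. The lecture does not say which frame the FF stimuli were held at or exactly where the early and late measurements were taken, so the `hold_index` parameter and the default positions (20% and 80% of the vowel) are assumptions, and the F2 values are hypothetical.

```python
def value_at(track: list[float], fraction: float) -> float:
    """Formant value at a fractional position in the vowel (0.0 = onset, 1.0 = offset)."""
    if not track:
        raise ValueError("track must contain at least one frame")
    index = round(fraction * (len(track) - 1))
    return track[index]


def formant_change(track: list[float], early: float = 0.2, late: float = 0.8) -> float:
    """Late minus early formant value: a simple index of spectral change."""
    return value_at(track, late) - value_at(track, early)


def flatten(track: list[float], hold_index: int) -> list[float]:
    """FF-style contour: repeat one frame's value for the whole vowel."""
    return [track[hold_index]] * len(track)


if __name__ == "__main__":
    # Hypothetical F2 track (Hz), one value per analysis frame, rising over the vowel.
    f2_track = [1000.0, 1050.0, 1100.0, 1150.0, 1200.0, 1250.0]
    f2_flat = flatten(f2_track, hold_index=1)
    print(formant_change(f2_track), formant_change(f2_flat))
```

Flattening preserves the formant value at the hold point but removes the movement entirely, so any drop in intelligibility from OF to FF can be attributed to the lost formant change rather than to the formant values themselves.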