The Acoustics and Perception
of American English Vowels
Hillenbrand: Vowels
Formant Patterns for the “Noncentral” (i.e., omitting
/ʌ/ and /ɚ/) Monophthongal Vowels of American English
(based on Peterson & Barney averages)
Let’s Look at Another Way to Visualize Formant
Data for Vowels: The “Standard” F1-F2 Plot.
Formant Data for Men
“Standard” F1-F2 Plot
Notice that the formant values for women for a given vowel are shifted up
and to the right, indicating higher values for both F1 and F2. This is due to
the shorter vocal tracts of women vs. men. The same is true of the
relationship between the formant values of children relative to women –
and for the same reason; i.e., children have shorter vocal tracts than
women.
One More (apparently screwy) Way
to Visualize Vowel Formant Data:
The Acoustic Vowel Diagram
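Note that in the Acoustic Vowel Diagram: (1) the axes are reversed (F2 on
the x axis, F1 on the y axis), and (2) the numbers go backwards (values
decrease from left to right and from bottom to top).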
Why would anyone do such a screwy thing?
Conventional F1-F2 Plot
Acoustic Vowel Diagram
Peterson & Barney Averages (for men only)
Plotted on an Acoustic Vowel Diagram
Formant data are being plotted, but the result strongly resembles an articulatory vowel
diagram, with the x axis corresponding to tongue advancement (i.e., front vs. back) and
the y axis corresponding to tongue height. This gives us a convenient way to interpret
formant data in articulatory terms.
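The reversed, backwards-running axes can be sketched in a few lines of code. The example below is only an illustration: the F1/F2 values are the commonly tabulated Peterson & Barney averages for men (check them against the original table before relying on them), and the function name to_vowel_diagram is a hypothetical helper, not something from the slides.

```python
# Commonly tabulated Peterson & Barney (1952) average F1/F2 values for men, in Hz.
# Treat these as an assumption and verify against the original table.
PB_MEN = {
    "i": (270, 2290),
    "ɪ": (390, 1990),
    "ɛ": (530, 1840),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "ɔ": (570, 840),
    "ʊ": (440, 1020),
    "u": (300, 870),
}


def to_vowel_diagram(f1_hz, f2_hz):
    """Map (F1, F2) to acoustic-vowel-diagram coordinates (hypothetical helper).

    x increases to the right as F2 falls (front -> back).
    y increases upward as F1 falls (low -> high tongue position).
    """
    return (-f2_hz, -f1_hz)


points = {v: to_vowel_diagram(f1, f2) for v, (f1, f2) in PB_MEN.items()}

# Reading the diagram left to right approximates front-to-back tongue position.
left_to_right = sorted(points, key=lambda v: points[v][0])
# Reading it top to bottom approximates high-to-low tongue position.
top_to_bottom = sorted(points, key=lambda v: points[v][1], reverse=True)

print("front -> back:", " ".join(left_to_right))
print("high -> low:  ", " ".join(top_to_bottom))
```

Negating both formants is just one way to express "axes swapped, numbers backwards"; a plotting library would normally do the same thing by inverting the axis direction.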
1. What is the articulatory explanation for the differences in formant frequencies?
2. What effect might this have on the intelligibility of the vowels spoken by the deaf talker?
The data shown above are hypothetical, but this is exactly the sort of pattern that has been
observed in the speech of deaf talkers. For example, Monsen (1978) showed that (a) the formant
values of deaf talkers tend to be centralized relative to those of normal-hearing (NH) talkers,
and (b) the degree of centralization is a good predictor of speech intelligibility.
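One simple way to put a number on "centralization" is the average distance of a talker's vowels from the center of that talker's own F1-F2 space. The sketch below uses that measure purely as an illustration; it is not claimed to be the measure Monsen used, and the formant values for both talkers are invented.

```python
import math


def mean_distance_from_centroid(vowels):
    """Average Euclidean distance (Hz) of each (F1, F2) point from the
    centroid of all points. Smaller values mean a more centralized vowel space."""
    pts = list(vowels.values())
    c1 = sum(f1 for f1, _ in pts) / len(pts)
    c2 = sum(f2 for _, f2 in pts) / len(pts)
    return sum(math.hypot(f1 - c1, f2 - c2) for f1, f2 in pts) / len(pts)


# Hypothetical values for illustration only (not measured data).
nh_talker = {"i": (280, 2250), "æ": (680, 1700), "ɑ": (720, 1100), "u": (310, 900)}
deaf_talker = {"i": (400, 1800), "æ": (600, 1550), "ɑ": (620, 1250), "u": (420, 1200)}

spread_nh = mean_distance_from_centroid(nh_talker)
spread_deaf = mean_distance_from_centroid(deaf_talker)
print(f"NH spread: {spread_nh:.0f} Hz, deaf-talker spread: {spread_deaf:.0f} Hz")
```

With these made-up numbers the deaf talker's spread is smaller, which is what a centralized vowel space looks like numerically.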
Peterson & Barney (1952)
Study conducted at Bell Labs. The first large acoustic study carried out
with the (then) recently invented sound spectrograph.
1. Recordings
10 vowels (i,ɪ,ɛ,æ,ɑ,ɔ,ʊ,u,ʌ,ɚ)
/hVd/ context (“heed,” “hid,” “head,” “had,” etc.)
76 talkers (33 men, 28 women, 15 children)
2. Measurements: F0, F1-F3
3. Listening Study
70 listeners asked to identify each test signal as one of ten words
(“heed,” “hid,” “head,” “had,” etc.)
Listening Test Results
Simple: The signals were highly intelligible.
Overall identification accuracy = 94.5%
Error rate varied some across vowels. For example:
Error rate very low for:
• /i/ (0.1%)
• /ɚ/ (0.4%)
• /u/ (0.8%)
Higher for:
• /ɑ/ (13.0%, confused with /ɔ/)
• /ɛ/ (12.1%, confused with /æ/)
___________________________________
Details aside, the simple message is that vowel
identity was transmitted quite accurately to the
listeners.
What information do listeners use to recognize
vowels?
To answer this, we need to look at the acoustic data.
________________________________
Peterson & Barney (1952) General American
English Vowel Formant Data
1. Lots of overlap among adjacent vowels
2. Note the positions of /æ/ and /ɛ/: /æ/ is a lower (and less fronted)
vowel than /ɛ/
It is mostly the case that the men
occupy the lower left portion of each
ellipse, the children occupy the upper
right portion, and the women cluster
toward the center. This is mainly due
to differences in vocal-tract length.
There is quite a bit of variability
across individual talkers, though.
(Data from Peterson & Barney,
1952.)
Same Data as Previous Figure, but Plotted
on a Single Graph
Hillenbrand, Getty, Clark & Wheeler (1995)
Michigan (Northern Cities) Vowel Formant Data
1. Lots of overlap among adjacent vowels
2. /æ/ and /ɛ/ almost on top of one another, and out of
order relative to Peterson & Barney (1952)
Peterson & Barney (Mostly Mid-Atlantic) vs.
Hillenbrand et al. (Upper Midwest/Northern Cities)
1. /æ/ is raised and fronted in Northern Cities data
2. Back vowels fronted (especially /ɑ/) and lower in Northern Cities data
3. High vowels (/i/ /ɪ/ /u/ /ʊ/) not quite as high in Northern Cities data
Question: How well can vowels be separated based on F1
and F2 alone? This is the kind of question that can be
answered with a statistical pattern recognition algorithm.
How A Pattern Recognizer Works
Training
Testing
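The training/testing idea can be sketched with a very small classifier. The slides do not say which algorithm was used, so the example below substitutes a nearest-centroid rule as a deliberately simple stand-in: training computes the mean (F1, F2) for each vowel, and testing assigns each new token to the vowel with the closest mean. The function names and the tokens are illustrative assumptions, not data from the study.

```python
import math


def train(samples):
    """Training: compute the mean (F1, F2) for each vowel category.

    samples is a list of (vowel, f1_hz, f2_hz) tuples.
    """
    sums = {}
    for vowel, f1, f2 in samples:
        s = sums.setdefault(vowel, [0.0, 0.0, 0])
        s[0] += f1
        s[1] += f2
        s[2] += 1
    return {v: (s[0] / s[2], s[1] / s[2]) for v, s in sums.items()}


def classify(model, f1, f2):
    """Testing: pick the vowel whose mean is nearest in the F1-F2 plane."""
    return min(model, key=lambda v: math.hypot(f1 - model[v][0], f2 - model[v][1]))


def accuracy(model, samples):
    """Proportion of labeled tokens that the model classifies correctly."""
    hits = sum(classify(model, f1, f2) == vowel for vowel, f1, f2 in samples)
    return hits / len(samples)


# Illustrative tokens only (not the study's data).
training_tokens = [
    ("i", 270, 2290), ("i", 310, 2790),
    ("ɪ", 390, 1990), ("ɪ", 430, 2480),
    ("ɑ", 730, 1090), ("ɑ", 850, 1220),
]
test_tokens = [
    ("i", 290, 2400),
    ("ɪ", 410, 2100),
    ("ɑ", 780, 1150),
    ("ɪ", 400, 2450),  # falls in the region where /i/ and /ɪ/ overlap
]

model = train(training_tokens)
print(f"accuracy on test tokens: {accuracy(model, test_tokens):.2f}")
```

The last test token is labeled /ɪ/ but lands closer to the /i/ mean, which is the kind of error that overlap between adjacent vowel categories produces.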
So, how well can vowels be separated based on F1 and F2 alone?
Answer: Pretty well, but not nearly well enough to explain human listener data.
________________________________________________
Pattern classification results from Hillenbrand & Gayvert (1993)

                              Automatic         Human
                              Classification    Listeners
Peterson & Barney vowels:     74.9%             94.4%
Hillenbrand et al. vowels:    68.2%             95.4%
________________________________________________
It must be that listeners are using some information to recognize vowels other than
F1 and F2. What are the possibilities?
F3: It helps some (especially for /ɚ/), but not enough: automatic classification
improves to about 80-85% – better, but still well below human listeners.
f0: Likewise, it helps some, but not enough: automatic classification improves to about
80-85% – better, but still well below human listeners.
F3 and f0 together: Better still (~89-90%), but still below human listeners.
What does this mean? It appears as though listeners are
recognizing vowels based on information other than F0 and
F1-F3. What are the possibilities?
Two Candidates:
Duration
Patterns of spectral change over time
___________________________________
American English Vowels Have
Different Typical Durations
___________________________________
/i/ > /ɪ/
/u/ > /ʊ/
/æ/ > /ɛ/
/ɑ/ > /ʌ/
/ɔ/ > /ɑ/
___________________________________
___________________________________
Do Listeners Use Duration
in Vowel Identification?
Original Duration
Neutral Duration
Short Duration
Long Duration
Logic: If duration plays no role in vowel recognition, the four signal types ought to be equally
intelligible; i.e., artificially modifying duration will not affect what vowel is heard. On the
other hand, if duration plays a role in vowel perception, the original-duration (OD) signals
ought to be more intelligible than any of the duration-modified signals.
Also, there are specific kinds of changes in vowel identity that we would expect. For
example:
Shortened /i/ ought to be heard as /ɪ/
Lengthened /ɪ/ ought to be heard as /i/
Shortened /æ/ ought to be heard as /ɛ/
Lengthened /ɛ/ ought to be heard as /æ/
Shortened /u/ ought to be heard as /ʊ/
Lengthened /ʊ/ ought to be heard as /u/
Shortened /ɑ/ ought to be heard as /ʌ/
Lengthened /ʌ/ ought to be heard as /ɑ/
RESULTS
Original Duration: 96.0%
Neutral Duration: 94.1%
Short Duration: 91.4%
Long Duration: 90.9%
CONCLUSIONS
1. Duration has a measurable but fairly small
overall effect on vowel perception.
2. Vowel Shortening (-2 SDs): ~5% drop in
overall intelligibility
3. Vowel Lengthening (+2 SDs): ~5% drop in
overall intelligibility
4. Vowels Most Affected:
/ɑ/ - /ɔ/ - /ʌ/
/æ/ - /ɛ/
5. Vowels Not Affected:
/i/ - /ɪ/
/u/ - /ʊ/
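The -2 SD and +2 SD figures in conclusions 2 and 3 suggest how the shortened and lengthened durations could be derived from a set of measured durations. The sketch below is a guess at that arithmetic, not the procedure from the study: it assumes the neutral duration is the mean of the measured durations and that the SD is computed over the same set. The function name and the durations are invented for illustration.

```python
import statistics


def duration_conditions(durations_ms):
    """Return target durations (ms) for the modified-duration conditions.

    Assumption (not stated in the slides): neutral = mean of the measured
    durations; short and long = mean minus/plus two sample standard deviations.
    """
    mean = statistics.mean(durations_ms)
    sd = statistics.stdev(durations_ms)
    return {
        "short": mean - 2 * sd,
        "neutral": mean,
        "long": mean + 2 * sd,
    }


# Hypothetical vowel durations in milliseconds (not measured data).
measured_ms = [200, 240, 260, 280, 320]
conditions = duration_conditions(measured_ms)
for name, value in conditions.items():
    print(f"{name}: {value:.1f} ms")
```

The original-duration condition needs no computation: each token simply keeps the duration it was spoken with.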
Effects of Duration
on Vowel Perception
• Original Duration, Long Duration, Short Duration
The Role of Spectral Change in Vowel Perception
Notice that some vowels – especially /æ/ and /ɪ/ –
show a fair amount of change in formant frequencies
throughout the course of the vowel. Is it possible
that these formant movements are perceptually
significant?
More examples.
Note especially
the rise in F2
for /ʊ/ and /ʌ/.
Here’s another way to
visualize patterns of formant
frequency change in vowels:
This figure shows formant
frequencies measured at the
beginning of the vowel and a
second time at the end of the
vowel. (The phonetic symbol
is plotted at the second
measurement.) Note that
some vowels (e.g., /i/ and /u/)
are pretty steady over time,
but others have formants that
change quite a bit throughout
the course of the vowel (e.g.,
/e/, /o/, /ʌ/, /ʊ/, /æ/, /ɪ/).
NAT: Naturally spoken /hæd/
OF: Synthesized, preserving original formant contours
FF: Synthesized with flattened formants
Key comparison is OF vs. FF: If the formant movements don’t
matter, flattening the formant contour will not affect the vowel
percept, and the recognition rates for OF and FF should be very
similar. On the other hand, if the formant movements are
important, the FF signals will be less intelligible than the OF
signals.
Conclusion: Spectral
change patterns do matter.
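The difference between the OF and FF conditions can be pictured as an operation on a formant track. The sketch below is only an assumption about what "flattening" might look like: it replaces a contour with a constant equal to its value at one chosen point (the midpoint by default). The slides do not say which point was used, and the F2 track and function names are invented.

```python
def flatten_contour(track_hz, anchor_fraction=0.5):
    """Return a flat contour fixed at the value found at anchor_fraction
    of the way through the track (hypothetical flattening rule)."""
    idx = min(int(anchor_fraction * len(track_hz)), len(track_hz) - 1)
    return [track_hz[idx]] * len(track_hz)


def total_movement(track_hz):
    """Sum of absolute frame-to-frame changes in Hz; zero for a flat contour."""
    return sum(abs(b - a) for a, b in zip(track_hz, track_hz[1:]))


# Hypothetical F2 track for one vowel, in Hz (not measured data).
f2_original = [1900, 1880, 1850, 1800, 1740, 1690]
f2_flat = flatten_contour(f2_original)

print("OF movement:", total_movement(f2_original), "Hz")
print("FF movement:", total_movement(f2_flat), "Hz")
```

The flattened track keeps one static formant value but discards the movement, so any drop in intelligibility from OF to FF can be attributed to the missing spectral change.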
What can we conclude from all
this about how listeners recognize
which vowel was spoken?
1. Primary Cues:
• F1 and F2
• Relationships among the formants matter, not absolute formant
frequencies
2. Cues that are of secondary importance, but definitely play a role in
vowel perception:
• f0
• F3 (especially for [ɚ])
• Spectral Change Patterns
• Vowel Duration