Contrast and the realization of schwa vowels in English
Edward Flemming

Introduction
The project:
• To derive properties of phonetic and phonological vowel reduction from general constraints related to speech production and perception.
Outline of the talk:
• Outline a model of vowel reduction.
• Explore its application to English vowel reduction (‘reduction to schwa’).

Phonological vowel reduction
• Vowel contrasts are neutralized in unstressed syllables.
• E.g. Southern Italian (Mistretta dialect, Mazzola 1976)

Primary stressed:    Elsewhere:
i     u              i     u
e     o
   a                    a

       stressed vowels         unstressed vowels
[i]    vínːi ‘he sells’        vinːímu ‘we sell’
[e]    véni ‘he comes’         (vinímu ‘we come’)
[a]    ávi ‘he has’            avíti ‘he has’
[o]    móri ‘he dies’          (murímu ‘we die’)
[u]    úɟːi ‘he boils’         uɟːímu ‘we boil’

Phonological vowel reduction
• Common patterns of vowel reduction:
(a) i   u      (b) i   u      (c) i   u
    e   o          e   o           a/ə
    ɛ   ɔ            a
      a
• (a) reduces to (b), e.g. Standard Italian, B. Portuguese, Slovene
• (b) reduces to (c), e.g. Standard Russian, S. Italian, Catalan
• (a) reduces to (c), e.g. E. Catalan
• Reduction to a single vowel, e.g. English, Dutch, Salish
• Primarily neutralization of height contrasts.

Outline of an analysis of vowel reduction
• Vowel reduction is fundamentally motivated by undershoot in short unstressed syllables.

Phonetic vowel reduction - Undershoot
[Figure: Lindblom's V Reduction Model, gVg context. The vowels i, e, a, o, u are plotted in F1 (Hz) by F2 (Hz) at vowel durations of 100 ms, 125 ms and 200 ms.]

Outline of an analysis of vowel reduction
• Short duration of unstressed vowels increases the effort required to achieve distinct vowel qualities, particularly low vowels (Lindblom 1963).
• Contrasts are subject to distinctiveness constraints, so neutralization occurs where phonetic reduction would otherwise render contrasts insufficiently distinct.
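A minimal sketch of how duration drives undershoot, assuming the exponential form of Lindblom's (1963) F1 function that this talk presents on the formant undershoot slide. The function name `undershoot_f1`, the parameter names `k1` and `beta1`, and their default values (1.5 and 0.008, borrowed from the parameter list on the optimal inventories slide) are illustrative assumptions rather than a fitted model, and duration is assumed to be in milliseconds.

```python
import math


def undershoot_f1(f1_target_hz, duration_ms, k1=1.5, beta1=0.008, threshold_hz=375.0):
    """Return the F1 (Hz) realized in a vowel of the given duration.

    Assumed form: F1V = k1 * (375 - F1t) * exp(-beta1 * T) + F1t for targets
    above 375 Hz; targets at or below 375 Hz are reached without undershoot.
    """
    if f1_target_hz <= threshold_hz:
        # High vowels: F1 target is already close to the consonantal value.
        return f1_target_hz
    # The (negative) undershoot term shrinks as duration grows.
    undershoot = k1 * (threshold_hz - f1_target_hz) * math.exp(-beta1 * duration_ms)
    return undershoot + f1_target_hz


if __name__ == "__main__":
    # Hypothetical low-vowel target of 700 Hz at the durations shown in the figure.
    for duration in (100, 125, 200):
        print(duration, "ms:", round(undershoot_f1(700.0, duration), 1), "Hz")
```

Under these assumed values, a low vowel with a 700 Hz F1 target falls further short of its target at 100 ms than at 200 ms, while a vowel whose target is at or below 375 Hz is unaffected by duration.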
Undershoot as a consequence of effort minimization
• Faster movements are more effortful (Nelson 1983, Perkell et al 2002).
• In a CVC sequence, the articulators have to move to and from the position for the vowel.
• Undershoot results from avoiding fast movements.
[Figure: schematic formant trajectories through a CVC sequence, labeled F1C, F1V, F2C, F2V.]

Formant undershoot as a function of duration and distance
Lindblom’s model:
if $F1_t \le 375$ Hz: $F1_V = F1_t$
if $F1_t > 375$ Hz: $F1_V = k_1 (375 - F1_t)\, e^{-\beta T} + F1_t$
$F2_V = k_2 (F2_i - F2_t)\, e^{-\beta T} + F2_t$
F2i is F2 at C release.
F1t, F2t are V formant targets.
k, β depend on consonant context.
[Figure: F1v and F2v (frequency, Hz) as a function of vowel duration (0 to 300 ms), shown relative to the initial values F1i, F2i and the targets F1t, F2t.]
More undershoot where:
• Vowel is shorter
• Distance between C and V is greater

Implementation of the model of vowel reduction
• Stressed and unstressed inventories of contrasting vowel categories are selected from a space of possible vowels so as to best satisfy constraints on contrasts:
– Maximize distinctiveness of contrasts.
– Maximize number of contrasts.
– Minimize articulatory effort.
• Effort minimization implies undershoot.

Model of vowel reduction
• The vowel space is modeled on Liljencrants and Lindblom (1972).
[Figure: space of possible vowels in F1 (Bark) by F2 (Bark), with the corner vowels i, u, a.]

i. Maximize the distinctiveness of contrasts
• Distinctiveness of the contrast between Vi and Vj is the (weighted) distance between the vowels in formant space.
$d_{ij} = \sqrt{\left(a (x_i - x_j)\right)^2 + (y_i - y_j)^2}$
Where xn is F2 of Vn in Bark, yn is F1 of Vn in Bark, and a < 1.
[Figure: vowels V1 and V2 separated by distance d12 in the F1 (Bark) by F2 (Bark) vowel space, with corner vowels i, u, a.]

i. Maximize the distinctiveness of contrasts
• Overall distinctiveness cost of a vowel system depends on the minimum distance found in either inventory.
Cost: $\dfrac{1}{d_{min}^2}$ where $d_{min} = \min_{i \ne j} d_{ij}$

ii. Maximize the number of contrasts
• Maximize the number of vowels in the stressed and unstressed vowel inventories.
Cost: $\dfrac{1}{n_{ave}^2}$ where $n_{ave} = \dfrac{n_{stressed} + n_{unstressed}}{2}$

iii. Articulatory effort
• The space of possible vowels contracts as vowel duration is reduced, following the undershoot functions proposed by Lindblom (1963).
• Consonants are assumed to assimilate partially to the vowel target in F2, but not in F1.
[Figure: space of possible vowels in F1 (Bark) by F2 (Bark) at vowel durations of 100 ms and 160 ms.]

Overall cost function
• The optimal vowel system is the one that best satisfies these constraints:
$\min_{V} \; \dfrac{1}{d_{min}^2} + \dfrac{w_n^2}{n_{ave}^2}$ (subject to vowel space constraint)

Optimal inventories
[Figure: optimal stressed and unstressed inventories in F1 (Bark) by F2 (Bark).]
Parameters: a = 0.14, k1 = 1.5, β1 = 0.008, k2 = 1.5, β2 = 0.01, c = 0.27, F2l = 1400 Hz, wn = 6
Durations: stressed 160 ms, unstressed 100 ms

Italian vowels
[Figure: Italian stressed vowels i, e, ɛ, a, ɔ, o, u and unstressed vowels in F1 (Bark) by F2 (Bark). Data from Albano Leoni et al 1995.]

Undershoot and vowel reduction
• Relating phonological vowel reduction to undershoot helps to explain:
i. The tendency to neutralize vowel contrasts in short unstressed syllables.
ii. The generalization that vowel reduction primarily eliminates height contrasts.
iii. The generalization that neutralizing vowel reduction is accompanied by phonetic reduction.

English reduction to schwa
• English exhibits a pattern of vowel reduction whereby vowel quality contrasts are neutralized in unstressed syllables.
• The resulting vowel is usually transcribed as schwa [ə]
atom [ˈæɾəm]
atomic [əˈtʰɑmɪk]

Reduction to ‘schwa’
Predictions of the undershoot model:
• Reduction to a single vowel should be most likely where vowels are very short.
• Where there is a single vowel, distinctiveness of vowel quality contrasts is irrelevant, so effort minimization should dominate.
• So schwa should be a transitional vowel, maximally assimilated to the surrounding context.
– ‘targetless schwa’

Minimum effort vowels
• Minimal deviation from the narrow constrictions for surrounding consonants results in low F1 (a high vowel) because any constriction above the pharynx lowers F1.
• Minimal deviation from the tongue body and lip positions for surrounding consonants and vowels results in contextually variable F2.
• But schwa is often said to be a mid central vowel.

Experiment 1: English schwa vowels (research with Stephanie Johnson)
• Materials
final: Rosa, Lisa, Russia, Asia, ninja, sofa, vodka, soda, alpha, umbrella
non-final: rhapsody, suggest, suspend, prejudice, today, begin, report, compare, probable, suffocate

Experiment 1
• Also recorded full vowels for comparison: heed [i], hid [ɪ], head [ɛ], had [æ], odd [ɑ], hood [ʊ], who [u]
• Spoken in carrier phrase ‘Say ___ to me’.
• 9 female speakers of American English.
• Measured first two formants at the midpoint of the vowels.
• ‘compare’ frequently lacked any voiced vowel in the first syllable, so it was excluded from analysis.

Results
[Figure: non-final schwa tokens and full vowel means (i, ɪ, ɛ, æ, ɑ, ʊ, u) in F1 (Hz) by F2 (Hz).]
Non-final schwa:
• Low F1 (mean 425 Hz)
• F2 is contextually variable.

Results
Final schwa:
• F1 shows wide range (mean 665 Hz).
• Much of this is between-speaker variation.
• Central F2 (mean 1772 Hz)
[Figure: non-final schwa, final schwa and full vowel means (i, ɪ, ɛ, æ, ɑ, ʊ, u) in F1 (Hz) by F2 (Hz).]

Results
Final schwa:
• F1 shows wide range (mean 665 Hz).
• Much of this is between-speaker variation.
– vocal tract size
– quality of final schwa ([ə] - [ʌ])
[Figure: mean F1 (Hz) by subject for final schwa, ‘heed’ and ‘had’.]

Two patterns of vowel reduction
• The difference between final and non-final schwa vowels can be interpreted in terms of the undershoot model of vowel reduction.
• There are two degrees of unstressed vowel reduction, depending on characteristic vowel duration.

Two patterns of vowel reduction
• Final unstressed vowels are longer than non-final unstressed vowels: 153 ms vs. 64 ms.
– Presumably a result of final lengthening.
• This allows for a lower schwa vowel, which in turn allows for the maintenance of contrasts.
• The vowels [i, ə, oʊ] and rhotic [ɚ] contrast in unstressed syllables in absolute word-final position.
pretty [ˈpɹɪɾi]
letter [ˈlɛɾɚ]
beta [ˈbeɪɾə]
motto [ˈmɑɾoʊ]
• Non-final schwa does not minimally contrast with other vowel qualities.
• Consequently it is assimilated to context.

The correlation between contrast and reduced vowel quality
The same correlation between system of contrasts and reduced vowel quality is observed across languages:
• Contextually variable high vowels only seem to be found where all vowel quality contrasts are neutralized, e.g. Dutch (van Bergem 1994), probably Montana Salish.
• Mid central [ə] is found in contrast with higher vowels, e.g. reduced vowel inventories of the form [i, ə, u] occur in unstressed syllables in Russian (Padgett and Tabain 2003), E. Bulgarian (Wood and Pettersson 1988), Catalan (Herrick 2003), and in final unstressed syllables in Brazilian Portuguese (Mattoso Camara 1972).
The correlation between contrast and reduced vowel quality
• Central Catalan stressed and unstressed vowels (Herrick 2003).
[Figure: Central Catalan vowels i, e, ɛ, a, ɔ, o, u and ə.]

Experiment 2: Targetless schwa
• This analysis hypothesizes that the non-final reduced vowel in English is a minimum effort vowel.
• So far based on impressionistic evaluation of a relatively unsystematic sample of medial schwa vowels.

Experiment 2: Targetless schwa
• Systematically vary the preceding and following context of medial schwa.
Questions:
• Does variability of schwa involve assimilation to the surrounding context?
• Is there any evidence that schwa has a vowel quality target?
Conclusions:
• Much of the variability of schwa can be attributed to assimilation.
• Schwa lacks a vowel quality target, but it is not completely targetless - its target is to be a vowel.

Materials
• Nonce words of the form: [ˈbV1C1əˌC2V2t]
– All combinations of V1 from {i, æ, u}, Cn from {b, d, g}, V2 from {i, ɑ, u} (81 words)
– Subjects were instructed to model the stress pattern on words like propagate and parakeet.
e.g. ˈbidəˌgut, ˈbægəˌbit, etc.

Materials
• All words were read in the sentence frame: ‘X. Do you know what a X is?’
• Presented twice in random order (only second run is analyzed here).
• Read by four native speakers of American English, 2 male, 2 female.

Analysis
Measured F1 and F2:
• at steady state, extreme values, or midpoint of V1 and V2
• at the offset of V1, and at the onset of voicing in V2.
• in schwa: at the onset of voicing, at the offset of schwa, and half way between these points.
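The nonce-word materials described earlier can be enumerated mechanically. The sketch below assumes the template [ˈbV1C1əˌC2V2t], written in standard IPA, with C1 and C2 drawn independently from {b, d, g}; the names `V1`, `CONSONANTS`, `V2` and `nonce_words` are hypothetical and chosen only for illustration.

```python
from itertools import product

V1 = ["i", "æ", "u"]          # first (stressed) vowel
CONSONANTS = ["b", "d", "g"]  # candidates for both C1 and C2
V2 = ["i", "ɑ", "u"]          # final (secondary-stressed) vowel


def nonce_words():
    """Return every word of the form ˈbV1C1əˌC2V2t."""
    return [
        f"ˈb{v1}{c1}əˌ{c2}{v2}t"
        for v1, c1, c2, v2 in product(V1, CONSONANTS, CONSONANTS, V2)
    ]


words = nonce_words()
print(len(words))   # 3 * 3 * 3 * 3 combinations
print(words[:5])
```

Crossing all four factors fully is what lets the effects of the preceding and following vowels and consonants on schwa be separated in the analysis.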
[Figure: spectrogram (0 to 5000 Hz) of a nonce word, with V1, schwa (S) and V2 labeled.]

Results
[Figure: formants at midpoint of schwa in F1 (Hz) by F2 (Hz), with the reference vowels i, u, æ.]
• Medial schwa is highly variable in quality, particularly in F2.

Is this variability the result of interpolation between preceding and following context?
• F2 in schwa is correlated with F2 of surrounding vowels.
• But it is not the result of simple interpolation between vowels.
[Figure: F2 (Hz) at V1mid, Smid and V2mid for the vowel contexts i_i, u_a, i_a, u_i.]

Is schwa variability the result of interpolation?
• Unsurprisingly, the consonants also have a significant influence on F2 of schwa.
[Figure: F2 (Hz) at V1mid, V1off, Smid, V2ons and V2mid for the contexts ud_da, ub_ba, ug_ga (one panel) and ib_bi, id_di, ig_gi (other panel).]

Is schwa variability the result of interpolation?
• The F2 trajectory of schwa is more likely to be an interpolation between preceding and following consonants.
• But F2 adjacent to a consonant depends in turn on F2 of the adjacent vowel.

Locus equations
• Typically consonant F2 is a linear function of F2 at the midpoint of the adjacent vowel (Lindblom 1963, Klatt 1987, etc).
• The slope and intercept of this function depend on the consonant.
[Figure: spectrograms (0 to 5000 Hz) of ‘bid’ and ‘bɔd’.]

Is schwa variability the result of interpolation?
• These considerations suggest the following model of F2 at schwa midpoint:
$F2_{Smid} = a_{C1} F2_{V1} + b_{C1} + c_{C2} F2_{V2} + d_{C2}$
• The effect of the vowels on F2 of schwa is modulated by the intervening consonants.
• This model is quite successful (r² = 0.73-0.86 for individual subjects)
– For comparison, a model based on F2 at the vowel midpoints alone yields r² of 0.19-0.36

Summary for F2
• Medial schwas show wide variation in F2.
• This variation is systematically conditioned by context.
• It is difficult to determine whether schwa F2 is the result of simple interpolation.

Is F1 variability the result of interpolation?
• F1 at schwa midpoint also varies substantially according to vowel context, but cannot be the result of interpolation between preceding and following vowels.
[Figure: F1 (Hz) at V1mid, Smid and V2mid for the vowel contexts i_i, ae_a, i_a, ae_i.]

Is F1 variability the result of interpolation?
• The consonants can account for the fact that schwa F1 tends to be much lower than in adjacent vowels: forming a stop closure lowers F1.
• But if F1 in schwa is governed by the adjacent consonantal constrictions, then we would expect schwa vowels to have very low F1.

Is F1 variability the result of interpolation?
• In iCəCi, F1 at schwa midpoint is higher than in preceding or following vowels.
• Schwa appears to have an F1 (height) target.
• If there are no height contrasts, why is there a height target?
[Figure: F1 (Hz) at V1mid, V1off, Smid, V2ons and V2mid for the contexts aed_da, aeb_ba, aeg_ga (one panel) and ib_bi, id_di, ig_gi (other panel).]

Height target or ‘presence’ target?
• While non-final schwa does not contrast with other vowels in quality, it is still important to distinguish presence vs. absence of schwa.
• In some contexts presence vs. absence of schwa is minimally contrastive.
about [əbaʊt] vs. bout [baʊt]
parade [pʰəɹeɪd] vs. prayed [pʰɹeɪd]
support [səpʰɔɹt] vs. sport [spɔɹt]
• So schwa is expected to have targets related to signaling the presence of a vowel, e.g.
– duration
– intensity peak

Height target or ‘presence’ target?
• Producing an intensity peak generally involves increasing F1 (cf. Howitt 2000).
• So the apparent F1 target for non-final schwa may be a byproduct of signaling the presence of a vowel.
• This interpretation predicts that the ‘target’ for F1 is not a specific value, but a minimum value. Accordingly, schwa should assimilate to its context where this would result in high F1 - e.g. adjacent to a low vowel.

• Preliminary evidence suggests that schwa fully assimilates to the F1 of an adjacent low vowel. E.g. ‘Mrs Shah addressed the audience’
• If schwa had a specific F1 target (e.g. ~400 Hz), then we should observe movement towards this target.
[Figure: spectrograms (0 to 5000 Hz) of ‘Mr Shah dressed...’ and ‘Mrs Shah addressed...’.]

Summary of experiment 2
• The quality of medial schwa vowels is highly variable.
• Variation in F2 is particularly extensive, and involves assimilation to the surrounding context, suggesting that medial schwa has no specific backness or rounding targets.
• F1 in schwa also covaries with F1 of surrounding vowels, but systematically deviates from F1 of the context, suggesting that schwa has an F1 target.
• This apparent F1 target may be a byproduct of realizing an intensity peak to cue the presence of a vowel.
– It is important to consider contrasts with absence as well as quality contrasts as sources of constraint on the realization of segments.

Conclusion
• English shows two patterns of vowel reduction, both of which are consistent with the undershoot model of vowel reduction.
• Non-final schwa vowels are short, which would tend to lead to substantial undershoot.
• Vowel quality contrasts are neutralized.
• The resulting vowel is substantially assimilated to its context.
• But assimilation is limited by the need to provide clear cues to the presence of a vowel, e.g. an intensity peak.
• Final unstressed position has longer vowels, allowing for vowel quality contrasts.
• Schwa is realized as a mid central vowel, distinct from /i, oʊ/.
• The same correlation between system of contrasts and the quality of reduced vowels is observed across languages
– contextually variable vowel (high in most contexts) in the absence of quality contrasts.
– mid central vowel in contrast with higher vowels.

End

Schwa presence vs. absence
[Figure: spectrograms (0 to 5000 Hz) of ‘saw n(othing)’ and ‘saw an(other)’.]