Syllable triangles, syllable centers, articulatory syllable durations, shadow angles, oh my!
Donna Erickson
Kanazawa Medical University, Japan
Haskins Laboratories, USA
ericksondonna2000@gmail.com
Thanks to Osamu Fujimura and J.C. Williams, & my colleagues Jangwon Kim, Sungbok Lee, Shigeto Kawahara, Caroline Menezes, Atsuo Suemitsu, Jeff Moore, Yoshiho Shibuya, & many others

C/D model: what does it model?
• The C/D model models how phonological structures are mapped onto articulatory gestures (Fujimura, 2000; also see www.cdmodel.wordpress.com).
• PROSODY is the skeletal base.
• Strings of spoken syllables are represented as “syllable pulse trains”
– Each syllable is represented as one pulse.
– The size of each syllable pulse is determined by its “syllable magnitude”.
• “syllable-boundary pulse train--computed as a time function representing the skeletal rhythmic structure of the utterance.” From Fujimura & Erickson, 2004
• Syllable magnitude correlates with sentence (phrasal) stress.
• “won” receives primary sentential stress; “that” and “ful” receive the secondary sentential stress.

Syllable magnitude
• Syllable magnitude is, to a first approximation, how much the jaw opens (jaw displacement from occlusal plane) for each syllable.
• For a string of syllables, we see different amounts of jaw opening, which reflect (I argue) the metrical organization of an utterance (see e.g., Erickson et al., 2012).
Jaw displacement for each syllable measured from occlusal plane (marked with arrows). From Erickson et al. 2014

English
From Erickson et al. 2014

Vowel Normalization
From Williams et al. 2013
• Once we "wash away" the vowel quality effects, utterances with the same metrical structure, regardless of vowel content, show similar patterns of syllable pulse trains (Erickson and Menezes 2013).

Review so far
• The C/D Model posits the pulse train as the fundamental organization of utterances: in speech planning, we start with the rhythm represented by the pulse train.
• Its rhythmic structure is partly represented by different heights of syllable pulses.
• In actual utterances, we do observe different amounts of jaw displacement, which reflect those syllable pulses.
• Moreover, patterns of jaw displacement observed in other languages also reflect metrical structure of that language, i.e., Japanese (Kawahara et al. 2014), Chinese (Erickson et al. 2015).
• Both Japanese and Chinese appear to have phrase-initial and phrase-final stress (which is different from stress patterns of English).
• The jaw displacement patterns of the first language may be carried over into those of the second language.

French?
Predictions:
French speakers have difficulty distinguishing local stress in English
French has final stress
Probably French has large final jaw opening
French speakers may be similar to Japanese speakers.
The jaw displacement patterns of the first language may be carried over into those of the second language.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.1196&rank=6

Phrase boundaries
• The C/D model also has the power to algorithmically derive phrase boundaries in a spoken utterance from jaw movement patterns.
• No other model can do this.
• Based on the combination of the height of the syllable pulse (amount of jaw displacement) and the average maximum speed of the onset and offset crucial articulators of the syllable, the model calculates
– (a) where the phrase boundary occurs and
– (b) how big this boundary is
(e.g., Fujimura 1986; Bonaventura & Fujimura 2007; Menezes 2004; Kim et al. 2014).

Syllable triangles, syllable centers, articulatory syllable durations, shadow angles, oh my!
• If you concur with the premise that the jaw opens for a syllable, then the rest is just a matter of “computation.”
• Pam said BAT that fat cat at that mat
Jaw displacement for each syllable measured from occlusal plane. From Erickson et al.
2014
There once was a girl from De ca tur

Syllable triangles, syllable centers, articulatory syllable durations, phrase boundaries, shadow angles
• Pam said BAT that fat cat at that mat

Crucial articulators & “icebergs”
• A syllable consists of a nucleus (vowel) and onset and coda elements.
• For the sentence Pam said bat that fat cat at that mat, the crucial articulators are lower lip (for p, m, b, f), tongue tip (for s, d, t, th) and tongue dorsum (for k).
• Fujimura (1986) observed that when one overlays the demisyllabic velocity time functions of the crucial articulator, there is a point of minimum variance (greatest invariance).
• He referred to this as the “iceberg” region, which corresponds to the average maximum velocity of all the repetitions of a single utterance type.
• The iceberg point (Bonaventura 2003; Menezes 2003; Bonaventura & Fujimura 2007) is algorithmically determined at the minimum variance point of a number of trajectories of the same demisyllable.
• One approach is to find the point of the minimum root-mean-squared error in the horizontal direction after optimal time shifting of the trajectories to the reference trajectory (Fujimura 1986; 1994; Bonaventura & Fujimura 2007).
• Another approach is to choose the point of the minimum “iceberg metric” among multiple vertical movement bands of the crucial articulator (Menezes, 2003).
• The iceberg metric is proportional to the variance of articulatory speed and inversely proportional to the mean of articulatory speed in the band.
• However, these methods require a large number of trajectory samples to be reliable.
• An alternative approach for determining the point of greatest invariance is to use the maximum speed point of the crucial articulators for the onset or coda of each demisyllable (e.g., Erickson 2010; Erickson et al. 2014; Erickson et al. submitted; Kim et al. 2014).
From Kim et al.
2014
• In this way, the center of the syllable is calculated as the midpoint between the maximum-speed points of the crucial articulators for the onset and the coda; quotation marks indicate this is an alternative approach for determining the “iceberg” point.
From Kim et al. 2014

Syllable triangle construction
Pam said BAT that fat cat at that mat

So???
• Test the model
• Invariance of articulatory excursion and speed of crucial articulators?
• Do “shadow angles” change as a function of emotion or contrastive emphasis?
• How do “consonants” work?
• How do IRFs change as a function of emotion and contrastive emphasis?
• Articulatory phrase boundaries & perceived boundaries?

Invariance of articulatory excursion and speed of crucial articulators?
CV R=0.80 R=0.84 R=0.89 VC R=0.95 bat R=0.59 R=0.87 that R=0.04 fat R=0.48 cat
Red is emphasized

Do “shadow angles” change as a function of emotion or contrastive emphasis?
Emotion (from Kim et al. 2014)
Contrastive Emphasis (from Kim et al., in progress)
Error bar plots for shadow angles

How do “consonants” work? How do IRFs change as a function of emotion and contrastive emphasis?
From Kim et al. 2014
• Emotion affects amplification of IRFs & timing (Kim et al. 2014)
• Contrastive emphasis: still investigating.

Articulatory phrase boundaries & perceived boundaries?
• Perception tests using Rapid Prosodic Transcription (e.g., Cole et al., 2008).
• Tasks (www.gengojeff.com):
– 1. Where do you hear a boundary?
– 2. Which words are prominent?
Boundary perception
Prominence perception

Articulatory phrase boundaries & perceived boundaries
Articulatory Prominence Articulatory Boundaries Perceptual Prominence r=0.60 (p<.001) r=0.36 (p<.001) Perceptual Boundaries A05
Perceptual Prominence r=0.43 (p<.001) r=0.28 (p<.001) r=0.68 (p<0.001) r=0.41 (p<0.001) Perceptual Boundaries r=0.18 (p<0.05) r=-0.18 n.s. A03

Summary
• 1. C/D model accounts for utterance prominence
• 2. C/D model accounts for phrase boundaries
• 3. More work is waiting to be done: a.
about shadow angles; b. IRFs; c. etc.
• 4. See www.cdmodel.wordpress.com for more discussions about C/D model

Acknowledgements
• Thanks to Osamu Fujimura and J.C. Williams, & my colleagues Jangwon Kim, Sungbok Lee, Shigeto Kawahara, Caroline Menezes, Atsuo Suemitsu, Jeff Moore, Yoshiho Shibuya, & many others
• This work was supported by NSF IIS-1116076, NIH DC007124, and Japan Society for the Promotion of Science, Grants in aid for Scientific Research (C) #22520412 and (C) #2537044.

References
Bonaventura, P. 2003. Invariant patterns in articulatory movements. Ph.D. dissertation, The Ohio State University.
Bonaventura, P., Fujimura, O. 2007. Articulatory movements and prosodic boundaries. In: Beddor, P., Ohala, J., Solé, M. (eds.), Experimental Approaches to Phonology, Oxford: Oxford University Press, 209-227.
Cole, J., Goldstein, L., Katsika, A., Mo, Y., Nava, E., Tiede, M. 2008. Perceived prosody: Phonetic bases of prominence and boundaries. J. Acoust. Soc. Am. 124, 2496.
Erickson, D. 1998. Effects of contrastive emphasis on jaw opening. Phonetica 55, 147-169.
Erickson, D. 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica 59, 134-149.
Erickson, D. 2010. More about jaw, rhythm and metrical structure. Acoustical Society of Japan Fall Meeting, p. 103.
Erickson, D., Suemitsu, A., Shibuya, Y., Tiede, M. 2012. Metrical structure and production of English rhythm. Phonetica 69, 180-190.
Erickson, D., Kawahara, S., Moore, J., Menezes, C., Suemitsu, A., Kim, J., Shibuya, Y. 2014. Calculating articulatory syllable duration and phrase boundaries. ISSP2014 (Cologne, Germany, May 2014), 102-105.
Erickson, D., Kim, J., Kawahara, S., Wilson, I., Menezes, C., Suemitsu, A. submitted. Bridging articulation and perception: The C/D model and contrastive emphasis. ICPhS 2015.
Erickson, D., Iwata, R., Moore, J., Suemitsu, A., Shibuya, Y. 2015. The jaw keeps the beat: Speech rhythm in English, Japanese and Mandarin. Lexicon Festa-3, Feb. 1, 2015. NINJAL, Tokyo, Japan.
Fujimura, O. 1986. Relative invariance of articulatory movements: An iceberg model. In: Perkell, J., Klatt, D. H. (eds.), Invariance and Variability in Speech Processes, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., 226-242.
Fujimura, O. 1994. C/D model: A computational model of phonetic implementation. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 17, 1-20.
Fujimura, O. 2000. The C/D model and prosodic control of articulatory behavior. Phonetica 57, 128-138.
Gabor P., Shinobu, M., Kazuhito Y. 2014. Boundary and Prominence Perception by Japanese Learners of English: A preliminary study. Journal of Phonetic Society of Japan 17, 59-66.
Harrington, J., Fletcher, J., Beckman, M.E. 2000. Manner and place conflicts in the articulation of accent in Australian English. In: Broe, M., Pierrehumbert, J. (eds.), Papers in Lab. Phonology V: Language Acquisition and the Lexicon. Cambridge: Cambridge University Press, 40-51.
http://gengojeff.netau.net/pam/
Jong, K. de. 1995. The supraglottal articulation of prominence in English: linguistic stress as localized hyperarticulation. J. Acoust. Soc. Am. 97, 491-504.
Kawahara, S., Erickson, D., Moore, J., Suemitsu, A., Shibuya, Y. 2014. Jaw displacement and metrical structure in Japanese: The effect of pitch accent, foot structure, and phrasal stress. Journal of Phonetic Society of Japan, 77-87.
Kim, J., Erickson, D., Lee, S., Narayanan, S. 2014. A study of invariant properties and variation patterns in the converter/distributor model for emotional speech. Interspeech 2014, 413-417.
Macchi, M. 1985. Segmental and suprasegmental features and lip and jaw articulations. Doct. diss., New York University (unpublished).
Menezes, C. 2003.
Rhythmic pattern of American English: An articulatory and acoustic study. Ph.D. dissertation, The Ohio State University.
Menezes, C. 2004. Changes in phrasing in semi-spontaneous emotional speech: Articulatory evidences. J. Phonetic Soc. Japan 8, 45-59.
Menezes, C., Erickson, D., McGory, J., Pardo, B., Fujimura, O. 2002. An articulatory and perceptual study of phrasing. Temporal Integration in the Perception of Speech. ISCA Workshop (Aix-en-Provence, April 8-10), 43.
Menezes, C., Pardo, B., Erickson, D., Fujimura, O. 2003. Changes in syllable magnitude and timing due to repeated corrections. Speech Communication 40, 71-8.
Summers, W. V. 1987. Effects of stress and final consonant voicing on vowel production: articulatory and acoustic analyses. J. Acoust. Soc. Am. 82, 847-863.
Westbury, J., Fujimura, O. 1989. An articulatory characterization of contrastive emphasis. J. Acoust. Soc. Am. 85, S98.
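
Appendix sketch: syllable center and iceberg metric
The "Crucial articulators & icebergs" slides describe two computations in words: the syllable center as the midpoint between the maximum-speed points of the crucial articulators for onset and coda, and an iceberg metric that grows with the variance of articulatory speed and shrinks with its mean. The Python below is only a minimal sketch of those two ideas, not the implementation used in the cited studies. All function names, the finite-difference speed estimate, the uniform sampling interval dt, and the proportionality constant of 1 in the metric are assumptions made for illustration.

```python
from statistics import mean, pvariance


def speed(positions, dt):
    """Absolute finite-difference speed of a 1-D articulator trajectory (assumed uniform sampling)."""
    return [abs(b - a) / dt for a, b in zip(positions, positions[1:])]


def max_speed_time(positions, dt, t0=0.0):
    """Time of peak speed; each speed sample is placed at the midpoint of its sampling interval."""
    s = speed(positions, dt)
    i = max(range(len(s)), key=s.__getitem__)  # index of the first maximum
    return t0 + (i + 0.5) * dt


def syllable_center(onset_positions, coda_positions, dt, onset_t0, coda_t0):
    """Midpoint in time between the onset and coda maximum-speed points."""
    t_onset = max_speed_time(onset_positions, dt, onset_t0)
    t_coda = max_speed_time(coda_positions, dt, coda_t0)
    return 0.5 * (t_onset + t_coda)


def iceberg_metric(speeds_in_band):
    """Variance of speed divided by mean speed within one movement band (constant of 1 is assumed)."""
    m = mean(speeds_in_band)
    if m == 0:
        raise ValueError("mean speed in band is zero")
    return pvariance(speeds_in_band) / m


# Hypothetical toy trajectories (arbitrary units), sampled every 10 ms.
onset = [0.0, 1.0, 3.0, 4.0]   # crucial articulator moving during the onset
coda = [4.0, 3.5, 1.5, 1.0]    # crucial articulator moving during the coda
center = syllable_center(onset, coda, dt=0.01, onset_t0=0.0, coda_t0=0.2)
print(f"syllable center at {center:.3f} s")
```

Under these assumptions, the band with the smallest iceberg_metric value would be the candidate iceberg region, and syllable_center gives the time point from which a syllable triangle could be constructed.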