Syllable triangles, syllable centers,
articulatory syllable durations,
shadow angles, oh my !
Donna Erickson
Kanazawa Medical University, Japan
Haskins Laboratories, USA
ericksondonna2000@gmail.com
Thanks to Osamu Fujimura and J.C. Williams, & my colleagues Jangwon Kim, Sungbok Lee,
Shigeto Kawahara, Caroline Menezes, Atsuo Suemitsu, Jeff Moore, Yoshiho Shibuya, & many
others
C/D model: what does it model?
• The C/D model models how phonological
structures are mapped onto articulatory gestures
(Fujimura, 2000; see also www.cdmodel.wordpress.com).
• PROSODY is the skeletal base.
• Strings of spoken syllables are represented as
“syllable pulse trains”
– each syllable is represented as one pulse.
– The size of each syllable pulse is determined by its
“syllable magnitude”.
• “syllable-boundary pulse train--computed as a time
function representing the skeletal rhythmic structure of
the utterance.”
From Fujimura & Erickson, 2004
• Syllable magnitude correlates with sentence (phrasal)
stress.
• “won” receives primary sentential stress; “that” and
“ful” receive the secondary sentential stress.
Syllable magnitude
• Syllable magnitude is, to a first approximation,
how much the jaw opens (jaw displacement from
occlusal plane) for each syllable.
• For a string of syllables, we see different amounts
of jaw opening, which reflect (I argue) the
metrical organization of an utterance (see e.g.,
Erickson et al., 2012).
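The definition above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the cited studies: the function name `syllable_magnitudes`, the sign convention (more negative means more open), the syllable spans, and the sample values are all assumptions.

```python
import numpy as np

def syllable_magnitudes(jaw_y, occlusal_y, syllable_spans):
    """Return the largest jaw opening below the occlusal plane for each syllable.

    jaw_y: vertical jaw position per sample (mm); more negative = more open (assumed).
    occlusal_y: vertical position of the occlusal plane (mm).
    syllable_spans: list of (start, end) sample indices, end exclusive (assumed given).
    """
    jaw_y = np.asarray(jaw_y, dtype=float)
    opening = occlusal_y - jaw_y  # positive when the jaw is below the plane
    return [float(opening[start:end].max()) for start, end in syllable_spans]

# Made-up jaw trace covering three syllables
jaw = np.array([-2.0, -6.0, -3.0, -4.0, -12.0, -5.0, -3.0, -7.0, -2.0])
spans = [(0, 3), (3, 6), (6, 9)]
magnitudes = syllable_magnitudes(jaw, occlusal_y=0.0, syllable_spans=spans)
print(magnitudes)  # [6.0, 12.0, 7.0]
```

In this toy trace the second syllable has the largest opening, so it would be read as the most prominent of the three.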
Jaw displacement for each syllable measured from
occlusal plane (marked with arrows)
From Erickson et al. 2014
English
From Erickson et al. 2014
Vowel Normalization
From Williams et al. 2013
• Once we "wash away" the vowel quality effects, utterances with the same
metrical structure, regardless of vowel content, show similar patterns of
syllable pulse trains (Erickson and Menezes 2013).
Review so far
• The C/D Model posits the pulse train as the fundamental
organization of utterances: in speech planning we start
with the rhythm represented by the pulse train.
• Its rhythmic structure is partly represented by different
heights of syllable pulses.
• In actual utterances, we do observe different amounts of
jaw displacement, which reflect those syllable pulses.
• Moreover, patterns of jaw displacement observed in other
languages also reflect the metrical structure of each language,
e.g., Japanese (Kawahara et al. 2014) and Chinese (Erickson et
al. 2015).
• Both Japanese and Chinese appear to have phrase-initial
and phrase-final stress (which differs from the stress
patterns of English).
• The jaw displacement patterns of the first language may be
carried over into those of the second language.
French?
Predictions:
French speakers have difficulty distinguishing local
stress in English
French has final stress
French probably has large final jaw opening
French speakers may be similar to Japanese speakers.
The jaw displacement patterns of the first language
may be carried over into those of the second language.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.1196&rank=6
Phrase boundaries
• The C/D model also has the power to algorithmically
derive phrase boundaries in a spoken utterance from
jaw movement patterns.
• No other model can do this.
• Based on the combination of the height of the syllable
pulse (amount of jaw displacement) and the average
maximum speed of the onset and offset crucial
articulators of the syllable, the model calculates
– (a) where the phrase boundary occurs and
– (b) how big this boundary is (e.g., Fujimura 1986,
Bonaventura & Fujimura, 2007, Menezes 2004, Kim et al.,
2014).
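One way to picture the computation is the sketch below. It rests on assumptions that are not stated on the slides: each syllable is drawn as an isosceles triangle whose height is the syllable magnitude and whose apex sits at the syllable center; all triangles share one shadow angle; and that angle is the smallest one at which no two adjacent triangles overlap. The function name `boundary_gaps` and the numbers are hypothetical, and the cited papers may derive the angle differently (for example, directly from articulator speed).

```python
import math

def boundary_gaps(centers, magnitudes):
    """Sketch: syllable triangles with one shared shadow angle.

    centers: syllable center times (ms), increasing.
    magnitudes: syllable magnitudes, e.g. jaw displacement (mm).
    Returns (shadow angle in degrees, gaps between adjacent triangle bases in ms).
    The angle value depends on the units chosen for the two axes.
    """
    pairs = range(len(centers) - 1)
    # Half-base of triangle i is magnitudes[i] / tan(angle). Two neighbours do
    # not overlap when (m_i + m_j) / tan(angle) <= c_j - c_i, so the limiting
    # tangent is the largest (m_i + m_j) / (c_j - c_i) over adjacent pairs.
    slope = max(
        (magnitudes[i] + magnitudes[i + 1]) / (centers[i + 1] - centers[i])
        for i in pairs
    )
    gaps = [
        (centers[i + 1] - centers[i]) - (magnitudes[i] + magnitudes[i + 1]) / slope
        for i in pairs
    ]
    return math.degrees(math.atan(slope)), gaps

# Made-up utterance with four syllables
centers_ms = [0, 200, 500, 700]
magnitudes_mm = [8, 12, 6, 10]
angle_deg, gaps_ms = boundary_gaps(centers_ms, magnitudes_mm)
print(round(angle_deg, 2), [round(g, 1) for g in gaps_ms])
```

Under these assumptions, a gap of zero means two syllables abut, and the largest gap (here between the second and third syllables) marks the location and size of the strongest boundary.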
Syllable triangles, syllable centers, articulatory
syllable durations, shadow angles, oh my !
• If you concur with the premise that the jaw
opens for a syllable, then the rest is just a matter
of “computation.”
Pam said BAT that fat
cat at that mat
Jaw displacement for each syllable
measured from occlusal plane
From Erickson et al. 2014
There once was a girl from De ca tur
Syllable triangles, syllable centers, articulatory syllable
durations, phrase boundaries, shadow angles
Pam said BAT that fat
cat at that mat
Crucial articulators & “icebergs”
• A syllable consists of a nucleus (vowel) and onset and
coda elements.
• For the sentence Pam said bat that fat cat at that mat,
the crucial articulators are lower lip (for p, m, b, f),
tongue tip (for s, d, t, th) and tongue dorsum (for k).
• Fujimura (1986) observed that when one overlays the
demisyllabic velocity time functions of the crucial
articulator, there is a point where variability across
repetitions is smallest.
• He referred to this as the “iceberg” region, which
corresponds to the average maximum velocity over all the
repetitions of a single utterance type.
• The iceberg point (Bonaventura 2003; Menezes 2003;
Bonaventura & Fujimura 2007) is algorithmically
determined at the minimum variance point of a
number of trajectories of the same demisyllable.
• One approach is to find the point of the minimum
root-mean-squared error in the horizontal direction
after optimal time shifting of the trajectories to the
reference trajectory (Fujimura 1986; 1994;
Bonaventura & Fujimura 2007).
• Another approach is to choose the point of the
minimum “iceberg metric” among multiple vertical
movement bands of the crucial articulator (Menezes,
2003).
• The iceberg metric is proportional to the variance of
articulatory speed and inversely proportional to the
mean of articulatory speed in the band.
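The band-based approach can be sketched as follows. This is an illustration only: the proportionality constant is taken to be 1, the band edges and pooled samples are invented, and the function name `iceberg_band` is hypothetical rather than taken from Menezes (2003).

```python
import numpy as np

def iceberg_band(positions, speeds, band_edges):
    """Pick the vertical band with the smallest variance-to-mean speed ratio.

    positions, speeds: samples pooled over repetitions of one demisyllable
    (vertical position and speed of the crucial articulator).
    band_edges: increasing band boundaries in the same units as positions.
    """
    positions = np.asarray(positions, dtype=float)
    speeds = np.abs(np.asarray(speeds, dtype=float))
    best_band, best_metric = None, np.inf
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        band_speeds = speeds[(positions >= low) & (positions < high)]
        # Skip bands with too few samples or no movement
        if band_speeds.size < 2 or band_speeds.mean() == 0:
            continue
        metric = band_speeds.var() / band_speeds.mean()  # assumed constant = 1
        if metric < best_metric:
            best_band, best_metric = (low, high), metric
    return best_band, best_metric

# Made-up pooled samples falling into three bands
positions = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5]
speeds = [10.0, 30.0, 50.0, 52.0, 20.0, 40.0]
band, metric = iceberg_band(positions, speeds, band_edges=[0, 2, 4, 6])
print(band)  # (2, 4)
```

The middle band wins because its speeds are both fast and consistent, which is the combination the metric rewards.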
• However, these methods require a large number
of trajectory samples to be reliable.
• An alternative approach for approximating this
point of smallest variability is to use the maximum
speed point of the crucial articulators for the onset
or coda of each demisyllable (e.g., Erickson 2010,
Erickson et al. 2014, Erickson et al. submitted,
Kim et al. 2014).
From Kim et al. 2014
• In this way, the center of the syllable is
calculated as the midpoint between the
maximum-speed points of the onset and coda
crucial articulators; quotation marks indicate
that this is an alternative approach for
determining the “iceberg” point.
From Kim et al. 2014
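A minimal sketch of the maximum-speed approach follows. It assumes one position trace per crucial articulator, sampled at known times, and estimates speed by numerical differentiation. The function names `max_speed_time` and `syllable_center` and the sample trajectories are made up for illustration.

```python
import numpy as np

def max_speed_time(times, positions):
    """Return the time at which the articulator moves fastest."""
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    speed = np.abs(np.gradient(positions, times))  # finite-difference speed
    return float(times[np.argmax(speed)])

def syllable_center(onset_peak_time, coda_peak_time):
    """Midpoint between the onset and coda maximum-speed points."""
    return 0.5 * (onset_peak_time + coda_peak_time)

# Made-up traces sampled every 10 ms: an onset release, then a coda closure
times_ms = np.arange(11) * 10.0
onset_trace = [0, 0, 1, 4, 8, 9, 9.5, 9.7, 9.8, 9.8, 9.8]
coda_trace = [9.8, 9.8, 9.7, 9.5, 9, 8.5, 8, 5, 1, 0, 0]

onset_peak = max_speed_time(times_ms, onset_trace)
coda_peak = max_speed_time(times_ms, coda_trace)
center = syllable_center(onset_peak, coda_peak)
print(onset_peak, coda_peak, center)  # 30.0 70.0 50.0
```

Because only one token is needed per syllable, this avoids the large sample counts that the variance-based iceberg methods require.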
Syllable triangle construction
Pam said BAT that fat cat at that mat
So???
• Test the model
• Invariance of articulatory excursion and speed of
crucial articulators?
• Do “shadow angles” change as a function of
emotion or contrastive emphasis?
• How do “consonants” work?
• How do IRFs change as a function of emotion
and contrastive emphasis?
• Articulatory phrase boundaries & perceived
boundaries?
Invariance of articulatory excursion
and speed of crucial articulators?
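[Figure: panels for CV and VC demisyllables of bat, that, fat, cat. R values in extraction order (panel assignment not recoverable): 0.80, 0.84, 0.89, 0.95, 0.59, 0.87, 0.04, 0.48. Red is emphasized.]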
Do “shadow angles” change as a function of emotion or contrastive emphasis?
Emotion (from Kim et al. 2014)
Contrastive Emphasis (from Kim et al. in progress)
Error bar plots for shadow angles
How do “consonants” work? How do IRFs change as a function of
emotion and contrastive emphasis?
From Kim et al. 2014
• Emotion affects amplification of IRFs & timing (Kim et al.
2014)
• Contrastive emphasis: still under investigation.
Articulatory phrase boundaries &
perceived boundaries?
• Perception tests using Rapid Prosodic
Transcription (e.g., Cole et al., 2008).
• Tasks (www.gengojeff.com):
– 1. where do you hear a boundary?
– 2. which words are prominent?
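In Rapid Prosodic Transcription, each word typically receives a score equal to the proportion of listeners who marked it. The sketch below computes such scores and correlates them with an articulatory measure. The listener marks, the articulatory gap values, and the function names are all invented for illustration and are not data from this study.

```python
import numpy as np

def rpt_scores(marks):
    """Proportion of listeners who marked each word.

    marks: 2-D array, rows = listeners, columns = words; 1 = marked, 0 = not.
    """
    return np.asarray(marks, dtype=float).mean(axis=0)

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical boundary marks from four listeners over five words
boundary_marks = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
b_scores = rpt_scores(boundary_marks)

# Hypothetical articulatory boundary sizes after the same five words (ms)
articulatory_gaps = [0, 90, 5, 30, 150]
r = pearson_r(b_scores, articulatory_gaps)
print(b_scores, round(r, 2))
```

The same two functions apply unchanged to prominence marks paired with syllable magnitudes.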
Boundary perception
Prominence perception
Articulatory phrase boundaries &
perceived boundaries
[Figure: correlations between articulatory measures (prominence, boundaries) and perceptual measures (prominence, boundaries), panels A05 and A03. Values in extraction order (cell assignment not recoverable): r=0.60 (p<.001); r=0.36 (p<.001); r=0.43 (p<.001); r=0.28 (p<.001); r=0.68 (p<0.001); r=0.41 (p<0.001); r=0.18 (p<0.05); r=-0.18 (n.s.).]
Summary
• 1. The C/D model accounts for utterance prominence.
• 2. The C/D model accounts for phrase boundaries.
• 3. More work remains to be done
a. about shadow angles
b. IRFs
c. etc.
• 4. See www.cdmodel.wordpress.com for more
discussion of the C/D model.
Acknowledgements
• Thanks to Osamu Fujimura and J.C. Williams, &
my colleagues Jangwon Kim, Sungbok Lee,
Shigeto Kawahara, Caroline Menezes, Atsuo
Suemitsu, Jeff Moore, Yoshiho Shibuya, & many
others
• This work was supported by NSF IIS-1116076,
NIH DC007124, and Japan Society for the
Promotion of Science, Grants in aid for Scientific
Research (C) #22520412 and (C) #2537044.
References
Bonaventura, P. 2003. Invariant patterns in articulatory movements. Ph.D. dissertation, The Ohio State
University.
Bonaventura, P., Fujimura, O. 2007. Articulatory movements and prosodic boundaries. In: Beddor, P.,
Ohala, J., Solé, M. (eds.), Experimental Approaches to Phonology, Oxford: Oxford University Press,
209-227.
Cole, J., Goldstein, L., Katsika, A., Mo, Y., Nava, E., Tiede, M. 2008. Perceived prosody: Phonetic
bases of prominence and boundaries. J. Acoust. Soc. Am. 124, 2496.
Erickson, D., 1998. Effects of contrastive emphasis on jaw opening. Phonetica 55, 147-169.
Erickson, D. 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica 59,
134-149.
Erickson, D. 2010. More about jaw, rhythm and metrical structure. Acoustical Society of Japan Fall
Meeting, p. 103.
Erickson, D., Suemitsu, A., Shibuya, Y., Tiede, M. 2012. Metrical structure and production of English
rhythm. Phonetica 69, 180–190.
Erickson, D., Kawahara, S., Moore, J., Menezes, C. Suemitsu, A., Kim, J., Shibuya, Y. 2014. Calculating
articulatory syllable duration and phrase boundaries. ISSP2014 (Cologne, Germany, May 2014), 102-105.
Erickson, D., Kim, J., Kawahara, S., Wilson, I., Menezes, C., and Suemitsu, A. submitted. Bridging
articulation and perception: The C/D model and contrastive emphasis, ICPhS 2015.
Erickson, D., Iwata, R., Moore, J., Suemitsu, A., Shibuya, Y. 2015. The jaw keeps the beat: Speech
rhythm in English, Japanese and Mandarin. Lexicon Festa-3, Feb. 1, 2015. NINJAL, Tokyo, Japan.
Fujimura, O. 1986. Relative invariance of articulatory movements: An iceberg model. In: Perkell, J.
and Klatt, D. H. (eds), Invariance and Variability in Speech Processes, Hillsdale, NJ: Lawrence Erlbaum
Associates, Inc. 226-242.
Fujimura, O. 1994. C/D model: A computational model of phonetic implementation. DIMACS Series in
Discrete Mathematics and Theoretical Computer Science, 17, 1-20.
Fujimura, O. 2000. The C/D model and prosodic control of articulatory behavior. Phonetica 57, 128-138.
Gabor P., Shinobu, M., Kazuhito Y. 2014. Boundary and Prominence Perception by Japanese Learners of English: A
preliminary study. Journal of Phonetic Society of Japan 17, 59-66.
Harrington, J., Fletcher, J., Beckman, M.E. 2000. Manner and place conflicts in the articulation of accent in Australian
English. In: Broe, M., Pierrehumbert, J. (eds), Papers in Lab. Phonology V: Language Acquisition and the Lexicon.
Cambridge: Cambridge University Press, 40-51.
http://gengojeff.netau.net/pam/
Jong, K. de. 1995. The supraglottal articulation of prominence in English: linguistic stress as localized
hyperarticulation. J. Acoust. Soc. Am. 97, 491–504.
Kawahara, S., Erickson, D., Moore, J., Suemitsu, A., Shibuya, Y. 2014. Jaw displacement and metrical structure in
Japanese: The effect of pitch accent, foot structure, and phrasal stress. Journal of Phonetic Society of Japan, 77-87
Kim, J., Erickson, D., Lee, S., Narayanan, S. 2014. A study of invariant properties and variation patterns in the
converter/distributor model for emotional speech. Interspeech 2014. 413-417.
Macchi, M. 1985. Segmental and suprasegmental features and lip and jaw articulations. Doctoral dissertation, New York
University (unpublished).
Menezes, C. 2003. Rhythmic pattern of American English: An articulatory and acoustic study. Ph.D. dissertation, The
Ohio State University.
Menezes, C. 2004. Changes in phrasing in semi-spontaneous emotional speech: Articulatory evidences. J. Phonetic
Soc. Japan 8, 45-59.
Menezes, C., Erickson, D., McGory, J., Pardo, B., and Fujimura, O. 2002. An articulatory and perceptual study of
phrasing. Temporal Integration in the Perception of Speech. ISCA Workshop. (Aix-en-Provence, April 8-10), 43.
Menezes, C., Pardo, B., Erickson, D., and Fujimura, O. 2003. Changes in syllable magnitude and timing due to
repeated corrections. Speech Communication 40, 71-8.
Summers, W. V. Effects of stress and final consonant voicing on vowel production: articulatory and acoustic analyses.
J. Acoust. Soc. Am. 82, 847–863.
Westbury, J., Fujimura, O. 1989. An articulatory characterization of contrastive emphasis. J. Acoust. Soc. Am. 85, S98.