2010 - Department of Linguistics - University of California, Davis

Long-Distance Coarticulation:
A Production and Perception Study of English and American Sign Language
By
MICHAEL ANDREW GROSVALD
B.A.S. (University of California, Davis) 1989
M.A. (University of California, Berkeley) 1993
M.A. (University of California, Davis) 2006
DISSERTATION
Submitted in partial satisfaction of the requirements for the degree of
DOCTOR OF PHILOSOPHY
in
Linguistics
in the
OFFICE OF GRADUATE STUDIES
of the
UNIVERSITY OF CALIFORNIA
DAVIS
Approved:
_____________________________________
_____________________________________
_____________________________________
Committee in Charge
2009
Copyright
by
Michael Andrew Grosvald
2009
TABLE OF CONTENTS
List of Tables .................................................................................................................vi
List of Figures ...............................................................................................................vii
Dedication ......................................................................................................................ix
Acknowledgements......................................................................................................... x
Abstract ........................................................................................................................xiii
Chapter 1 — Introduction............................................................................................. 1
1.1. Motivation for study............................................................................................ 1
1.2. Assimilation and coarticulation ......................................................................... 3
1.2.1. Assimilation in spoken language.................................................................... 3
1.2.2. Distinguishing coarticulation and assimilation ..............................................5
1.2.3. Assimilation and coarticulation in signed language....................................... 6
1.2.3.1. Some models of sign language phonology.............................................. 7
1.2.3.2. Approaching coarticulation in signed language ....................................13
1.3. Comparison of English schwa and ASL neutral space .................................. 16
1.3.1. Status of schwa............................................................................................. 17
1.3.2. Neutral signing space ................................................................................... 20
1.3.3. Schwa and neutral space: Comparison and contrast .................................... 23
1.3.4. Schwa and neutral space: Summary............................................................. 26
1.4. Dissertation outline ........................................................................................... 27
1.5. Research questions and summary of results................................................... 28
PART I: Spoken-language production and perception ............................................ 32
Chapter 2 — Coarticulation production in English.................................................. 32
2.1. Introduction ....................................................................................................... 32
2.1.1. Segmental and contextual factors................................................................. 35
2.1.2. Language-specific factors ............................................................................ 39
2.1.3. Speaker-specific factors ............................................................................... 40
2.2. First production experiment: [i] and [a] contexts .......................................... 41
2.2.1. Methodology ................................................................................................ 42
2.2.1.1. Speakers ................................................................................................ 42
2.2.1.2. Speech samples ..................................................................................... 42
2.2.1.3. Recording and digitizing ....................................................................... 47
2.2.1.4. Editing, measurements and analysis......................................................48
2.2.2. Results and discussion.................................................................................. 51
2.2.2.1. Group results ......................................................................................... 51
2.2.2.2. Individual results ................................................................................... 53
2.2.2.3. Follow-up tests ...................................................................................... 57
2.2.2.4. Longer-distance effects .........................................................................63
2.3. Second production experiment: [i], [a], [u] and [æ] contexts........................ 64
2.3.1. Methodology ................................................................................................ 65
2.3.1.1. Speakers ................................................................................................ 65
2.3.1.2. Speech samples ..................................................................................... 65
2.3.1.3. Recording and digitizing ....................................................................... 68
2.3.1.4. Measurements and data sample points ..................................................68
2.3.2. Results and discussion.................................................................................. 68
2.3.2.1. Group results ......................................................................................... 68
2.3.2.2. Individual results ................................................................................... 73
2.3.2.3. Follow-up tests ...................................................................................... 84
2.3.2.4. Longer-distance effects .........................................................................87
2.3.2.5. Further comparison of the two experiments..........................................88
2.4. Chapter conclusion............................................................................................ 93
Chapter 3 — Coarticulation perception in English: Behavioral study ................... 94
3.1. Introduction ....................................................................................................... 94
3.2. Methodology ...................................................................................................... 97
3.2.1. Listeners ....................................................................................................... 97
3.2.2. Creation of stimuli for perception experiment .............................................98
3.2.3. Task ............................................................................................................ 100
3.3. Results and discussion..................................................................................... 103
3.3.1. Perception measure .................................................................................... 103
3.3.2. Group results .............................................................................................. 105
3.3.3. Individual results ........................................................................................ 105
3.3.4. Correlation between production and perception ........................................ 109
3.4. Chapter conclusion.......................................................................................... 115
Chapter 4 — Coarticulation perception in English: ERP study............................ 118
4.1. Introduction ..................................................................................................... 118
4.2. Methodology .................................................................................................... 122
4.2.1. Participants and Stimuli ............................................................................. 122
4.2.2. Electroencephalogram (EEG) recording ....................................................124
4.3. Results and discussion..................................................................................... 126
4.3.1. Latency ....................................................................................................... 134
4.3.2. Relationship to behavioral results .............................................................. 136
4.3.2.1. Can ERP results predict behavioral outcomes?................................... 140
4.3.2.2. Are ERP and behavioral responses correlated in general?..................141
4.4. Chapter conclusion.......................................................................................... 141
PART II: Signed-language production and perception.......................................... 143
Chapter 5 — Coarticulation production in ASL..................................................... 143
5.1. Introduction ..................................................................................................... 143
5.2. Initial study ...................................................................................................... 148
5.2.1. Methodology .............................................................................................. 149
5.2.1.1. Signer 1 ............................................................................................... 149
5.2.1.2. Task ..................................................................................................... 150
5.2.1.3. Motion capture data recording ............................................................ 152
5.2.2. Results ........................................................................................................ 154
5.3. Main study ....................................................................................................... 163
5.3.1. Methodology .............................................................................................. 164
5.3.1.1. Subjects ............................................................................................... 164
5.3.1.2. Task ..................................................................................................... 165
5.3.2. Results ........................................................................................................ 172
5.3.2.1. Group results ....................................................................................... 174
5.3.2.2. Signer 2 ............................................................................................... 182
5.3.2.3. Signer 3 ............................................................................................... 192
5.3.2.4. Signer 4 ............................................................................................... 200
5.3.2.5. Signer 5 ............................................................................................... 208
5.3.3. Other aspects of intersigner variation......................................................... 221
5.4. Chapter conclusion.......................................................................................... 226
Chapter 6 — Coarticulation perception in ASL: Behavioral study ...................... 232
6.1. Introduction ..................................................................................................... 232
6.2. Methodology .................................................................................................... 233
6.2.1. Participants ................................................................................................. 233
6.2.2. Creation of stimuli for perception experiment ...........................................233
6.2.3. Task ............................................................................................................ 237
6.3. Results and discussion..................................................................................... 241
6.3.1. Perception measure .................................................................................... 241
6.3.2. Group results .............................................................................................. 241
6.3.3. Individual results ........................................................................................ 244
6.3.4. Relationship between production and perception ......................................247
6.4. Chapter conclusion.......................................................................................... 247
Chapter 7 — Discussion and conclusion .................................................................. 249
7.1. Models of spoken-language coarticulation.................................................... 249
7.2. Incorporating signed-language coarticulation ............................................. 253
7.3. Cross-modality contrasts ................................................................................ 256
7.4. Dissertation conclusion ................................................................................... 263
Appendix A ................................................................................................................. 265
Appendix B ................................................................................................................. 269
Appendix C ................................................................................................................. 280
Appendix D ................................................................................................................. 288
Appendix E ................................................................................................................. 297
Appendix F.................................................................................................................. 309
Appendix G ................................................................................................................. 311
Bibliography ............................................................................................................... 321
Vita .............................................................................................................................. 333
LIST OF TABLES
Table 2.1: Significance testing outcomes for first group of speakers ............................ 55
Table 2.2: Follow-up testing results, Speakers 3 and 7.................................................. 59
Table 2.3: Possible very-long-distance coarticulatory effects for Speakers 3 and 7...... 64
Table 2.4: Expected coarticulatory influence of four vowels on nearby schwa ............ 66
Table 2.5: ANOVA testing results, all 38 speakers ....................................................... 72
Table 2.6: Significance testing outcomes, second group of speakers ............................ 75
Table 2.7: Average formant values, Speaker 37 ............................................................79
Table 2.8: Significance testing results for all context pairs, Speaker 37 ....................... 80
Table 2.9: Further results, Speaker 37............................................................................ 81
Table 2.10: Summary of testing outcomes for all contexts, second speaker group ....... 83
Table 2.11: Very-long-distance significance testing results for four speakers .............. 88
Table 2.12: Summary of overall results for Speakers 3, 5 and 7 ................................... 90
Table 2.13: Summary of outcomes for the two experiments ......................................... 92
Table 3.1: Type of data obtained from each subject group............................................ 98
Table 3.2: Duration, amplitude and f0 values used in normalizing vowel stimuli ......100
Table 3.3: Perception scores, all subjects..................................................................... 108
Table 4.1: Latency results for the entire subject group ................................................ 136
Table 4.2: Latency results by subgroup ....................................................................... 138
Table 5.1: Results for WANT in z-dimension (height) at distance 3, Signer 1 ........... 159
Table 5.2: Distance-1 results for WANT, Signer 1......................................................162
Table 5.3: Demographic information of the five signers ............................................. 164
Table 5.4: Sentence frames and context signs, main sign production study................166
Table 5.5: Numerical results for the main sign production study ................................ 181
Table 5.6: Production results, Signer 2 ........................................................................ 190
Table 5.7: Production results, Signer 3 ........................................................................ 199
Table 5.8: Production results, Signer 4 ........................................................................ 206
Table 5.9: Production results, Signer 5 ........................................................................ 220
Table 5.10: Some measures of intersigner differences ................................................ 222
Table 5.11: Quantification of neutral-space “drift” for each signer.............................223
Table 6.1: Duration and height-difference information for the sign stimuli................ 237
Table 6.2: Results for each subject in the sign perception study ................................. 245
Table A1: Formant values of distance-1 target vowels in [a] vs. [i] context ...............266
Table A2: Formant values of distance-2 target vowels in [a] vs. [i] context ...............267
Table A3: Formant values of distance-3 target vowels in [a] vs. [i] context ...............268
LIST OF FIGURES
Figure 1.1: Autosegmental representation of nasal place assimilation ............................ 4
Figure 1.2: Move-Hold representation of the ASL sign IDEA ........................................ 8
Figure 1.3: Representation of a sign in the Hand-Tier model........................................ 10
Figure 1.4: Hand-configuration assimilation in the Hand-Tier model...........................11
Figure 1.5: The vowel quadrangle and schwa................................................................ 18
Figure 1.6: Possible coarticulatory influences on schwa and neutral signing space......23
Figure 2.1: Fowler’s (1983) model of VCV articulation ............................................... 36
Figure 2.2: Coarticulation model from Keating (1985) ................................................. 38
Figure 2.3: Expected coarticulatory influence on schwa of nearby [i] or [a].................45
Figure 2.4: Editing points used for the sequence up at a ............................................... 48
Figure 2.5: Average F1 and F2 of target vowels, first group of speakers......................53
Figure 2.6: Coarticulatory effects produced by Speaker 7............................................. 57
Figure 2.7: A typical recording made from Speaker 7...................................................60
Figure 2.8: Correlation results, production at distances 1 and 2, first speaker group .... 62
Figure 2.9: Expected influence on schwa from [i], [a], [u] or [æ] .................................67
Figure 2.10: Target vowel positions in F1-F2 space, second group of speakers ........... 70
Figure 2.11: Coarticulatory effects on target vowels, Speaker 37 ................................. 77
Figure 2.12: Correlation results for production at distances 1 and 3, both groups ........85
Figure 3.1: Logistic curve .............................................................................................. 96
Figure 3.2: Design of the perception task for the speech study ................................... 102
Figure 3.3: Production-perception correlation results, first speaker group.................. 111
Figure 3.4: Production-perception correlation results, second speaker group ............. 112
Figure 3.5: Production-perception correlation results, both groups............................. 113
Figure 3.6: Production and perception—hypothetical threshold values ...................... 115
Figure 4.1: Hypothetical patterning of perceptual sensitivity to coarticulation........... 122
Figure 4.2: Sequencing of the ERP study stimuli ........................................................ 123
Figure 4.3: Configuration of the electrodes in the 32-channel cap ..............................125
Figure 4.4: Topographical maps, entire subject group................................................. 129
Figure 4.5: Waveforms at selected sites, entire subject group ..................................... 132
Figure 4.6: Topographic distribution of the MMN-like effects, entire subject group .134
Figure 4.7: Topographic distribution of MMN-like effects, by subgroup ................... 139
Figure 5.1: Expected coarticulatory behavior of schwa and neutral signing space ..... 148
Figure 5.2: Locations of FATHER, MOTHER and neutral signing space .................. 150
Figure 5.3: The location of the target sign WANT in a typical utterance.................... 151
Figure 5.4: Position of the ultrasound markers ............................................................152
Figure 5.5: Definition of x, y, and z dimensions relative to signer.............................. 154
Figure 5.6: Two ASL sentences, as seen in motion capture data................................. 156
Figure 5.7: Locations of seven context signs ............................................................... 161
Figure 5.8: Beginning and end points of WISH........................................................... 165
Figure 5.9: Newer variant of RUSSIA ......................................................................... 167
Figure 5.10: Apparatus used for <red> and <green> ................................................... 168
Figure 5.11: Context-sign locations on the body and in 3-space ................................. 175
Figure 5.12: Motion capture data for two sentences, Signer 2.....................................183
Figure 5.13: Context item locations for left-handed signers........................................ 185
Figure 5.14: Motion capture data for two sentences, Signer 3.....................................192
Figure 5.15: Signer 3 signing BOSS on upper chest.................................................... 194
Figure 5.16: Motion capture data for four sentences, Signer 4 ....................................200
Figure 5.17: Signer 4’s preferred form of GO ............................................................. 201
Figure 5.18: Motion capture data for two sentences, Signer 5.....................................208
Figure 5.19: Contraction of “I WANT,” Signer 5........................................................ 210
Figure 5.20: Contraction of “I WISH,” Signer 5..........................................................211
Figure 5.21: PANTS and HAT with modified location, Signer 5................................ 212
Figure 5.22: Orientation assimilation of WANT, Signer 5 .......................................... 215
Figure 5.23: Drift of target-sign location, Signers 3, 4, 5 ............................................ 226
Figure 6.1: Design of the perception task for the sign study ....................................... 241
Figure 7.1: Fowler’s (1983) model of VCV articulation ............................................. 250
Figure 7.2: Location assimilation in PANTS, and a contraction I_WANT .................258
Figure 7.3: Hand-Tier representation of neutral-space variant of PANTS ..................261
Dedication
To my family.
Acknowledgements
First and foremost, I wish to thank my committee members: David Corina, C. Orhan
Orgun and Tamara Swaab. I could not have arrived at this point without their guidance
and support.
David came to UC Davis during the second year of my graduate studies. I feel very
fortunate that he then invited me to work as a research assistant in his lab, which is one
of the most positive work environments I have ever been in. In my time here, I have
enjoyed learning about sign language, psychology and the design and carrying out of
language-related experiments. It has been a fascinating and rewarding experience, and
in addition has opened the door to a number of interesting adventures, such as the
SignTyp conference in Connecticut, the Zachary’s Pizza expedition during the CNS
conference, and the moving of the hot tub.
When I was looking for a topic for my first qualifying paper, it was Orhan who
suggested that I investigate vowel-to-vowel coarticulation across particular kinds of
consonants. Questions that arose during my work on that project led me to explore
coarticulation in other contexts, and that research in turn became the starting point of
this dissertation. It would therefore be difficult to overstate how valuable his input has
been. I have also enjoyed our weekly meetings over Chinese food, where he has
answered my phonetics and phonology questions with patience and humor.
Tamara gave me my first in-depth look at ERP methodology, and it was she who
suggested that I investigate the MMN component in my perceptual studies. I also very
much appreciate her willingness to meet with me when I have had questions, and have
valued her feedback and support.
A number of other people have also provided encouragement along the way,
particularly at this project’s early stages. These include Diana Archangeli, Carol
Fowler, Keith Johnson, Patricia Keating, Harriet Magen and Daniel Recasens. I also
wish to thank audiences at the UC Berkeley Phonetics/Phonology Phorum, the 2008
meeting of the Linguistic Society of America, the 2008 Northwest Linguistics
Conference, and the 2007 coarticulation workshop of the Association Francophone de
la Communication Parlée in Montpellier.
For my ASL studies, I have often needed guidance from others who know much more
about sign language than I do. In particular, David Corina, Sarah Hafer, Tara Williams,
Martha Tyrone and Claude Mauk have been extremely supportive. Thanks are also due
to participants of the 2008 SignTyp conference and audiences at the 2009 meeting of
the Linguistic Society of America and at Haskins Laboratories.
David Corina and Tamara Swaab were my primary go-to people when I had questions
related to my ERP work. However, a number of other people also gave me help along
the way. Foremost among these is Tony Shahin, who has been a good friend as well as
a life-saver during my initial efforts to learn about Matlab and EEGLAB. Eva Gutierrez
showed me the ropes in SPSS. Thank you also to Emily Kappenman and Steve Luck.
Any errors in my use of ERP methodology are my responsibility and no one else’s.
Finally, I wish to thank members of my family, particularly those in Chico, Prague and
Los Angeles, who have lived through much of this writing process with me and given
me love, patience and support during this long and intensive effort. I would not be
where I am without them as well.
This research project was supported in part by grant NIH-NIDCD 2RO1-DC003099
(David P. Corina).
Abstract
This project investigates the production and perception of long-distance coarticulation,
defined here as the articulatory influence of one phonetic element (e.g. consonant or
vowel) on another across more than one intervening element. Part I explores
anticipatory vowel-to-vowel (VV) coarticulation in English; Part II deals with
anticipatory location-to-location (LL) effects in American Sign Language (ASL). Long-distance effects were observed in both speech and sign production, sometimes across
several intervening elements. Even such long-distance effects were sometimes found to
be perceptible.
For the spoken-language study, sentences were created in which multiple consecutive
schwas (target vowels) were followed by various context vowels. Thirty-eight English
speakers were recorded as they repeated each sentence six times, and statistical tests
were performed to determine the extent to which target vowel formant frequencies were
influenced differently by the context vowels. For some speakers, significant effects of
one vowel on another were found across as many as five intervening segments. The
perception study used behavioral methods and found that even the longest-distance
effects were perceptible to some listeners; nearer-distance effects were detected by all
participants. Subjects’ coarticulatory production tendency was not correlated with
either speaking rate or perceptual sensitivity.
Seventeen perception-study subjects also provided EEG data for an event-related
potential (ERP) study, which used the same vowel stimuli as the behavioral perception
study, and sought to determine whether ERP methodology might provide a more
sensitive measure than behavioral methods. Significant ERP effects were found in
response to nearer-distance VV coarticulatory effects, but generally not for the longest-distance ones. This is the first ERP study to investigate the sub-phonemic processing
associated with the perception of coarticulation.
In Part II, motion-capture technology was used to investigate LL coarticulation in the
signing of five ASL users. Evidence was found of significant LL coarticulatory
influence of one sign on another across as many as three intervening signs. However,
LL effects were weaker and less frequent than the VV effects found in the spoken-language study. The perceptibility of these LL effects was then tested on both deaf and
hearing subjects; some subjects in each group scored significantly better than chance on
the task.
CHAPTER 1 — INTRODUCTION
1.1. Motivation for study
The phenomenon of coarticulation is relevant for issues as varied as lexical
processing and language change, for both spoken and signed languages. However,
research to date has not determined with certainty how far such effects can extend,
though it is apparent that there is a substantial amount of coarticulatory variability
among speakers in the production of spoken language. The first part of this project
investigates the extent of long-distance vowel-to-vowel (VV) coarticulation in
American English, and interspeaker variation in the production of these effects. While
work on coarticulation in sign language has been conducted by some researchers
(Cheek, 2001; Mauk, 2003), the question of long-distance coarticulation in sign
languages appears to have been unaddressed until now. The second part of the project
investigates long-distance location-to-location (LL) coarticulation in American Sign
Language (ASL). In addition, while research on the issue of perceptibility of
coarticulatory effects has been underway for at least a few decades in spoken language
research (e.g., Lehiste & Shockey, 1972), corresponding work on sign language seems
not yet to have begun in earnest. This project examines the perceptibility of long-distance coarticulatory effects in both spoken and signed language.
Manual-visual languages like American Sign Language (ASL) are naturally
occurring and show syntactic, morphological and phonological complexity which is
comparable to that of spoken languages. Sign languages are not mime, nor are such
languages a word-for-word translation of any spoken language. Besides the fact that
signed language is an interesting object of study in its own right, research into sign
languages also brings with it the insights to be gained by a comparison of
corresponding phenomena in the spoken and signed modalities. By examining the
similarities and differences of various aspects of spoken and signed language, we may
find that some assumptions about human language universals have to be revised, while
in other cases, new insights relevant to language in general may be revealed. Although
the structures of spoken and signed languages may seem quite different at first glance,
the history of sign language research supports this general approach.
This is seen, for example, in the formal study of sign language phonology,
which began with the work of Stokoe (1960). Not only did he recognize the
appropriateness of the appellation sign language, but he was also the first to develop the
insight that traditional methods of linguistic analysis (i.e., those developed for spoken
languages) could offer great utility in the study of sign, and hence to the understanding of
the human language capacity in general.
the human language capacity in general. This reasoning has been followed by many
researchers since Stokoe began his work, and is the approach that I will follow here.
With the basic motives of this project now established, next follows a
discussion of assimilation and coarticulation, which are closely related but which
nevertheless can be usefully distinguished. Since these notions have been frequently
examined in previous spoken-language phonological research, this discussion will
concentrate more on their conception in sign language phonology and how one might
expect to usefully investigate them there. After this, the specific targets of study in this
project, English schwa and ASL neutral signing space, will be introduced and the
reasons for their use in this project will be explained. This introductory chapter then
concludes with a statement of this project’s research questions and an outline of
subsequent chapters of the dissertation.
1.2. Assimilation and coarticulation
1.2.1. Assimilation in spoken language
Following the line of reasoning described above, it will now be useful to sketch
some relevant ideas from spoken language phonology, starting from first principles,
and then consider possible sign-language analogues.
In spoken language phonology, the regularity of certain segment-to-segment
influences can be expressed by means of rules like those used in SPE (Chomsky &
Halle, 1968). Most directly, one can express the relevant changes in terms of the
segments themselves. For example, if one notices in some language that alveolar nasal
stops are persistently realized as velars before [k], one might express this process as
follows:
n→ŋ/_k
While this rule covers cases involving [k], one might subsequently find that it
misses others, such as those involving nasals preceding [g], or those involving place
assimilation before labials, even though these cases might seem to be effected by the
same process. Rephrasing the rule in terms of nasality and place-of-articulation features
makes the rule more general and also offers more explanatory power concerning the
underlying dynamics involved. Following the autosegmental approach (Goldsmith,
1976), one may even go one step further, as shown in Figure 1.1 below:
Figure 1.1. Nasal place-of-articulation assimilation, represented in autosegmental
terms.
Here, different parameter types, such as nasality and place, are separated into
different tiers, emphasizing their (relative) independence from one another. Such
configurations can be advantageous when certain phonetic/phonological properties
seem to be fundamentally related to (or “dependent” on) others, and the autosegmental
approach is also useful in cases where non-linear processes are involved, such as in the
templatic morphology of Arabic, in which different types of information may be carried
by consonant sequences and vowel sequences, which are eventually interleaved. The
autosegmental approach has also proven quite fruitful in the study of signed languages,
where arguably more use is made of simultaneity of structure than in spoken languages.
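To make the contrast concrete, the difference between the segment-level rule and its feature-based generalization can be given a schematic sketch. The toy segment inventory, feature values, and function names below are invented purely for illustration and are not part of any phonological formalism discussed above.

```python
# Toy place-of-articulation features for a few stops, and the nasal realized
# at each place (all values invented for illustration).
PLACE = {"k": "velar", "g": "velar", "p": "labial", "b": "labial",
         "t": "alveolar", "d": "alveolar"}
NASAL_AT = {"velar": "ŋ", "labial": "m", "alveolar": "n"}

def assimilate_segmental(word):
    """Literal reading of the rule n -> ŋ / _ k: covers only [k]."""
    return word.replace("nk", "ŋk")

def assimilate_featural(word):
    """Feature-based version: a nasal takes on the place of a following stop."""
    out = []
    for seg, nxt in zip(word, word[1:] + " "):
        if seg in NASAL_AT.values() and nxt in PLACE:
            out.append(NASAL_AT[PLACE[nxt]])   # copy the stop's place feature
        else:
            out.append(seg)
    return "".join(out)

print(assimilate_segmental("tenk"))   # teŋk
print(assimilate_featural("tenp"))    # temp: the rule generalizes to labials
```

The segmental version must be restated for [g], labials, and so on, while the featural version captures all such cases at once, paralleling the gain in generality discussed above.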
1.2.2. Distinguishing coarticulation and assimilation
Historically, the terms coarticulation and assimilation have often co-occurred in
the phonetics and phonology literature of both spoken and signed languages, seemingly
interchangeably in some cases (see Kühnert & Nolan, 1999, for a historical summary).
However, it has also proven useful to distinguish the two; for example, Keating (1985,
p. 2) distinguishes assimilation from coarticulation as follows: “with assimilation, a
segment which normally might have some particular target or quantitative value for a
given feature, has a different target or quantitative value when adjacent to some other
segment.” Accordingly, I will consider assimilation to be the special case of coarticulation
which is definable in terms of phonological features. That is, if the influence of item X
on item Y is such that item Y undergoes a change expressible in terms of a feature
alteration, then assimilation has occurred.1 Coarticulation is the more general case of
articulatory influence of one phonetic element (e.g. a consonant, a vowel, a gesture) on
another.
As an example, consider the American English pronunciation of the word
“haunt.” The velum lowering needed to articulate the [n] may occur early in the word,
resulting in a form with a nasalized vowel, [hãnt]. If velum lowering was in effect
throughout the duration of the vowel, then it may be said that the vowel has acquired a
[+nasal] or [+nasalized] feature, as is suggested by the transcription just given, in which
case this is an instance of assimilation. On the other hand, a speaker saying “haunt”
1. Keating’s description of assimilation in terms of “segments” is somewhat problematic when applied to
sign language, however, which is why I have used the more general term “item.” Also, I prefer a rather
loose interpretation of the term “adjacent” that Keating uses, so that assimilation can be considered
possible even for items which are not strictly consecutive (as occurs in vowel harmony, for example).
Assimilation can be either obligatory (e.g. nasal place assimilation in Japanese) or optional (e.g. nasal
place assimilation in English in cases like “inconceivable” or “ten bags”).
might accomplish complete lowering of the velum in time for the onset of the [n]
without having had the velum lowered until late into the articulation of the preceding
vowel. In this case, it might be argued that the vowel did not actually acquire a positive
feature value for nasality, even though some velum lowering did occur during the
vowel. This would be an instance of coarticulation, but not assimilation.
Clearly, such distinctions may not always be easy to make; for instance, in the
example just presented, the exact dividing line between “nasal enough” and “not nasal
enough” for a feature change to be said to have occurred is not obvious. The matter is
complicated by the question of what sorts of features should be considered relevant; for
example, vowel nasality is not considered phonologically contrastive in English, but
may often be indispensable for hearers in cases where the nasal consonant in a VN
sequence is deleted altogether. The nasal vowel could then be an important clue for the
listener trying to distinguish “haunt” [hãt] and “hot” [hat] (although with this pair,
context would be likely to help).
1.2.3. Assimilation and coarticulation in signed language
Having distinguished assimilation and coarticulation in spoken language, we
should now be in a good position to consider the corresponding distinction in signed
language. As it turns out, much research has already been conducted on assimilation in
sign languages such as ASL (see Sandler & Lillo-Martin, 2006). Rules representing
various types of sign-language assimilation are often similar in spirit to the
autosegmental rules for spoken languages that were discussed above, but depending on
the representational framework for signs that one adopts, the precise form of such rules
will vary somewhat.
1.2.3.1. Some models of sign language phonology
Stokoe (1960) treated signs as combinations of three parameters—Handshape,
Location, and Movement—occurring simultaneously. More recent approaches
incorporate sequential structure into sign representations as well. In discussing the
history of sign language phonology, Sandler and Lillo-Martin (2006) draw a parallel
with the study of spoken language phonology, in which the earliest approaches used
sequential, segment-by-segment representations, while later researchers have found it
useful to model certain aspects of spoken language phonology in non-sequential terms,
as in the autosegmental approach.
Move-Hold model
The Move-Hold model of Liddell and Johnson (1989[1985]) posits that signs
have a canonical phonological form which is expressible as a sequence of
“Movements” (“M”) and “Holds” (“H”), which are segmental items distinguished
according to whether or not the hands move. The Hold segment includes features
related to both of what Stokoe would have termed Location and Handshape. This model
is illustrated below in Figure 1.2 with an example given by Sandler and Lillo-Martin
(2006, p. 129), which shows the Move-Hold representation of the ASL sign IDEA.2
2. As is customary in the sign language literature, glosses of ASL signs will be given in capital letters.
Figure 1.2. Move-Hold representation of the ASL sign IDEA.
In many respects, this model offers greater explanatory power than Stokoe’s
original conception of signs, which emphasized the simultaneity which is characteristic
of the phonological structure of signed language, relative to that of spoken language.
Since the Move-Hold model builds sequential structure into its representation of signs,
some phenomena such as within-sign metathesis, requiring reference to sequentiality
and hence problematic for Stokoe’s approach, are analyzed quite straightforwardly.
Consistent with the autosegmental approach for spoken language, assimilation between
successive signs can be represented by reassociation of the relevant features. The model
incorporates 13 handshape features and 18 features for place, and can be used to
describe signs with a substantial degree of phonetic detail.
However, the Move-Hold model has also been criticized, in part because this
richness of phonetic description leads to overgeneration while at the same time
missing important generalizations. As an example of the latter, in many signs, the
general configuration of the hand(s) is consistent throughout the course of the sign, but
many features of both Hold components must be specified redundantly in both the first
and final columns of the Move-Hold representation of the sign, as can be seen for IDEA
in Figure 1.2.
Hand-Tier model
Sandler’s Hand-Tier model (1986, 1987, 1989), illustrated in Figure 1.3 below,
avoids many such problems by positing a hand-configuration (HC) node with multiple
associations to three nodes sequentially arranged in the order Location-Movement-Location, as depicted below. Based on the idea that (global) location tends to remain
relatively stable during most signs, even when path movement does occur, the two
Location nodes are associated to a node called “place,” an indicator of general global
position. This model is therefore similar to the Move-Hold model in incorporating
sequentiality within its representation of signs, but at the same time seeks to avoid the
redundancies, just discussed in connection with that model, which are undesirable from
a phonological perspective.3
3. It should be noted that later modifications by Liddell & Johnson (1989) and Liddell (1990) to the
Move-Hold model make use of underspecification, with a similar goal in mind.
Figure 1.3. Representation of a sign in the Hand-Tier model.
Many cases of assimilation can be efficiently represented within this
framework. An example taken from Sandler and Lillo-Martin (2006, p. 137), shown
below as Figure 1.4, is the hand-configuration assimilation in the ASL compound
BELIEVE, formed from the base signs THINK and MARRY. Notice that in order to
preserve the canonical L-M-L sequencing in the output form, Location deletion also
takes place.
It should be noted that the sign illustrated in Figure 1.4, BELIEVE, is
articulated by many signers with a “1” handshape transitioning into a “C” handshape
during the movement of the dominant hand from the forehead down to the non-dominant hand. This differs from the situation depicted in Figure 1.4, which specifies a
“C” handshape throughout the duration of the sign. This model is in fact able to deal
with situations in which two handshape configurations or two places are maintained in
a compound; the form of BELIEVE illustrated in Figure 1.4 happens to be an example
of the latter and not the former.
Figure 1.4. Hand-configuration assimilation, represented in the Hand-Tier model.
The HC category itself includes a rich featural hierarchy, analogous to spoken-language feature geometry (Clements, 1985; Sagey, 1986) and incorporating work of
Sandler (1995, 1996), van der Hulst (1995), Crasborn and van der Kooij (1997) and van
der Kooij (2002). This is closely related to work by Corina (1990) seeking to provide
an adequate account of handshape assimilation; in both approaches, descriptions of
partial handshape assimilation are possible in addition to cases of total assimilation like
that depicted in Figure 1.4. As in spoken-language feature geometry, this is
accomplished in part by recognizing that certain features tend to behave similarly in
phonological systems, typically for physiological reasons.
Prosodic model
The Prosodic model of Brentari (1998) differs significantly from the models just
described, particularly in its treatment of movement. In this model, sign features are
divided into two types, Inherent and Prosodic, with the latter coding properties not
visible at any particular instant (i.e., those specifically relevant to motion). Because
movement does seem to be a particularly salient component of signs in terms of
perception (Corina & Hildebrandt, 2002), it may be reasonable to treat it as “special” in
some way as this model does, and Brentari (1998) provides additional justifications for
doing so. However, some phenomena that are arguably best characterized by a single
rule can only be coded in the Prosodic model by separate reference to two branches of
structure. As one example of this, Sandler and Lillo-Martin (2006) discuss the ASL
compound FAINT, formed from the base signs MIND and DROP. The relevant issue
here is that when hand configuration assimilates in ASL, not only are handshape-related features like finger position and orientation assimilated, but so too is any
internal movement. This process can be expressed with a single rule in the Hand-Tier
model, but requires separate reference in the Prosodic model to the Inherent and
Prosodic feature types.
One key point that has emerged in the preceding discussion is the ongoing effort
of sign researchers to describe the formational parameters of signs in a way that is
comprehensive but at the same time constrained enough to provide descriptions which
are adequate at the phonological level. Conversely, for the kind of phonetic detail that
will be examined in this project, even the substantial amount of information that can be
conveyed in the Move-Hold or other sign models is not sufficient. This can be
compared to the situation in spoken language: in the studies that will be presented here,
sub-phonemic vowel contrasts will generally be discussed in terms of formant
frequencies, measured in Hertz. This is so because even the relatively fine phonetic
distinctions expressible in transcription systems like the IPA, making use of diacritics
indicating varying degrees of fronting, lowering and so on, are not sufficient for the
task.
Similarly, when sign-language coarticulation data are given, location will
generally be expressed in terms of three-dimensional spatial position, measured in
millimeters, because the degree of detail that is involved is not expressible in existing
sign models or transcription systems. However, there are some interesting cases of
assimilation in the sign results that will be discussed, and these will be expressed in
phonological terms. In such cases, the Hand-Tier model will be adopted, because the
key variable involved will be location, so movement need not be foregrounded as in the
Prosodic model, and the relevant issues can be expressed more succinctly in the HandTier model than in the Move-Hold or other sign models.
1.2.3.2. Approaching coarticulation in signed language
Leaving aside the specific case of assimilation proper, the general study of
coarticulation in sign language is bound to present certain challenges, regardless of the
framework one adopts. Much of the research completed to date in which sign-language
coarticulation has proven relevant (or problematic) has been conducted in the context of
machine learning (e.g., see Vogler & Metaxas, 1997), though some theoretical work
has been successfully conducted as well (e.g. Cheek, 2001; Mauk, 2003; see Chapter 5
for a discussion of both). Here again, looking to work already done in spoken language
research may prove useful (for an overview, see Hardcastle and Hewlett, 1999). Such
work to date has investigated many aspects and types of spoken-language production
phenomena, including VV, C-to-C, lingual, labial, and velar effects, as well as work on
perception and on long-distance coarticulation. By analogy, what sorts of coarticulation
might prove amenable to study in sign language research?
The sign parameter for which subtle phonetic-level phenomena such as
coarticulatory effects can be studied most directly is probably location. This is so because the
position of any point on the body in motion (e.g. a fingertip or a point on the palm) can
be measured at any particular timepoint and described in terms of three numerical
values, i.e. those corresponding to the three spatial dimensions (using motion-capture
technology or multiple video cameras) or perhaps just two (if only a single video image
is used). If one accepts the premise that movement is the most perceptually salient
feature of signs (e.g., see Hildebrandt & Corina, 2002), and hence may play a role in
sign akin to that of vowels in spoken language, then sign location (as well as
handshape) might be seen as performing a more “consonantal” function. If so, LL
effects could be considered an analogue of one kind of C-to-C coarticulation.
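To illustrate how such LL measurements might proceed, consider the following schematic sketch. The coordinates are invented, and the measure shown, Euclidean displacement between 3-D position samples in millimeters, is simply one plausible way of quantifying a location shift across contexts.

```python
import math

# Schematic sketch (invented coordinates): a tracked fingertip yields one
# (x, y, z) sample in millimeters per timepoint, so an LL coarticulatory
# effect can be quantified as the displacement of a target sign's location
# between two contexts.
def displacement(p, q):
    """Euclidean distance in mm between two 3-D position samples."""
    return math.dist(p, q)

# Position of the same neutral-space sign following a high-located versus a
# low-located context sign (y is height; all values are invented).
after_high = (10.0, 412.0, 250.0)
after_low = (12.0, 380.0, 250.0)
print(f"vertical shift: {after_high[1] - after_low[1]:.1f} mm")   # 32.0 mm
print(f"total displacement: {displacement(after_high, after_low):.1f} mm")
```

With only a single camera image, the same comparison could be made in two dimensions rather than three, as noted above.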
The detailed study of handshape-to-handshape (HH) coarticulation presents
more difficulties than that of LL effects, partly because of the many interacting
components—such as the numerous joints on the hand—that are involved in the
articulations of particular handshapes. Fine-level phonetic measurements and
descriptions of handshape effects in general will require measures for relative locations
of multiple parts of the hand (e.g. various points on the different fingers), which differ
as joints are bent at various angles, and would also have to be robust to changes in
orientation. Still, interesting experimental work can be accomplished in this domain,
particularly if one focuses on a limited subset of all the articulatory possibilities. One
example of this is Cheek’s (2001) work investigating HH coarticulation in the contexts
of the “1” and “5” handshapes. This was accomplished by measuring and comparing,
between the two contexts, pinky-to-base-of-hand distances at key timing points during
the articulation of various signs; this distance was expected to be smaller in the context
of the “1” handshape (since the pinky is curled in when that handshape is formed) than
in the “5” context (since the pinky is extended for that handshape).
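A measure of this kind can be sketched schematically as follows. The coordinates and distances are invented, and comparing context means is only one plausible way of summarizing such data; it is not a reconstruction of Cheek's actual analysis.

```python
import math
import statistics

# Schematic sketch (invented values) of a pinky-tip to base-of-hand distance
# measured at a key timepoint, compared between handshape contexts.
def hand_aperture(pinky_tip, hand_base):
    """Euclidean distance in mm between two tracked points on the hand."""
    return math.dist(pinky_tip, hand_base)

# Distances (mm) measured near a "1" handshape (pinky curled in) versus a
# "5" handshape (pinky extended); all values are invented for illustration.
context_1 = [52.0, 55.5, 50.8, 54.1]
context_5 = [68.2, 71.0, 66.5, 69.9]

diff = statistics.mean(context_5) - statistics.mean(context_1)
print(f"mean aperture difference: {diff:.1f} mm")  # larger in the "5" context
```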
Other coarticulation types may prove to be even more challenging as research
targets. Although previous theoretical (Perlmutter, 1991) and experimental research on
perceptual saliency properties of different sign parameters (Corina & Hildebrandt,
2002) indicate that movement-to-movement (MM) effects may be the closest analogue
of VV effects in spoken language, MM coarticulation is likely to be much more
challenging to study and describe. While VV effects can be analyzed fairly
straightforwardly by means of formant frequency measurements at particular
timepoints, movements are dynamic events tracing a three-dimensional path during an
interval of time. Describing MM effects in general would seem to require an algorithm
able to analyze the data record of such events in 3-space as being consistent with an
arc, or a single linear motion, or a spiral, and so on for the complete inventory of
motion types, regardless of other complicating issues such as spatial orientation. As an
example of the difficulties involved, might an arc path movement in one sign influence
a closed-to-open finger movement in the following sign? How would one seek to
measure this? One possibility I would suggest is that the complexity associated with
the characterization of movements might be usefully reduced, by looking for example
at simpler but importantly related parameters such as velocity, path length, or starting
and ending points of the paths traced by individual movements.
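The simpler movement measures just suggested can themselves be sketched schematically. The sampled trajectory and frame rate below are invented for illustration; the point is that path length and mean speed are computable directly from position samples, without first classifying the motion type.

```python
import math

# Schematic sketch: given a sampled 3-D trajectory (one (x, y, z) sample in
# mm per frame), compute path length and mean speed rather than attempting
# to classify the motion as an arc, line, spiral, etc.
def path_length(samples):
    """Total distance in mm along a list of (x, y, z) samples."""
    return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))

def mean_speed(samples, frame_rate_hz):
    """Average speed in mm/s, assuming evenly spaced samples."""
    duration = (len(samples) - 1) / frame_rate_hz
    return path_length(samples) / duration

# An invented three-frame trajectory sampled at 120 Hz.
trajectory = [(0.0, 0.0, 0.0), (30.0, 40.0, 0.0), (60.0, 80.0, 0.0)]
print(path_length(trajectory))        # 100.0 mm
print(mean_speed(trajectory, 120))    # 6000.0 mm/s
```

Start and end points of a movement's path fall out of the same data record as the first and last samples.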
Simple parameter-to-parameter (X-to-X) effects will probably not tell the whole
story here, just as they do not in spoken language coarticulation. For example, prosodic
structure has been shown to be relevant in spoken-language work on coarticulation
(e.g., Cho, 1999, 2004), and there is no reason to assume this could not be the case with
sign. Brentari and Goldsmith (1993) have argued that a sign’s non-dominant hand may
be analogous to a syllable coda in spoken language, so coarticulatory effects related
specifically to the dominant or non-dominant hand might have implications for the
prosodic nature of signs or of language overall. More generally, one might also expect
to see interactions between sign parameters, such as location-to-movement effects, just
as one sees C-to-V effects in spoken language. Then again, perhaps some such
phenomena will prove to be specific to one modality only; if so, this would be
informative as well.
Because it is unlikely that all such possibilities can be investigated in a single
project, I have chosen in the current study to focus on LL coarticulation, since as
discussed above, location is probably the sign parameter for which coarticulatory
effects can be most straightforwardly measured.
1.3. Comparison of English schwa and ASL neutral space
In this project, I investigate coarticulation in spoken and signed language, with
an emphasis on the temporal extent of the phenomenon and variability in its production
and perception among speakers and signers. Since coarticulation is a complex,
multifaceted object of study, I have chosen to narrow my focus specifically onto VV
coarticulation in English and LL coarticulation in American Sign Language (ASL).
Specifically, I examine long-distance coarticulatory effects of various English vowels
on the instantiation of schwa and of various sign locations on that of ASL neutral space.
Schwa and neutral space were chosen as target items because of certain parallels that
may be drawn between the two. However, when considered more closely and with
respect to these items’ phonological status, such similarities appear more superficial;
therefore, an examination of the schwa - neutral space analogy will be useful in
evaluating the extent to which the comparison may be valid.
1.3.1. Status of schwa
The schwa is a mid central vowel and as such is located in the middle of two-dimensional vowel space, as illustrated in Figure 1.5 below. This is so whether we
consider schwa in terms of the articulatory properties of height and frontness or the
acoustically determined quantities of first and second formant, since there is a strong
correspondence between these articulatory and acoustic measures.4
Unstressed English vowels often reduce to schwa, resulting in the oppositions
seen in pairs such as {photography [a], photograph [ə]} and {reflex [i], reflexive [ə]}.
Although other outcomes of vowel reduction, such as the high central [ɨ], are possible,
these will not be considered here, as schwa appears to be the most frequent such
4. At first glance, this articulatory-perceptual distinction seems quite different from the situation in signed
language, since the sign articulators are viewed directly by the perceiver. However, since it is as yet
unclear just how language is “perceived,” whether there might in fact be a parallel along these lines
between signed and spoken language is an intriguing question.
reduction outcome and is certainly the most-discussed in the relevant literature. This is
so both for English and for other languages; for example, it is well-known that some
Russian vowels undergo reduction to schwa when in unstressed position. In
fact, schwa plays a “special” role in the phonological systems of most languages in
which it is a member, typically being the product of processes like reduction or
epenthesis (van Oostendorp, 2003).
Figure 1.5. The familiar vowel quadrangle, with mid central schwa also shown. The
articulatory parameters of tongue height and backness correspond roughly to the
inverses of the acoustic parameters first and second formant frequency, respectively.
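The rough inverse correspondence between articulatory and acoustic measures can be illustrated with a schematic sketch. The reference formant values below are approximate textbook-style figures in Hz, and the nearest-neighbor classification is purely illustrative, not a method used in this study.

```python
# Approximate (F1, F2) reference values in Hz for a few vowels; higher F1
# corresponds roughly to lower tongue height, higher F2 to fronter
# articulation. Values are illustrative only.
REFERENCE = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870),
             "ə": (500, 1500)}

def nearest_vowel(f1, f2):
    """Classify a token by squared Euclidean distance in (F1, F2) space."""
    return min(REFERENCE,
               key=lambda v: (REFERENCE[v][0] - f1) ** 2
                             + (REFERENCE[v][1] - f2) ** 2)

# A schwa token fronted toward [i] territory still classifies as schwa here,
# but its raised F2 would register as a coarticulatory shift.
print(nearest_vowel(480, 1650))   # ə
```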
In my work on spoken-language coarticulation, I have chosen schwa as target
vowel because of its susceptibility to coarticulatory influence from nearby vowels, a
property that has emerged in previous studies, both acoustic and articulatory
(Fowler, 1981, and Alfonso & Baer, 1982, respectively). The great variability in the
production of English schwa has raised the question of whether this vowel may be
completely underspecified except for a [+vocalic] or [-consonantal] feature when
considered in phonological terms (e.g. see van Oostendorp, 2003), or may be
“targetless” when considered in articulatory terms. Browman and Goldstein (1992)
investigated the latter possibility through experimental study of one speaker and
concluded that at least for that speaker, schwa was not quite targetless, but rather had a
weak target which was “completely predictable ... [corresponding] to the mean tongue
... position for all the full vowels.” (p. 56)
Even if schwa is not completely underspecified or targetless, its coarticulatory
tendencies are well-established and hence this vowel has seemed a logical choice as
target in the present study of long-distance coarticulation. Analysis of acoustic data
obtained in an earlier spoken-language coarticulation study has found strong evidence
that in environments containing multiple consecutive schwas, VV coarticulatory effects
can reach at least as far as six segments’ distance (Grosvald & Corina, in press). The
search for a possible analogue of such effects in sign language has led me to wonder if
neutral signing space might behave similarly to schwa in its articulatory behavior, but
this also raises the question of how comparable English schwa and ASL neutral space
are within their linguistic systems.
1.3.2. Neutral signing space
A look at the sign language phonology literature suggests that the term “neutral
space,” typically defined as the general signing area in front of the signer not
immediately adjacent to any particular body part, refers to something that may actually
not be unitary in nature. This is apparent if we consider how two prominent
phonological theories deal with neutral space.
Recall that in Brentari’s Prosodic model (Brentari, 1998), each sign is
represented by means of a feature tree in which motion is accorded special status,
having its own branch of structure in which motion-related “prosodic features” are
coded, while articulator and place-of-articulation (POA) information are located in
separate branches under a broad “inherent features” node (p. 94). Under the POA node
are features coding where on the body a sign is articulated, along with the “articulatory
plane” associated with the sign. For signs articulated in neutral space, there is no body
location specified, except for the case where the non-dominant hand is used as sign
location, in which case “h2” is taken as body location. Since the latter kind of two-handed sign is also typically articulated in the neutral space area, this model suggests
that not all signs physically articulated in that region are phonologically alike with
respect to location.
In her Hand-Tier model, Sandler also represents such “h2-P” (i.e. non-dominant
hand as Place) signs differently from other neutral-space signs. In the Hand-Tier model,
“Place” refers to the general region where a sign is articulated, and is represented as a
node which is linked to more-finely tuned “Location” nodes in the posited Location-
Movement-Location sequence. For example, the neutral-space signs WANT and
DON’T_WANT have place feature [trunk] throughout their duration, but have location
features for distance and height (“settings”) that change as needed to describe each
sign. In the case of WANT, the “distance setting” changes from [distal] to [proximal]
while the “height setting” remains set at [mid] during the articulation of the sign, while
during the articulation of DON’T_WANT, the distance setting remains set at
[proximal] while the height setting changes from [mid] to [low] (Sandler & Lillo-Martin, 2006, pp. 229-30). For neutral-space signs articulated on the non-dominant hand,
Place is specified instead as [h2] (p. 186). Neither of these theories posits that neutral-space signs’ representations are underspecified with respect to location.5 In fact, both
the Hand-Tier and Prosodic models allow for phonologically coded positional
distinctions within the neutral-space region. The signs just discussed, WANT and
DON’T_WANT, are represented in the Hand-Tier model by varying the settings of
phonological features coding—within neutral space—height of articulation and
proximity to the body.
Similarly, the Prosodic model allows for “setting changes” in neutral-space
signs such as the move from [contra] to [ipsi] in the articulation of CHILDREN through
the transverse plane (i.e. toward the dominant hand’s side of the body from the other
side of the body). Such setting changes (i.e. movements) are specified on the Prosodic
Features branch of a sign’s representation, and allow for movements in both directions
and along either dimension within the plane specified as a neutral-space sign’s place of
articulation (Brentari, 1998, p. 151-4). In addition to these sorts of subdivisions of
5. In Brentari’s theory, the lack of association with any particular body area in the case of neutral-space
signs not articulated on the non-dominant hand is not treated as a case of underspecification.
neutral space recognized by the Hand-Tier and Prosodic models, both models also
make a distinction between neutral-space signs like the ones just described and those
articulated on the non-dominant hand, treated in both models as a separate place of
articulation.6
I am also unaware of any evidence suggesting that other signing locations
reduce to neutral space in particular contexts the way unstressed vowels so often reduce
to schwa (more discussion of this point, including the possible relevance to this issue of
sign whispering, follows in Section 1.3.3). Although one might expect, for instance,
that signs articulated at higher locations tend to migrate to lower locations—therefore
nearer to neutral space—when preceded and followed by lower-articulated signs during
rapid signing, an explanation of this type of process need not invoke neutral space
specifically, nor require an argument involving underspecification. For example,
Mauk’s (2003) discussion of coarticulation in this context is phrased in terms of
undershoot, and he does not claim any special status for neutral space.
6. An interesting question about “h2-P” signs is how to explain that the non-dominant hand’s physical
position is almost always itself in neutral space. Assuming that the non-dominant hand itself is not
phonologically specified as having a neutral-space location (double-marking of monosyllabic signs’
location as both “neutral-space” and “h2” would violate Battison’s (1978) observation that a sign has
only one major body area), this would seem to suggest that neutral space serves as a default position in
these cases, which would be consistent with underspecification.
Figure 1.6. Expected direction of influence of various vowels on schwa and of various
sign locations on neutral signing space (N.S.).
1.3.3. Schwa and neutral space: Comparison and contrast
In light of the preceding discussions of English schwa and ASL neutral signing
space, meaningful comparison of the two within their respective linguistic systems can
be made. The two share a significant degree of articulatory freedom—schwa is
articulated in the middle of vowel space and displays more articulatory variability than
other vowels, and neutral-space signs are articulated at an area in front of the signer’s
body where freedom of movement also appears to be relatively large. As an illustration
of the latter point, consider that the nose and chin are just a few centimeters apart and
serve as distinct sign locations, while the low and high boundaries of neutral signing
space appear to be significantly further apart. Work by Manuel and Krakow (1984) and
Manuel (1990) indicates that languages with more crowded vowel inventories tend to
display less VV coarticulation, presumably because of a smaller margin of error, which
suggests that a signing location like neutral space with fewer close neighbors might
also tend to show more coarticulatory variability. Figure 1.6 above illustrates the
expected coarticulatory influence of various linguistic elements (i.e. vowels or signing
locations) on schwa and neutral space within their respective articulatory domains. The
choice to use schwa and neutral space as coarticulation targets in the present study was
strongly motivated by this comparison.
Regardless of any articulatory similarities between schwa and neutral space,
however, the functional differences between the two are considerable. Probably the
most obvious example of such a difference is the fact that English schwa is a segment,
while post-Stokoe (1960) analyses tend to treat sign parameters like location as subsegmental entities more akin to features or autosegmental units. Perhaps the clearest
evidence for this distinction is the fact that schwa can be uttered in isolation, while
location does not have any such articulatory independence within ASL. Though
perhaps less significant, it should also be noted that freedom of movement within
neutral space is three-dimensional, while vowel space is most often depicted as
two-dimensional (though this characterization of vowels is somewhat incomplete). It is also
possible that schwa admits some freedom of variation in other ways, such as via labial
action or with respect to the advanced tongue root parameter.
In addition, while the vowel-consonant distinction in spoken-language
linguistics is fairly plain, the existence of entities within sign linguistics analogous to
spoken-language consonants and vowels is not universally accepted. Where suggestions
of such a parallel might be made, whether in terms of phonological structure (e.g.
Perlmutter, 1991) or perceptual salience (e.g. Corina & Hildebrandt, 2002), the
movement parameter seems to be a much stronger candidate for a vowel analogue than
location. This alone means that schwa and neutral space may belong to significantly
different category types within the linguistic systems of English and ASL.
Finally, despite their apparent articulatory similarities, schwa and neutral space
do not appear to behave alike within their phonological systems, a fact reflected in the
lack of similarity of their treatment within the most dominant phonological models. For
example, to the best of my knowledge, ASL lacks the abundance of word pairs like
those in English mentioned earlier (photography-photograph and reflex-reflexive) that
would offer support for the existence of oppositions of neutral space with other sign
locations like the oppositions of [a] and [i] with schwa in the (American) English word
pairs just given. This indicates that neutral space is not some kind of default location
that other location values migrate to in particular circumstances like the unstressed
environments which tend to yield schwa.
At first glance, evidence against this assertion might seem to come from a
phenomenon like sign whispering, in which the signing space is substantially reduced
and may be confined to a small region such as neutral space. However, Emmorey,
McCullough and Brentari (2003, p. 41) report that this smaller articulation space in sign
whispering is restricted to “a location to the side, below the chest or to a location not
easily observed.” In other words, neutral space is not a region to which the articulation
space is exclusively limited in these circumstances, but is simply one of a number of
options. This is unlike the situation of schwa described above.
1.3.4. Schwa and neutral space: Summary
Because of their positions in their respective articulatory spaces, schwa and
neutral space share some evident similarities. Previous investigations, as well as
analysis of data to be presented in the current studies, indicate that both are susceptible
to long-distance coarticulatory effects, although admittedly this does not mean that either
is unique in this way within the inventory of its phonological system. Sign
languages in particular have been little-investigated with respect to the possibility of
differing coarticulatory patterns among segments or other contextual variables. In any
event, both schwa and neutral space may be considered useful targets of study because
of their similar positioning in the middle of their articulatory spaces: they can both be
expected to undergo influence from “above,” “below,” or from “the side,” whether in
the literal physical sense of where the articulators (whether tongue or hands) are
located, or with respect to the somewhat more abstractly conceived formant space
location in the case of spoken language.
Nevertheless, based on the considerations explored in this paper, the schwa-neutral-space analogy must be considered imperfect at best. Findings like those of
Corina and Hildebrandt (2002), as well as phonological models such as Perlmutter’s
Mora model (Perlmutter, 1991), argue for movement (perhaps in combination with
location) being a sign parameter more analogous to the vowel in spoken language,
which if so would mean that the status of schwa and neutral space within speech and
sign must be fundamentally dissimilar. Therefore, the present study’s investigations of
English and ASL coarticulation, while taking advantage of the “middle of articulation
space” position that schwa and neutral space have in common, should not be considered
an attempt to show that these two items are analogous in any deeper sense such as with
respect to their phonological status.
1.4. Dissertation outline
This dissertation continues in Chapter 2 with the presentation of a production
study of VV coarticulation in English in which two experiments were carried out. The
first of these involved a very limited set of vowel contrasts while the second
investigated a larger vowel set. Long-distance VV effects were seen over a greater
distance than has been found in previous studies like Magen’s (1997): several of the 20
speakers tested showed significant anticipatory VV effects across at least three vowels’
distance. A great deal of variation among speakers was also seen, however, and some
speakers showed no or only weak effects.
Recordings made of some of the production-study speakers were used as stimuli
for the closely-linked perception study, which was carried out simultaneously with the
production study and the results of which are presented in Chapters 3 and 4. The
perception study involved both behavioral and event-related potential (ERP)
methodologies; the behavioral results are described in Chapter 3, while the ERP study
outcomes are presented in Chapter 4. The results showed that all listeners were
sensitive to nearer-distance effects, while at further distances much more variation
between listeners was seen. Some listeners were sensitive even to effects which had
occurred over three vowels’ distance. Somewhat unexpectedly, strength of
coarticulatory effects was not significantly correlated with either speaking rate or
perceptual sensitivity to such effects.
The approach followed in the spoken-language study—that of performing a
production and perception study simultaneously—was also followed in this project’s
investigation of coarticulation in ASL. Chapter 5 describes an LL coarticulation
production study of ASL, which examined five signers and which, like the spoken-language study, found evidence of long-distance coarticulatory effects in some signers
as well as a great deal of variability among the participants. The task also included a
“non-linguistic coarticulation” component. Chapter 6 discusses the integrated
perception study, whose results indicate that at least some signers are sensitive even to
relatively subtle Location-related coarticulatory effects.
Chapter 7 is the concluding chapter of the dissertation, in which some
implications of this project’s findings are presented.
1.5. Research questions and summary of results
Following are the main questions that this research aimed to address, along with
a brief summary of the outcomes that were found.
1) How far can anticipatory VV coarticulation extend in English?
The great majority of the 38 speakers investigated here showed significant
coarticulatory VV effects over at least one intervening consonant. Several speakers
persistently showed such effects across as many as five intervening segments
(including two intervening vowels). Follow-up experiments indicated that for such
speakers, even longer-distance effects might sometimes occur, but not as consistently.
2) How far can anticipatory LL coarticulation extend in ASL?
Five signers of ASL were investigated and some evidence of LL coarticulation
was found, including apparent cases of LL effects across two or three intervening signs.
However, the strength of these effects was notably weaker than that of the speech
effects.
3) In both cases, how perceptible are these effects?
The nearer-distance speech effects were easily perceived by all 28 listeners who
took part in the speech perception study. Even the longer-distance effects, which were
much more subtle, were perceived by a few listeners. Some of the participants in the
corresponding sign perception study also performed at significantly above-chance
levels, but in contrast to the speech study, such outcomes were not particularly
common, even for shorter-distance coarticulatory effects.
4) Can ERP methodology offer a more sensitive perception measure than behavioral
methods, at least with respect to spoken-language coarticulatory effects?
Results here were mixed. Perception of closer-distance effects was consistently
associated with the mismatch negativity (MMN) component, which has previously
been shown to be sensitive to phonemic contrasts (see Chapter 4 for more on the
MMN). To the best of my knowledge, this is the first finding of an MMN-like response
to the sub-phonemic contrasts associated with VV coarticulation. However, for the
longest-distance contrasts, no MMN-like results were found at all; indeed, an
unexpected positivity was seen.
5) With respect to production, are the coarticulatory effects we see in English and ASL
truly language-specific, or may such effects be seen in non-linguistic actions as well?
To address this question, a non-linguistic task was incorporated into the sign
production experiment. The results were quite similar to those seen in the sign
production experiment—significant but weak effects were found.
6) What lessons about language in general can we learn from the modality-specific
outcomes we find here?
With respect to coarticulatory behavior in the contexts examined here, signing
and non-linguistic manual actions clustered together in showing significant but
relatively weak coarticulation patterns, unlike speech, for which strong coarticulatory
effects were the norm. This may be due to the larger articulators used in signing and
other manual actions, and suggests that “coarticulation” as understood by linguists
might be more usefully understood in the broader context of human actions in general.
One implication of the present findings that may be specific to language relates to the
temporal extent of language planning, which seems to differ between modalities when
considered in terms of absolute time units like milliseconds—unsurprisingly on the one
hand, since sign articulators are larger and slower than those used in speech. But on the
other hand, this suggests that the limits of linguistic planning might be most
appropriately understood in terms of units such as gestures. These and other
implications of this work are explored in more detail throughout the dissertation.
PART I: SPOKEN-LANGUAGE PRODUCTION
AND PERCEPTION
CHAPTER 2 — COARTICULATION PRODUCTION IN ENGLISH
2.1. Introduction
Following Öhman’s (1966) groundbreaking work on transconsonantal vowel-to-vowel (VV) coarticulation in Swedish, English and Russian, researchers have sought to
understand how different factors influence these effects, such factors being as varied as
the specific consonants and vowels involved (Öhman, 1966; Butcher & Weiher, 1976;
Recasens, 1984), prosodic context (Cho, 2004; Fletcher, 2004), and the vowel
inventory of the language in question (Manuel & Krakow, 1984; Manuel, 1990).
Instances of long-distance coarticulation, involving effects crossing two or more
intervening segments, have been found for phenomena such as lip protrusion
(Benguerel & Cowan, 1974), velum movement (Moll & Daniloff, 1971), and liquid
resonances (West, 1999; Heid & Hawkins, 2000), but the possible range of VV
coarticulatory effects is not yet known. This study investigates the extent and the
perceptibility of long-distance VV coarticulation, with a particular focus on variation
among speakers and listeners.
Despite early indications that VV coarticulation might be a relatively local
phenomenon (e.g. Gay, 1977), subsequent work has shown that this is not always the
case. For example, Magen (1997) analyzed [bVbəbVb] sequences produced by four
English speakers and found evidence of coarticulatory effects between the first and
final vowel, meaning that such effects can cross foot boundaries and multiple syllable
boundaries. However, this was not so for all four speakers in the study, which
illustrates the fact that VV coarticulatory effects vary not just from language to
language or context to context, but also from speaker to speaker. This suggests that in
order to determine how prevalent such effects are among speakers and how far they can
extend, researchers may find it necessary to recruit greater numbers of speakers than
has generally been done before, since this may be the only way to get around the
statistical problem of large interspeaker variation. The present study’s use of a
relatively large number of participants (38 speakers) is an attempt to move in such a
direction.
An additional goal of this study was to look for coarticulatory effects in natural-language utterances. Although the use of nonsense words often cannot be avoided
when all permutations of even a limited set of consonants, vowels, and prosodic
contexts are considered, it seems reasonable to suppose that this may result in study
participants speaking less fluently, thus lessening the potential for long-distance effects
to occur. Therefore, rather than analyze the outcome of a small number of speakers
saying a large number of sentences or nonsense words (cf. the Gay, 1977, study
mentioned above, which analyzed two speakers’ production of all VCV combinations
of V={i, a, u} and C={p, t, k}), the present study investigated a large number of
speakers each saying a small number of natural-language sentences, with the
corresponding trade-off that the set of contrasts considered was limited.
Although it seems evident from earlier work that some speakers do not
coarticulate much or at all over long distances (Gay, 1977; Magen, 1997), the fact that
some speakers do must be accounted for in any viable model of speech production (see
Farnetani & Recasens, 1999, for an overview of relevant approaches). The existence of
long-distance coarticulatory effects has similar implications for theories of speech
perception, since research has shown that such effects are sometimes perceptible to
listeners. For example, Martin and Bunnell (1982) cross-spliced recordings of words in
such a way as to create stimuli which varied in the consistency of their VV
coarticulatory patterns, and then had listeners perform recognition tasks. Stimuli which
were consistent with naturally occurring coarticulation patterns tended to be associated
with fewer false alarms and faster reaction times (see also Scarborough, 2003).
The present study provided the opportunity to explore the perception of VV
effects from a different, though related, perspective. Here, the ability of study
participants to distinguish vowels which had been differently “colored” by
coarticulatory effects at various distances was examined. Since longer-distance effects
may be expected to be more subtle than nearer-distance effects (all else being equal),
this provided an opportunity to explore variation among listeners in terms of their
sensitivity to such effects. If long-distance coarticulation is sometimes perceptible, this
would be particularly relevant to the study of lexical processing, especially in light of
the fact that both the production and the perception of coarticulation appear to be
heavily influenced by the coarticulatory patterns of users’ native language (Beddor,
Harnsberger & Lindemann, 2002). If listeners are able to “hear ahead” a few segments
in the flow of spoken language, it could help them narrow down the possible range of
upcoming words more effectively as they process the incoming speech stream. From
this standpoint, all else being equal, anticipatory coarticulation would probably be more
useful to listeners than carryover coarticulation. Therefore, the current study focused on
anticipatory, not carryover, VV coarticulation.
2.1.1. Segmental and contextual factors
Öhman (1966) theorized that VCV sequences were essentially diphthongal in
nature, with the consonant gesture superimposed in the middle. His analysis has been
supported by some other researchers, such as Purcell (1979), and was extended by
Fowler (1983), who proposed the model shown below in Figure 2.1. Fowler’s
suggestion is that in a VV or VCV sequence, the most acoustically salient or
“prominent” features of the segments may be perceived by the listener as abrupt
transitions from one sound to the next, while in fact the actions characteristic of the
various segments are still performed, albeit simultaneously and with little acoustic
salience.
Figure 2.1. Fowler’s (1983) model of VCV articulation. The dotted vertical lines in the
second picture indicate points in time where acoustic prominence shifts from one sound
to the next, and where listeners are therefore likely to perceive a segment boundary.
The overlapping solid lines represent the physical articulations of each sound, which
can occur simultaneously.
Öhman (1966) also found that some consonants seemed to block VV
coarticulation; these include fricatives in English and palatalized consonants in
Russian. This led to speculation that consonant sounds requiring greater articulatory
effort involving the tongue body might tend to inhibit coarticulatory effects between
surrounding vowels, since such effects would tend to be blocked by the active
articulatory movement associated with the intervening consonant. One might expect for
example that VCV sequences in which the consonant requires active tongue body
movement would show weak VV coarticulatory effects relative to VCV sequences in
which C is bilabial. In fact, much research supports this (e.g., Butcher & Weiher, 1976;
Purcell, 1979; Recasens, 1984; Fowler & Brancazio, 2000).
In an electropalatographic study, Butcher and Weiher (1976) found that tongue
movements for [t] are smaller and faster than those for [k], with more VV
coarticulation occurring across [t] than [k]. In acoustic and electropalatographic studies
on Catalan, Recasens (1984; 2002) has found that the extent of VV coarticulation,
measured both in terms of F2 and fronting, is inversely related to the amount of tongue
dorsum contact, which is consistent with the predictions of a model based on “degree of
articulatory constraint” (DAC) (Recasens, Pallarès & Fontdevila, 1997). In this
model, for example, bilabials have a DAC value of 1, the lowest possible, while
dorsals, including dark [l], have a DAC value of 3; dentals and alveolars like [n] and
light [l] fall in the middle (DAC=2). Modarresi et al. (2004) have also found that
alveolars in VCV sequences are resistant to anticipatory coarticulation influence from
the following vowel, perhaps allowing for greater carryover effects from the first vowel
on the second.
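The ranking described above can be summarized in a small sketch. This is purely illustrative: the dictionary keys are descriptive place-of-articulation labels of my own choosing, not the DAC model's official notation, and the helper function simply encodes the text's generalization that a lower DAC value predicts more VV coarticulation.

```python
# Illustrative sketch of the DAC values summarized in the text
# (Recasens, Pallares & Fontdevila, 1997). Keys are descriptive
# labels, not the model's own notation.
DAC = {
    "bilabial": 1,          # lowest constraint, e.g. [p], [b]
    "dental/alveolar": 2,   # e.g. [n], light [l]
    "dorsal": 3,            # highest constraint; includes dark [l]
}

def predicts_more_vv_coarticulation(c1: str, c2: str) -> str:
    """Return the consonant class across which the model predicts MORE
    VV coarticulation: the one with the lower DAC value, since less
    tongue-dorsum involvement leaves the flanking vowels freer to interact."""
    return c1 if DAC[c1] < DAC[c2] else c2
```

For example, the sketch predicts more VV coarticulation across a bilabial than across a dorsal, matching the Butcher and Weiher (1976) and Recasens (1984) findings cited above.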
Öhman (1966) originally suggested that certain consonants, such as those with
secondary articulations (e.g. palatalized consonants), might operate on a “channel”
(articulatory subsystem) typically associated with vowels. Keating (1985) suggests that
this intuition can best be expressed in terms of an autosegmental model in which
secondary consonantal properties like palatalization are essentially vowel tier features,
and that consonants with such features act to block VV coarticulation (see Figure 2.2
below).
Vowel feature tier:        [αF]      [βF]      [γF]
                             |         |         |
                             V         C         V
                                       |
Consonant feature tier:              [δG]
Figure 2.2. Coarticulation model from Keating (1985). The consonant has a secondary
articulation which tends to block VV coarticulation, since the two vowels’ features are
no longer adjacent.
On the other hand, Gay (1977) contradicts Öhman (1966) and asserts that it is
the CV sequence that forms the basic gestural unit of a VCV sequence, rather than the
two vowels, and Shaffer (1982) also criticizes the continuous vowel-production model.
Choi and Keating (1991) have found significant, though small, amounts of VV
coarticulation across palatalized consonants in Russian and in other Slavic languages,
implying that coarticulation is not blocked as effectively by such consonants as Öhman
(1966) had suggested.
In a study designed to test the validity of the different models suggested by
Fowler (1983) and Keating (1985), Hussein (1990) examined coarticulation patterns in
VCV sequences in Arabic, but found that the actual situation was too complicated for
either model to handle adequately.
Different vowels also appear to exhibit different coarticulatory properties. For
example, Bell-Berti et al. (1979) found that different vowels were associated with
different degrees of velum height during adjacent consonants. Butcher and Weiher
(1976) found that of the vowels [i], [u], and [a], [i] exerted by far the greatest
coarticulatory influence on other vowels, while [a] exerted the least; for instance, the
researchers found that the only significant coarticulation in VCV sequences when C
was [k] occurred when [i] influenced [a]. Gay’s (1974) cinefluorographic study also
found carryover coarticulation occurring only on [a].
Some research, such as that of Cho (2004) and Fletcher (2004), has suggested
that stressed vowels tend to show more coarticulatory dominance (i.e., they influence
other vowels more, and are less influenced by other vowels) than unstressed vowels.
Cho (1999, 2004) has also found that vowels are more resistant to coarticulatory effects
across prosodic domain boundaries. In addition, such resistance seems to be stronger at
the boundaries of higher-level prosodic domains, so that for example, more VV
coarticulation can be expected to occur across word boundaries than across Intonation
Phrase boundaries.
2.1.2. Language-specific factors
The possibility that different languages might exhibit different coarticulatory
patterns was raised early on, when Öhman (1966) found that Russian behaved
differently with respect to VV coarticulation than English or Swedish. Some possible
explanations for this, involving articulatory factors such as tongue body behavior or
secondary articulation, were discussed above. A deeper question is whether it is not
only articulatory processes, but also the language-specific phonemic contrasts
themselves, which can influence the nature or extent of VV coarticulation.
For instance, it seems reasonable to suppose that the size of a language’s vowel
inventory might determine the extent of permissible VV coarticulation, since a smaller
vowel inventory would tend to allow greater variation within each vowel’s articulation
space without crossover into the spaces of other vowels. To test this hypothesis,
Manuel and Krakow (1984) and Manuel (1990) conducted studies comparing VV
coarticulation in languages with five-vowel inventories (Ndebele and Shona in the first
study, Swahili and Shona in the second) to that in languages with larger vowel systems
(Sotho in the first study, with seven vowels, and English in the second). Both studies
found that coarticulation was less extensive in the languages with more crowded vowel
inventories. In addition, the first study found that [a] showed more movement than [e]
in the languages that were examined, which would be expected since [a] has fewer near
neighbors than [e] in those languages and hence might be expected to vary more freely.
The question of whether consonants of longer duration might affect VV
coarticulation has also been addressed, but the results to date have been inconclusive.
Since coarticulation can generally be expected to diminish across greater distances, one
might expect that long consonants would tend to block or dampen coarticulation
relative to short ones. Some research in English (e.g., Huffman, 1986; Magen, 1997)
has found that this does not seem to be the case, but the issue appears not to have been
investigated systematically in languages in which consonant length is contrastive.
2.1.3. Speaker-specific factors
In many cases, coarticulatory patterns appear to be idiosyncratic, which may be
due to the wide range of options speakers appear to have when producing a target
sound (Fowler & Turvey, 1980; Lindblom, Lubker & Gay, 1979). In an
electropalatographic study, Butcher and Weiher (1976) analyzed patterns of articulation
and coarticulation in three speakers’ VCV sequences, and found a great deal of
interspeaker variation. For example, speakers varied in the amount of dorsum-palate
contact they made in transitions during sequences like [ta] and [ti]. Some speakers
apparently favor more gestural preprogramming, leading to greater anticipatory
coarticulation, than others. In another study highlighting interspeaker differences,
Parush et al. (1983) found that talkers producing VCV sequences with velar stops and
back vowels exhibited coarticulatory behavior patterns that were consistent between
speakers for carryover coarticulation but different for anticipatory coarticulation.
That speaking rate may be relevant has also been mentioned by some
researchers, such as Hussein (1990), who suggested that fast talkers may tend to
coarticulate more. Hertrich and Ackermann (1995) found that slower speech was
associated with less carryover coarticulation, but no significant difference in
anticipatory coarticulation, relative to more rapid speech.
2.2. First production experiment: [i] and [a] contexts
The first production experiment investigated the extent of coarticulatory effects
of vowels [i] and [a] on the vowel schwa [ə]. In the second production experiment, to
be discussed in Section 2.3, the set of context vowels was expanded to include [æ] and
[u] as well.
2.2.1. Methodology
2.2.1.1. Speakers
Twenty participants (11 female, 9 male) took part in the first production
experiment. Seven subjects were known personally to the author and agreed freely to
take part; the other thirteen were undergraduate students at the University of California
at Davis who received course credit for participating. The subjects’ ages ranged from
18 to 62 (mean age = 25.2; SD = 13.3). All participants were native speakers of
American English with no history of speech or hearing problems. All participants were
uninformed as to the purpose of the study.
The first seven participants took part only in the production experiment. A
subset of their vowel recordings was then used as stimuli for subsequent subjects, who
took part in both the production and perception experiments (see Chapter 3 for the
perception study). While every effort was made to recruit only monolingual speakers,
most subjects had been exposed to at least one other language as students in a
university with a foreign language course requirement. Of these, four felt that they had
acquired substantial knowledge of another language. However, significance testing
results in the analysis of the group dataset did not change depending on whether or not
those subjects were excluded; this was so for both the production and perception
studies. Therefore all subjects’ data were included.
2.2.1.2. Speech samples
First, it was necessary to create sentences containing plentiful opportunities for
VV coarticulation to occur. The vowels [i] and [a] were chosen as context vowels
because of their distance in vowel space. Consecutive vowels likely to be produced as
schwas, or at least substantially reduced, were used as targets; schwa was chosen
because of its susceptibility to coarticulatory influence from neighboring vowels, a
property that has emerged in previous studies, both acoustic and articulatory (e.g.
Fowler, 1981; Alfonso & Baer, 1982; respectively). The sentences used were:
“It’s fun to look up at a key.”
“It’s fun to look up at a car.”
These items were the outcome of the following set of preferences: (1) sentences
containing only real words, for the reasons discussed earlier; (2) sentences not differing
prior to the context ([i] or [a]) vowel, which was to be the sentence-final vowel; (3)
monosyllabic words containing the target (schwa) and context ([i] and [a]) vowels, so
that coarticulatory effects would be “spontaneous” and not well-practiced within
particular lexical items; (4) function words containing the target vowels and content
words containing the context vowels, to encourage reduction of target vowels and full
pronunciation of context vowels; (5) context-vowel content words having relatively
high (and similar) frequency of use; (6) voiceless intervening consonants, so that vowel
boundaries would be as clear as possible when making formant-frequency
measurements and when creating stimuli for the perception study (see Chapter 3).
To minimize interference from intervening consonants’ demands on the tongue
body (cf. Recasens, 1984), it would have been preferable to use only bilabial
intervening consonants, had that been feasible, as in Magen (1997), where nonsense
words of the form [bVbəbVb] were used. However, in the multi-word context
necessary for this investigation and given the above constraints, issues like lexical gaps
and morphological/syntactic constraints quickly asserted themselves. In addition,
excessive alliteration would raise issues of articulatory ease and naturalness (e.g.,
“Peter Piper picked a peck of pickled peppers”). In the end, the full array of voiceless
stops in English was used, with those making fewer demands on the tongue body
located further away from the context vowels, to “assist” longer-distance effects at the
possible expense of shorter-distance ones.7
One point worth noting regarding these sentences is that any coarticulatory
effects of context vowels extending at least as far back as the target vowel in the word
up would have to extend across consonants formed at all three places of articulation of
English stops (labial, alveolar and velar), which would be of interest in light of the
literature discussed earlier in Section 2.1.1, much of which suggests that VV effects
tend to be weakened or entirely absent when the intervening consonant is coronal or
dorsal.
A randomized list containing six copies of each sentence was provided to each
speaker. Before recording, speakers were told about list effects and the need for their
avoidance. In order to encourage consistent prosodic patterning among these utterances,
speakers were asked to say the sentences as if they were being spoken in a normal
conversation in response to the question, “What’s it fun to look up at?”, with the
intended effect of obtaining utterances with primary stress on the final word, with some
emphasis also expected on the word “fun.” Speakers were given a chance to rehearse
until any substantial deviations from this pattern, such as “It’s fun to look 'up at a car”
(i.e., as if meaning “not down at a car”) were corrected.
7. Pilot testing had indicated that VCV effects on schwa were quite common, even across [k].
Figure 2.3. The diagram shows the expected influence on schwa of coarticulatory
effects of nearby [i] or [a]. Near [i], F1 is lowered and F2 is raised, while the reverse
holds near [a]. The magnitude of coarticulatory effects may be exaggerated in this
figure.
The final vowel, either [i] or [a], served as the context vowel, while the preceding vowels
in the words “up,” “at,” and “a” were the target vowels. In this paper, these will be
referred to as “distance-3,” “distance-2” and “distance-1” vowels, respectively. Because
of the intervening consonants, these distance conditions correspond to 5, 3 and 1
intervening segments, respectively, between the context and target vowels.
Figure 2.3 above illustrates the expected coarticulatory effects of [i] and [a] on schwa
in two-dimensional formant space.8
Since it is difficult to create natural-sounding sentences in English containing
multiple consecutive unstressed syllables, the vowel [ʌ] was used as the distance-3 vowel
because of its acoustic similarity to schwa. In careful speech, the vowels in “at” [æ] and
“a” [eɪ] are not schwas either, of course, but it was expected that in the casual speech
speakers were encouraged to produce here, these vowels would be realized as schwa or
at least be substantially reduced. During the rehearsal preceding the recording process,
speakers who did not seem to be speaking naturally—presumably because of the
perceived formality of participating in a scientific experiment—were gently coached
until their production became more relaxed. For example, the experimenter could show
such subjects a key and ask what it was. When subjects invariably replied, “a [ə] key,”
the experimenter pointed out that during the rehearsal, they had been saying, “a [eɪ]
key.” This almost always made them aware of the issue and hence resolved it.
Some subjects did occasionally exhibit a slightly [æ]-like quality in “at” or a
slightly [eɪ]-like pronunciation of “a” even in casual speech. It should be noted that the
research question being investigated (how far VV effects can extend) does not strictly
require that schwas be the target vowel, although of course that was the general
intention. Overall, the great majority of the “at” and “a” vowels were in fact produced
8. It was expected that the rhotic occurring after the low vowel in “car” would not significantly color that context vowel, at least not to the point of creating a long-distance coarticulatory [i]-[r] contrast rather than an [i]-[a] contrast. While Heid and Hawkins (2000) and West (1999) have found evidence of long-distance resonance effects of liquids /l/ and /r/, these were in contexts in which the liquids were in syllable-initial position. Nevertheless, a check was performed against the possibility, as will be explained in Section 2.2, Results and Discussion.
with the expected schwa-like quality, and for convenience, the vowels in “up,” “at” and
“a” will be referred to here as schwas.
In order to obtain baseline formant values for the context vowels [a] and [i],
each speaker was also recorded repeating each of the following sentences three times;
in these sentences the context vowels in the final word are preceded by (and therefore
coarticulated with) themselves:9
“It’s fun to see keys.”
“It’s fun to saw cars.”
2.2.1.3. Recording and digitizing
Participants were seated comfortably at a table in a laboratory environment (a
quiet room measuring approx. 15 feet by 18 feet). The recording equipment consisted
of a Shure SM48 microphone attached to a Marantz professional CDR300 digital
recorder; these digital recordings were made at 16-bit resolution with a 48 kHz
sampling rate. The participant was given a randomized list containing the appropriate
number of copies of the sentences discussed above (i.e., six copies of both sentences
containing the consecutive schwas, and three copies of both sentences containing the
repeated context vowels). After the rehearsal process described earlier, the subject was
handed the microphone and held it several inches to one side of (not directly in front of)
his/her mouth, while saying the sentences in the order indicated on the list. If a
disfluency or other unwanted event occurred, that repetition of the sentence was
9. N.B.: The vowels in “saw” and “car” have merged in the variety of American English spoken by these study participants.
repeated, until all of the sentences had been successfully recorded the required number
of times.
2.2.1.4. Editing, measurements and analysis
Editing of the digital sound files and formant frequency measurements were
performed onscreen using the Sound Edit function in Praat (Boersma & Weenink,
2005) for each sound file, with the following settings: (for spectrogram) analysis
window length 5 ms, dynamic range 30 dB; (for formant) maximum formant 5000 Hz
for male speakers or 5500 Hz for female speakers, number of formants 5, analysis
window length 25 ms, dynamic range 30 dB, and pre-emphasis from 50 Hz, using the
Burg algorithm to calculate LPC coefficients.
Figure 2.4. The acoustic representation of part of one utterance containing the sequence
“up at a.” The vowel starting boundary is taken as the formant track marking just prior
to the onset of voicing, while the end boundary is taken as the corresponding location
after voicing offset.
Each target vowel was excised from the whole-sentence recording and saved as
a separate sound file for purposes of data analysis, and also for possible use later as a
stimulus in the perception experiment.10 The starting boundary for each vowel was
defined as the formant track marking just prior to the onset of voicing, while the end
boundary was the corresponding location after voicing offset, as shown in Figure 2.4
above. Because of the intervening consonants, determining the boundary points of each
vowel was generally unproblematic. In less straightforward cases, such as some in
which speakers flapped or otherwise reduced the [t] in “at a,” visual inspection of the
amplitude trajectory usually showed a rapid change of slope at a particular point, which
was taken as marking the vowel boundary. In the few cases where this boundary was
not so clear, a special notation was made so that measurements made for those tokens
could be given further attention if they were clear outliers. In the end, there were only a
few such outliers, and their inclusion or exclusion from the analysis did not change the
results.11
The effects of anticipatory VV coarticulation were expected to be strongest in
the later portion of the target vowels (cf. Beddor, Harnsberger & Lindemann, 2002),
10. Measurements were made after excision instead of before mostly for convenience, because some of these excised vowels were to be used for the perception study, for which repeated measurements (and some alterations; see Section 3) would be necessary, these being more easily performed on the shorter sound files. More importantly, it made essentially no difference whether measurements were made before or after excision (confirmed through pilot testing), because of where the measurements were made and the width of the LPC analysis window.
11. The issue of outliers was dealt with as follows. Other than those that were the result of genuine errors (such as Praat clearly missing an F1 value and giving an F2 value as F1 instead), I decided I should either keep all of them or omit all of them, rather than making such decisions on a case-by-case basis, which could lead to bias. I did not want any reported significant outcomes to be contingent on any such case-by-case decision-making. In the final analysis, keeping the outliers instead of omitting them made little difference in general, but resulted in more conservative outcomes in a few cases (for example, an additional speaker would have had a significant distance-3 outcome if a particular outlier were removed). Therefore the outliers were included.
where influence of the immediately-following consonant was also likely to be greatest.
As a compromise between seeking the former while minimizing the latter,
measurements of target vowels’ F1 and F2 were made at 25 ms before vowel offset. For
target vowels with duration under 50 ms, measurements were made at vowel midpoint.
The aim was to fill the 25-ms LPC analysis window with only vocalic information, as
late in the vowel as was feasible, but without acquiring acoustic information directly
from the following consonant (though coarticulatory effects of the consonant on the
vowel were sometimes evident, as the results section will show). Since all the target
vowels were over 25 ms in length, this was always possible. VV coarticulatory effects
were investigated at each distance through a statistical comparison, between the [i] and
[a] contexts, of the formant values of the target vowels articulated at that distance from
the context vowel.
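This measurement rule can be sketched as follows (a minimal illustration; the function name and the use of seconds are my own, and the actual measurements were made interactively in Praat rather than scripted):

```python
def measurement_time(onset_s: float, offset_s: float) -> float:
    """Time (s) at which a target vowel's F1 and F2 are sampled: 25 ms
    before vowel offset, so that the 25-ms LPC analysis window holds only
    vocalic information, or at the midpoint for vowels under 50 ms."""
    duration = offset_s - onset_s
    if duration < 0.050:
        return (onset_s + offset_s) / 2.0
    return offset_s - 0.025
```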
For all group analyses reported in this paper, a normalization procedure based
on Gerstman (1968) was applied to each speaker’s nth raw formant values for n = 1 and
2. Starting with a given speaker’s average first and second formant values for full
vowels [a] and [i] and with the raw formant value Fn raw (for n = 1 or 2), the
corresponding normalized value is given by the formula
Fn norm = 999 * (Fn raw - Fn min) / (Fn max - Fn min),
where Fn max and Fn min are the largest and smallest nth formant values among that
speaker’s full vowels; in other words, F1 max and F1 min are given by the speaker’s
average F1 for [a] and [i] respectively, with the reverse order for F2 values. The
procedure has the effect of scaling each speaker’s formant values relative to the width
and height of his or her own vowel space, as defined by [a] and [i]. Both F1 and F2 are
scaled to a range 0-999 with the context [a] at (999, 0) and the context [i] at (0, 999).
This makes comparisons between speakers more reasonable (though not unproblematic;
see Chapter 7).12
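The normalization procedure can be sketched as follows (a minimal illustration; the function name and the numerical formant values in the example are my own):

```python
def gerstman_normalize(f_raw, f_a, f_i):
    """Scale a raw formant value (Hz) into the 0-999 range relative to the
    speaker's own [a]-[i] vowel space (after Gerstman, 1968).

    f_a and f_i are the speaker's average values of the same formant in
    the full vowels [a] and [i]; for F1, [a] supplies the maximum and [i]
    the minimum, with the order reversed for F2."""
    f_min, f_max = min(f_a, f_i), max(f_a, f_i)
    return 999.0 * (f_raw - f_min) / (f_max - f_min)

# With this scaling, the context [a] lands at (999, 0) and [i] at (0, 999),
# e.g. for a speaker whose mean F1 is 750 Hz in [a] and 300 Hz in [i]:
#   gerstman_normalize(750.0, 750.0, 300.0)  ->  999.0   (F1 of [a])
#   gerstman_normalize(300.0, 750.0, 300.0)  ->  0.0     (F1 of [i])
```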
2.2.2. Results and discussion
2.2.2.1. Group results
Figure 2.5 below is a normalized vowel-space plot showing formant frequencies
relative to the extremes of the context [a] and [i] for each distance condition and in each
context, averaged over all 20 speakers. As one might expect, increased distance from
the context vowel is associated with progressively reduced formant differences between
the [i] and [a] contexts. The mean normalized (F1, F2) for the [a] and [i] contexts are
(288, 449) and (46, 801) at distance 1; (731, 315) and (617, 379) at distance 2; and
(388, 130) and (389, 160) at distance 3.
It should be noted that the measurements illustrated in Figure 2.5 were made
near the end of the target vowels, where coarticulatory influence of the context vowel
was expected to be strongest. Therefore these formant values should not be expected to
correspond too closely to the values one would obtain in the steady-state portion of a
schwa vowel. This is especially true considering that the influence of the immediately
following consonant in each distance condition appears to be in play as well; for
example, as the place of articulation of the immediately following consonant changes
from labial to alveolar to velar at distances 3 (“up” [p]), 2 (“at” [t]) and 1 (“a,”
followed by the [k] of the context word), F2 of the target pairs increases accordingly. Similarly,
12. Gerstman’s (1968) original formula specified that Fmax and Fmin were to be taken over nine (Dutch) vowels, not just [a] and [i], but given the positions of those two vowels in vowel space, it seems reasonable to take their F1 and F2 as providing the desired maximum and minimum values. Gerstman alluded to this himself (p. 80).
F1 values are quite low for the target vowels overall, particularly at distances 3 and 1;
these vowels immediately preceded [p] and [k] respectively, both of which generally
reached fuller closure than the [t] in “at,” often realized as a flap.
To determine whether the differences illustrated in Figure 2.5 are significant,
repeated-measures ANOVAs with context vowel as factor were performed on the group
dataset at each distance and for each formant, using normalized formant values as
discussed above. For both F1 and F2, there was a highly significant main effect of
vowel at both distance 1 (first formant: F(1,19)=83.7, p<0.001; second formant:
F(1,19)=101.7, p<0.001) and distance 2 (F1: F(1,19)=13.6, p<0.01; F2: F(1,19)=22.9,
p<0.001). At distance 3, these effects appear to taper off, as the non-significant
outcome for F1 (F(1,19)=0.052, p=0.82) and significant but weaker outcome for F2
(F(1,19)=5.04, p<0.05) show. Together, these outcomes provide strong evidence of
coarticulatory effects having occurred at all three distances. The existence of distance-3
effects is particularly noteworthy given that the distance-3 vowel was the target vowel
which would be expected to undergo the least amount of reduction based on its [ʌ]
quality.
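Since each of these repeated-measures ANOVAs involves a single two-level within-subjects factor, each is equivalent to a paired t-test on the speakers' per-context means, with F(1,19) equal to the squared t statistic. A sketch with randomly generated illustrative numbers (not the study's data), assuming SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-speaker mean normalized F2 values for the distance-1
# vowel in each context (illustrative numbers, not the study's data).
f2_i_context = rng.normal(800.0, 60.0, size=20)  # [i] context
f2_a_context = rng.normal(450.0, 60.0, size=20)  # [a] context

# A one-factor, two-level repeated-measures ANOVA reduces to a paired
# t-test on the same 20 speakers, with F(1, 19) = t squared.
t, p = stats.ttest_rel(f2_i_context, f2_a_context)
F = t ** 2
print(f"F(1,19) = {F:.1f}, p = {p:.3g}")
```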
Figure 2.5. Context-vowel and distance-1, -2 and -3 target-vowel positions in
normalized vowel space, averaged over all 20 speakers; these averaged values are
marked by the line segment endpoints, not the labels. Context and distance-1, -2 and -3
vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,”
“2” or “3,” respectively. The context-related differences in F1 and F2 are significant
at the p<0.05 level or greater for all 3 distance conditions, except for F1 at
distance 3.
2.2.2.2. Individual results
In order to explore these results further, the coarticulatory tendencies of
individual speakers were next examined. For each speaker, one-tailed heteroscedastic t-tests were run for F1 and F2 for each distance condition (1, 2 or 3) to determine if
formant values differed significantly between the [i] and [a] contexts. Raw formant
values were used here since formant values were not being directly compared between
speakers. One-tailed tests were appropriate since it was predicted that [i]-colored
vowels would have lower F1 and higher F2 than [a]-colored vowels. The significance
results for all 20 speakers are summarized below in Table 2.1. Significance is given
without Bonferroni correction for multiple tests in order to provide a better picture of
the differences between individuals and the gradience of the effects over distance.
Numerical data for each speaker are given in Appendix A. To address the possibility
that speaking rate might be a relevant factor here, this was also measured for each
speaker; these figures are shown in the rightmost column of the table. Speech rate for a
given speaker was calculated by averaging, over that speaker’s utterances, the time
elapsing between the start of the distance-3 vowel and the start of the context vowel, a
span of six segments.
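The per-speaker testing and speech-rate calculation can be sketched as follows (Welch's test is the heteroscedastic t-test named above, here run via SciPy; all numerical values are illustrative, not actual measurements from the study):

```python
import numpy as np
from scipy import stats

# Hypothetical raw F2 values (Hz) for one speaker's six distance-2 target
# vowels in each context; illustrative numbers, not actual measurements.
f2_i = np.array([1610.0, 1655.0, 1622.0, 1590.0, 1640.0, 1615.0])
f2_a = np.array([1540.0, 1575.0, 1510.0, 1560.0, 1530.0, 1555.0])

# One-tailed heteroscedastic (Welch's) t-test: F2 is predicted to be
# higher in the [i] context.  (For F1, the prediction is reversed, so
# alternative="less" would be used instead.)
t, p = stats.ttest_ind(f2_i, f2_a, equal_var=False, alternative="greater")

# Speech rate: six segments lie between the start of the distance-3 vowel
# and the start of the context vowel; average that span over a speaker's
# utterances and convert to segments per second.
spans_s = [0.42, 0.45, 0.40, 0.44, 0.43, 0.41]  # hypothetical spans (s)
rate = 6.0 / (sum(spans_s) / len(spans_s))      # about 14 seg/s
```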
[Table 2.1 appears here. Columns: Speaker (1-20); significance outcomes for F1 and F2 at distance 3 (“up”), distance 2 (“at”) and distance 1 (“a”); and speech rate in seg/s, which for Speakers 1 through 20 was, respectively: 13.8, 15.5, 13.9, 11.2, 15.2, 12.7, 15.3, 13.6, 12.1, 15.2, 11.8, 14.2, 11.0, 12.2, 17.6, 11.2, 14.4, 16.5, 12.1, 12.0.]
Table 2.1. For each speaker, the significance testing outcomes of six t-tests are shown,
comparing formant frequency values of that speaker’s target vowels for the [i] vs. [a]
contexts, for each of F1 and F2 and for each distance condition. Significant results are
shaded and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001 (no Bonferroni
correction). Also noted are marginal results, where + = p<0.10; a √ indicates a non-significant outcome in which averages were nevertheless in the expected direction (i.e.,
F1 greater for the [a] context than for the [i] context, and F2 lower). The rightmost
column shows each speaker’s rate of speech in segments per second.
As expected, most speakers showed a substantial amount of VV coarticulation,
though a great deal of interspeaker variation is evident as well. While two participants
showed significant results in all three distance conditions, several others showed few or
no significant effects. More specifically, 1 speaker (5% of the group of 20) showed no
significant effects at all, 4 (20%) showed effects as far as distance 1 but no further, 13
(65%) had effects as far as distance 2 but no further, and 2 (10%) had significant
outcomes at distance 3 (at Bonferroni-corrected p<0.000417, 10 of 20 speakers had
significant effects at distance 1 for F1 or F2 or both, and 4 of 20 speakers had
significant effects at distance 2). This confirms and extends Magen’s (1997) results, in
which she found high variability between speakers in the production of long-distance
VV coarticulation.
Even when formant differences did not reach significance, they still tended to
pattern the way one would expect, with [i]-colored vowels showing lower F1 and
higher F2 with respect to [a]-colored vowels. This pattern held with few exceptions in
the distance-1 and distance-2 conditions, but was much less evident in the distance-3
condition, where only two speakers (3 and 7) showed a difference which reached
significance. The results for Subject 7 are pictured below in Figure 2.6, which shows
this speaker’s average context- and distance-1, -2 and -3 target-vowel positions in
vowel space (non-normalized formant values are shown). Context and distance-1, -2
and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,”
“1,” “2” or “3,” respectively. Such an illustration for each of the 20 speakers in the
group may be found in Appendix B.
Figure 2.6. Subject 7’s coarticulation effects were the strongest of the twenty speakers,
with differences between contexts significant at the p<0.01 level or greater for F2 at all
three distances and for F1 at distance 1. The graph shows this speaker’s average
context- and distance-1, -2 and -3 target-vowel positions in vowel space. Context and
distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an
adjacent “C,” “1,” “2” or “3,” respectively. Formant values are not normalized. Values
are marked by line segment endpoints, not the labels.
2.2.2.3. Follow-up tests
At this point, some issues that may be raised about the outcomes reported thus
far will be addressed. First, given the number of t-tests performed, one may ask
whether the significant outcomes for Speakers 3 and 7 at distance 3 may be spurious,
since the possibility of a Type I error increases along with the number of such tests.
Second, how do we know that the presence of the rhotic in the context word “car” was
itself not the cause of the coarticulatory contrast with [i], given the resonance effects of
liquids reported by Heid and Hawkins (2000) and West (1999)? Finally, is it not
possible that some intervening consonant(s) might act as triggers themselves? In
particular, the [k] preceding [i] in “key” is expected to be fronted, so one might suspect
that the different [k]s in “key” and “car” were the real trigger for the effects on the
preceding vowels.13
To answer these concerns, a small follow-up was conducted with the study
subjects who had shown significant distance-3 effects. First, those two participants
(Speakers 3 and 7) were recorded saying sentences similar to the ones used earlier, but
with [r]-free context words:
“It’s fun to look up at a keep.”
“It’s fun to look up at a cop.”
Speaker 3 was also recorded saying sentences with [k]-free context words:
“It’s fun to look up at a peep.”
“It’s fun to look up at a pop.”
Measurements and significance testing were performed as before; the results are
shown below in Table 2.2, along with the original “key/car” results for comparison.
Although there are some differences in outcome associated with the different context
word pairs, some important similarities are clear: in addition to strong distance-1 and -2
effects, we see significant distance-3 effects in all cases. For these speakers, then, the
13. It should be noted that in many if not most studies of long-distance coarticulation, it may be impossible to avoid the possibility that long-distance effects are actually short-distance effects that are transmitted successively via intermediate segments.
distance-3 effects seem quite robust, and given the variety of contexts in which they are
maintained, cannot be due solely to the presence of any particular segments in the
context words other than the context vowels [i] and [a].
[Table 2.2 appears here. Rows: Speakers 3 and 7 for the key/car contrast, Speakers 3 and 7 for the keep/cop contrast, and Speaker 3 for the peep/pop contrast; columns: significance outcomes for F1 and F2 at distance 3 (“up”), distance 2 (“at”) and distance 1 (“a”).]
Table 2.2. For the two speakers, the significance testing outcomes of six t-tests are
shown for each contrast, comparing (non-normalized) formant frequency values of that
speaker’s target vowels in the [i] vs. [a] contexts, for each of F1 and F2 and for each
distance condition. Significant results are shaded and labeled, where * = p<0.05, ** =
p<0.01 and *** = p<0.001.
An additional question which should be addressed is whether these significant
distance-3 results might simply be due to some reduction process like segment deletion
having occurred somewhere between the context and distance-3 vowels. A closer
inspection of the data shows that this was not the case, as illustrated by Figure 2.7
below, which shows part of one of Speaker 7’s utterances. Clear transitioning between
the consonants and vowels, with no segment deletion, is apparent, a pattern that is not
at all unique to this particular utterance for this speaker.
Figure 2.7. A typical recording made from Speaker 7, who showed strong
coarticulatory effects at up to three vowels’ distance. This was not due to slurring or
segment deletion, as shown above in the clear transitions from one segment to the next.
The image shows the sequence “up at a key.”
Coarticulation at various distances: Correlation results
One pattern that might be expected and which Table 2.1 seems to show is that
speakers who coarticulated more strongly at a closer distance were also more likely to
show significant results at greater distances. Statistical tests confirm that average vowel
space differences (taken as Euclidean distance in normalized F1-F2 space) between
speakers’ [i]- and [a]-colored schwas were significantly correlated (r = 0.48, p < 0.05)
between distances 1 and 2. However, such correlation was much weaker between
distances 2 and 3 (r = 0.34, p = 0.14), and even more so between distances 1 and 3 (r =
0.0006, p = 0.998), presumably because of the absence of distance-3 effects for most
speakers.
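The per-speaker coarticulation measure and correlation can be sketched as follows (the function name and the variable names in the commented usage are my own, assuming per-speaker arrays of normalized (F1, F2) means):

```python
import numpy as np
from scipy.stats import pearsonr

def coart_measure(i_mean, a_mean):
    """Coarticulation strength for one speaker at one distance: Euclidean
    distance in normalized F1-F2 space between the mean [i]-colored and
    mean [a]-colored schwa positions, each given as an (F1, F2) pair."""
    return float(np.hypot(i_mean[0] - a_mean[0], i_mean[1] - a_mean[1]))

# Usage (hypothetical variable names): one measure per speaker at each
# distance, then a Pearson correlation across the 20 speakers:
#   d1 = [coart_measure(i, a) for i, a in zip(i_means_d1, a_means_d1)]
#   d2 = [coart_measure(i, a) for i, a in zip(i_means_d2, a_means_d2)]
#   r, p = pearsonr(d1, d2)
```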
A related but more surprising outcome also seen in Table 2.1 is that Speakers 3,
10 and 11 all show apparently discontinuous coarticulatory effects; each of these three
speakers had a significant distance-2 outcome for one formant without a corresponding
distance-1 effect. Recasens (2002) examined “interruption events” such as these in
VCV sequences and following Fowler and Saltzman (1993), suggested that such
occurrences may be the result of a “fixed, long-term” planning strategy for the second
vowel already being executed by speakers during the first vowel. It should also be
noted that in all three of these cases, the trends were in the expected direction (see the
raw data in Appendix A), indicating that coarticulatory forces may have been at work
during the distance-2 vowel but were too weak to yield a statistically significant result.
Figure 2.8. Correlation between coarticulation production measures at distances 1 and
2. The correlation measure r = 0.48 (p<0.05). Normalized formant values were used.
Relevance of speaking rate
Although some researchers (e.g. Hussein, 1990) have suggested that speaking
rate may be related to coarticulatory tendency, an inspection of Table 2.1 shows that the
speakers in this study who coarticulated the most were not the fastest talkers, nor vice
versa. Although the slowest speakers (Speakers 4, 13 and 16) showed an absence of
significant effects at distances greater than 1, statistical testing for correlation between
speakers’ speech rate and normalized vowel-space distance between [i]- and [a]-colored
schwas found no significant effects in any of the three distance conditions.14 This result
complements the work of Hertrich and Ackermann (1995), who found that slower speech
was associated with less carryover coarticulation, but no significant difference in
anticipatory coarticulation, relative to more rapid speech.
Temporal extent of VV coarticulation
Fowler and Saltzman (1993) have suggested that long-distance coarticulation
effects can be considered “long-distance” only in terms of the number of intervening
segments, in that the time span across which such effects can occur is relatively narrow.
14. One reader has pointed out that this may be a case of a threshold effect, rather than an absence of an effect altogether. In other words, such an effect might be present at slow speaking rates, but not be apparent above a certain threshold rate, “perhaps having to do with the fact that one must enunciate to at least some extent to be understandable.”
This may in fact be the case, but if so, the upper limit they suggest (approximately 200-250 ms) seems low in light of the fact that the two speakers who coarticulated the most
in this study (Speakers 3 and 7) showed significant effects across time spans of well
over 300 ms. The temporal distance between Speaker 7’s distance-3 vowel offset and
context vowel onset over his 12 “key/car” utterances ranged from 298 to 377 ms and
averaged 333 ms; for Speaker 3 the distances involved are even greater (range = [301,
472]; mean = 384 ms).
2.2.2.4. Longer-distance effects
Remaining open is the question of whether VV coarticulatory effects at even
greater distances can occur with any substantial frequency. In related work, Heid and
Hawkins (2000) and West (1999) have found evidence of different resonances for [r]
compared to [l] across several syllables, manifested as lowered formants (F3 for West,
F2 + F3 + F4 for Heid & Hawkins), increased lip rounding, and high or back tongue
position for [r] contexts compared to [l] contexts. To investigate the possibility of such
extreme long-distance effects here, all of the vowels in the utterances of Speakers 3 and
7, who had already shown significant coarticulatory effects as far back as distance 3,
were analyzed and compared between the [i] and [a] contexts. The results are shown
below in Table 2.3, and appear to show some significant outcomes at distances 4 and 5.
However, the magnitude of the effects is consistently small. There are also a number of
discontinuities like those seen earlier for Speakers 3, 10 and 11, but occurring over
wider ranges and, unlike in those cases, with trends at closer distances not always in the
expected direction. To the extent that coarticulatory effects may have occurred over
such distances, they are clearly less robust than those reported for these speakers at
closer distances.
[Table 2.3 appears here. Rows: Speakers 3 and 7 (key/car and keep/cop contrasts) and Speaker 3 (peep/pop contrast); columns: significance outcomes for F1 and F2 at distances 7 (“it’s”), 6 (“fun”), 5 (“to”), 4 (“look”), 3 (“up”), 2 (“at”) and 1 (“a”).]
Table 2.3. Possible very-long-distance coarticulatory effects, Speakers 3 and 7, with
significance testing results between contexts indicated for target vowels at each
distance from 1 to 7 before the context vowel, with significant results shaded and
labeled, with * = p<0.05, ** = p<0.01 and *** = p<0.001.
2.3. Second production experiment: [i], [a], [u] and [æ] contexts
Because of the positive outcome of the production experiment just discussed,
which significant VV coarticulatory effects were evident for the [i] - [a] contrast at least
as far as three vowels back, a second, more comprehensive experiment was run, in
which more vowel contrasts were examined.
2.3.1. Methodology
2.3.1.1. Speakers
An additional 18 participants (6 female; ages ranging from 18 to 22, with mean
19.6 and SD 1.9) were recruited for the second production experiment. To avoid
confusion with earlier subjects, individuals in this group as well as those in later
chapters will be identified with indexing that increases rather than being “reset” for
later groups; the present group therefore consists of Speakers 21 through 38. All were
undergraduate students at the University of California at Davis who received course
credit for participating, were native speakers of American English and were uninformed
as to the purpose of the study. For comparison purposes, three of the speakers who had
coarticulated the most in the first experiment—Speakers 3, 5 and 7—also took part in
the second one, but their data was not used in the group analyses to be reported for this
second participant group.
As was the case with the first group of speakers, while only monolingual
speakers were sought, most subjects had been exposed to at least one other language as
students in a university with a foreign language course requirement. Of these, two felt
that they had acquired substantial knowledge of another language; since group results
did not change depending on whether they were excluded, their data was included.
2.3.1.2. Speech samples
The number of context vowels in the second experiment was increased to four,
corresponding to the four corners of the vowel quadrangle (i.e. the vowels [i], [a], [u]
and [æ]). In addition, recordings with [ʌ] as context vowel were made for comparison
purposes. As in the first study, each sentence was spoken six times by each speaker.
These sentences were:
“It’s fun to look up at a keep.”
“It’s fun to look up at a coop.”
“It’s fun to look up at a cap.”
“It’s fun to look up at a cop.”
“It’s fun to look up at a cup.”
The expected effects of each of the four corner vowels on schwa are indicated in
Table 2.4 and illustrated in Figure 2.9 below.
Vowel   High or Low?   Front or Back?   Effect on F1   Effect on F2
[i]     High           Front            Lowered        Raised
[a]     Low            Back             Raised         Lowered
[u]     High           Back             Lowered        Lowered
[æ]     Low            Front            Raised         Raised
Table 2.4. Expected coarticulatory influence of four vowels on nearby schwa.
Figure 2.9. The diagram shows the expected directional influence on schwa from
coarticulatory effects of nearby [i], [a], [u] or [æ].
In order to obtain baseline formant values for the four context vowels plus [ʌ],
each speaker was also recorded repeating each of the following sentences three times;
in these sentences the context vowels in the final word are preceded by (and therefore
coarticulated with) themselves:
“It’s fun to seek keeps.”
“It’s fun to sock cops.”
“It’s fun to sack caps.”
“It’s fun to sue coops.”
“It’s fun to suck cups.”
2.3.1.3. Recording and digitizing
These procedures were the same as in the first production experiment.
2.3.1.4. Measurements and data sample points
Recall that in the first experiment, formant measurements of target vowels were
taken at 25 ms before the vowel boundary (see Section 2.2.1.4), partly because of
concern that measurements made at the very end of the vowel might be compromised
by C-to-V effects of the immediately following consonant. In the end, the results for
those 20 speakers were such that these C-to-V effects did not appear to block long-distance VV effects, but rather seemed to be superimposed over them (recall Figures
2.5 and 2.6).
Therefore, for the second experiment, vowel boundaries were determined using
the same procedures as in the first experiment (see Figure 2.4), but measurements were
taken at the endpoint of each target vowel rather than 25 ms earlier. As a result,
influences related to VV coarticulation from the context vowel might be expected to be
greater than in the first experiment, though C-to-V effects from the following
consonant might also be expected to be stronger.
2.3.2. Results and discussion
2.3.2.1. Group results
Figure 2.10 below is a vowel-space plot showing normalized first and second
formant frequencies for each distance condition and in each context, averaged over the
group of 18 speakers. For the purposes of group analysis, data were excluded for the
three subjects from the first experiment (Speakers 3, 5 and 7) who had also provided
data for this experiment; this exclusion was made to avoid biasing the data for this new
group of speakers with data from speakers who had already been known to show strong
coarticulatory tendencies. As before, all group analyses to be given here were
performed using normalized formant values. Context and distance-1, -2 and -3 vowel
sets are labeled with progressively smaller typeface. As in the first production
experiment, increased distance from the context vowel is associated with progressively
reduced formant differences among the vowel contexts. Also, as was seen in the group
results for the first experiment, C-to-V effects seem to be superimposed over the VV
effects at each distance; as place of articulation changes from labial to alveolar to velar
at distance 3 (“up”), 2 (“at”) and 1 (“a keep/cop/coop/cap”), F2 of the target vowel sets
increases accordingly.
Figure 2.10. Context-vowel and distance-1, -2 and -3 target-vowel positions in
normalized vowel space, averaged over the 18 speakers. The averaged formant values
are marked by the line segment endpoints, not the labels. Context and distance-1, -2
and -3 vowel sets are labeled with progressively smaller text size and are seen toward
the left, center and right of the figure, respectively. Significance testing results are
given in the text below.
Four-vowel tests
To determine the extent to which the differences illustrated in Figure 2.10 are
significant, repeated-measures ANOVAs with context vowel as factor were performed
on the group dataset at each distance and for each formant. Only data for the four
corner vowels (i.e., excluding vowels coarticulated with [^]) were tested.
For both F1 and F2, there was a highly significant main effect of vowel at both
distance 1 (first formant: F(3,51)=37.9, p<0.001; second formant: F(3,51)=82.4,
p<0.001) and distance 2 (F(3,51)=14.9, p<0.001; F(3,51)=17.5, p<0.001). At distance
3, significant outcomes are seen for neither F1 (F(3,51)=2.24, p=0.09) nor F2
(F(3,51)=0.35, p=0.79).
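The design of these four-vowel tests can be made concrete with a small sketch. This is not the author's analysis code, and the toy data below are invented; it only illustrates how a one-way repeated-measures F statistic with the reported degrees of freedom, F(3, 51), arises from 18 subjects and 4 conditions.

```python
# One-way repeated-measures ANOVA (illustrative sketch; toy data, not the
# dissertation's). With n = 18 speakers and k = 4 context vowels, the
# degrees of freedom are (k - 1, (n - 1)(k - 1)) = (3, 51), as reported.

def rm_anova(data):
    """data[i][j] = measurement for subject i under condition j.
    Returns (F, df_conditions, df_error)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)   # between conditions
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)   # between subjects
    ss_err = ss_total - ss_cond - ss_subj                     # residual
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), df_cond, df_err

# Invented formant-like values for 18 subjects x 4 vowel contexts:
toy = [[(i * 7) % 5 + (i + j) % 2 + (3 if j == 0 else 0) for j in range(4)]
       for i in range(18)]
F, df1, df2 = rm_anova(toy)   # df1 = 3, df2 = 51
```

The p-value would then come from the F distribution with (3, 51) degrees of freedom (e.g., scipy.stats.f.sf).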
Next, repeated-measures ANOVAs were performed on the group dataset at each
distance and for each formant, but this time two sets of such ANOVAs were run, the
first with respect to F1 with vowel height as factor, and the second with respect to F2
with vowel frontness as factor. The first set of ANOVAs should indicate how
differently target vowels in the context of high vowels ([i] or [u]) were articulated from
those in the low-vowel ([a] or [æ]) context. Similarly, the second set of ANOVAs
indicates the relative amount of difference in articulation of target vowels at various
distances associated with front ([i], [æ]) versus back ([u], [a]) context vowels.
This testing found highly significant effects at distance 1 associated both with
height (F1) (F(1,17)=111.7, p<0.001) and frontness (F2) (F(1,17)=129.5, p<0.001).
Effects were also quite strong at distance 2 (for height (F1), F(1,17)=35.9, p<0.001; for
frontness (F2), F(1,17)=29.6, p<0.001). As was the case for the ANOVA testing with
vowel as factor, at distance 3 the effect associated with vowel height (F1) approaches
significance (F(1,17)=3.90, p=0.065), but there is not even a marginally significant
effect associated with frontness (F2) (F(1,17)=0.067, p=0.80).
Testing on individual vowel pairs
Finally, the formant differences of target vowels among the individual context
pairs were examined. Since there are six sets of vowel pairs associated with the four
corner vowels, six sets of repeated-measures ANOVAs were run; in each such set,
comparisons of normalized F1 and F2 values were made at each distance with vowel as
factor, where in each set the vowel factor had only two levels. The results are shown
below in Table 2.5, the final row of which gives results for the [i] - [a] contrast for the
entire set of 38 speakers who took part in either the first or second experiment. Only the
significance testing outcomes are shown in the table (with * = p<0.05, ** = p<0.01 and
*** = p<0.001); the numerical results are given in Appendix F.
[Table 2.5 layout: rows are the context-vowel contrasts [i]-[a], [a]-[u], [æ]-[u],
[a]-[æ], [i]-[æ] and [i]-[u], with a final row giving the [a]-[i] contrast for all 38
speakers; columns are F1 and F2 at each of distances 3, 2 and 1. The individual
significance markers cannot be reliably realigned from this extraction; see Appendix F
for the numerical results.]
Table 2.5. Results of ANOVA testing on first and second formants of target vowels,
with comparisons made between each vowel pair chosen from the vowels [a], [i], [u],
[æ]. The first six rows show results for the second production experiment. The bottom
row gives results for the [i] - [a] contrast for all of the 38 speakers who took part in
either production experiment. Significant results are shaded and labeled, with * =
p<0.05, ** = p<0.01 and *** = p<0.001.
Together with the outcome of the first experiment, the results presented here
provide further evidence that coarticulatory VV effects can extend across as much as
three vowels’ distance, at least for some speakers. Now, the coarticulatory tendencies
of individual participants in the second production study will be examined.
2.3.2.2. Individual results
A great deal more data was collected for this second group of study participants
than for the first; comparisons of outcomes just for the [i] and [a] contexts will be made
to begin with before more comprehensive results are given. Since these are the two
vowels most distant in formant space, one may expect that the strongest coarticulation
results will tend to be found in comparisons made between those contexts. Focus on the
[i] - [a] pair will also be necessary when comparisons are made with the outcome of the
first group of subjects (Speakers 1 through 20), or when data for both groups are pooled
for analyses of the collective behavior of all 38 subjects.
Initial significance testing was performed in the same way as for the first group, with
t-tests between vowel contexts for each speaker at each distance, except that in this case
there were more contexts to compare. For now, just the results for the [i] - [a] contrast
will be presented. To summarize the testing procedure again, for each speaker, one-
tailed heteroscedastic t-tests were run for F1 and F2 for each distance condition (1, 2 or
3) to determine if formant values differed significantly between the [i] and [a] contexts.
One-tailed tests were appropriate since it was predicted that [i]-colored vowels would
have lower F1 and higher F2 than [a]-colored vowels. The significance results for all 21
speakers—the 18 new subjects plus the three speakers who had coarticulated the most
in the first experiment—are summarized below in Table 2.6 (numerical data for all
speakers are given in Appendix C). To address the possibility that speaking rate might
be a relevant factor here, this was also measured for each speaker; these figures are
shown in the rightmost column of the table. Speech rate for a given speaker was
calculated by averaging, over that speaker’s [i]- and [a]-context utterances, the time
elapsing between the start of the distance-3 vowel and the start of the context vowel, a
span of six segments.
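The two per-speaker measures just described can be sketched as follows. The formant and duration values here are invented placeholders; only the procedures (Welch's unequal-variance t statistic with its Welch-Satterthwaite degrees of freedom, and the six-segment rate calculation) follow the description above.

```python
# Sketch of the per-speaker measures described above; the data values are
# invented, not the dissertation's. One-tailed p-values would come from the
# t distribution (e.g., scipy.stats.t.sf(t, df)).
from statistics import mean, variance

def welch_t(a, b):
    """Welch's heteroscedastic t-test of mean(a) > mean(b): returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)        # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / se2 ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def speech_rate(durations, n_segments=6):
    """Segments per second, given elapsed times (s) each spanning n_segments."""
    return n_segments / mean(durations)

# Six tokens per context, as in the study (each sentence spoken six times):
f1_a = [512, 505, 498, 520, 509, 515]   # hypothetical F1 values (Hz), [a] context
f1_i = [470, 482, 465, 475, 468, 478]   # hypothetical F1 values (Hz), [i] context
t, df = welch_t(f1_a, f1_i)             # predicted direction: t > 0
```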
[Table 2.6 layout: one row per speaker (21–38, then 3, 5 and 7), with significance
markers for F1 and F2 at distance 3 (“up”), distance 2 (“at”) and distance 1 (“a”);
the individual markers cannot be reliably realigned from this extraction (see
Appendix C for the numerical data). The speech-rate column is recoverable:

Speaker:        21    22    23    24    25    26    27    28    29    30    31
Rate (seg/s):   14.5  13.0  13.8  11.7  14.7  11.9  13.2  13.7  12.3  12.1  11.1

Speaker:        32    33    34    35    36    37    38    3     5     7
Rate (seg/s):   16.5  13.8  14.6  12.1  13.7  13.6  14.2  13.4  14.7  15.0]
Table 2.6. For each speaker, the significance testing outcomes of six t-tests are shown,
comparing formant frequency values of that speaker’s target vowels for the [i] vs. [a]
contexts, for each of F1 and F2 and for each distance condition. Significant results are
shaded and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001 (not Bonferroni
corrected). Also noted are marginal results, where + = p<0.10; a √ indicates a non-significant outcome in which averages were nevertheless in the expected direction (i.e.,
F1 greater for the [a] context than for the [i] context, and F2 lower). The rightmost
column shows each speaker’s rate of speech in segments per second.
As with the first experiment, we find here significant testing outcomes even as
far back as distance 3, with more and stronger coarticulatory effects generally seen at
nearer distances than at further ones. Again, much interspeaker variation is evident as
well: while there were no speakers in the new group of 18 who had a total absence of
significant effects in the [i] vs. [a] contexts, 3 (17%) showed effects as far as distance 1
but no further, 8 (44%) had effects as far as distance 2 but no further, and 4 (22%) had
significant outcomes as far as distance 3 (at Bonferroni-corrected p<0.000463, 13 of 18
speakers had significant effects at distance 1 for F1 or F2 or both, and 3 of 18 speakers
had significant effects at distance 2).
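The Bonferroni threshold quoted above follows from dividing the alpha level by the number of tests; a sketch of the arithmetic, assuming 18 speakers times 6 t-tests each (an assumption that reproduces the quoted value):

```python
# Bonferroni correction: divide alpha by the number of tests. Assuming
# 18 speakers x 6 t-tests each (2 formants x 3 distances) = 108 tests,
# which reproduces the threshold quoted above.
alpha = 0.05
n_tests = 18 * 6
threshold = alpha / n_tests
print(round(threshold, 6))   # 0.000463
```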
One notable difference from the outcome of the first group is the relatively large
number of significant effects for F1, even at distance 3. In contrast, fewer significant
results were found for Speakers 3, 5 and 7 here than in the first experiment. In
particular, no significant results for those speakers were seen here at distance 3, though
Speaker 7 did have a weakly significant effect for F2 there. Possible reasons for these
differences will be addressed later, in Section 2.3.2.5.
Despite the weaker outcomes for those particular speakers, distance-3 effects
were not uncommon in this group as a whole. The strongest coarticulatory effects seen
in this group of speakers were probably those of Speaker 37, who showed significant
effects at all three distances. Figure 2.11 below illustrates this speaker’s average
context- and distance-1, -2 and -3 target-vowel positions in vowel space, for all four
corner-vowel contexts, not just [i] and [a]. Context and distance-1, -2 and -3 vowel
pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or
“3,” respectively. Significance testing results for each distance condition and context
vowel pair are coded green for one of F1 or F2 significantly different, red for both, or
dotted black for neither. The outer four context vowels are joined by solid blue lines.
The context vowel [^] is also shown for reference purposes. Because this speaker, like
most of the others, produced the context vowel [u] as a diphthong, the vowel-space
positions of both the onset and offset of this vowel are shown; the former is labeled “u-” and the latter is simply given as “u.” Appendix E provides such a graph for each
speaker of the entire group of 21 speakers who participated in this experiment.
Figure 2.11. Subject 37’s coarticulation effects were probably the strongest of the
speakers who took part in the second production experiment, with significant
coarticulatory effects produced at all three distances. The graph shows this speaker’s
average context- and distance-1, -2 and -3 target-vowel positions in vowel space.
Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller
text size and an adjacent “C,” “1,” “2” or “3,” respectively. Significance testing results
for each distance condition and context vowel pair are coded green for one of F1 or F2
significantly different, red for both, or dotted black for neither. Context vowels are
joined by a blue line. The symbol “u-” indicates the onset of diphthongal context vowel
[u], while the “u” label indicates its offset. Except for those two labels, averaged
formant values are marked by the line segment endpoints, not the labels themselves.
The numerical data upon which Figure 2.11 is based are summarized below in
Tables 2.7 and 2.8. Table 2.7 shows Speaker 37’s average F1 and F2 values for each
distance condition and vowel context, including [^]. Standard deviations for each
formant at each distance across vowel contexts are also given. Table 2.8 shows how the
significance coding in Figure 2.11 was determined. Each of the values ranging from 0
to 1 in a cell of Table 2.8 is the probability outcome of a t-test comparing the six values
of that formant measured at that distance in each of the two contexts. Values under
0.05, indicating significance at that level, are colored orange. A blank cell indicates an
outcome in the contrary-to-expected direction (e.g. F1 higher for the [i] than [a]
context).
Context   Distance 3          Distance 2          Distance 1
          F1       F2         F1       F2         F1       F2
[a]       489      1178       532      1464       410      1686
[æ]       463      1161       522      1482       387      1907
[^]       457      1144       521      1513       392      1634
[i]       453      1149       439      1539       361      1956
[u]       454      1169       518      1478       386      1771
SDs       11.2     17.7       38.2     30.4       17.6     138
Table 2.7. Average F1 and F2 values for Speaker 37 at each distance and in each vowel
context, and average SDs for each formant at each distance. This speaker showed
strong coarticulatory effects at all three distances.
Given the number of t-tests performed in Table 2.8, some false positives are
likely; the question is whether the number of such results is significantly larger than
what one would expect from a random outcome. To permit this determination, the
numbers of significant testing outcomes in each distance condition for the p<0.05 and
p<0.01 levels of significance have been tallied in the bottom row of Table 2.8. For each
distance condition, there are two formants and ten context vowel pairs, giving a total of
twenty probability values. At the p<0.05 level of significance, we would expect about 1
(i.e. 20 times 0.05) spurious result at random, and performing an analysis using the
binomial distribution we find that 3 spurious results or fewer would be expected more
than 98% of the time. For the p<0.01 level, two or more such results would be expected
less than 2% of the time. Given the figures at the bottom of Table 2.8, which show that
Speaker 37’s significant
testing outcomes numbered much greater than these threshold values, we can be
confident that overall, the significant outcomes associated with this speaker, even at
distance 3, are not spurious.
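The binomial reasoning above can be checked directly; a minimal sketch using only the standard library (the count of 20 tests per distance and the alpha levels come from the text):

```python
# Expected false-positive counts under the binomial distribution, following
# the reasoning above: 20 tests per distance condition (2 formants x 10
# context pairs) at a given alpha level.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 20
print(binom_cdf(3, n, 0.05) > 0.98)       # True: 3 or fewer spurious results at p<.05
print(1 - binom_cdf(1, n, 0.01) < 0.02)   # True: 2+ spurious results at p<.01 are rare
```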
[Table 2.8 layout: one row per context pair ([i]-[a], [a]-[æ], [a]-[^], [a]-[u],
[æ]-[i], [æ]-[^], [æ]-[u], [i]-[^], [i]-[u], [^]-[u]), with p-values for F1 and F2 at
distances 3, 2 and 1; the individual p-values cannot be reliably realigned from this
extraction. The final-row tallies of significant results are: distance 3, 4 at p<.05
and 2 at p<.01; distance 2, 9 at p<.05 and 5 at p<.01; distance 1, 8 at p<.05 and 6 at
p<.01.]
Table 2.8. Level of significance associated with Speaker 37’s formant frequency
differences between contexts, for each of 10 context vowel pairs and 3 distance
conditions, for each of F1 and F2. A blank cell indicates an outcome in the contrary-to-expected direction (e.g. F1 higher for the [i] than [a] context). The final row is a tally of
significant results at the p<0.05 and p<0.01 levels of significance for each of the three
distance conditions. These tallied values are far greater than would be expected from
random false positives.
A final summary of Speaker 37’s results is given below in Table 2.9. The
notation used in this table is like that seen earlier, with * = p<0.05, ** = p<0.01, *** =
p<0.001, + = p<0.10; a √ indicates a non-significant outcome in which averages were
nevertheless in the expected direction. However, in this table the results for all ten
context vowel pairs are given, not only those for the [i] - [a] contrast. Also given in the
bottom row of each column, corresponding to one of F1 or F2 in a particular distance
condition, is a count of results significant at at least the p<0.05 level of significance.
Because there are ten values associated with each column, the expected number of false
positives per column is 0.5, with two or fewer such outcomes in any given column
being over 98% probable. Again, the results for this speaker are well above this
threshold.
[Table 2.9 layout: one row per context pair, as in Table 2.8, with significance markers
for F1 and F2 at distances 3, 2 and 1; the individual markers cannot be reliably
realigned from this extraction. The bottom-row counts of results significant at p<0.05
or stronger are: distance 3, F1 = 4 and F2 = 0; distance 2, F1 = 3 and F2 = 6;
distance 1, F1 = 2 and F2 = 6.]
Table 2.9. Summary of significance testing outcomes for Speaker 37 for each formant
at each distance and for each contrasting context-vowel pair, with * = p<0.05, ** =
p<0.01, *** = p<0.001, + = p<0.10, and a √ indicating a non-significant outcome in
which averages were nevertheless in the expected direction. The bottom row gives the
number of results in the column above which were significant at the p<0.05 level or
stronger.
Analyses like those presented for Speaker 37 in Tables 2.7, 2.8 and 2.9 are
given in Appendices C and D for all 21 speakers who took part in the second
production experiment. In order to highlight in brief the interspeaker variation seen in
these results, numerical data like those given in the bottom row of Table 2.9 for
Speaker 37 are shown below in Table 2.10 for each member of this group of 21 speakers,
representing the number out of the ten total context-vowel pairs for which statistical
testing yields an outcome significant at at least the p<0.05 level of confidence. As was
the case with the final row of Table 2.9, the expected value of each cell is 0.5, with
values above 1 expected less than 10% of the time, and values above 2 (shaded green in
the table) expected less than 1 percent of the time. Given this, the values shown in
Table 2.10 are clearly too numerous to have been obtained through chance outcomes,
highlighting the group results seen earlier. The data in the table show clearly that most
speakers produced significant distance-1 differences not only between the [a] and [i]
contexts, but for most of the distinct context vowel pairs. At distances 2 and 3, the
effects are not as ubiquitous as for distance 1 but are again well in excess of chance
levels.
            Distance 3      Distance 2      Distance 1
Speaker     F1    F2        F1    F2        F1    F2
21          0     0         2     2         3     8
22          0     0         2     0         3     8
23          1     0         6     4         6     9
24          0     2         2     3         4     8
25          1     1         1     2         6     0
26          0     2         0     0         3     6
27          4     0         2     4         3     6
28          1     0         3     4         6     8
29          0     0         1     4         2     8
30          4     1         1     2         6     6
31          0     0         0     5         3     5
32          1     0         0     3         6     7
33          0     2         3     4         3     7
34          0     1         3     4         5     9
35          0     0         2     2         5     7
36          1     1         7     0         0     6
37          4     0         3     6         2     6
38          1     0         5     1         2     4
3           2     0         3     5         5     9
5           0     3         1     6         4     7
7           0     1         1     4         6     8
Table 2.10. For each speaker, for each formant at each distance, the table shows the
number of vowel-pair contrasts for which t-testing found a result significant at the
p<0.05 level or stronger. Values above 2 are expected less than 1% of the time and are
highlighted in green; for comparison purposes, values of 2 are colored yellow and
values under 2 are left unshaded.
2.3.2.3. Follow-up tests
Coarticulation at various distances: Correlation results
Recall that for the first group of participants (Subjects 1 to 20), the strength of
coarticulatory effects was significantly correlated when compared between distance
conditions 1 and 2. The results for the second group (Subjects 21 to 38) are weaker.
The correlation between these subjects’ formant differences between the [i] and [a]
vowel contexts was not significant between any two distance conditions (distances 1
and 2 (r = -0.05, p=0.85), distances 2 and 3 (r = 0.30, p=0.23), and distances 1 and 3 (r
= -0.01, p=0.96). However, pooling the data for the entire group of 38 subjects who
participated in either experiment, one finds significant correlation between these
coarticulation production measures between distances 1 and 2 (r = 0.35, p<0.05) and
between distances 1 and 3 (r = 0.37, p<0.05); the latter result is illustrated below in
Figure 2.12. Between distances 2 and 3 the correlation is positive but not significant (r
= 0.07, p=0.68).
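The correlation measure used in these comparisons is Pearson's r over the per-speaker values, with significance assessed via the t distribution on n - 2 degrees of freedom; a self-contained sketch (the paired example values are invented):

```python
# Pearson correlation and its t statistic, as used for the per-speaker
# coarticulation measures above. Illustrative sketch; the paired values
# below are invented stand-ins, not the 38 speakers' actual measures.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_stat(r, n):
    """t statistic for testing r against zero, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# With n = 38 speakers, r = 0.07 gives t of about 0.42, clearly
# non-significant, consistent with the reported p = 0.68.
dist1 = [0.9, 1.4, 0.3, 1.1, 0.7, 1.6, 0.5, 1.2]   # invented measures
dist3 = [0.4, 0.8, 0.1, 0.6, 0.3, 0.9, 0.2, 0.7]
r = pearson_r(dist1, dist3)
```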
Figure 2.12. Correlation between coarticulation production measures at distances 1 and
3 for all of the speakers who took part in either production experiment. The
correlation measure r = 0.37 (p<0.05). Normalized formant values were used.
The lack of strong correlation results for coarticulation production at various
distances for the second group in particular is unexpected, since it seems reasonable to
suppose that speakers who coarticulate at greater distances would also do so at nearer
distances, as was true for the first group of speakers. It may be the case that the slightly
different measuring position used for the later subjects’ target vowel formant
frequencies (immediately adjacent to the following consonant, rather than 25 ms earlier
as was done for the first group of subjects) results in more substantial interference from
the coarticulatory effects of that consonant on the target vowel being measured.
Although we have seen for both groups (see Figures 2.5 and 2.10) that the C-to-V
effects seem to overlay the longer-distance VV effects rather than completely
supplanting them, perhaps this is too simple an assumption. Instead of simple
superposition, the C-to-V and VV effects may interact in a more complex manner than
is initially apparent, perhaps differing substantially between speakers and so weakening
the correlation results.
Relevance of speaking rate
It was seen earlier that for the first group of subjects, speaking rate and
coarticulation production were not significantly correlated at any distance. This appears
to be so for the second group of subjects as well (for distance 1, r = -0.30, p=0.23; for
distance 2, r = 0.27, p=0.29; for distance 3, r = 0.12, p=0.63). Use of other production
measures, such as raw or logarithmic values, or considering only F1 or only F2, did not
produce stronger results. This was found to be so of the datasets obtained from the first
group, the second group, and the pooled dataset for all 38 subjects.
It should be noted that the present results do not conflict with the idea that
speech rate and coarticulation might be correlated at the individual speaker level. In
other words, a person’s typical speech rate might not predict his or her coarticulatory
tendency relative to other speakers (as the results of the present study suggest, at least
for anticipatory VV effects), but a particular speaker may coarticulate more or less
depending on how fast he or she is speaking compared to his or her usual speech rate.
The current project was not intended to address this particular possibility, either for
spoken language or sign language, although other researchers have looked into the
matter; for example, Mauk (2003) explored issues related to undershoot by
investigating Location-to-Location coarticulation in ASL. Mauk found that signers
coarticulated more in faster signing conditions than in slower ones. On the other hand,
Matthies, Perrier, Perkell and Zandipour (2001) investigated the issue in spoken
language and obtained a null or at best weak result.
2.3.2.4. Longer-distance effects
Recall that each of the two speakers in the first production study who produced
significant distance-3 effects was also found to have possible, though inconsistent and
weaker, effects at even greater distances. Since a number of speakers in the second
study (Speakers 27, 30, 37 and 38) also produced significant distance-3 effects, a
follow-up on these subjects was run to see if they too might produce very long-distance
coarticulation. To focus the discussion and to facilitate comparison with the similar
follow-up to the first experiment, this investigation was limited to the [i] - [a] contrast.
Table 2.11 below shows the results. Only one outcome beyond distance 3 was
significant. Given the number of speakers and contexts that were considered, this cannot be considered
strong evidence of effects at distances greater than 3. Recall that a similar follow-up
was done after the first production experiment, and also yielded some positive but weak
results at distances 4 and 5. While the lack of a clear positive outcome here cannot rule
out the possibility that distance-4 or longer-distance effects might be found in other
speakers or contexts, the rapid drop-off seen in both experiments in the range of
distance 3 does suggest that this may be an upper limit, at least for anticipatory VV
effects. It is unlikely that many environments would be more conducive to such effects
than those used here, which contain multiple consecutive schwas separated by single
consonants.
[Table 2.11 layout: one row per speaker (27, 30, 37 and 38), with significance markers
for F1 and F2 for the keep/cop ([i] vs. [a]) contrast at each distance from 7 (“it’s”),
6 (“fun”), 5 (“to”), 4 (“look”), 3 (“up”) and 2 (“at”) down to 1 (“a”); the individual
markers cannot be reliably realigned from this extraction.]
Table 2.11. Results of testing for coarticulatory effects at very long distances, for four
speakers from the second production study. Significant t-test outcomes between the [i]
and [a] contexts are indicated for target vowels at each distance from 1 to 7 before the
context vowel, with significant results shaded and labeled, with * = p<0.05, ** =
p<0.01 and *** = p<0.001.
2.3.2.5. Further comparison of the two experiments
Earlier it was noted that the coarticulatory behavior of the subjects who
performed both production experiments, Speakers 3, 5 and 7, seemed to differ
substantially between the two tasks, despite the conditions for both being very nearly
the same. Generally, the results for these subjects were weaker in the second
experiment. The two main differences between the first and second tasks and their
analysis were (1) the point at which formant measurements were taken (at the end of
the vowel, or 25 ms earlier), and (2) the different words used to create the [i] and [a]
contexts (“key/car” or “keep/cop”). To determine whether one or both of these might be
responsible for the differences in testing outcomes, a comparison was made
for these speakers of significance testing results using each of the two measuring
timepoints with each of the two sets of sentence recordings (ending in “key/car” or in
“keep/cop”). A summary of significance testing results is given below in Table 2.12.
The table gives results for all four possible combinations of context word pair and
formant measuring timepoint, two of which were utilized in the two production
experiments—the vowel-endpoint-minus-25-ms measuring point for the “key/car”
contrast (used in the first experiment, shown in the first three rows of the table), and the
vowel endpoint for the “keep/cop” contrast (used in the second experiment, given in the
last three rows). All new measurements were made using the same recordings of each
speaker obtained earlier for the two experiments; these recordings numbered 24 for
each of the three speakers (i.e. 4 possible last words of a sentence, with each sentence
repeated six times).
[Table 2.12 layout: for each of Speakers 3, 5 and 7, significance markers for F1 and F2
at distances 3, 2 and 1, under each of the four combinations of context word pair
(“key/car” or “keep/cop”) and measuring timepoint (vowel endpoint or endpoint minus
25 ms); the individual markers cannot be reliably realigned from this extraction.]
Table 2.12. Summary of significance testing outcomes between [i] and [a] contexts for
each distance, for the three speakers who took part in both production experiments.
Outcomes using all four possible combinations of context word pair (“key/car” versus
“keep/cop”) and formant measurement point (target vowel endpoint or target vowel
endpoint minus 25 ms) are given.
Other than at distance 1, where strong effects seem abundant, the data given in
Table 2.12 indicate that (1) significant testing outcomes between vowel contexts are
fewer when the formant measurements are made at the very end of the vowel and (2)
the “keep/cop” contrast may be associated with fewer significant outcomes than the
“key/car” contrast. The data in Table 2.13 below supports these observations.
First, formant comparisons between contexts at distance 2 or 3 yield results that
are stronger overall when formant measurements are taken at the earlier measuring
point. One possible explanation for this, mentioned earlier, is that VV effects closer to
the end of the vowel are more strongly influenced by the coarticulatory effects on the
target vowel of the immediately following consonant; in fact, given the overlap of
gestural timing expected of adjacent segments, the consonant has presumably already
“started” by the time the vowel reaches its acoustic endpoint.
Second, significance testing to compare the “key/car” and “keep/cop” contrasts
also reveals a difference: results for the “key/car” contrast tend to be stronger. One
possible explanation of this is the same as the reason that open-monosyllabic words (or
nearly open, depending on how one treats the rhotic in “car”) were chosen as context
words in the first production experiment to begin with. It was expected that the vowels
in such words would be held longer and that this would increase their tendency to act
on the preceding target vowels, relative to context vowels in closed syllables. For the
second experiment, in which all four corner vowels plus [^] were to be used as context
vowels, this was not an option since English [^] and [æ] do not occur in open syllables.
So, while use of the closed-syllable [kVp] pattern permitted a minimal quintuple to be
used, it may also have resulted in an overall tendency toward weaker VV coarticulation
than might be expected with open-syllable words.
The final three rows of Table 2.12, which give results when both the “keep/cop”
contrast and the end-of-vowel measuring procedure are used, seem to show the
cumulative weakening effects of both, such as a complete absence of distance-3 effects
for these speakers and relatively few effects at distance 2. Since this is the methodology
used in the second production experiment, it is worth noting that several other speakers
in that experiment did have significant distance-3 effects. It is possible that those
speakers would have coarticulated even more strongly in other contexts. Similarly, the
issue of which portion of a vowel may most reliably undergo measurable
coarticulatory effects, or whether there is consistency on this point among different
contexts or different speakers, is beyond the scope of this project.
[Table 2.13 layout: summary significance markers at distances 3 and 2 for each level of
the two factors, measuring point (“V end minus 25 ms” vs. “V end”) and context pair
(“key/car” vs. “keep/cop”); the individual markers cannot be reliably realigned from
this extraction.]
Table 2.13. Summary of t-test outcomes comparing two factors: measuring point within
the target vowel (end of vowel or 25 ms earlier) and context word pair (“key/car” or
“keep/cop”). The “V end minus 25 ms” and “key/car” comparisons yield generally
stronger results than the others.
2.4. Chapter conclusion
The studies presented here provide strong evidence that many if not most
speakers produce a significant amount of anticipatory VV coarticulation in at least
some contexts, and that for some speakers, such effects can extend across several
intervening segments. As has been the case in previous coarticulation studies,
substantial variability between speakers was also found. While long- and short-distance
VV coarticulatory tendencies were correlated overall, speaking rate and tendency to
coarticulate were not. The effects of C-to-V and VV coarticulation were seen
simultaneously, but it appeared that the former may have caused an attenuation of the
latter at close range, at least for some speakers. Whether such effects are more-or-less
additive in general or interact in a more complicated fashion has yet to be determined.
Long-distance effects like those found for several of the speakers in these
experiments have significant implications for models of speech production, as will be
discussed in Chapter 7. How relevant they are for speech processing or historical
change depends in large part on whether listeners can hear them. The remaining
spoken-language experiments that were conducted as part of this project, which are
presented in Chapters 3 and 4, examine the perceptibility of VV coarticulatory effects.
CHAPTER 3 — COARTICULATION PERCEPTION IN ENGLISH:
BEHAVIORAL STUDY
3.1. Introduction
In an early study, Lehiste and Shockey (1972) performed a perception
experiment in which listeners heard recordings of VCV sequences from which
one of the vowels, together with the adjacent half of the consonant, had been excised,
and were asked what they thought the missing vowel was. Results were no better than
chance. However, studies using other methods (e.g. Fowler & Smith, 1986; Beddor et
al., 2002) have found that VV coarticulatory effects are in fact perceptible to listeners.
For example, Beddor et al. (2002) have found that listeners are able to “compensate”
for coarticulation effects, in the sense that they can shift target vowel category
boundaries when perceptually necessary, but to a great extent this is true only of effects
typical of the listener’s L1. This is consistent with results of Fowler (1981), who found
that listeners were most sensitive to differences resulting from coarticulation behaviors
most similar to their own. In that study, identical vowel sounds were often perceived as
different in various flanking vowel contexts, and acoustically distinct versions of a
given vowel were sometimes heard as being the same in appropriate flanking vowel
contexts. As Fowler (1981) has suggested, the technique used in the Lehiste and
Shockey (1972) study may simply have been too insensitive to allow accurate
measurement of perceptual effects. In the current study, a modified version of their
technique is used, with the following reasoning.
Ohala (1981) has suggested that acoustic byproducts of physiological linguistic
processes may sometimes be perceived by listeners as grammatically important
information, and that this may ultimately lead to language change. For example, in a
vowel immediately following a voiceless consonant, the voicing fundamental frequency (f0)
tends to start off somewhat high, and then fall rapidly back to normal, while after a
voiced consonant, the opposite occurs (Ohala, 1974, 1981). The reasons for this are not
well-understood but appear to be physiologically based. Ohala has suggested that in
some cases, this phenomenon could be the origin of tonal contrasts. Along the same
lines, vowel harmony may sometimes be the outcome of perceptible vowel-to-vowel
coarticulation that has become grammaticalized (Ohala, 1994). Przezdziecki (2000)
found evidence for such a connection between vowel harmony and VV coarticulation in
his study of three Yoruba dialects, one of which has vowel harmony and the other two
of which do not. Przezdziecki suspected that coarticulation patterns in the two dialects
without vowel harmony might be similar in kind but different in degree from those in
the third dialect, whose harmony patterns, he hypothesized, had originated in a similar
way but had since become grammaticalized. Analysis of vowel formant data from the
three dialects offered support for his hypothesis.
If we suppose that these effects might sometimes be the origin of vowel
harmony, they would almost certainly have to be perceptible to listeners, at least in
some environments. An interesting point that should be made here is that the language-change hypothesis does not require that all speakers coarticulate heavily, nor that all
listeners perceive such effects readily. Instead, if the hypothesis holds, only small
minorities of such speakers and listeners would suffice to get the process started, as the
graph shown below in Figure 3.1 illustrates. The graph shows a logistic or “S” curve,
used to describe change in many contexts, including population studies, business, and
historical linguistics (e.g. see Aitchison, 1981).
Figure 3.1. A logistic or “S” curve can be used to characterize the spread of a change
through a linguistic community. During the first phase, only a few people have adopted
the change. This is followed by a second phase of rapid spread of the change
throughout the population, and finally the third stage, during which the few remaining
holdouts either adopt the change or die off.
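The S-shaped trajectory in Figure 3.1 is the standard logistic function. As a minimal sketch, the growth rate k and midpoint t0 below are illustrative values only, not estimates from any linguistic data:

```python
import math

def adopters_fraction(t, k=1.0, t0=0.0):
    """Fraction of the community that has adopted the change at time t."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

# Slow start, rapid mid-phase spread, then saturation among the last holdouts:
print([round(adopters_fraction(t), 3) for t in (-6, -2, 0, 2, 6)])
# -> [0.002, 0.119, 0.5, 0.881, 0.998]
```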
The language-change hypothesis implies that the spread of coarticulation-related change could occur because listeners who are particularly sensitive even to
relatively weak coarticulatory signals would in turn retransmit those signals in a
stronger form. This also leads to the intriguing question, addressed later in the present
study, of whether listeners who are especially sensitive to coarticulation might also tend
to coarticulate more. The present research was inspired by the idea that in cases where
coarticulatory effects are particularly strong, a different result from Lehiste and
Shockey’s (1972) might be obtained, and that this result would still have important
implications, as just discussed. This approach led to positive results in Grosvald (2006),
where many listeners were able to determine the final vowel in VCV sequences, where
C was [k] or [p] and V was [a] or [i], when hearing only the initial vowel. Following
this approach, the current perception study used only recordings from speakers whose
VV coarticulation patterns were particularly strong.
3.2. Methodology
3.2.1. Listeners
Out of the 38 subjects who participated in the production experiments, the first
seven were those whose recordings provided the raw material for this perception study;
no perception data were obtained from them. Subjects 18, 19 and 20 took part in a pilot
ERP study for which no responses were necessary (the ERP experiment will be
discussed in Chapter 4). Therefore, a total of 28 subjects—Speakers 8 through 17 and
21 through 38 of the production study—provided data for the perception experiment to
be described here. Twelve were female and sixteen were male; ages ranged from 18 to 24
and averaged 19.4, with an SD of 1.63. Table 3.1 below summarizes the information
just given. All perception study subjects were undergraduate students at the University
of California at Davis who received course credit for participating.
Subjects       Production data            Perception data
1-7, 18-20     [a]-[i] contexts           No
8-17           [a]-[i] contexts           Behavioral
21-38          [a]-[i]-[æ]-[u] contexts   Behavioral and ERP
Table 3.1. The scope of the study expanded as the number of subjects increased. Early
subjects (1-7) performed only the [a]-[i] production task. The next group (8-17)
performed a behavioral perception task as well, described in this chapter. The final
group (21-38) performed an expanded production task with several vowel contrasts and
also provided ERP data, to be discussed in Chapter 4.
3.2.2. Creation of stimuli for perception experiment
Recordings obtained from the first seven speakers in the production experiment
were used to create stimuli for subsequent subjects to respond to in this perception
study. Although synthesized stimuli might seem preferable, their
creation would require specific decisions about the kinds of coarticulation that can or
cannot occur at various distances, something that is simply not yet well established.
While the aim was to determine how perceptible long-distance
coarticulation effects could be, not all speakers in the initial production experiment had
produced such effects, so an appropriate subset of the recordings had to be chosen. The
basic approach taken here was to use typical tokens from speakers who had
coarticulated more strongly than average. This should not render the obtained
results irrelevant, since the language-change hypothesis described earlier requires only that
some listeners be sensitive to some speakers’ coarticulatory effects. The participants whose
schwa tokens were chosen for use here, Speakers 3 and 5, were two whose results were
among the strongest from the seven subjects who were initially recorded. In order to err
somewhat on the conservative side, recordings from the speaker who coarticulated the
most, Speaker 7, were not used, in case the results for that individual were truly
exceptional.
The tokens needed were [i]- and [a]-colored schwas in each of the distance
conditions 1, 2 and 3. Since individual recordings might have quirks that listeners
could use to distinguish them independently of vowel quality, four
recordings of schwa in each distance condition and vowel context were selected, to be
presented interchangeably during each presentation of that vowel type. Four recordings
from Speaker 5 were used for each context ([i] and [a]) at each of distances 1 and 2, while
four of Speaker 3’s recordings were taken for each context for the distance-3
condition.[15] Therefore, the total number of tokens used was
2 ([i] vs. [a] context) * 3 (distance-1, -2 or -3 condition) * 4 copies of each = 24.
Because speakers had repeated each sentence six times, six tokens for each
context and distance condition were available, of which four were to be chosen as
stimuli. The choice of which four to use was made in a principled manner. First, for
each context and distance condition, average F1 and F2 of the corresponding six tokens
were computed, defining a center point in vowel space for that group of six tokens.
[15] Recordings of different speakers were used, just as multiple tokens in each condition were used, for comparable reasons: so that in the event significant results were obtained, it would be much less likely that this was due to listeners relying on the characteristics of a particular speaker or a particular recording.
Next, the Euclidean distance from that center point to each token was computed, and
the two outliers—those whose distance from the center point was greatest—were
rejected; the remaining four tokens were used in the experiment. Consequently, it was
not the most extremely coarticulated tokens that were used, but rather the most
typically coarticulated ones. These tokens were then normalized (re-synthesized) in Praat for duration,
amplitude and f0, according to the values shown below in Table 3.2.[16]
Distance condition   Duration (ms)   Amplitude (dB)   f0 (Hz)
1                    65-70           70               120
2                    55-60           70               150
3                    75-80           70               200

Table 3.2. The duration, amplitude and f0 values used to normalize the tokens used in
each of distance conditions 1, 2 and 3.
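The centroid-and-outlier selection procedure described above can be sketched as follows; the formant values here are invented for illustration and are not taken from the study’s recordings.

```python
def select_typical_tokens(tokens, keep=4):
    """Keep the tokens nearest the (F1, F2) centroid, rejecting the outliers."""
    n = len(tokens)
    cf1 = sum(f1 for f1, _ in tokens) / n  # centroid F1
    cf2 = sum(f2 for _, f2 in tokens) / n  # centroid F2
    dist = lambda t: ((t[0] - cf1) ** 2 + (t[1] - cf2) ** 2) ** 0.5
    return sorted(tokens, key=dist)[:keep]

# Six hypothetical (F1, F2) measurements in Hz for one condition:
six_tokens = [(520, 1450), (500, 1500), (540, 1480),
              (505, 1900), (660, 1470), (515, 1510)]
print(select_typical_tokens(six_tokens))  # the two farthest tokens are dropped
```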
3.2.3. Task
All perception study subjects began by performing the production task
discussed in Chapter 2. Afterwards, they were given a brief introduction to the purpose
of this research. They were told that language sounds can affect each other over some
distance, that people can sometimes detect this, and that they were about to begin a task
in which their own perceptual abilities were to be tested: they would hear vowel sounds
taken from sentences like the ones they had just been saying, with some of these vowels
sounding more like [i] than the others. So that they would not be discouraged by the
more difficult contrasts, they were told that subjects in such experiments sometimes say
[16] Statistical testing on the formant values of these standardized tokens confirmed that in terms of their distribution in formant space in the [i] versus [a] contexts, the standardized token sets were not more widely spaced (hence more easily distinguishable by listeners) than the originals.
afterwards that they felt they were answering mostly at random when the contrasts were
very subtle, but often turn out to have performed at significantly better-than-chance
levels. (This turned out to be the case in the present study as well.)
Subjects were then seated in a small (approx. 10 feet by 12 feet) sound-attenuated room in a comfortable chair facing a high-quality loudspeaker (Epos, Model
ELS-3C) placed 36 inches away on a table 26 inches high. The stimuli (stored as .wav
files) were delivered using a program created by the author using Presentation software
(Neurobehavioral Systems), which also recorded subject responses. The tokens had
been standardized in Praat at 70 dB (as discussed earlier) and were delivered at this
amplitude, as verified by measurement on a sound level meter (Radio Shack, Model 33-2055). To make their responses, subjects used a keyboard that was placed on their lap
or in front of them on the table, whichever they felt was more comfortable. Free-field
presentation was used because data were also being collected from some subjects for a
related study involving event-related potentials (ERPs; see Chapter 4).
All subjects were given a very easy warm-up task about one minute long, during
which easily distinguishable [i] and [a] sounds (not schwas) were played at the same
rate and ratio as their counterparts in the actual task. Subjects were told to hit a
response button when they heard a sound like “[i].” After completing this warm-up,
they were told that the actual task would be the same in terms of pacing and goal
(responding to the [i]-like sounds), but more challenging. Impressionistically, [i]-colored schwas do have a noticeably [i]-like quality to them, particularly for distance 1
and to some extent for distance 2; participants’ feedback as well as the results to be
presented here indicate that subjects understood the task once they completed the
warm-up.
Figure 3.2. Design of the perception task. Each block consisted of 40 consecutive
cycles of eight vowels each, with one randomly-placed [i]-colored schwa per cycle.
Figure 3.2 above illustrates the organization of the perception experiment,
which consisted of three blocks, with one block for each distance condition 1, 2 and 3,
in that order. This sequence (as opposed to random order) was chosen so that each
subject would begin with relatively easy discriminations, which it was hoped would
keep them from being discouraged as they then progressed to the more difficult
contrasts. Each block consisted of 40 cycles, each of which consisted in turn of eight
consecutive schwa tokens, one of which was [i]-colored and the other seven of which
were [a]-colored. Therefore, to perform with 100% accuracy, a subject would need to
respond 40 times per block, by correctly responding to the one [i]-colored schwa in
each of the 40 cycles in that block. The [i]-colored tokens were randomly positioned
between the second and eighth slot in each cycle, so that such tokens never occurred
consecutively.
The interstimulus interval (ISI) varied randomly between 1200 and 1400 ms,
which provided a reasonable amount of time for subjects to respond when they thought
they had just heard an [i]-colored vowel.[17] The ISI between each cycle of eight stimuli
was somewhat longer, being set randomly between 2800 and 3100 ms. (These varying
ISIs were required for ERP data collection; see Chapter 4.) Each cycle of eight vowels
therefore lasted approximately 15 seconds, and each block of 40 cycles lasted about 10
minutes. Subjects were not told about the structure of cycles within blocks, but having
performed the warm-up task, they had a sense of how often the [i]-colored schwas
would tend to occur. Participants could choose to take short breaks of one to two
minutes between blocks if they wished, or could proceed straight through the whole
experiment.
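The cycle-and-block structure just described can be sketched as follows (a schematic of the design only, not the actual Presentation script):

```python
import random

def make_block(n_cycles=40, cycle_len=8):
    """One block: 40 cycles of eight schwas, one [i]-colored token per cycle."""
    block = []
    for _ in range(n_cycles):
        cycle = ["a"] * cycle_len
        # The [i]-colored token goes in slots 2-8 (indices 1-7), so it never
        # begins a cycle and two such tokens can never occur consecutively.
        cycle[random.randint(1, cycle_len - 1)] = "i"
        block.extend(cycle)
    return block

block = make_block()
print(len(block), block.count("i"))  # 320 stimuli per block, 40 of them targets
```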
3.3. Results and discussion
3.3.1. Perception measure
For an analysis of the results of the perception experiment, the raw scores are
not an appropriate measure, and a statistic from signal detection theory called d’ (“d-prime”) will be used instead (see Macmillan & Creelman, 1991). The reasoning behind
[17] Since both stimulus and response timing were recorded by the stimulus delivery software, it was straightforward to assign each response to a vowel stimulus, namely the immediately preceding one. While it is likely that subjects made occasional “late” responses (i.e., a response intended for one stimulus delayed until after presentation of the subsequent stimulus), participants’ feedback indicated that they adjusted readily to the rhythm of the task.
the idea that raw scores are not the best measure can be illustrated with a simple
example. If a subject were to answer “i” for all items, the overall score would be 12.5%
since only 1/8 of the tokens were [i]-colored. On the other hand, answering “a” for all
items would be equally uninspired but would now result in fully 7/8 = 87.5% correct
overall. In less extreme situations, more subtle problems associated with guessing or
bias could also be overlooked.
To approach the analysis correctly, we can consider the subjects’ task to be a
signal-detection effort. All tokens heard by a given subject were schwa sounds, but in
1/8 of them, there was coarticulatory [i] coloration; this is the “signal” that the subject
was trying to detect. In this context, a successfully reported [i] item is referred to as a
“hit,” a successfully reported [a] item is a “correct rejection,” a wrong “i” answer is a
“false alarm,” and a wrong “a” answer is a “miss.” The d’ statistic is the difference
between a subject’s normalized hit and false-alarm rates:
d’ = z(hits / trials with signal present) - z(false alarms / trials with signal absent).
For rates of 0 or 1, z is not defined, so the values 0.01 and 0.99 are substituted.
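This computation can be sketched directly, using the task’s 40 signal and 280 non-signal trials per block (an illustration, not the analysis script actually used):

```python
from statistics import NormalDist

def d_prime(hits, signal_trials, false_alarms, noise_trials):
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    def rate(count, trials):
        # z is undefined at rates of 0 or 1, so substitute 0.01 and 0.99
        return min(max(count / trials, 0.01), 0.99)
    return z(rate(hits, signal_trials)) - z(rate(false_alarms, noise_trials))

print(round(d_prime(40, 40, 0, 280), 2))    # perfect performance: 4.65
print(round(d_prime(40, 40, 280, 280), 2))  # answering "i" every time: 0.0
print(round(d_prime(0, 40, 0, 280), 2))     # answering "a" every time: 0.0
```

The last two lines illustrate the earlier point: uniformly answering “i” or “a” yields very different raw scores (12.5% versus 87.5%) but the same zero sensitivity.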
Given this, possible values of d’ range from -4.65 to +4.65, with an expected value of 0
if the subject has zero sensitivity to the contrast in question. Scores in the vicinity of 1
or higher generally turn out to be significant at the p<0.05 level. The variance of d’ is
given by the formula
Var(d’) = H(1-H)/(N_H·[φ(H)]²) + F(1-F)/(N_F·[φ(F)]²),
where H, F, N_H, N_F and φ are the hit rate, false-alarm rate, number of “i” trials, number
of “a” trials, and the probability density function of the normal distribution (evaluated at the z-score corresponding to each rate), respectively
(Gourevitch & Galanter, 1967). Using this, a confidence interval for d’ can be
determined for each subject. If the lower endpoint of the interval is greater than zero,
one can be confident (to the chosen degree of significance) that the subject has some
sensitivity to the contrast. Note that similar d’ scores can arise from different
distributions of hits and false alarms, and since both rates enter into the computation of
the variance of d’, scores in the same range can have different significance-testing
outcomes.
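The variance formula and interval check can be sketched the same way; the rates below are invented, and φ is taken to be the normal density at the z-score corresponding to each rate, following Gourevitch & Galanter:

```python
from statistics import NormalDist

_nd = NormalDist()

def d_prime_variance(h, f, n_h, n_f):
    """Variance of d' from hit rate h and false-alarm rate f."""
    phi = lambda p: _nd.pdf(_nd.inv_cdf(p))  # normal density at z(p)
    return h * (1 - h) / (n_h * phi(h) ** 2) + f * (1 - f) / (n_f * phi(f) ** 2)

def ci_lower_95(h, f, n_h, n_f):
    """Lower endpoint of a 95% confidence interval for d'."""
    d = _nd.inv_cdf(h) - _nd.inv_cdf(f)
    return d - 1.96 * d_prime_variance(h, f, n_h, n_f) ** 0.5

# A hypothetical listener with 20/40 hits and 28/280 false alarms in one block:
print(round(ci_lower_95(0.5, 0.1, 40, 280), 2))  # positive, hence significant
```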
3.3.2. Group results
Both individual and group data results are given in terms of the d’ statistic. For
each of distance conditions 1, 2 and 3, results were significant for the entire set of study
participants (d’ = 4.04, 1.03 and 0.37 respectively, all p’s<0.001). When the two groups
of subjects who provided perception data are considered separately, the results remain
significant for each group for each distance condition (see Table 3.1 above). For
simplicity of presentation, the details of these outcomes are presented in the following
section, in which the outcomes for each subject are also given.
3.3.3. Individual results
As was mentioned earlier, the later group of subjects (21 through 38)
participated in two perception experiments—the behavioral one being described now
and an ERP study to be discussed in Chapter 4. Because of the nature of the ERP study,
that study was performed first, followed by this one. Although the ERP experiment did
not require responses of any kind, subjects were exposed to the same stimuli used in
this behavioral study. Because of this extra exposure to the stimuli that the initial group
of subjects did not have, it is possible that subjects in the later group had an added
opportunity to process these sounds to some extent and hence may have performed the
task differently from the subjects in the initial group. Therefore, for each distance
condition, the d’ scores of each group are presented and averaged separately in Table
3.3 below; overall average scores for the entire set of participants are also given at the
bottom of the table.
The table displays the values of d’ obtained for each listener in each distance
condition, together with their associated significance levels. The results show that
respondents were well able to distinguish [i]- and [a]-colored schwas in the distance-1
condition; many respondents had near-perfect results here. The distance-2 condition
was clearly more challenging, and seemed to represent a threshold of sorts, inasmuch as
five respondents’ scores did not reach significance while four others’ did (the
remaining subject provided no data here). The distance-3 condition was by far the most
challenging, and respondents appear to have answered mostly at random, although one
subject did attain the remarkable score of 2.28 and three others also had scores which
were significant, albeit less strongly so.
Mentioned above was the possibility that the later group of subjects might
perform the task differently (presumably better) than the earlier group, because of the
later group’s extra exposure to the study stimuli. The results indicate that this concern
was not borne out, at least not to a substantial degree. The average scores for each
group in each distance condition are not significantly different, and in fact the later
group’s averages are slightly lower. It is possible that the later group did have an
advantage in having been exposed to the study stimuli more, but at the price of some
additional fatigue, since the ERP task had required them to stay alert and relatively still
for an hour or so. In any case, distribution of scores was also fairly similar between the
groups, with all subjects scoring at much-higher-than-chance levels on the distance-1
task, on the order of half scoring above chance on the distance-2 task, and a small
minority (two in each group) achieving significant results at distance 3. For the entire
group of 28 perception study participants, the distribution of scores was as follows: at
distance 1, all subjects (100%) scored above chance; at distance 2, 15 subjects (54%)
did so; and at distance 3, four (14%) had a significant outcome.
Also shown in the table are the overall results for each subject group, as well as
for the entire pool of subjects who participated in either experiment. As might be
expected, results were well above chance for both the distance-1 and distance-2 tasks.
Further, although only a few subjects scored significantly better than chance in the
distance-3 task, the group results were also above chance levels.
Subject           Distance 3   Distance 2   Distance 1
8                 2.28 ***     1.19 *       3.97 ***
9                 -0.65        no data      4.65 ***
10                0.16         0.75         4.65 ***
11                0.20         0.52         4.18 ***
12                0.52         1.84 ***     3.40 ***
13                0.67 *       1.71 ***     3.45 ***
14                0.77         1.84 **      3.54 ***
15                0.16         1.04         4.65 ***
16                0.27         1.16         4.29 ***
17                0.55         0.89         4.15 ***
Average           0.49 ***     1.22 ***     4.09 ***

21                0.41         0.70         4.15 ***
22                0.94 *       1.16 *       4.65 ***
23                0.10         -0.07        2.56 ***
24                0.00         0.79         3.92 ***
25                0.33         1.15 *       4.65 ***
26                0.44         0.69 *       3.83 ***
27                -0.05        1.03 *       3.38 ***
28                -0.20        -0.32        4.29 ***
29                0.49         0.57         2.89 ***
30                -0.18        1.21 *       3.97 ***
31                0.44         0.53         4.65 ***
32                1.26 *       2.04 **      3.97 ***
33                0.17         0.53         4.29 ***
34                -0.07        1.93 ***     4.63 ***
35                0.27         1.72 **      3.45 ***
36                0.40         0.94 *       3.97 ***
37                0.15         1.46 **      4.29 ***
38                0.59         0.80 *       4.65 ***
Average           0.31 *       0.94 ***     4.01 ***

Overall average   0.37 ***     1.03 ***     4.04 ***
Table 3.3. The table shows each subject’s d’ measure for each of the three distance
conditions. Significant results are shaded for individuals and labeled, where * = p<0.05,
** = p<0.01 and *** = p<0.001. Subject 9 provided no responses for the distance-2
condition.
3.3.4. Correlation between production and perception
In a study investigating the possibility of a link between production and
perception, Perkell, Guenther, Lane et al. (2004) compared the distinctiveness of 19
American English speakers’ articulations of the vowels in the words “cod,” “cud,”
“who’d” and “hood” with those speakers’ performance in an ABX discrimination task
(Liberman, Harris, Kinney & Lane, 1951). The perception task used two sets of
synthesized words whose first three vowel formants were set at seven intermediate
stages between “cod” and “cud” for the first set and between “who’d” and “hood” for
the second set. Speakers who articulated the vowels more distinctly performed
significantly better in the discrimination task than other speakers; the authors
hypothesize that sensitive listeners establish, for phonemically contrasting sounds,
articulatory target regions that are tighter and farther apart than is the case for listeners
who are less sensitive.
Although the contrasts explored in the present study are all sub-phonemic in
nature, findings like those of the Perkell et al. (2004) study raise a question relevant to
the language-change scenario explored earlier: is there a correlation between
individuals’ ability to detect coarticulatory effects and tendency to coarticulate? This
possibility was investigated for the 28 study participants who provided both production
and perception data. Because of the possibility, already raised, that comparing production and/or
perception results across the earlier and later groups of subjects might not be completely
valid, the results for each group will be presented separately; then the
results for the whole group will be given.
As a quantitative measure of perceptual ability for a given subject, the average
of that subject’s three d’ scores was used, while the production measure was the
average over the three distance conditions of the Euclidean distance in normalized
vowel space between that subject’s [i]- and [a]-colored schwas. For the earlier group of
ten participants (Subjects 8 through 17), the correlation measure r between the
perception and production measures so obtained was found to be 0.52, but this outcome
is not statistically significant (p = 0.13). In addition, as Figure 3.3 below illustrates, this
result is dependent on a small number of outliers and does not hold up if they are
excluded. In case some other measures of perception or production might reveal a
stronger relationship, other candidate measures were also tested, such as different
relative weightings among the distance conditions and the use of logarithmic vowel
space measures instead of raw formant numbers (cf. Johnson, 2003), but none of these
led to substantially different results.
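The correlation measure itself is an ordinary Pearson r between the two per-subject averages; a minimal sketch, with invented numbers standing in for the real per-subject measures:

```python
def pearson_r(xs, ys):
    """Pearson correlation between paired per-subject measures."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

mean_d_primes = [1.9, 2.3, 1.1, 2.8, 1.5]      # invented perception measures
vowel_space_dists = [0.8, 1.4, 0.6, 1.2, 0.9]  # invented production measures
print(round(pearson_r(mean_d_primes, vowel_space_dists), 2))
```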
Figure 3.3. Correlation between averaged production and perception measures for the
earlier group of ten subjects who provided data for both (r = 0.52, p=0.13).
The outcome for the later group (Subjects 21 through 38) is even more
problematic for the hypothesis of a production-perception correlation: for this group,
the correlation between the average d’ and the vowel-space difference between contexts
is strongly negative (r = -0.60, p<0.01). This outcome is illustrated below in
Figure 3.4 and like the results from the earlier group, appears to reflect the influence of
a small number of outliers rather than constituting evidence of a perception-production
relationship.
Figure 3.4. Correlation between the averaged production and perception measures for
the 18 subjects from the later group (r = -0.60, p<0.01).
Finally, combining the data for both groups for one overall comparison results
in the positive correlation from the first group and negative correlation from the second
group largely canceling each other out (r = -0.15, p=0.46), as shown below in Figure
3.5. Again, other measures of production, such as using logarithmically scaled formant
values, do not substantially change this outcome.
Figure 3.5. Correlation between averaged production and perception measures for all
28 listeners, from both the earlier and later groups of subjects who provided data for
both (r = -0.15, p=0.46).
These results contrast markedly with those of the Perkell et al (2004) study cited
earlier. However, the perception and production of phonemically contrasting vowels
were the focus of that study, and the results were explained by the researchers as
reflecting more accurate articulatory targets for such vowels on the part of more
sensitive listeners. If this is accurate, one might actually expect less coarticulation on
the part of such listeners in their productions of phonemically contrasting vowels, and it
is unclear what expectations one might have for the articulation of schwa, whose
phonemic status is unclear (see Chapter 1). In any case, more study will evidently be
needed before any strong claims can be made concerning a possible perception-production relationship, as far as coarticulation is concerned.
It is worth pointing out, however, that the lack of a production-perception
correlation would not invalidate the language-change hypothesis discussed earlier.
Recall that the basic idea of that hypothesis was that language change could occur as a
result of some listeners perceiving some speakers’ coarticulation and in turn
reproducing those patterns in their own speech, resulting in a feedback pattern
eventually leading to language change. This scenario does not actually require that
perception and production be correlated, although such a correlation would certainly
make the whole story more compelling. Still, all that is really necessary for such a
process to begin is that some minority of speakers coarticulate strongly enough for
some minority of listeners to perceive it.
Figure 3.6 below shows idealized production and perception measures mapped
against each other, each with some threshold value (Pr0 and Pe0, respectively) above
which speakers and listeners may participate in initiating the early stages of the
language change scenario. As long as the highlighted upper-right corner of the figure is
not empty—that is, as long as some speakers coarticulate a fair amount and some
listeners are sensitive enough to perceive it and then reproduce the coarticulatory
effects they have heard in a somewhat stronger form—it does not matter if part or all of
the speech community at large exhibit a perception-production correlation or not.
On the other hand, the lack of decisive correlation results in the present study
may simply be due to the measures used being not sensitive or sophisticated enough to
accurately reflect subjects’ “true” production and perception tendencies. For the time
being, the determination of actual values for Pr0 and Pe0 must be substantially
deferred.
Figure 3.6. Hypothetical threshold values of production tendency (Pr0) and perceptual
sensitivity (Pe0), above which speakers and listeners may participate in early stages of
coarticulation-related language change. The presence or absence of a production-perception correlation may be unimportant.
3.4. Chapter conclusion
This study found a large amount of variation among listeners in the
perceptibility of VV coarticulation. The nearest-distance effects appear to be in essence
universally perceptible (at least for listeners with normal hearing), although it should
also be remembered that the vowel stimuli used here were those produced by subjects
with above-average coarticulatory tendencies. In contrast to the case of near-distance
effects, subjects’ sensitivity to longer-distance VV coarticulation was quite variable,
although even these effects were perceptible to some listeners. Given the variation that
is evident among both speakers and listeners, the interplay between production and
perception of coarticulation in the context of actual language use is likely to be quite
complicated, perhaps much more so than measures such as the linear perception-production correlation investigated here can adequately describe.
Although the perception results obtained here offer some insight into
differences among listeners in their perception of subtle coarticulatory effects, the
methodology used was admittedly rather crude, both because of the direct-questioning
nature of the task (“Which vowel sounds different?”), and because vowels were excised
from their natural contexts and played in isolation. In one example of a different
approach, Scarborough (2003) cross-spliced recordings of words in such a way as to
create stimuli which varied in the consistency of their coarticulatory patterns, and then
had listeners perform a lexical decision task. Stimuli which were consistent with
naturally occurring coarticulation patterns were generally associated with faster
reaction times.
Another avenue of research which appears to show potential is the use of non-behavioral methodology to ascertain the perceptibility of coarticulation effects. For
instance, the event-related potential (ERP) technique, which involves the recording of
brain-wave (electroencephalogram) data, can provide insight into mental processes
which occur whether or not subjects are consciously aware of them (for an overview of
ERP methodology see Luck, 2005). Groups of neurons firing in response to particular
types of stimuli produce positive or negative electrical potentials at characteristic
locations on the scalp during particular timeframes. When recorded and averaged over
many trials, the background noise tends to zero and the response pattern consistently
associated with the stimulus present in each trial remains; such patterns are called ERP
components, and many are associated with linguistic processes.
The most well-known such components include the N400, a negative deflection
typically occurring on the order of 400 msec after a subject is exposed to a semantically
anomalous stimulus (Kutas & Hillyard, 1980), and the P600, a positive-going wave
generally associated with syntactic anomaly and occurring, as its name indicates,
approximately 600 msec after stimulus presentation (Osterhout & Holcomb, 1992).
Generally, such late-occurring components are thought to correspond to higher-order
processing, while lower-level processing patterns are associated with earlier
components. As one example of the productive use of this approach in linguistic perception research, Frenck-Mestre, Meunier, Espesser et al. (2005) used ERP methodology to investigate the ability of French listeners to perceive phonemic contrasts existing in English but absent in French. In fact, much ERP research has been
directed at understanding listeners’ processing of phonemic contrasts. An ERP study
investigating the perception of sub-phonemic contrasts resulting from VV
coarticulation is presented in Chapter 4.
CHAPTER 4 — COARTICULATION PERCEPTION IN ENGLISH:
EVENT-RELATED POTENTIAL (ERP) STUDY
4.1. Introduction
The paradigm used in this ERP study targeted the mismatch-negativity (MMN)
component, which is seen at fronto-central and central scalp sites approximately 150 to
250 ms after the presentation of a “deviant” acoustic stimulus occurring among a train
of otherwise-similar (“standard”) acoustic stimuli. Specifically, the MMN is the
negative deflection seen in the difference wave obtained by subtracting the response to
the standard stimuli from that of the deviant stimuli. The MMN can be elicited, for
example, by an occasional high-frequency tone occurring among a series of low-frequency tones. Generally, larger differences between the deviant and standard stimuli are associated with MMN responses of greater amplitude and shorter peak latency (Näätänen, Paavilainen, Rinne & Alho, 2007). Most importantly for the present study,
the MMN also shows sensitivity to linguistic phenomena such as phonemic
distinctions.
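The derivation of the MMN as a difference wave can be illustrated with a minimal numerical sketch; the waveform shapes and measurement window below are invented for illustration.

```python
import numpy as np

# Minimal numerical sketch of the MMN derivation (invented waveforms):
# the MMN is read off the difference wave, average deviant response minus
# average standard response, as a negative deflection after stimulus onset.
fs = 256                                  # sampling rate (Hz)
t = np.arange(-0.2, 0.6, 1.0 / fs)        # epoch timeline in seconds

standard_avg = np.zeros_like(t)           # idealized flat standard response
deviant_avg = -1.5 * np.exp(-((t - 0.2) ** 2) / 0.002)  # extra negativity ~200 ms

difference_wave = deviant_avg - standard_avg

# Quantify the MMN as mean amplitude in a window around its peak.
window = (t >= 0.15) & (t <= 0.25)
mmn_amplitude = difference_wave[window].mean()  # negative for an MMN-like effect
```

A negative mean amplitude in the chosen window is what “an MMN was elicited” amounts to operationally; the analyses later in this chapter use exactly this kind of windowed mean-amplitude measure.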
The MMN has counterparts in other sensory modalities such as the visual
(Alho, Woods, Algazi & Näätänen, 1992; Tales, Newton, Troscianko & Butler, 1999),
somatosensory (Kekoni, Hämäläinen, Saarinen, Gröhn, Reinikainen et al., 1997) and
olfactory (Krauel, Schott, Sojka, Pause & Ferstl, 1999) senses; and in other
technological/methodological approaches such as magnetoencephalography (MEG;
Hari, Hämäläinen, Ilmoniemi, Kaukoranta, Reinikainen et al., 1984), positron emission
tomography (PET; Tervaniemi, Medvedev, Alho, Pakhomov, Roudas et al., 2000) and
functional magnetic resonance imaging (fMRI; Celsis, Boulanouar, Doyon, Ranjeva,
Berry et al., 1999). Näätänen, Paavilainen, Rinne and Alho (2007) provide a thorough
review of the MMN and related phenomena. For more information on ERP
methodology in general, Luck (2005) is an excellent source.
A distinctive and important characteristic of the MMN is that it can be elicited even if subjects are not actively attending to the stimuli, such as when they read a book or
watch a silent video (Näätänen, 1979, 1985; Näätänen, Gaillard & Mäntysalo, 1978), or
even while they sleep (Sallinen, Kaartinen & Lyytinen, 1994). Therefore, researchers
carrying out MMN studies do not depend on the ability or willingness of subjects to
focus on behavioral tasks. Because of its automatic nature, the MMN has proven to be a
valuable tool in clinical investigations where issues of auditory perception and memory
encoding are relevant (for a review see Kujala, Tervaniemi & Schröger, 2007), and this
aspect of the MMN was also a definite practical advantage in the present experiment,
since subjects report that behavioral tasks involving [a]- and [i]-colored schwas
sometimes become taxing.
The MMN is generally thought to reflect the outcome of an automatic process
comparing the just-occurred auditory event with a memory trace formed by the
preceding auditory events; this may be referred to as the “model adjustment”
hypothesis (Näätänen, 1992; Tiitinen, May, Reinikainen, & Näätänen, 1994). The
MMN is believed to have generators located bilaterally in the primary auditory cortex
and in the prefrontal cortex, whose respective roles are thought to involve sensory-memory and cognitive (comparative) functions (Giard, Perrin, Pernier & Bouchet,
1990; Gomot, Giard, Roux, Barthelemy & Bruneau, 2000). Rinne, Alho, Ilmoniemi,
Virtanen and Näätänen (2000) have found that during this process, the temporal
(auditory cortex) generators act earlier than those in the prefrontal cortex, supporting the
hypothesis that the outcome of sensory processing in the auditory cortex is passed to
the prefrontal cortex, where a change-detection operation is performed.
A more recent suggestion, the “adaptation hypothesis” (Jääskeläinen,
Ahveninen, Bonmassar, Dale, Ilmoniemi et al., 2004), is that the MMN is in fact
illusory, being caused by decreased neuronal response, “neuronal adaptation,” during
the train of similar (standard) auditory stimuli, resulting in attenuation and increased
latency of another component, the N1. In this scenario, the subtraction of the averaged
standard-stimuli response from the averaged deviant-stimuli response (from which the
MMN is derived) yields the appearance of a distinct component, but this is
merely an artifact of these differences in N1 behavior between the standard and deviant
conditions. While there is limited support for the adaptation hypothesis (see Garrido,
Kilner, Stephan & Friston, in press, for a discussion of the debate), the model
adjustment hypothesis remains more widely accepted, and one assumption fundamental
to that hypothesis—namely that the MMN exists and can therefore be studied—will be
made here.
The main question the present ERP study seeks to answer is how sensitive the
MMN might be to the sub-phonemic processing associated with the perception of VV
coarticulation. This will be of particular interest if the MMN provides a more sensitive
measure in this context than behavioral methods offer. If so, the general picture of
coarticulatory sensitivity could look something like Figure 4.1 below. The basic idea
here is that a given segment S0 has coarticulatory effects that in general may extend
across a number of neighboring segments, with stronger effects expected nearer to the
influencing segment. In the figure, the coarticulatory effects of S0 on preceding
(negatively-indexed) segments correspond to anticipatory effects, while those on
following (positively-indexed) segments correspond to carryover effects. One would
expect variation in the ranges involved depending on context and speaker, with
differences in the spans of anticipatory versus carryover influence, and depending as
well on the sensitivity of the listener, so that, for example, the ranges might collapse to near-zero width in the case of a very insensitive listener or a speaker who coarticulates very weakly.
In general, nearer effects would be expected to be stronger and therefore more
likely to be perceptible to listeners, something which could be confirmed in a
behavioral experiment such as the one discussed in Chapter 3. As the distance from the
segment increases, the coarticulatory effects are expected to be more subtle, and at
some point may not be perceptible to a listener in a way which can be meaningfully
measured with a given methodological approach. One question relevant to this study is
whether the ranges of effects amenable to study with ERP methodology may be wider
in general than might be usefully studied with behavioral techniques. Another
possibility is that the real situation is more complicated than the proper inclusions
depicted in Figure 4.1 indicate.
Figure 4.1. Hypothetical patterning related to the perceptibility of coarticulatory effects
of one segment, S0, on preceding and following segments. These range from easily perceptible effects at close range to subtler effects at greater ranges, which may be
amenable to study only with non-behavioral methodologies. At the limit, effects might
be present but too subtle to be perceptually relevant to the listener.
4.2. Methodology
4.2.1. Participants and Stimuli
All but one subject from the second group of 18 who participated in the
behavioral experiment described in Chapter 3 also participated in this ERP experiment;
the excluded individual was Subject 25, who was left-handed. This left 17 participants
who contributed EEG data (6 female; ages ranging from 18 to 24, with mean 19.4 and
SD 1.8). All were native speakers of English but as students at a university with a
foreign-language requirement, tended to have had some exposure to at least one other
language. Two indicated that they had substantial knowledge of another language, but
the group results did not differ if they were excluded, so their data was included. All
subjects performed the ERP task before the behavioral task, since the latter involved
debriefing each subject about the purpose of the study. Therefore, subjects were still
unaware of the specific goals of the study during their participation in the ERP
experiment; they merely knew that the study was language-related.
Figure 4.2. Each block consisted of 40 consecutive cycles of eight vowels each for the
distance-1 condition and 80 cycles for the distance-2 and -3 conditions, with one
randomly-placed [i]-colored schwa per cycle.
The stimuli and the sequencing of stimuli within each block were the same as in
the behavioral experiment, as seen in Figure 4.2 above, but the lengths of the blocks
were increased for the distance-2 and -3 blocks from 40 cycles to 80 cycles per block,
to provide more EEG data for averaging. Since it had already been established in the
initial perception study that the distance-1 task was very easy, the number of cycles in
the distance-1 block was left at 40 as in the behavioral task. Stimuli were presented
1200-1400 ms apart within each cycle of eight stimuli, with short blink breaks of 2800-3100 ms between each cycle and extra-long blink breaks of 10.8-11.1 s every ten cycles.
In addition, unlike in the behavioral task, where blocks were always presented
in distance condition order 1, 2 and 3, ordering of the three blocks for a given subject
was random in the ERP experiment. Since this experiment was intended to evoke the
MMN, subjects did not have a response task; they simply sat in the chair and were asked to stay alert by watching a silent film playing on a portable DVD player positioned in front of them. It was after participating in the MMN experiment that subjects
performed the behavioral perception task described in Chapter 3.
4.2.2. Electroencephalogram (EEG) recording
Figure 4.3. Configuration of the electrodes in the 32-channel cap.
EEG data were recorded continuously from 32 scalp locations at frontal,
parietal, occipital, temporal and central sites, using AgCl electrodes attached to an
elastic cap (BioSemi). Figure 4.3 above shows the electrode configuration. Vertical and
horizontal eye movements were monitored by means of two electrodes placed above
and below the left eye and two others located adjacent to the left and right eye. All
electrodes were referenced to the average of the left and right mastoids. The EEG was
digitized online at 256 Hz, and filtered offline below 30 Hz. Scalp electrode impedance
threshold values were set at 20 kΩ. Epochs began 200 ms before stimulus onset and
ended 600 ms after. After inspection of subjects’ data by eye, artifact rejection
thresholds were set at ±100 μV and rejection was performed automatically. ERP
averages over epochs were calculated for each subject at each electrode for each
context (standard [a] and deviant [i]) and distance condition (1, 2 or 3). Analysis was
performed using EEGLAB (Delorme & Makeig, 2004). Two subjects were excluded
from the group results because of a persistently high proportion of rejected trials due to
artifacts (over 50 percent), leaving 15 participants whose data was used in the analyses
about to be presented.
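A rough sketch of the rejection-and-averaging step just described follows. The ±100 μV threshold matches the text, but the epoch counts, sample counts, and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch of automatic artifact rejection and averaging:
# epochs whose absolute voltage exceeds the threshold anywhere are
# dropped, and the surviving epochs are averaged into an ERP.
REJECT_UV = 100.0  # rejection threshold in microvolts (as in the text)

def average_clean_epochs(epochs):
    """epochs: (n_epochs, n_samples) array in microvolts; returns (ERP, n kept)."""
    keep = np.abs(epochs).max(axis=1) <= REJECT_UV
    return epochs[keep].mean(axis=0), int(keep.sum())

n_epochs, n_samples = 80, 205                     # e.g. an 800-ms epoch at 256 Hz
epochs = rng.normal(0.0, 15.0, (n_epochs, n_samples))
epochs[5] += 300.0                                # one epoch with a blink artifact

erp, n_kept = average_clean_epochs(epochs)        # the artifact epoch is excluded
```

In the actual study this per-condition averaging was performed in EEGLAB for each subject, electrode, vowel context, and distance condition; the sketch only shows the shape of the computation.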
4.3. Results and discussion
Topographic maps of the effects that were found are shown in Figure 4.4 below,
and the associated waveforms are presented in Figure 4.5. Although the latency of the
MMN component is typically expected to fall in the neighborhood of 200 ms, the
effects seen here appear to be strongest closer to 300 ms, as can be seen in the figures.
Previous research has shown that the amplitude and peak latency of the MMN are modulated by the degree of stimulus deviance, with larger and earlier MMN responses associated with
greater deviations from the standard (e.g. Näätänen, 2001; Tiitinen, May, Reinikainen
& Näätänen, 1994), so it is possible that the late effects seen here reflect an MMN-like response whose long latency is due to the subtlety of these sub-phonemic differences.
The responses for distances 1 and 2 do have a distribution similar to that expected for
the MMN component and do show greater negativity for the deviant stimuli. For these
reasons, for the purposes of the group testing whose results are about to be presented,
mean amplitude over the time interval from 275 to 325 ms was used.
Figure 4.4. Topographical maps showing grand averages at each 50-ms interval in the
time range [-50 ms, 400 ms], for distance conditions 1, 2 and 3. The top, middle and
bottom sets of figures correspond to those distance conditions in that order. Units on
the scale are microvolts.
Figure 4.5. Grand-average waveforms at selected electrode sites in the time range [-200
ms, 600 ms], for each of distance conditions 1, 2 and 3, in that order from top to
bottom. In these graphs, green = standard ([a]), red = deviant ([i]), and black =
difference (deviant minus standard). Negative is plotted downward.
ANOVA testing was first performed on the group data for each distance
condition with within-subject factors of hemisphere (left, mid, or right), anteriority
(anterior, central, or posterior), and vowel context ([i] or [a]) in the time range [275 ms,
325 ms]. Greenhouse-Geisser and Sidak adjustments were performed as appropriate
and are reflected in the results reported here.
At distances 1 and 2, highly significant effects and interactions related to
electrode site and context vowel were found. At distance 1, there were significant main
effects of hemisphere (F(1.36,19.0)=11.0, p<0.01), anteriority (F(1.78,24.9)=25.5,
p<0.001) and vowel context (F(1,14)=14.9, p<0.01), and a hemisphere-anteriority
interaction (F(2.04,28.5)=6.42, p<0.01). At distance 2, there were main effects of
hemisphere (F(1.75,24.5)=15.2, p<0.001), anteriority (F(1.22,17.1)=14.9, p<0.01) and
vowel context (F(1,14)=11.1, p<0.01), and hemisphere-anteriority (F(1.78,24.9)=4.82,
p<0.05) and hemisphere-vowel (F(1.83,25.6)=4.48, p<0.05) interactions. Both effects
were in the expected direction, with greater negativity in the deviant context relative to
the standard context, showing up most strongly at frontal midline sites in both of the
distance-1 and -2 conditions. In the distance-1 condition, this negativity was somewhat
stronger in the right hemisphere than in the left hemisphere, as can be seen in Figure
4.4, but this difference was not significant in a pairwise comparison. At distance 3, only
the effects of hemisphere and anteriority were significant (F(1.33,18.6)=7.94, p<0.01
and F(1.46,20.5)=10.2, p<0.01, respectively), although there was a marginally
significant effect of context vowel (F(1,14)=2.25, p=0.156). The effects of hemisphere
and anteriority again reflect that the greatest voltage differences were seen in the front
midline region. However, for distance 3 this difference was in the contrary-to-expected
direction, with greater positivity in the deviant context.
Next, ANOVAs with vowel context and electrode as factors were performed on a
restricted set of electrode sites (FZ, CZ, PZ, AF3, AF4, FC1, FC2, CP1 and CP2)
located in the central midline region, where the MMN is typically expected and where
the strongest effects were seen here; these sites will henceforth be referred to as “MMN
sites.” Results were similar to those found in the earlier ANOVAs, with highly
significant outcomes for vowel context only seen at distances 1 and 2 and at best a
marginal outcome at distance 3 (results at distance 1 for electrode and context,
respectively: F(3.04,42.6)=12.0, p<0.001, and F(1,14)=15.3, p<0.01; for distance 2:
F(1.80,25.2)=11.3, p<0.001, and F(1,14)=13.1, p<0.01; for distance 3:
F(1.91,26.7)=3.97, p<0.05, and F(1,14)=2.32, p=0.15). Figure 4.6 below shows the
topographic distribution of these effects in each distance condition.
Figure 4.6. Topographic distribution of the MMN-like effects at 300 ms from stimulus
onset at distances 1 and 2, and the marginally significant positivity found in that
interval for the distance-3 condition, from top to bottom in that order. Units on the scale
are microvolts.
4.3.1. Latency
Because the timing of these effects is later than would generally be expected for
an MMN effect, a further breakdown of their temporal properties was made. To this
end, ANOVAs like those carried out earlier for the 275-to-325-ms interval were
performed over a series of 100-ms intervals, in 50-ms steps from stimulus onset up to
500 ms later. Table 4.1 below shows the outcomes of these tests. Also shown for
comparison in the two rightmost columns are (1) the results for the interval from 275 to
325 ms after stimulus onset, where as noted before, effects were strongest in this study,
and (2) results for the interval from 100 to 300 ms, where the MMN is typically
expected. For distance 1, significant results begin to be seen relatively early, in the
neighborhood of 100 ms. This may indicate that some but not all subjects began
exhibiting a differential response by this time to the differently-colored vowels. For
distance 2, the results indicate that the effect is concentrated in the later timeframe. As
noted earlier, context-related effects in the distance-3 condition tended to be in the
contrary-to-expected direction, but these differences were not significant.
For distance 1 and perhaps to some degree at distance 2, the effects appear
rather prolonged and together with the waveforms shown earlier in Figure 4.5, this
raises the question of whether more than one component may be involved, or perhaps
just as likely, whether these effects reflect contributions from different subgroups of
subjects; some subjects may have a more delayed response to these sub-phonemic
contrasts than others do, leading to smearing of the effects when examined at the
whole-group level. To examine this possibility further, subjects were next separated
into two groups based on the behavioral results discussed in Chapter 3. The results of
that investigation are presented in the following section.
Interval (ms)       0-100  50-150  100-200  150-250  200-300  250-350  300-400  350-450  400-500  275-325  100-300
All sites   Dist 1    ns     ns       *       ***      **       **       **        *       ns       **       **
            Dist 2    ns     ns      ns        ns       *        *        +       ns       ns       **       ns
            Dist 3    ns     ns      ns        ns      ns       ns       ns       ns       ns       ns       ns
MMN sites   Dist 1    ns     ns       *        **      **       **       **        *        +       **       **
            Dist 2    ns     ns      ns        ns       *        *        *       ns       ns       **        +
            Dist 3    ns     ns      ns        ns      ns       ns       ns       ns       ns       ns       ns
Table 4.1. Latency results for the entire subject group (n=15), showing the outcome of
significance testing of mean amplitude difference between vowel contexts in the
indicated time windows. Significant results are noted, with * = p<0.05, ** = p<0.01
and *** = p<0.001. Also noted are marginal results, where + = p<0.10.
4.3.2. Relationship to behavioral results
For the purposes of this analysis, the group of 15 ERP subjects was broken
down into two subgroups, based on their behavioral (d’) scores in the study presented
in Chapter 3. Recall that the distance-1 task was quite easy, with all subjects
performing near ceiling levels, while the distance-3 task was so difficult that few
subjects performed at significantly better-than-chance levels. In contrast, the distance-2
task, on which about half of the subjects performed at better-than-chance levels while
the rest did not, provides a convenient means for breaking the subject group into two
nearly-equal-size subgroups, with seven in what will be called the “insensitive” group
and the other eight falling into the “sensitive” group.
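The grouping criterion can be sketched as follows. All d-prime values and the threshold below are invented for illustration, chosen only to reproduce the eight/seven split described above; they are not the subjects’ actual scores.

```python
# Illustrative sketch of the grouping criterion: subjects whose distance-2
# behavioral performance exceeded a significance cutoff form the
# "sensitive" subgroup, the rest the "insensitive" one. All values and the
# threshold are hypothetical.
dprime_dist2 = {
    "S01": 1.8, "S02": 0.1, "S03": 2.3, "S04": -0.2, "S05": 1.1,
    "S06": 0.3, "S07": 1.5, "S08": 0.0, "S09": 2.0, "S10": 0.4,
    "S11": 1.2, "S12": -0.1, "S13": 1.7, "S14": 0.2, "S15": 1.4,
}

THRESHOLD = 1.0  # hypothetical better-than-chance cutoff on d-prime

sensitive = sorted(s for s, d in dprime_dist2.items() if d > THRESHOLD)
insensitive = sorted(s for s, d in dprime_dist2.items() if d <= THRESHOLD)
```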
Table 4.2 below presents the results of a latency analysis like that whose
outcome was shown for the entire subject group in Table 4.1 earlier, but with separate
analyses performed for the “insensitive” and “sensitive” subject subgroups. Topographic maps of the responses of the two groups at 300 ms after stimulus onset are also given below, in Figure 4.7. The results are markedly different for the two groups, with no
significant negative effects seen for the “insensitive” group, even at distance 1. This is
in spite of the fact that all subjects performed at well-above-chance levels in the
distance-1 behavioral task and in fact, the “insensitive” and “sensitive” groups did not
differ significantly in their performance of the distance-1 task (respective mean d-prime
scores = 3.82 and 3.94; t(8.81)=0.36, p=0.73). For distances 1 and 2, the “insensitive”
subjects do show a negative trend in the deviant condition, but it is clearly much
weaker than that of the “sensitive” group.
One other noteworthy difference between the two subject sub-groups is the
weak but significant positivity shown by “insensitive” subjects prior to 300 ms after
stimulus onset in the distance-3 condition and much earlier in the distance-1 condition.
The very early onset of these positive effects may be cause to suspect that they are spurious, although Jääskeläinen et al.’s (2004) “adaptation hypothesis,” mentioned
earlier, may hint at an alternative explanation. Recall that according to that hypothesis,
the MMN is illusory, ultimately being the result of a progressively reduced N1 during
the train of standard stimuli, this reduced N1 itself being due to neuronal adaptation to
the mutually-similar standard stimuli. Following a similar line of thought, it might be
supposed that subjects’ MMN response in the present experiment is a combination of a
number of subcomponents, some positive and some negative, occurring during the
same general timeframe. If so, perhaps the “insensitive” subjects have a greater
tendency toward a reduced negative subcomponent in some circumstances, resulting in
a net positivity. Of course, this must remain a very tentative hypothesis at this time. In
any case, it is interesting that the “insensitive” subjects’ ERP response is so different
from the others’, even in cases where their behavioral results were very similar, as in
the distance-1 condition.
Interval (ms)          0-100  50-150  100-200  150-250  200-300  250-350  300-400  350-450  400-500  275-325  100-300
Sens.    Dist 1          ns      +      **       **       **       **       **       **        *       **       **
(n=8)    Dist 2          ns     ns      ns       ns        +        +       ns       ns       ns        *       ns
         Dist 3          ns     ns      ns       ns       ns       ns       ns       ns       ns       ns       ns
Insens.  Dist 1         (*)     ns      ns        +       ns       ns       ns       ns       ns       ns       ns
(n=7)    Dist 2          ns     ns      ns       ns       ns       ns       ns       ns       ns       ns       ns
         Dist 3         (*)    (*)     (*)      (*)      (+)      ns       ns       ns       ns      (+)      (*)
Table 4.2. Latency results by subject subgroup, with sensitivity determined by each subject’s performance on the behavioral task at distance 2. Outcomes are given for
significance testing of mean amplitude difference between vowel contexts in the
indicated time windows. Significant results are noted, with * = p<0.05, ** = p<0.01
and *** = p<0.001. Also noted are marginal results, where + = p<0.10. Results in
parentheses indicate an outcome in the contrary-to-expected direction.
Figure 4.7. Topographic distribution of the effects seen at 300 ms after stimulus onset,
broken down by subject group. The column at left shows results for the subjects
deemed “sensitive” according to their d-prime scores, while the right column gives
corresponding results for “insensitive” subjects. The top, middle and bottom rows
correspond to distance conditions 1, 2 and 3, respectively. Units on the scale are
microvolts.
Another issue relevant to this study is whether the magnitude of listeners’ ERP
response to deviant stimuli might be correlated with their sensitivity as measured in the
behavioral task. Analysis of the data obtained in this study does not find evidence of
such a relationship. The correlation of ERP response (measured as difference in mean
amplitude at the designated “MMN” electrode sites between the deviant and standard
contexts) and d-prime scores obtained in the behavioral study was not found to be
significant. This was so whether the behavioral measure was the distance-2 d-prime score or the d-prime score averaged over the three distance conditions, and whether the ERP measure was taken over the [275 ms, 325 ms] time interval or over the interval more typically associated with the MMN, from 100 to 300 ms.
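The correlation test described here amounts to a Pearson r over paired per-subject measures. The following sketch uses invented values throughout; they are not the study’s MMN amplitudes or d-prime scores.

```python
import math

# Sketch of the correlation analysis with invented per-subject data:
# one ERP measure and one behavioral measure per subject.
mmn_amp = [-2.1, -0.4, -1.8, 0.2, -1.0, -2.5, -0.3, -1.6,
           -0.9, 0.1, -1.2, -2.0, -0.5, -1.4, -0.7]   # mean amplitude (uV), invented
dprime2 = [1.8, 0.1, 2.3, -0.2, 1.1, 0.3, 1.5, 0.0,
           2.0, 0.4, 1.2, -0.1, 1.7, 0.2, 1.4]        # distance-2 d-prime, invented

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(mmn_amp, dprime2)  # bounded in [-1, 1]
```

The study's reported outcome is simply that r values of this kind did not reach significance for any pairing of ERP window and behavioral score.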
Finally, the data collected in the behavioral and ERP studies enable the
investigation of two other questions relevant to the relationship between subjects’
behavioral sensitivity measures and strength of ERP response.
4.3.2.1. Can ERP results predict behavioral outcomes?
For this analysis, subjects’ mean amplitude of MMN response over the interval
[275 ms, 325 ms] for the distance-1 condition was examined, together with their
behavioral scores in the distance-2 condition. A negative correlation was found, as
might be expected (the MMN is by definition negative), but this was not statistically
significant (r= -0.25, p=0.37).
4.3.2.2. Are ERP and behavioral responses correlated in general?
Among the 15 ERP study participants, mean amplitude of MMN response was
not significantly correlated with behavioral scores, in any of the three distance
conditions. This may be due to (1) the noise associated with individual EEG data, (2) natural variation among subjects in the strength of MMN response to particular stimuli, or (3) the possibility that the MMN is an incomplete index of perception in this context.18
4.4. Chapter conclusion
This is the first ERP study to investigate the sub-phonemic processing
associated with the perception of VV coarticulation. Distance-1 VV coarticulatory
effects (i.e., a vowel influencing another vowel across an intervening consonant) were
associated with strong MMN-like patterns. Distance-2 effects (a vowel influencing
another vowel across three intervening segments) were also associated with a highly
significant MMN-like response, though not in the sub-group of subjects considered
“insensitive” when tested behaviorally. At distance 3 (VV effects across five
intervening segments), the situation was quite different, with at best weakly significant
ERP effects that were positive instead of negative, in contrast with the behavioral (d-prime) results, which provided more straightforward evidence of some subjects’ perceptual sensitivity. These results do not show that ERP methodology can provide a better measure of listeners’ sensitivity to coarticulatory effects than behavioral methods offer. However, this study does offer new information about the topography
18. A few individuals did not appear to generate an MMN-type response even in the easier listening conditions, but as the correlation results were similar with or without those subjects’ data, their data was included.
and timing of the processing of such effects, which can inform theories seeking to
explain how listeners perceive them.
PART II: SIGNED-LANGUAGE PRODUCTION
AND PERCEPTION
CHAPTER 5 — COARTICULATION PRODUCTION IN ASL
5.1. Introduction
The long-distance coarticulatory effects seen in the spoken-language study
described in Part I of this project inspire the question of whether such effects might also
be found in sign language, a question that appears to be unaddressed in the literature to
date, though earlier research has examined some other aspects of coarticulation and
related phenomena in ASL. These include Cheek’s (2001) work on handshape
coarticulation and Mauk’s (2003) examination of location-related effects in the context
of undershoot. Work by Nespor and Sandler (1999) and Brentari and Crossley (2002)
on “h2 spread” is also pertinent here, dealing as it does with another (potentially) long-distance sign phenomenon.
Cheek’s (2001) study of four signers used an infrared-based motion-capture system (Vicon) to investigate handshape-to-handshape (HH) coarticulatory
effects in a number of contexts, focusing on the different influences on neighboring
signs of signs articulated with the “1” or “5” handshapes. Participants signed sign pairs
exemplifying various handshape sequences such as “1”-“1” (e.g. TRUE SMART) and
“1”-“5” (e.g. TRUE MOTHER), and the resulting motion-capture data were analyzed
for evidence of HH effects. The metric that was used in determining whether such
influence had been exerted was the distance between two markers—one on the tip of
the pinky and the other on the base of the hand—which was assumed to be larger when
the pinky was more extended (as in the articulation of a “5” handshape) and smaller
when the pinky was more curled inward (as for a “1” handshape). Cheek found
significant differences between the “1” and “5” handshape contexts, in both the
anticipatory and carryover directions. In addition, faster signing conditions were often,
though not always, associated with stronger HH effects.
Mauk (2003) also used the infrared-based Vicon motion-capture system and
recruited four signers for his study, which investigated the phenomenon of undershoot
in both spoken and signed language. The second part of his sign study examined HH
effects, and like Cheek’s (2001) study, focused on the different articulatory influences
exerted by signs with “1” and “5” handshapes on the handshapes of neighboring signs.
The other part of his sign study examined location-to-location (LL) effects, and made
use of neutral-space signs as the present study does. Subjects signed sequences
consisting of three signs, each of whose specified locations was either neutral space or
the forehead. In each sequence, the first and third signs had the same location, which
was sometimes the same and sometimes different from that of the second sign. The
sequences could therefore be classified as forehead-forehead-forehead (e.g. SMART
FATHER SMART), forehead-neutral space-forehead (e.g. SMART CHILDREN
SMART), neutral space-neutral space-neutral space (e.g. LATE CHILDREN LATE),
or neutral space-forehead-neutral space (e.g. LATE FATHER LATE). It was expected
that neutral-space signs undergoing coarticulatory influence from forehead-location
signs would be articulated at a higher vertical position than they would be otherwise,
and similarly, that coarticulatory influence of neutral-space signs on forehead-location
signs would result in the latter being articulated at a lower vertical position.
Mauk found that forehead-location signs tended to exert significant
coarticulatory influence on neutral-space signs, but that the reverse did not generally
hold, indicating that signs specified for body contact may be less susceptible to
coarticulatory influence than neutral-space signs. Like Cheek (2001), Mauk also found
that faster signing conditions were generally associated with stronger coarticulatory
effects.
The coarticulatory effects found by Cheek and Mauk in their sign studies were
influences of signs on immediately preceding or following signs—“distance-1” effects,
following the terminology established in earlier chapters of this dissertation. In the
present chapter, the possibility that coarticulation may extend over greater distances in
the flow of signed language, just as it has been found to do in spoken language, will be
investigated. Although such a study has, to the best of my knowledge, not been
conducted before, previous studies have shown “h2 spread” to be an example of a
relevant sign-language phenomenon which can extend across non-adjacent signs.
The term “h2 spread” refers to the fact that in some situations, the non-dominant
hand (h2) may assume the handshape and location for which it is specified in a two-handed sign, during the articulation of neighboring one-handed signs. Nespor and
Sandler (1999) give examples of h2 spread in Israeli Sign Language, noting that it can
extend further than one sign away from the triggering sign, in either the anticipatory or
carryover directions, though it is blocked by phonological phrase boundaries. These
conclusions are supported by Brentari and Crossley (2002) in a study of prosody in
ASL.
Given the variability in coarticulatory behavior among subjects in the spoken-language study presented earlier, it seems reasonable to suspect that such variation will
also be seen among signers, something that the present study also seeks to investigate.
In Chapter 2, it was seen in the review of the relevant literature that many factors are
relevant as we seek to understand spoken-language coarticulation. The same will no
doubt be true of coarticulation in signed language. For example, Lucas, Bayley, Rose
and Wulf’s (2002) investigation of forehead-to-cheek location changes found that while
the location of the preceding sign was an important factor influencing whether such a
location shift would occur in a target sign, grammatical category of that target sign was
an even stronger predictor.19 In the speech production study discussed in Chapter 2,
where the focus was on interspeaker variation, a relatively small number of contexts
were examined and it was acknowledged that some factors relevant to the
understanding of coarticulation would not be investigated; the same applies to the sign
study to be presented here.
Presented below in Figure 5.1 (repeated from the Introduction) are at left, the
familiar vowel quadrangle, and at right, some typical sign locations. The sign HAT, for
example, is articulated on the forehead, while the sign PANTS is articulated by both
19 For the present study, coarticulatory effects of nouns on verbs will be investigated. Lucas et al. (2002)
found that in terms of susceptibility to coarticulatory influence from neighboring signs, nouns and verbs
had an intermediate ranking, between function words (the most susceptible) and adjectives (the least).
hands at waist level.20 Shown near the middle of the respective articulatory spaces are
schwa and neutral signing space; the latter is labeled “N.S.” Neutral space is the area in
front of the signer’s body which serves as the location for many signs not articulated at
particular points on the body. The arrows in the figure represent the expected direction
of influence on schwa and neutral space of nearby vowels [i] and [a] in the case of
schwa and of the illustrated sign locations (forehead, shoulder, waist) in the case of
neutral space.
The present study is motivated by the idea that schwa and neutral space may be
somewhat analogous, both in terms of their central position within their respective
articulatory spaces and of their coarticulatory behavior. It is important to point out that
there is no claim being made here that (1) neutral space is in some sense underspecified
in the way some researchers have suggested schwa may be (e.g. see Browman &
Goldstein, 1992; van Oostendorp, 2003), or (2) that the sign parameter Location is
analogous in sign phonology to vowels in spoken-language phonology.
20 As is customary in the literature on sign language, glosses of ASL signs will be given in capital letters.
Figure 5.1. Position and expected coarticulatory behavior of schwa in vowel space (left)
and of neutral space (labeled “N.S.”) in the greater signing space.
5.2. Initial study
Recall that for the spoken-language study, seven speakers were investigated and
recorded, and these recordings analyzed, so that a suitable subset of those recordings
could be modified for use as perception-study stimuli. Subsequent subjects were then
investigated with respect to both coarticulatory production tendency and
perceptual sensitivity. For this sign-language study, that procedure was modified.
Because it was expected that fewer ASL signers than English speakers would be
recruited, an initial production study with one signer was conducted to determine the
types and magnitudes of effects that might be expected, (1) in preparation for the full-scale production study, which is discussed later in this chapter, starting in Section 5.3,
and (2) so that suitable stimuli could be created immediately for the sign-language
perception study, which will be described in detail in Chapter 6.
The latter point was particularly important because it is not possible to perform
the kind of resynthesis on video stimuli that was relatively straightforward to carry out
in the creation of the speech-study stimuli. Therefore, an initial production study was
advantageous in establishing appropriate norms—both in terms of spatial magnitude
and in terms of “distance” as the term was used in the speech study (i.e. across how
many intervening signs effects might be seen)—for creating stimuli for the sign
perception study. In addition, while it was very easy to recruit English users for the
speech study, deaf signers are a much smaller pool of potential subjects, so the
procedure followed in the speech study—obtaining production data from several
language users, creating stimuli from some subset of recordings from that group, and
then having succeeding subjects do both production and perception tasks—was not
ideal for the sign study.
Hence, the following basic procedure was followed instead. First, this initial
production study with one signer was conducted. The results indicated what kinds of
effects might in general be expected, both in terms of spatial magnitude and across how
many intervening signs. This information was then used in planning the full-scale
production and perception studies, in both of which all subsequent subjects
participated.
5.2.1. Methodology
5.2.1.1. Signer 1
The sole participant in the initial production study was a female native signer of
ASL who was an employee in the laboratory where the study was conducted and agreed
freely to take part. Importantly, because of her native-signer status and as the first
participant in the experiment, her intuitions regarding ASL syntax and particular lexical
items—explained below—set the precedents that later subjects were asked to follow.
She will be referred to throughout the discussion of the sign experiments as Signer 1,
with other signers to be introduced presently.
5.2.1.2. Task
Figure 5.2. The locations of the context signs FATHER and MOTHER relative to
neutral signing space (labeled “N.S.”), which is the location where the sign WANT is
articulated.
Randomized lists containing five copies of each of the following two ASL
sentences, interspersed among 20 filler sentences, were used. According to Signer 1,
without the second occurrence of the pronoun “I” the sentences would not seem natural
in ASL.
“I WANT GO FIND FATHER I.”
“I WANT GO FIND MOTHER I.”
The location of the context signs FATHER and MOTHER (the chin or
forehead; see Figure 5.2 above), served as context location, while the location of the
neutral-space sign WANT was the target location, corresponding to the distance-3
condition in the spoken-language study. The sign WANT is a particularly convenient
target item because its articulation includes a lowering and pulling-back movement
toward the signer which is very easily spotted in the motion-capture data. This is so for
other signers as well, not only Signer 1. The coarticulatory effects of the context signs’
location on the location of the distance-1 and -2 signs FIND and GO will not be
examined here because they did not have clear motion-capture signatures like that of
WANT. The location of WANT in a typical utterance is shown below in Figure 5.3.
Figure 5.3. The location of the target sign WANT in a typical utterance.
The signs MOTHER and FATHER are a minimal sign pair, formed with the
same handshape (“5”) and movement (two taps of the thumb against the body), but at
different locations. MOTHER is articulated on the chin, while FATHER is articulated
on the forehead, as illustrated earlier in Figure 5.2. The preceding three signs in these
sentences—FIND, GO and WANT—are all articulated in neutral signing space; it is
expected that when such signs are articulated in the FATHER context, they may be
positioned higher on average than in the MOTHER context. The first and last sign of
each sentence, I, is articulated on the chest.
Figure 5.4. Position of the two ultrasound markers on the hand and the one marker on
the neck.
5.2.1.3. Motion capture data recording
The signer signed these sentences while seated, with two ultrasound “markers”
(emitters) attached to the back of the dominant hand and one reference marker attached
to the front of the neck, as shown above (on a different person) in Figure 5.4. The same
marker configuration was used for this and all later subjects. The ultrasound signals
were detected with a set of microphones located approximately 750 cm away (Zebris
system CMS-HS-L with MA-HS measuring unit; data collection performed with
WinData software). This system uses triangulation to determine the position in three-dimensional space of each marker at a given moment; this spatial information is
recorded at a 60 Hz sampling rate with 0.1 mm spatial precision. To obtain relativized
coordinates, the coordinates of the neck marker were subtracted from those of the wrist
marker since absolute coordinates would tend to change if the signer shifted her body
position, while relativized coordinates should be more stable.21 Figure 5.5 below shows
how the x-, y- and z-directions are defined in this study. Smaller or greater x-, y- and z-values correspond respectively to the directions left or right, back or forward, and down
or up, relative to the signer.
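As an illustration only (the dissertation does not present analysis code), the relativization step just described might be sketched as follows. The function name and the z_offset parameter are assumptions; the offset corresponds to the constant mentioned in the footnote that resets z = 0 to approximately waist level.

```python
# Illustrative sketch (not the author's actual code): subtract the neck
# (reference) marker's coordinates from the wrist marker's, then shift
# the z-axis so that z = 0 lies near waist level. The z_offset value
# would be chosen per subject.

def relativize(wrist_xyz, neck_xyz, z_offset=0.0):
    """Return wrist coordinates relative to the neck marker (in cm)."""
    x = wrist_xyz[0] - neck_xyz[0]             # smaller = left, greater = right
    y = wrist_xyz[1] - neck_xyz[1]             # smaller = back, greater = forward
    z = wrist_xyz[2] - neck_xyz[2] + z_offset  # smaller = down, greater = up
    return (x, y, z)
```

Because both markers shift together when the signer moves her body, the relativized coordinates remain comparable across trials in a way that absolute coordinates would not.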
While signing, the signer was videotaped in addition to being recorded with the
motion-capture equipment, so that the sign sentences could be inspected later. This was
necessary in order to ensure that motion-capture data from utterances containing
performance errors likely to be problematic for the purposes of a coarticulation study—
such as false starts or long mid-sentence pauses on the part of the signer—would not be
included in the analysis.
The sentences that the signer needed to sign were presented on a computer
screen 36 inches in front of the signer. The signer began the articulation of each
sentence with her hands in her lap, pressed a button on a keyboard (located at the
nearest edge of the same table that the computer screen rested on) in order to see the
next sentence that she was to sign, signed that sentence, returned her hands to her lap,
and again pressed the button to proceed to the next sentence. The sentences were
organized in blocks, with the same sentence frames in each block and the context signs
randomly ordered within blocks.
21 The wrist marker generally provided cleaner data than the back-of-hand marker, in the sense of having
fewer cases of occlusions, signal drop-out and other problems, and hence the analyses presented here use
relativized wrist position rather than relativized back-of-hand position. Because z-values relativized to
the neck are negative for positions below the neck and positive for those above, a constant was added to
all relativized z-values for each subject to reset the z=0 position to approximately waist level.
Figure 5.5. Definition of x, y, and z dimensions relative to signer.
5.2.2. Results and discussion
Figure 5.6 below shows the z-coordinate (altitude) of the wrist of Signer 1’s
dominant hand during the articulation of two sentences of the form “I WANT GO
FIND (X) I,” where (X) is a context sign MOTHER or FATHER. Time is shown along
the horizontal axis with successive labels 1 second apart. The sentences shown in the
figure have context words MOTHER and FATHER respectively (the intervening filler
sentences had other context signs not discussed here). The overall up-then-down pattern
of each sentence reflects the movement of the signer’s dominant hand, first from the lap
to the chest (for the sign “I”) and neutral space region (WANT GO FIND), then to its
highest point on the chin or forehead (for MOTHER or FATHER), and finally back
down to the chest area for “I” and then to the subject’s lap.
Each of the two arrows pointing toward the small zigzags near the start of each
of those two sentences indicates a local minimum characterizing the sign WANT,
which is articulated with both hands facing palms-up in neutral space making a slight
pulling motion down and toward the signer. It is the z-coordinate at this local minimum
that will be compared between contexts; it is expected that in general, it will have a
greater value in sentences whose context signs are located higher on the subject’s body,
as happens to be the case in the particular instantiations of the MOTHER and FATHER
sentences shown in Figure 5.6. It will be seen presently that this characterization of
WANT (i.e. as having a dip in the z-dimension) held true in general for all of the
signers who took part in the sign production studies, though with certain differences in
detail that will be explored in the discussion of the main sign production experiment in
Section 5.3.
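The identification of these characteristic dips in the z-trace can be sketched in Python as follows. This is a generic local-minimum search offered for illustration only; the author's actual measurement procedure may have differed, and the optional threshold parameter is an assumption.

```python
# Illustrative sketch: locate local minima in a wrist-height (z) trace,
# like those marked by arrows in Figure 5.6 for the sign WANT.

def local_minima(z, threshold=None):
    """Return indices i where z[i] is strictly lower than both neighbors.

    If a threshold is given, keep only minima below it (e.g. to exclude
    dips occurring above the neutral-space region).
    """
    idx = [i for i in range(1, len(z) - 1)
           if z[i] < z[i - 1] and z[i] < z[i + 1]]
    if threshold is not None:
        idx = [i for i in idx if z[i] < threshold]
    return idx
```

The z-coordinate at the minimum corresponding to WANT would then be the value compared between the FATHER and MOTHER contexts.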
Figure 5.6. The relativized z-position (height in cm) of Signer 1’s wrist during the
articulation of two ASL sentences of the form “I WANT GO FIND (X) I.” The arrows
indicate the local minima characteristic of the sign WANT; these minima were taken as
defining that sign’s spatial location for the purposes of analysis. For signs other than
WANT and the context signs MOTHER and FATHER, the labels in the figure can only
be considered approximate indications of sign position.
Such z-minima for WANT were found very consistently in the sentences signed
by Signer 1. Because their position in the z-dimension was constrained by particular
upper and lower bounds (namely, the locations of the surrounding signs), their physical
range was relatively narrow and no rejection or replacement of outliers was needed. For
this initial study as well as for the main sign production study, a trial could be rejected
for one of two reasons. First, if the video recording of the signing session showed that
the signer made an obvious error or false start during a sentence, that trial was rejected.
For example, this would be the case if in signing the sentence “I WANT GO FIND
MOTHER I,” the subject signed “I WANT GO FIND,” then moved the hand to the
forehead and began to sign FATHER before realizing her mistake and finishing the
sentence with “MOTHER I.” For the entire pilot sign study, data for 65 sentences
signed by Signer 1 were obtained, data for five of which were rejected for such reasons,
a 7.7 percent rejection rate.
A second possible reason for rejecting a trial was when a signer reduced the
target sign—which in the pilot study was always WANT—so much that the sign’s
characteristic z-minimum was not visible. Any decision to reject trials was of course
not ideal, and was made only as a last resort. It should be noted that these z-minima
were present in many cases in which the signing of WANT was so rapid in the flow of
signed sentences that it was hard to discern as a separate sign even in the video
recording. Since the video had a lower temporal resolution—29.97 frames per
second—than the motion-capture data, which were sampled at 60 Hz, the video
recordings were no more useful than the motion-capture data in seeking to recover “lost” trials. However, these two kinds of data did complement each other. In
particular, video recordings for cases of missing z-minima almost always showed that
in those cases, the signer had actually “contracted” the target sign with neighboring
signs like “I.” These cases are of enough interest that they will be discussed separately
when the results of the main experiment are described. Signer 1 did not contract signs
in this way and so no such rejections were necessary for her trials.
Table 5.1 below gives the average z-value of the local minimum defining the
sign WANT in the contexts FATHER and MOTHER, together with the significance
testing outcome using a paired t-test. Paired t-testing was done to guard against the
possibility that neutral signing space might drift slightly over the course of the
experiment, being more similar for adjacent or near-adjacent utterances.22 Therefore,
the pairings were made between z-values for WANT in the first FATHER and
MOTHER sentences, in the second such pair, and so on through the fifth. In the case of
missing data, as when a trial was rejected, non-paired t-tests were performed instead.
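For concreteness, the paired comparison described here amounts to the standard paired t statistic computed over repetition-matched utterances (first FATHER sentence paired with first MOTHER sentence, and so on). The following is a generic textbook sketch, not the author's code:

```python
import math
from statistics import mean, stdev

# Illustrative sketch of a paired t-test: compute the per-pair
# differences, then test whether their mean differs from zero.

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a, b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Pairing by repetition in this way factors out any slow drift in the position of neutral signing space over the session, which is precisely the concern that motivated the paired design.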
The results show that the context-related difference in height was in the
expected direction (a higher altitude for WANT in the context of FATHER, the sign
articulated higher on the body), and significant at the p<0.01 level. Therefore, not only
did LL coarticulation occur, but it did so across two intervening signs. Following the
terminology established in the earlier-discussed speech studies, such sign-to-sign
effects will be referred to as distance-3 effects, and likewise for distances 2 (effects
across one intervening sign) and 1 (adjacent sign-to-sign effects).
Another signing session was then run with the same signer and following the
same methodology, but with other context signs. Since MOTHER and FATHER are
located relatively closely in signing space, it was felt that context-related effects for
pairs of signs articulated as far apart as possible in signing space might result in
stronger effects, just as [i] and [a] were chosen as targets of study in the speech study
because of their great distance in vowel space. The other context pairs that were tested
22 This issue will be examined more closely in Section 5.3.3.
were DEER-RUSSIA (a minimal or near-minimal sign pair; both are 2-handed signs
articulated with “5” handshapes, the former at the temples and the latter at the waist),
and HAT-PANTS (signed with a “B” handshape on the head, and with an open-closing-to-bent “B” at the waist, respectively). The same sentence frame, I WANT GO FIND
(X), was used. The results for those sign pairs are also given in Table 5.1. The result for
each of these two context pairs is a trend in the expected direction which approaches
but does not reach significance.
Context              Average z value (cm) [SD]   Significance test results
FATHER (forehead)    8.65 [1.43]                 p < 0.01
MOTHER (chin)        7.12 [1.28]
HAT (head)           17.1 [1.97]                 p = 0.102
PANTS (waist)        16.0 [2.47]
DEER (head)          16.7 [0.92]                 p = 0.104
RUSSIA (waist)       15.4 [0.77]
Table 5.1. Average relativized z-position (height) of the sign WANT in various
contexts at distance 3, with results of significance testing between context pairs also
given.23
An investigation of LL effects in other directions than the z-dimension, and at
other distances, was also conducted. To investigate forward-back (y-dimension) and
23 The apparent difference in z-values between the first word pair (where z = approx. 8 cm) and the
others (where z = approx. 16 cm) arises because the data for the final two word pairs were
collected in a different recording session than the data for the FATHER-MOTHER pair. Since these z-values are
recorded with respect to an arbitrary point in three-dimensional space which is approximately but not
exactly the same between recording sessions, comparisons are valid between items in each context word
pair but not between words in different pairs.
right-left (x-dimension) effects, the behavior of WANT in the context of the signs
BOSS, CLOWN and CUPCAKE was also examined. These form a near-minimal sign
triple, all formed with a “bent-5” or “claw” handshape, and are located respectively on
the shoulder, nose and upright palm of the non-dominant hand (held in neutral space).
The locations of the context signs HAT, PANTS, DEER, RUSSIA, CLOWN, BOSS
and CUPCAKE are illustrated below in Figure 5.7.
In some of the signs shown in the figure, contacts at specified locations are
aided by noticeable accommodation by parts of the body other than the dominant hand.
For example, in the figure it can be seen that HAT and DEER are articulated with the
head tilted forward to meet the hand(s). While this may result in a somewhat shorter
travel distance on the part of the hand(s) in such situations, it must be kept in mind that
the coarticulatory effects of a given context item are being compared to those of
another context item, and that these pairings were made between items located far from
each other in signing space. For example, the height (z-value) of the dominant hand in
signing HAT is still quite high compared to that for PANTS, regardless of whether the
head is tilted forward while HAT is signed. Therefore, coarticulatory effects should not
be expected to be substantially reduced in such situations compared to what they would
be if no such accommodations occurred.
Figure 5.7. Starting on the top row, going left to right, the locations of the context signs
HAT, PANTS, DEER, RUSSIA, CLOWN, BOSS and CUPCAKE are shown.
Analysis of coarticulation in all of these contexts indicates that effects like those
shown in Table 5.1 for height may also be found for side-to-side and front-back
location coarticulation as well. Table 5.2 below shows results at distance 1 for these
two context sign pairs and one sign triple. The sign WANT is again the target sign, this
time in the distance-1 condition, in the sentence “I WANT (X) I,” where the (X)
represents the context item.
Context               Average z     Average x      Average y      Significance
                      (up-down)     (right-left)   (front-back)   test results
HAT (head)            17.8 [1.5]                                  p = 0.08
PANTS (waist)         15.7 [1.9]
DEER (head)           16.3 [0.6]                                  p < 0.05
RUSSIA (waist)        14.5 [1.7]
BOSS (rt. shoulder)                 36.5 [1.1]     0.20 [1.1]     x: p < 0.01,
CLOWN (nose)                        34.5 [1.1]     1.41 [1.3]     y: p < 0.05
CUPCAKE (N.S.)                      35.6 [0.7]     1.43 [0.4]
Table 5.2. Average relativized values in cm of x (right-left), y (front-back) and z (up-down) dimensions of the sign WANT in various contexts at distance 1, with results of significance testing between contexts also given. Standard deviations are also given in brackets. These measurements are based on the position of the wrist marker on the signer’s dominant hand.
The z-related effects for the context word pairs HAT-PANTS and DEER-RUSSIA are stronger here than they were at distance 3: context-related differences at
distance 1 are in the neighborhood of 1.5-2 cm, rather than the 1 cm or so that was
typical at distance 3. Since BOSS is articulated further to the right than CUPCAKE or
CLOWN, one might expect that WANT in the BOSS context would be articulated
farthest to the right, which did in fact occur, as indicated in the second column of Table
5.2. Similarly, the third column in the table shows that WANT in the BOSS context
was articulated in a less frontward position than in the CUPCAKE or CLOWN
contexts, which one might expect given that the right shoulder is further back in the y-dimension than either the nose or neutral space.24
This chapter now continues with a discussion of the main sign production
study. How the results of the initial sign production study were used to inform the sign
perception study will be presented in Chapter 6, along with a complete presentation of
that perception study.
5.3. Main study
The results of the initial study indicated that LL coarticulatory effects occur
between adjacent signs as well as over wider ranges. For the main production study, the
same basic methodology was followed as in the initial study, but with some
modifications to the sets of sentence frames and context items.
24 In contrast to these results, there were a small number of context-related effects which were in the
contrary-to-expected direction. No clear pattern emerged in such cases which would account for these
findings, but other researchers in sign phonetics have sometimes seen similar instances of dissimilatory
behavior that also resisted easy explanation (e.g. Claude Mauk and Martha Tyrone, p.c.). Some evidence
of dissimilatory behavior was also seen occasionally in the signers who took part in the main sign
production study, as will be seen presently.
5.3.1. Methodology
5.3.1.1. Subjects
Four other deaf participants took part in this study. All were residents of
Northern California who were recruited through advertisements or word-of-mouth and
were paid for their participation. All were uninformed as to the purpose of the study.
Relative to the spoken-language study participants and their use of English,
these signers’ backgrounds were quite varied in terms of their ages of acquisition of
ASL and in many aspects of their usage of ASL. Therefore, some additional detail
concerning their individual backgrounds and signing behavior will be presented both
here and throughout the rest of this chapter. Table 5.3 below gives some basic
demographic information concerning the five subjects who took part in the initial and
main sign studies, starting with the signer from the initial study. Continuing the
consecutive numbering of study participants established in earlier chapters, these
individuals would be considered Subjects 39 through 43, but for convenience they will
generally be referred to subsequently as Signers 1 through 5.
Subject         Gender   Age   Handedness   Age of Acquisition of ASL
39 (Signer 1)   f        35    R            Native
40 (Signer 2)   f        38    L            Native
41 (Signer 3)   m        37    L            Late (~high school)
42 (Signer 4)   m        40    R            Early (~age 3)
43 (Signer 5)   f        33    R            Early (~early childhood)
Table 5.3. Demographic information on the five signers who participated in the sign
production studies. Signer 1 was the sole participant of the initial sign study.
5.3.1.2. Task
Each of the four signers was asked to perform essentially the same task that was
performed by Signer 1 in the initial production study just described, but with six
repetitions of each sentence instead of five, in the hope of increasing statistical power
without increasing the overall duration of the task too greatly. Some changes were also
made in the set of sentence frames and context signs. First, the initial study having
found significant results as far away as distance 3, the possibility of further-distance effects
was investigated in the main study by constructing an additional sentence frame, I
WANT GO FIND OTHER (X), to create a distance-4 condition.
Figure 5.8. The beginning and ending points of the target sign WISH.
Second, an additional target sign, WISH, was examined; this sign is articulated
with a “C” handshape, with the palm toward the signer, with a downward motion on the
chest, which results in a trajectory in the z-direction, and a characteristic local
minimum, similar to those of WANT. Therefore, target signs in neutral space (WANT)
as well as on the body (WISH) could be investigated in the same study. The starting
and ending points of WISH in a typical utterance are illustrated above in Figure 5.8.
The set of sentence frames and context signs that were used in the main study are
shown in Table 5.4, with (X) representing the context sign in the sentence frames.
Distance   Sentence frame for WANT       Sentence frame for WISH
1          I WANT (X).                   I WISH (X).
2          I WANT FIND (X).
3          I WANT GO FIND (X).           I WISH GO FIND (X).
4          I WANT GO FIND OTHER (X).

Sign       Handshape & (Palm Orientation)   Location            Movement
HAT        B (down)                         Forehead            Tap head twice
PANTS      B/Bent B (down)                  Waist or thighs     Flick fingertips twice
BOSS       Bent 5 (down)                    Shoulder            Tap shoulder twice
CUPCAKE    Bent 5 (down)                    Non-dominant palm   Tap palm twice, no rotation
CLOWN      Bent 5 (back)                    Nose                Tap nose twice, no rotation
Table 5.4. Sentence frames and context signs used in the main sign production study.
The set of context items used in the main production study was {HAT, PANTS,
BOSS, CUPCAKE, CLOWN, <red>, <green>}. An explanation of the context items
<red> and <green> will be given presently. The target sign WISH was not investigated
for all four distance conditions mostly as a way to shorten the overall duration of the
task, which had been found in pilot testing to be taxing with all combinations of
sentence frames and context signs that were initially used. For similar reasons, the
context sign pair DEER-RUSSIA was not included in the task for signers in the main
study. An additional reason for omitting this pair was that the form of the sign for
RUSSIA used in this experiment is a historically older variant; most signers now use a
newer variant which is not articulated near the edge of signing space and thus was not a
useful candidate for the sorts of comparisons being made here. During pilot testing for
the main experiment, use of the historically older form often resulted in apparent false
starts or in outright substitutions with the modern variant, as shown below in Figure
5.9.
Figure 5.9. Newer variant of RUSSIA, signed here by accident instead of the older
variant.
In addition, two non-linguistic actions were added to the set of context
items, so that a comparison of linguistic and “non-linguistic” coarticulation might be
made. The goal here was to create sign-like actions articulated at locations spanning the
vertical range of signing space, just as the locations of the context signs HAT and
PANTS do. The question then was whether coarticulatory effects in this “non-linguistic”
condition would be similar to those in the linguistic condition for context signs
spanning a similar distance in the articulatory space.
Figure 5.10. The apparatus used for the non-linguistic contexts <red> and <green>,
performed by flipping the top and bottom switches on the device, respectively. The
distance between the two switches as shown here is approximately 24 inches.
These non-linguistic actions, to be referred to as <red> and <green>, were
performed by flipping one of two switches attached to a vertically oriented bar-like
holder, shown in Figure 5.10 above, that was braced to the same table on which the
computer screen and keyboard were resting. This holder was positioned in front of the
signer’s dominant hand, with one switch at the holder’s high end and the second switch
located at its low end directly below the first, a height difference of approximately two
feet; this distance was adjusted slightly for each subject, as explained below. Flipping
the top switch turned on a red light (and hence this non-signing action was termed the
<red> context), and flipping the bottom switch turned on a green light (the <green>
context). During the course of the signing task, the instruction to flip the top switch was
given by a red upward-pointing arrow, and likewise for the bottom switch and a green
downward-pointing arrow. This red and green pattern was chosen in order to create a
task that was non-linguistic but nevertheless intuitive and would not require special
training or a linguistic “go” signal on each trial, as it was assumed subjects would be
familiar with the orientation of red and green lights on traffic signals.
The vertical position of each switch was adjustable, so that the distance between
them could be modified for each signer. This was done in order to make their
separation proportional to each signer’s own signing space, this proportion being
defined in terms of signers’ height: the distance from the <red> location to the <green>
one for each signer was set at one-third of that individual’s height, with the <green>
switch positioned just under the table, at lap level, and the <red> switch at head level
with the subject seated, thus approximating the span of the locations of the linguistic
context items PANTS and HAT.
During the course of signing the sentence-frame + context-item combinations,
<red> or <green> were treated like all of the other (i.e. sign) items. In other words,
signers were instructed to embed them in the appropriate sentence frame, so that for
example in the distance-3 condition with WANT as target sign and <red> as context
item, the subject signed “I WANT GO FIND,” immediately flipped the top switch,
finished the sentence with the resumptive “I,” and put her hands in her lap in
preparation for signing the next sentence.
The task was organized into six blocks, one for each of the sentence frames
given earlier in Table 5.4. Since each frame was articulated with eight possible context
items repeated six times each, 48 sentences were signed in each block, for a total of 288
sentences signed by each subject. The task therefore required continuous and
repetitive signing for on the order of 25 minutes, not including breaks. For Signers 1
and 2, the task was not organized with one frame per sentence block; instead, there was
frequent shuffling of sentence frames within blocks. However, this meant that if a
subject decided to quit the task with a substantial portion of the trials left uncompleted,
some or all sentence frames would have been repeated fewer than the desired six times
per context item, perhaps weakening the statistical results. In fact, Signer 2 became
fatigued by the task and did not complete it, though fortunately this occurred near the
end of the task. By having later subjects complete all repetitions of a given sentence
frame in one block before moving on to another frame in the next block, this issue was
bypassed.
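The blocked design just described can be sketched in code form. The sketch below is illustrative only (the counts come from the text, but the trial-ordering code itself is hypothetical and no particular presentation software is implied):

```python
import random

FRAMES = 6      # sentence frames (Table 5.4)
CONTEXTS = 8    # context items per block
REPS = 6        # repetitions of each frame + context combination

def blocked_order():
    """One block per sentence frame; within a block, all context items and
    repetitions are shuffled. Quitting early then leaves whole frames
    complete rather than every frame partially done."""
    trials = []
    for frame in range(FRAMES):
        block = [(frame, c) for c in range(CONTEXTS) for _ in range(REPS)]
        random.shuffle(block)
        trials.extend(block)
    return trials

trials = blocked_order()    # 6 frames * 8 contexts * 6 reps = 288 trials
```

Under this scheme, the first 48 trials all come from the first sentence frame, which is what protected the later sessions against the early-termination problem encountered with Signer 2.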
Signers were instructed to sign at a comfortable, natural rate and to avoid overly
slow or formal signing. Instructions similar to those given to speech study participants
were given here: signers were asked to imagine they were in an informal social
situation, that someone had asked what they wanted, and that for whatever reason, the
answer was “I WANT GO FIND CUPCAKE” (or whatever the sentence to be signed in
that trial was). Signers were given a brief warm-up task so that they would be familiar
with the basic format of the real experiment—the sentence frames, context signs, non-linguistic context items, and so on. In some cases, which will be described in detail
later, subjects’ preferred signs for some items like HAT or PANTS were completely
different from those that had been established by Signer 1, and in such cases the warm-up task
provided an opportunity to politely ask subjects to employ the signs that were being
used in this experiment. When this happened, it was emphasized that this was not a matter of
their preferred forms being “wrong” but rather of particular sign forms having been
chosen for a good reason which would soon be made clearer. (As with the speech
study, subjects were given an explanation of the purposes of the overall study between
their completion of the production task and their commencement of the perception
task.)
During the actual experiment, if subjects made substitutions of one sign for
another (e.g. in cases where a signer’s preferred form of HAT or PANTS was a variant
different from the form chosen for this experiment), polite reminders were given not to
do so. On the other hand, if the desired signs were being performed but in a reduced or
otherwise slightly modified form (e.g. because of assimilation of the locations of
adjacent signs), no such corrections were made, since variation of this sort reflected the
kind of phonetic/phonological behavior that this project sought to investigate. It will be
seen presently that such behavior was quite pronounced in some cases, to the point that
it was inconvenient from the point of view of statistical analysis. Nonetheless, it
seemed preferable that subjects not become self-conscious about their behavior at the
phonetic level, lest their articulation become slower, more formal, or otherwise less
natural than in their usual signing.
5.3.2. Results and discussion
Discussion of the results to be presented here is more involved than was needed
for the initial sign study for a number of reasons. Most importantly, there was a great
deal of variability among signers in their signing behavior, particularly with respect to
the amount of reduction or assimilation that was seen for certain signs, as well as in
other ways that will also be discussed presently. For individual signers, there was much
variability in signing behavior even for particular items in particular contexts. Although
comparisons between modalities are necessarily inexact, such variability seems more
pronounced in the signing data than in the speaking data, a theme that will recur in the
remainder of this dissertation.
For group analyses of the main sign study data, a normalization procedure was
used, similar to the one based on Gerstman (1968) that was applied to speakers’ raw
formant values in carrying out group analysis of the speech studies discussed earlier.
Here, starting with a given signer’s raw motion-capture value z_raw, the corresponding
normalized value is given by the formula

z_norm = 999 * (z_raw - z_min) / (z_max - z_min),

where z_max and z_min are the assumed highest and lowest z-locations in that
signer’s signing space. Each signer’s z_min was an averaged z-value of the signer’s hand
in the lap position between signed sentences during the recording session (essentially
the same height as that of the switch used in the <green> condition), while z_max was
that value plus one-third of the signer’s height, the same value as had been
used to determine the span between the switches the signers flipped in the <red> and
<green> contexts. The same scaling factor was used to normalize x- and y-values, with
the normalized values centered around the mean x- and y-coordinates of the sign
WANT across all articulations of that sign in a given distance condition and sentence
frame. For left-handed Signers 2 and 3, the x-values obtained from the formula above
were subtracted from 999, essentially providing a “mirror image” equivalent of those
values relative to the x-dimension, so that testing for rightward or leftward effects
would be consistent across all four signers.
The procedure therefore has the effect of scaling each signer’s motion-capture
coordinates relative to the size of his or her own signing space, making comparisons
between signers more reasonable—though, as in the case of spoken language
normalization procedures, not unproblematic. Because of the scaling factor used, one
normalized unit is equivalent to a spatial distance on the order of half a millimeter,
varying somewhat for each signer and each dimension x, y and z. Group results using
these normalized articulatory space measures are presented in the next section, which is
then followed by four sections each devoted to the results for an individual signer.
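The normalization procedure just described can be summarized in a short sketch (illustrative only: the function name and the example signer values are mine, and the additional centering of x- and y-values around the mean coordinates of WANT is omitted here):

```python
def normalize(raw, z_min, z_max):
    """Scale a raw motion-capture coordinate into the 0-999 range,
    where z_min and z_max bound the signer's signing space."""
    return 999 * (raw - z_min) / (z_max - z_min)

# Hypothetical signer: hand rests at lap level 80 cm (z_min), and the
# signer is 168 cm tall, so z_max = 80 + 168/3 = 136 cm.
z_min = 80.0
z_max = z_min + 168.0 / 3

z_norm = normalize(100.0, z_min, z_max)   # a point 20 cm above the lap

# For left-handed signers, normalized x-values are mirrored so that
# rightward/leftward coarticulatory effects are comparable across signers.
x_mirrored = 999 - normalize(95.0, z_min, z_max)
```

With these example values, one normalized unit corresponds to 56/999, or roughly 0.056 cm, in line with the half-millimeter figure mentioned above.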
5.3.2.1. Group results
ANOVAs with context item as factor were used to compare context-related
differences for each sentence frame for each of the x-, y- and z-directions, in cases
where an expected difference in direction could be predicted based on the relative
positions of the context items, these predictions being as follows. For the pairs
HAT-PANTS, <red>-<green>, CLOWN-BOSS, BOSS-CUPCAKE, and CLOWN-CUPCAKE, it was
expected that the first context item in each pair as just listed would
be associated with larger z-values (i.e. greater height) than the second item. For
BOSS-CLOWN and BOSS-CUPCAKE, greater x-values (i.e. more rightward position) were
expected for target items in the context of BOSS; and a greater y-position (more
frontward) was expected for targets in the context of the first item in each of the pairs
CUPCAKE-BOSS, CUPCAKE-CLOWN and CLOWN-BOSS. The approximate
locations of these items are illustrated in Figure 5.11 below, as points on the body (left)
and as points in an idealized three-dimensional space (right). The locations of these
items also indicate their expected direction of coarticulatory influence on nearby
neutral-space signs.
Figure 5.11. Idealized locations on the body and in three-dimensional-space of the
seven context items, which determine the expected direction of coarticulatory influence
these items should have on preceding target signs.
The group and individual quantitative results will be given in both numerical
and pictorial form, with the latter following the same basic layout as seen in the three-dimensional schematic presented in Figure 5.11. The perspective used in the schematic,
a front-left view, matches that of the video recordings that were made of signers during
the experiment, images from which are interspersed throughout this chapter. The three
axes indicate the left-right (x), back-front (y) and up-down (z) directions. For left-handed Signers 2 and 3, the sign BOSS was articulated on the left shoulder, so the
expected coarticulatory influence of that sign is in the negative-x direction (i.e. to the
left from the point of view of the signer). As mentioned before, these handedness-related differences in the x-direction were corrected for the purposes of group testing,
but in the subsequent presentation of individual subjects’ numerical results the original
x-values are used.
Table 5.5 below gives the numerical results of the sign study averaged over
Signers 2 through 5, given in normalized coordinates. The table is divided into six
subsections, each giving the results for one particular sentence frame, as indicated
below each subsection.25 Each of the six subsections is accompanied by a three-dimensional diagram like that in Figure 5.11, placed just above the numerical results. In
these diagrams, values have been scaled independently for each of the three spatial
dimensions. This was done to maximize the contrasts of interest, but also results in
some distortion of the relationships of the points within the space as a whole. It should
be borne in mind that the three-dimensional diagram in Figure 5.11 represents a volume
approximately equivalent to the upper half of the signer’s body, but the diagrams in
Table 5.5 represent much smaller regions—namely those in which the target signs
WANT and WISH were articulated in the various contexts—typically spanning
distances on the order of several centimeters. Standard deviations are not given here but
were generally on the order of 50 normalized units, or about two to three centimeters.
Unexpectedly, none of the differences in target-sign location between context
pairs were statistically significant. However, in 37 of the 60 context pairs that
were examined, trends were in the expected direction, which itself can be considered
significant in that such an outcome would be less than 5% likely if context-related
differences were purely random.

25 Data for Signer 1 were not included in these group results because that signer had participated in
multiple recording sessions, using somewhat different sentence frames, numbers of repetitions, and
contexts than were used in sessions with later signers. An overview of results for that subject was given
in Section 5.2.2.

In addition, despite the absence of significant effects
between particular context pairs, other trends were apparent.
For example, differences in target-sign position related to the HAT-PANTS
context pair were in the expected direction in all sentence frames but one, the
distance-2 condition with target sign WANT. On the other hand, results related to the
non-linguistic context pair <red>-<green> were completely mixed. For the set
BOSS-CLOWN-CUPCAKE, it was expected that BOSS should be associated with the greatest
x-values, which was the case in all but the first sentence frame (I WANT (X)), in
which the values for the BOSS-CUPCAKE pair are in the contrary-to-expected direction,
though the magnitude of their difference in this case is minimal.
Finally, in the y-direction, where it was expected that values associated with the BOSS
context should be smallest, the expected trends are more or less evident for distances 1,
2 and 3 with target sign WANT, modulo some noise on the order of 10 normalized units,
about half a centimeter.
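The earlier claim that 37 expected-direction outcomes out of 60 would arise less than 5% of the time by chance can be checked with a one-tailed binomial (sign-test) computation, sketched below under the assumption that each context pair independently has probability 0.5 of patterning in the expected direction:

```python
from math import comb

def binom_tail(k, n):
    """One-tailed P(X >= k) for X ~ Binomial(n, 0.5): the chance of at
    least k expected-direction outcomes out of n if direction is random."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p_group = binom_tail(37, 60)   # 37 of 60 pairs in the expected direction
```

Here p_group comes out a little under 0.05, consistent with the text; applied to a result like 33 of 60, the same computation gives a value well above 0.05.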
[Diagram: context-item locations for frame I WANT (X)]

(Blank cells in the tables below correspond to context pairs for which no directional prediction was made in that dimension.)

Context pair     x          y          z
HAT–PANTS                              382 – 367
<red>–<green>                          370 – 390
BOSS–CLOWN       515 – 503  551 – 543  361 – 355
BOSS–CUPCAKE     515 – 518  551 – 572  361 – 365
CLOWN–CUPCAKE               543 – 572  355 – 365
Distance 1, WANT target: I WANT (X)

Context pair     x          y          z
HAT–PANTS                              365 – 379
<red>–<green>                          375 – 356
BOSS–CLOWN       531 – 514  535 – 551  387 – 373
BOSS–CUPCAKE     531 – 516  535 – 557  387 – 373
CLOWN–CUPCAKE               551 – 557  373 – 373
Distance 2, WANT target: I WANT FIND (X)

[Diagram: context-item locations for frame I WANT GO FIND (X)]

Context pair     x          y          z
HAT–PANTS                              397 – 359
<red>–<green>                          404 – 367
BOSS–CLOWN       508 – 503  483 – 494  390 – 357
BOSS–CUPCAKE     508 – 503  483 – 478  390 – 353
CLOWN–CUPCAKE               494 – 478  357 – 353
Distance 3, WANT target: I WANT GO FIND (X)

Context pair     x          y          z
HAT–PANTS                              395 – 326
<red>–<green>                          367 – 391
BOSS–CLOWN       518 – 502  499 – 462  360 – 382
BOSS–CUPCAKE     518 – 518  499 – 513  360 – 341
CLOWN–CUPCAKE               462 – 513  382 – 341
Distance 4, WANT target: I WANT GO FIND OTHER (X)

Context pair     x          y          z
HAT–PANTS                              302 – 234
<red>–<green>                          296 – 250
BOSS–CLOWN       488 – 467  492 – 481  287 – 307
BOSS–CUPCAKE     488 – 480  492 – 485  287 – 292
CLOWN–CUPCAKE               481 – 485  307 – 292
Distance 1, WISH target: I WISH (X)

[Diagram: context-item locations for frame I WISH GO FIND (X)]

Context pair     x          y          z
HAT–PANTS                              303 – 300
<red>–<green>                          308 – 335
BOSS–CLOWN       478 – 473  465 – 452  331 – 285
BOSS–CUPCAKE     478 – 476  465 – 459  331 – 316
CLOWN–CUPCAKE               452 – 459  285 – 316
Distance 3, WISH target: I WISH GO FIND (X)

Table 5.5. Numerical results for the main signing study, with locations of target signs
averaged and compared between contexts. Values are given in normalized x-, y- and
z-coordinates. No context-related differences were found to be significant.
Notwithstanding these trends, the overall absence of significant group effects is
very different from what was observed in the speech studies, where very strong effects
at distances 1 and 2 were seen, along with weaker effects at distance 3. As mentioned
earlier, substantial variation among these signers in many aspects of their signing was
observed, so a more detailed investigation of the behavior of individual subjects will be
conducted than was the case for the speakers in the English production study. Some of
these differences, such as differences in lexicon or syntax, were mentioned earlier and
will be discussed in more detail in the individual results sections which follow.
5.3.2.2. Signer 2
Figure 5.12. Motion-capture data for two of Signer 2’s sentences in the distance-1
condition with target sign WISH (i.e., sentence frame I WISH (X) I), with context items
DEER and CUPCAKE. Clear z-minima (circled) are present in both sentences, typical
of the pattern in the motion-capture data of this and other signers for WISH. The scale
on the y-axis is in centimeters but the values themselves are arbitrary. The x-axis
indicates approximate time in seconds.
Signer 2, like Signer 1, was a native signer. The sign forms she preferred were
similar to those established for the study by Signer 1, though her signing behavior was
somewhat different, particularly in that Signer 2 showed more reduction of the sign
WANT as the recording session progressed, so that in some trials, the characteristic z-minimum for that sign was either much subtler or entirely absent. In such situations, the
spatial coordinates of WANT could not be determined and the trial had to be rejected.
These factors, together with some cases of problematic signing behavior such as errors,
pauses and false starts—all of which were seen to some degree in later signers as
well—resulted in 12.7 percent of completed trials being rejected for this participant.
Signer 2 expressed some fatigue with the task and asked to end the session early,
though fortunately this occurred after almost all of the planned trials had already been
completed.
For this signer and for the others, rejected trials tended to occur less frequently
for the target sign WISH, whose characteristic drop in the z-dimension was much more
robust among contexts for all of the signers, as illustrated above in Figure 5.12, which
shows motion-capture data for two of Signer 2’s sentences, I WISH DEER26 and I
WISH CUPCAKE. The z-dimension pattern for CUPCAKE shows three local minima,
the first of which corresponds to WISH and the other two of which are signatures of the
two contacts made downward by the dominant hand onto h2 in the articulation of
CUPCAKE. Notice also that the z-minimum for WISH in the DEER context is lower
than that in the CUPCAKE context, which is the reverse of the expected situation since
DEER is articulated at a higher location. It will be seen that in the sign study,
significant context-related differences tended to be in the expected direction; however,
contrary-to-expected results were also seen, a marked difference from what was
observed in the speech study.
26 The context signs DEER and RUSSIA were used in the version of the experiment in which Signers 1
and 2 took part, but as discussed earlier, the signing of RUSSIA proved problematic. Therefore, this pair
was excluded from later signing sessions and no other data for the DEER or RUSSIA contexts will be
presented.
Figure 5.13. Locations in 3-space of the seven context items, indicating the expected
direction of coarticulatory influence these items would have on preceding target signs.
For this left-handed signer, the location of BOSS is on the left shoulder instead of the
right, so coarticulatory effects in the context of that sign are expected to be in the
negative x-direction as well.
Table 5.6 gives average x-, y- and z-coordinates for WANT and WISH for this
signer in the various contexts in which they were examined, along with the results of
statistical testing of context-related differences. The layout of the
table and the accompanying diagrams is the same as was given in the group results
section in Table 5.5. One such tabular summary will be given for each of the four
signers who took part in the main sign production study. In these tables, significance
testing results are given in each case only for context pairs for which
a prediction could be made, as was done for the group results presented earlier. In these
cases, significance testing was performed using one-tailed t-tests, since the expected
direction of the effects was predictable using the relative locations of the context signs.
Figure 5.13 above shows the expected direction of coarticulatory effects for this signer,
who was left-handed.
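To make this kind of one-tailed two-sample test concrete, here is a self-contained sketch (the data below are invented for illustration, and the dissertation does not specify the software actually used; a pooled-variance Student's t is assumed, with the tail probability obtained by numerically integrating the t density rather than by calling a statistics library):

```python
from math import gamma, sqrt, pi

def t_tail(t, df, steps=200000, upper=60.0):
    """Approximate one-tailed P(T >= t) for Student's t with df degrees of
    freedom, by trapezoidal integration of the density from t to upper."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    f = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    return h * ((f(t) + f(upper)) / 2 + sum(f(t + i * h) for i in range(1, steps)))

def t_test_one_tailed(a, b):
    """Pooled-variance t-test of the directional prediction mean(a) > mean(b);
    returns (t statistic, one-tailed p-value)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    t = (ma - mb) / (sp * sqrt(1 / na + 1 / nb))
    return t, t_tail(t, na + nb - 2)

# Invented z-values (cm) for one target sign in two contexts, where the
# first context is predicted to pull the target upward:
hat = [10.6, 10.9, 10.2, 10.8, 10.5, 10.7]
pants = [10.1, 10.3, 9.9, 10.2, 10.4, 10.0]
t, p = t_test_one_tailed(hat, pants)
```

In this sketch the predicted direction is mean(hat) > mean(pants); a two-tailed test of the same contrast would instead double the tail probability.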
Consistent with notation established previously in the spoken-language studies,
significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001.27

27 In Chapter 2 on the spoken-language production study, statistical testing results were given without
Bonferroni correction for multiple tests, so that differences among individual subjects and at various
distances could be seen more clearly; results here will also be given without Bonferroni correction.
The issue of false-positive results will be discussed presently.

A plus sign + indicates a marginally significant result, with p<0.10. In cases where contrasts
other than those of interest (e.g. HAT vs. PANTS in the y-direction) were significant at
the p<0.05 level using two-tailed t-testing, the numerical values and testing outcome
are given in square brackets. Results inconsistent with predicted results (i.e. in the
contrary-to-expected, or dissimilatory, direction) are given if significant at the p<0.05
level or stronger. Such cases are indicated with parentheses around the star(s)
indicating significance; e.g. (**) indicates an effect in the counter-to-expected direction
that is significant at the p<0.01 level. Complete numerical results for Signers 2 through
5, with both means and standard deviations given for all contexts and sentence frames,
are given in Appendix G.
The numerical data for each sentence frame are accompanied by a diagram like
those presented earlier with the group results. In each diagram, contrast-item pairs that
were associated with significance testing outcomes of p<0.05 or better in at least one
dimension x, y or z in that sentence frame are joined by a red line. In the case of
marginal significance (p<0.10), items are joined by a green line, and significant
outcomes in the contrary-to-expected direction are joined by a dotted black line.
(Blank cells in the tables below correspond to context pairs for which no directional prediction was made in that dimension; bracketed values mark significant contrasts other than those of interest, as described above.)

Context pair     x            y              z
HAT–PANTS                                    10.2 – 10.3
<red>–<green>                                6.32 – 6.39
BOSS–CLOWN       16.1 – 15.5  6.29 – 6.32    8.38 – 8.77
BOSS–CUPCAKE     16.1 – 14.5  6.29 – 7.47 +  8.38 – 9.67
CLOWN–CUPCAKE                 6.32 – 7.47 *  8.77 – 9.67
Distance 1, WANT target: I WANT (X)

Context pair     x            y            z
HAT–PANTS                                  13.3 – 11.7
<red>–<green>                              11.7 – 12.1
BOSS–CLOWN       14.4 – 13.5  7.59 – 9.00  11.7 – 14.1
BOSS–CUPCAKE     14.4 – 14.6  7.59 – 8.22  11.7 – 12.6
CLOWN–CUPCAKE                 9.00 – 8.22  14.1 – 12.6
Distance 2, WANT target: I WANT FIND (X)

Context pair     x              y            z
HAT–PANTS                                    13.2 – 11.0 *
<red>–<green>                                13.3 – 12.8
BOSS–CLOWN       12.7 – 13.9 *  9.27 – 8.95  12.9 – 11.6
BOSS–CUPCAKE     12.7 – 12.9    9.27 – 8.58  12.9 – 11.8
CLOWN–CUPCAKE                   8.95 – 8.58  11.6 – 11.8
Distance 3, WANT target: I WANT GO FIND (X)

[Diagram: context-item locations for frame I WANT GO FIND OTHER (X)]

Context pair     x            y            z
HAT–PANTS                                  13.7 – 12.9
<red>–<green>                              12.5 – 13.0
BOSS–CLOWN       11.5 – 12.4  9.85 – 8.56  11.8 – 13.6
BOSS–CUPCAKE     11.5 – 11.8  9.85 – 9.48  11.8 – 12.5
CLOWN–CUPCAKE                 8.56 – 9.48  13.6 – 12.5
Distance 4, WANT target: I WANT GO FIND OTHER (X)

[Diagram: context-item locations for frame I WISH (X)]

Context pair     x            y            z
HAT–PANTS                                  10.1 – 6.61
<red>–<green>                              10.8 – 2.94
BOSS–CLOWN       10.4 – 10.6  9.90 – 9.15  9.40 – 8.89
BOSS–CUPCAKE     10.4 – 10.8  9.90 – 10.2  9.40 – 9.21
CLOWN–CUPCAKE                 9.15 – 10.2  8.89 – 9.21
Distance 1, WISH target: I WISH (X)

Context pair     x            y            z
HAT–PANTS                                  10.1 – 12.4
<red>–<green>                              12.2 – 13.0
BOSS–CLOWN       9.23 – 11.1  12.0 – 12.2  13.0 – 13.1
BOSS–CUPCAKE     9.23 – 10.4  12.0 – 11.8  13.0 – 12.2
CLOWN–CUPCAKE                 12.2 – 11.8  13.1 – 12.2
Distance 3, WISH target: I WISH GO FIND (X)

Table 5.6. Numerical results for Signer 2, given in centimeters for the x-, y- and
z-dimensions, relativized and averaged for each context. Significant results are noted,
with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally
significant result, with p<0.10. In each of the accompanying diagrams, contrast-item
pairs associated with significance testing outcomes of p<0.05 or better are joined by a
red line; for marginally significant results (p<0.10), items are joined by a green line.
Out of the 60 context pairs tested, 33 showed differences that were in the
expected direction, more than would be expected by chance, but not significantly so.
The results shown in Table 5.6 for Signer 2 show no evidence of strong coarticulation
patterns, with only a few significant testing outcomes, a pattern that will be seen to hold
generally for all of the signers investigated in this study. One apparent reason for this is
the relatively large amount of variability in these signers’ behavior, with respect to
factors including signing speed, preferred lexical forms, and amount of reduction and
place assimilation, variability reflected in the relatively large variances
associated with the quantitative measures of spatial location. This was often so even for
particular signers articulating particular target items in particular sentence frames. The
consequence of all of this, from a statistical testing perspective, is an outcome quite
different from what was seen in the spoken-language study, where close-distance
effects were nearly ubiquitous among speakers and longer-distance effects were also
quite common.
5.3.2.3. Signer 3
Figure 5.14. Motion-capture data for two of Signer 3’s sentences, I WANT HAT I and I
WISH GO FIND BOSS I. The clear z-minima (circled in the figure) for WANT and
WISH are typical for Signer 3. The scale on the y-axis is in centimeters but the values
themselves are arbitrary. The x-axis indicates approximate time in seconds.
Signer 3 was a late learner of ASL, and completed the production task with
great enthusiasm. His signing, accordingly, had a somewhat more emphatic character
throughout the duration of the recording session than that of the other signers, and
showed little or no reduction of the target signs WANT and WISH. The local minimum
in the z-dimension characterizing those signs was therefore quite evident in almost all
trials, as Figure 5.14 above illustrates, and relatively few of this signer’s trials were
rejected (8 of 288, 2.8%).
Signer 3’s preferred form of the context sign CLOWN used a twisting motion
in front of the nose rather than non-rotational contact as in the form of the sign used by
Signer 1. Signer 3’s preferred form of CUPCAKE also incorporated rotation of the
dominant hand. In addition, this signer indicated that he would not have chosen
spontaneously to repeat the “I” pronoun at the end of the sentence. Since it was
desirable for individual sign forms as well as for whole sentences to be consistent
among study participants, Signer 3 was asked to follow the precedents that had been
established by Signer 1. Signer 3 signed at a somewhat slower rate than the other
signers who took part in this study, but when reminded during the first break that
informal, relatively fast signing was acceptable for the purposes of the experiment, he
indicated he was in fact signing at a speed that was natural and comfortable for him.
Figure 5.15. An instance of Signer 3 signing BOSS with an upper-chest location.
An additional feature of this participant’s signing that became more apparent as
the task progressed was seen in the context sign BOSS, for which this signer frequently
used a location on the upper chest instead of on the shoulder, as illustrated in Figure
5.15. Presumably, this was a kind of reduction bringing the location of the sign closer
to neutral space, where preceding signs were formed; more discussion of this issue will
be given in Chapter 7. This signer was left-handed and so for him, as for Signer 2,
expected coarticulatory influence of BOSS on the target signs was leftward. Table 5.7
below gives a summary of significance testing between contexts for this signer.
(Blank cells in the tables below correspond to context pairs for which no directional prediction was made in that dimension; bracketed values mark significant contrasts other than those of interest, as described above.)

Context pair     x               y              z
HAT–PANTS                                       13.8 – 14.5
<red>–<green>                                   16.1 – 16.8
BOSS–CLOWN       6.15 – 8.42 **  21.2 – 15.2    13.9 – 15.3 +
BOSS–CUPCAKE     6.15 – 6.50     21.2 – 22.6    13.9 – 14.6
CLOWN–CUPCAKE                    15.2 – 22.6 +  15.3 – 14.6
Distance 1, WANT target: I WANT (X)

Context pair     x            y              z
HAT–PANTS                                    11.1 – 13.1
<red>–<green>                                11.0 – 13.2
BOSS–CLOWN       5.53 – 6.77  17.7 – 17.3    11.5 – 11.2
BOSS–CUPCAKE     5.53 – 6.34  17.7 – 20.1 *  11.5 – 11.9
CLOWN–CUPCAKE                 17.3 – 20.1    11.2 – 11.9
Distance 2, WANT target: I WANT FIND (X)

Context pair     x            y              z
HAT–PANTS                                    14.1 – 13.5
<red>–<green>                                16.2 – 15.0 +
BOSS–CLOWN       8.52 – 8.86  10.4 – 8.58    13.9 – 13.5
BOSS–CUPCAKE     8.52 – 7.62  10.4 – 14.4    13.9 – 12.3
CLOWN–CUPCAKE                 8.58 – 14.4 +  13.5 – 12.3
Distance 3, WANT target: I WANT GO FIND (X)

Context pair     x            y                  z
HAT–PANTS                     [ 20.0 – 11.2 * ]  13.5 – 10.9 **
<red>–<green>                                    13.1 – 13.1
BOSS–CLOWN       6.32 – 7.90  16.2 – 12.1        12.5 – 12.0
BOSS–CUPCAKE     6.32 – 5.77  16.2 – 19.5        12.5 – 12.0
CLOWN–CUPCAKE                 12.1 – 19.5 +      12.0 – 12.0
Distance 4, WANT target: I WANT GO FIND OTHER (X)

Context pair     x                  y            z
HAT–PANTS                                        3.84 – 3.07 +
<red>–<green>                                    4.09 – 1.89 *
BOSS–CLOWN       12.1 – 11.9        7.39 – 7.49  5.46 – 4.07 (*)
BOSS–CUPCAKE     12.1 – 10.9        7.39 – 6.86  5.46 – 3.59 *
CLOWN–CUPCAKE    [ 11.9 – 10.9 * ]  7.49 – 6.86  4.07 – 3.59
Distance 1, WISH target: I WISH (X)

Context pair     x            y                   z
HAT–PANTS                     [ 12.5 – 15.8 ** ]  9.24 – 11.0
<red>–<green>                                     7.39 – 11.5 (**)
BOSS–CLOWN       14.1 – 13.9  3.67 – 4.62 +       9.54 – 9.28
BOSS–CUPCAKE     14.1 – 14.8  3.67 – 2.96         9.54 – 9.90
CLOWN–CUPCAKE                 4.62 – 2.96         9.28 – 9.90
Distance 3, WISH target: I WISH GO FIND (X)

Table 5.7. Numerical results for Signer 3, given in centimeters for the x-, y- and
z-dimensions, relativized and averaged for each context. Significant results are noted,
with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally
significant result, with p<0.10. Significant outcomes in the contrary-to-expected
direction are given in parentheses. In each diagram, contrast-item pairs associated with
significance testing outcomes of p<0.05 or better are joined by a red line; for
marginally significant results (p<0.10), items are joined by a green line; and for
significant outcomes in the contrary-to-expected direction, items are joined by a dotted
black line.
As with Signer 2, the results for Signer 3 do not much resemble those seen in
the speech study; for example, stronger effects are not particularly evident at closer
distances relative to greater ones, and there is a general lack of very strong (p<0.001)
effects. In addition, some evidence of dissimilatory behavior can be seen. Overall, this
signer showed trends in the expected direction in only 25 out of the 60 context pairs
that were examined, somewhat less than the 30 that would be expected by chance.
5.3.2.4. Signer 4
Figure 5.16. Motion-capture data for four of Signer 4’s sentences. This signer differed
from the others in having z-minima for “I” as well, so it is the second z-minimum in
each sentence (circled in the figure) that marks this signer’s articulation of target signs
WANT or WISH. The scale on the y-axis is in centimeters but the values themselves
are arbitrary. The x-axis indicates approximate time in seconds.
Signer 4, though not a native signer, learned ASL at an early age. Like Signer 3,
this participant tended to sign fairly slowly, which might be due to the fact that many of
this signer’s preferred sign forms were different from those used in this experiment.
Many of these differences appear to be due to dialectal variation; this signer is African
American and originally came from Texas. The most relevant of these signing
behaviors from a data-analysis perspective was his signing of "I," which he articulated
with a concave-down arc trajectory in the sagittal plane, resulting in a slight descent
during the last part of the sign. His motion-capture data therefore had a local
z-minimum for "I" followed by another for the target sign, WANT or WISH, as
illustrated above in Figure 5.16. Accordingly, it was the second z-minimum in each
sentence that was taken as marking the target signs in the motion-capture data for this
signer.
Other notable aspects of this subject’s signing behavior included the following.
For CLOWN, this signer’s preferred form used a twisting motion at the nose like the
sign form preferred by Signer 3. This signer would normally finger-spell “cupcake.”
The signer's sign for "pants" uses two consecutive motions in neutral space, each with
the two hands facing, tracing out the shape of the left and right pant legs; this is
therefore a completely different form from Signer 1's. The signer would also normally
use a different sign for HAT, articulated with two hands and a motion like that of
putting on a cap.
Figure 5.17. Signer 4’s preferred form of the sign GO.
In addition, Signer 4’s preferred form of the sign GO differed from that of all
the other signers. As illustrated in Figure 5.17, he began his articulation of this sign
somewhat like other signers did, with “1” handshapes on both hands and with the palms
facing forward or to the side, but rather than moving toward a palms-down position
with the fingertips pointing forward, this signer moved his hands in a right-to-left
direction, ending with the fingertips pointing leftward. The signer was not asked to
conform his signing of this particular item to that of the other signers.
This signer also indicated that he would not normally choose to use the
resumptive “I” pronoun in these sentences. Despite these differences, only 4.5% of his
trials needed to be rejected because of signing errors or other issues. This signer’s
productions of WANT and WISH were almost always very clear in the motion-capture
data, as exemplified in Figure 5.16 above. A summary of significance testing results
between contexts for Signer 4 is given below in Table 5.8.
Context pair
HAT–PANTS
x
y
<red>–<green>
BOSS–CLOWN
BOSS–CUPCAKE
CLOWN–CUPCAKE
z
14.0 – 13.5
13.3 – 12.8
11.7 – 11.4
11.7 – 12.9
15.2 – 17.5
15.2 – 17.6
17.5 – 17.6
Distance 1, WANT target: I WANT (X)
11.8 – 11.9
11.8 – 12.4
11.9 – 12.4
Context pair
HAT–PANTS
x
y
<red>–<green>
13.4 – 12.1
BOSS–CLOWN
14.5 – 10.1 *
15.9 – 16.8
BOSS–CUPCAKE
14.5 – 11.6 +
15.9 – 17.8
CLOWN–CUPCAKE
16.8 – 17.8
Distance 2, WANT target: I WANT FIND (X)
Context pair
HAT–PANTS
<red>–<green>
z
11.6 – 13.2
x
y
13.2 – 10.4
13.2 – 10.6 +
10.4 – 10.6
z
10.4 – 8.03
10.9 – 10.4
BOSS–CLOWN
11.0 – 10.2
11.0 – 15.0
11.0 – 11.6
BOSS–CUPCAKE
11.0 – 9.64 *
11.0 – 7.96
11.0 – 9.63 *
CLOWN–CUPCAKE
15.0 – 7.96
11.6 – 9.63 ***
Distance 3, WANT target: I WANT GO FIND (X)
Context pair
HAT–PANTS
<red>–<green>
x
[ 9.31 – 7.57 * ]
y
z
9.57 – 9.57
10.3 – 9.80
BOSS–CLOWN
8.26 – 9.13
6.69 – 7.22
8.65 – 10.3 +
BOSS–CUPCAKE
8.26 – 10.1
6.69 – 8.74
8.65 – 8.81
CLOWN–CUPCAKE
7.22 – 8.74
10.3 – 8.81
Distance 4, WANT target: I WANT GO FIND OTHER (X)
Context pair
HAT–PANTS
x
[ 3.53 – 9.27 * ]
<red>–<green>
BOSS–CLOWN
BOSS–CUPCAKE
CLOWN–CUPCAKE
y
[12.1 – 8.63 ** ]
z
9.16 – 7.57
[ 11.6 – 9.08 * ]
8.81 – 7.45
8.73 – 5.07 +
8.73 – 5.84 +
13.0 – 12.2
13.0 – 12.7
12.2 – 12.7
Distance 1, WISH target: I WISH (X)
4.25 – 8.20 *
4.25 – 7.26
8.20 – 7.26
[Diagram: spatial positions of context items BOSS, CLOWN, CUPCAKE, HAT, PANTS, <red> and <green> along the x-, y- and z-axes, for Distance 3, WISH target: I WISH GO FIND (X)]
Context pair
HAT–PANTS
x
y
<red>–<green>
BOSS–CLOWN
8.07 – 9.05
9.71 – 9.60
BOSS–CUPCAKE
8.07 – 9.43
9.71 – 9.77
CLOWN–CUPCAKE
9.60 – 9.77
Distance 3, WISH target: I WISH GO FIND (X)
z
4.17 – 5.70
4.80 – 5.84
4.00 – 3.82
4.00 – 5.06
3.82 – 5.06
Table 5.8. Numerical results for Signer 4, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context. Significant results are noted,
with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally
significant result, with p<0.10. In each diagram, contrast-item pairs associated with
significance testing outcomes of p<0.05 or better are joined by a red line; for
marginally significant results (p<0.10), items are joined by a green line.
Signer 4’s results show a relatively small number of significant outcomes,
although the number of instances of trends in the expected direction, 37 out of 60
context pairs, is significantly greater than would be expected by chance (p<0.05). In
addition, one distance-3 effect was the most strongly significant result seen in the sign
production study, and was the only one which would remain significant under a
Bonferroni correction (more discussion of the significance results as a whole will be
given later). Again, this pattern of results is unlike that of the speech study; here, no
significant distance-1 results for target sign WANT were found, and even the trends at
near distances were not always in the expected direction.
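The chance comparison for such trend counts amounts to a one-tailed sign test. As a quick illustrative check (not part of the dissertation's own analysis), the exact binomial tail probability of 37 or more expected-direction trends out of 60 under a fair-coin null can be computed directly:

```python
from math import comb

# Signer 4 showed expected-direction trends in 37 of 60 context pairs.
# Under the null hypothesis that each pair is a fair coin flip (p = 0.5),
# the one-tailed sign-test probability of observing 37 or more is:
n, k = 60, 37
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"P(X >= {k} | n={n}, p=0.5) = {p_value:.3f}")  # just under 0.05
```

The result falls just below the 0.05 threshold, consistent with the significance level reported in the text.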
5.3.2.5. Signer 5
Figure 5.18. Motion-capture data for two of Signer 5’s sentences in the distance-4
condition with target sign WANT (i.e., sentence frame I WANT GO FIND OTHER (X)
I) are shown, with context signs CUPCAKE and HAT. Strong reduction of WANT has
resulted in the absence of the expected z-minimum, a fairly frequent occurrence for this
signer. The scale on the y-axis is in centimeters but the values themselves are arbitrary.
The x-axis indicates approximate time in seconds.
Signer 5 was another early, though not native, signer of ASL. Her natural
signing speed was rapid, and her signs were often greatly reduced, as illustrated in
Figure 5.18 above, which shows motion-capture data for two of this signer’s sentences
in which the characteristic z-minimum for WANT was not present. Video recordings
showed that the sequence “I WANT” as signed by this signer often appeared to be
combined into a single form, essentially a sign analog of spoken-language contractions
like “I’d” for “I would” (more discussion of this point is forthcoming). The same was
true to some extent of the sequence “I WISH,” though this did not occur as frequently
as for “I WANT.” Figures 5.19 and 5.20 below are frame-by-frame sequences showing
these sign contractions in two of this signer’s utterances. They were taken from the
longer sequences “I WANT GO FIND BOSS” and “I WISH <green>,” respectively.
In Figure 5.19, it can be seen that the “1” handshape is never attained for the
sign “I” and instead, the contact with the chest specified for that sign is made with the
hand already spread as for the following sign, WANT. In fact, the signer’s fingers
already appear to be anticipating the subsequent sign GO at the same moment, which
would make this a case of distance-2 HH coarticulation (or distance-1, if the contracted
form is counted as just one sign form). Also notable is the fact that only the dominant
hand was used in signing this entire utterance. Crucially, the reduced form of WANT
used here is articulated without the small lowering motion (i.e. z-minimum) used to
establish the location of that sign (as confirmed by an inspection of the motion-capture
data), meaning that this trial had to be rejected in performing statistical analysis. The
frame-by-frame sequence in Figure 5.20 shows a contracted form of the sequence “I
WISH.” Again, the “1” handshape for the sign “I” is completely absent, evidently due
to assimilation with the following “C” handshape for WISH.28
28
The sequence being signed here, “I WISH <green>,” required that the signer next reach down to flip a
switch as described earlier (the switch is not visible in the figure), in order to articulate the non-linguistic
context item <green>. In such cases, signers sometimes made one smooth descent downward during the
sequence “WISH <green>,” so that no z-minimum could be established for the target sign WISH.
Figure 5.19. Contracted form of the sequence “I WANT” signed by Signer 5, shown
frame by frame, going from left to right and from the top row to the bottom row.
Figure 5.20. Contracted form of the sequence “I WISH” signed by Signer 5, shown
frame by frame, going from left to right and from the top row to the bottom row.
In trials like these, the motion-capture data lacked the z-minimum characteristic
of the target signs WANT or WISH. Although this signer’s preferred sign forms did not
differ from those used in this experiment (though like some of the other signers she did
indicate that she would not normally sign these sentences with the resumptive “I”), her
signs showed much greater variation in whether and how much they were reduced,
often exhibiting much stronger reduction than was the case for the other signers.
Because of this, more trials were rejected for her than for the other signers; fully 30.6%
of her trials were rejected. This may well have weakened the statistical testing
outcomes for Signer 5’s data, which are presented below in Table 5.9.
Figure 5.21. At left, a form of PANTS often seen in Signer 5’s signing, with location in
neutral space instead of at the waist or legs. At right, a lowered form of HAT.
Also noteworthy is this subject’s signing of PANTS, which she often performed
with the expected handshape and movement, but in neutral space rather than at waist
level.29 This is illustrated in Figure 5.21 above, together with a form of HAT whose
location is at the cheek instead of the top of the head. Like Signer 3’s signing of BOSS,
these forms seemed to have resulted from signers’ efforts (whether conscious or
otherwise) to make efficient transitions between neutral-space signs and the following
context signs.
The existence of sign “contractions,” like those for “I WANT” and “I WISH”
just seen, raises the question of whether such forms might become regularized at some
point, just as contractions in English are permitted in some cases but not others (cf. “he
would,” “that guy would,” “you are not,” “I am not,” and the corresponding he’d, that
guy’d, you’re not or you aren’t, and finally I’m not but *I amn’t). A study examining a
larger number of contexts might find evidence of rules governing such forms as they
currently exist in ASL; the variability in their use here suggests that multiple factors are
likely to be involved (cf. the Lucas, Bayley, Rose & Wulf (2002) study, mentioned
earlier).
This complexity can be illustrated by looking at another instance of a reduction
of WANT by this signer. In this utterance, the relevant part of which is shown frame by frame in Figure 5.22 below, she articulated the WANT in "I WANT FIND PANTS"
with palms-down orientation, having apparently assimilated the palm-down orientation
specified for the following sign FIND (and for PANTS as well). The signing of WANT
29
This signer was not asked to “correct” this behavior because it seemed to be a spontaneously-occurring
modification of the sign chosen for use in this study, rather than a different lexical item. Nevertheless,
because such articulations of PANTS by this signer did not have the “low” location needed to make the
HAT-PANTS contrast viable, such trials were excluded from the analysis. For the distance-1 condition
with target sign WANT, this left no PANTS trials available for analysis.
with palms down is visible in the rightmost two frames in the top row, between the
more-clearly articulated signs I and FIND shown in the preceding and following
frames. Interestingly, in the final frame shown in the sequence (which is not
consecutive with the earlier frames), the signer’s articulation of PANTS is shown, and
in this utterance, PANTS is not articulated in neutral space as it often was in this
signer’s other utterances.
Figure 5.22. Orientation assimilation of WANT in the sentence I WANT FIND
PANTS, visible in the third and fourth images on the top row. The entire sequence is
ordered frame by frame from left to right, from the top row to the bottom row. The very
last image in the sequence (at lower right) is not consecutive with the preceding frame
and shows the location of PANTS in this utterance.
Context pair
HAT-PANTS
x
y
<red>-<green>
BOSS-CLOWN
BOSS-CUPCAKE
CLOWN-CUPCAKE
z
8.63 – n.a.
7.81 - 7.72
15.2 - 13.5
15.2 - 10.6 *
11.7 - 15.2
11.7 - 10.8
15.2 - 10.8
Distance 1, WANT target: I WANT (X)
9.75 - 7.46
9.75 - 8.85
7.46 - 8.85
Context pair
HAT-PANTS
x
y
<red>-<green>
BOSS-CLOWN
11.6 - 13.7
10.1 - 12.4
BOSS-CUPCAKE
11.6 - 12.2
10.1 - 9.11
CLOWN-CUPCAKE
12.4 - 9.11
Distance 2, WANT target: I WANT FIND (X)
z
11.8 - 10.3
13.8 - 8.82 +
12.6 - 9.91
12.6 - 11.5
9.91 - 11.5
Context pair
HAT-PANTS
x
y
<red>-<green>
15.9 - 11.7
BOSS-CLOWN
12.8 - 13.3
11.8 - 11.5
BOSS-CUPCAKE
12.8 - 11.6 +
11.8 - 8.39
CLOWN-CUPCAKE
11.5 - 8.39
Distance 3, WANT target: I WANT GO FIND (X)
Context pair
HAT-PANTS
<red>-<green>
z
11.7 - 12.5
x
y
10.7 - 10.6
10.7 - 10.7
10.6 - 10.7
z
12.8 - 9.98 +
14.6 - 12.0
BOSS-CLOWN
14.6 - 12.1 +
12.7 - 10.2
9.61 - 11.0
BOSS-CUPCAKE
14.6 - 10.5 *
12.7 - 8.89
9.61 - 8.09
CLOWN-CUPCAKE
10.2 - 8.89
11.0 - 8.09 *
Distance 4, WANT target: I WANT GO FIND OTHER (X)
Context pair
HAT-PANTS
x
y
<red>-<green>
BOSS-CLOWN
BOSS-CUPCAKE
CLOWN-CUPCAKE
z
6.87 - 6.18
5.30 - 6.28
13.2 - 11.0 *
13.2 - 12.3
13.8 - 12.2
13.8 - 13.1
12.2 - 13.1
Distance 1, WISH target: I WISH (X)
6.99 - 6.57
6.99 - 6.87
6.57 - 6.87
Context pair
HAT-PANTS
<red>-<green>
x
y
z
6.17 – 3.78 *
[11.7 – 9.24 * ]
3.82 – 7.44 (*)
BOSS-CLOWN
11.9 – 12.2
12.7 – 12.0
BOSS-CUPCAKE
11.9 – 11.0 *
12.7 – 11.4
CLOWN-CUPCAKE
12.0 – 11.4
Distance 3, WISH target: I WISH GO FIND (X)
6.88 – 5.42
6.88 – 4.66 *
5.42 – 4.66
Table 5.9. Numerical results for Signer 5, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context. Significant results are noted,
with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally
significant result, with p<0.10. Significant outcomes in the contrary-to-expected
direction are given in parentheses. In each diagram, contrast-item pairs associated with
significance testing outcomes of p<0.05 or better are joined by a red line; for
marginally significant results (p<0.10), items are joined by a green line; and for
significant outcomes in the contrary-to-expected direction, items are joined by a dotted
black line.
In 27 of the 60 context pairs that were examined for this signer, trends were in
the expected direction, slightly fewer than the 30 that would be expected by chance. As
with the previous signers, the effects that were seen were not particularly strong, nor
did they follow a clear pattern, such as being stronger at closer distances. In fact, effects
seemed relatively strong for this signer at distance 4, where two significant and two
marginally-significant results were obtained, and where trends in the x- and z-directions
were all in the expected direction. Overall, the results for this signer provide further
evidence of substantially different coarticulatory behavior between the speaker and
signer groups.
5.3.3. Other aspects of intersigner variation
A recurring theme in this sign study has been the relatively great amount of
variation among subjects in various aspects of their language use, even compared to
that seen in the spoken-language study. Among the signers who took part in this ASL
study, we have noted differences in age of acquisition of the language, preferred sign
forms and syntactic structure, and the occurrence of assimilations, reductions and
“contractions.” Here, some additional measures of intersubject variation will be
presented.
Table 5.10 gives a summary of some information discussed earlier, along with a
rough measure of signing speed for each signer.30 While all signers made occasional
errors during the performance of the production task (e.g. reaching for the bottom
switch instead of the top one to perform the <red> context item), relatively few trial
rejections were associated with these sorts of occurrences for any of the signers. In
contrast, great variation was seen in different subjects’ signing with respect to place
assimilation and sign reduction, most of all for Signer 5. This type of variation did not
appear to be related to signers’ age of acquisition, repeated for convenience in Table
5.10, as Signers 3 and 4 both showed little or no such tendency even though their ages
of acquisition were quite different. On the other hand, signing speed does seem likely to
30
The measure of signing speed used here was the mean sentence-to-sentence interval, when that duration was under 5 s, over the entire set of sentences signed by a given signer. While this includes inter-sentence pauses and hence somewhat overestimates the actual sentence duration, the resulting values
accord well with impressionistic judgments of which signers were faster or slower.
have been a relevant factor, since faster signers tended to have more reductions and
hence more rejected trials because of missing z-minima than slower signers.
Signer   Age of        Percent of   Errors or       Extreme            Sentence-to-
         acquisition   trials       false starts?   reductions/        sentence
         of ASL        rejected                     contractions?      duration (s)
1        Native        7.7          Few             Some31             3.6
2        Native        12.7         Some            Some               3.8
3        Late          2.8          Few             No                 4.2
4        Early         4.5          Few             No                 4.0
5        Early         30.6         Some            Pervasive          2.9
Table 5.10. Some measures of intersigner differences.
It was noted earlier that paired t-testing was performed in the statistical
comparisons between contexts because of the possibility that a signer’s target point for
a sign articulated in neutral signing space might tend to meander, or “drift,” during the
course of the experiment. Here, a quantification of that drift is made. The measures that
are used are (1) the standard deviations over all trials of the x-, y- and z-values of the
target signs (WANT or WISH), and (2) the standard deviation and range of rolling 7-averages of those x-, y- and z-values. The rolling averages are taken over seven trials in
order to smooth out differences related directly to the various context items. A
summary of these measures is given below for each signer in Table 5.11.
31
Such occurrences were sometimes seen for this signer in other pilot tests whose data are not presented
here.
Signer    SD in cm of all trials    SD in cm of rolling 7-average    Range in cm of rolling average
          x      y      z           x      y      z                  x      y      z
1         2.6    1.4    1.9         2.2    0.9    1.4                7.5    4.4    6.3
2         2.7    2.0    2.9         1.8    1.5    1.8                7.3    7.2    8.0
3         3.8    7.5    4.4         3.0    6.1    3.8                10.8   22.4   14.1
4         3.6    5.0    3.6         2.0    3.7    2.9                9.2    14.4   11.5
5         2.6    2.7    3.6         1.4    1.3    2.6                5.9    6.3    11.1
Average   3.06   3.72   3.28        2.08   2.70   2.50               8.14   10.94  10.20
Table 5.11. Quantification of neutral-space “drift” for each signer, for all trials and for
rolling 7-averages.
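The measures in Table 5.11 can be computed mechanically from the per-trial coordinates of the target signs. The sketch below shows one way to do this for a single spatial dimension; the coordinate values are hypothetical, and the 7-trial window mirrors the rolling 7-average described above:

```python
import statistics

def drift_measures(values, window=7):
    """Return (SD over all trials, SD of rolling average, range of rolling
    average) for one coordinate dimension. The rolling average is taken over
    `window` consecutive trials to smooth out differences tied directly to
    the individual context items."""
    rolling = [statistics.mean(values[i:i + window])
               for i in range(len(values) - window + 1)]
    return (statistics.stdev(values),
            statistics.stdev(rolling),
            max(rolling) - min(rolling))

# Hypothetical z-coordinates (cm) of the target sign across 12 trials:
z = [10.2, 11.0, 9.8, 10.5, 12.1, 10.9, 11.4, 10.0, 9.5, 11.8, 10.7, 10.3]
sd_all, sd_roll, rng_roll = drift_measures(z)
print(f"SD all trials: {sd_all:.2f} cm; SD rolling: {sd_roll:.2f} cm; "
      f"range rolling: {rng_roll:.2f} cm")
```

Because averaging smooths trial-to-trial variation, the rolling SD is smaller than the raw SD, as is the case throughout Table 5.11.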
The table shows that the neutral-space locations of the target signs did change
for each signer over the course of the experiment, typical variations for most signers
being on the order of a few centimeters. There were, however, noticeable
differences among signers in this regard. For example, Signer 3 showed much greater
variability, particularly in the y (front-back) direction, than the others. The subjects
whose signing was the slowest, Signers 3 and 4, showed greater drift over the course of
their signing sessions in all three spatial dimensions than the other signers did, perhaps
indicating that longer intervals between articulations of a given sign (or sign location)
are associated with a fuzzier articulatory target than shorter intervals are.
In considering the possibility of neutral-space drift in this experiment, it was
assumed that such drift would be more or less random within the neutral-space region
for each signer, rather than progressing systematically in a particular pattern. Figure
5.23 below indicates that this was the case. The figure shows the rolling 7-averages of
the x-, y- and z-coordinates of the target-sign locations produced by three of the signers
over the course of the experiment. Data for each of the six trial blocks are labeled with
the sentence frame used in that block. Graphs are given only for Signers 3, 4 and 5
because, as was mentioned earlier, the version of the experiment in which Signers 1
and 2 took part had more frequent changes of sentence frame, making the calculation
of these sorts of rolling averages problematic.
were rejected explains the somewhat sparser appearance of her data in the figure
relative to that of the other two signers.
Figure 5.23. Drift over time of target-sign location in neutral signing space in each of
the x-, y- and z-directions, during each of the six blocks of the production study, for
Signers 3, 4 and 5. The data represented are rolling 7-averages through the duration of
each block. The numerical values along the vertical axis are arbitrary and are given
only to indicate the scale of the drift.
5.4. Chapter conclusion
These two sign production studies investigated five signers and sought to
determine whether evidence of location-to-location effects of one sign on another might
be found, particularly across intervening signs. Since this sign experiment was
patterned after the speech study, where strong effects were quite common, it is
something of a surprise that the effects seen here were relatively weak. Many of the
speech-production effects were strong enough to remain significant after application of
the Bonferroni correction, which is not the case here.
However, there were more significant testing outcomes seen here than would be
expected by chance. With a total of 240 significance tests performed for the main study
(four signers * six sentence frames * ten tests for each), 12 results significant at the
p<0.05 level and 24 marginally significant outcomes (at the p<0.10 level) would be
expected merely by chance. In this study, the total numbers of such results were 20 and
36, respectively, either of which would be less than 5% likely in a random-chance
scenario. Therefore, it seems evident that (1) many if not most of these effects were
indeed real, but that (2) they were generally rather weak, and were neither pervasive
nor robust with respect to context, distance or signer. As has been noted already, this
is markedly different from what was seen in the speech production study, where closer-distance effects were very strong and nearly ubiquitous among speakers, with a steady
decrease in effects as distance increased, and with longer-distance effects seen only in
speakers who had coarticulated strongly at closer distances. In addition, some evidence
of dissimilatory behavior, which was never seen in the speech study, was also found in
this sign study.
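The chance expectations just cited treat each of the 240 tests as an independent Bernoulli trial, so the likelihood of the observed counts arising purely by chance can be estimated with a binomial upper tail. A minimal sketch under that independence assumption (illustrative only, not the original analysis code):

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): upper-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_tests = 4 * 6 * 10          # four signers * six sentence frames * ten tests
assert n_tests == 240
print("expected by chance at p<0.05:", n_tests * 0.05)   # 12.0
print("expected by chance at p<0.10:", n_tests * 0.10)   # 24.0

# Observed: 20 outcomes significant at p<0.05; 36 at least marginal (p<0.10).
print(f"P(>=20 of 240 at alpha=0.05) = {binom_tail(20, 240, 0.05):.4f}")
print(f"P(>=36 of 240 at alpha=0.10) = {binom_tail(36, 240, 0.10):.4f}")
# Both tail probabilities fall below 0.05, matching the claim in the text.
```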
A partial explanation of the modality-specific differences observed here may be
found in earlier work on spoken-language coarticulation. Daniloff and Hammarberg
(1973) and Hammarberg (1976) asserted that anticipatory coarticulation was essentially
due to speech planning while carryover coarticulation was due to the inertia of the
speech articulators. Accordingly, anticipatory coarticulation would be most expected in
speech production contexts in which smaller, lighter articulators like the tip of the
tongue were involved, while larger, heavier articulators like the jaw or the tongue body
might tend to be associated more with carryover coarticulation because of their greater
inertia. While the true situation is evidently more complicated, the intuition behind
Daniloff and Hammarberg’s claim may be useful here. Since sign articulators are much
larger than those involved in speech, one might expect to find more carryover effects in
signed language than anticipatory effects. Since this project has focused exclusively on
anticipatory coarticulation, the absence of strong results in the sign studies would
therefore not be surprising.
On the other hand, this seems to contradict the results obtained by the Cheek
(2001) and Mauk (2003) sign studies discussed earlier, in which both anticipatory and
carryover effects were seen. However, the sign sequences used in Mauk (2003) had
target signs placed between context signs, not only before or after, so the LL effects
that were found in that study involved carryover influences as well. In the case of
Cheek (2001), significant anticipatory HH effects were found in various sequences
consisting of two signs each. Since the articulators involved in creating different
handshapes are smaller than those used for shifts in location, perhaps it should be
expected that anticipatory HH effects would be more prevalent than anticipatory LL
effects (recall that an apparent instance of anticipatory distance-2 HH coarticulation
was shown in Figure 5.19 earlier in this chapter). In light of these observations, a
logical follow-up to the present sign study would be one in which carryover LL effects
are targeted, using sign sequences whose first item has a location high, low or to the
side in signing space, followed by multiple neutral-space signs.
Another notable contrast between the speech and sign studies concerns the
temporal extent of the effects. While the VV effects seen in the speech study for the
speakers who coarticulated the most extended over ranges on the order of 400-500 ms,
the corresponding temporal distances for the sign data obtained to date are significantly
greater, on the order of 500-800 ms or more.32 While on the one hand, this might be
expected given the difference in mass of the articulators of the two modalities, the fact
that the articulation of signs is accordingly slower than that of speech, and so on, such
differences also indicate that at the phonetic level, the limits of production planning for
language in general—the temporal horizon, so to speak—might not be expressed in
absolute time units like milliseconds, but may instead be determined in relation to the
number of gestures in a given timeframe via some function of “gestural density.” This
suggests a deep-level similarity between modalities despite the obvious surface-level
differences, and appears to be paralleled at the syntactic level as well. Klima and
Bellugi (1979) found that the rate at which propositions are conveyed is essentially the
same in ASL and English, despite the apparent difference in the articulation rate of
signs compared to that of spoken-language words.
One of the more surprising results of the speech study was the lack of
correlation between speech rate and coarticulatory tendency. The smaller number of
subjects investigated in the sign study makes such a determination impossible here, but
for this group of signers, it also does not appear as though signing rate and strength of
32
The time range for speech is given for the speakers who showed significant effects at distance 3, with
measurements taken (1) conservatively, from the end of the distance-3 vowel to the beginning of the
context vowel, and (2) more liberally, from the midpoint of the distance-3 vowel to the midpoint of the
context vowel. The approximate averages of those sets of measurements result in the range stated above.
The time range for sign is taken from measurements obtained from the motion-capture data of
sentences where Signer 1 had distance-3 effects in the pilot study. The timepoints at which these
measurements were made were the location of the z-minimum for WANT and the z-value (maximum or
minimum) of the context sign. The ranges involved here were much more variable and the context-sign
z-values were somewhat ambiguous in the case of minima (e.g. for PANTS), resulting in the relatively
inexact range given here.
coarticulatory effects are correlated. Slower signers like Signers 3 and 4 had significant
effects which overall were not substantially different in magnitude or frequency from
those of other signers. Taking these results together with those of Mauk (2003), who
did find that faster signing rates for individuals were associated with greater
undershoot (i.e., LL coarticulation for adjacent signs), perhaps the best explanation for these collective
results is that (i) among speakers or signers, one’s normal rate of language production
does not predict strength of tendency to coarticulate, but (ii) for a particular individual,
production at various rates—slower or faster, relative to one’s own normal rate—might
tend to be associated with different amounts of coarticulation.
One goal of this project was to investigate to what extent coarticulatory effects
such as those found here for speech and sign are specifically language-based, rather
than reflecting processes of motor planning common to human actions in general. The
effects related to the non-linguistic context actions <red> and <green> were not
substantially different in magnitude or distribution from those seen with linguistic
context items, and as such, do not permit a straightforward distinction to be made here
between “non-linguistic” and sign-based “linguistic” coarticulatory effects. Instead, the
results obtained in the speech and sign studies conducted here indicate that if a
distinction is to be made, a more logical division would be between speech-based
coarticulation on the one side and coarticulation related to signing and other manual
actions on the other, given the differences in the frequency and magnitude of the effects
seen in the production studies presented in Chapters 2 and 5. Whether this is merely
due to the differences in mass and speed of the articulators used in performing these
various types of motion must remain an open question for now.
Even if sign-based effects are weaker in general than those associated with
coarticulation in speech, they may still be perceptually relevant. The following chapter
examines this issue by making use of an experimental paradigm similar to the one that
was used in Chapter 3 to investigate the perceptibility of coarticulatory effects in
speech.
CHAPTER 6 — COARTICULATION PERCEPTION IN ASL:
BEHAVIORAL STUDY
6.1. Introduction
As with the speech study presented in Part I of this project, a major goal of the
sign study was to investigate the perceptibility of coarticulatory effects at various
distances, with a particular focus on variability between language users. In the context
of signed language, such a study presented some challenges not encountered in the
speech study. For example, for the English-language study, schwa waveforms were
rather straightforwardly manipulated for the purposes of obtaining normalized audio
stimuli, something which was not feasible for sign-language video clips. Furthermore,
for the speech study, the easy availability of English-language users made it possible to
examine a relatively large number of speakers (seven) before a subset of the recordings
was selected for use as perception-study stimuli for subsequent participants. In
contrast, for the perception study to be presented in this chapter, the relative difficulty
of recruiting ASL users made it a practical necessity to create stimuli and build the
perception experiment at a much earlier stage, so that as many subjects as possible
would be able to contribute both production and perception data. On the other hand,
one advantage of the paradigm that was used for this experiment was that knowledge of
ASL was not needed to perform the task, making it possible to recruit hearing non-signers as well as deaf signers as subjects.
6.2. Methodology
6.2.1. Participants
Because the task was one for which knowledge of ASL was not strictly
necessary, both deaf signers and hearing non-signers, a total of 20 subjects in all, took
part. Four deaf signers participated; this was the same group who had taken part in the
sign production studies described in Chapter 5, with the exception of Signer 2, who
declined to participate.
All 16 hearing non-signers who took part were undergraduate students at the
University of California at Davis who received course credit for participating and were
uninformed as to the purpose of the study. Their ages ranged from 18 to 25, with a
mean of 20.2 and an SD of 1.9. Nine of the 16 were female. For this study, unlike the
spoken-language experiments discussed in earlier chapters, no special effort was made
to recruit only monolingual English native speakers, since the crucial distinction
between groups for the purposes of the present study was signer vs. non-signer.
6.2.2. Creation of stimuli for perception experiment
Recall that for the spoken-language perception study presented in Chapter 3, the
crucial distinction between the two main stimulus types in each distance condition was
“[i]-colored” versus “[a]-colored,” with stimuli grouped in blocks ranging from quite
easy (distance 1) through challenging (distance 2) to very difficult (distance 3). The
sign-language perception study was very similar in format, but the distinction that
participants needed to make was between “high” versus “low,” according to the
position on the body (head or waist area, respectively) of the context sign. The subjects
saw initial portions of sentences like those produced by subjects in the sign production
study, edited so that they contained only the sign sequence “I WANT,” where the
subsequent (unseen) context sign had been either a “high” sign DEER or HAT or a
“low” sign RUSSIA or PANTS. The position in neutral signing space of the neutral-space sign WANT was to be the subjects’ guide as to whether the upcoming context
sign was “high” or “low”; recall from the production study presented in Chapter 5 that
such coarticulation-related differences typically amounted to 1 to 2 centimeters. The
sentences were of form I WANT (X), I WANT FIND (X), or I WANT GO FIND (X),
corresponding to distance conditions 1, 2 and 3, respectively.
The tokens needed were sequences “I WANT” in the context of “high”- and
“low”-location signs in each of the distance conditions 1, 2 and 3. Since individual
video recordings might have idiosyncratic quirks that subjects could use to tell them apart independently of sign height, six recordings for each distance condition
and location value (“high” or “low”) were selected, to be presented interchangeably
during each presentation of that location value.
Each token consisted of the portion of a sentence containing the sequence “I
WANT,” beginning with the frame before the right hand started moving upward to
make the sign “I” and ending at the frame in which both hands had reached the
midpoint of the sign “WANT.” Two versions of each token were used: for the first, the
end boundary was the video frame during the articulation of WANT at which the
minimum z-value was reached (generally identifiable as the frame in which the least
movement-related blurring was seen), and for the second, one additional frame after
this was included, giving subjects a very small amount of extra information. This latter
will be referred to as the “extra-frame” condition. Therefore, the total number of tokens
used was
2 (“high” vs. “low” context) * 3 (distance-1, -2 or -3 condition)
* 6 recordings of each * 2 (with or without the extra frame) = 72.
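The fully crossed design above can be sketched as a quick enumeration; the condition labels used here are mine, chosen for illustration, not taken from the experiment files:

```python
from itertools import product

# Factors of the stimulus design described above.
contexts = ["high", "low"]        # position of the (unseen) context sign
distances = [1, 2, 3]             # distance condition of the context sign
recordings = range(6)             # six interchangeable recordings per cell
extra_frame = [False, True]       # with or without the one extra video frame

# The full cross of the four factors gives the 72 tokens.
stimuli = list(product(contexts, distances, recordings, extra_frame))
print(len(stimuli))  # 72
```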
The first participant of the Chapter 5 production study, Signer 1, was recruited
for the filming of video stimuli. She signed sentences of the form I WANT (X), I
WANT FIND (X), or I WANT GO FIND (X), where (X) was either DEER, RUSSIA,
HAT or PANTS. Each combination of sentence frame and context sign was articulated 10
times, for a total of 120 signed sentences, some of whose initial sequences “I WANT”
were used as stimuli; the selection procedure that was used will be discussed presently.
The signer was videotaped so that appropriate passages could be edited for use
as stimuli, and the signer also provided motion-capture data. One issue that had to be
addressed was that in general, even when z-value differences between “high” and
“low” contexts were significant overall, substantial spatial variability within each
context meant that among the multiple articulations of the various “high”- and “low”-context versions of WANT, some “low”-context z-values were higher than some
“high”-context ones. Despite this, it was desirable to ensure that the set of “high” and
“low” tokens would have non-overlapping z-ranges, with all “low”-context
articulations of WANT lower than all “high”-context articulations of WANT, so that
each trial would be informative to subjects.
The following selection procedure was therefore used. For each context
sign in each distance condition, the ten articulations of WANT were sorted by z-value
from minimum to maximum. For “high” contexts, the articulation of WANT with the
maximum z-value was excluded and the next three highest-coarticulated versions were
used. For “low” contexts, the articulation of WANT with the minimum z-value was
excluded and the next three lowest-coarticulated versions were taken. When these
selections were made, for each distance condition the z-ranges of “high” and “low” did
not overlap, and the average differences between “high” and “low” versions of WANT
were in line with the results from the initial production study.33 The results of that
study, described in Section 5.2, established that anticipatory LL effects could be
expected as far as three signs away, with magnitudes on the order of 1 to 2 cm,
generally diminishing at greater distances. The context-related z-value differences in
the perception-study stimuli are given in the second column of Table 6.1 below.
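The selection procedure above can be sketched in a few lines of Python. The z-values shown are hypothetical placeholders, not the signer’s actual motion-capture data, and the function name is mine:

```python
def select_tokens(z_values, context):
    """Per the procedure described above: sort the ten articulations of WANT
    by z-value, drop the single most extreme token, then keep the next three
    most strongly coarticulated versions for the given context."""
    ranked = sorted(z_values)          # minimum to maximum z
    if context == "high":
        return ranked[-4:-1]           # drop the max; next three highest
    return ranked[1:4]                 # drop the min; next three lowest

# Hypothetical z-values (cm) for ten articulations of WANT in each context:
high_z = [118.2, 118.9, 119.0, 119.4, 119.6, 119.9, 120.1, 120.4, 120.8, 121.5]
low_z = [116.0, 116.4, 116.9, 117.1, 117.5, 117.8, 118.0, 118.3, 118.6, 119.2]

high_sel = select_tokens(high_z, "high")   # [120.1, 120.4, 120.8]
low_sel = select_tokens(low_z, "low")      # [116.4, 116.9, 117.1]

# The non-overlap requirement: every selected "low" token sits below
# every selected "high" token.
assert max(low_sel) < min(high_sel)
```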
Recordings were imported from digital videotape and converted to .avi files
using Final Cut Pro editing software. Some basic information concerning these video
stimuli is given below in Table 6.1. The longer durations for greater distance-condition
values appear to reflect slightly slower starts on the part of the signer in preparing to
sign the somewhat longer sentences associated with those distance conditions, but this
does not introduce a confound since durations were not significantly different between
the “high” and “low” contexts. Statistical testing on the frame lengths of these sets of
tokens confirmed that for each distance condition, the token sets did not differ
significantly in duration between contexts, and therefore it was assumed that they
would not be distinguishable by subjects in that way. For all video stimuli, the final
frame was repeated five times in sequence to create a very brief (150 ms) “freeze-frame”
effect; this was done to make the mid-sentence termination of the stimuli less
jarring to subjects.

33 There was one exception to this; to maintain the condition of non-overlapping z-dimension range, four HAT and two DEER tokens, instead of three of each, were used for the distance-3 “high” condition.
Distance   Mean height          Number of       Duration (ms)
           difference (cm)      frames
   1            2.6             16.5, 17.3       551, 578
   2            2.0             17.5, 18.0       584, 601
   3            1.6             21.7, 21.7       723, 723
Table 6.1. The average z-dimension (height) difference between WANT in “high” and
“low” contexts and mean duration, expressed both in number of frames and ms for
“high” and “low” tokens respectively, for each of distance conditions 1, 2 and 3.
Duration information is given for the standard-length versions of the video clips,
without the additional frame at the end.
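The frame counts and millisecond durations in Table 6.1 are broadly consistent with standard NTSC video at 29.97 frames per second; the text does not state the frame rate, so that figure is my assumption, and the conversion below matches most table entries to within a millisecond:

```python
FPS = 29.97  # assumed NTSC frame rate; not stated explicitly in the text

def frames_to_ms(n_frames):
    """Convert a mean frame count to milliseconds at the assumed frame rate."""
    return round(n_frames * 1000.0 / FPS)

# e.g. the distance-2 means from Table 6.1:
print(frames_to_ms(17.5), frames_to_ms(18.0))  # 584 601
```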
6.2.3. Task
All perception study subjects began by performing the production task
discussed in Chapter 5. Afterwards, they were given a brief introduction to the purpose
of this research. They were told that language elements, whether in speech or sign, can
affect each other over some distance, that people can sometimes detect this, and that
they were about to begin a task in which their own perceptual abilities were to be
tested: they would see short video clips taken from ASL sentences containing “I
WANT” followed by other signs, with half of the WANT signs preceding “high”-location signs and half preceding “low”-location signs. So that they would not be
discouraged by the more difficult contrasts, they were told that subjects in such
experiments sometimes say afterwards that they feel they were answering mostly at
random when the contrasts were very subtle, but often turn out to have performed at
significantly better-than-chance levels. (This turned out to be the case in the present
study as well.)
Subjects were seated in a small (approx. 10 feet by 12 feet) room in a
comfortable chair facing a computer screen placed 36 inches away on a table 26 inches
high. The stimuli (stored as .avi files) were delivered using a program created by the
author using Presentation software (Neurobehavioral Systems), which also recorded
subject responses. To make their responses, subjects used a keyboard that was placed
on their lap or in front of them on the table, whichever they felt was more comfortable.
All subjects were given a very easy warm-up task about one minute long, during
which sentences of the form I WANT (X), where X was either a “high” sign (HAT or
DEER) or a “low” sign (RUSSIA or PANTS), were played in the same ratio as their
counterparts in the actual task. Subjects were told to hit one response button when the
sign following WANT was “high” and a different button when that sign was “low.”
Handedness of response buttons was counterbalanced across subjects. The first several
sentences in this warm-up were completely unedited, so the task was very easy at first,
but as the warm-up progressed, more and more frames were removed from the ends of
the video clips so that the task became more difficult, more closely resembling the task
that would be required for the real experiment. After completing this warm-up, subjects
were told that the actual task would be the same in terms of pacing and goal (answering
either “high” or “low” by pressing the appropriate button), but more challenging.
The clips in the warm-up contained more information than would be present in
the actual task, including visible movement after WANT toward the context sign.
Subjects were alerted to this in advance, and told that during the actual task, the only
information they would have available would be the ending position of the WANT
sign, which would be somewhat “high” or “low” in each case because of the upcoming
context sign, though these differences would in general be subtle. They were told that
the correct answers would be 50 percent “high” and 50 percent “low,” ordered at
random, so that if they were tempted, for example, to answer “high” ten times in a row,
this was very unlikely to be correct and would indicate that they needed to “tune in”
to more subtle differences. Subjects were told that the experiment consisted of
three blocks, generally getting more difficult as the experiment progressed (see below),
but that each block was brief, about 2 minutes, with optional breaks between
blocks.
In the spoken-language perception task presented in Chapter 3, the order of
blocks was not random, but rather proceeded in order from easy to difficult by distance
condition, in sequence 1, 2 and 3. The reason for this was that it was felt that subjects
might become discouraged if faced with an extremely difficult task early on, and so
were allowed to progress toward the challenging contrasts starting with the easier ones.
Subjects in that study indicated afterward that although the later stages were in fact
much more difficult, the earlier trials had helped them grasp the rhythm of the task
and what was generally expected of them.
The same basic strategy was followed in this sign perception study, whose sequencing
is illustrated below in Figure 6.1. Subjects proceeded through three blocks, in distance
condition order 1, 2 and 3.
For the first eight subjects, who were all hearing non-signers, each of the 72
stimuli was presented once (24 per block). Preliminary analysis of those subjects’
performance indicated that the task was harder, even at distance 1, than the spoken-language study had been, but this was tempered by the fact that compared to the
spoken-language task, relatively few trials were being used. Therefore, for all subjects
after those initial eight, the duration of each block was doubled: all the stimuli were run
twice in each block, with the ordering completely random within each block, to provide
more trials per subject and hence more statistical power. To alleviate the difficulty of
the task somewhat, subsequent hearing subjects were told that the signer in the video
stimuli was right-hand-dominant, so that they would have a better chance of directing
their attention to the relevant part of each image in determining whether it represented
the “high” or “low” context. This information was not given to the deaf subjects since it
was assumed that they would become aware of it on their own, though perhaps only
implicitly.
Figure 6.1. Design of the perception task for the sign study. There were three blocks,
corresponding to distance conditions 1, 2 and 3 in that order. Each block consisted of
48 stimuli of the form “I WANT” (24 for the first eight hearing subjects), half of which
were taken from “high”-context sentences and half from “low”-context sentences, with
ordering random within each block.
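The design in Figure 6.1 can be sketched as a short trial-list builder. The function and tuple layout are mine, for illustration; the fixed block order (distances 1, 2, 3) and the doubled, within-block-randomized presentation follow the description above:

```python
import random

def build_trials(tokens_per_cell=6, repetitions=2, seed=0):
    """Assemble the three blocks: distance conditions in fixed order 1, 2, 3,
    each stimulus run `repetitions` times per block, with trial order fully
    randomized within each block."""
    rng = random.Random(seed)
    blocks = []
    for distance in (1, 2, 3):
        block = [(distance, context, rec, extra)
                 for context in ("high", "low")
                 for rec in range(tokens_per_cell)
                 for extra in (False, True)]
        block = block * repetitions   # each stimulus run twice per block
        rng.shuffle(block)            # random ordering within the block
        blocks.append(block)
    return blocks

blocks = build_trials()
print([len(b) for b in blocks])  # [48, 48, 48]
```

With `repetitions=1` this reproduces the 24-trial blocks seen by the first eight hearing subjects.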
6.3. Results and discussion
6.3.1. Perception measure
As with the spoken-language perception study discussed in Chapter 3, both
individual and group data results will be given here in terms of the d’ statistic (see
Section 3.3.1).
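As a reminder of how d’ is computed, the sketch below uses the standard signal-detection formula, z(hit rate) minus z(false-alarm rate); the mapping of “high”/“low” responses onto hits and false alarms here is illustrative, with the dissertation’s exact coding defined in Section 3.3.1:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate), e.g. with "high" trials called
    "high" as hits and "low" trials called "high" as false alarms. Rates of
    exactly 0 or 1 would need a correction (e.g. 1/(2N)) before use."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# A subject at chance has d' = 0; better-than-chance performance gives d' > 0.
print(round(d_prime(0.69, 0.31), 2))
```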
6.3.2. Group results
For the entire group of 20 subjects (deaf and hearing combined), overall d’
scores were significantly better than chance (d’ = 0.35, p<0.001); this was true for
distances 1 and 2 considered separately (respectively, d’ = 0.33, p<0.01, and d’ = 0.55,
p<0.001), but not for distance 3 (d’ = 0.16, ns). (These and other key d’ scores are also
presented below in Table 6.2, which in addition gives the individual subjects’ scores.)
Although the second group of hearing subjects had received additional
instruction and had performed the task for twice as long as the first group,
the d’ scores for the two hearing subgroups were not significantly different overall
(t(13.36)=0.64, p=0.54), or for any of the other conditions investigated (either by
distance or in the “extra-frame” condition). Therefore, the hearing subjects are treated
as a single group for the remainder of the group analyses.
The deaf and hearing groups did not differ significantly in their performance of
the task, either overall (t(3.39)=0.03, p=0.98) or for any of distances 1, 2 or 3, despite
the fact that half of the signers, Signers 3 and 5, showed highly significant outcomes, a
much better performance relative to the group size than that of the hearing group.
Interestingly, those two signers were not the earliest acquirers of ASL within the deaf
group, so perceptual sensitivity in this context does not appear to have been strongly
influenced by age of acquisition. However, both this result and the overall lack of a
significant difference in the deaf and hearing groups’ performance may be due to the
small size of the deaf group.
The “extra frame” condition was also investigated with paired t-tests; subjects
did not in general perform differently whether or not the extra frame was included. This
was true for the entire group of 20 subjects (t(19)=0.24, p=0.81) as well as for the deaf
and hearing groups considered separately (respectively, t(3)=1.31, p=0.28; t(15)=0.20,
p=0.85).
These group results indicate that this task was much harder, even at distance 1,
than the analogous spoken-language task. Recall that in that experiment, all subjects
performed at near ceiling levels in the distance-1 condition and about half also
performed at above-chance levels in the distance-2 condition. Rather than the easy-medium-hard progression seen in that experiment, it appears that here, all three distance
conditions were quite difficult. Although the intention behind the design of the present
experiment was to follow an easy-medium-difficult progression by using stimuli with
progressively smaller position differences between the “high” and “low” contexts,
subjects actually performed better overall in the distance-2 condition than in either of
the others. The difference is statistically significant, as shown by paired t-tests
comparing d’ scores at distances 1 vs. 2 (t(19)=2.11, p<0.05) and 2 vs. 3 (t(19)=2.96,
p<0.01).
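The paired comparisons reported above can be reproduced with a paired t statistic; a minimal pure-Python sketch follows, with hypothetical d’ scores rather than the study’s actual per-subject data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t statistic: mean of the pairwise differences divided by its
    standard error, with df = n - 1."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical d' scores for five subjects at distances 2 and 1:
d2 = [0.55, 0.74, 0.42, 0.69, 0.52]
d1 = [0.33, 0.31, 0.38, 0.26, 0.36]
t, df = paired_t(d2, d1)   # t > 0 means distance 2 > distance 1 on average
```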
It may be the case that cues other than the location of the signer’s hand in the
last frame of the video clips, such as the hand’s trajectory beforehand, were somehow
more informative to subjects in the distance-2 condition (in which the sentence frame
was “I WANT FIND (X)”) than in the others. However, a close examination of the
stimuli used in the three conditions indicates that this is rather unlikely. All of the
stimuli consist of just the sequence “I WANT,” and offer no clear evidence of what the
upcoming signs in the sentence (or their location) will be. It seems more probable that
two opposing tendencies, for subjects’ performance to improve with practice but to
decline with increasing task difficulty, combined to favor better performance in the
second of the three blocks than in the other two.
In the next section, the individual performances of the 20 study participants are
examined.
6.3.3. Individual results
The performances of each of the 20 subjects who participated in this perception
study are presented below in Table 6.2. The d’ scores for distance conditions 1, 2 and 3
are given, following the pattern established earlier, from right to left. Also given in the
rightmost column are overall d’ scores, collapsed across the three distance conditions.
Subject            Distance 3   Distance 2   Distance 1   Overall score
39 (Signer 1)         0.00        -0.23         0.00         -0.07
40 (Signer 2)       no data      no data      no data       no data
41 (Signer 3)         0.99*        1.64**       0.54          1.03***
42 (Signer 4)        -0.11        -0.42        -0.11         -0.20
43 (Signer 5)         0.32         1.47**       0.60          0.74**
Deaf Average          0.32         0.69**       0.26          0.41***

44                   -0.64         1.40*        0.67          0.43
45                    0.00         0.21        -0.43         -0.07
46                   -0.64         0.21         0.00         -0.14
47                    0.00         0.43         1.11          0.53
48                    0.76         0.76         0.00          0.63
49                    1.17         0.67         0.00          0.61
50                    0.67         1.18         0.67          0.83*
51                   -0.22         1.35         0.43          0.50
Average               0.08         0.74**       0.31          0.38**

52                    0.00         1.11*        1.06*         0.62*
53                    0.00         0.10         1.05*         0.33
54                    0.65         0.46         0.34          0.48*
55                    0.11         0.34         0.00          0.15
56                   -0.23        -0.21         0.10         -0.11
57                    0.32         0.10         0.21          0.21
58                    0.32         1.24*        0.64          0.71*
59                   -0.11         0.44         0.11          0.14
Average               0.13         0.42**       0.38*         0.31**

Hearing Average       0.11         0.52***      0.36**        0.33***

Overall Average       0.16         0.55***      0.33**        0.35***
Table 6.2. The table shows each subject’s d’ measure for each of the three distance
conditions. Significant results for individual subjects are labeled, where * =
p<0.05, ** = p<0.01 and *** = p<0.001.
While some participants scored at above chance levels, with Subjects 41, 43, 52
and 58 foremost among them, the results for individual subjects confirm that this task
was much more challenging than that used in the spoken-language perception study.
For instance, in that study, a number of individuals were able to perform at above-chance levels in all three distance conditions, and all subjects attained d’ scores in
excess of 2.50 for distance 1. For the present experiment, results for individuals were
weak in general; for example there were no strongly significant results (p<0.01) at
distance 1, even for the signers. The significance of the group results at distances 1 and
2 shows that at least some subjects were able to perform the task with a degree of
success, but that significance is not matched by the weaker individual outcomes, a
situation quite different from the spoken-language study.
However, it would be premature to use these results to make general
comparisons between the spoken and signed modalities with respect to the
perceptibility of coarticulatory effects. In part, this is because the outcomes of this
particular study cannot be assumed to represent the totality of individuals’ sensitivity to
coarticulation. Additionally, we have already seen in Chapter 5 that the analogy drawn
earlier between schwa in vowel space and location in signing space was imperfect at
best, and so it is likely that perceptual effects related to those items are not directly
comparable.
Like the spoken-language study, however, the results obtained in the present
experiment speak to the variability between subjects in coarticulation-related tasks. A
few subjects in the present study reported feeling fairly confident of having done well,
while others felt they had responded at random in the overwhelming majority of the
trials. Such outcomes did not correspond particularly closely to subjects’ knowledge of
ASL; on the contrary, Signer 3, who performed best, was a late learner of ASL and
Signer 5, who also performed at well above-chance levels, was also not a native signer.
On the other hand, Signer 1, a native signer, and Signer 4, who learned ASL in very
early childhood, did not perform at better-than-chance levels.
6.3.4. Relationship between production and perception
For the subjects who participated in the experiments discussed in Chapters 2
and 3, the production and perception of VV coarticulation in the contexts examined
were found not to be correlated with each other. Such a determination is much more
difficult to make here, since the number of signers investigated was relatively small.
One individual, Signer 3, did appear to have the most strongly significant results in
both the production and perception studies, but the overall weakness of the results of
the sign production study makes such determinations difficult for the group overall, and
clearly, more signers would have to be tested for a meaningful comparison to be
possible.
6.4. Chapter conclusion
The results of this study indicate that location-to-location effects of the
magnitude seen at various distances in the initial sign production study are likely to be
perceptually salient at least some of the time, and are therefore likely to be relevant for
users of signed language just as coarticulation appears to be so for users of spoken
language.
This project has been largely designed around the idea of creating parallel
production and perception studies of speech and signed language, but the logical
successor to this chapter, a sign equivalent to Chapter 4 using ERP methodology, is not
straightforward (and may not in fact be feasible) at present, in part because the MMN is
evoked by audio stimuli. It is believed that there is a visual analog of the MMN (Alho,
Woods, Algazi & Näätänen, 1992; Tales, Newton, Troscianko & Butler, 1999; for
reviews, see Pazo-Alvarez, Cadaveira & Amenedo, 2003; Maekawa, Goto, Kinukawa,
Taniwaki, Kanba & Tobimatsu, 2005; and Czigler, 2007), so it is conceivable that a
sign-language experiment could be carried out, targeting the visual MMN, but before
examining sub-phonemic contrasts such as the coarticulatory effects examined in the
present study, a logical prior step would be to determine whether deaf subjects produce
an MMN-like response in the context of an oddball paradigm involving phonemic
contrasts. Since what counts as a phoneme in sign languages is not a settled question,
even this first step would itself require some deliberation and, most
likely, trial and error. Therefore, although such a venture could be intriguing, it would
require laying a substantial amount of groundwork that is beyond the scope of the
present project.
CHAPTER 7 — DISCUSSION & CONCLUSION
7.1. Models of spoken-language coarticulation
Long-distance coarticulation like that seen in the spoken-language study
presented in Part I of this dissertation has implications for models of spoken-language
coarticulation (for an overview of such models, see Farnetani & Recasens, 1999, and
more generally Hardcastle & Hewlett, 1999). For example, in a coproduction model
(Fowler, 1983), the articulatory gesture(s) associated with a given speech segment have
a more-or-less fixed temporal duration which may overlap with those of neighboring
segments, as seen earlier in Figure 2.1 (repeated below as Figure 7.1), the resulting
output at a given moment being an interpolation or averaging of the gestures associated
with the segments in play at that time. A key prediction of such a model is that since
each gesture’s temporal duration is limited, its temporal range of influence on its
neighbors should have a rather small upper bound. As was noted earlier, long-distance
production results such as those seen in the current study appear inconsistent with this
last assertion. It has been seen here that from both a production and a perception
standpoint, VV coarticulatory processes may be relevant over at least three vowels’
distance, and hence across as many as five intervening segments. These results would
seem to pose a problem for any model of coarticulation not allowing for considerable
range of influence of segments on one another.
Figure 7.1. Fowler’s (1983) model of VCV articulation. The dotted vertical lines in the
second picture indicate points in time where acoustic prominence shifts from one sound
to the next, and where listeners are therefore likely to perceive a segment boundary.
The overlapping solid lines represent the physical articulations of each sound, which
can occur simultaneously.
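The coproduction idea of temporally overlapping gestures can be sketched in a few lines; this is only an illustration of the core blending claim, not Fowler’s actual formalism, and the gesture tuples and target values are hypothetical:

```python
def coproduction_output(gestures, t):
    """Illustrative coproduction-style blending: each gesture is a tuple
    (onset, offset, target); the articulatory output at time t is the average
    of the targets of all gestures active at t. Context-dependent output thus
    arises from temporal overlap rather than feature spreading."""
    active = [target for onset, offset, target in gestures if onset <= t < offset]
    return sum(active) / len(active) if active else None

# Two overlapping vowel gestures with targets 300 and 500 on some arbitrary
# articulatory parameter (numbers hypothetical):
gestures = [(0.0, 0.6, 300.0), (0.4, 1.0, 500.0)]
print(coproduction_output(gestures, 0.2))  # only V1 active: 300.0
print(coproduction_output(gestures, 0.5))  # overlap region: average = 400.0
```

Because each gesture’s activation interval is bounded, a gesture can only influence output within that fixed window, which is exactly the property the long-distance results call into question.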
In contrast, the Window model (Keating, 1988, 1990a, 1990b) uses an approach
more akin to feature spreading; gestural targets associated with linguistic segments are
“windows” (which are ranges, not points), through which paths are traversed over the
course of an utterance. The gestural paths between windows associated with successive
segments are achieved through interpolation, and such interpolation may stretch over
long distances in cases where intermediate segments are underspecified for a feature.
Since this model makes no specific predictions about the limits of long-distance
coarticulation, it may be considered more compatible with the kinds of long-distance
results obtained in this study than the coproduction model is. However, since the
production data showed coarticulation occurring almost universally across [k] and
frequently across [t], the idea raised by Öhman (1966) and implicit in the Window
model, that VV coarticulation should be largely blocked by consonants requiring more
deliberate or careful use of the tongue body (as opposed to labials like [p]), may be
somewhat overstated.34 The crucial factor here appears to have been the especially
large susceptibility of schwa to coarticulatory influence, regardless of the consonant
context.
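The Window model’s interpolation idea can likewise be sketched crudely: segments with narrow windows act as anchors, while fully underspecified segments let the path interpolate between distant anchors. This is my own toy rendering of Keating’s proposal, with made-up window values, not her actual model:

```python
def window_path(windows, full_range=(0.0, 1.0)):
    """Illustrative Window-model sketch: each segment has a (lo, hi) target
    window; segments whose window spans the full range are underspecified, and
    the path simply interpolates linearly between the surrounding anchors
    (taken at window midpoints), however far apart those anchors are."""
    anchors = {i: (lo + hi) / 2 for i, (lo, hi) in enumerate(windows)
               if (lo, hi) != full_range}
    idxs = sorted(anchors)
    path = []
    for i in range(len(windows)):
        if i in anchors:
            path.append(anchors[i])
        else:
            left = max((j for j in idxs if j < i), default=None)
            right = min((j for j in idxs if j > i), default=None)
            if left is None:
                path.append(anchors[right])
            elif right is None:
                path.append(anchors[left])
            else:
                frac = (i - left) / (right - left)
                path.append(anchors[left] + frac * (anchors[right] - anchors[left]))
    return path

# An [i]-like anchor, three underspecified segments, then an [a]-like anchor:
# the path falls smoothly across all three intervening segments.
print(window_path([(0.8, 0.9), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.1, 0.2)]))
```

Because interpolation spans may be arbitrarily long when intermediate segments are underspecified, the model places no fixed bound on coarticulatory range.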
Much of the difficulty in this area of research stems from the fact that
coarticulatory patterns seem to a large extent to be idiosyncratic, varying greatly from
speaker to speaker. Some of this probably has to do with the relative freedom speakers
have in producing a given speech sound, exemplified in studies in which speakers
successfully articulate particular vowels in spite of physical obstacles like bite blocks
(Fowler & Turvey, 1980; Lindblom, Lubker & Gay, 1979). A more complete analysis
will almost certainly be dependent on recruiting sizable groups of subjects, and in
addition may require more sensitive measures for production and perception. For
instance, an examination of Speaker 7’s numerical production data (see Appendix A)
shows that the standard deviations associated with his vowel formant values tended to
be smaller than those of other speakers; even if two speakers have similarly-sized
vowel spaces, one speaker may “hit the targets” more accurately than the other. A good
coarticulation measure may need to take this kind of variation into account (see Adank,
Smits & van Hout, 2004). A better production measure would lead to a better
quantification of the production/perception relationship as well.
34 For further work along these lines see, for example, Recasens, Pallarès & Fontdevila (1997) and Modarresi, Sussman, Lindblom & Burlingame (2004).
Flemming (1997) mentions VV coarticulation in a discussion in which he
argues that phonological representations by necessity contain more phonetic
information than has traditionally been assumed; his goal is a “unified account of
coarticulation and assimilation.” Since coarticulatory effects at various distances are
perceptible in some or many cases, as the perceptual study presented in Chapter 3
makes clear, a complete account of this phenomenon will be a complicated undertaking
indeed, given the variation we see among speakers and listeners.
As an example of the subtleties involved, consider the symbols [əi] and [əa] that
have been used in earlier chapters to represent [i]- and [a]-colored schwas regardless of
the distance involved; recall also that all listeners in the Chapter 3 perception study
were sensitive to the differences between such sounds, at least sometimes. It is well-established that listeners can adapt very quickly to the speech patterns of the speakers
they hear; for example, Ladefoged and Broadbent (1957) found that listeners’
perception of a test word of form [bVt] was influenced by the F1 and F2 of the vowels
in a preceding carrier phrase. Perhaps part of the accommodation that listeners make
when exposed to a particular speaker includes becoming sensitive to that speaker’s
coarticulation patterns, including coarticulatory tendency at various distances.
Such possibilities might require models recognizing differences among items
such as [ia] (carryover [i]-coloring of [a]) versus [ai] (anticipatory [i]-coloring of [a]),
or more generally [əi¹] (distance-1 anticipatory [i]-coloring of schwa), [²ao] (distance-2
carryover influence of [a] on [o]), and so on, even if only a subset of listeners is
sensitive to such subtleties.35 The implications of multiple simultaneous effects of
different neighboring segments on a single segment might also need to be considered;
recall the production results in the present study which appeared to show simultaneous
VV and C-to-V effects.
7.2. Incorporating signed-language coarticulation
Complex issues like those just discussed are also relevant for the study of
signed language. The results seen in the sign production study presented in Part II of
this dissertation, though generally not as decisive as those seen for spoken language,
nevertheless appear to show that long-distance coarticulation can be found in sign
language, that such effects are perceptible in many cases and as such are likely to be
relevant for issues like lexical processing. However, to the best of my knowledge, no
models of sign language production dealing specifically with coarticulation have been
developed at all, no doubt in part because the field of sign-language linguistics is so
much newer than that of spoken-language linguistics.
Assuming that speech and sign are two manifestations of a single human ability
(it seems unlikely that the two have somehow evolved independently), it should be
possible to describe language production and perception—and in particular, the
existence of coarticulation, long-distance and otherwise—in a way that applies to both
modalities. Indeed, the same basic issues raised earlier in comparing the coproduction
and Window models—gestural overlap and gestural targets, to what extent different
gestures can influence each other, and so on—are relevant for both speech and sign.
This suggests that it might be possible to create sign-language analogs of existing
speech models.

35 In the light of such possibilities, it should be noted that if anything, the method used in this project for investigating perceptual sensitivity to coarticulatory effects probably underestimates perceivers’ sensitivity, since here the stimuli were excised from the contexts in which they had originally appeared and were played in isolation.
However, if speech and sign are indeed two manifestations of one human
language capacity, then any model or theory that applies to one modality or the other
but not both is in some sense incomplete. Ultimately, any model or theory that aims to
describe the human language capacity in general will have to incorporate both spoken
and signed language, but current approaches tend to have serious limitations along
these lines. For example, phonological constraints in Optimality Theory (Prince &
Smolensky, 1993) are generally expressed in modality-specific terms, such as *NC (“no
nasal-voiceless obstruent sequences”; see Pater, 1999), yet are often claimed to be
linguistic universals. The existence of languages like ASL shows that forms of human
language are possible in which such constraints are irrelevant, just as constraints that
have been proposed in sign-language research—e.g. the Selected Fingers constraint
(Mandel, 1981; Corina & Sagey, 1989)—are unlikely to be linguistic universals if they
cannot even be applied to the vast majority of languages.
This implies that the “alphabet” in which phonological constraints in OT or
other constraint- or rule-based approaches are written needs to be general enough to
make statements about consonants, vowels and tones as well as sign locations,
handshapes and movements. In other words, the “alphabet” of constraints may be
universal cross-linguistically even though it seems that constraints themselves are not.
A useful way forward is offered by Articulatory Phonology (AP; Browman &
Goldstein, 1986), which provides exactly this kind of universal “alphabet,” expressed in
terms of gestures and their relative timing. Although the approach taken in AP seems at
first glance to be quite different in spirit from OT, the two approaches are not
inconsistent (Gafos, 2002) and have been usefully combined, for example, to provide
accounts of so-called “intrusive vowels” (Hall, 2003) and the behavior of rhotic-consonant clusters in Spanish (Bradley & Schmeiser, 2003; Bradley, 2006) and
Norwegian (Bradley, 2007) more successfully than traditional OT approaches.
By expressing the rules of language in terms of gestures, one is able to state
constraints in a way that is general enough to account for phenomena seen in both the
spoken and sign language modalities. In addition, constraints expressed in more
traditional terms, such as *NC in spoken language or the Selected Fingers constraint in
sign language, are not rendered invalid or obsolete as a result, but rather can be
considered a kind of convenient shorthand for a corresponding gesture score. This may
be loosely analogous to the situation of programming languages like Pascal and
Fortran: programs written in them are ultimately executed as binary machine code, but
the people who use these languages do not generally deal with that binary level
directly, because it is inconvenient to do so.
A further question which must then be raised is whether a system thus capable
of incorporating both sign and speech can be general enough to include linguistic
phenomena but not so general that it includes other facets of human behavior, such as
locomotion or dancing. On the other hand, is it possible that any system capable of
successfully modeling both speech and sign must necessarily incorporate more human
actions than just linguistic ones? Research on mirror neurons and their properties
(Rizzolatti, Fadiga, Gallese & Fogassi, 1996; Arbib & Rizzolatti 1997; Rizzolatti &
Arbib 1998) and related work by Arbib (2005) on the origins of language, positing
intricate links between non-linguistic actions and both signed and spoken language,
suggest that indeed, a strict segregation between “linguistic” and “non-linguistic”
human actions may not be feasible.
7.3. Cross-modality contrasts
Even if sign and speech can be modeled together under a single unified system
along the lines described above, this obviously would not imply that there are no
important differences between the two modalities. Some of these differences have
emerged in the present research project. For example, a comparison of the temporal
extent of coarticulation in sign and speech indicates substantial cross-modality differences in how far these effects extend. This may be considered
unsurprising on the one hand, since signing requires the movement of larger, slower
articulators, but it does have implications for language planning, as discussed earlier in
the dissertation, in that it suggests that the units of production planning at the phonetic
level are likely to be expressed in terms of gestures or related units, rather than on an
absolute time scale such as milliseconds.
The difficulty of determining suitable analogs between speech and sign extends
also to the basic terminology used throughout this project. For example, “distance-2”
effects were seen in both the speech and sign studies discussed in earlier chapters, and
in both cases seemed to represent something of a threshold for perceptual sensitivity.
This apparent similarity between modalities is belied by the fact that it is not at all clear
that the “distance” metric is comparable in both cases, since it refers to vowels in one
case (with intervening consonants not counted) and sign location in the other (locations
for each intervening sign being counted toward the total). Therefore, in segmental
terms, the extent of VV and LL effects was probably not being measured consistently,
but in syllabic terms they probably were.36
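The two counting conventions just described can be made concrete with a small sketch (the `distance` helper and the example sequences are hypothetical illustrations, not part of the study's analysis):

```python
def distance(intervening_units, is_counted):
    """Distance from target to context: 1 plus the number of intervening
    units that count under the given convention."""
    return 1 + sum(1 for unit in intervening_units if is_counted(unit))

# Speech convention: only vowels are counted; intervening consonants are skipped.
# Two vowels intervene in C-V-C-V-C, so the context vowel is at distance 3.
speech_between = ["C", "V", "C", "V", "C"]
print(distance(speech_between, lambda u: u == "V"))  # 3

# Sign convention: every intervening sign's location counts toward the total.
sign_between = ["NEUTRAL-SPACE-SIGN", "CHIN-SIGN"]
print(distance(sign_between, lambda u: True))  # 3
```

Under these conventions the same nominal "distance-3" condition can span quite different numbers of segments in the two modalities, which is the inconsistency noted above.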
A contrast that stands out particularly strongly in this study’s examination of
speech and sign is the greater variation that was apparent in the ASL data, seen both
with respect to different subjects—for example in their ages of acquisition of ASL and
in differences apparently related to their regional/dialectal backgrounds—and in the
behavior of individual signers. Certainly there is variation among users of American
English, but in common words like those used in the production-study sentences, such
differences might be expected to manifest themselves as relatively subtle variations
within individual lexical items (for example the use of [a] by some speakers in some
words where other speakers might use a vowel closer to [ɔ]), whereas the signers
studied here often used completely different signs for items that might be considered
relatively common, such as for “hat,” “cupcake,” and “pants,” and differed significantly
in other ways, such as in the degree of acceptability of the “I WANT … (X) I” sentence
frame with or without the final “I.” Individual signers exhibited other behaviors that
are interesting from a phonetic/phonological point of view, such as Signer 5’s frequent
modification of PANTS, in which she maintained the usual handshape and movement
of the sign but substituted a neutral-space location (see the left side of Figure 7.2
36. Whether or not either of these assertions is defensible, of course, depends on how (or even whether)
segments or syllables can be defined in sign language, something that is as yet not settled.
below), and Signer 3’s signing of BOSS in the front of, rather than on top of, his
shoulder. Neither of these behaviors was observed for the other signers in this study, but
the fact that they occurred at all may have significant implications, for the following
reasons.
Figure 7.2. Location assimilation in PANTS (left) and a contracted form of “I WANT.”
Both Subject 5’s assimilation of PANTS to a neutral-space position and Subject
3’s signing of BOSS on the front side of the shoulder—i.e. closer to neutral space—
argue against the idea that neutral signing space might be a kind of underspecified sign
location. In these cases, it appears that non-neutral space signs were being shifted
toward neutral space to facilitate signing with the preceding signs WANT, GO, and/or
FIND, which appeared consecutively and were of course frequently signed by
participants during the course of the production experiment. This is quite unlike the
behavior of subjects in the spoken-language study with respect to schwa; no speakers
substituted schwa for [i] or [a] in the words “key” or “car,” despite the numerous
repetitions of these words required in the speech production study, nor would we expect
a fluent English speaker to do so. None of these situations, either in English or ASL,
involved weak prosodic contexts like those in which such reduction might be expected.
This difference is also relevant to the earlier discussion about the possibility that
schwa and neutral space might operate in an analogous fashion in English and ASL in
terms of articulatory behavior. The results obtained here indicate such a comparison
may be reasonable up to a point, but that the relationships between schwa and other
vowels on the one hand, and neutral signing space and other sign locations on the other,
are not parallel. In Signer 3’s articulation of BOSS and particularly in Signer 5’s
articulation of PANTS, where it was expected that neutral-space signs would have their
locations shifted outward from the middle of the articulatory space as schwa does in
English, signs with locations in the outer regions of the articulatory space had their
locations moved to a more central position, toward neutral signing space. In terms of
articulatory ease, the signing behavior so observed is certainly not inexplicable, but it is
substantially different from what was observed in spoken language with schwa and the
context vowels that were examined. These context vowels, articulated near the four
corners of the vowel quadrangle, did not assimilate to a more central, schwa-like
position for any of the 38 speakers who took part in the speech production study.
The use of the term assimilation rather than coarticulation in connection with
Signer 5’s signing of PANTS in neutral space is apt, as the location change was never
gradient over the course of the sign and in each case was either completely present or
absent; i.e. Signer 5 always signed PANTS either in its customary position, with contact
on the lower torso, or in the middle of neutral space, never in between. A representation
of this location assimilation using the Hand-Tier model (Sandler, 1986, 1987, 1989; see
Chapter 1) is given below in Figure 7.3. At left is depicted a neutral-space sign (like
those which always preceded PANTS in this study), whose location in the Hand-Tier
model is specified as having trunk as major body location, with a height setting of
“mid” and a distance setting of “distal.” The sign PANTS, depicted at right, has the
same major body location but different underlying location settings. The location shift
illustrated here is expressible as a feature value change and hence fits the definition of
assimilation that was given in Chapter 1.
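The categorical nature of this feature-value change can be sketched in code, as a toy rendering of Hand-Tier-style location settings; the particular feature values for PANTS below are illustrative, not Sandler's exact specifications:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Location:
    major_body_location: str   # e.g. "trunk"
    height: str                # setting feature, e.g. "mid" vs. "low"
    distance: str              # setting feature, e.g. "distal" vs. "contact"

# The preceding neutral-space sign: trunk location, height "mid", distance "distal".
neutral_space = Location("trunk", "mid", "distal")

# PANTS shares the major body location but has different underlying setting
# features (the values here are illustrative).
pants_citation = Location("trunk", "low", "contact")

# Assimilation as a feature-value change: PANTS acquires the preceding sign's
# settings wholesale -- all or nothing, never a gradient shift.
pants_assimilated = replace(pants_citation,
                            height=neutral_space.height,
                            distance=neutral_space.distance)
print(pants_assimilated == neutral_space)  # True
```

The point of the sketch is that the shift is expressible entirely as a discrete change of feature values, with no intermediate locations, matching the definition of assimilation given in Chapter 1.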
This situation, in which a non-neutral space sign acquires neutral-space location
as the result of assimilation, is complemented by some realizations of the sequence “I
WANT,” such as the one depicted at right in Figure 7.2. As noted in Chapter 5, this
sequence was often contracted by Signer 5, resulting in a single sign form which as
represented in the Hand-Tier model shares most of its feature specifications with “I”
but has the same hand-configuration as WANT. In this case, the resulting sign form
loses the neutral-space location specified for WANT in favor of the torso contact
specified for “I.”
Figure 7.3. Hand-Tier representation of Signer 5’s variant of PANTS showing
assimilation to the location of a preceding neutral-space sign.
The general pattern seen in the data obtained in this project was that the LL
coarticulatory effects seen in ASL were weaker than the VV effects found in English,
apparently due in part to the greater overall variability seen in the sign data relative to
the spoken-language data. Despite these differences, it would be premature to make
broad generalizations from them about cross-modality contrasts. One reason for this is
that the sign study focused only on sign location, while coarticulatory effects involving
other parameters are likely to be relevant as well, for both production and perception
(cf. Cheek, 2001; Corina & Hildebrandt, 2002; Mauk, 2003). Since the contexts in
which data were obtained here were also by necessity relatively limited, only a thin
cross-section of the phenomenon of sign-language coarticulation could be explored.
In addition, in the experiments carried out for this project, the sign task was
much longer than the speech task. This was so because the distance-1, -2, and -3 vowels
appeared consecutively in the English sentences that were used, something that was
possible because of the ease with which each vowel’s auditory properties could be
analyzed computationally. In contrast, neutral-space signs are not so easy to investigate
in general with the methodology used here, because not all such signs have signatures
that are clear when analyzed using motion-capture technology. It was the easily
distinguished movement characteristics of the sign WANT that made that sign a
desirable object of study, but different sentences were needed to create the various
distance conditions, since the same sign could not be repeated consecutively in any one
(meaningful) sentence. Therefore the signers who participated had to articulate many
more sentences in total (on the order of 300 instead of 30 as in the spoken-language
task), which might itself have led to modality-independent differences in articulatory
behavior.
A different issue is the possibility that the presence of denser or richer
information in the visual signal than in the auditory signal means that there is less need
for sign-language users to be attuned to extra cues in the language signal, either in the
process of producing that signal or while acting as perceivers. In other words, to the
extent that consistent, perceptible coarticulatory patterns may assist the perceiver,
perhaps this is simply not as necessary in the visual modality. A related issue is that the
reductions seen here are relatively unlikely to result in confusable homophones in ASL,
at least in the contexts that were considered in this study. For example, PANTS
articulated in a neutral-space position or BOSS articulated on the front of the shoulder
are unlikely to be confused with other signs, while in contrast, a [k] plus schwa
sequence, with schwa a reduced version of some full vowel, could have originated as
either “key,” “coo” or “caw”; hence there is a greater need in the English contexts for
such full vowels not to be reduced, to the extent that language users prefer to avoid
such ambiguities. Therefore there may be greater freedom for ASL users to modify sign
location than there is for English speakers to modify articulatory-space positions of the
vowels they produce.37
7.4. Dissertation conclusion
This project examined the extent and perceptibility of long-distance
coarticulatory effects in both spoken and signed language, and the degree to which
these vary among language users. The speech study found that anticipatory VV
coarticulation can occur over at least three vowels’ distance in natural discourse, and
that even such long-distance effects can be perceived by some listeners. Both
coarticulatory strength and perceptual sensitivity varied greatly among study
participants.
The ERP study found significant results for nearer-distance VV effects
but not for the longest-distance ones. The possibility of interplay between
coarticulatory production and perception was investigated, but no significant
correlation between the two was found. Speaking rate and coarticulation strength were
also found not to be correlated.
37. The issue of homophone avoidance has long been discussed in linguistic research (e.g. Gilliéron,
1918), though its importance may have been somewhat overstated at times, given how often homophony
does in fact occur. Issues related to homonymy are relevant in sign language research as well; for
example, see work by Siedlecki and Bonvillian (1998) on the production of homonyms by children
acquiring ASL.
The sign language production study found evidence of long-distance LL
coarticulation, but these effects were in general weaker and less pervasive than the VV
effects found in the spoken-language study. The signers who took part showed a great
deal of variation in their sign production, probably more than was found between
speakers in their speech production. Also incorporated into the sign production study
was an attempt to compare “linguistic” and “non-linguistic” coarticulation. The effects
that were found in the “non-linguistic” contexts were similar in terms of spatial
magnitude as well as statistical significance to those found in the context of actual
signs, and were weaker than the effects found in the speech production study. The sign
language perception study found that LL coarticulatory effects were perceptible to
some signers as well as to some hearing non-signers. However, results here were also
generally weaker than those in the spoken-language perception study.
A number of factors have been examined which individually or collectively
may explain these cross-modality contrasts. Such factors include the differences in the
mass of the articulators involved and in the media in which the relevant linguistic
information is carried. If we accept that speech and sign are two manifestations of a
single human language capacity, cross-modality studies such as this one will prove
indispensable as we seek to understand the complexities which underlie the
phenomenon as a whole.
APPENDIX A
Formant values for first production experiment:
3 distance conditions, 2 vowel contexts, 20 speakers
The following tables show the average target-vowel F1 and F2 values for each
speaker in each distance condition and vowel context obtained in the first production
experiment, together with the associated standard deviations. Note that the “a” and “i”
labeling refers to context, not the measured vowels themselves, which were always
schwa or [ʌ]. Note also that these measurements were made near the end of the target
vowels, where coarticulatory influence of the context vowel was expected to be
strongest, and where effects of the immediately following consonant appear to be seen
as well. Therefore these formant values should not be expected to correspond too
closely to the values one would obtain in the steady-state portion of a schwa vowel.
The first, second and third tables show results for the distance-1, -2 and -3
conditions, respectively. Significant results are noted, where * = p<0.05, ** = p<0.01
and *** = p<0.001. A pictorial representation of the results for each subject is given in
Appendix B.
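The kind of per-speaker comparison summarized in these tables can be sketched as a two-sample t-test on one speaker's target-vowel formant values in the two contexts. This is only an illustration: the pure-Python pooled-variance implementation below is not the author's analysis script, and the formant values are made up rather than drawn from the tables.

```python
import math

def pooled_t(xs, ys):
    """Two-sample t statistic with pooled variance (equal-variance Student's t)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    s2 = (ssx + ssy) / (nx + ny - 2)          # pooled variance estimate
    return (mx - my) / math.sqrt(s2 * (1 / nx + 1 / ny))

f1_a_context = [431, 425, 440, 436, 429]   # hypothetical F1 (Hz), [a] context
f1_i_context = [405, 398, 411, 402, 407]   # hypothetical F1 (Hz), [i] context
t = pooled_t(f1_a_context, f1_i_context)
# With df = 8, |t| > 2.306 corresponds to p < 0.05 two-tailed, i.e. "*".
print(t > 2.306)  # True
```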
Table A1. Formant values of distance-1 target vowels in [a] vs. [i] context
[The original's per-row significance markers could not be recovered and are omitted.]

Speaker   F1 [a] mean (SD)   F1 [i] mean (SD)   F2 [a] mean (SD)   F2 [i] mean (SD)
1 (f)     431 (34)           405 (43)           1991 (200)         2274 (121)
2 (m)     408 (14)           282 (36)           1442 (28)          1992 (78)
3 (f)     359 (96)           315 (57)           1729 (72)          2644 (165)
4 (f)     527 (104)          471 (57)           2140 (239)         2524 (292)
5 (f)     585 (42)           423 (55)           1698 (70)          2360 (100)
6 (m)     534 (48)           393 (33)           1622 (62)          1893 (138)
7 (m)     484 (8.6)          419 (4.4)          1538 (43)          1980 (21)
8 (f)     452 (110)          379 (62)           1814 (88)          2605 (228)
9 (m)     355 (22)           300 (32)           1837 (43)          2025 (90)
10 (f)    439 (77)           377 (110)          2145 (147)         2731 (297)
11 (f)    458 (93)           418 (97)           2089 (207)         2299 (347)
12 (m)    309 (93)           195 (109)          1562 (63)          1843 (93)
13 (m)    445 (51)           331 (53)           1727 (33)          2123 (76)
14 (f)    432 (35)           300 (45)           1914 (129)         2437 (305)
15 (f)    465 (29)           345 (52)           1826 (60)          2235 (252)
16 (f)    359 (87)           307 (51)           2138 (480)         2499 (354)
17 (m)    447 (58)           339 (53)           1577 (18)          1932 (122)
18 (m)    353 (68)           251 (39)           1808 (51)          1979 (221)
19 (m)    429 (49)           308 (36)           1634 (114)         1854 (83)
20 (f)    428 (55)           305 (33)           1896 (93)          2862 (106)
Table A2. Formant values of distance-2 target vowels in [a] vs. [i] context
[The original's per-row significance markers could not be recovered and are omitted.]

Speaker   F1 [a] mean (SD)   F1 [i] mean (SD)   F2 [a] mean (SD)   F2 [i] mean (SD)
1 (f)     572 (31)           583 (21)           1622 (73)          1659 (56)
2 (m)     461 (7.6)          400 (57)           1440 (38)          1515 (56)
3 (f)     633 (82)           404 (100)          1731 (61)          2009 (47)
4 (f)     757 (14)           746 (35)           1979 (72)          1978 (72)
5 (f)     594 (9.6)          558 (26)           1780 (79)          1914 (39)
6 (m)     551 (42)           515 (52)           1311 (68)          1411 (64)
7 (m)     483 (9.0)          478 (22)           1363 (14)          1504 (16)
8 (f)     870 (20)           832 (68)           2052 (50)          2167 (82)
9 (m)     528 (30)           487 (26)           1418 (43)          1481 (45)
10 (f)    617 (42)           536 (50)           1964 (35)          2069 (29)
11 (f)    798 (34)           803 (48)           1993 (42)          2058 (55)
12 (m)    539 (43)           439 (78)           1611 (59)          1632 (32)
13 (m)    631 (12)           617 (19)           1439 (113)         1523 (42)
14 (f)    721 (38)           664 (55)           1866 (43)          1973 (50)
15 (f)    612 (22)           595 (41)           1519 (48)          1620 (39)
16 (f)    794 (37)           802 (37)           1919 (105)         1735 (382)
17 (m)    621 (18)           587 (26)           1391 (27)          1523 (15)
18 (m)    459 (32)           360 (83)           1422 (43)          1404 (92)
19 (m)    506 (19)           489 (17)           1491 (70)          1548 (33)
20 (f)    588 (29)           563 (53)           1706 (34)          1841 (102)
Table A3. Formant values of distance-3 target vowels in [a] vs. [i] context
[The original's per-row significance markers could not be recovered and are omitted.]

Speaker   F1 [a] mean (SD)   F1 [i] mean (SD)   F2 [a] mean (SD)   F2 [i] mean (SD)
1 (f)     587 (27)           635 (45)           1503 (251)         1527 (117)
2 (m)     455 (89)           479 (10)           1127 (35)          1116 (99)
3 (f)     718 (88)           704 (95)           1412 (235)         1648 (49)
4 (f)     617 (105)          592 (87)           1802 (139)         1767 (213)
5 (f)     692 (48)           722 (67)           1497 (63)          1577 (96)
6 (m)     462 (210)          530 (37)           1220 (294)         1290 (254)
7 (m)     542 (40)           515 (35)           1201 (69)          1297 (31)
8 (f)     489 (199)          367 (258)          1655 (103)         1709 (138)
9 (m)     310 (87)           287 (103)          1294 (109)         1319 (59)
10 (f)    395 (116)          290 (158)          1655 (134)         1679 (114)
11 (f)    525 (78)           434 (149)          1788 (157)         1706 (149)
12 (m)    452 (91)           487 (94)           1184 (130)         1257 (218)
13 (m)    214 (31)           169 (64)           1201 (52)          1196 (140)
14 (f)    507 (124)          468 (79)           1502 (58)          1543 (100)
15 (f)    375 (64)           416 (94)           1356 (211)         1380 (197)
16 (f)    318 (99)           546 (180)          1680 (137)         1641 (62)
17 (m)    462 (63)           500 (57)           1248 (85)          1314 (61)
18 (m)    390 (58)           318 (82)           1269 (86)          1208 (88)
19 (m)    509 (36)           526 (19)           1261 (73)          1312 (47)
20 (f)    416 (117)          443 (158)          1501 (275)         1690 (115)
APPENDIX B
Vowel space graphs for first production experiment:
3 distance conditions, 2 vowel contexts, 20 speakers
The figures starting on the next page were generated from the numerical data
summarized in Appendix A and show average context-vowel and distance-1, -2 and -3
target-vowel positions in vowel space for each of the 20 speakers who took part in the
first production experiment. For each distance condition and each speaker, the [i]- and
[a]-colored target vowels are joined by a line which is dotted black if neither F1 nor F2
was significantly different (at the p<0.05 level) between contexts, solid green if only
one of F1 or F2 was significantly different, and solid red if both were significantly
different. Context vowels are joined by a solid blue line. Context and distance-1, -2 and
-3 vowel pairs are labeled with progressively smaller text size. Each speaker’s gender is
also given.
Again, it is important to keep in mind that measurements for target vowels were
made near the end of those vowels, where coarticulatory influence of the context vowel
was expected to be strongest, and where effects of the immediately following
consonant appear to be seen as well. Therefore these formant values should not be
expected to correspond too closely to the values one would obtain in the steady-state
portion of a schwa vowel.
APPENDIX C
Formant values for second production experiment:
3 distance conditions, 5 vowel contexts, 21 speakers
The following tables show the average target-vowel F1 and F2 values for each
speaker in each distance condition and vowel context obtained in the second production
experiment, together with the standard deviation of the values of each formant obtained
at each distance. Results for Speakers 3, 5 and 7 are given after those of the other
speakers. Note that the “a” and “i” labeling refers to context, not the measured vowels
themselves, which were always schwa or [ʌ]. A summary of significance testing results
for these data is given in Appendix D. A pictorial representation of these results for
each subject is given in Appendix E.
Formant means & SDs by distance for Speaker 21

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         482    1527     584    1751     449    2126
[æ]         449    1470     531    1754     389    2343
[ʌ]         446    1440     570    1809     426    2217
[i]         430    1479     560    1809     352    2564
[u]         439    1471     557    1722     407    2184
SD         19.4    31.5    19.4    38.6    36.5   174.3
Formant means & SDs by distance for Speaker 22

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         497    1333     595    1770     402    1821
[æ]         505    1213     549    1751     371    2286
[ʌ]         521    1292     580    1764     399    2039
[i]         472    1293     547    1803     317    2354
[u]         498    1252     579    1800     386    2021
SD         17.4    45.7    21.2    22.7    34.6   216.3
Formant means & SDs by distance for Speaker 23

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         441    1291     607    1574     513    1514
[æ]         456    1333     609    1597     477    1884
[ʌ]         453    1306     552    1591     513    1725
[i]         448    1334     563    1636     421    2039
[u]         436    1304     557    1622     472    1759
SD          8.2    18.9    28.0    24.8    37.8   195.1
Formant means & SDs by distance for Speaker 24

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         429    1150     452    1426     457    1517
[æ]         441    1159     465    1454     390    1750
[ʌ]         422    1191     459    1433     413    1614
[i]         429    1173     444    1447     369    1785
[u]         411    1141     438    1457     378    1642
SD         11.0    19.5    11.0    13.6    34.8   107.9
Formant means & SDs by distance for Speaker 25

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         498    1128     497    1603     503    1830
[æ]         464    1246     526    1563     460    1836
[ʌ]         477    1177     496    1582     469    1830
[i]         490    1121     444    1671     403    1835
[u]         483    1175     480    1603     411    1823
SD         12.6    49.9    29.9    40.9    41.9     5.1
Formant means & SDs by distance for Speaker 26

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         415    1215     538    1513     415    1518
[æ]         421    1220     543    1534     409    1819
[ʌ]         461    1266     537    1502     424    1574
[i]         457    1236     524    1530     349    2001
[u]         440    1186     588    1509     394    1492
SD         20.6    29.4    24.3    13.9    29.5   221.1
Formant means & SDs by distance for Speaker 27

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         469    1076     575    1586     437    1666
[æ]         478    1104     531    1537     436    1845
[ʌ]         501    1076     558    1579     463    1625
[i]         407    1124     532    1647     389    1957
[u]         480    1132     511    1592     432    1675
SD         22.5    29.6    25.3    39.2    26.7   141.6
Formant means & SDs by distance for Speaker 28

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         433    1248     457    1512     413    1659
[æ]         410    1255     446    1589     316    2134
[ʌ]         472    1287     419    1548     354    1704
[i]         394    1232     369    1638     268    2246
[u]         441    1298     449    1572     380    1764
SD         29.9    27.6    36.1    47.3    56.0   269.0
Formant means & SDs by distance for Speaker 29

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         467    1437     513    1771     402    1841
[æ]         464    1451     548    1835     392    2548
[ʌ]         531    1490     514    1835     380    2048
[i]         525    1472     503    1851     344    2680
[u]         468    1441     488    1795     355    2165
SD         31.6    18.5    21.8    33.2    24.4   349.6
Formant means & SDs by distance for Speaker 30

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         375    1328     586    1762     363    1892
[æ]         382    1317     595    1843     309    2327
[ʌ]         367    1362     577    1790     326    1980
[i]         330    1410     531    1840     272    2347
[u]         386    1334     578    1751     309    1973
SD         26.0    37.7    25.1    42.7    32.9   215.9
Formant means & SDs by distance for Speaker 31

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         470    1259     563    1376     428    1873
[æ]         420    1230     574    1425     399    2045
[ʌ]         457    1245     557    1416     376    1991
[i]         461    1244     534    1456     391    2308
[u]         424    1209     543    1436     361    1913
SD         22.7    19.0    16.1    29.7    25.5   171.1
Formant means & SDs by distance for Speaker 32

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         550    1134     484    1350     422    1647
[æ]         531    1138     484    1343     420    1874
[ʌ]         546    1149     508    1365     427    1670
[i]         527    1138     495    1442     379    1968
[u]         509    1111     476    1382     388    1740
SD         16.2    13.9    12.3    39.5    21.9   137.5
Formant means & SDs by distance for Speaker 33

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         349    1375     589    1723     391    1950
[æ]         333    1477     563    1816     312    2574
[ʌ]         311    1735     550    1736     344    1995
[i]         367    1400     500    1836     303    2469
[u]         353    1676     532    1782     307    2271
SD         40.7   158.0    33.5    48.9    37.3   277.5
Formant means & SDs by distance for Speaker 34

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         455    1125     503    1388     415    1624
[æ]         468    1117     521    1394     398    1804
[ʌ]         459    1127     522    1376     394    1704
[i]         458    1126     481    1457     367    1978
[u]         451    1100     483    1413     383    1705
SD          6.3    11.1    19.8    31.8    17.8   136.2
Formant means & SDs by distance for Speaker 35

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         446    1324     519    1519     384    1859
[æ]         441    1272     470    1459     358    2033
[ʌ]         435    1277     471    1456     373    1837
[i]         413    1270     486    1500     312    2133
[u]         419    1274     452    1438     333    1784
SD         33.3    27.7    25.2    33.5    29.2   147.3
Formant means & SDs by distance for Speaker 36

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         521    1369     584    1673     409    1883
[æ]         486    1378     560    1723     387    2150
[ʌ]         454    1404     548    1678     400    1976
[i]         508    1430     473    1706     388    2191
[u]         486    1432     508    1683     371    1841
SD         25.5    29.1    44.1    21.2    14.6   156.7
Formant means & SDs by distance for Speaker 37

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         489    1178     532    1464     410    1686
[æ]         463    1161     522    1482     387    1907
[ʌ]         457    1144     521    1513     392    1634
[i]         453    1149     439    1539     361    1956
[u]         454    1169     518    1478     386    1771
SD         11.2    17.7    38.2    30.4    17.6   138.2
Formant means & SDs by distance for Speaker 38

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         367    1267     505    1679     325    2204
[æ]         309    1256     493    1673     303    2355
[ʌ]         325    1246     450    1709     323    2338
[i]         312    1272     421    1758     282    2541
[u]         336    1283     415    1706     304    2252
SD         47.5    15.4    41.2    33.6    17.4   129.3
Formant means & SDs by distance for Speaker 3

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         694    1354     507    1718     437    1939
[æ]         709    1430     544    1811     432    2516
[ʌ]         657    1364     542    1758     404    1984
[i]         627    1418     486    1920     360    2633
[u]         634    1402     502    1790     382    1813
SD         36.2    33.3    25.6    76.2    32.7   370.3
Formant means & SDs by distance for Speaker 5

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         702    1437     509    1745     493    1800
[æ]         696    1446     502    1806     460    2455
[ʌ]         730    1436     474    1772     475    1906
[i]         684    1417     486    1853     482    2423
[u]         668    1373     480    1767     359    1785
SD         23.0    29.2    14.8    41.8    54.5   337.1
Formant means & SDs by distance for Speaker 7

Context   D3 F1   D3 F2   D2 F1   D2 F2   D1 F1   D1 F2
[a]         476    1098     450    1434     461    1637
[æ]         499    1125     471    1391     433    2064
[ʌ]         493    1147     474    1342     446    1675
[i]         479    1143     452    1471     357    2151
[u]         462    1114     476    1403     428    1587
SD         14.4    20.0    12.5    48.2    40.1   263.5
APPENDIX D
Significance testing for second production experiment:
3 distance conditions, 5 vowel contexts, 21 speakers
For each speaker, two adjacent tables are given, separated by a thick border.
The table at left shows the level of significance associated with the speaker’s formant
frequency differences between contexts, for each of 10 context vowel pairs and 3
distance conditions, for each of F1 and F2. A blank cell indicates an outcome in the
contrary-to-expected direction (e.g. F1 higher for the [i] than [a] context). The bottom
row is a tally of significant results (for each of the p<0.05 and p<0.01 levels of
significance) for each of the three distance conditions, summed over both F1 and F2.
The table at right provides, for the given speaker, a summary of significance
testing outcomes for each formant at each distance and for each contrasting context-vowel pair, with * = p<0.05, ** = p<0.01, *** = p<0.001, + = p<0.10, and a √
indicating a non-significant outcome in which averages were nevertheless in the
expected direction. The bottom row gives the number of results in the column above
which were significant at the p<0.05 level or higher.
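The table conventions just described (significance symbols and the bottom-row tallies) can be sketched with a small hypothetical helper, not part of the author's analysis; the p-values below are made up for illustration:

```python
def sig_symbol(p):
    """Map a p-value to the symbol used in these tables. The square root
    sign marks a non-significant outcome in the expected direction (this
    sketch assumes the direction check has already been done)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "+"
    return "\u221a"  # √

p_values = [0.000, 0.023, 0.064, 0.207, 0.036]   # illustrative p-values
tally_05 = sum(1 for p in p_values if p < 0.05)  # "# sig" at the p<0.05 level
tally_01 = sum(1 for p in p_values if p < 0.01)  # "# sig" at the p<0.01 level
print(sig_symbol(0.023), tally_05, tally_01)     # * 3 1
```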
Tables for Speakers 21 through 38, followed by Speakers 3, 5 and 7
[Each speaker's pair of tables covers the 10 context-vowel pairs [i]-[a], [a]-[æ], [a]-[^], [a]-[u], [æ]-[i], [æ]-[^], [æ]-[u], [i]-[^], [i]-[u] and [^]-[u] at Distances 1, 2 and 3, giving the p-values for the F1 and F2 comparisons (left table) and the corresponding significance codes with per-column tallies of results significant at p<0.05 (right table). The tables' column alignment was lost in text extraction, so the individual entries are not reproduced here.]
APPENDIX E
Vowel space graphs for second production experiment:
3 distance conditions, 4 vowel contexts, 21 speakers
The figures beginning on the next page were generated from the numerical data
summarized in Appendix C and show average context-vowel and distance-1, -2 and -3
target-vowel positions in vowel space for each of the 21 speakers who took part in the
second production experiment. For each distance condition and each speaker, target
vowels associated with each context pair are joined by a line that is dotted black if
neither F1 nor F2 was significantly different (at the p<0.05 level) between contexts,
solid green if only one of F1 or F2 was significantly different, and solid red if both
were significantly different. Context vowels are joined by a solid blue line. Context and
distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size. Each
speaker’s gender is also given. The graphs for Speakers 3, 5 and 7 are given after those
of the other speakers.
For the context vowels only, the average position of the [^] vowel is also given
for reference’s sake. In addition, since most speakers produced [u] as a diphthong, both
the onset and offset of that context vowel are labeled as well, with “u-” indicating
vowel onset and “u” indicating vowel offset.
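The line-coloring convention above can be sketched as follows. The vowel positions are invented for illustration, and matplotlib is assumed only as a convenient plotting backend; the dissertation does not state how its graphs were produced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headlessly
import matplotlib.pyplot as plt

def pair_style(f1_sig, f2_sig):
    """Line style for a target-vowel pair, per the convention described above."""
    n_sig = int(f1_sig) + int(f2_sig)
    if n_sig == 0:
        return dict(color="black", linestyle=":")   # neither formant differs
    if n_sig == 1:
        return dict(color="green", linestyle="-")   # exactly one formant differs
    return dict(color="red", linestyle="-")         # both formants differ

# Hypothetical average (F2, F1) positions: one target-vowel pair in two
# contexts, plus the two context vowels themselves (joined in solid blue).
target_i_ctx, target_a_ctx = (1800, 550), (1750, 580)
context_i, context_a = (2300, 300), (1100, 750)

fig, ax = plt.subplots()
for (x1, y1), (x2, y2), style in [
    (target_i_ctx, target_a_ctx, pair_style(f1_sig=True, f2_sig=False)),
    (context_i, context_a, dict(color="blue", linestyle="-")),
]:
    ax.plot([x1, x2], [y1, y2], **style)
# Vowel space is conventionally drawn with F2 and F1 increasing leftward/downward.
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
fig.savefig("vowel_space_sketch.png")
```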
[The vowel space graphs described above appear on pages 298–309 of the original document.]
APPENDIX F
ANOVA results for second production experiment for
individual vowel pairs
3 distance conditions, 4 vowel contexts [6 vowel pairs], 21 speakers
Since there are six sets of vowel pairs associated with the four corner vowels,
six sets of repeated-measures ANOVAs were run; in each such set, comparisons of
normalized F1 and F2 values were made at each distance with vowel as factor, where in
each set the vowel factor had only two levels. At bottom are results for the [i] - [a]
contrast for the entire set of 38 speakers who took part in either the first or second
experiment.
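Because the vowel factor in each ANOVA has only two levels, each test is equivalent to a paired t-test across speakers, with F(1, n-1) = t². A sketch of one such comparison, using simulated data in place of the actual normalized formant values (scipy is assumed as the statistics backend):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_speakers = 18  # matches the F(1,17) degrees of freedom reported below

# Hypothetical normalized F1 values for one vowel pair at one distance:
# one value per speaker per context vowel, with a built-in context effect.
f1_context_i = rng.normal(loc=0.0, scale=1.0, size=n_speakers)
f1_context_a = f1_context_i + rng.normal(loc=0.5, scale=0.5, size=n_speakers)

t, p = stats.ttest_rel(f1_context_a, f1_context_i)
F = t ** 2  # with two levels, the repeated-measures F(1, n-1) equals t squared
print(f"F(1,{n_speakers - 1}) = {F:.2f}, p = {p:.4f}")
```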
[i]-[a] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=77.2, p<0.001    F(1,17)=130.4, p<0.001
Distance 2    F(1,17)=36.5, p<0.001    F(1,17)=51.4, p<0.001
Distance 3    F(1,17)=4.07, p=0.06     F(1,17)=0.48, p=0.50

[a]-[u] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=46.7, p<0.001    F(1,17)=13.6, p<0.01
Distance 2    F(1,17)=16.5, p<0.001    F(1,17)=3.5, p=0.08
Distance 3    F(1,17)=7.2, p<0.05      F(1,17)=0.47, p=0.50

[æ]-[u] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=1.47, p=0.24     F(1,17)=71.3, p<0.001
Distance 2    F(1,17)=8.5, p<0.01      F(1,17)=0.004, p=0.95
Distance 3    F(1,17)=0.21, p=0.65     F(1,17)=0.78, p=0.39

[a]-[æ] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=26.8, p<0.001    F(1,17)=72.2, p<0.001
Distance 2    F(1,17)=1.83, p=0.19     F(1,17)=2.1, p=0.16
Distance 3    F(1,17)=4.64, p<0.05     F(1,17)=0.03, p=0.86

[i]-[æ] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=61.6, p<0.001    F(1,17)=24.8, p<0.001
Distance 2    F(1,17)=18.2, p<0.001    F(1,17)=21.9, p<0.001
Distance 3    F(1,17)=0.31, p=0.59     F(1,17)=0.14, p=0.72

[i]-[u] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,17)=11.5, p<0.01     F(1,17)=109.2, p<0.001
Distance 2    F(1,17)=4.0, p=0.06      F(1,17)=42.6, p<0.001
Distance 3    F(1,17)=0.15, p=0.71     F(1,17)=0.12, p=0.73

Whole-group results (Speakers 1 to 38)

[i]-[a] contrast
              F1 testing results       F2 testing results
Distance 1    F(1,37)=137.5, p<0.001   F(1,37)=223.4, p<0.001
Distance 2    F(1,37)=37.2, p<0.001    F(1,37)=54.8, p<0.001
Distance 3    F(1,37)=0.98, p=0.33     F(1,37)=5.04, p<0.05
APPENDIX G
Numerical results from main signing study
The following tables show the average target-sign values in the x-, y- and z-dimensions (right-left, front-back and up-down, respectively), given in centimeters, for
Signers 2, 3, 4 and 5 in each distance condition and sign context obtained in the main
sign production experiment, together with the associated standard deviations. The sign
CLOTHES, for which data were not given in the main text, was used as an additional
context sign in order to enable comparisons of the behavior of the target signs between
neutral-space contexts and a body location adjacent to neutral space, which is where
CLOTHES is signed. (The sign is formed with a B handshape on both hands, with both
thumbs flicking down twice against the body at the chest area.)
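The per-context averages and standard deviations reported below can be computed directly from the recorded 3-D positions. A minimal numpy sketch, with invented values standing in for actual motion-capture measurements:

```python
import numpy as np

# Hypothetical target-sign positions (cm): one row per repetition,
# columns are x (right-left), y (front-back), z (up-down).
positions = np.array([
    [15.2, 6.9, 10.1],
    [14.8, 7.1,  9.8],
    [15.5, 6.7, 10.4],
])

avg = positions.mean(axis=0)
sd = positions.std(axis=0, ddof=1)  # sample SD, as reported in the tables
for label, a, s in zip(("x", "y", "z"), avg, sd):
    print(f"{label}: {a:.2f} [{s:.3f}]")
```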
Results for Signers 2, 3, 4 and 5
[For each signer, six tables report the average target-sign position (in centimeters) and its standard deviation in the x (right-left), y (front-back) and z (up-down) dimensions for each context sign — CLOTHES (chest), HAT (head), PANTS (waist), <red> (high), <green> (low), BOSS (shoulder), CLOWN (nose) and CUPCAKE (N.S.) — in the conditions Distance 1 (I WANT (X)), Distance 2 (I WANT FIND (X)), Distance 3 (I WANT GO FIND (X)) and Distance 4 (I WANT GO FIND OTHER (X)) for the WANT target, and Distances 1 (I WISH (X)) and 3 (I WISH GO FIND (X)) for the WISH target. The tables' column alignment was lost in text extraction, so the individual entries are not reproduced here.]
BIBLIOGRAPHY
Adank, P., Smits, R., & van Hout, R. (2004). A comparison of vowel normalization
procedures for language variation research. Journal of the Acoustical Society of
America, 116, 3099-3107.
Aitchison, J. (1981). Language change: Progress or decay? London: Fontana.
Alfonso, P. J., & Baer, T. (1982). Dynamics of vowel articulation. Language and
Speech, 25, 151–173.
Alho, K., Woods, D. L., Algazi, A., & Näätänen, R. (1992). Intermodal selective
attention II: effects of attentional load on processing auditory and visual stimuli in
central space. Electroencephalogr Clin Neurophysiol, 82, 356–68.
Arbib, M. A. (2005). Interweaving protosign and protospeech: Further developments
beyond the mirror. Interaction Studies: Social Behaviour and Communication in
Biological and Artificial Systems, 6, 145-71.
Arbib, M. A., & Rizzolatti, G. (1997). Neural expectations: A possible evolutionary
path from manual skills to language. Communication and Cognition, 29, 393-423.
Battison, R. (1978). Lexical borrowing in American Sign Language. Silver Spring:
Linstok Press.
Bayley, R., Lucas, C., & Rose, M. (2002). Phonological variation in American Sign
Language: The case of 1 handshape. Language Variation and Change, 14, 19-53.
Beddor, P.S., Harnsberger, J.D. & Lindemann, S. (2002). Language-specific patterns of
vowel-to-vowel coarticulation: Acoustic structures and their perceptual correlates.
Journal of Phonetics, 20, 591-627.
Bell-Berti, F., Baer, T., Harris, K. S., & Niimi, S. (1979). Coarticulatory effects of
vowel quality on velar function. Phonetica, 36, 187-193.
Benguerel, A.-P., & Cowan, H. A. (1974). Coarticulation of upper lip protrusion in
French. Phonetica, 30, 41-55.
Boersma, P. & Weenink, D. (2005). Praat: Doing phonetics by computer [computer
program]. Available from http://www.praat.org.
Bradley, T. (2006). Spanish complex onsets and the phonetics-phonology interface. In
F. Martínez-Gil & S. Colina (Eds.), Optimality-Theoretic Studies in Spanish
Phonology, 15-38. Amsterdam: John Benjamins.
Bradley, T. (2007). Morphological derived-environment effects in gestural
coordination: A case study of Norwegian clusters. Lingua, 117, 950-985.
Bradley, T., & Schmeiser, B. (2003). On The Phonetic Reality of /r/ in Spanish
Complex Onsets. In P. M. Kempchinsky & C.-E. Piñeros (Eds.), Theory, Practice, and
Acquisition: Papers from the 6th Hispanic Linguistics Symposium, 1-20. Somerville,
MA: Cascadilla Press.
Brentari, D. (1998). A Prosodic Model of Sign Language Phonology. Cambridge, MA:
MIT Press.
Brentari, D., & Crossley, L. (2002). Prosody on the hands and face: Evidence from
American Sign Language. Sign Language & Linguistics, 5, 105-130.
Brentari, D., & Goldsmith, J. A. (1993). H2 as secondary licenser. In G. R. Coulter
(Ed.), Current Issues in ASL Phonology, volume 3 of Phonetics and Phonology. San
Diego: Academic Press.
Browman, C., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology,
3, 219-252.
Browman, C., & Goldstein, L. (1992). “Targetless” schwa: An articulatory analysis. In
G. J. Docherty & R. Ladd (Eds.), Papers in Laboratory Phonology II. Gesture,
Segment, Prosody, 26-56. Cambridge: Cambridge University Press.
Butcher, A., & Weiher, E. (1976). An electropalatographic investigation of
coarticulation in VCV sequences. Journal of Phonetics, 4, 59-74.
Celsis, P., Boulanouar, K., Doyon, B., Ranjeva, J. P., Berry, I., Nespoulous, J. L., et al.
(1999). Differential fMRI responses in the left posterior superior temporal gyrus and
left supramarginal gyrus to habituation and change detection in syllables and tones.
NeuroImage, 9, 135-44.
Cheek, D. A. (2001). The Phonetics and Phonology of Handshape in American Sign
Language. Academic dissertation. University of Texas at Austin.
Cho, T. (1999). Effect of prosody on vowel-to-vowel coarticulation in English.
Proceedings of the XIVth International Congress of Phonetic Sciences, 459-462.
Cho, T. (2004). Prosodically conditioned strengthening and vowel-to-vowel
coarticulation in English. Journal of Phonetics, 32, 141-176.
Choi, J. D., & Keating, P. (1991). Vowel-to-vowel coarticulation in three Slavic
languages. University of California Working Papers in Phonetics, 78, 78-86.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. Harper and Row.
323
Clements, G. N. (1985). The geometry of phonological features. Phonology, 2, 225-252.
Corina, D. (1990). Handshape assimilations in hierarchical phonological
representations. In C. Lucas (Ed.), Sign Language Research: Theoretical Issues, 27-49.
Washington: Gallaudet University Press.
Corina, D. P., & Hildebrandt, U. C. (2002). Psycholinguistic investigations of
phonological structure in ASL. In R. P. Meier, K. Cormier, et al. (Eds.), Modality and
Structure in Signed and Spoken Language, 88-111. New York: Cambridge University
Press.
Corina, D. P., & Sagey, E. (1989). Predictability in ASL handshapes and handshape
sequences, with implications for features and feature geometry. Ms., University of
California at San Diego.
Crasborn, O., & van der Kooij, E. (1997). Relative Orientation in Sign Language
Phonology. In J. Coerts & H. de Hoop (Eds.), Linguistics in the Netherlands, 37-48.
Amsterdam: John Benjamins.
Czigler, I. (2007). Visual mismatch-negativity: violation of non-attended environmental
regularities. Journal of Psychophysiology, 21, 224–230.
Daniloff, R., & Hammarberg, R. (1973). On defining coarticulation. Journal of
Phonetics, 1, 239-48.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of
single-trial EEG dynamics including independent component analysis. Journal of
Neuroscience Methods, in press. Available from
http://sccn.ucsd.edu/eeglab/download/eeglab_jnm03.pdf.
Donchin, E. (1981). Surprise!...Surprise? Psychophysiology, 18, 493-513.
Emmorey, K., McCullough, S., & Brentari, D. (2003). Categorical perception in
American Sign Language. Language and Cognitive Processes, 18, 21–46.
Farnetani, E., & Recasens, D. (1999). Coarticulation models in recent speech
production theories. In W. J. Hardcastle and N. Hewlett (Eds.), Coarticulation: Theory,
Data and Techniques, 31-65. Cambridge: Cambridge University Press.
Flemming, E. (1997). Phonetic detail in phonology: Towards a unified account of
assimilation and coarticulation. In K. Suzuki and D. Elzinga (Eds.), Proceedings volume
of the 1995 Southwestern Workshop in Optimality Theory (SWOT), University of
Arizona, Tucson, AZ.
Fletcher, J. (2004). An EMA/EPG study of vowel-to-vowel articulation across velars in
Southern British English. Clinical Linguistics & Phonetics, 18, 577-592.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of
Phonetics, 8, 113-133.
Fowler, C. A. (1981). Production and perception of coarticulation among stressed and
unstressed vowels. Journal of Speech and Hearing Research, 24, 127-139.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived
rhythms in speech: Cyclic productions of vowels in monosyllabic stress feet. Journal of
Experimental Psychology: General, 112, 386–412.
Fowler, C. A., & Brancazio, L. (2000). Coarticulation resistance of American English
consonants and its effects on transconsonantal vowel-to-vowel coarticulation.
Language and Speech, 43, 1-41.
Fowler, C. A., & Saltzman, E. (1993). Coordination and coarticulation in speech
production. Language and Speech, 36, 171-195.
Fowler, C. A., & Smith, M. (1986). Speech perception as “vector analysis”: An
approach to the problem of segmentation and invariance, In J. S. Perkell & D. H. Klatt,
(Eds.), Invariance and variability of speech processes, 123-136. Hillsdale, NJ:
Erlbaum.
Fowler, C. A., & Turvey, M. T. (1980). Immediate compensation in bite-block speech.
Phonetica, 37, 306-326.
Frenck-Mestre, C., Meunier, C., Espesser, R., Daffner, K., & Holcomb, P. (2005).
Perceiving nonnative vowels: The effect of context on perception as evidenced by
event-related brain potentials. Journal of Speech, Language, and Hearing Research, 48,
1-15.
Gafos, A. (2002). A Grammar of Gestural Coordination. Natural Language and
Linguistic Theory, 20, 269-337.
Garrido, M., Kilner, J., Stephan, K. & Friston, K. (in press). The mismatch negativity:
A review of underlying mechanisms. Clinical Neurophysiology.
Gay, T. (1974). A cinefluorographic study of vowel production. Journal of Phonetics,
2, 255-266.
Gay, T. (1977). Articulatory movements in VCV sequences. Journal of the Acoustical
Society of America, 62, 183-193.
Gazzaniga, M. S., Ivry, R. B., & Mangun, G. R. (1998). Cognitive neuroscience: The
biology of the mind. New York: Norton.
Gerstman, L. H. (1968). Classification of self-normalized vowels. IEEE Transactions
on Audio and Electroacoustics, AU-16, 78-80.
Giard, M. H., Perrin, F., Pernier, J., & Bouchet, P. (1990). Brain generators implicated
in the processing of auditory stimulus deviance: a topographic event-related potential
study. Psychophysiology, 27, 627-40.
Gilliéron, J. (1918). Généalogie des mots qui désignent l’abeille d’après l’Atlas
linguistique de la France. Paris: Champion.
Goldsmith, J. (1976). Autosegmental phonology. Doctoral dissertation, MIT,
Cambridge, MA. [Published 1979, New York: Garland Press]
Gomot, M., Giard, M.-H., Roux, S., Barthelemy, C., & Bruneau, N. (2000). Maturation
of frontal and temporal components of mismatch negativity (MMN) in children.
Neuroreport, 11, 3109-12.
Gordon, M. (1999). The phonetics and phonology of non-modal vowels: A cross-linguistic perspective. Berkeley Linguistics Society, 24, 93-105.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter
isosensitivity functions. Psychometrika, 32, 25-33.
Grosvald, M. (2006). Vowel-to-vowel coarticulation: Length and palatalization effects
and perceptibility. Unpublished manuscript, University of California at Davis.
Grosvald, M., & Corina, D. (in press). Exploring the limits of long-distance vowel-to-vowel coarticulation. Proceedings, Workshop of the Association Francophone de la
Communication Parlée: “Coarticulation: Cues, Direction and Representation.”
Montpellier, France; December 7, 2007.
Hall, N. (2003). Gestures and Segments: Vowel Intrusion as Overlap. Doctoral
dissertation. Amherst, MA: University of Massachusetts, Amherst.
Hammarberg, R. (1976). The metaphysics of coarticulation. Journal of Phonetics, 4,
353-63.
Hardcastle, W. J., & Hewlett, N. (1999). Coarticulation: Theory, Data and Techniques.
Cambridge: Cambridge University Press.
Hari, R., Hämäläinen, M., Ilmoniemi, R., Kaukoranta, E., Reinikainen, K., Salminen,
J., et al. (1984). Responses of the primary auditory cortex to pitch changes in a
sequence of tone pips: Neuromagnetic recordings in man. Neurosci Lett, 50, 127-32.
Heid, S. & Hawkins, S. (2000). An acoustical study of long-domain /r/ and /l/
coarticulation. Proceedings of the 5th seminar on speech production: Models and data,
77-80. Kloster Seeon, Bavaria, Germany.
Hertrich, I., & Ackermann, H. (1995). Coarticulation in slow speech: durational and
spectral analysis. Language and Speech, 38, 159-187.
Hildebrandt, U., & Corina, D. (2002). Phonological similarity in American Sign
Language. Language and Cognitive Processes, 17, 593-612.
Huffman, M. K. (1986). Patterns of coarticulation in English. University of California
Working Papers in Phonetics, 63, 26-47.
Hussein, L. (1990). VCV coarticulation in Arabic. Ohio State University Working
Papers in Linguistics, 38, 88-104.
Jääskeläinen, I. P., Ahveninen, J., Bonmassar, G., Dale, A. M., Ilmoniemi, R. J.,
Levänen, S., et al. (2004). Human posterior auditory cortex gates novel sounds to
consciousness. Proc Natl Acad Sci USA, 101, 6809-14.
Johnson, K. (2003). Acoustic and auditory phonetics. Malden, MA: Blackwell
Publishing.
Keating, P. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA
Working Papers in Phonetics, 62, 1–13.
Keating, P. (1988). Underspecification in phonetics. Phonology, 5, 275-292.
Keating, P. (1990a). Phonetic representations in a generative grammar. Journal of
Phonetics, 18, 321-334.
Keating, P. (1990b). The window model of coarticulation: Articulatory evidence. In J.
Kingston & M. E. Beckman (Eds.), Papers in Laboratory Phonetics I: Between the
Grammar and the Physics of Speech, 451-470. Cambridge University Press.
Kekoni, J., Hämäläinen, H., Saarinen, M., Gröhn, J., Reinikainen, K., Lehtokoski, A., et
al. (1997). Rate effect and mismatch responses in the somatosensory system: ERP
recordings in humans. Biol Psychol, 46, 125-42.
Klima, E. S., & Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard
University Press.
Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception.
Translation 30, 543. Washington, DC: Joint Publications Research Service.
Krauel, K., Schott, P., Sojka, B., Pause, B. M. & Ferstl, R. (1999). Is there a mismatch
negativity analogue in the olfactory event-related potential? J Psychophysiol, 13, 49-55.
Kuehn, D., & Moll, K. (1972). Perceptual effects of forward coarticulation. Journal of
Speech and Hearing Research, 15, 654-664.
Kühnert, B., & Nolan, F. (1999). The origin of coarticulation. In W. J. Hardcastle and
N. Hewlett (Eds.), Coarticulation: Theory, Data and Techniques, 7-30. Cambridge:
Cambridge University Press.
Kujala, T., Tervaniemi, M. & Schröger, E. (2007). The mismatch in cognitive and
clinical neuroscience: theoretical and methodological considerations. Biol Psychol, 74,
1-19.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials
reflect semantic incongruity. Science, 207, 203-205.
Ladefoged, P., & Broadbent, D. (1957). Information conveyed by vowels. Journal of
the Acoustical Society of America 29, 98-104.
Lehiste, I., & Shockey, L. (1972). On the perception of coarticulation effects in English
VCV syllables. Journal of Speech and Hearing Research, 15, 500-506.
Liberman, A. M., Harris, K. S., Kinney, J. A., & Lane, H. (1961). The discrimination
of relative onset time of the components of certain speech and nonspeech patterns.
Journal of Experimental Psychology, 61, 379-388.
Liddell, S. (1990). Structures for representing handshape and local movement at the
phonemic level. In S.D. Fischer & P. Siple (Eds.), Theoretical Issues in Sign Language
Research Vol. 1, 37–65. Chicago: University of Chicago Press.
Liddell, S., & Johnson, R. (1989 [1985]). American Sign Language: The phonological
base. Sign Language Studies, 64, 195-277. (Originally distributed as manuscript.)
Lindblom, B., Lubker, J., & Gay, T. (1979). Formant frequencies of some fixed-mandible vowels and a model of speech-motor programming by predictive simulation.
Journal of Phonetics, 7, 147-161.
Lucas, C., Bayley, R., Rose, M., & Wulf, A. (2002). Location Variation in American
Sign Language. Sign Language Studies, 2, 407-440.
Luck, S. J. (2005). An introduction to the event-related potential technique. Cambridge,
MA: MIT Press.
Macmillan, N. A., & Creelman, C. D. (1991). Detection Theory: A User's Guide. New
York: Cambridge University Press.
Maekawa, T., Goto, Y., Kinukawa, N., Taniwaki, T., Kanba, S., & Tobimatsu, S.
(2005). Functional characterization of mismatch negativity to a visual stimulus. Clin
Neurophysiol, 116, 2392–402.
Magen, H. S. (1997). The extent of vowel-to-vowel coarticulation in English. Journal
of Phonetics, 25, 187-205.
Mandel, M. (1981). Phonotactics and morphophonology in American Sign Language.
Doctoral dissertation, University of California at Berkeley.
Manuel, S. Y. (1990). The role of contrast in limiting vowel-to-vowel coarticulation in
different languages. Haskins Laboratories Status Report on Speech Research, 103-104,
1-20.
Manuel, S. Y., & Krakow, R. A. (1984). Universal and language particular aspects of
vowel-to-vowel coarticulation. Haskins Laboratories Status Report on Speech
Research, 77-78, 69-78.
Martin, J.G., & Bunnell, H.T. (1982). Perception of anticipatory coarticulation effects
in vowel-stop consonant-vowel sequences. Journal of Experimental Psychology:
Human Perception and Performance, 8, 473-488.
Matthies, M., Perrier, P., Perkell, J. S., & Zandipour, M. (2001). Variation in
anticipatory coarticulation with changes in clarity and rate. J Speech Lang Hear Res,
44, 340–353.
Mauk, C. (2003). Undershoot in Two Modalities: Evidence from Fast Speech and Fast
Signing. Academic dissertation. University of Texas at Austin.
Modarresi, G., Sussman, H., Lindblom, B., & Burlingame, E. (2004). An acoustic
analysis of the bidirectionality of coarticulation in VCV utterances. Journal of
Phonetics, 32, 291-312.
Moll, K. L., & Daniloff, R. G. (1971). Investigation of the timing of velar movements
during speech. Journal of the Acoustical Society of America, 50, 678-684.
Näätänen, R. (1979). Orienting and Evoked Potentials. In H. D. Kimmel, E. H. van
Olst, J. F. Orlebeke (Eds.), The orienting reflex in humans, 61-75. New Jersey:
Erlbaum.
Näätänen, R. (1985). Selective attention and stimulus processing: reflections in event-related potentials, magnetoencephalogram, and regional cerebral blood flow. In M. I.
Posner & O. S. M. Marin (Eds.), Attention and Performance XI, 355-73.
Hillsdale, NJ: Erlbaum.
Näätänen, R. (1992). Attention and brain function. Hillsdale, NJ: Lawrence Erlbaum.
Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected
by the Mismatch Negativity and its magnetic equivalent. Psychophysiology, 38, 1–21.
Näätänen, R., Gaillard, A. W. K., & Mäntysalo, S. (1978). Early selective-attention
effect on evoked potential reinterpreted. Acta Psychologica, 42, 313-329.
Näätänen, R., Paavilainen, P., Rinne, T. & Alho, K. (2007). The mismatch negativity
(MMN) in basic research of central auditory processing: A review. Clin Neurophysiol,
118, 2544-2590.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in
cognitive neuroscience. Psychological Bulletin, 125, 826-859.
Nespor, M., & Sandler, W. (1999). Prosody in Israeli Sign Language. Language &
Speech, 42, 143-176.
Ochiai, K., & Fujimura, O. (1971). Vowel identification and phonetic contexts. Reports
from the University of Electro-Communications (Tokyo), 22, 103-111.
Ohala, J. (1974). Experimental historical phonology. In J. Anderson & C. Jones (Eds.),
Historical linguistics, II: Theory and description in phonology. Amsterdam: North-Holland.
Ohala, J. (1981). The listener as a source of sound change. In M.F. Miller (Ed.), Papers
from the parasession on language behavior. Chicago: Chicago Linguistic Association.
Ohala, J. (1994). Towards a universal, phonetically-based, theory of vowel harmony.
ICSLP-1994, 491-494.
Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic
measurements. Journal of the Acoustical Society of America, 39, 151-168.
Osterhout, L. & Holcomb, P. J. (1992). Event-related brain potentials elicited by
syntactic anomaly. Journal of Memory & Language, 31, 785-806.
Padgett, J. (1995). Feature classes. In J. Beckman, L. W. Dickey & S. Urbanczyk
(Eds.), Papers in Optimality Theory. Amherst: GLSA.
Parush, A., Ostry, D. J., & Munhall, K. G. (1983). A kinematic study of lingual
coarticulation in VCV sequences. Journal of the Acoustical Society of America, 74,
1115-1125.
Pater, J. (1999). Austronesian nasal substitution and other NC effects. In R. Kager, H.
van der Hulst & W. Zonnevelt (Eds.), The Prosody-Morphology Interface, 310-343.
Cambridge: Cambridge University Press.
Pazo-Alvarez, P., Cadaveira, F., & Amenedo, E. (2003). MMN in the visual modality:
a review. Biol Psychol, 63, 199–236.
Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M., &
Zandipour, M. (2004). The distinctness of speakers’ productions of vowel contrasts is
related to their discrimination of the contrasts. Journal of the Acoustical Society of
America, 116, 2338–2344.
Perlmutter, D. (1991). Prosodic vs. segmental structure: A moraic theory of American
Sign Language syllable structure. Ms., University of California, San Diego.
Prince, A., & Smolensky, P. (1993). Optimality Theory: constraint interaction in
generative grammar. Ms, Rutgers University & University of Colorado, Boulder.
Przezdziecki, M. (2000). Vowel harmony and vowel-to-vowel coarticulation in three
dialects of Yoruba. Working Papers of the Cornell Phonetics Laboratory, 13, 105-124.
Purcell, E. T. (1979). Formant frequency patterns in Russian VCV utterances. Journal
of the Acoustical Society of America, 66, 1691-1702.
Recasens, D. (1984). Vowel-to-vowel coarticulation in Catalan VCV sequences.
Journal of the Acoustical Society of America, 76, 1624-1635.
Recasens, D. (1989). Long range coarticulation effects for tongue dorsum contact in
VCVCV sequences. Speech Communication, 8, 293-307.
Recasens, D. (2002). An EMA study of VCV coarticulatory direction. Journal of the
Acoustical Society of America, 111, 2828-2841.
Recasens, D., Pallarès, M. D., & Fontdevila, J. (1997). A model of lingual
coarticulation based on articulatory constraints. Journal of the Acoustical Society of
America, 102, 544-561.
Rinne, T., Alho, K., Ilmoniemi, R. J., Virtanen, J., & Näätänen, R. (2000). Separate
time behaviors of the temporal and frontal mismatch negativity sources. NeuroImage,
12, 14-9.
Rizzolatti, G., Fadiga L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the
recognition of motor actions. Cognitive Brain Research, 3, 131-41.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in
Neurosciences, 21, 188-94.
Sagey, E. (1986). The Representation of Features and Relations in Non-Linear
Phonology. Doctoral dissertation, MIT, Cambridge, MA.
Sallinen, M., Kaartinen, J. & Lyytinen, H. (1994). Is the appearance of mismatch
negativity during stage 2 sleep related to the elicitation of K-complex?
Electroencephalogr Clin Neurophysiol, 91, 140-8.
Sandler, W. (1986). The spreading hand autosegment of American Sign Language. Sign
Language Studies, 50, 1-28.
Sandler, W. (1987). Sequentiality and simultaneity in American Sign Language.
Doctoral dissertation, University of Texas.
Sandler, W. (1989). Phonological representation of the sign: Linearity and nonlinearity
in American Sign Language. Dordrecht: Foris.
Sandler, W. (1995). Markedness in the handshapes of sign language: A componential
analysis. In J. van de Weijer & H. van der Hulst (Eds.), Leiden in Last: Holland
Institute of Linguistics Phonology Papers, 369-399. The Hague: Holland Academic
Graphics.
Sandler, W. (1996). Representing handshapes. International Journal of Sign
Linguistics, 1, 115-158.
Sandler, W., & Lillo-Martin, D. (2006). Sign Language and Linguistic Universals.
Cambridge, UK: Cambridge University Press.
Scarborough, R. A. (2003). Lexical confusability and degree of coarticulation.
Proceedings of the 29th Meeting of the Berkeley Linguistics Society, February 14-17,
2003.
Shaffer, L. H. (1982). Rhythm and timing in skill. Psychological Review, 89, 109-122.
Siedlecki, T. J., & Bonvillian, J. D. (1998). Homonymy in the lexicons of young
children acquiring American Sign Language. Journal of Psycholinguistic Research, 27,
47–68.
Stokoe, W. (1960). Sign language structure: An outline of the visual communication
systems of the American Deaf. Studies in Linguistics, Occasional Papers, 8. Silver
Spring, MD: Linstok Press.
Tales, A., Newton, P., Troscianko, T., & Butler, S. (1999). Mismatch negativity in the
visual modality. Neuroreport, 10, 3363-3367.
Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., van
Zuijen, T. L., et al. (2000). Lateralized automatic auditory processing of phonetic
versus musical information: a PET study. Human Brain Map, 10, 74-9.
Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty
detection in humans is governed by pre-attentive sensory memory. Nature, 372, 90-92.
van der Hulst, H. (1995). The composition of handshapes. In Working papers in
linguistics 23. Department of Linguistics, University of Trondheim, Dragvoll.
van der Kooij, E. (2002). Phonological Categories in Sign Language of the
Netherlands. The Role of Phonetic Implementation and Iconicity. Doctoral dissertation,
Universiteit Leiden, Leiden.
van Oostendorp, M. (2003). Schwa in Phonological Theory. In L. Cheng & R. Sybesma
(Eds.), The Second Glot International State-of-the-Article Book. The Latest in
Linguistics. (Studies in Generative Grammar 61), 431-461. Berlin: Mouton de Gruyter.
Vogler, C. & Metaxas, D. (1997). Adapting hidden Markov models for ASL
recognition by using three-dimensional computer vision methods. Proc. IEEE Int. Conf.
Systems, Man and Cybernetics, Orlando, FL, 1997, 156–161.
West, P. (1999). The extent of coarticulation of English liquids: An acoustic and
articulatory study. Proceedings of the International Conference of Phonetic Sciences,
1901-4. San Francisco.
Vita
Michael Grosvald was born and raised in northern California. He majored in
Mathematics and Linguistics and minored in Psychology as an undergraduate at the
University of California at Davis, then entered a doctoral program in Mathematics at
UC Berkeley, earning a master's degree and advancing to candidacy before leaving the
program to work in the financial district in San Francisco. Two years later, realizing
that he missed studying languages, he decided to fulfill a dream of living and traveling
abroad in order to see the world and become multilingual. For the following seven
years, he lived and worked in Prague, Berlin, and Taipei, and traveled extensively,
eventually visiting over 100 countries. He then came back full circle to UC Davis to
work on his doctorate in Linguistics. His research interests include phonetics and
phonology, psycholinguistics, second language acquisition, and computational
linguistics.
Permanent address:
1023 Sir William Ct
Chico, CA 95926
USA
This dissertation was typed by the author.