Long-Distance Coarticulation:
A Production and Perception Study of English and American Sign Language

By

MICHAEL ANDREW GROSVALD
B.A.S. (University of California, Davis) 1989
M.A. (University of California, Berkeley) 1993
M.A. (University of California, Davis) 2006

DISSERTATION

Submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

Linguistics

in the

OFFICE OF GRADUATE STUDIES

of the

UNIVERSITY OF CALIFORNIA

DAVIS

Approved:

_____________________________________

_____________________________________

_____________________________________

Committee in Charge

2009

Copyright by
Michael Andrew Grosvald
2009

TABLE OF CONTENTS

List of Tables ..... vi
List of Figures ..... vii
Dedication ..... ix
Acknowledgements ..... x
Abstract ..... xiii
Chapter 1 — Introduction ..... 1
1.1. Motivation for study ..... 1
1.2. Assimilation and coarticulation ..... 3
1.2.1. Assimilation in spoken language ..... 3
1.2.2. Distinguishing coarticulation and assimilation ..... 5
1.2.3. Assimilation and coarticulation in signed language ..... 6
1.2.3.1. Some models of sign language phonology ..... 7
1.2.3.2. Approaching coarticulation in signed language ..... 13
1.3. Comparison of English schwa and ASL neutral space ..... 16
1.3.1. Status of schwa ..... 17
1.3.2. Neutral signing space ..... 20
1.3.3. Schwa and neutral space: Comparison and contrast ..... 23
1.3.4. Schwa and neutral space: Summary ..... 26
1.4. Dissertation outline ..... 27
1.5. Research questions and summary of results ..... 28
PART I: Spoken-language production and perception ..... 32
Chapter 2 — Coarticulation production in English ..... 32
2.1. Introduction ..... 32
2.1.1. Segmental and contextual factors ..... 35
2.1.2. Language-specific factors ..... 39
2.1.3. Speaker-specific factors ..... 40
2.2. First production experiment: [i] and [a] contexts ..... 41
2.2.1. Methodology ..... 42
2.2.1.1. Speakers ..... 42
2.2.1.2. Speech samples ..... 42
2.2.1.3. Recording and digitizing ..... 47
2.2.1.4. Editing, measurements and analysis ..... 48
2.2.2. Results and discussion ..... 51
2.2.2.1. Group results ..... 51
2.2.2.2. Individual results ..... 53
2.2.2.3. Follow-up tests ..... 57
2.2.2.4. Longer-distance effects ..... 63
2.3. Second production experiment: [i], [a], [u] and [æ] contexts ..... 64
2.3.1. Methodology ..... 65
2.3.1.1. Speakers ..... 65
2.3.1.2. Speech samples ..... 65
2.3.1.3. Recording and digitizing ..... 68
2.3.1.4. Measurements and data sample points ..... 68
2.3.2. Results and discussion ..... 68
2.3.2.1. Group results ..... 68
2.3.2.2. Individual results ..... 73
2.3.2.3. Follow-up tests ..... 84
2.3.2.4. Longer-distance effects ..... 87
2.3.2.5. Further comparison of the two experiments ..... 88
2.4. Chapter conclusion ..... 93
Chapter 3 — Coarticulation perception in English: Behavioral study ..... 94
3.1. Introduction ..... 94
3.2. Methodology ..... 97
3.2.1. Listeners ..... 97
3.2.2. Creation of stimuli for perception experiment ..... 98
3.2.3. Task ..... 100
3.3. Results and discussion ..... 103
3.3.1. Perception measure ..... 103
3.3.2. Group results ..... 105
3.3.3. Individual results ..... 105
3.3.4. Correlation between production and perception ..... 109
3.4. Chapter conclusion ..... 115
Chapter 4 — Coarticulation perception in English: ERP study ..... 118
4.1. Introduction ..... 118
4.2. Methodology ..... 122
4.2.1. Participants and Stimuli ..... 122
4.2.2. Electroencephalogram (EEG) recording ..... 124
4.3. Results and discussion ..... 126
4.3.1. Latency ..... 134
4.3.2. Relationship to behavioral results ..... 136
4.3.2.1. Can ERP results predict behavioral outcomes? ..... 140
4.3.2.2. Are ERP and behavioral responses correlated in general? ..... 141
4.4. Chapter conclusion ..... 141
PART II: Signed-language production and perception ..... 143
Chapter 5 — Coarticulation production in ASL ..... 143
5.1. Introduction ..... 143
5.2. Initial study ..... 148
5.2.1. Methodology ..... 149
5.2.1.1. Signer 1 ..... 149
5.2.1.2. Task ..... 150
5.2.1.3. Motion capture data recording ..... 152
5.2.2. Results ..... 154
5.3. Main study ..... 163
5.3.1. Methodology ..... 164
5.3.1.1. Subjects ..... 164
5.3.1.2. Task ..... 165
5.3.2. Results ..... 172
5.3.2.1. Group results ..... 174
5.3.2.2. Signer 2 ..... 182
5.3.2.3. Signer 3 ..... 192
5.3.2.4. Signer 4 ..... 200
5.3.2.5. Signer 5 ..... 208
5.3.3. Other aspects of intersigner variation ..... 221
5.4. Chapter conclusion ..... 226
Chapter 6 — Coarticulation perception in ASL: Behavioral study ..... 232
6.1. Introduction ..... 232
6.2. Methodology ..... 233
6.2.1. Participants ..... 233
6.2.2. Creation of stimuli for perception experiment ..... 233
6.2.3. Task ..... 237
6.3. Results and discussion ..... 241
6.3.1. Perception measure ..... 241
6.3.2. Group results ..... 241
6.3.3. Individual results ..... 244
6.3.4. Relationship between production and perception ..... 247
6.4. Chapter conclusion ..... 247
Chapter 7 — Discussion and conclusion ..... 249
7.1. Models of spoken-language coarticulation ..... 249
7.2. Incorporating signed-language coarticulation ..... 253
7.3. Cross-modality contrasts ..... 256
7.4. Dissertation conclusion ..... 263
Appendix A ..... 265
Appendix B ..... 269
Appendix C ..... 280
Appendix D ..... 288
Appendix E ..... 297
Appendix F ..... 309
Appendix G ..... 311
Bibliography ..... 321
Vita ..... 333

LIST OF TABLES

Table 2.1: Significance testing outcomes for first group of speakers ..... 55
Table 2.2: Follow-up testing results, Speakers 3 and 7 ..... 59
Table 2.3: Possible very-long-distance coarticulatory effects for Speakers 3 and 7 ..... 64
Table 2.4: Expected coarticulatory influence of four vowels on nearby schwa ..... 66
Table 2.5: ANOVA testing results, all 38 speakers ..... 72
Table 2.6: Significance testing outcomes, second group of speakers ..... 75
Table 2.7: Average formant values, Speaker 37 ..... 79
Table 2.8: Significance testing results for all context pairs, Speaker 37 ..... 80
Table 2.9: Further results, Speaker 37 ..... 81
Table 2.10: Summary of testing outcomes for all contexts, second speaker group ..... 83
Table 2.11: Very-long-distance significance testing results for four speakers ..... 88
Table 2.12: Summary of overall results for Speakers 3, 5 and 7 ..... 90
Table 2.13: Summary of outcomes for the two experiments ..... 92
Table 3.1: Type of data obtained from each subject group ..... 98
Table 3.2: Duration, amplitude and f0 values used in normalizing vowel stimuli ..... 100
Table 3.3: Perception scores, all subjects ..... 108
Table 4.1: Latency results for the entire subject group ..... 136
Table 4.2: Latency results by subgroup ..... 138
Table 5.1: Results for WANT in z-dimension (height) at distance 3, Signer 1 ..... 159
Table 5.2: Distance-1 results for WANT, Signer 1 ..... 162
Table 5.3: Demographic information of the five signers ..... 164
Table 5.4: Sentence frames and context signs, main sign production study ..... 166
Table 5.5: Numerical results for the main sign production study ..... 181
Table 5.6: Production results, Signer 2 ..... 190
Table 5.7: Production results, Signer 3 ..... 199
Table 5.8: Production results, Signer 4 ..... 206
Table 5.9: Production results, Signer 5 ..... 220
Table 5.10: Some measures of intersigner differences ..... 222
Table 5.11: Quantification of neutral-space “drift” for each signer ..... 223
Table 6.1: Duration and height-difference information for the sign stimuli ..... 237
Table 6.2: Results for each subject in the sign perception study ..... 245
Table A1: Formant values of distance-1 target vowels in [a] vs. [i] context ..... 266
Table A2: Formant values of distance-2 target vowels in [a] vs. [i] context ..... 267
Table A3: Formant values of distance-3 target vowels in [a] vs. [i] context ..... 268

LIST OF FIGURES

Figure 1.1: Autosegmental representation of nasal place assimilation ..... 4
Figure 1.2: Move-Hold representation of the ASL sign IDEA ..... 8
Figure 1.3: Representation of a sign in the Hand-Tier model ..... 10
Figure 1.4: Hand-configuration assimilation in the Hand-Tier model ..... 11
Figure 1.5: The vowel quadrangle and schwa ..... 18
Figure 1.6: Possible coarticulatory influences on schwa and neutral signing space ..... 23
Figure 2.1: Fowler’s (1983) model of VCV articulation ..... 36
Figure 2.2: Coarticulation model from Keating (1985) ..... 38
Figure 2.3: Expected coarticulatory influence on schwa of nearby [i] or [a] ..... 45
Figure 2.4: Editing points used for the sequence up at a ..... 48
Figure 2.5: Average F1 and F2 of target vowels, first group of speakers ..... 53
Figure 2.6: Coarticulatory effects produced by Speaker 7 ..... 57
Figure 2.7: A typical recording made from Speaker 7 ..... 60
Figure 2.8: Correlation results, production at distances 1 and 2, first speaker group ..... 62
Figure 2.9: Expected influence on schwa from [i], [a], [u] or [æ] ..... 67
Figure 2.10: Target vowel positions in F1-F2 space, second group of speakers ..... 70
Figure 2.11: Coarticulatory effects on target vowels, Speaker 37 ..... 77
Figure 2.12: Correlation results for production at distances 1 and 3, both groups ..... 85
Figure 3.1: Logistic curve ..... 96
Figure 3.2: Design of the perception task for the speech study ..... 102
Figure 3.3: Production-perception correlation results, first speaker group ..... 111
Figure 3.4: Production-perception correlation results, second speaker group ..... 112
Figure 3.5: Production-perception correlation results, both groups ..... 113
Figure 3.6: Production and perception—hypothetical threshold values ..... 115
Figure 4.1: Hypothetical patterning of perceptual sensitivity to coarticulation ..... 122
Figure 4.2: Sequencing of the ERP study stimuli ..... 123
Figure 4.3: Configuration of the electrodes in the 32-channel cap ..... 125
Figure 4.4: Topographical maps, entire subject group ..... 129
Figure 4.5: Waveforms at selected sites, entire subject group ..... 132
Figure 4.6: Topographic distribution of the MMN-like effects, entire subject group ..... 134
Figure 4.7: Topographic distribution of MMN-like effects, by subgroup ..... 139
Figure 5.1: Expected coarticulatory behavior of schwa and neutral signing space ..... 148
Figure 5.2: Locations of FATHER, MOTHER and neutral signing space ..... 150
Figure 5.3: The location of the target sign WANT in a typical utterance ..... 151
Figure 5.4: Position of the ultrasound markers ..... 152
Figure 5.5: Definition of x, y, and z dimensions relative to signer ..... 154
Figure 5.6: Two ASL sentences, as seen in motion capture data ..... 156
Figure 5.7: Locations of seven context signs ..... 161
Figure 5.8: Beginning and end points of WISH ..... 165
Figure 5.9: Newer variant of RUSSIA ..... 167
Figure 5.10: Apparatus used for <red> and <green> ..... 168
Figure 5.11: Context-sign locations on the body and in 3-space ..... 175
Figure 5.12: Motion capture data for two sentences, Signer 2 ..... 183
Figure 5.13: Context item locations for left-handed signers ..... 185
Figure 5.14: Motion capture data for two sentences, Signer 3 ..... 192
Figure 5.15: Signer 3 signing BOSS on upper chest ..... 194
Figure 5.16: Motion capture data for four sentences, Signer 4 ..... 200
Figure 5.17: Signer 4’s preferred form of GO ..... 201
Figure 5.18: Motion capture data for two sentences, Signer 5 ..... 208
Figure 5.19: Contraction of “I WANT,” Signer 5 ..... 210
Figure 5.20: Contraction of “I WISH,” Signer 5 ..... 211
Figure 5.21: PANTS and HAT with modified location, Signer 5 ..... 212
Figure 5.22: Orientation assimilation of WANT, Signer 5 ..... 215
Figure 5.23: Drift of target-sign location, Signers 3, 4, 5 ..... 226
Figure 6.1: Design of the perception task for the sign study ..... 241
Figure 7.1: Fowler’s (1983) model of VCV articulation ..... 250
Figure 7.2: Location assimilation in PANTS, and a contraction I_WANT ..... 258
Figure 7.3: Hand-Tier representation of neutral-space variant of PANTS ..... 261

Dedication

To my family.

Acknowledgements

First and foremost, I wish to thank my committee members: David Corina, C. Orhan Orgun and Tamara Swaab. I could not have arrived at this point without their guidance and support.

David came to UC Davis during the second year of my graduate studies. I feel very fortunate that he then invited me to work as a research assistant in his lab, which is one of the most positive work environments I have ever been in. In my time here, I have enjoyed learning about sign language, psychology and the designing and carrying out of language-related experiments. It has been a fascinating and rewarding experience, and in addition has opened the door to a number of interesting adventures, such as the SignTyp conference in Connecticut, the Zachary’s Pizza expedition during the CNS conference, and the moving of the hot tub.

When I was looking for a topic for my first qualifying paper, it was Orhan who suggested that I investigate vowel-to-vowel coarticulation across particular kinds of consonants. Questions that arose during my work on that project led me to explore coarticulation in other contexts, and that research in turn became the starting point of this dissertation. It would therefore be difficult to overstate how valuable his input has been. I have also enjoyed our weekly meetings over Chinese food, where he has answered my phonetics and phonology questions with patience and humor.

Tamara gave me my first in-depth look at ERP methodology, and it was she who suggested that I investigate the MMN component in my perceptual studies.
I also very much appreciate her willingness to meet with me when I have had questions, and have valued her feedback and support.

A number of other people have also provided encouragement along the way, particularly at this project’s early stages. These include Diana Archangeli, Carol Fowler, Keith Johnson, Patricia Keating, Harriet Magen and Daniel Recasens. I also wish to thank audiences at the UC Berkeley Phonetics/Phonology Phorum, the 2008 meeting of the Linguistic Society of America, the 2008 Northwest Linguistics Conference, and the 2007 coarticulation workshop of the Association Francophone de la Communication Parlée in Montpellier.

For my ASL studies, I have often needed guidance from others who know much more about sign language than I do. In particular, David Corina, Sarah Hafer, Tara Williams, Martha Tyrone and Claude Mauk have been extremely supportive. Thanks are also due to participants of the 2008 SignTyp conference and audiences at the 2009 meeting of the Linguistic Society of America and at Haskins Laboratories.

David Corina and Tamara Swaab were my primary go-to people when I had questions related to my ERP work. However, a number of other people also gave me help along the way. Foremost among these is Tony Shahin, who has been a good friend as well as a life-saver during my initial efforts to learn about Matlab and EEGLAB. Eva Gutierrez showed me the ropes in SPSS. Thank you also to Emily Kappenman and Steve Luck. Any errors in my use of ERP methodology are my responsibility and no one else’s.

Finally, I wish to thank members of my family, particularly those in Chico, Prague and Los Angeles, who have lived through much of this writing process with me and given me love, patience and support during this long and intensive effort. I would not be where I am without them as well.

This research project was supported in part by grant NIH-NIDCD 2RO1-DC003099 (David P. Corina).
Abstract

This project investigates the production and perception of long-distance coarticulation, defined here as the articulatory influence of one phonetic element (e.g. a consonant or vowel) on another across more than one intervening element. Part I explores anticipatory vowel-to-vowel (VV) coarticulation in English; Part II deals with anticipatory location-to-location (LL) effects in American Sign Language (ASL). Long-distance effects were observed in both speech and sign production, sometimes across several intervening elements. Even such long-distance effects were sometimes found to be perceptible.

For the spoken-language study, sentences were created in which multiple consecutive schwas (target vowels) were followed by various context vowels. Thirty-eight English speakers were recorded as they repeated each sentence six times, and statistical tests were performed to determine the extent to which target-vowel formant frequencies were influenced differently by the context vowels. For some speakers, significant effects of one vowel on another were found across as many as five intervening segments. The perception study used behavioral methods and found that even the longest-distance effects were perceptible to some listeners; nearer-distance effects were detected by all participants. Subjects’ coarticulatory production tendency was not correlated with either speaking rate or perceptual sensitivity.

Seventeen perception-study subjects also provided EEG data for an event-related potential (ERP) study, which used the same vowel stimuli as the behavioral perception study and sought to determine whether ERP methodology might provide a more sensitive measure than behavioral methods. Significant ERP effects were found in response to nearer-distance VV coarticulatory effects, but generally not for the longest-distance ones. This is the first ERP study to investigate the sub-phonemic processing associated with the perception of coarticulation.
In Part II, motion-capture technology was used to investigate LL coarticulation in the signing of five ASL users. Evidence was found of significant LL coarticulatory influence of one sign on another across as many as three intervening signs. However, LL effects were weaker and less frequent than the VV effects found in the spoken-language study. The perceptibility of these LL effects was then tested on both deaf and hearing subjects; some subjects in each group scored significantly better than chance on the task.

CHAPTER 1 — INTRODUCTION

1.1. Motivation for study

The phenomenon of coarticulation is relevant for issues as varied as lexical processing and language change, for both spoken and signed languages. However, research to date has not determined with certainty how far such effects can extend, though it is apparent that there is a substantial amount of coarticulatory variability among speakers in the production of spoken language. The first part of this project investigates the extent of long-distance vowel-to-vowel (VV) coarticulation in American English, and interspeaker variation in the production of these effects. While work on coarticulation in sign language has been conducted by some researchers (Cheek, 2001; Mauk, 2003), the question of long-distance coarticulation in sign languages appears to have been unaddressed until now. The second part of the project investigates long-distance location-to-location (LL) coarticulation in American Sign Language (ASL). In addition, while research on the perceptibility of coarticulatory effects has been underway for at least a few decades in spoken-language research (e.g., Lehiste & Shockey, 1972), corresponding work on sign language seems not yet to have begun in earnest. This project examines the perceptibility of long-distance coarticulatory effects in both spoken and signed language.
Manual-visual languages like American Sign Language (ASL) are naturally occurring and show syntactic, morphological and phonological complexity comparable to that of spoken languages. Sign languages are not mime, nor are they word-for-word translations of any spoken language. Besides the fact that signed language is an interesting object of study in its own right, research into sign languages also brings with it the insights to be gained by comparing corresponding phenomena in the spoken and signed modalities. By examining the similarities and differences of various aspects of spoken and signed language, we may find that some assumptions about human language universals have to be revised, while in other cases, new insights relevant to language in general may be revealed.

Although the structures of spoken and signed languages may seem quite different at first glance, the history of sign language research supports this general approach. This is seen, for example, in the formal study of sign language phonology, which began with the work of Stokoe (1960). Not only did he recognize the appropriateness of the appellation sign language, but he was also the first to develop the insight that traditional methods of linguistic analysis (i.e., those developed for spoken languages) could offer great utility in the study of sign, and hence in the understanding of the human language capacity in general. This reasoning has been followed by many researchers since Stokoe began his work, and it is the approach that I will follow here.

With the basic motives of this project now established, the next section discusses assimilation and coarticulation, which are closely related but can nevertheless be usefully distinguished.
Since these notions have been frequently examined in previous spoken-language phonological research, this discussion will concentrate more on their conception in sign language phonology and how one might expect to usefully investigate them there. After this, the specific targets of study in this project, English schwa and ASL neutral signing space, will be introduced and the reasons for their use in this project will be explained. This introductory chapter then concludes with a statement of this project’s research questions and an outline of subsequent chapters of the dissertation.

1.2. Assimilation and coarticulation

1.2.1. Assimilation in spoken language

Following the line of reasoning described above, it will now be useful to sketch some relevant ideas from spoken language phonology, starting from first principles, and then consider possible sign-language analogues. In spoken language phonology, the regularity of certain segment-to-segment influences can be expressed by means of rules like those used in SPE (Chomsky & Halle, 1968). Most directly, one can express the relevant changes in terms of the segments themselves. For example, if one notices in some language that alveolar nasal stops are persistently realized as velars before [k], one might express this process as follows:

n → ŋ / _ k

While this rule covers cases involving [k], one might subsequently find that it misses others, such as those involving nasals preceding [g], or those involving place assimilation before labials, even though these cases might seem to be effected by the same process. Rephrasing the rule in terms of nasality and place-of-articulation features makes the rule more general and also offers more explanatory power concerning the underlying dynamics involved. Following the autosegmental approach (Goldsmith, 1976), one may even go one step further, as shown in Figure 1.1 below:

Figure 1.1. Nasal place-of-articulation assimilation, represented in autosegmental terms.
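The payoff of the feature-based restatement can be made concrete with a toy sketch: a single rule over nasality and place features covers the [k], [g], and labial cases at once. The segment inventory and feature assignments below are simplified assumptions for illustration only, not a claim about any particular formalism.

```python
# Toy nasal place assimilation, stated over features rather than
# over individual segments. Inventory and features are simplified.

PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar", "ŋ": "velar",
}
NASAL_BY_PLACE = {"labial": "m", "alveolar": "n", "velar": "ŋ"}
NASALS = {"m", "n", "ŋ"}

def assimilate(segments):
    """One rule: a nasal takes the place feature of a following consonant."""
    out = list(segments)
    for i in range(len(out) - 1):
        if out[i] in NASALS and out[i + 1] in PLACE:
            out[i] = NASAL_BY_PLACE[PLACE[out[i + 1]]]
    return out

assimilate(["a", "n", "k"])  # alveolar [n] surfaces as velar [ŋ] before [k]
assimilate(["a", "n", "p"])  # the same rule yields labial [m] before [p]
```

The segment-based rule n → ŋ / _ k would need sister rules for each place of articulation; the feature-based version needs none.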
Here, different parameter types, such as nasality and place, are separated into different tiers, emphasizing their (relative) independence from one another. Such configurations can be advantageous when certain phonetic/phonological properties seem to be fundamentally related to (or “dependent” on) others, and the autosegmental approach is also useful in cases where non-linear processes are involved, such as in the templatic morphology of Arabic, in which different types of information may be carried by consonant sequences and vowel sequences, which are eventually interleaved. The autosegmental approach has also proven quite fruitful in the study of signed languages, where arguably more use is made of simultaneity of structure than in spoken languages.

1.2.2. Distinguishing coarticulation and assimilation

Historically, the terms coarticulation and assimilation have often co-occurred in the phonetics and phonology literature of both spoken and signed languages, seemingly interchangeably in some cases (see Kühnert & Nolan, 1999, for a historical summary). However, it has also proven useful to distinguish the two; for example, Keating (1985, p. 2) distinguishes assimilation from coarticulation as follows: “with assimilation, a segment which normally might have some particular target or quantitative value for a given feature, has a different target or quantitative value when adjacent to some other segment.” Accordingly, I will consider assimilation to be the special case of coarticulation which is definable in terms of phonological features. That is, if the influence of item X on item Y is such that item Y undergoes a change expressible in terms of a feature alteration, then assimilation has occurred.1 Coarticulation is the more general case of articulatory influence of one phonetic element (e.g. a consonant, a vowel, a gesture) on another.
As an example, consider the American English pronunciation of the word “haunt.” The velum lowering needed to articulate the [n] may occur early in the word, resulting in a form with a nasalized vowel, [hãnt]. If velum lowering was in effect throughout the duration of the vowel, then it may be said that the vowel has acquired a [+nasal] or [+nasalized] feature, as is suggested by the transcription just given, in which case this is an instance of assimilation. On the other hand, a speaker saying “haunt” might accomplish complete lowering of the velum in time for the onset of the [n] without having had the velum lowered until late into the articulation of the preceding vowel. In this case, it might be argued that the vowel did not actually acquire a positive feature value for nasality, even though some velum lowering did occur during the vowel. This would be an instance of coarticulation, but not assimilation. Clearly, such distinctions may not always be easy to make; for instance, in the example just presented, the exact dividing line between “nasal enough” and “not nasal enough” for a feature change to be said to have occurred is not obvious.

1 Keating’s description of assimilation in terms of “segments” is somewhat problematic when applied to sign language, however, which is why I have used the more general term “item.” Also, I prefer a rather loose interpretation of the term “adjacent” that Keating uses, so that assimilation can be considered possible even for items which are not strictly consecutive (as occurs in vowel harmony, for example). Assimilation can be either obligatory (e.g. nasal place assimilation in Japanese) or optional (e.g. nasal place assimilation in English in cases like “inconceivable” or “ten bags”).
The matter is complicated by the question of what sorts of features should be considered relevant; for example, vowel nasality is not considered phonologically contrastive in English, but may often be indispensable for hearers in cases where the nasal consonant in a VN sequence is deleted altogether. The nasal vowel could then be an important clue for the listener trying to distinguish “haunt” [hãt] and “hot” [hat] (although with this pair, context would be likely to help).

1.2.3. Assimilation and coarticulation in signed language

Having distinguished assimilation and coarticulation in spoken language, we should now be in a good position to consider the corresponding distinction in signed language. As it turns out, much research has already been conducted on assimilation in sign languages such as ASL (see Sandler & Lillo-Martin, 2006). Rules representing various types of sign-language assimilation are often similar in spirit to the autosegmental rules for spoken languages that were discussed above, but depending on the representational framework for signs that one adopts, the precise form of such rules will vary somewhat.

1.2.3.1. Some models of sign language phonology

Stokoe (1960) treated signs as combinations of three parameters—Handshape, Location, and Movement—occurring simultaneously. More recent approaches incorporate sequential structure into sign representations as well. In discussing the history of sign language phonology, Sandler and Lillo-Martin (2006) draw a parallel with the study of spoken language phonology, in which the earliest approaches used sequential, segment-by-segment representations, while later researchers have found it useful to model certain aspects of spoken language phonology in non-sequential terms, as in the autosegmental approach.
Move-Hold model

The Move-Hold model of Liddell and Johnson (1989[1985]) posits that signs have a canonical phonological form which is expressible as a sequence of “Movements” (“M”) and “Holds” (“H”), which are segmental items distinguished according to whether or not the hands move. The Hold segment includes features related to what Stokoe would have termed Location and Handshape. This model is illustrated below in Figure 1.2 with an example given by Sandler and Lillo-Martin (2006, p. 129), which shows the Move-Hold representation of the ASL sign IDEA.2

Figure 1.2. Move-Hold representation of the ASL sign IDEA.

In many respects, this model offers greater explanatory power than Stokoe’s original conception of signs, which emphasized the simultaneity that is characteristic of the phonological structure of signed language, relative to that of spoken language. Since the Move-Hold model builds sequential structure into its representation of signs, some phenomena such as within-sign metathesis, which require reference to sequentiality and hence are problematic for Stokoe’s approach, are analyzed quite straightforwardly. Consistent with the autosegmental approach for spoken language, assimilation between successive signs can be represented by reassociation of the relevant features. The model incorporates 13 handshape features and 18 features for place, and can be used to describe signs with a substantial degree of phonetic detail. However, the Move-Hold model has also been criticized, in part because this richness of phonetic description leads to overgeneration while at the same time missing important generalizations.

2 As is customary in the sign language literature, glosses of ASL signs will be given in capital letters.
As an example of the latter, in many signs the general configuration of the hand(s) is consistent throughout the course of the sign, but many features of both Hold components must be specified redundantly in both the first and final columns of the Move-Hold representation of the sign, as can be seen for IDEA in Figure 1.2.

Hand-Tier model

Sandler’s Hand-Tier model (1986, 1987, 1989), illustrated in Figure 1.3 below, avoids many such problems by positing a hand-configuration (HC) node with multiple associations to three nodes sequentially arranged in the order Location-Movement-Location. Based on the idea that (global) location tends to remain relatively stable during most signs, even when path movement does occur, the two Location nodes are associated to a node called “place,” an indicator of general global position. This model is therefore similar to the Move-Hold model in incorporating sequentiality within its representation of signs, but at the same time it seeks to avoid the redundancies just discussed in connection with that model, which are undesirable from a phonological perspective.3

Figure 1.3. Representation of a sign in the Hand-Tier model.

Many cases of assimilation can be efficiently represented within this framework. An example taken from Sandler and Lillo-Martin (2006, p. 137), shown below as Figure 1.4, is the hand-configuration assimilation in the ASL compound BELIEVE, formed from the base signs THINK and MARRY. Notice that in order to preserve the canonical L-M-L sequencing in the output form, Location deletion also takes place.

3 It should be noted that later modifications by Liddell & Johnson (1989) and Liddell (1990) to the Move-Hold model make use of underspecification, with a similar goal in mind.
It should be noted that the sign illustrated in Figure 1.4, BELIEVE, is articulated by many signers with a “1” handshape transitioning into a “C” handshape during the movement of the dominant hand from the forehead down to the non-dominant hand. This differs from the situation depicted in Figure 1.4, which specifies a “C” handshape throughout the duration of the sign. The model is in fact able to deal with situations in which two handshape configurations or two places are maintained in a compound; the form of BELIEVE illustrated in Figure 1.4 happens to be an example of the latter and not the former.

Figure 1.4. Hand-configuration assimilation, represented in the Hand-Tier model.

The HC category itself includes a rich featural hierarchy, analogous to spoken-language feature geometry (Clements, 1985; Sagey, 1986) and incorporating the work of Sandler (1995, 1996), van der Hulst (1995), Crasborn and van der Kooij (1997) and van der Kooij (2002). This is closely related to work by Corina (1990) seeking to provide an adequate account of handshape assimilation; in both approaches, descriptions of partial handshape assimilation are possible in addition to cases of total assimilation like that depicted in Figure 1.4. Like spoken-language feature geometry, this is accomplished in part by recognizing that certain features tend to behave similarly in phonological systems, typically for physiological reasons.

Prosodic model

The Prosodic model of Brentari (1998) differs significantly from the models just described, particularly in its treatment of movement. In this model, sign features are divided into two types, Inherent and Prosodic, with the latter coding properties not visible at any particular instant (i.e., those specifically relevant to motion).
Because movement does seem to be a particularly salient component of signs in terms of perception (Corina & Hildebrandt, 2002), it may be reasonable to treat it as “special” in some way, as this model does, and Brentari (1998) provides additional justifications for doing so. However, some phenomena that are arguably best characterized by a single rule can only be coded in the Prosodic model by separate reference to two branches of structure. As one example of this, Sandler and Lillo-Martin (2006) discuss the ASL compound FAINT, formed from the base signs MIND and DROP. The relevant issue here is that when hand configuration assimilates in ASL, not only are handshape-related features like finger position and orientation assimilated, but so too is any internal movement. This process can be expressed with a single rule in the Hand-Tier model, but requires separate reference in the Prosodic model to the Inherent and Prosodic feature types. One key point that has emerged in the preceding discussion is the ongoing effort of sign researchers to describe the formational parameters of signs in a way that is comprehensive but at the same time constrained enough to provide descriptions which are adequate at the phonological level. Conversely, for the kind of phonetic detail that will be examined in this project, even the substantial amount of information that can be conveyed in the Move-Hold or other sign models is not sufficient. This can be compared to the situation in spoken language: in the studies that will be presented here, sub-phonemic vowel contrasts will generally be discussed in terms of formant frequencies, measured in Hertz. This is so because even the relatively fine phonetic distinctions expressible in transcription systems like the IPA, making use of diacritics indicating varying degrees of fronting, lowering and so on, are not sufficient for the task.
Similarly, when sign-language coarticulation data are given, location will generally be expressed in terms of three-dimensional spatial position, measured in millimeters, because the degree of detail that is involved is not expressible in existing sign models or transcription systems. However, there are some interesting cases of assimilation in the sign results that will be discussed, and these will be expressed in phonological terms. In such cases, the Hand-Tier model will be adopted, because the key variable involved will be location, so movement need not be foregrounded as in the Prosodic model, and the relevant issues can be expressed more succinctly in the Hand-Tier model than in the Move-Hold or other sign models.

1.2.3.2. Approaching coarticulation in signed language

Leaving aside the specific case of assimilation proper, the general study of coarticulation in sign language is bound to present certain challenges, regardless of the framework one adopts. Much of the research completed to date in which sign-language coarticulation has proven relevant (or problematic) has been conducted in the context of machine learning (e.g., see Vogler & Metaxas, 1997), though some theoretical work has been successfully conducted as well (e.g. Cheek, 2001; Mauk, 2003; see Chapter 5 for a discussion of both). Here again, looking to work already done in spoken language research may prove useful (for an overview, see Hardcastle and Hewlett, 1999). Such work to date has investigated many aspects and types of spoken-language production phenomena, including VV, C-to-C, lingual, labial, and velar effects, as well as work on perception and on long-distance coarticulation. By analogy, what sorts of coarticulation might prove amenable to study in sign language research? Probably, the sign parameter for which subtle phonetic-level phenomena such as coarticulatory effects can be studied most directly is location. This is so since the position of any point on the body in motion (e.g.
a fingertip or a point on the palm) can be measured at any particular timepoint and described in terms of three numerical values, i.e. those corresponding to the three spatial dimensions (using motion-capture technology or multiple video cameras), or perhaps just two (if only a single video image is used). If one accepts the premise that movement is the most perceptually salient feature of signs (e.g., see Hildebrandt & Corina, 2002), and hence may play a role in sign akin to that of vowels in spoken language, then sign location (as well as handshape) might be seen as performing a more “consonantal” function. If so, LL effects could be considered an analogue of one kind of C-to-C coarticulation. The detailed study of handshape-to-handshape (HH) coarticulation presents more difficulties than that of LL effects, partly because of the many interacting components—such as the numerous joints on the hand—that are involved in the articulations of particular handshapes. Fine-level phonetic measurements and descriptions of handshape effects in general will require measures for relative locations of multiple parts of the hand (e.g. various points on the different fingers), which differ as joints are bent at various angles, and would also have to be robust to changes in orientation. Still, interesting experimental work can be accomplished in this domain, particularly if one focuses on a limited subset of all the articulatory possibilities. One example of this is Cheek’s (2001) work investigating HH coarticulation in the contexts of the “1” and “5” handshapes. This was accomplished by measuring and comparing, between the two contexts, pinky-to-base-of-hand distances at key timing points during the articulation of various signs; this distance was expected to be smaller in the context of the “1” handshape (since the pinky is curled in when that handshape is formed) than in the “5” context (since the pinky is extended for that handshape).
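A distance measure of the kind Cheek used is straightforward to operationalize once marker positions are available. The following is a minimal sketch; the marker names and coordinate values are invented for illustration and are not taken from any study’s data.

```python
import math

def marker_distance(p, q):
    """Euclidean distance between two 3-D marker positions (e.g. in mm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical motion-capture frame: (x, y, z) positions in millimeters.
pinky_tip = (112.0, 40.0, 250.0)
hand_base = (100.0, 0.0, 245.0)

# Expected to be smaller in a "1"-handshape context (pinky curled)
# than in a "5"-handshape context (pinky extended).
d = marker_distance(pinky_tip, hand_base)
```

Because the measure is a distance between two points on the same hand, it is invariant under translation and rotation of the whole hand, which sidesteps the orientation problem noted above for this particular comparison.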
Other coarticulation types may prove to be even more challenging as research targets. Although previous theoretical work (Perlmutter, 1991) and experimental research on the perceptual saliency properties of different sign parameters (Corina & Hildebrandt, 2002) indicate that movement-to-movement (MM) effects may be the closest analogue of VV effects in spoken language, MM coarticulation is likely to be much more challenging to study and describe. While VV effects can be analyzed fairly straightforwardly by means of formant frequency measurements at particular timepoints, movements are dynamic events tracing a three-dimensional path during an interval of time. Describing MM effects in general would seem to require an algorithm able to analyze the data record of such events in 3-space as being consistent with an arc, or a single linear motion, or a spiral, and so on for the complete inventory of motion types, regardless of other complicating issues such as spatial orientation. As an example of the difficulties involved, might an arc path movement in one sign influence a closed-to-open finger movement in the following sign? How would one seek to measure this? One possibility I would suggest is that the complexity associated with the characterization of movements might be usefully reduced, by looking for example at simpler but importantly related parameters such as velocity, path length, or the starting and ending points of the paths traced by individual movements. Simple parameter-to-parameter (X-to-X) effects will probably not tell the whole story here, just as they do not in spoken language coarticulation. For example, prosodic structure has been shown to be relevant in spoken-language work on coarticulation (e.g., Cho, 1999, 2004), and there is no reason to assume this could not be the case with sign.
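The simpler movement parameters suggested above can be computed directly from sampled marker positions. The sketch below assumes a fixed sampling rate and uses an invented four-sample track; it is an illustration of the reduction strategy, not an implementation from any of the studies cited here.

```python
import math

def path_length(samples):
    """Total length of the polyline through successive 3-D samples (mm)."""
    return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))

def mean_speed(samples, fs):
    """Average speed (mm/s), given the sampling frequency fs in Hz."""
    duration = (len(samples) - 1) / fs
    return path_length(samples) / duration

# Invented track: four motion-capture samples at 120 Hz.
track = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0),
         (10.0, 10.0, 0.0), (10.0, 10.0, 10.0)]

path_length(track)       # 30 mm traversed
mean_speed(track, 120.0) # 30 mm over 25 ms
```

Scalar summaries of this kind reduce a whole movement to a few comparable numbers, so that MM influence can at least be probed with the same statistical machinery used for the LL and VV measures elsewhere in this project.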
Brentari and Goldsmith (1993) have argued that a sign’s non-dominant hand may be analogous to a syllable coda in spoken language, so coarticulatory effects related specifically to the dominant or non-dominant hand might have implications for the prosodic nature of signs or of language overall. More generally, one might also expect to see interactions between sign parameters, such as location-to-movement effects, just as one sees C-to-V effects in spoken language. Then again, perhaps some such phenomena will prove to be specific to one modality only; if so, this would be informative as well. Because it is unlikely that all such possibilities can be investigated in a single project, I have chosen in the current study to focus on LL coarticulation, since as discussed above, location is probably the sign parameter for which coarticulatory effects can be most straightforwardly measured.

1.3. Comparison of English schwa and ASL neutral space

In this project, I investigate coarticulation in spoken and signed language, with an emphasis on the temporal extent of the phenomenon and variability in its production and perception among speakers and signers. Since coarticulation is a complex, multifaceted object of study, I have chosen to narrow my focus specifically onto VV coarticulation in English and LL coarticulation in American Sign Language (ASL). Specifically, I examine long-distance coarticulatory effects of various English vowels on the instantiation of schwa, and of various sign locations on that of ASL neutral space. Schwa and neutral space were chosen as target items because of certain parallels that may be drawn between the two. However, when considered more closely and with respect to these items’ phonological status, such similarities appear more superficial; therefore, an examination of the schwa–neutral space analogy will be useful in evaluating the extent to which the comparison may be valid.

1.3.1. Status of schwa

The schwa is a mid central vowel and as such is located in the middle of two-dimensional vowel space, as illustrated in Figure 1.5 below. This is so whether we consider schwa in terms of the articulatory properties of height and frontness or the acoustically determined quantities of first and second formant, since there is a strong correspondence between these articulatory and acoustic measures.4 Unstressed English vowels often reduce to schwa, resulting in the oppositions seen in pairs such as {photography [a], photograph [ə]} and {reflex [i], reflexive [ə]}. Although other outcomes of vowel reduction, such as the high central [ɨ], are possible, these will not be considered here, as schwa appears to be the most frequent such reduction outcome and is certainly the most-discussed in the relevant literature. This is so both for English and for other languages; for example, it is well-known that some Russian vowels undergo reduction to schwa when in unstressed position. In fact, schwa plays a “special” role in the phonological systems of most languages in which it is a member, typically being the product of processes like reduction or epenthesis (van Oostendorp, 2003).

Figure 1.5. The familiar vowel quadrangle, with mid central schwa also shown. The articulatory parameters of tongue height and backness correspond roughly to the inverses of the acoustic parameters first and second formant frequency, respectively.

4 At first glance, this articulatory-perceptual distinction seems quite different from the situation in signed language, since the sign articulators are viewed directly by the perceiver. However, since it is as yet unclear just how language is “perceived,” whether there might in fact be a parallel along these lines between signed and spoken language is an intriguing question.
In my work on spoken-language coarticulation, I have chosen schwa as the target vowel because of its susceptibility to coarticulatory influence from nearby vowels, a property that has emerged in previous studies, both acoustic and articulatory (e.g. Fowler, 1981, and Alfonso & Baer, 1982, respectively). The great variability in the production of English schwa has raised the question of whether this vowel may be completely underspecified except for a [+vocalic] or [-consonantal] feature when considered in phonological terms (e.g. see van Oostendorp, 2003), or may be “targetless” when considered in articulatory terms. Browman and Goldstein (1992) investigated the latter possibility through experimental study of one speaker and concluded that at least for that speaker, schwa was not quite targetless, but rather had a weak target which was “completely predictable ... [corresponding] to the mean tongue ... position for all the full vowels” (p. 56). Even if schwa is not completely underspecified or targetless, its coarticulatory tendencies are well-established, and hence this vowel has seemed a logical choice as target in the present study of long-distance coarticulation. Analysis of acoustic data obtained in an earlier spoken-language coarticulation study has found strong evidence that in environments containing multiple consecutive schwas, VV coarticulatory effects can reach at least as far as six segments’ distance (Grosvald & Corina, in press). The search for a possible analogue of such effects in sign language has led me to wonder whether neutral signing space might behave similarly to schwa in its articulatory behavior, but this also raises the question of how comparable English schwa and ASL neutral space are within their linguistic systems.

1.3.2. Neutral signing space

A look at the sign language phonology literature suggests that the term “neutral space,” typically defined as the general signing area in front of the signer not immediately adjacent to any particular body part, refers to something that may actually not be unitary in nature. This is apparent if we consider how two prominent phonological theories deal with neutral space. Recall that in Brentari’s Prosodic model (Brentari, 1998), each sign is represented by means of a feature tree in which motion is accorded special status, having its own branch of structure in which motion-related “prosodic features” are coded, while articulator and place-of-articulation (POA) information is located in separate branches under a broad “inherent features” node (p. 94). Under the POA node are features coding where on the body a sign is articulated, along with the “articulatory plane” associated with the sign. For signs articulated in neutral space, no body location is specified, except in the case where the non-dominant hand is used as the sign location, in which case “h2” is taken as the body location. Since the latter kind of two-handed signs are also typically articulated in the neutral-space area, this model suggests that not all signs physically articulated in that region are phonologically alike with respect to location. In her Hand-Tier model, Sandler also represents such “h2-P” (i.e. non-dominant hand as Place) signs differently from other neutral-space signs. In the Hand-Tier model, “Place” refers to the general region where a sign is articulated, and is represented as a node which is linked to more finely tuned “Location” nodes in the posited Location-Movement-Location sequence. For example, the neutral-space signs WANT and DON’T_WANT have the place feature [trunk] throughout their duration, but have location features for distance and height (“settings”) that change as needed to describe each sign.
In the case of WANT, the “distance setting” changes from [distal] to [proximal] while the “height setting” remains set at [mid] during the articulation of the sign, while during the articulation of DON’T_WANT, the distance setting remains set at [proximal] while the height setting changes from [mid] to [low] (Sandler & Lillo-Martin, 2006, pp. 229-30). For neutral-space signs articulated on the non-dominant hand, Place is specified instead as [h2] (p. 186). Neither of these theories posits that neutral-space signs’ representations are underspecified with respect to location.5 In fact, both the Hand-Tier and Prosodic models allow for phonologically coded positional distinctions within the neutral-space region. The signs just discussed, WANT and DON’T_WANT, are represented in the Hand-Tier model by varying the settings of phonological features coding—within neutral space—height of articulation and proximity to the body. Similarly, the Prosodic model allows for “setting changes” in neutral-space signs, such as the move from [contra] to [ipsi] in the articulation of CHILDREN through the transverse plane (i.e. toward the dominant hand’s side of the body from the other side of the body). Such setting changes (i.e. movements) are specified on the Prosodic Features branch of a sign’s representation, and allow for movements in both directions and along either dimension within the plane specified as a neutral-space sign’s place of articulation (Brentari, 1998, pp. 151-4). In addition to these sorts of subdivisions of neutral space recognized by the Hand-Tier and Prosodic models, both models also make a distinction between neutral-space signs like the ones just described and those articulated on the non-dominant hand, treated in both models as a separate place of articulation.6 I am also unaware of any evidence suggesting that other signing locations reduce to neutral space in particular contexts the way unstressed vowels so often reduce to schwa (more discussion of this point, including the possible relevance of sign whispering to this issue, follows in Section 1.3.3). Although one might expect, for instance, that signs articulated at higher locations tend to migrate to lower locations—therefore nearer to neutral space—when preceded and followed by lower-articulated signs during rapid signing, an explanation of this type of process need not invoke neutral space specifically, nor require an argument involving underspecification. For example, Mauk’s (2003) discussion of coarticulation in this context is phrased in terms of undershoot, and he does not claim any special status for neutral space.

5 In Brentari’s theory, the lack of association with any particular body area in the case of neutral-space signs not articulated on the non-dominant hand is not treated as a case of underspecification.

6 An interesting question about “h2-P” signs is how to explain that the non-dominant hand’s physical position is almost always itself in neutral space. Assuming that the non-dominant hand itself is not phonologically specified as having a neutral-space location (double-marking of monosyllabic signs’ location as both “neutral-space” and “h2” would violate Battison’s (1978) observation that a sign has only one major body area), this would seem to suggest that neutral space serves as a default position in these cases, which would be consistent with underspecification.

Figure 1.6. Expected direction of influence of various vowels on schwa and of various sign locations on neutral signing space (N.S.).

1.3.3. Schwa and neutral space: Comparison and contrast

In light of the preceding discussions of English schwa and ASL neutral signing space, a meaningful comparison of the two within their respective linguistic systems can be made. The two share a significant degree of articulatory freedom—schwa is articulated in the middle of vowel space and displays more articulatory variability than other vowels, and neutral-space signs are articulated in an area in front of the signer’s body where freedom of movement also appears to be relatively large. As an illustration of the latter point, consider that the nose and chin are just a few centimeters apart and serve as distinct sign locations, while the low and high boundaries of neutral signing space appear to be significantly further apart. Work by Manuel and Krakow (1984) and Manuel (1990) indicates that languages with more crowded vowel inventories tend to display less VV coarticulation, presumably because of a smaller margin of error, which suggests that a signing location like neutral space, with fewer close neighbors, might also tend to show more coarticulatory variability. Figure 1.6 above illustrates the expected coarticulatory influence of various linguistic elements (i.e. vowels or signing locations) on schwa and neutral space within their respective articulatory domains. The choice to use schwa and neutral space as coarticulation targets in the present study was strongly motivated by this comparison. Regardless of any articulatory similarities between schwa and neutral space, however, the functional differences between the two are considerable. Probably the most obvious example of such a difference is the fact that English schwa is a segment, while post-Stokoe (1960) analyses tend to treat sign parameters like location as subsegmental entities more akin to features or autosegmental units.
Perhaps the clearest evidence for this distinction is the fact that schwa can be uttered in isolation, while location does not have any such articulatory independence within ASL. Though perhaps less significant, it should also be noted that freedom of movement within neutral space is three-dimensional, while vowel space is most often depicted as two-dimensional, though this characterization of vowels is somewhat incomplete. It is also possible that schwa admits some freedom of variation in other ways, such as via labial action or with respect to the advanced tongue root parameter. In addition, while the vowel-consonant distinction in spoken-language linguistics is fairly plain, the existence of entities within sign linguistics analogous to spoken-language consonants and vowels is not universally accepted. Where suggestions of such a parallel might be made, whether in terms of phonological structure (e.g. Perlmutter, 1991) or perceptual salience (e.g. Corina & Hildebrandt, 2002), the movement parameter seems to be a much stronger candidate for a vowel analogue than location. This alone means that schwa and neutral space may belong to significantly different category types within the linguistic systems of English and ASL. Finally, despite their apparent articulatory similarities, schwa and neutral space do not appear to behave alike within their phonological systems, a fact reflected in the lack of similarity of their treatment within the most dominant phonological models. For example, to the best of my knowledge, ASL lacks the abundance of word pairs like those in English mentioned earlier (photography-photograph and reflex-reflexive) that would offer support for the existence of oppositions of neutral space with other sign locations like the oppositions of [a] and [i] with schwa in the (American) English word pairs just given.
This indicates that neutral space is not some kind of default location that other location values migrate to in particular circumstances, like the unstressed environments which tend to yield schwa. At first glance, evidence against this assertion might seem to come from a phenomenon like sign whispering, in which the signing space is substantially reduced and may be confined to a small region such as neutral space. However, Emmorey, McCullough and Brentari (2003, p. 41) report that this smaller articulation space in sign whispering is restricted to "a location to the side, below the chest or to a location not easily observed." In other words, neutral space is not a region to which the articulation space is exclusively limited in these circumstances, but is simply one of a number of options. This is unlike the situation of schwa described above.

1.3.4. Schwa and neutral space: Summary
Because of their positions in their respective articulatory spaces, schwa and neutral space share some evident similarities. Previous investigations, as well as analysis of data to be presented in the current studies, indicate that both are susceptible to long-distance coarticulatory effects, although admittedly this does not mean that both are unique in this way within the inventories of their phonological systems. Sign languages in particular have been little investigated with respect to the possibility of differing coarticulatory patterns among segments or other contextual variables. In any event, both schwa and neutral space may be considered useful targets of study because of their similar positioning in the middle of their articulatory spaces: they can both be expected to undergo influence from "above," "below," or from "the side," whether in the literal physical sense of where the articulators (whether tongue or hands) are located, or with respect to the somewhat more abstractly conceived formant space location in the case of spoken language.
Nevertheless, based on the considerations explored in this paper, the analogy between schwa and neutral space must be considered imperfect at best. Findings like those of Corina and Hildebrandt (2002), as well as phonological models such as Perlmutter's Mora model (Perlmutter, 1991), argue for movement (perhaps in combination with location) being a sign parameter more analogous to the vowel in spoken language; if so, the status of schwa and neutral space within speech and sign must be fundamentally dissimilar. Therefore, the present study's investigations of English and ASL coarticulation, while taking advantage of the "middle of articulation space" position that schwa and neutral space have in common, should not be considered an attempt to show that these two items are analogous in any deeper sense, such as with respect to their phonological status.

1.4. Dissertation outline
This dissertation continues in Chapter 2 with the presentation of a production study of VV coarticulation in English in which two experiments were carried out. The first of these involved a very limited set of vowel contrasts, while the second investigated a larger vowel set. Long-distance VV effects were seen over a greater distance than has been found in previous studies like Magen's (1997): several of the 20 speakers tested showed significant anticipatory VV effects across at least three vowels' distance. A great deal of variation among speakers was also seen, however, and some speakers showed no or only weak effects. Recordings made of some of the production-study speakers were used as stimuli for the closely linked perception study, which was carried out simultaneously with the production study and the results of which are presented in Chapters 3 and 4. The perception study involved both behavioral and event-related potential (ERP) methodologies; the behavioral results are described in Chapter 3, while the ERP study outcomes are presented in Chapter 4.
The results showed that all listeners were sensitive to nearer-distance effects, while at further distances much more variation between listeners was seen. Some listeners were sensitive even to effects which had occurred over three vowels' distance. Somewhat unexpectedly, strength of coarticulatory effects was not significantly correlated with either speaking rate or perceptual sensitivity to such effects. The approach followed in the spoken-language study—that of performing a production and perception study simultaneously—was also followed in this project's investigation of coarticulation in ASL. Chapter 5 describes an LL coarticulation production study of ASL, which examined five signers and which, like the spoken-language study, found evidence of long-distance coarticulatory effects in some signers as well as a great deal of variability among the participants. The task also included a "non-linguistic coarticulation" component. Chapter 6 discusses the integrated perception study, whose results indicate that at least some signers are sensitive even to relatively subtle Location-related coarticulatory effects. Chapter 7 is the concluding chapter of the dissertation, in which some implications of this project's findings are presented.

1.5. Research questions and summary of results
Following are the main questions that this research aimed to address, along with a brief summary of the outcomes that were found.

1) How far can anticipatory VV coarticulation extend in English?
The great majority of the 38 speakers investigated here showed significant coarticulatory VV effects over at least one intervening consonant. Several speakers persistently showed such effects across as many as five intervening segments (including two intervening vowels). Follow-up experiments indicated that for such speakers, even longer-distance effects might sometimes occur, but not as consistently.

2) How far can anticipatory LL coarticulation extend in ASL?
Five signers of ASL were investigated and some evidence of LL coarticulation was found, including apparent cases of LL effects across two or three intervening signs. However, the strength of these effects was notably weaker than that of the speech effects.

3) In both cases, how perceptible are these effects?
The nearer-distance speech effects were easily perceived by all 28 listeners who took part in the speech perception study. Even the longer-distance effects, which were much more subtle, were perceived by a few listeners. Some of the participants in the corresponding sign perception study also performed at significantly above-chance levels, but in contrast to the speech study, such outcomes were not particularly common, even for shorter-distance coarticulatory effects.

4) Can ERP methodology offer a more sensitive perception measure than behavioral methods, at least with respect to spoken-language coarticulatory effects?
Results here were mixed. Perception of closer-distance effects was consistently associated with the mismatch negativity (MMN) component, which has previously been shown to be sensitive to phonemic contrasts (see Chapter 4 for more on the MMN). To the best of my knowledge, this is the first finding of an MMN-like response to the sub-phonemic contrasts associated with VV coarticulation. However, for the longest-distance contrasts, no MMN-like results were found at all; indeed, an unexpected positivity was seen.

5) With respect to production, are the coarticulatory effects we see in English and ASL truly language-specific, or may such effects be seen in non-linguistic actions as well?
To address this question, a non-linguistic task was incorporated into the sign production experiment. The results were quite similar to those seen for signing—significant but weak effects were found.

6) What lessons about language in general can we learn from the modality-specific outcomes we find here?
With respect to coarticulatory behavior in the contexts examined here, signing and non-linguistic manual actions clustered together in showing significant but relatively weak coarticulation patterns, unlike speech, for which strong coarticulatory effects were the norm. This may be due to the larger articulators used in signing and other manual actions, and suggests that "coarticulation" as understood by linguists might be more usefully understood in the broader context of human actions in general. One implication of the present findings that may be specific to language relates to the temporal extent of language planning, which seems to differ between modalities when considered in terms of absolute time units like milliseconds—unsurprisingly on the one hand, since sign articulators are larger and slower than those used in speech. But on the other hand, this suggests that the limits of linguistic planning might be most appropriately understood in terms of units such as gestures. These and other implications of this work are explored in more detail throughout the dissertation.

PART I: SPOKEN-LANGUAGE PRODUCTION AND PERCEPTION

CHAPTER 2 — COARTICULATION PRODUCTION IN ENGLISH

2.1. Introduction
Following Öhman's (1966) groundbreaking work on transconsonantal vowel-to-vowel (VV) coarticulation in Swedish, English and Russian, researchers have sought to understand how different factors influence these effects, factors as varied as the specific consonants and vowels involved (Öhman, 1966; Butcher & Weiher, 1976; Recasens, 1984), prosodic context (Cho, 2004; Fletcher, 2004), and the vowel inventory of the language in question (Manuel & Krakow, 1984; Manuel, 1990).
Instances of long-distance coarticulation, involving effects crossing two or more intervening segments, have been found for phenomena such as lip protrusion (Benguerel & Cowan, 1974), velum movement (Moll & Daniloff, 1971), and liquid resonances (West, 1999; Heid & Hawkins, 2000), but the possible range of VV coarticulatory effects is not yet known. This study investigates the extent and the perceptibility of long-distance VV coarticulation, with a particular focus on variation among speakers and listeners. Despite early indications that VV coarticulation might be a relatively local phenomenon (e.g. Gay, 1977), subsequent work has shown that this is not always the case. For example, Magen (1997) analyzed [bVbəbVb] sequences produced by four English speakers and found evidence of coarticulatory effects between the first and final vowel, meaning that such effects can cross foot boundaries and multiple syllable boundaries. However, this was not so for all four speakers in the study, which illustrates the fact that VV coarticulatory effects vary not just from language to language or context to context, but also from speaker to speaker. This suggests that in order to determine how prevalent such effects are among speakers and how far they can extend, researchers may find it necessary to recruit greater numbers of speakers than has generally been done before, since this may be the only way to get around the statistical problem of large interspeaker variation. The present study's use of a relatively large number of participants (38 speakers) is an attempt to move in such a direction. An additional goal of this study was to look for coarticulatory effects in natural-language utterances.
Although the use of nonsense words often cannot be avoided when all permutations of even a limited set of consonants, vowels, and prosodic contexts are considered, it seems reasonable to suppose that this may result in study participants speaking less fluently, thus lessening the potential for long-distance effects to occur. Therefore, rather than analyze the output of a small number of speakers saying a large number of sentences or nonsense words (cf. the Gay, 1977, study mentioned above, which analyzed two speakers' production of all VCV combinations of V={i, a, u} and C={p, t, k}), the present study investigated a large number of speakers each saying a small number of natural-language sentences, with the corresponding trade-off that the set of contrasts considered was limited. Although it seems evident from earlier work that some speakers do not coarticulate much or at all over long distances (Gay, 1977; Magen, 1997), the fact that some speakers do must be accounted for in any viable model of speech production (see Farnetani & Recasens, 1999, for an overview of relevant approaches). The existence of long-distance coarticulatory effects has similar implications for theories of speech perception, since research has shown that such effects are sometimes perceptible to listeners. For example, Martin and Bunnell (1982) cross-spliced recordings of words in such a way as to create stimuli which varied in the consistency of their VV coarticulatory patterns, and then had listeners perform recognition tasks. Stimuli which were consistent with naturally occurring coarticulation patterns tended to be associated with fewer false alarms and faster reaction times (see also Scarborough, 2003). The present study provided the opportunity to explore the perception of VV effects from a different, though related, perspective.
Here, the ability of study participants to distinguish vowels which had been differently "colored" by coarticulatory effects at various distances was examined. Since longer-distance effects may be expected to be more subtle than nearer-distance effects (all else being equal), this provided an opportunity to explore variation among listeners in terms of their sensitivity to such effects. If long-distance coarticulation is sometimes perceptible, this would be particularly relevant to the study of lexical processing, especially in light of the fact that both the production and the perception of coarticulation appear to be heavily influenced by the coarticulatory patterns of users' native language (Beddor, Harnsberger & Lindemann, 2002). If listeners are able to "hear ahead" a few segments in the flow of spoken language, it could help them narrow down the possible range of upcoming words more effectively as they process the incoming speech stream. From this standpoint, all else being equal, anticipatory coarticulation would probably be more useful to listeners than carryover coarticulation. Therefore, the current study focused on anticipatory, not carryover, VV coarticulation.

2.1.1. Segmental and contextual factors
Öhman (1966) theorized that VCV sequences were essentially diphthongal in nature, with the consonant gesture superimposed in the middle. His analysis has been supported by some other researchers, such as Purcell (1979), and was extended by Fowler (1983), who proposed the model shown below in Figure 2.1. Fowler's suggestion is that in a VV or VCV sequence, the most acoustically salient or "prominent" features of the segments may be perceived by the listener as abrupt transitions from one sound to the next, while in fact the actions characteristic of the various segments are still performed, albeit simultaneously and with little acoustic salience.

Figure 2.1. Fowler's (1983) model of VCV articulation.
The dotted vertical lines in the second picture indicate points in time where acoustic prominence shifts from one sound to the next, and where listeners are therefore likely to perceive a segment boundary. The overlapping solid lines represent the physical articulations of each sound, which can occur simultaneously.

Öhman (1966) also found that some consonants seemed to block VV coarticulation; these include fricatives in English and palatalized consonants in Russian. This led to speculation that consonant sounds requiring greater articulatory effort involving the tongue body might tend to inhibit coarticulatory effects between surrounding vowels, since such effects would tend to be blocked by the active articulatory movement associated with the intervening consonant. One might expect, for example, that VCV sequences in which the consonant requires active tongue body movement would show weak VV coarticulatory effects relative to VCV sequences in which C is bilabial. In fact, much research supports this (e.g., Butcher & Weiher, 1976; Purcell, 1979; Recasens, 1984; Fowler & Brancazio, 2000). In an electropalatographic study, Butcher and Weiher (1976) found that tongue movements for [t] are smaller and faster than those for [k], with more VV coarticulation occurring across [t] than [k]. In acoustic and electropalatographic studies on Catalan, Recasens (1984, 2002) has found that the extent of VV coarticulation, measured both in terms of F2 and fronting, is inversely related to the amount of tongue dorsum contact, which is consistent with the predictions of a model based on "degree of articulatory constraint" (DAC) (Recasens, Pallarès & Fontdevila, 1997). In this model, for example, bilabials have a DAC value of 1, the lowest possible, while dorsals, including dark [l], have a DAC value of 3; dentals and alveolars like [n] and light [l] fall in the middle (DAC=2). Modarresi et al.
(2004) have also found that alveolars in VCV sequences are resistant to anticipatory coarticulatory influence from the following vowel, perhaps allowing for greater carryover effects from the first vowel on the second. Öhman (1966) originally suggested that certain consonants, such as those with secondary articulations (e.g. palatalized consonants), might operate on a "channel" (articulatory subsystem) typically associated with vowels. Keating (1985) suggests that this intuition can best be expressed in terms of an autosegmental model in which secondary consonantal properties like palatalization are essentially vowel-tier features, and that consonants with such features act to block VV coarticulation (see Figure 2.2 below).

Vowel feature tier:      [αF]   [βF]   [γF]
                           |      |      |
                           V      C      V
                                  |
Consonant feature tier:         [δG]

Figure 2.2. Coarticulation model from Keating (1985). The consonant has a secondary articulation which tends to block VV coarticulation, since the two vowels' features are no longer adjacent.

On the other hand, Gay (1977) contradicts Öhman (1966) and asserts that it is the CV sequence that forms the basic gestural unit of a VCV sequence, rather than the two vowels, and Shaffer (1982) also criticizes the continuous vowel-production model. Choi and Keating (1991) have found significant, though small, amounts of VV coarticulation across palatalized consonants in Russian and in other Slavic languages, implying that coarticulation is not blocked as effectively by such consonants as Öhman (1966) had suggested. In a study designed to test the validity of the different models suggested by Fowler (1983) and Keating (1985), Hussein (1990) examined coarticulation patterns in VCV sequences in Arabic, but found that the actual situation was too complicated for either model to handle adequately. Different vowels also appear to exhibit different coarticulatory properties. For example, Bell-Berti et al.
(1979) found that different vowels were associated with different degrees of velum height during adjacent consonants. Butcher and Weiher (1976) found that of the vowels [i], [u], and [a], [i] exerted by far the greatest coarticulatory influence on other vowels, while [a] exerted the least; for instance, the researchers found that the only significant coarticulation in VCV sequences when C was [k] occurred when [i] influenced [a]. Gay's (1974) cinefluorographic study also found carryover coarticulation occurring only on [a]. Some research, such as that of Cho (2004) and Fletcher (2004), has suggested that stressed vowels tend to show more coarticulatory dominance (i.e., they influence other vowels more, and are less influenced by other vowels) than unstressed vowels. Cho (1999, 2004) has also found that vowels are more resistant to coarticulatory effects across prosodic domain boundaries. In addition, such resistance seems to be stronger at the boundaries of higher-level prosodic domains, so that, for example, more VV coarticulation can be expected to occur across word boundaries than across Intonation Phrase boundaries.

2.1.2. Language-specific factors
The possibility that different languages might exhibit different coarticulatory patterns was raised early on, when Öhman (1966) found that Russian behaved differently with respect to VV coarticulation than English or Swedish. Some possible explanations for this, involving articulatory factors such as tongue body behavior or secondary articulation, were discussed above. A deeper question is whether it is not only articulatory processes, but also the language-specific phonemic contrasts themselves, which can influence the nature or extent of VV coarticulation.
For instance, it seems reasonable to suppose that the size of a language's vowel inventory might determine the extent of permissible VV coarticulation, since a smaller vowel inventory would tend to allow greater variation within each vowel's articulation space without crossover into the spaces of other vowels. To test this hypothesis, Manuel and Krakow (1984) and Manuel (1990) conducted studies comparing VV coarticulation in languages with five-vowel inventories (Ndebele and Shona in the first study, Swahili and Shona in the second) to that in languages with larger vowel systems (Sotho in the first study, with seven vowels, and English in the second). Both studies found that coarticulation was less extensive in the languages with more crowded vowel inventories. In addition, the first study found that [a] showed more movement than [e] in the languages that were examined, as would be expected, since [a] has fewer near neighbors than [e] in those languages and hence can vary more freely. The question of whether consonants of longer duration might affect VV coarticulation has also been addressed, but the results to date have been inconclusive. Since coarticulation can generally be expected to diminish across greater distances, one might expect that long consonants would tend to block or dampen coarticulation relative to short ones. Some research in English (e.g., Huffman, 1986; Magen, 1997) has found that this does not seem to be the case, but the issue appears not to have been investigated systematically in languages in which consonant length is contrastive.

2.1.3. Speaker-specific factors
In many cases, coarticulatory patterns appear to be idiosyncratic, which may be due to the wide range of options speakers appear to have when producing a target sound (Fowler & Turvey, 1980; Lindblom, Lubker & Gay, 1979).
In an electropalatographic study, Butcher and Weiher (1976) analyzed patterns of articulation and coarticulation in three speakers' VCV sequences, and found a great deal of interspeaker variation. For example, speakers varied in the amount of dorsum-palate contact they made in transitions during sequences like [ta] and [ti]. Some speakers apparently favor more gestural preprogramming than others, leading to greater anticipatory coarticulation. In another study highlighting interspeaker differences, Parush et al. (1983) found that talkers producing VCV sequences with velar stops and back vowels exhibited coarticulatory behavior patterns that were consistent between speakers for carryover coarticulation but different for anticipatory coarticulation. That speaking rate may be relevant has also been mentioned by some researchers, such as Hussein (1990), who suggested that fast talkers may tend to coarticulate more. Hertrich and Ackermann (1995) found that slower speech was associated with less carryover coarticulation, but no significant difference in anticipatory coarticulation, relative to more rapid speech.

2.2. First production experiment: [i] and [a] contexts
The first production experiment investigated the extent of coarticulatory effects of the vowels [i] and [a] on the vowel schwa [ə]. In the second production experiment, to be discussed in Section 2.3, the set of context vowels was expanded to include [æ] and [u] as well.

2.2.1. Methodology

2.2.1.1. Speakers
Twenty participants (11 female, 9 male) took part in the first production experiment. Seven subjects were known personally to the author and agreed freely to take part; the other thirteen were undergraduate students at the University of California at Davis who received course credit for participating. The subjects' ages ranged from 18 to 62 (mean age = 25.2; SD = 13.3). All participants were native speakers of American English with no history of speech or hearing problems.
All participants were uninformed as to the purpose of the study. The first seven participants took part only in the production experiment. A subset of their vowel recordings was then used as stimuli for subsequent subjects, who took part in both the production and perception experiments (see Chapter 3 for the perception study). While every effort was made to recruit only monolingual speakers, most subjects had been exposed to at least one other language as students in a university with a foreign language course requirement. Of these, four felt that they had acquired substantial knowledge of another language. However, significance testing results in the analysis of the group dataset did not change depending on whether or not those subjects were excluded; this was so for both the production and perception studies. Therefore all subjects' data were included.

2.2.1.2. Speech samples
First, it was necessary to create sentences containing plentiful opportunities for VV coarticulation to occur. The vowels [i] and [a] were chosen as context vowels because of their distance from each other in vowel space. Consecutive vowels likely to be produced as schwas, or at least substantially reduced, were used as targets; schwa was chosen because of its susceptibility to coarticulatory influence from neighboring vowels, a property that has emerged in previous studies, both acoustic and articulatory (e.g., Fowler, 1981, and Alfonso & Baer, 1982, respectively).
The sentences used were:

"It's fun to look up at a key."
"It's fun to look up at a car."

These items were the outcome of the following set of preferences: (1) sentences containing only real words, for the reasons discussed earlier; (2) sentences not differing prior to the context ([i] or [a]) vowel, which was to be the sentence-final vowel; (3) monosyllabic words containing the target (schwa) and context ([i] and [a]) vowels, so that coarticulatory effects would be "spontaneous" and not well-practiced within particular lexical items; (4) function words containing the target vowels and content words containing the context vowels, to encourage reduction of target vowels and full pronunciation of context vowels; (5) context-vowel content words having relatively high (and similar) frequency of use; (6) voiceless intervening consonants, so that vowel boundaries would be as clear as possible when making formant-frequency measurements and when creating stimuli for the perception study (see Chapter 3). To minimize interference from intervening consonants' demands on the tongue body (cf. Recasens, 1984), it might have been preferable if the exclusive use of bilabial intervening consonants had been feasible, as in Magen (1997), where nonsense words of the form [bVbəbVb] were used. However, in the multi-word context necessary for this investigation, and given the above constraints, issues like lexical gaps and morphological/syntactic constraints quickly asserted themselves. In addition, excessive alliteration would raise issues of articulatory ease and naturalness (e.g., "Peter Piper picked a peck of pickled peppers").
In the end, the full array of voiceless stops in English was used, with those making fewer demands on the tongue body located further away from the context vowels, to "assist" longer-distance effects at the possible expense of shorter-distance ones.7 One point worth noting regarding these sentences is that any coarticulatory effects of context vowels extending at least as far back as the target vowel in the word up would have to extend across consonants formed at all three places of articulation of English stops (labial, alveolar and velar), which would be of interest in light of the literature discussed earlier in Section 2.1.1, much of which suggests that VV effects tend to be weakened or entirely absent when the intervening consonant is coronal or dorsal. A randomized list containing six copies of each sentence was provided to each speaker. Before recording, speakers were told about list effects and the need to avoid them. In order to encourage consistent prosodic patterning among these utterances, speakers were asked to say the sentences as if they were being spoken in a normal conversation in response to the question, "What's it fun to look up at?", with the intended effect of obtaining utterances with primary stress on the final word, with some emphasis also expected on the word "fun." Speakers were given a chance to rehearse until any substantial deviations from this pattern, such as "It's fun to look 'up at a car" (i.e., as if meaning "not down at a car"), were corrected.

Footnote 7: Pilot testing had indicated that VCV effects on schwa were quite common, even across [k].

Figure 2.3. The diagram shows the expected influence on schwa of coarticulatory effects of nearby [i] or [a]. Near [i], F1 is lowered and F2 is raised, while the reverse holds near [a]. The magnitude of coarticulatory effects may be exaggerated in this figure.
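The predicted direction of coloring shown in Figure 2.3 can be restated as a simple lookup table. The snippet below is purely illustrative; the dictionary and its labels are mine, not part of the study's analysis:

```python
# Expected direction of coarticulatory shift in schwa's formants
# (per Figure 2.3): near [i], F1 is lowered and F2 raised; near [a],
# the reverse holds. This table is an illustrative restatement only.
EXPECTED_SHIFT = {
    "i": {"F1": "lowered", "F2": "raised"},
    "a": {"F1": "raised", "F2": "lowered"},
}

for context, shifts in EXPECTED_SHIFT.items():
    print(f"near [{context}]: F1 {shifts['F1']}, F2 {shifts['F2']}")
```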
The final vowel, either [i] or [a], served as the context vowel, while the preceding vowels in the words "up," "at," and "a" were the target vowels. In this paper, these will be referred to as "distance-3," "distance-2" and "distance-1" vowels, respectively. Because of the intervening consonants, these distance conditions correspond respectively to a total of 5, 3 and 1 intervening segments between the context and target vowels. Figure 2.3 above illustrates the expected coarticulatory effects of [i] and [a] on schwa in two-dimensional formant space.8 Since it is difficult to create natural-sounding sentences in English containing multiple consecutive unstressed syllables, the vowel [ʌ] was used as the distance-3 vowel because of its acoustic similarity to schwa. In careful speech, the vowels in "at" [æ] and "a" [eɪ] are not schwas either, of course, but it was expected that in the casual speech speakers were encouraged to produce here, these vowels would be realized as schwa or at least be substantially reduced. During the rehearsal preceding the recording process, speakers who did not seem to be speaking naturally—presumably because of the perceived formality of participating in a scientific experiment—were gently coached until their production became more relaxed. For example, the experimenter could show such subjects a key and ask what it was. When subjects invariably replied, "a [ə] key," the experimenter pointed out that during the rehearsal, they had been saying, "a [eɪ] key." This almost always made them aware of the issue and hence resolved it. Some subjects did occasionally exhibit a slightly [æ]-like quality in "at" or a slightly [eɪ]-like pronunciation of "a" even in casual speech. It should be noted that the research question being investigated (how far VV effects can extend) does not strictly require that schwas be the target vowels, although of course that was the general intention.
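The correspondence between distance conditions and intervening-segment counts described above follows from the alternating consonant-vowel structure of the carrier phrase: a distance-n target is separated from the context vowel by n intervening consonants and n-1 intervening vowels, for a total of 2n-1 segments. A minimal sketch (the function name is mine, used only for illustration):

```python
def intervening_segments(distance: int) -> int:
    """Segments between a distance-n target vowel and the context vowel:
    n intervening consonants plus (n - 1) intervening vowels = 2n - 1."""
    return 2 * distance - 1

# distance-1 ("a"), distance-2 ("at"), distance-3 ("up")
for d in (1, 2, 3):
    print(f"distance-{d}: {intervening_segments(d)} intervening segments")
```

This reproduces the counts given in the text: 1, 3 and 5 intervening segments for the distance-1, distance-2 and distance-3 conditions, respectively.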
Overall, the great majority of the “at” and “a” vowels were in fact produced with the expected schwa-like quality, and for convenience, the vowels in “up,” “at” and “a” will be referred to here as schwas. In order to obtain baseline formant values for the context vowels [a] and [i], each speaker was also recorded repeating each of the following sentences three times; in these sentences the context vowels in the final word are preceded by (and therefore coarticulated with) themselves:9

“It’s fun to see keys.”
“It’s fun to saw cars.”

8 It was expected that the rhotic occurring after the low vowel in “car” would not significantly color that context vowel, at least not to the point of creating a long-distance coarticulatory [i]-[r] contrast rather than an [i]-[a] contrast. While Heid and Hawkins (2000) and West (1999) have found evidence of long-distance resonance effects of the liquids /l/ and /r/, these were in contexts in which the liquids were in syllable-initial position. Nevertheless, a check was performed against this possibility, as will be explained in Section 2.2.2, Results and Discussion.

2.2.1.3. Recording and digitizing

Participants were seated comfortably at a table in a laboratory environment (a quiet room measuring approx. 15 feet by 18 feet). The recording equipment consisted of a Shure SM48 microphone attached to a Marantz professional CDR300 digital recorder; these digital recordings were made at 16-bit resolution with a 48 kHz sampling rate. The participant was given a randomized list containing the appropriate number of copies of the sentences discussed above (i.e., six copies of both sentences containing the consecutive schwas, and three copies of both sentences containing the repeated context vowels). After the rehearsal process described earlier, the subject was handed the microphone and held it several inches to one side of (not directly in front of) his/her mouth while saying the sentences in the order indicated on the list.
If a disfluency or other unwanted event occurred, that repetition of the sentence was repeated, until all of the sentences had been successfully recorded the required number of times.

9 N.B.: The vowels in “saw” and “car” have merged in the variety of American English spoken by these study participants.

2.2.1.4. Editing, measurements and analysis

Editing of the digital sound files and formant frequency measurements were performed onscreen using the Sound Edit function in Praat (Boersma & Weenink, 2005) for each sound file, with the following settings: (for the spectrogram) analysis window length 5 ms, dynamic range 30 dB; (for formants) maximum formant 5000 Hz for male speakers or 5500 Hz for female speakers, number of formants 5, analysis window length 25 ms, dynamic range 30 dB, and pre-emphasis from 50 Hz, using the Burg algorithm to calculate LPC coefficients.

Figure 2.4. The acoustic representation of part of one utterance containing the sequence “up at a.” The vowel starting boundary is taken as the formant track marking just prior to the onset of voicing, while the end boundary is taken as the corresponding location after voicing offset.

Each target vowel was excised from the whole-sentence recording and saved as a separate sound file for purposes of data analysis, and also for possible use later as a stimulus in the perception experiment.10 The starting boundary for each vowel was defined as the formant track marking just prior to the onset of voicing, while the end boundary was the corresponding location after voicing offset, as shown in Figure 2.4 above. Because of the intervening consonants, determining the boundary points of each vowel was generally unproblematic. In less straightforward cases, such as some in which speakers flapped or otherwise reduced the [t] in “at a,” visual inspection of the amplitude trajectory usually showed a rapid change of slope at a particular point, which was taken as marking the vowel boundary.
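The “rapid change of slope” criterion just described lends itself to a simple programmatic approximation. The following Python sketch is purely illustrative (the study itself relied on visual inspection in Praat); the function name and envelope values are invented:

```python
# Hypothetical helper illustrating the slope-change criterion: given a
# frame-wise amplitude envelope, return the index where the slope
# changes most abruptly (largest absolute second difference).
def slope_change_point(envelope):
    second_diffs = [
        abs((envelope[i + 1] - envelope[i]) - (envelope[i] - envelope[i - 1]))
        for i in range(1, len(envelope) - 1)
    ]
    return 1 + second_diffs.index(max(second_diffs))

# A flat-then-falling envelope: the elbow, and hence the candidate
# vowel boundary, is at index 3.
env = [0.50, 0.50, 0.50, 0.50, 0.30, 0.10, 0.02]
boundary = slope_change_point(env)
```

In practice such an automatic estimate would at most supplement, not replace, the visual inspection of formant tracks described above.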
In the few cases where this boundary was not so clear, a special notation was made so that measurements made for those tokens could be given further attention if they were clear outliers. In the end, there were only a few such outliers, and their inclusion or exclusion from the analysis did not change the results.11 The effects of anticipatory VV coarticulation were expected to be strongest in the later portion of the target vowels (cf. Beddor, Harnsberger & Lindemann, 2002), where the influence of the immediately following consonant was also likely to be greatest.

10 Measurements were made after excision instead of before mostly for convenience, because some of these excised vowels were to be used for the perception study, for which repeated measurements (and some alterations; see Section 3) would be necessary, these being more easily performed on the shorter sound files. More importantly, it made essentially no difference whether measurements were made before or after excision (confirmed through pilot testing), because of where the measurements were made and the width of the LPC analysis window.

11 The issue of outliers was dealt with as follows. Other than those that were the result of genuine errors (such as Praat clearly missing an F1 value and giving an F2 value as F1 instead), I decided I should either keep all of them or omit all of them, rather than making such decisions on a case-by-case basis, which could lead to bias. I did not want any reported significant outcomes to be contingent on any such case-by-case decision-making. In the final analysis, keeping the outliers instead of omitting them made little difference in general, but resulted in more conservative outcomes in a few cases (for example, an additional speaker would have had a significant distance-3 outcome if a particular outlier had been removed). Therefore the outliers were included.
As a compromise between seeking the former while minimizing the latter, measurements of target vowels’ F1 and F2 were made at 25 ms before vowel offset. For target vowels with duration under 50 ms, measurements were made at the vowel midpoint. The aim was to fill the 25-ms LPC analysis window with only vocalic information, as late in the vowel as was feasible, but without acquiring acoustic information directly from the following consonant (though coarticulatory effects of the consonant on the vowel were sometimes evident, as the results section will show). Since all the target vowels were over 25 ms in length, this was always possible.

VV coarticulatory effects were investigated at each distance through a statistical comparison, between the [i] and [a] contexts, of the formant values of the target vowels articulated at that distance from the context vowel. For all group analyses reported in this paper, a normalization procedure based on Gerstman (1968) was applied to each speaker’s nth raw formant values for n = 1 and 2. Starting with a given speaker’s average first and second formant values for the full vowels [a] and [i] and with the raw formant value Fn raw (for n = 1 or 2), the corresponding normalized value is given by the formula

Fn norm = 999 * (Fn raw - Fn min) / (Fn max - Fn min),

where Fn max and Fn min are the largest and smallest nth formant values among that speaker’s full vowels; in other words, F1 max and F1 min are given by the speaker’s average F1 for [a] and [i] respectively, with the reverse order for F2 values. The procedure has the effect of scaling each speaker’s formant values relative to the width and height of his or her own vowel space, as defined by [a] and [i]. Both F1 and F2 are scaled to the range 0-999, with the context [a] at (999, 0) and the context [i] at (0, 999). This makes comparisons between speakers more reasonable (though not unproblematic; see Chapter 7).12

2.2.2. Results and discussion

2.2.2.1.
Group results

Figure 2.5 below is a normalized vowel-space plot showing formant frequencies relative to the extremes of the context [a] and [i] for each distance condition and in each context, averaged over all 20 speakers. As one might expect, increased distance from the context vowel is associated with progressively reduced formant differences between the [i] and [a] contexts. The mean normalized (F1, F2) for the [a] and [i] contexts are (288, 449) and (46, 801) at distance 1; (731, 315) and (617, 379) at distance 2; and (388, 130) and (389, 160) at distance 3.

It should be noted that the measurements illustrated in Figure 2.5 were made near the end of the target vowels, where coarticulatory influence of the context vowel was expected to be strongest. Therefore these formant values should not be expected to correspond too closely to the values one would obtain in the steady-state portion of a schwa vowel. This is especially true considering that the influence of the immediately following consonant in each distance condition appears to be in play as well; for example, as place of articulation changes from labial to alveolar to velar at distances 3 (“up”), 2 (“at”) and 1 (“a” [k]), F2 of the target pairs increases accordingly. Similarly, F1 values are quite low for the target vowels overall, particularly at distances 3 and 1; these vowels immediately preceded [p] and [k] respectively, both of which generally reached fuller closure than the [t] in “at,” often realized as a flap.

12 Gerstman’s (1968) original formula specified that Fmax and Fmin were to be taken over nine (Dutch) vowels, not just [a] and [i], but given the positions of those two vowels in vowel space, it seems reasonable to take their F1 and F2 as providing the desired maximum and minimum values. Gerstman alluded to this himself (p. 80).
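The Gerstman-based normalization used for these group analyses can be sketched in a few lines of Python. This is an illustration only; the function name and formant values are invented, and in the actual procedure the per-speaker [a]/[i] means come from the baseline sentences:

```python
# Illustrative sketch of the normalization procedure (based on
# Gerstman, 1968) described in the text; names and values are
# hypothetical, not taken from the study's data.
def normalize(f_raw, f_min, f_max):
    """Scale a raw formant value (Hz) onto the 0-999 range defined by a
    speaker's own full-vowel extremes."""
    return 999 * (f_raw - f_min) / (f_max - f_min)

# For F1, the minimum and maximum are the speaker's mean F1 for full
# [i] and [a]; for F2 the order is reversed. Invented means:
f1_i, f1_a = 300.0, 750.0    # mean F1 of full [i] and [a]
f2_a, f2_i = 1100.0, 2300.0  # mean F2 of full [a] and [i]

f1_norm = normalize(750.0, f1_i, f1_a)   # full [a] maps to F1 = 999
f2_norm = normalize(1100.0, f2_a, f2_i)  # full [a] maps to F2 = 0
```

With this scaling, each speaker’s full [a] falls at (999, 0) and full [i] at (0, 999) in normalized (F1, F2) space, as stated in the text.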
To determine whether the differences illustrated in Figure 2.5 are significant, repeated-measures ANOVAs with context vowel as factor were performed on the group dataset at each distance and for each formant, using normalized formant values as discussed above. For both F1 and F2, there was a highly significant main effect of vowel at both distance 1 (first formant: F(1,19)=83.7, p<0.001; second formant: F(1,19)=101.7, p<0.001) and distance 2 (F1: F(1,19)=13.6, p<0.01; F2: F(1,19)=22.9, p<0.001). At distance 3, these effects appear to taper off, as the non-significant outcome for F1 (F(1,19)=0.052, p=.82) and the significant but weaker outcome for F2 (F(1,19)=5.04, p<0.05) show. Together, these outcomes provide strong evidence of coarticulatory effects having occurred at all three distances. The existence of distance-3 effects is particularly noteworthy given that the distance-3 vowel was the target vowel which would be expected to undergo the least amount of reduction, based on its [ʌ] quality.

Figure 2.5. Context-vowel and distance-1, -2 and -3 target-vowel positions in normalized vowel space, averaged over all 20 speakers; these averaged values are marked by the line segment endpoints, not the labels. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or “3,” respectively. The context-related differences in F1 and F2 are significant at the p<0.05 level or greater for all three distance conditions, except for F1 at distance 3.

2.2.2.2. Individual results

In order to explore these results further, the coarticulatory tendencies of individual speakers were next examined. For each speaker, one-tailed heteroscedastic t-tests were run for F1 and F2 for each distance condition (1, 2 or 3) to determine whether formant values differed significantly between the [i] and [a] contexts. Raw formant values were used here, since formant values were not being directly compared between speakers.
One-tailed tests were appropriate since it was predicted that [i]-colored vowels would have lower F1 and higher F2 than [a]-colored vowels. The significance results for all 20 speakers are summarized below in Table 2.1. Significance is given without Bonferroni correction for multiple tests in order to provide a better picture of the differences between individuals and the gradience of the effects over distance. Numerical data for each speaker are given in Appendix A. To address the possibility that speaking rate might be a relevant factor here, this was also measured for each speaker; these figures are shown in the rightmost column of the table. Speech rate for a given speaker was calculated by averaging, over that speaker’s utterances, the time elapsing between the start of the distance-3 vowel and the start of the context vowel, a span of six segments.

Table 2.1. For each speaker, the significance testing outcomes of six t-tests are shown, comparing formant frequency values of that speaker’s target vowels for the [i] vs. [a] contexts, for each of F1 and F2 and for each distance condition. Significant results are shaded and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001 (no Bonferroni correction). Also noted are marginal results, where + = p<0.10; a √ indicates a non-significant outcome in which averages were nevertheless in the expected direction (i.e., F1 greater for the [a] context than for the [i] context, and F2 lower).

Speech rates (segments/s), Speakers 1 through 20 in order: 13.8, 15.5, 13.9, 11.2, 15.2, 12.7, 15.3, 13.6, 12.1, 15.2, 11.8, 14.2, 11.0, 12.2, 17.6, 11.2, 14.4, 16.5, 12.1, 12.0.
The rightmost column shows each speaker’s rate of speech in segments per second. As expected, most speakers showed a substantial amount of VV coarticulation, though a great deal of interspeaker variation is evident as well. While two participants showed significant results in all three distance conditions, several others showed few or no significant effects. More specifically, 1 speaker (5% of the group of 20) showed no significant effects at all, 4 (20%) showed effects as far as distance 1 but no further, 13 (65%) had effects as far as distance 2 but no further, and 2 (10%) had significant outcomes at distance 3 (at Bonferroni-corrected p<0.000417, 10 of 20 speakers had significant effects at distance 1 for F1 or F2 or both, and 4 of 20 speakers had significant effects at distance 2). This confirms and extends Magen’s (1997) results, in which she found high variability between speakers in the production of long-distance VV coarticulation.

Even when formant differences did not reach significance, they still tended to pattern the way one would expect, with [i]-colored vowels showing lower F1 and higher F2 with respect to [a]-colored vowels. This pattern held with few exceptions in the distance-1 and distance-2 conditions, but was much less evident in the distance-3 condition, where only two speakers (3 and 7) showed a difference which reached significance. The results for Subject 7 are pictured below in Figure 2.6, which shows this speaker’s average context- and distance-1, -2 and -3 target-vowel positions in vowel space (non-normalized formant values are shown). Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or “3,” respectively. An illustration of this kind for each speaker in the entire group of 20 may be found in Appendix B.

Figure 2.6.
Subject 7’s coarticulation effects were the strongest of the twenty speakers, with differences between contexts significant at the p<0.01 level or greater for F2 at all three distances and for F1 at distance 1. The graph shows this speaker’s average context- and distance-1, -2 and -3 target-vowel positions in vowel space. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or “3,” respectively. Formant values are not normalized. Values are marked by line segment endpoints, not the labels.

2.2.2.3. Follow-up tests

At this point, some issues that may be raised about the outcomes reported thus far will be addressed. First, given the number of t-tests performed, one may ask whether the significant outcomes for Speakers 3 and 7 at distance 3 may be spurious, since the possibility of a Type I error increases along with the number of such tests. Second, how do we know that the presence of the rhotic in the context word “car” was not itself the cause of the coarticulatory contrast with [i], given the resonance effects of liquids reported by Heid and Hawkins (2000) and West (1999)? Finally, is it not possible that some intervening consonant(s) might have acted as triggers themselves? In particular, the [k] preceding [i] in “key” is expected to be fronted, so one might suspect that the different [k]s in “key” and “car” were the real trigger for the effects on the preceding vowels.13 To answer these concerns, a small follow-up was conducted with the study subjects who had shown significant distance-3 effects.
First, those two participants (Speakers 3 and 7) were recorded saying sentences similar to the ones used earlier, but with [r]-free context words:

“It’s fun to look up at a keep.”
“It’s fun to look up at a cop.”

Speaker 3 was also recorded saying sentences with [k]-free context words:

“It’s fun to look up at a peep.”
“It’s fun to look up at a pop.”

Measurements and significance testing were performed as before; the results are shown below in Table 2.2, along with the original “key/car” results for comparison. Although there are some differences in outcome associated with the different context word pairs, some important similarities are clear: in addition to strong distance-1 and -2 effects, we see significant distance-3 effects in all cases. For these speakers, then, the distance-3 effects seem quite robust, and given the variety of contexts in which they are maintained, cannot be due solely to the presence of any particular segments in the context words other than the context vowels [i] and [a].

13 It should be noted that in many if not most studies of long-distance coarticulation, it may be impossible to rule out the possibility that long-distance effects are actually short-distance effects that are transmitted successively via intermediate segments.

Table 2.2. For the two speakers, the significance testing outcomes of six t-tests are shown for each contrast (key/car, keep/cop, and, for Speaker 3, peep/pop), comparing (non-normalized) formant frequency values of that speaker’s target vowels in the [i] vs. [a] contexts, for each of F1 and F2 and for each distance condition. Significant results are shaded and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001.
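The one-tailed heteroscedastic (Welch) t-tests used for these per-speaker comparisons can be sketched with the standard library alone. This is an illustrative version, not the study’s analysis code; it computes only the t statistic and the Welch-Satterthwaite degrees of freedom (in practice a one-tailed p-value would then be read from the t distribution), and the formant values are invented:

```python
# Welch's t statistic and Welch-Satterthwaite degrees of freedom for
# two independent samples with unequal variances (illustrative sketch).
from statistics import mean, variance

def welch_t(xs, ys):
    vx, vy = variance(xs) / len(xs), variance(ys) / len(ys)
    t = (mean(xs) - mean(ys)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df

# Invented distance-1 F2 values (Hz) for one speaker, by context:
f2_i_context = [1520, 1555, 1540, 1570, 1535, 1560]
f2_a_context = [1405, 1390, 1430, 1410, 1385, 1420]
t_stat, dof = welch_t(f2_i_context, f2_a_context)
```

Because the prediction is directional ([i]-colored vowels with higher F2 and lower F1 than [a]-colored ones), the p-value is taken from one tail of the t distribution.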
An additional question which should be addressed is whether these significant distance-3 results might simply be due to some reduction process, such as segment deletion, having occurred somewhere between the context and distance-3 vowels. A closer inspection of the data shows that this was not the case, as illustrated by Figure 2.7 below, which shows part of one of Speaker 7’s utterances. Clear transitioning between the consonants and vowels, with no segment deletion, is apparent, a pattern that is not at all unique to this particular utterance for this speaker.

Figure 2.7. A typical recording made from Speaker 7, who showed strong coarticulatory effects at up to three vowels’ distance. This was not due to slurring or segment deletion, as shown above in the clear transitions from one segment to the next. The image shows the sequence “up at a key.”

Coarticulation at various distances: Correlation results

One pattern that might be expected, and which Table 2.1 seems to show, is that speakers who coarticulated more strongly at a closer distance were also more likely to show significant results at greater distances. Statistical tests confirm that average vowel space differences (taken as Euclidean distance in normalized F1-F2 space) between speakers’ [i]- and [a]-colored schwas were significantly correlated (r = 0.48, p < 0.05) between distances 1 and 2. However, such correlation was much weaker between distances 2 and 3 (r = 0.34, p = 0.14), and even more so between distances 1 and 3 (r = 0.0006, p = 0.998), presumably because of the absence of distance-3 effects for most speakers.

A related but more surprising outcome also seen in Table 2.1 is that Speakers 3, 10 and 11 all show apparently discontinuous coarticulatory effects; each of these three speakers had a significant distance-2 outcome for one formant without a corresponding distance-1 effect.
Recasens (2002) examined “interruption events” such as these in VCV sequences and, following Fowler and Saltzman (1993), suggested that such occurrences may be the result of a “fixed, long-term” planning strategy for the second vowel already being executed by speakers during the first vowel. It should also be noted that in all three of these cases, the trends were in the expected direction (see the raw data in Appendix A), indicating that coarticulatory forces may have been at work during the distance-2 vowel but were too weak to yield a statistically significant result.

Figure 2.8. Correlation between coarticulation production measures at distances 1 and 2. The correlation measure r = 0.48 (p<0.05). Normalized formant values were used.

Relevance of speaking rate

Although some researchers (e.g. Hussein, 1990) have suggested that speaking rate may be related to coarticulatory tendency, an inspection of Table 2.1 shows that the speakers in this study who coarticulated the most were not the fastest talkers, nor vice versa. Although the slowest speakers (Speakers 4, 13 and 16) showed an absence of significant effects at distances greater than 1, statistical testing for correlation between speakers’ speech rate and normalized vowel-space distance between [i]- and [a]-colored schwas found no significant effects in any of the three distance conditions.14 This result complements the work of Hertrich and Ackermann (1995), who found that slower speech was associated with less carryover coarticulation, but no significant difference in anticipatory coarticulation, relative to more rapid speech.

Temporal extent of VV coarticulation

Fowler and Saltzman (1993) have suggested that long-distance coarticulation effects can be considered “long-distance” only in terms of the number of intervening segments, in that the time span across which such effects can occur is relatively narrow.
14 One reader has pointed out that this may be a case of a threshold effect, rather than an absence of an effect altogether. In other words, such an effect might be present at slow speaking rates, but not be apparent above a certain threshold rate, “perhaps having to do with the fact that one must enunciate to at least some extent to be understandable.”

This may in fact be the case, but if so, the upper limit they suggest (approximately 200-250 ms) seems low in light of the fact that the two speakers who coarticulated the most in this study (Speakers 3 and 7) showed significant effects across time spans of well over 300 ms. The temporal distance between Speaker 7’s distance-3 vowel offset and context vowel onset over his 12 “key/car” utterances ranged from 298 to 377 ms and averaged 333 ms; for Speaker 3 the distances involved are even greater (range = [301, 472]; mean = 384 ms).

2.2.2.4. Longer-distance effects

Remaining open is the question of whether VV coarticulatory effects at even greater distances can occur with any substantial frequency. In related work, Heid and Hawkins (2000) and West (1999) have found evidence of different resonances for [r] compared to [l] across several syllables, manifested as lowered formants (F3 for West; F2, F3 and F4 for Heid & Hawkins), increased lip rounding, and high or back tongue position for [r] contexts compared to [l] contexts. To investigate the possibility of such extreme long-distance effects here, all of the vowels in the utterances of Speakers 3 and 7, who had already shown significant coarticulatory effects as far back as distance 3, were analyzed and compared between the [i] and [a] contexts. The results are shown below in Table 2.3, and appear to show some significant outcomes at distances 4 and 5. However, the magnitude of the effects is consistently small.
There are also a number of discontinuities like those seen earlier for Speakers 3, 10 and 11, but occurring over wider ranges and, unlike in those cases, with trends at closer distances not always in the expected direction. To the extent that coarticulatory effects may have occurred over such distances, they are clearly less robust than those reported for these speakers at closer distances.

Table 2.3. Possible very-long-distance coarticulatory effects, Speakers 3 and 7, with significance testing results between contexts indicated for target vowels at each distance from 1 to 7 before the context vowel (i.e., the vowels of “a,” “at,” “up,” “look,” “to,” “fun” and “it’s,” respectively), for each of the contrasts tested (key/car, keep/cop and, for Speaker 3, peep/pop), with significant results shaded and labeled, with * = p<0.05, ** = p<0.01 and *** = p<0.001.

2.3. Second production experiment: [i], [a], [u] and [æ] contexts

Because of the positive outcome of the production experiment just discussed, in which significant VV coarticulatory effects were evident for the [i]-[a] contrast at least as far as three vowels back, a second, more comprehensive experiment was run, in which more vowel contrasts were examined.

2.3.1 Methodology

2.3.1.1. Speakers

An additional 18 participants (6 female; ages ranging from 18 to 22, with mean 19.6 and SD 1.9) were recruited for the second production experiment. To avoid confusion with earlier subjects, individuals in this group as well as those in later chapters will be identified with indexing that increases rather than being “reset” for later groups; the present group therefore consists of Speakers 21 through 38. All were undergraduate students at the University of California at Davis who received course credit for participating, were native speakers of American English and were uninformed as to the purpose of the study.
For comparison purposes, three of the speakers who had coarticulated the most in the first experiment—Speakers 3, 5 and 7—also took part in the second one, but their data was not used in the group analyses to be reported for this second participant group. As was the case with the first group of speakers, while only monolingual speakers were sought, most subjects had been exposed to at least one other language as students in a university with a foreign language course requirement. Of these, two felt that they had acquired substantial knowledge of another language; since group results did not change depending on whether they were excluded, their data was included.

2.3.1.2. Speech samples

The number of context vowels in the second experiment was increased to four, corresponding to the four corners of the vowel quadrangle (i.e. the vowels [i], [a], [u] and [æ]). In addition, recordings with [ʌ] as context vowel were made for comparison purposes. As in the first study, each sentence was spoken six times by each speaker. These sentences were:

“It’s fun to look up at a keep.”
“It’s fun to look up at a coop.”
“It’s fun to look up at a cap.”
“It’s fun to look up at a cop.”
“It’s fun to look up at a cup.”

The expected effects of each of the four corner vowels on schwa are indicated in Table 2.4 and illustrated in Figure 2.9 below.

Vowel | High or Low? | Front or Back? | Effect on F1 | Effect on F2
[i]   | High         | Front          | Lowered      | Raised
[a]   | Low          | Back           | Raised       | Lowered
[u]   | High         | Back           | Lowered      | Lowered
[æ]   | Low          | Front          | Raised       | Raised

Table 2.4. Expected coarticulatory influence of four vowels on nearby schwa.

Figure 2.9. The diagram shows the expected directional influence on schwa from coarticulatory effects of nearby [i], [a], [u] or [æ].
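The expected directions of influence in Table 2.4 can be captured in a small lookup, shown here purely as an illustration (the names are invented, and 'ae' stands in for [æ]):

```python
# Expected coarticulatory pulls on schwa, encoded from Table 2.4.
# Signs follow the table's predictions, not measured data.
EXPECTED_SHIFT = {            # vowel: (effect on F1, effect on F2)
    'i':  ('lowered', 'raised'),
    'a':  ('raised', 'lowered'),
    'u':  ('lowered', 'lowered'),
    'ae': ('raised', 'raised'),
}

def expected_direction(context_vowel, formant):
    """Return the predicted direction of shift ('raised'/'lowered')
    for F1 (formant=1) or F2 (formant=2) near the given context vowel."""
    return EXPECTED_SHIFT[context_vowel][formant - 1]
```

Note that height maps onto F1 (high vowels lower it) and frontness onto F2 (front vowels raise it), which is why [i] and [a] make the maximally contrastive pair.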
In order to obtain baseline formant values for the four context vowels plus [ʌ], each speaker was also recorded repeating each of the following sentences three times; in these sentences the context vowels in the final word are preceded by (and therefore coarticulated with) themselves:

“It’s fun to seek keeps.”
“It’s fun to sock cops.”
“It’s fun to sack caps.”
“It’s fun to sue coops.”
“It’s fun to suck cups.”

2.3.1.3. Recording and digitizing

These procedures were the same as in the first production experiment.

2.3.1.4. Measurements and data sample points

Recall that in the first experiment, formant measurements of target vowels were taken at 25 ms before the vowel boundary (see Section 2.2.1.4), partly because of concern that measurements made at the very end of the vowel might be compromised by C-to-V effects of the immediately following consonant. In the end, the results for those 20 speakers were such that these C-to-V effects did not appear to block long-distance VV effects, but rather seemed to be superimposed over them (recall Figures 2.5 and 2.6). Therefore, for the second experiment, vowel boundaries were determined using the same procedures as in the first experiment (see Figure 2.4), but measurements were taken at the endpoint of each target vowel rather than 25 ms earlier. As a result, influences related to VV coarticulation from the context vowel might be expected to be greater than in the first experiment, though C-to-V effects from the following consonant might also be expected to be stronger.

2.3.2. Results and discussion

2.3.2.1. Group results

Figure 2.10 below is a vowel-space plot showing normalized first and second formant frequencies for each distance condition and in each context, averaged over the group of 18 speakers.
For the purposes of group analysis, data were excluded for the three subjects from the first experiment (Speakers 3, 5 and 7) who had also provided data for this experiment; this exclusion was made to avoid biasing the data for this new group of speakers with data from speakers who had already been known to show strong coarticulatory tendencies. As before, all group analyses to be given here were performed using normalized formant values. Context and distance-1, -2 and -3 vowel sets are labeled with progressively smaller typeface. As in the first production experiment, increased distance from the context vowel is associated with progressively reduced formant differences among the vowel contexts. Also, as was seen in the group results for the first experiment, C-to-V effects seem to be superimposed over the VV effects at each distance; as place of articulation changes from labial to alveolar to velar at distances 3 (“up”), 2 (“at”) and 1 (“a keep/cop/coop/cap”), F2 of the target vowel sets increases accordingly.

Figure 2.10. Context-vowel and distance-1, -2 and -3 target-vowel positions in normalized vowel space, averaged over the 18 speakers. The averaged formant values are marked by the line segment endpoints, not the labels. Context and distance-1, -2 and -3 vowel sets are labeled with progressively smaller text size and are seen toward the left, center and right of the figure, respectively. Significance testing results are given in the text below.

Four-vowel tests

To determine the extent to which the differences illustrated in Figure 2.10 are significant, repeated-measures ANOVAs with context vowel as factor were performed on the group dataset at each distance and for each formant. Only data for the four corner vowels (i.e., excluding vowels coarticulated with [ʌ]) were tested.
For both F1 and F2, there was a highly significant main effect of vowel at both distance 1 (first formant: F(3,51)=37.9, p<0.001; second formant: F(3,51)=82.4, p<0.001) and distance 2 (F(3,51)=14.9, p<0.001; F(3,51)=17.5, p<0.001). At distance 3, significant outcomes are seen for neither F1 (F(3,51)=2.24, p=0.09) nor F2 (F(3,51)=0.35, p=0.79).

Next, repeated-measures ANOVAs were performed on the group dataset at each distance and for each formant, but this time two sets of such ANOVAs were run, the first with respect to F1 with vowel height as factor, and the second with respect to F2 with vowel frontness as factor. The first set of ANOVAs should indicate how differently target vowels in the context of high vowels ([i] or [u]) were articulated from those in the low-vowel ([a] or [æ]) context. Similarly, the second set of ANOVAs indicates the relative amount of difference in articulation of target vowels at various distances associated with front ([i], [æ]) versus back ([u], [a]) context vowels. These tests found highly significant effects at distance 1 associated both with height (F1) (F(1,17)=111.7, p<0.001) and frontness (F2) (F(1,17)=129.5, p<0.001). Effects were also quite strong at distance 2 (for height (F1), F(1,17)=35.9, p<0.001; for frontness (F2), F(1,17)=29.6, p<0.001). As was the case for the ANOVA testing with vowel as factor, at distance 3 the effect associated with vowel height (F1) approaches significance (F(1,17)=3.90, p=0.065), but there is not even a marginally significant effect associated with frontness (F2) (F(1,17)=0.067, p=0.80).

Testing on individual vowel pairs

Finally, the formant differences of target vowels among the individual context pairs were examined.
Since there are six vowel pairs associated with the four corner vowels, six sets of repeated-measures ANOVAs were run; in each such set, comparisons of normalized F1 and F2 values were made at each distance with vowel as factor, where in each set the vowel factor had only two levels. The results are shown below in Table 2.5, the final row of which gives results for the [i] - [a] contrast for the entire set of 38 speakers who took part in either the first or second experiment. Only the significance testing outcomes are shown in the table (with * = p<0.05, ** = p<0.01 and *** = p<0.001); the numerical results are given in Appendix F.

Contrast Distance 3 Distance 2 Distance 1 F1 F1 F2 F1 F2 *** *** *** *** *** ** F2 [i]-[a] [a]-[u] * *** [æ]-[u] [a]-[æ] ** * [i]-[æ] *** [i]-[u] All 38 speakers: [a]-[i] *** * *** *** *** *** *** *** *** ** *** *** *** ***

Table 2.5. Results of ANOVA testing on first and second formants of target vowels, with comparisons made between each vowel pair chosen from the vowels [a], [i], [u], [æ]. The first six rows show results for the second production experiment. The bottom row gives results for the [i] - [a] contrast for all of the 38 speakers who took part in either production experiment. Significant results are shaded and labeled, with * = p<0.05, ** = p<0.01 and *** = p<0.001.

Together with the outcome of the first experiment, the results presented here provide further evidence that coarticulatory VV effects can extend across as much as three vowels’ distance, at least for some speakers. Next, the coarticulatory tendencies of individual participants in the second production study will be examined.

2.3.2.2. Individual results

A great deal more data was collected for this second group of study participants than for the first; comparisons of outcomes just for the [i] and [a] contexts will be made first, before more comprehensive results are given.
Since these are the two vowels most distant in formant space, one may expect the strongest coarticulation results to be found in comparisons between those contexts. Focus on the [i] - [a] pair will also be necessary when comparisons are made with the outcome of the first group of subjects (Speakers 1 through 20), or when data for both groups are pooled for analyses of the collective behavior of all 38 subjects. Initial significance testing was performed in the same way as for the first group, with t-tests between vowel contexts for each speaker at each distance, except that in this case there were more contexts to compare. For now, just the results for the [i] - [a] contrast will be presented. To summarize the testing procedure again: for each speaker, one-tailed heteroscedastic t-tests were run for F1 and F2 for each distance condition (1, 2 or 3) to determine whether formant values differed significantly between the [i] and [a] contexts. One-tailed tests were appropriate since it was predicted that [i]-colored vowels would have lower F1 and higher F2 than [a]-colored vowels. The significance results for all 21 speakers—the 18 new subjects plus the three speakers who had coarticulated the most in the first experiment—are summarized below in Table 2.6 (numerical data for all speakers are given in Appendix C). To address the possibility that speaking rate might be a relevant factor here, this was also measured for each speaker; these figures are shown in the rightmost column of the table. Speech rate for a given speaker was calculated by averaging, over that speaker’s [i]- and [a]-context utterances, the time elapsing between the start of the distance-3 vowel and the start of the context vowel, a span of six segments.
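The per-speaker tests can be sketched as follows. This is an illustrative stdlib-only version, not the study's actual analysis code, and the formant values shown are invented for the example; a one-tailed p-value would come from the t distribution (e.g. `scipy.stats.t.sf(t, df)`), omitted here to keep the sketch dependency-free.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic and Welch-Satterthwaite df for a heteroscedastic
    (unequal-variance) two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    sa = statistics.variance(sample_a) / na   # squared SE contribution, sample a
    sb = statistics.variance(sample_b) / nb   # squared SE contribution, sample b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Invented F2 values (Hz): the prediction is that target vowels in the
# [i] context have higher F2 than those in the [a] context, so a
# one-tailed test in that direction is appropriate.
f2_i_context = [1900.0, 1950.0, 2000.0]
f2_a_context = [1600.0, 1650.0, 1700.0]
t, df = welch_t(f2_i_context, f2_a_context)
```

When the two samples happen to have equal variances and equal sizes, the Welch df reduces to the pooled value (here 4), but in general it is fractional.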
Speaker 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 3 5 7 Distance 3 (“up”) F1 F2 √ √ √ √ √ √ * + √ √ ** + √ √ √ √ √ + √ + * * √ √ √ √ + Distance 2 (“at”) F1 F2 * + * √ * *** √ + √ + √ √ √ ** * ** √ ** + + + ** * * *** √ *** √ *** √ * *** ** + √ √ *** ** √ Distance 1 (“a”) F1 F2 *** *** ** *** *** *** ** *** ** √ ** ** + *** *** *** ** *** *** *** + *** ** *** *** *** ** *** ** ** √ *** * *** ** * ** √ *** *** *** *** Speech rate (seg/s) 14.5 13.0 13.8 11.7 14.7 11.9 13.2 13.7 12.3 12.1 11.1 16.5 13.8 14.6 12.1 13.7 13.6 14.2 13.4 14.7 15.0

Table 2.6. For each speaker, the significance testing outcomes of six t-tests are shown, comparing formant frequency values of that speaker’s target vowels for the [i] vs. [a] contexts, for each of F1 and F2 and for each distance condition. Significant results are shaded and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001 (not Bonferroni corrected). Also noted are marginal results, where + = p<0.10; a √ indicates a non-significant outcome in which averages were nevertheless in the expected direction (i.e., F1 greater for the [a] context than for the [i] context, and F2 lower). The rightmost column shows each speaker’s rate of speech in segments per second.

As with the first experiment, we find here significant testing outcomes even as far back as distance 3, with more and stronger coarticulatory effects generally seen at nearer distances than at further ones. Again, much interspeaker variation is evident as well: while there were no speakers in the new group of 18 who had a total absence of significant effects in the [i] vs. [a] contexts, 3 (17%) showed effects as far as distance 1 but no further, 8 (44%) had effects as far as distance 2 but no further, and 4 (22%) had significant outcomes as far as distance 3 (at Bonferroni-corrected p<0.000463, 13 of 18 speakers had significant effects at distance 1 for F1 or F2 or both, and 3 of 20 speakers had significant effects at distance 2).
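The Bonferroni threshold just cited is consistent with dividing α = 0.05 over 108 comparisons; the decomposition assumed below (18 speakers × 2 formants × 3 distances) is my own reading rather than something stated in this passage, but it reproduces the quoted value.

```python
# Hedged sketch: assumed decomposition of the comparison count.
# 18 speakers x (2 formants x 3 distance conditions) = 108 tests.
alpha = 0.05
n_tests = 18 * 2 * 3
bonferroni_alpha = alpha / n_tests
print(round(bonferroni_alpha, 6))  # 0.000463, the threshold cited in the text
```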
One notable difference from the outcome of the first group is the relatively large number of significant effects for F1, even at distance 3. In contrast, fewer significant results were found for Speakers 3, 5 and 7 here than in the first experiment. In particular, no significant results for those speakers were seen here at distance 3, though Speaker 7 did have a weakly significant effect for F2 there. Possible reasons for these differences will be addressed later, in Section 2.3.2.5. Despite the weaker outcomes for those particular speakers, distance-3 effects were not uncommon in this group as a whole. The strongest coarticulatory effects seen in this group of speakers were probably those of Speaker 37, who showed significant effects at all three distances. Figure 2.11 below illustrates this speaker’s average context- and distance-1, -2 and -3 target-vowel positions in vowel space, for all four corner-vowel contexts, not just [i] and [a]. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or “3,” respectively. Significance testing results for each distance condition and context vowel pair are coded green for one of F1 or F2 significantly different, red for both, or dotted black for neither. The outer four context vowels are joined by solid blue lines. The context vowel [^] is also shown for reference purposes. Because this speaker, like most of the others, produced the context vowel [u] as a diphthong, the vowel-space positions of both the onset and offset of this vowel are shown; the former is labeled “u-” and the latter is simply given as “u.” Appendix E provides such a graph for each of the 21 speakers who participated in this experiment.

Figure 2.11. Subject 37’s coarticulation effects were probably the strongest of the speakers who took part in the second production experiment, with significant coarticulatory effects produced at all three distances.
The graph shows this speaker’s average context- and distance-1, -2 and -3 target-vowel positions in vowel space. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size and an adjacent “C,” “1,” “2” or “3,” respectively. Significance testing results for each distance condition and context vowel pair are coded green for one of F1 or F2 significantly different, red for both, or dotted black for neither. Context vowels are joined by a blue line. The symbol “u-” indicates the onset of the diphthongal context vowel [u], while the “u” label indicates its offset. Except for those two labels, averaged formant values are marked by the line segment endpoints, not the labels themselves.

The numerical data upon which Figure 2.11 is based are summarized below in Tables 2.7 and 2.8. Table 2.7 shows Speaker 37’s average F1 and F2 values for each distance condition and vowel context, including [^]. Standard deviations for each formant at each distance across vowel contexts are also given. Table 2.8 shows how the significance coding in Figure 2.11 was determined. Each of the values ranging from 0 to 1 in a cell of Table 2.8 is the probability outcome of a t-test comparing the six values of that formant measured at that distance in each of the two contexts. Values under 0.05, indicating significance at that level, are colored orange. A blank cell indicates an outcome in the contrary-to-expected direction (e.g. F1 higher for the [i] than the [a] context).

          Distance 3     Distance 2     Distance 1
Context    F1     F2      F1     F2      F1     F2
[a]        489   1178     532   1464     410   1686
[æ]        463   1161     522   1482     387   1907
[^]        457   1144     521   1513     392   1634
[i]        453   1149     439   1539     361   1956
[u]        454   1169     518   1478     386   1771
SDs       11.2   17.7    38.2   30.4    17.6    138

Table 2.7. Average F1 and F2 values for Speaker 37 at each distance and in each vowel context, and average SDs for each formant at each distance. This speaker showed strong coarticulatory effects at all three distances.
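Raw formant values like those in Table 2.7 are speaker-specific; the group analyses in this chapter use normalized values. The normalization procedure itself is specified elsewhere in the dissertation, so the sketch below simply assumes a Lobanov-style z-score scheme (per speaker, per formant) for illustration; `z_normalize` is a hypothetical helper.

```python
import statistics

def z_normalize(values):
    """Z-score (Lobanov-style) normalization of one speaker's
    measurements for one formant: subtract the speaker's mean and
    divide by the speaker's standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Speaker 37's distance-1 F2 means from Table 2.7 ([a], [æ], [^], [i], [u]);
# normalizing token-level data per speaker is what makes values
# comparable across speakers with different vocal-tract sizes.
f2_d1 = [1686.0, 1907.0, 1634.0, 1956.0, 1771.0]
normalized = z_normalize(f2_d1)
```

After normalization the values have mean 0 and unit standard deviation, so a given difference means the same thing for every speaker.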
Given the number of t-tests performed in Table 2.8, some false positives are likely; the question is whether the number of such results is significantly larger than what one would expect from a random outcome. To permit this determination, the numbers of significant testing outcomes in each distance condition at the p<0.05 and p<0.01 levels of significance have been tallied in the bottom row of Table 2.8. For each distance condition, there are two formants and ten context vowel pairs, giving a total of twenty probability values. At the p<0.05 level of significance, we would expect about 1 (i.e. 20 times 0.05) spurious result at random, and an analysis using the binomial distribution shows that 3 spurious results or fewer would be expected more than 98% of the time. At the p<0.01 level, two or more such results would be expected less than 2% of the time. Given the figures at the bottom of Table 2.8, which show that Speaker 37’s significant testing outcomes numbered much greater than these threshold values, we can be confident that overall, the significant outcomes associated with this speaker, even at distance 3, are not spurious.

Context pair [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # of sig results, by distance: Distance 3 F1 F2 0.018 0.006 0.035 0.002 0.232 0.344 0.165 0.405 0.935 0.417 <.05 4 0.752 0.337 0.431 <.01 2 Distance 2 F1 F2 0.016 0.657 0.354 0.275 0.020 0.477 0.389 0.029 0.052 0.449 <.05 9 Distance 1 F1 F2 0.000 0.167 0.003 0.305 0.004 0.016 0.063 0.134 0.024 0.115 0.403 0.014 0.000 0.002 <.01 5 0.487 0.095 0.231 0.362 <.05 8 0.000 0.001 0.097 0.146 0.003 0.001 0.001 0.001 <.01 6

Table 2.8. Level of significance associated with Speaker 37’s formant frequency differences between contexts, for each of 10 context vowel pairs and 3 distance conditions, for each of F1 and F2. A blank cell indicates an outcome in the contrary-to-expected direction (e.g. F1 higher for the [i] than the [a] context).
The final row is a tally of significant results at the p<0.05 and p<0.01 levels of significance for each of the three distance conditions. These tallied values are far greater than would be expected from random false positives.

A final summary of Speaker 37’s results is given below in Table 2.9. The notation used in this table is like that seen earlier, with * = p<0.05, ** = p<0.01, *** = p<0.001 and + = p<0.10; a √ indicates a non-significant outcome in which averages were nevertheless in the expected direction. However, in this table the results for all ten context vowel pairs are given, not only those for the [i] - [a] contrast. Also given in the bottom row, for each column (corresponding to one of F1 or F2 in a particular distance condition), is a count of results significant at the p<0.05 level or stronger. Because there are ten values associated with each column, the expected number of false positives per column is 0.5, with two or fewer such outcomes in any given column being over 98% probable. Again, the results for this speaker are well above this threshold.

Distance 3 F1 F2 Distance 2 F1 F2 Distance 1 F1 F2 [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] * ** * ** √ √ √ √ √ √ * √ √ √ * √ √ * + √ *** √ ** √ ** * + √ * √ √ * *** ** √ + √ √ # of results sig at p<0.05, by column 4 3 6 2 Context pair √ √ √ 0 *** *** + √ ** *** *** ** 6

Table 2.9. Summary of significance testing outcomes for Speaker 37 for each formant at each distance and for each contrasting context-vowel pair, with * = p<0.05, ** = p<0.01, *** = p<0.001, + = p<0.10, and a √ indicating a non-significant outcome in which averages were nevertheless in the expected direction. The bottom row gives the number of results in the column above which were significant at the p<0.05 level or stronger.
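The binomial reasoning used with Tables 2.8 and 2.9 can be checked directly with the standard library. `binom_cdf` is a hypothetical helper; the counts (20 tests per distance condition, 10 per column) come from the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of at most k false
    positives among n independent tests at significance level p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# 20 probability values per distance condition (2 formants x 10 pairs):
p_at_most_3 = binom_cdf(3, 20, 0.05)        # > 0.98: 3 or fewer spurious hits
p_two_or_more = 1 - binom_cdf(1, 20, 0.01)  # < 0.02: 2+ spurious at p < .01

# 10 values per column in Table 2.9 (expected false positives: 0.5):
p_at_most_2 = binom_cdf(2, 10, 0.05)        # > 0.98: 2 or fewer per column
```

Observed tallies well above these bounds are therefore very unlikely to be chance artifacts of multiple testing.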
Analyses like those presented for Speaker 37 in Tables 2.7, 2.8 and 2.9 are given in Appendices C and D for all 21 speakers who took part in the second production experiment. To highlight briefly the interspeaker variation seen in these results, numerical data like those given in the bottom row of Table 2.9 for Speaker 37 are shown below in Table 2.10 for each member of this group of 21 speakers, representing the number out of the ten total context-vowel pairs for which statistical testing yields an outcome significant at the p<0.05 level of confidence or stronger. As was the case with the final row of Table 2.9, the expected value of each cell is 0.5, with values above 1 expected less than 10% of the time, and values above 2 (shaded green in the table) expected less than 1% of the time. Given this, the values shown in Table 2.10 are clearly too numerous to have been obtained through chance outcomes, reinforcing the group results seen earlier. The data in the table show clearly that most speakers produced significant distance-1 differences not only between the [a] and [i] contexts, but for most of the distinct context vowel pairs. At distances 2 and 3, the effects are not as ubiquitous as for distance 1 but are again well in excess of chance levels.

Speaker 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 3 5 7 Distance 3 F1 F2 0 0 0 0 1 0 0 2 1 1 0 2 4 0 1 0 0 0 4 1 0 0 1 0 0 2 0 1 0 0 1 1 4 0 1 0 2 0 0 3 0 1 Distance 2 F1 F2 2 2 2 0 6 4 2 3 1 2 0 0 2 4 3 4 1 4 1 2 0 5 0 3 3 4 3 4 2 2 7 0 3 6 5 1 3 5 1 6 1 4 Distance 1 F1 F2 3 8 3 8 6 9 4 8 6 0 3 6 3 6 6 8 2 8 6 6 3 5 6 7 3 7 5 9 5 7 0 6 2 6 2 4 5 9 4 7 6 8

Table 2.10. For each speaker, for each formant at each distance, the table shows the number of vowel-pair contrasts for which t-testing found a result significant at the p<0.05 level or stronger.
Values above 2 are expected less than 1% of the time and are highlighted in green; for comparison purposes, values of 2 are colored yellow and values under 2 are left unshaded.

2.3.2.3. Follow-up tests

Coarticulation at various distances: Correlation results

Recall that for the first group of participants (Subjects 1 to 20), the strength of coarticulatory effects was significantly correlated between distance conditions 1 and 2. The results for the second group (Subjects 21 to 38) are weaker. The correlation between these subjects’ formant differences between the [i] and [a] vowel contexts was not significant between any two distance conditions (distances 1 and 2: r = -0.05, p=0.85; distances 2 and 3: r = 0.30, p=0.23; distances 1 and 3: r = -0.01, p=0.96). However, pooling the data for the entire group of 38 subjects who participated in either experiment, one finds significant correlation between these coarticulation production measures between distances 1 and 2 (r = 0.37, p<0.05) and between distances 2 and 3 (r = 0.35, p<0.05); the first result is illustrated below in Figure 2.12. Between distances 1 and 3 the correlation is positive but not significant (r = 0.07, p=0.68).

Figure 2.12. Correlation between coarticulation production measures at distances 1 and 2 for all of the speakers who took part in either production experiment. The correlation measure r = 0.37 (p<0.05). Normalized formant values were used.

The lack of strong correlation results for coarticulation production at various distances for the second group in particular is unexpected, since it seems reasonable to suppose that speakers who coarticulate at greater distances would also do so at nearer distances, as was true for the first group of speakers.
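The correlation measure r reported in these comparisons is Pearson's product-moment coefficient; a stdlib-only sketch follows. The per-speaker measures shown are invented for illustration, not the study's data, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length
    sequences (e.g. per-speaker coarticulation measures at two distances)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-speaker measures: distance-1 vs. distance-2 coarticulation.
# A positive r means speakers who coarticulate strongly at one distance
# tend to do so at the other as well.
d1_measure = [2.0, 1.5, 3.0, 0.5, 2.5]
d2_measure = [1.0, 0.8, 1.6, 0.2, 1.1]
r = pearson_r(d1_measure, d2_measure)
```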
It may be the case that the slightly different measuring position used for the later subjects’ target-vowel formant frequencies (immediately adjacent to the following consonant, rather than 25 ms earlier as was done for the first group of subjects) results in more substantial interference from the coarticulatory effects of that consonant on the target vowel being measured. Although we have seen for both groups (see Figures 2.5 and 2.10) that the C-to-V effects seem to overlay the longer-distance VV effects rather than completely supplanting them, perhaps this is too simple an assumption. Instead of simple superposition, the C-to-V and VV effects may interact in a more complex manner than is initially apparent, perhaps differing substantially between speakers and so weakening the correlation results.

Relevance of speaking rate

It was seen earlier that for the first group of subjects, speaking rate and coarticulation production were not significantly correlated at any distance. This appears to be so for the second group of subjects as well (for distance 1, r = -0.30, p=0.23; for distance 2, r = 0.27, p=0.29; for distance 3, r = 0.12, p=0.63). Use of other production measures, such as raw or logarithmic values, or considering only F1 or only F2, did not produce stronger results. This was found to be true of the datasets obtained from the first group, the second group, and the pooled dataset for all 38 subjects. It should be noted that the present results do not conflict with the idea that speech rate and coarticulation might be correlated at the individual-speaker level. In other words, a person’s typical speech rate might not predict his or her coarticulatory tendency relative to other speakers (as the results of the present study suggest, at least for anticipatory VV effects), but a particular speaker may coarticulate more or less depending on how fast he or she is speaking compared to his or her usual speech rate.
The current project was not intended to address this particular possibility, either for spoken language or sign language, although other researchers have looked into the matter; for example, Mauk (2003) explored issues related to undershoot by investigating Location-to-Location coarticulation in ASL. Mauk found that signers coarticulated more in faster signing conditions than in slower ones. On the other hand, Matthies, Perrier, Perkell and Zandipour (2001) investigated the issue in spoken language and obtained a null or at best weak result.

2.3.2.4. Longer-distance effects

Recall that each of the two speakers in the first production study who produced significant distance-3 effects was also found to have possible, though inconsistent and weaker, effects at even greater distances. Since a number of speakers in the second study (Speakers 27, 30, 37 and 38) also produced significant distance-3 effects, a follow-up analysis was run on these subjects to see if they too might produce very long-distance coarticulation. To focus the discussion and to facilitate comparison with the similar follow-up to the first experiment, this investigation was limited to the [i] - [a] contrast. Table 2.11 below shows the results. Only one outcome was significant. Given the number of speakers and contexts that were considered, this cannot be considered strong evidence of effects at distances greater than 3. Recall that a similar follow-up was done after the first production experiment, and also yielded some positive but weak results at distances 4 and 5. While the lack of a clear positive outcome here cannot rule out the possibility that distance-4 or longer-distance effects might be found in other speakers or contexts, the rapid drop-off seen in both experiments in the range of distance 3 does suggest that this may be an upper limit, at least for anticipatory VV effects.
It is unlikely that many environments would be more conducive to such effects than those used here: environments having multiple consecutive schwas separated by single consonants.

Dist. 7 (“it’s”) 6 (“fun”) 5 (“to”) 4 (“look”) Spkr. F1 F2 F1 F2 F1 F2 F1 27 * 3 (“up”) 2 (“at”) Contrast keep/ ** *** cop keep/ *** *** cop keep/ * *** * *** cop keep/ ** ** * cop F2 F1 F2 F1 * 30 ** 37 * 38 * 1 (“a”) F2 F1 F2

Table 2.11. Results of testing for coarticulatory effects at very long distances, for four speakers from the second production study. Significant t-test outcomes between the [i] and [a] contexts are indicated for target vowels at each distance from 1 to 7 before the context vowel, with significant results shaded and labeled, with * = p<0.05, ** = p<0.01 and *** = p<0.001.

2.3.2.5. Further comparison of the two experiments

Earlier it was noted that the coarticulatory behavior of the subjects who performed both production experiments, Speakers 3, 5 and 7, seemed to differ substantially between the two tasks, despite the conditions for both being very nearly the same. Generally, the results for these subjects were weaker in the second experiment. The two main differences between the first and second tasks and their analysis were (1) the point at which formant measurements were taken (at the end of the vowel, or 25 ms earlier), and (2) the different words used to create the [i] and [a] contexts (“key/car” or “keep/cop”). To determine whether one or both of these might be the factor responsible for the differences in testing outcomes, a comparison was made for these speakers of significance testing results using each of the two measuring timepoints with each of the two sets of sentence recordings (ending in “key/car” or in “keep/cop”). A summary of significance testing results is given below in Table 2.12.
The table gives results for all four possible combinations of context word pair and formant measuring timepoint, two of which were utilized in the two production experiments—the vowel-endpoint-minus-25-ms measuring point for the “key/car” contrast (used in the first experiment, shown in the first three rows of the table), and the vowel endpoint for the “keep/cop” contrast (used in the second experiment, given in the last three rows). All new measurements were made using the same recordings of each speaker obtained earlier for the two experiments; these recordings numbered 24 for each of the three speakers (i.e. 4 possible last words of a sentence, with each sentence repeated six times).

Distance 3 Distance 2 Distance 1 Speaker F1 F2 F1 F2 F1 F2 3 5 7 * *** * ** 3 5 7 ** 3 5 7 * ** * * 3 5 7 Contrast Measuring Timepoint key/car key/car key/car V end - 25 ms V end - 25 ms V end - 25 ms key/car key/car key/car V end V end V end *** ** *** *** *** *** *** *** *** *** *** ** *** ** *** *** *** ** * *** *** *** *** *** keep/cop V end - 25 ms *** keep/cop V end - 25 ms *** keep/cop V end - 25 ms *** ** ** *** keep/cop *** keep/cop *** keep/cop *** V end V end V end

Table 2.12. Summary of significance testing outcomes between [i] and [a] contexts for each distance, for the three speakers who took part in both production experiments. Outcomes using all four possible combinations of context word pair (“key/car” versus “keep/cop”) and formant measurement point (target vowel endpoint or target vowel endpoint minus 25 ms) are given.

Other than at distance 1, where strong effects seem abundant, the data given in Table 2.12 indicate that (1) significant testing outcomes between vowel contexts are fewer when the formant measurements are made at the very end of the vowel, and (2) the “keep/cop” contrast may be associated with fewer significant outcomes than the “key/car” contrast. The data in Table 2.13 below support these observations.
First, formant comparisons between contexts at distance 2 or 3 yield results that are stronger overall when formant measurements are taken at the earlier measuring point. One possible explanation for this, mentioned earlier, is that VV effects closer to the end of the vowel are more strongly influenced by the coarticulatory effects on the target vowel of the immediately following consonant; in fact, given the overlap of gestural timing expected of adjacent segments, the consonant has presumably already “started” by the time the vowel reaches its acoustic endpoint. Second, significance testing to compare the “key/car” and “keep/cop” contrasts also reveals a difference: results for the “key/car” contrast tend to be stronger. One possible explanation of this is the same as the reason that open-monosyllabic words (or nearly open, depending on how one treats the rhotic in “car”) were chosen as context words in the first production experiment to begin with. It was expected that the vowels in such words would be held longer and that this would increase their tendency to act on the preceding target vowels, relative to context vowels in closed syllables. For the second experiment, in which all four corner vowels plus [^] were to be used as context vowels, this was not an option, since English [^] and [æ] do not occur in open syllables. So, while use of the closed-syllable [kVp] pattern permitted a minimal quintuple to be used, it may also have resulted in an overall tendency toward weaker VV coarticulation than might be expected with open-syllable words. The final three rows of Table 2.12, which give results when both the “keep/cop” contrast and the end-of-vowel measuring procedure are used, seem to show the cumulative weakening effects of both, such as a complete absence of distance-3 effects for these speakers and relatively few effects at distance 2.
Since this is the methodology used in the second production experiment, it is worth noting that several other speakers in that experiment did have significant distance-3 effects. It is possible that those speakers would have coarticulated even more strongly in other contexts. Similarly, the issue of which portion of a vowel may most reliably undergo measurable coarticulatory effects, or whether there is consistency on this point among different contexts or different speakers, is beyond the scope of this project.

Distance 3 Distance 2 Measuring point V end minus 25 ms V end ns ns *** ns V end minus 25 ms V end * ns ** ** Context pair key/car keep/cop ns ns ** ns key/car keep/cop * ns *** *

Table 2.13. Summary of t-test outcomes comparing two factors: measuring point within the target vowel (end of vowel or 25 ms earlier) and context word pair (“key/car” or “keep/cop”). The “V end minus 25 ms” and “key/car” comparisons yield generally stronger results than the others.

2.4. Chapter conclusion

The studies presented here provide strong evidence that many if not most speakers produce a significant amount of anticipatory VV coarticulation in at least some contexts, and that for some speakers, such effects can extend across several intervening segments. As has been the case in previous coarticulation studies, substantial variability between speakers was also found. While long- and short-distance VV coarticulatory tendencies were correlated overall, speaking rate and tendency to coarticulate were not. The effects of C-to-V and VV coarticulation were seen simultaneously, but it appeared that the former may have caused an attenuation of the latter at close range, at least for some speakers. Whether such effects are more-or-less additive in general or interact in a more complicated fashion has yet to be determined.
Long-distance effects like those found for several of the speakers in these experiments have significant implications for models of speech production, as will be discussed in Chapter 7. How relevant they are for speech processing or historical change depends in large part on whether listeners can hear them. The remaining spoken-language experiments conducted as part of this project, presented in Chapters 3 and 4, examine the perceptibility of VV coarticulatory effects.

CHAPTER 3 — COARTICULATION PERCEPTION IN ENGLISH: BEHAVIORAL STUDY

3.1. Introduction

In an early study, Lehiste and Shockey (1972) performed a perception experiment in which listeners heard recordings of VCV sequences in which one of the vowels, together with the adjacent half of the consonant, had been excised, and were asked what they thought the missing vowel was. Results were no better than chance. However, studies using other methods (e.g. Fowler & Smith, 1986; Beddor et al., 2002) have found that VV coarticulatory effects are in fact perceptible to listeners. For example, Beddor et al. (2002) found that listeners are able to “compensate” for coarticulation effects, in the sense that they can shift target-vowel category boundaries when perceptually necessary, but to a great extent this is true only of effects typical of the listener’s L1. This is consistent with results of Fowler (1981), who found that listeners were most sensitive to differences resulting from coarticulation behaviors most similar to their own. In that study, identical vowel sounds were often perceived as different in various flanking vowel contexts, and acoustically distinct versions of a given vowel were sometimes heard as being the same in appropriate flanking vowel contexts. As Fowler (1981) has suggested, the technique used in the Lehiste and Shockey (1972) study may simply have been too insensitive to allow accurate measurement of perceptual effects.
In the current study, a modified version of their technique is used, with the following reasoning. Ohala (1981) has suggested that acoustic byproducts of physiological linguistic processes may sometimes be perceived by listeners as grammatically important information, and that this may ultimately lead to language change. For example, in a vowel immediately following a voiceless consonant, voicing fundamental frequency (f0) tends to start off somewhat high and then fall rapidly back to normal, while after a voiced consonant, the opposite occurs (Ohala, 1974, 1981). The reasons for this are not well understood but appear to be physiologically based. Ohala has suggested that in some cases, this phenomenon could be the origin of tonal contrasts. Along the same lines, vowel harmony may sometimes be the outcome of perceptible vowel-to-vowel coarticulation that has become grammaticalized (Ohala, 1994). Przezdziecki (2000) found evidence for such a connection between vowel harmony and VV coarticulation in his study of three Yoruba dialects, one of which has vowel harmony and the other two of which do not. Przezdziecki suspected that coarticulation patterns in the two dialects without vowel harmony might be similar in kind but different in degree from those in the third dialect, in which he hypothesized that such patterns had originated in a similar way but had since become grammaticalized. Analysis of vowel formant data from the three dialects offered support for his hypothesis. If we suppose that these effects might sometimes be the origin of vowel harmony, they would almost certainly have to be perceptible to listeners, at least in some environments. An interesting point that should be made here is that the language-change hypothesis does not require that all speakers coarticulate heavily, nor that all listeners perceive such effects readily.
Instead, if the hypothesis holds, only small minorities of such speakers and listeners would suffice to get the process started, as Figure 3.1 below illustrates. The graph shows a logistic or “S” curve, used to describe change in many contexts, including population studies, business, and historical linguistics (e.g., see Aitchison, 1981).

Figure 3.1. A logistic or “S” curve can be used to characterize the spread of a change through a linguistic community. During the first phase, only a few people have adopted the change. This is followed by a second phase of rapid spread of the change throughout the population, and finally a third stage, during which the few remaining holdouts either adopt the change or die off.

The language-change hypothesis implies that the spread of coarticulation-related change could occur because listeners who are particularly sensitive even to relatively weak coarticulatory signals would in turn retransmit those signals in a stronger form. This also leads to the intriguing question, addressed later in the present study, of whether listeners who are especially sensitive to coarticulation might also tend to coarticulate more. The present research was inspired by the idea that in cases where coarticulatory effects are particularly strong, a different result from Lehiste and Shockey’s (1972) might be obtained, and that this result would still have important implications, as just discussed. This approach led to positive results in Grosvald (2006), where many listeners were able to determine the final vowel in VCV sequences (where C was [k] or [p] and V was [a] or [i]) when hearing only the initial vowel. Following this approach, the current perception study used only recordings from speakers whose VV coarticulation patterns were particularly strong.

3.2. Methodology

3.2.1.
Listeners

Out of the 38 subjects who participated in the production experiments, the first seven were those whose recordings provided the raw material for this perception study; no perception data were obtained from them. Subjects 18, 19 and 20 took part in a pilot ERP study for which no responses were necessary (the ERP experiment will be discussed in Chapter 4). Therefore, a total of 28 subjects (Speakers 8 through 17 and 21 through 38 of the production study) provided data for the perception experiment to be described here. Twelve were female and 16 were male; ages ranged from 18 to 24 and averaged 19.4, with an SD of 1.63. Table 3.1 below summarizes this information. All perception study subjects were undergraduate students at the University of California at Davis who received course credit for participating.

Subjects      Production data              Perception data
1-7, 18-20    [a]-[i] contexts             None
8-17          [a]-[i] contexts             Behavioral
21-38         [a]-[i]-[æ]-[u] contexts     Behavioral and ERP

Table 3.1. The scope of the study expanded as the number of subjects increased. Early subjects (1-7) performed only the [a]-[i] production task. The next group (8-17) performed a behavioral perception task as well, described in this chapter. The final group (21-38) performed an expanded production task with several vowel contrasts and also provided ERP data, to be discussed in Chapter 4.

3.2.2. Creation of stimuli for perception experiment

Recordings obtained from the first seven speakers in the production experiment were used to create the stimuli that subsequent subjects responded to in this perception study. Although synthesized stimuli might seem preferable, their creation would require specific decisions about the kinds of coarticulation that can or cannot occur at various distances, something that is not yet well established.
While the aim was to determine how perceptible long-distance coarticulation effects could be, not all speakers in the initial production experiment had produced such effects, so an appropriate subset of the recordings had to be chosen. The basic approach taken here was to use typical tokens from speakers who had coarticulated more strongly than average. This should not render the obtained results irrelevant, since the language-change hypothesis described earlier requires only that some listeners be sensitive to some speakers’ coarticulatory effects. The participants whose schwa tokens were chosen for use here, Speakers 3 and 5, were two whose results were among the strongest of the seven subjects who were initially recorded. In order to err somewhat on the conservative side, recordings from the speaker who coarticulated the most, Speaker 7, were not used, in case the results for that individual were truly exceptional. The tokens needed were [i]- and [a]-colored schwas in each of the distance conditions 1, 2 and 3. Since individual recordings might have quirks which listeners could use to distinguish them independently of vowel quality, four recordings of schwa in each distance condition and vowel context were selected, to be presented interchangeably during each presentation of that vowel type. Four recordings from Speaker 5 were used for each context ([i] and [a]) at distances 1 and 2, while four of Speaker 3’s recordings were used for each context in the distance-3 condition.[15] Therefore, the total number of tokens used was 2 ([i] vs. [a] context) * 3 (distance-1, -2 or -3 condition) * 4 copies of each = 24. Because speakers had repeated each sentence six times, six tokens for each context and distance condition were available, of which four were to be chosen as stimuli. The choice of which four to use was made in a principled manner.
First, for each context and distance condition, the average F1 and F2 of the corresponding six tokens were computed, defining a center point in vowel space for that group of six tokens. Next, the Euclidean distance from that center point to each token was computed, and the two outliers (those whose distance from the center point was greatest) were rejected; the remaining four tokens were used in the experiment. Consequently, it was not the most extremely coarticulated tokens that were used, but rather the most typically coarticulated ones. These tokens were then normalized (re-synthesized) in Praat for duration, amplitude and f0, according to the values shown below in Table 3.2.[16]

[Footnote 15: Recordings of different speakers were used, just as multiple tokens in each condition were used, for comparable reasons: so that in the event significant results were obtained, it would be much less likely that this was due to listeners relying on the characteristics of a particular speaker or a particular recording.]

Distance condition    1        2        3
Duration (ms)         65-70    55-60    75-80
Amplitude (dB)        70       70       70
f0 (Hz)               120      150      200

Table 3.2. The duration, amplitude and f0 values used to normalize the tokens used in each of distance conditions 1, 2 and 3.

3.2.3. Task

All perception study subjects began by performing the production task discussed in Chapter 2. Afterwards, they were given a brief introduction to the purpose of this research. They were told that language sounds can affect each other over some distance, that people can sometimes detect this, and that they were about to begin a task in which their own perceptual abilities would be tested: they would hear vowel sounds taken from sentences like the ones they had just been saying, with some of these vowels sounding more like [i] than the others.
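The token-selection procedure described above (computing a centroid in F1-F2 space and rejecting the two tokens farthest from it) can be sketched as follows. This is a minimal illustration; the formant values in the example are hypothetical, not measurements from the study.

```python
import math

def pick_typical_tokens(tokens, keep=4):
    """Select the most typical tokens: compute the centroid of the (F1, F2)
    points, then keep the tokens nearest to it, rejecting the outliers."""
    f1_mean = sum(f1 for f1, _ in tokens) / len(tokens)
    f2_mean = sum(f2 for _, f2 in tokens) / len(tokens)
    # Euclidean distance from each token to the centroid in formant space.
    dist = lambda tok: math.hypot(tok[0] - f1_mean, tok[1] - f2_mean)
    return sorted(tokens, key=dist)[:keep]

# Hypothetical F1/F2 values (Hz) for six repetitions of one schwa type;
# the last two points are clear outliers and are rejected.
six_tokens = [(500, 1500), (510, 1490), (505, 1505),
              (495, 1495), (700, 1800), (300, 1200)]
kept = pick_typical_tokens(six_tokens)
```

The same four-of-six selection was applied separately for each context and distance condition.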
So that they would not be discouraged by the more difficult contrasts, they were told that subjects in such experiments sometimes say afterwards that they felt they were answering mostly at random when the contrasts were very subtle, but often turn out to have performed at significantly better-than-chance levels. (This turned out to be the case in the present study as well.)

[Footnote 16: Statistical testing on the formant values of these standardized tokens confirmed that in terms of their distribution in formant space in the [i] versus [a] contexts, the standardized token sets were not more widely spaced (hence more easily distinguishable by listeners) than the originals.]

Subjects were then seated in a small (approximately 10 feet by 12 feet) sound-attenuated room in a comfortable chair facing a high-quality loudspeaker (Epos, Model ELS-3C) placed 36 inches away on a table 26 inches high. The stimuli (stored as .wav files) were delivered using a program created by the author in Presentation software (Neurobehavioral Systems), which also recorded subject responses. The tokens had been standardized in Praat at 70 dB (as discussed earlier) and were delivered at this amplitude, as verified by measurement with a sound level meter (Radio Shack, Model 332055). To make their responses, subjects used a keyboard placed on their lap or in front of them on the table, whichever they felt was more comfortable. Free-field presentation was used because data were also being collected from some subjects for a related study involving event-related potentials (ERPs; see Chapter 4). All subjects were given a very easy warm-up task about one minute long, during which easily distinguishable [i] and [a] sounds (not schwas) were played at the same rate and ratio as their counterparts in the actual task.
Subjects were told to hit a response button when they heard a sound like “[i].” After completing this warm-up, they were told that the actual task would be the same in terms of pacing and goal (responding to the [i]-like sounds), but more challenging. Impressionistically, [i]-colored schwas do have a noticeably [i]-like quality to them, particularly at distance 1 and to some extent at distance 2; participants’ feedback as well as the results presented here indicate that subjects understood the task once they had completed the warm-up.

Figure 3.2. Design of the perception task. Each block consisted of 40 consecutive cycles of eight vowels each, with one randomly-placed [i]-colored schwa per cycle.

Figure 3.2 above illustrates the organization of the perception experiment, which consisted of three blocks, one for each of the distance conditions 1, 2 and 3, in that order. This sequence (as opposed to a random order) was chosen so that each subject would begin with relatively easy discriminations, which it was hoped would keep them from being discouraged as they progressed to the more difficult contrasts. Each block consisted of 40 cycles, each of which consisted in turn of eight consecutive schwa tokens, one of which was [i]-colored and the other seven of which were [a]-colored. Therefore, to perform with 100% accuracy, a subject would need to respond 40 times per block, by correctly responding to the one [i]-colored schwa in each of the 40 cycles in that block. The [i]-colored tokens were randomly positioned between the second and eighth slot in each cycle, so that such tokens never occurred consecutively. The interstimulus interval (ISI) varied randomly between 1200 and 1400 ms, which provided a reasonable amount of time for subjects to respond when they thought they had just heard an [i]-colored vowel.[17] The ISI between each cycle of eight stimuli was somewhat longer, being set randomly between 2800 and 3100 ms.
(These varying ISIs were required for ERP data collection; see Chapter 4.) Each cycle of eight vowels therefore lasted approximately 15 seconds, and each block of 40 cycles lasted about 10 minutes. Subjects were not told about the structure of cycles within blocks, but having performed the warm-up task, they had a sense of how often the [i]-colored schwas would tend to occur. Participants could choose to take short breaks of one to two minutes between blocks if they wished, or could proceed straight through the whole experiment.

[Footnote 17: Since both stimulus and response timing were recorded by the stimulus delivery software, it was straightforward to assign each response to a vowel stimulus, namely the immediately preceding one. While it is likely that subjects made occasional “late” responses (i.e., a response intended for one stimulus delayed until after presentation of the subsequent stimulus), participants’ feedback indicated that they adjusted readily to the rhythm of the task.]

3.3. Results and discussion

3.3.1. Perception measure

For the analysis of the results of the perception experiment, the raw scores are not an appropriate measure, and a statistic from signal detection theory called d’ (“d-prime”) will be used instead (see Macmillan & Creelman, 1991). The reasoning behind the idea that raw scores are not the best measure can be illustrated with a simple example. If a subject were to answer “i” for all items, the overall score would be 12.5%, since only 1/8 of the tokens were [i]-colored. On the other hand, answering “a” for all items would be equally uninspired but would result in fully 7/8 = 87.5% correct overall. In less extreme situations, more subtle problems associated with guessing or bias could also be overlooked. To approach the analysis correctly, we can consider the subjects’ task to be a signal-detection effort.
All tokens heard by a given subject were schwa sounds, but 1/8 of them bore coarticulatory [i] coloration; this is the “signal” that the subject was trying to detect. In this context, a response to an [i]-colored item is a “hit,” correctly withholding a response to an [a]-colored item is a “correct rejection,” an erroneous response to an [a]-colored item is a “false alarm,” and a failure to respond to an [i]-colored item is a “miss.” The d’ statistic is the difference between a subject’s normalized hit and false-alarm rates:

d’ = z(hits / trials with signal present) - z(false alarms / trials with signal absent).

For rates of 0 or 1, z is not defined, so the values 0.01 and 0.99 are substituted. Given this, possible values of d’ range from -4.65 to +4.65, with an expected value of 0 if the subject has zero sensitivity to the contrast in question. Scores in the vicinity of 1 or higher generally turn out to be significant at the p<0.05 level. The variance of d’ is given by the formula

Var(d’) = H(1-H) / (N_H [φ(z(H))]²) + F(1-F) / (N_F [φ(z(F))]²),

where H, F, N_H, N_F and φ are the hit rate, false-alarm rate, number of “i” trials, number of “a” trials, and probability density function of the standard normal distribution, respectively (Gourevitch & Galanter, 1967). Using this, a confidence interval for d’ can be determined for each subject. If the lower endpoint of the interval is greater than zero, one can be confident (to the chosen degree of significance) that the subject has some sensitivity to the contrast. Note that similar d’ scores can be obtained with different distributions of correct and incorrect answers, both of which enter into the variance of d’, which means that d’ scores in the same range can have different significance-testing outcomes.

3.3.2. Group results

Both individual and group results are given in terms of the d’ statistic. For each of distance conditions 1, 2 and 3, results were significant for the entire set of study participants (d’ = 4.04, 1.03 and 0.37 respectively, all p’s<0.001).
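The d’ computation and variance formula described above can be sketched as follows. This is a minimal illustration (not the analysis code used in the study); the 0.01/0.99 clamping follows the substitution rule stated above, and the example figures are a hypothetical perfect scorer in one block.

```python
from statistics import NormalDist

def d_prime_ci(hits, n_signal, false_alarms, n_noise, z_crit=1.96):
    """d' with an approximate 95% confidence interval, using the
    Gourevitch & Galanter (1967) variance formula."""
    nd = NormalDist()
    # Rates of 0 or 1 are replaced by 0.01 and 0.99, since z is undefined there.
    H = min(max(hits / n_signal, 0.01), 0.99)
    F = min(max(false_alarms / n_noise, 0.01), 0.99)
    zH, zF = nd.inv_cdf(H), nd.inv_cdf(F)
    d = zH - zF
    # Var(d') = H(1-H)/(N_H * phi(z(H))^2) + F(1-F)/(N_F * phi(z(F))^2)
    var = (H * (1 - H) / (n_signal * nd.pdf(zH) ** 2)
           + F * (1 - F) / (n_noise * nd.pdf(zF) ** 2))
    half_width = z_crit * var ** 0.5
    return d, (d - half_width, d + half_width)

# Hypothetical perfect block: 40 hits in 40 cycles, no false alarms among
# the 280 [a]-colored tokens; the clamped rates yield d' near the 4.65 ceiling.
d, (lo, hi) = d_prime_ci(40, 40, 0, 280)
```

If the lower endpoint `lo` exceeds zero, the subject can be credited with some sensitivity to the contrast at the chosen significance level.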
When the two groups of subjects who provided perception data are considered separately, the results remain significant for each group for each distance condition (for the group definitions, see Table 3.1 above). For simplicity of presentation, the details of these outcomes are presented in the following section, in which the outcomes for each subject are also given.

3.3.3. Individual results

As mentioned earlier, the later group of subjects (21 through 38) participated in two perception experiments: the behavioral one described here and an ERP study to be discussed in Chapter 4. Because of the nature of the ERP study, that study was performed first, followed by this one. Although the ERP experiment did not require responses of any kind, subjects were exposed to the same stimuli used in this behavioral study. Because of this extra exposure, which the initial group of subjects did not have, it is possible that subjects in the later group had an added opportunity to process these sounds and hence may have performed the task differently from the subjects in the initial group. Therefore, for each distance condition, the d’ scores of each group are presented and averaged separately in Table 3.3 below; overall average scores for the entire set of participants are also given at the bottom of the table. The table displays the values of d’ obtained for each listener in each distance condition, together with their associated significance levels. The results show that respondents were well able to distinguish [i]- and [a]-colored schwas in the distance-1 condition; many respondents had near-perfect results here. The distance-2 condition was clearly more challenging, and seemed to represent a threshold of sorts, inasmuch as five respondents’ scores did not reach significance while four others’ did (the remaining subject provided no data here).
The distance-3 condition was by far the most challenging, and respondents appear to have answered mostly at random, although one subject attained the remarkable score of 2.28 and three others also had scores which were significant, albeit less strongly so. The possibility was raised above that the later group of subjects might perform the task differently (presumably better) than the earlier group, because of the later group’s extra exposure to the study stimuli. The results indicate that this concern was not borne out, at least not to a substantial degree. The average scores for each group in each distance condition are not significantly different, and in fact the later group’s averages are slightly lower. It is possible that the later group did have an advantage in having been exposed to the study stimuli more, but at the price of some additional fatigue, since the ERP task had required them to stay alert and relatively still for an hour or so. In any case, the distribution of scores was also fairly similar between the groups, with all subjects scoring at much-higher-than-chance levels on the distance-1 task, on the order of half scoring above chance on the distance-2 task, and a small minority (two in each group) achieving significant results at distance 3. For the entire group of 28 perception study participants, the distribution of scores was as follows: at distance 1, all subjects (100%) scored above chance; at distance 2, 15 subjects (54%) did so; and at distance 3, four (14%) had a significant outcome. Also shown in the table are the overall results for each subject group, as well as for the entire pool of subjects who participated in either experiment. As might be expected, results were well above chance for both the distance-1 and distance-2 tasks. Further, although only a few subjects scored significantly better than chance on the distance-3 task, the group results were also above chance levels.
Subject    Distance 3    Distance 2    Distance 1
8          2.28 ***      1.19 *        3.97 ***
9          -0.65         no data       4.65 ***
10         0.16          0.75          4.65 ***
11         0.20          0.52          4.18 ***
12         0.52          1.84 ***      3.40 ***
13         0.67 *        1.71 ***      3.45 ***
14         0.77          1.84 **       3.54 ***
15         0.16          1.04          4.65 ***
16         0.27          1.16          4.29 ***
17         0.55          0.89          4.15 ***
Average    0.49 ***      1.22 ***      4.09 ***

21         0.41          0.70          4.15 ***
22         0.94 *        1.16 *        4.65 ***
23         0.10          -0.07         2.56 ***
24         0.00          0.79          3.92 ***
25         0.33          1.15 *        4.65 ***
26         0.44          0.69 *        3.83 ***
27         -0.05         1.03 *        3.38 ***
28         -0.20         -0.32         4.29 ***
29         0.49          0.57          2.89 ***
30         -0.18         1.21 *        3.97 ***
31         0.44          0.53          4.65 ***
32         1.26 *        2.04 **       3.97 ***
33         0.17          0.53          4.29 ***
34         -0.07         1.93 ***      4.63 ***
35         0.27          1.72 **       3.45 ***
36         0.40          0.94 *        3.97 ***
37         0.15          1.46 **       4.29 ***
38         0.59          0.80 *        4.65 ***
Average    0.31 *        0.94 ***      4.01 ***

Overall
average    0.37 ***      1.03 ***      4.04 ***

Table 3.3. The table shows each subject’s d’ measure for each of the three distance conditions. Significant results are labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001. Subject 9 provided no responses for the distance-2 condition.

3.3.4. Correlation between production and perception

In a study investigating the possibility of a link between production and perception, Perkell, Guenther, Lane et al. (2004) compared the distinctiveness of 19 American English speakers’ articulations of the vowels in the words “cod,” “cud,” “who’d” and “hood” with those speakers’ performance in an ABX discrimination task (Liberman, Harris, Kinney & Lane, 1951). The perception task used two sets of synthesized words whose first three vowel formants were set at seven intermediate stages between “cod” and “cud” for the first set and between “who’d” and “hood” for the second set.
Speakers who articulated the vowels more distinctly performed significantly better in the discrimination task than other speakers; the authors hypothesize that sensitive listeners establish, for phonemically contrasting sounds, articulatory target regions that are tighter and farther apart than is the case for listeners who are less sensitive. Although the contrasts explored in the present study are all sub-phonemic in nature, findings like those of the Perkell et al. (2004) study raise a question relevant to the language-change scenario explored earlier: is there a correlation between individuals’ ability to detect coarticulatory effects and their tendency to coarticulate? This possibility was investigated for the 28 study participants who provided both production and perception data. Because of the possibility, already raised, that comparing production and/or perception results across the earlier and later groups of subjects might not be completely valid, the results for each group will be presented separately; then the results for the whole group will be given. As a quantitative measure of perceptual ability for a given subject, the average of that subject’s three d’ scores was used, while the production measure was the average over the three distance conditions of the Euclidean distance in normalized vowel space between that subject’s [i]- and [a]-colored schwas. For the earlier group of ten participants (Subjects 8 through 17), the correlation measure r between the perception and production measures so obtained was found to be 0.52, but this outcome is not statistically significant (p = 0.13). In addition, as Figure 3.3 below illustrates, this result depends on a small number of outliers and does not hold up if they are excluded.
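The correlation measure r used above is the standard Pearson coefficient over the paired per-subject measures. A minimal sketch follows; the paired values in the usage example are hypothetical, not the study’s data.

```python
def pearson_r(xs, ys):
    """Pearson correlation between paired per-subject measures
    (e.g., production measure vs. average d')."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-subject values: vowel-space distance vs. average d'.
production = [0.8, 1.1, 0.9, 1.4, 1.0]
perception = [1.2, 1.9, 1.4, 2.3, 1.6]
r = pearson_r(production, perception)
```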
In case some other measure of perception or production might reveal a stronger relationship, other candidate measures were also tested, such as different relative weightings among the distance conditions and the use of logarithmic vowel-space measures instead of raw formant values (cf. Johnson, 2003), but none of these led to substantially different results.

Figure 3.3. Correlation between averaged production and perception measures for the earlier group of ten subjects who provided data for both (r = 0.52, p = 0.13).

The outcome for the later group (Subjects 21 through 38) is even more problematic for the hypothesis of a production-perception correlation: the correlation between the average d’ and the vowel-space difference between contexts for this group is strongly negative (r = -0.60, p<0.01). This outcome is illustrated below in Figure 3.4 and, like the result from the earlier group, appears to reflect the influence of a small number of outliers rather than constituting evidence of a perception-production relationship.

Figure 3.4. Correlation between the averaged production and perception measures for the 18 subjects from the later group (r = -0.60, p<0.01).

Finally, combining the data from both groups for one overall comparison results in the positive correlation from the first group and the negative correlation from the second group largely canceling each other out (r = -0.15, p = 0.46), as shown below in Figure 3.5. Again, other measures of production, such as logarithmically scaled formant values, do not substantially change this outcome.

Figure 3.5. Correlation between averaged production and perception measures for all 28 listeners, from both the earlier and later groups of subjects who provided data for both (r = -0.15, p = 0.46).

These results contrast markedly with those of the Perkell et al. (2004) study cited earlier.
However, that study focused on the perception and production of phonemically contrasting vowels, and its results were explained by the researchers as reflecting more accurate articulatory targets for such vowels on the part of more sensitive listeners. If this is accurate, one might actually expect less coarticulation on the part of such listeners in their productions of phonemically contrasting vowels, and it is unclear what expectations one might have for the articulation of schwa, whose phonemic status is itself unclear (see Chapter 1). In any case, more study will evidently be needed before any strong claims can be made concerning a possible perception-production relationship, as far as coarticulation is concerned. It is worth pointing out, however, that the lack of a production-perception correlation would not invalidate the language-change hypothesis discussed earlier. Recall that the basic idea of that hypothesis was that language change could occur as a result of some listeners perceiving some speakers’ coarticulation and in turn reproducing those patterns in their own speech, resulting in a feedback loop eventually leading to language change. This scenario does not actually require that perception and production be correlated, although such a correlation would certainly make the whole story more compelling. All that is really necessary for such a process to begin is that some minority of speakers coarticulate strongly enough for some minority of listeners to perceive it. Figure 3.6 below shows idealized production and perception measures mapped against each other, each with some threshold value (Pr0 and Pe0, respectively) above which speakers and listeners may participate in initiating the early stages of the language-change scenario.
As long as the highlighted upper-right corner of the figure is not empty (that is, as long as some speakers coarticulate a fair amount and some listeners are sensitive enough to perceive it and then reproduce the coarticulatory effects they have heard in a somewhat stronger form), it does not matter whether part or all of the speech community at large exhibits a perception-production correlation. On the other hand, the lack of decisive correlation results in the present study may simply mean that the measures used were not sensitive or sophisticated enough to accurately reflect subjects’ “true” production and perception tendencies. For the time being, the determination of actual values for Pr0 and Pe0 must be substantially deferred.

Figure 3.6. Hypothetical threshold values of production tendency (Pr0) and perceptual sensitivity (Pe0), above which speakers and listeners may participate in early stages of coarticulation-related language change. The presence or absence of a production-perception correlation may be unimportant.

3.4. Chapter conclusion

This study found a large amount of variation between listeners in the perceptibility of VV coarticulation. The nearest-distance effects appear to be essentially universally perceptible (at least for listeners with normal hearing), although it should also be remembered that the vowel stimuli used here were produced by subjects with above-average coarticulatory tendencies. In contrast to the case of near-distance effects, subjects’ sensitivity to longer-distance VV coarticulation was quite variable, although even these effects were perceptible to some listeners. Given the variation that is evident among both speakers and listeners, the interplay between production and perception of coarticulation in the context of actual language use is likely to be quite complicated, perhaps much more so than measures such as the linear perception-production correlation investigated here can adequately describe.
Although the perception results obtained here offer some insight into differences among listeners in their perception of subtle coarticulatory effects, the methodology used was admittedly rather crude, both because of the direct-questioning nature of the task (“Which vowel sounds different?”) and because the vowels were excised from their natural contexts and played in isolation. In one example of a different approach, Scarborough (2003) cross-spliced recordings of words in such a way as to create stimuli which varied in the consistency of their coarticulatory patterns, and then had listeners perform a lexical decision task. Stimuli which were consistent with naturally occurring coarticulation patterns were generally associated with faster reaction times. Another avenue of research which appears to show potential is the use of non-behavioral methodology to ascertain the perceptibility of coarticulation effects. For instance, the event-related potential (ERP) technique, which involves the recording of brain-wave (electroencephalogram) data, can provide insight into mental processes which occur whether or not subjects are consciously aware of them (for an overview of ERP methodology see Luck, 2005). Groups of neurons firing in response to particular types of stimuli produce positive or negative electrical potentials at characteristic locations on the scalp during particular timeframes. When these potentials are recorded and averaged over many trials, the background noise tends to zero and the response pattern consistently associated with the stimulus present in each trial remains; such patterns are called ERP components, and many are associated with linguistic processes.
The most well-known such components include the N400, a negative deflection typically occurring on the order of 400 msec after a subject is exposed to a semantically anomalous stimulus (Kutas & Hillyard, 1980), and the P600, a positive-going wave generally associated with syntactic anomaly and occurring, as its name indicates, approximately 600 msec after stimulus presentation (Osterhout & Holcomb, 1992). Generally, such late-occurring components are thought to correspond to higher-order processing, while lower-level processing patterns are associated with earlier components. As one example of the productive use of ERP methodology in linguistic perception research, Frenck-Mestre, Meunier, Espesser et al. (2005) used ERPs to investigate the ability of French listeners to perceive phonemic contrasts existing in English but absent in French. In fact, much ERP research has been directed at understanding listeners’ processing of phonemic contrasts. An ERP study investigating the perception of sub-phonemic contrasts resulting from VV coarticulation is presented in Chapter 4.

CHAPTER 4 — COARTICULATION PERCEPTION IN ENGLISH: EVENT-RELATED POTENTIAL (ERP) STUDY

4.1. Introduction

The paradigm used in this ERP study targeted the mismatch-negativity (MMN) component, which is seen at fronto-central and central scalp sites approximately 150 to 250 ms after the presentation of a “deviant” acoustic stimulus occurring within a train of otherwise-similar (“standard”) acoustic stimuli. Specifically, the MMN is the negative deflection seen in the difference wave obtained by subtracting the response to the standard stimuli from the response to the deviant stimuli. The MMN can be elicited, for example, by an occasional high-frequency tone occurring among a series of low-frequency tones. Generally, larger differences between the deviant and standard stimuli are associated with MMN responses of greater amplitude and shorter peak latency (Näätänen, Paavilainen, Rinne & Alho, 2007).
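The averaging and subtraction steps described above can be sketched as follows. This is a schematic illustration with toy waveform samples (assumed, not actual EEG processing code); real analyses involve filtering, epoching, and artifact rejection as well.

```python
def average_trials(trials):
    """Average single-trial waveforms sample by sample; over many trials the
    random background noise tends toward zero, leaving the response pattern
    consistently present in each trial."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

def difference_wave(deviant_trials, standard_trials):
    """Subtract the averaged standard response from the averaged deviant
    response; the MMN is the negative deflection in the resulting wave."""
    dev = average_trials(deviant_trials)
    std = average_trials(standard_trials)
    return [d - s for d, s in zip(dev, std)]
```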
Most importantly for the present study, the MMN also shows sensitivity to linguistic phenomena such as phonemic distinctions. The MMN has counterparts in other sensory modalities, including vision (Alho, Woods, Algazi & Näätänen, 1992; Tales, Newton, Troscianko & Butler, 1999), somatosensation (Kekoni, Hämäläinen, Saarinen, Gröhn, Reinikainen et al., 1997) and olfaction (Krauel, Schott, Sojka, Pause & Ferstl, 1999), and has also been observed with other technological/methodological approaches such as magnetoencephalography (MEG; Hari, Hämäläinen, Ilmoniemi, Kaukoranta, Reinikainen et al., 1984), positron emission tomography (PET; Tervaniemi, Medvedev, Alho, Pakhomov, Roudas et al., 2000) and functional magnetic resonance imaging (fMRI; Celsis, Boulanouar, Doyon, Ranjeva, Berry et al., 1999). Näätänen, Paavilainen, Rinne and Alho (2007) provide a thorough review of the MMN and related phenomena; for more information on ERP methodology in general, Luck (2005) is an excellent source. A distinctive and important characteristic of the MMN is that it can be elicited even if subjects are not actively attending to the stimuli, such as when they read a book or watch a silent video (Näätänen, 1979, 1985; Näätänen, Gaillard & Mäntysalo, 1978), or even while they sleep (Sallinen, Kaartinen & Lyytinen, 1994). Therefore, researchers carrying out MMN studies do not depend on the ability or willingness of subjects to focus on behavioral tasks. Because of its automatic nature, the MMN has proven to be a valuable tool in clinical investigations where issues of auditory perception and memory encoding are relevant (for a review see Kujala, Tervaniemi & Schröger, 2007), and this aspect of the MMN was also a definite practical advantage in the present experiment, since subjects reported that behavioral tasks involving [a]- and [i]-colored schwas sometimes become taxing.
The MMN is generally thought to reflect the outcome of an automatic process comparing the just-occurred auditory event with a memory trace formed by the preceding auditory events; this may be referred to as the "model adjustment" hypothesis (Näätänen, 1992; Tiitinen, May, Reinikainen & Näätänen, 1994). The MMN is believed to have generators located bilaterally in the primary auditory cortex and in the prefrontal cortex, whose respective roles are thought to involve sensory-memory and cognitive (comparative) functions (Giard, Perrin, Pernier & Bouchet, 1990; Gomot, Giard, Roux, Barthelemy & Bruneau, 2000). Rinne, Alho, Ilmoniemi, Virtanen and Näätänen (2000) have found that during this process, the temporal (auditory cortex) generators act earlier than those in the prefrontal cortex, supporting the hypothesis that the outcome of sensory processing in the auditory cortex is passed to the prefrontal cortex, where a change-detection operation is performed. A more recent suggestion, the "adaptation hypothesis" (Jääskeläinen, Ahveninen, Bonmassar, Dale, Ilmoniemi et al., 2004), is that the MMN is in fact illusory, being caused by decreased neuronal response ("neuronal adaptation") during the train of similar (standard) auditory stimuli, which results in attenuation and increased latency of another component, the N1. In this scenario, the subtraction of the averaged standard-stimuli response from the averaged deviant-stimuli response (from which the MMN is derived) yields the appearance of a distinct component, but this is merely an artifact of the differences in N1 behavior between the standard and deviant conditions. While there is some support for the adaptation hypothesis (see Garrido, Kilner, Stephan & Friston, in press, for a discussion of the debate), the model adjustment hypothesis remains more widely accepted, and one assumption fundamental to that hypothesis—namely that the MMN exists and can therefore be studied—will be made here.
The main question the present ERP study seeks to answer is how sensitive the MMN might be to the sub-phonemic processing associated with the perception of VV coarticulation. This will be of particular interest if the MMN provides a more sensitive measure in this context than behavioral methods offer. If so, the general picture of coarticulatory sensitivity could look something like Figure 4.1 below. The basic idea here is that a given segment S0 has coarticulatory effects that in general may extend across a number of neighboring segments, with stronger effects expected nearer to the influencing segment. In the figure, the coarticulatory effects of S0 on preceding (negatively-indexed) segments correspond to anticipatory effects, while those on following (positively-indexed) segments correspond to carryover effects. One would expect variation in the ranges involved depending on context and speaker, with differences in the spans of anticipatory versus carryover influence, and depending as well on the sensitivity of the listener; for example, the ranges might collapse to near-zero width in the case of a very insensitive listener or a speaker who coarticulates very weakly. In general, nearer effects would be expected to be stronger and therefore more likely to be perceptible to listeners, something which could be confirmed in a behavioral experiment such as the one discussed in Chapter 3. As the distance from the segment increases, the coarticulatory effects are expected to be more subtle, and at some point may not be perceptible to a listener in a way which can be meaningfully measured with a given methodological approach. One question relevant to this study is whether the range of effects amenable to study with ERP methodology may be wider in general than the range that can be usefully studied with behavioral techniques. Another possibility is that the real situation is more complicated than the proper inclusions depicted in Figure 4.1 indicate.

Figure 4.1.
Hypothetical patterning related to the perceptibility of coarticulatory effects of one segment, S0, on preceding and following segments. These range from easily-perceptible effects at a narrow range to subtler effects at greater ranges, which may be amenable to study only with non-behavioral methodologies. At the limit, effects might be present but too subtle to be perceptually relevant to the listener.

4.2. Methodology

4.2.1. Participants and Stimuli

All but one subject from the second group of 18 who participated in the behavioral experiment described in Chapter 3 also participated in this ERP experiment; the excluded individual was Subject 25, who was left-handed. This left 17 participants who contributed EEG data (6 female; ages ranging from 18 to 24, with mean 19.4 and SD 1.8). All were native speakers of English but, as students at a university with a foreign-language requirement, tended to have had some exposure to at least one other language. Two indicated that they had substantial knowledge of another language, but the group results did not differ if they were excluded, so their data were retained. All subjects performed the ERP task before the behavioral task, since the latter involved debriefing each subject about the purpose of the study. Therefore, subjects were still unaware of the specific goals of the study during their participation in the ERP experiment; they merely knew that the study was language-related.

Figure 4.2. Each block consisted of 40 consecutive cycles of eight vowels each for the distance-1 condition and 80 cycles for the distance-2 and -3 conditions, with one randomly-placed [i]-colored schwa per cycle.

The stimuli and the sequencing of stimuli within each block were the same as in the behavioral experiment, as seen in Figure 4.2 above, but the lengths of the blocks were increased for the distance-2 and -3 blocks from 40 cycles to 80 cycles per block, to provide more EEG data for averaging.
Since it had already been established in the initial perception study that the distance-1 task was very easy, the number of cycles in the distance-1 block was left at 40, as in the behavioral task. Stimuli were presented 1200-1400 ms apart within each cycle of eight stimuli, with short blink breaks of 2800-3100 ms between each cycle and extra-long blink breaks of 10.8-11.1 s every ten cycles. In addition, unlike in the behavioral task, where blocks were always presented in distance condition order 1, 2 and 3, the ordering of the three blocks for a given subject was random in the ERP experiment. Since this experiment was intended to evoke the MMN, subjects did not have a response task; they simply sat in the chair and were asked to stay alert by watching a silent film playing on a portable DVD player positioned in front of them. It was after participating in the MMN experiment that subjects performed the behavioral perception task described in Chapter 3.

4.2.2. Electroencephalogram (EEG) recording

Figure 4.3. Configuration of the electrodes in the 32-channel cap.

EEG data were recorded continuously from 32 scalp locations at frontal, parietal, occipital, temporal and central sites, using AgCl electrodes attached to an elastic cap (BioSemi). Figure 4.3 above shows the electrode configuration. Vertical and horizontal eye movements were monitored by means of two electrodes placed above and below the left eye and two others located adjacent to the left and right eyes. All electrodes were referenced to the average of the left and right mastoids. The EEG was digitized online at 256 Hz and filtered offline below 30 Hz. Scalp electrode impedance threshold values were set at 20 kΩ. Epochs began 200 ms before stimulus onset and ended 600 ms after. After inspection of subjects' data by eye, artifact rejection thresholds were set at ±100 μV and rejection was performed automatically.
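The epoching and artifact-rejection steps just described can be sketched as follows. This is a schematic NumPy illustration under the parameters given above (256 Hz, epochs from -200 to 600 ms, a ±100 μV rejection threshold); the actual analysis was carried out in EEGLAB, and the array layout and function names here are assumptions made for the example.

```python
import numpy as np

FS = 256                # sampling rate in Hz
EPOCH_S = (-0.2, 0.6)   # epoch window in seconds relative to stimulus onset
REJECT_UV = 100.0       # artifact rejection threshold: +/-100 microvolts

def extract_epochs(eeg, onsets):
    """Cut fixed-length epochs around each stimulus onset.

    eeg: (n_channels, n_samples) array; onsets: sample indices of stimuli.
    Returns an (n_epochs, n_channels, n_epoch_samples) array.
    """
    pre, post = int(-EPOCH_S[0] * FS), int(EPOCH_S[1] * FS)
    return np.stack([eeg[:, t - pre:t + post] for t in onsets])

def reject_artifacts(epochs):
    """Drop epochs in which any channel exceeds the rejection threshold."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= REJECT_UV
    return epochs[keep], keep

def condition_average(epochs):
    """Average the surviving epochs to obtain one condition's ERP."""
    return epochs.mean(axis=0)
```

An epoch containing a voltage excursion beyond ±100 μV on any channel is excluded before the per-condition averages are computed.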
ERP averages over epochs were calculated for each subject at each electrode for each context (standard [a] and deviant [i]) and distance condition (1, 2 or 3). Analysis was performed using EEGLAB (Delorme & Makeig, 2004). Two subjects were excluded from the group results because of a persistently high proportion of trials rejected due to artifacts (over 50 percent), leaving 15 participants whose data were used in the analyses about to be presented.

4.3. Results and discussion

Topographic maps of the effects that were found are shown in Figure 4.4 below, and the associated waveforms are presented in Figure 4.5. Although the latency of the MMN component is typically expected to fall in the neighborhood of 200 ms, the effects seen here appear to be strongest closer to 300 ms, as can be seen in the figures. Previous research has shown that the amplitude and peak latency of the MMN are modulated by the magnitude of stimulus deviation, with larger and earlier MMN responses associated with greater deviations from the standard (e.g. Näätänen, 2001; Tiitinen, May, Reinikainen & Näätänen, 1994), so it is possible that the late effects seen here reflect an MMN-like response whose long latency is due to the subtlety of these sub-phonemic differences. The responses for distances 1 and 2 do have a distribution similar to that expected for the MMN component and do show greater negativity for the deviant stimuli. For these reasons, for the purposes of the group testing whose results are about to be presented, mean amplitude over the time interval from 275 to 325 ms was used.

Figure 4.4. Topographical maps showing grand averages at each 50-ms interval in the time range [-50 ms, 400 ms], for distance conditions 1, 2 and 3. The top, middle and bottom sets of figures correspond to those distance conditions in that order. Units on the scale are microvolts.

Figure 4.5.
Grand-average waveforms at selected electrode sites in the time range [-200 ms, 600 ms], for each of distance conditions 1, 2 and 3, in that order from top to bottom. In these graphs, green = standard ([a]), red = deviant ([i]), and black = difference (deviant minus standard). Negative is plotted downward.

ANOVA testing was first performed on the group data for each distance condition with within-subject factors of hemisphere (left, mid, or right), anteriority (anterior, central, or posterior), and vowel context ([i] or [a]) in the time range [275 ms, 325 ms]. Greenhouse-Geisser and Sidak adjustments were performed as appropriate and are reflected in the results reported here.

At distances 1 and 2, highly significant effects and interactions related to electrode site and context vowel were found. At distance 1, there were significant main effects of hemisphere (F(1.36,19.0)=11.0, p<0.01), anteriority (F(1.78,24.9)=25.5, p<0.001) and vowel context (F(1,14)=14.9, p<0.01), and a hemisphere-anteriority interaction (F(2.04,28.5)=6.42, p<0.01). At distance 2, there were main effects of hemisphere (F(1.75,24.5)=15.2, p<0.001), anteriority (F(1.22,17.1)=14.9, p<0.01) and vowel context (F(1,14)=11.1, p<0.01), and hemisphere-anteriority (F(1.78,24.9)=4.82, p<0.05) and hemisphere-vowel (F(1.83,25.6)=4.48, p<0.05) interactions. The vowel-context effects at both distances were in the expected direction, with greater negativity in the deviant context relative to the standard context, showing up most strongly at frontal midline sites. In the distance-1 condition, this negativity was somewhat stronger in the right hemisphere than in the left, as can be seen in Figure 4.4, but this difference was not significant in a pairwise comparison.
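The Greenhouse-Geisser adjustment applied in the ANOVAs above corrects repeated-measures degrees of freedom when the sphericity assumption is violated. As background, the standard closed-form estimate of the epsilon correction factor can be sketched as follows (an illustration only; the study's analyses were run in standard statistics software):

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for an (n_subjects x k_levels) data matrix.

    Epsilon ranges from 1/(k-1) (maximal sphericity violation) to 1
    (sphericity holds); ANOVA degrees of freedom are multiplied by it.
    """
    k = data.shape[1]
    S = np.cov(data, rowvar=False)      # k x k covariance of the factor levels
    mean_diag = np.trace(S) / k         # mean of the diagonal entries
    grand_mean = S.mean()               # mean of all entries
    row_means = S.mean(axis=1)          # mean of each row
    num = (k * (mean_diag - grand_mean)) ** 2
    den = (k - 1) * ((S ** 2).sum()
                     - 2 * k * (row_means ** 2).sum()
                     + (k ** 2) * grand_mean ** 2)
    return num / den
```

With only two levels of a factor, epsilon is identically 1; this is why no sphericity correction is ever needed for the two-level vowel-context factor, while the three-level hemisphere and anteriority factors can require correction (visible in their fractional degrees of freedom above).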
At distance 3, only the effects of hemisphere and anteriority were significant (F(1.33,18.6)=7.94, p<0.01 and F(1.46,20.5)=10.2, p<0.01, respectively); the effect of context vowel did not reach significance (F(1,14)=2.25, p=0.156). The effects of hemisphere and anteriority again reflect that the greatest voltage differences were seen in the front midline region. However, for distance 3 this difference was in the contrary-to-expected direction, with greater positivity in the deviant context.

Next, ANOVAs with vowel context and electrode as factors were performed on a restricted set of electrode sites (FZ, CZ, PZ, AF3, AF4, FC1, FC2, CP1 and CP2) located in the central midline region, where the MMN is typically expected and where the strongest effects were seen here; these sites will henceforth be referred to as "MMN sites." Results were similar to those found in the earlier ANOVAs, with highly significant outcomes for vowel context seen only at distances 1 and 2, and at best a marginal outcome at distance 3 (results at distance 1 for electrode and context, respectively: F(3.04,42.6)=12.0, p<0.001, and F(1,14)=15.3, p<0.01; for distance 2: F(1.80,25.2)=11.3, p<0.001, and F(1,14)=13.1, p<0.01; for distance 3: F(1.91,26.7)=3.97, p<0.05, and F(1,14)=2.32, p=0.15). Figure 4.6 below shows the topographic distribution of these effects in each distance condition.

Figure 4.6. Topographic distribution of the MMN-like effects at 300 ms from stimulus onset at distances 1 and 2, and the weaker positivity found in that interval for the distance-3 condition, from top to bottom in that order. Units on the scale are microvolts.

4.3.1. Latency

Because the timing of these effects is later than would generally be expected for an MMN effect, a further breakdown of their temporal properties was made.
To this end, ANOVAs like those carried out earlier for the 275-to-325-ms interval were performed over a series of 100-ms intervals, in 50-ms steps from stimulus onset up to 500 ms later. Table 4.1 below shows the outcomes of these tests. Also shown for comparison in the two rightmost columns are (1) the results for the interval from 275 to 325 ms after stimulus onset, where, as noted before, effects were strongest in this study, and (2) results for the interval from 100 to 300 ms, where the MMN is typically expected. For distance 1, significant results begin to be seen relatively early, in the neighborhood of 100 ms. This may indicate that some but not all subjects began exhibiting a differential response to the differently-colored vowels by this time. For distance 2, the results indicate that the effect is concentrated in the later timeframe. As noted earlier, context-related effects in the distance-3 condition tended to be in the contrary-to-expected direction, but these differences were not significant. For distance 1, and perhaps to some degree at distance 2, the effects appear rather prolonged; together with the waveforms shown earlier in Figure 4.5, this raises the question of whether more than one component may be involved, or perhaps just as likely, whether these effects reflect contributions from different subgroups of subjects: some subjects may have a more delayed response to these sub-phonemic contrasts than others do, leading to smearing of the effects when examined at the whole-group level. To examine this possibility further, subjects were next separated into two groups based on the behavioral results discussed in Chapter 3. The results of that investigation are presented in the following section.
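The windowed testing procedure just described can be sketched as follows. For brevity, this illustration substitutes a paired t-test on per-subject mean amplitudes for the full hemisphere-by-anteriority-by-context ANOVA actually used, and the data arrays and names are hypothetical.

```python
import numpy as np
from scipy import stats

FS = 256
EPOCH_START_MS = -200

def window_mean(erps, t0_ms, t1_ms):
    """Per-subject mean amplitude in [t0_ms, t1_ms]; erps is (n_subjects, n_samples)."""
    i0 = int((t0_ms - EPOCH_START_MS) * FS / 1000)
    i1 = int((t1_ms - EPOCH_START_MS) * FS / 1000)
    return erps[:, i0:i1].mean(axis=1)

def sliding_window_tests(deviant, standard, width_ms=100, step_ms=50, last_ms=500):
    """Paired test of deviant vs. standard mean amplitude in successive windows."""
    results = []
    for t0 in range(0, last_ms - width_ms + step_ms, step_ms):
        d = window_mean(deviant, t0, t0 + width_ms)
        s = window_mean(standard, t0, t0 + width_ms)
        t, p = stats.ttest_rel(d, s)
        results.append((t0, t0 + width_ms, float(t), float(p)))
    return results
```

This yields nine windows, 0-100 ms through 400-500 ms; the two summary intervals shown in the rightmost columns of Table 4.1 can be tested with additional calls to window_mean.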
Interval (ms)   0-    50-   100-  150-  200-  250-  300-  350-  400-  275-  100-
                100   150   200   250   300   350   400   450   500   325   300
All sites
  Dist 1        ns    ns    *     ***   **    **    **    *     ns    **    **
  Dist 2        ns    ns    ns    ns    *     *     +     ns    ns    **    ns
  Dist 3        ns    ns    ns    ns    ns    ns    ns    ns    ns    ns    ns
MMN sites
  Dist 1        ns    ns    *     **    **    **    **    *     +     **    **
  Dist 2        ns    ns    ns    ns    *     *     *     ns    ns    **    +
  Dist 3        ns    ns    ns    ns    ns    ns    ns    ns    ns    ns    ns

Table 4.1. Latency results for the entire subject group (n=15), showing the outcome of significance testing of mean amplitude differences between vowel contexts in the indicated time windows. Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001. Also noted are marginal results, where + = p<0.10.

4.3.2. Relationship to behavioral results

For the purposes of this analysis, the group of 15 ERP subjects was broken down into two subgroups, based on their behavioral (d') scores in the study presented in Chapter 3. Recall that the distance-1 task was quite easy, with all subjects performing near ceiling levels, while the distance-3 task was so difficult that few subjects performed at significantly better-than-chance levels. In contrast, the distance-2 task, on which about half of the subjects performed at better-than-chance levels while the rest did not, provides a convenient means for breaking the subject group into two nearly-equal-size subgroups, with seven falling into what will be called the "insensitive" group and the other eight into the "sensitive" group.

Table 4.2 below presents the results of a latency analysis like that whose outcome was shown for the entire subject group in Table 4.1 earlier, but with separate analyses performed for the "insensitive" and "sensitive" subject subgroups. Topographic maps of the responses of the two groups as of 300 ms after stimulus onset are also given below, in Figure 4.7. The results are much different for the two groups, with no significant negative effects seen for the "insensitive" group, even at distance 1.
This is in spite of the fact that all subjects performed at well-above-chance levels in the distance-1 behavioral task; indeed, the "insensitive" and "sensitive" groups did not differ significantly in their performance of the distance-1 task (respective mean d-prime scores = 3.82 and 3.94; t(8.81)=0.36, p=0.73). For distances 1 and 2, the "insensitive" subjects do show a negative trend in the deviant condition, but it is clearly much weaker than that of the "sensitive" group. One other noteworthy difference between the two subject subgroups is the weak but significant positivity shown by "insensitive" subjects prior to 300 ms after stimulus onset in the distance-3 condition and much earlier in the distance-1 condition. The very early onset of these positive effects may be cause for suspicion that they are spurious, although Jääskeläinen et al.'s (2004) "adaptation hypothesis," mentioned earlier, may hint at an alternative explanation. Recall that according to that hypothesis, the MMN is illusory, ultimately being the result of a progressively reduced N1 during the train of standard stimuli, this reduced N1 itself being due to neuronal adaptation to the mutually-similar standard stimuli. Following a similar line of thought, it might be supposed that subjects' MMN response in the present experiment is a combination of a number of subcomponents, some positive and some negative, occurring during the same general timeframe. If so, perhaps the "insensitive" subjects have a greater tendency toward a reduced negative subcomponent in some circumstances, resulting in a net positivity. Of course, this must remain a very tentative hypothesis at this time. In any case, it is interesting that the "insensitive" subjects' ERP response is so different from the others', even in cases where their behavioral results were very similar, as in the distance-1 condition.

Interval (ms)      0-    50-   100-  150-  200-  250-  300-  350-  400-  275-  100-
                   100   150   200   250   300   350   400   450   500   325   300
Sensitive (n=8)
  Dist 1           ns    +     **    **    **    **    **    **    *     **    **
  Dist 2           ns    ns    ns    ns    +     +     ns    ns    ns    *     ns
  Dist 3           ns    ns    ns    ns    ns    ns    ns    ns    ns    ns    ns
Insensitive (n=7)
  Dist 1           (*)   ns    ns    +     ns    ns    ns    ns    ns    ns    ns
  Dist 2           ns    ns    ns    ns    ns    ns    ns    ns    ns    ns    ns
  Dist 3           (*)   (*)   (*)   (*)   (+)   ns    ns    ns    ns    (+)   (*)

Table 4.2. Latency results by subject subgroup, with sensitivity determined by each subject's performance on the behavioral task at distance 2. Outcomes are given for significance testing of mean amplitude differences between vowel contexts in the indicated time windows. Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001; marginal results are noted with + = p<0.10. Results in parentheses indicate an outcome in the contrary-to-expected direction.

Figure 4.7. Topographic distribution of the effects seen at 300 ms after stimulus onset, broken down by subject group. The column at left shows results for the subjects deemed "sensitive" according to their d-prime scores, while the right column gives corresponding results for "insensitive" subjects. The top, middle and bottom rows correspond to distance conditions 1, 2 and 3, respectively. Units on the scale are microvolts.

Another issue relevant to this study is whether the magnitude of listeners' ERP response to deviant stimuli might be correlated with their sensitivity as measured in the behavioral task. Analysis of the data obtained in this study does not find evidence of such a relationship. The correlation of ERP response (measured as the difference in mean amplitude at the designated "MMN" electrode sites between the deviant and standard contexts) and d-prime scores obtained in the behavioral study was not found to be significant. This was so for all combinations of distance-2 d-prime scores as well as d-prime scores averaged over the three distance conditions, and ERP measures as seen in either the [275 ms, 325 ms] time interval or the interval more typically associated with the MMN, from 100 to 300 ms.
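The correlation analysis just described can be sketched as follows, assuming the standard equal-variance signal-detection formula for d-prime used in Chapter 3 (d' = z(hit rate) - z(false-alarm rate)); the data passed in would be per-subject measures, and the function names are illustrative.

```python
from scipy import stats

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: z(hit rate) minus z(false-alarm rate)."""
    return stats.norm.ppf(hit_rate) - stats.norm.ppf(fa_rate)

def erp_behavior_correlation(mmn_amplitudes, d_primes):
    """Pearson correlation between per-subject MMN amplitude and d-prime score."""
    return stats.pearsonr(mmn_amplitudes, d_primes)
```

For example, a hit rate of 0.9 with a false-alarm rate of 0.1 gives a d' of about 2.56; chance performance (equal hit and false-alarm rates) gives a d' of 0.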
Finally, the data collected in the behavioral and ERP studies enable the investigation of two other questions relevant to the relationship between subjects' behavioral sensitivity measures and strength of ERP response.

4.3.2.1. Can ERP results predict behavioral outcomes?

For this analysis, subjects' mean amplitude of MMN response over the interval [275 ms, 325 ms] for the distance-1 condition was examined, together with their behavioral scores in the distance-2 condition. A negative correlation was found, as might be expected (the MMN is by definition negative), but this was not statistically significant (r = -0.25, p=0.37).

4.3.2.2. Are ERP and behavioral responses correlated in general?

Among the 15 ERP study participants, mean amplitude of MMN response was not significantly correlated with behavioral scores in any of the three distance conditions. This may be due to (1) the noise associated with individual EEG data, (2) natural variation among subjects in the strength of MMN response to particular stimuli, or (3) the possibility that the MMN is an incomplete index of perception in this context.18

4.4. Chapter conclusion

This is the first ERP study to investigate the sub-phonemic processing associated with the perception of VV coarticulation. Distance-1 VV coarticulatory effects (i.e., a vowel influencing another vowel across an intervening consonant) were associated with strong MMN-like patterns. Distance-2 effects (a vowel influencing another vowel across three intervening segments) were also associated with a highly significant MMN-like response, though not in the sub-group of subjects considered "insensitive" when tested behaviorally. At distance 3 (VV effects across five intervening segments), the situation was quite different, with at best weakly significant ERP effects that were positive instead of negative, in contrast with the behavioral (d-prime) results, which provided more straightforward evidence of some subjects' perceptual sensitivity.
These results do not provide evidence that ERP methodology can provide a better measure of listeners' sensitivity to coarticulatory effects than behavioral methods offer. However, this study does offer new information about the topography and timing of the processing of such effects, which can inform theories seeking to explain how listeners perceive them.

18 A few individuals did not appear to generate an MMN-type response even in the easier listening conditions, but as the correlation results were similar with or without those subjects' data, their data were included.

PART II: SIGNED-LANGUAGE PRODUCTION AND PERCEPTION

CHAPTER 5 — COARTICULATION PRODUCTION IN ASL

5.1. Introduction

The long-distance coarticulatory effects seen in the spoken-language study described in Part I of this project raise the question of whether such effects might also be found in sign language, a question that appears to be unaddressed in the literature to date, though earlier research has examined some other aspects of coarticulation and related phenomena in ASL. These include Cheek's (2001) work on handshape coarticulation and Mauk's (2003) examination of location-related effects in the context of undershoot. Work by Nespor and Sandler (1999) and Brentari and Crossley (2002) on "h2 spread" is also pertinent here, dealing as it does with another (potentially) long-distance sign phenomenon.

Cheek's (2001) investigation of four signers used an infrared-based motion capture system (Vicon) to investigate handshape-to-handshape (HH) coarticulatory effects in a number of contexts, focusing on the different influences on neighboring signs of signs articulated with the "1" or "5" handshapes. Participants signed sign pairs exemplifying various handshape sequences such as "1"-"1" (e.g. TRUE SMART) and "1"-"5" (e.g. TRUE MOTHER), and the resulting motion-capture data were analyzed for evidence of HH effects.
The metric that was used in determining whether such influence had been exerted was the distance between two markers—one on the tip of the pinky and the other on the base of the hand—which was assumed to be larger when the pinky was more extended (as in the articulation of a "5" handshape) and smaller when the pinky was more curled inward (as for a "1" handshape). Cheek found significant differences between the "1" and "5" handshape contexts, in both the anticipatory and carryover directions. In addition, faster signing conditions were often, though not always, associated with stronger HH effects.

Mauk (2003) also used the infrared-based Vicon motion-capture system and recruited four signers for his study, which investigated the phenomenon of undershoot in both spoken and signed language. The second part of his sign study examined HH effects, and, like Cheek's (2001) study, focused on the different articulatory influences exerted by signs with "1" and "5" handshapes on the handshapes of neighboring signs. The other part of his sign study examined location-to-location (LL) effects, and made use of neutral-space signs as the present study does. Subjects signed sequences consisting of three signs, each of whose specified locations was either neutral space or the forehead. In each sequence, the first and third signs had the same location, which was sometimes the same and sometimes different from that of the second sign. The sequences could therefore be classified as forehead-forehead-forehead (e.g. SMART FATHER SMART), forehead-neutral space-forehead (e.g. SMART CHILDREN SMART), neutral space-neutral space-neutral space (e.g. LATE CHILDREN LATE), or neutral space-forehead-neutral space (e.g. LATE FATHER LATE).
It was expected that neutral-space signs undergoing coarticulatory influence from forehead-location signs would be articulated at a higher vertical position than they would be otherwise, and similarly, that coarticulatory influence of neutral-space signs on forehead-location signs would result in the latter being articulated at a lower vertical position. Mauk found that forehead-location signs tended to exert significant coarticulatory influence on neutral-space signs, but that the reverse did not generally hold, indicating that signs specified for body contact may be less susceptible to coarticulatory influence than neutral-space signs. Like Cheek (2001), Mauk also found that faster signing conditions were generally associated with stronger coarticulatory effects.

The coarticulatory effects found by Cheek and Mauk in their sign studies were influences of signs on immediately preceding or following signs—"distance-1" effects, following the terminology established in earlier chapters of this dissertation. In the present chapter, the possibility that coarticulation may extend over greater distances in the flow of signed language, just as it has been found to do in spoken language, will be investigated. Although such a study has, to the best of my knowledge, not been conducted before, previous studies have shown "h2 spread" to be an example of a relevant sign-language phenomenon which can extend across non-adjacent signs. The term "h2 spread" refers to the fact that in some situations, the non-dominant hand (h2) may assume the handshape and location for which it is specified in a two-handed sign, during the articulation of neighboring one-handed signs. Nespor and Sandler (1999) give examples of h2 spread in Israeli Sign Language, noting that it can extend further than one sign away from the triggering sign, in either the anticipatory or carryover directions, though it is blocked by phonological phrase boundaries.
These conclusions are supported by Brentari and Crossley (2002) in a study of prosody in ASL.

Given the variability in coarticulatory behavior among subjects in the spoken-language study presented earlier, it seems reasonable to suspect that such variation will also be seen among signers, something that the present study also seeks to investigate. In Chapter 2, it was seen in the review of the relevant literature that many factors are relevant as we seek to understand spoken-language coarticulation. The same will no doubt be true of coarticulation in signed language. For example, Lucas, Bayley, Rose and Wulf's (2002) investigation of forehead-to-cheek location changes found that while the location of the preceding sign was an important factor influencing whether such a location shift would occur in a target sign, the grammatical category of that target sign was an even stronger predictor.19 In the speech production study discussed in Chapter 2, where the focus was on interspeaker variation, a relatively small number of contexts were examined and it was acknowledged that some factors relevant to the understanding of coarticulation would not be investigated; the same applies to the sign study to be presented here.

Presented below in Figure 5.1 (repeated from the Introduction) are, at left, the familiar vowel quadrangle and, at right, some typical sign locations. The sign HAT, for example, is articulated on the forehead, while the sign PANTS is articulated by both hands at waist level.20 Shown near the middle of the respective articulatory spaces are schwa and neutral signing space; the latter is labeled "N.S." Neutral space is the area in front of the signer's body which serves as the location for many signs not articulated at particular points on the body. The arrows in the figure represent the expected direction of influence on schwa and neutral space of nearby vowels [i] and [a] in the case of schwa, and of the illustrated sign locations (forehead, shoulder, waist) in the case of neutral space. The present study is motivated by the idea that schwa and neutral space may be somewhat analogous, both in terms of their central position within their respective articulatory spaces and in terms of their coarticulatory behavior. It is important to point out that no claim is being made here (1) that neutral space is in some sense underspecified in the way some researchers have suggested schwa may be (e.g. see Browman & Goldstein, 1992; van Oostendorp, 2003), or (2) that the sign parameter Location is analogous in sign phonology to vowels in spoken-language phonology.

19 For the present study, coarticulatory effects of nouns on verbs will be investigated. Lucas et al. (2002) found that in terms of susceptibility to coarticulatory influence from neighboring signs, nouns and verbs had an intermediate ranking, between function words (the most susceptible) and adjectives (the least).

20 As is customary in the literature on sign language, glosses of ASL signs will be given in capital letters.

Figure 5.1. Position and expected coarticulatory behavior of schwa in vowel space (left) and of neutral space (labeled "N.S.") in the greater signing space.

5.2. Initial study

Recall that for the spoken-language study, seven speakers were investigated and recorded, and these recordings were analyzed so that a suitable subset of them could be modified for use as perception-study stimuli. Subsequent subjects were then investigated with respect to both coarticulatory production tendencies and perceptual sensitivity. For this sign-language study, this procedure was modified.
Because it was expected that fewer ASL signers than English speakers would be recruited, an initial production study with one signer was conducted to determine the types and magnitudes of effects that might be expected, (1) in preparation for the full-scale production study, which is discussed later in this chapter, starting in Section 5.3, and (2) so that suitable stimuli could be created immediately for the sign-language perception study, which will be described in detail in Chapter 6. The latter point was particularly important because it is not possible to perform the kind of resynthesis on video stimuli that was relatively straightforward to carry out in the creation of the speech-study stimuli. Therefore, an initial production study was advantageous in establishing appropriate norms—both in terms of spatial magnitude and in terms of "distance" as the term was used in the speech study (i.e. across how many intervening signs effects might be seen)—for creating stimuli for the sign perception study. In addition, while it was very easy to recruit English users for the speech study, deaf signers are a much smaller pool of potential subjects, so the procedure followed in the speech study—obtaining production data from several language users, creating stimuli from some subset of recordings from that group, and then having succeeding subjects do both production and perception tasks—was not ideal for the sign study. Hence, the following basic procedure was followed instead. First, this initial production study with one signer was conducted. The results indicated what kinds of effects might in general be expected, both in terms of spatial magnitude and across how many intervening signs. This information was then used in planning the full-scale production and perception studies, in both of which all subsequent subjects participated.
5.2.1. Methodology
5.2.1.1.
Signer 1
The sole participant in the initial production study was a female native signer of ASL who was an employee in the laboratory where the study was conducted and freely agreed to take part. Importantly, because of her native-signer status and her role as the first participant in the experiment, her intuitions regarding ASL syntax and particular lexical items—explained below—set the precedents that later subjects were asked to follow. She will be referred to throughout the discussion of the sign experiments as Signer 1, with other signers to be introduced presently.
5.2.1.2. Task
Figure 5.2. The locations of the context signs FATHER and MOTHER relative to neutral signing space (labeled "N.S."), which is the location where the sign WANT is articulated.
Randomized lists containing five copies of each of the following two ASL sentences, interspersed among 20 filler sentences, were used. According to Signer 1, without the second occurrence of the pronoun "I" the sentences would not seem natural in ASL.
"I WANT GO FIND FATHER I."
"I WANT GO FIND MOTHER I."
The location of the context signs FATHER and MOTHER (the chin or forehead; see Figure 5.2 above) served as the context location, while the location of the neutral-space sign WANT was the target location, corresponding to the distance-3 condition in the spoken-language study. The sign WANT is a particularly convenient target item because its articulation includes a lowering and pulling-back movement toward the signer which is very easily spotted in the motion-capture data. This is so for other signers as well, not only Signer 1. The coarticulatory effects of the context signs' location on the location of the distance-1 and -2 signs FIND and GO will not be examined here because they did not have clear motion-capture signatures like that of WANT. The location of WANT in a typical utterance is shown below in Figure 5.3.
Figure 5.3. The location of the target sign WANT in a typical utterance.
The signs MOTHER and FATHER are a minimal sign pair, formed with the same handshape ("5") and movement (two taps of the thumb against the body), but at different locations. MOTHER is articulated on the chin, while FATHER is articulated on the forehead, as illustrated earlier in Figure 5.2. The preceding three signs in these sentences—FIND, GO and WANT—are all articulated in neutral signing space; it is expected that when such signs are articulated in the FATHER context, they may be positioned higher on average than in the MOTHER context. The first and last sign of each sentence, I, is articulated on the chest.
Figure 5.4. Position of the two ultrasound markers on the hand and the one marker on the neck.
5.2.1.3. Motion capture data recording
The signer signed these sentences while seated, with two ultrasound "markers" (emitters) attached to the back of the dominant hand and one reference marker attached to the front of the neck, as shown above (on a different person) in Figure 5.4. The same marker configuration was used for this and all later subjects. The ultrasound signals were detected with a set of microphones located approximately 750 cm away (Zebris system CMS-HS-L with MA-HS measuring unit; data collection performed with WinData software). This system uses triangulation to determine the position in three-dimensional space of each marker at a given moment; this spatial information is recorded at a 60 Hz sampling rate with 0.1 mm spatial precision. To obtain relativized coordinates, the coordinates of the neck marker were subtracted from those of the wrist marker, since absolute coordinates would tend to change if the signer shifted her body position, while relativized coordinates should be more stable.21 Figure 5.5 below shows how the x-, y- and z-directions are defined in this study. Smaller or greater x-, y- and z-values correspond respectively to the directions left or right, back or forward, and down or up, relative to the signer.
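The relativization step just described—subtracting the neck marker's coordinates from the wrist marker's, frame by frame, and then shifting z so that zero falls at roughly waist level (see footnote 21)—can be sketched as follows. This is a minimal illustration only, not the study's actual processing code; the function name and the toy coordinate values are invented for the example.

```python
def relativize(wrist_frames, neck_frames, z_offset=0.0):
    """Subtract the neck reference marker from the wrist marker, frame by
    frame, so that shifts of the signer's whole body do not contaminate
    sign locations; z_offset then moves the z origin down to roughly
    waist level, as described in footnote 21."""
    out = []
    for (wx, wy, wz), (nx, ny, nz) in zip(wrist_frames, neck_frames):
        out.append((wx - nx, wy - ny, wz - nz + z_offset))
    return out

# Two toy frames of (x, y, z) marker positions in cm, sampled at 60 Hz
wrist = [(30.0, 5.0, 95.0), (30.5, 5.2, 96.0)]
neck = [(28.0, 4.0, 130.0), (28.0, 4.1, 130.2)]
rel = relativize(wrist, neck, z_offset=40.0)
```

The relativized values stay stable even if both markers drift together, which is the point of using the neck marker as a reference.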
While signing, the signer was videotaped in addition to being recorded with the motion-capture equipment, so that the sign sentences could be inspected later. This was necessary in order to ensure that motion-capture data from utterances containing performance errors likely to be problematic for the purposes of a coarticulation study—such as false starts or long mid-sentence pauses on the part of the signer—would not be included in the analysis. The sentences that the signer needed to sign were presented on a computer screen 36 inches in front of the signer. The signer began the articulation of each sentence with her hands in her lap, pressed a button on a keyboard (located at the nearest edge of the same table that the computer screen rested on) in order to see the next sentence that she was to sign, signed that sentence, returned her hands to her lap, and again pressed the button to proceed to the next sentence. The sentences were organized in blocks, with the same sentence frames in each block and the context signs randomly ordered within blocks.
21 The wrist marker generally provided cleaner data than the back-of-hand marker, in the sense of having fewer cases of occlusions, signal drop-out and other problems, and hence the analyses presented here use relativized wrist position rather than relativized back-of-hand position. Because z-values relativized to the neck are negative for positions below the neck and positive for those above, a constant was added to all relativized z-values for each subject to reset the z=0 position to approximately waist level.
Figure 5.5. Definition of x, y, and z dimensions relative to signer.
5.2.2. Results and discussion
Figure 5.6 below shows the z-coordinate (altitude) of the wrist of Signer 1's dominant hand during the articulation of two sentences of the form "I WANT GO FIND (X) I," where (X) is a context sign MOTHER or FATHER. Time is shown along the horizontal axis with successive labels 1 second apart.
The sentences shown in the figure have context words MOTHER and FATHER respectively (the intervening filler sentences had other context signs not discussed here). The overall up-then-down pattern of each sentence reflects the movement of the signer's dominant hand, first from the lap to the chest (for the sign "I") and neutral space region (WANT GO FIND), then to its highest point on the chin or forehead (for MOTHER or FATHER), and finally back down to the chest area for "I" and then to the subject's lap. Each of the two arrows pointing toward the small zigzags near the start of each of those two sentences indicates a local minimum characterizing the sign WANT, which is articulated with both hands facing palms-up in neutral space making a slight pulling motion down and toward the signer. It is the z-coordinate at this local minimum that will be compared between contexts; it is expected that in general, it will have a greater value in sentences whose context signs are located higher on the subject's body, as happens to be the case in the particular instantiations of the MOTHER and FATHER sentences shown in Figure 5.6. It will be seen presently that this characterization of WANT (i.e. as having a dip in the z-dimension) held true in general for all of the signers who took part in the sign production studies, though with certain differences in detail that will be explored in the discussion of the main sign production experiment in Section 5.3.
Figure 5.6. The relativized z-position (height in cm) of Signer 1's wrist during the articulation of two ASL sentences of the form "I WANT GO FIND (X) I." The arrows indicate the local minima characteristic of the sign WANT; these minima were taken as defining that sign's spatial location for the purposes of analysis. For signs other than WANT and the context signs MOTHER and FATHER, the labels in the figure can only be considered approximate indications of sign position.
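The WANT dip described above amounts to locating a local minimum in the 60 Hz z-trace. A minimal sketch of such a search—an invented helper for illustration, not the analysis procedure actually used in the study—might look like this:

```python
def local_minima(z):
    """Return indices i where z[i] is below its left neighbor and no higher
    than its right neighbor, i.e. candidate dips like the local minimum
    characterizing WANT in the relativized z-trace (sampled at 60 Hz)."""
    return [i for i in range(1, len(z) - 1)
            if z[i] < z[i - 1] and z[i] <= z[i + 1]]

# Toy z-trace in cm: the dip at index 3 is a WANT-like local minimum
trace = [10.0, 9.0, 8.2, 8.0, 8.5, 12.0, 15.0]
# local_minima(trace) → [3]
```

In practice one would restrict the search to the stretch of the utterance where WANT is expected and verify candidates against the video, as the text describes.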
Such z-minima for WANT were found very consistently in the sentences signed by Signer 1. Because their position in the z-dimension was constrained by particular upper and lower bounds (namely, the locations of the surrounding signs), their physical range was relatively narrow and no rejection or replacement of outliers was needed. For this initial study as well as for the main sign production study, a trial could be rejected for one of two reasons. First, if the video recording of the signing session showed that the signer made an obvious error or false start during a sentence, that trial was rejected. For example, this would be the case if, in signing the sentence "I WANT GO FIND MOTHER I," the subject signed "I WANT GO FIND," then moved the hand to the forehead and began to sign FATHER before realizing her mistake and finishing the sentence with "MOTHER I." For the entire pilot sign study, data for 65 sentences signed by Signer 1 were obtained, data for five of which were rejected for such reasons, a 7.7 percent rejection rate. A second possible reason for rejecting a trial was when a signer reduced the target sign—which in the pilot study was always WANT—so much that the sign's characteristic z-minimum was not visible. Any decision to reject trials was of course not ideal, and was made only as a last resort. It should be noted that these z-minima were present in many cases in which the signing of WANT was so rapid in the flow of signed sentences that it was hard to discern as a separate sign even in the video recording. Since the video had a lower temporal resolution—29.97 frames per second—than the motion-capture data, which were sampled at 60 Hz, the video recordings were no more useful than the motion-capture data in seeking to recover "lost" trials. However, these two kinds of data did complement each other.
In particular, video recordings for cases of missing z-minima almost always showed that in those cases, the signer had actually "contracted" the target sign with neighboring signs like "I." These cases are of enough interest that they will be discussed separately when the results of the main experiment are described. Signer 1 did not contract signs in this way, and so no such rejections were necessary for her trials. Table 5.1 below gives the average z-value of the local minimum defining the sign WANT in the contexts FATHER and MOTHER, together with the significance-testing outcome using a paired t-test. Paired t-testing was done to guard against the possibility that neutral signing space might drift slightly over the course of the experiment, being more similar for adjacent or near-adjacent utterances.22 Therefore, the pairings were made between z-values for WANT in the first FATHER and MOTHER sentences, in the second such pair, and so on through the fifth. In the case of missing data, as when a trial was rejected, non-paired t-tests were performed instead. The results show that the context-related difference in height was in the expected direction (a higher altitude for WANT in the context of FATHER, the sign articulated higher on the body), and significant at the p<0.01 level. Therefore, not only did LL coarticulation occur, but it did so across two intervening signs. Following the terminology established in the earlier-discussed speech studies, such sign-to-sign effects will be referred to as distance-3 effects, and likewise for distances 2 (effects across one intervening sign) and 1 (adjacent sign-to-sign effects). Another signing session was then run with the same signer and following the same methodology, but with other context signs.
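The pairing scheme just described (first FATHER sentence matched with first MOTHER sentence, and so on) is an ordinary paired t-test on repetition-matched z-values. A minimal sketch follows, using invented illustrative numbers rather than the study's data:

```python
import math

def paired_t(a, b):
    """Paired t statistic on repetition-matched z-values; pairing guards
    against slow drift of neutral space over the session. Returns (t, df).
    When a rejected trial breaks the pairing, the text's fallback is an
    unpaired test instead."""
    assert len(a) == len(b), "pairing broken; use an unpaired test"
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

# Illustrative z-values only (not the study's data)
t, df = paired_t([8.9, 8.5, 8.7, 8.6, 8.8], [7.1, 7.0, 7.3, 7.2, 7.0])
# t ≈ 17.2 with df = 4, well past the two-tailed 1% critical value (≈ 4.60)
```

With five repetitions per context, df = 4, so even modest mean differences can reach significance when the paired differences are consistent.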
Since MOTHER and FATHER are located relatively close together in signing space, it was felt that pairs of signs articulated as far apart as possible in signing space might yield stronger context-related effects, just as [i] and [a] were chosen as targets of study in the speech study because of their great distance in vowel space.
22 This issue will be examined more closely in Section 5.3.3.
The other context pairs that were tested were DEER-RUSSIA (a minimal or near-minimal sign pair; both are 2-handed signs articulated with "5" handshapes, the former at the temples and the latter at the waist), and HAT-PANTS (signed with a "B" handshape on the head, and with an open-closing-to-bent "B" at the waist, respectively). The same sentence frame, I WANT GO FIND (X), was used. The results for those sign pairs are also given in Table 5.1. The result for each of these two context pairs is a trend in the expected direction which approaches but does not reach significance.

Context              Average z value (cm) [SD]   Significance test results
FATHER (forehead)    8.65 [1.43]
MOTHER (chin)        7.12 [1.28]                 p < 0.01
HAT (head)           17.1 [1.97]
PANTS (waist)        16.0 [2.47]                 p = 0.102
DEER (head)          16.7 [0.92]
RUSSIA (waist)       15.4 [0.77]                 p = 0.104

Table 5.1. Average relativized z-position (height) of the sign WANT in various contexts at distance 3, with results of significance testing between context pairs also given.23
An investigation of LL effects in other directions than the z-dimension, and at other distances, was also conducted. To investigate forward-back (y-dimension) and
23 The apparent difference in z-values between the first word pair (where z = approx. 8 cm) and the others (where z = approx. 16 cm) arises because the data for the final two word pairs were collected in a different recording session than for the FATHER-MOTHER pair.
Since these z-values are recorded with respect to an arbitrary point in three-dimensional space which is approximately but not exactly the same between recording sessions, comparisons are valid between items in each context word pair but not between words in different pairs.
right-left (x-dimension) effects, the behavior of WANT in the context of the signs BOSS, CLOWN and CUPCAKE was also examined. These form a near-minimal sign triple, all formed with a "bent-5" or "claw" handshape, and are located respectively on the shoulder, nose and upright palm of the non-dominant hand (held in neutral space). The locations of the context signs HAT, PANTS, DEER, RUSSIA, CLOWN, BOSS and CUPCAKE are illustrated below in Figure 5.7. In some of the signs shown in the figure, contacts at specified locations are aided by noticeable accommodation by parts of the body other than the dominant hand. For example, in the figure it can be seen that HAT and DEER are articulated with the head tilted forward to meet the hand(s). While this may result in a somewhat shorter travel distance on the part of the hand(s) in such situations, it must be kept in mind that the coarticulatory effects of a given context item are being compared to those of another context item, and that these pairings were made between items located far from each other in signing space. For example, the height (z-value) of the dominant hand in signing HAT is still quite high compared to that for PANTS, regardless of whether the head is tilted forward while HAT is signed. Therefore, coarticulatory effects should not be expected to be substantially reduced in such situations compared to what they would be if no such accommodations occurred.
Figure 5.7. Starting on the top row, going left to right, the locations of the context signs HAT, PANTS, DEER, RUSSIA, CLOWN, BOSS and CUPCAKE are shown.
Analysis of coarticulation in all of these contexts indicates that effects like those shown in Table 5.1 for height may also be found for side-to-side and front-back location coarticulation. Table 5.2 below shows results at distance 1 for these two context sign pairs and one sign triple. The sign WANT is again the target sign, this time in the distance-1 condition, in the sentence "I WANT (X) I," where the (X) represents the context item.

Context                Average z (up-down)     Significance test results
HAT (head)             17.8 [1.5]
PANTS (waist)          15.7 [1.9]              p = 0.08
DEER (head)            16.3 [0.6]
RUSSIA (waist)         14.5 [1.7]              p < 0.05

Context                Average x (right-left)   Average y (front-back)
BOSS (rt. shoulder)    36.5 [1.1]               0.20 [1.1]
CLOWN (nose)           34.5 [1.1]               1.41 [1.3]
CUPCAKE (N.S.)         35.6 [0.7]               1.43 [0.4]
                                                x: p < 0.01, y: p < 0.05

Table 5.2. Average relativized values in cm of the x (right-left), y (front-back) and z (up-down) dimensions of the sign WANT in various contexts at distance 1, with results of significance testing between contexts also given. Standard deviations are given in brackets. These measurements are based on the position of the wrist marker on the signer's dominant hand.
The z-related effects for the context word pairs HAT-PANTS and DEER-RUSSIA are stronger here than they were at distance 3: context-related differences at distance 1 are in the neighborhood of 1.5-2 cm, rather than the 1 cm or so that was typical at distance 3. Since BOSS is articulated further to the right than CUPCAKE or CLOWN, one might expect that WANT in the BOSS context would be articulated farthest to the right, which did in fact occur, as indicated in the second column of Table 5.2.
Similarly, the third column in the table shows that WANT in the BOSS context was articulated in a less frontward position than in the CUPCAKE or CLOWN contexts, which one might expect given that the right shoulder is further back in the y-dimension than either the nose or neutral space.24 This chapter now continues with a discussion of the main sign production study. How the results of the initial sign production study were used to inform the sign perception study will be presented in Chapter 6, along with a complete presentation of that perception study.
5.3. Main study
The results of the initial study indicated that LL coarticulatory effects occur between adjacent signs as well as over wider ranges. For the main production study, the same basic methodology was followed as in the initial study, but with some modifications to the sets of sentence frames and context items.
24 In contrast to these results, there were a small number of context-related effects which were in the contrary-to-expected direction. No clear pattern emerged in such cases which would account for these findings, but other researchers in sign phonetics have sometimes seen similar instances of dissimilatory behavior that also resisted easy explanation (e.g. Claude Mauk and Martha Tyrone, p.c.). Some evidence of dissimilatory behavior was also seen occasionally in the signers who took part in the main sign production study, as will be seen presently.
5.3.1. Methodology
5.3.1.1. Subjects
Four other deaf participants took part in this study. All were residents of Northern California who were recruited through advertisements or word of mouth and were paid for their participation. All were uninformed as to the purpose of the study. Relative to the spoken-language study participants and their use of English, these signers' backgrounds were quite varied in terms of their ages of acquisition of ASL and in many aspects of their usage of ASL.
Therefore, some additional detail concerning their individual backgrounds and signing behavior will be presented both here and throughout the rest of this chapter. Table 5.3 below gives some basic demographic information concerning the five subjects who took part in the initial and main sign studies, starting with the signer from the initial study. Continuing the consecutive numbering of study participants established in earlier chapters, these individuals would be considered Subjects 39 through 43, but for convenience they will generally be referred to subsequently as Signers 1 through 5.

Subject         Gender   Age   Handedness   Age of Acquisition of ASL
39 (Signer 1)   f        35    R            Native
40 (Signer 2)   f        38    L            Native
41 (Signer 3)   m        37    L            Late (~high school)
42 (Signer 4)   m        40    R            Early (~age 3)
43 (Signer 5)   f        33    R            Early (~early childhood)

Table 5.3. Demographic information on the five signers who participated in the sign production studies. Signer 1 was the sole participant of the initial sign study.
5.3.1.2. Task
Each of the four signers was asked to perform essentially the same task that was performed by Signer 1 in the initial production study just described, but with six repetitions of each sentence instead of five, in the hope of increasing statistical power without increasing the overall duration of the task too greatly. Some changes were also made in the set of sentence frames and context signs. First, since the initial study had found significant results as far out as distance 3, the possibility of effects at greater distances was investigated in the main study by constructing an additional sentence frame, I WANT GO FIND OTHER (X), to create a distance-4 condition.
Figure 5.8. The beginning and ending points of the target sign WISH.
Second, an additional target sign, WISH, was examined; this sign is articulated with a "C" handshape, palm toward the signer, with a downward motion on the chest, which results in a trajectory in the z-direction, and a characteristic local minimum, similar to those of WANT. Therefore, target signs in neutral space (WANT) as well as on the body (WISH) could be investigated in the same study. The starting and ending points of WISH in a typical utterance are illustrated above in Figure 5.8. The sets of sentence frames and context signs that were used in the main study are shown in Table 5.4, with (X) representing the context sign in the sentence frames.

Distance   Sentence frame for WANT       Sentence frame for WISH
1          I WANT (X).                   I WISH (X).
2          I WANT FIND (X).
3          I WANT GO FIND (X).           I WISH GO FIND (X).
4          I WANT GO FIND OTHER (X).

Sign       Handshape (Palm Orientation)   Location            Movement
HAT        B (down)                       Forehead            Tap head twice
PANTS      B/Bent B (down)                Waist or thighs     Flick fingertips twice
BOSS       Bent 5 (down)                  Shoulder            Tap shoulder twice
CUPCAKE    Bent 5 (down)                  Non-dominant palm   Tap palm twice, no rotation
CLOWN      Bent 5 (back)                  Nose                Tap nose twice, no rotation

Table 5.4. Sentence frames and context signs used in the main sign production study.
The set of context items used in the main production study was {HAT, PANTS, BOSS, CUPCAKE, CLOWN, <red>, <green>}. An explanation of the context items <red> and <green> will be given presently. The target sign WISH was not investigated for all four distance conditions, mostly as a way to shorten the overall duration of the task, which had been found in pilot testing to be taxing with all combinations of sentence frames and context signs that were initially used. For similar reasons, the context sign pair DEER-RUSSIA was not included in the task for signers in the main study.
An additional reason for omitting this pair was that the form of the sign for RUSSIA used in this experiment is a historically older variant; most signers now use a newer variant which is not articulated near the edge of signing space and thus was not a useful candidate for the sorts of comparisons being made here. During pilot testing for the main experiment, use of the historically older form often resulted in apparent false starts or in outright substitutions with the modern variant, as shown below in Figure 5.9.
Figure 5.9. Newer variant of RUSSIA, signed here by accident instead of the older variant.
In addition, two non-linguistic actions were also added to the set of context items, so that a comparison of linguistic and "non-linguistic" coarticulation might be made. The goal here was to create sign-like actions articulated at locations spanning the vertical range of signing space, just as the locations of the context signs HAT and PANTS do. The question then was whether coarticulatory effects in this "non-linguistic" condition would be similar to those in the linguistic condition for context signs spanning a similar distance in the articulatory space.
Figure 5.10. The apparatus used for the non-linguistic contexts <red> and <green>, performed by flipping the top and bottom switches on the device, respectively. The distance between the two switches as shown here is approximately 24 inches.
These non-linguistic actions, to be referred to as <red> and <green>, were performed by flipping one of two switches attached to a vertically oriented bar-like holder, shown in Figure 5.10 above, that was braced to the same table on which the computer screen and keyboard were resting.
This holder was positioned in front of the signer's dominant hand, with one switch at the holder's high end and the second switch located at its low end directly below the first, a height difference of approximately two feet; this distance was adjusted slightly for each subject, as explained below. Flipping the top switch turned on a red light (and hence this non-signing action was termed the <red> context), and flipping the bottom switch turned on a green light (the <green> context). During the course of the signing task, the instruction to flip the top switch was given by a red upward-pointing arrow, and likewise for the bottom switch and a green downward-pointing arrow. This red and green pattern was chosen in order to create a task that was non-linguistic but nevertheless intuitive, and would not require special training or a linguistic "go" signal on each trial, as it was assumed subjects would be familiar with the orientation of red and green lights on traffic signals. The vertical position of each switch was adjustable, so that the distance between them could be modified for each signer. This was done in order to make their separation proportional to each signer's own signing space, this proportion being defined in terms of the signer's height: the distance from the <red> location to the <green> one for each signer was set at one-third of that individual's height, with the <green> switch positioned just under the table, at lap level, and the <red> switch at head level with the subject seated, thus approximating the span of the locations of the linguistic context items PANTS and HAT. During the course of signing the sentence-frame + context-item combinations, <red> and <green> were treated like all of the other (i.e. sign) items.
In other words, signers were instructed to embed them in the appropriate sentence frame, so that, for example, in the distance-3 condition with WANT as target sign and <red> as context item, the subject signed "I WANT GO FIND," immediately flipped the top switch, finished the sentence with the resumptive "I," and put her hands in her lap in preparation for signing the next sentence. The task was organized into six blocks, one for each of the sentence frames given earlier in Table 5.4. Since each frame was articulated with eight possible context items repeated six times each, 48 sentences were signed in each block, for a total of 288 sentences signed by each subject. The task therefore required continuous and repetitive signing for on the order of 25 minutes, not including breaks. For Signers 1 and 2, the task was not organized with one frame per sentence block; instead, there was frequent shuffling of sentence frames within blocks. However, this meant that if a subject decided to quit the task with a substantial portion of the trials left uncompleted, some or all sentence frames would have been repeated fewer than the desired six times per context item, perhaps weakening the statistical results. In fact, Signer 2 became fatigued by the task and did not complete it, though fortunately this occurred near the end of the task. By having later subjects complete all repetitions of a given sentence frame in one block before moving on to another frame in the next block, this issue was bypassed. Signers were instructed to sign at a comfortable, natural rate and to avoid overly slow or formal signing. Instructions similar to those given to speech study participants were given here: signers were asked to imagine they were in an informal social situation, that someone had asked what they wanted, and that for whatever reason, the answer was "I WANT GO FIND CUPCAKE" (or whatever the sentence to be signed in that trial was).
Signers were given a brief warm-up task so that they would be familiar with the basic format of the real experiment—the sentence frames, context signs, non-linguistic context items, and so on. In some cases, which will be described in detail later, subjects' preferred signs for some items like HAT or PANTS were completely different from those that had been established by Signer 1, and in such cases the warm-up task provided an opportunity to politely ask subjects to employ the signs that were being used in this experiment. In such cases it was emphasized that this was not a matter of their preferred forms being "wrong" but rather of particular sign forms having been chosen for a good reason which would soon be made clearer. (As with the speech study, subjects were given an explanation of the purposes of the overall study between their completion of the production task and their commencement of the perception task.) During the actual experiment, if subjects made substitutions of one sign for another (e.g. in cases where a signer's preferred form of HAT or PANTS was a variant different from the form chosen for this experiment), polite reminders were given not to do so. On the other hand, if the desired signs were being performed but in a reduced or otherwise slightly modified form (e.g. because of assimilation of the locations of adjacent signs), no such corrections were made, since variation of this sort reflected the kind of phonetic/phonological behavior that this project sought to investigate. It will be seen presently that such behavior was quite pronounced in some cases, to the point that it was inconvenient from the point of view of statistical analysis. Nonetheless, it seemed preferable that subjects not become self-conscious of their behavior at the phonetic level, lest their articulation become slower, more formal, or otherwise less natural than in their usual signing.
5.3.2.
Results and discussion
Discussion of the results to be presented here is more involved than was needed for the initial sign study, for a number of reasons. Most importantly, there was a great deal of variability among signers in their signing behavior, particularly with respect to the amount of reduction or assimilation that was seen for certain signs, as well as in other ways that will also be discussed presently. For individual signers, there was much variability in signing behavior even for particular items in particular contexts. Although comparisons between modalities are necessarily inexact, such variability seems more pronounced in the signing data than in the speaking data, a theme that will recur in the remainder of this dissertation. For group analyses of the main sign study data, a normalization procedure was used, similar to the one based on Gerstman (1968) that was applied to speakers' raw formant values in carrying out group analysis of the speech studies discussed earlier. Here, starting with a given signer's raw motion-capture value z_raw, the corresponding normalized value is given by the formula
z_norm = 999 * (z_raw - z_min) / (z_max - z_min),
where z_max and z_min are the assumed highest and lowest values of z-locations in that signer's signing space. Each signer's z_min was an averaged z-value of the signer's hand in the lap position between signed sentences during the recording session (essentially the same height as that of the switch used in the <green> condition), while z_max was that value plus one-third the signer's height, the same value as that which had been used to determine the span between the switches the signers flipped in the <red> and <green> contexts. The same scaling factor was used to normalize x- and y-values, with the normalized values centered around the mean x- and y-coordinates of the sign WANT across all articulations of that sign in a given distance condition and sentence frame.
For left-handed Signers 2 and 3, the x-values obtained from the formula above were subtracted from 999, essentially providing a “mirror image” of those values in the x-dimension, so that testing for rightward or leftward effects would be consistent across all four signers. The procedure therefore has the effect of scaling each signer’s motion-capture coordinates relative to the size of his or her own signing space, making comparisons between signers more reasonable—though, as with spoken-language normalization procedures, not unproblematic. Because of the scaling factor used, one normalized unit is equivalent to a spatial distance on the order of half a millimeter, varying somewhat for each signer and each dimension x, y and z. Group results using these normalized articulatory-space measures are presented in the next section, which is followed by four sections, each devoted to the results for an individual signer.

5.3.2.1. Group results

ANOVAs with context item as factor were used to compare context-related differences for each sentence frame in each of the x-, y- and z-directions, in cases where a difference in a particular direction could be predicted from the relative positions of the context items. These predictions were as follows. For the pairs HAT–PANTS, <red>–<green>, CLOWN–BOSS, BOSS–CUPCAKE, and CLOWN–CUPCAKE, it was expected that the first context item in each pair as just listed would be associated with larger z-values (i.e. greater height) than the second item. For BOSS–CLOWN and BOSS–CUPCAKE, greater x-values (i.e. more rightward position) were expected for target items in the context of BOSS; and a greater y-position (more frontward) was expected for targets in the context of the first item in each of the pairs CUPCAKE–BOSS, CUPCAKE–CLOWN and CLOWN–BOSS.
The approximate locations of these items are illustrated in Figure 5.11 below, as points on the body (left) and as points in an idealized three-dimensional space (right). The locations of these items also indicate their expected direction of coarticulatory influence on nearby neutral-space signs.

Figure 5.11. Idealized locations on the body and in three-dimensional space of the seven context items, which determine the expected direction of coarticulatory influence these items should have on preceding target signs.

The group and individual quantitative results will be given in both numerical and pictorial form, with the latter following the same basic layout as the three-dimensional schematic presented in Figure 5.11. The perspective used in the schematic, a front-left view, matches that of the video recordings that were made of signers during the experiment, images from which are interspersed throughout this chapter. The three axes indicate the left-right (x), back-front (y) and up-down (z) directions. For left-handed Signers 2 and 3, the sign BOSS was articulated on the left shoulder, so the expected coarticulatory influence of that sign is in the negative-x direction (i.e. to the left from the point of view of the signer). As mentioned before, these handedness-related differences in the x-direction were corrected for the purposes of group testing, but in the subsequent presentation of individual subjects’ numerical results the original x-values are used. Table 5.5 below gives the numerical results of the sign study averaged over Signers 2 through 5, given in normalized coordinates. The table is divided into six subsections, each giving the results for one particular sentence frame, as indicated below each subsection.25 Each of the six subsections is accompanied by a three-dimensional diagram like that in Figure 5.11, placed just above the numerical results.
In these diagrams, values have been scaled independently for each of the three spatial dimensions. This was done to maximize the contrasts of interest, but also results in some distortion of the relationships of the points within the space as a whole. It should be borne in mind that while the three-dimensional diagram in Figure 5.11 represents a volume approximately equivalent to the upper half of the signer’s body, the diagrams in Table 5.5 represent much smaller regions—namely those in which the target signs WANT and WISH were articulated in the various contexts—typically spanning distances on the order of several centimeters. Standard deviations are not given here but were generally on the order of 50 normalized units, or about two to three centimeters. Unexpectedly, none of the differences in target-sign location between context pairs was statistically significant. However, in 37 of the 60 context pairs that were examined, trends were in the expected direction, an outcome that can itself be considered significant, since it would be less than 5% likely if context-related differences were purely random. In addition, despite the absence of significant effects between particular context pairs, other trends were apparent. For example, differences in target-sign position related to the HAT–PANTS context pair were in the expected direction in all sentence frames but one, the distance-2 condition with target sign WANT. On the other hand, results related to the nonlinguistic context pair <red>–<green> were completely mixed.

25 Data for Signer 1 were not included in these group results because that signer had participated in multiple recording sessions, using somewhat different sentence frames, numbers of repetitions, and contexts than were used in sessions with later signers. An overview of results for that subject was given in Section 5.2.2.
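The chance argument here is a simple sign-test calculation: if each of the 60 context pairs were equally likely to trend either way, the probability of at least 37 expected-direction trends follows from the binomial distribution. The sketch below illustrates that reasoning and is not code from the study itself.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k
    expected-direction trends out of n pairs under a fair coin."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# 37 of 60 pairs in the expected direction gives a tail probability
# below 0.05; 33 of 60 (reported later for Signer 2) does not.
```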
For the set BOSS–CLOWN–CUPCAKE, it was expected that BOSS should be associated with the greatest x-values; this was the case in all conditions but the first sentence frame (I WANT (X)), in which the values for the BOSS–CUPCAKE pair are in the contrary-to-expected direction, though the magnitude of their difference in this case is minimal. Finally, in the y-direction, where it was expected that values associated with the BOSS context should be smallest, the expected trends are more or less evident for distances 1, 2 and 3 with target sign WANT, modulo some noise on the order of 10 normalized units, about half a centimeter.

Distance 1, WANT target: I WANT (X)
[Diagram: idealized locations of the seven context items for this frame, as in Figure 5.11.]
  HAT–PANTS      z: 382 – 367
  <red>–<green>  z: 370 – 390
  BOSS–CLOWN     x: 515 – 503   y: 551 – 543   z: 361 – 355
  BOSS–CUPCAKE   x: 515 – 518   y: 551 – 572   z: 361 – 365
  CLOWN–CUPCAKE  y: 543 – 572   z: 355 – 365

Distance 2, WANT target: I WANT FIND (X)
  HAT–PANTS      z: 365 – 379
  <red>–<green>  z: 375 – 356
  BOSS–CLOWN     x: 531 – 514   y: 535 – 551   z: 387 – 373
  BOSS–CUPCAKE   x: 531 – 516   y: 535 – 557   z: 387 – 373
  CLOWN–CUPCAKE  y: 551 – 557   z: 373 – 373

Distance 3, WANT target: I WANT GO FIND (X)
[Diagram: idealized locations of the seven context items for this frame, as in Figure 5.11.]
  HAT–PANTS      z: 397 – 359
  <red>–<green>  z: 404 – 367
  BOSS–CLOWN     x: 508 – 503   y: 483 – 494   z: 390 – 357
  BOSS–CUPCAKE   x: 508 – 503   y: 483 – 478   z: 390 – 353
  CLOWN–CUPCAKE  y: 494 – 478   z: 357 – 353

Distance 4, WANT target: I WANT GO FIND OTHER (X)
  HAT–PANTS      z: 395 – 326
  <red>–<green>  z: 367 – 391
  BOSS–CLOWN     x: 518 – 502   y: 499 – 462   z: 360 – 382
  BOSS–CUPCAKE   x: 518 – 518   y: 499 – 513   z: 360 – 341
  CLOWN–CUPCAKE  y: 462 – 513   z: 382 – 341

Distance 1, WISH target: I WISH (X)
  HAT–PANTS      z: 302 – 234
  <red>–<green>  z: 296 – 250
  BOSS–CLOWN     x: 488 – 467   y: 492 – 481   z: 287 – 307
  BOSS–CUPCAKE   x: 488 – 480   y: 492 – 485   z: 287 – 292
  CLOWN–CUPCAKE  y: 481 – 485   z: 307 – 292

Distance 3, WISH target: I WISH GO FIND (X)
[Diagram: idealized locations of the seven context items for this frame, as in Figure 5.11.]
  HAT–PANTS      z: 303 – 300
  <red>–<green>  z: 308 – 335
  BOSS–CLOWN     x: 478 – 473   y: 465 – 452   z: 331 – 285
  BOSS–CUPCAKE   x: 478 – 476   y: 465 – 459   z: 331 – 316
  CLOWN–CUPCAKE  y: 452 – 459   z: 285 – 316

Table 5.5. Numerical results for the main signing study, with locations of target signs averaged and compared between contexts. Values are given in normalized x-, y- and z-coordinates. No context-related differences were found to be significant.

Notwithstanding these trends, the overall absence of significant group effects is very different from what was observed in the speech studies, where very strong effects at distances 1 and 2 were seen, along with weaker effects at distance 3. As mentioned earlier, substantial variation among these signers in many aspects of their signing was observed, so a more detailed investigation of the behavior of individual subjects will be conducted than was the case for the speakers in the English production study. Some of these differences, such as differences in lexicon or syntax, were mentioned earlier and will be discussed in more detail in the individual results sections which follow.

5.3.2.2. Signer 2

Figure 5.12. Motion-capture data for two of Signer 2’s sentences in the distance-1 condition with target sign WISH (i.e., sentence frame I WISH (X) I), with context items DEER and CUPCAKE. Clear z-minima (circled) are present in both sentences, typical of the pattern in the motion-capture data of this and other signers for WISH. The scale on the y-axis is in centimeters but the values themselves are arbitrary. The x-axis indicates approximate time in seconds.

Signer 2, like Signer 1, was a native signer.
The sign forms she preferred were similar to those established for the study by Signer 1, though her signing behavior was somewhat different, particularly in that Signer 2 showed more reduction of the sign WANT as the recording session progressed, so that in some trials the characteristic z-minimum for that sign was either much subtler or entirely absent. In such cases, the spatial coordinates of WANT could not be determined and the trial had to be rejected. These factors, together with some cases of problematic signing behavior such as errors, pauses and false starts—all of which were seen to some degree in later signers as well—resulted in 12.7 percent of completed trials being rejected for this participant. Signer 2 expressed some fatigue with the task and asked to end the session early, though fortunately this occurred after almost all of the planned trials had already been completed. For this signer and for the others, rejected trials tended to occur less frequently for the target sign WISH, whose characteristic drop in the z-dimension was much more robust across contexts for all of the signers, as illustrated in Figure 5.12 above, which shows motion-capture data for two of Signer 2’s sentences, I WISH DEER26 and I WISH CUPCAKE. The z-dimension pattern for CUPCAKE shows three local minima, the first of which corresponds to WISH and the other two of which are signatures of the two downward contacts made by the dominant hand onto h2 in the articulation of CUPCAKE. Notice also that the z-minimum for WISH in the DEER context is lower than that in the CUPCAKE context, which is the reverse of the expected situation, since DEER is articulated at a higher location. It will be seen that in the sign study, significant context-related differences tended to be in the expected direction; however, contrary-to-expected results were also seen, a marked difference from what was observed in the speech study.
26 The context signs DEER and RUSSIA were used in the version of the experiment in which Signers 1 and 2 took part, but as discussed earlier, the signing of RUSSIA proved problematic. Therefore, this pair was excluded from later signing sessions and no other data for the DEER or RUSSIA contexts will be presented.

Figure 5.13. Locations in 3-space of the seven context items, indicating the expected direction of coarticulatory influence these items would have on preceding target signs. For this left-handed signer, the location of BOSS is on the left shoulder instead of the right, so coarticulatory effects in the context of that sign are expected to be in the negative x-direction as well.

Table 5.6 gives average x-, y- and z-coordinates for WANT and WISH for this signer in the various contexts in which they were examined, along with the results of statistical testing for context-related differences. The layout of the table and the accompanying diagrams is the same as in the group results section in Table 5.5. One such tabular summary will be given for each of the four signers who took part in the main sign production study. In these tables, significance testing results are given in each case only for context pairs for which a prediction could be made, as was done for the group results presented earlier. In these cases, significance testing was performed using one-tailed t-tests, since the expected direction of the effects was predictable from the relative locations of the context signs. Figure 5.13 above shows the expected direction of coarticulatory effects for this signer, who was left-handed. Consistent with notation established previously in the spoken-language studies, significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001.27 A plus sign + indicates a marginally significant result, with p<0.10.
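One-tailed testing of this kind can be illustrated with the standard two-sample pooled-variance (Student) t statistic; the sketch below is illustrative only, with invented names and data, and is not the analysis code used in the study. A one-tailed test reaches significance only when the statistic falls in the predicted direction.

```python
from statistics import mean, stdev


def pooled_t(a, b):
    """Two-sample pooled-variance t statistic for the difference in
    means of samples `a` and `b` (df = len(a) + len(b) - 2).  In a
    one-tailed test of 'a higher than b', only t > 0 can reach
    significance; the p-value then comes from the t distribution."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
```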
In cases where contrasts other than those of interest (e.g. HAT vs. PANTS in the y-direction) were significant at the p<0.05 level using two-tailed t-testing, the numerical values and testing outcome are given in square brackets. Results inconsistent with predicted results (i.e. in the contrary-to-expected, or dissimilatory, direction) are given if significant at the p<0.05 level or stronger. Such cases are indicated with parentheses around the star(s) indicating significance; e.g. (**) indicates an effect in the counter-to-expected direction that is significant at the p<0.01 level. Complete numerical results for Signers 2 through 5, with both means and standard deviations given for all contexts and sentence frames, are given in Appendix G. The numerical data for each sentence frame are accompanied by a diagram like those presented earlier with the group results. In each diagram, contrast-item pairs that were associated with significance testing outcomes of p<0.05 or better in at least one dimension x, y or z in that sentence frame are joined by a red line. In the case of marginal significance (p<0.10), items are joined by a green line, and significant outcomes in the contrary-to-expected direction are joined by a dotted black line.

27 In Chapter 2 on the spoken-language production study, statistical testing results were given without Bonferroni correction for multiple tests so that differences among individual subjects and at various distances would be seen more clearly; results here will also be given without Bonferroni correction. The issue of false-positive results will be discussed presently.
Distance 1, WANT target: I WANT (X)
  HAT–PANTS      z: 10.2 – 10.3
  <red>–<green>  z: 6.32 – 6.39
  BOSS–CLOWN     x: 16.1 – 15.5   y: 6.29 – 6.32   z: 8.38 – 8.77
  BOSS–CUPCAKE   x: 16.1 – 14.5   y: 6.29 – 7.47 +   z: 8.38 – 9.67
  CLOWN–CUPCAKE  y: 6.32 – 7.47 *   z: 8.77 – 9.67

Distance 2, WANT target: I WANT FIND (X)
  HAT–PANTS      z: 13.3 – 11.7
  <red>–<green>  z: 11.7 – 12.1
  BOSS–CLOWN     x: 14.4 – 13.5   y: 7.59 – 9.00   z: 11.7 – 14.1
  BOSS–CUPCAKE   x: 14.4 – 14.6   y: 7.59 – 8.22   z: 11.7 – 12.6
  CLOWN–CUPCAKE  y: 9.00 – 8.22   z: 14.1 – 12.6

Distance 3, WANT target: I WANT GO FIND (X)
  HAT–PANTS      z: 13.2 – 11.0 *
  <red>–<green>  z: 13.3 – 12.8
  BOSS–CLOWN     x: 12.7 – 13.9 *   y: 9.27 – 8.95   z: 12.9 – 11.6
  BOSS–CUPCAKE   x: 12.7 – 12.9   y: 9.27 – 8.58   z: 12.9 – 11.8
  CLOWN–CUPCAKE  y: 8.95 – 8.58   z: 11.6 – 11.8

Distance 4, WANT target: I WANT GO FIND OTHER (X)
[Diagram: idealized locations of the seven context items for this frame.]
  HAT–PANTS      z: 13.7 – 12.9
  <red>–<green>  z: 12.5 – 13.0
  BOSS–CLOWN     x: 11.5 – 12.4   y: 9.85 – 8.56   z: 11.8 – 13.6
  BOSS–CUPCAKE   x: 11.5 – 11.8   y: 9.85 – 9.48   z: 11.8 – 12.5
  CLOWN–CUPCAKE  y: 8.56 – 9.48   z: 13.6 – 12.5

Distance 1, WISH target: I WISH (X)
[Diagram: idealized locations of the seven context items for this frame.]
  HAT–PANTS      z: 10.1 – 6.61
  <red>–<green>  z: 10.8 – 2.94
  BOSS–CLOWN     x: 10.4 – 10.6   y: 9.90 – 9.15   z: 9.40 – 8.89
  BOSS–CUPCAKE   x: 10.4 – 10.8   y: 9.90 – 10.2   z: 9.40 – 9.21
  CLOWN–CUPCAKE  y: 9.15 – 10.2   z: 8.89 – 9.21

Distance 3, WISH target: I WISH GO FIND (X)
[Diagram: idealized locations of the seven context items for this frame.]
  HAT–PANTS      z: 10.1 – 12.4
  <red>–<green>  z: 12.2 – 13.0
  BOSS–CLOWN     x: 9.23 – 11.1   y: 12.0 – 12.2   z: 13.0 – 13.1
  BOSS–CUPCAKE   x: 9.23 – 10.4   y: 12.0 – 11.8   z: 13.0 – 12.2
  CLOWN–CUPCAKE  y: 12.2 – 11.8   z: 13.1 – 12.2

Table 5.6. Numerical results for Signer 2, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context. Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally significant result, with p<0.10.
In each of the accompanying diagrams, contrast-item pairs associated with significance testing outcomes of p<0.05 or better are joined by a red line; for marginally significant results (p<0.10), items are joined by a green line. Out of the 60 context pairs tested, 33 showed differences that were in the expected direction, more than would be expected by chance, but not significantly so. The results shown in Table 5.6 for Signer 2 show no evidence of strong coarticulation patterns, with only a few significant testing outcomes, a pattern that will be seen to hold generally for all of the signers investigated in this study. One apparent reason for this is the relatively large amount of variability in these signers’ behavior, with respect to a range of factors including signing speed, preferred lexical forms, and amount of reduction and place assimilation, together with the relatively large variances associated with the quantitative measures of spatial location. This was often so even for particular signers articulating particular target items in particular sentence frames. The consequence of all of this, from a statistical testing perspective, is an outcome quite different from what was seen in the spoken-language study, where close-distance effects were nearly ubiquitous among speakers and longer-distance effects were also quite common.

5.3.2.3. Signer 3

Figure 5.14. Motion-capture data for two of Signer 3’s sentences, I WANT HAT I and I WISH GO FIND BOSS I. The clear z-minima (circled in the figure) for WANT and WISH are typical for Signer 3. The scale on the y-axis is in centimeters but the values themselves are arbitrary. The x-axis indicates approximate time in seconds.

Signer 3 was a late learner of ASL, and completed the production task with great enthusiasm.
His signing, accordingly, had a somewhat more emphatic character throughout the recording session than that of the other signers, and showed little or no reduction of the target signs WANT and WISH. The local minimum in the z-dimension characterizing those signs was therefore quite evident in almost all trials, as Figure 5.14 above illustrates, and relatively few of this signer’s trials were rejected (8 of 288, or 2.8%). Signer 3’s preferred form of the context sign CLOWN used a twisting motion in front of the nose rather than non-rotational contact as in the form of the sign used by Signer 1. Signer 3’s preferred form of CUPCAKE also incorporated rotation of the dominant hand. In addition, this signer indicated that he would not have chosen spontaneously to repeat the “I” pronoun at the end of the sentence. Since it was desirable for individual sign forms as well as for whole sentences to be consistent among study participants, Signer 3 was asked to follow the precedents that had been established by Signer 1. Signer 3 signed at a somewhat slower rate than the other signers who took part in this study, but when reminded during the first break that informal, relatively fast signing was acceptable for the purposes of the experiment, he indicated he was in fact signing at a speed that was natural and comfortable for him.

Figure 5.15. An instance of Signer 3 signing BOSS with an upper-chest location.

An additional feature of this participant’s signing that became more apparent as the task progressed was seen in the context sign BOSS, for which this signer frequently used a location on the upper chest instead of on the shoulder, as illustrated in Figure 5.15. Presumably, this was a kind of reduction bringing the location of the sign closer to neutral space, where the preceding signs were formed; more discussion of this issue will be given in Chapter 7.
This signer was left-handed and so for him, as for Signer 2, the expected coarticulatory influence of BOSS on the target signs was leftward. Table 5.7 below gives a summary of significance testing between contexts for this signer.

Distance 1, WANT target: I WANT (X)
  HAT–PANTS      z: 13.8 – 14.5
  <red>–<green>  z: 16.1 – 16.8
  BOSS–CLOWN     x: 6.15 – 8.42 **   y: 21.2 – 15.2   z: 13.9 – 15.3 +
  BOSS–CUPCAKE   x: 6.15 – 6.50   y: 21.2 – 22.6   z: 13.9 – 14.6
  CLOWN–CUPCAKE  y: 15.2 – 22.6 +   z: 15.3 – 14.6

Distance 2, WANT target: I WANT FIND (X)
  HAT–PANTS      z: 11.1 – 13.1
  <red>–<green>  z: 11.0 – 13.2
  BOSS–CLOWN     x: 5.53 – 6.77   y: 17.7 – 17.3   z: 11.5 – 11.2
  BOSS–CUPCAKE   x: 5.53 – 6.34   y: 17.7 – 20.1 *   z: 11.5 – 11.9
  CLOWN–CUPCAKE  y: 17.3 – 20.1   z: 11.2 – 11.9

Distance 3, WANT target: I WANT GO FIND (X)
  HAT–PANTS      z: 14.1 – 13.5
  <red>–<green>  z: 16.2 – 15.0 +
  BOSS–CLOWN     x: 8.52 – 8.86   y: 10.4 – 8.58   z: 13.9 – 13.5
  BOSS–CUPCAKE   x: 8.52 – 7.62   y: 10.4 – 14.4   z: 13.9 – 12.3
  CLOWN–CUPCAKE  y: 8.58 – 14.4 +   z: 13.5 – 12.3

Distance 4, WANT target: I WANT GO FIND OTHER (X)
  HAT–PANTS      y: [ 20.0 – 11.2 * ]   z: 13.5 – 10.9 **
  <red>–<green>  z: 13.1 – 13.1
  BOSS–CLOWN     x: 6.32 – 7.90   y: 16.2 – 12.1   z: 12.5 – 12.0
  BOSS–CUPCAKE   x: 6.32 – 5.77   y: 16.2 – 19.5   z: 12.5 – 12.0
  CLOWN–CUPCAKE  y: 12.1 – 19.5 +   z: 12.0 – 12.0

Distance 1, WISH target: I WISH (X)
  HAT–PANTS      z: 3.84 – 3.07 +
  <red>–<green>  z: 4.09 – 1.89 *
  BOSS–CLOWN     x: 12.1 – 11.9   y: 7.39 – 7.49   z: 5.46 – 4.07 (*)
  BOSS–CUPCAKE   x: 12.1 – 10.9   y: 7.39 – 6.86   z: 5.46 – 3.59 *
  CLOWN–CUPCAKE  x: [ 11.9 – 10.9 * ]   y: 7.49 – 6.86   z: 4.07 – 3.59

Distance 3, WISH target: I WISH GO FIND (X)
  HAT–PANTS      x: [ 12.5 – 15.8 ** ]   z: 9.24 – 11.0
  <red>–<green>  z: 7.39 – 11.5 (**)
  BOSS–CLOWN     x: 14.1 – 13.9   y: 3.67 – 4.62 +   z: 9.54 – 9.28
  BOSS–CUPCAKE   x: 14.1 – 14.8   y: 3.67 – 2.96   z: 9.54 – 9.90
  CLOWN–CUPCAKE  y: 4.62 – 2.96   z: 9.28 – 9.90

Table 5.7. Numerical results for Signer 3, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context.
Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally significant result, with p<0.10. Significant outcomes in the contrary-to-expected direction are given in parentheses. In each diagram, contrast-item pairs associated with significance testing outcomes of p<0.05 or better are joined by a red line; for marginally significant results (p<0.10), items are joined by a green line; and for significant outcomes in the contrary-to-expected direction, items are joined by a dotted black line.

As with Signer 2, the results for Signer 3 do not much resemble those seen in the speech study; for example, stronger effects are not particularly evident at closer distances relative to greater ones, and there is a general lack of very strong (p<0.001) effects. In addition, some evidence of dissimilatory behavior can be seen. Overall, this signer showed trends in the expected direction in only 25 of the 60 context pairs that were examined, somewhat fewer than the 30 that would be expected by chance.

5.3.2.4. Signer 4

Figure 5.16. Motion-capture data for four of Signer 4’s sentences. This signer differed from the others in having z-minima for “I” as well, so it is the second z-minimum in each sentence (circled in the figure) that marks this signer’s articulation of the target signs WANT or WISH. The scale on the y-axis is in centimeters but the values themselves are arbitrary. The x-axis indicates approximate time in seconds.

Signer 4, though not a native signer, learned ASL at an early age. Like Signer 3, this participant tended to sign fairly slowly, which might be due to the fact that many of his preferred sign forms were different from those used in this experiment. Many of these differences appear to be due to dialectal variation; this signer is African American and originally came from Texas.
The most relevant of these signing behaviors from a data analysis perspective was his signing of “I,” which he articulated with a concave-down arc trajectory in the sagittal plane, resulting in a slight descent during the last part of the sign. His motion-capture data therefore had a local z-minimum for “I” followed by another for the target sign, WANT or WISH, as illustrated above in Figure 5.16; it was thus the second z-minimum in each sentence that was taken as marking the target signs in the motion-capture data for this signer. Other notable aspects of this subject’s signing behavior included the following. For CLOWN, this signer’s preferred form used a twisting motion at the nose like the sign form preferred by Signer 3. This signer would normally finger-spell “cupcake.” His sign for “pants” uses two consecutive motions in neutral space, each with two hands facing, which trace out the shape of left and right pant legs; this is therefore a completely different form from Signer 1’s. He would also normally use a different sign for HAT, using two hands and a motion like that of putting on a cap.

Figure 5.17. Signer 4’s preferred form of the sign GO.

In addition, Signer 4’s preferred form of the sign GO differed from that of all the other signers. As illustrated in Figure 5.17, he began his articulation of this sign somewhat like the other signers did, with “1” handshapes on both hands and with the palms facing forward or to the side, but rather than moving toward a palms-down position with the fingertips pointing forward, this signer moved his hands in a right-to-left direction, ending with the fingertips pointing leftward. The signer was not asked to conform his signing of this particular item to that of the other signers. This signer also indicated that he would not normally choose to use the resumptive “I” pronoun in these sentences.
Despite these differences, only 4.5% of his trials needed to be rejected because of signing errors or other issues. This signer’s productions of WANT and WISH were almost always very clear in the motion-capture data, as exemplified in Figure 5.16 above. A summary of significance testing results between contexts for Signer 4 is given below in Table 5.8.

Distance 1, WANT target: I WANT (X)
  HAT–PANTS      z: 14.0 – 13.5
  <red>–<green>  z: 13.3 – 12.8
  BOSS–CLOWN     x: 11.7 – 11.4   y: 15.2 – 17.5   z: 11.8 – 11.9
  BOSS–CUPCAKE   x: 11.7 – 12.9   y: 15.2 – 17.6   z: 11.8 – 12.4
  CLOWN–CUPCAKE  y: 17.5 – 17.6   z: 11.9 – 12.4

Distance 2, WANT target: I WANT FIND (X)
  HAT–PANTS      z: 11.6 – 13.2
  <red>–<green>  z: 13.4 – 12.1
  BOSS–CLOWN     x: 14.5 – 10.1 *   y: 15.9 – 16.8   z: 13.2 – 10.4
  BOSS–CUPCAKE   x: 14.5 – 11.6 +   y: 15.9 – 17.8   z: 13.2 – 10.6 +
  CLOWN–CUPCAKE  y: 16.8 – 17.8   z: 10.4 – 10.6

Distance 3, WANT target: I WANT GO FIND (X)
  HAT–PANTS      z: 10.4 – 8.03
  <red>–<green>  z: 10.9 – 10.4
  BOSS–CLOWN     x: 11.0 – 10.2   y: 11.0 – 15.0   z: 11.0 – 11.6
  BOSS–CUPCAKE   x: 11.0 – 9.64 *   y: 11.0 – 7.96   z: 11.0 – 9.63 *
  CLOWN–CUPCAKE  y: 15.0 – 7.96   z: 11.6 – 9.63 ***

Distance 4, WANT target: I WANT GO FIND OTHER (X)
  HAT–PANTS      x: [ 9.31 – 7.57 * ]   z: 9.57 – 9.57
  <red>–<green>  z: 10.3 – 9.80
  BOSS–CLOWN     x: 8.26 – 9.13   y: 6.69 – 7.22   z: 8.65 – 10.3 +
  BOSS–CUPCAKE   x: 8.26 – 10.1   y: 6.69 – 8.74   z: 8.65 – 8.81
  CLOWN–CUPCAKE  y: 7.22 – 8.74   z: 10.3 – 8.81

Distance 1, WISH target: I WISH (X)
  HAT–PANTS      x: [ 3.53 – 9.27 * ]   y: [ 12.1 – 8.63 ** ]   z: 9.16 – 7.57
  <red>–<green>  y: [ 11.6 – 9.08 * ]   z: 8.81 – 7.45
  BOSS–CLOWN     x: 8.73 – 5.07 +   y: 13.0 – 12.2   z: 4.25 – 8.20 *
  BOSS–CUPCAKE   x: 8.73 – 5.84 +   y: 13.0 – 12.7   z: 4.25 – 7.26
  CLOWN–CUPCAKE  y: 12.2 – 12.7   z: 8.20 – 7.26

Distance 3, WISH target: I WISH GO FIND (X)
[Diagram: idealized locations of the seven context items for this frame.]
  HAT–PANTS      z: 4.17 – 5.70
  <red>–<green>  z: 4.80 – 5.84
  BOSS–CLOWN     x: 8.07 – 9.05   y: 9.71 – 9.60   z: 4.00 – 3.82
  BOSS–CUPCAKE   x: 8.07 – 9.43   y: 9.71 – 9.77   z: 4.00 – 5.06
  CLOWN–CUPCAKE  y: 9.60 – 9.77   z: 3.82 – 5.06

Table
5.8. Numerical results for Signer 4, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context. Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally significant result, with p<0.10. In each diagram, contrast-item pairs associated with significance testing outcomes of p<0.05 or better are joined by a red line; for marginally significant results (p<0.10), items are joined by a green line.

Signer 4’s results show a relatively small number of significant outcomes, although the number of trends in the expected direction, 37 out of 60 context pairs, is significantly greater than would be expected by chance (p<0.05). In addition, one distance-3 effect was the most strongly significant result seen in the sign production study, and was the only one which would remain significant under a Bonferroni correction (more discussion of the significance results as a whole will be given later). Again, this pattern of results is unlike that of the speech study; here, no significant distance-1 results for target sign WANT were found, and even the trends at near distances were not always in the expected direction.

5.3.2.5. Signer 5

Figure 5.18. Motion-capture data for two of Signer 5’s sentences in the distance-4 condition with target sign WANT (i.e., sentence frame I WANT GO FIND OTHER (X) I), with context signs CUPCAKE and HAT. Strong reduction of WANT has resulted in the absence of the expected z-minimum, a fairly frequent occurrence for this signer. The scale on the y-axis is in centimeters but the values themselves are arbitrary. The x-axis indicates approximate time in seconds.

Signer 5 was another early, though not native, signer of ASL.
Her natural signing speed was rapid, and her signs were often greatly reduced, as illustrated in Figure 5.18 above, which shows motion-capture data for two of this signer’s sentences in which the characteristic z-minimum for WANT was not present. Video recordings showed that the sequence “I WANT” as signed by this signer often appeared to be combined into a single form, essentially a sign analog of spoken-language contractions like “I’d” for “I would” (more discussion of this point is forthcoming). The same was true to some extent of the sequence “I WISH,” though this did not occur as frequently as for “I WANT.” Figures 5.19 and 5.20 below are frame-by-frame sequences showing these sign contractions in two of this signer’s utterances. They were taken from the longer sequences “I WANT GO FIND BOSS” and “I WISH <green>,” respectively. In Figure 5.19, it can be seen that the “1” handshape is never attained for the sign “I”; instead, the contact with the chest specified for that sign is made with the hand already spread as for the following sign, WANT. In fact, the signer’s fingers already appear to be anticipating the subsequent sign GO at the same moment, which would make this a case of distance-2 HH coarticulation (or distance-1, if the contracted form is counted as just one sign form). Also notable is the fact that only the dominant hand was used in signing this entire utterance. Crucially, the reduced form of WANT used here is articulated without the small lowering motion (i.e. z-minimum) used to establish the location of that sign (as confirmed by an inspection of the motion-capture data), meaning that this trial had to be rejected from the statistical analysis.
The frame-by-frame sequence in Figure 5.20 shows a contracted form of the sequence “I WISH.” Again, the “1” handshape for the sign “I” is completely absent, evidently due to assimilation with the following “C” handshape for WISH.28

28 The sequence being signed here, “I WISH <green>,” required that the signer next reach down to flip a switch as described earlier (the switch is not visible in the figure), in order to articulate the non-linguistic context item <green>. In such cases, signers sometimes made one smooth descent downward during the sequence “WISH <green>,” so that no z-minimum could be established for the target sign WISH.

Figure 5.19. Contracted form of the sequence “I WANT” signed by Signer 5, shown frame by frame, going from left to right and from the top row to the bottom row.

Figure 5.20. Contracted form of the sequence “I WISH” signed by Signer 5, shown frame by frame, going from left to right and from the top row to the bottom row.

In trials like these, the motion-capture data lacked the z-minimum characteristic of the target signs WANT or WISH. Although this signer’s preferred sign forms did not differ from those used in this experiment (though like some of the other signers she did indicate that she would not normally sign these sentences with the resumptive “I”), her signs showed much greater variation in whether and how much they were reduced, often exhibiting much stronger reduction than was the case for the other signers. Because of this, more trials were rejected for her than for the other signers: fully 30.6% of her trials were rejected. This may well have weakened the statistical testing outcomes for Signer 5’s data, which are presented below in Table 5.9.

Figure 5.21. At left, a form of PANTS often seen in Signer 5’s signing, with location in neutral space instead of at the waist or legs. At right, a lowered form of HAT.
Also noteworthy is this subject’s signing of PANTS, which she often performed with the expected handshape and movement, but in neutral space rather than at waist level.29 This is illustrated in Figure 5.21 above, together with a form of HAT whose location is at the cheek instead of the top of the head. Like Signer 3’s signing of BOSS, these forms seemed to have resulted from signers’ efforts (whether conscious or otherwise) to make efficient transitions between neutral-space signs and the following context signs. The existence of sign “contractions,” like those for “I WANT” and “I WISH” just seen, raises the question of whether such forms might become regularized at some point, just as contractions in English are permitted in some cases but not others (cf. “he would,” “that guy would,” “you are not,” “I am not,” and the corresponding he’d, that guy’d, you’re not or you aren’t, and finally I’m not but *I amn’t). A study examining a larger number of contexts might find evidence of rules governing such forms as they currently exist in ASL; the variability in their use here suggests that multiple factors are likely to be involved (cf. the Lucas, Bayley, Rose & Wulf (2002) study, mentioned earlier). This complexity can be illustrated by looking at another instance of a reduction of WANT by this signer. In this utterance, the relevant part of which is shown frame by frame in Figure 5.22 below, she articulated the WANT in “I WANT FIND PANTS” with palms-down orientation, having apparently assimilated the palm-down orientation specified for the following sign FIND (and for PANTS as well). The signing of WANT with palms down is visible in the rightmost two frames in the top row, between the more clearly articulated signs I and FIND shown in the preceding and following frames. Interestingly, in the final frame shown in the sequence (which is not consecutive with the earlier frames), the signer’s articulation of PANTS is shown, and in this utterance, PANTS is not articulated in neutral space as it often was in this signer’s other utterances.

29 This signer was not asked to “correct” this behavior because it seemed to be a spontaneously-occurring modification of the sign chosen for use in this study, rather than a different lexical item. Nevertheless, because such articulations of PANTS by this signer did not have the “low” location needed to make the HAT-PANTS contrast viable, such trials were excluded from the analysis. For the distance-1 condition with target sign WANT, this left no PANTS trials available for analysis.

Figure 5.22. Orientation assimilation of WANT in the sentence I WANT FIND PANTS, visible in the third and fourth images on the top row. The entire sequence is ordered frame by frame from left to right, from the top row to the bottom row. The very last image in the sequence (at lower right) is not consecutive with the preceding frame and shows the location of PANTS in this utterance.

Distance 1, WANT target: I WANT (X)
Context pair     x               y              z
HAT-PANTS                                       8.63 – n.a.
<red>-<green>                                   7.81 – 7.72
BOSS-CLOWN       15.2 – 13.5     11.7 – 15.2    9.75 – 7.46
BOSS-CUPCAKE     15.2 – 10.6 *   11.7 – 10.8    9.75 – 8.85
CLOWN-CUPCAKE                    15.2 – 10.8    7.46 – 8.85

Distance 2, WANT target: I WANT FIND (X)
Context pair     x               y              z
HAT-PANTS                                       11.8 – 10.3
<red>-<green>                                   13.8 – 8.82 +
BOSS-CLOWN       11.6 – 13.7     10.1 – 12.4    12.6 – 9.91
BOSS-CUPCAKE     11.6 – 12.2     10.1 – 9.11    12.6 – 11.5
CLOWN-CUPCAKE                    12.4 – 9.11    9.91 – 11.5

Distance 3, WANT target: I WANT GO FIND (X)
Context pair     x               y              z
HAT-PANTS                                       11.7 – 12.5
<red>-<green>                                   15.9 – 11.7
BOSS-CLOWN       12.8 – 13.3     11.8 – 11.5    10.7 – 10.6
BOSS-CUPCAKE     12.8 – 11.6 +   11.8 – 8.39    10.7 – 10.7
CLOWN-CUPCAKE                    11.5 – 8.39    10.6 – 10.7

Distance 4, WANT target: I WANT GO FIND OTHER (X)
Context pair     x               y              z
HAT-PANTS                                       12.8 – 9.98 +
<red>-<green>                                   14.6 – 12.0
BOSS-CLOWN       14.6 – 12.1 +   12.7 – 10.2    9.61 – 11.0
BOSS-CUPCAKE     14.6 – 10.5 *   12.7 – 8.89    9.61 – 8.09
CLOWN-CUPCAKE                    10.2 – 8.89    11.0 – 8.09 *

Distance 1, WISH target: I WISH (X)
Context pair     x               y              z
HAT-PANTS                                       6.87 – 6.18
<red>-<green>                                   5.30 – 6.28
BOSS-CLOWN       13.2 – 11.0 *   13.8 – 12.2    6.99 – 6.57
BOSS-CUPCAKE     13.2 – 12.3     13.8 – 13.1    6.99 – 6.87
CLOWN-CUPCAKE                    12.2 – 13.1    6.57 – 6.87

Distance 3, WISH target: I WISH GO FIND (X)
Context pair     x               y              z
HAT-PANTS                                       6.17 – 3.78 * [11.7 – 9.24 * ]
<red>-<green>                                   3.82 – 7.44 (*)
BOSS-CLOWN       11.9 – 12.2     12.7 – 12.0    6.88 – 5.42
BOSS-CUPCAKE     11.9 – 11.0 *   12.7 – 11.4    6.88 – 4.66 *
CLOWN-CUPCAKE                    12.0 – 11.4    5.42 – 4.66

Table 5.9. Numerical results for Signer 5, given in centimeters for the x-, y- and z-dimensions, relativized and averaged for each context. Significant results are noted, with * = p<0.05, ** = p<0.01 and *** = p<0.001. A plus sign + indicates a marginally significant result, with p<0.10. Significant outcomes in the contrary-to-expected direction are given in parentheses.
In each diagram, contrast-item pairs associated with significance testing outcomes of p<0.05 or better are joined by a red line; for marginally significant results (p<0.10), items are joined by a green line; and for significant outcomes in the contrary-to-expected direction, items are joined by a dotted black line. In 27 of the 60 context pairs that were examined for this signer, trends were in the expected direction, slightly fewer than the 30 that would be expected by chance. As with the previous signers, the effects that were seen were not particularly strong, nor did they follow a clear pattern, such as being stronger at closer distances. In fact, effects seemed relatively strong for this signer at distance 4, where two significant and two marginally significant results were obtained, and where trends in the x- and z-directions were all in the expected direction. Overall, the results for this signer provide further evidence of substantially different coarticulatory behavior between the speaker and signer groups.

5.3.3. Other aspects of intersigner variation

A recurring theme in this sign study has been the relatively great amount of variation among subjects in various aspects of their language use, even compared to that seen in the spoken-language study. Among the signers who took part in this ASL study, we have noted differences in age of acquisition of the language, preferred sign forms and syntactic structure, and the occurrence of assimilations, reductions and “contractions.” Here, some additional measures of intersubject variation will be presented. Table 5.10 gives a summary of some information discussed earlier, along with a rough measure of signing speed for each signer.30 While all signers made occasional errors during the performance of the production task (e.g.
reaching for the bottom switch instead of the top one to perform the <red> context item), relatively few trial rejections were associated with these sorts of occurrences for any of the signers. In contrast, great variation was seen in different subjects’ signing with respect to place assimilation and sign reduction, most of all for Signer 5. This type of variation did not appear to be related to signers’ age of acquisition, repeated for convenience in Table 5.10, as Signers 3 and 4 both showed little or no such tendency even though their ages of acquisition were quite different. On the other hand, signing speed does seem likely to have been a relevant factor, since faster signers tended to have more reductions, and hence more trials rejected because of missing z-minima, than slower signers.

30 The measure of signing speed used here was the mean sentence-to-sentence interval when that duration was under 5 s, over the entire set of sentences signed by a given speaker. While this includes intersentence pauses and hence overestimates the actual sentence duration somewhat, the resulting values accord well with impressionistic judgments of which signers were faster or slower.

Signer   Age of acquisition   Percent of trials   Errors or       Extreme reductions/   Sentence-to-sentence
         of ASL               rejected            false starts?   contractions?         duration (s)
1        Native               7.7                 Few             Some31                3.6
2        Native               12.7                Some            Some                  3.8
3        Late                 2.8                 Few             No                    4.2
4        Early                4.5                 Few             No                    4.0
5        Early                30.6                Some            Pervasive             2.9

Table 5.10. Some measures of intersigner differences.

It was noted earlier that paired t-testing was performed in the statistical comparisons between contexts because of the possibility that a signer’s target point for a sign articulated in neutral signing space might tend to meander, or “drift,” during the course of the experiment. Here, a quantification of that drift is made.
The measures that are used are (1) the standard deviations over all trials of the x-, y- and z-values of the target signs (WANT or WISH), and (2) the standard deviation and range of rolling 7-averages of those x-, y- and z-values. The rolling averages are taken over seven trials in order to smooth out differences related directly to the various context items. A summary of these measures is given below for each signer in Table 5.11.

31 Such occurrences were sometimes seen for this signer in other pilot tests whose data are not presented here.

          SD (cm) of all trials    SD (cm) of rolling 7-average    Range (cm) of rolling average
Signer    x      y      z          x      y      z                 x       y       z
1         2.6    1.4    1.9        2.2    0.9    1.4               7.5     4.4     6.3
2         2.7    2.0    2.9        1.8    1.5    1.8               7.3     7.2     8.0
3         3.8    7.5    4.4        3.0    6.1    3.8               10.8    22.4    14.1
4         3.6    5.0    3.6        2.0    3.7    2.9               9.2     14.4    11.5
5         2.6    2.7    3.6        1.4    1.3    2.6               5.9     6.3     11.1
Average   3.06   3.72   3.28       2.08   2.70   2.50              8.14    10.94   10.20

Table 5.11. Quantification of neutral-space “drift” for each signer, for all trials and for rolling 7-averages.

The table shows that the neutral-space locations of the target signs did change for each signer over the course of the experiment, typical variations for most signers being on the order of a few centimeters. There were, however, noticeable differences among signers in this regard. For example, Signer 3 showed much greater variability, particularly in the y (front-back) direction, than the others. The subjects whose signing was the slowest, Signers 3 and 4, showed greater drift over the course of their signing sessions in all three spatial dimensions than the other signers did, perhaps indicating that longer intervals between articulations of a given sign (or sign location) are associated with a fuzzier articulatory target than is the case for shorter intervals.
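The drift measures just described can be sketched in a few lines of code. This is a minimal illustration, not the script actually used in the study; the function names are mine.

```python
import statistics

# Sketch of the drift measures in Table 5.11: the per-trial x-, y- or
# z-values of the target sign are smoothed with a rolling 7-trial
# average, and the SD and range of the smoothed series then quantify
# neutral-space "drift" with context-item differences damped.

def rolling_mean(values, window=7):
    """Rolling mean over consecutive windows of `window` trials."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def drift_measures(coords_cm, window=7):
    """SD over all trials, plus SD and range of the rolling average."""
    smoothed = rolling_mean(coords_cm, window)
    return {
        "sd_all": statistics.pstdev(coords_cm),
        "sd_rolling": statistics.pstdev(smoothed),
        "range_rolling": max(smoothed) - min(smoothed),
    }
```

Because each seven-trial window mixes the various context items, context-driven differences largely cancel in the smoothed series, so its SD and range reflect slow positional drift rather than coarticulation.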
In considering the possibility of neutral-space drift in this experiment, it was assumed that such drift would be more or less random within the neutral-space region for each signer, rather than progressing systematically in a particular pattern. Figure 5.23 below indicates that this was the case. The figure shows the rolling 7-averages of the x-, y- and z-coordinates of the target-sign locations produced by three of the signers over the course of the experiment. Data for each of the six trial blocks are labeled with the sentence frame used in that block. Graphs are given only for Signers 3, 4 and 5 because, as was mentioned earlier, the version of the experiment in which Signers 1 and 2 took part had more frequent changes of sentence frame, making the calculation of these sorts of rolling averages problematic. The fact that many of Signer 5’s trials were rejected explains the somewhat sparser appearance of her data in the figure relative to that of the other two signers.

Figure 5.23. Drift over time of target-sign location in neutral signing space in each of the x-, y- and z-directions, during each of the six blocks of the production study, for Signers 3, 4 and 5. The data represented are rolling 7-averages through the duration of each block. The numerical values along the vertical axis are arbitrary and are given only to indicate the scale of the drift.

5.4. Chapter conclusion

These two sign production studies investigated five signers and sought to determine whether evidence of location-to-location effects of one sign on another might be found, particularly across intervening signs. Since this sign experiment was patterned after the speech study, where strong effects were quite common, it is something of a surprise that the effects seen here were relatively weak. Many of the speech-production effects were strong enough to remain significant after application of the Bonferroni correction, which is not the case here.
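The chance-expectation reasoning applied to the significance counts in the next paragraph can be checked with a short calculation. Under the idealization that the tests are independent, the number of spuriously “significant” outcomes among a batch of tests follows a binomial distribution; the sketch below is my illustration, not a computation reported in the study.

```python
from math import comb

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# 240 tests at the p < 0.05 level: 240 * 0.05 = 12 spurious results
# expected, versus 20 observed; at the p < 0.10 level: 24 expected,
# versus 36 observed.
p_sig = binom_tail(240, 0.05, 20)
p_marginal = binom_tail(240, 0.10, 36)
assert p_sig < 0.05 and p_marginal < 0.05  # both counts unlikely by chance
```

In practice the 240 tests are not truly independent (the same trials enter multiple comparisons), so this serves as a rough sanity check on the argument rather than an exact probability.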
However, there were more significant testing outcomes seen here than would be expected by chance. With a total of 240 significance tests performed for the main study (four signers * six sentence frames * ten tests for each), 12 results significant at the p<0.05 level and 24 marginally significant outcomes (at the p<0.10 level) would be expected merely by chance. In this study, the total numbers of such results were 20 and 36, respectively, either of which would be less than 5% likely in a random-chance scenario. Therefore, it seems evident that (1) many if not most of these effects were indeed real, but that (2) they were generally rather weak, and were neither pervasive nor robust with respect to context, distance or signer. As has been noted already, this is markedly different from what was seen in the speech production study, where closer-distance effects were very strong and nearly ubiquitous among speakers, with a steady decrease in effects as distance increased, and with longer-distance effects seen only in speakers who had coarticulated strongly at closer distances. In addition, some evidence of dissimilatory behavior, which was never seen in the speech study, was also found in this sign study. A partial explanation of the modality-specific differences observed here may be found in earlier work on spoken-language coarticulation. Daniloff and Hammarberg (1973) and Hammarberg (1976) asserted that anticipatory coarticulation was essentially due to speech planning, while carryover coarticulation was due to the inertia of the speech articulators. Accordingly, anticipatory coarticulation would be most expected in speech production contexts in which smaller, lighter articulators like the tip of the tongue were involved, while larger, heavier articulators like the jaw or the tongue body might tend to be associated more with carryover coarticulation because of their greater inertia.
While the true situation is evidently more complicated, the intuition behind Daniloff and Hammarberg’s claim may be useful here. Since sign articulators are much larger than those involved in speech, one might expect to find more carryover effects in signed language than anticipatory effects. Since this project has focused exclusively on anticipatory coarticulation, the absence of strong results in the sign studies would therefore not be surprising. On the other hand, this seems to contradict the results obtained by the Cheek (2001) and Mauk (2003) sign studies discussed earlier, in which both anticipatory and carryover effects were seen. However, the sign sequences used in Mauk (2003) had target signs placed between context signs, not only before or after, so the LL effects that were found in that study involved carryover influences as well. In the case of Cheek (2001), significant anticipatory HH effects were found in various sequences consisting of two signs each. Since the articulators involved in creating different handshapes are smaller than those used for shifts in location, perhaps it should be expected that anticipatory HH effects would be more prevalent than anticipatory LL effects (recall that an apparent instance of anticipatory distance-2 HH coarticulation was shown in Figure 5.19 earlier in this chapter). In light of these observations, a logical follow-up to the present sign study would be one in which carryover LL effects are targeted, using sign sequences whose first item has a location high, low or to the side in signing space, followed by multiple neutral-space signs. Another notable contrast between the speech and sign studies concerns the temporal extent of the effects.
While the VV effects seen in the speech study for the speakers who coarticulated the most extended over ranges on the order of 400-500 ms, the corresponding temporal distances for the sign data obtained to date are considerably greater, on the order of 500-800 ms or more.32 While on the one hand this might be expected, given the difference in mass of the articulators of the two modalities, the correspondingly slower articulation of signs relative to speech, and so on, such differences also indicate that at the phonetic level, the limits of production planning for language in general—the temporal horizon, so to speak—might not be expressed in absolute time units like milliseconds, but may instead be determined in relation to the number of gestures in a given timeframe via some function of “gestural density.” This suggests a deep-level similarity between modalities despite the obvious surface-level differences, and appears to be paralleled at the syntactic level as well. Klima and Bellugi (1979) found that the rate at which propositions are conveyed is essentially the same in ASL and English, despite the apparent difference in the articulation rate of signs compared to that of spoken-language words.

One of the more surprising results of the speech study was the lack of correlation between speech rate and coarticulatory tendency. The smaller number of subjects investigated in the sign study makes such a determination impossible here, but for this group of signers, it also does not appear that signing rate and strength of coarticulatory effects are correlated. Slower signers like Signers 3 and 4 had significant effects which overall were not substantially different in magnitude or frequency from those of the other signers. Taken together with the results of Mauk (2003), who did find that faster signing rates for individuals were associated with greater undershoot, i.e. LL coarticulation for adjacent signs, these collective results are perhaps best explained as follows: (i) among speakers or signers, one’s normal rate of language production does not predict strength of tendency to coarticulate, but (ii) for a particular individual, production at various rates—slower or faster, relative to one’s own normal rate—might tend to be associated with different amounts of coarticulation.

32 The time range for speech is given for the speakers who showed significant effects at distance 3, with measurements taken (1) conservatively, from the end of the distance-3 vowel to the beginning of the context vowel, and (2) more liberally, from the midpoint of the distance-3 vowel to the midpoint of the context vowel. The approximate averages of those sets of measurements result in the range stated above. The time range for sign is taken from measurements obtained from the motion-capture data of sentences in which Signer 1 had distance-3 effects in the pilot study. The timepoints at which these measurements were made were the location of the z-minimum for WANT and the z-value (maximum or minimum) of the context sign. The ranges involved here were much more variable, and the context-sign z-values were somewhat ambiguous in the case of minima (e.g. for PANTS), resulting in the relatively inexact range given here.

One goal of this project was to investigate to what extent coarticulatory effects such as those found here for speech and sign are specifically language-based, rather than reflecting processes of motor planning common to human actions in general. The effects related to the non-linguistic context actions <red> and <green> were not substantially different in magnitude or distribution from those seen with linguistic context items, and as such do not permit a straightforward distinction to be made here between “non-linguistic” and sign-based “linguistic” coarticulatory effects.
Instead, the results obtained in the speech and sign studies conducted here indicate that if a distinction is to be made, a more logical division would be between speech-based coarticulation on one side and coarticulation related to signing and other manual actions on the other, given the differences in the frequency and magnitude of the effects seen in the production studies presented in Chapters 2 and 5. Whether this is merely due to the differences in mass and speed of the articulators used in performing these various types of motion must remain an open question for now. Even if sign-based effects are weaker in general than those associated with coarticulation in speech, they may still be perceptually relevant. The following chapter examines this issue by making use of an experimental paradigm similar to the one that was used in Chapter 3 to investigate the perceptibility of coarticulatory effects in speech.

CHAPTER 6 — COARTICULATION PERCEPTION IN ASL: BEHAVIORAL STUDY

6.1. Introduction

As with the speech study presented in Part I of this project, a major goal of the sign study was to investigate the perceptibility of coarticulatory effects at various distances, with a particular focus on variability between language users. In the context of signed language, such a study presented some challenges not encountered in the speech study. For example, for the English-language study, schwa waveforms were rather straightforwardly manipulated for the purposes of obtaining normalized audio stimuli, something which was not feasible for sign-language video clips. Furthermore, for the speech study, the easy availability of English-language users made it possible to examine a relatively large number of speakers (seven) before a subset of recordings was selected for use as perception-study stimuli for subsequent participants.
In contrast, for the perception study to be presented in this chapter, the relative difficulty of recruiting ASL users made it a practical necessity to create stimuli and build the perception experiment at a much earlier stage, so that as many subjects as possible would be able to contribute both production and perception data. On the other hand, one advantage of the paradigm that was used for this experiment was that knowledge of ASL was not needed to perform the task, making it possible to recruit hearing non-signers as well as deaf signers as subjects.

6.2. Methodology

6.2.1. Participants

Because the task was one for which knowledge of ASL was not strictly necessary, both deaf signers and hearing non-signers, a total of 20 subjects in all, took part. Four deaf signers participated; this was the same group who had taken part in the sign production studies described in Chapter 5, with the exception of Signer 2, who declined to participate. All 16 hearing non-signers who took part were undergraduate students at the University of California at Davis who received course credit for participating and were uninformed as to the purpose of the study. Their ages ranged from 18 to 25, with a mean of 20.2 and an SD of 1.9. Nine of the 16 were female. For this study, unlike the spoken-language experiments discussed in earlier chapters, no special effort was made to recruit only monolingual English native speakers, since the crucial distinction between groups for the purposes of the present study was signer vs. non-signer.

6.2.2. Creation of stimuli for perception experiment

Recall that for the spoken-language perception study presented in Chapter 3, the crucial distinction between the two main stimulus types in each distance condition was “[i]-colored” versus “[a]-colored,” with stimuli grouped in blocks ranging from quite easy (distance 1) through challenging (distance 2) to very difficult (distance 3).
The sign-language perception study was very similar in format, but the distinction that participants needed to make was between “high” versus “low,” according to the position on the body (head or waist area, respectively) of the context sign. The subjects saw initial portions of sentences like those produced by subjects in the sign production study, edited so that they contained only the sign sequence “I WANT,” where the subsequent (unseen) context sign had been either a “high” sign DEER or HAT or a “low” sign RUSSIA or PANTS. The position in neutral signing space of the neutral-space sign WANT was to be the subjects’ guide as to whether the upcoming context sign was “high” or “low”; recall from the production study presented in Chapter 5 that such coarticulation-related differences typically amounted to 1 to 2 centimeters. The sentences were of the form I WANT (X), I WANT FIND (X), or I WANT GO FIND (X), corresponding to distance conditions 1, 2 and 3, respectively. The tokens needed were sequences “I WANT” in the context of “high”- and “low”-location signs in each of the distance conditions 1, 2 and 3. Since individual video recordings might have quirks which could be used by subjects to distinguish them independently of sign height, six recordings for each distance condition and location value (“high” or “low”) were selected, to be presented interchangeably during each presentation of that location value.
Each token consisted of the portion of a sentence containing the sequence “I WANT,” beginning with the frame before the right hand started moving upward to make the sign “I” and ending at the frame in which both hands had reached the midpoint of the sign “WANT.” Two versions of each token were used: for the first, the end boundary was the video frame during the articulation of WANT at which the minimum z-value was reached (generally identifiable as the frame in which the least movement-related blurring was seen), and for the second, one additional frame after this was included, giving subjects a very small amount of extra information. The latter will be referred to as the “extra-frame” condition. Therefore, the total number of tokens used was 2 (“high” vs. “low” context) * 3 (distance-1, -2 or -3 condition) * 6 recordings of each * 2 (with or without the extra frame) = 72. The first participant of the Chapter 5 production study, Signer 1, was recruited for the filming of video stimuli. She signed sentences of the form I WANT (X), I WANT FIND (X), or I WANT GO FIND (X), where (X) was either DEER, RUSSIA, HAT or PANTS. Each combination of sentence frame and context sign was articulated 10 times, for a total of 120 signed sentences, some of whose initial sequences “I WANT” were used as stimuli; the selection procedure that was used will be discussed presently. The signer was videotaped so that appropriate passages could be edited for use as stimuli, and she also provided motion-capture data. One issue that had to be addressed was that in general, even when z-value differences between “high” and “low” contexts were significant overall, substantial spatial variability within each context meant that among the multiple articulations of the various “high”- and “low”-context versions of WANT, some “low”-context z-values were higher than some “high”-context ones.
Despite this, it was desirable to ensure that the sets of “high” and “low” tokens would have non-overlapping z-ranges, with all “low”-context articulations of WANT lower than all “high”-context articulations of WANT, so that each trial would be informative to subjects. Therefore, the following selection procedure was followed. For each context sign in each distance condition, the ten articulations of WANT were sorted by z-value from minimum to maximum. For “high” contexts, the articulation of WANT with the maximum z-value was excluded and the next three highest-coarticulated versions were used. For “low” contexts, the articulation of WANT with the minimum z-value was excluded and the next three lowest-coarticulated versions were taken. When these selections were made, for each distance condition the z-ranges of “high” and “low” did not overlap, and the average differences between “high” and “low” versions of WANT were in line with the results from the initial production study.33 The results of that study, described in Section 5.2, established that anticipatory LL effects could be expected as far as three signs away, with magnitudes on the order of 1 to 2 cm, generally diminishing at greater distances. The context-related z-value differences in the perception-study stimuli are given in the second column of Table 6.1 below. Recordings were imported from digital videotape and converted to .avi files using Final Cut Pro editing software. Some basic information concerning these video stimuli is given below in Table 6.1.
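The selection procedure just described can be sketched as follows. This is a schematic illustration; the function name and the sample z-values are mine, not data from the study.

```python
# For each context sign in each distance condition, sort the ten
# articulations of WANT by z-value, exclude the single most extreme
# token, and keep the next three most-coarticulated ones.

def select_tokens(z_values_cm, context):
    """Pick three tokens per context, excluding the most extreme one."""
    ordered = sorted(z_values_cm)            # ascending z
    if context == "high":
        return ordered[-4:-1]                # 2nd- through 4th-highest
    return ordered[1:4]                      # 2nd- through 4th-lowest

# Hypothetical z-values (cm) for ten articulations of WANT:
zs = [9.1, 9.4, 9.8, 10.0, 10.2, 10.5, 10.9, 11.2, 11.6, 12.3]
print(select_tokens(zs, "high"))   # drops 12.3, keeps [10.9, 11.2, 11.6]
print(select_tokens(zs, "low"))    # drops 9.1, keeps [9.4, 9.8, 10.0]
```

The non-overlap requirement can then be verified directly by checking that the maximum of the selected “low” tokens is below the minimum of the selected “high” tokens.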
The longer durations for greater distance-condition values appear to reflect slightly slower starts on the part of the signer in preparing to sign the somewhat longer sentences associated with those distance conditions, but this does not introduce a confound, since durations were not significantly different between the “high” and “low” contexts. Statistical testing on the frame lengths of these sets of tokens confirmed that for each distance condition, the token sets did not differ significantly in duration between contexts, and therefore it was assumed that they would not be distinguishable by subjects in that way. For all video stimuli, the final frame was repeated five times in sequence to create a very brief (150 ms) “freeze-frame” effect; this was done to make the mid-sentence termination of the stimuli less jarring to subjects.

33 There was one exception to this; to maintain the condition of non-overlapping z-dimension range, four HAT and two DEER tokens, instead of three of each, were used for the distance-3 “high” condition.

Distance   Mean height difference (cm)   Number of frames   Duration (ms)
1          2.6                           16.5, 17.3         551, 578
2          2.0                           17.5, 18.0         584, 601
3          1.6                           21.7, 21.7         723, 723

Table 6.1. The average z-dimension (height) difference between WANT in “high” and “low” contexts, and the mean duration, expressed both in number of frames and in ms for “high” and “low” tokens respectively, for each of distance conditions 1, 2 and 3. Duration information is given for the standard-length versions of the video clips, without the additional frame at the end.

6.2.3. Task

All perception study subjects began by performing the production task discussed in Chapter 5. Afterwards, they were given a brief introduction to the purpose of this research.
They were told that language elements, whether in speech or sign, can affect each other over some distance, that people can sometimes detect this, and that they were about to begin a task in which their own perceptual abilities were to be tested: they would see short video clips taken from ASL sentences containing “I WANT” followed by other signs, with half of the WANT signs preceding “high”-location signs and half preceding “low”-location signs. So that they would not be discouraged by the more difficult contrasts, they were told that subjects in such experiments sometimes say afterwards that they feel they were answering mostly at random when the contrasts were very subtle, but often turn out to have performed at significantly better-than-chance levels. (This turned out to be the case in the present study as well.) Subjects were seated in a small (approx. 10 feet by 12 feet) room in a comfortable chair facing a computer screen placed 36 inches away on a table 26 inches high. The stimuli (stored as .avi files) were delivered using a program created by the author using Presentation software (Neurobehavioral Systems), which also recorded subject responses. To make their responses, subjects used a keyboard that was placed on their lap or in front of them on the table, whichever they felt was more comfortable. All subjects were given a very easy warm-up task about one minute long, during which sentences of the form I WANT (X), where X was either a “high” sign (HAT or DEER) or a “low” sign (RUSSIA or PANTS), were played in the same ratio as their counterparts in the actual task. Subjects were told to hit one response button when the sign following WANT was “high” and a different button when that sign was “low.” Handedness of response buttons was counterbalanced across subjects.
The first several sentences in this warm-up were completely unedited, so the task was very easy at first, but as the warm-up progressed, more and more frames were removed from the ends of the video clips so that the task became more difficult, more closely resembling the task that would be required in the real experiment. After completing this warm-up, subjects were told that the actual task would be the same in terms of pacing and goal (answering either “high” or “low” by pressing the appropriate button), but more challenging. The clips in the warm-up contained more information than would be present in the actual task, including visible movement after WANT toward the context sign. Subjects were alerted to this in advance, and told that during the actual task, the only information they would have available would be the ending position of the WANT sign, which would be somewhat “high” or “low” in each case because of the upcoming context sign, though these differences would in general be subtle. They were told that the correct answers would be 50 percent “high” and 50 percent “low,” ordered at random, so that if they were tempted, for example, to answer “high” ten times in a row, this was very unlikely to be correct, and such a streak would indicate that it was necessary to “tune in” to more subtle differences. Subjects were told that the experiment consisted of three blocks, generally becoming more difficult as the experiment progressed (see below), but that each block was brief, about 2 minutes long, with optional breaks between blocks.

In the spoken-language perception task presented in Chapter 3, the order of blocks was not random, but rather proceeded from easy to difficult by distance condition, in the sequence 1, 2 and 3. The reason for this was that it was felt that subjects might become discouraged if faced with an extremely difficult task early on, and so they were allowed to progress toward the challenging contrasts starting with the easier ones.
Subjects in that study indicated afterward that although the later stages were in fact much more difficult, they understood the task throughout, because the earlier trials had helped them grasp the rhythm of the task and what was generally expected of them.

The same basic strategy was followed in this sign perception study, whose sequencing is illustrated below in Figure 6.1. Subjects proceeded through three blocks, in distance condition order 1, 2 and 3. For the first eight subjects, who were all hearing non-signers, each of the 72 stimuli was presented once (24 per block). Preliminary analysis of those subjects’ performance indicated that the task was harder, even at distance 1, than the spoken-language study had been, but this was tempered by the fact that compared to the spoken-language task, relatively few trials were being used. Therefore, for all subjects after those initial eight, the duration of each block was doubled: all the stimuli were run twice in each block, with the ordering completely random within each block, to provide more trials per subject and hence more statistical power.

To alleviate the difficulty of the task somewhat, subsequent hearing subjects were told that the signer in the video stimuli was right-hand-dominant, so that they would have a better chance of directing their attention to the relevant part of each image in determining whether it represented the “high” or “low” context. This information was not given to the deaf subjects, since it was assumed that they would become aware of it on their own, though perhaps only implicitly.

Figure 6.1. Design of the perception task for the sign study. There were three blocks, corresponding to distance conditions 1, 2 and 3 in that order. Each block consisted of 48 stimuli of the form “I WANT” (24 for the first eight hearing subjects), half of which were taken from “high”-context sentences and half from “low”-context sentences, with ordering random within each block.

6.3.
Results and discussion

6.3.1. Perception measure

As with the spoken-language perception study discussed in Chapter 3, both individual and group results will be given here in terms of the d’ statistic (see Section 3.3.1).

6.3.2. Group results

For the entire group of 20 subjects (deaf and hearing combined), overall d’ scores were significantly better than chance (d’ = 0.35, p<0.001); this was true for distances 1 and 2 considered separately (respectively, d’ = 0.33, p<0.01, and d’ = 0.55, p<0.001), but not for distance 3 (d’ = 0.16, ns). (These and other key d’ scores are also presented below in Table 6.2, which in addition gives the individual subjects’ scores.)

Although the second group of hearing subjects had received additional instruction and had performed the task for twice as long as the first group, the d’ scores for the two hearing subgroups were not significantly different overall (t(13.36)=0.64, p=0.54), or for any of the other conditions investigated (either by distance or in the “extra-frame” condition). Therefore, the hearing subjects are treated as a single group for the remainder of the group analyses.

The deaf and hearing groups did not differ significantly in their performance of the task, either overall (t(3.39)=0.03, p=0.98) or for any of distances 1, 2 or 3, despite the fact that half of the signers, Signers 3 and 5, showed highly significant outcomes, a much better performance relative to the group size than that of the hearing group. Interestingly, those two signers were not the earliest acquirers of ASL within the deaf group, so perceptual sensitivity in this context does not appear to have been strongly influenced by age of acquisition. However, both this result and the overall lack of a significant difference between the deaf and hearing groups’ performance may be due to the small size of the deaf group.
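For readers unfamiliar with the d’ measure used above, a minimal sketch of its computation for a two-alternative yes/no task like this one (“high” vs. “low”) follows. The log-linear correction is a common convention assumed here for illustration, not necessarily the exact procedure used in this study.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection d' from response counts: z(hit rate) - z(false-alarm rate).

    A log-linear correction (+0.5 per cell) keeps the z-transform finite
    when a subject's hit or false-alarm rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Illustrative counts: 30 "high" trials (18 hits) and 30 "low" trials (14 false alarms)
score = d_prime(18, 12, 14, 16)
```

A subject responding without any sensitivity to the contrast (equal hit and false-alarm rates) receives a d’ of zero.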
The “extra frame” condition was also investigated with paired t-tests; subjects did not in general perform differently whether or not the extra frame was included. This was true for the entire group of 20 subjects (t(19)=0.24, p=0.81) as well as for the deaf and hearing groups considered separately (respectively, t(3)=1.31, p=0.28; t(15)=0.20, p=0.85).

These group results indicate that this task was much harder, even at distance 1, than the analogous spoken-language task. Recall that in that experiment, all subjects performed at near-ceiling levels in the distance-1 condition and about half also performed at above-chance levels in the distance-2 condition. Rather than the easy-medium-hard progression seen in that experiment, it appears that here, all three distance conditions were quite difficult.

Although the intention behind the design of the present experiment was to follow an easy-medium-difficult progression by using stimuli with progressively smaller position differences between the “high” and “low” contexts, subjects actually performed better overall in the distance-2 condition than in either of the others. These differences are statistically significant, as shown by paired t-tests comparing d’ scores at distances 1 vs. 2 (t(19)=2.11, p<0.05) and 2 vs. 3 (t(19)=2.96, p<0.01). It may be the case that cues other than the location of the signer’s hand in the last frame of the video clips, such as the hand’s trajectory beforehand, were somehow more informative to subjects in the distance-2 condition (in which the sentence frame was “I WANT FIND (X)”) than in the others. However, a close examination of the stimuli used in the three conditions indicates that this is rather unlikely. All of the stimuli consist of just the sequence “I WANT,” and offer no clear evidence of what the upcoming signs in the sentence (or their location) will be.
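The paired comparisons reported above (e.g., d’ at distance 1 vs. distance 2 across the same 20 subjects) can be sketched with a minimal paired-t computation; the function and the data below are illustrative only, not the study’s actual per-subject scores.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    """t statistic for paired samples, e.g., each subject's d' score in one
    distance condition vs. another; degrees of freedom = len(xs) - 1."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Made-up per-subject d' scores at two distance conditions, for illustration
d_dist2 = [0.5, 0.7, 0.2, 0.9, 0.4]
d_dist3 = [0.1, 0.3, 0.0, 0.4, 0.2]
t_stat = paired_t(d_dist2, d_dist3)
```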
It seems more probable that the opposing tendencies of subjects’ performance to improve with practice but to decline with increasing task difficulty combined to favor a better performance in the second of the three blocks than in the other two. In the next section, the individual performances of the 20 study participants are examined.

6.3.3. Individual results

The performance of each of the 20 subjects who participated in this perception study is presented below in Table 6.2. The d’ scores for distance conditions 1, 2 and 3 are given, following the pattern established earlier, from right to left. Also given in the rightmost column are overall d’ scores, collapsed across the three distance conditions.

Subject           Distance 3   Distance 2   Distance 1   Overall score
39 (Signer 1)        0.00        -0.23         0.00         -0.07
40 (Signer 2)      no data      no data      no data       no data
41 (Signer 3)        0.99*        1.64**       0.54          1.03***
42 (Signer 4)       -0.11        -0.42        -0.11         -0.20
43 (Signer 5)        0.32         1.47**       0.60          0.74**
Deaf Average         0.32         0.69**       0.26          0.41***

44                  -0.64         1.40*        0.67          0.43
45                   0.00         0.21        -0.43         -0.07
46                  -0.64         0.21         0.00         -0.14
47                   0.00         0.43         1.11          0.53
48                   0.76         0.76         0.00          0.63
49                   1.17         0.67         0.00          0.61
50                   0.67         1.18         0.67          0.83*
51                  -0.22         1.35         0.43          0.50
Average              0.08         0.74**       0.31          0.38**

52                   0.00         1.11*        1.06*         0.62*
53                   0.00         0.10         1.05*         0.33
54                   0.65         0.46         0.34          0.48*
55                   0.11         0.34         0.00          0.15
56                  -0.23        -0.21         0.10         -0.11
57                   0.32         0.10         0.21          0.21
58                   0.32         1.24*        0.64          0.71*
59                  -0.11         0.44         0.11          0.14
Average              0.13         0.42**       0.38*         0.31**

Hearing Average      0.11         0.52***      0.36**        0.33***
Overall Average      0.16         0.55***      0.33**        0.35***

Table 6.2. The table shows each subject’s d’ measure for each of the three distance conditions. Significant results are shaded for individual subjects and labeled, where * = p<0.05, ** = p<0.01 and *** = p<0.001.
While some participants scored at above-chance levels, with Subjects 41, 43, 52 and 58 foremost among them, the results for individual subjects confirm that this task was much more challenging than that used in the spoken-language perception study. For instance, in that study, a number of individuals were able to perform at above-chance levels in all three distance conditions, and all subjects attained d’ scores in excess of 2.50 for distance 1. For the present experiment, results for individuals were weak in general; for example, there were no strongly significant results (p<0.01) at distance 1, even for the signers. The significance of the group results at distances 1 and 2 shows that at least some subjects were able to perform the task with a degree of success, but the significance of those results is not matched by the weaker individual outcomes, a situation quite different from the spoken-language study.

However, it would be premature to use these results to make general comparisons between the spoken and signed modalities with respect to the perceptibility of coarticulatory effects. In part, this is because the outcomes of this particular study cannot be assumed to represent the totality of individuals’ sensitivity to coarticulation. Additionally, we have already seen in Chapter 5 that the analogy drawn earlier between schwa in vowel space and location in signing space was imperfect at best, and so it is likely that perceptual effects related to those items are not directly comparable.

Like the spoken-language study, however, the results obtained in the present experiment speak to the variability between subjects in coarticulation-related tasks. A few subjects in the present study reported feeling fairly confident of having done well, while others felt they had responded at random in the overwhelming majority of the trials.
Such outcomes did not correspond particularly closely to subjects’ knowledge of ASL; on the contrary, Signer 3, who performed best, was a late learner of ASL, and Signer 5, who likewise performed at well above-chance levels, was not a native signer. On the other hand, Signer 1, a native signer, and Signer 4, who learned ASL in very early childhood, did not perform at better-than-chance levels.

6.3.4. Relationship between production and perception

For the subjects who participated in the experiments discussed in Chapters 2 and 3, the production and perception of VV coarticulation in the contexts examined were found not to be correlated with each other. Such a determination is much more difficult to make here, since the number of signers investigated was relatively small. One individual, Signer 3, did appear to have the most strongly significant results in both the production and perception studies, but the overall weakness of the results of the sign production study makes such determinations difficult for the group overall, and clearly, more signers would have to be tested for a meaningful comparison to be possible.

6.4. Chapter conclusion

The results of this study indicate that location-to-location effects of the magnitude seen at various distances in the initial sign production study are likely to be perceptually salient at least some of the time, and are therefore likely to be relevant for users of signed language, just as coarticulation appears to be for users of spoken language.

This project has been largely designed around the idea of creating parallel production and perception studies of speech and signed language, but the logical successor to this chapter, a sign equivalent to Chapter 4 using ERP methodology, is not straightforward (and may not in fact be feasible) at present, in part because the MMN is evoked by auditory stimuli.
It is believed that there is a visual analog of the MMN (Alho, Woods, Algazi & Näätänen, 1992; Tales, Newton, Troscianko & Butler, 1999; for reviews, see Pazo-Alvarez, Cadaveira & Amenedo, 2003; Maekawa, Goto, Kinukawa, Taniwaki, Kanba & Tobimatsu, 2005; and Czigler, 2007), so it is conceivable that a sign-language experiment could be carried out targeting the visual MMN. However, before examining sub-phonemic contrasts such as the coarticulatory effects examined in the present study, a logical prior step would be to determine whether deaf subjects produce an MMN-like response in an oddball paradigm involving phonemic contrasts. Since what counts as a phoneme in sign languages is not a settled question, even this first step would itself require some deliberation and, most likely, trial and error. Therefore, although such a venture could be intriguing, it would require laying a substantial amount of groundwork that is beyond the scope of the present project.

CHAPTER 7 — DISCUSSION & CONCLUSION

7.1. Models of spoken-language coarticulation

Long-distance coarticulation like that seen in the spoken-language study presented in Part I of this dissertation has implications for models of spoken-language coarticulation (for an overview of such models, see Farnetani & Recasens, 1999, and more generally Hardcastle & Hewlett, 1999). For example, in a coproduction model (Fowler, 1983), the articulatory gesture(s) associated with a given speech segment have a more-or-less fixed temporal duration which may overlap with those of neighboring segments, as seen earlier in Figure 2.1 (repeated below as Figure 7.1), with the resulting output at a given moment being an interpolation or averaging of the gestures associated with the segments in play at that time. A key prediction of such a model is that since each gesture’s temporal duration is limited, its temporal range of influence on its neighbors should have a rather small upper bound.
As was noted earlier, long-distance production results such as those seen in the current study appear inconsistent with this last assertion. It has been seen here that from both a production and a perception standpoint, VV coarticulatory processes may be relevant over at least three vowels’ distance, and hence across as many as five intervening segments. These results would seem to pose a problem for any model of coarticulation not allowing for considerable range of influence of segments on one another.

Figure 7.1. Fowler’s (1983) model of VCV articulation. The dotted vertical lines in the second picture indicate points in time where acoustic prominence shifts from one sound to the next, and where listeners are therefore likely to perceive a segment boundary. The overlapping solid lines represent the physical articulations of each sound, which can occur simultaneously.

In contrast, the Window model (Keating, 1988, 1990a, 1990b) uses an approach more akin to feature spreading; gestural targets associated with linguistic segments are “windows” (which are ranges, not points), through which paths are traversed over the course of an utterance. The gestural paths between windows associated with successive segments are achieved through interpolation, and such interpolation may stretch over long distances in cases where intermediate segments are underspecified for a feature. Since this model makes no specific predictions about the limits of long-distance coarticulation, it may be considered more compatible with the kinds of long-distance results obtained in this study than the coproduction model is.
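The Window model’s interpolation mechanism can be sketched minimally as follows. The window ranges below are invented, F2-like numbers for illustration (they are not drawn from the study’s data), and the sketch assumes the first and last segments are specified for the feature.

```python
def window_track(windows):
    """Sketch of Window-model interpolation: each segment contributes a
    (lo, hi) target window for some feature, or None if underspecified.
    Returns one value per segment: the window midpoint where specified,
    and linear interpolation between the nearest specified neighbors
    otherwise, so interpolation can span long underspecified stretches."""
    idx = [i for i, w in enumerate(windows) if w is not None]
    mids = {i: (windows[i][0] + windows[i][1]) / 2 for i in idx}
    out = []
    for i in range(len(windows)):
        if i in mids:
            out.append(mids[i])
        else:
            left = max(j for j in idx if j < i)
            right = min(j for j in idx if j > i)
            frac = (i - left) / (right - left)
            out.append(mids[left] + frac * (mids[right] - mids[left]))
    return out

# e.g., [i]-like window, two underspecified (schwa-like) segments, [a]-like window
track = window_track([(2200, 2400), None, None, (800, 1000)])
```

Because the underspecified middle segments simply lie on the interpolated path, their surface values are colored by both flanking vowels, which is the sense in which the model tolerates long-distance effects.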
However, since the production data showed coarticulation occurring almost universally across [k] and frequently across [t], the idea raised by Öhman (1966) and implicit in the Window model, that VV coarticulation should be largely blocked by consonants requiring more deliberate or careful use of the tongue body (as opposed to labials like [p]), may be somewhat overstated.34 The crucial factor here appears to have been the especially large susceptibility of schwa to coarticulatory influence, regardless of the consonant context.

Much of the difficulty in this area of research stems from the fact that coarticulatory patterns seem to a large extent to be idiosyncratic, varying greatly from speaker to speaker. Some of this probably has to do with the relative freedom speakers have in producing a given speech sound, exemplified in studies in which speakers successfully articulate particular vowels in spite of physical obstacles like bite blocks (Fowler & Turvey, 1980; Lindblom, Lubker & Gay, 1979). A more complete analysis will almost certainly depend on recruiting sizable groups of subjects, and in addition may require more sensitive measures for production and perception. For instance, an examination of Speaker 7’s numerical production data (see Appendix A) shows that the standard deviations associated with his vowel formant values tended to be smaller than those of other speakers; even if two speakers have similarly-sized vowel spaces, one speaker may “hit the targets” more accurately than the other. A good coarticulation measure may need to take this kind of variation into account (see Adank, Smits & van Hout, 2004). A better production measure would lead to a better quantification of the production/perception relationship as well.

34 For further work along these lines see, for example, Recasens, Pallarès & Fontdevila (1997) and Modarresi, Sussman, Lindblom & Burlingame (2004).
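One simple variance-aware measure of the kind alluded to above would report coarticulatory displacement as a standardized effect size rather than a raw Hz difference. The sketch below is offered under that assumption; it is not the measure used in this dissertation, and the formant values are invented for illustration (see Adank, Smits & van Hout, 2004, for fuller normalization proposals).

```python
from math import sqrt
from statistics import mean, stdev

def standardized_coart(f2_context_a, f2_context_b):
    """Coarticulation measure as a Cohen's-d-style effect size: the
    difference in a target vowel's mean F2 across two contexts, scaled
    by the pooled standard deviation. A speaker who "hits the targets"
    more precisely (smaller SDs) registers a larger effect for the
    same raw Hz shift, unlike an unscaled difference in means."""
    pooled = sqrt((stdev(f2_context_a) ** 2 + stdev(f2_context_b) ** 2) / 2)
    return (mean(f2_context_a) - mean(f2_context_b)) / pooled

# Illustrative F2 values (Hz) for a schwa token in [i] vs. [a] contexts
effect = standardized_coart([1500, 1510, 1520], [1400, 1410, 1420])
```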
Flemming (1997) mentions VV coarticulation in a discussion in which he argues that phonological representations by necessity contain more phonetic information than has traditionally been assumed; his goal is a “unified account of coarticulation and assimilation.” Since coarticulatory effects at various distances are perceptible in some or many cases, as the perceptual study presented in Chapter 3 makes clear, a complete account of this phenomenon will be a complicated undertaking indeed, given the variation we see among speakers and listeners.

As an example of the subtleties involved, consider the symbols [əi] and [əa] that have been used in earlier chapters to represent [i]- and [a]-colored schwas regardless of the distance involved; recall also that all listeners in the Chapter 3 perception study were sensitive to the differences between such sounds, at least sometimes. It is well-established that listeners can adapt very quickly to the speech patterns of the speakers they hear; for example, Ladefoged and Broadbent (1957) found that listeners’ perception of a test word of form [bVt] was influenced by the F1 and F2 of the vowels in a preceding carrier phrase. Perhaps part of the accommodation that listeners make when exposed to a particular speaker includes becoming sensitive to that speaker’s coarticulation patterns, including coarticulatory tendency at various distances.
Such possibilities might require models recognizing differences among items such as [ia] (carryover [i]-coloring of [a]) versus [ai] (anticipatory [i]-coloring of [a]), or more generally [əi¹] (distance-1 anticipatory [i]-coloring of schwa), [²ao] (distance-2 carryover influence of [a] on [o]), and so on, even if only a subset of listeners is sensitive to such subtleties.35 The implications of multiple simultaneous effects of different neighboring segments on a single segment might also need to be considered; recall the production results in the present study which appeared to show simultaneous VV and C-to-V effects.

7.2. Incorporating signed-language coarticulation

Complex issues like those just discussed are also relevant for the study of signed language. The results seen in the sign production and perception studies presented in Part II of this dissertation, though generally not as decisive as those seen for spoken language, nevertheless appear to show that long-distance coarticulation can be found in sign language, and that such effects are perceptible in many cases and as such are likely to be relevant for issues like lexical processing. However, to the best of my knowledge, no models of sign language production dealing specifically with coarticulation have been developed at all, no doubt in part because the field of sign-language linguistics is so much newer than that of spoken-language linguistics. Assuming that speech and sign are two manifestations of a single human ability (it seems unlikely that the two have somehow evolved independently), it should be possible to describe language production and perception—and in particular, the existence of coarticulation, long-distance and otherwise—in a way that applies to both modalities.
Indeed, the same basic issues raised earlier in comparing the coproduction and Window models—gestural overlap and gestural targets, to what extent different gestures can influence each other, and so on—are relevant for both speech and sign. This suggests that it might be possible to create sign-language analogs of existing speech models. However, if speech and sign are indeed two manifestations of one human language capacity, then any model or theory that applies to one modality or the other but not both is in some sense incomplete.

Ultimately, any model or theory that aims to describe the human language capacity in general will have to incorporate both spoken and signed language, but current approaches tend to have serious limitations along these lines. For example, phonological constraints in Optimality Theory (Prince & Smolensky, 1993) are generally expressed in modality-specific terms, such as *NC̥ (“no nasal-voiceless obstruent sequences”; see Pater, 1999), yet are often claimed to be linguistic universals. The existence of languages like ASL shows that forms of human language are possible in which such constraints are irrelevant, just as constraints that have been proposed in sign-language research—e.g. the Selected Fingers constraint (Mandel, 1981; Corina & Sagey, 1989)—are unlikely linguistic universals if they cannot be applied in the vast majority of languages.

35 In the light of such possibilities, it should be noted that if anything, the method used in this project for investigating perceptual sensitivity to coarticulatory effects probably underestimates perceivers’ sensitivity, since here the stimuli were excised from the contexts in which they had originally appeared and were played in isolation.
This implies that the “alphabet” in which phonological constraints in OT or other constraint- or rule-based approaches are written needs to be general enough to make statements about consonants, vowels and tones as well as sign locations, handshapes and movements. In other words, the “alphabet” of constraints may be universal cross-linguistically even though it seems that the constraints themselves are not.

A useful way forward is offered by Articulatory Phonology (AP; Browman & Goldstein, 1986), which provides exactly this kind of universal “alphabet,” expressed in terms of gestures and their relative timing. Although the approach taken in AP seems at first glance to be quite different in spirit from OT, the two approaches are not inconsistent (Gafos, 2002) and have been usefully combined, for example, to provide accounts of so-called “intrusive vowels” (Hall, 2003) and the behavior of rhotic-consonant clusters in Spanish (Bradley & Schmeiser, 2003; Bradley, 2006) and Norwegian (Bradley, 2007) more successfully than traditional OT approaches. By expressing the rules of language in terms of gestures, one is able to state constraints in a way that is general enough to account for phenomena seen in both the spoken- and sign-language modalities. In addition, constraints expressed in more traditional terms, such as *NC̥ in spoken language or the Selected Fingers constraint in sign language, are not rendered invalid or obsolete as a result, but rather can be considered a kind of convenient shorthand for a corresponding gesture score. This may in some sense be analogous to the situation of computer languages like Pascal and Fortran, whose programs are ultimately executed as binary machine code, but are not generally dealt with at that level by the people who use them because it is inconvenient to do so.
A further question which must then be raised is whether a system thus capable of incorporating both sign and speech can be general enough to include linguistic phenomena but not so general that it includes other facets of human behavior, such as locomotion or dancing. On the other hand, is it possible that any system capable of successfully modeling both speech and sign must necessarily incorporate more human actions than just linguistic ones? Research on mirror neurons and their properties (Rizzolatti, Fadiga, Gallese & Fogassi, 1996; Arbib & Rizzolatti, 1997; Rizzolatti & Arbib, 1998) and related work by Arbib (2005) on the origins of language, positing intricate links between non-linguistic actions and both signed and spoken language, suggest that indeed, a strict segregation between “linguistic” and “non-linguistic” human actions may not be feasible.

7.3. Cross-modality contrasts

Even if sign and speech can be modeled together under a single unified system along the lines described above, this obviously would not imply that there are no important differences between the two modalities. Some of these differences have emerged in the present research project. For example, a comparison of coarticulation in sign and speech indicates that there are substantial cross-modality differences in the temporal extent of these effects. This may be considered unsurprising on the one hand, since signing requires the movement of larger, slower articulators, but it does have implications for language planning, as discussed earlier in the dissertation, in that it suggests the units of production planning at the phonetic level are likely to be expressed in terms of gestures or related units, rather than on an absolute time scale such as number of milliseconds.

The difficulty of determining suitable analogs between speech and sign extends also to the basic terminology used throughout this project.
For example, “distance-2” effects were seen in both the speech and sign studies discussed in earlier chapters, and in both cases seemed to represent something of a threshold for perceptual sensitivity. This apparent similarity between modalities is belied by the fact that it is not at all clear that the “distance” metric is comparable in the two cases, since it refers to vowels in one case (with intervening consonants not counted) and sign location in the other (locations for each intervening sign being counted toward the total). Therefore, in segmental terms, the extent of VV and LL effects was probably not being measured consistently, but in syllabic terms it probably was.36

A contrast that stands out particularly strongly in this study’s examination of speech and sign is the greater variation that was apparent in the ASL data, seen both with respect to different subjects—for example in their ages of acquisition of ASL and in differences apparently related to their regional/dialectal backgrounds—and in the behavior of individual signers.
Certainly there is variation among users of American English, but in common words like those used in the production-study sentences, such differences might be expected to manifest themselves as relatively subtle variations within individual lexical items (for example, the use of [a] by some speakers in some words where other speakers might use a vowel closer to [ɔ]), whereas the signers studied here often used completely different signs for items that might be considered relatively common, such as for “hat,” “cupcake,” and “pants,” and differed significantly in other ways, such as in the degree of acceptability of the “I WANT … (X) I” sentence frame with or without the final “I.” Individual signers exhibited other behaviors that are interesting from a phonetic/phonological point of view, such as Signer 5’s frequent modification of PANTS, in which she maintained the usual handshape and movement of the sign but substituted a neutral-space location (see the left side of Figure 7.2 below), and Signer 3’s signing of BOSS in front of, rather than on top of, his shoulder. Neither of these behaviors was observed for other signers in this study, but the fact that they occurred at all may have significant implications, for the following reasons.

Figure 7.2. Location assimilation in PANTS (left) and a contracted form of “I WANT.”

Both Subject 5’s assimilation of PANTS to a neutral-space position and Subject 3’s signing of BOSS on the front side of the shoulder—i.e. closer to neutral space—argue against the idea that neutral signing space might be a kind of underspecified sign location.

36 Whether or not either of these assertions is defensible, of course, depends on how (or even whether) segments or syllables can be defined in sign language, something that is as yet not settled.
In these cases, it appears that non-neutral-space signs were being shifted toward neutral space to facilitate signing with the preceding signs WANT, GO, and/or FIND, which appeared consecutively and were of course frequently signed by participants during the course of the production experiment. This is quite unlike the behavior of subjects in the spoken-language study with respect to schwa; no speakers substituted schwa for [i] or [a] in the words “key” or “car,” despite the numerous repetitions of these words required in the speech production study, nor would we expect a fluent English speaker to do so. None of these situations, either in English or ASL, involved weak prosodic contexts like those in which such reduction might be expected.

This difference is also relevant to the earlier discussion about the possibility that schwa and neutral space might operate in an analogous fashion in English and ASL in terms of articulatory behavior. The results obtained here indicate that such a comparison may be reasonable up to a point, but that the relationships between schwa and other vowels on the one hand, and neutral signing space and other sign locations on the other, are not parallel. In Signer 3’s articulation of BOSS and particularly in Signer 5’s articulation of PANTS, where it was expected that neutral-space signs would have their locations shifted outward from the middle of the articulatory space as schwa does in English, signs with locations in the outer regions of the articulatory space instead had their locations moved to a more central position, toward neutral signing space. In terms of articulatory ease, the signing behavior so observed is certainly not inexplicable, but it is substantially different from what was observed in spoken language with schwa and the context vowels that were examined.
These context vowels, articulated near the four corners of the vowel quadrangle, did not assimilate to a more central, schwa-like position for any of the 38 speakers who took part in the speech production study. The use of the term assimilation rather than coarticulation in connection with Signer 5's signing of PANTS in neutral space is apt, as the location change was never gradient over the course of the sign and in each case was either completely present or completely absent; that is, Signer 5 always signed PANTS either in its customary position, with contact on the lower torso, or in the middle of neutral space, never in between. A representation of this location assimilation using the Hand-Tier model (Sandler, 1986, 1987, 1989; see Chapter 1) is given below in Figure 7.3. At left is depicted a neutral-space sign (like those which always preceded PANTS in this study), whose location in the Hand-Tier model is specified as having the trunk as major body location, with a height setting of "mid" and a distance setting of "distal." The sign PANTS, depicted at right, has the same major body location but different underlying location settings. The location shift illustrated here is expressible as a feature-value change and hence fits the definition of assimilation given in Chapter 1. This situation, in which a non-neutral-space sign acquires a neutral-space location as the result of assimilation, is complemented by some realizations of the sequence "I WANT," such as the one depicted at right in Figure 7.2. As noted in Chapter 5, this sequence was often contracted by Signer 5, resulting in a single sign form which, as represented in the Hand-Tier model, shares most of its feature specifications with "I" but has the same hand configuration as WANT. In this case, the resulting sign form loses the neutral-space location specified for WANT in favor of the torso contact specified for "I."

Figure 7.3.
Hand-Tier representation of Signer 5's variant of PANTS showing assimilation to the location of a preceding neutral-space sign.

The general pattern seen in the data obtained in this project was that the LL coarticulatory effects seen in ASL were weaker than the VV effects found in English, apparently due in part to the greater overall variability in the sign data relative to the spoken-language data. Despite these differences, it would be premature to draw broad generalizations from them about cross-modality contrasts. One reason is that the sign study focused only on sign location, while coarticulatory effects involving other parameters are likely to be relevant as well, for both production and perception (cf. Cheek, 2001; Corina & Hildebrandt, 2002; Mauk, 2003). Since the contexts in which data were obtained here were also by necessity relatively limited, only a thin cross-section of the phenomenon of sign-language coarticulation could be explored.

In addition, in the experiments carried out for this project, the sign task was much longer than the speech task. This was because the distance-1, -2 and -3 vowels appeared consecutively in the English sentences that were used, something made possible by the ease with which each vowel's auditory properties could be analyzed computationally. In contrast, neutral-space signs are not so easy to investigate with the methodology used here, because not all such signs have signatures that are clear when analyzed using motion-capture technology. It was the easily distinguished movement characteristics of the sign WANT that made that sign a desirable object of study, but different sentences were needed to create the various distance conditions, since the same sign could not be repeated consecutively in any one (meaningful) sentence.
Therefore the signers who participated had to articulate many more sentences in total (on the order of 300, rather than the 30 in the spoken-language task), which might itself have led to modality-independent differences in articulatory behavior.

A different issue is the possibility that the denser or richer information in the visual signal, relative to the auditory signal, leaves sign-language users with less need to be attuned to extra cues in the language signal, whether while producing that signal or while acting as perceivers. In other words, to the extent that consistent, perceptible coarticulatory patterns may assist the perceiver, perhaps this is simply not as necessary in the visual modality. A related issue is that the reductions seen here are relatively unlikely to result in confusable homophones in ASL, at least in the contexts considered in this study. For example, PANTS articulated in a neutral-space position or BOSS articulated on the front of the shoulder is unlikely to be confused with another sign, whereas a [k]-plus-schwa sequence, with schwa a reduced version of some full vowel, could have originated as "key," "coo," or "caw"; hence there is a greater need in the English contexts for such full vowels not to be reduced, to the extent that language users prefer to avoid such ambiguities. There may therefore be greater freedom for ASL users to modify sign location than there is for English speakers to modify the articulatory-space positions of the vowels they produce.37

7.4. Dissertation conclusion

This project examined the extent and perceptibility of long-distance coarticulatory effects in both spoken and signed language, and the degree to which these vary among language users. The speech study found that anticipatory VV coarticulation can occur over a distance of at least three vowels in natural discourse, and that even such long-distance effects can be perceived by some listeners.
Both coarticulatory strength and perceptual sensitivity varied greatly among study participants. The ERP study found significant results for nearer-distance VV effects but not for the longest-distance ones. The possibility of interplay between coarticulatory production and perception was investigated, but no significant correlation between the two was found. Speaking rate and coarticulation strength were also found not to be correlated.

37 The issue of homophone avoidance has long been discussed in linguistic research (e.g. Gilliéron, 1918), though its importance may have been somewhat overstated at times, given how often homophony does in fact occur. Issues related to homonymy are relevant in sign language research as well; for example, see work by Siedlecki and Bonvillian (1998) on the production of homonyms by children acquiring ASL.

The sign language production study found evidence of long-distance LL coarticulation, but these effects were in general weaker and less pervasive than the VV effects found in the spoken-language study. The signers who took part showed a great deal of variation in their sign production, probably more than was found among speakers in their speech production. Also incorporated into the sign production study was an attempt to compare "linguistic" and "non-linguistic" coarticulation. The effects found in the "non-linguistic" contexts were similar in both spatial magnitude and statistical significance to those found in the context of actual signs, and were weaker than the effects found in the speech production study. The sign language perception study found that LL coarticulatory effects were perceptible to some signers, as well as to some hearing non-signers. However, results here were also generally weaker than those in the spoken-language perception study. A number of factors have been examined which individually or collectively may explain these cross-modality contrasts.
Such factors include differences in the mass of the articulators involved and in the media in which the relevant linguistic information is carried. If we accept that speech and sign are two manifestations of a single human language capacity, cross-modality studies such as this one will prove indispensable as we seek to understand the complexities that underlie the phenomenon as a whole.

APPENDIX A

Formant values for first production experiment: 3 distance conditions, 2 vowel contexts, 20 speakers

The following tables show the average target-vowel F1 and F2 values for each speaker in each distance condition and vowel context obtained in the first production experiment, together with the associated standard deviations. Note that the "a" and "i" labeling refers to context, not to the measured vowels themselves, which were always schwa or [ʌ]. Note also that these measurements were made near the end of the target vowels, where coarticulatory influence of the context vowel was expected to be strongest, and where effects of the immediately following consonant appear to be seen as well. These formant values should therefore not be expected to correspond too closely to the values one would obtain in the steady-state portion of a schwa vowel. The first, second and third tables show results for the distance-1, -2 and -3 conditions, respectively. Significant results are noted, where * = p<0.05, ** = p<0.01 and *** = p<0.001. A pictorial representation of the results for each subject is given in Appendix B.

Table A1. Formant values of distance-1 target vowels in [a] vs.
[i] context Speaker 1 (f) 2 (m) 3 (f) 4 (f) 5 (f) 6 (m) 7 (m) 8 (f) 9 (m) 10 (f) 11 (f) 12 (m) 13 (m) 14 (f) 15 (f) 16 (f) 17 (m) 18 (m) 19 (m) 20 (f) F1 mean 431 408 359 527 585 534 484 452 355 439 458 309 445 432 465 359 447 353 429 428 [a] SD 34 14 96 104 42 48 8.6 110 22 77 93 93 51 35 29 87 58 68 49 55 F1 [i] mean SD 405 43 282 36 315 57 471 57 423 55 393 33 419 4.4 379 62 300 32 377 110 418 97 195 109 331 53 300 45 345 52 307 51 339 53 251 39 308 36 305 33 *** *** *** *** ** * ** *** *** ** * *** *** F2 [a] mean SD 1991 200 1442 28 1729 72 2140 239 1698 70 1622 62 1538 43 1814 88 1837 43 2145 147 2089 207 1562 63 1727 33 1914 129 1826 60 2138 480 1577 18 1808 51 1634 114 1896 93 F2 [i] mean SD 2274 121 1992 78 2644 165 2524 292 2360 100 1893 138 1980 21 2605 228 2025 90 2731 297 2299 347 1843 93 2123 76 2437 305 2235 252 2499 354 1932 122 1979 221 1854 83 2862 106 * *** *** * *** ** *** *** ** ** *** *** ** ** *** ** *** 267 Table A2. Formant values of distance-2 target vowels in [a] vs. [i] context Speaker 1 (f) 2 (m) 3 (f) 4 (f) 5 (f) 6 (m) 7 (m) 8 (f) 9 (m) 10 (f) 11 (f) 12 (m) 13 (m) 14 (f) 15 (f) 16 (f) 17 (m) 18 (m) 19 (m) 20 (f) F1 mean 572 461 633 757 594 551 483 870 528 617 798 539 631 721 612 794 621 459 506 588 [a] SD 31 7.6 82 14 9.6 42 9.0 20 30 42 34 43 12 38 22 37 18 32 19 29 F1 [i] mean SD 583 21 400 57 404 100 746 35 558 26 515 52 478 22 832 68 487 26 536 50 803 48 439 78 617 19 664 55 595 41 802 37 587 26 360 83 489 17 563 53 * *** ** * ** * * * * F2 [a] mean SD 1622 73 1440 38 1731 61 1979 72 1780 79 1311 68 1363 14 2052 50 1418 43 1964 35 1993 42 1611 59 1439 113 1866 43 1519 48 1919 105 1391 27 1422 43 1491 70 1706 34 F2 [i] mean SD 1659 56 1515 56 2009 47 1978 72 1914 39 1411 64 1504 16 2167 82 1481 45 2069 29 2058 55 1632 32 1523 42 1973 50 1620 39 1735 382 1523 15 1404 92 1548 33 1841 102 * *** ** * *** ** * *** * ** ** *** * 268 Table A3. Formant values of distance-3 target vowels in [a] vs. 
[i] context Speaker 1 (f) 2 (m) 3 (f) 4 (f) 5 (f) 6 (m) 7 (m) 8 (f) 9 (m) 10 (f) 11 (f) 12 (m) 13 (m) 14 (f) 15 (f) 16 (f) 17 (m) 18 (m) 19 (m) 20 (f) F1 mean 587 455 718 617 692 462 542 489 310 395 525 452 214 507 375 318 462 390 509 416 [a] SD 27 89 88 105 48 210 40 199 87 116 78 91 31 124 64 99 63 58 36 117 F1 [i] mean SD 635 45 479 10 704 95 592 87 722 67 530 37 515 35 367 258 287 103 290 158 434 149 487 94 169 64 468 79 416 94 546 180 500 57 318 82 526 19 443 158 F2 [a] mean SD 1503 251 1127 35 1412 235 1802 139 1497 63 1220 294 1201 69 1655 103 1294 109 1655 134 1788 157 1184 130 1201 52 1502 58 1356 211 1680 137 1248 85 1269 86 1261 73 1501 275 F2 [i] mean SD 1527 117 1116 99 1648 49 1767 213 1577 96 1290 254 1297 31 1709 138 1319 59 1679 114 1706 149 1257 218 1196 140 1543 100 1380 197 1641 62 1314 61 1208 88 1312 47 1690 115 * ** 269 APPENDIX B Vowel space graphs for first production experiment: 3 distance conditions, 2 vowel contexts, 20 speakers The figures starting on the next page were generated from the numerical data summarized in Appendix A and show average context-vowel and distance-1, -2 and -3 target-vowel positions in vowel space for each of the 20 speakers who took part in the first production experiment. For each distance condition and each speaker, the [i]- and [a]-colored target vowels are joined by a line which is dotted back if neither F1 nor F2 were significantly different (at the p<0.05 level) between contexts, solid green if only one of F1 or F2 was significantly different, and solid red if both were significantly different. Context vowels are joined by a solid blue line. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size. Each speaker’s gender is also given. 
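The line-coloring convention just described reduces to counting how many of the two formants differ significantly for a given context pair. A minimal sketch of that logic (the function name and style labels are illustrative, not from the original):

```python
def pair_line_style(f1_sig: bool, f2_sig: bool) -> str:
    """Return the plot line style joining a pair of context-labeled target
    vowels, per the convention described above: dotted if neither F1 nor F2
    differs significantly (p < 0.05), solid green if exactly one formant
    differs, solid red if both differ."""
    n_significant = int(f1_sig) + int(f2_sig)
    return ("dotted", "solid green", "solid red")[n_significant]

# For example, a pair where only F2 differed significantly:
print(pair_line_style(False, True))  # prints "solid green"
```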
Again, it is important to keep in mind that measurements for target vowels were made near the end of those vowels, where coarticulatory influence of the context vowel was expected to be strongest, and where effects of the immediately following consonant appear to be seen as well. These formant values should therefore not be expected to correspond too closely to the values one would obtain in the steady-state portion of a schwa vowel.

[Vowel space graphs for each of the 20 speakers]

APPENDIX C

Formant values for second production experiment: 3 distance conditions, 5 vowel contexts, 21 speakers

The following tables show the average target-vowel F1 and F2 values for each speaker in each distance condition and vowel context obtained in the second production experiment, together with the standard deviation of the values of each formant obtained at each distance. Results for Speakers 3, 5 and 7 are given after those of the other speakers. Note that the vowel labels refer to context, not to the measured vowels themselves, which were always schwa or [ʌ]. A summary of significance-testing results for these data is given in Appendix D. A pictorial representation of these results for each subject is given in Appendix E.
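The per-distance SD column in the tables below can be sanity-checked against the tabulated means. The published SDs were presumably computed over all tokens of a formant at a given distance, but where context differences dominate the variance, the sample standard deviation across the five context means lands close; Speaker 21's distance-1 F2 row is one such case (a sketch, with values copied from the table):

```python
from statistics import mean, stdev

# Speaker 21, distance-1 F2 means for the [a], [æ], [ʌ], [i], [u] contexts
f2_dist1 = [2126, 2343, 2217, 2564, 2184]

# Sample standard deviation across the five context means
sd_estimate = stdev(f2_dist1)

print(round(mean(f2_dist1), 1), round(sd_estimate, 1))  # prints 2286.8 174.1
# The tabulated SD for this row is 174.3; the small gap is expected, since
# the published value was presumably computed from the raw tokens.
```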
Formant means & SDs by distance for Speaker 21
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        482    449    446    430    439    19.4
3 F2       1527   1470   1440   1479   1471    31.5
2 F1        584    531    570    560    557    19.4
2 F2       1751   1754   1809   1809   1722    38.6
1 F1        449    389    426    352    407    36.5
1 F2       2126   2343   2217   2564   2184   174.3

Formant means & SDs by distance for Speaker 22
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        497    505    521    472    498    17.4
3 F2       1333   1213   1292   1293   1252    45.7
2 F1        595    549    580    547    579    21.2
2 F2       1770   1751   1764   1803   1800    22.7
1 F1        402    371    399    317    386    34.6
1 F2       1821   2286   2039   2354   2021   216.3

Formant means & SDs by distance for Speaker 23
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        441    456    453    448    436     8.2
3 F2       1291   1333   1306   1334   1304    18.9
2 F1        607    609    552    563    557    28.0
2 F2       1574   1597   1591   1636   1622    24.8
1 F1        513    477    513    421    472    37.8
1 F2       1514   1884   1725   2039   1759   195.1

Formant means & SDs by distance for Speaker 24
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        429    441    422    429    411    11.0
3 F2       1150   1159   1191   1173   1141    19.5
2 F1        452    465    459    444    438    11.0
2 F2       1426   1454   1433   1447   1457    13.6
1 F1        457    390    413    369    378    34.8
1 F2       1517   1750   1614   1785   1642   107.9

Formant means & SDs by distance for Speaker 25
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        498    464    477    490    483    12.6
3 F2       1128   1246   1177   1121   1175    49.9
2 F1        497    526    496    444    480    29.9
2 F2       1603   1563   1582   1671   1603    40.9
1 F1        503    460    469    403    411    41.9
1 F2       1830   1836   1830   1835   1823     5.1

Formant means & SDs by distance for Speaker 26
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        415    421    461    457    440    20.6
3 F2       1215   1220   1266   1236   1186    29.4
2 F1        538    543    537    524    588    24.3
2 F2       1513   1534   1502   1530   1509    13.9
1 F1        415    409    424    349    394    29.5
1 F2       1518   1819   1574   2001   1492   221.1

Formant means & SDs by distance for Speaker 27
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        469    478    501    407    480    22.5
3 F2       1076   1104   1076   1124   1132    29.6
2 F1        575    531    558    532    511    25.3
2 F2       1586   1537   1579   1647   1592    39.2
1 F1        437    436    463    389    432    26.7
1 F2       1666   1845   1625   1957   1675   141.6

Formant means & SDs by distance for Speaker 28
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        433    410    472    394    441    29.9
3 F2       1248   1255   1287   1232   1298    27.6
2 F1        457    446    419    369    449    36.1
2 F2       1512   1589   1548   1638   1572    47.3
1 F1        413    316    354    268    380    56.0
1 F2       1659   2134   1704   2246   1764   269.0

Formant means & SDs by distance for Speaker 29
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        467    464    531    525    468    31.6
3 F2       1437   1451   1490   1472   1441    18.5
2 F1        513    548    514    503    488    21.8
2 F2       1771   1835   1835   1851   1795    33.2
1 F1        402    392    380    344    355    24.4
1 F2       1841   2548   2048   2680   2165   349.6

Formant means & SDs by distance for Speaker 30
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        375    382    367    330    386    26.0
3 F2       1328   1317   1362   1410   1334    37.7
2 F1        586    595    577    531    578    25.1
2 F2       1762   1843   1790   1840   1751    42.7
1 F1        363    309    326    272    309    32.9
1 F2       1892   2327   1980   2347   1973   215.9

Formant means & SDs by distance for Speaker 31
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        470    420    457    461    424    22.7
3 F2       1259   1230   1245   1244   1209    19.0
2 F1        563    574    557    534    543    16.1
2 F2       1376   1425   1416   1456   1436    29.7
1 F1        428    399    376    391    361    25.5
1 F2       1873   2045   1991   2308   1913   171.1

Formant means & SDs by distance for Speaker 32
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        550    531    546    527    509    16.2
3 F2       1134   1138   1149   1138   1111    13.9
2 F1        484    484    508    495    476    12.3
2 F2       1350   1343   1365   1442   1382    39.5
1 F1        422    420    427    379    388    21.9
1 F2       1647   1874   1670   1968   1740   137.5

Formant means & SDs by distance for Speaker 33
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        349    333    311    367    353    40.7
3 F2       1375   1477   1735   1400   1676   158.0
2 F1        589    563    550    500    532    33.5
2 F2       1723   1816   1736   1836   1782    48.9
1 F1        391    312    344    303    307    37.3
1 F2       1950   2574   1995   2469   2271   277.5

Formant means & SDs by distance for Speaker 34
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        455    468    459    458    451     6.3
3 F2       1125   1117   1127   1126   1100    11.1
2 F1        503    521    522    481    483    19.8
2 F2       1388   1394   1376   1457   1413    31.8
1 F1        415    398    394    367    383    17.8
1 F2       1624   1804   1704   1978   1705   136.2

Formant means & SDs by distance for Speaker 35
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        446    441    435    413    419    33.3
3 F2       1324   1272   1277   1270   1274    27.7
2 F1        519    470    471    486    452    25.2
2 F2       1519   1459   1456   1500   1438    33.5
1 F1        384    358    373    312    333    29.2
1 F2       1859   2033   1837   2133   1784   147.3

Formant means & SDs by distance for Speaker 36
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        521    486    454    508    486    25.5
3 F2       1369   1378   1404   1430   1432    29.1
2 F1        584    560    548    473    508    44.1
2 F2       1673   1723   1678   1706   1683    21.2
1 F1        409    387    400    388    371    14.6
1 F2       1883   2150   1976   2191   1841   156.7

Formant means & SDs by distance for Speaker 37
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        489    463    457    453    454    11.2
3 F2       1178   1161   1144   1149   1169    17.7
2 F1        532    522    521    439    518    38.2
2 F2       1464   1482   1513   1539   1478    30.4
1 F1        410    387    392    361    386    17.6
1 F2       1686   1907   1634   1956   1771   138.2

Formant means & SDs by distance for Speaker 38
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        367    309    325    312    336    47.5
3 F2       1267   1256   1246   1272   1283    15.4
2 F1        505    493    450    421    415    41.2
2 F2       1679   1673   1709   1758   1706    33.6
1 F1        325    303    323    282    304    17.4
1 F2       2204   2355   2338   2541   2252   129.3

Formant means & SDs by distance for Speaker 3
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        694    709    657    627    634    36.2
3 F2       1354   1430   1364   1418   1402    33.3
2 F1        507    544    542    486    502    25.6
2 F2       1718   1811   1758   1920   1790    76.2
1 F1        437    432    404    360    382    32.7
1 F2       1939   2516   1984   2633   1813   370.3

Formant means & SDs by distance for Speaker 5
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        702    696    730    684    668    23.0
3 F2       1437   1446   1436   1417   1373    29.2
2 F1        509    502    474    486    480    14.8
2 F2       1745   1806   1772   1853   1767    41.8
1 F1        493    460    475    482    359    54.5
1 F2       1800   2455   1906   2423   1785   337.1

Formant means & SDs by distance for Speaker 7
Distance    [a]    [æ]    [ʌ]    [i]    [u]     SD
3 F1        476    499    493    479    462    14.4
3 F2       1098   1125   1147   1143   1114    20.0
2 F1        450    471    474    452    476    12.5
2 F2       1434   1391   1342   1471   1403    48.2
1 F1        461    433    446    357    428    40.1
1 F2       1637   2064   1675   2151   1587   263.5

APPENDIX D

Significance testing for second production experiment: 3 distance conditions, 5 vowel contexts, 21 speakers

For each speaker, two adjacent tables are given,
separated by a thick border. The table at left shows the level of significance associated with the speaker’s formant frequency differences between contexts, for each of 10 context vowel pairs and 3 distance conditions, for each of F1 and F2. A blank cell indicates an outcome in the contrary-to-expected direction (e.g. F1 higher for the [i] than [a] context). The bottom row is a tally of significant results (for each of the p<0.05 and p<0.01 levels of significance) for each of the three distance conditions, summed over both F1 and F2. The table at right provides, for the given speaker, a summary of significance testing outcomes for each formant at each distance and for each contrasting contextvowel pair, with * = p<0.05, ** = p<0.01, *** = p<0.001, + = p<0.10, and a √ indicating a non-significant outcome in which averages were nevertheless in the expected direction. The bottom row gives the number of results in the column above which were significant at the p<0.05 level or higher. Spkr 21 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig Spkr 22 Dist. 3 F1 F2 0.121 0.307 0.193 0.059 0.333 0.473 0.364 0.374 0.819 0.427 <.05 0 Dist. 2 F1 F2 0.023 0.021 0.169 0.434 0.073 0.409 0.235 0.064 0.454 0.207 0.429 0.036 3 2 Dist. 1 F1 F2 0.000 0.069 0.162 0.051 0.096 0.000 0.001 0.035 0.151 0.001 0.021 0.124 0.005 0.160 0.210 0.497 0.002 0.000 0.435 0.857 0.010 0.025 0.000 0.242 0.115 0.221 0.262 <.01 <.05 <.01 <.05 <.01 0 4 0 11 8 1 Dist. 3 F1 F2 Dist. 2 F1 F2 Dist. 1 F1 F2 √ * + *** *** √ * √ + *** √ √ √ √ * + √ + √ + √ √ √ * + ** √ √ * √ √ ** √ √ √ √ ** *** √ √ √ * * *** √ √ √ √ √ # of sig. 
results, p<.05, by column 0 0 2 2 3 8 3 2 1 289 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig F2 3 2 F2 0.128 0.123 0.148 0.339 0.322 0.717 0.222 0.484 0.381 0.157 0.037 0.159 0.316 0.105 0.267 0.114 0.061 0.475 <.05 <.01 1 0 Spkr 24 F2 F2 0.048 0.949 0.036 0.030 0.018 0.021 0.009 0.001 0.037 0.266 0.003 0.011 0.403 F1 0.116 0.353 0.020 0.692 0.289 F2 0.000 0.000 0.052 0.000 0.002 0.019 0.003 0.009 0.011 0.010 0.397 0.039 0.069 0.000 0.000 0.769 0.192 0.031 0.001 0.025 <.05 <.01 <.05 <.01 10 3 15 9 2 F2 F1 F1 F1 0.323 0.076 0.003 0.481 0.306 0.029 0.029 0.357 0.326 0.068 0.169 0.113 0.049 0.006 0.260 0.094 0.150 0.157 0.273 0.024 0.065 0.260 0.003 0.306 0.159 0.100 0.033 0.363 0.100 0.634 0.607 F2 0.000 0.000 0.006 0.001 0.185 0.005 0.012 0.000 0.000 F1 F2 F1 * * √ √ √ √ ** √ √ √ + F2 *** *** * √ √ * √ √ + √ ** √ ** + √ √ √ ** ** √ √ √ √ * *** + √ √ √ √ # of sig. results, p<.05, by column 0 0 2 0 3 8 F1 3 F2 F1 2 F2 F1 1 F2 √ * *** *** *** √ √ * + *** √ * √ ** √ √ * ** * ** √ √ * * ** * √ √ * √ * * √ ** √ * √ √ + *** *** √ √ √ √ * *** + √ * # of sig. 
results, p<.05, by column 1 0 6 4 6 9 √ 1 F2 F2 √ √ 1 F1 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] F1 0.049 0.140 0.008 0.000 0.049 0.344 0.000 0.281 0.449 0.012 0.170 0.222 0.256 0.246 0.021 0.157 0.107 0.474 0.064 0.067 0.169 0.008 0.373 0.003 0.056 0.491 0.139 0.119 0.009 0.002 0.350 0.258 0.232 0.472 0.024 0.001 0.093 0.291 0.481 0.284 0.422 <.05 <.01 <.05 <.01 <.05 <.01 0 0 2 0 11 8 F1 # sig F2 0.221 0.770 Spkr 23 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] F1 F1 √ √ √ √ √ + √ 3 F2 √ √ * √ √ √ √ F1 √ √ √ + √ ** √ √ 2 F2 + * √ * F1 ** * + ** √ * + √ * √ 1 F2 *** *** ** ** √ ** * *** *** 290 [^]-[u] # sig 0.276 0.028 0.008 <.05 <.01 <.05 2 0 5 Spkr 25 3 F1 Context [i]-[a] [a]-æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig F1 F1 F2 0.006 0.477 0.177 0.472 0.132 0.007 0.902 0.020 0.476 0.005 0.436 0.012 0.480 0.706 0.444 0.002 0.465 <.05 <.01 6 4 3 F2 F1 2 F2 F1 F1 F2 F1 F2 0.040 0.089 0.147 0.002 0.069 0.001 0.711 0.207 0.139 0.965 0.004 0.498 0.244 0.176 0.025 0.780 0.419 0.851 0.024 0.199 0.026 0.043 0.066 0.162 0.019 F1 2 F2 F1 1 F2 √ + ** √ * + √ √ √ √ √ √ + √ ** √ √ * * √ + √ √ ** ** √ √ * * √ √ √ + √ √ √ √ ** √ # of sig. results, p<.05, by column 1 1 1 2 6 0 F1 3 F2 √ √ + * √ F1 F1 * √ * 3 F2 + √ √ √ √ √ √ √ √ 2 F2 1 F2 F1 √ √ ** √ 2 F2 F1 1 F2 ** + √ *** ** √ * √ * √ + * ** √ * √ √ √ √ √ * √ √ √ * √ √ √ * √ √ √ ** * √ + √ √ + ** √ * √ √ # of sig. results, p<.05, by column 0 2 0 0 3 6 1 F2 3 F2 √ * √ √ 1 0.286 0.343 0.008 0.007 0.905 0.317 0.803 0.020 0.489 0.183 0.896 0.197 0.735 0.337 0.017 0.147 0.453 0.250 0.035 0.266 0.293 0.272 0.017 0.426 0.364 0.269 0.002 0.011 0.440 0.082 0.188 0.317 0.099 0.006 0.196 0.016 0.100 0.175 <.05 <.01 <.05 <.01 <.05 <.01 2 0 0 0 9 4 F1 √ * ** + # of sig. 
results, p<.05, by column 0 2 2 3 4 8 1 F2 2 F2 <.01 9 0.268 0.840 0.463 0.065 0.043 0.394 Spkr 27 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] F1 3 F1 # sig 2 F2 0.298 0.257 0.055 0.038 0.045 0.068 0.187 0.209 0.464 0.142 0.302 0.059 0.995 0.167 0.013 0.176 0.071 0.140 0.004 0.265 0.047 0.598 0.654 0.062 0.492 0.181 <.05 <.01 <.05 <.01 2 0 3 1 Spkr 26 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] <.01 2 0.087 <.05 12 F1 √ √ √ * 291 [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig 0.213 0.422 0.009 0.007 0.055 0.233 0.020 0.016 0.003 0.035 0.580 0.009 0.034 0.001 0.153 0.027 0.129 <.05 <.01 <.05 <.01 <.05 <.01 4 1 6 2 9 5 Spkr 28 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig F2 0.120 0.439 0.428 0.194 0.180 0.238 0.013 0.234 0.228 <.05 1 Spkr 29 <.01 0 # sig 1 F2 0.015 0.750 0.171 0.412 0.016 0.226 0.007 0.017 0.214 0.080 0.098 0.149 0.119 0.094 0.042 0.032 0.046 <.05 7 <.01 1 F1 0.000 0.000 0.093 0.008 0.028 0.000 0.000 0.002 0.000 0.006 0.000 <.05 14 2 F2 F1 3 F2 F1 2 F2 F1 <.01 11 F1 √ √ F2 0.000 0.000 0.011 0.034 0.175 0.000 0.006 0.000 0.002 <.01 7 F1 F2 0.008 0.066 0.067 0.062 0.001 0.000 0.689 0.701 0.051 0.009 0.006 0.347 0.237 0.383 0.248 0.027 0.119 √ √ √ F1 * √ √ √ * √ 2 F2 1 F2 F1 ** * √ + + √ √ * * *** *** *** *** * + √ ** * * *** *** ** *** ** *** 2 F2 1 F2 * + √ * √ # of sig. results, p<.05, by column 1 0 3 4 6 8 F1 3 F2 F1 ** √ √ √ √ + √ √ F1 F1 ** ** *** * √ *** + √ * √ √ * * * √ + √ + √ √ *** √ * * √ ** √ √ √ √ *** + √ √ * √ ** √ + √ √ √ # of sig. 
results, p<.05, by column 0 0 2 4 2 8 √ 1 F2 3 F2 √ 1 0.400 0.009 0.006 0.301 0.024 0.752 0.058 0.266 0.299 0.357 0.032 0.015 0.276 0.100 0.084 0.489 0.395 0.424 0.036 0.043 0.163 0.453 0.354 0.321 0.171 0.092 0.193 0.655 0.012 0.653 0.135 0.098 0.253 0.122 0.258 <.05 <.01 <.05 <.01 <.05 0 0 6 1 10 F1 F2 0.000 0.000 0.012 0.147 0.020 0.121 0.952 0.383 0.051 0.913 0.313 Spkr 30 Context [i]-[a] [a]-[æ] [a]-[^] F1 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] 2 √ √ ** ** + √ * * ** * √ ** * *** √ * √ # of sig. results, p<.05, by column 4 0 2 4 3 6 3 F2 + √ √ √ F1 2 F2 + √ √ + + √ F1 1 F2 *** *** ** ** * √ 292 [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig 0.907 0.422 0.006 0.014 0.035 0.245 0.245 0.326 0.045 0.121 0.107 0.012 0.075 0.304 0.272 <.05 <.01 <.05 5 2 3 Spkr 31 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig F2 0.091 0.576 0.390 0.296 0.055 0.226 0.301 0.208 0.152 0.239 0.207 0.799 0.193 0.259 0.353 <.05 <.01 <.05 0 0 5 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig 0.220 0.527 0.444 0.092 0.417 0.298 0.442 0.016 0.015 0.000 0.001 0.462 <.01 8 2 1 F1 0.347 0.201 0.345 0.085 0.188 0.371 Spkr 32 0.825 0.005 0.029 0.101 0.036 0.123 0.007 0.044 0.051 0.181 0.102 <.01 <.05 0 12 F2 F1 0.004 0.027 0.039 0.042 0.104 0.322 0.053 0.204 0.015 0.004 0.330 0.113 0.028 0.000 0.054 0.032 0.714 0.014 0.289 0.163 0.046 0.000 0.229 0.110 0.004 0.188 0.242 <.01 <.05 <.01 1 8 4 2 F2 F2 F1 F1 F2 0.455 0.019 0.003 0.444 0.981 0.801 0.335 0.336 0.449 0.343 0.481 0.006 0.009 0.004 3 F1 2 F2 F1 √ ** √ √ √ * * * * √ √ ** *** * + * + *** √ √ √ √ # of sig. results, p<.05, by column 4 1 1 2 6 6 F1 F1 F2 F1 √ 2 F2 ** * F1 1 F2 + ** + *** √ * √ + √ * * * √ √ * ** √ √ + √ √ * √ √ √ √ √ √ * √ √ * *** √ √ √ √ √ ** √ √ √ √ √ # of sig. 
results, p<.05, by column 0 0 0 5 3 5 F1 √ √ √ + √ 3 F2 √ √ √ √ F1 2 F2 * √ √ √ √ ** F1 ** √ ** ** 1 F2 *** ** √ ** + ** * *** *** √ + √ ** + √ ** ** √ √ √ + √ * + + ** # of sig. results, p<.05, by column 1 0 0 3 6 7 1 F2 3 F2 √ * √ √ √ √ √ √ √ + 1 F2 0.000 0.004 0.162 0.007 0.095 0.006 0.109 0.088 0.360 0.006 0.029 0.099 0.229 0.001 0.002 0.000 0.232 0.117 0.337 0.055 0.474 0.000 0.010 0.090 0.082 0.003 <.05 <.01 <.05 <.01 <.05 <.01 1 0 3 2 13 12 Spkr 33 √ * F1 3 F2 F1 2 F2 F1 1 F2 293 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig 0.338 0.024 0.596 0.071 0.501 0.128 0.010 0.140 0.018 0.088 0.035 0.215 0.288 0.166 0.048 0.866 0.372 0.342 0.247 <.05 <.01 <.05 2 0 7 Spkr 34 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] # sig F1 3 F1 Context [i]-[a] [a]-[æ] [a]-[^] [a]-[u] [æ]-[i] [æ]-[^] [æ]-[u] [i]-[^] [i]-[u] [^]-[u] <.01 2 0.000 0.014 0.057 0.001 0.354 F1 * *** *** *** √ * * *** √ √ + √ + √ ** *** * √ √ √ √ * *** √ √ √ ** * ** + *** √ √ √ √ * √ √ √ # of sig. 
[Per-speaker tables of ANOVA p-values for F1 and F2 for each vowel-context pair ([i]-[a], [a]-[æ], [a]-[^], [a]-[u], [æ]-[i], [æ]-[^], [æ]-[u], [i]-[^], [i]-[u], [^]-[u]) at each distance, with counts of significant results (p<.05, p<.01) per column, appear here in the original; Speakers 35-38 and Speakers 3, 5 and 7 are shown on these pages.]

APPENDIX E
Vowel space graphs for second production experiment: 3 distance conditions, 4 vowel contexts, 21 speakers

The figures beginning on the next page were generated from the numerical data summarized in Appendix C and show average context-vowel and distance-1, -2 and -3 target-vowel positions in vowel space for each of the 21 speakers who took part in the second production experiment. For each distance condition and each speaker, target vowels associated with each context pair are joined by a line which is dotted black if neither F1 nor F2 was significantly different (at the p<0.05 level) between contexts, solid green if only one of F1 or F2 was significantly different, and solid red if both were significantly different.
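The dotted/green/red line coding just described can be captured in a small helper; this is a sketch only, and the function name is hypothetical:

```python
def pair_line_style(f1_significant, f2_significant):
    """Line style for a target-vowel pair in the vowel-space graphs:
    dotted black if neither F1 nor F2 differs significantly between
    contexts (p < 0.05), solid green if exactly one does, and solid
    red if both do."""
    n_sig = int(f1_significant) + int(f2_significant)  # 0, 1, or 2
    if n_sig == 0:
        return ("dotted", "black")
    elif n_sig == 1:
        return ("solid", "green")
    return ("solid", "red")
```

Since `int(True)` is 1, the sum simply counts how many of the two formants showed a significant context effect.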
Context vowels are joined by a solid blue line. Context and distance-1, -2 and -3 vowel pairs are labeled with progressively smaller text size. Each speaker's gender is also given. The graphs for Speakers 3, 5 and 7 are given after those of the other speakers. For the context vowels only, the average position of the [^] vowel is also given for reference's sake. In addition, since most speakers produced [u] as a diphthong, both the onset and offset of that context vowel are labeled as well, with "u-" indicating vowel onset and "u" indicating vowel offset.

[The vowel space graphs for the 21 speakers appear here in the original.]

APPENDIX F
ANOVA results for second production experiment for individual vowel pairs: 3 distance conditions, 4 vowel contexts [6 vowel pairs], 21 speakers

Since there are six sets of vowel pairs associated with the four corner vowels, six sets of repeated-measures ANOVAs were run; in each such set, comparisons of normalized F1 and F2 values were made at each distance with vowel as factor, where in each set the vowel factor had only two levels. At bottom are results for the [i]-[a] contrast for the entire set of 38 speakers who took part in either the first or second experiment.
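Because the within-subject vowel factor in each of these ANOVAs has only two levels, each F statistic reduces to the square of a paired t statistic, with degrees of freedom (1, n-1). A minimal pure-Python sketch of that computation (the function name is hypothetical):

```python
import math

def rm_anova_two_levels(cond_a, cond_b):
    """Repeated-measures ANOVA for a two-level within-subject factor.
    cond_a and cond_b hold one value per speaker (e.g. normalized F1
    in each of two vowel contexts at a given distance).  With two
    levels the analysis is equivalent to a paired t-test, so
    F(1, n-1) = t**2."""
    n = len(cond_a)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t ** 2, (1, n - 1)  # F statistic and its degrees of freedom
```

Each cell in the tables that follow reports one such F statistic together with its p-value.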
[i]-[a] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=77.2, p<0.001    F(1,17)=36.5, p<0.001    F(1,17)=4.07, p=0.06
F2 testing results    F(1,17)=130.4, p<0.001   F(1,17)=51.4, p<0.001    F(1,17)=0.48, p=0.50

[a]-[u] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=46.7, p<0.001    F(1,17)=16.5, p<0.001    F(1,17)=7.2, p<0.05
F2 testing results    F(1,17)=13.6, p<0.01     F(1,17)=3.5, p=0.08      F(1,17)=0.47, p=0.50

[æ]-[u] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=1.47, p=0.24     F(1,17)=8.5, p<0.01      F(1,17)=0.21, p=0.65
F2 testing results    F(1,17)=71.3, p<0.001    F(1,17)=0.004, p=0.95    F(1,17)=0.78, p=0.39

[a]-[æ] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=26.8, p<0.001    F(1,17)=1.83, p=0.19     F(1,17)=4.64, p<0.05
F2 testing results    F(1,17)=72.2, p<0.001    F(1,17)=2.1, p=0.16      F(1,17)=0.03, p=0.86

[i]-[æ] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=61.6, p<0.001    F(1,17)=18.2, p<0.001    F(1,17)=0.31, p=0.59
F2 testing results    F(1,17)=24.8, p<0.001    F(1,17)=21.9, p<0.001    F(1,17)=0.14, p=0.72

[i]-[u] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,17)=11.5, p<0.01     F(1,17)=4.0, p=0.06      F(1,17)=0.15, p=0.71
F2 testing results    F(1,17)=109.2, p<0.001   F(1,17)=42.6, p<0.001    F(1,17)=0.12, p=0.73

Whole-group results (Speakers 1 to 38)

[i]-[a] contrast
                      Distance 1               Distance 2               Distance 3
F1 testing results    F(1,37)=137.5, p<0.001   F(1,37)=37.2, p<0.001    F(1,37)=0.98, p=0.33
F2 testing results    F(1,37)=223.4, p<0.001   F(1,37)=54.8, p<0.001    F(1,37)=5.04, p<0.05

APPENDIX G
Numerical results from main signing study

The following tables show the average target-sign values in the x-, y- and z-dimensions (right-left, front-back, and up-down, respectively), given in centimeters, for Signers 2, 3, 4 and 5 in each distance condition and sign context obtained in the main sign production experiment, together with the associated standard deviations.
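The per-cell averages and standard deviations reported in these tables can be computed with a small helper like the following (a sketch; the function name and data layout are assumptions, not the study's actual analysis code):

```python
import math

def summarize(positions):
    """Per-axis mean and sample SD of 3-D target-sign positions (cm).
    `positions` is a list of (x, y, z) tuples for one context/distance
    cell; a cell with a single token gets an undefined SD (the "[NA]"
    entries in the tables)."""
    n = len(positions)
    stats = []
    for axis in range(3):  # 0: x (right-left), 1: y (front-back), 2: z (up-down)
        vals = [p[axis] for p in positions]
        mean = sum(vals) / n
        sd = (math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
              if n > 1 else float("nan"))
        stats.append((round(mean, 2), round(sd, 3)))
    return stats  # [(x_mean, x_sd), (y_mean, y_sd), (z_mean, z_sd)]
```

The sample (n-1) form of the standard deviation is assumed here, matching the bracketed SD values shown alongside each average.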
The sign CLOTHES, for which data were not given in the main text, was used as an additional context sign in order to enable comparisons of the behavior of the target signs between neutral-space contexts and a body location adjacent to neutral space, which is where CLOTHES is signed. (The sign is formed with a B handshape on both hands, with both thumbs flicking down twice against the body at the chest area.) Results for Signer 2 Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 15.18 [1.799] 6.952 [0.763] 10.08 [1.815] HAT (head) PANTS (waist) 14.70 [1.728] 14.61 [1.507] 6.78 [1.089] 8.01 [1.269] 10.16 [3.393] 10.32 [1.817] <red> (high) <green> (low) 17.62 [4.880] 16.82 [NA] 6.525 [2.555] 4.87 [NA] 6.315 [4.190] 6.39 [NA] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 16.08 [1.692] 6.29 [1.053] 15.52 [1.497] 6.32 [0.844] 14.52 [1.077] 7.47 [0.674] Distance 1, WANT target: I WANT (X) 8.387 [1.417] 8.767 [1.784] 9.672 [2.306] Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 13.14 [1.692] 8.11 [0.370] 12.43 [1.128] 312 HAT (head) PANTS (waist) 14.30 [0.276] 14.36 [1.056] 8.602 [1.090] 8.362 [1.868] 13.34 [1.273] 11.68 [2.715] <red> (high) <green> (low) 14.1 [0.334] 12.32 [0.544] 9.036 [0.922] 7.575 [1.590] 11.71 [1.104] 12.14 [0.367] BOSS (shoulder) 14.38 [2.354] 7.586 [1.174] 11.71 [4.705] CLOWN (nose) 13.51 [0.428] 9 [1.713] 14.10 [0.258] CUPCAKE (N.S.) 
14.63 [0.623] 8.216 [0.674] 12.63 [2.496] Distance 2, WANT target: I WANT FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.02 [0.995] 9.934 [1.265] 12.60 [1.354] HAT (head) PANTS (waist) 12.12 [1.193] 12.35 [2.142] 9.254 [1.455] 9.63 [0.947] 13.23 [1.245] 10.98 [0.742] <red> (high) <green> (low) 13.29 [1.601] 12.58 [0.551] 8.912 [0.402] 9.39 [1.425] 13.31 [1.116] 12.83 [0.368] BOSS (shoulder) 12.70 [0.899] 9.268 [1.814] 12.89 [0.632] CLOWN (nose) 13.91 [0.540] 8.95 [0.296] 11.61 [1.080] CUPCAKE (N.S.) 12.93 [1.251] 8.58 [1.454] 11.79 [1.423] Distance 3, WANT target: I WANT GO FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.34 [0.162] 10.36 [0.700] 12.29 [0.120] HAT (head) PANTS (waist) 12.17 [0.554] 12.38 [1.032] 9.9 [1.023] 11.23 [1.806] 13.65 [0.293] 12.93 [1.076] <red> (high) <green> (low) 11.81 [0.399] 12.88 [1.473] 9.14 [1.468] 8.957 [2.048] 12.54 [1.412] 13.03 [1.014] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 11.51 [0.636] 12.44 [1.008] 11.79 [1.278] 9.845 [1.051] 8.556 [1.373] 9.477 [0.945] 11.86 [2.731] 13.55 [1.727] 12.45 [0.658] 313 Distance 4, WANT target: I WANT GO FIND OTHER (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.06 [0.200] 8.933 [1.914] 10.04 [3.266] HAT (head) PANTS (waist) 11.21 [0.658] 14.62 [7.313] 9.652 [1.122] 9.043 [1.145] 10.10 [2.203] 6.61 [4.291] <red> (high) <green> (low) 11.22 [2.269] 16.73 [11.85] 9.696 [1.623] 8.03 [2.078] 10.83 [2.337] 2.94 [7.170] 10.38 [1.653] 9.9 [0.494] 10.62 [1.252] 9.15 [1.386] 10.82 [1.988] 10.15 [1.350] Distance 1, WISH target: I WISH (X) 9.403 [1.326] 8.892 [3.732] 9.205 [2.387] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 
Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 9.705 [0.582] 11.38 [1.494] 12.63 [1.655] HAT (head) PANTS (waist) 10.13 [0.926] 10.39 [1.420] 10.81 [2.141] 11.02 [1.125] 10.07 [4.120] 12.35 [0.417] <red> (high) <green> (low) 10.67 [0.700] 11.19 [1.679] 11.47 [0.169] 11.66 [0.941] 12.27 [2.064] 12.96 [1.285] BOSS (shoulder) 9.23 [0.599] 11.98 [1.805] 13 [1.347] CLOWN (nose) 11.06 [1.782] 12.16 [0.619] 13.06 [1.684] CUPCAKE (N.S.) 10.42 [1.525] 11.81 [1.819] 12.15 [1.368] Distance 3, WISH target: I WISH GO FIND (X) Results for Signer 3 Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 6.631 [1.696] 21.58 [5.779] 14.26 [1.384] HAT (head) PANTS (waist) 6.036 [1.337] 6.576 [4.064] 16.73 [8.686] 17.29 [7.416] 13.83 [0.954] 14.51 [1.252] 314 <red> (high) <green> (low) BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 6.666 [2.129] 8.002 [1.677] 13.57 [5.921] 17.25 [4.108] 6.146 [1.045] 21.2 [6.474] 8.426 [2.014] 15.18 [7.709] 6.503 [1.325] 22.58 [3.681] Distance 1, WANT target: I WANT (X) 16.07 [2.580] 16.79 [1.880] 13.92 [2.694] 15.27 [1.191] 14.59 [1.811] Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 5.648 [1.226] 22.1 [2.668] 11.37 [3.142] HAT (head) PANTS (waist) 6.151 [2.999] 7.747 [1.989] 15.27 [7.912] 17.33 [5.538] 11.12 [4.836] 13.08 [0.730] <red> (high) <green> (low) 6.528 [0.754] 8.286 [2.188] 19.01 [2.571] 18.09 [7.209] 11.03 [1.723] 13.24 [2.573] BOSS (shoulder) 5.526 [1.831] 17.65 [1.983] 11.49 [3.269] CLOWN (nose) 6.773 [2.068] 17.28 [5.865] 11.19 [2.653] CUPCAKE (N.S.) 
6.341 [1.753] 20.13 [3.062] 11.85 [2.784] Distance 2, WANT target: I WANT FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 9.44 [3.311] 11.73 [6.961] 13.48 [2.175] HAT (head) PANTS (waist) 10.54 [2.137] 8.652 [1.779] 7.223 [4.695] 9.164 [4.215] 14.13 [3.707] 13.47 [2.127] <red> (high) <green> (low) 7.748 [2.126] 8.916 [1.585] 10.97 [3.869] 11.11 [3.110] 16.20 [1.366] 14.98 [1.937] BOSS (shoulder) 8.522 [3.799] 10.39 [5.756] 13.93 [3.350] CLOWN (nose) 8.863 [1.315] 8.583 [4.699] 13.53 [2.825] CUPCAKE (N.S.) 7.621 [2.748] 14.35 [6.106] 12.29 [2.767] Distance 3, WANT target: I WANT GO FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] 315 CLOTHES (chest) 7.835 [1.093] 12.88 [5.215] 12.34 [1.697] HAT (head) PANTS (waist) 6.236 [2.005] 5.668 [1.675] 21.35 [2.992] 16.55 [9.110] 13.46 [1.107] 10.87 [1.140] <red> (high) <green> (low) 6.361 [1.871] 7.482 [0.788] 20.04 [7.698] 11.19 [5.016] 13.11 [2.792] 13.13 [2.127] BOSS (shoulder) 6.315 [2.095] 16.20 [6.012] 12.5 [1.505] CLOWN (nose) 7.896 [3.421] 12.09 [10.50] 11.98 [3.261] CUPCAKE (N.S.) 5.765 [0.824] 19.52 [4.559] 11.96 [3.188] Distance 4, WANT target: I WANT GO FIND OTHER (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.76 [0.614] 7.195 [1.166] 4.713 [1.836] HAT (head) PANTS (waist) 11.67 [1.286] 11.17 [1.331] 6.533 [1.616] 7.23 [0.858] 3.838 [0.908] 3.066 [0.900] <red> (high) <green> (low) 10.87 [1.410] 5.988 [10.21] 7.19 [1.034] 5.234 [3.264] 4.093 [1.355] 1.892 [2.236] 12.1 [1.340] 7.388 [1.055] 11.91 [2.176] 7.49 [1.156] 10.89 [2.589] 6.858 [0.797] Distance 1, WISH target: I WISH (X) 5.458 [2.290] 4.066 [2.299] 3.593 [2.803] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 
Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 14.54 [1.521] 3.822 [1.499] 9.642 [2.923] HAT (head) PANTS (waist) 14.05 [2.039] 14.74 [2.295] 4.018 [2.022] 4.23 [1.235] 9.241 [3.487] 10.99 [3.507] <red> (high) <green> (low) 12.52 [0.750] 15.77 [1.353] 4.575 [1.196] 5.218 [1.500] 7.385 [0.889] 11.51 [2.022] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 14.08 [0.635] 13.88 [0.818] 14.78 [1.287] 3.671 [2.070] 4.616 [0.954] 2.956 [2.254] 9.541 [1.336] 9.283 [1.061] 9.896 [1.762] 316 Distance 3, WISH target: I WISH GO FIND (X) Results for Signer 4 Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 11.38 [2.283] 15.52 [3.684] 12.83 [1.047] HAT (head) PANTS (waist) 11.90 [0.448] 11.47 [1.994] 16.80 [4.846] 20.78 [0.947] 13.98 [1.536] 13.53 [1.999] <red> (high) <green> (low) 12.26 [0.927] 11.02 [0.134] 13.1 [4.536] 12.58 [7.877] 13.33 [2.214] 12.76 [4.157] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 11.69 [3.259] 15.16 [5.469] 11.38 [2.482] 17.51 [6.150] 12.87 [1.652] 17.55 [3.481] Distance 1, WANT target: I WANT (X) 11.78 [2.849] 11.89 [2.596] 12.35 [1.663] Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.89 [0.944] 18.2 [1.255] 10.71 [2.566] HAT (head) PANTS (waist) 11.64 [4.663] 10.73 [2.668] 15.97 [4.042] 16.12 [2.236] 11.55 [2.981] 13.18 [2.449] <red> (high) <green> (low) 11.6 [2.753] 10.20 [4.102] 16.05 [3.138] 15.35 [2.281] 13.36 [1.819] 12.13 [2.700] BOSS (shoulder) 14.46 [3.198] 15.87 [4.208] 13.19 [2.020] CLOWN (nose) 10.17 [4.818] 16.81 [3.185] 10.38 [2.069] CUPCAKE (N.S.) 
11.63 [3.304] 17.77 [2.359] 10.57 [1.958] Distance 2, WANT target: I WANT FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.03 [2.545] 15.07 [7.123] 11.04 [2.926] HAT (head) PANTS (waist) 10.29 [1.550] 11.06 [NA] 9.005 [1.185] 8.93 [NA] 10.35 [1.375] 8.03 [NA] 317 <red> (high) <green> (low) 8.782 [0.802] 8.846 [0.918] 9.914 [4.435] 8.342 [1.409] 10.93 [2.157] 10.35 [1.475] BOSS (shoulder) 11.03 [1.161] 10.98 [4.668] 11.03 [1.254] CLOWN (nose) 10.23 [1.976] 14.97 [5.992] 11.56 [0.787] CUPCAKE (N.S.) 9.638 [1.094] 7.96 [1.340] 9.634 [0.541] Distance 3, WANT target: I WANT GO FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 10.09 [2.692] 10.12 [6.933] 10 [2.133] HAT (head) PANTS (waist) 9.31 [1.598] 7.566 [1.668] 9.006 [2.716] 5.73 [2.475] 9.566 [0.539] 9.568 [1.034] <red> (high) <green> (low) 7.23 [2.321] 7.053 [2.410] 6.266 [2.911] 5.83 [2.041] 10.31 [1.365] 9.796 [0.506] BOSS (shoulder) 8.256 [1.312] 6.694 [2.374] 8.654 [0.964] CLOWN (nose) 9.128 [1.924] 7.218 [1.082] 10.27 [1.881] CUPCAKE (N.S.) 10.05 [2.209] 8.745 [5.466] 8.81 [1.708] Distance 4, WANT target: I WANT GO FIND OTHER (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 5.823 [2.962] 11.89 [2.385] 7.588 [3.188] HAT (head) PANTS (waist) 3.528 [1.375] 9.265 [1.393] 12.10 [1.851] 8.63 [0.042] 9.16 [2.996] 7.565 [1.138] <red> (high) <green> (low) 7.628 [2.698] 18.18 [9.876] 11.55 [1.874] 9.077 [1.207] 8.811 [5.189] 7.447 [0.514] 8.732 [3.621] 12.96 [2.650] 5.065 [2.628] 12.20 [1.717] 5.836 [2.015] 12.65 [1.202] Distance 1, WISH target: I WISH (X) 4.248 [2.562] 8.197 [3.098] 7.262 [3.436] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 
Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] 318 CLOTHES (chest) 8.704 [3.255] 8.82 [1.308] 4.306 [1.257] HAT (head) PANTS (waist) 7.624 [2.872] 7.28 [2.311] 8.95 [1.471] 9.33 [1.910] 4.174 [0.887] 5.695 [2.010] <red> (high) <green> (low) 7.795 [2.892] 6.821 [1.650] 9.425 [0.968] 8.305 [2.297] 4.8 [2.347] 5.841 [2.855] BOSS (shoulder) 8.065 [3.352] 9.712 [1.415] 4.002 [1.225] CLOWN (nose) 9.051 [3.892] 9.603 [2.560] 3.823 [2.584] CUPCAKE (N.S.) 9.43 [3.458] 9.772 [1.206] 5.056 [3.144] Distance 3, WISH target: I WISH GO FIND (X) Results for Signer 5 Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 17.68 [2.919] 12.70 [2.040] 6.617 [0.988] HAT (head) PANTS (waist) 14.58 [1.278] no data 13.54 [2.109] no data 8.626 [0.819] no data <red> (high) <green> (low) 15.77 [2.174] 16.39 [1.214] 15.33 [2.455] 16.33 [3.318] 7.812 [1.600] 7.717 [1.612] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 15.24 [3.234] 11.72 [2.436] 13.51 [2.008] 15.15 [2.778] 10.55 [2.229] 10.84 [2.363] Distance 1, WANT target: I WANT (X) 9.752 [3.912] 7.46 [0.381] 8.853 [1.783] Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 13.48 [1.840] 10.03 [1.918] 10.14 [1.833] HAT (head) PANTS (waist) 13.59 [3.083] 14.74 [1.837] 11.22 [1.167] 13.04 [2.678] 11.75 [3.224] 10.34 [1.488] <red> (high) <green> (low) 14.47 [1.821] 14.71 [1.137] 11.08 [2.340] 12.87 [1.443] 13.76 [3.983] 8.816 [2.595] 319 BOSS (shoulder) 11.55 [4.187] 10.13 [3.657] 12.57 [4.333] CLOWN (nose) 13.67 [1.486] 12.41 [2.755] 9.905 [2.406] CUPCAKE (N.S.) 
12.21 [1.978] 9.114 [2.561] 11.49 [3.242] Distance 2, WANT target: I WANT FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 12.31 [NA] 7.07 [NA] 9.41 [NA] HAT (head) PANTS (waist) 13.05 [1.740] 12.34 [1.767] 10.16 [3.001] 13.59 [2.361] 11.66 [4.733] 12.46 [5.289] <red> (high) <green> (low) 11.18 [0.987] 13.61 [NA] 10.24 [0.710] 7.9 [NA] 15.88 [1.347] 11.66 [NA] BOSS (shoulder) 12.81 [1.786] 11.82 [2.965] 10.69 [4.033] CLOWN (nose) 13.32 [2.807] 11.47 [3.012] 10.55 [2.319] CUPCAKE (N.S.) 11.55 [0.535] 8.39 [0.956] 10.72 [2.190] Distance 3, WANT target: I WANT GO FIND (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 10.36 [1.507] 9.82 [2.246] 9.694 [2.210] HAT (head) PANTS (waist) 10.47 [0.960] 13.44 [3.735] 11.21 [2.243] 11.84 [2.563] 12.82 [2.929] 9.98 [1.921] <red> (high) <green> (low) 10.46 [1.566] 10.95 [1.205] 10.82 [2.467] 11.17 [2.626] 14.64 [4.044] 11.96 [4.730] BOSS (shoulder) 14.60 [2.784] 12.74 [1.948] 9.606 [2.834] CLOWN (nose) 12.13 [2.387] 10.17 [3.934] 11.01 [3.137] CUPCAKE (N.S.) 10.49 [0.445] 8.885 [4.688] 8.09 [0.763] Distance 4, WANT target: I WANT GO FIND OTHER (X) Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 14.55 [4.051] 11.45 [1.205] 7.283 [2.260] 320 HAT (head) PANTS (waist) 12.25 [0.991] 14.95 [NA] 13.17 [1.443] 14.94 [NA] 6.874 [1.132] 6.18 [NA] <red> (high) <green> (low) 12.79 [1.286] 18.78 [5.359] 11.25 [2.207] 9.47 [2.542] 5.3 [1.658] 6.275 [2.455] 13.15 [1.017] 13.77 [1.271] 10.98 [1.527] 12.23 [1.751] 12.26 [1.927] 13.09 [2.794] Distance 1, WISH target: I WISH (X) 6.986 [1.940] 6.566 [2.089] 6.87 [2.531] BOSS (shoulder) CLOWN (nose) CUPCAKE (N.S.) 
Context x (right-left): Average [SD] y (front-back): Average [SD] z (up-down): Average [SD] CLOTHES (chest) 11.04 [0.985] 9.143 [1.981] 7.113 [3.266] HAT (head) PANTS (waist) 11.32 [0.665] 12.1 [0.268] 10.32 [2.340] 11.01 [0.296] 6.168 [2.436] 3.78 [0.325] <red> (high) <green> (low) 12.38 [1.475] 11.57 [0.781] 11.68 [0.862] 9.236 [2.053] 3.817 [1.688] 7.438 [1.748] BOSS (shoulder) 11.88 [0.829] 12.72 [1.241] 6.876 [1.770] CLOWN (nose) 12.21 [0.799] 12.04 [2.474] 5.425 [1.661] CUPCAKE (N.S.) 10.96 [1.447] 11.40 [1.886] 4.662 [2.036]
Distance 3, WISH target: I WISH GO FIND (X)

BIBLIOGRAPHY

Adank, P., Smits, R., & van Hout, R. (2004). A comparison of vowel normalization procedures for language variation research. Journal of the Acoustical Society of America, 116, 3099-3107.
Aitchison, J. (1981). Language change: Progress or decay? London: Fontana.
Alfonso, P. J., & Baer, T. (1982). Dynamics of vowel articulation. Language and Speech, 25, 151-173.
Alho, K., Woods, D. L., Algazi, A., & Näätänen, R. (1992). Intermodal selective attention II: effects of attentional load on processing auditory and visual stimuli in central space. Electroencephalogr Clin Neurophysiol, 82, 356-68.
Arbib, M. A. (2005). Interweaving protosign and protospeech: Further developments beyond the mirror. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 6, 145-71.
Arbib, M. A., & Rizzolatti, G. (1997). Neural expectations: A possible evolutionary path from manual skills to language. Communication and Cognition, 29, 393-423.
Battison, R. (1978). Lexical borrowing in American Sign Language. Silver Spring: Linstok Press.
Bayley, R., Lucas, C., & Rose, M. (2002). Phonological variation in American Sign Language: The case of 1 handshape. Language Variation and Change, 14, 19-53.
Beddor, P. S., Harnsberger, J. D. & Lindemann, S. (2002). Language-specific patterns of vowel-to-vowel coarticulation: Acoustic structures and their perceptual correlates.
Journal of Phonetics, 30, 591-627.
Bell-Berti, F., Baer, T., Harris, K. S., & Niimi, S. (1979). Coarticulatory effects of vowel quality on velar function. Phonetica, 36, 187-193.
Benguerel, A.-P., & Cowan, H. A. (1974). Coarticulation of upper lip protrusion in French. Phonetica, 30, 41-55.
Boersma, P. & Weenink, D. (2005). Praat: Doing phonetics by computer [computer program]. Available from http://www.praat.org.
Bradley, T. (2006). Spanish complex onsets and the phonetics-phonology interface. In F. Martínez-Gil & S. Colina (Eds.), Optimality-Theoretic Studies in Spanish Phonology, 15-38. Amsterdam: John Benjamins.
Bradley, T. (2007). Morphological derived-environment effects in gestural coordination: A case study of Norwegian clusters. Lingua, 117, 950-985.
Bradley, T., & Schmeiser, B. (2003). On the phonetic reality of /r/ in Spanish complex onsets. In P. M. Kempchinsky & C.-E. Piñeros (Eds.), Theory, Practice, and Acquisition: Papers from the 6th Hispanic Linguistics Symposium, 1-20. Somerville, MA: Cascadilla Press.
Brentari, D. (1998). A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press.
Brentari, D., & Crossley, L. (2002). Prosody on the hands and face: Evidence from American Sign Language. Sign Language & Linguistics, 5, 105-130.
Brentari, D., & Goldsmith, J. A. (1993). H2 as secondary licenser. In G. R. Coulter (Ed.), Current Issues in ASL Phonology, volume 3 of Phonetics and Phonology. San Diego: Academic Press.
Browman, C., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology, 3, 219-252.
Browman, C., & Goldstein, L. (1992). “Targetless” schwa: An articulatory analysis. In G. J. Docherty & R. Ladd (Eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 26-56. Cambridge: Cambridge University Press.
Butcher, A., & Weiher, E. (1976). An electropalatographic investigation of coarticulation in VCV sequences. Journal of Phonetics, 4, 59-74.
Celsis, P., Boulanouar, K., Doyon, B., Ranjeva, J.
P., Berry, I., Nespoulous, J. L., et al. (1999). Differential fMRI responses in the left posterior superior temporal gyrus and left supramarginal gyrus to habituation and change detection in syllables and tones. NeuroImage, 9, 135-44.
Cheek, D. A. (2001). The Phonetics and Phonology of Handshape in American Sign Language. Academic dissertation. University of Texas at Austin.
Cho, T. (1999). Effect of prosody on vowel-to-vowel coarticulation in English. Proceedings of the XIVth International Congress of Phonetic Sciences, 459-462.
Cho, T. (2004). Prosodically conditioned strengthening and vowel-to-vowel coarticulation in English. Journal of Phonetics, 32, 141-176.
Choi, J. D., & Keating, P. (1991). Vowel-to-vowel coarticulation in three Slavic languages. University of California Working Papers in Phonetics, 78, 78-86.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. Harper and Row.
Clements, G. N. (1985). The geometry of phonological features. Phonology, 2, 225-252.
Corina, D. (1990). Handshape assimilations in hierarchical phonological representations. In C. Lucas (Ed.), Sign Language Research: Theoretical Issues, 27-49. Washington: Gallaudet University Press.
Corina, D. P., & Hildebrandt, U. C. (2002). Psycholinguistic investigations of phonological structure in ASL. In R. P. Meier, K. Cormier, et al. (Eds.), Modality and Structure in Signed and Spoken Language, 88-111. New York: Cambridge University Press.
Corina, D. P., & Sagey, E. (1989). Predictability in ASL handshapes and handshape sequences, with implications for features and feature geometry. Ms., University of California at San Diego.
Crasborn, O., & Kooij, E. van der. (1997). Relative orientation in sign language phonology. In J. Coerts & H. de Hoop (Eds.), Linguistics in the Netherlands, 37-48. Amsterdam: John Benjamins.
Czigler, I. (2007). Visual mismatch-negativity: violation of non-attended environmental regularities. Journal of Psychophysiology, 21, 224-230.
Daniloff, R., & Hammarberg, R. (1973). On defining coarticulation. Journal of Phonetics, 1, 239-48.
Delorme, A. & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods [computer program], in press, found at http://sccn.ucsd.edu/eeglab/download/eeglab_jnm03.pdf.
Donchin, E. (1981). Surprise!...Surprise? Psychophysiology, 18, 493-513.
Emmorey, K., McCullough, S., & Brentari, D. (2003). Categorical perception in American Sign Language. Language and Cognitive Processes, 18, 21-46.
Farnetani, E., & Recasens, D. (1999). Coarticulation models in recent speech production theories. In W. J. Hardcastle and N. Hewlett (Eds.), Coarticulation: Theory, Data and Techniques, 31-65. Cambridge: Cambridge University Press.
Flemming, E. (1997). Phonetic detail in phonology: Towards a unified account of assimilation and coarticulation. In K. Suzuki and D. Elzinga (Eds.), Proceeding volume of the 1995 Southwestern Workshop in Optimality Theory (SWOT), University of Arizona, Tucson, AZ.
Fletcher, J. (2004). An EMA/EPG study of vowel-to-vowel articulation across velars in Southern British English. Clinical Linguistics & Phonetics, 18, 577-592.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8, 113-133.
Fowler, C. A. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 24, 127-139.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms in speech: Cyclic productions of vowels in monosyllabic stress feet. Journal of Experimental Psychology: General, 112, 386-412.
Fowler, C. A., & Brancazio, L. (2000). Coarticulation resistance of American English consonants and its effects on transconsonantal vowel-to-vowel coarticulation. Language and Speech, 43, 1-41.
Fowler, C. A., & Saltzman, E. (1993).
Coordination and coarticulation in speech production. Language and Speech, 36, 171-195.
Fowler, C. A., & Smith, M. (1986). Speech perception as “vector analysis”: An approach to the problem of segmentation and invariance. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability of speech processes, 123-136. Hillsdale, NJ: Erlbaum.
Fowler, C. A., & Turvey, M. T. (1980). Immediate compensation in bite-block speech. Phonetica, 37, 306-326.
Frenck-Mestre, C., Meunier, C., Espesser, R., Daffner, K., & Holcomb, P. (2005). Perceiving nonnative vowels: The effect of context on perception as evidenced by event-related brain potentials. Journal of Speech, Language, and Hearing Research, 48, 1-15.
Gafos, A. (2002). A grammar of gestural coordination. Natural Language and Linguistic Theory, 20, 269-337.
Garrido, M., Kilner, J., Stephan, K. & Friston, K. (in press). The mismatch negativity: A review of underlying mechanisms. Clinical Neurophysiology.
Gay, T. (1974). A cinefluorographic study of vowel production. Journal of Phonetics, 2, 255-266.
Gay, T. (1977). Articulatory movements in VCV sequences. Journal of the Acoustical Society of America, 62, 183-193.
Gazzaniga, M. S., Ivry, R. B., & Mangun, G. R. (1998). Cognitive neuroscience: The biology of the mind. New York: Norton.
Gerstman, L. H. (1968). Classification of self-normalized vowels. IEEE Transactions on Audio and Electroacoustics, ACC-16, 78-80.
Giard, M. H., Perrin, F., Pernier, J., & Bouchet, P. (1990). Brain generators implicated in the processing of auditory stimulus deviance: a topographic event-related potential study. Psychophysiology, 27, 627-40.
Gilliéron, J. (1918). Généalogie des mots qui désignent l’abeille d’après l’Atlas linguistique de la France. Paris: Champion.
Goldsmith, J. (1976). Autosegmental phonology. Doctoral dissertation, MIT, Cambridge, MA. [Published 1979, New York: Garland Press]
Gomot, M., Giard, M.-H., Roux, S., Barthelemy, C., & Bruneau, N. (2000).
Maturation of frontal and temporal components of mismatch negativity (MMN) in children. Neuroreport, 14, 3109-12.
Gordon, M. (1999). The phonetics and phonology of non-modal vowels: A cross-linguistic perspective. Berkeley Linguistics Society, 24, 93-105.
Gourevitch, V., & Galanter, E. (1967). A significance test for one-parameter isosensitivity functions. Psychometrika, 32, 25-33.
Grosvald, M. (2006). Vowel-to-vowel coarticulation: Length and palatalization effects and perceptibility. Unpublished manuscript, University of California at Davis.
Grosvald, M., & Corina, D. (in press). Exploring the limits of long-distance vowel-to-vowel coarticulation. Proceedings, Workshop of the Association Francophone de la Communication Parlée: “Coarticulation: Cues, Direction and Representation.” Montpellier, France; December 7, 2007.
Hall, N. (2003). Gestures and Segments: Vowel Intrusion as Overlap. Doctoral dissertation. Amherst, MA: University of Massachusetts, Amherst.
Hammarberg, R. (1976). The metaphysics of coarticulation. Journal of Phonetics, 4, 353-63.
Hardcastle, W. J., & Hewlett, N. (1999). Coarticulation: Theory, Data and Techniques. Cambridge: Cambridge University Press.
Hari, R., Hämäläinen, M., Ilmoniemi, R., Kaukoranta, E., Reinikainen, K., Salminen, J., et al. (1984). Responses of the primary auditory cortex to pitch changes in a sequence of tone pips: Neuromagnetic recordings in man. Neurosci Lett, 50, 127-32.
Heid, S. & Hawkins, S. (2000). An acoustical study of long-domain /r/ and /l/ coarticulation. Proceedings of the 5th seminar on speech production: Models and data, 77-80. Kloster Seeon, Bavaria, Germany.
Hertrich, I., & Ackermann, H. (1995). Coarticulation in slow speech: durational and spectral analysis. Language and Speech, 38, 159-187.
Hildebrandt, U., & Corina, D. (2002). Phonological similarity in American Sign Language. Language and Cognitive Processes, 17, 593-612.
Huffman, M. K. (1986). Patterns of coarticulation in English.
University of California Working Papers in Phonetics, 63, 26-47.
Hussein, L. (1990). VCV coarticulation in Arabic. Ohio State University Working Papers in Linguistics, 38, 88-104.
Jääskeläinen, I. P., Ahveninen, J., Bonmassar, G., Dale, A. M., Ilmoniemi, R. J., Levänen, S., et al. (2004). Human posterior auditory cortex gates novel sounds to consciousness. Proc Natl Acad Sci USA, 101, 6809-14.
Johnson, K. (2003). Acoustic and auditory phonetics. Malden, MA: Blackwell Publishing.
Keating, P. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.
Keating, P. (1988). Underspecification in phonetics. Phonology, 5, 275-292.
Keating, P. (1990a). Phonetic representations in a generative grammar. Journal of Phonetics, 18, 321-334.
Keating, P. (1990b). The window model of coarticulation: Articulatory evidence. In J. Kingston & M. E. Beckman (Eds.), Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, 451-470. Cambridge University Press.
Kekoni, J., Hämäläinen, H., Saarinen, M., Gröhn, J., Reinikainen, K., Lehtokoski, A., et al. (1997). Rate effect and mismatch responses in the somatosensory system: ERP recordings in humans. Biol Psychol, 46, 125-42.
Klima, E. S., & Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard University Press.
Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception. Translation 30, 543. Washington, DC: Joint Publications Research Service.
Krauel, K., Schott, P., Sojka, B., Pause, B. M. & Ferstl, R. (1999). Is there a mismatch negativity analogue in the olfactory event-related potential? J Psychophysiol, 13, 49-55.
Kuehn, D., & Moll, K. (1972). Perceptual effects of forward coarticulation. Journal of Speech and Hearing Research, 15, 654-664.
Kühnert, B., & Nolan, F. (1999). The origin of coarticulation. In W. J. Hardcastle and N. Hewlett (Eds.), Coarticulation: Theory, Data and Techniques, 7-30. Cambridge: Cambridge University Press.
Kujala, T., Tervaniemi, M., & Schröger, E. (2007). The mismatch negativity in cognitive and clinical neuroscience: Theoretical and methodological considerations. Biological Psychology, 74, 1-19.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203-205.
Ladefoged, P., & Broadbent, D. (1957). Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98-104.
Lehiste, I., & Shockey, L. (1972). On the perception of coarticulation effects in English VCV syllables. Journal of Speech and Hearing Research, 15, 500-506.
Liberman, A. M., Harris, K. S., Kinney, J. A., & Lane, H. (1961). The discrimination of relative onset time of the components of certain speech and nonspeech patterns. Journal of Experimental Psychology, 61, 379-388.
Liddell, S. (1990). Structures for representing handshape and local movement at the phonemic level. In S. D. Fischer & P. Siple (Eds.), Theoretical Issues in Sign Language Research, Vol. 1, 37-65. Chicago: University of Chicago Press.
Liddell, S., & Johnson, R. (1989 [1985]). American Sign Language: The phonological base. Sign Language Studies, 64, 195-277. (Originally distributed as manuscript.)
Lindblom, B., Lubker, J., & Gay, T. (1979). Formant frequencies of some fixed-mandible vowels and a model of speech-motor programming by predictive simulation. Journal of Phonetics, 7, 147-161.
Lucas, C., Bayley, R., Rose, M., & Wulf, A. (2002). Location variation in American Sign Language. Sign Language Studies, 2, 407-440.
Luck, S. J. (2005). An Introduction to the Event-Related Potential Technique. Cambridge, MA: MIT Press.
Macmillan, N. A., & Creelman, C. D. (1991). Detection Theory: A User's Guide. New York: Cambridge University Press.
Maekawa, T., Goto, Y., Kinukawa, N., Taniwaki, T., Kanba, S., & Tobimatsu, S. (2005). Functional characterization of mismatch negativity to a visual stimulus. Clinical Neurophysiology, 116, 2392-2402.
Magen, H. S. (1997). The extent of vowel-to-vowel coarticulation in English. Journal of Phonetics, 25, 187-205.
Mandel, M. (1981). Phonotactics and morphophonology in American Sign Language. Doctoral dissertation, University of California at Berkeley.
Manuel, S. Y. (1990). The role of contrast in limiting vowel-to-vowel coarticulation in different languages. Haskins Laboratories Status Report on Speech Research, 103-104, 1-20.
Manuel, S. Y., & Krakow, R. A. (1984). Universal and language particular aspects of vowel-to-vowel coarticulation. Haskins Laboratories Status Report on Speech Research, 77-78, 69-78.
Martin, J. G., & Bunnell, H. T. (1982). Perception of anticipatory coarticulation effects in vowel-stop consonant-vowel sequences. Journal of Experimental Psychology: Human Perception and Performance, 8, 473-488.
Matthies, M., Perrier, P., Perkell, J. S., & Zandipour, M. (2001). Variation in anticipatory coarticulation with changes in clarity and rate. Journal of Speech, Language, and Hearing Research, 44, 340-353.
Mauk, C. (2003). Undershoot in two modalities: Evidence from fast speech and fast signing. Doctoral dissertation, University of Texas at Austin.
Modarresi, G., Sussman, H., Lindblom, B., & Burlingame, E. (2004). An acoustic analysis of the bidirectionality of coarticulation in VCV utterances. Journal of Phonetics, 32, 291-312.
Moll, K. L., & Daniloff, R. G. (1971). Investigation of the timing of velar movements during speech. Journal of the Acoustical Society of America, 50, 678-684.
Näätänen, R. (1979). Orienting and evoked potentials. In H. D. Kimmel, E. H. van Olst, & J. F. Orlebeke (Eds.), The Orienting Reflex in Humans, 61-75. New Jersey: Erlbaum.
Näätänen, R. (1985). Selective attention and stimulus processing: Reflections in event-related potentials, magnetoencephalogram, and regional cerebral blood flow. In M. I. Posner & O. S. M. Marin (Eds.), Attention and Performance XI, 355-373. Hillsdale, NJ: Erlbaum.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Lawrence Erlbaum.
Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity and its magnetic equivalent. Psychophysiology, 38, 1-21.
Näätänen, R., Gaillard, A. W. K., & Mäntysalo, S. (1978). Early selective-attention effect on evoked potential reinterpreted. Acta Psychologica, 42, 313-329.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118, 2544-2590.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826-859.
Nespor, M., & Sandler, W. (1999). Prosody in Israeli Sign Language. Language & Speech, 42, 143-176.
Ochiai, K., & Fujimura, O. (1971). Vowel identification and phonetic contexts. Reports from the University of Electrocommunications (Tokyo), 22, 103-111.
Ohala, J. (1974). Experimental historical phonology. In J. Anderson & C. Jones (Eds.), Historical Linguistics II: Theory and Description in Phonology. Amsterdam: North-Holland.
Ohala, J. (1981). The listener as a source of sound change. In M. F. Miller (Ed.), Papers from the Parasession on Language Behavior. Chicago: Chicago Linguistic Society.
Ohala, J. (1994). Towards a universal, phonetically-based, theory of vowel harmony. ICSLP-1994, 491-494.
Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.
Osterhout, L., & Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory & Language, 31, 785-806.
Padgett, J. (1995). Feature classes. In J. Beckman, L. W. Dickey, & S. Urbanczyk (Eds.), Papers in Optimality Theory. Amherst, MA: GLSA.
Parush, A., Ostry, D. J., & Munhall, K. G. (1983). A kinematic study of lingual coarticulation in VCV sequences. Journal of the Acoustical Society of America, 74, 1115-1125.
Pater, J. (1999). Austronesian nasal substitution and other NC effects. In R. Kager, H. van der Hulst, & W. Zonneveld (Eds.), The Prosody-Morphology Interface, 310-343. Cambridge: Cambridge University Press.
Pazo-Alvarez, P., Cadaveira, F., & Amenedo, E. (2003). MMN in the visual modality: A review. Biological Psychology, 63, 199-236.
Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M., & Zandipour, M. (2004). The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts. Journal of the Acoustical Society of America, 116, 2338-2344.
Perlmutter, D. (1991). Prosodic vs. segmental structure: A moraic theory of American Sign Language syllable structure. Ms., University of California, San Diego.
Prince, A., & Smolensky, P. (1993). Optimality Theory: Constraint interaction in generative grammar. Ms., Rutgers University & University of Colorado, Boulder.
Przezdziecki, M. (2000). Vowel harmony and vowel-to-vowel coarticulation in three dialects of Yoruba. Working Papers of the Cornell Phonetics Laboratory, 13, 105-124.
Purcell, E. T. (1979). Formant frequency patterns in Russian VCV utterances. Journal of the Acoustical Society of America, 66, 1691-1702.
Recasens, D. (1984). Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76, 1624-1635.
Recasens, D. (1989). Long range coarticulation effects for tongue dorsum contact in VCVCV sequences. Speech Communication, 8, 293-307.
Recasens, D. (2002). An EMA study of VCV coarticulatory direction. Journal of the Acoustical Society of America, 111, 2828-2841.
Recasens, D., Pallarès, M. D., & Fontdevila, J. (1997). A model of lingual coarticulation based on articulatory constraints. Journal of the Acoustical Society of America, 102, 544-561.
Rinne, T., Alho, K., Ilmoniemi, R. J., Virtanen, J., & Näätänen, R. (2000). Separate time behaviors of the temporal and frontal mismatch negativity sources. NeuroImage, 12, 14-19.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.
Sagey, E. (1986). The representation of features and relations in non-linear phonology. Doctoral dissertation, MIT, Cambridge, MA.
Sallinen, M., Kaartinen, J., & Lyytinen, H. (1994). Is the appearance of mismatch negativity during stage 2 sleep related to the elicitation of K-complex? Electroencephalography and Clinical Neurophysiology, 91, 140-148.
Sandler, W. (1986). The spreading hand autosegment of American Sign Language. Sign Language Studies, 50, 1-28.
Sandler, W. (1987). Sequentiality and simultaneity in American Sign Language. Doctoral dissertation, University of Texas.
Sandler, W. (1989). Phonological Representation of the Sign: Linearity and Nonlinearity in American Sign Language. Dordrecht: Foris.
Sandler, W. (1995). Markedness in the handshapes of sign language: A componential analysis. In J. van de Weijer & H. van der Hulst (Eds.), Leiden in Last: Holland Institute of Linguistics Phonology Papers, 369-399. The Hague: Holland Academic Graphics.
Sandler, W. (1996). Representing handshapes. International Journal of Sign Linguistics, 1, 115-158.
Sandler, W., & Lillo-Martin, D. (2006). Sign Language and Linguistic Universals. Cambridge, UK: Cambridge University Press.
Scarborough, R. A. (2003). Lexical confusability and degree of coarticulation. Proceedings of the 29th Meeting of the Berkeley Linguistics Society, February 14-17, 2003.
Shaffer, L. H. (1982). Rhythm and timing in skill. Psychological Review, 89, 109-122.
Siedlecki, T. J., & Bonvillian, J. D. (1998). Homonymy in the lexicons of young children acquiring American Sign Language. Journal of Psycholinguistic Research, 27, 47-68.
Stokoe, W. (1960). Sign language structure: An outline of the visual communication systems of the American Deaf. Studies in Linguistics, Occasional Papers, 8. Silver Spring, MD: Linstok Press.
Tales, A., Newton, P., Troscianko, T., & Butler, S. (1999). Mismatch negativity in the visual modality. Neuroreport, 10, 3363-3367.
Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., van Zuijen, T. L., et al. (2000). Lateralized automatic auditory processing of phonetic versus musical information: A PET study. Human Brain Mapping, 10, 74-79.
Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372, 90-92.
van der Hulst, H. (1995). The composition of handshapes. In Working Papers in Linguistics 23. Department of Linguistics, University of Trondheim, Dragvoll.
van der Kooij, E. (2002). Phonological categories in Sign Language of the Netherlands: The role of phonetic implementation and iconicity. Doctoral dissertation, Universiteit Leiden, Leiden.
van Oostendorp, M. (2003). Schwa in phonological theory. In L. Cheng & R. Sybesma (Eds.), The Second Glot International State-of-the-Article Book: The Latest in Linguistics (Studies in Generative Grammar 61), 431-461. Berlin: Mouton de Gruyter.
Vogler, C., & Metaxas, D. (1997). Adapting hidden Markov models for ASL recognition by using three-dimensional computer vision methods. Proc. IEEE Int. Conf. Systems, Man and Cybernetics, Orlando, FL, 156-161.
West, P. (1999). The extent of coarticulation of English liquids: An acoustic and articulatory study. Proceedings of the International Congress of Phonetic Sciences, 1901-1904. San Francisco.

Vita

Michael Grosvald was born and raised in northern California.
He majored in Mathematics and Linguistics and minored in Psychology as an undergraduate at the University of California at Davis, then entered a doctoral program in Mathematics at UC Berkeley, earning a Master's degree and advancing to candidacy before leaving the program to work in San Francisco's financial district. Two years later, realizing he missed studying languages, he decided to fulfill a longtime dream of living and traveling abroad in order to see the world and become multilingual. Over the following seven years he lived and worked in Prague, Berlin, and Taipei, and traveled extensively, eventually visiting over 100 countries. He then came full circle, returning to UC Davis to pursue his doctorate in Linguistics. His research interests include phonetics and phonology, psycholinguistics, second language acquisition, and computational linguistics.

Permanent address: 1023 Sir William Ct, Chico, CA 95926, USA

This dissertation was typed by the author.