A Quantitative Approach to Defining Code-switching Pattern

Natalia Bakaeva
University of Toronto

In this presentation I will show some methodological improvements to Muysken's (2000) typological approach to patterning bilingual speech. These improvements establish a more precise quantitative profile for identifying two types of code-switching patterns: insertion and alternation.

Muysken distinguished two main code-switching patterns: insertion and alternation. Insertion (see 1) is characterized by the insertion of material (a lexical item or an entire constituent) from one language into a morphosyntactic structure from the other language.

(1) Ya  ego  kazhdyi  den  videla     v   SUBWAY¹
    I   him  every    day  see3S.Pas  in
    'I saw him every day in the subway'

Alternation (see 2) is "alternation between structures from [different] languages" (Muysken, 2000:3).

(2) Oni   ne   zamechayut,     I DON'T KNOW IF I WILL FEEL THE SAME
    They  NEG  notice3Pl.Pres
    'They do not notice that, I don't know if I will feel the same'

The grammatical constraints proposed for intrasentential code-switching are numerous (Myers-Scotton, 1993, 2004; Treffers-Daller, 1994; Mahootian and Santorini, 1996; Poplack, 1980, 1990, 2001; MacSwan, 1999, 2006), but only partially convergent. These studies involve a variety of language pairs, social settings, and speaker types. Although data sets of bilingual speech share many similar features, they also show large variation in the types, forms and frequency of switches. These differences may be caused by numerous factors, both linguistic and social, which may explain why a consensus has yet to be reached on how to constrain code-switching patterns grammatically. Muysken's typology makes it possible to reconcile the two main models of code-switching (Poplack, 1980 and Myers-Scotton, 1993), which have so far shown little convergence.
Thus, Poplack's (1980) model of linear equivalence better reflects the alternation pattern, while Myers-Scotton's (1993) Matrix Language Frame Model better accounts for insertion. The method I developed allows for systematic and objective comparison across corpora, making it easier to distinguish these types and thus improving our understanding of the phenomena involved.

I illustrate my method via analysis of a corpus of Russian-English code-switching data. A total of 1,123 switches were extracted from my corpus of 11 hours of spontaneous speech recorded from six speakers in Toronto. All speakers are bilingual, dominant in Russian, first-generation immigrants who moved to Canada at the age of 17 or older and have been in Canada for at least 7 years.

This study aims to 1) refine the linguistic criteria used by Muysken to define code-switching patterns (insertion and alternation), and 2) organize the linguistic diagnostic features for insertion into a hierarchy according to their relative predictive power. I present my objective definitions of the diagnostic criteria and my method of quantifying their contribution, and then apply the criteria to the data set. The statistical weight of each diagnostic feature (an element of the linguistic context of a switch), which shows how strongly the feature predicts insertion, is determined using GoldVarb’s multivariate analysis.

Muysken (2000: 230-231) suggests that these “specific diagnostic features [of the two types of alternation] can be applied to a corpus, and that the set of values for each feature will match one pattern more closely than the other”. The two strategies are structurally defined, and a number of diagnostic criteria are listed for each. The 27 diagnostic criteria fall into four groups: constituency, element switched, switch site, and properties of switch.
For example, different feature values point to different code-switching patterns. A positive value for the feature single constituent is associated with the insertion pattern: if a particular switch is a single constituent (see 1), it is more likely to match the insertion pattern than the alternation pattern. If the switch is not a single constituent (see 2), and thus has a negative value for this feature, this is indicative of the alternation pattern. However, the criteria cannot in fact be considered in isolation because they are related: there are implicational hierarchies and interactions among these factors. A primary goal of my analysis is to simplify the set of features by untangling these interactions.

¹ All examples are from my Russian-English corpus.

My analysis builds on Deuchar, Muysken and Wang’s (2007) initial attempt to test these hypotheses empirically. Their study, a quantitative analysis of three corpora of spontaneous conversations (Welsh-English, Tsou-Mandarin Chinese, and Taiwanese-Mandarin Chinese), reveals some methodological and conceptual issues. First, all of the criteria used to identify the code-switching patterns were treated as if they had equal weight, although it is reasonable to expect that some criteria (e.g. linear equivalence, function word) are more important than others (e.g. homophonous diamorphs, triggering). Second, there is some redundancy in the system that was not taken into account: in some cases the value of one criterion determines the value of another. For example, a negative value for the criterion nested determines a positive value for the criterion non-nested. Third, not all criteria apply to all switches. Finally, the same criteria are often offered for both types of switches. As a result, the scores for each switch are not directly comparable.
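The equal-weight tallying discussed above can be illustrated with a minimal sketch. It is not the procedure of any of the cited studies: the three criterion names are a hypothetical subset, the rule that a negative value always counts toward the opposite pattern is an assumption generalized from the single constituent example, and the codings of the two example switches are illustrative rather than taken from the corpus annotation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical subset of criteria. Each name maps to the pattern that a
# POSITIVE value points to; a negative value is assumed to point to the other.
POSITIVE_INDICATES = {
    "single_constituent": "insertion",
    "content_word": "insertion",
    "nested": "insertion",
}


def score_switch(features: Dict[str, Optional[bool]]) -> Tuple[int, int]:
    """Return (insertion_score, alternation_score) for one switch.

    True/False are the positive/negative values of a criterion; None marks a
    criterion that does not apply to this switch and is skipped.
    """
    scores = {"insertion": 0, "alternation": 0}
    for name, value in features.items():
        if value is None:
            continue
        target = POSITIVE_INDICATES[name]
        other = "alternation" if target == "insertion" else "insertion"
        scores[target if value else other] += 1  # equal weight: one point each
    return scores["insertion"], scores["alternation"]


# Illustrative codings only (assumed, not the corpus annotation).
example_1 = {"single_constituent": True, "content_word": True, "nested": None}
example_2 = {"single_constituent": False, "content_word": None, "nested": False}

print(score_switch(example_1))  # (2, 0)
print(score_switch(example_2))  # (0, 2)
```

Because each switch may skip a different set of inapplicable criteria, the raw totals rest on different numbers of criteria from one switch to the next, which is one way of seeing why such scores are hard to compare directly.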
The criteria need to be more precisely defined to allow testing (Deuchar, Muysken and Wang, 2007: 336). For this purpose, all of the diagnostic features were first divided, according to their applicability, into criteria of absolute value (e.g. single constituent, content word) and criteria of relative value (e.g. long constituent, complex constituent). The next step of the analysis consisted of categorizing (binning) the relative value criteria and eliminating both redundant criteria (e.g. non-nested, self-correction) and criteria that are irrelevant in the context provided (e.g. homophonous diamorphs, embedding in discourse).

Once the criteria were better defined and pruned, a quantitative analysis of the binary values of the remaining 19 criteria was applied to the 1,123 switches. This produced two scores for each switch: an insertion score and an alternation score. The higher the score, the better the match to the pattern of predictive features. Adding up the scores for all switches in the corpus allows me to determine, on the basis of the highest score, the pattern that is matched best in the corpus as a whole. The dominant pattern for this set of data is insertion: insertion score 3526 versus alternation score 4520.

Next, the statistical weight of each diagnostic feature indicative of insertion was assessed using GoldVarb (Sankoff, Tagliamonte & Smith, 2005). Insertion vs. alternation is the dependent variable, and all 7 diagnostic features indicative of insertion are independent linguistic variables, coded as binary according to their applicable value. The diagnostic criteria indicative of insertion fall into the following hierarchy. Those ranked at the top are the most powerful in predicting insertion in this corpus. Future work will determine how universal this ranking is.
Rank  Factor                     Factor Weight
1)    content word               1.00
2)    morphological integration  1.00
3)    dummy word insertion       1.00
4)    single constituent         .80
5)    selected element           .40
6)    nested                     .31
7)    telegraphic mixing         [ ]

Significance: p < 0.05; p < 0.05; p < 0.05; not significant

The validation and refinement of the diagnostic criteria permit me to define the relative value features (long constituent, complex constituent) more clearly and to reduce the number of criteria considered from 27 to 19. The results of this analysis indicate that only six of these play a critical role in defining insertion, suggesting that future analysis could be considerably simplified.

References

Deuchar, M., Muysken, P. and S.L. Wang (2007). Structured Variation in Codeswitching: Towards an Empirically Based Typology of Bilingual Speech Patterns. International Journal of Bilingual Education and Bilingualism, 10:3, 298-340.

MacSwan, J. (1999). A Minimalist approach to intrasentential code switching. New York: Garland.

MacSwan, J. (2006). Code switching and grammatical theory. In T. K. Bhatia & W. C. Ritchie (eds) The Handbook of Bilingualism. Malden, MA: Blackwell. 283-311.

Mahootian, S. & B. Santorini (1996). Code switching and the complement/adjunct distinction. Linguistic Inquiry 27(3), 464-479.

Muysken, P. (2000). Bilingual speech: A typology of code-mixing. Cambridge, England: Cambridge UP.

Myers-Scotton, C. (1993). Duelling languages: grammatical structure in codeswitching. New York: Oxford UP.

Myers-Scotton, C. (2004). Precision tuning of the Matrix Language Frame (MLF) Model of codeswitching. Sociolinguistica 18, 106-117.

Poplack, S. (1980). Sometimes I'll start a sentence in Spanish Y termino en español: toward a typology of code-switching. Linguistics 18(7-8), 581-618.

Poplack, S., Wheeler, S. and A. Westwood (1990). Distinguishing language contact phenomena: evidence from Finnish-English bilingualism. In R. Jacobson (ed.) Code-switching as a worldwide phenomenon. New York: Peter Lang. 170-185.

Poplack, S. (2001). Code-switching (linguistic). In N. Smelser & P. Baltes (eds) International Encyclopedia of the Social and Behavioral Sciences. Elsevier Science Ltd. 2062-2065.

Sankoff, D., S. Tagliamonte and E. Smith (2005). GOLDVARB X. individual.utoronto.ca/tagliamonte/goldvarb.htm.

Treffers-Daller, J. (1994). Mixing two languages, French-Dutch contact in a comparative perspective. Berlin: Mouton de Gruyter.