Tracking L2 Lexical and Syntactic Development
Xiaofei Lu
CALPER 2010 Summer Workshop
July 14, 2010

Outline
- Lexical & syntactic complexity: The what and why
- Syntactic complexity in EFL writing
- Lexical complexity in EFL speaking

Lexical and Syntactic Complexity: The What and Why

What is lexical and syntactic complexity
- Lexical complexity
  - A multidimensional feature of language use encompassing lexical density, sophistication and variation (Wolfe-Quintero et al. 1998; Read 2000)
  - Does not focus on errors, a dimension in Read’s (2000) conceptualization of lexical richness
- Syntactic complexity
  - The range of forms that surface in language production and the degree of sophistication of such forms (Ortega 2003)

Why measure linguistic complexity?
- First language acquisition & psycholinguistics
  - Studies of L1 developmental sequence
  - Objective measures of L1 developmental level
  - Ordering experimental stimuli by complexity
  - Relationship of complexity in childhood to symptoms of Alzheimer’s disease (Kemper et al. 2001)

Why measure linguistic complexity?
- Second language acquisition
  - Objective L2 developmental indices
  - Assessing cross-proficiency differences
  - Assessing effect of pedagogical intervention
  - Tracking L2 learners’ linguistic development over time
  - Relationship between lexical/syntactic complexity and proficiency claimed in many test rating scales

Syntactic Complexity in EFL Writing
- Lu, X. (forthcoming 2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15.
- Lu, X. (forthcoming 2010). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 44(4).

Outline
- Measures of L2 syntactic complexity
- L2 syntactic complexity analyzer
- Syntactic complexity & EFL writing development
- Summary

Measures of L2 syntactic complexity
- Measures reviewed in two research syntheses
  - Wolfe-Quintero et al. (1998)
  - Ortega (2003)
- Selection criterion
  - At least one previous study showed at least weak correlation with or effect for proficiency
- Issues among previous studies
  - Variation in measure selection and definition
  - Variation in experiment design
  - Inconsistent results reported on the same measures

Measures of L2 syntactic complexity
- Length of production
  1. Mean length of clause (MLC)
  2. Mean length of sentence (MLS)
  3. Mean length of T-unit (MLT)
- Sentence complexity
  4. Mean number of clauses per sentence (C/S)

Measures of L2 syntactic complexity
- Subordination
  5. Mean number of clauses per T-unit (C/T)
  6. Mean number of complex T-units per T-unit (CT/T)
  7. Mean number of dependent clauses per clause (DC/C)
  8. Mean number of dependent clauses per T-unit (DC/T)

Measures of L2 syntactic complexity
- Coordination
  9. Mean number of coordinate phrases per clause (CP/C)
  10. Mean number of coordinate phrases per T-unit (CP/T)
  11. Mean number of T-units per sentence (T/S)
- Particular grammatical structures
  12. Mean number of complex nominals per clause (CN/C)
  13. Mean number of complex nominals per T-unit (CN/T)
  14. Mean number of verb phrases per T-unit (VP/T)

L2 syntactic complexity analyzer
- Input: plain English text
- Step 1: Parsing using Stanford parser
- Step 2: Retrieving & counting occurrences of
  - Words, sentences, clauses, dependent clauses
  - T-units, complex T-units
  - Coordinate phrases, complex nominals, verb phrases
- Step 3: Computing ratios for the 14 measures
- Output: 14 syntactic complexity indices

How counting is done
- Word: all non-punctuation tokens
- Other units: Tregex (Levy & Andrew, 2006)
  - Define the units linguistically
  - Formulate Tregex patterns matching the unit definitions
  - Query the parse trees with the Tregex patterns
  - Retrieve/count (sub)trees matching each pattern

Definition and pattern examples
- Clause: subject + finite verb (Polio 1997)
    ‘S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD)’
- Dependent clause: adverbial, adjectival or nominal clause
    ‘SBAR < (S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD))’

Evaluation
- Experiment setup
  - 40 essays from the Written English Corpus of Chinese Learners (Wen et al. 2005), average 315 words
  - Written by English majors in four-year colleges in China
  - 20 used for training, 20 for testing
  - Two annotators counted unit occurrences in the essays
- Inter-annotator agreement
  - Evaluated on 10 essays
  - F-score for unit identification: .907 (CN) - 1.000 (S)
  - Correlations of complexity ratios: .912 (CT/T) - 1.000 (MLS)

Unit identification results on test data

             Counts                        System-annotator agreement
Structure    System   Manual   Identical   Precision   Recall   F-score
S            357      357      357         1.000       1.000    1.000
C            545      558      530         .972        .950     .961
DC           170      178      161         .947        .904     .925
T            376      380      369         .981        .971     .976
CT           129      136      126         .977        .926     .951
CP           138      135      125         .906        .926     .916
CN           660      572      511         .774        .893     .830
VP           750      758      698         .931        .921     .926

Correlations of complexity ratios

Measure   Development   Test
MLC       .941          .932
MLS       1.000         1.000
MLT       .989          .987
C/S       .939          .928
C/T       .978          .961
CT/T      .903          .892
DC/T      .950          .941
CP/C      .845          .834
CP/T      .876          .871
T/S       .931          .919
CN/C      .883          .867
CN/T      .904          .896

Error analysis
- Attachment and conjunction scope errors
  - e.g., benefit a lot from [the Internet in academic study]
- More reliable in identifying higher-level units: S, C, T, CT
- Learner errors not a major cause for problems
  - Advanced EFL learners
  - Idiomaticity vs. grammatical completeness
  - Some errors do not lead to structural misanalysis

Syntactic complexity & EFL writing development
- Research questions
- The WECCL corpus
- Results
- Summary

Research questions
  1) Effect of sampling condition
  2) Measures discriminating proficiency levels
  3) Magnitudes for differences to be significant
  4) Relationships between measures
  5) Patterns of development for the measures

The WECCL corpus

School   Argumentation      Narration          Exposition         All
level    Timed   Untimed    Timed   Untimed    Timed   Untimed
1        695     395        89      0          30      0          1209
2        441     398        246     0          28      0          1113
3        504     459        91      0          30      0          1084
4        60      0          88      0          0       0          148
All      1700    1252       514     0          88      0          3554

Essay length: range=[89, 892], mean=315, sd=87

Effect of sampling condition
- Institution: sig. inter-institution dif. for
  - All metrics using all data
  - 12 metrics using Y1-3 timed arg essays
- Genre: sig. dif. between arg vs. nar for
  - All metrics using arg & nar essays
  - All metrics using timed arg & nar essays
  - 13 metrics using timed arg & nar essays from ND

Sampling condition effect (cont.)
- Timing: sig. dif. between un/timed arg for
  - 13 measures using all arg essays
  - 11 metrics using arg essays from ND
- Data for other research questions
  - 422 timed argumentative essays from ND

Measures discriminating levels
- 3 showed sig. dif. between first 3 levels
  - MLC, CN/C, and CN/T
- 4 showed sig. dif. between first 2 levels
  - MLS, MLT, CP/C, and CP/T
- 5 showed sig. dif. between non-adjacent levels
  - C/S, C/T, CT/T, DC/C, and DC/T
- 2 showed no sig. between-level dif.
  - T/S and VP/T

Significant magnitudes

Measure   Magnitude   Levels
MLC       .573        2-3
MLS       1.658       1-2
MLT       1.651       1-2
C/S       -.112       2-4
C/T       -.078       2-4
CT/T      -.043       2-4
DC/C      -.033       1-4
DC/T      -.071       1-4
CP/C      .040        1-2
CP/T      .061        1-2
CN/C      .133        2-3
CN/T      .178        2-3

Relationships between measures
- Strong relationship between measures of the same type or involving the same structure
- MLS and MLT show weak-moderate correlations with subordination measures
- MLC shows low-weak negative correlations with subordination measures
- Length measures show moderate-high correlations with CN measures and weak-moderate correlations with CP measures
- CN and CP measures weakly correlated with each other

Developmental patterns
- Measures with sig. positive changes
  - Linear increase Y1-4: MLC, CN/C
  - Increase Y1-3 (Y4=Y3): CP/C
  - Increase Y1-3 (Y4<Y3, insig.): MLS, MLT, CP/T, CN/T
- Measures with sig. negative changes
  - Linear decrease Y1-4: C/S
  - Nonlinear Y1<Y2>Y3>Y4: DC/C, DC/T

Summary of findings
- Important to control for the effects of relevant learner-, task- and context-related factors
- Seven measures recommended for future use
  - CN/C, MLC: discriminate 2+ adjacent levels, linear increases
  - CN/T, MLS, MLT: 2 adjacent levels; positive sig. changes
  - CP/C, CP/T: nonadjacent levels, positive sig. changes
- Developmental prediction: complexification at the phrasal level vs. the clausal level

Summary of findings (cont.)
- Smaller magnitudes than reported previously
- Clause as a potentially more informative unit of analysis than T-unit

Limitations and future research
- Incorporating more measures and flexible definitions of structures into the analyzer
- Other conceptualizations of proficiency level
- Effect of L1 on syntactic development
- Relationship between developmental measures of fluency, accuracy and complexity at different linguistic levels

Lexical Complexity in EFL Speaking
- Lu, X. (under review). The relationship of lexical richness to the quality of ESL speakers’ oral narratives.

Outline
- Research goals and motivation
- Measures of lexical complexity
- Methodology
- Results
- Conclusion

Research goals and motivation
- Research goals
  - Automate lexical complexity analysis using 25 measures
  - Evaluate the relationship of these measures plus the D measure to the quality of EFL speakers’ oral narratives
- Motivation
  - Lexical complexity an important construct in L2 teaching and research
  - Relationship between lexical complexity and proficiency claimed in many test rating scales

Measures of lexical complexity
- Lexical complexity measures proposed in language acquisition studies and reviewed in
  - Wolfe-Quintero et al. (1998)
  - Read (2000)
  - Malvern et al. (2004)
- Measures of the following three dimensions
  - Lexical density
  - Lexical sophistication
  - Lexical variation

Lexical density
- Proportion of lexical words (Nlw / N) (Ure 1971)
- Previous findings
  - Lower in spoken than written texts (Halliday 1985)
  - Affected by various sources (O’Loughlin 1995)
  - Relation to L2 writing non-significant (Engber 1995)
- Inconsistent definition of lexical words
  - All nouns and adjectives
  - Adverbs with adjective base
  - Full verbs (excluding modal/auxiliary verbs)

Lexical sophistication
- Five measures examined
    LS1:   Nslw / Nlw        (Linnarud 1986; Hyltenstam 1988)
    LS2:   Ts / T            (Laufer 1984)
    VS1:   Tsv / Nv          (Harley & King 1989)
    CVS1:  Tsv / sqrt(2Nv)   (Wolfe-Quintero et al. 1998)
    VS2:   Tsv^2 / Nv        (Chaudron & Parker 1990)

Lexical sophistication (cont.)
- Previous findings
  - LS1: NS-NNS dif. sig. (Linnarud 1986); non-sig. (Hyltenstam 1988)
  - LS2: sig. pre- and post-essay dif. (Laufer 1984)
  - VS1: sig. NS-NNS dif. (Harley & King 1989)
- Varying definitions of sophistication
  - 2000-word BNC frequency list (Leech et al. 2001)

Lexical variation
- 20 measures examined
- 4 based on NDW
  - NDW: Number of different words
  - NDW-50: NDW in first 50 words of sample
  - NDW-ER50: mean NDW of 10 random 50-word subsamples
  - NDW-ES50: mean NDW of 10 random 50-word sequences

Lexical variation (cont.)
- 7 based on TTR for total vocabulary
  - Type token ratio (TTR)
  - Mean TTR of all 50-word segments (MSTTR)
  - LogTTR, Corrected TTR, Root TTR, Uber
  - The D measure (McKee et al. 2000)
- 9 based on TTR for word classes
  - T{LW, V, N, Adj, Adv, Mod} / Nlw
  - Tv / Nv, Tv^2 / Nv, Tv / sqrt(2Nv)

Lexical variation (cont.)
- Previous findings
  - NDW and TTR useful, but affected by sample size
  - Transformations of NDW & TTR not equally useful
  - D claimed superior; results mixed (Jarvis 2002; Yu 2010)
  - Mixed results for word class TTR measures
  - No consensus on a single best measure

Research questions
- How does LD relate to the quality of EFL speakers’ oral narratives?
- How do the LS measures compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?
- How do the LV measures compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?
- How do LD, LS and LV compare with and relate to each other as indices of the quality of EFL speakers’ oral narratives?

Data
- Spoken English Corpus of Chinese Learners (Wen et al. 2005)
  - Transcripts of TEM-4 Spoken Test data in 1996-2002
  - Task 2 data used: 3-minute oral narratives
  - Students ranked within groups of 32-35
  - 12 groups of data used (1999-2002; N=32-35 each)
  - Only rankings available, not actual scores
- Example topic (2001)
  - Describe a teacher of yours whom you found unusual

Computing the measures
- Preprocessing
  - Part-of-speech tagging (Stanford tagger)
  - Lemmatization (Morpha)
- Measure computation
  - D measure: vocd utility in CLAN
  - Type counting: w, sw, lw, slw, v, sv, n, adj, adv
  - Token counting: w, lw, slw, v
  - Computation of the other 25 ratios

Analysis
- Spearman’s rho computed for each group
  - X: test takers’ rankings within the group
  - Y: values of each of the 26 measures
- Meta-analysis to combine results from the 12 groups
- Students divided into 4 levels based on rankings
  - Levels A, B, C and D
- ANOVAs run to determine inter-level differences

Analysis (cont.)
- Alpha level = .05 / 28 = .0018
- Identification of discriminative measures
  - Significant combined rho (p < .0018)
  - Significant between-level differences with linear decreases from Level A to Level D

Lexical density and sophistication

Measure   Combined rho   p-value
Words     .437           .000
W/Min     .437           .000
LD        .011           .836
LS1       .048           .355
LS2       .050           .336
VS1       .133           .010
CVS1      .166           .001
VS2       .165           .001

Lexical density and sophistication (cont.)

Measure   A         B        C        D        F        Sig.
Words     336.16    295.95   297.76   256.34   28.335   .000
W/Min     112.052   98.650   99.252   85.446   28.335   .000
LD        .417      .415     .409     .414     .896     .443
LS1       .227      .235     .221     .225     .681     .564
LS2       .261      .272     .256     .260     2.736    .043
VS1       .072      .086     .067     .073     2.629    .050
CVS1      .343      .383     .299     .297     3.722    .042
VS2       .314      .401     .274     .262     2.760    .042

Lexical density and sophistication (cont.)

        LS1      LS2      VS1      CVS1     VS2
LS1     1.000
LS2     .637**   1.000
VS1     .456**   .391**   1.000
CVS1    .414**   .382**   .966**   1.000
VS2     .381**   .350**   .909**   .935**   1.000

Relationships among the dimensions
- Low to weak correlations among measures in different dimensions
- Lexical variation demonstrated strongest relationships to raters’ judgments of the quality of EFL speakers’ oral narratives

Summary of findings
- The three dimensions posited in language acquisition literature appear to be different constructs
- No/small effect for lexical density/sophistication found
- Lexical variation correlated strongly with quality
  - 9 LV measures recommended
  - NDW correlates strongly with length, but worth considering in the case of timed oral narratives
  - Transformed TTR measures perform better than the original TTR measures

Limitations and future research
- A factor analysis will show patterns of relationships
- No scores available, so not possible to run regression models
- Division of students into 4 levels could be problematic
- Replication using EFL writing data and other conceptualizations of proficiency
- Effects of task-related variables
- Relations among factors determining quality
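
Worked example: TTR-based lexical variation measures

The slides name the TTR family of lexical variation measures (TTR, MSTTR, LogTTR, Corrected TTR, Root TTR, Uber) without spelling out their formulas. The sketch below computes them from a list of word tokens. It is an illustration only, not the analyzer described in the talk. The function name `lexical_variation`, the formulas (the commonly cited definitions), the base-10 logarithm for Uber, and the handling of a trailing partial segment in MSTTR are all assumptions. The D measure is omitted because the talk computes it with the vocd utility in CLAN.

```python
import math


def lexical_variation(tokens, segment=50):
    """Return several type-token-based lexical variation indices.

    `tokens` is a list of word tokens (ideally lemmatized, punctuation
    removed). The formulas are the commonly cited definitions and are an
    assumption here; they are not taken from the slides.
    """
    n = len(tokens)          # number of word tokens (N)
    if n == 0:
        raise ValueError("need at least one token")
    t = len(set(tokens))     # number of word types (T), i.e. NDW

    # MSTTR: mean TTR over consecutive, non-overlapping full segments.
    # Assumption: a trailing partial segment is dropped.
    segment_ttrs = [
        len(set(tokens[i:i + segment])) / segment
        for i in range(0, n - segment + 1, segment)
    ]

    return {
        "ndw": t,
        "ttr": t / n,
        "msttr": sum(segment_ttrs) / len(segment_ttrs) if segment_ttrs else None,
        "root_ttr": t / math.sqrt(n),            # T / sqrt(N)
        "corrected_ttr": t / math.sqrt(2 * n),   # T / sqrt(2N)
        # log T / log N is undefined for a one-token sample.
        "log_ttr": math.log(t) / math.log(n) if n > 1 else None,
        # (log N)^2 / log(N / T), base 10 assumed; undefined when T == N.
        "uber": math.log10(n) ** 2 / math.log10(n / t) if t < n else None,
    }


# Toy sample: 11 tokens, 8 types; 5-word segments keep the example small.
sample = "the cat sat on the mat and the dog sat too".split()
scores = lexical_variation(sample, segment=5)
```

Measures that are undefined for a given sample are returned as None rather than raising, so that very short transcripts can still be scored on the remaining indices.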