10/21/11

Norman Evans, K. James Hartshorn, Teresa Martin, Troy Cox
Brigham Young University
Provo, Utah

Error correction in second language (L2) writing has a contested history, with some scholars calling for its abolition and others arguing for its inclusion in L2 writing pedagogy (Truscott, 1996; Ferris, 1999; Chandler, 2003; Bruton, 2010). Most of the debate has centered on whether L2 writers’ linguistic accuracy improves.

Writing is a complex process with many aspects that must be considered and measured. It is our position that both the message and the accuracy of the language used to communicate the message are important, and that each requires separate measures to determine development.

In the sometimes lively exchanges about the usefulness of error correction, relatively little has been said about the efficacy of the linguistic accuracy (LA) measurements that have been used in research. The purpose of this study is to consider various approaches to measuring LA, with the aim of finding a viable measure of written linguistic accuracy that will facilitate error correction research.

- Definitions
- Measurements of linguistic accuracy: pros and cons
- Current study
  - Research questions
  - Method
  - Participants
  - Data analysis
  - Results
  - Discussion

“Measuring linguistic accuracy is a complex endeavor” (Polio, 1998, p. 52). However, claims of learner progress, or lack thereof, depend on reliable and valid measures of LA. Accordingly, several definitions are needed to set the context of this study.
- Error
- Accuracy

Error
Lennon (1991) defines error as “A linguistic form or combination of forms which, in the same context and under similar conditions of production, would, in all likelihood, not be produced by the speakers’ native speaker counterpart” (as cited in Ellis & Barkhuizen, 2005, p. 56).

Accuracy
“The ability to be free from errors while using language to communicate in either writing or speech” (Wolfe-Quintero et al., 1998).

Obligatory Occasion
Correct use in obligatory occasions means simply that the acquirer supplied the morpheme where it was required.
Pros:
- Very accurate at measuring the specific features of the language being examined.
Cons:
- Limited in what and how much can be measured.
- Not all errors can be analyzed.
- Lexical features cannot be measured.

Holistic Scoring
Writing samples are holistically scored on various traits; linguistic accuracy is just one of the traits considered.
Pros:
- Writing can be scored quickly.
- Can be used with a wide range of proficiency for placement purposes.
Cons:
- Reliability can be difficult to achieve if there is a broad range of possible scores.
- Difficult to discriminate single proficiency levels.

Error Counts
All errors in a writing sample are counted and then usually expressed as a ratio: total errors per clause or T-unit, per 100 words, or per total words in the sample.
Pros:
- Captures the number of errors, which can be lost in error-free unit (T-unit and clause) measurements.
Cons:
- Difficult to identify where one error begins and another ends.
- If using errors per 100 words, it is difficult to identify what constitutes 100 words (Wigglesworth, 2008).

Error-free Unit (T-units)
A T-unit is defined as an independent clause and its dependent clauses (Hunt, 1965).
Pros:
- Relatively easy to identify T-units.
- T-units constitute a valid measure of meaning.
Cons:
- One unit may contain more than one error.
- The longer the unit, the greater the chance for error.

Error-free Unit (clauses)
Clauses can be difficult to define. One definition used in LA research is “a syntactic unit which contains a finite verb” (Fischer, 1984).
Pros:
- Clauses constitute a valid measure of meaning.
- Less chance of error, since a clause is by nature shorter than a T-unit.
Cons:
- One unit may contain more than one error.
- The longer the unit, the greater the chance for error.
- Clauses can be difficult to define and identify.
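
The contrast between error counts and error-free units can be sketched in a few lines of Python. This is only an illustration: the function names and the two samples are invented here, and the per-unit error counts are assumed to come from a human rater.

```python
def error_free_ratio(errors_per_unit):
    """Proportion of units (T-units or clauses) that contain no errors."""
    if not errors_per_unit:
        raise ValueError("need at least one unit")
    error_free = sum(1 for n in errors_per_unit if n == 0)
    return error_free / len(errors_per_unit)


def errors_per_100_words(total_errors, total_words):
    """Error count normalized to a 100-word sample length."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * total_errors / total_words


# Invented data: errors found in each T-unit of two hypothetical 50-word samples.
sample_a = [0, 1, 0, 1]
sample_b = [0, 4, 0, 3]

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, error_free_ratio(sample), errors_per_100_words(sum(sample), 50))
```

Both invented samples have the same error-free ratio (0.5), yet sample B contains many more errors per 100 words. That is the information an error-free unit measure can lose when one unit contains more than one error.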

Weighted Clauses
Wigglesworth (2008) suggests that an “improved measure of accuracy” may be to weight the errors within a clause according to the following scale:
- No error: an accurately constructed clause
- Level one: minor errors
- Level two: more serious errors
- Level three: errors make it difficult or impossible to recover meaning

Error Type     Weighting   Definition
No error       1.0         An accurately constructed clause
Level one      0.8         The clause has minor errors (e.g., morphosyntactic) which do not obscure the intended meaning
Level two      0.5         The clause has more serious errors (e.g., word choice or word order) which make the intended meaning harder to recover
Level three    0.1         The clause has errors which make the intended meaning very difficult or impossible to recover

Given the variability among methods of accuracy measurement, and Wigglesworth’s claim that weighted clause ratios may be “an improved measure of linguistic accuracy,” we ask the following questions of error-free T-unit ratios (EFT/TT), error-free clause ratios (EFC/TC), and weighted clause ratios (WCR):
1. How reliable is each as a measure of LA?
2. How valid is each as a measure of LA?
3. How practical is each when measuring LA?

Multiple writing samples from university-matriculated ESL students and NES students were analyzed using error-free T-unit, error-free clause, and weighted clause ratios to determine the level of reliability, validity, and practicality of each as a measure of written linguistic accuracy.

All samples were analyzed using
- error-free T-unit to total T-unit (EFT) ratios (Example 1: total T-units = 5),
- error-free clause to total clause (EFC) ratios (Example 1: total clauses = 10), and
- weighted error-free clause to total clause (WEFC) ratios.
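
As a rough illustration of how the weighting scheme changes a score, the sketch below computes an EFC ratio and a WEFC ratio from per-clause ratings. The weights are the ones listed in the weighting table; the function names, the rating labels, and the four-clause sample are invented for this example and are not the authors' procedure.

```python
# Weights taken from the weighting table (after Wigglesworth, 2008).
WEIGHTS = {"none": 1.0, "level1": 0.8, "level2": 0.5, "level3": 0.1}


def error_free_clause_ratio(clause_levels):
    """EFC ratio: clauses rated 'none' divided by all clauses."""
    if not clause_levels:
        raise ValueError("need at least one clause")
    return sum(1 for lvl in clause_levels if lvl == "none") / len(clause_levels)


def weighted_clause_ratio(clause_levels):
    """WEFC ratio: sum of per-clause weights divided by all clauses."""
    if not clause_levels:
        raise ValueError("need at least one clause")
    return sum(WEIGHTS[lvl] for lvl in clause_levels) / len(clause_levels)


# Invented ratings for a four-clause sample (not taken from the study data).
ratings = ["none", "none", "level1", "level3"]
print(round(error_free_clause_ratio(ratings), 3))
print(round(weighted_clause_ratio(ratings), 3))
```

For these invented ratings the EFC ratio is 0.5 and the WEFC ratio is 0.725: a clause with errors still earns partial credit under weighting, so the weighted ratio can never fall below the error-free clause ratio.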

[Example 1: Writing Analysis. Total T-units = 5; total clauses = 10, which is the total for both the EFC and WEFC ratios.]

- T-units are the largest unit.
- T-units can consist of multiple clauses but may also be a single clause.
- EFC Total = WEFC Total
- EFT Ratio ≤ EFC Ratio ≤ WEFC Ratio

Error-Free T-unit Ratio
\[ \frac{\text{Error-Free T-units}}{\text{Total T-units}} = \frac{2}{5} = .4 \]

Error-Free Clause Ratio
\[ \frac{\text{Error-Free Clauses}}{\text{Total Clauses}} = \frac{5}{10} = .5 \]

Weighted Error-Free Clause Ratio
\[ \frac{\text{EFC}(1.0) + \text{L1}(0.8) + \text{L2}(0.5) + \text{L3}(0.1)}{\text{Total Clauses}} = \frac{5(1.0) + 1(0.8) + 2(0.5) + 2(0.1)}{10} = \frac{7}{10} = 0.7 \]

Consider the following example from the data set:

Wealth is a blessing of a person in all sort of ways. It helps a person in the physical needs. Being a wealthy person can help for the family to survive and for the family needs. It helps solve all the problem in an advantage ways. In the other hand, it also causes a lot of disadvantage effects; which is too much pride, looking down to some people, and it could causes sufferings as well. It can lead people lives into misery and trials.

EFT ratio: .0 / EFC ratio: .0 / WEFC ratio: .67

Facet: any factor, variable, or component that is assumed to affect scores in a systematic way (Eckes, 2011)
FACETS Software (Linacre, 2011)

What facets could contribute to the variance of the scores in this study?
- The ability of a particular PARTICIPANT
- The difficulty of a particular TOPIC
- The difficulty of a particular OCCASION
- The severity of a particular RATER
- The SCALE that is used

FACET          N
Participants   97
Topics         42
Occasions      4
Raters         3
Scale          10 (transformed ratio with 1 = 10, .9 = 9, etc.)

\[ \log_e\left(\frac{P_{nijkx}}{P_{nijk(x-1)}}\right) = B_n - D_i - H_k - C_j - F_x \]
where $B_n$ = ability of participant n; $D_i$ = difficulty of topic i; $H_k$ = difficulty of prompt/administration k; $C_j$ = severity of rater j; and $F_x$ = scale being used (Rasch-Andrich threshold or step calibration).
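
To make the model above more concrete, here is a minimal Python sketch that turns the log-odds equation into a probability for each score category. It is not the FACETS implementation: the function name, the logit values, and the three-threshold scale are all invented for illustration (the study's scale has 10 categories).

```python
import math


def category_probabilities(ability, topic, occasion, rater, thresholds):
    """Probabilities of score categories 0..len(thresholds).

    Built so that log(P[x] / P[x-1]) = B - D - H - C - F_x for every x >= 1.
    """
    base = ability - topic - occasion - rater
    # Log-numerator of each category: category 0 is 0, and each step up
    # adds (base - F_x).
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(log_numerators[-1] + base - step)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Invented values in logits; three thresholds give four score categories.
thresholds = [-1.0, 0.0, 1.0]
probs = category_probabilities(0.5, 0.2, 0.0, -0.1, thresholds)
print([round(p, 3) for p in probs])
```

Raising the topic, occasion, or rater term lowers the chance of the higher categories for the same participant, which is the sense in which those facets could contribute to score variance.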

If we hypothesize that the score a participant gets on EFT, EFC, and WEFC is based solely on their ability ($B_n$), then the other terms that interact with the scale ($F_x$), namely $D_i$, $H_k$, and $C_j$, would be 0. That is, there would be no differences in topic difficulty, occasion difficulty, or rater severity.

\[ \log_e\left(\frac{P_{nijkx}}{P_{nijk(x-1)}}\right) = B_n - D_i - H_k - C_j - F_x \]
where $B_n$ = ability of participant n; $D_i$ = difficulty of topic i; $H_k$ = difficulty of occasion k; $C_j$ = severity of rater j; and $F_x$ = scale being used (Rasch-Andrich threshold or step calibration).

Two groups of students, all enrolled in undergraduate studies at the same university in the U.S.: 81 ESL students ranging in ability level from high intermediate to advanced, and 16 native English speakers.

Group                        N
ESL first year, EIL          17
ESL last semester of EIL     18
ESL one year post EIL        9
ESL two years post EIL       16
ESL three years post EIL     13
ESL four years post EIL      8
NES (ENG 101)                16
TOTAL                        97

Topics
- After Graduation
- Being On Time
- Big Cities
- Challenges
- Coffee
- Competition
- Controlling Anger
- Cooperation
- Crime
- Economy
- Effective Leadership
- Exercise
- Farmers
- Finishing
- GE
- Good Books
- Hard Work
- Health
- History
- International Peace
- Lawyers
- Learning From Mistakes
- Learning The Hard Way
- Luck
- Managing Money
- Misunderstanding
- Quality Education
- Quality Of Life
- Respecting Elderly
- Science
- Social Problem
- Solving Problems
- Staying Up Late
- Stress
- Success
- Successful Business
- Teaching
- Technology
- Thinking vs Feeling
- Time
- Too Much Freedom
- Wealth

Prompts were administered on four different occasions within 2 weeks.

All writing samples were rated using the following methods:
- EFT Ratio
- EFC Ratio
- WEFC Ratio
A subset of the samples from each scoring method was double-rated.

                  EFTUN    EFCL     WEFCL
N                 97       97       97
Range             7.81     7.48     3.65
Minimum           0.85     2.10     6.32
Maximum           8.66     9.58     9.97
Mean              3.75     5.39     8.48
Std. Deviation    1.77     1.52     0.73
Variance          3.14     2.30     0.53
Skewness          0.84     0.54     -0.12
Kurtosis          0.52     0.26     0.11

[Figure panels: Error-Free T-units; Error-Free Clause Ratios; Weighted Clause Ratios]

                          Error-free T-unit Ratio   Error-free Clause Ratio   Weighted Clause Ratio
EFT Ratio                 1.0                       .85 (.90)                 .71 (.78)
EFC Ratio                 --                        1.0                       .83 (.88)
WC Ratio                  --                        --                        1.0

Correlations are by essay (and, in parentheses, by participant). The coefficients suggest similarities among the methods.

We assume the facet of Participants will be different and that the separation reliability will be close to 1. If the facets of Topics, Occasions, and Raters have NO effect on the measures, the separation reliability will be close to 0.

Separation Reliability
                EFT      EFC      WEFC
Participants    .87      .92      *.91
Topics          *.49     .76      *.64
Occasions       .00      .20      .81
Raters          .00      .00      .93
*Indicates EXTREMES were present in the data

Linguistic Accuracy
- By definition, all three methods measure linguistic accuracy as a proportion of the total volume of language.
- All three methods have good reliability of separation.
- All three methods are well correlated with each other.

Stringency:
1. EFT Ratios are the most stringent
2. EFC Ratios are a little less stringent
3. WC Ratios are the least stringent

[Figure panels: Error-Free T-units; Error-Free Clause Ratios; Weighted Clause Ratios. Annotations: “Better separation for higher level students”; “Better separation for lower level students”]

From our experience, EFTs were by far the easiest to count. Counting clauses and error-free clauses took nearly twice as much time as counting T-units and error-free T-units. Weighting clauses was the most time-consuming of all, since the rater had to first count all clauses, designate all clauses as error-free or not, and finally weight the clauses on a 1-3 scale.

Which measure is best for identifying linguistic accuracy?
It depends! Consider:
1. Level of writers
2. The purpose of the measurement (placement, achievement, etc.)
3. Resources
4. What dimension of writing is being measured