Measuring Written Linguistic Accuracy - Troy Cox

10/21/11 Norman Evans
K. James Hartshorn
Teresa Martin
Troy Cox
Brigham Young University
Provo, Utah
Error correction in second language writing
has an interesting history with some scholars
calling for its abolition, and others arguing
for its inclusion in L2 writing pedagogy
(Truscott, 1996; Ferris, 1999; Chandler, 2003;
Bruton, 2010).
Most of the debate has centered on
whether L2 writers’ linguistic accuracy
improves with error correction.
Writing is a complex process with many aspects that
must be considered and measured. It is our position
that both the message and the accuracy of the
language used to communicate the message are
important, and each requires separate measures to
determine development.
In the sometimes lively exchanges about the
usefulness of error correction, relatively little has
been said about the efficacy of the linguistic
accuracy (LA) measurements that have been used in
research.
The purpose of this study is to consider various
approaches to measuring LA with the aim of finding a
viable measure of written linguistic accuracy that
will facilitate error correction research.
• Definitions
• Measurements of linguistic accuracy: Pros/Cons
• Current Study
  ◦ Research Questions
  ◦ Method
  ◦ Participants
  ◦ Data Analysis
  ◦ Results
  ◦ Discussion
“Measuring linguistic accuracy is a complex
endeavor” (Polio, 1998, p. 52). However, claims of
learner progress or lack thereof depend on reliable
and valid measures of LA. Accordingly, several
definitions are needed to set the context of this
study.
  ◦ Error
  ◦ Accuracy
Error
Lennon (1991) defines error as “A linguistic form
or combination of forms which, in the same
context and under similar conditions of production,
would, in all likelihood, not be produced by the
speakers’ native speaker counterpart” (as cited in
Ellis & Barkhuizen, 2005, p. 56).
Accuracy
“The ability to be free from errors while using
language to communicate in either writing or
speech” (Wolfe-Quintero et al., 1998).
Obligatory Occasion
Correct use in obligatory occasions means simply that the
acquirer supplied the morpheme where it was required.
Pros:
  ◦ Very accurate at measuring specific features of the language being examined.
Cons:
  ◦ Limited in what and how much can be measured.
  ◦ Not all errors can be analyzed.
  ◦ Lexical features cannot be measured.
Holistic Scoring
Writing samples are holistically scored on various traits;
linguistic accuracy is just one of the traits considered.
Pros
  ◦ Writing can be scored quickly.
  ◦ Can be used with a wide range of proficiency for placement purposes.
Cons
  ◦ Reliability can be difficult to achieve if there is a broad range of possible scores.
  ◦ Difficult to discriminate single proficiency levels.
Error Counts
All errors in a writing sample are counted and then usually
expressed as a ratio: total errors per clause or T-unit,
per 100 words, or per total words in the sample.
Pros
  ◦ Gets at the number of errors, which can be lost in error-free unit (T-unit and clause) measurements.
Cons
  ◦ Difficult to identify where one error begins and another ends.
  ◦ If using errors per 100 words, it is difficult to identify what constitutes 100 words (Wigglesworth, 2008).
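The error-count ratios described in this section can be sketched in a few lines. This is only an illustration: the function names and the counts in the usage example are hypothetical, and the sketch assumes the rater has already decided how errors, words, and units are counted.

```python
def errors_per_100_words(total_errors: int, total_words: int) -> float:
    """Return the number of errors per 100 words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * total_errors / total_words


def errors_per_unit(total_errors: int, total_units: int) -> float:
    """Return errors per unit, where a unit is a clause or a T-unit."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return total_errors / total_units


# Hypothetical sample: 12 errors in 240 words spread over 16 T-units.
print(errors_per_100_words(12, 240))  # 5.0
print(errors_per_unit(12, 16))        # 0.75
```

Both functions only divide counts, so the hard part remains the one listed under Cons: deciding where one error ends and the next begins.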
Error-free Unit (T-units)
A T-unit is defined as an independent clause and its dependent
clauses (Hunt, 1965).
Pros
  ◦ Relatively easy to identify T-units.
  ◦ T-units constitute a valid measure of meaning.
Cons
  ◦ One unit may contain more than one error.
  ◦ The longer the unit, the greater the chance for error.
Error-free Unit (clauses)
Clauses can be difficult to define. One definition used in LA
research is “a syntactic unit which contains a finite
verb” (Fischer, 1984).
Pros
  ◦ Clauses constitute a valid measure of meaning.
  ◦ Less chance of error since a clause is by nature shorter than a T-unit.
Cons
  ◦ One unit may contain more than one error.
  ◦ The longer the unit, the greater the chance for error.
  ◦ Clauses can be difficult to define and identify.
Weighted Clauses
Wigglesworth (2008) suggests that an “improved
measure of accuracy” may be to weight the errors
within a clause according to the following scales:
No error: an accurately constructed clause
Level one: minor errors
Level two: more serious errors
Level three: errors make it difficult or impossible to recover meaning
Error Type | Definition | Weighting
No error | An accurately constructed clause | 1.0
Level one | The clause has minor errors (e.g. morphosyntactic) which do not obscure the intended meaning | 0.8
Level two | The clause has more serious errors (e.g. word choice or word order) which make the intended meaning harder to recover | 0.5
Level three | The clause has errors which make the intended meaning very difficult or impossible to recover | 0.1
Given the variability among methods of
accuracy measurement, and Wigglesworth’s
claim that weighted clause ratios may be “an
improved measure of linguistic accuracy,” we
ask the following questions of error-free T-unit
ratios (EFT/TT), error-free clause ratios (EFC/TC),
and weighted clause ratios (WCR):
1. How reliable is each as a measure of LA?
2. How valid is each as a measure of LA?
3. How practical is each when measuring LA?
Multiple writing samples from university-matriculated
ESL students and NES students were analyzed using
error-free T-unit, error-free clause, and
weighted clause ratios to determine the
reliability, validity, and practicality of
each as a measure of written linguistic
accuracy.
• All samples were analyzed using
  ◦ error-free T-unit to total T-unit (EFT) ratios,
    ▪ Example 1: EFT Total=5
  ◦ error-free clause to total clause (EFC) ratios, and
    ▪ Example 1: EFC Total=10
  ◦ weighted error clause to total clause (WEFC) ratios.
    ▪ Example 1: WEFC Total=10
Example 1: Writing Analysis
T-units | Clauses
1 | 3
1 | 2
1 | 1
1 | 2
1 | 3
Total: 5 | Total: 10
• T-units are the largest unit
• T-units can consist of multiple clauses but may also be a single clause
• EFC Total = WEFC Total
• EFT Ratio ≤ EFC Ratio ≤ WEFC Ratio
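  ◦ Error-Free T-unit Ratio
\[ \frac{\text{Error-Free T-units}}{\text{Total T-units}} = \frac{2}{5} = .4 \]
  ◦ Error-Free Clause Ratio
\[ \frac{\text{Error-Free Clauses}}{\text{Total Clauses}} = \frac{5}{10} = .5 \]
  ◦ Weighted Error-Free Clause Ratio
\[ \frac{\text{EFC}(1) + 1\,\text{L1}(0.8) + 2\,\text{L2}(0.5) + 2\,\text{L3}(0.1)}{\text{Total Clauses}} = \frac{7}{10} = 0.7 \]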
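The three ratios for Example 1 can be reproduced with a short sketch. The counts (2 of 5 error-free T-units; 5 error-free clauses, 1 level-one, 2 level-two, and 2 level-three clauses out of 10) and the weights (1.0, 0.8, 0.5, 0.1) are the ones given in the slides. The function and dictionary names are hypothetical and chosen only for illustration.

```python
# Clause weights as given in the weighting table (after Wigglesworth, 2008).
WEIGHTS = {"no_error": 1.0, "level_one": 0.8, "level_two": 0.5, "level_three": 0.1}


def error_free_ratio(error_free_units: int, total_units: int) -> float:
    """Error-free units (T-units or clauses) divided by total units."""
    return error_free_units / total_units


def weighted_clause_ratio(clause_counts: dict) -> float:
    """Sum of weight * count over error levels, divided by total clauses."""
    total_clauses = sum(clause_counts.values())
    weighted_sum = sum(WEIGHTS[level] * n for level, n in clause_counts.items())
    return weighted_sum / total_clauses


# Example 1 from the slides.
eft = error_free_ratio(2, 5)
efc = error_free_ratio(5, 10)
wefc = weighted_clause_ratio(
    {"no_error": 5, "level_one": 1, "level_two": 2, "level_three": 2}
)
print(round(eft, 2), round(efc, 2), round(wefc, 2))  # 0.4 0.5 0.7
```

Because every erroneous clause still receives a weight above zero, the weighted ratio can never fall below the error-free clause ratio for the same sample.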
Consider the following example from the data set:
Wealth is a blessing of a person in all
sort of ways. It helps a person in the
physical needs. Being a wealthy person can
help for the family to survive and for the
family needs. It helps solve all the problem
in an advantage ways. In the other hand, it
also causes a lot of disadvantage effects;
which is too much pride, looking down to
some people, and it could causes sufferings
as well. It can lead people lives into misery
and trials.
EFT ratio: .0 / EFC ratio: .0 / WEFC ratio: .67
• Facet: any factor, variable, or component that is assumed to affect scores in a systematic way (Eckes, 2011)
• FACETS Software (Linacre, 2011)
• What facets could contribute to the variance of the scores in this study?
  ◦ The ability of a particular PARTICIPANT
  ◦ The difficulty of a particular TOPIC
  ◦ The difficulty of a particular OCCASION
  ◦ The severity of a particular RATER
  ◦ The SCALE that is used
FACET | N
Participants | 97
Topics | 42
Occasions | 4
Raters | 3
Scale | 10 (transformed ratio with 1 = 10, .9 = 9, etc.)
\[ \log_e\left(\frac{P_{nijkx}}{P_{nijk(x-1)}}\right) = B_n - D_i - H_k - C_j - F_x \]
where
$B_n$ = ability of participant $n$;
$D_i$ = difficulty of topic $i$;
$H_k$ = difficulty of prompt/administration $k$;
$C_j$ = severity of rater $j$;
and
$F_x$ = scale being used (Rasch-Andrich threshold or step calibration).
• If we hypothesize that the score a participant receives on
EFT, EFC, and WEFC is based solely on that participant's
ability ($B_n$), then the other terms that interact
with the scale ($F_x$), namely $D_i$, $C_j$, and $H_k$, would be 0.
That is, there would be no difference in
the difficulty of topics or prompts, or in the severity of
raters.
\[ \log_e\left(\frac{P_{nijkx}}{P_{nijk(x-1)}}\right) = B_n - D_i - H_k - C_j - F_x \]
where
$B_n$ = ability of participant $n$;
$D_i$ = difficulty of topic $i$;
$H_k$ = difficulty of occasion $k$;
$C_j$ = severity of rater $j$;
and
$F_x$ = scale being used (Rasch-Andrich threshold or step calibration).
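To make the model above concrete, the sketch below turns one set of facet values into a probability for each scale category. It is not how the FACETS software estimates parameters. It only evaluates the stated equation in the forward direction, and the function name and all numeric values are hypothetical.

```python
import math


def category_probabilities(b, d, h, c, thresholds):
    """Probabilities of categories 0..m under the model
    log(P_x / P_(x-1)) = B - D - H - C - F_x.

    b: participant ability, d: topic difficulty, h: occasion difficulty,
    c: rater severity, thresholds: [F_1, ..., F_m] (all in logits).
    """
    theta = b - d - h - c
    # Log-numerator of category x is the running sum of (theta - F_k), k <= x.
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + theta - f)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: a fairly able writer, a slightly hard topic,
# an average occasion, a slightly lenient rater, three thresholds.
probs = category_probabilities(1.0, 0.2, 0.0, -0.1, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

If topic, occasion, and rater had no effect, their terms would all be 0 and the category probabilities would depend only on the participant's ability and the scale thresholds.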
Two groups of students, all enrolled in
undergraduate studies at the same university
in the U.S.: 81 ESL students ranging in ability
level from high intermediate to advanced,
and 16 native English speakers.
Group | N
ESL first year, EIL | 17
ESL last semester of EIL | 18
ESL one year post EIL | 9
ESL two years post EIL | 16
ESL three years post EIL | 13
ESL four years post EIL | 8
NES (ENG 101) | 16
TOTAL | 97
  ◦ After Graduation
  ◦ Being On Time
  ◦ Big Cities
  ◦ Challenges
  ◦ Coffee
  ◦ Competition
  ◦ Controlling Anger
  ◦ Cooperation
  ◦ Crime
  ◦ Economy
  ◦ Effective Leadership
  ◦ Exercise
  ◦ Farmers
  ◦ Finishing
  ◦ GE
  ◦ Good Books
  ◦ Hard Work
  ◦ Health
  ◦ History
  ◦ International Peace
  ◦ Lawyers
  ◦ Learning From Mistakes
  ◦ Learning The Hard Way
  ◦ Luck
  ◦ Managing Money
  ◦ Misunderstanding
  ◦ Quality Education
  ◦ Quality Of Life
  ◦ Respecting Elderly
  ◦ Science
  ◦ Social Problem
  ◦ Solving Problems
  ◦ Staying Up Late
  ◦ Stress
  ◦ Success
  ◦ Successful Business
  ◦ Teaching
  ◦ Technology
  ◦ Thinking vs Feeling
  ◦ Time
  ◦ Too Much Freedom
  ◦ Wealth
Prompts were administered on four different
occasions within 2 weeks.
All writing samples were rated using the following
methods:
EFT Ratio
EFC Ratio
WEFC Ratio
A subset of the samples from each scoring
method was double-rated.
Statistic | EFTUN | EFCL | WEFCL
N | 97 | 97 | 97
Range | 7.81 | 7.48 | 3.65
Minimum | 0.85 | 2.10 | 6.32
Maximum | 8.66 | 9.58 | 9.97
Mean | 3.75 | 5.39 | 8.48
Std. Deviation | 1.77 | 1.52 | 0.73
Variance | 3.14 | 2.30 | 0.53
Skewness | 0.84 | 0.54 | -0.12
Kurtosis | 0.52 | 0.26 | 0.11
[Figure panels: Error-Free T-unit Ratios, Error-Free Clause Ratios, Weighted Clause Ratios]
 | Error-free T-unit Ratio | Error-free Clause Ratio | Weighted Clause Ratio
EFT Ratio | 1.0 | .85 (.90) | .71 (.78)
EFC Ratio | -- | 1.0 | .83 (.88)
WC Ratio | -- | -- | 1.0
Correlations by essays (and Participants)
Coefficients suggest similarities among methods.
We assume the facet of Participants will be different and that
the separation reliability will be close to 1.
If the facets of Topics, Occasions, and Raters have NO effect on
the measures, the separation reliability will be close to 0.
Separation Reliability | EFT | EFC | WEFC
Participants | .87 | .92 | *.91
Topics | *.49 | .76 | *.64
Occasions | .00 | .20 | .81
Raters | .00 | .00 | .93
*Indicates EXTREMES were present in the data
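As a rough guide to reading these values, separation reliability in Rasch measurement is commonly described as the share of observed variance in the element measures that is not attributable to measurement error. The sketch below follows that description. It is an assumption about the general formula, not a reproduction of the FACETS output, and it uses the population variance and hypothetical logit values.

```python
from statistics import mean, pvariance


def separation_reliability(measures, standard_errors):
    """Estimate (observed variance - error variance) / observed variance.

    measures: logit measures for the elements of one facet.
    standard_errors: the standard error of each measure.
    """
    observed_variance = pvariance(measures)
    if observed_variance == 0:
        return 0.0  # identical measures: the elements cannot be separated
    error_variance = mean([se ** 2 for se in standard_errors])
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance


# Hypothetical facet whose elements are well spread relative to their error.
print(separation_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))  # 0.625
# Hypothetical facet whose elements do not differ at all.
print(separation_reliability([0.3, 0.3, 0.3], [0.5, 0.5, 0.5]))   # 0.0
```

Read this way, a value near 1 for Participants means the writers are reliably distinguished, while a value near 0 for Raters means the raters are statistically interchangeable.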
Linguistic Accuracy
By definition, all three methods measure linguistic
accuracy as a proportion of the total volume of
language.
All three methods have good reliability of separation.
All three methods are well correlated with each other.
Stringency:
1. EFT Ratios are the most stringent
2. EFC Ratios are a little less stringent
3. WC Ratios are the least stringent
[Figure panels: Error-Free T-unit Ratios, Error-Free Clause Ratios, Weighted Clause Ratios. Annotation: better separation for lower level students]
[Figure panels: Error-Free T-unit Ratios, Error-Free Clause Ratios, Weighted Clause Ratios. Annotations: better separation for higher level students; better separation for lower level students]
• From our experience, EFTs were by far the easiest to count.
• Counting clauses and error-free clauses took nearly twice as much time as counting T-units and error-free T-units.
• Weighting clauses was the most time consuming of all, since the rater had to first count all clauses, designate all clauses as error-free or not, and finally weight the clauses on a 1-3 scale.
Which measure is best for
identifying linguistic accuracy?
It depends! Consider:
1.  Level of writers
2.  The purpose for the measurement
(placement, achievement etc.)
3.  Resources
4.  What dimension of writing is
being measured