Exploring the Usefulness of Holistic Scores in Automated Writing Evaluation (AWE)

Zhi Li, Stephanie Link, Hong Ma, Hyejin Yang,
Volker Hegelheimer
June 13-16
Iowa State University
Applied Linguistics and Technology, Department of English
Open education, resources, and design for language learning
The Age of Machines?
• An English-teaching robot stands in front of children at an elementary school in the south of Seoul, South Korea.
Overview
v Background of the Study
v Literature Review
v Methodology
v Results
§  Learners’ perceptions of the usefulness of scores
§  Instructors’ perceptions of the usefulness of scores
§  Actual use of holistic score
v Implications/Conclusions
Background of the Study
Large research group on AWE, with strands in:
• Linguistic development
• Needs analysis
• Assessment
• Pedagogical practice
• Usefulness (this study)
Background of the Study
• Usefulness of AWE Systems
  - Holistic Scores
  - Trait Feedback
  - AWE Features (Planning, Handbook, Spell checker, etc.)
  - Overall
AWE Tool in this Study:
Criterion®
Literature Review
• Perceptions of score usefulness in classroom settings

Instructors vs. Learners (Grimes & Warschauer, 2006, 2010)
  - Teachers often disagreed with scores, but found them helpful for teaching
  - “Students tended to be less skeptical of the scores than teachers” (p. 10).
  - Teachers showed slightly less than neutral opinions about fairness and accuracy

Problems with scoring system (Chen & Cheng, 2008)
  1. Favors lengthiness
  2. Overemphasizes the use of transition words
  3. Ignores coherence and content development
  4. Discourages unconventional ways of essay writing
  5. Partially reflects actual English writing ability
Literature Review
• Instructors’ Use of AWE Scores
  - Part of students’ grades or complete disregard (varied reliance): Chen & Cheng (2008); Grimes & Warschauer (2010)
  - Required minimum AWE score: Chen & Cheng (2008)
  - Lack of research on how access to AWE scores transfers to use in the classroom
• Learners’ Use of AWE Scores
  - Scores motivated students: Ebyary & Windeatt (2010)
  - Low scores led to higher improvement: Attali (2004)
Gaps in the area of AWE
• More in-depth studies on learner/instructor perception of AWE score usefulness are needed.
• Few studies on actual use of AWE scores in the classroom context.
Purpose:
• To investigate the usefulness of AWE scores in the ESL writing classroom.
Research Questions
Perceptions of the Usefulness of Criterion Scores
(RQ1) What are the learners’ perceptions of the usefulness of Criterion holistic scores in the classroom?
(RQ2) What are the instructors’ perceptions of the usefulness of Criterion holistic scores in the classroom?
Actual Use of Scores
(RQ3) How are Criterion holistic scores used by learners and instructors in the ESL writing classroom?
Methodology
Setting
Spring 2011 to Fall 2011
ESL Academic Writing Course (Engl101C)
Participants
7 ESL academic writing instructors
-Experienced ESL writing instructors
-Proficient in technology use
140 ESL learners
-Intermediate proficiency level
-Majority from Asian backgrounds
Methodology
Materials
• 4 major papers
  - Paper 1: Narrative
  - Paper 2: Compare and Contrast
  - Paper 3: Cause and Effect
  - Paper 4: Argumentative
• Questionnaires
• Interviews
Methodology
Timeline
• Data collection (Spring 2011)
• Data collection (Fall 2011)
• Data analysis (Spring 2012)
Data sources
• Learner data: questionnaires, individual interviews
• Instructor data: questionnaires, individual interviews, focus group interviews
Methodology
Data Analysis
• Inductive Coding (RQ1-RQ3)
• Descriptive Analysis (RQ1-RQ2)
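As a rough illustration of what a descriptive analysis of 6-point questionnaire ratings can involve, the sketch below computes a respondent count, a mean, and a per-point frequency table. The function name `describe_ratings` and the sample ratings are hypothetical and are not the study's data or the authors' actual procedure; the sketch only assumes ratings run from 1 (low) to 6 (high), as on the rating charts in this presentation.

```python
from collections import Counter


def describe_ratings(ratings, low=1, high=6):
    """Summarize Likert-type ratings: respondent count, mean, and frequencies."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < low or r > high for r in ratings):
        raise ValueError(f"ratings must fall between {low} and {high}")
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "mean": round(sum(ratings) / len(ratings), 2),
        # Frequencies listed from the highest scale point down to the lowest,
        # including points that nobody selected.
        "counts": {point: counts.get(point, 0) for point in range(high, low - 1, -1)},
    }


# Hypothetical ratings for illustration only; NOT data from this study.
example_ratings = [6, 5, 4, 4, 3, 2]
summary = describe_ratings(example_ratings)
print(summary)
```

A frequency table of this kind is what a bar chart of respondents per rating point would be drawn from, and the mean corresponds to a single summary figure reported per question.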
Research Questions
• Learners’ Perceptions of the Usefulness of Criterion Scores
• Instructors’ Perceptions of the Usefulness of Criterion Scores
• Actual Use of Holistic Scores
RQ1: Learners’ Perception
of Score Usefulness
1) Usefulness of Criterion scores
“From the score report, I can revise my essay and
improve my writing” (101c student)
-------------------
“Criterion’s feedback always the same. Like, if you
use, like my score…sometimes it’s 5, sometimes a
4. But the… explanation about your score always
the same.…” (101c310)
RQ1: Learners’ Perception
of Score Usefulness
2) Learners’ Trust in Criterion scores
Q: How much do you trust the scores from Criterion?
Result: 4.12 (relatively high)
[Figure: Learners’ Rating of Trustworthiness (N=54). Bar chart of number of respondents per rating, from 6 (high trust) to 1 (low trust).]
RQ1: Learners’ Perception
of Score Usefulness
2) Learners’ Trust in Criterion scores
“Usually the Criterion will give me a different grade than what
my teacher’s grade. The teacher gives more comments with
details about what I did wrong and Criterion did not.” (101c323)
RQ2: Instructors’ Perception
of Score Usefulness
1) Usefulness of Criterion scores
Q: Please rate the usefulness of Criterion holistic scores.
Result: 3.25
[Figure: Instructors’ Rating of Usefulness (N=4). Bar chart of number of respondents per rating, from 6 (high usefulness) to 1 (low usefulness).]
RQ2: Instructors’ Perception
of Score Usefulness
1) Usefulness of Criterion scores
“The reason I rated the usefulness as 5 is that students
generally cared about the scores as an evaluation. They may
take Criterion task as a challenge and try multiple times to see
whether they could beat Criterion. In this sense, the scores can
be motivational.” (Hilary)
-----------------------------------------------
“In Fall 2011 I noticed some extreme cases in which students
got high scores and felt really good about them but their
writing was still full of grammatical mistakes and
organizational problems. In light of that, I doubted the
usefulness of the scores.” (Jason)
RQ2: Instructors’ Perception
of Score Usefulness
2) Instructors’ Trust in Criterion scores
Q: How much do you trust the scores from Criterion?
Result: 2.75
[Figure: Instructors’ Rating of Trustworthiness (N=4). Bar chart of number of respondents per rating, from 6 (high trust) to 1 (low trust).]
RQ2: Instructors’ Perception
of Score Usefulness
2) Trustworthiness of Criterion scores
• Low scores: high trust
• High scores: low trust
“So I feel like that some of [my student’s]
sentences are unreadable…It’s amazing that...he
got scores like 5 sometimes from Criterion, or
even 6.” (Jason)
RQ2: Instructors’ Perception
of Score Usefulness
3) Interpretation of Criterion scores
• Low scores: more problems
• High scores: not free of problems
“Getting a high score from Criterion does not mean
you are good, but if you get a low score in criterion, it
means you are problematic.” (Jason)
------------------------------
“...and I will say 6 doesn’t mean anything” (Michael)
RQ3: Actual Use of Holistic
Scores
1) Learners’ actual use of Criterion scores
• As a Motivator: Criterion scores help push learners to edit more
• As a Defense: Criterion scores assist students in arguing their final grade
RQ3: Actual Use of Holistic
Scores
1.1) Learners’ use of Criterion scores as a Motivator
“…it’s a really powerful power to push me. Yeah, fix it over
and over again” (101c315).
---------------------------
“Oh the holistic score, because usually I get a 5 so I want to get
it 6, or a 6. So that motivates me…….When I got a lower score it
would turn me to work harder or improve my errors to get a
higher score for the essay.” (101c304)
RQ3: Actual Use of Holistic
Scores
1.2) Learners’ use of Criterion scores as a Defense
“[One student] came in and argued with me that he
had a score of 5…So he thought he was writing good
enough…” (Hilary)
---------------------------
“Sometimes they would use Criterion score to
defend themselves” (Jason)
RQ3: Actual Use of Holistic
Scores
2) Instructors’ actual use of Criterion scores
• As a Forewarning: help appraise errors
• As a Benchmark: help set course requirements
• As an Assessment: help with grading
RQ3: Actual Use of Holistic
Scores
2.1) Instructors’ use of Criterion scores as a Forewarning
• Low scores: pay more attention
“…if I got 2 or 3 from Criterion, I would say that I need to pay more
attention to that paper.” (Michael)
-------------------
“Use of AWE scores can help [students] realize
that they still need to work on grammar and
that their grammar is not good as they expected” (Jason)
RQ3: Actual Use of Holistic
Scores
2.2) Instructors’ use of Criterion scores as a Benchmark
“I ask my students to reach a certain score
before their peer review section (4) and
before their submission to me (5-6)” (Michael)
RQ3: Actual Use of Holistic
Scores
2.3) Instructors’ use of Criterion scores as an Assessment tool
“In Fall 2011 I used Criterion assignment as a
midterm test and used the holistic scores as
they were.” (Hilary)
-----------------------------------
“According to my syllabus, the students…
can get…5 points for getting a score of 6
(out of 6) from Criterion.” (Jason)
Implications & Conclusions
• Criterion scores seemed to be less beneficial for:
  - Use as a summative assessment tool.
• Criterion scores were beneficial for:
  - Use as a formative assessment tool:
    - A motivator to encourage revision
    - A guide to inform students of editing issues
Questions/Comments?
Thank you for listening
ZHI LI: ZHILI@IASTATE.EDU
HYEJIN YANG: HJYANG@IASTATE.EDU
HONG MA: HMA2@IASTATE.EDU
STEPHANIE LINK: SMCROSS@IASTATE.EDU
VOLKER HEGELHEIMER: VOLKERH@IASTATE.EDU
REFERENCES
Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a Generic Approach in Automated Essay Scoring. Journal of Technology, Learning, and Assessment, 10(3). Retrieved from http://www.jtla.org.
Ben-Simon, A., & Bennett, R. E. (2007). Toward More Substantively Meaningful Automated Essay Scoring. Journal of Technology, Learning, and Assessment, 6(1). Retrieved [date] from http://www.jtla.org.
Bridgeman, B., Trapani, C., & Attali, Y. (2009). Considering fairness and validity in evaluating automated scoring. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), April 13-17, 2009, San Diego, CA.
Chen, C., & Cheng, W. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12(2), 94-112. (p. 106)
Ebyary, K., & Windeatt, S. (2010). The impact of computer-based feedback on students’ written work. International Journal of English Studies, 10(2), 121-142.
Grimes, D., & Warschauer, M. (2006). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association.
Grimes, D., & Warschauer, M. (2010). Utility in a Fallible Tool: A Multi-Site Case Study of Automated Writing Evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved [date] from http://www.jtla.org.
James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.