Automated Writing Evaluation: Enough about reliability! What really matters for students and teachers?

Jooyoung Lee, Zhi Li, Stephanie Link, Hyejin Yang,
Volker Hegelheimer
Saturday, September 22, 2012
Beyond reliability
User-centric evaluation vs. system-centric evaluation
[Timeline figure] 1966: development of NLP tools for writing and automated essay scoring (AES), beginning with PEG; late 1990s: AES for testing contexts (IEA, IntelliMetric, E-rater); 2003: AWE for classroom use.
[Figure] Correspondence with human rating reported for PEG, IEA, IntelliMetric, and E-rater: agreement ranges of roughly 61-98%, with exact agreement of 45-59%.
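The correspondence figures above distinguish overall agreement (commonly, scores within one point of the human rating) from exact agreement. A minimal sketch of how such rates could be computed, assuming a 1-6 scoring scale; the scores and the within-one-point definition of "agreement" are illustrative assumptions, not data or procedures from PEG, IEA, IntelliMetric, or E-rater:

```python
# Illustrative computation of exact vs. adjacent agreement between an
# automated essay score and a human rating (hypothetical data only).

def agreement_rates(machine_scores, human_scores):
    """Return (exact, adjacent) agreement as percentages of essays."""
    assert len(machine_scores) == len(human_scores)
    n = len(machine_scores)
    exact = sum(m == h for m, h in zip(machine_scores, human_scores))
    adjacent = sum(abs(m - h) <= 1 for m, h in zip(machine_scores, human_scores))
    return 100 * exact / n, 100 * adjacent / n

# Invented scores on a 1-6 scale
machine = [4, 3, 5, 4, 2, 6, 4, 3]
human = [4, 4, 5, 3, 2, 5, 4, 4]
exact, adjacent = agreement_rates(machine, human)
print(f"Exact agreement: {exact:.0f}%, adjacent agreement: {adjacent:.0f}%")
```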
Literature Review: Needs Analysis
•  Academic writing for international graduate students (U of Hawaii)
•  Process skills
   •  Pre-writing, editing/revising
•  Computer skills
   •  Getting help, finding & using resources
•  Discourse/rhetorical skills
   •  Field-specific research paper (sections of the paper)
   •  Posing research questions (finding a niche)
   •  Grammar “patches”, hedges, connectors
   •  Style/appropriacy
•  Bibliographies/citing/plagiarism
(Negretti, 2001)
What research says about AWE
•  Automated Writing Evaluation tools provide both numerical scores and formative feedback
•  Positive findings:
   •  Motivation (Grimes & Warschauer, 2006)
   •  Grammar (Chodorow et al., 2010)
   •  Rhetorical development (Cotos, 2011)
•  Negative findings:
   •  Excessive focus on grammatical and mechanical aspects
   •  Losing sense of audience (CCCC, 2004)
Motivation (Gap) for Study
•  Lack of previous studies investigating stakeholders’ actual needs
•  Inconsistent opinions among AWE users
•  Goal: to investigate what students and teachers actually need in ESL writing classes and how AWE can meet those needs
Research Questions
1.  What are the needs of students and teachers in the ESL
writing curriculum?
2.  What are stakeholders’ views of the current status of
AWE?
3.  In what ways can AWE improve to meet the needs of
students and teachers?
Methodology: setting
§  ESL curriculum at Iowa State University
§  The purpose of the English 101 curriculum is:
•  To prepare undergraduate non-native speakers of English for success in various written assignments in academic contexts
•  To prepare them for English 150: first-year composition
Methodology: participants
•  Coordinators (N = 3)
•  One coordinator was also a 101 teacher
•  Teachers (N = 6)
•  Experienced & inexperienced users of AWE
•  Students (N = 167)
•  Experienced: 72; Inexperienced: 95
Methodology: Criterion
Methodology: data collection and analysis
§  Diverse participants
§  Questionnaires (descriptive statistics; see the sketch below)
   •  1 questionnaire for experienced students
   •  1 questionnaire for new students
   •  1 questionnaire for teachers
§  Interviews (a priori → inductive coding)
   •  3 interviews w/ coordinators, 30-60 minutes each
§  Feedback tool analysis
(Long, 2005)
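The questionnaire results that follow are reported as means and standard deviations of Likert-scale ratings. A minimal sketch of that computation, using invented 6-point responses and a hypothetical item name rather than the study’s data:

```python
# Illustrative descriptive statistics (M and SD) for Likert-scale
# questionnaire items; the responses below are invented.
from statistics import mean, stdev

def likert_summary(responses):
    """Format a list of Likert responses as 'M (SD)'."""
    return f"{mean(responses):.2f} ({stdev(responses):.2f})"

# Hypothetical ratings of one feedback category (1 = not useful, 6 = very useful)
feedback_on_grammar = [5, 4, 6, 5, 3, 5, 4, 6]
print("Feedback on Grammar:", likert_summary(feedback_on_grammar))
```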
RQ1: Needs for students
Students’ view (local and global features):
q  Expressions
q  Grammar
q  Organization
q  Content
q  Process writing
Teachers’ views (global features):
q  Learner autonomy
q  Skills in applying feedback
q  Skills for writing improvement (esp. in content development and organization)
Coordinators’ view (global features):
q  Learner autonomy
q  Skills in applying feedback
q  Strategies for process writing
q  Access to ample amount of feedback
q  Genre awareness
q  Focused feedback
“Using reference material, reading for information, synthesizing it to their own essay, and then making their judgment, that is independent learning and if they can do that on their own [that would be the ultimate goal].” (Coordinator 1)
“I would say that [students] need little pieces that they need to learn and to not learn everything at once.” (Teacher 2)
RQ1: Needs for teachers
Teachers need help with:
Teachers’ views:
q  Evaluating writing
q  Helping students with grammar
q  Assisting students in becoming independent learners
q  Other practical needs (platform for learner community and peer review, integration w/ course management system)
Coordinators’ view:
q  Reducing workload
q  Understanding how to provide feedback
q  Knowing what feedback to provide
q  Providing feedback in a manageable fashion
q  Effectively implementing and integrating technology into classrooms
“That fit in really nicely with my preconception of Criterion removing some of the workload of the teachers, so that teachers end up reading better papers. That’s how I envisioned it initially.” (Coordinator 3)
RQ2: Student Views of Criterion
Q: Do you think Criterion helped you write your argumentative paper? (N = 72)
Yes: 63 (87.5%)
No: 9 (12.5%)
“I think Criterion cannot really give me some suggestion, so I hope my
instructor can give me more suggestions after he/she finish reading my
paper.”
RQ2: Student-Teacher Views of Criterion
Major functions on Criterion, rated by Experienced Teachers (N = 3), Experienced Students (N = 72), and Inexperienced Students (N = 95); values are M (SD):
q  Feedback on Grammar: 5 (1) / 4.66 (1.00) / 4.74 (1.03)
q  Feedback on Usage: 5 (1) / 4.40 (0.87) / 4.59 (1.09)
q  Feedback on Mechanics: 4 (1) / 4.19 (0.87) / 4.76 (1.08)
q  Feedback on Style: 2.67 (0.58) / 3.76 (1.20) / 4.38 (1.19)
q  Feedback on Organization and Development: 5.33 (0.58) / 3.63 (1.22) / 4.52 (1.08)
RQ2: Teacher Views of AWE
Overall positive view with some inconveniences
§  Positive
•  “The reason that I give high ratings to Grammar, Usage, and
Mechanics is that I believe students can benefit from them if they
pay attention to them.” (teacher 1)
•  “As Criterion provides feedback repetitively, I hope it can help
students learn how to improve their writing skills by themselves”.
(teacher 3)
§  Negative
•  “Although Criterion enables students to save their drafts, one
pitfall is that students can only save the very first and the last
drafts, which are not good for students and teachers.” (teacher 4)
RQ2: Teacher Coordinators’ Views of AWE
Likert scale items (1 = not useful, 6 = very useful), rated by Coordinator and Teacher 1 / Coordinator 2 / Coordinator 3:
q  Grading students’ essays: 1 / 1 / 5
q  Giving feedback to students: 4 / 5 / 3
q  Setting up assignments: 1 / 3 / 5
q  Receiving and collecting students’ essays: 1 / 3 / 5
q  Tracking students’ progress: 1 / 4 / 2
q  Reducing workload in terms of grading and feedback giving: 1.5 / 3 / 2
RQ2: Coordinator Views of AWE
•  Changes in his attitude
“My thought was Criterion should be able to alleviate some of the pressure on teachers…hoping that it would remove or take away some...of the grading burden on the side of teachers. Based on some of the things we’ve looked at, some of the problems that students had with, or teachers had with Criterion…some of the inconsistency in terms of grading, recognizing some mistakes, and some of your recent findings...I’m beginning to doubt as to whether or not it really helps instructors. I don’t know yet. I’d like to learn more about it. I’m not convinced as I once was about the utility of it. I still think there is...but I think I have to take a deeper look at it.” (Coordinator 3)
RQ3: Suggestions for future AWE
Based on current needs and stakeholders’ views
Suggestions:
Students:
q  Feedback is not comprehensible
Teachers:
q  Organization/style
q  Utility (e.g. save draft / feedback; pop-up notes)
Coordinators:
q  Learner / teacher training (tech support / material support)
q  Focus more on focused feedback (treatable errors)
“I wish they could see the submissions of other students.
I wish they had a feature for peer review.” (Coordinator 1)
RQ3: Suggestions for future AWE
“[Students] need other entity to tell them about
their writing to make them look at their writing
again; there’s some good feedback that ESL
students can benefit from; it’s not perfect but pretty
good at it.” (Coordinator 2)
(cf. Chodorow et al., 2010, on automated feedback on article and preposition errors)
Implications
•  Feedback Categories (Ferris, 2001)
AWE feedback by treatability:
Treatable: fragment, missing comma; run-on sentences; garbled sentences; SV agreement; ill-formed verb; faulty comparison; pronoun errors; possessive errors; negation error
Less treatable: article errors; confused words; wrong/missing words; wrong form of word; nonstandard word form; preposition error
Implications
Error gravity (Vann, Meyer, & Lorenz, 1984)
Stakeholders’ needs and AWE
Students’ needs:
q  Expressions
q  Grammar
q  Organization
q  Content
q  Process writing
q  Learner autonomy
q  Skills in applying feedback
q  Skills for writing improvement (esp. in content development and organization)
q  Access to ample amount of feedback
q  Genre awareness
q  Focused feedback
Teachers’ needs:
q  Evaluating writing
q  Helping students with grammar
q  Assisting students in becoming independent learners
q  Other practical needs
q  Reducing workload
q  Understanding how to provide feedback
q  Knowing what feedback to provide
q  Providing feedback in a manageable fashion
q  Effectively implementing and integrating technology into classrooms
Implications
The checklist of stakeholder needs above points to where AWE currently falls short; even so, stakeholders see the task as finding better ways to use it rather than abandoning it:
•  “I think [AWE] should be used and we
should figure out how to best use it. It may
not be perfect for everybody but there is a
better way of using it. We just have to find
out...so I’m not ready to give up on
it.” (Coordinator 3)
Thank you!
Questions/Comments?
E-mail: ___
http://volkerh.public.iastate.edu/awe
References
•  CCCC. (2004). Position statement on teaching, learning, and assessing writing in digital environments. Retrieved from http://www.ncte.org/cccc/resources/positions
•  Chodorow, M., Gamon, M., & Tetreault, J. (2010). The utility of article and preposition error correction systems for English language learners: Feedback and assessment. Language Testing, 27(3), 419-436.
•  Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420-459.
•  Grimes, D., & Warschauer, M. (2006, April). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association, San Francisco, California.
•  Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved [date] from http://www.jtla.org
•  James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.
•  Long, M. H. (2005). Methodological issues in learner needs analysis research. In M. H. Long (Ed.), Second Language Needs Analysis. Cambridge: Cambridge University Press.
•  Vann, R. J., Meyer, D. E., & Lorenz, F. O. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18(3), 427-440.