Automated Writing Evaluation: Enough about reliability! What really matters for students and teachers?
Jooyoung Lee, Zhi Li, Stephanie Link, Hyejin Yang, Volker Hegelheimer
Saturday, September 22, 2012

Beyond reliability
• User-centric evaluation
• System-centric evaluation
• 1966: Development of NLP tools for writing (AES)
• Late 1990s: AES for testing contexts
• 2003: AWE for classroom use

Automated essay scoring (AES)
Correspondence with human rating
• Systems: PEG, IEA, IntelliMetric, E-rater
• Reported ranges: 87-97%, 61-87%, 77-89%, 70-85%, 85-91%, 96-98% (agreement), 45-59% (exact agreement)

Literature Review: Needs Analysis
• Academic writing for international graduate students (U of Hawaii)
• Process skills
  • Pre-writing, editing/revising
• Computer skills
  • Getting help, finding & using resources
• Discourse/rhetorical skills
  • Field-specific research paper (sections of the paper)
  • Posing research questions (finding niche)
  • Grammar “patches”, hedges, connectors
  • Style/appropriacy
• Bibliographies/citing/plagiarism
(Negretti, 2001)

What research says about AWE
• Automated Writing Evaluation tools provide both numerical scores and formative feedback
• Positive findings:
  • Motivation (Grimes & Warschauer, 2006)
  • Grammar (Chodorow et al., 2010)
  • Rhetorical development (Cotos, 2011)
• Negative findings:
  • Heavy focus on grammatical and mechanical aspects
  • Losing sense of audience (CCCC, 2004)

Motivation (Gap) for Study
• Few previous studies have investigated stakeholders’ actual needs
• Opinions among AWE users are inconsistent
• Goal: to investigate what students and teachers actually need in ESL writing classes and how AWE can meet those needs

Research Questions
1. What are the needs of students and teachers in the ESL writing curriculum?
2. What are stakeholders’ views of the current status of AWE?
3. In what ways can AWE improve to meet the needs of students and teachers?

Methodology: setting
§ ESL curriculum at Iowa State University
§ The purpose of the English 101 curriculum is:
  • To prepare undergraduate non-native speakers of English for success in various written assignments in academic contexts
  • To prepare them for English 150: first-year composition

Methodology: participants
• Coordinators (N = 3)
  • One coordinator was also a 101 teacher
• Teachers (N = 6)
  • Experienced & inexperienced users of AWE
• Students (N = 167)
  • Experienced: 72
  • Inexperienced: 95

Methodology: Criterion

Methodology: data collection and analysis
§ Diverse participants
§ Questionnaires (descriptive statistics)
  • 1 questionnaire for experienced students
  • 1 questionnaire for new students
  • 1 questionnaire for teachers
§ Interviews (a priori → inductive coding)
  • 3 interviews w/ coordinators, 30-60 minutes each
§ Feedback tool analysis (Long, 2005)

RQ1: Needs for students
Students’ view (local features), teachers’ views (global features), coordinators’ view (global features):
• Expressions
• Grammar
• Organization
• Content
• Process writing
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)
• Learner autonomy
• Skills in applying feedback
• Strategies for process writing
• Access to ample amount of feedback
• Genre awareness
• Focused feedback

“Using reference material, reading for information, synthesizing it to their own essay, and then making their judgment, that is independent learning and if they can do that on their own [that would be the ultimate goal]” (Teacher 2)

“I would say that [students] need little pieces that they need to learn and to not learn everything at once.” (Coordinator 1)

RQ1: Needs for teachers
Teachers need help with (teachers’ views, coordinators’ view):
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners.
• Other practical needs (platform for learner community and peer review, integration w/ course management system)
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms

“That fit it really nicely with my preconception of Criterion removing some of the workload of the teachers, so that teachers end up reading better papers. That’s how I envisioned it initially.” (Coordinator 3)

RQ2: Student Views of Criterion
Q: Do you think Criterion helped you write your argumentative paper? (N = 72)
• Yes: 63 (87.5%)
• No: 9 (12.5%)

“I think Criterion cannot really give me some suggestion, so I hope my instructor can give me more suggestions after he/she finish reading my paper.”

RQ2: Student-Teacher Views of Criterion
Ratings of major functions on Criterion, M (SD)
Columns: Experienced teachers (N=3) / Experienced students (N=72) / Inexperienced students (N=95)
• Feedback on Grammar: 5 (1) / 4.66 (1.00) / 4.74 (1.03)
• Feedback on Usage: 5 (1) / 4.40 (0.87) / 4.59 (1.09)
• Feedback on Mechanics: 5.33 (0.58) / 4.19 (0.87) / 4.76 (1.08)
• Feedback on Style: 4 (1) / 3.76 (1.20) / 4.38 (1.19)
• Feedback on Organization and Development: 2.67 (0.58) / 3.63 (1.22) / 4.52 (1.08)

RQ2: Teacher Views of AWE
Overall positive view with some inconveniences
§ Positive
  • “The reason that I give high ratings to Grammar, Usage, and Mechanics is that I believe students can benefit from them if they pay attention to them.” (Teacher 1)
  • “As Criterion provides feedback repetitively, I hope it can help students learn how to improve their writing skills by themselves.”
    (Teacher 3)
§ Negative
  • “Although Criterion enables students to save their drafts, one pitfall is that students can only save the very first and the last drafts, which are not good for students and teachers.” (Teacher 4)

RQ2: Teacher Coordinators’ Views of AWE
Likert scale items (1 = not useful, 6 = very useful)
Columns: Coordinator and Teacher 1 / Coordinator 2 / Coordinator 3
• Grading students’ essays: 1 / 1 / 5
• Giving feedback to students: 4 / 5 / 3
• Setting up assignments: 1 / 3 / 5
• Receiving and collecting students’ essays: 1 / 3 / 5
• Tracking students’ progress: 1 / 4 / 2
• Reducing workload in terms of grading and feedback giving: 1.5 / 3 / 2

RQ2: Coordinator Views of AWE
• Changes in his attitudes
“My thought was Criterion should be able to alleviate some of the pressure on teachers…hoping that it would remove or take away some...of the grading burden on the side of teachers. Based on some of the things we’ve looked at, some of the problems that students had with, or teachers had with Criterion…some of the inconsistency in terms of grading, recognizing some mistakes, and some of your recent findings... I’m beginning to doubt as to whether or not it really helps instructors. I don’t know yet. I’d like to learn more about it. I’m not convinced as I once was about utility of it. I still think there is... but I think I have to take a deeper look at it.” (Coordinator 3)

RQ3: Suggestions for future AWE
Based on current needs and stakeholders’ views
Students / Teachers
• Feedback is not comprehensible
• Organization/style
• Utility (e.g. save draft / feedback; pop-up notes)
Coordinators
• Learner / teacher training (tech support / material support)
• Focus more on focused feedback (treatable errors)

“I wish they could see the submissions of other students.
I wish they had a feature for peer review.” (Coordinator 1)

RQ3: Suggestions for future AWE
“[Students] need other entity to tell them about their writing to make them look at their writing again; there’s some good feedback that ESL students can benefit from; it’s not perfect but pretty good at it.” (Coordinator 2)

Chodorow et al. (2010): articles / prepositions

Implications
• Feedback categories (Ferris, 2001)
AWE feedback and treatability
• Fragment, missing comma: treatable
• Run-on sentences: treatable
• Garbled sentences: treatable
• SV agreement: treatable
• Ill-formed verb: treatable
• Further categories (treatable or less treatable): article errors, confused words, wrong/missing words, wrong form of word, faulty comparison, pronoun errors, nonstandard word form, possessive errors, negation error, preposition error

Implications
Error gravity (Vann, Meyer, & Lorenz, 1984)

Stakeholders’ needs and AWE
Students’ needs
• Expressions
• Grammar
• Organization
• Content
• Process writing
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)
• Access to ample amount of feedback
• Genre awareness
• Focused feedback
Teachers’ needs
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners
• Other practical needs
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms

Implications
• “I think [AWE] should be used and we should figure out how to best use it. It may not be perfect for everybody but there is a better way of using it. We just have to find out...so I’m not ready to give up on it.” (Coordinator 3)

Thank you! Questions/Comments?
E-mail: ___
http://volkerh.public.iastate.edu/awe

References
• CCCC. (2004).
  Position statement on teaching, learning, and assessing writing in digital environments. Retrieved from http://www.ncte.org/cccc/resources/positions
• Chodorow, M., Gamon, M., & Tetreault, J. (2010). The utility of article and preposition error correction systems for English language learners: Feedback and assessment. Language Testing, 27(3), 419-436.
• Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420-459.
• Grimes, D., & Warschauer, M. (2006, April). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association, San Francisco, California.
• Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved [date] from http://www.jtla.org
• James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.
• Long, M. (2005). Methodological issues in learner needs analysis research. In M. H. Long (Ed.), Second Language Needs Analysis. Cambridge: Cambridge University Press.
• Vann, R. J., Meyer, D. E., & Lorenz, F. O. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18(3), 427-440.