Zhi Li, Stephanie Link, Hong Ma, Hyejin Yang, Volker Hegelheimer
June 13-16
Iowa State University
Applied Linguistics and Technology, Department of English
Open education, resources, and design for language learning

Exploring the Usefulness of Holistic Scores in Automated Writing Evaluation (AWE)

The Age of Machines?
• An English-teaching robot stands in front of children at an elementary school in the south of Seoul, South Korea.

Overview
• Background of the Study
• Literature Review
• Methodology
• Results
  - Learners’ perceptions of the usefulness of scores
  - Instructors’ perceptions of the usefulness of scores
  - Actual use of holistic scores
• Implications/Conclusions

Background of the Study
• Large research group on AWE
  - Linguistic development
  - Needs Analysis
  - Assessment
  - Pedagogical practice
• This study: Usefulness

Background of the Study
• Usefulness of AWE Systems
  - Holistic Scores
  - Trait Feedback
  - AWE Features (Planning, Handbook, Spell checker, etc.)
  - Overall

AWE Tool in this Study: Criterion®

Literature Review
• Perceptions of score usefulness in classroom setting
Instructors vs. Learners: Grimes & Warschauer (2010, 2006)
  - Teachers often disagreed with scores, but found them helpful for teaching
  - “Students tended to be less skeptical of the scores than teachers” (p. 10).
  - Teachers showed slightly less than neutral opinions about fairness and accuracy
Problems with scoring system: Chen & Cheng (2008)
  1. Favors lengthiness
  2. Overemphasizes the use of transition words
  3. Ignores coherence and content development
  4. Discourages unconventional ways of essay writing
  5. Partially reflects actual English writing ability

Literature Review
• Instructors’ Use of AWE Scores
  - Part of students’ grades or complete disregard (varied reliance): Chen & Cheng (2008); Grimes & Warschauer (2010)
  - Required minimum AWE score: Chen & Cheng (2008)
• Learners’ Use of AWE Scores: Ebyary & Windeatt (2010); Attali (2004)
  - Motivated students
  - Low scores led to higher improvement
• Lack of research on how access to AWE scores transfers to score use in the classroom

Gaps in the area of AWE
• More in-depth studies on learner/instructor perception of AWE score usefulness are needed.
• Few studies on actual use of AWE scores in the classroom context.
Purpose:
• To investigate the usefulness of AWE scores in the ESL writing classroom.

Research Questions
Perceptions of the Usefulness of Criterion Scores
• (RQ1) What are the learners’ perceptions of the usefulness of Criterion holistic scores in the classroom?
• (RQ2) What are the instructors’ perceptions of the usefulness of Criterion holistic scores in the classroom?
Actual Use of Scores
• (RQ3) How are Criterion holistic scores used by learners and instructors in the ESL writing classroom?

Methodology
Setting
• Spring 2011 to Fall 2011
• ESL Academic Writing Course (Engl101C)
Participants
• 7 ESL academic writing instructors
  - Experienced ESL writing instructors
  - Proficient in technology use
• 140 ESL learners
  - Intermediate proficiency level
  - Majority from Asian backgrounds

Methodology
Materials
• 4 major papers
  - Paper 1: Narrative
  - Paper 2: Compare and Contrast
  - Paper 3: Cause and Effect
  - Paper 4: Argumentative
• Questionnaires
• Interviews

Methodology
• Data Collection (Spring 2011)
• Data Collection (Fall 2011)
• Data Analysis (Spring 2012)
• Learner Data: Questionnaires, Individual Interviews
• Instructor Data: Questionnaires, Individual Interviews, Focus Group Interviews

Methodology
Data Analysis
• Inductive Coding (RQ1-RQ3)
• Descriptive Analysis (RQ1-RQ2)

Research Questions
• Learners’ Perceptions of the Usefulness of Criterion Scores
• Instructors’ Perceptions of the Usefulness of Criterion Scores
• Actual Use of Holistic Scores

RQ1: Learners’ Perception of Score Usefulness
1) Usefulness of Criterion scores
“From the score report, I can revise my essay and improve my writing” (101c student)
“Criterion’s feedback always the same. Like, if you use, like my score…sometimes it’s 5, sometimes a 4. But the… explanation about your score always the same.…” (101c310)

RQ1: Learners’ Perception of Score Usefulness
2) Learners’ Trust in Criterion scores
Q: How much do you trust the scores from Criterion?
Result: 4.12 (relatively high)
[Figure: Learners’ Rating of Trustworthiness (N=54). Horizontal axis: rating from 6 (high trust) to 1 (low trust). Vertical axis: number of respondents.]
“Usually the Criterion will give me a different grade than what my teacher’s grade. The teacher gives more comments with details about what I did wrong and Criterion did not.” (101c323)

RQ2: Instructors’ Perception of Score Usefulness
1) Usefulness of Criterion scores
Q: Please rate the usefulness of Criterion holistic scores.
Result: 3.25
[Figure: Instructors’ Rating of Usefulness (N=4). Horizontal axis: rating from 6 (high usefulness) to 1 (low usefulness). Vertical axis: number of respondents.]
“The reason I rated the usefulness as 5 is that students generally cared about the scores as an evaluation. They may take Criterion task as a challenge and try multiple times to see whether they could beat Criterion. In this sense, the scores can be motivational.” (Hilary)
“In Fall 2011 I noticed some extreme cases in which students got high scores and felt really good about them but their writing was still full of grammatical mistakes and organizational problems. In light of that, I doubted the usefulness of the scores.” (Jason)

RQ2: Instructors’ Perception of Score Usefulness
2) Instructors’ Trust in Criterion scores
Q: How much do you trust the scores from Criterion?
Result: 2.75
[Figure: Instructors’ Rating of Trustworthiness (N=4). Horizontal axis: rating from 6 (high trust) to 1 (low trust). Vertical axis: number of respondents.]

RQ2: Instructors’ Perception of Score Usefulness
2) Trustworthiness of Criterion scores
• High scores: low trust
• Low scores: high trust
“So I feel like that some of [my student’s] sentences are unreadable…It’s amazing that...he got scores like 5 sometimes from Criterion, or even 6.” (Jason)

RQ2: Instructors’ Perception of Score Usefulness
3) Interpretation of Criterion scores
• High scores: not free of problems
• Low scores: more problems
“Getting a high score from Criterion does not mean you are good, but if you get a low score in Criterion, it means you are problematic.” (Jason)
“...and I will say 6 doesn’t mean anything” (Michael)

RQ3: Actual Use of Holistic Scores
1) Learners’ actual use of Criterion scores
• As a Motivator: Criterion scores help push learners to edit more
• As a Defense: Criterion scores assist students in arguing their final grade

RQ3: Actual Use of Holistic Scores
1.1) Learners’ use of Criterion scores as a Motivator
“…it’s a really powerful power to push me. Yeah, fix it over and over again” (101c315)
“Oh the holistic score, because usually I get a 5 so I want to get it 6, or a 6. So that motivates me…….When I got a lower score it would turn me to work harder or improve my errors to get a higher score for the essay.”
(101c304)

RQ3: Actual Use of Holistic Scores
1.2) Learners’ use of Criterion scores as a Defense
“[One student] came in and argued with me that he had a score of 5…So he thought he was writing good enough…” (Hilary)
“Sometimes they would use Criterion score to defend themselves” (Jason)

RQ3: Actual Use of Holistic Scores
2) Instructors’ actual use of Criterion scores
• As a Forewarning: help appraise errors
• As a Benchmark: help set course requirements
• As an Assessment: help with grading

RQ3: Actual Use of Holistic Scores
2.1) Instructors’ use of Criterion scores as a Forewarning
• Low scores: pay more attention
“…if I got 2 or 3 from Criterion, I would say that I need to pay more attention to that paper.” (Michael)
“Use of AWE scores can help [students] realize that they still need to work on grammar and that their grammar is not good as they expected” (Jason)

RQ3: Actual Use of Holistic Scores
2.2) Instructors’ use of Criterion scores as a Benchmark
“I ask my students to reach a certain score before their peer review section (4) and before their submission to me (5-6)” (Michael)

RQ3: Actual Use of Holistic Scores
2.3) Instructors’ use of Criterion scores as an Assessment tool
“In Fall 2011 I used Criterion assignment as a midterm test and used the holistic scores as they were.” (Hilary)
“According to my syllabus, the students… can get…5 points for getting a score of 6 (out of 6) from Criterion.” (Jason)

Implications & Conclusions
• Criterion scores seemed to be less beneficial for:
  - Use as a summative assessment tool.
• Criterion scores were beneficial for:
  - Use as a formative assessment tool.
  - A motivator to encourage revision
  - A guide to inform students of editing issues

Questions/Comments?
Thank you for listening
ZHI LI: ZHILI@IASTATE.EDU
HYEJIN YANG: HJYANG@IASTATE.EDU
HONG MA: HMA2@IASTATE.EDU
STEPHANIE LINK: SMCROSS@IASTATE.EDU
VOLKER HEGELHEIMER: VOLKERH@IASTATE.EDU

REFERENCES
Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated essay scoring. Journal of Technology, Learning, and Assessment, 10(3). Retrieved from http://www.jtla.org.
Ben-Simon, A., & Bennett, R. E. (2007). Toward more substantively meaningful automated essay scoring. Journal of Technology, Learning, and Assessment, 6(1). Retrieved [date] from http://www.jtla.org.
Bridgeman, B., Trapani, C., & Attali, Y. (2009). Considering fairness and validity in evaluating automated scoring. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), April 13-17, 2009, San Diego, CA.
Chen, C., & Cheng, W. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12(2), 94-112. (p. 106)
Ebyary, K., & Windeatt, S. (2010). The impact of computer-based feedback on students’ written work. International Journal of English Studies, 10(2), 121-142.
Grimes, D., & Warschauer, M. (2006). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association.
Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved [date] from http://www.jtla.org.
James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.