
Exploring the Usefulness of Holistic Scores in Automated Writing Evaluation (AWE)
Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2012)

Automated writing evaluation (AWE) tools now provide students not only with qualitative feedback but also with numeric scores that indicate their level of writing proficiency. Research has reported that scores from some commercial AWE programs correlate strongly with human raters' scores (Attali & Burstein, 2006); however, little is known about how AWE scores are used in language classrooms or how they compare with instructors' numeric grades. In our larger research project, we use a longitudinal mixed-methods approach (the convergence model of the triangulation design; Creswell & Plano Clark, 2007) to investigate the usability of the holistic scores from Criterion®, an online writing evaluation service developed by Educational Testing Service (ETS). In this presentation, we focus on two class sections drawn from a large subject pool and report findings on three main research questions: (1) How did learners perceive the holistic scores from Criterion®? (2) How did students use the holistic scores to improve their writing? (3) How strongly are Criterion® holistic scores correlated with instructors' grades? To answer these questions, we triangulated an analysis of Criterion® holistic score reports and instructors' grades with a qualitative analysis of student questionnaires and of student and teacher interviews. Survey and interview data showed that students' trust in the scores was moderate, yet to obtain a better score they relied more on Criterion®'s trait feedback than on other sources (e.g., sample essays, instructors' help). The quantitative analysis indicated a moderate positive correlation (Spearman's rho = 0.553, p = 0.001) between Criterion® holistic scores and instructors' grades.
These findings not only add to previous work on the reliability of AWE programs but also provide a basis for deciding whether to use automated feedback for formative assessment.
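For readers less familiar with the statistic behind the reported correlation, Spearman's rho is the Pearson correlation computed on the ranks of the two score lists, with tied values given the average of the positions they occupy. The sketch below illustrates only that computation using the Python standard library. The function names and the sample scores (a hypothetical 1 to 6 holistic scale paired with hypothetical percentage grades) are illustrative assumptions, not the study's data or analysis code, and the sketch does not compute a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's scores.
    awe_scores = [3, 4, 4, 5, 2, 6, 3, 5]
    instructor_grades = [78, 85, 80, 88, 70, 90, 82, 84]
    print(round(spearman_rho(awe_scores, instructor_grades), 3))
```

Ranking before correlating makes the statistic depend only on the ordering of the scores, which suits ordinal holistic scales. In practice, a statistics package such as `scipy.stats.spearmanr` returns both the coefficient and a p-value.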