Holistic Scores of Automated Writing Evaluation: Consistency, Perceptions, and Use, presented at LTRC 2012

Presenters: Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2012)

While research in second language writing suggests that immediate feedback can have a positive influence on students' written work (Ferris & Roberts, 2001; Stern & Solomon, 2006; Ferris, Pezone, Tade, & Tinti, 1997), advances in automated writing evaluation (AWE) have taken the provision of instructor commentary to a new level. Writing analysis tools now provide students not only with qualitative feedback but also with numeric scores that indicate their level of writing proficiency. Research has claimed that computerized applications produce automated scores that correlate strongly with those of human raters (Attali & Burstein, 2006); however, little is known about how these scores are used, or about their reliability relative to instructors' numeric grades in the language classroom.

One such AWE program is Criterion®, an online writing evaluation service developed by Educational Testing Service (ETS). As an instructional tool, Criterion® provides learners with trait-level feedback and an overall holistic score for essays written on prompts from an existing item bank. In our multi-semester research project, we use this computer-based program to conduct a longitudinal mixed-methods study (following a convergent triangulation design; see Creswell & Plano Clark, 2007) of the practices and perspectives of six university-level instructors across 15 course sections, and of the perspectives, attitudes, and linguistic development of 279 ESL students at a large Midwestern university in the United States.

In this presentation, we focus on two course sections and report findings for three main research questions: (1) How did learners perceive the holistic scores from Criterion®? (2) How did students use the holistic scores to improve their writing? (3) How well did Criterion® holistic scores correlate with instructors' grades? To investigate these questions, we administered student questionnaires, conducted student and teacher interviews, and examined the AWE holistic score reports alongside instructors' grades, which were based on a pre-developed grading rubric.

Survey and interview data showed that students' perception of score trustworthiness was moderate, yet students relied more on Criterion® feedback to obtain a better score than on other sources (e.g., sample essays, instructors' help). A Spearman rank-order correlation coefficient was computed to assess the relationship between holistic scores and instructors' grades (a standard formulation of the statistic is sketched below). Findings indicated a moderate positive correlation (Spearman's rho = .553, p = .001), which raises issues for integrating the scores for pedagogical purposes. These findings, together with further outcomes of our project, not only add to previous work on the reliability of AWE programs but also inform decisions on the use or nonuse of automated feedback for formative assessment purposes.
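For context only (this sketch is not part of the study's reported analysis), Spearman's rank-order coefficient for n score-grade pairs can, in the absence of tied ranks, be computed from the rank differences d_i:

\[
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}, \qquad
d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i),
\]

where x_i would denote the Criterion® holistic score and y_i the instructor's grade for essay i. Since discrete holistic scores typically produce tied ranks, in practice the coefficient is computed as a Pearson correlation on the ranked values rather than with this simplified formula.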