ASSESSING WRITING (1) Lecture 8 Teaching Writing in EFL/ESL Joy Robbins TODAY’S SESSION Your own experiences of assessment The purposes of assessment The concepts of reliability and validity in assessment 3 different approaches to the scoring of writing tests: 1. Holistic scoring 2. Analytic scoring 3. Primary and multiple trait scoring 2 ASSESSMENT: INTRODUCTORY DISCUSSION What’s the point of assessing writing? How have your teachers at school and university assessed your writing in your 1st and 2nd languages? Do you think there was any point in assessing you? Why (not)? In what ways have the scores and grades you have received on your writing (in L1 and L2) helped you improve your writing? If you are an experienced language teacher, what do you feel are your greatest challenges in evaluating student writing? If you aren’t an experienced teacher, what makes you nervous about assessing student writing? Why? 3 (Based on questions in Ferris & Hedgcock 1998: 227) WHAT’S THE POINT OF ASSESSMENT? Brindley (2001) lists the following purposes of assessment: selection: e.g. to determine whether learners have sufficient language proficiency to be able to undertake tertiary study; certification: e.g. to provide people with a statement of their language ability for employment purposes; accountability: e.g. to provide educational funding authorities with evidence that intended learning outcomes have been achieved and to justify expenditure; diagnosis: e.g. to identify learners’ strengths and weaknesses; instructional decision-making: e.g. to decide what material to present next or what to revise; motivation: e.g. to encourage learners to study harder. (p.138) 4 2 KEY TERMS Two key terms in the literature on testing and assessment are reliability and validity. Let’s have a closer look at what each of these mean… 5 RELIABILITY ‘reliability refers to the consistency with which a sample of student writing is assigned the same rank or score after multiple ratings by trained evaluators’ (Ferris & Hedgcock 1998: 230) For example: if we’re marking an essay out of 20, the test will be far more reliable if 2 markers both award an essay the same grade (or more or less the same grade), say 16 or 17. However, if 1 marker awards 10 and the other awards 15, the test isn’t reliable. The obvious way to try to achieve reliability is by designing criteria (e.g. for content, organization, grammar, etc.) which the markers refer to when they’re marking the essay 6 VALIDITY Validity refers to whether the test actually measures what it is supposed to measure Researchers have talked about several types of validity, for example: face validity content validity 7 FACE VALIDITY Face validity refers to how acceptable and credible a test is to its users (Alderson et al 1995) So if a test has high face validity, teachers and learners believe it tests what it is supposed to test A test would have low face validity among learners if they had been told a writing test was mainly assessing the quality of their ideas if they believed that teachers marked according to how good the students’ grammar was 8 CONTENT VALIDITY If a test has content validity, we have enough language to make a judgement about the student’s ability. So if a writing test is to have content validity, we need to be confident we have asked the student to do enough writing to display their writing skills 9 2 APPROACHES TO SCORING WRITING There are 2 main ways of scoring writing tests, the holistic approach and the analytic approach Let’s look at each of these in turn… 10 HOLISTIC SCORING Holistic scoring means that the assessor assesses the text generally, rather than focusing on 2 or 3 specific aspects The idea is that the assessor quickly reads through a text, gets a global impression, and awards a grade accordingly The holistic approach is supposed to respond to the writing positively, rather than negatively focusing on the things the writer has failed to do Let’s look at an example of holistic grading criteria... 11 HOLISTIC WRITING ASSESSMENT: AN EXAMPLE Have a look at the example of a holistic marking scheme I’ve given you on the handout, and discuss the questions… Afterwards, based on this example, make a list of pros and cons of using a holistic approach to assessing writing 12 HOLISTIC SCORING: ADVANTAGES Quick and easy, because there are few categories for the teacher to choose from 13 HOLISTIC SCORING: DISADVANTAGES Holistic scoring can’t provide the writing teacher with diagnostic information about students’ writing, because it doesn’t focus on tangible aspects of writing (e.g. organization, grammar, etc.) The holistic approach only produces a single score, so it’s less reliable than the analytical approach, which produces several scores (e.g. content, organization, grammar, etc.)…unless more than 1 assessor marks the tests A single score can be difficult to interpret for both teachers and students (‘What does 70% actually mean?’ ‘What did I do well?’ ‘What did I do badly?’) 14 HOLISTIC DISADVANTAGES (CONTD.) ‘…the same score assigned to two different texts may represent entirely distinct sets of characteristics even if raters’ scores reflect a strict and consistent application of the rubric. This can happen because a holistic score compresses a range of interconnected evaluations about all levels of the texts in question (i.e., content, form, style, etc.)’. (Ferris & Hedgcock 1998: 234) Even though assessors are supposed to assess a range of features in holistic scoring (e.g. style, content, organization, grammar, spelling, punctuation, etc.), this isn’t easy to do. So some assessors may (consciously or unconsciously) value 1 or 2 of these criteria as more important than the others, and give more weighting to these in their scores (Lumley & McNamara 1995; McNamara 1996). 15 ANALYTIC SCORING Analytic scoring separates different aspects of writing (e.g. organization, ideas, spelling) and grades them separately Let’s look at an example of analytic grading criteria... 16 ANALYTIC WRITING ASSESSMENT: AN EXAMPLE Have a look at the example of an analytic marking scheme I’ve given you on the handout, and discuss the questions… Afterwards, based on this example, make a list of pros and cons of using an analytic approach to assessing writing 17 ANALYTIC SCORING: ADVANTAGES Analytic schemes provide learners with much more meaningful feedback than holistic schemes. Teachers can hand students’ essays back with the criteria (e.g. marks out of 10 for organization, spelling, etc.) circled which the writing was awarded Analytic schemes can be designed to reflect the priorities of the writing course. So, for instance, if you have stressed the value of good organization on your course, you can weight the analytic criteria so that organization is worth 60% of the marks Because assessors are assessing specific criteria, it’s easier to train them than assessors who are using holistic schemes (Cohen 1994; McNamara 1996; Omaggio Hadley 1993; Weir 1990) Analytic assessment is more dependable than holistic assessment (Jonsson & Svingby, 2007: 135) 18 ANALYTIC SCORING: DISADVANTAGES Surely a piece of good writing can’t be judged on 3 or 4 criteria? Each of the scales may not be used separately (even though they should be). So, for instance, if the assessor gives a student a very high mark for the ‘ideas’ scale, this may influence the rest of the marks they award the student on the other scales Descriptors for each scale may be difficult to use (e.g. ‘What does ‘adequate organization’ mean?’) 19 PRIMARY AND MULTIPLE TRAIT SCORING We’ve seen how the analytic approach can be criticized for trying to assess a piece of writing on just 3 or 4 criteria… Although primary and multiple trait scoring also use specific criteria to assess writing, the advantage of this approach is that the criteria assessed depend on what kind of writing the student is doing So primary and multiple trait scoring involves ‘devising and deploying a scoring guide that is unique to each prompt and the student writing that it generates’. (Ferris & Hedgcock 1998: 241) 20 PRIMARY AND MULTIPLE TRAIT SCORING: EXAMPLES If the writing exam consisted of persuasive writing (e.g. Justify the case for the legalization of drugs), we might design a scoring scheme based exclusively on the ability to develop an argument If we were using primary trait scoring, just 1 trait would be assessed; if we were using multiple trait scoring, two or more traits would be assessed So in the example of the persuasive writing exam described above, we might design a scoring scheme which not only assessed the student’s ability to develop an argument, but also assessed the student’s use of counterargument, and the credibility of the sources they use to support their own argument, etc. 21 SAMPLE MULTIPLE TRAIT SCORING GUIDE (FERRIS & HEDGCOCK 2005: 317) Timed writing #3 – Comparative Analysis In their respective essays, Chang (2004) and Hunter (2004) express conflicting perspectives on how technology has influenced the education and training of the modern workforce. You will have 90 minutes in which to explain which author presents the most persuasive argument and why. On the basis of a brief summary of each author’s point of view, compare the two essays and determine which argument is the strongest for you. State your position clearly, giving each essay adequate coverage in your discussion. 22 SAMPLE MULTIPLE TRAIT SCORING GUIDE (FERRIS & HEDGCOCK 2005: 317) 23 MULTIPLE TRAIT SCORING: ADVANTAGES Multiple trait scoring doesn’t treat all writing as the same: it assesses (or should assess) the really important skills involved in different types of writing Providing the teacher has discussed the scoring criteria with the class before the exam, the students know exactly what they are being assessed on 24 MULTIPLE TRAIT SCORING: DISADVANTAGES Can be extremely time consuming to design specific assessment criteria for each type of writing (Perkins 1983) Scoring criteria would need to be extensively piloted to ensure they really are assessing the writing fairly Having discussed the holistic, analytic, and primary/multiple trait approaches, we’re now going to try scoring an assignment using the holistic approach… 25 APPLICATION AND DISCUSSION: HOLISTIC SCORING Use Ferris & Hedgcock’s holistic marking scheme to assess a paper written by a student on a pre-master’s academic English course at a UK university You need to do 2 things: 1. Give the paper a score based on the holistic criteria; 2. Write on the paper, making specific comments on the writing 26 APPLICATION AND DISCUSSION (CONTD.) In a pairs or groups, compare your score and comments with those of your colleagues. On what points did you agree or disagree? Why? If you disagreed, try to arrive at a consensus evaluation of the essay. After identifying the sources of your agreement and disagreement, formulate a list of future suggestions for using holistic scoring rubrics. (Ferris & Hedgcock 1998: 261) 27 REFERENCES Alderson JC et al (1995) Language Test Construction and Evaluation. Cambridge: Cambridge University Press. Brindley G (2001) Assessment. In R. Carter & D. Nunan (eds.), The Cambridge Guide to Teaching English to Speakers of Other Languages. Cambridge: Cambridge University Press, pp.137-143. Cohen A (1994) Assessing Language Ability in the Classroom (2nd ed.). Boston: Heinle & Heinle. Ferris D & Hedgcock JS (1998) Teaching ESL Composition: Purpose, Process, and Practice. Mahwah: Lawrence Erlbaum. Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144. Lumley T & McNamara T (1995) Rater characteristics and rater bias: implications for training. Language Testing 12: 54-71. McNamara T (1996) Measuring Second Language Performance. London: Longman. Omaggio Hadley A (1994) Teaching Languages in Context (2nd ed.). Boston: Heinle & Heinle. Perkins K (1983) On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly 17: 651-671. Weir CJ (1990) Communicative Language Testing. New York: Prentice Hall. 28 THIS WEEK’S READING Chapters 5 and 6 of: Ferris D & Hedgcock JS (2005) Teaching ESL Composition: Purpose, Process, and Practice. Mahwah: Lawrence Erlbaum. Min H-T (2005) Training students to become successful peer reviewers. System 33: 293-308. 29 HOMEWORK TASK Use the analytic scoring scale to grade the presessional piece of writing you graded holistically earlier today… Then work through the following questions: How well do your analytic ratings match your holistic ratings? Where do the two sets of scores and comments differ? Why? Given the nature of the writing tasks you evaluated, which of the two scales do you feel is most appropriate? Why? How might you modify one or both of the scales to suit the students you teach? (Adapted from Ferris & Hedgcock 1998: 261-2) 30