Achieving Positive Backwash in Language Testing

6 Achieving positive backwash

Backwash is the effect that tests have on learning and teaching. Before the first edition of this book appeared, little attention was given to the subject. By the time of the second edition, there was much more interest in the topic. Backwash was established as an important part of the impact that a test may have on learners and teachers, on educational systems, and on society at large. Calls had been made for explicit models of backwash, and research had begun into the processes by which it might be achieved¹. Now, we are happy to say, we can read the results of research that has confirmed and quantified the effect of tests on teaching and learning. The Further reading section provides a guide to that research. We have also been encouraged by seeing the efforts of major language testing institutions (such as ETS in the United States and Cambridge Assessment English in the UK) to change their tests in ways that will encourage positive backwash.

We have no doubt that over the next few years continuing research into backwash will result in a better understanding of the processes involved and how different variables contribute to its effect in different situations. Nevertheless, we believe that the advice which follows, based largely on our practical experience, will prove helpful to teachers seeking to create positive backwash in their own situation.

Test the abilities whose development you want to encourage

For example, if you want to encourage oral ability, then test oral ability². This is very obvious, yet it is surprising how often it has not been done. There is a tendency to test what is easiest to test rather than what is most important to test. Reasons for not testing particular abilities may take many forms. It is often said, for instance, that sufficiently high reliability cannot be obtained when a form of testing (such as an oral interview) requires subjective scoring.
This is simply not the case, and in addition to the advice already given in the previous chapter, more detailed suggestions for achieving satisfactory reliability of subjective tests are to be found in Chapters 9 and 10. The other most frequent reason given for not testing is the expense involved in terms of time and money. This is discussed later in the chapter.

1. The word ‘washback’ is being increasingly used in place of ‘backwash’. We will continue to use the original term ‘backwash’, except when citing other authors.
2. Bearing in mind what was said in Chapter 4, it is important that the scoring or rating of test performance (as well as the means of elicitation) should be valid.

https://doi.org/10.1017/9781009024723.006 Published online by Cambridge University Press

It is important not only that certain abilities should be tested, but also that they should be given sufficient weight in relation to other abilities. One of us well remembers his French teacher telling the class that, since the oral component of the General Certificate of Education examination in French (which we were to take later in the year) carried so few marks, we should not waste our time preparing for it. The examining board concerned was hardly encouraging positive backwash.

Sample widely and unpredictably

Normally a test can measure only a sample of everything included in the specifications. It is important that this sample should represent as far as possible the full scope of what is specified. If not, if the sample is taken from only a restricted area of the specifications, then the backwash effect will tend to be felt only in that area.
If, for example, the specifications for a writing test include three or more kinds of task, but repeatedly, over the years, versions of the test include only the same two kinds of task (for instance: compare/contrast; describe/interpret a chart or graph), the likely outcome is that much preparation for the test will be limited to those two types of task. The backwash effect may not be as positive as it might have been had a wider range of tasks been used. Whenever the content of a test becomes highly predictable, teaching and learning are likely to concentrate on what can be predicted. An effort should therefore be made to test across the full range of the specifications (in the case of achievement tests, this should be equivalent to a fully elaborated set of objectives), even where this involves elements that lend themselves less readily to testing³. We must add that core elements of the specifications (those which we believe are most important) should always be represented in each version of a test.

3. It has to be admitted that high-stakes tests will always attract entrepreneurs who offer training courses that attempt to provide potential candidates with tricks and forms of words that will enable them to make higher scores, without necessarily improving their language abilities. This kind of training hardly represents positive backwash. The aim of test constructors must be to minimise the possibility of such training being successful.

Use direct testing

As we saw in Chapter 3, direct testing implies the testing of performance skills, with texts and tasks as authentic as possible. If we test directly the skills that we are interested in fostering, then practice for the test will naturally involve practice in those skills. If we want people to learn to write compositions, we should get them to write compositions in the test.
If a course objective is that students should be able to read scientific articles, then we should get them to do that in the test. Immediately we begin to test indirectly, we are removing an incentive for students to practise in the way that we want them to.

Make testing criterion-referenced

If test specifications make clear what candidates have to be able to do, and with what degree of success, then students will have a clear picture of what they have to achieve. What is more, they will know that if they do perform the tasks at the criterial level, then they will be successful on the test, regardless of how other students perform. Both these things will help to motivate students. Where testing is not criterion-referenced, it becomes easy for teachers and students to assume that a certain (perhaps very high) percentage of candidates will pass, almost regardless of the absolute standard that they reach.

The possibility exists of having a series of criterion-referenced tests, each representing a different level of achievement or proficiency. The tests are constructed such that a ‘pass’ is obtained only by completing the great majority of the test tasks successfully. Students are required to take only the test (or tests) on which they are expected to be successful. As a result, they are spared the dispiriting, demotivating experience of taking a test on which they can, for example, respond correctly to fewer than half of the items (and yet be given a pass). This type of testing, we believe, should encourage positive attitudes to language learning. At one time it was the basis of some GCSE (General Certificate of Secondary Education) examinations in Britain.

It has to be admitted that there is one potential drawback to having a series of criterion-referenced tests when a candidate is entered for only one of them. Someone has to decide which test to take.
Whether it is the candidate, a teacher, or some other adviser, mistakes may be made. The candidate’s ability may be underestimated or overestimated, resulting in the candidate taking an inappropriate test. One solution to this problem would be to have a single computer adaptive test. This could work well for a test of grammar or vocabulary. For a test of writing, however, where extended pieces of writing are called for, it is hard to see how that would work, unless initial items were short in nature and computer-scoreable. These initial items would effectively form a brief screening test and would serve to direct candidates to longer items. Traditional tests of speaking, carried out with a human interlocutor, are, or should be, adaptive in nature.

Base achievement tests on objectives

If achievement tests are based on objectives, rather than on detailed teaching and textbook content, they will provide a truer picture of what has actually been achieved. Teaching and learning will tend to be evaluated against those objectives. As a result, there will be constant pressure to achieve them. This was argued more fully in Chapter 3.

Ensure the test is known and understood by students and teachers

However good the potential backwash effect of a test may be, the effect will not be fully realised if students and teachers do not know and understand what the test demands of them. The rationale for the test, its specifications, and sample items (including examples of written and oral performance with grades and examiner comments) should be made available to everyone concerned with preparation for the test. This is particularly important when a new test is being introduced, especially if it incorporates novel testing methods.
Another, equally important, reason for supplying information of this kind is to increase test reliability, as was noted in the previous chapter.

Where necessary, provide assistance to teachers

The introduction of a new test may make demands on teachers to which they are not equal. If, for example, a longstanding national test of grammatical structure and vocabulary is to be replaced by a direct test of a much more communicative nature, it is possible that many teachers will feel that they do not know how to teach communicative skills. One important reason for introducing the new test may have been to encourage communicative language teaching, but if the teachers need guidance and possibly training, and these are not given, the test will not achieve its intended effect. It may simply cause chaos and disaffection. Where new tests are meant to help change teaching, support has to be given to help effect the change.

Counting the cost

One of the desirable qualities of tests which trips quite readily off the tongue of many testers, after validity and reliability, is that of practicality. Other things being equal, it is good that a test should be easy and cheap to construct, administer, score and interpret. We should not forget that testing costs time and money that could be put to alternative uses.

It is unlikely to have escaped the reader’s notice that at least some of the recommendations listed above for creating positive backwash involve more than minimal expense. The individual direct testing of some abilities will take a great deal of time, as will the reliable scoring of performance on any subjective test. The production and distribution of sample tests and the training of teachers will also be costly. It might be argued, therefore, that such procedures are impractical. In our opinion, this would reveal an incomplete understanding of what is involved.
Before we decide that we cannot afford to test in a way that will promote positive backwash, we have to ask ourselves a question: What will be the cost of not achieving positive backwash? When we compare the cost of the test with the waste of effort and time on the part of teachers and students in activities quite inappropriate to their true learning goals (and in some circumstances, with the potential loss to the national economy of not having more people competent in foreign languages), we are likely to decide that we cannot afford not to introduce a test with a powerful positive backwash effect.

READER ACTIVITIES

1. How would you improve the backwash effect of tests that you know? Be as specific as possible. (This is a follow-up to Activity 1 at the end of Chapter 1.)
2. Rehearse the arguments you would use to convince a sceptic that it would be worthwhile making the changes that you recommend.

FURTHER READING

Theoretical issues

Alderson and Wall (1993) question the existence of backwash. Language Testing 13, 3 (1996) is a special issue devoted to backwash. In it Messick discusses backwash in relation to validity. Bailey (1996) reviews the concept of backwash in language testing, including Hughes’s (1993) proposed model and Alderson and Wall’s (1993) fifteen hypotheses about backwash. Wall (1996) looks to developments in general education and to innovation theory for insights into backwash. Hamp-Lyons’s (1997a) article raises ethical concerns in relation to backwash, impact and validity. Her 1997b article discusses ethical issues in test preparation practice for TOEFL®, to which Wadden and Hilke (1999) take exception. Hamp-Lyons (1999) responds to their criticisms. Brown and Hudson (1998) lay out the assessment possibilities for language teachers and argue that one of the criteria for choice of assessment method is potential backwash effect. Alderson (2009) reviews the new TOEFL® and comments on its potential for positive backwash.
Research into backwash

Wall and Alderson (1993) investigate backwash in a project in Sri Lanka with which they were concerned, argue that the processes involved in backwash are not straightforward, and call for a model of backwash and for further research. Shohamy et al. (1996) report that two different tests have different patterns of backwash. Watanabe (1996) investigates the possible effect of university entrance examinations in Japan on classroom methodology. Alderson and Hamp-Lyons (1996) report on a study into TOEFL® preparation courses and backwash. Muñoz and Álvarez (2010) is an account of a successful attempt to create positive backwash in a Colombian university. Cheng (2005) reports on her research into backwash in Hong Kong. Cheng et al. (2011) report on the impact of introducing teachers’ assessments as part of a high-stakes exam. Choi (2008) reports on the negative backwash effects of standardised multiple choice tests in the Korean education system. Luxia (2005) examines the failure of a high-stakes test to achieve its intended backwash effects. Saif (2006) describes an attempt to achieve positive backwash. Cheng et al. (2004) is a collection of articles on carrying out research into backwash. Cheng and Curtis (2012) summarise the results of research into backwash and make recommendations for future research. Green (2007) reports on research into the effect of the academic writing module of a major test (IELTS) on preparation for university study. Wall and Horák (2006, 2008, 2011) is a series of reports on the impact of the new TOEFL® on teaching and learning. All of their reports are available online.
