
Achieving Positive Backwash in Language Testing

6 Achieving positive backwash
Backwash is the effect that tests have on learning and teaching. Before the
first edition of this book appeared, little attention was given to the subject.
By the time of the second edition, there was much more interest in the
topic. Backwash was established as an important part of the impact that
a test may have on learners and teachers, on educational systems, and on
society at large. Calls had been made for explicit models of backwash, and
research had begun into the processes by which it might be achieved¹.
Now, we are happy to say, we can read the results of research that has
confirmed and quantified the effect of tests on teaching and learning. The
Further reading section provides a guide to that research. We have also been
encouraged by seeing the efforts of major language testing institutions (such
as ETS in the United States and Cambridge Assessment English in the UK) to
change their tests in ways that will encourage positive backwash.
We have no doubt that over the next few years continuing research into
backwash will result in a better understanding of the processes involved
and how different variables contribute to its effect in different situations.
Nevertheless, we believe that the advice which follows, based largely on
our practical experience, will prove helpful to teachers seeking to create
positive backwash in their own situation.
Test the abilities whose development you want to encourage
For example, if you want to encourage oral ability, then test oral ability².
This is very obvious, yet it is surprising how often it has not been done.
There is a tendency to test what is easiest to test rather than what is most
important to test. Reasons for not testing particular abilities may take
many forms. It is often said, for instance, that sufficiently high reliability
cannot be obtained when a form of testing (such as an oral interview)
requires subjective scoring. This is simply not the case, and in addition to
the advice already given in the previous chapter, more detailed suggestions
for achieving satisfactory reliability of subjective tests are to be found in
Chapters 9 and 10. The other most frequent reason given for not testing is
the expense involved in terms of time and money. This is discussed later in
the chapter.

1. The word ‘washback’ is being increasingly used in place of ‘backwash’. We will continue to
use the original term ‘backwash’, except when citing other authors.
2. Bearing in mind what was said in Chapter 4, it is important that the scoring or rating of
test performance (as well as the means of elicitation) should be valid.

https://doi.org/10.1017/9781009024723.006 Published online by Cambridge University Press

It is important not only that certain abilities should be tested, but also that
they should be given sufficient weight in relation to other abilities. One of
us well remembers his French teacher telling the class that, since the oral
component of the General Certificate of Education examination in French
(which we were to take later in the year) carried so few marks, we should
not waste our time preparing for it. The examining board concerned was
hardly encouraging positive backwash.
Sample widely and unpredictably
Normally a test can measure only a sample of everything included in the
specifications. It is important that this sample should represent as far as
possible the full scope of what is specified. If not, if the sample is taken
from only a restricted area of the specifications, then the backwash effect
will tend to be felt only in that area. If, for example, the specifications for
a writing test include three or more kinds of task, but repeatedly, over
the years, versions of the test include only the same two kinds of task (for
instance: compare/contrast; describe/interpret a chart or graph), the likely
outcome is that much preparation for the test will be limited to those two
types of task. The backwash effect may not be as positive as it might have
been had a wider range of tasks been used.
Whenever the content of a test becomes highly predictable, teaching and
learning are likely to concentrate on what can be predicted. An effort
should therefore be made to test across the full range of the specifications
(in the case of achievement tests, this should be equivalent to a fully
elaborated set of objectives), even where this involves elements that lend
themselves less readily to testing³.
We must add that core elements of the specifications (those which we
believe are most important) should always be represented in each version
of a test.
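For readers who assemble test versions from a bank of task types, the sampling principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_test_version` and all of the task labels are hypothetical, not part of any real specification. Core task types are always included, and the remaining places are filled at random from the rest of the specifications, so that over successive versions no area can safely be neglected.

```python
import random


def sample_test_version(core_tasks, other_tasks, n_tasks, rng=None):
    """Choose the task types for one version of a test.

    Every core task type is always included. The remaining places are
    filled at random, without repetition, from the other task types
    listed in the specifications.
    """
    if rng is None:
        rng = random.Random()
    if n_tasks < len(core_tasks):
        raise ValueError("n_tasks must be at least the number of core tasks")
    n_extra = n_tasks - len(core_tasks)
    if n_extra > len(other_tasks):
        raise ValueError("not enough non-core task types to fill the version")
    extras = rng.sample(list(other_tasks), n_extra)
    version = list(core_tasks) + extras
    rng.shuffle(version)  # the order of tasks should not be predictable either
    return version


# Hypothetical specifications for a writing test.
core = ["compare/contrast"]
others = ["describe a chart", "narrative", "formal letter", "argue a position"]

# One version per year; the seed is fixed only to make the sketch repeatable.
rng = random.Random(2024)
for year in range(1, 6):
    print(year, sample_test_version(core, others, n_tasks=3, rng=rng))
```

Because the non-core places are drawn afresh for each version, teachers cannot predict which task types will appear, while the core type is guaranteed to be present every time.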
Use direct testing
As we saw in Chapter 3, direct testing implies the testing of performance
skills, with texts and tasks as authentic as possible. If we test directly
the skills that we are interested in fostering, then practice for the test
will naturally involve practice in those skills. If we want people to learn
to write compositions, we should get them to write compositions in the
test. If a course objective is that students should be able to read scientific
articles, then we should get them to do that in the test. Immediately we
begin to test indirectly, we are removing an incentive for students to
practise in the way that we want them to.

3. It has to be admitted that high-stakes tests will always attract entrepreneurs who offer
training courses that attempt to provide potential candidates with tricks and forms of
words that will enable them to make higher scores, without necessarily improving their
language abilities. This kind of training hardly represents positive backwash. The aim of test
constructors must be to minimise the possibility of such training being successful.

Make testing criterion-referenced
If test specifications make clear what candidates have to be able to do, and
with what degree of success, then students will have a clear picture of what
they have to achieve. What is more, they will know that if they do perform the
tasks at the criterial level, then they will be successful on the test, regardless of
how other students perform. Both these things will help to motivate students.
Where testing is not criterion-referenced, it becomes easy for teachers and
students to assume that a certain (perhaps very high) percentage of candidates
will pass, almost regardless of the absolute standard that they reach.
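The contrast drawn above can be made concrete with a small Python sketch. It is an illustration only: the function names, the candidates and their scores are all hypothetical. Under the criterion-referenced rule a candidate's result depends solely on his or her own score; under the norm-referenced rule a fixed proportion of candidates pass, whatever absolute standard they reach.

```python
def criterion_referenced_pass(scores, criterion):
    """Pass every candidate whose own score reaches the fixed criterion."""
    return {name: score >= criterion for name, score in scores.items()}


def norm_referenced_pass(scores, pass_fraction):
    """Pass a fixed fraction of the candidates, taken from the top of the ranking.

    Ties are broken by the order in which candidates appear in `scores`.
    """
    n_pass = int(len(scores) * pass_fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in scores}


# Hypothetical percentage scores for a weak group of candidates.
scores = {"Ana": 45, "Ben": 40, "Chi": 38, "Dov": 20}

# Criterion-referenced: the criterial level is set at 80 per cent.
print(criterion_referenced_pass(scores, criterion=80))

# Norm-referenced: three quarters of the candidates pass.
print(norm_referenced_pass(scores, pass_fraction=0.75))
```

With these invented figures no candidate meets the criterion, yet three of the four would pass under the norm-referenced rule with fewer than half of the available marks.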
The possibility exists of having a series of criterion-referenced tests, each
representing a different level of achievement or proficiency. The tests are
constructed such that a ‘pass’ is obtained only by completing the great
majority of the test tasks successfully. Students are required to take only the
test (or tests) on which they are expected to be successful. As a result, they
are spared the dispiriting, demotivating experience of taking a test on which
they can, for example, respond correctly to fewer than half of the items
(and yet be given a pass). This type of testing, we believe, should encourage
positive attitudes to language learning. At one time it was the basis of some
GCSE (General Certificate of Secondary Education) examinations in Britain.
It has to be admitted that there is one potential drawback to having a
series of criterion-referenced tests when each candidate is entered for only
one of them. Someone has to decide which test to take. Whether it is the
candidate, a teacher, or some other adviser, mistakes may be made. The
candidate’s ability may be underestimated or overestimated, resulting in
the candidate taking an inappropriate test. One solution to this problem
would be to have a single computer adaptive test. This could work well
for a test of grammar or vocabulary. For a test of writing, however, where
extended pieces of writing are called for, it is hard to see how that would
work, unless initial items were short in nature and computer-scoreable.
These initial items would effectively form a brief screening test and would
serve to direct candidates to longer items. Traditional tests of speaking,
carried out with a human interlocutor, are, or should be, adaptive in nature.
Base achievement tests on objectives
If achievement tests are based on objectives, rather than on detailed
teaching and textbook content, they will provide a truer picture of what
has actually been achieved. Teaching and learning will tend to be evaluated
against those objectives. As a result, there will be constant pressure to
achieve them. This was argued more fully in Chapter 3.
Ensure the test is known and understood by students and teachers
However good the potential backwash effect of a test may be, the effect
will not be fully realised if students and teachers do not know and
understand what the test demands of them. The rationale for the test,
its specifications, and sample items (including examples of written and
oral performance with grades and examiner comments) should be made
available to everyone concerned with preparation for the test. This is
particularly important when a new test is being introduced, especially if
it incorporates novel testing methods. Another, equally important, reason
for supplying information of this kind is to increase test reliability, as was
noted in the previous chapter.
Where necessary, provide assistance to teachers
The introduction of a new test may make demands on teachers to which they
are not equal. If, for example, a longstanding national test of grammatical
structure and vocabulary is to be replaced by a direct test of a much more
communicative nature, it is possible that many teachers will feel that they
do not know how to teach communicative skills. One important reason
for introducing the new test may have been to encourage communicative
language teaching, but if the teachers need guidance and possibly training,
and these are not given, the test will not achieve its intended effect. It may
simply cause chaos and disaffection. Where new tests are meant to help
change teaching, support has to be given to help effect the change.
Counting the cost
One of the desirable qualities of tests which trips quite readily off the
tongue of many testers, after validity and reliability, is that of practicality.
Other things being equal, it is good that a test should be easy and cheap to
construct, administer, score and interpret. We should not forget that testing
costs time and money that could be put to alternative uses.
It is unlikely to have escaped the reader’s notice that at least some of the
recommendations listed above for creating positive backwash involve more
than minimal expense. The individual direct testing of some abilities will
take a great deal of time, as will the reliable scoring of performance on
any subjective test. The production and distribution of sample tests and
the training of teachers will also be costly. It might be argued, therefore,
that such procedures are impractical. In our opinion, this would reveal
an incomplete understanding of what is involved. Before we decide that
we cannot afford to test in a way that will promote positive backwash, we
have to ask ourselves a question: What will be the cost of not achieving
positive backwash? When we compare the cost of the test with the waste
of effort and time on the part of teachers and students in activities quite
inappropriate to their true learning goals (and in some circumstances,
with the potential loss to the national economy of not having more people
competent in foreign languages), we are likely to decide that we cannot
afford not to introduce a test with a powerful positive backwash effect.
READER ACTIVITIES
1. How would you improve the backwash effect of tests that you know? Be as
specific as possible. (This is a follow-up to Activity 1 at the end of Chapter 1.)
2. Rehearse the arguments you would use to convince a sceptic that it would
be worthwhile making the changes that you recommend.
FURTHER READING
Theoretical issues
Alderson and Wall (1993) question the existence of backwash.
Language Testing 13, 3 (1996) is a special issue devoted to backwash. In
it Messick discusses backwash in relation to validity. Bailey (1996) reviews
the concept of backwash in language testing, including Hughes’s (1993)
proposed model and Alderson and Wall’s (1993) fifteen hypotheses about
backwash. Wall (1996) looks to developments in general education and to
innovation theory for insights into backwash.
Hamp-Lyons’s (1997a) article raises ethical concerns in relation to
backwash, impact and validity. Her 1997b article discusses ethical issues
in test preparation practice for TOEFL®, to which Wadden and Hilke (1999)
take exception. Hamp-Lyons (1999) responds to their criticisms.
Brown and Hudson (1998) lay out the assessment possibilities for language
teachers and argue that one of the criteria for choice of assessment
method is potential backwash effect. Alderson (2009) reviews the new
TOEFL® and comments on its potential for positive backwash.
Research into backwash
Wall and Alderson (1993) investigate backwash in a project in Sri Lanka
with which they were concerned, argue that the processes involved in
backwash are not straightforward, and call for a model of backwash and
for further research. Shohamy et al. (1996) report that two different tests
have different patterns of backwash. Watanabe (1996) investigates the
possible effect of university entrance examinations in Japan on classroom
methodology. Alderson and Hamp-Lyons (1996) report on a study into
TOEFL® preparation courses and backwash. Muñoz and Álvarez (2010)
is an account of a successful attempt to create positive backwash in a
Colombian university. Cheng (2005) reports on her research into backwash
in Hong Kong. Cheng et al. (2011) report on the impact of introducing
teachers’ assessments as part of a high-stakes exam. Choi (2008) reports
on the negative backwash effects of standardised multiple choice tests
in the Korean education system. Luxia (2005) examines the failure of a
high-stakes test to achieve its intended backwash effects. Saif (2006)
describes an attempt to achieve positive backwash. Cheng et al. (2004)
is a collection of articles on carrying out research into backwash. Cheng
and Curtis (2012) summarise the results of research into backwash and
make recommendations for future research. Green (2007) reports on
research into the effect of the academic writing module of a major test
(IELTS) on preparation for university study. Wall and Horák (2006, 2008, 2011)
is a series of reports on the impact of the new TOEFL® on teaching and
learning. All of their reports are available online.