Describe and evaluate the use of multiple choice questions for testing listening comprehension
Presenter: Luong Thi Phuong Nhi (MA)
Faculty of English for Specific Purposes
Foreign Trade University
CONTENT
I. Introduction
II. Discussion
III. Evaluation
IV. Conclusion
Introduction
According to McNamara (2000), language tests play a significant role in
various aspects of many people’s lives.
A great deal of research on language assessment has been undertaken in the language testing literature.
The emergence and growth of a language testing method has typically resulted from a particular language teaching and learning approach.
Multiple choice tests are still perceived as a discrete-point testing method, having grown out of the discrete-point approach.
This presentation first discusses the communicativeness of the multiple choice item and then evaluates the use of multiple choice items for testing listening comprehension in terms of practicality, reliability, validity and backwash.
Description of multiple choice items
Multiple choice items take many forms; however, their fundamental structure is as follows (Brown, 2004; Hughes, 2003):
There is a stem which presents a stimulus:
Indoor heating systems have made ______________ for people to live
and work comfortably in temperate climates.
and a number of options or alternatives (normally ranging from three to
five) – one of which is a key (a correct response), the others being
distractors:
A. it is possible
B. possible
C. it possible
D. possibly
(Example cited from Rogers, TOEFL Success, 1996: 424)
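To make this stem/key/distractor structure concrete, here is a minimal illustrative sketch in Python; the class and names are hypothetical, not drawn from any cited source:

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    """A stem plus options, one of which is the key; the rest are distractors."""
    stem: str
    options: list[str]  # normally three to five alternatives
    key: int            # index of the correct response

    def is_correct(self, chosen: int) -> bool:
        return chosen == self.key

# The TOEFL Success example above, encoded as data (key = option C):
item = MultipleChoiceItem(
    stem="Indoor heating systems have made ______ for people to live "
         "and work comfortably in temperate climates.",
    options=["it is possible", "possible", "it possible", "possibly"],
    key=2,
)
print(item.is_correct(2))  # True
```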
Discussion
McNamara (2000) states that the multiple-choice format is regarded as the most important of several fixed-response formats. The use of the multiple choice format for testing listening was widely promoted during the period when the discrete-point approach was in favor.
The demand for new or innovative language testing methods arises when the concept of listening comprehension changes. Under the new notion of listening comprehension, multiple choice items take on “new features”.
Although the test format remains almost the same, the construct is quite different. It is therefore a misconception to still refer to multiple choice tests as “discrete-point tests” (Buck, 2001).
The “communicativeness” of the multiple-choice test
The idea of communicative teaching led to a shift in the language testing approach.
Language assessment focuses on measuring both “how much a person knows about the language” and how he or she can “use it to communicate effectively” in the target language use context (Buck, 2001; Weir, 1990).
Such features as the real-world context of language use and the authenticity of the tasks or texts become a matter of concern in communicative testing (Buck, 2001).
It could be said that the demands of the tests and their use of real-world contexts, as well as of more authentic tasks or texts, make multiple choice tests “communicative” in character.
Demands of the multiple-choice test
Although test-takers are only asked to put a tick against the options they choose in a multiple choice test, interaction between the test-taker and the task is still required.
A variety of listening sub-skills may be assessed in multiple choice tests (Buck, 2001). The sub-skills tested can range from “understanding at the most explicit literal level, making pragmatic inferences, understanding implicit meanings to summarizing or synthesizing extensive sections of texts”.
Each kind of listening sub-skill probably places a certain sort of demand on the test-takers. In other words, multiple choice tests would demand “some meta-cognitive processing skills” associated with the test method (Brown, 2004).
Context
There has been much criticism of the use of multiple choice tests in “isolated” contexts or “de-contextualized” situations.
The use of multiple choice items in listening tests, however, might depend on the purpose of the listening assessment.
When used to test listening comprehension, multiple choice tests need to provide contexts of language use drawn from real-life situations.
In some standardized tests, say TOEFL or TOEIC, multiple choice items are
used to test not only phonology or paraphrase recognition but also
responsive and extensive listening.
Both responsive and extensive listening require a context of language use that is essential for the listeners to comprehend whole texts and perform the required tasks.
Authenticity
In reality, a communicative test requires the test-takers to perform a real-world task in a certain real-life communicative context. A multiple choice test could be perceived as “communicative” because it seems to include both authentic tasks and texts.
Authentic tasks: Listening to monologues, lectures or brief conversations is likely to be a common task in multiple choice tests. The test-takers are normally required to answer a set of comprehension questions after listening (Brown, 2004).
Authentic texts: To create authenticity in the test texts, such natural speech features as assimilation and elision, as well as hesitation phenomena, can be found in multiple choice listening tests (Hughes, 2003).
The genuineness of the texts and the authenticity of tasks seem
to be the main focus of TOEFL or TOEIC tests, which still use the
multiple choice format to assess listening skills.
Evaluation
The question of how to make a good or effective test attracts the attention of both test constructors and teachers. There are several major criteria for “assessing” a test, for example practicality, reliability, validity and backwash.
These criteria are normally “evaluated” separately in the process of “testing a test” (Brown, 2004).
Practicality
Making and scoring multiple choice tests are “rapid” and “economical” (Cohen, 1994; Hughes, 2003).
The tests provide “predetermined correct responses” and “time-saving scoring procedures”, offering the raters an “easy and consistent process of scoring and grading” (Brown, 2004), as the sketch at the end of this section illustrates.
However, regarding the preparation phase, the practicality of multiple choice tests is likely to be in question, since it takes more time, money and effort to prepare multiple choice questions than open-ended items.
Although multiple choice tests seem simple to design, they are very difficult to construct well because of their complexity (Brown, 2004; Buck, 2001; Cohen, 1994).
There is a common view that the construction of multiple choice items requires trained, skilled or experienced test designers, and that all items need trialling or pretesting before being used in a test, especially in high-stakes assessment (Alderson, 2000; Brown, 2004; Buck, 2001; Weir, 1993).
In high-stakes assessment, multiple choice items are probably efficient to administer and score (Brown, 2004; McNamara, 2000).
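To illustrate why scoring against “predetermined correct responses” is rapid and consistent, here is a minimal sketch (the function name and data are hypothetical, purely illustrative): every response is compared mechanically with the answer key, so no rater judgment is involved.

```python
def score_test(answer_key: list[str], responses: list[str]) -> float:
    """Score responses against a predetermined key; no rater judgment needed."""
    correct = sum(1 for key, resp in zip(answer_key, responses) if key == resp)
    return correct / len(answer_key)

# Example: a five-item listening test
key = ["A", "C", "B", "D", "A"]
candidate = ["A", "C", "D", "D", "B"]
print(f"Score: {score_test(key, candidate):.0%}")  # Score: 60%
```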
Reliability
In multiple choice tests, scoring could be perfectly reliable (Cohen,
1994; Hughes, 2003).
Additionally, in multiple choice tests, the issue of the scorer’s subjective assessment does not arise: the scorers are not permitted to exercise any judgment when marking candidates’ answers (Weir, 1988).
However, the test-takers’ scores gained in multiple choice tests may be of concern. When doing multiple choice tests, candidates can get some or most of the correct answers by using test-taking strategies, such as eliminating “implausible choices”, or simply by guessing (Alderson, 2000; Brown, 2004; Cohen, 1994; Hughes, 2003; Weir, 1988, 1990).
The type of test and its level of reliability also depend on the number of options in each item; the sketch below illustrates the arithmetic. To keep multiple choice items more reliable, it is suggested that three or four options or alternatives be presented (Harrison, 1983; Hughes, 2003), and that candidates be required to give reasons for making their choice (Alderson, 2000). These two suggestions might make the test more reliable but may affect its practicality.
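The link between the number of options and guessing can be shown with simple arithmetic. Under the illustrative assumptions of independent items and uniformly random guesses (a simplification, not a claim from the cited authors), a candidate who guesses blindly on a k-option item is right with probability 1/k:

```python
# Expected score from blind guessing on an n-item test with k options per item,
# assuming independent items and uniformly random guesses (an illustrative model).
def expected_guess_score(n_items: int, k_options: int) -> float:
    return n_items / k_options

for k in (3, 4, 5):
    print(f"{k} options: {expected_guess_score(30, k):.1f}/30 items "
          f"({100 / k:.0f}%) expected from guessing alone")
# 3 options: 10.0/30 items (33%) expected from guessing alone
# 4 options: 7.5/30 items (25%) expected from guessing alone
# 5 options: 6.0/30 items (20%) expected from guessing alone
```

Adding options lowers the score obtainable by chance, yet the sources above still recommend three or four options in practice, presumably because plausible distractors beyond that are hard to write.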
Validity
Multiple choice items can serve as tests of receptive skills without requiring the test-takers to show their productive skills, i.e. writing or speaking (Hughes, 2003). Multiple choice tests may thus give an inaccurate picture of the candidates’ language ability if the assessment of that ability is based only on tests of receptive skills.
Concern about the uncertainty of their validity arises when multiple choice items are used to measure language ability. Test-takers’ performance on multiple choice items is likely an “unreal task”, since in real-life situations listeners are not forced to show their comprehension by choosing the best option from those suggested (Weir, 1990).
Another concern about the validity of multiple choice tests is that, while candidates are asked to complete a listening task, they are also required to read and keep in mind four or more alternatives for each listening item (Hughes, 2003; Weir, 1990).
It is, however, hard to decide whether candidates give a wrong answer because they failed to comprehend the text they listened to or simply because they misunderstood the questions or alternatives (Weir, 1988, 1990). This problem with the format creates a sense that the multiple choice test tends to be “an invalid method for assessing comprehension” (Weir, 1990).
Backwash
The use of multiple choice items for testing listening comprehension may have some negative effects on teaching and learning.
There might be a tendency to provide training in “improving educated guesses rather than in learning the language” (Hughes, 1989; Weir, 1990, cited in Cohen, 1994).
For the purpose of score improvement, much effort would be put into training test-taking techniques rather than into learning that improves listening ability. According to Weir (1993), such improvement does not mirror an increase in learners’ language command but rather “an enhanced ability” to do multiple choice tests.
Conclusion
The usefulness of multiple choice items for language testing in general, and for testing listening comprehension in particular, has been the subject of ongoing debate.
Teachers should be flexible and sensible when using multiple choice items for testing listening. Depending on the type and purpose of the testing, they should know how to make the best use of multiple choice tests.
Despite the advantages offered by this test tool, the overuse of multiple choice tests in class may have some negative effects on teaching and learning.
References
Alderson, J. C. (2000). Assessing reading. Cambridge: CUP.
Bachman, L. F. & Palmer, A. (1996). Language testing in practice. Oxford: OUP.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. Pearson Education, Inc.
Buck, G. (2001). Assessing listening. Cambridge: CUP.
Cohen, A. (1994). Assessing language ability in the classroom. Boston, MA: Heinle and Heinle.
Harrison, A. (1983). A language testing handbook. London: Macmillan.
Hughes, A. (2003). Testing for language teachers. Cambridge: CUP.
McNamara, T. (2000). Language testing. Oxford: OUP.
Rogers, B. (1996). TOEFL success. Princeton, NJ: Peterson’s.
Weir, C. J. (1988). Communicative language testing. Exeter: University of Exeter.
Weir, C. J. (1990). Communicative language testing. New York: Prentice Hall.
Weir, C. J. (1993). Understanding and developing language tests. New York: Prentice Hall.