Describe and evaluate the use of multiple choice questions for testing listening comprehension
Presenter: Luong Thi Phuong Nhi (MA), Faculty of English for Specific Purposes, Foreign Trade University

CONTENT
I. Introduction
II. Discussion
III. Evaluation
IV. Conclusion

Introduction

According to McNamara (2000), language tests play a significant role in many aspects of people's lives, and a great deal of research on language assessment has been undertaken in the language testing literature. The appearance or growth of a language testing method typically results from a particular language teaching and learning approach. Multiple choice tests are still widely perceived as a discrete-point testing method, having grown out of the discrete-point approach. This presentation first discusses the communicativeness of the multiple choice item and then evaluates the use of multiple choice items for testing listening comprehension in terms of practicality, reliability, validity and backwash.

Description of multiple choice items

Multiple choice items take many forms; however, their fundamental structure is as follows (Brown, 2004; Hughes, 2003). There is a stem, which presents a stimulus:

Indoor heating systems have made ______________ for people to live and work comfortably in temperate climates.

and a number of options or alternatives (normally ranging from three to five), one of which is the key (the correct response), the others being distractors:

A. it is possible
B. possible
C. it possible
D. possibly

(Example cited from Rogers, TOEFL success, 1996: 424)
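To make this structure concrete, the sketch below models a multiple choice item as a simple data structure with a predetermined key, which is also what makes machine scoring straightforward (a point the Evaluation section returns to). This is a minimal illustrative sketch, not part of any cited testing system; the names (MCItem, score_response) are hypothetical, and only the item text comes from the Rogers (1996) example above.

```python
from dataclasses import dataclass

@dataclass
class MCItem:
    """A multiple choice item: a stem, a set of options, and one key."""
    stem: str
    options: dict[str, str]  # label -> option text (non-key options are distractors)
    key: str                 # label of the correct response

def score_response(item: MCItem, response: str) -> int:
    """Dichotomous scoring: 1 if the chosen label matches the key, else 0."""
    return 1 if response == item.key else 0

# The example item from Rogers (1996); the key is C ("it possible")
item = MCItem(
    stem="Indoor heating systems have made ______ for people to live "
         "and work comfortably in temperate climates.",
    options={"A": "it is possible", "B": "possible",
             "C": "it possible", "D": "possibly"},
    key="C",
)

print(score_response(item, "C"))  # 1
print(score_response(item, "A"))  # 0
```

Because the key is fixed in advance, scoring reduces to a label comparison; this is the "predetermined correct response" that the Evaluation section credits for the format's practicality.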
Discussion

McNamara (2000) states that the multiple-choice format is regarded as the most important among several types of fixed response format. The use of the multiple choice format for testing listening was widely promoted during the period when the discrete-point approach was in favour. The demand for new or innovative language test methods arises when the concept of listening comprehension changes, and under the new notion of listening comprehension, multiple choice items take on "new features": although the test format remains almost the same, the construct is quite different. It is therefore a misconception to still refer to multiple choice tests as "discrete-point tests" (Buck, 2001).

The "communicativeness" of the multiple-choice test

The idea of communicative teaching led to a new shift in the language testing approach. Language assessment now focuses on measuring not only "how much a person knows about the language" but also how he or she can "use it to communicate effectively" in the target language use context (Buck, 2001; Weir, 1990). Features such as the real-world context of language use and the authenticity of the tasks or texts become a matter of concern in communicative testing (Buck, 2001). It could be said that the demands the tests place on test-takers, together with their use of real-world contexts and more authentic tasks and texts, make multiple choice tests "communicative" in character.

Demands of the multiple-choice test

Although test-takers are only asked to put a tick against the options they choose in a multiple choice test, interaction between the test-taker and the task is still required. A variety of listening sub-skills may be assessed in multiple choice tests (Buck, 2001). These can range from "understanding at the most explicit literal level, making pragmatic inferences, understanding implicit meanings to summarizing or synthesizing extensive sections of texts". Each kind of listening sub-skill places a certain sort of demand on the test-takers; in other words, multiple choice tests demand "some meta-cognitive processing skills" associated with the test method (Brown, 2004).

Context

There has been much criticism of the use of multiple choice tests in "isolated" contexts or "de-contextualized" situations. The use of multiple choice items in listening tests, however, might depend on the purpose of the listening assessment. If the tests are used for listening comprehension, multiple choice tests need to provide contexts of language use in real-life situations. In some standardized tests, say TOEFL or TOEIC, multiple choice items are used to test not only phonology or paraphrase recognition but also responsive and extensive listening. Both responsive and extensive listening require the context of language use, which is essential for listeners to comprehend the whole text and perform the required tasks.

Authenticity

In reality, a communicative test requires test-takers to perform a real-world task in a certain real-life communicative context. The multiple choice test could be perceived as "communicative" because it seems able to include both authentic tasks and texts. Authentic tasks: listening to monologues, lectures or brief conversations are common tasks in multiple choice tests, after which test-takers are normally required to answer a set of comprehension questions (Brown, 2004). Authentic texts: to create authenticity in the test texts, such natural speech features as assimilation and elision, as well as hesitation phenomena, can be found in multiple choice listening tests (Hughes, 2003). The genuineness of the texts and the authenticity of the tasks seem to be a main focus of the TOEFL and TOEIC tests, which still use the multiple choice format to assess listening skills.

Evaluation

The question of how to make a good or effective test attracts the attention of both test constructors and teachers. There are some major criteria for "assessing" a test, for example practicality, reliability, validity and backwash. These criteria are normally "evaluated" separately in the process of "testing a test" (Brown, 2004).

Practicality

Marking and scoring multiple choice tests is "rapid" and "economical" (Cohen, 1994; Hughes, 2003). The tests provide "predetermined correct responses" and "time-saving scoring procedures", offering raters an "easy and consistent process of scoring and grading" (Brown, 2004). In high-stakes assessment in particular, multiple choice items are efficient to administer and score (Brown, 2004; McNamara, 2000). However, regarding the preparation phase, the practicality of multiple choice tests is open to question, since it takes more time, money and effort to prepare multiple choice questions than open-ended items. Although multiple choice tests seem simple to design, they are very difficult to construct well (Brown, 2004; Buck, 2001; Cohen, 1994). There is a common view that the construction of multiple choice items requires trained, skilled or experienced test designers, and that all items need trialling or pretesting before being used in a test, especially in high-stakes assessment (Alderson, 2000; Brown, 2004; Buck, 2001; Weir, 1993). This pretesting step is illustrated in the sketch below.
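As a concrete illustration of what trialling involves, the sketch below computes two classical item statistics commonly examined when pretesting multiple choice items: item facility (the proportion of candidates answering correctly) and an upper-lower discrimination index (the difference in facility between the top- and bottom-scoring thirds of candidates). The statistics themselves are standard in the testing literature, but the code, the data and all names here are hypothetical and are not taken from any of the cited sources.

```python
def item_facility(scores: list[int]) -> float:
    """Proportion of candidates who answered the item correctly (0..1).
    Very high or very low values suggest the item is too easy or too hard."""
    return sum(scores) / len(scores)

def item_discrimination(item_scores: list[int], total_scores: list[int]) -> float:
    """Upper-lower discrimination index: facility among the top third of
    candidates (ranked by total test score) minus facility among the bottom
    third. Near-zero or negative values flag items that need revision."""
    n = len(total_scores)
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = n // 3
    upper = [item_scores[i] for i in ranked[:k]]
    lower = [item_scores[i] for i in ranked[-k:]]
    return item_facility(upper) - item_facility(lower)

# Hypothetical pretest: 9 candidates, one item (1 = correct), plus total scores
item_scores  = [1, 1, 0, 1, 1, 0, 1, 0, 0]
total_scores = [38, 35, 33, 30, 27, 25, 20, 18, 15]

print(f"facility       = {item_facility(item_scores):.2f}")                        # 0.56
print(f"discrimination = {item_discrimination(item_scores, total_scores):.2f}")    # 0.33
```

In practice, items with extreme facility or poor discrimination would be revised or discarded before the operational test, which is part of why constructing multiple choice items well is costly.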
Reliability

In multiple choice tests, scoring can be perfectly reliable (Cohen, 1994; Hughes, 2003). In addition, the issue of the scorer's subjective assessment does not arise: scorers are not permitted to exercise any judgement when marking candidates' answers (Weir, 1988). However, the scores test-takers gain on multiple choice tests may still be of concern. When doing multiple choice tests, candidates can get some or most of the correct answers by using test-taking strategies, such as eliminating "implausible choices", or simply by guessing (Alderson, 2000; Brown, 2004; Cohen, 1994; Hughes, 2003; Weir, 1988, 1990). The level of reliability also depends on the number of options in each item. To make multiple choice items more reliable, it is suggested that three or four options or alternatives be presented (Harrison, 1983; Hughes, 2003) and that candidates be required to give reasons for their choice (Alderson, 2000). These two suggestions might make the test more reliable, but they may reduce its practicality. The effect of blind guessing can be quantified, as sketched below.
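To see why guessing threatens score reliability, consider the classical-test-theory arithmetic below. This worked example is added here as an illustration and is not taken from the cited sources; it uses the well-known correction-for-guessing formula, with k the number of options, n the number of items, R the number right and W the number wrong.

```latex
% Chance of answering one k-option item correctly by blind guessing:
P(\text{correct}) = \tfrac{1}{k} \qquad (k = 4 \;\Rightarrow\; P = 0.25)

% Expected raw score from pure guessing on a test of n such items:
E[\text{score}] = \tfrac{n}{k} \qquad (n = 40,\ k = 4 \;\Rightarrow\; E = 10)

% Classical correction for guessing:
S_{\text{corrected}} = R - \frac{W}{k - 1}
```

This also shows why the number of options matters for reliability: moving from four options to three raises the blind-guessing probability from 0.25 to about 0.33.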
Validity

Multiple choice items can serve as tests of receptive skills without requiring test-takers to demonstrate productive skills, that is, writing or speaking (Hughes, 2003). Multiple choice tests may therefore give an inaccurate picture of candidates' language ability if the assessment of that ability is based only on tests of receptive skills. Concern about validity also arises when multiple choice items are used to measure language ability. Test-takers' performance on multiple choice items is arguably an "unreal task", since in real-life situations they are not required to show their comprehension by choosing the best option from those suggested (Weir, 1990). Another concern about the validity of multiple choice tests is that while completing a listening task, candidates are also required to read and keep in mind four or more alternatives for each listening item (Hughes, 2003; Weir, 1990). It is, moreover, hard to decide whether candidates give a wrong answer because they failed to comprehend the text they listened to or simply because they misunderstood the question or the alternatives (Weir, 1988, 1990). These problems with the format create a sense that the multiple choice test tends to be "an invalid method for assessing comprehension" (Weir, 1990).

Backwash

The use of multiple choice items for testing listening comprehension may have some negative effects on teaching and learning. There might be a tendency to train learners in "improving educated guesses rather than in learning the language" (Hughes, 1989; Weir, 1990, cited in Cohen, 1994). For the purpose of score improvement, much effort may be put into training test-taking techniques rather than into learning that improves listening ability. According to Weir (1993), such improvement does not mirror an increase in learners' language command but "an enhanced ability" to do multiple choice tests.

Conclusion

The usefulness of multiple choice items for language testing in general, and for testing listening comprehension in particular, remains the subject of ongoing debate. Teachers should be flexible and sensible when using multiple choice items for testing listening: depending on the type and purpose of the testing, they should know how to make the best use of multiple choice tests. Despite the advantages this test tool offers, the overuse of multiple choice tests in class may have negative effects on teaching and learning.

References

Alderson, J.C. (2000). Assessing reading. Cambridge: CUP.
Bachman, L.F. & Palmer, A. (1996). Language testing in practice. Oxford: OUP.
Brown, H.D. (2004). Language assessment: Principles and classroom practices. Pearson Education, Inc.
Buck, G. (2001). Assessing listening. Cambridge: CUP.
Cohen, A. (1994). Assessing language ability in the classroom. Boston, MA: Heinle and Heinle.
Harrison, A. (1983). A language testing handbook. London: Macmillan.
Hughes, A. (2003). Testing for language teachers. Cambridge: CUP.
McNamara, T. (2000). Language testing. Oxford: OUP.
Rogers, B. (1996). TOEFL success. Princeton, NJ: Peterson's.
Weir, C.J. (1988). Communicative language testing. Exeter: University of Exeter.
Weir, C.J. (1990). Communicative language testing. New York: Prentice Hall.
Weir, C.J. (1993). Understanding and developing language tests. New York: Prentice Hall.