ITBE Workshop, April 20, 2013

Assessing Listening Comprehension: Test Format Decisions

Brian Hampson, Purdue University Calumet - brian.hampson@purduecal.edu
Heather Torrie, Purdue University Calumet - torrieh@purduecal.edu

Test Development Stages (adapted from Hughes, 2003)

1. Stating the problem
   a. What is the purpose?
   b. What abilities are to be tested?
2. Content
   a. What tasks should students perform?
   b. What texts should be used?
   c. What is the overall format (number of passages, number of listenings, etc.)?
   d. What items should be included?
3. Pilot items, using both native and non-native speakers

Validity: Does the test measure what it is supposed to measure? (Hughes, 2003)
Sometimes format can affect validity.

Overview of Various Item Formats

Buck (1991): “…successful listening comprehension involves an interaction between linguistic skills, knowledge of the context, background knowledge and inferencing skills. Thus, listening test items, even those written to test one particular skill, turn out on examination to be testing a number of different skills.”

Format: Multiple-choice / True/False / Matching / Selection
Research notes:
- In’nami & Koizumi, 2009: Meta-analysis shows multiple-choice (MC) is easier than short answer (SA)
- Yi’an, 1998: People often choose the right answer for the wrong reasons
Pros:
- Reliability
- Less stressful for testees
- Takes less time for testees
Cons:
- Encourages bottom-up processing
- Cognitive load: difficult for listeners to hold four options in their mind while listening
- Guessing
- Choosing the right answer for the wrong reason
- Difficult to write good distractors
- Cheating
Other skills being tested:
- Reading ability
- Only word recognition (rather than true comprehension)

Format: Short Answer
Research notes:
- In’nami & Koizumi, 2009: Meta-analysis shows SA is more difficult than MC
- Buck, 1991: Supports validity, but addresses concerns with reliability
Pros:
- Less guessing
- Easier to write items
- More authentic
- More top-down, especially for main idea questions
Cons:
- Reliability issues (scoring is difficult, especially for inference items)
- Takes time for testees to answer
Other skills being tested:
- Writing ability
- Reading (understanding the question and determining which information to write down)

Format: Table/Outline/Chart Completion
Research notes:
- Song, 2011: Filling out a table is easier than blank notes
- Brindley & Slatyer, 2002: Found that a chart/table structure was easier than SA/blank notes, and easier than cloze
Pros:
- Constrains the contents of test takers’ notes to a given framework
- Authentic
- Emphasizes top-down processing
Cons:
- Scoring difficult; reliability issues
Other skills being tested:
- Writing ability

Format: Recall
Research notes:
- Used in research (e.g., Jung 2003; Sherman 1997); not so much in the classroom
Pros:
- Authentic
- Good measure of intake
Cons:
- More difficult to score (compile a list of key information units)
Other skills being tested:
- Writing and speaking ability
- Memory
- Note-taking ability

Format: Cloze / Dictation / Partial Dictation
Pros:
- Reliability
Cons:
- Emphasizes bottom-up listening
- Less authentic
Other skills being tested:
- Writing ability

Test Delivery Format Options

Number of Listenings (with rationale):

One time
- Authentic
Two times through
- Affective value for students
- Reflects the way listening is taught in the classroom
More than twice / student-controlled
- Can also be authentic (listening to recorded lectures, online material, conversation)

Question Preview* (with rationale):

Before
- Helps students focus on particular information
Sandwiched
- Promotes top-down processing during the first listening and bottom-up processing during the second listening
Afterwards
- Further promotes top-down processing

*Sherman (1997) and Buck (1991) suggest that question preview has more of an affective benefit than an actual performance benefit. Examinees thought it helped them more than it actually did.

References

Brindley, G., & Slatyer, H. (2002). Exploring task difficulty in ESL listening assessment. Language Testing, 19(4).
Uses charts and sentence completion. One listening only.

Buck, G. (1991). The testing of listening comprehension: An introspective study. Language Testing, 8(1).
Using verbal reports, this study looked at the test-taking process of answering short-answer questions based on listening to segments of a narrative. While suggesting that short-answer items are strong in validity, the study reveals concerns with reliability, given the variety of answers produced.

Cross, J. (2009). Effects of listening strategy instruction on news videotext comprehension. Language Teaching Research, 13(2), 151-176.
One of the ways listening comprehension was measured in this study was through written recalls, in which learners had to write down everything they remembered from the videotext.

Hughes, A. (2003). Testing for Language Teaching. Cambridge University Press.
This book is a great overview of test development for all language skills.

In’nami & Koizumi. (2009). A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats. Language Testing, 26(2).
After reviewing 56 or so studies, the authors found that multiple-choice items were indeed easier than open-ended formats.

Jung, E. H. (2003). The role of discourse signaling cues in second language listening comprehension. The Modern Language Journal, 87(4), 562-577.
Learners performed a written recall task after listening to a lecture. Assessment was based on how many key “information units” learners included in the recall.

Sherman, J. (1997). The effect of question preview in listening comprehension tests. Language Testing, 14, 185-213.
Very interesting study design! Lots of great citations too. It seems that question preview often makes students feel better but doesn’t necessarily help them, and it could interfere with processing.

Song, M. (2011). Note-taking quality and performance on an L2 academic listening test. Language Testing, 29(1).
Studies how effective note-taking is when using a partially filled outline. “it would seem that notes taken in the outline format in particular, because it constrains the contents of test takers’ notes to a given framework, might have more potential as a listening measure than notes in the blank format.”

Yi’an, W. (1998). What do tests of listening comprehension test? A retrospection study of EFL test-takers performing a multiple-choice task. Language Testing.
A qualitative study of 6 learners and why they chose the answers they did on multiple-choice items. It showed that many test takers chose the correct answer, but for the wrong reason.