Are multiple choice tests useful in the assessment of a medical statistics module?

Tom Fanshawe
Nuffield Department of Primary Care Health Sciences
University of Oxford

1) Background
2) Overview of assessment formats
3) Issues arising in multiple choice test writing
4) Results
5) Conclusions

Image: "Nelson's Column during the Great Smog of 1952" by N T Stobbs. Licensed under CC BY-SA 2.0 via Wikimedia Commons

What is the height of Nelson’s Column (in metres)?
Is it less than or more than 25 metres?

1) Background

Medicine at Oxford
• The Oxford Medicine course consists of a three-year preclinical stage, followed by a three-year clinical stage
• Statistics teaching occurs during Year 1 and Year 2 of the preclinical stage
• Statistics course restructured in 2013
• Students attend 8 one-hour workshops in Year 1 and 8 one-hour workshops in Year 2
• ~25 students per workshop, 2 teachers per workshop
• 192 teacher-hours per student cohort

Assessment strategy
• Assessment of most other modules taken in Year 1 and Year 2 is via an end-of-year exam, which contains a substantial multiple choice component
• Statistics course is assessed by:
  • A coursework assignment
  • A multiple-choice/short answer test (new in 2015)
• The main objective is to ensure that all students reach an understanding of basic statistical principles

Was it a good idea to change the format of assessment of the statistics course?

2) Overview of assessment formats

Assessment
“What and how students learn depends to a major extent on how they think they will be assessed” (Biggs and Tang, 2007)
“[Assessment is] the longest chapter in the book, and in many ways the most important, not least from the point of view of the students themselves” (Race, 1999)
It follows that decisions about assessment methods should be central to course design decisions.

Methods of assessment
• Closed-book constructed response written examination
• Open-book constructed response written examination
• Multiple choice test
• Online quiz
• Individual written coursework assignment
• Group written coursework assignment
• Written journal
• Oral examination
• Oral presentation
• Laboratory report
• Audio or video project

Question types
Constructed response (CR)
“Interpret the 95% confidence interval for the difference in sugar consumption between the groups at one year. Discuss whether the difference in mean sugar consumption constitutes a clinically important difference.”
Constructed response – short answer
“Calculate a 95% confidence interval for mean daily energy intake of individuals in the intervention group.”
Multiple choice (MC)
“Which of the following intervals do you think would be the narrower?
a) 90% confidence interval for the mean
b) 99% confidence interval for the mean”

Advantages of multiple choice
• Automatic marking makes marking quicker
• Automatic marking makes marking more reliable
• Can be used for immediate formative assessment
• Students prefer MC to CR – easier to prepare for?
  (Van de Watering et al., 2008)
• Over time, a ‘question bank’ can be generated and reused (Thelwall, 2000)
• Repeated use of the same questions in different cohorts enables identification of temporal trends
• Large number of questions allows greater coverage of syllabus
• Frequently used already in biomedical sciences, increasing consistency across disciplines
NB: Many of the above are more advantageous to the teacher than to the student

Disadvantages of multiple choice
• May be poorly aligned with course objectives
• Much subject material may be unsuitable for assessment using this format
• A large number of questions is typically required, risking
  • Irrelevance
  • Ambiguity
  • Reduction in difficulty level
• Scoring decisions can be arbitrary and confusing
• Marks may not reflect the abilities of the students
• ‘Testwiseness’ (Simkin & Kuechler, 2005)
• May encourage surface/rote learning (Scouller, 1998)
• There is a “danger of [MC tests] being employed in the testing of mere trivia” (Curzon, 2003)
• May encourage little more than informed guesswork based on superficial knowledge

Why use a closed-book MC test for this module?
• Consistency with objectives: we expect students to reach a basic level of understanding of all content, not just some of it
• Concerns that coursework assignment is regarded as a group exercise
• Conveys message that module is important
45-minute test, 30-40 questions, mixture of MC and short answers, computer-marked

Nelson’s Column Revisited
What is the height of Nelson’s Column (in metres)?
Is it less than or more than 25 metres?
Nelson’s Column: 51.6 m
The Monument to the Great Fire of London: 62 m
Image: "The Monument to the Great Fire of London" by Eluveitie - Own work.
Licensed under CC BY-SA 3.0 via Wikimedia Commons

Anchoring
Group 1
What is the height of Nelson’s Column (in metres)?
Mean: 70 m
Group 2
Is the height of Nelson’s Column less than or more than 25 metres?
What is the height of Nelson’s Column (in metres)?
Mean: 30 m
The information provided in a question can influence the answer that is given

3) Issues arising in multiple choice test writing

The first multiple choice test?
• The ‘Alpha Army Intelligence Test’ was developed by Robert Yerkes to assess recruits for the US Army during the First World War (Gould, 1982)
Christy Mathewson is famous as a:
a) Writer
b) Artist
c) Baseball player
d) Comedian
‘Crisco’ is a:
a) Patent medicine
b) Disinfectant
c) Toothpaste
d) Food product
Questions are not well aligned with what they are intended to assess
Result: the ‘average’ recruit was classified into the category of ‘moronity’

Implausible alternatives
Estimate the correlation between the two variables in the figure.
a) -2
b) -1
c) 0
d) +1
e) +2
Make all options plausible (or just use fewer options)

Strategic guessing
Estimate the correlation between the two variables in the figure.
a) -0.5
b) 0
c) +0.5
d) +0.8
As far as possible, make all options ‘equally guessable’

Unfamiliar alternatives
Based on the histogram, which of the following distributions might be used to approximate the distribution of diastolic blood pressure?
a) Normal distribution
b) Beta distribution
c) Skellam distribution
d) Zipf-Mandelbrot distribution
Identifying the only familiar answer does not demonstrate knowledge

Too many options
Mean cholesterol was 0.3 mmol/l higher in Group 1 than Group 2. The t-statistic for this comparison is 2.09, with an associated p-value of 0.03. Which of the following applies?
a) There is only a 3% chance of observing a difference as large as 0.3 mmol/l, if the alternative hypothesis is true
b) In the long run, there would be a difference in mean cholesterol of at least 0.3 mmol/l in only 3% of studies conducted in the same way as this one
c) There is a statistically significant difference in mean cholesterol between Group 1 and Group 2
d) Only 3% of individuals in Group 2 have a cholesterol value higher than the mean cholesterol of individuals in Group 1

Too many options
Mean cholesterol was 0.3 mmol/l higher in Group 1 than Group 2. The t-statistic for this comparison is 2.09, with an associated p-value of 0.03. Which of the following applies?
a) There is only a 3% chance of observing a difference as large as 0.3 mmol/l, if the alternative hypothesis is true
b) In the long run, there would be a difference in mean cholesterol of at least 0.3 mmol/l in only 3% of studies conducted in the same way as this one
c) There is a statistically significant difference in mean cholesterol between Group 1 and Group 2
d) Only 3% of individuals in Group 2 have a cholesterol value higher than the mean cholesterol of individuals in Group 1
e) The difference in mean cholesterol between Group 1 and Group 2 is not statistically significant
Including too many options can make the question easier

Other suggestions
• Don’t use MC for numerical answers (if marking software allows)
• Randomize the order in which options are presented
• Insist on flexibility about the number of options presented: fewer is often better
• Consider introducing ‘scenarios’ that can be used as a basis for several questions
• Make sure that the questions test what needs to be tested
• Be very wary of using negative marking…

4) Results

‘Factorial’ choices
[Scenario describing study and regression line Cholesterol(mg) = 29.11 + 2.77 × Fat(g)]
Which one of the following is correct?
a) 1g greater daily fat consumption is associated with 2.77mg greater daily cholesterol consumption – 77%
b) 1g greater daily fat consumption is associated with a 2.77-times greater daily cholesterol consumption – 10%
c) 1mg greater daily cholesterol consumption is associated with 2.77g greater daily fat consumption – 7%
d) 1mg greater daily cholesterol consumption is associated with a 2.77-times greater daily fat consumption – 6%

High discrimination
[Scenario describing study]
Which one of the following would tell us about the clinical significance of the difference in weight between the two groups at the end of the study?
a) Mean difference – 25%
b) Standard error – 6%
c) P-value – 13%
d) None of the above – 55%

High discrimination
[Scenario describing study]
Calculate a 95% confidence interval for the mean difference in body weight between the two groups. Give as your answer the lower limit only.
____ kg

Conclusions
MC questions can be used in the assessment of a statistics module, but:
• Be aware of their limitations
• Don’t be too ambitious
• Don’t use them as a labour-saving convenience
• Don’t use them in isolation: “Identify what mixture of [assessment] formats will yield the best possible combined effect” (Martinez, 1999)
• “Never lose track of the main purpose of assessment: to improve learning” (Garfield, 1994)

References
Biggs, J. and Tang, C. (2007). Teaching for quality learning at university. Maidenhead: McGraw-Hill/Society for Research into Higher Education & Open University Press.
Curzon, L.B. (2003). Teaching in Higher Education, 6th edition. London: Continuum.
Garfield, J.B. (1994). Beyond testing and grading: using assessment to improve student learning. Journal of Statistics Education 2(1).
Gould, S.J. (1982). A nation of morons. New Scientist 6: 349-352.
Martinez, M.E. (1999). Cognition and the question of test item format. Educational Psychologist 34(4): 207-218.
Race, P. (ed.) (1999). 2000 tips for lecturers. London: Kogan Page.
Scouller, K. (1998). The influence of assessment method on students' learning approaches: Multiple choice question examination versus assignment essay. Higher Education 35(4): 453-473.
Simkin, M.G. and Kuechler, W.L. (2005). Multiple-choice tests and student understanding: what is the connection? Decision Sciences Journal of Innovative Education 3(1): 73-97.
Thelwall, M. (2000). Computer-based assessment: a versatile educational tool. Computers and Education 34(1): 37-49.
Van de Watering, G., Gijbels, D., Dochy, F. and van der Rijt, J. (2008).
Students’ assessment preferences, perceptions of assessment and their relationships to study results. Higher Education 56: 645-658.