Are multiple choice tests useful in the assessment of a medical statistics module?
Tom Fanshawe
Nuffield Department of Primary Care Health Sciences
University of Oxford
1) Background
2) Overview of assessment formats
3) Issues arising in multiple choice test writing
4) Results
5) Conclusions
"Nelson's Column during the Great Smog of 1952" by N T Stobbs. Licensed under CC BY-SA 2.0 via Wikimedia Commons
What is the height of Nelson’s Column (in metres)?
Is it less than or more than 25 metres?
1) Background
Medicine at Oxford
• The Oxford Medicine course consists of a three-year preclinical stage, followed by a three-year clinical stage
• Statistics teaching occurs during Year 1 and Year 2 of the preclinical stage
• Statistics course restructured in 2013
• Students attend 8 one-hour workshops in Year 1 and
8 one-hour workshops in Year 2
• ~ 25 students per workshop, 2 teachers per workshop
• 192 teacher-hours per student cohort
Assessment strategy
• Assessment of most other modules taken in Year 1 and Year 2
is via an end-of-year exam, which contains a substantial
multiple choice component
• Statistics course is assessed by:
  • A coursework assignment
  • A multiple-choice/short answer test (new in 2015)
• The main objective is to ensure that all students reach an
understanding of basic statistical principles
Was it a good idea to change the format of assessment of the
statistics course?
2) Overview of assessment formats
Assessment
“What and how students learn depends to a major extent on how
they think they will be assessed”
(Biggs and Tang, 2007)
“[Assessment is] the longest chapter in the book, and in many
ways the most important, not least from the point of view of the
students themselves”
(Race, 1999)
It follows that decisions about assessment methods should be
central to course design decisions.
Methods of assessment
• Closed-book constructed response written examination
• Open-book constructed response written examination
• Multiple choice test
• Online quiz
• Individual written coursework assignment
• Group written coursework assignment
• Written journal
• Oral examination
• Oral presentation
• Laboratory report
• Audio or video project
Question types
Constructed response (CR)
“Interpret the 95% confidence interval for the difference in sugar
consumption between the groups at one year. Discuss whether
the difference in mean sugar consumption constitutes a clinically
important difference.”
Constructed response – short answer
“Calculate a 95% confidence interval for mean daily energy intake
of individuals in the intervention group.”
Multiple choice (MC)
“Which of the following intervals do you think would be the narrower?
a) 90% confidence interval for the mean
b) 99% confidence interval for the mean”
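The reasoning behind the multiple choice example can be sketched numerically. The snippet below is a minimal illustration, assuming a normal-approximation interval for a mean; the function names and the sample summary (SD 10, n = 100) are hypothetical and do not come from the slides.

```python
from statistics import NormalDist


def z_multiplier(level: float) -> float:
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def ci_half_width(sd: float, n: int, level: float) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z_multiplier(level) * sd / n ** 0.5


# Hypothetical sample summary (an assumption, not data from the talk).
for level in (0.90, 0.99):
    print(f"{level:.0%} CI half-width: {ci_half_width(10, 100, level):.2f}")
```

For the same data, the multiplier grows with the confidence level, so the 90% interval is the narrower of the two.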
Advantages of multiple choice
• Automatic marking makes marking quicker
• Automatic marking makes marking more reliable
• Can be used for immediate formative assessment
• Students prefer MC to CR – easier to prepare for? (Van de Watering et al, 2008)
• Over time, a ‘question bank’ can be generated and reused (Thelwall, 2000)
• Repeated use of the same questions in different cohorts enables identification of temporal trends
• Large number of questions allows greater coverage of syllabus
• Frequently used already in biomedical sciences, increasing consistency across disciplines
NB: Many of the above are more advantageous to the teacher than to the student
Disadvantages of multiple choice
• May be poorly aligned with course objectives
• Much subject material may be unsuitable for assessment using
this format
• A large number of questions is typically required, risking
  • Irrelevance
  • Ambiguity
  • Reduction in difficulty level
• Scoring decisions can be arbitrary and confusing
• Marks may not reflect the abilities of the students
• ‘Testwiseness’
(Simkin & Kuechler, 2005)
• May encourage surface/rote learning
(Scouller, 1998)
• There is a “danger of [MC tests] being employed in the testing
of mere trivia”
(Curzon, 2003)
• May encourage little more than informed guesswork based on
superficial knowledge
Why use a closed-book MC test for this module?
• Consistency with objectives: we expect students to
reach a basic level of understanding of all content,
not just some of it
• Concerns that coursework assignment is regarded as
a group exercise
• Conveys message that module is important
 45-minute test, 30-40 questions, mixture of MC and
short answers, computer-marked
Nelson’s Column Revisited
Nelson’s Column: 51.6 m
The Monument to the Great Fire of London: 62 m
"Nelson's Column during the Great Smog of 1952" by N T Stobbs. Licensed under CC BY-SA 2.0 via Wikimedia Commons
"The Monument to the Great Fire of London" by Eluveitie - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons
Anchoring
Group 1
• What is the height of Nelson’s Column (in metres)?
• Mean: 70 m
Group 2
• Is the height of Nelson’s Column less than or more than 25 metres?
• What is the height of Nelson’s Column (in metres)?
• Mean: 30 m
The information provided in a question can influence the answer that is given
3) Issues arising in multiple choice test writing
The first multiple choice test?
• The ‘Alpha Army Intelligence Test’ was developed by Robert
Yerkes to assess recruits for the US Army during the First
World War
(Gould, 1982)
Christy Mathewson is famous as a:
a) Writer
b) Artist
c) Baseball player
d) Comedian
‘Crisco’ is a:
a) Patent medicine
b) Disinfectant
c) Toothpaste
d) Food product
 Questions are not well aligned with what they are
intended to assess
 Result: the ‘average’ recruit was classified into the
category of ‘moronity’
Implausible alternatives
Estimate the correlation between the two variables in the figure.
a) –2
b) –1
c) 0
d) +1
e) +2
 Make all options plausible (or just use fewer options)
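The effect of implausible options on guessing can be sketched as follows. This is a minimal illustration, assuming a student who knows only that a correlation coefficient must lie between –1 and +1 and who then guesses at random; the function names are hypothetical.

```python
def plausible_correlations(options):
    """Keep only values that a correlation coefficient can actually take."""
    return [r for r in options if -1 <= r <= 1]


def blind_guess_chance(options):
    """Chance of a correct blind guess once impossible options are discarded."""
    return 1 / len(plausible_correlations(options))


options = [-2, -1, 0, 1, 2]  # the five options shown on the slide
print(plausible_correlations(options))
print(f"nominal chance: {1 / len(options):.2f}")
print(f"after elimination: {blind_guess_chance(options):.2f}")
```

Under these assumptions the five-option question behaves like a three-option one, so the nominal one-in-five guessing chance understates how easy it is to guess correctly.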
Strategic guessing
Estimate the correlation between the two variables in the figure.
a) –0.5
b) 0
c) +0.5
d) +0.8
 As far as possible, make all options ‘equally guessable’
Unfamiliar alternatives
Based on the histogram, which of the following distributions
might be used to approximate the distribution of diastolic blood
pressure?
a) Normal distribution
b) Beta distribution
c) Skellam distribution
d) Zipf-Mandelbrot distribution

Identifying the only familiar answer does not demonstrate
knowledge
Too many options
Mean cholesterol was 0.3 mmol/l higher in Group 1 than Group 2. The t-statistic for this comparison is 2.09, with an associated p-value of 0.03.
Which of the following applies?
a) There is only a 3% chance of observing a difference as large as 0.3 mmol/l, if the alternative hypothesis is true
b) In the long run, there would be a difference in mean cholesterol of at least 0.3 mmol/l in only 3% of studies conducted in the same way as this one
c) There is a statistically significant difference in mean cholesterol between Group 1 and Group 2
d) Only 3% of individuals in Group 2 have a cholesterol value higher than the mean cholesterol of individuals in Group 1
Too many options
Mean cholesterol was 0.3 mmol/l higher in Group 1 than Group 2. The t-statistic for this comparison is 2.09, with an associated p-value of 0.03.
Which of the following applies?
a) There is only a 3% chance of observing a difference as large as 0.3 mmol/l, if the alternative hypothesis is true
b) In the long run, there would be a difference in mean cholesterol of at least 0.3 mmol/l in only 3% of studies conducted in the same way as this one
c) There is a statistically significant difference in mean cholesterol between Group 1 and Group 2
d) Only 3% of individuals in Group 2 have a cholesterol value higher than the mean cholesterol of individuals in Group 1
e) The difference in mean cholesterol between Group 1 and Group 2 is not statistically significant
 Including too many options can make the question easier
Other suggestions
• Don’t use MC for numerical answers (if marking software allows)
• Randomize the order in which options are presented
• Insist on flexibility about the number of options presented: fewer is often better
• Consider introducing ‘scenarios’ that can be used as a basis for several questions
• Make sure that the questions test what needs to be tested
• Be very wary of using negative marking…
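To make the negative marking point concrete, the expected mark from guessing can be worked out directly. The sketch below assumes a scheme in which a correct answer scores 1 and a wrong answer loses a fixed penalty; the function name and the penalty of 1/(k – 1) for k options are illustrative assumptions, not the marking scheme used in the module.

```python
def expected_mark(n_options: int, n_eliminated: int = 0, penalty: float = 0.0) -> float:
    """Expected mark for one question when guessing at random among the
    options left after ruling out `n_eliminated` wrong ones.
    A correct answer scores 1; a wrong answer scores -penalty."""
    remaining = n_options - n_eliminated
    p_correct = 1 / remaining
    return p_correct - (1 - p_correct) * penalty


k = 4
penalty = 1 / (k - 1)  # assumed penalty: blind guessing averages zero

print(round(expected_mark(k), 3))                                    # no penalty
print(round(expected_mark(k, penalty=penalty), 3))                   # blind guess
print(round(expected_mark(k, n_eliminated=2, penalty=penalty), 3))   # informed guess
```

Under this assumed penalty a blind guess has an expected mark of zero, while a guess made after ruling out two options still has a positive expected mark, so the scheme changes the incentive to guess rather than removing it.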
4) Results
‘Factorial’ choices
[Scenario describing study and regression line Cholesterol(mg) = 29.11 + 2.77 × Fat(g)]
Which one of the following is correct?
a) 1g greater daily fat consumption is associated with 2.77mg greater daily cholesterol consumption – 77%
b) 1g greater daily fat consumption is associated with a 2.77-times greater daily cholesterol consumption – 10%
c) 1mg greater daily cholesterol consumption is associated with 2.77g greater daily fat consumption – 7%
d) 1mg greater daily cholesterol consumption is associated with a 2.77-times greater daily fat consumption – 6%
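The distinction between the additive and the multiplicative reading of the slope can be checked with the regression line quoted in the scenario. The fat intake of 50 g used below is a hypothetical value chosen only for illustration.

```python
INTERCEPT = 29.11  # mg cholesterol at 0 g fat, from the regression line
SLOPE = 2.77       # mg cholesterol per additional g of fat


def predicted_cholesterol(fat_g: float) -> float:
    """Predicted daily cholesterol (mg) for a given daily fat intake (g)."""
    return INTERCEPT + SLOPE * fat_g


fat = 50.0  # hypothetical fat intake in grams (an assumption)
diff = predicted_cholesterol(fat + 1) - predicted_cholesterol(fat)
ratio = predicted_cholesterol(fat + 1) / predicted_cholesterol(fat)

print(f"difference for +1 g fat: {diff:.2f} mg")
print(f"ratio for +1 g fat: {ratio:.3f}")
```

The difference is 2.77 mg whatever starting value is chosen, whereas the ratio stays close to 1, which is why the ‘2.77-times greater’ options are incorrect.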
High discrimination
[Scenario describing study]
Which one of the following would tell us about the clinical significance of the difference in weight between the two groups at the end of the study?
a) Mean difference – 25%
b) Standard error – 6%
c) P-value – 13%
d) None of the above – 55%
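The slides do not say how discrimination was measured. One common choice is the upper-lower discrimination index, sketched below as an assumption: the proportion answering the item correctly among the highest-scoring students minus the proportion among the lowest-scoring students. The data are made up purely for illustration.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item.

    item_correct: 1/0 flags, one per student, for this item.
    total_scores: each student's total test score, in the same order.
    fraction: share of students placed in each of the two extreme groups.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Made-up results for ten students (not data from the talk).
totals = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(discrimination_index(item, totals))
```

Values near 1 indicate that strong students answer the item correctly and weak students do not; values near 0 indicate that the item does not separate the two groups.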
High discrimination
[Scenario describing study]
Calculate a 95% confidence interval for the mean difference in body weight between the two groups. Give as your answer the lower limit only.
Answer: ____ kg
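A calculation of this kind can be sketched as follows. The summary statistics are hypothetical, and the sketch assumes a normal-approximation interval with unpooled standard errors; the course may instead teach a t-based interval, which would give slightly wider limits.

```python
from statistics import NormalDist


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Normal-approximation confidence interval for mean1 - mean2."""
    diff = mean1 - mean2
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical group summaries in kg (assumptions, not data from the talk).
lower, upper = diff_ci(82.0, 10.0, 100, 79.0, 10.0, 100)
print(f"95% CI for the mean difference: {lower:.2f} to {upper:.2f} kg")
```

Asking for the lower limit only gives a single numerical answer that can be marked automatically, ideally within a stated rounding tolerance.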
Conclusions
MC questions can be used in the assessment of a statistics
module, but:
• Be aware of their limitations
• Don’t be too ambitious
• Don’t use them as a labour-saving convenience
• Don’t use them in isolation: “Identify what mixture of [assessment] formats will yield the best possible combined effect” (Martinez, 1999)
• “Never lose track of the main purpose of assessment: to improve learning” (Garfield, 1994)
References
Biggs, J. and Tang, C. (2007). Teaching for quality learning at university. Maidenhead:
McGraw-Hill/Society for Research into Higher Education & Open University Press.
Curzon, L.B. (2003). Teaching in Higher Education, 6th edition. London: Continuum.
Garfield, J.B. (1994). Beyond testing and grading: using assessment to improve student
learning. Journal of Statistics Education 2(1).
Gould, S.J. (1982). A nation of morons. New Scientist 6: 349-352.
Martinez, M.E. (1999). Cognition and the question of test item format. Educational
Psychologist 34(4): 207-218.
Race, P. (1999) (ed.). 2000 tips for lecturers. London: Kogan Page.
Scouller, K. (1998). The influence of assessment method on students' learning
approaches: Multiple choice question examination versus assignment essay. Higher
Education 35(4): 453-473.
Simkin, M.G. and Kuechler, W.L. (2005). Multiple-choice tests and student
understanding: what is the connection? Decision Sciences Journal of Innovative
Education 3(1): 73-97.
Thelwall, M. (2000). Computer-based assessment: a versatile educational tool.
Computers and Education 34(1): 37-49.
Van de Watering, G., Gijbels, D., Dochy, F. and van der Rijt, J. (2008). Students’
assessment preferences, perceptions of assessment and their relationships to study
results. Higher Education 56: 645-658.