Constructing Multiple Choice Question

Constructing Multiple Choice Question-Based Examinations
Kathy Chessman, PharmD, FCCP, BCPS, BCNSP
Professor, Clinical Pharmacy and Outcome Sciences
South Carolina College of Pharmacy, MUSC campus

Objectives
At the end of this session, the participant will be able to:
1) Construct exams using principles that improve reliability and validity.
2) Write multiple choice items that meet the four criteria for critical thinking test items.
3) Write test items that are professional and meet generally accepted guidelines.
Types of Exams
Achievement test: knowledge based
  Systematic procedure for measuring a set of learning objectives
Performance test: skills based
  Laboratory skills (e.g., compounding)
  Problem-solving skills (e.g., PN ordering)
  Communication skills (e.g., counseling)

Achievement Tests
Tests prior to instruction (pretest)
  Readiness: Do students have skills and abilities needed for new material?
  Placement: Have students already achieved the intended learning objectives?
Tests during instruction (midpoint)
  Formative tests: Monitor student progress during instruction
  Diagnostic tests: Determine learning or teaching problems
Achievement Tests
End of instruction (post-test)
  Summative testing
    Given at end of a period of instruction to certify mastery and/or assign a grade
    Assesses all learning objectives
    Covers breadth of learning experience

Guidelines for Achievement Tests
Measure a representative sample of all learning tasks
Provide scores free from measurement errors
  Provide consistent & reliable results
  Avoid factors contributing to measurement errors:
    Ambiguity
    Testing with too few questions
    Subjectivity
    Variation in students’ attention, effort, fatigue
    Tendency to guess during a test
    Not allowing enough time to complete
Measure clearly defined learning outcomes (i.e., course or lecture objectives)
Generally test at the application level or higher (i.e., critical thinking)
  Not just measuring memorized facts and explanations
Include types of test items that best measure the intended learning outcomes
Tests as Teaching Tools
Improve student studying and motivation
  Provide short-term goals
  Clarify tasks and information to be learned
Aid retention and learning
  Learning at the understanding, application, and interpretation levels is retained longer
Aid student self-evaluation
Aid in evaluation of instructional effectiveness
Aid in evaluation of instructor
*Exam reviews can be very valuable as teaching tools*

Planning/Constructing a Test
Step 1: Determine the purpose
Step 2: Review expected learning outcomes
  What instructional objectives were given to the students? Test on all objectives.
  What cognitive level* should be tested? Predetermined by objectives.
*Bloom’s levels of cognition, 1956
The Cognitive Domain*
Intellectual abilities and skills
Knowledge
  Remembering previously learned material
Comprehension
  Grasping the meaning of the material
Application
  Using information in concrete situations
Analysis
  Breaking down material into its parts
Synthesis
  Putting parts together into a whole
Evaluation
  Judging the value of a given thing for a given purpose using definitive criteria
*Bloom’s levels of cognition, 1956

Illustrative Action Verbs for Defining Objectives
Knowledge
  Identify, name, state, define, describe, list, match
Comprehension
  Classify, explain, summarize, convert
Application
  Demonstrate, compute, solve, modify
Analysis
  Differentiate, diagram, estimate
Synthesis
  Combine, create, formulate, design
Evaluation
  Judge, critique, compare, justify, redesign

Step 3: Determine Test Specifications
Weight the learning objectives and content areas in terms of their relative importance and emphasis in class
Decide on distribution of test items
  Distribute based on relative importance
  Distribute based on difficulty
  Distribute based on time spent in class

Step 4: Select Item Types
Selection type
  Multiple choice
    One correct answer
    Multiple response (e.g., K-type)
  True-false
  Matching
Supply type
  Short-answer
  Essay (restricted or extended response)

Step 4: Select Item Types
Use the most direct measure of student performance specified by the intended learning objective/task
  Can they pick the best antibiotic for an infection? Have them pick an antibiotic for an infection.
  Can they use drug-drug interaction knowledge? Give them a case with a DI that determines treatment options.
  Can they write a PN order? Have them write a PN order.
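
The weighting described in Step 3 can be turned into a per-objective item count with a few lines of arithmetic. The sketch below is one possible approach, not a method prescribed by this session: it assumes the weights are simple numbers (for example, percent of class time) and uses the largest-remainder method so the counts always add up to the planned test length. The objective names, the weights, and the function name `allocate_items` are hypothetical.

```python
def allocate_items(weights, total_items):
    """Split total_items across objectives in proportion to their weights.

    Uses the largest-remainder method so the counts always add up to
    total_items even when the proportional shares are not whole numbers.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Exact proportional share per objective (usually fractional).
    shares = {name: total_items * w / total_weight for name, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {name: int(share) for name, share in shares.items()}
    leftover = total_items - sum(counts.values())

    # Give the remaining items to the objectives with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda name: shares[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical weights: percent of class time spent on each objective.
    weights = {
        "select an antibiotic": 50,
        "manage a drug-drug interaction": 30,
        "write a PN order": 20,
    }
    for objective, n in allocate_items(weights, total_items=33).items():
        print(f"{objective}: {n} items")
```

The same function works whether the weights reflect importance, difficulty, or time spent in class; only the numbers passed in change.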
Step 5: Determine the Number of Items
Develop a blueprint from weights
Allow for a sufficient sample
Consider practical constraints
  Time period for test
    All students should be able to attempt all items within the time available
  Estimated time per question
    Simple multiple choice: 30 seconds
    Computation or complex case: longer

Steps 6 and 7: Construct the Items and Exam
Step 6: Construct quality items
Step 7: Assemble the exam
  Arrange the selected items
    Items that measure the same objective should be grouped together
    Items of the same type should be grouped together
    Items should be arranged in order of increasing difficulty (easy to hard)

Steps 8 and 9: Administer and Evaluate the Exam Results
Step 8: Administer the exam
Step 9: Evaluate the exam
  Validity: to what extent is the interpretation of the scores appropriate, meaningful, and useful?
    We want test scores to serve their intended purpose: test the students’ learning
  Reliability: to what extent will the test score be free from errors of measurement?
    Consistent from class to class, year to year
  Effectiveness of each alternative
Step 9: Evaluating Item Performance
Difficulty index: proportion of the students answering correctly – range: 0 – 1
  Most instructors choose to review all items for which the difficulty index is ≤ 0.5
Discrimination index: correlation between the answering of this item and the overall performance on the test – range: 0 – 1
  Did the students who did well overall answer this item correctly, while those who did poorly overall missed it? Discrimination index would be closer to 1.
  Did the students who did poorly overall answer this item correctly just as often as those who did well? Discrimination index would be closer to 0.
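
Both indices can be computed directly from a scored response sheet. The sketch below is illustrative only and makes several assumptions: each item is scored 1 (correct) or 0 (incorrect), the discrimination index is taken as the point-biserial (Pearson) correlation between the item score and the total exam score, and the function names and sample data are hypothetical. Note that a correlation computed this way can fall below 0, which would flag an item that weaker students answered correctly more often than stronger ones.

```python
from statistics import mean, pstdev


def difficulty_index(item_scores):
    """Proportion of students who answered the item correctly (0 to 1)."""
    return mean(item_scores)


def discrimination_index(item_scores, total_scores):
    """Pearson correlation between item score (0/1) and total exam score."""
    sd_item = pstdev(item_scores)
    sd_total = pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        # Everyone scored the same; the correlation is undefined, so report 0.
        return 0.0
    mean_item = mean(item_scores)
    mean_total = mean(total_scores)
    covariance = mean(
        [(x - mean_item) * (y - mean_total) for x, y in zip(item_scores, total_scores)]
    )
    return covariance / (sd_item * sd_total)


def needs_review(item_scores, threshold=0.5):
    """Flag items at or below the review threshold mentioned in the text."""
    return difficulty_index(item_scores) <= threshold


if __name__ == "__main__":
    # Hypothetical data: six students, one item, and their total exam scores.
    item = [1, 1, 1, 0, 0, 0]
    totals = [9, 8, 7, 5, 4, 3]
    print("difficulty:", difficulty_index(item))
    print("discrimination:", round(discrimination_index(item, totals), 2))
    print("review?", needs_review(item))
```

In this sketch the total score includes the item being analyzed; some instructors prefer to exclude it, which slightly lowers the correlation on short exams.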
Miscellaneous Issues
Repeated use of an exam or individual test item is valid and encouraged
  Allows for item improvement and validation over time
  Allows for development of question or item banks
  Improves reliability
Frequency of exams
  No magic formula
  Advantages of frequent testing:
    More reliable basis for evaluation
    Keeps instructor and students better informed on progress
    Less penalty for doing poorly on one exam

Step 6: Writing Effective Multiple Choice Questions
Goal of Item Writing
High quality items
  Reward the students who KNOW the material, not just those who KNOW how to take a test
    Do not confuse
    Do not penalize
    Do not reward by inadvertently giving away the correct answer
    Do not test trivia
  Test achievement of the stated objectives
  Test important material that they need to know

It’s Not as Simple as You Think
Don’t wait until the last minute
Try out your questions on others
  Unintended violations of item-writing principles
  Multiple meanings
  Reading your mind
  Ambiguities in wording
Writing critical thinking questions requires good sound clinical expertise
Types of Multiple Choice Questions
A-type questions – most common and most often recommended by experts
  Stem with a lead-in question followed by 3 to 5 choices with 1 correct choice and 3-4 plausible, but incorrect, distractors
True/False questions
Multiple response questions (K-type)
Matching items
Four Criteria for Critical Thinking MC Questions
Include the rationale for each test item, include why the correct answer is correct and why each incorrect answer is incorrect.
Write questions at the application or above cognitive level.
Require multilogical thinking to answer questions.
Require a high level of discrimination to choose from plausible alternatives.
Paul R. Critical thinking: what every person needs to survive in a rapidly changing world. 1993.

Levels of Knowledge Tested
Multiple choice questions CAN assess higher-level cognitive skills such as application of knowledge and problem-solving.
Avoid simple rote memory questions
  Should comprise ≤ 5% of test
  Essential facts are best incorporated into the problem-solving process

Parts of the MC item
The stem
The responses
  One correct or one best answer
  Foils or distractors
Key to good questions is being able to write high-quality plausible distractors that effectively discriminate those who have mastered the material from those who have not
Correct vs. Best Answer
Which of the following drugs reduces transient relaxation of the lower esophageal sphincter?
A. Metoclopramide
B. Sucralfate
C. Cisapride
D. Baclofen
Knowledge level question.

Correct vs. Best Answer
A patient in the STICU is receiving PN providing approximately 60% of his estimated calorie needs. The CHO:fat ratio is 50:50, and the GIR = 2 mg/kg/min. This morning’s 0400 labs reveal a blood glucose concentration of 280 mg/dL. Which of the following is the MOST appropriate action?
A. Start an insulin drip titrated to glucose 140-180 mg/dL.
B. Decrease the amount of glucose in the next PN bag.
C. Add insulin 10 units to the next PN bag.
D. Stop the PN; hang 0.9% NaCl at the same rate.
Synthesis level question.

Item structure
Write as a direct question. (There is punctuation at the end of the sentence; not the answers.)
Who is the scientist most closely associated with the discovery of the polio vaccine?
A. Jonas Salk
B. Louis Pasteur
C. Edward Jenner
D. Robert Koch
Knowledge level question.

Item structure
Write as an incomplete sentence. (Note: there is punctuation at the end of the choices to complete the sentence.)
The polio vaccine was discovered by:
A. Jonas Salk.
B. Louis Pasteur.
C. Edward Jenner.
D. Robert Koch.
Knowledge level question.
Stem
Should be clear enough that the type of answer and maybe even the answer can be anticipated before looking at the responses.

Stem development: Poor Wording
Diabetes is:
A. A disorder associated with the body’s absorption of glucose.
B. A digestive disorder which afflicts adults.
C. A potentially fatal disorder if not detected early.
D. A disease that can result in blindness.
Represents essentially 4 True/False statements.
Knowledge level question.

Stem development: Poor Wording
Which of the following is TRUE about pseudogout?
A. It occurs frequently in women.
B. It may be associated with a finding of chondrocalcinosis.
C. It is clearly hereditary in most cases.
D. It responds well to treatment with allopurinol.
Represents essentially 4 True/False statements.
Knowledge level question.

Stem Development: Better Wording
Pseudogout is characterized by which of the following?
A. Frequent occurrence in women.
B. Chondrocalcinosis.
C. Genetic inheritance.
D. Good therapeutic response to allopurinol.
Knowledge level question.
Stem Development: Poor
The stem should include all words that would otherwise be repeated in the responses.
The process whereby amniotic fluid is removed from a pregnant woman to test for possible birth defects:
A. Is known as amniocentesis.
B. Is known as cesarean section.
C. Is known as embryonic analysis.
D. Is known as fetal catheterization.
Knowledge level question.

Stem Development: Better
The process whereby amniotic fluid is removed from a pregnant woman to test for possible birth defects is known as:
A. amniocentesis.
B. cesarean section.
C. embryonic analysis.
D. fetal catheterization.
Knowledge level question.

Stem Development
Avoid negatively stated item stems (items that require the choice of an incorrect answer rather than a correct one) to avoid confusion.
Remember, you are NOT trying to trick them into choosing a wrong answer.
  Each of the following EXCEPT
  Which of the following is NOT
  Especially important not to go back and forth between positively and negatively stated stems.
Stem Development: Poor
Which of the following drugs is NOT an anticoagulant?
A. Warfarin
B. Heparin
C. Enoxaparin
D. Protamine
Knowledge level question.

Stem Development: Better
Which of the following can be used to reverse anticoagulation?
A. Warfarin.
B. Heparin.
C. Enoxaparin.
D. Protamine.
Knowledge level question.
Stem – Provide authority for judgments
Poor stem:
The best way to discipline a child is to:
A. Reward good behavior.
B. Use physical punishment for serious offenses.
C. Explain to the child why an undesirable behavior displeases you.
D. Ignore misbehavior.
Better stems:
  According to the American Academy of Pediatrics, the best way to discipline a child is to:
  As stated in class by the instructor, …
  As stated in your textbook, …
  As stated on the video watched in class, …
Stem Development: Poor
In 1850, Adolphe Chattin, then professor of pharmacy in Paris, believed that goiter resulted from an inadequate amount of iodine in the diet. The thyroid is part of which body system?
A. Endocrine
B. Cardiovascular
C. Kidney
D. Gastrointestinal
Knowledge level question.

Stem Development
Do not include unnecessary information in the stem. Be succinct.
Keep to the point. Limit information provided to that necessary to evaluate and answer the question.
  Verbosity, window dressing, and red herrings do not make better tests.
  The test is not the time to introduce new material because you didn’t have time to get to it in class.

Distractor Development
The correct response must clearly stand out as the one that experts in the field would recognize as the best response.
Doubt or controversy about the correct response will confuse the examinee and the item may be challengeable.
  UNLESS the controversy or doubt has been explained in class and the student is asked to comment on the controversy.
Distractor Development


All responses should be grammatically
and logically consistent with the item
stem, and all response options should
be parallel.
Lack of parallelism can sometimes make
it possible to choose the right answer
with no knowledge of the correct
answer.
Distractor Development: Poor
A 60-year-old alcoholic in status epilepticus is brought to the ER department by the police. After determining that the airway is open, the first step in management should be IV administration of
A. Glucose with thiamine.
B. CT scan of the head.
C. Phenytoin.
D. Diazepam.
Comprehension level question.
Distractor Development


Don’t make your choices too long with
too many options.
Statistically, grammatical and other
inconsistencies are more likely to occur
in the ‘wrong’ answers than the correct
answer because writers pay closer
attention to detail in the correct answer.
Distractor Development: Poor
Which of the following antibiotic drugs listed would be the BEST empiric therapy for a patient being treated for an infection presumed to be due to Staphylococcus aureus?
A. Gentamicin
B. Amphotericin B
C. Nafcillin
D. Aztreonam
E. Metronidazole
Comprehension level question.
Distractors
Distractors should represent unsafe practices or commonly held misconceptions AND should be plausible.
Avoid using distractors that even the most uninformed would recognize as being incorrect.
The use of humorous or absurd distractors is not appropriate for academic tests, but may be OK occasionally in other tests, such as in-class quizzes.
Graded responses
  Give 100% credit for the best answer; 50% credit for a correct answer, but not the best answer; 0% or even negative points if the choice is dangerous
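
A graded-response key like the one described above can be scored with a simple lookup per item. This is a minimal sketch under stated assumptions: each item's key is a mapping from option letter to fractional credit, any option not listed earns 0, and the size of the penalty for a dangerous choice (here -0.5) is a hypothetical value chosen for illustration, since the text only says "0% or even negative points". The function names are also hypothetical.

```python
def score_item(choice, credit_by_option):
    """Credit earned for one item; options not listed in the key earn 0."""
    return credit_by_option.get(choice, 0.0)


def score_exam(responses, answer_keys):
    """Sum the graded credit across all items on the exam."""
    if len(responses) != len(answer_keys):
        raise ValueError("need exactly one response per item")
    return sum(score_item(c, key) for c, key in zip(responses, answer_keys))


if __name__ == "__main__":
    # Hypothetical two-item key: full credit for the best answer, half credit
    # for a correct-but-not-best answer, a penalty for a dangerous choice.
    answer_keys = [
        {"A": 1.0, "B": 0.5, "D": -0.5},
        {"C": 1.0},
    ]
    print(score_exam(["B", "C"], answer_keys))
    print(score_exam(["D", "C"], answer_keys))
```

Keeping the key as data rather than logic makes it easy to review with colleagues before the exam is given.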
Distractor Development
Avoid phrasing the correct answer directly from a textbook/handout.
  Usually more technical than other distractors.

Stem Development: Poor
A 12-year-old girl with sickle cell disease is to be started on an oral antibiotic for S. pneumoniae prophylaxis. What antibiotic would you recommend?
A. Cefazolin
B. Cefotaxime
C. Penicillin VK
D. Cefpodoxime
Do not use “what would you do” or “what do you believe”.
  Technically, all available choices are correct. You never asked them to do the ‘correct’ or ‘best’ thing.
Uniqueness
Make sure that answers do not overlap and are, therefore, unique

Distractor Development: Poor
The right to vote in the US is granted to individuals of what age?
A. 18
B. 16
C. 17
D. 19
Inserting “first” (“is first granted”) makes the answer unique.
Synthesis level question.

Example
After discussing the facts about medroxyprogesterone acetate injection (Depo-Provera®) with your patient, you instruct her to return to clinic for her first injection:
A. Within 5 days of the start of her next menstrual period.
B. Within 10 days of the start of her next menstrual period.
C. Within 7 days of the start of her next menstrual period.
D. Anytime.
Application level question.

Distractor Development
Do not use “All of the above”.
  Technically each one of the answers is a correct answer.
Use “None of the above” only in questions which involve some type of exact calculation.
Don’t use absolute terms (e.g., always, all, never).
Avoid vague terms (e.g., usually, often, rarely, frequently).
Distractors

Avoid giving answers away

Make distractors of similar length.



Don’t make the correct answer much shorter, longer, or
more technical. If not possible, try to balance: 2 short
and 2 long distractors.
Do not put the same key words or descriptive
words both in the stem and in the correct answer
but not in the other distractors.
Do not make the correct answer clear and concise
and the wrong ones vague and ambiguous.
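
Several of the wording rules above are mechanical enough to screen for before a colleague reviews the items. The sketch below is a rough aid, not a substitute for that review. It assumes negative stems are marked with capitalized NOT or EXCEPT (as in this handout's examples), takes the absolute and vague word lists straight from the examples given here, and uses an arbitrary 1.5x length ratio to flag a conspicuously long correct answer. The function name `lint_item` and that threshold are hypothetical.

```python
import re
from statistics import mean
from string import ascii_uppercase

ABSOLUTE_TERMS = {"always", "all", "never"}
VAGUE_TERMS = {"usually", "often", "rarely", "frequently"}


def lint_item(stem, options, answer_index, length_ratio=1.5):
    """Return a list of warnings for one multiple choice item."""
    warnings = []
    if re.search(r"\b(NOT|EXCEPT)\b", stem):
        warnings.append("negatively stated stem")

    for letter, option in zip(ascii_uppercase, options):
        text = option.lower()
        if "all of the above" in text:
            warnings.append(f"{letter}: uses 'All of the above'")
            continue
        if "none of the above" in text:
            warnings.append(f"{letter}: 'None of the above' (exact calculations only)")
            continue
        words = set(re.findall(r"[a-z]+", text))
        if words & ABSOLUTE_TERMS:
            warnings.append(f"{letter}: absolute term")
        if words & VAGUE_TERMS:
            warnings.append(f"{letter}: vague term")

    # Assumed heuristic: the key should not be much longer than the distractors.
    other_lengths = [len(o) for i, o in enumerate(options) if i != answer_index]
    if other_lengths and len(options[answer_index]) > length_ratio * mean(other_lengths):
        warnings.append("correct answer is much longer than the distractors")
    return warnings


if __name__ == "__main__":
    stem = "Which of the following drugs is NOT an anticoagulant?"
    options = ["Warfarin", "Heparin", "Enoxaparin", "Protamine"]
    for warning in lint_item(stem, options, answer_index=3):
        print(warning)
```

A clean result only means none of these surface patterns were found; plausibility, parallel structure, and clinical accuracy still need a human reader.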
Distractor Development: Poor
Avoid word repeats from stem to distractor
A 58-year-old man with a history of heavy alcohol use and previous psychiatric hospitalization is confused and agitated. He speaks of experiencing the world as unreal. This symptom is called
A. Derealization
B. Depersonalization
C. Derailment
D. Focal memory deficit
Knowledge level question.
Stem and distractors: case clusters
If you have a case or problem set with more than one question asked for the case,
  Don’t give away the answer to one question with a subsequent question stem or distractor.
  Don’t put the examinee in double jeopardy.
    They have to get Question 1 correct before they can answer Question 2
Examples
A 1-year-old boy has a 3-day history of runny nose and barking cough that worsens at night. On physical exam, he has a temp of 101°F, a ‘nontoxic’ appearance, inspiratory stridor, mild pharyngitis, and a normal epiglottis. An anteroposterior neck X-ray is pending. His arterial blood gas results included a PaO2 of 70 mm Hg. Which of the following recommendations would be most appropriate?
A. Start oxygen, mist therapy, and nebulized albuterol.
B. Start oxygen, mist therapy, and nebulized racemic epinephrine.
C. Start oxygen and ribavirin therapy.
D. Start oxygen and cefuroxime therapy.
Application level question.

Examples
Following your recommendations above, JB continues to have stridor. What therapy would you add?
A. Ribavirin
B. Aminophylline infusion
C. Dexamethasone 1 dose
D. Inhaled albuterol
E. Cefuroxime
Application level question.

Stem and distractors
Avoid the use of language which is unnecessarily technical or unfamiliar to appropriately trained examinees. Always take into consideration the relative reading level and level of knowledge of people taking the test.
  Be willing to give definitions to some words if asked.
Other things to avoid
Inconsistent numerical data
  Numerical data is best arranged in ascending or descending order
  If on-line testing, don’t use random answer function
Short stems with long answers
Being “tricky”

Item evaluation - Summary
General considerations
  Does it test something relevant to work in the field? Does it test a learning objective?
  Does it reflect current best practice in the field?
  Is the stem clear?
  Are the context, setting and content appropriate for all takers?
  Is the stem free of offensive language and free from stereotype reinforcement?
The stem
  Is it easy to read?
  Is sentence structure simple and direct?
  Are there any ambiguous or fuzzy words or terms?
  If technical terms are used, are they the most generally accepted terms or the ones discussed in class?
  Is the premise positively stated?
Distractors
  Does each follow grammatically and logically from the stem?
  Are the choices parallel in structure, terminology, length, and content?
  Have like terms been used in the correct answer and in the distractors?
  Is there only one correct (or best) choice?
Useful Resources
Gronlund NE. (1988). How to Construct Achievement Tests. Prentice-Hall, Inc: Englewood Cliffs, NJ.
Ebel RL. (1972). How to Plan a Classroom Test. In Essentials of Educational Measurement. Prentice-Hall, Inc: Englewood Cliffs, NJ, pp. 97-122.
Morrison S, Nibert A, Flick J. (2006). Critical Thinking and Test Item Writing, 2nd ed. Health Education Systems, Inc: Houston, TX.
Item Writing Manual. The National Board of Medical Examiners, 3rd ed. 1998. Found at www.nbme.org.
The school of hard knocks.

Questions?