Guidelines for Post-Exam Review of Questions

The purpose of all academic testing is to promote and document student learning. Most students
need clear short-term incentives to compel them to spend the time and effort necessary to learn
the immense body of knowledge and skills required to become an effective physician. They also
need periodic feedback telling them to what degree their study efforts are successful and where
they need to devote more attention. Finally, instructors and medical schools must make certain
that learning is taking place so that their graduates will be adequately prepared for their
residencies and medical careers.
Level of Discrimination:
Effective exam questions discriminate between different levels of learning so that students who
have learned more material at a higher level are rewarded with higher grades. The most accurate
measure of a question’s level of discrimination is its point biserial correlation.
What is a point biserial correlation?
A point biserial correlation is a measure of association between a continuous variable and a
binary variable. It is constrained to be between -1 and +1.
In an item analysis, the binary variable is individual students’ correct or incorrect answer to a
question item while the continuous variable is individuals’ performance on an exam as a whole.
Sample Exam Results:
Student      Item # 5 (1=correct)   Overall Score
Student A    1                      20
Student B    1                      20
Student C    0                      18
Student D    1                      17
Student E    1                      17
Student F    1                      16
Student G    0                      16
Student H    1                      15
Student I    1                      15
Student J    1                      15
Student K    0                      14
Student M    0                      11
Item-Analysis:
Item   Point Biserial   Total Group %   Upper 27%   Lower 27%
5      .42              66.7%           66.7%       33.3%

A Freq   B Freq   C Freq   D Freq   NonDistract
1        8        3        0        D
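The statistics reported for Item 5 can be reproduced from the sample exam results. The Python sketch below is illustrative only and rests on stated assumptions: the point biserial is computed against the uncorrected total score (the item itself counts toward the total) using the population standard deviation, and the upper and lower 27% groups are taken as the top and bottom three of the twelve students in the order listed. The function names are hypothetical and not part of any DCOM tool.

```python
from math import sqrt

# (student, item 5 correct?, overall score) transcribed from the sample results
results = [
    ("A", 1, 20), ("B", 1, 20), ("C", 0, 18), ("D", 1, 17),
    ("E", 1, 17), ("F", 1, 16), ("G", 0, 16), ("H", 1, 15),
    ("I", 1, 15), ("J", 1, 15), ("K", 0, 14), ("M", 0, 11),
]

def point_biserial(item, total):
    """Point biserial between a 0/1 item column and total exam scores.

    Assumes at least one correct and one incorrect answer, and scores
    that are not all identical (otherwise a division by zero occurs).
    """
    n = len(total)
    mean_all = sum(total) / n
    # Population standard deviation of total scores (divide by n)
    sd = sqrt(sum((x - mean_all) ** 2 for x in total) / n)
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    p = len(right) / n  # proportion answering correctly
    mean_diff = sum(right) / len(right) - sum(wrong) / len(wrong)
    return mean_diff / sd * sqrt(p * (1 - p))

def percent_correct(item):
    """Percentage of students answering the item correctly."""
    return 100 * sum(item) / len(item)

item = [r[1] for r in results]
total = [r[2] for r in results]

# Upper/lower 27% groups: round(0.27 * 12) = 3 students each.
# sorted() is stable, so tied scores keep the order listed in the table.
k = round(0.27 * len(results))
ranked = sorted(results, key=lambda r: r[2], reverse=True)
upper = [r[1] for r in ranked[:k]]
lower = [r[1] for r in ranked[-k:]]

r_pb = point_biserial(item, total)
print(round(r_pb, 2))                    # 0.42
print(round(percent_correct(item), 1))   # 66.7
print(round(percent_correct(upper), 1))  # 66.7
print(round(percent_correct(lower), 1))  # 33.3
```

With these assumptions the sketch matches the reported values. A different tie-breaking rule among the three students who scored 15 could change the lower-group percentage.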
Approved on Nov. 7, 2007
What does it tell us?
Because higher-scoring students have more correct answers overall, they naturally have a
higher probability of getting any given question item on the exam correct. A positive point biserial
indicates that this expected relationship exists between correct answers to an item and total exam
scores, while a negative point biserial shows an inverse relationship (i.e. lower-scoring students were
more likely to get it correct). Hence, negatively discriminating questions should in most cases be
eliminated or amended because they very likely fail to accurately measure students’ learning of
the targeted knowledge and skills.
Level of Difficulty:
To effectively measure different levels of student learning, exams must include questions of
varying levels of difficulty. Statistically, difficulty is usually expressed as the proportion of
students failing to answer an item correctly; however, DCOM uses the reverse statistic, the
percentage of students answering correctly. Within the realm of medical education, test
questions should fall within the 50% to 90% range for percentages of correct answers.
There are many factors that can contribute to students getting an item incorrect. Some of the most
obvious include:
• The question is ambiguous.
• The question has no or more than one correct answer.
• The content was not adequately taught.
• Students were unaware the content was important.
• The question contained cues that misdirected students.
• Students missed an essential element of the question.
• Students were confused by a question’s logic (e.g. double negatives).
• Students chose not to study the material.
Though it is usually impossible to firmly determine why students could not correctly answer a
question item, enough factors are beyond students’ control that it is usually warranted to
eliminate questions below 40% (i.e. 60% of students missing the question).
Post-Exam Review
While reviewing post-exam item analysis, it is important to look at all the data available rather
than rely on single data points. The matrix below provides some general standards in using
item-analysis statistics.
DCOM Suggested Guidelines for Reviewing and Eliminating Question Items:
Total % Correct   Biserial >.3                         Biserial .3 - .15   Biserial .149 - 0   Biserial <0
30%-0%            Eliminate                            Eliminate           Eliminate           Eliminate
50%-30.1%         Review - Eligible for extra credit   Review              Review              Eliminate
80%-50.1%         Ok                                   Ok                  Review              Eliminate
100%-80.1%        Ok                                   Ok                  Ok                  Review*
* Although negative-discriminating questions with high percentages of students answering correctly are usually not
discarded for practical reasons, they should probably not be used again on future exams.
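The suggested guidelines amount to a two-way lookup on percentage correct and point biserial. The following Python sketch shows one way to encode that lookup. It is an illustration under assumptions, not a DCOM tool: the function name is hypothetical, the 30%-0% band is read as "Eliminate" in every column, "Review - Eligible for extra credit" is read as belonging to the 50%-30.1% band with a biserial above .3, and a biserial falling between .149 and .15 is treated as part of the .149 - 0 column.

```python
# Rows: percentage-correct bands, lowest first.
# Columns: biserial >.3, .3-.15, .149-0, <0.
# ASSUMPTION: cell placement in the two lowest bands follows the reading
# described in the lead-in; adjust the matrix if the official table differs.
ACTIONS = [
    ["Eliminate", "Eliminate", "Eliminate", "Eliminate"],              # 30%-0%
    ["Review - Eligible for extra credit", "Review", "Review",
     "Eliminate"],                                                     # 50%-30.1%
    ["Ok", "Ok", "Review", "Eliminate"],                               # 80%-50.1%
    ["Ok", "Ok", "Ok", "Review*"],                                     # 100%-80.1%
]

def suggested_action(percent_correct, biserial):
    """Return the suggested action for one item (hypothetical helper)."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if not -1 <= biserial <= 1:
        raise ValueError("biserial must be between -1 and +1")

    # Pick the row from the percentage of students answering correctly.
    if percent_correct <= 30:
        row = 0
    elif percent_correct <= 50:
        row = 1
    elif percent_correct <= 80:
        row = 2
    else:
        row = 3

    # Pick the column from the point biserial.
    if biserial > 0.3:
        col = 0
    elif biserial >= 0.15:
        col = 1
    elif biserial >= 0:
        col = 2
    else:
        col = 3

    return ACTIONS[row][col]

# Item 5 from the sample item analysis: 66.7% correct, point biserial .42
print(suggested_action(66.7, 0.42))  # Ok
```

The result is only a starting point for review. As the guidelines note, the statistics flag non-performing items but seldom indicate what the problem is.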
It is important to keep in mind that item-analysis statistics can identify non-performing question
items but they seldom indicate what the problem is. While the Group Exam Question-Feedback
Forms can provide significant insights, peer review/discussion and instructor self-reflection are
frequently the most important factors in deciding to eliminate or amend question items.
Post-Exam Item Modification Options:
• Change the correct answer (if the answer was mis-keyed)
• Add an additional correct answer (only if a second answer is truly correct)
• Eliminate a question from the exam (no points given for correct answer)
• Make a question an extra-credit item (only if it is a good discriminator); point given for
  correct answer but the number of total possible points dropped by one
The Practical Implications of Eliminating Questions
Individual raw scores    If question left   If question   If question is made
(initially out of 20)    unchanged          eliminated    extra-credit
19*                      95%                94.7%         100%
18                       90%                94.7%         94.7%
17*                      85%                84.2%         89.5%
17                       85%                89.5%         89.5%
15                       75%                78.9%         78.9%
Mean %                   86%                88.4%         90.5%
* Indicates students with a correct answer.
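The percentages in the table follow from simple arithmetic on the raw scores. The Python sketch below reproduces them under the rules stated in the modification options: eliminating a question removes the point from students who answered it correctly and lowers the total to 19, while an extra-credit item keeps the point and still lowers the total to 19. The function and variable names are illustrative.

```python
TOTAL = 20  # points originally possible

# (raw score out of 20, answered the reviewed question correctly?)
students = [(19, True), (18, False), (17, True), (17, False), (15, False)]

def unchanged(raw, correct):
    """Question stays on the exam as scored."""
    return 100 * raw / TOTAL

def eliminated(raw, correct):
    """Question dropped: no point for a correct answer, total is 19."""
    return 100 * (raw - (1 if correct else 0)) / (TOTAL - 1)

def extra_credit(raw, correct):
    """Question made extra credit: point kept, total is 19."""
    return 100 * raw / (TOTAL - 1)

def mean(values):
    return sum(values) / len(values)

for rule in (unchanged, eliminated, extra_credit):
    scores = [rule(raw, correct) for raw, correct in students]
    print(rule.__name__,
          [round(s, 1) for s in scores],
          "mean:", round(mean(scores), 1))
# unchanged    [95.0, 90.0, 85.0, 85.0, 75.0] mean: 86.0
# eliminated   [94.7, 94.7, 84.2, 89.5, 78.9] mean: 88.4
# extra_credit [100.0, 94.7, 89.5, 89.5, 78.9] mean: 90.5
```

Note that eliminating the question lowers the percentage of a student who answered it correctly (85% to 84.2% for the student with 17*), whereas making it extra credit never lowers anyone's percentage.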