Guidelines for Post-Exam Review of Questions

The purpose of all academic testing is to promote and document student learning. Most students need clear short-term incentives to spend the time and effort necessary to learn the immense body of knowledge and skills required to become an effective physician. They also need periodic feedback telling them how successful their study efforts are and where they need to devote more attention. Finally, instructors and medical schools must make certain that learning is taking place so that their graduates will be adequately prepared for their residencies and medical careers.

Level of Discrimination: Effective exam questions discriminate between different levels of learning, so that students who have learned more material at a higher level are rewarded with higher grades. The most accurate measure of a question’s level of discrimination is its point biserial.

What is a point biserial correlation? A point biserial correlation is a measure of association between a continuous variable and a binary variable. It is constrained to be between -1 and +1. In an item analysis, the binary variable is each student’s correct or incorrect answer to a question item, while the continuous variable is each student’s performance on the exam as a whole.

Sample Exam Results:

Student      Item #5 (1=correct)   Overall Score
Student A    1                     20
Student B    1                     20
Student C    0                     18
Student D    1                     17
Student E    1                     17
Student F    1                     16
Student G    0                     16
Student H    1                     15
Student I    1                     15
Student J    1                     15
Student K    0                     14
Student M    0                     11

Item-Analysis:

Item   Point Biserial   Total Group %   Upper 27%   Lower 27%   A Freq   B Freq   C Freq   D Freq   NonDistract
5      .42              66.7%           66.7%       33.3%       1        8        3        0        D

Approved on Nov. 7, 2007

What does it tell us? Higher-scoring students on an exam, by virtue of having more correct answers, naturally have a higher probability of getting any given question item on the exam correct.
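The statistics reported for Item 5 in the sample results can be reproduced from the raw data, which makes the link between correct answers and total scores concrete. The following is a minimal Python sketch under stated assumptions, not the procedure of any particular scoring software: it uses the standard point biserial formula with the population standard deviation of the total scores, leaves the item itself in each student's total, and approximates the upper and lower 27% groups as the three highest and three lowest scorers (27% of 12, rounded). All function and variable names are illustrative.

```python
from math import sqrt

# Sample exam results from the text (students A-K and M), highest scorers first.
item_5 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0]            # 1 = correct
totals = [20, 20, 18, 17, 17, 16, 16, 15, 15, 15, 14, 11]  # overall scores


def point_biserial(item, scores):
    """Point biserial correlation between a 0/1 item and total scores."""
    n = len(scores)
    mean_all = sum(scores) / n
    # Population standard deviation (divide by n). With this choice the
    # result equals the Pearson correlation between item and total score.
    sd = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    right = [s for i, s in zip(item, scores) if i == 1]
    wrong = [s for i, s in zip(item, scores) if i == 0]
    p = len(right) / n  # proportion answering correctly
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    return (mean_right - mean_wrong) / sd * sqrt(p * (1 - p))


def percent_correct(item):
    """Percentage of students in the list who answered correctly."""
    return 100 * sum(item) / len(item)


# Assumption: group size is 27% of the class, rounded to a whole student.
group = round(0.27 * len(totals))  # 3 students

r_pb = point_biserial(item_5, totals)
total_pct = percent_correct(item_5)
upper_pct = percent_correct(item_5[:group])   # three highest scorers
lower_pct = percent_correct(item_5[-group:])  # three lowest scorers

print(f"Point biserial: {r_pb:.2f}")   # 0.42
print(f"Total group %:  {total_pct:.1f}")  # 66.7
print(f"Upper 27%:      {upper_pct:.1f}")  # 66.7
print(f"Lower 27%:      {lower_pct:.1f}")  # 33.3
```

Students who answered Item 5 correctly averaged 16.9 points overall, against 14.75 for those who missed it, which is why the correlation comes out positive.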
A positive point biserial indicates that the expected relationship holds between correct answers to an item and total exam scores, while a negative point biserial shows an inverse relationship (i.e., lower-scoring students were more likely to get the item correct). Hence, negatively discriminating questions should in most cases be eliminated or amended because they very likely fail to accurately measure students’ learning of the targeted knowledge and skills.

Level of Difficulty: To effectively measure different levels of student learning, exams must include questions of varying levels of difficulty. Statistically, difficulty is usually expressed as the proportion of students failing to answer an item correctly; however, DCOM uses the reverse statistic, the percentage of students answering correctly. Within the realm of medical education, test questions should fall within the 50% to 90% range for percentage of correct answers. There are many factors that can contribute to students getting an item incorrect. Some of the most obvious include:

• The question is ambiguous.
• The question has no correct answer or more than one correct answer.
• The content was not adequately taught.
• Students were unaware the content was important.
• The question contained cues that misdirected students.
• Students missed an essential element of the question.
• Students were confused by a question’s logic (e.g., double negatives).
• Students chose not to study the material.

Though it is usually impossible to firmly determine why students could not correctly answer a question item, enough factors are beyond students’ control that it is usually warranted to eliminate questions that fewer than 40% of students answer correctly (i.e., more than 60% of students missing the question).

Post-Exam Review

When reviewing a post-exam item analysis, it is important to look at all the data available rather than rely on single data points. The matrix below provides some general standards for using item-analysis statistics.

DCOM Suggested Guidelines for Reviewing and Eliminating Question Items:

Total % Correct   Biserial >.3                         Biserial .3 - .15   Biserial .149 - 0   Biserial <0
30%-0%            Review - Eligible for extra credit   Eliminate           Eliminate           Eliminate
50%-30.1%         Review                               Review              Eliminate           Eliminate
80%-50.1%         Ok                                   Ok                  Review              Eliminate
100%-80.1%        Ok                                   Ok                  Ok                  Review*

* Although negative-discriminating questions with high percentages of students answering correctly are usually not discarded for practical reasons, they should probably not be used again on future exams.

It is important to keep in mind that item-analysis statistics can identify non-performing question items, but they seldom indicate what the problem is. While the Group Exam Question-Feedback Forms can provide significant insights, peer review/discussion and instructor self-reflection are frequently the most important factors in deciding to eliminate or amend question items.

Post-Exam Item Modification Options:

• Change the correct answer (if the answer was mis-keyed)
• Add an additional correct answer (only if a second answer is truly correct)
• Eliminate a question from the exam (no points given for a correct answer)
• Make a question an extra-credit item (only if it is a good discriminator); a point is given for a correct answer, but the number of total possible points is dropped by one

The Practical Implications of Eliminating Questions

Individual raw scores (initially out of 20)   19*     18      17*     17      15      Mean %
If question left unchanged                    95%     90%     85%     85%     75%     86%
If question eliminated                        94.7%   94.7%   84.2%   89.5%   78.9%   88.4%
If question is made extra-credit              100%    94.7%   89.5%   89.5%   78.9%   90.5%

* Indicates students with a correct answer.
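
The percentages under The Practical Implications of Eliminating Questions follow from simple arithmetic, sketched here in Python. This is an illustrative calculation, not an official DCOM tool, and the variable and function names are hypothetical. It assumes that eliminating the question removes the point from students who earned it and lowers the total possible points from 20 to 19, while the extra-credit option keeps every earned point and also lowers the total to 19.

```python
# Raw scores out of 20 for the five students in the table;
# True marks the students who answered the reviewed question correctly.
raw_scores = [19, 18, 17, 17, 15]
answered_correctly = [True, False, True, False, False]
total_points = 20


def percentages(points, possible):
    """Convert point totals to percentages of the possible points."""
    return [100 * p / possible for p in points]


def mean(values):
    return sum(values) / len(values)


# Option 1: leave the question unchanged.
unchanged = percentages(raw_scores, total_points)

# Option 2: eliminate the question. Students who got it right lose that
# point, and everyone is scored out of one fewer possible point.
without_item = [s - 1 if ok else s
                for s, ok in zip(raw_scores, answered_correctly)]
eliminated = percentages(without_item, total_points - 1)

# Option 3: make the question extra credit. Earned points are kept,
# but the total possible points still drops by one.
extra_credit = percentages(raw_scores, total_points - 1)

for label, row in [("Unchanged", unchanged),
                   ("Eliminated", eliminated),
                   ("Extra credit", extra_credit)]:
    cells = "  ".join(f"{x:5.1f}%" for x in row)
    print(f"{label:<13}{cells}   mean {mean(row):.1f}%")
```

Note that elimination raises the scores of students who missed the question and slightly lowers the scores of those who answered it correctly, while the extra-credit option raises every score.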