Item Analysis: Improving Multiple Choice Tests

Crystal Ramsay
September 27, 2011
Schreyer Institute for Teaching Excellence
http://www.schreyerinstitute.psu.edu/

This workshop is designed to help you do three things:
  To interpret statistical indices provided by the university’s Scanning Operations
  To differentiate between well-performing items and poor-performing items
  To make decisions about poor-performing items

We give tests for 4 primary reasons.
  To find out if students learned what we intended
  To separate those who learned from those who didn’t
  To increase learning and motivation
  To gather information for adapting or improving instruction

Multiple choice items are comprised of 4 basic components.
  Stem
    The rounded filling of an internal angle between two surfaces of a plastic molding is known as the
  Options
    A. rib.
    B. fillet.
    C. chamfer.
    D. gusset plate.
  Distracters (the incorrect options)
  Key (the correct option)

An item analysis focuses on 4 major pieces of information provided in the test score report.
  Test Score Reliability
  Item Difficulty
  Item Discrimination
  Distracter Information

Test score reliability is an index of the likelihood that scores would remain consistent over time if the same test were administered repeatedly to the same learners.
  Reliability coefficients range from .00 to 1.00.
  Ideal score reliabilities are >.80.
  Higher reliabilities = less measurement error.

Now look at the test score reliability from your exam.

Item Difficulty is the percentage of students who answered an item correctly.
  Represented in the Response Table as KEY - %
  Ranges from 0% to 100%

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
1         0       0    18   82   0    0    C 82     0.22
2         0       79   0    0    21   0    A 79     0.23
3         0       4    7    89   0    0    C 89     -0.12

Easier items have higher item difficulty values.

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
4         0       0    4    96   0    0    C 96     0.18
5         0       100  0    0    0    0    A 100    0.00
6         0       0    0    5+   0    95   E 95     -0.11

More difficult items have lower item difficulty values.

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
8         0       0    43   0    57   0    D 57     0.46
9         0       7    4    0    75   14   D 75     -0.19
10        0       5    12   27   31   25   D 31     0.10

What is an ‘ideal’ item difficulty statistic depends on 2 factors.
  Number of alternatives for each item
  Your reason for asking the question

Sometimes we include very easy or very difficult items on purpose.
  Did I deliberately pose difficult items to challenge my students’ thinking?
  Did I deliberately pose easy items to test basic information or to boost students’ confidence?

Now look at the item difficulties from your exam.
  Which items were easier for your students?
  Which items were more difficult?

Item Discrimination is the degree to which students with high overall exam scores also got a particular item correct.
  Represented as Item Effect because it tells how well an item ‘performed’
  Ranges from -1.00 to 1.00 and should be >.2

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
1         0       0    18   82   0    0    C 82     0.22
2         0       79   0    0    21   0    A 79     0.23
3         0       4    7    89   0    0    C 89     -0.12

A well-performing item

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
8         0       0    43   0    57   0    D 57     0.46

A poor-performing item

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
6         0       0    0    5+   0    95   E 95     -0.11

What is an ‘ideal’ item discrimination statistic depends on 3 factors.
  Item difficulty
  Test heterogeneity
  Item characteristics

Item difficulty
  Very easy or very difficult items will have poor ability to discriminate among students.
  Yet…
  Very easy or very difficult items may still be necessary to sample content taught.

Test heterogeneity
  A test that assesses many different topics will have a lower correlation with any one content-focused item.
  Yet…
  A heterogeneous item pool may still be necessary to sample content taught.

Item quality
  A poorly written item will have little ability to discriminate among students.
  and…
  There is no substitute for a well-written item or for testing what you teach!

Now look at the item effects from your exam.
  Which items on your exam performed ‘well’?
  Did any items perform ‘poorly’?

Distracter information can be analyzed to determine which distracters were effective and which ones were not.

RESPONSE TABLE - FORM A
ITEM NO.  OMIT %  A %  B %  C %  D %  E %  KEY - %  ITEM EFFECT
1         0       0    18   82   0    0    C 82     0.22
2         0       79   0    0    21   0    A 79     0.23
3         0       4    7    89   0    0    C 89     -0.12

Now look at the distracter information for items from your exam.
  What can you conclude about them?

Whether to retain, revise, or eliminate items depends on item difficulty, item discrimination, distracter information, and your instruction.
  Item Difficulty
  Item Discrimination
  Distracters
  Instruction
Ultimately, it’s a judgment call that you have to make.

What if I have a relatively short test or I give a test in a small class? I might not use the testing service for scoring. Is there a way I can understand how my items worked?
  Yes.

  Item 1 Top 1/3 Bottom 1/3 A B* C 10 3 D 1 4
  Item 2 Top 1/3 Bottom 1/3 A* 8 B C 2 3 D
  Item 3 Top 1/3 Bottom 1/3 A 5 2 B C* 1 4 D 4 4
  Item 4 Top 1/3 Bottom 1/3 A* 10 9 B C D 7 1

  1. Which item is the easiest?
  2. Which item shows negative (very bad) discrimination?
  3. Which item discriminates best between high and low scores?
  4. In Item 2, which distracter is most effective?
  5. In Item 3, which distracter must be changed?

  From: Suskie, L. (2009). Assessing student learning: A common sense guide (2nd ed.). San Francisco: Jossey-Bass.
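
For a small class, the top-third and bottom-third comparison can also be worked by hand or in a few lines of code. The sketch below is one possible way to do it, not the scoring service's method: it estimates difficulty from the two groups only, uses the simple upper-lower difference (proportion correct in the top third minus proportion correct in the bottom third) as the discrimination index, and flags distracters nobody chose. That difference is an assumption for illustration and is not necessarily how the Item Effect in the Scanning Operations report is calculated. The names ItemCounts, difficulty, discrimination, and unused_distracters are hypothetical, and the counts are made-up example data, not the counts from the exercise above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ItemCounts:
    """How many students chose each option, split by score group."""
    key: str                 # letter of the correct option
    top: Dict[str, int]      # option -> count among the top third of scorers
    bottom: Dict[str, int]   # option -> count among the bottom third


def _proportion_correct(counts: Dict[str, int], key: str) -> float:
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group has no responses")
    return counts.get(key, 0) / total


def difficulty(item: ItemCounts) -> float:
    """Proportion correct across both groups (the middle third is ignored)."""
    total = sum(item.top.values()) + sum(item.bottom.values())
    if total == 0:
        raise ValueError("item has no responses")
    correct = item.top.get(item.key, 0) + item.bottom.get(item.key, 0)
    return correct / total


def discrimination(item: ItemCounts) -> float:
    """Upper-lower index: proportion correct in top minus bottom group."""
    return (_proportion_correct(item.top, item.key)
            - _proportion_correct(item.bottom, item.key))


def unused_distracters(item: ItemCounts) -> List[str]:
    """Incorrect options that no one in either group selected."""
    options = set(item.top) | set(item.bottom)
    return sorted(
        option for option in options
        if option != item.key
        and item.top.get(option, 0) + item.bottom.get(option, 0) == 0
    )


# Hypothetical item: 10 students in each group, option B is the key.
example = ItemCounts(
    key="B",
    top={"A": 0, "B": 9, "C": 1, "D": 0},
    bottom={"A": 3, "B": 4, "C": 3, "D": 0},
)

print(f"difficulty:     {difficulty(example):.2f}")
print(f"discrimination: {discrimination(example):.2f}")
print(f"unused:         {unused_distracters(example)}")
```

A positive discrimination value means the top group answered correctly more often than the bottom group; a negative value means the reverse and is a signal to review the item. A distracter that appears in the unused list attracted no one in either group and is a candidate for revision.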

Even after you consider reliability, difficulty, discrimination, and distracters, there are still a few other things to think about…
  Multiple course sections
  Student feedback
  Other item types

Resources

For an excellent resource on item analysis:
  http://www.utexas.edu/academic/ctl/assessment/iar/students/report/itemanalysis.php

For a more extensive list of item-writing tips:
  http://testing.byu.edu/info/handbooks/MultipleChoice%20Item%20Writing%20Guidelines%20%20Haladyna%20and%20Downing.pdf
  http://homes.chass.utoronto.ca/~murdockj/teaching/MCQ_basic_tips.pdf

For a discussion about writing higher-level multiple choice items:
  http://www.ascilite.org.au/conferences/perth04/procs/pdf/woodford.pdf