Constructing Multiple Choice Question-Based Examinations
  Kathy Chessman, PharmD, FCCP, BCPS, BCNSP
  Professor, Clinical Pharmacy and Outcome Sciences
  South Carolina College of Pharmacy, MUSC campus

Objectives
  At the end of this session, the participant will be able to:
  1) Construct exams using principles that improve reliability and validity.
  2) Write multiple choice items that meet the four criteria for critical thinking test items.
  3) Write test items that are professional and meet generally accepted guidelines.

Types of Exams
  Achievement test: knowledge based
    Systematic procedure for measuring a set of learning objectives
  Performance test: skills based
    Laboratory skills (e.g., compounding)
    Problem-solving skills (e.g., PN ordering)
    Communication skills (e.g., counseling)

Achievement Tests
  Tests prior to instruction (pretest)
    Readiness: Do students have the skills and abilities needed for new material?
    Placement: Have students already achieved the intended learning objectives?
  Tests during instruction (midpoint)
    Formative tests: Monitor student progress during instruction
    Diagnostic tests: Determine learning or teaching problems
  Tests at the end of instruction (post-test)
    Summative testing: Given at the end of a period of instruction to certify mastery and/or assign a grade
    Assesses all learning objectives
    Covers breadth of learning experience
    Measures a representative sample of all learning tasks

Guidelines for Achievement Tests
  Measure clearly defined learning outcomes (i.e., course or lecture objectives)
  Generally test at the application level or higher (i.e., critical thinking)
    Not just measuring memorized facts and explanations
  Include types of test items that best measure the intended learning outcomes
  Provide scores free from measurement errors
  Provide consistent and reliable results
  Avoid factors contributing to measurement errors:
    Ambiguity
    Testing with too few questions
    Subjectivity
    Variation in students’ attention, effort, fatigue
    Tendency to guess during a test
    Not allowing enough time to complete

Tests as Teaching Tools
  Improve student studying and motivation
    Provide short-term goals
    Clarify tasks and information to be learned
  Aid retention and learning
    Learning at the understanding, application, and interpretation levels is retained longer
  Aid student self-evaluation
  Aid in evaluation of instructional effectiveness
  Aid in evaluation of instructor
  *Exam reviews can be very valuable as teaching tools*

Planning/Constructing a Test
  Step 1: Determine the purpose
  Step 2: Review expected learning outcomes
    What instructional objectives were given to the students? Test on all objectives.
    What cognitive level* should be tested? Predetermined by objectives.
  *Bloom’s levels of cognition, 1956

The Cognitive Domain*
  Intellectual abilities and skills
    Knowledge: Remembering previously learned material
    Comprehension: Grasping the meaning of the material
    Application: Using information in concrete situations
    Analysis: Breaking down material into its parts
    Synthesis: Putting parts together into a whole
    Evaluation: Judging the value of a given thing for a given purpose using definitive criteria
  *Bloom’s levels of cognition, 1956

Illustrative Action Verbs for Defining Objectives
  Knowledge: Identify, name, state, define, describe, list, match
  Comprehension: Classify, explain, summarize, convert
  Application: Demonstrate, compute, solve, modify
  Analysis: Differentiate, diagram, estimate
  Synthesis: Combine, create, formulate, design
  Evaluation: Judge, critique, compare, justify, redesign

Step 3: Determine Test Specifications
  Weight the learning objectives and content areas in terms of their relative importance and emphasis in class
  Decide on distribution of test items
    Distribute based on relative importance
    Distribute based on difficulty
    Distribute based on time spent in class

Step 4: Select Item Types
  Selection type
    Multiple choice
      One correct answer
      Multiple response (e.g., K-type)
    True-false
    Matching
  Supply type
    Short-answer
    Essay (restricted or extended response)
  Use the most direct measure of student performance specified by the intended learning objective/task
    Can they pick the best antibiotic for an infection? Have them pick an antibiotic for an infection.
    Can they use drug-drug interaction knowledge? Give them a case with a DI that determines treatment options.
    Can they write a PN order? Have them write a PN order.

Step 5: Determine the Number of Items
  Develop a blueprint from weights
  Allow for a sufficient sample
  Consider practical constraints
    Time period for test
    Estimated time per question
      Simple multiple choice: 30 seconds
      Computation or complex case: longer
    All students should be able to attempt all items within the time available

Steps 6 and 7: Construct the Items and Exam
  Step 6: Construct quality items
  Step 7: Assemble the exam
    Arrange the selected items
      Items that measure the same objective should be grouped together
      Items of the same type should be grouped together
      Items should be arranged in order of increasing difficulty (easy to hard)

Steps 8 and 9: Administer and Evaluate the Exam Results
  Step 8: Administer the exam
  Step 9: Evaluate the exam
    Validity: to what extent is the interpretation of the scores appropriate, meaningful, and useful?
      We want test scores to serve their intended purpose: test the students’ learning
    Reliability: to what extent will the test score be free from errors of measurement?
      Consistent from class to class, year to year
    Effectiveness of each alternative

Step 9: Evaluating Item Performance
  Difficulty index: proportion of the students answering correctly (range: 0 to 1)
    Most instructors choose to review all items for which the difficulty index is ≤ 0.5
  Discrimination index: correlation between answering this item correctly and overall performance on the test (as a correlation it can range from -1 to 1; useful items fall between 0 and 1)
    Did the students who did well overall answer this item correctly? Discrimination index would be closer to 1.
    Did the students who did poorly overall answer this item correctly about as often as the students who did well? Discrimination index would be closer to 0.

Miscellaneous Issues
  Repeated use of an exam or individual test item is valid and encouraged
    Allows for item improvement and validation over time
    Allows for development of question or item banks
    Improves reliability
  Frequency of exams
    No magic formula
    Advantages of frequent testing:
      More reliable basis for evaluation
      Keeps instructor and students better informed on progress
      Less penalty for doing poorly on one exam

Step 6: Writing Effective Multiple Choice Questions

Goal of Item Writing
  High quality items
  Reward the students who KNOW the material, not just those who KNOW how to take a test
    Do not confuse
    Do not penalize
    Do not reward by inadvertently giving away the correct answer
  Do not test trivia
    Test achievement of the stated objectives
    Test important material that they need to know

It’s Not as Simple as You Think
  Don’t wait until the last minute
  Try out your questions on others
  Writing critical thinking questions requires sound clinical expertise
  Unintended violations of item-writing principles
  Multiple meanings
  Reading your mind
  Ambiguities in wording

Types of Multiple Choice Questions
  A-type questions: most common and most often recommended by experts
    Stem with a lead-in question followed by 3 to 5 choices with 1 correct choice and 3-4 plausible, but incorrect, distractors
  True/False questions
  Multiple response questions (K-type)
  Matching items

Four Criteria for Critical Thinking MC Questions
  Include the rationale for each test item, including why the correct answer is correct and why each incorrect answer is incorrect.
  Write questions at the application level or above.
  Require multilogical thinking to answer questions.
  Require a high level of discrimination to choose from plausible alternatives.

Levels of Knowledge Tested
  Multiple choice questions CAN assess higher-level cognitive skills such as application of knowledge and problem solving.
  Avoid simple rote memory questions
    Should comprise ≤ 5% of test
    Essential facts are best incorporated into the problem-solving process

Parts of the MC item
  The stem
  The responses
    One correct or one best answer
    Foils or distractors
  Key to good questions is being able to write high-quality plausible distractors that effectively discriminate those who have mastered the material from those who have not

  Paul R. Critical thinking: what every person needs to survive in a rapidly changing world. 1993.

Correct vs. Best Answer
  Which of the following drugs reduces transient relaxation of the lower esophageal sphincter?
    A. Metoclopramide
    B. Sucralfate
    C. Cisapride
    D. Baclofen
  Knowledge level question.

Correct vs. Best Answer
  A patient in the STICU is receiving PN providing approximately 60% of his estimated calorie needs. The CHO:fat ratio is 50:50, and the GIR = 2 mg/kg/min. This morning’s 0400 labs reveal a blood glucose concentration of 280 mg/dL. Which of the following is the MOST appropriate action?
    A. Start an insulin drip titrated to glucose 140-180 mg/dL.
    B. Decrease the amount of glucose in the next PN bag.
    C. Add insulin 10 units to the next PN bag.
    D. Stop the PN; hang 0.9% NaCl at the same rate.
  Synthesis level question.

Item structure
  Write as a direct question. (There is punctuation at the end of the sentence, not the answers.)
  Who is the scientist most closely associated with the discovery of the polio vaccine?
    A. Jonas Salk
    B. Louis Pasteur
    C. Edward Jenner
    D. Robert Koch
  Knowledge level question.

Item structure
  Write as an incomplete sentence. (Note: there is punctuation at the end of the choices to complete the sentence.)
  The polio vaccine was discovered by:
    A. Jonas Salk.
    B. Louis Pasteur.
    C. Edward Jenner.
    D. Robert Koch.
  Knowledge level question.
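
Returning to Step 9 (Evaluating Item Performance), the two indices can be computed from a table of scored responses. The following is only a minimal sketch under stated assumptions, not the presenter's procedure: responses are assumed to be already scored 1 (correct) or 0 (incorrect), the "correlation" is taken to be a Pearson correlation between the item score and the total test score, and an item with no variance is assigned a discrimination of 0.0 by convention. The function names, the `review_threshold` parameter, and the sample data are hypothetical.

```python
from math import sqrt


def difficulty_index(item_scores):
    """Proportion of students who answered the item correctly (0 to 1)."""
    return sum(item_scores) / len(item_scores)


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: an item everyone got right (or wrong) cannot discriminate.
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_analysis(scored, review_threshold=0.5):
    """scored[s][i] is 1 if student s answered item i correctly, else 0."""
    totals = [sum(row) for row in scored]
    report = []
    for i in range(len(scored[0])):
        item = [row[i] for row in scored]
        p = difficulty_index(item)
        report.append({
            "item": i + 1,
            "difficulty": p,
            "discrimination": pearson(item, totals),
            # Flag items at or below the review threshold named in the slides.
            "review": p <= review_threshold,
        })
    return report


if __name__ == "__main__":
    # Hypothetical data: 6 students (rows) by 3 items (columns).
    sample = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 1, 0],
        [1, 0, 0],
        [1, 0, 1],
        [1, 0, 0],
    ]
    for row in item_analysis(sample):
        print(row)
```

A negative discrimination value, which this calculation can produce, would mean that weaker students answered the item correctly more often than stronger students; such an item is usually worth reviewing regardless of its difficulty index.
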
Stem
  Should be clear enough that the type of answer and maybe even the answer can be anticipated before looking at the responses.
  Diabetes is:
    A. A disorder associated with the body’s absorption of glucose.
    B. A digestive disorder which afflicts adults.
    C. A potentially fatal disorder if not detected early.
    D. A disease that can result in blindness.
  Represents essentially 4 True/False statements.
  Knowledge level question.

Stem Development: Poor Wording
  Which of the following is TRUE about pseudogout?
    A. It occurs frequently in women.
    B. It may be associated with a finding of chondrocalcinosis.
    C. It is clearly hereditary in most cases.
    D. It responds well to treatment with allopurinol.
  Represents essentially 4 True/False statements.
  Knowledge level question.

Stem Development: Better Wording
  Pseudogout is characterized by which of the following?
    A. Frequent occurrence in women.
    B. Chondrocalcinosis.
    C. Genetic inheritance.
    D. Good therapeutic response to allopurinol.
  Knowledge level question.

Stem Development: Poor
  The stem should include all words that would otherwise be repeated in the responses.
  The process whereby amniotic fluid is removed from a pregnant woman to test for possible birth defects:
    A. Is known as amniocentesis.
    B. Is known as cesarean section.
    C. Is known as embryonic analysis.
    D. Is known as fetal catheterization.
  Knowledge level question.

Stem Development: Better
  The process whereby amniotic fluid is removed from a pregnant woman to test for possible birth defects is known as:
    A. amniocentesis.
    B. cesarean section.
    C. embryonic analysis.
    D. fetal catheterization.
  Knowledge level question.

Stem Development
  To avoid confusion, avoid negatively stated item stems, i.e., items that require the choice of an incorrect answer rather than a correct one.
    Each of the following EXCEPT
    Which of the following is NOT
  Remember, you are NOT trying to trick them into choosing a wrong answer.
  Especially important not to go back and forth.

Stem Development: Poor
  Which of the following drugs is NOT an anticoagulant?
    A. Warfarin
    B. Heparin
    C. Enoxaparin
    D. Protamine
  Knowledge level question.

Stem Development: Better
  Which of the following can be used to reverse anticoagulation?
    A. Warfarin
    B. Heparin
    C. Enoxaparin
    D. Protamine
  Knowledge level question.

Stem: Provide authority for judgments
  Poor stem:
  The best way to discipline a child is to:
    A. Reward good behavior.
    B. Use physical punishment for serious offenses.
    C. Explain to the child why an undesirable behavior displeases you.
    D. Ignore misbehavior.
  Better stems:
    According to the American Academy of Pediatrics, the best way to discipline a child is to:
    As stated in class by the instructor, …
    As stated in your textbook, …
    As stated on the video watched in class, …

Stem Development: Poor
  In 1850, Adolphe Chatin, then professor of pharmacy in Paris, believed that goiter resulted from an inadequate amount of iodine in the diet. The thyroid is part of which body system?
    A. Endocrine
    B. Cardiovascular
    C. Kidney
    D. Gastrointestinal
  Knowledge level question.

Stem Development
  Do not include unnecessary information in the stem.
    Be succinct. Keep to the point.
    Limit information provided to that necessary to evaluate and answer the question.
    Verbosity, window dressing, and red herrings do not make better tests.
  The test is not the time to introduce new material because you didn’t have time to get to it in class.

Distractor Development
  The correct response must clearly stand out as the one that experts in the field would recognize as the best response.
  Doubt or controversy about the correct response will confuse the examinee, and the item may be challengeable,
  UNLESS the controversy or doubt has been explained in class and the student is asked to comment on the controversy.

Distractor Development
  All responses should be grammatically and logically consistent with the item stem, and all response options should be parallel.
  Lack of parallelism can sometimes make it possible to choose the right answer with no knowledge of the correct answer.

Distractor Development: Poor
  A 60-year-old alcoholic in status epilepticus is brought to the emergency department by the police. After determining that the airway is open, the first step in management should be IV administration of
    A. Glucose with thiamine.
    B. CT scan of the head.
    C. Phenytoin.
    D. Diazepam.
  Comprehension level question.

Distractor Development
  Don’t make your choices too long with too many options.
  Statistically, grammatical and other inconsistencies are more likely to occur in the ‘wrong’ answers than in the correct answer because writers pay closer attention to detail in the correct answer.

Distractor Development: Poor
  Which of the following antibiotic drugs listed would be the BEST empiric therapy for a patient being treated for an infection presumed to be due to Staphylococcus aureus?
    A. Gentamicin
    B. Amphotericin B
    C. Nafcillin
    D. Aztreonam
    E. Metronidazole
  Comprehension level question.

Distractors
  Distractors should represent unsafe practices or commonly held misconceptions AND should be plausible.
  Avoid using distractors that even the most uninformed would recognize as being incorrect.
  The use of humorous or absurd distractors is not appropriate for academic tests, but may be OK occasionally in other tests, such as in-class quizzes.

Distractor Development
  Graded responses
    Give 100% credit for the best answer; 50% credit for a correct answer, but not the best answer; 0% or even negative points if the choice is dangerous
  Avoid phrasing the correct answer directly from a textbook/handout.
    Usually more technical than other distractors.

Stem Development: Poor
  A 12-year-old girl with sickle cell disease is to be started on an oral antibiotic for S. pneumoniae prophylaxis. What antibiotic would you recommend?
    A. Cefazolin
    B. Cefotaxime
    C. Penicillin VK
    D. Cefpodoxime
  Do not use “what would you do” or “what do you believe”.
  Technically, all available choices are correct. You never asked them to do the ‘correct’ or ‘best’ thing.

Distractor Development: Poor
  Uniqueness: make sure that answers do not overlap and are, therefore, unique
  The right to vote in the US is [first] granted to individuals of what age?
    A. 18
    B. 16
    C. 17
    D. 19
  Synthesis level question.

Example
  After discussing the facts about medroxyprogesterone acetate injection (Depo-Provera®) with your patient, you instruct her to return to clinic for her first injection:
    A. Within 5 days of the start of her next menstrual period.
    B. Within 10 days of the start of her next menstrual period.
    C. Within 7 days of the start of her next menstrual period.
    D. Anytime.
  Application level question.

Distractor Development
  Do not use “All of the above”.
    Technically each one of the answers is a correct answer.
  Use “None of the above” only in questions which involve some type of exact calculation.
  Don’t use absolute terms (e.g., always, all, never).
  Avoid vague terms (e.g., usually, often, rarely, frequently).

Distractors
  Avoid giving answers away
    Make distractors of similar length.
      Don’t make the correct answer much shorter, longer, or more technical.
      If not possible, try to balance: 2 short and 2 long distractors.
    Do not put the same key words or descriptive words both in the stem and in the correct answer but not in the other distractors.
    Do not make the correct answer clear and concise and the wrong ones vague and ambiguous.

Distractor Development: Poor
  Avoid word repeats from stem to distractor
  A 58-year-old man with a history of heavy alcohol use and previous psychiatric hospitalization is confused and agitated. He speaks of experiencing the world as unreal. This symptom is called
    A. Derealization
    B. Depersonalization
    C. Derailment
    D. Focal memory deficit
  Knowledge level question.
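
The graded-response scheme described earlier under Distractor Development (100% for the best answer, 50% for a correct but not best answer, 0% or negative points for a dangerous choice) amounts to a per-option credit table. The sketch below is one possible way to express it, under stated assumptions: the category names, the -25% penalty for a dangerous choice, and the function names are all hypothetical, since the slides give no specific negative value, and the option grades in the usage example are placeholders rather than a key for any item in this handout.

```python
# Hypothetical credit table; only the 100% / 50% / 0% values come from the slides.
# The size of the penalty for a dangerous choice is an assumption.
CREDIT = {
    "best": 1.0,
    "acceptable": 0.5,   # correct, but not the best answer
    "incorrect": 0.0,
    "dangerous": -0.25,
}


def score_item(choice, option_grades, points=1.0, credit=CREDIT):
    """Return the points earned for one item.

    option_grades maps each option letter to a category in `credit`.
    A blank or unrecognized choice is treated as incorrect.
    """
    grade = option_grades.get(choice, "incorrect")
    return points * credit[grade]


def score_exam(answers, key, points=1.0):
    """Sum graded credit over all items in the key.

    answers maps item id -> chosen letter; key maps item id -> option_grades.
    """
    return sum(
        score_item(answers.get(item_id), grades, points)
        for item_id, grades in key.items()
    )


if __name__ == "__main__":
    # Placeholder key for two items; not taken from any question above.
    key = {
        "Q1": {"A": "best", "B": "acceptable", "C": "incorrect", "D": "dangerous"},
        "Q2": {"A": "incorrect", "B": "best", "C": "dangerous", "D": "acceptable"},
    }
    print(score_exam({"Q1": "B", "Q2": "B"}, key))
```

Keeping the grades in a table per item, rather than a single keyed letter, is what makes partial and negative credit possible; the item writer still has to decide and document which options count as acceptable or dangerous.
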

Stem and distractors: case clusters
  If you have a case or problem set with more than one question asked for the case:
    Don’t give away the answer to one question with a subsequent question stem or distractor.
    Don’t put the examinee in double jeopardy, where they have to get Question 1 correct before they can answer Question 2.

Examples
  A 1-year-old boy has a 3-day history of runny nose and barking cough that worsens at night. On physical exam, he has a temp of 101°F, a ‘nontoxic’ appearance, inspiratory stridor, mild pharyngitis, and a normal epiglottis. An anteroposterior neck X-ray is pending. His arterial blood gas results included a PaO2 of 70 mm Hg. Which of the following recommendations would be most appropriate?
    A. Start oxygen, mist therapy, and nebulized albuterol.
    B. Start oxygen, mist therapy, and nebulized racemic epinephrine.
    C. Start oxygen and ribavirin therapy.
    D. Start oxygen and cefuroxime therapy.
  Application level question.

Examples
  Following your recommendations above, JB continues to have stridor. What therapy would you add?
    A. Ribavirin
    B. Aminophylline infusion
    C. Dexamethasone 1 dose
    D. Inhaled albuterol
    E. Cefuroxime
  Application level question.

Stem and distractors
  Avoid the use of language which is unnecessarily technical or unfamiliar to appropriately trained examinees.
  Always take into consideration the relative reading level and level of knowledge of people taking the test.
  Be willing to give definitions of some words if asked.

Other things to avoid
  Inconsistent numerical data
    Numerical data is best arranged in ascending or descending order
    If testing on-line, don’t use the random answer function
  Short stems with long answers
  Being “tricky”

Item evaluation - Summary
  General considerations
    Does it test something relevant to work in the field?
    Does it test a learning objective?
    Does it reflect current best practice in the field?
    Is the stem clear?
    Are the context, setting, and content appropriate for all takers?
    Is the stem free of offensive language and free from stereotype reinforcement?
  The stem
    Is it easy to read?
    Is sentence structure simple and direct?
    Are there any ambiguous or fuzzy words or terms?
    If technical terms are used, are they the most generally accepted terms or the ones discussed in class?
    Is the premise positively stated?
  Distractors
    Does each follow grammatically and logically from the stem?
    Are the choices parallel in structure, terminology, length, and content?
    Have like terms been used in the correct answer and in the distractors?
    Is there only one correct (or best) choice?

Useful Resources
  Gronlund NE. (1988). How to Construct Achievement Tests. Prentice-Hall, Inc: Englewood Cliffs, NJ.
  Ebel RL. (1972). How to Plan a Classroom Test. In Essentials of Educational Measurement. Prentice-Hall, Inc: Englewood Cliffs, NJ, pp. 97-122.
  Morrison S, Nibert A, Flick J. (2006). Critical Thinking and Test Item Writing, 2nd ed. Health Education Systems, Inc: Houston, TX.
  Item Writing Manual, 3rd ed. The National Board of Medical Examiners, 1998. Found at www.nbme.org.
  The school of hard knocks.

Questions?