Abstract: OXFORD Shock of the Old, 7/4/05
Tony Gardner-Medwin, Dept. Physiology, UCL, London WC1E 6BT
a.gardner-medwin@ucl.ac.uk

Why is your institution (probably) not using confidence-based marking (CBM) in place of right-wrong marking for objective tests?

Decades of research and a decade of large-scale implementation at UCL have shown it to be theoretically sound, pedagogically beneficial, popular with students and easy to implement with both on-line and optical mark reader technologies. If the answer is ignorance, then you should look at our FDTL-funded dissemination website (www.ucl.ac.uk/lapt). Maybe the answer is inertia and the imagined constraints of an institutional VLE. But if you think that CBM must somehow be subjective, arbitrary, irrelevant to assessment of knowledge and understanding, discipline-specific, time-wasting, requiring new types of assessment material, or favouring particular personalities, then almost certainly you need to think or read more deeply about it. Within instructional material and formative or summative tests, it helps reduce some of the very sensible regrets that we all have when we are forced to replace part of our paper-based assessments and small group teaching with automated tests and material. If you worry that your students simply repeat what they have learned, whether in essays or computer tests, without understanding why it is true, then CBM can help you discriminate between well-justified knowledge, tentative hunches, lucky guesses, simple ignorance and seriously confident errors. The presentation will explain what CBM is all about, give you experience based on questions about the Highway Code, seek audience feedback about what you perceive as potential positive (+) and negative (-) features, and cover evidence about many of the issues raised above. The take-away message is that you fail in your duty to your students if you treat lucky guesses as equivalent to knowledge, or serious misconceptions as no worse than acknowledged ignorance.
Your assessments should be something in which you have confidence.

Gaining Confidence in Confidence-Based Marking
Tony Gardner-Medwin, Physiology, UCL
www.ucl.ac.uk/lapt

What is CBM? … Why? … When?
What’s it like to experience CBM?
What are possible pros and cons?
… DISCUSSION …
Issues, data, implementation

Why is your institution (probably) not using confidence-based marking (CBM) in place of right-wrong marking for objective tests?
Oxford 4/05

What is CBM?

The LAPT (UCL) Confidence-Based Marking scheme
… applied to each answer that will be marked right/wrong
… e.g. T/F, MCQ, EMQs, Numerical, Simple text

Confidence Level           1         2         3
Score if Correct           1         2         3
Score if Incorrect         0        -2        -6
Best marks obtained if:
  Probability correct    < 67%    67-80%     > 80%
  Odds                   < 2:1    > 2:1      > 4:1

Why CBM?

(1) Knowledge is degree of belief, or confidence:
knowledge, uncertainty, ignorance, misconception, delusion
(decreasing confidence in what is true, increasing confidence in what is false)

(2) Students must be able to justify knowledge: relate it to other things, check it and argue with rigour. Rote learning is the bane of education.

Knowledge is justified true belief.
In teaching we need to emphasise justification.
In assessment we need to measure degrees of belief.

With CBM you must think about justification

You gain:
EITHER if you find justifications for high confidence
OR if you see justifications for reservation.

[Figure: mark expected on average (-6 to 3) plotted against confidence (estimated probability correct, 0% to 100%), with lines for C=1, C=2, C=3 and no reply; 67% and 80% are marked.]

When & How do we use CBM?
… potentially whenever answers are marked right/wrong

Student study: self-assessment, revision & learning materials
… stand-alone (PC) or on the web, at home or in College

Formative tests (once-off or repeat-till-pass, with randomised Qs or values)
e.g. End of Module tests, Maths Practice/Assessment
… access portal e.g. via WebCT, and grades returned e.g.
to WebCT

Open access for other universities, schools, etc.
… BMAT practice & tips, GCSE maths, Biol AL, Physics, etc.

Exams: summative assessment (at UCL)
… T/F or MCQ, EMQ etc. using Optical Mark Reader
… OMR (Speedwell) cards & processing available through UCL

The UCL Confidence-Based Marking scheme
… applied to each answer that will be marked right/wrong
… e.g. T/F, MCQ, EMQs, Numerical, Simple text

Confidence Level           1         2         3
Score if Correct           1         2         3
Score if Incorrect         0        -2        -6
Best marks obtained if:
  Probability correct    < 67%    67-80%     > 80%
  Odds                   < 2:1    > 2:1      > 4:1

What seem possible benefits (+) or drawbacks (-) to such a scheme?
(a) In formative work (b) In exams

Personality, gender issues: real or imagined?

Does confidence-based marking favour certain personality types?
• Both underconfidence and overconfidence are undesirable
• ‘Correct’ calibration is well defined, desirable and achievable
• No significant gender differences are evident (at least after practice)
• Students with confidence problems: this is the way to deal with it!
• In exams, we can adjust to compensate for poor calibration, so students still benefit from distinguishing more/less reliable answers

How well do students discriminate confidence?

[Figure: % correct (50% to 100%) at each confidence level (@C=1, @C=2, @C=3), for F and M students under (i-c) and (ex). Mean +/- 95% confidence limits, 331 students.]

[Figure: year 1 by ethnicity. Confidence score (0 to 100) plotted against simple score (chance=0%, 0 to 100).]

Reliability and Validity of Confidence-based exam marks

Exam marks are determined by:
1. the student’s knowledge and skills in the subject area
2. the level of difficulty of the questions
3.
chance factors: how questions relate to details of the student’s knowledge and how uncertainties resolve (luck)

(1) = “signal” (its measurement is the object of the exam)
(3) = “noise” (random factors obscuring the “signal”)

Confidence-based marks improve the “signal-to-noise ratio”

A simple & convincing test of this is to compare marks on one set of questions with marks for the same student on a different set (e.g. odd & even Q nos.). High correlation means the data are measuring something about the student, not just “noise”.

[Figure, panels B to D: scatter plots of marks on set 2 against marks on set 1. Marks scaled: 0% = chance, 100% = max.
B. set 2 (simple) against set 1 (simple): R² = 0.735
C. set 2 (confidence) against set 1 (confidence): R² = 0.814
D. set 2 (simple) against set 1 (conf 0.6): R² = 0.776]

The correlation, across students, between scores on one set of questions and another is higher for CBM than for simple scores.

But perhaps they are just measuring ability to handle confidence? No. CBM scores are better than simple scores at predicting even the simple scores (ignoring confidence) on a different set of questions. This can only be because CBM is statistically a more efficient measure of knowledge.

[Figure: coefficient of determination (r²) between odd & even numbered Qs in 6 exams (m±sem), for conventional scores, adj. conf-based scores and their difference, shown for whole class, bottom 1/3 and top 1/3; differences all P<0.01. Relative efficiency (adjusted conf-based scores / conventional), m±sem, shown for whole class, bottom 1/3 and top 1/3; * P<0.05, ** P<0.01.]

Improvements in reliability and efficiency, comparing CBM to conventional scores, in 6 medical student exams (each 250-300 T/F Qs, >300 students).
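The odd/even comparison described above can be sketched in Python. This is only an illustration under stated assumptions, not the analysis code used for the UCL exams: the data layout (one list of `(is_correct, level)` pairs per student) and all function names are hypothetical, and it uses raw CBM totals rather than the adjusted confidence-based scores mentioned in the figure.

```python
# level -> (score if correct, score if incorrect), as in the LAPT (UCL) scheme
SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def simple_score(answers):
    """Number of items correct; answers is a list of (is_correct, level) pairs."""
    return sum(1 for is_correct, _ in answers if is_correct)


def cbm_score(answers):
    """Total confidence-based mark for a list of (is_correct, level) pairs."""
    return sum(SCHEME[level][0 if is_correct else 1] for is_correct, level in answers)


def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length lists.

    Raises ZeroDivisionError if either list has no variation.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def split_half_r2(students, score):
    """r^2 across students between scores on odd- and even-numbered questions.

    students: hypothetical layout, one list of (is_correct, level) per student,
    with questions in the same order for every student.
    """
    odd = [score(answers[0::2]) for answers in students]   # Q1, Q3, ...
    even = [score(answers[1::2]) for answers in students]  # Q2, Q4, ...
    return r_squared(odd, even)
```

Calling `split_half_r2(students, simple_score)` and `split_half_r2(students, cbm_score)` on the same data gives the two values to compare. Rescaling marks so that 0% = chance and 100% = max is a linear transformation applied equally to every student, so it does not change r² and is omitted here.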
Cronbach Alpha (standard psychometric measure of ‘reliability’)

On six exams (mean ± SEM, n=6):
α = 0.925 ± 0.007 using CBM
α = 0.873 ± 0.012 using number of items correct
• The improvement (P<0.001, paired t-test) corresponds to a reduction of the random element in the variance of exam scores from 14.6% of the student variance to 8.1%.

Arriving at a conclusion through probabilistic inference

[Diagram labels: Nuggets of knowledge; Networks of Understanding; EVIDENCE; Confidence (Degree of Belief); Inference; Choice]

Confidence-based marking places greater demands on justification, stimulating understanding

To understand = to link correctly the facts that bear on an issue.

We fail if we mark a lucky guess as if it were knowledge.
We fail if we mark delusion as no worse than ignorance.

www.ucl.ac.uk/lapt
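
The marking scheme from the talk (1, 2 or 3 marks if correct; 0, -2 or -6 if incorrect) can be checked with a short Python sketch. It computes the mark expected on average at each confidence level and the probabilities at which two levels break even, which reproduces the 67% and 80% thresholds quoted with the scheme. The function names are illustrative assumptions, not part of the LAPT software.

```python
from fractions import Fraction

# level -> (score if correct, score if incorrect), as in the LAPT (UCL) scheme
SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_mark(level, p):
    """Average mark at a confidence level when the answer is correct with probability p."""
    right, wrong = SCHEME[level]
    return p * right + (1 - p) * wrong


def best_level(p):
    """Confidence level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(SCHEME), key=lambda level: expected_mark(level, p))


def break_even(low, high):
    """Probability at which two confidence levels give the same expected mark."""
    r_lo, w_lo = SCHEME[low]
    r_hi, w_hi = SCHEME[high]
    # Solve w_lo + p*(r_lo - w_lo) == w_hi + p*(r_hi - w_hi) for p.
    return Fraction(w_lo - w_hi, (r_hi - w_hi) - (r_lo - w_lo))


print(break_even(1, 2))       # 2/3, i.e. about 67% (odds 2:1)
print(break_even(2, 3))       # 4/5, i.e. 80% (odds 4:1)
print(expected_mark(3, 0.5))  # -1.5: a pure T/F guess entered at C=3
print(expected_mark(1, 0.5))  # 0.5: the same guess entered at C=1
```

The last two lines illustrate the closing point: under this scheme a guess entered with high confidence loses marks on average, whereas the same guess with acknowledged uncertainty does not.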