Comparing Decision Rules
Decision accuracy of different decision rules combining multiple measures in a higher educational context
Iris Yocarini, Samantha Bouwmeester, Guus Smeets, and Lidia Arends
CEMO conference standard setting, 23 September 2015

The decision to be made
• Start of first bachelor year → end of first bachelor year: BSA decision
• Student continues to second bachelor year, or student leaves bachelor program

Decision accuracy
• Given the high stakes, an accurate decision is required
• Comparison of the decision based on the true score vs. the decision based on the observed score
• Observed score = true score + error

                                   Decision based on true score
Decision based on observed score   Fail                                    Pass
Fail                               A: correct classification               B: misclassification (false negative)
Pass                               C: misclassification (false positive)   D: correct classification

• Total proportion of misclassifications: (B + C) / total sample
• False negative rate: of all truly competent students, the proportion identified as fails, B / (B + D)
• False positive rate: of all truly failing students, the proportion identified as passes, C / (A + C)
• Positive predictive value: of all students who passed, the proportion correctly classified, D / (C + D)

Testing system
• Compensatory testing system at Erasmus University Rotterdam
• Vs. standard conjunctive testing system in Dutch higher education
• Debate

Reasons behind implementation
• Educational views
• Psychometric argument
  • Classical Test Theory (CTT): the average is more reliable
• Assumption of parallel tests
  • Equal true ability levels
  • Similar test reliabilities

Factors influencing decision accuracy
• Reliability
• Decision accuracy
• Diagram: true score + error → observed score → classification table (decision based on true score vs. decision based on observed score)

Decision rules in practice
• Educational setting: combinatory decision rules
  • Compensatory aspect: required GPA
  • Conjunctive aspect: required minimum grade
  • Clusters
• First year psychology at Erasmus University
  • Grading scale: 1.0 – 10.0
  • GPA: 6.0
  • Minimum grade: 4.0
  • Two clusters of 8 courses each

Our study
• Aim of study
  • Comparing the decision accuracy of different decision rules that combine multiple tests
  • Evaluating the psychometric argument for implementing a compensatory testing system
    • CTT: the average grade is more reliable than individual test scores
• Context of first year Psychology students at Erasmus University Rotterdam

Decision rules
• Varying
  • Conjunctive aspect: minimum required grade
  • Compensatory aspect: required GPA
• Also included
  • Fully conjunctive rule
  • Fully compensatory rule

Decision rule        Minimum grade      GPA
Fully conjunctive    5.5                5.5
Fully compensatory   1.0                5.5 / 6.0 / 6.5
Complex rules        3.0 / 4.0 / 5.0    5.5 / 6.0 / 6.5
*Grading from 1.0 to 10.0

Simulation
• Simulation study
• For each simulated student, the decision based on the true score is compared with the decision based on the observed score (correct classification, false negative, or false positive)

Manipulation of factors
• Average test correlation
• Average test reliability
• Number of tests
• Number of retakes

Results – minimum grade & GPA
• Minimum grade: 1.0 / 3.0 / 4.0 / 5.0
• GPA: 5.5 / 6.0 / 6.5

Results – average test reliability
• Figures: proportion of misclassifications, positive predictive value

Results – number of retakes
• Figures: proportion of misclassifications, false negative rate, false positive rate, positive predictive value

Comparison conjunctive & compensatory
• In the compensatory decision rule:
  • Fewer classification errors
  • Fewer false negatives, more false positives
  • Higher positive predictive value

Conclusion
• Increasing the degree of compensation results in fewer classification errors
• Within the compensatory decision rule, relatively fewer false negatives and more false positives
• Depends on the specific setting & tests used
  • Most important: test reliability and number of retakes
• Psychometric argument
• Standard setting

Take home message
• Decision accuracy is an important consideration
• Focus on both the specific decision rule and the tests

Thank you for your attention! Questions?
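
The combined decision rule and the simulation idea from the slides above can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the function names, the normal distribution of true ability (mean 6.0, SD 1.0), the identical true grade on every test, the absence of retakes, and the clipping to the 1.0 to 10.0 grading scale are all assumptions made here for illustration. The error variance is derived from the CTT definition of reliability, var(true) / (var(true) + var(error)).

```python
import random
from statistics import mean


def passes(grades, min_grade, required_gpa):
    """Combined rule: every grade reaches min_grade (conjunctive aspect)
    and the average reaches required_gpa (compensatory aspect)."""
    return min(grades) >= min_grade and mean(grades) >= required_gpa


def clip(grade):
    """Keep a grade on the 1.0 to 10.0 grading scale."""
    return max(1.0, min(10.0, grade))


def simulate(n_students, n_tests, reliability, min_grade, required_gpa, seed=0):
    """Tally cells A-D of the classification table for simulated students.

    Hypothetical set-up: parallel tests, no retakes, reliability in (0, 1].
    """
    rng = random.Random(seed)
    true_sd = 1.0  # assumed SD of true ability
    # reliability = var_true / (var_true + var_error)  =>  solve for error SD
    error_sd = true_sd * ((1 - reliability) / reliability) ** 0.5
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for _ in range(n_students):
        ability = rng.gauss(6.0, true_sd)  # assumed mean true grade of 6.0
        true_grades = [clip(ability)] * n_tests
        observed = [clip(ability + rng.gauss(0.0, error_sd)) for _ in range(n_tests)]
        true_pass = passes(true_grades, min_grade, required_gpa)
        observed_pass = passes(observed, min_grade, required_gpa)
        if true_pass and observed_pass:
            counts["D"] += 1  # correct classification (pass)
        elif true_pass:
            counts["B"] += 1  # false negative
        elif observed_pass:
            counts["C"] += 1  # false positive
        else:
            counts["A"] += 1  # correct classification (fail)
    return counts


# Complex rule from the slides: minimum grade 4.0, GPA 6.0, 8 tests.
print(simulate(n_students=1000, n_tests=8, reliability=0.6,
               min_grade=4.0, required_gpa=6.0))
```

Setting `min_grade=1.0` gives a fully compensatory rule, and setting `min_grade` equal to `required_gpa` gives a fully conjunctive one, so the same function covers all rules in the decision rule table.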
yocarini@fsw.eur.nl

In the four tables below, the columns give the mean over all conditions, followed by the value at each level of average test correlation (.1 to .7), average test reliability (.4 to .8), number of tests (8, 12), and number of retakes (0, 2).

Results – proportion of misclassifications
                       Correlation         Reliability    Tests     Retakes
Rule  GPA  Min.  Mean  .1   .3   .5   .7   .4   .6   .8   8    12   0    2
1     5.5  5.5   .18   .16  .18  .19  .19  .24  .18  .12  .19  .17  .20  .16
2     5.5  1.0   .05   .03  .04  .05  .06  .06  .04  .03  .05  .04  .06  .03
3     5.5  3.0   .08   .09  .09  .08  .07  .14  .07  .04  .08  .09  .13  .04
4     5.5  4.0   .17   .20  .18  .16  .13  .27  .16  .08  .15  .19  .26  .08
5     5.5  5.0   .22   .23  .23  .22  .21  .31  .22  .14  .22  .23  .28  .16
6     6.0  1.0   .09   .09  .10  .09  .09  .13  .09  .06  .10  .09  .10  .08
7     6.0  3.0   .11   .13  .11  .10  .09  .16  .10  .06  .11  .11  .14  .08
8     6.0  4.0   .16   .20  .17  .14  .12  .24  .15  .08  .15  .17  .22  .10
9     6.0  5.0   .21   .23  .22  .21  .19  .29  .21  .13  .20  .22  .27  .15
10    6.5  1.0   .13   .17  .13  .12  .10  .18  .13  .08  .14  .12  .13  .13
11    6.5  3.0   .13   .17  .14  .12  .10  .19  .13  .08  .14  .13  .13  .13
12    6.5  4.0   .15   .19  .15  .13  .11  .21  .14  .09  .15  .14  .16  .13
13    6.5  5.0   .17   .20  .18  .16  .14  .24  .17  .11  .17  .17  .20  .14

Results – sensitivity
                       Correlation         Reliability    Tests     Retakes
Rule  GPA  Min.  Mean  .1   .3   .5   .7   .4   .6   .8   8    12   0    2
1     5.5  5.5   .60   .52  .59  .64  .67  .45  .60  .76  .65  .56  .44  .77
2     5.5  1.0   .97   .98  .97  .97  .97  .96  .98  .99  .97  .98  .96  .99
3     5.5  3.0   .93   .91  .92  .93  .95  .87  .94  .98  .94  .92  .88  .98
4     5.5  4.0   .83   .79  .81  .84  .87  .71  .84  .93  .85  .80  .71  .94
5     5.5  5.0   .67   .61  .66  .69  .72  .52  .68  .82  .71  .63  .51  .83
6     6.0  1.0   .95   .94  .95  .95  .95  .93  .95  .97  .94  .95  .92  .98
7     6.0  3.0   .92   .90  .91  .93  .94  .87  .93  .96  .92  .92  .87  .97
8     6.0  4.0   .85   .80  .83  .87  .90  .74  .86  .93  .87  .83  .75  .95
9     6.0  5.0   .68   .61  .67  .71  .75  .53  .69  .83  .73  .64  .52  .84
10    6.5  1.0   .92   .89  .91  .93  .94  .89  .92  .94  .91  .92  .88  .96
11    6.5  3.0   .90   .86  .90  .92  .93  .86  .91  .94  .90  .90  .85  .95
12    6.5  4.0   .86   .80  .85  .88  .91  .78  .87  .93  .87  .85  .78  .94
13    6.5  5.0   .73   .64  .71  .76  .81  .59  .74  .86  .77  .69  .58  .88

Results – specificity
                       Correlation         Reliability    Tests     Retakes
Rule  GPA  Min.  Mean  .1   .3   .5   .7   .4   .6   .8   8    12   0    2
1     5.5  5.5   .93   .92  .92  .93  .94  .93  .92  .93  .91  .94  .96  .89
2     5.5  1.0   .67   .57  .66  .71  .74  .58  .67  .77  .66  .69  .76  .58
3     5.5  3.0   .72   .66  .72  .75  .77  .69  .71  .77  .70  .75  .82  .63
4     5.5  4.0   .80   .75  .79  .83  .85  .82  .79  .81  .78  .83  .87  .74
5     5.5  5.0   .89   .86  .88  .90  .92  .90  .88  .89  .87  .91  .93  .85
6     6.0  1.0   .73   .65  .73  .77  .79  .64  .73  .82  .72  .75  .81  .65
7     6.0  3.0   .75   .68  .75  .78  .80  .69  .74  .83  .73  .77  .84  .66
8     6.0  4.0   .80   .75  .80  .82  .83  .78  .78  .84  .78  .82  .88  .72
9     6.0  5.0   .89   .86  .88  .90  .91  .90  .88  .89  .87  .91  .93  .84
10    6.5  1.0   .80   .74  .80  .83  .84  .72  .80  .88  .79  .82  .87  .74
11    6.5  3.0   .81   .75  .80  .83  .84  .73  .81  .88  .79  .82  .87  .74
12    6.5  4.0   .83   .78  .82  .85  .86  .78  .82  .89  .81  .85  .90  .76
13    6.5  5.0   .88   .87  .88  .89  .90  .88  .87  .90  .86  .91  .94  .83

Results – positive predictive value
                       Correlation         Reliability    Tests     Retakes
Rule  GPA  Min.  Mean  .1   .3   .5   .7   .4   .6   .8   8    12   0    2
1     5.5  5.5   .82   .68  .80  .88  .93  .79  .82  .86  .83  .81  .79  .86
2     5.5  1.0   .98   .99  .98  .97  .96  .97  .98  .98  .97  .98  .98  .97
3     5.5  3.0   .98   .99  .98  .97  .97  .97  .98  .98  .98  .98  .98  .97
4     5.5  4.0   .97   .96  .97  .97  .98  .97  .97  .97  .97  .97  .97  .97
5     5.5  5.0   .90   .82  .88  .93  .96  .88  .89  .91  .91  .89  .87  .92
6     6.0  1.0   .93   .95  .93  .93  .93  .91  .93  .96  .93  .94  .94  .93
7     6.0  3.0   .94   .95  .94  .93  .93  .92  .94  .96  .93  .94  .95  .93
8     6.0  4.0   .94   .94  .94  .94  .94  .93  .93  .95  .93  .94  .94  .93
9     6.0  5.0   .89   .81  .88  .92  .95  .88  .89  .91  .90  .89  .87  .91
10    6.5  1.0   .85   .82  .85  .87  .88  .80  .85  .91  .85  .86  .87  .84
11    6.5  3.0   .86   .82  .85  .87  .88  .81  .85  .91  .85  .86  .87  .84
12    6.5  4.0   .86   .82  .86  .87  .88  .82  .86  .91  .85  .87  .88  .85
13    6.5  5.0   .85   .77  .85  .89  .91  .83  .84  .89  .85  .86  .85  .85

Results – average test reliability
• Figures: false negative rate, false positive rate

Previous studies
• Douglas & Mislevy (2010)
• Van Rijn, Béguin, & Verstralen (2012)
• McBee, Peters, & Waterman (2014)
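
The decision accuracy measures used throughout the slides can be computed directly from the four cells of the classification table (A: true fail, observed fail; B: true pass, observed fail; C: true fail, observed pass; D: true pass, observed pass). The following is a minimal Python sketch. The function name and the example counts are made up for illustration, and sensitivity and specificity are taken here in their standard sense, as the complements of the false negative rate and the false positive rate.

```python
def accuracy_measures(a, b, c, d):
    """Decision accuracy measures from classification counts A-D.

    Assumes at least one truly failing student (a + c > 0), one truly
    competent student (b + d > 0), and one observed pass (c + d > 0).
    """
    total = a + b + c + d
    false_negative_rate = b / (b + d)  # truly competent, identified as fail
    false_positive_rate = c / (a + c)  # truly failing, identified as pass
    return {
        "proportion_misclassified": (b + c) / total,
        "false_negative_rate": false_negative_rate,
        "false_positive_rate": false_positive_rate,
        "positive_predictive_value": d / (c + d),
        "sensitivity": 1 - false_negative_rate,  # equals d / (b + d)
        "specificity": 1 - false_positive_rate,  # equals a / (a + c)
    }


# Made-up counts for 100 students, purely for illustration.
measures = accuracy_measures(a=40, b=10, c=5, d=45)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```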