Comparing Decision Rules

Decision accuracy of different decision rules combining multiple measures in a higher educational context
Iris Yocarini, Samantha Bouwmeester, Guus Smeets, and Lidia Arends
CEMO conference: standard setting, 23 September 2015
The decision to be made
Start of first bachelor year → End of first bachelor year → BSA (binding study advice) decision
• Student continues to the second bachelor year
• Student leaves the bachelor program
Decision accuracy
• Given the high stakes, an accurate decision is required
• Compare the decision based on the true score with the decision based on the observed score
  • Observed score = true score + error
Decision accuracy
Rows: decision based on the observed score; columns: decision based on the true score.
                        True score: Fail                        True score: Pass
Observed score: Fail    A  Correct classification               B  Misclassification (false negative)
Observed score: Pass    C  Misclassification (false positive)   D  Correct classification
• Total proportion of misclassifications: (B + C) / total sample
• False negative rate: of all truly competent students, the proportion identified as fails: B / (B + D)
• False positive rate: of all truly failing students, the proportion identified as passes: C / (A + C)
• Positive predictive value: of all students who passed, the proportion correctly classified: D / (C + D)
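These indices can be computed directly from the four cell counts. The sketch below is illustrative and not the authors' code. The class name `ClassificationTable` and the example counts are hypothetical. Sensitivity and specificity, which appear in the backup result tables, are included as 1 − false negative rate and 1 − false positive rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationTable:
    """Hypothetical container for the 2 x 2 table (observed-score vs. true-score decision)."""
    a: int  # observed Fail, true Fail: correct classification
    b: int  # observed Fail, true Pass: false negative
    c: int  # observed Pass, true Fail: false positive
    d: int  # observed Pass, true Pass: correct classification

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d

    def misclassification_rate(self) -> float:
        return (self.b + self.c) / self.total  # (B + C) / N

    def false_negative_rate(self) -> float:
        return self.b / (self.b + self.d)  # B / (B + D)

    def false_positive_rate(self) -> float:
        return self.c / (self.a + self.c)  # C / (A + C)

    def positive_predictive_value(self) -> float:
        return self.d / (self.c + self.d)  # D / (C + D)

    def sensitivity(self) -> float:
        return 1 - self.false_negative_rate()  # D / (B + D)

    def specificity(self) -> float:
        return 1 - self.false_positive_rate()  # A / (A + C)


example = ClassificationTable(a=20, b=10, c=5, d=65)  # hypothetical counts, N = 100
print(round(example.misclassification_rate(), 3))      # 0.15
print(round(example.false_negative_rate(), 3))         # 0.133
print(round(example.false_positive_rate(), 3))         # 0.2
print(round(example.positive_predictive_value(), 3))   # 0.929
```

The two error rates use different denominators: the truly competent students for false negatives and the truly failing students for false positives. For this reason, a rule can lower one rate while raising the other.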
Testing system
• Compensatory testing system at Erasmus University Rotterdam
  • Vs. the standard conjunctive testing system in Dutch higher education
• Debate
Reasons behind implementation
• Educational views
• Psychometric argument
  • Classical Test Theory (CTT): an average grade is more reliable than a single test score
  • Assumption of parallel tests:
    • Equal true ability levels
    • Similar test reliabilities
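The CTT argument that an average is more reliable can be made concrete with the Spearman-Brown formula. This is the standard CTT result for a composite of k parallel tests: ρ_k = kρ / (1 + (k − 1)ρ). The sketch relies on the parallel-test assumption listed above. The reliability value of 0.6 is illustrative only and is not an estimate from the study.

```python
def spearman_brown(reliability: float, k: int) -> float:
    """Reliability of the average (or sum) of k parallel tests that each have `reliability`."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * reliability / (1 + (k - 1) * reliability)


# Illustrative: one test with reliability .6 vs. the average of 2, 4, or 8 parallel tests
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.6, k), 3))
# 1 0.6
# 2 0.75
# 4 0.857
# 8 0.923
```

When tests differ in true ability level or reliability (i.e., they are not parallel), the formula no longer holds exactly. That is why the assumption matters for the psychometric argument.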
Factors influencing decision accuracy
• Reliability
  • Observed score = true score + error: the larger the measurement error (i.e., the lower the reliability), the more often the decision based on the observed score differs from the decision based on the true score, producing false negatives and false positives
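The effect of reliability on a single pass/fail decision can be illustrated with a small Monte Carlo sketch of the CTT model observed = true + error. All settings here are hypothetical and are not the study's design. They are: normal true scores with mean 6.5 and SD 1.0, a cut score of 5.5, and no rounding or bounding to the 1.0–10.0 scale.

```python
import random


def single_test_misclassification(reliability: float, cut: float = 5.5,
                                  true_mean: float = 6.5, true_sd: float = 1.0,
                                  n: int = 20_000, seed: int = 1) -> float:
    """Share of simulated students whose observed-score decision differs from their
    true-score decision on one test (assumed normal scores, no scale bounds)."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    rng = random.Random(seed)
    # reliability = var(T) / (var(T) + var(E))  =>  sd(E) = sd(T) * sqrt((1 - rel) / rel)
    error_sd = true_sd * ((1 - reliability) / reliability) ** 0.5
    errors = 0
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)
        observed = true_score + rng.gauss(0.0, error_sd)
        if (true_score >= cut) != (observed >= cut):  # false negative or false positive
            errors += 1
    return errors / n


for rel in (0.4, 0.6, 0.8):
    print(rel, round(single_test_misclassification(rel), 3))
```

Only students whose true score lies near the cut score are at risk. A larger error SD widens that risky band, so the misclassification rate falls as reliability rises.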
Decision rules in practice
• Educational setting: combined decision rules
  • Compensatory aspect: required GPA
  • Conjunctive aspect: required minimum grade
• Clusters
• First year psychology at Erasmus University
  • Grading scale: 1.0 – 10.0
  • GPA: 6.0
  • Minimum grade: 4.0
  • Two clusters of 8 courses each
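The combined rule used for first-year psychology can be written as a single check. This is a minimal sketch under two assumptions: the GPA is the unweighted mean of the course grades (no credit weighting), and the rule is applied to one set of grades, for example a single cluster. The function name and example grades are hypothetical.

```python
from typing import Sequence


def passes_combined_rule(grades: Sequence[float], required_gpa: float = 6.0,
                         minimum_grade: float = 4.0) -> bool:
    """Pass only if the compensatory part (mean grade >= required_gpa) and the
    conjunctive part (every grade >= minimum_grade) are both satisfied."""
    if not grades:
        raise ValueError("at least one grade is required")
    gpa = sum(grades) / len(grades)  # assumption: unweighted mean
    return gpa >= required_gpa and min(grades) >= minimum_grade


# Hypothetical clusters of 8 course grades on the 1.0-10.0 scale
print(passes_combined_rule([7.0, 6.5, 5.0, 4.5, 8.0, 6.0, 7.5, 5.5]))  # True: GPA 6.25, lowest 4.5
print(passes_combined_rule([7.0, 6.5, 5.0, 3.5, 8.0, 7.0, 7.5, 6.0]))  # False: 3.5 < 4.0
print(passes_combined_rule([5.5] * 8))                                 # False: GPA 5.5 < 6.0
```

The same function covers both extremes. With `minimum_grade=1.0` the conjunctive part never binds on this scale, so the rule is fully compensatory. With `minimum_grade=5.5` and `required_gpa=5.5` the GPA condition is implied by the minimum, so the rule is fully conjunctive.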
Our study
• Aim of study
  • Compare the decision accuracy of different decision rules that combine multiple tests
  • Evaluate the psychometric argument for implementing a compensatory testing system
    • CTT: an average grade is more reliable than individual test scores
• Context: first-year Psychology students at Erasmus University Rotterdam
Decision rules
• Varied:
  • Conjunctive aspect: required minimum grade
  • Compensatory aspect: required GPA
• Also included:
  • Fully conjunctive rule
  • Fully compensatory rule
Decision rule        Minimum grade      GPA
Fully conjunctive    5.5                5.5
Fully compensatory   1.0                5.5 / 6.0 / 6.5
Complex rules        3.0 / 4.0 / 5.0    5.5 / 6.0 / 6.5
*Grading scale from 1.0 to 10.0
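The table above defines 13 decision rules. The sketch below generates them with a hypothetical `DecisionRule` type. The ordering is fully conjunctive first, then each GPA level (5.5, 6.0, 6.5) with minimum grade 1.0, 3.0, 4.0, 5.0. This ordering was chosen so that the position in the list matches the rule numbers 1–13 in the backup result tables.

```python
from typing import List, NamedTuple


class DecisionRule(NamedTuple):
    label: str
    minimum_grade: float  # conjunctive aspect
    required_gpa: float   # compensatory aspect


def build_rules() -> List[DecisionRule]:
    """Fully conjunctive rule first, then per GPA level: fully compensatory (minimum 1.0)
    followed by the complex rules (minimum 3.0, 4.0, 5.0)."""
    rules = [DecisionRule("fully conjunctive", 5.5, 5.5)]
    for gpa in (5.5, 6.0, 6.5):
        # A minimum of 1.0 never binds on a 1.0-10.0 scale, so only the GPA matters
        rules.append(DecisionRule("fully compensatory", 1.0, gpa))
        for minimum in (3.0, 4.0, 5.0):
            rules.append(DecisionRule("complex", minimum, gpa))
    return rules


for number, rule in enumerate(build_rules(), start=1):
    print(number, rule.label, rule.minimum_grade, rule.required_gpa)
```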
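Simulation
• Simulation study: for each simulated student, the decision based on the true score is compared with the decision based on the observed score
  • True Fail, observed Fail and true Pass, observed Pass: correct classification
  • True Pass, observed Fail: misclassification (false negative)
  • True Fail, observed Pass: misclassification (false positive)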
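The comparison of true-score and observed-score decisions can be sketched for a combined rule over several tests. This is not the simulation design of the study, and every setting below is a hypothetical assumption. Each test has normal true scores (mean 6.5, SD 1.0). True scores correlate through one shared ability factor, and all tests have the same reliability. There are no retakes, and scores are not rounded or bounded to the 1.0–10.0 scale. The function names are also hypothetical.

```python
import random
from typing import List, Tuple


def passes(grades: List[float], minimum_grade: float, required_gpa: float) -> bool:
    # Combined rule: conjunctive (every grade >= minimum) and compensatory (mean >= GPA)
    return min(grades) >= minimum_grade and sum(grades) / len(grades) >= required_gpa


def simulate_counts(minimum_grade: float, required_gpa: float, n_tests: int = 8,
                    reliability: float = 0.6, true_corr: float = 0.5,
                    n_students: int = 5_000, mean: float = 6.5, sd: float = 1.0,
                    seed: int = 2015) -> Tuple[int, int, int, int]:
    """Return the cell counts (A, B, C, D) of the 2 x 2 classification table."""
    rng = random.Random(seed)
    error_sd = sd * ((1 - reliability) / reliability) ** 0.5  # CTT: observed = true + error
    a = b = c = d = 0
    for _ in range(n_students):
        ability = rng.gauss(0.0, 1.0)  # shared component that correlates the true scores
        true_grades = [
            mean + sd * (true_corr ** 0.5 * ability + (1 - true_corr) ** 0.5 * rng.gauss(0.0, 1.0))
            for _ in range(n_tests)
        ]
        observed = [t + rng.gauss(0.0, error_sd) for t in true_grades]
        true_pass = passes(true_grades, minimum_grade, required_gpa)
        observed_pass = passes(observed, minimum_grade, required_gpa)
        if observed_pass and true_pass:
            d += 1
        elif observed_pass:
            c += 1  # false positive
        elif true_pass:
            b += 1  # false negative
        else:
            a += 1
    return a, b, c, d


for minimum, gpa in ((5.5, 5.5), (1.0, 5.5)):  # fully conjunctive vs. fully compensatory
    a, b, c, d = simulate_counts(minimum, gpa)
    print(minimum, gpa, "misclassified:", round((b + c) / (a + b + c + d), 3))
```

The random draws do not depend on the rule, so a fixed seed evaluates both rules on identical simulated students. This common-random-numbers design keeps the comparison between rules free of extra sampling noise.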
Manipulation of factors
Results – minimum grade & GPA
• Minimum grade: 1.0 / 3.0 / 4.0 / 5.0
• GPA: 5.5 / 6.0 / 6.5
Results – average test reliability
[Figures: proportion of misclassifications; positive predictive value]
Results – number of retakes
[Figures: proportion of misclassifications; false negative rate; false positive rate; positive predictive value]
Comparison of conjunctive and compensatory rules
• In the compensatory decision rule:
  • Fewer classification errors
  • Fewer false negatives, more false positives
  • Higher positive predictive value
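These claims can be checked against the mean values reported in the backup result tables. The comparison uses rule 1 (fully conjunctive: GPA 5.5, minimum 5.5) and rule 2 (fully compensatory: GPA 5.5, minimum 1.0). Sensitivity equals 1 − false negative rate, and specificity equals 1 − false positive rate, so the error rates follow directly. The snippet only re-expresses the reported means and adds no new results.

```python
# Mean values copied from the backup result tables (rules 1 and 2)
results = {
    "fully conjunctive (rule 1)": {"misclassification": 0.18, "sensitivity": 0.60,
                                   "specificity": 0.93, "ppv": 0.82},
    "fully compensatory (rule 2)": {"misclassification": 0.05, "sensitivity": 0.97,
                                    "specificity": 0.67, "ppv": 0.98},
}

for rule, r in results.items():
    fnr = 1 - r["sensitivity"]  # false negative rate = B / (B + D)
    fpr = 1 - r["specificity"]  # false positive rate = C / (A + C)
    print(f"{rule}: errors {r['misclassification']:.2f}, "
          f"FNR {fnr:.2f}, FPR {fpr:.2f}, PPV {r['ppv']:.2f}")
# fully conjunctive (rule 1): errors 0.18, FNR 0.40, FPR 0.07, PPV 0.82
# fully compensatory (rule 2): errors 0.05, FNR 0.03, FPR 0.33, PPV 0.98
```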
Conclusion
• Increasing the degree of compensation results in fewer classification errors
• Within the compensatory decision rule: relatively fewer false negatives and more false positives
• The results depend on the specific setting and the tests used
  • Most important: test reliability and number of retakes
• Psychometric argument
• Standard setting
Take home message
• Decision accuracy is an important consideration
• Focus on both the specific decision rule and the tests used
Thank you for your attention!
Questions?
yocarini@fsw.eur.nl
Results – proportion of misclassifications
r = average test correlation, rel = average test reliability, T = number of tests, R = number of retakes; Mean = overall mean for the decision rule, other columns = mean at that factor level.
Rule  GPA   Min   Mean  r.1   r.3   r.5   r.7   rel.4 rel.6 rel.8 T8    T12   R0    R2
1     5.5   5.5   .18   .16   .18   .19   .19   .24   .18   .12   .19   .17   .20   .16
2     5.5   1.0   .05   .03   .04   .05   .06   .06   .04   .03   .05   .04   .06   .03
3     5.5   3.0   .08   .09   .09   .08   .07   .14   .07   .04   .08   .09   .13   .04
4     5.5   4.0   .17   .20   .18   .16   .13   .27   .16   .08   .15   .19   .26   .08
5     5.5   5.0   .22   .23   .23   .22   .21   .31   .22   .14   .22   .23   .28   .16
6     6.0   1.0   .09   .09   .10   .09   .09   .13   .09   .06   .10   .09   .10   .08
7     6.0   3.0   .11   .13   .11   .10   .09   .16   .10   .06   .11   .11   .14   .08
8     6.0   4.0   .16   .20   .17   .14   .12   .24   .15   .08   .15   .17   .22   .10
9     6.0   5.0   .21   .23   .22   .21   .19   .29   .21   .13   .20   .22   .27   .15
10    6.5   1.0   .13   .17   .13   .12   .10   .18   .13   .08   .14   .12   .13   .13
11    6.5   3.0   .13   .17   .14   .12   .10   .19   .13   .08   .14   .13   .13   .13
12    6.5   4.0   .15   .19   .15   .13   .11   .21   .14   .09   .15   .14   .16   .13
13    6.5   5.0   .17   .20   .18   .16   .14   .24   .17   .11   .17   .17   .20   .14
Results – sensitivity
Sensitivity = 1 − false negative rate = D / (B + D).
r = average test correlation, rel = average test reliability, T = number of tests, R = number of retakes; Mean = overall mean for the decision rule, other columns = mean at that factor level.
Rule  GPA   Min   Mean  r.1   r.3   r.5   r.7   rel.4 rel.6 rel.8 T8    T12   R0    R2
1     5.5   5.5   .60   .52   .59   .64   .67   .45   .60   .76   .65   .56   .44   .77
2     5.5   1.0   .97   .98   .97   .97   .97   .96   .98   .99   .97   .98   .96   .99
3     5.5   3.0   .93   .91   .92   .93   .95   .87   .94   .98   .94   .92   .88   .98
4     5.5   4.0   .83   .79   .81   .84   .87   .71   .84   .93   .85   .80   .71   .94
5     5.5   5.0   .67   .61   .66   .69   .72   .52   .68   .82   .71   .63   .51   .83
6     6.0   1.0   .95   .94   .95   .95   .95   .93   .95   .97   .94   .95   .92   .98
7     6.0   3.0   .92   .90   .91   .93   .94   .87   .93   .96   .92   .92   .87   .97
8     6.0   4.0   .85   .80   .83   .87   .90   .74   .86   .93   .87   .83   .75   .95
9     6.0   5.0   .68   .61   .67   .71   .75   .53   .69   .83   .73   .64   .52   .84
10    6.5   1.0   .92   .89   .91   .93   .94   .89   .92   .94   .91   .92   .88   .96
11    6.5   3.0   .90   .86   .90   .92   .93   .86   .91   .94   .90   .90   .85   .95
12    6.5   4.0   .86   .80   .85   .88   .91   .78   .87   .93   .87   .85   .78   .94
13    6.5   5.0   .73   .64   .71   .76   .81   .59   .74   .86   .77   .69   .58   .88
Results – specificity
Specificity = 1 − false positive rate = A / (A + C).
r = average test correlation, rel = average test reliability, T = number of tests, R = number of retakes; Mean = overall mean for the decision rule, other columns = mean at that factor level.
Rule  GPA   Min   Mean  r.1   r.3   r.5   r.7   rel.4 rel.6 rel.8 T8    T12   R0    R2
1     5.5   5.5   .93   .92   .92   .93   .94   .93   .92   .93   .91   .94   .96   .89
2     5.5   1.0   .67   .57   .66   .71   .74   .58   .67   .77   .66   .69   .76   .58
3     5.5   3.0   .72   .66   .72   .75   .77   .69   .71   .77   .70   .75   .82   .63
4     5.5   4.0   .80   .75   .79   .83   .85   .82   .79   .81   .78   .83   .87   .74
5     5.5   5.0   .89   .86   .88   .90   .92   .90   .88   .89   .87   .91   .93   .85
6     6.0   1.0   .73   .65   .73   .77   .79   .64   .73   .82   .72   .75   .81   .65
7     6.0   3.0   .75   .68   .75   .78   .80   .69   .74   .83   .73   .77   .84   .66
8     6.0   4.0   .80   .75   .80   .82   .83   .78   .78   .84   .78   .82   .88   .72
9     6.0   5.0   .89   .86   .88   .90   .91   .90   .88   .89   .87   .91   .93   .84
10    6.5   1.0   .80   .74   .80   .83   .84   .72   .80   .88   .79   .82   .87   .74
11    6.5   3.0   .81   .75   .80   .83   .84   .73   .81   .88   .79   .82   .87   .74
12    6.5   4.0   .83   .78   .82   .85   .86   .78   .82   .89   .81   .85   .90   .76
13    6.5   5.0   .88   .87   .88   .89   .90   .88   .87   .90   .86   .91   .94   .83
Results – positive predictive value
r = average test correlation, rel = average test reliability, T = number of tests, R = number of retakes; Mean = overall mean for the decision rule, other columns = mean at that factor level.
Rule  GPA   Min   Mean  r.1   r.3   r.5   r.7   rel.4 rel.6 rel.8 T8    T12   R0    R2
1     5.5   5.5   .82   .68   .80   .88   .93   .79   .82   .86   .83   .81   .79   .86
2     5.5   1.0   .98   .99   .98   .97   .96   .97   .98   .98   .97   .98   .98   .97
3     5.5   3.0   .98   .99   .98   .97   .97   .97   .98   .98   .98   .98   .98   .97
4     5.5   4.0   .97   .96   .97   .97   .98   .97   .97   .97   .97   .97   .97   .97
5     5.5   5.0   .90   .82   .88   .93   .96   .88   .89   .91   .91   .89   .87   .92
6     6.0   1.0   .93   .95   .93   .93   .93   .91   .93   .96   .93   .94   .94   .93
7     6.0   3.0   .94   .95   .94   .93   .93   .92   .94   .96   .93   .94   .95   .93
8     6.0   4.0   .94   .94   .94   .94   .94   .93   .93   .95   .93   .94   .94   .93
9     6.0   5.0   .89   .81   .88   .92   .95   .88   .89   .91   .90   .89   .87   .91
10    6.5   1.0   .85   .82   .85   .87   .88   .80   .85   .91   .85   .86   .87   .84
11    6.5   3.0   .86   .82   .85   .87   .88   .81   .85   .91   .85   .86   .87   .84
12    6.5   4.0   .86   .82   .86   .87   .88   .82   .86   .91   .85   .87   .88   .85
13    6.5   5.0   .85   .77   .85   .89   .91   .83   .84   .89   .85   .86   .85   .85
Results – average test reliability
[Figures: false negative rate; false positive rate]
Previous studies
• Douglas & Mislevy (2010)
• Van Rijn, Béguin, & Verstralen (2012)
• McBee, Peters, & Waterman (2014)