The Effect of CAT Item Selection

CAT Item Selection and Person Fit:
Predictive Efficiency and Detection of
Atypical Symptom Profiles
Barth B. Riley, Ph.D., Michael L. Dennis,
Ph.D., Kendon J. Conrad, Ph.D.
Funded by NIDA grant 1R21DA025731
Introduction
• Do our measures accurately reflect a
person’s performance or status?
– Example: Persons with few endorsed
symptoms, but symptoms of high severity
• Person fit statistics offer a means of
detecting these patterns.
• But, detecting person misfit in CAT is
problematic:
– Reduced number of items administered
– Selected items cover limited range of
measurement continuum
Item Selection in CAT
• Optimized for efficiency and precision of
measurement estimation.
– e.g., maximizing Fisher’s information
function
• Alternative procedures could be devised
to balance efficiency/precision and
obtaining responses over a wider range
of the measurement continuum
– e.g., Linacre’s (1995) Bayesian falsification
procedure
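As a minimal sketch of maximum-information selection, assuming a dichotomous Rasch model in which an item's Fisher information at θ is p(1 − p): the function names and the small item bank below are hypothetical and are not taken from the study.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at measure theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item at theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def select_mfi(theta, difficulties, administered):
    """Index of the unadministered item with the largest information at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

# Hypothetical item difficulties in logits (illustration only).
difficulties = [-2.0, -0.5, 0.3, 1.5, 3.0]
first = select_mfi(0.0, difficulties, administered=set())
second = select_mfi(0.0, difficulties, administered={first})
print(first, second)
```

Because Rasch information peaks where the item difficulty equals θ, this rule keeps choosing items close to the current estimate, which is why the responses collected cover only a narrow part of the continuum.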
Purpose of Study
• Examine the predictive efficiency and
sensitivity of various person fit indices
to detecting misfit in CAT
– Predictive efficiency: how well can we
predict the overall pattern of misfit based
on item responses collected via CAT?
• What effect do different item
selection methods have on our ability to
detect person misfit in a CAT context?
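One way to make predictive efficiency concrete is sketched below. It assumes that predictive efficiency is the squared Pearson correlation (R²) between a fit statistic computed from the first k CAT items and the same statistic computed from the full instrument; the slides report R² but do not spell out the computation, so this reading and the helper names are assumptions.

```python
def r_squared(cat_values, full_values):
    """Squared Pearson correlation between CAT-based and full-instrument fit values."""
    n = len(cat_values)
    mean_x = sum(cat_values) / n
    mean_y = sum(full_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cat_values, full_values))
    sxx = sum((x - mean_x) ** 2 for x in cat_values)
    syy = sum((y - mean_y) ** 2 for y in full_values)
    return (sxy * sxy) / (sxx * syy)

def min_items_for(r2_by_items, threshold=0.80):
    """Smallest number of administered items whose R² reaches the threshold, else None."""
    for k in sorted(r2_by_items):
        if r2_by_items[k] >= threshold:
            return k
    return None

# Hypothetical R² values by number of items administered (illustration only).
example_curve = {2: 0.10, 3: 0.50, 4: 0.85, 5: 0.90}
print(min_items_for(example_curve))
```

The second helper corresponds to summarizing an R² curve by the minimum number of items needed to reach a chosen level such as .80.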
Hypotheses
1. Predictive efficiency of CAT-derived
person fit statistics will be enhanced
by selecting items from a wider range
of the measurement continuum.
2. Greater predictive efficiency will
improve detection of atypical
responding.
Data Source and Simulation Procedure
• Data were from 4,360 individuals presenting at intake to
substance abuse treatment
• Post-hoc CAT simulations were performed:
– One parameter IRT (Rasch) dichotomous response
model.
– Maximum-likelihood estimation
– Item Selection Procedures
• Modified “Bayesian” falsification procedure (MBF)
• Maximum Fisher’s Information (MFI)
– Stop Rule: all items were administered to examine
the effects of successive item administration on
person fit indices.
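The maximum-likelihood step can be sketched as a Newton-Raphson update under the dichotomous Rasch model. This is an illustrative sketch only: the step damping, the convergence tolerance, and the clamping of extreme (all-endorsed or none-endorsed) patterns at ±6 logits are assumptions, not details reported in the study.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at measure theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_estimate(responses, difficulties, theta0=0.0, max_iter=50, tol=1e-6, bound=6.0):
    """Return (theta, standard error) for 0/1 responses to items of known difficulty."""
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        # The ML estimate is infinite for extreme patterns; clamping is an assumption.
        theta = -bound if raw_score == 0 else bound
    else:
        theta = theta0
        for _ in range(max_iter):
            probs = [rasch_prob(theta, b) for b in difficulties]
            gradient = raw_score - sum(probs)             # first derivative of log-likelihood
            information = sum(p * (1.0 - p) for p in probs)
            step = max(-1.0, min(1.0, gradient / information))  # damped Newton step
            theta += step
            if abs(step) < tol:
                break
    information = sum(
        rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b)) for b in difficulties
    )
    return theta, 1.0 / math.sqrt(information)

theta, se = ml_estimate([1, 0], [-1.0, 1.0])
print(round(theta, 3), round(se, 3))
```

In a post-hoc simulation, a routine like this would be rerun after each administered item, using the stored responses in place of live answers.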
Internal Mental Distress Scale
• The IMDS is a 42-item instrument that is part
of the Global Appraisal of Individual Needs
(Dennis et al., 2003).
• Measures:
– Internal mental distress (second-order factor)
– Depression
– Anxiety
– Trauma
– Homicidality/Suicidality
– Somatic complaints
• Validated using a 1-parameter IRT (Rasch)
measurement model
Modified Bayesian Falsification Item Selection
(MBF)
1. Set the start value for the measure (θ_0)
at 0 logits.
2. Calculate a “target” measure θ_T:
i. If the previous item was endorsed, or for the first item:
θ_T = θ_{i-1} + max(2, SE²)
ii. Otherwise: θ_T = θ_{i-1} – max(2, SE²)
3. For each unadministered item, compute
the information function I_{ni}(θ_T).
4. Select the item with the largest
information function.
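The four steps can be sketched in code as follows. This is a hedged illustration rather than the authors' implementation: it assumes a dichotomous Rasch model, it reads the SE term in the step size as the squared standard error, and the function names and item difficulties are hypothetical.

```python
import math

def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item at theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def mbf_target(theta_prev, se, last_endorsed, first_item=False):
    """Step 2: move the target up after an endorsement (or at the start), else down."""
    step = max(2.0, se ** 2)  # ASSUMPTION: the SE term is the squared standard error
    if first_item or last_endorsed:
        return theta_prev + step
    return theta_prev - step

def select_mbf(theta_prev, se, last_endorsed, difficulties, administered, first_item=False):
    """Steps 3 and 4: pick the unadministered item most informative at the target."""
    target = mbf_target(theta_prev, se, last_endorsed, first_item)
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(target, difficulties[i]))

# Hypothetical item difficulties in logits (illustration only).
difficulties = [-2.2, -0.5, 0.3, 1.9, 3.0]

# Step 1: start at 0 logits; no standard error exists yet, so 0.0 is passed.
first = select_mbf(0.0, 0.0, False, difficulties, set(), first_item=True)
# Suppose the first item was not endorsed and the estimate is now 0.5 with SE 1.0.
second = select_mbf(0.5, 1.0, False, difficulties, {first})
print(first, second)
```

Because the target sits at least 2 logits away from the current estimate, successive items alternate between the harder and easier ends of the bank instead of clustering around the estimate.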
Person Fit Statistics
• Residual-based:
– Infit, outfit (Wright & Stone, 1979; Wright, 1980)
– Log infit and outfit (Wright & Stone, 1979)
• Non-Parametric
– Modified Caution Index (MCI; Harnisch & Linn,
1981)
– HT (Sijtsma, 1986; Sijtsma & Meijer, 1992)
• Likelihood-Based
– lz (Drasgow, Levine & Williams, 1985)
• CAT-Specific (CUSUM; van Krimpen-Stoop &
Meijer, 2000)
– Used three different methods for estimating
response residuals (T1, T3, and T6).
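For orientation, the sketch below computes three of the listed indices (outfit, infit, and lz) for a single response pattern under a dichotomous Rasch model, using their standard textbook definitions. The function name and example data are hypothetical; the non-parametric and CUSUM indices are not covered here.

```python
import math

def person_fit(responses, difficulties, theta):
    """Return (outfit, infit, lz) for 0/1 responses at a given measure theta."""
    probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
    variances = [p * (1.0 - p) for p in probs]

    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(
        (x - p) ** 2 / v for x, p, v in zip(responses, probs, variances)
    ) / len(responses)

    # Infit: information-weighted mean square.
    infit = sum((x - p) ** 2 for x, p in zip(responses, probs)) / sum(variances)

    # lz: standardized log-likelihood of the observed pattern.
    loglik = sum(
        x * math.log(p) + (1 - x) * math.log(1.0 - p) for x, p in zip(responses, probs)
    )
    expected = sum(p * math.log(p) + (1.0 - p) * math.log(1.0 - p) for p in probs)
    variance = sum(v * math.log(p / (1.0 - p)) ** 2 for p, v in zip(probs, variances))
    lz = (loglik - expected) / math.sqrt(variance)
    return outfit, infit, lz

# Hypothetical five-item bank; theta fixed at 0 logits for illustration.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
typical = person_fit([1, 1, 1, 0, 0], difficulties, 0.0)   # endorses the easier items
atypical = person_fit([0, 0, 1, 1, 1], difficulties, 0.0)  # endorses only severe items
print(typical, atypical)
```

The atypical pattern, with only high-severity items endorsed, yields mean squares well above 1 and a strongly negative lz, which is the kind of profile these indices are meant to flag.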
Predictive Efficiency of Person Fit
Statistics
[Figure: Predictive Efficiency, MFI Item Selection. R² (%) plotted against items administered (2 to 42) for MCI, Infit, Log Infit, HT, CUSUM T1, CUSUM T3, CUSUM T6, Outfit, Log Outfit, and lz.]
[Figure: Predictive Efficiency, MBF Item Selection. R² (%) plotted against items administered (2 to 41) for MCI, Infit, Log Infit, HT, CUSUM T1, CUSUM T3, CUSUM T6, Outfit, Log Outfit, and lz.]
Min. Number of Items to Achieve R² = .80

Fit Statistic    MFI     MBF
MCI              13      11
HT               18      17
Infit            20      19
Log Infit        15      16
Outfit           39      36
Log Outfit       19      19
lz               38      34
CUSUM (T1)       26      26
CUSUM (T3)       30      32
CUSUM (T6)       39      35
Average          25.7    24.5
Identification of Persons with Atypical
Suicide
Atypical Suicide
• Conrad and colleagues (2010) identified a
subgroup with suicidal ideation with lower
levels of depression, anxiety, trauma
• In this study, however, we defined atypical
suicide as persons with:
– 2+ suicidal symptoms
– Level of internal mental distress is not predictive
of suicidality.
– Under typical CAT operation, these individuals
would be unlikely to receive suicide items during a
CAT session
[Figure: Suicide Groups Based on 2+ Symptoms (N=7,348). Pie chart with three groups (Non-Suicidal, Suicidal, Atypical Suicide); extracted segment labels: 91%, 7%, 2%.]
Predicting Atypical Suicide: All Items

Variable           AUC     Sensitivity (%)   Specificity (%)
IMDS               0.83    0.0               99.5
MCI                0.38    0.0               100.0
HT                 0.62    0.0               100.0
Infit/Log Infit    0.90    33.2              99.0
Outfit             0.92    14.1              98.6
Log Outfit         0.92    16.3              98.2
lz                 0.92    45.4              98.8
CUSUM (T1)         0.89    15.3              99.1
CUSUM (T3)         0.84    11.8              99.3
CUSUM (T6)         0.87    16.6              99.2
Multivariate       0.98    81.0              97.0
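A predictor can show a high AUC and still have near-zero sensitivity, because AUC summarizes ranking over all possible cut points while sensitivity and specificity depend on one classification threshold. The sketch below illustrates the three quantities on made-up data; the threshold of 0.5 and the example scores are assumptions, since the slides do not state how cases were classified.

```python
def sensitivity_specificity(actual, predicted):
    """Sensitivity and specificity from 0/1 actual labels and 0/1 predictions."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(actual, scores):
    """Area under the ROC curve: the chance a case outranks a non-case (ties count half)."""
    cases = [s for a, s in zip(actual, scores) if a == 1]
    non_cases = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))

# Hypothetical data: 1 = atypical suicide case, scores = predicted probabilities.
actual = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut point is an assumption

print(auc(actual, scores))
print(sensitivity_specificity(actual, predicted))
```

With a rare outcome, predicted probabilities seldom cross a fixed cut point, so sensitivity can stay low even when cases are ranked well above non-cases.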
[Figure: Sensitivity to Predict Atypical Suicide. Sensitivity (%) plotted against items administered (2 to 42) for four models: IMDS Only (MFI), IMDS Only (MBF), IMDS + Fit Statistics (MFI), and IMDS + Fit Statistics (MBF).]
Comparison of Item Selection
Procedures
[Figure: First 5 Items Administered by CAT. Percentage of the first five items drawn from each IMDS subscale (Trauma, Somatic Complaints, Homicidality/Suicidality, Anxiety, Depression) under MFI and MBF. Extracted bar labels, in extraction order: 0.20%, 12.10%, 18.50%, 13.30%, 40.70%, 1.40%, 6.80%, 48.80%, 33.80%, 24.40%.]
[Figure: CAT to Full Instrument Correlation. Correlation (0.00 to 1.00) plotted against items administered (2 to 42) for MFI and MBF.]
[Figure: Measurement Precision (RMSE). RMSE plotted against items administered (2 to 42) for MFI and MBF.]
[Figure: Test Information. Mean cumulative % of test information plotted against items administered (2 to 42) for MFI and MBF.]
A Case Example
[Figure: MFI Item Selection and Measure Estimation. Item difficulty and estimated measure (logits) plotted against items administered (1 to 42); an annotation marks the first suicide item administered.]
[Figure: MBF Item Selection and Measure Estimation. Item difficulty and estimated measure (logits) plotted against items administered (1 to 41); an annotation marks the first suicide item administered.]
Comparison

                  MFI      MBF      Full
Measure           -1.57    -0.58    -0.40
Std. Error        0.49     0.48     0.35
Outfit            0.82     2.20     2.10
Infit             0.85     1.25     1.51
lz                1.85     -1.53    -3.65
# Suicide         0        3        5
# Administered    19       22       42
Conclusions
• Hypothesis 1: Item selection method had only
a modest effect on predictive efficiency,
though in the hypothesized direction.
– MBF had strongest effect on outfit, lz and CUSUM
(T6)
• Partial support for Hypothesis 2:
– MBF provided efficient detection of atypical suicide
pattern
– Reflects the type of items selected early in the
CAT rather than on predictive efficiency
• MBF was found to be somewhat less efficient
than MFI
Strengths and Limitations
• Strengths
– Large sample
– Clinical sample
– Several fit statistics examined
• Limitations
– Multidimensionality
– Small item bank
– Further work needed on defining “atypicalness” in
clinical context
– Further validation of approach across instruments,
measurement models
References
• Conrad, K. J., Bezruczko, N., Chan, Y. F., Riley, B., Diamond, G., & Dennis, M. L. (2010). Screening for atypical suicide risk with person fit statistics among people presenting to alcohol and other drug treatment. Drug and Alcohol Dependence, 106(1), 92-100.
• Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11(1), 59-79.
• Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18(2), 133-146.
• Linacre, J. M. (1995). Computer-adaptive testing CAT: A Bayesian approach. Rasch Measurement Transactions, 9(1), 412.
• Sijtsma, K. (1986). A coefficient of deviance of response patterns. Kwantitatieve Methoden, 7, 131-145.
• Sijtsma, K., & Meijer, R. R. (1992). A method for investigating the intersection of item response functions in Mokken’s non-parametric IRT model. Applied Psychological Measurement, 16(2), 149-157.
• van Krimpen-Stoop, E. M., & Meijer, R. R. (2000). Detecting person misfit in adaptive testing using statistical process control techniques. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer Academic.
• Wright, B. D. (1980). Afterword. In G. Rasch (Ed.), Probabilistic models for some intelligence and attainment tests: With foreword and afterword by Benjamin D. Wright. Chicago: MESA Press.
• Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: University of Chicago, MESA Press.
Thank you!
For more information, contact:
Barth Riley, Ph.D.
bbriley@chestnut.org
For more information about the psychometrics of the Global
Appraisal of Individual Needs (GAIN), including the Internal
Mental Distress Scale, go to:
http://www.chestnut.org/li/gain/#GAIN%20Working%20Papers