CAT Item Selection and Person Fit: Predictive Efficiency and Detection of Atypical Symptom Profiles

Barth B. Riley, Ph.D., Michael L. Dennis, Ph.D., Kendon J. Conrad, Ph.D.
Funded by NIDA grant 1R21DA025731

Introduction
• Do our measures accurately reflect a person's performance or status?
  – Example: persons who endorse few symptoms, but symptoms of high severity
• Person fit statistics offer a means of detecting such patterns.
• However, detecting person misfit in CAT is problematic:
  – Fewer items are administered
  – The selected items cover a limited range of the measurement continuum

Item Selection in CAT
• Optimized for efficiency and precision of measurement estimation
  – e.g., maximizing Fisher's information function
• Alternative procedures could be devised to balance efficiency/precision against obtaining responses over a wider range of the measurement continuum
  – e.g., Linacre's (1995) Bayesian falsification procedure

Purpose of Study
• Examine the predictive efficiency of various person fit indices, and their sensitivity in detecting misfit, in CAT
  – Predictive efficiency: how well can we predict the overall pattern of misfit from the item responses collected via CAT?
• What effect do different item selection methods have on our ability to detect person misfit in a CAT context?

Hypotheses
1. The predictive efficiency of CAT-derived person fit statistics will be enhanced by selecting items from a wider range of the measurement continuum.
2. Greater predictive efficiency will improve detection of atypical responding.

Data Source and Simulation Procedure
• Data were from 4,360 individuals presenting to substance abuse treatment at intake
• Post-hoc CAT simulations were performed:
  – One-parameter IRT (Rasch) dichotomous response model
  – Maximum-likelihood estimation
  – Item selection procedures:
    • Modified "Bayesian" falsification procedure (MBF)
    • Maximum Fisher's information (MFI)
  – Stop rule: all items were administered, to examine the effects of successive item administration on person fit indices

Internal Mental Distress Scale
• The IMDS is a 42-item instrument that is part of the Global Appraisal of Individual Needs (GAIN; Dennis et al., 2003).
• Measures:
  – Internal mental distress (second-order factor)
  – Depression
  – Anxiety
  – Trauma
  – Homicidality/Suicidality
  – Somatic complaints
• Validated using a one-parameter IRT (Rasch) measurement model

Modified Bayesian Falsification Item Selection (MBF)
1. Set the starting value of the measure, $\theta_0$, to 0 logits.
2. Calculate a "target" measure:
   i. If the previous item was endorsed, or this is the first item: $\theta_T = \theta_{i-1} + \max(2, SE^2)$
   ii. Otherwise: $\theta_T = \theta_{i-1} - \max(2, SE^2)$
3. For each unadministered item, compute the information function $I_{ni}(\theta_T)$.
4. Select the item with the largest information.

Person Fit Statistics
• Residual-based:
  – Infit, outfit (Wright & Stone, 1979; Wright, 1980)
  – Log infit and log outfit (Wright & Stone, 1979)
• Non-parametric:
  – Modified Caution Index (MCI; Harnisch & Linn, 1981)
  – HT (Sijtsma, 1986; Sijtsma & Meijer, 1992)
• Likelihood-based:
  – lz (Drasgow, Levine & Williams, 1985)
• CAT-specific:
  – CUSUM (van Krimpen-Stoop & Meijer, 2000), using three different methods for estimating response residuals (T1, T3, and T6)
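The two item selection rules and the residual- and likelihood-based fit indices described above can be sketched in Python. This is a minimal illustration, not the study's simulation code, assuming the dichotomous Rasch model with known item difficulties in logits. The function names are hypothetical; MBF is implemented here as MFI evaluated at the target measure $\theta_T$ instead of at the current estimate; and before the first item the sketch simply passes SE = 0, so the step is 2 logits. The MCI, HT, and CUSUM indices are omitted.

```python
import math

# Hypothetical helper names; a sketch under the dichotomous Rasch model only.


def rasch_p(theta, b):
    """Probability of endorsing an item of difficulty b at measure theta (Rasch)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_info(theta, b):
    """Fisher information of a Rasch item at theta: P(1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)


def select_mfi(theta, difficulties, administered):
    """MFI: index of the unadministered item with the largest information at theta."""
    remaining = [i for i in range(len(difficulties)) if i not in administered]
    return max(remaining, key=lambda i: item_info(theta, difficulties[i]))


def mbf_target(theta_prev, se, last_endorsed):
    """MBF target: move max(2, SE^2) logits up after an endorsement
    (or before the first item), and down otherwise."""
    step = max(2.0, se ** 2)
    return theta_prev + step if last_endorsed else theta_prev - step


def select_mbf(theta_prev, se, last_endorsed, difficulties, administered):
    """MBF: most informative unadministered item at the target measure."""
    target = mbf_target(theta_prev, se, last_endorsed)
    return select_mfi(target, difficulties, administered)


def outfit(theta, difficulties, responses):
    """Outfit (unweighted mean square): mean of squared standardized residuals."""
    z2 = []
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        z2.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z2) / len(z2)


def infit(theta, difficulties, responses):
    """Infit (information-weighted mean square): sum of squared residuals / sum of variances."""
    num = den = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        num += (x - p) ** 2
        den += p * (1.0 - p)
    return num / den


def lz(theta, difficulties, responses):
    """Standardized log-likelihood statistic; large negative values suggest misfit."""
    l0 = expected = variance = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        l0 += x * math.log(p) + (1 - x) * math.log(1.0 - p)
        expected += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        variance += p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2
    return (l0 - expected) / math.sqrt(variance)
```

For a toy bank with difficulties [-2.0, 0.0, 2.1, 3.0], `select_mfi(0.0, bank, set())` picks the item at b = 0, whereas `select_mbf(0.0, 0.0, True, bank, set())` aims at a target of 2 logits and picks the item at b = 2.1, which illustrates how MBF pulls responses from further along the continuum. A person at θ = 0 who fails an easy item (b = -1) but endorses a hard one (b = 1) gets outfit ≈ 2.72 and a negative lz.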
Predictive Efficiency of Person Fit Statistics

[Figure: Predictive Efficiency, MFI Item Selection. R² by number of items administered (2–42) for MCI, infit, log infit, HT, CUSUM T1, CUSUM T3, CUSUM T6, outfit, log outfit, and lz.]

[Figure: Predictive Efficiency, MBF Item Selection. R² by number of items administered (2–41) for the same fit statistics.]

Minimum Number of Items to Achieve R² = .80
Fit Statistic     MFI     MBF
MCI                13      11
HT                 18      17
Infit              20      19
Log Infit          15      16
Outfit             39      36
Log Outfit         19      19
lz                 38      34
CUSUM (T1)         26      26
CUSUM (T3)         30      32
CUSUM (T6)         39      35
Average          25.7    24.5

Identification of Persons with Atypical Suicide

Atypical Suicide
• Conrad and colleagues (2010) identified a subgroup with suicidal ideation but lower levels of depression, anxiety, and trauma.
• In this study, however, we defined atypical suicide as persons with:
  – 2+ suicidal symptoms
  – A level of internal mental distress that is not predictive of suicidality
• Under typical CAT operation, these individuals would be unlikely to receive suicide items during a CAT session.

[Figure: Suicide Groups Based on 2+ Symptoms (N = 7,348): non-suicidal 91%, suicidal 7%, atypical suicide 2%.]

Predicting Atypical Suicide: All Items
Variable           AUC   Sensitivity (%)   Specificity (%)
IMDS               0.83        0.0              99.5
MCI                0.38        0.0             100.0
HT                 0.62        0.0             100.0
Infit/Log Infit    0.90       33.2              99.0
Outfit             0.92       14.1              98.6
Log Outfit         0.92       16.3              98.2
lz                 0.92       45.4              98.8
CUSUM (T1)         0.89       15.3              99.1
CUSUM (T3)         0.84       11.8              99.3
CUSUM (T6)         0.87       16.6              99.2
Multivariate       0.98       81.0              97.0

[Figure: Sensitivity to Predict Atypical Suicide by number of items administered (2–42): IMDS only (MFI), IMDS only (MBF), IMDS + fit statistics (MFI), and IMDS + fit statistics (MBF).]

Comparison of Item Selection Procedures

[Figure: First 5 Items Administered by CAT. Percentage of items drawn from each IMDS subscale (trauma, somatic complaints, homicidality/suicidality, anxiety, depression) under MFI and MBF.]

[Figure: CAT to Full Instrument Correlation by number of items administered (2–42), MFI vs. MBF.]

[Figure: Measurement Precision (RMSE) by number of items administered (2–42), MFI vs. MBF.]

[Figure: Test Information. Mean cumulative percentage of test information by number of items administered, MFI vs. MBF.]

A Case Example

[Figure: MFI Item Selection and Measure Estimation. Item difficulty and measure estimate by number of items administered (1–42); the point at which the first suicide item was administered is marked.]

[Figure: MBF Item Selection and Measure Estimation. Item difficulty and measure estimate by number of items administered (1–41); the point at which the first suicide item was administered is marked.]

Comparison        MFI     MBF    Full
Measure         -1.57   -0.58   -0.40
Std. Error       0.49    0.48    0.35
Outfit           0.82    2.20    2.10
Infit            0.85    1.25    1.51
lz               1.85   -1.53   -3.65
# Suicide           0       3       5
# Administered     19      22      42

Conclusions
• Hypothesis 1: Item selection method had only a modest effect on predictive efficiency, though in the hypothesized direction.
  – MBF had the strongest effect on outfit, lz, and CUSUM (T6)
• Partial support for Hypothesis 2:
  – MBF provided efficient detection of the atypical suicide pattern
  – This reflects the type of items selected early in the CAT rather than predictive efficiency
• MBF was found to be somewhat less efficient than MFI

Strengths and Limitations
• Strengths
  – Large sample
  – Clinical sample
  – Several fit statistics examined
• Limitations
  – Multidimensionality
  – Small item bank
  – Further work is needed on defining "atypicalness" in a clinical context
  – Further validation of the approach is needed across instruments and measurement models

References
• Conrad, K. J., Bezruczko, N., Chan, Y. F., Riley, B., Diamond, G., & Dennis, M. L. (2010). Screening for atypical suicide risk with person fit statistics among people presenting to alcohol and other drug treatment. Drug and Alcohol Dependence, 106(1), 92–100.
• Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11(1), 59–79.
• Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18(2), 133–146.
• Linacre, J. M. (1995). Computer-adaptive testing CAT: A Bayesian approach. Rasch Measurement Transactions, 9(1), 412.
• Sijtsma, K. (1986). A coefficient of deviance of response patterns. Kwantitatieve Methoden, 7, 131–145.
• Sijtsma, K., & Meijer, R. R. (1992). A method for investigating the intersection of item response functions in Mokken's non-parametric IRT model. Applied Psychological Measurement, 16(2), 149–157.
• van Krimpen-Stoop, E. M., & Meijer, R. R. (2000). Detecting person misfit in adaptive testing using statistical process control techniques. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer Academic.
• Wright, B. D. (1980). Afterword. In G. Rasch, Probabilistic models for some intelligence and attainment tests: With foreword and afterword by Benjamin D. Wright. Chicago: MESA Press.
• Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: University of Chicago, MESA Press.

Thank you!
For more information, contact: Barth Riley, Ph.D., bbriley@chestnut.org
For more information about the psychometrics of the Global Appraisal of Individual Needs (GAIN), including the Internal Mental Distress Scale, go to: http://www.chestnut.org/li/gain/#GAIN%20Working%20Papers