Assessing the Fit of IRT Models in Language Testing
Muhammad Naveed Khalid
Ardeshir Geranpayeh
© UCLES 2013

Outline
• Item Response Theory (IRT)
• Importance of Model Fit within IRT
• Fit Procedures
   • Issues and Limitations
   • Lagrange Multiplier (LM) Test
• An empirical study using LM fit statistics
• Sharing Results
• Conclusions

Item Response Theory (IRT)
• A family of mathematical models that provides a common framework for describing people and items
• Examinee performance can be predicted in terms of the underlying trait
• Provides a means of estimating the abilities of people and the characteristics of items

IRT Models
Dichotomous (two response categories)
• One-Parameter Logistic Model / Rasch model (1PL)
• Two-Parameter Logistic Model (2PL)
• Three-Parameter Logistic Model (3PL)
Polytomous (more than two ordered response categories)
• Partial Credit Model (PCM)
• Generalized Partial Credit Model (GPCM)
• Graded Response Model (GRM)

[Figure: Shape of the item response function]

[Figure: Model for an item with 5 response categories (labels: Probability, Response Category)]

IRT Applications
In language testing, IRT is mainly used for:
• Test development
• Item banking
• Differential item functioning (DIF)
• Computerized adaptive testing (CAT)
• Test equating, linking and scaling
• Standard setting
The usefulness of an IRT model depends on how accurately the model reflects the data.

Model Fit from the Item Perspective
• Measurement invariance (MI): item responses can be described by the same parameters in all subpopulations.
• Item characteristic curve (ICC): describes the relation between the latent variable and the observable responses to items.
• Local independence (LI): responses to different items are independent given the value of the latent trait.
• Unidimensionality
• Speededness
• Global fit

Consequences of Misfit
Yen (2000) and Wainer & Thissen (2003) have documented the consequences of inadequate model-data fit. Adverse consequences include:
• Biased ability estimates
• Unfair rankings
• Wrongly equated scores
• Misclassification of students
• Loss of score precision
• Threats to validity

Existing Item Fit Procedures
Chi-square statistics test the discrepancy between observed and expected frequencies:
• Pearson-type item-fit indices (Yen, 1984; Bock, 1972)
• Likelihood-ratio-based item-fit indices (McKinley & Mills, 1985)

Issues in Existing Fit Procedures
• The standard asymptotic theory for chi-square statistics does not hold.
• The stochastic nature of the item parameter estimates is not taken into account.
• The subgroups used in the test are formed from model-dependent trait estimates.
• The correct number of degrees of freedom is unclear.
• The statistics are sensitive to test length and sample size.

Lagrange Multiplier (LM) Test
Glas (1999) proposed applying the LM test to the evaluation of model fit.
LM tests are used to test a restricted model against a more general alternative model.
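The restricted-versus-general comparison can be made concrete with a small Python sketch. It is a simplified LM (efficient score) test for DIF in a single Rasch item, under assumptions that differ from Glas's procedure: person abilities θ are treated as known (for example, from a prior calibration) instead of being handled by marginal maximum likelihood, and only the one item difficulty β is re-estimated. The restricted model contains β only; the general model adds a shift δ for the focal group. The statistic is the squared score for δ at δ = 0 divided by its variance after adjusting for the estimation of β. The function names `rasch_p`, `fit_beta` and `lm_dif` and the simulated data are hypothetical, for illustration only.

```python
import math
import random


def rasch_p(theta, beta, shift=0.0):
    """P(X = 1) under the Rasch model; `shift` adds a constant to the logit."""
    return 1.0 / (1.0 + math.exp(-(theta - beta + shift)))


def fit_beta(x, theta, iters=50, tol=1e-10):
    """Restricted model: ML estimate of one item difficulty, abilities treated as known."""
    beta = 0.0
    for _ in range(iters):
        p = [rasch_p(t, beta) for t in theta]
        resid = sum(xi - pi for xi, pi in zip(x, p))   # equals -dlogL/dbeta
        info = sum(pi * (1.0 - pi) for pi in p)        # Fisher information for beta
        step = resid / info
        beta -= step                                   # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta


def lm_dif(x, theta, group):
    """Simplified LM test of H0: delta = 0 in logit(P) = theta - beta + group * delta.

    Assumes known abilities (hypothetical simplification). Returns (LM, p-value);
    LM is asymptotically chi-square with 1 df under H0.
    """
    beta = fit_beta(x, theta)
    p = [rasch_p(t, beta) for t in theta]
    w = [pi * (1.0 - pi) for pi in p]
    # h: derivative of the log-likelihood w.r.t. delta, evaluated at delta = 0
    h = sum(g * (xi - pi) for g, xi, pi in zip(group, x, p))
    i_bb = sum(w)                                      # information for beta
    i_bd = sum(g * wi for g, wi in zip(group, w))      # cross term beta/delta (magnitude)
    i_dd = sum(g * g * wi for g, wi in zip(group, w))  # information for delta
    var_h = i_dd - i_bd * i_bd / i_bb                  # W: variance of h, adjusted for estimating beta
    lm = h * h / var_h
    prob = math.erfc(math.sqrt(lm / 2.0))              # P(chi-square with 1 df > lm)
    return lm, prob


if __name__ == "__main__":
    rng = random.Random(2013)
    n = 4000
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group = [k % 2 for k in range(n)]                  # 1 = focal, 0 = reference (simulated)

    def simulate(delta, beta=0.3):
        """Simulated responses; delta is a hypothetical DIF shift for the focal group."""
        return [1 if rng.random() < rasch_p(t, beta, g * delta) else 0
                for t, g in zip(theta, group)]

    print("no DIF:  LM = %.2f, p = %.3f" % lm_dif(simulate(0.0), theta, group))
    print("DIF 0.8: LM = %.2f, p = %.3g" % lm_dif(simulate(0.8), theta, group))
```

Because only the restricted model is fitted, δ itself never has to be estimated. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), so no statistics library is required; for example, erfc(√(7.10/2)) ≈ 0.008, consistent with the value 0.01 reported for Item13 in the MI results. With the simulated 0.8-logit shift, the statistic should lie far above the 1-df critical value of 3.84.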
Consider a null hypothesis about a model with parameters $\eta_0$. This model is a special case of a more general model with parameters $\eta'_0 = (\eta'_{01}, c)$, in which the additional parameters $c$ are fixed at zero under the null hypothesis. The LM statistic is

$$LM(c) = h(c)'\,W^{-1}\,h(c)$$

where $h(c)$ is the vector of first-order derivatives of the log-likelihood with respect to $c$, evaluated at the estimates of the restricted model, and $W$ is the covariance matrix of $h(c)$. Under the null hypothesis, LM is asymptotically chi-square distributed with degrees of freedom equal to the number of parameters in $c$.

LM Item Fit Statistics
• MI / DIF
• LI
• ICC

MI / DIF ($y_n = 1$ if person $n$ belongs to the focal group, 0 otherwise):

$$P_i(\theta_n) = \frac{\exp(\alpha_i(\theta_n - \beta_i) + y_n\delta_i)}{1 + \exp(\alpha_i(\theta_n - \beta_i) + y_n\delta_i)}$$

Null model: $\delta_i = 0$; alternative model: $\delta_i \neq 0$

LI (response to item $i$ given the response $x_{nl}$ to item $l$):

$$P(X_{ni} = 1 \mid X_{nl} = x_{nl}, \theta_n, \delta_{il}) = \frac{\exp(\alpha_i(\theta_n - \beta_i + x_{nl}\delta_{il}))}{1 + \exp(\alpha_i(\theta_n - \beta_i + x_{nl}\delta_{il}))}$$

Null model: $\delta_{il} = 0$; alternative model: $\delta_{il} \neq 0$

ICC ($g$ is the score-level group to which person $n$ belongs):

$$P(X_{ni} = 1 \mid \theta_n, \delta_{ig}) = \frac{\exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}{1 + \exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}$$

Null model: $\delta_{ig} = 0$; alternative model: $\delta_{ig} \neq 0$

Empirical Example
Data from Cambridge English First (FCE)
– Reading: 3 parts / 30 questions
– Listening: 4 parts / 30 questions
Sample size: over 35,000
The approach can be applied to any other language exam.

MI: Lagrange tests for Rasch MODEL
----------------------------------------------------------------
                              Focal group    Reference      Abs.
    Item       LM  df  Prob   Obs   Exp      Obs   Exp      Dif.
----------------------------------------------------------------
 1  Item1    0.60   1  0.44   0.74  0.72     0.75  0.76     0.01
 2  Item2    0.34   1  0.56   0.94  0.94     0.96  0.95     0.00
 3  Item3    0.04   1  0.84   0.70  0.71     0.75  0.75     0.00
 4  Item4    2.10   1  0.15   0.78  0.75     0.78  0.79     0.02
 5  Item5    1.77   1  0.18   0.82  0.80     0.81  0.82     0.02
 6  Item6    0.15   1  0.69   0.70  0.71     0.75  0.75     0.01
 7  Item7    1.43   1  0.23   0.71  0.68     0.70  0.71     0.02
 8  Item8    0.40   1  0.53   0.87  0.87     0.89  0.90     0.01
 9  Item9    0.17   1  0.68   0.89  0.88     0.90  0.90     0.00
10  Item10   0.85   1  0.36   0.77  0.78     0.83  0.82     0.01
11  Item11   0.97   1  0.32   0.87  0.85     0.87  0.87     0.01
12  Item12   0.09   1  0.76   0.87  0.87     0.89  0.89     0.00
13  Item13   7.10   1  0.01   0.45  0.50     0.59  0.56     0.04
14  Item14   2.04   1  0.15   0.51  0.55     0.61  0.60     0.02
15  Item15   0.00   1  0.97   0.72  0.72     0.75  0.75     0.00
16  Item16   0.03   1  0.85   0.62  0.62     0.68  0.68     0.00
17  Item17   2.63   1  0.10   0.48  0.52     0.60  0.59     0.03
18  Item18   0.01   1  0.91   0.44  0.44     0.49  0.49     0.00
19  Item19   0.36   1  0.55   0.78  0.79     0.83  0.83     0.01
20  Item20   1.05   1  0.31   0.66  0.69     0.73  0.72     0.02
21  Item21   2.77   1  0.10   0.80  0.83     0.88  0.87     0.02
22  Item22   4.17   1  0.04   0.71  0.75     0.81  0.80     0.02
23  Item23   0.58   1  0.44   0.87  0.85     0.87  0.87     0.01
24  Item24   0.13   1  0.71   0.83  0.83     0.87  0.87     0.00
25  Item25   0.94   1  0.33   0.92  0.93     0.95  0.95     0.01
26  Item26   5.05   1  0.02   0.60  0.55     0.59  0.61     0.03
27  Item27   4.55   1  0.03   0.64  0.60     0.64  0.65     0.03
28  Item28   2.76   1  0.10   0.49  0.45     0.49  0.50     0.03
29  Item29   0.26   1  0.61   0.62  0.61     0.66  0.67     0.01
30  Item30   3.07   1  0.08   0.70  0.66     0.69  0.71     0.03
----------------------------------------------------------------

ICC: Lagrange multipliers for Rasch MODEL
---------------------------------------------------------------------------
                              Group 1      Group 2      Group 3        Abs.
    Item       LM  df  Prob   Obs   Exp    Obs   Exp    Obs   Exp      Dif.
---------------------------------------------------------------------------
 1  Item1    3.56   2  0.17   0.56  0.55   0.72  0.71   0.82  0.83     0.01
 2  Item2    1.98   2  0.37   0.60  0.59   0.79  0.78   0.89  0.90     0.01
 3  Item3    1.25   2  0.54   0.54  0.56   0.76  0.74   0.86  0.87     0.01
 4  Item4    1.23   2  0.54   0.67  0.66   0.83  0.83   0.91  0.92     0.01
 5  Item5    2.81   2  0.24   0.71  0.71   0.86  0.84   0.91  0.92     0.01
 6  Item6    2.96   2  0.23   0.58  0.57   0.68  0.71   0.84  0.83     0.02
 7  Item7    2.65   2  0.27   0.17  0.19   0.33  0.31   0.49  0.49     0.01
 8  Item8    4.82   2  0.09   0.65  0.66   0.76  0.77   0.87  0.86     0.01
 9  Item9    4.40   2  0.11   0.20  0.20   0.33  0.36   0.60  0.58     0.02
10  Item10   3.89   2  0.14   0.24  0.23   0.51  0.54   0.84  0.82     0.02
11  Item11   1.62   2  0.44   0.73  0.72   0.86  0.88   0.95  0.95     0.01
12  Item12  19.55   2  0.00   0.42  0.37   0.50  0.57   0.77  0.76     0.04
13  Item13   0.94   2  0.63   0.43  0.44   0.76  0.75   0.91  0.92     0.01
14  Item14   2.82   2  0.24   0.64  0.63   0.89  0.88   0.96  0.97     0.01
15  Item15  11.03   2  0.00   0.36  0.36   0.65  0.63   0.81  0.84     0.02
16  Item16   3.88   2  0.14   0.52  0.51   0.83  0.83   0.95  0.96     0.01
17  Item17   0.84   2  0.66   0.51  0.51   0.77  0.77   0.92  0.92     0.01
18  Item18   0.85   2  0.65   0.25  0.25   0.41  0.41   0.59  0.60     0.01
19  Item19   0.99   2  0.61   0.49  0.50   0.70  0.70   0.86  0.85     0.01
20  Item20   0.90   2  0.64   0.34  0.33   0.59  0.59   0.81  0.81     0.00
21  Item21   1.02   2  0.60   0.18  0.17   0.27  0.28   0.44  0.43     0.01
22  Item22   2.92   2  0.23   0.43  0.44   0.72  0.72   0.90  0.89     0.01
23  Item23   0.26   2  0.88   0.73  0.73   0.93  0.93   0.98  0.98     0.00
24  Item24   1.47   2  0.48   0.69  0.70   0.91  0.90   0.97  0.97     0.01
25  Item25   0.61   2  0.74   0.45  0.46   0.61  0.59   0.71  0.72     0.01
26  Item26   8.56   2  0.01   0.53  0.56   0.74  0.71   0.81  0.82     0.02
27  Item27   2.76   2  0.25   0.36  0.36   0.56  0.58   0.79  0.78     0.01
28  Item28   1.64   2  0.44   0.38  0.36   0.53  0.56   0.76  0.75     0.02
29  Item29   0.31   2  0.86   0.55  0.55   0.78  0.79   0.92  0.92     0.00
30  Item30   2.21   2  0.33   0.37  0.39   0.53  0.50   0.62  0.63     0.02
---------------------------------------------------------------------------

LI: Lagrange multipliers for Rasch MODEL
---------------------------------------------------------
                                                     Abs.
Itm Itm   LM  df  Prob   Obs   Exp    Obs   Exp      Dif.
---------------------------------------------------------
 2   1  0.15   1  0.70   0.55  0.55   0.62  0.63     0.01
 3   2  6.31   1  0.04   0.57  0.59   0.71  0.69     0.01
 4   3  1.79   1  0.18   0.62  0.64   0.72  0.71     0.02
 5   4  0.26   1  0.61   0.72  0.73   0.77  0.77     0.01
 6   5  0.07   1  0.79   0.75  0.75   0.82  0.82     0.01
 7   6  0.02   1  0.88   0.51  0.52   0.62  0.61     0.03
 8   7 23.95   1  0.00   0.53  0.59   0.70  0.66     0.03
 9   8  0.27   1  0.61   0.61  0.61   0.76  0.76     0.01
10   9  1.97   1  0.16   0.40  0.42   0.68  0.67     0.01
11  10  1.20   1  0.27   0.61  0.60   0.78  0.79     0.01
12  11 24.08   1  0.00   0.72  0.77   0.93  0.91     0.05
13  12  2.11   1  0.15   0.53  0.56   0.81  0.80     0.01
14  13  4.24   1  0.06   0.68  0.71   0.91  0.90     0.01
15  14 41.66   1  0.00   0.14  0.25   0.62  0.60     0.05
16  15  4.02   1  0.07   0.70  0.69   0.84  0.85     0.02
17  16  7.04   1  0.01   0.66  0.70   0.87  0.86     0.01
18  17  4.37   1  0.08   0.51  0.55   0.80  0.79     0.01
19  18 13.69   1  0.00   0.52  0.57   0.84  0.82     0.04
20  19  2.04   1  0.12   0.69  0.70   0.93  0.91     0.02
21  20  3.85   1  0.05   0.41  0.46   0.67  0.66     0.01
22  21  1.71   1  0.11   0.80  0.82   0.92  0.91     0.01
23  22  2.01   1  0.16   0.79  0.82   0.94  0.94     0.01
24  23 10.60   1  0.00   0.62  0.72   0.93  0.92     0.03
25  24  1.02   1  0.31   0.61  0.58   0.84  0.84     0.02
26  25  2.34   1  0.13   0.58  0.60   0.82  0.82     0.01
27  26  2.10   1  0.09   0.41  0.45   0.67  0.65     0.02
28  27  1.62   1  0.92   0.86  0.85   0.89  0.91     0.02
29  28  0.17   1  0.68   0.48  0.47   0.63  0.63     0.01
30  29  0.47   1  0.49   0.77  0.77   0.86  0.86     0.01
---------------------------------------------------------

Conclusions
• LM statistics address the issues with the existing fit procedures
• Less computationally intensive: only the restricted (null) model has to be estimated
• The size of the residuals, in the form of Abs. Dif., is highly informative
• The IRT model fits the FCE data reasonably well
• Violations: MI (4 items); ICC (3 items); LI (7 item pairs)
• The magnitude of the violations is not severe (Abs. Dif. of at most 0.05 in all tables)

Thank you! Questions?