Model Fit

Identification of Misfit Item Using IRT Models
Dr Muhammad Naveed Khalid
Outline
1. Item Response Theory
2. Model Fit
3. Fit Procedures
4. Issues and Limitations
5. Lagrange Multiplier (LM) Test
6. Simulation Design
7. Results
8. Conclusions
Item Response Theory

Item response theory (IRT), also known as latent trait theory,
strong true score theory, or modern mental test theory, is a
paradigm for the design, analysis, and scoring of tests,
questionnaires, and similar instruments that measure abilities,
attitudes, or other variables.

Some well-documented advantages over classical test theory (CTT) are
1) Invariance of item and ability estimates
2) Computer adaptive testing
3) Equating
4) Development of item banks
5) Reliability
Model Fit

IRT models are based on a number of explicit assumptions.

Unidimensionality: The assumption entails that the item or test
measures only one ability, trait or construct.

DIF (MI): The assumption entails that the item responses can be
described by the same parameters in all sub-populations.

ICC: The shape of the item response function, which describes the
relation between the latent variable and the observable responses to
items, is invariant.

Local Independence: The assumption entails that responses to
different items are independent given the value of the latent trait
variable.

Speededness: The score-oriented perspective focuses on the effect
of speededness on examinees' test scores, while the fairness-oriented
perspective focuses on the degree to which speededness
adversely affects some examinees relative to others.
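The local independence assumption can be made concrete with a minimal sketch. It assumes the Rasch model, and the names `rasch_prob` and `pattern_probability` are illustrative rather than taken from the slides. Given the latent trait, the probability of a whole response pattern is the product of the item probabilities.

```python
import math

def rasch_prob(theta, beta):
    """Rasch probability of a correct response (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pattern_probability(theta, betas, responses):
    """Probability of a 0/1 response pattern under local independence.

    Conditional on theta, the items are independent, so the pattern
    probability is the product of the per-item probabilities.
    """
    prob = 1.0
    for beta, x in zip(betas, responses):
        p = rasch_prob(theta, beta)
        prob *= p if x == 1 else 1.0 - p
    return prob

# Hypothetical example: three items, one examinee with theta = 0.7.
betas = [-1.0, 0.0, 1.5]
print(pattern_probability(0.7, betas, [1, 1, 0]))
```

If responses to two items remain correlated after conditioning on the latent trait, this product no longer describes the data, which is the violation the local independence checks are meant to detect.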
Consequences of Misfit

Yen (1981) and Wainer & Thissen (1987) have shown that inadequate
model-data fit has adverse consequences such as
1) Biased ability estimates
2) Unfair ranks
3) Wrongly equated scores
4) Compromised validity
Fit Procedures

The fit of item response theory models can be evaluated by the
computation of residuals and the associated test statistics.
Chi-Square Statistics

Tests of the discrepancy between the observed and expected
frequencies.

Pearson-Type Item-Fit Indices (Yen, 1984; Bock, 1972).

Likelihood Ratio Based Item-Fit Indices (McKinley & Mills, 1985).
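A Pearson-type item-fit index compares observed and expected proportions correct within ability groups. The sketch below is written in the spirit of such indices and is not the exact procedure of any cited paper. It assumes the Rasch model, ability estimates that are already available, and groups of roughly equal size; the function names and the default of 10 groups are choices made for this illustration.

```python
import math

def rasch_prob(theta, beta):
    """Rasch probability of a correct response (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pearson_item_fit(thetas, responses, beta, n_groups=10):
    """Pearson-type fit statistic for a single item.

    thetas    : ability estimates, one per examinee
    responses : 0/1 responses of the same examinees to the item
    beta      : difficulty of the item
    """
    n = len(thetas)
    # Sort examinees by ability and cut them into groups of nearly equal size.
    order = sorted(range(n), key=lambda idx: thetas[idx])
    stat = 0.0
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        n_g = len(members)
        observed = sum(responses[idx] for idx in members) / n_g
        expected = sum(rasch_prob(thetas[idx], beta) for idx in members) / n_g
        # Squared residual scaled by the binomial variance of the proportion.
        stat += n_g * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return stat

# Hypothetical data: four examinees, two ability groups.
print(pearson_item_fit([-1.0, -1.0, 1.0, 1.0], [0, 1, 1, 0], 0.0, n_groups=2))
```

Because the groups are formed from estimated abilities, the reference distribution of such a statistic is not straightforward, which is the concern taken up in the next slide.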
Issues and Limitations

Glas and Suarez Falcon (2003) note that the standard theory for
chi-square statistics does not hold in the IRT context because the
observations on which the statistics are based do not have a
multinomial or Poisson distribution.

Glas and Suarez Falcon (2003) have also criticized these
procedures for failing to take into account the stochastic nature of
the item parameter estimates.

Orlando and Thissen (2000) argued that because the observed
proportions correct are based on model-dependent trait
estimates, the degrees of freedom may not be as claimed.
Issues and Limitations (continued)

The statistics have excessive power in large samples, so even
negligible misfit becomes statistically significant.

They lose their validity when the model is grossly violated.

They do not directly reveal the impact of the model violation
on the envisioned application.

They do not provide diagnostic information.
Lagrange Multiplier (LM) Test

Glas (1999) proposed applying the LM test to the evaluation of model fit.

LM tests are used for testing a restricted model against a more
general alternative.

The LM test is based on the first-order partial derivatives of
the log-likelihood function of the general model, evaluated at the
maximum likelihood estimates of the restricted model.

Consider a null hypothesis about a model with parameters $\eta_0$.
This model is a special case of a general model with parameters $\eta$:

$$\eta_0' = (\eta_{01}', c)$$

$$LM(c) = h(c)' W^{-1} h(c)$$
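The quadratic form of the LM statistic can be sketched numerically. The example assumes that `h` holds the first-order derivatives with respect to the parameters fixed under the null hypothesis and that `w` is their covariance matrix, as in standard LM theory. The helper names and the input values are hypothetical, and how `h` and `w` are obtained for a particular IRT model is outside this sketch.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x

def lm_statistic(h, w):
    """LM = h' W^{-1} h, computed without forming the inverse explicitly."""
    x = solve_linear(w, h)
    return sum(hi * xi for hi, xi in zip(h, x))

# Hypothetical values: one restricted parameter (df = 1).
print(lm_statistic([2.0], [[1.0]]))  # 4.0
```

The resulting value is referred to a chi-square distribution whose degrees of freedom equal the number of parameters fixed under the null hypothesis.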

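LM Item Fit Statistics

DIF
$$P_i(\theta_n) = \frac{\exp(\alpha_i(\theta_n - \beta_i) + y_n \delta_i)}{1 + \exp(\alpha_i(\theta_n - \beta_i) + y_n \delta_i)}$$
Null Model: $\delta_i = 0$
Alternative Model: $\delta_i \neq 0$

LOC
$$P(X_{ni} = 1, X_{nl} = 1 \mid \theta_n, \delta_{il}) = \frac{\exp(\alpha_i(\theta_n - \beta_i + \theta_n - \beta_l + \delta_{il}))}{1 + \exp(\alpha_i(\theta_n - \beta_i + \theta_n - \beta_l + \delta_{il}))}$$
Null Model: $\delta_{il} = 0$
Alternative Model: $\delta_{il} \neq 0$

ICC
$$P(X_{ni} = 1 \mid \theta_n, \delta_{ig}) = \frac{\exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}{1 + \exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}$$
Null Model: $\delta_{ig} = 0$
Alternative Model: $\delta_{ig} \neq 0$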
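The DIF and ICC alternatives above can be written as small functions to show how each reduces to the null model when its extra parameter is zero. This is a sketch under assumptions: the signs inside the exponent follow a plausible reading of the slide, `y` is taken to be a 0/1 group indicator, and the function names are illustrative. The local dependence model is left out because it concerns a pair of items rather than a single response.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def p_dif(theta, alpha, beta, delta, y):
    """DIF alternative: the item shifts by delta for examinees with y = 1."""
    return logistic(alpha * (theta - beta) + y * delta)

def p_icc(theta, alpha, beta, delta_g):
    """ICC alternative: ability is shifted by delta_g within score group g."""
    return logistic(alpha * (theta + delta_g - beta))

theta, alpha, beta = 0.3, 1.2, -0.5

# With delta = 0 both groups share one response function (the null model).
print(p_dif(theta, alpha, beta, 0.0, y=0), p_dif(theta, alpha, beta, 0.0, y=1))

# With delta = 0.5 the group with y = 1 has a higher success probability.
print(p_dif(theta, alpha, beta, 0.5, y=0), p_dif(theta, alpha, beta, 0.5, y=1))
```

The LM test asks whether freeing the extra parameter would improve the fit, using only the estimates obtained under the null model.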
Simulation Design

The 1PL, 2PL and 3PL models were used for data generation and calibration.

Test lengths were 10, 20 and 40 items; sample sizes were 100, 400 and 1000 examinees.

Item difficulty and discrimination parameters were drawn from a
standard normal and a log-normal distribution, respectively.

Ability parameters were drawn from a standard normal distribution.

The effect size (degree of misfit) was set to 0.5 or 1.0.

The proportion of misfitting items in each test varied from 10% to 40%.

A nominal significance level of 5% was used.

100 replications were carried out in each condition of the study.
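One cell of such a design can be sketched as a data generator. The sketch covers only the Rasch model with DIF as the violation and makes several assumptions that the slides do not state: examinees alternate between a reference group (0) and a focal group (1), the first items in the test are the misfitting ones, and the effect size is added to the logit for the focal group. The function name and arguments are hypothetical.

```python
import math
import random

def simulate_rasch_with_dif(n_persons, n_items, prop_misfit, delta, seed=0):
    """Generate 0/1 responses under the Rasch model with DIF on some items."""
    rng = random.Random(seed)
    # Item difficulties from a standard normal distribution.
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    # Assumption: the first round(prop_misfit * n_items) items carry the DIF.
    misfit_items = list(range(round(prop_misfit * n_items)))
    data, groups = [], []
    for n in range(n_persons):
        theta = rng.gauss(0.0, 1.0)   # ability from a standard normal
        y = n % 2                     # assumption: groups of equal size
        row = []
        for i in range(n_items):
            shift = delta if (y == 1 and i in misfit_items) else 0.0
            p = 1.0 / (1.0 + math.exp(-(theta - betas[i] + shift)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
        groups.append(y)
    return data, groups, betas, misfit_items

# One replication: 10 items, 400 examinees, 10% misfitting items, effect 0.5.
data, groups, betas, misfit = simulate_rasch_with_dif(400, 10, 0.10, 0.5, seed=1)
print(len(data), len(data[0]), misfit)
```

Across replications, power would be estimated as the proportion of significant tests on the misfitting items and the Type I error rate as the proportion of significant tests on the remaining items.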
Power and Type I error rate by test length (K), effect size (δ) and sample size (N) under the Rasch model: items with MI
------------------------------------------------------------
                       Power            Type I error rate
                  (items with MI)        (items with MI)
 K     δ      N     10%     20%           10%     20%
------------------------------------------------------------
10    0.5    100   0.42    0.32          0.07    0.07
             400   0.89    0.71          0.06    0.07
            1000   0.99    0.99          0.05    0.06
      1.0    100   0.84    0.76          0.11    0.15
             400   1.00    1.00          0.07    0.08
            1000   1.00    1.00          0.07    0.09
20    0.5    100   0.50    0.42          0.05    0.07
             400   0.89    0.83          0.06    0.06
            1000   1.00    0.99          0.05    0.08
      1.0    100   0.95    0.86          0.05    0.09
             400   1.00    1.00          0.06    0.07
            1000   1.00    1.00          0.08    0.09
40    0.5    100   0.48    0.47          0.11    0.14
             400   0.89    0.88          0.06    0.07
            1000   1.00    1.00          0.06    0.08
      1.0    100   0.95    0.92          0.11    0.12
             400   1.00    1.00          0.06    0.07
            1000   1.00    1.00          0.06    0.07
------------------------------------------------------------
Power and Type I error rate by test length (K), effect size (δ) and sample size (N) under the Rasch model: items with LOC
------------------------------------------------------------
                       Power            Type I error rate
                  (items with LOC)       (items with LOC)
 K     δ      N     10%     20%           10%     20%
------------------------------------------------------------
10    0.5    100   0.22    0.23          0.06    0.06
             400   0.60    0.47          0.06    0.08
            1000   0.96    0.85          0.06    0.08
      1.0    100   0.71    0.52          0.10    0.12
             400   1.00    0.90          0.08    0.10
            1000   1.00    0.97          0.05    0.07
20    0.5    100   0.54    0.35          0.06    0.07
             400   0.89    0.75          0.08    0.09
            1000   1.00    0.97          0.05    0.06
      1.0    100   0.86    0.83          0.09    0.09
             400   1.00    0.99          0.09    0.10
            1000   1.00    1.00          0.06    0.08
40    0.5    100   0.34    0.36          0.05    0.05
             400   0.48    0.44          0.06    0.05
            1000   0.98    0.86          0.05    0.06
      1.0    100   0.49    0.52          0.06    0.07
             400   1.00    1.00          0.06    0.06
            1000   1.00    1.00          0.05    0.06
------------------------------------------------------------
Power and Type I error rate by test length (K), effect size (δ) and sample size (N) under the Rasch model: items with ICC
------------------------------------------------------------
                       Power            Type I error rate
                  (items with ICC)       (items with ICC)
 K     δ      N     10%     20%           10%     20%
------------------------------------------------------------
10    0.5    100   0.55    0.40          0.05    0.06
             400   0.53    0.48          0.06    0.07
            1000   0.77    0.61          0.05    0.05
      1.0    100   0.97    0.77          0.05    0.06
             400   0.97    0.78          0.05    0.05
            1000   1.00    0.98          0.05    0.06
20    0.5    100   0.65    0.61          0.05    0.05
             400   0.88    0.71          0.05    0.05
            1000   0.81    0.68          0.05    0.06
      1.0    100   0.97    0.90          0.06    0.06
             400   0.95    0.87          0.05    0.05
            1000   1.00    0.98          0.05    0.06
40    0.5    100   0.47    0.40          0.05    0.06
             400   0.98    0.92          0.06    0.07
            1000   1.00    0.96          0.05    0.05
      1.0    100   0.53    0.49          0.06    0.06
             400   0.99    0.96          0.05    0.06
            1000   1.00    1.00          0.05    0.05
------------------------------------------------------------
An Empirical Example
Lagrange Multiplier Tests (DIF) for Rasch Model
--------------------------------------------------------------
                            Focal-Group   Reference    Abs.
     Item      LM   df  Prob   Obs   Exp    Obs   Exp   Dif.
--------------------------------------------------------------
 1   Item1    3.26   1  0.07  0.73  0.76   0.76  0.74   0.02
 2   Item2    0.63   1  0.43  0.95  0.95   0.95  0.95   0.01
 3   Item3    0.91   1  0.34  0.75  0.76   0.74  0.73   0.01
 4   Item4    1.48   1  0.22  0.78  0.80   0.79  0.77   0.01
 5   Item5    1.60   1  0.21  0.81  0.83   0.82  0.81   0.02
 6   Item6    0.00   1  0.96  0.76  0.76   0.73  0.73   0.00
 7   Item7    0.15   1  0.70  0.72  0.71   0.69  0.70   0.01
 8   Item8    0.37   1  0.54  0.91  0.90   0.88  0.88   0.01
 9   Item9    0.45   1  0.50  0.91  0.90   0.89  0.89   0.01
10   Item10   0.02   1  0.90  0.83  0.83   0.81  0.81   0.00
11   Item11   0.11   1  0.74  0.88  0.88   0.86  0.86   0.00
12   Item12   4.68   1  0.03  0.91  0.89   0.86  0.88   0.02
13   Item13   0.38   1  0.54  0.60  0.59   0.53  0.53   0.01
14   Item14   0.08   1  0.77  0.60  0.61   0.58  0.58   0.00
15   Item15   3.89   1  0.05  0.79  0.76   0.71  0.74   0.03
16   Item16   0.14   1  0.71  0.70  0.69   0.64  0.65   0.00
17   Item17   0.72   1  0.40  0.62  0.61   0.54  0.55   0.01
18   Item18   0.53   1  0.47  0.49  0.50   0.47  0.46   0.01
19   Item19   0.15   1  0.69  0.84  0.84   0.81  0.82   0.00
20   Item20   0.04   1  0.83  0.74  0.74   0.71  0.71   0.00
21   Item21   0.05   1  0.82  0.87  0.87   0.85  0.85   0.00
22   Item22   1.72   1  0.19  0.79  0.80   0.78  0.77   0.01
23   Item23   0.38   1  0.54  0.87  0.88   0.87  0.86   0.01
24   Item24   2.77   1  0.10  0.85  0.87   0.87  0.85   0.02
25   Item25   0.37   1  0.54  0.95  0.95   0.94  0.94   0.00
26   Item26   2.47   1  0.12  0.65  0.63   0.56  0.58   0.02
27   Item27   0.39   1  0.53  0.68  0.67   0.62  0.63   0.01
28   Item28   0.02   1  0.89  0.51  0.52   0.48  0.47   0.00
29   Item29   0.55   1  0.46  0.69  0.68   0.63  0.64   0.01
30   Item30   0.12   1  0.73  0.71  0.71   0.69  0.68   0.00
--------------------------------------------------------------
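The Prob column can be checked from the LM column, since each test has one degree of freedom. The sketch below relies on the identity that the upper tail of a chi-square variable with 1 df equals `erfc(sqrt(x / 2))`. Only four rows of the table are copied in, and the dictionary and function names are illustrative. Because the LM values are rounded to two decimals, the recomputed probabilities agree with the table only up to rounding.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# LM values for a few items, copied from the table above.
lm_values = {1: 3.26, 12: 4.68, 15: 3.89, 24: 2.77}

p_values = {item: chi2_sf_df1(lm) for item, lm in lm_values.items()}
for item, p in sorted(p_values.items()):
    print(f"Item{item}: LM = {lm_values[item]:.2f}, p = {p:.3f}")

# Items whose recomputed p-value falls below the nominal 5% level.
flagged = sorted(item for item, p in p_values.items() if p < 0.05)
print(flagged)  # [12, 15]
```

Item15 sits right at the boundary: its LM value of 3.89 is just above the 5% critical value of about 3.84, and its probability is reported as 0.05 after rounding.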
Conclusions
1. The fit statistics have a known asymptotic null distribution.
2. The fit statistics have sound statistical properties in terms of
   power and Type I error rates.
3. The LM (MI), LM (LI) and LM (ICC) statistics have detection rates in
   ascending order, respectively.
4. The 1PL, 2PL and 3PL models have power in ascending order,
   respectively.
5. These fit indices also provide a measure of effect size, which is of
   practical use in gauging the severity of misfit.
6. The performance of these indices deteriorates only slightly in the
   presence of a large number of misfitting items.
7. Sample size, test length and degree of misfit are factors that
   influence Type I error rates and power.
Thank you for your kind attention.
Questions?