DIF Analysis

Galina Larina
28-31 of March, 2012
University of Ostrava
DIF analysis
Definitions
• Item impact
– “significant group difference on an item, e.g., when one group has a higher proportion of examinees answering an item correctly than another group”
– Due to the true group differences in proficiency or due to item bias
• Differential Item Functioning (DIF)
– “It occurs when test-takers having identical levels on the latent trait
that the test was designed to measure but belonging to different
groups, have different probabilities of endorsing (or answering
correctly) a particular item”
– Examinees in different groups are matched on proficiency
If an item is found to fit poorly in the whole data set or within any group of test-takers, it should be removed from subsequent DIF analysis
DIF analysis
Ineffectiveness of fit statistics
                Winsteps             Conquest
             Infit   Outfit       Infit   Outfit
Mean          1.00    1.00         1.00    1.00
Maximum       1.06    1.13         1.06    1.10
Minimum       0.94    0.91         0.93    0.91
Item 25       1.03    1.00         1.03    1.01
Infit and outfit mean square errors for a simulated 50-item test in which item 25 has DIF
DIF analysis
Types of DIF
Uniform DIF
Non-uniform DIF
Non-uniform mixed DIF
DIF analysis
Statistical methods for evaluating DIF
• CTT methods
– Conditional p-value difference
– Delta plot
– Standardization
• Chi-square methods
– Mantel-Haenszel
– etc.
• IRT methods
DIF analysis
Mantel-Haenszel method
Focal group
Base group
DIF analysis
Mantel-Haenszel method
Mantel-Haenszel odds ratio: the average factor by which the odds that a base group member gets the item correct exceed the corresponding odds for comparable focal group members
For statistically significant DIF on an item, Prob. < 0.05
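The odds ratio and its significance test can be sketched in a few lines of Python. This is a minimal illustration, not the Winsteps implementation: it assumes the data have already been split into matched score levels, each summarised as a 2x2 table (base correct, base incorrect, focal correct, focal incorrect). The function name, the cell layout, the 0.5 continuity correction and the conversion to the ETS delta scale (MH D-DIF = -2.35 ln alpha) are assumptions introduced for this example.

```python
import math

def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics for one item.

    strata: list of (a, b, c, d) counts, one tuple per matched score level:
        a = base group correct,  b = base group incorrect,
        c = focal group correct, d = focal group incorrect.
    Returns (alpha, mh_d_dif, chi_square, p_value).
    """
    num = den = 0.0                    # numerator/denominator of the common odds ratio
    a_sum = e_sum = v_sum = 0.0        # observed, expected and variance of base-correct counts
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:                      # a level with fewer than 2 people carries no information
            continue
        num += a * d / n
        den += b * c / n
        n_base, n_focal = a + b, c + d
        m_right, m_wrong = a + c, b + d
        a_sum += a
        e_sum += n_base * m_right / n
        v_sum += n_base * n_focal * m_right * m_wrong / (n * n * (n - 1))
    alpha = num / den                  # common odds ratio (base vs. focal)
    mh_d_dif = -2.35 * math.log(alpha) # ETS delta scale; negative favours the base group
    # Chi-square with 1 d.f. and a 0.5 continuity correction
    chi_square = max(0.0, abs(a_sum - e_sum) - 0.5) ** 2 / v_sum
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # upper tail of chi-square, 1 d.f.
    return alpha, mh_d_dif, chi_square, p_value

# Hypothetical counts for three matched score levels
example = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha, d_dif, chi2, p = mantel_haenszel(example)
print(f"alpha={alpha:.2f}  MH D-DIF={d_dif:.2f}  chi-square={chi2:.2f}  p={p:.4f}")
```

An alpha above 1 means the item favours the base group among examinees of comparable proficiency; an alpha of 1 means no DIF.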
DIF analysis
Mantel-Haenszel method
• The MH procedure is an extension of the chi-square test of independence
• Advantages:
– Easy to compute
– Modest sample size requirements
– Effect size
• ETS DIF classification rules
– ‘Large DIF’ (Category C): absolute value of MH D-DIF greater than or equal to 1.5, and chi-square test significant at the 0.05 level
– ‘Moderate DIF’ (Category B): absolute value of MH D-DIF at least 1.0 but less than 1.5, and chi-square test significant at the 0.05 level
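The classification rules can be written as a small decision function. The sketch below follows the simplified rules as stated on this slide. The function name is hypothetical, and the label "A" for everything that is neither B nor C is an assumption, since the slide does not list a third category.

```python
def ets_category(mh_d_dif, p_value, alpha_level=0.05):
    """Classify an item from its MH D-DIF value and chi-square p-value.

    Follows the simplified slide rules:
      C (large):    |MH D-DIF| >= 1.5 and significant
      B (moderate): 1.0 <= |MH D-DIF| < 1.5 and significant
      A:            everything else (assumed label, not on the slide)
    """
    size = abs(mh_d_dif)
    significant = p_value < alpha_level
    if significant and size >= 1.5:
        return "C"
    if significant and size >= 1.0:
        return "B"
    return "A"

# Hypothetical items: (MH D-DIF, p-value)
for name, d_dif, p in [("item 3", -1.7, 0.01), ("item 8", 1.2, 0.03), ("item 12", 1.8, 0.20)]:
    print(name, ets_category(d_dif, p))
```

The sign of MH D-DIF is ignored for the category; it only indicates which group the item favours.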
DIF analysis
Rasch approaches
• Separate calibration t-test first proposed by Wright and Stone
$$t = \frac{d_{i1} - d_{i2}}{\left(s_{i1}^2 + s_{i2}^2\right)^{1/2}}$$
where $d_{i1}$ is the difficulty of item $i$ in calibration 1 (based on group 1), $d_{i2}$ is the difficulty of item $i$ in calibration 2 (based on group 2), $s_{i1}$ is the standard error of estimate for $d_{i1}$, and $s_{i2}$ is the standard error of estimate for $d_{i2}$
• Winsteps applies the above formula in DIF analysis
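As a worked example, the separate calibration t-test takes four numbers per item: the two difficulty estimates and their standard errors. The sketch below is a plain-Python illustration with made-up values; the function name is hypothetical, and it does not reproduce how Winsteps computes degrees of freedom or probabilities.

```python
import math

def separate_calibration_t(d1, se1, d2, se2):
    """t statistic for the difference between two item difficulty estimates.

    d1, se1: difficulty and standard error from the calibration on group 1
    d2, se2: difficulty and standard error from the calibration on group 2
    """
    contrast = d1 - d2                          # difference in logits
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)   # standard error of the difference
    return contrast / joint_se

# Hypothetical item: 0.80 logits (SE 0.15) in group 1, 0.20 logits (SE 0.20) in group 2
t = separate_calibration_t(0.80, 0.15, 0.20, 0.20)
print(f"t = {t:.2f}")   # contrast 0.60 over joint SE 0.25, about 2.40
```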
DIF analysis
IRT approaches
• The between fit approach is based on a single
calibration that contains at least two subpopulations
of interest.
$$UB_i = \frac{1}{J-1} \sum_{j=1}^{J} \frac{\left( \sum_{n \in j}^{N_j} x_{ni} - \sum_{n \in j}^{N_j} p_{ni} \right)^2}{\sum_{n \in j}^{N_j} w_{ni}}$$
where $J$ is the number of subpopulations, $N_j$ is the number of persons in subpopulation $j$, $x_{ni}$ is the score of person $n$ on item $i$, $p_{ni}$ is the probability of person $n$ responding correctly to item $i$ given the overall estimates of the person's ability and the item's difficulty, and $w_{ni} = p_{ni}(1 - p_{ni})$ is the model variance of $x_{ni}$
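A minimal sketch of the between-fit computation for one dichotomous item follows. It assumes the usual form of the statistic: for each subpopulation, the squared difference between the observed and the expected group score is divided by the summed model variances p(1 - p), and the J group terms are summed and divided by J - 1. The function name and the input layout (parallel lists of responses, model probabilities and group labels) are assumptions made for this example.

```python
def between_fit(x, p, groups):
    """Between-group fit statistic for a single dichotomous item.

    x:      observed 0/1 responses, one per person
    p:      model probabilities of a correct response, from the joint calibration
    groups: subpopulation label for each person
    """
    totals = {}  # label -> (observed score, expected score, summed variance)
    for x_n, p_n, g in zip(x, p, groups):
        obs, exp, var = totals.get(g, (0.0, 0.0, 0.0))
        totals[g] = (obs + x_n, exp + p_n, var + p_n * (1.0 - p_n))
    n_groups = len(totals)
    if n_groups < 2:
        raise ValueError("at least two subpopulations are required")
    # Squared standardized residual of each group, averaged over J - 1
    return sum((obs - exp) ** 2 / var for obs, exp, var in totals.values()) / (n_groups - 1)

# Hypothetical data: four persons with equal model probabilities, two groups
x = [1, 1, 0, 0]
p = [0.5, 0.5, 0.5, 0.5]
groups = ["a", "a", "b", "b"]
print(between_fit(x, p, groups))
```

In this toy data set group "a" scores above its expectation and group "b" below it, so the statistic is well above 0; when every group scores exactly as expected, it is 0.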
DIF analysis
Winsteps
DIF group label: starts in column 20 of the person label, with a width of 1
DIF analysis
Winsteps
Press Entry Number
Press OK
DIF analysis
Winsteps
Pairwise comparison
The DIF contrast should be at least 0.5 logits for DIF to be noticeable
For statistically significant DIF on an item, |t| > 2
For statistically significant DIF on an item, Prob. < 0.05
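The two screening criteria for the pairwise table, a contrast of at least 0.5 logits and a t value beyond 2 in absolute size, can be combined into one check. The sketch below uses hypothetical rows typed in by hand; it does not read Winsteps output files, and the function and field names are made up for illustration.

```python
def flag_dif(contrast, t, min_contrast=0.5, t_crit=2.0):
    """True when DIF is both noticeable (size) and statistically significant (t)."""
    return abs(contrast) >= min_contrast and abs(t) > t_crit

# Hypothetical pairwise DIF results: contrast in logits and its t statistic
rows = [
    {"item": 1, "contrast": 0.62, "t": 2.45},    # large and significant
    {"item": 2, "contrast": 0.15, "t": 2.30},    # significant but too small to matter
    {"item": 3, "contrast": -0.80, "t": -1.40},  # large but not significant
]
flagged = [row["item"] for row in rows if flag_dif(row["contrast"], row["t"])]
print(flagged)
```

Requiring both conditions keeps very large samples from flagging trivially small contrasts, and keeps small samples from flagging large but unstable ones.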
DIF analysis
Winsteps
Item 1
DIF analysis
Winsteps. Plots
Press OK
DIF analysis
Winsteps. Plots. Item 1