The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning
W. Holmes Finch
Outline
• Introduction
• DIF detection
• Missing data
– Types
– Methods for dealing with missing data
    • Listwise deletion
    • Omitted as incorrect
    • Multiple imputation
    • Stochastic regression imputation
• Objective of the present study
• Method
• Results
• Discussion
Introduction
• Prior research has focused on the impact of
missing data on uniform DIF analyses.
• Results showed that type I error rates were
inflated, so that items were mistakenly identified
as displaying DIF, and that power for DIF detection
in the presence of missing data was low.
• This paper focused on examining the impact of
missing data on nonuniform DIF.
DIF Detection
• Uniform DIF:
– The reference group has a consistent advantage
in the likelihood of responding correctly to an item
at all trait levels, as compared with the focal group.
• Nonuniform DIF:
– The reference group has an advantage in
correctly responding to an item at some trait levels,
whereas at other levels the focal group has the
advantage in correctly responding to the item.
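The contrast between the two DIF types can be illustrated with item response curves. The sketch below uses the 3PL response function; all parameter values are hypothetical and chosen only for illustration, not taken from the study.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """3PL item response function: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF (hypothetical values): equal discrimination, but the item
# is harder for the focal group, so the reference group leads everywhere.
uniform_gap = [p_correct(t, a=1.0, b=0.0) - p_correct(t, a=1.0, b=0.5)
               for t in thetas]

# Nonuniform DIF (hypothetical values): discriminations differ, so the
# two curves cross and the sign of the gap changes with theta.
nonuniform_gap = [p_correct(t, a=1.5, b=0.0) - p_correct(t, a=0.7, b=0.0)
                  for t in thetas]

for t, u, n in zip(thetas, uniform_gap, nonuniform_gap):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  nonuniform gap={n:+.3f}")
```

A positive gap means the reference group is more likely to answer correctly at that trait level. Under uniform DIF the gap keeps one sign, while under nonuniform DIF it changes sign where the curves cross.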
Methods of Nonuniform DIF Detection
• IRT likelihood ratio test (IRTLR)
• Logistic regression (LR)
• Crossing SIBTEST (CSIB)
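The slides name the LR method without giving its form. One common formulation tests nonuniform DIF by comparing a logistic model with matching score and group against a model that adds their interaction, using a 1-df likelihood ratio test. The sketch below assumes that formulation; the simulated data, slopes, and function names are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iters=15):
    """Newton-Raphson ML fit; returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return beta, ll

# Hypothetical data: the item discriminates less in the focal group.
rng = random.Random(1)
rows, y = [], []
for i in range(1000):
    group = i % 2                        # 0 = reference, 1 = focal
    score = rng.gauss(0.0, 1.0)          # stand-in for the matching score
    slope = 1.5 if group == 0 else 0.5   # unequal slopes = nonuniform DIF
    prob = 1.0 / (1.0 + math.exp(-slope * score))
    rows.append((score, group))
    y.append(1 if rng.random() < prob else 0)

X_reduced = [[1.0, s, g] for s, g in rows]
X_full = [[1.0, s, g, s * g] for s, g in rows]
_, ll_reduced = fit_logistic(X_reduced, y)
beta_full, ll_full = fit_logistic(X_full, y)

lr_stat = 2.0 * (ll_full - ll_reduced)
flagged = lr_stat > 3.841  # chi-square critical value, 1 df, alpha = .05
print(f"LR statistic = {lr_stat:.2f}, flagged for nonuniform DIF: {flagged}")
```

In practice the matching variable is usually the observed total score, and a fitted model from a statistics package would replace the hand-written Newton-Raphson routine used here to keep the example self-contained.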
Types of Missing Data
• Missing completely at random (MCAR)
– Some respondents leave an item unanswered in a
completely random fashion, with no systematic
mechanism associated with the missingness.
• Missing at random (MAR)
– The probability of an observation containing missing data
is associated directly with a measurable variable.
• Missing not at random (MNAR)
– The likelihood of being missing is associated with the value
of the variable itself.
Listwise Deletion (LD)
• If an individual fails to respond to one or more items on
the instrument, his or her data are excluded from DIF
analyses.
• It is easy to employ and is the default for many statistical
software packages.
• It reduces the effective sample size, which can in turn lead
to a notable reduction in statistical power for hypothesis
testing of DIF.
• It has been associated with biased estimates in some
situations, except when the data are MCAR.
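A minimal sketch of listwise deletion on a small response matrix, assuming missing responses are coded as `None`; the data are made up for illustration.

```python
# Rows are examinees, columns are items; None marks a missing response.
responses = [
    [1, 0, 1, 1],
    [1, None, 0, 1],
    [0, 0, 1, None],
    [1, 1, 1, 0],
]

def listwise_delete(data):
    """Keep only examinees who answered every item."""
    return [row for row in data if all(v is not None for v in row)]

complete = listwise_delete(responses)
print(len(responses), "examinees before,", len(complete), "after")
```

Half of this toy sample is lost even though only two responses are missing, which is the sample-size cost described above.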
Omitted as incorrect
• Zero imputation (ZI)
• Missing responses are assigned an incorrect
value, or a zero in the case of dichotomously
scored items.
• This approach would lead to biased parameter
estimation and hypothesis test results.
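Zero imputation can be sketched in a few lines, again assuming missing responses are coded as `None` and items are dichotomously scored; the data are made up for illustration.

```python
def zero_impute(data):
    """Score every missing response (None) as incorrect (0)."""
    return [[0 if v is None else v for v in row] for row in data]

responses = [
    [1, None, 0, 1],
    [0, 0, 1, None],
]
scored = zero_impute(responses)
print(scored)
```

No examinee is dropped, but every omitted item lowers that examinee's score, which is one way the approach can distort parameter estimates.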
Multiple imputation (MI)
• MI can incorporate information from all variables
in a data set to derive imputed values for those
that are missing.
• The MI algorithm assumes a multivariate normal
probability distribution among the variables and
that the data are MAR or MCAR.
• MI yields accurate parameter estimation and
statistical power rates comparable with those
obtained with complete data.
Stochastic regression imputation (SRI)
• SRI involves a two-step process in which the distribution of relative
frequencies for each response category for each member of the
sample is first obtained from the observed data.
• For each member of the sample, missing values are then replaced
by random draws from the multinomial distribution with
parameters equal to the distribution of relative frequencies of the
categories.
• In the second step of SRI, LR is conducted for the target variable
on each of the M complete data sets, with the other variables in the
data set as the independent variables.
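The first step described above can be sketched as follows. This is only an illustration of drawing replacements from each examinee's observed category frequencies; the function names, the `None` coding, and the choice of M = 5 are assumptions, and the second (regression) step is not shown.

```python
import random

def impute_person(row, rng):
    """Fill None entries with draws from the examinee's observed
    relative frequencies of each response category.
    Assumes the examinee has at least one observed response."""
    observed = [v for v in row if v is not None]
    categories = sorted(set(observed))
    weights = [observed.count(c) / len(observed) for c in categories]
    return [rng.choices(categories, weights)[0] if v is None else v
            for v in row]

def make_imputed_sets(data, m, seed=0):
    """Build m completed copies of the data (m is a hypothetical choice)."""
    rng = random.Random(seed)
    return [[impute_person(row, rng) for row in data] for _ in range(m)]

data = [
    [1, 1, None, 1],
    [0, None, 1, 0],
    [1, 0, 1, 1],
]
imputed_sets = make_imputed_sets(data, m=5)
for completed in imputed_sets:
    print(completed)
```

For dichotomous items the draw reduces to a Bernoulli trial with success probability equal to the examinee's observed proportion correct.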
Prior research
• Sedivy et al. (2006)
– GRM model
– LR and Poly-SIBTEST (uniform DIF)
– Lowest value imputation
– Type I error rates were rarely inflated and power was
diminished for higher levels of missing data.
• Banks and Walker (2006)
– 3PL dichotomous model
– LD and ZI
– Type I error rates were inflated for ZI but not LD and power
for detecting DIF was higher for ZI than LD.
Prior research
• Robitzsch and Rupp (2009)
– MH and LR
– LD, ZI, MI, and two-way imputation
– ZI resulted in inflated type I error rates
– DIF method, sample size, and number of items had relatively
little impact on the type I error and power rates.
• Finch (2011)
– MI, LD, and ZI
– ZI was associated with type I error inflation and in some cases
low power.
– Methods of DIF detection used (SIBTEST, MH, or LR) were not
affected differentially by the presence of missing data.
Method
• 3PL model
• 20 and 40 items
• 1 DIF item
• Sample size: 250/250, 500/500, 1000/1000
• Impact: (0,0), (0,-0.5), (0,0.5)
• Percentage of missing data: 0%, 10%, 20%, 30%
• Magnitude of DIF: 0, 0.4, 0.8, and 1
Type of missing data
• MCAR: responses from across both groups on the target item were
randomly selected to be missing.
• MAR1: only members of the focal group were randomly selected to
have missing data on the target item (missing data mechanism was
associated with group membership).
• MAR2: examinees with total scores at or below the 30th percentile
were selected to have missing data (individuals with relatively lower
trait levels tend to leave the target item blank).
• MNAR: missing data were taken only from those who had an
incorrect response to the target item (examinees who did not know
the correct answer to an item left it blank).
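The four mechanisms above differ only in which examinees are eligible to have the target item set to missing. The sketch below encodes that idea; how the missing percentage was applied within each eligible pool is not stated in the slides, so taking it as a share of the whole sample (capped at the pool size) is an assumption, as are the function name and the toy data.

```python
import random

def apply_missingness(target, groups, totals, mechanism, rate, rng):
    """Return target-item responses with None marking missing values.
    groups: 0 = reference, 1 = focal; totals: total test scores."""
    n = len(target)
    if mechanism == "MCAR":      # anyone can be missing
        eligible = list(range(n))
    elif mechanism == "MAR1":    # focal group members only
        eligible = [i for i in range(n) if groups[i] == 1]
    elif mechanism == "MAR2":    # total score at or below 30th percentile
        cutoff = sorted(totals)[max(0, (30 * n) // 100 - 1)]
        eligible = [i for i in range(n) if totals[i] <= cutoff]
    elif mechanism == "MNAR":    # incorrect responders only
        eligible = [i for i in range(n) if target[i] == 0]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Assumption: rate is a share of the full sample, capped by pool size.
    k = min(round(rate * n), len(eligible))
    drop = set(rng.sample(eligible, k))
    return [None if i in drop else v for i, v in enumerate(target)]

# Toy data for ten examinees (hypothetical values).
target = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
totals = [12, 5, 15, 7, 18, 4, 11, 6, 14, 9]

rng = random.Random(0)
results = {m: apply_missingness(target, groups, totals, m, 0.2, rng)
           for m in ("MCAR", "MAR1", "MAR2", "MNAR")}
for mechanism, values in results.items():
    print(mechanism, values)
```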
Results
• Power was higher for greater levels of DIF for all conditions.
• When impact = 0/+5, power was somewhat lower than for the
other two impact conditions.
• Under most of the conditions simulated here, power for
impact = 0/0 was higher than for impact = 0/-5.
• Power for the LD method was slightly lower than that of the
complete data condition, except when the data were MNAR.
• Power for LD was slightly lower than for the complete data,
except for MAR2.
• For ZI, power rates were relatively low in the MAR1 and MCAR
conditions.
• Power for MI was typically comparable with or higher than for
LD, with the exception of MAR1 data and the lowest DIF condition.
• Higher power for SRI might have resulted from inflated type I
error.
Discussion
• Prior research on uniform DIF and missing data
– No single approach could be identified as optimal for all
conditions.
– ZI can always be viewed as the least optimal missing data
approach for uniform DIF detection.
• The current study on nonuniform DIF and missing data
– ZI did not always result in type I error inflation for nonuniform
DIF detection when data were MCAR and MNAR.
– LD produced results very similar to those obtained with the
complete data.
– Overall, MI appears to be much preferable to SRI. The type I
error inflation for SRI was much more severe than that for MI.