The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch Outline • Introduction • DIF detection • Missing data – Types – Methods for dealing with missing data • • • • • • • • Listwise deletion Omitted as incorrect Multiple imputation Stochastic regression imputation Objective of the present study Method Results Discussion Introduction • Researchers have focused on the impact of missing data on uniform DIF analyses in the presence of missing data. • Results showed that type I error rates were inflated so that items were mistakenly identified as displaying DIF and power for DIF detection in presence of missing data was low. • This paper focused on examining the impact of missing data on nonuniform DIF. DIF Detection • Uniform DIF: – The reference group have a consistent advantage in the likelihood of responding correctly to an item for all levels, as compared with the focal group. • Nonuniform DIF: – The reference group have an advantage in correctly responding to an item for some levels, whereas for other levels, the focal group has an advantage in correctly responding to the item. Methods of Nonuniform DIF Detection • IRT likelihood ratio test (IRTLR) • Logistic regression (LR) • Crossing SIBTEST (CSIB) Types of Missing Data • Missing completely at random (MCAR) – Some respondents leave an item unanswered in a completely random fashion, with no systematic mechanism associated with the missingness. • Missing at random (MAR) – The probability of an observation containing missing data is associated directly with a measurable variable. • Missing not at random (MNAR) – The likelihood of being missing is associated with the value of the variable itself. Listwise Deletion (LD) • If an individual fails to respond to any item on the instrument, his or her data would be excluded from DIF analyses. • Easy to employ and is the default for many statistical software packages. • It reduces the effective sample size, which can in turn lead to a notable reduction in statistical power for hypothesis testing of DIF. • It has been associated with biased estimates in some situations except data of MCAR. Omitted as incorrect • Zero imputation (ZI) • Missing responses are assigned an incorrect value, or a zero in the case of dichotomously scored items. • This approach would lead to biased parameter estimation and hypothesis test results. Multiple imputation (MI) • MI can incorporate information from all variables in a data set to derive imputed values for those that are missing. • The MI algorithm assumes a multivariate normal probability distribution among the variables and that the data are MAR or MCAR. • Accurate parameter estimation and statistical power rates comparable with those obtained with complete data. Stochastic regression imputation (SRI) • SRI involves a two-step process in which the distribution of relative frequencies for each response category for each member of the sample is first obtained from the observed data. • For each member of the sample, missing values are then replaced by random draws from the multinomial distribution with parameters equal to the distribution of relative frequencies of the categories. • The second step of SRI, LR is conducted for the target variable for each of the M complete data sets with the independent variables being the other variables in the data set. Prior research • Sedivy et al. (2006) – – – – GRM model LR and Poly-SIBTEST (uniform DIF) Lowest value imputation Type I error rates were rarely inflated and power was diminished for higher levels of missing. • Banks and Walker (2006) – 3PL dichotomous model – LD and ZI – Type I error rates were inflated for ZI but not LD and power for detecting DIF was higher for ZI than LD. Prior research • Robitzsch and Rupp (2009) – – – – MH and LR LD, ZI, MI, and tow-way imputation ZI resulted in inflated type I error rates DIF method, sample size, and number of items had relatively little impact on the type I error and power rates. • Finch (2011) – MI, LD, and ZI – ZI was associated with type I error inflation and in some cases low power. – Methods of DIF detection used (SIBTEST, MH, or LR) were not affected differentially by the presence of missing data. Method • • • • • • • 3PL model 20 and 40 items 1 DIF item Sample size: 250/250, 500/500, 1000/1000 Impact: (0,0), (0,-0.5), (0,0.5) Percentage of missing data: 0, 10%, 20%, 30% Magnitude of DIF: 0, 0.4, 0.8, and 1 Type of missing data • MCAR: responses from across both groups on the target item were randomly selected to be missing. • MAR1: only members of the focal group were randomly selected to have missing data on the target item (missing data mechanism was associated with group membership). • MAR2: examinees with total scores at or below the 30th percentile were selected to have missing data (individuals with relatively lower trait levels tend to leave target item blank). • MNAR: missing data were taken only from those who had an incorrect response to the target item (examinees who did not know the correct answer to an item left it blank). Results Results Results Power Impactwas = 0/0 0/-5 0/+5 higher for greater levels Power Whenof impact for DIF the all conditions =LD0/+5, method power was was slightly somewhat under most lower lower ofthan thethan conditions thatfor of the the complete other twodata simulated here impact condition, was conditions. higherexcept than when Powerthe impact for data LD was = were 0/-5. slightly MNAR. lower For thanZI,for Power power for the MIcomplete was ratestypically were data relatively except MAR2. comparable lowwith in the or higher MAR1 and than MCAR Higher for LD, conditions. with power the forexception SRI mightof resulted MAR1 data from and inflated the lowest type DIF I error. condition. LR Results Discussion • Prior research on uniform DIF and missing data – No single approach could be identified as optimal for all conditions. – ZI can always be viewed as the least optimal missing data approach for uniform DIF detection. • The current study on nonuniform DIF and missing data – ZI did not always result in type I error inflation for nonuniform DIF detection when data were MCAR and MNAR. – LD produced results very similar to those obtained with the complete data. – Overall MI appears to be much preferable to SRI. The inflation for SRI was much more severe than that of MI.