2011-25

Treatment of Missing Data in Workforce Education Research

Sinan Gemici, Jay W. Rojewski, In Heok Lee
University of Georgia

Abstract

Most quantitative analyses in workforce education are affected by missing data. Traditional approaches to remedy missing data problems often result in reduced statistical power and biased parameter estimates due to systematic differences between missing and observed values. This article examines the treatment of missing data in pertinent quantitative analyses published in recent issues of Career and Technical Education Research. Next, essential missing data patterns and mechanisms are reviewed, and alternative methods of handling missing data are discussed. The article concludes with a comparison of missing data methods using a small sample from the National Longitudinal Survey of Youth 1997 to illustrate the detrimental effects of traditional approaches to handling missing data, and to demonstrate the benefits of multiple imputation (MI) as an efficient modern missing data technique.

Introduction

Most quantitative research studies in education and the social sciences contain missing data (Allison, 2002). To address this issue, Wilkinson and the Task Force on Statistical Inference (1999) urged researchers to portray any complications encountered in the course of completing their investigations, including the occurrence of missing data. The report strongly highlighted the importance of using appropriate methodology to ensure that results are not biased by anomalies in the data. Authors were encouraged to describe patterns of missing data, as well as steps taken to address the problem during the analysis stage. It was concluded that such steps should be a standard component of all analyses, for to do otherwise was to risk “publishing nonsense” (p. 597). Guidelines found in the latest edition of the American Psychological Association Publication Manual (2010) echoed these sentiments.
Given that many applied research studies in the social sciences continue to show deficiencies in the treatment of missing data, Schlomer, Bauman, and Card (2010) recently called for greater attention to the issue:

“Researchers know that they must report response rates for survey data and include effect sizes for statistically significant results. The same expectation for accurate reporting should apply to missing data management. Ignoring this step is poor science, and results reported without attention to missing data can misinform our scientific understanding and misguide policy and practice.” (p. 8)

Purpose

The problem of how to deal with missing data has received little attention in career-technical education (CTE) research. Given the importance of using best practice in addressing missing data issues, the purpose of our study is threefold. First, we determine how CTE researchers have managed missing data in their own analyses by examining several recent issues of Career and Technical Education Research (CTER). Our examination criteria are based on two items considered essential in best practices related to the treatment of missing data, i.e., reporting the extent and nature of the missing data, and describing the procedures used to manage missing data, including the rationale for using the method selected (Schlomer et al., 2010). Second, we outline the types and mechanisms under which missing data occur, and review several alternative methods of handling missing data. Third, we compare these methods using a small data sample to illustrate the detrimental effects that unprincipled missing data techniques may have on the accuracy of parameter estimates. Our overall objective is to offer guidelines that may help CTE researchers to adequately address missing data problems in their own work.
Treatment of Missing Data in Recent CTER Articles

We selected four volumes of the CTER journal covering the years 2006 through 2009 (Volumes 31-34) to determine how researchers have managed missing data in quantitative investigations. We emphasize that the objective here was neither to engage in a full meta-analytic review of recently published research, nor to criticize the work of individual CTE researchers. Instead, our motivation for conducting this review was to highlight the need for taking a more principled approach to dealing with missing data when conducting quantitative CTE research. A total of 12 issues were published during this 4-year period, containing 27 quantitative articles that formed the pool of eligible studies. For each article in our sample, we determined whether (a) the percentage of missing data was reported, (b) the method for handling these data was specified, and (c) a rationale for the chosen method was provided (see Table 1).

Of the 27 articles we examined, only three addressed missing data with specific techniques. One article employed a listwise deletion technique, eliminating 19 surveys with incomplete responses to items measuring the dependent variable. A second article included only “useable” surveys in its final data pool, meaning that only surveys containing complete responses to all questionnaire items were retained, eliminating the missing data problem. A third article mentioned the percentage of cases with complete data and indicated the use of a pairwise deletion method to address the problem, despite the author’s knowledge of potential bias being introduced into the analysis. No rationale was offered to support this choice. Four articles contained data tables with footnotes alerting readers to the missingness of data. However, in each case no explanation of the missing data, its possible effect on analysis, or its treatment was found in the article narrative.
An additional eight articles made no mention of missing data but presented data tables indicating missing data. Finally, one study conducted a missing data analysis to determine differences in pre-/post-group attrition but did not extend this analysis to the data that were used for the final analysis.

Table 1
Summary of How Articles Published in Career and Technical Education Research from 2006-2009 Dealt with the Issue of Missing Data

Author(s) | Iss | Missing data | Missing data addressed | Rationale for correction | Comments

Volume 31 Park & Rojewski 1 NO __ Chadd & Drage 2 Listwise deletion (n=19) NO Zirkle, Norris, Winegardner, & Frustaci 2 NA __ Alfeld, Hansen, Aragon, & Stone 3 NO __ Higgins & Kotrlik 3 YES; Discrepancies noted in NO frequency tables for n but no explanation offered. Correlation table n values vary from 102-104 without explanation (N=105). __ YES; small n missing data in demographic data table; Doesn’t appear to influence inferential analysis but no explanation. YES; “due to missing responses in the 11 items addressing perceptions” (p. 89) Authors also noted that “many respondents did not identify their school or school district” (p. 89). NO; All surveys “had complete responses to the 36 barrier items of interest…and were deemed useable” (p. 108). YES; Due to “a page missing in some of the surveys sent” (p. 136) to one of the groups. Authors noted, “A final screening procedure to check for missing data did not detect any missing values for variables” (p. 34) Low response rate from survey ~ 20%. Authors examined attrition between surveys completed in the fall but not spring administrations. They referred to this analysis as a missing data analysis finding “no significant pattern to which students did not take the second survey” (p. 147).
Volume 32 Bae, Gray, & Yeager 1 Geiman, Torres, Burris, & Kitchel 1 NO; Used post hoc random selection NA of existing database, so may not be issue YES; Missing data detected in NO frequency table but no explanation offered. Doesn’t appear to be issue for analysis __ __ Sample size confusing. Initially, n=80 divided into 2 groups of 40 but eventually reported as 2 groups of n=39 and n=31. Additional information would allow for determining possible missing values. Park & Osborne 1 YES; Two variables in the descriptive NO data table and ANOVA results (df) indicated missingness. __ Esters 2 __ Low response rate from survey ~ 20%. Small sample size, n=88. Treated ordinal results as interval data for analysis (p. 87). McClain & McClain 2 YES; Author notes in Table 1 that NO “total does not equal 88 due to missing data” (p. 55). Table 2 also contains missing values; Results of PDA reveal only n=43. YES; Not mentioned but appears NO widespread through each data table. __ Low response rate from survey = 31.3%. Geiman & Covington 2 __ Small sample size, n=41 [3/44 did not reply]. Authors claimed, “Non-response error was not considered a serious threat to the validity of the study due to the high response rate” (p. 125). Park & Covington 3 DON’T KNOW; Cohorts and NO conditions described based on n=44 but had response of n=41. Only clarified response in terms of independent variable for analysis. No missing data on this variable. YES; Missing data on 3 of 9 variables NO included in Table 2; df for selected t tests indicate no missing data __ Bennett 3 YES; Author indicates that only 602 of 1741 students responded to all items on the survey (34.6%). YES; Pairwise deletion was NO used though “it risked entering significant bias in the analyses” (p. 206) “To determine whether the missing data were randomly distributed, the differences between the ‘missing’ and the ‘non-missing’ student groups on each measure were tested” (p. 206). Archival database representing 67% student response rate. All descriptive data in percentages only, no n.

Volume 33 Burns 1 NO __ Small sample size, n=48. Indicated that 55 teachers were initially administered surveys but included 48 for analysis. Rehm 1 YES; Table 2 (descriptive) has missing data on 13 of 25 items; footnote provided; Table 3 indicates n=47. survey YES; A footnote to Table 1 indicated presence of missing data NO __ Low response rate from survey = 22.8%. Small sample size, n=41. Kitchel, Geiman, Torres, & Burris Bragg & Marvel 2 YES; Mentioned in text for a single descriptive variable YES; Large discrepancies between responses and data in Tables 2-4 NA __ NO __ Gaytan 2 NO NA __ Kim & Bragg 2 DON’T KNOW; No n provided after initial values; Can’t determine NO __ McCharen 3 DON’T KNOW NO __ 2 Low response rate from survey = 29.0%. F statistics presented without df so extent of missing data not determined. t statistics presented without df so extent of missing data not determined.

Volume 34 Grier-Reid, Skaar, & Parson 1 YES; Table 1 presents incomplete data NO for initial rather than final pool, which indicates missing data although final data pool does not. YES; Nonresponse on individual items NO reported. Table footnote indicates n=511, which is much lower than reported sample size of n=716 “due to mortality rate of the KAI” (p. 36). n=712 noted for analysis on another variable without comment. YES NO __ Reports attrition rates of 45%. __ Missing data = 28%. DON’T KNOW; Can’t determine from data given. DON’T KNOW; Can’t determine from data given.
NO __ Friedel & Rudd 1 Kotrlik & Redmann 1 Fletcher & Zirkle 2 Wolf, Foster, & Birkensholz 2 Crittenden 3 NO; df indicate no missing data. NA __ Kitchel, Cannon, & Duncan 3 YES; Missing data in descriptive data table (very small). NO __ __ NO Small sample size, n=27. Descriptive data presented in percentages only Small sample size, n=45.

Overview of Missing Data

Data can be missing for a variety of reasons. Missing data due to nonresponse can occur because of noncontact, refusal to cooperate, or specific barriers that impede an eligible respondent from participating (Groves & Couper, 1998). Survey researchers generally distinguish between unit nonresponse and item nonresponse. The former refers to the absence of any sort of data from an eligible respondent due to noncontact or outright refusal to participate, while the latter denotes a situation in which a respondent answers some items but fails to answer others (Elliott, Edwards, Angeles, Hambarsoomians, & Hays, 2005). Wave nonresponse occurs in longitudinal surveys where participants’ responses may be missing for one or more survey waves. In experimental studies, missing data may occur due to attrition, meaning that a participant decides to drop out before data collection has been completed (Given, Keilman, Collins, & Given, 1990). Finally, erroneous data entry, disclosure restrictions, and similar procedural factors can lead to incomplete data.

Missing values that emanate from these and other scenarios routinely obstruct data analysis because most statistical procedures require a complete data matrix. Incomplete data can result in reduced statistical power, difficulties in data analytic procedures using standard software packages, and biased analysis results due to the potential existence of systematic differences between missing and observed data (Barnard & Meng, 1999).
The detrimental effects caused by missing data are particularly challenging in the context of survey research due to the sizeable number of responses and respondents involved (Raaijmakers, 1999). Overall, incomplete data are a nuisance that routinely obstructs data analytic procedures in applied workforce education and other areas of scientific investigation, including psychology (Jeličić, Phelps, & Lerner), political science (Honaker & King, 2010), and public health (Stuart, Azur, Frangakis, & Leaf, 2009).

Historically, cases with missing values were either ignored or the missing observations were substituted with imprecise approximations based on simplistic replacement procedures. The statistical cost incurred by these approaches was frequently prohibitive in terms of case loss and/or analysis bias. To address this issue, Dempster, Laird, and Rubin (1977) developed the expectation maximization (EM) algorithm, whereby a likelihood function is used to draw parameter estimates from a particular distribution that is assumed to underlie the missing data. Based on Rubin’s (1976) framework of inference from incomplete data, EM was the first modern-day stochastic missing data technique (Schafer & Graham, 2002). A decade after the introduction of EM, Rubin (1987) developed the multiple imputation (MI) method, which is based on the creation of several complete datasets in which missing values are replaced with different random draws from a distribution of plausible values. By analyzing each imputed dataset separately before pooling results, MI is able to incorporate the uncertainty inherent in the missing data, thus producing more robust parameter estimates (Schafer, 1999). This brief historic overview of handling missing data illustrates a progression from simplistic approaches to more principled ones that incorporate the randomness reflected in the missing data.
This progression has been supported by the general proliferation of computing power and the widespread incorporation of advanced missing data methods in standard statistical software packages. Before reviewing different missing data methods, it is important to understand the nature of missing data patterns and mechanisms.

Missing Data Patterns and Mechanisms

Missing data can occur in random or nonrandom patterns within a data matrix. Methodologists differentiate among three types of patterns: univariate, monotone, and arbitrary. Univariate patterns occur when a specific variable contains missing values, while all other variables are fully observed. Monotone patterns occur when individuals decide to drop out from a study before its formal completion (Fielding, Fayers, & Ramsay, 2009). For instance, if an individual within a data matrix of variables Y1 to Yi had an observed value on variable Y3, the same individual would have observed values on all preceding variables Y1 and Y2. Likewise, if Y3 was the last variable for which data were collected before dropout, all further variables Y4 to Yi would exhibit missing values. Arbitrary patterns arise when missing data display no systematic, discernible structure within a given data matrix. This occurs when each case exhibits a different pattern of missing values (McKnight et al., 2007). It should be noted that not all of these missing data patterns have to be present in any given data matrix.

Missing data theory distinguishes among three nonresponse mechanisms that may underlie the structure of a data matrix (Graham, 2009). These mechanisms capture differences in the probabilistic relationship between missing and observed values. When data are missing completely at random (MCAR), the missingness of a value in variable Z is unrelated to any other data point within variable Z or any other variable in the dataset.
Nonresponse under MCAR is ignorable, since it assumes that missing values are simply a random subsample of the complete data matrix and, therefore, do not alter the original distributional relationships between variables. MCAR makes the very strong assumption of complete randomness, which is difficult to uphold in practice (Little & Rubin, 1987). Little’s (1988) MCAR test, which is available in several standard statistical software packages (e.g., SPSS, SAS), can be conducted to determine whether missing data are, in fact, missing completely at random. Where test results reject the null hypothesis of complete randomness, the application of missing data techniques that assume MCAR will yield biased parameter estimates. A less stringent mechanism is known as missing at random (MAR), whereby the missingness of a value in variable Z is unrelated to any other data point within variable Z, but is related to one or more of the other variables in the dataset. Lower-achieving students, for instance, may exhibit a lower propensity to participate in a voluntary aptitude test. Consequently, the missingness of test scores is directly related to a student’s achievement status. Nonresponse under MAR is ignorable because the probabilities of missingness do not depend on the missing data themselves (Allison, 2002). The strength of modern missing data techniques lies in their ability to produce unbiased parameter estimates under MAR. The third mechanism, missing not at random (MNAR), refers to situations in which the missingness of a value in variable Z is a function of other values in variable Z. Data that are MNAR represent nonignorable nonresponse and greatly complicate the treatment of missing data, since a model for the distribution of missingness in each variable must be specified separately (Schafer & Graham, 2002). 
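The practical difference between the three mechanisms can be illustrated with a small simulation. The following Python sketch is not part of the original analysis; the covariate `x`, the partially observed variable `z`, and all missingness probabilities are hypothetical choices made only for illustration. It compares the mean of the observed `z` values with the mean of the full data under each mechanism.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return (mean of full z, mean of observed z) under one mechanism.

    All numeric settings are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)             # fully observed covariate
        z = 0.7 * x + rng.gauss(0, 1)   # variable subject to missingness
        if mechanism == "MCAR":
            p_miss = 0.3                        # unrelated to x and z
        elif mechanism == "MAR":
            p_miss = 0.6 if x < 0 else 0.1      # depends only on observed x
        elif mechanism == "MNAR":
            p_miss = 0.6 if z < 0 else 0.1      # depends on z itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(z)
        if rng.random() >= p_miss:              # value is retained
            observed.append(z)
    return statistics.fmean(full), statistics.fmean(observed)

results = {name: simulate(name) for name in ("MCAR", "MAR", "MNAR")}
for name, (full_mean, observed_mean) in results.items():
    print(f"{name}: full = {full_mean:.3f}, observed only = {observed_mean:.3f}")
```

Under these assumptions the observed-case mean stays close to the full-data mean only in the MCAR condition. In the MAR condition the distortion can in principle be corrected because `x` is observed, whereas in the MNAR condition the information needed for the correction is itself missing.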
While methods dealing with MNAR have been developed (e.g., Demirtas, 2005), these models are highly complex and require modeling assumptions that, if incorrect, exacerbate bias when compared to the application of modern MAR-based techniques (Demirtas & Schafer, 2003). No statistical test exists to determine whether missing data are MAR or MNAR.

Traditional Missing Data Methods

Frequently encountered traditional approaches to handling missing data include complete case analysis, complete variables analysis, mean substitution, regression-based imputation, and cold-deck/hot-deck imputation. For clarity, we have divided traditional approaches into case reduction and deterministic methods.

Case Reduction Methods

Complete case and complete variables analysis are based on eliminating the missing data problem through case reduction. Complete case analysis, also referred to as listwise deletion, entails simply discarding all cases in a dataset that exhibit missing data on one or more variables. This approach is often used by researchers because it can be implemented without computational effort and may be used in conjunction with all sorts of subsequent statistical analyses (Allison, 2002). Complete variables analysis, also referred to as pairwise deletion, is a variable-by-variable approach that discards only those cases that exhibit missing values on a particular bivariate pair. Both methods are generally considered inefficient because they discard cases for which information is at least partially available.

The use of case reduction techniques may be appropriate when data are MCAR and the amount of missing values is small. Five percent has been suggested as an acceptable upper limit for case reduction (Schafer, 1997).
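As a rough illustration of how the two case reduction methods differ, the following Python sketch applies both to a hypothetical five-case dataset. The variable names and values are invented for illustration and are not NLSY97 data. Listwise deletion keeps only fully observed cases, whereas pairwise deletion keeps, for each pair of variables, every case observed on both.

```python
from itertools import combinations

# Hypothetical toy data; None marks a missing value.
rows = [
    {"poverty": 4.1, "grades": 6, "piat": 101},
    {"poverty": None, "grades": 5, "piat": 95},
    {"poverty": 3.2, "grades": None, "piat": 88},
    {"poverty": 5.0, "grades": 7, "piat": None},
    {"poverty": 2.7, "grades": 4, "piat": 90},
]

def listwise(data):
    """Keep only the cases with no missing value on any variable."""
    return [r for r in data if all(v is not None for v in r.values())]

def pairwise_n(data):
    """For each pair of variables, count the cases observed on both."""
    names = list(data[0])
    return {
        (a, b): sum(r[a] is not None and r[b] is not None for r in data)
        for a, b in combinations(names, 2)
    }

complete = listwise(rows)
pair_counts = pairwise_n(rows)
print("cases kept by listwise deletion:", len(complete))
for pair, n in pair_counts.items():
    print("cases available for", pair, ":", n)
```

In this toy example listwise deletion discards three of five cases even though each discarded case is missing only one value, while every bivariate pair retains three cases. The price of pairwise deletion is that different statistics rest on different subsets of cases.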
When applied in scenarios with higher rates of missingness, case reduction eliminates important information contained in the original data matrix, resulting in potentially dramatic case loss and biased parameter estimates (Graham, Hofer, & MacKinnon, 1996). Complex multivariate analyses based on large-scale datasets are particularly prone to detrimental effects from case reduction due to the high number of variables on which missingness can occur.

Deterministic Methods

Mean substitution, regression imputation, and cold-deck/hot-deck imputation are considered deterministic procedures because they replace missing values with a simple fixed estimate of the hypothesized true value (Schulte-Nordholt, 1998). The key advantage of deterministic approaches over case reduction methods lies in the preservation of sample size. Mean substitution simply replaces all missing data points in a given variable with that variable’s arithmetic mean value. This approach, however, is no less problematic than case reduction techniques, for replacing missing observations with the mean value reduces variability in the data and leads to biased estimates of variances and covariances even under MCAR (Little & Rubin, 1987).

Regression imputation is a slightly more refined approach that replaces missing data with the predicted values from a linear regression model using a set of auxiliary variables. This method requires at least a moderate degree of covariance between variables with missing data and all other variables within the data matrix. Similar to case reduction, regression imputation requires data to be MCAR. Although easy to implement, regression imputation produces negatively biased standard errors (Enders, 2006), inflated correlations between variables, and overestimated R² values (Schafer & Olsen, 1998). Moreover, imputed values fall directly on the regression plane, leading to a lack of residual variability in the data.
To offset this effect, a random error term can be added to the imputation model to introduce additional variance. Cold-deck and hot-deck procedures have long been used to deal with missing data in survey research. In contrast to mean substitution and regression imputation, cold-deck and hot-deck procedures do not rely on the creation of synthetic values (Chen & Shao, 2001). Cold-deck imputation is used in longitudinal surveys that consist of several data collection waves. If a certain case exhibits an observed value on a given variable in a previous wave, but a missing value on that same variable in a later wave, the previous wave’s observed value is assigned (Chaudhuri & Stenger, 1992). Whereas cold-deck imputation is based on data from different datasets on the same case, hot-deck imputation uses the actual value from a different case in the same dataset (Schulte-Nordholt, 1998). Hot deck imputation identifies a case (also referred to as a donor) in the same dataset that is similar across all variables to the case containing the missing value and replaces the missing observation with the donor’s value. Distance measures ensure that the closest-fitting donor value is identified and used for replacement (Switzer, Roth, & Switzer, 1998). Using similar donors generally avoids the computation of nonsensical replacement values. Nonetheless, resulting estimates of correlations and regression weights are often unreliable (Roth & Switzer, 1995) and parameter estimates can be biased even under MCAR (Brown, 1994). While deterministic approaches are easy to compute and implement, their detrimental effect on variance is problematic when data are used for multivariate analysis. Moreover, deterministic methods routinely underestimate parameter standard errors, thus increasing the likelihood for Type I error. Modern estimation procedures, such as expectation maximization and multiple imputation, can remedy many of these shortcomings. 
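The variance reduction caused by mean substitution can be seen in a minimal Python sketch. The scores below are simulated (mean 100, standard deviation 15, with 30% of values removed completely at random); these settings are hypothetical and chosen only for illustration.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical complete scores, then roughly 30% removed under MCAR.
complete = [rng.gauss(100, 15) for _ in range(1000)]
observed = [v if rng.random() >= 0.3 else None for v in complete]

# Mean substitution: every missing value receives the observed mean.
observed_values = [v for v in observed if v is not None]
observed_mean = statistics.fmean(observed_values)
filled = [observed_mean if v is None else v for v in observed]

sd_complete = statistics.stdev(complete)
sd_observed = statistics.stdev(observed_values)
sd_filled = statistics.stdev(filled)

print(f"SD of complete data:          {sd_complete:.2f}")
print(f"SD of observed values only:   {sd_observed:.2f}")
print(f"SD after mean substitution:   {sd_filled:.2f}")
```

Because every imputed value sits exactly at the mean, the filled-in variable keeps the observed mean but its standard deviation is necessarily smaller than that of the observed values, which in turn understates standard errors in any subsequent analysis.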
Modern Missing Data Methods

In contrast to the case reduction and deterministic approaches of traditional missing data methods, modern missing data techniques include an element of randomness. Specifically, modern methods assume a certain underlying distribution for the missing values. Plausible values are then drawn at random from that assumed distribution to re-create a complete data matrix (see Little & Rubin, 1987). Modern techniques have gained widespread popularity because they have demonstrated consistently superior efficiency and estimation properties in terms of parameter bias (Schafer, 1997). Here, we refer to Chen and Åstebro’s (2003) simplified definition of efficiency as “a procedure that provides an unbiased estimate of sample properties that is also easy to implement” (p. 315). One key advantage of these methods lies in their ability to produce unbiased parameter estimates under MAR instead of requiring the more stringent (and less realistic) MCAR assumption. Frequently used modern missing data methods include expectation maximization and multiple imputation. Options for carrying out these methods exist for standard statistical software packages, such as SPSS, Stata, SAS, or R.

Expectation Maximization

Expectation maximization (EM; Dempster et al., 1977) is a maximum-likelihood approach that arrives at missing value estimates through an iterative approximation process. Maximum-likelihood estimation “searches over different possible population values, finally selecting parameter estimates that are most likely (have the ‘maximum likelihood’) to be true, given the sample observations” (Eliason, 1993, p. v). Conceptually, EM solves a complex missing data problem by repeatedly solving simpler complete data problems. EM is a two-step process that consists of an expectation and a maximization step.
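The two-step logic can be made concrete with a minimal Python sketch for the simplest possible case: a bivariate normal model in which `x` is fully observed and `y` is partly missing. This is an illustrative simplification under stated assumptions, not the implementation found in any statistical package. The function name `em_bivariate`, the simulated data, and the missingness probabilities are all hypothetical.

```python
import random

def em_bivariate(x, y, tol=1e-10, max_iter=500):
    """EM estimates of (mean of y, cov(x, y), var(y)); y holds None where missing."""
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((a - mu_x) ** 2 for a in x) / n
    # Starting values taken from the observed y values, with zero covariance.
    y_obs = [b for b in y if b is not None]
    mu_y = sum(y_obs) / len(y_obs)
    s_yy = sum((b - mu_y) ** 2 for b in y_obs) / len(y_obs)
    s_xy = 0.0
    for _ in range(max_iter):
        # Expectation step: expected y and y squared for each missing case,
        # given x and the current parameter estimates.
        slope = s_xy / s_xx
        resid_var = s_yy - slope * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for a, b in zip(x, y):
            if b is None:
                ey = mu_y + slope * (a - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = b, b * b
            sum_y += ey
            sum_yy += eyy
            sum_xy += a * ey
        # Maximization step: re-estimate parameters from the expected sums.
        new_mu_y = sum_y / n
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        new_s_yy = sum_yy / n - new_mu_y ** 2
        change = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy), abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break
    return mu_y, s_xy, s_yy

# Hypothetical data: y is more often missing when x is low (a MAR mechanism).
rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(2000)]
y_full = [1.0 + 0.8 * a + rng.gauss(0, 0.6) for a in x]
y = [None if rng.random() < (0.7 if a < 0 else 0.1) else b for a, b in zip(x, y_full)]

mu_y_em, s_xy_em, s_yy_em = em_bivariate(x, y)
observed_y = [b for b in y if b is not None]
complete_case_mean = sum(observed_y) / len(observed_y)
print(f"complete-case mean of y: {complete_case_mean:.3f}")
print(f"EM estimate of mean of y: {mu_y_em:.3f}")
```

Note that the expectation step in this sketch carries the residual variance along with the predicted value, which is what keeps the estimated variance of `y` from shrinking the way it does under plain regression imputation.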
During the expectation step, the mean vector and covariance matrix of the available data and resulting parameter estimates are used to determine the conditional expectations of the missing data (Enders, 2006). This means that a series of separate equations is used to regress each missing variable on the remaining complete variables for a given case. Predicted scores (i.e., parameter estimates) produced from these regressions are used to replace the missing values. The maximization step consists of recalculating these predicted scores using maximum-likelihood estimates based on actual and re-estimated missing data from the expectation step (Little & Rubin, 1987). EM iterations are repeated until the log-likelihood converges to a stationary point.

The maximum-likelihood procedures in EM infer probable values for the missing data from information contained in the observed data. Simulation studies have found EM to perform very well under different missing data scenarios (see Graham & Donaldson, 1993; Ibrahim, 1990). One important disadvantage of EM is its high sensitivity to misspecifications of the imputation model. Another disadvantage lies in EM’s limited ability to account for the uncertainty inherent in the estimation of missing data. This is due to the fact that the covariance matrix used as a basis for the regression equations is itself only one of many plausible covariance matrices and, therefore, estimated with error due to the missing data (Enders, 2006).

Multiple Imputation

First notions of multiple imputation (MI) were introduced by Rubin (1978) as a reaction to the nonresponse problem in the analysis of large-scale surveys. Almost a decade later, Rubin (1987) presented a comprehensive framework for the use of MI as a highly versatile, general-purpose approach to missing data. However, it was not until the late 1990s that MI became more widely used, owing to advances in computational power (Sinharay, Stern, & Russell, 2001).
Today, it is well established that MI provides accurate estimates in conditions under which deterministic approaches yield biased results (Schafer, 1997; Schulte-Nordholt, 1998). MI is a Monte Carlo approach, a general term for computational techniques that repeat an artificially created chance process using random numbers (Mooney, 1997). MI is based on the creation of m > 1 complete datasets that are analyzed individually before pooling parameter estimates and standard errors into one unified set of results. The replacement of each missing data point with several simulated values is a key characteristic that distinguishes MI from other methods (Rubin, 1996). By replacing each missing observation with several slightly different plausible values, MI incorporates the randomness inherent in the missing data, thus mitigating the problem of variance underestimation inherent in both traditional missing data methods and the EM approach. MI is further able to yield precise missing value estimates without a large number of computation cycles. Between five and 10 imputations are generally viewed as sufficient (Schafer, 1997), although much higher numbers of imputations have been suggested with regard to preserving statistical power for testing small effect sizes (for more details see Graham, Olchowski, & Gilreath, 2007). Once several imputed data matrices have been created, they are analyzed separately before results are pooled into a final set of parameter estimates and standard errors using the four-step process outlined in Table 2 (see also Enders, 2006).

Table 2
Pooling Procedure for Multiple Parameter Estimates and Standard Errors

Step 1. Pooled parameter estimate:

$$\bar{\theta} = \frac{1}{m} \sum_{i=1}^{m} \hat{\theta}_i$$

where m is the number of imputations and $\hat{\theta}_i$ is the parameter estimate from the ith imputed dataset.

Step 2. Pooled standard error:

a. Within-imputation variance:

$$\bar{W} = \frac{1}{m} \sum_{i=1}^{m} \hat{W}_i$$

where $\hat{W}_i$ is the variance estimate from the ith imputed dataset, and m is the number of imputations.

b. Between-imputation variance:

$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{\theta}_i - \bar{\theta}\right)^2$$

c. Total imputation variance:

$$T = \bar{W} + \left(1 + \frac{1}{m}\right) B$$

d. MI standard error:

$$\mathrm{S.E.} = \sqrt{T}$$

In this section, we reviewed traditional and modern missing data methods. Generally, these methods represent a progression from unprincipled approaches, such as case reduction and deterministic methods, to principled ones, such as EM and MI. While traditional methods can be adequate for simple missing data problems with low rates of missingness under the MCAR assumption, EM and MI generally yield much better estimation results when data are MAR. Due to its flexibility, MI is particularly well-suited for addressing multivariate missing data problems under the normal model. MI has also demonstrated relative robustness to misspecification of the imputation model (Beunckens, Sotto, & Molenberghs, 2008) and deviations from multivariate normality (Graham & Schafer, 1999).

Comparison of Missing Data Methods

To demonstrate the effects of different approaches to handling missing data, we conducted a comparison of several missing data methods using a small sample from the National Longitudinal Survey of Youth 1997 (NLSY97, U.S. Bureau of Labor Statistics, 2009). The NLSY97 is a nationally representative annual survey that provides data to examine the transition process of secondary students into postsecondary education and/or the workplace. We randomly selected a sample of 100 complete cases from the 1996/97 base year cohort of 9th-graders to regress outcome scores of the Peabody Individual Achievement Test (PIAT) math assessment on socioeconomic status, academic achievement, and curriculum track.
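The pooling procedure in Table 2 can be sketched in a few lines of Python. The function name `pool` and the five coefficient estimates and standard errors below are hypothetical values used only to show the arithmetic; they are not results from this study.

```python
import math

def pool(estimates, variances):
    """Pool m estimates and their squared standard errors as in Table 2.

    Returns the pooled estimate and the MI standard error.
    """
    m = len(estimates)
    theta_bar = sum(estimates) / m                                  # Step 1
    w_bar = sum(variances) / m                                      # Step 2a
    b = sum((e - theta_bar) ** 2 for e in estimates) / (m - 1)      # Step 2b
    t = w_bar + (1 + 1 / m) * b                                     # Step 2c
    return theta_bar, math.sqrt(t)                                  # Step 2d

# Hypothetical regression coefficients and standard errors from m = 5 imputed datasets.
coefficients = [4.6, 4.9, 5.1, 4.8, 5.0]
standard_errors = [0.80, 0.82, 0.79, 0.81, 0.83]

pooled_estimate, pooled_se = pool(coefficients, [se ** 2 for se in standard_errors])
print(f"pooled estimate = {pooled_estimate:.3f}, pooled S.E. = {pooled_se:.3f}")
```

The pooled standard error is at least as large as the square root of the average within-imputation variance, because the between-imputation term adds the uncertainty that stems from not knowing the missing values.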
The PIAT is a widely used brief assessment of academic achievement, and the instrument’s mathematics assessment subtest was administered to all respondents who were in ninth grade or lower during the NLSY97 base year data collection. The choice of these predictor and outcome variables was guided by their frequent use in CTE research, along with the need to keep the analysis simple for demonstration purposes. Also, we intentionally did not select a large sample to account for the fact that many studies in applied workforce education research operate with smaller sample sizes (see examples provided in Table 1). Table 3 provides details on the variables used for comparison.

Table 3
NLSY97 Variables Used for Comparison

NLSY97 designation: CV_HH_POV_RATIO
  Description: Ratio of household income to poverty level (referred to as Poverty ratio)
  Levels: Continuous
  Role: Predictor

NLSY97 designation: YSCH-6800
  Description: Grades received in eighth grade (Grades)
  Levels: 1=Mostly below Ds; 2=Mostly Ds; 3=About half Cs and Ds; 4=Mostly Cs; 5=About half Bs and Cs; 6=Mostly Bs; 7=About half As and Bs; 8=Mostly As
  Role: Predictor

NLSY97 designation: TRANS_SCH_PGM
  Description: Curriculum track (Track)
  Levels: 0=Academic; 1=CTE
  Role: Predictor

NLSY97 designation: CV_PIAT_STANDARD_UPD
  Description: PIAT standard score (PIAT)
  Levels: Continuous
  Role: Outcome

Note. The eight-level ordinal grades variable was treated as a continuous predictor for imputation purposes.

The distributional properties of all continuous predictors were examined, since standard missing data methods, including MI, assume multivariate normality. The original household poverty ratio variable was positively skewed and leptokurtic. After a square root transformation was applied, the poverty ratio variable more closely approximated normality. No transformations were applied to any other variable. Following this transformation, the Shapiro-Wilk test gave no indication of a departure from multivariate normality for the complete dataset (W = .983, p = .230). Descriptive statistics for the complete-case sample are provided in Table 4.
Table 4
Descriptive Statistics for the Complete-case Sample (n=100)

            Poverty ratio  Grades  Track  PIAT
Min         3.46           3       0      68
Max         31.21          8       1      137
M           17.80          5.80    .61    99.65
SD          5.384          1.287   –      13.432
Skewness    -.150          -.197   –      .149
Kurtosis    -.090          -.466   –      .150

We first used the complete-case sample to regress PIAT scores on poverty ratio, grades, and track in order to establish a baseline of the hypothetically true parameter estimates. Baseline results for the complete sample are provided in Table 5.

Table 5
Baseline Results for the Complete Dataset

R2 Adj  Term       β         S.E.    t           df  CI -      CI +
.586    Intercept  64.528    5.512   11.707***   96  53.588    75.469
        Povratio   .648      .175    3.699***    96  .300      .996
        Grades     4.839     .809    5.982***    96  3.233     6.444
        Track      -7.334    2.023   -3.626***   96  -11.349   -3.319

* p<.05 ** p<.01 *** p<.001

The same analysis was repeated for different missing data mechanisms and rates of missingness using listwise deletion (LD), mean substitution (MS), and multiple imputation (MI). LD and MS were chosen due to their continued use in applied workforce education research, and MI was chosen due to its rapidly increasing popularity as a modern missing data method. Parameter estimates were compared to those of the complete-case baseline to gauge the bias introduced by each method. All analyses were conducted in the statistics program R, which is freely available on the internet. Multiple imputation was carried out using the Multivariate Imputation by Chained Equations (MICE, Van Buuren & Groothuis-Oudshoorn, 2009) package for R. Ten complete imputed datasets were created for each application of MI under different missing data mechanisms and rates of missingness.

MCAR

We imposed an MCAR mechanism by randomly deleting 40 (10%), 80 (20%), and 120 (30%) values from the sample's 100 x 4 complete data matrix. Table 6 illustrates the missing data pattern by category of missingness.
The first row of the first pattern (i.e., 10% missingness) indicates that 64 out of 100 cases in the sample are complete. The second row shows that eight cases have a missing value on the poverty ratio variable, whereas the seventh row indicates the existence of one case with missing values on both poverty ratio and PIAT. The last column gives the number of variables with missing values for the cases counted in the first column. The total number of missing values is 40, and most of them (i.e., n=13) occur in the PIAT outcome variable. The interpretation applies analogously to all other missing data patterns.

Table 6
MCAR Missing Data Patterns for Various Categories of Missingness

10% missingness
Cases    PR  G   T   PT  No. missing
64       1   1   1   1   0
8        0   1   1   1   1
8        1   0   1   1   1
6        1   1   0   1   1
10       1   1   1   0   1
1        0   0   1   1   2
1        0   1   1   0   2
2        1   0   1   0   2
Missing  10  11  6   13  40

20% missingness
Cases    PR  G   T   PT  No. missing
36       1   1   1   1   0
8        0   1   1   1   1
13       1   0   1   1   1
13       1   1   0   1   1
14       1   1   1   0   1
4        0   0   1   1   2
2        0   1   0   1   2
4        1   0   0   1   2
1        0   1   1   0   2
5        1   1   0   0   2
Missing  15  21  24  20  80

30% missingness
Cases    PR  G   T   PT  No. missing
13       1   1   1   1   0
8        0   1   1   1   1
11       1   0   1   1   1
21       1   1   0   1   1
16       1   1   1   0   1
10       0   0   1   1   2
6        0   1   0   1   2
2        1   0   0   1   2
4        0   1   1   0   2
2        1   0   1   0   2
5        1   1   0   0   2
1        0   0   1   0   3
1        0   1   0   0   3
Missing  30  26  35  29  120

Note. PR = poverty ratio; G = grades; T = track; PT = PIAT. For the four variable columns, 0 indicates missing data and 1 indicates observed data.

Listwise deletion. After 40, 80, and 120 values had been randomly deleted across all four variables in the 400-cell complete data matrix, listwise deletion retained only the 64, 36, and 13 complete cases, respectively (see Table 6). When compared with the hypothetically true parameters from the complete dataset (see Table 5), results from the listwise-deletion-based analysis yielded Type II errors for track (for 10% and 20% missingness), as well as poverty ratio and grades (for 30% missingness). This means that the null hypothesis for these variables was wrongly retained. Type II errors and biased regression coefficients were accompanied by inflated adjusted R2 values and standard errors for 20% and 30% missingness.
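The MCAR deletion step and the effect of listwise deletion on sample size can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' R code: the placeholder data matrix, the seed, and the helper names `impose_mcar`, `missing_patterns`, and `listwise_delete` are all hypothetical.

```python
import random
from collections import Counter


def impose_mcar(rows, n_missing, seed=0):
    """Return a copy of rows with n_missing randomly chosen cells set to None."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(rows)) for c in range(len(rows[0]))]
    out = [list(row) for row in rows]
    for r, c in rng.sample(cells, n_missing):   # every cell is equally likely to be deleted
        out[r][c] = None
    return out


def missing_patterns(rows):
    """Count cases per pattern, coded 1 = observed and 0 = missing as in Table 6."""
    return Counter(tuple(0 if v is None else 1 for v in row) for row in rows)


def listwise_delete(rows):
    """Keep only cases with no missing value on any variable."""
    return [row for row in rows if None not in row]


# Placeholder 100 x 4 matrix standing in for poverty ratio, grades, track, and PIAT
data = [[float(r + c) for c in range(4)] for r in range(100)]

incomplete = impose_mcar(data, n_missing=40)    # 10% of the 400 cells
patterns = missing_patterns(incomplete)
complete_cases = listwise_delete(incomplete)

print("complete cases:", len(complete_cases))
for pattern, count in sorted(patterns.items(), reverse=True):
    print(count, pattern)
```

Since a single missing cell removes the whole case, deleting 10% of the values can discard far more than 10% of the cases, which is the loss of power that listwise deletion incurs.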
Table 7 lists results for listwise deletion under MCAR.

Table 7
Regression Results for Listwise Deletion, Mean Substitution, and Multiple Imputation under MCAR

%    R2 Adj  Term       β         S.E.    t           df  CI -      CI +

Listwise deletion
10   .556    Intercept  61.611    6.965   8.846***    60  47.679    75.543
             Povratio   .710      .212    3.355**     60  .287      1.134
             Grades     4.593     1.058   4.342***    60  2.477     6.709
             Track      -4.379    2.661   -1.645      60  -9.702    .944
20   .602    Intercept  53.186    9.735   5.464***    32  33.357    73.015
             Povratio   .645      .285    2.263*      32  .064      1.225
             Grades     6.408     1.549   4.136***    32  3.252     9.564
             Track      -3.036    3.483   -.872       32  -10.129   4.058
30   .610    Intercept  87.526    16.341  5.356***    9   50.559    124.492
             Povratio   .527      .615    .857        9   -.864     1.918
             Grades     1.332     2.400   .555        9   -4.096    6.760
             Track      -17.018   6.596   -2.580*     9   -31.939   -2.097

Mean substitution
10   .477    Intercept  65.374    5.721   11.426***   96  54.017    76.731
             Povratio   .662      .188    3.518**     96  .289      1.036
             Grades     4.282     .881    4.860***    96  2.533     6.031
             Track      -5.948    2.083   -2.855**    96  -10.082   -1.813
20   .346    Intercept  80.768    6.022   13.412***   96  68.814    92.722
             Povratio   .464      .211    2.196*      96  .045      .883
             Grades     2.957     .952    3.106**     96  1.067     4.847
             Track      -8.856    2.357   -3.757***   96  -13.535   -4.177
30   .232    Intercept  70.077    8.241   8.504***    96  53.719    86.435
             Povratio   .719      .253    2.840**     96  .216      1.221
             Grades     3.172     1.079   2.939**     96  1.030     5.314
             Track      -4.956    3.000   -1.652      96  -10.911   .999

Multiple imputation
10   .518    Intercept  68.541    6.159   11.128***   96  56.270    80.812
             Povratio   .705      .197    3.581**     96  .313      1.097
             Grades     3.954     .961    4.116***    96  2.035     5.873
             Track      -8.050    2.369   -3.398**    96  -12.794   -3.306
20   .625    Intercept  65.514    6.205   10.558***   96  52.880    78.148
             Povratio   .461      .236    1.982*      96  .026      .948
             Grades     5.286     .935    5.655***    96  3.402     7.170
             Track      -7.330    2.552   -2.872**    96  -12.598   -2.062
30   .503    Intercept  61.995    9.251   6.701***    96  42.793    81.198
             Povratio   .620      .272    2.278*      96  .064      1.176
             Grades     5.156     1.302   3.960**     96  2.471     7.840
             Track      -6.484    3.393   -1.911      96  -13.520   .551

* p<.05 ** p<.01 *** p<.001

Mean substitution.
In our example, mean substitution was more robust to Type II error than listwise deletion, and a misclassification of the track variable occurred only in the highest missingness category. However, whereas standard error inflation and bias in the regression coefficients were moderate compared with the performance of listwise deletion, negative bias in R2 values was substantial (see Table 7 for results of mean substitution under MCAR).

Multiple imputation. Poverty ratio, grades, and PIAT were imputed using predictive mean matching (PMM), which is the default method for imputing continuous data in MICE (see Van Buuren & Groothuis-Oudshoorn, 2009, for detailed information on PMM). The binary track variable was imputed using logistic regression. MI performed considerably better than LD and MS with regard to regression coefficients, standard errors, and R2 values. Results were robust to Type II error up to, but not including, 30% missingness (see multiple imputation results under MCAR in Table 7).

MAR

A MAR mechanism was imposed on the dataset by creating an artificial dependency between poverty ratio and PIAT scores such that individuals with lower poverty ratios had a higher likelihood of missingness on PIAT. Accordingly, we randomly deleted 10 (10%), 20 (20%), and 30 (30%) values from the PIAT outcome variable for cases in the two lowest poverty ratio quartiles. Table 8 illustrates the data pattern by category of missingness.

Table 8
MAR Missing Data Patterns for Various Categories of Missingness

10% missingness
Cases    PR  G   T   PT  No. missing
90       1   1   1   1   0
10       1   1   1   0   1
Missing  0   0   0   10  10

20% missingness
Cases    PR  G   T   PT  No. missing
80       1   1   1   1   0
20       1   1   1   0   1
Missing  0   0   0   20  20

30% missingness
Cases    PR  G   T   PT  No. missing
70       1   1   1   1   0
30       1   1   1   0   1
Missing  0   0   0   30  30

Note. PR = poverty ratio; G = grades; T = track; PT = PIAT. For the four variable columns, 0 indicates missing data and 1 indicates observed data.

Listwise deletion.
When compared with the hypothetically true parameters from the complete dataset, results from the listwise-deletion-based analysis yielded Type II errors on poverty ratio for 20% and 30% missingness. R2 values were deflated, whereas regression coefficients and standard errors were moderately inflated. Table 9 lists results for listwise deletion under MAR.

Table 9
Regression Results for Listwise Deletion, Mean Substitution, and Multiple Imputation under MAR

%    R2 Adj  Term       β         S.E.    t           df  CI -      CI +

Listwise deletion
10   .459    Intercept  70.932    6.487   10.934***   60  58.036    83.828
             Povratio   .442      .197    2.244*      60  .050      .834
             Grades     4.472     .878    5.093***    60  2.726     6.217
             Track      -6.997    2.058   -3.400**    60  -11.089   -2.906
20   .423    Intercept  77.866    7.366   10.571***   32  63.196    92.536
             Povratio   .256      .215    1.192       32  -.172     .685
             Grades     4.126     .935    4.411***    32  2.263     5.990
             Track      -8.147    2.166   -3.761***   32  -12.461   -3.833
30   .463    Intercept  75.706    7.623   9.931***    9   60.486    90.927
             Povratio   .370      .231    1.600       9   -.092     .833
             Grades     4.085     .980    4.169***    9   2.129     6.040
             Track      -8.898    2.339   -3.804***   9   -13.569   -4.227

Mean substitution
10   .337    Intercept  85.552    4.846   14.635***   96  73.948    97.155
             Povratio   .110      .186    .594        96  -.258     .479
             Grades     3.226     .858    3.760***    96  1.523     4.929
             Track      -6.913    2.145   -3.222**    96  -11.171   -2.654
20   .281    Intercept  94.993    5.569   17.058***   96  83.939    106.046
             Povratio   -.080     .177    -.450       96  -.431     .272
             Grades     2.453     .817    3.001**     96  .831      4.075
             Track      -7.338    2.044   -3.591**    96  -11.394   -3.281
30   .278    Intercept  93.533    5.449   17.166***   96  82.717    104.349
             Povratio   -.023     .173    -.134       96  -.367     .320
             Grades     2.416     .800    3.021**     96  .829      4.003
             Track      -6.845    2.000   -3.423**    96  -10.814   -2.876

Multiple imputation
10   .560    Intercept  67.604    6.661   10.149***   96  54.095    81.113
             Povratio   .516      .204    2.528*      96  .104      .928
             Grades     4.760     .846    5.625***    96  3.075     6.445
             Track      -6.990    2.008   -3.481**    96  -10.976   -3.003
20   .561    Intercept  71.488    6.446   11.090***   96  58.454    84.522
             Povratio   .334      .209    1.992*      96  .089      .758
             Grades     4.876     .813    6.000***    96  3.261     6.491
             Track      -7.818    2.142   -3.650**    96  -12.092   -3.542
30   .584    Intercept  71.354    7.633   9.348***    96  55.422    87.287
             Povratio   .497      .197    2.519*      96  .100      .895
             Grades     4.350     .926    4.700***    96  2.482     6.219
             Track      -9.050    2.670   -3.389**    96  -14.573   -3.526

* p<.05 ** p<.01 *** p<.001

Mean substitution. In our example, mean substitution led to Type II error with only 10% missing values on the PIAT outcome variable. Negative bias in R2 values was substantial, as was bias in coefficients. Standard errors were close to the complete-case benchmark (see Table 9 for results of mean substitution under MAR).

Multiple imputation. Whereas both listwise deletion and mean substitution produced Type II error, MI estimates were robust across all three tested categories of missingness. R2 values were consistently close to the complete-case benchmark, as were regression coefficients and standard errors (see Table 9 for results of multiple imputation under MAR).

Relative Efficiency of Missing Data Methods

We used a relative efficiency approach to assess performance differences between the missing data methods examined in our example. Relative efficiency is a concept in which two estimators of a given parameter of interest are compared against each other in terms of their variance around the parameter's hypothesized true value. To illustrate, let $T$ represent the first estimator and $T'$ the second estimator of parameter $\theta$. $T$ is relatively more efficient if $\sigma^2(T) < \sigma^2(T')$ for all possible values of $\theta$ (Panik, 2005). For ease of implementation, we chose R2 as our reference statistic, although relative efficiencies could also have been calculated for each individual t-value for the regression coefficients. Given that our example was based on a single estimation run for each method, we adapted the relative efficiency approach by substituting estimator variance with the squared difference between the hypothesized true adjusted R2 value and the adjusted R2 value produced by each missing data method.
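As a numerical illustration of this adaptation, the sketch below computes the ratio of squared adjusted R2 differences for listwise deletion and mean substitution at 10% MCAR missingness, taking MI as the reference method as in the comparison that follows. The adjusted R2 inputs are the values reported in Tables 5 and 7; the function name `adapted_relative_efficiency` is a hypothetical helper, and placing the reference method in the denominator is an assumption chosen because it matches the tabled results.

```python
def adapted_relative_efficiency(r2_true, r2_method, r2_reference):
    """Ratio of squared adjusted-R2 differences: comparison method over reference method."""
    numerator = (r2_true - r2_method) ** 2          # squared miss of the comparison method
    denominator = (r2_true - r2_reference) ** 2     # squared miss of the reference method (MI)
    if denominator == 0:
        raise ZeroDivisionError("reference method reproduces the true value exactly")
    return numerator / denominator


R2_TRUE = 0.586   # complete-data adjusted R2 (Table 5)
R2_MI = 0.518     # multiple imputation, 10% MCAR (Table 7)

re_ld_vs_mi = adapted_relative_efficiency(R2_TRUE, 0.556, R2_MI)   # listwise deletion
re_ms_vs_mi = adapted_relative_efficiency(R2_TRUE, 0.477, R2_MI)   # mean substitution

print(round(re_ld_vs_mi, 3), round(re_ms_vs_mi, 3))
```

A value below 1 means the comparison method landed closer to the complete-data adjusted R2 than MI did; a value above 1 means MI landed closer.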
Missing data methods were compared pairwise, with MI being the reference method in each comparison. Our adaptation of relative efficiency resulted in

$$\text{R.E.}_{1\text{ vs. }2} = \frac{\left(\Delta R^2_{1,\,Adj}\right)^2}{\left(\Delta R^2_{2,\,Adj}\right)^2}$$

where $\Delta R^2_{1,\,Adj}$ is the difference between the hypothesized true adjusted R2 value from the complete dataset and the adjusted R2 value generated by method 1, and $\Delta R^2_{2,\,Adj}$ is the difference between the hypothesized true adjusted R2 value from the complete dataset and the adjusted R2 value generated by method 2 (i.e., multiple imputation as the reference method). Adapted relative efficiency values are provided in Tables 10 and 11 for MCAR and MAR, respectively.

Table 10
Relative Efficiency of Missing Data Methods under MCAR based on Adjusted R2

Method               % Missing  R2 Adj  (ΔR2 Adj)2  R.E. (LD vs. MI)  R.E. (MS vs. MI)
Listwise deletion    10         .556    .0009       .195
                     20         .602    .0003       .168
                     30         .610    .0006       .084
Mean substitution    10         .477    .0119                         2.569
                     20         .346    .0576                         37.870
                     30         .232    .1253                         18.191
Multiple imputation  10         .518    .0046
                     20         .625    .0015
                     30         .503    .0069

Note. Efficiency was computed relative to the hypothesized true adjusted R2 value from the complete data (.586; see Table 5). Two equally efficient estimators have a relative efficiency of 1. Given that MI was the reference group, values of R.E. < 1 indicate that the respective comparison missing data method is more efficient than MI. Likewise, values of R.E. > 1 indicate that MI is more efficient.

Table 11
Relative Efficiency of Missing Data Methods under MAR based on Adjusted R2

Method               % Missing  R2 Adj  (ΔR2 Adj)2  R.E. (LD vs. MI)  R.E. (MS vs. MI)
Listwise deletion    10         .459    .0161       23.859
                     20         .423    .0266       42.510
                     30         .463    .0151       3782.250
Mean substitution    10         .337    .0620                         91.717
                     20         .281    .0930                         148.840
                     30         .278    .0949                         23716.000
Multiple imputation  10         .560    .0007
                     20         .561    .0006
                     30         .584    .0000

Note. Efficiency was computed relative to the hypothesized true adjusted R2 value from the complete data (.586; see Table 5). Two equally efficient estimators have a relative efficiency of 1.
Given that MI was the reference group, values of R.E. < 1 indicate that the respective comparison missing data method is more efficient than MI. Likewise, values of R.E. > 1 indicate that MI is more efficient.

Discussion

The purpose of this article was threefold: (a) to determine how CTE researchers have managed missing data in several recent issues of CTER, (b) to review missing data theory and alternative methods of handling missing data, and (c) to illustrate the detrimental effects that unprincipled missing data techniques may have on the accuracy of research results. Our overarching goal was to offer guidelines that may help CTE researchers to adequately address missing data problems in their own work.

The examination of recent research in our field reveals that more attention needs to be paid to the issue of missing data. Improvements are necessary in the reporting of missing data, as well as the methods used for their treatment. These concerns are intensified by the frequent use of small sample sizes. While small sample sizes are not problematic in certain types of research designs, they do present a problem when using survey data to conduct inferential statistical analysis. Small sample sizes may not provide enough power to detect differences, and the use of case reduction methods to remedy missing data in small samples further exacerbates this issue. Finally, a number of studies reported low survey response rates in the 20-30% range. The extent of non-response bias threatens the generalizability of findings from these studies, for non-responders may be systematically different from responders (i.e., missing data are not likely to be MCAR). Traditional missing data methods, however, perform acceptably only under MCAR.

Our example using a small sample of real-life data illustrated the various detrimental effects of traditional approaches to handling missing data.
These effects include the occurrence of Type II error, biased regression coefficients, standard errors, and adjusted R2 values, as well as loss of variance (for mean substitution) and statistical power/efficiency (for listwise deletion). Clearly, the repercussions of distorted analysis results can be serious for policy and practice, as intervention effects and other outcomes of interest may be severely misestimated.

MI yielded estimates that were much closer to those of the complete-case benchmark. The method's performance was particularly robust under the MAR assumption, which is more realistic than MCAR in most education and social science datasets. While researchers must keep in mind that imputed data represent plausible as opposed to real data, and that the uncertainty inherent in the missing data will be reflected in larger standard errors, MI allowed us to draw inferences highly similar to those from the complete data. Overall, our findings are in line with prior studies that have illustrated the superior performance of MI as a prominent modern missing data technique (e.g., Graham, Hofer, & MacKinnon, 1996; Roth & Switzer, 1995; Schafer & Graham, 2002).

Conclusion

Schlomer et al. (2010) urged researchers to provide detail in their manuscripts about the extent and nature of the missing data they encountered, describe any procedures they used to manage these missing data, and include a rationale for selecting particular techniques to manage the missing data. Incorporating this type of expectation in the review process of the CTER and other journals in workforce education would contribute to reducing the potentially biasing effects that missing data can have, as detailed in our comparison of missing data techniques with a small sample from NLSY97. These potentially biasing effects can produce incorrect inferences that may, in the worst case, have implications for CTE policy decisions.
Schlomer et al.’s (2010) suggestions, if adopted, will not change behavior overnight. Our review of the treatment of missing data for four recent volumes of the CTER showed little attention to this important aspect of data analysis. Nevertheless, directing attention toward missing data now can enhance the quality of the research generated by CTE researchers, and also provide a more robust basis for making recommendations for policy and practice.

References

Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage.
American Psychological Association. (2010). Publication manual. Washington, DC: Author.
Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 7-36.
Beunckens, C., Sotto, C., & Molenberghs, G. (2008). A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics and Data Analysis, 52, 1533-1548.
Brown, R. L. (1994). Efficacy of the indirect approach for estimating structural equation models with missing data: A comparison of five methods. Structural Equation Modeling, 4, 287-316.
Chaudhuri, A., & Stenger, H. (1992). Survey sampling: Theory and methods. New York, NY: Dekker.
Chen, G., & Åstebro, T. (2003). How to deal with missing categorical data: Test of a simple Bayesian method. Organizational Research Methods, 6, 309-327.
Chen, J., & Shao, J. (2001). Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association, 96, 260-269.
Demirtas, H. (2005). Multiple imputation under Bayesianly smoothed pattern-mixture models for nonignorable drop-out. Statistics in Medicine, 24, 2345-2363.
Demirtas, H., & Schafer, J. L. (2003). On the performance of random-coefficient pattern-mixture models for nonignorable dropout. Statistics in Medicine, 21, 1-23.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1-38.
Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice. Newbury Park, CA: Sage.
Elliott, M. N., Edwards, C., Angeles, J., Habarsoomians, K., & Hays, R. D. (2005). Patterns of unit and item nonresponse in the CAHPS hospital survey. Health Services Research, 40, 2096-2119.
Enders, C. K. (2006). Analyzing structural equation models with missing data. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 313-342). Greenwich, CT: Information Age.
Fielding, S., Fayers, P. M., & Ramsay, C. R. (2009). Investigating the missing data mechanism in quality of life outcomes: A comparison of approaches. Health and Quality of Life Outcomes, 7, 1-10.
Given, B. A., Keilman, L. J., Collins, C., & Given, C. W. (1990). Strategies to minimize attrition in longitudinal studies. Nursing Research, 39, 184-187.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology. Advance online publication. doi: 10.1146/annurev.psych.58.110405.085530
Graham, J. W., & Donaldson, S. W. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119-128.
Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213.
Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1-29). Thousand Oaks, CA: Sage.
Groves, R. M., & Couper, M. P. (1998). Nonresponse in household interview surveys. New York, NY: Wiley.
Honaker, J., & King, G. (2010). What to do about missing data in time-series cross-sectional data. American Journal of Political Science, 54, 561-581.
Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association, 85, 765-769.
Jeličić, H., Phelps, E., & Lerner, R. M. (2009). Use of missing data methods in longitudinal studies: The persistence of bad practices in developmental psychology. Developmental Psychology, 45, 1195-1199.
Little, R. J. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.
Little, R. J., & Rubin, D. B. (1987). Statistical analysis with missing data. New York, NY: Wiley.
McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. New York, NY: Guilford.
Mooney, C. Z. (1997). Monte Carlo simulation. Thousand Oaks, CA: Sage.
Panik, M. J. (2005). Advanced statistics from an elementary point of view. Burlington, MA: Elsevier.
Raaijmakers, Q. A. W. (1999). Effectiveness of different missing data treatments in surveys with Likert-type data: Introducing the relative mean substitution approach. Educational and Psychological Measurement, 59, 725-748.
Roth, P. L., & Switzer, F. S. (1995). A Monte Carlo analysis of missing data techniques in HRM settings. Journal of Management, 21, 1003-1023.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.
Rubin, D. B. (1978). Multiple imputation in sample surveys – a phenomenological Bayesian approach to nonresponse. Proceedings of the Survey Research Methods Section. American Statistical Association, 20-34.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473-489.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman & Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3-15.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545-571.
Schlomer, G. L., Bauman, S., & Card, N. A. (2010). Best practices for missing data management in counseling psychology. Journal of Counseling Psychology, 57, 1-10.
Schulte Nordholt, E. (1998). Imputation: Methods, simulation experiments and practical examples. International Statistical Review, 66, 157-180.
Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317-329.
Stuart, E. A., Azur, M., Frangakis, C., & Leaf, P. (2009). Multiple imputation with large data sets: A case study of the children’s mental health initiative. American Journal of Epidemiology, 169, 1133-1139.
Switzer, F. S., Roth, P. L., & Switzer, D. M. (1998). A Monte Carlo analysis of systematic data loss in an HRM setting. Journal of Management, 24, 763-779.
Tsikriktsis, N. (2005). A review of techniques for treating missing data in OM survey research. Journal of Operations Management, 24, 53-62.
U.S. Bureau of Labor Statistics. (2009). National longitudinal survey of youth 1997 [Data file and code book]. Retrieved from https://www.nlsinfo.org/investigator/pages/welcome.jsp
Van Buuren, S., & Groothuis-Oudshoorn, K. (2009). MICE: Multivariate imputation by chained equations in R. Journal of Statistical Software. Retrieved from http://www.stefvanbuuren.nl/publications/MICE in R - Draft.pdf
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.