Advances in Missing Data and Implications for Developmental Science
Todd D. Little
University of Kansas
Director, Quantitative Training Program
Director, Center for Research Methods and Data Analysis
Director, Undergraduate Social and Behavioral Sciences Methodology Minor
Member, Developmental Psychology Training Program
crmda.KU.edu
Talk presented 03-31-2011 @ Society for Research in Child Development

Conclusions
• Imputing missing data is not cheating.
• NOT imputing missing data is MOST likely to lead to errors in generalization!
• Plan for unintentional missing data.
• Plan intentionally missing data.

Types of missing data

Modern Missing Data Analysis: MI or FIML
• Multiple Imputation (MI), proposed by Rubin: first suggested in 1978 and developed more fully in 1987.
  • An approach especially well suited for use with large public-use databases.
  • Imputations are typically generated with the Expectation Maximization (EM) algorithm and/or Markov Chain Monte Carlo (MCMC) methods.
• Beginning in the 1980s, likelihood approaches developed:
  • Multiple-group SEM
  • Full Information Maximum Likelihood (FIML)
  • An approach well suited to more circumscribed models

Missing Data and Estimation: Missingness by Design
• Assess all persons, but not all variables at each time of measurement.
• Control entry into the study: estimate and control for retest effects, increase validity, decrease costs, increase power, etc.
• Randomly assign participants to their entry into a longitudinal study and/or to the occasions of assessment.
  • This is key to providing unbiased estimates of growth or change.

3-Form Protocol
Form   Common Variables   Variable Set A      Variable Set B      Variable Set C
1      Marker Variables   ~1/3 of Variables   ~1/3 of Variables   None
2      Marker Variables   ~1/3 of Variables   None                ~1/3 of Variables
3      Marker Variables   None                ~1/3 of Variables   ~1/3 of Variables

Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006)

2-Method Planned Missing Design

Controlled Enrollment (x = measured)
Group   Time 1    Time 2    Time 3    Time 4    Time 5
1       x         x         x         x         x
2       x         x         x         missing   missing
3       x         x         missing   x         missing
4       x         missing   x         x         missing
5       missing   x         x         x         missing
6       x         x         missing   missing   x
7       x         missing   x         missing   x
8       missing   x         x         missing   x
9       x         missing   missing   x         x
10      missing   x         missing   x         x
11      missing   missing   x         x         x

Optimal Growth Curve Design (x = measured)
Group   Time 1    Time 2    Time 3    Time 4    Time 5
1       x         x         x         x         x
6       x         x         missing   missing   x
7       x         missing   x         missing   x
9       x         missing   missing   x         x

Combined Elements

The Sequential Designs

Transforming to Accelerated Longitudinal
• Assumes a MAR (missing at random) process, but if you plan for it and measure cohort-related influences, their impact can be readily estimated.
• In the analysis, cohort becomes a variable that is controlled for.

Thanks for your attention! Questions?

Update
Dr. Todd Little is currently at Texas Tech University
Director, Institute for Measurement, Methodology, Analysis and Policy (IMMAP)
Director, “Stats Camp”
Professor, Educational Psychology and Leadership
Email: yhat@ttu.edu
IMMAP (immap.educ.ttu.edu)
Stats Camp (Statscamp.org)
www.Quant.KU.edu
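The "Modern Missing Data Analysis: MI or FIML" slide names multiple imputation but does not say how the separate analyses are combined. This is a minimal sketch that assumes the standard Rubin's rules for pooling one scalar parameter. The pooled estimate is the mean of the m estimates. The total variance is the average within-imputation variance plus the between-imputation variance, inflated by (1 + 1/m). The function name `pool_rubin` and the example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates: the parameter estimate from each completed data set
    variances: the squared standard error from each completed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total


# Hypothetical results from m = 5 imputations of the same regression slope
est, var = pool_rubin([0.42, 0.45, 0.39, 0.44, 0.40], [0.010, 0.011, 0.009, 0.010, 0.012])
print(f"pooled slope = {est:.3f}, pooled SE = {var ** 0.5:.3f}")
```

The between-imputation term carries the extra uncertainty caused by the missing values. A single imputation would set this term to zero, which understates the standard errors.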
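The FIML bullet becomes concrete once you see the casewise likelihood that FIML maximizes. Each case contributes the density of only the variables it actually has, so no case is dropped and nothing is filled in. This is a minimal sketch that assumes a bivariate normal model and uses `None` to mark a missing value. The function names are hypothetical. In practice, SEM software maximizes this quantity over the means and covariances rather than evaluating it at fixed values, as done here.

```python
import math


def _normal_logpdf(y, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (y - mean) ** 2 / var


def fiml_loglik(data, mu, sigma):
    """Casewise (full information) log-likelihood for a bivariate normal.

    data:  list of (y1, y2) pairs; None marks a missing value
    mu:    (mu1, mu2) means
    sigma: ((s11, s12), (s12, s22)) covariance matrix
    """
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    det = s11 * s22 - s12 ** 2
    if s11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    total = 0.0
    for y1, y2 in data:
        if y1 is not None and y2 is not None:
            # Both observed: full bivariate density via the closed-form 2x2 inverse
            d1, d2 = y1 - mu[0], y2 - mu[1]
            quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
            total += -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
        elif y1 is not None:
            # Only y1 observed: its marginal density
            total += _normal_logpdf(y1, mu[0], s11)
        elif y2 is not None:
            # Only y2 observed: its marginal density
            total += _normal_logpdf(y2, mu[1], s22)
        # A case with nothing observed contributes no information
    return total


cases = [(1.2, 0.8), (0.5, None), (None, -0.3), (2.0, 1.7)]
print(fiml_loglik(cases, mu=(1.0, 0.5), sigma=((1.0, 0.6), (0.6, 1.0))))
```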
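The 3-Form Protocol table can be expressed as two pieces: a mapping from each form to the variable sets it administers, and a balanced random assignment of participants to forms. This is a minimal sketch. The set labels, the function names, and the balanced-shuffle assignment scheme are illustrative assumptions. The final check shows why the design works: every pair of the rotating sets A, B, and C appears together on at least one form, so their covariances remain estimable under MI or FIML.

```python
import random
from itertools import combinations

# Which variable sets each form administers (from the 3-Form Protocol table)
FORMS = {
    1: ("Common", "A", "B"),  # Set C omitted
    2: ("Common", "A", "C"),  # Set B omitted
    3: ("Common", "B", "C"),  # Set A omitted
}


def assign_forms(n_participants, seed=None):
    """Randomly assign participants to forms in near-equal numbers."""
    rng = random.Random(seed)
    forms = [(i % 3) + 1 for i in range(n_participants)]  # 1, 2, 3, 1, 2, 3, ...
    rng.shuffle(forms)  # random order makes the omitted set missing completely at random
    return forms


def jointly_observed_pairs(forms=FORMS):
    """Return every pair of variable sets that appears together on some form."""
    pairs = set()
    for sets in forms.values():
        pairs.update(combinations(sorted(sets), 2))
    return pairs


assignment = assign_forms(12, seed=2011)
print({f: assignment.count(f) for f in FORMS})  # 4 participants per form
print(sorted(jointly_observed_pairs()))
```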
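The Controlled Enrollment table follows a simple rule. One group is measured at all five occasions, and each of the other ten groups skips a different pair of occasions, covering every way of choosing 2 of 5. The Optimal Growth Curve Design keeps only the patterns that include both the first and the last occasion (groups 1, 6, 7, and 9). This is a minimal sketch that generates both sets of patterns and randomly assigns participants to them. The function names and the round-robin-then-shuffle assignment are illustrative assumptions. The patterns come out in a different order than the slide's group numbers.

```python
import random
from itertools import combinations


def controlled_enrollment(n_waves=5, n_skipped=2):
    """Complete pattern plus one pattern per way of skipping n_skipped waves.

    Each pattern is a tuple of booleans: True = measured at that wave.
    """
    waves = range(1, n_waves + 1)
    patterns = [tuple(True for _ in waves)]  # the fully measured group
    for skipped in combinations(waves, n_skipped):
        patterns.append(tuple(w not in skipped for w in waves))
    return patterns


def optimal_growth(patterns):
    """Keep patterns that observe both the first and the last wave."""
    return [p for p in patterns if p[0] and p[-1]]


def assign_groups(n_participants, patterns, seed=None):
    """Randomly assign participants to patterns in near-equal numbers."""
    rng = random.Random(seed)
    groups = [i % len(patterns) for i in range(n_participants)]
    rng.shuffle(groups)
    return [patterns[g] for g in groups]


def show(pattern):
    return " ".join("x" if measured else "-" for measured in pattern)


design = controlled_enrollment()
print(len(design), "groups")  # 11 groups, as in the table
for p in optimal_growth(design):
    print(show(p))
```

Every occasion is measured in 7 of the 11 groups. That balance is what lets each wave's mean and variance be estimated from a comparable number of cases.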
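"Transforming to Accelerated Longitudinal" means re-indexing each observation by age rather than by wave while keeping cohort as a variable for the analysis. This is a minimal sketch that assumes three hypothetical cohorts entering at ages 10, 11, and 12, each measured once a year for three years. The function name and field names are illustrative.

```python
from collections import Counter


def to_age_based(entry_ages, n_waves, interval=1):
    """Re-index a cohort-sequential design by age.

    Returns one row per cohort-by-wave cell; cohort is kept as a column so it
    can be entered as a covariate (controlled for) in the growth model.
    """
    rows = []
    for cohort, start_age in enumerate(entry_ages, start=1):
        for wave in range(1, n_waves + 1):
            rows.append({"cohort": cohort, "wave": wave,
                         "age": start_age + (wave - 1) * interval})
    return rows


rows = to_age_based(entry_ages=[10, 11, 12], n_waves=3)
coverage = Counter(row["age"] for row in rows)
print(sorted(coverage.items()))  # [(10, 1), (11, 2), (12, 3), (13, 2), (14, 1)]
```

Three waves of data span ages 10 through 14. Each cohort is missing, by design, at the ages it was never observed. That is why the approach relies on the MAR assumption stated on the slide.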