Planned missing data designs for social science research: Use them! Todd D. Little University of Kansas Director, Quantitative Psychology Training Program Director, Center for Research Methods and Data Analysis Director, Undergraduate Social and Behavioral Sciences Methodology Minor Member, Developmental Psychology Training Program crmda.KU.edu Colloquium presented 10-05-2012 @ Notre Dame University Special Thanks to: Mijke Rhemtulla & Wei Wu crmda.KU.edu 1 Road Map • Learn about the different types of missing data • Learn about ways in which the missing data process can be recovered • Understand why imputing missing data is not cheating • Learn why NOT imputing missing data is more likely to lead to errors in generalization! • Learn to use intentionally missing designs!! crmda.KU.edu 2 Key Considerations • Recoverability • • • Bias • • Is it possible to recover what the sufficient statistics would have been if there was no missing data? • (sufficient statistics = means, variances, and covariances) Is it possible to recover what the parameter estimates of a model would have been if there was no missing data. Are the sufficient statistics/parameter estimates systematically different than what they would have been had there not been any missing data? Power • Do we have the same or similar rates of power (1 – Type II error rate) as we would without missing data? crmda.KU.edu 3 Types of Missing Data • Missing Completely at Random (MCAR) • No association with unobserved variables (selective process) and no association with observed variables • Missing at Random (MAR) • No association with unobserved variables, but maybe related to observed variables • Random in the statistical sense of predictable • Non-random (Selective) Missing (MNAR) • Some association with unobserved variables and maybe with observed variables crmda.KU.edu 4 Effects of imputing missing data crmda.KU.edu 5 Effects of imputing missing data No Association with ANY Observed Variable An Association with Analyzed Variables An Association with Unanalyzed Variables No Association with Unobserved /Unmeasured Variable(s) MCAR •Fully recoverable •Fully unbiased MAR • Partly to fully recoverable • Less biased to unbiased MAR • Partly to fully recoverable • Less biased to unbiased An Association with Unobserved /Unmeasured Variable(s) NMAR • Unrecoverable • Biased (same bias as not estimating) MAR/NMAR • Partly to fully recoverable • Same to unbiased MAR/NMAR • Partly to fully recoverable • Same to unbiased Statistical Power: Will always be greater when missing data is imputed! crmda.KU.edu 6 Modern Missing Data Analysis MI or FIML • In 1978, Rubin proposed Multiple Imputation (MI) • • • • An approach especially well suited for use with large public-use databases. First suggested in 1978 and developed more fully in 1987. MI primarily uses the Expectation Maximization (EM) algorithm and/or the Markov Chain Monte Carlo (MCMC) algorithm. Beginning in the 1980’s, likelihood approaches developed. • • Multiple group SEM Full Information Maximum Likelihood (FIML). • An approach well suited to more circumscribed models; but auxiliary variables must be included! crmda.KU.edu 7 Three-form design • What goes in the Common Set? Form Common Set X Variable Set A Variable Set B Variable Set C 1 ¼ of items ¼ of items ¼ of items missing 2 ¼ of items ¼ of items missing ¼ of items 3 ¼ of items missing ¼ of items ¼ of items crmda.KU.edu 8 Three-form design: Example • 21 questions made up of 7 3-question subtests Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu 9 Three-form design: Example • Common Set (X) Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu Three-form design: Example • Common Set (X) Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu 11 Three-form design: Example • Set A Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu 12 Three-form design: Example • Set B Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu 13 Three-form design: Example • Set C Subtest Item Subtest Item Demographics How old are you? Are you male or female? What is your occupation? Extraversion Musical Taste What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I start conversations. I am the life of the party. I am comfortable around people. Neuroticism I get stressed out easily. I get irritated easily. I have frequent mood swings. swings. Conscientiousness I am always prepared. I like order. I pay attention to details. Agreeableness I am interested in people. I have a soft heart. I take time out for others. Openness I have a rich vocabulary. I have excellent ideas. I have a vivid imagination. crmda.KU.edu 14 Form 1 (XAB) Form 2 (XAC) Form 3 (XBC) How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? What is your favorite genre of What is your favorite genre of What is your favorite genre of music? music? music? Do you like to listen to music Do you like to listen to music Do you like to listen to music while you work? while you work? while you work? Do you prefer music played loud or Do you prefer music played loud or Do you prefer music played loud or softly? softly? softly? I have a rich vocabulary. I have excellent ideas. I have a rich vocabulary. I have a vivid imagination. I have excellent ideas. I have a vivid imagination. I start conversations. I am the life of the party. I start conversations. I am comfortable around people. I am the life of the party. I am comfortable around people. I get stressed out easily. I get irritated easily. I get stressed out easily. I have frequent mood swings. I get irritated easily. I have frequent mood swings. I am always prepared. I like order. I am always prepared. I pay attention to details. I like order. I pay attention to details. I am interested in people. I have a soft heart. I am interested in people. I take time out for others. I have a soft heart. I take time out for others. 15 student Jazz 4 1 29 M server 5 1 27 M 6 2 21 7 2 8 4 4 -- 1 5 -- 1 2 -- 4 2 -- 3 2 -- soft 1 3 -- 2 2 -- 5 3 -- 4 1 -- 2 1 -- N soft 2 4 -- 5 5 -- 2 4 -- 5 1 -- 4 2 -- Metal N soft 1 3 -- 5 2 -- 2 1 -- 1 1 -- 4 2 -- chef Rock N soft 1 4 -- 5 1 -- 2 2 -- 5 3 -- 2 2 -- F painter Pop Y loud 4 -- 4 2 -- 1 1 -- 5 1 -- 5 5 -- 3 39 F librarian Alt N loud 1 -- 4 4 -- 3 4 -- 3 4 -- 2 4 -- 3 2 22 F server Ska N soft 4 -- 2 3 -- 3 3 -- 3 1 -- 2 5 -- 5 9 2 38 M doctor Punk N loud 1 -- 3 2 -- 2 2 -- 4 4 -- 1 3 -- 2 10 2 29 F statistician Pop N loud 4 -- 5 3 -- 4 5 -- 4 3 -- 2 3 -- 1 11 3 28 F chef Rock Y loud -- 3 3 -- 5 5 -- 5 4 -- 3 3 -- 2 5 12 3 25 M nurse Rock N soft -- 4 5 -- 2 2 -- 2 5 -- 4 5 -- 3 5 13 3 29 M lawyer Jazz Y soft -- 3 4 -- 3 2 -- 4 5 -- 4 5 -- 1 2 14 3 38 F accountant Metal N soft -- 3 1 -- 1 2 -- 3 3 -- 4 4 -- 5 4 15 3 21 F secretary Alt N loud -- 4 4 -- 1 2 -- 1 1 -- 5 3 -- 4 5 Genre Agree3 M Agree2 27 Agree1 1 Consc3 3 Consc2 N Consc1 Funk Neuro3 musician Neuro2 F Neuro1 42 Extra3 1 Extra2 2 Extra1 professor Open3 Occupation F Open2 Sex 47 Open1 Age 1 Volume Form Work Music Participant 1 Classical N loud crmda.KU.edu 16 Expansions of 3-Form Design Forms Item Set Order 1 2 3 4 5 6 7 8 9 XAB XAC XBC AXB AXC BXC ABX ACX BCX (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu 17 Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu 18 2-Method Planned Missing Design crmda.KU.edu 19 2-Method Planned Missing Design Self-Report Bias SelfSelfReport 1 Report 2 CO Cotinine Smoking crmda.KU.edu 20 2-Method Measurement • Expensive Measure 1 • • Gold standard– highly valid (unbiased) measure of the construct under investigation Problem: Measure 1 is time-consuming and/or costly to collect, so it is not feasible to collect from a large sample • Inexpenseive Measure 2 • • Practical– inexpensive and/or quick to collect on a large sample Problem: Measure 2 is systematically biased so not ideal crmda.KU.edu 21 2-Method Measurement • e.g., measuring stress • • Expensive Measure 1 = collect spit samples, measure cortisol Inexpensive Measure 2 = survey querying stressful thoughts • e.g., measuring intelligence • • Expensive Measure 1 = WAIS IQ scale Inexpensive Measure 2 = multiple choice IQ test • e.g., measuring smoking • • Expensive Measure 1 = carbon monoxide measure Inexpensive Measure 2 = self-report • e.g., Student Attention • • Expensive Measure 1 = Classroom observations Inexpensive Measure 2 = Teacher report crmda.KU.edu 22 Longitudinal 2-Method Measurement • Example • Does child’s level of classroom attention in Grade 1 • • predict math ability in Grade 3? Attention Measures • 1) Direct Classroom Assessment (2 items, N = 60) • 2) Teacher Report (2 items, N = 200) Math Ability Measure, 1 item (test score, N = 200) crmda.KU.edu 23 1 Attention (Grade 1) TR1 TR 2 DA1 DA2 Math Score (Grade 3) 1 Teacher Report 1 Teacher Report 2 Direct Assessment 1 Direct Assessment 2 Math Score (Grade 3) (N = 200) (N = 200) (N = 60) (N = 60) (N = 200) 1 1 Teacher Bias 2 bias crmda.KU.edu 24 crmda.KU.edu 25 crmda.KU.edu 26 A Wave Missing Design Group Time 1 Time 2 Time 3 Time 4 Time 5 1 x x x x x 2 x x x missing missing 3 x x missing x missing 4 x missing x x missing 5 missing x x x missing 6 x x missing missing x 7 x missing x missing x 8 missing x x missing x 9 x missing missing x x 10 missing x missing x x 11 missing missing x x x crmda.KU.edu 27 The Sequential Designs crmda.KU.edu 28 Transforming to Accelerated Longitudinal crmda.KU.edu 29 Variable Actual Assessments X1 Y1 X2 Y2 X3 Y3 X4 Y4 X5 Y5 X6 X7 X8 X9 Xi Xj •• • Xn T1 Y6 Y7 Y8 Y9 Yi Yj Yn Tmin T2 crmda.KU.edu Tmax 30 Multiple Regression LAM model Yˆi b0 b1 X i b2 Lagi b3 X i Lagi • • • • • Xi is the focal predictor of outcome Yi Lagi can vary across persons b1 describes the effect of Xi on Yi when Lagi is zero b2 describes the effect of Lagi on Yi when Xi is zero b3 describes change in the Xi → Yi relationship as a function of Lagi crmda.KU.edu 31 An Empirical Example • Data are from the Early Head Start (EHS) Research and Evaluation study (N = 1,823) • Data were collected at Time 1 when the focal children were approximately 14 months of age and again at Time 2 when the children were approximately 24 months of age • The average lag between Time 1 and Time 2 observations was • 10.3 months with values ranging from 3.0 to 17.3 months Measures: • The Home Observation for the Measurement of the Environment (HOME) assessed the quality of stimulation in the home at Time 1. • The Mental Development Index (MDI) from the Bayley Scales of Infant Development measured developmental status of children at Time 2. crmda.KU.edu 32 HOME predicting MDI MDI T 2 b0 b1 HOME T1 b2 Lag b3 ( HOMET 1 Lag ) Effect of HOMET1 on MDIT2 3 2.5 2 1.5 1 0.5 0 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 Lag (Mean Centered) crmda.KU.edu 33 Randomly Distributed Assessment X1 Y1 Y2 X2 X3 Y3 X6 X7 X8 X9 Y3 Y5 Y3 Y4 Y5 Y6 Y7 Y1 Y1 Y2 Y2 Y2 Y2 Y4 X4 X5 Y1 Y8 Y3 Y4 Y4 Y4 Y5 Y6 Y5 Y7 Y7 Y5 Y6 Y7 Y3 Y6 Y7 Y6 Y8 Y8 Y8 Y8 Y9 Y9 Y9 Y9 Y1 Y9 •• • Xn Yn B T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 Yn crmda.KU.edu Yn Yn Yn 34 Thank You! Questions? crmda.KU.edu crmda.KU.edu 35 Developmental time-lag model • Use 2-time point data with variable time-lags to measure a growth trajectory + practice effects (McArdle & Woodcock, 1997) crmda.KU.edu 36 Time Age student T1 T2 1 5;6 5;7 2 5;3 5;8 3 4;9 4;11 4 4;6 5;0 5 4;11 5;4 6 5;7 5;10 7 5;2 5;3 8 5;4 5;8 0 1 2 crmda.KU.edu 3 4 5 6 37 T0 T1 T2 T3 crmda.KU.edu T4 T5 T6 38 Yt 1I Bt G At P 1 Intercept 1 T0 1 1 T1 1 1 1 1 T2 T3 crmda.KU.edu T4 T5 T6 39 Yt 1I Bt G At P Linear growth 1 1 Intercept 1 T0 1 Growth 1 T1 1 1 1 1 0 1 T2 2 3 4 T3 5 6 T4 T5 T6 40 crmda.KU.edu 40 Yt 1I Bt G At P Constant Practice Effect 1 1 Intercept 1 T0 1 1 Growth 1 T1 1 1 1 1 0 1 T2 2 3 4 T3 crmda.KU.edu Practice 5 6 T4 0 11 1 1 T5 1 1 T6 41 Yt 1I Bt G At P Exponential Practice Decline 1 1 Intercept 1 T0 1 1 Growth 1 T1 1 1 1 1 0 1 T2 2 3 4 T3 crmda.KU.edu Practice 5 6 T4 0 1 .87 .67 .55 T5 .45 .35 T6 42 60% MAR correlation estimates with no auxiliary variables Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation. crmda.KU.edu 43 60% MAR correlation estimates with all possible auxiliary variables (r = .60) Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 8 auxiliary variables. crmda.KU.edu 44 60% MAR correlation estimates with 1 PCA auxiliary variable (r = .60) Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary variable. crmda.KU.edu 45 Auxiliary Variable Power Comparison 1 PCA Auxiliary All 8 Auxiliary Variables 1 Auxiliary crmda.KU.edu 46 Faster and more reliable convergence crmda.KU.edu 47 www.quant.ku.edu crmda.KU.edu 48