Part 2. Attrition: Bias and Loss of Power

Relevant Papers
  Graham, J.W. (2009). Missing data analysis: making it work in the real world. Annual Review of Psychology, 60, 549-576.
  Collins, L.M., Schafer, J.L., & Kam, C.M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351.
  Hedeker, D., & Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2, 64-78.
  Graham, J.W., & Collins, L.M. (2010, forthcoming). Using modern missing data methods with auxiliary variables to mitigate the effects of attrition on statistical power. Chapter 10 in Graham, J.W. (2010, forthcoming), Missing Data: Analysis and Design. New York: Springer.

Relevant Papers (continued)
  Graham, J.W., Palen, L.A., et al. (2008). Attrition: MAR & MNAR missingness, and estimation bias. Presented at the Annual Meetings of the Society for Prevention Research, San Francisco, CA. (Available upon request.)
  Also see: Graham, J.W. (2010, forthcoming). Simulations with missing data. Chapter 9 in Graham, J.W. (2010, forthcoming), Missing Data: Analysis and Design. New York: Springer.

What if the cause of missingness is MNAR?
  Problems with this statement:
  MAR and MNAR are widely misunderstood concepts.
  I argue that the cause of missingness is never purely MNAR.
  The cause of missingness is virtually never purely MAR either.

MAR vs MNAR
  "Pure" MCAR, MAR, and MNAR never occur in field research.
  Each requires untenable assumptions, e.g., that all possible correlations and partial correlations are r = 0.
  Better to think of MAR and MNAR as forming a continuum.
  And even that continuum is not the dimension of interest.

MAR vs MNAR: What IS the Dimension of Interest?
  How much estimation bias results when the cause of missingness cannot be included in the model?

Bottom Line
  All missing data situations are partly MAR and partly MNAR.
  Sometimes it matters: the bias affects statistical conclusions.
  Often it does not matter: the bias has tolerably little effect on statistical conclusions (Collins, Schafer, & Kam, Psychological Methods, 2001).
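To make these mechanisms concrete, here is a minimal Python sketch (numpy only; the variable names, effect sizes, and roughly 50% missingness rate are invented for illustration, not taken from any of the papers above) that generates one outcome under "pure" MCAR, MAR, and MNAR missingness. As argued above, real field data are never purely any one of these.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical variables (names and effect sizes are invented for this sketch):
# x is always observed, y is the outcome that may be missing, and z is an
# unmeasured cause of missingness that is correlated with y.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 * x + 0.5 * z + rng.normal(size=n)

# R = True means y is observed; each mechanism leaves about half of y missing.
r_mcar = rng.random(n) < 0.5        # missingness unrelated to anything
r_mar = x < np.median(x)            # missingness depends only on the observed x
r_mnar = z < np.median(z)           # missingness depends on the unmeasured z
                                    # (and hence, indirectly, on y itself)

for label, r in [("MCAR", r_mcar), ("MAR", r_mar), ("MNAR", r_mnar)]:
    print(f"{label}: mean of observed y = {y[r].mean():6.3f} "
          f"(full-sample mean = {y.mean():6.3f})")

The complete-case mean is distorted under both the MAR and MNAR versions; the practical difference is that under MAR the distortion can be removed by conditioning on the observed x (as MI and ML do), whereas under MNAR the cause z is not available to condition on.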
Methods: "Old" vs MAR vs MNAR
  MAR methods (MI and ML) are ALWAYS at least as good as, and usually better than, "old" methods (e.g., listwise deletion).
  Methods designed to handle MNAR missingness are NOT always better than MAR methods.

Yardstick for Measuring Bias
  Standardized bias = 100 x (average parameter estimate - population value) / SE.
  |standardized bias| < 40 is considered small enough to be tolerable (the t-value is off by no more than 0.4).

A Little Background for Collins, Schafer, & Kam (2001; CSK)
  Example model of interest: X -> Y.
  X = Program (program vs control)
  Y = Cigarette smoking
  Z = Cause of missingness: say, rebelliousness (or smoking itself)
  Factors to be considered: % missing (e.g., % attrition), rYZ, and rZR.
  rYZ: correlation between the cause of missingness (Z; e.g., rebelliousness, or smoking itself) and the variable of interest (Y; e.g., cigarette smoking).
  rZR: correlation between the cause of missingness (Z) and missingness on the variable of interest (e.g., missingness on the smoking variable).
  Missingness on smoking (often designated R or RY) is a dichotomous variable: R = 1 means the smoking variable is not missing; R = 0 means it is missing.

CSK Study Design (partial)
  Simulations manipulated:
  amount of missingness (25% vs 50%)
  rYZ (r = .40 vs r = .90)
  rZR held constant: r = .45 with 50% missing (applies to "MNAR-linear" missingness).

CSK Results (partial; MNAR missingness)
  25% missing, rYZ = .40 ... no problem
  25% missing, rYZ = .90 ... no problem
  50% missing, rYZ = .40 ... no problem
  50% missing, rYZ = .90 ... problem
  "No problem" means the bias does not interfere with inference.
  These results apply to the regression coefficient for X -> Y with "MNAR-linear" missingness (see CSK, 2001, Table 2).

But Even the CSK Results Are Too Conservative
  Not considered by CSK: rZR (in their simulation, rZR = .45).
  Even with 50% missing and rYZ = .90, bias can be acceptably small.
  Graham et al. (2008): bias is acceptably small (standardized bias < 40) as long as rZR < .24.

rZR < .24 Is Very Plausible
  Study                                         rZR (estimated)
  HealthWise (Caldwell, Smith, et al., 2004)    .106
  AAPT (Hansen & Graham, 1991)                  .093
  Botvin1                                       .044
  Botvin2                                       .078
  Botvin3                                       .104
  All of these yield standardized bias < 10.

CSK and Follow-up Simulations
  Results are very promising and suggest that even MNAR biases are often tolerably small.
  But these simulations are still too narrow.
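The following is a minimal Python sketch of the kind of simulation just described (numpy only). The sample size, number of replications, program effect, and the logistic form of the missingness mechanism are arbitrary choices for illustration, not CSK's or Graham et al.'s actual design, and the SE in the standardized-bias formula is taken here as the spread of the estimates across replications.

import numpy as np

rng = np.random.default_rng(1)
n, n_reps = 1000, 500          # arbitrary sample size and replication count
beta = 0.3                      # population value of the X -> Y coefficient

def standardized_bias(pct_missing, r_yz):
    ests = []
    for _ in range(n_reps):
        x = rng.binomial(1, 0.5, n)                 # program (1) vs control (0)
        y = beta * x + rng.normal(size=n)           # outcome (e.g., smoking)
        # Z = cause of missingness, correlated r_yz with Y (e.g., rebelliousness,
        # or a noisy copy of smoking itself); Z is NOT in the analysis model.
        y_std = (y - y.mean()) / y.std()
        z = r_yz * y_std + np.sqrt(1 - r_yz**2) * rng.normal(size=n)
        # Higher Z -> more likely to be missing; overall rate is roughly pct_missing.
        cut = np.quantile(z, 1 - pct_missing)
        p_miss = 1 / (1 + np.exp(-2 * (z - cut)))
        obs = rng.random(n) > p_miss
        ests.append(np.polyfit(x[obs], y[obs], 1)[0])   # complete-case slope of Y on X
    ests = np.array(ests)
    # Standardized bias, with the SE taken as the spread of the estimates
    # across replications (one reasonable choice, not necessarily CSK's).
    return 100 * (ests.mean() - beta) / ests.std(ddof=1)

for pct in (0.25, 0.50):
    for r in (0.40, 0.90):
        print(f"{pct:.0%} missing, rYZ = {r:.2f}: "
              f"standardized bias = {standardized_bias(pct, r):6.1f}")

The four settings in the final loop correspond to the 25%/50% missing and rYZ = .40/.90 cells above; larger rYZ and heavier missingness tend to push the complete-case coefficient further from its population value.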
Beginnings of a Taxonomy of Attrition
  Causes of attrition on Y (the main DV):
  Case 1: not Program (P), not Y, not the PY interaction
  Case 2: P only
  Case 3: Y only (the CSK scenario)
  Case 4: P and Y only
  Case 5: PY interaction only
  Case 6: P + PY interaction
  Case 7: Y + PY interaction
  Case 8: P, Y, and PY interaction
  (Graham, J.W., 2009, Annual Review of Psychology)

Taxonomy of Attrition
  Cases 1-4: often little or no problem.
  Cases 5-8: the jury is still out (more research needed); very likely not as much of a problem as previously thought; use diagnostics to shed light.

Use of Missing Data Diagnostics
  Diagnostics based on pretest data are not much help: it is hard to predict missing distal outcomes from differences on pretest scores.
  Longitudinal diagnostics can be much more helpful.
  Hedeker & Gibbons (1997): plot the main DV over time for four groups, Program and Control, each split into those with and without the last wave of data. Much can be learned.

Empirical Examples
  Hedeker & Gibbons (1997): drug treatment of psychiatric patients.
  Hansen & Graham (1991): Adolescent Alcohol Prevention Trial (AAPT); alcohol, smoking, and other drug prevention among normal adolescents (7th-11th grade).

Empirical Example Used by Hedeker & Gibbons (1997)
  IV: drug treatment vs. placebo control.
  DV: Inpatient Multidimensional Psychiatric Scale (IMPS):
  1 = normal
  2 = borderline mentally ill
  3 = mildly ill
  4 = moderately ill
  5 = markedly ill
  6 = severely ill
  7 = among the most extremely ill

  [Figure from Hedeker & Gibbons (1997): mean IMPS score (low = better outcomes) plotted over 0, 1, 3, and 6 weeks of treatment, for the Placebo Control and Drug Treatment groups.]

Longitudinal Diagnostics: Hedeker & Gibbons Example
  Treatment droppers do BETTER than stayers; Control droppers do WORSE than stayers.
  This is an example of a Program x DV interaction.
  But in this case the pattern would lead to suppression bias, which is not as bad for internal validity in the presence of a significant program effect.

AAPT (Hansen & Graham, 1991)
  IV: Normative Education program vs. Information-Only control.
  DV: cigarette smoking (3-item scale), measured at one-year intervals from 7th through 11th grade.

  [Figure: AAPT cigarette smoking (high = more smoking; arbitrary scale) plotted from 7th through 11th grade for the Program and Control groups, separately for droppers and stayers.]

Longitudinal Diagnostics: AAPT Example
  Treatment droppers do WORSE than stayers (a slightly steeper increase).
  Control droppers do WORSE than stayers (a slightly steeper increase).
  Little evidence for a Program x DV interaction.
  Very likely MAR methods allow good conclusions (the CSK scenario holds).
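Here is a minimal Python sketch of this four-group longitudinal diagnostic (pandas and matplotlib). The data frame, its column names ('program', 'dropper', 'wave', 'smoking'), and the fabricated trends are placeholders invented only to make the sketch runnable; they are not the AAPT or Hedeker & Gibbons data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Fabricated long-format data: one row per participant per wave, with a
# 'dropper' flag meaning the participant is missing at the last wave.
rng = np.random.default_rng(2)
n, waves = 400, 5
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), waves),
    "wave": np.tile(np.arange(1, waves + 1), n),
    "program": np.repeat(rng.binomial(1, 0.5, n), waves),
    "dropper": np.repeat(rng.binomial(1, 0.3, n), waves),
})
df["smoking"] = (0.3 * df["wave"] - 0.2 * df["program"] * df["wave"]
                 + 0.1 * df["dropper"] * df["wave"]
                 + rng.normal(0, 0.5, len(df)))
# Droppers contribute no data at the final wave.
df = df[~((df["dropper"] == 1) & (df["wave"] == waves))]

# Hedeker & Gibbons style diagnostic: mean DV over time for
# Program vs. Control, crossed with stayers vs. droppers.
fig, ax = plt.subplots()
for prog, prog_label in [(1, "Program"), (0, "Control")]:
    for drop, drop_label, style in [(0, "stayers", "-"), (1, "droppers", "--")]:
        sub = df[(df["program"] == prog) & (df["dropper"] == drop)]
        means = sub.groupby("wave")["smoking"].mean()
        ax.plot(means.index, means.values, style, label=f"{prog_label} {drop_label}")
ax.set_xlabel("Wave")
ax.set_ylabel("Mean smoking score")
ax.legend()
plt.show()

With real data, the question of interest is whether the dropper and stayer trajectories diverge differently in the Program and Control groups (a Program x DV interaction, as in the Hedeker & Gibbons example) or in roughly parallel fashion (as in the AAPT example).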
Use of Auxiliary Variables
  Reduces attrition bias.
  Restores some of the power lost due to attrition.

What Is an Auxiliary Variable?
  A variable correlated with the variables in your model, but not part of the model.
  Not necessarily related to missingness.
  Used to "help" with missing data estimation.
  The best auxiliary variables are the same variable as the main DV, but measured at waves not used in the analysis model.

Model of Interest
  [Path diagram: X -> Y, with a residual on Y.]

Benefit of Auxiliary Variables
  Example from Graham & Collins (2010):
  X Y Z
  1 1 1   (500 complete cases)
  1 0 1   (500 cases missing Y)
  X and Y are the variables in the model (Y sometimes missing); Z is the auxiliary variable.

Benefit of Auxiliary Variables: Effective Sample Size (N')
  An analysis involving N cases, with auxiliary variable(s), gives statistical power equivalent to N' complete cases without auxiliary variables.
  It matters how highly Y and Z (the auxiliary variable) are correlated. For example:
  rYZ = .40: N = 500 gives power of N' = 542 (an 8% increase)
  rYZ = .60: N = 500 gives power of N' = 608 (a 22% increase)
  rYZ = .80: N = 500 gives power of N' = 733 (a 47% increase)
  rYZ = .90: N = 500 gives power of N' = 839 (a 68% increase)

  [Figure: effective sample size (N', from 500 to 1000) plotted against rYZ from 0.1 to 0.9.]

Conclusions
  Attrition CAN be bad for internal validity.
  But often it is NOT nearly as bad as feared.
  Don't rush to conclusions, even with rather substantial attrition.
  Examine the evidence (especially longitudinal diagnostics) before drawing conclusions.
  Use MI and ML missing data procedures!
  Use good auxiliary variables to minimize the impact of attrition.

Part 3. Illustration of Missing Data Analysis: Multiple Imputation with NORM and Proc MI

Multiple Imputation: Basic Steps
  Impute.
  Analyze.
  Combine the results.

Imputation and Analysis
  Impute 40 datasets; a missing value gets a different imputed value in each dataset.
  Analyze each dataset with the USUAL procedures (e.g., SAS, SPSS, LISREL, EQS, STATA, HLM).
  Save the parameter estimates and SEs.

Combine the Results
  Parameter estimates to report: the average of the estimate (b-weight) over the 40 imputed datasets.
  Standard errors to report: a weighted sum of
  the "within-imputation" variance (the average squared standard error; the usual kind of variability), and
  the "between-imputation" variance (the sample variance of the parameter estimates over the 40 datasets; the variability due to missing data).

Materials for SPSS Regression
  Starting place: http://methodology.psu.edu
  downloads (you will need to get a free user ID to download all our free software)
  missing data software
  Joe Schafer's Missing Data Programs
  John Graham's Additional NORM Utilities
  http://mcgee.hhdev.psu.edu/missing/index.html (this mcgee website is currently down, but I hope to have it up again in the Fall). Please email me with any questions.

Exit for sample analysis.
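For reference alongside the sample analysis, here is a minimal Python sketch of the combine step described above (Rubin's rules) for a single regression coefficient. The five estimate/SE pairs are made-up placeholders standing in for the values saved from the per-dataset analyses.

import numpy as np

# Placeholder results: one (b-weight, SE) pair per imputed dataset,
# standing in for output saved from the usual analysis of each dataset.
estimates = np.array([0.31, 0.28, 0.35, 0.30, 0.33])
std_errors = np.array([0.10, 0.11, 0.10, 0.12, 0.10])
m = len(estimates)

pooled_est = estimates.mean()              # estimate to report
within_var = (std_errors ** 2).mean()      # average squared SE: the usual variability
between_var = estimates.var(ddof=1)        # variability due to missing data
total_var = within_var + (1 + 1 / m) * between_var
pooled_se = np.sqrt(total_var)             # standard error to report

print(f"pooled estimate = {pooled_est:.3f}, pooled SE = {pooled_se:.3f}")

The total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance, and its square root is the standard error to report; this is the "weighted sum" described above.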