Practical Missing Data Analysis in SPSS (v17 onwards)
Peter T. Donnan
Professor of Epidemiology and Biostatistics

Objectives
• How to impute missing values in SPSS, specifically by multiple imputation (MI)
• How to implement analyses with multiply imputed values
• Interpretation of the output
• Practical tips

Example data
• From a trial of pedometers + advice vs. advice vs. controls in sedentary elderly women
• Follow-up at 3 and 6 months
• Main outcome measure of activity from accelerometer counts
• 210 randomised / 170 at 3 months

Example data – Pedometer trial
• Read in data ‘SPSS Study databse.sav’
• Main outcome is 3-month activity: AccelVM2
• Baseline activity: AccelVM1a
• Trial arm represented by two dummy variables:
  Grp1 = Pedometer vs. control
  Grp2 = Advice vs. control

Main analysis – Pedometer trial
• Regression on 3-month activity, adjusting for baseline activity and the two dummy variables representing the trial arm contrasts
• Note that n = 170, with 40 missing, in the complete case analysis, so there is potential for bias

Missing at Random (MAR)
• Prob(missing) is 1) independent of the unobserved data but 2) dependent on the observed data
• Essentially, the observed data are a random sample of the full data within each stratum defined by the observed variables
• MAR is a weaker assumption than missing completely at random (MCAR)
• If MAR is assumed, many methods are available to impute the missing data using the observed data
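
The MAR idea above can be illustrated with a small simulation. This is a minimal sketch in Python, not part of the SPSS workflow and not based on the trial data: the variable names, the effect size of 0.8 and the missingness probabilities (0.5 and 0.1) are all assumptions chosen for illustration. The outcome y goes missing with a probability that depends only on the observed baseline x, so the complete-case mean is biased, yet within each baseline stratum the observed values still behave like a random sample.

```python
import random
import statistics


def simulate_mar(n=20000, seed=1):
    """Simulate a baseline x, an outcome y, and MAR missingness in y.

    All numbers here are illustrative assumptions, not trial values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # baseline, always observed
        y = 0.8 * x + rng.gauss(0.0, 1.0)  # outcome, sometimes missing
        # Missingness depends only on the observed baseline, never on y itself.
        p_missing = 0.5 if x < 0 else 0.1
        observed = rng.random() >= p_missing
        rows.append((x, y, observed))
    return rows


def mean_y(rows):
    """Mean of the outcome over the given rows."""
    return statistics.fmean(y for _, y, _ in rows)


rows = simulate_mar()
full_mean = mean_y(rows)
complete_case_mean = mean_y([r for r in rows if r[2]])

# Strata defined by the observed baseline.
low = [r for r in rows if r[0] < 0]
high = [r for r in rows if r[0] >= 0]
gap_low = mean_y([r for r in low if r[2]]) - mean_y(low)
gap_high = mean_y([r for r in high if r[2]]) - mean_y(high)

print(f"Full-data mean of y:        {full_mean:.3f}")
print(f"Complete-case mean of y:    {complete_case_mean:.3f}")
print(f"Observed minus full, x < 0:  {gap_low:.3f}")
print(f"Observed minus full, x >= 0: {gap_high:.3f}")
```

The overall complete-case mean drifts away from the full-data mean because low-baseline participants are under-represented, while the within-stratum gaps stay close to zero. That within-stratum property is what lets an imputation model borrow information from the observed data.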

Comparison of completers at 3 months and drop-outs

                                        Completers         Dropped out at       Chi-squared or
                                        (n = 172)          3 months (n = 32)    t-test p-value
Age, mean (SD)                          77.1 (5.0)         78.5 (5.6)           0.137
Accelerometer VM, mean (SD)             130695 (47991)     113381 (50444)       0.065
Limb function, mean (SD)                8.69 (2.25)        7.41 (2.86)          0.028
NHS costs previous 3 months, mean (SD)  £199.59 (306.74)   £404.29 (1289.54)    0.402
Pedometer group, N (%)                  58 (85.3%)         10 (14.7%)           0.052
BCI group, N (%)                        52 (77.6%)         15 (22.4%)
Control group, N (%)                    62 (92.5%)         5 (7.5%)
Stairs difficult: Yes                   48 (76.2%)         15 (23.8%)           0.033
Stairs difficult: No                    124 (87.9%)        17 (12.1%)

Execution of MI in SPSS
• So, assuming MAR, we can use the available data to predict the missing values in SPSS:
  Analyze > Multiple Imputation > Impute Missing Data Values
• Enter ALL variables you think are associated with missingness
• Note the default number of imputations = 5
• Create a new dataset to store the results
• Note the icon indicating procedures that allow MI analysis
• The Automatic method lets SPSS choose; Custom gives more flexibility
• Can include all 2-way interactions
• Linear Regression model for prediction
• List of variables chosen
• Define each variable for imputation, as a predictor, or BOTH
N.B.
Recommend including the OUTCOME as both a predictor and a variable to be imputed

Output of MI in SPSS
• Note the main interest is in the outcome VM2, but other factors with missing values are also imputed

Step 2 - Using imputed datasets in analysis
• Note the new dataset has the IMPUTATION number as its first column
• It contains, in order, the original dataset (n = 210), IMPUTATION = 0, and concatenated below it a further 5 new datasets (each n = 210), now with imputed values, IMPUTATION = 1 to 5
• Most analyses can now be implemented if the fossil shell spiral symbol is present

Repeat main analysis – need pooled results
• Procedure exactly the same as before
• SPSS will do the pooled analysis if the icon (above) is present in the drop-down menu

Pooled Analysis in SPSS
• Results are presented for the original data and for each imputed dataset separately

Results of pooled analysis from 5 imputed datasets

Model (Pooled)     B        SE      t        Sig.    Fraction missing
Constant           15607    7808    1.999    0.047   0.173
AccelVM1a          0.852    0.051   16.630   0.000   0.124
Pedometer Group    11310    6131    1.845    0.066   0.138
Advice only        17536    6526    2.687    0.009   0.266

• Larger effect sizes in both groups
• Greater power gives more significance

Interpretation
• Compare the pooled results with the original as a form of sensitivity analysis
• If the results are similar, this suggests the original results are fairly robust
• Consider whether MAR is a reasonable assumption
• Consider whether you have included all factors (including the outcome) related to the missingness in the imputation model, as this is a crucial assumption

Summary
• SPSS now includes multiple imputation in its armoury
• Consider the assumptions of MI
• Compare results under different assumptions to assess the robustness of the results
• If the MAR assumption is o.k., then MI provides results that are less biased than complete case analysis
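
For readers who want to see what the pooling step does, the combination of results across imputed datasets is conventionally done with Rubin's rules. The sketch below is a minimal Python illustration under that assumption; it is not SPSS's own code. The function name `pool_rubin` and the five coefficient and standard error values are hypothetical and are not taken from the pedometer trial. The fraction computed here is the simple share of total variance attributable to missingness; the figure SPSS reports may use a degrees-of-freedom adjusted version.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates  -- the coefficient estimated in each imputed dataset
    std_errors -- the matching standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = statistics.fmean(estimates)                   # pooled estimate
    within = statistics.fmean(se ** 2 for se in std_errors)  # within-imputation variance
    between = statistics.variance(estimates)               # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between                 # total variance

    return {
        "estimate": pooled,
        "se": math.sqrt(total),
        "total_variance": total,
        # Simple (unadjusted) share of variance attributable to missing data.
        "fraction_missing": (1 + 1 / m) * between / total,
    }


# Hypothetical values for one regression coefficient from 5 imputed datasets.
example_estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
example_std_errors = [6.0, 6.0, 6.0, 6.0, 6.0]

result = pool_rubin(example_estimates, example_std_errors)
print(f"Pooled B = {result['estimate']:.3f}, SE = {result['se']:.3f}, "
      f"fraction missing = {result['fraction_missing']:.3f}")
```

The pooled standard error is larger than the average single-dataset standard error whenever the estimates differ between imputations, which is how the extra uncertainty from imputing values is carried into the final test.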