Practical Missing Data Analysis in SPSS (v17 onwards) Peter T

Practical Missing Data
Analysis in SPSS
(v17 onwards)
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives
• How to impute missing values in SPSS, specifically multiple imputation (MI)
• How to implement analyses with multiply imputed values
• Interpretation of the output
• Practical tips
Example data
From a trial of pedometers + advice vs. advice only vs. controls in sedentary elderly women
Follow-up at 3 and 6 months
Main outcome measure: activity from accelerometer counts
210 randomised / 170 at 3 months
Example data – Pedometer trial
Read in data ‘SPSS Study databse.sav’
Main outcome is:
3-month activity – AccelVM2
Baseline activity – AccelVM1a
Trial arm represented by two dummy variables:
Grp1 = Pedometer vs. control
Grp2 = Advice vs. control
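The dummy coding above can be sketched in Python. This is only an illustration: the arm labels and the function name `arm_to_dummies` are assumptions, and the actual coding stored in the .sav file may differ.

```python
# Hypothetical arm labels; the real coding in the SPSS file may differ.
ARMS = ("pedometer", "advice", "control")


def arm_to_dummies(arm):
    """Return (Grp1, Grp2), with control as the reference category."""
    if arm not in ARMS:
        raise ValueError(f"unknown trial arm: {arm!r}")
    grp1 = 1 if arm == "pedometer" else 0  # Pedometer vs. control
    grp2 = 1 if arm == "advice" else 0     # Advice vs. control
    return grp1, grp2


for arm in ARMS:
    print(arm, arm_to_dummies(arm))
```

With control as the reference category, each regression coefficient for Grp1 and Grp2 is read as a contrast against the control arm.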
Main analysis – Pedometer trial
Regression on 3-month activity, adjusting for baseline activity and two dummy variables representing trial arm contrasts
Main analysis – Pedometer trial
Note that n = 170, with 40 missing, in the complete case analysis, so there is potential for bias
Missing at Random (MAR)
• Given the observed data, Prob (Missing) is:
1) independent of the unobserved data, but
2) allowed to depend on the observed data
• Essentially, the observed data are a random sample of the full data within each stratum of the observed variables
• MAR is a weaker assumption than MCAR
• If MAR is assumed, many methods are available to impute the missing data from the observed data.
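A small simulation may help make the MAR idea concrete. The sketch below uses made-up numbers, not the trial data: a hypothetical observed binary covariate (`stratum`) affects both the outcome and the chance of being missing. The complete-case mean is then biased, while weighting the observed stratum means by the full stratum sizes gets close to the full-data mean.

```python
import random

random.seed(1)

# Hypothetical population: an observed binary covariate shifts the outcome.
N = 20000
rows = []
for _ in range(N):
    stratum = 1 if random.random() < 0.3 else 0
    outcome = random.gauss(100.0 - 20.0 * stratum, 10.0)
    # MAR: the chance of being missing depends only on the observed stratum.
    p_missing = 0.5 if stratum == 1 else 0.1
    observed = random.random() >= p_missing
    rows.append((stratum, outcome, observed))


def mean(values):
    return sum(values) / len(values)


full_mean = mean([y for _, y, _ in rows])
complete_case_mean = mean([y for _, y, obs in rows if obs])

# Within each stratum the observed outcomes are a random sample, so the
# observed stratum means weighted by full stratum sizes recover the mean.
adjusted_mean = 0.0
for s in (0, 1):
    share = sum(1 for st, _, _ in rows if st == s) / N
    adjusted_mean += share * mean([y for st, y, obs in rows if st == s and obs])

print(f"full data mean:      {full_mean:.2f}")
print(f"complete-case mean:  {complete_case_mean:.2f}")
print(f"stratum-adjusted:    {adjusted_mean:.2f}")
```

Imputation methods exploit the same idea: they use the observed variables that predict missingness to fill in values, rather than discarding incomplete cases.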
Comparison of completers at
3 months and drop-outs
Characteristic                          Completers (n = 172)   Dropped out at 3 months (n = 32)   Chi-squared or t-test p-value
Age, Mean (SD)                          77.1 (5.0)             78.5 (5.6)                         0.137
Accelerometer VM, Mean (SD)             130695 (47991)         113381 (50444)                     0.065
Limb Function, Mean (SD)                8.69 (2.25)            7.41 (2.86)                        0.028
NHS Costs previous 3 months, Mean (SD)  £199.59 (306.74)       £404.29 (1289.54)                  0.402
Pedometer Group, N (%)                  58 (85.3%)             10 (14.7%)                         0.052
BCI Group, N (%)                        52 (77.6%)             15 (22.4%)
Control Group, N (%)                    62 (92.5%)             5 (7.5%)
Stairs difficult: Yes, N (%)            48 (76.2%)             15 (23.8%)                         0.033
Stairs difficult: No, N (%)             124 (87.9%)            17 (12.1%)
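The chi-squared comparisons in this table can be checked by hand. The sketch below assumes a Pearson chi-squared test without continuity correction (the slide does not say which variant was used) and takes the counts exactly as printed. The helper name `chi_squared` is illustrative.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Counts as printed in the table: (completers, dropped out).
arm_counts = [[58, 10], [52, 15], [62, 5]]   # pedometer, BCI, control
stairs_counts = [[48, 15], [124, 17]]        # stairs difficult: yes, no

# The upper-tail probability has a closed form for 1 and 2 degrees of freedom.
p_arm = math.exp(-chi_squared(arm_counts) / 2)                   # df = 2
p_stairs = math.erfc(math.sqrt(chi_squared(stairs_counts) / 2))  # df = 1

print(f"trial arm p-value:        {p_arm:.3f}")
print(f"stairs difficult p-value: {p_stairs:.3f}")
```

Under these assumptions the two p-values come out close to the 0.052 and 0.033 reported for trial arm and stairs difficulty.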
Execution of MI in SPSS
So, assuming MAR, we can use the available data to predict missing values. In SPSS:
Analyze > Multiple Imputation > Impute Missing Data Values
Execution of MI in SPSS
Enter ALL variables you think are associated with missingness
Note default number of imputations = 5
Create new dataset to store results
Note icon indicating procedures that allow MI analysis
Execution of MI in SPSS
Automatic method lets SPSS choose
Custom gives more flexibility
Can include all 2-way interactions
Linear Regression model prediction
Execution of MI in SPSS
List of variables chosen
Define each variable for imputation, as a predictor, or BOTH
N.B. Recommend including the OUTCOME as both a predictor and a variable to be imputed
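To give a feel for what regression-based imputation does, here is a deliberately simplified Python sketch on made-up toy data. It fits a straight line of follow-up on baseline using the complete cases, then fills each missing follow-up value with the prediction plus random noise, repeated m = 5 times. It is not SPSS's algorithm: a proper MI procedure also reflects uncertainty in the regression parameters and can handle several incomplete variables. The function names `fit_line` and `impute` are illustrative.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def impute(baseline, followup, m=5, seed=0):
    """Return m completed copies of followup; None marks a missing value."""
    rng = random.Random(seed)
    pairs = [(x, y) for x, y in zip(baseline, followup) if y is not None]
    a, b, sd = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    completed = []
    for _ in range(m):
        # Observed values are kept; missing ones get prediction plus noise.
        completed.append([
            y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(baseline, followup)
        ])
    return completed


# Made-up toy data, not the trial data.
baseline = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 13.0]
followup = [11.0, 13.5, None, 16.0, 12.5, None, 9.5, 14.0]
datasets = impute(baseline, followup)
for i, data in enumerate(datasets, start=1):
    print(i, [round(v, 2) for v in data])
```

The added noise is what makes the imputed datasets differ from one another, and that variation is what later carries the uncertainty due to missingness into the pooled standard errors.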
Output of MI in SPSS
Note main interest is in the outcome VM2, but other factors with missing values are also imputed
Step 2 - Using Imputed datasets in analysis
Note that the new dataset has the imputation number (variable Imputation_ in SPSS) as its first column. It contains, in order, the original dataset (n = 210, imputation number = 0) and, concatenated below it, a further 5 new datasets (each n = 210), now with imputed values (imputation number = 1 to 5)
Most analyses can now be run on the imputed data, provided the procedure shows the fossil shell spiral symbol
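The stacked layout can be pictured with a short Python sketch. The rows, subject ids and values below are made up, and only two imputations are shown for brevity. The point is simply that the imputation number identifies which copy of the data each row belongs to.

```python
from collections import defaultdict

# Hypothetical stacked rows: (imputation number, subject id, AccelVM2).
# Imputation 0 is the original data, so missing values appear as None there.
stacked = [
    (0, 1, 120.0), (0, 2, None),
    (1, 1, 120.0), (1, 2, 101.3),
    (2, 1, 120.0), (2, 2, 97.8),
]

# Split the stacked file back into one dataset per imputation number.
by_imputation = defaultdict(list)
for imputation, subject, value in stacked:
    by_imputation[imputation].append((subject, value))

original = by_imputation[0]
imputed_sets = {k: v for k, v in by_imputation.items() if k > 0}

print("original:", original)
for k in sorted(imputed_sets):
    print(f"imputation {k}:", imputed_sets[k])
```

An MI-aware procedure effectively runs the same analysis on each imputed copy and then combines the results.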
Repeat Main analysis – Need Pooled Results
Procedure exactly the same as before
SPSS will do the pooled analysis if the icon (above) is present in the drop-down menu
Pooled Analysis in SPSS
Results presented for the original data and for each imputed dataset separately
Results of pooled analysis from 5 imputed datasets
Model (Pooled)     B        SE      t        Sig.    Fraction missing
Constant           15607    7808    1.999    0.047   0.173
AccelVM1a          0.852    0.051   16.630   0.000   0.124
Pedometer Group    11310    6131    1.845    0.066   0.138
Advice only        17536    6526    2.687    0.009   0.266
Larger effect sizes in both groups
Greater power gives stronger statistical significance
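Pooling across imputed datasets is conventionally done with Rubin's rules, and the sketch below assumes that is what lies behind a pooled row. The pooled coefficient is the average of the per-imputation coefficients, and the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The input numbers are made up for illustration and are not the trial results.

```python
import math


def pool(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                       # pooled estimate
    within = sum(se ** 2 for se in std_errors) / m   # mean within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # total variance
    return q_bar, math.sqrt(total)


# Made-up coefficients and standard errors from m = 5 imputed datasets.
estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
std_errors = [2.0, 2.0, 2.0, 2.0, 2.0]

pooled_b, pooled_se = pool(estimates, std_errors)
print(f"pooled B = {pooled_b:.3f}, pooled SE = {pooled_se:.3f}")
```

The between-imputation term is why a pooled standard error is larger than a naive average of the per-dataset standard errors: it carries the extra uncertainty due to the missing values.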
Interpretation
• Compare pooled results with the original as a form of sensitivity analysis
• If the results are similar, this suggests the original results are fairly robust
• Consider whether MAR is a reasonable assumption
• Consider whether you have included all factors related to the missingness (including the outcome) in the imputation model, as this is a crucial assumption
Summary
• SPSS now includes multiple imputation in its armoury
• Consider assumptions of MI
• Compare results under different assumptions to assess robustness of results
• If the MAR assumption is o.k., then MI provides results that are less biased than complete case analysis