COU_Bauman20091075_supp

Missing Data 1
Appendix B: Sample Syntax of Analyses for Illustration
The following three sections of this appendix provide sample syntax for performing mean
substitution using SPSS, MI with SAS, and FIML with Mplus. All use the illustrative data set
described in the text and available from the journal website. This data set is a tab-delimited text
file titled “illustration.txt” that contains 60 rows representing the 60 cases and 10 columns
representing 10 variables: ID number (from 1 to 60), group membership (0 or 1), the covariate, and
7 variables representing the dependent variable under (a) no missing values; (b) MCAR with 10%,
20%, or 50% missing; and (c) MAR with 10%, 20%, or 50% missing. Missing values are denoted by the
value 999 in this text file. The data set does not contain variable names in the first row;
instead, the syntax files specify the variable names and order.
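The file layout described above can be checked independently of the three packages. The following is a minimal Python sketch, not part of the original syntax files: the function name `parse_illustration` and the choice of `None` to represent missing values are assumptions made for illustration. It splits each tab-delimited row into the 10 variables, in the order the syntax files use, and recodes 999 as missing.

```python
# Hypothetical helper (not from the original appendix): parse the
# tab-delimited layout described above and recode 999 as missing.
NAMES = ["ID", "Group", "Covariat", "DV0Miss",
         "DV10MCAR", "DV20MCAR", "DV50MCAR",
         "DV10MAR", "DV20MAR", "DV50MAR"]
MISSING_CODE = 999.0


def parse_illustration(text):
    """Return one dict per case, with missing values stored as None."""
    cases = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != len(NAMES):
            raise ValueError(
                f"expected {len(NAMES)} columns, got {len(fields)}")
        values = [float(field) for field in fields]
        case = {name: (None if value == MISSING_CODE else value)
                for name, value in zip(NAMES, values)}
        # ID and group membership are whole numbers in the data set.
        case["ID"] = int(case["ID"])
        case["Group"] = int(case["Group"])
        cases.append(case)
    return cases
```

Assuming the file sits where the syntax below expects it, the text passed to the function could be obtained with `open(r"C:\illustration.txt").read()`; a correctly formatted file should yield 60 cases.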
Mean Substitution using SPSS (not recommended)
Mean substitution, which is an approach we do not recommend, can be performed in all
programs. We next provide SPSS syntax to demonstrate these analyses with the illustrative data
set.
* The following opens the data file
* assuming it is placed directly in the C drive .
GET DATA
/TYPE=TXT
/FILE='C:\illustration.txt'
/DELCASE=LINE
/DELIMITERS="\t"
/ARRANGEMENT=DELIMITED
/FIRSTCASE=1
/IMPORTCASE=ALL
/VARIABLES=
ID F2.0
Group F1.0
Covariat F18.16
DV0Miss F18.16
DV10MCAR F18.16
DV20MCAR F18.16
DV50MCAR F17.16
DV10MAR F18.16
DV20MAR F18.16
DV50MAR F18.16.
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
* The next lines recode the missing code 999 into SPSS system
missing values .
RECODE DV10MCAR DV20MCAR DV50MCAR DV10MAR DV20MAR DV50MAR
(999=SYSMIS).
EXECUTE .
* Analysis with 0% missing .
* The dependent variable (DV0Miss) is regressed onto “Group” .
* The remaining syntax specifies SPSS defaults for regression,
which are reasonable in this analysis (including listwise
deletion, which is irrelevant in this situation of no missing
data) .
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT DV0Miss
/METHOD=ENTER Group .
* Analysis with 10% MCAR using Mean Substitution .
* Here, the dependent variable is “DV10MCAR”, which is the same
variable as used previously, but with 10% of cases randomly
deleted (i.e., missing) .
* Note that the line “/MISSING MEANSUBSTITUTION” specifies mean
substitution as the method of managing missing data .
DATASET ACTIVATE DataSet1.
REGRESSION
/MISSING MEANSUBSTITUTION
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT DV10MCAR
/METHOD=ENTER Group.
* Analysis with 20% MCAR using Mean Substitution .
DATASET ACTIVATE DataSet1.
REGRESSION
/MISSING MEANSUBSTITUTION
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT DV20MCAR
/METHOD=ENTER Group.
To conserve space, we do not show the syntax for the remaining four analyses. Each simply
substitutes DV50MCAR, DV10MAR, DV20MAR, or DV50MAR as the variable named on the
/DEPENDENT subcommand.
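To make concrete what mean substitution does to a variable, the following is a small Python sketch under stated assumptions: the function name `mean_substitute` is hypothetical and the scores are made up for illustration (they are not the illustrative data set). Each missing value is replaced with the mean of the observed values, which leaves the mean unchanged but shrinks the variance.

```python
from statistics import mean, variance


def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up scores for illustration only (not the illustrative data set).
scores = [2.0, None, 4.0, 6.0, None, 8.0]
filled = mean_substitute(scores)

# Sample variance before and after substitution.
observed_var = variance([v for v in scores if v is not None])
filled_var = variance(filled)
print(filled)
print(observed_var, filled_var)
```

In this toy example the filled-in variable has a smaller sample variance than the observed values alone, because the substituted cases sit exactly at the mean. That artificial loss of variability is one reason the approach is not recommended.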
MI using SAS
MI is an approach greatly preferred over mean substitution. SPSS does not have the
ability to directly perform MI. We illustrate these analyses using SAS syntax as shown next:
***The following lines read in data and recode 999 as missing***;
PROC IMPORT OUT= WORK.Illustration
DATAFILE= "C:\illustration.txt"
DBMS=TAB REPLACE;
GETNAMES=NO;
DATAROW=1;
RUN;
DATA illustration; set illustration;
ID = Var1;
Group = Var2;
Covariat = Var3;
DV0Miss = Var4;
IF (Var5 < 999) THEN DV10MCAR = Var5;
IF (Var6 < 999) THEN DV20MCAR = Var6;
IF (Var7 < 999) THEN DV50MCAR = Var7;
IF (Var8 < 999) THEN DV10MAR = Var8;
IF (Var9 < 999) THEN DV20MAR = Var9;
IF (Var10 < 999) THEN DV50MAR = Var10;
RUN;
****Analysis with 0% missing****;
* The dependent variable (DV0Miss) is regressed onto “Group”;
PROC REG DATA=illustration;
MODEL DV0Miss = group;
RUN;
****Multiple Imputation for 10% MCAR****;
* Here, the dependent variable is “DV10MCAR”, which has 10% of
cases randomly deleted (i.e., missing);
**** The first command creates 10 imputed data sets ***;
PROC MI DATA=illustration OUT=MCAR10 NIMPUTE=10 SEED=1211981;
VAR group covariat DV10MCAR;
RUN;
**** This second command performs 10 regression analyses for the
10 imputed data sets ***;
PROC REG DATA=MCAR10 outest=a COVOUT;
MODEL DV10MCAR = group;
BY _IMPUTATION_;
RUN;
**** This third command combines results of the 10 regression
analyses to estimate appropriate standard errors for the
regression coefficients intercept and group ***;
PROC MIANALYZE DATA=a;
MODELEFFECTS INTERCEPT group;
RUN;
****Multiple Imputation for 20% MCAR****;
PROC MI DATA=illustration OUT=MCAR20 NIMPUTE=10 SEED=1211981;
VAR group covariat DV20MCAR;
RUN;
PROC REG DATA=MCAR20 outest=a COVOUT;
MODEL DV20MCAR = group;
BY _IMPUTATION_;
RUN;
PROC MIANALYZE DATA=a;
MODELEFFECTS INTERCEPT group;
RUN;
To conserve space, we do not show the syntax for the remaining four analyses. These involve
substituting DV50MCAR, DV10MAR, DV20MAR, or DV50MAR as the dependent variable in the VAR
statement of Proc MI and in the ‘model’ command of Proc REG, and giving the imputed data set a
correspondingly new name (the out-file of Proc MI, which is also the data-file of Proc REG).
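The combining step performed by the third command follows Rubin's rules. As a rough illustration of that arithmetic (a sketch, not a reproduction of PROC MIANALYZE output), the Python below pools one coefficient across imputations. The function name `pool_rubin` is hypothetical and the numbers in the example are made up.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching inputs from at least 2 imputations")
    pooled = mean(estimates)                        # average point estimate
    within = mean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                   # variance across imputations
    total = within + (1 + 1 / m) * between          # total variance
    return pooled, sqrt(total)


# Made-up estimates and standard errors from three imputations.
estimate, std_error = pool_rubin([1.0, 1.2, 0.8], [0.5, 0.5, 0.5])
print(estimate, std_error)
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations; this is how the uncertainty due to the missing values enters the final result.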
FIML using Mplus
FIML is a model-based approach that is also greatly preferred over mean substitution.
The FIML approach to missing data management is most commonly implemented in structural
equation modeling or multilevel modeling software. We illustrate these analyses using Mplus
syntax as shown next. Note that two warnings appear in the output for this syntax: an input
warning noting that ‘Type=Missing’ is now the default, and a warning regarding the standard
errors. Both of these warnings can be ignored.
Syntax for no missing data:
TITLE: 0% Missingness;
DATA: FILE IS "c:\illustration.txt";
VARIABLE: NAMES ARE ID Group Covariat DV0Miss
DV10MCAR DV20MCAR DV50MCAR
DV10MAR DV20MAR DV50MAR;
USEVARIABLES group DV0Miss covariat;
MISSING = all (999);
ANALYSIS: TYPE IS MISSING;
ESTIMATOR IS ML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
COVERAGE = 0.10;
MODEL: DV0Miss on group;
group with covariat;
DV0Miss with covariat;
OUTPUT: tech1 tech3;
Syntax for 10% MCAR:
TITLE: FIML 10% MCAR;
DATA: FILE IS "c:\illustration.txt";
VARIABLE: NAMES ARE ID Group Covariat DV0Miss
DV10MCAR DV20MCAR DV50MCAR
DV10MAR DV20MAR DV50MAR;
USEVARIABLES group DV10MCAR covariat;
MISSING = all (999);
ANALYSIS: TYPE IS MISSING;
ESTIMATOR IS ML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
COVERAGE = 0.10;
MODEL: DV10MCAR on group;
group with covariat;
DV10MCAR with covariat;
OUTPUT: tech1 tech3;
Again, to conserve space, we do not show the syntax for the remaining five analyses. To
perform these analyses, one simply replaces DV10MCAR within the syntax above with
DV20MCAR, DV50MCAR, DV10MAR, DV20MAR, or DV50MAR.
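The substitution described above is mechanical, so the five remaining inputs can be generated from a template rather than edited by hand. The Python below is a sketch under stated assumptions: the names `TEMPLATE`, `REMAINING`, and `build_inputs` are hypothetical, the TITLE line is simplified to carry the variable name, and the dependent variable is replaced only in the USEVARIABLES and MODEL statements (the NAMES list must stay unchanged because it describes the file).

```python
# Hypothetical generator: builds the Mplus input text only; nothing is
# written to disk or run.
TEMPLATE = """TITLE: FIML {dv};
DATA: FILE IS "c:\\illustration.txt";
VARIABLE: NAMES ARE ID Group Covariat DV0Miss
DV10MCAR DV20MCAR DV50MCAR
DV10MAR DV20MAR DV50MAR;
USEVARIABLES group {dv} covariat;
MISSING = all (999);
ANALYSIS: TYPE IS MISSING;
ESTIMATOR IS ML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
COVERAGE = 0.10;
MODEL: {dv} on group;
group with covariat;
{dv} with covariat;
OUTPUT: tech1 tech3;
"""

REMAINING = ["DV20MCAR", "DV50MCAR", "DV10MAR", "DV20MAR", "DV50MAR"]


def build_inputs(dvs=REMAINING):
    """Return {dependent variable: Mplus input text} for each variable."""
    return {dv: TEMPLATE.format(dv=dv) for dv in dvs}


inputs = build_inputs()
print(inputs["DV20MCAR"])
```

Each generated text can then be saved as its own input file and run in Mplus in the usual way.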
Illustration.txt
1 0 -.850867829681874 -.154493593047363 -.154493593047363 -.154493593047363 -.154493593047363 -.154493593047363 -.154493593047363 -.154493593047363
2 0 -.211056217578008 -1.29827630533749 -1.29827630533749 -1.29827630533749 999 -1.29827630533749 -1.29827630533749 999
3 0 -1.58158061928887 -1.84729308697254 -1.84729308697254 999 999 999 999 999
4 0 -.892182736398273 -.749896078284945 -.749896078284945 -.749896078284945 -.749896078284945 -.749896078284945 -.749896078284945 999
5 0 -.0154904264600894 -.0350404123315361 -.0350404123315361 -.0350404123315361 999 -.0350404123315361 -.0350404123315361 999
6 0 -.617416180956403 -.791173756467051 -.791173756467051 -.791173756467051 -.791173756467051 -.791173756467051 -.791173756467051 999
7 0 -.160947342612139 -1.39305601560536 -1.39305601560536 -1.39305601560536 -1.39305601560536 -1.39305601560536 -1.39305601560536 999
8 0 .0822082173324673 -.592680564158753 -.592680564158753 -.592680564158753 -.592680564158753 -.592680564158753 -.592680564158753 -.592680564158753
9 0 -2.84579505283808 -.928260768294282 -.928260768294282 -.928260768294282 999 999 999 999
10 0 -.750327378280308 -.566730518398268 -.566730518398268 -.566730518398268 -.566730518398268 999 999 999
11 0 -.192208289689706 1.6010839421218 1.6010839421218 999 999 1.6010839421218 1.6010839421218 1.6010839421218
12 0 -.29929998834839 -.706578484159677 -.706578484159677 -.706578484159677 999 -.706578484159677 -.706578484159677 -.706578484159677
13 0 .397103644817022 -.854409356857959 -.854409356857959 -.854409356857959 -.854409356857959 -.854409356857959 -.854409356857959 -.854409356857959
14 0 -1.86598842125329 -2.01778396402153 -2.01778396402153 -2.01778396402153 -2.01778396402153 -2.01778396402153 -2.01778396402153 999
15 0 -.563836006435321 .336646818210308 .336646818210308 .336646818210308 .336646818210308 .336646818210308 999 999
16 0 -.147133183225006 1.09641236468901 1.09641236468901 1.09641236468901 1.09641236468901 1.09641236468901 1.09641236468901 1.09641236468901
17 0 -.0355818140701889 -.0997428904192106 999 999 999 -.0997428904192106 -.0997428904192106 -.0997428904192106
18 0 .887897514907359 -.182639887782136 -.182639887782136 -.182639887782136 999 -.182639887782136 -.182639887782136 -.182639887782136
19 0 -1.48840552487275 -.539959146486662 -.539959146486662 -.539959146486662 999 999 999 999
20 0 .108675987681218 -.437665939005987 -.437665939005987 999 999 -.437665939005987 -.437665939005987 -.437665939005987
21 0 -.20275634025207 -1.38987700220506 999 999 999 -1.38987700220506 -1.38987700220506 -1.38987700220506
22 0 -.376034169822592 -.0396592797147315 -.0396592797147315 -.0396592797147315 999 999 999 999
23 0 1.00077829138927 -.358278452919041 999 999 999 -.358278452919041 -.358278452919041 -.358278452919041
24 0 -.542027337979399 .0171460549100705 .0171460549100705 .0171460549100705 .0171460549100705 .0171460549100705 .0171460549100705 .0171460549100705
25 0 -.849060312346486 -.549660453094282 -.549660453094282 -.549660453094282 999 -.549660453094282 -.549660453094282 -.549660453094282
26 0 -1.80705390136401 -.289094061603656 -.289094061603656 -.289094061603656 -.289094061603656 -.289094061603656 999 999
27 0 -.101662079471109 -.276314150394167 -.276314150394167 999 999 -.276314150394167 -.276314150394167 999
28 0 -3.2620755881702 -1.39658693937835 -1.39658693937835 -1.39658693937835 -1.39658693937835 -1.39658693937835 999 999
29 0 -.0330451498263781 .24681560739095 .24681560739095 .24681560739095 .24681560739095 999 999 999
30 0 .0396122058845316 -.680536147388183 999 999 999 -.680536147388183 999 999
31 1 1.05826453711523 .99917526641664 .99917526641664 .99917526641664 .99917526641664 .99917526641664 .99917526641664 .99917526641664
32 1 -.181826632905936 .29456458296444 .29456458296444 .29456458296444 .29456458296444 .29456458296444 .29456458296444 999
33 1 .325298620152614 -.124115873877004 -.124115873877004 -.124115873877004 -.124115873877004 -.124115873877004 -.124115873877004 -.124115873877004
34 1 -.554650029752322 -.622863667018844 -.622863667018844 -.622863667018844 999 -.622863667018844 -.622863667018844 -.622863667018844
35 1 -.654212127213383 -1.01541423045684 -1.01541423045684 -1.01541423045684 999 -1.01541423045684 999 999
36 1 -1.1177906863268 .209256887497134 .209256887497134 .209256887497134 999 .209256887497134 .209256887497134 999
37 1 -.181923486491488 -.717691998996674 -.717691998996674 -.717691998996674 -.717691998996674 -.717691998996674 -.717691998996674 -.717691998996674
38 1 .0570406365471001 .156146647451192 999 999 999 .156146647451192 .156146647451192 999
39 1 .664147230154368 -.288229623773199 -.288229623773199 -.288229623773199 -.288229623773199 -.288229623773199 -.288229623773199 999
40 1 1.20738016510671 .251229411715677 .251229411715677 .251229411715677 .251229411715677 .251229411715677 .251229411715677 .251229411715677
41 1 1.49167062362998 1.82453435527453 1.82453435527453 1.82453435527453 1.82453435527453 1.82453435527453 1.82453435527453 1.82453435527453
42 1 .10443799659144 -.505317323754116 -.505317323754116 -.505317323754116 999 -.505317323754116 -.505317323754116 999
43 1 -.549409315402947 .688840627679144 .688840627679144 .688840627679144 999 .688840627679144 .688840627679144 .688840627679144
44 1 1.2355847811718 .488225613985325 .488225613985325 .488225613985325 .488225613985325 .488225613985325 .488225613985325 .488225613985325
45 1 1.4159448165691 .102824753225284 .102824753225284 .102824753225284 999 .102824753225284 .102824753225284 .102824753225284
46 1 -.829901220499762 -.884557475454705 -.884557475454705 999 999 -.884557475454705 -.884557475454705 999
47 1 1.16593272486838 2.28809182337569 2.28809182337569 2.28809182337569 2.28809182337569 2.28809182337569 2.28809182337569 2.28809182337569
48 1 .43362804115754 .446226340180492 .446226340180492 .446226340180492 999 .446226340180492 .446226340180492 999
49 1 .206497558025471 .641067713273713 .641067713273713 .641067713273713 .641067713273713 .641067713273713 .641067713273713 999
50 1 .847932701494698 1.25236793850088 1.25236793850088 1.25236793850088 1.25236793850088 1.25236793850088 1.25236793850088 1.25236793850088
51 1 1.84759808931025 1.69199874491296 1.69199874491296 1.69199874491296 1.69199874491296 1.69199874491296 1.69199874491296 1.69199874491296
52 1 -.7760562984868 -.390328025772538 -.390328025772538 -.390328025772538 -.390328025772538 -.390328025772538 999 999
53 1 1.83621118386067 1.06028971435592 1.06028971435592 1.06028971435592 999 1.06028971435592 1.06028971435592 1.06028971435592
54 1 1.63096618805731 1.72387743255306 1.72387743255306 1.72387743255306 999 1.72387743255306 1.72387743255306 1.72387743255306
55 1 2.43408888774903 .821243682071921 .821243682071921 .821243682071921 .821243682071921 .821243682071921 .821243682071921 .821243682071921
56 1 1.05397592438395 .507454217808059 .507454217808059 999 999 .507454217808059 .507454217808059 999
57 1 .0521649622721339 .238215282499099 .238215282499099 .238215282499099 .238215282499099 .238215282499099 .238215282499099 999
58 1 1.43465274304311 2.1426558324697 999 999 999 2.1426558324697 2.1426558324697 2.1426558324697
59 1 2.72640613683008 2.3463057079243 2.3463057079243 2.3463057079243 999 2.3463057079243 2.3463057079243 2.3463057079243
60 1 -1.20849872180246 -.748491890025146 -.748491890025146 -.748491890025146 -.748491890025146 -.748491890025146 -.748491890025146 999