Online Supplementary Material
Manuscript ID: LIDA1518
Subsample ignorable likelihood for accelerated failure time models with
missing predictors
1. Introduction
This report, a supplement to the paper “Subsample ignorable likelihood for accelerated failure time models with missing predictors”, consists of two parts. (1) In Section 2, we conduct Monte Carlo studies to examine the performance of the proposed SILAFT method when the covariates x and w are missing at higher rates (40% and 60%, respectively). (2) In Section 3, we provide the WinBUGS code for implementing the SILAFT model on the motivating example in the manuscript.
2. Simulation studies when covariates are missing at higher rates
To examine the performance of the proposed SILAFT model when covariates are
missing at higher rates, we simulate data from the same Log-Normal model as in the
manuscript, but with the covariates x and w missing at higher rates (40% and 60%,
respectively).
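Missing values of W and X were then generated from the following two logistic models, where $R_{wi}=0$ and $R_{xi}=0$ indicate that $w_i$ and $x_i$, respectively, are missing:

\[ \operatorname{logit} P(R_{wi}=0 \mid z_i, w_i, x_i, (t_i,\delta_i)) = \alpha_0^{(w)} + \alpha_z^{(w)} z_i + \alpha_w^{(w)} w_i + \alpha_x^{(w)} x_i + \alpha_t^{(w)} t_i, \]

\[ \operatorname{logit} P(R_{xi}=0 \mid R_{wi}=1, z_i, w_i, x_i, (t_i,\delta_i)) = \alpha_0^{(x)} + \alpha_z^{(x)} z_i + \alpha_w^{(x)} w_i + \alpha_x^{(x)} x_i + \alpha_t^{(x)} t_i, \]

with $x_i$ fully observed when $w_i$ is missing.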
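As an illustration of the two-stage scheme above, the following Python sketch draws the missingness indicators $R_{wi}$ and $R_{xi}$ (1 = observed, 0 = missing) for given covariates and follow-up times. It is a minimal sketch, not the code used for the reported simulations: the function names and the dictionary layout of the coefficients are hypothetical, and the standard-normal covariates and log-normal times in the usage lines are assumptions made only for illustration (the coefficients there are the mechanism I values of Table A1).

```python
import math
import random

def expit(eta):
    """Inverse logit: converts a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(a, z, w, x, t):
    """alpha_0 + alpha_z*z + alpha_w*w + alpha_x*x + alpha_t*t for one subject."""
    return a["0"] + a["z"] * z + a["w"] * w + a["x"] * x + a["t"] * t

def draw_missingness(z, w, x, t, a_w, a_x, rng):
    """Return lists r_w, r_x with 1 = observed and 0 = missing.

    a_w, a_x: hypothetical dicts of logistic coefficients keyed by
    "0", "z", "w", "x", "t" for the W model and the X model.
    """
    r_w, r_x = [], []
    for zi, wi, xi, ti in zip(z, w, x, t):
        if rng.random() < expit(linear_predictor(a_w, zi, wi, xi, ti)):
            r_w.append(0)  # w_i missing ...
            r_x.append(1)  # ... so x_i is kept fully observed
        else:
            r_w.append(1)
            # X can only go missing among subjects with W observed
            p_x_missing = expit(linear_predictor(a_x, zi, wi, xi, ti))
            r_x.append(0 if rng.random() < p_x_missing else 1)
    return r_w, r_x

# Usage with hypothetical standard-normal covariates and log-normal times
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
t = [math.exp(rng.gauss(0, 1)) for _ in range(n)]
a_w = {"0": 0.0, "z": 1.0, "w": 0.0, "x": 0.0, "t": 0.0}
a_x = {"0": 1.4, "z": 1.0, "w": 0.0, "x": 0.0, "t": 0.0}
r_w, r_x = draw_missingness(z, w, x, t, a_w, a_x, rng)
print("fraction of w missing:", 1 - sum(r_w) / n)
print("fraction of x missing:", 1 - sum(r_x) / n)
```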
For the missing data generation schemes above, CC analysis is valid if both  t( w )
and  t( x ) are zero; IL is valid if  w( w ) ,  x( w ) and  x( x ) are zero; SILAFT is valid if  t( w ) and
 x( x ) are zero. Four missing data mechanisms were created using different sets of values
for the regression coefficients such that, in mechanism (I) all three methods (CC, IL and
SILAFT) are consistent, while in mechanisms (II), (III) and (IV), just one of the three
methods is valid. The simulation setup is summarized in Table A1. These missing data
mechanisms all generate approximately 50% and 35% values missing in W and X,
respectively.
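The validity conditions above can be restated as a small check. The Python sketch below encodes them and applies them to the coefficient values as read from Table A1; the function names and the dictionary layout are hypothetical, and the check only restates the conditions given in the text rather than deriving them.

```python
def valid_methods(a_w, a_x):
    """Return the methods that the stated conditions deem valid.

    a_w, a_x: coefficients of the logistic models for missingness of W and
    of X (given W observed), keyed by "0", "z", "w", "x", "t".
    """
    methods = []
    if a_w["t"] == 0 and a_x["t"] == 0:  # no dependence on the outcome
        methods.append("CC")
    if a_w["w"] == 0 and a_w["x"] == 0 and a_x["x"] == 0:  # depends on observed values only
        methods.append("IL")
    if a_w["t"] == 0 and a_x["x"] == 0:
        methods.append("SILAFT")
    return methods

def coefs(a0, az, aw, ax, at):
    """Pack one row of logistic coefficients into a dict."""
    return {"0": a0, "z": az, "w": aw, "x": ax, "t": at}

# Coefficients as read from Table A1: (W model, X model) per mechanism
MECHANISMS = {
    "I":   (coefs(0.0, 1, 0, 0, 0.0), coefs(1.4, 1, 0, 0, 0.0)),
    "II":  (coefs(-0.5, 1, 1, 1, 0.0), coefs(1.4, 1, 1, 1, 0.0)),
    "III": (coefs(-1.4, 1, 0, 0, 0.25), coefs(0.7, 1, 1, 0, 0.25)),
    "IV":  (coefs(-0.5, 1, 1, 0, 0.0), coefs(-0.3, 1, 1, 0, 0.25)),
}

for name, (a_w, a_x) in MECHANISMS.items():
    print(name, valid_methods(a_w, a_x))
```

Under this reading of Table A1, mechanism I leaves all three methods valid and each of mechanisms II to IV leaves exactly one, as stated in the text.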
Table A2 summarizes the root mean squared errors (RMSEs) of the estimates of all the regression coefficients, and Table A3 reports the empirical bias, RMSE and coverage probability of the estimates of the individual regression coefficients. Results in bold type indicate situations where the method is consistent according to the theory of Section 4 in the manuscript, and hence should perform well. The results are based on 1000 repetitions in each simulation.
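For readers reproducing the tables, the three per-coefficient summaries can be computed from the simulation replicates as in the Python sketch below. It assumes that each replicate supplies a point estimate and a 95% interval (for the Bayesian fits, for example, a posterior credible interval); the function name `summarize_coefficient` is hypothetical, and bias and RMSE are scaled by 1000 to match the tables.

```python
import math

def summarize_coefficient(estimates, intervals, true_value):
    """Bias*1000, RMSE*1000 and percent coverage for one coefficient.

    estimates: point estimates, one per replicate
    intervals: (lower, upper) 95% interval bounds, one pair per replicate
    """
    n = len(estimates)
    errors = [b - true_value for b in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    covered = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return 1000 * bias, 1000 * rmse, 100.0 * covered / n

# Toy example with three hypothetical replicates of a coefficient equal to 1
bias, rmse, cover = summarize_coefficient(
    [0.9, 1.1, 1.3], [(0.7, 1.1), (0.9, 1.3), (1.1, 1.5)], 1.0)
print(round(bias), round(rmse), round(cover, 1))
```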
Table A1: Missing data mechanisms generated in the simulations (50% missingness in w and 35% missingness in x)

Mechanism          α0^(w) αz^(w) αw^(w) αx^(w) αt^(w) α0^(x) αz^(x) αw^(x) αx^(x) αt^(x)
I: All valid            0      1      0      0      0    1.4      1      0      0      0
II: CC valid         -0.5      1      1      1      0    1.4      1      1      1      0
III: IL valid        -1.4      1      0      0   0.25    0.7      1      1      0   0.25
IV: SILAFT valid     -0.5      1      1      0      0   -0.3      1      1      0   0.25
Missing values of W and X are generated from the two logistic models for $R_{wi}$ and $R_{xi}$ given in Section 2, using the coefficient values listed above.
In particular, for the four missing data mechanisms:
I: Missingness of W = f(Z), Missingness of X = f(Z | W observed); all four methods are valid;
II: Missingness of W = f(Z, W, X), Missingness of X = f(Z, W, X | W observed); only CC is valid;
III: Missingness of W = f(Z), Missingness of X = f(Z, W, (t, δ) | W observed); only IL is valid;
IV: Missingness of W = f(Z, W), Missingness of X = f(Z, W, (t, δ) | W observed); only SILAFT is valid.
Table A2. Summary RMSEs*1000 of Estimated Regression Coefficients for Before Deletion (BD), Complete Cases (CC), Ignorable Likelihood (IL) and Subsample Ignorable Likelihood AFT (SILAFT) Models, under Four Missing Data Mechanisms*

Method       I    II   III    IV
BD          94    92    93    95
CC         261   268   707   591
IL         158   199   162   219
SILAFT     212   392   534   227
*Four missing data mechanisms:
I: Missingness of W = f(Z), Missingness of X = f(Z | W observed); all four methods are valid;
II: Missingness of W = f(Z, W, X), Missingness of X = f(Z, W, X | W observed); only CC is valid;
III: Missingness of W = f(Z), Missingness of X = f(Z, W, (t, δ) | W observed); only IL is valid;
IV: Missingness of W = f(Z, W), Missingness of X = f(Z, W, (t, δ) | W observed); only SILAFT is valid.
RMSE estimates = $1000 \times \sqrt{E\,\|\hat{\beta}_r - \beta_{\mathrm{TRUE}}\|^2}$, with $\hat{\beta}_r$ denoting the estimate from the $r$th repetition.
Bold values are for methods consistent for the mechanism generating the data
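Because $\|\hat{\beta}_r - \beta_{\mathrm{TRUE}}\|^2$ is the sum of the squared errors of the individual coefficients, the summary RMSE of Table A2 equals the square root of the sum of the squared per-coefficient RMSEs reported in Table A3. The Python sketch below checks this against the rounded table entries; the function name `summary_rmse` is hypothetical, and agreement is only expected to within one unit because the inputs are already rounded.

```python
import math

def summary_rmse(per_coefficient_rmse):
    """Combine per-coefficient RMSEs into 1000*sqrt(E||beta_hat - beta||^2)."""
    return math.sqrt(sum(v * v for v in per_coefficient_rmse))

# Per-coefficient RMSE*1000 (beta0, betaz, betaw, betax) from Table A3,
# one list per mechanism I-IV
TABLE_A3_RMSE = {
    "BD":     [[46, 35, 65, 35], [43, 34, 66, 34], [46, 35, 65, 32], [45, 31, 68, 36]],
    "CC":     [[145, 102, 172, 85], [143, 100, 183, 89], [553, 269, 279, 210], [438, 219, 285, 171]],
    "IL":     [[73, 42, 124, 49], [103, 45, 158, 44], [77, 45, 127, 47], [152, 58, 138, 49]],
    "SILAFT": [[118, 82, 139, 70], [327, 109, 169, 82], [377, 247, 208, 196], [109, 76, 171, 69]],
}
# Summary RMSE*1000 for mechanisms I-IV from Table A2
TABLE_A2 = {"BD": [94, 92, 93, 95], "CC": [261, 268, 707, 591],
            "IL": [158, 199, 162, 219], "SILAFT": [212, 392, 534, 227]}

for method, rows in TABLE_A3_RMSE.items():
    combined = [summary_rmse(r) for r in rows]
    print(method, [round(c, 1) for c in combined], TABLE_A2[method])
```

All sixteen combined values agree with Table A2 to within one unit.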
Table A3. RMSE, Empirical Bias, and 95% Confidence Coverage for Individual Regression Coefficients under Four Missing Data Mechanisms (1000 replications)

RMSE*1000
           Mechanism I           Mechanism II          Mechanism III         Mechanism IV
Method     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx
BD         46   35   65   35     43   34   66   34     46   35   65   32     45   31   68   36
CC        145  102  172   85    143  100  183   89    553  269  279  210    438  219  285  171
IL         73   42  124   49    103   45  158   44     77   45  127   47    152   58  138   49
SILAFT    118   82  139   70    327  109  169   82    377  247  208  196    109   76  171   69

Bias*1000
           Mechanism I           Mechanism II          Mechanism III         Mechanism IV
Method     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx
BD          4   -4    1   -5     -2    7    3    0      4   -4   -4    0      4   -2   -4    0
CC         16    0   -2   -2     -1    8   13    3   -540 -249 -221 -191   -419 -196 -215 -146
IL         10   -4   -3   -4     81  -18  -98  -11      8   -2   -8    2    135   44    5    8
SILAFT      7   -2    6  -10    308   74   87   28   -366 -236 -168 -183     21   14   -3    4

95% Confidence coverage (%)
           Mechanism I           Mechanism II          Mechanism III         Mechanism IV
Method     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx     β0   βz   βw   βx
BD       96.2 93.8 95.7 92.4   95.7 94.8 93.9 96.1   94.2 97.5 94.5 94.9   95.1 97.6 92.2 93.7
CC       95.2 95.2 92.9 95.2   94.4 95.2 93.9 97.0    2.2 29.8 75.6 35.3   10.7 50.0 79.6 55.8
IL       94.8 97.1 95.7 92.9   77.5 91.3 87.9 93.9   97.5 95.3 94.9 95.6   42.7 85.9 93.2 95.1
SILAFT   93.3 95.2 94.3 96.7   18.2 81.4 87.4 93.1    3.6 15.3 74.9 21.8   94.7 97.6 93.2 96.6
3. WinBUGS code for the SILAFT model on the NLMS dataset
In this section, we provide the WinBUGS code for fitting the SILAFT model to the NLMS dataset. The follow-up time is modeled with a log-normal regression model, and the missing binary covariates are imputed from logistic regression models. We assume non-informative (diffuse) priors for all parameters involved.
# followup: follow-up time (NA for censored subjects)
# cens: censoring time for censored subjects, 0 for subjects with an observed event
# eduhs: high school education or above vs. less than HS
# adjincome: adjusted income
# raceb: Black race vs. White race
# raceo: Other race vs. White race
# sex: Female vs. Male
# married: Married vs. not married
# agebaseline: Age at baseline
model {
   for (i in 1:N) {
      # log-normal AFT model; dlnorm is parameterized by a precision,
      # so sigma here is the precision of log(followup), not a standard deviation
      # I(cens[i],) right-censors followup[i] at cens[i] when followup[i] is NA
      followup[i] ~ dlnorm(mu[i], sigma) I(cens[i],)
      mu[i] <- beta[1] + beta[2] * eduhs[i] + beta[3] * adjincome[i] + beta[4] * raceb[i] +
               beta[5] * raceo[i] + beta[6] * sex[i] + beta[7] * married[i] + beta[8] * agebaseline[i]
      # married is modeled by a logistic regression
      married[i] ~ dbern(pm[i])
      logit(pm[i]) <- bm[1] + bm[2] * sex[i] + bm[3] * agebaseline[i]
      # education is modeled by a logistic regression
      eduhs[i] ~ dbern(pe[i])
      logit(pe[i]) <- be[1] + be[2] * sex[i] + be[3] * agebaseline[i]
      # race: black vs. white is modeled by a logistic regression
      raceb[i] ~ dbern(prb[i])
      logit(prb[i]) <- brb[1] + brb[2] * sex[i] + brb[3] * agebaseline[i]
      # race: other vs. white is modeled by a logistic regression
      # (raceb and raceo are imputed independently of each other)
      raceo[i] ~ dbern(pro[i])
      logit(pro[i]) <- bro[1] + bro[2] * sex[i] + bro[3] * agebaseline[i]
   }
   # priors: separate loops, so that each node is defined exactly once
   for (j in 1:8) {
      beta[j] ~ dnorm(0, 0.001)
   }
   for (k in 1:3) {
      bm[k] ~ dnorm(0, 0.001)
      be[k] ~ dnorm(0, 0.001)
      brb[k] ~ dnorm(0, 0.001)
      bro[k] ~ dnorm(0, 0.001)
   }
   sigma ~ dgamma(0.001, 0.001)
}
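The sampling statement for followup in the model above corresponds to the usual right-censored log-normal AFT likelihood: an observed event time contributes its log-normal density, and a censored time contributes the probability of surviving beyond it. The Python sketch below writes this out for fixed parameter values, together with a helper that arranges (time, event) pairs in the NA/censoring-time layout used by the WinBUGS code. It is an illustration under these assumptions, not part of the authors' code: the function names are hypothetical, Python's `None` stands in for a WinBUGS `NA`, and the second argument of `dlnorm` is treated as a precision.

```python
import math

def lognormal_aft_loglik(times, events, mu, tau):
    """Right-censored log-normal log-likelihood.

    times:  observed follow-up or censoring times (> 0)
    events: 1 if the event was observed, 0 if censored
    mu:     linear predictors, i.e. the mean of log(time) for each subject
    tau:    precision of log(time), as in WinBUGS dlnorm(mu, tau)
    """
    sd = 1.0 / math.sqrt(tau)
    total = 0.0
    for t, d, m in zip(times, events, mu):
        u = (math.log(t) - m) / sd
        if d == 1:
            # log density of a log-normal variable at t
            total += -0.5 * u * u - math.log(t * sd * math.sqrt(2.0 * math.pi))
        else:
            # log P(T > t) = log(1 - Phi(u)), computed via erfc for accuracy
            total += math.log(0.5 * math.erfc(u / math.sqrt(2.0)))
    return total

def to_winbugs_layout(times, events):
    """Split (time, event) pairs into followup (None = NA) and cens vectors."""
    followup = [t if d == 1 else None for t, d in zip(times, events)]
    cens = [0.0 if d == 1 else t for t, d in zip(times, events)]
    return followup, cens

followup, cens = to_winbugs_layout([2.0, 3.5, 1.2], [1, 0, 1])
print(followup, cens)
print(lognormal_aft_loglik([1.0, 1.0], [1, 0], [0.0, 0.0], 1.0))
```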