Appendix

Supplemental Materials for “Simulation Study of Instrumental Variable Approaches with an
application of a Study of the Antidiabetic Effect of Bezafibrate” by Bing Cai, Sean Hennessy,
James Flory, Daohang Sha, Tom Ten Have, and Dylan Small
Appendix I: GSMM Estimation Procedure
The estimation procedure is based on the assumption that the mean of the potential outcomes under no treatment is the same across the levels of the instrumental variable $R$. Consequently, the first step of estimation is to obtain the expected potential outcome under no treatment for each patient, which we express as:

$$E\{Y^{(0)}(\psi^*) \mid X = x, Z = z\} = R\,\mathrm{expit}(\hat\beta_{11} z + \hat\beta_{12}^T x - z\psi^*) + (1 - R)\,\mathrm{expit}(\hat\beta_{01} z + \hat\beta_{02}^T x - z\psi^*) \qquad \text{(A1)}$$

where $\mathrm{expit}(\cdot) = \exp(\cdot)/\{1 + \exp(\cdot)\}$. The notation $Y^{(0)}(\psi^*)$ indicates that it is the true treatment-free potential outcome $Y^{(0)}$ if the true treatment effect $\psi$ equals $\psi^*$. We estimate $E\{Y^{(0)}(\psi^*) \mid X = x, Z = z\}$ by substituting the estimates $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{01}$, and $\hat\beta_{02}$ for the corresponding model coefficients.
These estimates are obtained at the start of estimation from logistic regressions of $Y$ on $Z$ and $X$ among patients with $R = 1$ and $R = 0$, respectively; they do not change across the iterations of the estimation process for $\psi$. On the right-hand side of equation (A1), the first component, $R\,\mathrm{expit}(\hat\beta_{11} z + \hat\beta_{12}^T x - z\psi^*)$, corresponds to the putative mean of the treatment-free outcome $Y^{(0)}(\psi^*)$ for patients with $R = 1$, and the second component, $(1 - R)\,\mathrm{expit}(\hat\beta_{01} z + \hat\beta_{02}^T x - z\psi^*)$, corresponds to the putative mean of the treatment-free outcome for patients with $R = 0$. When $\psi^* = \psi$, $E\{Y^{(0)}(\psi^*) \mid X = x, Z = z\} = E\{Y^{(0)} \mid X = x, Z = z\}$. Under the assumption that the IV is unrelated to any unmeasured confounders, the IV is then independent of $Y^{(0)}(\psi^*)$, so the expectation of the treatment-free potential outcome is the same at both levels of the IV. This equality results in the following estimating equation:

$$\sum_i d(x_i)\{R_i - p(x_i)\}\left[E\{Y^{(0)}(\psi^*) \mid X = x_i, Z = z_i\} - q(x_i)\right] = 0 \qquad \text{(A2)}$$
The functions $p(x)$ and $d(x)$ are obtained, before equation (A2) is solved, as the predicted values from standard logistic regressions of $R$ and $Z$ on $X$, respectively. The function $q(x)$ is obtained as the expit transformation of the predicted values from a standard linear regression of the logit of $E\{Y^{(0)}(\psi^*) \mid X = x, Z = z\}$ on the baseline covariates $X$. Vansteelandt and Goetghebeur present selection criteria for $d(x)$ and $q(x)$ that maximize the efficiency of the resulting estimator of $\psi$.² The choice of these functions does not affect the bias of this estimator; however, unbiased estimation of $\psi$ requires that the probability model for the IV, $p(x)$, be correctly specified.² We find an approximate solution $\psi^*$ of (A2) iteratively by the Newton-Raphson method.
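The steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: the stratum-specific outcome models include an intercept, every logistic fit uses a hand-written Newton-Raphson (IRLS) routine, the derivative of the estimating function in the outer Newton-Raphson loop is approximated by a central difference, and the helper names (`fit_logistic`, `gsmm_estimate`) and the toy data-generating values are hypothetical.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(design, y, iters=50):
    """Logistic regression by Newton-Raphson (IRLS); returns the coefficient vector."""
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        mu = expit(design @ beta)
        hess = design.T @ (design * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hess, design.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def gsmm_estimate(y, z, r, x, psi0=0.0, tol=1e-8, max_iter=50):
    """Solve estimating equation (A2) for psi; returns (psi_hat, U(psi_hat))."""
    ones = np.ones(len(y))
    zx = np.column_stack([ones, z, x])  # intercept (our assumption), Z, X
    cov = np.column_stack([ones, x])    # intercept, X
    # Nuisance fits, computed once and held fixed across the psi* iterations.
    b1 = fit_logistic(zx[r == 1], y[r == 1])  # coefficients from the R = 1 stratum
    b0 = fit_logistic(zx[r == 0], y[r == 0])  # coefficients from the R = 0 stratum
    p = expit(cov @ fit_logistic(cov, r))     # p(x): fitted Pr(R = 1 | X)
    d = expit(cov @ fit_logistic(cov, z))     # d(x): fitted Pr(Z = 1 | X)

    def u(psi):
        # (A1): putative treatment-free mean from the model for the patient's own R level
        m = r * expit(zx @ b1 - z * psi) + (1 - r) * expit(zx @ b0 - z * psi)
        # q(x): expit of a linear-regression fit of logit(m) on X
        coef, *_ = np.linalg.lstsq(cov, logit(m), rcond=None)
        q = expit(cov @ coef)
        return np.sum(d * (r - p) * (m - q))  # left-hand side of (A2)

    psi = psi0
    for _ in range(max_iter):
        h = 1e-6
        slope = (u(psi + h) - u(psi - h)) / (2.0 * h)  # numerical dU/dpsi
        step = u(psi) / slope
        psi -= step
        if abs(step) < tol:
            break
    return psi, u(psi)


# Toy data (hypothetical values): binary IV r, binary treatment z, binary outcome y.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
r = rng.binomial(1, 0.5, size=n)
z = rng.binomial(1, expit(-1.0 + 2.0 * r + 0.5 * x))
y = rng.binomial(1, expit(-1.0 + 0.8 * z + 0.5 * x))
psi_hat, u_hat = gsmm_estimate(y, z, r, x)
print(round(psi_hat, 3), u_hat)
```

Because the nuisance fits sit outside the loop, only the (A1) mean and $q(x)$ are recomputed at each trial value of $\psi^*$, which mirrors the statement that the regression estimates do not change across iterations.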
Appendix II: 2SPS and 2SRI estimation procedures
For both two-stage approaches (2SPS and 2SRI), the first stage estimates the probability of treatment (bezafibrate prescription) with a logistic regression on the IV and the baseline confounders. The predicted values of treatment ($\hat Z$) are then used, in two different ways, in the second-stage logistic model for the probability of the outcome (risk of a first diabetes diagnosis). We note that for IV-based control of unmeasured confounding to be effective, the baseline covariates in the first-stage model must also be included in the second-stage model.
The specific first-stage model is the same for the 2SPS and 2SRI approaches:

$$\Pr(Z = 1 \mid R = r, X = x) = \mathrm{expit}(\gamma_1 + \gamma_2 r + \gamma_3^T x).$$
For the second-stage model under the 2SPS procedure, the covariates are the predicted value of treatment ($\hat Z = \hat z$) from the first-stage model and the baseline confounders:

$$\Pr(Y = 1 \mid \hat Z = \hat z, X = x) = \mathrm{expit}(\beta_1 + \beta_2 \hat z + \beta_3^T x),$$
where $\beta_2$ is proposed to be the log odds ratio for the effect of treatment on the outcome, interpreted as the LATE or as the treatment effect on the treated (TOT) depending on whether the assumptions in Section 2.1 or Section 2.2 of the main text, respectively, are used. In contrast, the second-stage model under the 2SRI method retains the observed treatment ($Z$) as a covariate, together with the difference (residual) between the observed and predicted values of treatment ($E = Z - \hat Z$):

$$\Pr(Y = 1 \mid Z = z, E = e, X = x) = \mathrm{expit}(\beta_1 + \beta_2 z + \beta_3 e + \beta_4^T x),$$

where $\beta_2$ is the effect of the observed treatment adjusted for the residual $E = e$. This effect is likewise proposed to be interpreted as the LATE or TOT, depending on the assumptions. Thus both models purport to yield LATE or TOT odds ratios; however, Cai et al. show that both two-stage approaches yield biased estimators of the LATE or TOT odds ratio.¹²
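The mechanics of the two procedures can be sketched with NumPy as below. This is a minimal illustration, assuming a single baseline covariate and a hand-written IRLS logistic fit; the helper names and the toy data (which include a hypothetical unmeasured confounder `u`) are ours. It shows how $\hat Z$ enters each second stage, not the bias analysis of Cai et al.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(design, y, iters=50):
    """Logistic regression by Newton-Raphson (IRLS); returns the coefficient vector."""
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        mu = expit(design @ beta)
        hess = design.T @ (design * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hess, design.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def two_stage(y, z, r, x):
    """Return the second-stage treatment coefficient beta_2 under 2SPS and under 2SRI."""
    ones = np.ones(len(y))
    # First stage (shared): Pr(Z = 1 | R, X) = expit(gamma_1 + gamma_2 r + gamma_3 x)
    first = np.column_stack([ones, r, x])
    z_hat = expit(first @ fit_logistic(first, z))
    # 2SPS: replace Z by its prediction and keep the baseline covariate X
    b_2sps = fit_logistic(np.column_stack([ones, z_hat, x]), y)
    # 2SRI: keep the observed Z and add the first-stage residual E = Z - Z_hat
    e = z - z_hat
    b_2sri = fit_logistic(np.column_stack([ones, z, e, x]), y)
    return b_2sps[1], b_2sri[1]


# Toy data with an unmeasured confounder u (hypothetical values).
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
u = rng.normal(size=n)
r = rng.binomial(1, 0.5, size=n)
z = rng.binomial(1, expit(-1.0 + 2.0 * r + 0.5 * x + u))
y = rng.binomial(1, expit(-1.0 + 0.8 * z + 0.5 * x + u))
beta2_2sps, beta2_2sri = two_stage(y, z, r, x)
print(round(beta2_2sps, 3), round(beta2_2sri, 3))
```

Both second stages include $X$, in line with the requirement that the first-stage baseline covariates also appear in the second-stage model.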
Appendix III: Simulation Study
Under the compliance class model, each patient in the simulated dataset was assigned to one of three compliance classes, compliers (CO), always-takers (AT), and never-takers (NT), according to the probability model below. Under the monotonicity assumption, there are no defiers. First, for each patient in the simulated dataset, we simulated a baseline covariate $X$ from the normal distribution with mean 0 and variance 1. Second, we defined a multinomial variable $C$ with three levels (1 = CO, 2 = AT, 3 = NT) and simulated the compliance class of each patient from the following probability distribution:

$$\pi_M = \Pr(C = 1 \mid X = x) = \mathrm{expit}(\alpha_1 + \alpha_2 x)$$
$$\pi_A = \Pr(C = 2 \mid X = x) = \lambda\{1 - \Pr(C = 1 \mid X = x)\}$$
$$\pi_N = \Pr(C = 3 \mid X = x) = 1 - \Pr(C = 1 \mid X = x) - \Pr(C = 2 \mid X = x)$$

where $\lambda$ is a measure of the unknown relationship between the complier and always-taker compliance classes. This parameterization of $\pi_A$ allowed us to specify the condition of no always-takers simply by setting $\lambda = 0$. For the simulations, $\lambda$ was set to 0 when always-takers were omitted and to 0.50 when they were included. Additionally, $\alpha_1 = 0$ and $\alpha_2 = 0.50$.
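The compliance-class draw described above can be sketched in NumPy as follows, assuming an inverse-CDF draw of the three-level multinomial. The function names are hypothetical, and `alpha1`, `alpha2`, and `lam` are our labels for the intercept, slope, and always-taker parameter, set to the stated values 0, 0.50, and 0.50 (or 0).

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def class_probabilities(x, alpha1=0.0, alpha2=0.5, lam=0.5):
    """Return (pi_M, pi_A, pi_N) for each covariate value x."""
    pi_m = expit(alpha1 + alpha2 * x)  # compliers, C = 1
    pi_a = lam * (1.0 - pi_m)          # always-takers, C = 2
    pi_n = 1.0 - pi_m - pi_a           # never-takers, C = 3
    return pi_m, pi_a, pi_n


def simulate_classes(n, lam, rng):
    x = rng.normal(0.0, 1.0, size=n)   # X ~ N(0, 1)
    pi_m, pi_a, _ = class_probabilities(x, lam=lam)
    u = rng.uniform(size=n)
    # Inverse-CDF draw of the three-level multinomial C
    c = np.where(u < pi_m, 1, np.where(u < pi_m + pi_a, 2, 3))
    return x, c


rng = np.random.default_rng(2)
x, c = simulate_classes(10_000, lam=0.5, rng=rng)
print(np.bincount(c, minlength=4)[1:] / len(c))  # share of CO, AT, NT
_, c_no_at = simulate_classes(10_000, lam=0.0, rng=rng)
print((c_no_at == 2).sum())  # prints 0: no always-takers when lam = 0
```

Setting `lam=0.0` makes $\pi_A$ exactly zero, so the draw can only return compliers or never-takers.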
Once each patient had been assigned to one of the three compliance classes, we simulated the binary IV level independently of the compliance class, with probability 0.50 for each of the levels $R = 1$ and $R = 0$. Given the compliance class and IV level of each patient, the treatment level follows by definition. For example, if a patient was a complier and was assigned to the positive IV level (the practice's previous fibrate prescription was bezafibrate), then the patient's treatment level was also positive (the current fibrate prescription is bezafibrate). In contrast, the treatment level is positive regardless of the IV level for always-takers and negative regardless of the IV level for never-takers.
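The deterministic mapping from compliance class and IV level to treatment can be written as a small NumPy function; the sketch below is illustrative, and the function name and the integer codes (1 = CO, 2 = AT, 3 = NT, matching the levels of $C$) are our own choices.

```python
import numpy as np

COMPLIER, ALWAYS_TAKER, NEVER_TAKER = 1, 2, 3


def treatment_from_class(c, r):
    """Treatment Z implied by compliance class c and IV level r (no defiers)."""
    c = np.asarray(c)
    r = np.asarray(r)
    return np.select(
        [c == COMPLIER, c == ALWAYS_TAKER, c == NEVER_TAKER],
        [r, np.ones_like(r), np.zeros_like(r)],  # CO follow R; AT always 1; NT always 0
    )


rng = np.random.default_rng(3)
c = rng.integers(1, 4, size=8)    # illustrative classes only
r = rng.binomial(1, 0.5, size=8)  # IV: Pr(R = 1) = Pr(R = 0) = 0.50
z = treatment_from_class(c, r)
print(np.column_stack([c, r, z]))
```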
Given the simulated compliance class, IV level, and covariate, we then simulated the binary outcome from the following logistic model:

$$\Pr(Y = 1 \mid R = r, C = c, X = x) = \mathrm{expit}(\theta_{1c} + \theta_{2c} r + \theta_3 x),$$

where $\theta_{2c}$ is the log effect (log odds ratio) of the IV on the risk of the outcome for a given level $c$ of $C$ and given baseline covariate vector $X = x$. Under either the monotonicity or the no IV-treatment interaction assumption, $\theta_{2c}$ can also be interpreted as the effect of treatment (as opposed to the IV) on the outcome.²² When $C = 1$, indicating the complier class, $\theta_{2c}$ is the effect of treatment among the compliers, which is the local average treatment effect (LATE). If one were to relax the monotonicity assumption and instead assume that the effect of treatment on the outcome is the same across levels of the IV (i.e., no treatment-by-IV interaction on $Y$), then $\theta_{2c}$ is the log effect of treatment on the risk of the outcome among the treated, that is, the effect of treatment on the treated (TOT). For $c = \mathrm{CO}$, we specified $\theta_{2,c=\mathrm{CO}} = 1.25$ for prevalent events and $\theta_{2,c=\mathrm{CO}} = 0.70$ for rare events, so that those with treatment had a risk of the outcome about double that of those without treatment. For the always-takers and never-takers, under the exclusion restriction, $\theta_{2,c=\mathrm{AT}} = \theta_{2,c=\mathrm{NT}} = 0$.
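The claim that the complier effects roughly double the risk can be checked with a short calculation. This sketch evaluates complier risks at $X = 0$ (so the $\theta_3 x$ term, whose value is not stated here, drops out) and uses the baseline complier risks of 0.30 (prevalent) and 0.03 (rare) given in the next paragraph.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


# Complier risks at X = 0, so the theta_3 * x term drops out.
ratios = {}
for label, base_risk, theta2_co in [("prevalent", 0.30, 1.25), ("rare", 0.03, 0.70)]:
    treated_risk = expit(logit(base_risk) + theta2_co)  # R = 1: treated compliers
    ratios[label] = treated_risk / base_risk            # R = 0 risk is base_risk
    print(label, round(treated_risk, 4), round(ratios[label], 2))
# prevalent 0.5993 2.0
# rare 0.0586 1.95
```

The odds ratios $e^{1.25} \approx 3.49$ and $e^{0.70} \approx 2.01$ differ, but because the rare-event baseline risk is small, both translate into a risk ratio close to 2.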
To induce confounding, $\theta_{1c}$ was varied across the levels of $C$, leading to variation in the probability of the outcome across principal strata, which reflects confounding of the treatment-outcome relationship. We parameterized this confounding by $\Delta = \theta_{1,c=\mathrm{NT}} - \theta_{1,c=\mathrm{CO}}$, which was varied in the simulation study (results shown in Tables 1-3 in the main text). We also used the individual $\theta_{1c}$ parameters to specify the level of risk of the outcome. Specifically, $\theta_{1,c=\mathrm{CO}} = \theta_{1,c=\mathrm{AT}} = \mathrm{logit}(0.03)$ for rare outcomes and $\theta_{1,c=\mathrm{CO}} = \theta_{1,c=\mathrm{AT}} = \mathrm{logit}(0.30)$ for prevalent outcomes. With $\Delta$ specified in the simulation tables below, the level of risk for the never-takers is obtained from $\theta_{1,c=\mathrm{NT}} = \theta_{1,c=\mathrm{CO}} + \Delta$.
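How a given $\Delta$ translates into the never-takers' baseline risk can be sketched as follows. The function name is hypothetical, the risk is evaluated at $X = 0$, and the $\Delta$ values in the loop are illustrative only, not the values used in the simulation tables.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def class_intercepts(base_risk, delta):
    """theta_1 for each class: CO and AT share logit(base_risk); NT = CO + delta."""
    theta_co = logit(base_risk)
    return {"CO": theta_co, "AT": theta_co, "NT": theta_co + delta}


# Illustrative delta values only (hypothetical, not taken from the tables).
for delta in (-1.0, 0.0, 1.0):
    theta = class_intercepts(0.30, delta)
    # Never-takers are never treated, so their risk at X = 0 is expit(theta_1NT).
    print(delta, round(expit(theta["NT"]), 3))
```

With $\Delta = 0$ the never-takers share the compliers' baseline risk, so any confounding in the simulated data comes from $\Delta \neq 0$.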