Technical Appendix
Statistical method
As described in the main text, we used a three-step approach to estimate the effect of
CR on survival, while controlling for potential confounders. In this appendix, we describe
each of these steps in further detail.
In the first step, we fitted a penalized Cox proportional-hazards regression model, estimating
days of survival conditional on treatment status (CR / no CR) and confounders. We used the
lasso [1], an automated variable selection method, to shrink
the full set of variables in category f (i.e., variables without a specific a priori expectation on
their status as a confounder) to a much smaller set with maximal predictive power with
respect to the outcome of interest (survival). The lasso can be understood as a regression
model in which the sum of the absolute values of the regression coefficients is constrained to
be less than a constant. Its aim is similar to that of traditional automated variable selection
methods, such as subset selection (e.g., forward stepwise), but it improves on these methods by
yielding more stable models and greater predictive accuracy [1]. The constraint on the absolute values of the
regression coefficients is a tuning factor, which regulates the amount of shrinkage, with lower
values leading to more parsimonious models, i.e. models in which many coefficients are zero.
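In symbols, the constrained form described above can be sketched as follows. The notation is an illustrative assumption and does not appear in the main text: $\ell(\beta)$ denotes the Cox log partial likelihood, $\beta_j$ the coefficient of candidate variable $j$ (of $p$ in total), and $t$ the constraint. The second expression is the equivalent penalized form with penalty weight $\lambda$, in which a smaller $t$ corresponds to a larger $\lambda$. The sketch penalizes every coefficient equally, which is a simplifying assumption.

```latex
\hat{\beta}(t) = \arg\max_{\beta} \; \ell(\beta)
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t ,
\qquad \text{equivalently} \qquad
\hat{\beta}(\lambda) = \arg\max_{\beta}
\left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \right\}
```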
In our analysis, we fitted the lasso regression model 1000 times, over a range of increasing
values of this constraint. We used 10-fold cross-validation to choose the optimal constraint
based on a cross-validated partial-likelihood criterion, as proposed by Van Houwelingen et
al. [2] and implemented in the glmnet package [3, 4] of the R statistical software, version 2.15.2 [5].
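The cross-validated partial-likelihood criterion can be written roughly as follows, assuming $K = 10$ folds. The symbols are illustrative and not taken from the main text: $\ell(\beta)$ is the log partial likelihood on all observations, $\ell_{-k}(\beta)$ the log partial likelihood with fold $k$ left out, and $\hat{\beta}_{-k}(\lambda)$ the lasso estimate fitted without fold $k$ at tuning value $\lambda$ (equivalently, at a given value of the constraint).

```latex
\mathrm{CVPL}(\lambda) = \sum_{k=1}^{K}
\left[ \ell\big(\hat{\beta}_{-k}(\lambda)\big)
     - \ell_{-k}\big(\hat{\beta}_{-k}(\lambda)\big) \right],
\qquad
\hat{\lambda} = \arg\max_{\lambda} \; \mathrm{CVPL}(\lambda)
```

Each term is the contribution of the held-out fold to the log partial likelihood, obtained by subtraction, so the chosen tuning value is the one whose fits predict unseen observations best.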
This step ensured that confounders included in our final model were indeed independently
related to the outcome (rather than being related only to treatment), which has been shown to be
important for minimizing bias in the estimated treatment effect [6, 7].
In the second step, we estimated the propensity score, i.e. the probability of receiving CR as a
function of all confounders selected in the first step, using generalised boosted regression [8, 9].
We subsequently estimated the
average treatment effect on the treated (ATT) by weighting all observations for patients who
did not receive CR by the inverse of the propensity score. After propensity score weighting, we
checked whether all confounders were equally distributed across both groups.
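The weighting and balance check can be illustrated with a minimal Python sketch. It assumes the common ATT convention in which treated patients keep a weight of 1 and untreated patients receive the odds e / (1 - e) of their propensity score e; whether this matches the exact weights used in the analysis is an assumption. The function names, the toy ages, and the propensity scores are hypothetical, and the standardized difference is only one of several possible balance measures.

```python
import math

def att_weights(treated, propensity):
    """ATT weights: 1 for treated patients, e / (1 - e) for untreated patients."""
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if z == 1 else e / (1.0 - e))
    return weights

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def standardized_difference(x, treated, weights):
    """Weighted mean difference of covariate x between groups, scaled by the
    unweighted standard deviation of x in the treated group."""
    xt = [v for v, z in zip(x, treated) if z == 1]
    wt = [w for w, z in zip(weights, treated) if z == 1]
    xc = [v for v, z in zip(x, treated) if z == 0]
    wc = [w for w, z in zip(weights, treated) if z == 0]
    mean_t = sum(xt) / len(xt)
    sd_t = math.sqrt(sum((v - mean_t) ** 2 for v in xt) / (len(xt) - 1))
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / sd_t

# Hypothetical toy cohort: 3 treated (CR) and 4 untreated patients.
treated = [1, 1, 1, 0, 0, 0, 0]
age = [70, 65, 60, 70, 60, 50, 40]
propensity = [0.6, 0.5, 0.4, 0.6, 0.4, 0.2, 0.1]  # assumed, not estimated here

weights = att_weights(treated, propensity)
smd_before = standardized_difference(age, treated, [1.0] * len(age))
smd_after = standardized_difference(age, treated, weights)
print(f"standardized difference before: {smd_before:.2f}, after: {smd_after:.2f}")
```

In this toy cohort the untreated patients who resemble treated patients receive the largest weights, which pulls the weighted mean age of the untreated group towards that of the treated group.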
In the third step, we estimated the effect of CR on survival using a traditional Cox
proportional hazards model in this weighted cohort, including the treatment indicator and all
covariates (commonly referred to as a "doubly-robust" approach [10, 11, 12]). We tested the
assumption of proportional hazards over time, and report estimates for three different follow-up
times (12, 24 and 48 months after study inclusion).
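To make the weighted Cox model concrete, the following Python sketch maximizes a weighted Cox log partial likelihood by Newton-Raphson. It is a simplified illustration under stated assumptions, not the analysis code: it handles a single covariate (the treatment indicator) rather than the treatment indicator plus all covariates, it uses Breslow handling of tied event times, and it does not compute the robust standard errors that a weighted analysis would normally require. The function name, the toy survival times, and the weights are hypothetical.

```python
import math

def weighted_cox_newton(times, events, x, weights, iterations=25):
    """Estimate the log hazard ratio for a single covariate x by maximizing
    the weighted Cox log partial likelihood with Newton-Raphson."""
    beta = 0.0
    n = len(times)
    for _ in range(iterations):
        gradient = 0.0
        information = 0.0
        for i in range(n):
            if events[i] != 1:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # patient j is still at risk at time t_i
                    r = weights[j] * math.exp(beta * x[j])
                    s0 += r
                    s1 += r * x[j]
                    s2 += r * x[j] * x[j]
            mean_x = s1 / s0
            gradient += weights[i] * (x[i] - mean_x)
            information += weights[i] * (s2 / s0 - mean_x ** 2)
        step = gradient / information
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

# Hypothetical toy cohort: 5 treated (x = 1) and 5 untreated (x = 0) patients,
# survival times in months, all deaths observed (event = 1).
times = [5, 8, 12, 14, 20, 2, 4, 6, 9, 11]
events = [1] * 10
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
att_w = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.8, 1.0, 1.2, 1.5]  # assumed ATT weights

beta_unweighted = weighted_cox_newton(times, events, x, [1.0] * 10)
beta_weighted = weighted_cox_newton(times, events, x, att_w)
print("hazard ratio, unweighted:", round(math.exp(beta_unweighted), 3))
print("hazard ratio, weighted:  ", round(math.exp(beta_weighted), 3))
```

A hazard ratio below 1 indicates lower mortality among treated patients in the toy data. In practice, an established survival analysis library would be preferable to a hand-written optimizer.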
Results
We used a total of 919 potential confounders. Besides the 26 confounders listed in
Table 1 of the main text, for which we had an a priori expectation of their association with
survival, we identified an additional set of 892 other characteristics that were associated with
survival in our dataset. These included diagnoses related to a hospital visit or discharge (160
variables), hospital treatments (82), outpatient prescriptions at 5-digit ATC chemical
subgroup level (154) and other medical services and products (496), all measured during the
12-month period before study inclusion. Because we included all confounders as 0/1
indicators, as actual quantities (e.g. the volume of a prescription), and as aggregated variables
(e.g. prescriptions at the 3-digit therapeutic subgroup level), the total number of variables was
2718. From this initial set of variables, 99 remained after applying Cox proportional hazards
regression to predict survival, using the lasso and 10-fold cross-validation.
[1] Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society B 1996; 58:267-288.
[2] van Houwelingen HC, Bruinsma T, Hart AAM, van't Veer LJ, Wessels LFA. Cross-Validated Cox Regression on Microarray Gene Expression Data. Statistics in Medicine 2006; 25(18):3201-3216.
[3] Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 2010; 33(1).
[4] Simon N, Friedman JH, Hastie T, Tibshirani R. Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software 2011; 39(5).
[5] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012. http://www.R-project.org/
[6] Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. Am J Epidemiol 2006; 163(12):1149-1156.
[7] Patrick AR, Schneeweiss S, Brookhart MA, Glynn RJ, Rothman KJ, Avorn J et al. The implications of propensity score variable selection strategies in pharmacoepidemiology: an empirical illustration. Pharmacoepidemiol Drug Saf 2011; 20(6):551-559.
[8] Friedman JH. Greedy function approximation: A gradient boosting machine. Annals of Statistics 2001; 29(5):1189-1232.
[9] Friedman JH. Stochastic gradient boosting. Computational Statistics and Data Analysis 2002; 38(4):367-378.
[10] Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004; 23(19):2937-2960.
[11] Austin PC. The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. Stat Med 2010; 29(20):2137-2148.
[12] Funk MJ, Westreich D, Wiesen C, Sturmer T, Brookhart MA, Davidian M. Doubly robust estimation of causal effects. Am J Epidemiol 2011; 173(7):761-767.