A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases

Lawrence McCandless
lmccandl@sfu.ca
Faculty of Health Sciences, Simon Fraser University, Vancouver, Canada
Summer 2014

My Background

• I work on Bayesian methods for causal inference (epidemiology).
• I develop Bayesian methods to explore the effects of unmeasured confounding (sensitivity analysis).
• Application areas:
  • Pharmacoepidemiology
  • Mental health epidemiology
  • Causal inference with large administrative databases (e.g. health records)

Today’s Talk

• Causal Mediation Analysis
• Unmeasured Confounding
• Bayesian Methods

Outline

• Background: What is causal mediation analysis?
• Data Example: Mortality in criminal offenders using large administrative databases
• Partially Missing Confounders: Example of multiple imputation and Bayesian sensitivity analysis

What is Mediation Analysis?

In health research it is often necessary to disentangle the causal pathways that link exposure to disease. The goals of mediation analyses are to identify
• the total effect of the exposure on disease,
• the effect of the exposure that acts through a given set of intermediate variables (indirect effect), and
• the effect of the exposure unexplained by those same intermediate variables (direct effect).

Richiardi et al. Int J Epi (2013)

Mediation analysis in epidemiology

Mediation analysis concerns intermediate variables on the causal pathway between exposure and outcome.

Hafeman (2009) Int J Epidemiol

Example: Survival Analysis of Time-to-Death in Criminal Offenders

[Diagram: Addiction (exposure), Criminal Sentences (Log Rate) (mediator) and Death (outcome), with covariates Health, Gender, Age and Mental Illness]

Data Source: Ministry of Justice, Government of British Columbia, Canada.

How to Estimate Direct & Indirect Effects?

The traditional approach to mediation analysis is based on comparing two regression models for the outcome variable, one with and one without adjusting for the intermediate variable.
If adjustment for the intermediate variable greatly attenuates the exposure effect, then we conclude that the exposure effect is mediated primarily through the intermediate. This is the “Difference in Coefficients” approach described by Baron and Kenny in the 1980s.

Illustration of Baron & Kenny methods: “Product of coefficients” method

There is also a related “Product of coefficients” approach to mediation analysis.

Let $T$ denote time until death or censoring.
Let $X$ denote a dichotomous exposure variable.
Let $M$ denote a continuous intermediate variable.

In a mediation analysis, we write down a model for both the mediator and the outcome:

\[ P(T, M \mid X) = \underbrace{P(T \mid M, X)}_{\text{Outcome model}} \times \underbrace{P(M \mid X)}_{\text{Mediator model}} \]

Suppose that $T$ follows a proportional hazards model, and $M$ is continuous and normally distributed. Then we could use

1) a Weibull outcome model for $T$:

\[ h(T \mid X, M) = \exp(\beta_X X + \beta_M M) \times \lambda T^{\lambda - 1} \]

2) a linear regression model for $M$:

\[ M \mid X = \gamma_0 + \gamma_X X + \epsilon, \qquad \epsilon \sim N(0, \sigma^2). \]

The direct effect is $\beta_X$.
The indirect effect is $\gamma_X \times \beta_M$.

[Diagram: X → M (coefficient $\gamma_X$) and M → T (coefficient $\beta_M$) form the indirect effect; X → T (coefficient $\beta_X$) is the direct effect]

The “product of coefficients” method is criticized because it is invalid for non-linear outcome models, and also invalid if there are interactions between exposure and mediator.

However, if the disease is rare and there are no interactions, then it approximates the Natural/Controlled Direct and Natural Indirect Effects.

VanderWeele (2013) Epidemiol shows:

\[ \log HR^{NDE} = \beta_X + \ldots \]
\[ \log HR^{NIE} = \beta_M \times \gamma_X + \ldots \]
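As a rough illustration of the product-of-coefficients arithmetic, the sketch below exponentiates the leading terms of the two formulas above to get approximate hazard ratios. The function name is hypothetical, the coefficient values are the rounded estimates shown on the results slides later in this talk, and the remaining "+ ..." terms are simply dropped, which is an assumption of this sketch rather than part of the published method.

```python
import math


def product_of_coefficients(beta_x, beta_m, gamma_x):
    """Leading-order effects on the log-hazard scale (hypothetical helper).

    Assumes a rare outcome and no exposure-mediator interaction, and
    ignores the higher-order "+ ..." terms of the full formulas.
    """
    direct = beta_x                # log HR, natural direct effect
    indirect = gamma_x * beta_m    # log HR, natural indirect effect
    total = direct + indirect      # effects add on the log scale
    return {"direct": direct, "indirect": indirect, "total": total}


# Rounded coefficients as reported on the results slide.
effects = product_of_coefficients(beta_x=0.17, beta_m=1.51, gamma_x=0.23)

# Exponentiate to move from log hazard ratios to hazard ratios.
hazard_ratios = {name: math.exp(value) for name, value in effects.items()}

for name in ("direct", "indirect", "total"):
    print(f"{name:>8}: log HR = {effects[name]:.4f}, HR = {hazard_ratios[name]:.2f}")
```

With these rounded inputs the sketch gives hazard ratios of about 1.19, 1.42 and 1.68. These are close to, but not identical with, the adjusted estimates reported in the talk, which presumably come from unrounded coefficients and the full formulas.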
Mediation Analysis Results

Characteristic                               Number (%) or Mean (n=79088)
Outcome:    Death                            1841 (2.3%)
Exposure:   Addiction                        11673 (14%)
Mediator:   Sentencing rate (sentences/yr)   1 per 2 yrs
Covariates: Female                           15453 (20%)
            Age <25                          25433 (32%)
            Age 25-44                        29623 (38%)
            Age >40                          24032 (30%)

20+ other covariates: race/ethnicity; education; mental illness; health services use; hospitalization; disability; type of criminal offense.

Mediation Analysis Results

Hazard Ratio for Death*

            Direct Effect        Indirect Effect      Total Effect
            HR    95% CI         HR    95% CI         HR    95% CI
Addiction   1.20  (1.08-1.30)    1.40  (1.38-1.44)    1.68  (1.51-1.82)

* Adjusted for 20+ covariates. Calculated using the method of VanderWeele (2013) + bootstrap.

Conclusion: A large indirect effect. The association between addiction and mortality is mediated by high rates of criminal sentencing.

Mediation Analysis Results

The direct effect is $\beta_X = 0.17$.
The indirect effect is $\gamma_X \times \beta_M = 0.23 \times 1.51 = 0.34$.

[Diagram: X → M ($\gamma_X = 0.23$) and M → T ($\beta_M = 1.51$) form the indirect effect; X → T ($\beta_X = 0.17$) is the direct effect]

The Problem of Confounding

Unmeasured confounding can plague causal inferences in administrative databases.

The association between mediator and outcome is biased by criminogenic factors. High-risk offenders face problems with:
• Poverty
• Family
• Criminal Behavior
• Peers
• Mental Illness
• Cognition

This is called mediator-outcome confounding.

[Diagram: Cognition, Family, Criminal Behavior, Peers, Mental Illness and Poverty as confounders of the path from Criminal Sentences (Log Rate) to Death, with Addiction as the exposure]

Two Important Partially Missing Confounders: RNA scores

The Risk Need Assessment (RNA) score is a validated 21-question instrument that predicts re-offending.

                                % Missing   Labels
RNA score (Criminal History)    20.4%       1/0
RNA score (Behaviour)           20.4%       1/0

Example: High-risk offenders are more deprived, and consequently more likely to die.
→ The indirect effect is biased away from the null.

Diagnostics: Analysis of the Complete Data ONLY

Hazard Ratio for Death

            Direct Effect        Indirect Effect      Total Effect
            HR    95% CI         HR    95% CI         HR    95% CI
Addiction*  1.18  (1.07-1.29)    1.39  (1.35-1.43)    1.64  (1.47-1.81)
Addiction†  1.17  (1.04-1.26)    1.27  (1.24-1.30)    1.48  (1.30-1.61)

Calculated using the method of VanderWeele (2013) + bootstrap.
* Adjusted for 20+ covariates
† Adjusted for 20+ covariates and RNA scores

Conclusion: When we adjust for RNA scores, we see attenuation of the indirect effect.

Correlation Among Partially Missing Confounders in the Complete Data

A 2 × 2 table of the binary partially missing confounders, RNA score (Criminal History) and RNA score (Behaviour):

22912   19119
 6624   13343

The OR is 2.41 with 95% CI (2.32, 2.49).

To adjust for confounding, we require a model for the joint distribution of the 2 partially missing confounders.

Bayesian adjustment for partially missing confounders

Proposed method: Use Bayesian methods to average over the partially missing RNA scores. This is similar to multiple imputation.

Methodological challenges:
• We require a joint model for the missing confounders (challenging in high dimension).
• Bayesian MCMC computing is hard in large samples.
• The missing confounders are perhaps not missing at random (NMAR).
• The method can be combined with a Bayesian sensitivity analysis for other unmeasured confounders.

Bayesian adjustment for partially missing confounders

                      Symbol   Description
Outcome               T        Time until death or censoring
Exposure variable     X        Addiction
Mediating variable    M        Rate of criminal sentencing (log)
Covariates            C        Age, Sex, Measures of health status, ...
Covariates            U        RNA1, RNA2

Bayesian adjustment for 2 missing dichotomous confounders

We already have

\[ \underbrace{P(T \mid X, M, U, C)}_{\text{Outcome model}} \qquad \underbrace{P(M \mid X, U, C)}_{\text{Mediator model}} \]

Now we include

\[ P(U, C) \propto \exp\{\beta_{U_1} U_1 + \beta_{U_2} U_2 + \beta_{U_1,U_2} U_1 U_2 + \ldots\} \]

to give a full probability distribution for $P(T, M, U, C)$.

Bayesian Computation

We assign relatively noninformative prior distributions to the model parameters. For example,

\[ \beta_X, \beta_M, \beta_{U_1}, \beta_{U_2}, \beta_{C_1}, \ldots \sim N(0, 10^6) \]

In fact, because MCMC computation is so challenging in large samples, I update parameters by sampling from the distribution of the MLE using standard regression software (e.g. survreg(), lm(), glm()).

Bayesian Computation

Bayesian computation proceeds using MCMC in 2 iterative stages:
• Step 1: Draw imputations. Sample $U$ from

\[ P(U \mid T, X, M, C) \propto P(T \mid X, M, U, C) \, P(M \mid X, U, C) \, P(U, C) \]

• Step 2: Update parameters given the imputations.

Step 1 can be done analytically, but is challenging for high-dimensional $U$.
Step 2 can be approximated using standard regression software.

Mediation Analysis Results

Hazard Ratio for Death

            Direct Effect        Indirect Effect      Total Effect
            HR    95% CI         HR    95% CI         HR    95% CI
Addiction*  1.20  (1.08-1.30)    1.40  (1.38-1.44)    1.68  (1.51-1.82)
Addiction†  1.20  (1.10-1.30)    1.29  (1.27-1.32)    1.55  (1.40-1.67)

* Ignoring missing data; method of VanderWeele (2013) + bootstrap
† Bayesian adjustment for partially missing confounders

Conclusion

There are important partially missing confounders that we can control for using Bayesian methods.

Note that the complete case analysis produces almost identical answers to the more complex method.

Additional issues: a quote from Kropko, Goodrich, Gelman and Hill (2014), “Joint vs Conditional Approaches to MI”.

The Bayesian approach is useful for exploring sensitivity to unmeasured or partially measured confounders. We can model the confounder using a missing data model, and incorporate prior information about the confounder from external data.

This is very relevant to the analysis of large administrative databases, which have large sample sizes.

More generally, Bayesian mediation analysis is an exciting new area of innovation in biostatistics.

Thank You!

References:

Daniels et al.
(2012) Bayesian inference for the causal effect of mediation. Biometrics.
McCandless LC, Richardson S, Best N. (2012) Adjustment for missing confounders using external validation data and propensity scores. Journal of the American Statistical Association 107:40-51.
McCandless LC, Gustafson P, Levy AR, Richardson S. (2012) Hierarchical priors for bias parameters in Bayesian sensitivity analysis for unmeasured confounding. Statistics in Medicine 31:383-96.
McCandless LC, Gustafson P, Levy AR. (2007) Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine 26:2331-47.
VanderWeele (2011) Causal mediation analysis with survival data. Epidemiology.
Lange, Vansteelandt (2012) A simple unified approach to estimating natural direct and indirect effects. Am J Epidemiol.
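Supplementary sketch: the imputation draw in Step 1 of the Bayesian Computation slides can be illustrated for a single offender with two binary confounders by enumerating the four possible values of (U1, U2). This is a minimal sketch under stated assumptions, not the implementation used in the talk: every parameter value and function name below is hypothetical, the covariates C are left out for brevity, and Step 2 (refitting the outcome, mediator and confounder models with standard regression software) is not shown.

```python
import math
import random

# Hypothetical parameter values, for illustration only (not estimates from the talk).
PARAMS = {
    "beta_x": 0.17, "beta_m": 1.51, "beta_u": (0.3, 0.2),  # outcome model
    "shape": 1.2,                                           # Weibull shape (lambda)
    "gamma_0": -0.5, "gamma_x": 0.23, "gamma_u": (0.4, 0.1),
    "sigma": 1.0,                                           # mediator model
    "alpha": (-0.8, -0.4, math.log(2.41)),                  # log-linear model for (U1, U2)
}
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def log_outcome(t, died, x, m, u, p):
    """Weibull proportional-hazards log-likelihood for one person."""
    lin = p["beta_x"] * x + p["beta_m"] * m
    lin += sum(b * ui for b, ui in zip(p["beta_u"], u))
    log_hazard = lin + math.log(p["shape"]) + (p["shape"] - 1) * math.log(t)
    cum_hazard = math.exp(lin + p["shape"] * math.log(t))
    # Deaths contribute hazard x survival; censored times contribute survival only.
    return died * log_hazard - cum_hazard


def log_mediator(m, x, u, p):
    """Normal log-density of the mediator given exposure and confounders."""
    mean = p["gamma_0"] + p["gamma_x"] * x
    mean += sum(g * ui for g, ui in zip(p["gamma_u"], u))
    z = (m - mean) / p["sigma"]
    return -0.5 * z * z - math.log(p["sigma"] * math.sqrt(2 * math.pi))


def log_confounder(u, p):
    """Unnormalised log-linear model for the two binary confounders."""
    a1, a2, a12 = p["alpha"]
    return a1 * u[0] + a2 * u[1] + a12 * u[0] * u[1]


def imputation_probs(t, died, x, m, p):
    """P(U | T, X, M), obtained by normalising over the four cells."""
    logs = [log_outcome(t, died, x, m, u, p) + log_mediator(m, x, u, p)
            + log_confounder(u, p) for u in CELLS]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {u: w / total for u, w in zip(CELLS, weights)}


def draw_imputation(t, died, x, m, p, rng):
    """Sample one (U1, U2) pair from its conditional distribution."""
    probs = imputation_probs(t, died, x, m, p)
    return rng.choices(CELLS, weights=[probs[u] for u in CELLS], k=1)[0]


rng = random.Random(2014)
probs = imputation_probs(t=3.0, died=1, x=1, m=-0.7, p=PARAMS)
draw = draw_imputation(t=3.0, died=1, x=1, m=-0.7, p=PARAMS, rng=rng)
print(probs)
print(draw)
```

Enumeration is feasible here because two binary confounders give only four cells. With many missing confounders the number of cells grows exponentially, which is the high-dimensional difficulty noted for Step 1.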