A Bayesian Perspective on Unmeasured Confounding in Large Administrative Databases Lawrence McCandless

lmccandl@sfu.ca
Faculty of Health Sciences, Simon Fraser University, Vancouver, Canada
Summer 2014
My Background
• I work on Bayesian methods for causal inference
(epidemiology).
• I develop Bayesian methods to explore the effects of
unmeasured confounding (sensitivity analysis).
• Application areas:
• Pharmacoepidemiology
• Mental health epidemiology
• Causal inference with large administrative databases (e.g.
health records)
Today’s Talk
Causal Mediation Analysis
Unmeasured Confounding
Bayesian Methods
Outline
• Background: What is causal mediation analysis?
• Data Example: Mortality in criminal offenders using large
administrative databases
• Partially Missing Confounders: Example of multiple
imputation and Bayesian sensitivity analysis
What is Mediation Analysis?
In health research it is often necessary to disentangle the
causal pathways that link exposure to disease.
The goals of mediation analyses are to identify
• the total effect of the exposure on disease,
• the effect of the exposure that acts through a given set of
intermediate variables (indirect effect), and
• the effect of the exposure unexplained by those same
intermediate variables (direct effect).
Richiardi et al. Int J Epi (2013)
Mediation analysis in epidemiology
Mediation analysis concerns intermediate variables on the
causal pathway between exposure and outcome
Hafeman (2009) Int J Epidemiol
Example: Survival Analysis of Time-to-Death in
Criminal Offenders
[Figure: causal diagram linking Addiction, Criminal Sentences (Log Rate), and Death, with Health, Gender, Age, and Mental Illness as additional nodes.]
Data Source: Ministry of Justice, Government of British Columbia, Canada.
How to Estimate Direct & Indirect Effects?
The traditional approach to mediation analysis is based on
comparing two regression models for the outcome variable,
one with and one without adjusting for the intermediate
variable.
If adjustment for the intermediate variable greatly attenuates
the exposure effect, then we conclude that the exposure effect
is mediated primarily through the intermediate.
This is the “difference in coefficients” approach described by
Baron and Kenny in the 1980s.
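The comparison of two regression models can be sketched as follows. This is a minimal illustration that assumes a linear outcome fitted by least squares (the talk's outcome model is a survival model, so this is a simplification), with simulated data and hypothetical coefficient values. In this linear special case the difference in coefficients equals the product of coefficients exactly within the sample.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

random.seed(1)
n = 2000
# Simulated data with hypothetical effect sizes (not taken from the talk).
x = [float(random.random() < 0.3) for _ in range(n)]             # binary exposure
m = [0.5 * xi + random.gauss(0, 1) for xi in x]                  # mediator
y = [0.2 * xi + 0.8 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

total = ols([x], y)[1]             # exposure coefficient, mediator omitted
adjusted = ols([x, m], y)          # outcome model adjusted for the mediator
direct, b_m = adjusted[1], adjusted[2]
a_x = ols([x], m)[1]               # exposure coefficient in the mediator model

print(f"difference in coefficients: {total - direct:.4f}")
print(f"product of coefficients:    {a_x * b_m:.4f}")
```

The agreement between the two printed numbers is an algebraic property of least squares and does not carry over to non-linear outcome models.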
Illustration of Baron & Kenny methods
“Product of coefficients” method
There also is a related “Product of coefficients” approach to
mediation analysis.
Let T denote time until death or censoring.
Let X denote a dichotomous exposure variable.
Let M denote a continuous intermediate variable.
In a mediation analysis, we write down a model for both the
mediator and outcome:
\[ P(T, M \mid X) = \underbrace{P(T \mid M, X)}_{\text{Outcome Model}} \times \underbrace{P(M \mid X)}_{\text{Mediator Model}} \]
Illustration of Baron & Kenny method
“Product of coefficients” method
Suppose that T follows a proportional hazards model, and M is
continuous and normally distributed.
Then we could use
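1) Weibull outcome model for T:
\[ h(T \mid X, M) = \exp(\beta_X X + \beta_M M) \times \lambda T^{\lambda - 1} \]
2) Linear regression model for M:
\[ M \mid X = \gamma_0 + \gamma_X X + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma^2). \]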
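As a rough sketch of how these two models fit together, the following code simulates a mediator and a survival time for exposed and unexposed subjects. All parameter values are hypothetical and chosen only for illustration, and censoring is ignored. The survival time is drawn by inverting the survival function implied by the Weibull hazard.

```python
import math
import random
import statistics

random.seed(2014)

# Hypothetical parameter values (assumptions, not estimates from the talk).
beta_x, beta_m = 0.2, 0.5                  # outcome model coefficients
lam = 1.5                                  # Weibull shape parameter (lambda)
gamma_0, gamma_x, sigma = 0.0, 0.3, 1.0    # mediator model parameters

def draw_subject(x):
    """Draw (M, T) for exposure x from the mediator and outcome models."""
    m = gamma_0 + gamma_x * x + random.gauss(0, sigma)
    # The hazard exp(eta) * lam * t**(lam - 1) implies the survival function
    # S(t) = exp(-exp(eta) * t**lam), so T is drawn by inverting S at a uniform.
    eta = beta_x * x + beta_m * m
    u = 1.0 - random.random()              # uniform on (0, 1]
    t = (-math.log(u) / math.exp(eta)) ** (1.0 / lam)
    return m, t

unexposed = [draw_subject(0) for _ in range(5000)]
exposed = [draw_subject(1) for _ in range(5000)]
median_unexposed = statistics.median(t for _, t in unexposed)
median_exposed = statistics.median(t for _, t in exposed)

print(f"median survival, unexposed: {median_unexposed:.2f}")
print(f"median survival, exposed:   {median_exposed:.2f}")
```

With positive coefficients, exposure shortens survival both directly and by raising the mediator, so the exposed group has the smaller median.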
Illustration of Baron & Kenny method
“Product of coefficients” method
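The direct effect is \(\beta_X\).
The indirect effect is \(\gamma_X \times \beta_M\).
[Figure: path diagram. Indirect effect: X → M (coefficient \(\gamma_X\)) → T (coefficient \(\beta_M\)). Direct effect: X → T (coefficient \(\beta_X\)).]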
Illustration of Baron & Kenny method
The “product of coefficients” method is criticized because it is
invalid for non-linear outcome models, and also invalid if there
are interactions between exposure and mediator
However, if the disease is rare and there are no interactions,
then it approximates the Natural/Controlled Direct and
Natural Indirect Effects.
VanderWeele (2013), Epidemiology, shows:
\[ \log HR^{NDE} = \beta_X + \dots \]
\[ \log HR^{NIE} = \beta_M \times \gamma_X + \dots \]
Mediation Analysis Results
Characteristic                        Number (%) or Mean (n=79088)
Outcome
  Death                               1841 (2.3%)
Exposure
  Addiction                           11673 (14%)
Mediator
  Sentencing rate (sentences/yr)      1 per 2 yrs
Covariates
  Female                              15453 (20%)
  Age <25                             25433 (32%)
  Age 25-44                           29623 (38%)
  Age >40                             24032 (30%)
20+ other covariates: race/ethnicity; education; mental
illness; health services use; hospitalization; disability; type of
criminal offense.
Mediation Analysis Results
Hazard Ratio for Death*
             Direct Effect       Indirect Effect     Total Effect
             HR (95% CI)         HR (95% CI)         HR (95% CI)
Addiction    1.20 (1.08-1.30)    1.40 (1.38-1.44)    1.68 (1.51-1.82)
* Adjusted for 20+ covariates. Calculated using the method of VanderWeele (2013) + bootstrap.
Conclusion: A large indirect effect.
The association between addiction and mortality is mediated by
high rates of criminal sentencing.
Mediation Analysis Results
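The direct effect is \(\beta_X = 0.17\).
The indirect effect is \(\gamma_X \times \beta_M = 0.23 \times 1.51 = 0.34\).
[Figure: path diagram. Indirect effect: X → M (\(\gamma_X = 0.23\)) → T (\(\beta_M = 1.51\)). Direct effect: X → T (\(\beta_X = 0.17\)).]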
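The coefficients on this slide are on the log hazard ratio scale, so exponentiating them gives hazard ratios that can be set against the results table. The short calculation below is a sketch under the "product of coefficients" approximation, using only the rounded values shown on the slide. Because those inputs are rounded, the results agree with the reported hazard ratios (1.20, 1.40, 1.68) only approximately.

```python
import math

beta_x = 0.17    # direct effect, log hazard ratio scale (from the slide)
gamma_x = 0.23   # exposure coefficient in the mediator model (from the slide)
beta_m = 1.51    # mediator coefficient in the outcome model (from the slide)

log_hr_indirect = gamma_x * beta_m
hr_direct = math.exp(beta_x)
hr_indirect = math.exp(log_hr_indirect)
# Direct and indirect effects add on the log scale, so they multiply as HRs.
hr_total = math.exp(beta_x + log_hr_indirect)

print(f"direct HR   = {hr_direct:.2f}")
print(f"indirect HR = {hr_indirect:.2f}")
print(f"total HR    = {hr_total:.2f}")
```

The same multiplicative relationship can be seen in the reported table, where 1.20 × 1.40 = 1.68.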
The Problem of Confounding
Unmeasured confounding can plague causal inferences in
administrative databases.
The association between mediator and outcome is biased by
criminogenic factors.
High-risk offenders face problems with:
• Poverty
• Family Criminal Behavior
• Peers
• Mental Illness
• Cognition
The Problem of Confounding
This is called Mediator-Outcome confounding
[Figure: causal diagram in which Cognition, Family Criminal Behavior, Peers, Mental Illness, and Poverty confound the association between Criminal Sentences (Log Rate) and Death; Addiction is the exposure.]
Two Important Partially Missing Confounders
RNA scores
The Risk Need Assessment (RNA) score is a validated
21-question instrument that predicts re-offending.

Confounder                      % Missing    Labels
RNA score (Criminal History)    20.4%        1/0
RNA score (Behaviour)           20.4%        1/0

Example: High-risk offenders are more deprived, and
consequently more likely to die.
→ The indirect effect is biased away from the null.
Diagnostics:
Analysis of the Complete Data ONLY
Hazard Ratio for Death
              Direct Effect       Indirect Effect     Total Effect
              HR (95% CI)         HR (95% CI)         HR (95% CI)
Addiction*    1.18 (1.07-1.29)    1.39 (1.35-1.43)    1.64 (1.47-1.81)
Addiction†    1.17 (1.04-1.26)    1.27 (1.24-1.30)    1.48 (1.30-1.61)
* Adjusted for 20+ covariates
† Adjusted for 20+ covariates and RNA scores
Calculated using the method of VanderWeele (2013) + bootstrap
Conclusion: When we adjust for RNA scores, we see
attenuation of the indirect effect.
Correlation Among Partially Missing Confounders
in the complete data
A 2 × 2 table of the binary missing confounders, RNA score (Criminal History) by RNA score (Behaviour):

22912    19119
 6624    13343

The OR is 2.41 with 95% CI (2.32, 2.49).
To adjust for confounding, we require a model for the joint
distribution of the 2 partially missing confounders.
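The odds ratio can be checked directly from the four cell counts. The sketch below uses a Wald interval on the log odds ratio, which is an assumption: the slide does not say how its interval was computed, so the limits may differ slightly in the second decimal place. The odds ratio itself does not depend on which score indexes the rows.

```python
import math

# Cell counts as they appear on the slide (row and column labels were lost).
a, b = 22912, 19119
c, d = 6624, 13343

odds_ratio = (a * d) / (b * c)
# Wald standard error of the log odds ratio (assumed interval method).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```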
Bayesian adjustment for partially missing
confounders
Proposed method:
Use Bayesian methods to average over partially missing RNA
scores. Similar to multiple imputation.
Methodological challenges:
• We require a joint model for missing confounders
(challenging in high dimension)
• Bayesian MCMC computing is hard in large samples
• The missing confounders may be not missing at random
(NMAR)
• Can be combined with a Bayesian sensitivity analysis for
other unmeasured confounders.
Bayesian adjustment for partially missing
confounders
Role                  Symbol    Description
Outcome               T         Time until death or censoring
Exposure variable     X         Addiction
Mediating variable    M         Rate of criminal sentencing (log)
Covariates            C         Age, Sex, Measures of health status, ...
Covariates            U         RNA1, RNA2
Bayesian adjustment for 2 missing dichotomous
confounders
We already have
\[ \underbrace{P(T \mid X, M, U, C)}_{\text{Outcome Model}} \]
\[ \underbrace{P(M \mid X, U, C)}_{\text{Mediator Model}} \]
Now we include
\[ P(U, C) \propto \exp\{\beta_{U_1} U_1 + \beta_{U_2} U_2 + \beta_{U_1,U_2} U_1 U_2 + \dots\} \]
to give a full probability distribution for P(T, M, U, C).
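To see what the log-linear form implies for two binary confounders, the sketch below normalises the exponential terms over the four possible values of (U1, U2). It keeps only the three terms written out on the slide and ignores the elided terms involving C, which is a simplification. The main-effect coefficients are hypothetical; the interaction is set to log(2.41) purely to echo the odds ratio observed in the complete data.

```python
import math
from itertools import product

# Hypothetical main effects; the interaction mirrors the observed OR of 2.41.
beta_u1, beta_u2 = -0.5, -1.0
beta_u1u2 = math.log(2.41)

def unnormalised(u1, u2):
    """Unnormalised probability of the cell (u1, u2) under the log-linear model."""
    return math.exp(beta_u1 * u1 + beta_u2 * u2 + beta_u1u2 * u1 * u2)

cells = list(product((0, 1), repeat=2))
norm = sum(unnormalised(u1, u2) for u1, u2 in cells)
prob = {cell: unnormalised(*cell) / norm for cell in cells}

# The odds ratio between U1 and U2 equals exp(beta_u1u2) in this model.
odds_ratio = (prob[1, 1] * prob[0, 0]) / (prob[1, 0] * prob[0, 1])

for cell in cells:
    print(cell, f"{prob[cell]:.3f}")
print(f"implied odds ratio: {odds_ratio:.2f}")
```

The interaction coefficient therefore carries the dependence between the two confounders, which is why a joint model rather than two separate marginal models is needed.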
Bayesian Computation
We assign relatively noninformative prior distributions to model
parameters
For example,
\[ \beta_X, \beta_M, \beta_{U_1}, \beta_{U_2}, \beta_{C_1}, \dots \sim N(0, 10^6) \]
In fact, because MCMC computation is so challenging in large
samples, I update parameters by sampling from the distribution of
the MLE using standard regression software
(e.g. survreg(), lm(), glm()).
Bayesian Computation
Bayesian computation proceeds using MCMC in 2 iterative
stages:
• Step 1: Draw imputations.
Sample U from
\[ P(U \mid T, X, M, C) \propto P(T \mid X, M, U, C)\, P(M \mid X, U, C)\, P(U, C) \]
• Step 2: Update parameters given imputations.
Step 1 can be done analytically, but is challenging for
high-dimensional U.
Step 2 can be approximated using standard regression software.
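The two-stage scheme can be illustrated on a deliberately small toy problem. The sketch below is not the model from the talk: it assumes a single binary confounder U that is missing completely at random for about 20% of subjects, and a single continuous variable M with known unit variance whose mean depends on U. Step 1 imputes each missing U from its conditional distribution, and Step 2 refits by maximum likelihood and draws parameters from the approximate normal sampling distribution of the MLE.

```python
import math
import random

random.seed(7)

# Simulated toy data (all values hypothetical).
n = 2000
true_p, true_mu = 0.4, (0.0, 1.0)
u_full = [int(random.random() < true_p) for _ in range(n)]
m = [random.gauss(true_mu[ui], 1.0) for ui in u_full]
u = [None if random.random() < 0.2 else ui for ui in u_full]   # ~20% missing

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# Starting values from the complete cases.
obs = [(ui, mi) for ui, mi in zip(u, m) if ui is not None]
p = sum(ui for ui, _ in obs) / len(obs)
mu = [sum(mi for ui, mi in obs if ui == k) / sum(1 for ui, _ in obs if ui == k)
      for k in (0, 1)]

draws = []
for iteration in range(200):
    # Step 1: impute each missing U from P(U | M), proportional to P(M | U) P(U).
    completed = []
    for ui, mi in zip(u, m):
        if ui is None:
            w1 = p * normal_pdf(mi - mu[1])
            w0 = (1 - p) * normal_pdf(mi - mu[0])
            ui = int(random.random() < w1 / (w0 + w1))
        completed.append(ui)
    # Step 2: refit by maximum likelihood on the completed data, then draw the
    # parameters from the approximate normal sampling distribution of the MLE.
    n1 = sum(completed)
    counts = (n - n1, n1)
    p_hat = n1 / n
    p = random.gauss(p_hat, math.sqrt(p_hat * (1 - p_hat) / n))
    for k in (0, 1):
        mean_k = sum(mi for ci, mi in zip(completed, m) if ci == k) / counts[k]
        mu[k] = random.gauss(mean_k, 1.0 / math.sqrt(counts[k]))
    if iteration >= 50:                       # discard burn-in draws
        draws.append(mu[1] - mu[0])

posterior_mean = sum(draws) / len(draws)
print(f"posterior mean of mu1 - mu0: {posterior_mean:.2f}")
```

In the talk's setting the refitting in Step 2 would be done with regression software such as survreg(), lm(), and glm(); the group means here simply play that role for the toy model.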
Mediation Analysis Results
Hazard Ratio for Death
              Direct Effect       Indirect Effect     Total Effect
              HR (95% CI)         HR (95% CI)         HR (95% CI)
Addiction*    1.20 (1.08-1.30)    1.40 (1.38-1.44)    1.68 (1.51-1.82)
Addiction†    1.20 (1.10-1.30)    1.29 (1.27-1.32)    1.55 (1.40-1.67)
* Ignoring missing data; method of VanderWeele (2013) + bootstrap
† Bayesian adjustment for partially missing confounders
Conclusion
There are important partially missing confounders that we can
control for using Bayesian methods.
Note that the complete case analysis produces almost identical
answers to the more complex method.
Conclusion
Additional issues:
A quote from Kropko, Goodrich, Gelman and Hill (2014),
“Joint vs Conditional Approaches to MI”.
Conclusion
A Bayesian approach is useful to explore sensitivity to
unmeasured or partially measured confounders.
We can model the confounder using a missing data model, and
incorporate prior information about the confounder from
external data.
Very relevant to analysis of large administrative databases,
which have large sample sizes.
More generally, Bayesian mediation analysis is an exciting new
area of innovation in biostatistics.
Thank You!
References:
Daniels et al. (2012) Bayesian inference for the causal effect of mediation. Biometrics.
McCandless LC, Richardson S, Best N. (2012) Adjustment for missing confounders using external validation data and propensity scores. Journal of the American Statistical Association 107:40-51.
McCandless LC, Gustafson P, Levy AR, Richardson S. (2012) Hierarchical priors for bias parameters in Bayesian sensitivity analysis for unmeasured confounding. Statistics in Medicine 31:383-96.
McCandless LC, Gustafson P, Levy AR. (2007) Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine 26:2331-47.
VanderWeele (2011) Causal mediation analysis with survival data. Epidemiology.
Lange, Vansteelandt (2012) A simple unified approach to estimating natural direct and indirect effects. Am J Epidemiol.