Sensitivity Analysis with Several Unmeasured Confounders
Lawrence McCandless

lmccandl@sfu.ca
Faculty of Health Sciences, Simon Fraser University, Vancouver Canada
Spring 2015
Outline
The problem of several unmeasured confounders
• Background: Sensitivity analysis for a single binary
unmeasured confounder.
• New methodology: Sensitivity analysis for several
unmeasured confounders.
• The role of Bayesian inference: We assign probability
distributions to sensitivity parameters.
Background
Binary Unmeasured Confounders
To explore sensitivity to unmeasured confounding, researchers
often assume that there is one unmeasured confounder
(typically binary).
This approach dates back to Rosenbaum and Rubin (1983),
Lin, Psaty and Kronmal (1998), and others.
The assumption of a single binary unmeasured confounder is
appealing because it
• is mathematically tractable
• leads to simple bias-adjustment formulas
• is low dimensional, with few bias-parameter inputs
• is easy to explain, interpret and implement
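To make the "simple bias-adjustment formulas" concrete, here is a minimal R sketch for a single binary U, assuming no interaction between U and the exposure and U independent of the measured covariates given X. For a risk ratio it uses the standard external-adjustment form RR_adj = RR_obs x [1 + (gamma - 1) p0] / [1 + (gamma - 1) p1]; for a mean difference it subtracts delta (p1 - p0). The function names and input values are hypothetical illustrations, not taken from the slides.

```r
# Bias adjustment for a single binary unmeasured confounder U (illustrative sketch).
# gamma: assumed U-outcome risk ratio
# p1, p0: assumed prevalence of U among the exposed / unexposed
adjust_rr_binary_u <- function(rr_obs, gamma, p1, p0) {
  rr_obs * (1 + (gamma - 1) * p0) / (1 + (gamma - 1) * p1)
}

# Linear outcome: delta is the assumed mean difference in Y between U = 1 and U = 0.
adjust_diff_binary_u <- function(diff_obs, delta, p1, p0) {
  diff_obs - delta * (p1 - p0)
}

# Observed RR of 1.5; U doubles the risk and is more common among the exposed.
adjust_rr_binary_u(1.5, gamma = 2, p1 = 0.4, p0 = 0.2)  # about 1.286
adjust_diff_binary_u(2, delta = 3, p1 = 0.4, p0 = 0.2)  # about 1.4
```

Setting gamma = 1 (or delta = 0), or p1 = p0, returns the observed estimate unchanged, which is the "no confounding by U" case.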
Background
Unmeasured Confounding in the Real World
In reality, there are often several unmeasured confounders, and
there is little guidance in the statistical literature on how to
proceed.
Examples:
Statins and lower fracture risk in the elderly
Unmeasured confounders: health-seeking behaviors, BMI,
physical activity, smoking, alcohol consumption
Lead exposure in childhood and lower IQ
Unmeasured confounders: pesticide exposure, breastfeeding,
poor parenting, maternal depression, iron deficiency, tobacco
exposure, poverty, and pica.
The Problem of Several Unmeasured Confounders
There are no methods available to explore sensitivity to
several (say 5) unmeasured confounders.
Method input:
“Please list your 5 confounders and their properties”
Method output:
Bias-corrected point and interval estimates for causal effects.
However, see below for other related methods.
Why does the problem of several unmeasured
confounders remain unsolved?
The challenges:
1. We need to specify the relationships between (i) the
confounders and the outcome and (ii) the confounders and the
exposure. (High dimensional).
2. The confounders may be continuous, nominal or ordinal
(e.g. race or smoking). (More dimensions).
3. The unmeasured confounders may be correlated with one
another. (Less important)
4. The unmeasured confounders may be correlated with the
measured covariates. (More important)
5. The confounders may interact with exposure. (Less
important)
Why does the problem of several unmeasured
confounders remain unsolved?
More challenges:
6. Even if the laundry list of parameters is specified, we
must still integrate U out of the model.
This integration is often not available analytically, except in
simple cases (like in Lin et al.). Custom statistical software
is required.
7. The inference is fundamentally Bayesian because there is
uncertainty in the bias parameters.
8. We require content area expertise because the problem is
inherently qualitative. Statisticians often do not have this
expertise.
Methods for sensitivity analysis for several
unmeasured confounders
• Greenland (2005) JRSSA, treats U as a compound of
other variables, or “sufficient summary”
• Rosenbaum (2002), Departures from random assignment
by factor Γ.
• McCandless (2012) JASA, uses validation data.
• Hsu (2013) Biometrics, calibrated sensitivity analysis.
• Deng (2013) Biometrika, Cornfield conditions.
• MacLehose & Kaufman (2005) Epidemiol, linear
programming.
• Brumback (2004) Stat Med, sensitivity analysis based on
potential outcomes.
• VanderWeele & Arah (2011) Epidemiol, formulas for
general scalar U.
Greenland 2005 JRSSA
Lin et al. (1998) Biometrics
A New Methodology
Consider a prospective cohort study with equal follow-up,
where
• Y is a continuous outcome measure
• X is a dichotomous 0/1 exposure at baseline
• C is a p × 1 vector of measured covariates
• U is a q × 1 vector of unmeasured confounders that are
quantitative and continuous
A New Methodology
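Building on Lin et al. (1998), factorise
$P(Y, U \mid X, C) = P(Y \mid X, C, U)\,P(U \mid X, C)$, and write
$$Y \mid X, C, U = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j + \epsilon,$$
$$U \mid X, C \sim \mathrm{MVN}\left((\alpha_0 + \alpha_X X)\,\mathbf{1},\ \Sigma_\rho\right),$$
where $\mathbf{1}$ is a q-vector of 1's, and $\Sigma_\rho$ is the $q \times q$ covariance matrix
$$\Sigma_\rho = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$
with diagonal elements 1 and off-diagonal elements ρ (compound
symmetric).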
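As a quick check on the structure of Σρ, the R sketch below (hypothetical helper `make_sigma_rho`) builds the compound-symmetric matrix and inspects its eigenvalues. These are 1 + (q - 1)ρ once and 1 - ρ repeated q - 1 times, so Σρ is a valid covariance matrix only for -1/(q - 1) < ρ < 1. The values q = 5 and ρ = 0.3 are arbitrary.

```r
# Build the q x q compound-symmetric matrix Sigma_rho:
# ones on the diagonal, rho in every off-diagonal position.
make_sigma_rho <- function(q, rho) {
  S <- matrix(rho, nrow = q, ncol = q)
  diag(S) <- 1
  S
}

S <- make_sigma_rho(q = 5, rho = 0.3)

# Eigenvalues (returned in decreasing order): 1 + (q - 1) * rho once,
# then 1 - rho repeated q - 1 times.
ev <- eigen(S, symmetric = TRUE)$values
print(ev)

# A negative rho below -1/(q - 1) gives a negative eigenvalue (not a covariance matrix).
min(eigen(make_sigma_rho(5, -0.3), symmetric = TRUE)$values)
```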
Key Assumptions
I will call these the Duplicate Unmeasured Confounder (DUC)
assumptions.
1. U1 , U2 , . . . , Uq are equicorrelated with Y
2. U1 , U2 , . . . , Uq are equicorrelated with X
3. U1 , U2 , . . . , Uq are equicorrelated with one another
Other assumptions: Linearity; absence of interactions;
U ⊥⊥ C|X (zero correlation between measured and
unmeasured confounders); no measurement error, ...
A New Methodology
Sensitivity analysis for several unmeasured confounders
Define a new variable $U^* = \sum_{j=1}^{q} U_j \big/ \sqrt{\Theta}$, where
$$\Theta = \mathrm{Var}\Big(\sum_{j=1}^{q} U_j \,\Big|\, X, C\Big) = \mathbf{1}^T \Sigma_\rho \mathbf{1} = q\,(1 + \rho(q - 1)).$$
The quantity U* is the sum of U1, ..., Uq rescaled to have
unit variance (given X and C), and it is normally distributed.
The idea: we replace the vector U with the scalar U*, and we
are then within the general framework of Lin, Psaty &
Kronmal (1998).
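The variance formula for the sum can be verified directly. This R sketch (hypothetical names, arbitrary q = 5 and ρ = 0.3) computes Θ from the closed form and from the quadratic form 1ᵀΣρ1, then simulates U with mean zero (the variance of the sum does not depend on the mean) and checks that the rescaled U* has variance close to 1.

```r
# Theta = Var(sum_j U_j | X, C) = 1' Sigma_rho 1 = q (1 + rho (q - 1)).
theta_duc <- function(q, rho) q * (1 + rho * (q - 1))

q <- 5; rho <- 0.3
S <- matrix(rho, q, q); diag(S) <- 1
one <- rep(1, q)
theta_matrix <- drop(t(one) %*% S %*% one)  # quadratic form 1' Sigma_rho 1

# Simulate n draws of U ~ MVN(0, Sigma_rho) and form U* = sum(U) / sqrt(Theta).
set.seed(1)
n <- 1e5
U <- matrix(rnorm(n * q), n, q) %*% chol(S)  # rows have covariance Sigma_rho
U_star <- rowSums(U) / sqrt(theta_duc(q, rho))

c(theta_formula = theta_duc(q, rho), theta_matrix = theta_matrix,
  var_U_star = var(U_star))
```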
A New Methodology
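Therefore the original model
$$E(Y \mid X, C, U) = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j,$$
$$U \mid X, C \sim \mathrm{MVN}\left((\alpha_0 + \alpha_X X)\,\mathbf{1},\ \Sigma_\rho\right),$$
becomes the new model
$$E(Y \mid X, C, U^*) = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sqrt{\Theta}\, U^*,$$
$$U^* \mid X, C \sim N\left(q(\alpha_0 + \alpha_X X)/\sqrt{\Theta},\ 1\right),$$
which is embedded within the original framework of Lin,
Psaty & Kronmal (1998).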
Inference
To conduct a sensitivity analysis, we can use maximum
likelihood, calculated from the observed-data likelihood
$L(\cdot) = \prod_i P(Y_i \mid X_i, C_i)$, where
$$P(Y \mid X, C) = \int P(Y \mid X, U, C)\, P(U \mid X, C)\, dU.$$
Lin et al. (1998) show how to do the integration analytically for
Gaussian scalar U for linear, log-linear or logistic response Y,
so that we obtain, for example,
$$E(Y \mid X, C) = \beta_0 + q\beta_U\alpha_0 + \left[\beta_X + \beta_U\sqrt{\Theta} \times q\alpha_X/\sqrt{\Theta}\right] X + \beta_C^T C.$$
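In the linear-Gaussian case the integral can be checked numerically. The R sketch below integrates U* out with `integrate()` and compares the result with a normal density with mean β0 + qβUα0 + (βX + qβUαX)X and variance σ² + βU²Θ. It assumes a residual error ε ~ N(0, σ²), omits C (βC = 0), and uses arbitrary parameter values; it is an illustration, not Lin et al.'s software.

```r
# Illustrative parameter values (all arbitrary); sigma is an assumed residual SD.
q <- 5; rho <- 0.3
theta <- q * (1 + rho * (q - 1))
beta0 <- 1; betaX <- 0.5; betaU <- 0.2; alpha0 <- 0.1; alphaX <- 1; sigma <- 1

# P(Y = y | X = x) by numerically integrating U* out of P(Y | X, U*) P(U* | X).
dens_numeric <- function(y, x) {
  mu_star <- q * (alpha0 + alphaX * x) / sqrt(theta)  # mean of U* given X
  integrand <- function(u) {
    dnorm(y, mean = beta0 + betaX * x + betaU * sqrt(theta) * u, sd = sigma) *
      dnorm(u, mean = mu_star, sd = 1)
  }
  integrate(integrand, lower = -Inf, upper = Inf, rel.tol = 1e-10)$value
}

# Closed form: Y | X is normal with the marginal mean and variance below.
dens_closed <- function(y, x) {
  m <- beta0 + q * betaU * alpha0 + (betaX + q * betaU * alphaX) * x
  dnorm(y, mean = m, sd = sqrt(sigma^2 + betaU^2 * theta))
}

c(numeric = dens_numeric(2.3, 1), closed = dens_closed(2.3, 1))
```

The coefficient on x in `dens_closed` is βX + qβUαX, so the naive slope is shifted by qβUαX.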
A New Methodology
Therefore the bias on the causal effect parameter βX from q
unmeasured confounders under the DUC assumptions is equal
to
$$\text{Bias} = \beta_U\sqrt{\Theta} \times q\alpha_X/\sqrt{\Theta} = q\,\beta_U\alpha_X.$$
Lin et al. (1998), VanderWeele & Arah (2011)
A new methodology
Consequently, within our modelling framework:
1. The confounding bias from q unmeasured confounders is
equal to q× the confounding bias of a single Ui (thus bias
is additive).
2. The correlation among the U1 , . . . , Uq (which is ρ) does not
affect the magnitude of bias.
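Result 2 is surprising, but makes sense: ρ enters only through Θ, the variance of U1 + ... + Uq, which cancels in the bias formula; the bias depends only on the mean shift qαX of that sum between exposure groups.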
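Both results can be read off a direct computation of the bias formula. In this R sketch (hypothetical function `duc_bias`), Θ is computed for each ρ and then cancels, so with q = 10, βU = 1 and αX = 1 (the values used in the simulated dataset that follows) the bias is 10 for every ρ.

```r
# Confounding bias for beta_X under the DUC assumptions:
# beta_U * sqrt(Theta) * q * alpha_X / sqrt(Theta), which simplifies to q * beta_U * alpha_X.
duc_bias <- function(q, rho, beta_u, alpha_x) {
  theta <- q * (1 + rho * (q - 1))  # Var(sum_j U_j | X, C); depends on rho
  beta_u * sqrt(theta) * q * alpha_x / sqrt(theta)
}

# Same bias for every admissible rho: q * beta_U * alpha_X = 10.
sapply(c(0, 0.5, 0.999), function(r) duc_bias(q = 10, rho = r, beta_u = 1, alpha_x = 1))
```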
Demonstration with Numbers
One simulated dataset
correlation <- 0.999   ## Correlation among the Us
k <- 10                ## Dimension of U
sigma <- matrix(correlation, nrow = k, ncol = k); diag(sigma) <- 1  ## Compound-symmetric Sigma_rho
n <- 10000             ## Sample size
X <- rbinom(n, 1, 0.5) ## Binary exposure
U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)  ## Each U_j has mean X, pairwise correlation rho
Y <- rnorm(n, 0 * X + rowSums(U), 1)  ## True beta_X = 0, beta_U = 1
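To connect the simulation to the bias formula, the R sketch below wraps the same data-generating steps in a function (hypothetical name `simulate_naive_slope`) and fits the naive regression of Y on X for several values of ρ. Because the true βX is 0 and qβUαX = 10 x 1 x 1, the fitted slope should be close to 10 whatever the correlation; a larger ρ mainly inflates the residual variance and hence the standard error.

```r
# Same data-generating process as the simulated dataset, for a given correlation.
simulate_naive_slope <- function(correlation, k = 10, n = 10000) {
  sigma <- matrix(correlation, nrow = k, ncol = k); diag(sigma) <- 1
  X <- rbinom(n, 1, 0.5)
  U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)
  Y <- rnorm(n, 0 * X + rowSums(U), 1)
  unname(coef(lm(Y ~ X))["X"])  # naive estimate of beta_X, ignoring U
}

set.seed(2015)
slopes <- sapply(c(0, 0.5, 0.999), simulate_naive_slope)
print(slopes)  # each close to q * beta_U * alpha_X = 10
```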
[Figure slides, Demonstration with Numbers: one simulated dataset; ρ = 0.99999 (high correlation among unmeasured confounders); ρ = 0 (zero correlation among unmeasured confounders)]
Conclusion
Within our modelling framework:
1. The confounding bias from q unmeasured confounders is
equal to q× the confounding bias of a single Ui (thus bias
is additive)
2. The correlation among the U1 , . . . , Uq (which is ρ) does not
affect the magnitude of bias.
Questions:
1. How general are these findings?
2. How useful in practice?
3. What about correlation between measured and unmeasured
confounders?
Bias from Several Unmeasured Confounders
How general are these findings?
• What about binary outcomes and survival data?
• What if U1, ..., Uq are binary?
• What if Σρ is not compound symmetric?
• What about weakening the Duplicate Unmeasured
Confounder (DUC) assumption?
E.g.
$$Y = \beta_0 + \beta_X X + \beta_C^T C + \beta_{U_1} U_1 + \beta_{U_2} U_2 + \cdots + \beta_{U_q} U_q + \epsilon, \quad \text{where } \beta_{U_1}, \ldots, \beta_{U_q} \sim N(\mu, \sigma^2),$$
instead of
$$Y = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j + \epsilon.$$
Bias from Several Unmeasured Confounders
How useful in practice?
Rule of thumb (?):
If we assume DUC, then "k unmeasured confounders means k
times more bias"
... but is this always true?
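One way to probe the rule of thumb is the relaxed model with a separate coefficient βUj for each confounder. In the linear-Gaussian model above, with every Uj sharing the mean shift αX and U independent of C given X, the bias on βX is αX(βU1 + ... + βUq) for any ρ, so it equals q times the single-confounder bias exactly under DUC and on average when βUj ~ N(µ, σ²). The R sketch below checks this by simulation; µ, σ (called `s` in the code), q and αX are arbitrary illustrative values, and `bias_heterogeneous` is a hypothetical name.

```r
# Relaxed DUC: beta_Uj ~ N(mu, s^2), every U_j has the same mean shift alpha_x.
# In the linear model the bias on beta_X is alpha_x * sum(beta_Uj), for any rho.
bias_heterogeneous <- function(beta_u, alpha_x) alpha_x * sum(beta_u)

set.seed(3)
q <- 5; mu <- 0.4; s <- 0.2; alpha_x <- 1
draws <- replicate(20000, bias_heterogeneous(rnorm(q, mu, s), alpha_x))

# Average bias matches q * mu * alpha_x; its spread is sqrt(q) * s * |alpha_x|.
c(mean_bias = mean(draws), q_times_single = q * mu * alpha_x,
  sd_bias = sd(draws), sd_theory = sqrt(q) * s * abs(alpha_x))
```

So, in this model, "k times more bias" holds on average, while any particular set of βUj can give more or less than k times the bias of a typical single confounder.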
Brief Comment on the Role of Bayesian Statistics
The Bayesian approach is useful because it quantifies the
uncertainty about unmeasured confounding.
We assign a prior probability distribution to bias parameters.
Bayesian theorem gives posterior credible intervals that
incorporate uncertainty from unmeasured confounding.
The Bayesian approach is useful for obtaining simple summaries
in sensitivity analysis when there are multiple bias parameter
inputs.
McCandless et al. (2007) Stat Med
Gustafson & Greenland (2009) Statistical Science
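A simple way to see how prior uncertainty propagates is Monte Carlo sensitivity analysis, which is a cruder stand-in for, not a replica of, the Bayesian methods cited above: draw βU and αX from assumed priors, draw the naive estimate from its approximate sampling distribution, and subtract the DUC bias qβUαX. Every number in this R sketch (naive estimate, standard error, priors, q) is hypothetical.

```r
# Monte Carlo sensitivity analysis under the DUC bias formula (illustrative only).
set.seed(42)
n_draws <- 50000
beta_x_naive <- 1.2; se_naive <- 0.15; q <- 5      # hypothetical naive fit and q

beta_u  <- rnorm(n_draws, mean = 0.1, sd = 0.05)   # assumed prior on beta_U
alpha_x <- rnorm(n_draws, mean = 0.3, sd = 0.1)    # assumed prior on alpha_X

# Bias-corrected draws: sampling error of the naive estimate minus q * beta_U * alpha_X.
beta_x_adj <- rnorm(n_draws, beta_x_naive, se_naive) - q * beta_u * alpha_x

quantile(beta_x_adj, c(0.025, 0.5, 0.975))
```

The resulting interval is centred near 1.2 - 5 x 0.1 x 0.3 = 1.05 and is wider than the naive 95% interval, reflecting uncertainty about the bias parameters.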
Thank You