Sensitivity Analysis with Several Unmeasured Confounders

Lawrence McCandless (lmccandl@sfu.ca)
Faculty of Health Sciences, Simon Fraser University, Vancouver, Canada
Spring 2015

Outline
The problem of several unmeasured confounders
• Background: sensitivity analysis for a single binary unmeasured confounder.
• New methodology: sensitivity analysis for several unmeasured confounders.
• The role of Bayesian inference: we assign probability distributions to sensitivity parameters.

Background
Binary Unmeasured Confounders

To explore sensitivity to unmeasured confounding, researchers often assume that there is one unmeasured confounder (typically binary). This approach dates back to Rosenbaum and Rubin (1983), Lin, Psaty and Kronmal (1998), and others.

The assumption of one binary unmeasured confounder is appealing because it
• is mathematically tractable
• leads to simple bias-adjustment formulas
• is low dimensional, with few bias parameter inputs
• is easy to explain, interpret and implement

Background
Unmeasured Confounding in the Real World

In reality, there are often several unmeasured confounders, and there is little guidance in the statistical literature on how to proceed.

Examples:

Statins and lower fracture risk in the elderly
Unmeasured confounders: health-seeking behaviors, BMI, physical activity, smoking, alcohol consumption

Lead exposure in childhood and lower IQ
Unmeasured confounders: pesticide exposure, breastfeeding, poor parenting, maternal depression, iron deficiency, tobacco exposure, poverty, and pica

The Problem of Several Unmeasured Confounders

There are no methods available to explore sensitivity to several (say 5) unmeasured confounders.

Method input: “Please list your 5 confounders and their properties”
Method output: bias-corrected point and interval estimates for causal effects

However: see the notes below for other related methods.

Why does the problem of several unmeasured confounders remain unsolved?

The challenges:

1.
We need to specify the relationships between the confounders and the outcome, and between the confounders and the exposure. (High dimensional.)
2. The confounders may be continuous, nominal or ordinal (e.g. race or smoking). (More dimensions.)
3. The unmeasured confounders may be correlated with one another. (Less important.)
4. The unmeasured confounders may be correlated with the measured covariates. (More important.)
5. The confounders may interact with exposure. (Less important.)

Why does the problem of several unmeasured confounders remain unsolved?

More challenges:

6. Even if the laundry list of bias parameters is specified, we must still integrate U out of the model. This integration is often not available analytically, except in simple cases (as in Lin et al.). Custom statistical software is required.
7. The inference is fundamentally Bayesian, because there is uncertainty in the bias parameters.
8. We require content-area expertise, because the problem is inherently qualitative. Statisticians often do not have this expertise.

Methods for sensitivity analysis for several unmeasured confounders
• Greenland (2005) JRSSA: treats U as a compound of other variables, or a “sufficient summary”.
• Rosenbaum (2002): departures from random assignment by a factor Γ.
• McCandless (2012) JASA: uses validation data.
• Hsu (2013) Biometrics: calibrated sensitivity analysis.
• Deng (2013) Biometrika: Cornfield conditions.
• MacLehose & Kaufman (2005) Epidemiology: linear programming.
• Brumback (2004) Stat Med: sensitivity analysis based on potential outcomes.
• VanderWeele & Arah (2011) Epidemiology: formulas for a general scalar U.

Figure slides: article excerpts from Greenland (2005) JRSSA and Lin et al.
(1998) Biometrics.

A New Methodology

Consider a prospective cohort study with equal follow-up, where
• Y is a continuous outcome measure
• X is a dichotomous 0/1 exposure at baseline
• C is a p × 1 vector of measured covariates
• U is a q × 1 vector of unmeasured confounders that are quantitative and continuous

A New Methodology

Building on Lin et al. (1998), factorise

    P(Y, U | X, C) = P(Y | X, C, U) P(U | X, C),

and write

    Y | X, C, U = β0 + βX X + βC^T C + βU ∑_{i=1}^q Ui + ε
    U | X, C ∼ MVN( (α0 + αX X) × 1, Σρ ),

where 1 is a q-vector of 1's and Σρ is the q × q covariance matrix

    Σρ = ( 1 ρ ... ρ )
         ( ρ 1 ... ρ )
         ( ...       )
         ( ρ ρ ... 1 )

with diagonal elements 1 and off-diagonal elements ρ (compound symmetric).

Key Assumptions

I will call these the Duplicate Unmeasured Confounder (DUC) assumptions:
1. U1, U2, ..., Uq are equicorrelated with Y
2. U1, U2, ..., Uq are equicorrelated with X
3. U1, U2, ..., Uq are equicorrelated with one another

Other assumptions: linearity; absence of interactions; U ⊥⊥ C | X (zero correlation between measured and unmeasured confounders); no measurement error; ...

A New Methodology
Sensitivity analysis for several unmeasured confounders

Define a new variable

    U* = ∑_{i=1}^q Ui / √Θ,   where   Θ = Var( ∑_{i=1}^q Ui | X, C ) = 1^T Σρ 1 = q(1 + ρ(q − 1)).

The quantity U* is the sum of U1, ..., Uq, rescaled to have unit variance, and it is normally distributed.

The idea: we replace the vector U with the scalar U*, and we are then within the general framework of Lin, Psaty & Kronmal (1998).

A New Methodology

Therefore the original model

    E(Y | X, C, U) = β0 + βX X + βC^T C + βU ∑_{i=1}^q Ui
    U | X, C ∼ MVN( (α0 + αX X) × 1, Σρ )

becomes the new model

    E(Y | X, C, U*) = β0 + βX X + βC^T C + βU √Θ U*
    U* | X, C ∼ N( q(α0 + αX X)/√Θ, 1 ),

and this is embedded within the original framework of Lin, Psaty & Kronmal (1998).

Inference

To conduct a sensitivity analysis, we can use maximum likelihood based on the observed-data likelihood L(·)
= ∏_{i=1}^n P(Yi | Xi, Ci), where

    P(Y | X, C) = ∫ P(Y | X, U, C) P(U | X, C) dU.

Lin et al. (1998) show how to do the integration analytically for a Gaussian scalar U with a linear, log-linear or logistic response Y, so that we obtain, for example,

    E(Y | X, C) = β0 + βU √Θ × q α0/√Θ + [ βX + βU √Θ × q αX/√Θ ] X + βC^T C
                = β0 + q βU α0 + [ βX + q βU αX ] X + βC^T C.

A New Methodology

Therefore the bias in the causal effect parameter βX from q unmeasured confounders, under the DUC assumptions, is equal to

    Bias = βU √Θ × q αX/√Θ = q βU αX.

Lin et al. (1998), VanderWeele & Arah (2011)

A New Methodology

Consequently, within our modelling framework:
1. The confounding bias from q unmeasured confounders is equal to q × the confounding bias of a single Ui (thus bias is additive).
2. The correlation among U1, ..., Uq (which is ρ) does not affect the magnitude of the bias.

Result #2 is surprising, but makes sense.

Demonstration with Numbers
One simulated dataset

    correlation <- 0.999   ## Correlation among the Us
    k <- 10                ## Dimension of U
    sigma <- matrix(correlation, nrow = k, ncol = k)
    diag(sigma) <- 1       ## Compound-symmetric correlation matrix
    n <- 10000             ## Sample size
    X <- rbinom(n, 1, 0.5)
    ## Each U_i has mean X (alpha0 = 0, alphaX = 1) and covariance sigma
    U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)
    ## True betaX = 0, betaU = 1
    Y <- rnorm(n, 0 * X + apply(U, 1, sum), 1)

Demonstration with Numbers (figure slide): one simulated dataset
Demonstration with Numbers (figure slide): ρ = 0.99999, high correlation among unmeasured confounders
Demonstration with Numbers (figure slide): ρ = 0, zero correlation among unmeasured confounders

Conclusion

Within our modelling framework:
1. The confounding bias from q unmeasured confounders is equal to q × the confounding bias of a single Ui (thus bias is additive).
2. The correlation among U1, ..., Uq (which is ρ) does not affect the magnitude of the bias.

Questions:
1. How general are these findings?
2. How useful are they in practice?
3. What about correlation between measured and unmeasured confounders?

Bias from Several Unmeasured Confounders
How general are these findings?
• What about binary outcomes and survival data?
• What if U1, ..., Uq are binary?
• What if Σρ is not compound symmetric?
• What about weakening the Duplicate Unmeasured Confounder (DUC) assumption? E.g.

    Y = β0 + βX X + βC^T C + βU1 U1 + βU2 U2 + ... + βUq Uq + ε,  where βU1, ..., βUq ∼ N(μ, σ²),

  instead of

    Y = β0 + βX X + βC^T C + βU ∑_{i=1}^q Ui + ε.

Bias from Several Unmeasured Confounders
How useful in practice?

Rule of thumb (?): if we assume DUC, then “k unmeasured confounders means k times more bias” ... always true???

Brief Comment on the Role of Bayesian Statistics

The Bayesian approach is useful because it quantifies the uncertainty about unmeasured confounding. We assign a prior probability distribution to the bias parameters. Bayes' theorem then gives posterior credible intervals that incorporate uncertainty from unmeasured confounding.

The Bayesian approach is also useful for obtaining simple summaries in a sensitivity analysis when there are multiple bias parameter inputs.

McCandless et al. (2007) Stat Med
Gustafson & Greenland (2009) Statistical Science

Thank You
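Appendix: numerical cross-check of the additive-bias result

The central claim of the deck is that, under the DUC assumptions, Bias = q βU αX, unaffected by ρ. The sketch below is a minimal re-implementation of the deck's R simulation in Python (assuming only numpy); the function name `naive_effect` and the parameter defaults (α0 = 0, αX = 1, βU = 1) are illustrative choices for this sketch, not part of the original slides.

```python
import numpy as np

def naive_effect(q, rho, n=200_000, beta_x=0.0, beta_u=1.0, alpha_x=1.0, seed=1):
    """Simulate the DUC model and return the unadjusted estimate of beta_x.

    Each U_i has mean alpha_x * X (alpha_0 = 0 for simplicity), unit
    variance, and pairwise correlation rho (compound symmetry).
    """
    rng = np.random.default_rng(seed)
    X = rng.binomial(1, 0.5, n)
    # Compound-symmetric draws: sqrt(rho)*shared + sqrt(1 - rho)*idiosyncratic
    shared = rng.standard_normal(n)
    U = (alpha_x * X)[:, None] + np.sqrt(rho) * shared[:, None] \
        + np.sqrt(1 - rho) * rng.standard_normal((n, q))
    Y = beta_x * X + beta_u * U.sum(axis=1) + rng.standard_normal(n)
    # X is binary, so the unadjusted estimate is a difference in means;
    # it converges to beta_x + q * beta_u * alpha_x
    return Y[X == 1].mean() - Y[X == 0].mean()

print(naive_effect(q=5, rho=0.0))  # close to 5 = q * beta_u * alpha_x
print(naive_effect(q=5, rho=0.9))  # also close to 5: rho does not matter
```

With q = 5 and βU = αX = 1, the unadjusted estimate lands near 5 for both ρ = 0 and ρ = 0.9, consistent with Result #2: only the number of confounders, not their mutual correlation, drives the bias.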