Novel Approaches to Adjusting for Confounding: Propensity Scores, Instrumental Variables and MSMs
Matthew Fox
Advanced Epidemiology

Warm-up questions
– What are the exposures you are interested in studying?
– Assuming I could guarantee you would not create bias, which approach is better: randomization or adjustment for every known variable?
– What is intention-to-treat analysis?

Yesterday
Causal diagrams (DAGs)
– Discussed the rules of DAGs
– DAGs go beyond statistical methods and force us to use prior causal knowledge
– They teach us that adjustment can CREATE bias
– They help identify a sufficient set of confounders
   – But not how to adjust for them

This week
Beyond stratification and regression
– New approaches to adjusting for (not "controlling") confounding
– Instrumental variables
– Propensity scores (confounder scores)
– Marginal structural models
– Time-dependent confounding

Given the problems with the odds ratio, why does everyone use it?

Non-collapsibility of the OR (Excel; SAS)
Odds ratio collapsibility but confounding:

              C+               C-               Total (crude)
           E+      E-       E+      E-       E+       E-
Disease+   400     300      240     180      640      480
Disease-   100     200      360     720      460      920
Total      500     500      600     900      1100     1400
Risk       0.80    0.60     0.40    0.20     0.58     0.34
Odds       4.00    1.50     0.67    0.25     1.39     0.52

           C+      C-       Crude    Adjusted (MH)
RR         1.33    2.00     1.6968   1.5496
OR         2.67    2.67     2.67     2.67
RD         0.20    0.20     0.239    0.20
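The arithmetic behind the table, written out as a check: the odds ratio is identical in both strata and in the crude data, yet the crude and adjusted risk ratio and risk difference disagree, which is the confounding the OR hides.

   OR(C+)    = (400 x 200) / (300 x 100) = 2.67
   OR(C-)    = (240 x 720) / (180 x 360) = 2.67
   OR(crude) = (640 x 920) / (480 x 460) = 2.67

   RR(crude) = (640/1100) / (480/1400) = 0.582 / 0.343 = 1.70
   RR(MH)    = (400x500/1000 + 240x900/1500) / (300x500/1000 + 180x600/1500)
             = (200 + 144) / (150 + 72) = 344/222 = 1.55

   RD(crude) = 0.582 - 0.343 = 0.239   vs.   RD(adjusted) = 0.20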
Solution: SAS code

   title "Crude relative risk model";
   proc genmod data=odds descending;
      model d = e / link=log dist=bin;
   run;

   title "Adjusted relative risk model";
   proc genmod data=odds descending;
      model d = e c / link=log dist=bin;
   run;

Results
– Crude model: exp(0.5288) = 1.6968; the crude RR was 1.6968
– Adjusted model: exp(0.3794) = 1.461; the MH RR was 1.55

STATA
   glm d e, family(binomial) link(log)
   glm d e c, family(binomial) link(log)

What about risk differences?

Solution: SAS code

   title "Crude risk differences model";
   proc genmod data=odds descending;
      model d = e / dist=bin link=identity;
   run;

   title "Adjusted risk differences model";
   proc genmod data=odds descending;
      model d = e c / dist=bin link=identity;
   run;

Results
– Crude model: 0.23896; the crude RD was 0.239
– Adjusted model: 0.20; the MH RD was 0.20

STATA
   glm d e, family(binomial) link(identity)
   glm d e c, family(binomial) link(identity)
   glm d e##c, family(binomial) link(identity)

Novel approaches to controlling confounding

Limitations of stratification and regression
Stratification and regression work well for point exposures with complete follow-up and sufficient data to adjust. They run into trouble when:
– We have limited data on confounders or small cells
– There is no counterfactual for some people in our dataset
– Regression often estimates parameters
– Exposures and confounding are time dependent
   – A common situation
   – With time dependence, DAGs get complex

Randomization and counterfactuals
– Ideally, evidence comes from RCTs
– Randomization gives us the expectation that the unexposed can stand in for the counterfactual ideal
   – In expectation, assuming no other bias
– Full exchangeability: E(p1=q1, p2=q2, p3=q3, p4=q4)

   Pr(Y^(a=1) = 1) - Pr(Y^(a=0) = 1) = Pr(Y=1 | A=1) - Pr(Y=1 | A=0)

– Since we assign A, RR_AC = 1 (assignment is unassociated with the confounders)
– If we can't randomize, what can we do to approximate randomization?

How randomization works
[DAG, randomized controlled trial: Randomization -> A -> D, with C1, C2, C3 affecting D but not A]
– Randomization strongly predicts exposure (ITT)

A typical observational study
[DAG, observational study: A -> D, with C1, C2, C3 affecting both A and D]
– Regression/stratification seeks to block the backdoor paths from A to D by averaging A-D associations within levels of the Cs

Approach 1: Instrumental Variables

Intention-to-treat analysis
– In an RCT we assign the exposure
   – e.g., assign people to take an aspirin a day vs. not
   – But not all will take aspirin when told to, and others will take it even if told not to
– What to do with those who don't "obey"?
   – The intention-to-treat paradigm says analyze subjects in the group to which they were assigned
   – Maintains the benefits of randomization
   – Biases towards the null at worst

Instrumental variables
– An approach to dealing with confounding using a single variable
   – Works along the same lines as randomization
– A common approach in economics, yet rarely used in medical research
   – Suggests we are either behind the times or instruments are hard to find
   – Partly privileged in economics because little adjustment data exists

Instrumental variables
An instrument (I) is a variable that satisfies 3 conditions:
1. Strongly associated with the exposure
2. Has no effect on the outcome except through the exposure (E)
3. Shares no common causes with the outcome
We then ignore the E-D relationship and measure the association between I and D
– This association is not confounded
– Approximates an ITT approach

Adjust the IV estimate
– We can optionally adjust the IV estimate to estimate the effect of A (the exposure)
   – But this differs from randomization
– If an instrument can be found, we have the advantage that we can adjust for unknown confounders
   – This is the benefit we get from randomization

Intention to treat (IV example 1)
– A (exposure): aspirin vs. placebo
– Outcome: first MI
– Instrument: randomized assignment
– Check the conditions on the DAG: (1) Is the instrument a predictor of A? (2) Does it have no direct effect on the outcome? (3) Does it share no common causes with the outcome?
[DAG: Randomization -> Therapy -> MI; confounders affect therapy and MI but not randomization]

Confounding by indication (IV example 2)
– A (exposure): COX-2 inhibitor vs. NSAID
– Outcome: GI complications
– Instrument: physician's previous prescription
[DAG: Previous Px -> COX-2/NSAID -> GI complications; indications affect COX-2/NSAID and GI complications]
– Regression (17 confounders): no effect; RD -0.06/100 (95% CI -0.26 to 0.14)
– IV: protective effect of COX-2; RD -1.31/100 (-2.42 to -0.20)
– Compatible with trial results: RD -0.65/100 (-1.08 to -0.22)

Unknown confounders (IV example 3)
– Hypothesis: the hottest/driest summers in infancy would be associated with severe infant diarrhea/dehydration, and consequently higher blood pressure in adulthood
– A (exposure): childhood dehydration
– Outcome: adult high blood pressure
– Instrument: 1st-year summer climate
[DAG: 1st-year summer climate -> dehydration -> high BP, with SES as a common cause of dehydration and high BP]
– For 3,964 women born 1919-1940, a 1 SD (1.3 °C) higher mean summer temperature in the 1st year of life was associated with 1.12 mmHg (95% CI: 0.33, 1.91) higher adult systolic blood pressure, and a 1 SD (33.9 mm) higher mean summer rainfall was associated with lower systolic blood pressure (-1.65 mmHg, 95% CI: -2.44, -0.85)

Optionally we can adjust for "noncompliance"
– If we want to estimate the A-D relationship, not the I-D relationship, we can adjust:
   – RD_ID / RD_IE
   – Inflate the IV estimator to adjust for the lack of perfect correlation between I and E
   – If I perfectly predicts E then RD_IE = 1, so the adjustment does nothing
– Like a per-protocol analysis
   – But adjusted for confounders
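To make the adjustment concrete, a worked example with hypothetical numbers (they are not taken from the studies above): suppose the instrument-outcome risk difference is RD_ID = -0.9/100 and the instrument raises the probability of exposure by RD_IE = 0.6. Then:

   IV estimate of the A-D effect = RD_ID / RD_IE = (-0.9/100) / 0.6 = -1.5/100

If I predicted E perfectly (RD_IE = 1), the IV estimate would simply equal RD_ID, as in an ITT analysis.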
Too good to be true? Maybe
– The assumptions needed for an instrument are untestable from the data
   – We can only determine whether I is associated with A
– Failure to meet the assumptions can cause strong bias
   – Particularly if we have a "weak" instrument

Approach 2: Propensity Scores

– Comes out of a world of large datasets (health insurance data)
– Cases where we have a small (relative to the size of the dataset) exposed population and lots and lots of potential comparisons in the unexposed group
   – And lots of covariate data to adjust for
– Then we have the luxury of deciding whom to include in the study as a comparison group, based on a counterfactual definition

Propensity score
– Model each subject's propensity to receive the index condition as a function of confounders
   – The model is independent of the outcome, so it is good for rare diseases and common exposures
– Use the propensity score to balance assignment to index or reference by:
   – Matching
   – Stratification
   – Modeling

Propensity scores
– The propensity score for subject i is the probability of being assigned to treatment A = 1 vs. reference A = 0, given a vector x_i of observed covariates:

   Pr(A_i = 1 | X_i = x_i)

– In other words, the probability that the person got the exposure given everything else we know about them
– Why estimate the probability a subject receives a certain treatment when we know what treatment they received?

How propensity scores work
– A quasi-experiment
   – By using the probability a subject would have been treated (the propensity score) to adjust the estimate of the treatment effect, we simulate an RCT
– Take 2 subjects with equal propensity, one E+ and one E-
   – We can think of these two as "randomly assigned" to groups, since they have the same probability of being treated, given their covariates
– Assumes we have enough observed data that, within levels of the propensity score, E is truly random

Propensity scores: smoking and colon cancer
– We have information on people's covariates: alcohol use, sex, weight, age, exercise, etc.
   – Person A is a smoker, B is not
   – Both had an 85% predicted probability of smoking
– If the "propensity" to smoke is the same, the only difference is that one smoked and one didn't
   – This is essentially what randomization does
   – B is the counterfactual for A, assuming a correct model for predicting smoking

Obtaining propensity scores in SAS
Calculate the propensity score:

   proc logistic descending;
      model exposure = cov_1 cov_2 … cov_n;
      output out=pscoredat pred=pscore;
   run;

Then either match subjects on the propensity score or adjust for it:

   proc logistic data=pscoredat descending;
      model outcome = exposure pscore;
   run;
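The code above uses the "modeling" option (adjusting directly for the score). A minimal sketch of the stratification option listed earlier, assuming the pscoredat dataset created above; the use of PROC RANK, the five-group cut, and the names ps_strat and ps_quintile are illustrative choices, not part of the original slides:

   /* Cut the propensity score into quintiles */
   proc rank data=pscoredat groups=5 out=ps_strat;
      var pscore;
      ranks ps_quintile;
   run;

   /* Estimate the exposure-outcome association within score strata */
   proc logistic data=ps_strat descending;
      class ps_quintile;
      model outcome = exposure ps_quintile;
   run;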
Pros and cons of propensity scores
Pros
– Adjustment for a single "confounder" (the score) rather than many covariates
– Allows estimation of the exposure model and fitting of a final model without ever seeing the outcome
– Lets us see which parts of the data we really should not be drawing conclusions from, because there is no counterfactual
Cons
– Only works if there is good overlap in propensity scores
– Does not fix the problem of conditioning on a collider
– Does not deal with unmeasured confounders

Example: a study of the effect of neighborhood segregation on IMR

Approach 3: Marginal Structural Models

Time-dependent confounding
Time-dependent confounding arises when:
1) A time-dependent covariate is a risk factor for (or predictive of) the outcome and also predicts subsequent exposure
It is problematic if also:
2) Past exposure history predicts the subsequent level of the covariate

Example
An observational study of subjects infected with HIV
– E = HAART therapy
– D = all-cause mortality
– C = CD4 count

Time-dependent confounding
[DAGs: (1) C0 affects both subsequent treatment A1 and death D, so C is a confounder; (2) past treatment A0 affects the subsequent covariate C1, so C is also an intermediate]

Failure of traditional methods
– We want to estimate the causal effect of A on D
   – We can't stratify on C (it's an intermediate)
   – We can't ignore C (it's a confounder)
– Solution: rather than stratify, weight
   – Equivalent to standardization
   – Create a pseudo-population in which RR_CE = 1
   – Weight each person by the "inverse probability of treatment" they actually received
   – Weighting doesn't cause the problems pooling did
   – In the DAG, we remove the arrow from C to A rather than drawing a box around C
   – Remember back to the SMR

The SMR
Observed data:

Crude      E+      E-      C1         E+      E-      C0         E+      E-
D+         350     70      D+         300     20      D+         50      50
D-         1650    1130    D-         1200    180     D-         450     950
Total      2000    1200    Total      1500    200     Total      500     1000
Risk       0.18    0.06    Risk       0.2     0.1     Risk       0.1     0.05
RR         3.0             RR         2.0             RR         2.0

The standardized (SMR-type) risk ratio weights the stratum-specific risks by the exposed distribution of C:

   RR_SMR = sum over strata of (a_i/N1_i)·N1_i  /  sum over strata of (b_i/N0_i)·N1_i
          = (300 + 50) / [(20/200)·1500 + (50/1000)·500]
          = 350 / (150 + 25)
          = 2.0

The SMR asks: what if the exposed had also been unexposed?
Replace the unexposed groups with groups of the same size as the exposed groups (1500 in C1, 500 in C0), but with the unexposed risks (0.1 and 0.05):

Crude      E+      E-      C1         E+      E-      C0         E+      E-
D+         350     175     D+         300     150     D+         50      25
D-         1650    1825    D-         1200    1350    D-         450     475
Total      2000    2000    Total      1500    1500    Total      500     500
Risk       0.175   0.0875  Risk       0.2     0.1     Risk       0.1     0.05
RR         2.0             RR         2.0             RR         2.0

The crude now equals the adjusted. No need to adjust.

Could also ask: what if everyone had been exposed, and what if everyone had been unexposed?
Apply the exposed and the unexposed stratum-specific risks to each whole stratum (1700 in C1, 1500 in C0):

Crude      E+      E-      C1         E+      E-      C0         E+      E-
D+         490     245     D+         340     170     D+         150     75
D-         2710    2955    D-         1360    1530    D-         1350    1425
Total      3200    3200    Total      1700    1700    Total      1500    1500
Risk       0.153   0.077   Risk       0.2     0.1     Risk       0.1     0.05
RR         2.0             RR         2.0             RR         2.0
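The arithmetic behind this last pseudo-population, written out:

   Risk if everyone exposed   = (0.2·1700 + 0.1·1500)  / 3200 = 490/3200 = 0.153
   Risk if everyone unexposed = (0.1·1700 + 0.05·1500) / 3200 = 245/3200 = 0.077
   RR = 0.153 / 0.077 = 2.0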
What is inverse probability of treatment weighting (IPTW)?
– Weight each subject by the inverse of the probability of the treatment they actually received
   – Probability of treatment = p(receiving the treatment received | covariates)
– Weighting breaks the E-C link only
   – It adjusts the number of E+ and E- subjects in each C stratum
   – Now the marginal (crude) association = the causal effect
– But that's what we just did

Calculate the weights
PT = p(receiving the treatment actually received | C); IPTW = 1/PT
– For C=1, E=1: PT = 1500/1700 = 0.88; IPTW = 1/0.88 = 1.13
– For C=1, E=0: PT = 200/1700 = 0.12; IPTW = 1/0.12 = 8.50
– For C=0, E=1: PT = 500/1500 = 0.33; IPTW = 1/0.33 = 3.00
– For C=0, E=0: PT = 1000/1500 = 0.67; IPTW = 1/0.67 = 1.50

Apply the weights
Multiply each cell count by its weight, creating the pseudo-population:

C1         E+      E-      C0         E+      E-
D+         340     170     D+         150     75
D-         1360    1530    D-         1350    1425
Total      1700    1700    Total      1500    1500
Risk       0.2     0.1     Risk       0.1     0.05
RR         2.0             RR         2.0

Collapse
We broke the link between C and E without stratification, so there is no problem of conditioning on a collider:

Pseudo-population (crude)
           E+      E-
D+         490     245
D-         2710    2955
Total      3200    3200
Risk       0.153   0.077
RR         2.0

Pseudo-population
– The "pseudo-population" breaks the link between the confounder and the exposure without stratification
   – Note this is different from stratifying
   – We create a standard population without confounding
– Because we create multiple copies of people, standard errors will be biased
   – Use robust standard errors to adjust
– "The IPTW method effectively simulates the data that would be observed had, contrary to fact, exposure been conditionally randomized" (Robins and Hernán)
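Pulling the pieces together, a minimal SAS sketch of a point-exposure IPTW analysis in the spirit of the weighting just described. This is not the slides' own code; the dataset and variable names (msm, id, ps, p_e1, weighted, iptw) are assumed for illustration, with e, c, and d as in the earlier examples.

   /* 1. Model the probability of exposure given the confounder(s) */
   proc logistic data=msm descending;
      model e = c;
      output out=ps pred=p_e1;   /* p_e1 = Pr(E=1 | C) */
   run;

   /* 2. Weight = inverse probability of the treatment actually received */
   data weighted;
      set ps;
      if e = 1 then iptw = 1 / p_e1;
      else iptw = 1 / (1 - p_e1);
   run;

   /* 3. Weighted (marginal structural) model: SCWGT applies the weights,
      and the REPEATED statement gives robust (sandwich) standard errors */
   proc genmod data=weighted descending;
      class id;
      model d = e / link=log dist=bin;
      scwgt iptw;
      repeated subject=id / type=ind;
   run;

Applied to the worked example above (a single binary C), this should reproduce the pseudo-population RR of 2.0.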
Time-dependent confounding
– Extend the method to time-dependent confounders
   – Predict p(receiving the treatment actually received at time t1 | covariates, treatment at t0)
– The probability of the treatment history through t1 is:
   p(treatment received at t0) × p(treatment received at t1 | treatment at t0, covariates)
   (written out as a general formula at the end of this section)
– See Hernán for SAS code; it is not hard: the SCWGT statement applies the weights, and robust SEs come from the REPEATED statement

Time-dependent confounding
[DAGs: (1) before IPTW, with E0, E1, C0, C1 and D; (2) after IPTW, with the arrows from the Cs into E removed]

Limitations of MSMs
– Very sensitive to the weights
– We still need to be able to predict the exposure
   – The method solves the structural problem, but we still need the data to accurately predict exposure
– We still have to get the model right
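As referenced above, the general form of the time-dependent weight, sketched in the notation of Robins and Hernán (the notation here is mine, not the slides'): each subject's weight at time t is the product, over all time points up to t, of the inverse probability of the treatment actually received at that time,

   W(t) = product over k = 0, …, t of
          1 / Pr(A_k = observed treatment at k | treatment history through k-1, covariate history through k)

which reduces to p(treatment at t0) × p(treatment at t1 | t0 treatment, covariates) in the two-period case on the slide.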