Strategies for Using Partially Valid Instrumental Variables
Dylan Small, Department of Statistics, Wharton School, University of Pennsylvania
Joint work with: Paul Rosenbaum, Mike Baiocchi, Marshall Joffe, Tom Ten Have

Overview
• Example of the instrumental variables (IV) method: effect of World War II military service on future earnings.
• Sensitivity of the IV method to unobserved biases.
• Strength of IVs and sensitivity to unobserved biases: how do small studies with strong IVs compare to large studies with weak IVs?
• Extended instrumental variables methods for when the exclusion restriction for the IV is invalid.

WWII Veteran Status and Earnings
• Does military service raise or lower earnings?
• Angrist and Krueger (1994) studied this question in the context of WWII military service and 1980 earnings, using the 5% public use sample of the US Census.
• Lower earnings? Military service in WWII interrupts education or career.
• Higher earnings? The labor market might favor veterans, and the GI Bill increased education.

WWII veterans (76% of men) earned on average $4,500 more in 1980 than non-veterans. This is association, not causation: WWII veterans might not be comparable to non-veterans in terms of health, criminal behavior, …

We created matched triples: men matched on quarter of birth, race, age, education up to 8 years and location of birth.
[Figure: proportion of veterans and 1980 earnings in the matched triples, by year of birth]
The figure provides reason to doubt that military service increases earnings by $4,500. From 1924 to 1926, the proportion of veterans stayed about constant and earnings stayed about the same. From 1926 to 1928, the proportion of veterans decreased by 50% but earnings increased, suggesting that military service decreases earnings.
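The cohort comparison above is the intuition behind the simplest IV estimate, the Wald ratio. It divides the difference in mean earnings between two cohorts by the difference in their proportions of veterans. A minimal Python sketch follows. The cohort summaries are hypothetical placeholders, not the census figures, and the ratio is a causal effect only under the IV assumptions discussed next.

```python
def wald_ratio(mean_y_z1, mean_y_z0, prop_w_z1, prop_w_z0):
    """Wald IV estimate: (effect of Z on the outcome) / (effect of Z on treatment)."""
    first_stage = prop_w_z1 - prop_w_z0
    if first_stage == 0:
        raise ValueError("Z does not shift treatment; the ratio is undefined")
    return (mean_y_z1 - mean_y_z0) / first_stage


# Hypothetical cohort summaries (NOT the study's data):
# Z = 1 is the earlier cohort (more veterans), Z = 0 the later cohort.
beta_hat = wald_ratio(mean_y_z1=20_000, mean_y_z0=20_500,
                      prop_w_z1=0.80, prop_w_z0=0.40)
print(beta_hat)  # about -1250: fewer veterans but higher earnings -> negative estimate
```

Halving the proportion of veterans while earnings rise yields a negative ratio. This mirrors the 1926 vs. 1928 comparison in direction only, not in magnitude.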
Unmeasured Confounding
[Diagram: Veteran Status → Earnings, with Unobserved Variables affecting both. The graph is conditional on the measured confounders (race, education up to 8 years, location of birth).]

Instrumental Variables Strategy
Y = outcome, W = treatment, Z = IV
[Diagram: Z (Year of Birth) → W (Veteran Status) → Y (Earnings), with Unobserved Variables affecting W and Y. Arrows between the unobserved variables and Z, and from Z directly to Y, are crossed out. The graph is conditional on the measured confounders (race, education up to 8 years, location of birth).]
Extract variation in W from Z that is free of unobserved confounders, and use this variation to estimate the causal effect of W on Y.
Key IV assumptions: (1) Z is independent of the unobserved variables; (2) Z does not have a direct effect on the outcome.

Prototype IV Design: Matched Pair Encouragement Design
Consider a matched pair design with I matched pairs. In each pair i, one unit j is encouraged to receive treatment (\(Z_{ij} = 1\)) and the other unit j′ is not encouraged to receive treatment (\(Z_{ij'} = 0\)).
Rubin Causal Model: each subject ij has two potential outcomes:
• \(r_{Tij}\) = outcome if encouraged
• \(r_{Cij}\) = outcome if not encouraged
and two potential treatments received:
• \(w_{Tij}\) = dose of treatment received if encouraged
• \(w_{Cij}\) = dose of treatment received if not encouraged

Randomization Inference
A simple model says that the effect of encouragement on the outcome is proportional to its effect on the treatment received:
\[ r_{Tij} - r_{Cij} = \beta\,(w_{Tij} - w_{Cij}) \qquad (1) \]
In the WWII study, β is the causal effect of military service on earnings.
Let \(R_{ij}\) denote the observed outcome and \(W_{ij}\) the observed treatment received. Under model (1), \(R_{ij} - \beta W_{ij} = r_{Cij} - \beta w_{Cij}\), which does not depend on whether ij is encouraged.
In this context, the encouragement Z is said to be a valid instrumental variable (IV) if Z is effectively randomly assigned:
\[ P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}. \]
If Z is a valid IV, we can test \(H_0: \beta = \beta_0\) by testing whether \(R_{ij} - \beta_0 W_{ij}\) (which equals \(r_{Cij} - \beta_0 w_{Cij}\) if \(H_0\) is true) is independent of \(Z_{ij}\), e.g., by a Wilcoxon signed rank test. Inverting this test over values of \(\beta_0\) gives a confidence interval.
95% CI for the effect of military service: (-$1,445, -$500)

Relationship to the Angrist, Imbens and Rubin Setup
Angrist, Imbens and Rubin (1996) define an IV as valid if it
1. is effectively randomly assigned (ignorable): \((r_{Tij}, r_{Cij})\) is independent of \(Z_{ij}\) given \(X_{ij}\);
2. has no direct effect on the outcome (exclusion restriction).
The model \(r_{Tij} - r_{Cij} = \beta\,(w_{Tij} - w_{Cij})\) assumes the exclusion restriction: encouragement has no direct effect on the outcome.
Side note: Angrist, Imbens and Rubin also consider the situation of heterogeneous treatment effects. They show that under an additional assumption (monotonicity), a valid IV identifies the average treatment effect for the subjects who would receive treatment if and only if encouraged to do so (the compliers).

IV Applications in Health Research
Outcome (Y) | Treatment (W) | IV (Z) | Reference
Birth weight | Maternal smoking | State cigarette taxes | Evans and Ringel (1999)
Birth weight | Maternal smoking | Random assignment of free smoker's counseling | Permutt and Hebel (1989)
Mortality | Premature baby delivered at high-level NICU vs. local hospital | Mother's differential distance between high-level NICU and local hospital | Baiocchi, Small, Lorch and Rosenbaum (2010)
Gastrointestinal complications | Non-steroidal anti-inflammatory drug (NSAID) vs. non-NSAID drug | Physician's last prescription type | Brookhart et al. (2006)
Mortality | Breast cancer surgery treatment vs. non-surgery treatment | Proportion receiving surgery in health referral region | Brooks et al. (2003)
Coronary heart disease | HDL cholesterol | Polymorphisms that affect HDL cholesterol | Voight et al. (2012)

Sensitivity Analysis
The IV method assumes that the IV (encouragement) is effectively randomly assigned:
\[ P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}. \]
There is often concern about whether this is true. In the WWII study, gradual long-term trends in apprenticeship, education, employment and nutrition might bias comparisons of workers born two years apart.
A sensitivity analysis asks how departures of various magnitudes from random assignment of the IV might alter a study's conclusion.

Model for Sensitivity Analysis
For subject ij, let \(\pi_{ij}\) denote the probability that ij is encouraged, \(\pi_{ij} = P(Z_{ij} = 1)\).
Suppose that two subjects ij and ik may differ in their odds of being encouraged by at most a factor of \(\Gamma \ge 1\) because they differ in terms of an unobserved covariate, \(u_{ij} \ne u_{ik}\):
\[ \frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \quad \text{for all } i, j, k. \qquad (2) \]
If \(\Gamma = 1\), the IV is randomly assigned. If \(\Gamma > 1\), the distribution of treatment assignments is unknown, but the magnitude of the departure from random assignment is controlled by \(\Gamma\).

Carrying Out the Sensitivity Analysis
Let \(\boldsymbol{\pi} = (\pi_{11}, \pi_{12}, \ldots, \pi_{I1}, \pi_{I2})\) denote the probabilities of being encouraged for each subject.
For each fixed value of \(\boldsymbol{\pi}\), we can test \(H_0: \beta = \beta_0\) using permutation inference.
For a given value of \(\Gamma\), we compute the minimum and maximum p-values for testing \(H_0: \beta = \beta_0\) over all \(\boldsymbol{\pi}\) that satisfy (2).
Rosenbaum (Observational Studies, 2002) provides a simple method to compute these minimum and maximum p-values.
95% CI for the effect of military service when \(\Gamma = 1\): (-$1,445, -$500)
95% CI for the effect of military service when \(\Gamma = 1.2\): (-$3,745, $1,735)

Sensitivity Analysis for WWII Study
Upper bound on the one-sided significance level for the 1926 vs. 1928 IV, by \(\Gamma\):
\(H_0\) | \(\Gamma = 1\) | \(\Gamma = 1.2\) | \(\Gamma = 1.5\) | \(\Gamma = 1.6\) | \(\Gamma = 2.2\) | \(\Gamma = 2.3\)
\(\beta = 0\) | 0.001 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
\(\beta = 1{,}000\) | 0.001 | 0.860 | 1.000 | 1.000 | 1.000 | 1.000
\(\beta = 4{,}500\) | 0.001 | 0.001 | 0.027 | 0.904 | 1.000 | 1.000
\(\beta = 10{,}000\) | 0.001 | 0.001 | 0.001 | 0.001 | 0.016 | 0.476

Strength of IV
• An IV is strong if encouragement has a strong effect on treatment received; an IV is weak if encouragement has only a weak effect on treatment received.
Study | Strong IV | Weak IV
World War II Study | 1926 vs. 1928 | 1924 vs. 1926
Maternal Smoking Study | Random assignment of free counseling | State cigarette taxes
• Effects of weak IVs:
  1. Increased variance
  2. Increased sensitivity to bias

Effect of Weak IVs I: Increased Variance
[Diagram: the IV graph conditional on X, Z|X → W|X → Y, with unobserved variables affecting W and Y]
If Z is a weak IV, then the variance of the IV estimate will be higher because less variation in W can be extracted from Z.
95% CI for the effect of military service using the 1926 vs. 1928 IV: (-$1,445, -$500)
95% CI for the effect of military service using the 1924 vs. 1926 IV: (-$10,130, $10,750)

Effect of Weak IVs II: Increased Sensitivity to Bias
Power of a Sensitivity Analysis (Rosenbaum, 2004)
Suppose Z were in fact a valid IV, so that \(P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}\), but we didn't know this and wanted to allow for some sensitivity to bias measured by \(\Gamma\). Suppose also that \(\beta - \beta_0\) was large, so that \(H_0: \beta = \beta_0\) was substantially in error. We would like to be able to reject \(H_0: \beta = \beta_0\) for all \(\boldsymbol{\pi}\) that satisfy (2).
Power of a sensitivity analysis: the probability that we will reject \(H_0: \beta = \beta_0\) for all \(\boldsymbol{\pi}\) that satisfy (2), assuming that Z is a valid IV and given a value of \(\beta - \beta_0\).

Model for Power Analysis
Let \(r_{Ci1} - r_{Ci2} \sim N(0, \sigma^2)\). Subjects have random compliance patterns, with zero probability of being a defier and equal probabilities of being a never-taker or an always-taker.
Effect size is \((\beta - \beta_0)/\sigma\).
Strength of instrument is \(P(W_{ij} = 1 \mid Z_{ij} = 1) - P(W_{ij} = 1 \mid Z_{ij} = 0)\) (the probability of being a complier).

Power, effect size \((\beta - \beta_0)/\sigma = 1\):
Strength of IV (proportion of compliers) | \(\Gamma\) | I = 100 | I = 1,000 | I = 10,000 | I = 100,000 | \(I \to \infty\)
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.99 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.12 | 0.73 | 1.00 | 1.00 | 1
1 | 1.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.92 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.03 | 0.03 | 0.04 | 0.10 | 1
1 | 2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.18 | 0.97 | 1.00 | 1.00 | 1
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0
When the IV is valid (\(\Gamma = 1\)), the power is, of course, greater for stronger IVs, but there is good power in all cases with a sample size of 10,000 pairs: valid but weak IVs eventually get it right. But when \(\Gamma > 1\), the power can tend to 1 or to 0 depending on the strength of the IV. Weak IVs are quite sensitive to small biases.
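The sensitivity analysis and its power can be sketched in a few lines of Python, assuming NumPy and SciPy. The p-value bound uses the large-sample normal approximation for Wilcoxon's signed rank statistic described in Rosenbaum (Observational Studies, 2002): under \(\Gamma\), each pair difference is treated as positive with probability at most \(\Gamma/(1+\Gamma)\). Setting \(\Gamma = 1\) recovers the usual randomization test. The data-generating function `simulate_pairs`, the choice σ = 1 and the 200 simulations are hypothetical simplifications of the power model above, so this sketch is not expected to reproduce the table's numbers exactly.

```python
import numpy as np
from scipy.stats import norm, rankdata


def signed_rank_upper_pvalue(d, gamma):
    """Upper bound on the one-sided p-value of Wilcoxon's signed rank test
    (alternative: pair differences tend to be positive) when, within a pair,
    the odds of encouragement may differ by at most a factor gamma.
    Large-sample normal approximation; gamma = 1 gives the usual test."""
    d = np.asarray(d, dtype=float)
    d = d[d != 0]                          # zero differences carry no sign
    ranks = rankdata(np.abs(d))            # average ranks for ties
    t_obs = ranks[d > 0].sum()             # signed rank statistic
    p_plus = gamma / (1.0 + gamma)         # worst-case probability a difference is positive
    mean = p_plus * ranks.sum()
    sd = np.sqrt(p_plus * (1.0 - p_plus) * np.sum(ranks ** 2))
    return norm.sf((t_obs - mean) / sd)


def simulate_pairs(n_pairs, beta, complier_prob, rng, sigma=1.0):
    """Hypothetical encouragement design with a valid (randomly assigned) IV.
    Column 0 of each pair is the encouraged unit, column 1 the other unit."""
    other = (1.0 - complier_prob) / 2.0    # never-takers and always-takers equally likely
    # Compliance type: 0 = never-taker, 1 = complier, 2 = always-taker (no defiers)
    types = rng.choice(3, size=(n_pairs, 2), p=[other, complier_prob, other])
    w_enc = (types[:, 0] >= 1).astype(float)   # compliers and always-takers are treated
    w_non = (types[:, 1] == 2).astype(float)   # only always-takers are treated
    # r_C for each unit; the within-pair difference is N(0, sigma^2)
    r_c = rng.normal(0.0, sigma / np.sqrt(2.0), size=(n_pairs, 2))
    # Model (1): encouragement changes the outcome only via the treatment received,
    # so only an encouraged complier's outcome shifts by beta
    r_enc = r_c[:, 0] + beta * (types[:, 0] == 1)
    r_non = r_c[:, 1]
    return r_enc, w_enc, r_non, w_non


def sensitivity_power(n_pairs, effect_size, complier_prob, gamma,
                      beta0=0.0, n_sims=200, alpha=0.05, seed=0):
    """Monte Carlo power: how often H0: beta = beta0 is rejected for every
    pattern of encouragement probabilities allowed by gamma (sigma = 1)."""
    rng = np.random.default_rng(seed)
    beta = beta0 + effect_size
    rejections = 0
    for _ in range(n_sims):
        r_enc, w_enc, r_non, w_non = simulate_pairs(n_pairs, beta, complier_prob, rng)
        # Pair differences of the adjusted responses R - beta0 * W
        d = (r_enc - beta0 * w_enc) - (r_non - beta0 * w_non)
        if signed_rank_upper_pvalue(d, gamma) <= alpha:
            rejections += 1
    return rejections / n_sims


strong = sensitivity_power(1000, effect_size=1.0, complier_prob=0.5, gamma=2.0)
weak = sensitivity_power(1000, effect_size=1.0, complier_prob=0.1, gamma=2.0)
print(strong, weak)  # strong IV: power near 1; weak IV: power near 0
```

With a valid IV and a large effect, the strong instrument still rejects at \(\Gamma = 2\) in almost every simulated study, while the weak one almost never does. This is the pattern the table shows for 1,000 pairs.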
Power, effect size \((\beta - \beta_0)/\sigma = 0.5\):
Strength of IV (proportion of compliers) | \(\Gamma\) | I = 100 | I = 1,000 | I = 10,000 | I = 100,000 | \(I \to \infty\)
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.64 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.07 | 0.32 | 1.00 | 1.00 | 1
1 | 1.2 | 0.98 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.32 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.01 | 0.00 | 0.00 | 0.00 | 1
1 | 2 | 0.38 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.01 | 0.00 | 0.00 | 0.00 | 0
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0
For strong IVs, the sensitivity to unobserved biases is meaningfully affected by the effect size. For example, with \(\Gamma = 2\), I = 1,000 and a proportion of compliers of 0.5, the power is 0.97 when \((\beta - \beta_0)/\sigma = 1\) but 0.00 when \((\beta - \beta_0)/\sigma = 0.5\). For weak IVs, by contrast, there is barely any difference between \((\beta - \beta_0)/\sigma = 1\) and \((\beta - \beta_0)/\sigma = 0.5\).

Practical Consequences
1. Weak IVs that might have small bias are dangerous to use. Weak IVs are sensitive to quite small biases (\(\Gamma > 1\) yet close to 1), even when the effect size \((\beta - \beta_0)/\sigma\) is quite large. Unless one is confident that a weak IV is perfectly valid (\(\Gamma = 1\)), its extreme sensitivity to small biases is likely to limit its usefulness to the study of enormous effects, \((\beta - \beta_0)/\sigma \gg 1\).
2. Strong IVs that might be moderately biased are useful. A strong IV may provide useful information even if moderate biases are plausible.
3. Strength of the IV is important in choosing a study design. Consider two studies that would have the same power if both IVs were unbiased: a small study with a strong IV and a large study with a weak IV. When there is concern that the IVs might be biased, the small study with the strong IV has considerable advantages.

Practical Consequences (continued)
4. Strategies for increasing the effect size are more useful for strong IVs. For strong IVs, the sensitivity to unobserved biases is meaningfully affected by the effect size \((\beta - \beta_0)/\sigma\), whereas for weak IVs the effect size makes little difference. Sensitivity to unobserved biases can sometimes be reduced by increasing the effect size, say by reducing the unexplained heterogeneity of subjects (Rosenbaum, 2005).
For instance, Ashenfelter and Rouse (1998) studied the effects of additional education on earnings using identical twins, and Kim (2007) studied the earnings of veteran siblings to estimate the effect of being drafted. Strategies of this sort may be helpful with strong IVs but largely ineffective with weak IVs.

Extended IV Methods for Addressing Violation of the Exclusion Restriction
• Angrist, Imbens and Rubin (1996): the two key conditions for a valid IV are
  – the IV is effectively randomly assigned conditional on measured covariates X;
  – the IV has no direct effect on Y (exclusion restriction).
• We consider situations in which random assignment is plausible but the exclusion restriction is not.
Vascular Access in Hemodialysis
• Hemodialysis
  – One of the main treatment options in end-stage renal disease (ESRD)
  – Requires access to the vascular system
• Three main types of access
  – Catheter
  – Synthetic material
  – Native arteriovenous fistula (AVF)

Vascular Access (cont'd)
• Type of vascular access (VA; A) partially determines the dose of dialysis (DD; S)
  – Native AVF allows larger doses than a catheter
  – S may affect outcomes (e.g., mortality)
• VA may have effects on the outcome (Y) that are not mediated by dose (e.g., infection)
• Incomplete directed acyclic graph (DAG) of the key variables:
[Diagram: A → S → Y and A → Y]

Estimand of Interest
• To gauge the impact of the type of VA, we are interested in the overall effect, which involves both
  – the direct effect (A → Y)
  – the indirect effect (A → S → Y)
• Formulate in terms of potential outcomes, singly indexed (\(Y^{a}\)) and doubly indexed (\(Y^{a S^{a'}}\)):
  – overall effect: \(Y^{a_1} - Y^{a_0} = Y^{a_1 S^{a_1}} - Y^{a_0 S^{a_0}}\)
  – direct effect: \(Y^{a_1 S^{a_0}} - Y^{a_0 S^{a_0}}\)
  – indirect effect: \(Y^{a_1 S^{a_1}} - Y^{a_1 S^{a_0}}\)

Confounding by Indication
• AVFs are given preferentially to healthier subjects
• This results in confounding by indication
  – often difficult to control using standard methods based on ignorable treatment assignment
  – for a variety of treatments of dialysis patients, standard approaches based on ignorability lead to implausible results
• The choice of dose of dialysis (S) is also nonignorable

Instrumental Variables
• Alternative approach for estimation
• Need to find an instrumental variable (R) that is
  – associated with the treatment of interest (A)
  – independent of unmeasured confounders, i.e., shares no unmeasured common cause with the outcome Y
  – without a direct effect on the outcome (exclusion restriction)
• The practice at which dialysis is provided is a reasonable candidate
  – used for various analyses in the Dialysis Outcomes and Practice Patterns Study (DOPPS)
    • a large, international study with hundreds of practices
• We will assume that practice (R) shares no unmeasured common causes with S or Y.
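The three contrasts fit together through a telescoping identity. Adding and subtracting \(Y^{a_1 S^{a_0}}\) splits the overall effect into the indirect and direct effects defined above, whatever the form of the outcome model:

```latex
\[
\underbrace{Y^{a_1 S^{a_1}} - Y^{a_0 S^{a_0}}}_{\text{overall}}
= \underbrace{\left(Y^{a_1 S^{a_1}} - Y^{a_1 S^{a_0}}\right)}_{\text{indirect}}
+ \underbrace{\left(Y^{a_1 S^{a_0}} - Y^{a_0 S^{a_0}}\right)}_{\text{direct}}
\]
```

In the linear models used for the two-step approach, a unit change in A gives a direct term of \(\psi_A\) and an indirect term of \(\psi_S \Phi_A\).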
Revised DAG
• The DAG needs to be elaborated to include
  – the instrument/center (R)
  – measured (X) and unmeasured (U) common causes of the variables of interest
• Is R a valid instrument for the overall effect of A on Y?
[DAG: nodes R, A, S, Y, X, U]

Graphical Criteria for an Instrument
• Remove the effect of the treatment of interest
• Check whether R is independent of/d-separated from Y
• There is a directed path R → S → Y
• The criterion is not satisfied
• R is not a valid instrument for the overall effect of A
• In the Angrist, Imbens and Rubin framework, the problem is that R has a direct effect on Y through S and hence violates the exclusion restriction.
[DAG: nodes R, A, S, Y, X, U]

Second Example: Return to Schooling
• Y = earnings, A = years of education
• Unmeasured confounders: ability, motivation
• Card (1993) proposes as an IV R = distance from where the person grew up to the nearest four-year college.
• Problem:
  – R also affects whether the person lives in an SMSA as an adult (S), conditional on A and the measured confounders X (whether the person lived in an SMSA growing up, region where the person grew up, and family background variables).
  – There is a wage premium to living in an SMSA as an adult.

Return to Schooling DAG
• R (living near a college growing up) is not a valid instrument for the overall effect of A (years of schooling) on Y (earnings) because it has a direct effect on Y through S (living in an SMSA as an adult).
[DAG: nodes R, A, S, Y, X, U]

Estimation
• For estimating the overall effects of A in these two problems, we cannot use
  – standard methods based on ignorability
  – standard instrumental variables methods
• Idea: look for interactions between R and X that can serve as instruments.

Extended Instruments
• Look for a component of X that interacts with R to affect A but not Y directly.
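The graphical criterion can be checked mechanically. Below is a minimal Python sketch using only the standard library. It implements the classical test for d-separation: take the ancestral set, moralize it, delete the conditioning nodes, and check connectivity. The edge list is an assumption reconstructed from the verbal description: X is a measured common cause of everything, U an unmeasured common cause of A, S and Y, and R points into A and S. The helper names are hypothetical.

```python
from collections import defaultdict, deque


def d_separated(edges, x, y, given):
    """True if x and y are d-separated given `given` in the DAG `edges`."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # 1. Ancestral set of {x, y} and the conditioning nodes
    relevant = {x, y} | set(given)
    stack = list(relevant)
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p not in relevant:
                relevant.add(p)
                stack.append(p)
    # 2. Moralize: undirected parent-child links plus links between co-parents
    nbrs = defaultdict(set)
    for child in relevant:
        ps = [p for p in parents[child] if p in relevant]
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Delete the conditioning nodes; x and y are d-separated iff disconnected
    blocked = set(given)
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for nb in nbrs[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return True


def remove_effects_of(edges, treatment):
    """Graph surgery: delete every arrow out of the treatment of interest."""
    return [(u, v) for u, v in edges if u != treatment]


# Hypothetical edge list for the dialysis (or schooling) DAG
dag = [("X", "R"), ("X", "A"), ("X", "S"), ("X", "Y"),
       ("U", "A"), ("U", "S"), ("U", "Y"),
       ("R", "A"), ("R", "S"),
       ("A", "S"), ("A", "Y"), ("S", "Y")]
no_r_to_s = [e for e in dag if e != ("R", "S")]

print(d_separated(remove_effects_of(dag, "A"), "R", "Y", {"X"}))        # False: R -> S -> Y
print(d_separated(remove_effects_of(no_r_to_s, "A"), "R", "Y", {"X"}))  # True
```

The second call shows that if R had no arrow into S, the criterion would be satisfied.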
• Card proposes family income as a component of X that
  – interacts with R to affect A: college proximity lowers the cost of higher education, so it has a bigger effect for a poorer family;
  – does not directly affect S or Y: the direct earnings effect of living near a college, and its direct effect on living in an SMSA, do not vary by family background.
[DAG: nodes R, R*X, A, S, Y, X, U]

Two-Step Approach
• Estimate the joint effect of A and S on Y
• Estimate the effect of A on S
• Combine to obtain the overall effect
• In systems of linear models, the overall effect is the sum of
  – the direct effect of A: \(\psi_A\)
  – the indirect effect of A: \(\psi_S \Phi_A\)
[Diagram: A → Y (\(\psi_A\)), A → S (\(\Phi_A\)), S → Y (\(\psi_S\))]

Two-Step Approach (1st step)
• \(Y^{as}\): potential outcome under A = a, S = s
• Model for the joint effect:
  – \(Y^{as} = Y^{00} + a\psi_A + s\psi_S\)
  – rank-preserving/deterministic formulation
• Model for observables:
  – \(E^*\) = best linear predictor
  – \(E^*(Y \mid X, R, X{*}R) = E^*(Y^{AS} \mid X, R, X{*}R) = E^*(Y^{00} \mid X, R, X{*}R) + E^*(A \mid X, R, X{*}R)\,\psi_A + E^*(S \mid X, R, X{*}R)\,\psi_S\)
  – Identifiability requires that \(E^*(Y^{00} \mid X, R, X{*}R)\), \(E^*(A \mid X, R, X{*}R)\) and \(E^*(S \mid X, R, X{*}R)\) are not collinear.
• One way: assume \(E^*(Y^{00} \mid X, R, X{*}R)\) depends only on X. Then we need one component of X that interacts with R to affect A.
• Another way: assume \(E^*(Y^{00} \mid X, R, X{*}R)\) depends on X and R but not on X*R. Then we need at least two components of X that interact with R to affect A and S.
  – Estimation is by two-stage least squares: regress A and S on X, R and X*R; then regress Y on \(\hat A\), \(\hat S\), X and R.

Two-Step Approach (2nd step)
• Under the assumptions above
  – the effect of A on S is confounded
  – R is not an instrument for the effect of A on S
• Consider an alternative:
  – a linear model for the joint effect of R and A on S
  – \(S^{ra} = S^{00} + r\Phi_R + a\Phi_A\)
• Model for observables:
  – \(E^*(S \mid X, R, X{*}R) = E^*(S^{00} \mid X, R, X{*}R) + R\,\Phi_R + E^*(A \mid X, R, X{*}R)\,\Phi_A\)
• We can estimate by 2SLS under two assumptions: that \(E^*(S^{00} \mid X, R, X{*}R)\) does not depend on X*R (uncheckable), and that X*R affects A.
• Regress A on X, R and X*R; then regress S on \(\hat A\), X and R.
[DAG: nodes R, R*X, A, S, X, U]
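The two-step approach can be sketched with NumPy on simulated data. The data-generating values (`psi_a`, `psi_s`, `phi_a`, `phi_r`, a single continuous X, a continuous S) are hypothetical and do not come from Card's data, and `two_sls` is our own helper rather than a library function. Step 1 follows the first identification strategy above, in which \(E^*(Y^{00} \mid X, R, X{*}R)\) depends only on X, so R and X*R serve as the excluded instruments for (A, S). Step 2 keeps R as a regressor and uses X*R as the only excluded instrument.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_sls(y, endog, exog, instruments):
    """Two-stage least squares point estimates (no standard errors).
    Returns coefficients ordered as [endogenous..., exogenous...]."""
    Z = np.column_stack([instruments, exog])
    # First stage: fitted values of each endogenous regressor from all instruments
    fitted = np.column_stack([Z @ ols(endog[:, j], Z) for j in range(endog.shape[1])])
    # Second stage: outcome on the fitted endogenous and the included exogenous regressors
    return ols(y, np.column_stack([fitted, exog]))


# Hypothetical linear system (NOT Card's data); all parameter values are assumptions.
rng = np.random.default_rng(2024)
n = 200_000
x = rng.normal(size=n)                            # measured covariate, e.g. family background
r = rng.binomial(1, 0.5, size=n).astype(float)    # grew up near a college
u = rng.normal(size=n)                            # unmeasured confounder, e.g. ability
xr = x * r                                        # extended instrument X*R
psi_a, psi_s, phi_a, phi_r = 0.10, 0.20, 0.05, 1.0
a = 12 + 0.5 * r - 0.8 * xr + 0.3 * x + u + rng.normal(size=n)      # schooling
s = phi_r * r + phi_a * a + 0.2 * x + 0.5 * u + rng.normal(size=n)   # S^{ra}
y = psi_a * a + psi_s * s + 0.1 * x + 0.5 * u + rng.normal(size=n)   # Y^{as}

const = np.ones(n)
# Step 1: joint effect of (A, S) on Y, with R and X*R as excluded instruments
psi_a_hat, psi_s_hat = two_sls(y, np.column_stack([a, s]),
                               exog=np.column_stack([const, x]),
                               instruments=np.column_stack([r, xr]))[:2]
# Step 2: effect of A on S, with X*R as the excluded instrument and R kept as a regressor
phi_a_hat = two_sls(s, a[:, None], exog=np.column_stack([const, x, r]),
                    instruments=xr[:, None])[0]
overall_hat = psi_a_hat + psi_s_hat * phi_a_hat
print(round(overall_hat, 3))  # target: psi_a + psi_s * phi_a = 0.11
```

Running the second stage by hand gives correct 2SLS point estimates but not correct standard errors. For inference, a dedicated IV routine (or the delta method for \(\psi_A + \psi_S \Phi_A\)) would be needed.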
Results for Card's Data
Method | Estimate of overall effect of A | SE
Path analysis (OLS) | 0.0762 | 0.0004
Two-step extended IV | 0.1503 | 0.0462
Y = log earnings; A = years of schooling; S = lives in an SMSA as an adult; R = lived near a four-year college growing up; X = experience, experience squared, black indicator, indicator for living in an SMSA growing up, indicators for region growing up, mother's and father's education.

Summary
• The IV method can be a powerful strategy for observational studies when there are confounders that are hard to measure and there is a "random" encouragement to receive treatment.
• When encouragement is not actually random, it is important to do a sensitivity analysis.
• Strong IVs are much less sensitive to bias.
• When the exclusion restriction might be violated, we developed extended IV methods that use X*R as IVs.

Papers
• Small, D.S. and Rosenbaum, P.R. (2008), "War and Wages: The Strength of Instrumental Variables and Their Sensitivity to Unobserved Biases," Journal of the American Statistical Association, 103, 924–933.
• Joffe, M.M., Small, D.S., Brunelli, S., Ten Have, T.R. and Feldman, H.I. (2008), "Extended Instrumental Variables Estimation for Overall Effects," International Journal of Biostatistics, 4.
• Baiocchi, M., Small, D.S., Lorch, S.A. and Rosenbaum, P.R. (2010), "Building a Stronger Instrument in an Observational Study of Perinatal Care for Premature Infants," Journal of the American Statistical Association, 105, 1285–1296.
• e-mail: dsmall@wharton.upenn.edu

Alternative Estimands
• We assumed interest in the overall effect
  – vascular access (VA) inevitably affects the dose of dialysis (DD)
    • the type of VA limits the possible dose
• However, it may be possible to alter DD
• Interested in
  – the effect of DD
  – the effect of VA if it affects DD in a different fashion from under current practice

Alternative Estimands (cont'd)
• Show the altered effect as a new intervention on the DAG
• Formulate in terms of potential outcomes:
  – \(S^{g,a}\): target level of S under treatment a, plan g
  – \(E(Y^{a S^{g,a}})\): expected level of Y under treatment a, plan g
• Contrast for different levels of treatment
[DAG: nodes R, A, S, Y, X, U]

Alternative Estimands (cont'd)
• Defining the intervention on S
  – Individualize target levels of S
    • e.g., base them on the maximum tolerated DD
    • insufficient information in established databases (e.g., DOPPS)
  – Set the target level of S based on A and covariates X
    • currently little information to set target levels
    • available covariate information may be insufficient to determine whether a particular DD is feasible for an individual

Alternative Estimands (cont'd)
• Defining the intervention on S
  – Speculate about feasible interventions on S at the aggregate level
    • consider the effects of A on S under those interventions, i.e., propose a value for \(\Phi_A^*\)
    • compute the overall effect from the component effects: \(\psi_A + \psi_S \Phi_A^*\)
    • perform a sensitivity analysis over values of \(\Phi_A^*\)

One-Step Approach
• The estimator of the effect of A on S requires neither standard ignorability nor a standard IV
• Can we do the same for the overall effect of A on Y?
• Remove S from the graph and redraw the diagram
• The resulting graph has the same form as the original graph with Y removed
• Use the same estimation methods as for the effect of A on S
[DAG: nodes R, R*X, A, Y, X, U]

Results for Card's Data
Method | Estimate of overall effect of A | SE
Path analysis (OLS) | 0.0762 | 0.0004
Two-step extended IV | 0.1503 | 0.0462
One-step extended IV | 0.1500 | 0.0462
Y = log earnings; A = years of schooling; S = lives in an SMSA as an adult; R = lived near a four-year college growing up; X = experience, experience squared, black indicator, indicator for living in an SMSA growing up, indicators for region growing up, mother's and father's education.
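A minimal sketch of the one-step approach, assuming NumPy and a hypothetical linear system (not Card's data; all parameter values are assumptions). Substituting the S equation into the Y equation gives a single equation in which A's coefficient is the overall effect \(\psi_A + \psi_S \Phi_A\) and R enters directly. So X*R instruments A, while R and X stay in the regression. This relies on the same kind of uncheckable assumption as the second step: the combined error term has a best linear predictor that does not depend on X*R.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical linear system (NOT Card's data); all parameter values are assumptions.
rng = np.random.default_rng(7)
n = 200_000
x = rng.normal(size=n)                            # measured covariate
r = rng.binomial(1, 0.5, size=n).astype(float)    # instrument candidate R
u = rng.normal(size=n)                            # unmeasured confounder
xr = x * r                                        # extended instrument X*R
psi_a, psi_s, phi_a, phi_r = 0.10, 0.20, 0.05, 1.0
a = 12 + 0.5 * r - 0.8 * xr + 0.3 * x + u + rng.normal(size=n)
s = phi_r * r + phi_a * a + 0.2 * x + 0.5 * u + rng.normal(size=n)  # generated, not used by the estimator
y = psi_a * a + psi_s * s + 0.1 * x + 0.5 * u + rng.normal(size=n)

const = np.ones(n)
# First stage: A on X, R and the extended instrument X*R
first = np.column_stack([const, x, r, xr])
a_hat = first @ ols(a, first)
# Second stage: Y on fitted A, X and R; R's coefficient absorbs its effect through S
overall_hat = ols(y, np.column_stack([a_hat, const, x, r]))[0]
naive = ols(y, np.column_stack([a, const, x, r]))[0]   # ignores confounding by U
print(round(overall_hat, 3), round(naive, 3))  # target overall effect: 0.11
```

The naive OLS coefficient is included only to show the direction of confounding bias in this simulated setup, where U raises both schooling and earnings.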