Strategies for Using Partially Valid Instrumental Variables, Dylan Small

Department of Statistics,
Wharton School, University of Pennsylvania
Joint work with:
Paul Rosenbaum
Mike Baiocchi
Marshall Joffe
Tom Ten Have
Overview
• Example of Instrumental Variables (IV)
method: Effect of World War II military
service on future earnings.
• Sensitivity to unobserved biases for IV
method.
• Strength of IVs and sensitivity to
unobserved biases:
How do small studies with strong IVs
compare to large studies with weak IVs?
• Extended instrumental variables methods
when exclusion restriction for IV is invalid.
WWII Veteran Status and Earnings
• Does military service raise or lower
earnings?
• Angrist and Krueger (1994) studied this in
context of WWII military service and 1980
earnings (using 5% public use sample of
US Census).
• Lower earnings? Military service in WWII
interrupts education or career.
• Higher earnings? Labor market might
favor veterans, GI Bill increases
education.
WWII Vets (76% of men)
earned on average $4500
more in 1980 than Non-Vets.
This is association not causation:
WWII Vets might not be comparable
to Non-Vets in terms of
health, criminal behavior…
We created matched triples: men matched on quarter of birth, race, age, education up
to 8 years and location of birth.
This figure provides reason to doubt military service increases earnings by $4500.
From 1924 to 1926, the proportion of veterans stayed about constant and the earnings
stayed about the same. From 1926 to 1928, the proportion of veterans decreased
by 50% but earnings increased, suggesting military service decreases earnings.
Unmeasured Confounding
[Figure: causal diagram, conditional on measured confounders (race, education up to 8 years, location of birth), with nodes Earnings, Veteran Status and Unobserved Variables.]
Instrumental Variables Strategy
Y = Outcome, W = Treatment, Z = IV
[Figure: causal diagram, conditional on measured confounders (race, education up to 8 years, location of birth), with nodes Y: Earnings, W: Veteran Status, Z: Year of Birth and Unobserved Variables; two links are marked with an X.]
Extract variation in W from Z that is free of unobserved confounders
and use this variation to estimate the causal effect of W on Y.
Key IV Assumptions: (1) Z independent of unobserved variables;
(2) Z does not have direct effect on outcome.
Prototype IV Design:
Matched Pair Encouragement Design
Consider a matched pair design in which there are I matched pairs
and one unit j in each pair i is encouraged to receive treatment
($Z_{ij} = 1$) and the other unit j' is not encouraged to receive
treatment ($Z_{ij'} = 0$).
Rubin Causal Model:
Each subject ij has two potential outcomes:
$r_{Tij}$ = outcome if encouraged
$r_{Cij}$ = outcome if not encouraged
and two potential values of treatment received:
$w_{Tij}$ = dose of treatment received if encouraged
$w_{Cij}$ = dose of treatment received if not encouraged
Randomization Inference
A simple model says that the effect of encouragement on the outcome
is proportional to its effect on the treatment received:
$$r_{Tij} - r_{Cij} = \beta\,(w_{Tij} - w_{Cij}) \qquad (1)$$
In WWII study, $\beta$ = causal effect of military service.
Let $R_{ij}$ = observed outcome, $W_{ij}$ = observed treatment received.
Under model (1), $r_{Cij} = R_{ij} - \beta W_{ij}$ (taking $w_{Cij} = 0$; in general
$R_{ij} - \beta W_{ij} = r_{Cij} - \beta w_{Cij}$, which is likewise unaffected by $Z_{ij}$).
In this context, the encouragement variable Z is said to be a valid
instrumental variable (IV) if Z is effectively randomly assigned:
$$P(Z_{i1} = 1, Z_{i2} = 0) = \tfrac{1}{2}, \qquad P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}.$$
If Z is a valid IV, we can test $H_0: \beta = \beta_0$ by testing whether $R_{ij} - \beta_0 W_{ij}$
(= $r_{Cij}$ if $\beta = \beta_0$) is independent of $Z_{ij}$, e.g., by a Wilcoxon signed rank test.
95% CI for effect of military service: (-$1,445, -$500)
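The test described above can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the slides: it assumes each pair is supplied as `((R_enc, W_enc), (R_ctl, W_ctl))` for the encouraged and unencouraged unit, uses a large-sample normal approximation to the signed rank statistic without a tie correction to the variance, and the names `adjusted_differences`, `signed_rank_test` and `toy_pairs` are hypothetical.

```python
import math

def adjusted_differences(pairs, beta0):
    """Encouraged-minus-unencouraged differences in R - beta0 * W."""
    return [(re - beta0 * we) - (rc - beta0 * wc)
            for (re, we), (rc, wc) in pairs]

def signed_rank_test(diffs):
    """One-sided Wilcoxon signed rank test, normal approximation.

    Returns (T, p): T is the sum of the ranks of |d| over positive d,
    p is the upper-tail p-value. Zero differences are dropped and tied
    absolute values receive average ranks.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("no nonzero differences")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t - mean) / math.sqrt(var)
    return t, 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical toy data: (outcome, treatment received) for each unit.
toy_pairs = [((21.0, 1), (24.5, 0)), ((18.0, 1), (19.0, 0)),
             ((30.0, 0), (29.0, 0)), ((22.0, 1), (26.0, 1)),
             ((25.0, 1), (27.5, 0))]
t_stat, p_value = signed_rank_test(adjusted_differences(toy_pairs, beta0=0.0))
```

A confidence interval of the kind quoted above could, under the same assumptions, be formed by collecting the values of `beta0` that the test does not reject.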
Relationship to
Angrist, Imbens and Rubin Setup
Angrist, Imbens and Rubin (1996) define an IV as valid if it is
1. effectively randomly assigned (ignorable):
$(r_{Tij}, r_{Cij})$ independent of $Z_{ij}$ given $X_{ij}$
2. no direct effect (exclusion restriction)
The model
$$r_{Tij} - r_{Cij} = \beta\,(w_{Tij} - w_{Cij})$$
assumes the exclusion restriction: encouragement has no direct effect.
Side note: Angrist, Imbens and Rubin also consider the situation of heterogeneous
treatment effects. They show that under an additional assumption
(monotonicity), a valid IV identifies the average treatment effect for
the subjects who would receive treatment if and only if encouraged to
do so (the compliers).
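For a binary treatment, the complier effect mentioned in the side note is commonly estimated by the ratio of the effect of encouragement on the outcome to its effect on treatment received. The sketch below illustrates that ratio for a matched pair encouragement design; it is a swapped-in point estimator rather than the randomization inference used in these slides, and the function name `wald_ratio` and the pair layout `((R_enc, W_enc), (R_ctl, W_ctl))` are assumptions made for the example.

```python
def wald_ratio(pairs):
    """Ratio of intention-to-treat effects in a matched pair design.

    pairs: list of ((R_enc, W_enc), (R_ctl, W_ctl)), the observed outcome
    and treatment received for the encouraged and unencouraged unit.
    """
    n = len(pairs)
    itt_outcome = sum(re - rc for (re, _), (rc, _) in pairs) / n
    itt_treatment = sum(we - wc for (_, we), (_, wc) in pairs) / n
    if itt_treatment == 0:
        raise ValueError("encouragement has no effect on treatment received")
    return itt_outcome / itt_treatment

# Hypothetical toy pairs: encouragement shifts treatment in two of four pairs.
example = [((5, 1), (3, 0)), ((4, 1), (4, 1)), ((6, 1), (2, 0)), ((3, 0), (3, 0))]
estimate = wald_ratio(example)
```

Here the average outcome difference is 1.5 and the average treatment difference is 0.5, so the ratio is 3.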
IV Applications in Health Research

Outcome (Y) | Treatment (W) | IV (Z) | Reference
Birth weight | Maternal smoking | State cigarette taxes | Evans and Ringel (1999)
Birth weight | Maternal smoking | Random assignment of free smoker’s counseling | Permutt and Hebel (1989)
Mortality | Premature baby delivered at high level NICU vs. local hospital | Mother’s differential distance between high level NICU and local hospital | Baiocchi, Small, Lorch and Rosenbaum (2010)
Gastrointestinal Complications | Non-steroidal anti-inflammatory drug (NSAID) vs. non-NSAID drug | Physician’s last prescription type | Brookhart et al. (2006)
Mortality | Breast cancer surgery treatment vs. non-surgery treatment | Proportion receiving surgery in health referral region | Brooks et al. (2003)
Coronary Heart Disease | HDL Cholesterol | Polymorphisms that affect HDL cholesterol | Voight et al. (2012)
Sensitivity Analysis
IV method assumes that the IV (encouragement) is
effectively randomly assigned:
$$P(Z_{i1} = 1, Z_{i2} = 0) = \tfrac{1}{2}, \qquad P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}.$$
There is often concern about whether this is true.
In WWII Study, there are gradual long term trends in
apprenticeship, education, employment and nutrition
that might bias comparisons of workers born two years
apart.
A sensitivity analysis asks how departures from random
assignment of the IV of various magnitudes might alter
a study’s conclusion.
Model for Sensitivity Analysis
For subject ij, let $\pi_{ij}$ denote the probability that ij is encouraged,
$\pi_{ij} = P(Z_{ij} = 1)$.
Suppose that two subjects ij and ik may differ in their odds of
being encouraged by at most a factor of $\Gamma \ge 1$ because they differ
in terms of an unobserved covariate, $u_{ij} \ne u_{ik}$,
$$\frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \qquad \forall\, i, j, k.$$
If $\Gamma = 1$, IV is randomly assigned.
If $\Gamma > 1$, then distribution of treatment assignments is unknown but
magnitude of departure from random assignment controlled by $\Gamma$.
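A short worked step shows what the bound means within a pair. Assuming the two units in pair i are assigned independently before conditioning on exactly one of them being encouraged (an assumption added here for the derivation), and writing $\rho_i$ for the odds ratio in the display above:

```latex
\begin{aligned}
P(Z_{i1}=1 \mid Z_{i1}+Z_{i2}=1)
  &= \frac{\pi_{i1}(1-\pi_{i2})}{\pi_{i1}(1-\pi_{i2}) + \pi_{i2}(1-\pi_{i1})}
   = \frac{\rho_i}{1+\rho_i},
  \qquad \rho_i = \frac{\pi_{i1}(1-\pi_{i2})}{\pi_{i2}(1-\pi_{i1})}, \\
\frac{1}{\Gamma} \le \rho_i \le \Gamma
  \;&\Longrightarrow\;
  \frac{1}{1+\Gamma} \le P(Z_{i1}=1 \mid Z_{i1}+Z_{i2}=1) \le \frac{\Gamma}{1+\Gamma}.
\end{aligned}
```

For example, $\Gamma = 1.2$ allows the within-pair assignment probability to range over roughly $[0.455, 0.545]$ instead of being exactly $1/2$.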
Carrying out Sensitivity Analysis
Let $\pi = (\pi_{11}, \pi_{12}, \ldots, \pi_{I1}, \pi_{I2})$ denote the probabilities of being encouraged
for each subject.
For each fixed value of $\pi$, we can test $H_0: \beta = \beta_0$ using permutation
inference.
For a given value of $\Gamma$, we compute the minimum and maximum p-values
for testing $H_0: \beta = \beta_0$ for all $\pi$ that satisfy
$$\frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \qquad \forall\, i, j, k.$$
Rosenbaum (Observational Studies, 2002) provides a simple method to
compute these minimum and maximum p-values.
95% CI for effect of military service when $\Gamma = 1$: (-$1,445, -$500)
95% CI for effect of military service when $\Gamma = 1.2$: (-$3,745, $1,735)
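For the signed rank statistic, the bounding p-values can be sketched with the standard large-sample approximation: under bias at most $\Gamma$, the null distribution of the statistic is bounded by sums of ranks multiplied by independent Bernoulli variables with success probability $\Gamma/(1+\Gamma)$ (upper bound) or $1/(1+\Gamma)$ (lower bound). The function name `sensitivity_p_bounds` is hypothetical, and the code assumes no tied or zero pair differences.

```python
import math

def sensitivity_p_bounds(t_obs, n_pairs, gamma):
    """Approximate (min, max) one-sided p-values for a signed rank statistic.

    t_obs: observed sum of ranks of positive pair differences.
    n_pairs: number of pairs (no ties or zero differences assumed).
    gamma: bound on the odds ratio of encouragement within a pair (>= 1).
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    total_rank = n_pairs * (n_pairs + 1) / 2
    sum_sq_rank = n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 6

    def upper_tail(p):
        mean = p * total_rank
        sd = math.sqrt(p * (1 - p) * sum_sq_rank)
        return 0.5 * math.erfc((t_obs - mean) / (sd * math.sqrt(2)))

    p_minus = 1 / (1 + gamma)
    p_plus = gamma / (1 + gamma)
    return upper_tail(p_minus), upper_tail(p_plus)

# Hypothetical statistic: T = 3500 from 100 pairs.
low_1, high_1 = sensitivity_p_bounds(3500, 100, gamma=1.0)
low_2, high_2 = sensitivity_p_bounds(3500, 100, gamma=2.0)
```

At `gamma=1.0` the two bounds coincide with the usual randomization p-value; as `gamma` grows the upper bound rises, which is the pattern tabulated for the WWII study.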
Sensitivity Analysis for WWII Study
Upper Bound on One-Sided Significance Level for 1926 vs. 1928 IV

Null hypothesis | $\Gamma = 1$ | $\Gamma = 1.2$ | $\Gamma = 1.5$ | $\Gamma = 1.6$ | $\Gamma = 2.2$ | $\Gamma = 2.3$
$H_0: \beta = 0$ | 0.001 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
$H_0: \beta = 1,000$ | 0.001 | 0.860 | 1.000 | 1.000 | 1.000 | 1.000
$H_0: \beta = 4,500$ | 0.001 | 0.001 | 0.027 | 0.904 | 1.000 | 1.000
$H_0: \beta = 10,000$ | 0.001 | 0.001 | 0.001 | 0.001 | 0.016 | 0.476
Strength of IV
• An IV is strong if encouragement has a strong effect on
treatment received;
An IV is weak if encouragement has only a weak effect
on treatment received.

Study | Strong IV | Weak IV
World War II Study | 1926 vs. 1928 | 1924 vs. 1926
Maternal Smoking Study | Random assignment of free counseling | State cigarette taxes

• Effects of Weak IVs
1. Increased Variance
2. Increased Sensitivity to Bias
Effect of Weak IVs I: Increased Variance
[Figure: IV diagram with nodes Y, W|X, Z|X and Unobserved Variables; two links are marked with an X.]
If Z is a weak IV, then the variance of the IV estimate will
be higher because less variation in W from Z can be
extracted.
95% CI for effect of military service using 1926 vs. 1928 IV: (-$1,445, -$500).
95% CI for effect of military service using 1924 vs. 1926 IV: (-$10,130, $10,750)
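A rough way to see why the interval widens is the first-order large-sample approximation for a ratio-type IV estimate, in which the standard error is the standard error of the adjusted outcome difference divided by the strength of the IV. This is a swapped-in approximation for illustration, not the randomization-based interval reported above, and the function names and numbers below are hypothetical.

```python
import math

def approx_iv_se(sd_adjusted_diff, n_pairs, strength):
    """First-order SE of a ratio-type IV estimate from n_pairs matched pairs.

    sd_adjusted_diff: SD of the pair differences in R - beta * W.
    strength: effect of encouragement on the probability of treatment.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")
    return sd_adjusted_diff / (math.sqrt(n_pairs) * strength)

def pairs_for_equal_se(n_pairs_strong, strength_strong, strength_weak):
    """Pairs a weak-IV study needs to match the SE of a strong-IV study."""
    return n_pairs_strong * (strength_strong / strength_weak) ** 2

se_strong = approx_iv_se(1.0, 100, strength=0.5)
se_weak = approx_iv_se(1.0, 100, strength=0.05)
n_needed = pairs_for_equal_se(100, 0.5, 0.05)
```

Under this approximation an IV that is ten times weaker gives a standard error ten times larger, so the weak-IV study needs about a hundred times as many pairs to match the variance. Matching the variance does not address the second problem, sensitivity to bias.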
Effect of Weak IVs II: Increased Sensitivity to Bias
Power of a Sensitivity Analysis (Rosenbaum, 2004)
Suppose Z were in fact a valid IV so that $P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \tfrac{1}{2}$,
but we didn’t know this and wanted to allow for some sensitivity to bias measured
by $\Gamma$.
Suppose also that $\beta - \beta_0$ was large, so that $H_0: \beta = \beta_0$ was substantially in error.
We would like to be able to reject $H_0: \beta = \beta_0$ for all $\pi$ that satisfy
$$\frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \qquad \forall\, i, j, k. \qquad (1)$$
Power of a sensitivity analysis: Probability that we will reject $H_0: \beta = \beta_0$ for all $\pi$
that satisfy (1) assuming that Z is a valid IV and a given value of $\beta - \beta_0$.
Model for Power Analysis
Let $r_{Ci1} - r_{Ci2} \sim N(0, \sigma^2)$.
Subjects have random compliance patterns with
zero probability of being a defier and equal probabilities
of being a never taker or always taker.
Effect size is $(\beta - \beta_0)/\sigma$.
Strength of instrument is $P(W_{ij} = 1 \mid Z_{ij} = 1) - P(W_{ij} = 1 \mid Z_{ij} = 0)$
(probability of being a complier).
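The power of a sensitivity analysis under a model of this kind can be approximated by simulation. The sketch below is a simplified illustration rather than the calculation behind the slides: it assumes $\sigma = 1$, a constant treatment effect so that each adjusted pair difference equals normal noise plus the effect size times the difference in treatment received, and a normal approximation to the signed rank bound. All function names are hypothetical.

```python
import math
import random

def signed_rank_stat(diffs):
    """Sum of ranks of |d| over positive d (continuous data, so no ties)."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank + 1 for rank, i in enumerate(order) if diffs[i] > 0)

def upper_bound_p(t_obs, n_pairs, gamma):
    """Largest one-sided p-value when the bias is at most gamma."""
    p_plus = gamma / (1 + gamma)
    mean = p_plus * n_pairs * (n_pairs + 1) / 2
    var = p_plus * (1 - p_plus) * n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 6
    return 0.5 * math.erfc((t_obs - mean) / math.sqrt(2 * var))

def simulate_power(n_pairs, strength, effect_size, gamma,
                   reps=100, alpha=0.05, seed=0):
    """Share of simulated studies rejecting H0 for all biases up to gamma."""
    rng = random.Random(seed)
    p_never = (1 - strength) / 2  # also the probability of an always-taker
    rejections = 0
    for _ in range(reps):
        diffs = []
        for _ in range(n_pairs):
            # Encouraged unit is treated unless it is a never-taker.
            w_enc = 1 if rng.random() >= p_never else 0
            # Unencouraged unit is treated only if it is an always-taker.
            w_ctl = 1 if rng.random() < p_never else 0
            diffs.append(rng.gauss(0.0, 1.0) + effect_size * (w_enc - w_ctl))
        if upper_bound_p(signed_rank_stat(diffs), n_pairs, gamma) <= alpha:
            rejections += 1
    return rejections / reps

power_strong = simulate_power(100, strength=1.0, effect_size=1.0, gamma=1.2)
power_weak = simulate_power(100, strength=0.1, effect_size=1.0, gamma=1.2)
```

With 100 pairs and $\Gamma = 1.2$, the strong instrument should reject in nearly every simulated study while the weak one rarely does, in line with the qualitative pattern in the power tables.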
Effect size: $(\beta - \beta_0)/\sigma = 1$
Power by number of pairs I

Strength of IV | $\Gamma$ | I = 100 | I = 1000 | I = 10,000 | I = 100,000 | $\lim_{I \to \infty}$
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.99 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.12 | 0.73 | 1.00 | 1.00 | 1
1 | 1.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.92 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.03 | 0.03 | 0.04 | 0.10 | 1
1 | 2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.18 | 0.97 | 1.00 | 1.00 | 1
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0

When the IV is valid ($\Gamma = 1$), the power is of course greater for
stronger IVs but there is good power for all cases with sample size
of 10,000 pairs. Valid but weak IVs eventually get it right.
But when $\Gamma > 1$, the power can tend to 1 or 0 depending on the
strength of the IV. Weak IVs are quite sensitive to small biases.
Effect size: $(\beta - \beta_0)/\sigma = 0.5$
Power by number of pairs I

Strength of IV (proportion of compliers) | $\Gamma$ | I = 100 | I = 1000 | I = 10,000 | I = 100,000 | $\lim_{I \to \infty}$
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.64 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.07 | 0.32 | 1.00 | 1.00 | 1
1 | 1.2 | 0.98 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.32 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.01 | 0.00 | 0.00 | 0.00 | 1
1 | 2 | 0.38 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.01 | 0.00 | 0.00 | 0.00 | 0
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0

For strong IVs, the sensitivity to unobserved biases is meaningfully
affected by the effect size (e.g., for $\Gamma = 2$,
$I = 1000$, proportion of compliers = 0.5, power is 0.97
when $(\beta - \beta_0)/\sigma = 1$ but 0.00 when $(\beta - \beta_0)/\sigma = 0.5$).
But for weak IVs, there is barely any difference between $(\beta - \beta_0)/\sigma = 1$
versus $(\beta - \beta_0)/\sigma = 0.5$.
Practical Consequences
1. Weak IVs that might have small bias are dangerous to use.
Weak IVs are sensitive to quite small biases ($\Gamma > 1$ yet $\Gamma$ close to 1),
even when the effect size $(\beta - \beta_0)/\sigma$ is quite large.
Unless one is confident that a weak IV is perfectly valid ($\Gamma = 1$), its
extreme sensitivity to small biases is likely to limit its usefulness to
the study of enormous effects, $(\beta - \beta_0)/\sigma \gg 1$.
2. Strong IVs that might be moderately biased are useful.
A strong IV may provide useful information even if
moderate biases are plausible.
3. Strength of IV important in choosing a study design.
Consider two studies, a small study with a strong IV and a large
study with a weak IV, which would have the same power if both IVs are
unbiased. When there is concern that the IVs might be biased, the small
study with a strong IV has considerable advantages.
Practical Consequences Continued
4. Strategies for increasing effect size more useful for strong IVs.
For strong IVs, the sensitivity to unobserved biases is meaningfully
affected by the effect size $(\beta - \beta_0)/\sigma$, whereas for weak IVs the
effect size makes little difference.
Sensitivity to unobserved biases can sometimes be reduced by
increasing the effect size, say by reducing the unexplained
heterogeneity $\sigma$ of subjects (Rosenbaum, 2005). For instance,
Ashenfelter and Rouse (1998) studied the effects of additional
education on earnings using identical twins, and Kim (2007)
studied the earnings of veteran siblings to estimate the effect
of being drafted. Strategies of this sort may be helpful with strong
IVs but largely ineffective with weak IVs.
Extended IV Methods for
Addressing Violation of
Exclusion Restriction
• Angrist, Imbens and Rubin (1996): two key
conditions for a valid IV are:
– IV effectively randomly assigned conditional on
measured covariates X
– No direct effect on Y (exclusion restriction).
• We consider situations in which the
random assignment is plausible but the
exclusion restriction is not.
Instrumental Variables Strategy
Y = Outcome, W = Treatment, Z = IV
[Figure: causal diagram, conditional on measured confounders (race, education up to 8 years, location of birth), with nodes Y: Earnings, W: Veteran Status, Z: Year of Birth and Unobserved Variables; two links are marked with an X.]
Extract variation in W from Z that is free of unobserved confounders
and use this variation to estimate the causal effect of W on Y.
Key IV Assumptions: (1) Z independent of unobserved variables;
(2) Z does not have direct effect on outcome.
Vascular access in
hemodialysis
• Hemodialysis
– One of main treatment options in end-stage
renal disease (ESRD)
– Requires access to vascular system
• Three main types
– Catheter
– Synthetic material
– Native arteriovenous fistula (AVF)
Vascular access (cont’d)
• Type of VA (A) partially determines
dose of dialysis (DD; S)
– Native AVF allows larger doses than
catheter
– S may affect outcomes (e.g., mortality)
• VA may have effects on outcome
(Y) not mediated by dose (e.g.,
infection)
• Incomplete directed acyclic graph
(DAG) of key variables
[Figure: DAG with nodes A, S and Y.]
Estimand of interest
• To gauge impact of type of VA,
interested in overall effect
– Involves both
• Direct effect (A->Y)
• Indirect effect (A->S->Y)
• Formulate in terms of potential
outcomes:
– singly indexed, overall effect: $Y_{a_1} - Y_{a_0}$
– doubly indexed:
• overall effect: $Y_{a_1 S_{a_1}} - Y_{a_0 S_{a_0}}$
• direct effect: $Y_{a_1 S_{a_0}} - Y_{a_0 S_{a_0}}$
• indirect effect: $Y_{a_1 S_{a_1}} - Y_{a_1 S_{a_0}}$
[Figure: DAG with nodes A, S and Y.]
Confounding by indication
• AVFs given preferentially to healthier
subjects
• Results in confounding by indication
– Often difficult to control using standard
methods based on ignorable treatment
assignment
– Variety of treatments of dialysis patients in
which standard approaches based on
ignorability lead to implausible results
• Dose of dialysis choice (S) also
nonignorable
Instrumental variables
• Alternative approach for estimation
• Need to find instrumental variable (R)
– Associated with treatment of interest (A)
– Independent of unmeasured confounders, i.e., shares no
unmeasured common cause with outcome Y.
– Has no direct effect on outcome (exclusion restriction)
• Practice at which dialysis provided reasonable candidate
– Used for various analyses in Dialysis Outcomes and Practice
Patterns Study (DOPPS)
• Large, international study with hundreds of practices
• Will assume that practice (R) shares no unmeasured
common causes with S or Y.
Revise DAG
• Need to elaborate DAG
• Include
– instrument/center (R)
– Measured (X) and
unmeasured (U) common
causes of variables of interest
• Is R a valid instrument for
the overall effect of A on Y?
[Figure: DAG with nodes Y, S, A, R, U and X.]
Graphical criteria for instrument
• Remove effect of treatment of
interest
• Check whether R independent
of/D-separated from Y
• Directed path R->S->Y
• Criterion not satisfied
• R not a valid instrument for
overall effect of A
• In Angrist, Imbens & Rubin
framework, the problem is that R
has direct effect on Y through S
and hence violates the exclusion
restriction.
[Figure: DAG with nodes Y, S, A, R, U and X.]
Second Example: Return to
Schooling
• Y=Earnings, A=Years of Education
• Unmeasured confounders: Ability, Motivation.
• Card (1993) proposes as an IV, R= distance person
grew up from nearest four year college.
• Problem:
– R also affects whether person lives in an
SMSA as an adult (S) conditional on A and
measured confounders X (whether lived in an
SMSA growing up, region where grew up and
family background variables).
– There is a wage premium to living in an
SMSA as an adult.
Return to Schooling DAG
• R (living near college
growing up) is not a valid
instrument for the overall
effect of A (years of
schooling) on Y (earnings)
because it has direct effect
on Y through S (lives in
SMSA as an adult).
[Figure: DAG with nodes Y, S, A, R, U and X.]
Estimation
• For estimating overall effects of A in these
two problems, can’t use
– Standard methods based on ignorability
– Standard instrumental variables methods
• Idea: Look for interactions between R and
X that can serve as instruments.
Extended Instruments
• Look for component of X that
interacts with R to affect A but not
Y directly.
• Card proposes family income as
component of X that
– Interacts with R to affect A: college
proximity is a factor that lowers costs
of higher education; consequently it
has a bigger effect on a poorer family
– Does not directly affect S or Y: the
direct earnings effect of living near a
college or the direct effect on living in
an SMSA does not vary by family
background.
[Figure: DAG with nodes R*X, R, Y, S, A, U and X.]
Two-step approach
• Estimate joint effect of A, S on Y
• Estimate effect of A on S
• Combine to obtain overall effect
• In systems of linear models, overall
effect is sum of
– Direct effect of A: ψA
– Indirect effect of A: ψSΦA
[Figure: path diagram with nodes A, S and Y; paths labeled ψA, ΦA and ψS.]
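The sum can be checked with a short derivation. It assumes the linear model for the joint effect, $Y_{as} = Y_{00} + a\psi_A + s\psi_S$, together with a linear model for the effect of A on S with the instrument held fixed, written here as $S_a = S_0 + a\Phi_A$ (the symbol $S_0$ is introduced only for this sketch):

```latex
\begin{aligned}
Y_{a S_a} &= Y_{00} + a\,\psi_A + (S_0 + a\,\Phi_A)\,\psi_S, \\
Y_{a_1 S_{a_1}} - Y_{a_0 S_{a_0}}
  &= (a_1 - a_0)\,\psi_A + (a_1 - a_0)\,\psi_S\,\Phi_A
   = (a_1 - a_0)\,(\psi_A + \psi_S\,\Phi_A).
\end{aligned}
```

The first term is the direct effect and the second the indirect effect through S, so a one-unit change in A has overall effect $\psi_A + \psi_S\Phi_A$.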
Two-step approach (1st step)
• Yas potential outcome
• Model for joint effect:
– Yas=Y00+aψA+sψS
– Rank-preserving/deterministic formulation
• Model for observables
– E*=Best Linear Predictor
– E*(Y|X,R)=E*(YAS|X,R)=
E*(Y00|X,R,X*R)+E*(A|X,R,X*R)ψA+E*(S|X,R,X*R)ψS
– Identifiability requires that E*(Y00|X,R,X*R), E*(A|X,R,X*R) and
E*(S|X,R) not collinear.
• One way: Assume E*(Y00|X,R,X*R) only depends on X. Then we
need one component of X that interacts with R to affect A.
• Another way: Assume E*(Y00|X,R,X*R) depends on X and R but not
X*R. Then we need at least two components of X that interact with
R to affect.
– Estimation by two stage least squares. Regress A and S on X,
R and X*R. Regress Y on Â, Ŝ, X, R.
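The two regressions can be sketched on a small synthetic data set. Everything below is hypothetical: the data are a balanced design built so that the unmeasured confounder `u` is exactly uncorrelated with X, R and X*R, the true effects `PSI_A` and `PSI_S` are made-up values, and the helper functions are plain least squares written for this example. Two covariates are used so that two interactions are available, as in the second identification route above.

```python
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(rows, coef):
    """Fitted values for the given design rows and coefficients."""
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]

PSI_A, PSI_S = 0.10, 0.20  # hypothetical true effects of A and S on Y

# Balanced design over x1, x2, r and an unmeasured confounder u.
cells = list(product([0, 1], [0, 1], [0, 1], [-1, 1]))
A = [x1 + r + 2 * x1 * r + u for x1, x2, r, u in cells]
S = [0.5 * a + x2 * r + 0.5 * u for a, (x1, x2, r, u) in zip(A, cells)]
Y = [1 + x1 + 0.5 * x2 + 0.3 * r + u + PSI_A * a + PSI_S * s
     for a, s, (x1, x2, r, u) in zip(A, S, cells)]

# First stage: regress A and S on X, R and X*R.
first = [[1, x1, x2, r, x1 * r, x2 * r] for x1, x2, r, u in cells]
A_hat = fitted(first, ols(first, A))
S_hat = fitted(first, ols(first, S))

# Second stage: regress Y on the fitted values, X and R.
second = [[ah, sh, 1, x1, x2, r]
          for ah, sh, (x1, x2, r, u) in zip(A_hat, S_hat, cells)]
psi_a_2sls, psi_s_2sls = ols(second, Y)[:2]

# Naive regression of Y on the observed A and S, for comparison.
naive = [[a, s, 1, x1, x2, r] for a, s, (x1, x2, r, u) in zip(A, S, cells)]
psi_a_ols, psi_s_ols = ols(naive, Y)[:2]
```

In this constructed example the two-stage estimates reproduce `PSI_A` and `PSI_S`, while the naive regression is pulled away from `PSI_S` by the confounder. Standard errors read off a hand-run second stage are not valid two stage least squares standard errors, so only the point estimates are illustrated.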
Two-step approach (2nd step)
• Under assumptions
– Effect of A on S confounded
– R not instrument for effect of A on S
• Consider alternative
– Linear model for joint effect of R, A
– Sra=S00+rΦR+aΦA
• Model for observables
– E*(S|X,R)=E*(S00|X,R,X*R)+RΦR+
E*(A|X,R,X*R)ΦA
• Can estimate by 2SLS under the assumption
that E*(S00|X,R,X*R) does not depend on X*R
(uncheckable) and that X*R affects A.
• Regress A on X, R, X*R. Regress S on Â, X,
R.
[Figure: DAG with nodes Y, S, A, R*X, R, U and X.]
Results for Card’s Data

Method | Estimate of Overall Effect of A | SE
Path Analysis (OLS) | 0.0762 | 0.0004
Two Step Extended IV | 0.1503 | 0.0462
Y= log earnings
A= years of schooling
S = lives in SMSA as an adult
R = lived near 4 year college growing up
X = experience, experience squared, black indicator,
indicator for living in SMSA growing up,
indicators for region growing up,
mother and father’s education
Summary
• The IV method can be a powerful strategy for
observational studies when there are
confounders that are hard to measure and there
is a “random” encouragement to receive
treatment.
• When encouragement is not actually random, it
is important to do a sensitivity analysis.
• Strong IVs are much less sensitive to bias.
• When the exclusion restriction might be violated,
we developed extended IV methods that use X*R as
IVs.
Papers
• Small, D.S. and Rosenbaum, P.R. (2008), “War
and Wages: The Strength of Instrumental
Variables and Their Sensitivity to Unobserved
Biases,” Journal of the American Statistical
Association, 103, 924-933.
• Joffe, M. M., Small, D.S., Brunelli, S., Ten Have,
T.R., and Feldman, H. I. (2008), "Extended
Instrumental Variables Estimation for Overall
Effects," International Journal of Biostatistics, 4.
• Baiocchi, M., Small, D.S., Lorch, S.A. and
Rosenbaum, P.R. (2010), “Building a Stronger
Instrument in an Observational Study of
Perinatal Care for Premature Infants,” Journal of
the American Statistical Association, 105, 1285-1296.
• e-mail: dsmall@wharton.upenn.edu
Alternative estimands
• Assumed that interested in overall effect
– Vascular Access (VA) inevitably affects Dose
of Dialysis (DD)
• Type of VA limits possible dose
• However, may be possible to alter DD
• Interested in
– Effect of DD
– Effect of VA if affects DD in different fashion
from under current practice
Alternative estimands (cont’d)
• Show altered effect, new
intervention on DAG
• Formulate in terms of potential
outcomes
– $S_{g,a}$: target level of S
under treatment a, plan g
– $E(Y_{a S_{g,a}})$: expected level of Y
under treatment a, plan g
• Contrast for different levels of
treatment
[Figure: DAG with nodes Y, S, A, R, U and X.]
Alternative estimands (cont’d)
• Defining intervention on S
– Individualize target levels of S
• e.g., base on maximum tolerated DD
• Insufficient information in established databases
(e.g, DOPPS)
– Set target level of S based on A, covariates X
• Currently little information to set target levels
• Available covariate information may be insufficient
to determine whether particular DD feasible for
individual
Alternative estimands (cont’d)
• Defining intervention on S
– Speculate about feasible interventions on S at
aggregate level
• Consider effects of A on S under those
interventions; i.e., propose value for ΦA*
• Compute overall effect from component effects:
ψA+ψSΦA*
• Perform sensitivity analysis for values of ΦA*
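The sensitivity analysis over ΦA* amounts to recomputing ψA+ψSΦA* on a grid of proposed values. A minimal sketch follows; the component effects and the grid are hypothetical placeholders, not estimates from the dialysis study.

```python
def overall_effect(psi_a, psi_s, phi_a_star):
    """Overall effect of A under a proposed effect of A on S (phi_a_star)."""
    return psi_a + psi_s * phi_a_star

# Hypothetical component effects and grid of proposed values for phi_a_star.
psi_a, psi_s = 0.10, 0.20
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
sensitivity_table = [(phi, overall_effect(psi_a, psi_s, phi)) for phi in grid]
```

Each row pairs a proposed ΦA* with the implied overall effect, which makes it easy to see how much the conclusion depends on the speculated intervention.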
One-step approach
• Estimator of effect of A on S does not
require either standard ignorability or IV
• Can we do same for overall effect of A
on Y?
• Remove S from graph, redraw diagram
• Graph identical to original graph
removing Y
• Use same methods of estimation as for
effect of A on S
[Figure: two DAGs, one with nodes A, R*X, R, Y, U and X and one with nodes A, R*X, R, S, U and X.]
Results for Card’s Data

Method | Estimate of Overall Effect of A | SE
Path Analysis (OLS) | 0.0762 | 0.0004
Two Step Extended IV | 0.1503 | 0.0462
One Step Extended IV | 0.1500 | 0.0462
Y= log earnings
A= years of schooling
S = lives in SMSA as an adult
R = lived near 4 year college growing up
X = experience, experience squared, black indicator,
indicator for living in SMSA growing up,
indicators for region growing up,
mother and father’s education