Estimating Treatment Effects with
Observational Data using Instrumental
Variable Estimation: The Extent of
Inference
John M. Brooks, Ph.D.
Health Effectiveness Research Center (HERCe)
Colleges of Pharmacy and Public Health
University of Iowa
June 8, 2004
When the folks at the Academy asked me to do this education-oriented
presentation about instrumental variables, I looked through the literature some more
and realized that what is often missing is an appreciation of the assumptions
required for IV estimation and the extent to which one can make inferences from
these estimates.
That is the focus of this talk.
1
Research Goal:
Estimate causal relationships between
“treatment” and “outcome” in healthcare...
• treatment on outcome
• behavior on outcome
• system change on outcome
2
2
The best estimation method to make inferences about
these relationships is a function of:
1. the manner in which the researcher collects data; and
2. the approach used to control for “confounding factors”
confounding factors: factors related to both the
treatment and outcome.
3
3
Research Environments and Estimation Methods
[Figure: estimation methods arranged along a spectrum from statistical control of confounding factors (secondary databases) to design control of confounding factors (researcher-collected databases). Methods shown:]
• Statistical “matching” techniques (propensity scores)
• ANOVA
• Quasi-experimental designs
• Logistic regression
• Instrumental variables – “ex post design”
• Multiple regression – “risk adjustment”
• Weighted regression techniques for survey databases (NMES, MEPS)
• Entirely controlled experiment
• Randomized controlled trials
4
Rarely are folks experts in all the methods listed here.
Points:
1. Researcher-collected data ... get correct measures (in theory ... nothing unmeasured
... no unmeasured confounding factors). Rarely do I see my survey researcher
friends discuss the effect of unmeasured variables...if important, they measured it!!
2. The risk of not measuring confounding factors increases the more the researcher
is left out of the data collection process.
3. Risk adjustment often leaves reviewers unsatisfied. Hence the development of
more “design”-based methods: the application of design “ex post”.
4
Sources of Treatment Variation in Health Care
1. Randomized Controlled Trials: study of patients with
a given medical condition in which treatment is
randomly assigned.
• Why randomly assign treatment to patients?
To help ensure that estimated treatment effects are
attributable to the treatment and not unmeasured
confounders.
The Gold Standard
5
To estimate treatment effects, you need treatment variation. Patients must choose
or be given different treatments in order to assess effects.
Hopefully unmeasured confounders will be distributed evenly across groups.
Interesting to note that folks will show whether measured confounders are
distributed evenly across groups.
5
• Why don’t we do more Randomized Controlled Trials
between approved treatments?
→ ethical problems
→ expensive and time-consuming
→ little motivation
→ inability to generalize
6
Focus on comparing drugs already released. Little incentive once treatment is
approved. And once approved in the real world, who consents to randomization?
6
2. Observational Healthcare Databases
• Database Types:
→ Claims: medical service treatment claims from
individuals with health insurance
→ Provider-Specific: databases describing the
utilization of a set of providers.
→ Health Care Surveys: surveys of patients or
providers detailing health
care utilization.
7
7
• Strengths:
→ plenty of variation in treatment choice;
→ potentially enhanced ability to generalize – reveals
variation in treatment choice across a variety of
clinical scenarios;
→ can assess treatments in practice – estimate
“effectiveness”;
→ unobtrusively collected;
→ the power of large numbers and time.
8
8
• Weaknesses:
→ data usually not collected for researcher purposes;
→ missing information;
- care not covered is not observed
- care not claimed is not observed
- claim form limitations
- nuances of illness, treatment, and patient that can’t
be recorded on claims forms
→ patient enrollment variation;
→ confounding information may be unobserved.
9
9
Is the Main Source of Weakness with Observational Data
Unmeasured Confounders or Treatment Selection Bias?
1. Unmeasured Confounders
• Unmeasured Confounders argument:
→ homogeneous treatment effect;
→ unmeasured factors related to both treatment
and outcome are the source of bias.
10
This conversation is needed to gain an understanding of the biases involved, and
the inferences that can be made from IV estimation
10
• Assume true outcome relationship is:
$$Y = a_0 + a_1 \cdot T + a_2 \cdot L + e$$
where:
Y = measure of outcome (e.g. 1 if survive to a
certain time period, 0 otherwise);
T = 1 if receive treatment, 0 otherwise; and
L = additional factor (e.g. severity, other treatments).
Goal is to estimate a1 – the effect of treatment on outcome.
11
a1 is the truth that we are trying to estimate.
11
• For Estimation Suppose:
→ L is not measured and the estimation model is:
$$Y = a_0 + a_1 \cdot T + u$$
where:
$$u = (a_2 \cdot L + e)$$
→ L is related to Y (a2 ≠ 0); and
→ T and L are related (Cov(T,L) ≠ 0).
Cov(T,L) – covariance of T & L. Cov(T,L) ≠ 0 essentially
means that T & L move together.
12
12
• Define the ordinary least squares (ANOVA) estimate of
a1 as â1.
→ It can be shown that under these assumptions â1 is
a biased estimate of a1 through its expected value:
$$E[\hat{a}_1] = a_1 + \mathrm{Cov}(T,L) \cdot a_2$$
(Strictly, the bias term is a2•Cov(T,L)/Var(T); because Var(T) > 0, the sign of the bias is unchanged.)
→ Also note that E[â1] will equal a1 if either:
-- Cov(T,L) = 0; or
-- a2 = 0.
13
If either condition holds, L is not a confounder. Randomization hopes to yield Cov(T,L) = 0.
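The omitted-variable result above can be checked with a small simulation. This is only a sketch under stated assumptions: the parameter values, the uniform severity distribution, the treatment rule, and the continuous outcome are all illustrative choices, not part of the talk. The comparison uses the standard form of the bias, in which the Cov(T,L)•a2 term is divided by Var(T).

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


def ols_slope(t, y):
    """Slope from regressing y on t alone (with an intercept)."""
    return cov(t, y) / cov(t, t)


def simulate(n=20000, a0=0.9, a1=0.05, a2=-0.10, seed=0):
    """Return (naive OLS estimate of a1, a1 plus the omitted-variable bias)."""
    rng = random.Random(seed)
    # L: unmeasured severity, uniform on [0, 1] (illustrative assumption)
    L = [rng.random() for _ in range(n)]
    # More severe patients are treated more often, so Cov(T, L) > 0
    T = [1 if rng.random() < 0.2 + 0.6 * l else 0 for l in L]
    # Continuous outcome with small noise; a2 < 0 means severity lowers it
    Y = [a0 + a1 * t + a2 * l + rng.gauss(0, 0.05) for t, l in zip(T, L)]
    naive = ols_slope(T, Y)                      # L is left out of the model
    predicted = a1 + a2 * cov(T, L) / cov(T, T)  # a1 plus the bias term
    return naive, predicted


naive, predicted = simulate()
print(f"true a1 = 0.05, naive estimate = {naive:.3f}, a1 + bias = {predicted:.3f}")
```

With a2 < 0 and Cov(T,L) > 0, the naive estimate falls below the true a1, and it tracks a1 plus the bias term closely.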
13
• Suppose theory about the unmeasured variable “L”
suggests:
→ “a2 < 0” (patients with higher severity have lower
cure rate).
→ Cov(T,L) > 0 (treated patients are generally more
severe).
• Plug the “signs” into our expected value formula to find:
$$E[\hat{a}_1] = a_1 + (+)(-) = a_1 + (-) \;\Rightarrow\; E[\hat{a}_1] < a_1$$
14
Treating mainly more severe patients, who have a lower chance of a good
outcome, will generally yield estimates that are biased low.
The reverse is also true.
14
• Problem with the Unmeasured Confounders argument
to describe bias in observational data:
→ It does not provide a theoretical foundation to link
treatments to unmeasured factors....
Why is Cov(T,L) ≠ 0?
→ In the case we just described, if treatment effect (a1)
is the same for all patients, why would Cov(T,L) > 0?
Perhaps patients getting treatments:
-- live in areas with high/low poverty;
-- live in areas with more pollution; or
-- also tend to get other unmeasured treatments.
15
If the benefit from treatment is identical regardless of severity, age,
comorbidities, and other treatments, why would some patients with different L values
get treated and others not?
15
2. Treatment Selection Bias (the gestalt underlying most
negative reviewer comments)
• Treatment Selection Bias argument:
→ heterogeneous treatment effect -- Cov(T,L)
reflects the decision-maker’s beliefs about the
differences in treatment effectiveness across
patients; and
→ bias comes from unmeasured factors (severity,
other treatments) related to the treatment’s
expected effectiveness that affect both
treatment choice and outcome.
16
Not sample selection bias (though that needs to be remembered here) but TREATMENT
selection bias.
People that get the treatment are the ones most likely to benefit MORE...
16
• Assume true outcome relationship is:
$$Y = b_0 + (b_1 + b_2 \cdot L) \cdot T + b_3 \cdot L + e$$
where:
Y = measure of outcome (e.g. 1 if survive to a certain time period, 0 otherwise);
T = 1 if receive treatment, 0 otherwise;
L = unmeasured factor (e.g. severity, other treatment);
b3 = the direct effect of L on Y; and
(b1 + b2•L) = effect of T on Y that depends on L.
17
L is related to Y through change in treatment
effectiveness (b2 ≠ 0) and through its direct effect on
outcome (b3 ≠ 0).
17
→ L is now related to T through theory linking “treatment
choice” to the decision-maker’s expectations of
treatment benefits across patients with different “L”.
$$T = c_0 + c_1 \cdot L + c_2 \cdot W + v$$
where:
T = 1 if receive treatment, 0 otherwise;
L = unmeasured factor (e.g. severity, other
treatment) affecting treatment choice through
expected treatment effectiveness; and
W = other factors affecting treatment choice.
If decision makers use L in treatment
decisions, c1 ≠ 0 and Cov(T,L) ≠ 0.
18
L belongs here because decision-makers believe that the effectiveness of T
changes with L.
Remember c1 for what follows.
18
• Ultimate goal should be to estimate (b1 + b2•L) – the
effect of treatment T on outcome Y across levels of L.
• For estimation suppose:
→ L is not measured and the researcher wrongly assumes
that the effect of T is homogeneous, so the
estimation model is:
$$Y = a_0 + a_1 \cdot T + u$$
where:
$$u = (b_2 \cdot L \cdot T + b_3 \cdot L + e)$$
19
19
• Define the ordinary least squares (ANOVA) estimate of
a1 as â1.
→ It can be shown that the expected value of â1 is:
$$E[\hat{a}_1] \approx b_1 + b_2 \cdot \frac{E[T] \cdot E[L]}{\mathrm{var}[T]} + c_1 \cdot (b_2 + b_3)$$
→ If c1 = 0 (no selection based on L), then E[â1]
becomes:
$$E[\hat{a}_1] \approx b_1 + b_2 \cdot \frac{E[T] \cdot E[L]}{\mathrm{var}[T]}$$
Yields an average estimate that depends on the mix of
“L” in the population (e.g. RCT using a broad population).
20
Problem here is that there is no “truth” against which to compare our estimate.
This is like an RCT that assumes a homogeneous treatment effect across a broad population
with heterogeneous treatment effects.
A valid estimate, but not very useful. It is the average estimate if all patients were put into
an urn...
20
• How does c1 • (b2 + b3) affect this estimate?
→ Assume that L is unmeasured illness severity
and that higher L means more severe illness.
→ Higher L lowers survival which implies b3 < 0.
→ If treatment benefit is less for more severe cases
(e.g. surgery for heart attacks) then:
$$b_2 < 0 \;\Rightarrow\; c_1 < 0 \;\Rightarrow\; c_1 \cdot (b_2 + b_3) = (-) \cdot ((-) + (-)) > 0$$
where b2 < 0 means benefit falls with higher severity, and c1 < 0 means less treatment in more severe cases.
Estimate of average population treatment benefit will
be biased high.
21
Less severe get treatment, and less severe have higher benefit.
So, as an estimate of the average population treatment benefit, it will be biased
high,
because the folks that get treated benefit the most and they are generally less severe.
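A minimal simulation sketch of this case, assuming made-up parameter values (b0, b1, b2, b3), a uniform severity distribution, a hypothetical treatment rule with less treatment at higher severity, and a continuous outcome. It compares the naive treated-versus-untreated difference with the average effect in the whole population.

```python
import random


def simulate_selection(n=20000, b0=0.7, b1=0.2, b2=-0.2, b3=-0.3, seed=1):
    """Return (naive treated-minus-untreated difference, average population effect)."""
    rng = random.Random(seed)
    treated, untreated, severities = [], [], []
    for _ in range(n):
        L = rng.random()  # unmeasured severity in [0, 1] (illustrative assumption)
        # c1 < 0: the more severe the patient, the less likely treatment is
        T = 1 if rng.random() < 0.8 - 0.6 * L else 0
        # Treatment benefit (b1 + b2*L) falls with severity because b2 < 0
        Y = b0 + (b1 + b2 * L) * T + b3 * L + rng.gauss(0, 0.05)
        (treated if T else untreated).append(Y)
        severities.append(L)
    naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
    # Average effect if everyone were treated: b1 + b2 * mean(L)
    average_effect = b1 + b2 * sum(severities) / n
    return naive, average_effect


naive, average_effect = simulate_selection()
print(f"naive difference = {naive:.3f}, average population effect = {average_effect:.3f}")
```

Under these assumptions the naive difference overstates the average population benefit, which matches the "biased high" conclusion.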
21
→ If treatment benefit is greater for more severe cases
(e.g. antibiotics for otitis media) then:
$$b_2 > 0 \;\Rightarrow\; c_1 > 0 \;\Rightarrow\; c_1 \cdot (b_2 + b_3) = (+) \cdot ((+) + (-)) \gtrless 0$$
where b2 > 0 means benefit increases with higher severity, and c1 > 0 means more treatment in more severe cases.
Estimate of average population treatment effect is
biased but the sign cannot be determined.
22
Selection toward patients with higher benefit biases the estimate upward, but treating more severe
patients biases it downward.
Interesting to note that if b3 is not negative, pure selection will always cause the
estimate to be biased high.
22
• So what do we have here?
→ Observational data contains enormous treatment
variation.
→ Treatment choice may be related to the selection or
sorting of patients using unmeasured (to the
researcher) characteristics that are related to
expected outcomes.
→ Under “selection”, standard statistical techniques
yield biased estimates that don’t apply to anyone
anyway.
Do we have any alternatives?
23
23
Instrumental Variables (IV) Estimation and “Subset B”
• IV estimation offers consistent estimates for a subset of
patients (McClellan, Newhouse 1993):
Marginal Patients: patients whose treatment choices vary
with measured factors called instruments
that do not directly affect outcomes.
• McClellan and Newhouse argue that estimates of treatment
effects for Marginal Patients are useful:
→ They are estimates for patients for whom the benefits of
treatment are the least certain – patients least like those in
RCTs.
→ Estimates may be more suitable than RCT estimates to
address the question of whether existing treatment rates
should change.
24
Two key “subsets” here. Subset B and Marginal Patients. I will also “group”
patients later to isolate subsets. Could be a bit confusing...ask questions if so.
Could be thought of as limiting, but it is offered that...
Instruments are generally “non-clinical” to fit the non-direct criteria.
If some non-clinical factor affects treatment choice, it must be that the best choice is
considered unclear.
24
• Where do Marginal Patients come from?
Distribution of Patients by Prior Assessment of
the Certainty of Treatment Benefit
[Figure: patients arrayed along a 0% to 100% scale, from more certainty about treatment benefits to less certainty about treatment benefits, divided into Subsets A, B, and C.]
A = subset of patients all providers agree to treat.
C = subset of patients all providers agree not to treat.
B = subset of patients whose treatment choice is
situation/provider dependent.
25
Given measured and unmeasured characteristics and existing clinical evidence.
A and C in “all situations”.
25
• Patients in Subset B are interesting because:
→ the “best” treatment choice (treat or don’t treat) is least
certain;
→ treatment or no-treatment for a patient in this subset is
not considered bad medicine – the “art” of medicine;
→ the possibility of gaining new RCT evidence for patients
in this subset is remote (ethics, motivation);
→ McClellan et al. 1994 argue that policy interventions
affect mainly the treatment choices for patients in this
subset; and
→ Non-clinical factors (e.g. provider access, market
pressures) affect mainly the treatment choices of
patients in this subset.
26
Whereas non-treatment in the “A” group and treatment in the “C” group would be
considered bad medicine.
26
• Size and location of Subset B varies with clinical scenario.
→ treatment with little consensus (e.g. aggressive treatment
for early-stage prostate cancer):
[Figure: 0% to 100% scale from more certainty to less certainty, showing Subsets A, B, and C.]
→ off-label use for new treatment (e.g. new anti-cancer
drugs used in non-tested cancer populations):
[Figure: 0% to 100% scale from more certainty to less certainty, showing Subsets B and C only.]
27
Of course, the selection of the underlying population will matter here.
27
• Changes in the underlying population definition will affect
the location of Subset B.
→ aggressive treatment for early-stage prostate cancer for
50-60 year-olds with no comorbidities:
[Figure: 0% to 100% scale from more certainty to less certainty, showing Subsets A, B, and C.]
→ aggressive treatment for early-stage prostate cancer for
70-80 year-olds with one comorbidity:
[Figure: 0% to 100% scale from more certainty to less certainty, showing Subsets A, B, and C.]
28
People with the same disease can have different distributions based upon
measured characteristics. In contrast to the last page...
28
• IV estimation involves:
1. Finding measured variables or “instruments” (Z) that:
a. are related to the possibility of a patient receiving
treatment (cov(T,Z) ≠ 0); and
b. are assumed (through theory) unrelated directly to Y
or to unmeasured confounding variables (cov(Z,L) = 0).
The theoretical basis for “Z” variables should come from
a model of treatment choice – the “W” variables in:
$$T = c_0 + c_1 \cdot L + c_2 \cdot W + v$$
where:
W = other factors affecting treatment choice.
29
So the treatment variation described by the instrument is unrelated to unmeasured
confounders. Supported by a theoretical story of plausibility.
29
• IV estimation involves (cont.):
2. Grouping patients using values of the “instrument”.
3. Estimating treatment effects for marginal patients by
exploiting differences in treatment rates across
patient groups.
Local Average Treatment Effect (Imbens & Angrist 1994)
30
The approach more naturally reflecting usual causal research in healthcare.
So the treatment variation described by the instrument is unrelated to unmeasured
confounders.
30
• For example, if an instrument divides patients into two
groups, a simple IV estimate can be found by calculating:
1. the overall treatment rate in each group (ti = treatment
rate in group “i”); and
2. the overall outcome rate in each group (yi = outcome
rate in group “i”); and estimate:
$$\hat{a}_{1IV} = \frac{\text{difference in outcome rate}}{\text{difference in treatment rate}} = \frac{y_1 - y_2}{t_1 - t_2}$$
where:
â1IV = average treatment effect for the “marginal patients”
specific to the instrument used in the analysis –
only those patients whose treatment choices were
affected by the instrument, who must have come
from Subset B.
31
All you need is 4 little numbers!!!!!!
What did the increase in treatment rate buy in terms of change in outcome rate?
31
• Hypothetical Treatment Choices Across Patients
Grouped by Access to Providers Required for Treatment
[Figure: two 0% to 100% scales from more certainty to less certainty, one for the patient group closer to providers required for treatment and one for the patient group further from providers required for treatment. Each scale shows Subsets A, B, and C with the treated patients marked. The scale marks 50% and 60%, and the patients labeled M lie within Subset B.]
M = patients within Subset B whose treatment choices
are affected by the instrument – the Marginal
Patients for that instrument.
32
For example, define providers and provider location, measure distance, group
patients, etc.
Other instruments may select a different group from Subset B
32
• We have treatment rates for each group:
Closer Group Treatment Rate: .60
Further Group Treatment Rate: .50
Suppose we also measured “cure” rates in both groups:
Closer Group Cure Rate: .40
Further Group Cure Rate: .38
• Four numbers lead to the following IV estimate:
$$\hat{a}_{1IV} = \frac{.40 - .38}{.6 - .5} = \frac{.02}{.1} = .2$$
33
Note the “four little and easily measured numbers”.
Given this estimate, in its rawest form, some might say: if the treatment rate went
from 0 to 1 (100%) for those folks affected by the instrument and you could
generalize to everyone, the cure rate would increase by .2 (20 percentage points).
Take the “why this is” on faith for a minute. I will demonstrate with another
hypothetical example.
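The four-number calculation can be written as a few lines of code. This is a minimal sketch; the function name `wald_iv_estimate` and its argument names are illustrative, not from the talk.

```python
def wald_iv_estimate(y1, y2, t1, t2):
    """Two-group IV estimate: difference in outcome rates over difference in treatment rates."""
    if t1 == t2:
        # The instrument must shift treatment rates, otherwise the ratio is undefined
        raise ValueError("treatment rates do not differ across groups")
    return (y1 - y2) / (t1 - t2)


# Closer group: treatment rate .60, cure rate .40
# Further group: treatment rate .50, cure rate .38
estimate = wald_iv_estimate(y1=0.40, y2=0.38, t1=0.60, t2=0.50)
print(round(estimate, 3))  # 0.2
```

The guard on equal treatment rates mirrors the first instrument criterion: without a difference in treatment rates across groups there is nothing to estimate from.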
33
• Strict Interpretation:
→ If the treatment rate in the Further Group was increased by .01
(one percentage point, e.g. .50 to .51) by increasing treatment
for the M patients in the Further Group, the cure rate in the
Further Group would increase .002 (.01 • .2) – from .38 to
.382.
• Stretched “Policy-Relevant” Interpretation (McClellan et al.
1994)
→ A behavioral intervention that increases the overall
treatment rate by .01 (one percentage point, e.g. .55 to .56)
would lead to an increase in the cure rate of .002 (.01 • .2).
34
This may not be the case.
34
• Stretched interpretation assumes that the treatment effect
for patients in Subset B is fairly homogenous and an IV
estimate from a single instrument can be generalized to all
patients in Subset B. This allows one to say:
• Stretched interpretation is not perfectly accurate if treatment
effects are heterogeneous within Subset B and different
instruments affect treatment choices from different patients
within Subset B.
→ Results from a single instrument may still be more
appropriate than assuming RCT results apply to Subset B.
→ Ability to generalize results may increase if more than one
instrument is used in an IV analysis.
35
This may not be the case.
35
• IV qualifiers to remember:
→ second property of IV variables (cov(Z,L) = 0) is
forever an assumption (unless more data are
obtained); and
→ unmeasured but correlated treatments may still bias
estimated treatment benefits.
Researchers should fully qualify their IV estimates – don't
oversell.
36
36
Hypothetical Example to Demonstrate “4-Number” Result
Suppose:
• 2100 children with Acute Otitis Media (AOM) in a
population.
• Two treatment possibilities:
1.
2.
antibiotics;
watchful waiting.
• The patients in our sample are in one of three severity
types “low”, “medium”, and “high”
• Severity type is observed by the provider/patient but is
37
not observed by the researcher.
37
• The 2100 patients are distributed across severity type in the
following manner:

                      severity type
                      High    Medium   Low
number of patients    800     800      500

• The actual underlying cure rates for each severity type by
treatment are:

                      severity type
treatment             High    Medium   Low
antibiotics           .95     .97      .98
watchful waiting      .80     .90      .98
38
38
→ Higher severity means a lower cure rate in general
(b3 < 0).
→ Antibiotics have a higher curative effect in more severe
patients and offer no advantage to the less severe (b2 > 0).
ASSUMPTION: Treatment effects are heterogeneous.
→ All providers have the inclination that antibiotics work well in
the "high" severity patients and have little effect on the "low"
severity patients, but the effect in the "medium" type is
unknown to providers.
Leads to selection bias...the more severe kids are
treated (c1 > 0).
39
39
Potential Methods to analyze:
1. Randomize Patients Into Treatments -- ANOVA
2. Providers Assign Treatments -- ANOVA
3. Instrumental Variable Grouping
40
40
1. Randomize Patients Across Population – ANOVA.
Patient Treatment Assignments After Randomization
by Severity Type

                      severity type
patient groups        High    Medium   Low
antibiotics           400     400      250
watchful waiting      400     400      250
41
41
Expected average cure rates for each group:
$$\text{Antibiotic Cure Rate} = \frac{400}{1050} \times .95 + \frac{400}{1050} \times .97 + \frac{250}{1050} \times .98 = .965$$
$$\text{W.W. Cure Rate} = \frac{400}{1050} \times .80 + \frac{400}{1050} \times .90 + \frac{250}{1050} \times .98 = .881$$
• Unbiased average antibiotic treatment effect for the entire
population (.965 - .881 = .084), but
• To whom does it apply? A patient randomly chosen
from an urn? Are patients chosen from urns?
42
42
2. Providers Assign Treatments -- ANOVA
If providers follow “inclinations”, we may end up with
something like:
Number of Patients Assigned by Providers to Each
Treatment Group by Severity Type

                      severity type
patient group         High    Medium   Low
antibiotics           800     400      0
watchful waiting      0       400      500
43
c1 > 0 ... Higher severity, more likely to be treated.
43
Expected average cure rates for each group:
$$\text{Antibiotic Cure Rate} = \frac{800}{1200} \times .95 + \frac{400}{1200} \times .97 + \frac{0}{1200} \times .98 = .957$$
$$\text{W.W. Cure Rate} = \frac{0}{900} \times .80 + \frac{400}{900} \times .90 + \frac{500}{900} \times .98 = .944$$
• For this population the average treatment effect is ≈ .084.
We find a biased low estimate of the antibiotic treatment
effect for the average patient (.957 - .944 = .013 < .084).
• To which patients does this estimate apply?
44
Relate to bias equation. b3 is more negative than b2 is positive
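The randomized and provider-assigned comparisons can be reproduced from the patient counts and the underlying cure rates. A minimal sketch follows; the dictionary layout and function names are illustrative choices.

```python
# Underlying cure rates by treatment and severity type (from the example)
CURE_RATE = {
    "antibiotics": {"high": 0.95, "medium": 0.97, "low": 0.98},
    "watchful_waiting": {"high": 0.80, "medium": 0.90, "low": 0.98},
}


def expected_cure_rate(treatment, counts):
    """Patient-weighted average cure rate for one treatment arm."""
    total = sum(counts.values())
    return sum(n * CURE_RATE[treatment][sev] for sev, n in counts.items()) / total


def naive_difference(assignment):
    """Antibiotic cure rate minus watchful-waiting cure rate (the ANOVA comparison)."""
    return (expected_cure_rate("antibiotics", assignment["antibiotics"])
            - expected_cure_rate("watchful_waiting", assignment["watchful_waiting"]))


randomized = {
    "antibiotics": {"high": 400, "medium": 400, "low": 250},
    "watchful_waiting": {"high": 400, "medium": 400, "low": 250},
}
provider_assigned = {
    "antibiotics": {"high": 800, "medium": 400, "low": 0},
    "watchful_waiting": {"high": 0, "medium": 400, "low": 500},
}

print(round(naive_difference(randomized), 3))         # 0.084
print(round(naive_difference(provider_assigned), 3))  # 0.012
```

The provider-assigned difference prints as 0.012 rather than .013 because the code subtracts unrounded rates, while the slide subtracts the rounded .957 and .944.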
44
3. Instrumental Variable Grouping -- Further:
a. Assume information is available to approximate
distances from patients to providers
• address of patient
• supply of providers in area around patients
b. Evidence suggests that patients in areas with more
physicians per capita have a higher probability of being
treated with antibiotics for their AOM than patients in
areas with fewer physicians per capita.
45
45
If “b” is true, divide 2100 patients into two groups based on
the physicians per capita in the area around their home:
Group 1: the group of patients living in areas with a higher
number of physicians per capita;
Group 2: the group of patients living in areas with a lower
number of physicians per capita;
46
46
Using our assumptions, does this grouping qualify as an
instrument?
1. Doc supply related to treatment? Yes, if patients tend to go to
the closest provider for treatment.
If true, and providers follow inclinations, we may see treatment
patterns something like:
Patient Treatment Assignments by Severity Type

                 severity type
patient group    High               Medium                       Low
Group 1          100% antibiotics   80% antibiotics, 20% W.W.    100% W.W.
Group 2          100% antibiotics   30% antibiotics, 70% W.W.    100% W.W.
47
Note I have assumed that the High group is subset A, low group is C and the
Medium group is Subset B.
47
2. Is grouping related to unmeasured confounding variables
(e.g. severity)? Related to severity only if parents chose
residences in expectation of the severity of a future acute
condition.
If not related to severity, we assume equivalent severity
distributions across groups:
Number of Patients in Each Group by Severity Type

                 severity type
patient group    High    Medium   Low
Group 1          400     400      250
Group 2          400     400      250
48
The good and the bad of IV approach...
What I like about IV over propensity scores... we can argue this point. Results are
conditional on a KNOWN assumption related to where we get the treatment
variation.
48
Expected average estimated cure rates for these groups:
$$\text{Group 1 Cure Rate} = \frac{400}{1050} \times .95 + \frac{320}{1050} \times .97 + \frac{80}{1050} \times .90 + \frac{250}{1050} \times .98 = .959428$$
$$\text{Group 2 Cure Rate} = \frac{400}{1050} \times .95 + \frac{120}{1050} \times .97 + \frac{280}{1050} \times .90 + \frac{250}{1050} \times .98 = .946095$$
Well, (.959428 - .946095) = .013333 doesn't appear to reveal
much of anything…
49
Notice that the only differences in the cure rates come from the different percentages
of Medium patients on antibiotics and W.W. in the two groups.
49
Now look at the antibiotic treatment rate in each group:
720/1050 = .685714 in Group 1
520/1050 = .495238 in Group 2
These differences also don't look very informative….
The IV change in the cure rates resulting from a one unit
increase in the drug treatment rate equals:
$$\hat{a}_{1IV} = \frac{.959428 - .946095}{.685714 - .495238} = \frac{.013333}{.190476} = .07$$
• This estimate is the average difference in the antibiotic cure
rate for the marginal or in this example the “Medium”
severity patients.
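The same result can be recovered directly from the patient counts in each instrument group. This is a minimal sketch; the data layout and the function name `group_rates` are illustrative. The researcher would see only the four group-level rates, while the severity-level cells here stand in for the unobserved truth.

```python
# Underlying cure rates by treatment and severity type (unknown to the researcher)
CURE_RATE = {
    "antibiotics": {"high": 0.95, "medium": 0.97, "low": 0.98},
    "watchful_waiting": {"high": 0.80, "medium": 0.90, "low": 0.98},
}

# (severity, treatment) -> number of patients, per instrument group
GROUPS = {
    "group_1": {("high", "antibiotics"): 400, ("medium", "antibiotics"): 320,
                ("medium", "watchful_waiting"): 80, ("low", "watchful_waiting"): 250},
    "group_2": {("high", "antibiotics"): 400, ("medium", "antibiotics"): 120,
                ("medium", "watchful_waiting"): 280, ("low", "watchful_waiting"): 250},
}


def group_rates(cells):
    """Return (treatment rate, expected cure rate) for one instrument group."""
    total = sum(cells.values())
    treated = sum(n for (_, trt), n in cells.items() if trt == "antibiotics")
    cured = sum(n * CURE_RATE[trt][sev] for (sev, trt), n in cells.items())
    return treated / total, cured / total


t1, y1 = group_rates(GROUPS["group_1"])
t2, y2 = group_rates(GROUPS["group_2"])
iv_estimate = (y1 - y2) / (t1 - t2)
print(round(iv_estimate, 4))  # 0.07
```

The ratio reduces to 14/200: the 200 extra Medium patients treated in Group 1 produce 14 extra expected cures, which is the .97 - .90 = .07 difference for Medium severity.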
50
50
• Remember the actual “unknown” cure rates for each
group by treatment are:

                      severity type
treatment             High    Medium   Low
antibiotics           .95     .97      .98
watchful waiting      .80     .90      .98
difference                    .07
• This estimate was found using only measured treatment
rates and outcome rates across “groups” that are
defined by the instruments.
• Which of the estimates above is the most important for
policy-makers wondering about over/underutilization of
a treatment?
51
51
IV Brass Tacks
• Where do instruments come from?
→ Theory on what motivated choices, not theory on
how choices can be motivated.
→ Observed differences in:
-- guideline implementation (timing/interpretation)
-- product approval rules across payers
-- reimbursement across payers/geography
-- area provider “treatment signatures”
-- geographic access to relevant providers
-- provider market structure/competition
→ Generally, “Natural Experiments” (Angrist and Krueger,
2001)
52
52
• General IV Estimation Model
Treatment Choice Equation (1st stage):
$$T_i = c_0 + c_2 \cdot X_i + c_3 \cdot Z_i + (v_i + c_1 \cdot L_i)$$
Outcome Equation (2nd stage):
$$Y_i = a_0 + a_1 \cdot \hat{T}_i + a_2 \cdot X_i + (e_i + a_3 \cdot L_i)$$
Yi = 1 if health outcome occurs, 0 otherwise;
Xi = measured patient clinical characteristics;
Ti = 1 if patient received treatment, 0 otherwise;
T̂i = predicted treatment from 1st stage;
Zi = a set of binary variables grouping patients based on
values of instrumental variables (from W); and
Li = unmeasured confounding variables assumed related
to both Y and T but not Z (from W).
The only variation in T used to estimate a1 comes from Z.
53
Z variables excluded from Outcome Equation via Theory and assumed unrelated to
L
53
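The two stages can be sketched by hand for the simplest case: one binary instrument and no measured covariates X. The data below reuse the hypothetical AOM example, with the expected cure probability standing in for the 0/1 outcome (an assumption made so the result is deterministic). The helper `weighted_ols` is illustrative and not a library function.

```python
def weighted_ols(x, y, w):
    """Return (intercept, slope) of a weighted least-squares line of y on x."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


# Rows: (Z = 1 for Group 1, T = 1 if antibiotics, expected cure probability, patients)
ROWS = [
    (1, 1, 0.95, 400), (1, 1, 0.97, 320), (1, 0, 0.90, 80), (1, 0, 0.98, 250),
    (0, 1, 0.95, 400), (0, 1, 0.97, 120), (0, 0, 0.90, 280), (0, 0, 0.98, 250),
]
z, t, y, w = (list(col) for col in zip(*ROWS))

# 1st stage: regress treatment on the instrument and form predicted treatment
c0, c3 = weighted_ols(z, t, w)
t_hat = [c0 + c3 * zi for zi in z]

# 2nd stage: regress the outcome on predicted treatment
a0, a1_iv = weighted_ols(t_hat, y, w)

# For contrast: the naive regression of outcome on actual treatment
_, a1_naive = weighted_ols(t, y, w)

print(round(a1_iv, 4), round(a1_naive, 4))
```

With a single binary instrument the second-stage slope equals the four-number ratio. Running the second stage by hand like this gives the right point estimate but not valid standard errors, which is one reason to use a packaged 2SLS routine in practice.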
→ The estimate of a1 can only be definitively generalized
to the patients whose treatment choices were affected
by Z (Angrist, Imbens, Rubin 1996).
→ F-test of whether the parameters within c3 are
simultaneously equal to zero provides a test of the
first instrumental variable criterion:
Finding measured variables or “instruments” (Z) that:
a. are related to the possibility of a patient receiving
treatment (cov(T,Z) ≠ 0)
54
54
→ Model can be estimated via:
-- Two-Stage Least Squares (2SLS) – PROC
SYSLIN in SAS.
-- Bivariate Probit – BIPROBIT function in STATA.
-- Two-Stage Replacement (e.g. Beenstock &
Rahav, 2002).
→ 2SLS offers consistent estimates that are
asymptotically normal with the fewest assumptions
(Angrist 2001).
-- essentially regressing group-level outcome rate
changes on group-level treatment rate changes.
55
John Wennberg’s gestalt.
55
• How many groups?
→ Z can be specified as a continuous variable, but results
are then conditional on this assumption and are less
interpretable.
→ Creating many groups from an instrument (more binary
variables in Z) uses more information and yields a
weighted average of many two-group comparisons, e.g.
-- low/high groups using the median of the instrument
VS
-- low/med low/med high/high groups using the
quartiles of the instrument.
→ Too many groups may introduce bias.
→ Best to report estimates for several grouping strategies.
56
Grouping more natural... Experiment feel... Less conditional on parametric
assumptions.
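One way to build the grouping variables is to cut a continuous instrument at its quantiles. This is a minimal sketch using Python's standard library; the function names and the stand-in instrument values are hypothetical.

```python
import bisect
import statistics


def assign_groups(instrument_values, n_groups):
    """Assign each patient a group index 0..n_groups-1 using quantile cut points."""
    cuts = statistics.quantiles(instrument_values, n=n_groups)
    return [bisect.bisect_right(cuts, v) for v in instrument_values]


def group_indicators(groups, n_groups):
    """Binary Z variables for each patient, omitting group 0 as the reference group."""
    return [[1 if g == k else 0 for k in range(1, n_groups)] for g in groups]


# Hypothetical instrument values for 100 patients (e.g. an area treatment rate or distance)
values = list(range(1, 101))

median_split = assign_groups(values, 2)   # low/high groups
quartiles = assign_groups(values, 4)      # low/med low/med high/high groups
z = group_indicators(quartiles, 4)

print([quartiles.count(k) for k in range(4)])
```

Changing `n_groups` makes it easy to report estimates for several grouping strategies, as recommended above.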
56
• Example: The effect of breast-conserving surgery
(BCS) relative to mastectomy (MAS) for stage
II breast cancer patients (Brooks et al. 2003).
→ Sample: ESBC Stage II patients (N = 2,905) from the
Iowa SEER Cancer Registry, 1989-1994 that
had either BCS or MAS.
→ Measures:
-- Treatment: Had BCS plus irradiation.
-- Outcomes: Survival 1, 2, 3 and 4 years.
→ Instrument: BCS percentage for all other early-stage
breast cancer patients in 50-mile radius
of patient zip code in diagnosis year.
57
57
Comparison of Characteristics of ESBC Patient Groups
In Iowa, 1989-1994: Treatment vs. Area BCS Rates

                     Group based on             Group based on area
                     actual treatment choice    treatment signature
Patient Char's       BCS        Mastectomy      High BCS area (a)   Low BCS area (a)
BCS %                100        0               12***               8
Under 65 %           67***      44              53***               48
65 to 74 %           22         25              23                  25
Over 74 %            9***       27              24                  27
Stage IIb %          21***      35              35                  33
Comor Index (b)      .15***     .31             .31                 .28
Number               2622       283             1225                1680

***,**,* significant differences at the .01, .05, and .10 significance levels, respectively.
a. Based on 50-mile radius around patient’s zip code in year of diagnosis. High areas have BCS
percentage greater than or equal to 22% (includes stage I patients). Low areas have BCS
percentages less than 22%. Rates are calculated excluding the patient.
b. Modified version of Charlson Co-morbidity index using non-cancer ICD-9 codes from patient’s
hospital discharge abstracts. Equals one if index is greater than zero, zero otherwise.
58
ANOVA survival estimates the same.
58
Marginal Stage II Early Stage Breast Cancer
Patients in Iowa, 1989-1994
[Figure: 0% to 100% scale from more certain for BCS to less certain for BCS, with marks at 8%, 12%, and 50%; M marks the patients between 8% and 12%.]
M = patients whose treatment choice is dependent on the
practice inclinations of local providers – Marginal
Patients.
59
59
→ IV estimates using area BCS rate as instrument.
                             After diagnosis, effect of BCS on patient survival:
Number of    Instrument
groups       F-statistic     1 year     2 years    3 years    4 years
2            8.57***         -0.32      -0.68      -0.57      -0.51
4            5.19***         -0.37**    -0.54**    -0.45      -0.65*
8            3.43***         -0.33**    -0.50**    -0.46*     -0.52*
12           3.00***         -0.23**    -0.41**    -0.33      -0.11

***,**,* statistically significant at .99, .95, and .90 confidence, respectively.
60
Increase the BCS rate by 5 percentage points for those affected by the area rate and, with an
estimate near -0.5, survival decreases by about 2.5 percentage points (.05 • -.5 = -.025).
60
• How many instruments?
→ Patients in Subset B affected by instruments may
vary across instruments, so IV estimates may vary.
→ IV estimates using Distance to Radiation as an
instrument:
                             After diagnosis, effect of BCS on patient survival:
Number of    Instrument
groups       F-statistic     1 year     2 years    3 years    4 years
2            21.79***        -0.21*     -0.12      -0.33      -0.23
4            7.52***         -0.14      -0.22      -0.39      -0.38
8            3.30***         -0.14      -0.19      -0.35      -0.28
12           2.94***         -0.05      -0.14      -0.33      -0.40*

***,**,* statistically significant at .99, .95, and .90 confidence, respectively.
61
All negative, but smaller, and fewer are significant.
61
→ IV estimates using both area BCS rate and distance
to radiation:
                             After diagnosis, effect of BCS on patient survival:
Number of    Instrument
groups       F-statistic     1 year     2 years    3 years    4 years
2            13.08***        -0.24**    -0.25      -0.38*     -0.30
4            4.99***         -0.24**    -0.32*     -0.39*     -0.45*
8            2.76***         -0.24**    -0.31**    -0.34*     -0.27
12           2.74***         -0.12*     -0.23**    -0.30**    -0.15

***,**,* statistically significant at .99, .95, and .90 confidence, respectively.
→ Each instrument remained independently significant.
→ Estimates are “weighted average”.
62
62
• Which Sample?
→ Estimates for Marginal Patients may vary by sample.
8-Group Estimates by Cancer Stage and Instrument

                                          After diagnosis, effect of BCSI on patient survival:
Cancer                    Instrument
Stage       Instrument    F-statistic     1 year     2 years    3 years    4 years
Stage II    BCS Rate      3.43***         -0.33**    -0.50**    -0.46*     -0.52*
            Rad Dist      3.30***         -0.14      -0.19      -0.35      -0.28
            Both          2.76***         -0.24**    -0.31**    -0.34*     -0.27
Stage I     BCS Rate      0.69            -0.06      -0.07      -0.04      0.18
            Rad Dist      3.36***         -0.09      0.04       0.22*      0.16
            Both          1.77**          -0.09      -0.02      0.19       0.18

***,**,* statistically significant at .99, .95, and .90 confidence, respectively.
63
63
• Which Sample (Example 2)?
→ Effects of Catheterization on AMI Patient Mortality
by Insurance Status using Differential Distance as
an Instrument (Brooks et al. 2000).
→ Data from Washington State 1989-1993
Insurance                      Average    Cath     IV Estimate of Cath on
Group                 Obs      Age        Rate     1-Year Mortality Rates
Private – Non HMO     6,121    54.8       77.8     -0.104***
Private HMO           1,408    54.5       69.6     -0.132***
Medicaid              1,285    53.2       67.3     -0.119*
Self-Pay              765      54.0       64.7     -0.194***

***,**,* statistically significant at .99, .95, and .90 confidence, respectively.
→ Lower catheterization rate reveals higher benefit for
marginal patients.
64
64
Summary
• The foundation of IV estimation is theory that suggests
instruments – what factors motivated treatment choices.
• Ability to generalize is limited, but IV estimates offer a
more natural estimate of the effects of rate changes than
RCT estimates.
• Estimates can vary by sample and instrument used.
• Estimates are conditional on the truth (and acceptance)
of a known identification restriction. The source of the
treatment variation is known. The relationship between
this variation and unmeasured confounders can be
debated.
65
DON’T OVERSELL ESTIMATES...DESCRIBE the sensitivity of the results to model
changes.
65
References
Angrist JD, 2001. Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple
Strategies for Empirical Practice. Journal of Business & Economic Statistics. 19(1):2-16
Angrist, JD, Imbens GW, Rubin, DB. 1996. Identification of Causal Effects Using Instrumental Variables. Journal of the
American Statistical Association. 91:444-454.
Angrist JD, Krueger AB. 2001. Instrumental Variables and the Search for Identification: From Supply and Demand to
Natural Experiments. Journal of Economic Perspectives. 15(4): 69-85.
Brooks JM, Chrischilles E, Scott S, Chen-Hardee S. 2003. Was Lumpectomy Underutilized for Early Stage Breast
Cancer? – Instrumental Variables Evidence for Stage II Patients from Iowa. Health Services Research, 38(6):1385-1402.
Brooks JM, McClellan M, Wong H. 2000. The Marginal Benefits of Invasive Treatment for Acute Myocardial Infarction:
Does Insurance Coverage Matter? Inquiry, 37(1):75-90.
Imbens GW, Angrist, JD. 1994. Identification and Estimation of Local Average Treatment Effects, Econometrica.
62(2):467-475.
McClellan M, McNeil BJ, Newhouse JP. 1994. Does More Intensive Treatment of Acute Myocardial Infarction in the
Elderly Reduce Mortality: Analysis Using Instrumental Variables. Journal of the American Medical Association.
272:859-866.
McClellan M, Newhouse JP. 1993. The Marginal Benefits of Medical Treatment Intensity. Cambridge,Mass: National
Bureau of Economic Research: Working Paper.
McClellan M, Newhouse JP. 1997. The Marginal Cost-Effectiveness of Medical Technology - a Panel Instrumental
Variables Approach, Journal of Econometrics. 77:39-64.
66