Combining propensity scores with Instrumental Variable analysis

Paul L. Hebert, PhD
AcademyHealth Annual Research Meeting
June 29, 2010
Motivation



Propensity score matching is good for addressing limited
overlap of distributions of observed covariates
Instrumental Variables (IV) good for addressing unequal
distribution of unobserved covariates
What if you have both?



You know you have bad distribution of observed covariates
across treatments
You think you also have bad distribution of unobserved
covariates
Should you make an attempt to balance the observed
covariates before applying an IV?
Motivation, continued

We often do some ad hoc “trimming” or “screening” of
the sample when there is limited overlap in covariates


E.g., in a comparison between VA and Medicare beneficiaries,
get rid of all the women in the sample because VA is 95% male
Pre-screening using the propensity score (Crump, 2009) is
more systematic and can improve precision and accuracy
Crump, RK (2009) Biometrika 96: 187-199
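A minimal sketch of propensity-score pre-screening, assuming the propensity scores have already been estimated. The 0.1 and 0.9 cutoffs are an illustrative rule of thumb rather than something stated on the slide, and the function and field names are hypothetical.

```python
def trim_by_propensity(records, lower=0.1, upper=0.9):
    """Keep only records whose estimated propensity score lies in [lower, upper]."""
    return [r for r in records if lower <= r["pscore"] <= upper]

# Hypothetical toy data: treatment flag plus an already-estimated propensity score.
patients = [
    {"id": 1, "treated": 1, "pscore": 0.97},  # almost always treated: trimmed
    {"id": 2, "treated": 0, "pscore": 0.45},
    {"id": 3, "treated": 1, "pscore": 0.62},
    {"id": 4, "treated": 0, "pscore": 0.03},  # almost never treated: trimmed
]
kept = trim_by_propensity(patients)
print([p["id"] for p in kept])
```

Unlike dropping a whole subgroup by hand, the same rule is applied to every patient, whatever covariates pushed the score to an extreme.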
E.g., is it beneficial for Emergency Medical Technicians
(EMTs) to place an intravenous catheter out-of-hospital?




Data on patients who did/did not
get intravenous access placed by
EMT. (Seymour et al working
paper)
Detailed clinical information*
Large differences in observed
covariates
Good Instrumental Variable



Certain EMTs place out-of-hospital
catheters more often than others
Strongly correlated with observed
treatment
Uncorrelated with outcomes
among patients who categorically
require an in-field catheter
placement
                          No out-of-hospital   Out-of-hospital
                          catheter             catheter
N                         28,254               28,078
Age                       63                   65
Male                      42%                  49%
Cause of disease
  CVD                     20%                  56%
  Respiratory             18%                  13%
EMS Severity code
  Life threatening        1%                   8%
  Urgent                  31%                  78%
  Non-urgent              67%                  21%
In-hospital deaths        732                  805
* Other variables: BP, heart rate, respiratory rate, oxygen saturation, Glasgow score, interventions, response time, etc.
Why propensity score screening + regression analysis works
Distribution of patients across deciles of the propensity score, by treatment status
[Figure: histograms of percent of patients by Pr(tx), from 0 to 1, in two panels: Treatment A and Treatment B]
Folks in these groups probably weren’t giving you
reliable estimates of the treatment effect anyways
Why it might work for IV analysis


Depends on where the “marginal patient” is along the
propensity score continuum
If few people at the extremes of the propensity score are
affected by the instrument, then trimming might improve
precision


E.g., if severely ill patients almost always get out-of-hospital
access, regardless of practice patterns of the EMT, then
severely ill patients are not induced to change their treatment
status by the instrument, and so are not “marginal” patients.
Trimming the sample of these patients might improve the
precision of the IV estimate
Why propensity score screening + regression analysis works
Distribution of patients across deciles of the propensity score, by treatment status
[Figure: histograms of percent of patients by Pr(tx), from 0 to 1, in two panels: Treatment A and Treatment B]
If there are no “marginal patients” in the tails, then
maybe IV is better off not having them in the model
Simulations
Observed patient characteristics
X~N(0,1)
Instrumental Variable (IV):
EMT proclivity to place catheter in-field
Z=(0,1); 1=High use of in-field catheter
0=low use of in-field catheter
Treatment:
Out-of-hospital catheter placement
Tr=(0,1); 1=Out-of-hospital catheter
0=No Out-of-hospital catheter
Scenario 1: IV affects the probability of
treatment in all patients equally
Prob(Tr=1)=logit(βX+Z+u1)
β>0 separation between treatments on
observed patient characteristics
Scenario 2: IV affects probability of
treatment only for patients in the
middle of the distribution of X
Prob(Tr=1| |x|≤1)=logit(βX+Z+u1)
Prob(Tr=1| |x|>1)=logit(βX+u1)
Outcome:
Mortality
Prob(Y=1)=logit(X+δTrueTr+u2)
δTrue=0
u1, u2 are normally distributed and
correlated, therefore selection bias
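The data-generating process above can be sketched as follows. This is an illustration under stated assumptions, not the author's simulation code: it reads the slide's logit(...) as the inverse-logit function (since the left-hand side is a probability), and it assumes a 50/50 split of the instrument, unit-variance errors, and a correlation rho=0.5 between u1 and u2, none of which are given on the slide.

```python
import math
import random

def expit(v):
    """Inverse logit: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def treatment_index(x, z, u1, beta, scenario):
    """Linear index for treatment; in Scenario 2 the IV acts only when |x| <= 1."""
    iv_acts = scenario == 1 or abs(x) <= 1.0
    return beta * x + (z if iv_acts else 0) + u1

def simulate_patient(rng, beta, scenario, rho=0.5, delta_true=0.0):
    x = rng.gauss(0.0, 1.0)                  # observed characteristic, X ~ N(0,1)
    z = rng.randint(0, 1)                    # instrument (assumed 50/50 split)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u1 = e1
    u2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2   # corr(u1, u2) = rho (assumed)
    tr = int(rng.random() < expit(treatment_index(x, z, u1, beta, scenario)))
    y = int(rng.random() < expit(x + delta_true * tr + u2))
    return {"x": x, "z": z, "tr": tr, "y": y}

rng = random.Random(2010)
sample = [simulate_patient(rng, beta=1.0, scenario=2) for _ in range(1000)]
print(sum(p["tr"] for p in sample) / len(sample))  # share of patients treated
```

Because u1 enters treatment and u2 enters the outcome, any method that adjusts only for X inherits selection bias even though delta_true is 0.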
Simulation, continued
• Perform 1000 simulations
• Vary separation of observed covariates (X) between
treatments from 0 to 1.75 standard deviations
• Estimate treatment effect δ (recall δTrue=0)
– δIV = 2SRI IV estimate
– δPS = Propensity score matched estimate
– δIV+PS = 2SRI IV performed on propensity score matched sample
• Report bias = (δIV - δTrue) and mean squared deviation = (δIV - δTrue)^2
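The reporting step can be sketched like this, assuming the "sd" shown in the results tables is the standard deviation of the estimates across simulation runs. The function name and the toy estimates are hypothetical.

```python
from statistics import mean, stdev

def bias_and_msd(estimates, delta_true=0.0):
    """Return (bias, sd of estimates, mean squared deviation) across simulation runs."""
    errors = [d - delta_true for d in estimates]
    bias = mean(errors)
    msd = mean([e * e for e in errors])
    return bias, stdev(estimates), msd

# Hypothetical estimates from four simulation runs (true effect is 0).
bias, sd, msd = bias_and_msd([0.1, -0.1, 0.2, -0.2])
print(bias, sd, msd)
```

Mean squared deviation is roughly bias squared plus variance, which is why a precise but biased estimator can still score badly: with bias 0.69 and sd 0.05, 0.69^2 + 0.05^2 is about 0.48.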
Simulation results: Scenario 1
Bias (sd), by standardized difference in observed covariates (X) between treatment groups
          0            0.5          1            1.5          1.75
P-Score   0.69(0.05)   0.68(0.05)   0.69(0.05)   0.69(0.06)   0.70(0.07)
IV        -0.01(0.23)  -0.04(0.24)  -0.03(0.25)  -0.02(0.21)  -0.02(0.18)
IV+PS     0.01(0.25)   -0.00(0.25)  0.04(0.27)   0.08(0.31)   0.09(0.32)
Estimate from propensity score is badly biased
because of unobserved selection bias (odds
ratio on treatment≈2 when it should be 1.0)
Simulation results: Scenario 1
Bias on IV estimate is small and does not
systematically increase with greater separation
Simulation results: Scenario 1
Bias of IV conducted on P-Score matched sample
is similar to bias of IV conducted on full-sample
Simulation results: Scenario 1
Standardized difference in observed covariates (X) between treatment groups
          0            0.5          1            1.5          1.75
Bias (sd)
P-Score   0.69(0.05)   0.68(0.05)   0.69(0.05)   0.69(0.06)   0.70(0.07)
IV        -0.01(0.23)  -0.04(0.24)  -0.03(0.25)  -0.02(0.21)  -0.02(0.18)
IV+PS     0.01(0.25)   -0.00(0.25)  0.04(0.27)   0.08(0.31)   0.09(0.32)
Mean Squared Deviation
P-Score   0.48         0.47         0.47         0.48         0.49
IV        0.05         0.06         0.06         0.05         0.03
IV+PS     0.06         0.06         0.07         0.10         0.11
MSDs of the IV estimates are similar, regardless of
matching on P-score
Simulation results: Scenario 2
Standardized difference in observed covariates (X) between treatment groups
          0            0.5          1            1.5          1.75
Bias (sd)
P-Score   0.69(0.05)   0.69(0.05)   0.69(0.05)   0.69(0.06)   0.69(0.07)
IV        -0.09(0.33)  -0.11(0.33)  -0.15(0.29)  -0.06(0.23)  -0.03(0.20)
IV+PS     -0.07(0.36)  -0.02(0.33)  -0.00(0.32)  0.04(0.34)   0.08(0.35)
Mean Squared Deviation
P-Score   0.48         0.48         0.47         0.48         0.48
IV        0.12         0.12         0.11         0.05         0.04
IV+PS     0.13         0.11         0.10         0.12         0.13
Bias of IV conducted on P-Score matched sample
is smaller than bias of IV conducted on full-sample,
but not by much. MSDs are similar.
Empirical Example: Out-of-hospital
placement of catheter by EMT

Out-of-hospital placement of intravenous catheter by
EMTs is associated with lower in-hospital mortality
regardless of the estimation method used.
Seymour, C (2010) U Wash Working paper.

Odds ratio on out-of-hospital use of intravenous catheter
Unadjusted                                             1.11 (p=0.054)
Logistic regression                                    0.67 (p<0.001)
Propensity score matched                               0.69 (p=0.002)
2SRI Instrumental variable on full sample              0.78 (p=0.112)
2SRI Instrumental variable on P-score matched sample   0.68 (p=0.01)
Conclusion



Preliminary simulations suggest limited benefit (but no
apparent harm) to using IV in combination with
propensity score matching
IV estimates were impressively close to the true value
even with large separation between the treatment groups
on observed covariates
Further study needs to address the sensitivity of these
results
A Practical Guide to Propensity Score Models
Paul L. Hebert, PhD
Investigator, VA HSR&D Puget Sound and
Research Associate Professor, Department of Health Services
University of Washington School of Public Health
Paul.Hebert2@va.gov Heberp@u.washington.edu
June 29, 2009
Funding provided by NIH/NIDDK
Motivation
• Researcher is using observational data to compare outcomes among two or more treatments
• Observed covariates differ substantially between treatment groups
• Propensity Score Models attempt to effect a balance in observed covariates between treatment groups
  • Create a single variable, a propensity score, that captures how differences in these covariates contribute to a patient’s probability of receiving treatment A vs. treatment B
  • Use this propensity score to create groups of treatment A versus treatment B patients that look similar to each other.
  • Compare outcomes between these groups of well-matched patients
Motivation, continued
• Especially useful in three situations
  • Substantial non-overlap of treatment groups on important covariates.
    • Some people in Treatment A group don’t look like anybody in Treatment B group
  • Sparse data with binary (0,1) outcomes
    • Rare outcomes with common treatments. Multivariate models would have too few events per right-hand side variable
    • Rare co-morbidities that are perfectly correlated with binary outcome
  • Applications where timing of treatment is critical
    • Match an “early dialysis” patient to a “late dialysis” patient
Using propensity score models requires five steps
1. Estimate propensity score
2. Use the propensity score to create a balance in observed covariates across treatment groups
3. Evaluate the quality of the balance
4. Estimate differences in outcomes between balanced treatment groups
5. Perform sensitivity analyses
This talk focuses on the first two steps
What is a propensity score?
• Usually a logit (probit if you’re an econometrician) model of treatment (Tr) as a function of X’s
  Tr=1 if patient i takes treatment A, 0 if treatment B
• For each patient, calculate the propensity score from the estimated β’s and the X’s
  $\hat{P}_i = \dfrac{\exp(\hat{\beta} X_i)}{1 + \exp(\hat{\beta} X_i)}$
• Note that K variables for each patient just got reduced to 1 propensity score
Step 1: Estimate PS equation
• What X’s do you use?
• How do you build the model?
  • E.g., transformations, interactions
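The propensity score calculation described above can be sketched with a single covariate. This is a toy illustration, assuming a logit model fitted by plain gradient ascent; the data, function names, step size, and iteration count are all hypothetical, and a real analysis would use a statistics package and many covariates.

```python
import math

def expit(v):
    """Inverse logit: exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))

def fit_logit(xs, trs, steps=2000, lr=0.1):
    """Fit Pr(Tr=1) = expit(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, trs):
            resid = t - expit(b0 + b1 * x)   # observed minus predicted treatment
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: one covariate; treatment gets more likely as x rises.
xs = [-2, -1, -1, 0, 0, 1, 1, 2]
trs = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(xs, trs)
pscores = [expit(b0 + b1 * x) for x in xs]   # one score per patient
print(round(b1, 2), [round(p, 2) for p in pscores])
```

However many covariates go into the model, each patient comes out with one number between 0 and 1.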
What X’s do you use?
[Diagram: causal graph relating Treatment and Outcome to four kinds of variables: Risk Factor, Mediator, Instrument, and Confounders]
What X’s do you use?
• Rubin (2007) suggests expansive definition of X. To estimate effect of smoking on costs, modeled smoking as a function of:
  • Age, education, occupation, etc.
  • Seatbelt use, arthritis, number of friends, frequency of having friends over for dinner, membership in clubs, etc.
• Variables that should not be included “…are effectively known to have no possible connection to the outcomes, such as random numbers…, or the weather half-way around the world” (Rubin 2007; p 29)
Rubin, DB, Statist Med 2007; 26:20-36
What X’s do you use?
• Monte Carlo Simulations by Austin (2006, 2008) and Brookhart (2006)
  • DO include all variables related to the outcome
    • Could get biased results (Brookhart) or imbalance in covariates in matched samples (Austin, 2006) if you include only known confounders.
  • DO NOT include variables related to the treatment but not the outcome (i.e., instrumental variables)
    • Including these variables increases the variance of the estimated treatment effect but doesn’t decrease the bias (Brookhart)
    • Including these variables reduces ability to match (Austin, 2006)
Austin PC, et al Statist Med 2006; 26(4) 734-753
Brookhart, et al, Am J Epi 2006; 163:1149-56
Austin PC, J Clinical Epi 2008; 26: 537-545
What X’s do you use?
[Diagram: causal graph relating Treatment and Outcome to four kinds of variables: Risk Factor, Mediator, Instrument, and Confounders]
How do you build the propensity score model?
• Rubin (2007) suggests an expansive use of transformations and interaction terms. Smoking equation includes
  • Log(weight)*log(height), log(weight)^2, etc.
  • “Other, unspecified non-linear terms”
  • Should not include “five-way interactions”
• Austin (2006): Modeled statin use using 257 variables: 24 main variables and 233 transformations and two-way interactions of those variables.
• Schneeweiss (2009) High-dimensional Propensity Score (hd-PS) Adjustment SAS Macro does this to your data
• Don’t worry about p-values, don’t use step-wise model building
Austin PC et al, Statist. Med 2006; 25:2084-2106
Rubin D, Statist. Med 2007; 26:20-36
Schneeweiss S, et al. Epidemiology 2009. 20:512-22
http://www.drugepi.org/downloads/index.php
Using propensity score models requires five steps
1. Estimate propensity score
2. Use the propensity score to create a balance in observed covariates across treatment groups
3. Evaluate the quality of the balance
4. Estimate differences in outcomes between matched treatment groups
5. Perform sensitivity analyses
Kernel density plot of logit of the propensity score for ramipril users
versus captopril users
[Figure: density plots of the linear propensity score before propensity score matching; x-axis: linear propensity score (about -4 to 2); y-axis: kernel density; series: Ramipril, Captopril]
Four basic options for using a propensity score
1. Conditioning on the propensity score
2. Stratification on the propensity score
3. Matching on the propensity score
4. Weighting on the Inverse Propensity Score

Combinations of the above
Four basic options for using a propensity score
1. Conditioning on the propensity score
g(Y_i) = b_0 + b_1 Tr_i + b_2 Pscore_i + e_i
• Can yield unbiased estimates of rate ratios (Austin 2007), risk ratios (Austin, 2008), or differences in means or proportions (Rosenbaum and Rubin, 1985)
• Problems:
  • Conditioning on the propensity score results in biased (toward null) estimates of odds ratios and hazard ratios
  • Rubin (2004) warns that propensity score must be carefully specified if used directly in the estimation of treatment effects
Rosenbaum and Rubin (1983) Biometrika; 70:41-45
Austin PC, et al Stat Med 2006; 26(4): 734-753
Austin PC, et al Stat Med 2007; 26(4): 754-768
Austin PC, et al J Clinical Epi 2008; 26: 537-545
Rubin DB (2004) Pharmacoepidemiology and Drug Safety 2004; 14:885-887
Four basic options for using a propensity
score
2. Stratification on the propensity score
 Divide sample into (usually) quintiles based on propensity score
 Calculate differences in outcomes within each quintile (j) and average to get overall treatment effect
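The two stratification steps above can be sketched as follows, assuming propensity scores are already estimated. Weighting each quintile by its size and skipping strata that lack one of the two treatment groups are choices made for this illustration, not something the slide specifies; the field names are hypothetical.

```python
def stratified_effect(records, n_strata=5):
    """Size-weighted average of within-stratum outcome differences (treated minus untreated)."""
    ordered = sorted(records, key=lambda r: r["pscore"])
    n = len(ordered)
    diffs, weights = [], []
    for j in range(n_strata):
        stratum = ordered[j * n // n_strata:(j + 1) * n // n_strata]
        treated = [r["y"] for r in stratum if r["treated"] == 1]
        control = [r["y"] for r in stratum if r["treated"] == 0]
        if not treated or not control:
            continue  # no overlap in this stratum (assumed handling: skip it)
        diffs.append(sum(treated) / len(treated) - sum(control) / len(control))
        weights.append(len(stratum))
    if not diffs:
        raise ValueError("no stratum contains both treatment groups")
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

# Hypothetical toy data: ten patients, two per quintile, one treated and one not.
toy = [{"pscore": 0.05 + 0.1 * i, "treated": i % 2, "y": float(i % 2)} for i in range(10)]
print(stratified_effect(toy))
```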
Four basic options for using a propensity score
2. Stratification on the propensity score
• Problems:
  • Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. (Austin, 2006)
  • When estimating relative risks, stratifying resulted in greater bias than matching (Austin, 2008)
Austin PC, et al Stat Med 2006; 26(4): 734-753
Austin PC, et al J Clinical Epi 2008; 26: 537-545
Example of density plot of the linear propensity score for users of two
antihypertensives: ramipril and captopril
[Figure: density plots of the linear propensity score before propensity score matching; x-axis: linear propensity score (about -4 to 2); y-axis: kernel density; series: Ramipril, Captopril]
You can get a lousy estimate
of treatment effect in tails
because of limited overlap
of covariates in the first and
last strata
Four basic options for using a propensity score
3. Matching on the propensity score
• For each patient in the Treatment A group, find a Treatment B patient with a propensity score similar to the Treatment A patient’s propensity score.
• Advantages
  • Increasingly popular technique with good properties in simulations.
  • Balance on observables is obvious (i.e., it makes a beautiful Table 1)
Covariate balance before and after
propensity score matching
Before Matching
                        Ramipril   Captopril   P-value
N                       1,269      1,521
Demographics
  Mean Age              70.3       71.4        0.000
  Female                65%        70%         0.002
Race
  White                 53%        59%
  African American      11%        14%
  Hispanic              11%        8%
  Asian                 7%         5%
  Other/Unknown Race    19%        15%
Long Term Care          4%         14%
Disabled                26%        32%
Hospitalized            13%        20%
Charlson                1.6        1.7
# Meds                  7.1        12.1
P-values for the ten rows from White through # Meds, in source order (only nine survive, so they are not aligned to rows): 0.024, 0.002, 0.000, 0.016, 0.000, 0.000, 0.000, 0.000, 0.000
Covariate balance before and after
propensity score matching
                        Before Matching        After matching
                        Ramipril   Captopril   Ramipril   Captopril   P-value (after)
N                       1,269      1,521       859        859
Demographics
  Mean Age              70.3       71.4        70.6       70.6        0.944
  Female                65%        70%         69%        69%         0.557
Race
  White                 53%        59%         56%        57%         0.605
  African American      11%        14%         13%        12%         0.949
  Hispanic              11%        8%          9%         8%          0.736
  Asian                 7%         5%          5%         6%          0.748
  Other/Unknown Race    19%        15%         17%        17%         0.103
Long Term Care          4%         14%         5%         5%          0.662
Disabled                26%        32%         29%        28%         0.678
Hospitalized            13%        20%         17%        17%         0.481
Charlson                1.6        1.7         1.6        1.6         0.658
# Meds                  7.1        12.1        10.6       10.4        0.923
Before-matching P-values: Mean Age 0.000, Female 0.002; for the ten rows from White through # Meds, in source order (only nine survive, so they are not aligned to rows): 0.024, 0.002, 0.000, 0.016, 0.000, 0.000, 0.000, 0.000, 0.000
Four basic options for using a propensity score
3. Matching on the propensity score
• For each patient in the Treatment A group, find a Treatment B patient with a propensity score similar to the Treatment A patient’s propensity score.
• Advantages
• Disadvantages: Many options for matching and no standardization
If you choose to match, there are several techniques
• Nearest Neighbor
  • Match subject in Treatment A group to subject in Treatment B group with the closest propensity score
• 5→1 match
  • Match on the 5th digit of the propensity score first
  • Of the resulting unmatched sample, match on the 4th digit of the propensity score
  • Repeat until 1 digit match
• Mahalanobis Matching
  • Match on the basis of the Mahalanobis distance between subjects
  • Distance between vectors of characteristics, including the propensity score (more later).
Options for use with each type of matching procedure
• 1-1 matching versus 1-many
  • Gain efficiency if you have many Treatment B subjects that match one Treatment A subject.
  • Efficiency gain of 1-many is surprisingly small (Rosenbaum, 1985)
• Matching with replacement
  • Allow a subject in treatment group B to serve as a match for multiple subjects in treatment group A
• Matching with calipers
  • Match Treatment B subjects within a specified caliper or range of a Treatment A’s score (e.g., +/-0.01 of the propensity score)
  • e.g., discard a nearest neighbor match if the propensity score of the matched Treatment B patient differs from the Treatment A patient’s by more than the caliper
Rosenbaum and Rubin, (1985) Biometrics 41, 103-116
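A minimal sketch of one combination of the options above: 1-1 nearest-neighbor matching, without replacement, with a caliper. Processing Treatment A patients in the order given is an assumption made for this example, and the function name, IDs, and scores are hypothetical.

```python
def greedy_caliper_match(treated, controls, caliper=0.01):
    """1-1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)          # controls not yet used as a match
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]         # without replacement: each control is used once
        # otherwise the nearest neighbor is outside the caliper and t_id stays unmatched
    return pairs

# Hypothetical propensity scores.
group_a = {"A1": 0.30, "A2": 0.52, "A3": 0.90}
group_b = {"B1": 0.305, "B2": 0.50, "B3": 0.515}
print(greedy_caliper_match(group_a, group_b))
```

Here A3 has no Treatment B patient within the caliper, so it is dropped rather than paired with a poor match; that is the bias-for-sample-size trade the caliper makes.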
What should you do? Bias versus efficiency
tradeoffs in matching techniques
[Diagram: matching options placed along Bias and Inefficiency axes: 1-1 matching vs. 1-N matching; with vs. without replacement; with vs. without calipers]
Do like Don: Rubin’s suggested method for using propensity scores
1. Anti-parsimonious logit for propensity score equation
2. For each Treatment A patient (i), define a “donor pool” of Treatment B patients (j) with linear propensity score within .2 sd of patient i’s
3. Calculate the Mahalanobis distance (M) to each patient in the donor pool
   $M_{ij} = (x_i - x_j)^{\top} \Omega^{-1} (x_i - x_j)$
   x = vector of linear propensity score and a few other important covariates
   Ω = covariance matrix for X’s
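The distance in step 3 can be sketched for the two-variable case, where x holds the linear propensity score plus one other covariate. Restricting to two dimensions is a simplification for this example so the matrix inverse can be written by hand; the function names and numbers are hypothetical.

```python
def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (s00, s01, s11)."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return s00, s01, s11

def mahalanobis_2d(xi, xj, cov):
    """M_ij = (xi - xj)' Omega^{-1} (xi - xj) for a 2x2 covariance Omega."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 * s01
    d0, d1 = xi[0] - xj[0], xi[1] - xj[1]
    # Inverse of [[s00, s01], [s01, s11]] is (1/det) * [[s11, -s01], [-s01, s00]].
    return (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det

# Hypothetical (linear propensity score, age) vectors.
pool = [(-0.4, 62.0), (0.1, 70.0), (0.3, 75.0), (0.9, 81.0)]
omega = covariance_2d(pool)
print(mahalanobis_2d(pool[1], pool[2], omega))
```

Scaling by the covariance matrix keeps a variable measured in large units, such as age in years, from dominating the distance.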
Rubin, continued
4. 1-1 matching
5. Match without replacement
6. Use a greedy matching algorithm
   • Match the hardest to match patient to the best match in the donor pool first based on Mahalanobis distance
   • Repeat with next hardest to match patient until all Treatment A patients are matched or there are no Treatment B patients left in the donor pool.
Four basic options for using a propensity score
4. Weighting by inverse of the propensity score
• Assign weights to each patient that are equal to the inverse of the probability of receiving the treatment the subject received (i.e., 1/p-score for patients who got Treatment A; 1/(1-pscore) for patients who got Treatment B)
• Calculate the weighted average difference in outcomes between groups
• Advantages: Good properties in simulations (Austin 2010; Lunceford & Davidian, 2004)
• Disadvantages:
  • Not used very often.
  • Same warning from Rubin (2004) as regards using the propensity score to directly estimate treatment effects.
Lunceford, J. K. and Davidian, M. (2004). Statistics in Medicine 23, 2937–2960
Austin PC, (2010). Statistics in Medicine, forthcoming.
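The weighting rule above can be sketched as follows, assuming propensity scores are already estimated. Normalizing the weights within each treatment group (a weighted mean per group) is a choice made for this illustration; the field names and toy data are hypothetical.

```python
def ipw_difference(records):
    """Weighted mean outcome under Treatment A minus weighted mean under Treatment B."""
    num_a = den_a = num_b = den_b = 0.0
    for r in records:
        if r["treated"] == 1:
            w = 1.0 / r["pscore"]            # got Treatment A: weight 1/p
            num_a += w * r["y"]
            den_a += w
        else:
            w = 1.0 / (1.0 - r["pscore"])    # got Treatment B: weight 1/(1-p)
            num_b += w * r["y"]
            den_b += w
    return num_a / den_a - num_b / den_b

# Hypothetical toy data.
toy = [
    {"treated": 1, "pscore": 0.50, "y": 1.0},
    {"treated": 1, "pscore": 0.25, "y": 0.0},
    {"treated": 0, "pscore": 0.50, "y": 0.0},
    {"treated": 0, "pscore": 0.75, "y": 1.0},
]
print(ipw_difference(toy))
```

Patients who received a treatment they were unlikely to get carry the largest weights, so scores very close to 0 or 1 can make the estimate unstable.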
Summary
• Consider propensity scores for sparse data, limited overlap in covariates, and when timing of treatment is important
• Goal of Propensity Score modeling is to create treatment groups that are balanced on observed covariates
• To estimate the propensity score
  • Include all variables believed to be related to the outcome, not just the confounders. Be anti-parsimonious.
  • Do not include variables that are only related to the treatment
  • Use interactions, transformations, liberally to effect a balance
• Condition, stratify, match or weight using the propensity score
  • Be careful about odds ratios and hazard ratios from logistic or Cox regressions using conditioning
Thank You