A Practical Guide to Propensity Score Models

Paul L. Hebert, PhD
Investigator, VA HSR&D Puget Sound and
Research Associate Professor, Department of Health
Services
Paul.Hebert2@va.gov Heberp@u.washington.edu
June 29, 2009
Funding provided by NIH/NIDDK
Motivation
 Researcher is using observational data to compare
outcomes among two or more treatments
 Observed covariates differ substantially between
treatment groups
 Propensity score models attempt to effect a balance in
observed covariates between treatment groups
 Create a single variable, the propensity score, that captures how
differences in these covariates contribute to a patient’s
probability of receiving treatment A vs. treatment B
 Use this propensity score to create groups of treatment A versus
treatment B patients that look similar to each other.
 Compare outcomes between these groups of well-matched
patients
Motivation, continued
 Especially useful in two situations
 Substantial non-overlap of treatment groups on
important covariates.
 Some people in Treatment A group don’t look like
anybody in Treatment B group
 Rare outcomes with common treatments.
 Multivariate models would have too few events per
right-hand side variable
Using propensity score models
requires five steps
1. Estimate propensity score
2. Use the propensity score to create a
balance in observed covariates across
treatment groups
3. Evaluate the quality of the balance
4. Estimate differences in outcomes
between balanced treatment groups
5. Perform sensitivity analyses
This talk focuses on the first
three steps
Data for Today’s Presentation
 Are all angiotensin-converting enzyme inhibitors
(ACEIs) alike?
 The HOPE randomized controlled trial showed that the ACEI ramipril was good for
high-risk cardiovascular patients
 Are other, cheaper ACEIs just as effective?
 Ramipril: $53.03 per 30-day supply
 Captopril: $12.99
 No RCT has been, or ever will be, conducted to
answer this. It’s either causal models or we guess
Problem: ACEI users differ on a number of
important variables
California Medicaid/Medicare beneficiaries prescribed ACEIs in 1997
                                 ACEI users
                                 Ramipril     Captopril    Enalapril    Benazepril
N                                1,269        1,521        4,935        3,412
Demographics
  Mean age                       70.3         71.4         71.4         71.0
  (sd)                           6.6          7.2          6.6          6.9
  Female                         65%          70%          69%          71%
Race
  White                          53%          59%          54%          54%
  African American               11%          14%          14%          13%
  Hispanic                       11%          8%           7%           7%
  Asian                          7%           5%           8%           8%
  Other/Unknown race             19%          15%          17%          18%
Long-term care                   4%           14%          7%           6%
Disabled                         26%          32%          24%          27%
Per-capita income in ZIP         17,519       17,606       19,051       18,858
Hospitalized in 1996             13%          20%          23%          22%
Charlson comorbidity score       1.57         1.74         1.71         1.64
Median number of meds in 1996    7 (3, 12)    12 (7, 20)   9 (5, 14)    9 (5, 14)
P-values for differences across the four groups: all reported values are 0.024 or smaller.
What is a propensity score
 Usually a logit (probit if you’re an econometrician) model of
treatment (W) as a function of X’s
W_i = 1 if patient i takes ramipril, 0 if captopril
\[ W_i = f(X_i; \beta) \]
 For each patient, calculate the propensity score from the estimated β’s and
X’s
\[ \hat{P}_i = \frac{\exp(\hat{\beta} X_i)}{1 + \exp(\hat{\beta} X_i)} \]
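The logit formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the toy data, the function names (`linear_score`, `propensity`, `fit_logit`), and the use of plain gradient ascent instead of a statistics package are all assumptions made for the example.

```python
import math

def linear_score(beta, x):
    """Linear propensity score: intercept beta[0] plus sum of beta[k+1] * x[k]."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def propensity(beta, x):
    """Logistic transform of the linear score: exp(bX) / (1 + exp(bX))."""
    z = linear_score(beta, x)
    if z >= 0:  # algebraically identical form that avoids overflow
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, w, steps=5000, lr=0.1):
    """Estimate beta by plain gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, wi in zip(X, w):
            resid = wi - propensity(beta, xi)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one covariate per patient, w = 1 for "ramipril".
X = [[0.3], [0.5], [0.7], [0.9], [1.2], [1.4], [1.6], [2.0]]
w = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat = fit_logit(X, w)
p_hat = [propensity(beta_hat, xi) for xi in X]
```

In practice a fitted logit from any statistics package gives the same quantities; the point is only that each patient's score is the logistic transform of the fitted linear predictor.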
Step 1: Estimate PS equation
 What X’s do you use?
 What should you do with the Xs?
 E.g, transformations, interactions
 How do you know you have a good model?
What X’s do you use?
[Diagram: nodes labeled Risk Factor, Instrument, Confounders, Treatment (e.g., ramipril), and Outcome.]
What X’s do you use?
 Rubin (2007) suggests an expansive definition of X.
 To estimate the effect of smoking on costs, he modeled
smoking as a function of:
 Age, education, occupation, etc.
 Seatbelt use, arthritis, number of friends, frequency of having
friends over for dinner, membership in clubs, etc.
 Variables that should not be included “…are effectively
known to have no possible connection to the outcomes, such
as random numbers…, or the weather half-way around the
world” (Rubin 2007; p 29)
Rubin, DB, Statist Med 2007; 26:20-36
What X’s do you use?
 Monte Carlo Simulations by Austin (2006, 2008) and
Brookhart (2006)
 DO include all variables related to the outcome
 Could get biased results (Brookhart) or imbalance in covariates
in matched samples (Austin, 2006) if you include only known
confounders.
 DO NOT include variables related to the treatment but not the
outcome (i.e, instrumental variables)
 Including these variables increases the variance of the
estimated treatment effect but doesn’t decrease the bias
(Brookhart)
 Including these variables reduces ability to match (Austin,
2006)
Austin PC, et al Statist Med 2006; 26(4): 734-753
Brookhart, et al, Am J Epi 2006; 163:1149-56
Austin PC, J Clinical Epi 2008; 26: 537-545
Step 1: Estimate Propensity
Score
 What X’s do you use?
 What should you do with these Xs?
 Rubin (2007) suggests an expansive use of transformations and
interaction terms. The smoking equation includes
 log(weight)*log(height), log(weight)^2, etc.
 “Other, unspecified non-linear terms”
 Should not include “five-way interactions”
 Austin (2006): Modeled statin use using 257 variables: 24 main
variables and 233 transformations and two-way interactions of
those variables.
 Dehejia and Wahba (1999, 2000): Add higher-order terms if
imbalance in covariates within quintiles of the propensity score
(more later)
Austin PC et al, Statist. Med 2006; 25:2084-2106
Rubin D, Statist. Med 2007; 26:20-36
Step 1: Estimate PS equation
 What X’s do you use?
 What should you do with the Xs?
 How do you know you have a good model?
 Should you care about significance of variables in
model (i.e., p-values)?
 Should you care about the overall fit or predictive
properties (e.g., c-statistic)?
 Answer: NO. It’s all about effecting a balance in the X’s
[Figure: kernel density plots of the linear propensity score (logit of the propensity score) for ramipril users versus captopril users, before propensity score matching.]
Step 2: Creating balanced treatment groups
 Three basic options
 Conditioning on the propensity score
\[ g(Y_i) = b_0 + b_1 \mathrm{Ramipril}_i + b_2 F(\beta X_i) + e_i \]
 Stratification on the propensity score
\[ Y_{i,j} = \beta_{0,j} + \beta_{1,j} W_{i,j} \]
\[ \beta_1 = \frac{1}{5} \sum_{j=0}^{4} \beta_{1,j} \]
 Matching on the propensity score
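The stratification option can be sketched as follows. This is a minimal illustration under stated assumptions: strata are equal-sized quintiles of the propensity score, the within-stratum regression of Y on W reduces to a difference in mean outcomes, and the function name `stratified_effect` and the toy data are hypothetical.

```python
from statistics import mean

def stratified_effect(ps, w, y, n_strata=5):
    """Average the within-stratum treated-minus-control differences in mean outcome.

    With a single binary regressor W, the OLS slope in each stratum equals
    the difference in mean outcomes, so this mirrors the stratum-by-stratum
    regression followed by an unweighted average over the strata.
    """
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    effects = []
    for j in range(n_strata):
        idx = order[j * n // n_strata:(j + 1) * n // n_strata]
        treated = [y[i] for i in idx if w[i] == 1]
        control = [y[i] for i in idx if w[i] == 0]
        if not treated or not control:
            raise ValueError(f"stratum {j} lacks treated or control subjects")
        effects.append(mean(treated) - mean(control))
    return mean(effects), effects

# Hypothetical toy data: the outcome rises with the stratum and by 2 with treatment.
ps = [0.05 + 0.09 * i for i in range(10)]
w = [1, 0] * 5
y = [2 * w[i] + i // 2 for i in range(10)]

overall, by_stratum = stratified_effect(ps, w, y)
```

A stratum with no treated or no control subjects raises an error here, which is one practical symptom of the non-overlap problem described earlier.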
What techniques are researchers
using
 Weitzen (2004) reviewed 47 articles indexed in Medline in 2001
 Conditional adjustment: 25
 Stratification: 9
 Matching: 7
 Stratified covariate: 4
 Unspecified: 2
Weitzen, S et al Pharmacoepi Drug Safety. 2004; 13:841-853
Which should you use?
 Conditioning is inappropriate for odds ratios and hazard
ratios
 Conditioning on the propensity score results in biased (toward the null)
estimates of odds ratios and hazard ratios, but not rate ratios (Austin
2007), risk ratios (Austin, 2008), or differences in means or proportions
(Rosenbaum and Rubin, 1985)
 Stratifying on the quintiles of any propensity score
model resulted in residual imbalance between treated
and untreated subjects in the upper and lower quintiles.
(Austin, 2006)
Rosenbaum and Rubin (1983) Biometrika; 70:41-45
Austin PC, et al Stat Med 2006; 26(4): 734-753
Austin PC, et al Stat Med 2007; 26(4): 754-768
Austin PC, et al J Clinical Epi 2008; 26: 537-545
Which should you use?
 Conditioning is inappropriate for odds ratios and hazard ratios
 Stratifying on the quintiles of any propensity score model
resulted in residual imbalance between treated and untreated
subjects in the upper and lower quintiles. (Austin, 2006)
 Matching on the propensity score resulted in the least bias
when estimating relative risks, whereas stratifying resulted in
the greatest bias (Austin, 2008)
Austin PC, et al Stat Med 2006; 26(4): 734-753
Austin PC, et al J Clinical Epi 2008; 26: 537-545
If you choose to match, there are
several techniques
 Nearest Neighbor
 Match subject in Treatment A group to subject in Treatment B
group with the closest propensity score
 5→1 match
 Match on the 5th digit of the propensity score first
 Of the resulting unmatched sample, match on the 4th digit of the
propensity score
 Repeat until 1 digit match
 Mahalanobis matching
 Match on the basis of the Mahalanobis distance between subjects
 Distance between vectors of characteristics, including the
propensity score (more later).
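The 5→1 digit procedure can be sketched as below. This is an illustration under assumptions the slide does not state: "matching on the d-th digit" is read as equality of the propensity scores rounded to d decimal places, matching is 1-1 without replacement, and the function name `digit_match` and the toy scores are hypothetical.

```python
from collections import defaultdict

def digit_match(ps_a, ps_b, max_digits=5):
    """Match group A to group B on rounded propensity scores, 5 digits down to 1.

    Returns a list of (index_in_a, index_in_b, digits_matched) tuples.
    """
    unmatched_a = set(range(len(ps_a)))
    unmatched_b = set(range(len(ps_b)))
    pairs = []
    for digits in range(max_digits, 0, -1):
        # Bucket the still-unmatched B subjects by their rounded score.
        buckets = defaultdict(list)
        for j in sorted(unmatched_b, reverse=True):
            buckets[round(ps_b[j], digits)].append(j)
        for i in sorted(unmatched_a):
            candidates = buckets.get(round(ps_a[i], digits))
            if candidates:
                j = candidates.pop()  # lowest-index B subject in the bucket
                pairs.append((i, j, digits))
                unmatched_a.discard(i)
                unmatched_b.discard(j)
    return pairs

# Hypothetical scores: the first pair agrees to 5 digits, the second only to 2.
ps_a = [0.123456, 0.4211, 0.9]
ps_b = [0.123458, 0.4234, 0.31]
matches = digit_match(ps_a, ps_b)
```

Subjects left unmatched after the 1-digit pass are dropped from the matched sample.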
Options for use with each type of
matching procedure
 1-1 matching versus 1-many
 Gain efficiency if you have many treatment B subjects that match
one Treatment A subject.
 Efficiency gain of 1-many is surprisingly small (Rosenbaum, 1985)
 Matching with replacement
 Allow a subject in treatment group B to serve as a match for
multiple subjects in treatment group A
 Matching with calipers
 Match Treatment B subjects within a specified caliper or range of
a Treatment A subject’s score (e.g., +/-0.01 of the propensity score)
 e.g., discard a nearest neighbor match if the propensity score of
the matched Treatment B patient differs by more than the caliper
Rosenbaum and Rubin, (1985) Biometrics 41, 103-116
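Nearest neighbor matching with a caliper might look like the sketch below. It assumes 1-1 matching without replacement and processes Treatment A subjects in the order given, which is one of several possible orderings; the function name `nearest_neighbor_match` and the example scores are hypothetical.

```python
def nearest_neighbor_match(ps_a, ps_b, caliper=0.01):
    """1-1 nearest neighbor matching without replacement, with a caliper.

    Each Treatment A subject takes the closest still-available Treatment B
    subject; the match is discarded if the scores differ by more than the caliper.
    """
    available = set(range(len(ps_b)))
    pairs = []
    for i, p in enumerate(ps_a):
        if not available:
            break
        # Ties in distance are broken by the lower index in group B.
        j = min(available, key=lambda k: (abs(ps_b[k] - p), k))
        if abs(ps_b[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical scores: the third A subject has no B subject within the caliper.
ps_a = [0.30, 0.31, 0.80]
ps_b = [0.305, 0.32, 0.50]
pairs = nearest_neighbor_match(ps_a, ps_b, caliper=0.02)
```

Matching with replacement would amount to never removing `j` from `available`; a wider caliper keeps more subjects at the cost of looser matches.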
What matching techniques are
researchers using
 Austin (2008) reviewed 47 articles published in medical journals
1996-2003
 Matching technique cited
 19 used calipers of various sizes
 9 used 5→1 matching
 3 used nearest neighbor
 1 matched within quintiles of propensity score
 15 no information
 1-1 versus 1-many
 39 used 1-1
 4 used 1-many
 4 no information
 With/without replacement
 14 without
 33 no information
Austin, PC Stat Medicine. 2008; 27:2037-2049
What should you do? Bias versus efficiency
tradeoffs in matching techniques
[Diagram: matching options (1-1 vs. 1-N matching, with vs. without replacement, with vs. without calipers) arranged between Bias and Inefficiency.]
How do you know when you have a good match?
 Look at differences in X’s between matched (or stratified)
treatment group
 T-tests and chi2 tests could misrepresent balance because large N’s give
small p-values, and vice versa
 Better: calculate standardized differences for each variable (j)
in your analysis
 Standardized differences greater than 10 in absolute value could be a problem
\[ D_j = \frac{100\,(\bar{x}_{treatment} - \bar{x}_{control})}{\sqrt{(s^2_{treatment} + s^2_{control})/2}} \]
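The standardized difference is easy to compute directly. The sketch below assumes sample variances (n - 1 denominator) and uses a hypothetical function name and toy values.

```python
import math
from statistics import mean, variance

def standardized_difference(treated, control):
    """100 * (mean difference) / sqrt of the average of the two sample variances."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return 100 * (mean(treated) - mean(control)) / pooled_sd

# Hypothetical values of one covariate in each treatment group.
d_age = standardized_difference([1, 2, 3], [2, 3, 4])
flagged = abs(d_age) > 10  # rule of thumb from the slide
```

Unlike a t-test p-value, this quantity does not shrink or grow with the sample size, which is why it is preferred for judging balance.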
Before propensity score matching, ramipril users
versus captopril users
[Figure: standardized differences for selected variables, ramipril versus captopril, before matching. Horizontal axis: standardized differences, from -70 to 20.]
Standardized differences after 5→1 digit
matching, without replacement, 1-1 matching
[Figure: standardized differences after propensity score matching, ramipril versus captopril. Horizontal axis: standardized differences, from -70 to 20.]
Compare standardized differences within
quintiles of the estimated propensity score
Standardized differences overall and within propensity score quintiles; ramipril
users versus captopril users BEFORE propensity score matching
                                      Quintiles of the Propensity Score
Variable                   Overall      0      1      2      3      4
Age                          -16.9  -18.9   -1.3   -0.2    7.5    0.9
Per Capita Income             -1.2   -4.6    5.0   -6.5   -1.0   -9.1
Number Meds                  -68.6  -21.6   -2.3   -9.3  -11.2   -4.1
Female                       -11.1   -6.2   13.7    3.0   -8.5  -12.1
Black                         -9.8   -7.5   -8.1    2.7    1.8    8.9
Hispanic                      10.5    9.2   -8.8    1.7   -2.4    5.2
Asian                          8.5    2.1    8.2    1.4  -12.1    1.5
Other/Unk Race                11.2   -7.6    6.8   -1.4    0.7    2.1
Longterm Care                -37.9  -30.6   -8.0    5.0    7.9   -7.3
Disabled                     -14.2    2.5  -13.1   -3.9   10.8    5.8
Previous Hosp                -12.2    7.8   -0.9    4.6   -9.5   -7.5
Diabetes                     -17.0   -1.4   -4.7   12.9   -1.1   -6.0
COPD                          -2.4    6.7    2.1   -1.8   -7.9    3.5
Beta Blocker                 -12.6    1.2   21.6   -5.0   -3.8  -18.4
Calcium Channel Blocker      -21.1   22.5   24.0    7.3  -19.4  -35.9
Diuretic                     -34.9   16.2    9.1    6.0  -13.2  -18.9
Statin                         5.1   -7.1   -3.3    6.7   -3.8   14.9
NSAID                          3.8   14.9   -2.4   -1.5   -1.8    0.7
Good: 38%   Uncertain: 41%   Bad: 22%
After matching
Standardized differences overall and within propensity score quintiles; ramipril
users versus captopril users
                                      Quintiles of the Propensity Score
Variable                   Overall      0      1      2      3      4
Age                           -1.7  -19.9    7.4    5.5   -5.0    4.3
Per Capita Income              0.2    4.8  -20.3   -1.2   16.0   -0.3
Number Meds                   -0.2    7.7    1.7   -7.6   -1.3   -0.8
Female                        -0.7    5.5    4.1   -3.3   -2.2   -9.2
Black                         -2.8    8.4   -3.0  -11.3   22.5    4.3
Hispanic                      -2.4   -5.3  -11.5    3.1    4.5   -8.3
Asian                         -1.8    2.2    6.6    5.9  -10.1    1.8
Other/Unk Race                 0.9   11.0   11.0  -11.8  -11.1   -8.1
Longterm Care                 -3.4   -4.1    0.3    4.2  -13.9  -10.6
Disabled                       2.5  -14.9   12.7   -2.2    2.8   16.6
Previous Hosp                 -2.0    0.1   20.7    4.4   -3.1  -18.7
Diabetes                       2.0   -9.2   16.9    1.6   -7.8   -1.8
COPD                          -1.4   -2.7   25.0  -11.2    6.1   -5.8
Beta Blocker                  -3.2   10.0   -5.9   -7.8  -19.0  -24.0
Calcium Channel Blocker       -0.2    5.6  -11.6    3.7   -0.5  -27.7
Diuretic                      -0.5   13.8    7.7   -6.6    3.4  -10.7
Statin                         1.1  -16.5   25.9    3.5    4.9    6.5
NSAID                          1.4   11.3  -29.7   10.6   15.1   -9.7
Good: 94%   Uncertain: 6%   Bad: 0%
After matching, treatment groups look
balanced on propensity score
[Figure: two panels of kernel density plots of the linear propensity score for ramipril and captopril users, titled "Before PScore Matching" and "After PScore Matching".]
After propensity score matching, ramipril vs.
captopril
Model characteristics                Captopril    Ramipril
Initial sample                       1521         1269
% Matched (N)                        56% (859)    68% (859)
C-statistic                          0.7569

Characteristics of matched samples   Captopril and ramipril      p-value
                                     matched samples
Mean age                             70.6         70.6         0.944
Female                               69%          69%          0.917
Black                                12%          13%          0.557
Hispanic                             8%           9%           0.34
Asian                                6%           5%           0.605
Other/Unknown race                   17%          17%          0.949
Long-term care                       5%           5%           0.736
Disabled                             28%          29%          0.748
Ramipril equivalent dose (mg)        5.4          5.6          0.103
Per capita income in ZIP code ($)    17,663       17,487       0.662
Charlson comorbidity score           1.6          1.6          0.678
Hospitalized 1996                    17%          17%          0.893
Number of medications 1996           10.4         10.6         0.481
Beta blockers                        18%          17%          0.658
Calcium channel blockers             50%          50%          0.923
Diuretics                            46%          45%          0.663
Lipid lowering agents                24%          23%          0.651
NSAIDs                               6%           6%           0.92
Anti-arrhythmics                     7%           7%           0.707
Anti-platelet                        1%           1%           1
Hydralazine/nitrates                 5%           5%           0.747
Using propensity score models
requires five steps
1. Estimate propensity score
2. Use the propensity score to create a balance in
observed covariates across treatment groups
3. Evaluate the quality of the balance
4. Estimate differences in outcomes between
matched treatment groups
 If you matched, consider using an appropriate technique
for matched samples (Austin, 2008, and associated
editorials).
5. Perform sensitivity analyses (Rosenbaum 2002)
Austin PC, Statist. Med 2008; 27(12) 2037-49
Rosenbaum (2002) Observational Studies. New York: Springer
Example: Rubin’s suggested method
1. Anti-parsimonious logit for propensity score
equation
2. For each Treatment A patient (i), define a “donor
pool” of Treatment B patients (j) with linear
propensity score within 0.2 sd of patient i’s
3. Calculate the Mahalanobis distance (M) to each
patient in the donor pool
\[ M_{ij} = (x_i - x_j)^{\top} \Omega^{-1} (x_i - x_j) \]
x = vector of linear propensity score and a few
other important covariates
Ω = covariance matrix for X’s
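The Mahalanobis distance above can be computed without a matrix library, as in the sketch below. It is an illustration only: the helper names (`covariance_matrix`, `solve`, `mahalanobis`) are hypothetical, the covariance uses the n - 1 denominator, and the quantity returned is the quadratic form as written on the slide (no square root).

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of covariate vectors."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][c] * z[c] for c in range(r + 1, k))) / M[r][r]
    return z

def mahalanobis(xi, xj, omega):
    """Quadratic form (xi - xj)' omega^{-1} (xi - xj)."""
    d = [a - b for a, b in zip(xi, xj)]
    z = solve(omega, d)  # omega^{-1} d without forming the inverse
    return sum(a * b for a, b in zip(d, z))

# Hypothetical vectors: (linear propensity score, one other covariate).
m_identity = mahalanobis([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
m_scaled = mahalanobis([2.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

With an identity covariance matrix the distance reduces to squared Euclidean distance; a covariate with larger variance contributes less per unit of difference.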
Rubin, continued
4. 1-1 matching
5. Match without replacement
6. Use a greedy matching algorithm
 Match the hardest to match patient to the
best match in the donor pool first based on
Mahalanobis distance
 Repeat with next hardest to match patient
until all Treatment A patients are matched or
there are no Treatment B patients left in the
donor pool.
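The greedy procedure can be sketched as below. Several details are assumptions, since the slides do not define them: "hardest to match" is taken to mean the Treatment A patient with the fewest remaining donors, the 0.2 sd caliper uses the standard deviation of the linear propensity score pooled over both groups, and the distance is passed in as a function (here a simple stand-in rather than a full Mahalanobis distance). The name `greedy_match` is hypothetical.

```python
from statistics import stdev

def greedy_match(lps_a, lps_b, distance, caliper_sd=0.2):
    """Greedy 1-1 matching without replacement within a caliper-defined donor pool.

    lps_a, lps_b: linear propensity scores for Treatment A and Treatment B.
    distance(i, j): distance between A subject i and B subject j.
    Returns a dict mapping A indices to matched B indices.
    """
    caliper = caliper_sd * stdev(lps_a + lps_b)
    pool = {i: {j for j, q in enumerate(lps_b) if abs(q - p) <= caliper}
            for i, p in enumerate(lps_a)}
    pairs = {}
    while pool:
        # Assumption: hardest to match = fewest remaining donors.
        i = min(pool, key=lambda a: (len(pool[a]), a))
        donors = pool.pop(i)
        if not donors:
            continue  # no donor left within the caliper; subject stays unmatched
        j = min(donors, key=lambda b: (distance(i, b), b))
        pairs[i] = j
        for other in pool.values():
            other.discard(j)  # without replacement
    return pairs

# Hypothetical scores: A subject 1 has only one donor, so it is matched first.
lps_a = [0.0, 0.3]
lps_b = [0.2, -0.3, 5.0]
result = greedy_match(lps_a, lps_b, lambda i, j: abs(lps_a[i] - lps_b[j]))
```

In this toy case, processing the A subjects in file order would give B subject 0 to A subject 0 and leave A subject 1 unmatched; starting with the hardest case matches both.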
Summary
 Goal of propensity score modeling is to create treatment
groups that are balanced on observed covariates
 To estimate the propensity score
 Include all variables believed to be related to the outcome, not just the
confounders. Be anti-parsimonious.
 Do not include variables that are only related to the treatment
 Use interactions and transformations liberally to effect a balance
 Condition, stratify or match on the propensity score
 Do not condition on the propensity score if you are estimating a logit or
Cox model
 If you match,
 There is a tradeoff between bias and efficiency regarding matching with replacement,
calipers, and 1-many matching
 Evaluate the quality of the match using standardized differences
 Consider using statistics appropriate for matched samples
Look at “good” versus “bad” matches
 For a given variable (say, age) calculate the variance that is left in this
variable, after controlling for the propensity score.
 A good variable is one that has about as much unexplained variance in the
treatment group as the control group
 Using OLS, estimate
\[ \mathrm{Age}_i = \alpha_0 + \alpha_1 \hat{W}_i + u_i \]
where \hat{W} is the estimated propensity score
 Get the residuals (u) from this equation and calculate
\[ B_{Age} = \frac{s^2_{u,treatment}}{s^2_{u,control}} \]
 B is good if it is between 4/5 and 5/4 (i.e., around 1.0)
 B is bad if it is >2 or <1/2
 B is uncertain otherwise
 Do this for each variable in the propensity score model
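The residual variance ratio B can be sketched in plain Python. This is an illustration with hypothetical names and toy data: the regression is a one-regressor OLS of the covariate on the estimated propensity score, and sample variances (n - 1 denominator) are assumed.

```python
from statistics import mean, variance

def residual_variance_ratio(x, ps, w):
    """B = residual variance among treated / residual variance among controls.

    Residuals come from an OLS regression of the covariate x on the
    estimated propensity score ps, fitted on the whole sample.
    """
    mx, mp = mean(x), mean(ps)
    slope = (sum((p - mp) * (v - mx) for p, v in zip(ps, x))
             / sum((p - mp) ** 2 for p in ps))
    intercept = mx - slope * mp
    resid = [v - (intercept + slope * p) for v, p in zip(x, ps)]
    treated = [u for u, t in zip(resid, w) if t == 1]
    control = [u for u, t in zip(resid, w) if t == 0]
    return variance(treated) / variance(control)

def classify(b):
    """Label B using the cutoffs on the slide."""
    if 4 / 5 <= b <= 5 / 4:
        return "good"
    if b > 2 or b < 1 / 2:
        return "bad"
    return "uncertain"

# Hypothetical data in which both groups have identical covariate patterns.
x = [1, 3, 2, 5, 1, 3, 2, 5]
ps = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]
w = [1, 1, 1, 1, 0, 0, 0, 0]
b_x = residual_variance_ratio(x, ps, w)
label = classify(b_x)
```

Looping this over every variable in the propensity score model gives the share of good, uncertain, and bad variables.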