Confounding adjustment: Ideas in Action - a case study
Xiaochun Li, Ph.D.
Associate Professor
Division of Biostatistics
Indiana University School of Medicine
Outline
• Description of the data set
• Quantity to be estimated
• Summary of baseline characteristics
• Approaches to data analyses
• Results
• Discussion
Simulation Setup
• Lindner Center data described and analyzed in Kereiakes et al. (2000)
  - 6-month follow-up data on 996 patients who underwent an initial Percutaneous Coronary Intervention (PCI)
  - patients were treated with “usual care” alone or usual care plus a relatively expensive blood thinner (IIb/IIIa cascade blocker)
• has 10 variables
  - Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost)
  - X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc

Baseline characteristics
Variable    Description
stent       coronary stent deployment
female      patient sex
diabetic    diabetes mellitus
acutemi     acute myocardial infarction
ves1proc    number of vessels involved in initial PCI
height      in centimeters
ejecfrac    left ventricular ejection fraction (%)

The “LSIM10K” dataset
• Simulated data set based on the Lindner Center data
  - 17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping
  - same correlation among covariates, same clustering patterns
• Contains the values of 10 simulated variables for 10,325 hypothetical patients
• To simplify analyses, the data contain no missing values.
• Details and dataset available from Bob’s website

What do we want to estimate?
The population average treatment effect (ATE), i.e., E(Y1) - E(Y0)
• Y1 and Y0 are counterfactual outcomes
• In plain words: “what if” scenarios
• The expected response if treatment had been assigned to the entire study population minus the expected response if control had been assigned to the entire study population

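To make the “what if” definition concrete, here is a minimal simulation sketch. Everything in it is hypothetical (the single binary confounder `x`, the probabilities, the sample size); it is not the LSIM10K data. Because both counterfactual outcomes are simulated for every patient, E(Y1) - E(Y0) can be computed directly and compared with the naive treated-minus-control contrast.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate both potential outcomes with one binary confounder (all values hypothetical)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0            # confounder, e.g. a sicker patient
        p_treat = 0.8 if x == 1 else 0.2              # sicker patients are treated more often
        t = 1 if rng.random() < p_treat else 0
        y0 = 1 if rng.random() < (0.10 if x else 0.02) else 0   # outcome under control
        y1 = 1 if rng.random() < (0.06 if x else 0.01) else 0   # outcome under treatment
        rows.append((x, t, y0, y1))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = simulate()

# ATE = E(Y1) - E(Y0): computable here only because both counterfactuals were simulated.
ate = mean(y1 for _, _, _, y1 in rows) - mean(y0 for _, _, y0, _ in rows)

# Naive contrast: uses only the outcome actually observed in each arm.
naive = (mean(y1 for _, t, _, y1 in rows if t == 1)
         - mean(y0 for _, t, y0, _ in rows if t == 0))

print(f"ATE = {ate:.4f}, naive difference = {naive:.4f}")
```

In this toy setup the treatment lowers risk in both confounder levels, yet the naive contrast is pulled upward because treated patients are sicker on average; that gap is what the adjustment methods aim to remove.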
Baseline covariate balance assessment
Variable      C (Usual care alone)   T (Usual care + Abciximab)   P value
stent         63%                    69%                          <0.001
female        33%                    34%                          0.36
diabetic      23%                    19%                          <0.001
acutemi       7%                     15%                          <0.001
ves1proc      1.4 (±0.6)             1.3 (±0.6)                   <0.001
height (cm)   172.5 (±10)            171.5 (±10)                  <0.001
ejfract       53 (±8)                50 (±10)                     <0.001

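With more than 10,000 patients, p-values flag even small imbalances. As a complementary view (not a method used in the slides), the standardized difference can be computed from the summary statistics in the balance table. The sketch below assumes the “±” values are standard deviations; the function names are illustrative only.

```python
import math

def std_diff_means(mean_t, sd_t, mean_c, sd_c):
    """Standardized difference for a continuous covariate (T minus C)."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)

def std_diff_props(p_t, p_c):
    """Standardized difference for a binary covariate (T minus C)."""
    pooled_var = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    return (p_t - p_c) / math.sqrt(pooled_var)

# Summary statistics copied from the baseline balance table (T first, then C).
balance = {
    "stent": std_diff_props(0.69, 0.63),
    "female": std_diff_props(0.34, 0.33),
    "diabetic": std_diff_props(0.19, 0.23),
    "acutemi": std_diff_props(0.15, 0.07),
    "ves1proc": std_diff_means(1.3, 0.6, 1.4, 0.6),
    "height": std_diff_means(171.5, 10, 172.5, 10),
    "ejfract": std_diff_means(50, 10, 53, 8),
}

for name, d in balance.items():
    print(f"{name:9s} {d:+.3f}")
```

A commonly quoted rule of thumb treats absolute standardized differences above roughly 0.1 as worth attention; on that scale ejfract and acutemi stand out most.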
Visualizing overall imbalance
[Figure: covariate values in groups C and T; deep blue = high values]

Analytical Methods for confounding adjustment
The following methods were applied to lsim10k:
• Outcome regression adjustment (OR)
• Propensity score (PS) stratification
• Inverse-probability-of-treatment weighting (IPTW)
• Doubly robust estimation (DR)
• Matching by
  - Mahalanobis distance
  - PS only

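As an illustration of the matching entry, here is a minimal sketch of Mahalanobis-distance matching on two continuous covariates. The slides do not say which matching algorithm was used; greedy 1:1 nearest-neighbour matching without replacement is an assumption made here for simplicity, and the (height, ejfract) values are hypothetical.

```python
def covariance_2d(points):
    """Sample variances and covariance of 2-D points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance d' S^-1 d for 2-D points a and b."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Inverse of a 2x2 matrix: (1/det) * [[syy, -sxy], [-sxy, sxx]]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def greedy_match(treated, controls, cov):
    """Match each treated unit to its nearest unused control (greedy, without replacement)."""
    available = set(range(len(controls)))
    pairs = []
    for i, unit in enumerate(treated):
        j = min(available, key=lambda c: mahalanobis_sq(unit, controls[c], cov))
        pairs.append((i, j))
        available.remove(j)
    return pairs

# Hypothetical (height in cm, ejection fraction in %) pairs.
treated = [(170, 50), (180, 60)]
controls = [(181, 59), (171, 52), (150, 30)]
cov = covariance_2d(treated + controls)
pairs = greedy_match(treated, controls, cov)
print(pairs)
```

Matching on the PS only replaces the distance above with the absolute difference in estimated propensity scores.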
ANALYSIS OF MORT6MO
OR model for mort6mo:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract
• Residual deviance: 2410.4 on 10323 degrees of freedom
PS model:
• saturated model for the five categorical covariates (main effects and interaction terms up to fifth-order)
• main effects and quadratic terms for height and ejfract
Covariate balance evaluations based on PS quintiles
[Figures: balance within PS quintiles for stent, female, diabetic, acutemi and ves1proc]

Height
[Figure: height by PS quintile]
Between-group differences in strata 2 (0.95 cm) and 3 (-1.50 cm)
• Existence of residual confounding after adjusting for PS quintiles
• The within-stratum between-group height difference:
            mean     s.d.   p
Stratum 2   0.949    0.44   0.032
Stratum 3   -1.497   0.43   0.0005

Ejfract
[Figure: ejfract by PS quintile]
Between-group differences in strata 1 (0.81), 2 (-1.32) and 3 (-0.72)
• Existence of residual confounding after adjusting for PS quintiles
• The within-stratum between-group ejfract difference:
            mean     s.d.   p-value
Stratum 1   0.812    0.41   0.0475
Stratum 2   -1.322   0.33   7.38e-5
Stratum 3   -0.721   0.32   0.025

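The within-stratum p-values for height and ejfract can be approximately reproduced from the reported means. The sketch below assumes the column labelled “s.d.” is the standard error of the between-group difference and uses a normal approximation; the slides most likely used a t-test, so small discrepancies are expected.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: difference = 0 under a normal approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

# (covariate, stratum, mean difference, reported "s.d.") as given in the slides.
rows = [
    ("height", 2, 0.949, 0.44),
    ("height", 3, -1.497, 0.43),
    ("ejfract", 1, 0.812, 0.41),
    ("ejfract", 2, -1.322, 0.33),
    ("ejfract", 3, -0.721, 0.32),
]

p_values = {(name, s): two_sided_p(diff, se) for name, s, diff, se in rows}
for (name, s), p in p_values.items():
    print(f"{name:8s} stratum {s}: p = {p:.2g}")
```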
PS Stratification
• Residual confounding within strata
• In the PS stratification method, height and ejfract are further adjusted; the model includes
  - stratum-specific treatment effect
  - height and ejfract main effects and their quadratic terms

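The basic PS-stratification estimate can be sketched as a size-weighted average of within-quintile differences. This simplified version uses raw differences in means inside each stratum; it omits the further regression adjustment for height and ejfract described above, and the data at the bottom are hypothetical.

```python
def quintile_strata(ps):
    """Assign each unit to one of five strata by the rank of its propensity score."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(ps)       # 0..4, roughly equal-sized strata
    return strata

def stratified_ate(y, t, ps):
    """Size-weighted average of within-stratum treated-minus-control mean differences."""
    strata = quintile_strata(ps)
    n = len(y)
    estimate = 0.0
    for s in range(5):
        treated = [y[i] for i in range(n) if strata[i] == s and t[i] == 1]
        control = [y[i] for i in range(n) if strata[i] == s and t[i] == 0]
        if not treated or not control:
            raise ValueError(f"stratum {s} lacks treated or control units")
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        estimate += (len(treated) + len(control)) / n * diff
    return estimate

# Hypothetical data: 20 units, baseline outcome rises with the PS, true effect = 2.
ps = [(i + 1) / 21 for i in range(20)]
t = [i % 2 for i in range(20)]
y = [float(i // 4) + 2.0 * t[i] for i in range(20)]
print(stratified_ate(y, t, ps))
```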
Results – mort6mo
True △ = -0.036
Method               u1      u0      △        SE
Outcome regression   0.010   0.043   -0.032   0.0038
PS strat.            0.012   0.044   -0.033   0.0039
IPTW1                0.011   0.045   -0.034   0.0038
IPTW2                0.011   0.045   -0.034   0.0037
DR                   0.011   0.043   -0.032   0.0037
Match: Mahalanobis   NA      NA      -0.037   0.0044
Match: PS            NA      NA      -0.036   0.0039
Results of all methods are consistent, providing evidence of treatment
effectiveness at preventing death at 6 months.
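One way to check the consistency claim is to form approximate 95% intervals from each reported △ and SE and compare them with the true value of -0.036. The sketch below assumes Wald-type intervals (estimate ± 1.96 × SE), which the slides do not state explicitly.

```python
TRUE_DELTA = -0.036

# (Delta, SE) per method, as reported in the mort6mo results table.
results = {
    "Outcome regression": (-0.032, 0.0038),
    "PS strat.": (-0.033, 0.0039),
    "IPTW1": (-0.034, 0.0038),
    "IPTW2": (-0.034, 0.0037),
    "DR": (-0.032, 0.0037),
    "Match: Mahalanobis": (-0.037, 0.0044),
    "Match: PS": (-0.036, 0.0039),
}

def wald_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval under a normal approximation."""
    return estimate - z * se, estimate + z * se

summary = {}
for method, (delta, se) in results.items():
    lo, hi = wald_ci(delta, se)
    summary[method] = {
        "ci": (lo, hi),
        "covers_truth": lo <= TRUE_DELTA <= hi,
        "excludes_zero": hi < 0,
    }
    print(f"{method:20s} [{lo:.4f}, {hi:.4f}]")
```

Under this approximation every interval contains the true △ and lies entirely below zero.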
ANALYSIS OF CARDCOST
PS MODEL: SAME AS BEFORE
cardcost model:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract
cardcost model of CA with PS stratification:
• stratum-specific treatment effect
• height and ejfract main effects and their quadratic terms
Model checking – OR
Adjusted R-squared: 0.0386
Model checking – OR (log transformed)
Adjusted R-squared: 0.0693
Results – cardcost
Method                 u1      u0      △      SE
OR: original scale     15308   15300   8      210
OR: log transformed    13536   13702   -166   111
PS strat.              13580   13639   -59    119
IPTW1                  15545   15226   -319   409
IPTW2                  15408   15303   -105   229
DR                     15393   15292   -101   226
Match: Mahalanobis     NA      NA      150    178
Match: PS              NA      NA      -3     215

Discussion
• All methods give consistent results on the 2 outcomes
• All PS-based results have similar variance except IPTW1
• IPTW estimators depend on an approximately correct PS model
• OR depends on an approximately correct outcome model
• DR is a fortuitous combination of OR and IPTW: it needs only one of the two models to be right
• Nonparametric estimation of either model may be an alternative to parametric models

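The dependence of IPTW on the PS model, and of DR on either model, can be seen in a small sketch. The estimators below are the standard normalized IPTW and augmented IPTW forms; whether they correspond exactly to the IPTW1, IPTW2 and DR variants in the results tables is not stated in the slides. The eight-patient data set is hypothetical and noise-free, so the right answer (an effect of 3) is known exactly.

```python
def iptw_ate(y, t, ps):
    """Normalized inverse-probability-weighted estimate of E(Y1) - E(Y0)."""
    w1 = [ti / pi for ti, pi in zip(t, ps)]
    w0 = [(1 - ti) / (1 - pi) for ti, pi in zip(t, ps)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

def dr_ate(y, t, ps, m1, m0):
    """Augmented IPW: outcome-model prediction plus a weighted residual correction."""
    n = len(y)
    mu1 = sum(m1[i] + t[i] * (y[i] - m1[i]) / ps[i] for i in range(n)) / n
    mu0 = sum(m0[i] + (1 - t[i]) * (y[i] - m0[i]) / (1 - ps[i]) for i in range(n)) / n
    return mu1 - mu0

# Hypothetical data: x confounds treatment and outcome, y = 1 + 2x + 3t.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [1 + 2 * xi + 3 * ti for xi, ti in zip(x, t)]

ps_right = [0.25 if xi == 0 else 0.75 for xi in x]   # matches how t was assigned
ps_wrong = [0.5] * len(x)                            # ignores the confounder
m1_right = [4 + 2 * xi for xi in x]                  # E(Y | T=1, x)
m0_right = [1 + 2 * xi for xi in x]                  # E(Y | T=0, x)
m_wrong = [0.0] * len(x)                             # useless outcome model

print("IPTW, right PS:          ", iptw_ate(y, t, ps_right))
print("DR, wrong PS / right OR: ", dr_ate(y, t, ps_wrong, m1_right, m0_right))
print("DR, right PS / wrong OR: ", dr_ate(y, t, ps_right, m_wrong, m_wrong))
print("DR, both wrong:          ", dr_ate(y, t, ps_wrong, m_wrong, m_wrong))
```

In this toy example DR recovers the effect whenever at least one of the two models is right and falls back to the confounded contrast when both are wrong.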
Double Robustness
• wrong PS model: adjust for one covariate ‘acutemi’ only
• wrong OR model for cardcost: adjust for the treatment indicator ‘trtm’ and the ‘acutemi’ covariate
Method   PS      Outcome   △      SE
IPTW2    wrong   NA        464    214
DR       wrong   wrong     463    217
DR       wrong   right     166    214
DR       right   wrong     -131   233
By “right”, we mean approximately right.

Propensity score estimation
• The majority of applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale
  - May include selected interactions and polynomial terms
• Accurate PS estimation is impeded by
  - High-dimensional covariates: which ones should we deconfound?
  - Unknown functional form: how do they relate to treatment selection?
• PS model misspecification can substantially bias the estimated treatment effect
• A nonparametric approach is flexible enough to accommodate nonlinear/non-additive relationships of covariates to treatment assignment, e.g., trees

Nonparametric regression techniques
• Generalized Boosted Models (GBM) to estimate the propensity score function
  - Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004
  - R package: twang
• Regression tree model to predict cardcost
  - Ripley, 1996; Therneau and Atkinson, 1997
  - R package: rpart

Generalized Boosted Models (GBM)
• A multivariate nonparametric regression technique
• Sum of a large set of simple regression trees modelling the log-odds
  - gbm finds the MLE of g(x) = log(p(x)/(1-p(x))), where p(x) = P(T=1|x)
• Predicts treatment assignment from a large number of pretreatment covariates, choosing them adaptively
• Nonlinear
• No need to select variables
• Can model complex interactions
• Invariant to monotone transformations of x
  - E.g., same PS estimates whether we use age, log(age) or age^2
• Outperforms alternative methods in prediction error

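The invariance to monotone transformations follows from how trees split: a split depends only on the ordering of x, not on its scale. The sketch below shows this for a single regression stump, the building block that boosting sums over; it is not the gbm algorithm itself, and the ages and treatment indicators are hypothetical.

```python
import math

def best_stump_partition(x, t):
    """Find the split on x minimizing squared error in t; return the left-side index set."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = None, None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue                      # cannot split between tied values
        left, right = order[:k], order[k:]
        sse = 0.0
        for side in (left, right):
            m = sum(t[i] for i in side) / len(side)
            sse += sum((t[i] - m) ** 2 for i in side)
        if best_sse is None or sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

age = [34, 41, 47, 52, 58, 63, 69, 75]    # hypothetical ages
treated = [0, 0, 1, 0, 1, 1, 1, 1]        # hypothetical treatment indicators

# Same covariate on three scales: age, log(age), age squared.
parts = [best_stump_partition([f(a) for a in age], treated)
         for f in (float, math.log, lambda a: a ** 2)]
print(parts[0] == parts[1] == parts[2])
```

Because every stump partitions the patients identically on all three scales, the fitted values, and hence the PS estimates built from such trees, do not change.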
Results – cardcost, nonparametric approach
Method                       u1      u0      △      SE
DR: parametric models        15393   15292   -101   226
DR: GBM + parametric model   15303   15213   -90    210
DR: GBM + tree               15233   15356   123    172

Future research
• People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed for choosing the number of strata
• Model selection: PS model and outcome model
  - Nonparametric estimation of models may be intuitive, but the properties of the resulting causal estimates are not clear
  - Nonparametric caveat: still need to define a set of “confounders” based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that have associations with treatment and outcome