Honest Inference from Observational Studies in Healthcare David Madigan

Columbia University
Patrick Ryan
Janssen
http://www.omop.org
http://www.ohdsi.org
“The sole cause and root of almost every defect in the sciences is this: that whilst
we falsely admire and extol the powers of the human mind, we do not search for its
real helps.”
— Novum Organum: Aphorisms [Book One], 1620, Sir Francis Bacon
141 patients exposed in pivotal
randomized clinical trial for metformin
>1,000,000 new users of metformin in one
administrative claims database
Patient profiles from observational data
Major Use-Cases
• Population-level estimation
– Effect estimation: Does metformin cause lactic acidosis?
– Comparative effectiveness: Does metformin cause lactic
acidosis more than glyburide?
• Patient-level prediction/Precision medicine
– Given everything you know about me and my medical
history, if I start taking metformin, what is the chance that I
am going to have lactic acidosis in the next year?
• Clinical characterization:
– Natural history: Who are the patients that take
metformin? What happens to them?
– Quality improvement: What proportion of patients with
diabetes experience disease-related complications?
How well do we do estimation?
August 2010: “Among patients in the UK
General Practice Research Database, the
use of oral bisphosphonates was not
significantly associated with incident
esophageal or gastric cancer”
Sept 2010: “In this large nested case-control study within a UK cohort [General
Practice Research Database], we found a
significantly increased risk of oesophageal
cancer in people with previous
prescriptions for oral bisphosphonates”
What is the quality of the current
evidence from observational analyses?
April 2012: “Patients taking oral
fluoroquinolones were at a higher risk of
developing a retinal detachment”
Dec 2013: “Oral fluoroquinolone use was
not associated with increased risk of
retinal detachment”
BJCP May 2012: “In this study population,
pioglitazone does not appear to be significantly
associated with an increased risk of bladder
cancer in patients with type 2 diabetes.”
BMJ May 2012: “The use of pioglitazone is
associated with an increased risk of incident
bladder cancer among people with type 2
diabetes.”
Nov 2012: FDA released risk
communication about the bleeding risk of
dabigatran, based on unadjusted cohort
analysis performed within Mini-Sentinel
Dec 2013: “This analysis shows that the
RCTs and Mini-Sentinel Program show
completely opposite results”
Aug 2013: “However, the absence of any
adjustment for possible confounding and
the paucity of actual data made the
analysis unsuitable for informing the care
of patients”
2010-2014 OMOP Research Experiment
• Open-source
• Standards-based
Common Data Model
• 10 data sources
• Claims and EHRs
• 200M+ lives
OMOP Methods Library (e.g., inception cohort, case control, logistic regression)
• 14 methods
• Epidemiology designs
• Statistical approaches adapted for longitudinal data
[Figure: drug-by-outcome grid]
Outcomes: Angioedema, Aplastic Anemia, Acute Liver Injury, Bleeding, Hip Fracture, Hospitalization, Myocardial Infarction, Mortality after MI, Renal Failure, GI Ulcer Hospitalization
Drugs: ACE Inhibitors; Amphotericin B; Antibiotics: erythromycins, sulfonamides, tetracyclines; Antiepileptics: carbamazepine, phenytoin; Benzodiazepines; Beta blockers; Bisphosphonates: alendronate; Tricyclic antidepressants; Typical antipsychotics; Warfarin
Lesson 1: Empirical performance:
Most observational methods do not have nominal
statistical operating characteristics
• Applying the cohort design to MDCR against 34 negative controls for acute liver injury:
• If the 95% confidence interval were properly calibrated, then 95% × 34 ≈ 32 of the estimates should cover RR = 1
• We observed that only 17 of the negative controls covered RR = 1
• Estimated coverage probability = 17 / 34 = 50%
• Estimates on both sides of the null suggest high variability in the bias
Ryan PB, Stang PE, Overhage JM et al, Drug Safety, 2013:
“A Comparison of the Empirical Performance of Methods for a Risk Identification System”
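The coverage arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch: the function name `coverage` and the toy intervals are hypothetical illustrations, not OMOP output; only the counts 34 and 17 come from the slide.

```python
def coverage(intervals, null_value=1.0):
    """Fraction of (lower, upper) confidence intervals that contain null_value."""
    covered = sum(1 for lower, upper in intervals if lower <= null_value <= upper)
    return covered / len(intervals)


n_controls = 34                                       # negative controls (from the slide)
expected_covering = 0.95 * n_controls                 # about 32 intervals if calibrated
observed_covering = 17                                # reported on the slide
observed_coverage = observed_covering / n_controls    # 0.5

# Toy intervals (hypothetical): two contain RR = 1, two do not.
toy_intervals = [(0.8, 1.3), (1.1, 1.9), (0.4, 0.9), (0.95, 2.0)]
toy_coverage = coverage(toy_intervals)

print(round(expected_covering, 1), observed_coverage, toy_coverage)
```

With real results, `intervals` would hold one estimated 95% interval per negative control, and the returned fraction would be compared against the nominal 0.95.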
Lesson 2: Database heterogeneity:
Holding analysis constant, different data may yield
different estimates
• When applying a propensity score adjusted new user cohort design to 10 databases for 53 drug-outcome pairs:
• 43% had substantial heterogeneity (I² > 75%) where pooling would not be advisable
• 21% of pairs had at least 1 source with significant positive effect and at least 1 source with significant negative effect
Madigan D, Ryan PB, Schuemie MJ et al, American Journal of Epidemiology, 2013
“Evaluating the Impact of Database Heterogeneity on Observational Study Results”
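For readers unfamiliar with the I² statistic used above, the following is a minimal sketch of the standard calculation from Cochran's Q with inverse-variance weights. The function name and the example estimates are hypothetical; they are not values from the OMOP experiment.

```python
import math


def i_squared(log_estimates, std_errors):
    """Higgins' I^2 (percent) from per-database log effect estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log relative risks for one drug-outcome pair in four databases.
heterogeneous = [math.log(rr) for rr in (0.5, 1.0, 2.0, 3.0)]
homogeneous = [math.log(1.5)] * 4
errors = [0.1, 0.1, 0.1, 0.1]

print(i_squared(heterogeneous, errors))  # well above 75: pooling not advisable
print(i_squared(homogeneous, errors))    # 0.0: no between-database heterogeneity
```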
Test cases from OMOP 2011/2012 experiment
Lesson 3: Parameter sensitivity:
Holding data constant, different analytic design
choices may yield different estimates
Sertraline-GI Bleed: RR = 2.45 (2.06 – 2.92) (CC: 2000195)
• Controls per case: up to 10 controls per case
• Required observation time prior to outcome: 180d
• Time-at-risk: 30d from exposure start
• Include index date in time-at-risk: No
• Case-control matching strategy: Age and sex
• Nesting within indicated population: No
• Exposures to include: First occurrence
• Metric: Odds ratio with Mantel-Haenszel adjustment by age and gender
Holding all parameters constant, except:
• Matching on age, sex and visit (within 30d) (CC: 2000205)
yields RR = 0.73 (0.65 – 0.81)
Madigan D, Ryan PB, Schuemie MJ, Therapeutic Advances in Drug Safety, 2013: “Does design matter?
Systematic evaluation of the impact of analytical choices on effect estimates in observational studies”
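The metric named in the case-control settings, an odds ratio with Mantel-Haenszel adjustment, pools stratum-level 2×2 tables. Below is a minimal sketch of the standard Mantel-Haenszel estimator; the function name and the counts are hypothetical and are not the sertraline data.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    strata: iterable of (a, b, c, d) per stratum, where
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for two age/sex strata.
example_strata = [(10, 20, 5, 40), (8, 12, 6, 30)]
pooled_or = mantel_haenszel_or(example_strata)
print(round(pooled_or, 3))
```

With a single stratum the estimator reduces to the ordinary odds ratio a·d / (b·c), which is one quick sanity check on any implementation.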
Lesson 4: Empirical calibration can help restore
interpretation of study findings
• Type I error rate typically 40-60%
• Negative controls can be used to
estimate empirical null distribution:
how much bias and variance exists
when no effect should be observed
Schuemie MJ, Ryan PB, DuMouchel W, et al, Statistics in Medicine, 2013:
“Interpreting observational studies: why empirical calibration is needed to correct p-values”
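The idea of an empirical null can be sketched as follows. This is a simplified illustration, not the published procedure: it fits a normal distribution to negative-control log relative risks using plain sample moments, then compares a new estimate against that distribution. All function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_empirical_null(negative_control_log_rr):
    """Mean (bias) and SD (spread) of estimates where no effect should exist."""
    return mean(negative_control_log_rr), stdev(negative_control_log_rr)


def traditional_p(log_rr, se):
    """Two-sided p-value against the theoretical null (log RR = 0)."""
    return 2.0 * (1.0 - normal_cdf(abs(log_rr / se)))


def calibrated_p(log_rr, se, null_mean, null_sd):
    """Two-sided p-value against the empirical null (simplified assumption)."""
    z = (log_rr - null_mean) / math.sqrt(null_sd ** 2 + se ** 2)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical negative-control estimates (log RR), biased away from zero.
controls = [0.1, 0.5, -0.2, 0.4, 0.3, 0.6, 0.0, 0.2]
null_mean, null_sd = fit_empirical_null(controls)

# Hypothetical estimate of interest: RR = 1.5 with standard error 0.1.
p_trad = traditional_p(math.log(1.5), 0.1)
p_cal = calibrated_p(math.log(1.5), 0.1, null_mean, null_sd)
print(p_trad < 0.05, p_cal < 0.05)
```

In this toy setting the traditional p-value is significant while the calibrated one is not, because the estimate sits well inside the range of estimates seen when no effect should be observed.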
All is Not Well
• Unknown operating characteristics
• Type 1 error rate? “95%” confidence interval?
• Like early days of routine laboratory testing –
“trust me, I measured it myself”
Large-scale analytics can help reframe the
patient-level prediction problem
Given a patient’s clinical observations in the past… can we predict outcomes for that patient in the future?
[Figure: patient-by-feature matrix with one row per patient. Columns: Outcome (Stroke); Demographics (Age, Gender, Race, Location); All drugs (Drug 1, Drug 2, …, Drug n); All conditions (Condition 1, Condition 2, …, Condition n); All procedures (Procedure 1, Procedure 2, …, Procedure n); All lab values (Lab 1, Lab 2, …, Lab n). Non-demographic entries are 0/1 indicators.]
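One way to picture this reframing is as a wide binary matrix with one row per patient and one column per observed drug, condition, procedure, or lab. The sketch below builds such a matrix; the function name, the code labels, and the two example patients are hypothetical.

```python
def build_feature_matrix(patients):
    """Turn per-patient sets of observed codes into 0/1 indicator rows.

    patients: dict mapping patient id -> set of codes seen in the patient's past.
    Returns (columns, rows), where rows maps patient id -> list of 0/1 values.
    """
    columns = sorted(set().union(*patients.values()))
    rows = {
        patient_id: [1 if code in codes else 0 for code in columns]
        for patient_id, codes in patients.items()
    }
    return columns, rows


# Hypothetical patients and codes, for illustration only.
example_patients = {
    "p1": {"drug:warfarin", "condition:atrial fibrillation"},
    "p2": {"condition:atrial fibrillation", "lab:INR"},
}
columns, rows = build_feature_matrix(example_patients)
print(columns)
print(rows)
```

The outcome (for example, stroke in the following year) would be a separate label per row, and any standard classifier can then be trained on the matrix.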
Which Atrial Fibrillation Patients Should
Take Warfarin?
Atrial Fibrillation
Stroke
Risk
Stroke
Risk
Warfarin
Bleed
Risk
Goal: Identify patients with sufficiently low stroke risk to be spared warfarin.
Standard Machine Learning Stroke Results
AUC
CHADS2: .72
Random Forest: .79
Logistic Regression: .78
Not great discrimination at the low risk end
Health History Motifs
Amit and Murua (2001)
Shahn, Ryan, and Madigan (2015)
Random Relational Forest (RRF)
Approach
• Build decision trees with graphs at the nodes
• Each graph is a set of labeled edges.
• A labeled edge is a triplet [e_i, e_j, Relation], where e_i and e_j are each health events (such as “Diabetes diagnosis” or “Atorvastatin prescription”) and ‘Relation’ labels the temporal relationship between the two events.
Example Tree
I = {[Diabetes, Asthma, d=20]}
E = {}
I = {[Diabetes, Asthma, d=20],
[Asthma, Dementia, d=40]}
E = {}
I = {}
E = {[Diabetes, Asthma, d=20]}
I = {[Diabetes,Asthma, d=20]}
E = {[Asthma,Dementia, d=40]}
Each node of a tree is defined by two sets of labeled edges, I (“included”) and E (“excluded”). A patient is in a node if the patient's history contains every edge in I and none of the edges in E. The edges in I form a connected graph.
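The node-membership rule can be written directly with set operations. This is a minimal sketch, assuming each patient is represented as a set of labeled edges and treating the relation label (such as "d=20") as an opaque string; the function name and the example patient are hypothetical, while the four (I, E) pairs are those of the example tree.

```python
def in_node(patient_edges, included, excluded):
    """True if the patient has every edge in `included` and none in `excluded`."""
    return included <= patient_edges and not (excluded & patient_edges)


# Labeled edges as (event_i, event_j, relation) triplets.
diabetes_asthma = ("Diabetes", "Asthma", "d=20")
asthma_dementia = ("Asthma", "Dementia", "d=40")

# The four (I, E) pairs from the example tree.
nodes = {
    "has_da": ({diabetes_asthma}, set()),
    "has_da_and_ad": ({diabetes_asthma, asthma_dementia}, set()),
    "lacks_da": (set(), {diabetes_asthma}),
    "has_da_lacks_ad": ({diabetes_asthma}, {asthma_dementia}),
}

# Hypothetical patient whose history contains only the Diabetes-Asthma edge.
patient = {diabetes_asthma}
membership = {name: in_node(patient, inc, exc) for name, (inc, exc) in nodes.items()}
print(membership)
```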
[Figure: example trees. Node labels include age splits (age < 59, age >= 59) and event pairs such as (1,13) and (6,16); the numbers refer to the health events listed below.]
1. Vascular disorders NEC
2. Central nervous system vascular disorders NEC
3. Total fluid volume increased
4. Vascular disorders
5. Cardiac failure congestive
6. Nervous system disorders
7. Respiratory system disorders
8. Eye disorders
9. Coronary artery disorders NEC
10. Haematological and lymphoid tissue therapeutic procedures
11. Anti-inflammatory and antirheumatic products
12. Non-steroidal drugs for obstructive airway disease
13. Blood and blood forming organs
14. Antithrombotic agents
15. Opioids
16. Myocardial disorders NEC
17. Arteriosclerosis, stenosis, vascular insufficiency and necrosis
RRF Stroke Results
AUC
CHADS2: .72
Logistic Regression: .78
Random Forest: .79
RRF: .79
Improved discrimination at the low risk end
Standardized large-scale analytics tools
under development within OHDSI
Patient-level data in OMOP CDM
• ACHILLES: Database profiling
• CIRCE: Cohort definition
• HERACLES: Cohort characterization
• HERMES: Vocabulary exploration
• LAERTES: Drug-AE evidence base
• PLATO: Patient-level predictive modeling
• HOMER: Population-level causality assessment
• OHDSI Methods Library: CYCLOPS, CohortMethod
http://github.com/OHDSI
Large-scale analytics example:
ACHILLES
http://ohdsi.org/web/ACHILLES
• >12 databases from 5 countries across 3 different platforms:
• Janssen (Truven, Optum, Premier, CPRD, NHANES, HCUP)
• Columbia University
• Regenstrief Institute
• Ajou University
• IMEDS Lab (Truven, GE)
• UPMC Nursing Home
• Erasmus MC
• Cegedim
Atopic Dermatitis
Treatment pathways for diabetes
T2DM : All databases
Only drug
First drug
Second drug
Treatment pathways for HTN
HTN: All databases
Treatment pathways for depression
Depression: All
databases
Population-level heterogeneity
[Figure: treatment pathways for Type 2 Diabetes Mellitus, Hypertension, and Depression, by database: CCAE, CUMC, CPRD, INPC, JMDC, MDCR, MDCD, GE, OPTUM]
• Differences by country
• Differences by medical center
Concluding thoughts
• An international community and global data
network can be used to generate real-world
evidence in a secure, reliable and efficient
manner
• Common data model critically important
• Much work remains on establishing (and
improving) actual operating characteristics of
current approaches to causal inference
“I would rather discover one cause than gain the kingdom of Persia”
- Democritus 400 BCE
OHDSI: Join the journey