Replicating Randomised Trials of Treatments
in Observational Settings Using Propensity
Scores – Fisher’s Aphorisms
Nick Freemantle PhD
Professor of Clinical Epidemiology & Biostatistics
Assessing Causation
Aim
• Randomised trials
– And how they work
• Observational studies
– Propensity scores
• Make a few concluding comments
Randomised controlled trials
We have the duty of formulating,
of summarising and of communicating
our conclusions, in intelligible form,
in recognition of the right of other free minds
to utilise them in making their own decisions
Ronald Fisher
• “the simple act of randomisation assures the internal
validity of the test for significance”
1935 Ronald Fisher
Fisher RA. The design of experiments. 8th ed. Edinburgh: Oliver & Boyd, 1966, p. 21
Ronald Fisher 1890 – 1962
• Professor of Eugenics UCL 1933 to 1943
• After Rothamsted Experimental Station
• Analysis of variance
• Maximum likelihood estimation
• Sufficiency
• Ancillary statistics
• Fisher's linear discriminant
• Fisher information
• Fisher's z-distribution (F distribution)
• Fiducial inference
• First to use the term "Bayesian"
• Founded modern quantitative genetics
Randomisation
[Diagram: patient population → randomisation → intervention group and control group]
Randomised trials
• Two orthogonal (i.e. independent) explanations
for any observed difference between experimental
groups at end of trial
– Play of chance (that was how the patients were allocated)
– Treatment effects
Statistical analysis of randomised trials
• Statistical analysis tells us the probability that
the observed difference (or a larger difference)
would occur by chance alone
– Implausible?
– Then it must be the experimental treatment!
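The logic above can be sketched as a permutation test: if treatment did nothing, the group labels are arbitrary, so re-shuffling them shows how often chance alone produces a difference at least as large as the one observed. This is a minimal illustration only. The function name, the outcome values and the number of permutations are assumptions made for the example, not anything taken from the talk.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Re-allocate subjects at random, mimicking the play of chance.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical outcome data (illustrative only).
treated_outcomes = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
control_outcomes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
p_value = permutation_p_value(treated_outcomes, control_outcomes)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value says that chance allocation is an implausible explanation. In a randomised trial the only remaining explanation is the treatment.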
Meta analysis
• Simply the weighted average of the results
of randomised trials
– Weighted by the inverse of the study variance (fixed effects)
– Weighted by the inverse of the sum of the study variance and a
(method of moments) component for between-study variability
in treatment effects (random effects)
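The two weighting schemes can be sketched in a few lines. The random effects version below uses the DerSimonian-Laird moment estimator for the between-study variance, which is one common choice and an assumption here, since the slides do not name an estimator. The effect estimates and variances are hypothetical.

```python
def pool(estimates, variances):
    """Return (fixed, random, tau2) for study-level effect estimates.

    estimates: per-study effects on an additive scale (e.g. log hazard ratios)
    variances: per-study variances of those estimates
    """
    # Fixed effects: weight each study by the inverse of its variance.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)

    # Moment estimate of the between-study variance (DerSimonian-Laird).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau2 to each study variance before inverting.
    w_star = [1.0 / (v + tau2) for v in variances]
    random_effects = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return fixed, random_effects, tau2


# Hypothetical log hazard ratios and variances (illustrative only).
log_hrs = [-0.40, -0.10, -0.55, 0.05]
var_log_hrs = [0.010, 0.008, 0.020, 0.015]
fixed, random_effects, tau2 = pool(log_hrs, var_log_hrs)
print(f"fixed={fixed:.3f} random={random_effects:.3f} tau2={tau2:.4f}")
```

When the studies agree closely, the between-study variance estimate is zero and the two pooled results coincide. As it grows, the random effects weights become more equal across studies.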
Mixed treatment comparisons
AKA network meta analyses
• Simply filling in the gaps between trials
– Loved by the health economists who want ‘fully conditional’
estimates of each treatment included in their models
What makes an unbiased trial
• Randomisation
– Properly done and concealed
• Objective outcome measure
• Double blind
• Minimising loss to follow-up
• Intention to treat analysis
Intention to treat principle
• Analysis conducted according to the initial randomisation,
rather than the treatment received
– Preserves randomisation
Concealment of allocation
• CAPPP trial
– 10,985 patients in Nordic family practice with moderate
hypertension randomised to captopril or standard therapy
– Difference in baseline diastolic BP
• Control = 98.1 mm Hg (sd=10.1)
• Captopril = 99.8 mm Hg (sd=9.9)
• P = 6 × 10⁻¹⁹
CAPPP Study Group. Effect of angiotensin-converting-enzyme inhibition compared with conventional therapy on cardiovascular
morbidity and mortality in hypertension: the Captopril Prevention Project (CAPPP) randomised trial. Lancet 1999;353:611–6
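The size of that p-value can be checked roughly from the summary statistics on the slide. The sketch below applies a two-sample z-test. The group sizes are an assumption (an approximately equal split of the 10,985 patients), since the slide gives only the total, so the result should be read as an order-of-magnitude check.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic and two-sided p-value for a difference in means."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    # erfc keeps precision in the extreme tail, where 1 - cdf would round to 0.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Means and SDs from the slide; group sizes are ASSUMED (near-equal split).
z, p = two_sample_z(99.8, 9.9, 5492, 98.1, 10.1, 5493)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

Under these assumptions the p-value comes out at the order of 10⁻¹⁹, in line with the figure quoted. A baseline imbalance this extreme is not plausible under properly concealed random allocation.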
Loss to follow-up
• ASSENT-2
– Randomised 16,949 patients with acute myocardial infarction
– Primary outcome all-cause mortality at 30 days
– 6 patients missing!
ASSENT-2 Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction:
the ASSENT-2 double-blind randomised trial. Lancet 1999;354:716–22
What happens when you cannot undertake
a trial?
• Circumstances include:
– Question precludes randomisation
• E.g. Brain mass in pathology subjects with and without schizophrenia
– Immediacy of question
• Trials take years to conduct
– Internal vs. external validity?
How can you maximise internal and external
validity?
• Internal validity
= Gives the right answer for the subjects included
• External validity
= Gives an answer that you can generalise
Using propensity scores as an alternative
to randomisation in real world data sets
• Approach has been used very effectively
– Exploration of cancer outcomes in statin trials
– Comparing different antiarrhythmic drugs in atrial fibrillation
Smeeth L, Douglas I, Hall AJ, Hubbard R, Evans S. Effect of statins on a wide range of health outcomes: a cohort study validated by comparison
with randomized trials. Br J Clin Pharmacol 2008;67:99–109
Saksena S, Slee A, Waldo AL, Freemantle N, Reynolds M, Rosenberg Y, Rathod S, Grant S, Thomas E, Kuo E, Wyse DG. Cardiovascular
Outcomes in the AFFIRM Trial: An Assessment of Individual Antiarrhythmic Drug Therapies compared to Rate Control Using Propensity Score
Matched Analyses. JACC 2011;19:1975–85
Bridging from trials using observational
methods? A salutary tale
• Aldosterone inhibitors for Heart Failure
– 3 large trials confirming aldosterone inhibitors massively reduce
mortality in heart failure
– Trial populations highly (self-)selected and skewed towards
younger men without comorbidity…
– Real-world population skewed towards older women with comorbidities…
Pitt B, Zannad F, Remme WJ, Cody R, Castaigne A, Perez A, Palensky J, Wittes J. The effect of spironolactone on morbidity and mortality in patients with
severe heart failure. N Engl J Med 1999;341:709-17
Pitt B, Remme W, Zannad F, Neaton J, Martinez F, Roniker B, Bittman R, Hurley S, Kleiman J, Gatlin M. Eplerenone, a selective aldosterone blocker, in
patients with left ventricular dysfunction after myocardial infarction. N Engl J Med 2003;348:1309-21
Zannad F, McMurray JJV, Krum H, van Veldhuisen DJ, Swedberg K, Shi H, Vincent J, Pocock SJ, Pitt B. Eplerenone in patients with systolic heart failure and
mild symptoms. N Engl J Med 2011;364:11-21
Replicating the RALES trial
• We replicated, as best we could, a randomised trial
of aldosterone inhibitors in severe heart failure
– In The Health Improvement Network database (THIN)
– Matched cases and controls using a propensity score
• With the aim of then broadening the comparison to groups
not included in the randomised trials
Propensity score?
• Probability of a subject receiving a treatment / exposure
given their characteristics
• Derived from a statistical model
– E.g. ‘logistic regression’
• Matching those who did and did not receive treatment
of interest on propensity score
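The three steps above can be sketched end to end: fit a logistic regression for treatment, read off each subject's predicted probability, then pair treated and untreated subjects with similar scores. This is a minimal sketch on simulated data, not the analysis described in the talk. The single covariate, the gradient-ascent fit, the greedy 1:1 matching and the caliper of 0.05 are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, t, lr=0.5, n_iter=500):
    """Fit intercept + coefficients by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Predicted probability of treatment given covariates xi."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))


def match(scores, t, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    controls = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in (i for i, ti in enumerate(t) if ti == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Simulated data: one standardised covariate (a hypothetical severity score)
# that makes treatment more likely. Illustrative only.
rng = random.Random(1)
X = [[rng.gauss(0, 1)] for _ in range(300)]
t = [1 if rng.random() < sigmoid(1.5 * x[0]) else 0 for x in X]

beta = fit_logistic(X, t)
scores = [propensity(beta, xi) for xi in X]
pairs = match(scores, t)
print(f"coefficient={beta[1]:.2f}, matched pairs={len(pairs)}")
```

Matching balances only the characteristics that went into the score. Anything that drove the prescribing decision but was not recorded stays unbalanced.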
Methods
• Included:
– Patients with recent high-dose loop diuretics treatment
(≥80mg furosemide per day or equivalent) indicating congestion
Methods
• Excluded patients:
– Palliative care register
– Renal dysfunction
– Recent cancer or unstable angina, liver failure or a heart transplant
Methods
• Matched on propensity score
– Propensity score included a large number of indicators of patient
demography, comorbidities and treatments (prescribed drugs)
• Made two tightly matched groups of patients (n=4,412)
treated with and not treated with spironolactone
– Described their baseline characteristics
– Compared outcome between the groups
Propensity scores, confounding by indication
and other perils for the unwary in observational
research
Analysis: HR (95% CI), p, n, events
RALES 1999: 0.70 (0.60, 0.82), p<0.0001, n=1,663, events=670
Overall Propensity Score Matched Analysis: 1.32 (1.18, 1.47), p<0.0001, n=4,412, events=1,285
Propensity Score Quartile >75% to 100%: 1.20 (0.98, 1.47), p=0.085, n=1,103, events=369
Propensity Score Quartile >50% to ≤75%: 0.99 (0.81, 1.21), p=0.91, n=1,103, events=388
Propensity Score Quartile >25% to ≤50%: 1.46 (1.16, 1.83), p=0.001, n=1,103, events=298
Propensity Score Quartile ≤25%: 2.01 (1.54, 2.63), p<0.0001, n=1,103, events=230
[Forest plot; hazard ratio axis ticks at 0.5, 1, 2, 5]
What went wrong?
• Decision to prescribe spironolactone
– Not otherwise ignorable
– Confounding by indication / severity
– Analysis did not pass the validation step (luckily it had one)
– Others have taken observational studies at face value
• Aprotinin in cardiac surgery
• Human analogue insulin and cancer risk
Mangano DT, Tudor JC, Dietzel C. The Risk Associated with Aprotinin in Cardiac Surgery. N Engl J Med 2006;354:353–65
Take home messages for propensity score
based analyses of treatment effects
• Start from somewhere you know
– replicating an existing trial
• Examine the behaviour of the propensity score across its
range
– Test for interaction with exposure
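One rough way to examine behaviour across the range of the score, using only published summary figures, is a heterogeneity test on the quartile-specific hazard ratios reported in these slides. The sketch below recovers each standard error from the 95% confidence interval and computes Cochran's Q on the log scale. This approximates a formal interaction test within the survival model and is not the analysis the authors ran. The function names are illustrative.

```python
import math

# Quartile-specific results from the slides: (HR, lower 95% CI, upper 95% CI).
quartiles = {
    "<=25%": (2.01, 1.54, 2.63),
    ">25% to <=50%": (1.46, 1.16, 1.83),
    ">50% to <=75%": (0.99, 0.81, 1.21),
    ">75% to 100%": (1.20, 0.98, 1.47),
}


def log_hr_and_se(hr, lower, upper):
    """Log hazard ratio and its SE, recovered from a 95% confidence interval."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def cochran_q(results):
    """Cochran's Q and the inverse-variance pooled HR across strata."""
    ys, ws = [], []
    for hr, lower, upper in results:
        y, se = log_hr_and_se(hr, lower, upper)
        ys.append(y)
        ws.append(1.0 / se ** 2)
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
    return q, math.exp(pooled)


def chi2_sf_3df(x):
    """Upper tail probability of a chi-squared distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


q, pooled_hr = cochran_q(quartiles.values())
p_heterogeneity = chi2_sf_3df(q)  # four strata, so 3 degrees of freedom
print(f"Q = {q:.1f}, pooled HR = {pooled_hr:.2f}, p = {p_heterogeneity:.4f}")
```

A small heterogeneity p-value indicates that the apparent treatment effect changes across the propensity score range, which is a warning sign that matching has not removed confounding.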
A personal plea
• Nearly did not publish the spironolactone analysis
– Anxiety about doing harm
Comment
• Have described how randomised trials account for bias
– Maximising internal validity
• Have described how propensity scores work to address
bias
– Maximising external validity
• Have described the situation where propensity score
analyses will fail, and some approaches to address this