Replicating Randomised Trials of Treatments in Observational Settings Using Propensity Scores – Fisher’s Aphorisms
Nick Freemantle PhD
Professor of Clinical Epidemiology & Biostatistics

Assessing Causation

Aim
• Randomised trials
  – And how they work
• Observational studies
  – Propensity scores
• Make a few concluding comments

Randomised controlled trials

We have the duty of formulating, of summarising and of communicating our conclusions, in intelligible form, in recognition of the right of other free minds to utilise them in making their own decisions
Ronald Fisher

• “the simple act of randomisation assures the internal validity of the test for significance” 1935
Ronald Fisher
Fisher RA. The design of experiments. 8th ed. Edinburgh: Oliver & Boyd, 1966. p 21

Ronald Fisher 1890 – 1962
• Professor of Eugenics UCL 1933 to 1943
• After Rothamsted Experimental Station
• Analysis of variance
• Maximum likelihood estimation
• Sufficiency
• Ancillary statistics
• Fisher's linear discriminant
• Fisher information
• Fisher's z-distribution (F distribution)
• Fiducial inference
• First to use the term “Bayesian”
• Founded modern quantitative genetics

Randomisation
[Diagram: patient population → randomisation → intervention group and control group]

Randomised trials
• Two orthogonal (i.e. independent) explanations for any observed difference between experimental groups at end of trial
  – Play of chance (that was how the patients were allocated)
  – Treatment effects

Statistical analysis of randomised trials
• Statistical analysis tells us the likelihood that the observed difference (or a larger difference) occurred by chance alone
  – Implausible?
  – Then it must be the experimental treatment!
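The logic of the last slide can be sketched as a randomisation (permutation) test: if allocation was by chance alone, reshuffling the group labels shows how often a difference at least as large as the observed one would arise. This is a minimal illustration, not the analysis of any trial cited here; the function name, the outcome values and the number of reshuffles are all assumptions made for the example.

```python
import random
from statistics import mean


def randomisation_test(treated, control, n_shuffles=10_000, seed=0):
    """Two-sided randomisation test for a difference in means.

    Returns the observed difference and the estimated probability of a
    difference at least that large arising from chance allocation alone.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_shuffles):
        # Re-allocate patients to groups purely by chance.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction counts the observed allocation itself,
    # so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_shuffles + 1)


# Hypothetical outcome measurements, invented for illustration only.
intervention = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
control = [3.9, 4.1, 4.4, 3.6, 4.0, 4.3]
difference, p_value = randomisation_test(intervention, control)
print(f"difference = {difference:.2f}, p = {p_value:.4f}")
```

A small p-value here says only that chance allocation is an implausible explanation; attributing the difference to treatment relies on randomisation having removed the other explanations.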

Meta analysis
• Simply the weighted average of the results of randomised trials
  – Weighted by the inverse of the study variance (fixed effects)
  – Weighted by the inverse of the sum of the study variance and a (moment-based) component for between-study variability in treatment effects (random effects)

Mixed treatment comparisons AKA network meta analyses
• Simply filling in the gaps between trials
  – Loved by the health economists who want ‘fully conditional’ estimates of each treatment included in their models

What makes an unbiased trial
• Randomisation
  – Properly done and concealed
• Objective outcome measure
• Double blind
• Minimising loss to follow-up
• Intention to treat analysis

Intention to treat principle
• Analysis conducted according to the initial randomisation, rather than the treatment received
  – Preserves randomisation

Concealment of allocation
• CAPPP trial
  – 10,985 patients in Nordic family practice with moderate hypertension randomised to captopril or standard therapy
  – Difference in baseline diastolic BP
    • Control = 98.1 mm Hg (sd=10.1)
    • Captopril = 99.8 mm Hg (sd=9.9)
    • P = 6 × 10^-19
CAPPP Study Group. Effect of angiotensin-converting-enzyme inhibition compared with conventional therapy on cardiovascular morbidity and mortality in hypertension: the Captopril Prevention Project (CAPPP) randomised trial. Lancet 1999;353:611–6

Loss to follow-up
• ASSENT-2
  – Randomised 16,949 patients with acute myocardial infarction
  – Primary outcome all-cause mortality at 30 days
  – 6 patients missing!
ASSENT-2 Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the ASSENT-2 double-blind randomised trial. Lancet 1999;354:716–22

What happens when you cannot undertake a trial?
• Circumstances include:
  – Question precludes randomisation
    • E.g. Brain mass in pathology subjects with and without schizophrenia
  – Immediacy of question
    • Trials take years to conduct
  – Internal vs. external validity?
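Returning briefly to the meta analysis slide above, the two weighting schemes can be sketched in a few lines. The fixed effects function is the plain inverse-variance average. The random effects function assumes the DerSimonian-Laird moment estimator for the between-study variance, which is one common choice and is not named on the slide. Function names and any inputs are hypothetical.

```python
def fixed_effect(estimates, variances):
    """Inverse-variance weighted average; returns (pooled estimate, its variance)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def random_effects(estimates, variances):
    """Random effects pooling with a moment estimate of between-study variance.

    Assumes the DerSimonian-Laird estimator; returns
    (pooled estimate, its variance, tau squared).
    """
    weights = [1.0 / v for v in variances]
    pooled_fixed, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed effects estimate.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, estimates))
    scale = sum(weights) - sum(w * w for w in weights) / sum(weights)
    degrees = len(estimates) - 1
    # Truncate at zero; with a single study the scale is zero, so tau2 is set to 0.
    tau2 = max(0.0, (q - degrees) / scale) if scale > 0 else 0.0
    # Each study's weight is the inverse of (within-study + between-study variance).
    weights_re = [1.0 / (v + tau2) for v in variances]
    total = sum(weights_re)
    pooled = sum(w * y for w, y in zip(weights_re, estimates)) / total
    return pooled, 1.0 / total, tau2
```

Estimates would normally be on an additive scale such as log hazard ratios or log odds ratios, with variances taken from the squared standard errors reported by each trial.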

How can you maximise internal and external validity?
• Internal validity = Gives the right answer for the subjects included
• External validity = Gives an answer that you can generalise

Using propensity scores as an alternative to randomisation in real world data sets
• Approach has been used very effectively
  – Exploration of cancer outcomes in statin trials
  – Comparing different antiarrhythmic drugs in atrial fibrillation
Smeeth L, Douglas I, Hall AJ, Hubbard R, Evans S. Effect of statins on a wide range of health outcomes: a cohort study validated by comparison with randomized trials. Br J Clin Pharmacol 2008;67:99–109
Saksena S, Slee A, Waldo AL, Freemantle N, Reynolds M, Rosenberg Y, Rathod S, Grant S, Thomas E, Kuo E, Wyse DG. Cardiovascular Outcomes in the AFFIRM Trial: An Assessment of Individual Antiarrhythmic Drug Therapies compared to Rate Control Using Propensity Score Matched Analyses. JACC 2011;19:1975–85

Bridging from trials using observational methods? A salutary tale
• Aldosterone inhibitors for Heart Failure
  – 3 large trials confirming aldosterone inhibitors massively reduce mortality in heart failure
  – Population in trials highly (self) selective and prejudiced towards younger men without comorbidity…
  – Real world prejudiced towards older women with comorbidities…
Pitt B, Zannad F, Remme WJ, Cody R, Castaigne A, Perez A, Palensky J, Wittes J. The effect of spironolactone on morbidity and mortality in patients with severe heart failure. N Engl J Med 1999;341:709–17
Pitt B, Remme W, Zannad F, Neaton J, Martinez F, Roniker B, Bittman R, Hurley S, Kleiman J, Gatlin M. Eplerenone, a selective aldosterone blocker, in patients with left ventricular dysfunction after myocardial infarction. N Engl J Med 2003;348:1309–21
Zannad F, McMurray JJV, Krum H, van Veldhuisen DJ, Swedberg K, Shi H, Vincent J, Pocock SJ, Pitt B. Eplerenone in patients with systolic heart failure and mild symptoms.
N Engl J Med 2011;364:11–21

Replicating the RALES trial
• We replicated, as best we could, a randomised trial of aldosterone inhibitors in severe heart failure
  – In The Health Improvement Network database (THIN)
  – Matched cases and controls using a propensity score
• With the aim of then broadening the comparison to groups not included in the randomised trials

Propensity score?
• Likelihood of a subject receiving a treatment / exposure given their characteristics
• Derived from a statistical model
  – E.g. ‘logistic regression’
• Matching those who did and did not receive treatment of interest on propensity score

Methods
• Included:
  – Patients with recent high-dose loop diuretics treatment (≥80mg furosemide per day or equivalent) indicating congestion

Methods
• Excluded patients:
  – Palliative care register
  – Renal dysfunction
  – Recent cancer or unstable angina, liver failure or a heart transplant

Methods
• Matched on propensity score
  – Propensity score included a large number of indicators of patient demography, comorbidities and treatments (prescribed drugs)
• Made two tightly matched groups of patients (n=4,412) treated with and not treated with spironolactone
  – Described their baseline characteristics
  – Compared outcome between the groups

Propensity scores, confounding by indication and other perils for the unwary in observational research
[Forest plot; hazard ratio axis marked at 0.5, 1, 2 and 5]
Analysis: HR (95% CI), p, n, events
• RALES 1999: 0.70 (0.60, 0.82), p<0.0001, n=1,663, events=670
• Overall Propensity Score Matched Analysis: 1.32 (1.18, 1.47), p<0.0001, n=4,412, events=1,285
• Propensity Score Quartile >75% to 100%: 1.20 (0.98, 1.47), p=0.085, n=1,103, events=369
• Propensity Score Quartile >50% to ≤75%: 0.99 (0.81, 1.21), p=0.91, n=1,103, events=388
• Propensity Score Quartile >25% to ≤50%: 1.46 (1.16, 1.83), p=0.001, n=1,103, events=298
• Propensity Score Quartile ≤25%: 2.01 (1.54, 2.63), p<0.0001, n=1,103, events=230

What went wrong?
• Decision to prescribe spironolactone
  – Not otherwise ignorable
  – Confounding by indication / severity
  – Analysis did not pass the validation step (luckily it had one)
  – Others have taken observational studies at face value
    • Aprotinin in cardiac surgery
    • Human analogue insulin and cancer risk
Mangano DT, Tudor JC, Dietzel C. The Risk Associated with Aprotinin in Cardiac Surgery. N Engl J Med 2006;354:353–65

Take home messages for propensity score based analyses of treatment effects
• Start from somewhere you know – replicating an existing trial
• Examine the behaviour of the propensity score across its range
  – Test for interaction with exposure

A personal plea
• Nearly did not publish the spironolactone analysis
  – Anxiety about doing harm

Comment
• Have described how randomised trials account for bias
  – Maximising internal validity
• Have described how propensity scores work to address bias
  – Maximising external validity
• Have described the situation where propensity score analyses will fail, and some approaches to address this
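The workflow described in the talk (estimate a propensity score from a logistic model, match treated and untreated patients on it, then examine the treatment effect across the range of the score) can be sketched as below. This is an illustration under loudly stated assumptions, not the THIN analysis: the cohort is synthetic with a single "severity" covariate, the logistic model is fitted by plain gradient ascent, matching is greedy 1:1 within an assumed caliper of 0.05, and the effect is summarised as a risk difference rather than a hazard ratio. All names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propensity(beta, row):
    """Estimated probability of treatment given the covariates in `row`."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def fit_propensity(X, treated, steps=300, lr=0.5):
    """Logistic regression by batch gradient ascent; returns [intercept, *slopes]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, treated):
            resid = t - propensity(beta, row)
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def match_pairs(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = [i for i, t in enumerate(treated) if not t]
    pairs = []
    for i in (i for i, t in enumerate(treated) if t):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def risk_difference_by_quartile(pairs, scores, event):
    """Treated-minus-control event proportion within quartiles of the score."""
    ordered = sorted(pairs, key=lambda pair: scores[pair[0]])
    size = len(ordered) / 4
    result = []
    for q in range(4):
        chunk = ordered[round(q * size):round((q + 1) * size)]
        diffs = [event[i] - event[j] for i, j in chunk]
        result.append(sum(diffs) / len(diffs) if diffs else None)
    return result


# Synthetic cohort (an assumption): sicker patients are more likely to be
# treated and more likely to die; treatment itself has no effect on outcome.
rng = random.Random(1)
X = [[rng.uniform(-2, 2)] for _ in range(300)]
treated = [int(rng.random() < sigmoid(1.5 * row[0])) for row in X]
event = [int(rng.random() < sigmoid(row[0] - 1)) for row in X]

beta = fit_propensity(X, treated)
scores = [propensity(beta, row) for row in X]
pairs = match_pairs(scores, treated)
print(len(pairs), risk_difference_by_quartile(pairs, scores, event))
```

In this toy cohort the only confounder is measured and included in the score, so matching can work. In the spironolactone analysis the decision to prescribe depended on things the score did not capture, which is the situation in which effect estimates drift across the quartiles of the score.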