Trial Objectives: Superiority, Non-inferiority, and Equivalence

Questions of Interest
• Is the new treatment better than the control treatment that I am using now? (superiority trial)
• If it is not better, is the new treatment as good as (not unacceptably inferior to) the control treatment that I am using now? (non-inferiority trial)
• Can I use the new treatment and the control treatment interchangeably? (equivalence trial)
Non-inferiority and equivalence trials are usually considered when there is an active control.

Definitions (ICH Guidelines, E9)
• Superiority trial: a trial with the primary objective of showing that the response to the investigational product is superior to a comparative agent (active or placebo control).
• Equivalence trial: a trial with the primary objective of showing that the responses to two or more treatments differ by an amount that is clinically unimportant (active control).
• Non-inferiority trial: a trial with the primary objective of showing that the response to the investigational product is not clinically inferior (or not unacceptably inferior) to a comparative agent (active or placebo control, but usually active). Very common in the regulatory setting, either for a new treatment or for a new label indication.
FDA Guidance
• “The objective of a non-inferiority trial is to show that any difference in the effectiveness of the two drugs is small enough to allow a conclusion that the new drug is not substantially less effective than the active control.”
• “FDA considers the selection of a non-inferiority margin to be the single greatest challenge in designing, conducting, and interpreting non-inferiority trials…If a non-inferiority margin is incorrectly calculated and set too large, a drug that is not effective may appear to be effective; if the margin is too small, an effective drug may appear ineffective.”
GAO-1-798

Evidence from Clinical Trials
Reasons for Active Controls
• An active treatment (comparator) with established efficacy exists.
• If superiority can be established, the standard of care is improved.
• While a short-term study with a placebo control might be ethical, a placebo-controlled trial with a morbidity/mortality outcome is not ethical if an accepted standard-of-care treatment exists (recall papers by Temple and Ellenberg).

The Number and Type of Active Comparator Studies Vary by Sponsor (Commercial versus Non-Commercial)
• Among published reports of trials between June 2008 and September 2009 in major medical journals, 97/212 (46%) used an active comparator.
• By sponsor: 36/108 (33%) with commercial sponsors and 61/104 (59%) with non-commercial sponsors.
• 18/36 (50%) of active-controlled commercial trials were non-inferiority trials versus 5/61 (8%) of non-commercial trials.
JAMA 2010; 303:951-958

Examples: Non-Inferiority (1)
• Safety: Is a new vaccine for pertussis (whooping cough) that has an improved safety profile as effective in preventing whooping cough as the currently licensed vaccine?
• Ease of use: Is a new oral anticoagulant noninferior to warfarin for stroke and systemic embolism among patients with atrial fibrillation?
(N Engl J Med 2011)

Examples: Non-Inferiority (2)
• Treatment duration: Is a short course of treatment for latent TB infection (3 months of INH plus rifapentine) as effective as 9 months of INH in preventing active TB? (N Engl J Med 2011)
• Cost: Is bevacizumab, an inexpensive alternative to ranibizumab, noninferior for visual acuity among patients with age-related macular degeneration? (N Engl J Med 2011)

Example: HIV Trial of Abacavir-Lamivudine-Zidovudine vs Indinavir-Lamivudine-Zidovudine
JAMA 2001;285:1155-1163.
“The study was powered to assess treatment equivalence for the primary endpoint (i.e., a plasma HIV RNA level <= 400 copies/mL at week 48 for the intent-to-treat population). For the primary end point, treatments were considered equivalent if the 95% confidence interval was within the bound -12% to 12%.”

Motivation for Non-Inferiority and Equivalence Trials: Evaluating New Treatments
New Treatment
• Costs less
• More convenient to use (e.g., short course of prophylaxis for TB, no blood tests as for warfarin)
• Lower risk of side effects (e.g., pertussis vaccine)
But is it as effective?

Active and Placebo Controls in One Trial
(Usually a concurrent placebo arm is absent, but it may be practical in some short-term studies)
[Diagram: randomization to Drug A (control), Drug B (experimental), and placebo, with comparisons labeled superiority and non-inferiority]

Effect of Hypericum perforatum (St. John’s Wort) in Major Depressive Disorder
[Diagram: randomization to sertraline (active control), St. John’s Wort (experimental), and placebo (control)]
Neither sertraline nor St. John’s Wort was significantly different from placebo in this 8-week study. The authors noted “without a placebo, hypericum could easily have been considered as effective as sertraline…” JAMA 2002; 287:1807-1814.
In the absence of a concurrent placebo, the trial has to provide assurance that the active control would have been superior to placebo, had placebo been used, and that the test treatment would have beaten placebo, had it been used (indirect inference).
Non-inferiority or Equivalence Trials: Key Features
• Efficacy of reference or control treatment (anchor) must be clearly established (control is better than nothing).
• Target population and outcome measures must be similar to the trial that established efficacy of control (constancy assumption).
• Margin of non-inferiority/equivalence must be stated a priori, clinically relevant, and chosen to ensure new treatment is better than “imputed” nothing (non-inferiority margin).

Assay Sensitivity and Constancy are Critical Assumptions in Interpreting Non-inferiority and Equivalence Trials
Assay Sensitivity (def.): ability to demonstrate a difference between active and inactive treatments
• Can you assume that the standard treatment (active control) is effective?
• How do you tell the difference between a good trial that establishes two active treatments to be similarly effective and a bad trial that incorrectly claims similarity?
– External evidence: historical data that the control treatment is effective
– Internal evidence: a high quality trial
Constancy (def.)
• Historical data showing that the control treatment is effective (better than placebo) hold in the setting of the current noninferiority trial
Hung and O’Neill, Encyclopedia of Clinical Trials

Historical Evidence Concerning Efficacy of Active Control and Defining the Non-Inferiority or Equivalence Margin
• One trial
• Meta-analysis or overview of trials (need to be cognizant of “file-drawer” problem)
• Point estimate or lower bound of 95% CI
• Retention of certain fraction of superiority of active control over placebo (e.g., 50%)
– True probabilities of event for active control and placebo are 20% and 30%
– Show probability of event with new treatment is smaller than 25% (a difference, or non-inferiority margin, between new treatment and active control of 5%)
Would like to convince people that if you had used placebo you would have won!
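The retention arithmetic in the last bullet can be sketched in a few lines of Python. This is only an illustration: the function name `retention_margin` and its arguments are hypothetical (not from the slides), and it assumes the event probabilities for active control and placebo are known exactly, which the slides caution is rarely the case.

```python
def retention_margin(p_control, p_placebo, retain=0.5):
    """Risk-difference margin that retains a fraction of the control effect.

    p_control, p_placebo: assumed true event probabilities (lower is better).
    retain: fraction of the control-vs-placebo benefit the new treatment must keep.
    Returns (margin, threshold), where threshold is the largest acceptable
    event probability for the new treatment.
    """
    if not 0.0 < retain <= 1.0:
        raise ValueError("retain must be in (0, 1]")
    effect = p_placebo - p_control      # benefit of active control over placebo
    margin = (1.0 - retain) * effect    # how much of that benefit may be lost
    threshold = p_control + margin      # new treatment must stay below this
    return margin, threshold


# Slide example: 20% (active control) vs 30% (placebo), retain 50% of the effect.
margin, threshold = retention_margin(0.20, 0.30, retain=0.5)
print(f"margin = {margin:.2f}; new-treatment event probability must be < {threshold:.2f}")
```

Using the lower confidence bound of the control effect instead of the point estimate would simply mean passing a more conservative `p_placebo - p_control` difference into the same calculation.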
General Problems in Determining Non-Inferiority Margin
• What is “unacceptably inferior” or an acceptable level of non-inferiority is often in the eyes of the beholder!
• Multiple outcomes are at play: non-inferiority margins are typically defined for the primary endpoint, but many outcomes may be considered.
• Constancy assumption: same endpoint and duration of follow-up as the trial(s) that established efficacy of active control.
• The margin assumes we know the “true” effect of the active control, and often there is substantial variability.
• In some cases, there are multiple choices for active control.

How do you prove two treatments are equal?
Cannot prove H0: Δ = 0
“It is never correct to claim that treatments have no effect or that there is no difference in the effects of treatments. It is impossible to prove … that two treatments have the same effect. There will always be uncertainty surrounding estimates of treatment effects, and a small difference can never be excluded… An analysis of 45 reports of trials purporting to test equivalence found that only a quarter set boundaries on their equivalence.”
The non-inferiority/equivalence margin must be specified in the protocol!
Alderson P, Chalmers I. BMJ 2003:326:1691-8.

Relationship Between Significance Tests and Confidence Intervals
[Figure: confidence intervals for the treatment difference on an axis from “control better” through 0 to “new agent better”: superiority strongly shown (p=0.002), superiority shown (p=0.05), superiority not shown (p=0.20)]

Superiority Trial: ALLHAT, Lisinopril vs Chlorthalidone for CHD Incidence, CVD Composite Outcome, and ESRD*
[Figure: HR (lisinopril/chlorthalidone) and 95% CI on an axis from “lisinopril better” through 1.00 to “chlorthalidone better”]
• CHD (95% CI: 0.91-1.08)
• CVD composite (95% CI: 1.05-1.16)
• ESRD (95% CI: 0.88-1.38)
* In ALLHAT, 15,255 participants were randomized to chlorthalidone and 9,000+ participants were randomized to each of 3 other treatments. JAMA 2002;288:2981-2997.
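The link between a 95% CI and a two-sided test at the 0.05 level can be illustrated with the ALLHAT intervals quoted above. The sketch below is an approximation under stated assumptions, not the trial's reported analysis: it assumes each interval is symmetric on the log hazard-ratio scale and takes the point estimate as the geometric mean of the limits. The function name `approx_p_from_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_from_ratio_ci(lower, upper, level=0.95):
    """Approximate two-sided p-value for H0: ratio = 1 from a ratio CI.

    Assumes the CI is symmetric on the log scale (an assumption, not a
    property guaranteed by the source).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    log_estimate = (math.log(upper) + math.log(lower)) / 2.0
    z = log_estimate / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# 95% CIs for HR (lisinopril/chlorthalidone) as quoted on the slide.
allhat = {
    "CHD": (0.91, 1.08),
    "CVD composite": (1.05, 1.16),
    "ESRD": (0.88, 1.38),
}

for outcome, (lo, hi) in allhat.items():
    excludes_one = lo > 1.0 or hi < 1.0
    p = approx_p_from_ratio_ci(lo, hi)
    print(f"{outcome}: CI excludes 1.0 = {excludes_one}, approximate p = {p:.4f}")
```

Under these assumptions, an interval that excludes 1.0 always corresponds to an approximate p below 0.05, and an interval that includes 1.0 to a p above 0.05, which is the correspondence the figure conveys.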
Interpretation of Head to Head (Equivalence) Trials: CONVINCE and CAPPP
[Figure: HR (verapamil/SOC for CONVINCE; captopril/SOC for CAPPP) on an axis from “calcium channel blocker better” through 1.00 to “SOC better”, showing the CONVINCE equivalence bounds (0.86-1.16), the CONVINCE trial result, the CAPPP trial result, and an overview of 9 trials]
CAPPP = Captopril Primary Prevention Project. Authors concluded: “captopril and conventional treatment did not differ in efficacy.”
See JAMA 2003;289:2073-2082 for the CONVINCE trial.

Example: 2NN Study
• A study of first-line antiretroviral therapy in HIV
• Main comparison between nevirapine twice daily and efavirenz (plus stavudine and lamivudine) in terms of ‘treatment failure’ (based on virology, disease progression, therapy change)
• Primary objective was to establish the non-inferiority of nevirapine twice daily (δ = 10%)
Lancet 2004, 363:1253-63

Results: 2NN Study
• Confidence intervals for the difference in failure rates (EFV-NVP)
– All data: (-12.8%, 0.9%)
– Those starting medication: (-14.6%, -0.8%)
• Neither interval is completely above the δ value of -10%; one interval also excludes zero.

Conclusions: 2NN Study
• BUT, the authors concluded: ‘Antiviral therapy with nevirapine or efavirenz showed similar efficacy, so triple-drug regimens with either … are valid for first-line treatment’
Lancet 2004, 363:1253-63

Interpretation of Non-Inferiority Trials: 6 Examples (A-F): Hazard ratio (Test Drug/Standard) and 95% CI
[Figure: six confidence intervals, A to F, on a hazard-ratio axis from 0.6 (“test drug better”) to 1.4 (“standard drug better”), with a zone of noninferiority bounded by the estimated benefit of the standard drug over placebo; regions labeled superiority, noninferiority (i.e., equivalence), inferiority, and underpowered trial]
Antman EM, Circulation 2001;103:e101-e104.
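Reading a confidence interval against a margin, as in the 2NN results above, can be written as a small decision rule. This is a minimal sketch under stated assumptions: the function name `assess_ci` is hypothetical, the difference is oriented so that positive values favor the new treatment (here EFV minus NVP failure rates, with nevirapine as the treatment under test), and δ is a positive margin in the same units.

```python
def assess_ci(ci_lower, ci_upper, delta):
    """Classify a CI for (control - new) failure rates against margin delta.

    Positive differences favor the new treatment. Returns three flags that
    are not mutually exclusive: a CI lying between -delta and 0 is both
    'noninferior' and 'inferior'.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if delta <= 0:
        raise ValueError("delta must be positive")
    return {
        "noninferior": ci_lower > -delta,  # whole CI above -delta
        "superior": ci_lower > 0.0,        # whole CI above zero
        "inferior": ci_upper < 0.0,        # whole CI below zero
    }


# 2NN intervals quoted on the slide (percentage points), delta = 10.
all_data = assess_ci(-12.8, 0.9, delta=10.0)
started_medication = assess_ci(-14.6, -0.8, delta=10.0)
print("All data:", all_data)
print("Those starting medication:", started_medication)
```

With these inputs neither interval satisfies the non-inferiority criterion, and the second interval also lies entirely below zero, which is why the slides question the authors' conclusion of similar efficacy.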
Interpretation of Non-Inferiority Trials: 6 Examples (A-F) (Hazard ratio and 95% CI)
• A = Test drug is superior to standard
• B = Test drug is better than standard and can be considered “non-inferior” to standard
• C = Test drug is worse than standard but not that much worse, and can be considered “non-inferior” to standard
• D = Test drug is inferior to standard and non-inferiority criteria not satisfied
• E = Test drug is very inferior to standard (non-inferiority criteria not satisfied)
• F = Trial is inconclusive due to small size and resultant wide CI

Possible Reasons for Non-Significant Difference
• Small sample size
• Poor compliance with study treatments
• Losses to follow-up
• Equivalent regimens
Absence of proof of a treatment difference does not constitute proof of an absence of a treatment difference.

Non-Inferiority and Equivalence Trials Considerations
• Cannot prove Pe = Pc or µ1 = µ2; therefore testing H0: δ < 0 versus HA: δ > 0 is not correct, because a small, underpowered study could incorrectly lead to a claim of equivalence (absence of evidence is not evidence of absence), and if power is too high, H0 may be rejected when the difference is not important.
• Since H0 cannot be accepted, either reverse the roles of type 1 and 2 errors (i.e., rejection of H0 implies equivalence) or focus on confidence intervals
• Treatment difference must be chosen not only to rule out the smallest clinically meaningful difference, but also to be sure the new treatment is better than no treatment
• Consensus on what equivalence means, especially in a broad sense, is hard to achieve

1-Sided Hypothesis Testing (Non-inferiority)
A = new treatment; B = standard; $P_A$ and $P_B$ = event rates (failure rates)
$\Delta = P_A - P_B$; $\Delta > 0$ implies standard is better (lower values are better for new treatment)
$H_0: \Delta \ge \delta_0$ (B better by at least $\delta_0$)
$H_A: \Delta < \delta_0$ (A not worse by as much as $\delta_0$; A is close to B)
If $H_0$ is rejected, treatments are “equivalent”. Roles of null and alternative hypotheses are reversed. In practice, this is confusing to people.
Blackwelder W, Cont Clin Trials 1982

Parallel Group Studies with Continuous Outcomes: Sample Size Formula is the Same Except for $\delta_0$
$\Delta = \mu_A - \mu_B$
$$n = n_A = n_B = \frac{2\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\delta_0 - \Delta)^2}$$
$\alpha = 0.025$: $z_{1-\alpha} = 1.96$; $1-\beta = 0.90$: $z_{1-\beta} = 1.28$
$$n_A = n_B = \frac{2\sigma^2 (10.5)}{(\delta_0 - \Delta)^2}$$
Note: If $\Delta = 0$, then this is equivalent to a superiority trial to detect $\delta_0$ with 90% power.

Example Non-Inferiority Trial for New BP Lowering Drug
$\delta_0$ = 4 mmHg; $\Delta$ = 0, -2 (A better) and +2 (B better); $\sigma^2$ = 100; $\alpha$ = 0.025 (1-sided); $1-\beta$ = 0.90; 1:1 allocation

δ0    Δ     No. per group
4     0     132
4     +2    525
4     -2    58

Confidence Interval Approach: Example of Type I Error
[Figure: $(1-2\alpha)$ CI around the estimate $\hat{\Delta}$ on an axis from “A (new treatment better)” through 0 to “B (standard treatment better)”, with $\delta_0$ marked]
Type I error = Prob(incorrectly rejecting null hypothesis)
In this case: incorrectly claiming “equivalence” when the treatments are not equivalent (reverse of usual situation).
Upper limit of $(1-2\alpha)$ CI $< \delta_0$, but $\Delta \ge \delta_0$.
We want to reject $H_0$ when $\Delta < \delta_0$, not when $\Delta \ge \delta_0$ ($H_0$ is true).

Confidence Interval Approach: Example of Type II Error
[Figure: $(1-2\alpha)$ CI around the estimate $\hat{\Delta}$ on an axis from “A (new treatment better)” through 0 to “B (standard treatment better)”, with $\delta_0$ marked]
Type II error = Prob(incorrectly accepting null hypothesis)
In this case: incorrectly claiming the treatments are not equivalent when they are (also the reverse of usual situation).
Upper limit of $(1-2\alpha)$ CI $> \delta_0$, but $\Delta < \delta_0$.
We want to reject $H_0$ when $\Delta < \delta_0$, not accept it.

Sample Size for Equivalence Design Based on CI Limits
A = New Treatment; B = Standard
$$\beta = \text{Prob}(\text{upper limit of CI exceeds } \delta_0 \text{ when } \Delta < \delta_0) = \text{Prob}\left((\hat{P}_A - \hat{P}_B) + Z_{1-\alpha}\,\sigma > \delta_0\right)$$
$$= \text{Prob}\left(\frac{(\hat{P}_A - \hat{P}_B) - (P_A - P_B)}{\sigma} > \frac{\delta_0 - (P_A - P_B)}{\sigma} - Z_{1-\alpha}\right)$$
$$\sigma^2 = \frac{P_A(1-P_A)}{N_A} + \frac{P_B(1-P_B)}{N_B}$$

Sample Size for Equivalence Design Based on CI Limits (cont.)
A = New Treatment; B = Standard
Choose $N_A$ and $N_B$ to ensure $\beta$ is small.
$$N_A = N_B = N = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2 \left(P_A(1-P_A) + P_B(1-P_B)\right)}{\left(\delta_0 - (P_A - P_B)\right)^2}$$
$$N = \frac{2P(1-P)(Z_{1-\alpha} + Z_{1-\beta})^2}{\delta_0^2} \quad \text{if } P_A = P_B = P$$
Makuch and Simon (Cancer Treatment Reports, 1978) suggest $\alpha$ = 0.10 (1-sided) and $\beta$ = 0.20; I like $\alpha$ = .05 (and usually 2-sided)

For Proportions and Relative Risks, Farrington and Manning’s Approach is Better
• Problem arises because of estimation of variance under the null hypothesis.
• Farrington and Manning (Stat Med 1990) have shown that their maximum likelihood approach is better, particularly for small values of pc and pe.
• Algorithm can be easily programmed.
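The sample size formulas above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the slides' own program: the function names are hypothetical, allocation is 1:1, exact normal quantiles are used instead of the rounded 1.96 and 1.28, and the restricted maximum likelihood step uses the closed-form cubic solution as recalled from Farrington and Manning (1990) for the risk difference, so small differences from tabulated values are to be expected.

```python
import math
from statistics import NormalDist


def _z(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)


def n_continuous(sigma2, delta0, diff=0.0, alpha=0.025, power=0.90):
    """Per-group n: 2*sigma^2*(z_{1-a} + z_{1-b})^2 / (delta0 - diff)^2."""
    z_sum = _z(1.0 - alpha) + _z(power)
    return 2.0 * sigma2 * z_sum ** 2 / (delta0 - diff) ** 2


def n_makuch_simon(p_a, p_b, delta0, alpha=0.025, power=0.90):
    """Per-group n using the variance at the assumed true proportions."""
    z_sum = _z(1.0 - alpha) + _z(power)
    variance = p_a * (1.0 - p_a) + p_b * (1.0 - p_b)
    return variance * z_sum ** 2 / (delta0 - (p_a - p_b)) ** 2


def restricted_mle(p_a, p_b, delta0):
    """Proportions constrained to differ by delta0 (null boundary), 1:1 allocation."""
    theta = 1.0  # allocation ratio n_B / n_A (assumed equal groups)
    a = 1.0 + theta
    b = -(1.0 + theta + p_a + theta * p_b + delta0 * (theta + 2.0))
    c = delta0 ** 2 + delta0 * (2.0 * p_a + theta + 1.0) + p_a + theta * p_b
    d = -p_a * delta0 * (1.0 + delta0)
    v = b ** 3 / (27.0 * a ** 3) - b * c / (6.0 * a ** 2) + d / (2.0 * a)
    u = math.copysign(math.sqrt(b ** 2 / (9.0 * a ** 2) - c / (3.0 * a)), v)
    ratio = max(-1.0, min(1.0, v / u ** 3))  # guard against rounding outside [-1, 1]
    w = (math.pi + math.acos(ratio)) / 3.0
    pa_null = 2.0 * u * math.cos(w) - b / (3.0 * a)
    return pa_null, pa_null - delta0


def n_farrington_manning(p_a, p_b, delta0, alpha=0.025, power=0.90):
    """Per-group n with the null variance taken at the restricted MLEs."""
    pa_null, pb_null = restricted_mle(p_a, p_b, delta0)
    sd_null = math.sqrt(pa_null * (1.0 - pa_null) + pb_null * (1.0 - pb_null))
    sd_alt = math.sqrt(p_a * (1.0 - p_a) + p_b * (1.0 - p_b))
    numerator = _z(1.0 - alpha) * sd_null + _z(power) * sd_alt
    return (numerator / (delta0 - (p_a - p_b))) ** 2


# Blood pressure example: sigma^2 = 100, delta0 = 4 mmHg.
for diff in (0.0, 2.0, -2.0):
    print(f"continuous, diff = {diff:+.0f}: n per group ~ {n_continuous(100.0, 4.0, diff):.1f}")

# Proportions with P_A = P_B = 0.10 and delta0 = 0.05.
print(f"Makuch-Simon: {n_makuch_simon(0.10, 0.10, 0.05):.1f}")
print(f"Farrington-Manning: {n_farrington_manning(0.10, 0.10, 0.05):.1f}")
```

The functions return unrounded values so the effect of rounding conventions stays visible; in practice the per-group size would be rounded up.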
Stat Med 1990; 9:1447-1454

Sample Size for Proportions for Non-Inferiority Trial: Makuch and Simon versus Farrington and Manning (PA = PB)*
Sample Size per Group

PA(PE)   PB(PC)   δ0     Makuch and Simon   Farrington and Manning
0.05     0.05     0.01   9,972              10,032
0.10     0.10     0.05   756                775
0.15     0.15     0.05   1,071              1,080
0.20     0.20     0.05   1,344              1,348
0.20     0.20     0.10   336                340

* α = 0.025 (1-sided); 1-β = 0.90; 1:1 allocation

Sample Size for Proportions for Non-Inferiority Trial: Makuch and Simon versus Farrington and Manning (PA = or ≠ PB)*
Sample Size per Group

PA(PE)   PB(PC)   δ0     Makuch and Simon   Farrington and Manning
0.10     0.10     0.05   756                775
0.125    0.10     0.05   3,343              3,379
0.10     0.125    0.05   371                384

* α = 0.025 (1-sided); 1-β = 0.90; 1:1 allocation

Sample Size for Proportions: Superiority Trial with Specified Delta or Non-Inferiority with Farrington and Manning (1:1 allocation and 1-β = 0.90)
Sample Size per Group

PA(PE)   PB(PC)   δ0     Superiority*   Farrington and Manning** (1st)   Farrington and Manning** (2nd)
0.05     0.05     0.01   9,021          10,032                           8,174
0.10     0.10     0.05   581            775                              630
0.15     0.15     0.05   917            1,080                            880
0.20     0.20     0.05   1,211          1,349                            1,099
0.20     0.20     0.10   266            340                              277

* α = 0.05 (2-sided); PE = PC - δ0
** α = 0.025 (1-sided) in 1st column; α = 0.05 (1-sided) in 2nd column

General Approach
[Figure: relative risk (RR) axis from “new treatment better” to “standard treatment better”, showing the (1 - 2α) CI relative to RRo; α = .025 corresponds to a 95% CI]
• RRo chosen so that if upper limit < RRo, we conclude “equivalence”
• RRo usually ≠ 1.0

CONVINCE Design
• Based on the findings from 17 trials with over 50,000 participants, the CVD risk reduction associated with BP lowering by diuretics and beta-blockers was estimated as 24%.
• Equivalence margin was set to ensure that there would be no more than a 50% loss of efficacy based on this point estimate.
• Upper bound = 1.16 = 0.88 (12% reduction) / 0.76 (24% reduction).
• Lower bound = 1/1.16 = 0.86.

Confidence Interval Approach to Monitoring for CONVINCE
[Figure: hazard-ratio axis from “Ca+ blocker better” to “diuretic/β-blocker better”, marking the lower limit of equivalence (0.86), no difference (1.0), and the upper limit of equivalence (1.16), with example intervals labeled “equivalence” and “inconclusive”]

Non-inferiority and Superiority
[Figure: 95% CI on an axis marked -δ and 0, from “control treatment better” through “no difference” to “new treatment better”]
The 95% CI for the difference between the control and the intervention lies entirely above -δ, i.e. non-inferiority is demonstrated. In this case both non-inferiority and superiority have been demonstrated.

Non-inferiority and Inferiority
[Figure: 95% CI on an axis marked -δ and 0, from “control treatment better” through “no difference” to “new treatment better”]
The 95% CI for the difference between the control and the intervention lies entirely above -δ, i.e. non-inferiority is demonstrated. Because the interval also lies entirely below 0, in this case both non-inferiority and inferiority have been demonstrated.

Summary: Determining Equivalence
• First step in establishing equivalence: define the ‘limits of equivalence’ (± δ)
• Having conducted the trial, calculate the 95% confidence interval for the difference between the control and the new treatment
• If the confidence interval is entirely within ± δ, then equivalence is established

Summary: Determining Non-inferiority
• Equivalence requires that the difference (control - new intervention) is both > -δ and < δ; the new treatment must be neither worse nor better than the control by a fixed amount.
• In contrast to equivalence, with non-inferiority we are only interested in determining whether the new treatment is no worse by an amount δ.

Analysis of Non-inferiority/Equivalence Trials
• Superiority trials are analysed by intention-to-treat (ITT) because it is the most conservative and least likely to be biased.
• ITT analysis of non-inferiority trials is not conservative: there is a bias towards no difference.
• Per protocol analysis is biased since not all randomised patients are included.
• Recommendation: Analyze by both ITT and per protocol (need to ensure power for both).

Testing for Superiority after Non-Inferiority
• In some situations it may be appropriate to test for superiority after testing for non-inferiority.
• Regulatory authorities do not require any multiplicity adjustment for this.
• In this situation, while the primary analysis for non-inferiority might be based on a “per protocol” population, the primary analysis for superiority should be intention to treat.

Equivalence/Non-Inferiority Trials Summary
• Equivalence/non-inferiority trials may be larger than, smaller than, or similar in size to superiority trials, depending on the margin chosen and whether the new therapy is assumed to be more efficacious.
• Equivalence is “in the eyes of the beholder”: select margins carefully!
• The absence of a significant difference in a superiority trial does not imply equivalence.
• Need to be sure about the efficacy of the active control treatment based on earlier trials.
• It is critical that the conduct of equivalence/non-inferiority trials is excellent.
• Because of difficulty of interpretation, equivalence and non-inferiority trials should be used cautiously.
• More head to head superiority comparisons of approved treatments are needed.

Quality of Reporting of Non-inferiority and Equivalence Trials (JAMA 2006;295:1147-1151)
• Margin of non-inferiority/equivalence defined in most trials, but rationale for margin missing in the majority of studies.
• About 25% of reports did not give sample size justification in sufficient detail to reproduce it.
• Less than 50% described both intention to treat and per protocol analysis.
• About 15% of reports did not state confidence intervals.

Guidelines for Reporting Non-inferiority and Equivalence Trials+ (JAMA 2006;295:1152-1160)
• Specification of whether the trial is a non-inferiority study
• Sample size details (specification and rationale for non-inferiority margin)
• Use of 1- or 2-sided confidence interval
• Nature of analysis: intention to treat, per protocol or both
• Presentation of results: confidence intervals
+ Builds on CONSORT guidelines for superiority trials.