Interim Analyses of Clinical Trials
A Requirement

Outline
• Background and how DSMBs arose and function
• Group sequential methods
• Examples

References
• Ellenberg SS, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials. Wiley, 2002.
• DeMets DL, Furberg CD, Friedman LM. Data Monitoring in Clinical Trials: A Case Studies Approach. Springer, 2006.
• Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall, 2000.
• Proschan MA, Lan KKG, Wittes J. Statistical Monitoring of Clinical Trials: A Unified Approach. Springer, 2006.
• http://www.biostat.wisc.edu/landemets

Structure for Cooperative Studies (Greenberg Report)
[Organization chart with the following units: National Advisory Heart Council; Initial review group; Institute staff; Policy Board or Advisory Committee; Monitoring Committee; Executive Committee or Steering Committee; Coordinating Center; Participating Units]
Cont Clinical Trials 9:137-48, 1988.

Acronyms
• PAB = Policy advisory board
• DSMB = Data and Safety Monitoring Board
• DMC = Data Monitoring Committee
• ESMB = Efficacy and safety monitoring board
• OSMB = Observational study monitoring board

Responsibilities
• Steering/Executive Committee/Protocol Team
  – Study design
  – Patient recruitment and follow-up
  – Data collection
  – Quality assurance
  – Review of external data
  – Study reports
• DMC or DSMB
  – Safety of patients
  – Protection of integrity of study
  – Review of blinded data on safety and efficacy of treatments
  – Review of trial conduct, amendments and external data
DMCs are responsible to patients, investigators, IRBs, regulatory agencies, and the sponsor.

Data Monitoring Rationale
• Accumulating data need to be monitored for risk/benefit (safety is best assured by comparing the rate of adverse events with a control group)
• Reasons:
  – Ethical: do not expose participants to an inferior intervention longer than needed to test the hypothesis
  – Scientific: assessment of relevance of the question (e.g., external data), design assumptions, logistical problems
  – Economic: do not waste financial or human resources on a futile trial

Reasons for Early Termination of Clinical Trials
• Based on accumulated data from the trial:
  – Unequivocal evidence of treatment benefit or harm
  – Unexpected, unacceptable side effects
  – No emerging trends and no reasonable chance of demonstrating benefit
• Based on overall progress of the trial:
  – Failure to include enough patients at a sufficient rate
  – Lack of compliance in a large number of patients
  – Poor follow-up
  – Poor data quality

Today
• All NIH-sponsored clinical trials are required to have a data monitoring plan
• NIH-sponsored trials with clinical endpoints have a DSMB
• Many industry-sponsored studies have a DSMB
• The FDA has prepared a guidance document (Establishment and Operation of Clinical Trial Data Monitoring Committees) http://www.fda.gov/RegulatoryInformation/Guidances/ucm127069.htm
• There is variation in operating procedures for DSMBs

When is an Independent DSMB Needed
• Early phase studies
  – Monitoring usually at local level; independent DMC not usually needed
• Phase III & IV studies with morbidity/mortality outcomes; pivotal phase III trials
• Frail populations, e.g., children, elderly
• Trials with substantial uncertainty about safety, e.g., gene therapy
See FDA Guidance and ICH/E9, section 4.5.

DSMB Composition: Multidisciplinary
• Clinical experts in the subject matter area
• Biostatisticians with expertise in clinical trials and preferably in the subject matter area
• Others depending on the nature of the study, e.g., ethicist, pharmacologist, patient advocate
Senior investigators without significant conflicts of interest

Independence of DSMB:
• Voting members should not be part of the investigative team or work for the sponsor
• There should be a clear “need to know” policy for non-DSMB members, e.g., the statistician preparing interim summaries needs to know and may be an employee of the sponsor or member of the investigative team
• Members should state potential conflicts
This view is not shared by all. See Meinert CL and discussion, Cont Clin Trials, 1998

Typical DSMB Meeting Format
• Open Session
  – Progress report using open data (no outcome data by treatment group)
  – Sponsor, e.g., NIH, Executive Committee, Protocol Chairs, DSMB and unblinded statisticians
• Closed Session
  – Outcome data by treatment group (usually coded)
  – DSMB and unblinded statisticians only
• Executive Session (DSMB only)
• Debriefing Session
  – DSMB, Sponsor, Executive Committee, Protocol Chairs, and unblinded statisticians

DSMB Confidentiality
• Interim data reviewed by the DSMB must remain confidential
• Members must not share interim data with anyone outside the DSMB
• Leaks can affect
  – Patient recruitment
  – Protocol compliance
  – Outcome assessment
  – Trial integrity and support

DMC Recommendations
• Continue the study unmodified
• Modify the study protocol
• Terminate the study
  – Serious toxicity
  – Clear benefit
  – Futility
  – Design/logistical problems

Outline
• Background and how DSMBs function
• Group sequential methods
• Examples

DSMB Decision Making Can Be Complex
• Internal consistency
• Benefit/Risk
• External consistency
• Current versus future patients
• Clinical and public health impact
• Statistical issues – monitoring guidelines

Overall Probability of Achieving a Result with Given Nominal Significance of 0.05 After N Repeated Tests Under H0

No. of Tests (N)    Probability
 1                  .05
 2                  .083
 3                  .107
 4                  .126
 5                  .142
10                  .193
25                  .266
Ref: McPherson, NEJM, 1974.

Value of Nominal Significance Level Necessary to Achieve a True Level of 0.05 After N Repeated Tests

No. of Tests (N)    Significance Level Which Should be Used
 1                  .05
 2                  .0296
 3                  .0221
 4                  .0183
 5                  .0159
10                  .0107
Ref: McPherson, NEJM, 1974.

Early Work
• Acceptance sampling
• Wald (1947) sequential probability ratio test
Manufacturing problems, continuous monitoring of the data, no upper bound on sample size

Group Sequential Methods
• Calculate a summary statistic (e.g., Z for the logrank test) after each additional group of participants (events)
• Compare the test statistic to a critical value that preserves the overall type 1 error (e.g., 0.05)

Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks

Interim Analysis    Pocock    O’Brien/Fleming    Haybittle/Peto
1                   2.49      5.46               3.0
2                   2.49      3.85               3.0
3                   2.49      3.15               3.0
4                   2.49      2.73               3.0
5                   2.49      2.44               3.0
6                   2.49      2.23               3.0
7                   2.49      2.06               1.96 (2.00)

Critical Values

No. of Looks   Look    Pocock          O’Brien-Fleming    Peto
                       Z      P        Z      P           Z      P
2              1       2.178  .029     2.797  .005        3.290  .001
               2       2.178  .029     1.977  .048        1.962  .050
3              1       2.289  .022     3.471  .0005       3.290  .001
               2       2.289  .022     2.454  .014        3.290  .001
               3       2.289  .022     2.004  .045        1.964  .050
4              1       2.361  .018     4.049  .0001       3.290  .001
               2       2.361  .018     2.863  .004        3.290  .001
               3       2.361  .018     2.338  .019        3.290  .001
               4       2.361  .018     2.024  .043        1.967  .049
5              1       2.413  .016     4.562  .00001      3.290  .001
               2       2.413  .016     3.226  .0013       3.290  .001
               3       2.413  .016     2.634  .008        3.290  .001
               4       2.413  .016     2.281  .023        3.290  .001
               5       2.413  .016     2.040  .041        1.967  .049

Choosing Critical Values
Choose the values $c_1, \ldots, c_K$ so that:

$\Pr\{|Z_1| < c_1, \ldots, |Z_K| < c_K ; H_0\} = 1 - \alpha$

or

$\Pr\{|Z_1| \ge c_1 \text{ or } |Z_2| \ge c_2 \ldots \text{ or } |Z_K| \ge c_K ; H_0\} = \alpha$

Here $K$ is the total number of looks, $Z_k$ is the standardized statistic at look $k$, and $S_k = \sqrt{k}\, Z_k$.

Pocock (1977)
• Use the same boundary value at each look
• Reject $H_0$ the first time $|Z_k| \ge c_P$, or equivalently $|S_k| \ge c_P \sqrt{k}$

O’Brien and Fleming (1979)
• Use larger boundary values at earlier looks
• It is hard to reject $H_0$ early in the study
• The final test is similar to a fixed sample test
• Reject $H_0$ the first time $|Z_k| \ge c_B \sqrt{K/k}$, or equivalently $|S_k| \ge c_B \sqrt{K}$

General Approach
• Compute the sample size as if there were a single look (fixed sample approach)
• Specify the number of interim analyses and the stopping boundary (usually OBF)
• Inflate the sample size to preserve the assumed power using constants in a table (not always done, as the adjustment is minor)
• Compute the standardized statistic $Z_k$ at each analysis and compare it with the critical value for the chosen monitoring boundary
• At the end or upon early termination, determine P-values and confidence intervals in the usual manner

Problems with Initial Approach
• Difficult to specify the number of analyses in advance
• Logistically difficult to organize reviews after equal increments of information
Solutions: Slud and Wei and Lan-DeMets

Flexible Approaches
• Slud and Wei (JASA, 1982) – specify exit probabilities for each look (stage) such that they sum to α; e.g., the probability of exiting at the kth stage is the joint probability of not exiting at the first k-1 stages and exiting at the kth one
• Lan-DeMets (Biometrika, 1983) – specify a use function or type I error spending function; e.g., at time zero, α used = 0, and with full information, α used = 0.05 (or the nominal level)

Spending Function α(t)
[Figure: spending function α(t) plotted over the fraction of total information to be obtained in the study. Vertical axis: alpha, from .0 to .05; horizontal axis: information fraction, from 0 to 1. The function is evaluated at two arbitrary points in the study, t1 and t2, with the increment α(t2) − α(t1) marked.]
t = (number of events observed at monitoring) / (total number of anticipated events)
Cont Clin Trial 2000;21:190-207

Spending Functions
[Figure: plots of Pocock-type and O’Brien Fleming-type spending functions for a one-sided 0.025 significance level, for four analyses at 25%, 50%, 75% and 100% of the expected information. Vertical axis: alpha, from 0 to 0.025; horizontal axis: information fraction, from 0 to 1.]

Approximate O’Brien Fleming Boundaries Using Lan-DeMets Spending Function Approach: Overall Significance = 0.05 and 4 Looks

Interim Analysis    O’Brien-Fleming    OBF Lan-DeMets
1                   4.05               4.33
2                   2.86               2.96
3                   2.34               2.36
4                   2.02               2.01

Usual Choices for Information
• Planned number of events in an event-driven trial with a common closing date chosen to achieve the event target
• Follow-up time, e.g., percent of participants attending the final follow-up visit in a trial with fixed follow-up for each participant
• Calendar time, e.g., a trial with a common calendar closing date (e.g., to ensure some minimum follow-up for each participant) but not event-driven

Beta-Blocker Heart Attack Trial (BHAT)
• Placebo-controlled trial of propranolol in patients with a recent MI
• Recruitment began in June 1978; planned termination June 1982; average of 3 years of follow-up and maximum of 4
• Primary endpoint – all-cause mortality
• Event target - 629 deaths
• Stopped early in October 1981
JAMA 1982; 247:1707-1714.
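
The boundary arithmetic in the group sequential slides above can be sketched in a few lines of Python. This is a minimal illustration and not the software behind the slides. It assumes the commonly used Lan-DeMets forms of the spending functions, which the slides do not spell out: O’Brien-Fleming-type α(t) = 2 − 2Φ(z_{α/2}/√t) and Pocock-type α(t) = α ln(1 + (e − 1)t), with α = 0.025 per tail for a symmetric two-sided 0.05 design. All function names are hypothetical. Only the first-look boundary has a closed form; later looks require numerical integration over the joint distribution of the test statistics and are not attempted here.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def obf_classical_boundaries(num_looks, final_critical_value):
    """Classical O'Brien-Fleming boundaries for equally spaced looks.

    Implements c_k = c_B * sqrt(K / k), where c_B is the critical value
    used at the final look (taken from a published table).
    """
    return [final_critical_value * sqrt(num_looks / k)
            for k in range(1, num_looks + 1)]


def obf_type_spending(t, alpha=0.025):
    """Assumed O'Brien-Fleming-type spending: 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.025):
    """Assumed Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * t)


def first_look_boundary(t1, spending=obf_type_spending, alpha=0.025):
    """Critical value at the first look only.

    At the first look the alpha spent equals the upper-tail probability
    beyond the boundary, so the boundary is a plain normal quantile.
    """
    return STD_NORMAL.inv_cdf(1 - spending(t1, alpha))


if __name__ == "__main__":
    # Four equally spaced looks, final critical value 2.024 from the table.
    print([round(c, 2) for c in obf_classical_boundaries(4, 2.024)])
    # Cumulative one-sided alpha spent at 25%, 50%, 75%, 100% information.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(t, obf_type_spending(t), pocock_type_spending(t))
    # First-look boundary at 25% information.
    print(round(first_look_boundary(0.25), 2))
```

With the tabulated final value 2.024 for four looks, the first function returns the classical O’Brien-Fleming column (4.05, 2.86, 2.34, 2.02), and the first-look boundary at t = 0.25 agrees with the first Lan-DeMets entry (4.33). Both spending functions reach the full 0.025 at t = 1, but the O’Brien-Fleming type spends almost nothing early while the Pocock type spends it more evenly.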

Interim Monitoring of BHAT Study

Look Number    Monitoring Date    Months Since Start    Cumulative Deaths    Logrank Statistic
1              May 1979           11 (.23)              56 (.09)             1.68
2              Oct 1979           16 (.33)              77 (.12)             2.24
3              Mar 1980           21 (.44)              126 (.20)            2.37
4              Oct 1980           28 (.58)              177 (.28)            2.30
5              Apr 1981           34 (.71)              247 (.39)            2.34
6              Oct 1981           40 (.83)              318 (.51)            2.82
Values in parentheses are information fractions: months since start over the 48 planned months, and cumulative deaths over the 629-death target.

Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)

Interim Analysis    OBF     Lan-DeMets (OBF), Events    Lan-DeMets (OBF), Calendar
1                   5.46    8.00                        4.53
2                   3.85    8.00                        3.73
3                   3.15    4.86                        3.20
4                   2.73    4.08                        2.75
5                   2.44    3.41                        2.47
6                   2.23    2.95                        2.28
7                   2.06    1.97                        2.05
Observed logrank Z = 2.82 at the 6th analysis.

Flexible Number of Looks
• Another advantage of the Lan-DeMets spending function approach is flexibility in the number of looks.
• Suppose BHAT had not been stopped and there were 4 more looks before the end (10 total).
• Looks 7-10 correspond to event-based information fractions of 0.65, 0.75, 0.85 and 1.0.
• Stopping boundaries can be calculated conditional on the previous tests.

Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)

Interim Analysis    Lan-DeMets (OBF), 7 Looks    Lan-DeMets (OBF), 10 Looks
1                   8.00                         8.00
2                   8.00                         8.00
3                   4.86                         4.86
4                   4.08                         4.08
5                   3.41                         3.41
6                   2.95                         2.95
7                   1.97                         2.58
8                                                2.41
9                                                2.26
10                                               2.06

Suppose We Get To the 6th Analysis by A Different Route
• Information fractions are .05, .20, .30, .40, .45
• Instead of .09, .12, .20, .28, and .39

Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)

Interim Analysis    Lan-DeMets (OBF), 7 Looks    Lan-DeMets (OBF), 7 Looks
                    (observed fractions)         (different route)
1                   8.00                         8.00
2                   8.00                         4.89
3                   4.86                         3.93
4                   4.08                         3.33
5                   3.41                         3.19
6                   2.95                         2.98

Variations of the Theme
• Asymmetric boundaries (e.g., non-significant harmful effect of new treatment)
  – Use upper boundary for superiority and a less conservative boundary for harm (Z = -1.5 or -2.0, or OBF for efficacy and Pocock for harm)
  – Appropriate for an investigational product but probably not for a product already approved and used as part of standard of care
• Multiple outcomes, e.g., efficacy and safety, and composites
• Multiple trials (CHARM heart failure, Cox-2 chemoprevention)
• Futility and curtailed sampling procedures (conditional and unconditional power)
• Repeated confidence intervals (e.g., use OBF critical values to compute interim CIs)

Asymmetric Monitoring Boundary for Harm
[Figure: Z-value boundaries over the course of the trial, with a benefit boundary on one side and a less conservative Pocock-type boundary for harm on the other; axis labels recovered: Z, Benefit, Harm, with marked values 2.4 and 1.5.]

SMART Study Design
CD4+ cell count > 350 cells/mm3
• Virologic Suppression (VS) Strategy, n = 2752 [Use of ART to maintain viral load as low as possible throughout follow-up]
• Drug Conservation (DC) Strategy, n = 2720 [Stop or defer ART until CD4+ < 250; then episodic ART based on CD4+ cell count to increase counts to > 350]
Plan: 910 primary endpoints; 8 years average follow-up.
Intervention interrupted on 11 January 2006.
N Engl J Med 2006.

SMART Guideline
“…it is recommended that the DSMB consider early termination or protocol modification only when the O’Brien-Fleming boundary is crossed for the primary endpoint and the findings for the primary and the composite cardiovascular, metabolic endpoint are consistent...”

[Figure: Interim Monitoring: O’Brien Fleming Boundaries for the Primary Endpoint, by DSMB Date]
[Figure: Interim Monitoring: O’Brien Fleming Boundaries for the Primary Endpoint, by Cut Date]

SMART Primary and Supportive Endpoint Results

Endpoint                          DC Group       VS Group       HR (DC/VS)    [95% CI]       P-value
                                  N      Rate    N      Rate
OD or death (primary endpoint)    122    3.4     50     1.4     2.5           [1.8, 3.5]     <0.001
CVD, Renal, Liver                 65     1.8     39     1.1     1.7           [1.1, 2.5]     0.009
  - CVD                           48     1.3     31     0.8     1.6           [1.0, 2.5]     0.05
  - Renal                         9      0.2     2      0.1     4.5           [1.0, 20.9]    0.05
  - Liver                         10     0.3     7      0.2     1.4           [0.6, 3.8]     0.46

Futility
• Usual definition - convincing evidence exists that the new treatment is not beneficial.
• If this is the case, minimizing exposure to an ineffective treatment with potential toxicities and saving resources should lead to a consideration to stop the trial.
• What is convincing?
• Futility, more generally, can also be affected by a low event rate or slow enrollment (e.g., CVD mortality outcome in the Physicians’ Health Study).

Conditional Power (or Stochastic Curtailment) to Assess Futility
• What is the probability of rejecting the null hypothesis (i.e., getting a significant result) given the data to date and my best guess about the future? For example, the future data
  – will look like the past
  – will show no difference
  – will look like what was assumed in the design
Lan KKG, Wittes J, Biometrics, 1988.

Example of Curtailment from Proschan’s Book

             Event    No Event    Total
Control      75       116         191
Treatment    75       118         193
Total        150      234         384
Planned sample size = 400
Even if all 9 remaining controls had events and all 7 remaining treatment group patients did not, Z = 0.92. Why continue?

Example from Proschan’s Book (cont.)

             Event    No Event    Total
Control      71       100         171
Treatment    71       100         171
Total        142      200         342
Planned sample size = 400
If all 20 remaining controls had events and all 20 treatment group patients did not, the result would be significant. But how likely is that? Answer = almost zero.

Conditional Power: Usual Implementation
• Guidelines in protocol (pre-specified)
• Typically compute conditional power after a fair amount of data has accrued (e.g., 50% of information)
• Compute conditional power under a number of scenarios for the assumed intervention effect (observed effect to date, alternative assumed in the design, null effect, other effect sizes in between)
• Can graph boundaries of conditional power versus information accrued to facilitate decision making

Unconditional Power
• What is the probability of rejecting the null hypothesis (i.e., getting a significant result) based on the original design assumptions for the treatment effect, but considering:
  – revised estimate of control group event rate
  – duration of follow-up, accounting for the recruitment period and the minimum follow-up originally planned for each participant
Is a null result still meaningful?

Guideline for HIV Early Treatment Trial (START)
• First consider unconditional power. If < 70%, consider conditional power.
• If conditional power is < 20%, consider stopping for futility.
Rationale: Unconditional power could be low in the presence of a large treatment effect.

Summary (1)
• Many studies require a DSMB
  – Trials with morbidity and mortality outcomes
  – Trials of treatments that may be associated with serious toxicities (need to have a group look at controlled comparisons)
  – Trials of novel, high-risk treatments (e.g., gene therapy)
  – Trials involving frail populations (elderly, infants)

Summary (2)
• A DSMB can be most effective in its role of protecting the interests of patients if it is independent of the sponsor and trial investigators – peer review works!
• Operating procedures should be agreed upon in advance
• An informed statistician who performs interim analyses is important
• To carry out interim analyses, data must be collected in a timely way
• Reports should focus on comparisons of clinical outcomes and their validity

Summary (3)
• Monitoring guidelines should be pre-specified
• Guidelines need to be accompanied by common sense, a careful assessment of risks and benefits, and opinions from experts with different backgrounds
• This is a fruitful area for research

Recommendation from Paul Canner based on his experiences in the Coronary Drug Project
“…no single statistical decision rule or procedure can take the place of the well-reasoned consideration of all aspects of the data by a group of concerned, competent, and experienced persons with a wide range of scientific backgrounds and points of view.”
Cont Clin Trials 1981; 1:363-376.
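
The futility calculations discussed in the curtailment and conditional power slides can be sketched as follows. This is a hedged illustration and not the method used in any of the trials cited. It assumes a pooled-variance two-proportion Z statistic for the curtailment example, equal allocation of the planned 400 participants (200 per arm), and the B-value form of conditional power for a one-sided upper boundary, CP = 1 − Φ((z_crit − B(t) − θ(1 − t)) / √(1 − t)) with B(t) = Z(t)√t. The function names and the 90% design power in the scenario helper are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def two_proportion_z(events_a, n_a, events_b, n_b):
    """Pooled-variance Z for the difference in event proportions (a minus b)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (events_a / n_a - events_b / n_b) / std_err


def best_case_z(events_ctrl, n_ctrl, events_trt, n_trt, planned_per_arm):
    """Final Z if every remaining control has an event and no remaining
    treatment group patient does (the most favorable completion)."""
    remaining_ctrl = planned_per_arm - n_ctrl
    return two_proportion_z(events_ctrl + remaining_ctrl, planned_per_arm,
                            events_trt, planned_per_arm)


def conditional_power(z_t, t, theta, z_crit=1.96):
    """Probability that the final Z exceeds z_crit, given the interim Z at
    information fraction t and an assumed drift theta = E[Z at t = 1]."""
    b_t = z_t * sqrt(t)  # B-value at information fraction t
    return 1 - STD_NORMAL.cdf((z_crit - b_t - theta * (1 - t)) / sqrt(1 - t))


def conditional_power_scenarios(z_t, t, design_power=0.90, z_crit=1.96):
    """Conditional power under three guesses about the future data."""
    theta_design = z_crit + STD_NORMAL.inv_cdf(design_power)  # assumed design drift
    theta_trend = z_t / sqrt(t)  # future looks like the past: B(t) / t
    return {
        "current trend": conditional_power(z_t, t, theta_trend, z_crit),
        "null": conditional_power(z_t, t, 0.0, z_crit),
        "design": conditional_power(z_t, t, theta_design, z_crit),
    }


def start_guideline_suggests_stopping(unconditional_power, cond_power):
    """START-style rule from the slide: unconditional power below 70% and
    conditional power below 20% prompt consideration of stopping."""
    return unconditional_power < 0.70 and cond_power < 0.20


if __name__ == "__main__":
    # First curtailment table: 75/191 control and 75/193 treatment events.
    print(round(best_case_z(75, 191, 75, 193, 200), 2))
    # Hypothetical interim result: Z = 0.5 at half the information.
    print(conditional_power_scenarios(z_t=0.5, t=0.5))
```

For the first curtailment table the most favorable completion gives Z of about 0.92, in line with the slide. The scenario helper mirrors the three guesses about the future listed on the conditional power slide; in practice the protocol would pre-specify which scenarios and thresholds are used.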