PubH 7420 Clinical Trials: Supplemental Notes for Lectures 23 and 24

Required Reading

1. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, Chapter 16.

Supplemental Reading/References

1. Pocock SJ. Clinical Trials: A Practical Approach. John Wiley and Sons, Ltd., Chapter 14.
2. DeMets D, Fleming T, et al. The Data and Safety Monitoring Board and Acquired Immune Deficiency Syndrome (AIDS) Clinical Trials. Cont. Clinical Trials 16:408-421, 1995.
3. Task Force of the Working Group on Arrhythmias of the European Society of Cardiology. The early termination of clinical trials: causes, consequences, and control. Circulation 89:2892-2907, 1994.
4. Meinert C. Clinical trials and treatment effects monitoring (with comment). Cont. Clinical Trials 19:515-543, 1998.
5. Guidance for clinical trial sponsors on the establishment and operation of clinical trial data monitoring committees (www.fda.gov/cber/gdlns/clindatmon.htm).
6. Ellenberg SS, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials: A Practical Perspective. John Wiley & Sons, 2002.
7. DeMets D, Califf R, et al. Issues in regulatory guidelines for data monitoring committees. Clinical Trials 1:162-169, 2004.
8. McPherson K. Statistics: the problem of examining accumulating data more than once. NEJM 290:501-502, 1974.
9. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall, 2000.
10. Fleming TR, Neaton JD, et al. Insights from monitoring the CPCRA didanosine/zalcitabine trial. JAIDS 10:S9-S18, 1995.
11. DeMets DL, Furberg CD, Friedman LM. Data Monitoring in Clinical Trials: A Case Studies Approach. Springer, 2006.
12. Mueller PS, et al. Ethical issues in stopping randomized trials early because of apparent benefit. Ann Intern Med 146:878-881, 2007.
13. Proschan MA, Lan KKG, Wittes JT. Statistical Monitoring of Clinical Trials: A Unified Approach. Springer, 2006.

Considerations for Sequential Monitoring of Clinical Trials with Fixed Sample Sizes for Adverse Effects and Therapeutic Benefit

1. Most clinical trials are designed with a fixed sample size (or a fixed expected number of events for morbidity and mortality studies). This is in contrast to sequential designs, in which the sample size is not fixed by design but depends on the accumulating results of the study. The simplest sequential design is one in which patients enter the trial in pairs and are randomly allocated to the two treatments (A or B). The pairs need not be matched. After the responses for each pair of patients are noted (responses are assumed to occur quickly), a decision is made either to continue randomizing or to stop the study. This is referred to as an open sequential plan; in theory, randomization under such a plan could go on indefinitely. A variation of this plan (the restricted or closed plan) puts a maximum on the number of subjects to be enrolled. Both the "open" and "closed" plans involve the determination of boundary lines based on the Type I and Type II error rates and the proportion of untied pairs with a preference for one of the treatments.

2. In trials involving morbidity and mortality endpoints, it is essential to review accruing endpoint data (efficacy and safety) periodically (usually once or twice per year), both to protect the safety of patients in the trial and to ensure that important results on toxicity or efficacy are reported to the scientific community in a timely fashion. Note that even more frequent reviews of data quality and follow-up success should be carried out.
Some investigators refer to these reviews as administrative reviews. A trial may be stopped early because of unequivocal evidence of treatment benefit or harm; because of unexpected side effects that, even if minor in nature, nevertheless prevent the treatment from being used; or because of a clear absence of treatment differences, i.e., even if the trial were fully enrolled and all patients were followed as planned, the additional data would be unlikely to change the current picture. A trial may also be stopped for poor enrollment, poor follow-up, or poor compliance with one or more of the treatments. Sometimes the results of another trial require that your trial be altered or stopped.

3. The consequence of repeatedly looking at the data for monitoring purposes is that the Type I error rate is increased above acceptable levels, e.g., 5%. One needs to be concerned about "data-dredging." For example, if one is operating at a nominal 0.05 level and carries out 10 tests of significance during the study, the probability of incorrectly rejecting the null hypothesis at least once is about 0.19. If one wants to preserve an α level of 0.05, then each test should be carried out at the 0.011 level of significance. (A simulation illustrating both numbers is sketched after item 5 below.)

4. The following are examples of proposed approaches to interim monitoring that control the Type I error rate:

a. Require interim treatment differences to exceed 3.0 or 3.5 standard errors (Peto and Haybittle). This approach is conservative at all interim analyses. The basic idea is that there should be "proof beyond any reasonable doubt" before stopping early.

b. Use the same critical value for all interim analyses, chosen on the basis of the number of looks so as to preserve an overall 0.05 level of significance (Pocock).

c. Use a declining critical value for comparing the treatments over the course of the study, with a critical value close to 1.96 at the last analysis (the planned end of the study) (O'Brien and Fleming). This approach is very conservative early on; trials cannot be stopped early unless there is a very large treatment difference.

d. A Bayesian approach based on likelihood ratio statistics (Cornfield).

e. To provide flexibility in monitoring, Lan and DeMets have proposed a spending function approach to define appropriate critical values. The spending function defines the rate at which the overall Type I error is used up over repeated interim analyses. To use this approach, one has to define the scale on which information accumulates. In trials with morbidity/mortality outcomes in which time-to-event methods will be used for analysis, the accumulated information corresponds to the number of endpoints observed, not the number of patients under follow-up. (A numerical sketch of how spending-function boundaries are computed appears just before the START example later in these notes.)

5. All of the approaches proposed greatly oversimplify the decision-making process for terminating a trial prematurely. Multiple endpoints, multiple treatments, and subgroups all complicate the decision. It is for this reason that such monitoring rules are primarily used as guidelines for discussion by the Data Monitoring Board.
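As a rough check on the numbers in item 3, here is a minimal simulation sketch in Python (using numpy and scipy). It assumes 10 equally spaced looks at accumulating normally distributed data with no true treatment difference; the simulation size and the Monte Carlo approach are illustrative choices, not part of the original notes.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sim, n_looks = 200_000, 10
t = np.arange(1, n_looks + 1) / n_looks          # equally spaced information fractions

# Under the null hypothesis the interim Z statistics behave like standardized
# Brownian motion observed at the information fractions t: Z_k = B(t_k)/sqrt(t_k).
B = np.cumsum(rng.standard_normal((n_sim, n_looks)) / np.sqrt(n_looks), axis=1)
Z = B / np.sqrt(t)

# Testing at the nominal 0.05 level at every look inflates the Type I error.
naive = np.mean(np.any(np.abs(Z) > norm.ppf(0.975), axis=1))
print(round(naive, 3))                           # about 0.19, as stated in item 3

# Pocock-style fix: one constant critical value chosen so the overall level is 0.05.
c = np.quantile(np.abs(Z).max(axis=1), 0.95)
print(round(c, 2), round(2 * (1 - norm.cdf(c)), 3))   # about 2.56, nominal level about 0.011

The quantile step is simply a Monte Carlo stand-in for the numerical integration used in the published tables; with 10 looks the constant critical value comes out near 2.56, i.e., a nominal two-sided level of about 0.011, matching the figure quoted in item 3.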
There are several issues with respect to data monitoring that are debatable among investigators involved with clinical trials:

1. Type of study that requires monitoring -- most agree that trials with major, nonreversible morbidity or mortality outcomes should be monitored.

2. External vs. internal review group -- most NIH trials now have external monitoring groups. The monitoring group does not include any investigators involved in the day-to-day management of patients. Some clinical trialists feel that data monitoring committees should include both participating investigators and external experts.

3. Blinded (coded) vs. unblinded review -- most of the time it is easy to guess.

4. Role of the senior investigator/protocol chair (should they be unblinded?).

5. Functions of the data monitoring group -- monitor safety and efficacy (at the same or at different intervals); monitor data quality and recruitment, and provide a general review of the design.

6. To whom does the data monitoring group report? How independent should it be?

7. Expertise of members -- it is essential that monitoring boards have a biostatistician with expertise in clinical trials. Clinicians on the board should have expertise in the subject area under investigation. It is also useful to have an ethicist on monitoring committees.

8. Formal vs. informal stopping guidelines.

9. How large should the treatment difference be to stop a trial early for benefit?

Once a decision has been made to stop a trial, either early or on schedule, procedures must be in place to verify endpoints through the closeout date, to unblind the protocol team, to prepare a study summary, and to inform patients and clinicians of the results. Monitoring guidelines should be specified in the protocol.
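The START guidelines quoted below rely on the Lan-DeMets spending-function analog of the O'Brien-Fleming boundaries (approach 4e above). As a rough numerical illustration of how group sequential boundaries are obtained from a spending function, the Python sketch below computes two-sided boundaries for four looks at information fractions 0.25, 0.50, 0.75, and 1.00 using a simple grid-based recursion; the number of looks, the information fractions, and the grid settings are illustrative assumptions, not values taken from the START protocol.

import numpy as np
from scipy.stats import norm

alpha = 0.05                                   # overall two-sided Type I error
t = np.array([0.25, 0.50, 0.75, 1.00])         # information fractions at the four looks

# O'Brien-Fleming-type spending function (Lan & DeMets), applied to each side:
#   alpha*(s) = 2 * (1 - Phi( Phi^{-1}(1 - a/2) / sqrt(s) ))  with a = alpha/2 per side
a_side = alpha / 2
per_side = 2 * (1 - norm.cdf(norm.ppf(1 - a_side / 2) / np.sqrt(t)))   # cumulative, per side
pi = np.diff(np.concatenate(([0.0], 2 * per_side)))    # two-sided alpha spent at each look

# Work on the partial-sum scale S_k = Z_k * sqrt(t_k); g is the sub-density of S_k
# among sample paths that have not crossed an earlier boundary (null hypothesis).
grid = np.arange(-8.0, 8.0 + 1e-9, 0.01)
h = grid[1] - grid[0]
dt = np.diff(np.concatenate(([0.0], t)))
g = norm.pdf(grid, scale=np.sqrt(dt[0]))
bounds = []

for k in range(len(t)):
    # Choose the boundary b_k (on the S scale) so the mass beyond it equals pi[k].
    mass = lambda b: np.sum(g[np.abs(grid) >= b]) * h
    lo, hi = 0.0, grid[-1]
    for _ in range(60):                        # bisection on a decreasing function of b
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mass(mid) > pi[k] else (lo, mid)
    b = (lo + hi) / 2
    bounds.append(b / np.sqrt(t[k]))           # convert back to the Z scale
    if k < len(t) - 1:
        # Zero out the stopped mass and propagate the rest by an independent N(0, dt) step.
        g = np.where(np.abs(grid) < b, g, 0.0)
        kernel = norm.pdf(grid[:, None] - grid[None, :], scale=np.sqrt(dt[k + 1]))
        g = kernel @ g * h

print(np.round(bounds, 2))                     # roughly [4.33, 2.96, 2.36, 2.01]

These values are close to the published Lan-DeMets O'Brien-Fleming-type boundaries for four equally spaced looks. Note how little of the overall 0.05 is spent at the first look, which is why this type of boundary makes early stopping difficult unless the treatment difference is very large.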
Below is an example of monitoring guidelines from an HIV protocol called START (a numerical sketch of the conditional and unconditional power calculations described in the excerpt follows it).

“An independent DSMB, supported by NIAID, will meet as often as required, but at least annually, to review the general conduct of the trial and to review interim analyses of the major clinical outcomes. The monitoring plan for the review of the primary endpoint and its two major components is described below.

A sample size re-estimation will be carried out by the protocol team before the target enrollment of 4,000 participants is achieved to ensure that the planned sample size is adequate. For the sample size re-estimation, the protocol team will use only pooled endpoint data, i.e., the number and rate of AIDS* and non-AIDS events for both treatment groups combined. The protocol team will use these pooled event data and other relevant data sources to estimate the rates of AIDS*, non-AIDS, and deaths due to other causes (including unknown causes).

The DSMB will be asked to recommend early termination or modification only when there is clear and substantial evidence of a treatment difference. As a guideline, the Lan-DeMets spending function analog of the O’Brien-Fleming boundaries will be used to monitor the primary endpoint comparison. The DSMB will be asked not to stop the study early unless there is evidence of a significant treatment difference based on the spending function boundary for the primary endpoint and the results for each of the two major components of the primary endpoint – AIDS*, and non-AIDS or deaths not attributed to AIDS – are consistent (in the same direction, for example, Z > 1.5 for each outcome). The DSMB will also review other relevant data that might impact the design of START, e.g., data from other completed trials and cohorts with similarly defined target populations.

At each DSMB review, beginning with the review prior to the end of enrollment when the sample size is re-estimated, futility analyses will be presented to the DSMB by the unblinded statisticians based on conditional and unconditional power. Conditional power incorporates the observed results by treatment group thus far (and uses the originally assumed treatment effect for future data) to calculate the conditional probability of obtaining a significant result by the end of the trial. In contrast, unconditional power does not take into consideration the observed treatment difference. It uses a revised estimate of the event rate in the deferred ART arm based on the observed data, the planned duration of follow-up, and the originally assumed treatment effect to calculate what the real power was at the beginning of the trial. Participants will be followed to a common closing date a minimum of 3 years after the last participant is enrolled. Thus, participant follow-up will range from 3 to 6 years, with an average follow-up of approximately 4.5 years.

Conditional and unconditional power estimates are used for two different purposes. Conditional power tells us whether we are likely to get a significant result, whereas unconditional power tells us whether a null result would still be meaningful. For example, suppose the unconditional power were only 40%. Even if the true treatment benefit were as originally hypothesized, there would be a 60% chance of missing it; therefore, a null result would not rule out the originally hypothesized treatment benefit. On the other hand, if unconditional power were high — say 90% — then a null result would effectively rule out the originally hypothesized treatment benefit.

As a guideline, we recommend that the DSMB first consider unconditional power. If unconditional power is less than 70%, the DSMB should then consider conditional power. If conditional power, given the observed data and assuming the originally hypothesized treatment effect thereafter, is less than 20%, consideration should be given to stopping the trial. We recommend early termination for futility only if both conditional and unconditional power estimates are low, i.e., only if a null result is both likely and not meaningful. It is possible that unconditional power is low in the presence of a very large treatment effect of early ART. Hence, there is a need to also consider conditional power. Such a scenario would not be grounds for stopping for futility because conditional power would probably still be high, indicating that a null result is unlikely.

In summary, the DSMB for START will be provided with the aforementioned guidelines but will be expected to use its expert and independent judgment concerning early termination. It is recognized that there are a number of considerations in determining whether a trial should be stopped early. For that reason, we propose guidelines to the DSMB, not rules.”
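The START excerpt describes conditional and unconditional power in words; the sketch below puts rough numbers on both, using the standard Brownian-motion approximation for a logrank-type test statistic. All inputs (hazard ratio, planned and observed event counts, the interim Z value, and the revised event projection) are hypothetical illustration values, not START design parameters, and the "unconditional power" here is simply fixed-sample power recomputed with a revised projected event total, a simplified stand-in for the calculation the protocol describes.

import numpy as np
from scipy.stats import norm

# Hypothetical inputs for illustration only (not the actual START design values):
alpha  = 0.05      # two-sided significance level
hr     = 0.70      # originally hypothesized hazard ratio (early vs. deferred ART)
d_plan = 200       # total primary endpoints planned at the final analysis
d_obs  = 80        # primary endpoints observed at the interim look
z_obs  = 0.8       # observed interim Z statistic (positive favors early ART)
d_rev  = 150       # revised projection of total endpoints from pooled interim data

z_crit = norm.ppf(1 - alpha / 2)

# Expected value of the final Z statistic ("drift") under hazard ratio hr when d
# events are split roughly 1:1 between the arms (logrank approximation).
def drift(hr, d):
    return -np.log(hr) * np.sqrt(d / 4)

# Conditional power: probability of a significant final result given the interim
# data, assuming the originally hypothesized effect for the remaining information.
t = d_obs / d_plan                                    # information fraction
cp = 1 - norm.cdf((z_crit - z_obs * np.sqrt(t) - drift(hr, d_plan) * (1 - t))
                  / np.sqrt(1 - t))

# "Unconditional" (revised) power: ignores the observed treatment difference and
# recomputes power with the originally assumed effect and the revised event total.
up = 1 - norm.cdf(z_crit - drift(hr, d_rev))

print(f"conditional power   = {cp:.2f}")              # about 0.53 with these inputs
print(f"unconditional power = {up:.2f}")              # about 0.59 with these inputs

Under the START guideline quoted above, this hypothetical interim look (unconditional power below 70% but conditional power well above 20%) would not by itself suggest stopping for futility.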