PubH 7420 Clinical Trials: Supplemental Notes for Lectures 23 and 24
1. Friedman, Furberg, and DeMets. Fundamentals of Clinical Trials, Chapter 16.
Supplemental Reading/References
1. Pocock SJ. Clinical Trials: A Practical Approach. John Wiley and Sons, Ltd., Chapter 14.
2. DeMets D, Fleming T, et al. The Data and Safety Monitoring Board and Acquired Immune Deficiency Syndrome (AIDS) Clinical Trials. Cont. Clinical Trials, 16:408-421, 1995.
3. Task Force of the Working Group on Arrhythmias of the European Society of Cardiology. The early termination of clinical trials: causes, consequences, and control. Circulation 89:2892-2907, 1994.
4. Meinert C. Clinical trials and treatment effects monitoring (with comment). Cont. Clinical Trials, 19:515-543, 1998.
5. Guidance for clinical trial sponsors on the establishment and operation of clinical trial data monitoring committees (www.fda.gov/cber/gdlns/clindatmon.htm)
6. Ellenberg SS, Fleming TR, DeMets DL. Data monitoring committees in clinical trials. A practical perspective. John Wiley & Sons, 2002.
7. DeMets D, Califf R, et al. Issues in regulatory guidelines for data monitoring committees. Clinical Trials 1:162-169, 2004.
8. McPherson K. Statistics: the problem of examining accumulating data more than once. NEJM 290:501-502, 1974.
9. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Chapman & Hall, 2000.
10. Fleming TR, Neaton JD, et al. Insights from monitoring the CPCRA didanosine/zalcitabine trial. JAIDS 10:S9-S18, 1995.
11. DeMets DL, Furberg CD, Friedman LM. Data monitoring in clinical trials. A case studies approach. Springer, 2006.
12. Mueller PS, et al. Ethical issues in stopping randomized trials early because of apparent benefit. Ann Intern Med 146:878-881, 2007.
13. Proschan MA, Lan KKG, Wittes JT. Statistical monitoring of clinical trials. A unified approach. Springer, 2006.
Considerations for Sequential Monitoring of Clinical Trials with Fixed
Sample Sizes for Adverse Effects and Therapeutic Benefit
1. Most clinical trials are designed with a fixed sample size (or expected number of
events for morbidity and mortality studies). This is in contrast to sequential
designs, in which the sample size is not fixed by design but depends on the
accumulating results of the study. The simplest sequential design is one in which
patients enter the trial in pairs and are randomly allocated, one to each treatment
(A or B). The pairs need not necessarily be matched. After the response for each
pair of patients is noted (responses are assumed to occur quickly), a decision is
made to continue randomizing or to stop the study. This is referred to as an open
sequential plan, and in theory with such a plan randomization could go on
indefinitely. A variation of this plan (the restricted or closed plan) puts a maximum
on the number of subjects to be enrolled. Both the "open" and "closed" plans
involve the determination of boundary lines based on the Type I and Type II error
rates and the proportion of untied pairs with a preference for one of the treatments.
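Below is a minimal simulation sketch of the restricted (closed) plan described above. The boundary value, maximum number of pairs, and preference probability are illustrative assumptions, not values derived from particular Type I and Type II error rates.

```python
# Minimal sketch of a restricted ("closed") sequential plan for paired
# preferences.  The boundary, cap on pairs, and the probability that an
# untied pair prefers treatment A are illustrative assumptions; real
# boundary lines would come from the chosen Type I/Type II error rates.
import random

def run_closed_plan(p_prefer_A=0.65, max_pairs=60, boundary=10, seed=1):
    """Randomize pairs until the cumulative preference excess crosses a
    symmetric boundary (stop early for a difference) or max_pairs is
    reached (the plan is "closed" by the cap on enrollment)."""
    random.seed(seed)
    excess = 0  # (# untied pairs preferring A) - (# preferring B)
    for pair in range(1, max_pairs + 1):
        excess += 1 if random.random() < p_prefer_A else -1
        if abs(excess) >= boundary:
            return pair, excess  # boundary crossed: stop randomizing
    return max_pairs, excess     # cap reached: stop with no preference decision

pairs_used, final_excess = run_closed_plan()
print(f"Stopped after {pairs_used} pairs; preference excess = {final_excess}")
```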
2. In trials involving morbidity and mortality endpoints, it is essential to periodically
review (usually one or two times per year) accruing endpoint data (efficacy and
safety) to protect the safety of patients in the trial and also to ensure that important
results on toxicity or efficacy are reported in a timely fashion to the scientific
community. Note that even more frequent reviews of data quality and follow-up
success should be carried out. Some investigators refer to these reviews as
administrative reviews.
A trial may be stopped early because of unequivocal evidence of treatment
benefit or harm; because of unexpected side effects that, although perhaps minor
in nature, nevertheless prevent the treatment from being used; or because of a
clear absence of treatment differences, i.e., even if the trial were fully enrolled and
all patients were followed as planned, the additional data would be unlikely to
change the current picture.
A trial may also be stopped because of poor enrollment, poor follow-up, or poor
compliance with one or more of the treatments. Sometimes the results of another
trial require that a trial be altered or stopped.
3. The consequence of repeatedly looking at the data for purposes of monitoring is
that the Type I error rate is increased above the acceptable level, e.g., 5%. One
needs to be concerned about "data-dredging." For example, if one is operating
at a nominal 0.05 level and one carries out 10 tests of significance during the
study, the likelihood of rejecting the null hypothesis incorrectly at some point is
0.19. If one wants to preserve an α level of 0.05, then each test should be carried
out at the 0.011 level of significance.
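A short Monte Carlo sketch of this inflation is given below, assuming normally distributed data and 10 equally spaced looks; the per-look sample size and the number of simulated trials are arbitrary illustrative choices.

```python
# Monte Carlo sketch of Type I error inflation from repeated looks at
# accumulating data: 10 equally spaced interim analyses, each tested
# two-sided at the nominal 0.05 level (|z| > 1.96), under the null
# hypothesis of no treatment effect.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_per_look, n_looks = 20_000, 50, 10
rejections = 0
for _ in range(n_trials):
    data = rng.standard_normal(n_per_look * n_looks)   # null data: mean 0, sd 1
    for k in range(1, n_looks + 1):
        x = data[: k * n_per_look]
        z = x.mean() * np.sqrt(len(x))                  # z-statistic for H0: mean = 0
        if abs(z) > 1.96:                               # nominal 0.05-level test
            rejections += 1
            break
print(f"Estimated overall Type I error: {rejections / n_trials:.3f}")   # roughly 0.19
```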
4. The following approaches, which control the Type I error rate, have been proposed
for interim monitoring:
a. Require interim treatment differences to exceed 3.0 or 3.5 standard errors
(Peto and Haybittle). This approach is conservative at all interim analyses.
The basic idea is that there should be "proof beyond any reasonable doubt"
before stopping early.
b. Use the same critical value for all interim analyses, but one that is based on
the number of looks and preserves an overall 0.05 level of significance (Pocock).
c. Use a declining critical value for comparing the treatments over the course of
the study. At the last analysis (the planned end of the study) use 1.96 (O'Brien
and Fleming). This approach is very conservative early on; trials cannot be
stopped early unless there is a very large treatment difference.
d. A Bayesian approach based on likelihood ratio statistics (Cornfield).
e. To provide flexibility in monitoring, Lan and DeMets have proposed a spending
function approach to defining appropriate critical values. The spending function
defines the rate at which the overall Type I error is used up over repeated interim
analyses. To use this approach, one has to define the scale on which information
accumulates. In trials with morbidity/mortality outcomes in which time-to-event
methods will be used for analysis, the information accumulated corresponds to
the number of endpoints observed, not the number of patients under follow-up.
A sketch of two common spending functions follows this list.
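As an illustration of point (e), the sketch below evaluates two commonly used alpha-spending functions, an O'Brien-Fleming-type form and a Pocock-type form, at a few illustrative information fractions. It does not compute the group-sequential critical values themselves, which require numerical integration and specialized software.

```python
# Sketch of two commonly used Lan-DeMets alpha-spending functions: an
# O'Brien-Fleming-type form (spends very little alpha early) and a
# Pocock-type form (spends alpha roughly evenly over the trial).
import numpy as np
from scipy.stats import norm

ALPHA = 0.05
Z_ALPHA = norm.ppf(1 - ALPHA / 2)

def obf_type_spend(t):
    """Cumulative alpha spent by information fraction t (O'Brien-Fleming-type)."""
    return 2 * (1 - norm.cdf(Z_ALPHA / np.sqrt(t)))

def pocock_type_spend(t):
    """Cumulative alpha spent by information fraction t (Pocock-type)."""
    return ALPHA * np.log(1 + (np.e - 1) * t)

# Information fraction = endpoints observed so far / endpoints planned.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t = {t:.2f}:  OBF-type spent = {obf_type_spend(t):.4f},  "
          f"Pocock-type spent = {pocock_type_spend(t):.4f}")
```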
5. All of the approaches proposed greatly oversimplify the decision-making process
for terminating a trial prematurely. Multiple endpoints, multiple treatments, and
subgroups all complicate the decision-making process. It is for this reason that
such monitoring rules are used primarily as guidelines for discussion by the
Data Monitoring Board.
There are several issues with respect to data monitoring that are debated among
investigators involved with clinical trials:
1. Type of study that requires monitoring -- most agree that trials with major, nonreversible morbidity or mortality outcomes should be monitored.
2. External vs. internal review group -- most NIH trials now have external
monitoring groups. The monitoring group does not include any investigators
involved in the day-to-day management of patients. Some clinical trialists feel
that data monitoring committees should include both participating investigators
and external experts.
3. Blinded (coded) vs. unblinded review -- most of the time it is easy to guess which coded group is which.
4. Role of the Senior Investigator/Protocol Chair (should they be unblinded?)
5. Functions of the Data Monitoring Group
- monitor safety and efficacy, at the same or different intervals
- monitor data quality, recruitment, and the general review of the design
6. To whom does the data monitoring group report? How independent should it be?
7. Expertise of members -- it is essential that monitoring boards have a
biostatistician with expertise in clinical trials. Clinicians on the board should have
expertise in the subject area under investigation. It is also useful to have an
ethicist on monitoring committees.
8. Formal vs. informal stopping guidelines
9. How large should the treatment difference be to stop a trial early for benefit?
Once a decision has been made to stop a trial, either early or on schedule, procedures
must be in place to verify endpoints through the closeout date, to unblind the protocol
team, to prepare a study summary, and to inform patients and clinicians of the results.
Monitoring guidelines should be specified in the protocol. Below is an example of
monitoring guidelines in an HIV protocol called START.
“An independent DSMB, supported by NIAID, will meet as often as required, but at least
annually, to review the general conduct of the trial and to review interim analyses of the
major clinical outcomes. The monitoring plan for the review of the primary endpoint and
its two major components is described below.
A sample size re-estimation will be carried out by the protocol team before the target
enrollment of 4,000 participants is achieved to ensure that the planned sample size is
adequate. For the sample size re-estimation, the protocol team will use only pooled
endpoint data, i.e., the number and rate of AIDS* and non-AIDS events for both
treatment groups combined. The protocol team will use these pooled event data and
other relevant data sources to estimate the rates of AIDS*, non-AIDS, and deaths due
to other causes (including unknown causes).
The DSMB will be asked to recommend early termination or modification only when
there is clear and substantial evidence of a treatment difference. As a guideline, the
Lan-DeMets spending function analog of the O’Brien-Fleming boundaries will be used to
monitor the primary endpoint comparison. The DSMB will be asked not to stop the study
early unless there is evidence of a significant treatment difference based on the
spending function boundary for the primary endpoint and each of the two major
components of the primary endpoint – AIDS*, and non-AIDS or deaths not attributed to
AIDS – are consistent (in the same direction, for example, Z > 1.5 for each outcome).
The DSMB will also review other relevant data that might impact the design of START,
e.g., data from other completed trials, and cohorts with similarly defined target
populations.
At each DSMB review, beginning with the review prior to the end of enrollment when
sample size is re-estimated, futility analyses will be presented to the DSMB by the
unblinded statisticians based on conditional and unconditional power. Conditional
power incorporates the observed results by treatment group thus far (and uses the
originally assumed treatment effect for future data) to calculate the conditional
probability of obtaining a significant result by the end of the trial. In contrast,
unconditional power does not take into consideration the observed treatment difference.
It uses a revised estimate of the event rate in the deferred ART arm based on the
observed data, the planned duration of follow-up, and the originally assumed treatment
effect to calculate what the real power was at the beginning of the trial. Participants will
be followed to a common closing date a minimum of 3 years after the last participant is
enrolled. Thus, participant follow-up will range from 3 to 6 years, with an average
follow-up of approximately 4.5 years.
Conditional and unconditional power estimates are used for two different purposes.
Conditional power tells us whether we are likely to get a significant result, whereas
unconditional power tells whether a null result would still be meaningful. For example,
suppose the unconditional power were only 40%. Even if the true treatment benefit
were as originally hypothesized, there would be a 60% chance of missing it. Therefore,
a null result would not rule out the originally hypothesized treatment benefit. On the
other hand, if unconditional power were high — say 90% — then a null result would
effectively rule out the originally hypothesized treatment benefit.
As a guideline, we recommend that the DSMB first consider unconditional power. If
unconditional power is less than 70%, the DSMB should then consider conditional
power. If conditional power, given the observed data and assuming the originally
hypothesized treatment effect thereafter, is less than 20%, consideration should be
given to stopping the trial. We recommend early termination for futility only if both
conditional and unconditional power estimates are low, i.e., only if a null result is both
likely and not meaningful. It is possible that unconditional power is low in the presence
of a very large treatment effect of early ART. Hence, there is a need to also consider
conditional power. Such a scenario would not be grounds for stopping for futility
because conditional power would probably still be high, indicating that a null result is
unlikely.
In summary, the DSMB for START will be provided with the aforementioned guidelines
but be expected to use their expert and independent judgment concerning early
termination. It is recognized that there are a number of considerations in determining
whether a trial should be stopped early. For that reason, we propose guidelines to the
DSMB, not rules.”
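To make the futility guideline quoted above concrete, the following is a hedged sketch of the two power calculations it references: conditional power using the B-value formulation of Lan and Wittes, and an approximate unconditional power based on a revised expected number of events (Schoenfeld approximation for an event-driven comparison). All numeric inputs are illustrative assumptions, not START data.

```python
# Hedged sketch of conditional and unconditional power calculations of
# the kind referenced in the START futility guideline.  All numeric
# inputs below are illustrative assumptions, not START data.
from math import sqrt, log
from scipy.stats import norm

def conditional_power(z_obs, t, theta, alpha=0.05):
    """P(final Z exceeds the critical value) given interim Z = z_obs at
    information fraction t, assuming drift theta (the expected final Z
    under the originally hypothesized effect) for the remaining 1 - t.
    Uses the B-value formulation B(t) = Z(t) * sqrt(t)."""
    z_crit = norm.ppf(1 - alpha / 2)
    b_obs = z_obs * sqrt(t)                 # observed B-value
    drift_remaining = theta * (1 - t)       # expected additional drift
    return 1 - norm.cdf((z_crit - b_obs - drift_remaining) / sqrt(1 - t))

def unconditional_power(n_events, hazard_ratio, alpha=0.05):
    """Approximate power of the final comparison given a revised
    expected total number of primary events and the originally
    hypothesized hazard ratio, via the Schoenfeld approximation with
    1:1 randomization."""
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(sqrt(n_events) * abs(log(hazard_ratio)) / 2 - z_crit)

# Illustrative interim review: halfway through the information (t = 0.5),
# interim Z = 0.5, design drift for ~90% power (theta = 1.96 + 1.28),
# and a revised projection of 180 total primary events with HR = 0.6.
print(f"Conditional power:   {conditional_power(0.5, 0.5, 3.24):.2f}")
print(f"Unconditional power: {unconditional_power(180, 0.6):.2f}")
```

Under the guideline above, early termination for futility would be considered only if both quantities were low, e.g., unconditional power below 70% and conditional power below 20%.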