Adaptive Designs: Terminology and Classification

1. Introduction
Recent achievements in the methodology of adaptive designs provide new approaches to drug
development that have the potential to improve the quality, speed and efficiency of decision
making. By introducing flexibility into the trial design, this approach saves resources by
identifying failures early and increases efficiency by focusing precious patient resources
on treatments that have a higher probability of success. While clearly advantageous to the
drug development program, this is also ethically beneficial to the patients in the trial, as it
limits their exposure to ineffective treatments.
Unfortunately, as often happens with novel approaches, there has been substantial
confusion over what these designs are and when they are most applicable. They are
variously known as adaptive, sequential, flexible, self-designing, multi-stage, dynamic,
response-driven, smart, or novel designs. We propose here an integrated approach to defining and
classifying adaptive designs in order to minimize confusion over their terminology and
taxonomy across the pharmaceutical industry, its stakeholders and analysts, and
regulatory agencies.
The primary purpose of this paper is to describe the range of adaptive designs that are
available and to promote the benefits they might bring in all phases of clinical drug
development. It is necessary to emphasize that these designs have much more to offer
than the rigid conventional parallel group designs in clinical trials.
To maintain focus within the space available, we do not attempt an exhaustive
literature review. Rather, we focus on key ideas and cite supportive literature as
appropriate. Section 2 gives a general definition of adaptive designs and their structure.
Section 3 provides a classification of adaptive designs.
2. Definition of adaptive designs
An adaptive design is defined as a multi-stage study design that uses accumulating data to
decide how to modify aspects of the study without undermining the validity and
integrity of the trial.
To maintain study validity means providing correct statistical inference (such as adjusted
p-values, unbiased estimates and adjusted confidence intervals), assuring consistency
between different stages of the study, and minimizing operational bias.
To maintain study integrity means providing convincing results to the broader scientific
community, preplanning the intended adaptations as much as possible, and
maintaining the blinding of interim analysis results.
An adaptive design requires the trial to be conducted in multiple stages with access to the
accumulated data. An adaptive design may apply one or more of the following rules
at an interim look:

Allocation Rule: how will subjects be allocated to the different arms of the trial? This
can be fixed randomization, say 1:1, throughout the trial, or it may be adaptive,
with the randomization ratio changing from stage to stage based on accruing data.
This also includes the decision to drop or add treatment arms.

Sampling Rule: how many subjects will be sampled at the next stage? This may
depend on estimates of accrual so far, on estimates of nuisance parameters (e.g.,
the variance), or even on estimates of the treatment effect. In dose-escalation studies this
is the cohort size per stage.

Stopping Rule: when should the trial stop? There are many reasons for stopping a
trial: for efficacy, for harm, for futility, or for safety.

Decision Rule: the final decision and interim decisions pertaining to design
changes not covered by the previous three rules, e.g., updating the model,
changing the endpoint, or modifying the initial design.
At any stage, the data may be analyzed and subsequent stages can be redesigned taking
into account all available data.
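The following sketch illustrates this multi-stage structure, with each of the four rules supplied as an interchangeable function. It is a minimal illustration only: the two-arm binary-outcome setting, the fixed cohort size, and all names are assumptions introduced here, not part of any specific method discussed in this paper.

```python
import random

def allocation_rule(data):
    # Fixed 1:1 randomization; an adaptive rule would inspect `data` here.
    return random.choice(["control", "treatment"])

def sampling_rule(data, stage):
    # Fixed cohort of 50 subjects per stage; an adaptive rule could
    # recompute this from nuisance-parameter or treatment-effect estimates.
    return 50

def stopping_rule(data, stage, max_stages=3):
    # Stop at the final stage; a real rule would compare a test statistic
    # with efficacy/harm/futility boundaries at each interim look.
    return stage >= max_stages

def decision_rule(data):
    # Final decision: here, simply summarize successes per arm.
    arms = ("control", "treatment")
    return {a: sum(r for arm, r in data if arm == a) for a in arms}

def run_trial(true_rates={"control": 0.30, "treatment": 0.45}):
    data, stage = [], 0
    while True:
        stage += 1
        for _ in range(sampling_rule(data, stage)):        # Sampling Rule
            arm = allocation_rule(data)                    # Allocation Rule
            response = random.random() < true_rates[arm]   # simulated outcome
            data.append((arm, response))
        if stopping_rule(data, stage):                     # Stopping Rule
            break
    return decision_rule(data)                             # Decision Rule

print(run_trial())
```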
This definition includes any group sequential design in which the only design revision
is stopping the study early given sufficiently strong evidence of a treatment effect
difference. Another kind of adaptive design aims to treat patients in the study as
effectively as possible, using response-adaptive allocation in which patients are more
likely to be assigned to the treatment that appears superior according to the
observed responses. Sample size re-assessment, or "internal pilot studies", involves the
recalculation of sample size based on interim information about the values of nuisance
parameters. While each of these three designs allows just one of the adaptation rules, the
most recent class of adaptive designs, also known as flexible designs, allows for an adaptive
allocation rule (changing the randomization from stage to stage), an adaptive sampling rule
(the timing of the next interim analysis), a stopping rule, as well as other
modifications following interim analyses (an adaptive decision rule), including
changing the target treatment difference for which the study is powered, changing the
primary endpoint, or varying the form of the primary analysis.
Although statistical methodology has been developed to allow for these types of adaptive
designs, these methods should never be used to replace the careful planning for the
statistical design of a clinical trial. Before starting the trial, an efficient design must be
detailed in the protocol. Adaptive design methodology then provides a valuable tool for
reasonable design changes.
We describe below the four elements of an adaptive design.
Allocation Rules. At each stage, the allocation rule determines how new patients will be
assigned to available treatments. An allocation rule may be fixed (static) during the study
or may be adaptive (dynamic), changing from stage to stage according to previous
treatment assignments and/or patient responses.
A fixed allocation rule does not necessarily mean a deterministic rule. On the contrary,
randomization (random allocation) of patients is usually used to achieve balance in all
known and unknown, observed and unobserved covariates (prognostic factors) at
baseline. However, a fixed allocation rule uses allocation probabilities that are
determined in advance and are not changed during the trial. Complete randomization uses
equal allocation probabilities to balance treatment assignments. Stratification can be
used to improve the randomization, but this approach limits the number of covariates
that can be balanced. The permuted block design can also be used, but it has the
disadvantage of high predictability at the investigator site level. Restricted
randomization with fixed unequal allocation probabilities is used for unbalanced
treatment allocation. Rosenberger and Lachin [1] develop this subject more deeply
than is possible to report here.
By contrast, an adaptive allocation rule dynamically alters the allocation probabilities to
reflect the data accruing in the trial. Covariate-adaptive randomization [2-5] ensures
balance between treatment arms with respect to known covariates. Rather than balancing
over known covariates, the optimal design approach [6, 7] minimizes the variance of the
treatment effect estimator in the presence of covariates.
Response-adaptive randomization uses interim data to unbalance the allocation
probabilities in favor of the treatment arms having comparatively superior outcomes. The
simplest is the randomized play-the-winner rule [8, 9], in which a success on one
treatment results in the next patient being assigned to the same treatment, with
assignment switching to the alternative treatment only in the event of a failure.
More complex and flexible allocation
rules can be obtained by using urn models [1]. The allocation probabilities are changed
during the course of the trial to reflect the known outcomes of patients by adding balls of
an appropriate color to the urn. The doubly adaptive biased coin design [10] adapts
allocations based on previous treatment group assignments as well as on the outcome
information.
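As an illustration, the randomized play-the-winner rule can be written as a simple urn scheme. The sketch below assumes immediately observable binary responses; the urn parameters, arm labels, and success probabilities are illustrative assumptions.

```python
import random

def rpw_allocate(n_patients, success_prob, u=1, beta=1):
    # RPW(u, beta): start with u balls per arm; a success adds beta balls of
    # the same color, a failure adds beta balls of the opposite color.
    urn = {"A": u, "B": u}
    assignments = []
    for _ in range(n_patients):
        # Draw an arm with probability proportional to its ball count.
        total = urn["A"] + urn["B"]
        arm = "A" if random.random() < urn["A"] / total else "B"
        success = random.random() < success_prob[arm]
        reinforced = arm if success else ("B" if arm == "A" else "A")
        urn[reinforced] += beta
        assignments.append((arm, success))
    return assignments, urn

alloc, final_urn = rpw_allocate(100, {"A": 0.6, "B": 0.3})
print(final_urn)  # the better arm tends to accumulate more balls
```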
Bayesian response-adaptive randomization [11, 12] alters the allocation probabilities
based on the posterior probabilities of each treatment arm being the “best”. The
drop-the-loser type of allocation rule [13] completely removes a treatment arm from the
further randomization schedule. This gives patients a higher chance of receiving the
treatment that is performing better.
Sampling Rules. At each stage, the sampling rule determines how many subjects will be
sampled at the next stage. A sample size re-estimation (SSR) design consists of two stages
and a simple sampling rule that determines the sample size for the second stage in the
light of the first-stage data. This may depend on estimates of nuisance parameters such as
the variance or the response rate in the control arm, but not on the treatment difference. A restricted
sampling rule is one where the target sample size calculated before the trial serves as a
lower bound for the recalculated sample size [14, 15]. Blinded SSR rules calculate the
estimate of the nuisance parameter without unmasking treatment codes [16, 17]. They are
quite efficient [18-21] and less controversial than unblinded SSR rules [22, 23], which
require unmasking because the pooled variance depends on the sample mean in each arm.
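A minimal sketch of a blinded SSR rule in the spirit of the lumped-variance approach [16, 17] follows; the variance adjustment assumes 1:1 allocation and a normal endpoint, and all names and numbers here are illustrative assumptions.

```python
import math
from statistics import variance
from scipy.stats import norm

def blinded_ssr(pooled_responses, delta0, alpha=0.05, power=0.9, n_planned=None):
    s2_total = variance(pooled_responses)   # variance of masked, pooled data
    # Under 1:1 allocation a true difference delta0 inflates the pooled
    # variance by delta0**2 / 4; subtract that assumed contribution.
    s2 = max(s2_total - delta0**2 / 4, 1e-12)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n_per_arm = math.ceil(2 * s2 * (z_a + z_b) ** 2 / delta0**2)
    # A restricted rule keeps the pre-trial target as a lower bound [14, 15].
    return max(n_per_arm, n_planned) if n_planned else n_per_arm

print(blinded_ssr([5.1, 6.3, 4.8, 7.0, 5.9, 6.6, 4.4, 5.5], delta0=1.0))
```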
A traditional group sequential design uses a simple sampling rule with fixed (usually
equal) sample sizes per stage. On the other hand, the information-based design [24] uses
a sampling rule that keeps the maximum information fixed but adjusts the sample size in
order to achieve it. An error spending approach [25] allows the sample sizes for different
stages to vary but in a way that does not depend on the observations from previous
stages. Sequentially planned decision procedures [26-30] extend the group sequential
designs by allowing future stage sample sizes to depend on the current value of the test
statistic.
The most flexible SSR rules incorporate information on the estimated treatment
difference as well [31, 32]. The sample size for the next stage is determined by the
conditional power, defined as the probability of rejecting the null hypothesis at the end of
the study, conditional on the first-stage data. This probability is usually calculated under
the originally specified treatment difference and uses information not only on the
nuisance parameters but on all the observed data, by conditioning on the observed test
statistic. It is tempting to replace the originally specified treatment difference by its
interim estimate. This option, although proposed frequently [33-36], cannot be
recommended as a general strategy. The interim effect size is a random variable and will
lead to highly variable second stage sample sizes, including particularly large ones [37].
A cap on the maximum sample size is recommended in such situations [38].
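The sketch below illustrates such a rule: it computes the conditional power of a two-arm z-test under the originally specified standardized effect and searches for the smallest second-stage size reaching a target conditional power, truncated at a cap as recommended in [38]. The known-variance setting and all numbers are assumptions made for illustration.

```python
import math
from scipy.stats import norm

def conditional_power(z1, n1, n2, theta, alpha=0.025):
    """P(reject H0 at the end | stage-1 z-statistic z1); per-arm sizes n1, n2."""
    crit = norm.ppf(1 - alpha)
    # Final z combines independent stage increments: (sqrt(n1)z1 + sqrt(n2)z2)/sqrt(n1+n2).
    threshold = (crit * math.sqrt(n1 + n2) - math.sqrt(n1) * z1) / math.sqrt(n2)
    # Stage-2 z-statistic has drift theta * sqrt(n2 / 2) for a two-arm comparison.
    return 1 - norm.cdf(threshold - theta * math.sqrt(n2 / 2))

def second_stage_size(z1, n1, theta, target_cp=0.9, n2_cap=500):
    # Smallest per-arm stage-2 size reaching the target conditional power,
    # truncated at the cap to avoid runaway sample sizes.
    for n2 in range(1, n2_cap + 1):
        if conditional_power(z1, n1, n2, theta) >= target_cp:
            return n2
    return n2_cap

print(second_stage_size(z1=1.2, n1=100, theta=0.3))
```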
Stopping Rules. Stopping rules for clinical trials are intended to protect patients in the
trial from unsafe drugs or to hasten the approval of a beneficial treatment. There is a wide
range of statistical rules that can be used to determine whether to stop or continue a trial.
The majority of such stopping rules are applied to a single primary endpoint and are
constructed to satisfy a given power requirement in a hypothesis testing framework.
Stopping rules are now available for testing superiority, equivalence, noninferiority and
even safety aspects of clinical trials.
A trial may be stopped in the following three situations: first, if the experimental
treatment is clearly better than the control (superiority); second, if it is clearly worse than
the control (harm); and third, if it is clearly not going to be shown to be better than the
control (futility). Many stopping rules are based on boundary crossing methodology: at
any stage in the trial, a test statistic is calculated and compared with given stopping
boundaries, corresponding to one of the three objectives above; if one of them is
crossed, the trial is stopped and the appropriate conclusion drawn; otherwise the trial
continues to the next stage.
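A minimal sketch of this boundary-crossing logic follows, using O'Brien-Fleming-type efficacy boundaries for three looks and a simple futility/harm boundary at zero; the boundary constant and the futility rule are illustrative rather than derived here.

```python
import math

def interim_decision(z, stage, n_stages=3, c=2.004):
    # O'Brien-Fleming-type boundary c * sqrt(K / k) shrinks at later looks;
    # c = 2.004 is the approximate constant for K = 3 at two-sided alpha = 0.05.
    efficacy = c * math.sqrt(n_stages / stage)
    futility = 0.0  # stop if the treatment shows no advantage at all
    if z >= efficacy:
        return "stop for superiority"
    if z <= futility:
        return "stop for futility/harm"
    return "continue" if stage < n_stages else "stop: boundary not crossed"

for stage, z in enumerate([1.1, 2.6], start=1):
    print(stage, interim_decision(z, stage))
```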
Bayesian stopping rules are based on posterior probabilities of hypotheses of interest and
may be supplemented by making predictions of the possible consequences of continuing.
Each of the three objectives may be formalized by assessing the posterior probability that
the treatment benefit lies above or below some threshold. A skeptical prior can be used
for early stopping for efficacy and an enthusiastic prior for early stopping for futility
[39, 40].
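The sketch below illustrates such a Bayesian assessment with a conjugate normal-normal model: efficacy is judged under a skeptical prior centred at no effect and futility under an enthusiastic prior centred at the targeted benefit. The prior standard deviations, thresholds, and numbers are illustrative assumptions.

```python
from scipy.stats import norm

def posterior(theta_hat, se, prior_mean, prior_sd):
    # Conjugate normal update: precisions add, means combine precision-weighted.
    w = se**-2 + prior_sd**-2
    mean = (theta_hat / se**2 + prior_mean / prior_sd**2) / w
    return mean, w**-0.5

def stop_assessment(theta_hat, se, delta=0.5):
    # Efficacy judged against a skeptical prior centred at 0; futility against
    # an enthusiastic prior centred at the targeted benefit delta.
    m, s = posterior(theta_hat, se, prior_mean=0.0, prior_sd=0.25)
    p_benefit = 1 - norm.cdf(0.0, loc=m, scale=s)
    m2, s2 = posterior(theta_hat, se, prior_mean=delta, prior_sd=0.25)
    p_no_benefit = norm.cdf(0.0, loc=m2, scale=s2)
    if p_benefit > 0.975:
        return "stop for efficacy"
    if p_no_benefit > 0.9:   # even an enthusiast now doubts any benefit
        return "stop for futility"
    return "continue"

print(stop_assessment(theta_hat=0.4, se=0.15))
```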
Decision Rules. At any stage, additional decision rules can be considered, such as changing
the test statistic, redesigning multiple endpoints, selecting which hypothesis to test
(switching from superiority to non-inferiority [41, 42] or changing the hierarchical order
of hypotheses [43, 44]), or changing the patient population (e.g., going forward either with
the full population or with a pre-specified subpopulation).
To maximize the power of parametric trend tests in a dose-response trial, scores
corresponding to the typically unknown shape of the dose response curve have to be
applied. Using an adaptive combination test, one can use the first stage data to estimate
this shape and compute appropriate scores for the second stage test [45]. A similar idea
has been used for changing scores for the comparison of survival curves, if deviations
from the proportional hazards assumption are apparent based on the interim data [46].
Location-scale tests are used in situations where an increase in location is accompanied
by an increase in variability. A usual test statistic for such a test is the sum of a location
test statistic and a scale test statistic. This test can be improved by an adaptive two-stage
design in which the first stage uses the simple sum and the second stage uses a weighted
sum of the location and scale test statistics; the appropriate weights are estimated from
the first-stage data [47].
Another example of an adaptive choice of the test statistic is to include in the second-stage
test procedure a covariate which, at the interim analysis, shows an unexpected
variance-reducing effect not foreseen in the study protocol [48].
Decision rules for redesigning multiple endpoints include changing their pre-assigned
hierarchical order in multiple testing [49], updating their correlation in a reverse
multiplicity situation [50], excluding those that are not properly measured in terms of
variability and completeness [51], and updating the parameters in modeling the relationship
between the primary endpoint and auxiliary variables (biomarkers, short-term endpoints,
etc.) [12, 52].
After the first stage, one can perform another two-stage test at the level given by the
conditional error function [53]. This allows one to choose adaptively the number of interim
analyses based on the information collected so far. For example, if the sample size was
increased, one can add another interim analysis if the probability of an early decision is
high.
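A minimal sketch of the conditional error principle for a pre-planned fixed-sample one-sided z-test follows: given the stage-1 data, any redesigned remainder of the trial preserves the overall level as long as its own (possibly multi-stage) test is carried out at the conditional error level computed below. Per-arm sample sizes and all numbers are illustrative assumptions.

```python
import math
from scipy.stats import norm

def conditional_error(z1, n1, n_planned, alpha=0.025):
    """P(the original fixed-sample test rejects | stage-1 z1, H0 true)."""
    crit = norm.ppf(1 - alpha)
    # The remaining-data z-statistic is standard normal under H0.
    num = crit * math.sqrt(n_planned) - z1 * math.sqrt(n1)
    return 1 - norm.cdf(num / math.sqrt(n_planned - n1))

# The redesigned second part may itself be a two-stage test, run at this level:
level_for_rest = conditional_error(z1=1.5, n1=80, n_planned=200)
print(round(level_for_rest, 4))
```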
3. Classification of adaptive designs
Single arm trials
Standard Phase II studies are used to screen new treatments for activity and decide which
ones should be tested further. The decisions generally are based on single-arm studies
using short-term endpoints (response/no response) in a limited number of patients. The
problem is formulated as testing a hypothesis about some minimal acceptable probability
of response, allowing early stopping due to inactivity of the treatment.
An early approach [54] considered both estimation and testing. At the end of the first
stage a decision is made to abandon development of the new treatment if there have been
no responses observed. The sample size for the first stage is determined so as to give a
specified type I error rate. Following the first stage, the sampling rule calculates the
second stage sample size depending on the data from the first stage, so as to estimate the
unknown response rate with the specified precision. The design has been extended to
three stages [55, 56].
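In the spirit of this first-stage calculation, the stage-1 size can be found as the smallest n for which observing zero responses would be sufficiently unlikely under a minimally acceptable response rate; the sketch below reproduces the classical value of 14 for a 20% response rate and a 5% error, with the framing of the error rate following the text above.

```python
import math

def gehan_stage1_size(p_min=0.2, err=0.05):
    # P(0 responses in n patients | true rate p_min) = (1 - p_min)**n <= err
    return math.ceil(math.log(err) / math.log(1 - p_min))

print(gehan_stage1_size())  # 14, the classical first-stage size for p_min = 0.2
```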
Several group sequential designs with a fixed sampling rule have been proposed and
evaluated in the frequentist framework [57, 58]. An adaptive two-stage design allows the
sample size at the second stage to depend on the results at the first stage [59].
A Bayesian design [60] stops the trial for activity as soon as the posterior probability that
the true response rate is at least that of the standard exceeds 0.9, and stops for futility if
the posterior probability that the true response rate represents a considerable improvement
over the standard is less than 0.1.
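A minimal beta-binomial sketch of this monitoring rule follows; the standard response rate, the size of the "considerable improvement", and the uniform prior are illustrative assumptions, while the 0.9 and 0.1 cutoffs follow the text.

```python
from scipy.stats import beta

def bayes_monitor(responses, n, p_std=0.2, delta=0.15, a=1, b=1):
    # Beta(a, b) prior on the response rate, updated with interim data.
    post = beta(a + responses, b + n - responses)
    if post.sf(p_std) > 0.9:           # P(rate >= standard) exceeds 0.9
        return "stop: treatment active"
    if post.sf(p_std + delta) < 0.1:   # P(considerable improvement) below 0.1
        return "stop for futility"
    return "continue"

print(bayes_monitor(responses=9, n=20))
```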
Instead of evaluating each treatment in isolation, one after the other, an adaptive design
for the entire screening program can be considered [61-63]. The number of subjects per
screening trial is chosen to achieve the shortest possible time to identify a
"promising" compound, subject to given constraints on the type I and II risks for the
entire screening program.
Comparing two treatments
The main objective of large-scale Phase III clinical trials is to confirm the clinical benefit
of the experimental treatment by comparing it with a control (placebo or active). The
clinical benefit is expressed through a parameter, an unknown population characteristic
about which a hypothesis testing problem is formulated. A test statistic measures the
advantage of the experimental treatment over the control apparent from the sample of
data available at an interim analysis.
A sequential design uses a stopping rule that stops the trial at a given stage if the
boundary is crossed. If the test statistic stays within the boundaries, there is not
enough evidence to come to a conclusion and a further interim look should be taken. A
fully sequential design [64] has a very simple sampling rule: look after every observation.
Group sequential designs [65] have two or more stages at which the test statistic is
compared with the boundaries after groups of patients have been observed. These designs
have a simple allocation rule with fixed randomization and a decision rule that simply
determines whether to accept or reject the null hypothesis after stopping. The precise
form of the stopping rule is determined by consideration of significance level (Type I
error rate) and power at the specified alternative (desired treatment advantage on the
primary endpoint). The appropriate type of stopping rule should reflect the main
objective of the trial and the desirable reasons for stopping or continuing.
Traditionally, the purpose of group sequential designs in confirmatory trials (e.g., [66,
67]) was to stop early only under overwhelming evidence of treatment benefit. In such
cases, trials are said to have ‘stopped prematurely’, as if there were some correct size and
duration for every trial and falling short of it were inevitably suspect. However, there is no
correct sample size. All statistical sample size calculations are based on compromises and
assumptions. The compromises come from setting the clinically important treatment
difference at which the power is specified, and the assumptions involve the values of ‘nuisance
parameters’, such as the variance of a quantitative endpoint or success rate or survival
pattern of patients in the control arm.
Instead of relying on such compromises and assumptions, so-called adaptive group
sequential designs extend the group sequential design methodology by allowing not only
to stop early but also to increase the sample size or study duration when such an increase
is worthwhile. The p-value combination test is the cornerstone of the methodology.
Technical details may be found in [68, 69]. It has been shown that different approaches to
flexible designs via the conditional error function [33] or variance-spending approach
[70] or down weighting the second stage test statistic [34, 71] can be looked at in terms of
combination functions [31, 72].
Assume that a one-sided null hypothesis H0 is tested in a two-stage design. The test
decisions are based on p-values p1 and p2 calculated from the separate samples of the two
stages. Early decision boundaries are defined for p1: if p1 ≤ α1 (where α1 < α), the trial
stops after the interim analysis with an early rejection; if p1 > α0 (where α0 > α), it stops
with an acceptance of H0 (stopping for futility). If the trial proceeds to the second stage,
the decision in the final analysis is based on a combination function C(p1, p2) defined in
the study protocol: if C(p1, p2) ≤ c the null hypothesis is rejected, otherwise it is accepted.
The rejection boundary c has to be chosen so as to give a combination test for stochastically
independent p-values at the significance level α, and it will depend on the first-stage
decision boundaries α1 and α0. Examples of combination functions are Fisher’s product
test, the “inverse normal” method, the adaptively weighted z-score test, and the cumulative
sum of chi-square statistics.
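As an illustration, Fisher's product test with early decision boundaries can be implemented in a few lines. The boundary values below are illustrative; the expression for the rejection boundary c assumes independent, uniformly distributed p-values under H0 and c ≤ α1, as in [69].

```python
import math

def two_stage_fisher(p1, p2=None, alpha=0.025, alpha1=0.01, alpha0=0.5):
    if p1 <= alpha1:
        return "reject H0 at interim"
    if p1 > alpha0:
        return "accept H0 at interim (futility)"
    # Level condition: alpha = alpha1 + c * ln(alpha0 / alpha1), so solve for c.
    c = (alpha - alpha1) / math.log(alpha0 / alpha1)
    if p2 is None:
        return f"continue to stage 2; reject there iff p1*p2 <= {c:.5f}"
    return "reject H0" if p1 * p2 <= c else "accept H0"

print(two_stage_fisher(p1=0.08, p2=0.03))
```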
The methodology allows one to combine p-values from two independent samples,
regardless of whether or not they are based on the same endpoint, test statistic, etc.
Therefore, an adaptive sampling rule and a wide spectrum of decision rules described in
the previous section can be applied after the first stage to modify the design. The
recursive application of two-stage combination tests generalizes to flexible designs with
variable number of stages, see e.g. [72-74]. This larger class of adaptive designs is called
flexible designs.
Comparing more than two treatments
One of the first proposals for this kind of adaptive design was for establishing a
dose-response relationship [75]. The objective of the first stage in such a study is to obtain
some initial evidence of dose response on the primary endpoint, with an option of early
stopping for futility. The first-stage test statistic can be a linear trend test, and its p-value
is used in the combination function at the second stage. Dose selection is not
restricted to any specific decision rule; it may be based not only on the primary endpoint
but may also take into account the whole spectrum of safety data. Quite frequently it is not
the most efficacious treatment that is selected but, for example, a lower dose with
sub-maximal efficacy and a better overall benefit/risk ratio. Sample size re-estimation can be
performed in addition to the adaptive choice of the doses carried forward to the second stage.
The allocation ratio could be changed to randomize more patients to the most promising
dose. A closed multiple testing procedure controlling the familywise significance level
can be used for individual treatment comparisons [76, 77].
Adaptive model-based dose finding. The primary goal of a dose-finding study is to
establish the dose-response relationship. The optimal experimental design framework
provides enough structure to make this goal attainable. It is assumed that the available
doses (the design region) and the response variables have been defined and there exists a
known structure for the mathematical model describing the dose-response relationship
(the model). The focus is on choosing the dose levels in some optimal way to enhance the
process of estimating the unknown parameters of the model θ. The experimental designs
are represented by a set of design points (support points) and a corresponding set of
weights representing the allocations to the design points: ξ = {(xi, λi), i = 1, …, k}. An important
element in optimal design is the information matrix, say M(ξ, θ), which is an expression
of the accuracy of the estimate of θ based on observations at the k design points of the design ξ. A
"larger" value of M reflects more information (more precision, lower variability) in the
estimate. A natural goal in picking the design ξ is to find the design that "maximizes" the
determinant of the matrix M, the so-called D-optimal criterion [78-80].
A major challenge in design for nonlinear (in θ) models is that the optimal design ξ*
depends on θ, a conundrum: one is looking for the design ξ with the aim of estimating
the unknown θ, and yet one has to know θ to find the best ξ. This conundrum leads to
various ways of coping with the dependence on θ. These include the locally optimal
design, based on one's best guess at θ; the Bayesian design, which augments the criterion
to reflect the uncertainty in prior knowledge about θ; the minimax design, which finds the
design that is optimal under the worst value of θ; and the adaptive design, which alternates
between forming estimates of θ and choosing a locally optimal design for that value of
the parameter.
In an adaptive design, each new cohort of patients is allocated to the doses that maximize
the expected increment of information (in terms of selected criterion), given the current
interim data. The maximization is carried over the whole range of possible doses with
additional constraints that, for instance, may involve the probability of toxicity at those
doses, accommodating the maximum tolerated dose (MTD) mentality: dose escalate
cautiously, starting from the lowest dose. An initial design is chosen and preliminary
parameter estimates are obtained. Then the next dose(s) are selected from the available
range of doses that satisfy the efficacy and safety constraints and provide the maximal
improvement of the design with respect to the selected criterion of optimality and current
parameter estimates. The next available cohort of patients is allocated to this dose. The
estimates of unknown parameters are refined given these additional observations. These
design-estimation steps are repeated until either the available resources are exhausted or
the set of acceptable doses is empty. Such an approach is efficient from the perspective of
both time and patient resources.
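A minimal sketch of this design-estimation loop follows, for an assumed two-parameter logistic dose-response model with D-optimality as the criterion. The dose grid, cohort sizes, ridge-stabilized maximum likelihood fit, and "true" parameters used to simulate responses are all illustrative assumptions; the efficacy and safety constraints on candidate doses are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
DOSES = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
TRUE_THETA = np.array([-2.0, 0.8])      # used only to simulate responses

def fisher_info(x_given, theta):
    # M = sum of p(1-p) * f(x) f(x)^T over observations, f(x) = (1, x):
    # the information matrix M(xi, theta) for the logistic model.
    p = 1 / (1 + np.exp(-(theta[0] + theta[1] * x_given)))
    X = np.column_stack([np.ones_like(x_given), x_given])
    return X.T @ (X * (p * (1 - p))[:, None])

def fit_logistic(x, y, iters=25, ridge=0.01):
    # Ridge-stabilized Newton-Raphson for the two-parameter logistic MLE.
    theta = np.zeros(2)
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ theta)))
        grad = X.T @ (y - p) - ridge * theta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(2)
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def next_dose(x_so_far, theta):
    # D-optimal increment: the dose maximizing det M(design + new point, theta).
    gains = [np.linalg.det(fisher_info(np.append(x_so_far, d), theta))
             for d in DOSES]
    return DOSES[int(np.argmax(gains))]

def simulate(doses):
    p = 1 / (1 + np.exp(-(TRUE_THETA[0] + TRUE_THETA[1] * doses)))
    return (rng.random(doses.shape) < p).astype(float)

x = np.repeat(DOSES, 2)                 # initial design: two patients per dose
y = simulate(x)
for _ in range(10):                     # ten adaptive cohorts of three patients
    theta_hat = fit_logistic(x, y)
    d = next_dose(x, theta_hat)
    x_new = np.full(3, d)
    x, y = np.append(x, x_new), np.append(y, simulate(x_new))
print("estimate after adaptation:", fit_logistic(x, y).round(2))
```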
D-optimal designs, in general, are concerned mainly with collective ethics: doing in the
dose-finding study what is best for future patients who stand to benefit from the results of
the trial. In contrast, alternative procedures for dose-finding studies have been proposed
that are mainly concerned with individual ethics: doing what is best for current patients in
the trial. The continual reassessment method (CRM) [81] was the first such method; it
formulates the goal of dose escalation in a Phase I trial as maximizing patient gain.
Similar procedures are considered in [82-84]. Note, however, that although these
designs rely on the noble intention to maximize individual gain by allocating each patient
to the "best" known dose, individual ethics may well be compromised by the "poor
learning" about the "best" dose under such a design.
Pocock [85] points out that each clinical trial involves a balance between individual and
collective ethics, and such a balance is never simple. Of course, collective ethics should
never usurp individual ethics. Dragalin and Fedorov [86] made one of the first attempts
to formalize the goal of a dose-finding study as a penalized D-optimal design problem:
find the design that maximizes the information (the collective ethics of society) under
control of the total penalty for treating patients in the trial (the individual ethics of all
individuals in the trial).
A comprehensive overview of many adaptive designs in Phase I clinical trials is given in
[87-89]. Most designs for dose-finding in Phase I clinical trials determine an MTD based
on toxicity alone, while ignoring efficacy response. Most Phase II designs assume that a
toxicity acceptable dose range has been determined and aim to establish treatment
efficacy at some dose in this range, with early stopping if response rate is too low.
However, under a variety of circumstances, it is useful to address safety and efficacy
simultaneously. A class of models that can be used in early-phase clinical trials in which
patient response is characterized by two dependent binary outcomes, one for efficacy and
one for toxicity, has been proposed [86]; from these models, response-adaptive designs
with both dose allocation rules and early stopping rules in terms of response and toxicity
have been derived, combining elements of the more typical Phase I and Phase II trials.
Similar designs can be used in drug combination studies [90]. For the same situation,
Bayesian adaptive designs have been also proposed [91-97].
We refer to Gaydos et al. [98] in this volume for additional information on adaptive
designs in dose-finding studies.
Seamless Phase II/III designs. Important opportunities are also available for seamless
designs that combine traditional Phase IIb and Phase III of clinical development
into a single trial, both operationally and inferentially, i.e., conducting both treatment
selection and confirmation of treatment efficacy over control under a single protocol
in which all the data are appropriately used in the final analysis. For a detailed description
of this type of adaptive design, see Maca et al. [99] in this volume.
4. Conclusion
The objective of a clinical trial may be either to target the MTD or minimum effective
dose, or to find the therapeutic range, or to determine the optimal safe dose to be
recommended for confirmation, or to confirm efficacy over control in Phase III clinical
trial. This clinical goal is usually determined by the clinicians from the pharmaceutical
industry, practicing physicians, key opinion leaders in the field, and the regulatory
agency. Once agreement has been reached on the objective, it is the statistician's
responsibility to provide the appropriate design and statistical inferential structure
required to achieve that goal. There are plenty of designs available on the statistician’s shelf.
The greatest challenge is their implementation. For logistical and procedural issues in the
implementation of adaptive designs see Quinlan et al. [100].
The recently released FDA “Critical Path Opportunities Report” [101] emphasizes that
“the two most important areas for improving medical product development are
biomarker development (Topic 1) and streamlining clinical trials (Topic 2)”. Adaptive
designs for clinical trials provide efficient tools to demonstrate the safety and
effectiveness of new medical products in faster timeframes with more certainty, at lower
costs, and with better information.
While adaptive designs are not appropriate for all drug development programs, it is
considered critical to the success of a pharmaceutical company that R&D increase the
utilization of adaptive designs in clinical development plans wherever feasible.
References
1. Rosenberger WF, Lachin JM. Randomization in Clinical Trials: Theory and Practice. Wiley, 2002.
2. Taves DR. Minimization: a new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics 1974; 15:443-453.
3. Zelen M. The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases 1974; 28:365-375.
4. Pocock SJ, Simon R. Sequential treatment assignment with balancing prognostic factors in the controlled clinical trials. Biometrics 1975; 31:103-115.
5. Wei LJ. An application of an urn model to the design of sequential controlled clinical trials. JASA 1978; 73:559-563.
6. Atkinson AC. Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika 1982; 69:61-67.
7. Atkinson AC. Optimum biased-coin designs for sequential treatment allocation with covariate information. Statistics in Medicine 1999; 18:1741-1752.
8. Robbins H. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 1952; 58:527-535.
9. Zelen M. Play the winner and the controlled clinical trial. JASA 1969; 64:131-146.
10. Eisele JR. The doubly adaptive biased coin design for sequential clinical trials. Journal of
Statistical Planning and Inference 1994; 38:249-261.
11. Berry D. Adaptive trials and Bayesian statistics in drug development. Biopharmaceutical Report
2001; 9:1-11. (with comments).
12. Berry D. Bayesian statistics and the efficiency and ethics of clinical trials. Statistical Science
2004; 19:175-187.
13. Sampson AR, Sill MW. Drop-the-Losers design: normal case. Biometrical Journal 2005; 47:257-268.
14. Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Statistics in Medicine 1994;
13:2455-2463.
15. Herson J, Wittes J. The use of interim analysis for sample size adjustment. Drug Information
Journal 1993; 27:753-760.
16. Gould AL. Interim analysis for monitoring clinical trials that do not materially affect the type I
error rate. Statistics in Medicine 1992; 11:53-66.
17. Gould AL, Shih WJ. Sample size reestimation without unblinding for normally distributed outcomes with unknown variance. Communications in Statistics - Theory and Methods 1992; 21(10):2833-2853.
18. Wittes JT, Schabenberger O, Zucker DM, Brittain E, Proschan M. Internal pilot studies I: Type I
error rate of the naive t-test. Statistics in Medicine 1999; 18:3481-3491.
19. Zucker DM, Wittes JT, Schabenberger O, Brittain E. Internal pilot studies II: Comparison of
various procedures. Statistics in Medicine 1999;18: 3493-3509.
20. Kieser M, Friede T. Re-calculating the sample size in internal pilot study designs with control of
the type I error rate. Statistics in Medicine 2000;19: 901-911.
21. Kieser M, Friede T. Blinded sample size reestimation in multiarmed clinical trials. Drug
Information Journal 2000;34: 455-460.
22. Gould L. Sample-size re-estimation: recent developments and practical considerations. Statistics in
Medicine 2001; 20:2625-2643.
23. Friede T, Kieser M. A comparison of methods for adaptive sample size adjustment. Statistics in
Medicine 2001; 20:3861-3873.
24. Mehta CR, Tsiatis AA. Flexible sample size considerations using information-based interim
monitoring. Drug Information Journal 2001; 35: 1095-1112.
25. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70:
659-663.
26. Schmitz N. Optimal Sequentially Planned Decision Procedures. Lecture Notes in Statistics, vol.
79. Springer: New York, 1993.
27. Cressie N, Morgan PB. The VPRT: a sequential testing procedure dominating the SPRT.
Econometric Theory 1993; 9:431-450.
28. Morgan PB, Cressie N. A comparison of cost-efficiencies of the sequential, group-sequential, and
variable-sample-size-sequential probability ratio tests. Scandinavian Journal of Statistics 1997;
24:181-200.
29. Bartroff J. Optimal multistage sampling in a boundary-crossing problem. Sequential Analysis
2006; 25:59-84.
30. Jennison C, Turnbull BW. Efficient group sequential designs when there are several effect sizes
under consideration. Statistics in Medicine 2006; 25:917-932.
31. Posch M, Bauer P. Adaptive two stage designs and the conditional error function. Biometrical J.
1999; 41:689-696.
32. Posch M, Bauer P. Interim analysis and sample size assessment. Biometrics 2000; 56:1170-1176.
33. Proschan MA, Hunsberger SA. Designed extension of studies based on conditional power.
Biometrics 1995; 51:1315-1324.
34. Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials.
Biometrics 1999; 55:853-857.
35. Liu Q, Chi GYH. On sample size and inference for two-stage adaptive designs. Biometrics
2001;57: 172-177.
36. Li G, Shih WJ, Xie T, Lu J. A sample size adjustment procedure for clinical trials based on
conditional power. Biostatistics 2002; 3:277–287.
37. Bauer P, Koenig F. The reassessment of trial perspectives from interim data—a critical view.
Statistics in Medicine 2006; 25:23-36.
38. Posch M, Bauer P, Brannath W. Issues in designing flexible trials. Statistics in Medicine 2003;
22:953-969.
39. Spiegelhalter D, Freedman L, Blackburn P. Monitoring clinical trials: Conditional or predictive
power? Control Clin Trials 1986; 7:8-17.
40. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley, 2004.
41. Wang SJ, Hung HMJ, Tsong Y, Cui L. Group sequential test strategies for superiority and non-inferiority hypotheses in active controlled clinical trials. Statistics in Medicine 2001; 20:1903-1912.
42. Brannath W, Bauer P, Maurer W, Posch M. Sequential tests for non-inferiority and superiority.
Biometrics 2003; 59:106 –114.
43. Kropf S, Hommel G, Schmidt U, Brickwedel J, Jepsen MS. Multiple comparison of treatments
with stable multivariate tests in a two-stage adaptive design, including a test for non-inferiority.
Biometrical Journal 2000; 42:951–965.
44. Hommel G, Kropf S. Clinical trials with an adaptive choice of hypotheses. Drug Inf J. 2001; 35:
1423–1429.
45. Lang T, Auterith A, Bauer P. Trend tests with adaptive scoring. Biometrical Journal 2000;
42:1007–1020.
46. Lawrence J. Design of clinical trials using an adaptive test statistics. Pharmaceutical Statistics
2002; 1: 97-106.
47. Neuhäuser M. An adaptive location-scale test. Biometrical Journal 2001; 43:809–819.
48. Wang SJ, Hung HMJ. Adaptive covariate adjustment in clinical trials. J Biopharm. Statistics 2005;
15: 605-612.
49. Hommel G. Adaptive modifications of hypotheses after an interim analysis. Biometrical Journal
2001; 43:581–589.
50. Offen W et al. Multiple co-primary endpoints: Medical and statistical solutions. Drug Information Journal 2006 (to appear).
51. Kieser M, Bauer P, Lehmacher W. Inference on multiple endpoints in clinical trials with adaptive
interim analyses. Biometrical Journal. 1999; 41: 261-277.
52. Inoue LYT, Thall PF, Berry DA. Seamlessly expanding a randomized phase II trial to phase III.
Biometrics 2002; 58 823–831.
53. Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential approaches. Biometrics 2001; 57:886-891.
54. Gehan EA. The determination of number of patients in a follow-up trial of a new chemotherapeutic agent. Journal of Chronic Diseases 1961; 13:346-353.
55. Chen TT. Optimal three-stage designs for phase II cancer trials. Biometrics 1997; 43:865-874.
56. Chen S, Soong SJ, Wheeler RH. An efficient multi-stage procedure for phase II clinical trials that
have high response rate objectives. Controlled Clinical Trials 1994; 15:277-283.
57. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982;
38:143-151.
58. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 1989;
10:1-10.
59. Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Statistics in
Medicine 2006; 25: (in press).
60. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics 1994;
50:337-349.
61. Wang YG, Leung DHY. An optimal design for screening trials. Biometrics 1998; 54:243-250.
62. Yao TJ, Venkatraman E. Optimal two-stage design for a series of pilot trials of new agents. Biometrics 1998; 54:1183-1189.
63. Hardwick J, Stout QF. Optimal few-stage designs. Journal of Statistical Planning and Inference 2002; 104:121-145.
64. Siegmund D. Sequential Analysis. Tests and Confidence Intervals. Springer, New York, 1985.
65. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials.
Chapman & Hall, Boca Raton, London, New York, Washington, D.C., 2000.
66. Haybittle J: Repeated assessment of results in clinical trials of cancer treatment. Br J Radiol 1971;
44:793-797.
67. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35:
549-556.
68. Bauer P. Multistage testing with adaptive designs. Biom. und Inform. in Med. und Biol. 1989; 20:
130-148.
69. Bauer P, Köhne K. Evaluation of experiments with adaptive interim analyses. Biometrics 1994;
50:1029-1041.
70. Fisher LD. Self-designing clinical trials. Statistics in Medicine 1998; 17:1551-1562.
71. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials.
Biometrics 1999; 55:1286-1290.
72. Brannath W, Posch M, Bauer P. Recursive combination tests. JASA 2002; 97:236-244.
73. Müller HH, Schäfer H. A general statistical principle for changing a design any time during the
course of a trial. Statistics in Medicine 2004; 23:2497-2508.
74. Müller HH, Schäfer H. Construction of group sequential designs in clinical trials on the basis of
detectable treatment differences. Statistics in Medicine 2004; 23:1413-1424.
75. Bauer P, Röhmel J. An adaptive method for establishing a dose-response relationship. Statistics in
Medicine 1995;14: 1595-1607.
76. Lehmacher W, Kieser M, Hothorn L. Sequential and multiple testing for dose-response analysis.
Drug Inf. J. 2000; 34: 591-597.
77. Liu Q, Proschan MA, Pledger GW. A unified theory of two-stage adaptive designs. JASA 2002;
97:1034-1041.
78. Fedorov V, Hackl P. Model-Oriented Design of Experiments. Springer, 1997.
79. Fedorov V, Leonov S. Optimal design for dose response experiments: a model-oriented approach.
Drug Inf J. 2001; 35:1373-1383.
80. Fedorov V, Leonov S. Response driven designs in drug development. 2005, In: Wong, W.K.,
Berger, M. (eds.), "Applied Optimal Designs", Wiley.
81. O'Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I
clinical trials in cancer. Biometrics 1990; 46: 33-48.
82. Babb J, Rogatko A, Zacks S. Cancer phase I clinical trials: efficient dose escalation with overdose
control. Statistics in Medicine 1998; 17:1103-1120.
83. Haines LM, Perevozskaya I, Rosenberger WF. Bayesian optimal designs for Phase I clinical trials.
Biometrics 2003; 59:591-600.
84. Whitehead J, Williamson D. An evaluation of Bayesian decision procedures for dose-finding
studies. J. Biopharm. Stat. 1998; 8:445-467.
85. Pocock SJ. Clinical Trials. Chichester: Wiley, 1983.
86. Dragalin V, Fedorov V. Adaptive designs for dose-finding based on efficacy-toxicity response.
Journal of Statistical Planning and Inference 2006; 136, 1800-1823.
87. Edler L. Overview of Phase I trials. In: Crowley J, ed. Statistics in Clinical Oncology. New York:
Marcel Dekker, Inc; 2001: 1-34.
88. O’Quigley J. Dose-finding designs using Continual Reassessment Method. In: Crowley J, ed.
Statistics in Clinical Oncology. New York: Marcel Dekker, Inc; 2001: 35-72.
89. Storer BE. Choosing a Phase I design. In: Crowley J, ed. Statistics in Clinical Oncology. New
York: Marcel Dekker, Inc; 2001: 73-91.
90. Dragalin V, Fedorov V, Wu Y. Adaptive designs for selecting drug combinations based on
efficacy-toxicity response. Journal of Statistical Planning and Inference 2006; (to appear).
91. Thall P, Russell K. A strategy for dose-finding and safety monitoring based on efficacy and adverse outcomes in phase I/II clinical trials. Biometrics 1998; 54:251-264.
92. Thall PF, Cook JD. Dose-finding based on efficacy-toxicity trade-offs. Biometrics 2004; 60:684-693.
93. O'Quigley J, Hughes M, and Fenton T. Dose finding designs for HIV studies. Biometrics 2001;
57:1018-1029.
94. Braun T. The bivariate continual reassessment method: extending the CRM to phase I trials of two
competing outcomes. Controlled Clinical Trials 2002; 23: 240-256.
95. Whitehead J, Zhou Y, Stevens J, Blakey G. An evaluation of a Bayesian method of dose escalation based on bivariate binary responses. J. Biopharm. Stat. 2004; 14:969-983.
96. Bekele BN, Shen Y. A Bayesian approach to jointly modeling toxicity and biomarker expression
in a Phase I/II dose-finding trial. Biometrics, 2005; 61: 343-354.
97. Thall P, Millikan RE, Mueller P, Lee SJ. Dose-finding with two agents in Phase I oncology trials. Biometrics 2003; 59:487-496.
98. Gaydos B et al. Adaptive dose-response. Drug Inf J. 2006 (submitted).
99. Maca J, Bhattacharya S, Dragalin V, Gallo P, Krams M. Adaptive Seamless Phase II / III Designs
– Background, Operational Aspects, and Examples. Drug Inf J. 2006 (submitted).
100. Quinlan JA, Gallo P, Krams M. Implementing adaptive designs: logistical and operational
considerations . Drug Inf J. 2006 (submitted).
101. FDA. Critical Path Opportunity List. 2006. http://www.fda.gov/oc/initiatives/criticalpath/