Adaptive Designs: Terminology and Classification

1. Introduction

Recent achievements in the methodology of adaptive designs provide new approaches to drug development that have the potential to improve the quality, speed and efficiency of decision making. By introducing flexibility into the trial design, this approach saves resources by identifying failures early and increases efficiency by focusing precious patient resources on treatments that have a higher probability of success. While clearly advantageous to the drug development program, this is also ethically beneficial to the patients in the trial, as it limits patient exposure to ineffective treatments. Unfortunately, as often happens with novel approaches, there has been substantial confusion over what these designs are and when they are most applicable. They are variously known as adaptive, sequential, flexible, self-designing, multi-stage, dynamic, response-driven, smart, or novel designs. We propose here an integrated approach to defining and classifying adaptive designs in order to minimize confusion over their terminology and taxonomy across the pharmaceutical industry, its stakeholders and analysts, and regulatory agencies.

The primary purpose of this paper is to describe the range of adaptive designs that are available and to promote the benefits they might bring in all phases of clinical drug development. It is worth emphasizing that these designs have much more to offer than the rigid conventional parallel group designs used in clinical trials. To maintain focus within the space available, we do not attempt an exhaustive literature review. Rather, we focus on key ideas and cite supportive literature as appropriate. Section 2 gives a general definition of adaptive designs and their structure. Section 3 provides a classification of adaptive designs.

2. Definition of adaptive designs

An adaptive design is defined as a multi-stage study design that uses accumulating data to decide how to modify aspects of the study without undermining the validity and integrity of the trial. Maintaining study validity means providing correct statistical inference (such as adjusted p-values, unbiased estimates and adjusted confidence intervals), ensuring consistency between the different stages of the study, and minimizing operational bias. Maintaining study integrity means providing convincing results to the broader scientific community, preplanning, as much as possible, the intended adaptations, and maintaining the blinding of interim analysis results. An adaptive design requires the trial to be conducted in multiple stages with access to the accumulated data. An adaptive design may apply one or more of the following rules at an interim look:

Allocation Rule: how subjects will be allocated to the available arms. This can be a fixed randomization, say 1:1, throughout the trial, or it may be adaptive, with the randomization ratio changing from stage to stage based on accruing data. This also includes the decision to drop or add treatment arms.

Sampling Rule: how many subjects will be sampled at the next stage. This may depend on estimates of accrual so far, on estimates of nuisance parameters (e.g., the variance), or even on estimates of the treatment effect. In dose-escalation studies this is the cohort size per stage.

Stopping Rule: when to stop the trial. There are many reasons for stopping a trial: for efficacy, for harm, for futility, or for safety.
Decision Rule: the final decision and any interim decisions pertaining to design changes not covered by the previous three rules, e.g., to update the model, to change the endpoint, or to modify the initial design.

At any stage, the data may be analyzed and subsequent stages redesigned taking into account all available data. This definition includes group sequential designs, for which the only design revision is stopping the study early given sufficiently strong evidence of a treatment effect difference. Another kind of adaptive design aims to treat patients in the study as efficiently as possible, using response-adaptive allocation in which patients are more likely to be assigned to the treatment that appears to be more effective according to the observed responses. Sample size re-assessment or "internal pilot" designs involve the recalculation of the sample size based on interim information about the values of nuisance parameters. While each of these three designs allows just one of the adaptation rules, the most recent class of adaptive designs, also known as flexible designs, allows for an adaptive allocation rule (changing the randomization from stage to stage), an adaptive sampling rule (the timing of the next interim analysis), a stopping rule, as well as other modifications made following interim analyses (an adaptive decision rule), including changing the target treatment difference for which the study is powered, changing the primary endpoint, or varying the form of the primary analysis. Although statistical methodology has been developed to allow for these types of adaptive designs, these methods should never be used to replace careful planning of the statistical design of a clinical trial. Before starting the trial, an efficient design must be detailed in the protocol. Adaptive design methodology then provides a valuable tool for reasonable design changes. We describe below the four elements of an adaptive design.

Allocation Rules. At each stage, the allocation rule determines how new patients will be assigned to the available treatments. An allocation rule may be fixed (static) during the study or may be adaptive (dynamic), changing from stage to stage according to previous treatment assignments and/or patient responses. A fixed allocation rule does not necessarily mean a deterministic rule. On the contrary, randomization (random allocation) of patients is usually used to achieve balance in all known and unknown, observed and unobserved covariates (prognostic factors) at baseline. However, a fixed allocation rule uses allocation probabilities that are determined in advance and are not changed during the trial. Complete randomization uses equal allocation probabilities to balance treatment assignments. Stratification can be used to improve the randomization, but the number of covariates that can be accommodated is limited. A permuted block design can also be used, but this method has the disadvantage of high predictability at the investigator site level. Restricted randomization with fixed unequal allocation probabilities is used for unbalanced treatment allocation. Rosenberger and Lachin [1] develop this subject in more depth than is possible here. By contrast, an adaptive allocation rule dynamically alters the allocation probabilities to reflect the data accruing in the trial. Covariate-adaptive randomization [2-5] ensures balance between treatment arms with respect to known covariates.
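To make the allocation-rule idea concrete, the following is a minimal Python sketch of a Pocock-Simon-type minimization rule for covariate-adaptive randomization. The function name, the use of the range of per-arm counts as the imbalance measure, and the biasing probability p_best are illustrative assumptions of this sketch, not prescriptions from the literature cited above.

    import numpy as np

    def minimization_assignment(new_covs, assigned_covs, assigned_arms,
                                n_arms=2, p_best=0.8, rng=None):
        """Assign a new patient to the arm that minimizes total marginal covariate
        imbalance with probability p_best; otherwise randomize among the other arms.
        new_covs:      dict covariate -> level for the new patient, e.g. {'sex': 'F'}
        assigned_covs: list of such dicts for patients already randomized
        assigned_arms: list of their arm indices (0 .. n_arms-1)"""
        rng = rng or np.random.default_rng()
        imbalance = np.zeros(n_arms)
        for arm in range(n_arms):
            for cov, level in new_covs.items():
                # per-arm counts of patients sharing this covariate level,
                # if the new patient were added to `arm`
                counts = np.zeros(n_arms)
                for covs, a in zip(assigned_covs, assigned_arms):
                    if covs.get(cov) == level:
                        counts[a] += 1
                counts[arm] += 1
                imbalance[arm] += counts.max() - counts.min()  # range as imbalance measure
        best = int(np.argmin(imbalance))
        if rng.random() < p_best:
            return best
        return int(rng.choice([a for a in range(n_arms) if a != best]))

    # Hypothetical usage: third patient with two binary covariates
    # arm = minimization_assignment({'sex': 'F', 'age': '<65'},
    #                               [{'sex': 'F', 'age': '>=65'}, {'sex': 'M', 'age': '<65'}],
    #                               [0, 1])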
Rather than balancing over known covariates, the optimal design approach [6, 7] minimizes the variance of the treatment effect estimator in the presence of covariates. Response-adaptive randomization uses interim data to unbalance the allocation probabilities in favor of the treatment arms showing comparatively superior outcomes. The simplest example is the randomized play-the-winner rule [8, 9], in which a success on one treatment results in the next patient being assigned to the same treatment, with assignment switching to the alternative treatment only in the event of a failure. More complex and flexible allocation rules can be obtained by using urn models [1]: the allocation probabilities are changed during the course of the trial to reflect the known outcomes of patients by adding balls of an appropriate color to the urn. The doubly adaptive biased coin design [10] adapts allocations based on previous treatment group assignments as well as on the outcome information. Bayesian response-adaptive randomization [11, 12] alters the allocation probabilities based on the posterior probabilities of each treatment arm being the "best". The drop-the-loser type of allocation rule [13] removes a treatment arm completely from the further randomization schedule. These rules give patients a higher chance of receiving the treatment that is performing better.

Sampling Rules. At each stage, the sampling rule determines how many subjects will be sampled at the next stage. A sample size re-estimation (SSR) design consists of two stages and a simple sampling rule that determines the sample size for the second stage in the light of the first stage data. This may depend on estimates of nuisance parameters, such as the variance or the response rate in the control arm, but not on the treatment difference. A restricted sampling rule is one where the target sample size calculated before the trial serves as a lower bound for the recalculated sample size [14, 15]. Blinded SSR rules calculate the estimate of the nuisance parameter without unmasking the treatment codes [16, 17]. They are quite efficient [18-21] and less controversial than unblinded SSR rules [22, 23], which require unmasking because the pooled variance depends on the sample mean in each arm.

A traditional group sequential design uses a simple sampling rule with fixed (usually equal) sample sizes per stage. By contrast, the information-based design [24] uses a sampling rule that keeps the maximum information fixed but adjusts the sample size in order to achieve it. An error spending approach [25] allows the sample sizes for different stages to vary, but in a way that does not depend on the observations from previous stages. Sequentially planned decision procedures [26-30] extend group sequential designs by allowing future stage sample sizes to depend on the current value of the test statistic. The most flexible SSR rules incorporate information on the estimated treatment difference as well [31, 32]. The sample size for the next stage is determined by the conditional power, defined as the probability of rejecting the null hypothesis at the end of the study, conditional on the first-stage data. This probability is usually calculated under the originally specified treatment difference and uses information not only on the nuisance parameters but on all the observed data, by conditioning on the observed test statistic. It is tempting to replace the originally specified treatment difference by its interim estimate.
This option, although proposed frequently [33-36], cannot be recommended as a general strategy. The interim effect size is a random variable and will lead to highly variable second stage sample sizes, including particularly large ones [37]. A cap on the maximum sample size is recommended in such situations [38].

Stopping Rules. Stopping rules for clinical trials are intended to protect patients in the trial from unsafe drugs or to hasten the approval of a beneficial treatment. There is a wide range of statistical rules that can be used to determine whether to stop or continue a trial. The majority of such stopping rules are applied to a single primary endpoint and are constructed to satisfy a given power requirement in a hypothesis testing framework. Stopping rules are now available for testing superiority, equivalence, non-inferiority and even safety aspects of clinical trials. A trial may be stopped in the following three situations: first, if the experimental treatment is clearly better than the control (superiority); second, if it is clearly worse than the control (harm); and third, if it is clearly not going to be shown to be better than the control (futility). Many stopping rules are based on boundary crossing methodology: at any stage in the trial, a test statistic is calculated and compared with given stopping boundaries corresponding to the three objectives above; if one of them is crossed, the trial is stopped and the appropriate conclusion drawn, otherwise the trial continues to the next stage. Bayesian stopping rules are based on posterior probabilities of hypotheses of interest and may be supplemented by predictions of the possible consequences of continuing. Each of the three objectives may be formalized by assessing the posterior probability that the treatment benefit lies above or below some threshold. A skeptical prior can be used for early stopping for efficacy and an enthusiastic prior for early stopping for futility [39, 40].

Decision Rules. At any stage, additional decision rules can be considered, such as changing the test statistic, redesigning multiple endpoints, selecting which hypothesis is to be tested (switching from superiority to non-inferiority [41, 42] or changing the hierarchical order of hypotheses [43, 44]), or changing the patient population (e.g., going forward either with the full population or with a pre-specified subpopulation). To maximize the power of parametric trend tests in a dose-response trial, scores corresponding to the typically unknown shape of the dose-response curve have to be applied. Using an adaptive combination test, one can use the first stage data to estimate this shape and compute appropriate scores for the second stage test [45]. A similar idea has been used for changing the scores for the comparison of survival curves when deviations from the proportional hazards assumption are apparent from the interim data [46]. Location-scale tests are used in situations where an increase in location is accompanied by an increase in variability. A usual test statistic for such a test is the sum of a location and a scale test statistic. This test can be improved by an adaptive two-stage design in which the sum is used in the first stage and a weighted sum of the location and scale statistics is used in the second stage, with the weights estimated from the first stage data [47].
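To make the last idea concrete, the following Python sketch implements a simplified adaptive two-stage location-scale test. It uses normal-theory standardized contrasts and within-stage permutation p-values rather than the specific statistics of [47]; the weighting scheme, function names and permutation count are illustrative assumptions, and the two stage-wise p-values would then be combined with a prespecified combination function of the kind described in Section 3.

    import numpy as np

    def loc_scale_stat(x, y, w_loc=1.0, w_scale=1.0):
        """Weighted sum of standardized location and scale contrasts, treatment y vs control x.
        The scale contrast compares absolute deviations from the group medians."""
        x, y = np.asarray(x, float), np.asarray(y, float)

        def std_diff(a, b):
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            return (a.mean() - b.mean()) / se

        loc = std_diff(y, x)
        sca = std_diff(np.abs(y - np.median(y)), np.abs(x - np.median(x)))
        return w_loc * loc + w_scale * sca, loc, sca

    def perm_pvalue(x, y, w_loc, w_scale, n_perm=5000, seed=1):
        """One-sided permutation p-value of the weighted statistic within one stage."""
        rng = np.random.default_rng(seed)
        obs, _, _ = loc_scale_stat(x, y, w_loc, w_scale)
        pooled = np.concatenate([x, y])
        exceed = 0
        for _ in range(n_perm):
            perm = rng.permutation(pooled)
            stat, _, _ = loc_scale_stat(perm[:len(x)], perm[len(x):], w_loc, w_scale)
            exceed += stat >= obs
        return (exceed + 1) / (n_perm + 1)

    def adaptive_location_scale_test(x1, y1, x2, y2, n_perm=5000):
        """Stage 1 uses the plain sum; stage 2 weights are estimated from stage 1 data."""
        p1 = perm_pvalue(x1, y1, 1.0, 1.0, n_perm)
        _, loc1, sca1 = loc_scale_stat(x1, y1)
        total = abs(loc1) + abs(sca1)
        w_loc = abs(loc1) / total if total > 0 else 0.5
        p2 = perm_pvalue(x2, y2, w_loc, 1.0 - w_loc, n_perm)
        return p1, p2  # combine with a prespecified combination function C(p1, p2)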
Another example of an adaptive choice of the test statistic is to include in the second stage test procedure a covariate which, in the interim analysis, shows an unexpected variance-reducing effect not foreseen in the study protocol [48]. Decision rules for redesigning multiple endpoints include changing their pre-assigned hierarchical order in multiple testing [49], updating their correlation in a reverse multiplicity situation [50], excluding those that are not properly measured in terms of variability and completeness [51], and updating the parameters in modeling the relationship between the primary endpoint and auxiliary variables (biomarkers, short-term endpoints, etc.) [12, 52]. After the first stage, one can perform another two-stage test with the level given by the conditional error function [53]. This allows the number of interim analyses to be chosen adaptively based on the information collected so far. For example, if the sample size has been increased, one can add another interim analysis if the probability of an early decision is high.

3. Classification of adaptive designs

Single arm trials

Standard Phase II studies are used to screen new treatments for activity and to decide which ones should be tested further. The decisions are generally based on single-arm studies using short-term endpoints (response/no response) in a limited number of patients. The problem is formulated as hypothesis testing about some minimal acceptable probability of response, allowing early stopping due to inactivity of the treatment. An early approach [54] considered both estimation and testing. At the end of the first stage a decision is made to abandon development of the new treatment if no responses have been observed. The sample size for the first stage is determined so as to give a specified type I error rate. Following the first stage, the sampling rule calculates the second stage sample size, depending on the data from the first stage, so as to estimate the unknown response rate with the specified precision. The design has been extended to three stages [55, 56]. Several group sequential designs with a fixed sampling rule have been proposed and evaluated in the frequentist framework [57, 58]. An adaptive two-stage design allows the sample size at the second stage to depend on the results of the first stage [59]. A Bayesian design [60] stops the trial for activity as soon as the posterior probability that the true response rate is at least that of the standard exceeds 0.9, and stops for futility if the posterior probability that the true response rate represents a considerable improvement over the standard is less than 0.1. Instead of evaluating each treatment in isolation, one after the other, an adaptive design for the entire screening program can be considered [61-63]. The number of subjects per screening trial is chosen to minimize the time needed to identify a "promising" compound, subject to given constraints on the type I and II risks for the entire screening program.

Comparing two treatments

The main objective of large-scale Phase III clinical trials is to confirm the clinical benefit of the experimental treatment by comparing it with a control (placebo or active). The clinical benefit is expressed through a parameter, an unknown population characteristic about which a hypothesis testing problem is formulated. A test statistic measures the advantage of the experimental treatment over the control apparent from the sample of data available at an interim analysis.
A sequential design uses a stopping rule that stops the trial at a given stage if a boundary is crossed. If the test statistic stays within the boundaries, there is not enough evidence to come to a conclusion and a further interim look should be taken. A fully sequential design [64] has a very simple sampling rule: look after every observation. Group sequential designs [65] have two or more stages at which the test statistic is compared with the boundaries after groups of patients have been observed. These designs have a simple allocation rule with fixed randomization and a decision rule that simply determines whether to accept or reject the null hypothesis after stopping. The precise form of the stopping rule is determined by consideration of the significance level (type I error rate) and the power at the specified alternative (the desired treatment advantage on the primary endpoint). The appropriate type of stopping rule should reflect the main objective of the trial and the desirable reasons for stopping or continuing.

Traditionally, the purpose of group sequential designs in confirmatory trials (e.g., [66, 67]) was to stop early only under overwhelming evidence of treatment benefit. In such cases, the trial is said to have 'stopped prematurely', as if there were some correct size and duration for every trial and falling short of it were inevitably suspect. However, there is no correct sample size. All statistical sample size calculations are based on compromises and assumptions. The compromises come from setting the clinically important treatment difference at which the power is specified, and the assumptions involve the values of 'nuisance parameters', such as the variance of a quantitative endpoint or the success rate or survival pattern of patients in the control arm. Instead of relying on such compromises and assumptions, so-called adaptive group sequential designs extend the group sequential methodology by allowing not only early stopping but also an increase in the sample size or study duration when such an increase is worthwhile.

The p-value combination test is the cornerstone of the methodology. Technical details may be found in [68, 69]. It has been shown that different approaches to flexible designs via the conditional error function [33], the variance-spending approach [70], or down-weighting of the second stage test statistic [34, 71] can be viewed in terms of combination functions [31, 72]. Assume that a one-sided null hypothesis H0 is tested in a two-stage design. The test decisions are based on p-values p1 and p2 calculated from the separate samples of the two stages. Early decision boundaries are defined for p1: if p1 ≤ α1 (where α1 < α), the trial stops after the interim analysis with an early rejection of H0; if p1 > α0 (where α0 > α), it stops with acceptance of H0 (stopping for futility). If the trial proceeds to the second stage, the decision in the final analysis is based on a combination function C(p1, p2) defined in the study protocol: if C(p1, p2) ≤ c, the null hypothesis is rejected, otherwise it is accepted. The rejection boundary c has to be chosen so as to obtain a combination test for stochastically independent p-values at the significance level α, and will depend on the first stage decision boundaries α1 and α0. Examples of combination functions are Fisher's product test, the "inverse normal" method, the adaptively weighted z-score test, and the cumulative sum of chi-square statistics.
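As a concrete illustration, the following Python sketch implements the inverse normal combination function and numerically calibrates the rejection boundary c to the overall level α for given early decision boundaries α1 and α0. The equal stage weights, the root-finding bracket and the example values are assumptions of this sketch rather than recommendations.

    import numpy as np
    from scipy import stats, integrate, optimize

    def inverse_normal(p1, p2, w1=1/np.sqrt(2), w2=1/np.sqrt(2)):
        """Inverse normal combination function C(p1, p2)."""
        z = w1 * stats.norm.ppf(1 - p1) + w2 * stats.norm.ppf(1 - p2)
        return 1 - stats.norm.cdf(z)

    def rejection_boundary(alpha, alpha1, alpha0, w1=1/np.sqrt(2), w2=1/np.sqrt(2)):
        """Solve for c such that the two-stage test has overall level alpha:
        alpha1 + integral_{alpha1}^{alpha0} P(C(p1, P2) <= c) dp1 = alpha,
        where P2 ~ Uniform(0, 1) under H0, independent of p1."""

        def conditional_error(p1, c):
            # P(C(p1, P2) <= c) for fixed p1
            zc = stats.norm.ppf(1 - c)
            z1 = stats.norm.ppf(1 - p1)
            return 1 - stats.norm.cdf((zc - w1 * z1) / w2)

        def excess_level(c):
            integral, _ = integrate.quad(conditional_error, alpha1, alpha0, args=(c,))
            return alpha1 + integral - alpha

        # bracket assumes alpha1 < alpha < alpha0, as in a typical design
        return optimize.brentq(excess_level, 1e-12, 0.5)

    # Example: overall alpha = 0.025, early rejection if p1 <= 0.01, futility if p1 > 0.5
    c = rejection_boundary(alpha=0.025, alpha1=0.01, alpha0=0.5)
    # Final decision after stage 2: reject H0 if inverse_normal(p1, p2) <= c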
The methodology allows one to combine p-values from two independent samples, regardless of whether or not they are based on the same endpoint, test statistic, etc. Therefore, an adaptive sampling rule and the wide spectrum of decision rules described in the previous section can be applied after the first stage to modify the design. The recursive application of two-stage combination tests generalizes this approach to designs with a variable number of stages, see e.g. [72-74]. This larger class of adaptive designs is called flexible designs.

Comparing more than two treatments

One of the first proposals for this kind of adaptive design was for establishing a dose-response relationship [75]. The objective of the first stage in such a study is to obtain some initial evidence of dose response on the primary endpoint, with an option of early stopping for futility. The first stage test statistic can be a linear trend test, and its p-value is then used in the combination function at the second stage. Dose selection is not restricted to any specific decision rule; it may be based not only on the primary endpoint but may also take into account the whole spectrum of safety data. Quite frequently it is not the most efficacious treatment that is selected but, for example, a lower dose with submaximal efficacy and a better overall benefit/risk ratio. Sample size re-estimation can be performed in addition to the adaptive choice of the doses carried forward to the second stage. The allocation ratio can be changed to randomize more patients to the most promising dose. A closed multiple testing procedure controlling the familywise significance level can be used for the individual treatment comparisons [76, 77].

Adaptive model-based dose finding. The primary goal of a dose-finding study is to establish the dose-response relationship. The optimal experimental design framework provides enough structure to make this goal attainable. It is assumed that the available doses (the design region) and the response variables have been defined and that there exists a known structure for the mathematical model describing the dose-response relationship (the model). The focus is on choosing the dose levels in some optimal way to enhance the process of estimating the unknown parameters θ of the model. The experimental designs are represented by a set of design points (support points) and a corresponding set of weights representing the allocations to the design points: ξ = {(xi, λi), i = 1, ..., k}. An important element in optimal design is the information matrix, say M(ξ, θ), which is an expression of the accuracy of the estimate of θ based on observations at the k design points of design ξ. A "larger" value of M reflects more information (more precision, lower variability) in the estimate. A natural goal in picking the design ξ is to find the design that "maximizes" the determinant of the matrix M, the so-called D-optimality criterion [78-80]. A major challenge in design for models that are nonlinear in θ is that the optimal design ξ* depends on θ, a conundrum: one is looking for the design ξ with the aim of estimating the unknown θ, and yet one has to know θ to find the best ξ. This conundrum leads to various ways of coping with the dependence on θ.
These include the locally optimal design, based on one's best guess at θ; the Bayesian design, which augments the criterion to reflect the uncertainty in prior knowledge about θ; the minimax design, which finds the design that is optimal under the worst-case value of θ; and the adaptive design, which alternates between forming estimates of θ and choosing a locally optimal design for that value of the parameter. In an adaptive design, each new cohort of patients is allocated to the doses that maximize the expected increment of information (in terms of the selected criterion), given the current interim data. The maximization is carried out over the whole range of possible doses, with additional constraints that, for instance, may involve the probability of toxicity at those doses, accommodating the maximum tolerated dose (MTD) mentality: escalate the dose cautiously, starting from the lowest dose. An initial design is chosen and preliminary parameter estimates are obtained. Then the next dose (or doses) is selected from the available range of doses satisfying the efficacy and safety constraints so as to provide the maximal improvement of the design with respect to the selected optimality criterion and the current parameter estimates. The next available cohort of patients is allocated to this dose. The estimates of the unknown parameters are then refined using these additional observations. These design-estimation steps are repeated until either the available resources are exhausted or the set of acceptable doses is empty. Such an approach is efficient from the perspective of both time and patient resources.

D-optimal designs, in general, are concerned mainly with collective ethics: doing in the dose-finding study what is best for future patients who stand to benefit from the results of the trial. In contrast, alternative procedures for dose-finding studies have been proposed that are mainly concerned with individual ethics: doing what is best for the current patients in the trial. The continual reassessment method (CRM) [81] was the first such method; it formulates the goal of dose escalation in a Phase I trial as maximizing patient gain. Similar procedures are considered in [82-84]. Note, however, that although these designs rely on the noble intention of maximizing individual gain by allocating each patient to the "best" known dose, individual ethics may well be compromised by the poor learning about the "best" dose under such a design. Pocock [85] points out that each clinical trial involves a balance between individual and collective ethics and that such a balance is never simple. Of course, collective ethics should never usurp individual ethics. Dragalin and Fedorov [86] made one of the first attempts to formalize the goal of a dose-finding study as a penalized D-optimal design problem: find the design that maximizes the information (collective, or societal, ethics) under control of the total penalty for treating patients in the trial (the ethics of the individuals in the trial). A comprehensive overview of many adaptive designs in Phase I clinical trials is given in [87-89].

Most designs for dose-finding in Phase I clinical trials determine an MTD based on toxicity alone, ignoring the efficacy response. Most Phase II designs assume that an acceptable toxicity dose range has been determined and aim to establish treatment efficacy at some dose in this range, with early stopping if the response rate is too low. However, under a variety of circumstances, it is useful to address safety and efficacy simultaneously.
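To illustrate the design-estimation loop described above, here is a minimal Python sketch of adaptive, locally D-optimal dose allocation for a two-parameter logistic dose-response model. The dose grid, cohort size, start-up design and the simulated "true" parameters are hypothetical assumptions used only to make the example self-contained; in a real trial the candidate set would also be restricted by safety (e.g., MTD) constraints and the responses would be observed patient outcomes rather than simulated ones.

    import numpy as np
    from scipy.optimize import minimize

    DOSES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])  # candidate dose grid (assumed)

    def prob(theta, x):
        """Two-parameter logistic dose-response model P(response | dose x)."""
        a, b = theta
        return 1.0 / (1.0 + np.exp(-(a + b * x)))

    def neg_loglik(theta, x, y):
        p = np.clip(prob(theta, x), 1e-10, 1 - 1e-10)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    def fit(x, y, start=(0.0, 1.0)):
        """Maximum likelihood estimate of theta = (a, b)."""
        return minimize(neg_loglik, start, args=(x, y), method="Nelder-Mead").x

    def information(theta, x):
        """Fisher information matrix for the doses in x under the fitted model."""
        p = prob(theta, x)
        w = p * (1 - p)
        F = np.column_stack([np.ones_like(x), x])     # f(x) = (1, x)'
        return (F * w[:, None]).T @ F

    def next_dose(theta, x_so_far, candidates, cohort_size):
        """Locally D-optimal choice: the dose maximizing det of the updated information."""
        M = information(theta, np.asarray(x_so_far, float))
        dets = [np.linalg.det(M + cohort_size * information(theta, np.array([d])))
                for d in candidates]
        return candidates[int(np.argmax(dets))]

    # Illustrative design-estimation loop with simulated binary responses
    rng = np.random.default_rng(0)
    true_theta = (-2.0, 1.5)                          # hypothetical "truth"
    x_hist = np.repeat(DOSES, 2)                      # small start-up design
    y_hist = rng.binomial(1, prob(true_theta, x_hist))
    for cohort in range(5):
        theta_hat = fit(x_hist, y_hist)
        d = next_dose(theta_hat, x_hist, DOSES, cohort_size=3)  # restrict DOSES to the safe set in practice
        x_new = np.repeat(d, 3)
        y_new = rng.binomial(1, prob(true_theta, x_new))
        x_hist, y_hist = np.append(x_hist, x_new), np.append(y_hist, y_new)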
For this joint efficacy-toxicity setting, a class of models in which patient response is characterized by two dependent binary outcomes, one for efficacy and one for toxicity, has been proposed [86], and response-adaptive designs with both dose allocation rules and early stopping rules defined in terms of response and toxicity, thus combining elements of more typical Phase I and Phase II trials, have been derived. Similar designs can be used in drug combination studies [90]. For the same situation, Bayesian adaptive designs have also been proposed [91-97]. We refer to Gaydos et al. [98] in this volume for additional information on adaptive designs in dose-finding studies.

Seamless Phase II/III designs. Important opportunities for seamless designs also arise from combining the traditional Phase IIb and Phase III stages of clinical development into a single trial, both operationally and inferentially, i.e., conducting both treatment selection and confirmation of treatment efficacy over control under a single protocol in which all the data are appropriately used in the final analysis. For a detailed description of this type of adaptive design see Maca et al. [99] in this volume.

4. Conclusion

The objective of a clinical trial may be to target the MTD or the minimum effective dose, to find the therapeutic range, to determine the optimal safe dose to be recommended for confirmation, or to confirm efficacy over control in a Phase III clinical trial. This clinical goal is usually determined by the clinicians from the pharmaceutical industry, practicing physicians, key opinion leaders in the field, and the regulatory agency. Once agreement has been reached on the objective, it is the statistician's responsibility to provide the appropriate design and the statistical inferential structure required to achieve that goal. There are plenty of designs available on the statistician's shelf. The greatest challenge is their implementation. For logistical and procedural issues in the implementation of adaptive designs see Quinlan et al. [100]. The recently released FDA Critical Path Opportunities Report [101] emphasizes that "the two most important areas for improving medical product development are biomarker development (Topic 1) and streamlining clinical trials (Topic 2)". Adaptive designs for clinical trials provide efficient tools to demonstrate the safety and effectiveness of new medical products in faster timeframes, with more certainty, at lower costs, and with better information. While adaptive designs are not appropriate for all drug development programs, it is considered critical to the success of a pharmaceutical company that R&D increase the utilization of adaptive designs in clinical development plans wherever feasible.

References

1. Rosenberger WF, Lachin JM. Randomization in Clinical Trials: Theory and Practice. Wiley, 2002.
2. Taves DR. Minimization: a new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics 1974; 15:443-453.
3. Zelen M. The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases 1974; 28:365-375.
4. Pocock SJ, Simon R. Sequential treatment assignment with balancing prognostic factors in the controlled clinical trial. Biometrics 1975; 31:103-115.
5. Wei LJ. An application of an urn model to the design of sequential controlled clinical trials. JASA 1978; 73:559-563.
6. Atkinson AC. Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika 1982; 69:61-67.
7. Atkinson AC. Optimum biased-coin designs for sequential treatment allocation with covariate information. Statistics in Medicine 1999; 18:1741-1752.
8. Robbins H. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 1952; 58:527-535.
9. Zelen M. Play the winner and the controlled clinical trial. JASA 1969; 64:131-146.
10. Eisele JR. The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 1994; 38:249-261.
11. Berry D. Adaptive trials and Bayesian statistics in drug development (with comments). Biopharmaceutical Report 2001; 9:1-11.
12. Berry D. Bayesian statistics and the efficiency and ethics of clinical trials. Statistical Science 2004; 19:175-187.
13. Sampson AR, Sill MW. Drop-the-losers design: normal case. Biometrical Journal 2005; 47:257-268.
14. Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Statistics in Medicine 1994; 13:2455-2463.
15. Herson J, Wittes J. The use of interim analysis for sample size adjustment. Drug Information Journal 1993; 27:753-760.
16. Gould AL. Interim analysis for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine 1992; 11:53-66.
17. Gould AL, Shih WJ. Sample size reestimation without unblinding for normally distributed outcomes with unknown variance. Communications in Statistics - Theory and Methods 1992; 21:2833-2853.
18. Wittes JT, Schabenberger O, Zucker DM, Brittain E, Proschan M. Internal pilot studies I: type I error rate of the naive t-test. Statistics in Medicine 1999; 18:3481-3491.
19. Zucker DM, Wittes JT, Schabenberger O, Brittain E. Internal pilot studies II: comparison of various procedures. Statistics in Medicine 1999; 18:3493-3509.
20. Kieser M, Friede T. Re-calculating the sample size in internal pilot study designs with control of the type I error rate. Statistics in Medicine 2000; 19:901-911.
21. Kieser M, Friede T. Blinded sample size reestimation in multiarmed clinical trials. Drug Information Journal 2000; 34:455-460.
22. Gould AL. Sample-size re-estimation: recent developments and practical considerations. Statistics in Medicine 2001; 20:2625-2643.
23. Friede T, Kieser M. A comparison of methods for adaptive sample size adjustment. Statistics in Medicine 2001; 20:3861-3873.
24. Mehta CR, Tsiatis AA. Flexible sample size considerations using information-based interim monitoring. Drug Information Journal 2001; 35:1095-1112.
25. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70:659-663.
26. Schmitz N. Optimal Sequentially Planned Decision Procedures. Lecture Notes in Statistics, vol. 79. Springer: New York, 1993.
27. Cressie N, Morgan PB. The VPRT: a sequential testing procedure dominating the SPRT. Econometric Theory 1993: 431-450.
28. Morgan PB, Cressie N. A comparison of cost-efficiencies of the sequential, group-sequential, and variable-sample-size-sequential probability ratio tests. Scandinavian Journal of Statistics 1997; 24:181-200.
29. Bartroff J. Optimal multistage sampling in a boundary-crossing problem. Sequential Analysis 2006; 25:59-84.
30. Jennison C, Turnbull BW. Efficient group sequential designs when there are several effect sizes under consideration. Statistics in Medicine 2006; 25:917-932.
31. Posch M, Bauer P. Adaptive two stage designs and the conditional error function. Biometrical Journal 1999; 41:689-696.
32. Posch M, Bauer P. Interim analysis and sample size assessment. Biometrics 2000; 56:1170-1176.
33. Proschan MA, Hunsberger SA. Designed extension of studies based on conditional power. Biometrics 1995; 51:1315-1324.
34. Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics 1999; 55:853-857.
35. Liu Q, Chi GYH. On sample size and inference for two-stage adaptive designs. Biometrics 2001; 57:172-177.
36. Li G, Shih WJ, Xie T, Lu J. A sample size adjustment procedure for clinical trials based on conditional power. Biostatistics 2002; 3:277-287.
37. Bauer P, Koenig F. The reassessment of trial perspectives from interim data - a critical view. Statistics in Medicine 2006; 25:23-36.
38. Posch M, Bauer P, Brannath W. Issues in designing flexible trials. Statistics in Medicine 2003; 22:953-969.
39. Spiegelhalter D, Freedman L, Blackburn P. Monitoring clinical trials: conditional or predictive power? Controlled Clinical Trials 1986; 7:8-17.
40. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley, 2004.
41. Wang SJ, Hung HMJ, Tsong Y, Cui L. Group sequential test strategies for superiority and non-inferiority hypotheses in active controlled clinical trials. Statistics in Medicine 2001; 20:1903-1912.
42. Brannath W, Bauer P, Maurer W, Posch M. Sequential tests for non-inferiority and superiority. Biometrics 2003; 59:106-114.
43. Kropf S, Hommel G, Schmidt U, Brickwedel J, Jepsen MS. Multiple comparison of treatments with stable multivariate tests in a two-stage adaptive design, including a test for non-inferiority. Biometrical Journal 2000; 42:951-965.
44. Hommel G, Kropf S. Clinical trials with an adaptive choice of hypotheses. Drug Information Journal 2001; 35:1423-1429.
45. Lang T, Auterith A, Bauer P. Trend tests with adaptive scoring. Biometrical Journal 2000; 42:1007-1020.
46. Lawrence J. Design of clinical trials using an adaptive test statistic. Pharmaceutical Statistics 2002; 1:97-106.
47. Neuhäuser M. An adaptive location-scale test. Biometrical Journal 2001; 43:809-819.
48. Wang SJ, Hung HMJ. Adaptive covariate adjustment in clinical trials. Journal of Biopharmaceutical Statistics 2005; 15:605-612.
49. Hommel G. Adaptive modifications of hypotheses after an interim analysis. Biometrical Journal 2001; 43:581-589.
50. Offen W, et al. Multiple co-primary endpoints: medical and statistical solutions. Drug Information Journal 2006 (to appear).
51. Kieser M, Bauer P, Lehmacher W. Inference on multiple endpoints in clinical trials with adaptive interim analyses. Biometrical Journal 1999; 41:261-277.
52. Inoue LYT, Thall PF, Berry DA. Seamlessly expanding a randomized phase II trial to phase III. Biometrics 2002; 58:823-831.
53. Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics 2001; 57:886-891.
54. Gehan EA. The determination of the number of patients in a follow-up trial of a new chemotherapeutic agent. Journal of Chronic Diseases 1961; 13:346-353.
55. Chen TT. Optimal three-stage designs for phase II cancer trials. Biometrics 1997; 43:865-874.
56. Chen S, Soong SJ, Wheeler RH. An efficient multi-stage procedure for phase II clinical trials that have high response rate objectives. Controlled Clinical Trials 1994; 15:277-283.
57. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982; 38:143-151.
58. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 1989; 10:1-10.
59. Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Statistics in Medicine 2006; 25 (in press).
60. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics 1994; 50:337-349.
61. Wang YG, Leung DHY. An optimal design for screening trials. Biometrics 1998; 54:243-250.
62. Yao TJ, Venkatraman E. Optimal two-stage design for a series of pilot trials of new agents. Biometrics 1998; 54:1183-1189.
63. Hardwick J, Stout QF. Optimal few-stage designs. Journal of Statistical Planning and Inference 2002; 104:121-145.
64. Siegmund D. Sequential Analysis: Tests and Confidence Intervals. Springer: New York, 1985.
65. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC: Boca Raton, 2000.
66. Haybittle J. Repeated assessment of results in clinical trials of cancer treatment. British Journal of Radiology 1971; 44:793-797.
67. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35:549-556.
68. Bauer P. Multistage testing with adaptive designs. Biometrie und Informatik in Medizin und Biologie 1989; 20:130-148.
69. Bauer P, Köhne K. Evaluation of experiments with adaptive interim analyses. Biometrics 1994; 50:1029-1041.
70. Fisher LD. Self-designing clinical trials. Statistics in Medicine 1998; 17:1551-1562.
71. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics 1999; 55:1286-1290.
72. Brannath W, Posch M, Bauer P. Recursive combination tests. JASA 2002; 97:236-244.
73. Müller HH, Schäfer H. A general statistical principle for changing a design any time during the course of a trial. Statistics in Medicine 2004; 23:2497-2508.
74. Müller HH, Schäfer H. Construction of group sequential designs in clinical trials on the basis of detectable treatment differences. Statistics in Medicine 2004; 23:1413-1424.
75. Bauer P, Röhmel J. An adaptive method for establishing a dose-response relationship. Statistics in Medicine 1995; 14:1595-1607.
76. Lehmacher W, Kieser M, Hothorn L. Sequential and multiple testing for dose-response analysis. Drug Information Journal 2000; 34:591-597.
77. Liu Q, Proschan MA, Pledger GW. A unified theory of two-stage adaptive designs. JASA 2002; 97:1034-1041.
78. Fedorov V, Hackl P. Model-Oriented Design of Experiments. Springer, 1997.
79. Fedorov V, Leonov S. Optimal design for dose response experiments: a model-oriented approach. Drug Information Journal 2001; 35:1373-1383.
80. Fedorov V, Leonov S. Response driven designs in drug development. In: Wong WK, Berger M (eds.), Applied Optimal Designs. Wiley, 2005.
81. O'Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics 1990; 46:33-48.
82. Babb J, Rogatko A, Zacks S. Cancer phase I clinical trials: efficient dose escalation with overdose control. Statistics in Medicine 1998; 17:1103-1120.
83. Haines LM, Perevozskaya I, Rosenberger WF. Bayesian optimal designs for Phase I clinical trials. Biometrics 2003; 59:591-600.
84. Whitehead J, Williamson D. An evaluation of Bayesian decision procedures for dose-finding studies. Journal of Biopharmaceutical Statistics 1998; 8:445-467.
85. Pocock SJ. Clinical Trials. Wiley: Chichester, 1983.
86. Dragalin V, Fedorov V. Adaptive designs for dose-finding based on efficacy-toxicity response. Journal of Statistical Planning and Inference 2006; 136:1800-1823.
87. Edler L. Overview of Phase I trials. In: Crowley J (ed.), Statistics in Clinical Oncology. New York: Marcel Dekker, 2001: 1-34.
88. O'Quigley J. Dose-finding designs using the continual reassessment method. In: Crowley J (ed.), Statistics in Clinical Oncology. New York: Marcel Dekker, 2001: 35-72.
89. Storer BE. Choosing a Phase I design. In: Crowley J (ed.), Statistics in Clinical Oncology. New York: Marcel Dekker, 2001: 73-91.
90. Dragalin V, Fedorov V, Wu Y. Adaptive designs for selecting drug combinations based on efficacy-toxicity response. Journal of Statistical Planning and Inference 2006 (to appear).
91. Thall P, Russell K. A strategy for dose-finding and safety monitoring based on efficacy and adverse outcomes in phase I/II clinical trials. Biometrics 1998; 54:251-264.
92. Thall PF, Cook JD. Dose-finding based on efficacy-toxicity trade-offs. Biometrics 2004; 60:684-693.
93. O'Quigley J, Hughes M, Fenton T. Dose finding designs for HIV studies. Biometrics 2001; 57:1018-1029.
94. Braun T. The bivariate continual reassessment method: extending the CRM to phase I trials of two competing outcomes. Controlled Clinical Trials 2002; 23:240-256.
95. Whitehead J, Zhou Y, Stevens J, Blakey G. An evaluation of a Bayesian method of dose escalation based on bivariate binary responses. Journal of Biopharmaceutical Statistics 2004; 14:969-983.
96. Bekele BN, Shen Y. A Bayesian approach to jointly modeling toxicity and biomarker expression in a Phase I/II dose-finding trial. Biometrics 2005; 61:343-354.
97. Thall PF, Millikan RE, Mueller P, Lee SJ. Dose-finding with two agents in Phase I oncology trials. Biometrics 2003; 59:487-496.
98. Gaydos B, et al. Adaptive dose-response. Drug Information Journal 2006 (submitted).
99. Maca J, Bhattacharya S, Dragalin V, Gallo P, Krams M. Adaptive Seamless Phase II/III Designs - Background, Operational Aspects, and Examples. Drug Information Journal 2006 (submitted).
100. Quinlan JA, Gallo P, Krams M. Implementing adaptive designs: logistical and operational considerations. Drug Information Journal 2006 (submitted).
101. FDA. Critical Path Opportunity List. 2006. http://www.fda.gov/oc/initiatives/criticalpath/