DESIGN AND ANALYSIS OF CLINICAL TRIALS

Bayesian Adaptive Methods
An Application to Phase I Clinical Trials

Carrie Ann Deis, Nadine Dewdney
May 2012

Bayesian adaptive methods are being used more frequently in clinical trials. Because of their flexibility, they are best suited to Phase I and II trials. The application of Bayesian adaptive methods is shown to be more favorable than the traditional approach. The benefits of Bayesian adaptive methods include smaller sample sizes, increased power and a lower incidence of ineffective dosing. A hybridization of the traditional and the Bayesian adaptive designs improves upon the traditional design. The FDA has addressed the increasing use of Bayesian adaptive designs in guidance documents. This paper discusses Bayesian adaptive methods and how they compare with the more traditional approach to drug development.

Table of Contents

I. Introduction
   A. Background Information and Research
   B. Adaptive Designs
   C. Bayesian Approach
   D. Prior Distributions
   E. Traditional vs. Bayesian
   F. Hybridization
   G. FDA Guidance – Medical Devices
   H. Conclusion
I. References

I. Introduction

A. Background Information and Research

Phase I clinical trials are conducted to determine the toxicity of a drug and the appropriate dosing of the new intervention. They are the first time that the drug is tested in humans. The sample size is relatively small, typically 20 to 50 patients. Depending on the therapeutic nature of the drug, Phase I trials may start with healthy volunteers. In the case of cancer therapy drugs, Phase I trials are sometimes conducted on cancer patients who have failed to respond to conventional therapy. In drug development, it is assumed that the effectiveness of a drug increases as the dose level increases. However, an increased dose carries an increased risk of toxicity, so Phase I trials seek the maximum tolerated dose (MTD).

Phase I trials have some important attributes. Before the trial starts, there must be a defined starting dose, a toxicity profile and dose-limiting toxicity (DLT), a target toxicity level and a dose-escalation scheme. The starting dose is commonly chosen as one-tenth of the LD10 in mice, or one-third of the lowest toxic dose in dogs. Dose escalation is done incrementally. In most studies, the increments are predetermined. A modified Fibonacci sequence may be used for dose escalation, with the rate of increase diminishing as the doses get higher.

Standard designs assign patients to dose levels according to predefined rules. They do not specify a model for the dose-toxicity curve. These designs are classified as up-and-down designs because they allow for both escalation and de-escalation of doses. The traditional approach to determining the maximum tolerated dose is to find the dose at which one-third of the subjects would develop toxicity.
The doses $D_1, \ldots, D_K$ would be selected to be close to the MTD. The subjects would be randomized and the number of subjects developing toxicity at dose $D_i$, denoted $r_i$, would be observed. The proportion $p_i = r_i/n_i$, where $n_i$ is the number of subjects treated at dose $D_i$, would be used to estimate the proportion exhibiting toxicity. The dose-response would be modeled based on the probability of toxicity, and the MTD would be estimated from the fitted model. This method is known as the frequentist approach.

The frequentist approach focuses the design of the clinical trial on the probability of the results of the trial. The probability is based on the observed data under the assumption that a particular hypothesis is true. The P-value, which is used in assessing the hypothesis, is the probability of observing results as extreme as or more extreme than the observed results, assuming that the hypothesis is true.

There are many issues with the frequentist approach, including ethical concerns. Patients might be treated excessively and unnecessarily at low doses. Too many patients might be treated at doses that are either too low or too high, and it is highly likely that most subjects are treated at extremely low doses. It is not clear that the chosen MTD is the correct dose. The approach is rigid and inflexible, and it limits modernization in the design and analysis of clinical trials.

Adaptive designs were developed in an effort to improve on the frequentist approach. The adaptive design concept was introduced as early as the 1970s. In an adaptive design, adjustments and modifications can be made after the trial has started, provided that they do not undermine the integrity of the trial. Interim adjustments are made to the study design as data accumulate. There are several adaptive designs, namely the group sequential design, the sample-size adjustable design, the drop-the-losers design, the adaptive treatment allocation design, the adaptive dose-escalation design and Bayesian adaptive methods.

B. Adaptive Designs

In an adaptive design study, trial and statistical procedures can be modified while the trial is under way. These changes are based on a review of the interim data. The goal of the modifications is to improve the probability of success of the trial and to correctly identify the clinical benefits of the intervention under investigation.

Different types of modifications can be made in an adaptive design. Prospective adaptations are changes such as stopping a trial early for safety or lack of efficacy, dropping the losers or re-estimating the sample size. Modifications of hypotheses, inclusion/exclusion criteria, dose/regimen, treatment duration and endpoints are examples of ad hoc modifications, which are usually not recognized at the outset as candidates for modification but become necessary as the trial progresses. Changes made to the statistical analysis plan prior to database lock or unblinding of treatment codes are known as retrospective adaptations.

Group sequential designs allow a trial to be stopped for safety and/or efficacy reasons based on the analysis of interim data. Stopping boundary functions such as Pocock and O'Brien-Fleming are used to control the type I error rate.

A design that allows the sample size to be adjusted based on interim data is called a sample size re-estimation design. This method has disadvantages. The practice of starting with a small number of subjects and then adjusting the sample size could lead to losing sight of the clinically meaningful difference that was originally intended.

The drop-the-losers design allows inferior treatments to be dropped from the study. It also allows additional arms to be added. It is a two-stage design used in Phase II clinical development.

The adaptive dose-finding design is used during Phase I studies to determine the minimum effective dose (MED) and/or the maximum tolerated dose (MTD).
In this design, the continual reassessment method with the Bayesian approach is usually used to estimate the dose-response curve.

C. Bayesian Approach

The more traditional way of designing and analyzing clinical trials is known as the frequentist approach. The frequentist approach defines probability based on the data, as described above. The Bayesian approach is based on Bayes' theorem: it specifies a prior distribution and then updates that distribution as data become available. The updated distribution is the posterior distribution. Both approaches use probability in their analyses, but they use different inferential methods.

As previously mentioned, in the Bayesian approach designs can be adjusted at interim points to adapt to the changing course of the trial. Information from multiple sources can be combined, as in the case of seamless adaptive designs. The Bayesian analysis can also use data from nonrandomized trials, which are not allowed in frequentist designs.

In the Bayesian approach, all unknowns have probability distributions. The probabilities are associated with the parameters, and the information on the parameters is summarized before the data are collected. As the data are collected, the information on the parameters is updated and the posterior distribution is used for statistical inference. Through hierarchical modeling, the Bayesian design can use models from previous studies that are similar but independent. This gives the Bayesian approach greater strength in parameter estimation. However, the inferences depend only on the current study and use only data that were actually observed. The inferences are flexible, that is, they can be updated as the data are gathered.

The sample size does not have to be chosen in advance but is determined as the trial progresses. The main decision at the onset of the Bayesian design is when to start.
Decisions on the continuation of the trial are made as data accumulate, and the sample size projection can be determined as information becomes available. The population definition can be altered and the drug of interest can be changed midcourse. Unless they were specified beforehand, these changes could interfere with the analysis and result in weakened conclusions.

Traditionally, clinical trials are randomized regardless of the statistical approach. Randomization is paramount in reducing the possibility of selection bias and in balancing covariates across the treatment groups. Because the Bayesian approach rests on subjective probability, randomization is not formally required.

The Bayesian approach calculates the predictive probability that the patient will respond to the treatment. The frequentist approach conditions the probability on future observations. The conditional probabilities in the Bayesian approach are averaged over the unknown parameters, using the fact that an unconditional probability is the expected value of conditional probabilities.

The strength of the Bayesian approach lies in decision making. It is designed for drawing conclusions from studies based on costs and public health benefits. A given decision problem in a clinical trial will give rise to possible future observations. Each observation has an associated cost and benefit with corresponding predictive probabilities. The probabilities can be weighed and the optimal solution chosen. In the more traditional approach this is not possible, since there is no way to find predictive probabilities.

The basic requirements for entering a trial under the Bayesian approach are a stopping rule and a prior distribution. The sample size does not need to be specified in advance, but it is common to have a predetermined maximum size. A trial that is Bayesian in its approach must still have a protocol and guidelines by which it must abide, so the anticipated type of adaptation needs to be specified.
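As a worked illustration of the predictive probability discussed in this section, consider a binary response with unknown response rate $\theta$ (a symbol introduced here for illustration only; it does not appear in the text above). Under that assumption, the probability that the next patient responds, given the outcomes $y_1, \ldots, y_n$ observed so far, is the conditional probability of response averaged over the posterior distribution of $\theta$:

\[
P(Y_{n+1} = 1 \mid y_1, \ldots, y_n) = \int_0^1 P(Y_{n+1} = 1 \mid \theta)\, \pi(\theta \mid y_1, \ldots, y_n)\, d\theta = E[\theta \mid y_1, \ldots, y_n]
\]

This is the sense in which an unconditional probability is the expected value of conditional probabilities: in this simple setting the predictive probability of a response equals the posterior mean of the response rate.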
In the case of a Phase I study, adaptation is necessary from an ethical standpoint. For example, if the treatment is an anti-cancer agent and the subjects are gravely ill, an increase in dose would be beneficial to the patients as well as to the outcome of the study, since the main objective is dose finding. Similarly, adaptation is desirable in Phase II. The focus there is efficacy without excess toxicity, so having the power to alter a trial if efficacy is subpar or if there is excess toxicity makes the Bayesian approach well suited. In Phase III and beyond, adaptation is not necessary and the calculations for the posterior distribution become more cumbersome. There is a greater risk of errors when the Bayesian approach is applied to Phase III and later clinical trials.

D. Prior Distributions

Choosing a prior distribution is paramount in any Bayesian adaptive design. The prior distribution provides information about the treatment before there are experimental results. The selection of a prior distribution is sometimes based on historical data, such as what is known about the family of compounds in treating the targeted disease. When historical data are not available, the prior distribution can be based on what is known about the biological nature of the disease, data from investigational and related treatments, and the preclinical results of these treatments. The following series of graphs depicts the progression of a hypothetical prior distribution as it is updated.

Figure 1: Sequence of probability distributions for success rate p

The first graph shows the selected initial distribution of p. The prior is assumed to be uniformly distributed between 0 and 1. The first treatment is a success and the distribution shifts. After each observation, the distribution is updated based on the result. If there is a success, the distribution shifts to the right, and if there is a failure, the distribution shifts to the left.
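The updating pictured in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own computation: it assumes a binary success/failure outcome, so that the uniform prior is a Beta(1, 1) distribution and each observation adds one to the first Beta parameter (success) or the second (failure). The function names and the sequence of ten outcomes are hypothetical; only the first outcome, a success, is taken from the text.

```python
def update_beta(alpha, beta, outcomes):
    """Return the Beta(alpha, beta) parameters before and after each outcome.

    outcomes: iterable of 1 (success) or 0 (failure).
    """
    history = [(alpha, beta)]
    for y in outcomes:
        alpha += y        # a success increments the first parameter
        beta += 1 - y     # a failure increments the second parameter
        history.append((alpha, beta))
    return history


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical outcomes for ten treatments (assumption: not the data behind Figure 1).
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
history = update_beta(1, 1, outcomes)   # Beta(1, 1) is the uniform prior on [0, 1]

for n, (a, b) in enumerate(history):
    print(f"after {n:2d} treatments: Beta({a}, {b}), mean = {posterior_mean(a, b):.3f}")
```

With these assumed outcomes, each success raises the posterior mean and each failure lowers it, which corresponds to the rightward and leftward shifts of the distribution.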
The graph labeled ‘Final’ depicts the posterior distribution after ten treatments. The last graph in the sequence, labeled ‘Next observation’, shows the possible forms of the posterior distribution if there were an eleventh treatment. After a success, the distribution is predicted to follow the previous trend and shift to the right; likewise, after a failure the distribution is expected to shift to the left.

E. Traditional vs. Bayesian

The 3+3 design is a traditional design with no modeling of the dose-toxicity curve. The only assumption is that toxicity increases with the dose. Patients are treated in cohorts of three: the first three patients are treated at the starting dose, and subsequent cohorts are treated at increasing dose levels that have been fixed in advance. If none of the three subjects in a cohort experiences a DLT, the next three patients are treated at the next dose level. If one of the three subjects experiences a DLT, another three subjects are treated at the same dose level. Dosing is escalated until at least two patients in a cohort of three to six experience dose-limiting toxicity. The MTD is therefore usually defined as the highest dose level at which six or more patients have been treated and no more than 33% of the patients exhibited toxicity.

Another method is pharmacologically guided dose escalation. This method assumes that DLTs can be predicted from drug plasma concentrations, so pharmacokinetic data are used to determine the next dose as the study progresses. Before the study, a plasma exposure level is pre-specified in terms of the area under the concentration-time curve (AUC). The AUC data are extrapolated from preclinical studies. The subsequent dose is assigned as long as the predefined plasma exposure level has not been reached. Dosing is escalated one patient at a time, usually in 100% dose increments. Once the plasma exposure has attained the pre-specified AUC level, the design switches to the traditional 3+3 design and the dose increments are reduced.
Anti-cancer agents such as anthracyclines and some platinum compounds have seen good results using this method. The design has limitations, such as the difficulty of obtaining real-time pharmacokinetic data from patients.

Another design is the accelerated titration design, a variation of the 3+3 design. In this design, the dose may be escalated over multiple cycles in the same patient. This helps to reduce the number of patients treated at low dose levels and gives a patient the opportunity to be treated at a higher dose. On the other hand, the true results might be masked by the cumulative effect of multiple doses, and it would be challenging to determine whether the results were due to chronic or delayed toxicity.

There are other traditional designs, such as the 2+4, 3+3+3 and 3+1+1 designs. In the 2+4 design, if one of the first two patients exhibits toxicity, four additional patients are added. As in the 3+3 design, the study is stopped when at least two patients exhibit a DLT. In the 3+3+3 design, an additional group of three patients is added if at least two of the first six patients exhibit toxicity. If three out of nine patients display toxicity, the trial is stopped. The 3+1+1 design, also known as the best-of-five design, is the most aggressive of these designs. If a DLT is observed in one or two of the first three patients, another patient is added. If two out of four patients exhibit toxicity, another patient is added. The trial is terminated if three or more patients exhibit toxicity.

The above designs are considered to be rule-based. Bayesian designs fall into the category of model-based designs. These designs assume a monotonic dose-response relationship between the dose and the probability of DLT for the patients who have been treated at that dose level.
A dose-toxicity curve and a targeted toxicity level (TTL) are clearly defined in this class of designs. Through dose escalation, the design sets out to find the dose at which the probability of DLT equals the pre-specified target toxicity level.

The continual reassessment method (CRM) is a Bayesian model-based Phase I design. The dose-toxicity relationship is characterized by a one-parameter model, such as the logistic model, the power model or the hyperbolic tangent model.

Logistic model:
\[ p(d) = \frac{\exp(3 + ad)}{1 + \exp(3 + ad)} \]

Power model:
\[ p(d) = d^{\exp(a)} \]

Hyperbolic tangent model:
\[ p(d) = \left[ \frac{\exp(d)}{\exp(d) + \exp(-d)} \right]^{a} \]

Where:
p(d) is the probability of DLT
d is the dose
a is a model parameter

The CRM initially assumes a vague prior distribution for the parameter a, and one patient is treated at the dose level closest to the estimated MTD. The toxicity is assessed and the distribution of a is updated by calculating its posterior distribution, which is proportional to the chosen prior distribution multiplied by the likelihood:

\[ L(a; d, y) \propto \prod_{i=1}^{n} p(d_i)^{y_i} \left[ 1 - p(d_i) \right]^{1 - y_i} \]

Where:
$d_i$ and $y_i$ are the dose level and toxicity outcome for patient i
$y_i = 1$ if a DLT is observed and 0 if none is observed

The posterior distribution can be calculated using statistical software. Once it has been calculated, the next patient is treated at the dose level closest to the revised MTD based on the distribution of a. This sequence of steps is repeated until a precise estimate of a is obtained or the sample size has been exhausted.

Here is an example of a dose-escalation design from an oncology trial. The main objective is to determine the MTD of a new drug. Using the results from animal studies, the dose-limiting toxicity rate was determined to be 1% for the starting dose of 25 mg/m², which is one-tenth of the lethal dose. The MTD is estimated to be 150 mg/m², and the dose-limiting toxicity rate at the MTD is defined as 0.25.
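The CRM updating steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used for the simulations in this paper: it uses the power model, a normal prior on a with standard deviation 1.34 (an assumed stand-in for the "vague prior"), a simple grid approximation of the posterior, and the rule of picking the dose whose posterior mean DLT probability is closest to the target of 0.25. The function name `crm_update`, the prior guesses of the DLT probability at each dose (the `skeleton`) and the patient outcomes are all hypothetical.

```python
import numpy as np


def crm_update(skeleton, dose_idx, dlt, target=0.25, prior_sd=1.34):
    """One CRM update under the power model p(d) = d ** exp(a).

    skeleton : prior guesses of the DLT probability at each dose level
    dose_idx : dose-level index given to each treated patient
    dlt      : 1 if that patient had a DLT, else 0
    Returns the recommended next dose index and the posterior mean
    DLT probability at every dose level.
    """
    skeleton = np.asarray(skeleton, dtype=float)
    dose_idx = np.asarray(dose_idx, dtype=int)
    y = np.asarray(dlt, dtype=int)

    grid = np.linspace(-5.0, 5.0, 2001)               # grid of values for a
    prior = np.exp(-0.5 * (grid / prior_sd) ** 2)      # unnormalised N(0, sd^2) prior

    # p[i, j] = DLT probability at dose j when a = grid[i]
    p = skeleton[None, :] ** np.exp(grid)[:, None]

    # Likelihood: product over patients of p^y * (1 - p)^(1 - y)
    p_obs = p[:, dose_idx]
    lik = np.prod(p_obs ** y * (1.0 - p_obs) ** (1 - y), axis=1)

    post = prior * lik
    post /= post.sum()                                 # normalise on the grid

    post_tox = post @ p                                # posterior mean DLT prob. per dose
    next_idx = int(np.argmin(np.abs(post_tox - target)))
    return next_idx, post_tox


# Hypothetical skeleton and data: three patients at dose index 2, two with a DLT.
skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]
next_idx, post_tox = crm_update(skeleton, dose_idx=[2, 2, 2], dlt=[0, 1, 1])
print("recommended next dose index:", next_idx)
print("posterior mean DLT probabilities:", np.round(post_tox, 3))
```

In a trial, the function would be called again after each new patient's outcome is added to `dose_idx` and `dlt`; practical implementations usually add safeguards, such as not skipping dose levels during escalation, that are omitted here.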
We can compare a traditional approach with the Bayesian approach by using the 3+3 traditional escalation rule (TER) and the continual reassessment method (CRM). The comparison of TER and CRM was done through simulations. A logistic toxicity model was selected. The dose sequence was generated with successive multiplicative factors of 2, 1.67, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33 and 1.33. The designs are evaluated on safety, accuracy and efficiency. The simulation results are shown in Table 1.

Table 1: Summary of simulation results for the designs

Method     Assumed true MTD   Mean predicted MTD   Mean number   Mean number
           (mg/m²)            (mg/m²)              of patients   of DLTs
3+3 TER    100                86.7                 14.9          2.8
CRM        100                99.2                 13.4          2.8
3+3 TER    150                125                  19.4          2.9
CRM        150                141                  15.5          2.5
3+3 TER    200                169                  22.4          2.8
CRM        200                186                  16.8          2.2

The results show that if the true MTD is 100 mg/m², the TER approach estimates the MTD to be 86.7 mg/m² and the CRM estimates it to be 99.2 mg/m². The average number of patients required is 14.9 for the TER design and 13.4 for the CRM. The mean number of DLTs is the same for both, 2.8. At 100 mg/m², the differences between the two methods are not overwhelming. At a true MTD of 150 mg/m², however, the differences are greater. The mean predicted MTDs for TER and CRM are 125 mg/m² and 141 mg/m² respectively. Both designs underestimate the true MTD, but the TER estimate is much lower. With regard to safety, the mean number of DLTs is 2.9 for TER and 2.5 for CRM. The estimates show the same trend when the assumed true MTD is 200 mg/m².

Both approaches underestimate the true MTD; however, the Bayesian approach was closer to the true value at all three dose levels. At all three dose levels, the required number of patients was lower with the Bayesian design. The mean number of DLTs for the Bayesian approach was less than or equal to that of the traditional design at all dose levels.
Based on these observations, the Bayesian CRM approach is more favorable.

F. Hybridization

The Bayesian approach can be used alone or as a hybrid with the classic approach. The following example shows the Bayesian approach in a classic design. A two-arm parallel design that compares a test treatment with a control has data from three earlier clinical trials. The trials are similar in size. The effect sizes from the three trials are 0.1, 0.25 and 0.4, and each is assigned a prior probability of 1/3. The power as a function of the effect size $\varepsilon$ is given as:

\[ \mathrm{power}(\varepsilon) = \Phi\left( \frac{\sqrt{n}\,\varepsilon}{2} - z_{1-\alpha} \right) \]

where $\Phi$ is the c.d.f. of the standard normal distribution and n is the total sample size. If the prior $\pi(\varepsilon)$ expresses the uncertainty about $\varepsilon$, the expected power $P_{exp}$ is

\[ P_{exp} = \int \Phi\left( \frac{\sqrt{n}\,\varepsilon}{2} - z_{1-\alpha} \right) \pi(\varepsilon)\, d\varepsilon \]

Assuming a one-sided $\alpha = 0.025$, so that $z_{1-\alpha} = 1.96$, the prior is

\[ \pi(\varepsilon) = \begin{cases} 1/3, & \varepsilon = 0.1,\ 0.25,\ 0.4 \\ 0, & \text{otherwise} \end{cases} \]

In the classic approach, the mean of the effect sizes, $\bar{\varepsilon} = 0.25$, is used to calculate the sample size. For the design with $\beta = 0.2$, the total sample size would be

\[ n = \frac{4 (z_{1-\alpha} + z_{1-\beta})^2}{\bar{\varepsilon}^2} = 502 \]

If the Bayesian approach is used with this sample size,

\[ P_{exp} = \frac{1}{3} \left[ \Phi\left( \frac{0.1 \sqrt{n}}{2} - z_{1-\alpha} \right) + \Phi\left( \frac{0.25 \sqrt{n}}{2} - z_{1-\alpha} \right) + \Phi\left( \frac{0.4 \sqrt{n}}{2} - z_{1-\alpha} \right) \right] = 0.66 \]

The Bayesian approach takes the uncertainty of the effect size into account, so the expected power is the average of the three powers calculated from the three different effect sizes (approximately 0.20, 0.80 and 0.99 at n = 502). The expected power is found to be 66%, which is lower than the 80% power stated by the frequentist approach. To reach an expected power of 80%, the sample size needs to be increased. In this hybrid example, the Bayesian approach is used to increase the probability of success given that the final criterion is p ≤ α = 0.025.

G. FDA Guidance – Medical Devices

With the growing use of the Bayesian approach in clinical trials, the FDA has issued guidance that specifically addresses the use of Bayesian methods in medical devices and drug development.
In addition to the standard protocol, the FDA would like to have the prior information and all the assumptions that will be made during the study. The criterion for success of the study should be clearly stated and the proposed sample size should be justified. The FDA also recommends providing tables of the probability of satisfying the study claim, given certain "true" parameter values such as event rates, for various sample sizes. The tables should contain an estimate of the probability of a type I error when the parameter values are consistent with the null hypothesis, or of the power when the parameter values are consistent with the alternative. Simulations used to calculate the sample size and other study parameters should adequately reflect the study design.

The FDA suggests that the prior probability of the study claim be evaluated thoroughly before the study begins. The prior should not be too informative. The value that is considered too high for a prior probability depends entirely on the individual case, but the prior probability should definitely not be as large as the success criterion for the posterior probability. The prior should not overwhelm the data, as this could lead to inaccurate results and a loss of control of the type I error. The effective sample size quantifies the efficiency gained from using the prior information and gauges whether the prior is too informative. Data can be simulated using the prior distribution through Markov chain Monte Carlo simulations. The program code and data used to generate the simulated results, together with an electronic copy of the study data, should be provided to the FDA.

H. Conclusion

The Bayesian adaptive method can be used fully or as a hybrid with the classic approach.
The fully Bayesian approach is more beneficial in Phase I and Phase II studies because of the inherently adaptive nature of the design. In Phase I trials, conditions are more dynamic than in other phases, and the flexible nature of the Bayesian approach allows for unexpected changes. Clinical trials using this method tend to be smaller and more informative. Data can be assessed as they accumulate, so decisions to modify the trial can be made more quickly.

In applying the Bayesian method, it is imperative that the validity and integrity of the study are maintained. For adaptations such as changes to endpoints or hypotheses, the feasibility should be thoroughly evaluated in order to prevent abuse of the method. Protocol amendments need to be evaluated carefully, and sufficient information about the proposed study should be provided to the FDA according to the guidance documents.

Bayesian adaptive methods are more favorable than traditional methods because of their flexibility, greater efficiency and lower sample size requirements. Their implementation is beneficial to the pharmaceutical industry, as it helps to reduce costs and improve drug development.

Although in June 2003 the FDA approved Pravigard Pac (Bristol-Myers Squibb) based on analyses using the Bayesian approach, the agency is cautious about the growing trend of adaptive designs. It has released guidance documents that specifically address Bayesian clinical trials. The recommendations from the FDA provide guidance on the selection of the prior distribution and the use of simulated data to make such a selection.

Although the Bayesian approach has more flexibility than the frequentist approach, it also has drawbacks. Data analysis has to be conducted after treating each patient, which can become overwhelming. The selection of a prior probability can be challenging. There might not be historical data on which a prior distribution can be modeled.
The selected prior distribution might be too informative, resulting in inaccurate conclusions regarding the new treatment. Computation can be cumbersome for larger trials, and the chance of erroneous decision making is increased. Regardless of these drawbacks, the Bayesian approach remains more favorable than the traditional approach. In the long run, it will lead to faster drug development, which would in turn make drug development more economical.

I. References

1. Chang, Mark (2008). Adaptive Design Theory and Implementation Using SAS and R. Boca Raton: Chapman & Hall/CRC.
2. Berry, Scott M., Carlin, Bradley P., Lee, J. Jack and Muller, Peter (2011). Bayesian Adaptive Methods for Clinical Trials. Boca Raton: Chapman & Hall/CRC.
3. Chow, Shein-Chung and Chang, Mark (2008). Adaptive Design Methods in Clinical Trials – A Review. Orphanet Journal of Rare Diseases, 3, 11.
4. Cook, Thomas D. and DeMets, David L. (2008). Introduction to Statistical Methods for Clinical Trials. Boca Raton: Chapman & Hall/CRC.
5. The FDA Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER). Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics. www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM201790.pdf
6. Guidance for the Use of Bayesian Statistics. www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm
7. Berry, Donald A. (2006). Bayesian Clinical Trials. Nature Reviews Drug Discovery, 5, 27-36.