Nick Pavlakis, MBBS, MMed (Clin Epi), PhD
Royal North Shore Hospital, Sydney University
PHASE II DESIGNS IN ONCOLOGY

Outline
• Aims of a Phase II study
• Design aspects
  – Disease/population selection
  – Endpoint selection
  – Single-arm vs randomised design
  – Enrichment strategies involving biomarkers
  – Statistical design aspects: Traditional, 2-stage vs adaptive/Bayesian
• Case example

Clinicians’ perspective for Phase II trials
• Why do a Phase II Trial?
  – To determine the activity of a drug (efficacy) in a given tumour type or types (“screening”)
    • Seek preliminary hints of activity and guide selection of tumour types for further study
  – To determine safety of the drug
    • In a specific patient population/disease setting
• Part of strategic pharma development, i.e., Phase II following Phase I (“decision making”)
  – Go/no-go answer to allow the conduct of a definitive Phase III/registration trial in a specific disease
• Co-operative group or investigator-initiated
  – Clinically driven rationale/unmet need
  – Single agent or combination with existing therapy

Aims of a Phase II study
• Provide initial assessment of drug efficacy: “clinical activity”
  – Screen out ineffective drugs
  – Identify promising new drugs for future evaluation
• Further define safety and toxicity
  – Type
  – Frequency
• Provide sufficient evidence base to support Phase III development

Disease/population selection
• Aim to include patients who are most likely to benefit from the
intervention being tested; exclude patients unlikely to benefit or at greater risk of harm
  – Homogeneous population
One or more of the following factors may drive this
• A priori information
  – Disease prevalence of a particular protein or gene abnormality predicting for greater drug benefit based on drug MoA (biologic rationale)
    • Predictive or prognostic clinical factors or biomarker
  – Pre-clinical evidence for activity/proof of concept in that tumour type
• Clinical factors
  – Responses seen in Phase I
  – Biologic rationale in disease area of unmet need

Endpoint selection
• Response rate (RR)*
• Progression-free survival (PFS)*
• Overall survival (OS)
• Patient-reported outcomes (PRO)/Quality of life (QoL)
  – e.g., clinical benefit response of gemcitabine in pancreatic cancer (pain, KPS, weight)
• Molecular biomarkers, e.g., biomarker expression
• Functional imaging, e.g., PET
• Toxicity
*Assumes these are intermediate predictors for OS

Primary endpoint (PE) selection
Dhani et al. Clin Cancer Res. 2009;15(6), March 15, 2009

Primary endpoint selection
• RR vs PFS
  – Requires understanding of expected drug effect on disease and clinical setting, e.g., cytotoxic vs cytostatic; 1st line vs 2nd line vs maintenance
• RR or PFS by RECIST
  – Rigor depends on goal of the study, e.g., activity “screening” trial (investigator review) vs decision-making “go/no-go” trial (independent review)
  – Cost implications with independent review, multiple scanning in short intervals

Primary endpoint selection: RR vs PFS
• Know your drug
  – What is the drug expected to do?
  – Is tumour response expected to occur based on drug MoA or prior observation in Phase I?
• Expectations may differ with monotherapy vs combination with chemo
• Decision may come down to purpose of the Phase II trial, i.e., “signal” finding vs “go/no-go”
  – e.g., Phase II PFS endpoint for extension to Phase III OS study in Phase II/III design
  OR
  – RR endpoint for rapid screening in “pick-the-winner” randomised Phase II study
• Study size depends on type I/II error rates around PE estimate

Types of Phase II studies
Seymour et al. CCR 2010 March 15; 16(6): 1764–1769

Single arm vs randomised design
Single arm
• Smaller sample size
• Assumes some a priori information of expected RR based on historical information/control database
• One- or two-stage design
• Efficient? Short accrual, esp. with RR, but only one study at a time
Randomised
• Generally larger
• Ideal for comparison of primary endpoint OR for “calibration” against a control arm where expected outcomes less certain
• Comparative or non-comparative designs
• More expensive but can explore multiple arms at once

Single arm vs randomised design: Primary endpoint RR
Monotherapy
• Single-arm acceptable
• Randomisation for testing different dose or schedule or comparing with other active therapies
Combination therapy
• Randomised design usually preferred, esp. with new combinations
  – e.g., standard therapy ± novel agent, or combination of novel agents

Single arm vs randomised design: Primary endpoint PFS
Monotherapy
• Single arm usually only acceptable when there exists solid a priori info on expected outcomes for patients/disease
  – Historical controls
  – Database
Combination therapy
• Randomised design most suitable
• Use of placebo ideal (depending on drug)
• Comparative vs non-comparative designs

Accuracy of single vs randomised Phase II studies
“Variability in historical control success rates, outcome drifts in patient populations over time, and/or patient selection effects can result in inaccurate false-positive and false-negative error rates in single-arm designs, but leave performance of the randomised two-arm design largely unaffected at the cost of 2 to 4 times the sample size compared with single-arm designs. Given a large enough patient pool, the randomised phase II designs provide a more accurate decision for screening agents before phase III testing”.
Tang H et al.
J Clin Oncol 28:1936-1941

Randomised Phase II designs: basic considerations
• Non-comparative
  – Each arm considered on its own for the PE
  – Good for multiple drug screening, “pick-the-winner” design
• Comparative
  – Statistical design based on PE in experimental arm compared with control arm
• Cross-over (after progression)
  – To experimental therapy: to improve drug access, provide extra info
• Adaptive designs: evaluate patients' reactions to a drug early in a clinical trial and modify the trial accordingly
  – Adapt dose, target population; biomarker enrichment; futility criteria; sample size re-estimation
• Randomised discontinuation
  – Useful when enrolling patients in non-progressive disease setting with cytostatic therapies
  – A type of enrichment strategy

Randomised discontinuation design
• Alternative phase II study design for determining activity of “cytostatic” anticancer agents
• Example of enrichment design:
  – To select a subset of enrolled patients, homogeneous with respect to important prognostic factors, and randomise these
• Advantageous when a subset of patients, those expressing the molecular target, is sensitive to the agent
Rosner GL et al. J Clin Oncol 2002;20:4478-84; Freidlin B et al. J Clin Oncol 2005;23:5094-8.

Randomised discontinuation design
• Initial phase can be large
• 2nd phase (randomised phase): select patients thought most likely to benefit
• Design (follow-up) of each phase and sample size affected by
  – Expected cancer growth rate, and
  – Degree of drug activity
• N = 335: carboxyaminoimidazole (CAI, NSC 609974) CALGB study in metastatic RCC
  – Patients on placebo could cross over to CAI
  – PE: proportion of patients progressing, CAI vs placebo
• This design is for suspected cytostatic therapies where the initial patient population may be too heterogeneous
• Alternative designs: simple randomised Phase II; placebo to agent for fixed periods; cross-over
Rosner GL et al.
J Clin Oncol 2002; 20: 4478-84

Enrichment strategies: Population enrichment design
• Learn Phase
  – Determine biomarker status
  – Conduct the trial of study drug vs comparator with two biomarker cohorts: Biomarker Positive (BP) cohort and Biomarker Negative (BN) cohort
• Confirm Phase
  – Assuming a clear trend of efficacy in the BP compared to the BN cohort, conduct the trial of study drug vs comparator in the BP cohort only (a)
(a) If there are similar treatment effects in both cohorts, the entire population may be carried forward.
Ananthakrishnan R et al. Crit Rev Oncol Hematol. 2013 Oct;88(1):144-53

Enrichment strategies
• Efficiency depends on strength of a priori information in relation to biomarker
• Known biomarker-effect relationship
  – Go straight to biomarker-selected design
    • e.g., ALK gene re-arrangement + Phase II population with ALK inhibitor
• Less certain correlation or broader action of drug beyond biomarker
  – Include ALL; analyse by biomarker after N1, validate activity in biomarker-selected population in N2: e.g., PDL1 inhibitor studies
  – Biomarker-selected first phase then unselected second phase: T790M + NSCLC

Adaptive designs
Berry DA. Nat Rev Clin Oncol. 2012 (9): 199-207

Adaptive designs: Phase II-III
National Academy of Sciences, Nass SJ, Moses HL, and Mendelsohn J (Eds.), 2010. A National Cancer Clinical Trials System for the 21st Century: Reinvigorating the NCI Cooperative Group Program. National Academies Press, Washington, DC.
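
The Tang et al. quotation earlier in the deck warns that drift in historical control rates distorts the error rates of single-arm designs. The following is a minimal sketch of that effect, not a reproduction of their analysis. It assumes a hypothetical single-arm design (36 patients, drug declared active at 11 or more responses, historical control rate 0.20); those numbers and the function names are illustrative assumptions. It computes the exact binomial chance of declaring an inactive drug active when the true control rate is higher than the rate assumed at design time.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k or more responses among n patients."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def false_positive_rate(n: int, min_responses: int, true_control_rate: float) -> float:
    """Chance that an inactive drug (responding at the true control rate)
    still reaches the 'declare active' threshold in a single-arm trial."""
    return prob_at_least(min_responses, n, true_control_rate)


if __name__ == "__main__":
    # Hypothetical design: n = 36, declare active at >= 11 responses,
    # planned against an assumed historical control rate of 0.20.
    for rate in (0.20, 0.25, 0.30):
        fpr = false_positive_rate(36, 11, rate)
        print(f"true control rate {rate:.2f}: false-positive rate {fpr:.3f}")
```

Under these assumptions the false-positive rate stays below 0.10 only while the control rate really is 0.20; it rises quickly as the true rate drifts upward, which is the argument for a concurrent randomised control arm.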
Statistical design aspects
• Rule no 1: Involve a statistician early on!
• GLOSSARY
• P0: the largest response proportion that, if true, means the treatment does not warrant further study
• P1: the smallest response proportion that, if true, implies that the treatment warrants further study
• A statistical test is then conducted of the null hypothesis (H0) that P ≤ P0 vs the alternative hypothesis that P ≥ P1, where P is the true proportion responding to the treatment in the population

Statistical design aspects
• GLOSSARY (continued)
• α is the probability of rejecting the null hypothesis (H0) when it is true (Type I error: the incorrect rejection of a true null hypothesis; false positive), usually 0.05 or 0.10 in Phase II trials
  – i.e., the probability of rejecting the hypothesis that the proportion responding to the treatment is less than or equal to P0 when this hypothesis is actually true, Pr(rejecting P ≤ P0 | P ≤ P0)
• β is the probability of rejecting the alternative hypothesis when it is true (Type II error: failure to reject a false null hypothesis; false negative)
  – i.e., the probability of not rejecting the hypothesis that the proportion responding to the treatment is less than or equal to P0 when this hypothesis is false, Pr(not rejecting P ≤ P0 | P ≥ P1)
• Power is the probability of rejecting the null hypothesis that the proportion responding to the treatment is less than or equal to P0 when this hypothesis is false.
Power = 1 − β; usually 0.80 or 0.90

Design considerations in Phase II trials
• Minimise cost of the trial
  – Minimise number of patients exposed to an ineffective treatment
  – Enroll as few patients as “necessary” to show benefit or failure

Standard Single-Arm Phase II Study
• Binary endpoint (clinical response vs no response)
• Simple setup:
  – α = 0.10, β = 0.10 (power = 0.90)
  – H0: P0 = 0.20 (null response rate)
  – H1: P1 = 0.40 (target response rate)
• Based on design parameters: N = 36
  – Conclude effective if 11 or more responses (i.e., observed response rate of ≥0.31)
• What if by the 15th patient, you’ve seen no responses?
  – Is it worth proceeding?
• Maybe you should have considered a design with an early stopping rule
  – 2-stage designs

Classic Simon 2-stage single-arm study
Ananthakrishnan R et al. Crit Rev Oncol Hematol. 2013 Oct;88(1):144-53

Revised 2-stage design
• Stage 1: enroll 19 patients
  – If 4 or more respond, proceed to stage 2
  – If 3 or fewer respond, stop
• Stage 2: enroll 17 more patients (total N = 36)
  – If 11 or more of total respond, conclude effective
  – If 10 or fewer of total respond, conclude ineffective
• Design properties
  – α = 0.10
  – H0: p = 0.20 (null response rate)
  – H1: p = 0.40 (target response rate)
• What about power?

2-Stage designs
Gehan Two-Stage Design (1961)
• It is a two-stage design for estimating the response rate while providing for early termination if the drug shows insufficient antitumour activity
• The design is most commonly used with a first stage of 14 patients.
If no responses are observed, the trial is terminated
Fleming Two-Stage Design (1982)
• Of the two-stage designs covered here, Fleming’s is the only one that allows early termination with an “accept the drug” conclusion

Frequentist Versus Bayesian
• So far, “frequentist” approaches
• Frequentists: α and β errors
• Bayesians:
  – Quantify designs with other properties
  – General philosophy:
    • Start with prior information (“prior distribution”)
    • Observe data (“likelihood function”)
    • Combine prior and observed data to get “posterior distribution”
    • Make inferences based on posterior

FDA criteria for adaptive Bayesian clinical trial
According to FDA guidelines, an adaptive Bayesian clinical trial can involve:
  – Interim looks to stop or to adjust patient accrual
  – Interim looks to assess stopping the trial early either for success, futility or harm
  – Reversing the hypothesis of non-inferiority to superiority or vice versa
  – Dropping arms or doses or adjusting doses
  – Modification of the randomisation rate to increase the probability that a patient is allocated to the most appropriate arm

Adaptive randomisation designs
Begin assuming equally effective (1/3, 1/3, 1/3)
• May wait until a minimum number have been treated per arm
• Based on currently available (accumulated) data, randomise next patient (i.e., “weighted” randomisation)
• Stopping rules: drop an arm when there is “strong” evidence that
  – It has low efficacy, OR
  – It has lower efficacy than competing treatments

Bayesian inferences
No p-values and confidence intervals
From the posterior distribution:
• Posterior probabilities
• Prediction intervals
• Credible intervals
Bayesian designs
• Can look at data as often as you like (!)
• Use information as it accumulates
• Make “what if” calculations
• Helps decide to stop now or not

Biomarker-Integrated Approaches of Targeted Therapy for Lung Cancer Elimination (BATTLE)
• Umbrella protocol
• Core needle biopsy
• Biomarker profile
  – EGFR mutation/copy number
  – KRAS/BRAF mutation
  – VEGF/VEGFR-2 expression
  – RXRs/Cyclin D1 expression and CCND1 copy number
• Equal followed by adaptive randomisation
  – Erlotinib
  – Vandetanib
  – Erlotinib + bexarotene
  – Sorafenib
Kim ES et al. Cancer Discovery. Online April 3, 2011; DOI: 10.1158/2159-8274.CD-10-0010

Key issues with adaptive designs
• Whether the adaptation process has led to design, analysis, or conduct flaws that have introduced bias that increases the chance of a false conclusion that the treatment is effective (a Type I error)
• Whether the adaptation process has led to positive study results that are difficult to interpret irrespective of having control of Type I error

Case Example: INTEGRATE Study
J Clin Oncol 33, 2015 (suppl 3; abstr 9)
• The disease and patient population
  – 2012: The treatment of refractory advanced oesophago-gastric cancer (AOGC) remains an area of unmet need
  – Vascular endothelial growth factor (VEGF) associated with prognosis in AOGC
    • High IHC expression: poor outcome
    • Low sVEGF: better outcome
    • 1st-line Phase III AVAGAST study: Chemotherapy (CX) +/-
bevacizumab: improved PFS and ORR, but not OS
• The drug:
  – Regorafenib (BAY 73-4506): oral multi-kinase inhibitor targeting angiogenic (VEGFR, TIE-2), stromal (PDGFR-β) and oncogenic (RAF, RET and KIT) receptor tyrosine kinases
  – Effective in Phase III studies in refractory colorectal cancer (CORRECT study) and refractory GIST (GRID study)

Case Example: INTEGRATE Study
• The study question: Is regorafenib active in AOGC?
  – Goal: to determine if there is sufficient activity for a Phase III study (screening)
• Study design and considerations
  – Phase II, multicentre study
• Single-arm vs randomised?
  – Concerns: “cytostatic” drug (mainly SD in CORRECT study); uncertain PFS in AOGC
• Randomised Phase II: placebo-controlled

Case Example: INTEGRATE Study
• Study population
  – Adults with AOGC, histo/cytologically confirmed (adeno or undifferentiated), refractory to 1st- or 2nd-line chemo
  – No prior anti-VEGF therapy
• Type of disease: Measurable or evaluable?
• Primary endpoint selection: ORR vs PFS
• PFS
  – Measurable disease defined by RECIST Version 1.1 by CT scan performed within 21 days prior to randomisation

Case Example: INTEGRATE Study
• Study design questions
• Why have a control arm?
• Comparative vs non-comparative design?
  – Both are options: N greater in comparative design
• Interim analysis?
• Randomisation: 1:1 vs 2:1 (regorafenib:placebo)
• Cross-over: Y or N?

Case Example: INTEGRATE Study
STATISTICAL PLAN
• Assumption for this patient population:
  – Median TTP approximately 2 months. An increase in the median TTP to 3.33 months for patients receiving regorafenib will be of clinical interest.
• The primary endpoint is the median progression-free survival (PFS) in the intervention group.
• N = 92 patients in the treatment group will have 90% power with 95% confidence to include a clinically interesting 66% PFS rate at 2 months and exclude a less interesting 50% PFS rate at 2 months. 100 patients will be randomised to allow for drop out/ineligibility.
• Simon 2-stage design
• Futility analysis after 33 patients have been followed for at least 2 months. If 15 or more patients have not progressed then the study regimen/design will be reassessed or the study stopped.

Case Example: INTEGRATE Study
• Randomised phase II calibration design
• N=100 regorafenib (active arm)
  – Simon 2-stage design
  – Null hypothesis (H0): 2-month PFS ≤50% (i.e., median PFS ≤2.00 months)
  – Alternative hypothesis (HA): 2-month PFS ≥66% (i.e., median PFS ≥3.33 months)
  – N=100 provides >90% power at 5% significance to reject H0 if HA is true
• N=50 placebo (calibration arm)
  – To judge the applicability of the reference hypothesis (H0)
  – To provide a reference if H0 found not applicable
• A total of N=150 patients randomised 2:1, stratified by prior chemotherapy lines (1 vs 2) and geographic region (ANZ/Canada/Korea)

CONCLUSION
PHASE II TRIAL DESIGNS
Appropriate primary endpoint
• “Response”: tumour shrinkage expected (or other qualified biomarker)
  – Combination: randomised design
  – Monotherapy: single arm design
• PFS
  – Randomised design
  – Only if robust control available: consider single arm design
• Include secondary endpoints (biomarkers, PROs, imaging)
• Biomarkers
  – Do not enrich unless clinically validated
  – Consider adaptive designs
  – Consider multi-disease trials
Seymour et al. CCR 2010 March 15; 16(6): 1764–1769
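
The “What about power?” question on the revised 2-stage design slide can be answered by direct calculation. The sketch below is one way to do it, using exact binomial probabilities and the numbers given on the slides: a single stage of 36 patients (effective at 11 or more responses), and the 2-stage version (stop if 3 or fewer of the first 19 respond; effective at 11 or more of 36). The function names and the dictionary layout are assumptions made for this illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of k or fewer responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def single_stage_reject_prob(n: int, r: int, p: float) -> float:
    """P(more than r responses in n patients), i.e. P(conclude effective)."""
    return 1.0 - binom_cdf(r, n, p)


def two_stage(n1: int, r1: int, n: int, r: int, p: float) -> dict:
    """Operating characteristics of a 2-stage design at true response rate p.

    Stop after stage 1 if responses <= r1; otherwise enrol n - n1 more
    patients and conclude effective if total responses exceed r.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Stage 2 must contribute more than r - x1 responses.
        reject += binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
    return {"pet": pet, "reject": reject, "expected_n": n1 + (1.0 - pet) * n2}


if __name__ == "__main__":
    p0, p1 = 0.20, 0.40  # null and target response rates from the slides
    print("single stage alpha:", round(single_stage_reject_prob(36, 10, p0), 3))
    print("single stage power:", round(single_stage_reject_prob(36, 10, p1), 3))
    print("two stage under H0:", two_stage(19, 3, 36, 10, p0))
    print("two stage under H1:", two_stage(19, 3, 36, 10, p1))
```

At p = 0.20 the `reject` entry is the type I error and `pet` is the chance of stopping after 19 patients; at p = 0.40 the `reject` entry is the power. The same functions could be applied to the INTEGRATE 2-stage plan, but the slides do not give its final decision threshold, so that calculation is not attempted here.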