Problems and issues in adaptive clinical trial design Gordon Lan Statistical Research Center Pfizer Inc. ASA NJ-Chapter Seminar Piscataway, NJ June 4, 2002 Outline: 1. Information, sample size, trend of the data and conditional power (Max Halperin) 2. Re-estimation of sample size 3. Over-ruling of a sequential boundary 1 Adaptive (Flexible) designs Modification of the design of an experiment based on accrued data has been in practice for hundreds, if not thousands, of years in medical research. In the past, we have a tendency to adopt statistical procedures available in the literature and apply them directly to the design of clinical trials. However, since those procedures were not motivated by clinical trial practice, they may not be the best tools to handle certain situations. The major concern about unblinding during an interim analysis is the potential bias introduced by a change in clinical practice resulting from feedback from the analysis. To preserve the credibility of the study, results of the unblinded analysis should not be made available to anyone directly involved in managing the study. DSMB/FDA 2 How do we design a clinical trial for a new treatment? H0: New treatment = Standard treatment Ha1: New treatment is “better than” the standard treatment Ha2: New treatment is “worse than” the standard treatment 1=P(Accept Ha1| H0 ) 2=P(Accept Ha2| H0 ) = 1 + 2 Fixed versus sequential design. (1) 1= 0.025, 2= 0.025. (2) 1=0.025, 2= 0.2. In case (2), = 0.025 + 0.2 = 0.225. Is the study credible? Example: Z = 2.14, one-sided p-value = 0.016. Doubled one-sided p-value = 2 x 0.016 = 0.032. Two-sided p-value = ? 3 * 0.025 * * * * * 0.025 4 One sample, determination of sample size iid N ( ,1) X 1 , X 2 ,, X N Test Z(N ) o : o X1 X N N vs a : > o with .025, power = 1 - = .85 1.96 Reject o N N ( ) N z 1.96, z 1.04 ; z z 1.96 1.04 3. EZ ( N ) For a given value of , solve EZ ( N ) z z for N . That is, solve = 0.5 N= 36 N 3 for N . 0.2 0.1 0.05 225 900 3600 0.01 90000 5 0 In the one-sample case with X N(,1), we solve for N from N 3 z z . In the one-sample case with X = N(,2), we solve for N from N 3 z z . = For comparisons of two means, we solve N from X Y N = 3. 4 = X Y For comparisons of two survival distributions using the logrank test, we solve D (expected number of events) from log C T D = 3. 4 = log C T We will give a brief introduction to the statistical procedures for clinical trial design and interim data analysis for the one-sample case with X N(,1). The same idea (with minimal modifications) applies to the two-sample comparisons. 6 The “trend” of the data Z(1), Z(2),……. A stochastic process. X 1 , X 2 ,, X n ,, X N Z (n) Sn n iid N ( ,1) , n 1, , N EZ ( n ) n , Var Z ( n ) 1 EZ(n) > 0 N 3 = = n 0 1 ... 2 3 n N Given Z, EC[Z1]=? VarC[Z1]=? PC[Z1>1.96]=? (n, Z(n) ) ( , Z) ( , B) where B = Z . The parameter = E[ Z1] is call the drift. 7 Note that Z1 = 1 X 1 X n X n 1 X N N = + [ - ], the sum of two independent r.v.’s Conditional on given, ECB1 = B + (1 - ) VarC = (1 - ) 8 Example: 0.2 , N 3 N 225 when n 90 , Z.4 2.846 P ( Z1 1.96 Z.4 2.846, ) ? = 90/225 = 0. 4. B = 2.846 .4 =1.8, Z.4 Z(90) X90 2.846 X90 0.3 1 90 . 4.5 (ii) current trend 1.8 1.5 1.2 0 .3 3.6 (i) hypothetical trend 0.2 3 (0.4, 1.8) 1.8 (iii) Null trend (0.4, 1.2) 0 0.4 1.0 9 (i) P[ B1 1.96 B.4 1.8, 3] B1 3.6 1.96 3.6 B.4 1.8, 3] .6 .6 P[ N (0,1) 2.12] P[ 0.9830 (ii) P[ B1 1.96 B.4 1.8, 4.5] B1 4.5 1.96 4.5 B.4 1.8, 4.5] .6 .6 P[ N (0,1) 3.28] P[ .9995 (iii) P[ B1 1.96 B.4 1.8, 0] 1.96 1.8 ] .6 P[ N (0,1) 0.21] 0.4168 P[ N (0,1) This conditional power evaluation procedure applies to the comparisons of two means, or two survival distributions using the logrank test. 10 Two- Example (BHAT) Primary endpoint : survival time D is “estimated” to be 398 in June, 1982. d = 318 in September, 1981. = 318/398 = 0.80 Z0.8 logrank 2.82, B0.8 2.82 .80 2.52 3.15 (ii) 2.52 (iii) 2.52 0 .8 (ii) Under the current trend 1.96 315 . Pc Z1 1.96 P N (0,1) .2 P N (0,1) 2.66 .9961 (iii) Under the null trend 1.96 2.52 Pc Z1 1.96 P N (0,1) .2 P N (0,1) 1.25 .8944 * PType I error .025/.8944 .02795 11 1 Review of sample size estimation N = sample size or amount of information required = F(, , power), (1) where = (treatment effect) is unknown in practice. When is given, N is evaluated by (1) to reach a desired power. For the comparison of two means, = (x - y)/; For the comparison of two survival distributions under PH assumption, = log(C/T). Choose = the minimal clinically meaningful treatment difference. =? Public health Potential profit If can be estimated accurately in advance, then it is not ethical to start the clinical trial. 12 Re-estimation of sample size 1. Based on re-estimation of nuisance parameters. Comparison of two means: 1 and 2 are parameters of interest; is a nuisance parameter; Sample size depends on the value of = 1 2 . 2. Based on observed treatment difference. We will discuss only case 2. Re-estimation of sample size may inflate the -level. 13 Proposal #1: Introduce a weighted Z-test to control the -level. Let Z X 1 X 2 ... X 100 100 X 1 X 2 ... X 40 100 X 41 X 42 ... X 100 100 40 X 1 X 2 ... X 40 60 X 41 X 42 ... X 100 100 100 40 60 0.4 Z [1 40] 0.6 Z [41 100]. 14 Perform interim data analysis after n(<N) observations. If the sample mean X is greater than or equal to 0 , proceed as planned. If the sample mean X is less than 0 , then increase the sample size of the study to N* so that the “unconditional power” is 85%. Modify the test statistic from Z to Z*. n n The weighted Z* puts weights: (= n/N) on the observations X1, X2,…, Xn, and 1 on the observations X(n+1), X(N+2),…, XN*. 15 Let Z X 1 X 2 ... X N N X 1 X 2 ... X n N X n 1 X n 2 ... X N N n X 1 X 2 ... X n N n X n 1 X n 2 ... X N N N n N n X X 2 ... X n X X n 2 ... X N 1 1 n 1 n N n Z [1 n] 1 Z [(n 1) N ]. The weighted Z-test is defined as Z * Z [1 n] 1 Z [(n 1) N *]. Note that N* depends on Z[1~n] but Z* has a standard normal distribution under H0. Even if N* is decided by a desired unconditional power (say, 75%), Z* is also standard normal under H0. The only problem of this approach is the unequal weights of observations. (N = 100, n = 60, = .6, 1- = .4, N* = 200). 16 Proposal #2: Monitor at time = n/N, where 0 < < 1. Evaluate conditional power (CP) based on the value of Z, and suppose the observed trend continues. (1) Extension of the study may inflate -level. (2) If the study stops early for futility because the CP is low, then the -level will be reduced. We may use the reduced in (2) to re-estimate the sample size. 17 Animal Care and Use Committee, NHLBI-NIH (1987-1989) 100(rats) = 50 + 50 After 50 rats, if the results are “promising”, give the researcher another 50. If the results are NOT promising, stop the experiment. What if the results are somewhat promising? P(Z(100) 1.96 | Z(50) and the current trend) = (?0.9 or 0.015 or 0.06?) 18 Use of Conditional Power to modify sample size I. An example of a two-stage design. Data are analyzed at = 0.5. CP = CP( ̂ ). (1) If CP 0.1, stop and accept Ho. (2) If CP 0.85, continue to = 1. Reject Ho if Z1 z0.025= 1.96. (3) If 0.1 < CP < 0.85, extend the trial to M > 1 so that CPM = 0.85, and reject Ho if ZM 1.96. The probability of Type I error, estimated by simulations with 1 million repetitions, is ̂ = 0.022. Modifications and simulations. Cynthia Siu, 2001 JSM Cunshan Wang, 2002 JSM 19 II. Use simulations to investigate similar two-stage designs under various conditions. Consider = 0.1 to 0.9 by 0.1, = 0.025 and 0.05. (1) If CP ll (lower limit ll = 0.0 to 0.45 by 0.05), stop and accept Ho. (2) If CP ul (upper limit ul = 0.5 to 0.95 by 0.05) continue to = 1. Reject Ho if Z1 z (3) If ll < CP < ul, extend the trial to M > 1 such that CP(M) = ul, and reject Ho if ZM z. The simulation results suggested that when 0.7 and ll 0.1, then the P(Type I error) . Optimality versus flexibility 20 Some selected simulation results for = 0.8 and 0.9, = 0.025 are reported in the following table. (=0.025.) 0.8 ll 0.05 ul 0.5 – 0.75 0.8 0.8 0.8 0.8 0.9 0.05 0.10 0.10 0.15 0.05 0.8 – 0.95 0.5 – 0.55 0.6 – 0.95 0.5 – 0.95 0.5 – 0.70 0.9 0.10 0.5 – 0.60 0.9 0.9 0.10 0.15 0.65 – 0.95 0.5 – 0.95 21 ̂ 0.252 ̂ 0.0270 < 0.025 0.0253 < 0.025 < 0.025 0.260 ̂ 0.0272 0.252 ̂ 0.0254 < 0.025 < 0.025 Alpha Spending Function *(2)- *(1) *() 0 1 2 1 *() is a non- decreasing function defined on [0,1]. *(0) = 1, *(1) = . At 1 , find b1 such that (under H0) P(Z1 b1 ) = *(1). At 2 , find b2 such that (under H0) P(Z1< b1 and Z2 b2) = *(2) - *(1). At 3, …….. 22 http://www.medsch.wisc.edu/landemets/ Programs for Computing Group Sequential Boundaries Using the Lan-DeMets Method Version 2 Last Update: 12/1/99 David M. Reboussin Department of Public Health Bowman Gray School of Medicine Winston-Salem, NC 27157 David L. DeMets Department of Biostatistics and Medical Informatics, Department of Statistics University of Wisconsin Madison, WI 53706 KyungMann Kim Department of Biostatistics and Medical Informatics, Department of Statistics University of Wisconsin Madison, WI 53706 K. K. Gordon Lan Pfizer Central Research Groton, CT 06340 23 Overruling of a group sequential boundary What if we specify a spending function (group sequential boundary) in the protocol but use it only as a “guideline”? Remarks: (1) It is possible that, for some k<K, the boundary is crossed but the DSMB decides to continue the study. In addition, the decision to reject or to accept the null hypothesis will be made depending on the future Z-values. (2) The “lower boundary” introduced in ss re-estimation cannot be overruled. 24 Reasons for overruling Which treatment is better? (Reference: Lan and Wittes, 1994) (1) Multiple endpoints. Pick a primary endpoint and use a group sequential boundary to monitor this endpoint. (2) Primary endpoint = survival time Definition A: The new treatment is better if it delays the occurrence of death. Definition B: The new treatment is better if it reduces hazard all the time. 25 How to adjust the future upper boundary after overruling 2* () = log [1 + (e-1) ] ------ Pocock = Boundary based on 2* () 0.25 0.5 0.75 1 2.37 2.37 2.36 2.35 2.16 2.31 2.33 2.04 2.26 1.96 ________________________________________________ Power comparisons: =0 = 2.187 = 2.453 = 2.738 = 3.065 Original boundary Z1=2.35 Modified as above .025 .50 .60 .70 .80 .0094 .4353 .5410 .6510 .7627 .0126 .4679 .5732 .6792 .7849 26 Sequential design: * * Reject * * 0.5 * With the lower boundary at = 0.5 for acceptance of the null hypothesis, then the realized -level will be less than 0.025. If, to improve efficiency, we lower the upper boundary so that = 0.025, then the lower boundary CANNOT be overruled. 27 Recommendations for overruling: A group sequential boundary should only be overruled in a conservative way. If the design of a trial is one-sided, do not introduce a lower boundary unless it is necessary. EaST Triangular test 28 Final comments 1. Many statistical methods in clinical trial design and data analysis need modifications. Modification of the design of an experiment based on accrued data has been in practice for hundreds, if not thousands, of years in medical research. The major concern about unblinding during an interim analysis is the potential bias introduced by a change in clinical practice resulting from feedback from the analysis. To preserve the credibility of the study, results of the unblinded analysis should not be made available to anyone directly involved in managing the study. (I think the issue here is how to develop the SOP’s for interim analyses.) 2. In clinical trial data monitoring process, we may not have a Brownian motion process with a linear trend. 29