Statistical Methods in Clinical Trials: Sample Size Determination
Ziad Taib, Biostatistics, AstraZeneca
March 12, 2012

• "The number of subjects in a clinical study should always be large enough to provide a reliable answer to the question(s) addressed."
• "The sample size is usually determined by the primary objective of the trial."
• "Sample size calculation should be explicitly mentioned in the protocol."
(from ICH-E9)

Power and sample size
• Suppose we want to test whether a drug is better than a placebo, or whether a higher dose is better than a lower dose.
• Sample size: How many patients should we include in our clinical trial to give ourselves a good chance of detecting any effect of the drug?
• Power: Assuming that the drug has an effect, what is the probability that our clinical trial will give a significant result?

Sample Size and Power
• Sample size is contingent on the design, the analysis plan, and the outcome.
• With the wrong sample size, you will either
  – be unable to draw conclusions because the study is "underpowered", or
  – waste time and money because the study is larger than it needed to be to answer the question of interest.

Sample Size and Power
• With the wrong sample size, you may have problems interpreting your result:
  – Did I fail to find a significant result because the treatment does not work, or because my sample size was too small?
  – Did the treatment REALLY work, or is the effect I saw too small to warrant further consideration of this treatment?
  – This is the issue of CLINICAL versus STATISTICAL significance.

Sample Size and Power
• Sample size calculation ALWAYS requires the investigator to make some assumptions:
  – How much better do we expect the experimental therapy group to perform than the standard therapy group?
  – How much variability do we expect in the measurements?
  – What would be a clinically relevant improvement?
• The statistician CANNOT tell what these numbers should be.
• It is the responsibility of the clinical investigators to define these parameters.

Errors with Hypothesis Testing
H0: E = C (treatment ineffective) vs. H1: E ≠ C (treatment effective)

              Accept H0        Reject H0
  E = C       OK               Type I error
  E ≠ C       Type II error    OK

Type I Error
• Concluding for the alternative hypothesis while the null hypothesis is true (false positive).
• Probability of a Type I error = significance level, or α level.
• Its value needs to be specified in the protocol and should be small.
• Explicit guidance from regulatory authorities, e.g. α = 0.05.
• One-sided or two-sided?
• H0: ineffective vs. Ha: effective.

Type II Error
• Concluding for the null hypothesis while the alternative hypothesis is true (false negative).
• Probability of a Type II error = β level.
• Its value is to be chosen by the sponsor, and should be small.
• No explicit guidance from regulatory authorities; e.g. β = 0.20 or 0.10.
• Power = 1 − β = 1 − Type II error rate
        = P(reject H0 when Ha is true)
        = P(concluding that the drug is effective when it is).
• Typical values: 0.80 or 0.90.

Sample Size Calculation
• The following items should be specified:
  – the primary variable
  – the statistical test method
  – the null hypothesis, the alternative hypothesis, and the study design
  – the Type I error
  – the Type II error
  – how treatment withdrawals will be handled

Three common settings
1. Continuous outcome: e.g., number of units of blood transfused, CD4 cell counts, LDL-C.
2. Binary outcome: e.g., response vs. no response, disease vs. no disease.
3. Time-to-event outcome: e.g., time to progression, time to death.

Continuous outcomes
• Easiest to discuss.
• The sample size depends on
  – Δ: the difference to be detected under the alternative hypothesis
  – α: the Type I error
  – β: the Type II error
  – σ: the standard deviation
  – r: the ratio of the numbers of patients in the two groups (usually r = 1)

One-sample
• Assume that we wish to study a hypothesis related to a sample of independent and identically distributed normal random variables with mean μ and standard deviation σ.
• Assuming σ is known, a (1 − α)100% confidence interval for μ is given by
  $\bar{Y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
• The maximum error in estimating μ is $E = z_{\alpha/2}\,\sigma/\sqrt{n}$, which is a function of n. If we want to pre-specify E, then n can be chosen according to
  $n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2$.
• This takes care of the problem of determining the sample size required to attain a certain precision in estimating μ using a confidence interval at a certain confidence level. Since a confidence interval is equivalent to a test, the above procedure can also be used to test a null hypothesis. Suppose thus that we wish to test the hypotheses
  – H0: μ = μ0
  – Ha: μ > μ0
• with significance level α. For a specific alternative H1: μ = μ0 + δ, where δ > 0, the power of the test is given by
  $1 - \beta = P\!\left( Z > z_{\alpha} - \frac{\delta\sqrt{n}}{\sigma} \right)$,
  which leads to the sample size
  $n = \frac{(z_{\alpha} + z_{\beta})^2\,\sigma^2}{\delta^2}$.

Two samples
• H0: μ1 − μ2 = 0
• H1: μ1 − μ2 ≠ 0
• With known variances σ1² and σ2², the difference of the sample means is normal, so the power against a specific alternative μ1 = μ2 + δ is given by
  $1 - \beta = P\!\left( Z > z_{\alpha/2} - \frac{\delta}{\sigma_d} \right) + P\!\left( Z < -z_{\alpha/2} - \frac{\delta}{\sigma_d} \right)$,
  where $\sigma_d = \sqrt{(\sigma_1^2 + \sigma_2^2)/n}$. Solving for n gives
  $n = \frac{(z_{\alpha/2} + z_{\beta})^2 (\sigma_1^2 + \sigma_2^2)}{\delta^2}$.
• In the one-sided case we have
  $n = \frac{(z_{\alpha} + z_{\beta})^2 (\sigma_1^2 + \sigma_2^2)}{\delta^2}$.

Variants of the two-sample formula
• For testing the difference between two means, with equal variance σ² and equal allocation to each arm:
  $n = \frac{2 (z_{\alpha/2} + z_{\beta})^2\,\sigma^2}{\delta^2}$ per arm.
• With unequal allocation to each arm, where n2 = r·n1:
  $n_1 = \left(1 + \frac{1}{r}\right) \frac{(z_{\alpha/2} + z_{\beta})^2\,\sigma^2}{\delta^2}$.
• If the variance is unknown, replace the normal quantiles by those of the t-distribution.

One proportion
We want to test
• H0: P = P0
• Ha: P = P1 (with P1 > P0)
with a certain power (1 − β) and a certain significance level α. The required sample size is
  $n = \frac{\left( z_{\alpha}\sqrt{P_0(1-P_0)} + z_{\beta}\sqrt{P_1(1-P_1)} \right)^2}{(P_1 - P_0)^2}$.

Two proportions
We want to test
• H0: P1 = P2
• Ha: P1 ≠ P2
with a certain power (1 − β) and a certain significance level α.
The required sample size per group is
  $n = \frac{\left( z_{\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + z_{\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right)^2}{(P_1 - P_2)^2}$, where $\bar{P} = \frac{P_1 + P_2}{2}$.

Analysis of variance
We want to test
• H0: μ1 = μ2 = … = μk
• Ha: at least one μi differs from the others.
With k groups of n patients each (N = kn in total), the F-test rejects H0 when
  $F > F_{\alpha}(k-1,\, k(n-1))$,
and under the alternative the test statistic follows a noncentral F-distribution with noncentrality parameter
  $\lambda = \frac{n \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{\sigma^2}$.
The resulting power equation is difficult to solve for n, but there are tables (and software such as StudySize).

Survival analysis
• Assume we plan an RCT aiming at comparing a new treatment (drug) with an old one (control). Such a comparison can be made in terms of the hazard ratio, or some function of it. In the sequel, d stands for drug and c for control.
• We describe the sample size required to demonstrate that the drug is better than the control treatment. This will be done, under some assumptions, in two steps:
• Step 1: specify the number of events needed.
• Step 2: specify the number of patients needed to obtain "the right" number of events.

Step 1
In what follows we use the notation below to give a formula specifying the number of events needed to attain a certain asymptotic power. The formula applies in many different cases: the log-rank test, the score test, parametric tests based on the exponential distribution, etc.
  p — the proportion of individuals receiving the new drug
  q = 1 − p — the proportion of individuals receiving the old drug
  d — the required number of events
  C — the critical value of the test
  Z_power — the upper quantile of the standard normal distribution corresponding to the power
  β — the log hazard ratio of the new drug relative to the old one
The required number of events is
  $d = \frac{(C + Z_{\mathrm{power}})^2}{p\,q\,\beta^2}$.
• It is not clear how β should be estimated. The difficulty lies in the fact that this depends on the exact situation at hand. We illustrate this through some examples.
• EXAMPLE 1: Assume that our observations follow exp(λ) under the old drug (control) and exp(λe^β) under the new drug.
Then the ratio of the medians of the two distributions is
  $\frac{M_d}{M_c} = \frac{\ln 2 / (\lambda e^{\beta})}{\ln 2 / \lambda} = e^{-\beta}$.
• To be able to discover a 50% increase in the median we take
  $\frac{M_d}{M_c} = \frac{3}{2}$, i.e. $e^{\beta} = \frac{2}{3}$.
• In general we can use a Taylor expansion to derive the following formula. Writing S(t) = 1 − F(t) for the survival function, the assumption $S_b(t) = S_c(t)^{e^{\beta}}$ gives
  $e^{\beta} = \frac{\log\left(1 - F_b(t)\right)}{\log\left(1 - F_c(t)\right)} \approx \frac{F_b(t)}{F_c(t)}$.
We can thus interpret e^β in terms of how much more likely the event A (event before t in the drug group) is compared to the event B (event before t in the control group).

COMMENTS
• To choose e^β one has to have an idea about what is possible to achieve, which depends on the nature of the disease.
• With hypertension, values of e^β between 0.87 and 1.15 are considered equivalent, while with AIDS and cancer no therapy can do more than reduce the risk by 50%.

• EXAMPLE 3: In a clinical trial we want to compare an old therapy, with a 5-year survival rate of 30%, with a new one, and hope that the survival rate of the new therapy is 45%. In formulas:
  $S_b(5) = S_c(5)^{e^{\beta}}$, i.e. $0.45 = 0.30^{e^{\beta}}$, so
  $e^{\beta} = \frac{\log 0.45}{\log 0.30} = 0.663$.
• We perform a two-sided test with α = 0.05 and want 90% power of discovering a risk ratio of 0.663. Moreover we assume that p = q = 0.5. Then
  $d = \frac{(1.96 + 1.28)^2}{0.5 \cdot 0.5 \cdot (\ln 0.663)^2} \approx 249 \text{ events}$.

1. It is possible to draw up a table giving d as a function of the remaining parameters when α = 0.05, p = q, and the test is two-sided.
2. Observe that e^β and e^{−β} lead to the same number of events.
3. The largest power is obtained when p = q.
4. When p and q are not equal, divide the table value by 4pq.
5. For a one-sided test, only C needs to be changed, from 1.96 to 1.64.

Number of events needed for a certain power, under equal allocation (α = 0.05, two-sided):

  power \ e^β    1.15    1.25    1.50    1.75    2.00
  0.5             787     309      94      49      32
  0.7            1264     496     150      79      51
  0.8            1607     631     191     100      65
  0.9            2152     844     256     134      88

Reminder
• Step 1: specify the number of events needed.
• Step 2: specify the number of patients needed to obtain "the right" number of events.
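The event-count formula and the table above can be cross-checked with a short Python sketch using only the standard library. The function name `required_events` is ours, chosen for the illustration; it is not from any of the packages mentioned later.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.90, p=0.5):
    """Required number of events d = (C + Z_power)^2 / (p*q*(log HR)^2),
    where C = z_{alpha/2} is the two-sided critical value and q = 1 - p."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - alpha / 2)   # C, e.g. 1.96 for alpha = 0.05
    z_power = nd.inv_cdf(power)     # e.g. 1.28 for 90% power
    q = 1 - p
    return ceil((c + z_power) ** 2 / (p * q * log(hazard_ratio) ** 2))

# EXAMPLE 3: e^beta = 0.663, two-sided alpha = 0.05, 90% power, p = q = 0.5
print(required_events(0.663))              # -> 249, as in the slides
# Table check: e^beta = 1.50 at 80% power
print(required_events(1.50, power=0.80))   # -> 191
```

Note that `required_events(x)` and `required_events(1/x)` agree, matching remark 2 above, since only the square of the log hazard ratio enters the formula.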
Step 2
In what follows we consider how to specify the number of patients needed to obtain the "right" number of events for a certain asymptotic power. In general this will depend on
1. the length of the recruitment period, T
2. the recruitment rate, R
3. the follow-up time, F
4. the survival function of the control group.
• With N patients per arm, the expected number of events is d = 2N × P(event), so that 2N = d / P(event).
• P(event) is a function of the length of total follow-up at the time of analysis and of the average hazard rate.
• This step is often performed on an ad hoc basis, so caution is warranted, also in written documents such as the statistical part of a protocol. Formulations should be in the spirit of the following example:
  "For a power of 80% to detect a 50% improvement in survival, using a two-sided test at α = 0.05, the study needs 191 events. We ANTICIPATE that this will require the enrollment of …"
Such formulations are not popular, but one has to stay as close to the truth as possible.

Dose response
• In this type of study we are interested in whether some doses exhibit good properties vs. placebo. The null hypothesis is of the form
  H0: θ0 = θ1 = … = θk.
• The alternative hypothesis can take many different forms, for example
  Ha: θ0 < θ1 < … < θk.
Different cases of interest can be visualized as in the figures on the next slide. One possibility is to use linear contrasts:
  $H_a: C = \sum_{j=1}^{k} c_j \theta_j > 0, \quad \sum_{j=1}^{k} c_j = 0$, estimated by $\hat{C} = \sum_{j=1}^{k} c_j \bar{Y}_j$.

Crossover Designs
• In this type of study we are interested in testing a null hypothesis of the form
  – H0: μT = μP against
  – Ha: μT ≠ μP.

A 2×2 crossover design:

              Period j = 1     Period j = 2
  Arm k = 1   Drug (T11)       Placebo (T12)
  Arm k = 2   Placebo (T21)    Drug (T22)

A general model we can use for the response of the i-th individual in the k-th arm at the j-th period is
  $Y_{ijk} = \mu_{jk} + S_{ik} + e_{ijk}$,
where μ_jk contains the treatment and period effects, and S and e are independent random effects with means zero and standard deviations σ_S and σ_e.
The above hypotheses can be tested using the statistic T_d based on the within-subject period differences, which can also be used to determine the power of the test against a specific alternative. Assuming equal sample sizes in both arms, and that the standard deviation σ_d of the period differences can be estimated from old studies, we have
  $n = \frac{(z_{\alpha/2} + z_{\beta})^2\,\sigma_d^2}{\Delta^2}, \qquad (*)$
where Δ stands for the clinically meaningful difference. (*) can now be used to determine n.

Interval Hypotheses for equivalence
• If we reject the null hypothesis as formulated above, we conclude that the treatments are non-equivalent. If we cannot reject the null hypothesis, can we say that the treatments are equivalent? The answer is NO! To deal with this we formulate non-equivalence as our null hypothesis and try to reject it. This leads to the following TWO ONE-SIDED TESTS (TOST) formulation:
  – H01: μT − μP ≤ δL against Ha1: μT − μP > δL
  – H02: μT − μP ≥ δU against Ha2: μT − μP < δU
• where δL and δU are clinically meaningful limits of equivalence: δL < μT − μP < δU.
• Schuirmann's test: reject H0 (non-equivalence) only if both one-sided null hypotheses are rejected.
• This can be worked out by hand, but it is easier to use software.

Software
• For common situations, software is available.
• Software available for purchase:
  – nQuery
  – StudySize
  – PASS
  – Power and Precision
  – etc.
• Free software available online:
  – http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
  – http://calculators.stat.ucla.edu
  – http://hedwig.mgh.harvard.edu/sample_size/size.html

Questions or Comments?
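For readers who prefer code to tables, the two-sample formulas from the continuous and binary sections can be sketched in Python with only the standard library. This is an illustrative sketch, not any of the packages listed above; the function names are ours.

```python
from math import ceil, sqrt
from statistics import NormalDist

_nd = NormalDist()

def n_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm n for two means, equal allocation, common known sigma:
    n = 2 * sigma^2 * (z_{alpha/2} + z_beta)^2 / delta^2."""
    z = _nd.inv_cdf(1 - alpha / 2) + _nd.inv_cdf(power)
    return ceil(2 * (sigma * z / delta) ** 2)

def n_two_props(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n for two proportions, with pooled pbar = (p1 + p2)/2."""
    za = _nd.inv_cdf(1 - alpha / 2)
    zb = _nd.inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = (za * sqrt(2 * pbar * (1 - pbar))
           + zb * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Detect a difference of half a standard deviation with 80% power:
print(n_two_means(delta=0.5, sigma=1.0))   # -> 63 patients per arm
# The 30% vs. 45% rates of EXAMPLE 3, treated here as a binary endpoint:
print(n_two_props(0.30, 0.45))
```

Such sketches are useful for sanity-checking the output of commercial software, not for replacing the protocol's documented calculation.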