Multiplicity in Clinical Trials
Ziad Taib
Biostatistics, AstraZeneca
March 12, 2012

Issues
• The multiplicity problem
• Sources of multiplicity in clinical trials
• Bonferroni
• Holm
• Hochberg
• Closed test procedures
• FDR (Benjamini-Hochberg)

The multiplicity problem
• When we perform one test at significance level α (for example 5%), there is a risk of α of a false significant result, i.e. of rejecting H0 (no effect) when it is actually true.
• What about the risk of at least one false significant result when performing many tests? Greater or smaller?

When performing 20 independent tests at the 5% level, we expect on average one significant result even though no difference exists.

Probability of at least one false significant result

Number of tests   1      2       5      10     50
Probability       0.05   0.0975  0.226  0.401  0.923

For k independent tests, each at level 0.05:
P(at least one false positive result) = 1 - P(zero false positive results) = 1 - (1 - 0.05)^k

Multiplicity Dimensions
• A. Multiple treatments
• B. Multiple variables
• C. Multiple time points
• D. Interim analyses
• E. Subgroup analyses

The multiplicity problem
• Doing a lot of tests will give us significant results just by chance.
• We want to find methods to control this risk (error rate).
• The same problem arises when considering many confidence intervals simultaneously.

Family wise error rate
• FWE = probability of observing a false positive finding in any of the tests undertaken
• While there may be different opinions about needing to adjust:
  – Regulatory authorities are concerned about any false claims for the effectiveness of a drug, not just for the claim based on the primary endpoint(s)
  – So we will need to demonstrate adequate control of the FWE rate
• It's not just about the p-value!
  – True!
    Estimates and confidence intervals are important too
  – Ideally, multiplicity methods need to handle these as well

Procedures for controlling the probability of false significances
• Bonferroni
• Holm
• Hochberg
• Closed tests
• FDR

Bonferroni
• N different null hypotheses H1, … HN
• Calculate corresponding p-values p1, … pN
• Reject Hk if and only if pk < α/N

Variation: the limits may be unequal as long as they sum up to α.

The procedure is conservative.

Bonferroni's inequality
• P(Ai) = P(reject H0i when it is true) ≤ α/N

$$P\left(\bigcup_{i=1}^{N} A_i\right) \le \sum_{i=1}^{N} P(A_i) \le N \cdot \frac{\alpha}{N} = \alpha$$

The left-hand side is the probability of rejecting at least one hypothesis falsely.

Example of Bonferroni correction
• Suppose we have N = 3 t-tests.
• Assume target alpha = 0.05.
• The Bonferroni-corrected significance level is alpha/N = 0.05/3 = 0.0167
• Unadjusted p-values are p1 = 0.001; p2 = 0.013; p3 = 0.074
  – p1 = 0.001 < 0.0167, so reject H01
  – p2 = 0.013 < 0.0167, so reject H02
  – p3 = 0.074 > 0.0167, so do not reject H03

Holm
• N different null hypotheses H01, … H0N
• Calculate corresponding p-values p1, … pN
• Order the p-values from the smallest to the largest, p(1) ≤ … ≤ p(N)
• Start with the smallest p-value and reject H(j) as long as p(j) < α/(N-j+1); stop at the first p-value that fails this test and do not reject that hypothesis or any later one

Example of Holm's test
• Suppose we have N = 3 t-tests.
• Assume target alpha = 0.05.
• Unadjusted p-values are p1 = 0.001; p2 = 0.013; p3 = 0.074 (already in increasing order)
• For the jth test, calculate alpha(j) = alpha/(N – j + 1)
• For test j = 1,
  – alpha(1) = 0.05/(3 – 1 + 1) = 0.05/3 = 0.0167
  – the observed p1 = 0.001 is less than alpha(1) = 0.0167, so we reject the null hypothesis.
• For test j = 2,
  – alpha(2) = 0.05/(3 – 2 + 1) = 0.05/2 = 0.025
  – the observed p2 = 0.013 is less than alpha(2) = 0.025, so we reject the null hypothesis.
• For test j = 3,
  – alpha(3) = 0.05/(3 – 3 + 1) = 0.05
  – the observed p3 = 0.074 is greater than alpha(3) = 0.05, so we do not reject the null hypothesis.

Hochberg
• N different null hypotheses H1, … HN
• Calculate corresponding p-values p1, … pN
• Order the p-values from the smallest to the largest, p(1) ≤ … ≤ p(N)
• Start with the largest p-value.
  If p(N) < α, stop and declare all comparisons significant at level α (i.e. reject H(1) … H(N) at level α). Otherwise retain (do not reject) H(N) and go to the next step
• If p(N-1) < α/2, stop and declare H(1) … H(N-1) significant. Otherwise retain H(N-1) and go to the next step
• ….
• In general, at step k: if p(N-k+1) < α/k, stop and declare H(1) … H(N-k+1) significant. Otherwise retain H(N-k+1) and go to the next step

Closed procedures - stepwise
• Pre-specify the order of the hypotheses to be tested. Test each in that order at the 5% level until the first non-significant result.
• Order of tested hypotheses stated in the protocol
  – Dose-response
  – Factorial designs

Example
• Assume we performed N = 5 hypothesis tests simultaneously and want to control the overall level at 0.05. The ordered p-values obtained were

  p(1)    p(2)    p(3)    p(4)    p(5)
  0.009   0.011   0.012   0.134   0.512

• Bonferroni: 0.05/5 = 0.01. Since only p(1) is less than 0.01, we reject H(1) but retain the remaining hypotheses.
• Holm: p(1), p(2) and p(3) are less than 0.05/5 = 0.01, 0.05/4 = 0.0125 and 0.05/3 = 0.0167 respectively, so we reject the corresponding hypotheses H(1), H(2) and H(3). But p(4) = 0.134 > 0.05/2 = 0.025, so we stop and retain H(4) and H(5).
• Hochberg:
  – 0.512 is not less than 0.05, so we retain H(5)
  – 0.134 is not less than 0.05/2 = 0.025, so we retain H(4)
  – 0.012 is less than 0.05/3 = 0.0167, so we stop and reject H(1), H(2) and H(3)

Questions or Comments?
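
Supplementary sketch (not part of the original slides): the Bonferroni, Holm and Hochberg rules described above can be written out in a few lines of Python and checked against the N = 5 example. The function names (family_wise_error, bonferroni, holm, hochberg) are illustrative assumptions, and the strict "<" comparison simply follows the wording of the slides. Each function returns one True/False per hypothesis, in the order the p-values were given, where True means "reject".

```python
def family_wise_error(k, alpha=0.05):
    """P(at least one false positive) for k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni(pvals, alpha=0.05):
    """Reject H_k if and only if p_k < alpha / N."""
    n = len(pvals)
    return [p < alpha / n for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: walk from the smallest p-value upwards, stop at the first failure."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by increasing p
    reject = [False] * n
    for j, i in enumerate(order, start=1):
        if pvals[i] < alpha / (n - j + 1):
            reject[i] = True
        else:
            break  # this and all larger p-values are not rejected
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: walk from the largest p-value downwards, stop at the first success."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by increasing p
    reject = [False] * n
    for k in range(1, n + 1):
        pos = n - k  # 0-based position of p(N-k+1) in the sorted order
        if pvals[order[pos]] < alpha / k:
            # reject this hypothesis and every one with a smaller p-value
            for i in order[: pos + 1]:
                reject[i] = True
            break
    return reject


# The N = 5 example from the slides
pvals = [0.009, 0.011, 0.012, 0.134, 0.512]
print(bonferroni(pvals))  # [True, False, False, False, False]
print(holm(pvals))        # [True, True, True, False, False]
print(hochberg(pvals))    # [True, True, True, False, False]
print(round(family_wise_error(10), 3))  # 0.401
```

On this data Holm and Hochberg agree and both reject more than Bonferroni, which matches the worked example. The sketch covers only these three rules; closed test procedures and FDR control are not implemented here.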