THE FAMOUS BONFERRONI INEQUALITY

It is well known to most statisticians that testing enough hypotheses will lead to an eventual rejection. For instance, if you test 100 hypotheses at the 5% level, you can expect to reject about five of them, even if all are true. We need some sort of safeguard against this. A simple and easy-to-apply method is provided by the Bonferroni inequality.

Suppose that A1, A2, …, Am are events. In the application we have in mind, there are null hypotheses H1, H2, …, Hm, and event Aj is "Reject null hypothesis Hj." Since it is possible that all the null hypotheses are true, we must assure that P(A1 ∪ A2 ∪ … ∪ Am) ≤ α in order to protect our level of significance. The Bonferroni inequality says that

    P(A1 ∪ A2 ∪ … ∪ Am) ≤ P(A1) + P(A2) + … + P(Am)

This is very easy to prove. Prove it first with m = 2 and then use induction.

The application is as follows. To get an overall level α, we will just test each hypothesis at level α/m. Thus

    P(A1 ∪ A2 ∪ … ∪ Am) ≤ P(A1) + P(A2) + … + P(Am) = m(α/m) = α

Many comments....

(1) It is not necessary to use level α/m for the individual hypotheses. Any set of m non-negative numbers adding up to α will do. Of course, if you don't use α/m, you'll have to tell an interesting story.

(2) There is no requirement for the hypotheses (or the statistics which test them) to be independent. Indeed, the marvelous feature of the Bonferroni inequality is that it can be used in problems with very complicated dependence structures.

(3) If in fact the hypotheses really are independent, the Bonferroni bound is very close to the exact probability. Here's why, writing Ajᶜ for the complement of Aj and C(m,k) for the binomial coefficient:

    P(reject at least one null hypothesis)
        = 1 − P(accept all hypotheses)
        = 1 − P(A1ᶜ ∩ A2ᶜ ∩ … ∩ Amᶜ)
        = 1 − P(A1ᶜ) P(A2ᶜ) … P(Amᶜ)
        = 1 − (1 − α/m)^m
        = 1 − [ 1 − m(α/m) + C(m,2)(α/m)² − C(m,3)(α/m)³ + … ]
        = α − C(m,2)(α/m)² + C(m,3)(α/m)³ − …
Suppose, for the sake of illustration, that α = 0.05 and m = 10. The first term after α is −C(10,2)(0.005)² = −0.001125. The exact value of 1 − (1 − 0.005)^10 is very close to 0.04888987, and this is just under the target of 0.05.

(4) The value α/m is generally not a round number. The usual statistical tables (t, F, chi-squared) are not set up to handle cutoff points like 0.05/15 ≈ 0.0033. Interpolate as well as you can or use a computer program such as Minitab. (This is not a problem with test statistics that are normal, since the normal table is very finely graduated.)

(5) Sometimes m is so big (say m = 200) that you feel awkward using such extreme cutoff points, even for a normal distribution. For example, 0.05/200 = 0.00025, and you do not have sufficient faith in your distributional assumptions to believe that you could have an accurate 0.00025 point. Swallow your doubts and proceed anyhow, but be aware of the next point.

(6) The Bonferroni method is conservative, but not disastrously so. For instance, a Z value of 2.37 would be judged non-significant in a Bonferroni environment, but a Z value of 5.10 would easily survive as significant, even when m is reasonably large. For example, in the case above with α = 0.05 and m = 200, the one-sided normal point is z(0.00025) = 3.48. The two-sided normal point is z(0.000125) = 3.66.

(7) The Bonferroni method was designed to protect the family of null hypotheses {H1, H2, …, Hm}. Suppose that the Bonferroni method allows you to reject one or more of these hypotheses. How stringent do you have to be with those that remain? Suppose that m = 10 and α = 0.05, so that the individual tests are being done at level α/m = 0.005. Suppose that three of the tests are easily rejected at this stringent level. How stringent do you have to be with the remaining seven? Is α/m really required?
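The arithmetic behind comments (3) through (6) is easy to check by machine. Here is a short sketch in Python, using only the standard library; the values α = 0.05, m = 10, and m = 200 are the ones used above:

```python
from statistics import NormalDist
from math import comb

alpha, m = 0.05, 10
per_test = alpha / m                         # Bonferroni per-test level, 0.005

# Exact familywise rejection probability when the m tests are independent
exact = 1 - (1 - per_test) ** m              # ≈ 0.04888987, just under 0.05

# First correction term after alpha in the binomial expansion of comment (3)
first_term = -comb(m, 2) * per_test ** 2     # -C(10,2)(0.005)^2 = -0.001125

# Normal cutoffs for the m = 200 example in comments (5) and (6)
z = NormalDist()
one_sided = z.inv_cdf(1 - 0.05 / 200)        # z(0.00025)  ≈ 3.48
two_sided = z.inv_cdf(1 - 0.05 / 400)        # z(0.000125) ≈ 3.66

print(round(exact, 8), round(first_term, 6),
      round(one_sided, 2), round(two_sided, 2))
```

Note that the finely graduated cutoffs of comment (4) come straight from `NormalDist.inv_cdf`, with no interpolation in a printed table required.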
The best answer is that a stepwise Bonferroni method may be used. Among your m tests, find the one which is most extreme (smallest p-value). If you cannot reject the corresponding hypothesis at level α/m, then stop. If you can reject it, then use level α/(m − 1) for the m − 1 remaining hypotheses. Then continue with a family of m − 1 hypotheses, and repeat the procedure.

(8) It is surprising how few statisticians are aware of the procedure noted in (7).

(9) Suppose that you really do believe these stories about the evils of testing too many hypotheses. Suppose that your day's work will involve the testing of 50 hypotheses, brought to you by 50 different researchers. You decide to follow the Bonferroni methodology and test each hypothesis at level 0.05/50 = 0.001. It happens that only one of the hypotheses gets rejected (having a t statistic of 87.44 on 22 df), and the remaining 49, even using the procedure in (7), are all accepted. It happens that several of the 49 researchers are outraged. In particular, Professor Wickingham produced a Z-score of 2.77. He argues that you failed to reject his null hypothesis simply because a number of other researchers happened to ask you questions on the same day! What should be done?

The statistical consensus is that the Bonferroni method should not be extended across analyses. Certainly unrelated problems should not be grouped in this manner, and most statisticians would exploit Bonferroni only within a tightly defined analysis. Typical uses would be comparison of treatments after a significant analysis of variance, selection of the most significant coefficient in a regression, or selection of the most extreme correlation coefficient in a matrix.

(10) The problem discussed here is one of MULTIPLE COMPARISONS.
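The stepwise procedure described in (7) is what is now usually called the Holm step-down method. A minimal sketch in Python (the p-values in the example are invented purely for illustration):

```python
def stepwise_bonferroni(p_values, alpha=0.05):
    """Step-down (Holm) Bonferroni: test the smallest p-value at
    alpha/m, the next smallest at alpha/(m-1), and so on, stopping
    at the first failure to reject.  Returns the set of indices of
    rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # most extreme first
    rejected = set()
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):   # levels alpha/m, alpha/(m-1), ...
            rejected.add(i)
        else:
            break                               # stop at the first non-rejection
    return rejected

# Hypothetical p-values for a family of m = 10 tests
pvals = [0.0001, 0.003, 0.004, 0.0048, 0.20, 0.30, 0.41, 0.52, 0.63, 0.74]
print(stepwise_bonferroni(pvals))               # rejects indices 0, 1, 2, 3
```

In this example the fourth-smallest p-value, 0.0048, would fail the flat Bonferroni cutoff of 0.005 only narrowly, but it clears the relaxed step-down level 0.05/7 ≈ 0.0071; this is exactly the extra power that comment (8) laments is so little known.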
The basic references are Simultaneous Statistical Inference, by Rupert Miller (2nd edition preferred, but not critical) and Multiple Comparison Procedures, by Yosi Hochberg and Ajit Tamhane.