THE FAMOUS BONFERRONI INEQUALITY
ΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘΘ
It is well known to most statisticians that testing enough hypotheses will lead to an
eventual rejection. For instance, if you test 100 hypotheses at the 5% level, you can
expect to reject about five of them, even if all are true. We need some sort of safeguard
against this. A simple and easy-to-apply method is provided by the Bonferroni
inequality.
Suppose that A1, A2, …, Am are events. In the application we have in mind, there are null
hypotheses H1, H2, …, Hm and event Aj is "Reject null hypothesis Hj."
Since it is possible that all the null hypotheses are true, we must ensure that
P(A1 ∪ A2 ∪ … ∪ Am) ≤ α in order to protect our level of significance.
The Bonferroni inequality says that
P(A1 ∪ A2 ∪ … ∪ Am) ≤ P(A1) + P(A2) + … + P(Am)
This is very easy to prove. Prove this first with m = 2 and then use induction.
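To see the m = 2 case, note that inclusion-exclusion does all the work, since P(A1 ∩ A2) ≥ 0:

$$
P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \le P(A_1) + P(A_2)
$$

The induction step is the same idea: treat A1 ∪ … ∪ Am as a single event and apply the m = 2 case to it and Am+1.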
The application is as follows. To get an overall level α, we will just test each hypothesis at level α/m. Thus

P(A1 ∪ A2 ∪ … ∪ Am) ≤ P(A1) + P(A2) + … + P(Am) = α/m + α/m + … + α/m = α
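As a sanity check on this bound, here is a minimal simulation sketch (not part of the original note), assuming m = 100 independent two-sided z-tests with all null hypotheses true; it uses numpy and scipy, and compares uncorrected 0.05-level testing with the Bonferroni level 0.05/m:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, alpha, n_sims = 100, 0.05, 10_000

# m independent z statistics per simulated analysis, all null hypotheses true
z = rng.standard_normal((n_sims, m))
p = 2 * norm.sf(np.abs(z))                           # two-sided p-values

# Familywise error rate: P(reject at least one true null)
fwer_naive = np.mean((p < alpha).any(axis=1))        # no correction
fwer_bonf = np.mean((p < alpha / m).any(axis=1))     # Bonferroni at alpha/m

print(f"uncorrected FWER: {fwer_naive:.3f}")         # near 1 - 0.95**100, about 0.994
print(f"Bonferroni FWER:  {fwer_bonf:.3f}")          # at or just under 0.05
```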
Many comments follow.

(1) It is not necessary to use level α/m for the individual hypotheses. Any set of m non-negative numbers adding up to α will do. Of course, if you don't use α/m, you'll have to tell an interesting story.

(2) There is no requirement for the hypotheses (or the statistics which test them) to be independent. Indeed, the marvelous feature of the Bonferroni inequality is that it can be used in problems with very complicated dependence structures.
(3) If in fact the hypotheses really are independent, the Bonferroni bound is very close to the exact probability. Here's why:

$$
\begin{aligned}
P(\text{reject at least one null hypothesis})
&= 1 - P(\text{accept all hypotheses}) \\
&= 1 - P\left(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_m\right)
 = 1 - P(\bar{A}_1)\,P(\bar{A}_2)\cdots P(\bar{A}_m) \\
&= 1 - \left(1 - \frac{\alpha}{m}\right)^{m}
 = 1 - \left[1 - m\,\frac{\alpha}{m} + \binom{m}{2}\left(\frac{\alpha}{m}\right)^{2} - \binom{m}{3}\left(\frac{\alpha}{m}\right)^{3} + \cdots\right] \\
&= \alpha - \binom{m}{2}\left(\frac{\alpha}{m}\right)^{2} + \binom{m}{3}\left(\frac{\alpha}{m}\right)^{3} - \cdots
\end{aligned}
$$

Suppose, for the sake of illustration, that α = 0.05 and m = 10. The first term after α is −C(10, 2)(0.005)² = −0.001125. The exact value of 1 − (1 − α/m)^m is very close to 0.04888987, and this is just under the target of 0.05.
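The arithmetic checks out in a couple of lines (a sketch; scipy.special.comb supplies the binomial coefficient):

```python
from scipy.special import comb

alpha, m = 0.05, 10
print(1 - (1 - alpha / m) ** m)          # 0.04888987...  (exact, under independence)
print(-comb(m, 2) * (alpha / m) ** 2)    # -0.001125      (first correction term)
```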

(4) The value α/m is generally not a round number. The usual statistical tables (t, F, chi-squared) are not set up to handle cutoff points like 0.05/15 ≈ 0.0033. Interpolate as well as you can or use a computer program such as Minitab. (This is not a problem with test statistics that are normal, since the normal table is very finely graduated.)
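Where the note suggests Minitab, any modern statistics library now serves; here is a minimal sketch using scipy.stats (the degrees of freedom are arbitrary illustration values, not from the original):

```python
from scipy.stats import t, chi2, f, norm

a = 0.05 / 15                         # per-test level, about 0.0033
print(t.ppf(1 - a, df=22))            # upper-tail t cutoff on 22 df
print(chi2.ppf(1 - a, df=4))          # upper-tail chi-squared cutoff on 4 df
print(f.ppf(1 - a, dfn=3, dfd=20))    # upper-tail F cutoff on (3, 20) df
print(norm.ppf(1 - a))                # normal cutoff, about 2.71
```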
(5) Sometimes m is so big (say m = 200) that you feel awkward using such extreme cutoff points, even for a normal distribution. For example, 0.05/200 = 0.00025, and you do not have sufficient faith in your distributional assumptions to believe that you could have an accurate 0.00025 point. Swallow your doubts and proceed anyhow, but be aware of the next point.
(6) The Bonferroni method is conservative, but not disastrously so. For instance, a Z value of 2.37 would be judged non-significant in a Bonferroni environment, but a Z value of 5.10 would easily survive as significant, even when m is reasonably large. For example, in the case above with α = 0.05 and m = 200, the one-sided normal point is z_{0.00025} = 3.48. The two-sided normal point is z_{0.000125} = 3.66.
(7) The Bonferroni method was designed to protect the family of null hypotheses {H1, H2, …, Hm}. Suppose that the Bonferroni method allows you to reject one or more of these hypotheses. How stringent do you have to be with those that remain? Suppose that m = 10 and α = 0.05, so that the individual tests are being done at level α/m = 0.005. Suppose that three of the tests are easily rejected at this stringent level. How stringent do you have to be with the remaining seven? Is α/m really required? The best answer is that a stepwise Bonferroni method may be used. Among your m tests, find the one which is most extreme (smallest p-value). If you cannot reject the corresponding hypothesis at level α/m, then stop. If you can reject it, then use level α/(m − 1) for the m − 1 remaining hypotheses. Then continue with a family of m − 1 hypotheses, and repeat the procedure.
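This stepwise scheme is the one usually credited to Holm (the step-down Bonferroni procedure). A minimal sketch in Python, with made-up p-values for illustration:

```python
def stepwise_bonferroni(pvalues, alpha=0.05):
    """Step-down (Holm) Bonferroni: test the smallest p-value at alpha/m,
    the next smallest at alpha/(m - 1), and so on; stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    rejected = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):             # alpha/m, alpha/(m-1), ...
            rejected[i] = True
        else:
            break                                        # stop: retain all remaining nulls
    return rejected

# Made-up p-values for m = 10 tests (illustration only)
pvals = [0.0001, 0.0004, 0.0042, 0.006, 0.02, 0.04, 0.11, 0.30, 0.55, 0.81]
print(stepwise_bonferroni(pvals))   # first four rejected, rest retained
```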
(8) It is surprising how few statisticians are aware of the procedure noted in (7).
(9) Suppose that you really do believe these stories about the evils of testing too many hypotheses. Suppose that your day's work will involve the testing of 50 hypotheses, brought to you by 50 different researchers. You decide to follow the Bonferroni methodology and test each hypothesis at level 0.05/50 = 0.001. It happens that only one of the hypotheses gets rejected (having a t statistic of 87.44 on 22 df), and the remaining 49, even using the procedure in (7), are all accepted. Several of the 49 researchers are outraged. In particular, Professor Wickingham produced a Z-score of 2.77. He argues that you failed to reject his null hypothesis simply because a number of other researchers happened to ask you questions on the same day! What should be done? The statistical consensus is that the Bonferroni method should not be extended across analyses. Certainly unrelated problems should not be grouped in this manner, and most statisticians would exploit Bonferroni only within a tightly defined analysis. Typical uses would be comparison of treatments after a significant analysis of variance, selection of the most significant coefficient in a regression, or selection of the most extreme correlation coefficient in a matrix.
(10) The problem discussed here is one of MULTIPLE COMPARISONS. The basic
references are Simultaneous Statistical Inference, by Rupert Miller (2nd edition preferred,
but not critical) and Multiple Comparison Procedures, by Yosef Hochberg and Ajit
Tamhane.