251distrex5 4/11/06 (Open this document in 'Page Layout' view!)
Approximation of the Poisson, Binomial and Hypergeometric Distributions by the
Normal Distribution.
4. Normal Approximation to the Binomial Distribution.
If an appropriate Binomial table is not available, it is common to use the Normal distribution in place of the
Binomial with $\mu = np$ and $\sigma = \sqrt{npq}$, where $q = 1 - p$. Usually this is done when $n$ is large, but there are
rules of thumb available to decide on the appropriateness of a substitution.
The traditional criterion is $np > 5$ and $nq > 5$, which is easy to remember because $\mu = np$ is the expected
number of successes and $nq = n - \mu$ is the expected number of failures. A more modern criterion,
$0 < \mu \pm 3\sigma < n$, probably works better, but is harder to remember.
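Both rules of thumb are easy to check mechanically. The following is a minimal sketch, not part of the original handout; the function name `normal_approx_ok` is hypothetical, and the inequalities are assumed to be strict ("above 5", "between zero and n").

```python
import math

def normal_approx_ok(n, p):
    """Return (traditional, modern) verdicts for replacing a Binomial(n, p)
    by a Normal distribution. Strict inequalities are an assumption."""
    q = 1.0 - p
    mu = n * p                      # expected number of successes
    sigma = math.sqrt(n * p * q)    # Binomial standard deviation
    traditional = mu > 5 and n * q > 5
    modern = 0 < mu - 3 * sigma and mu + 3 * sigma < n
    return traditional, modern

print(normal_approx_ok(20, 0.4))   # the example used in this section
print(normal_approx_ok(20, 0.1))   # a case where np is only 2
```

For $n = 20$ and $p = .4$ both criteria are satisfied; for $n = 20$ and $p = .1$ neither is.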
a. Without Continuity Correction.
There is a problem with this sort of substitution which will be handled in section b below.
Let us try to find the Binomial probability $P(5 \le x \le 15)$ when $n = 20$ and $p = .4$ and let us assume that
we do not have a Binomial table. First the criteria. If we use the traditional criteria, we note that
$\mu = np = 20(.4) = 8$ and that $nq = 20 - 8 = 12$. Since these are both above 5, we can use the Normal
distribution. If we wish to use the more modern criteria, find $\sigma = \sqrt{npq} = \sqrt{8(.6)} = \sqrt{4.8} = 2.191$ so
$\mu \pm 3\sigma = 8 \pm 3(2.191) = 8 \pm 6.573$ or 1.427 and 14.573. Since these numbers are between zero and $n = 20$,
we can use the Normal distribution.
We transform $x$ into $z = \dfrac{x - \mu}{\sigma} = \dfrac{x - np}{\sqrt{npq}} = \dfrac{x - 8}{2.191}$.
$$P(5 \le x \le 15) \approx P\left(\frac{5 - 8}{2.191} \le z \le \frac{15 - 8}{2.191}\right) = P(-1.37 \le z \le 3.19) = P(-1.37 \le z \le 0) + P(0 \le z \le 3.19)$$
= .4147 + .4993 = .9140
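The table lookup can be reproduced numerically. This is a minimal sketch, assuming the standard Normal cumulative distribution is computed from `math.erf`; the helper names `phi` and `normal_approx_binomial` are illustrative and do not come from the handout.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_binomial(lo, hi, n, p):
    """Approximate the Binomial P(lo <= x <= hi) WITHOUT continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return phi(z_hi) - phi(z_lo)

approx = normal_approx_binomial(5, 15, 20, 0.4)
print(f"P(5 <= x <= 15) is approximately {approx:.4f}")
```

The result agrees with the table-based .9140 to about three decimal places; small differences come from rounding $z$ to two decimals before using the Normal table.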
b. With Continuity Correction.
There is a basic incompatibility between the Normal distribution, which is continuous, and the Binomial
distribution, which is discrete. In a continuous distribution, the probability of any individual value, like $P(x = 5)$, is zero.
However, we can approximate the binomial probability $P(x = 5)$ by the Normal probability
$P(4.5 \le x \le 5.5)$. If we continue in this vein, we will expand each interval by lowering its lower limit by
0.5 and raising its upper limit by 0.5. So the Binomial problem $P(5 \le x \le 15)$ is approximated by the
Normal problem $P(4.5 \le x \le 15.5)$. If we do this, we get
$$P(4.5 \le x \le 15.5) = P\left(\frac{4.5 - 8}{2.191} \le z \le \frac{15.5 - 8}{2.191}\right) = P(-1.60 \le z \le 3.42) = P(-1.60 \le z \le 0) + P(0 \le z \le 3.42)$$
= .4452 + .4997 = .9449
If $n = 20$ and $p = .4$, the Binomial table gives us $P(5 \le x \le 15) = P(x \le 15) - P(x \le 4) = .99968 - .05095$
= .94873, so that our error with the continuity correction was about 0.4%, compared with about 3.7% without it.
As $n$ gets larger, individual probabilities become quite small and the continuity correction becomes
negligible. The only rule of thumb that I have seen on this is that one should always use the correction if
$npq < 9$.
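To see how much the correction helps in the $n = 20$, $p = .4$ example, the exact Binomial probability can be computed directly and compared with both approximations. This is a sketch under the assumption that Python 3.8 or later is available (for `math.comb`); all function names are illustrative.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_range(lo, hi, n, p):
    """Exact Binomial P(lo <= x <= hi), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(lo, hi + 1))

def normal_range(lo, hi, n, p, correction):
    """Normal approximation; widens the interval by 0.5 on each side
    when correction is True."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    h = 0.5 if correction else 0.0
    return phi((hi + h - mu) / sigma) - phi((lo - h - mu) / sigma)

exact = binomial_range(5, 15, 20, 0.4)
plain = normal_range(5, 15, 20, 0.4, correction=False)
corrected = normal_range(5, 15, 20, 0.4, correction=True)

for label, value in [("exact", exact), ("no correction", plain),
                     ("with correction", corrected)]:
    rel_err = abs(value - exact) / exact
    print(f"{label:16s} {value:.5f}  relative error {rel_err:.2%}")
```

The exact value matches the table figure of .94873, and the corrected approximation is much closer to it than the uncorrected one.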
c. Extensions
The Binomial distribution is often expressed as the probabilities of a proportion of successes $\bar{p} = \dfrac{x}{n}$, when
the mean is $p$ and the standard deviation is $\sqrt{\dfrac{pq}{n}}$. This means that for a Binomial distribution with $n = 20$
and $p = .4$, we have $\mu = p = .4$, $\sigma = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{.4(.6)}{20}} = 0.1095445$ and
$$z = \frac{\bar{p} - \mu}{\sigma} = \frac{\bar{p} - p}{\sqrt{pq/n}} = \frac{\bar{p} - .4}{0.1095445}.$$
Our Binomial problem $P(5 \le x \le 15)$ becomes the problem
$$P(.25 \le \bar{p} \le .75) \approx P\left(\frac{.25 - .4}{0.1095445} \le z \le \frac{.75 - .4}{0.1095445}\right) = P(-1.37 \le z \le 3.20) = .4147 + .4993 = .9140.$$
If we insist on a continuity correction (and we should), we expand the interval by $\dfrac{0.5}{n} = \dfrac{0.5}{20} = .025$. The
expression becomes the Normal problem
$$P(.225 \le \bar{p} \le .775) = P\left(\frac{.225 - .4}{0.1095445} \le z \le \frac{.775 - .4}{0.1095445}\right) = P(-1.60 \le z \le 3.42) = P(-1.60 \le z \le 0) + P(0 \le z \le 3.42)$$
= .4452 + .4997 = .9449.
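The proportion form is only a rescaling of the count form, so it should give the same $z$ values and the same probability. A minimal sketch follows; the function name `normal_approx_proportion` is illustrative, and the correction of $0.5/n$ is applied on each side of the interval.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_proportion(lo, hi, n, p, correction=True):
    """Approximate P(lo <= x/n <= hi) for a Binomial(n, p) proportion."""
    se = math.sqrt(p * (1.0 - p) / n)       # standard deviation of x/n
    h = 0.5 / n if correction else 0.0      # continuity correction on the proportion scale
    return phi((hi + h - p) / se) - phi((lo - h - p) / se)

n, p = 20, 0.4
print(f"standard deviation of the proportion: {math.sqrt(p * (1 - p) / n):.7f}")
print(f"without correction: {normal_approx_proportion(0.25, 0.75, n, p, False):.4f}")
print(f"with correction:    {normal_approx_proportion(0.25, 0.75, n, p, True):.4f}")
```

Because the limits, the mean and the standard deviation are all divided by $n$, the results match the count-based calculations for $P(5 \le x \le 15)$ up to table rounding.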
If we have a Hypergeometric problem, we can exploit the similarity of the Hypergeometric distribution to
the Binomial distribution by using our original value of $x$ with $p = \dfrac{M}{N}$, $\mu = np$ and $\sigma = \sqrt{\dfrac{N - n}{N - 1}\, npq}$.
An advantage is that we can also work with $\bar{p} = \dfrac{x}{n}$, which has a mean of $p$ and a standard deviation of
$\sqrt{\dfrac{N - n}{N - 1} \cdot \dfrac{pq}{n}}$. But note that if $N > 20n$, the finite population correction $\dfrac{N - n}{N - 1}$ would be likely to
have little effect, so that we may be better off using the Binomial distribution or the Normal approximation
to it.
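The effect of the finite population correction can be illustrated with a small calculation. The population values below ($N = 200$, $M = 80$, $n = 20$, so that $p = .4$) are hypothetical and chosen only to parallel the Binomial example; they do not appear in the handout. The sketch uses a continuity correction and compares the result with the exact Hypergeometric probability.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fpc(N, n):
    """Finite population correction (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def hypergeom_range(lo, hi, N, M, n):
    """Exact P(lo <= x <= hi) for x successes in n draws without replacement
    from a population of N items containing M successes."""
    total = math.comb(N, n)
    return sum(math.comb(M, k) * math.comb(N - M, n - k)
               for k in range(lo, hi + 1)) / total

def normal_approx_hypergeom(lo, hi, N, M, n):
    """Normal approximation with continuity correction and the fpc."""
    p = M / N
    mu = n * p
    sigma = math.sqrt(fpc(N, n) * n * p * (1.0 - p))
    return phi((hi + 0.5 - mu) / sigma) - phi((lo - 0.5 - mu) / sigma)

N, M, n = 200, 80, 20   # hypothetical population, p = M / N = .4
print(f"fpc with N = 200:  {fpc(200, n):.4f}")
print(f"fpc with N = 2000: {fpc(2000, n):.4f}")   # N > 20n, correction near 1
print(f"exact:  {hypergeom_range(5, 15, N, M, n):.4f}")
print(f"normal: {normal_approx_hypergeom(5, 15, N, M, n):.4f}")
```

With $N = 2000$ the correction factor is already above .99, which is why the plain Binomial (or its Normal approximation) is usually adequate when $N$ is large relative to $n$.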
5. Normal Approximation to the Poisson Distribution.
If an appropriate Poisson table is not available, it is common to use the Normal distribution in place of the
Poisson with $\mu = m$ and $\sigma = \sqrt{m}$, where $m$ is the parameter (mean and variance) of the Poisson
distribution. The rule of thumb is that this works fairly well if $m \ge 25$. For example, let us find
$P(5 \le x \le 15)$, when $m = 25$. Obviously, we are on the edge of the acceptable values for the parameter,
but let's try.
We transform $x$ into $z = \dfrac{x - \mu}{\sigma} = \dfrac{x - m}{\sqrt{m}} = \dfrac{x - 25}{5}$. Without the continuity correction,
$$P(5 \le x \le 15) \approx P\left(\frac{5 - 25}{5} \le z \le \frac{15 - 25}{5}\right) = P(-4.00 \le z \le -2.00) = P(-4.00 \le z \le 0) - P(-2.00 \le z \le 0) = .5000 - .4772 = .0228.$$
With the continuity correction we will expand each interval by lowering its lower limit by 0.5 and
raising its upper limit by 0.5. So the Poisson problem $P(5 \le x \le 15)$ is approximated by the Normal
problem $P(4.5 \le x \le 15.5)$. If we do this we get
$$P(4.5 \le x \le 15.5) = P\left(\frac{4.5 - 25}{5} \le z \le \frac{15.5 - 25}{5}\right) = P(-4.10 \le z \le -1.90) = P(-4.10 \le z \le 0) - P(-1.90 \le z \le 0) = .5000 - .4713 = .0287.$$
The Poisson table with $m = 25$ gives us $P(5 \le x \le 15) = P(x \le 15) - P(x \le 4) = .02229 - .00000 = .02229$, so that our error
without the continuity correction was about 2% and with it was much larger (about 29%). It seems that the continuity
correction works badly for very high or low values of $x$.
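The Poisson comparison can be checked in the same way. This is a minimal sketch, assuming the exact Poisson probability is obtained by summing $e^{-m} m^k / k!$ term by term; the function names are illustrative.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_range(lo, hi, m):
    """Exact Poisson P(lo <= x <= hi) with parameter m."""
    return sum(math.exp(-m) * m**k / math.factorial(k)
               for k in range(lo, hi + 1))

def normal_approx_poisson(lo, hi, m, correction):
    """Normal approximation with mean m and standard deviation sqrt(m)."""
    sigma = math.sqrt(m)
    h = 0.5 if correction else 0.0
    return phi((hi + h - m) / sigma) - phi((lo - h - m) / sigma)

m = 25
exact = poisson_range(5, 15, m)
plain = normal_approx_poisson(5, 15, m, correction=False)
corrected = normal_approx_poisson(5, 15, m, correction=True)

for label, value in [("exact", exact), ("no correction", plain),
                     ("with correction", corrected)]:
    rel_err = abs(value - exact) / exact
    print(f"{label:16s} {value:.5f}  relative error {rel_err:.1%}")
```

In this tail region the uncorrected approximation is the closer of the two, in line with the observation that the continuity correction works badly far from the mean.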