16. THE NORMAL APPROXIMATION
TO THE BINOMIAL DISTRIBUTION
It is sometimes difficult to directly compute probabilities for a
binomial (n, p) random variable, X. We need a different table for
each value of n, p. If we don't have a table, direct calculations
can get cumbersome very quickly.
Eg: Compute P(X ≤ 100) for n = 150, p = 0.35.
For normal random variables, on the other hand, probability
calculations are extremely easy; just one table is required.
Fortunately, we can approximate the binomial distribution by a
normal distribution, with an appropriate choice of µ and σ.
To get a feel for why this might work, let's study the Quincunx.
The Quincunx is a device invented by Sir Francis Galton in the
1800s. It shows empirically that binomial random variables,
observed repeatedly, produce a histogram that looks bell-shaped, as
long as the number of trials is not too small.
See Quincunx website at:
http://www.rand.org/methodology/stat/applets/clt.html
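If the applet is unavailable, the Quincunx can be imitated with a short simulation. The sketch below assumes each ball bounces right or left independently at every row of pegs with probability 0.5, so that its final bin is a binomial draw; the function name `simulate_quincunx`, the 12 rows, and the 2000 balls are illustrative choices, not part of the original device description.

```python
import random
from collections import Counter

def simulate_quincunx(n_rows, n_balls, p_right=0.5, seed=0):
    """Drop n_balls through n_rows of pegs and count balls per final bin.

    A ball's bin is its number of rightward bounces, which is one
    binomial(n_rows, p_right) observation.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bins = Counter()
    for _ in range(n_balls):
        rights = sum(rng.random() < p_right for _ in range(n_rows))
        bins[rights] += 1
    return bins

n_rows, n_balls = 12, 2000
bins = simulate_quincunx(n_rows, n_balls)

# Crude text histogram: one '#' for every 10 balls in a bin.
for k in range(n_rows + 1):
    print(f"{k:2d} {'#' * (bins[k] // 10)}")
```

With a reasonable number of rows the printed bars should pile up around the middle bins and thin out toward the edges, which is the bell shape the Quincunx displays.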
• In general, the distribution of a binomial random variable may be
accurately approximated by that of a normal random variable,
as long as np ≥ 5, nq ≥ 5, and assuming that a “continuity
correction” is made to account for the fact that we are using a
continuous distribution (the normal) to approximate a discrete one
(the binomial).
• For approximating the distribution of X, we will use the
normal distribution with mean µ = np, variance σ² = npq,
where q = 1 − p. Why are these reasonable choices of µ, σ²?
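One way to answer the question: np and npq are exactly the mean and variance of the binomial itself, so the approximating normal matches the binomial's center and spread. The sketch below checks this numerically by summing over the binomial probabilities; the helper names and the choice n = 150, p = 0.35 (borrowed from the earlier example) are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability that a binomial(n, p) random variable equals x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mean_var(n, p):
    """Mean and variance computed directly from the definition."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, var

n, p = 150, 0.35
mean, var = binomial_mean_var(n, p)
print("mean from pmf:", mean, " np  =", n * p)
print("var  from pmf:", var, " npq =", n * p * (1 - p))
```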
To study the quality of this approximation, visit the Normal
Approximation to the Binomial website at:
http://www.stat.sc.edu/~west/applets/binomialdemo2.html
This draws a bar chart of the binomial distribution for a given n, p,
and superimposes the approximating normal distribution. Note how
skewness increases as p moves away from 0.5.
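The skewness can also be checked numerically without the applet. The sketch below computes the third standardized moment of the binomial from its probabilities and compares it with the standard closed form (q − p)/√(npq); the value n = 20 and the three values of p are illustrative assumptions.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability that a binomial(n, p) random variable equals x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_skewness(n, p):
    """Third standardized moment, summed directly over the pmf."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return sum(((x - mu) / sigma) ** 3 * binomial_pmf(x, n, p)
               for x in range(n + 1))

n = 20
skews = {p: binomial_skewness(n, p) for p in (0.5, 0.3, 0.1)}
for p, s in skews.items():
    closed_form = ((1 - p) - p) / sqrt(n * p * (1 - p))
    print(f"p = {p}: skewness = {s:.4f}, (q - p)/sqrt(npq) = {closed_form:.4f}")
```

The skewness is zero at p = 0.5 and grows in size as p moves toward 0 (or, by symmetry, toward 1), which is where the symmetric normal curve fits worst.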
See histograms of number of dark M&Ms and orange
M&Ms from M&M Lab. (Separate handout).
• If p(x) is the binomial distribution and f (x) is the density of
the normal, the approximation is:
$$p(a) \approx \int_{a - 1/2}^{a + 1/2} f(x)\,dx$$

$$\sum_{x=a}^{b} p(x) \approx \int_{a - 1/2}^{b + 1/2} f(x)\,dx$$
Thus, the binomial probability p(a) is approximately equal to the
probability that a normal RV with mean np and variance npq lies
between x = a − 1/2 and x = a + 1/2.
Also, P(a ≤ X ≤ b) is approximately equal to the area under the
normal curve between x = a − 1/2 and x = b + 1/2.
• The “continuity correction” is the use of a − 1/2, b + 1/2 in
the normal approximation. This ensures that probabilities are
always approximated by areas under the normal curve. It can
dramatically improve the quality of the approximation, even
when n is large, so it should be used whenever possible.
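To see how much the continuity correction matters, the sketch below compares the exact value of P(a ≤ X ≤ b) with the normal approximation computed with and without the ±1/2 adjustment. The function names and the test case (n = 100, p = 0.5, a = 45, b = 55) are illustrative assumptions; the normal CDF is obtained from the standard library's `math.erf`.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(N <= x) for a normal random variable N with mean mu, sd sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def binomial_range_exact(a, b, n, p):
    """Exact P(a <= X <= b) for X ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(a, b + 1))

def binomial_range_normal(a, b, n, p, correction=True):
    """Normal approximation to P(a <= X <= b), using mu = np, sigma^2 = npq."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    half = 0.5 if correction else 0.0  # the continuity correction
    return normal_cdf(b + half, mu, sigma) - normal_cdf(a - half, mu, sigma)

n, p, a, b = 100, 0.5, 45, 55
exact = binomial_range_exact(a, b, n, p)
with_cc = binomial_range_normal(a, b, n, p, correction=True)
without_cc = binomial_range_normal(a, b, n, p, correction=False)
print(f"exact              = {exact:.4f}")
print(f"with correction    = {with_cc:.4f}")
print(f"without correction = {without_cc:.4f}")
```

Even with n = 100, dropping the correction leaves out half a bar at each end of the range, and the approximation is noticeably worse.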
In the diagram above, the bars represent the binomial distribution
with n = 10, p = 0.5. The superimposed curve is a normal density
f(x). The mean of the normal is µ = np = 5, and the standard
deviation is σ = √(10(0.5)(0.5)) = 1.58.
Suppose we wish to find p(4), the probability that the binomial
equals 4.
From Table 2 of Appendix B, we get
p(4) = 0.3770 − 0.1719 = 0.2051.
This is the exact probability, but we won’t always be so lucky as
to have a binomial table for the given n and p.
So let’s try the normal approximation.
Using the normal approximation, we need to calculate the
probability that our normal is between 3.5 and 4.5. The
corresponding z-scores are
(3.5 − 5)/1.58 = −0.95 and (4.5 − 5)/1.58 = −0.32.
Thus, the normal approximation to p(4) is
Pr(−0.95 < Std Normal < −0.32), which by the symmetry of the
normal curve equals
Pr(0.32 < Std Normal < 0.95) = 0.3289 − 0.1255 = 0.2034.
This is quite close to the actual value, p(4) = 0.2051.
If we hadn’t used the continuity correction, our approximation
to p(4) would be zero, that is, the area under the normal curve
between 4 and 4. This would be a very poor approximation
indeed!
As the diagram shows, the area under the normal density
between 3.5 and 4.5 provides a reasonable approximation to the
height of the bar, p(4). This should make it clear why the
continuity correction is helpful.
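The p(4) calculation can be reproduced without tables. The sketch below computes the exact binomial probability and the continuity-corrected normal area for n = 10, p = 0.5; the helper `std_normal_cdf` is an illustrative name built on `math.erf`. Because the z-scores are not rounded to two decimals here, the approximation may differ slightly in the third decimal place from the table-based 0.2034.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, a = 10, 0.5, 4
mu = n * p                      # 5
sigma = sqrt(n * p * (1 - p))   # about 1.58

# Exact binomial probability p(4).
exact = comb(n, a) * p**a * (1 - p) ** (n - a)

# Normal area between a - 1/2 and a + 1/2, via z-scores.
z_lo = (a - 0.5 - mu) / sigma
z_hi = (a + 0.5 - mu) / sigma
approx = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(f"z-scores: {z_lo:.2f}, {z_hi:.2f}")
print(f"exact p(4)  = {exact:.4f}")
print(f"normal area = {approx:.4f}")
```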
Eg: Tomorrow morning’s Iberia flight to Madrid can seat
370 passengers. From past experience, Iberia knows that the
probability is 0.90 that a given ticket-holder will show up for
the flight. They have sold 400 tickets, deliberately
overbooking the flight. How confident can Iberia be that no
passenger will need to be “bumped” (denied boarding)?
Solution: We will assume that the number (X) of passengers
showing up for the flight has a binomial distribution with mean
µ = (400)(0.9) = 360 and standard deviation
σ = √(400(0.9)(0.1)) = 6.
(Is this reasonable?)
We want Pr[X ≤ 370]. We approximate this by the probability
that our normal RV is less than 370.5. This is the probability that
a standard normal is less than z = (370.5 − 360)/6 = 1.75. So the
probability that nobody gets bumped is approximately
0.5 + 0.4599 = 0.9599. (Almost 96%).
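As a check on the overbooking calculation, the sketch below recomputes the normal approximation and sets it beside the exact binomial probability Pr[X ≤ 370]. It relies on the same binomial assumption as the solution (each ticket-holder shows up independently with probability 0.90); the variable names are illustrative.

```python
from math import comb, erf, sqrt

n, p, seats = 400, 0.90, 370
mu = n * p                      # 360
sigma = sqrt(n * p * (1 - p))   # 6

# Rule of thumb from earlier in the notes: np >= 5 and nq >= 5.
rule_ok = n * p >= 5 and n * (1 - p) >= 5

# Normal approximation with continuity correction: P(normal < 370.5).
z = (seats + 0.5 - mu) / sigma
approx = 0.5 * (1 + erf(z / sqrt(2)))

# Exact binomial probability that at most 370 ticket-holders show up.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x)
            for x in range(seats + 1))

print(f"np >= 5 and nq >= 5: {rule_ok}")
print(f"z = {z:.2f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```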
Eg: What is the probability that you will win at least $10 after
playing 100 games of craps for $1 per game?
Solution: To win at least $10, you must win at least 55 games.
The number (X) of games you will win has a binomial distribution
with n = 100, p = .493.
Therefore, X has mean µ = 49.3 and standard deviation
σ = √(100(.493)(.507)) = 5.00.
We want Pr[X ≥ 55] ≈ Pr[Std. Normal > (54.5 − 49.3)/5]
= Pr[Std. Normal > 1.04] = .5 − .3508 = .1492.
(Just a 15% chance!)
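The craps answer can be sanity-checked by simulation. The sketch below assumes, as the solution does, that each game is an independent even-money bet: you gain $1 with probability 0.493 and lose $1 otherwise. The function name `simulate_sessions`, the 20,000 simulated sessions, and the seed are illustrative choices.

```python
import random
from math import erf, sqrt

def simulate_sessions(n_games, p_win, n_sessions, seed=0):
    """Estimate the chance of finishing a session at least $10 ahead."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_sessions):
        wins = sum(rng.random() < p_win for _ in range(n_games))
        net = wins - (n_games - wins)  # +$1 per win, -$1 per loss
        if net >= 10:                  # same event as wins >= 55
            hits += 1
    return hits / n_sessions

n_games, p_win = 100, 0.493
estimate = simulate_sessions(n_games, p_win, n_sessions=20000)

# Normal approximation with continuity correction: P(normal > 54.5).
mu = n_games * p_win
sigma = sqrt(n_games * p_win * (1 - p_win))
z = (54.5 - mu) / sigma
approx = 1 - 0.5 * (1 + erf(z / sqrt(2)))

print(f"simulated estimate   = {estimate:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The simulated fraction varies a little from run to run (change the seed to see this), but it should land near the 15% obtained from the normal table.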
[Forecasting Lab Results]