OVERVIEW: The binomial distribution is frequently useful in situations where there are two outcomes of interest, such as success or failure. It is often used to model real-life situations, and it finds its way into many extremely useful and important statistical applications and computations.
The binomial setting:
(1) Each observation is in one of two categories: success or failure.
(2) A fixed number, N, of observations.
(3) Observations are independent .
(Knowing the result of one observation tells you nothing about the other observations.)
(4) The probability of success is the same for each observation.
If a count, X, has a binomial distribution with number of observations, N, and probability of success, p, then
Mean(X) =
x
= np
Standard Deviation(X) =
x
= np ( 1
p )
The probability that one will get exactly k successes is
N
C k p k
(1-p)
N-k
, or you can use binompdf(n, p, X)
Example:
I roll a single die 60 times. If X represents the number of times I roll a "3", then
x
= 60(1/6) = 10.
x
= sqrt[60(1/6)(5/6)] = 2.89.
The probability that I will roll exactly ten 3's is
60
C
10
(1/6) 10 (5/6) 50 = .1370 = 13.7%.
Or binompdf (60,1/6,10) = .1370
The probability that I will roll at least ten 3’s is binomcdf(60, 1/6, 10) = .5834
Example (Small sample size from large population. Use of binomial distribution is appropriate.):
Assume that 30% of a population is Hispanic. A random sample of size 4 is chosen from this population. If X is the number of Hispanics in the sample, then
x
= (4)(.3) = 1.2
x
= sqrt[(4)(.3)(.7)] = 0.9165
Prob(X=0) =
4
C
0
(.3)
0
(.7)
4
= 0.2401
Prob(X=1) =
4
C
1
(.3)
1
(.7)
3
= 0.4116
Prob(X=2) =
4
C
2
(.3)
2
(.7)
2
= 0.2646
Prob(X=3) =
4
C
3
(.3)
3
(.7)
1
= 0.0756
Prob(X=4) =
4
C
4
(.3)
4
(.7)
0
= 0.0081
The probability that a sample would contain two or fewer Hispanics is Prob(X=0) +
Prob(X=1) + Prob(X=2) = 0.2401 + 0.4116 + 0.2646 = 0.9163 = 91.63%.
The TI-83 can be very useful in calculating binomial probabilities. For instance, the probability that the sample contains exactly 2 Hispanics is binompdf( 4,.3,2) = .2646.
The probability that the sample contains 2 or fewer Hispanics is binomcdf( 4,.3,2) =
.9163.
NOTE: It is important to understand when one has a binomial setting, and when one doesn't have such a setting. For instance, consider a regular shuffled deck of 52 cards.
Setting #1: I pick a card at random and note whether or not it is a heart. I put the card back in the deck, thoroughly shuffle the deck, and then randomly pick a card. Again, I note whether or not it is a heart. I repeat this process ten times. If X is the total number of hearts I obtained in the ten trials, then this is a binomial setting. Possible values for X are
0,1,2,3,4,5,6,7,8,9,10. [N = 10, p = 0.25, and each observation is independent of the previous ones.]
Setting #2: Basically the same situation, except that I do not put the randomly picked card back in the deck before each reshuffling. After ten trials, possible values for X are
0,1,2,3,4,5,6,7,8,9,10. However, this is not a binomial setting. Each observation after the first is not independent of the previous one.
Normal Approximation for Binomial Setting:
If the number of trials, n , gets too large to deal with binomial probabilities, we can approximate the distribution using the Normal approximation. To use this, be sure that np
10 and n(1 – p)
10 before you do this.
OVERVIEW: The geometric setting is somewhat similar to the binomial setting, the basic difference being that the geometric setting does not have a fixed number of observations and you are looking for the first time a success occurs.
The geometric setting:
(1) Each observation is in one of two categories: success or failure.
(2) The probability of success is the same for each observation.
(3) Observations are independent. (Knowing the result of one observation tells you nothing about the other observations.)
(4) The variable of interest is the number of trials required to obtain the first success.
Example:
Question: How many times would you expect to have to roll a single die in order to get a
"6"?
Here is a simulation approach (ten trials) using randInt(1,6,15) on the TI-83.
Trial 1: 3 5 2 2 3 6 ... (6 rolls)
Trail 2: 1 3 3 2 1 4 2 1 1 6 ... (10 rolls)
Trial 3: 5 1 5 6 ... (4 rolls)
Trial 4: 4 1 4 3 5 ... (5 rolls)
Trial 5: 4 5 3 4 6 ... (5 rolls)
Trial 6: 5 5 1 3 3 2 2 1 3 6 ... (10 rolls)
Trial 7: 5 3 4 1 6 ... (5 rolls)
Trial 8: 5 2 4 2 1 6 ... (6 rolls)
Trial 9: 4 5 1 4 6 ... (5 rolls)
Trial 10: 2 5 4 2 1 2 3 4 6 (9 rolls)
The mean number of rolls for the 10 trials is 6.5.
*hint: make sure you can do a simulation of a geometric distribution, including creating a histogram!
8.2 Formulas
Mean of a geometric distribution: x
Variance of a geometric distribution:
2 x
1 p
1
p
2 p
Standard deviation of a geometric distribution:
x
1
p p
2
* note: these formulas are not on your formula sheet, but will be provided on the test day
In the die-rolling example, the probability of rolling a 3 is 1/6. Hence, the expected number of rolls before the first success is 1/(1/6) = 6.
In the present California Lottery, one chooses 6 numbers from the set 1,2,3,...,49,50,51. The
State of California then randomly selects 6 numbers from the set. If you happen to match the six numbers chosen by the State, you win millions of dollars. Your probability of matching six is
1/(
51
C
6
)= 1/18,009,460.
That is, you would expect to have to play 18,009,460 times to get your first success. If you played once a week, you would expect your first success after 346,336 weeks, or 6,660 years.
Good luck!
Geometpdf will give the probability of an event first occurring after a specific number of trials [geometpdf(p, X)]
Geometcdf will give the probability of an event first occurring within a specific number of trials [geometcdf(p, X)]