Chapter 8 Day 3: Percentiles and Approximating Binomial Distribution Probabilities

Finding Percentiles: Percentile rankings are a quick way
to compare individuals within large groups of people, e.g., SAT,
GRE, and GMAT scores.
The kth percentile of a data set is a number that has k% of the
data values at or below it. (The same is true for random
variables.)
*Often we are interested in what numerical value falls at a
certain percentile*
Percentile: Refers to the value of a variable
Percentile Ranking: Refers to the proportion below that
value.
Ex. If the 75th percentile for GRE Verbal scores is 600,
then 75% of GRE Verbal scores are at or below 600 and 25%
are above 600. The percentile is 600 and the percentile
ranking is 75%.
*The Percentile rank for the value of a variable
corresponds to the cumulative probability for that value.
Ex. The 75th percentile of Verbal GRE scores is the
Verbal score for which .75 is the cumulative probability
(area to the left under the density curve).
Finding Percentiles for a specified percentile ranking
1. Find the z-score that has the specified cumulative
probability. (Search the standard normal table.)
2. Calculate the value of the variable that has the z-score found in step 1. This can be done by using the
relationship $x = \mu + z\sigma$.
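The two steps above can be sketched in Python. This is only an illustration, assuming the standard library's `statistics.NormalDist` (Python 3.8+) stands in for the printed table: its `inv_cdf` method runs the table search of step 1 backwards. The function name `percentile_value` is hypothetical.

```python
from statistics import NormalDist


def percentile_value(mu: float, sigma: float, p: float) -> float:
    """Return the value x with P(X <= x) = p for X ~ Normal(mu, sigma)."""
    # Step 1: the z-score whose cumulative probability is p (the "table search").
    z = NormalDist().inv_cdf(p)
    # Step 2: convert back to the original scale with x = mu + z * sigma.
    return mu + z * sigma


# IQ scores: mean 100, standard deviation 15, 80th percentile.
x80 = percentile_value(100, 15, 0.80)
print(round(x80, 2))
```

For the IQ example this gives about 112.62. The unrounded z for a cumulative probability of .80 is about 0.8416, so hand answers built from a two-decimal table entry can differ slightly in the last digit.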
Ex. IQ scores are normally distributed with a mean of
100 and a standard deviation of 15. What is the 80th
percentile for IQ scores? In other words, what is the IQ
score $x$ such that $P(\text{IQ} \le x) = 0.80$?
1. Draw a picture.
2. Use your table on pages 612-613 to find the associated z-score.
3. Calculate x.
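*The table entry closest to .80 is .7995, at z = 0.84, so a z-score of about 0.84 corresponds to the 80th percentile.
$$z = \frac{x - \mu}{\sigma} \quad\Longrightarrow\quad 0.84 = \frac{x - 100}{15} \quad\Longrightarrow\quad 0.84(15) = x - 100$$
So: $x = 100 + (0.84)(15) = 112.6$
*Someone with an IQ score of about 112.6 scores higher than
about 80% of the population.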
Approximating Binomial Distribution Probabilities
Recall the probability formula for binomial random
variables:
$$P(X = k) = \frac{n!}{k!\,(n-k)!}\,p^{k}(1-p)^{n-k}$$
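To make the formula concrete, here is a small sketch that evaluates it directly. It assumes Python 3.8+ for `math.comb`, which computes $n!/(k!(n-k)!)$ with exact integer arithmetic; the helper names `binom_pmf` and `binom_cdf` are hypothetical. The factorials are what make hand (and table) work painful for large n, which is why the normal approximation below is useful.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    # comb(n, k) is n! / (k! (n - k)!), computed with exact integers.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): add up the pmf for 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


print(binom_pmf(2, 4, 0.5))  # 6 * 0.5**2 * 0.5**2 = 0.375
```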
*As n gets large this formula becomes difficult to
compute because of the factorials involved.
*However, the normal distribution can be used to
approximate probabilities for a binomial random variable
when this situation occurs.
Normal Approximation to the Binomial Distribution:
*If X is a binomial random variable based on n trials with
success probability p, and n is large, then the random
variable X is also approximately a normal random
variable.
So, Mean = $\mu = np$
Standard Deviation = $\sigma = \sqrt{np(1-p)}$
*In order to use the approximation effectively, both np
and n(1-p) must be at least 10.
Ex. For which of the following situations could a normal
approximation be made for the given binomial
distribution?
Scenario 1: n=32 and p=.6
Scenario 2: n=48 and p=.9
**Scenario 1: np = 32(0.6) = 19.2 and n(1-p) = 32(0.4) = 12.8
are both at least 10, so we could make a normal approximation.
Scenario 2: n(1-p) = 48(0.1) = 4.8 < 10, so we cannot use a normal
approximation for Scenario 2.
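The rule of thumb is easy to check mechanically. The sketch below is a hypothetical helper, `normal_approx_ok`, using the notes' cutoff of 10 for both np and n(1-p).

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Rule of thumb from the notes: need np >= 10 and n(1 - p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold


for n, p in [(32, 0.6), (48, 0.9)]:
    # Show np and n(1 - p) rounded for display, then the verdict.
    print(n, p, round(n * p, 1), round(n * (1 - p), 1), normal_approx_ok(n, p))
```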
Ex. Suppose p = 0.488 is the proportion of one-child
families in which the child is a boy.
For a random sample of n=75 one-child families, estimate
the probability that there will be 40 or fewer boys. Use
the normal approximation to the binomial distribution.
*np=75(.488)=36.6>10
n(1-p)=75(.512)=38.4>10
Therefore we can use the normal approximation.
*We want to find $P(X \le 40)$.
Also, $\mu = np = 36.6$
$\sigma = \sqrt{np(1-p)} = \sqrt{36.6(1 - 0.488)} \approx 4.33$
So, $P(X \le 40) \approx P\left(Z \le \frac{40 - 36.6}{4.33}\right) = P(Z \le 0.785) \approx P(Z \le 0.79) = 0.7852$
About a 79% chance that there will be 40 or fewer boys
out of the 75 families.
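As a check on this example, the sketch below computes the same normal approximation with `statistics.NormalDist` and, for comparison, the exact binomial probability by summing the formula. It is an illustration only: `NormalDist.cdf` uses the unrounded z (about 0.785) rather than a table value, so it lands near .784. The exact answer comes out noticeably higher than the approximation, which is the gap the continuity correction addresses.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 75, 0.488, 40
mu = n * p                     # 36.6
sigma = sqrt(n * p * (1 - p))  # about 4.33

# Normal approximation as in the notes (no continuity correction).
approx = NormalDist(mu, sigma).cdf(k)

# Exact binomial answer for comparison: add P(X = i) for i = 0, ..., 40.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

print(f"mu={mu:.1f} sigma={sigma:.2f} approx={approx:.4f} exact={exact:.4f}")
```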
Continuity Corrections:
Ex. Draw the exact binomial pdf for a binomial random variable
with 6 possible outcomes (X = 0, 1, ..., 5).
Board Example
Notice that, technically, $P(X \le 4)$ for the binomial
distribution is the total area of the rectangles centered at 0, 1, 2, 3,
and 4. Also notice that the rectangle centered at 4 extends
all the way out to 4.5.
*However, our normal approximation for the binomial
variable found the area under a normal curve going only
up to 4. (So we omitted half of the rectangle centered at 4
from the binomial pdf.)
*To make better predictions with our normal
approximation to the binomial, we need to make a
continuity correction by either adding or subtracting 0.5:
for $P(X \le k)$ use $k + 0.5$, and for $P(X \ge k)$ use $k - 0.5$.
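A minimal sketch of the two cases of the correction, assuming `statistics.NormalDist` for the normal curve; `approx_cdf` and `approx_upper_tail` are hypothetical names. Each function widens or shifts the cutoff by half a rectangle so that whole binomial rectangles are included or excluded.

```python
from math import sqrt
from statistics import NormalDist


def approx_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) ~ P(Y <= k + 0.5): include the whole rectangle centered at k."""
    y = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return y.cdf(k + 0.5)


def approx_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) ~ 1 - P(Y <= k - 0.5): drop only the rectangles for 0, ..., k - 1."""
    y = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return 1 - y.cdf(k - 0.5)


print(round(approx_cdf(40, 75, 0.488), 4), round(approx_upper_tail(120, 200, 0.5), 4))
```

Because both corrected cutoffs fall halfway between integers, $P(X \le k)$ and $P(X \ge k+1)$ use the same boundary and their approximations add to 1, just as the exact probabilities do.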
Ex. Suppose a fair coin is flipped 200 times. Let X= # of
Heads. (Notice that X will have a binomial distribution).
a) Calculate the mean and standard deviation for
X= # of Heads.
Mean = $\mu = E(X) = np = 200(0.5) = 100$ Heads
Standard Deviation = $\sigma = \sqrt{np(1-p)} = \sqrt{100(0.5)} = \sqrt{50} \approx 7.07$
b) Use the normal approximation to the binomial
distribution to estimate the probability that the
number of heads is greater than or equal to 120.
$P(X \ge 120) = 1 - P(X < 120) \approx 1 - P\left(Z < \frac{x - \mu}{\sigma}\right) = 1 - P\left(Z < \frac{120 - 100}{7.07}\right) = 1 - P(Z < 2.83)$
$P(X \ge 120) \approx 1 - .9977 = 0.0023$
c) Repeat part (b) using the continuity correction.
Board Example.
**Subtract 0.5 from 120 because we want to subtract out only
the values less than 120 (X = 119 and below), and the rectangle
for 119 ends at 119.5.
Therefore,
$P\left(Z < \frac{119.5 - 100}{7.07}\right) = P(Z < 2.76) = 0.9971$
So,
$P(X \ge 120) \approx 1 - 0.9971 = 0.0029$
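To see what the correction buys, the sketch below recomputes parts (b) and (c) with `statistics.NormalDist` and compares them with the exact binomial tail, summed from the formula (for p = 0.5 every outcome sequence has probability $1/2^{200}$). It is an illustration under those assumptions, not part of the original board example. The corrected value should land closer to the exact answer.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 200, 0.5, 120
normal = NormalDist(n * p, sqrt(n * p * (1 - p)))  # mean 100, sd about 7.07

no_correction = 1 - normal.cdf(k)          # part (b)
with_correction = 1 - normal.cdf(k - 0.5)  # part (c)

# Exact P(X >= 120) for a fair coin: count sequences with 120..200 heads.
exact = sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(f"(b) {no_correction:.4f}  (c) {with_correction:.4f}  exact {exact:.4f}")
```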
Section 8.8 Sums, Differences, and Combinations of
Random Variables.
A Linear Combination of random variables X, Y, … is a
combination of the form:
L = aX + bY + . . .
where a, b, etc. are numbers, which could be positive or
negative. The two most common are:
Sum = X + Y
Difference = X – Y
*If X, Y, . . . are random variables, a, b, . . . are numbers,
either positive or negative, and L = aX + bY + . . ., then
the mean of L is
Mean (L)= a Mean(X) + b Mean (Y) + …
Also:
Mean (X + Y)= Mean (X) + Mean (Y)
Mean (X – Y) = Mean (X) – Mean (Y)
Ex. Suppose X= Height of Females in MA 2830
Y= Height of Males in MA 2830
*The mean height of students in MA 2830 is going to be
a weighted mean, because the two groups are not the same
size (there are more girls than boys in the class).
So if L represents the entire MA 2830 class, with a and b
the proportions of females and males:
Mean (L)= a Mean (Female Heights) + b Mean (Male
Heights)
**We could also look at the differences in the heights of
the men and women.
Mean (X – Y) = Mean (Female Heights) – Mean (Male
Heights).
*Suppose that MA 2830 is 70% female, that the mean
height of females is 65 inches, and that the mean height of
males in MA 2830 is 70 inches.
a) What is the Mean Height of MA 2830 students?
Mean (L)= a Mean(X) + b Mean (Y)
= 0.7 (65) + 0.3 (70) = 66.5 inches tall
b) What is the Mean Difference in Heights between
females and males in MA 2830?
Mean (X – Y)= Mean (X) – Mean (Y)
= 65 – 70 = –5 inches (on average, females are 5 inches shorter than males).
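Both parts use the same rule, Mean(aX + bY) = a Mean(X) + b Mean(Y), just with different coefficients: (0.7, 0.3) for the class average and (1, –1) for the difference. A small sketch, where `mean_of_combination` is a hypothetical helper:

```python
def mean_of_combination(coeffs, means):
    """Mean(aX + bY + ...) = a Mean(X) + b Mean(Y) + ..."""
    return sum(a * m for a, m in zip(coeffs, means))


female_mean, male_mean = 65, 70
class_mean = mean_of_combination([0.7, 0.3], [female_mean, male_mean])   # part (a)
mean_difference = mean_of_combination([1, -1], [female_mean, male_mean])  # part (b)
print(class_mean, mean_difference)
```

The sign of the difference matters: a negative value means the first group (females) is shorter on average.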
If X, Y, . . . are independent random variables, a, b, etc.
are numbers, and L = aX + bY + …,
then the variance and standard deviation of L are:
$\text{Variance}(L) = a^2\,\text{Variance}(X) + b^2\,\text{Variance}(Y) + \cdots$
$\text{Standard Dev.}(L) = \sqrt{\text{Variance}(L)}$
**Notice**
Variance (X + Y) = Variance (X) + Variance (Y)
Variance (X – Y)= Variance (X) + Variance (Y)
** They are equal because in the difference formula $b = -1$
and $b^2 = +1$.
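The squared coefficients are why a sum and a difference have the same spread. In this sketch (hypothetical helper `sd_of_combination`, with illustrative standard deviations 4 and 3 chosen only for the demonstration), each term contributes $(a\sigma)^2 = a^2\sigma^2$, so the sign of a coefficient drops out.

```python
from math import sqrt


def sd_of_combination(coeffs, sds):
    """SD of aX + bY + ... for independent X, Y, ...: sqrt(a^2 Var(X) + b^2 Var(Y) + ...)."""
    # (a * s) ** 2 equals a^2 * s^2, so a = -1 contributes the same as a = +1.
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sds))
    return sqrt(variance)


# Illustrative values only: sigma_X = 4, sigma_Y = 3.
print(sd_of_combination([1, 1], [4, 3]), sd_of_combination([1, -1], [4, 3]))
```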
Combining Independent Normal or Binomial Random
Variables
*Any linear combination of independent, normally distributed variables
also has a normal distribution*
If X, Y, . . . are independent, normally distributed random
variables, and a, b, etc. are numbers, either positive or
negative, then the random variable L = aX + bY + . . . is
normally distributed and:
*X + Y is normally distributed with mean $\mu_X + \mu_Y$ and
standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$.
*X – Y is normally distributed with mean $\mu_X - \mu_Y$ and
standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$.
Ex. You have recently become lackadaisical about
making it to your Statistics class on time. You leave
home 35 minutes before class is set to start. Your travel
time from your front door to the parking lot at school is
normally distributed with a mean of 20 minutes and a
standard deviation of 4 minutes. The time it takes to park
and then walk to class is also normally distributed with a
mean of 7 minutes and a standard deviation of 3 minutes.
The driving time and parking/walking time are
independent of one another. What is the probability that
you will walk in late to class, thereby gaining the eternal
angst of the instructor?
X = Driving Time; normally distributed with $\mu_X = 20$
minutes and $\sigma_X = 4$ minutes.
Y = Parking/Walking Time; normally distributed with
$\mu_Y = 7$ minutes and $\sigma_Y = 3$ minutes.
T= X + Y = Total Time
*Notice that the random variable T has a normal
distribution since it is the sum of two independent,
normally distributed random variables.
Mean (T) = $\mu_T = \mu_X + \mu_Y = 20 + 7 = 27$ minutes
Standard Deviation = $\sigma_T = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{4^2 + 3^2} = \sqrt{25} = 5$ minutes
$P(T > 35) = 1 - P(T < 35) = 1 - P\left(Z < \frac{35 - 27}{5}\right) = 1 - P(Z < 1.6) = 1 - .9452 = .0548$
Therefore, there is a 5.5% chance you will be late for
class.
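The whole calculation fits in a few lines, sketched here with `statistics.NormalDist` from the Python standard library (3.8+). The means add, the variances add because the two times are assumed independent, and the total stays normal.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x = 20, 4  # driving time (minutes)
mu_y, sigma_y = 7, 3   # parking/walking time (minutes)

# T = X + Y: means add, variances add (independence), and T stays normal.
total = NormalDist(mu_x + mu_y, sqrt(sigma_x**2 + sigma_y**2))
p_late = 1 - total.cdf(35)  # late if the trip takes more than 35 minutes

print(total.mean, total.stdev, round(p_late, 4))
```

`NormalDist(20, 4) + NormalDist(7, 3)` builds the same distribution directly; the standard library's addition of two `NormalDist` objects assumes they are independent, matching the condition stated above.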