Chapter 8 Day 3: Percentiles and Approximating Binomial Distribution Probabilities

Finding Percentiles:
Percentile rankings are a quick way to compare large groups of people. Ex. SATs, GREs, GMATs, etc.

The kth percentile for a data set is a number that has k% of the data values at or below it. (The same is true for random variables.)
*Often we are interested in what numerical value falls at a certain percentile.*

Percentile: Refers to the value of a variable.
Percentile Ranking: Refers to the proportion at or below that value.

Ex. If the 75th percentile for GRE Verbal scores is 600, then 75% of GRE Verbal scores are at or below 600 and 25% are above 600. The percentile is 600 and the percentile ranking is 75%.

*The percentile rank for the value of a variable corresponds to the cumulative probability for that value.
Ex. The 75th percentile of Verbal GRE scores is the Verbal score for which .75 is the cumulative probability (area to the left under the density curve).

Finding Percentiles for a specified percentile ranking
1. Find the z-score that has the specified cumulative probability. (Search your table.)
2. Calculate the value of the variable that has the z-score found in step 1. This can be done by using the relationship $x = z\sigma + \mu$.

Ex. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What is the 80th percentile for IQ scores? In other words, what is the IQ score x such that $P(IQ \le x) = .80$?
1. Draw a picture.
2. Use your table on pp. 612-613 to find the associated z-score.
3. Calculate x.
*A z-score of 0.84 corresponds to the 80th percentile (its cumulative probability, .7995, is the table entry closest to .80).
$z = \frac{x - \mu}{\sigma}$, so $0.84 = \frac{x - 100}{15}$, which implies $x - 100 = 0.84(15)$.
So: $x = 100 + (0.84)(15) = 112.6$
*Someone with an IQ of 112.6 has a higher IQ than about 80% of the population.

Approximating Binomial Distribution Probabilities
Recall the probability formula for Binomial Random Variables:
$P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$
*As n gets large this formula becomes difficult to compute because of the factorials involved.
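The binomial formula above can be evaluated directly when n is small. The following is a minimal sketch, not part of the original notes: the function names `binomial_pmf` and `binomial_cdf` are made up for illustration, and the values n = 10, p = 0.5 are chosen only to keep the arithmetic easy to check by hand.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    # comb(n, k) equals n! / (k! (n - k)!) without writing out the factorials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), obtained by summing the pmf over 0, 1, ..., k."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

# Illustrative values only (not from the notes): 10 trials, p = 0.5.
print(binomial_pmf(3, 10, 0.5))   # C(10, 3) / 2**10 = 120 / 1024
print(binomial_cdf(10, 10, 0.5))  # sums over every outcome, so it should be 1 up to rounding
```

By hand, each term needs factorials as large as n!, and a cumulative probability needs one term for every value up to k, which is why the exact formula becomes impractical for large n.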
*However, the normal distribution can be used to approximate probabilities for a binomial random variable when this situation occurs.

Normal Approximation to the Binomial Distribution:
*If X is a binomial random variable based on n trials with success probability p, and n is large, then X is also approximately a normal random variable with
Mean = $np$
Standard Deviation = $\sqrt{np(1-p)}$
*In order to use the approximation effectively, both np and n(1-p) must be at least 10.

Ex. For which of the following situations could a normal approximation be made for the given binomial distribution?
Scenario 1: n=32 and p=.6
Scenario 2: n=48 and p=.9
**We could make a normal approximation for Scenario 1, since np = 19.2 and n(1-p) = 12.8 are both at least 10. However, for Scenario 2, n(1-p) = 4.8 < 10, so we cannot use a normal approximation.

Ex. Suppose p = 0.488 is the proportion of one-child families in which the child is a boy. For a random sample of n=75 one-child families, estimate the probability that there will be 40 or fewer boys. Use the normal approximation to the binomial distribution.
*np = 75(.488) = 36.6 > 10 and n(1-p) = 75(.512) = 38.4 > 10. Therefore we can use the normal approximation.
*We want to find $P(X \le 40)$.
Also, $\mu = np = 36.6$ and $\sigma = \sqrt{np(1-p)} = \sqrt{36.6(1 - .488)} \approx 4.33$.
So, $P(X \le 40) \approx P\left(Z \le \frac{40 - 36.6}{4.33}\right) = P(Z \le 0.785) \approx P(Z \le 0.79) = 0.7852$
About a 79% chance that there will be 40 or fewer boys out of the 75 families.

Continuity Corrections:
Ex. Draw the exact binomial pdf for a binomial random variable with 6 outcomes that is approximately normal. (Board Example)
Notice that, technically, $P(X \le 4)$ for the binomial distribution is the area of 4 rectangles. Also notice that the rectangle centered at 4 goes all the way out to 4.5.
*However, our normal approximation for the binomial variable found the area under a normal curve going only up to 4. (So we omitted half of the last rectangle from the binomial pdf.)
*To make better predictions with our normal approximation to the binomial, we need to make a continuity correction by either adding or subtracting 0.5.

Ex. 
Suppose a fair coin is flipped 200 times. Let X = # of Heads. (Notice that X will have a binomial distribution.)
a) Calculate the mean and standard deviation for X = # of Heads.
Mean = $E(X) = np = 200(0.5) = 100$ Heads
Standard Deviation = $\sqrt{np(1-p)} = \sqrt{100(0.5)} = \sqrt{50} \approx 7.07$
b) Use the normal approximation to the binomial distribution to estimate the probability that the number of heads is greater than or equal to 120.
$P(X \ge 120) = 1 - P(X < 120) \approx 1 - P\left(Z < \frac{120 - 100}{7.07}\right) = 1 - P(Z < 2.83) = 1 - .9977 = 0.0023$
c) Repeat part (b) using the continuity correction. (Board Example)
**Subtract 0.5 from 120 because technically we only want to subtract out everything less than 120, that is, the rectangles up through 119, and the rectangle centered at 119 extends out to 119.5.
Therefore, $P(X \le 119) \approx P\left(Z \le \frac{119.5 - 100}{7.07}\right) = P(Z \le 2.76) = 0.9971$
So, $P(X \ge 120) \approx 1 - 0.9971 = 0.0029$

Section 8.8 Sums, Differences, and Combinations of Random Variables
A Linear Combination of random variables X, Y, ... is a combination of the form:
L = aX + bY + ...
where a, b, etc. are numbers, which could be positive or negative.
The two most common are:
Sum = X + Y
Difference = X - Y

*If X, Y, ... are random variables, a, b, ... are numbers, either positive or negative, and L = aX + bY + ..., then the mean of L is
Mean(L) = a Mean(X) + b Mean(Y) + ...
Also:
Mean(X + Y) = Mean(X) + Mean(Y)
Mean(X - Y) = Mean(X) - Mean(Y)

Ex. Suppose
X = Height of Females in MA 2830
Y = Height of Males in MA 2830
*The mean height of students in MA 2830 is going to be a weighted mean (because there are more females than males in the class). So if L represents the entire MA 2830 class:
Mean(L) = a Mean(Female Heights) + b Mean(Male Heights)
**We could also look at the difference in the heights of the females and males:
Mean(X - Y) = Mean(Female Heights) - Mean(Male Heights)
*Suppose that MA 2830 is 70% female, that the mean height of females is 65 inches, and that the mean height of males in MA 2830 is 70 inches.
a) What is the Mean Height of MA 2830 students?
Mean(L) = a Mean(X) + b Mean(Y) = 0.7(65) + 0.3(70) = 66.5 inches tall
b) What is the Mean Difference in Heights between females and males in MA 2830?
Mean(X - Y) = Mean(X) - Mean(Y) = 65 - 70 = -5 inches. (On average, females are 5 inches shorter than males.)

If X and Y are independent random variables, a, b, etc. are numbers, and L = aX + bY + ..., then the Variance and Standard Deviation of L are:
Variance(L) = $a^2$ Variance(X) + $b^2$ Variance(Y) + ...
Standard Dev.(L) = $\sqrt{\text{Variance}(L)}$
**Notice**
Variance(X + Y) = Variance(X) + Variance(Y)
Variance(X - Y) = Variance(X) + Variance(Y)
**They are equal because in the difference formula b = -1 and $b^2$ = +1.

Combining Independent Normal or Binomial Random Variables
*Any linear combination of independent, normally distributed variables also has a normal distribution.*
If X, Y, ... are independent, normally distributed random variables, and a, b, etc. are numbers, either positive or negative, then the random variable L = aX + bY + ... is normally distributed and:
*X + Y is normally distributed with mean $\mu_x + \mu_y$ and standard deviation $\sqrt{\sigma_x^2 + \sigma_y^2}$
*X - Y is normally distributed with mean $\mu_x - \mu_y$ and standard deviation $\sqrt{\sigma_x^2 + \sigma_y^2}$

Ex. You have recently become lackadaisical about making it to your Statistics class on time. You leave home 35 minutes before class is set to start. Your travel time from your front door to the parking lot at school is normally distributed with a mean of 20 minutes and a standard deviation of 4 minutes. The time it takes to park and then walk to class is also normally distributed with a mean of 7 minutes and a standard deviation of 3 minutes. The driving time and parking/walking time are independent of one another. What is the probability that you will walk in late to class, thereby gaining the eternal angst of the instructor?
X = Driving Time; normally distributed with $\mu_x = 20$ minutes and $\sigma_x = 4$ minutes.
Y = Parking/Walking Time; normally distributed with $\mu_y = 7$ minutes and $\sigma_y = 3$ minutes.
T = X + Y = Total Time
*Notice that the random variable T has a normal distribution since it is the sum of two independent, normally distributed random variables.
Mean(T) = $\mu_x + \mu_y$ = 20 + 7 = 27 minutes
Standard Deviation(T) = $\sqrt{\sigma_x^2 + \sigma_y^2} = \sqrt{4^2 + 3^2} = \sqrt{25} = 5$ minutes
$P(T > 35) = 1 - P(T \le 35) = 1 - P\left(Z \le \frac{35 - 27}{5}\right) = 1 - P(Z \le 1.6) = 1 - .9452 = .0548$
Therefore, there is about a 5.5% chance you will be late for class.
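The late-to-class calculation can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The variable names are chosen here for illustration. Adding two `NormalDist` objects treats them as independent, which matches the assumption stated in the example.

```python
from statistics import NormalDist

# Driving time X and parking/walking time Y, with the means and
# standard deviations given in the example (in minutes).
drive = NormalDist(mu=20, sigma=4)
park_walk = NormalDist(mu=7, sigma=3)

# Sum of independent normals: the means add and the variances add,
# so the standard deviation is sqrt(4**2 + 3**2).
total = drive + park_walk

# Probability that the total time exceeds the 35 minutes available.
p_late = 1 - total.cdf(35)

print(total.mean, total.stdev)
print(round(p_late, 4))
```

The mean and standard deviation of `total` should come out to 27 and 5, and `p_late` should agree with the table-based answer of about .0548 up to rounding.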