Chapter 6: Normal Probability Distributions

The NORMAL DISTRIBUTION describes many different data sets.

Attributes of the Normal Curve:
Notation: $N(\mu, \sigma)$
Symmetric
Mean = $\mu$
Standard Deviation = $\sigma$
Range is approximately 4 standard deviations
The entire area under the curve is 100%, or 1

Standardizing with Z-Scores:
The standard deviation is the most common measure of spread used for normal curves and is a natural ruler for comparing individual values to the mean. To determine how many standard deviations a value is away from the mean, we can standardize the value:

$z = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}}$

So, $Z = \frac{X - \mu}{\sigma}$, where $\mu$ (population mean) and $\sigma$ (population standard deviation) are given.

The z-score tells us how many standard deviations an observation is above or below the mean.

Example: A Z-score of 1 means the observation is 1 standard deviation larger than the mean. A Z-score of -2 means the observation is 2 standard deviations smaller than the mean.

**Note: In these examples, we are talking about a single observation (X) coming from an entire population with a mean ($\mu$) and a standard deviation ($\sigma$).

Example: Two boys, in different classes, each ran a race. Boy A finished in 6.5 minutes. The average for his class was 8 minutes, with a standard deviation of 1.0 minutes. Boy B finished in 7.5 minutes. His class average was 9.5 minutes, with a standard deviation of 1.5 minutes. Suppose the distribution of times for each class follows a bell curve. Use z-scores to determine which boy did better with respect to the rest of his class. Explain your answer.

Finding Normal Percentiles using the Standard Normal Table:

Example #1: Use the Z table to find the following:
Draw the picture first, shade the region you want, and look up the Z in Table 2 to find the proportion to the left of that z-score. The proportion is also known as the probability that the value of a particular member of a population will fall in the given interval.
a. $P(Z < -1.42)$ =
b. $P(Z \ge 1.95)$ =
c.
$P(-1.02 < Z \le 2.57)$ =

Example #2: Women’s heights
Assume that college women’s heights follow a normal curve with a mean height of 65 inches and a standard deviation of 2.7 inches.
a) Find the probability that a college woman, selected at random, is shorter than 62 inches.
b) Find the probability that a college woman, selected at random, is at least 68 inches tall.
c) Find the probability that a college woman, selected at random, is between 60 and 68 inches tall.

Example #3: According to an article in Newsweek, in China, the mean emission of organic pollutants is 11.7 million pounds per day. Assume the water pollution in China is normally distributed throughout the year with a standard deviation of 2.8 million pounds of organic emissions per day.
a) What is the probability that on any given day the water pollution in China is at least 15 million pounds per day?
b) What is the probability that on any given day the water pollution in China is between 6.2 and 9.3 million pounds per day?

Inverse Normal Probability Calculations: Un-standardizing
Sometimes the proportion or percentage is given and you must find the corresponding Z-score and un-standardize it by finding the X-value.
STEPS:
Draw the picture.
Identify the Z-value from the given value of the proportion.
Solve for X: $X = \mu + Z\sigma$

Example 1. The distribution of heights of college women is normal, with mean 65 inches and standard deviation 2.7 inches.
a) Find the height such that 10% of college women are shorter than that height.
b) Determine the two heights that bound the middle 90%.

Example 2. An athletic association wants to sponsor a footrace. The time it takes to run the course is normally distributed with a mean of 58.6 minutes and a standard deviation of 3.9 minutes.
a) The association decides to have a tryout run and eliminate the slowest 30% of the racers. What should the cutoff time be in the tryout run for elimination?
b) What is the value of the first quartile for this distribution?

Practice problems.
1.
A World Health Organization study of health in various countries reported that in Canada, systolic blood pressure readings have a mean of 122 and a standard deviation of 16. It is known that the distribution of systolic blood pressure is normal.
a) What is the probability a Canadian selected at random has systolic blood pressure between 100 and 135?
b) High systolic blood pressures can be very dangerous. What systolic blood pressure represents the boundary for the upper 7% of blood pressures?

2. Suppose that the amount spent by students vacationing for a week in Florida is normally distributed with a mean of $650 and a standard deviation of $120.
a) What is the probability that a randomly selected student vacationing for a week in Florida will spend between $500 and $900?
b) Only 8% of students will spend more than what amount?

3. A machine that cuts corks for wine bottles operates in such a way that the distribution of the diameter of the corks produced is normal with a standard deviation of 0.15 cm. Suppose that 15% of the corks have a diameter above 3.244 cm.
a) Find the mean diameter of the corks.
b) Suppose the machine has been recalibrated and the mean diameter of the corks produced is now 4.5 cm with a standard deviation of 0.15 cm. Specifications for this machine require that cork diameters should be no smaller than 4.42 cm. What is the probability that a cork selected at random from this machine will have a diameter smaller than 4.42 cm?

The Distribution of a Sample Mean

Example #1: Tossing a Die

Tossing a single die 10,000 times.
[Figure: Histogram of single toss. x-axis: single toss; y-axis: Frequency]

Tossing a pair of dice 10,000 times, calculating and graphing the averages of each pair.
[Figure: Histogram of Avg of pairs. x-axis: Avg of pairs; y-axis: Frequency]

Tossing twenty dice 10,000 times, calculating and graphing the averages.
[Figure: Histogram of average of 20. x-axis: average of 20; y-axis: Frequency]

In general, taking the average of larger sample sizes gives a more precise estimate of the true mean. (The spread around the center gets smaller.)

A sampling distribution is the probability distribution of a sample statistic.

The Central Limit Theorem (CLT):
When drawing a Simple Random Sample (SRS) of size n from any non-normal population with a mean $\mu$ and a standard deviation $\sigma$, the sample mean ($\bar{x}$) has a sampling distribution that is approximately normal as long as the sample is large enough.

Rule of thumb: If the population is not normally distributed, n should be greater than or equal to 30.

Conditions:
1. The sampled values must be independent of one another.
2. Randomization condition: The data values must be sampled randomly.
3. If sampling is done without replacement, the sample size should be no larger than 10% of the population. Usually, populations are so large that 10% is a small fraction.

The Sampling Distribution Model for a Sample Mean:
The mean of the sample averages is: $\mu_{\bar{x}} = \mu$
The standard deviation of the sample averages is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

1. If a population is normal and has the $N(\mu, \sigma)$ distribution, then the sample mean $\bar{x}$ of n independent observations has a distribution that is normal: $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
2. If a population is non-normal, then the sample mean $\bar{x}$ of n independent observations has a distribution that is approximately normal according to the CLT (as long as n is large): $\bar{x} \sim AN\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

Now, when we are looking for area under a normal curve and we are dealing with a sample mean, the new Z-score becomes:

$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Example #1: Weights of Adult Men
1. In engineering, weights of people are considered so that airplanes and elevators aren’t overloaded, chairs won’t break and other embarrassing things won’t occur. Men’s weights are normal with a mean of 173 lbs. and a standard deviation of 30 lbs.
a.
What is the probability a randomly selected man weighs more than 180 lbs.?
b. If 9 men are randomly selected (say to be in an elevator), what is the probability that their average weight is more than 180 lbs.?

Example #2: As reported by Runner’s World magazine, the times of the finishers in the New York City 10 km run are normally distributed with a mean of 61 minutes and a standard deviation of 9 minutes. A simple random sample of 30 runners is selected.
a) Describe the sampling distribution of the average 10 km finishing times.
b) Find the probability that the average for the sample of 30 finishing times will be more than 65 minutes.

Example #3: A rental car company has noticed that the distribution of the number of miles customers put on rental cars per day is right skewed. The distribution has a mean of 60 miles and a standard deviation of 25 miles. A random sample of 120 rental cars is selected.
a) Describe the sampling distribution of the average number of miles driven per day for the sample of 120 rental cars. Use the appropriate notation.
b) What is the probability that the mean number of miles driven per day for the sample of 120 cars is less than 54?
c) What is the probability that the total number of miles driven per day in the sample of 120 cars exceeds 7400?

Inverse Calculation: Un-standardizing
This Z-score calculation can also be rearranged to solve for a sample mean:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad\Rightarrow\quad \bar{X} = \mu + Z\,\frac{\sigma}{\sqrt{n}}$

Example #1: The amounts of telephone bills for all households in a large city have a distribution that is skewed to the right with a mean of $75 and a standard deviation of $27. A random sample of 90 households is selected from this city. What is the value representing the first quartile for the sampling distribution of $\bar{X}$?

Example #2: A waiter believes the distribution of his tips has a model that is right skewed, with a mean of $9.60 and a standard deviation of $5.40. A random sample of 40 parties this waiter waits on is selected.
a) Describe the sampling distribution of the sample mean tip.
b) What is the probability that the waiter will earn a total of less than $450 in tips when he waits on 40 parties?
c) How much does the waiter earn on the best 10% of weekends in which he waits on 40 parties?

Example #3: In the library on a university campus, there is a sign in the elevator that indicates a weight limit of 2500 pounds. Assume the weights of students, faculty and staff on campus are normally distributed, with a mean of 150 pounds and a standard deviation of 27 pounds. A random sample of 16 persons from the campus is selected.
a. Describe the sampling distribution of the sample mean weight.
b. What is the probability that the average weight of the 16 people in the sample is less than 160 pounds?
c. Suppose the sample of 16 people is placed in the library elevator. What is the probability that the total weight of the 16 persons on the elevator will exceed the weight limit of 2500 pounds?

The Distribution of a Sample Proportion

Example 1: Classroom Experiment: Simulating a sampling distribution for a sample proportion.
Suppose students are asked to spin a penny on their desk and record the number of heads they get. 200 students were directed to consider the distribution of sample proportion values from samples of 10 and 20 spins.

Trial #1:
Variable      N     n    Mean      SE Mean   StDev
Sample Prop   200   10   0.4840    0.0107    0.1515
[Figure: Histogram of sample proportion of heads. x-axis: sample proportion of heads (n=10 spins), 200 students; y-axis: Frequency]

Trial #2:
Variable      N     n    Mean      SE Mean   StDev
Sample Prop   200   20   0.48975   0.00815   0.11532
[Figure: Histogram of sample proportion of heads. x-axis: sample proportion of heads (n=20 spins), 200 students; y-axis: Frequency]

Question #1: What proportion of heads did you expect from each set of spins?
Question #2: Did the students get the same sample proportion every time? This is called Sampling Variability.
Question #3: Compare the two graphs. Which one did a better job at estimating the true proportion? Why? Give two reasons.

The main facts and formulas

The sample proportion (p-hat):

$\hat{p} = \frac{\text{number of successes in sample}}{\text{total number in the sample}}$

Notation: $\hat{p} = \frac{x}{n}$, where x is the number of successes.

In other words, $\hat{p}$ is a sample proportion from an SRS of size n from a population whose proportion of successes is p. Sample proportions summarize categorical variables.

Attributes, Assumptions and Conditions:
1. The sampled values must be independent of one another.
2. Mean of a sample proportion: $\mu_{\hat{p}} = p$
3. Standard deviation of a sample proportion: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
4. When n is sufficiently large and the true proportion p is not too near 0 or 1, the sampling distribution model for a proportion is approximately normal: $\hat{p} \sim AN\left(p, \sqrt{\frac{pq}{n}}\right)$, where $q = 1 - p$.
5. As a safe (and conservative) rule of thumb, check that the number of successes and the number of failures are at least 10: $np \ge 10$ and $nq \ge 10$.
6. If sampling is done without replacement, the sample size must be no larger than 10% of the population. Usually, populations are so large that 10% is a small fraction.

Standardized Statistics:
The standardized z-score we will use for sample proportions is as follows:

$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$

Examples.
1. The Associated Press reported that 71% of Americans ages 25 and older are overweight. A researcher wants to know whether the proportion of such individuals in his state that are overweight differs from the national proportion. A random sample of 600 adults in his state results in 405 who are classified as overweight.
a. What is the sample proportion of overweight Americans?
b. Check and verify all of the attributes, assumptions and conditions.
c. Describe the sampling distribution of the sample proportion for size 600 using the appropriate notation.
d. Find the probability that at most 405 of the 600 sampled adults are classified overweight.

2.
According to the 2001 Youth Risk Behavior Surveillance by the Centers for Disease Control and Prevention, 39% of the 10th-graders surveyed said that they watch three or more hours of television on a typical school day. Assume that this percentage is true for the current population of all 10th-graders. Suppose in a random sample of 200 10th-graders, 86 watched three or more hours of television on a typical school day.
a. Check the general properties and describe the sampling distribution of the sample proportion of size 200 using the appropriate notation.
b. Find the probability that 86 or more out of the 200 students watched three or more hours of television on a typical day.

3. A nationwide survey by the University of Connecticut Center for Survey Research and Analysis found that 30% of men aged 18 to 29 had tattoos in 2002. Suppose this result holds true for the current population of all men in this age group. Find the probability that in a random sample of 500 men aged 18 to 29, between 28.4% and 32.6% have tattoos.

4. 5% of the requests to a web server end up in a network error. A network technician monitors a busy web server for one hour. He observes that 200 requests were received and 14 ended up in an error. Which of the following correctly describes the sampling distribution of $\hat{p}$, the sample proportion of requests to the web server that end up in an error?
A. $\hat{p}$ is approximately normal, with mean 0.07 and standard deviation 0.0180.
B. $\hat{p}$ is normal, with mean 0.05 and standard deviation 0.0154.
C. $\hat{p}$ is approximately normal, with mean 10 and standard deviation 3.082.
D. $\hat{p}$ is approximately normal, with mean 0.05 and standard deviation 0.0154.
E. $\hat{p}$ is normal, with mean 0.07 and standard deviation 0.0180.
Calculate the probability that the proportion of requests in the sample that end up in an error is at least 0.07 (14 out of 200).

SUMMARY: Make sure that you understand this.

A.
Normal Distribution
Symmetric
Mean = $\mu$
Standard Deviation = $\sigma$
Range is approximately 4 standard deviations
The entire area under the curve is 100%, or 1

Standardizing with Z-Scores:
The standard deviation is the most common measure of spread used for normal curves and is a natural ruler for comparing individual values to the mean. To determine how many standard deviations a value is away from the mean, we can standardize the value:

$z = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}}$, or $Z = \frac{X - \mu}{\sigma}$

B. Finding Normal Percentiles using the Standard Normal Table.

C. Inverse Normal Probability Calculations

$X = \mu + Z\sigma$

D. The Distribution of a Sample Mean
$\bar{X}$ has a distribution that is approximately normal as long as the sample is large enough.
Rule of thumb: If the population is not normally distributed, n should be greater than or equal to 30.
The sample size should be no larger than 10% of the population.
The mean of the sample averages is: $\mu_{\bar{x}} = \mu$
The standard deviation of the sample averages is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

1. If a population is normal and has the $N(\mu, \sigma)$ distribution, then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
2. If a population is non-normal, then for large n the sample mean is approximately normal (CLT): $\bar{X} \sim AN\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

The Z-score becomes:

$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Inverse Calculation: Un-standardizing

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad\Rightarrow\quad \bar{X} = \mu + Z\,\frac{\sigma}{\sqrt{n}}$

E. The Distribution of a Sample Proportion

$\mu_{\hat{p}} = p$

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

When n is sufficiently large and the true proportion p is not too near 0 or 1, the sampling distribution model for a proportion is approximately normal:

$\hat{p} \sim AN\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
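
The formulas in parts A through E can also be checked with software instead of the Z table. What follows is only a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`. The helper names (`z_score`, `prob_below`, `unstandardize`, and so on) are made up for this illustration and are not part of the course notes. Software answers may differ slightly from table answers, because the table rounds z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal curve N(0, 1); plays the role of the Z table.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """A. Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def prob_below(x, mu, sigma):
    """B. Area to the left of x under a normal curve with mean mu and SD sigma."""
    return STD_NORMAL.cdf(z_score(x, mu, sigma))


def unstandardize(area_left, mu, sigma):
    """C. Inverse calculation: X = mu + Z * sigma for a given left-tail area."""
    z = STD_NORMAL.inv_cdf(area_left)
    return mu + z * sigma


def sd_of_sample_mean(sigma, n):
    """D. Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def sd_of_sample_proportion(p, n):
    """E. Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


def proportion_conditions_ok(p, n):
    """E. Rule of thumb: expected successes and failures are both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Illustration with the men's weights example (mean 173 lbs., SD 30 lbs.).
# One man weighing more than 180 lbs.:
p_single = 1 - prob_below(180, 173, 30)
# Average of 9 men more than 180 lbs.: same formula, but with SD sigma / sqrt(n).
p_mean_of_9 = 1 - prob_below(180, 173, sd_of_sample_mean(30, 9))

print(f"P(X > 180)            = {p_single:.4f}")
print(f"P(mean of 9 > 180)    = {p_mean_of_9:.4f}")
# Height below which 10% of college women fall (mean 65 in., SD 2.7 in.).
print(f"10th percentile height = {unstandardize(0.10, 65, 2.7):.2f}")
```

The sample-mean case reuses `prob_below` unchanged and only swaps in $\sigma/\sqrt{n}$ as the standard deviation, which mirrors how the Z-score formula in part D differs from the one in part A.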