The Normal Distribution: Notes

In earlier courses, you have explored data in the following ways:
By plotting data (histogram, stemplot, bar graph, etc.)
By looking for patterns (shape, center, spread, outliers, etc.)
By calculating (mean, median, mode, spread, etc.)

Sometimes, the overall pattern of a large number of observations is so regular that we can describe it using a curve. Let’s start by looking at a density curve.

Suppose you have created the histogram on the left to represent a set of data. Because it appears symmetric and its ends behave similarly, we could approximate it with a curve, or model, as shown on the right. It will be easier for us to work with the curve than with the histogram itself. This is because we will not have to worry about the categories from our histogram when trying to describe the data. The properties of the curve will allow us to describe our data more quickly and accurately than before. Let’s see how this process works.

Our curve is an example of a normal curve. A normal curve is used to describe a normal distribution.
A normal curve is symmetric.
A normal curve has a single peak.
A normal curve is bell-shaped.
All normal curves have the same overall shape.
The area underneath the curve is exactly 1, and it represents the proportion of all observations.

More About the Normal Curve

The mean, µ, is located at the center of the curve.
The mean is the same as the median.
The standard deviation, σ, is the measure of spread for normal distributions.
The points at which the curvature changes are located at a distance of σ from the mean.

[Figure: a normal curve marking the mean, µ, one standard deviation, σ, and two standard deviations, 2σ]

Normal Distributions are Important Because…

They are good descriptions for some distributions of data, like SAT scores, characteristics of populations, and even some scores on psychological tests.
They are good approximations of many kinds of chance outcomes, such as the number of heads in many coin flips or the total of many die rolls.
Many of the inferences we can make using a normal distribution can be applied to other situations in which data is almost symmetric.

Keep in mind…

Not all sets of data are modeled by a normal distribution.
Some sets of data are skewed towards the right (like income distributions).
Some sets of data are skewed towards the left (like the average number of letters in words we say each day).

Now, let’s begin to learn how to work with a normal distribution.

The 68-95-99.7 Rule

In the normal distribution with mean µ and standard deviation σ:
About 68% of the observations fall within σ of the mean µ
About 95% of the observations fall within 2σ of the mean µ
About 99.7% of the observations fall within 3σ of the mean µ

The distribution of heights of adult males is approximately normal with mean 69 inches and standard deviation 2.5 inches.

Between what heights do the middle 95% of men fall? 95% will fall within 2 standard deviations of the mean. 2 standard deviations above the mean is 69 + 2(2.5) = 74 inches. 2 standard deviations below the mean is 69 - 2(2.5) = 64 inches. This means that 95% of men have heights that fall between 64 and 74 inches.

What percent of men are taller than 74 inches? 74 inches is 2 standard deviations above the mean. If we take 100% of men and subtract away the 95% that fall within 2 standard deviations, we are left with 5% of men. Because the curve is symmetric, half of this 5% will be above the 2 standard deviation mark, so 2.5% of men are taller than 74 inches.

Standard Normal Distribution

Not all predictions that we need to make will be exactly one, two, or three standard deviations away from the mean. Because of this, we need to standardize our values. The standard normal distribution is the normal distribution N(0, 1) with mean 0 and standard deviation 1. To standardize a variable x with a normal distribution N(µ, σ), we will use

z = (x - µ) / σ

The result is often referred to as a z-score.
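The standardization z = (x - µ) / σ and the 68-95-99.7 Rule can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names `z_score` and `proportion_within` are made up for this sketch, and the exact areas are computed with the standard library's `math.erf` rather than read from a table.

```python
import math

def z_score(x, mu, sigma):
    """Standardize x from N(mu, sigma): number of standard deviations from the mean."""
    return (x - mu) / sigma

def proportion_within(k):
    """Exact area under the standard normal curve between -k and +k."""
    return math.erf(k / math.sqrt(2))

# Adult male heights from the notes, in inches
mu, sigma = 69.0, 2.5

# Compare the 68-95-99.7 Rule with the exact areas
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    area = proportion_within(k)
    print(f"within {k} sd: {low} to {high} inches, area = {area:.4f}")

print(z_score(74, mu, sigma))  # 2.0
print(z_score(64, mu, sigma))  # -2.0
```

The printed areas show that 68, 95, and 99.7 are rounded values, which is why the rule is best read as "about" those percentages.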
At this point, if you haven’t already done so, print the standard normal table linked in the introduction to these notes on the course page.

Let’s practice!

Let’s find the proportion of adult men who are less than 70 inches tall. Remember from before that the mean is 69 inches and the standard deviation is 2.5 inches. Because 70 isn’t a whole number of standard deviations from the mean, let’s standardize it! The formula:

z = (70 - 69) / 2.5 = 0.4

So, this is 0.4 standard deviation more than the mean height. To actually determine the proportion this z-score represents, we can use the area under the standard normal curve. Look at your standard normal table and find z = 0.4. The corresponding standard normal probability is 0.6554. This means that 65.54% of adult men are less than 70 inches tall. This also means that 100% - 65.54% = 34.46% of adult men are more than 70 inches tall. Keep in mind that the numbers in the standard normal table will always represent the area under the curve to the LEFT of z.

You try this one. Then, go to the next slide to check your answer.

Find the proportion of adult men who are at least 79 inches (or 6 feet, 7 inches) tall.

How well did you do? Find the z-score:

z = (79 - 69) / 2.5 = 4.0

Go to the table and find the probability associated with the z-score of 4.0: 0.99997. This means that 1 - 0.99997 = 0.00003, or 0.003%, of adult men are at least 79 inches tall.

Margin of Error

The margin of error shows how accurate we believe our guess is, based on the variability of our estimates. We can discuss margin of error using the properties of the standard normal curve.

Example

The sampling distribution of a set of test scores is approximately normal with a mean of 280 and a standard deviation of 1.9. According to the 68-95-99.7 Rule, about 95% of all the values would fall within 2 standard deviations, or within 2(1.9) = 3.8, of the mean of this curve.
This means that the margin of error for this distribution is ±3.8, meaning that the actual mean score for everyone taking the test is within 3.8 points of the mean of 280 (or between 276.2 and 283.8). We can say we are 95% confident about this range of values because it represents about 95% of all values.
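The margin of error arithmetic in this example can be sketched in Python as follows. The helper names `standard_normal_cdf` and `margin_of_error` are hypothetical, chosen for this illustration. The sketch assumes the "2 standard deviations" convention from the 68-95-99.7 Rule used in these notes, and it computes areas with `math.erf` instead of the printed table.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the LEFT of z (what the table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def margin_of_error(sd, k=2):
    """Margin of error as k standard deviations (k = 2 gives about 95%)."""
    return k * sd

# Sampling distribution from the example: mean 280, standard deviation 1.9
mean, sd = 280.0, 1.9

moe = margin_of_error(sd)
low, high = mean - moe, mean + moe

# Exact area within 2 standard deviations of the mean
coverage = standard_normal_cdf(2) - standard_normal_cdf(-2)

print(f"margin of error: +/- {moe:.1f}")
print(f"interval: {low:.1f} to {high:.1f}")
print(f"area within 2 sd: {coverage:.4f}")
```

Here `standard_normal_cdf` plays the role of the standard normal table, so the same function could be used to check any left-of-z area looked up by hand.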