Unit 6 Summary

Unit 6 covers the Normal Curve and the Central Limit Theorem. What do we mean by "normal"? We encounter the normal (bell) curve in our daily lives but may not recognize it. I would be willing to say that the time it takes you to commute to work over the course of a month or two is normally distributed. The average miles per gallon you get with your car is probably normally distributed. Beyond the fact that we encounter the normal curve in our daily lives, it has tremendous value in statistics.

The properties of the normal curve are:

• The curve is unimodal; that is, it has one mode, located at the mean
• The curve is symmetrical; that is, the median is the same as the mean and the mode
• The curve approaches but never touches the horizontal axis

We can describe a distribution (remember Unit 4) using the mean and the standard deviation. A very important feature of the normal curve is the applicability of the Empirical Rule. Using the 68-95-99.7 Rule, or Empirical Rule (page 205), we can determine the percentage of the area under the curve within 1, 2, or 3 standard deviations on either side of the mean. Approximately 68% of values fall within ± 1 standard deviation, 95% fall within ± 2 standard deviations, and 99.7% fall within ± 3 standard deviations.

A major concept that we will address in the next unit is the fact that the area under the curve is equal to the probability. Therefore, if 68% of the scores in a distribution are within ± 1 standard deviation of the mean, then the probability of randomly selecting a score from the distribution in that range is also 68%.

The 68-95-99.7 rule applies only to data values that are exactly 1, 2, or 3 standard deviations away from the mean. We can generalize this rule if we know precisely how many standard deviations from the mean a particular data value is. The number of standard deviations a data value is above or below the mean is called a standard score or z-score.
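The 68-95-99.7 percentages can also be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class rather than StatCrunch; the helper name area_within is made up for this illustration.

```python
from statistics import NormalDist


def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations.

    area_within is a hypothetical helper name used only for this sketch.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints approximately 0.6827, 0.9545, and 0.9973
    print(f"within ±{k} standard deviations: {area_within(k):.4f}")
```

Because the area under the curve equals the probability, the same numbers are the chances of randomly selecting a score within 1, 2, or 3 standard deviations of the mean.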
The z-score distribution has a mean of 0 and a standard deviation of 1, because a data value at the mean is 0 standard deviations away from the mean and a value that is one standard deviation away from the mean is one standard deviation away from the mean! The z-score locates a single score on a distribution of scores in terms of how many standard deviations away from the mean the score is. The formula for the z-score is:

$$z = \frac{X - \mu}{\sigma}$$

So if we have a distribution that has a mean of 225 and a standard deviation of 50, we can find the z-score for a value of 300 by using the formula:

$$z = \frac{300 - 225}{50} = \frac{75}{50} = 1.5$$

We can now use StatCrunch {Stat -- Calculators -- Normal} to get the percentile, but we could also use Appendix A: z-Score Tables. Again, StatCrunch will do this much more quickly for you, and you do not run the risk of a lookup error. You should read that as: I recommend you use StatCrunch!

To use the table we need the z-score. For our z-score of +1.5, we look down the first column for the X.X part of the z-score (the ones and tenths digits) and across the column headings for the .XX part (the hundredths digit). In the column headings, the first digit after the decimal point is a placeholder for the tenths digit already given in the first column. Looking on page 447, we find the 1.5 row in the first column, then use the .00 column, and find that the percentile is .9332, or 93.32%. If we had used StatCrunch it would be:

[Figure: StatCrunch Normal calculator output]

You will notice that StatCrunch is more precise than the table because it shows 7 decimal places instead of just 4! As we move toward our next unit, probability, we can make a probability statement about that score. In our example, the probability of getting a score of 300 or less is 0.93.

Section 5.3 introduces us to the Central Limit Theorem. The Central Limit Theorem (CLT) is applicable when the sample size is large (n > 30). In this case, the shape of the x distribution is irrelevant and the CLT can be used to describe the sampling distribution. Here is a demonstration of a sampling distribution and how the CLT works for us.
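Before turning to that demonstration, the z-score example above (mean 225, standard deviation 50, score 300) can be double-checked in code. This is only a sketch, assuming Python's standard-library statistics.NormalDist in place of StatCrunch or the Appendix A table; the function name z_score is a hypothetical helper, not something from the textbook.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z = z_score(300, 225, 50)

# Area under the standard normal curve to the left of z,
# i.e. the probability of getting a score of 300 or less.
percentile = NormalDist().cdf(z)

print(z)                     # 1.5
print(round(percentile, 4))  # 0.9332, matching the table value
```

The same area comes out of NormalDist(225, 50).cdf(300), which works on the original scale without converting to a z-score first.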
I want you to consider the following population of scores: 2, 4, 6, 8

Now, I want you to calculate the mean (μ) and standard deviation (σ) of this population. Remember that this is a population, so the formula for the standard deviation does not use N - 1 but just N.

$$\mu = \frac{\sum X}{N} = \frac{20}{4} = 5$$

$$\sigma = \sqrt{\frac{\sum X^2 - (\sum X)^2/N}{N}} = \sqrt{\frac{120 - 20^2/4}{4}} = \sqrt{\frac{120 - 100}{4}} = \sqrt{\frac{20}{4}} = \sqrt{5} = 2.236$$

These represent values that usually no one knows. They are what we will try to estimate when we make inferences about the population in Unit 8.

Next let's look at the Central Limit Theorem. First take all possible samples (with replacement) of size two (n = 2) from this population. There will be 16 such samples. They are represented below by Pick 1 and Pick 2. For example, our first sample is 2, 2. All possible samples are listed. We will then use the sample means as our population and calculate the mean and the standard deviation of this set of data.

Pick 1   Pick 2   Mean   Mean²   Variance   Standard Deviation
2        2        2      4       0          0.000
2        4        3      9       2          1.414
2        6        4      16      8          2.828
2        8        5      25      18         4.243
4        2        3      9       2          1.414
4        4        4      16      0          0.000
4        6        5      25      2          1.414
4        8        6      36      8          2.828
6        2        4      16      8          2.828
6        4        5      25      2          1.414
6        6        6      36      0          0.000
6        8        7      49      2          1.414
8        2        5      25      18         4.243
8        4        6      36      8          2.828
8        6        7      49      2          1.414
8        8        8      64      0          0.000
Sum               80     440

Remember these are not estimated but actual values, since we have all possible samples of size 2. This is the population of all these samples. The mean of the means from this distribution is 80/16 = 5, which equals the population mean. So we have shown that the mean of the means is equal to µ, the population mean.

Now let's see what the theorem tells us about the standard deviation. First we will calculate it from our data and then use the theorem's formula to see if it matches. From our data (the Mean column), we calculate the population standard deviation (notice we divide by N since this is a population):

$$S_{\bar{x}} = \sqrt{\frac{\sum X^2 - (\sum X)^2/N}{N}} = \sqrt{\frac{440 - (80)^2/16}{16}}$$
$$= \sqrt{40/16} = \sqrt{2.5} = 1.58$$

Now, we will calculate what the Central Limit Theorem tells us the standard deviation will be. It is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = \frac{2.236}{1.4142} = 1.58$$

Notice that they are identical.

Now, if you graph all the means from our example in a histogram, you will have the Sampling Distribution for n = 2 from our population. Also, you will see that it is somewhat like a normal distribution. We will cover sampling distributions later in Unit 8. Also, you need to remember that the Theorem applies to any distribution only when n > 30, and here we only have n = 2.

Once we have the value for Z we can use StatCrunch to give us the probability.
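The whole demonstration can be reproduced by listing every possible sample in code. What follows is a minimal sketch, assuming Python's standard library (itertools.product to enumerate samples with replacement, statistics.pstdev for the divide-by-N standard deviation); the variable names are chosen for this illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]
n = 2  # sample size

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation (divides by N)

# Every possible sample of size n, drawn with replacement: 4 * 4 = 16 samples
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)  # compare with mu
sd_of_means = pstdev(sample_means)  # computed directly from the 16 means
predicted_sd = sigma / sqrt(n)      # what the theorem's formula gives

print(len(samples))           # 16
print(mu, mean_of_means)      # 5 5
print(f"{sigma:.3f}")         # 2.236
print(f"{sd_of_means:.3f}")   # 1.581
print(f"{predicted_sd:.3f}")  # 1.581

# How often each mean occurs: the bar heights of the sampling-distribution histogram
for value, count in sorted(Counter(sample_means).items()):
    print(value, "#" * count)
```

The counts rise from one sample with a mean of 2 to four samples with a mean of 5 and fall back to one sample with a mean of 8, which is the "somewhat like a normal distribution" shape described above.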