Central Limit Theorem

Understanding the Central Limit Theorem is critical for understanding hypothesis tests of
means. The central limit theorem states that when an infinite number of successive
random samples are taken from a population, the sampling distribution of the means of
those samples will become approximately normally distributed with mean µ and standard
deviation /N [~N(µ,/N)] as the sample size becomes larger, irrespective of the shape
of the population distribution.
The following tutorial demonstrates the different components of the central limit
theorem.
First, let’s choose our population.
This is a population in which each of the five possible scores has the same
probability of occurrence. This is called a uniform distribution. The scores in the
population range from 1 to 5. The mean of the population is 3.0 and the standard
deviation of the population is 1.41 [µ=3.0; σ=1.41].
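The population values quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the population consists of exactly the five scores 1 through 5, each equally likely; the variable names are illustrative and not part of the original tutorial.

```python
import statistics

# The five equally likely scores that make up the population (assumed).
population = [1, 2, 3, 4, 5]

# Population mean.
mu = statistics.fmean(population)

# Population standard deviation: pstdev divides by N rather than N - 1,
# which is the right choice when the list is the whole population.
sigma = statistics.pstdev(population)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 3.00, sigma = 1.41
```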
Our next step is to choose successive samples from this population. All samples must
have the same sample size. Let’s start small and choose samples of 5.
The distribution of scores in each of these samples is presented below. The red arrow
indicates the mean of the sample. As you can see, the sample mean is not the same for
these samples. Let’s choose some more samples.
Again, we can see that the distribution is different for each sample and that the means are
not the same. Let’s choose some more samples.
We now have taken 25 samples (N=5) from the population and have calculated a mean
for each sample. Just as we graphed the distribution of individual scores in each of the
samples, we can plot the distribution of the sample means. The distribution of our 25
sample means is presented below.
We can see that this distribution looks different from the population distribution. The
sample means do not have an equal probability of occurrence. In fact, the distribution
shows us that most of the sample means are clustering around a score of 3.0. We can
describe the central tendency and variability of distributions of sample means, just as we
can distributions of raw data. The mean of these sample means is 3.14 and the standard
deviation of these sample means is .59. The mean is very close to the population mean.
The standard deviation, however, is a lot smaller. The range of sample means is also
smaller than the range of scores in the population.
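The sampling procedure described above can be sketched in Python. This is only an illustration, not the software behind the tutorial: the seed, the variable names, and the use of the sample standard deviation (`statistics.stdev`) are assumptions, and a given run will not reproduce the reported 3.14 and .59 exactly.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, assumed here only so the run is repeatable

population = [1, 2, 3, 4, 5]  # uniform population of scores (assumed)
sample_size = 5               # N, the size of each sample
num_samples = 25              # how many samples we draw

# Draw each sample with replacement so every score keeps probability 1/5,
# then record the mean of that sample.
sample_means = [
    statistics.fmean(random.choices(population, k=sample_size))
    for _ in range(num_samples)
]

# Describe the distribution of the 25 sample means.
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
range_of_means = max(sample_means) - min(sample_means)

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {sd_of_means:.2f}")
print(f"range of sample means: {range_of_means:.2f}")
```

Changing the seed gives a different set of 25 means, but the mean of the means should stay near 3.0 and their spread should stay well below the population standard deviation.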
Let’s see what happens to our distribution of sample means if we take 75 more samples
of 5 from the population. This time, we won’t plot all 100 of our sample distributions;
we will just graph the values of the sample means.
As you can see, the distribution of sample means is almost normally distributed. The
mean of this distribution of sample means is 3.0 and the standard deviation is ##. Even
with a very small sample size, our distribution of sample means becomes approximately
normally distributed with a mean of 3.0 and a standard deviation very close in value to
/N (1.41/5 = .6306).
If we were to take an infinite number of samples from the population, the sampling
distribution would be approximately normally distributed as presented below. The mean
of this sampling distribution is equal to 3.0 and the standard deviation is .6306.
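An infinite number of samples cannot be drawn in practice, but a large number gives a reasonable approximation. The sketch below assumes 20,000 samples of size 5 as a stand-in for "infinite"; that count, the seed, and the variable names are illustrative choices rather than anything specified by the tutorial.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, assumed here only so the run is repeatable

population = [1, 2, 3, 4, 5]  # uniform population of scores (assumed)
sample_size = 5               # N, the size of each sample
num_samples = 20_000          # large stand-in for "an infinite number"

# Mean of each simulated sample, drawn with replacement.
sample_means = [
    statistics.fmean(random.choices(population, k=sample_size))
    for _ in range(num_samples)
]

# Empirical centre and spread of the simulated sampling distribution.
empirical_mean = statistics.fmean(sample_means)
empirical_sd = statistics.pstdev(sample_means)

# Values predicted by the central limit theorem: mu and sigma / sqrt(N).
mu = statistics.fmean(population)
predicted_sd = statistics.pstdev(population) / math.sqrt(sample_size)

print(f"empirical mean {empirical_mean:.3f} vs predicted {mu:.3f}")
print(f"empirical SD   {empirical_sd:.3f} vs predicted {predicted_sd:.3f}")
```

The predicted standard deviation here uses the unrounded population value, so it comes out near .632 rather than the .6306 obtained from the rounded 1.41.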