Sampling Distribution Lab

Samples, Sampling Distributions, and the CLT
The big topic--the only topic--today is the Central Limit Theorem. If you draw lots and lots of samples from
the same population, what do the means of those samples look like?
The purpose of this lab is to get hands-on experience with samples and their properties. You will simulate
collecting samples from a population, using Excel, and explore how sample means and standard deviations
are affected by the size of the sample. By the end of the lab you should be able to show:
As the sample size (n) gets larger, the sample variability (s) gets _______
As the sample size (n) gets larger, the standard error of the mean gets ________
USING THE SAMPLING FUNCTION TO EXAMINE DISTRIBUTIONS OF SAMPLE MEANS
Sampling from a Normal Distribution
Using the random number generator, generate a normally distributed data set (N=240) with a mean of 50 and a
standard deviation of 8. This will be our population (make sure the first value is not wack).
1) Find the mean and standard deviation, and draw the histogram for this population (set the bins yourself).
Population Distribution
Mean =
St Dev =
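For anyone who wants to cross-check the Excel numbers, the population step can be sketched in Python. This is only an illustration, not part of the lab procedure: the seed and variable names are assumptions, and the random values will not match the ones Excel generates.

```python
import random
import statistics

random.seed(1)  # hypothetical seed, used only so the run is repeatable

# Population: 240 draws from a normal distribution with mean 50 and SD 8
population = [random.gauss(50, 8) for _ in range(240)]

pop_mean = statistics.mean(population)
# pstdev divides by N, treating the 240 values as the whole population
pop_sd = statistics.pstdev(population)

print(f"Mean   = {pop_mean:.2f}")
print(f"St Dev = {pop_sd:.2f}")
```

The results should land near 50 and 8 but not exactly on them, because 240 random draws never reproduce the generating parameters perfectly.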
2) Now simulate taking 20 samples of N=15 from this population. Go to the “Data Analysis” menu and choose
“Sampling”. In the space for “Input Range” put the range of cells that contains the population data. Where it says
“Sampling Method” check “Random”. Now here’s something very dumb on Excel’s part. Where it says “Number of
Samples” put in the number of people per sample (n). Don’t ask me why it is labeled incorrectly. Finally, under
“Output Range” enter the cell where you want the output to start (e.g., for the first sample this might be C2). Do this
nineteen more times (sorry), remembering to change the output cell each time (D2, E2, etc.).
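The repeated sampling in step 2 can be sketched in a few lines of Python. This assumes the draws are made with replacement (so the same population value can appear twice in a sample), which may or may not match what your version of Excel does. The function name draw_samples and the seed are hypothetical.

```python
import random

random.seed(2)  # hypothetical seed for repeatability

# Stand-in population with the same design as the lab: N=240, mean 50, SD 8
population = [random.gauss(50, 8) for _ in range(240)]


def draw_samples(pop, n, k=20):
    """Return k samples, each holding n values drawn at random with replacement."""
    return [random.choices(pop, k=n) for _ in range(k)]


samples = draw_samples(population, n=15)
print(len(samples), "samples of size", len(samples[0]))
```

Each inner list plays the role of one output column (C, D, E, and so on) in the spreadsheet.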
3) Theoretically, what value should the mean of the sample means approach? Theoretically, what value should the
standard deviation of the sample means approach?
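One way to check your answer to question 3 is with the standard result that, for independent draws, the standard deviation of the sample mean equals the population standard deviation divided by the square root of n. The sketch below applies that formula to the design values from this lab (mean 50, SD 8). The function name standard_error is an assumption made for illustration.

```python
import math

MU = 50     # design mean of the population
SIGMA = 8   # design standard deviation of the population


def standard_error(sigma, n):
    """Theoretical SD of the sample mean for independent draws: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


for n in (15, 60, 130):
    se = standard_error(SIGMA, n)
    print(f"n={n:>3}: mean of sample means -> {MU}, SD of sample means -> {se:.3f}")
```

Your actual population will have a mean and standard deviation slightly different from 50 and 8, so substitute the values you found in step 1 for a fairer comparison.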
4) Calculate the mean of each of your twenty samples. This creates a distribution of sample means.
What are the mean and standard deviation of this sampling distribution? Draw a histogram of the sample means (use the
same bins you used for the population so that you can compare the histograms easily).
Sampling Distribution (N=15)
Mean of sample means =
St Dev of sample means =
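Step 4 can be sketched the same way: take the mean of each sample, then summarize those twenty means. As before, this is a rough Python illustration under assumptions (with-replacement draws, a made-up seed), not the Excel procedure itself.

```python
import random
import statistics

random.seed(3)  # hypothetical seed for repeatability

population = [random.gauss(50, 8) for _ in range(240)]

# Twenty samples of size 15, drawn with replacement (an assumption)
samples = [random.choices(population, k=15) for _ in range(20)]

# One mean per sample: these twenty numbers form the sampling distribution
sample_means = [statistics.mean(s) for s in samples]

mean_of_means = statistics.mean(sample_means)
# stdev divides by (count - 1), since the 20 means are themselves a sample
sd_of_means = statistics.stdev(sample_means)

print(f"Mean of sample means   = {mean_of_means:.2f}")
print(f"St Dev of sample means = {sd_of_means:.2f}")
```

Note that the spread of the sample means is much smaller than the spread of the individual population values.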
5) Now simulate taking 20 samples of N=60. What are the mean and standard deviation of this sampling distribution?
Draw a histogram of the sample means (use the same bins as before).
Sampling Distribution (N=60)
Mean of sample means =
St Dev of sample means =
Theoretically, what value should the mean of the sample means approach? Theoretically, what value should the
variance of the sample means approach? How did your samples do?
6) If you're having loads of fun, repeat the operation with sample size = 130.
Sampling Distribution (N=130)
Mean of sample means =
St Dev of sample means =
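To see all three sample sizes side by side, the sketch below repeats the experiment for n = 15, 60, and 130 and prints the observed standard deviation of the sample means next to the theoretical value (population SD divided by the square root of n). The helper name sd_of_sample_means, the seed, and the with-replacement sampling are assumptions.

```python
import math
import random
import statistics

random.seed(4)  # hypothetical seed for repeatability

population = [random.gauss(50, 8) for _ in range(240)]
pop_sd = statistics.pstdev(population)


def sd_of_sample_means(pop, n, k=20):
    """Draw k samples of size n (with replacement) and return the SD of their means."""
    means = [statistics.mean(random.choices(pop, k=n)) for _ in range(k)]
    return statistics.stdev(means)


results = {}
for n in (15, 60, 130):
    observed = sd_of_sample_means(population, n)
    theoretical = pop_sd / math.sqrt(n)
    results[n] = (observed, theoretical)
    print(f"n={n:>3}: observed SD = {observed:.2f}, theoretical SD = {theoretical:.2f}")
```

With only twenty samples per condition, the observed values will wobble around the theoretical ones; increasing k tightens the agreement.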
Sampling from a Skewed Distribution
Open the guinea.xls file in the Datasets folder of the S1610q folder on the desktop, and immediately save it as
something else IN THE STUDENTWORK FOLDER. This dataset gives the survival times of 72 guinea pigs in a
medical experiment. The distribution of survival times is strongly skewed to the right. Sampling from this population
can demonstrate how averaging reduces variability and creates a more normal distribution.
1) Find the mean, mode, median, and standard deviation, and draw the histogram for this population (set the bins
yourself).
Population Distribution
Mean =
Mode =
Median =
St Dev =
2) Simulate taking 20 samples of N=12 from this population. Calculate the mean of each sample and draw the
histogram of means. Then simulate taking 20 samples of N=50, calculate the means, and draw the histogram. Compare
the sampling distributions below.
Sampling Dist (N=12)
Mean =
St Dev =
Sampling Dist (N=50)
Mean =
St Dev =
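The same experiment can be tried on a right-skewed population in Python. The guinea pig data are not reproduced here, so this sketch substitutes a made-up stand-in: 72 values from an exponential distribution with an arbitrary scale. The skewness helper, the seed, and the with-replacement draws are likewise assumptions.

```python
import random
import statistics

random.seed(5)  # hypothetical seed for repeatability

# Stand-in skewed population (NOT the guinea pig data): 72 exponential draws
population = [random.expovariate(1 / 100) for _ in range(72)]


def skewness(xs):
    """Moment-based skewness: average cubed z-score (positive means right-skewed)."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def sample_means(pop, n, k=20):
    """Means of k samples of size n drawn with replacement."""
    return [statistics.mean(random.choices(pop, k=n)) for _ in range(k)]


pop_sd = statistics.pstdev(population)
print(f"Population: SD = {pop_sd:.1f}, skewness = {skewness(population):.2f}")

summary = {}
for n in (12, 50):
    means = sample_means(population, n)
    summary[n] = (statistics.mean(means), statistics.stdev(means), skewness(means))
    print(f"n={n}: mean = {summary[n][0]:.1f}, SD = {summary[n][1]:.1f}, "
          f"skewness = {summary[n][2]:.2f}")
```

Skewness estimated from only twenty means is noisy, so a single run may not show a clean trend; the drop in standard deviation is usually much easier to see.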
How does the shape of the sampling distributions differ from that of the population distribution? How do the shapes
change as the sample gets larger? How do the means of the sampling distributions differ from that of the population
distribution? How do the standard deviations differ?
How does this exercise illustrate the Central Limit Theorem and the Law of Large Numbers?