Samples, Sampling Distributions, and the CLT

The big topic today, in fact the only topic, is the Central Limit Theorem. If you draw lots and lots of samples from the same population, what do the means of those samples look like?

The purpose of this lab is to get hands-on experience with samples and their properties. You will simulate collecting samples from a population, using Excel, and explore how sample means and standard deviations are affected by the size of the sample. By the end of the lab you should be able to show:

As the sample size (n) gets larger, the sample variability (s) gets _______
As the sample size (n) gets larger, the standard error of the mean gets ________

USING THE SAMPLING FUNCTION TO EXAMINE DISTRIBUTIONS OF SAMPLE MEANS

Sampling from a Normal Distribution

Using the random number generator, generate a normally distributed data set (N=240) with a mean of 50 and a standard deviation of 8. This will be our population (make sure the first value is not wildly out of line with the rest).

1) Find the mean and standard deviation of this population, and draw its histogram (set the bins yourself).

Population Distribution
Mean = ________
St Dev = ________

2) Now simulate taking 20 samples of n=15 from this population. Go to the “Data Analysis” menu and choose “Sampling”. In the space for “Input Range” put the range of cells that contains the population data. Where it says “Sampling Method” check “Random”. Now here’s something very dumb on Excel’s part. Where it says “Number of Samples” put in the number of observations per sample (n). Don’t ask me why it is labeled incorrectly. Finally, under “Output range” enter the cell where you want the output to start (e.g. for the first sample this might be C2). Do this over again nineteen times (sorry), not forgetting to change the output cell each time (D2, E2, etc.).

3) Theoretically, what value should the mean of the sample means approach? Theoretically, what value should the standard deviation of the sample means approach?
4) Calculate the mean of each of your twenty samples. In this way you will be creating a distribution of sample means. What are the mean and standard deviation of this sampling distribution? Draw a histogram of the sample means (use the same bins you used for the population so that you can compare the histograms easily).

Sampling Distribution (n=15)
Mean of sample means = ________
St Dev of sample means = ________

5) Now simulate taking 20 samples of n=60. What are the mean and standard deviation of this sampling distribution? Draw a histogram of the sample means (use the same bins as before).

Sampling Distribution (n=60)
Mean of sample means = ________
St Dev of sample means = ________

Theoretically, what value should the mean of the sample means approach? Theoretically, what value should the variance of the sample means approach? How did your samples do?

6) If you're having loads of fun, repeat the operation with sample size = 130.

Sampling Distribution (n=130)
Mean of sample means = ________
St Dev of sample means = ________

Sampling from a Skewed Distribution

Open the guinea.xls file in the Datasets folder of the S1610q folder on the desktop, and immediately save it under a different name IN THE STUDENTWORK FOLDER. This dataset gives the survival times of 72 guinea pigs in a medical experiment. The distribution of survival times is strongly skewed to the right. Sampling from this population can demonstrate how averaging reduces variability and produces a more nearly normal distribution.

1) Find the mean, mode, median, and standard deviation of this population, and draw its histogram (set the bins yourself).

Population Distribution
Mean = ________
Mode = ________
Median = ________
St Dev = ________

2) Simulate taking 20 samples of n=12 from this population. Calculate the mean of each sample and draw the histogram of the means. Then simulate taking 20 samples of n=50, calculate the means, and draw the histogram. Compare the sampling distributions below.
Sampling Dist (n=12)
Mean = ________
St Dev = ________

Sampling Dist (n=50)
Mean = ________
St Dev = ________

How does the shape of the sampling distributions differ from that of the population distribution? How do the shapes change as the sample gets larger?

How do the means of the sampling distributions differ from that of the population distribution? How do the standard deviations differ?

How does this exercise illustrate the central limit theorem and the law of large numbers?
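
If you want to check your Excel results outside the spreadsheet, the sampling procedure from the normal-distribution part of the lab can be sketched in Python. This is only a rough sketch and not part of the lab itself. It assumes NumPy is installed and that samples are drawn with replacement; the function names sample_means and theoretical_se and the seed value are hypothetical choices made for illustration. The guinea pig data are not reproduced here, but the same function can be applied to any column of values loaded as an array.

```python
import numpy as np


def sample_means(population, n, num_samples=20, rng=None):
    """Draw num_samples random samples of size n (with replacement,
    an assumption of this sketch) and return the mean of each sample."""
    rng = np.random.default_rng() if rng is None else rng
    # One row per sample, n observations per row.
    samples = rng.choice(population, size=(num_samples, n), replace=True)
    return samples.mean(axis=1)


def theoretical_se(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / np.sqrt(n)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=1)  # arbitrary seed, for repeatability

    # Simulated population as described in the lab: N=240, mean 50, SD 8.
    population = rng.normal(loc=50, scale=8, size=240)
    sigma = population.std()  # SD of the population itself (ddof=0)
    print(f"population: mean={population.mean():.2f}, SD={sigma:.2f}")

    # Twenty samples at each of the sample sizes used in the lab.
    for n in (15, 60, 130):
        means = sample_means(population, n, num_samples=20, rng=rng)
        print(
            f"n={n}: mean of sample means={means.mean():.2f}, "
            f"SD of sample means={means.std(ddof=1):.2f}, "
            f"sigma/sqrt(n)={theoretical_se(sigma, n):.2f}"
        )
```

With only 20 samples per sample size, the SD of the sample means will bounce around the sigma/sqrt(n) value from run to run; raising num_samples makes the agreement much tighter.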