PROJECT: Simulation of Sampling Distribution of The Mean

The purpose of this project is to understand the following theorem.
___________________________________________________________________________________
Theorem: (MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN)
Suppose that $\bar{x}$ is the mean of a simple random sample of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then the mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$ and its standard deviation is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ (approximately, if the population is finite).
___________________________________________________________________________________

In this project, we try to understand the theorem above and the Central Limit Theorem that we will learn in class. We examine the distribution of the estimates (sample means) and explore how close estimates based on surveys are to the true population value (the population mean). We will use the SPSS data set income.sav to practice random sampling with SPSS, to run simulations, and to study the sampling distribution of the mean.

I. Understand the population

The variable to be studied is Income. Let us assume that the data are a population of individuals.

Use SPSS to calculate the population mean, $\mu$ = ______, and the population standard deviation, $\sigma$ = _______, of this population for the Income variable. (The standard deviation given by the Descriptive Statistics option of SPSS is the sample standard deviation $s$. You need to correct it to obtain the population standard deviation: $s^2 \times (n - 1) / N = \sigma^2$, where here $n = N$ is the number of cases in the data set.)

Make a histogram for this data. What type of skewness do you see in this data set?

Does the Income data follow a normal distribution? This can be checked with the SPSS normality test option.

To do a normality test with SPSS, select the variable to be tested, check the normality box, and click OK. The output for Income is shown below; the p-value from this test is .000.

One-Sample Kolmogorov-Smirnov Test
                                              INCOME
N                                             5000
Normal Parameters(a,b)     Mean               19.89742
                           Std. Deviation     12.57269
Most Extreme Differences   Absolute           .121
                           Positive           .121
                           Negative           -.115
Kolmogorov-Smirnov Z                          8.549
Asymp. Sig. (2-tailed)                        .000
a. Test distribution is Normal.
b. Calculated from data.

The "p-value" is the number reported as Sig. (here, Asymp. Sig. (2-tailed)). If the p-value is less than .05, the data are unlikely to be normally distributed. Since the p-value for Income is less than .05, we conclude that the Income data are not normally distributed.

Another test procedure for normality is available under Analyze/Descriptive Statistics/Explore…

II. Simulation Study I For Understanding Sampling Distribution of the Sample Mean

One can conduct an experiment on a computer to generate many more samples than would be possible in practice. First, obtain a random sample from the Income population.

Preparing Data: Simulate the sampling distribution of the sample mean, $\bar{x}$, of Income for samples of size n = 10 by generating a random sample of size 10 fifty times (50 replications). Record the means in the space provided or enter them directly into the SPSS data editor.

Random sampling can be done as follows:
Step 1: Click Transform\Random Number Seed. Then select Random Seed and click OK to set the seed for random number generation.
Step 2: To get a random sample, click Data\Select Cases. In the dialog box that appears, click Random sample of cases, then click the Sample button. Select Exactly and enter 10 (if a random sample of size 10 is to be selected) in the box between Exactly and cases.
Step 3: To calculate the mean of the random sample, click Analyze/Descriptive Statistics/Descriptives, and select the variable whose mean is to be calculated.

Examine the distribution of the simulated sample means:
What is the median of the simulated sample means? _____
What is the mean of the simulated sample means? _____ Compare it with the population mean, $\mu$. Is it close to what is stated in the theorem?
What is the sample standard deviation of the simulated sample means, $s_{\bar{x}}$? _____
Compare the standard deviation of the simulated sampling distribution of the mean with the population standard deviation of Income.
  o According to the theorem, approximately what should the standard deviation of the sampling distribution of the mean be? (Use the formula stated in the theorem.)
  o Is the standard deviation of the simulated sampling distribution approximately equal to the standard deviation of the sampling distribution described in the theorem above?
  o If the simulation results do not satisfy the theorem, please explain why.
Make a histogram of the simulated sample means. Describe the distribution of the simulated sample means.
Do a normality test for the simulated sample means. Do they follow a normal distribution?

Simulation Study II For Understanding Sampling Distribution of the Sample Mean

Simulated data for the sample mean, $\bar{x}$, of Income for samples of size n = 2, 4, 10, 25, 50, 100, with 400 replications each, are recorded in the SPSS data file income_sim.sav. Each variable in the data file holds the 400 sample means obtained with one sample size.

Examine the distribution and descriptive statistics of these data to see how the sample size affects the distribution of sample means. Use SPSS to find the mean, median, and standard deviation, and also the p-value from the normality test.

Sample size                    2      4      10     25     50     100
mean                           ____   ____   ____   ____   ____   ____
median                         ____   ____   ____   ____   ____   ____
s.d.                           ____   ____   ____   ____   ____   ____
p-value of normality test      ____   ____   ____   ____   ____   ____

Make histograms for all these simulated sample means. How does the shape of the distribution change as the sample size increases?
Describe the difference between the distribution of sample means for sample size 10 with 400 replications and the one you obtained in Simulation Study I with 50 replications.
Compare the mean of the sample means calculated from different sample sizes with the mean of the original population.
Compare the standard deviation of the sample means, for different sample sizes, with the standard deviation of the population. What does this tell us about the precision of estimating the population mean with a larger sample?
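
For readers who want to check the theorem outside SPSS, the following is a minimal Python sketch of the same experiment. It is not part of the SPSS assignment and rests on assumptions: the population is a synthetic right-skewed stand-in (exponentially distributed values), not the income.sav data, and the function name simulate_sample_means is only illustrative. It repeats the sampling 400 times for each sample size used in Simulation Study II and compares the standard deviation of the sample means with $\sigma / \sqrt{n}$.

```python
import math
import random
import statistics


def simulate_sample_means(population, n, replications, rng):
    """Draw `replications` simple random samples of size n (without
    replacement) and return the list of their means."""
    return [statistics.fmean(rng.sample(population, n))
            for _ in range(replications)]


rng = random.Random(12345)  # fixed seed so the run is reproducible

# Assumption: a synthetic, right-skewed stand-in for the Income population.
population = [rng.expovariate(1 / 20) for _ in range(5000)]
N = len(population)

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)
s = statistics.stdev(population)       # sample SD (divides by N - 1)

# Correction from Part I: sigma^2 = s^2 * (N - 1) / N
sigma_from_s = math.sqrt(s ** 2 * (N - 1) / N)

# For each sample size: (mean of means, SD of means, sigma / sqrt(n))
results = {}
for n in (2, 4, 10, 25, 50, 100):
    means = simulate_sample_means(population, n, 400, rng)
    results[n] = (statistics.fmean(means),
                  statistics.stdev(means),
                  sigma / math.sqrt(n))

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, corrected s = {sigma_from_s:.3f}")
for n, (mean_of_means, sd_of_means, theory) in results.items():
    print(f"n = {n:3d}  mean of means = {mean_of_means:7.3f}  "
          f"sd of means = {sd_of_means:6.3f}  sigma/sqrt(n) = {theory:6.3f}")
```

In each printed row, the mean of the sample means should be close to mu, and the standard deviation of the sample means should be close to sigma/sqrt(n) and shrink as n grows. The agreement is approximate because only 400 replications are used.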