PROJECT: Simulation of the Sampling Distribution of the Mean.
The purpose of this project is to understand the following theorem.
___________________________________________________________________________________
Theorem: (MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN)
Suppose that $\bar{x}$ is the mean of a simple random sample of size $n$ drawn from a large population with
mean $\mu$ and standard deviation $\sigma$. Then the mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$ and
its standard deviation is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
(approximately, if the population is finite).
___________________________________________________________________________________
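The two formulas follow from linearity of expectation and, assuming the $n$ draws are independent (exactly true when sampling with replacement, nearly true from a large population), from the additivity of variance. The last line is the standard finite population correction for sampling without replacement from a population of size $N$, which is what "approximately" refers to:

```latex
\begin{aligned}
\mu_{\bar{x}} &= E\!\left[\tfrac{1}{n}\sum_{i=1}^{n} x_i\right] = \tfrac{1}{n}\sum_{i=1}^{n} E[x_i] = \tfrac{1}{n}\, n\mu = \mu,\\
\sigma_{\bar{x}}^2 &= \operatorname{Var}\!\left(\tfrac{1}{n}\sum_{i=1}^{n} x_i\right) = \tfrac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i) = \tfrac{1}{n^2}\, n\sigma^2 = \tfrac{\sigma^2}{n}
\quad\Longrightarrow\quad \sigma_{\bar{x}} = \tfrac{\sigma}{\sqrt{n}},\\
\sigma_{\bar{x}} &= \tfrac{\sigma}{\sqrt{n}}\sqrt{\tfrac{N-n}{N-1}} \qquad \text{(sampling without replacement from a population of size } N\text{)}.
\end{aligned}
```

For the Income population (N = 5000) and n = 10, the correction factor is $\sqrt{4990/4999} \approx 0.999$, so the simpler formula is practically exact here.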
In this project, we try to understand the theorem above and the Central Limit Theorem that we will
learn in class. The purpose of this project is to examine the distribution of the estimates (sample
means) and to explore how close survey-based estimates are to the true population value
(the population mean). We will use the income.sav SPSS data set to practice random sampling
with SPSS, to run simulations, and to study the sampling distribution of the mean.
I. Understand the population
The variable to be studied is Income. Assume that the data set is the entire population of
individuals.
• Use SPSS to calculate the population mean, $\mu$ = ______,
and the population standard deviation, $\sigma$ = ______, of the Income variable.
(The standard deviation given by the Descriptive Statistics option of SPSS is the sample
standard deviation $s$, computed with divisor N − 1. Correct it to obtain the population standard
deviation: $\sigma^2 = s^2 (N-1)/N$, that is, $\sigma = s\sqrt{(N-1)/N}$.)
• Make a histogram of the data. What type of skewness do you see in this data set?
• Does the Income data follow a normal distribution? Use the SPSS normality test option to
answer this.
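Outside SPSS, the same correction and a simple skewness check can be sketched in Python (standard library only). This is a minimal sketch, not part of the SPSS procedure: the numbers 12.57269 and 5000 are taken from the Kolmogorov-Smirnov output for Income in this handout, `toy` is a made-up list used only to check the formulas, and the helper names are hypothetical.

```python
import math
from statistics import fmean, pstdev, stdev


def population_sd_from_sample_sd(s, N):
    """Convert the sample SD s reported by SPSS (divisor N - 1) into the
    population SD sigma (divisor N), using sigma^2 = s^2 * (N - 1) / N."""
    return s * math.sqrt((N - 1) / N)


def moment_skewness(values):
    """Moment coefficient of skewness of a population:
    > 0 means right-skewed, < 0 left-skewed, near 0 roughly symmetric."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return sum((x - mu) ** 3 for x in values) / (len(values) * sigma ** 3)


# Sample SD and N for Income, as printed in the SPSS output.
sigma = population_sd_from_sample_sd(12.57269, 5000)
print(round(sigma, 4))  # 12.5714

# Made-up toy data: the corrected sample SD equals the population SD directly.
toy = [3.0, 8.0, 10.0, 15.0, 40.0]
assert math.isclose(population_sd_from_sample_sd(stdev(toy), len(toy)), pstdev(toy))
print(moment_skewness(toy) > 0)  # True: one large value stretches the right tail
```

With N = 5000 the correction changes the standard deviation only in the fourth significant digit, but for small populations the difference matters.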
To do a normality test with SPSS, run the One-Sample Kolmogorov-Smirnov test: select the variable to be
tested, check the Normal box, and click OK. The p-value is the "Asymp. Sig. (2-tailed)" value in the
output. If the p-value is less than .05, the data are not likely to be normally distributed. The output
for Income is:

One-Sample Kolmogorov-Smirnov Test
                                                  INCOME
N                                                 5000
Normal Parameters(a,b)      Mean                  19.89742
                            Std. Deviation        12.57269
Most Extreme Differences    Absolute              .121
                            Positive              .121
                            Negative              -.115
Kolmogorov-Smirnov Z                              8.549
Asymp. Sig. (2-tailed)                            .000
a. Test distribution is Normal.
b. Calculated from data.

The p-value from this test is .000, which is less than .05. Therefore, the Income data are not
normally distributed.
Another normality test is available under Analyze/Descriptive Statistics/Explore….
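The same kind of test can be reproduced outside SPSS. Below is a minimal Python sketch (standard library only) of a one-sample Kolmogorov-Smirnov test against a normal distribution whose mean and standard deviation are estimated from the data, using the asymptotic p-value formula for the Z statistic. The function names are hypothetical, it is an assumption (not a documented fact) that SPSS computes its "Asymp. Sig." exactly this way, and the exponential data are a simulated stand-in for Income, not income.sav.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_p_value(z):
    """Asymptotic two-tailed p-value for the Kolmogorov-Smirnov Z statistic:
    P(Z > z) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * z^2)."""
    if z < 0.27:
        # The series converges slowly for tiny z; here the p-value exceeds 0.9999.
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z) for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


def ks_normality_test(data):
    """One-sample K-S test against a normal distribution whose mean and SD
    are estimated from the data. Returns (D, Z, p)."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    d_pos = max((i + 1) / n - c for i, c in enumerate(cdf))  # largest positive difference
    d_neg = max(c - i / n for i, c in enumerate(cdf))        # size of largest negative difference
    d = max(d_pos, d_neg)                                    # absolute difference D
    z = math.sqrt(n) * d                                     # Kolmogorov-Smirnov Z
    return d, z, ks_p_value(z)


# Hypothetical right-skewed "incomes" (simulated, NOT income.sav).
random.seed(2024)
skewed = [random.expovariate(1 / 20) for _ in range(5000)]
d, z, p = ks_normality_test(skewed)
print(f"D = {d:.3f}, Z = {z:.3f}, p = {p:.3f}")

# Z reported by SPSS for Income:
print(f"{ks_p_value(8.549):.3f}")  # 0.000
```

Because the mean and standard deviation are estimated from the same data, this classical p-value tends to be conservative (too large); the Explore procedure reports its Kolmogorov-Smirnov test with a Lilliefors significance correction for that reason.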
II. Simulation Study I: Understanding the Sampling Distribution of the Sample Mean
One can conduct an experiment on a computer to generate many more samples than would be possible
in practice. First, obtain a random sample from the Income population.
Preparing Data:
Simulate the sampling distribution of the sample mean, $\bar{x}$, of Income for samples of size n = 10 by
drawing 50 random samples of size 10 (50 replications). Record their means in the space provided or
enter them directly into the SPSS Data Editor. Random sampling can be done as follows:
Step 1: Click Transform/Random Number Seed, click the dot for Random Seed, and then click OK
to set the seed for random number generation.
Step 2: To draw a random sample, click Data/Select Cases. In the dialog box that appears,
click Random sample of cases, then click the Sample button. Click the Exactly circle
and enter 10 (for a random sample of size 10) in the box between Exactly and
cases.
Step 3: To calculate the mean of the random sample, click Analyze/Descriptive Statistics/Descriptives
and select the variable whose mean you want.
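Steps 1 to 3 amount to: fix a seed, draw a simple random sample of 10 cases without replacement, and average it; repeating that 50 times gives the simulated sampling distribution. A minimal Python sketch of that loop follows. The lognormal `population` is a hypothetical stand-in for the Income column (income.sav is not read here), and `simulate_sample_means` is an illustrative helper, not an SPSS function.

```python
import random
from statistics import fmean

# Hypothetical stand-in for the Income column of income.sav (N = 5000):
# right-skewed values generated once with a fixed seed.
random.seed(1)  # plays the role of Transform/Random Number Seed
population = [round(random.lognormvariate(2.8, 0.6), 2) for _ in range(5000)]


def simulate_sample_means(pop, n, reps, seed=None):
    """Draw `reps` simple random samples of size n without replacement
    (like 'Exactly n cases' in Data/Select Cases) and return their means."""
    rng = random.Random(seed)
    return [fmean(rng.sample(pop, n)) for _ in range(reps)]


means = simulate_sample_means(population, n=10, reps=50, seed=123)
print(len(means))          # 50 sample means, one per replication
print(round(means[0], 2))  # mean of the first simulated sample
```

Passing the same seed reproduces the same 50 samples, which is the point of setting a seed before sampling.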
Examine the distribution of the simulated sample means:
• What is the median of the simulated sample means? _____
• What is the mean of the simulated sample means? _____
• Compare it with the population mean, $\mu$. Is it close to what is stated in the theorem?
• What is the sample standard deviation of the simulated sample means, $s_{\bar{x}}$? _____
• Compare the standard deviation of the simulated sampling distribution of the mean with the
population standard deviation of Income.
o According to the theorem, approximately what should the standard deviation of the
sampling distribution of the mean be? (Use the formula stated in the theorem.)
o Is the standard deviation of the simulated sampling distribution approximately equal
to the standard deviation of the sampling distribution described in the theorem?
o If the simulation results do not agree with the theorem, explain why.
• Make a histogram of the simulated sample means. Describe the distribution of the simulated
sample means.
• Do a normality test for the simulated sample means. Do they follow a normal distribution?
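Once the 50 means are recorded, the questions above compare their center and spread with $\mu$ and $\sigma/\sqrt{n}$. The Python sketch below does the same arithmetic on simulated data and prints a crude text histogram in place of the SPSS chart. As before, the lognormal population is a hypothetical stand-in for Income, so the printed numbers will not match your SPSS results.

```python
import math
import random
from statistics import fmean, median, pstdev, stdev

# Hypothetical right-skewed stand-in population (NOT income.sav).
rng = random.Random(1)
population = [rng.lognormvariate(2.8, 0.6) for _ in range(5000)]
mu, sigma = fmean(population), pstdev(population)  # population parameters

n, reps = 10, 50
means = [fmean(rng.sample(population, n)) for _ in range(reps)]

print(f"median of sample means : {median(means):.3f}")
print(f"mean of sample means   : {fmean(means):.3f}   (theorem: mu = {mu:.3f})")
# stdev() divides by reps - 1, so this is the sample SD s_xbar the handout asks for.
print(f"SD of sample means     : {stdev(means):.3f}   (theorem: sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")


def text_histogram(values, bins=8, width=40):
    """Crude text histogram: one row of '#' per equal-width bin; returns the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid dividing by zero if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    top = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(width * c / top)}")
    return counts


counts = text_histogram(means)
```

With only 50 replications, expect the simulated mean and standard deviation to agree with the theorem only roughly; the histogram will also look ragged.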
Simulation Study II: Understanding the Sampling Distribution of the Sample Mean
Simulated values of the sample mean, $\bar{x}$, of Income for samples of size n = 2, 4, 10, 25, 50, and 100,
each with 400 replications, are recorded in the SPSS data file income_sim.sav. Each variable in the data
file holds the 400 sample means obtained with one sample size. Examine the distribution and descriptive
statistics of these variables to see how the sample size affects the distribution of
sample means.

Use SPSS to find the mean, median, and standard deviation of each variable, as well as the p-value from
the normality test, and record them in the table below.
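Sample size                  2       4       10      25      50      100
mean                         _____   _____   _____   _____   _____   _____
median                       _____   _____   _____   _____   _____   _____
s.d.                         _____   _____   _____   _____   _____   _____
p-value of normality test    _____   _____   _____   _____   _____   _____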
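The six columns of income_sim.sav can be imitated with a short Python loop: 400 replications for each sample size, with the theorem's $\sigma/\sqrt{n}$ printed beside the simulated s.d. This is a sketch under assumptions: the population is a hypothetical lognormal stand-in, samples are drawn without replacement, and how income_sim.sav was actually generated is not stated in this handout. The normality-test p-value column is left to SPSS.

```python
import math
import random
from statistics import fmean, median, pstdev, stdev

# Hypothetical right-skewed stand-in population (NOT income.sav / income_sim.sav).
rng = random.Random(7)
population = [rng.lognormvariate(2.8, 0.6) for _ in range(5000)]
sigma = pstdev(population)

REPS = 400
table = {}  # sample size -> (mean, median, s.d.) of the 400 sample means
for n in (2, 4, 10, 25, 50, 100):
    means = [fmean(rng.sample(population, n)) for _ in range(REPS)]
    table[n] = (fmean(means), median(means), stdev(means))

print(f"{'n':>4} {'mean':>8} {'median':>8} {'s.d.':>8} {'sigma/sqrt(n)':>14}")
for n, (m, med, sd) in table.items():
    print(f"{n:>4} {m:8.3f} {med:8.3f} {sd:8.3f} {sigma / math.sqrt(n):14.3f}")
```

For a right-skewed population, expect the median of the means to sit below their mean when n is small, with the gap shrinking as n grows.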
Make histograms for all these simulated sample means. How does the distribution shape change
as sample size increases?
Describe the difference between the distribution of sample means for sample size 10 with 400
replications and the one you did in Simulation I with 50 replications.
Compare the mean of the sample means calculated from different sample sizes with the mean of
the original population.
Compare the standard deviation of the sample means, for different sample sizes, with the
standard deviation of the population. What does this tell us about the precision of estimating
the population mean with larger samples?
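As a worked check, take the population standard deviation implied by the SPSS output for Income ($s = 12.57269$, $N = 5000$, so $\sigma = s\sqrt{4999/5000} \approx 12.5714$). Ignoring the finite population correction (at most about 1% here, at $n = 100$), the theorem predicts:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad \sigma \approx 12.5714:
\qquad
\begin{array}{c|cccccc}
n & 2 & 4 & 10 & 25 & 50 & 100\\ \hline
\sigma/\sqrt{n} & 8.89 & 6.29 & 3.98 & 2.51 & 1.78 & 1.26
\end{array}
\qquad
\frac{\sigma/\sqrt{4n}}{\sigma/\sqrt{n}} = \frac{1}{2}.
```

Each fourfold increase in sample size halves the standard deviation of $\bar{x}$, so the estimate of the population mean becomes more precise, but only in proportion to $\sqrt{n}$.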