Lecture Notes

Psych 5500/6500
The Sampling Distribution of the Mean
Fall, 2008
Sampling Distribution of
the Mean
The ‘sampling distribution of the mean’
(SDM) is the population of all the sample
means you could get if you repeatedly sampled a given
number of scores from a given population.
For Example
In a previous semester I asked the students to
draw a sample from a deck of playing cards.
Original Population from Which the
Sample Was Drawn
4 cards of each type (jacks counted as ‘11’, queens as ‘12’, kings as
‘13’). This is a graph of individual scores in the population (i.e. ‘Y’).
The mean of the population of playing cards is μY=7 and its standard
deviation is σY=3.74. Note the population is not normally distributed,
and the exact values of μ and σ are known (not estimated).
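The two population values can be checked directly. The sketch below assumes aces count as 1 (the notes do not say so, but a mean of 7 implies card values 1 through 13); the variable names are illustrative only.

```python
import statistics

# Card values 1-13, four cards of each value.
# Assumption: aces count as 1; jack=11, queen=12, king=13 as in the notes.
deck = [value for value in range(1, 14) for _ in range(4)]

mu_y = statistics.fmean(deck)       # population mean
sigma_y = statistics.pstdev(deck)   # population SD (divides by N, not N-1)

print(f"mu_Y = {mu_y:.2f}, sigma_Y = {sigma_y:.2f}")  # mu_Y = 7.00, sigma_Y = 3.74
```

Because the whole deck is the population, `pstdev` (the population formula) is the appropriate choice here rather than the sample formula.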
Sample Means When N=4
The students were asked to sample four
cards and find the mean of the sample. Not
surprisingly they obtained many different
sample means.
Sample means from when n=4:
6.75, 5.5, 7.25, 7.5, 6.25, 8, 9.75, 7, 9.5, 4.25, 3.75,
9, 5.25, 7.75, 7.5, 11.5, 2.25, 9.25, 3.5, 8.75, 5.5,
6, 7.25, 7.75, 9.75, 7, 9.5, 7.5, 6.5, 9.25, 7.25,
7.25, 9.5
SDM for N=4
This is a graph of the 33 sample means (rounded off to the nearest
whole number). We are starting to see the shape of the sampling
distribution of the mean when n=4. Note that the mean of the sample
means looks to be around ‘7’, the mean of the original population.
(The mean of the SDM equals the population mean, which is why the
sample mean is an unbiased estimate of the population mean.)
Sample Means When N=8
The students were then asked to sample eight
cards and find the sample mean.
Sample means from when n=8:
6.25, 8.25, 5.75, 5.38, 6.63, 7.5, 7.5, 9.13, 8.38, 5.63,
6.13, 5.88, 8.13, 7.75, 7.13, 4.7, 5.63, 6.63, 9.13,
5.88, 5.88, 5.13, 8.63, 6.13, 7.5, 9.13, 8.13, 7.63,
6.75, 7.88, 7.38, 7.50, 7.85
SDM for N=8
Again, this is a graph of the sample means for when N=8.
And again, the mean of the sample means looks to be the
same as the mean of the original population (7).
Comparisons
The next three slides show the three graphs. Note
the following:
1. While the population from which we sampled was
not normally distributed, the graphs of the sample
means begin to look more like normal curves.
2. The variance of the sample means is less than
the variance of the original population, and as n
moves from 4 to 8 the variance of the sample
means decreases (the sample mean is a
‘consistent’ estimate of the population mean).
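Both observations can be reproduced with a small simulation. This is only a sketch: it assumes each student dealt cards without replacement from a full 52-card deck with aces counted as 1, and the function name `simulate_sample_means`, the seed, and the number of draws are arbitrary choices, not part of the class exercise.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

# Assumption: aces count as 1, four cards of each value 1-13.
deck = [value for value in range(1, 14) for _ in range(4)]

def simulate_sample_means(n, draws=20000):
    """Return `draws` sample means, each from n cards dealt without replacement."""
    return [statistics.fmean(random.sample(deck, n)) for _ in range(draws)]

pop_var = statistics.pvariance(deck)
results = {}
for n in (4, 8):
    means = simulate_sample_means(n)
    # Store (mean of the sample means, variance of the sample means).
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(f"n={n}: mean of means = {results[n][0]:.2f}, "
          f"variance of means = {results[n][1]:.2f}")
print(f"population variance = {pop_var:.2f}")
```

With many more sample means than a single class can supply, the mean of the means should settle near 7 and the variance should drop as n goes from 4 to 8. Dealing without replacement from a small deck makes the variances slightly smaller than sampling with replacement would.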
Original Population from Which
the Sample Was Drawn
This is a graph of individual scores (Y).
SDM for N=4
This is a graph of sample means (when n=4).
SDM for N=8
This is a graph of the sample means for when N=8.
Short Cut
The preceding approach for finding the
sampling distribution of the mean would
actually require that we obtain an infinite
number of sample means to arrive at a true
picture of the population of sample means
we could obtain if we sampled a certain
number of scores from a certain population
(i.e. the SDM). This is a good way to
introduce the concept of SDM but we need a
short cut for actually producing an SDM...
1) The Shape of the SDM
You can count on the SDM being normally
distributed if either of the following two conditions
is met.
1. The SDM will be normally distributed if the
population you sampled from is normally
distributed.
2. The SDM will be approximately normally distributed (even if the
population you sampled from is not) if the N of
your sample is large enough (Central Limit
Theorem). Rule of thumb: N ≥ 30
2) The Mean of the SDM
The mean of the population of sample means
equals the mean of the population from
which you sample (that is why the sample
mean is an ‘unbiased’ estimate of the
population mean).
$\mu_{\bar{Y}} = \mu_Y$
3) The Standard Deviation
of the SDM
The standard deviation of the sample means is
less than the standard deviation of the
population from which you sampled, as the
means will vary less than the scores do.
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
$\sigma_{\bar{Y}}$ is also known as the ‘standard error of the mean’.
Can you figure out why it is called that?
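One way to see where the square root of N comes from is sketched below. It assumes the N scores are drawn independently from the same population, an assumption the slide does not state, and it writes Var for variance.

```latex
\sigma^2_{\bar{Y}}
  = \mathrm{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} Y_i\right)
  = \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}(Y_i)
  = \frac{N\,\sigma_Y^2}{N^2}
  = \frac{\sigma_Y^2}{N}
\quad\Longrightarrow\quad
\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}
```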
Example: Original
Population
Let’s say the population is normally distributed (with μY=60 and σY=16),
which means that the SDM will be normally distributed as well.
SDM for N=4
$\mu_{\bar{Y}} = \mu_Y = 60$
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{16}{\sqrt{4}} = 8$
SDM for N=64
$\mu_{\bar{Y}} = \mu_Y = 60$
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{16}{\sqrt{64}} = 2$
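The two standard errors above can be reproduced with a one-line helper. The function name `standard_error` is just an illustrative choice.

```python
import math

def standard_error(sigma_y, n):
    """Standard deviation of the SDM: sigma_Y / sqrt(N)."""
    return sigma_y / math.sqrt(n)

se_4 = standard_error(16, 4)    # SDM for N=4
se_64 = standard_error(16, 64)  # SDM for N=64
print(se_4, se_64)  # 8.0 2.0
```

Note that multiplying the sample size by 16 only cuts the standard error by a factor of 4.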
Probability and the SDM
When the SDM is normally distributed we can
answer certain types of questions. The
following slides take us through a typical
question from the homework assignment.
Question
We will begin by repeating a process learned
in an earlier lecture.
We are sampling from a population that is
normally distributed with a mean of 55 and a
standard deviation of 10.
What is the probability of drawing a score from
that population that is between 50 and 60?
$p(50 \le Y \le 60)$?
Original Population
Step 1: draw and label the population.
Original Population
Step 2: shade in the area of question.
Original Population
Step 3: compute the z scores and look up the area under
the normal curve. The z scores are (50 − 55)/10 = −0.5 and
(60 − 55)/10 = 0.5, so the probability of obtaining a single score
with 50 ≤ Y ≤ 60 is .1915 + .1915 = .3830
p=.3830
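As a cross-check on the table lookup, the same area can be computed with Python's standard-library `statistics.NormalDist` (a tool choice made here, not something the course requires):

```python
from statistics import NormalDist

# Population of individual scores: normal with mu=55, sigma=10.
population = NormalDist(mu=55, sigma=10)

# Area under the curve between 50 and 60.
p_single = population.cdf(60) - population.cdf(50)
print(f"p(50 <= Y <= 60) = {p_single:.4f}")
```

The result should agree with the table answer of .3830 to about three decimal places. Small differences come from the table areas being rounded to four digits before they are added.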
Question
Now we are going to ask a new question. If we
sample nine scores from that population,
what is the probability of obtaining a sample
mean that is between 50 and 60?
$p(50 \le \bar{Y} \le 60)$?
SDM for N=9
Step 1: draw the sampling distribution of the mean, which is
the population of all the sample means we could get if we
sample 9 scores from the original population. We know the
SDM is normally distributed, its mean is the same as the mean of the
population, and we can compute the standard deviation of the curve
(‘standard error’). Note this is a population of sample means.
SDM for N=9
Step 2: shade in the area of question.
SDM & Standard Score
To figure out the shaded area of the normal
curve we need to change the sample means
of 50 and 60 to standard scores.
As always, the standard score is the ‘raw’ score
on the graph (here a sample mean) minus
the mean of the graph (the mean of the sample
means), divided by the standard deviation of the
graph (the standard deviation of the sample
means, a.k.a. the ‘standard error’):
$z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$
SDM for N=9
$z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$:
$z = \frac{50 - 55}{3.33} = -1.5$ and $z = \frac{60 - 55}{3.33} = 1.5$
Step 3: compute the z scores and look up the area under
the normal curve. The probability of obtaining a sample mean
between 50 and 60 = .4332+.4332
p=.8664
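The three steps for the N=9 question can be sketched in code as follows. The variable names are illustrative, and `statistics.NormalDist` stands in for the printed normal-curve table.

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 55, 10, 9

# The SDM is normal with the same mean and SD = sigma_Y / sqrt(N) (standard error).
sdm = NormalDist(mu=mu_y, sigma=sigma_y / math.sqrt(n))

# Standard scores for sample means of 50 and 60.
z_low = (50 - sdm.mean) / sdm.stdev
z_high = (60 - sdm.mean) / sdm.stdev

# Probability of a sample mean between 50 and 60.
p_mean = sdm.cdf(60) - sdm.cdf(50)
print(f"standard error = {sdm.stdev:.2f}, z = {z_low:.1f} and {z_high:.1f}, p = {p_mean:.4f}")
```

The printed probability should match the table answer of .8664.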
Looking Back
When we sampled one score from a normal
population that had μ=55 and σ=10 there
was a 38.3% chance that the score would
be within 5 of the population mean.
When we sampled 9 scores from that
population there was an 86.64% chance that
the sample mean would be within 5 of the
population mean.
1-tail and 2-tail p values
We are very close to doing some statistical analyses
to test specific hypotheses. The next step is to
play with scenarios such as:
You sample 36 scores from a population that has a
μ=80 and σ=12. For what value of the sample
mean is there only a 5% chance that you would
obtain a sample mean that is that far or farther
above the population mean?
To set up the problem, first draw the population you will be sampling
from, and then the SDM (population of sample means for N=36).
We don’t know if the population is normally distributed. Do we know
if the SDM is?
Formulas
$z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$ and $\bar{Y} = (z)(\sigma_{\bar{Y}}) + \mu_{\bar{Y}}$
What sample mean would be 1.65 standard deviations above the mean
on this curve?
$\bar{Y} = (z)(\sigma_{\bar{Y}}) + \mu_{\bar{Y}} = (1.65)(2) + 80 = 83.3$
Conditional Probability
Let’s think of it as a conditional probability.
$p(\bar{Y} \ge 83.3 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$
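The upper-tail cutoff can also be found by inverting the normal curve directly instead of reading z from a table. This sketch assumes the SDM is treated as normal because N=36 meets the N ≥ 30 rule of thumb; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 80, 12, 36

# SDM: mean 80, standard error 12 / sqrt(36) = 2.
sdm = NormalDist(mu=mu_y, sigma=sigma_y / math.sqrt(n))

# z score with 5% of the curve above it, and the matching sample mean.
z_crit = NormalDist().inv_cdf(0.95)
upper_cutoff = sdm.inv_cdf(0.95)
print(f"z = {z_crit:.3f}, cutoff = {upper_cutoff:.2f}")
```

The exact z is about 1.645, so the cutoff is about 83.29. The rounded table value of 1.65 used on the slide gives 83.3.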
Another Example
You sample 36 scores from a population that
has a μ=80 and σ=12. For what value of
the sample mean is there only a 5% chance
that you would obtain a sample mean that
is that far or farther below the population
mean?
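The working for this case is not in the text of the slide. By symmetry with the upper-tail example, a worked version would use z = −1.65 (an assumed table value, chosen to mirror the 1.65 used earlier) with the same standard error of 2:

```latex
\bar{Y} = (z)(\sigma_{\bar{Y}}) + \mu_{\bar{Y}} = (-1.65)(2) + 80 = 76.7
```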
Conditional Probability
$p(\bar{Y} \le 76.7 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$
Final Example
You sample 36 scores from a population that
has a μ=80 and σ=12. For what values of
the sample mean is there only a 5% chance
that you would obtain a sample mean that
is that far or farther away from the
population mean (in either direction)?
For a normal curve the z scores that cut off a total of the 5% most
extreme scores (in both directions) are z = −1.96 and z = +1.96
(2.5% in each tail).
Conditional Probability
$p(\bar{Y} \le 76.08 \text{ or } \bar{Y} \ge 83.92 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$
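For the two-tailed case the 5% is split evenly, 2.5% in each tail, which corresponds to z of about ±1.96. A minimal sketch, again treating the SDM as normal on the strength of N=36 and using illustrative variable names:

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 80, 12, 36
sdm = NormalDist(mu=mu_y, sigma=sigma_y / math.sqrt(n))

# 2.5% of sample means fall below `lower`, 2.5% fall above `upper`.
lower = sdm.inv_cdf(0.025)
upper = sdm.inv_cdf(0.975)
print(f"lower = {lower:.2f}, upper = {upper:.2f}")  # lower = 76.08, upper = 83.92

# Check: total probability of landing outside the two cutoffs.
p_extreme = sdm.cdf(lower) + (1 - sdm.cdf(upper))
```

Compared with the one-tailed cutoffs, the two-tailed cutoffs sit farther from 80 because each tail holds only half of the 5%.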