Distribution of Sample Mean Slides

The (“Sampling”) Distribution for
the Sample Mean*
1
Distribution of Sample Means
A quantitative population of N units with
parameters
mean $\mu$
standard deviation $\sigma$
A random sample of n units from the population
Statistic: The sample mean $\bar{X}$.
2
Distribution of Sample Means
Statistic: The sample mean $\bar{X}$.
This statistic is an unbiased point estimate
(on average correct) of the parameter $\mu$.
$\mu_{\bar{X}} = \mu$
3
20 Times Rule / 5% Rule
(same thing)
If the population size (N) is at least 20 times the
sample size (n)
$N / n \ge 20$
or
$n / N \le 0.05$
then the standard deviation of the sample mean is (essentially)
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
4
Distribution of the Sample Mean
Given
A variable with population that is Normally
distributed with mean $\mu$ and standard deviation $\sigma$.
A random sample of size n.
When the population size
is at least 20 times n.
Result
The sample mean has Normal
distribution with
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
5
Example
Rolls of paper leave a factory with weights that
are Normal with mean $\mu$ = 1493 lbs, and
standard deviation $\sigma$ = 12 lbs.
6
Finding probabilities
What is the probability a roll weighs over 1500 lbs?
$Z = \frac{1500 - 1493}{12} = 0.5833$
ANS: 0.2798
(about 28% of rolls exceed 1500 lbs)
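The table lookup on this slide can be cross-checked in a few lines. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

# Single-roll weights as stated on the slide: Normal, mean 1493 lbs, SD 12 lbs.
rolls = NormalDist(mu=1493, sigma=12)

z = (1500 - rolls.mean) / rolls.stdev   # standardized cutoff
p_over_1500 = 1 - rolls.cdf(1500)       # upper-tail area beyond 1500 lbs

print(f"Z = {z:.4f}")
print(f"P(single roll > 1500) = {p_over_1500:.4f}")
```

The result should agree with the slide's Z = 0.5833 and probability of about 0.28.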
7
New Question
A truck transports 8 rolls at a time. The legal weight
limit for the truck is 12,000 lbs. What is the probability
8 rolls have total weight exceeding this limit?
Since 12000/8 = 1500, the question could also be
phrased:
What is the probability 8 rolls have (sample) mean
weight exceeding 1500?
The bad news: The answer is not 0.2798.
The good news: It’s not that tough.
8
Distribution of the Sample Mean
Review of previous slide
Given
A variable with population that is Normally
distributed with mean $\mu$ and standard deviation $\sigma$.
A random sample of size n.
($N/n \ge 20$)
Result
The sample mean $\bar{X}$ has Normal distribution:
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
9
Example - continued
Rolls (single rolls) of paper leave a factory with
weights that are Normal with mean $\mu$ = 1493 lbs,
and standard deviation $\sigma$ = 12 lbs.
If n = 8 rolls are randomly selected, what is the
probability their sample mean weight exceeds
1500?
The distribution of sample means $\bar{X}$ is Normal.
$\mu_{\bar{X}} = \mu = 1493$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{8}} = 4.243$
10
Finding probabilities
Find the probability the sample mean is over 1500 lbs.
Here we’re using the same mean,
but a standard deviation reduced
to 4.243.
$Z = \frac{1500 - 1493}{4.243} = \frac{7}{4.243} = 1.650$
ANS: 0.0495
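As a rough check on this calculation, the sketch below computes the same probability two ways: for the sample mean of 8 rolls, and for the total weight of the 8 rolls against the 12,000 lb limit. It assumes the rolls are independent draws from the Normal population stated earlier; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1493, 12, 8

# Sampling distribution of the mean of n rolls (the population is Normal).
se = sigma / sqrt(n)
mean_dist = NormalDist(mu, se)
p_mean = 1 - mean_dist.cdf(1500)

# The same event stated for the total weight of the n rolls:
# the total has mean n*mu and standard deviation sigma*sqrt(n).
total_dist = NormalDist(n * mu, sigma * sqrt(n))
p_total = 1 - total_dist.cdf(12000)

print(f"SD of sample mean = {se:.3f}")
print(f"P(mean of 8 > 1500)   = {p_mean:.4f}")
print(f"P(total of 8 > 12000) = {p_total:.4f}")
```

Both probabilities should come out the same, about 0.05, since "mean over 1500" and "total over 12,000" describe the same samples.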
11
Interpreting the Result
The probability the sample mean for 8 rolls exceeds
1500 lbs is 0.0495.
For 4.95% of all possible samples of 8 rolls, the sample
mean exceeds 1500 lbs.
Equivalent: There is a 0.0495 probability that the total
weight will exceed $8 \times 1500$ = 12,000 lbs.
We’re working towards using the sample mean as an
estimate of the population mean.
12
The Picture
[Figure: two Normal curves on a Weight (lbs) axis from 1453 to 1533, both centered at 1493: weights of single rolls, and the narrower curve of sample mean weights for samples of 8 rolls. The value 1500 is marked.]
13
The Picture
About 28% of all
rolls are > 1500 lbs
14
The Picture
About 5% of all
samples of 8 rolls
have mean > 1500 lbs
15
Example
Survival times have a right skewed distribution
with mean $\mu$ = 13 months and standard deviation
$\sigma$ = 12 months.
What can we say about the distribution of sample
mean survival times for samples of n patients?
$\mu_{\bar{X}} = \mu = 13$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12.0}{\sqrt{n}}$
As n gets larger, the distribution gets closer to
Normal.
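A small simulation can illustrate how the standard deviation of the sample mean shrinks like $\sigma/\sqrt{n}$. The slides give only the mean and standard deviation of survival times, so this sketch assumes a hypothetical right-skewed model (a gamma distribution matched to $\mu$ = 13 and $\sigma$ = 12); the model, seed, and number of repetitions are illustrative choices, not part of the original example.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # arbitrary seed so the sketch is repeatable

MU, SIGMA = 13.0, 12.0
# Hypothetical right-skewed model (an assumption, not from the slides):
# a gamma distribution with shape*scale = MU and shape*scale**2 = SIGMA**2.
SCALE = SIGMA**2 / MU
SHAPE = MU / SCALE

def sample_mean(n):
    """Mean of one random sample of n simulated survival times."""
    return sum(random.gammavariate(SHAPE, SCALE) for _ in range(n)) / n

results = {}
for n in (4, 16, 64):
    means = [sample_mean(n) for _ in range(4000)]
    # (simulated mean, simulated SD, theoretical SD) of the sample means
    results[n] = (mean(means), stdev(means), SIGMA / sqrt(n))
    print(f"n={n:2d}  mean={results[n][0]:.2f}  "
          f"SD={results[n][1]:.2f}  sigma/sqrt(n)={results[n][2]:.2f}")
```

The simulated standard deviations should land close to 6.0, 3.0, and 1.5 for n = 4, 16, and 64, while the mean of the sample means stays near 13.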
16
[Figure: four distributions centered at $\mu$ = 13 on an axis from 0 to 60: single values (SD = 12.0), sample mean n = 4 (SD = 6.0), sample mean n = 16 (SD = 3.0), sample mean n = 64 (SD = 1.5).]
17
Distribution of the Sample Mean
Given
A variable with population that is not Normally
distributed with mean $\mu$ and standard deviation $\sigma$.
A random sample of size n.
Assume the population
size is at least 20 times n.
Result
The sample mean has approximate Normal
distribution with
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
18
Distribution of the Sample Mean
Given
A variable with population that is not Normally
distributed with mean $\mu$ and standard deviation $\sigma$.
A random sample of size n.
Result
The sample mean has generally unknown
distribution with
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
19
Distribution of the Sample Mean
Central Limit Theorem (CLT)
Given
A variable with population that is not Normally
distributed with mean $\mu$ and standard deviation $\sigma$.
A random sample of size n, where n is sufficiently
large.
Result
The sample mean has approximate Normal
distribution with
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
20
What is “Sufficiently Large?”
Your book says “generally n at least 30.”
If the population is fairly symmetric without outliers,
considerably less than 30 will do the trick.
If the population is highly skewed, or not unimodal,
considerably more than 30 may be required.
If the population is Normal then sample size is
not a concern: The sample mean is Normal.
You may use the “30” rule if you recognize that
it’s not that black and white, and that for Normal
populations, n = 1 is “sufficiently large.”
21
Example
The Census Bureau reports the average age at
death for female Americans is 79.7 years, with
standard deviation 14.5 years.
$\mu$ = 79.7 years
$\sigma$ = 14.5 years
What can we say about the distribution of sample
means for samples of size 7?
It has mean $\mu_{\bar{X}} = \mu = 79.7$
It has standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{14.5}{\sqrt{7}} = 5.48$
Is the distribution Normal?
22
Example
Distribution of longevity: $\mu \approx 80$, $\sigma \approx 15$
If Normal
Within 1 s.d.: (65, 95) $\approx$ 68%
Within 2 s.d.s: (50, 110) $\approx$ 95%
Above 110 $\approx$ 2.5%
1 in 40 ???
No way! The distribution is not Normal.
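To see what a Normal model would actually claim here, the following sketch evaluates the exact Normal areas behind the 68% / 95% / 2.5% rule of thumb, using the rounded values 80 and 15 from the slide. It is only an illustration of why the Normal model is implausible for longevity, not a recommended model.

```python
from statistics import NormalDist

# Rounded values used on the slide: mean about 80, SD about 15.
model = NormalDist(mu=80, sigma=15)

within_1sd = model.cdf(95) - model.cdf(65)    # area in (65, 95)
within_2sd = model.cdf(110) - model.cdf(50)   # area in (50, 110)
above_110 = 1 - model.cdf(110)                # area above 110

print(f"Within 1 s.d.:  {within_1sd:.4f}")
print(f"Within 2 s.d.s: {within_2sd:.4f}")
print(f"Above 110:      {above_110:.4f}  (about 1 in {1 / above_110:.0f})")
```

The exact upper-tail area is a little under the rule-of-thumb 2.5%, but either way a Normal model would put a few percent of women above age 110, which is not believable.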
29
Example
The Normal shouldn’t be used here (why not?)
[Figure: histogram of Age at Death (years) for Women, horizontal axis from 30 to 90; vertical axis Percent of Women, from 0 to 16.]
30
Example
The Normal shouldn’t be used here (why not?)
The distribution of age at death is not Normal. It is
quite left skewed.
The sample size is not sufficiently large. (At least 30
by your book, although for this situation your
instructor would probably buy into as low as 20.)

The Central Limit Theorem can’t be applied.
The sample mean doesn’t have an approximately Normal
distribution.
31
Example
What can we say about the distribution of sample
means for samples of size 7?
It has mean $\mu_{\bar{X}} = \mu = 79.7$
It has standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{14.5}{\sqrt{7}} = 5.48$
Is the distribution Normal?
NO!
32
Example
$\mu$ = 79.7 years
$\sigma$ = 14.5 years
I looked at a few recent obituaries in the Oswego
Daily News (online):
79 70 48 99 85 71 45
$\bar{X} = 71.00$, $S = 19.36$
33
Example
$\mu_{\bar{X}} = \mu = 79.7$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{14.5}{\sqrt{7}} = 5.48$
This sample has $\bar{X} = 71.0$. A difference of 8.7.
Can we compute a Z score for 71.0? Should we?
Z = (71.0 – 79.7) / 5.48 = –8.7 / 5.48 = –1.59
Why not? This suggests 71.0 (8.7 below 79.7) is
somewhat, but not extremely, unusually low.
71.0 is 1.59 standard deviations below 79.7.
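The sample statistics and the Z score for the seven obituary ages can be reproduced as follows. This is a minimal sketch with illustrative variable names; it computes the Z score only as a measure of distance in standard deviations, not as input to a Normal table.

```python
from math import sqrt
from statistics import mean, stdev

ages = [79, 70, 48, 99, 85, 71, 45]   # the seven obituary ages from the slides
mu, sigma = 79.7, 14.5                # population parameters from the slides

xbar = mean(ages)                     # sample mean
s = stdev(ages)                       # sample standard deviation (n - 1 divisor)
se = sigma / sqrt(len(ages))          # standard deviation of the sample mean
z = (xbar - mu) / se                  # distance from mu in standard deviations

print(f"sample mean = {xbar:.2f}, S = {s:.2f}")
print(f"SD of sample mean = {se:.2f}, Z = {z:.2f}")
```

The negative sign on Z records that the sample mean is below the population mean.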
34
Example
Should we use the Table to obtain probabilities from Z
scores (such as our Z = –1.59)?
NO
If not, how could we get the probability
of a result within 8.7 of 79.7?
Using either
a huge database of longevities:
Simulate many (all possible) samples of size 7. Determine what
proportion of samples give a mean no more than 8.7 from 79.7.
or a mathematical model for the longevities:
Either determine the model for sample means using calculus, or
approximate it using numerical methods.
Preferred method: Much more compact; faster to work with;
essentially identical results.
35
Example
What is the distribution of the sample mean of
samples of size n = 48?
$\mu_{\bar{X}} = \mu = 79.7$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{14.5}{\sqrt{48}} = 2.09$
Even though age at death is left skewed, with
n = 48 (large enough) the Central Limit Theorem
applies, and the sample mean has approximate
Normal distribution.
36
Example
I looked at 41 more recent obituaries (total of 48)
79 70 87 71 101 89 64 44
75 49 69 91 71 51 48 90
93 92 81 50 85 99 85 95
51 80 89 92 86 74 68 81
88 81 92 71 45 99 77 78
89 42 93 69 72 92 92 91
$\bar{X} = 77.52$, $S = 16.37$
37
Example
[Figure: distribution of the 48 ages at death on an axis from 40 to 100, with the Median, Mean, and Mode marked.]
38
95% Confidence Intervals
Example
Means for samples of 48 US longevities:
$\mu_{\bar{X}} = 79.7$, $\sigma_{\bar{X}} = 2.09$, Normal
My sample: $\bar{X} = 77.52$
The sample mean is (79.7 – 77.52) = 2.18 from
the population mean.
What is the probability that a random sample of
48 U.S. women’s deaths gives a sample mean
within 2.18 of 79.7?
2.18 below 79.7 is 77.52.
2.18 above 79.7 is 81.88.
39
Example
Below 77.52 or above 81.88.
Z = 2.18/2.08 = 1.04
Probability = 0.852 – 0.148 = 0.704
0.7054
77.52
79.7
81.88
Normal, Mean=79.7, StDev=2.08
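The probability of landing within 2.18 of 79.7 can be recomputed without rounding the standard deviation. The sketch below uses $14.5/\sqrt{48}$ directly and relies on the Central Limit Theorem approximation for n = 48; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 79.7, 14.5, 48
se = sigma / sqrt(n)                  # standard deviation of the sample mean
xbar_dist = NormalDist(mu, se)        # approximate Normal, by the CLT

d = 2.18                              # observed distance from mu
p_within = xbar_dist.cdf(mu + d) - xbar_dist.cdf(mu - d)
p_outside = 1 - p_within

print(f"SD of sample mean = {se:.3f}")
print(f"P(within {d} of {mu})    = {p_within:.3f}")
print(f"P(more than {d} from {mu}) = {p_outside:.3f}")
```

The unrounded calculation gives roughly 0.70 for "within" and 0.30 for "outside", in line with the table-based value of 0.704; small differences in the third decimal come from rounding Z and the standard deviation.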
40
Example
Find the probability that a random sample of 48
U.S. women’s deaths gives a sample mean
within 2.18 of 79.7.
Probability = 0.704
About 30% (that’s almost 1 in 3) of all samples
of 48 deaths give a sample mean more than 2.18
from 79.7.
41
Example
Give two explanations that account for the 2.18
year difference between the data on Oswego
longevity (which were lower on average) and the
U.S. longevity parameter of 79.7.
1. Women in Oswego do not live as long on
average as they do nationwide. That is:
$\mu_{\text{Oswego}} < 79.7$
42
Example
Give two explanations that account for the 2.18
year difference between the data on Oswego
longevity (which were lower on average) and the
U.S. longevity parameter of 79.7.
2. Sampling variability (sampling “error”):
$\mu_{\text{Oswego}} = 79.7$
About 30% of all samples of 48 women yield a
mean 2.18 or more from 79.7. That isn’t so
uncommon. Our data aren’t very inconsistent
with the national result.
43
Sampling Without Replacement
What to do if the sample size is more than 5% of
the population size…
N = population size
n = sample size
$N / n \ge 20$
$n / N \le 0.05$
44
Distribution of Sample Means
The distribution of the sample mean has
> mean $\mu_{\bar{X}} = \mu$ (“unbiased”)
> standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
> shape closer to Normal
(but not necessarily Normal)
45
Word Lengths – Gettysburg Address
Individual Word Lengths
N = 268 words
Mean length $\mu$ = 4.295.
Standard deviation $\sigma$ = 2.123.
[Figure: dotplot of individual word lengths on an axis from 1 to 11. Each symbol represents up to 2 observations.]
Not Normal. Right skewed. Can’t use Table A2.
46
Distribution of Sample Means: n = 5
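Sample means $\bar{X}$ from samples of size n = 5 have
> mean
$\mu_{\bar{X}} = \mu = 4.295$
> standard deviation
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$
> shape closer to Normal (but not Normal – a bit right
skewed)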
47
Distribution of Sample Means : n = 5
[Figure: dotplot of Sample Mean Word Length for samples of size n = 5, axis from 1.6 to 8.0. Each symbol represents up to 22 observations.]
The mean of this distribution is 4.295.
The standard deviation of this distribution is 0.942.
The shape is close to Normal (but not Normal
– there’s right skew).
48
Distribution of Sample Means : n = 10
Sample means $\bar{X}$ from samples of size n = 10 have
> mean
$\mu_{\bar{X}} = \mu = 4.295$
> standard deviation
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$
> shape closer to Normal (but not exactly Normal – a bit
right skewed)
49
Distribution of Sample Means : n = 10
[Figure: dotplot of Sample Mean Word Length for samples of size n = 10, axis from 1.6 to 8.0. Each symbol represents up to 31 observations.]
The mean of this distribution is 4.295.
The standard deviation of this distribution is 0.660.
The shape is quite close to Normal (just a little
right skew – not enough to fuss over).
50
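n = 5
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$
The correction factor 0.9925 is awful close to 1.
n = 10
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$
51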
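n = 5
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$
The uncorrected 0.9494 and the corrected 0.942 are almost the same.
n = 10
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$
52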
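n = 100
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{100}} \sqrt{\frac{268 - 100}{268 - 1}} = 0.2123 \sqrt{\frac{168}{267}} = 0.2123 \times 0.7932 = 0.1684$
The correction factor 0.7932 is not so close to 1.
The uncorrected 0.2123 and the corrected 0.1684 are not almost the same.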
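The three calculations for n = 5, 10, and 100 follow one pattern, which can be sketched as a small helper. The function name `sd_of_sample_mean` is hypothetical (introduced only for this illustration); the inputs are the Gettysburg Address values from the slides.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n, N):
    """Return (sigma/sqrt(n), correction factor, corrected SD)
    for sampling n units without replacement from a population of N."""
    uncorrected = sigma / sqrt(n)
    fpc = sqrt((N - n) / (N - 1))     # finite population correction factor
    return uncorrected, fpc, uncorrected * fpc

SIGMA, N = 2.123, 268                 # word-length SD and population size
table = {n: sd_of_sample_mean(SIGMA, n, N) for n in (5, 10, 100)}

for n, (uncorrected, fpc, corrected) in table.items():
    print(f"n={n:3d}  n/N={n / N:.3f}  sigma/sqrt(n)={uncorrected:.4f}  "
          f"factor={fpc:.4f}  corrected={corrected:.4f}")
```

For n = 5 and n = 10 the sample is under 5% of the population and the factor barely matters; for n = 100 (about 37% of the population) it reduces the standard deviation noticeably.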
53
Distribution of the Sample Mean
Given
A variable with population that is distributed with
mean $\mu$ and standard deviation $\sigma$. (PARAMETERS)
A random sample of size n.
Results 1 and 2
The sample mean $\bar{X}$ (STATISTIC) has distribution with the same
mean and a smaller standard deviation.
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
54
Distribution of the Sample Mean
Given
A variable with population that is distributed with
mean $\mu$ and standard deviation $\sigma$.
A random sample of size n.
Result 3
The sample mean $\bar{X}$ has distribution with a shape
that is closer to Normal.
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
55