Lecture notes

advertisement
Rule of sample proportions
IF:
1. There is a population proportion of interest
2. We have a random sample from the population
3. The sample is large enough so that we will see at least
five of both possible outcomes
THEN:
If numerous samples of the same size are taken and the sample
proportion is computed every time, the resulting histogram
will:
1. be roughly bell-shaped
2. have mean equal to the true population proportion
3. have standard deviation equal to
population proportion  (1  population proportion )
sample size
Sample means: measurement variables
Suppose we want to estimate the mean weight at PSU
Histogram of Weight, with Normal Curve
40
Frequency
30
20
10
0
100
200
300
Weight
Data from stat 100 survey, spring 2004. Sample size 237.
Mean value is 152.5 pounds.
Standard deviation is about (240 – 100)/4 = 35
What is the uncertainty in the mean?
We need a margin of error for the mean.
Suppose we take another sample of 237.
What will the mean be?
Will it be 152.5 again?
Probably not.
Consider what would happen if we took 1000 samples,
each of size 237, and computed 1000 means.
Hypothetical result, using a “population” that resembles our sample:
Histogram of 1000 means with normal
curve, based on samples of size 237
Frequency
100
50
0
145
150
155
Weight
Standard deviation is about
(157 – 148)/4 = 9/4 = 2.25
160
Extremely
interesting:
The histogram
of means is
bell-shaped,
even though
the original
population
was skewed!
Formula for estimating the standard deviation of
the sample mean (don’t need histogram)
Just like in the case of proportions, we would
like to have a simple formula to find the
standard deviation of the mean without having
to resample a lot of times.
Suppose we have the standard deviation of the
original sample. Then the standard deviation
of the sample mean is:
standard deviation of the data
sample size
So in our example of weights:
The standard deviation of the sample is about 35.
Hence by our formula:
Standard deviation of the mean is 35 divided by
the square root of 237:
35/15.4 = 2.3
(Recall we estimated it to be 2.25)
So the margin of error of the sample mean is
2x2.3 = 4.6
Report 152.5 ± 4.6 (or 147.9 to 157.1)
Example: SAT math scores
Suppose nationally we know that the SAT math test has a
mean of 100 points and a standard deviation of 100 points.
Draw by hand a picture of what you expect the distribution
of sample means based on samples of size 100 to look like.
Sample means have a normal distribution
mean 500
standard deviation 100/10 = 10
So draw a bell shaped curve, centered at 500, with 95%
of the bell between 500 – 20 = 480 and 500 + 20 = 520
0.03
0.04
Normal curve of SAT means, sample size 100
0.02
A sample of 100 SAT
math scores with a
mean of 540 would be
very unusual.
0.00
0.01
A sample of 100 with a
mean of 510 would not
be unusual.
460
480
500
Score
520
540
Rule of sample means
IF:
1. The population of measurements of interest is bellshaped, OR
2. A large sample (at least 30) is taken.
THEN:
If numerous samples of the same size are taken and the sample
mean is computed every time, the resulting histogram will:
1. be roughly bell-shaped
2. have mean equal to the true population mean
3. have standard deviation estimated by
sample standard deviation
sample size
Back to proportions:
Suppose the true proportion is known
When we know the true population proportion, then we can
anticipate where a sample proportion will fall (give an interval
of values).
It is known that about 12% of the population is left-handed.
Take a sample of size 200.
We need the standard deviation of the sample proportion:
.12  (1  .12)
SD 
 .023
200
We want a normal curve centered at .12 with standard
deviation .023. So 95% of the bell should be spread
between
.12 – 2(.023) = .12 − .046 = .074
.12 + 2(.023) = .12 + .046 = .166
Normal curve of sample proportions
based on sample size 200
0.051
0.074
0.097
0.120
0.143
sample percents
0.166
0.189
Normal curve of sample proportions
based on sample size 200
95% of the time,
we expect a
sample of size 200
to produce a
sample proportion
between .074 and
.166.
5% of the time, we
expect the sample
proportion to be
outside this range.
0.051
0.074
0.097
0.120
0.143
sample percents
0.166
0.189
True proportion known (cont’d)
If you play 100 games of craps, where will the proportion of
games you win lie 95% of the time?
True proportion (mean of sample proportions): .493
Standard deviation of sample proportions:
.493  (1  .493)
 .050
100
Answer: Between .493−2(.050) and .493+2(.050), or
between .393 and .593.
True proportion unknown
Next, suppose we do not know the true population proportion
value. This is far more common in reality!
How can we use information from the sample to estimate the
true population proportion?
Suppose we have a sample of 200 students in STAT 100 and
find that 28 of them are left handed.
Our sample proportion is:
.14
We can now estimate the standard deviation of the
sample proportion based on a sample of size 200:
.14  (1  .14)
SD 
 .025
200
Hence, 2 standard deviations = 2(.025) = .05
Normal curve of sample proportions
based on sample size 200
Note the green curve,
which is the truth.
Of course, ordinarily
we don’t know where it
lies, but at least we
know its standard
deviation.
0.08
0.10
0.12
0.14
0.16
0.18
Thus, we can build a
confidence interval
around our 14%
estimate (in red).
sample percents
If we take another sample, the red line will move but the green curve
will not!
30 confidence intervals
based on sample size 200
0.06
0.08
0.10
0.12
0.14
0.16
sample percents
0.18
If we repeat the
sampling over
and over, 95% of
our confidence
intervals will
contain the true
proportion of
0.12.
This is why we
use the term
“95% confidence
interval”.
Definition of “95% confidence interval for the true
population proportion”:
An interval of values computed from the sample that is almost
certain (95% certain in this case) to cover the true but
unknown population proportion.
The plan:
1. Take a sample
2. Compute the sample proportion
3. Compute the estimate of the standard deviation of the
proportion  (1  proportion)
sample proportion:
sample size
4. 95% confidence interval for the true population proportion:
sample proportion ± 2(SD)
Download