Rule of sample proportions (p. 359)

10/31/08
Oct. 31 Statistic for the day:
Per capita consumption of candy by
Americans in 2007: 24.5 pounds
Assignment: Read Chapter 20
Exercises pp. 382-383: 1, 2, 4, 6
Rule of sample proportions (p. 359)
IF:
1. There is a population proportion of interest
2. We have a random sample from the population
3. The sample is large enough so that we will see at least five of both possible outcomes
THEN:
If numerous samples of the same size are taken and the sample proportion is computed every time, the resulting histogram will:
1. be roughly bell-shaped
2. have mean equal to the true population proportion
3. have standard deviation equal to:
$$\sqrt{\frac{(\text{population proportion}) \times (1 - \text{population proportion})}{\text{sample size}}}$$
Sample means: quantitative variables
Suppose we want to estimate the mean weight at PSU.
Data from stat 100 survey, spring 2004. Sample size 237.
Mean value is 152.5 pounds.
Standard deviation is about (240 – 100)/4 = 35
What is the uncertainty in the mean?
We need a margin of error for the mean.
Suppose we take another sample of 237.
What will the mean be?
Will it be 152.5 again?
Probably not.
Consider what would happen if we took 1000 samples, each of size 237, and computed 1000 means.
Hypothetical result, using a “population” that resembles our sample:
[Histogram of the 1000 sample means]
Standard deviation is about (157 – 148)/4 = 9/4 = 2.25
Extremely interesting: The histogram of means is bell-shaped, even though the original population was skewed!
Formula for estimating the standard deviation of the sample mean (don’t need histogram)
Just like in the case of proportions, we would like to have a simple formula to find the standard deviation of the mean without having to resample a lot of times.
Suppose we have the standard deviation of the original sample. Then the standard deviation of the sample mean is:
$$\frac{\text{sample standard deviation}}{\sqrt{\text{sample size}}}$$
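The resampling experiment described here can be tried directly. The sketch below is only an illustration: the survey data are not reproduced, so it assumes a hypothetical right-skewed "population" of weights (a shifted gamma distribution with mean near 150 and standard deviation near 35). It draws 1000 samples of size 237, computes the standard deviation of the 1000 means, and compares it with the shortcut of dividing the standard deviation by the square root of the sample size.

```python
import math
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical skewed "population" of weights in pounds (an assumption,
# chosen only so the mean is near 150 and the SD is near 35).
population_rng = random.Random(1)
population = [100 + population_rng.gammavariate(2.0, 25.0) for _ in range(20_000)]

means = simulate_sample_means(population, sample_size=237, num_samples=1000)

sd_population = statistics.pstdev(population)
sd_of_means_simulated = statistics.stdev(means)
sd_of_means_formula = sd_population / math.sqrt(237)

print(f"SD of the population:        {sd_population:.2f}")
print(f"SD of the 1000 sample means: {sd_of_means_simulated:.2f}")
print(f"SD / sqrt(sample size):      {sd_of_means_formula:.2f}")
```

The last two printed values should be close to each other, which is the point of the formula: the histogram of means does not have to be built to learn its spread.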
So in our example of weights:
The standard deviation of the sample is about 35.
Hence by our formula:
Standard deviation of the mean is 35 divided by the square root of 237:
35/15.4 = 2.3
(We estimated 2.25)
So the margin of error of the sample mean is
2×2.3 = 4.6
Report 152.5 ± 4.6
(Same as 147.9 to 157.1)
Example: SAT math scores
Suppose nationally we know that the SAT math test has a mean of 500 points and a standard deviation of 100 points.
Draw by hand a picture of what you expect the distribution of sample means based on samples of size 100 to look like.
Sample means have a normal distribution
mean 500
standard deviation 100/10 = 10
So draw a bell-shaped curve, centered at 500, with 95% of the bell between 500 – 20 = 480 and 500 + 20 = 520
A sample of 100 SAT math scores with a mean of 540 would be very unusual.
A sample of 100 with a mean of 510 would not be unusual.
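The SAT example comes down to asking how many standard deviations of the sample mean separate an observed mean from 500. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and calling anything beyond 2 standard deviations "unusual" is an assumption that matches the 95% band used in the slides.

```python
import math


def sd_of_sample_mean(sd, sample_size):
    """Standard deviation of the sample mean: SD / sqrt(sample size)."""
    return sd / math.sqrt(sample_size)


def standardized_score(sample_mean, population_mean, sd_mean):
    """Number of standard deviations the sample mean lies from the population mean."""
    return (sample_mean - population_mean) / sd_mean


sat_mean, sat_sd, sample_size = 500, 100, 100
sd_mean = sd_of_sample_mean(sat_sd, sample_size)        # 100 / 10 = 10
low, high = sat_mean - 2 * sd_mean, sat_mean + 2 * sd_mean  # 480 to 520

z_540 = standardized_score(540, sat_mean, sd_mean)  # 4 SDs above 500
z_510 = standardized_score(510, sat_mean, sd_mean)  # 1 SD above 500

for mean, z in ((540, z_540), (510, z_510)):
    verdict = "unusual" if abs(z) > 2 else "not unusual"
    print(f"sample mean {mean}: {z:+.1f} SDs from 500, {verdict}")
```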
Rule of sample means (p. 363)
IF:
1. The population of measurements of interest is bell-shaped, OR
2. A large sample (at least 30) is taken.
THEN: If numerous samples of the same size are taken and the sample mean is computed every time, the resulting histogram will:
1. be roughly bell-shaped
2. have mean equal to the true population mean
3. have standard deviation estimated by:
$$\frac{\text{sample standard deviation}}{\sqrt{\text{sample size}}}$$
Back to proportions: Suppose the true proportion is known
Recall that in craps, P(win) = 244/495 = 0.493 for each game.
Q: If you play 100 games of craps, what proportion will you win?
A: Dunno.
Q: Okay, I guess that was obvious. But can we at least give a range of possible proportions that will be valid most (say, 95%) of the time?
A: Sure. Use the Rule of Sample Proportions.
Back to proportions (cont’d): Suppose the true proportion is known
Recall that in craps, P(win) = 244/495 = 0.493 for each game.
The RoSP says that the sampling distribution of the proportion of wins in 100 games will be:
• roughly normal,
• with mean 0.493
• and standard deviation
$$\sqrt{\frac{(\text{population proportion}) \times (1 - \text{population proportion})}{\text{sample size}}} = \sqrt{\frac{(.493) \times (.507)}{100}} = 0.05.$$
Therefore, the 68-95-99.7 rule says that 95% of the time, the proportion of wins in 100 games will be between 0.493−2×0.05 and 0.493+2×0.05 (0.393 and 0.593).
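The craps calculation can be checked by simulation. The sketch below assumes each game is an independent win with probability 244/495, plays many sessions of 100 games, and counts how often the proportion of wins lands within 2 standard deviations of 0.493. The number of sessions and the seed are arbitrary choices.

```python
import math
import random
import statistics

P_WIN = 244 / 495  # probability of winning one game of craps (from the slides)
GAMES = 100


def theoretical_sd(p, n):
    """Standard deviation of a sample proportion when the true proportion is p."""
    return math.sqrt(p * (1 - p) / n)


def simulate_win_proportions(p, n, repetitions, seed=0):
    """Play `repetitions` sessions of n games; return the proportion won in each."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repetitions)
    ]


sd = theoretical_sd(P_WIN, GAMES)
low, high = P_WIN - 2 * sd, P_WIN + 2 * sd

proportions = simulate_win_proportions(P_WIN, GAMES, repetitions=10_000)
share_inside = sum(low <= x <= high for x in proportions) / len(proportions)

print(f"theoretical SD: {sd:.3f}")
print(f"95% range:      {low:.3f} to {high:.3f}")
print(f"simulated mean: {statistics.fmean(proportions):.3f}")
print(f"share of sessions inside the range: {share_inside:.3f}")
```

The share inside the range should come out near 95%, though not exactly, because the proportion of wins in 100 games can only take values in steps of 0.01.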
True proportion unknown
Next, suppose we do not know the true population proportion
value. This is far more common in reality!
How can we use information from the sample to estimate the
true population proportion?
Suppose we have a sample of 200 students in STAT 100 and
find that 28 of them are left handed.
Our sample proportion is: 28/200 = 0.14
We can now estimate the standard deviation of the sample proportion based on a sample of size 200:
$$\sqrt{\frac{(.14) \times (1 - .14)}{200}} = 0.025$$
Hence, 2 standard deviations = 2×.025 = .05
On the following two slides, we’ll pretend that the true population proportion is 0.12.

Note the green curve, which is the truth.
Of course, ordinarily we don’t know where it lies, but at least we know its standard deviation.
Thus, we can build a confidence interval around our 14% estimate (in red).
If we repeat the sampling over and over, 95% of our confidence intervals will contain the true proportion of 0.12.
This is why we use the term “95% confidence interval”.
If we take another sample, the red line will move but the green curve will not!
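The "repeat the sampling over and over" claim can be illustrated with a short simulation. This sketch pretends, as the slides do, that the true proportion is 0.12. It draws many samples of 200, builds a sample proportion ± 2×SD interval from each, and reports the share of intervals that contain 0.12. The repetition count and seed are arbitrary assumptions.

```python
import math
import random

TRUE_P = 0.12  # pretend-true population proportion, as in the slides
N = 200        # sample size


def interval_95(successes, n):
    """Sample proportion ± 2 × estimated SD of the sample proportion."""
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * sd, p_hat + 2 * sd


def coverage(true_p, n, repetitions, seed=0):
    """Share of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval_95(successes, n)
        hits += low <= true_p <= high
    return hits / repetitions


observed_low, observed_high = interval_95(28, 200)
covered_share = coverage(TRUE_P, N, repetitions=5000)

print(f"interval from 28 of 200: {observed_low:.3f} to {observed_high:.3f}")
print(f"share of intervals containing {TRUE_P}: {covered_share:.3f}")
```

Each simulated sample moves the interval (the red line) while 0.12 stays fixed (the green curve). The share of intervals that cover it should be roughly 95%.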
Definition of “95% confidence interval for the true population proportion”:
An interval of values computed from the sample that is almost certain (95% certain in this case) to cover the true but unknown population proportion.
The plan:
1. Take a sample
2. Compute the sample proportion
3. Compute the estimate of the standard deviation of the sample proportion:
$$\sqrt{\frac{(\text{proportion}) \times (1 - \text{proportion})}{\text{sample size}}}$$
4. 95% confidence interval for the true population proportion:
sample proportion ± 2×SD
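The four-step plan translates almost line for line into code. Here is a minimal sketch using the left-handedness sample from the slides (28 of 200). The function name and the `multiplier` parameter are illustrative choices, not part of the course material.

```python
import math


def proportion_confidence_interval(successes, sample_size, multiplier=2.0):
    """Return (sample proportion, estimated SD, (low, high)) for a proportion."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= successes <= sample_size:
        raise ValueError("successes must be between 0 and sample_size")
    proportion = successes / sample_size                            # step 2
    sd = math.sqrt(proportion * (1 - proportion) / sample_size)     # step 3
    margin = multiplier * sd
    return proportion, sd, (proportion - margin, proportion + margin)  # step 4


proportion, sd, (low, high) = proportion_confidence_interval(28, 200)
print(f"sample proportion = {proportion:.2f}")   # 0.14
print(f"estimated SD      = {sd:.3f}")           # 0.025
print(f"95% CI            = {low:.3f} to {high:.3f}")  # 0.091 to 0.189
```

The endpoints differ slightly from .090 and .190 because the code carries the unrounded standard deviation (about 0.0245) instead of rounding to .025 first.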
Other confidence coefficients
The confidence intervals we’ve been constructing (using ±2×stdev)
are called 95% confidence intervals.
The confidence coefficient is 95%, or .95.
It means that we cover the middle 95% of the normal curve
associated with sample proportions, and this requires 2 standard
deviations.
We can change the confidence coefficient by using the normal
table (p. 157) to determine the appropriate number of standard
deviations.
Other confidence coefficients: An example
Suppose we want a 90% confidence interval instead of
95%.
How many standard deviations span the middle 90% of
the normal curve?
90% confidence interval (see p. 157)
Since 90% is in the
middle, there is 5% in
either end.
So find z for .05 and z
for .95.
We get z = ±1.64
90% confidence interval: sample proportion ± 1.64×stdev
Back to the original example of sample size 200, 14% left-handed in sample.
We found the standard deviation of the sample proportion to be .025.
Hence, to construct a 90% CI we have
.140 – 1.64×.025 = .140 − .041 = .099
.140 + 1.64×.025 = .140 + .041 = .181
90% confidence interval: .099 to .181
(95% confidence interval was: .090 to .190)
So the 90% CI is narrower (a more precise estimate) but less accurate. It has a 10% chance of missing the true population proportion, compared with 5% for the 95% CI.
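The normal-table lookup can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library to find the multiplier for a given confidence coefficient, then rebuilds the intervals for the left-handedness example (proportion .14, standard deviation .025). The function names are illustrative.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Number of SDs that span the middle `confidence` share of the normal curve."""
    tail = (1 - confidence) / 2           # e.g. 5% in either end for 90%
    return NormalDist().inv_cdf(1 - tail)


def interval(proportion, sd, confidence):
    """Sample proportion ± z × SD for the requested confidence coefficient."""
    z = z_multiplier(confidence)
    return proportion - z * sd, proportion + z * sd


for confidence in (0.90, 0.95, 0.99):
    low, high = interval(0.14, 0.025, confidence)
    print(f"{confidence:.0%}: z = {z_multiplier(confidence):.2f}, "
          f"CI = {low:.3f} to {high:.3f}")
```

The exact multiplier for 95% is about 1.96, which the slides round to 2. For 90% it is about 1.645, which the slides give as 1.64 from the table.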