chapter9 - mistergallagher

Chapter 9
Estimating a Population Proportion
Created by Kathy Fritz
Selecting an Estimator
What makes a statistic a good estimator of
a population characteristic?
1. Choose a statistic that is unbiased
[Figure: sampling distributions compared with the true value. Two are labeled “Unbiased, since the distribution is centered at the true value”; one is labeled “Biased, since the distribution is NOT centered at the true value”.]
A statistic with a sampling distribution that is centered at the actual value of a population characteristic is an unbiased estimator of that population characteristic.
In other words, a statistic that does not consistently tend to underestimate or overestimate the value of a population characteristic is an unbiased estimator of that characteristic.
2. Choose a statistic with a small standard error
The standard deviation of a sampling distribution is called the standard error.
[Figure: two sampling distributions centered at the true value. One is labeled “Unbiased, but has a smaller standard error so it is more precise”; the other is labeled “Unbiased, but has a larger standard error so it is not as precise”.]
If a sampling distribution is centered very close to the actual value of the population characteristic, a small standard error ensures that values of the statistic will cluster tightly around the actual value of the population characteristic.
A statistic that is unbiased and has a small standard error is likely to result in an estimate that is close to the actual value of that population characteristic.
In a review of ALL criminal cases heard by the
Supreme Courts of 11 states from 2000 to 2004,
391 of the 1488 cases were decided in favor of the
defendant. Let p be the proportion of all cases reviewed
that were decided in favor of the defendant.
$p = \frac{391}{1488} = 0.263$
Suppose that the proportion p = 0.263 was not known. To
estimate this proportion, you plan to select a sample and
compute p̂, the sample proportion of cases that were decided in
favor of the defendant.
If n = 25, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$
Supreme Court Cases Continued . . .
Let p be the proportion of all cases reviewed that were decided
in favor of the defendant.
How does the sample size affect the standard error of p̂?
If n = 25, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$
If n = 100, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{100}} = 0.044$
Supreme Court Cases Continued . . .
Suppose that p = 0.40. How does this affect the standard
error of p̂?
If n = 25 and p = 0.263, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$
If n = 25 and p = 0.40, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.40(1-0.40)}{25}} = 0.098$
Supreme Court Cases Continued . . .
Suppose that p = 0.04. How does this affect the standard
error of p̂?
If n = 25 and p = 0.263, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$
If n = 25 and p = 0.04, then the standard error of p̂ is
$\text{standard error of } \hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.04(1-0.04)}{25}} = 0.039$
Does it surprise you that p̂ tends to produce more precise estimates the
farther the population proportion is from 0.5?
For a fixed sample size, the standard error
of p̂ is greatest when p = 0.5.
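The standard error calculations above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `standard_error` is chosen here for illustration, and the final case (p = 0.5) is added for comparison and does not come from the slides.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# (p, n) pairs from the Supreme Court example, plus p = 0.5 for comparison.
cases = [(0.263, 25), (0.263, 100), (0.40, 25), (0.04, 25), (0.50, 25)]

for p, n in cases:
    print(f"p = {p:.3f}, n = {n:3d}: standard error = {standard_error(p, n):.3f}")
```

Quadrupling n from 25 to 100 halves the standard error, and for n = 25 the value is largest at p = 0.5.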
Estimating a Population
Proportion
Margin of Error
The value of the sample proportion p̂ provides an estimate
of the population proportion p.
Let p = 0.484
If p̂ = 0.426, then the estimate is “off” by 0.058. This
difference represents the error in the estimate.
A different sample might produce an estimate of
p̂ = 0.498, resulting in an estimation error of 0.014.
Notice that different samples will produce different p̂ estimates that
will have different estimation errors.
The margin of error of a statistic is the maximum likely estimation
error.
It is unusual for an estimate to differ from the actual
value of the population characteristic by more than the
margin of error.
Recall the General Properties for
Sampling Distributions of p̂
1. The mean of the p̂ sampling distribution is p.
$\mu_{\hat{p}} = p$
2. The standard error (deviation) of the p̂ sampling distribution is
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
3. If n is large, the p̂ sampling distribution is
approximately normal.
When these properties hold, we can use what we know about normal
distributions to tell us about how p̂ behaves as an estimator of p.
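One way to see the first two properties is to simulate many random samples and look at the resulting p̂ values. The sketch below assumes a population with p = 0.263 (the value from the Supreme Court example) and samples of size n = 100. The helper name `simulate_p_hats`, the number of repetitions, and the seed are illustrative choices, not part of the original slides.

```python
import math
import random
import statistics


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.263, 100  # assumed population proportion and sample size
p_hats = simulate_p_hats(p, n, reps=5000)

print("mean of simulated p-hat values:", statistics.mean(p_hats))
print("std dev of simulated p-hat values:", statistics.pstdev(p_hats))
print("theoretical standard error:", math.sqrt(p * (1 - p) / n))
```

The simulated mean should land close to p and the simulated standard deviation close to the theoretical standard error, although the exact numbers depend on the seed.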
If a variable has a standard normal distribution,
about 95% of the time the value of the variable will be
between -1.96 and 1.96.
[Figure: standard normal curve with central area 0.95 between -1.96 and 1.96, lower tail area 0.025, and upper tail area 0.025.]
If n is large, the p̂ sampling distribution is
approximately normal with mean p and standard
error $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
For any normal distribution, about 95% of the observed values will be within
1.96 standard deviations of the mean.
About 95% of the possible p̂ will fall within $1.96\sqrt{\frac{p(1-p)}{n}}$ of the
population proportion p.
This is the margin of error for
estimating a population proportion.
[Figure: normal curve centered at p with central area 0.95 between $p - 1.96\sigma_{\hat{p}}$ and $p + 1.96\sigma_{\hat{p}}$, lower tail area 0.025, and upper tail area 0.025.]
Margin of Error for Estimating a
Population Proportion p
Appropriate when the following conditions are met
1. The sample is a random sample from the population of
interest
OR
the sample is selected in a way that makes it
reasonable to think the sample is representative of
the population.
2. The sample size is large enough. This condition is
met when both np̂ ≥ 10 and n(1 − p̂) ≥ 10
OR (equivalently)
the sample includes at least 10 successes and at
least 10 failures.
Margin of Error for Estimating a Population
Proportion p Continued . . .
When these conditions are met
$\text{margin of error} = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Interpretation of margin of error
It would be unusual for the sample proportion to differ
from the actual value of the population proportion by more
than the margin of error.
For 95% of all random samples, the estimation error will
be less than the margin of error.
The formula given for the margin of error is actually
the estimated margin of error, but it is common to
refer to it without the “estimated”.
Any time a margin of error is reported, it is an
estimated margin of error.
Based on a representative sample of 511 U.S. teenagers
ages 12 to 17, International Communications Research
estimated that the proportion of teens who support
keeping the legal drinking age at 21 is p̂ = 0.64 with a
margin of error of 0.04.
Let’s see how this margin of error was computed.
Check conditions:
1. It is given that the sample was representative of the
population.
2. The sample size is large enough because
$n\hat{p} = 511(0.64) = 327 \geq 10$ and
$n(1 - \hat{p}) = 511(0.36) = 184 \geq 10$
Legal Drinking Age Continued . . .
p̂ = 0.64 with a margin of error of 0.04
Compute margin of error
$\text{margin of error} = 1.96\sqrt{\frac{0.64(0.36)}{511}} = 0.04$
Interpretation
An estimate of the proportion of U.S. teens who favor
keeping the legal drinking age at 21 is 0.64. It is
unlikely that this estimate differs from the actual
population proportion by more than 0.04.
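The condition check and the margin of error for this example can be verified in code. The following is a minimal sketch; the function names `large_sample_ok` and `margin_of_error` are illustrative and not taken from the slides.

```python
import math


def large_sample_ok(p_hat: float, n: int) -> bool:
    """Sample size condition: n * p_hat >= 10 and n * (1 - p_hat) >= 10."""
    return n * p_hat >= 10 and n * (1 - p_hat) >= 10


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Estimated margin of error: z * sqrt(p_hat(1 - p_hat) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n, p_hat = 511, 0.64  # legal drinking age survey
print("sample size condition met:", large_sample_ok(p_hat, n))
print("margin of error:", round(margin_of_error(p_hat, n), 2))
```

Before rounding, the computed value is a little above 0.04. The reported margin of error of 0.04 is this value rounded to two decimal places.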
A Large Sample Confidence
Interval for a Population
Proportion
Confidence Interval
Confidence Level
Developing a Confidence Interval
[Figure: approximate sampling distribution of p̂, centered at p. One line represents 1.96 standard deviations below the mean and another represents 1.96 standard deviations above the mean; each has length $1.96\sqrt{\frac{p(1-p)}{n}}$.]
Notice that this line equals $1.96\sqrt{\frac{p(1-p)}{n}}$. We will use this to create an interval of values to estimate p.
Notice that the length of each half of the interval equals $1.96\sqrt{\frac{p(1-p)}{n}}$.
Suppose we get this p̂. This p̂ fell within 1.96 standard deviations of the value of p AND its interval “captures” p.
Suppose we get this p̂. This p̂ also fell within 1.96 standard deviations of the value of p AND its interval “captures” p.
Suppose we get this p̂. This p̂ did not fall within 1.96 standard deviations of the value of p AND its interval does NOT “capture” p.
Using this method of calculation, the confidence interval will not capture p 5% of the time.
When n is large, a 95% confidence interval for p is
$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
Confidence Intervals
A confidence interval (CI) for a population
characteristic specifies an interval of plausible
values for the characteristic.
The interval is constructed in such a way that
the resulting interval will be successful in
capturing the actual value of the population
characteristic a specified percentage of the time.
The primary goal of a confidence interval
is to estimate an unknown population
characteristic.
Confidence level
The confidence level associated with a
confidence interval is the success rate of the
method used to construct the interval.
If this method were used to generate an interval
estimate over and over again from different
random samples, in the long run 95% of the
resulting intervals would include the actual value
of the characteristic being estimated.
Our confidence is in the method,
NOT in any one particular interval!
The diagram to the right shows 100 different 95% confidence
intervals for p, computed from 100 different
random samples.
Note that the ones with asterisks do not capture p.
7 out of the 100 confidence intervals do not contain p.
Why not?
If we were to compute
100 more confidence
intervals for p from 100
different random
samples, would we get
the same results?
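A small simulation can help answer the last question. The sketch below builds 100 intervals at the 95% level, repeats that a few times, and counts how many intervals miss p each time. The population proportion p = 0.4, the sample size n = 200, and the seeds are assumptions made for illustration; the values behind the diagram are not given on the slide.

```python
import math
import random


def interval_95(successes: int, n: int) -> tuple[float, float]:
    """Large-sample 95% confidence interval for a proportion."""
    p_hat = successes / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def count_misses(p: float, n: int, num_intervals: int, seed: int) -> int:
    """Count how many simulated intervals fail to capture the true p."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(num_intervals):
        successes = sum(rng.random() < p for _ in range(n))
        low, high = interval_95(successes, n)
        if not (low <= p <= high):
            misses += 1
    return misses


# Assumed values: p = 0.4 and n = 200 are illustrative only.
for seed in (1, 2, 3):
    misses = count_misses(p=0.4, n=200, num_intervals=100, seed=seed)
    print(f"seed {seed}: {misses} of 100 intervals missed p")
```

The number of misses typically changes from one batch of 100 intervals to the next. The 95% figure describes the long-run success rate of the method, not what happens in any single batch.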
Other Confidence Levels
Suppose we wanted to create confidence intervals
with a 90% confidence level . . .
Suppose we wanted to create confidence intervals
with a 99% confidence level . . .
Notice that these critical values differ for different confidence levels.
Notice also that the larger the confidence level, the larger the
critical value will be AND the wider the confidence interval will be.
The Large-Sample Confidence
Interval for p
Now let’s look at the general formula.
Appropriate when the following conditions are met:
1. The sample is a random sample from the population of
interest or the sample is selected in a way that makes
it reasonable to think the sample is representative of
the population.
2. The sample size is large enough. This condition is met
when both np̂ ≥ 10 and n(1 − p̂) ≥ 10 or
(equivalently) the sample includes at least 10
successes and at least 10 failures.
The normal distribution is only an approximation of the
sampling distribution of p̂ and the true confidence level
may differ somewhat from the reported level. If np̂ ≥ 10
and n(1 − p̂) ≥ 10, the approximation is reasonable and the
actual confidence level is usually quite close to the
reported level. This is why it is important to verify this
condition.
The Large-Sample Confidence
Interval for p Continued . . .
When these conditions are met, a confidence
interval for the population proportion is
$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Here $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the estimated standard error of p̂.
The desired confidence level determines which z critical
value is used. The three most common confidence levels
use the following z critical values:
Confidence Level    z Critical Value
90%                 1.645
95%                 1.96
99%                 2.58
This is a generic formula for a confidence
interval:
statistic ± (critical value)(standard error of the statistic)
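The z critical values in the table can be recovered from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the function name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """z value that leaves (1 - confidence_level) / 2 in each tail."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z critical value = {z_critical(level):.3f}")
```

To three decimal places the 99% value is 2.576, which rounds to the 2.58 shown in the table.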
The Large-Sample Confidence
Interval for p Continued . . .
Interpretation of Confidence Interval
You can be confident that the actual value of the
population proportion is included in the computed interval.
In any given problem, this statement
should be worded in context.
Interpretation of Confidence Level
The confidence level specifies the approximate
percentage of time that this method is expected to be
successful in capturing the actual population proportion.
Recall from Chapter 7 . . .
Four Key Questions:
Q Estimate or hypothesis testing?
S Sample data or experimental data?
T One variable or two? Categorical or numerical?
N How many samples or treatments?
5 Steps:
E (Estimate) – Explain what population characteristic you plan to estimate.
M (Method) – Select a method using QSTN
C (Check) – Verify that the conditions are met
C (Calculate) – Perform the necessary calculations
C (Communicate) – Interpret the confidence interval
Of 1100 drivers surveyed, 990 admitted to careless or
aggressive driving during the previous 6 months. Assuming
that it is reasonable to regard this sample of 1100 as
representative of the population of drivers, compute a
90% confidence interval to estimate p, the proportion of
all drivers who have engaged in careless or aggressive
driving in the last 6 months.
Step 1 (E): The proportion of drivers who have engaged in
careless or aggressive driving during the last
6 months, p, will be estimated.
Step 2 (M): Because the answers to the four key
questions are Q: estimation, S: sample data,
T: one categorical variable, N: one sample, a
confidence interval for a population
proportion will be considered.
Careless or Aggressive Driving Continued . . .
Step 3 (C): There are two conditions that need to be met
for the confidence interval of this section to
be appropriate.
1. You do not know how the sample was
selected. In order to proceed, you MUST
assume that the sample was representative of
the population.
2. Sample size is large enough because
$n\hat{p} = 1100\left(\frac{990}{1100}\right) = 990 \geq 10$ and
$n(1 - \hat{p}) = 1100(0.10) = 110 \geq 10$
Step 4 (C): Calculate the interval
$0.9 \pm 1.645\sqrt{\frac{0.9(0.1)}{1100}} = (0.885, 0.915)$
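Steps 3 and 4 can be checked with a short function. This is a sketch under the slide's assumption that the sample is representative; the name `proportion_ci` is illustrative, and the sample size condition is checked with the equivalent "at least 10 successes and at least 10 failures" form.

```python
import math


def proportion_ci(successes: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample confidence interval for a population proportion."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and at least 10 failures")
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 990 of 1100 drivers; z = 1.645 for a 90% confidence level.
low, high = proportion_ci(successes=990, n=1100, z=1.645)
print(f"90% confidence interval: ({low:.3f}, {high:.3f})")
```

For these data the rounded endpoints agree with the interval (0.885, 0.915) computed in Step 4.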
Careless or Aggressive Driving Continued . . .
Step 5 (C): Communicate results
Interpret Confidence Interval:
Assuming that the sample was representative of the
population, you can be about 90% confident that the
actual proportion of drivers who engaged in careless or
aggressive driving in the past 6 months is somewhere
between 0.885 and 0.915.
Interpret Confidence level:
The method used to construct this interval estimate is
successful in capturing the actual value of the population
proportion about 90% of the time.
Three Things that Affect the
Width of a Confidence Interval
1. The higher the confidence level, the
wider the interval.
2. The larger the sample size, the
narrower the interval.
3. The closer p̂ is to 0.5, the wider the
interval.
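Each of these three effects follows from the width of the interval, which is twice the margin of error. The sketch below varies one quantity at a time; the baseline values (p̂ = 0.5, n = 100, 95% level) and the function name `interval_width` are illustrative assumptions.

```python
import math


def interval_width(p_hat: float, n: int, z: float) -> float:
    """Width of the large-sample interval: 2 * z * sqrt(p_hat(1 - p_hat) / n)."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)


baseline = interval_width(p_hat=0.5, n=100, z=1.96)       # 95% level
higher_level = interval_width(p_hat=0.5, n=100, z=2.58)   # 99% level
larger_sample = interval_width(p_hat=0.5, n=400, z=1.96)  # four times the sample size
far_from_half = interval_width(p_hat=0.1, n=100, z=1.96)  # p-hat far from 0.5

print(f"baseline width:          {baseline:.3f}")
print(f"higher confidence level: {higher_level:.3f}")
print(f"larger sample size:      {larger_sample:.3f}")
print(f"p-hat far from 0.5:      {far_from_half:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width.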
An Alternative to the Large-Sample z Interval
Even when the sample size conditions are met, sometimes
the actual confidence level associated with the method
may be noticeably different from the reported confidence
level.
One way to correct this is to use a modified sample
proportion, p̂_mod, the proportion of successes after adding
two successes and two failures to the sample.
$\hat{p}_{mod} = \frac{\text{number of successes} + 2}{n + 4}$
Use this modified sample proportion in place of p̂ in the
usual confidence interval formula.
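A minimal sketch of this adjustment follows. It reads the slide literally: p̂_mod replaces p̂ in the usual formula and the n under the square root is left unchanged, which is an assumption here, since some references also use n + 4 in the denominator. The data (18 successes in 20 trials) and the function names are hypothetical.

```python
import math


def modified_proportion(successes: int, n: int) -> float:
    """Proportion of successes after adding two successes and two failures."""
    return (successes + 2) / (n + 4)


def modified_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Usual interval formula with p_mod in place of p-hat (n left unchanged)."""
    p_mod = modified_proportion(successes, n)
    half_width = z * math.sqrt(p_mod * (1 - p_mod) / n)
    return p_mod - half_width, p_mod + half_width


# Hypothetical data: 18 successes in a sample of 20.
successes, n = 18, 20
print("unmodified sample proportion:", successes / n)
print("modified sample proportion:", modified_proportion(successes, n))
print("modified 95% interval:", modified_interval(successes, n))
```

The modification pulls the estimate toward 0.5, which matters most for small samples or proportions near 0 or 1.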
Choosing a Sample Size to Achieve
a Desired Margin of Error
Choosing a Sample Size
Before collecting any data, you might wish to
determine a sample size that ensures a
certain margin of error.
$M = 1.96\sqrt{\frac{p(1-p)}{n}}$
If we solve this for n . . .
Using a 95% confidence interval, the sample size
required to estimate a population proportion p with a
margin of error M is
$n = p(1-p)\left(\frac{1.96}{M}\right)^2$
The value of p may be estimated using prior information.
If there is no prior knowledge available, then the
conservative estimate for p is 0.5.
Why is 0.5 the conservative estimate
for p?
The formula for the margin of error is
$M = 1.96\sqrt{\frac{p(1-p)}{n}}$
Since we are looking for the sample size that produces a
certain margin of error, then we need to focus on the
possible values of p(1 - p)
0.1(0.9) = 0.09
0.2(0.8) = 0.16
0.3(0.7) = 0.21
0.4(0.6) = 0.24
0.5(0.5) = 0.25
By using 0.5 for p, we are
using the largest possible
value for p(1 – p) in our
calculations.
Researchers have found biochemical markers of cancers
in the exhaled breath of cancer patients, but chemical
analysis of breath specimens has not yet proven effective
in diagnosing cancer.
A study is to be performed to investigate whether a dog
can be trained to identify the presence or absence of
cancer by sniffing breath specimens.
How many different breath specimens should be used if you
want to estimate the long-run proportion of correct
identifications for this dog with a margin of error of 0.10?
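Using the conservative value p = 0.5, so that p(1 − p) = 0.25:
$n = p(1-p)\left(\frac{1.96}{M}\right)^2 = 0.25\left(\frac{1.96}{0.10}\right)^2 = 96.04$
Always round the sample size up to the next whole number.
A sample of at least 97 breath specimens
should be used.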
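The sample size calculation, including the rounding-up rule, can be sketched as follows. The function name `required_sample_size` is illustrative; it defaults to the conservative value p = 0.5 and the 95% critical value 1.96. The second call (margin of error 0.05) is an extra illustration not taken from the slides.

```python
import math


def required_sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest whole n with z * sqrt(p(1 - p) / n) <= margin."""
    n_exact = p * (1 - p) * (z / margin) ** 2
    return math.ceil(n_exact)  # always round up to the next whole number


# Dog breath-specimen study: margin of error 0.10, conservative p = 0.5.
print(required_sample_size(0.10))

# Halving the margin of error to 0.05 roughly quadruples the required sample size.
print(required_sample_size(0.05))
```

The first call returns 97, matching the slide.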
Avoid These Common Mistakes
If a 90% confidence interval for p, the proportion of
students at a particular college who own a computer, is
(0.56, 0.78), you might say
Interpretation
of interval
Interpretation of
confidence level
Don’t get these two statements confused!
1. In order for an estimate to be useful, you must
know something about its accuracy. You should
beware of a single number estimate that is not
accompanied by a margin of error or some
other measure of accuracy.
2. A confidence interval estimate that is wide
indicates that you don’t have very precise
information about the population characteristic
being estimated.
Don’t be fooled by a high confidence level.
High confidence is not the same thing as saying you
have precise information about the value of a
population characteristic.
The best strategy for decreasing the width of a confidence
interval is to take a larger sample!
3. The accuracy of an estimate depends on the
sample size, not the population size.
Notice that the margin of error involves the
sample size n, and decreases as n increases.
The size of the population, N, does need to be considered
if sampling is done without replacement and the sample size is
more than 10% of the population size.
In this case, the margin of error is adjusted by
multiplying it by a finite population correction factor
$\sqrt{\frac{N-n}{N-1}}$
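A brief sketch of the adjustment follows. The population size N = 2000, sample size n = 500, and sample proportion 0.64 are hypothetical values chosen so that the sample is more than 10% of the population; the function names are illustrative as well.

```python
import math


def finite_population_correction(N: int, n: int) -> float:
    """Correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def adjusted_margin_of_error(p_hat: float, n: int, N: int, z: float = 1.96) -> float:
    """Margin of error multiplied by the finite population correction factor."""
    unadjusted = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return unadjusted * finite_population_correction(N, n)


# Hypothetical values: a sample of 500 from a population of 2000 (25%).
N, n, p_hat = 2000, 500, 0.64
print("correction factor:", round(finite_population_correction(N, n), 4))
print("adjusted margin of error:", round(adjusted_margin_of_error(p_hat, n, N), 4))
```

The factor is always at most 1, so the correction can only shrink the margin of error. When n is a small fraction of N it is close to 1 and can be ignored.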
4. CONDITIONS ARE IMPORTANT!
If conditions are met, the large sample
confidence interval provides a method for using
sample data to estimate the population
proportion with confidence, and the confidence
level is a good approximation of the
success rate for the method.
5. When reading published reports, don’t
fall into the trap of thinking confidence
interval every time you see a ± in an
expression.
In addition to confidence intervals it is common
to see both estimate ± margin of error and
estimate ± standard error reported.