AP Statistics Chapter 10 - Peacock

Confidence intervals
for Proportions
Chapter 19
Objectives:
1. Standard Error
2. Confidence Interval
3. One-proportion z-interval
4. Margin of Error
5. Critical Value
Introduction
• Statistical Inference
– Involves methods of using information from a sample
to draw conclusions regarding the population.
– In formal statistical inference, we use probability to
express the strength of our conclusions.
Example of Statistical Inference
• In the Vietnam War years, a lottery determined the order in which men
were drafted for army service. The lottery assigned draft numbers by
choosing birth dates in random order. We expect a correlation near
zero between birth dates and draft numbers if the draft numbers come
from random choice. The actual correlation between birth date and
draft number in the first draft lottery was r = -0.226. That is, men born
later in the year tended to get lower draft numbers. Is this small
correlation evidence that the lottery was biased?
• Our unaided judgment can’t tell because any two variables will have
some association in practice, just by chance. So we calculate that a
correlation this far from zero has probability less than 0.001 in a truly
random lottery.
• Because a correlation as strong as that observed would almost never
occur in a random lottery, there is strong evidence that the lottery was
unfair.
Two Most Common Types of
Formal Statistical Inference
1. Confidence Intervals
– Estimate the value of a population
parameter.
2. Tests of Significance
– Assess the evidence for a claim about a
population.
Confidence Intervals and
Tests of Significance
• Both types of inference are based on the
sampling distributions of a sample statistic.
• That is, both report probabilities that state
what would happen if we used the
inference method many times.
CONFIDENCE INTERVAL
The sample proportion p̂
We now study categorical data and draw inference on the proportion, or
percentage, of the population with a specific characteristic.
If we call a given categorical characteristic in the population “success,”
then the sample proportion of successes, p̂, is:
$\hat{p} = \dfrac{\text{count of successes in the sample}}{\text{count of observations in the sample}}$
We choose 50 people in an undergrad class, and find that 10 of them are
Hispanic:
p̂ = (10)/(50) = 0.2 (proportion of Hispanics in sample)
You treat a group of 120 herpes patients with a new drug; 30 get better:
p̂ = (30)/(120) = 0.25 (proportion of patients improving in sample)
Sampling distribution of p̂
The sampling distribution of p̂ is never exactly normal. But as the
sample size increases, the sampling distribution of p̂ becomes
approximately normal.
Concept of Confidence Intervals
• As we discussed in sampling distributions, the existence
of sampling variation affects the accuracy of a sample
statistic as an estimator of a population parameter.
• The unbiased estimators calculated using a sampling
distribution can be described as point estimators –
specific numbers that are estimates of the parameter.
• In this section, we will develop the idea of a different
type of estimate, an interval estimate, which
incorporates the sampling variability of the point
estimators.
Standard Error
• Both of the sampling distributions we’ve
looked at are Normal.
– For proportions
$SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$
– For means
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
Standard Error
• When we don’t know p or σ, (which we normally
don’t, because they are population parameters)
we’re stuck, right?
• Nope. We will use sample statistics to estimate
these population parameters.
• Whenever we estimate the standard deviation of
a sampling distribution, we call it a standard
error.
Standard Error
• For a sample proportion, the standard
error is
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
• For the sample mean, the standard error is
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
A Confidence Interval
• Recall that the sampling distribution model of p̂
is centered at p, with standard deviation $\sqrt{pq/n}$.
• Since we don’t know p, we can’t find the true
standard deviation of the sampling distribution
model, so we need to find the standard error:
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
A Confidence Interval
• By the 68-95-99.7% Rule, we know
– about 68% of all samples will have p̂ ’s within 1 SE of
p
– about 95% of all samples will have p̂ ’s within 2 SEs
of p
– about 99.7% of all samples will have p̂ ’s within 3 SEs
of p
• We can look at this from p̂ ’s point of view…
A Confidence Interval
• Consider the 95% level:
– There’s a 95% chance that p is no more than 2 SEs
away from p̂ .
– So, if we reach out 2 SEs, we are 95% sure that p will
be in that interval. In other words, if we reach out 2
SEs in either direction of p̂ , we can be 95% confident
that this interval contains the true proportion.
• This is called a 95% confidence interval.
Confidence Interval
• Definition
– Confidence Interval is a range of values used
to estimate the true value of a population
parameter.
What Does “95% Confidence” Really
Mean?
• Each confidence interval uses a sample
statistic to estimate a population
parameter.
• But, since samples vary, the statistics we
use, and thus the confidence intervals we
construct, vary as well.
What Does “95% Confidence” Really
Mean?
• The figure to the right
shows that some of
our confidence
intervals (from 20
random samples)
capture the true
proportion (the green
horizontal line), while
others do not:
What Does “95% Confidence” Really
Mean?
• Our confidence is in the process of
constructing the interval, not in any one
interval itself.
• Thus, we expect 95% of all 95%
confidence intervals to contain the true
parameter that they are estimating.
What Does “95% Confidence” Really
Mean?
• Returning to our
previous example.
• 20 samples from the
same population gave
these 95% confidence
intervals. In the long run,
95% of all samples give
an interval that contains
the population
proportion p.
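The long-run claim above can be checked with a small simulation. The sketch below is only an illustration: the true proportion `p = 0.5`, the sample size `n = 100`, the number of trials, and the function name `coverage_rate` are all assumptions chosen for the demo, not values from these slides.

```python
import random
from math import sqrt

def coverage_rate(p=0.5, n=100, trials=2000, z_star=1.96, seed=1):
    """Fraction of simulated 95% intervals that capture the true proportion p.

    p, n, trials and seed are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
        me = z_star * se                     # margin of error
        if p_hat - me <= p <= p_hat + me:    # did this interval capture p?
            hits += 1
    return hits / trials

print(coverage_rate())  # a value close to 0.95
```

Any single interval either contains p or it does not; the 95% describes how often the method succeeds over many samples.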
A Level C Confidence Interval has
Two Parts:
1. An interval calculated from the data, usually
of the form
estimate ± margin of error
– estimate: our best guess for the parameter, computed from the sample (here, p̂).
– margin of error: how accurate we
believe our estimate is, based on
the variability of the estimate. For a 95%
confidence interval the margin of error
would be $2\,SE(\hat{p})$.
2. A Confidence Level C, which gives the
probability that the interval will capture
the true parameter value in repeated
samples.
– Example: 95% confidence interval – normally
use confidence level of 90% or higher (want
to be sure of our conclusions).
Margin of Error: Certainty vs.
Precision
• We can claim, with 95% confidence, that the interval
$\hat{p} \pm 2\,SE(\hat{p})$
contains the true population proportion.
– The extent of the interval on either side of p̂ is called the margin
of error (ME).
• In general, confidence intervals have the form estimate ±
ME.
• The more confident we want to be, the larger our ME
needs to be, making the interval wider.
Margin of Error: Certainty vs.
Precision
• To be more confident, we wind up being less precise.
– We need more values in our confidence interval to be more
certain.
• Because of this, every confidence interval is a balance
between certainty and precision.
• The tension between certainty and precision is always
there.
– Fortunately, in most cases we can be both sufficiently certain and
sufficiently precise to make useful statements.
Margin of Error: Certainty vs.
Precision
• The choice of confidence level is somewhat
arbitrary, but keep in mind this tension between
certainty and precision when selecting your
confidence level.
• The most commonly chosen confidence levels
are 90%, 95%, and 99% (but any percentage
can be used).
Critical Values
• The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval) came
from the 68-95-99.7% Rule.
• Using a table or technology, we find that a more exact
value for our 95% confidence interval is 1.96 instead of
2.
– We call 1.96 the critical value and denote it z*.
• For any confidence level, we can find the corresponding
critical value (the number of SEs that corresponds to our
confidence interval level).
Example: [Table: Confidence Level, Lower Critical Value, Upper Critical Value]
Critical Values
• Example: For a 90% confidence interval, the
critical value is 1.645:
z*
• The critical value z* is the number (z-score)
on the borderline separating sample
statistics that are likely to occur from those
that are unlikely to occur, for a given
confidence level.
z* is the same for any normal distribution
for a given confidence level
Problem:
• Find the critical value z* for a confidence
level of 88%.
• invNorm(.94) = 1.555 (88% in the middle leaves 6% in each tail, so the area to the left of z* is .94)
Your Turn:
• Find the critical value z* for a confidence
level of 73%.
• invNorm(.865) = 1.103 (73% in the middle leaves 13.5% in each tail, so the area to the left of z* is .865, not .73)
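For readers without a calculator, the same lookup can be sketched in Python. This assumes the standard library's `statistics.NormalDist`, whose `inv_cdf` plays the role of invNorm; the helper name `critical_value` is made up for this illustration. The key step is that a central area C leaves (1 − C)/2 in each tail, so the area to the left of z* is (1 + C)/2.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z* such that the central area under N(0, 1) equals `confidence`."""
    left_area = (1 + confidence) / 2          # area to the left of z*
    return NormalDist().inv_cdf(left_area)    # same role as invNorm on the TI-83/84

print(round(critical_value(0.88), 3))  # 1.555
print(round(critical_value(0.90), 3))  # 1.645
```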
Assumptions and Conditions
• All statistical models are made upon assumptions.
– Different models make different assumptions.
– If those assumptions are not true, the model might be
inappropriate and our conclusions based on it may be
wrong.
• You can never be sure that an assumption is true,
but you can often decide whether an assumption
is plausible by checking a related condition.
Assumptions and Conditions
• Here are the assumptions and the
corresponding conditions you must check before
creating a confidence interval for a proportion:
• Independence Assumption: We first need to
Think about whether the Independence
Assumption is plausible. It’s not one you can
check by looking at the data. Instead, we check
two conditions to decide whether independence
is reasonable.
Assumptions and Conditions
– Randomization Condition: Were the data sampled at
random or generated from a properly randomized
experiment? Proper randomization can help ensure
independence.
– 10% Condition: Is the sample size no more than 10%
of the population?
• Sample Size Assumption: The sample needs to
be large enough for us to be able to use the CLT.
– Success/Failure Condition: We must expect at least
10 “successes” and at least 10 “failures.”
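The two numeric conditions can be checked mechanically; randomization cannot, so in the sketch below it is passed in as a judgment call. The function name, its parameters, and the choice to treat an unknown population size as "large enough" are assumptions made for this illustration.

```python
def check_conditions(successes, n, population_size=None, random_sample=True):
    """Check the conditions for a one-proportion z-interval.

    random_sample is the analyst's judgment; it cannot be computed from the data.
    population_size=None is treated as "large enough" (an assumption of this sketch).
    """
    failures = n - successes
    return {
        "randomization": random_sample,
        "ten_percent": population_size is None or n <= 0.10 * population_size,
        "success_failure": successes >= 10 and failures >= 10,
    }

# Hypothetical data: 35 successes in a sample of 50 from a population of 5,000.
print(check_conditions(35, 50, population_size=5000))
```

If any entry comes back False, the Normal model behind the interval may not be appropriate.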
One-Proportion z-Interval
• When the conditions are met, we are ready to find the
confidence interval for the population proportion, p.
• The confidence interval is
$\hat{p} \pm z^* \times SE(\hat{p})$
where
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
• The critical value, z*, depends on the particular
confidence level, C, that you specify.
One-Sample Confidence Interval for p Summary
Confidence intervals contain the population proportion p in C% of
samples. For an SRS of size n drawn from a large population and with
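sample proportion p̂ calculated from the data, an approximate level C
confidence interval for p is:
$\hat{p} \pm me$, where me is the margin of error
$me = z^* \times SE = z^* \sqrt{\hat{p}(1-\hat{p})/n}$
Using p̂ as an unbiased estimate of p.
[Figure: standard normal curve with central area C between −z* and z*; the margin of error me extends on either side of the center.]
C is the area under the standard
normal curve between −z* and z*.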
Procedure:
Confidence Interval for a Population Proportion
1. Identify the population of interest and the
parameter you want to draw conclusions about
(population proportion p).
2. Choose the appropriate inference procedure.
Verify the conditions for using the selected
procedure.
– Conditions for a population proportion:
• Random condition
• 10% condition
• Success/Failure condition
3. If the conditions are met, carry out the
inference procedure.
– Confidence interval (CI)
• CI = estimate ± margin of error
– In general
• CI = estimate ± z* • SE
– For population proportion p
• Estimate = p̂
• $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
• z*: calculated based on the confidence level
• CI for p: $\hat{p} \pm z^* \, SE(\hat{p})$
4. Interpret your results in the context of the
problem.
• Summary – Confidence interval for
population proportion p
$\hat{p} \pm z^* \, SE(\hat{p})$
– p̂: unbiased estimate of population proportion p
– z*: upper critical value for the confidence level
– SE(p̂): estimated standard deviation of the sampling distribution of
sample proportions
Example - Medication side effects
Arthritis is a painful, chronic inflammation of the joints.
An experiment on the side effects of pain relievers examined
arthritis patients to find the proportion of patients who suffer
side effects. It was found that 23 out of 440 arthritis patients
suffered side effects.
What are some side effects of ibuprofen?
Serious side effects (seek medical attention immediately):
Allergic reaction (difficulty breathing, swelling, or hives),
Muscle cramps, numbness, or tingling,
Ulcers (open sores) in the mouth,
Rapid weight gain (fluid retention),
Seizures,
Black, bloody, or tarry stools,
Blood in your urine or vomit,
Decreased hearing or ringing in the ears,
Jaundice (yellowing of the skin or eyes), or
Abdominal cramping, indigestion, or heartburn,
Less serious side effects (discuss with your doctor):
Dizziness or headache,
Nausea, gaseousness, diarrhea, or constipation,
Depression,
Fatigue or weakness,
Dry mouth, or
Irregular menstrual periods
Solution
• Check Conditions
– Randomization Condition: assume the 440 arthritis
patients were randomly selected.
– 10% Condition: it is reasonable to assume there are
more than 4,400 total arthritis patients.
– Success/Failure Condition: np̂ = (440)(23/440) = 23 and
nq̂ = (440)(417/440) = 417; both are at least 10.
• All required conditions are met.
Let’s calculate a 90% confidence interval for the population proportion of
arthritis patients who suffer some “adverse symptoms.”
What is the sample proportion p̂?
$\hat{p} = \dfrac{23}{440} \approx 0.052$
What is the sampling distribution for the proportion of arthritis patients with
adverse symptoms for samples of 440?
Approximately Normal: $N\left(\hat{p}, \sqrt{\hat{p}(1-\hat{p})/n}\right)$
For a 90% confidence level, z* = 1.645.
Upper tail probability P   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01
z*                         0.67   0.841  1.036  1.282  1.645  1.960  2.054  2.326
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%
Using the one sample method, we
calculate a margin of error me:
$me = z^* \sqrt{\hat{p}(1-\hat{p})/n}$
$me = 1.645 \times \sqrt{0.0523(1 - 0.0523)/440}$
$me \approx 1.645 \times 0.01061 \approx 0.0175$
90% CI for p: $\hat{p} \pm me$
or $0.0523 \pm 0.0175$
• With 90% confidence, between about 3.5% and 7.0% of
arthritis patients taking this pain medication experience
some adverse symptoms.
Your Turn:
• In a random sample of 50 Philadelphia
families with children of preschool age, 35
had children enrolled in preschool. Find a
95% confidence interval for the true
proportion of Philadelphia families with
children enrolled in preschool.
Solution
• Check Conditions
– Random condition: given, sample was random.
– 10% condition: it is reasonable that the number of
Philadelphia families with preschool age children is greater than
500.
– Success/Failure condition: $\hat{p} = \frac{35}{50} = .7$, so
$n\hat{p} = 50(.7) = 35 \ge 10$ and $n\hat{q} = 50(.3) = 15 \ge 10$,
so the sample is large enough.
Solution
• 95% CI (one-proportion z-interval)
95% CI for p: $\hat{p} \pm z^* \, SE(\hat{p})$
$\hat{p} = \dfrac{35}{50} = .7$
$z^* = 1.96$
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \sqrt{\dfrac{(.7)(.3)}{50}} = .0648$
$\hat{p} \pm z^* \, SE(\hat{p}) = .7 \pm 1.96(.0648) = .7 \pm .127$
The 95% CI is: $.7 \pm .127$ or $(.573, .827)$
• Conclusion: We are 95% confident that between 57.3%
and 82.7% of Philadelphia families with children of preschool
age have children enrolled in preschool.
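The hand calculation for the Philadelphia sample can be reproduced with a short Python sketch. The function name `one_prop_z_interval` is made up for this illustration, and it uses the unrounded critical value from the standard library's `statistics.NormalDist` rather than the table value 1.96, so the last decimal places can differ slightly from hand work.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence=0.95):
    """One-proportion z-interval from x successes in a sample of size n."""
    p_hat = x / n                                        # sample proportion
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                   # standard error
    me = z_star * se                                     # margin of error
    return p_hat - me, p_hat + me

low, high = one_prop_z_interval(35, 50, 0.95)
print(round(low, 3), round(high, 3))  # 0.573 0.827
```

Like any such calculation, this is only meaningful once the Random, 10%, and Success/Failure conditions have been checked.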
Confidence Intervals on the TI-83/84
• Press STAT key, choose TESTS, and then
choose A: 1-PropZInterval…
• Adjust the settings;
– x: the number of successes (a count, not p̂)
– n: the sample size
– C-level: confidence level
• Then choose “Calculate”
Solve the Previous Problem
Using the TI-84
• In a random sample of 50 Philadelphia families
with children of preschool age, 35 had children
enrolled in preschool. Find a 95% confidence
interval for the true proportion of Philadelphia
families with children enrolled in preschool.
• Solution: (.57298, .82702), with p̂ = .7 and n = 50
How Confidence Intervals Behave
• The user chooses the confidence level, and the margin
of error follows from this choice.
– The higher the confidence level, the greater the margin of error
and hence, the wider the confidence interval.
– The lower the confidence level, the smaller the margin of error
and hence, the narrower the confidence interval.
• We would like high confidence and a small margin of
error.
– High confidence says that our method almost always gives
correct answers.
– A small margin of error says that we know the parameter more
precisely.
Margin of Error
$me = z^* \sqrt{\hat{p}(1-\hat{p})/n}$
• The margin of error gets smaller when:
1. z* gets smaller. Trade-off between confidence level and
margin of error. To obtain a smaller margin of error from the
same data, you must be willing to accept a lower confidence
level.
2. $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$ gets smaller. SE(p̂) measures how much
p̂ varies from sample to sample. It is easier to pin down p when SE(p̂) is small
(less variation).
3. n gets larger. Increasing the sample size n reduces the margin
of error for a fixed confidence level (large sample size means
less variation).
• Because n appears under the radical, we must increase the
sample size by a factor of four to cut the margin of error in half.
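The factor-of-four rule is easy to confirm numerically. The sketch below uses illustrative values (p̂ = 0.5, n = 100 versus n = 400, z* = 1.96); none of them come from a data set in these slides, and the function name `margin_of_error` is an assumption for the demo.

```python
from math import sqrt

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

me_small_sample = margin_of_error(0.5, 100)   # n = 100
me_large_sample = margin_of_error(0.5, 400)   # four times the sample size

# The ratio is 2: quadrupling n cuts the margin of error in half.
print(me_small_sample, me_large_sample, me_small_sample / me_large_sample)
```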
Choosing the Sample Size
• We can arrange to have both high confidence
and a small margin of error by taking a large
enough sample.
• To determine the sample size n that will yield a
confidence interval for a population proportion
with a specified margin of error me, start from the margin of error formula:
• $me = z^* \sqrt{\hat{p}(1-\hat{p})/n}$ (margin of error wanted)
• Solve for n: $n = \dfrac{\hat{p}(1-\hat{p})}{(me/z^*)^2}$
Example:
• At the end of every school year, the state
administers a reading test to an SRS drawn from
a population of 100,000 third graders. Over the
last five years, students who took the test
correctly answered 75% of the test questions.
What sample size should you use to achieve a
margin of error equal to 4%, with a confidence
level of 95%?
Solution
$me = .04$, $\hat{p} = .75$, $CL = 95\% \Rightarrow z^* = 1.960$
$me = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$.04 = 1.96 \times \sqrt{\dfrac{(.75)(.25)}{n}}$
$n = \dfrac{.1875}{(.04/1.96)^2} = 450.19$
Sample size should be 451 third graders.
Choosing Your Sample Size
• To determine the sample size, choose a Margin of Error
(me) and a Confidence Level.
• The formula requires p̂, which we don’t have yet because
we have not taken the sample. A conservative guess for p̂,
which will yield the largest value for p̂q̂ (and therefore for
n), is 0.50.
$me = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
• Solve the formula for n.
$n = \dfrac{\hat{p}\hat{q}}{(me/z^*)^2}$
Example:
• At the end of every school year, the state
administers a reading test to an SRS drawn from
a population of 100,000 third graders. What
sample size should you use to achieve a margin
of error equal to 4%, with a confidence level of
95%?
Solution:
$me = .04$, $CL = 95\% \Rightarrow z^* = 1.960$
use $\hat{p} = .5$ (most conservative value)
$me = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$.04 = 1.96 \times \sqrt{\dfrac{(.5)(.5)}{n}}$
$n = \dfrac{.25}{(.04/1.96)^2} = 600.25$
Sample size should be 601 third graders.
Your Turn:
• Suppose the U.S. President wants an estimate
of the proportion of the population who support
his current policy toward revisions in the Social
Security System. The president wants the
estimate to be within .04 of the true proportion.
Assume a 90 percent level confidence. How
large a sample is required?
Solution
$me = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$.04 = 1.645 \times \sqrt{\dfrac{(.5)(.5)}{n}}$
$n = \dfrac{.25}{(.04/1.645)^2} = 422.8$
Sample size should be 423.
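The three sample-size problems above follow the same recipe, which can be sketched as a small Python helper. The name `required_sample_size` and its parameters are assumptions for this illustration; it uses the unrounded critical value from the standard library's `statistics.NormalDist`, defaults to the conservative guess p̂ = 0.5, and always rounds up.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(me, confidence, p_guess=0.5):
    """Smallest n whose margin of error is at most `me` at the given confidence level.

    p_guess = 0.5 is the conservative choice when no prior estimate of p exists.
    """
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    n = p_guess * (1 - p_guess) / (me / z_star) ** 2
    return ceil(n)  # always round up to be safe

print(required_sample_size(0.04, 0.95, p_guess=0.75))  # 451
print(required_sample_size(0.04, 0.95))                # 601
print(required_sample_size(0.04, 0.90))                # 423
```

Note that the population size (100,000 third graders, or the whole U.S. population) never enters the calculation.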
Sample Size
• In practice, taking samples costs time and money. The
required sample size may be impossibly expensive.
• Notice once again that it is the size of the sample that
determines the margin of error. The size of the
population (as long as the population is much larger than
the sample) does not influence the sample size we need.
The Perfect Confidence
Interval
What Can Go Wrong?
Don’t Misstate What the Interval Means:
• Don’t suggest that the parameter varies.
• Don’t claim that other samples will agree with yours.
• Don’t be certain about the parameter.
• Don’t forget: It’s about the parameter (not the statistic).
• Don’t claim to know too much.
• Do take responsibility (for the uncertainty).
• Do treat the whole interval equally.
What Can Go Wrong?
Margin of Error Too Large to Be Useful:
• We can’t be exact, but how precise do we need to be?
• One way to make the margin of error smaller is to reduce
your level of confidence. (That may not be a useful
solution.)
• You need to think about your margin of error when you
design your study.
– To get a narrower interval without giving up confidence, you
need to have less variability.
– You can do this with a larger sample…
What Can Go Wrong?
Choosing Your Sample Size:
• In general, the sample size needed to produce a
confidence interval with a given margin of error at a
given confidence level is:
$n = \dfrac{(z^*)^2\,\hat{p}\hat{q}}{ME^2}$
where z* is the critical value for your confidence level.
• To be safe, round up the sample size you obtain.
What Can Go Wrong?
Violations of Assumptions:
• Watch out for biased samples—keep in mind
what you learned in Chapter 12.
– There is no correct method for inference from data
haphazardly collected with bias of unknown size.
Fancy formulas cannot rescue badly produced data.
• Think about independence.
What have we learned?
• Finally we have learned to use a sample to say
something about the world at large.
• This process (statistical inference) is based on our
understanding of sampling models, and will be our focus
for the rest of the book.
• In this chapter we learned how to construct a confidence
interval for a population proportion.
– Best estimate of the true population proportion is the
one we observed in the sample.
– Create our interval with a margin of error.
– Provides us with a level of confidence.
– Higher level of confidence, wider our interval.
– Larger sample size, narrower our interval.
– Calculate sample size for desired degree of precision
and level of confidence.
– Check assumptions and conditions.
What have we learned?
• We’ve learned to interpret a confidence
interval by Telling what we believe is true
in the entire population from which we took
our random sample. Of course, we can’t
be certain, but we can be confident.
Assignment
• Pg. 455 – 458: #1 – 15 odd, 21, 23, 29,
31, 35
• Read Chapter 20, Pg. 459 - 475