Chapter 19 Confidence Intervals for a Population Proportion p

From the Data at Hand to the
World at Large
Chapter 19
Confidence Intervals for an
Unknown Population p
Estimation of a population parameter:
• Estimating an unknown population
proportion p
Chapter 19 Objectives
1. Determine, manually and using
technology, confidence intervals for an
unknown population proportion p based on
the information contained in a single
sample.
2. Understand and properly interpret a
confidence interval for a population
proportion p.
What do we frequently need to
estimate?
• An unknown population
proportion p
• An unknown population
mean $\mu$
• Will deal first with
estimating a population
proportion p
?
p?
Concepts of Estimation
• The objective of estimation is to estimate
the unknown value of a population
parameter, like a population proportion p,
on the basis of a sample statistic calculated
from sample data.
– e.g., the NCSU student affairs office may want
to estimate the proportion of students that
want more campus weekend activities
• There are two types of estimates
– Point Estimate
– Interval estimate
Point Estimate of p
• $\hat{p} = \frac{x}{n}$, the sample proportion of x
successes in a sample of size n, is the best
point estimate of the unknown value of the
population proportion p
Example: Estimating an
unknown population proportion p
• Is Sidney Lowe’s departure good or bad
for State's men's basketball team?
(Technician poll; not scientifically valid!!)
• In a sample of 1000 students, 590 say that
Lowe’s departure is good for the bb team.
• $\hat{p}$ = 590/1000 = .59 is the point estimate of
the unknown population proportion p of students who
think Lowe’s departure is good.
Shortcoming of Point Estimates
• $\hat{p}$ = 590/1000 = .59, best estimate of
unknown population proportion p of
students that think Lowe’s departure is good
for the team.
BUT
How good is this best estimate?
No measure of reliability
Another type of estimate
Interval Estimator
A confidence interval is a range (or an
interval) of values used to estimate the
unknown value of a population parameter .
http://abcnews.go.com/US/PollVault/
Tool for Constructing Confidence
Intervals: The Central Limit
Theorem
• If a random sample of n observations is
selected from a population (any population),
and x “successes” are observed, then when n
is sufficiently large, the sampling distribution
of the sample proportion $\hat{p}$ will be
approximately a normal distribution.
• (n is large when np ≥ 10 and nq ≥ 10).
The sampling distribution model for a
sample proportion $\hat{p}$
Provided that the sampled values are independent and the
sample size n is large enough, the sampling distribution of
$\hat{p}$ is modeled by a normal distribution with $E(\hat{p}) = p$ and
standard deviation $SD(\hat{p}) = \sqrt{pq/n}$, that is
$$\hat{p} \sim N\left(p,\ \sqrt{\frac{pq}{n}}\right)$$
where q = 1 – p and where n large enough means np ≥ 10
and nq ≥ 10.
The Central Limit Theorem is a formal statement of this
fact.
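The normal model above can be checked empirically. The following is a minimal simulation sketch, not part of the original slides; the values p = 0.3 and n = 200, the number of repetitions, and the function name `simulate_sample_proportions` are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` independent samples of size n from a population with
    success probability p and return the list of sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Assumed illustrative values: np = 60 and nq = 140, both at least 10.
p, n = 0.3, 200
p_hats = simulate_sample_proportions(p, n, reps=5000)

theoretical_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n) from the model
print("mean of p-hat:", round(statistics.mean(p_hats), 4), "vs p =", p)
print("SD of p-hat:", round(statistics.stdev(p_hats), 4),
      "vs sqrt(pq/n) =", round(theoretical_sd, 4))
```

If the model holds, the simulated mean should sit close to p and the simulated standard deviation close to $\sqrt{pq/n}$.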
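95% Confidence Interval for p
Use $\hat{p} = \frac{x}{n}$ to construct a 95% confidence interval
for p:
$$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
written
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$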
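The 95% interval formula translates directly into a few lines of code. This is a sketch only; the function name `proportion_ci_95` is an assumption, and the counts 590 out of 1000 are borrowed from the earlier student poll purely to show the arithmetic (that poll was noted as not scientifically valid).

```python
import math

def proportion_ci_95(x, n):
    """Large-sample 95% confidence interval for p:
    p-hat +/- 1.96 * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Arithmetic illustration with x = 590 successes in n = 1000.
lower, upper = proportion_ci_95(590, 1000)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```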
Standard Normal: $P(-1.96 \le z \le 1.96) = .95$
Sampling distribution of $\hat{p}$ (confidence level .95):
95% of the time $\hat{p}$ will be in the interval from $p - 1.96\sqrt{pq/n}$ to $p + 1.96\sqrt{pq/n}$
Therefore, the interval
$$\left(\hat{p} - 1.96\sqrt{\frac{pq}{n}},\ \hat{p} + 1.96\sqrt{\frac{pq}{n}}\right)$$
will "capture" p 95% of the time
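The "capture" claim can be illustrated by simulation. The sketch below is an assumption-laden illustration, not from the slides: it fixes p = 0.5 and n = 400, and it plugs $\hat{p}$ into the standard deviation (as is done in practice when p is unknown) before counting how often the interval contains p.

```python
import math
import random

def coverage_rate(p, n, reps, z=1.96, seed=1):
    """Fraction of simulated intervals p-hat +/- z*sqrt(p-hat(1-p-hat)/n)
    that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # successes in one sample
        p_hat = x / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps

# Assumed illustrative values; any p and n with np >= 10 and nq >= 10 would do.
rate = coverage_rate(p=0.5, n=400, reps=4000)
print("fraction of intervals capturing p:", rate)
```

The printed fraction should be near .95, although any single run will differ slightly because of sampling variability.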
Example (Gallup Polls)
Voter preference polls typically sample
approximately 1600 voters; suppose $\hat{p} = .52$.
Then if we desire a 95% confidence interval
for p we calculate
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} = .52 \pm 1.96\sqrt{\frac{(.52)(.48)}{1600}} = .52 \pm .024 = (.496, .544)$$
http://abcnews.go.com/US/PollVault/story?id=145373&page=1
Standard Normal
98% Confidence Intervals
For p
$$\left(\hat{p} - 2.33\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 2.33\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
written
$$\hat{p} \pm 2.33\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Four Commonly Used
Confidence Levels
Confidence Level    Multiplier
.90                 1.645
.95                 1.96
.98                 2.33
.99                 2.58
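The multipliers in this list are quantiles of the standard normal distribution, so they can be recomputed rather than memorized. A small sketch using Python's standard library follows; the helper name `z_multiplier` is an assumption.

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Return z* such that P(-z* <= Z <= z*) equals the confidence level
    for a standard normal Z."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_multiplier(level), 3))
```

The slide values 2.33 and 2.58 are the computed quantiles rounded to two decimals.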
Medication side effects (confidence
interval for p)
Arthritis is a painful, chronic inflammation of the joints.
An experiment on the side effects of pain relievers
examined arthritis patients to find the proportion of
patients who suffer side effects.
What are some side effects of ibuprofen?
Serious side effects (seek medical attention immediately):
Allergic reaction (difficulty breathing, swelling, or hives),
Muscle cramps, numbness, or tingling,
Ulcers (open sores) in the mouth,
Rapid weight gain (fluid retention),
Seizures,
Black, bloody, or tarry stools,
Blood in your urine or vomit,
Decreased hearing or ringing in the ears,
Jaundice (yellowing of the skin or eyes), or
Abdominal cramping, indigestion, or heartburn,
Less serious side effects (discuss with your doctor):
Dizziness or headache,
Nausea, gaseousness, diarrhea, or constipation,
Depression,
Fatigue or weakness,
Dry mouth, or
Irregular menstrual periods
440 subjects with chronic arthritis were given ibuprofen for pain relief;
23 subjects suffered from adverse side effects.
Calculate a 90% confidence interval for the population proportion p of
arthritis patients who suffer some “adverse symptoms.”
$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
What is the sample proportion $\hat{p}$?
$$\hat{p} = \frac{23}{440} \approx 0.052$$
For a 90% confidence level, z* = 1.645.
$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}\hat{q}}{n}} = .052 \pm 1.645\sqrt{\frac{.052(1-.052)}{440}} = .052 \pm 1.645(0.011) = .052 \pm .018$$
90% CI for p: $0.052 \pm 0.018 = (.034, .070)$
• We are 90% confident that the interval (.034, .070) contains the true
proportion of arthritis patients that experience some adverse symptoms when
taking ibuprofen.
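The same calculation can be reproduced without rounding the intermediate steps. This is a sketch under the slide's numbers (23 of 440 subjects, z* = 1.645); the variable names are assumptions. Because the standard error is not rounded to 0.011 first, the margin comes out near .017 rather than .018, so the endpoints can differ from the hand calculation in the third decimal place.

```python
import math

x, n = 23, 440          # subjects with side effects, total subjects
z_star = 1.645          # multiplier for 90% confidence

p_hat = x / n
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}, SE = {standard_error:.4f}, ME = {margin:.4f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```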
Example: impact of sample size
n = 440, 90% CI: $.052 \pm 1.645\sqrt{\frac{.052(1-.052)}{440}} = .052 \pm .018 = (.034, .070)$
n = 1000, 90% CI: $.052 \pm 1.645\sqrt{\frac{.052(1-.052)}{1000}} = .052 \pm .012 = (.040, .064)$
n=440: width of 90% CI: 2*.018 = .036
n=1000: width of 90% CI: 2*.012 = .024
When the sample size is increased, the 90% CI is narrower
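How quickly the interval narrows can be seen by computing the width for several sample sizes. The sketch below keeps $\hat{p} = .052$ and z* = 1.645 from the example and works from unrounded values; the helper `ci_width` and the extra sample size 4000 are illustrative assumptions.

```python
import math

def ci_width(p_hat, n, z_star):
    """Full width of the interval p-hat +/- z* * sqrt(p-hat(1-p-hat)/n)."""
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (440, 1000, 4000):
    print(n, round(ci_width(0.052, n, 1.645), 3))
```

Because the width is proportional to $1/\sqrt{n}$, quadrupling the sample size (1000 to 4000) cuts the width in half.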
Example
• Find a 95% confidence interval for p, the
proportion of NCSU students that strongly
favor the current lottery system for
awarding tickets to football and men’s
basketball games, if a random sample of
1000 students found that 347 strongly favor
the current system.
Example (solution)
$\hat{p} = \frac{347}{1000} = .347$, so $1 - \hat{p} = .653$
and the confidence interval is
$$.347 \pm 1.96\sqrt{\frac{(.347)(.653)}{1000}} = .347 \pm .0295 = (.3175, .3765)$$
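Before trusting the interval, it is worth confirming that the sample is large enough for the normal model. The sketch below is an illustration, not the slides' own procedure: since p is unknown, it plugs $\hat{p}$ into the rule np ≥ 10 and nq ≥ 10 (an assumption), which reduces to checking the counts of successes and failures. The function name `large_sample_ok` is hypothetical.

```python
import math

def large_sample_ok(x, n, threshold=10):
    """Check the 'n is large' rule with p-hat substituted for the unknown p:
    n * p-hat = x and n * (1 - p-hat) = n - x must both reach the threshold."""
    return x >= threshold and (n - x) >= threshold

x, n = 347, 1000
p_hat = x / n
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(large_sample_ok(x, n), round(p_hat - margin, 4), round(p_hat + margin, 4))
```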
Interpreting Confidence Intervals
• Previous example: .347 ± .0295 = (.3175, .3765)
• Correct: We are 95% confident that the interval from
.3175 to .3765 actually does contain the true value of p.
This means that if we were to select many different
samples of size 1000 and construct a 95% CI from each
sample, 95% of the resulting intervals would contain the
value of the population proportion p. (.3175, .3765) is one
such interval. (Note that 95% refers to the procedure we
used to construct the interval; it does not refer to the
population proportion p)
• Wrong: There is a 95% chance that the population
proportion p falls between .3175 and .3765. (Note that p is
not random, it is a fixed but unknown number)
Confidence Interval Interpretation
7 x 5 = 42
She achieved 95% accuracy, so she would
answer 95 out of 100 correctly, say.
Is there a 95% chance that
7x5=42?
95% confidence intervals
behave the same way. An
individual confidence
interval either captures p or it
doesn’t, but in a group of many 95%
confidence intervals, about
95% of them will capture p.
Determining the Sample Size n to
Estimate a Population Proportion p
• If you desire a C% confidence interval for a
population proportion p with an accuracy
specified by you, how large does the sample
size need to be?
• We will denote the accuracy by ME, which
stands for Margin of Error.
Required Sample Size n to Estimate
a Population Proportion p
CI for p: $\hat{p} \pm z^*\sqrt{\frac{pq}{n}}$, that is, $\hat{p} \pm ME$
set $z^*\sqrt{\frac{pq}{n}} = ME$ and solve for n:
$$n = \frac{(z^*)^2\, pq}{(ME)^2}$$
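The algebra behind "solve for n" is short; the intermediate steps, written out here as a worked derivation using the same symbols, are squaring both sides and then isolating n:

```latex
\begin{align*}
z^{*}\sqrt{\frac{pq}{n}} &= ME \\
(z^{*})^{2}\,\frac{pq}{n} &= (ME)^{2} \\
n &= \frac{(z^{*})^{2}\,pq}{(ME)^{2}}
\end{align*}
```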
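Sampling distribution of $\hat{p}$ (confidence level .95): the interval from $p - 1.96\sqrt{pq/n}$ to $p + 1.96\sqrt{pq/n}$ extends a distance ME on each side of p.
Set $ME = 1.96\sqrt{\frac{pq}{n}}$ and solve for n:
$$n = \frac{(1.96)^2\, pq}{(ME)^2}$$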
What About p and q=1-p?
$n = \frac{(z^*)^2\, pq}{(ME)^2}$; we don't know p or q;
TWO METHODS:
1: if prior information is available concerning
the value of p, use that value of p to calculate
n;
2: if no prior information about p is available,
to obtain a conservative estimate of the
required sample size, use $p = q = \frac{1}{2}$
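The choice p = q = 1/2 is conservative because the product pq is largest there, which makes n largest. The sketch below illustrates this; the function name `required_sample_size` and the margin of error .05 are hypothetical values chosen only for the illustration.

```python
import math

def required_sample_size(margin_of_error, z_star, p=0.5):
    """n = z*^2 * p * q / ME^2, rounded up to a whole number of subjects.
    p defaults to the conservative value 0.5."""
    q = 1 - p
    return math.ceil(z_star ** 2 * p * q / margin_of_error ** 2)

# pq evaluated on a grid of p values: the maximum occurs at p = 0.5.
products = {k / 10: (k / 10) * (1 - k / 10) for k in range(1, 10)}
worst_p = max(products, key=products.get)
print(worst_p, products[worst_p])

# Hypothetical target: ME = .05 with 95% confidence.
print(required_sample_size(0.05, 1.96))          # no prior information
print(required_sample_size(0.05, 1.96, p=0.2))   # prior guess p = .2
```

Any prior value of p other than .5 gives a smaller required n, which is why the default is the safe choice when nothing is known about p.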
Example: Sample Size to Estimate a
Population Proportion p
• When the Football Bowl Subdivision (FBS) conferences
were debating setting up a college football playoff
to decide the national champion, they hired the
Gallup organization to conduct a poll to estimate
the proportion p of fans that wanted a playoff.
• If the FBS wanted to estimate p to within .025
with 95% confidence, how many fans should they
include in the sample?
$$n = \frac{(z^*)^2\, pq}{(ME)^2}$$
Example: Sample Size to Estimate a
Population Proportion p (cont.)
• The desired Margin of Error is ME = .025
• FBS wants to be 95% confident, so z*=1.96;
the required sample size is
$$n = \frac{(1.96)^2\, pq}{(.025)^2}$$
• Since the sample has not yet been taken, the
sample proportion $\hat{p}$ is still unknown.
• We proceed using either one of the following
two methods:
Example: Sample Size to Estimate a
Population Proportion p (cont.)
$$n = \frac{(z^*)^2\, pq}{(ME)^2}$$
• Method 1:
– There is no knowledge about the value of p
– Let p = .5. This results in the largest possible n needed for
a 95% confidence interval of the form $\hat{p} \pm .025$
– If the proportion does not equal .5, the actual ME will be
smaller than .025 with the n obtained by the formula
below.
$$n = \frac{(1.96)^2 (.5)(.5)}{(.025)^2} = 1536.64 \approx 1537$$
• Method 2:
– There is some idea about the value of p (say p ~ .7)
– Use the value of p to calculate the sample size
$$n = \frac{(1.96)^2 (.7)(.3)}{(.025)^2} = 1290.78 \approx 1291$$
Example: Sample Size to Estimate a
Population Proportion p
The Curdle Dairy Co. wants to estimate the
proportion p of customers that will purchase
its new broccoli-flavored ice cream.
Curdle wants to be 90% confident that they
have estimated p to within .03. How many
customers should they sample?
$$n = \frac{(z^*)^2\, pq}{(ME)^2}$$
Example: Sample Size to Estimate a
Population Proportion p (cont.)
• The desired Margin of Error is ME = .03
• Curdle wants to be 90% confident, so z*=1.645;
the required sample size is
$$n = \frac{(1.645)^2\, pq}{(.03)^2}$$
• Since the sample has not yet been taken, the
sample proportion $\hat{p}$ is still unknown.
• We proceed using either one of the following
two methods:
Example: Sample Size to Estimate a
Population Proportion p (cont.)
$$n = \frac{(z^*)^2\, pq}{(ME)^2}$$
• Method 1:
– There is no knowledge about the value of p
– Let p = .5. This results in the largest possible n needed for
a 90% confidence interval of the form $\hat{p} \pm .03$
– If the proportion does not equal .5, the actual ME will be
smaller than .03 with the n obtained by the formula below.
$$n = \frac{(1.645)^2 (.5)(.5)}{(.03)^2} = 751.67 \approx 752$$
• Method 2:
– There is some idea about the value of p (say p ~ .2)
– Use the value of p to calculate the sample size
$$n = \frac{(1.645)^2 (.2)(.8)}{(.03)^2} = 481.07 \approx 482$$