Chapter 8.1 — Inference for population proportions

Stat 226 – Introduction to Business Statistics I
Spring 2009
Professor: Dr. Petrutza Caragea
Section A
Tuesdays and Thursdays 9:30-10:50 a.m.

Inference for a population proportion p
Suppose we are interested in the proportion of people with credit card debt larger than $5,000.
The parameter of interest is now no longer a mean but a proportion, say 25% of all credit card holders.
We will denote the population proportion by p.
The Census Bureau obtains a random sample of 2500 people and finds that 750 have more than $5,000 in credit card debt. How would you estimate p?
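One natural answer is the sample proportion, the number of sampled people with the characteristic divided by the sample size. A short worked version for the Census Bureau numbers is sketched here; the symbol x for the count of "successes" is introduced for illustration.

```latex
\hat{p} = \frac{x}{n} = \frac{750}{2500} = 0.30
```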

Sampling distribution of p̂
We can use the sample proportion p̂ to estimate the population proportion p; p̂ is an unbiased estimator of p.
Assuming that we have a random sample, the value of p̂ will be random as well (our statistic p̂ is a random variable).
Properties of the sampling distribution of p̂ are:
  shape is close to normal
  mean of p̂ is the population proportion p, i.e. $\mu_{\hat{p}} = p$
  standard deviation of p̂ is
$$ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} $$

Because the mean of the sampling distribution is indeed p, p̂ is an unbiased estimator of p.
As the sample size n increases, the spread of the sampling distribution of p̂ decreases.
For sufficiently large sample sizes we have that
$$ \hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right) $$
Knowing the sampling distribution of p̂, we can do inference for the population proportion p in the form of
  confidence intervals
  hypothesis tests
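
To make the shrinking spread concrete, the following is a minimal Python sketch that evaluates the standard deviation of p̂ for a few sample sizes. The value p = 0.25 echoes the "say 25%" example above and is only an assumption; the function name and the sample sizes are hypothetical choices for illustration.

```python
import math


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1.0 - p) / n)


# Assumed population proportion (illustrative only) and arbitrary sample sizes.
assumed_p = 0.25
for n in (25, 100, 2500):
    print(n, round(sd_of_sample_proportion(assumed_p, n), 4))
```

Each time n is multiplied by 4, the standard deviation is cut in half, because n sits under the square root.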
Example: Bob wonders what proportion of students at his school think
that tuition is too high. He interviews a random sample of 50 of the 2400
students at his (small) college and finds that 38 think that tuition is too
high. Construct a 95% confidence interval for the proportion of students
of the entire college thinking that tuition is too high.

A (1 − α) · 100% CI for the population proportion p
A (1 − α) · 100% confidence interval for the population proportion p is given by
$$ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, $$
where we estimate $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ by using p̂ instead of the unknown p.
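
As a rough illustration, the interval can be evaluated for Bob's sample (38 of 50 students) with a few lines of Python. This is a sketch rather than the course's worked solution; the function name is hypothetical, and z* is taken from the standard normal quantile function in Python's standard library.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample CI for p: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Bob's data: 38 of 50 sampled students think tuition is too high.
low, high = proportion_ci(38, 50)
print(f"95% CI for p: {low:.3f} to {high:.3f}")
```

With p̂ = 0.76 the margin of error comes out at roughly 0.12, so the interval runs from about 0.64 to about 0.88.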

Assumptions
Independence Assumption
  Plausible independence?
  Random sample?
  10% condition: Population size > 10 · n
Sample size assumption: Large enough for CLT
  Check that n · p > 10 and n · (1 − p) > 10
Question: If not true, can we still estimate p?

Answer: Yes. Use the so-called Wilson's estimator p̃.
How?? We simply add 4 "phony" observations: 2 "yes" (positive) counts and 2 "no" (negative) counts. Then estimate p, i.e. $\tilde{p} = \frac{x+2}{n+4}$.
This helps to move p̂ further away from 0 or 1, respectively.
A (1 − α) · 100% confidence interval for the population proportion p using Wilson's estimate is given by
$$ \tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} $$
As long as n ≥ 5, this works very well.
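
The sample size check and the interval based on Wilson's estimate might be sketched in Python as follows. The function names are hypothetical, the check plugs p̂ in for the unknown p (an assumption, since p itself is not available), and the small sample of x = 2 "yes" answers out of n = 10 is made up for illustration.

```python
import math
from statistics import NormalDist


def large_sample_ok(successes: int, n: int) -> bool:
    """Check n*p > 10 and n*(1-p) > 10, using p_hat in place of the unknown p."""
    return successes > 10 and (n - successes) > 10


def wilson_ci(successes: int, n: int, confidence: float = 0.95):
    """Interval around p_tilde = (x + 2) / (n + 4), with n + 4 in the standard error."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # The bounds are not clipped to [0, 1]; this follows the formula as stated.
    return p_tilde, p_tilde - margin, p_tilde + margin


# Hypothetical small sample: 2 "yes" answers out of 10.
print(large_sample_ok(2, 10))    # counts are too small for the plain interval
p_tilde, low, high = wilson_ci(2, 10)
print(f"p_tilde = {p_tilde:.4f}, 95% CI: {low:.3f} to {high:.3f}")
```

For Bob's original sample (38 of 50) the check passes, since both 38 and 12 exceed 10, so the plain interval based on p̂ is reasonable there.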

Example: Bob's data cont'd

Margin of error (m) too large to be useful
Back to Bob’s data: a confidence interval that says that the percentage of
people who think they pay too much for tuition is between 10% and 90%
wouldn’t be of much use. Most likely, you have a sense of how large a
margin of error you can tolerate.
You would like to get a narrower interval without giving up confidence ⇒
you need to have less variability in your sample proportions. How can you
do that? Choose a larger sample. How large???

Determining sample size
Recall that when estimating unknown means, we used
$$ n \ge \left( \frac{z^* \cdot \sigma}{m} \right)^2 $$
All we need to do is to adjust the standard deviation σ and use the one that goes with p̂, i.e. replace σ by $\sqrt{p(1-p)}$ (so that $\sigma / \sqrt{n} = \sigma_{\hat{p}}$).
This yields a required sample size of
$$ n \ge \left( \frac{z^*}{m} \right)^2 \cdot p(1-p) $$
Round up!!

What we know: z*, m
What we usually don't know: p. What can we do???
One possibility: Consider the Worst Case Scenario (the one that needs the largest sample size): p = 0.5.
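
The sample size bound with the worst-case choice p = 0.5 can be sketched in Python as below. The margin of error 0.05 and the 95% confidence level are illustrative assumptions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest integer n with n >= (z*/m)^2 * p * (1 - p)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p * (1 - p))


# Worst case p = 0.5 maximizes p * (1 - p), hence the required sample size.
n_worst = required_sample_size(0.05)
# A guess away from 0.5 (here the assumed value 0.25) needs fewer observations.
n_guess = required_sample_size(0.05, p=0.25)
print(n_worst, n_guess)
```

Because p(1 − p) is largest at p = 0.5, a sample size planned for the worst case is large enough whatever the true p turns out to be.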

Other possibilities:
Use information on p, p̂, or p̃ from previous studies (e.g. prior belief, a pilot study, historical data).
If we are going to use Wilson's estimate p̃, we need to remember that we add 4 "phony" observations, so we really only need
$$ n \ge \left( \frac{z^*}{m} \right)^2 \cdot \tilde{p}(1-\tilde{p}) - 4 $$
observations! Again, round up!!

Example: Bob's data
We want a 95% CI for p that has a width of only 0.1
⇒ margin of error m = width/2 = 0.1/2 = 0.05
Based on Bob's previous data we have
$$ \tilde{p} = \frac{x+2}{n+4} = \frac{4}{14} = 0.2857 $$
With z* = 1.96 for 95% confidence this gives
$$ n \ge \left( \frac{1.96}{0.05} \right)^2 \cdot 0.2857 \cdot (1 - 0.2857) - 4 \approx 309.6, $$
so after rounding up we need n = 310 observations.

Significance test for a single proportion
When performing a hypothesis test, the null hypothesis specifies a value for p, which we will call p0.
When calculating p-values we act again as if the hypothesized p (i.e. p0) was actually true.
When testing H0 : p = p0, we substitute p0 for p in the expression for $\sigma_{\hat{p}}$ and then standardize p̂ in order to obtain our test statistic. That is, we get
$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} $$
See Handout – Hypothesis Testing for one proportion p
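
The test statistic could be computed along the following lines. This is a sketch under stated assumptions: the null value p0 = 0.5 and the two-sided alternative are hypothetical choices that do not come from the slides, the data are Bob's original sample (38 of 50), and the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float):
    """z statistic for H0: p = p0, plus a two-sided p-value (assumed alternative)."""
    p_hat = successes / n
    # The standard deviation of p_hat is computed as if p0 were the true proportion.
    sd_under_h0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_under_h0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical null hypothesis H0: p = 0.5, tested with Bob's sample.
z, p_value = one_proportion_z_test(38, 50, 0.5)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.5f}")
```

Note that the denominator uses p0 rather than p̂, which is the difference from the standard error used in the confidence interval.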