Confidence Interval for the Population Proportion

1
What a way to start a section of notes – but anyway. Imagine you
are at the ground level in front of my house at the curb. The
picture below is the view of a sprinkler turned on full blast. The
one thing bad about the picture is the sprinkler does not shoot in
both directions at once. It shoots left and then right. But I put
both for illustrative purposes.
2
When I put my sprinkler in the center of my yard I can cover the
middle 95% of the front yard.
Let’s think about an experiment we could undertake. Say you
are outside my house late at night when all the lights are out and
you are blindfolded. Then we spin you around a lot. Your job
then is to put the sprinkler down in the yard. What is the
probability that the center of the yard will get wet?
Did you say 95%? Sure you did and here is why. If, when in
the center, 95% of the yard can be hit, then putting the sprinkler
at different places in the yard would mean that 95% of the time
the center of the yard should be hit.
Hope this helps you understand confidence intervals. If not,
well, sorry.
3
Overview
In this section we study one of the two basic inference methods, confidence intervals (hypothesis testing is the other).
Confidence intervals are used when our interest is estimating an unknown population parameter.

4
Overview
In a previous section we saw the sampling distribution of sample proportions, and on the next slide I show these results. The sampling distribution:
1) is approximately normal (when the sample is large enough),
2) has the same mean as the population proportion from which the sample is drawn (even when we do not know the value, the pattern is the same), and
3) has a standard error equal to sqrt(p(1-p)/n).
Now in this section, for practical purposes, we use the sample proportion P hat in the formula instead of the population proportion p, because we do not know p and are trying to estimate it.
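A minimal simulation sketch of the three properties above. The values p = 0.30, n = 200, and 5000 repeated samples are my own illustrative choices, not numbers from the notes, and the function name is hypothetical.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each respondent falls in the category with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p, n = 0.30, 200  # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

print("mean of sample proportions:   ", round(statistics.mean(p_hats), 4))
print("std dev of sample proportions:", round(statistics.pstdev(p_hats), 4))
print("sqrt(p(1-p)/n):               ", round((p * (1 - p) / n) ** 0.5, 4))
```

The first printed value should land close to p, and the second close to the third, which is the standard error formula from item 3.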
5
The number line here is really just the line segment from 0 to 1, and the value marked on the line is the population proportion of a category of a qualitative variable. The small letter p is the population proportion of the category.
The symbol for the sample proportion will be typed (and pronounced) P hat.
[Figure: sampling distribution of P hat, centered at µPhat = p, with P hat on the horizontal axis]
σPhat = sqrt((P hat)(1 - P hat)/n)
σPhat is the standard deviation of the distribution of sample proportions P hat and is given another name: the standard error of the proportion!
6
Here I just have the sampling distribution of the sample
proportions P hat. Remember the center is at the population
proportion p.
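[Figure: sampling distribution of P hat, centered at µPhat, with arrows pointing in both directions from the center]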
7
On the previous slide I have arrows pointing in both directions
from the center of the distribution. By adjusting the length of the
arrows we can envision capturing a certain % of sample
proportions. For example, if we go out 1.96 standard errors in
either direction we know we would have 95% of all sample
proportions.
Now, think about slide 2 and slide 7 together. Starting at the
center we cover 95% of the “ground.”
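As a quick check on the 1.96 figure, here is a small sketch using the standard normal distribution from Python's standard library. The helper name area_within is hypothetical.

```python
from statistics import NormalDist

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Going out 1.96 standard errors in either direction from the center.
print(round(area_within(1.96), 4))
```

The printed area is 0.95 to four decimals, which is the 95% of the "ground" covered from the center.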
8
As I explained on slide 3 with the sprinkler, a similar story holds here. Say in 1 sample we get the sample proportion P hat 1. If we take the same length that was used from the center and now measure it out in either direction from the sample proportion P hat 1, the “ground” covered should include the center 95% of the time.
[Figure: sampling distribution of P hat centered at µPhat, with one sample proportion, P hat 1, marked on the axis]
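The sprinkler idea can be tried out with a simulation. This is only a sketch: the population proportion 0.30, the sample size 500, and the 4000 repetitions are assumptions picked for illustration, and the function names are hypothetical.

```python
import random

Z_95 = 1.96  # length of the "sprinkler" in standard errors

def interval_covers_p(p, n, rng):
    """Draw one sample and report whether P hat +/- 1.96 SE contains p."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hat = successes / n
    margin = Z_95 * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin <= p <= p_hat + margin

def coverage_rate(p, n, num_samples, seed=1):
    """Fraction of repeated samples whose interval covers the center p."""
    rng = random.Random(seed)
    hits = sum(interval_covers_p(p, n, rng) for _ in range(num_samples))
    return hits / num_samples

print(coverage_rate(p=0.30, n=500, num_samples=4000))
```

The printed fraction should come out near 0.95: most placements of the sprinkler wet the center, and a few do not.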
9
From my example you see that sometimes the sample
proportion will not be the population proportion. So, a
confidence interval builds in a margin of error around our
sample proportion in the hopes that the interval will include the
population proportion. The way we calculate the interval is
1) Take the sample proportion.
2) Calculate another value, which I will explain more about later.
3) Get two numbers: the sample proportion minus the other value, and the sample proportion plus the other value.
This interval, from a low value to a high value, is hoped to
contain the true unknown population proportion.
10
From the last slide I now reiterate some ideas. The line represents sample proportions. In our sample we get the one represented by the vertical marker. Then we calculate another value, which I show you later. Take this number and subtract it from the sample proportion to get the lower limit of the interval, and also take this number and add it to the sample proportion to get the upper limit of the interval.
[Figure: number line with the lower limit, the sample proportion P hat, and the upper limit marked]
11
Estimating with confidence - confidence interval
A property we learned earlier, combined with our more precise notion of the 68 - 95 - 99.7 rule, is that 95% of sample proportions lie within 1.96 standard errors of the population proportion.
Imagine 1.96 standard errors is the length of my sprinkler in one direction. If in the center 95% of the yard can get wet, then by putting the sprinkler at other parts of the yard the center will get wet 95% of the time.
12
Estimating with confidence - confidence interval
To get a 95% confidence interval for the unknown population proportion we
1. Calculate the sample proportion.
2. Calculate 1.96 standard errors.
3. Take the sample proportion and subtract 1.96 standard
errors. Take the sample proportion and add 1.96 standard
errors.
13
Example based on example 9.1 of the
book page 193
There is a population of adults who watch the evening news. A
drug maker may be interested in the proportion of viewers who
would ask their doctor for information about the drug. The
interest is in the population proportion of viewers who asked their
doctor about the drug.
Say 693 adults are surveyed to see if they had asked their doctor
about a drug being advertised on the evening news. In the sample,
104 said they asked their doctor about the drug.
The sample proportion P hat = 104/693 = .15
The standard error of the proportion is the square root of
((.15)(.85))/693 = square root of .000184 = .01356 or .014
14
Problem
So, we need the number 1.96 times .014 = .027
The lower limit of the interval is .15 - .027 = .123, and
the upper limit of the interval is .15 + .027 = .177.
When we use this method we can expect that the
interval we get will contain the true unknown
population proportion 95% of the time.
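The arithmetic in this example can be sketched in a few lines. The function name is hypothetical, and unlike the slide this version does not round P hat or the standard error along the way.

```python
from math import sqrt

def ci_95_for_proportion(successes, n):
    """Return (lower, upper) limits of a 95% interval for a proportion."""
    p_hat = successes / n                           # step 1: sample proportion
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.96 * standard_error                  # step 2: 1.96 standard errors
    return p_hat - margin, p_hat + margin           # step 3: subtract and add

# 104 of the 693 surveyed adults asked their doctor about the drug.
lower, upper = ci_95_for_proportion(104, 693)
print(round(lower, 3), round(upper, 3))
```

To three decimals the limits agree with the .123 and .177 worked out by hand above.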
15
Estimating with confidence - critical z
The Z of 1.96 was the Z to get a 95% confidence interval. 1.96 is called the critical Z, or Zα/2.
[Figure: normal curve over the P hat axis, with area .475 on each side of the center and area .025 in each tail]
.025 is the area to the right of the critical z = 1.96.
16
Estimating with confidence - critical z
What if we want a 90% confidence interval?
[Figure: normal curve with area .45 on each side of the center and area .05 in each tail]
The Z we should use is 1.645.
17
Estimating with confidence
Note that when we went from a 95% to a 90% interval the
interval shrank. The 90% interval leaves us less confident
and we get a smaller interval. This also means a 95%
interval leaves us more confident and gives us a bigger
interval.
If you want to be 100% sure the interval includes the unknown proportion, guess that the interval runs from 0 to 1 (for a mean, the guess would be from minus infinity to infinity). You can be sure the number is in that rather large interval, but this is not very practical.
18
99% confidence interval
The Z to use if you want a 99% confidence interval is
2.575. We see the Z’s used for 4 commonly used
confidence levels on page 139.
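Instead of a table, the critical Z can be looked up with the inverse of the standard normal distribution. This is a sketch with a hypothetical helper name; it splits the area left outside the interval evenly between the two tails, as in the pictures above.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Z that leaves (1 - confidence)/2 of the area in each tail."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(critical_z(confidence), 3))
```

The 90% and 95% values match 1.645 and 1.96. The 99% value prints as 2.576, a hair above the 2.575 usually read from a table by splitting the difference between 2.57 and 2.58.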
19
Estimating with confidence - summary
A C% confidence interval means we can be C% confident the unknown parameter lies within Z standard errors of the sample proportion.
This really means we arrived at these numbers by a method that gives correct results C% of the time.
Here C is the Confidence Coefficient.
20
Level of significance – alpha
The book uses the Greek letter alpha to stand for what is called
the level of significance.
alpha (α)= 1 – Confidence Coefficient.
Well, if we are, for example, 95% confident the interval includes the unknown proportion, then there is a 5% probability the interval will not include the unknown proportion.
21
The level of significance refers to the probability that a confidence interval would not include the true population proportion. Typically we pick a level of significance of .05. In general, instead of the specific case of .05, we refer to the level of significance as alpha.
Since alpha = 1 - confidence coefficient, an alpha of .05 would give a confidence coefficient of .95 and we would talk about a 95% confidence interval. A 5% chance of the population proportion not being in the interval calculated from the sample is the same as being 95% confident the population proportion is in the interval calculated from the sample.
22
So, our confidence interval is made up of that part of the number
line that has the limits
P hat minus (Zα/2 times the standard error) and
P hat plus (Zα/2 times the standard error),
where the standard error = sqrt((P hat)(1 - P hat)/n).
Remember n = the sample size, and P hat is calculated as the number of responses that have the category of the qualitative variable we are interested in, divided by the total number of responses. An example would be the number of folks who say Mt. Dew is their favorite pop divided by all the people who were asked what their favorite pop is (and some said Coke, some said Pepsi, some said Mt. Dew and so on).
23
In statistics the word estimator is used often. The word is used to
talk about a method or procedure of using sample information to
learn about a population parameter.
The sample proportion P hat is a point estimator of the population
proportion p.
Note, once we collect a sample and calculate the sample proportion
the value is called a point estimate.
The point estimator has a drawback in that since you and I know
the value obtained varies from sample to sample, the estimate will
not likely be the true unknown population proportion. This is why
we go to the confidence interval or what is more formally called an
interval estimator.
24
In an introductory stats class it is customary to mention, without
proof, some properties of estimators. I do that here. The estimators
we mention in this class typically have the properties mentioned.
An unbiased estimator of a population parameter is an estimator
whose expected value is equal to the parameter. So the expected
value of the sample proportion is the population proportion! This
basically means the average of all possible sample proportions is the
population proportion.
An unbiased estimator is said to be consistent if the difference
between the estimator and the parameter grows smaller as the
sample size grows larger. Note in the standard error of the sample
proportion that there is division by n and as n gets larger the
standard error shrinks.
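Both properties can be checked numerically. The sketch below assumes each response is an independent draw, so the count in the category follows a binomial distribution; the function names and the values p = 0.3 and n = 50 are hypothetical choices for illustration.

```python
from math import comb

def expected_sample_proportion(p, n):
    """Average of k/n over every possible count k, weighted by its probability."""
    return sum(
        (k / n) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )

def standard_error(p, n):
    """Standard error of the sample proportion, sqrt(p(1-p)/n)."""
    return (p * (1 - p) / n) ** 0.5

# Unbiased: the expected value of P hat equals p (up to floating-point error).
print(expected_sample_proportion(0.3, 50))

# Consistent: the standard error shrinks as n grows.
for n in (50, 500, 5000):
    print(n, round(standard_error(0.3, n), 4))
```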
25
If there are two, or more, unbiased estimators of a parameter, the
one whose variance is smaller is said to be relatively more efficient.
For most of the rest of the term we will see ideas similar to what
we have in this section. Hypothesis testing is a variation of what
we have done here.
Lastly I want to work on a problem to develop 1 last idea. On the
next slide I have an example where P hat = .48 from various sample
sizes and the interval is based on 95% confidence.
Note how as the sample size increases Z times the standard error
decreases and note that the lower and upper limits get closer
together. Z times the standard error is the margin of error I
mentioned before.
26
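A minimal sketch of how a table like this can be computed, with P hat = .48 and 95% confidence as described. The sample sizes 100, 400 and 1600 are my own illustrative choices and are not necessarily the ones on the slide; the function name is hypothetical.

```python
from math import sqrt

P_HAT = 0.48
Z_95 = 1.96

def margin_of_error(p_hat, n, z=Z_95):
    """Z times the standard error of the sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 400, 1600):  # illustrative sample sizes
    m = margin_of_error(P_HAT, n)
    print(f"n={n:5d}  margin={m:.4f}  lower={P_HAT - m:.4f}  upper={P_HAT + m:.4f}")
```

Each time the sample size is multiplied by 4 the margin of error is cut in half, so the lower and upper limits close in on .48.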
27
With this thinking we can see the margin of error is
Zα/2 times sqrt((P hat)(1-P hat)/n). Note on the previous slide
this is the third column. If we set this margin of error equal to B
we have
B = Zα/2 times sqrt((P hat)(1-P hat)/n). We can use this to help us see what sample size we need to have a certain margin of error.
By some math we can rearrange to have n by itself as
n = (Zα/2)^2 (P hat)(1 - P hat)/B^2.
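For anyone who wants to see the "some math", here is one way to write out the steps, using z for Zα/2 and p-hat for P hat: square both sides, then swap n and B squared.

```latex
\begin{align*}
B   &= z_{\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\
B^2 &= z_{\alpha/2}^{2}\,\frac{\hat{p}(1-\hat{p})}{n} \\
n   &= \frac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{B^{2}}
\end{align*}
```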
28
The last point I would make is that if we are thinking about a
sample size we really do not know P hat yet and so we should
either use
1) .5 if we have no idea about the population proportion, or
2) Use some value that we think is relevant for the problem at hand.
Let’s do problem 9.16 page 147 to consider this idea. From the
problem we have
n = 1.645^2 times (.5)(.5)/.03^2 = 2.71(.25)/.0009 = 753
(Carrying more decimals, 1.645^2 = 2.706 and the result is 751.7, which rounds up to 752.)
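The sample size formula is easy to sketch as a function. The name required_sample_size is hypothetical, the default guess of .5 follows option 1 above, and the result is rounded up because a sample cannot include a fraction of a person.

```python
from math import ceil

def required_sample_size(z, margin, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`."""
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

# Numbers from the problem: Z = 1.645 and a margin of error of .03.
print(required_sample_size(1.645, 0.03))
```

Without rounding 1.645^2 to 2.71 first, the formula gives about 751.7, so this sketch returns 752 rather than 753. Either answer is a sample of roughly 750 people.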
29