Confidence Intervals
Learning Objectives: After completion of this module, the student will be able to



construct and interpret confidence intervals
critically evaluate the outcomes of surveys
interpret the margin of error in the context of nominal confidence level, sample size, and
assumptions made to construct a confidence interval
Knowledge and Skills


Confidence interval
Margin of error
Prerequisites




Calculation of percentages
Arithmetic averages
Normal distribution
Estimating the mean of a normal distribution
Before You Come to Class
On January 18, 2009, the New York Times published a poll on Americans’ expectations of then President-elect
Barack Obama. The poll included the question “What is your best guess about the United States five
years from now?” A total of 1,112 randomly selected adults responded to the survey, and 61% of them
felt that things will be better. The New York Times states that “[i]n theory, in 19 cases out of 20, overall
results based on such samples will differ by no more than three percentage points in either direction
from what would have been obtained by seeking to interview all American adults.”
Consult the following resources to familiarize yourself with the issues involved in conducting surveys:
http://www.ropercenter.uconn.edu/education/polling_fundamentals_intro.html
Citation: Neuhauser, C. Confidence Intervals
Created: October 4, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution
Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows
others to translate, make remixes, and produce new stories based on this work, provided the original author and source are
credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.
http://www.isixsigma.com/library/content/c040607a.asp
http://www.robertniles.com/stats/margin.shtml
Here is a calculator that calculates the sample size needed for a given margin of error, confidence level,
and population size:
http://www.isixsigma.com/offsite.asp?A=Fr&Url=http://www.raosoft.com/samplesize.html
After reading the resources, you should be able to answer the following questions:
What is a random sample?
How is the sample for a telephone survey selected?
What are types of errors that may occur when conducting surveys?
What role do the margin of error and the level of confidence play in reporting surveys?
Simulation
On the web pages mentioned above, you encountered the concepts of level of confidence and
margin of error. We will first investigate these concepts through simulations. When the NYT conducted
the survey to determine the percentage of people who feel that things will be better five years from
now, they did not know the percentage; otherwise they would not have needed to conduct the survey.
It should also be clear that the NYT would not be able to call up every adult in the U.S. and ask for
their opinion. Instead, the NYT called up 1,112 U.S. adults and extrapolated from the sample to the
entire U.S. population. They added the claim that “[i]n theory, in 19 cases out of 20, overall results
based on such samples will differ by no more than three percentage points in either direction from what
would have been obtained by seeking to interview all American adults.”
Let’s simulate surveys, but let’s first pretend we know the actual proportion of adults in the population
who feel that things will be better in five years. Suppose you knew that the percentage is p·100%, where p is a
number between 0 and 1. For instance, if p is 0.2, then p·100% would be (0.2)(100)%=20%.
We set up a simulation of samples of size 100. For each value of p between 0 and 1, in increments of 0.05,
Figure 1 shows the arithmetic averages from 1000 simulated samples of size 100. The red
curves from top to bottom are the 97.5th, 50th, and 2.5th percentiles. We see in Figure 1 that the
spread of the simulated arithmetic averages is largest when p=0.5.
[Plot: arithmetic average (vertical axis, 0 to 1) against p (horizontal axis, 0 to 1)]
Figure 1: Simulations of arithmetic averages of samples of size 100 for different values of p (see text for further explanations).
The red curves represent the 2.5th, 50th, and 97.5th percentiles.
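The simulation behind Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the figure: the function names (simulate_averages, percentile, spread_table), the random seed, and the simple index-rounding percentile rule are all choices made here.

```python
import random

def simulate_averages(p, n=100, reps=1000, rng=None):
    """Return `reps` simulated arithmetic averages for samples of size n."""
    rng = rng or random.Random()
    averages = []
    for _ in range(reps):
        # Each simulated respondent answers "better" (1) with probability p, else 0.
        in_favor = sum(1 for _ in range(n) if rng.random() < p)
        averages.append(in_favor / n)
    return averages

def percentile(sorted_values, q):
    """Simple percentile: pick the entry whose index is closest to q% of the way through."""
    k = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[k]

def spread_table(step=0.05, n=100, reps=1000, seed=1):
    """For p = 0, step, ..., 1 return (p, 2.5th, 50th, 97.5th percentile) rows."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    rows = []
    for i in range(int(round(1 / step)) + 1):
        p = i * step
        avgs = sorted(simulate_averages(p, n, reps, rng))
        rows.append((p, percentile(avgs, 2.5), percentile(avgs, 50), percentile(avgs, 97.5)))
    return rows

if __name__ == "__main__":
    for p, low, mid, high in spread_table():
        print(f"p={p:.2f}  2.5th={low:.2f}  50th={mid:.2f}  97.5th={high:.2f}  width={high - low:.2f}")
```

The last column (the distance between the 2.5th and 97.5th percentiles) should be widest near p=0.5 and shrink to 0 at p=0 and p=1, which is the pattern described for the red curves.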
When we estimate a proportion p, we use the arithmetic average as our estimator for p. To capture the
variation, we will define an interval, called the confidence interval, so that with high likelihood the true
value will be contained in the interval. More precisely, we choose a confidence level $1-\alpha$, which is
often 95% or 99%, so that if we repeated the survey multiple times, only a fraction $\alpha$ of the confidence
intervals would not contain the true value. The radius of the confidence interval (i.e., half its length) is
called the margin of error.
Going back to the NYT poll, we can now say that the estimated proportion of adults who felt that things
will be better in five years was 61% in that poll. The confidence level was 95% (i.e., 19 out of 20 cases),
and the margin of error was 3 percentage points, so the reported confidence interval runs from 58% to 64%.
This means that if the survey were repeated many times, about 95 out of 100 of the resulting intervals
would contain the true population percentage.
Theory
We need to introduce some notation to explain how this works. Let’s denote by $(x_1, x_2, \ldots, x_n)$ the
outcome of a survey where n individuals were asked about an issue, with $x_i = 1$ if the ith person is in
favor of that issue, and $x_i = 0$ otherwise. Assume that the proportion of individuals in the entire
population who are in favor of the issue is p. The number of people in favor in the sample of size n is
then obtained by adding up the 1’s, i.e., $\sum_{i=1}^{n} x_i$.
The arithmetic average $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the proportion of individuals in the sample
who are in favor of the issue. Given a random sample $(X_1, X_2, \ldots, X_n)$ from this population, we calculate
the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
When the sample size n is large, one can show that
$$\frac{\bar{X}_n - p}{\sqrt{p(1-p)/n}}$$
is approximately normally distributed with mean 0 and variance 1. If we denote by $z_{1-\alpha/2}$ the $(1-\alpha/2)$th
percentile of a normal distribution with mean 0 and variance 1, then, for large n,
$$1-\alpha \approx P\left(-z_{1-\alpha/2} \le \frac{\bar{X}_n - p}{\sqrt{p(1-p)/n}} \le z_{1-\alpha/2}\right)$$
By rearranging the right-hand side, we obtain
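$$\begin{aligned}
1-\alpha &\approx P\left(-z_{1-\alpha/2} \le \frac{\bar{X}_n - p}{\sqrt{p(1-p)/n}} \le z_{1-\alpha/2}\right) \\
&= P\left(-z_{1-\alpha/2}\sqrt{p(1-p)/n} \le \bar{X}_n - p \le z_{1-\alpha/2}\sqrt{p(1-p)/n}\right) \\
&= P\left(-\bar{X}_n - z_{1-\alpha/2}\sqrt{p(1-p)/n} \le -p \le -\bar{X}_n + z_{1-\alpha/2}\sqrt{p(1-p)/n}\right) \\
&= P\left(\bar{X}_n - z_{1-\alpha/2}\sqrt{p(1-p)/n} \le p \le \bar{X}_n + z_{1-\alpha/2}\sqrt{p(1-p)/n}\right)
\end{aligned}$$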
The interval $[\bar{X}_n - z_{1-\alpha/2}\sqrt{p(1-p)/n},\ \bar{X}_n + z_{1-\alpha/2}\sqrt{p(1-p)/n}]$ is called the
confidence interval at the confidence level $1-\alpha$. There is a problem with computing the confidence interval
since it involves the unknown value p, which we will address now.
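In a simulation, where p is chosen by us and therefore known, the claim that a fraction of about 1 − α of such intervals contains p can be checked directly. The following Python sketch is an illustration only: the function name coverage, the number of simulated surveys, and the seed are assumptions made here, and the normal percentile is taken from NormalDist in Python's standard statistics module (Python 3.8 or later).

```python
import math
import random
from statistics import NormalDist

def coverage(p, n, alpha=0.05, surveys=2000, seed=0):
    """Fraction of simulated surveys whose interval contains the true p."""
    rng = random.Random(seed)                    # fixed seed (an assumption) for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{1-alpha/2} for the standard normal
    # The half-width uses the true p, which is available only because we simulate.
    half_width = z * math.sqrt(p * (1 - p) / n)
    hits = 0
    for _ in range(surveys):
        # One simulated survey: n respondents, each in favor with probability p.
        x_bar = sum(1 for _ in range(n) if rng.random() < p) / n
        if x_bar - half_width <= p <= x_bar + half_width:
            hits += 1
    return hits / surveys

if __name__ == "__main__":
    # p = 0.61 and n = 1112 mirror the NYT poll figures; p is hypothetical here.
    print(coverage(p=0.61, n=1112))
```

The printed fraction should be close to 0.95, in line with the "19 cases out of 20" wording of the poll.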
Computing the Confidence Interval
To compute the confidence interval, we could choose $\bar{X}_n$ for the unknown value of p. The rationale is
that $\bar{X}_n$ is the estimator of p and, according to the Law of Large Numbers, $\bar{X}_n \to p$ as $n \to \infty$. We thus
arrive at the following expression for the confidence interval of a proportion:
$$\left[\bar{X}_n - z_{1-\alpha/2}\sqrt{\bar{X}_n(1-\bar{X}_n)/n},\ \bar{X}_n + z_{1-\alpha/2}\sqrt{\bar{X}_n(1-\bar{X}_n)/n}\right]$$
For a 95% confidence interval, we set $z_{1-0.05/2} = 1.96$. For a 99% confidence interval, we set
$z_{1-0.01/2} = 2.58$. In general, for a $1-\alpha$ level confidence interval, we find the value of $z_{1-\alpha/2}$ by
entering in Excel
“=NORMINV(probability, 0,1)”
where probability is equal to 1-α/2. Figure 2 explains this:
Figure 2: Finding $z_{1-\alpha/2}$. The area under the curve between -1.96 and 1.96 is equal to $1-\alpha = 1-0.05 = 0.95$.
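For readers who prefer Python to Excel, the same lookup and the resulting interval can be sketched as follows. This is an illustration under assumptions: the function names z_value and proportion_interval are invented here, and NormalDist from the standard statistics module (Python 3.8 or later) plays the role of NORMINV. The poll figures (61% of 1,112 respondents) are the ones quoted from the NYT poll.

```python
import math
from statistics import NormalDist

def z_value(alpha):
    """Standard normal percentile z_{1-alpha/2}; counterpart of NORMINV(1-alpha/2, 0, 1)."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - alpha / 2)

def proportion_interval(x_bar, n, alpha=0.05):
    """Interval x_bar +/- z * sqrt(x_bar * (1 - x_bar) / n), with x_bar plugged in for p."""
    half_width = z_value(alpha) * math.sqrt(x_bar * (1 - x_bar) / n)
    return x_bar - half_width, x_bar + half_width

if __name__ == "__main__":
    print(round(z_value(0.05), 2), round(z_value(0.01), 2))   # prints 1.96 2.58
    low, high = proportion_interval(0.61, 1112)               # NYT poll figures
    print(f"95% interval: {low:.3f} to {high:.3f}")
```

For the poll figures, the interval comes out at roughly 58% to 64%, which matches the three percentage points quoted by the New York Times.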
Since we don’t know the value of p when we conduct a survey, a conservative way to compute
confidence intervals is to replace the expression $z_{1-\alpha/2}\sqrt{p(1-p)/n}$ by its largest value, which occurs
when p=1/2. The 95% confidence interval is then of the form
$$\left[\bar{X}_n - \frac{1.96}{2\sqrt{n}},\ \bar{X}_n + \frac{1.96}{2\sqrt{n}}\right]$$
and the margin of error is $1.96/(2\sqrt{n})$.
For a 99% confidence interval, replace 1.96 by 2.58. (Why?)
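As a quick check of the conservative formula against the NYT poll, the margin of error for 1,112 respondents can be computed directly. The sketch below is a minimal illustration; the function name conservative_margin is an assumption made here, and z defaults to the 95% value 1.96 from the text.

```python
import math

def conservative_margin(n, z=1.96):
    """Worst-case margin of error z / (2 * sqrt(n)), attained at p = 1/2."""
    return z / (2 * math.sqrt(n))

if __name__ == "__main__":
    # 1,112 respondents, as in the NYT poll described earlier.
    print(f"{100 * conservative_margin(1112):.1f} percentage points")   # prints 2.9 percentage points
```

The result is just under three percentage points, consistent with the "no more than three percentage points" statement in the poll.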
Homework
1. Show graphically that $p(1-p)$ is largest when $p = 1/2$. In this case,
$\sqrt{p(1-p)} = \sqrt{(0.5)(1-0.5)} = \sqrt{1/4} = 1/2$. Using this result, explain why it follows that the margin of error
at the 95% confidence level is calculated as $1.96/(2\sqrt{n})$.
2. At the 95% confidence level, generate a table that lists the minimum sample size required when the
margin of error is 1%, 3%, 5%, and 10%.
3. At the 99% confidence level, generate a table that lists the minimum sample size required when the
margin of error is 1%, 3%, 5%, and 10%.
4. When estimating a proportion, the following quantities play a role: sample size, confidence level, and
margin of error.
How does the margin of error depend on the sample size?
How does the margin of error depend on the confidence level?
List all ingredients to construct a confidence interval.