Estimates of Population Parameters

Chapter 6 – Confidence Intervals
6.1 Estimates of Population Parameters
One of the primary uses of statistics is to estimate population parameters when the
population is too large for a census to be practical. To accomplish this, a random
sample of values is drawn from the population and a sample statistic is calculated.
That statistic is then used to draw inferences about the value of the unknown
population parameter . . . INFERENTIAL STATISTICS here we come!
Types of Estimates
Point Estimate – a single-value estimate of the population parameter.
Examples:
• The mean height of American men is 69 inches.
• 65% of New Jersey residents support a ban on cell phone use while driving.
Interval Estimate – an interval estimated to contain the value of a population
parameter.
Examples:
• The mean height of American men is between 67.5 and 70.5 inches.
• 65% (± 3%) of New Jersey residents support a ban on cell phone use while driving.
Level of Confidence
A point estimate is almost sure to differ from the actual population parameter (at least
slightly), while a good interval estimate can be quite likely to contain the population
parameter.
The level of confidence for an interval estimate is the probability that the
interval contains the population parameter. The level of confidence is
denoted by c, the area under the standard normal curve between the critical
values $-z_c$ and $z_c$. The most commonly used values of c are .90, .95, and .99.
Examples:
• The mean height of American men is between 67.5 and 70.5 inches with a level
of confidence of $c = .90$.
• With 95% confidence, we can say that 65% (± 3%) of New Jersey residents
support a ban on cell phone use while driving.
Note: The level of confidence is a probability for the experiment of drawing a sample
and constructing an interval estimate. That is, though there will be natural variation
between samples, c is the proportion of samples whose standardized sample mean falls
between $-z_c$ and $z_c$, and hence the proportion of samples whose interval estimate
contains the population parameter.
The first example above can be interpreted as follows:
Whenever a sample of American men is drawn and an interval estimate is
constructed from the sample data, 90% of the samples will yield an interval
estimate that contains the actual mean height of all men. Another way of wording
this is that we are 90% sure (confident) that the true (population) mean height of
all American men is within this interval.
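The interpretation above can be checked with a small simulation. The following is a minimal Python sketch, not part of the original notes: the population values (mean 69 inches, standard deviation 3 inches) are hypothetical, and the margin of error uses the formula $E = z_c \sigma / \sqrt{n}$ that is developed later in this chapter.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu, sigma, n, c, trials, seed=0):
    """Fraction of simulated samples whose interval (xbar - E, xbar + E) contains mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # critical z-score for confidence level c
    E = z_c * sigma / n ** 0.5                # margin of error (sigma assumed known)
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - E < mu < xbar + E:
            hits += 1
    return hits / trials

# Hypothetical population: mean 69 in., standard deviation 3 in. (assumed values)
rate = coverage_rate(mu=69, sigma=3, n=40, c=0.90, trials=2000)
print(f"{rate:.1%} of the simulated intervals contained the true mean")
```

The printed percentage should land near 90%, though it will not equal 90% exactly because only a finite number of samples is simulated.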
Estimating the Mean
Point Estimate – If we wish to estimate the population mean $\mu$ for a random
variable x using a sample of values, the best possible point estimate is just
the sample mean $\bar{x}$.
Interval Estimate – We construct an interval estimate for $\mu$ by starting with the
sample mean $\bar{x}$ and adding a margin of error, denoted by E, both above
and below $\bar{x}$. We will then have an interval estimate of the form:
$(\bar{x} - E,\ \bar{x} + E)$
Example: For the estimate of men's heights given above, the sample average $\bar{x}$ was
69 inches and a margin of error of $E = 1.5$ inches was used to get the interval estimate
$(\bar{x} - E,\ \bar{x} + E) = (69 - 1.5,\ 69 + 1.5) = (67.5,\ 70.5)$
Note: The margin of error used for an interval estimate depends on the level of
confidence desired. A larger level of confidence will result in
a _______________ margin of error, and hence a _______________ interval.
Usually, the level of confidence is selected, and then the corresponding margin of error
necessary is calculated.
Calculating Margin of Error with a Large Sample
If the random variable x is normally distributed (with a known standard
deviation $\sigma$) or if the sample size n is at least 30, then since we will be drawing a
random sample and taking the sample mean $\bar{x}$, we can apply the Central
Limit Theorem. In this case, the Central Limit Theorem guarantees us that:
• $\bar{x}$ is approximately normally distributed
• $\mu_{\bar{x}} = \mu$
• $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Notice that the mean value of $\bar{x}$ is equal to $\mu$, the population mean we are trying to
estimate. Given the desired level of confidence c, we are now trying to find the
amount of error E necessary to ensure that the probability of $\bar{x}$ being within E of the
mean is c.
There are always two critical z-scores $\pm z_c$ which give the appropriate probability for
the standard normal distribution (see diagram in book p. 282), and so the
corresponding distance for the distribution of $\bar{x}$ is just $z_c \sigma_{\bar{x}}$, or:
$E = z_c \dfrac{\sigma}{\sqrt{n}}$
Note: Usually $\sigma$ is not known, but if $n \ge 30$, then the sample standard deviation s is
generally a reasonable estimate.
Before continuing, let's practice finding $z_c$ for a confidence level of:
c = 0.90
c = 0.95
c = 0.97
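Table lookups like these can be checked in software. The sketch below is an illustration in Python rather than part of the original notes; the function name `z_critical` is made up for this example. It relies on the fact that the area to the left of $z_c$ is $c + \frac{1 - c}{2} = \frac{1 + c}{2}$.

```python
from statistics import NormalDist

def z_critical(c):
    """Critical value z_c such that the area between -z_c and z_c is c."""
    # The area to the left of z_c is c plus the lower tail: c + (1 - c)/2 = (1 + c)/2
    return NormalDist().inv_cdf((1 + c) / 2)

for c in (0.90, 0.95, 0.97):
    print(c, round(z_critical(c), 3))
```

Try the table lookup by hand first, then use the printed values to check your answers.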
Example: Suppose a random sample of 40 American men is drawn and the weights of
the men measured. If the sample mean is $\bar{x} = 182$ lbs. and the sample standard deviation is
$s = 18.4$ lbs., then we can construct a 90% confidence interval for the mean weight of the
population of American men as follows.
Since $c = .90$, we look up the value $\frac{1 - c}{2} = .05$ in the standard normal table and find
that it corresponds to a z-score of $z = -1.645$; thus we will use 1.645 for $z_c$. The margin of
error can now be calculated using the formula:
$E = z_c \dfrac{\sigma}{\sqrt{n}} \approx (1.645) \dfrac{18.4}{\sqrt{40}} \approx 4.8$
Therefore, based on our sample data we are 90% confident that the average weight of
all American men is within 4.8 lbs. of our sample average. From this we get that our
90% confidence interval is: $(\bar{x} - E,\ \bar{x} + E) = (182 - 4.8,\ 182 + 4.8) = (177.2,\ 186.8)$
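The calculation in this example can be sketched in Python as follows. This is an illustrative helper, not a method from the text: the name `z_interval` is made up, and the critical value is computed to full precision instead of being read from the table, so results can differ from hand calculations in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, c):
    """Return (lower, upper, E) for a confidence interval based on the normal distribution."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # critical z-score for confidence level c
    E = z_c * sigma / sqrt(n)                 # margin of error E = z_c * sigma / sqrt(n)
    return xbar - E, xbar + E, E

# Men's weights example: s = 18.4 is used in place of sigma because n = 40 >= 30
low, high, E = z_interval(xbar=182, sigma=18.4, n=40, c=0.90)
print(f"E = {E:.2f}, interval = ({low:.2f}, {high:.2f})")
```

Changing `c` or `n` in the call shows how the margin of error responds to the confidence level and the sample size.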
Exercise: Construct a 95% confidence interval from the above data.
Example: A survey of 100 Tampa commuters finds that the average commuting time
to work is $\bar{x} = 25.5$ minutes with a standard deviation of $s = 11.5$ minutes. We can
construct a 98% confidence interval for the mean commuting time of Tampa
commuters as follows.
Since $c = .98$, we look up the value ______ in the standard normal table
and find that it corresponds to a z-score of ______; thus we will use ______ for $z_c$.
The margin of error can now be calculated using the formula:
$E = z_c \dfrac{\sigma}{\sqrt{n}} \approx$ ______
And our 98% confidence interval is:
$(\bar{x} - E,\ \bar{x} + E) =$ ______
So based on our sample data, we can be 98% sure that the average commuting time of
Tampa commuters is between ______ and ______ minutes.
Exercise: Repeat the above, but suppose the sample size had been only 50
commuters. What happens to the margin of error? (Make a guess, then see if you were
right by computing the interval.)
Example: From previous examples we know that the heights of American women
ages 20-29 are normally distributed with a standard deviation of $\sigma = 2.75$ inches.
If we did not know the mean height of all women, we could use the following sample
of 5 women's heights to estimate $\mu$:
67, 63, 64, 65, 63
The sample average is $\bar{x} = 64.4$ inches for this sample. We can now construct a 95%
confidence interval for $\mu$. Since a probability of .025 corresponds to a z-score of
$z = -1.96$, we will use 1.96 for $z_c$. We again use the formula:
$E = z_c \dfrac{\sigma}{\sqrt{n}} = (1.96) \dfrac{2.75}{\sqrt{5}} \approx 2.4$
to get a margin of error of 2.4 inches. The corresponding 95% confidence interval is:
$(\bar{x} - E,\ \bar{x} + E) = (64.4 - 2.4,\ 64.4 + 2.4) = (62,\ 66.8)$
So based on our sample we are 95% confident that $62 < \mu < 66.8$. Note that this interval
contains the previously given value $\mu = 63.5$. If many samples of 5 women were drawn,
95% of the sample means would be within 2.4 inches of 63.5.
Determining Sample Size
Note: Given a confidence level c and standard deviation $\sigma$, drawing a larger sample
will _______crease the margin of error.
If a desired margin of error and level of confidence are known, and if an estimate of
the standard deviation can be made, then the sample size necessary can be determined
by solving the error formula for n.
We know:
$E = z_c \dfrac{\sigma}{\sqrt{n}}$
multiplying by $\sqrt{n}$ yields:
$E \sqrt{n} = z_c \sigma$
dividing by E we obtain:
$\sqrt{n} = \dfrac{z_c \sigma}{E}$
finally, squaring both sides yields:
$n = \left( \dfrac{z_c \sigma}{E} \right)^2$   Always round UP!
Example: Suppose we want to estimate the mean weight of American men, and we
want to be 95% confident that our estimate is within 2 lbs. of the actual mean. Since
our previous study allows us to estimate that the standard deviation of men's weights
is $\sigma \approx 18.4$ pounds, we can use the formula above to determine the appropriate sample
size.
From the given information, we have $c = .95$, so $z_c = 1.96$, the maximum desired error
is $E = 2$, and we estimated that $\sigma \approx 18.4$, so:
$n = \left( \dfrac{z_c \sigma}{E} \right)^2 = \left( \dfrac{1.96 \cdot 18.4}{2} \right)^2 \approx 325.15$
Thus a sample of ___________ men should give the desired level of accuracy.
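The sample-size formula, including the rounding rule, can be sketched in Python. The helper name `sample_size_for_mean` is an assumption for illustration only. It computes $z_c$ to full precision, so for some inputs the result may differ by one from an answer based on a rounded table value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, c):
    """Smallest n with z_c * sigma / sqrt(n) <= E, i.e. n = (z_c * sigma / E)^2 rounded UP."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # unrounded critical z-score
    return ceil((z_c * sigma / E) ** 2)       # ceil implements "always round up"

# Men's weights example: sigma estimated as 18.4 lbs., desired error 2 lbs., c = .95
n = sample_size_for_mean(sigma=18.4, E=2, c=0.95)
print(n)
```

Compare the printed value with the number you filled in above.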
Exercise: Calculate the number of Tampa commuters whose commutes would need to be
measured in order to estimate the mean commuting time to within 1
minute with 99% confidence. (From a previous example we had: a survey of 100
Tampa commuters finds that the average commuting time to work is $\bar{x} = 25.5$ minutes
with a standard deviation of $s = 11.5$ minutes.)
Do page 320 # 56 together to be turned in. Now let’s sample our class and use the data
we have to construct a 90% confidence interval for the average age of a college
student. What had to change for us to do this with a smaller sample than required?
How might sampling just our class be a confounding issue?
6.2 The (Student) t-distributions
In the last example we were able to construct a confidence interval with a small
sample ($n = 5$) because the variable women's heights is normally distributed.
Unfortunately, it was necessary that we knew the population standard deviation $\sigma$. It
is very unlikely in any real-life situation that we would know $\sigma$ and
yet be trying to estimate $\mu$.
A good solution would be to use the sample standard deviation s from our small
sample to estimate $\sigma$, but the estimation of $\sigma$ from a small sample is generally not
accurate enough for use with the normal distribution.
If x is normally distributed, then the distribution of
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
is the standard normal distribution for any sample size n, but the distribution of
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
is not. The distribution of the random variable t above is the Student t-distribution
with n – 1 degrees of freedom (d.f.).
Properties of t-distributions
The t-distributions are a family of probability density functions. For each possible
degree of freedom, there is a unique t-distribution with that degree of freedom.
Like the standard normal distribution, the t-distributions are symmetric, bell-shaped
probability density functions with a mean of 0. However, the t-distribution is
a wider bell curve with thicker tails than the standard normal curve.
As the degree of freedom increases, the t-distributions become closer to a normal
distribution. Thus for large sample sizes ($n \ge 30$), we can use s in place of $\sigma$, and then
use the standard normal distribution.
See diagram and box on page 325 of the text. SEE PAGE 329 FLOW CHART!
Estimating the Mean Using a t-distribution
The process of constructing a confidence interval using a t-distribution is almost
identical to that used to construct confidence intervals using the standard normal
distribution.
First we must know that the variable x is normally distributed with
unknown standard deviation $\sigma$ and that we will draw a small sample (n < 30).
We then choose c, the desired level of confidence, and calculate the statistics $\bar{x}$ and s
from our sample group.
The sample mean $\bar{x}$ will again be the best point estimate and the center of our interval.
We can then calculate the margin of error for our estimate using the formula:
$E = t_c \dfrac{s}{\sqrt{n}}$
where $t_c$ is the critical t-value corresponding to the level of confidence c. The values
of $t_c$ for common values of c are given in a table in the front of your text. Make
sure to use a degree of freedom of $n - 1$.
Note: $t_c > z_c$ for the same value of c since the t-distribution is wider, so we get a
larger margin of error using the t-distribution.
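The relationship between $t_c$ and $z_c$ can be explored numerically. Python's standard library has no t-distribution, so the sketch below (an illustration, not the textbook's method) integrates the t density with Simpson's rule and finds $t_c$ by bisection. All function names here are made up for this example, and the table in your text remains the reference for classwork.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(t_c, df, steps=2000):
    """Area under the t curve between -t_c and t_c (Simpson's rule; steps must be even)."""
    h = t_c / steps
    total = t_pdf(0.0, df) + t_pdf(t_c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3   # doubled because the curve is symmetric about 0

def t_critical(c, df):
    """Find t_c whose central area equals c, by bisection."""
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < c:   # widen the bracket until it contains t_c
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_c = NormalDist().inv_cdf(0.975)     # z_c for c = .95
for df in (4, 19, 29, 100):
    print(df, round(t_critical(0.95, df), 3), "vs z_c =", round(z_c, 3))
```

In the output, each $t_c$ exceeds $z_c$, and the gap shrinks as the degrees of freedom grow, which matches the properties listed for the t-distributions.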
Example: Suppose we had our sample of 5 women's heights from the previous
example:
67, 63, 64, 65, 63
If we knew that women's heights were normally distributed, but did not know that
$\sigma = 2.75$ inches, then we would use the sample standard deviation s as our estimate of $\sigma$,
and then use a t-distribution interval.
The sample mean is $\bar{x} = 64.4$ inches and the sample standard deviation is $s = 1.67$ inches.
For 95% confidence, the critical t-score for degree of freedom _____ is: $t_c = 2.776$
So:
$E = t_c \dfrac{s}{\sqrt{n}} = (2.776) \left( \dfrac{1.67}{\sqrt{5}} \right) \approx 2.07$
and so our 95% confidence interval is: $(64.4 - 2.07,\ 64.4 + 2.07) = (62.33,\ 66.47)$
Exercise: Construct a 99% confidence interval from the above data.
Example: SAT Math Scores are normally distributed. A sample of scores for 20
students has sample mean of $\bar{x} = 522.8$ with a sample standard deviation of $s = 154.5$.
We can calculate the 90% confidence interval as follows:
For 90% confidence, the critical t-score for degree of freedom 19 is: $t_c = 1.729$
So:
$E = t_c \dfrac{s}{\sqrt{n}} = (1.729) \left( \dfrac{154.5}{\sqrt{20}} \right) \approx 59.7$
and our 90% confidence interval is: $(522.8 - 59.7,\ 522.8 + 59.7) = (463.1,\ 582.5)$
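Both t-interval examples can be reproduced with a few lines of Python. In this sketch the helper name `t_interval` is an assumption, and the critical values 2.776 and 1.729 are simply the table values quoted in the examples. The sample standard deviation is computed from the data without rounding, so the heights interval can differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(xbar, s, n, t_c):
    """Confidence interval (xbar - E, xbar + E) with E = t_c * s / sqrt(n)."""
    E = t_c * s / sqrt(n)
    return xbar - E, xbar + E

# Women's heights: statistics computed from the raw data, t_c from the table (d.f. = 4)
heights = [67, 63, 64, 65, 63]
print(t_interval(mean(heights), stdev(heights), len(heights), t_c=2.776))

# SAT example: summary statistics given directly, t_c from the table (d.f. = 19)
print(t_interval(522.8, 154.5, 20, t_c=1.729))
```

Note that `stdev` divides by n – 1, which is the sample standard deviation s used throughout this section.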
Exercises:
• Suppose the same sample mean and sample standard deviation had been obtained
from a sample of size 16. What would the 90% confidence interval be?
• Suppose the same sample mean and sample standard deviation were obtained
from a sample of size 50. What would the 90% confidence interval be?
If time: p 332 # 30. Be sure to do #29 for homework.
6.3 Estimating a Population Proportion
Often, we wish to estimate what portion or percentage of a population falls into a
given category.
Examples: (What type of data is being measured here: qualitative or
quantitative?)
• What percent of Florida voters plan to vote for the Democratic candidate for
senate?
• What percent of people have Type A+ blood?
• What percent of computer processors are defective?
Such a percentage is called a population proportion and is represented by the
variable p (the probability of success in a single trial of a binomial experiment). The
proportion calculated in a sample group is denoted by $\hat{p}$ ("p hat") and is the best
point estimate for p. If we have a sample of size n where x of the sample members
are in the category being measured, then we calculate $\hat{p}$ (proportion
of sample's successes) by using:
$\hat{p} = \dfrac{x}{n}$
where x = the number of successes in the sample, and n = the number in the sample.
Example: In a sample of 577 computer processors, 37 were found to be defective.
Thus if $\hat{p}$ represents the proportion of defective processors, then:
$\hat{p} = \dfrac{x}{n} = \dfrac{37}{577} \approx .064$
So we might estimate that 6.4% of processors are defective.
The Sampling Distribution of $\hat{p}$
We construct interval estimates for p in much the same way as our confidence
intervals for a mean. We can calculate $\hat{p}$ and use it as the center of our
interval and then add a margin of error above and below $\hat{p}$.
The experiment of drawing a sample of n objects and counting the number x in the
desired category is a binomial experiment with n trials and probability of success p on
each trial (as long as the population is very large compared to n). If the sample is
sufficiently large, however, then the proportion of successes $\hat{p}$ will be
approximately normally distributed. More specifically:
If an n trial binomial experiment is conducted with probability of success p, and if
np > 5 and nq > 5 (where q = 1 – p), then the distribution of the random variable $\hat{p}$ is
approximately normal with:
$\mu_{\hat{p}} = E(x/n) = \dfrac{E(x)}{n} = \dfrac{np}{n} = p$
and
$\sigma_{\hat{p}} = \sigma(x/n) = \dfrac{\sigma(x)}{n} = \dfrac{\sqrt{npq}}{n} = \sqrt{\dfrac{pq}{n}}$
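A quick simulation can make these two formulas concrete. The Python sketch below is illustrative only: the values p = 0.3 and n = 100 are hypothetical (chosen so that np > 5 and nq > 5), and the function name `simulate_p_hat` is made up for this example.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_p_hat(p, n, trials, seed=0):
    """Run `trials` binomial experiments of n trials each; return the list of p-hat values."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

p, n = 0.3, 100                 # hypothetical values: np = 30 > 5 and nq = 70 > 5
p_hats = simulate_p_hat(p, n, trials=5000)

print("mean of p-hat:", round(mean(p_hats), 4), " theory:", p)
print("std. dev. of p-hat:", round(pstdev(p_hats), 4),
      " theory:", round(sqrt(p * (1 - p) / n), 4))
```

The simulated mean and standard deviation should be close to p and $\sqrt{pq/n}$, although they will not match exactly.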
Constructing Confidence Intervals for p
From the above we see that we can use the normal distribution to construct confidence
intervals for p. As before, we first decide the desired level of confidence c, and
then find the critical z-score $z_c$. The margin of error is then found by a similar formula
as before:
$E = z_c \sigma_{\hat{p}} = z_c \sqrt{\dfrac{pq}{n}}$
Note: p is the quantity we are trying to estimate, so it is actually unknown, but once
the sample is drawn, we can use $\hat{p}$ as our estimate for p and $\hat{q} = 1 - \hat{p}$ as our estimate for
q. So in practice, the formula for margin of error in a population proportion estimate
is:
$E = z_c \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
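The practical formula translates directly into code. Below is a minimal Python sketch; the function name `proportion_interval` is hypothetical, the normal-approximation check is included as a guard, and $z_c$ is computed to full precision, not rounded to a table value. It is applied to the processor data from the point-estimate example (37 defective out of 577).

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, c):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation requires n*p_hat > 5 and n*q_hat > 5
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sqrt(p_hat * q_hat / n)         # margin of error
    return p_hat - E, p_hat + E

low, high = proportion_interval(x=37, n=577, c=0.95)
print(f"({low:.5f}, {high:.5f})")
```

Raising an error when the check fails is one design choice among several. A warning would also be reasonable in practice.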
Example: In a study of a microchip manufacturer, 37 out of a sample of 577
processors were found to have defects. We can construct a 95% confidence interval
for the percentage of defective processors as follows:
First we calculate $\hat{p} = \dfrac{x}{n} =$ ______, and so $\hat{q} =$ ______.
Next check that $n\hat{p} = x = 37 > 5$ and $n\hat{q} = n - x = 540 > 5$. Which means?
Now since $c = .95$, we have $z_c =$ ______.
Our margin of error is thus:
$E = z_c \sqrt{\dfrac{\hat{p}\hat{q}}{n}} =$ ______
Our 95% confidence interval is:
$(\hat{p} - E,\ \hat{p} + E)$ = (0.064 – ______, 0.064 + ______) = (______, ______)
So we are 95% confident that the percentage of defective processors is between
______% and ______%.
Exercise: Calculate the 99% confidence interval for this example.
Example: A new drug is tested on a sample of 75 adults who were infected with a
cold virus. 32% of the adults in the sample developed no symptoms. Construct a 90%
confidence interval for the proportion of adults who will be prevented from getting a
cold by the drug.
From the above we have that n = ______, $\hat{p}$ = ______, and $\hat{q}$ = ______.
How can we determine if we can approximate the sampling distribution of $\hat{p}$ using
the normal distribution?
Find $z_c$: we use the fact that $c = .90$ to find $z_c$ = ______
Calculate the margin of error E:
$E = z_c \sqrt{\dfrac{\hat{p}\hat{q}}{n}} =$ ______
And our 90% confidence interval is:
$(\hat{p} - E,\ \hat{p} + E) =$ ______
So with 90% confidence, we can say that between ______% and ______% of adults
will be helped by the drug.
Exercise: Suppose that the drug company runs a larger study with a sample of 425
adults, and again 32% develop no symptoms. Construct a 98% confidence interval in
this case.
Determining Sample Size
As with our estimates of the mean, we often wish to estimate the size of sample
necessary to achieve a certain margin of error and level of confidence.
To determine a formula, we again solve our margin of error formula for n.
We know:
$E = z_c \sqrt{\dfrac{pq}{n}}$
multiplying by $\sqrt{n}$ yields:
$E \sqrt{n} = z_c \sqrt{pq}$
dividing by E we obtain:
$\sqrt{n} = \dfrac{z_c \sqrt{pq}}{E}$
finally, squaring both sides yields:
$n = \dfrac{z_c^2\, pq}{E^2}$
Notice that this formula depends on our knowing p and q in advance. If estimates are
known, then they may be used; otherwise, we use the fact that since $0 \le p \le 1$,
$pq = p(1 - p) \le .25$
Example: Suppose we want to estimate the percent of processors that are defective to
within 1% with 95% confidence. We can determine the necessary sample size as
follows.
If we have no prior estimate of p and q, then we use $pq = .25$. The desired
margin of error is $E = .01$, and since $c = .95$, $z_c = 1.96$. So the required sample size is:
$n = \dfrac{z_c^2\, pq}{E^2} = \dfrac{(1.96)^2 (.25)}{(.01)^2} = 9604$
So a sample of 9604 processors is necessary to obtain the desired accuracy.
Exercise: Repeat the above, but use the fact that our previous study showed
that $p \approx .064$.
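The sample-size formula for proportions can be sketched the same way in Python. The function and parameter names below are assumptions for illustration: when no prior estimate is supplied, the sketch falls back on the worst case $pq = .25$, and it rounds up as was done for the mean. Because $z_c$ is computed to full precision, results may occasionally differ by one from answers based on a rounded table value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(E, c, p_estimate=None):
    """n = z_c^2 * p * q / E^2, rounded up; uses pq = .25 when no estimate of p is given."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    pq = 0.25 if p_estimate is None else p_estimate * (1 - p_estimate)
    return ceil(z_c ** 2 * pq / E ** 2)

# Processor example: within 1% with 95% confidence, no prior estimate of p
print(sample_size_for_proportion(E=0.01, c=0.95))
```

Passing a prior estimate through `p_estimate` shows how much a preliminary study can reduce the required sample size.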
Using the TI-83 to Construct Confidence Intervals
The TI-83 can carry out the margin of error and confidence interval calculation given the
level of confidence c, the standard deviation $\sigma$, sample size n, and sample mean $\bar{x}$.
To start the program ZInterval, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight 7: ZInterval from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
If you know n, $\sigma$, and $\bar{x}$, then make sure Inpt: is set to Stats. If you want the TI-83 to
calculate from a data set, you would choose Data for Inpt:
Enter the values of $\sigma$, $\bar{x}$, n, and c.
When you are finished, highlight the word Calculate and press [ENTER]. The
confidence interval will then be calculated and appear on your screen along with $\bar{x}$
and n.
Example: Using the data from our first example: $\sigma = 18.4$, $\bar{x} = 182$, $n = 40$, and $c = .90$,
the ZInterval program gives: (177.21,186.79) as the confidence interval.
Using the TI-83 to Construct t-distribution Intervals
The TI-83 can be used to construct confidence intervals for the t-distributions. Again,
the level of confidence c, the sample standard deviation s, the sample size n, and the sample
mean $\bar{x}$ are required.
To start the program TInterval, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight 8: TInterval from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
As before, if n, s, and $\bar{x}$ are known, then set Inpt: to Stats, and if using a data set stored
in a list set Inpt: to Data.
Enter the values of $\bar{x}$, s, n, and c.
When you are finished, highlight the word Calculate and press [ENTER]. The
confidence interval will then be calculated and appear on your screen along with $\bar{x}$
and n.
Example: Using the data from our SAT example: $s = 154.5$, $\bar{x} = 522.8$, $n = 20$, and $c = .90$,
the TInterval program gives: (463.06,582.54) as the confidence interval.
Using the TI-83 to Construct Population Proportion Confidence Intervals
The TI-83 can be used to construct confidence intervals for population proportions.
The level of confidence c, the sample size n, and the number in the category from the
sample x are required.
To start the program 1-PropZInt, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight A: 1-PropZInt from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
Enter the values of x, n, and c.
If the value of $\hat{p}$ is known but not x, then calculate x from the formula $x = n\hat{p}$.
When you are finished, highlight the word Calculate and press [ENTER]. The
confidence interval will then be calculated and appear on your screen along with $\hat{p}$
and n.
Example: Using the data from our first example: $x = 37$, $n = 577$, and $c = .95$, the
1-PropZInt program gives: (.04414,.08411) as the confidence interval.