Estimates of Population Parameters

STA 2023 Elementary Statistics
Lecture Notes
Chapter 6 – Confidence Intervals
Professor Achenbach
Estimates of Population Parameters
One of the primary uses of statistics is to estimate population parameters when the
population is too large for a census to be practical. To accomplish this, a random sample
of values from the population data set is drawn and the sample statistic calculated.
Types of Estimates
Point Estimate – a single-value estimate of the population parameter.
Examples:
The mean height of American men is 69 inches.
65% of New Jersey residents support a ban on cell phone use while
driving.
Interval Estimate – an interval estimated to contain the value of a population parameter.
Examples:
The mean height of American men is between 67.5 and 70.5 inches.
65% (± 3%) of New Jersey residents support a ban on cell phone use
while driving.
Level of Confidence
A point estimate is almost sure to differ from the actual population parameter (at least
slightly), while a good interval estimate can be quite likely to contain the population
parameter.
The level of confidence for an interval estimate is the probability that the interval
contains the population parameter. The level of confidence is denoted by c.
The most commonly used values of c are .90, .95, and .99.
Examples:
The mean height of American men is between 67.5 and 70.5 inches with a level of
confidence of $c = .90$.
With 95% confidence, we can say that 65% (± 3%) of New Jersey residents
support a ban on cell phone use while driving.
Note: The level of confidence is a probability for the experiment of drawing a sample
and constructing an interval estimate.
The first example above can be interpreted as follows:
Whenever a sample of American men is drawn and an interval estimate is
constructed from the sample data, 90% of the samples will yield an interval
estimate that actually contains the mean height of all men.
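The interpretation above can be checked with a small simulation. The sketch below is only an illustration under assumed values: it supposes a normal population with mean 69 inches and standard deviation 3 inches (both hypothetical, not taken from these notes), draws many samples of 40, and builds a 90% interval from each using the normal-based margin of error developed later in this chapter. The function name `coverage_rate` is likewise made up for this example.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(mu, sigma, n, c, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # critical z-score for confidence level c
    margin = z_c * sigma / n ** 0.5           # same margin for every sample (sigma assumed known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mean 69 inches, standard deviation 3 inches.
rate = coverage_rate(mu=69, sigma=3, n=40, c=0.90, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.90. Any single interval either contains the population mean or it does not; the 90% describes how often the procedure succeeds over many samples.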
Estimating the Mean
Point Estimate – If we wish to estimate the population mean $\mu$ for a random variable x
using a sample of values, the best possible point estimate is just the sample mean $\bar{x}$.
Interval Estimate – We construct an interval estimate for $\mu$ by starting with the sample
mean $\bar{x}$ and adding a margin of error denoted by E both above and below $\bar{x}$. We will
then have an interval estimate of the form:
$$(\bar{x} - E,\ \bar{x} + E)$$
Example: For the estimate of men’s heights given above, the sample average $\bar{x}$ was 69
inches and a margin of error of $E = 1.5$ inches was used to get the interval estimate
$$(\bar{x} - E,\ \bar{x} + E) = (69 - 1.5,\ 69 + 1.5) = (67.5,\ 70.5)$$
Note: The margin of error used for an interval estimate depends on the level of
confidence desired. A larger level of confidence will result in a larger margin of error,
and hence a wider interval. Usually, the level of confidence is selected, and then the
corresponding margin of error necessary is calculated.
Calculating Margin of Error with a Large Sample
If the random variable x is normally distributed (with a known standard deviation $\sigma$) or
if the sample size n is at least 30, then since we will be drawing a random sample and
taking the sample mean $\bar{x}$, we can apply the Central Limit Theorem. In this case, the
Central Limit Theorem guarantees us that:
$$\mu_{\bar{x}} = \mu$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
$\bar{x}$ is approximately normally distributed
Notice that the mean value of $\bar{x}$ is equal to $\mu$, the population mean we are trying to
estimate. Given the desired level of confidence c, we are now trying to find the amount
of error E necessary to ensure that the probability of $\bar{x}$ being within E of the mean is c.
There are always two critical z-scores $\pm z_c$ which give the appropriate probability for the
standard normal distribution (see diagram in book p. 271), and so the corresponding
margin of error for the distribution of $\bar{x}$ is just $z_c \cdot \sigma_{\bar{x}}$, or:
$$E = z_c \frac{\sigma}{\sqrt{n}}$$
Note: Usually $\sigma$ is not known, but if $n \ge 30$, then the sample standard deviation s is
generally a reasonable estimate.
Example: Suppose a random sample of 40 American men is drawn and the weights of
the men measured. If the sample mean is $\bar{x} = 182$ lbs. and sample standard deviation is
$s = 18.4$ lbs., then we can construct a 90% confidence interval for the mean weight of the
population of American men as follows.
Since $c = .90$ we look up the value $\frac{1 - c}{2} = .05$ in the standard normal table and find that it
corresponds to a z-score of $z = -1.645$, thus we will use 1.645 for $z_c$. The margin of
error can now be calculated using the formula:
$$E = z_c \frac{\sigma}{\sqrt{n}} \approx (1.645) \cdot \frac{18.4}{\sqrt{40}} \approx 4.8$$
Therefore, based on our sample data we are 90% confident that the average weight of all
American men is within 4.8 lbs. of our sample average. From this we get that our 90%
confidence interval is:
$$(\bar{x} - E,\ \bar{x} + E) = (182 - 4.8,\ 182 + 4.8) = (177.2,\ 186.8)$$
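The calculation in this example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the course materials: the function name `z_interval` is made up for illustration, and the critical z-score comes from the standard library's `NormalDist` rather than from a printed table, so it is not rounded to 1.645.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sd, n, c):
    """Return (margin of error, lower bound, upper bound) for the mean."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # area (1 - c)/2 is left in each tail
    margin = z_c * sd / sqrt(n)              # E = z_c * sigma / sqrt(n), with s standing in for sigma
    return margin, x_bar - margin, x_bar + margin


# Men's weights: sample mean 182 lbs, s = 18.4 lbs, n = 40, c = .90
margin, low, high = z_interval(x_bar=182, sd=18.4, n=40, c=0.90)
print(f"E = {margin:.1f}, interval = ({low:.1f}, {high:.1f})")
```

Rounded to one decimal place, this agrees with the margin of 4.8 lbs. and the interval (177.2, 186.8) found by hand.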
Exercise: Construct a 95% confidence interval from the above data.
Exercise: A survey of 100 Tampa commuters finds that the average commuting time to
work is $\bar{x} = 25.5$ minutes with a standard deviation of $s = 11.5$ minutes. Construct a 98%
confidence interval for the mean commuting time of Tampa commuters.
Exercise: Repeat the above, but suppose the sample size had been only 50 commuters.
What happens to the margin of error?
Example: From previous examples we know that the heights of American women ages
20-29 are normally distributed with a standard deviation of $\sigma = 2.75$ inches.
If we did not know the mean height of all women, we could use the following sample of
5 women’s heights to estimate $\mu$:
67, 63, 64, 65, 63
The sample average is $\bar{x} = 64.4$ inches for this sample. We can now construct a 95%
confidence interval for $\mu$. Since a probability of .025 corresponds to a z-score of
$z = -1.96$, we will use 1.96 for $z_c$. We again use the formula:
$$E = z_c \frac{\sigma}{\sqrt{n}} = (1.96) \cdot \frac{2.75}{\sqrt{5}} \approx 2.4$$
to get a margin of error of 2.4 inches. The corresponding 95% confidence interval is:
$$(\bar{x} - E,\ \bar{x} + E) = (64.4 - 2.4,\ 64.4 + 2.4) = (62,\ 66.8)$$
So based on our sample we are 95% confident that $62 < \mu < 66.8$. Note that this interval
contains the previously given value $\mu = 63.5$. If many samples of 5 women were drawn,
95% of the sample means would be within 2.4 inches of 63.5.
Determining Sample Size
Note: Given a confidence level c and standard deviation $\sigma$, drawing a larger sample will
decrease the margin of error.
If a desired margin of error and level of confidence are known, and if an estimate of the
standard deviation can be made, then the sample size necessary can be determined by
solving the error formula for n.
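We know:
$$E = z_c \frac{\sigma}{\sqrt{n}}$$
multiplying by $\sqrt{n}$ yields:
$$E \sqrt{n} = z_c \sigma$$
dividing by E we obtain:
$$\sqrt{n} = \frac{z_c \sigma}{E}$$
finally, squaring both sides yields:
$$n = \left( \frac{z_c \sigma}{E} \right)^2$$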
Example: Suppose we want to estimate the mean weight of American men, and we want
to be 95% confident that our estimate is within 2 lbs. of the actual mean. Since our
previous study allows us to estimate that the standard deviation of men’s weights is
$\sigma = 18.4$ pounds, we can use the formula above to determine the appropriate sample size.
From the given information, we have $c = .95$, so $z_c = 1.96$, the maximum desired error
is $E = 2$, and we estimated that $\sigma = 18.4$, so:
$$n = \left( \frac{z_c \sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 18.4}{2} \right)^2 \approx 325.15$$
Rounding up, a sample of 326 men should give the desired level of accuracy.
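The sample-size formula can be wrapped in a short helper. The sketch below is illustrative only: `sample_size_for_mean` is a made-up name, and the critical z-score is computed with `NormalDist` instead of being read from a table, so it is slightly more precise than 1.96. The result is always rounded up, since a fractional observation cannot be drawn and rounding down would miss the target margin of error.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, max_error, c):
    """Smallest n with z_c * sigma / sqrt(n) <= max_error."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / max_error) ** 2)  # n = (z_c * sigma / E)^2, rounded up


# Men's weights: sigma estimated as 18.4 lbs, E = 2 lbs, c = .95
n_needed = sample_size_for_mean(sigma=18.4, max_error=2, c=0.95)
print(n_needed)
```

For the men's weights example this gives the same sample size of 326 found by hand.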
Exercise: Calculate how many Tampa commuters would need to be surveyed in order to
estimate the mean commuting time to within 1 minute with 99% confidence.
The Student t-distributions
In an earlier example we were able to construct a confidence interval with a small sample
($n = 5$) because the variable women’s heights is normally distributed. Unfortunately, it
was necessary that we knew the population standard deviation $\sigma$. It is very unlikely in
any real-life situation that we would know $\sigma$ and yet be trying to estimate $\mu$.
A good solution would be to use the sample standard deviation s from our small sample
to estimate $\sigma$, but the estimation of $\sigma$ from a small sample is generally not accurate
enough for use in the normal distribution test.
If x is normally distributed, then the distribution of
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
is the standard normal distribution for any sample size n, but the distribution of
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
is not. The distribution of the random variable t above is the Student t-distribution with
$n - 1$ degrees of freedom.
Properties of t-distributions
The t-distributions are a family of probability density functions. For each possible degree
of freedom, there is a unique t-distribution with that degree of freedom.
Like the standard normal distribution, the t-distributions are symmetric, bell-shaped
probability density functions with a mean of 0. However, the t-distribution is a wider bell
curve with thicker tails than the standard normal curve.
As the degrees of freedom increase, the t-distributions become closer to a normal
distribution. Thus for large sample sizes ($n \ge 30$), we can use s in place of $\sigma$, and then
use the standard normal distribution.
See diagram and box on page 284 of the text.
Estimating the Mean Using a t-distribution
The process of constructing a confidence interval using a t-distribution is almost identical
to that used to construct confidence intervals using the standard normal distribution.
First we must know that the variable x is normally distributed with unknown standard
deviation $\sigma$ and that we will draw a small sample ($n < 30$).
We then choose c, the desired level of confidence, and calculate the statistics $\bar{x}$ and s
from our sample group.
The sample mean $\bar{x}$ will again be the best point estimate and the center of our interval.
We can then calculate the margin of error for our estimate using the formula:
$$E = t_c \frac{s}{\sqrt{n}}$$
where $t_c$ is the critical t-value corresponding to the level of confidence c. The values of
$t_c$ for common values of c are given in a table in the front of your text. Make sure to use
$n - 1$ degrees of freedom.
Note: $t_c > z_c$ for the same value of c since the t-distribution is wider, so we get a larger
margin of error using the t-distribution.
Example: Suppose we had our sample of 5 women’s heights from the previous example:
67, 63, 64, 65, 63
If we knew that women’s heights were normally distributed, but did not know that
$\sigma = 2.75$ inches, then we would use the sample standard deviation s as our estimate of
$\sigma$, and then use a t-distribution interval.
The sample mean is $\bar{x} = 64.4$ inches and the sample standard deviation is $s \approx 1.67$ inches.
For 95% confidence, the critical t-score for 4 degrees of freedom is $t_c = 2.776$.
So:
$$E = t_c \frac{s}{\sqrt{n}} = (2.776) \cdot \left( \frac{1.67}{\sqrt{5}} \right) \approx 2.07$$
and so our 95% confidence interval is:
$$(64.4 - 2.07,\ 64.4 + 2.07) = (62.33,\ 66.47)$$
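The same steps can be sketched in Python. Python's standard library has no t-distribution, so this illustration takes the critical value $t_c = 2.776$ from the table as an input, just as in the hand calculation. The function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_c):
    """Return (x_bar, s, margin, lower, upper) for a small normal sample.

    t_c must be looked up for n - 1 degrees of freedom; it is not computed here.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)            # sample standard deviation (divides by n - 1)
    margin = t_c * s / sqrt(n)   # E = t_c * s / sqrt(n)
    return x_bar, s, margin, x_bar - margin, x_bar + margin


heights = [67, 63, 64, 65, 63]
x_bar, s, margin, low, high = t_interval(heights, t_c=2.776)
print(f"x_bar = {x_bar:.1f}, s = {s:.3f}, E = {margin:.3f}")
print(f"interval = ({low:.2f}, {high:.2f})")
```

Because the code keeps s unrounded (about 1.673 rather than 1.67), its margin of error and interval endpoints differ from the hand calculation by a little under 0.01.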
Exercise: Construct a 99% confidence interval from the above data.
Exercises: SAT Math Scores are normally distributed. A sample of scores for 20
students has sample mean of $\bar{x} = 522.8$ with a sample standard deviation of $s = 154.5$.
Calculate the 90% confidence interval for the mean SAT Math Score.
Suppose the same sample mean and sample standard deviation had been obtained
from a sample of size 16. What would the 90% confidence interval be?
Suppose the same sample mean and sample standard deviation were obtained
from a sample of size 50. What would the 90% confidence interval be?
Estimating a Population Proportion
Often, we wish to estimate what portion or percentage of a population falls into a given
category.
Examples:
What percent of Florida voters plan to vote for the Democratic candidate for
senate?
What percent of people have Type A+ blood?
What percent of computer processors are defective?
Such a percentage is called a population proportion and is represented by the variable p.
The proportion calculated in a sample group is denoted by p̂ and is the best point
estimate for p. If we have a sample of size n where x of the sample members are in the
category being measured, then we calculate p̂ for our sample by using:
$$\hat{p} = \frac{x}{n}$$
Example: In a sample of 577 computer processors, 37 were found to be defective. Thus
if p̂ represents the proportion of defective processors, then:
$$\hat{p} = \frac{x}{n} = \frac{37}{577} \approx .064$$
So we might estimate that 6.4% of processors are defective.
The Sampling Distribution of p̂
We construct interval estimates for p in much the same way as our confidence intervals
for a mean. We can calculate p̂ and use it as the center of our interval and then add a
margin of error above and below p̂ .
The experiment of drawing a sample of n objects and counting the number x in the
desired category is a binomial experiment with n trials and probability of success p on
each trial (as long as the population is very large compared to n). If a sample is
sufficiently large however, then the proportion of successes p̂ will be
approximately normally distributed. More specifically:
If an n trial binomial experiment is conducted with probability of success p, and if
$np \ge 5$ and $nq \ge 5$ (where $q = 1 - p$), then the distribution of the random variable p̂ is
approximately normal with:
$$\mu_{\hat{p}} = E(x/n) = \frac{E(x)}{n} = \frac{np}{n} = p$$
and
$$\sigma_{\hat{p}} = \sigma(x/n) = \frac{\sigma(x)}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}$$
Constructing Confidence Intervals for p
From the above we see that we can use the normal distribution to construct confidence
intervals for p. As before, we first decide the desired level of confidence c, and then find
the critical z-score $z_c$. The margin of error is then found by a similar formula as before:
$$E = z_c \sigma_{\hat{p}} = z_c \sqrt{\frac{pq}{n}}$$
Note: p is the quantity we are trying to estimate, so it is actually unknown, but once the
sample is drawn, we can use p̂ as our estimate for p and $\hat{q} = 1 - \hat{p}$ as our estimate for q.
So in practice, the formula for margin of error in a population proportion estimate is:
$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Example: In a study of a microchip manufacturer, 37 out of a sample of 577 processors
were found to have defects. We can construct a 95% confidence interval for the
percentage of defective processors as follows:
First we calculate $\hat{p} = \frac{x}{n} = \frac{37}{577} \approx .064$, and so $\hat{q} = 1 - \hat{p} = 1 - .064 = .936$,
and check that $n\hat{p} = x = 37 \ge 5$ and $n\hat{q} = n - x = 540 \ge 5$.
Now since $c = .95$, we have $z_c = 1.96$.
Our margin of error is thus:
$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(.064)(.936)}{577}} \approx .02$$
or 2%.
Our 95% confidence interval is:
$$(\hat{p} - E,\ \hat{p} + E) = (.064 - .02,\ .064 + .02) = (.044,\ .084)$$
So we are 95% confident that the percentage of defective processors is between 4.4% and
8.4%.
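The steps of this example, including the check that the normal approximation applies, can be collected into one function. This is a sketch under the same assumptions as the notes; `proportion_interval` is a hypothetical name, and the critical z-score is computed with `NormalDist` rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, c):
    """Return (p_hat, margin, lower, upper) for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation is only used when both counts are at least 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("need n*p_hat >= 5 and n*q_hat >= 5")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sqrt(p_hat * q_hat / n)   # E = z_c * sqrt(p_hat * q_hat / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


# Defective processors: x = 37 out of n = 577, c = .95
p_hat, margin, low, high = proportion_interval(x=37, n=577, c=0.95)
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}, interval = ({low:.3f}, {high:.3f})")
```

To three decimal places this matches the hand calculation: a margin of about .02 and an interval of roughly (.044, .084).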
Exercise: Calculate the 99% confidence interval for this example.
Exercise: A new drug is tested on a sample of 75 adults who were infected with a cold
virus. 32% of the adults in the sample developed no symptoms. Construct a 90%
confidence interval for the proportion of adults who will be prevented from getting a cold
by the drug.
Exercise: Suppose that the drug company runs a larger study with a sample of 425
adults, and again 32% develop no symptoms. Construct a 98% confidence interval in this
case.
Determining Sample Size
As with our estimates of the mean, we often wish to estimate the size of sample necessary
to achieve a certain margin of error and level of confidence.
To determine a formula, we again solve our margin of error formula for n.
We know:
$$E = z_c \sqrt{\frac{pq}{n}}$$
multiplying by $\sqrt{n}$ yields:
$$E \sqrt{n} = z_c \sqrt{pq}$$
dividing by E we obtain:
$$\sqrt{n} = \frac{z_c \sqrt{pq}}{E}$$
finally, squaring both sides yields:
$$n = \frac{z_c^2\, pq}{E^2}$$
Notice that this formula depends on our knowing p and q in advance. If estimates are
known, then they may be used; otherwise, we use the fact that since $0 \le p \le 1$,
$$pq = p(1 - p) \le .25$$
Example: Suppose we want to estimate the percent of processors which are defective to
within 1% with 95% confidence. We can determine the necessary sample size as
follows.
If we have no prior estimate of p and q, then we use $pq = .25$. The desired
margin of error is $E = .01$, and since $c = .95$, $z_c = 1.96$. So the required sample size is:
$$n = \frac{z_c^2\, pq}{E^2} = \frac{(1.96)^2 (.25)}{(.01)^2} = 9604$$
So a sample of 9604 processors is necessary to obtain the desired accuracy.
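A short helper makes the role of the worst-case value $pq = .25$ explicit. This is an illustrative sketch: `sample_size_for_proportion` and its optional `p_estimate` argument are made-up names, and the critical z-score is computed with `NormalDist` rather than rounded to 1.96. When no prior estimate is supplied, the function falls back on $pq = .25$.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(max_error, c, p_estimate=None):
    """Smallest n with z_c * sqrt(pq / n) <= max_error."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    # Without a prior estimate of p, use the largest possible value of pq.
    pq = 0.25 if p_estimate is None else p_estimate * (1 - p_estimate)
    return ceil(z_c ** 2 * pq / max_error ** 2)  # n = z_c^2 * pq / E^2, rounded up


# Defective processors: E = .01, c = .95, no prior estimate of p
n_needed = sample_size_for_proportion(max_error=0.01, c=0.95)
print(n_needed)
```

With no prior estimate this gives the 9604 found above. Supplying a prior estimate can only lower the requirement, since pq is never larger than .25.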
Exercise: Repeat the above, but use the fact that our previous study showed
that $p \approx .064$.
Using the TI-83 to Construct Confidence Intervals
The TI-83 can make the margin of error and confidence interval calculation given the
level of confidence c, the standard deviation $\sigma$, sample size n, and sample mean $\bar{x}$.
To start the program ZInterval, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight 7: ZInterval from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
If you know n, $\sigma$, and $\bar{x}$, then make sure Inpt: is set to Stats. If you want the TI-83 to
calculate from a data set, you would choose Data for Inpt:
Enter the values of $\sigma$, $\bar{x}$, n, and c.
When you are finished highlight the word Calculate and press [ENTER]. The confidence
interval will then be calculated and appear on your screen along with $\bar{x}$ and n.
Example: Using the data from our first example: $\sigma = 18.4$,
$\bar{x} = 182$, $n = 40$, and $c = .90$, the ZInterval program gives: (177.21,186.79) as the
confidence interval.
Using the TI-83 to Construct t-distribution Intervals
The TI-83 can be used to construct confidence intervals for the t-distributions. Again, the
level of confidence c, the sample standard deviation s, the sample size n, and the sample mean
$\bar{x}$ are required.
To start the program TInterval, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight 8: TInterval from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
As before, if n, s, and $\bar{x}$ are known, then set Inpt: to Stats, and if using a data set stored
in a list set Inpt: to Data.
Enter the values of $\bar{x}$, s, n, and c.
When you are finished highlight the word Calculate and press [ENTER]. The confidence
interval will then be calculated and appear on your screen along with $\bar{x}$ and n.
Example: Using the data from our SAT example: $s = 154.5$, $\bar{x} = 522.8$, $n = 20$, and
$c = .90$, the TInterval program gives: (463.06,582.54) as the confidence interval.
Using the TI-83 to Construct Population Proportion Confidence Intervals
The TI-83 can be used to construct confidence intervals for population proportions. The
level of confidence c, the sample size n, and the number in the category from the sample
x are required.
To start the program 1-PropZInt, do the following:
Press [STAT]
Use the arrow keys to highlight the TESTS Menu
Highlight A: 1-PropZInt from the list and press [ENTER]
A menu now appears on the screen. You can use the down and up arrows to highlight
different entries.
Enter the values of x, n, and c.
If the value of p̂ is known but not x, then calculate x from the formula $x = n\hat{p}$.
When you are finished highlight the word Calculate and press [ENTER]. The confidence
interval will then be calculated and appear on your screen along with p̂ and n.
Example: Using the data from our first example: $x = 37$, $n = 577$, and $c = .95$, the
1-PropZInt program gives: (.04414,.08411) as the confidence interval.