 

Confidence Intervals Based on the t-Distribution
You have learned that a confidence interval for a population mean μ has the form: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
One huge problem with this expression is that if the population mean μ is unknown and needs to
be estimated, it is not likely that the population standard deviation σ would be known. A natural
solution is to simply use the sample standard deviation s as an estimate for the population
standard deviation σ. So it would seem reasonable to use $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ to obtain a confidence
interval for μ. But does this expression actually produce an interval that will succeed in
capturing the population mean μ 100(1-α)% of the time? We will examine this question using
simulation in Example 1 below.
Example 1: Chest Measurements of Militiamen
The file Militia.MTW on the FacShare drive contains chest measurements (in inches) for
5738 Scottish militiamen in the early 19th century. We will treat these observations as our
population, take samples of size n = 5 from it, and form 95% confidence intervals for the
population mean.
(a) Use Stat > Basic Statistics > Display Descriptive Statistics
to find the population mean and standard deviation. Record them below with appropriate
symbols. Also make a histogram of the population distribution. Does the population
distribution appear to be approximately normal?
(b) For this population, is it appropriate to use the Central Limit Theorem to approximate the
distribution of $\bar{X}$ even for a very small sample size like n = 5? Explain.
(c) We will now use the Minitab macro militia.mac (found on Facshare) to simulate selecting
1000 samples of size n = 5 from this population. For each of these 1000 samples, we will
compute a confidence interval for the mean using the formula suggested above, and then we'll
count how many of those intervals contain the actual population mean μ. Copy the macro
from Facshare to your desktop and then type a command like the following in Minitab
(replacing bgill with your username): %'C:\Users\bgill\Desktop\militia'
(type a % sign followed by the full path name of the location of the file in single quotes). It
might be easiest to save the macro to a USB drive where the path will be something simpler
like %'D:\militia'. Enter 5 for the sample size and 1000 for the number of samples to
generate. The macro will select 1000 samples of size 5 and compute the mean and standard
deviation for each sample. It will also produce histograms of the simulated sampling
distributions of $\bar{x}$ and s. Does the sampling distribution of $\bar{x}$ appear to be approximately
normal? What about the sampling distribution of s?
(d) Now create the bounds for the expression $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ from each of your 1000 samples, and
count how many of these (alleged) 95% confidence intervals succeed in capturing the
population mean μ:
MTB> let c5=c3-1.96*c4/sqrt(5)
MTB> let c6=c3+1.96*c4/sqrt(5)
MTB> let c7=(c5<39.832 & c6>39.832)
MTB> tally c7
What percentage of your 1000 intervals succeed in capturing the population mean? Is it
close to 95%?
You should have found that this procedure does not produce intervals that succeed in capturing μ
95% of the time, so we need to adjust the procedure. The adjustment needed is to use a critical
value not from the z-distribution (standard normal) but rather from what is called a t-distribution
(you will receive a separate handout with more discussion of the t-distribution). These critical
values for the t-distribution will depend on both the desired confidence level and on our sample
size. To find a confidence interval based on a sample of size n, we will use what is called the
t-distribution with n-1 degrees of freedom. The critical value for a 100(1-α)% confidence interval
will be denoted by $t_{n-1,\alpha/2}$.
A 100(1-α)% confidence interval for μ is given by: $\bar{x} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$. This procedure is
appropriate when one has a simple random sample and when either the population has a normal
distribution or the sample size is large (n ≥ 30 as a rule-of-thumb).
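As a brief sketch of why the t critical value is the right adjustment, assume a simple random sample from a normal population. The symbol T below is introduced here only for illustration. The standardized sample mean computed with s in place of σ follows a t-distribution with n-1 degrees of freedom, and rearranging the probability statement for T gives the interval:

```latex
T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}
\quad\Longrightarrow\quad
P\left(-t_{n-1,\alpha/2} \le T \le t_{n-1,\alpha/2}\right) = 1 - \alpha
\quad\Longleftrightarrow\quad
P\left(\bar{x} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha
```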
(e) Use Minitab to find the critical value $t_{n-1,\alpha/2}$ for a 95% confidence interval for μ with a
sample of size n = 5. Note that this means that we will have n – 1 = 4 degrees of freedom.
Also, for a 95% confidence interval, we have 1 – α = 0.95, so α = 0.05 and α/2 = 0.025. We
can find the critical value using a process very similar to what we did for critical values for
the normal distribution:
• Select Graph > Probability Distribution Plot from the menu.
• Select View Probability and click OK.
• Under Distribution, select t. Then enter 4 as the Degrees of freedom (n – 1 = 4).
• Click on the Shaded Area tab. Under Define Shaded Area By, select Probability.
  Then select Middle and enter 0.025 for both Probability 1 and Probability 2. Click
  OK.
Record the resulting critical value $t_{4,0.025}$ below.
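For readers who want to cross-check the Minitab value outside of Minitab, the following is a minimal Python sketch using only the standard library. It is not part of the original lab. The function names (t_pdf, t_central_prob, t_critical) are hypothetical helpers written for this illustration: they integrate the t density numerically with Simpson's rule and then search by bisection for the value that leaves probability 1 – α in the middle.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_central_prob(t, df, steps=2000):
    """P(-t <= T <= t): Simpson's rule on [0, t], doubled by symmetry."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(df, alpha, lo=0.0, hi=100.0):
    """Bisection search for t with central probability 1 - alpha."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence with n = 5, so 4 degrees of freedom and alpha = 0.05
print(round(t_critical(4, 0.05), 3))
```

The printed value should agree with the critical value that Minitab reports to three decimal places. Calling t_critical with other degrees of freedom and α values gives the critical values needed in the later examples.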
(f) Use this t-interval procedure to form 95% confidence intervals from your 1000 samples:
MTB> let c8=c3-2.776*c4/sqrt(5)
MTB> let c9=c3+2.776*c4/sqrt(5)
MTB> let c10=(c8<39.832 & c9>39.832)
MTB> tally c10
What percentage of your 1000 intervals succeed in capturing the population mean? Is it
close to 95%?
You should find that this new confidence interval formula using the sample standard deviation s
along with the t-distribution works much better than the formula using the standard normal
distribution.
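The comparison in Example 1 can also be sketched outside of Minitab. The Python code below is an illustration only and makes assumptions that are not in the handout: it does not read Militia.MTW, but instead draws samples from a normal population whose mean is the 39.832 used in the Minitab commands and whose standard deviation (2.0) is a hypothetical value. The function name coverage is also hypothetical. More samples are used than in the lab (20000 rather than 1000) so that the estimated percentages are more stable.

```python
import math
import random

# Assumptions for illustration (not from the data file): a normal population
# with the mean used in the Minitab commands and a hypothetical SD of 2.0.
POP_MEAN = 39.832
POP_SD = 2.0

def coverage(critical_value, n=5, num_samples=20000, seed=1):
    """Fraction of intervals xbar +/- critical_value * s / sqrt(n) that contain POP_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation with the n - 1 divisor
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = critical_value * s / math.sqrt(n)
        if xbar - margin < POP_MEAN < xbar + margin:
            hits += 1
    return hits / num_samples

z_cov = coverage(1.96)    # z critical value with s in place of sigma
t_cov = coverage(2.776)   # t critical value with 4 degrees of freedom
print(f"z-based coverage: {z_cov:.3f}")
print(f"t-based coverage: {t_cov:.3f}")
```

Under these assumptions the z-based intervals should capture the mean noticeably less often than 95% of the time (theory suggests roughly 88% for n = 5), while the t-based intervals should come out close to 95%.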
Example 2: ACT Scores
Thirteen randomly selected juniors at a university are given the ACT. The mean and standard
deviation for their scores are 26 and 4, respectively. Find a 95% confidence interval for the
mean score of all juniors at the university. Note that scores on standardized exams like the
ACT are normally distributed.
(a) Find the critical value $t_{n-1,\alpha/2}$ needed to form a 95% confidence interval for the population
mean μ based on this sample of size n = 13. (Remember to use n – 1 as the degrees of
freedom).
(b) Find a 95% confidence interval for the mean ACT score for all juniors at the university.
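One way to check the arithmetic for Example 2 is the short Python sketch below. It is not part of the original exercise. The critical value 2.179 is the standard tabulated t value for 12 degrees of freedom with 0.025 in the upper tail and should be confirmed in Minitab as part (a) asks. The variable names are chosen only for this illustration.

```python
import math

n = 13        # sample size
xbar = 26.0   # sample mean ACT score
s = 4.0       # sample standard deviation
t_crit = 2.179  # tabulated t critical value: 12 df, upper-tail area 0.025

# margin of error and interval endpoints: xbar +/- t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints 95% CI: (23.58, 28.42)
```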
Example 3: A pharmaceutical company checks the potency of the drugs it manufactures by
chemical analysis. It selects a random sample of 16 cephalothin crystals manufactured in the last
week and finds the following values for the release potency of the crystals:
897  914  913  906  916  918  905  921
918  906  895  893  908  906  907  901
The data can be found in the file Cephalothin.MTW on Facshare.
(a) Find the mean and standard deviation of the sample data. Also examine a histogram of the
data and comment on its key features.
(b) Find a 99% confidence interval for the mean release potency of all cephalothin crystals
manufactured in the last week. Use Minitab to find the critical value $t_{n-1,\alpha/2}$ and then show
the details of computing the interval.
(c) Use Minitab to verify your computations from part (b). Select Stat > Basic Statistics > One
Sample t. Double click on the column containing the data to select it as the variable, and use
the Options button to set the confidence level to 99% then click OK twice.
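As a second cross-check on parts (a) through (c), the computation can be sketched in Python with the standard library. This is an illustration rather than the lab's method. The list name potency is hypothetical, and the critical value 2.947 is the standard tabulated t value for 15 degrees of freedom with 0.005 in the upper tail, which should match what Minitab reports.

```python
import math
import statistics

# The 16 release potency values listed in Example 3
potency = [897, 914, 913, 906, 916, 918, 905, 921,
           918, 906, 895, 893, 908, 906, 907, 901]

n = len(potency)
xbar = statistics.mean(potency)   # sample mean
s = statistics.stdev(potency)     # sample standard deviation (n - 1 divisor)
t_crit = 2.947  # tabulated t critical value: 15 df, upper-tail area 0.005

# 99% confidence interval: xbar +/- t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"mean = {xbar:.2f}, s = {s:.3f}")
print(f"99% CI: ({lower:.2f}, {upper:.2f})")
```

The endpoints should agree with the Minitab One Sample t output up to rounding of the critical value.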
(d) The release potency is supposed to be 910. Based on the confidence interval, does it seem
plausible that the population mean for all cephalothin crystals produced in the last week
could be 910? Does this data provide strong evidence that the mean release potency is not
910? Explain.
(e) Comment on whether this t-procedure is valid here. [Hints: First ask whether the sample is a
random one from the population. Then ask whether the sample size is large or whether the
data appear to be normally distributed.]
(f) What percentage of the crystals from the sample fall within the interval that you computed?
Is this percentage close to 99%? Should it be? Explain.
This last question should remind you that the confidence interval estimates the value of the
population mean. If we repeatedly take random samples and form a confidence interval like this,
then in the long run 99% of those intervals would contain the actual value of the population
mean. The interval does not aim to estimate the value of an individual observation, however.
(g) What effect would it have on the confidence intervals and the questions above if the first
value in the data (897) was replaced by 807?