Statistics 2014, Fall 2001

Chapter 5 – Decision Making for Two Samples
Inference about Two Population Means
We want to compare the means of two populations to see whether they differ. There are two situations
to consider, as shown in the following examples:
1) In an experiment designed to study the effects of illumination level on task performance
(“Performance of Complex Tasks Under Different Levels of Illumination,” J. Illuminating
Engineering, 1976: 235-242), subjects were required to insert a fine-tipped probe into the eyeholes of
ten needles in rapid succession both for a low-light-level with a black background and for a higher
level with a white background. It is of interest to compare the mean times for completion of the task
under the two different conditions.
2) Compare the mean lifetime, $\mu_1$, for transistors produced by production line 1 to the mean lifetime,
$\mu_2$, for transistors produced by production line 2. We want to know whether these two means differ.
In the first case, we are comparing related means, using dependent samples. For each member of one
sample, there is a matched member of the other sample.
In the second case, we are comparing unrelated means, using independent samples. There is no natural
way to match each member of one sample with a member of the other sample.
We will use somewhat different procedures for hypothesis tests, depending on whether our samples are
dependent or independent.
There is another issue to be considered: are the variances of the two populations equal or unequal?
This issue, of course, did not arise with inference about a single population. We will see that the
procedure for inference about the difference between the means depends on the comparison of the
variances.
Comparing Two Means, Independent Samples
We will assume the following:
1) We have selected a random sample from each of the two populations. The r.s., of size $n_1$,
from population 1 will be denoted by $X_{11}, X_{12}, \ldots, X_{1n_1}$. These r.v.'s are assumed to
be i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$. The r.s., of size $n_2$, from population 2 will be
denoted by $X_{21}, X_{22}, \ldots, X_{2n_2}$. These r.v.'s are assumed to be i.i.d. with mean $\mu_2$
and variance $\sigma_2^2$.
2) The two populations are independent. This implies that all of the $n_1 + n_2$ r.v.'s listed
above are independent of each other.
3) Either both populations are normal, or the conditions of the Central Limit Theorem apply.
(We may also check for normality of each population using normal probability plots with the
samples of data.)
We want to estimate the difference, $\mu_1 - \mu_2$, between the population means. A logical point
estimator of this parameter is $\bar{X}_1 - \bar{X}_2$. It is easily shown that this statistic is an unbiased
estimator of the parameter. It is also easily shown that the variance of the estimator is
\[ V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]
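Both "easily shown" results follow from linearity of expectation and from the independence of the two samples. A brief sketch, using only the assumptions listed above (each $\bar{X}_i$ is the mean of $n_i$ i.i.d. r.v.'s, so $V(\bar{X}_i) = \sigma_i^2 / n_i$):

```latex
\begin{align*}
E(\bar{X}_1 - \bar{X}_2) &= E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2, \\
V(\bar{X}_1 - \bar{X}_2) &= V(\bar{X}_1) + (-1)^2 V(\bar{X}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{align*}
```

The variances add, rather than subtract, because the two sample means are independent and the coefficient $-1$ is squared.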
Given these results and the assumptions listed above, it is clear that the random variable
\[ \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
has an approximate standard normal distribution. We want to use
this fact to do inference about the difference between the two population means. However, the random
variable given above depends on two other unknown parameters. We need to estimate the two
population variances.
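The approximate normality claim can be checked informally by simulation. The following is a minimal sketch, not part of the original notes: the exponential populations, their means (3.0 and 2.0), the sample sizes (40 and 50), the number of replications, and the seed are all illustrative assumptions. Because the populations are simulated, their variances are known here, which is exactly what is not true in practice.

```python
import math
import random
import statistics


def standardized_difference(rng, n1, n2, mean1, mean2):
    """Draw one sample from each population and standardize the
    difference in sample means using the known population variances.

    Assumption for this sketch: both populations are exponential, so
    each population's standard deviation equals its mean.
    """
    sample1 = [rng.expovariate(1.0 / mean1) for _ in range(n1)]
    sample2 = [rng.expovariate(1.0 / mean2) for _ in range(n2)]
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    # Exponential population: sigma_i^2 = mean_i^2.
    std_error = math.sqrt(mean1**2 / n1 + mean2**2 / n2)
    return (diff - (mean1 - mean2)) / std_error


rng = random.Random(2014)  # fixed seed so the run is repeatable
z_values = [standardized_difference(rng, 40, 50, 3.0, 2.0) for _ in range(5000)]

z_mean = statistics.fmean(z_values)
z_var = statistics.pvariance(z_values)
within_196 = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"mean = {z_mean:.3f}, variance = {z_var:.3f}, "
      f"P(|Z| <= 1.96) = {within_196:.3f}")
```

If the standard normal approximation is reasonable, the simulated mean should be near 0, the variance near 1, and roughly 95% of the values should fall within 1.96 of zero.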
If we can assume equal population variances, then the following statistic:
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
may be used to construct a confidence interval for $\mu_1 - \mu_2$. Here the quantity
\[ S_P^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]
is the pooled variance estimate.
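As a rough illustration of how these two quantities are computed from summary statistics, here is a minimal Python sketch. The function names, the `delta` argument (a supposed value of the difference between the population means), and the numbers used are hypothetical and chosen only for illustration; they do not come from the notes.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """Pooled variance estimate S_P^2 from sample sizes and sample SDs."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_statistic(xbar1, xbar2, s1, s2, n1, n2, delta=0.0):
    """t statistic for a supposed value delta of mu1 - mu2,
    assuming equal population variances."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    std_error = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return ((xbar1 - xbar2) - delta) / std_error


# Hypothetical summary statistics, for illustration only.
sp2 = pooled_variance(10, 2.0, 12, 3.0)  # (9*4 + 11*9) / 20
t_stat = pooled_t_statistic(5.0, 3.5, 2.0, 3.0, 10, 12)
print(sp2, round(t_stat, 4))
```

Note that the pooled estimate is a weighted average of the two sample variances, with weights proportional to each sample's degrees of freedom, so it always lies between them.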
Confidence Intervals for Differences Between Population Means
We can find confidence interval estimates for the differences between two population means
(independent samples), where the two population variances are equal, using the following formula:
\[ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\text{d.f.}} \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}. \]
In this case, d.f. = $n_1 + n_2 - 2$.
Example: The accompanying table gives summary data on cube compressive strength (N/mm²) for
concrete specimens made with a pulverized fuel-ash mix (“A study of twenty-five-year-old pulverized
fuel ash concrete used in foundation structures,” Proceedings of the Institute of Civil Engineers,
Mar. 1985, 149-165). We want to estimate the difference, $\mu_7 - \mu_{28}$, in mean compressive
strengths, with 95% confidence, and interpret this interval estimate.
Age (days)    Sample Size    Sample Mean    Sample SD
7             68             26.99          4.89
28            74             35.76          6.43
Since the sample standard deviations do not differ considerably, we assume that the population
variances are equal. The pooled variance estimate is
\[ s_P^2 = \frac{(n_7 - 1)s_7^2 + (n_{28} - 1)s_{28}^2}{n_7 + n_{28} - 2} = \frac{67(4.89)^2 + 73(6.43)^2}{140} = 33.00206. \]
The critical value is $t_{140,\,0.025} = 1.9771$. Then the endpoints of the interval are
\[ (\bar{x}_7 - \bar{x}_{28}) - t_{140,\,0.025}\sqrt{s_P^2\left(\frac{1}{n_7} + \frac{1}{n_{28}}\right)} = (26.99 - 35.76) - 1.9771\sqrt{33.00206\left(\frac{1}{68} + \frac{1}{74}\right)} = -10.6780, \]
and
\[ (\bar{x}_7 - \bar{x}_{28}) + t_{140,\,0.025}\sqrt{s_P^2\left(\frac{1}{n_7} + \frac{1}{n_{28}}\right)} = (26.99 - 35.76) + 1.9771\sqrt{33.00206\left(\frac{1}{68} + \frac{1}{74}\right)} = -6.8620. \]
We are 95% confident that the difference between the mean compressive strength after 7 days of curing
and the mean compressive strength after 28 days of curing is between -10.6780 N/mm² and -6.8620
N/mm². In particular, since the entire interval lies below zero, we have a high level of confidence
that the mean 28-day compressive strength is higher than the mean 7-day compressive strength.
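The arithmetic in this example can be checked with a short Python sketch. The function name `pooled_t_interval` is hypothetical. The critical value 1.9771 is taken directly from the text rather than computed, since the Python standard library has no t quantile function.

```python
import math


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Equal-variance confidence interval for mu1 - mu2 from summary data.

    t_crit is the upper alpha/2 critical value of the t distribution with
    n1 + n2 - 2 d.f.; the caller supplies it (it is not computed here).
    Returns the pooled variance and the lower and upper endpoints.
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    diff = xbar1 - xbar2
    return sp2, diff - margin, diff + margin


# Summary data from the example; 1.9771 is the critical value quoted above.
sp2, lower, upper = pooled_t_interval(
    26.99, 4.89, 68, 35.76, 6.43, 74, t_crit=1.9771
)
print(f"pooled variance = {sp2:.5f}")
print(f"95% CI for mu_7 - mu_28: ({lower:.4f}, {upper:.4f})")
```

The printed values should agree with the pooled variance and interval endpoints given in the example, up to rounding.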