Calculating a confidence interval for difference between two means

As an example, we will use Spring 2007 exam 1 scores from two different delivery
methods of Stat200: Blended (B) and Resident (R).
First, understand that the purpose of a confidence interval is to estimate some unknown
value, e.g. a population mean, a proportion, or a mean difference. In this example we are
estimating the difference between two population means.
Recall that the basic formula for a confidence interval is as follows:
Sample Statistic ± Multiplier*Standard Error
In our case the sample statistic will be the difference between the two sample means; the
multiplier will come from the t-distribution, and the standard error will be determined by
whether or not we can assume that the two population variances are equal. If they are
equal we can “pool” the variances together. We will start with the sample statistic.
From the basic statistics Minitab reports for the data, the sample mean is 75.2 for
Blended and 76.46 for Resident. Therefore:
$$\bar{X}_B - \bar{X}_R = 75.2 - 76.46 = -1.26$$
Now we have the following:
- 1.26 ± Multiplier*Standard Error
Moving on to the multiplier: since we use the t-distribution we need to figure out the
degrees of freedom (DF), which in a two-sample means study requires deciding whether the
population variances are equal. The calculation of degrees of freedom differs between
pooled (easy to calculate) and unpooled (ugh!). If we can assume the variances are equal,
then DF is simply found by adding the two sample sizes and subtracting two, or:
$N_B + N_R - 2$.
The general rule of thumb to determine equal variances for two populations is to compare
the two sample standard deviations. If the ratio of the larger to the smaller is less than or
equal to 2 we can assume the variances are equal. This looks like the following:
$$\frac{S_{larger}}{S_{smaller}} \le 2$$
Again using our descriptive statistics from Minitab we see that the larger S is 23.49 and
the smaller is 18.78:
$$\frac{23.49}{18.78} \approx 1.25$$
which is less than 2, so we can assume the variances are equal! Therefore, our degrees of
freedom (with the sample sizes again taken from Minitab) are 157 + 239 - 2 = 394.
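The rule of thumb and the pooled degrees of freedom can be checked with a few lines of Python. This is only a minimal sketch: the function names `variances_look_equal` and `pooled_df` are hypothetical helpers introduced here for illustration, and the numbers are the summary statistics quoted above.

```python
def variances_look_equal(sd_a, sd_b, max_ratio=2.0):
    """Rule of thumb: the larger SD divided by the smaller SD is at most max_ratio."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return larger / smaller <= max_ratio


def pooled_df(n_a, n_b):
    """Degrees of freedom when the two variances are pooled."""
    return n_a + n_b - 2


# Summary statistics quoted in the text (Blended, Resident).
sd_blended, sd_resident = 23.49, 18.78
n_blended, n_resident = 157, 239

ratio = max(sd_blended, sd_resident) / min(sd_blended, sd_resident)
print("SD ratio:", round(ratio, 2))
print("Assume equal variances:", variances_look_equal(sd_blended, sd_resident))
print("Pooled DF:", pooled_df(n_blended, n_resident))
```

If the ratio came out above 2, the unpooled method and its more involved degrees of freedom would be the safer choice.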
With such a large DF, the t-distribution is nearly identical to the standard normal, so we
can pretty much use the Z-values: for a 95% level of confidence we employ 1.96 as the
multiplier.
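As a quick check on where 1.96 comes from, the Z multiplier for any confidence level can be looked up with Python's standard library. This sketch uses `statistics.NormalDist`; the helper name `z_multiplier` is an assumption made for illustration, and it gives the normal approximation only, not the exact t value.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    tail = (1 - confidence) / 2           # area left in each tail
    return NormalDist().inv_cdf(1 - tail)  # quantile that cuts off the upper tail


print(round(z_multiplier(0.95), 2))
```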
To continue, we are now at:
- 1.26 ± 1.96*Standard Error
We move on to finding the standard error. With the variances assumed equal, we use the
pooled method in calculating our standard error.
$$\text{S.E.} = S_p \sqrt{\frac{1}{N_B} + \frac{1}{N_R}} \quad \text{where} \quad S_p = \sqrt{\frac{(N_B - 1) S_B^2 + (N_R - 1) S_R^2}{N_B + N_R - 2}}$$
With $N_B = 157$, $S_B = 23.49$, $N_R = 239$, $S_R = 18.78$, substituting gives $S_p = 20.77$.
Putting all of this together:
$$\text{S.E.} = 20.77 \sqrt{\frac{1}{157} + \frac{1}{239}} \approx 2.13$$
For our completed interval:
- 1.26 ± 1.96*2.13
This results in a 95% confidence interval for the mean difference between Blended and
Resident exam 1 scores of about - 5.44 to 2.92, which matches, up to rounding, the
- 5.45 to 2.94 obtained with the exact t multiplier for 394 DF (about 1.966). With 0
contained in the interval we cannot say that a difference is present between the mean
exam 1 scores.
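The whole hand calculation can be reproduced from the summary statistics quoted above. The following is a minimal sketch, not Minitab's procedure: the function name `pooled_two_mean_ci` and the dictionary keys are hypothetical, and the multiplier defaults to the Z-value 1.96 rather than the exact t value.

```python
import math


def pooled_two_mean_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2, multiplier=1.96):
    """Pooled-variance confidence interval for mean_1 - mean_2 from summary statistics."""
    df = n_1 + n_2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df)
    # Standard error of the difference between the two sample means.
    se = pooled_sd * math.sqrt(1 / n_1 + 1 / n_2)
    diff = mean_1 - mean_2
    margin = multiplier * se
    return {
        "diff": diff,
        "pooled_sd": pooled_sd,
        "se": se,
        "lower": diff - margin,
        "upper": diff + margin,
    }


# Blended first, Resident second, using the values quoted in the text.
result = pooled_two_mean_ci(75.2, 23.49, 157, 76.46, 18.78, 239)
for name, value in result.items():
    print(f"{name}: {value:.2f}")

# If 0 lies inside the interval, a difference between the means is not established.
contains_zero = result["lower"] <= 0 <= result["upper"]
print("0 inside interval:", contains_zero)
```

Passing a different `multiplier` (for example an exact t value looked up elsewhere) changes only the width of the interval, not its center.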
That is all there is to it! Thankfully we have quick and powerful tools such as Minitab!
Next, we walk through doing this in Minitab. Before we start, it is important to
recognize that in this example the exam 1 scores are in one column and the delivery
method (i.e. the subscripts) is in a separate column. We will see why this matters: if we
had the exam 1 scores in two distinct columns, one for Blended and another for Resident,
we would select a different option within the Minitab interface window for a two-sample
means analysis.
[copy available at: http://www.stat.psu.edu/~ajw13/examples/CI/two_means_CI.doc]