Calculating a confidence interval for the difference between two means

As an example, we will use Spring 2007 Exam 1 scores from two delivery methods of Stat200: Blended (B) and Resident (R).

First, understand that the purpose of a confidence interval is to estimate some unknown value, e.g., a population mean, a proportion, or a mean difference. In this example we are estimating the difference between two population means. Recall that the basic formula for a confidence interval is:

Sample Statistic ± Multiplier × Standard Error

In our case the sample statistic is the difference between the two sample means, the multiplier comes from the t-distribution, and the standard error depends on whether or not we can assume that the two population variances are equal. If they are equal, we can "pool" the variances together.

We will start with the sample statistic. Using Minitab's basic statistics on the data, the sample mean is 75.2 for Blended and 76.46 for Resident. Therefore:

$$\bar{X}_B - \bar{X}_R = 75.2 - 76.46 = -1.26$$

Now we have:

-1.26 ± Multiplier × Standard Error

Moving on to the multiplier: since we use the t-distribution, we need the degrees of freedom (DF), and with two samples that means first determining whether the population variances can be assumed equal. The DF calculation differs between the pooled case (easy to calculate) and the unpooled case (ugh!). If we can assume the variances are equal, DF is simply the sum of the two sample sizes minus two: $N_B + N_R - 2$. The general rule of thumb for deciding whether two populations have equal variances is to compare the two sample standard deviations: if the ratio of the larger to the smaller is less than or equal to 2, we can assume the variances are equal.
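The first steps above (the point estimate, the pooled degrees of freedom, and the rule-of-thumb check on the standard deviations) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Minitab procedure: the function names are hypothetical helpers, and the inputs are the summary statistics quoted in this example (means 75.2 and 76.46, sample sizes 157 and 239, standard deviations 23.49 and 18.78).

```python
def point_estimate(mean_b: float, mean_r: float) -> float:
    """Sample statistic: difference of sample means (Blended minus Resident)."""
    return mean_b - mean_r


def pooled_df(n_b: int, n_r: int) -> int:
    """Degrees of freedom for the pooled (equal-variance) two-sample t procedure."""
    return n_b + n_r - 2


def variances_look_equal(s1: float, s2: float, max_ratio: float = 2.0) -> bool:
    """Rule of thumb: treat the population variances as equal when the larger
    sample standard deviation is at most max_ratio times the smaller one."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return larger / smaller <= max_ratio


# Hypothetical usage with the summary statistics from this example.
diff = point_estimate(75.2, 76.46)
df = pooled_df(157, 239)
equal = variances_look_equal(23.49, 18.78)
print(f"difference = {diff:.2f}, DF = {df}, equal variances? {equal}")
```

The ratio check deliberately takes the larger standard deviation over the smaller, so the order in which the two groups are passed does not matter.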
Written as a formula, the rule of thumb is:

$$\frac{S_{\text{larger}}}{S_{\text{smaller}}} \le 2$$

Again using our descriptive statistics from Minitab, the larger standard deviation is 23.49 and the smaller is 18.78:

$$\frac{23.49}{18.78} \approx 1.25$$

which is less than 2, so we can assume the variances are equal! Therefore, with sample sizes of 157 (Blended) and 239 (Resident) from Minitab, our degrees of freedom are 157 + 239 - 2 = 394. With such a large DF, the t multiplier is practically the same as the z value, so for a 95% level of confidence we can use 1.96 as the multiplier. We are now at:

-1.26 ± 1.96 × Standard Error

Next we find the standard error. With the variances assumed equal, we use the pooled method:

$$\text{S.E.} = S_p\sqrt{\frac{1}{N_B} + \frac{1}{N_R}}, \qquad S_p = \sqrt{\frac{(N_B - 1)S_B^2 + (N_R - 1)S_R^2}{N_B + N_R - 2}}$$

With $N_B = 157$, $S_B = 23.49$, $N_R = 239$, and $S_R = 18.78$, substituting gives $S_p \approx 20.77$. Putting all of this together:

$$\text{S.E.} = 20.77\sqrt{\frac{1}{157} + \frac{1}{239}} \approx 2.13$$

For our completed interval:

-1.26 ± 1.96 × 2.13, i.e., -1.26 ± 4.17

This gives a 95% confidence interval for the mean difference between Blended and Resident Exam 1 scores of about -5.43 to 2.91 (carrying more decimal places and using the exact t multiplier for 394 DF, about 1.966, gives about -5.45 to 2.94). Because 0 is contained in the interval, we cannot conclude that the mean Exam 1 scores differ between the two delivery methods.

That is all there is to it! Thankfully we have quick and powerful tools such as Minitab. Next, we walk through doing this in Minitab. Before we start, it is important to recognize that in this example the Exam 1 scores are in one column and the delivery method (i.e., the subscripts) is in a separate column. We will see why this matters: if we had the Exam 1 scores in two distinct columns, one for Blended and another for Resident, we would select a different option in Minitab's two-sample means analysis window.

[copy available at: http://www.stat.psu.edu/~ajw13/examples/CI/two_means_CI.doc]
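The full pooled calculation above can also be sketched in Python as a cross-check on the hand arithmetic. This is a minimal sketch, not Minitab's implementation: `pooled_sd` and `pooled_ci` are hypothetical helper names, and, following the text's large-DF shortcut, the multiplier is the standard normal quantile from `statistics.NormalDist` (about 1.96) rather than an exact t value. Because it carries full precision instead of the rounded S.E., its last digits can differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard deviation S_p for two samples."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


def pooled_ci(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Confidence interval for mean1 - mean2, assuming equal population variances.

    Assumption: the multiplier is the standard normal quantile, the large-DF
    approximation to the t multiplier (DF = n1 + n2 - 2 = 394 in this example).
    """
    sp = pooled_sd(n1, s1, n2, s2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    multiplier = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    diff = mean1 - mean2
    return sp, se, (diff - multiplier * se, diff + multiplier * se)


# Blended (group 1) minus Resident (group 2), using this example's summary statistics.
sp, se, (lower, upper) = pooled_ci(75.2, 23.49, 157, 76.46, 18.78, 239)
print(f"S_p = {sp:.2f}, S.E. = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

If an exact t multiplier is wanted, a statistics library that provides t-distribution quantiles could replace the `NormalDist` line; with 394 DF the change only affects the second decimal place.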