Compare Two Dependent Samples (Paired t-test)

In the last section, we looked at comparing the means of two different populations. For example, if you are curious about how the average height of male students compares with that of female students, you can use the t-test that is based on comparing the difference between the sample means ($\bar{x}_m - \bar{x}_f$) with the difference between the population means ($\mu_m - \mu_f$).

Example: Weather Forecast

However, it's useful to recognize that two separate sets of quantitative data do not automatically imply that you are comparing two means. For example, consider the following data, which show the actual low temperature of the day and the low temperature in the forecast:

Actual Low Temperature    Forecast Low Temperature
          1                         16
         -5                         16
         -5                         20
         23                         22
          9                         15

You can see that the forecast is not particularly accurate, since it deviates from the actual low temperature on most days. So we can ask the following question: on average, is the forecast low temperature significantly different from the actual low temperature (at the 0.05 level)?

At first, these hypotheses might seem appropriate:

$$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2$$

But there is a problem: we are not particularly interested in how the mean of ALL actual temperatures compares with the mean of ALL forecast temperatures. In fact, consider this scenario: the forecast overestimates the temperature for half of the days and underestimates it for the other half. If the overestimates and the underestimates are roughly the same in size, then the two means can be quite close!

So what is the better way to handle this question? It turns out that we are not really comparing two means at all. Instead, we should be looking at the differences between the temperatures, day by day. If the average difference deviates significantly from zero, then the forecast is systematically off from the actual temperature.
So the first step is finding out what the differences are:

Actual Low Temperature    Forecast Low Temperature    Difference (actual - forecast)
          1                         16                          -15
         -5                         16                          -21
         -5                         20                          -25
         23                         22                            1
          9                         15                           -6

We'll use the variable $d$ to represent the difference. Now our hypotheses become:

$$H_0: \mu_d = 0 \qquad H_a: \mu_d \neq 0$$

We are using $\mu_d$ here to indicate the mean of the differences, which is just a single parameter. Conducting the t-test as we did for testing one mean, we find the test statistic:

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-13.2 - 0}{10.7 / \sqrt{5}} \approx -2.76$$

Here $\bar{d}$ and $s_d$ refer to the (sample) mean and standard deviation of the differences, respectively. For the degrees of freedom, since we are testing 5 values of the difference (one for each day), we have $df = 5 - 1 = 4$. From the Student t calculator in GeoGebra, the P-value for the two-tailed test is:

$$\text{P-value} = 2 \cdot P(t < -2.76) \approx 0.051$$

Since the P-value is slightly above 0.05, the conclusion is that we do not reject $H_0$: there is not enough evidence to show that the forecast deviates significantly from the actual low temperature of the day.

After this example, you can probably recognize that this type of problem really could have been included in the section on hypothesis tests for one mean: the statistical inference involves only one parameter, $\mu_d$ in this case. The way we used the data to derive the test statistic is very different from the previous section: instead of looking at the two separate sample means $\bar{x}_1$ and $\bar{x}_2$, we first compute the differences $d = x_1 - x_2$, and then find the mean and standard deviation of those differences, $\bar{d}$ and $s_d$.

Independent vs. Dependent/Paired Samples

The textbook distinguishes between the previous section and the current one with a pair of terms: if we are comparing two population means, it's called "inference from two independent samples"; on the other hand, if we are looking at the mean of the differences, the technical phrase is "inference from two dependent samples". Some of you may be thinking about the concept of independence vs. dependence of two events from the earlier chapter on probability, since within each pair, the value of the actual temperature does seem to "affect" the value of the forecast temperature. However, it's probably more useful to use the term "paired samples" instead of "dependent samples", since it better captures what we need to DO with the data prior to hypothesis testing. For this reason, the test we did above is also known as the "paired t-test", the name used by most people who apply it in their work.

Paired / dependent samples are actually quite common. For example, if you would like to see whether following a diet has reduced the weight of participants, you will measure the weight of each person before and after the diet. If we would like to see whether using a fertilizer has increased crop production, we had better use similar plots with nearly identical soil and weather conditions, so that other confounding factors do not interfere with the response variable.

In fact, an experimental design using paired samples is almost always better than one based on independent samples. This is the reason why studies using identical twins have a special place in medicine: if two persons are genetically identical, then their differences are presumably the results of their experiences. Since twins are hard to find, medical researchers often try as much as they can to have at least some type of pairing between the experimental group and the control group, especially when the sample size is very small.
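
To tie the distinction back to the weather-forecast example, here is a minimal Python sketch that recomputes the paired t-test from the five days of data and, for contrast, the statistic you would get by (incorrectly) treating the two columns as independent samples. Several things here are assumptions rather than part of the original notes: the function names (`t_pdf`, `two_tailed_p`, `paired_t`, `two_sample_t`) are made up for illustration; the P-value is approximated by numerically integrating the t density with Simpson's rule, as a stand-in for the GeoGebra calculator; and the independent-samples statistic uses the unpooled standard error, which may not be the exact form used in the previous section (with equal sample sizes, the pooled form gives the same statistic).

```python
import math
import statistics

# Low temperatures for the five days in the weather-forecast example.
actual = [1, -5, -5, 23, 9]
forecast = [16, 16, 20, 22, 15]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|].

    Assumption: numerical integration stands in for a t calculator.
    `steps` must be even for Simpson's rule.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|), by symmetry
    return 1 - central_area


def paired_t(x, y):
    """Paired t statistic and df for H0: mu_d = 0, where d = x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation (n - 1 divisor)
    return d_bar / (s_d / math.sqrt(n)), n - 1


def two_sample_t(x, y):
    """Unpooled two-sample t statistic, treating x and y as independent."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


t_paired, df = paired_t(actual, forecast)
p_paired = two_tailed_p(t_paired, df)
t_indep = two_sample_t(actual, forecast)

print(f"paired:      t = {t_paired:.2f}, df = {df}, P-value = {p_paired:.3f}")
print(f"independent: t = {t_indep:.2f}")
```

The paired line should reproduce the values from the example (t of about -2.76 with 4 degrees of freedom, and a P-value of about 0.051). The independent-samples statistic comes out different (about -2.43) even though the numerator, -13.2, is the same, because its standard error is built from the spread of each column separately rather than from the spread of the day-by-day differences.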