Compare Two Dependent Samples
(Paired t-test)
In the last section, we looked at comparing the means of two different populations. For example, if you
are curious about how the average heights of male and female students compare, you can use the t-test
that compares the difference between the sample means ($\bar{x}_m - \bar{x}_f$) with
the difference between the population means ($\mu_m - \mu_f$).
Example: Weather Forecast
However, it's useful to recognize that two separate sets of quantitative data do not automatically imply
that you are comparing two means. For example, consider the following data, which show the actual
low temperature of each day alongside the low temperature in the forecast:
| Actual Low Temperature | Forecast Low Temperature |
|---|---|
| 1 | 16 |
| -5 | 16 |
| -5 | 20 |
| 23 | 22 |
| 9 | 15 |
You can see that the forecast is not particularly accurate, since it deviates from the actual low
temperature on most of the days. So we can ask the following question: on average, is the forecast low
temperature significantly different from the actual low temperature (at the 0.05 level)?
At first, these hypotheses might seem appropriate:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \ne \mu_2$
But there is a problem: we are not particularly interested in how the mean of ALL actual temperatures
compares with the mean of ALL forecast temperatures. In fact, consider this scenario: the forecast
overestimates the temperature on half of the days and underestimates it on the other half.
If the overestimates and the underestimates are roughly the same size, then the two means can be
quite close, even though the forecast is wrong every day!
So what is a better way to handle this question? It turns out that we are not really comparing two
means at all. Instead, we should look at the difference between the two temperatures on each day. If the
average difference deviates significantly from zero, then the forecast tends to miss the actual
temperature in a consistent direction. So the first step is to find the differences:
| Actual Low Temperature | Forecast Low Temperature | Difference (actual - forecast) |
|---|---|---|
| 1 | 16 | -15 |
| -5 | 16 | -21 |
| -5 | 20 | -25 |
| 23 | 22 | 1 |
| 9 | 15 | -6 |
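The difference column and its summary statistics can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`actual`, `forecast`, `differences`, `d_bar`, `s_d`) are chosen here for illustration and do not come from the original notes.

```python
from statistics import mean, stdev

# Data copied from the table above (one entry per day).
actual = [1, -5, -5, 23, 9]
forecast = [16, 16, 20, 22, 15]

# One difference per day, always actual minus forecast.
differences = [a - f for a, f in zip(actual, forecast)]

d_bar = mean(differences)  # sample mean of the differences
s_d = stdev(differences)   # sample standard deviation (divides by n - 1)

print(differences)
print(d_bar, round(s_d, 1))
```

Note that the subtraction has to be done pair by pair, in the same order for every day; that pairing is what the rest of the test relies on.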
We'll use the variable $d$ to represent the difference. Now our hypotheses become:
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$
We are using $\mu_d$ here to indicate the mean of the differences, which is just a single parameter.
Conducting the t-test as we did for testing one mean, we find the test statistic:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-13.2 - 0}{10.7 / \sqrt{5}} \approx -2.76$$
$\bar{d}$ and $s_d$ refer to the (sample) mean and standard deviation of the differences, respectively. For the
degrees of freedom, since we are testing 5 differences (one for each day), we have $df = 5 - 1 = 4$. From
the Student t calculator in GeoGebra, the P-value for the two-tailed test is:
P-value = $2 \cdot P(t \le -2.76) \approx 0.051$
Since 0.051 is greater than 0.05, the conclusion is that we do not reject $H_0$: there is not enough
evidence to show that the forecast deviates significantly from the actual low temperature of the day.
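For readers without GeoGebra at hand, the whole calculation can be checked in Python. The sketch below is one possible approach, not the method used in these notes: it assumes only the standard library and obtains the tail area by integrating the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are hypothetical, introduced here for illustration.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed P-value: 1 minus the area between -|t| and |t|.

    The central area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = 2 * b / steps
    total = t_pdf(-b, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-b + i * h, df)
    return 1 - total * h / 3


differences = [-15, -21, -25, 1, -6]  # actual - forecast, from the table
n = len(differences)

# Test statistic for H0: mu_d = 0.
t_stat = (mean(differences) - 0) / (stdev(differences) / math.sqrt(n))
p_value = two_tailed_p(t_stat, df=n - 1)

print(round(t_stat, 2), round(p_value, 3))
```

The printed values should land close to the test statistic and P-value reported above; small discrepancies in later decimals are expected because the hand calculation rounds $s_d$ to 10.7 first.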
After this example, you can probably recognize that this type of problem really should have been
included in hypothesis testing for one mean: the statistical inference involves only one parameter, $\mu_d$ in
this case. The way we used the data to derive the test statistic is very different from the previous section:
instead of looking at the two separate sample means $\bar{x}_1$ and $\bar{x}_2$, we first compute the differences
$d = x_1 - x_2$, and then find the mean and standard deviation of the differences, $\bar{d}$ and $s_d$.
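To make the contrast concrete, the sketch below computes both test statistics from the same forecast data. It is an illustration under stated assumptions: the independent-samples version uses the unpooled standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, which may or may not be the exact formula used in the previous section (with equal sample sizes, the pooled version gives the same statistic). The variable names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

actual = [1, -5, -5, 23, 9]
forecast = [16, 16, 20, 22, 15]
n = len(actual)

# Paired approach: reduce each day to a single difference first.
differences = [a - f for a, f in zip(actual, forecast)]
t_paired = mean(differences) / (stdev(differences) / math.sqrt(n))

# Independent-samples approach (unpooled standard error, an assumption):
# the two columns are treated as unrelated samples.
se_independent = math.sqrt(variance(actual) / n + variance(forecast) / n)
t_independent = (mean(actual) - mean(forecast)) / se_independent

print(round(t_paired, 2), round(t_independent, 2))
```

The numerators are identical, because the mean of the differences equals the difference of the means. Only the standard error changes: the paired version measures how much the day-by-day differences vary, while the independent version adds up the variability of each column separately.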
Independent vs. Dependent/Paired Samples
The textbook distinguishes the previous section from the current one with a pair of terms: if we are
comparing two population means, it's called "inference from two independent samples"; on the other
hand, if we are looking at the mean of differences, the technical phrase is "inference from two
dependent samples". Some of you may be thinking about the concept of independence vs. dependence of
two events from the earlier chapter on probability, since within each pair, the value of the actual
temperature does seem to "affect" the value of the forecast temperature. However, it's probably more
useful to use the term "paired sample" instead of "dependent sample", since it better captures what
we need to DO with the data prior to hypothesis testing. For this reason, the test we did above is also
known as the "paired t-test", the name used by most people who apply it in their work.
Paired (dependent) samples are actually quite common. For example, if you would like to see
whether following a diet has reduced the weight of participants, you will measure the weight of each
person before and after the diet. If we would like to see whether using a fertilizer has increased crop
production, we had better compare pairs of similar plots with nearly identical soil and weather conditions,
so that other confounding factors do not interfere with the response variable. In fact, an experimental
design using paired samples is almost always better than one based on independent samples. This is the
reason why studies using identical twins have a special place in medicine: if two people are genetically
identical, then their differences are presumably the results of their experiences. Since twins are hard to
find, medical researchers often try as much as they can to at least have some type of pairing between the
experimental group and the control group, especially when the sample size is very small.