Does blocking really reduce variability?
Many modern introductory statistics courses include attention to study design. The Advanced
Placement Statistics Exam discusses the role of blocking, which is said to “reduce variability”.
However, neither the AP curriculum nor most other introductory courses cover the analysis of
randomized block designs. This leaves the student a bit in the dark as to just how variability is
reduced. This article attempts to give a partial explanation based on the two-sample t-test
and simple linear regression, topics covered in the AP curriculum and most introductory college
courses. We assume students have at least seen computer printouts for both of these techniques.
We will use data from an experiment to measure the effect of lighting conditions on the ability of
humans to judge distance. Twenty-four people were randomly assigned to one of two 'treatment'
groups. All 24 people were asked to judge how far they were from
a number of different objects. An average 'error' in judgment, in feet, was recorded for each person.
One treatment group was shown the objects in bright sunshine, and the other under cloudy
conditions.
[Parallel boxplots of Errors (feet) for group 1 (sun) and group 2 (clouds), drawn on a common scale running from about 4.0 to 9.0 feet.]
Parallel boxplots show that a reasonable model for these data would be a shift of about one and a half
feet of additional error for the second (cloudy) group. With no apparent skewness or outliers, a
two-sample t-test appears appropriate.
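If you would like to reproduce the parallel boxplots yourself, here is a minimal sketch in Python using matplotlib (our own choice of tool, not something used in the original study); the two lists are the Error_Sun and Error_Clouds values from the raw-data table at the end of this article.

import matplotlib.pyplot as plt

# Average judgment errors (feet) under the two lighting conditions,
# copied from the raw-data table at the end of the article.
error_sun    = [5.2, 4.3, 5.8, 6.0, 5.2, 4.2, 6.0, 5.4, 6.2, 7.1, 6.7, 6.5]
error_clouds = [7.1, 6.4, 7.9, 5.6, 7.0, 7.1, 8.2, 6.0, 8.0, 6.7, 9.1, 9.2]

# Parallel (side-by-side) boxplots on a common horizontal scale.
plt.boxplot([error_sun, error_clouds], vert=False)
plt.yticks([1, 2], ["1 (sun)", "2 (clouds)"])
plt.xlabel("Errors (feet)")
plt.show()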
Two-sample T for Error_Sun vs Error_Clouds

                N   Mean  StDev  SE Mean
Error_Sun      12  5.717  0.896     0.26
Error_Clouds   12   7.36   1.14     0.33

Difference = mu (Error_Sun) - mu (Error_Clouds)
Estimate for difference: -1.64167
T-Test of difference = 0 (vs not =): T-Value = -3.91  P-Value = 0.001  DF = 22
Pooled StDev = 1.0275
As you can see, we used the older version of the test with a pooled variance estimate and 22 degrees
of freedom. If you normally teach the newer version that does not pool, then the only number above
that will change is the degrees of freedom, which becomes 20. However, that will not always be
the case. We pooled here to make a better parallel with what we will do next. Note that the
standard deviations of 1.1 and 0.9 are remarkably close. You can use this, or the virtually identical
results with these data, to justify pooling.
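For readers who want to check these numbers with software other than Minitab, here is a minimal sketch using Python's scipy (an assumption of ours, not part of the original article); it reproduces the pooled t-test above and shows the unpooled alternative for comparison.

from scipy import stats

error_sun    = [5.2, 4.3, 5.8, 6.0, 5.2, 4.2, 6.0, 5.4, 6.2, 7.1, 6.7, 6.5]
error_clouds = [7.1, 6.4, 7.9, 5.6, 7.0, 7.1, 8.2, 6.0, 8.0, 6.7, 9.1, 9.2]

# Pooled (equal-variance) two-sample t-test, matching the printout above:
# T-Value = -3.91, P-Value = 0.001 with 22 degrees of freedom.
t_pooled, p_pooled = stats.ttest_ind(error_sun, error_clouds, equal_var=True)
print(t_pooled, p_pooled)

# The unpooled (Welch) version; for these data only the degrees of freedom change.
t_welch, p_welch = stats.ttest_ind(error_sun, error_clouds, equal_var=False)
print(t_welch, p_welch)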
There is a technique called analysis of variance (ANOVA) that extends the pooled version of the
two-sample t to more than two groups. All we want to say at the moment is that it gives the same
result when we have but two groups.
One-way ANOVA: Errors versus Group

Source  DF     SS     MS      F      P
Group    1  16.17  16.17  15.32  0.001
Error   22  23.23   1.06
Total   23  39.40

S = 1.027   R-Sq = 41.05%   R-Sq(adj) = 38.37%
The first thing to note is that the p-value is the same and so the decision is the same: there seems to
be a real difference in the ability to judge distance under the two lighting conditions. Next, the
mysterious F-value of 15.32 has a square root of 3.91, which is the t-value for the t-test. These two
similarities will always exist when comparing pooled two-sample t with an ANOVA on the same
two groups. ANOVA always pools.
The remainder of the table (or all we need of it) can be related to similar tables in regression
printouts. The sum of the squared residuals from the overall mean of all 24 observations is 39.40.
Of this, 16.17 (or 41.05%) is “accounted for” by the lighting conditions. (We won’t get into what
“accounted for” means here other than to say that R2 means the same thing here as it does in
regression.) Also, as in regression, the value of “S” is (like
every standard deviation) a typical value for how far the data vary from some model or summary of
the data. The first standard deviation we learned was a typical value for how far a batch of numbers
varied from their mean. For all 24 observations, the standard deviation is 1.309. For regression,
“S” is a typical value for the residuals (vertical distances from the regression line). For this
ANOVA the “S = 1.027” is a typical value for how far the observations are from their respective
group means.
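If you want to see this equivalence for yourself, here is a small sketch in Python with scipy (again, our own tooling choice); it runs the one-way ANOVA on the two groups and also computes the standard deviation of all 24 observations mentioned above.

from statistics import stdev
from scipy import stats

error_sun    = [5.2, 4.3, 5.8, 6.0, 5.2, 4.2, 6.0, 5.4, 6.2, 7.1, 6.7, 6.5]
error_clouds = [7.1, 6.4, 7.9, 5.6, 7.0, 7.1, 8.2, 6.0, 8.0, 6.7, 9.1, 9.2]

# One-way ANOVA on the two groups: F is about 15.3 with p = 0.001,
# and the square root of F reproduces the t-value (up to sign).
f, p = stats.f_oneway(error_sun, error_clouds)
print(f, p, f ** 0.5)

# Standard deviation of all 24 observations (about 1.309), for comparison
# with S = 1.027, a typical distance of observations from their group means.
print(stdev(error_sun + error_clouds))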
At this point we have to confess that we have been concealing one aspect of this study. There was a
blocking variable – age. The subjects were in fact divided into three age groups of equal size and
then half of each age group was assigned to each lighting condition. We can do a two-way
ANOVA that takes account of both lighting condition and age.
Two-way ANOVA: Errors versus Group, AgeGrp

Source  DF     SS       MS      F      P
Group    1  16.17  16.1704  24.02  0.000
AgeGrp   2   9.76   4.8800   7.25  0.004
Error   20  13.47   0.6733
Total   23  39.40

S = 0.8205   R-Sq = 65.82%   R-Sq(adj) = 60.69%
Again, we do not need to understand all of this, but note that R2 went up and S went down – both
good signs. We usually measure variability in these situations with the sum of squared residuals.
We can see that this is unchanged for the total and for the Group variable, but now AgeGrp
accounts for 9.76 and the Error sum has been reduced by just this amount from the one-way
ANOVA. This is the reduction in variability to which the magic phrase applies. The unaccounted-for
sum of squares dropped from 23.23 to 13.47. The percentage unaccounted for dropped from
100 - 41.05 = 58.95% to 100 - 65.82 = 34.18%. Either way you look at it, it was cut nearly in half.
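Here is a sketch of the corresponding two-way ANOVA in Python, using pandas and statsmodels (our assumed toolchain, not the one used in the article); it fits the additive model with both factors treated as categorical and prints a table with the same sums of squares, F-values and p-values.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

errors = [5.2, 4.3, 5.8, 6.0, 5.2, 4.2, 6.0, 5.4, 6.2, 7.1, 6.7, 6.5,
          7.1, 6.4, 7.9, 5.6, 7.0, 7.1, 8.2, 6.0, 8.0, 6.7, 9.1, 9.2]
df = pd.DataFrame({
    "Errors": errors,
    "Group":  [1] * 12 + [2] * 12,                 # 1 = sun, 2 = clouds
    "AgeGrp": ([1] * 4 + [2] * 4 + [3] * 4) * 2,   # three age groups within each
})

# Additive model with lighting condition and age group as categorical factors.
model = smf.ols("Errors ~ C(Group) + C(AgeGrp)", data=df).fit()
print(anova_lm(model))          # sums of squares, F and p for each factor
print(model.fittedvalues)       # matches the FITS column shown later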
Let’s look at why this kind of decrease in variability is important. In a simple one-sample t-test, we
look at a fraction whose denominator is a measure of variability: the (estimated) standard error of
the mean. Other inference situations are similar, even if the formula is much more complicated, and
even if we do not know what the formula is. What we can say, though, is that it is this estimate of
variability that blocking reduces. This generally makes our test statistic larger (F went from 15.32
to 24.02) and our p-values smaller (0.001 to 0.000). This increases the power of a
test and decreases the margin of error (or width of a confidence interval).
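To make this concrete, here is a rough sketch (in Python, using scipy for the t critical values) of how the smaller S translates into a narrower 95% confidence interval for the difference between the two lighting conditions; the S values and error degrees of freedom are taken from the two printouts above.

from math import sqrt
from scipy import stats

n = 12   # observations per lighting group

# S and error degrees of freedom: first without blocking, then with blocking.
for s, df in [(1.0275, 22), (0.8205, 20)]:
    se = s * sqrt(1 / n + 1 / n)            # estimated SE of the difference in means
    margin = stats.t.ppf(0.975, df) * se    # 95% margin of error for that difference
    print(f"S = {s}: SE = {se:.3f}, margin of error = {margin:.3f}")

The margin of error drops from about 0.87 feet to about 0.70 feet, even though the critical t-value is slightly larger with 20 degrees of freedom.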
Finally, here are the raw data and the fits for the final model.
Errors  Group  AgeGrp  FITS
 5.2      1      1     5.22
 4.3      1      1     5.22
 5.8      1      1     5.22
 6.0      1      1     5.22
 5.2      1      2     5.32
 4.2      1      2     5.32
 6.0      1      2     5.32
 5.4      1      2     5.32
 6.2      1      3     6.62
 7.1      1      3     6.62
 6.7      1      3     6.62
 6.5      1      3     6.62
 7.1      2      1     6.86
 6.4      2      1     6.86
 7.9      2      1     6.86
 5.6      2      1     6.86
 7.0      2      2     6.96
 7.1      2      2     6.96
 8.2      2      2     6.96
 6.0      2      2     6.96
 8.0      2      3     8.26
 6.7      2      3     8.26
 9.1      2      3     8.26
 9.2      2      3     8.26
What model is that, you ask? Not one that can be expressed in an equation as readily as a regression
model, but it is just as simple. Here it is. Taking a young person in bright sunlight as the base
(5.22 feet for average error), add on

• 0.10 feet for being middle-aged
• 1.40 feet for being even older
• 1.64 feet for cloudy conditions

You can verify this in the FITS column. That 1.64 is just the difference in the two means from the
t-test we started with. But now if we did a confidence interval for that difference it would be narrower
(details not shown here), and in addition we have learned something about the effect of age.
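As a quick check, here is a short sketch (again in Python, purely illustrative) that rebuilds the FITS column from the base value and the three add-ons just described.

# Rebuild the FITS column from the additive model described above.
base = 5.22                                  # young person, bright sunlight
age_effect   = {1: 0.00, 2: 0.10, 3: 1.40}   # add for middle-aged / older subjects
cloud_effect = {1: 0.00, 2: 1.64}            # add for the cloudy group

groups  = [1] * 12 + [2] * 12
age_grp = ([1] * 4 + [2] * 4 + [3] * 4) * 2

fits = [round(base + age_effect[a] + cloud_effect[g], 2)
        for g, a in zip(groups, age_grp)]
print(fits)   # 5.22 ... 5.32 ... 6.62 ... 6.86 ... 6.96 ... 8.26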
Technology Note:
The data are part of the DEPTH dataset that comes with the Student Edition of Minitab 14, which
was used to create all the printouts.