3.3.2 ANOVA for two or more factors

The mice performed well, but an academic career was looking doubtful. At least for the
post-doc. The mice, at home in a top research lab, were secure. The post-doc's name
was John; the 18 mice were Alice, Bob, Charlene, …, and Robert (not their real names).
John hoped to see if a certain gene affected learning and memory. The mice hoped to
find the hidden platform in the water maze.
John was studying mice in which he genetically modified a gene. For his experiment, he
created knock-out mice that lacked the gene. Then he measured the difference in
learning and memory-related behaviors between the wild type (Wt) mice with the
normal version of the gene and the knock-out (KO) mice that lacked the gene. Creating
the knock-out mice and doing the behavioral tests was difficult and time consuming.
After almost a year of work on his post-doc, John had only 9 knock-out mice and 9 wild-type mice. His data looked like this.
The t-test for the difference in response between the knock-outs and wild types was not significant, p=0.1023.
        Two Sample t-test

data:  Response by Treatment
t = -1.7332, df = 16, p-value = 0.1023
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.6034999  0.2612777
sample estimates:
mean in group KO mean in group Wt
        2.544444         3.715556
John was afraid that his academic research career might be coming to an end. Without
positive results from his experiment, he couldn't publish and couldn't get a faculty
position. His wife would be very unhappy if he told her he had to start over in a new
post-doc project, instead of looking for a better-paying job.
But John was sure there was a difference between the knockout and wildtype mice.
Because of the effort involved in producing the knockouts, he had run the experiments
over a period of several months. When he plotted his data, there was a clear difference
between the treatment groups within each month. But the responses differed greatly by
month. It appeared that the measurements he was taking were reliable within month,
but not reproducible across months.
John decided to try a separate t-test of knock-out versus wild type for each of Months 1, 2, and 3. His p-values for the three months were p=0.0155, p=0.0160, and p=0.0586; significant for the first two months, but not for the third. At this point, John came to me for help.
John told me about his work, and I told him that we would use two-way ANOVA. John wanted to compare two treatment groups (knock-out vs. wild type), but he had run the experiment over several months. Something about the measurements was different in each month; one possible reason was the instrument, which John had to re-calibrate each month. The month of measurement had become an important source of variability in the response. Two-way ANOVA provides a way to control for month while testing for the effect of treatment group. A t-test for treatment group, which does not remove the unexplained variability due to month, gives a non-significant result. John and I agreed to do a two-way ANOVA including treatment and month. What did we see?
Analysis of Variance Table

Response: Response
              Df  Sum Sq Mean Sq F value    Pr(>F)
factor(Month)  2 30.5044 15.2522  90.202 1.005e-08
Treatment      1  6.1718  6.1718  36.500 3.031e-05
Residuals     14  2.3672  0.1691
The ANOVA showed that the effect of the factor Month is significant (p=1.005e-08),
confirming that Month was a significant source of otherwise unexplained variability in
the response. More interestingly, the ANOVA table also showed that Treatment is significant, with p=3.031e-05, or p=0.00003. John was ecstatic. The mice were pleased.
His experiment was a success, he could publish, and his academic career was back on
track. All thanks to using two-way analysis of variance to control for the unexplained
variance that the t-test failed to control.
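The idea can be illustrated with a small pure-Python sketch (the analyses in this chapter were run in R; the data below are made up for illustration, not John's). Within each month the wild types score about one unit above the knock-outs, but the month baselines drift so much that a pooled t-test sees nothing; removing each month's mean recovers the difference:

```python
import math
from statistics import mean

# Hypothetical (made-up) data: 3 knock-out and 3 wild-type mice in each
# of 3 months. Within every month Wt scores ~1 unit above KO, but the
# month baseline drifts (e.g. instrument re-calibration).
ko = {1: [0.9, 1.0, 1.1], 2: [3.9, 4.0, 4.1], 3: [6.9, 7.0, 7.1]}
wt = {1: [1.9, 2.0, 2.1], 2: [4.9, 5.0, 5.1], 3: [7.9, 8.0, 8.1]}

def pooled_t(a, b):
    """Two-sample pooled-variance t statistic."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s2 = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(s2 * (1 / len(a) + 1 / len(b)))

# Naive t-test ignoring month: month-to-month drift inflates the
# within-group variance, so the 1-unit treatment effect is invisible.
t_naive = pooled_t([x for v in ko.values() for x in v],
                   [x for v in wt.values() for x in v])

# Control for month by centering each month at its own mean -- the same
# variability that the factor(Month) term absorbs in the two-way ANOVA.
ko_c, wt_c = [], []
for m in (1, 2, 3):
    month_mean = mean(ko[m] + wt[m])
    ko_c += [x - month_mean for x in ko[m]]
    wt_c += [x - month_mean for x in wt[m]]
t_centered = pooled_t(ko_c, wt_c)

print(round(t_naive, 2), round(t_centered, 2))  # about -0.82 and -24.49
```

This only illustrates why absorbing the month effect helps; the correct analysis is the two-way ANOVA itself, which also adjusts the degrees of freedom for the Month term.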
If other factors (besides treatment) affect the response, then we want to include those
factors in our analysis. Use ANOVA to control for the effects of factors that cause
unexplained variance.
The t-test is the most commonly used hypothesis test. It is also the worst test. It has no
ability to control for factors besides treatment that affect the response. It minimizes the
power to detect treatment effects. It maximizes the sample size required to make
discoveries.
The t-test is the test to use when you don't want to discover anything.
Two-way ANOVA example
Here's another example of two-way ANOVA. We have two categorical variables (two factors) that affect the patient's response:

• Factor 1 (treatment): drug vs. placebo
• Factor 2 (sex): male vs. female
Subject  Treat    Sex     Response
1        Drug     MALE    1.2
2        Drug     MALE    1.2
3        Drug     MALE    1
4        Drug     MALE    1.1
5        Drug     MALE    1.1
6        Drug     FEMALE  4
7        Drug     FEMALE  4
8        Drug     FEMALE  4
9        Drug     FEMALE  4.1
10       Drug     FEMALE  4.1
11       Placebo  MALE    1
12       Placebo  MALE    2
13       Placebo  MALE    2
14       Placebo  MALE    2.1
15       Placebo  MALE    2.1
16       Placebo  FEMALE  4
17       Placebo  FEMALE  5
18       Placebo  FEMALE  5
19       Placebo  FEMALE  5.1
20       Placebo  FEMALE  5.1
Here's a boxplot of Response versus Treatment.
For Response versus Treatment, the t-test p-value = 0.3 is not significant, so we would
conclude that the drug mean is not different from the placebo mean. Let's look at
response as a function of sex.
Males and females differ greatly in their response. For Response versus Sex, the t-test p-value = 1.5e-10. If we don't control for the sex effect, it will be hard to detect the effect of the drug. When researchers see this problem, they sometimes try separate t-tests for males and females. This strategy reduces the sample size in each subgroup; a better alternative is two-way ANOVA. Here is the two-way ANOVA.
Analysis of Variance Table

Response: Response
          Df Sum Sq Mean Sq F value    Pr(>F)
Gender     1 43.808  43.808 406.515 2.622e-13 ***
Treatment  1  2.888   2.888  26.799 7.583e-05 ***
Residuals 17  1.832   0.108
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Both Gender and Treatment are significant (p < 0.05). Males and females differ greatly
in their response. Drug and placebo also differ in their response. By including the gender
covariate that influenced the response, we are better able to determine if the treatment
is effective. If we just do a t-test for Treatment, the result is not significant. Using the two-way ANOVA to control for Gender, Treatment is significant. Which do you think is the better analysis?
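For this balanced design, the sums of squares in the ANOVA table can be reproduced by hand from the 20 observations. Here is a pure-Python sketch of the additive two-way decomposition (the table itself comes from R; this just checks the arithmetic):

```python
from statistics import mean

# The 20 observations from the table: (treatment, sex, response)
data = [
    ("Drug", "MALE", 1.2), ("Drug", "MALE", 1.2), ("Drug", "MALE", 1.0),
    ("Drug", "MALE", 1.1), ("Drug", "MALE", 1.1),
    ("Drug", "FEMALE", 4.0), ("Drug", "FEMALE", 4.0), ("Drug", "FEMALE", 4.0),
    ("Drug", "FEMALE", 4.1), ("Drug", "FEMALE", 4.1),
    ("Placebo", "MALE", 1.0), ("Placebo", "MALE", 2.0), ("Placebo", "MALE", 2.0),
    ("Placebo", "MALE", 2.1), ("Placebo", "MALE", 2.1),
    ("Placebo", "FEMALE", 4.0), ("Placebo", "FEMALE", 5.0), ("Placebo", "FEMALE", 5.0),
    ("Placebo", "FEMALE", 5.1), ("Placebo", "FEMALE", 5.1),
]

responses = [r for _, _, r in data]
grand = mean(responses)

def factor_ss(level_of):
    """Between-level sum of squares for one factor (balanced design)."""
    levels = {}
    for row in data:
        levels.setdefault(level_of(row), []).append(row[2])
    return sum(len(v) * (mean(v) - grand) ** 2 for v in levels.values())

ss_total = sum((r - grand) ** 2 for r in responses)
ss_sex = factor_ss(lambda row: row[1])      # the "Gender" row in the table
ss_treat = factor_ss(lambda row: row[0])    # the "Treatment" row
ss_resid = ss_total - ss_sex - ss_treat     # residual of the additive model

# F = (factor mean square) / (residual mean square); df are 1, 1, and 17
ms_resid = ss_resid / 17
f_sex, f_treat = ss_sex / ms_resid, ss_treat / ms_resid
print(round(ss_sex, 3), round(ss_treat, 3), round(ss_resid, 3))
print(round(f_sex, 3), round(f_treat, 3))
```

The sums of squares come out to 43.808, 2.888, and 1.832, with F about 406.5 and 26.8, matching the Gender, Treatment, and Residuals rows of the table.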
Interactions
When we have two or more factors that affect a response, we may have interactions
between the factors. We'll see some examples here, and examine interactions in more
depth when we look at multiple regression and factorial experiment design.
Example: effect of male and female rabbits on the number of baby rabbits
The two factors are male rabbits (present or absent) and female rabbits (present or
absent). The dependent (response) variable is the number of baby rabbits.
The effect of male rabbits depends on the presence of female rabbits:
• male rabbits alone produce no baby rabbits
• if female rabbits are present, then having male rabbits present leads to baby rabbits.
There is an interaction between the two factors (male rabbit and female rabbit) in their
effect on the response (baby rabbits). The effect of the male depends on whether or not
there is a female.
The word "depends" tells us we have an interaction. Whenever you say that the effect
of one factor depends on the level of another factor, you have an interaction.
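In a 2×2 design, "depends" has a numeric counterpart: the interaction is a difference of differences. Here is a tiny sketch with hypothetical litter counts (the numbers are made up; only the zero/non-zero pattern matters):

```python
# Hypothetical mean counts of baby rabbits for the four cells of the
# 2x2 design (made-up numbers; the zero / non-zero pattern is the point).
babies = {
    ("no males", "no females"): 0,
    ("males", "no females"): 0,   # males alone: no baby rabbits
    ("no males", "females"): 0,   # females alone: no baby rabbits
    ("males", "females"): 8,      # both present: baby rabbits
}

# Effect of adding males, at each level of the female factor
effect_without_females = babies[("males", "no females")] - babies[("no males", "no females")]
effect_with_females = babies[("males", "females")] - babies[("no males", "females")]

# Interaction = difference of differences; non-zero means the effect of
# males depends on whether females are present.
interaction = effect_with_females - effect_without_females
print(interaction)  # 8, so there is an interaction
```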
Example: interaction of anti-depressant drug with age in effect on suicidal thoughts
The two factors are anti-depressant drug and patient age. The dependent variable is
suicidal thoughts.
The effect of an anti-depressant drug depends on the age of the patient:
• the drug reduces suicidal thoughts in adults
• the drug increases suicidal thoughts in teen-agers
There is an interaction between the two factors (drug and age) in their effect on the
response (suicidal thoughts), because the effect of the drug depends on the age of the
patient.
Testing for interactions
We can examine interactions using graphs, and use ANOVA to test for significant
interactions. We'll use a cookie baking example. We are interested in how the yield of
good cookies is affected by the baking temperature and time in the oven. Here's our
data for 8 batches of cookies.
Batch  Temperature  Time  Yield
1      1            1     30
2      1            1     35
3      1            2     60
4      1            2     58
5      2            1     60
6      2            1     64
7      2            2     30
8      2            2     35
An interaction plot is a convenient way to visualize interaction effects. If the lines are
not parallel, there is an interaction. The interaction plot shows that there is an
interaction between Temperature and Time in their effect on yield.
interaction.plot(temperature, time, yield)
We can do formal statistical tests for interaction. In the ANOVA without the interaction
term, neither temperature nor time is significant:
Analysis of Variance Table

Response: yield
            Df Sum Sq Mean Sq F value Pr(>F)
temperature  1    4.5     4.5   0.014 0.9103
time         1    4.5     4.5   0.014 0.9103
Residuals    5 1603.0   320.6
Here is the ANOVA model including an interaction term.
Analysis of Variance Table

Response: yield
                 Df Sum Sq Mean Sq  F value    Pr(>F)
temperature       1    4.5    4.50   0.5143 0.5129366
time              1    4.5    4.50   0.5143 0.5129366
temperature:time  1 1568.0 1568.00 179.2000 0.0001801 ***
Residuals         4   35.0    8.75
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The temperature:time interaction term is significant (p=0.000180). Based on the
interaction test and the interaction plot, it appears that the effect of time depends on
temperature and vice versa. When we have interactions, we need to do further work to
understand the effects of individual factors at different levels. We'll look more at this
later.
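For this small balanced design, the sums of squares in the interaction ANOVA can be checked by hand. A pure-Python sketch (the table itself comes from R):

```python
from statistics import mean

# The 8 cookie batches from the table: (temperature, time, yield)
batches = [
    (1, 1, 30), (1, 1, 35), (1, 2, 60), (1, 2, 58),
    (2, 1, 60), (2, 1, 64), (2, 2, 30), (2, 2, 35),
]

yields = [y for _, _, y in batches]
grand = mean(yields)

def level_means(key):
    """Mean yield for each level returned by key(temperature, time)."""
    groups = {}
    for t, s, y in batches:
        groups.setdefault(key(t, s), []).append(y)
    return {k: mean(v) for k, v in groups.items()}

temp_means = level_means(lambda t, s: t)        # 4 batches per level
time_means = level_means(lambda t, s: s)        # 4 batches per level
cell_means = level_means(lambda t, s: (t, s))   # 2 batches per cell

# Main-effect sums of squares
ss_temp = sum(4 * (m - grand) ** 2 for m in temp_means.values())
ss_time = sum(4 * (m - grand) ** 2 for m in time_means.values())

# Interaction: between-cell variation left after the two main effects
ss_cells = sum(2 * (m - grand) ** 2 for m in cell_means.values())
ss_interaction = ss_cells - ss_temp - ss_time

# Residual: batches around their own cell mean (4 df)
ss_resid = sum((y - cell_means[(t, s)]) ** 2 for t, s, y in batches)
f_interaction = ss_interaction / (ss_resid / 4)

print(ss_temp, ss_time, ss_interaction, ss_resid, f_interaction)
# 4.5 4.5 1568.0 35.0 179.2, matching the table
```

The huge temperature:time sum of squares relative to the tiny main effects is the numeric face of the crossed lines in the interaction plot.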
Interaction example: weight gain vs diet
For another two-way ANOVA interaction example, we'll use data from the textbook "A Handbook of Statistical Analyses Using R" by Brian Everitt and Torsten Hothorn. It is well worth getting a copy if you want to learn R.
The experiment examined weight gain in rats fed four diets with different diet type (low
versus high protein) and protein source (beef versus cereal). Here's the interaction plot.
The interaction plot shows that the effect of protein source (beef versus cereal)
depends on diet type (low versus high protein), and the effect of type depends on
source. Because the lines are not parallel, there is an interaction.
Here's the analysis of variance including the source:type interaction term.
Analysis of Variance Table

Response: weightgain
            Df Sum Sq Mean Sq F value Pr(>F)
source       1  220.9   220.9  0.9879 0.32688
type         1 1299.6  1299.6  5.8123 0.02114 *
source:type  1  883.6   883.6  3.9518 0.05447 .
Residuals   36 8049.4   223.6
The p-value = 0.05447 for the source:type interaction approaches significance, so we
should be concerned that the effect of source depends on type, and the effect of type
depends on source.