Analysis of variance for two or more factors

The mice performed well, but an academic career was looking doubtful. At least for the post-doc. The mice, at home in a top research lab, were secure. The post-doc's name was John; the 18 mice were Alice, Bob, Charlene, …, and Robert (not their real names). John hoped to see whether a certain gene affected learning and memory. The mice hoped to find the hidden platform in the water maze.

John was studying mice in which he genetically modified a gene. For his experiment, he created knock-out (KO) mice that lacked the gene. Then he measured the difference in learning- and memory-related behaviors between the wild-type (Wt) mice, which had the normal version of the gene, and the knock-out mice that lacked it. Creating the knock-out mice and doing the behavioral tests was difficult and time consuming. After almost a year working on his post-doc, John had only 9 knock-out mice and 9 wild-type mice.

The t-test for the difference in response between the knock-outs and wild types was not significant, p = 0.1023.

        Two Sample t-test

data:  Response by Treatment
t = -1.7332, df = 16, p-value = 0.1023
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.6034999  0.2612777
sample estimates:
mean in group KO mean in group Wt
        2.544444         3.715556

John was afraid that his academic research career might be coming to an end. Without positive results from his experiment, he couldn't publish and couldn't get a faculty position. His wife would be very unhappy if he told her he had to start over in a new post-doc project instead of looking for a better-paying job.

But John was sure there was a difference between the knock-out and wild-type mice. Because of the effort involved in producing the knock-outs, he had run the experiments over a period of several months. When he plotted his data, there was a clear difference between the treatment groups within each month, but the responses differed greatly by month. It appeared that the measurements he was taking were reliable within a month, but not reproducible across months.

John decided to try a separate t-test of knock-out versus wild type for each of months 1, 2, and 3. His p-values for the three months were p = 0.0155, p = 0.0160, and p = 0.0586: significant for the first two months, but not significant for the third.

At this point, John came to me for help. John told me about his work, and I told John that we would use two-way ANOVA. John wanted to compare two treatment groups (knock-out vs. wild type), but he had run the experiment over several months. Something about the measurements was different in each month; one possible reason was the instrument, which John had to re-calibrate each month. The month of measurement had become an important source of variability in the response. Two-way ANOVA provides a way to control for month while testing for the effect of treatment group. If we use a t-test for treatment group, which doesn't remove the unexplained variability due to month, we get a non-significant result.

John and I agreed to do a two-way ANOVA, including treatment and month. What did we see?

Analysis of Variance Table

Response: Response
              Df  Sum Sq Mean Sq F value    Pr(>F)
factor(Month)  2 30.5044 15.2522  90.202 1.005e-08
Treatment      1  6.1718  6.1718  36.500 3.031e-05
Residuals     14  2.3672  0.1691

The ANOVA showed that the effect of the factor Month is significant (p = 1.005e-08), confirming that Month was a significant source of otherwise unexplained variability in the response. More interesting, the ANOVA table also showed that Treatment is significant, with p = 3.031e-05, or p = 0.00003.
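The whole analysis takes just a couple of lines of R. Here's a minimal sketch; the data frame name mice and its columns Response, Treatment, and Month are illustrative assumptions, not John's actual code.

# Pooled-variance t-test ignoring Month: not significant (p = 0.1023)
t.test(Response ~ Treatment, data = mice, var.equal = TRUE)

# Two-way ANOVA controlling for Month: Treatment is significant
anova(lm(Response ~ factor(Month) + Treatment, data = mice))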
John was ecstatic. The mice were pleased. His experiment was a success, he could publish, and his academic career was back on track. All thanks to using two-way analysis of variance to control for the unexplained variance that the t-test failed to control.

If other factors (besides treatment) affect the response, then we want to include those factors in our analysis. Use ANOVA to control for the effects of factors that cause unexplained variance.

The t-test is the most commonly used hypothesis test. It is also the worst test. It has no ability to control for factors besides treatment that affect the response. It minimizes the power to detect treatment effects. It maximizes the sample size required to make discoveries. The t-test is the test to use when you don't want to discover anything.

Two-way ANOVA example

Here's another example of two-way ANOVA. We have two categorical variables (two factors) that affect the patient's response:

Factor 1 (treatment): drug vs. placebo
Factor 2 (sex): male vs. female

Subject Treat   Sex    Response
 1      Drug    MALE      1.2
 2      Drug    MALE      1.2
 3      Drug    MALE      1.0
 4      Drug    MALE      1.1
 5      Drug    MALE      1.1
 6      Drug    FEMALE    4.0
 7      Drug    FEMALE    4.0
 8      Drug    FEMALE    4.0
 9      Drug    FEMALE    4.1
10      Drug    FEMALE    4.1
11      Placebo MALE      1.0
12      Placebo MALE      2.0
13      Placebo MALE      2.0
14      Placebo MALE      2.1
15      Placebo MALE      2.1
16      Placebo FEMALE    4.0
17      Placebo FEMALE    5.0
18      Placebo FEMALE    5.0
19      Placebo FEMALE    5.1
20      Placebo FEMALE    5.1

Here's a boxplot of Response versus Treatment. For Response versus Treatment, the t-test p-value = 0.3 is not significant, so we would conclude that the drug mean is not different from the placebo mean.

Let's look at response as a function of sex. Males and females differ greatly in their response: for Response versus Sex, the t-test p-value = 1.5e-10. If we don't control for the sex effect, it will be hard to detect the effect of the drug.

When researchers see this problem, they sometimes try separate t-tests for males and females. This strategy reduces the sample size in each subgroup; a better alternative is two-way ANOVA. Here is the two-way ANOVA (the output labels the sex factor Gender):

Analysis of Variance Table

Response: Response
          Df Sum Sq Mean Sq F value    Pr(>F)
Gender     1 43.808  43.808 406.515 2.622e-13 ***
Treatment  1  2.888   2.888  26.799 7.583e-05 ***
Residuals 17  1.832   0.108
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Both Gender and Treatment are significant (p < 0.05). Males and females differ greatly in their response. Drug and placebo also differ in their response. By including the gender covariate that influenced the response, we are better able to determine whether the treatment is effective. If we just do a t-test for Treatment, the result is not significant. Using the two-way ANOVA to control for Gender, Treatment is significant. Which do you think is the better analysis?
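As a sketch, here's how this example could be entered and analyzed in R. The data frame name patients is illustrative, and the variable names follow the data table above (the ANOVA output shown labels them Gender and Treatment).

# Enter the 20 observations from the table above
patients <- data.frame(
  Treat = rep(c("Drug", "Placebo"), each = 10),
  Sex   = rep(rep(c("MALE", "FEMALE"), each = 5), times = 2),
  Response = c(1.2, 1.2, 1.0, 1.1, 1.1,   # Drug, male
               4.0, 4.0, 4.0, 4.1, 4.1,   # Drug, female
               1.0, 2.0, 2.0, 2.1, 2.1,   # Placebo, male
               4.0, 5.0, 5.0, 5.1, 5.1))  # Placebo, female

# t-test for Treatment alone: not significant
t.test(Response ~ Treat, data = patients)

# Two-way ANOVA controlling for sex: Treatment is significant
anova(lm(Response ~ Sex + Treat, data = patients))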
Interactions

When we have two or more factors that affect a response, we may have interactions between the factors. We'll see some examples here, and examine interactions in more depth when we look at multiple regression and factorial experiment design.

Example: effect of male and female rabbits on the number of baby rabbits

The two factors are male rabbits (present or absent) and female rabbits (present or absent). The dependent (response) variable is the number of baby rabbits. The effect of the male rabbits depends on the female rabbits: male rabbits alone produce no baby rabbits; if female rabbits are present, then having male rabbits present leads to baby rabbits. There is an interaction between the two factors (male rabbits and female rabbits) in their effect on the response (baby rabbits). The effect of the male depends on whether or not there is a female. The word "depends" tells us we have an interaction. Whenever you say that the effect of one factor depends on the level of another factor, you have an interaction.

Example: interaction of anti-depressant drug with age in effect on suicidal thoughts

The two factors are anti-depressant drug and patient age. The dependent variable is suicidal thoughts. The effect of an anti-depressant drug depends on the age of the patient: the drug reduces suicidal thoughts in adults, but it increases suicidal thoughts in teenagers. There is an interaction between the two factors (drug and age) in their effect on the response (suicidal thoughts), because the effect of the drug depends on the age of the patient.

Testing for interactions

We can examine interactions using graphs, and use ANOVA to test for significant interactions. We'll use a cookie-baking example. We are interested in how the yield of good cookies is affected by the baking temperature and the time in the oven. Here's our data for 8 batches of cookies.

Batch Temperature Time Yield
  1        1       1     30
  2        1       1     35
  3        1       2     60
  4        1       2     58
  5        2       1     60
  6        2       1     64
  7        2       2     30
  8        2       2     35

An interaction plot is a convenient way to visualize interaction effects. If the lines are not parallel, there is an interaction. The interaction plot shows that there is an interaction between Temperature and Time in their effect on yield.

interaction.plot(temperature, time, yield)

We can also do formal statistical tests for interaction. In the ANOVA without the interaction term, neither temperature nor time is significant:

Analysis of Variance Table

Response: yield
            Df Sum Sq Mean Sq F value Pr(>F)
temperature  1    4.5     4.5   0.014 0.9103
time         1    4.5     4.5   0.014 0.9103
Residuals    5 1603.0   320.6

Here is the ANOVA model including an interaction term.

Analysis of Variance Table

Response: yield
                 Df Sum Sq Mean Sq  F value    Pr(>F)
temperature       1    4.5    4.50   0.5143 0.5129366
time              1    4.5    4.50   0.5143 0.5129366
temperature:time  1 1568.0 1568.00 179.2000 0.0001801 ***
Residuals         4   35.0    8.75
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The temperature:time interaction term is significant (p = 0.000180). Based on the interaction test and the interaction plot, it appears that the effect of time depends on temperature and vice versa. When we have interactions, we need to do further work to understand the effects of individual factors at different levels. We'll look more at this later.

Interaction example: weight gain vs. diet

For another two-way ANOVA interaction example, we'll use data from the textbook "A Handbook of Statistical Analyses Using R" by Brian Everitt and Torsten Hothorn. It is well worth getting a copy if you want to learn R. The experiment examined weight gain in rats fed four diets defined by two factors: diet type (low versus high protein) and protein source (beef versus cereal). Here's the interaction plot. It shows that the effect of protein source (beef versus cereal) depends on diet type (low versus high protein), and the effect of type depends on source. Because the lines are not parallel, there is an interaction.
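A minimal sketch of this analysis in R, assuming the weightgain data set from the book's companion HSAUR package (with columns source, type, and weightgain):

data("weightgain", package = "HSAUR")

# Interaction plot: non-parallel lines suggest a source:type interaction
with(weightgain, interaction.plot(type, source, weightgain))

# Two-way ANOVA including the source:type interaction term
anova(lm(weightgain ~ source * type, data = weightgain))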
Here's the analysis of variance including the source:type interaction term.

Analysis of Variance Table

Response: weightgain
            Df Sum Sq Mean Sq F value  Pr(>F)
source       1  220.9   220.9  0.9879 0.32688
type         1 1299.6  1299.6  5.8123 0.02114 *
source:type  1  883.6   883.6  3.9518 0.05447 .
Residuals   36 8049.4   223.6

The p-value = 0.05447 for the source:type interaction approaches significance, so we should be concerned that the effect of source depends on type, and the effect of type depends on source.
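An equivalent way to test the interaction is to compare the additive model against the model that includes the interaction; the F-test for that comparison matches the source:type line above. A sketch, continuing with the same assumed weightgain data frame:

additive <- lm(weightgain ~ source + type, data = weightgain)
with_int <- lm(weightgain ~ source * type, data = weightgain)

# F-test for adding source:type; p-value matches 0.05447 above
anova(additive, with_int)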