ANOVA Handout
#1. 1-Way ANOVA Example
#We’ll analyze the Filling Machines data from Problem 16.11, p. 725-6 from the
#textbook: A company uses six filling machines of the same make and model to place
#detergent into cartons that show a label weight of 32 ounces
fillmach<-read.table("c://Classes//Stat214//CH16PR11.txt")
## what the data look like
fillmach[1:3,]
     V1 V2 V3
1 -0.14  1  1
2  0.20  1  2
3  0.07  1  3
#assign variable names:
names(fillmach)<-c("amount","machine","replication")
#We can make the variables more accessible using the attach statement:
attach(fillmach)
## machine is the factor in this experiment and has to be declared as such in R
machine<-factor(machine)
# Let’s visualize the data
boxplot(amount~machine)
## there appear to be differences among the six machines with respect to the
## amount of detergent they put in the cartons
#We can apply single-factor ANOVA. Notice that we use aov() and not lm().
#aov() calls lm() internally and returns a fit tailored to ANOVA, which TukeyHSD()
#needs for the Tukey multiple comparisons.
## Is there a difference between machines?
summary(fm1 <- aov(amount ~ machine))
             Df Sum Sq Mean Sq F value    Pr(>F)
machine       5 2.2893  0.4579  14.784 3.636e-11 ***
Residuals   114 3.5306  0.0310
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The F-test is significant, hence we conclude that the mean amount of fill differs
# among the six machines
## Let’s check the distributional assumptions
par(mfrow=c(2,2))
plot(fm1)
## These plots check the basic assumptions of the classic ANOVA model: independent,
## normally distributed errors with constant variance. Homoscedasticity (the variance
## of the error is constant, i.e. it does not depend on the factor level) is very
## important.
## The QQ plot checks the normality of the residuals. Influential points can be
## detected in the Cook’s distance plot.
## There are a few problematic points (8, 32 and 71).
#Next question is: where are the differences?
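## Before turning to that, the F statistic and p-value in the ANOVA table above can be
## reproduced by hand from the printed sums of squares. This is only a sketch: the
## variable names are made up for illustration, and the inputs are the rounded values
## copied from the table, so the results agree with the printed ones only up to rounding.

```r
# Values copied from the summary(fm1) table above (rounded as printed)
ss_machine <- 2.2893; df_machine <- 5
ss_resid   <- 3.5306; df_resid   <- 114

ms_machine <- ss_machine / df_machine   # mean square between machines
ms_resid   <- ss_resid / df_resid       # mean square error (MSE)

f_value <- ms_machine / ms_resid        # F = MS(machine) / MSE
# Upper-tail probability of the F distribution with (5, 114) degrees of freedom
p_value <- pf(f_value, df_machine, df_resid, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

## The same two lines (mean squares, then their ratio) apply to any row of an ANOVA
## table that is tested against the residual mean square.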
# We’ll consider all pairwise comparisons: Tukey’s Method
#Multiple comparisons by TukeyHSD() must be done on a fitted model object returned by
#aov()
## We usually do not look for differences if the ANOVA null hypothesis is not rejected
## The function TukeyHSD implements Tukey multiple comparisons
fm1Tukey<-TukeyHSD(fm1,"machine")
fm1Tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = amount ~ machine)

$machine
       diff        lwr        upr     p adj
2-1  0.1170 -0.0443194  0.2783194 0.2934937
3-1  0.3865  0.2251806  0.5478194 0.0000000
4-1  0.2920  0.1306806  0.4533194 0.0000106
5-1  0.0515 -0.1098194  0.2128194 0.9392011
6-1  0.0780 -0.0833194  0.2393194 0.7260015
3-2  0.2695  0.1081806  0.4308194 0.0000588
4-2  0.1750  0.0136806  0.3363194 0.0252432
5-2 -0.0655 -0.2268194  0.0958194 0.8469184
6-2 -0.0390 -0.2003194  0.1223194 0.9815028
4-3 -0.0945 -0.2558194  0.0668194 0.5359056
5-3 -0.3350 -0.4963194 -0.1736806 0.0000003
6-3 -0.3085 -0.4698194 -0.1471806 0.0000029
5-4 -0.2405 -0.4018194 -0.0791806 0.0004684
6-4 -0.2140 -0.3753194 -0.0526806 0.0026737
6-5  0.0265 -0.1348194  0.1878194 0.9968910
# A difference between two machines is significant at the 5% level if the confidence
# interval around the estimated difference does not contain zero.
#This can be visualized by plotting the TukeyHSD result:
par(mfrow=c(1,1))
plot(fm1Tukey)
## Statistically significant differences exist between machines 1-3, 1-4, 2-3, 2-4,
## 3-5, 3-6, 4-5 and 4-6
#2. 2-Way ANOVA Example
#We’ll analyze the Cash Offers data from Problem 16.10, p. 725 from the textbook:
#A consumer organization studied the effect of age of automobile owner on size of cash
#offer for a used car by utilizing 12 persons in each of three age groups. Six of the twelve
#were females and six males.
cash<-read.table("c://Classes//Stat214//CH19PR10.txt")
## what the data look like
cash[1:3,]
  V1 V2 V3 V4
1 21  1  1  1
2 23  1  1  2
3 19  1  1  3
names(cash)<-c('offer','age','sex','replicate')
cash$sex<-factor(cash$sex)
cash$age<-factor(cash$age)
## Next we draw an interaction plot
interaction.plot(cash$age,cash$sex,cash$offer)
## start with a full interaction model
FullMod<-lm(cash$offer~cash$sex+cash$age+cash$sex:cash$age)
#use the 'anova' function to get the ANOVA table (type I SS, or sequential extra sums of
#squares):
anova(FullMod)
Analysis of Variance Table

Response: cash$offer
                  Df Sum Sq Mean Sq F value   Pr(>F)
cash$sex           1   5.44    5.44  2.2791   0.1416
cash$age           2 316.72  158.36 66.2907 9.79e-12 ***
cash$sex:cash$age  2   5.06    2.53  1.0581   0.3597
Residuals         30  71.67    2.39
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## only age appears to be important, as we expected from the interaction plot
## let’s fit a 1-way ANOVA with age only
summary(aov(cash$offer ~ cash$age))
            Df Sum Sq Mean Sq F value    Pr(>F)
cash$age     2 316.72  158.36  63.601 4.769e-12 ***
Residuals   33  82.17    2.49
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

###Problem 19.14: Hay Fever (p. 868)
hay<-read.table("c://Classes//Stat214//CH19PR14.txt")
names(hay)<-c('relief','A','B','replicate')
hay$A<-factor(hay$A)
hay$B<-factor(hay$B)
## Next we draw an interaction plot
interaction.plot(hay$A,hay$B,hay$relief)
## interaction is evident
FullMod<-lm(hay$relief~hay$A*hay$B)
anova(FullMod)
Analysis of Variance Table

Response: hay$relief
            Df  Sum Sq Mean Sq F value    Pr(>F)
hay$A        2 220.020 110.010 1827.86 < 2.2e-16 ***
hay$B        2 123.660  61.830 1027.33 < 2.2e-16 ***
hay$A:hay$B  4  29.425   7.356  122.23 < 2.2e-16 ***
Residuals   27   1.625   0.060
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## interaction is significant, as expected
# check the residuals for violations of model assumptions
par(mfrow=c(2,2))
plot(FullMod)
### Multiple Comparisons:
## One approach when the interaction is important is to perform a one-way ANOVA with all
## combinations of A and B combined into one factor.
#A simple trick (A and B have to be converted to numeric first):
C<-as.numeric(hay$A)*10+as.numeric(hay$B)
C
# [1] 11 11 11 11 12 12 12 12 13 13 13 13 21 21 21 21 22 22 22 22 23 23 23 23 31 31 31
#     31 32 32 32 32 33
#[34] 33 33 33
C<-factor(C)
CMod<-lm(hay$relief~C)
anova(CMod)
Analysis of Variance Table

Response: hay$relief
          Df Sum Sq Mean Sq F value    Pr(>F)
C          8 373.10   46.64  774.91 < 2.2e-16 ***
Residuals 27   1.63    0.06
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## we now compare all the combinations of A and B coded in the new combined factor C
## We can apply Tukey multiple comparisons.
TukeyHSD(aov(hay$relief~C), "C", ordered = TRUE)
## not showing the result as it is very long
## here’s the plot instead
par(mfrow=c(1,1))
plot(TukeyHSD(aov(hay$relief~C), "C", ordered = TRUE))
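## The numeric coding used above gives unique codes only as long as B has no more than
## 10 levels. A possible alternative is the base R function interaction(), which builds
## the combined factor directly from A and B. The sketch below uses a made-up data frame
## (toy, with simulated relief values) that only mimics the 3 x 3 layout with 4
## replicates per cell; it is not the Hay Fever data, and all object names are
## hypothetical.

```r
# Made-up data with the same layout as the handout: A and B with 3 levels each,
# 4 replicates per cell. The relief values are simulated, NOT the textbook data.
set.seed(1)
toy <- expand.grid(replicate = 1:4, B = factor(1:3), A = factor(1:3))
toy$relief <- rnorm(nrow(toy), mean = as.numeric(toy$A) + as.numeric(toy$B))

# The handout's numeric trick
C_trick <- factor(as.numeric(toy$A) * 10 + as.numeric(toy$B))

# Same combined factor via interaction(); sep = "" gives labels like "11", and
# lex.order = TRUE orders the levels 11, 12, 13, 21, ... as in the trick above
C_int <- interaction(toy$A, toy$B, sep = "", lex.order = TRUE)

# Both codings label every observation identically
all(as.character(C_trick) == as.character(C_int))

# Tukey comparisons on the combined factor, as in the handout
toy$C <- C_int
tk <- TukeyHSD(aov(relief ~ C, data = toy), "C")
nrow(tk$C)   # number of pairwise comparisons among the 9 cells
```

## Passing the data frame through data = toy, instead of writing toy$relief in the
## formula, also keeps the row labels in the printed tables shorter.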