STAT:5201 Applied Statistics II
(Factorial with 3 factors as a 2^3 design)

Three-way ANOVA (factorial with three factors) with replication

Factor A: angle (low=0/high=1)
Factor B: geometry (shape A=0/shape B=1)
Factor C: speed (low=0/high=1)
Response: life of the machine tool in hours.

An engineer is interested in the effects of cutting angle (A), tool geometry (B), and cutting speed (C) on the life (in hours) of a machine tool. Three runs are done for each combination of factor levels, and all runs are done in random order. This is a completely randomized design (CRD).

{ D.C. Montgomery (2005). Design and analysis of experiments. John Wiley & Sons: USA. }

SAS data statements and data:

    /*Factor A: angle
      Factor B: geometry
      Factor C: speed*/
    data tool;
      do angle = 0,1;
        do geometry = 0,1;
          do speed = 0,1;
            do replicate = 1 to 3;
              input life @@;
              output;
            end;
          end;
        end;
      end;
    datalines;
    22 31 25 32 43 29 35 34 50 55 47 46
    44 45 38 40 37 36 60 50 54 39 41 47
    ;
    run;

    proc print data=tool;
    run;

    Obs  angle  geometry  speed  replicate  life
      1      0         0      0          1    22
      2      0         0      0          2    31
      3      0         0      0          3    25
      4      0         0      1          1    32
      5      0         0      1          2    43
      6      0         0      1          3    29
      7      0         1      0          1    35
      8      0         1      0          2    34
      9      0         1      0          3    50
     10      0         1      1          1    55
     11      0         1      1          2    47
     12      0         1      1          3    46
     13      1         0      0          1    44
     14      1         0      0          2    45
     15      1         0      0          3    38
     16      1         0      1          1    40
     17      1         0      1          2    37
     18      1         0      1          3    36
     19      1         1      0          1    60
     20      1         1      0          2    50
     21      1         1      0          3    54
     22      1         1      1          1    39
     23      1         1      1          2    41
     24      1         1      1          3    47

    proc glm data=tool plot=diagnostics;
      class angle geometry speed replicate;
      model life=angle|geometry|speed;  /* The full model fits a separate mean for each cell */
    run;

Partial output:

    The GLM Procedure
    Dependent Variable: life

    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              7      1612.666667    230.380952      7.64   0.0004
    Error             16       482.666667     30.166667
    Corrected Total   23      2095.333333

    Source                  DF   Type III SS   Mean Square   F Value   Pr > F
    angle                    1   280.1666667   280.1666667      9.29   0.0077
    geometry                 1   770.6666667   770.6666667     25.55   0.0001
    angle*geometry           1    48.1666667    48.1666667      1.60   0.2245
    speed                    1     0.6666667     0.6666667      0.02   0.8837
    angle*speed              1   468.1666667   468.1666667     15.52   0.0012
    geometry*speed           1    16.6666667    16.6666667      0.55   0.4681
    angle*geometry*speed     1    28.1666667    28.1666667      0.93   0.3483   <---

The diagnostic plots look OK, and the 3-way interaction is not significant here, so that term could be removed from the model (which pools its sum of squares and degree of freedom into the error term).

Just for the sake of investigation, let's look at the 2-way interaction plots of angle (A) and geometry (B) for each level of speed (C) anyway... I would not show this to my client, but it's for my own information.

    /*Plot 2-way interaction plots of AB for each level of C separately.*/
    data lowC;
      set tool;
      if speed=0;
    run;

    proc print data=lowC;
    run;

    symbol1 interpol=std1mj value=star line=1 color=black;
    symbol2 interpol=std1mj value=diamond line=2 color=blue;

    proc gplot data=lowC;
      plot life*angle=geometry/haxis=-0.5 to 1.5;
      title "low speed: 2-way plot for AB";
    run;

    data highC;
      set tool;
      if speed=1;
    run;

    proc print data=highC;
    run;

    proc gplot data=highC;
      plot life*angle=geometry/haxis=-0.5 to 1.5;
      title "high speed: 2-way plot for AB";
    run;

Though it is visually apparent that these two interaction plots are not the same, the two interactions presented in them are not statistically significantly different. The size of the 2-way interaction within each plot can be read off by first taking the difference between the two lines at each angle level (there are two such differences), then subtracting one difference from the other. The values of these two interactions did not test as significantly different from each other, and thus the 3-way interaction tested as not significant.

According to the Type III ANOVA table, the 2-way interaction between angle (A) and speed (C) is significant, and the other 2-way interactions (AB and BC) are not significant.

We will look at the 'marginal' 2-way interaction plot for each combination of factors AB, AC, and BC (these plots average over replicates in a cell and over the levels of the unplotted factor)...
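As a worked illustration of the difference-of-differences description of an interaction, the AB interaction at each speed can be computed by hand from the eight cell means (each cell mean is the average of the three replicates in the datalines). The notation $\bar{y}_{abc}$ for the cell mean at angle $a$, geometry $b$, speed $c$ is introduced here for illustration only and does not appear in the SAS output; values are rounded to two decimals.

```latex
% Cell means (angle a, geometry b, speed c), averaged over 3 replicates
% Low speed (c = 0):  \bar{y}_{000}=26.00,\ \bar{y}_{010}=39.67,\ \bar{y}_{100}=42.33,\ \bar{y}_{110}=54.67
% High speed (c = 1): \bar{y}_{001}=34.67,\ \bar{y}_{011}=49.33,\ \bar{y}_{101}=37.67,\ \bar{y}_{111}=42.33
\begin{align*}
AB \mid C{=}0 &= (\bar{y}_{110}-\bar{y}_{100}) - (\bar{y}_{010}-\bar{y}_{000})
               = 12.33 - 13.67 = -1.33 \\
AB \mid C{=}1 &= (\bar{y}_{111}-\bar{y}_{101}) - (\bar{y}_{011}-\bar{y}_{001})
               = 4.67 - 14.67 = -10.00 \\
L_{ABC} &= (AB \mid C{=}1) - (AB \mid C{=}0) = -10.00 - (-1.33) = -8.67 \\
SS_{ABC} &= \frac{n\,L_{ABC}^{2}}{8} = \frac{3\,(8.67)^{2}}{8} \approx 28.17,
\qquad
F = \frac{28.17}{30.17} \approx 0.93
\end{align*}
```

The two AB interactions differ by about 8.67 hours, which reproduces the Type III sum of squares (28.17) and F value (0.93) reported for angle*geometry*speed, with $n = 3$ replicates per cell and the full-model mean square error of 30.17.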
    Source                  DF   Type III SS   Mean Square   F Value   Pr > F
    angle*geometry           1    48.1666667    48.1666667      1.60   0.2245
    angle*speed              1   468.1666667   468.1666667     15.52   0.0012
    geometry*speed           1    16.6666667    16.6666667      0.55   0.4681
    angle*geometry*speed     1    28.1666667    28.1666667      0.93   0.3483

    /* Look at the marginal 2-way interaction plots.*/
    symbol1 interpol=std1mj value=star line=1 color=black;
    symbol2 interpol=std1mj value=diamond line=2 color=blue;

    proc gplot data=tool;
      plot life*angle=geometry/haxis=-.5 to 1.5;
      title "AB interaction (averaged across third factor)";
    run;

    proc gplot data=tool;
      plot life*angle=speed/haxis=-.5 to 1.5;
      title "AC interaction (averaged across third factor)";
    run;

    proc gplot data=tool;
      plot life*speed=geometry/haxis=-.5 to 1.5;
      title "BC interaction (averaged across third factor)";
    run;

The AC interaction is statistically significant, and the lines in the AC plot cross, which makes global statements about the main effects of angle (A) and speed (C) unsafe. When angle is low (far left side), speed has a positive effect on life, and when angle is high (far right side), speed has a negative effect on life.

The minimal model should include: A, B, C, AC (following the hierarchy principle).

Suppose the 3-way interaction was significant. How to proceed?... Subset data?

One could proceed by fitting a separate two-factor factorial model, with speed and geometry as the factors, at each level of angle.
    /*Fit 2-factor model for low A.*/
    data lowA;
      set tool;
      if angle=0;
    run;

    proc glm data=lowA plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry;
      lsmeans geometry speed;
    run;

    /* The plot generated by the following is the same as that provided by PROC GLM*/
    proc gplot data=lowA;
      plot life*speed=geometry;
      title "Low angle: 2-way plot for BC";
    run;

    The GLM Procedure
    Class Level Information

    Class       Levels   Values
    speed            2   0 1
    geometry         2   0 1
    replicate        3   1 2 3

    The GLM Procedure
    Dependent Variable: life

    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              3       854.916667    284.972222      6.33   0.0166
    Error              8       360.000000     45.000000
    Corrected Total   11      1214.916667

    Source            DF   Type III SS   Mean Square   F Value   Pr > F
    speed              1   252.0833333   252.0833333      5.60   0.0455
    geometry           1   602.0833333   602.0833333     13.38   0.0064
    speed*geometry     1     0.7500000     0.7500000      0.02   0.9005

When angle is set to the low level, there is no significant interaction between geometry (B) and speed (C) (see the interaction plot provided by PROC GLM). There is a significant positive speed main effect, and a significant positive geometry main effect.

[Plot: life versus speed, one line per geometry level, low angle. Provided by PROC GLM.]

    The GLM Procedure
    Least Squares Means

    geometry   life LSMEAN
    0           30.3333333
    1           44.5000000

    speed      life LSMEAN
    0           32.8333333
    1           42.0000000

If you'd like to get the estimates for the parameters in the model that you fitted, you can request them with the solution option in the model statement. But I think, in this case, the means are probably easier to explain to a client.
    proc glm data=lowA plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry/solution;
      lsmeans geometry speed;
      lsmeans geometry*speed;
    run;

    Parameter                  Estimate        Standard Error   t Value   Pr > |t|
    Intercept                  49.33333333 B       3.87298335     12.74     <.0001
    speed          0           -9.66666667 B       5.47722558     -1.76     0.1156
    speed          1            0.00000000 B        .               .        .
    geometry       0          -14.66666667 B       5.47722558     -2.68     0.0280
    geometry       1            0.00000000 B        .               .        .
    speed*geometry 0 0          1.00000000 B       7.74596669      0.13     0.9005
    speed*geometry 0 1          0.00000000 B        .               .        .
    speed*geometry 1 0          0.00000000 B        .               .        .
    speed*geometry 1 1          0.00000000 B        .               .        .

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

There are 4 cells in this 2-way ANOVA. Because SAS sets the effects for the final level of each factor to zero, the baseline group (i.e. the cell mean represented by the intercept) is B=1 and C=1. The output shows this to be 49.333333, and that's the same as the LSmeans output for that cell mean in the model that includes interaction (shown below).
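To see how the solution estimates map onto all four cell means, not just the baseline cell, each cell mean is the intercept plus whichever non-zero effects apply to that cell. The symbol $\mu_{cb}$ (speed $c$, geometry $b$, at low angle) is notation introduced here for illustration and is not part of the SAS output; values are rounded to three decimals.

```latex
\begin{align*}
\mu_{11} &= 49.333 && \text{(intercept only: speed 1, geometry 1)} \\
\mu_{01} &= 49.333 - 9.667 \approx 39.667 && \text{(add the speed 0 effect)} \\
\mu_{10} &= 49.333 - 14.667 \approx 34.667 && \text{(add the geometry 0 effect)} \\
\mu_{00} &= 49.333 - 9.667 - 14.667 + 1.000 \approx 26.000 && \text{(both effects plus the speed*geometry 0 0 term)}
\end{align*}
```

These four values agree with the speed*geometry LSmeans from the same model.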
    The GLM Procedure
    Least Squares Means

    speed   geometry   life LSMEAN
    0       0           26.0000000
    0       1           39.6666667
    1       0           34.6666667
    1       1           49.3333333

    /*Fit 2-factor model to high A.*/
    data highA;
      set tool;
      if angle=1;
    run;

    proc glm data=highA plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry;
      lsmeans geometry speed;
    run;

    /* The plot generated by the following is the same as that provided by PROC GLM*/
    proc gplot data=highA;
      plot life*speed=geometry;
      title "High angle: 2-way plot for BC";
    run;

    The GLM Procedure
    Class Level Information

    Class       Levels   Values
    speed            2   0 1
    geometry         2   0 1
    replicate        3   1 2 3

    The GLM Procedure
    Dependent Variable: life

    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              3      477.5833333   159.1944444     10.38   0.0039
    Error              8      122.6666667    15.3333333
    Corrected Total   11      600.2500000

    Source            DF   Type III SS   Mean Square   F Value   Pr > F
    speed              1   216.7500000   216.7500000     14.14   0.0055
    geometry           1   216.7500000   216.7500000     14.14   0.0055
    speed*geometry     1    44.0833333    44.0833333      2.88   0.1284

When angle is set to the high level, there is no significant interaction between geometry (B) and speed (C). There is a significant negative speed effect, and a significant positive geometry effect (see the interaction plot provided by PROC GLM).

    The GLM Procedure
    Least Squares Means

    geometry   life LSMEAN
    0           40.0000000
    1           48.5000000

    speed      life LSMEAN
    0           48.5000000
    1           40.0000000

Suppose the 3-way interaction was significant. How to proceed?... Slice the data?

One could get a very similar analysis (with more degrees of freedom for error) by fitting the full model and then 'slicing' by angle (A).
    proc glm data=tool plot=diagnostics;
      class angle speed geometry replicate;
      model life=angle|speed|geometry;
      lsmeans angle*geometry*speed/slice=angle;  /* slice the full model by angle level*/
    run;

    The GLM Procedure
    Class Level Information

    Class       Levels   Values
    angle            2   0 1
    speed            2   0 1
    geometry         2   0 1
    replicate        3   1 2 3

    Number of Observations Used   24

    The GLM Procedure
    Least Squares Means

    angle   speed   geometry   life LSMEAN
    0       0       0           26.0000000
    0       0       1           39.6666667
    0       1       0           34.6666667
    0       1       1           49.3333333
    1       0       0           42.3333333
    1       0       1           54.6666667
    1       1       0           37.6666667
    1       1       1           42.3333333

    angle*speed*geometry Effect Sliced by angle for life

    angle   DF   Sum of Squares   Mean Square   F Value   Pr > F
    0        3       854.916667    284.972222      9.45   0.0008
    1        3       477.583333    159.194444      5.28   0.0101

If you compare the Mean Squares in the above 'slice' output, they match the Model Mean Squares for the two models we fit in the two subsetted analyses, but the F-statistics are different. Why?
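One way to work toward the answer is to recompute each F-statistic by hand from the numbers already shown. The sketch below assumes the slice tests use the error mean square of the full three-factor model (16 df), while each subsetted analysis uses its own error mean square (8 df). The labels "slice" and "subset" are introduced here for illustration.

```latex
\begin{align*}
\text{Pooled error: } MSE_{\text{full}} &= \frac{360.000 + 122.667}{8 + 8} = \frac{482.667}{16} \approx 30.167 \\[4pt]
\text{angle}=0:\quad F_{\text{slice}} &= \frac{284.972}{30.167} \approx 9.45 \ (3,\,16\ \text{df}),
 & F_{\text{subset}} &= \frac{284.972}{45.000} \approx 6.33 \ (3,\,8\ \text{df}) \\
\text{angle}=1:\quad F_{\text{slice}} &= \frac{159.194}{30.167} \approx 5.28 \ (3,\,16\ \text{df}),
 & F_{\text{subset}} &= \frac{159.194}{15.333} \approx 10.38 \ (3,\,8\ \text{df})
\end{align*}
```

The numerators are identical in both approaches. Only the denominator changes: the slice analysis pools the two within-angle error sums of squares into one estimate with 16 error degrees of freedom, which reproduces the full-model Error line (482.667 on 16 df).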