STAT:5201 Applied Statistics II
(Factorial with 3 factors as a 2^3 design)
Three-way ANOVA (Factorial with three factors) with replication
Factor A: angle (low=0/high=1)
Factor B: geometry (shape A=0/shape B=1)
Factor C: speed (low=0/high=1)
Response: life of the machine tool, in hours.
An engineer is interested in the effects of cutting angle (A), tool geometry (B), and cutting speed (C) on
the life (in hours) of a machine tool. Three runs are done for each combination of factor levels, and all runs
are done in random order. This is a completely randomized design (CRD). { D.C. Montgomery (2005).
Design and analysis of experiments. John Wiley & Sons: USA. }
SAS data statements and data:
/*Factor A: angle
Factor B: geometry
Factor C: speed*/
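data tool;
/* The loop order must match the order of the values in DATALINES:
   angle varies slowest and replicate fastest. */
do angle = 0,1;
do geometry = 0,1;
do speed = 0,1;
do replicate = 1 to 3;
input life @@;   /* @@ holds the data line so several values can be read from it */
output;          /* write one observation for each value read */
end;
end;
end;
end;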
datalines;
22 31 25 32 43 29 35 34 50 55 47 46
44 45 38 40 37 36 60 50 54 39 41 47
;
run;
proc print data=tool;
run;
Obs  angle  geometry  speed  replicate  life
  1      0         0      0          1    22
  2      0         0      0          2    31
  3      0         0      0          3    25
  4      0         0      1          1    32
  5      0         0      1          2    43
  6      0         0      1          3    29
  7      0         1      0          1    35
  8      0         1      0          2    34
  9      0         1      0          3    50
 10      0         1      1          1    55
 11      0         1      1          2    47
 12      0         1      1          3    46
 13      1         0      0          1    44
 14      1         0      0          2    45
 15      1         0      0          3    38
 16      1         0      1          1    40
 17      1         0      1          2    37
 18      1         0      1          3    36
 19      1         1      0          1    60
 20      1         1      0          2    50
 21      1         1      0          3    54
 22      1         1      1          1    39
 23      1         1      1          2    41
 24      1         1      1          3    47
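The listing follows the nesting of the DO loops: angle changes slowest and replicate fastest, so each value in DATALINES lands in a fixed cell. As a cross-check outside SAS, here is a minimal Python sketch that rebuilds the same 24 rows (the names `life_values` and `rows` are illustrative, not from the notes):

```python
# Sketch (Python, not SAS): mimic the nested DO loops of the DATA step.
# Assumption: values are listed exactly as in DATALINES, and the SAS loops run
# angle (outermost), geometry, speed, replicate (innermost).
from itertools import product

life_values = [22, 31, 25, 32, 43, 29, 35, 34, 50, 55, 47, 46,
               44, 45, 38, 40, 37, 36, 60, 50, 54, 39, 41, 47]

# product() varies its LAST argument fastest, like the innermost DO loop.
rows = [
    (angle, geometry, speed, replicate, life)
    for (angle, geometry, speed, replicate), life in zip(
        product((0, 1), (0, 1), (0, 1), (1, 2, 3)), life_values
    )
]

for obs, row in enumerate(rows, start=1):
    print(obs, *row)
```

Row 10, for instance, comes out as angle=0, geometry=1, speed=1, replicate=1, life=55, as in the PROC PRINT listing.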
proc glm data=tool plot=diagnostics;
class angle geometry speed replicate;
model life=angle|geometry|speed;
/* The full model fits a separate mean to each of the 8 angle x geometry x speed cells */
run;
Partial output:
The GLM Procedure
Dependent Variable: life

Source                 DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                   7      1612.666667    230.380952      7.64   0.0004
Error                  16       482.666667     30.166667
Corrected Total        23      2095.333333

Source                 DF    Type III SS    Mean Square   F Value   Pr > F
angle                   1    280.1666667    280.1666667      9.29   0.0077
geometry                1    770.6666667    770.6666667     25.55   0.0001
angle*geometry          1     48.1666667     48.1666667      1.60   0.2245
speed                   1      0.6666667      0.6666667      0.02   0.8837
angle*speed             1    468.1666667    468.1666667     15.52   0.0012
geometry*speed          1     16.6666667     16.6666667      0.55   0.4681
angle*geometry*speed    1     28.1666667     28.1666667      0.93   0.3483   <---
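Because the design is a balanced 2^3 factorial, every effect in the Type III table has 1 df and its sum of squares is (contrast of cell totals)^2 / (n x 2^3), using +/-1 contrast coefficients. A Python sketch that reproduces the table from the raw data (a hand check, not how PROC GLM computes it internally; helper names are hypothetical):

```python
# Sketch (Python, not SAS): Type III sums of squares for the balanced 2^3
# factorial, computed from +/-1 contrast coefficients on the cell totals.
from itertools import product

life_values = [22, 31, 25, 32, 43, 29, 35, 34, 50, 55, 47, 46,
               44, 45, 38, 40, 37, 36, 60, 50, 54, 39, 41, 47]
n = 3            # replicates per cell
n_cells = 2 ** 3
total_n = n * n_cells

# Cell totals keyed by (angle, geometry, speed), in the DATALINES order.
totals = {
    cell: sum(life_values[n * i: n * i + n])
    for i, cell in enumerate(product((0, 1), repeat=3))
}

def sign(level):
    """Code level 0 as -1 and level 1 as +1."""
    return 1 if level == 1 else -1

# Positions of the factors in each cell key: 0 = angle, 1 = geometry, 2 = speed.
effects = {
    "angle": (0,),
    "geometry": (1,),
    "angle*geometry": (0, 1),
    "speed": (2,),
    "angle*speed": (0, 2),
    "geometry*speed": (1, 2),
    "angle*geometry*speed": (0, 1, 2),
}

ss = {}
for name, positions in effects.items():
    contrast = 0
    for cell, total in totals.items():
        coef = 1
        for p in positions:
            coef *= sign(cell[p])
        contrast += coef * total
    ss[name] = contrast ** 2 / total_n   # 1-df SS for a +/-1 contrast of totals

# Error SS = variation of the observations around their own cell means.
sse = sum(y * y for y in life_values) - sum(t * t for t in totals.values()) / n
mse = sse / (total_n - n_cells)

for name, value in ss.items():
    print(f"{name:22s} SS = {value:10.4f}   F = {value / mse:6.2f}")
```

The printed values should match the Type III column (for example, 468.1667 for angle*speed), and the MSE comes out to 30.1667 on 16 df.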
The diagnostic plots look OK, and the 3-way interaction is not significant here (p = 0.3483), so that term could be removed from the model; its sum of squares and degree of freedom would then be pooled into the error term.
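Dropping the 3-way interaction can be done by hand from the table above. A minimal Python sketch, assuming the balanced design (so dropping a term leaves the other sums of squares unchanged); all numbers are copied from the full-model output:

```python
# Sketch (Python, not SAS): pool the 3-way interaction into error.
# Assumption: balanced, orthogonal design, so the other effects' SS are unchanged.
sse_full, df_error_full = 482.666667, 16   # Error line of the full model
ss_abc, df_abc = 28.1666667, 1             # angle*geometry*speed line
ss_ac = 468.1666667                        # angle*speed line

sse_reduced = sse_full + ss_abc
df_error_reduced = df_error_full + df_abc
mse_reduced = sse_reduced / df_error_reduced

print(f"pooled SSE = {sse_reduced:.4f} on {df_error_reduced} df, MSE = {mse_reduced:.4f}")
print(f"F for angle*speed with the pooled MSE = {ss_ac / mse_reduced:.2f}")
```

This gives an error SS of about 510.83 on 17 df (MSE about 30.05), so the F for angle*speed moves only from 15.52 to about 15.58.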
Just for the sake of investigation, let’s look at the 2-way interaction plots of angle (A) and geometry (B)
for each level of speed (C) anyway... I would not show this to my client, but it’s for my own information.
/*Plot 2-way interaction plots of AB for each level of C separately.*/
data lowC; set tool;
if speed=0;
run;
proc print data=lowC;
run;
symbol1 interpol=std1mj value=star line=1 color=black;
symbol2 interpol=std1mj value=diamond line=2 color=blue;
proc gplot data=lowC;
plot life*angle=geometry/haxis=-0.5 to 1.5;
title "low speed: 2-way plot for AB";
run;
data highC; set tool;
if speed=1;
run;
proc print data=highC;
run;
proc gplot data=highC;
plot life*angle=geometry/haxis=-0.5 to 1.5;
title "high speed: 2-way plot for AB";
run;
Though it is visually apparent that these two interaction plots are not the same, the two interactions
presented in them are not statistically significantly different. The size of the 2-way interaction within
each plot can be quantified by first taking the difference between the two lines at each angle level
(there are two such differences) and then subtracting one difference from the other. The 3-way interaction
compares this quantity between the two speed levels; the two values did not differ significantly, and thus
the 3-way interaction tested as not significant.
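To make the "difference of differences" concrete, here is a small Python sketch working from the cell means (an illustration outside the SAS workflow; the function name is hypothetical). It computes the AB interaction at each speed and turns the gap between the two into the 3-way interaction sum of squares:

```python
# Sketch (Python, not SAS): AB interaction at each speed and the resulting
# 3-way interaction sum of squares, computed from the cell means.
from itertools import product

life_values = [22, 31, 25, 32, 43, 29, 35, 34, 50, 55, 47, 46,
               44, 45, 38, 40, 37, 36, 60, 50, 54, 39, 41, 47]
n = 3  # replicates per cell

# Cell means keyed by (angle, geometry, speed), in the DATALINES order.
means = {
    cell: sum(life_values[n * i: n * i + n]) / n
    for i, cell in enumerate(product((0, 1), repeat=3))
}

def ab_interaction(speed):
    """Gap between the geometry lines at high angle minus the gap at low angle."""
    gap_low_angle = means[(0, 1, speed)] - means[(0, 0, speed)]
    gap_high_angle = means[(1, 1, speed)] - means[(1, 0, speed)]
    return gap_high_angle - gap_low_angle

ab_low_speed = ab_interaction(0)
ab_high_speed = ab_interaction(1)

# The difference of the two AB interactions is a contrast of the 8 cell means
# with coefficients +/-1, so its SS is n * contrast^2 / 8.
abc_contrast = ab_high_speed - ab_low_speed
ss_abc = n * abc_contrast ** 2 / 8

print(f"AB interaction, low speed:  {ab_low_speed:.4f}")
print(f"AB interaction, high speed: {ab_high_speed:.4f}")
print(f"difference: {abc_contrast:.4f}, SS(ABC) = {ss_abc:.4f}")
```

The AB interaction is about -1.33 at low speed and -10.00 at high speed; the gap of about -8.67 gives SS(ABC) = 28.1667, the value in the Type III table.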
According to the Type III ANOVA table, the 2-way interaction between angle (A) and speed (C) is
significant, and the other 2-way interactions are not significant (AB and BC). We will look at the ‘marginal’
2-way interaction plot for each combination of factors AB, AC, and BC (these plots average over replicates
in a cell and over the levels of the unplotted factor)...
Source                 DF    Type III SS    Mean Square   F Value   Pr > F
angle*geometry          1     48.1666667     48.1666667      1.60   0.2245
angle*speed             1    468.1666667    468.1666667     15.52   0.0012
geometry*speed          1     16.6666667     16.6666667      0.55   0.4681
angle*geometry*speed    1     28.1666667     28.1666667      0.93   0.3483
/* Look at the marginal 2-way interaction plots.*/
symbol1 interpol=std1mj value=star line=1 color=black;
symbol2 interpol=std1mj value=diamond line=2 color=blue;
proc gplot data=tool;
plot life*angle=geometry/haxis=-.5 to 1.5;
title "AB interaction (averaged across third factor)";
run;
proc gplot data=tool;
plot life*angle=speed/haxis=-.5 to 1.5;
title "AC interaction (averaged across third factor)";
run;
proc gplot data=tool;
plot life*speed=geometry/haxis=-.5 to 1.5;
title "BC interaction (averaged across third factor)";
run;
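Each mean joined in these marginal plots averages 6 runs (3 replicates times the 2 levels of the factor not plotted). A Python sketch that computes those marginal means directly (helper names are hypothetical; this is a check, not part of the SAS code):

```python
# Sketch (Python, not SAS): marginal 2-way means that the interaction plots join.
from itertools import product
from statistics import mean

life_values = [22, 31, 25, 32, 43, 29, 35, 34, 50, 55, 47, 46,
               44, 45, 38, 40, 37, 36, 60, 50, 54, 39, 41, 47]
factor_names = ("angle", "geometry", "speed")

# One (angle, geometry, speed) key per observation, repeated for the
# 3 replicates, in the DATALINES order.
cells = [cell for cell in product((0, 1), repeat=3) for _ in range(3)]
records = list(zip(cells, life_values))

def marginal_means(i, j):
    """Mean life for each combination of levels of factors i and j,
    averaging over replicates and over the third factor."""
    return {
        (a, b): mean(y for cell, y in records if cell[i] == a and cell[j] == b)
        for a, b in product((0, 1), repeat=2)
    }

for i, j in ((0, 1), (0, 2), (1, 2)):
    table = marginal_means(i, j)
    print(f"{factor_names[i]} x {factor_names[j]}:",
          {key: round(value, 3) for key, value in table.items()})
```

For AC, for example, the four means (angle, speed) = (0,0), (0,1), (1,0), (1,1) are about 32.83, 42.00, 48.50, and 40.00.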
The type of interaction in the AC plot (the lines cross) makes global statements about the main effects
of angle (A) and speed (C) risky, and this interaction is statistically significant. When angle is low (left
side of the plot), speed has a positive effect on life, and when angle is high (right side), speed has a negative
effect on life. The minimal model should include A, B, C, and AC; C is kept even though its main effect is not
significant, following the hierarchy principle (a model containing AC should also contain A and C).
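The sign change can be quantified with the simple effect of speed (high minus low) at each angle, using the angle x speed marginal means. A Python sketch (the helper `ac_mean` is hypothetical):

```python
# Sketch (Python, not SAS): simple effect of speed (high minus low) at each
# angle level, using the angle x speed marginal means.
from itertools import product

life_values = [22, 31, 25, 32, 43, 29, 35, 34, 50, 55, 47, 46,
               44, 45, 38, 40, 37, 36, 60, 50, 54, 39, 41, 47]
# (angle, geometry, speed) for each observation, in the DATALINES order.
cells = [cell for cell in product((0, 1), repeat=3) for _ in range(3)]

def ac_mean(angle, speed):
    """Mean life at one angle/speed combination, averaged over geometry and replicates."""
    ys = [y for (a, _g, s), y in zip(cells, life_values) if a == angle and s == speed]
    return sum(ys) / len(ys)

speed_effect = {angle: ac_mean(angle, 1) - ac_mean(angle, 0) for angle in (0, 1)}
for angle, effect in speed_effect.items():
    print(f"angle={angle}: speed effect = {effect:+.2f} hours")
```

This gives roughly +9.17 hours at low angle and -8.50 hours at high angle, the opposite signs described above.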
Suppose the 3-way interaction was significant. How to proceed?... Subset data?
One could proceed by considering a separate two-factor factorial model for each level of angle that includes
speed and geometry.
/*Fit 2-factor model for low A.*/
data lowA; set tool;
if angle=0;
run;
proc glm data=lowA plot=diagnostics;
class speed geometry replicate;
model life=speed|geometry;
lsmeans geometry speed;
run;
/* The plot generated by the following is the same as that provided by PROC GLM*/
proc gplot data=lowA;
plot life*speed=geometry;
title "Low angle: 2-way plot for BC";
run;
The GLM Procedure
Class Level Information
Class       Levels   Values
speed            2   0 1
geometry         2   0 1
replicate        3   1 2 3

The GLM Procedure
Dependent Variable: life

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       854.916667    284.972222      6.33   0.0166
Error              8       360.000000     45.000000
Corrected Total   11      1214.916667

Source            DF    Type III SS    Mean Square   F Value   Pr > F
speed              1    252.0833333    252.0833333      5.60   0.0455
geometry           1    602.0833333    602.0833333     13.38   0.0064
speed*geometry     1      0.7500000      0.7500000      0.02   0.9005
When angle is set to the low level, there is no significant interaction between geometry (B) and speed (C)
(p = 0.9005). There is a significant positive speed effect (mean life 32.83 vs. 42.00 hours, p = 0.0455) and a
significant positive geometry main effect (30.33 vs. 44.50 hours, p = 0.0064).
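The low-angle sums of squares can be checked by hand, since the subset is a balanced 2x2 layout with 3 runs per cell. A Python sketch (the dictionary layout and `two_way_anova` are illustrative names, not SAS features):

```python
# Sketch (Python, not SAS): speed x geometry ANOVA for the low-angle subset,
# a balanced 2x2 layout with n = 3 runs per cell.
low_angle = {  # (speed, geometry) -> life values at angle = 0
    (0, 0): [22, 31, 25],
    (1, 0): [32, 43, 29],
    (0, 1): [35, 34, 50],
    (1, 1): [55, 47, 46],
}

def sign(level):
    """Code level 0 as -1 and level 1 as +1."""
    return 1 if level == 1 else -1

def two_way_anova(cells, n=3):
    """Return the 1-df effect SS, the error SS, and the MSE for a balanced 2x2 layout."""
    totals = {key: sum(ys) for key, ys in cells.items()}
    total_n = n * len(cells)

    def contrast_ss(coef):
        c = sum(coef(s, g) * t for (s, g), t in totals.items())
        return c * c / total_n

    ss = {
        "speed": contrast_ss(lambda s, g: sign(s)),
        "geometry": contrast_ss(lambda s, g: sign(g)),
        "speed*geometry": contrast_ss(lambda s, g: sign(s) * sign(g)),
    }
    sse = sum((y - sum(ys) / n) ** 2 for ys in cells.values() for y in ys)
    mse = sse / (total_n - len(cells))
    return ss, sse, mse

ss, sse, mse = two_way_anova(low_angle)
for name, value in ss.items():
    print(f"{name:15s} SS = {value:9.4f}   F = {value / mse:5.2f}")
print(f"Error SS = {sse:.4f}, MSE = {mse:.4f}")
```

The same function applied to the angle = 1 cells (44, 45, 38 / 40, 37, 36 / 60, 50, 54 / 39, 41, 47) gives SS of 216.75, 216.75, and 44.0833 with MSE 15.3333, matching the high-angle fit.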
[Figure: low angle, interaction plot of life vs. speed by geometry, provided by PROC GLM]

The GLM Procedure
Least Squares Means

geometry   life LSMEAN
0          30.3333333
1          44.5000000

speed      life LSMEAN
0          32.8333333
1          42.0000000
If you’d like the estimates for the parameters in the model that you fitted, you can request them
with the SOLUTION option in the MODEL statement. But I think, in this case, the means are probably easier
to interpret for a client.
proc glm data=lowA plot=diagnostics;
class speed geometry replicate;
model life=speed|geometry/solution;
lsmeans geometry speed;
lsmeans geometry*speed;
run;
Parameter                Estimate        Standard Error   t Value   Pr > |t|
Intercept              49.33333333 B       3.87298335       12.74     <.0001
speed          0       -9.66666667 B       5.47722558       -1.76     0.1156
speed          1        0.00000000 B        .                 .         .
geometry       0      -14.66666667 B       5.47722558       -2.68     0.0280
geometry       1        0.00000000 B        .                 .         .
speed*geometry 0 0      1.00000000 B       7.74596669        0.13     0.9005
speed*geometry 0 1      0.00000000 B        .                 .         .
speed*geometry 1 0      0.00000000 B        .                 .         .
speed*geometry 1 1      0.00000000 B        .                 .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse
was used to solve the normal equations. Terms whose estimates are
followed by the letter 'B' are not uniquely estimable.
There are 4 cells in this 2-way ANOVA. Because SAS sets the parameters for the last level of each CLASS
factor to zero, the baseline group (i.e., the cell mean represented by the intercept) is geometry=1 and
speed=1 (B=1 and C=1). The output shows this to be 49.333333, which is the same as the LSMEANS value for
that cell in the model that includes interaction (shown below).
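Under this set-the-last-level-to-zero parameterization, each cell mean is the intercept plus the estimates for that cell's levels. A Python sketch that adds up the SOLUTION estimates (copied from the output above; the dictionary names are illustrative) to recover all four cell means:

```python
# Sketch (Python, not SAS): rebuild the cell means from the SOLUTION estimates.
# With the last level of each CLASS variable set to zero:
#   mean(speed=s, geometry=g) = Intercept + speed[s] + geometry[g] + speed*geometry[s, g]
intercept = 49.33333333
speed = {0: -9.66666667, 1: 0.0}
geometry = {0: -14.66666667, 1: 0.0}
speed_geometry = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}

cell_means = {
    (s, g): intercept + speed[s] + geometry[g] + speed_geometry[(s, g)]
    for s in (0, 1)
    for g in (0, 1)
}
for key, value in sorted(cell_means.items()):
    print(key, round(value, 4))
```

The sums come out to 26.0, 39.6667, 34.6667, and 49.3333 for (speed, geometry) = (0,0), (0,1), (1,0), (1,1), the same as the speed*geometry LSMEANS.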
The GLM Procedure
Least Squares Means

speed   geometry   life LSMEAN
0       0          26.0000000
0       1          39.6666667
1       0          34.6666667
1       1          49.3333333
/*Fit 2-factor model to high A.*/
data highA; set tool;
if angle=1;
run;
proc glm data=highA plot=diagnostics;
class speed geometry replicate;
model life=speed|geometry;
lsmeans geometry speed;
run;
/* The plot generated by the following is the same as that provided by PROC GLM*/
proc gplot data=highA;
plot life*speed=geometry;
title "High angle: 2-way plot for BC";
run;
The GLM Procedure
Class Level Information
Class       Levels   Values
speed            2   0 1
geometry         2   0 1
replicate        3   1 2 3

The GLM Procedure
Dependent Variable: life

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      477.5833333   159.1944444     10.38   0.0039
Error              8      122.6666667    15.3333333
Corrected Total   11      600.2500000

Source            DF    Type III SS    Mean Square   F Value   Pr > F
speed              1    216.7500000    216.7500000     14.14   0.0055
geometry           1    216.7500000    216.7500000     14.14   0.0055
speed*geometry     1     44.0833333     44.0833333      2.88   0.1284
When angle is set to the high level, there is no significant interaction between geometry (B) and speed (C)
(p = 0.1284). There is a significant negative speed effect (mean life 48.5 vs. 40.0 hours, p = 0.0055) and a
significant positive geometry effect (40.0 vs. 48.5 hours, p = 0.0055).
The GLM Procedure
Least Squares Means

geometry   life LSMEAN
0          40.0000000
1          48.5000000

speed      life LSMEAN
0          48.5000000
1          40.0000000
Suppose the 3-way interaction was significant. How to proceed?... Slice the data?
One could get a very similar analysis (with more degrees of freedom for error) by fitting the full model and
then ‘slicing’ by angle (A).
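The extra error degrees of freedom come from pooling: the full model's error SS is the within-cell variation from both angle levels, i.e. the sum of the two subset error lines. A Python sketch using the Error lines copied from the two subset fits:

```python
# Sketch (Python, not SAS): the full model's error line equals the two subset
# error lines pooled together, so error df grow from 8 to 8 + 8 = 16.
sse_low, df_low = 360.0, 8           # Error line, angle = 0 subset
sse_high, df_high = 122.6666667, 8   # Error line, angle = 1 subset

sse_pooled = sse_low + sse_high
df_pooled = df_low + df_high
mse_pooled = sse_pooled / df_pooled
print(f"pooled SSE = {sse_pooled:.4f} on {df_pooled} df, MSE = {mse_pooled:.4f}")
```

The result, 482.667 on 16 df with MSE 30.1667, is exactly the Error line of the full three-factor model.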
proc glm data=tool plot=diagnostics;
class angle speed geometry replicate;
model life=angle|speed|geometry;
lsmeans angle*geometry*speed/slice=angle;
run;
/* slice the full model by angle level*/
The GLM Procedure
Class Level Information
Class       Levels   Values
angle            2   0 1
speed            2   0 1
geometry         2   0 1
replicate        3   1 2 3

Number of Observations Used   24

The GLM Procedure
Least Squares Means

angle   speed   geometry   life LSMEAN
0       0       0          26.0000000
0       0       1          39.6666667
0       1       0          34.6666667
0       1       1          49.3333333
1       0       0          42.3333333
1       0       1          54.6666667
1       1       0          37.6666667
1       1       1          42.3333333

angle*speed*geometry Effect Sliced by angle for life

angle   DF   Sum of Squares   Mean Square   F Value   Pr > F
0        3       854.916667    284.972222      9.45   0.0008
1        3       477.583333    159.194444      5.28   0.0101
If you compare the Mean Squares in the above ‘slice’ output, they match the Mean Squares for the two
models we fit in the two subsetted analyses, but the F-statistics are different. Why?
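One way to check this numerically: divide the same slice mean squares by the two different error mean squares. A Python sketch with all values copied from the outputs above; the numerators agree, but each subset fit uses its own MSE (8 df), while the slice uses the full model's MSE (16 df):

```python
# Sketch (Python, not SAS): same numerator mean squares, different denominators.
ms_slice = {0: 284.972222, 1: 159.194444}   # slice Mean Squares (3 df each)
mse_subset = {0: 45.0, 1: 15.3333333}       # error MS of each subset fit (8 df)
mse_full = 30.166667                        # error MS of the full model (16 df)

f_subset = {a: ms_slice[a] / mse_subset[a] for a in (0, 1)}
f_slice = {a: ms_slice[a] / mse_full for a in (0, 1)}

for a in (0, 1):
    print(f"angle={a}: subset F = {f_subset[a]:.2f}, slice F = {f_slice[a]:.2f}")
```

This reproduces 6.33 vs. 9.45 at low angle and 10.38 vs. 5.28 at high angle.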