Box-Cox Procedure for Single-Factor ANOVA
When both the normality and the homoscedasticity (constant variance) assumptions are violated, the
Box-Cox procedure provides a general method for finding a suitable power transformation of the
response variable. Below, the Box-Cox procedure is applied to the Servo-Data example. The SAS program
must be run once for each trial value of λ, set in the "lambda = 0.1;" statement of data step six,
over a range from -1 to 1. The best transformation is the one whose analysis of variance yields the
smallest error sum of squares (SSE).
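For reference, the standardized transformation that data step six computes can be written out as follows. The symbols W, K_1, and K_2 are labels introduced here, following common textbook notation; in the program, K_2 is the variable u (the geometric mean of the responses) and K_1 corresponds to 1/k.

```latex
W_i =
\begin{cases}
K_1 \left( Y_i^{\lambda} - 1 \right), & \lambda \neq 0, \\[4pt]
K_2 \ln Y_i, & \lambda = 0,
\end{cases}
\qquad
K_2 = \left( \prod_{i=1}^{n} Y_i \right)^{1/n},
\qquad
K_1 = \frac{1}{\lambda \, K_2^{\lambda - 1}} .
```

Scaling by the geometric-mean factor puts W on a comparable scale for every λ, which is what allows the SSE values from different runs to be compared directly.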
data one;
input trt y;
label trt = "Location"
y = "Time Between Failures";
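ypr = y;     /* copy of the original response, kept before any shift is applied */
obs = _n_;   /* observation number */
dum = 1;     /* constant key used to merge summary statistics onto every observation */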
;
cards;
1 4.41
1 100.65
1 14.45
1 47.13
1 85.21
2 8.24
2 81.16
2 7.35
2 12.29
2 1.61
3 106.19
3 33.83
3 78.88
3 342.81
3 44.33
;
proc means;
var y;
output out=ydesc min=miny n=size;
;
data two;set ydesc;
dum = 1;
;
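data three;merge one two;by dum;
/* The Box-Cox transformation requires y > 0. If the smallest response is not */
/* positive, shift every response so that the minimum becomes 1. */
if miny <= 0 then y = y - miny + 1;
keep y trt size dum obs ypr;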
;
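data four;set three;
/* Accumulate the geometric mean of y as a running product of y**(1/size). */
/* After the last observation, u = (y1*y2*...*yn)**(1/size). The shifted y */
/* is used rather than ypr so that the product is defined when the original */
/* data contain values <= 0. */
retain u;
if obs = 1 then do;
u = y**(1/size);
v = u;
end;
if obs > 1 then do;
v = u*(y**(1/size));
u = v;
end;
;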
proc print;
;
data five;set four;if obs = size;
keep u size dum obs;
dum = 1;
;
proc print;
;
data six;merge three five;by dum;
/* The following statement needs to be changed for different values of lambda. */
lambda = 0.1;
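/* lambda = 0: w = u*log(y), where u is the geometric mean of y. */
if lambda = 0 then do;
k = u;
w = k*log(y);
end;
/* lambda ne 0: w = (y**lambda - 1)/k, with k = lambda*u**(lambda-1). */
if lambda ne 0 then do;
k = lambda*(u**(lambda-1));
w = ((y**lambda) - 1)/k;
end;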
;
proc print;
;
proc plot;
plot w*trt;
;
proc glm;
class trt;
model w = trt;
output out=resi r=resids;
means trt / lsd duncan tukey scheffe;
title "Analysis of Variance for Ch.18 p. 792";
title2 "With Follow-Up Tests";
;
data two;set resi;
/* The following statement creates a dummy variable with value 1 for every */
/* observation in the data set. This variable will be used to merge the sample */
/* statistics with every observation in the original data set. */
dum = 1;
;
proc sort;by resids;
;
proc means noprint;
var resids;
output out=meanr mean=mu std=s n=size;
title;
title2;
;
data three;set meanr;
dum = 1;
;
data three;merge two three;by dum;
p = (_n_ - 0.5)/size;
/* The following equation would need to be changed for q-q plots for other */
/* probability distributions. For example, for an exponential(mu) distribution, */
/* the statement would be Q = -mu*log(1-p). */
Q = probit(p);
;
proc plot;
plot resids*trt;
title 'Plot of Residuals vs. Factor Levels';
;
proc reg noprint;
model resids = Q;
plot (resids predicted.)*Q / overlay;
title 'Normal Probability Plot';
title2 'For Residuals from ANOVA';
;
run;
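Rather than editing the lambda statement and rerunning the program for each trial value, the search over λ can be sketched in a single pass. This is a minimal sketch, not part of the original program. It assumes data set six from the program above is still available, with trt, y, and the geometric mean u. The data set names grid, sse, and ssebest and the loop variable i are illustrative, and the 0.1 step size is an arbitrary choice.

```sas
/* Build one copy of the data for each lambda in -1.0, -0.9, ..., 1.0. */
data grid;
  set six(keep=trt y u);
  do i = -10 to 10;
    lambda = i/10;
    /* Same standardized transformation as in data step six. */
    if lambda = 0 then w = u*log(y);
    else w = ((y**lambda) - 1)/(lambda*(u**(lambda-1)));
    output;
  end;
  drop i;
run;

proc sort data=grid;
  by lambda;
run;

/* Fit the one-way ANOVA separately for each lambda; OUTSTAT= saves the */
/* sums of squares, including the error sum of squares, to a data set.  */
proc glm data=grid noprint outstat=sse;
  class trt;
  model w = trt;
  by lambda;
run;
quit;

/* Keep only the error rows and list them with the smallest SSE first. */
proc sort data=sse(where=(_source_ = 'ERROR')) out=ssebest;
  by ss;
run;

proc print data=ssebest(keep=lambda df ss);
  title 'SSE by lambda, smallest first';
run;
```

The first row of the printed listing gives the λ with the smallest SSE on this grid. The full program can then be rerun with that value to obtain the follow-up tests and residual plots.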