Discussion 4
Stephanie Chan
February 10, 2015

1 Problem 18.17

Winding Speeds. In a completely randomized design to study the effect of the speed of winding thread (1: slow; 2: normal; 3: fast; 4: maximum) onto 75-yard spools, 16 runs of 10,000 spools each were made at each of the four winding speeds. The response variable is the number of thread breaks during the production run. The results (in time order) are as follows.

           j
 i     1    2    3   ...   14   15   16
 1     4    3    2   ...    2    3    4
 2     7    6    4   ...    4    7    6
 3    12    6   14   ...   13   10   14
 4    17   15    7   ...   19    9   23

Since the responses are counts, the researcher was concerned about the normality and equal variances assumptions of the ANOVA model (16.2).

1.1 Part a.

Obtain the fitted values and residuals for ANOVA model (16.2).

## Problem 18.17
## Import the data
## set the directory to whatever directory you keep the data
setwd("~/Dropbox/school/sta106-2015/discussion4/")
mydata = read.table("winding.txt",header=FALSE)
y = mydata[,1]   # response: number of thread breaks
x = mydata[,2]   # winding speed (1 to 4)
stripchart(y~x,pch=16,method="stack")

## part(a)
x = as.factor(x)
fit = aov(y~x)
# fitted values
fitted(fit)

       1       2       3       4       5       6       7       8       9      10
  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625
      11      12      13      14      15      16      17      18      19      20
  3.5625  3.5625  3.5625  3.5625  3.5625  3.5625  5.8750  5.8750  5.8750  5.8750
      21      22      23      24      25      26      27      28      29      30
  5.8750  5.8750  5.8750  5.8750  5.8750  5.8750  5.8750  5.8750  5.8750  5.8750
      31      32      33      34      35      36      37      38      39      40
  5.8750  5.8750 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875
      41      42      43      44      45      46      47      48      49      50
 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875 10.6875 16.5625 16.5625
      51      52      53      54      55      56      57      58      59      60
 16.5625 16.5625 16.5625 16.5625 16.5625 16.5625 16.5625 16.5625 16.5625 16.5625
      61      62      63      64
 16.5625 16.5625 16.5625 16.5625

# residuals
residuals(fit)

       1       2       3       4       5       6       7       8       9      10
  0.4375 -0.5625 -1.5625 -0.5625  0.4375  0.4375 -0.5625  2.4375  1.4375  0.4375
      11      12      13      14      15      16      17      18      19      20
 -1.5625  0.4375  0.4375 -1.5625 -0.5625  0.4375  1.1250  0.1250 -1.8750  0.1250
      21      22      23      24      25      26      27      28      29      30
  1.1250 -3.8750  3.1250 -0.8750 -0.8750  3.1250 -2.8750  2.1250  0.1250 -1.8750
      31      32      33      34      35      36      37      38      39      40
  1.1250  0.1250  1.3125 -4.6875  3.3125  1.3125 -0.6875 -1.6875  1.3125  6.3125
      41      42      43      44      45      46      47      48      49      50
 -3.6875 -4.6875  1.3125  0.3125 -4.6875  2.3125 -0.6875  3.3125  0.4375 -1.5625
      51      52      53      54      55      56      57      58      59      60
 -9.5625  3.4375 -3.5625 -5.5625 -0.5625  8.4375 -5.5625  7.4375  1.4375  4.4375
      61      62      63      64
 -0.5625  2.4375 -7.5625  6.4375

1.2 Part b

Prepare suitable residual plots to study whether or not the error variances are equal for the four winding speeds. What are your findings?

## part (b) Residual plots
r = residuals(fit)
stripchart(r~x,method="stack",pch=16)

1.3 Part c

Test by means of the Brown-Forsythe test whether or not the treatment error variances are equal; use α = .05. What is the p-value of the test? Are your results consistent with the diagnosis in part b?

## part (c) Brown Forsythe test
# install.packages("lawstat")
library(lawstat)
levene.test(y,x)

:
: modified robust Brown-Forsythe Levene-type test based on the absolute
: deviations from the median
:
: data: y
: Test Statistic = 9.5416, p-value = 3.04e-05

Test by means of the Hartley test whether or not the treatment error variances are equal; use α = .05.

## Hartley Test
by(y,x,var)

x: 1
[1] 1.195833
------------------------------------------------------------
x: 2
[1] 3.983333
------------------------------------------------------------
x: 3
[1] 10.49583
------------------------------------------------------------
x: 4
[1] 28.92917

# install.packages("SuppDists")
library(SuppDists)
Hstar = 28.929/1.195              # largest sample variance / smallest sample variance
Hcrit = qmaxFratio(0.95,15,4)     # df = 15 per group, k = 4 groups
pval = 1-pmaxFratio(Hstar,15,4)
# can also use Table B.10
Hstar
Hcrit
pval

[1] 24.20837
[1] 3.998907
[1] 0.0003999325

1.4 Part d

For each winding speed, calculate Ȳi. and si. Examine the relations found in the table on page 791 and determine the transformation that is most appropriate here. What do you conclude?
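As a sketch of where such relations come from: the first-order (delta method) argument below is standard, but it is not spelled out in the problem, and the symbols g, μ_i and σ_i are notation assumed here (g is a candidate transformation; Ȳi. and si estimate μ_i and σ_i). A transformation stabilizes the variance when g'(μ) is proportional to 1/σ(μ).

```latex
\begin{align*}
\operatorname{Var}\{g(Y_{ij})\} &\approx [g'(\mu_i)]^2\,\sigma_i^2
  && \text{(first-order Taylor expansion about } \mu_i\text{)} \\
\sigma_i^2 \propto \mu_i \;&\Rightarrow\; g'(\mu) \propto \mu^{-1/2}
  \;\Rightarrow\; g(Y) = \sqrt{Y} \\
\sigma_i \propto \mu_i \;&\Rightarrow\; g'(\mu) \propto \mu^{-1}
  \;\Rightarrow\; g(Y) = \log Y \\
\sigma_i \propto \mu_i^2 \;&\Rightarrow\; g'(\mu) \propto \mu^{-2}
  \;\Rightarrow\; g(Y) = 1/Y
\end{align*}
```

So the ratio that stays roughly constant across the four winding speeds (si^2/Ȳi., si/Ȳi., or si/Ȳi.^2) indicates which of the three transformations to try.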
## part (d)
means = by(y,x,mean)
sds = by(y,x,sd)
sds^2/means
sds/means
sds/means^2

> means = by(y,x,mean)
> means
x: 1
[1] 3.5625
------------------------------------------------------------
x: 2
[1] 5.875
------------------------------------------------------------
x: 3
[1] 10.6875
------------------------------------------------------------
x: 4
[1] 16.5625

> sds = by(y,x,sd)
> sds
x: 1
[1] 1.093542
------------------------------------------------------------
x: 2
[1] 1.995829
------------------------------------------------------------
x: 3
[1] 3.239727
------------------------------------------------------------
x: 4
[1] 5.378584

> sds^2/means
x: 1
[1] 0.3356725
------------------------------------------------------------
x: 2
[1] 0.6780142
------------------------------------------------------------
x: 3
[1] 0.9820663
------------------------------------------------------------
x: 4
[1] 1.746667

> sds/means
x: 1
[1] 0.3069591
------------------------------------------------------------
x: 2
[1] 0.3397156
------------------------------------------------------------
x: 3
[1] 0.3031324
------------------------------------------------------------
x: 4
[1] 0.3247447

> sds/means^2
x: 1
[1] 0.08616395
------------------------------------------------------------
x: 2
[1] 0.05782393
------------------------------------------------------------
x: 3
[1] 0.02836326
------------------------------------------------------------
x: 4
[1] 0.01960723

 i                  1        2        3        4     transformation
 si^2 / Ȳi.      0.3357   0.6780   0.9821   1.7467   √Y
 si / Ȳi.        0.3070   0.3397   0.3031   0.3247   log Y
 si / Ȳi.^2      0.0862   0.0578   0.0284   0.0196   1/Y

The ratio si / Ȳi. is the most stable across the four winding speeds (roughly 0.30 to 0.34), while the other two ratios change steadily with i, so the logarithmic transformation appears most appropriate.

1.5 Part e.

Use the Box-Cox procedure to find an appropriate power transformation of Y. Evaluate SSE for the values of λ given in Table 18.6. Does λ = 0, a logarithmic transformation, appear to be reasonable based on the Box-Cox procedure?
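A minimal sketch of the SSE evaluation, assuming the standardized Box-Cox transformation W = K1(Y^λ - 1) for λ ≠ 0 and W = K2 ln(Y) for λ = 0, where K2 is the geometric mean of Y and K1 = 1/(λ K2^(λ-1)). Three things here are assumptions rather than part of the original handout: the data vector is reconstructed as fitted value plus residual from the part (a) output (winding.txt itself is not reproduced in these notes), the function name boxcox_sse is hypothetical, and the λ grid is only a placeholder to be replaced by the values listed in Table 18.6.

```r
# Thread-break counts, reconstructed as fitted value + residual from part (a)
# (assumption: stands in for winding.txt, which is not reproduced here)
y = c( 4,  3,  2,  3,  4,  4,  3,  6,  5,  4,  2,  4,  4,  2,  3,  4,
       7,  6,  4,  6,  7,  2,  9,  5,  5,  9,  3,  8,  6,  4,  7,  6,
      12,  6, 14, 12, 10,  9, 12, 17,  7,  6, 12, 11,  6, 13, 10, 14,
      17, 15,  7, 20, 13, 11, 16, 25, 11, 24, 18, 21, 16, 19,  9, 23)
x = factor(rep(1:4, each = 16))   # winding speed, 16 runs per speed

K2 = exp(mean(log(y)))            # geometric mean of the responses

# SSE of the one-way ANOVA fitted to the standardized transform of y
# (boxcox_sse is a hypothetical helper name)
boxcox_sse = function(lambda) {
  if (abs(lambda) < 1e-8) {
    w = K2 * log(y)               # lambda = 0: logarithmic transformation
  } else {
    K1 = 1 / (lambda * K2^(lambda - 1))
    w = K1 * (y^lambda - 1)
  }
  sum(residuals(aov(w ~ x))^2)
}

# placeholder grid; substitute the lambda values from Table 18.6
lambdas = c(-1, -0.5, 0, 0.5, 1)
sse = sapply(lambdas, boxcox_sse)
data.frame(lambda = lambdas, SSE = sse)
```

Because the transform is standardized, the SSE values are comparable across λ and the smallest one points to the preferred power. At λ = 1 the transform is just Y - 1, so that entry equals the SSE of the untransformed model.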
## part(e)
library(MASS)
fit = aov(y~x)
boxcox(fit)

1.6 extra

If you want to do analysis with log transformed data

ynew = log(y)
stripchart(ynew~x,method="stack",pch=16)
fitnew = aov(ynew~x)
rnew = residuals(fitnew)
stripchart(rnew~x,method="stack",pch=16)

2 Interaction Plot

This is slightly modified from what I did in discussion today.

Create a fake data set with factor 1 having 2 levels and factor 2 having 3 levels. Use rnorm to generate normally distributed random values.

# make data for interaction plot
set.seed(0) # only so we have the same random values each time
val = rnorm(6)
f1 = factor(c(1,1,1,2,2,2)) # equivalent to factor(rep(c(1,2),each=3))
f2 = factor(c(1,2,3,1,2,3)) # equivalent to factor(rep(c(1,2,3),times=2))
# help(rep) for more details

# check to make sure your data: val, f1, f2 all match properly

   f2
f1           1          2          3
  1  1.2629543 -0.3262334  1.3297993
  2  1.2724293  0.4146414 -1.5399500

interaction.plot(f1,f2,val)
interaction.plot(f2,f1,val)

You can show the plots from two sides.
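The command behind the f1-by-f2 check table is not preserved in these notes. One way to produce such a table, as a sketch and not necessarily what was used in discussion, is xtabs(), which sums val within each (f1, f2) cell; with one value per cell the sum is just that value. The name cells is introduced here for illustration.

```r
# rebuild the fake data exactly as in the notes
set.seed(0)
val = rnorm(6)
f1 = factor(rep(c(1, 2), each = 3))       # 2 levels
f2 = factor(rep(c(1, 2, 3), times = 2))   # 3 levels

# cross-tabulate val by the two factors (rows: f1, columns: f2)
cells = xtabs(val ~ f1 + f2)
cells
```

Reading across a row of this table gives the points joined by one line in interaction.plot(f2,f1,val); reading down a column gives the points joined in interaction.plot(f1,f2,val).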