2014 Final Exam KEY

Question 1 [25 points]

1.1 [2 points] Describe in detail the design of this experiment [see appendix].

Design: RCBD split-split plot with a 3x2x3 factorial treatment structure, with 3 reps per block*factorial combination
Response Variable: Weight (in ounces)
Experimental Unit: a) Mainplot: a group of six piglets; b) Subplot: one piglet; c) Sub-subplot: repeated measures of weight per piglet

Class Variable   Block or Treatment   Number of Levels   Fixed or Random   Description
1                Block                2                  Random            Farm
2                Treatment            3                  Fixed             Breeds of pigs
3                Treatment            2                  Fixed             Diets
4                Treatment            3                  Fixed             Time (weeks)

Subsamples? NO
Covariable? NO

Data Pigs;
  Do Block = 1 to 2;
    Do Breed = 1 to 3;
      Do Rep = 1 to 3;
        Do Diets = 1 to 2;
          Do Time = 1 to 3;
            Input Weight @@;
            Output;
          End;
        End;
      End;
    End;
  End;
Cards;
75 77 73 81 82 77 92 91 93 59 50 59 69 89 74 83 91 89
82 102 100 89 79 76 74 73 91 88 89 97 88 86 105 103 107 83
71 79 77 59 76 68 81 76 74 83 81 83 56 71 61 65 74 75
76 78 84 82 96 89 97 62 53 74 77 72 72 76 86 82 80 95
103 93 65 68 71 73 59 71 78 66 59 71 71 72 82 68 62 78
77 89 96 71 80 84 84 91 100 71 95 89 80 82 87 77 77 81
;
Proc GLM Data = Pigs Order = Data;
  Class Block Breed Diets Time;
  Model Weight = Block Breed Block*Breed
                 Diets Breed*Diets Block*Breed*Diets
                 Time Breed*Time Diets*Time Breed*Diets*Time;
  test h=Breed e= Block*Breed;
  test h=Diets e= Block*Breed*Diets;
  test h=Breed*Diets e= Block*Breed*Diets;
  Means Breed / tukey e= Block*Breed;
  Means Diets;
  Means Time;
  Contrast "Time lineal" Time 1 0 -1;
  Contrast "Time quadratic" Time 1 -2 1;
  Output out = PigsPR R=Res P=Pred;
Proc Univariate Data = PigsPR Normal;
  Var Res;
Proc GLM Data = Pigs; * Levene's for Breed;
  Class Breed;
  Model Weight = Breed;
  Means Breed / hovtest = Levene;
Proc GLM Data = Pigs; * Levene's for Diets;
  Class Diets;
  Model Weight = Diets;
  Means Diets / hovtest = Levene;
Proc GLM Data = Pigs; * Levene's for Time;
  Class Time;
  Model Weight = Time;
  Means Time / hovtest = Levene;
Run; Quit;

1.2 [3 points] Show
that the data meet the assumptions of normality of residuals and homogeneity of variances among Diets, among Breeds and among Times.

Tests for Normality
Test            Statistic        p Value
Shapiro-Wilk    W   0.989571     Pr < W   0.5753

Levene's Test for Homogeneity of Weight Variance
ANOVA of Squared Deviations from Group Means
Source   DF    Sum of Squares   Mean Square   F Value   Pr > F
Breed    2     10679.4          5339.7        0.44      0.6474
Error    105   1284041          12229.0

Levene's Test for Homogeneity of Weight Variance
ANOVA of Squared Deviations from Group Means
Source   DF    Sum of Squares   Mean Square   F Value   Pr > F
Diets    1     19180.3          19180.3       0.55      0.4598
Error    106   3693421          34843.6

Levene's Test for Homogeneity of Weight Variance
ANOVA of Squared Deviations from Group Means
Source   DF    Sum of Squares   Mean Square   F Value   Pr > F
Time     2     10020.7          5010.3        0.23      0.7965
Error    105   2306986          21971.3

All assumptions are met. We fail to reject the null hypothesis that the residuals are normally distributed (p = 0.5753). We fail to reject the null hypotheses that the variances are homogeneous among Times, among Diets and among Breeds (p = 0.7965, p = 0.4598, and p = 0.6474, respectively).

1.3 [7 points] Run the appropriate ANOVA and answer the following questions (with p-values):
a. Is there a significant effect of breed on weight?
b. Is there a significant effect of diet on weight?
c. Is there a significant effect of time (weeks) on weight?
d. Are there any significant interactions between Diets and Breeds?
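In a split-split plot each stratum has its own error term, which is what the `test h= e=` statements in the program request. As a rough cross-check outside SAS, the Python sketch below recomputes the F ratios from the Type III mean squares reported in the output that follows. The dictionary layout and the helper name `f_ratio` are illustrative assumptions, not part of the original program.

```python
# Type III mean squares copied from the Proc GLM output for Question 1.
mean_squares = {
    "Breed": 2804.175926,
    "Block*Breed": 17.509259,        # main-plot error
    "Diets": 1108.481481,
    "Breed*Diets": 23.787037,
    "Block*Breed*Diets": 19.018519,  # subplot error
    "Time": 1195.370370,
    "Error": 25.37566,               # sub-subplot (residual) error
}

# Each effect is tested against the error term of its own stratum.
error_term = {
    "Breed": "Block*Breed",
    "Diets": "Block*Breed*Diets",
    "Breed*Diets": "Block*Breed*Diets",
    "Time": "Error",
}


def f_ratio(effect):
    """F = MS(effect) / MS(error term of that effect's stratum)."""
    return mean_squares[effect] / mean_squares[error_term[effect]]


f_values = {effect: f_ratio(effect) for effect in error_term}
for effect, f in f_values.items():
    print(f"{effect:12s} F = {f:7.2f}   (error term: {error_term[effect]})")
```

Testing Breed against the residual instead of Block*Breed would use 84 error degrees of freedom rather than 2 and would overstate the evidence, which is why the stratum-specific error terms matter here.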
Dependent Variable: Weight

Source            DF    Sum of Squares   Mean Square   F Value   Pr > F
Model             23    12954.85185      563.25443     22.20     <.0001
Error             84    2131.55556       25.37566
Corrected Total   107   15086.40741

R-Square   Coeff Var   Root MSE   Weight Mean
0.858710   6.342294    5.037426   79.42593

Source              DF   Type III SS    Mean Square    F Value   Pr > F
Block               1    3245.037037    3245.037037    127.88    <.0001
Block*Breed         2    35.018519      17.509259      0.69      0.5044
Diets               1    1108.481481    1108.481481    43.68     <.0001
Breed*Diets         2    47.574074      23.787037      0.94      0.3957
Block*Breed*Diets   3    57.055556      19.018519      0.75      0.5257
Time                2    2390.740741    1195.370370    47.11     <.0001   ***
Breed*Time          4    98.148148      24.537037      0.97      0.4301   NS
Diets*Time          2    156.074074     78.037037      3.08      0.0514   NS
Breed*Diets*Time    4    208.370370     52.092593      2.05      0.0943   NS

Tests of Hypotheses Using the Type III MS for Block*Breed as an Error Term
Source   DF   Type III SS    Mean Square    F Value   Pr > F
Breed    2    5608.351852    2804.175926    160.15    0.0062   ***

Tests of Hypotheses Using the Type III MS for Block*Breed*Diets as an Error Term
Source        DF   Type III SS    Mean Square    F Value   Pr > F
Diets         1    1108.481481    1108.481481    58.28     0.0047   ***
Breed*Diets   2    47.574074      23.787037      1.25      0.4027   NS

a) Yes, there is a significant effect of Breed on weight (p = 0.0062).
b) Yes, there is a significant effect of Diet on weight (p = 0.0047).
c) Yes, there is a significant effect of Time on weight (p < 0.0001), but because it is significant, we have to proceed with the conservative degrees of freedom test.
d) No, the interaction between Diet and Breed is not significant (p = 0.4027).

1.4 [3 points] If appropriate, use conservative degrees of freedom to test Time and all 2- and 3-way Diet and Breed interactions with Time. State whether or not you should continue the analysis past this step. If so, carry out the appropriate next analysis and run any other assumptions tests. Do your conclusions about Time or Time interactions change between regular and conservative degrees of freedom?
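The conservative test keeps each F value but divides both the numerator and the error degrees of freedom by (t - 1), where t = 3 is the number of times, so 2 becomes 1, 4 becomes 2 and 84 becomes 42. The sketch below recomputes the adjusted p-values with the Python standard library only. The function `f_upper_tail` is a hypothetical helper written for this illustration (Simpson integration of the incomplete beta function); in SAS the same tail probability would normally come from a built-in probability function.

```python
import math


def f_upper_tail(f, df1, df2, steps=4000):
    """P(F > f) for an F(df1, df2) variable, by Simpson integration.

    Uses P(F > f) = I_y(df2/2, df1/2) with y = df2 / (df2 + df1 * f),
    where I is the regularized incomplete beta function.
    Assumes f > 0 and df2 >= 2 so the integrand is finite on [0, y].
    """
    a, b = df2 / 2.0, df1 / 2.0
    y = df2 / (df2 + df1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        if t == 0.0:
            # t**(a-1) is 0 at the origin when a > 1 and 1 when a == 1
            return 0.0 if a > 1 else math.exp(-log_beta)
        return math.exp((a - 1) * math.log(t)
                        + (b - 1) * math.log(1.0 - t) - log_beta)

    h = y / steps
    total = integrand(0.0) + integrand(y)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


time_levels = 3
divisor = time_levels - 1          # conservative adjustment factor (t - 1)
error_df = 84

# (effect, numerator df, F value) copied from the ANOVA output above
effects = [
    ("Time", 2, 47.11),
    ("Breed*Time", 4, 0.97),
    ("Diets*Time", 2, 3.08),
    ("Breed*Diets*Time", 4, 2.05),
]

adjusted = {}
for name, df, f in effects:
    adj_df, adj_error_df = df // divisor, error_df // divisor
    adjusted[name] = f_upper_tail(f, adj_df, adj_error_df)
    print(f"{name:18s} F = {f:6.2f}  adj df = ({adj_df}, {adj_error_df})"
          f"  adj p = {adjusted[name]:.4f}")
```

Because the F values entered here are rounded to two decimals, the last digit of a p-value can differ slightly from one computed from unrounded mean squares.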
ADJUSTED: Error DF 84, ADJ DF 42

Source             DF   Adj df   F Value   p-value        adj. p-value
Time               2    1        47.11     <.0001  ***    0.0001  ***
Breed*Time         4    2        0.97      0.4301  NS     0.3874  NS
Diets*Time         2    1        3.08      0.0514  NS     0.0866  NS
Breed*Diets*Time   4    2        2.05      0.0943  NS     0.1414  NS

All of the factors and interactions have the same significance or non-significance as in the split-split plot analysis, so we stop here and do not proceed to a repeated measures analysis. None of the conclusions change from the previous analysis (Time is significant, all interactions are NS).

1.5 [3 points] Carry out the appropriate means separation tests to compare all pairs of Breeds (controlling the maximum experimentwise error rate). Which breed weighed the most (on average)?

Tukey's Studentized Range (HSD) Test for Weight
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                   0.05
Error Degrees of Freedom                2
Error Mean Square                       17.50926
Critical Value of Studentized Range     8.33078
Minimum Significant Difference          5.8099

Means with the same letter are not significantly different.
Tukey Grouping   Mean      N    Breed
A                89.2500   36   3
B                76.8611   36   2
B                72.1667   36   1

Breed 3 had the greatest average weight over the course of the study (89.25 ounces), which was significantly different from the average weights of Breeds 1 and 2 (which were not significantly different from one another).

Level of Diets   N    Weight Mean   Std Dev
1                54   82.6296296    12.0600640
2                54   76.2222222    10.8760888

Diet 1 had the greatest average weight over the course of the study (82.630 ounces), which was significantly different from the average weight of Diet 2 (p = 0.0047, from the ANOVA).

1.6 [3 points] Use contrasts to determine if the response in time is linear.

Dependent Variable: Weight
Contrast         DF   Contrast SS    Mean Square    F Value   Pr > F
Time lineal      1    2334.722222    2334.722222    92.01     <.0001
Time quadratic   1    56.018519      56.018519      2.21      0.1411

The response in time is linear (p < .0001); the quadratic component is not significant (p = 0.1411).
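Two pieces of arithmetic behind 1.5 and 1.6 can be checked by hand. The sketch below, which uses only values printed in the output above (the variable names are illustrative), recomputes Tukey's minimum significant difference as q * sqrt(MSE / n) with the Block*Breed mean square as error, and confirms that the linear and quadratic contrasts are orthogonal and together account for the whole 2-df Time sum of squares.

```python
import math
from itertools import combinations

# --- 1.5: Tukey minimum significant difference for Breed ---
q_crit = 8.33078            # critical value of the studentized range
ms_block_breed = 17.50926   # error mean square (Block*Breed, 2 df)
n_per_breed = 36            # observations per breed mean
msd = q_crit * math.sqrt(ms_block_breed / n_per_breed)

breed_means = {1: 72.1667, 2: 76.8611, 3: 89.2500}
significant_pairs = {
    (a, b)
    for a, b in combinations(sorted(breed_means), 2)
    if abs(breed_means[a] - breed_means[b]) > msd
}
print(f"MSD = {msd:.4f}; significantly different pairs: {sorted(significant_pairs)}")

# --- 1.6: orthogonal polynomial contrasts for Time ---
linear = (1, 0, -1)
quadratic = (1, -2, 1)
dot_product = sum(l * q for l, q in zip(linear, quadratic))  # 0 if orthogonal

mse = 25.37566              # residual mean square (error term for Time)
ss_time = 2390.740741       # Type III SS for Time (2 df)
ss_linear = 2334.722222
ss_quadratic = 56.018519

f_linear = ss_linear / mse
f_quadratic = ss_quadratic / mse
print(f"dot product = {dot_product}, "
      f"SS lin + SS quad = {ss_linear + ss_quadratic:.6f}")
print(f"F linear = {f_linear:.2f}, F quadratic = {f_quadratic:.2f}")
```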
1.7 [3 points] Graph the main effects or the appropriate interactions if significant and comment.

proc gplot data=Pigs;
  ** Main effect plots **;
  axis1 offset=(5 pct,5 pct);
  axis2 offset=(5 pct,5 pct);
  symbol1 i=std1mtj v=none color=BLUE;
  plot Weight * Breed = 1 / ;
run;
  axis1 offset=(5 pct,5 pct);
  axis2 offset=(5 pct,5 pct);
  symbol1 i=std1mtj v=none color=RED;
  plot Weight * Diets = 1 / ;
run;
  axis1 offset=(5 pct,5 pct);
  axis2 offset=(5 pct,5 pct);
  symbol1 i=std1mtj v=none color=RED;
  plot Weight * Time = 1 / ;
run;

The graphs confirm what we know from the ANOVA and means separation above. The highest weight was found at Time 3, Breed 3, and Diet 1. Since the interactions are not significant, the appropriate graphs are the main effects.

1.8 [1 point] What other measurement could the researchers have taken before starting the experiment that could have been used as a covariable to reduce the variability of the experiment?

The researchers could have taken a measurement of initial pig weight (at birth), in order to determine if the differences in average weight were due to diet or breed, or if they were due to differences in initial weight.

Question 2 [25 points]

2.1 [2 points] Describe in detail the design of this experiment [see appendix].

Design: RCBD with a 3 x 3 factorial treatment structure nested in locations with 1 rep per block*factorial combination
Response Variable: Healthy Fruit per tree
Experimental Unit: Tree

Class Variable   Block or Treatment   Number of Levels   Fixed or Random   Description
1                Treatment            10                 Random            Locations/Farmers
2                Block                2                  Random            Greenhouses
3                Treatment            3                  Fixed             Varieties (2 new and 1 old)
4                Treatment            3                  Fixed             Pesticides (0, 2, and 4 qts/A)

Subsamples? NO
Covariable? NO

Data Lemon;
  Do Block = 1 to 2;
    Do Trt = 1 to 9;
      Do Location = 1 to 10;
        Input Y @@;
        Output;
      End;
    End;
  End;
Cards;
74 65 55 64 57 56 64 56 67 76
56 56 54 76 61 55 50 61 67 59
50 82 68 57 71 84 72 81 80 64
65 69 51 71 68 64 59 51 63 70
68 52 78 77 60 83 66 74 72 73
50 67 63 67 68 64 80 85 64 82
62 78 58 61 66 50 52 60 56 61
56 50 50 50 50 56 75 66 57 57
55 50 50 66 63 60 55 64 64 64
75 67 60 58 77 50 65 59 56 63
62 72 50 59 76 80 73 73 81 58
67 75 73 67 74 64 64 56 70 81
62 50 81 72 68 79 85 81 77 50
71 61 57 81 68 83 56 71 51 67
56 66 87 72 58 57 58 51 62 55
64 57 65 65 57 57 69 66 62 66
67 69 60 66 60 79 63 51 68 58
73 66 73 67 60 62 55 74 73 60
;
Proc GLM Order = Data;
  Class Location Block Trt;
  Model Y = Location Block(Location) Trt Trt*Location;
  Random Location Block(Location) Trt*Location / test;
  Contrast 'MvsO' Trt 1  1 -2   1  1 -2   1  1 -2 / e = Trt*Location;
  Contrast 'MvsM' Trt 1 -1  0   1 -1  0   1 -1  0 / e = Trt*Location;
  Contrast 'Lin'  Trt 1  1  1   0  0  0  -1 -1 -1 / e = Trt*Location;
  Contrast 'Quad' Trt 1  1  1  -2 -2 -2   1  1  1 / e = Trt*Location;
  Contrast 'MOxL' Trt 1  1 -2   0  0  0  -1 -1  2 / e = Trt*Location;
  Contrast 'MOxQ' Trt 1  1 -2  -2 -2  4   1  1 -2 / e = Trt*Location;
  Contrast 'MMxL' Trt 1 -1  0   0  0  0  -1  1  0 / e = Trt*Location;
  Contrast 'MMxQ' Trt 1 -1  0  -2  2  0   1 -1  0 / e = Trt*Location;
  LSMeans Trt / pdiff adjust=tukey lines e = trt*Location;
  Output out = LemonPR p = Pred r = Res;
Proc Univariate Data = LemonPR normal;
  Var Res;
Proc Plot Data = LemonPR;
  Plot Res*Pred = Trt;
Proc GLM;
  Class Location;
  Model Y = Location;
  Means Location / hovtest = Levene;
Proc GLM;
  Class Trt;
  Model Y = Trt;
  Means Trt / hovtest = Levene;
Proc VarComp Method = Type1;
  Class Location Block Trt;
  Model Y = Trt Location Block(Location) Trt*Location / Fixed = 1;
Run; Quit;

2.2 [3 points] Show that the data meet the assumptions of normality of residuals and
homogeneity of variances among treatment groups.

Tests for Normality
Test            Statistic        p Value
Shapiro-Wilk    W   0.992939     Pr < W   0.5360

Levene's Test for Homogeneity of Y Variance
ANOVA of Squared Deviations from Group Means
Source     DF    Sum of Squares   Mean Square   F Value   Pr > F
Location   9     71816.0          7979.6        0.80      0.6153
Error      170   1692292          9954.7

Levene's Test for Homogeneity of Y Variance
ANOVA of Squared Deviations from Group Means
Source   DF    Sum of Squares   Mean Square   F Value   Pr > F
Trt      8     20972.5          2621.6        0.41      0.9166
Error    171   1106866          6472.9

All assumptions are met. We fail to reject the null hypothesis that the residuals are normally distributed (p = 0.5360). We fail to reject the null hypotheses that the Location and Trt variances are homogeneous (p = 0.6153 and p = 0.9166, respectively).

2.3 [6 points] Using a single treatment factor with 9 levels representing the 9 combinations of pesticide rate and variety, carry out the appropriate Proc GLM for this experiment and answer the following questions based on the overall ANOVA (report F and p-values):
a. Does treatment significantly affect fruit number?
b. Is the effect of treatment location specific?
c. Is there significant variation in fruit number within locations?
d. Is there significant variation in fruit number among locations?
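Question d needs a denominator whose expected mean square matches that of Location except for the Var(Location) term. No single mean square does, so the `Random ... / test` option builds a synthetic error, MS(Block(Location)) + MS(Location*Trt) - MS(Error), with approximate (Satterthwaite) degrees of freedom. The sketch below reproduces that arithmetic from the mean squares reported in the output that follows; the variable names are illustrative, and the Satterthwaite formula is stated here as the assumed method behind the fractional error DF.

```python
# Mean squares and degrees of freedom from the Question 2 Proc GLM output.
ms = {
    "Location": 218.162963,
    "Block(Location)": 101.522222,
    "Location*Trt": 66.735185,
    "Error": 50.772222,
}
df = {"Location": 9, "Block(Location)": 10, "Location*Trt": 72, "Error": 80}

# Synthetic error for Location: MS(Block(Location)) + MS(Location*Trt) - MS(Error)
terms = [(+1, "Block(Location)"), (+1, "Location*Trt"), (-1, "Error")]
ms_synthetic = sum(sign * ms[name] for sign, name in terms)

# Satterthwaite approximation:
#   df = (sum c_i * MS_i)^2 / sum((c_i * MS_i)^2 / df_i)
df_synthetic = ms_synthetic ** 2 / sum(
    (sign * ms[name]) ** 2 / df[name] for sign, name in terms
)

f_location = ms["Location"] / ms_synthetic
print(f"synthetic MS = {ms_synthetic:.6f}, df = {df_synthetic:.3f}, "
      f"F(Location) = {f_location:.2f}")
```

The other three tests use ordinary error terms: Trt is tested against Location*Trt, while Block(Location) and Location*Trt are tested against the residual.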
Source     DF       Type III SS    Mean Square   F Value   Pr > F
Location   9        1963.466667    218.162963    1.86      0.1553
Error      12.272   1441.755706    117.485185
Error: MS(Block(Location)) + MS(Location*Trt) - MS(Error)

Source             DF   Type III SS    Mean Square   F Value   Pr > F
Block(Location)    10   1015.222222    101.522222    2.00      0.0441
Location*Trt       72   4804.933333    66.735185     1.31      0.1168
Error: MS(Error)   80   4061.777778    50.772222

Source   DF   Type III SS    Mean Square   F Value   Pr > F
Trt      8    3880.844444    485.105556    7.27      <.0001
Error    72   4804.933333    66.735185
Error: MS(Location*Trt)

Source            Type III Expected Mean Square
Location          Var(Error) + 2 Var(Location*Trt) + 9 Var(Block(Location)) + 18 Var(Location)
Block(Location)   Var(Error) + 9 Var(Block(Location))
Trt               Var(Error) + 2 Var(Location*Trt) + Q(Trt)
Location*Trt      Var(Error) + 2 Var(Location*Trt)

a) Yes, there is a significant effect of Treatment on fruit number (F = 7.27, p < 0.0001).
b) No, the effect of treatment is not location specific; the interaction between location and treatment is not significant (F = 1.31, p = 0.1168).
c) Yes, there is significant variation in fruit number between greenhouses within locations (F = 2.00, p = 0.0441).
d) No, there is no significant variation in fruit number among locations (F = 1.86, p = 0.1553).

2.4 [6 points] Use contrasts to partition the sums of squares associated with 2.3 in order to help you answer the following questions (report p-values):
a. Is there a difference between the new varieties and the old variety?
b. Is there a difference between the two new varieties?
c. Characterize the response of fruit number to pesticide rate.
d. Are the dosage responses different between old and new varieties?
e. Are the dosage responses different between the two new cultivars?
Contrast Coefficients:

Dosage      0    0    0     2    2    2     4    4    4
Variety     1    2    3     1    2    3     1    2    3
M vs old    1    1   -2     1    1   -2     1    1   -2
M vs M      1   -1    0     1   -1    0     1   -1    0
L           1    1    1     0    0    0    -1   -1   -1
Q           1    1    1    -2   -2   -2     1    1    1
MOxL        1    1   -2     0    0    0    -1   -1    2
MOxQ        1    1   -2    -2   -2    4     1    1   -2
MMxL        1   -1    0     0    0    0    -1    1    0
MMxQ        1   -1    0    -2    2    0     1   -1    0

Tests of Hypotheses Using the Type III MS for Location*Trt as an Error Term
Contrast   DF   Contrast SS    Mean Square    F Value   Pr > F
MvsO       1    268.669444     268.669444     4.03      0.0486
MvsM       1    492.075000     492.075000     7.37      0.0083
Lin        1    2990.008333    2990.008333    44.80     <.0001
Quad       1    14.802778      14.802778      0.22      0.6391
MOxL       1    49.504167      49.504167      0.74      0.3919
MOxQ       1    14.734722      14.734722      0.22      0.6399
MMxL       1    27.612500      27.612500      0.41      0.5221
MMxQ       1    23.437500      23.437500      0.35      0.5553

a) Yes, there is a difference in fruit number between the new varieties and the old variety (p = 0.0486).
b) Yes, there is a difference in fruit number between the two new varieties (p = 0.0083).
c) The response to pesticide rate is linear (p < 0.0001), with no significant quadratic component (p = 0.6391).
d) No, there is no interaction between the new-versus-old variety comparison and pesticide rate. The new and old varieties do not have different linear or quadratic responses (p = 0.3919 and 0.6399, respectively).
e) No, the responses of the two new varieties to pesticide rate are not different, either in the linear or in the quadratic component (p = 0.5221 and 0.5553, respectively).

2.5 [3 points] Rank the 9 treatment combinations in terms of mean fruit number and assign them to significance groups using Tukey's method of means separation.

Tukey Comparison Lines for Least Squares Means of Trt
LS-means with the same letter are not significantly different.
Grouping    Y LSMEAN   Trt   LSMEAN Number   Combination
A           72.10      7     7               V1R4
B A         69.00      9     9               V3R4
B A         68.95      4     4               V1R2
B A C       67.50      8     8               V2R4
B D C       63.65      5     5               V2R2
B D C       62.85      6     6               V3R2
B D C       61.85      1     1               V1R0
  D C       59.60      2     2               V2R0
  D         57.20      3     3               V3R0

2.6 [1 point] Among the group of varieties with the highest yield (NS differences in yield among each other in Tukey), which variety would you recommend to minimize the use of pesticide?

Variety 1 with Rate 2.

Means separation: The means separation test indicates that Variety 1 at 4 qt/A, Variety 3 at 4 qt/A, Variety 1 at 2 qt/A and Variety 2 at 4 qt/A are not significantly different from one another in terms of healthy fruit per tree. So, at the high rate of pesticide all varieties are the same, but at the lower pesticide rate (2 qt/A) Variety 1 is recommended.

[Figure: mean healthy fruit per tree (vertical axis, 50 to 75) against pesticide rate (R0, R2, R4), one line per variety; legend entries V1, V2]

2.7 [1 point] Would you feel comfortable in extending your recommendation to all greenhouse locations in Southern California?

YES, because there is no significant interaction between location and treatments, so the response is the same across locations (in addition, there are no significant differences among locations).

2.8 [3 points] What is the major source of variation in this experiment (use Proc VarComp)?

Type 1 Estimates
Variance Component     Estimate   Proportion of Total
Var(Location)          5.59321    0.079919212
Var(Block(Location))   5.63889    0.080571916
Var(Location*Trt)      7.98148    0.114044278
Var(Error)             50.77222   0.725464594
Total                  69.9858

This table shows that the component of variation due to differences among locations (8.0% of the total) is small compared to the component of variation due to error (72.5% of the total). The residual error is the largest component of variation.
This explains the relatively low proportion of variation explained by our model (R2 = 0.742).

Question 3 [20 points]

data final_2014_2;
  Input Block Gene $ Cultivar ID $ yield;
  *yield = badyield**(-1.98323);
  *yield = log10(badyield); *;
  *yield = sqrt(badyield); *;
Cards;
1 T 1 A 138   1 T 2 B 114   1 T 3 C 89   1 T 4 D 73
1 S 1 E 97    1 S 2 F 115   1 S 3 G 48   1 S 4 H 49
2 T 1 A 124   2 T 2 B 99    2 T 3 C 74   2 T 4 D 49
2 S 1 E .     2 S 2 F 86    2 S 3 G 46   2 S 4 H 46
3 T 1 A 88    3 T 2 B 67    3 T 3 C 70   3 T 4 D 47
3 S 1 E 58    3 S 2 F 74    3 S 3 G 44   3 S 4 H 44
4 T 1 A 60    4 T 2 B 62    4 T 3 C 60   4 T 4 D 43
4 S 1 E 50    4 S 2 F 55    4 S 3 G 41   4 S 4 H 40
;
Proc Print data = final_2014_2;
  ID Block Gene Cultivar ID;
  var yield;
Proc GLM data = final_2014_2;
  title 'Exploratory model 2';
  Class Block Gene Cultivar;
  model yield = Gene|Cultivar|Block @2;
Proc GLM data = final_2014_2;
  title 'ANOVA';
  Class Block Gene Cultivar;
  model yield = Gene|Cultivar Block;
Proc GLM data = final_2014_2;
  title 'TRTMT ANOVA';
  Class Block ID;
  model yield = Block ID;
  means ID;
  Output out = fPR r = fres p = fpred;
  Contrast 'diff. btw. African'         ID 1 -1  0  0  1 -1  0  0;
  Contrast 'diff. Btw. C. American'     ID 0  0  1 -1  0  0  1 -1;
  Contrast 'diff. btw Af and Am'        ID 1  1 -1 -1  1  1 -1 -1;
  Contrast 'Gene'                       ID 1  1  1  1 -1 -1 -1 -1;
  Contrast 'int. gene by African'       ID 1 -1  0  0 -1  1  0  0;
  Contrast 'int. gene by C. American'   ID 0  0  1 -1  0  0 -1  1;
  Contrast 'int. gene by btw Af and Am' ID 1  1 -1 -1 -1 -1  1  1;
Proc Gplot data = fPR;
  title 'res*pred';
  plot fres*fpred = ID;
Proc GLM data = final_2014_2;
  title 'levene';
  Class ID;
  model yield = ID;
  means ID / Hovtest = Levene;
Proc Univariate data = fPR normal;
  title 'Normality test';
  var fres;
  qqplot;
  histogram;
proc gplot data = final_2014_2;
  title 'Gene1*Cultivar interaction TRANSFORMED DATA/UNTRANSFORMED DATA';
  ** Two-way Plots **;
  axis1 offset=(5 pct,5 pct);
  axis2 offset=(5 pct,5 pct);
  symbol1 i=std1mtj v=none color=BLUE;
  symbol2 i=std1mtj v=none color=BLACK;
  symbol3 i=std1mtj v=none color=GREEN;
  symbol4 i=std1mtj v=none color=ORANGE;
  symbol5 i=std1mtj v=none color=RED;
  plot yield * Gene = Cultivar / description = "Plot of Yield by Gene2 and cultivar genetic background";
run; quit;

3.1 [2 points] Use the table in the appendix to describe the design of the experiment.

Design: RCBD factorial
Response Variable: Yield in bushels / Acre
Experimental Unit: A 30 m2 field plot

Class Variable   Block or Treatment   No. of Levels   Description
1                Block                4               240 m2 field segment
2                Treatment            4               Genetic background of one of four cultivars
3                Treatment            2               One of two alleles of the same gene

Subsamples? NO
Covariable? NO

3.2 [3 points] Check all of the assumptions of your model.

Tests for Normality
Test            Statistic        p Value
Shapiro-Wilk    W   0.967291     Pr < W   0.4478

The Shapiro-Wilk test suggests that the residuals are normally distributed (P = 0.4478).
Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
ID       7    2816790          402399        3.05      0.0202
Error    23   3037586          132069

Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Gene     1    487771           487771        0.63      0.4327
Error    29   22348814         770649

Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
Cultivar   3    3440166          1146722       4.92      0.0075
Error      27   6296861          233217

Levene's test suggests that the variance among treatments is not homogeneous (ID P = 0.0202 or, if run separately, Cultivar P = 0.0075). Both analyses are considered correct.

Exploratory model:
Source           DF   Type III SS    Mean Square    F Value   Pr > F
Gene             1    2033.472222    2033.472222    47.12     0.0001
Cultivar         3    7999.003472    2666.334491    61.78     <.0001
Gene*Cultivar    3    952.805556     317.601852     7.36      0.0109
Block            3    6970.836806    2323.612269    53.84     <.0001
Block*Gene       3    309.597222     103.199074     2.39      0.1442
Block*Cultivar   9    2290.284722    254.476080     5.90      0.0101

The exploratory model suggests that the block effects are multiplicative (Block*Cultivar P = 0.0101). The data will be transformed, but to help with the later analysis of the interactions the ANOVA is also run on the untransformed data.

Source          DF   Type III SS    Mean Square    F Value   Pr > F
Gene            1    2482.370400    2482.370400    16.43     0.0006
Cultivar        3    7854.252976    2618.084325    17.33     <.0001
Gene*Cultivar   3    1205.018601    401.672867     2.66      0.0761
Block           3    6824.190476    2274.730159    15.06     <.0001

Contrast                     DF   Contrast SS    Mean Square    F Value   Pr > F
diff. btw. African           1    19.580875      19.580875      0.13      0.7226
diff. Btw. C. American       1    410.062500     410.062500     2.71      0.1151
diff. btw Af and Am          1    7443.188582    7443.188582    49.26     <.0001
Gene                         1    2482.370400    2482.370400    16.43     0.0006
int. gene by African         1    787.537396     787.537396     5.21      0.0335
int. gene by C. American     1    410.062500     410.062500     2.71      0.1151
int. gene by btw Af and Am   1    0.910173       0.910173       0.01      0.9389

There is no overall significant interaction (P = 0.0761), but one of the interaction contrasts (int. gene by African, P = 0.0335) is significant even in the untransformed data.

3.3 [6 points] If necessary, transform the data using a power transformation and report the results demonstrating that all the assumptions are met. Run the appropriate ANOVA model to test for effects of the gene and cultivar genetic background and the interactions between them. Report the results of your ANOVA and describe which effects are significant.

data final_power_trans;
  input means stddev;
  logmeans = log10(means);
  logvar = log10(stddev*stddev);
cards;
102.50 35.30
85.50 25.09
73.25 12.04
53.00 13.56
68.33 25.15
82.50 25.15
44.75 2.99
44.75 3.77
;
Proc GLM data = final_power_trans;
  title 'power trans';
  model logvar = logmeans;
run; quit;

Parameter   Estimate        Standard Error   t Value   Pr > |t|
Intercept   -7.580524919    1.91249374       -3.96     0.0074
logmeans    5.394158485     1.04656856       5.15      0.0021

a = 1 – (slope / 2), reported in the key as a = 1 – (5.394158485 / 2) = -1.98323. Note that 1 – (5.394158485 / 2) evaluates to -1.69708; the exponent actually used in the SAS program and in the results reported below is -1.98323.

The data will be transformed by raising them to the power of -1.98323.

TRANSFORMED DATA

Test for Normality: OK
Test            Statistic        p Value
Shapiro-Wilk    W   0.952353     Pr < W   0.1812

The Shapiro-Wilk test suggests that the residuals are normally distributed (P = 0.1812).
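The power-transformation step above regresses log10(variance) on log10(mean) across the eight treatment groups and sets the exponent to a = 1 - b/2, where b is the fitted slope. A minimal sketch of that calculation, using only the means and standard deviations from the final_power_trans data step, is given below. The variable names are illustrative, and ordinary least squares is written out by hand so that no external library is assumed.

```python
import math

# Treatment means and standard deviations from the final_power_trans data step.
means = [102.50, 85.50, 73.25, 53.00, 68.33, 82.50, 44.75, 44.75]
stddevs = [35.30, 25.09, 12.04, 13.56, 25.15, 25.15, 2.99, 3.77]

x = [math.log10(m) for m in means]          # logmeans
y = [math.log10(s * s) for s in stddevs]    # logvar

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                  # b in log(variance) = intercept + b * log(mean)
intercept = y_bar - slope * x_bar
exponent = 1 - slope / 2           # power transformation exponent a = 1 - b/2

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, a = {exponent:.4f}")
```

The printed exponent can be compared directly with the value coded in the `yield = badyield**(...)` statement of the SAS program.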
Test of homogeneity of variances

Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
ID       7    5.11E-14         7.3E-15       0.93      0.5015
Error    23   1.8E-13          7.84E-15

Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Gene     1    4.11E-14         4.11E-14      0.73      0.4000
Error    29   1.63E-12         5.63E-14

Levene's Test for Homogeneity of yield Variance
ANOVA of Squared Deviations from Group Means
Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
Cultivar   3    9.07E-14         3.02E-14      1.39      0.2670
Error      27   5.87E-13         2.17E-14

The transformation eliminated the heterogeneity of variances. Levene's test suggests that the variances are homogeneous (ID P = 0.5015; it also holds separately for Cultivar, P = 0.2670, and Gene, P = 0.4000).

Exploratory model (transformed data):
Source           DF   Type III SS     Mean Square     F Value   Pr > F
Gene             1    1.0995647E-6    1.0995647E-6    64.06     <.0001
Cultivar         3    3.5023752E-6    1.1674584E-6    68.02     <.0001
Gene*Cultivar    3    7.2635596E-7    2.4211865E-7    14.11     0.0015
Block            3    1.9891237E-6    6.6304123E-7    38.63     <.0001
Block*Gene       3    9.799335E-9     3.266445E-9     0.19      0.9001
Block*Cultivar   9    1.4429678E-7    1.6032976E-8    0.93      0.5438

The problem with non-additivity has disappeared, as the Block*treatment interactions are no longer significant (P = 0.9001 and P = 0.5438). Now time for the ANOVA:

Source          DF   Type III SS     Mean Square     F Value   Pr > F
Gene            1    1.2443457E-6    1.2443457E-6    86.45     <.0001
Cultivar        3    3.5229784E-6    1.1743261E-6    81.59     <.0001
Gene*Cultivar   3    7.211971E-7     2.4039903E-7    16.70     <.0001
Block           3    1.9685005E-6    6.5616684E-7    45.59     <.0001

The data suggest that there is a significant effect on yield due to the gene (P < 0.0001) and the cultivar genetic background (P < 0.0001). There is a significant gene by cultivar genetic background interaction (P < 0.0001), and there were significant block effects (P < 0.0001).
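The seven single-degree-of-freedom contrasts defined in the TRTMT ANOVA program split the 7 treatment degrees of freedom (8 Gene x Cultivar combinations) into planned questions. The sketch below checks that the coefficient vectors each sum to zero and are mutually orthogonal. The treatment order A to H (T1, T2, T3, T4, S1, S2, S3, S4) is read from the data step, and the dictionary itself is just an illustrative container. With one missing plot the contrast sums of squares need not add up exactly to the treatment sum of squares, so only the coefficients are checked.

```python
from itertools import combinations

# Treatment order A..H = T1, T2, T3, T4, S1, S2, S3, S4
# (cultivars 1-2 are African, cultivars 3-4 are C. American).
contrasts = {
    "diff. btw. African":         (1, -1,  0,  0,  1, -1,  0,  0),
    "diff. Btw. C. American":     (0,  0,  1, -1,  0,  0,  1, -1),
    "diff. btw Af and Am":        (1,  1, -1, -1,  1,  1, -1, -1),
    "Gene":                       (1,  1,  1,  1, -1, -1, -1, -1),
    "int. gene by African":       (1, -1,  0,  0, -1,  1,  0,  0),
    "int. gene by C. American":   (0,  0,  1, -1,  0,  0, -1,  1),
    "int. gene by btw Af and Am": (1,  1, -1, -1, -1, -1,  1,  1),
}

# A valid contrast has coefficients that sum to zero.
all_sum_to_zero = all(sum(c) == 0 for c in contrasts.values())

# Two contrasts are orthogonal when the dot product of their coefficients is zero.
non_orthogonal = [
    (name_a, name_b)
    for (name_a, a), (name_b, b) in combinations(contrasts.items(), 2)
    if sum(ai * bi for ai, bi in zip(a, b)) != 0
]

print("all contrasts sum to zero:", all_sum_to_zero)
print("non-orthogonal pairs:", non_orthogonal)
```

Each interaction contrast is the element-wise product of the Gene contrast with the matching cultivar contrast, which is why the three "int." vectors flip sign on the S half of the treatments.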
3.4 [6 points] Before starting the experiment the investigator formulated questions of interest about the significant genetic background effects and interactions that may arise. Please test the following:

Contrast                     DF   Contrast SS     Mean Square     F Value   Pr > F
diff. btw. African           1    1.124603E-8     1.124603E-8     0.78      0.3872
diff. Btw. C. American       1    3.4573416E-7    3.4573416E-7    24.02     <.0001
diff. btw Af and Am          1    3.1298497E-6    3.1298497E-6    217.45    <.0001
Gene                         1    1.2443457E-6    1.2443457E-6    86.45     <.0001
int. gene by African         1    1.0701751E-7    1.0701751E-7    7.44      0.0130
int. gene by C. American     1    3.2820811E-7    3.2820811E-7    22.80     0.0001
int. gene by btw Af and Am   1    2.6331091E-7    2.6331091E-7    18.29     0.0004

3.4.1 Is there a difference in yield between the African and C. American cultivars?
Yes, P < 0.0001.

3.4.2 Is there a difference in yield between the two African cultivars?
No, P = 0.3872 (because the effects are crossed for S and T and they cancel each other).

3.4.3 Is there a difference in yield between the two C. American cultivars?
Yes, P < 0.0001.

3.4.4 Is the effect of the gene on yield different between the African and C. American cultivars?
Yes, P = 0.0004.

3.4.5 Is the effect of the gene on yield different between the African cultivars (i.e. between cultivars 1 and 2)?
Yes, P = 0.0130 (this shows the crossed effect of the gene in S and T alleles).

3.4.6 Is the effect of the gene on yield different between the C. American cultivars (i.e. between cultivars 3 and 4)?
Yes, P = 0.0001.

[3 points] Please present line plots of Gene by cultivar interaction for both the original and transformed data.
Based on these graphs, and on the significance of the ANOVAs and contrasts of transformed and untransformed data, indicate which of the following statements reflects reality and why:

- The Tolerant allele increases yield relative to the susceptible allele in at least some cultivars (CORRECT ONE)
- The Tolerant allele decreases yield relative to the susceptible allele in at least some cultivars
- The Tolerant allele increases yield relative to the susceptible allele in all cultivars
- The Tolerant allele decreases yield relative to the susceptible allele in all cultivars

[Figure: original and transformed data interaction plots]

The interactions are observed in both transformed and untransformed data, so the response to the gene differs among cultivars (at least one of the interaction contrasts is significant in the untransformed data, and all of the interaction contrasts are significant in the transformed data). These graphs show the interactions between varieties within each of the African and Central American groups.

Question 4 [30 points]

Data Gibberellin;
  Input Chamber GA $ Light SeedWgt X;
  Z = SeedWgt + 2.38*(X-17.5);
Cards;
1 C 8 111.28 15      2 C 8 111.12 14      3 C 8 115.47 13
1 C 10 115.89 14     2 C 10 112.04 15     3 C 10 115.57 14
1 C 12 118.89 13     2 C 12 115.17 14     3 C 12 113.99 15
1 G3 8 102.18 19     2 G3 8 98.35 20      3 G3 8 97.41 20
1 G3 10 103.71 20    2 G3 10 101.21 19    3 G3 10 102.88 20
1 G3 12 99.16 21     2 G3 12 99.47 20     3 G3 12 103.37 19
1 G4 8 117.09 13     2 G4 8 109.19 15     3 G4 8 111.10 14
1 G4 10 119.23 13    2 G4 10 115.30 14    3 G4 10 118.30 13
1 G4 12 113.31 15    2 G4 12 110.86 15    3 G4 12 117.75 13
1 G4&3 8 92.40 23    2 G4&3 8 91.89 22    3 G4&3 8 96.94 21
1 G4&3 10 98.45 22   2 G4&3 10 92.23 23   3 G4&3 10 95.46 22
1 G4&3 12 94.50 23   2 G4&3 12 94.76 22   3 G4&3 12 97.72 22
;
Proc GLM;
  Title 'One-way ANOVAs for X and SEEDWGT';
  Class Chamber Light GA;
  Model X SeedWgt = Chamber Light Chamber*Light GA GA*Light;
  Test h=Chamber e= Chamber*Light;
  Test h=Light e= Chamber*Light;
  Contrast 'G3' GA -1 1 -1 1;
  Contrast 'G4' GA -1 -1 1 1;
  Contrast 'G3*G4' GA 1 -1 -1 1;
Proc GLM;
  Title 'General regression';
  Model SeedWgt = X;
Proc GLM Order = Data;
  Title 'The ANCOVA';
  Class Chamber Light GA;
  Model SeedWgt = Chamber Light Chamber*Light GA GA*Light X / solution;
  Test h=Chamber e= Chamber*Light;
  Test h=Light e= Chamber*Light;
  LSMeans GA Light / StdErr;
  Contrast 'Light Lineal' Light -1 0 1 / e=Chamber*Light;
  Contrast 'Light Quadratic' Light 1 -2 1 / e=Chamber*Light;
  Contrast 'G3' GA -1 1 -1 1;
  Contrast 'G4' GA -1 -1 1 1;
  Contrast 'G3*G4' GA 1 -1 -1 1;
Proc GLM;
  Title 'Homogeneity of slopes for light';
  Class Chamber Light;
  Model SeedWgt = Chamber Light X Light*X;
Proc GLM;
  Title 'Homogeneity of slopes for GA';
  Class Chamber Light GA;
  Model SeedWgt = Chamber Light Chamber*Light GA GA*Light X GA*X;
Proc GLM Order = Data;
  Title 'ANOVA on Z';
  Class Chamber Light GA;
  Model Z = Chamber Light Chamber*Light GA GA*Light;
  Test h=Chamber e= Chamber*Light;
  Test h=Light e= Chamber*Light;
  Output out = PRz p = Pred r = Res;
Proc Univariate Data = PRz normal;
  Title 'Normality of residuals';
  Var Res;
Proc Plot Data = PRz;
  Plot Res*Pred = GA;
Proc GLM;
  Title 'Homogeneity of variances';
  Class GA;
  Model Z = GA;
  Means GA / hovtest = Levene;
Proc GLM;
  Title 'Homogeneity of variances';
  Class Light;
  Model Z = Light;
  Means Light / hovtest = Levene;
Run; Quit;

1) [2 points] Describe in detail the design of the experiment.

Design: RCBD with 3 blocks (Chamber models). Split-plot with photoperiod as main plot and GA levels as subplot (the GA levels are themselves organized as a 2x2 factorial).
Response Variable: Weight of seeds from one complete tray (in grams)
Experimental Unit: a) Mainplot: photoperiod (assigned to a chamber of a specific model); b) Subplot: one tray of 100 Arabidopsis plants.

Class Variable   Block or Treatment   Number of Levels   Fixed or Random   Description
1                Block                3                  Random            Chamber model
2                Treatment            3                  Fixed             Photoperiod (main plot)
3                Treatment            4                  Fixed             Combinations of GA3 and GA4

Subsamples? NO
Covariable? Yes (Level of disease)

2) [5 points] Run the ANOVA and ANCOVA designs and compare the results. Use the correct experimental design and correct error terms in both of them!

One-way ANOVA for covariable X

Class     Levels   Values
Chamber   3        1 2 3
Light     3        8 10 12
GA        4        C G3 G4 G4&3

Number of Observations Read   36
Number of Observations Used   36

Dependent Variable: X

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17   473.3333333      27.8431373    36.67     <.0001
Error             18   13.6666667       0.7592593
Corrected Total   35   487.0000000

R-Square   Coeff Var   Root MSE   X Mean
0.971937   4.979171    0.871355   17.50000

Source          DF   Type III SS    Mean Square    F Value   Pr > F
Chamber         2    2.1666667      1.0833333      1.43      0.2660
Light           2    0.5000000      0.2500000      0.33      0.7237
Chamber*Light   4    0.8333333      0.2083333      0.27      0.8906
GA              3    468.1111111    156.0370370    205.51    <.0001
Light*GA        6    1.7222222      0.2870370      0.38      0.8834

Contrast   DF   Contrast SS    Mean Square    F Value   Pr > F
G3         1    441.0000000    441.0000000    580.83    <.0001
G4         1    11.1111111     11.1111111     14.63     0.0012
G3*G4      1    16.0000000     16.0000000     21.07     0.0002

Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term
Source    DF   Type III SS   Mean Square   F Value   Pr > F
Chamber   2    2.16666667    1.08333333    5.20      0.0772
Light     2    0.50000000    0.25000000    1.20      0.3906

Level of GA   N   X Mean       X Std Dev    SeedWgt Mean   SeedWgt Std Dev
C             9   14.1111111   0.78173596   114.380000     2.54248402
G3            9   19.7777778   0.66666667   100.860000     2.32896436
G4            9   13.8888889   0.92796073   114.681111     3.68866658
G4&3          9   22.2222222   0.66666667   94.927778      2.44033183

GA affects the level of disease (there is no effect of photoperiod). G3 increases the disease relative to the control, and the effect of G3 is enhanced by the presence of G4 (significant G3xG4 interaction on X).
Therefore, we need to be careful in the interpretation of the effect of GA on seed-weight, since the effect can be mediated by the effect of GA on the disease and the effect of the disease on seed-weight.

One-way ANOVAs for SEEDWGT

Dependent Variable: SeedWgt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17   2809.617439      165.271614    32.42     <.0001
Error             18   91.757583        5.097644
Corrected Total   35   2901.375022

R-Square   Coeff Var   Root MSE   SeedWgt Mean
0.968374   2.125740    2.257796   106.2122

Source          DF   Type III SS   Mean Square   F Value   Pr > F
Chamber         2    65.876772     32.938386     6.46      0.0077
Light           2    55.974606     27.987303     5.49      0.0138
Chamber*Light   4    11.786844     2.946711      0.58      0.6823
GA              3    2649.776778   883.258926    173.27    <.0001
Light*GA        6    26.202439     4.367073      0.86      0.5442

Contrast   DF   Contrast SS   Mean Square   F Value   Pr > F
G3         1    2491.008100   2491.008100   488.66    <.0001
G4         1    71.346178     71.346178     14.00     0.0015
G3*G4      1    87.422500     87.422500     17.15     0.0006

Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term

Source    DF   Type III SS   Mean Square   F Value   Pr > F
Chamber   2    65.87677222   32.93838611   11.18     0.0230
Light     2    55.97460556   27.98730278   9.50      0.0303

The ANOVA indicates significant effects of Chamber (block), light and GA, with no interaction.

The ANCOVA

Dependent Variable: SeedWgt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             18   2887.078513      160.393251    190.72    <.0001
Error             17   14.296509        0.840971
Corrected Total   35   2901.375022

R-Square   Coeff Var   Root MSE   SeedWgt Mean
0.995073   0.863408    0.917045   106.2122

Source          DF   Type III SS   Mean Square   F Value   Pr > F
Chamber         2    36.55569866   18.27784933   21.73     <.0001
Light           2    63.67956537   31.83978269   37.86     <.0001
Chamber*Light   4    3.46479180    0.86619795    1.03      0.4202
GA              3    0.29693662    0.09897887    0.12      0.9485
Light*GA        6    4.35938952    0.72656492    0.86      0.5404
X               1    77.46107398   77.46107398   92.11     <.0001

Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term

Source    DF   Type III SS   Mean Square   F Value   Pr > F
Chamber   2    36.55569866   18.27784933   21.10     0.0075
Light     2    63.67956537   31.83978269   36.76     0.0027

Parameter   Estimate        Standard Error   t Value   Pr > |t|
Intercept   149.4660366 B   5.47521430       27.30     <.0001
X           -2.3807317      0.24806143       -9.60     <.0001

Slope -2.38. A negative slope indicates that an increase in the disease level decreases the seed-weight.

Least Squares Means

GA     SeedWgt LSMEAN   Standard Error   Pr > |t|
C      106.311965       0.894504         <.0001
G3     106.282778       0.642416         <.0001
G4     106.084024       0.946498         <.0001
G4&3   106.170122       1.210629         <.0001

Interpret the results of the ANOVA and ANCOVA. Explain any difference in the two analyses. Based on the ANCOVA results answer the following questions.

Once the effect of the disease is removed, the effect of GA on seed-weight disappears. This indicates that the observed effect of GA on seed-weight was an indirect effect of the effect of GA on the disease!
The effect of light is highly significant. Light does not affect the disease but still affects seed weight, confirming that photoperiods of 10 or 12 h of light increase seed yield relative to 8 h. The response is not linear (based on the means, the effect seems to saturate at 10 h).

3.1. [2 points] Are there significant differences in seed-weight among different GA levels?
No. They disappear in the ANCOVA model. P=0.9485

3.2. [2 points] Are there significant interactions for seed-weight between the GA3 and GA4 effects?
Not in the ANCOVA model. P=0.8997

Dependent Variable: SeedWgt

Contrast   DF   Contrast SS   Mean Square   F Value   Pr > F
G3         1    0.00021905    0.00021905    0.00      0.9873
G4         1    0.14396681    0.14396681    0.17      0.6842
G3*G4      1    0.01377586    0.01377586    0.02      0.8997

3.3. [2 points] Are there significant differences in seed-weight among different photoperiods? Indicate P.
Yes. P=0.0027

3.4. [2 points] Are the differences in photoperiod linear?
Light   SeedWgt LSMEAN   Standard Error   Pr > |t|
8       104.336606       0.265534         <.0001
10      107.324106       0.265534         <.0001
12      106.975955       0.267937         <.0001

Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term

Contrast          DF   Contrast SS   Mean Square   F Value   Pr > F
Light Lineal      1    40.68075737   40.68075737   46.96     0.0024
Light Quadratic   1    22.05143729   22.05143729   25.46     0.0073

The differences in photoperiod are significant but not linear (significant quadratic response, P=0.0073). Seed-weight is higher at 10-12 h than at 8 h, but there are no differences between 10 and 12 h, suggesting that 10 h is sufficient to saturate the photoperiod response.

3.5. [2 points] Are the differences in photoperiod different at the different GA treatments?
No. There is no significant interaction between light and GA treatments in the ANCOVA. P=0.5404

4) Answer the following questions:

4.1. [2 points] Are the slopes homogeneous within GA treatments and within photoperiods?

Homogeneity of slopes for light

Dependent Variable: SeedWgt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             7    2878.691338      411.241620    507.62    <.0001
Error             28   22.683684        0.810132
Corrected Total   35   2901.375022

R-Square   Coeff Var   Root MSE   SeedWgt Mean
0.992182   0.847429    0.900073   106.2122

Source    DF   Type III SS   Mean Square   F Value   Pr > F
Chamber   2    37.267363     18.633681     23.00     <.0001
Light     2    4.015414      2.007707      2.48      0.1021
X         1    2751.801118   2751.801118   3396.73   <.0001
X*Light   2    0.153351      0.076676      0.09      0.9100

Slopes of seed-weight vs disease are homogeneous for the three photoperiods (X*Light P=0.9100).

Homogeneity of slopes for GA

Dependent Variable: SeedWgt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             21   2891.512949      137.691093    195.46    <.0001
Error             14   9.862074         0.704434
Corrected Total   35   2901.375022

R-Square   Coeff Var   Root MSE   SeedWgt Mean
0.996601   0.790216    0.839306   106.2122

Source          DF   Type III SS   Mean Square   F Value   Pr > F
Chamber         2    32.93884880   16.46942440   23.38     <.0001
Light           2    65.17865884   32.58932942   46.26     <.0001
Chamber*Light   4    6.16141440    1.54035360    2.19      0.1235
GA              3    4.42364106    1.47454702    2.09      0.1471
Light*GA        6    5.44555607    0.90759268    1.29      0.3240
X               1    72.00482116   72.00482116   102.22    <.0001
X*GA            3    4.43443572    1.47814524    2.10      0.1464

Slopes of seed-weight vs disease are homogeneous for the four GA treatments (X*GA P=0.1464).

4.2 [2 points] Are the residuals from the ANCOVA model normally distributed?

ANOVA on Z

Dependent Variable: Z

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17   109.9041056      6.4649474     8.14      <.0001
Error             18   14.2965167       0.7942509
Corrected Total   35   124.2006222

R-Square   Coeff Var   Root MSE   Z Mean
0.884892   0.839082    0.891208   106.2122

Source          DF   Type III SS   Mean Square   F Value   Pr > F
Chamber         2    37.30090556   18.65045278   23.48     <.0001
Light           2    64.04677222   32.02338611   40.32     <.0001
Chamber*Light   4    3.50444444    0.87611111    1.10      0.3853
GA              3    0.29731111    0.09910370    0.12      0.9442
Light*GA        6    4.75467222    0.79244537    1.00      0.4565

Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term

Source    DF   Type III SS   Mean Square   F Value   Pr > F
Chamber   2    37.30090556   18.65045278   21.29     0.0074
Light     2    64.04677222   32.02338611   36.55     0.0027

Almost identical to the ANCOVA! This indicates that the adjusted values are correct.

Normality of residuals

The UNIVARIATE Procedure
Tests for Normality

Test           Statistic      p Value
Shapiro-Wilk   W 0.970054     Pr < W 0.4270

Plot of Res*Pred. Symbol is value of GA.
[Line-printer scatter plot not recoverable from the extraction. Vertical axis: Res, from -1.5 to 1.5. Horizontal axis: Pred, from 103 to 110.]

Residuals of the ANOVA on Z (the adjusted values) are normally distributed (Shapiro-Wilk P=0.4270).

4.3 [2 points] Are the variances for photoperiod and for GA treatments homogeneous?
Homogeneity of variances

Dependent Variable: Z

Levene's Test for Homogeneity of Z Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
GA       3    6.4302           2.1434        0.13      0.9416
Error    32   528.1            16.5027

Levene's Test for Homogeneity of Z Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Light    2    5.6965           2.8482        1.05      0.3621
Error    33   89.7145          2.7186

Level of Light   N    Z Mean       Z Std Dev
8                12   104.336667   1.33350075
10               12   107.324167   1.54117699
12               12   106.975833   1.14676984

Both light and GA show homogeneous variances.

4.4 [2 points] Is the regression between seed-weight and disease significant? What is the average slope? Explain what that slope and its sign mean in terms of seed-weight and disease levels.

General regression

Dependent Variable: SeedWgt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    2777.205792      2777.205792   760.45    <.0001
Error             34   124.169230       3.652036
Corrected Total   35   2901.375022

R-Square   Coeff Var   Root MSE   SeedWgt Mean
0.957203   1.799256    1.911030   106.2122

Source   DF   Type I SS     Mean Square   F Value   Pr > F
X        1    2777.205792   2777.205792   760.45    <.0001

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   148.0027253   1.54855700       95.57     <.0001
X           -2.3880287    0.08659704       -27.58    <.0001

The regression is significant (P<.0001), which suggests that it is worth adjusting for this covariable. The average slope is -2.39: each one-unit increase in the disease level is associated with a decrease of about 2.39 g in seed-weight.

4.5 [2 points] Is the disease independent of the treatments?
No. The ANOVA of the covariable indicates a significant GA effect.

4.6 [3 points] What factors affect significantly the level of disease? Is there any significant interaction? How can this result help you to understand the differences between the ANOVA and ANCOVA results?
GA3 presence results in an increase in disease severity, which is further increased in the presence of the two GA forms (GA4&3; significant G3*G4 interaction on X, P=0.0002). This is an interesting discovery that can generate a new grant!
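
As a cross-check on the error terms used in the ANCOVA above, the main-plot F values can be recomputed by hand from the reported mean squares. The sketch below is written in Python rather than SAS only so that it can be run anywhere. The function and variable names are illustrative and are not part of the exam's SAS program. It reproduces only the arithmetic F = MS(effect) / MS(error); it does not recompute P values.

```python
# Mean squares copied from the ANCOVA output (Type III tables).
MS_CHAMBER = 18.27784933        # Chamber, 2 df
MS_LIGHT = 31.83978269          # Light, 2 df
MS_CHAMBER_LIGHT = 0.86619795   # Chamber*Light, 4 df (main-plot error)
MS_RESIDUAL = 0.840971          # residual, 17 df (subplot error)


def f_ratio(ms_effect, ms_error):
    """Return the F statistic for an effect tested against a chosen error term."""
    return ms_effect / ms_error


# Correct split-plot tests: main-plot effects over Chamber*Light.
f_chamber_main = f_ratio(MS_CHAMBER, MS_CHAMBER_LIGHT)
f_light_main = f_ratio(MS_LIGHT, MS_CHAMBER_LIGHT)

# Default GLM test: the same effect over the residual mean square.
f_light_residual = f_ratio(MS_LIGHT, MS_RESIDUAL)

print(f"Chamber vs Chamber*Light: F = {f_chamber_main:.2f}")
print(f"Light   vs Chamber*Light: F = {f_light_main:.2f}")
print(f"Light   vs residual:      F = {f_light_residual:.2f}")
```

The first two values should agree with the "Tests of Hypotheses Using the Type III MS for Chamber*Light as an Error Term" table, and the third with the default Type III line for Light. The F values are close in this data set, but the main-plot test has only 4 denominator degrees of freedom instead of 17, which is why its P value (0.0027) is larger than the default one (<.0001).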