CROP 590 Experimental Design in Agriculture
Lab exercise – 6th week

Factorial Experiments
Orthogonal Contrasts
Regression in the ANOVA

SAS On-line Documentation: PROC GLM, REG Procedure

Part I. Two-Way Factorial experiment

The next tool we will learn in SAS is how to write the MODEL statement for a factorial experiment so that we can look at the main effect and interaction sources of variation in the ANOVA table.

A forage crop specialist wanted to determine the effect of a Rhizobium seed inoculant and phosphorus fertilizer on the yield of alfalfa grown for hay. Because he expected the effect of the inoculant to depend on whether or not fertilizer had been applied, he decided to use a factorial set of treatments with an inoculation factor at two levels and a phosphorus factor at two levels. He used a randomized block design with four blocks. His treatments were:

    C     No phosphorus, no seed treatment
    P     25 kg/ha phosphorus added
    I     Seed treated with inoculum
    PI    Both phosphorus and seed treatment

1) Download the data file for this exercise (Lab6_2way.xlsx). Input your data into SAS.

2) Use Proc GLM to conduct the ANOVA. Remember to define all of your independent variables as class variables. The Model can be written in two ways:

    MODEL Yield = Block Phosphorus|Seedtreat;

or

    MODEL Yield = Block Phosphorus Seedtreat Phosphorus*Seedtreat;

The above statements are equivalent because the vertical bar ‘|’ tells SAS to consider all main effects and interactions between the two factors. (Note: the vertical bar may appear as two vertical dashes on your keyboard.)

What do the results of the ANOVA indicate about the treatments? Are the interactions significant? Are the main effects significant?

Use an LSMEANS statement to obtain the means and standard errors.

    LSMEANS Phosphorus|Seedtreat /stderr tdiff;

Compare the results of the t tests for main effect means with the results from the ANOVA. Is there any reason to perform mean comparison tests in this case?
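Steps 1 and 2 can be combined into one short program. The following is only a sketch under stated assumptions: it assumes the spreadsheet is in the current working directory, that its first row holds the column names Block, Phosphorus, Seedtreat, and Yield, and that your SAS installation can read XLSX files. The data set name twoway is hypothetical.

```sas
/* Assumption: Lab6_2way.xlsx is in the current working directory and its
   first row contains the column names Block, Phosphorus, Seedtreat, Yield. */
PROC IMPORT DATAFILE="Lab6_2way.xlsx"
            OUT=twoway          /* hypothetical data set name */
            DBMS=XLSX
            REPLACE;
    GETNAMES=YES;               /* read variable names from the first row */
RUN;

PROC GLM DATA=twoway;
    CLASS Block Phosphorus Seedtreat;            /* all factors are class variables */
    MODEL Yield = Block Phosphorus|Seedtreat;    /* main effects plus interaction   */
    LSMEANS Phosphorus|Seedtreat / STDERR TDIFF; /* means, std errors, t tests      */
RUN;
QUIT;
```

If your column names differ from the ones assumed here, change the CLASS, MODEL, and LSMEANS statements to match.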
3) Have a quick look at a plot of the means:

    PROC MEANS NWAY;
        CLASS seedtreat phosphorus;
        VAR Yield;
        OUTPUT OUT=new2 MEAN=;
    RUN;

    PROC GPLOT DATA=new2;
        PLOT Yield*phosphorus=seedtreat;
        SYMBOL1 i=join VALUE=dot;
    RUN; QUIT;

The NWAY option tells SAS to calculate the means only at the level of the interaction (each seedtreat by phosphorus combination) and not for the main effects. By default, the means in the output data set will be given the name 'Yield', because that is the response variable name.

The SYMBOL1 statement will create dots for the data points on the graph (VALUE=dot) and connect the dots for each seed treatment with a line (i=join).

Part II. Three-way factorial

An experiment was conducted to investigate the effect of zinc and copper added to basic diets of maize or wheat fed to chickens. The treatments were arranged in a 2^3 factorial with four broods of chicks used as blocks. The results are expressed as average weekly gains in grams (Lab6_3way.xlsx).

1) Write a SAS program to analyze the data. The model statement is an extension of the two-factor analysis. Remember that the vertical bar ‘|’ is equivalent to specifying all of the possible main effects and interactions among the factors.

    PROC GLM;
        CLASS Block Zinc Copper Grain;
        MODEL Weight = Block Zinc|Copper|Grain;
        MEANS Zinc|Copper|Grain;
    RUN;

2) How would you interpret these results? What could you add to obtain estimates of standard errors of means?

3) Create graphs of the interactions of interest.

    PROC MEANS nway;
        CLASS Copper Grain;
        VAR weight;
        OUTPUT OUT=new3 MEAN=;
    RUN;

    PROC GPLOT DATA=new3;
        PLOT weight*copper=grain;
        SYMBOL1 i=join VALUE=dot;
    RUN;

Part III. Using contrast statements in a factorial experiment

An agricultural chemist suspected that the activity of root growth hormone was dependent on temperature and concentration.
He designed an experiment using two different temperature baths of a nutrient solution containing 2 ppm, 4 ppm, and 6 ppm of the root growth hormone. He had two assistants to help measure the root growth response, so the experiment was run as a randomized block design with two blocks.

    TEMP   CONC   BLOCK   ROOTS
    Cold   2      1       76
    Cold   2      2       60
    Cold   4      1       63
    Cold   4      2       58
    Cold   6      1       50
    Cold   6      2       45
    Warm   2      1       23
    Warm   2      2       15
    Warm   4      1       36
    Warm   4      2       25
    Warm   6      1       47
    Warm   6      2       36

1) Copy this data into a SAS data step using the appropriate input statement and a datalines statement.

2) Develop a set of orthogonal contrasts to answer the following questions:

    a. Is there a difference in root growth due to temperature?
    b. Is there a linear response to concentration?
    c. Is there a quadratic response to concentration?
    d. Does the nature of the response to concentration (linear or quadratic) depend on the temperature?

Write the coefficients for each of the six treatment combinations:

    Cold, 2 ppm
    Cold, 4 ppm
    Cold, 6 ppm
    Warm, 2 ppm
    Warm, 4 ppm
    Warm, 6 ppm

3) Run a SAS program on this data to address these questions.

    PROC GLM plots=diagnostics;
        TITLE 'contrasts in a factorial experiment';
        CLASS block temp conc;
        MODEL roots = block temp conc temp*conc;
        /*make sure that your coefficients correspond to the order of your means*/
        CONTRAST 'main effect of temp' temp -1 1;
        CONTRAST 'conc lin' conc -1 0 1;
        CONTRAST 'conc quad' conc 1 -2 1;
        CONTRAST 'temp x conc lin' temp*conc 1 0 -1 -1 0 1;
        CONTRAST 'temp x conc quad' temp*conc -1 2 -1 1 -2 1;
        LSMEANS conc temp;
        LSMEANS temp*conc/out=new;

    PROC GPLOT data=new;
        TITLE 'Effect of conc and temp on root growth';
        SYMBOL1 i=join v=dot;
        AXIS1 label=('Root growth');
        PLOT lsmean*conc=temp/ vaxis=axis1;
    RUN;

4) How would you interpret the output?

5) You can also use the “ESTIMATE” statement in PROC GLM rather than contrast statements. The syntax is very similar, but you may want to use the “divisor” option to get the proper scaling of the linear contrast estimates. The results of the t tests will be the same as before.
Use a CONTRAST statement if you are most interested in determining the proportion of the treatment Sum of Squares that is due to the contrast. Use the ESTIMATE statement if you are interested in the actual value of the linear contrast.

    Estimate 'main effect of temp' temp -1 1;
    Estimate 'conc lin' conc -1 0 1;
    Estimate 'conc quad' conc 1 -2 1/divisor=2;
    Estimate 'temp x conc lin' temp*conc 1 0 -1 -1 0 1;
    Estimate 'temp x conc quad' temp*conc -1 2 -1 1 -2 1/divisor=2;

You can also use the ESTIMATE statement to get an unbiased estimate of a single mean. For example, for the low temperature treatment:

    Estimate 'BLUE mean of low temp' intercept 1 temp 1 0;

Compare your results using Estimate statements to the output from the Contrast and LSMEANS statements.

Part IV. Part III using nonclass variables

One could analyze the data from Part III by considering the quantitative factor (concentration) to be a nonclass variable:

    PROC GLM data=A;
        TITLE 'regression vs contrasts in a factorial experiment';
        CLASS block temp;
        MODEL roots = block temp conc conc*conc temp*conc temp*conc*conc/SS1 solution;
    RUN;

Part V: Regression with orthogonal polynomials compared to regression on a quantitative variable (this section is optional!)

An experiment was conducted to determine the effect of storage temperature on seed viability. Fifteen seed samples were obtained, and three samples, selected at random from the fifteen, were stored at each of five temperatures: 10, 30, 50, 70, 90. At the end of a one-year storage period, the samples were tested for germination percentage with the following results:

    Temp   Germination
    10     62
    10     55
    10     57
    30     26
    30     36
    30     31
    50     16
    50     15
    50     23
    70     10
    70     11
    70     18
    90     13
    90     11
    90      9

1) Copy this data into a SAS data step using the appropriate input statement and a datalines statement.

2) Use Proc GLM to analyze the data as a CRD. Remember to specify that temperature is a class variable. Are there significant differences among the Temperature treatments?
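Steps 1 and 2 might look like the following sketch. The values are the germination data listed above. The data set name three is an assumption, chosen only because a later data step in this handout reads from a data set of that name. The trailing @@ on the INPUT statement lets SAS read several observations from each data line.

```sas
/* Germination data from the handout; the data set name "three" is an
   assumption that matches the SET statement used later with PROC REG. */
DATA three;
    INPUT Temp Germination @@;   /* @@ reads several observations per line */
    DATALINES;
10 62  10 55  10 57
30 26  30 36  30 31
50 16  50 15  50 23
70 10  70 11  70 18
90 13  90 11  90  9
;
RUN;

/* Completely randomized design with temperature as a class variable */
PROC GLM DATA=three;
    TITLE 'Germination analyzed as a CRD';
    CLASS Temp;
    MODEL Germination = Temp;
RUN;
QUIT;
```

With five temperatures and three samples each, the ANOVA should show 4 degrees of freedom for Temp and 10 for error, which is a quick check that the data were read correctly.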
3) Use Contrast statements to determine if the effect of temperature is linear, quadratic, cubic, or quartic (refer to the table of polynomial coefficients handed out in class).

    PROC GLM;
        TITLE 'Use of orthogonal polynomials';
        CLASS Temp;
        MODEL Germination = Temp;
        CONTRAST 'Temp Linear' Temp -2 -1 0 1 2;
        CONTRAST 'Temp Quadratic' Temp 2 -1 -2 -1 2;
        CONTRAST 'Temp Cubic' Temp -1 2 0 -2 1;
        CONTRAST 'Temp Quartic' Temp 1 -4 6 -4 1;
        LSMEANS Temp / stderr out=new2;

    PROC GPLOT data=new2;
        PLOT lsmean*Temp;
    RUN; QUIT;

Based on your output, which polynomials will you retain in your model?

4) Now try running the same regression, but consider Temperature to be a continuous rather than a class variable.

    PROC GLM data=yourfile;
        TITLE 'Linear Regression of germination on temperature';
        MODEL Germination = Temp;

    PROC GLM data=yourfile;
        TITLE 'Quadratic Regression of germination on temperature';
        MODEL Germination = Temp Temp*Temp;

    PROC GLM data=yourfile;
        TITLE 'Cubic Regression of germination on temperature';
        MODEL Germination = Temp Temp*Temp Temp*Temp*Temp;

    PROC GLM data=yourfile;
        TITLE 'Quartic Regression of germination on temperature';
        MODEL Germination = Temp Temp*Temp Temp*Temp*Temp Temp*Temp*Temp*Temp;
    RUN;

Proc GLM provides direct estimates of the regression coefficients for nonclass variables. Note how these change as we go to higher order polynomials. Compare Type I vs Type III SS as we go to higher order polynomials. For regression analysis using nonclass variables, we generally use the Type I (sequential) SS. Type I SS partitions the Model SS into component SS due to adding each variable sequentially to the model, in the order that they appear in the model statement. Type III SS (partial SS) determines the SS explained by each variable after all of the other variables have been included in the model, regardless of the order in which they appear in the model statement.

Compare the SS for the quartic regression to the output from the analysis of contrasts.
What happens to MSE as we go to higher order polynomials? How does it compare to the result from the analysis of contrasts? How do you explain the differences in the error term?

PROC REG is a general purpose regression procedure in SAS that could be used in this example. PROC GLM or PROC MIXED are better choices if you have both class and nonclass variables in your model. PROC REG has nine options for selecting the best model. In our example, we could request a forward selection process to sequentially add terms to the model. Because PROC REG cannot analyze a squared or higher order polynomial term in the model statement, we must first create new variables representing these terms:

    data four;
        set three;
        tempsq=temp*temp;
        tempcu=temp*temp*temp;
        tempqu=temp*temp*temp*temp;
    run;

    PROC REG data=four;
        TITLE 'Forward Selection';
        MODEL Germination = Temp tempsq tempcu tempqu/SELECTION=FORWARD ss1;
    RUN; QUIT;
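Forward selection adds a term only if its significance level falls below an entry threshold, which the SLENTRY= option on the MODEL statement controls. The sketch below is a variation on the program above, not part of the original exercise. It assumes the data set four already exists with the variables tempsq, tempcu, and tempqu, and the value 0.05 is purely illustrative.

```sas
/* Assumes the data set "four" created above, containing Temp, tempsq,
   tempcu, tempqu, and Germination. */
PROC REG DATA=four;
    TITLE 'Forward Selection with a stricter entry level';
    /* SLENTRY=0.05 is an illustrative choice, not a value from the handout */
    MODEL Germination = Temp tempsq tempcu tempqu
          / SELECTION=FORWARD SLENTRY=0.05 SS1;
RUN;
QUIT;
```

Comparing the terms retained here with the contrasts you judged significant in step 3 is one way to see whether the two approaches lead to the same polynomial.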