CROP 590 Experimental Design in Agriculture
Lab exercise – 3rd week

Two-Way Analysis of Variance
   RBD, missing values
   Power calculations

SAS On-line Documentation
   SAS/STAT PROC GLM
   SAS/STAT PROC MIXED
   SAS/STAT PROC GLMPOWER

We have seen how a one-way analysis of variance considers one treatment factor with two or more treatment levels. This type of analysis is used for a Completely Randomized Design (CRD) experiment. Today we will go to the next level and look at the SAS commands for a Two-Way Analysis of Variance. We will analyze data from a Randomized Complete Block Design (RBD). We will also learn how SAS handles missing data and how to estimate power for RBD experiments.

A horticulturist performed a field experiment to determine the effect of fungicide treatments applied to plots where azaleas were to be grown. The treatments consisted of a control (no fungicide), an old product and a new product. The fungicides were applied to plots before inoculation in a Randomized Complete Block Design with 4 replications. Two plants were planted in each plot after inoculation. After several weeks the plants were dug up and their root weights were recorded.

1. Download the data for this exercise from the file Lab3.xlsx.

Part I. RBD using plot means

2. For this analysis, we will work with the means for each plot (a conventional RBD). The data are completely balanced (there are no missing plots). Write a SAS program to input data from the spreadsheet called “Plot Means” (in the Lab3.xlsx file). Use ‘input’ and ‘datalines’ statements and cut and paste the values from the spreadsheet directly into your SAS program. Remember that you need to include a $ sign after each character variable in your input statement, and that you need a semicolon on its own line at the end of your data lines.

3. Now use PROC GLM to run an RBD analysis on this data set. You will now have two independent variables in your model statement.
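The two terms in the model statement correspond to the usual additive linear model for an RBD. As a sketch (the symbols are conventional notation introduced here, not taken from the handout), with i indexing the 4 blocks and j indexing the 3 fungicide treatments:

```latex
Y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij},
\qquad i = 1,\dots,4, \quad j = 1,\dots,3
```

Here Y_ij is the plot mean root weight in block i for fungicide j, mu is the overall mean, beta_i is the block effect, tau_j is the fungicide effect, and epsilon_ij is the random error, assumed independent and normally distributed with constant variance.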
The data are balanced (all treatments occur once in each block), so we can use the MEANS statement to obtain treatment means and request an LSD test.

   Proc GLM;
      Title 'RBD for fungicide treatments';
      class block fungicide;
      model Rootwt=block fungicide;
      means fungicide/lsd;
   Run;

4. Interpret the results of the SAS output. Are there differences among the fungicide treatments? How does the new fungicide compare with the old product and the control? Was it advantageous to use blocks?

Part II. RBD with missing values

5. Now let’s compare the output when one or more of the means are missing. Make a copy of your SAS program, but replace the value of RootWt for the Old fungicide in Block 1 with a period (.). That is the designation for a missing plot in a SAS program or data text file. When you use the import wizard to bring in an Excel spreadsheet, leave the missing cells empty. SAS will automatically convert each empty cell to a ‘.’.

6. PROC GLM has the capability to make adjustments for missing values. However, you will need to replace the ‘Means’ statement with ‘LSMEANS’ (least squares means). It will no longer be possible to run a single LSD test for all treatment comparisons, because the standard errors will be slightly different for each mean and mean comparison. Instead, we will request standard errors (stderr) and the t statistics and probabilities associated with individual mean comparison tests (tdiff). If you only need the probability values, use the pdiff option rather than tdiff.

   Proc GLM;
      Title 'RBD with missing plots';
      class block fungicide;
      model Rootwt=block fungicide;
      lsmeans fungicide/stderr tdiff;

7. A predicted value is the value of the response variable that is expected for a given treatment level based on the model. A residual is the difference between the observed and predicted values.
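In symbols, these two definitions can be sketched as follows. The notation is conventional and introduced here as an assumption (hats denote estimates from the fitted RBD model, i indexes blocks, j indexes fungicides):

```latex
\hat{Y}_{ij} = \hat{\mu} + \hat{\beta}_i + \hat{\tau}_j,
\qquad
e_{ij} = Y_{ij} - \hat{Y}_{ij}

% Balanced case only (no missing plots): the prediction reduces to
\hat{Y}_{ij} = \bar{Y}_{i\cdot} + \bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}
```

The second expression (block mean plus treatment mean minus grand mean) holds only when every treatment occurs once in every block. With a missing plot, the estimates come from the least squares fit and no longer reduce to simple means.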
To obtain estimates of predicted values and residuals (the residuals can be plotted against the predicted values to check ANOVA assumptions such as homogeneity of variance), include the following statements at the end of PROC GLM:

   Output out=new P=Yhat R=Resid;
   Proc Print data=new;
   run;

Predicted values and residuals can now be found in the data set “new” in the WORK library. Although we really have no reason to estimate a value for a missing plot in the modern era, you may be interested to know that, when there is a single missing observation in an RBD, the classical missing plot formula gives the same value as the predicted value from this analysis.

Alternatively, you could create output data sets using the ODS system:

   ods output LSMeans=adjmean Diff=ptable PredictedValues=Yhat;

This will create three new data sets: “adjmean”, “ptable” and “Yhat”. Open these data sets and note the contents of each. For a complete list of possible ODS table names, go to the GLM help documentation, select the “details” tab, and choose “ODS Table Names”.

8. According to this analysis, are there significant differences among the fungicide treatments? Is the new fungicide different from the old? Were the block effects significant? Observe what happens to the Sums of Squares, df, the standard errors of means and the mean comparison tests when you have missing plots. Also note that with unbalanced data there is a difference between Type I and Type III Sums of Squares. In general, you will use the Type III Sums of Squares when you have missing data.

9. For unbalanced data, you should consider using a Mixed Model analysis if your model consists of some random effects (e.g. blocks) and some fixed effects (e.g. fungicides in this experiment). The syntax is similar, except that the random effects are not included in the model statement.
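In model terms, the block effect moves out of the fixed part of the model and becomes a random term with its own variance component. A sketch in conventional notation (the symbols, including r for the number of blocks, are assumptions introduced here):

```latex
Y_{ij} = \mu + \tau_j + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Balanced data with r blocks:
\mathrm{SE}(\bar{Y}_{\cdot j}) = \sqrt{\frac{\sigma^2 + \sigma_b^2}{r}},
\qquad
\mathrm{SE}(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot j'}) = \sqrt{\frac{2\sigma^2}{r}}
```

Under these assumptions the block variance enters the standard error of a treatment mean but cancels out of the standard error of a difference between two treatment means, because both means are averaged over the same blocks.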
   Proc Mixed data=two;
      Title 'RBD with mixed models';
      class block fungicide;
      model Rootwt=fungicide;
      Random block;
      lsmeans fungicide/tdiff;
   run;

For balanced data, the results of the F tests of fixed effects for a Mixed Model analysis and ANOVA will be the same. Pairwise comparisons of treatment means will also be the same for the Mixed Model analysis and the ANOVA when data are balanced. Standard errors of means will be somewhat different for a Mixed Model analysis and an ANOVA, for both balanced and unbalanced data sets. Standard error estimates from Mixed Models will often be a little larger than from an ANOVA, because Mixed Models account for the additional variation due to sampling random factors (such as blocks) from a reference population.

Part III. Power Calculations in an RBD

10. Use the estimate of the error standard deviation obtained from the ANOVA to conduct a power analysis. The stddev option expects the Root MSE (the square root of the MSE), not the MSE itself. Setting power to a missing value (.) asks SAS to solve for power.

   proc glmpower data=one;
      Title 'Power for fungicide experiment';
      class Block Fungicide;
      model RootWt = Block Fungicide;
      power
         stddev = 1.518474
         ntotal = 12
         power = .;
   run;

Part IV. RBD using Excel

11. If time permits, try running a two-way ANOVA on the original data set (without missing plots) using Excel. First you must create a two-way table, with blocks on one axis and treatments on the other. To run the analysis, go to “Data” and select “Data Analysis” from the toolbar. (If your computer does not show the “Data Analysis” option, you may have to add in the Analysis ToolPak; refer to your notes from Lab 2.) Select “ANOVA: Two-factor without replication”. Excel considers blocks to be a second factor, and therefore does not recognize that the experiment is replicated. After you run the analysis, compare the results to the SAS output from Part I.
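One quick check when comparing the Excel table with the SAS output from Part I is the partition of the degrees of freedom. As a sketch, with r = 4 blocks and t = 3 fungicide treatments (the symbols r and t are notation introduced here):

```latex
\begin{aligned}
df_{\text{blocks}}     &= r - 1 = 3 \\
df_{\text{treatments}} &= t - 1 = 2 \\
df_{\text{error}}      &= (r - 1)(t - 1) = 6 \\
df_{\text{total}}      &= rt - 1 = 11
\end{aligned}
```

Excel reports the two factors as “Rows” and “Columns”, so which one corresponds to blocks depends on how you laid out the two-way table. For the Part II analysis with one missing plot, the total and error degrees of freedom each drop by one (to 10 and 5).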