Experimental Design in Agriculture
Name: KEY
CSS 590 Second Midterm, Winter 2012

10 pts
1) You wish to compare eight varieties of peas in a field that was determined to have a soil variability index of b = 0.6. The last time you conducted a trial in this field you obtained a CV of 16% using a plot size of 12 m² and four replications in a randomized block design. Determine the plot size you would need to have an 80% probability of detecting differences of 30% of the mean, using a significance level of 5%. Assuming that your planter and harvest equipment are designed for a plot that is 2 m wide, how long will your plot need to be to attain the optimum plot size? Use the tables at the end of this exam and show your work.

r = 4
dfe = (r-1)(t-1) = (4-1)(8-1) = 21
t(0.05, 21 df) = 2.08
t(0.40, 21 df) = 0.859, where 0.40 = 2(1 - 0.80) for 80% power
CV = 16
d = 30
b = 0.6

$$X^b = \frac{2\,(2.08 + 0.859)^2 \times 16^2}{4 \times 30^2} = 1.228$$

$$X = (1.228)^{1/0.6} = 1.409$$

Plot size = 1.409 x 12 m² = 16.91 m²
Length = 16.91 m² / 2 m = 8.45 m

8 pts
2) A fellow graduate student is planning an experiment. He would like to have a 90% probability of detecting differences that are within 20 units of the mean, using a Type I error rate of 0.05. He is not sure how to determine if his experimental design has the desired level of power. Using the estimate of variance and other parameters that he provides, you calculate a detectable difference of 25 units for his experiment. What advice would you give to him?

Because the calculated d value (25 units) is larger than 20 units, the experiment does not have the desired level of power or precision (confidence intervals around the treatment means are too wide). He will need to increase the number of replications or possibly the plot size, or find other ways to reduce experimental error.

3) Five varieties of Indian mustard were compared for glucosinolate content (GLN) in a field study. The trial was conducted as a Randomized Complete Block Design with four blocks. Leaf tissue was sampled from three plants in each plot.
Glucosinolate content was measured on each individual plant. The data were analyzed with SAS PROC GLM. Some of the output is shown below.

The GLM Procedure
Dependent Variable: GLN

Source             DF    Sum of Squares    Mean Square
Model              19         11949.023        628.896
Error              40           538.773         13.469
Corrected Total    59         12487.796

Source             DF       Type III SS    Mean Square
Block               3          1346.003        448.668
Variety             4         10000.396       2500.099
Block*Variety      12           602.624         50.219

5 pts
a) What value from this output provides an estimate of the variance among plants within each plot?

13.469

8 pts
b) Calculate the appropriate F value for testing the null hypothesis that the means of all varieties are equal. Use the F table from the back of this exam to determine whether to accept or reject the null hypothesis, using α = 0.05.

F = MS(Variety)/MS(Block*Variety) = 2500.099/50.219 = 49.78
Critical F with 4, 12 df = 3.26
49.78 > 3.26, so we reject the null hypothesis and conclude that there are differences among the varieties.

8 pts
c) Calculate the standard error for a variety mean for this experiment.

The error term is the Block*Variety mean square used in part b, with r = 4 blocks and n = 3 plants per plot:

$$s_{\bar{x}} = \sqrt{\frac{MSE}{r \times n}} = \sqrt{\frac{50.219}{4 \times 3}} = 2.046$$

4) You have conducted an experiment with seven treatments using a Latin Square Design.

12 pts
a) Complete the ANOVA by filling in the shaded cells below.

Source       df        SS       MS        F
Total        48    107.05
Treatment     6     39.56    6.593    5.449
Row           6      6.93    1.155    0.954
Column        6     24.26    4.043    3.342
Error        30     36.30    1.210

8 pts
b) Calculate the relative efficiency of this design compared to a CRD.

$$RE = \frac{MSR + MSC + (t-1)\,MSE}{(t+1) \times MSE} = \frac{1.155 + 4.043 + (7-1) \times 1.210}{(7+1) \times 1.210} = 1.287$$

The Latin Square is 29% more efficient than a CRD.

8 pts
c) If you were to conduct a similar experiment in the future, would you use the same experimental design? If not, what changes would you make? What evidence is there to support your answer?

I would conduct this experiment as an RBD with Columns as blocks. I might also consider using fewer reps to save resources, since there are highly significant differences among treatments (there is a good level of power).
The F critical value for 6 and 30 df is 2.42. Thus an F test for Rows would be nonsignificant, whereas the F test for Columns is significant. The results from Part b further support the need to use columns as a blocking factor. There is no evidence that the use of rows as a blocking factor helped to control experimental error.

Although not required, you could also support your answer by calculating RE in comparison to an RBD.

Compared to an RBD with rows as blocks (33% gain in efficiency):

$$RE = \frac{MSC + (t-1)\,MSE}{t \times MSE} = \frac{4.043 + (7-1) \times 1.210}{7 \times 1.210} = 1.33$$

Compared to an RBD with columns as blocks (1% loss in efficiency):

$$RE = \frac{MSR + (t-1)\,MSE}{t \times MSE} = \frac{1.155 + (7-1) \times 1.210}{7 \times 1.210} = 0.99$$

This last comparison shows that there is nothing to be gained by adding rows to an RBD design with columns as blocks. Hence the RBD with columns as blocks is the best choice.

6 pts
5) The arcsin transformation is often recommended for data that follow a binomial distribution. Which of the data sets below is most likely to benefit from an arcsin transformation? (circle one)

i. Grain protein content with different fertilizer treatments. Values range from 9.5 to 13.1 percent.
ii. Insects caught in traps at varying elevations. Results vary from 10 to 50 insects per week. The means are proportional to the variance.
iii. Disease incidence (% infected plants) of crop varieties that have been uniformly inoculated with the pathogen. Some varieties are highly resistant and others are highly susceptible to the disease.
iv. Weed counts with varying herbicide treatments. Values range from 2 to 250. The means are proportional to the standard deviation.

9 pts
6) One of the assumptions required for a valid ANOVA is that the residuals (errors) are normally distributed. How can you determine if a data set that you have collected from an experiment meets this assumption?
Boxplots for each treatment group and for the whole set of residuals can help to determine if there is any skewness in the data or if there are extreme outliers. Q-Q plots or normal probability plots can also be used. These need a fairly large number of observations, so it is best to use the combined set of residuals for all treatment groups. If the residuals fall close to the line, the normality assumption is reasonable. Formal tests for normality such as the Shapiro-Wilk test can also be conducted.

7) An experiment has been conducted to evaluate the effect of three methods for pruning blueberries (no pruning, standard method, new method) and two fertilizer levels (low and high) on fruit yield. The experiment includes all possible combinations of these two treatment factors. The graph below shows the results for the experiment (error bars are the standard errors of the means):

[Figure: Effect of Pruning Method and Fertility on Fruit Yield. Fruit Yield (y-axis, 0 to 35) plotted against Pruning Method (x-axis: None, Standard, New), with separate series for LOW and HIGH fertility.]

8 pts
a) Based on the results shown in the graph, do you anticipate that we will be able to interpret the F test from the ANOVA for the main effect of pruning method in this experiment? Explain your answer.

No. There is a strong interaction between the pruning method and the level of fertility applied. It is not meaningful to discuss the average effect of each pruning method, because the effect of the pruning method depends on the level of fertility. In this case the averages for pruning methods would be very similar and the main effect would probably not be significant, when in fact there is a strong effect of pruning method on fruit yield at each fertility level.

10 pts
b) Assuming that there are only two replications for this experiment, draw a possible layout (randomization) for this experiment in the field. You may use a CRD or an RBD (indicate which design you have chosen).
Include all of the experimental units in your diagram and label the treatments that are assigned to each plot.

Either design is acceptable (show one).

RBD
Block 1:  Standard-Low | New-High | None-High | New-Low | Standard-High | None-Low
Block 2:  None-Low | New-Low | Standard-High | New-High | Standard-Low | None-High

CRD
Standard-Low | New-Low | Standard-High | Standard-Low | None-Low | None-High
Standard-High | None-High | None-Low | New-High | New-Low | New-High

F Distribution, 5% Points

Denominator                       Numerator df
df               1        2        3        4        5        6        7
 1          161.45    199.5   215.71   224.58   230.16   233.99   236.77
 2           18.51    19.00    19.16    19.25    19.3     19.33    19.36
 3           10.13     9.55     9.28     9.12     9.01     8.94     8.89
 4            7.71     6.94     6.59     6.39     6.26     6.16     6.08
 5            6.61     5.79     5.41     5.19     5.05     4.95     4.88
 6            5.99     5.14     4.76     4.53     4.39     4.28     4.21
 7            5.59     4.74     4.35     4.12     3.97     3.87     3.79
 8            5.32     4.46     4.07     3.84     3.69     3.58     3.50
 9            5.12     4.26     3.86     3.63     3.48     3.37     3.29
10            4.96     4.10     3.71     3.48     3.32     3.22     3.13
11            4.84     3.98     3.59     3.36     3.20     3.09     3.01
12            4.75     3.88     3.49     3.26     3.10     3.00     2.91
13            4.67     3.80     3.41     3.18     3.02     2.92     2.83
14            4.60     3.74     3.34     3.11     2.96     2.85     2.76
15            4.54     3.68     3.29     3.06     2.90     2.79     2.71
16            4.49     3.63     3.24     3.01     2.85     2.74     2.66
17            4.45     3.59     3.20     2.96     2.81     2.70     2.61
18            4.41     3.55     3.16     2.93     2.77     2.66     2.58
19            4.38     3.52     3.13     2.90     2.74     2.63     2.54
20            4.35     3.49     3.10     2.87     2.71     2.60     2.51
21            4.32     3.47     3.07     2.84     2.68     2.57     2.49
22            4.30     3.44     3.05     2.82     2.66     2.55     2.46
23            4.28     3.42     3.03     2.80     2.64     2.53     2.44
24            4.26     3.40     3.00     2.78     2.62     2.51     2.42
25            4.24     3.38     2.99     2.76     2.60     2.49     2.40
26            4.23     3.37     2.98     2.74     2.59     2.47     2.39
27            4.21     3.35     2.96     2.73     2.57     2.46     2.37
28            4.20     3.34     2.95     2.71     2.56     2.45     2.36
29            4.18     3.33     2.93     2.70     2.55     2.43     2.35
30            4.17     3.32     2.92     2.69     2.53     2.42     2.33

Student's t Distribution (2-tailed probability)

df       0.4      0.05      0.01
 1     1.376    12.706    63.657
 2     1.061     4.303     9.925
 3     0.978     3.182     5.841
 4     0.941     2.776     4.604
 5     0.920     2.571     4.032
 6     0.906     2.447     3.707
 7     0.896     2.365     3.499
 8     0.889     2.306     3.355
 9     0.883     2.262     3.250
10     0.879     2.228     3.169
11     0.876     2.201     3.106
12     0.873     2.179     3.055
13     0.870     2.160     3.012
14     0.868     2.145     2.977
15     0.866     2.131     2.947
16     0.865     2.120     2.921
17     0.863     2.110     2.898
18     0.862     2.101     2.878
19     0.861     2.093     2.861
20     0.860     2.086     2.845
21     0.859     2.080     2.831
22     0.858     2.074     2.819
23     0.858     2.069     2.807
24     0.857     2.064     2.797
25     0.856     2.060     2.787
26     0.856     2.056     2.779
27     0.855     2.052     2.771
28     0.855     2.048     2.763
29     0.854     2.045     2.756
30     0.854     2.042     2.750
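
The arithmetic in the answers to questions 1, 3 and 4 can be checked with a short script. What follows is a minimal Python sketch and is not part of the original key: the function names are hypothetical, the t values are copied from the t table in this exam rather than computed, and the formulas are the ones used in the worked answers.

```python
import math


def optimum_plot_size(t_alpha, t_beta, cv, d, r, b, base_area):
    """Question 1: X^b = 2 (t_alpha + t_beta)^2 CV^2 / (r d^2), then area = X * base_area."""
    xb = 2 * (t_alpha + t_beta) ** 2 * cv ** 2 / (r * d ** 2)
    x = xb ** (1 / b)  # plot size as a multiple of the previous plot size
    return xb, x, x * base_area


def se_of_mean(ms_error, r, n):
    """Question 3c: standard error of a variety mean (r blocks, n plants per plot)."""
    return math.sqrt(ms_error / (r * n))


def latin_square_re(msr, msc, mse, t):
    """Question 4: relative efficiency of a t x t Latin Square versus simpler designs."""
    return {
        "vs_crd": (msr + msc + (t - 1) * mse) / ((t + 1) * mse),
        "vs_rbd_rows_as_blocks": (msc + (t - 1) * mse) / (t * mse),
        "vs_rbd_columns_as_blocks": (msr + (t - 1) * mse) / (t * mse),
    }


# Question 1: t values for 21 error df are taken from the t table (assumed inputs).
xb, x, area = optimum_plot_size(2.080, 0.859, cv=16, d=30, r=4, b=0.6, base_area=12)
length = area / 2  # plots are 2 m wide
print(f"X^b = {xb:.3f}, X = {x:.3f}, area = {area:.2f} m2, length = {length:.2f} m")

# Question 3: Variety is tested against Block*Variety, not the within-plot error.
f_variety = 2500.099 / 50.219
se = se_of_mean(50.219, r=4, n=3)
print(f"F = {f_variety:.2f}, standard error of a variety mean = {se:.3f}")

# Question 4: mean squares from the completed Latin Square ANOVA.
re = latin_square_re(msr=1.155, msc=4.043, mse=1.210, t=7)
for comparison, value in re.items():
    print(f"RE {comparison} = {value:.3f}")
```

Passing the table values in as arguments keeps the sketch dependent only on the standard library; a statistics package could supply the critical values instead if one is available.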