Experimental Design in Agriculture CSS 590 Second Midterm Winter 2012

Name: KEY
10 pts
1) You wish to compare eight varieties of peas in a field that was determined to have a soil
variability index of b = 0.6. The last time you conducted a trial in this field you obtained a
CV of 16% using a plot size of 12 m² and four replications in a randomized block design.
Determine the size of plot you would need to have an 80% probability of detecting
differences of 30% of the mean, using a significance level of 5%.
Assuming that your planter and harvest equipment are designed for a plot that is 2 m
wide, how long will your plot need to be to attain the optimum plot size?
Use the tables at the end of this exam and show your work.
r= 4
dfe = (r-1)(t-1) = (4-1)(8-1) = 21
t(0.05, 21 df) = 2.08
t(0.40, 21 df) = 0.859
CV = 16
d = 30
b = 0.6
$X^b = \dfrac{2\,(2.08 + 0.859)^2\,(16)^2}{4\,(30)^2} = 1.228$
$X = (X^b)^{1/0.6} = (1.228)^{1/0.6} = 1.409$
Plot size = 1.409 × 12 m² = 16.91 m²
Length = 16.91/2 = 8.45 m
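The arithmetic above can be checked with a short Python sketch. The function name and argument names are illustrative assumptions, not part of the exam; the numbers are the ones used in the worked answer.

```python
def required_plot_size(t1, t2, cv, r, d, b, base_area):
    """Return (X^b, X, plot area) for the plot-size calculation.

    t1, t2    : tabled t values for the significance level and for power
    cv        : coefficient of variation (%) from the earlier trial
    r         : number of replications
    d         : difference to detect, as a % of the mean
    b         : soil variability index
    base_area : plot area (m^2) used in the earlier trial
    """
    x_b = 2 * (t1 + t2) ** 2 * cv ** 2 / (r * d ** 2)
    x = x_b ** (1 / b)          # size relative to the earlier plot
    return x_b, x, x * base_area


x_b, x, area = required_plot_size(
    t1=2.08, t2=0.859, cv=16, r=4, d=30, b=0.6, base_area=12
)
length = area / 2  # planter and harvester fix the plot width at 2 m

print(f"X^b = {x_b:.3f}, X = {x:.3f}, area = {area:.2f} m^2, length = {length:.2f} m")
```

Working from unrounded intermediate values gives the same answers as the hand calculation to the precision shown.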
8 pts
2) A fellow graduate student is planning an experiment. He would like to have a 90%
probability of detecting differences that are within 20 units of the mean, using a Type I
error rate of 0.05. He is not sure how to determine if his experimental design has the
desired level of power. Using the estimate of variance and other parameters that he
provides, you calculate a detectable difference of 25 units for his experiment. What
advice would you give to him?
Because the d value is larger than 20 units, the experiment does not have the desired
level of power or precision (confidence intervals around the treatment means are too
wide). He will need to increase the number of replications or possibly the plot size, or
find other ways to reduce experimental error.
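As a rough illustration of the advice, the sketch below estimates how many replications would bring the detectable difference down from 25 to 20 units. It assumes the variance and the tabled t values stay fixed, so that the detectable difference shrinks in proportion to 1/sqrt(r); the starting value of four replications is hypothetical, since the question does not state it.

```python
import math


def reps_needed(current_reps, detectable_d, target_d):
    """Approximate replications needed to reach the target detectable difference.

    Assumes d is proportional to 1/sqrt(r), i.e. the variance and the t values
    are held constant (the t values actually shrink slightly as error df grow).
    """
    scale = (detectable_d / target_d) ** 2
    return math.ceil(current_reps * scale)


current_reps = 4  # hypothetical; not given in the question
print(reps_needed(current_reps, detectable_d=25, target_d=20))
```

Because the error degrees of freedom change with r, the result should be treated as a starting point and the detectable difference recalculated for the new design.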
3) Five varieties of Indian mustard were compared for glucosinolate content (GLN) in a field
study. The trial was conducted as a Randomized Complete Block Design with four
blocks. Leaf tissue was sampled from three plants in each plot. Glucosinolate content
was measured on each individual plant.
The data were analyzed with SAS PROC GLM. Some of the output is shown below.
The GLM Procedure
Dependent Variable: GLN

Source             DF    Sum of Squares    Mean Square
Model              19         11949.023        628.896
Error              40           538.773         13.469
Corrected Total    59         12487.796

Source             DF       Type III SS    Mean Square
Block               3          1346.003        448.668
Variety             4         10000.396       2500.099
Block*Variety      12           602.624         50.219
5 pts
a) What value from this output provides an estimate of the variance among plants within
each plot?
13.469
8 pts
b) Calculate the appropriate F value for testing the null hypothesis that the means of all
varieties are equal. Use the F table from the back of this exam to determine whether
to accept or reject the null hypothesis, using α = 0.05.
F = MS(variety)/MS(block*Variety)=2500.099/50.219=49.78
Critical F with 4, 12 df = 3.26
49.78>3.26 so we reject the null hypothesis and conclude that there are differences
among the varieties
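The choice of denominator matters here: plots are the experimental units and plants are subsamples, so Block*Variety is the experimental error. A minimal Python sketch of the comparison follows, using the mean squares from the output and the critical value from the F table; the variable names are illustrative.

```python
ms_variety = 2500.099
ms_block_variety = 50.219   # experimental error (plot-to-plot variation)
ms_sampling = 13.469        # sampling error (plants within plots)
f_critical = 3.26           # 5% point for 4 and 12 df, from the F table

f_value = ms_variety / ms_block_variety
reject_null = f_value > f_critical

# For contrast only: dividing by the sampling error would overstate the F value.
f_wrong = ms_variety / ms_sampling

print(f"F = {f_value:.2f}, reject H0: {reject_null}")
print(f"F with sampling error as denominator (not appropriate) = {f_wrong:.1f}")
```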
8 pts
c) Calculate the standard error for a variety mean for this experiment.
$s_{\bar{x}} = \sqrt{\dfrac{MSE}{r \cdot n}} = \sqrt{\dfrac{50.219}{4 \times 3}} = 2.046$
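The same calculation as a small Python sketch. The function name is an assumption; MSE is the Block*Variety mean square, r the number of blocks and n the number of plants sampled per plot.

```python
import math


def se_of_mean(ms_error, r, n):
    """Standard error of a treatment mean with r blocks and n subsamples per plot."""
    return math.sqrt(ms_error / (r * n))


se = se_of_mean(ms_error=50.219, r=4, n=3)
print(f"standard error of a variety mean = {se:.3f}")
```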
12 pts
4) You have conducted an experiment with seven treatments using a Latin Square Design.
a) Complete the ANOVA by filling in the shaded cells below.
Source       df      SS       MS       F
Total        48    107.05
Treatment     6     39.56    6.593    5.449
Row           6      6.93    1.155    0.954
Column        6     24.26    4.043    3.342
Error        30     36.30    1.210
8 pts
b) Calculate the relative efficiency of this design compared to a CRD.
$RE = \dfrac{MSR + MSC + (t-1)\,MSE}{(t+1)\,MSE} = \dfrac{1.155 + 4.043 + (7-1)(1.210)}{(7+1)(1.210)} = 1.287$
29% more efficient than a CRD
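The table entries and the relative efficiency can be reproduced with the Python sketch below. It assumes the sums of squares for Treatment, Row, Column and Total are the given values; the key does not show which cells were shaded on the exam, so that choice is a guess made only for illustration.

```python
t = 7  # number of treatments (also the number of rows and columns)

ss = {"Treatment": 39.56, "Row": 6.93, "Column": 24.26}
ss_total = 107.05

# Degrees of freedom for a t x t Latin square
df_total = t * t - 1
df_factor = t - 1
df_error = (t - 1) * (t - 2)

# Error SS by subtraction, then mean squares and F ratios
ss_error = ss_total - sum(ss.values())
ms = {name: value / df_factor for name, value in ss.items()}
mse = ss_error / df_error
f_values = {name: value / mse for name, value in ms.items()}

# Relative efficiency of the Latin square compared with a CRD
re_crd = (ms["Row"] + ms["Column"] + (t - 1) * mse) / ((t + 1) * mse)

for name in ss:
    print(f"{name:<10} MS = {ms[name]:.3f}  F = {f_values[name]:.3f}")
print(f"Error      MS = {mse:.3f}")
print(f"RE vs CRD = {re_crd:.3f}")
```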
8 pts
c) If you were to conduct a similar experiment in the future, would you use the same
experimental design? If not, what changes would you make? What evidence is there
to support your answer?
I would conduct this experiment as an RBD with Columns as blocks. I might also
consider using fewer reps to save resources, since there are highly significant
differences among treatments (there is a good level of power).
The F critical value for 6 and 30 df is 2.42. Thus an F test for Rows would be
nonsignificant, whereas the F test for columns is significant. The results from Part b
further support the need to use columns as a blocking factor. There is no evidence that
the use of rows as a blocking factor helped to control experimental error.
Although not required, you could also support your answer by calculating RE in
comparison to an RBD.
Compared to an RBD with rows as blocks (33% gain in efficiency):
$RE = \dfrac{MSC + (t-1)\,MSE}{t \cdot MSE} = \dfrac{4.043 + (7-1)(1.210)}{7\,(1.210)} = 1.33$
Compared to an RBD with columns as blocks (1% loss in efficiency):
$RE = \dfrac{MSR + (t-1)\,MSE}{t \cdot MSE} = \dfrac{1.155 + (7-1)(1.210)}{7\,(1.210)} = 0.99$
This last comparison shows that there is nothing to be gained by adding rows to an RBD
design with columns as blocks. Hence the RBD with columns as blocks is the best
choice.
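Both RBD comparisons follow the same pattern, so a single helper covers them. This is a sketch with an assumed function name; `ms_dropped` is the mean square of the blocking direction that the RBD would not use.

```python
def re_vs_rbd(ms_dropped, mse, t):
    """Relative efficiency of a Latin square compared with an RBD.

    ms_dropped : mean square of the blocking factor omitted from the RBD
                 (columns when rows are the blocks, and vice versa)
    mse        : error mean square from the Latin square
    t          : number of treatments
    """
    return (ms_dropped + (t - 1) * mse) / (t * mse)


msr, msc, mse, t = 1.155, 4.043, 1.210, 7

re_rows_as_blocks = re_vs_rbd(msc, mse, t)   # RBD keeps rows, drops columns
re_cols_as_blocks = re_vs_rbd(msr, mse, t)   # RBD keeps columns, drops rows

print(f"vs RBD with rows as blocks:    {re_rows_as_blocks:.2f}")
print(f"vs RBD with columns as blocks: {re_cols_as_blocks:.2f}")
```

A value near 1 for the second comparison is what indicates that the row blocking added nothing beyond what columns already provide.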
6 pts
5) The arcsin transformation is often recommended for data that follow a binomial
distribution. Which of the data sets below is most likely to benefit from an arcsin
transformation? (circle one)
i. Grain protein content with different fertilizer treatments. Values range from 9.5 to 13.1
percent.
ii. Insects caught in traps at varying elevations. Results vary from 10 to 50 insects per
week. The means are proportional to the variance.
iii. Disease incidence (% infected plants) of crop varieties that have been uniformly
inoculated with the pathogen. Some varieties are highly resistant and others are highly
susceptible to the disease.
iv. Weed counts with varying herbicide treatments. Values range from 2 to 250. The
means are proportional to the standard deviation.
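For reference, the transformation itself is usually applied as the arcsine of the square root of a proportion. The sketch below shows it in Python for values recorded as percentages; the function name and the example percentages are hypothetical and are not data from the exam.

```python
import math


def arcsin_sqrt(percent):
    """Arcsine square-root transformation of a percentage, returned in degrees."""
    proportion = percent / 100.0
    return math.degrees(math.asin(math.sqrt(proportion)))


# Hypothetical percentages spanning the full 0-100 range
for pct in (0, 5, 50, 95, 100):
    print(f"{pct:>3}%  ->  {arcsin_sqrt(pct):.2f}")
```

The transformation stretches values near 0% and 100% and changes values in the middle of the range very little, which is why it matters most when the data span a wide range of percentages.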
9 pts
6) One of the assumptions required for a valid ANOVA is that the residuals (errors) are
normally distributed. How can you determine if a data set that you have collected from
an experiment meets this assumption?
Boxplots for each treatment group and for the whole set of residuals can help to
determine if there is any skewness in the data or if there are extreme outliers. Q-Q plots
or normal probability plots can also be used. These require a fairly large number of
observations, so it is best to use the combined set of residuals for all
treatment groups. If the residuals fall close to the reference line, the normality assumption is reasonable.
Formal tests for normality such as the Shapiro-Wilk test can also be conducted.
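A normal Q-Q plot pairs the sorted residuals with the quantiles expected under a normal distribution. The Python sketch below computes those pairs using only the standard library; the residuals are made up for illustration, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical normal quantile, observed residual) pairs for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    # Plotting positions (i - 0.5) / n for i = 1..n
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, for illustration only
residuals = [-1.2, 0.3, 1.1, -0.4, 0.0]
points = qq_points(residuals)
for theo, obs in points:
    print(f"{theo:7.3f}  {obs:6.2f}")
```

Plotting the second column against the first and looking for an approximately straight line is the visual check described above.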
7) An experiment has been conducted to evaluate the effect of three methods for pruning
blueberries (no pruning, standard method, new method) and two fertilizer levels (low and
high) on fruit yield. The experiment includes all possible combinations of these two
treatment factors. The graph below shows the results for the experiment (error bars are
the standard errors of the means):
[Figure: Effect of Pruning Method and Fertility on Fruit Yield. Fruit Yield (y-axis, 0 to 35) by Pruning Method (x-axis: None, Standard, New), with separate series for LOW and HIGH fertility.]
8 pts
a) Based on the results shown in the graph, do you anticipate that we will be able to
interpret the F test from the ANOVA for the main effect of pruning method in this
experiment? Explain your answer.
No. There is a strong interaction between the pruning method and level of fertility
applied. It is not meaningful to discuss the average effect of each pruning method,
because the effect of the pruning method depends on the level of fertility. In this case
the averages for pruning methods would be very similar and would probably not be
significant, when in fact there is a strong effect of pruning method on fruit yield at
each fertility level.
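The point about averaging can be illustrated numerically. The cell means below are entirely hypothetical (the values in the graph are not available in this key); they are chosen only to show how a crossover interaction can hide a real pruning effect in the main-effect means.

```python
# Hypothetical cell means (pruning method, fertility level) -> fruit yield
cell_means = {
    ("None", "Low"): 10, ("Standard", "Low"): 20, ("New", "Low"): 30,
    ("None", "High"): 30, ("Standard", "High"): 20, ("New", "High"): 10,
}
pruning_methods = ["None", "Standard", "New"]
fertility_levels = ["Low", "High"]

# Main-effect means: average each pruning method over both fertility levels
main_effect = {
    p: sum(cell_means[(p, f)] for f in fertility_levels) / len(fertility_levels)
    for p in pruning_methods
}

# Simple effects: New minus None within each fertility level
simple_effect = {
    f: cell_means[("New", f)] - cell_means[("None", f)] for f in fertility_levels
}

print("main-effect means:", main_effect)
print("New - None at each fertility level:", simple_effect)
```

With these numbers every pruning method averages the same yield, yet the difference between methods is large, and opposite in sign, within each fertility level.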
10 pts
b) Assuming that there are only two replications for this experiment, draw a possible
layout (randomization) for this experiment in the field. You may use a CRD or an
RBD (indicate which design you have chosen.) Include all of the experimental units
in your diagram and label the treatments that are assigned to each plot.
(show either one)

RBD
Block 1: Standard-Low, New-High, None-High, New-Low, Standard-High, None-Low
Block 2: None-Low, New-Low, Standard-High, New-High, Standard-Low, None-High

CRD
Standard-Low, New-Low, Standard-High, Standard-Low, None-Low, None-High,
Standard-High, None-High, None-Low, New-High, New-Low, New-High
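A randomization of this kind can be generated rather than drawn by hand. The Python sketch below is one possible approach: for the RBD each block receives its own shuffle of the six treatment combinations, and for the CRD all twelve plots are shuffled together. The function names and the seed are arbitrary assumptions.

```python
import random
from itertools import product

pruning = ["None", "Standard", "New"]
fertility = ["Low", "High"]
treatments = [f"{p}-{f}" for p, f in product(pruning, fertility)]


def rbd_layout(treatments, n_blocks, rng):
    """Randomize all treatments independently within each block."""
    layout = []
    for _ in range(n_blocks):
        block = treatments[:]      # every treatment appears once per block
        rng.shuffle(block)
        layout.append(block)
    return layout


def crd_layout(treatments, n_reps, rng):
    """Randomize all plots together, with no blocking restriction."""
    plots = treatments * n_reps
    rng.shuffle(plots)
    return plots


rng = random.Random(2012)  # arbitrary seed so the layout can be reproduced
rbd = rbd_layout(treatments, n_blocks=2, rng=rng)
crd = crd_layout(treatments, n_reps=2, rng=rng)

for i, block in enumerate(rbd, start=1):
    print(f"Block {i}: {', '.join(block)}")
print("CRD:", ", ".join(crd))
```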
F Distribution 5% Points

Denominator                      Numerator df
df            1       2       3       4       5       6       7
 1       161.45  199.50  215.71  224.58  230.16  233.99  236.77
 2        18.51   19.00   19.16   19.25   19.30   19.33   19.36
 3        10.13    9.55    9.28    9.12    9.01    8.94    8.89
 4         7.71    6.94    6.59    6.39    6.26    6.16    6.08
 5         6.61    5.79    5.41    5.19    5.05    4.95    4.88
 6         5.99    5.14    4.76    4.53    4.39    4.28    4.21
 7         5.59    4.74    4.35    4.12    3.97    3.87    3.79
 8         5.32    4.46    4.07    3.84    3.69    3.58    3.50
 9         5.12    4.26    3.86    3.63    3.48    3.37    3.29
10         4.96    4.10    3.71    3.48    3.32    3.22    3.13
11         4.84    3.98    3.59    3.36    3.20    3.09    3.01
12         4.75    3.88    3.49    3.26    3.10    3.00    2.91
13         4.67    3.80    3.41    3.18    3.02    2.92    2.83
14         4.60    3.74    3.34    3.11    2.96    2.85    2.76
15         4.54    3.68    3.29    3.06    2.90    2.79    2.71
16         4.49    3.63    3.24    3.01    2.85    2.74    2.66
17         4.45    3.59    3.20    2.96    2.81    2.70    2.61
18         4.41    3.55    3.16    2.93    2.77    2.66    2.58
19         4.38    3.52    3.13    2.90    2.74    2.63    2.54
20         4.35    3.49    3.10    2.87    2.71    2.60    2.51
21         4.32    3.47    3.07    2.84    2.68    2.57    2.49
22         4.30    3.44    3.05    2.82    2.66    2.55    2.46
23         4.28    3.42    3.03    2.80    2.64    2.53    2.44
24         4.26    3.40    3.00    2.78    2.62    2.51    2.42
25         4.24    3.38    2.99    2.76    2.60    2.49    2.40
26         4.23    3.37    2.98    2.74    2.59    2.47    2.39
27         4.21    3.35    2.96    2.73    2.57    2.46    2.37
28         4.20    3.34    2.95    2.71    2.56    2.45    2.36
29         4.18    3.33    2.93    2.70    2.55    2.43    2.35
30         4.17    3.32    2.92    2.69    2.53    2.42    2.33

Student's t Distribution
(2-tailed probability)

df       0.4     0.05     0.01
 1     1.376   12.706   63.667
 2     1.061    4.303    9.925
 3     0.978    3.182    5.841
 4     0.941    2.776    4.604
 5     0.920    2.571    4.032
 6     0.906    2.447    3.707
 7     0.896    2.365    3.499
 8     0.889    2.306    3.355
 9     0.883    2.262    3.250
10     0.879    2.228    3.169
11     0.876    2.201    3.106
12     0.873    2.179    3.055
13     0.870    2.160    3.012
14     0.868    2.145    2.977
15     0.866    2.131    2.947
16     0.865    2.120    2.921
17     0.863    2.110    2.898
18     0.862    2.101    2.878
19     0.861    2.093    2.861
20     0.860    2.086    2.845
21     0.859    2.080    2.831
22     0.858    2.074    2.819
23     0.858    2.069    2.807
24     0.857    2.064    2.797
25     0.856    2.060    2.787
26     0.856    2.056    2.779
27     0.855    2.052    2.771
28     0.855    2.048    2.763
29     0.854    2.045    2.756
30     0.854    2.042    2.750