CSS 590 Field Plot Technique

CROP 590 Experimental Design in Agriculture
Lab exercise – 6th week
Factorial Experiments
Orthogonal Contrasts
Regression in the ANOVA
SAS On-line Documentation
PROC GLM
REG Procedure
Part I. Two-Way Factorial experiment
The next tool we will learn in SAS is how to write the model statement for a factorial
experiment so that we can look at the main effect and interaction sources of variation in the
ANOVA table.
A forage crop specialist wanted to determine the effect of a Rhizobium seed inoculant and
phosphorus fertilizer on the yield of alfalfa grown for hay. Because he expected the effect
of the inoculant to depend on whether or not fertilizer had been applied, he decided to use
a factorial set of treatments with an inoculation factor at two levels and a phosphorus factor
at two levels. He used a randomized block design with four blocks. His treatments were:
C     No phosphorus, no seed treatment
P     25 kg/ha phosphorus added
I     Seed treated with inoculum
PI    Both phosphorus and seed treatment
1) Download the data file for this exercise (Lab6_2way.xlsx). Input your data into SAS.
2) Use Proc GLM to conduct the ANOVA. Remember to define all of your independent
variables as class variables. The Model can be written in two ways:
MODEL Yield = Block Phosphorus|Seedtreat;
or
MODEL Yield = Block Phosphorus Seedtreat Phosphorus*Seedtreat;
The above statements are equivalent because the vertical bar ‘|’ tells SAS to include all
main effects and interactions between the two factors. (Note: on some keyboards the
vertical bar is printed as a broken bar, i.e. two short vertical dashes.)
What do the results of the ANOVA indicate about the treatments? Are the interactions
significant? Are the main effects significant?
Use an LSMEANS statement to obtain the means and standard errors.
LSMEANS Phosphorus|Seedtreat /stderr tdiff;
Compare the results of the t tests for main effect means with the results from the
ANOVA. Is there any reason to perform mean comparison tests in this case?
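Steps 1 and 2 can be combined into one short program. The following is a minimal sketch, not the required solution. It assumes that the file path is adjusted to wherever you saved the spreadsheet, that the spreadsheet columns are named Block, Phosphorus, Seedtreat and Yield (the names used in the MODEL statement above), and that your SAS installation can read .xlsx files with PROC IMPORT. The data set name lab6 is hypothetical.

```sas
/* Sketch only: the path and the data set name lab6 are assumptions */
PROC IMPORT DATAFILE="C:\temp\Lab6_2way.xlsx"
    OUT=lab6 DBMS=XLSX REPLACE;
    GETNAMES=YES;   /* first spreadsheet row holds the variable names */
RUN;

PROC GLM DATA=lab6;
    /* all independent variables are declared as class variables */
    CLASS Block Phosphorus Seedtreat;
    /* the bar expands to Phosphorus Seedtreat Phosphorus*Seedtreat */
    MODEL Yield = Block Phosphorus|Seedtreat;
    /* means, standard errors and pairwise t tests */
    LSMEANS Phosphorus|Seedtreat / STDERR TDIFF;
RUN;
QUIT;
```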
3) Have a quick look at a plot of the means:
PROC MEANS NWAY;
CLASS seedtreat phosphorus;
VAR Yield;
OUTPUT OUT=new2 MEAN=;
RUN;
PROC GPLOT DATA=new2;
PLOT Yield*phosphorus=seedtreat;
SYMBOL1 i=join VALUE=dot;
RUN;
QUIT;
The NWAY option tells SAS to calculate the means at the level of the interactions but
not for main effects. By default, the means in the output data set will be given the name
'Yield', because that is the response variable name.
The SYMBOL1 statement will create dots for the data points on the graph (VALUE=dot)
and connect the dots for each seed treatment with a line (i=join).
Part II. Three-way factorial
An experiment was conducted to investigate the effect of zinc and copper added to basic
diets of maize or wheat fed to chickens. The treatments were arranged in a 2^3 factorial with
four broods of chicks used as blocks. The results are expressed as average weekly gains in
grams (Lab6_3way.xlsx).
1) Write a SAS program to analyze the data. The model statement is an extension of the
two-factor analysis. Remember that the vertical bar ‘|’ is equivalent to specifying all of
the possible main effects and interactions among the factors.
PROC GLM;
CLASS Block Zinc Copper Grain;
MODEL Weight = Block Zinc|Copper|Grain;
MEANS Zinc|Copper|Grain;
RUN;
2) How would you interpret these results? What could you add to obtain estimates of
standard errors of means?
3) Create graphs of the interactions of interest.
PROC MEANS nway;
CLASS Copper Grain;
VAR weight;
OUTPUT OUT=new3 MEAN=;
RUN;
PROC GPLOT DATA=new3;
PLOT weight*copper=grain;
SYMBOL1 i=join VALUE=dot;
RUN;
Part III. Using contrast statements in a factorial experiment
An agricultural chemist suspected that the activity of root growth hormone was dependent
on temperature and concentration. He designed an experiment using two different
temperature baths of a nutrient solution containing 2 ppm, 4 ppm, and 6 ppm of the root
growth hormone. He had two assistants to help measure the root growth response so the
experiment was run as a randomized block design with two blocks.
TEMP   CONC   BLOCK   ROOTS
Cold   2      1       76
Cold   2      2       60
Cold   4      1       63
Cold   4      2       58
Cold   6      1       50
Cold   6      2       45
Warm   2      1       23
Warm   2      2       15
Warm   4      1       36
Warm   4      2       25
Warm   6      1       47
Warm   6      2       36
1) Copy this data into a SAS data step using the appropriate input statement and a
datalines statement.
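One possible data step for the table above is sketched here. The data set name A is an assumption, chosen only because Part IV refers to data=A; the variable names follow the column headings of the table.

```sas
/* Sketch: data set name A is an assumption (Part IV uses data=A) */
DATA A;
    /* temp is character ($); conc, block and roots are numeric */
    INPUT temp $ conc block roots;
    DATALINES;
Cold 2 1 76
Cold 2 2 60
Cold 4 1 63
Cold 4 2 58
Cold 6 1 50
Cold 6 2 45
Warm 2 1 23
Warm 2 2 15
Warm 4 1 36
Warm 4 2 25
Warm 6 1 47
Warm 6 2 36
;
RUN;
```

Running PROC PRINT on the new data set is a quick way to confirm that all 12 observations were read correctly.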
2) Develop a set of orthogonal contrasts to answer the following questions:
a. Is there a difference in root growth due to temperature?
b. Is there a linear response to concentration?
c. Is there a quadratic response to concentration?
d. Does the nature of the response to concentration (linear or quadratic) depend on
the temperature?
Cold, 2 ppm    Cold, 4 ppm    Cold, 6 ppm    Warm, 2 ppm    Warm, 4 ppm    Warm, 6 ppm
3) Run a SAS program on this data to address these questions.
PROC GLM plots=diagnostics;
TITLE 'contrasts in a factorial experiment';
CLASS block temp conc;
MODEL roots = block temp conc temp*conc;
/*make sure that your coefficients correspond to the order of your
means*/
CONTRAST 'main effect of temp' temp -1 1;
CONTRAST 'conc lin' conc -1 0 1;
CONTRAST 'conc quad' conc 1 -2 1;
CONTRAST 'temp x conc lin' temp*conc 1 0 -1 -1 0 1;
CONTRAST 'temp x conc quad' temp*conc -1 2 -1 1 -2 1;
LSMEANS conc temp;
LSMEANS temp*conc/out=new;
PROC GPLOT data=new;
TITLE 'Effect of conc and temp on root growth';
SYMBOL1 i=join v=dot;
AXIS1 label=('Root growth');
PLOT lsmean*conc=temp/ vaxis=axis1;
RUN;
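The five contrasts in step 3 are intended to be mutually orthogonal. A quick way to check this is to write each contrast over the six treatment means (Cold 2, Cold 4, Cold 6, Warm 2, Warm 4, Warm 6) and confirm that the sum of cross-products is zero for every pair. The data step below is only a sketch of that arithmetic; the array and variable names are made up for illustration, and the main-effect contrasts are expanded by hand to all six means.

```sas
/* Sketch: pairwise orthogonality check; all names here are hypothetical */
DATA _NULL_;
    /* rows: temp, conc lin, conc quad, temp x conc lin, temp x conc quad */
    /* columns: Cold 2, Cold 4, Cold 6, Warm 2, Warm 4, Warm 6            */
    ARRAY c[5,6] _TEMPORARY_ (
        -1, -1, -1,  1,  1,  1,
        -1,  0,  1, -1,  0,  1,
         1, -2,  1,  1, -2,  1,
         1,  0, -1, -1,  0,  1,
        -1,  2, -1,  1, -2,  1 );
    DO i = 1 TO 4;
        DO j = i + 1 TO 5;
            dot = 0;
            DO k = 1 TO 6;
                dot = dot + c[i,k] * c[j,k];   /* sum of cross-products */
            END;
            PUT i= j= dot=;                    /* written to the SAS log */
        END;
    END;
RUN;
```

Each of the ten pairs should show dot=0 in the log, which is what orthogonality requires when every treatment mean is based on the same number of observations.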
4) How would you interpret the output?
5) You can also use the “ESTIMATE” statement in PROC GLM rather than contrast
statements. The syntax is very similar, but you may want to use the “divisor” option to
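get the proper scaling of the linear contrast estimates. The t tests will lead to the
same conclusions as the F tests from the CONTRAST statements (for a single degree of
freedom contrast, the square of t equals F). Use a CONTRAST statement if you are most interested in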
determining the proportion of the treatment Sum of Squares that is due to the contrast.
Use the ESTIMATE statement if you are interested in the actual value of the linear
contrast.
Estimate 'main effect of temp' temp -1 1;
Estimate 'conc lin' conc -1 0 1;
Estimate 'conc quad' conc 1 -2 1/divisor=2;
Estimate 'temp x conc lin' temp*conc 1 0 -1 -1 0 1;
Estimate 'temp x conc quad' temp*conc -1 2 -1 1 -2 1/divisor=2;
You can also use the ESTIMATE statement to get an unbiased estimate of a single
mean. For example, for the low temperature treatment:
Estimate 'BLUE mean of low temp' intercept 1 temp 1 0;
Compare your results using Estimate statements to the output from the Contrasts and
lsmeans statements.
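ESTIMATE statements go inside the PROC GLM step, after the MODEL statement, just like CONTRAST statements. The sketch below places one CONTRAST and the matching ESTIMATE side by side so that the two outputs can be compared. It assumes the Part III data are stored in a data set named A (an assumption, matching the name used in Part IV).

```sas
/* Sketch: data set name A is an assumption */
PROC GLM DATA=A;
    CLASS block temp conc;
    MODEL roots = block temp conc temp*conc;
    /* F test and sum of squares for the quadratic response */
    CONTRAST 'conc quad' conc 1 -2 1;
    /* value of the same contrast, rescaled by the divisor, with a t test */
    ESTIMATE 'conc quad' conc 1 -2 1 / DIVISOR=2;
    /* main effect means with standard errors for comparison */
    LSMEANS conc temp / STDERR;
RUN;
QUIT;
```

The divisor changes the estimate and its standard error by the same factor, so it should not change the t statistic.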
Part IV. Part III using nonclass variables
One could analyze the data from Part III by considering the quantitative factor
(concentration) to be a nonclass variable:
PROC GLM data=A;
TITLE 'regression vs contrasts in a factorial experiment';
CLASS block temp;
MODEL roots = block temp conc conc*conc temp*conc temp*conc*conc/SS1
solution;
RUN;
Part V: Regression with orthogonal polynomials compared to regression on a
quantitative variable (this section is optional!)
An experiment was conducted to determine the effect of storage temperature on seed
viability. Fifteen seed samples were obtained and three samples, selected at random from
the fifteen were stored at each of five temperatures: 10, 30, 50, 70, 90. At the end of a one
year storage period the samples were tested for germination percentage with the following
results:
Temp   Germination
10     62
10     55
10     57
30     26
30     36
30     31
50     16
50     15
50     23
70     10
70     11
70     18
90     13
90     11
90     9
1) Copy this data into a SAS data step using the appropriate input statement and a
datalines statement.
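A compact way to enter these data is sketched below. The trailing @@ on the INPUT statement lets SAS read several observations from one line. The data set name three is an assumption, chosen because the PROC REG example at the end of this part reads from a data set with that name; wherever the programs below say data=yourfile, substitute the name you actually use.

```sas
/* Sketch: data set name three is an assumption */
DATA three;
    /* @@ holds the line so that several Temp/Germination pairs can be read from it */
    INPUT Temp Germination @@;
    DATALINES;
10 62  10 55  10 57
30 26  30 36  30 31
50 16  50 15  50 23
70 10  70 11  70 18
90 13  90 11  90 9
;
RUN;
```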
2) Use Proc GLM to analyze the data as a CRD. Remember to specify that temperature is a
class variable. Are there significant differences among the Temperature treatments?
3) Use Contrast statements to determine if the effect of temperature is linear, quadratic,
cubic, or quartic (refer to the table of polynomial coefficients handed out in class).
PROC GLM;
TITLE 'Use of orthogonal polynomials';
CLASS Temp;
MODEL Germination = Temp;
CONTRAST 'Temp Linear' Temp -2 -1 0 1 2;
CONTRAST 'Temp Quadratic' Temp 2 -1 -2 -1 2;
CONTRAST 'Temp Cubic' Temp -1 2 0 -2 1;
CONTRAST 'Temp Quartic' Temp 1 -4 6 -4 1;
LSMEANS Temp / stderr out=new2;
PROC GPLOT data=new2;
PLOT lsmean*Temp;
RUN;
QUIT;
Based on your output, which polynomials will you retain in your model?
4) Now try running the same regression, but consider Temperature to be a continuous
rather than a class variable.
PROC GLM data=yourfile;
TITLE 'Linear Regression of germination on temperature';
MODEL Germination = Temp;
PROC GLM data=yourfile;
TITLE 'Quadratic Regression of germination on temperature';
MODEL Germination = Temp Temp*Temp;
PROC GLM data=yourfile;
TITLE 'Cubic Regression of germination on temperature';
MODEL Germination = Temp Temp*Temp Temp*Temp*Temp;
PROC GLM data=yourfile;
TITLE 'Quartic Regression of germination on temperature';
MODEL Germination = Temp Temp*Temp Temp*Temp*Temp
Temp*Temp*Temp*Temp;
RUN;
Proc GLM provides direct estimates of the regression coefficients for nonclass variables.
Note how these change as we go to higher order polynomials.
Compare Type I vs Type III SS as we go to higher order polynomials. For regression
analysis using nonclass variables, we generally use the Type I (sequential) SS. Type I SS
partitions the Model SS into component SS due to adding each variable sequentially to the
model in the order that they appear in the model statement. Type III SS (partial SS)
determines the SS explained by each variable after all of the other variables have been
included in the model, regardless of the order that they appear in the model statement.
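To see the two types of SS next to each other, both can be requested explicitly on the MODEL statement. This is a minimal sketch for the quadratic model only; it assumes the germination data are in a data set named three (an assumption) and that Temp is left out of the CLASS statement so that it is treated as a continuous variable.

```sas
/* Sketch: data set name three is an assumption */
PROC GLM DATA=three;
    TITLE 'Type I vs Type III SS, quadratic regression';
    /* no CLASS statement, so Temp is a nonclass (continuous) variable */
    MODEL Germination = Temp Temp*Temp / SS1 SS3;
RUN;
QUIT;
```

In the Type I table, the Temp line is the SS for the linear term alone and the Temp*Temp line is the additional SS from the quadratic term. In the Type III table, each line is adjusted for the other term. The two tables always agree on the last term in the model.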
Compare the SS for the quartic regression to the output from the analysis of contrasts.
What happens to MSE as we go to higher order polynomials? How does it compare to
the result from the analysis of contrasts? How do you explain the differences in the error
term?
PROC REG is a general purpose regression procedure in SAS that could be used in this
example. PROC GLM or PROC MIXED are better choices if you have both class and nonclass
variables in your model. PROC REG has nine options for selecting the best model. In our
example, we could request a forward selection process to sequentially add terms to the
model. Because PROC REG cannot analyze a squared or higher order polynomial term in the
model statement, we must first create new variables representing these terms:
data four;
set three;
tempsq=temp*temp;
tempcu=temp*temp*temp;
tempqu=temp*temp*temp*temp;
PROC REG data=four;
TITLE 'Forward Selection';
MODEL Germination = Temp tempsq tempcu tempqu/SELECTION=FORWARD ss1;
RUN;
QUIT;