So far...
• We have been estimating differences caused by the application of various treatments, and determining the probability that an observed difference was due to chance
• The presence of interactions may indicate that two or more treatment factors have a joint effect on a response variable
• But we have not learned anything about how two (or more) variables are related
Types of Variables in Crop Experiments
• Treatments, such as fertilizer rates, varieties, and weed control methods, which are the primary focus of the experiment
• Environmental factors, such as rainfall and solar radiation, which are not within the researcher's control
• Responses, which represent the biological and physical features of the experimental units that are expected to be affected by the treatments being tested
What is Regression?
• How one variable is related to another
• As you change one, how are others affected?

[Example plot: grain yield versus grain protein %]

• We may want to:
– Develop and test a model for a biological system
– Predict the values of one variable from another
Usual associations within ANOVA...
• Agronomic experiments frequently consist of different levels of one or more quantitative variables:
– Varying amounts of fertilizer
– Several different row spacings
– Two or more depths of seeding
• It would be useful to develop an equation describing the relationship between plant response and treatment level:
– the response could then be estimated not only at the treatment levels actually tested, but at all intermediate points within the range of those treatments
• The simplest form of response is a straight line
Fitting the Linear Regression Model
[Figure: wheat yield (Y) plotted against applied N level (X), with observations Y1-Y4 at levels X1-X4 and a fitted straight line]

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where:
– Y = wheat yield
– X = nitrogen level
– $\beta_0$ = yield with no N
– $\beta_1$ = change in yield per unit of applied N
– $\varepsilon$ = random error

• Choose the line that minimizes the deviations of the observed values from the line (the predicted values), as sketched below
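A minimal sketch in Python with made-up data (not from these slides): `numpy.polyfit` returns the slope and intercept that minimize the sum of squared deviations.

```python
# Least-squares straight line: hypothetical N levels (X) and yields (Y).
import numpy as np

X = np.array([0.0, 40.0, 80.0, 120.0])
Y = np.array([1.8, 2.9, 3.6, 4.1])

b1, b0 = np.polyfit(X, Y, 1)       # slope and intercept, highest degree first
Y_hat = b0 + b1 * X                # predicted values on the fitted line
sse = np.sum((Y - Y_hat) ** 2)     # the quantity least squares minimizes
print(b0, b1, sse)
```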
Types of regression models
• Model I
– Values of the independent variable X are controlled by the experimenter
– X is assumed to be measured without error
– We measure the response of the dependent variable Y to changes in X
• Model II
– Both the X and the Y variables are measured and subject to error (e.g., in an observational study)
– Either variable could be considered the independent variable; the choice depends on the context of the experiment
– Often interested in correlations between variables
– May be descriptive, but might not be reliable for prediction
Sums of Squares due to Regression
$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad\qquad \hat{Y} = a + bX$$

• Because the line passes through $(\bar{X}, \bar{Y})$:

$$\bar{Y} = a + b\bar{X} \quad\Rightarrow\quad a = \bar{Y} - b\bar{X} \quad\Rightarrow\quad \hat{Y} = \bar{Y} + b(X - \bar{X})$$

$$b = \frac{\sum_j (X_j - \bar{X})(Y_j - \bar{Y})}{\sum_j (X_j - \bar{X})^2} = \frac{SCP_{XY}}{SS_X}$$

$$SSR = \frac{\left[\sum_j (X_j - \bar{X})(Y_j - \bar{Y})\right]^2}{\sum_j (X_j - \bar{X})^2}$$
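The same quantities in a short Python sketch (the data are hypothetical placeholders):

```python
# Closed-form estimates of b, a, and SSR from the formulas above.
import numpy as np

X = np.array([0.0, 40.0, 80.0, 120.0])   # hypothetical N levels
Y = np.array([1.8, 2.9, 3.6, 4.1])       # hypothetical yields

SCP_XY = np.sum((X - X.mean()) * (Y - Y.mean()))  # corrected cross products
SS_X = np.sum((X - X.mean()) ** 2)                # corrected SS of X

b = SCP_XY / SS_X               # slope
a = Y.mean() - b * X.mean()     # intercept: line passes through (Xbar, Ybar)
SSR = SCP_XY ** 2 / SS_X        # sum of squares due to regression
print(a, b, SSR)
```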
Partitioning SST
• The Sum of Squares for Treatments (SST) contains:
– SSLIN = sum of squares associated with the linear regression of Y on X (with 1 df)
– SSLOF = sum of squares for the failure of the regression model to describe the relationship between Y and X (lack of fit) (with t-2 df)

One way:
• Find a set of coefficients that define a linear contrast, using the deviations of the treatment levels from the mean level of all treatments:
$$k_j = X_j - \bar{X}$$
• Therefore
$$L_{LIN} = \sum_j (X_j - \bar{X}) \bar{Y}_j$$
• The sum of the coefficients will be zero, satisfying the definition of a contrast
Computing SSLIN
$$SS_{LIN} = \frac{r \, L_{LIN}^2}{\sum_j (X_j - \bar{X})^2}$$
– really no different from any other contrast; its df is always 1
• SSLOF (sum of squares for lack of fit) is computed by subtraction:
$$SS_{LOF} = SST - SS_{LIN}$$
(its df is the treatment df minus 1, i.e., t - 2)
• Not to be confused with SSE, which is still the SS for pure error (experimental error); a sketch of the partition follows
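A Python sketch of the partition; the treatment levels, means, replication, and pure-error mean square are assumed values for illustration.

```python
# Partition the treatment SS into SSLIN (1 df) and SSLOF (t-2 df).
import numpy as np

X = np.array([0.0, 30.0, 60.0, 90.0])   # treatment levels (assumed)
Ybar = np.array([2.1, 3.0, 3.5, 3.7])   # treatment means (assumed)
r, MSE, df_err = 4, 0.05, 12            # reps, pure-error MS and df (assumed)

k = X - X.mean()                        # linear contrast coefficients
L_lin = np.sum(k * Ybar)
SS_lin = r * L_lin ** 2 / np.sum(k ** 2)

SS_trt = r * np.sum((Ybar - Ybar.mean()) ** 2)  # treatment SS, t-1 df
SS_lof = SS_trt - SS_lin                        # lack of fit, by subtraction

F_lin = SS_lin / MSE                    # each F ratio has MSE as denominator
F_lof = (SS_lof / (len(X) - 2)) / MSE
print(SS_lin, SS_lof, F_lin, F_lof)
```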
F Ratios and their meaning
• All F ratios have MSE as the denominator
• FT = MST/MSE tests
– the significance of differences among the treatment means
• FLIN = MSLIN/MSE tests
– H0: no linear relationship between X and Y ($\beta_1 = 0$)
– Ha: there is a linear relationship between X and Y ($\beta_1 \ne 0$)
• FLOF = MSLOF/MSE tests
– H0: the simple linear regression model describes the data: $E(Y) = \beta_0 + \beta_1 X$
– Ha: there is significant deviation from a linear relationship between X and Y: $E(Y) \ne \beta_0 + \beta_1 X$
The linear relationship
• The expected value of Y given X is described by the equation:
$$\hat{Y}_j = \bar{Y} + b_1 (X_j - \bar{X})$$
where:
– $\bar{Y}$ = grand mean of Y
– $X_j$ = value of X (treatment level) at which Y is estimated

$$L_{LIN} = \sum_j (X_j - \bar{X}) \bar{Y}_j \qquad SS_{LIN} = \frac{r \, L_{LIN}^2}{\sum_j (X_j - \bar{X})^2} \qquad b_1 = \frac{L_{LIN}}{\sum_j (X_j - \bar{X})^2}$$
Orthogonal Polynomials
• If the relationship is not linear, we can simplify curve fitting within the ANOVA by using orthogonal polynomial coefficients, under these conditions:
– equal replication
– the levels of the treatment variable must be equally spaced
• e.g., 20, 40, 60, 80, 100 kg of fertilizer per plot
Curve fitting
• Model: $E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \ldots$
• Determine the coefficients for 2nd order and higher polynomials from a table
• Use the F ratio to test the significance of each contrast
• Unless there is prior reason to believe that the equation is of a particular order, it is customary to fit the terms sequentially
• Include all terms in the equation up to and including the term at which lack of fit first becomes nonsignificant (see the sketch below)
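A Python sketch of this sequential rule; the coefficients are the standard tabulated values for five equally spaced levels, while the means, replication, error mean square, and error df are assumed inputs.

```python
# Fit polynomial terms sequentially; stop when lack of fit (LOF) first
# becomes nonsignificant.
import numpy as np
from scipy import stats

coeffs = {  # orthogonal polynomial coefficients, t = 5 equally spaced levels
    "linear":    np.array([-2, -1,  0,  1,  2]),
    "quadratic": np.array([ 2, -1, -2, -1,  2]),
    "cubic":     np.array([-1,  2,  0, -2,  1]),
    "quartic":   np.array([ 1, -4,  6, -4,  1]),
}

Ybar = np.array([9.5, 22.3, 29.0, 30.7, 28.6])  # treatment means (assumed)
r, MSE, df_err = 3, 3.55, 8                     # ANOVA quantities (assumed)

SS_lof = r * np.sum((Ybar - Ybar.mean()) ** 2)  # start from the treatment SS
df_lof = len(Ybar) - 1
for name, k in coeffs.items():
    SS = r * (k @ Ybar) ** 2 / (k @ k)          # SS for this 1-df contrast
    SS_lof -= SS                                # remainder is lack of fit
    df_lof -= 1
    if df_lof == 0:
        break
    p = stats.f.sf((SS_lof / df_lof) / MSE, df_lof, df_err)
    print(f"{name}: SS = {SS:.2f}, LOF p = {p:.3f}")
    if p > 0.05:          # LOF nonsignificant: include terms up to here
        break
```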
Table of coefficients
[Table of orthogonal polynomial coefficients for equally spaced treatment levels; the 5-level coefficients are used in the examples below]
Where do linear contrast coefficients come from? (revisited)
$$L_{LIN} = \sum_j (X_j - \bar{X}) \bar{Y}_j$$
• Assume 5 nitrogen levels: 30, 60, 90, 120, 150
– $\bar{X} = 90$
– k1 = (-60, -30, 0, 30, 60)
• If we code the treatments as 1, 2, 3, 4, 5
– $\bar{x} = 3$
– k1 = (-2, -1, 0, 1, 2)
• $b_1 = L_{LIN} / [r \sum_j (x_j - \bar{x})^2]$, but it must be decoded back to the original scale:
$$k_{1j} = \frac{X_j - \bar{X}}{d}$$
where d is the spacing between adjacent treatment levels
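A two-line check of this coding in Python, using the levels above:

```python
# Coded linear coefficients: k1 = (X - Xbar) / d, with d the level spacing.
import numpy as np

X = np.array([30.0, 60.0, 90.0, 120.0, 150.0])  # N levels from the slide
d = X[1] - X[0]                                 # equal spacing, here 30
k1 = (X - X.mean()) / d
print(k1)   # [-2. -1.  0.  1.  2.] -- the tabulated linear coefficients
```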
Consider an experiment
• Five levels of N (10, 30, 50, 70, 90) with four replications
$$SS_{LIN} = \frac{r \, L_{LIN}^2}{\sum_j (X_j - \bar{X})^2}$$
• Linear contrast
– $L_{LIN} = (-2)\bar{Y}_1 + (-1)\bar{Y}_2 + (0)\bar{Y}_3 + (+1)\bar{Y}_4 + (+2)\bar{Y}_5$
– $SS_{LIN} = 4 L_{LIN}^2 / 10$
• Quadratic
– $L_{QUAD} = (+2)\bar{Y}_1 + (-1)\bar{Y}_2 + (-2)\bar{Y}_3 + (-1)\bar{Y}_4 + (+2)\bar{Y}_5$
– $SS_{QUAD} = 4 L_{QUAD}^2 / 14$

LOF still significant? Keep going...
• Cubic
– $L_{CUB} = (-1)\bar{Y}_1 + (+2)\bar{Y}_2 + (0)\bar{Y}_3 + (-2)\bar{Y}_4 + (+1)\bar{Y}_5$
– $SS_{CUB} = 4 L_{CUB}^2 / 10$
• Quartic
– $L_{QUAR} = (+1)\bar{Y}_1 + (-4)\bar{Y}_2 + (+6)\bar{Y}_3 + (-4)\bar{Y}_4 + (+1)\bar{Y}_5$
– $SS_{QUAR} = 4 L_{QUAR}^2 / 70$
• Each contrast has 1 degree of freedom
• Each F ratio has MSE in the denominator
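These four coefficient sets are mutually orthogonal contrasts, which a quick Python check confirms:

```python
# Each row sums to zero (a valid contrast) and rows are pairwise orthogonal.
import numpy as np

K = np.array([
    [-2, -1,  0,  1,  2],   # linear
    [ 2, -1, -2, -1,  2],   # quadratic
    [-1,  2,  0, -2,  1],   # cubic
    [ 1, -4,  6, -4,  1],   # quartic
])
print(K.sum(axis=1))   # [0 0 0 0]
print(K @ K.T)         # off-diagonals all zero; the diagonal gives the
                       # divisors sum(kj^2): 10, 14, 10, 70
```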
Numerical Example
• An experiment to determine the effect of nitrogen on the yield of sugarbeet roots:
– RBD with three blocks
– 5 levels of N (0, 35, 70, 105, and 140 kg/ha)
• It meets the criteria:
– N is a quantitative variable
– the levels are equally spaced
– the levels are equally replicated
• SST is significant, so we proceed to contrasts
Orthogonal Partition of SST
                        N level (kg/ha)
Order        0     35     70    105    140        Li    Sum kj^2    SS(L)i
Total      28.4   66.8   87.0   92.0   85.7
Linear      -2     -1      0     +1     +2      46.60      10      651.4680
Quadratic   +2     -1     -2     -1     +2     -34.87      14      260.5038
Cubic       -1     +2      0     -2     +1       2.30      10        1.5870
Quartic     +1     -4     +6     -4     +1       0.30      70        0.0039

(The second row gives the treatment totals over the three blocks; each Li is computed on the treatment means, totals/3.)
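The partition can be reproduced in a few lines of Python; the sketch assumes, as noted above, that the tabulated row holds treatment totals, so the means are those values divided by three.

```python
# Reproduce Li and SS(L)i for the sugarbeet example.
import numpy as np

totals = np.array([28.4, 66.8, 87.0, 92.0, 85.7])  # totals over r = 3 blocks
r = 3
means = totals / r

K = {
    "linear":    np.array([-2, -1,  0,  1,  2]),
    "quadratic": np.array([ 2, -1, -2, -1,  2]),
    "cubic":     np.array([-1,  2,  0, -2,  1]),
    "quartic":   np.array([ 1, -4,  6, -4,  1]),
}
for name, k in K.items():
    L = k @ means                    # contrast on the treatment means
    SS = r * L ** 2 / (k @ k)
    print(f"{name}: L = {L:.2f}, SS = {SS:.4f}")
# linear: L = 46.60, SS = 651.4680; quadratic: L = -34.87, SS = 260.5038; ...
```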
Sequential Test of Nitrogen Effects
Source          df      SS         MS          F
(1) Nitrogen     4    913.5627   228.3907    64.41**
(2) Linear       1    651.4680   651.4680   183.73**
    Dev (LOF)    3    262.0947    87.3649    24.64**
(3) Quadratic    1    260.5038   260.5038    73.47**
    Dev (LOF)    2      1.5909     0.7955     0.22 ns
• Choose a quadratic model
– the first point at which the LOF is not significant
– implies that a cubic term would not be significant
Regression Equation
• $b_i = L_i / \sum_j k_{ij}^2$ for each coded contrast - useful for prediction

Coefficients: b0 = 23.99, b1 = 4.66, b2 = -2.49

$$\hat{Y}_j = \bar{Y} + b_1 k_{1j} + b_2 k_{2j} = 23.99 + 4.66\,k_{1j} - 2.49\,k_{2j}$$

for example, at 0 kg N/ha (k1 = -2, k2 = +2):
$$\hat{Y}_1 = 23.99 + 4.66(-2) - 2.49(+2) = 9.69$$

• To scale back to the original X values:
$$k_{1j} = \frac{X_j - \bar{X}}{d} \qquad\qquad k_{2j} = \left(\frac{X_j - \bar{X}}{d}\right)^2 - \frac{t^2 - 1}{12}$$

$$\hat{Y} = 9.69 + 0.418X - 0.002X^2$$
Easier way:
1) use contrasts to find the best model and estimate pure error
2) get the equation from a graph or from regression analysis (see the sketch below)
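For example, fitting the quadratic directly to the treatment means on the original N scale reproduces the decoded equation (with equal spacing and replication, the two routes agree); the means here are the tabulated totals divided by three.

```python
# Direct quadratic fit on the original X scale.
import numpy as np

N = np.array([0.0, 35.0, 70.0, 105.0, 140.0])
means = np.array([28.4, 66.8, 87.0, 92.0, 85.7]) / 3   # treatment means

c2, c1, c0 = np.polyfit(N, means, 2)   # coefficients, highest degree first
print(round(c0, 2), round(c1, 3), round(c2, 5))
# about 9.69, 0.418, -0.00203, i.e., Y = 9.69 + 0.418X - 0.002X^2
```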
Common misuse of regression...
• Broad generalization
– Extrapolating the result of a regression line outside the range of X values tested
– Don't go beyond the highest nitrogen rate tested, for example
– And don't generalize over all varieties when you have tested just one
• Over-interpreting higher order polynomials
– with t-1 df, they will explain all of the variation among treatments, whether there is any meaningful pattern to the data or not
Class vs nonclass variables
• General linear model in matrix notation:
$$Y = X\beta + \varepsilon$$
• X is the design matrix
– Assume a CRD with 3 fertilizer treatments (30, 60, 90) and 2 replications

ANOVA                  Orthogonal            Regression
(class variables)      polynomials           (continuous variables)
 1   x1   x2   x3      b0   L1   L2          b0    X      X^2
 1    1    0    0       1   -1   +1           1    30     900
 1    1    0    0       1   -1   +1           1    30     900
 1    0    1    0       1    0   -2           1    60    3600
 1    0    1    0       1    0   -2           1    60    3600
 1    0    0    1       1   +1   +1           1    90    8100
 1    0    0    1       1   +1   +1           1    90    8100

One of the class-variable columns is dropped - it provides no additional information (the intercept column equals x1 + x2 + x3)
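A Python sketch building the three parameterizations; the fertilizer levels are read off the regression columns above.

```python
# Three equivalent design matrices for a CRD with 3 treatments, 2 reps.
import numpy as np

X_lvl = np.repeat([30.0, 60.0, 90.0], 2)   # one row per experimental unit

# Class variables (ANOVA): intercept plus dummies; one dummy is dropped
# because the intercept column equals x1 + x2 + x3.
X_class = np.column_stack([np.ones(6),
                           (X_lvl == 30.0).astype(float),
                           (X_lvl == 60.0).astype(float)])

# Orthogonal polynomials: intercept, linear (-1,0,1), quadratic (1,-2,1).
lin = {30.0: -1.0, 60.0: 0.0, 90.0: 1.0}
quad = {30.0: 1.0, 60.0: -2.0, 90.0: 1.0}
X_poly = np.column_stack([np.ones(6),
                          [lin[x] for x in X_lvl],
                          [quad[x] for x in X_lvl]])

# Continuous regression: intercept, X, X^2.
X_reg = np.column_stack([np.ones(6), X_lvl, X_lvl ** 2])

# All three have rank 3 and span the same column space, so they
# produce the same fitted values.
for M in (X_class, X_poly, X_reg):
    print(np.linalg.matrix_rank(M))
```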