How statistical methods are used in agricultural experiments
Svenska statistikfrämjandet spring conference, 23 March 2012, Alnarp
Johannes Forkman, Fältforsk, SLU
Agricultural field experiments
Experimental treatments
• Varieties
• Weed control treatments
• Plant protection treatments
• Tillage methods
• Fertilizers
Experimental design
Allocate Treatments A and B to eight plots...
[Figure: three candidate layouts of treatments A and B on the eight plots, labelled Option 1, Option 2 and Option 3.]
Systematic error
• The plots differ...
• The treatments are not compared on equal terms.
• There will be a systematic error in the comparison of A and B.
Randomise the treatments. This procedure transforms the systematic error into a random error.
R. A. Fisher
Example
Treatment   Yield (kg/ha)             Mean (kg/ha)
A           8165  7792  8397  7764    8029.3
B           8483  8602  8641  8783    8627.2
The difference is 598 kg/ha.
Randomisation test
• The observed difference is 598 kg/ha.
• There are 8!/(4! 4!) = 70 possible random arrangements.
• The two most extreme differences are 598 and -598.
• P-value = 2/70 = 0.029
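The enumeration behind this P-value can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's own code: it assumes the eight yields from the example table and uses the difference in means (B minus A) as the test statistic.

```python
from itertools import combinations
from statistics import mean

# Yields (kg/ha) as printed in the example table.
yield_a = [8165, 7792, 8397, 7764]
yield_b = [8483, 8602, 8641, 8783]
plots = yield_a + yield_b

# Observed difference in means (the printed yields give 597.75,
# which the slide reports as 598).
observed = mean(yield_b) - mean(yield_a)

# Enumerate every way to choose which four of the eight plots receive A.
diffs = []
for a_idx in combinations(range(len(plots)), 4):
    a = [plots[i] for i in a_idx]
    b = [plots[i] for i in range(len(plots)) if i not in a_idx]
    diffs.append(mean(b) - mean(a))

n_arrangements = len(diffs)
# Count arrangements at least as extreme as the observed one (two-sided).
n_extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - 1e-9)
p_value = n_extreme / n_arrangements

print(n_arrangements, n_extreme, round(p_value, 3))
```

Because every B yield exceeds every A yield, only the observed arrangement and its mirror image are this extreme, which is why the count is 2.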
t-test
$$ t = \frac{598}{\sqrt{\dfrac{93438}{4} + \dfrac{15298}{4}}} = \frac{598}{164.9} = 3.63 $$
Compare with a t-distribution with 6 degrees of freedom
P-value = 0.011
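As a rough check of these numbers, the following sketch recomputes the t statistic and its two-sided P-value using only the Python standard library. The helper names `t_pdf` and `two_sided_p` are illustrative, and the P-value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made here, not something stated on the slide.

```python
import math
from statistics import mean, variance

yield_a = [8165, 7792, 8397, 7764]
yield_b = [8483, 8602, 8641, 8783]

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, n=2000):
    """Two-sided P-value: 1 - P(-|t| < T < |t|), via Simpson's rule (n even)."""
    b = abs(t)
    h = b / n
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (s * h / 3)  # the density is symmetric around zero
    return 1 - central

# Sample variances (about 93438 and 15298) and the standard error (about 164.9).
se = math.sqrt(variance(yield_a) / len(yield_a) + variance(yield_b) / len(yield_b))
t_stat = (mean(yield_b) - mean(yield_a)) / se
df = len(yield_a) + len(yield_b) - 2
p_value = two_sided_p(t_stat, df)

print(round(t_stat, 2), round(p_value, 3))
```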
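The randomisation model
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad \mathrm{E}(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta} $$
$$ \mathrm{cov}(\mathbf{y}) = (\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}')\sigma^2 $$
$N$ is the number of available plots.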
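The approximate model
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad \mathrm{E}(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta} $$
When $N$ is infinitely large,
$$ \mathrm{cov}(\mathbf{y}) = \mathbf{I}\sigma^2 . $$
For statistical tests, we assume further that
$$ \mathbf{y} \sim \mathrm{MVN}(\mathbf{X}\boldsymbol{\beta}, \mathbf{I}\sigma^2) . $$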
A crucial assumption
Unit-Treatment additivity:
• Variances and covariances do not depend on treatment
Heterogeneity
[Figure: treatments A and B]
Inference about what?
• Randomisation model: the average if the treatment were given to all plots of the experiment.
• The approximate model: the average if the treatment were given to infinitely many plots?
Sample
Population
Variance in a difference
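When $\mathrm{cov}(\mathbf{y}) = (\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}')\sigma^2$, then
$$ \mathrm{var}(\bar{y}_1 - \bar{y}_2) = \left(\frac{1}{r_1} + \frac{1}{r_2}\right)\sigma^2 . $$
When $\mathrm{cov}(\mathbf{y}) = \mathbf{I}\sigma^2$, then
$$ \mathrm{var}(\bar{y}_1 - \bar{y}_2) = \left(\frac{1}{r_1} + \frac{1}{r_2}\right)\sigma^2 . $$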
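Why the two covariance structures give the same variance can be seen from a short derivation. The contrast vector $\mathbf{c}$ below is notation introduced here for illustration; it has entries $1/r_1$ for the plots with treatment 1, $-1/r_2$ for the plots with treatment 2, and 0 elsewhere, so that $\bar{y}_1 - \bar{y}_2 = \mathbf{c}'\mathbf{y}$.

```latex
\begin{align*}
\mathbf{c}'\mathbf{1} &= r_1 \cdot \frac{1}{r_1} - r_2 \cdot \frac{1}{r_2} = 0,
\qquad
\mathbf{c}'\mathbf{c} = r_1 \cdot \frac{1}{r_1^2} + r_2 \cdot \frac{1}{r_2^2}
                      = \frac{1}{r_1} + \frac{1}{r_2}, \\
\mathrm{var}(\mathbf{c}'\mathbf{y})
  &= \mathbf{c}'(\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}')\mathbf{c}\,\sigma^2
   = \left(\mathbf{c}'\mathbf{c} - N^{-1}(\mathbf{c}'\mathbf{1})^2\right)\sigma^2
   = \left(\frac{1}{r_1} + \frac{1}{r_2}\right)\sigma^2 .
\end{align*}
```

The term involving $N^{-1}\mathbf{1}\mathbf{1}'$ vanishes because the contrast coefficients sum to zero, so the result coincides with the one obtained under $\mathrm{cov}(\mathbf{y}) = \mathbf{I}\sigma^2$.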
Independent errors
• Randomisation gives approximately independent error terms
• Information about plot position was ignored
• This information can be utilized
[Figure: field layout of the eight plots with treatments A and B]
Tobler’s law of geography
“Everything is related to everything else, but near things are more related than distant things.”
Waldo Tobler
Random fields
The random function Z(s) is a
• stochastic process, if the plots belong to a space in one dimension
• random field, if the plots belong to a space in two or more dimensions
Semivariogram
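[Figure: semivariogram γ(||h||) plotted against distance ||h||, showing the sill and the practical range, at which the semivariogram reaches 95% of the sill]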
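To make the terms sill and practical range concrete, here is a small sketch that assumes an exponential semivariogram model. The slide does not say which model the figure shows, so the model choice, the function name and the parameter values are all assumptions made for illustration. For the exponential model the practical range is three times the range parameter, because 1 - exp(-3) is about 0.95.

```python
import math

def exponential_semivariogram(h, sill, range_param):
    """Exponential model: gamma(h) = sill * (1 - exp(-h / range_param))."""
    return sill * (1.0 - math.exp(-h / range_param))

sill = 1.0          # hypothetical sill
range_param = 10.0  # hypothetical range parameter, in metres

# Distance at which the exponential model reaches about 95% of the sill.
practical_range = 3.0 * range_param
fraction_of_sill = exponential_semivariogram(practical_range, sill, range_param) / sill

print(practical_range, round(fraction_of_sill, 4))
```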
Spatial modelling
• Can improve precision.
• Still rare in analysis of agricultural field experiments.
• There are many possible spatial models and methods.
• Can be used whether or not the treatments were randomized...
• Which is the best design for spatial analysis?
Randomised block design
Gradient
Block III: E F C D G A H B
Block II:  A C G B D E H F
Block I:   G H D E F B C A
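A layout of this kind can be generated by shuffling the treatments independently within each block. The sketch below is one possible way to do it. The function name `randomise_blocks` and the seed are hypothetical, and the output will not reproduce the layout shown on the slide.

```python
import random

def randomise_blocks(treatments, n_blocks, seed=None):
    """Return one independent random ordering of all treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # every treatment appears exactly once in the block
        layout.append(order)
    return layout

treatments = list("ABCDEFGH")
layout = randomise_blocks(treatments, n_blocks=3, seed=2012)

for label, block in zip(["I", "II", "III"], layout):
    print(f"Block {label}: {' '.join(block)}")
```

Placing each block across the gradient, so that plots within a block are as similar as possible, is what lets the block effect absorb the gradient.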
Incomplete block design
Replicate 1 (blocks I to IV), plot order: I J K L P O M N F E H G A D C B
Replicate 2 (blocks V to VIII), plot order: M I A E L P D H G O C K J B N F
Strata
• Replicates
• Blocks
• Plots
Incomplete blocks
Split-plot design
Replicate I
  1: D A C B
  3: B C A D
  2: C B D A
Replicate II
  2: B D C A
  1: C B A D
  3: B C A D
Strata
• Replicates
• Plots
• Subplots
Comparison
[Figure panels 1a, 1b, 2a, 2b and 3, each repeating the split-plot layout: D A C B, B C A D, C B D A, B D C A, C B A D, B C A D]
A design with several strata
Each replicate:
[Figure: field layout with areas sown conventionally and areas sown with no tillage, strips for cultivars 2, 1 and 3, and areas with Mo applied]
Bailey, R. A. (2008). Design of comparative experiments. Cambridge University Press.
The linear mixed model
y = Xb + Zu + e
X: design matrix for fixed effects (treatments)
Z: design matrix for random effects (strata)
u is N(0, G)
e is N(0, R)
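Under this model the covariance matrix of the observations is ZGZ' + R. The sketch below builds that matrix for a deliberately small case: three blocks with two plots each, one random block effect per block, and independent plot errors. The variance components and helper functions are hypothetical and chosen only for illustration.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def diagonal(value, n):
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]

n_blocks, plots_per_block = 3, 2
n_plots = n_blocks * plots_per_block
sigma2_block = 4.0  # hypothetical block variance (diagonal of G)
sigma2_error = 1.0  # hypothetical plot error variance (diagonal of R)

# Z maps each plot to its block; plots are ordered block by block.
Z = [[1.0 if plot // plots_per_block == blk else 0.0 for blk in range(n_blocks)]
     for plot in range(n_plots)]
G = diagonal(sigma2_block, n_blocks)
R = diagonal(sigma2_error, n_plots)

# cov(y) = Z G Z' + R
ZGZt = matmul(matmul(Z, G), transpose(Z))
V = [[ZGZt[i][j] + R[i][j] for j in range(n_plots)] for i in range(n_plots)]

for row in V:
    print(row)
```

Plots in the same block share the block variance as covariance, while plots in different blocks are uncorrelated.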
Bates about error strata
“Those who long ago took courses in "analysis of variance" or "experimental design" that concentrated on designs for agricultural experiments would have learned methods for estimating variance components based on observed and expected mean squares and methods of testing based on "error strata". (If you weren't forced to learn this, consider yourself lucky.) It is therefore natural to expect that the F statistics created from an lmer model (and also those created by SAS PROC MIXED) are based on error strata but that is not the case.”
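Approximate t and F-tests
$$ t = \frac{\mathbf{L}\hat{\boldsymbol{\beta}}}{\sqrt{\mathrm{var}(\mathbf{L}\hat{\boldsymbol{\beta}})}} $$
when $\mathbf{L}$ is one-dimensional, and
$$ F = \frac{(\mathbf{L}\hat{\boldsymbol{\beta}})' \left[\mathrm{var}(\mathbf{L}\hat{\boldsymbol{\beta}})\right]^{-1} (\mathbf{L}\hat{\boldsymbol{\beta}})}{\mathrm{rank}\left(\mathrm{var}(\mathbf{L}\hat{\boldsymbol{\beta}})\right)} $$
otherwise.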
The number of degrees of freedom is an issue.
SAS: the Satterthwaite or the Kenward & Roger method.
Likelihood ratio test
Full model (FM):
p parameters
Reduced model (RM):
q parameters
$$ 2 \log \frac{L(\hat{\theta}_{\mathrm{FM}} \mid \mathbf{y})}{L(\hat{\theta}_{\mathrm{RM}} \mid \mathbf{y})} $$
is asymptotically $\chi^2$ with $p - q$ degrees of freedom.
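The test can be sketched as follows. The two log-likelihood values and parameter counts are hypothetical placeholders, since the slide gives no numbers, and `chi2_sf` is a small helper written here for integer degrees of freedom using the standard recurrence for the upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # tail probability for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # tail probability for df = 1
    # Each step adds two degrees of freedom.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical maximised log-likelihoods and parameter counts.
loglik_full, p = -52.1, 5
loglik_reduced, q = -55.3, 3

lr_statistic = 2.0 * (loglik_full - loglik_reduced)
df = p - q
p_value = chi2_sf(lr_statistic, df)

print(round(lr_statistic, 2), df, round(p_value, 3))
```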
Bayesian analysis
y = Xb + Zu + e
u is N(0, G)
e is N(0, R)
G is diag(Φ)
R is diag(σ²)
Independent prior distributions: p(b), p(Φ)
Sampling from the posterior distribution: p(b,Φ | y)
P-values in agricultural research
• Only discuss statistically significant results
• Do not discuss biologically insignificant results (although they are statistically significant).
• “Limit statements about significance to those which have a direct bearing on the aims of the research”. (Onofri et al., Weed Science, 2009)
Shrinkage estimators
Galwey (2006). Introduction to mixed modelling. Wiley.
Fixed or random varieties?
Fixed varieties (BLUE)
• Few varieties
• Estimation of differences
Random varieties (BLUP)
• Many varieties
• Ranking of varieties
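The difference between the two approaches can be illustrated with the simplest balanced case. With variety means based on r replicates each and known variance components, the BLUP of a variety effect is the deviation of the variety mean from the overall mean, multiplied by a shrinkage factor. The variety means and variance components below are hypothetical, and in practice the variance components would be estimated rather than assumed known.

```python
from statistics import mean

def blup_variety_effects(variety_means, sigma2_variety, sigma2_error, n_reps):
    """Shrink deviations from the overall mean (balanced one-way model)."""
    overall = mean(variety_means.values())
    shrinkage = sigma2_variety / (sigma2_variety + sigma2_error / n_reps)
    return {v: shrinkage * (m - overall) for v, m in variety_means.items()}

# Hypothetical variety means (kg/ha) and variance components.
variety_means = {"V1": 8000.0, "V2": 8600.0, "V3": 8300.0}
overall_mean = mean(variety_means.values())

# Fixed varieties: deviations are used as they are.
blue_effects = {v: m - overall_mean for v, m in variety_means.items()}

# Random varieties: the same deviations, shrunk towards zero.
blup_effects = blup_variety_effects(
    variety_means, sigma2_variety=40000.0, sigma2_error=120000.0, n_reps=4
)

for v in variety_means:
    print(v, round(blue_effects[v], 1), round(blup_effects[v], 1))
```

In this balanced setting the ranking of varieties is unchanged by shrinkage. The two approaches differ in the size of the estimated differences, and with unequal replication they can also differ in ranking.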
Conclusions based on a
simulation study
i. Modelling treatment as random is efficient for small block experiments.
ii. A model with normally distributed random effects performs well, even if the effects are not normally distributed.
iii. Bayesian methods can be recommended for inference about treatment differences.
Summary
• Fisher’s ideas about randomisation and blocking are still predominant.
• Strong focus on p-values.
• Linear mixed models are used extensively.
• Spatial and Bayesian methods are used less often.
• The question is what is random and fixed, and how to calculate p-values.
Thank you for your attention!