Fixed vs. Random Effects

• Fixed effect
  – we are interested in the effects of the treatments (or blocks) per se
  – if the experiment were repeated, the levels would be the same
  – conclusions apply to the treatment (or block) levels that were tested
  – treatment (or block) effects sum to zero: $\sum_i \tau_i = 0$
• Random effect
  – represents a sample from a larger reference population
  – the specific levels used are not of particular interest
  – conclusions apply to the reference population
    • inference space may be broad (all possible random effects) or narrow (just the random effects in the experiment)
  – goal is generally to estimate the variance among treatments (or other groups), $\sigma_T^2$
• Need to know which effects are fixed or random to determine appropriate F tests in ANOVA
Fixed or Random?
• lambs born from common parents (same ram and ewe) are given different formulations of a vitamin supplement
• comparison of new herbicides for potential licensing
• comparison of herbicides used in different decades (1980’s, 1990’s, 2000’s)
• nitrogen fertilizer treatments at rates of 0, 50, 100, and 150 kg N/ha
• years of evaluation of new canola varieties (2008, 2009, 2010)
• location of a crop rotation experiment that is conducted on three farmers’ fields in the Willamette valley (Junction City, Albany, Woodburn)
• species of trees in an old growth forest
Fixed and random models for the CRD

$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$

Fixed Model (Model I)
Source      df      Expected Mean Square
Treatment   t-1     $\sigma_e^2 + r\sigma_T^2$
Error       tr-t    $\sigma_e^2$

where $\sigma_T^2 = \sum_i \tau_i^2 / (t-1)$ is the variance among fixed treatment effects

Random Model (Model II)
Source      df      Expected Mean Square
Treatment   t-1     $\sigma_e^2 + r\sigma_T^2$
Error       tr-t    $\sigma_e^2$
Models for the RBD

$Y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}$

Fixed Model
Source      df           Expected Mean Square
Block       r-1          $\sigma_e^2 + t\sigma_B^2$
Treatment   t-1          $\sigma_e^2 + r\sigma_T^2$
Error       (r-1)(t-1)   $\sigma_e^2$

Random Model
Source      df           Expected Mean Square
Block       r-1          $\sigma_e^2 + t\sigma_B^2$
Treatment   t-1          $\sigma_e^2 + r\sigma_T^2$
Error       (r-1)(t-1)   $\sigma_e^2$

Mixed Model
Source      df           Expected Mean Square
Block       r-1          $\sigma_e^2 + t\sigma_B^2$
Treatment   t-1          $\sigma_e^2 + r\sigma_T^2$
Error       (r-1)(t-1)   $\sigma_e^2$

For a fixed factor, the variance term is the variance among the fixed effects:
– Treatment: $\sigma_T^2 = \sum_j \tau_j^2 / (t-1)$
– Block: $\sigma_B^2 = \sum_i \beta_i^2 / (r-1)$
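When blocks and treatments are both treated as random, the expected mean squares above can be solved for the variance components (a method-of-moments approach). The following is a minimal Python sketch under stated assumptions: a complete RBD with one observation per block-treatment cell, a hypothetical function name `rbd_variance_components`, and made-up data. It is an illustration, not the course's own procedure.

```python
def rbd_variance_components(y):
    """y[i][j] = observation in block i, treatment j (complete RBD, hypothetical data)."""
    r = len(y)         # number of blocks
    t = len(y[0])      # number of treatments
    grand = sum(sum(row) for row in y) / (r * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(y[i][j] for i in range(r)) / r for j in range(t)]

    # Sums of squares for blocks, treatments, total, and error (by subtraction)
    ssb = t * sum((m - grand) ** 2 for m in block_means)
    sst = r * sum((m - grand) ** 2 for m in trt_means)
    sstot = sum((y[i][j] - grand) ** 2 for i in range(r) for j in range(t))
    sse = sstot - ssb - sst

    msb = ssb / (r - 1)
    mst = sst / (t - 1)
    mse = sse / ((r - 1) * (t - 1))

    return {
        "MSB": msb, "MST": mst, "MSE": mse,
        "var_e": mse,               # E(MSE) = var_e
        "var_T": (mst - mse) / r,   # from E(MST) = var_e + r * var_T
        "var_B": (msb - mse) / t,   # from E(MSB) = var_e + t * var_B
        "F_T": mst / mse,           # treatment F test against error
    }


# Hypothetical 2-block x 2-treatment example
result = rbd_variance_components([[1.0, 2.0], [3.0, 5.0]])
print(result)
```

Estimates obtained this way can come out negative when a mean square is smaller than the error mean square; how to handle that case is outside the scope of this sketch.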
Nested (Hierarchical) Designs
• Levels of one factor (B) occur within the levels of another factor (A)
• Levels of B are unique to each level of A
• Factor B is nested within A
  – Factor A = the pigs (sows)
  – Factor B = the piglets
• Nested factors are usually random effects
Nested vs. Cross-Classified Factors

Nested
A1: B1 B2
A2: B3 B4
A3: B5 B6
Each unit of B is unique to each unit of A

Cross-classified
      B1   B2
A1    X    X
A2    X    X
A3    X    X
All possible combinations of A and B

General form for degrees of freedom
– B nested in A: a(b-1)
– A*B: (a-1)(b-1)
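The two degrees-of-freedom formulas can be checked against the small layouts shown above. This is a short Python sketch; the function names are hypothetical, and `b` is taken to be the number of levels of B within each level of A (nested case) or the number of levels of B overall (crossed case).

```python
def nested_df(a, b):
    """df for B nested within A: a levels of A, b levels of B per level of A."""
    return a * (b - 1)


def crossed_interaction_df(a, b):
    """df for the A*B interaction when A and B are cross-classified."""
    return (a - 1) * (b - 1)


# Layouts from the slide: 3 levels of A, 2 levels of B (per level of A when nested)
a, b = 3, 2
print(nested_df(a, b))               # 3
print(crossed_interaction_df(a, b))  # 2
```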
Sub-Sampling
• It may be necessary or convenient to measure a treatment response on subsamples of a plot
  – several soil cores within a plot
  – duplicate laboratory analyses to estimate grain protein
• Introduces a complication into the analysis that can be handled in one of two ways:
  – compute the average for each plot and analyze normally
  – subject the subsamples themselves to an analysis
• The second choice gives an additional source of variation in the ANOVA, often called the sampling error
Use Sampling to Gain Precision
• When making lab measurements, you will have better results if you analyze several samples to get a truer estimate of the mean.
• It is often useful to determine the number of samples that would be required for your chosen level of precision.
• Sampling will reduce the variability within a treatment across replications.
Stein’s Sample Estimate

$n = \frac{t_1^2 s^2}{d^2}$

Where
– $t_1$ is the tabular t value for the desired confidence level and the degrees of freedom of the initial sample
– $d$ is the half-width of the desired confidence interval
– $s$ is the standard deviation of the initial sample
For Example
• We are measuring grain protein content and want to increase the precision for each replicate of a treatment.
• We collect and run five samples from the same block and same treatment.
• We decide that an alpha level of 5% is acceptable and we would like to be able to get within 0.5 units of the true mean.
• We continue to apply the formula until we get a stable estimate of n.
• To obtain the desired level of precision, we would need to run at least 10 samples per block per treatment.

Subsample values: 6.2, 7.4, 5.8, 7.0, 6.1

mean             6.50
variance         0.45
t (0.05, 4 df)   2.78
d                0.50
n                13.88

Each step recomputes n using the tabular t for n - 1 degrees of freedom (t values are shown rounded to two decimals; the results use the unrounded table values):

For n = 5:   $n = \frac{t_1^2 s^2}{d^2} = \frac{2.78^2 \times 0.45}{0.5^2} = 13.88$
For n = 14:  $n = \frac{t_1^2 s^2}{d^2} = \frac{2.16^2 \times 0.45}{0.5^2} = 8.40$
For n = 9:   $n = \frac{t_1^2 s^2}{d^2} = \frac{2.31^2 \times 0.45}{0.5^2} = 9.57$
For n = 10:  $n = \frac{t_1^2 s^2}{d^2} = \frac{2.26^2 \times 0.45}{0.5^2} = 9.21$
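The iteration in this example can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two-sided 5% t values are hard-coded from a standard t table (df 1 to 20), and n is rounded up to the next whole sample at each step.

```python
import math
import statistics

# Two-sided 5% critical values of Student's t by degrees of freedom
# (standard table values, three decimals). Covers df 1..20 only.
T_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
}


def stein_n(t_value, s2, d):
    """Stein's estimate n = t^2 * s^2 / d^2 (s2 is the sample variance)."""
    return t_value ** 2 * s2 / d ** 2


def stable_sample_size(sample, d, max_iter=20):
    """Reapply Stein's formula with df = n - 1 until n stops changing."""
    s2 = statistics.variance(sample)   # variance of the initial sample
    n = len(sample)
    for _ in range(max_iter):
        n_new = math.ceil(stein_n(T_05[n - 1], s2, d))
        if n_new == n:
            return n
        n = n_new
    raise RuntimeError("n did not stabilize within max_iter steps")


subsamples = [6.2, 7.4, 5.8, 7.0, 6.1]
print(stable_sample_size(subsamples, d=0.5))  # 10
```

With the five subsamples above, the sequence of n values is 5, 14, 9, 10, 10, which matches the hand calculation. The lookup table would need extending (or replacing with a t quantile function) for cases where n - 1 falls outside 1 to 20.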
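Linear model with sub-sampling
• For a CRD
  $Y_{ijk} = \mu + \tau_i + \varepsilon_{ij} + \delta_{ijk}$
  – $\mu$ = mean effect
  – $\tau_i$ = ith treatment effect
  – $\varepsilon_{ij}$ = random error
  – $\delta_{ijk}$ = sampling error
• For an RBD
  $Y_{ijk} = \mu + \beta_i + \tau_j + \varepsilon_{ij} + \delta_{ijk}$
  – $\mu$ = mean effect
  – $\beta_i$ = ith block effect
  – $\tau_j$ = jth treatment effect
  – $\varepsilon_{ij}$ = treatment x block interaction, treated as error
  – $\delta_{ijk}$ = sampling error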
Expected Mean Squares – RBD with subsampling

Source           df           Expected Mean Square
Block            r-1          $\sigma_s^2 + n\sigma_e^2 + tn\sigma_b^2$
Treatment        t-1          $\sigma_s^2 + n\sigma_e^2 + rn\sigma_t^2$
Error            (r-1)(t-1)   $\sigma_s^2 + n\sigma_e^2$
Sampling Error   rt(n-1)      $\sigma_s^2$

• In this example, treatments are fixed and blocks are random effects
• This is a mixed model because it includes both fixed and random effects
• Appropriate F tests can be determined from the Expected Mean Squares
The RBD ANOVA with Subsampling

Source           df           SS      MS               F
Total            rtn-1        SSTot
Block            r-1          SSB     SSB/(r-1)
Trtmt            t-1          SST     SST/(t-1)        FT = MST/MSE
Error            (r-1)(t-1)   SSE     SSE/(r-1)(t-1)   FE = MSE/MSS
Sampling Error   rt(n-1)      SSS     SSS/rt(n-1)

Sums of squares:
$SSTot = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2$
$SSB = tn \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2$
$SST = rn \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2$
$SSE = n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{...})^2 - SSB - SST$
$SSS = SSTot - SSB - SST - SSE$
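The sums of squares and F ratios in this table can be computed directly from balanced data. The following is a minimal Python sketch, assuming the data are held in a nested list `y[i][j][k]` (block i, treatment j, subsample k); the function name and the data are hypothetical.

```python
def rbd_subsampling_anova(y):
    """ANOVA for a balanced RBD with subsampling; y[i][j][k] as described above."""
    r, t, n = len(y), len(y[0]), len(y[0][0])
    total_obs = r * t * n
    grand = sum(v for blk in y for plot in blk for v in plot) / total_obs

    block_means = [sum(v for plot in blk for v in plot) / (t * n) for blk in y]
    trt_means = [sum(v for i in range(r) for v in y[i][j]) / (r * n)
                 for j in range(t)]
    plot_means = [[sum(plot) / n for plot in blk] for blk in y]

    ss_tot = sum((v - grand) ** 2 for blk in y for plot in blk for v in plot)
    ssb = t * n * sum((m - grand) ** 2 for m in block_means)
    sst = r * n * sum((m - grand) ** 2 for m in trt_means)
    sse = n * sum((m - grand) ** 2 for row in plot_means for m in row) - ssb - sst
    sss = ss_tot - ssb - sst - sse

    msb = ssb / (r - 1)
    mst = sst / (t - 1)
    mse = sse / ((r - 1) * (t - 1))
    mss = sss / (r * t * (n - 1))

    return {
        "SSTot": ss_tot, "SSB": ssb, "SST": sst, "SSE": sse, "SSS": sss,
        "MSB": msb, "MST": mst, "MSE": mse, "MSS": mss,
        "F_T": mst / mse,   # treatments tested against experimental error
        "F_E": mse / mss,   # experimental error tested against sampling error
    }


# Hypothetical data: 2 blocks x 2 treatments x 2 subsamples
data = [[[1.0, 3.0], [4.0, 6.0]],
        [[2.0, 4.0], [9.0, 11.0]]]
print(rbd_subsampling_anova(data))
```

Treatments are tested against MSE rather than MSS because the expected mean square for treatments contains both the sampling and the experimental error components.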
Significance Tests
• MSS estimates
  – the variation among samples
• MSE estimates
  – the variation among samples plus
  – the variation among plots treated alike
• MST estimates
  – the variation among samples plus
  – the variation among plots treated alike plus
  – the variation among treatment means

Therefore:
• FE
  – tests the significance of the variation among plots treated alike
• FT
  – tests the significance of the differences among the treatment means
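Means and Standard Errors

$s_{\bar{Y}}^2 = \frac{MSE}{rn} = \frac{\sigma_s^2 + n\sigma_e^2}{rn} = \frac{\sigma_s^2}{rn} + \frac{\sigma_e^2}{r}$

Standard Error of a treatment mean
$s_{\bar{Y}} = \sqrt{MSE / rn}$

Confidence interval estimate
$L(\mu_i) = \bar{Y}_i \pm t_\alpha \sqrt{MSE / rn}$

Standard Error of a difference
$s_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{2MSE / rn}$

Confidence interval estimate
$L(\mu_1 - \mu_2) = (\bar{Y}_1 - \bar{Y}_2) \pm t_\alpha \sqrt{2MSE / rn}$

t to test difference between two means
$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{2MSE / rn}}$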
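These formulas translate into a few lines of Python. The sketch below uses hypothetical function names and made-up numbers (MSE = 8, r = 4, n = 2). The critical value `t_crit` is passed in by the caller; this sketch assumes it is looked up with the error degrees of freedom, (r-1)(t-1), since MSE is the error mean square.

```python
import math


def se_mean(mse, r, n):
    """Standard error of a treatment mean: sqrt(MSE / (r * n))."""
    return math.sqrt(mse / (r * n))


def se_diff(mse, r, n):
    """Standard error of a difference between two means: sqrt(2 * MSE / (r * n))."""
    return math.sqrt(2 * mse / (r * n))


def confidence_interval(estimate, se, t_crit):
    """Interval estimate: estimate +/- t_crit * se."""
    return (estimate - t_crit * se, estimate + t_crit * se)


def t_statistic(mean1, mean2, mse, r, n):
    """t to test the difference between two treatment means."""
    return (mean1 - mean2) / se_diff(mse, r, n)


# Hypothetical values for illustration only
mse, r, n = 8.0, 4, 2
print(se_mean(mse, r, n))                       # 1.0
print(confidence_interval(10.0, se_mean(mse, r, n), t_crit=2.0))
print(t_statistic(10.0, 7.0, mse, r, n))
```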
Allocating resources – reps vs samples
• Cost function
  $C = c_1 r + c_2 r n$
  – $c_1$ = cost of an experimental unit
  – $c_2$ = cost of a sampling unit

  $n = \sqrt{\frac{c_1 \sigma_s^2}{c_2 \sigma_e^2}}$
• If your goal is to minimize variance for a fixed cost, use the estimate of n to solve for r in the cost function
• If your goal is to minimize cost for a fixed variance, use the estimate of n to solve for r using the formula for a variance of a treatment mean

  $\sigma_{\bar{y}}^2 = \frac{\sigma_s^2}{rn} + \frac{\sigma_e^2}{r}$

See Kuehl pg 163 for an example
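The allocation steps can be sketched in Python as follows. The function names and the cost and variance figures are hypothetical, chosen only to make the arithmetic easy to follow. In practice n and r would be rounded up to whole numbers.

```python
import math


def optimal_n(c1, c2, var_s, var_e):
    """Samples per experimental unit: n = sqrt(c1 * var_s / (c2 * var_e))."""
    return math.sqrt((c1 * var_s) / (c2 * var_e))


def reps_for_fixed_cost(total_cost, c1, c2, n):
    """Solve C = c1*r + c2*r*n for r."""
    return total_cost / (c1 + c2 * n)


def reps_for_fixed_variance(target_var, var_s, var_e, n):
    """Solve var_mean = var_s/(r*n) + var_e/r for r."""
    return (var_s / n + var_e) / target_var


# Hypothetical inputs: an experimental unit costs 16, a sample costs 1,
# sampling variance 4, experimental error variance 1
n = optimal_n(c1=16, c2=1, var_s=4, var_e=1)
print(n)                                            # 8.0
print(reps_for_fixed_cost(240, 16, 1, n))           # 10.0
print(reps_for_fixed_variance(0.25, 4, 1, n))       # 6.0
```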