ANOVA Variance Component Estimation
Copyright © 2012 Dan Nettleton (Iowa State University), Statistics 611
We now consider the ANOVA approach to variance component
estimation.
The ANOVA approach is based on the method of moments.
Suppose
$$y = X\beta + Zu + e,$$
where $u$ and $e$ are independent random vectors satisfying $E(u) = 0$ and $E(e) = 0$.
Furthermore, suppose the $n \times q$ matrix $Z$ can be partitioned as
$$Z = [Z_1, \ldots, Z_m],$$
where $Z_j$ is $n \times q_j$, and $u$ can be partitioned correspondingly as
$$u = \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix}$$
so that
$$Zu = \sum_{j=1}^{m} Z_j u_j.$$
Suppose
$$\mathrm{Var}(u) = \mathrm{Var}\begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = \mathrm{diag}(\sigma_1^2 1_{q_1}', \ldots, \sigma_m^2 1_{q_m}') = \begin{bmatrix} \sigma_1^2 I_{q_1 \times q_1} & & 0 \\ & \ddots & \\ 0 & & \sigma_m^2 I_{q_m \times q_m} \end{bmatrix}.$$
Then
$$\mathrm{Var}(y) = \sum_{j=1}^{m+1} \sigma_j^2 Z_j Z_j',$$
where $Z_{m+1} \equiv I_{n \times n}$ and $\sigma_{m+1}^2 = \sigma_e^2$.
By Lemma 4.1,
$$\begin{aligned}
E(y'Ay) &= \mathrm{tr}(A\,\mathrm{Var}(y)) + [E(y)]' A E(y) \\
&= \sum_{j=1}^{m+1} \sigma_j^2\, \mathrm{tr}(A Z_j Z_j') + \beta' X' A X \beta \\
&= \sum_{j=1}^{m+1} \sigma_j^2\, \mathrm{tr}(Z_j' A Z_j) + \beta' X' A X \beta.
\end{aligned}$$
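The two steps behind these displays can be spelled out. The sketch below assumes $\mathrm{Var}(e) = \sigma_e^2 I$, which is what the convention $Z_{m+1} \equiv I$, $\sigma_{m+1}^2 = \sigma_e^2$ encodes, and uses the independence of $u$ and $e$, the block-diagonal form of $\mathrm{Var}(u)$, and the cyclic property of the trace.

```latex
\begin{aligned}
\mathrm{Var}(y) &= \mathrm{Var}(Zu + e) = Z\,\mathrm{Var}(u)\,Z' + \mathrm{Var}(e) \\
&= \sum_{j=1}^{m} \sigma_j^2 Z_j Z_j' + \sigma_e^2 I
 = \sum_{j=1}^{m+1} \sigma_j^2 Z_j Z_j', \\
\mathrm{tr}(A Z_j Z_j') &= \mathrm{tr}(Z_j' A Z_j)
 \quad \text{because } \mathrm{tr}(BC) = \mathrm{tr}(CB).
\end{aligned}
```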
Now suppose we choose a set of matrices $A_1, \ldots, A_{m+1}$ such that
$$X' A_i X = 0 \quad \forall\, i = 1, \ldots, m+1.$$
Then
$$E(y' A_i y) = \sum_{j=1}^{m+1} \sigma_j^2\, \mathrm{tr}(Z_j' A_i Z_j) \quad \forall\, i = 1, \ldots, m+1.$$
If we use the method of moments, we replace $E(y' A_i y)$ with its observed value $y' A_i y$ to obtain the equations
$$y' A_i y = \sum_{j=1}^{m+1} \hat\sigma_j^2\, \mathrm{tr}(Z_j' A_i Z_j) \quad \forall\, i = 1, \ldots, m+1.$$
We can write these equations in matrix form as
$$\begin{bmatrix}
\mathrm{tr}(Z_1' A_1 Z_1) & \cdots & \mathrm{tr}(Z_{m+1}' A_1 Z_{m+1}) \\
\mathrm{tr}(Z_1' A_2 Z_1) & \cdots & \mathrm{tr}(Z_{m+1}' A_2 Z_{m+1}) \\
\vdots & & \vdots \\
\mathrm{tr}(Z_1' A_{m+1} Z_1) & \cdots & \mathrm{tr}(Z_{m+1}' A_{m+1} Z_{m+1})
\end{bmatrix}
\begin{bmatrix} \hat\sigma_1^2 \\ \hat\sigma_2^2 \\ \vdots \\ \hat\sigma_{m+1}^2 \end{bmatrix}
=
\begin{bmatrix} y' A_1 y \\ y' A_2 y \\ \vdots \\ y' A_{m+1} y \end{bmatrix}.$$
Solving for $\hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_{m+1}^2$ gives the ANOVA estimates.
The matrices $A_1, \ldots, A_{m+1}$ are usually chosen so that
$$y' A_1 y, \ldots, y' A_{m+1} y$$
correspond to sums of squares from an ANOVA table.
Many strategies for choosing $A_1, \ldots, A_{m+1}$ have been proposed.
The book Variance Components by Searle, Casella, and
McCulloch contains an extensive discussion of several
strategies.
Let $M$ denote the matrix whose $(i, j)$th element is $\mathrm{tr}(Z_j' A_i Z_j)$.
Let
$$s = [y' A_1 y, \; y' A_2 y, \; \ldots, \; y' A_{m+1} y]'.$$
If $M$ is nonsingular, then the vector of ANOVA variance component estimates is
$$\hat\sigma^2 \equiv \begin{bmatrix} \hat\sigma_1^2 \\ \vdots \\ \hat\sigma_{m+1}^2 \end{bmatrix} = M^{-1} s.$$
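As a rough illustration of the solve step, the sketch below computes $M^{-1}s$ by Gauss-Jordan elimination in exact rational arithmetic. The function name `solve_linear_system` and the numbers in `M` and `s` are hypothetical and chosen only for illustration. They are not taken from the notes.

```python
from fractions import Fraction


def solve_linear_system(M, s):
    """Solve M x = s by Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(M)
    # Augmented matrix [M | s] with Fraction entries to avoid rounding error.
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, s)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("M is singular; the moment equations have no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot equals 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Hypothetical 2x2 example: M and s are made-up numbers.
M = [[4, 2], [0, 6]]
s = [10, 12]
sigma2_hat = solve_linear_system(M, s)
print(sigma2_hat)  # [Fraction(3, 2), Fraction(2, 1)]
```

The singular case is raised as an error rather than handled, matching the "if $M$ is nonsingular" condition.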
Recall $M$ and $s$ were chosen so that
$$M\sigma^2 = E(s),$$
where
$$\sigma^2 = \begin{bmatrix} \sigma_1^2 \\ \vdots \\ \sigma_{m+1}^2 \end{bmatrix}.$$
Thus,
$$E(\hat\sigma^2) = E(M^{-1} s) = M^{-1} E(s) = M^{-1} M \sigma^2 = \sigma^2.$$
∴ the ANOVA estimator of $\sigma^2$ is unbiased.
Find ANOVA-based estimates of $\sigma_p^2$ and $\sigma_e^2$ for the seedling dry weight example.
Recall
$$X = \begin{bmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1
\end{bmatrix},
\qquad
Z = \begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}.$$
Because there is only one variance component besides the error variance, we have $m = 1$, $Z_1 = Z$, and $Z_2 = I$.
We need to identify matrices $A_1$ and $A_2$ such that $X' A_1 X = 0$ and $X' A_2 X = 0$.
Of course, we also require the quadratic forms $y' A_1 y$ and $y' A_2 y$ to contain information about $\sigma_p^2$ and $\sigma_e^2$.
To find appropriate $A_1$ and $A_2$, start by writing down an ANOVA table with columns labeled Source, Matrix, and DF.
Source           Matrix         DF
Intercept        $P_1$          1
Genotype         $P_X - P_1$    2 − 1 = 1
Pot(Genotype)    $P_Z - P_X$    4 − 2 = 2
Error            $I - P_Z$      10 − 4 = 6
Total            $I$            10
Which of the matrices in the ANOVA table satisfy $X' A X = 0$?
Let
$$A_1 = P_Z - P_X \quad \text{and} \quad A_2 = I - P_Z.$$
∵ $C(X) \subset C(Z)$, $P_Z X = X$.
∴ $X'(P_Z - P_X)X = X'(P_Z X - P_X X) = X'(X - X) = 0.$
Also, $X'(I - P_Z)X = X'(X - P_Z X) = X'(X - X) = 0.$
∴ $X' A_i X = 0 \;\; \forall\, i = 1, 2.$
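The argument can be checked numerically for the seedling layout. This is only a sketch. It assumes that $P_X$ and $P_Z$ can be built as group-averaging matrices, which holds here because $C(X)$ is spanned by the genotype indicators and $C(Z)$ by the pot indicators. The helper names (`group_projection`, `matmul`, `transpose`) are hypothetical.

```python
from fractions import Fraction

# Seedling layout from the notes: genotype and pot label for each of the 10 rows.
genotype = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
pot = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]


def group_projection(labels):
    """Projection onto the span of the indicator columns of `labels`.

    Entry (r, c) is 1/(group size) when rows r and c share a label, else 0.
    """
    n = len(labels)
    size = {g: labels.count(g) for g in set(labels)}
    return [[Fraction(1, size[labels[r]]) if labels[r] == labels[c] else Fraction(0)
             for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# X = [intercept, genotype 1 indicator, genotype 2 indicator].
X = [[1, int(g == 1), int(g == 2)] for g in genotype]
n = len(genotype)
I = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

P_X = group_projection(genotype)  # C(X) equals the span of the genotype indicators
P_Z = group_projection(pot)

A1 = [[pz - px for pz, px in zip(rz, rx)] for rz, rx in zip(P_Z, P_X)]
A2 = [[i - pz for i, pz in zip(ri, rz)] for ri, rz in zip(I, P_Z)]

for A in (A1, A2):
    XtAX = matmul(matmul(transpose(X), A), X)
    print(all(v == 0 for row in XtAX for v in row))  # True for both A1 and A2
```

Exact fractions are used so that the zero matrix is obtained exactly rather than up to floating-point error.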
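To apply the ANOVA variance component estimation method, we need to set up the equations
$$\begin{bmatrix}
\mathrm{tr}(Z_1' A_1 Z_1) & \mathrm{tr}(Z_2' A_1 Z_2) \\
\mathrm{tr}(Z_1' A_2 Z_1) & \mathrm{tr}(Z_2' A_2 Z_2)
\end{bmatrix}
\begin{bmatrix} \hat\sigma_p^2 \\ \hat\sigma_e^2 \end{bmatrix}
=
\begin{bmatrix} y' A_1 y \\ y' A_2 y \end{bmatrix}$$
$$\iff
\begin{bmatrix}
\mathrm{tr}(Z'(P_Z - P_X)Z) & \mathrm{tr}(P_Z - P_X) \\
\mathrm{tr}(Z'(I - P_Z)Z) & \mathrm{tr}(I - P_Z)
\end{bmatrix}
\begin{bmatrix} \hat\sigma_p^2 \\ \hat\sigma_e^2 \end{bmatrix}
=
\begin{bmatrix} y'(P_Z - P_X)y \\ y'(I - P_Z)y \end{bmatrix}.$$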
Let $n_{ij}$ = number of seedlings in the $j$th pot for genotype $i$.
Thus, we have $n_{11} = 3$, $n_{12} = 2$, $n_{21} = 3$, $n_{22} = 2$.
Write $y' A_1 y = y'(P_Z - P_X)y$ and $y' A_2 y = y'(I - P_Z)y$ using summation notation.
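$$\begin{aligned}
y' A_1 y &= y'(P_Z - P_X)y = \|(P_Z - P_X)y\|^2 = \|P_Z y - P_X y\|^2 \\
&= \left\|
\begin{bmatrix}
\bar y_{11\cdot}\, 1_{3 \times 1} \\
\bar y_{12\cdot}\, 1_{2 \times 1} \\
\bar y_{21\cdot}\, 1_{3 \times 1} \\
\bar y_{22\cdot}\, 1_{2 \times 1}
\end{bmatrix}
-
\begin{bmatrix}
\bar y_{1\cdot\cdot}\, 1_{3 \times 1} \\
\bar y_{1\cdot\cdot}\, 1_{2 \times 1} \\
\bar y_{2\cdot\cdot}\, 1_{3 \times 1} \\
\bar y_{2\cdot\cdot}\, 1_{2 \times 1}
\end{bmatrix}
\right\|^2 \\
&= 3(\bar y_{11\cdot} - \bar y_{1\cdot\cdot})^2 + 2(\bar y_{12\cdot} - \bar y_{1\cdot\cdot})^2
 + 3(\bar y_{21\cdot} - \bar y_{2\cdot\cdot})^2 + 2(\bar y_{22\cdot} - \bar y_{2\cdot\cdot})^2 \\
&= \sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2.
\end{aligned}$$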
$$\begin{aligned}
y' A_2 y &= y'(I - P_Z)y = \|(I - P_Z)y\|^2 = \|y - P_Z y\|^2 \\
&= \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar y_{ij\cdot})^2.
\end{aligned}$$
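The summation forms of $y'A_1y$ and $y'A_2y$ translate directly into code. The sketch below uses made-up integer dry weights arranged as `data[i][j][k]` for $y_{ijk}$. The data values and the function name `anova_sums_of_squares` are hypothetical, and the use of `Fraction` assumes integer (or rational) inputs.

```python
from fractions import Fraction

# Hypothetical dry weights, indexed as data[i][j][k] = y_ijk, with pot sizes 3, 2, 3, 2.
data = [
    [[1, 2, 3], [4, 6]],   # genotype 1: pots 1 and 2
    [[2, 2, 2], [5, 7]],   # genotype 2: pots 1 and 2
]


def mean(values):
    """Exact mean of a list of integers."""
    return Fraction(sum(values), len(values))


def anova_sums_of_squares(data):
    """Return (y'A1y, y'A2y) = (pot-within-genotype SS, error SS)."""
    ss_pot = Fraction(0)
    ss_err = Fraction(0)
    for pots in data:
        # Genotype mean ybar_i.. averages over every seedling of the genotype.
        genotype_mean = mean([y for pot in pots for y in pot])
        for pot in pots:
            pot_mean = mean(pot)  # ybar_ij.
            ss_pot += len(pot) * (pot_mean - genotype_mean) ** 2
            ss_err += sum((y - pot_mean) ** 2 for y in pot)
    return ss_pot, ss_err


ss_pot, ss_err = anova_sums_of_squares(data)
print(ss_pot, ss_err)  # 30 6
```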
Find $\mathrm{tr}(Z'(P_Z - P_X)Z)$, $\mathrm{tr}(Z'(I - P_Z)Z)$, $\mathrm{tr}(P_Z - P_X)$, and $\mathrm{tr}(I - P_Z)$.

Note that
$$P_X Z = \frac{1}{5} \begin{bmatrix}
3 & 2 & 0 & 0 \\
3 & 2 & 0 & 0 \\
3 & 2 & 0 & 0 \\
3 & 2 & 0 & 0 \\
3 & 2 & 0 & 0 \\
0 & 0 & 3 & 2 \\
0 & 0 & 3 & 2 \\
0 & 0 & 3 & 2 \\
0 & 0 & 3 & 2 \\
0 & 0 & 3 & 2
\end{bmatrix}.$$
Thus,
$$Z' P_X Z = \frac{1}{5} \begin{bmatrix}
9 & 6 & 0 & 0 \\
6 & 4 & 0 & 0 \\
0 & 0 & 9 & 6 \\
0 & 0 & 6 & 4
\end{bmatrix}.$$

$$Z' P_Z Z = Z' Z = \begin{bmatrix}
3 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 2
\end{bmatrix}.$$
Thus,
$$\begin{aligned}
\mathrm{tr}(Z' A_1 Z) &= \mathrm{tr}(Z'(P_Z - P_X)Z) \\
&= \mathrm{tr}(Z' P_Z Z - Z' P_X Z) \\
&= \mathrm{tr}(Z' Z) - \mathrm{tr}(Z' P_X Z) \\
&= 10 - \frac{26}{5} = 4.8.
\end{aligned}$$
$$\begin{aligned}
\mathrm{tr}(Z' A_2 Z) &= \mathrm{tr}(Z'(I - P_Z)Z) = \mathrm{tr}(Z'(Z - P_Z Z)) = \mathrm{tr}(Z'(Z - Z)) = 0. \\
\mathrm{tr}(A_1) &= \mathrm{tr}(P_Z - P_X) = \mathrm{tr}(P_Z) - \mathrm{tr}(P_X) = \mathrm{rank}(Z) - \mathrm{rank}(X) = 4 - 2 = 2. \\
\mathrm{tr}(A_2) &= \mathrm{tr}(I - P_Z) = 10 - 4 = 6.
\end{aligned}$$
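The four traces depend only on the pot sizes $n_{ij}$. The sketch below is not from the notes. It assumes the count identity $\mathrm{tr}(Z'P_XZ) = \sum_i \sum_j n_{ij}^2 / n_{i\cdot}$, which follows from the pattern of $P_XZ$ in this example, and it assumes every pot is nonempty so that $\mathrm{rank}(Z)$ is the number of pots and $\mathrm{rank}(X)$ is the number of genotypes. The function name `moment_matrix` is hypothetical.

```python
from fractions import Fraction


def moment_matrix(pot_sizes):
    """Return M for A1 = P_Z - P_X and A2 = I - P_Z in a pots-within-genotypes layout.

    pot_sizes[i][j] is n_ij, the number of seedlings in pot j of genotype i.
    """
    n_total = sum(sum(sizes) for sizes in pot_sizes)      # tr(Z'Z) = tr(I)
    n_pots = sum(len(sizes) for sizes in pot_sizes)       # rank(Z)
    n_genotypes = len(pot_sizes)                          # rank(X)
    # tr(Z' P_X Z) = sum_i sum_j n_ij^2 / n_i. (assumed count identity)
    tr_ZPxZ = sum(Fraction(sum(n * n for n in sizes), sum(sizes)) for sizes in pot_sizes)
    return [
        [n_total - tr_ZPxZ, Fraction(n_pots - n_genotypes)],   # row for A1
        [Fraction(0), Fraction(n_total - n_pots)],             # row for A2
    ]


print(moment_matrix([[3, 2], [3, 2]]))
# [[Fraction(24, 5), Fraction(2, 1)], [Fraction(0, 1), Fraction(6, 1)]]
print(moment_matrix([[3, 2], [3, 3]])[0][0])  # 27/5
```

The first call reproduces the values 4.8, 2, 0, and 6. The second call uses a hypothetical layout in which the last pot holds 3 seedlings, which changes the coefficient of $\hat\sigma_p^2$ to 5.4.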
Find expressions for the ANOVA estimators $\hat\sigma_p^2$ and $\hat\sigma_e^2$.
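The ANOVA/method of moments equations are
$$\begin{bmatrix}
\mathrm{tr}(Z'(P_Z - P_X)Z) & \mathrm{tr}(P_Z - P_X) \\
\mathrm{tr}(Z'(I - P_Z)Z) & \mathrm{tr}(I - P_Z)
\end{bmatrix}
\begin{bmatrix} \hat\sigma_p^2 \\ \hat\sigma_e^2 \end{bmatrix}
=
\begin{bmatrix} y'(P_Z - P_X)y \\ y'(I - P_Z)y \end{bmatrix}$$
$$\iff
\begin{bmatrix} 4.8 & 2 \\ 0 & 6 \end{bmatrix}
\begin{bmatrix} \hat\sigma_p^2 \\ \hat\sigma_e^2 \end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2 \\
\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar y_{ij\cdot})^2
\end{bmatrix}$$
$$\iff
\hat\sigma_e^2 = \frac{\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar y_{ij\cdot})^2}{6},
\qquad
\hat\sigma_p^2 = \frac{\sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2 - 2\hat\sigma_e^2}{4.8}.$$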
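Because the coefficient matrix is upper triangular, the estimators follow by back-substitution. The sketch below applies that arithmetic to made-up sums of squares. The function name `seedling_anova_estimates` and the input values are hypothetical.

```python
from fractions import Fraction


def seedling_anova_estimates(ss_pot, ss_err):
    """Back-substitute in [[4.8, 2], [0, 6]] [sp2, se2]' = [ss_pot, ss_err]'."""
    sigma2_e = Fraction(ss_err) / 6
    sigma2_p = (Fraction(ss_pot) - 2 * sigma2_e) / Fraction(24, 5)  # 4.8 = 24/5
    return sigma2_p, sigma2_e


# Hypothetical sums of squares, chosen only to illustrate the arithmetic.
print(seedling_anova_estimates(30, 6))  # (Fraction(35, 6), Fraction(1, 1))
print(seedling_anova_estimates(1, 6))   # (Fraction(-5, 24), Fraction(1, 1))
```

The second call shows that nothing in the moment equations prevents a negative value of $\hat\sigma_p^2$ when the pot sum of squares is small relative to the error sum of squares.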
The ANOVA estimates of the variance components are
sometimes equal to the REML estimates.
This equality occurs for certain types of balanced designs and
when the ANOVA estimates are positive.
For the seedling dry weight example, the REML and ANOVA
estimates agree when the latter are positive.
However, if the last pot contained 3 seedlings instead of 2 (for
example), the REML and ANOVA estimates would differ.