ANOVA Variance Component Estimation c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 32 We now consider the ANOVA approach to variance component estimation. The ANOVA approach is based on the method of moments. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 2 / 32 Suppose y = Xβ + Zu + e, where u and e are independent random vectors satisfying E(u) = 0 and c Copyright 2012 Dan Nettleton (Iowa State University) E(e) = 0. Statistics 611 3 / 32 Furthermore, suppose n×q Z can be partitioned as Z = [ Z1 , . . . , Zm ] n×q1 n×qm and u can be partitioned correspondingly as u1 . . u= . um so that Zu = c Copyright 2012 Dan Nettleton (Iowa State University) m X Zj uj . j=1 Statistics 611 4 / 32 Suppose u1 . 2 0 2 0 . Var(u) = Var . = diag(σ1 1q1 , . . . , σm 1qm ) um σ12 I 0 q1 ×q1 ... = . 0 σm2 I qm ×qm c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 5 / 32 Then Var(y) = m+1 X σj2 Zj Z0j , j=1 2 where Zm+1 ≡n×n I and σm+1 = σe2 . By Lemma 4.1, E(y0 Ay) = tr(AVar(y)) + [E(y)]0 AE(y) m+1 X = σj2 tr(AZj Z0j ) + β 0 X0 AXβ j=1 = m+1 X σj2 tr(Z0j AZj ) + β 0 X0 AXβ. j=1 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 6 / 32 Now suppose we choose a set of matrices A1 , . . . , Am+1 3 X0 Ai X = 0 ∀ i = 1, . . . , m + 1. Then 0 E(y Ai y) = m+1 X σj2 tr(Z0j Ai Zj ) ∀ i = 1, . . . , m + 1. j=1 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 7 / 32 If we use the method of moments of moments, we replace E(y0 Ai y) with its observed value y0 Ai y to obtain the equations 0 y Ai y = m+1 X σ̂j2 tr(Z0j Ai Zj ) ∀ i = 1, . . . , m + 1. j=1 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 8 / 32 We can write these equations in matrix form as tr(Z01 A1 Z1 ) ... tr(Z0m+1 A1 Zm+1 ) σ̂12 y0 A1 y tr(Z01 A2 Z1 ) . . . tr(Z0m+1 A2 Zm+1 ) σ̂22 y0 A2 y = .. .. .. .. .. . . . . . . 0 0 2 tr(Z1 Am+1 Z1 ) . . . tr(Zm+1 Am+1 Zm+1 ) σ̂m+1 y0 Am+1 y 2 Solving for σ̂12 , σ22 , . . . , σ̂m+1 gives the ANOVA estimates. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 9 / 32 The matrices A1 , . . . , Am+1 are usually chosen so that y0 A1 y, . . . , y0 Am+1 y correspond to sums of squares from an ANOVA table. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 10 / 32 Many strategies for choosing A1 , . . . , Am+1 have been proposed. The book Variance Components by Searle, Casella, and McCulloch contains an extensive discussion of several strategies. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 11 / 32 Let M denote the matrix whose (i, j)th element is tr(Z0j Ai Zj ). Let s = [y0 A1 y, y0 A2 y, . . . , y0 Am+1 y]0 . c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 12 / 32 If M is nonsingular, then the vector of ANOVA variance component estimates is σ̂12 . −1 . σ̂ 2 ≡ . = M s. 2 σ̂m+1 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 13 / 32 Recall M and s were chosen so that Mσ 2 = E(s), where σ12 . . σ2 = . . 2 σm+1 Thus, E(σ̂ 2 ) = E(M−1 s) = M−1 E(s) = M−1 Mσ 2 = σ 2 . ∴ the ANOVA estimator of σ 2 is unbiased. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 14 / 32 Find ANOVA-based estimates of σp2 and σe2 for the seedling dry weight example. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 15 / 32 Recall 1 1 1 1 1 X= 1 1 1 1 1 1 0 1 0 1 0 1 0 1 0 , 0 1 0 1 0 1 0 1 0 1 c Copyright 2012 Dan Nettleton (Iowa State University) 1 1 1 0 0 Z= 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 . 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 Statistics 611 16 / 32 Because there is only one variance component besides the error variance, we have m = 1, Z1 = Z, and Z2 = I. We need to identify matrices A1 and A2 such that X0 A1 X = 0 and X0 A2 X = 0. Of course, we also require the quadratic forms y0 A1 y and y0 A2 y to contain information about σp2 and σe2 . To find appropriate A1 and A2 , start by writing down an ANOVA table with columns labeled Source, Matrix, and DF. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 17 / 32 Source Matrix DF Intercept P1 1 Genotype PX − P1 2−1=1 Pot(Genotype) PZ − PX 4−2=2 Error I − PZ 10 − 4 = 6 Total I 10 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 18 / 32 Which of the matrices in the ANOVA table satisfy X0 AX = 0? c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 19 / 32 Let A1 = PZ − PX and A2 = I − PZ . ∵ C(X) ⊂ C(Z), PZ X = X. ∴ X0 (PZ − PX )X = X0 (PZ X − PX X) = X0 (X − X) = 0. Also, X0 (I − PZ )X = X0 (X − PZ X) = X0 (X − X) = 0. ∴ X0 Ai X = 0 ∀ i = 1, 2. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 20 / 32 To apply the ANOVA variance component estimation method, we need to set up the equations " #" # tr(Z01 A1 Z1 ) tr(Z02 A1 Z2 ) σ̂p2 tr(Z01 A2 Z1 ) tr(Z02 A2 Z2 ) σ̂e2 " = # y0 A1 y y0 A2 y ⇐⇒ " tr(Z0 (PZ − PX )Z) tr(PZ − PX ) tr(Z0 (I − PZ )Z) tr(I − PZ ) c Copyright 2012 Dan Nettleton (Iowa State University) #" # σ̂p2 σ̂e2 " = # y0 (PZ − PX )y y0 (I − PZ )y . Statistics 611 21 / 32 Let nij = number of seedlings in the jth pot for genotype i. Thus, we have n11 = 3, n12 = 2, n21 = 3, n22 = 2. Write y0 A1 y = y0 (PZ − PX )y and y0 A2 y = y0 (I − PZ )y using summation notation. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 22 / 32 y0 A1 y = y0 (PZ − PX )y = k(PZ − PX )yk2 = kPZ y − PX yk2 2 ȳ11· 1 ȳ 1 1·· 3×1 3×1 ȳ12· 1 ȳ1·· 1 2×1 2×1 = ȳ 1 − ȳ 1 21· 3×1 2·· 3×1 ȳ22· 1 ȳ 1 2·· 2×1 2×1 = 3(ȳ11· − ȳ1·· )2 + 2(ȳ12· − ȳ1·· )2 +3(ȳ21· − ȳ2·· )2 + 2(ȳ22· − ȳ2·· )2 2 X 2 X = nij (ȳij· − ȳi·· )2 . i=1 j=1 c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 23 / 32 y0 A2 y = y0 (I − PZ )y = k(I − PZ )yk2 = ky − PZ yk2 nij 2 X 2 X X = (yijk − ȳij· )2 . c Copyright 2012 Dan Nettleton (Iowa State University) i=1 j=1 k=1 Statistics 611 24 / 32 Find tr(Z0 (PZ − PX )Z), tr(Z0 (I − PZ )Z), tr(PZ − PX ), and tr(I − PZ ). c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 25 / 32 3 3 3 3 3 Note that PX Z = 15 0 0 0 0 0 2 0 0 2 0 0 2 0 0 2 0 0 2 0 0 . Thus, Z0 PX Z = 0 3 2 0 3 2 0 3 2 0 3 2 0 3 2 c Copyright 2012 Dan Nettleton (Iowa State University) 9 6 0 0 1 6 4 0 0 5 . 0 0 9 6 0 0 6 4 Statistics 611 26 / 32 3 0 Z0 P Z Z = Z0 Z = 0 0 0 0 0 2 0 0 . 0 3 0 0 0 2 Thus, tr(Z0 A1 Z) = tr(Z0 (PZ − PX )Z) c Copyright 2012 Dan Nettleton (Iowa State University) = tr(Z0 PZ Z − Z0 PX Z) = tr(Z0 Z) − tr(Z0 PX Z) 26 = 4.8. = 10 − 5 Statistics 611 27 / 32 tr(Z0 A2 Z) = tr(Z0 (I − PZ )Z) = tr(Z0 (Z − PZ Z)) = tr(Z0 (Z − Z)) = 0. tr(A1 ) = tr(PZ − PX ) = tr(PZ ) − tr(PX ) = rank(Z) − rank(X) = 4 − 2 = 2. tr(A2 ) = tr(I − PZ ) = 10 − 4 = 6. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 28 / 32 Find expressions for the ANOVA estimators of σ̂p2 and σ̂e2 . c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 29 / 32 The ANOVA/Method of Moments equations are " tr(Z0 (PZ − PX )Z) tr(PZ − PX ) tr(Z0 (I − PZ )Z) " #" # 4.8 2 σ̂p2 0 σ̂e2 6 #" # σ̂p2 tr(I − PZ ) σ̂e2 " = # y0 (PZ − PX )y y0 (I − PZ )y ⇐⇒ " P P 2 2 nij (ȳij· − ȳi·· )2 = P2 i=1 P2 j=1 Pnij 2 k=1 (yijk − ȳij· ) i=1 j=1 # ⇐⇒ σ̂e2 = P2 i=1 P2 j=1 Pnij k=1 (yijk −ȳij· ) 6 c Copyright 2012 Dan Nettleton (Iowa State University) 2 , σ̂p2 = P2 i=1 P2 2 2 j=1 nij (ȳij· −ȳi·· ) −2σ̂e 4.8 . Statistics 611 30 / 32 The ANOVA estimates of the variance components are sometimes equal to the REML estimates. This equality occurs for certain types of balanced designs and when the ANOVA estimates are positive. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 31 / 32 For the seedling dry weight example, the REML and ANOVA estimates agree when the latter are positive. However, if last pot contained a 3 seedlings instead of 2 (for example), the REML and ANOVA estimates would differ. c Copyright 2012 Dan Nettleton (Iowa State University) Statistics 611 32 / 32