Best Linear Unbiased Prediction (BLUP)
© Copyright 2012 Dan Nettleton (Iowa State University), Statistics 611

Suppose y = Xβ + ε, where E(ε) = 0 and Var(ε) = Σ = σ²V.

Initially, we will assume the Aitken model holds, so that σ² > 0 is unknown and V is a known symmetric and positive definite (PD) matrix.

Suppose u is a random variable with mean 0 and finite variance. A linear predictor d + a′y of c′β + u is unbiased iff

    E(d + a′y) = E(c′β + u) = c′β   ∀ β ∈ ℝᵖ.

c′β + u is predictable iff there exists an unbiased linear predictor of c′β + u.

Note that c′β + u is predictable ⟺ ∃ a such that c′ = a′X. This follows from the same argument used to show that c′β is estimable ⟺ ∃ a such that c′ = a′X.

Also, d + a′y is an unbiased predictor of c′β + u iff d = 0 and c′ = a′X, as argued for the case of an unbiased estimator.

Suppose ŵ is a predictor of a random variable w. The mean squared error (MSE) of ŵ as a predictor of w is E(ŵ − w)².

Suppose Cov(ε, u) = σ²k for some known vector k, and suppose c′β + u is predictable. Prove that c′β̂_GLS + û has the lowest MSE among all unbiased linear predictors of c′β + u, where

    û ≡ [Cov(ε, u)]′[Var(ε)]⁻¹(y − Xβ̂_GLS) = k′V⁻¹(y − Xβ̂_GLS).

Proof: First note that c′β̂_GLS + û = c′β̂_GLS + k′V⁻¹(y − Xβ̂_GLS) has expectation

    c′β + k′V⁻¹(Xβ − Xβ) = c′β.
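As a numerical sketch of the predictor just defined (the particular X, V, k, y, and c below are made-up illustrative choices, not from the notes), β̂_GLS and û = k′V⁻¹(y − Xβ̂_GLS) can be computed directly:

```python
import numpy as np

# Hedged sketch: compute c'beta_GLS + u_hat = c'beta_GLS + k'V^{-1}(y - X beta_GLS)
# for a small simulated Aitken model. All numbers below are illustrative.
rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # full column rank
V = np.eye(n) + 0.15 * np.ones((n, n))                        # known symmetric PD
k = np.full(n, 0.2)                                           # Cov(eps, u) = sigma^2 k
y = rng.normal(size=n)
c = np.array([1.0, 1.0])

Vinv = np.linalg.inv(V)
# X has full column rank here, so X'V^{-1}X is invertible and beta_GLS is unique.
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_hat = k @ Vinv @ (y - X @ beta_gls)
blup = c @ beta_gls + u_hat
print(blup)
```

Collecting the coefficients on y into a single vector b, so that the predictor is b′y, recovers the same value.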
Thus, c′β̂_GLS + û is an unbiased predictor of c′β + u.

Now note that

    c′β̂_GLS + û = c′β̂_GLS + k′V⁻¹(y − Xβ̂_GLS)
                = (c′ − k′V⁻¹X)β̂_GLS + k′V⁻¹y
                = (c′ − k′V⁻¹X)(X′V⁻¹X)⁻X′V⁻¹y + k′V⁻¹y
                = [(c′ − k′V⁻¹X)(X′V⁻¹X)⁻X′V⁻¹ + k′V⁻¹]y
                ≡ b′y   (a linear predictor).

Because c′β̂_GLS + û = b′y is an unbiased predictor, we know b′X = c′.

Let a′y be any other linear unbiased predictor of c′β + u. Then a′X = c′.

The MSE of a′y is

    E[a′y − (c′β + u)]² = Var[a′y − (c′β + u)]
                        = Var(a′y − u)
                        = Var(a′y − b′y − u + b′y)
                        = Var[(a − b)′y] + Var(b′y − u) + 2Cov[(a − b)′y, b′y − u].

Now

    Cov[(a − b)′y, b′y − u] = (a − b)′Var(y)b − Cov[(a − b)′y, u]
                            = (a − b)′σ²Vb − (a − b)′Cov(y, u)
                            = σ²(a − b)′Vb − (a − b)′Cov(ε, u)
                            = σ²(a − b)′Vb − σ²(a − b)′k
                            = σ²(a − b)′(Vb − k).

Now

    Vb − k = V[V⁻¹X[(X′V⁻¹X)⁻]′(c − X′V⁻¹k) + V⁻¹k] − k
           = X[(X′V⁻¹X)⁻]′(c − X′V⁻¹k).

Thus, the covariance term is σ²(a − b)′X[(X′V⁻¹X)⁻]′(c − X′V⁻¹k), which is 0 because a′X = b′X = c′.

Thus, we have

    MSE(a′y) = Var[(a − b)′y] + Var(b′y − u)
             = Var[(a − b)′y] + Var[b′y − (c′β + u)]
             = Var[(a − b)′y] + E[b′y − (c′β + u)]²
             = Var[(a − b)′y] + MSE(b′y).

Thus, MSE(a′y) ≥ MSE(b′y), with equality iff Var[(a − b)′y] = 0 ⟺ a = b.
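The inequality MSE(a′y) ≥ MSE(b′y) can be checked exactly for a concrete case. In the sketch below (made-up V, k, and Var(u); σ² = 1), the competing unbiased predictor is the OLS-based a′y with a = X(X′X)⁻¹c, which predicts u by 0, and both MSEs are evaluated via the identity MSE(w′y) = Var(w′y − u) = w′Vw − 2w′k + Var(u), valid whenever w′X = c′:

```python
import numpy as np

# Illustrative check (made-up numbers): the BLUP coefficient vector b attains
# a smaller MSE than a competing linear unbiased predictor a'y with a'X = c'.
n = 5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
V = np.eye(n) + 0.2 * np.ones((n, n))   # sigma^2 = 1, so Var(eps) = V
k = np.full(n, 0.3)                     # Cov(eps, u) = k
var_u = 1.0                             # Var(u), chosen so Var([eps; u]) is PD
c = np.array([1.0, 2.0])

Vinv = np.linalg.inv(V)
XtVX_inv = np.linalg.inv(X.T @ Vinv @ X)
b = (c - k @ Vinv @ X) @ XtVX_inv @ X.T @ Vinv + k @ Vinv   # BLUP coefficients
a = X @ np.linalg.inv(X.T @ X) @ c                          # OLS-based competitor

def mse(w):
    # MSE(w'y) = Var(w'y - u) = w'Vw - 2 w'k + Var(u), when w'X = c'
    return w @ V @ w - 2 * w @ k + var_u

assert np.allclose(a @ X, c) and np.allclose(b @ X, c)  # both unbiased
print(mse(a) >= mse(b) - 1e-12)   # True: b'y has the smallest MSE
```

Here the MSE difference equals Var[(a − b)′y] ≥ 0 exactly, so no Monte Carlo simulation is needed.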
Thus c′β̂_GLS + û is the unique best linear unbiased predictor (BLUP) of c′β + u. ∎

In practice, Σ = σ²V involves unknown variance components in addition to the unknown σ². We replace the unknown variance components in c′β̂_GLS + û by their estimates to obtain "empirical" BLUPs (eBLUPs). This typically results in a nonlinear predictor whose properties are not so well characterized.

Example: Suppose y_ij is the average monthly milk production of the jth daughter of sire i. Suppose

    y_ij = µ + s_i + e_ij   for i = 1, …, t; j = 1, …, n_i,

where s_1, …, s_t are iid with mean 0 and variance σs², independent of the e_ij terms, which are iid with mean 0 and variance σe². Suppose σs²/σe² is a known constant d. Find an expression for the BLUP of µ + s_1.

Let y = [y_11, …, y_{1n_1}, y_21, …, y_{2n_2}, …, y_{tn_t}]′. Let X = 1 (n × 1), where n = Σᵢ n_i. Let β = [µ], and let

    ε = [s_1 + e_11, …, s_1 + e_{1n_1}, s_2 + e_21, …, s_2 + e_{2n_2}, …, s_t + e_{tn_t}]′.

Then Var(ε) is block diagonal with blocks of sizes n_1 × n_1, n_2 × n_2, …, n_t × n_t. Each block has σs² + σe² on the diagonal and σs² off the diagonal. The ith block (n_i × n_i) is

    σe²I + σs²11′ = σe²[I + d11′].

We wish to predict c′β + u, where c = [1], β = [µ], and u = s_1. Then

    Cov(ε, u) = Cov(ε, s_1) = [σs²·1_{n_1×1}; 0_{(n−n_1)×1}] = σe²[d·1_{n_1×1}; 0_{(n−n_1)×1}] ≡ σe²k.

We need β̂_GLS, a solution to the Aitken equations X′V⁻¹Xβ = X′V⁻¹y, where

    V = diag(I + d11′ (n_1 × n_1), …, I + d11′ (n_t × n_t))

and, for the n_i × n_i block,

    [I + d11′]⁻¹ = I − [d/(1 + dn_i)]11′.

Thus,

    X′V⁻¹X = 1′V⁻¹1 = Σᵢ 1′[I − d/(1 + dn_i)·11′]1
           = Σᵢ [n_i − dn_i²/(1 + dn_i)]
           = Σᵢ n_i/(1 + dn_i),

and, with y_i = [y_{i1}, …, y_{in_i}]′,

    X′V⁻¹y = Σᵢ 1′[I − d/(1 + dn_i)·11′]y_i
           = Σᵢ [y_i· − dn_i y_i·/(1 + dn_i)]
           = Σᵢ [1 − dn_i/(1 + dn_i)] y_i·
           = Σᵢ n_i ȳ_i·/(1 + dn_i).

Thus,

    β̂_GLS = µ̂_GLS = (X′V⁻¹X)⁻X′V⁻¹y
           = [Σᵢ n_i ȳ_i·/(1 + dn_i)] / [Σᵢ n_i/(1 + dn_i)]
           = [Σᵢ (σe²/n_i + σs²)⁻¹ ȳ_i·] / [Σᵢ (σe²/n_i + σs²)⁻¹].

Note that each ȳ_i· is an unbiased estimator of µ with variance σe²/n_i + σs². Thus, µ̂_GLS is a weighted average of independent unbiased estimators of µ, with the inverse variances of the estimators as weights.

Now, ŝ_1 = û = k′V⁻¹(y − Xβ̂_GLS). First,

    k′V⁻¹y = d1′[I − d/(1 + dn_1)·11′]y_1
           = dy_1· − d²n_1 y_1·/(1 + dn_1)
           = [(d + d²n_1 − d²n_1)/(1 + dn_1)] y_1·
           = dn_1 ȳ_1·/(1 + dn_1).

Similarly,

    k′V⁻¹Xβ̂_GLS = d1′[I − d/(1 + dn_1)·11′]1 µ̂_GLS
                 = [dn_1 − d²n_1²/(1 + dn_1)] µ̂_GLS
                 = dn_1 µ̂_GLS/(1 + dn_1).

Thus, the BLUP of µ + s_1 is

    µ̂_GLS + [dn_1/(1 + dn_1)] ȳ_1· − [dn_1/(1 + dn_1)] µ̂_GLS
      = [dn_1/(1 + dn_1)] ȳ_1· + [1 − dn_1/(1 + dn_1)] µ̂_GLS
      = [σs²/(σe²/n_1 + σs²)] ȳ_1· + [(σe²/n_1)/(σe²/n_1 + σs²)] µ̂_GLS.

The mean squared error (MSE) of a predictor ŵ of a random vector w is

    MSE(ŵ) = E[(ŵ − w)(ŵ − w)′].

Now suppose u (q × 1) is a random vector with mean 0 and Cov(ε, u) = σ²K (n × q). The BLUP of a predictable Cβ + u is Cβ̂_GLS + û, where

    û ≡ [Cov(ε, u)]′[Var(ε)]⁻¹(y − Xβ̂_GLS) = K′V⁻¹(y − Xβ̂_GLS).

Cβ̂_GLS + û is the "best" linear unbiased predictor in the sense that MSE(ŵ) − MSE(Cβ̂_GLS + û) is nonnegative definite for every linear unbiased predictor ŵ of Cβ + u.

As in the case of univariate prediction, we replace unknown variance components in Cβ̂_GLS + û by their estimates to obtain an eBLUP whenever necessary.

For example, suppose

    y = Xβ + Zu + e,

where E(u) = 0, Var(u) = G, E(e) = 0, Var(e) = R, and Cov(u, e) = 0.

Then, with ε = Zu + e, we have Var(ε) = ZGZ′ + R and

    Cov(ε, u) = Cov(Zu + e, u) = Cov(Zu, u) = Z Cov(u, u) = Z Var(u) = ZG.

It follows that the BLUP of u is

    GZ′(ZGZ′ + R)⁻¹(y − Xβ̂_GLS).

Similarly, if D is a known m × q matrix, the BLUP of Du is

    DGZ′(ZGZ′ + R)⁻¹(y − Xβ̂_GLS).

When G and R are unknown, we use the eBLUP

    DĜZ′(ZĜZ′ + R̂)⁻¹(y − Xβ̂_GLS).

Note that computing β̂_GLS or computing the BLUP of Du requires inversion of the n × n matrix ZGZ′ + R.
This can be computationally expensive. We often assume that R = σe²I, but even then ZGZ′ + σe²I is an n × n matrix that may be difficult to invert.

Until further notice, assume R = σe²I for some unknown σe² > 0. Define H = (1/σe²)G. Then

    Var(y) = ZGZ′ + σe²I = σe²(ZHZ′ + I) = σe²V, where V = ZHZ′ + I.

C. R. Henderson introduced the Mixed Model Equations (MME):

    [ X′X       X′Z      ] [ β̃ ]   [ X′y ]
    [ Z′X   H⁻¹ + Z′Z    ] [ ũ ] = [ Z′y ].

Recall that the Aitken equations (AE) are X′V⁻¹Xβ̃ = X′V⁻¹y, which in our special case are equivalent to

    X′(ZHZ′ + I)⁻¹Xβ̃ = X′(ZHZ′ + I)⁻¹y.

Henderson showed that:

(1) The MME are consistent.
(2) If [β̃′, ũ′]′ solves the MME, then β̃ is a solution to the AE (equivalently, Xβ̃ = Xβ̂_GLS) and ũ is the BLUP of u.
(3) Conversely, if β̃ solves the AE, then the MME have a solution whose leading subvector is β̃.

A nice thing about this result is that we can find β̂_GLS and the BLUP of u (or Du) without inverting the n × n matrix ZGZ′ + σe²I. The MME coefficient matrix

    [ X′X       X′Z      ]
    [ Z′X   H⁻¹ + Z′Z    ]

is (p + q) × (p + q). In some problems, p + q ≪ n.

By (2), the BLUP of a predictable Cβ + Du is given by Cβ̃ + Dũ, where [β̃′, ũ′]′ solves the MME.

To prove Henderson's results, we begin with the following lemma.

Lemma H1: Suppose A is a symmetric and positive definite matrix, and suppose W is any matrix whose number of columns equals the number of rows of A.
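Henderson's results are easy to spot-check numerically. In the sketch below (dimensions, X, Z, H, and y are all made-up choices), solving the (p + q) × (p + q) MME reproduces both the GLS solution and the BLUP HZ′V⁻¹(y − Xβ̃) computed directly with the n × n inverse:

```python
import numpy as np

# Sketch with made-up data: the MME solution matches GLS + BLUP computed directly.
rng = np.random.default_rng(2)
n, p, q = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # full column rank
Z = rng.normal(size=(n, q))
H = np.diag([0.5, 1.0, 2.0])                            # H = G / sigma_e^2, PD
y = rng.normal(size=n)

# Mixed Model Equations (special case R = sigma_e^2 I)
M = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, np.linalg.inv(H) + Z.T @ Z]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(M, rhs)
beta_t, u_t = sol[:p], sol[p:]

# Direct computation with the n x n matrix V = Z H Z' + I
V = Z @ H @ Z.T + np.eye(n)
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = H @ Z.T @ Vinv @ (y - X @ beta_gls)

print(np.allclose(beta_t, beta_gls), np.allclose(u_t, u_blup))
```

Because X here has full column rank, the AE have a unique solution and the agreement is exact up to floating-point error; in rank-deficient problems only Xβ̃ is unique.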
Then

    AW′(I + WAW′)⁻¹ = (A⁻¹ + W′W)⁻¹W′.

Proof: First we show that I + WAW′ and A⁻¹ + W′W are each nonsingular. For all x ≠ 0,

    x′(I + WAW′)x = x′x + x′WAW′x = ‖x‖² + ‖A^{1/2}W′x‖² ≥ ‖x‖² > 0.

Thus, I + WAW′ is PD and therefore nonsingular. For all x ≠ 0,

    x′(A⁻¹ + W′W)x = x′A⁻¹x + x′W′Wx = x′A⁻¹x + ‖Wx‖² ≥ x′A⁻¹x > 0,

because A PD ⟹ A⁻¹ PD. Therefore A⁻¹ + W′W is nonsingular.

Now note that

    (A⁻¹ + W′W)AW′ = A⁻¹AW′ + W′WAW′ = W′ + W′WAW′ = W′(I + WAW′).

Multiplying on the left by (A⁻¹ + W′W)⁻¹ gives

    AW′ = (A⁻¹ + W′W)⁻¹W′(I + WAW′).

Multiplying on the right by (I + WAW′)⁻¹ gives the result. ∎

Lemma H1 now yields the following.

Lemma H2: V⁻¹ = (I + ZHZ′)⁻¹ = I − Z(H⁻¹ + Z′Z)⁻¹Z′.

Proof: Applying Lemma H1 with H = A and Z = W yields

    (H⁻¹ + Z′Z)⁻¹Z′ = HZ′(I + ZHZ′)⁻¹.   (∗)

Thus,

    [I − Z(H⁻¹ + Z′Z)⁻¹Z′](I + ZHZ′) = [I − ZHZ′(I + ZHZ′)⁻¹](I + ZHZ′)   by (∗)
                                     = I + ZHZ′ − ZHZ′ = I. ∎

Now let's prove Henderson's results, beginning with (2). Suppose [β̃′, ũ′]′ solves the MME. Then

    X′Xβ̃ + X′Zũ = X′y   and   Z′Xβ̃ + (H⁻¹ + Z′Z)ũ = Z′y.
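Both lemmas are exact matrix identities and can be spot-checked with arbitrary random inputs (the matrices below are made-up; any symmetric PD A and H work):

```python
import numpy as np

# Spot-check of Lemmas H1 and H2 with arbitrary random matrices.
rng = np.random.default_rng(3)
m, r = 4, 6                      # A is m x m, W is r x m
B = rng.normal(size=(m, m))
A = B @ B.T + np.eye(m)          # symmetric positive definite
W = rng.normal(size=(r, m))

# Lemma H1: A W' (I + W A W')^{-1} = (A^{-1} + W'W)^{-1} W'
lhs = A @ W.T @ np.linalg.inv(np.eye(r) + W @ A @ W.T)
rhs = np.linalg.inv(np.linalg.inv(A) + W.T @ W) @ W.T
print(np.allclose(lhs, rhs))

# Lemma H2: (I + Z H Z')^{-1} = I - Z (H^{-1} + Z'Z)^{-1} Z'
Z = rng.normal(size=(r, m))
H = A                            # any symmetric PD matrix works
Vinv = np.linalg.inv(np.eye(r) + Z @ H @ Z.T)
print(np.allclose(Vinv,
                  np.eye(r) - Z @ np.linalg.inv(np.linalg.inv(H) + Z.T @ Z) @ Z.T))
```

Lemma H2 is the computational payoff: inverting the r × r matrix I + ZHZ′ is reduced to inverting the (typically much smaller) m × m matrix H⁻¹ + Z′Z.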
The 2nd equation implies

    (H⁻¹ + Z′Z)ũ = Z′y − Z′Xβ̃
    ⟹ ũ = (H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃)
    ⟹ ũ = HZ′(I + ZHZ′)⁻¹(y − Xβ̃)   by (∗)
    ⟹ ũ = HZ′V⁻¹(y − Xβ̃) = GZ′Σ⁻¹(y − Xβ̃).

This last expression for ũ shows that ũ is the BLUP of u as long as β̃ solves the Aitken equations. To show this, we examine the 1st MME:

    X′Xβ̃ + X′Zũ = X′y.

We have shown ũ = (H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃). Combining these equations gives

    X′Xβ̃ + X′Z(H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃) = X′y.

By Lemma H2, Z(H⁻¹ + Z′Z)⁻¹Z′ = I − V⁻¹. Thus,

    X′Xβ̃ + X′(I − V⁻¹)(y − Xβ̃) = X′y
    ⟺ X′Xβ̃ + X′y − X′Xβ̃ − X′V⁻¹y + X′V⁻¹Xβ̃ = X′y
    ⟺ X′V⁻¹Xβ̃ = X′V⁻¹y.

Therefore β̃ solves the AE, and Henderson's result (2) follows.

Now let's prove (3). Suppose β̃ solves the AE. Let

    ũ = HZ′V⁻¹(y − Xβ̃) = HZ′(I + ZHZ′)⁻¹(y − Xβ̃) = (H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃)   by (∗).

Then

    (H⁻¹ + Z′Z)ũ = Z′(y − Xβ̃) = Z′y − Z′Xβ̃
    ⟹ Z′Xβ̃ + (H⁻¹ + Z′Z)ũ = Z′y,

so the 2nd MME is satisfied. Now

    X′V⁻¹Xβ̃ = X′V⁻¹y
    ⟹ X′[I − Z(H⁻¹ + Z′Z)⁻¹Z′]Xβ̃ = X′[I − Z(H⁻¹ + Z′Z)⁻¹Z′]y   (by Lemma H2).

It follows that

    X′Xβ̃ + X′Z(H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃) = X′y.

Since ũ = (H⁻¹ + Z′Z)⁻¹Z′(y − Xβ̃), this gives

    X′Xβ̃ + X′Zũ = X′y.

Therefore the 1st MME is satisfied, and result (3) follows.

Result (1) is now easy to prove: the AE are consistent, so by Henderson's result (3), the MME are consistent.
Now suppose y = Xβ + Zu + e, where everything is as before except that, instead of Var(e) = σe²I, we have Var(e) = R, a symmetric, positive definite variance matrix. Let S = (1/σe²)R, where σe² is a positive variance parameter. Consider the transformation

    S^{−1/2}y = S^{−1/2}Xβ + S^{−1/2}Zu + S^{−1/2}e.

Then Var(S^{−1/2}e) = σe²I, and our previous results with S^{−1/2}y, S^{−1/2}X, and S^{−1/2}Z in place of y, X, and Z yield the mixed model equations

    [ X′S⁻¹X       X′S⁻¹Z      ] [ β̃ ]   [ X′S⁻¹y ]
    [ Z′S⁻¹X   H⁻¹ + Z′S⁻¹Z   ] [ ũ ] = [ Z′S⁻¹y ].

Multiplying both sides by 1/σe² gives

    [ X′R⁻¹X       X′R⁻¹Z      ] [ β̃ ]   [ X′R⁻¹y ]
    [ Z′R⁻¹X   G⁻¹ + Z′R⁻¹Z   ] [ ũ ] = [ Z′R⁻¹y ].

This is the more general form of Henderson's Mixed Model Equations.

Finally, let Ĝ and R̂ denote G and R with unknown parameters replaced by their REML estimates. In practice, we solve

    [ X′R̂⁻¹X       X′R̂⁻¹Z      ] [ β̃ ]   [ X′R̂⁻¹y ]
    [ Z′R̂⁻¹X   Ĝ⁻¹ + Z′R̂⁻¹Z   ] [ ũ ] = [ Z′R̂⁻¹y ]

to get the solutions needed for computation of BLUEs and BLUPs.
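The general form can be verified the same way as the special case (made-up X, Z, G, and a non-identity diagonal R below): the general-R MME solution agrees with the GLS solution and the BLUP GZ′Σ⁻¹(y − Xβ̃) computed from Σ = ZGZ′ + R.

```python
import numpy as np

# Sketch: general-R Mixed Model Equations vs. direct GLS/BLUP with Sigma = ZGZ' + R.
rng = np.random.default_rng(4)
n, p, q = 10, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # full column rank
Z = rng.normal(size=(n, q))
G = np.diag([0.4, 1.5, 0.9])                            # Var(u), PD
R = np.diag(rng.uniform(0.5, 2.0, size=n))              # Var(e), PD, not sigma_e^2 I
y = rng.normal(size=n)

Rinv = np.linalg.inv(R)
M = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, np.linalg.inv(G) + Z.T @ Rinv @ Z]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
beta_t, u_t = np.split(np.linalg.solve(M, rhs), [p])

Sigma = Z @ G @ Z.T + R
Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
u_blup = G @ Z.T @ Sinv @ (y - X @ beta_gls)

print(np.allclose(beta_t, beta_gls), np.allclose(u_t, u_blup))
```

In practice G and R would be replaced by REML estimates Ĝ and R̂ before solving, exactly as in the final display above; the algebra of the check is unchanged.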