Methodologies for Population/Quantitative Genetics
Animal Science 562
Mixed Models

A. INTRODUCTION

Models can take various forms. In general, there are

1. True models - unknown, but they describe the data perfectly.
2. Ideal models - the best model that a researcher can define to represent the true situation.
3. Operational models - the model actually used for the analysis. This model may be limited by shortcomings in the data or even in our computing facilities.

B. THE GENERAL LINEAR MODEL

A model is composed of three parts:

1. Mathematical equations,
2. Expected values and covariance matrices of all random variables, and
3. Assumptions, restrictions and limiting factors which affect the sample of data or the conduct of the analysis.

1. Mathematical equations

$$y = Xb + Zu + e$$

where

y is a vector of observations of length N,
b is an unknown fixed vector of length p. If there are s fixed factors, then $p = \sum_{i=1}^{s} b_i$, where $b_i$ is taken to be the number of levels of the ith fixed factor,
u is an unobservable random vector of length q. If there are t random factors, then $q = \sum_{i=1}^{t} u_i$, where $u_i$ is taken to be the number of levels of the ith random factor,
e is an unobservable random vector of length N, and
X, Z are known matrices relating the observations to the elements of b and u.

2. Expectations and covariance matrices

The expected value of every random variable should be specified. The random variables in the general linear model are y, u, e, and w. Here w is an unknown vector of order m x 1, assumed to be random, which has non-zero covariances with u, with e, or with both. At this point we merely mention the existence of w, a random vector not directly associated with the linear model for y. However, the researcher may want to learn something about w from the analysis of y.

$$E\begin{bmatrix} y \\ u \\ e \\ w \end{bmatrix} = \begin{bmatrix} Xb + ZE(u) + E(e) \\ Pb \\ Tb \\ Wb \end{bmatrix}$$

We may always write E(u) = c, and then define c as c = Pb for some P; the same device gives E(e) = Tb and E(w) = Wb. Difficulties arise in subsequent analyses when P has to be specified explicitly. Thus it is common and expedient to assume

$$E\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

Then E(y) = Xb.
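The roles of X, Z, N, p and q in y = Xb + Zu + e can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the herd labels (one fixed factor), the sire labels (one random factor), the values in b, and the helper names `incidence` and `matvec`. It assumes E(u) = 0 and E(e) = 0, so that E(y) = Xb.

```python
def incidence(labels):
    """Return (levels, matrix) with one 0/1 column per distinct level."""
    levels = sorted(set(labels))
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical records: each observation belongs to one herd and one sire.
herd = ["H1", "H1", "H2", "H2", "H2"]   # fixed factor (assumed)
sire = ["S1", "S2", "S1", "S3", "S3"]   # random factor (assumed)

herd_levels, X = incidence(herd)   # X relates observations to elements of b
sire_levels, Z = incidence(sire)   # Z relates observations to elements of u

N = len(herd)            # number of observations
p = len(herd_levels)     # length of b (one fixed factor, so p = its level count)
q = len(sire_levels)     # length of u (one random factor, so q = its level count)

b = [200, 215]           # hypothetical herd effects
expected_y = matvec(X, b)  # E(y) = Xb under E(u) = 0 and E(e) = 0

print(N, p, q)
print(expected_y)
```

Each row of X and of Z contains a single 1 because every record belongs to exactly one level of each factor; models with covariates or several factors would give denser rows.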
Covariance matrices, or variance-covariance (VCV) matrices, of random variables contain variances on the diagonal and covariances off the diagonal. For the random variables of the model, the following notation is used:

$$V\begin{bmatrix} y \\ u \\ e \\ w \end{bmatrix} = \begin{bmatrix} V & ZG + S' & ZS + R & ZA + B \\ GZ' + S & G & S & A \\ S'Z' + R & S' & R & B \\ A'Z' + B' & A' & B' & H \end{bmatrix}$$

The variance-covariance matrix of y is:

$$\begin{aligned}
V(y) &= V(Xb + Zu + e) \\
     &= V(Xb) + V(Zu + e) \\
     &= 0 + V(Zu + e) \quad \text{since } b \text{ is fixed} \\
     &= V(Zu) + V(e) + Cov(Zu, e') + Cov(e, u'Z') \\
     &= ZV(u)Z' + V(e) + Cov(Zu, e') + Cov(e, u'Z') \\
     &= ZGZ' + R + ZS + S'Z'
\end{aligned}$$

Here,

V(u) = G, a general matrix that can take various forms depending on the assumptions we are willing to make. Some typical forms might be $A\sigma_s^2$ or $I\sigma_s^2$.

V(e) = R, another general matrix that can take various forms and will frequently be assumed equal to $I\sigma_e^2$.

Cov(Zu, e') = Z Cov(u, e') = ZS, which equals zero if we are willing to assume S = 0.

At this point we have the opportunity to work a little biology of the population into the model to illustrate its general flexibility. Can you think of an example where Cov(u, e) ≠ 0? One arises when preferential treatment is given to the progeny of certain sires.

If we assume S = 0, then V(y) = ZGZ' + R. One should describe the structure of G and R, since this may affect the computer programming strategy.

3. Assumptions, restrictions, and/or limiting factors

These are usually requirements imposed on the model because of shortcomings in the data, but they could also describe limitations on the length of certain vectors. This is the place where assumptions about the distribution of the parameters are made, and where the analyst has the opportunity to explain the assumptions leading to the operational model. For example,

1. In using all lactation records for sire evaluation, a restriction on y is that a cow's first lactation record must be present. This also affects the number of cows in the analysis.
In addition, all cows with first lactation records are present; that is, the cows represented are a random sample from the population.

2. A common restriction for variance component estimation is to limit the analysis to sires with five or more daughters in at least two different herds.

3. A restriction on the model would be possible if some a priori knowledge were available about treatment effects or group differences.

In summary, all three parts are necessary to determine the validity of the analysis and to interpret the results clearly.

C. WRITING MODELS

1. Developing equations

BLUP: one-way random

$$y_{ij} = \mu + a_i + e_{ij}$$

$$R = I\sigma_e^2/\sigma^2, \quad b = \mu, \quad u' = (a_1 \; a_2 \; a_3), \quad G = I\sigma_a^2/\sigma^2.$$

Many are not familiar with the expressions $\sigma_e^2/\sigma^2$ and $\sigma_a^2/\sigma^2$. They merely say that we might not know the population variance $\sigma^2$, but we know $\sigma_a^2$ and $\sigma_e^2$ relative to the true population variance.

The general form of the mixed model equations is:

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$

With $R = I\sigma_e^2/\sigma^2$ and $G = I\sigma_a^2/\sigma^2$, we have $R^{-1} = I\sigma^2/\sigma_e^2$ and $G^{-1} = I\sigma^2/\sigma_a^2$. Therefore

$$\begin{bmatrix} X'X\sigma^2/\sigma_e^2 & X'Z\sigma^2/\sigma_e^2 \\ Z'X\sigma^2/\sigma_e^2 & Z'Z\sigma^2/\sigma_e^2 + I\sigma^2/\sigma_a^2 \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y\sigma^2/\sigma_e^2 \\ Z'y\sigma^2/\sigma_e^2 \end{bmatrix}$$

Multiply all equations by $\sigma_e^2/\sigma^2$. The re-expression of the mixed model equations, based on the assumptions, is

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\sigma_e^2/\sigma_a^2 \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$$

2. Examples of writing models

Example A

The equation of the model is

$$y_{ij} = \mu + a_i + e_{ij}$$

where

$y_{ij}$ is the weaning weight of a calf in the ith age group,
$\mu$ is the overall mean for all calves in all age groups,
$a_i$ is an age of calf effect, and
$e_{ij}$ is a random error effect associated with each observation.
The expectations and VCV matrix are:

$$E(y_{ij}) = \mu + a_i$$
$$E(e_{ij}) = 0$$
$$V(e_{ij}) = V(y_{ij}) = \sigma_i^2$$

This means that age groups were assumed to be fixed and that observations in age group i may have a different variance than observations in another age group i'. Furthermore, assume that

$$Cov(e_{ij}, e_{ij'}) = \tfrac{1}{4}\sigma_i^2 \quad (j \neq j')$$

This is the covariance between two observations in the same age group, and

$$Cov(e_{i'j'}, e_{ij}) = 0 \quad (i \neq i')$$

This is the covariance between two observations in different age groups.

In terms of the general linear model, how do these values relate to R and G? In this model, Zu and hence G are nonexistent because there are no random factors other than $e_{ij}$ in the equation. If we have 3 age groups with 3, 2, and 4 observations, respectively, then

$$e' = (e_{11} \; e_{12} \; e_{13} \; e_{21} \; e_{22} \; e_{31} \; e_{32} \; e_{33} \; e_{34})$$

and

$$V(e) = \begin{bmatrix} R_1 & 0 & 0 \\ 0 & R_2 & 0 \\ 0 & 0 & R_3 \end{bmatrix}$$

There is no covariance between error terms on calves in different age groups, but within age group i,

$$R_i = \begin{bmatrix} 1 & \tfrac{1}{4} & \cdots & \tfrac{1}{4} \\ \tfrac{1}{4} & 1 & \cdots & \tfrac{1}{4} \\ \vdots & \vdots & \ddots & \vdots \\ \tfrac{1}{4} & \tfrac{1}{4} & \cdots & 1 \end{bmatrix} \sigma_i^2$$

the order of which depends on the number of observations in that age group, $n_i$. Matrices of this structure can be inverted readily (Searle, 1966). Note that $R_i = (.25J + .75I)\sigma_i^2$, where J = 11'. The inverse of $R_i$ is then

$$R_i^{-1} = \left( \left[\frac{1}{.75}\right] I - \left[\frac{.25}{.75(.75 + .25n_i)}\right] J \right) \frac{1}{\sigma_i^2}$$

One should take advantage of these shortcuts whenever possible.

The assumptions, restrictions and limitations of this model might be

a) the true difference between age groups 1 and 2 is 60 kg, or
b) the sum of the age group solutions is zero.

There are two kinds of restrictions:

a) restrictions on the true parameters, and
b) restrictions on the solutions.

Both types of restrictions will influence estimation and hypothesis testing. Further comments about this example are difficult to make. As mentioned earlier, this part of the model is very specific to the data and to the conditions which give rise to the data.
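The closed-form inverse used in Example A can be checked numerically. The sketch below works with the pattern matrix .75I + .25J only; any common factor $\sigma_i^2$ simply divides the inverse. It uses exact fractions so that the product with the closed form can be compared to the identity without rounding. The helper names `pattern`, `pattern_inverse` and `matmul` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def pattern(n, a, b):
    """Return the n x n matrix a*I + b*J, where J = 11' is all ones."""
    return [[(a if i == j else 0) + b for j in range(n)] for i in range(n)]


def pattern_inverse(n, a, b):
    """Closed form: (aI + bJ)^{-1} = (1/a) I - [b / (a (a + n b))] J."""
    return pattern(n, 1 / a, -b / (a * (a + n * b)))


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


a, b = Fraction(3, 4), Fraction(1, 4)   # diagonal 1, off-diagonal 1/4

# Group sizes 3, 2 and 4, as in Example A.
for n_i in (3, 2, 4):
    R_i = pattern(n_i, a, b)
    R_i_inv = pattern_inverse(n_i, a, b)
    identity = pattern(n_i, Fraction(1), Fraction(0))
    assert matmul(R_i, R_i_inv) == identity

print(pattern_inverse(3, a, b))
```

The same identity holds for any a and b with a ≠ 0 and a + nb ≠ 0, which is why block-diagonal matrices built from such blocks never need a general-purpose inversion routine.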
Example B

The equation of this model is

$$y_{ij} = \mu + a_i + e_{ij}$$

where

$y_{ij}$ is the weaning weight of the jth progeny of sire i,
$\mu$ is the mean weaning weight for all calves,
$a_i$ is a random effect due to the sire, who may have several progeny, and
$e_{ij}$ is a random error effect associated with each weight.

Now,

$$b = [\mu], \quad u' = (a_1 \; a_2 \; \ldots \; a_q)$$

The expectations are:

$$E(y_{ij}) = \mu \;\equiv\; E(y) = 1\mu$$
$$E(a_i) = 0 \;\equiv\; E(u) = 0$$
$$E(e_{ij}) = 0 \;\equiv\; E(e) = 0$$

and the VCV matrix is

$$V\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} = \begin{bmatrix} I\sigma_a^2 & 0 \\ 0 & I\sigma_e^2 \end{bmatrix}$$

Recall that V = ZGZ' + R. If we have three sires, seven calves from these sires, and let

$$Z' = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}$$

then

$$ZGZ' = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \sigma_a^2 & 0 & 0 \\ 0 & \sigma_a^2 & 0 \\ 0 & 0 & \sigma_a^2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \sigma_a^2$$

and

$$V = \begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix}$$

where $A_1$ is of order 2x2 and $A_1 = \sigma_a^2 J + \sigma_e^2 I$. The blocks $A_2$ and $A_3$ have the same form, of orders 2x2 and 3x3. Thus, $V^{-1}$ can be easily calculated for this example, or for any model of this class with any number of sires and calves per sire.

There are no special assumptions, restrictions or limitations to discuss with this example.
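Example B can be worked through numerically as a rough sketch. The weaning weights in `y` and the variances `var_a` and `var_e` below are hypothetical values chosen for illustration, and all helper names are assumptions of this sketch. It builds V = ZGZ' + R, then solves the simplified mixed model equations from section C.1 (with $Z'Z + I\sigma_e^2/\sigma_a^2$ in the lower right block) and checks that the solutions agree with those obtained directly from $V^{-1}$. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scaled_identity(n, value):
    return [[value if i == j else Fraction(0) for j in range(n)] for i in range(n)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination on exact Fractions."""
    n = len(A)
    M = [list(ra) + list(rb) for ra, rb in zip(A, B)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Three sires with 2, 2 and 3 calves; weights and variances are hypothetical.
sires = [0, 0, 1, 1, 2, 2, 2]
y = [[Fraction(v)] for v in (220, 235, 210, 205, 240, 250, 245)]
var_a, var_e = Fraction(100), Fraction(300)
n, q = len(sires), 3

X = [[Fraction(1)] for _ in range(n)]                            # b = [mu]
Z = [[Fraction(int(s == k)) for k in range(q)] for s in sires]   # calf-to-sire incidence
G = scaled_identity(q, var_a)
R = scaled_identity(n, var_e)
Xt, Zt = transpose(X), transpose(Z)

# V = ZGZ' + R is block diagonal with blocks var_a * J + var_e * I.
V = add(matmul(matmul(Z, G), Zt), R)

# Mixed model equations after multiplying through by var_e.
lam = var_e / var_a
lower_right = add(matmul(Zt, Z), scaled_identity(q, lam))
lhs = [matmul(Xt, X)[0] + matmul(Xt, Z)[0]]
lhs += [matmul(Zt, X)[k] + lower_right[k] for k in range(q)]
rhs = matmul(Xt, y) + matmul(Zt, y)
solution = solve(lhs, rhs)          # [mu_hat, a1_hat, a2_hat, a3_hat]

# The same quantities computed directly from V^{-1}.
V_inv = solve(V, scaled_identity(n, Fraction(1)))
XtVinv = matmul(Xt, V_inv)
b_gls = solve(matmul(XtVinv, X), matmul(XtVinv, y))
residual = [[yi[0] - b_gls[0][0]] for yi in y]
a_blup = matmul(matmul(matmul(G, Zt), V_inv), residual)

assert solution == b_gls + a_blup
print([float(row[0]) for row in solution])
```

The mixed model equations here are of order 4 (one mean plus three sires), whereas the direct route inverts the 7x7 matrix V; that difference in size grows with the number of records, which is the practical appeal of the mixed model equations.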