Mixed Models

Methodologies for Population/Quantitative Genetics
Animal Science 562
Mixed Models
A. INTRODUCTION
Models can take various forms. In general, there are
1. True models - unknown, but perfectly describe the data.
2. Ideal models - the best model that a researcher can define to represent the true situation.
3. Operational models - the model used for the analysis. This model may be limited because of shortcomings in the data or even our computer facilities.
B. THE GENERAL LINEAR MODEL
A model is composed of three parts. They are:
1. Mathematical equations,
2. Expected values and covariance matrices of all random variables, and
3. Assumptions, restrictions and limiting factors which affect the sample of data, or the
conduct of the analysis
1. Mathematical equations
y = Xb + Zu + e
where
y is a vector of observations of length N,
b is an unknown fixed vector of length p. If there are s fixed effects, then $p = \sum_{i=1}^{s} b_i$, where $b_i$ is the number of levels of the ith fixed effect,
u is an unobservable random vector of length q. If there are t random factors, then $q = \sum_{i=1}^{t} u_i$, where $u_i$ is the number of levels of the ith random factor,
e is an unobservable random vector of length N, and
X, Z are known matrices relating the observations to vector elements.
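To make the dimensions concrete, the following is a minimal sketch in Python (NumPy) of how X and Z might be laid out for a small one-way layout. The record layout, the numerical values of b and of the standard deviations, and the variable names are all assumptions chosen for illustration; they do not come from these notes.

```python
import numpy as np

# Hypothetical layout: N = 7 records, one fixed effect (an overall mean, p = 1)
# and one random factor with q = 3 levels.
levels = np.array([0, 0, 1, 1, 2, 2, 2])   # level of the random factor for each record
N, q = levels.size, 3

X = np.ones((N, 1))                        # N x p matrix relating records to b
Z = np.zeros((N, q))
Z[np.arange(N), levels] = 1.0              # N x q matrix relating records to u

rng = np.random.default_rng(0)
b = np.array([200.0])                      # assumed value of the fixed effect
u = rng.normal(0.0, 5.0, size=q)           # assumed draw of the random effects
e = rng.normal(0.0, 10.0, size=N)          # assumed draw of the residuals

y = X @ b + Z @ u + e                      # the model equation y = Xb + Zu + e
```

Each row of Z contains a single 1, because every record belongs to exactly one level of the random factor.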
2. Expectations and covariance matrices
The expected value of all random variables should be specified. The random variables in
the general linear model are y, u, e, and w. w is an unknown vector of order mx1, assumed to be
random, which has non-zero covariances with either u or e or both. At this point, we merely
mention the existence of w, a random vector not directly associated with the linear model for y.
However, the researcher may want to know something about w from the analysis of y.
\[
E\begin{bmatrix} y \\ u \\ e \\ w \end{bmatrix} =
\begin{bmatrix} Xb + Z\,E(u) + E(e) \\ Pb \\ Tb \\ Wb \end{bmatrix}
\]
We may always write E(u) = c, and then define c as c = Pb for some P. Difficulties arise in subsequent analyses when P has to be specified explicitly. Thus it is common and expedient to assume
\[
E\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
Then E(y) = Xb.
Covariance matrices or variance-covariance (VCV) matrices of random variables are
matrices that contain variances along the diagonal and covariances in the off-diagonal. In terms
of the random variable of the model, the following notation is used:
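\[
V\begin{bmatrix} y \\ u \\ e \\ w \end{bmatrix} =
\begin{bmatrix}
V & ZG + S' & ZS + R & ZA + B \\
GZ' + S & G & S & A \\
S'Z' + R & S' & R & B \\
A'Z' + B' & A' & B' & H
\end{bmatrix}
\]
Here S = Cov(u, e'), A = Cov(u, w'), B = Cov(e, w') and H = V(w).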
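The variance-covariance matrices are:
\[
\begin{aligned}
V(y) &= V(Xb + Zu + e) \\
     &= V(Xb) + V(Zu + e) \\
     &= 0 + V(Zu + e) \qquad \text{since } b \text{ is fixed} \\
     &= V(Zu) + V(e) + \mathrm{Cov}(Zu, e') + \mathrm{Cov}(e, u'Z') \\
     &= Z\,V(u)\,Z' + V(e) + \mathrm{Cov}(Zu, e') + \mathrm{Cov}(e, u'Z') \\
     &= ZGZ' + R + ZS + S'Z'
\end{aligned}
\]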
Hence,
V(u) = G, a general matrix that can take various forms depending on the assumptions we are willing to make. Some typical forms might be $A\sigma_s^2$ or $I\sigma_s^2$.
V(e) = R, another general matrix that can take various forms and will frequently be assumed equal to $I\sigma_e^2$.
Cov(Zu,e') = ZCov(u,e') = ZS, which equals zero if we are willing to assume S = 0.
At this point we have the opportunity to work a little biology of the population into the model to illustrate the general flexibility. Can you think of an example where Cov(u,e) ≠ 0? Doesn't this occur when we have preferential treatment given to the progeny of certain sires?
Hence, if we are willing to assume S = 0, V(y) = ZGZ' + R.
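The effect of the assumption S = 0 on V(y) can be illustrated numerically. The sketch below, in Python (NumPy), evaluates V(y) = ZGZ' + R + ZS + S'Z' with and without a non-zero S. The function name `var_y` and all numerical values are hypothetical; the single non-zero element of S is only meant to mimic one sire effect covarying with one residual.

```python
import numpy as np

def var_y(Z, G, R, S=None):
    """V(y) = ZGZ' + R + ZS + S'Z', where S = Cov(u, e') is q x N.

    Passing S=None corresponds to the assumption S = 0.
    """
    V = Z @ G @ Z.T + R
    if S is not None:
        V = V + Z @ S + S.T @ Z.T
    return V

# Hypothetical example: 3 records, 2 levels of the random factor.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
G = 4.0 * np.eye(2)          # assumed V(u)
R = 9.0 * np.eye(3)          # assumed V(e)
S = np.zeros((2, 3))
S[0, 0] = 1.5                # assumed covariance between u_1 and e_1

V0 = var_y(Z, G, R)          # under S = 0
V1 = var_y(Z, G, R, S)       # with the non-zero S
```

With these values the non-zero S changes the variance of the first record and its covariance with the second record, while the third record is unaffected.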
One should describe the structure of G and R since this may affect the computer
programming strategy.
3. Assumptions, restrictions, and/or limiting factors
These are usually requirements imposed on the model because of shortcomings in the
data, but could also describe limitations on the length of certain vectors. This is the place where
assumptions about the distribution of the parameters are made, and the analyst has the
opportunity to explain assumptions leading to the operational model. For example,
1. In using all lactation records for sire evaluation, a restriction on y is that all first
lactation records on a cow must be present. This also affects the number of cows in
the analysis. In addition, all cows with first lactation records are present. That is to
say, the cows represented are a random sample from the population.
2. A common restriction for variance component estimation is to limit the analysis to
only sires with five or more daughters in at least two different herds.
3. A restriction on the model would be possible if some a priori knowledge were available
about treatment effects or group differences.
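As an illustration of the second restriction in the list above, the following Python sketch selects sires with five or more daughters in at least two different herds. It assumes one reading of that rule (at least five distinct daughters overall, spread over at least two herds), and the record layout `(sire, daughter, herd)`, the function name `eligible_sires` and the sample records are all hypothetical.

```python
from collections import defaultdict

def eligible_sires(records, min_daughters=5, min_herds=2):
    """Return the set of sires with at least `min_daughters` distinct daughters
    located in at least `min_herds` distinct herds."""
    daughters = defaultdict(set)
    herds = defaultdict(set)
    for sire, daughter, herd in records:
        daughters[sire].add(daughter)
        herds[sire].add(herd)
    return {s for s in daughters
            if len(daughters[s]) >= min_daughters and len(herds[s]) >= min_herds}

# Hypothetical records: (sire id, daughter id, herd id)
records = [("S1", f"d{i}", "H1" if i < 3 else "H2") for i in range(5)]
records += [("S2", f"e{i}", "H1") for i in range(6)]       # enough daughters, one herd only
records += [("S3", f"f{i}", f"H{i}") for i in range(3)]    # several herds, too few daughters

kept = eligible_sires(records)
filtered = [r for r in records if r[0] in kept]            # records retained for the analysis
```

Only sire S1 satisfies both conditions here, so the edited data set contains his five records.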
In summary, all 3 parts are necessary to determine the validity of the analysis and to be
able to interpret the results clearly.
C. WRITING MODELS
1. Developing equations
BLUP: one-way random
yij = µ + ai + eij
\[
R = I\sigma_e^2/\sigma^2, \quad b = \mu, \quad u' = (a_1 \;\; a_2 \;\; a_3), \quad G = I\sigma_a^2/\sigma^2.
\]
Many are not familiar with the expressions $\sigma_e^2/\sigma^2$ and $\sigma_a^2/\sigma^2$. This is merely saying that we might not know the population variance $\sigma^2$, but we know $\sigma_a^2$ and $\sigma_e^2$ relative to the true population variance.
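The general form of the mixed model equations is:
\[
\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} =
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}
\]
With $R = I\sigma_e^2/\sigma^2$ and $G = I\sigma_a^2/\sigma^2$,
\[
\therefore \;
\begin{bmatrix} X'X\,\sigma^2/\sigma_e^2 & X'Z\,\sigma^2/\sigma_e^2 \\ Z'X\,\sigma^2/\sigma_e^2 & Z'Z\,\sigma^2/\sigma_e^2 + I\sigma^2/\sigma_a^2 \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} =
\begin{bmatrix} X'y\,\sigma^2/\sigma_e^2 \\ Z'y\,\sigma^2/\sigma_e^2 \end{bmatrix}
\]
Multiply all equations by $\sigma_e^2/\sigma^2$, and the re-expression of the mixed model equations, based on the assumptions, is
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\sigma_e^2/\sigma_a^2 \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} =
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
\]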
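The simplified equations can be set up and solved directly. The following is a minimal sketch in Python (NumPy), assuming the one-way random model with a known variance ratio $\sigma_e^2/\sigma_a^2$. The function name `solve_mme`, the data and the variance values are hypothetical. As a cross-check, the solutions are compared with those obtained through V = ZGZ' + R, i.e. the generalized least squares estimate of b and the predictor GZ'V^{-1}(y - Xb̂) of the random effects.

```python
import numpy as np

def solve_mme(X, Z, y, ratio):
    """Solve [[X'X, X'Z], [Z'X, Z'Z + I*ratio]] [b; a] = [X'y; Z'y],
    where ratio = sigma_e^2 / sigma_a^2."""
    p, q = X.shape[1], Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + ratio * np.eye(q)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]

# Hypothetical data: 7 records on 3 levels of the random factor a.
groups = np.array([0, 0, 1, 1, 2, 2, 2])
y = np.array([210.0, 198.0, 185.0, 190.0, 220.0, 215.0, 205.0])
X = np.ones((y.size, 1))
Z = np.zeros((y.size, 3))
Z[np.arange(y.size), groups] = 1.0

sigma_e2, sigma_a2 = 36.0, 4.0                 # assumed variances
mu_hat, a_hat = solve_mme(X, Z, y, sigma_e2 / sigma_a2)

# Cross-check through V = ZGZ' + R (requires inverting the N x N matrix V).
V = sigma_a2 * (Z @ Z.T) + sigma_e2 * np.eye(y.size)
V_inv = np.linalg.inv(V)
mu_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
a_blup = sigma_a2 * Z.T @ V_inv @ (y - X @ mu_gls)
```

The mixed model equations are of order p + q, whereas the route through V requires the inverse of an N x N matrix, which is the practical reason for preferring the former when N is large.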
2. Examples of writing models
Example A The equation of the model is
yij = µ + ai + eij
where
yij is the weaning weight of a calf in the ith age group,
µ is the overall mean for all calves in all age groups,
ai is an age of calf effect, and
eij is a random error effect associated with each observation.
The expectation and VCV matrix are:
E(yij) = µ + ai
E(eij) = 0
\[ V(e_{ij}) = V(y_{ij}) = \sigma_i^2 \]
This means that age groups were assumed to be fixed and that observations in age group i have a different variance than observations in age group j.
Furthermore, assume that
\[ \mathrm{Cov}(e_{ij}, e_{ij'}) = \tfrac{1}{4}\sigma_i^2 \quad (j \neq j') \]
This is the covariance among observations in the same age group,
and
\[ \mathrm{Cov}(e_{i'j'}, e_{ij}) = 0 \quad (i \neq i') \]
This is the covariance between two observations in different age groups.
In terms of the general linear model, how do these values relate to R and G? In this model, Zu
and hence G are nonexistent because there are no random factors other than eij in the equation.
If we have 3 age groups with 3, 2, and 4 observations each, then
\[ e' = (e_{11} \;\; e_{12} \;\; e_{13} \;\; e_{21} \;\; e_{22} \;\; e_{31} \;\; e_{32} \;\; e_{33} \;\; e_{34}) \]
and
\[
V(e) = \begin{bmatrix} R_1 & 0 & 0 \\ 0 & R_2 & 0 \\ 0 & 0 & R_3 \end{bmatrix}
\]
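There is no covariance between error terms on calves in different age groups, but within age group i,
\[
R_i = \begin{bmatrix}
1 & \tfrac{1}{4} & \cdots & \tfrac{1}{4} \\
\tfrac{1}{4} & 1 & \cdots & \tfrac{1}{4} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{1}{4} & \tfrac{1}{4} & \cdots & 1
\end{bmatrix} \sigma_i^2
\]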
The order of R_i depends on the number of observations in that age group, n_i. Matrices of this structure can be inverted readily (Searle, 1966). Note that
\[ R_i = (.25J + .75I)\,\sigma_i^2 \]
where J = 11'. The inverse of R_i is then
\[
R_i^{-1} = \frac{1}{\sigma_i^2}\left[ \left(\frac{1}{.75}\right) I - \left(\frac{.25}{.75\,(.75 + .25\,n_i)}\right) J \right]
\]
One should take advantage of these shortcuts whenever possible.
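The closed-form inverse can be checked numerically. The sketch below, in Python (NumPy), builds R_i and the closed-form inverse for the three group sizes of this example and confirms that their product is the identity matrix. The function names and the value used for the scalar variance (passed as the assumed argument `sigma2`) are hypothetical.

```python
import numpy as np

def r_block(n, sigma2=1.0):
    """R_i = (.25 J + .75 I) * sigma2 for an age group with n observations."""
    return (0.25 * np.ones((n, n)) + 0.75 * np.eye(n)) * sigma2

def r_block_inv(n, sigma2=1.0):
    """Closed-form inverse: [(1/.75) I - .25 / (.75 (.75 + .25 n)) J] / sigma2."""
    J = np.ones((n, n))
    return (np.eye(n) / 0.75 - 0.25 / (0.75 * (0.75 + 0.25 * n)) * J) / sigma2

# Group sizes from the example (3, 2 and 4 observations); sigma2 = 2.0 is an assumed value.
checks = {n: np.allclose(r_block(n, 2.0) @ r_block_inv(n, 2.0), np.eye(n))
          for n in (3, 2, 4)}
```

No numerical inversion routine is called, so the cost of forming each inverse grows only with the number of elements to be filled in.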
The assumptions, restrictions and limitations of this model might be
a) the true difference between age group 1 and 2 is 60kg, or
b) the sum of the age group solutions is zero.
There are two kinds of restrictions. They are
a) restrictions on the true parameters, and
b) restrictions on the solutions.
Both types of restrictions will influence estimation and hypothesis testing.
Any more comments about this example are difficult to make. As mentioned earlier, this
part of the model is very specific to the data and conditions which give rise to the data.
Example B
The equation of this model is
yij = µ + ai + eij
where
yij is the weaning weight of the jth progeny of sire i,
µ is the mean weaning weight for all calves,
ai is a random effect due to the sire, who may have several progeny, and
eij is a random error effect associated with each weight.
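Now,
b = [µ]
u' = (a1 a2 ... aq)
The expectations are:
E(yij) = µ ≡ E(y) = 1µ
E(ai) = 0 ≡ E(u) = 0
E(eij) = 0 ≡ E(e) = 0
and the VCV matrix is:
\[
V\begin{bmatrix} u \\ e \end{bmatrix} =
\begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} =
\begin{bmatrix} I\sigma_a^2 & 0 \\ 0 & I\sigma_e^2 \end{bmatrix}
\]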
Recall that V = ZGZ' + R. If we have three sires, seven calves from these sires, and let
\[
Z' = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}
\]
Then,
\[
ZGZ' =
\begin{bmatrix} 1&0&0 \\ 1&0&0 \\ 0&1&0 \\ 0&1&0 \\ 0&0&1 \\ 0&0&1 \\ 0&0&1 \end{bmatrix}
\begin{bmatrix} \sigma_a^2&0&0 \\ 0&\sigma_a^2&0 \\ 0&0&\sigma_a^2 \end{bmatrix}
\begin{bmatrix} 1&1&0&0&0&0&0 \\ 0&0&1&1&0&0&0 \\ 0&0&0&0&1&1&1 \end{bmatrix}
=
\begin{bmatrix}
1&1&0&0&0&0&0 \\
1&1&0&0&0&0&0 \\
0&0&1&1&0&0&0 \\
0&0&1&1&0&0&0 \\
0&0&0&0&1&1&1 \\
0&0&0&0&1&1&1 \\
0&0&0&0&1&1&1
\end{bmatrix} \sigma_a^2
\]
and
\[
V = \begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix}
\]
where $A_1$ is of order 2x2 and $A_1 = \sigma_a^2 J + \sigma_e^2 I$.
Thus, $V^{-1}$ can be easily calculated for this example, or for any model of this class with
any number of sires and calves per sire.
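A minimal sketch in Python (NumPy) of this calculation follows. It builds V = ZGZ' + R for the three sires and seven calves of this example, and then assembles the inverse block by block, using the standard result that the inverse of $\sigma_a^2 J + \sigma_e^2 I$ of order n is $(1/\sigma_e^2)[I - \sigma_a^2/(\sigma_e^2 + n\sigma_a^2)\,J]$. The variance values and the function name `block_inverse` are assumptions made for illustration.

```python
import numpy as np

sires = np.array([0, 0, 1, 1, 2, 2, 2])        # sire of each of the seven calves
Z = np.zeros((7, 3))
Z[np.arange(7), sires] = 1.0

sigma_a2, sigma_e2 = 4.0, 36.0                 # assumed values, for illustration only
V = sigma_a2 * (Z @ Z.T) + sigma_e2 * np.eye(7)   # V = ZGZ' + R with G, R scaled identities

def block_inverse(n):
    """Inverse of sigma_a2 * J + sigma_e2 * I of order n, in closed form."""
    J = np.ones((n, n))
    return (np.eye(n) - sigma_a2 / (sigma_e2 + n * sigma_a2) * J) / sigma_e2

# Fill the inverse one diagonal block at a time (blocks of order 2, 2 and 3).
V_inv = np.zeros_like(V)
start = 0
for n in np.bincount(sires):
    n = int(n)
    V_inv[start:start + n, start:start + n] = block_inverse(n)
    start += n
```

Because V is block diagonal, only one small closed-form block per sire is needed, whatever the number of sires.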
There are no special assumptions, restrictions or limitations to discuss with this example.