Module 9: Best Linear Unbiased Prediction
– Purelines
– Single-crosses
Best Linear Unbiased Prediction (BLUP)
• Allows comparison of material from different populations evaluated in different environments
• Makes use of all performance data available for each genotype, and accounts for the fact that some genotypes have been more extensively tested than others
• Makes use of information about relatives in pedigree breeding systems
• Provides estimates of genetic variances from existing data in a breeding program without the use of mating designs
Bernardo, Chapt. 11
BLUP History
• Initially developed by C.R. Henderson in the 1940s
• Most extensively used in animal breeding
• Used in crop improvement since the 1990s, particularly in forestry
• BLUP is a general term that refers to two procedures
– true BLUP – the ‘P’ refers to prediction in random effects models (where there is a covariance structure)
– BLUE – the ‘E’ refers to estimation in fixed effects models (no covariance structure)
• “Best” means having minimum variance
• “Linear” means that the predictions or estimates are linear functions of the observations
• Unbiased
– expected value of estimates = their true value
– predictions have an expected value of zero (because genetic effects have a mean of zero)
Regression in matrix notation
Linear model: $Y = X\beta + \varepsilon$
Parameter estimates: $b = (X'X)^{-1}X'Y$

Source       df     SS              MS
Regression   p      b'X'Y           MS_R
Residual     n-p    Y'Y - b'X'Y     MS_E
Total        n      Y'Y
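As a minimal sketch of the regression formulas on this slide, the following NumPy code computes b = (X'X)^-1 X'Y and the uncorrected sums of squares from the ANOVA table. The function name `ols_anova` and the small data set are hypothetical and chosen only for illustration.

```python
import numpy as np

def ols_anova(X, y):
    """Least-squares estimates and uncorrected ANOVA sums of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Solve the normal equations (X'X) b = X'Y instead of forming the inverse
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ss_total = y @ y              # Y'Y, n df
    ss_reg = b @ (X.T @ y)        # b'X'Y, p df
    ss_res = ss_total - ss_reg    # Y'Y - b'X'Y, n - p df
    return {"b": b, "ss_reg": ss_reg, "ss_res": ss_res, "ss_total": ss_total,
            "ms_reg": ss_reg / p, "ms_res": ss_res / (n - p)}

# Hypothetical data: an intercept column plus one regressor
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
fit = ols_anova(X, y)
print(fit["b"], fit["ss_reg"], fit["ss_res"])
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to inverting X'X, but the result is the same estimator.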
BLUP Mixed Model in Matrix Notation
$Y = X\beta + Zu + e$

$Y$ = observations
$X$, $Z$ = design matrices
$\beta$ = fixed effects
$u$ = random effects
$e$ = residual errors
Classification for the purposes of BLUP
• Fixed effects are constants
– overall mean
– environmental effects (mean across trials)
• Random effects have a covariance structure
– breeding values
– dominance deviations
– testcross effects
– general and specific combining ability effects
BLUP for purelines – barley example
Set of environments   No. of environments   Cultivar      Grain yield (t/ha)
Set 1                 18                    Morex (1)     4.45
Set 1                 18                    Robust (2)    4.61
Set 1                 18                    Stander (4)   5.27
Set 2                 9                     Robust (2)    5.00
Set 2                 9                     Excel (3)     5.82
Set 2                 9                     Stander (4)   5.79
Parameters to be estimated
• means for two sets of environments – fixed effects
– we are interested in knowing effects of these particular sets of environments
• breeding values of four cultivars – random effects
– from the same breeding population
– there is a covariance structure (cultivars are related)
Bernardo, pg 269
Linear model for barley example
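$Y_{ij} = \mu + t_i + u_j + e_{ij}$
$t_i$ = effect of the $i$th set of environments
$u_j$ = effect of the $j$th cultivar

In matrix notation: $Y = X\beta + Zu + e$

$$
\begin{bmatrix} 4.45 \\ 4.61 \\ 5.27 \\ 5.00 \\ 5.82 \\ 5.79 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{14} \\ e_{22} \\ e_{23} \\ e_{24} \end{bmatrix}
$$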
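The design matrices in the barley example can be built directly from the data records. The sketch below is one way to do this in NumPy; the helper name `incidence` is hypothetical, and the column order (Set 1, Set 2; Morex, Robust, Excel, Stander) follows the cultivar numbering in the example.

```python
import numpy as np

# Barley records from the example: (set of environments, cultivar, yield in t/ha)
records = [
    (1, "Morex", 4.45), (1, "Robust", 4.61), (1, "Stander", 5.27),
    (2, "Robust", 5.00), (2, "Excel", 5.82), (2, "Stander", 5.79),
]
sets = [1, 2]
cultivars = ["Morex", "Robust", "Excel", "Stander"]

def incidence(levels, observed):
    """Return a 0/1 matrix: one row per observation, one column per level."""
    M = np.zeros((len(observed), len(levels)))
    for row, level in enumerate(observed):
        M[row, levels.index(level)] = 1.0
    return M

y = np.array([r[2] for r in records])
X = incidence(sets, [r[0] for r in records])        # fixed effects: sets of environments
Z = incidence(cultivars, [r[1] for r in records])   # random effects: cultivars
print(X)
print(Z)
```

Each row of X and Z contains a single 1, because every observation belongs to exactly one set of environments and one cultivar.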
Weighted regression
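$Y = X\beta + \varepsilon$
$b = (X'X)^{-1}X'Y$, where $\varepsilon_{ij} \sim N(0, \sigma^2)$
When $\varepsilon \sim N(0, R\sigma^2)$, then $b = (X'R^{-1}X)^{-1}X'R^{-1}Y$

For the barley example, each yield is a mean over 18 or 9 environments, so the diagonal of $R^{-1}$ holds the number of environments behind each observation:

$$
R^{-1} =
\begin{bmatrix}
18 & 0 & 0 & 0 & 0 & 0 \\
0 & 18 & 0 & 0 & 0 & 0 \\
0 & 0 & 18 & 0 & 0 & 0 \\
0 & 0 & 0 & 9 & 0 & 0 \\
0 & 0 & 0 & 0 & 9 & 0 \\
0 & 0 & 0 & 0 & 0 & 9
\end{bmatrix}
$$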
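A short sketch of the weighted regression formula applied to the barley data, fitting only the two set means as fixed effects (cultivar effects are ignored here, so this is not yet BLUP):

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])

# b = (X'R^-1 X)^-1 X'R^-1 Y, solved as a linear system
b = np.linalg.solve(X.T @ R_inv @ X, X.T @ R_inv @ y)
print(np.round(b, 3))
```

Because the weights are equal within each set, the two estimates are simply the set means (about 4.78 and 5.54 t/ha). Weighting matters once observations with different weights contribute to the same effect, as they do for cultivars in the mixed model.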
Covariance structure of random effects
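Coefficients of coancestry, $\theta_{XY}$

          Morex    Robust   Excel    Stander
Morex     1
Robust    1/2      1
Excel     7/16     27/32    1
Stander   11/32    43/64    91/128   1

Remember: $\mathrm{Cov}_{XY} = r\sigma^2_A + u\sigma^2_D$, with $r = 2\theta_{XY}$

Covariance matrix of the cultivar effects: $\mathrm{Var}(u) = A\sigma^2_A$

$$
A\sigma^2_A =
\begin{bmatrix}
2 & 1 & 7/8 & 11/16 \\
1 & 2 & 27/16 & 43/32 \\
7/8 & 27/16 & 2 & 91/64 \\
11/16 & 43/32 & 91/64 & 2
\end{bmatrix}
\sigma^2_A
$$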
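The relationship matrix A can be generated from the coancestry table rather than typed by hand. This is a minimal sketch that doubles each coancestry (r = 2 × coancestry) and checks that the result is positive definite, which is needed for A to be inverted in the mixed model equations. Variable names are illustrative.

```python
from fractions import Fraction as F
import numpy as np

cultivars = ["Morex", "Robust", "Excel", "Stander"]
# Lower triangle of the coancestry table from the barley example
coancestry_lower = [
    [F(1)],
    [F(1, 2), F(1)],
    [F(7, 16), F(27, 32), F(1)],
    [F(11, 32), F(43, 64), F(91, 128), F(1)],
]

n = len(cultivars)
coancestry = np.zeros((n, n))
for i, row in enumerate(coancestry_lower):
    for j, value in enumerate(row):
        coancestry[i, j] = coancestry[j, i] = float(value)  # fill both triangles

A = 2.0 * coancestry  # additive relationship matrix
print(A)
print("positive definite:", bool(np.linalg.eigvalsh(A).min() > 0))
```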
Mixed Model Equations
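$$
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + A^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{bmatrix}^{-1}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$

where the covariance matrix of the residuals is $R\sigma^2_\varepsilon$
• each matrix is composed of submatrices
• the algebra is the same as in ordinary regression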
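The mixed model equations can be assembled and solved in a few lines of NumPy instead of a spreadsheet. The sketch below uses the barley matrices from the example. The variance ratio passed as `ratio` is a hypothetical value, since these slides do not state the ratio used, so the printed solutions should not be expected to reproduce the published BLUP values. The function name `solve_mme` is also illustrative.

```python
import numpy as np

def solve_mme(X, Z, R_inv, A, ratio, y):
    """Solve the mixed model equations for a given ratio = sigma2_e / sigma2_A.

    Returns (b_hat, u_hat, C, rhs), where C is the coefficient matrix.
    """
    A_inv = np.linalg.inv(A)
    C = np.block([
        [X.T @ R_inv @ X, X.T @ R_inv @ Z],
        [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + A_inv * ratio],
    ])
    rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
    sol = np.linalg.solve(C, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:], C, rhs

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])

ratio = 5.0  # hypothetical sigma2_e / sigma2_A, not taken from the slides
b_hat, u_hat, C, rhs = solve_mme(X, Z, R_inv, A, ratio, y)
print("fixed effects:", np.round(b_hat, 2))
print("breeding values:", np.round(u_hat, 2))
```

Changing `ratio` shows how strongly the predicted breeding values are pulled toward zero when the residual variance is large relative to the additive variance.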
Calculations in Excel
Results from BLUP
Original data

Set of environments   No. of environments   Cultivar   Grain yield (t/ha)
Set 1                 18                    Morex      4.45
Set 1                 18                    Robust     4.61
Set 1                 18                    Stander    5.27
Set 2                 9                     Robust     5.00
Set 2                 9                     Excel      5.82
Set 2                 9                     Stander    5.79

BLUP estimates
For fixed effects, $b_1 = \mu + t_1$ and $b_2 = \mu + t_2$

Effect   Level     Estimate
$b_1$    Set 1     4.82
$b_2$    Set 2     5.41
$u_1$    Morex     -0.33
$u_2$    Robust    -0.17
$u_3$    Excel     0.18
$u_4$    Stander   0.36
Interpretation from BLUP
BLUP estimates

Effect   Level     Estimate
$b_1$    Set 1     4.82
$b_2$    Set 2     5.41
$u_1$    Morex     -0.33
$u_2$    Robust    -0.17
$u_3$    Excel     0.18
$u_4$    Stander   0.36

For a set of recombinant inbred lines from an F2 cross of Excel x Stander
Predicted mean breeding value = ½(0.18+0.36) = 0.27
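The same mid-parent calculation can be written as a small helper that works for any pair of parents. The function name `predicted_ril_mean` is hypothetical; the breeding values are the BLUP estimates from the barley example.

```python
# BLUP breeding values (t/ha) from the barley example
blup = {"Morex": -0.33, "Robust": -0.17, "Excel": 0.18, "Stander": 0.36}

def predicted_ril_mean(parent_a, parent_b, values=blup):
    """Predicted mean breeding value of inbred progeny: the mid-parent BLUP."""
    return 0.5 * (values[parent_a] + values[parent_b])

print(round(predicted_ril_mean("Excel", "Stander"), 2))
```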
Shrinkage estimators
• In the simplest case (all data balanced, the only fixed effect is the overall mean, inbreds unrelated)
$$
\mathrm{BLUP}(u_i) = h^2(\bar{Y}_{i.} - \bar{Y}_{..})
$$
• If $h^2$ is high, BLUP values are close to the phenotypic values
• If $h^2$ is low, BLUP values shrink towards the overall mean
• For unrelated inbreds or families, ranking of genotypes is the same whether one uses BLUP or phenotypic values
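A small sketch of the shrinkage formula for the balanced, unrelated case. The line means and heritability values are hypothetical, and `h2` is assumed to be heritability on a line-mean basis.

```python
import numpy as np

def shrinkage_blup(line_means, h2):
    """BLUP of line effects in the balanced, unrelated case: h2 * (line mean - grand mean)."""
    line_means = np.asarray(line_means, dtype=float)
    return h2 * (line_means - line_means.mean())

means = np.array([4.0, 5.0, 6.0])   # hypothetical line means (t/ha)
for h2 in (0.9, 0.2):               # hypothetical heritabilities
    print(h2, shrinkage_blup(means, h2))
```

With the higher heritability the predicted effects stay close to the observed deviations from the grand mean. With the lower one they shrink toward zero, but the ranking of the lines does not change.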
Sampling error of BLUP
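$$
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + A^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{bmatrix}^{-1}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$

Invert the coefficient matrix (the residual covariance matrix is $R\sigma^2_\varepsilon$):

$$
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + A^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{bmatrix}^{-1}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
$$

Each element of the matrix is a matrix.
• Diagonal elements of the inverse of the coefficient matrix can be used to estimate sampling error of fixed and random effects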
Sampling error of BLUP
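$$
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$

Fixed effects: $\sigma^2_{\hat{\beta}} = C_{11}\sigma^2_\varepsilon$
Random effects: $\sigma^2_{\hat{u} - u} = C_{22}\sigma^2_\varepsilon$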
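As a sketch of how these sampling variances might be computed for the barley example, the code below inverts the coefficient matrix, takes the diagonals of the C11 and C22 blocks, and scales them by the residual variance. Both `ratio` and `sigma2_e` are hypothetical values, as the slides do not give variance estimates at this point.

```python
import numpy as np

X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])

ratio = 5.0      # hypothetical sigma2_e / sigma2_A
sigma2_e = 1.0   # hypothetical residual variance

C = np.block([
    [X.T @ R_inv @ X, X.T @ R_inv @ Z],
    [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + np.linalg.inv(A) * ratio],
])
C_inv = np.linalg.inv(C)

p = X.shape[1]
C11 = C_inv[:p, :p]   # block for the fixed effects
C22 = C_inv[p:, p:]   # block for the random effects

se_fixed = np.sqrt(np.diag(C11) * sigma2_e)    # standard errors of the set means
se_random = np.sqrt(np.diag(C22) * sigma2_e)   # standard errors of prediction
print(np.round(se_fixed, 3), np.round(se_random, 3))
```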
Estimation of Variance Components
(would really need a larger data set)
1. Use your best guess for an initial value of $\sigma^2_\varepsilon / \sigma^2_A$
2. Solve for $\hat{\beta}$ and $\hat{u}$
3. Use current solutions to solve for $\sigma^2_\varepsilon$ and then for $\sigma^2_A$
4. Calculate a new $\sigma^2_\varepsilon / \sigma^2_A$
5. Repeat the process until estimates converge
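The five steps above can be sketched as a loop. The slides do not give the update formulas for step 3, so the ones used here are an assumption: a commonly used EM-type REML scheme in which the residual variance is (Y'R^-1 Y minus the solutions times the right-hand sides) / (n - p) and the additive variance is (u'A^-1 u + sigma2_e × trace(A^-1 C22)) / q. The function name and the starting ratio are also hypothetical, and with only six observations the resulting estimates are not meaningful.

```python
import numpy as np

def estimate_variances(X, Z, R_inv, A, y, ratio=1.0, max_iter=500, tol=1e-8):
    """Iterate the mixed model equations to update sigma2_e and sigma2_A.

    The update formulas are an assumed EM-type REML scheme (not from the slides).
    """
    n, p = X.shape
    q = Z.shape[1]
    A_inv = np.linalg.inv(A)
    XtRi, ZtRi = X.T @ R_inv, Z.T @ R_inv
    rhs = np.concatenate([XtRi @ y, ZtRi @ y])
    for _ in range(max_iter):
        C = np.block([[XtRi @ X, XtRi @ Z],
                      [ZtRi @ X, ZtRi @ Z + A_inv * ratio]])
        C_inv = np.linalg.inv(C)
        sol = C_inv @ rhs                        # step 2: beta_hat and u_hat
        u_hat = sol[p:]
        # step 3: residual variance first, then additive variance
        sigma2_e = (y @ R_inv @ y - sol @ rhs) / (n - p)
        C22 = C_inv[p:, p:]
        sigma2_a = (u_hat @ A_inv @ u_hat + sigma2_e * np.trace(A_inv @ C22)) / q
        new_ratio = sigma2_e / sigma2_a          # step 4
        converged = abs(new_ratio - ratio) < tol * max(1.0, ratio)
        ratio = new_ratio
        if converged:                            # step 5: stop when the ratio settles
            break
    return sigma2_e, sigma2_a, ratio

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])

sigma2_e, sigma2_a, final_ratio = estimate_variances(X, Z, R_inv, A, y)
print(sigma2_e, sigma2_a, final_ratio)
```

In practice such estimates would come from a much larger data set and from dedicated mixed model software. The loop is only meant to make the iteration concrete.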
BLUP for single-crosses
Performance of a single cross:
$G_{B73,Mo17} = GCA_{B73} + GCA_{Mo17} + SCA_{B73,Mo17}$
BLUP Model
$Y = X\beta + Ug_1 + Wg_2 + Ss + e$
• Sets of environments are fixed effects
• GCA and SCA are considered to be random effects
Example in Bernardo, pg 277 from Hallauer et al., 1996
Performance of maize single crosses
Set   Entry   Pedigree       Grain yield (t ha^-1)
1     SC-1    B73 x Mo17     7.85
1     SC-2    H123 x Mo17    7.36
1     SC-3    B84 x N197     5.61
2     SC-2    H123 x Mo17    7.47
2     SC-3    B84 x N197     5.96
Iowa Stiff Stalk x Lancaster Sure Crop
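$$
\begin{bmatrix} 7.85 \\ 7.36 \\ 5.61 \\ 7.47 \\ 5.96 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} g_{B73} \\ g_{B84} \\ g_{H123} \end{bmatrix}
+
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} g_{Mo17} \\ g_{N197} \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{22} \\ e_{23} \end{bmatrix}
$$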
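The four design matrices for the maize example can be generated from the data table. This sketch assumes the first parent in each pedigree belongs to group 1 (B73, B84, H123) and the second to group 2 (Mo17, N197), matching the column order of the matrices above. The helper name `incidence` is hypothetical.

```python
import numpy as np

# (set, entry, group 1 parent, group 2 parent, yield in t/ha) from the maize example
records = [
    (1, "SC-1", "B73", "Mo17", 7.85),
    (1, "SC-2", "H123", "Mo17", 7.36),
    (1, "SC-3", "B84", "N197", 5.61),
    (2, "SC-2", "H123", "Mo17", 7.47),
    (2, "SC-3", "B84", "N197", 5.96),
]

def incidence(levels, observed):
    """0/1 matrix with one row per observation and one column per level."""
    return np.array([[1.0 if obs == lev else 0.0 for lev in levels]
                     for obs in observed])

y = np.array([r[4] for r in records])
X = incidence([1, 2], [r[0] for r in records])                   # sets of environments
U = incidence(["B73", "B84", "H123"], [r[2] for r in records])   # GCA, group 1
W = incidence(["Mo17", "N197"], [r[3] for r in records])         # GCA, group 2
S = incidence(["SC-1", "SC-2", "SC-3"], [r[1] for r in records]) # SCA
print(U)
print(W)
print(S)
```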
Covariance of single crosses
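SC-X is j x k; SC-Y is j' x k'

$$
\mathrm{Cov}_{SC} = \theta_{jj'}\sigma^2_{GCA(1)} + \theta_{kk'}\sigma^2_{GCA(2)} + \theta_{jj'}\theta_{kk'}\sigma^2_{SCA}
$$
assuming no epistasis

B73, B84, H123:
$$
G_1 =
\begin{bmatrix}
1 & \theta_{B73,B84} & \theta_{B73,H123} \\
\theta_{B73,B84} & 1 & \theta_{B84,H123} \\
\theta_{B73,H123} & \theta_{B84,H123} & 1
\end{bmatrix},
\qquad
\mathrm{Var}(g_1) = G_1\sigma^2_{GCA(1)}
$$

Mo17, N197:
$$
G_2 =
\begin{bmatrix}
1 & \theta_{Mo17,N197} \\
\theta_{Mo17,N197} & 1
\end{bmatrix},
\qquad
\mathrm{Var}(g_2) = G_2\sigma^2_{GCA(2)}
$$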
Covariance of single crosses
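SC-X is j x k; SC-Y is j' x k'

$$
\mathrm{Cov}_{SC} = \theta_{jj'}\sigma^2_{GCA(1)} + \theta_{kk'}\sigma^2_{GCA(2)} + \theta_{jj'}\theta_{kk'}\sigma^2_{SCA}
$$

SC-1 = B73 x Mo17; SC-2 = H123 x Mo17; SC-3 = B84 x N197

$$
S =
\begin{bmatrix}
1 & \theta_{B73,H123}\,\theta_{Mo17,Mo17} & \theta_{B73,B84}\,\theta_{Mo17,N197} \\
\theta_{B73,H123}\,\theta_{Mo17,Mo17} & 1 & \theta_{B84,H123}\,\theta_{Mo17,N197} \\
\theta_{B73,B84}\,\theta_{Mo17,N197} & \theta_{B84,H123}\,\theta_{Mo17,N197} & 1
\end{bmatrix},
\qquad
\mathrm{Var}(s) = S\sigma^2_{SCA}
$$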
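A sketch of how the covariance between two single crosses and the SCA relationship matrix S could be computed from the parental coancestries. The coancestry values in `G1` and `G2` are hypothetical placeholders, because the slides leave them symbolic. The function names are also illustrative.

```python
import numpy as np

def single_cross_cov(theta_jj, theta_kk, var_gca1, var_gca2, var_sca):
    """Covariance between single crosses j x k and j' x k', assuming no epistasis."""
    return theta_jj * var_gca1 + theta_kk * var_gca2 + theta_jj * theta_kk * var_sca

def sca_relationship(G1, G2, crosses):
    """S matrix: element (x, y) is theta_jj' * theta_kk' for crosses x=(j,k), y=(j',k')."""
    n = len(crosses)
    S = np.empty((n, n))
    for x, (j, k) in enumerate(crosses):
        for y, (j2, k2) in enumerate(crosses):
            S[x, y] = G1[j, j2] * G2[k, k2]
    return S

# Hypothetical coancestries among parents (not given in the slides)
G1 = np.array([[1.00, 0.25, 0.10],    # B73, B84, H123
               [0.25, 1.00, 0.05],
               [0.10, 0.05, 1.00]])
G2 = np.array([[1.00, 0.20],          # Mo17, N197
               [0.20, 1.00]])

# SC-1 = B73 x Mo17, SC-2 = H123 x Mo17, SC-3 = B84 x N197 (row indices into G1, G2)
crosses = [(0, 0), (2, 0), (1, 1)]
S = sca_relationship(G1, G2, crosses)
print(S)
```

SC-1 and SC-2 share Mo17 as a parent, so their off-diagonal element reduces to the coancestry between B73 and H123.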
Solutions
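$$
\begin{bmatrix}
\hat{b}_1 \\ \hat{b}_2 \\ \hat{g}_{B73} \\ \hat{g}_{B84} \\ \hat{g}_{H123} \\ \hat{g}_{Mo17} \\ \hat{g}_{N197} \\ \hat{s}_1 \\ \hat{s}_2 \\ \hat{s}_3
\end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}U & X'R^{-1}W & X'R^{-1}Z \\
U'R^{-1}X & U'R^{-1}U + Q_1 & U'R^{-1}W & U'R^{-1}Z \\
W'R^{-1}X & W'R^{-1}U & W'R^{-1}W + Q_2 & W'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}U & Z'R^{-1}W & Z'R^{-1}Z + Q_S
\end{bmatrix}^{-1}
\begin{bmatrix}
X'R^{-1}Y \\ U'R^{-1}Y \\ W'R^{-1}Y \\ Z'R^{-1}Y
\end{bmatrix}
$$

$$
Q_1 = G_1^{-1}\left(\sigma^2_\varepsilon / \sigma^2_{GCA(1)}\right), \qquad
Q_2 = G_2^{-1}\left(\sigma^2_\varepsilon / \sigma^2_{GCA(2)}\right), \qquad
Q_S = S^{-1}\left(\sigma^2_\varepsilon / \sigma^2_{SCA}\right)
$$

Here $Z$ is the design matrix for the SCA effects (written $S$ in the BLUP model), and $S$ is the covariance matrix of the SCA effects.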
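A sketch of the single-cross solution in NumPy. Several inputs are assumptions that do not come from the slides: the coancestries in `G1`, `G2` and `S_cov`, the four variance components, and the use of an identity matrix for R^-1 (the number of environments per set is not listed for the maize data). The output therefore illustrates the mechanics only.

```python
import numpy as np

y = np.array([7.85, 7.36, 5.61, 7.47, 5.96])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
U = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
W = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)
R_inv = np.eye(5)  # assumption: equal weights for all observations

# Hypothetical coancestry-based covariance matrices
G1 = np.array([[1.00, 0.25, 0.10], [0.25, 1.00, 0.05], [0.10, 0.05, 1.00]])
G2 = np.array([[1.00, 0.20], [0.20, 1.00]])
S_cov = np.array([[1.00, 0.10, 0.05], [0.10, 1.00, 0.01], [0.05, 0.01, 1.00]])

# Hypothetical variance components
var_e, var_gca1, var_gca2, var_sca = 1.0, 0.5, 0.5, 0.25
Q1 = np.linalg.inv(G1) * (var_e / var_gca1)
Q2 = np.linalg.inv(G2) * (var_e / var_gca2)
QS = np.linalg.inv(S_cov) * (var_e / var_sca)

# Stack the design matrices so T'R^-1 T gives every block of the coefficient matrix
T = np.hstack([X, U, W, Z])
C = T.T @ R_inv @ T
rhs = T.T @ R_inv @ y

# Add Q1, Q2 and QS to the diagonal blocks of the random effects
start = X.shape[1]
for Q in (Q1, Q2, QS):
    stop = start + Q.shape[0]
    C[start:stop, start:stop] += Q
    start = stop

sol = np.linalg.solve(C, rhs)
labels = ["b1", "b2", "g_B73", "g_B84", "g_H123", "g_Mo17", "g_N197", "s1", "s2", "s3"]
for name, value in zip(labels, sol):
    print(f"{name:7s} {value: .3f}")
```

Stacking the design matrices avoids typing all sixteen blocks by hand. The result is the same coefficient matrix as in the block form above.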