BLUP for Purelines - Crop and Soil Science

PBG 650 Advanced Plant Breeding
Module 9: Best Linear Unbiased Prediction
– Purelines
– Single-crosses
Best Linear Unbiased Prediction (BLUP)
• Allows comparison of material from different populations evaluated in different environments
• Makes use of all performance data available for each genotype, and accounts for the fact that some genotypes have been more extensively tested than others
• Makes use of information about relatives in pedigree breeding systems
• Provides estimates of genetic variances from existing data in a breeding program without the use of mating designs
Bernardo, Chapt. 11
BLUP History
• Initially developed by C.R. Henderson in the 1940s
• Most extensively used in animal breeding
• Used in crop improvement since the 1990s, particularly in forestry
• BLUP is a general term that refers to two procedures
– true BLUP – the ‘P’ refers to prediction in random effects models (where there is a covariance structure)
– BLUE – the ‘E’ refers to estimation in fixed effect models (no covariance structure)
B-L-U
• “Best” means having minimum variance
• “Linear” means that the predictions or
estimates are linear functions of the
observations
• Unbiased
– expected value of estimates = their true value
– predictions have an expected value of zero
(because genetic effects have a mean of zero)
Regression in matrix notation
Linear model: $Y = X\beta + \varepsilon$
Parameter estimates: $b = (X'X)^{-1}X'Y$

Source       df     SS              MS
Regression   p      b'X'Y           MSR
Residual     n-p    Y'Y - b'X'Y     MSE
Total        n      Y'Y
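The regression formulas above can be checked numerically. The following is a minimal NumPy sketch on a small made-up data set (the numbers and variable names are illustrative assumptions, not from the slides); it computes b and the uncorrected sums of squares in the table.

```python
import numpy as np

# Hypothetical data (not from the slides): intercept plus one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])   # n x p design matrix

n, p = X.shape
b = np.linalg.solve(X.T @ X, X.T @ Y)       # b = (X'X)^-1 X'Y

ss_reg = b @ X.T @ Y                        # b'X'Y        (p df)
ss_tot = Y @ Y                              # Y'Y          (n df, uncorrected)
ss_res = ss_tot - ss_reg                    # Y'Y - b'X'Y  (n - p df)
msr, mse = ss_reg / p, ss_res / (n - p)
print(b, msr, mse)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is usually the more stable choice.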
BLUP Mixed Model in Matrix Notation
$Y = X\beta + Zu + e$
X and Z are design matrices, $\beta$ is the vector of fixed effects, and $u$ is the vector of random effects.
Classification for the purposes of BLUP:
• Fixed effects are constants
– overall mean
– environmental effects (mean across trials)
• Random effects have a covariance structure
– breeding values
– dominance deviations
– testcross effects
– general and specific combining ability effects
BLUP for purelines – barley example
Set of environments   No. of environments   Cultivar      Grain yield (t/ha)
Set 1                 18                    Morex (1)     4.45
Set 1                 18                    Robust (2)    4.61
Set 1                 18                    Stander (4)   5.27
Set 2                 9                     Robust (2)    5.00
Set 2                 9                     Excel (3)     5.82
Set 2                 9                     Stander (4)   5.79
Parameters to be estimated
• means for two sets of environments – fixed effects
– we are interested in knowing effects of these particular
sets of environments
• breeding values of four cultivars – random effects
– from the same breeding population
– there is a covariance structure (cultivars are related)
Bernardo, pg 269
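Linear model for barley example
$Y_{ij} = \mu + t_i + u_j + e_{ij}$
$t_i$ = effect of ith set of environments
$u_j$ = effect of jth cultivar
In matrix notation, $Y = X\beta + Zu + e$:
$$
\begin{bmatrix} 4.45 \\ 4.61 \\ 5.27 \\ 5.00 \\ 5.82 \\ 5.79 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{14} \\ e_{22} \\ e_{23} \\ e_{24} \end{bmatrix}
$$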
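The design matrices follow mechanically from the data table. A minimal NumPy sketch, assuming the cultivar order Morex, Robust, Excel, Stander for u1 to u4 (variable names are illustrative, not from the slides):

```python
import numpy as np

# One record per observation: set of environments, cultivar, yield (t/ha).
sets = [1, 1, 1, 2, 2, 2]
cultivars = ["Morex", "Robust", "Stander", "Robust", "Excel", "Stander"]
y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])

cultivar_order = ["Morex", "Robust", "Excel", "Stander"]   # u1, u2, u3, u4

# X: one indicator column per set of environments (fixed effects b1, b2).
X = np.array([[1.0 if s == k else 0.0 for k in (1, 2)] for s in sets])
# Z: one indicator column per cultivar (random effects u1..u4).
Z = np.array([[1.0 if c == name else 0.0 for name in cultivar_order]
              for c in cultivars])
print(X)
print(Z)
```

Each row of X and Z contains a single 1, because every observation belongs to exactly one set of environments and one cultivar.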
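Weighted regression
$Y = X\beta + \varepsilon$
Where $\varepsilon_{ij} \sim N(0, \sigma^2)$: $b = (X'X)^{-1}X'Y$
When $\varepsilon_{ij} \sim N(0, R\sigma^2)$, then $b = (X'R^{-1}X)^{-1}X'R^{-1}Y$
For the barley example, the weights are the numbers of environments:
$$
R^{-1} =
\begin{bmatrix}
18 & 0 & 0 & 0 & 0 & 0 \\
0 & 18 & 0 & 0 & 0 & 0 \\
0 & 0 & 18 & 0 & 0 & 0 \\
0 & 0 & 0 & 9 & 0 & 0 \\
0 & 0 & 0 & 0 & 9 & 0 \\
0 & 0 & 0 & 0 & 0 & 9
\end{bmatrix}
$$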
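As a quick check of the weighted estimator, here is a hedged NumPy sketch that fits only the fixed effects (set means) to the barley yields, with and without the weights. Ignoring the cultivar effects is a simplification made for this illustration.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])   # weights = no. of environments

b_ols = np.linalg.solve(X.T @ X, X.T @ y)                   # (X'X)^-1 X'Y
b_wls = np.linalg.solve(X.T @ R_inv @ X, X.T @ R_inv @ y)   # (X'R^-1 X)^-1 X'R^-1 Y
print(b_ols, b_wls)
```

Because the weights are constant within each set, both estimators return the simple set means here. The weights start to matter once the random cultivar effects link observations across sets.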
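Covariance structure of random effects
Coefficients of coancestry ($\theta_{XY}$):

          Morex   Robust   Excel    Stander
Morex     1
Robust    1/2     1
Excel     7/16    27/32    1
Stander   11/32   43/64    91/128   1

Remember: $\mathrm{Cov}_{XY} = r\sigma^2_A + u\sigma^2_D$, where $r = 2\theta_{XY}$
For the vector of breeding values, $\mathrm{Var}(u) = A\sigma^2_A$, with
$$
A =
\begin{bmatrix}
2 & 1 & 7/8 & 11/16 \\
1 & 2 & 27/16 & 43/32 \\
7/8 & 27/16 & 2 & 91/64 \\
11/16 & 43/32 & 91/64 & 2
\end{bmatrix}
$$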
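The step from coancestries to the matrix A is just a doubling of every element. A small sketch using exact fractions (the names `theta` and `A` are illustrative):

```python
from fractions import Fraction
import numpy as np

names = ["Morex", "Robust", "Excel", "Stander"]
# Lower triangle of the coancestry matrix, as given in the table.
lower = [
    [Fraction(1)],
    [Fraction(1, 2), Fraction(1)],
    [Fraction(7, 16), Fraction(27, 32), Fraction(1)],
    [Fraction(11, 32), Fraction(43, 64), Fraction(91, 128), Fraction(1)],
]
# Fill in the symmetric matrix, then double each element: r = 2 * theta.
theta = [[lower[max(i, j)][min(i, j)] for j in range(4)] for i in range(4)]
A = [[2 * t for t in row] for row in theta]

A_float = np.array([[float(a) for a in row] for row in A])
for name, row in zip(names, A):
    print(name, [str(a) for a in row])
```

Keeping the values as fractions makes it easy to compare against the slide; `A_float` is the version one would pass to a numerical solver.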
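Mixed Model Equations
$$
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + A^{-1}(\sigma^2_\varepsilon/\sigma^2_A)
\end{bmatrix}^{-1}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$
The residual covariance matrix is $R\sigma^2_\varepsilon$.
• each matrix is composed of submatrices
• the algebra is the same
Calculations in Excel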
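The mixed model equations can also be solved outside a spreadsheet. The sketch below sets them up in NumPy for the barley data. The variance ratio of 5.0 is an assumption made for illustration, since the slides do not state the value that was used, so the printed solutions need not match the slide exactly.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])

ratio = 5.0   # ASSUMED value of sigma2_e / sigma2_A (not given in the slides)

# Coefficient matrix and right-hand side of the mixed model equations.
C = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
              [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + np.linalg.inv(A) * ratio]])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])

sol = np.linalg.solve(C, rhs)
beta_hat, u_hat = sol[:2], sol[2:]
print("fixed effects (b1, b2):", beta_hat.round(2))
print("breeding values (u1..u4):", u_hat.round(2))
```

Changing `ratio` shows how strongly the predicted breeding values are shrunk: a larger ratio (lower heritability) pulls them closer to zero.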
Results from BLUP
BLUP estimates for the barley data
For fixed effects: $b_1 = \mu + t_1$, $b_2 = \mu + t_2$

Effect   Level     Estimate
b1       Set 1      4.82
b2       Set 2      5.41
u1       Morex     -0.33
u2       Robust    -0.17
u3       Excel      0.18
u4       Stander    0.36
Interpretation from BLUP
For a set of recombinant inbred lines
from an F2 cross of Excel x Stander
Predicted mean breeding value = ½(0.18+0.36) = 0.27
Shrinkage estimators
• In the simplest case (all data balanced, the only fixed effect is the overall mean, inbreds unrelated), the BLUP for inbred i is
$$\mathrm{BLUP}(i) = h^2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})$$
• If $h^2$ is high, BLUP values are close to the phenotypic values
• If $h^2$ is low, BLUP values shrink towards the overall mean
• For unrelated inbreds or families, ranking of genotypes is the same whether one uses BLUP or phenotypic values
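The shrinkage behaviour is easy to see numerically. A minimal sketch for the simplest case described above, using made-up inbred means (the data and the function name `shrink` are hypothetical):

```python
import numpy as np

def shrink(means, h2):
    """BLUP in the simplest balanced case: h2 * (inbred mean - overall mean)."""
    means = np.asarray(means, dtype=float)
    return h2 * (means - means.mean())

inbred_means = [4.0, 5.0, 6.0]          # hypothetical phenotypic means
blup_high = shrink(inbred_means, 0.9)   # close to the phenotypic deviations
blup_low = shrink(inbred_means, 0.2)    # shrunk towards the overall mean
print(blup_high, blup_low)
```

Both calls rank the inbreds identically; only the spread of the predictions changes with the heritability.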
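Sampling error of BLUP
Invert the coefficient matrix of the mixed model equations:
$$
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + A^{-1}(\sigma^2_\varepsilon/\sigma^2_A)
\end{bmatrix}^{-1}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
\begin{bmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$
• each element of the matrix is a matrix
Diagonal elements of the inverse of the coefficient matrix can be used to estimate sampling error of fixed and random effects:
Sampling variance of fixed effects: $C_{11}\sigma^2_\varepsilon$
Sampling variance of random effects: $C_{22}\sigma^2_\varepsilon$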
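A hedged NumPy sketch of this calculation for the barley example follows. Both the variance ratio (5.0) and the residual variance (0.5) are assumed values chosen only to make the example run; neither is given in the slides.

```python
import numpy as np

X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])

ratio = 5.0      # ASSUMED sigma2_e / sigma2_A
sigma2_e = 0.5   # ASSUMED residual variance

C = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
              [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + np.linalg.inv(A) * ratio]])
C_inv = np.linalg.inv(C)

C11 = C_inv[:2, :2]   # block for the fixed effects
C22 = C_inv[2:, 2:]   # block for the random effects
se_fixed = np.sqrt(np.diag(C11) * sigma2_e)
se_random = np.sqrt(np.diag(C22) * sigma2_e)
print(se_fixed, se_random)
```

The standard errors for the random effects describe the error of prediction (predicted minus true breeding value), not the spread of the predictions themselves.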
Estimation of Variance Components
(would really need a larger data set)
1. Use your best guess for an initial value of $\sigma^2_\varepsilon/\sigma^2_A$
2. Solve for $\hat{\beta}$ and $\hat{u}$
3. Use current solutions to solve for $\sigma^2_\varepsilon$ and then for $\sigma^2_A$
4. Calculate a new $\sigma^2_\varepsilon/\sigma^2_A$
5. Repeat the process until estimates converge
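The five steps can be sketched as a loop. The slides do not say which update formulas are used in step 3, so the version below uses EM-type REML updates, one common choice, as an explicit assumption. With only six observations the estimates are not meaningful, which is the point of the note about needing a larger data set.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
A = np.array([[2, 1, 7/8, 11/16],
              [1, 2, 27/16, 43/32],
              [7/8, 27/16, 2, 91/64],
              [11/16, 43/32, 91/64, 2]])
A_inv = np.linalg.inv(A)
n, q = Z.shape
p = np.linalg.matrix_rank(X)

ratio = 5.0                                   # step 1: initial guess (assumed)
for iteration in range(200):
    C = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
                  [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + A_inv * ratio]])
    rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
    C_inv = np.linalg.inv(C)
    sol = C_inv @ rhs                         # step 2: solve for beta-hat, u-hat
    u_hat = sol[p:]
    # step 3: EM-type REML updates (an assumed choice of update rule)
    var_e = (y @ R_inv @ y - sol @ rhs) / (n - p)
    var_a = (u_hat @ A_inv @ u_hat
             + np.trace(A_inv @ C_inv[p:, p:]) * var_e) / q
    new_ratio = var_e / var_a                 # step 4: new variance ratio
    converged = abs(new_ratio - ratio) < 1e-8 * ratio
    ratio = new_ratio
    if converged:                             # step 5: stop when stable
        break
print(var_e, var_a, ratio)
```

The loop is capped at 200 iterations because EM-type updates can converge slowly, especially when a variance estimate drifts towards zero.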
BLUP for single-crosses
Performance of a single cross:
$G_{B73,Mo17} = GCA_{B73} + GCA_{Mo17} + SCA_{B73,Mo17}$
BLUP Model
$Y = X\beta + Ug_1 + Wg_2 + Ss + e$
• Sets of environments are fixed effects
• GCA and SCA are considered to be random effects
Example in Bernardo, pg 277 from Hallauer et al., 1996
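Performance of maize single crosses
(Iowa Stiff Stalk x Lancaster Sure Crop)

Set   Entry   Pedigree       Grain yield (t/ha)
1     SC-1    B73 x Mo17     7.85
1     SC-2    H123 x Mo17    7.36
1     SC-3    B84 x N197     5.61
2     SC-2    H123 x Mo17    7.47
2     SC-3    B84 x N197     5.96

In matrix notation:
$$
\begin{bmatrix} 7.85 \\ 7.36 \\ 5.61 \\ 7.47 \\ 5.96 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} g_{B73} \\ g_{B84} \\ g_{H123} \end{bmatrix}
+
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} g_{Mo17} \\ g_{N197} \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{22} \\ e_{23} \end{bmatrix}
$$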
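The four design matrices can be generated from the data table rather than typed by hand. A minimal sketch, with a small hypothetical helper `indicator` (not from the slides) that builds one 0/1 column per level:

```python
import numpy as np

# (set, female parent, male parent, yield in t/ha) for each observation.
records = [
    (1, "B73", "Mo17", 7.85),
    (1, "H123", "Mo17", 7.36),
    (1, "B84", "N197", 5.61),
    (2, "H123", "Mo17", 7.47),
    (2, "B84", "N197", 5.96),
]
group1 = ["B73", "B84", "H123"]          # order of g1
group2 = ["Mo17", "N197"]                # order of g2
crosses = [("B73", "Mo17"), ("H123", "Mo17"), ("B84", "N197")]   # s1, s2, s3

def indicator(values, levels):
    """Return a 0/1 matrix with one row per value and one column per level."""
    return np.array([[float(v == lev) for lev in levels] for v in values])

y = np.array([r[3] for r in records])
X = indicator([r[0] for r in records], [1, 2])           # sets of environments
U = indicator([r[1] for r in records], group1)           # GCA, group 1
W = indicator([r[2] for r in records], group2)           # GCA, group 2
S_design = indicator([(r[1], r[2]) for r in records], crosses)   # SCA
print(U)
print(W)
print(S_design)
```

The SCA design matrix is named `S_design` here only to keep it apart from the SCA covariance matrix, which the slides also call S.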
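Covariance of single crosses
SC-X is j x k; SC-Y is j' x k'
$$\mathrm{Cov}_{SC} = \theta_{jj'}\sigma^2_{GCA(1)} + \theta_{kk'}\sigma^2_{GCA(2)} + \theta_{jj'}\theta_{kk'}\sigma^2_{SCA}$$
assuming no epistasis

For B73, B84, H123:
$$
G_1 =
\begin{bmatrix}
1 & \theta_{B73,B84} & \theta_{B73,H123} \\
\theta_{B73,B84} & 1 & \theta_{B84,H123} \\
\theta_{B73,H123} & \theta_{B84,H123} & 1
\end{bmatrix},
\qquad \sigma^2_{g1} = G_1\sigma^2_{GCA(1)}
$$
For Mo17, N197:
$$
G_2 =
\begin{bmatrix}
1 & \theta_{Mo17,N197} \\
\theta_{Mo17,N197} & 1
\end{bmatrix},
\qquad \sigma^2_{g2} = G_2\sigma^2_{GCA(2)}
$$
For SC-1 = B73 x Mo17, SC-2 = H123 x Mo17, SC-3 = B84 x N197:
$$
S =
\begin{bmatrix}
1 & \theta_{B73,H123}\theta_{Mo17,Mo17} & \theta_{B73,B84}\theta_{Mo17,N197} \\
\theta_{B73,H123}\theta_{Mo17,Mo17} & 1 & \theta_{B84,H123}\theta_{Mo17,N197} \\
\theta_{B73,B84}\theta_{Mo17,N197} & \theta_{B84,H123}\theta_{Mo17,N197} & 1
\end{bmatrix},
\qquad \sigma^2_{s} = S\sigma^2_{SCA}
$$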
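Each element of S is the product of the matching elements of G1 and G2 for the parents of the two crosses. The sketch below shows this with NumPy indexing. The coancestry values are hypothetical placeholders, since the slides give only symbols.

```python
import numpy as np

# HYPOTHETICAL coancestries (the slides do not give numeric values).
# Order within G1: B73, B84, H123; within G2: Mo17, N197.
G1 = np.array([[1.00, 0.20, 0.30],
               [0.20, 1.00, 0.25],
               [0.30, 0.25, 1.00]])
G2 = np.array([[1.0, 0.1],
               [0.1, 1.0]])

# Parent indices of SC-1 (B73 x Mo17), SC-2 (H123 x Mo17), SC-3 (B84 x N197).
parent1 = [0, 2, 1]
parent2 = [0, 0, 1]

# S[x, y] = theta_jj' * theta_kk' for single crosses x = j x k and y = j' x k'.
S = G1[np.ix_(parent1, parent1)] * G2[np.ix_(parent2, parent2)]
print(S)
```

SC-1 and SC-2 share Mo17, so their element of S reduces to the coancestry of B73 and H123 multiplied by 1.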
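Solutions
$$
\begin{bmatrix} b_1 \\ b_2 \\ g_{B73} \\ g_{B84} \\ g_{H123} \\ g_{Mo17} \\ g_{N197} \\ s_1 \\ s_2 \\ s_3 \end{bmatrix}
=
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}U & X'R^{-1}W & X'R^{-1}Z \\
U'R^{-1}X & U'R^{-1}U + Q_1 & U'R^{-1}W & U'R^{-1}Z \\
W'R^{-1}X & W'R^{-1}U & W'R^{-1}W + Q_2 & W'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}U & Z'R^{-1}W & Z'R^{-1}Z + Q_S
\end{bmatrix}^{-1}
\begin{bmatrix} X'R^{-1}Y \\ U'R^{-1}Y \\ W'R^{-1}Y \\ Z'R^{-1}Y \end{bmatrix}
$$
Here Z is the design matrix for the SCA effects s, and
$Q_1 = G_1^{-1}(\sigma^2_\varepsilon/\sigma^2_{GCA(1)})$
$Q_2 = G_2^{-1}(\sigma^2_\varepsilon/\sigma^2_{GCA(2)})$
$Q_S = S^{-1}(\sigma^2_\varepsilon/\sigma^2_{SCA})$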
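To make the single-cross equations concrete, here is a hedged NumPy sketch that assembles and solves them for the five maize observations. Several inputs are assumptions introduced only for illustration: the coancestries in G1 and G2, the three variance ratios, and an identity R (the slides do not give the number of environments per set for this example).

```python
import numpy as np

y = np.array([7.85, 7.36, 5.61, 7.47, 5.96])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
U = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0],
              [0, 0, 1], [0, 1, 0]], dtype=float)        # B73, B84, H123
W = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)  # Mo17, N197
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [0, 1, 0], [0, 0, 1]], dtype=float)        # s1, s2, s3
R_inv = np.eye(5)                                        # ASSUMED: equal weights

# HYPOTHETICAL coancestries and variance ratios (not from the slides).
G1 = np.array([[1.00, 0.20, 0.30], [0.20, 1.00, 0.25], [0.30, 0.25, 1.00]])
G2 = np.array([[1.0, 0.1], [0.1, 1.0]])
p1, p2 = [0, 2, 1], [0, 0, 1]                 # parents of SC-1, SC-2, SC-3
S = G1[np.ix_(p1, p1)] * G2[np.ix_(p2, p2)]   # SCA covariance structure
Q1 = np.linalg.inv(G1) * 4.0                  # sigma2_e / sigma2_GCA(1), assumed
Q2 = np.linalg.inv(G2) * 4.0                  # sigma2_e / sigma2_GCA(2), assumed
QS = np.linalg.inv(S) * 8.0                   # sigma2_e / sigma2_SCA, assumed

design = [X, U, W, Z]
added = [np.zeros((2, 2)), Q1, Q2, QS]        # terms added on the diagonal blocks
C = np.block([[a.T @ R_inv @ b + (added[i] if i == j else 0.0)
               for j, b in enumerate(design)]
              for i, a in enumerate(design)])
rhs = np.concatenate([a.T @ R_inv @ y for a in design])

sol = np.linalg.solve(C, rhs)
labels = ["b1", "b2", "gB73", "gB84", "gH123", "gMo17", "gN197", "s1", "s2", "s3"]
for name, value in zip(labels, sol):
    print(f"{name:6s} {value: .3f}")
```

With real data, the Q matrices would come from pedigree-based coancestries and estimated variance components, and R would reflect the number of environments behind each mean.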