PBG 650 Advanced Plant Breeding Module 8: Estimating Genetic Variances –GCA, SCA

PBG 650 Advanced Plant Breeding
Module 8: Estimating Genetic Variances
– Nested design
–GCA, SCA
– Diallel
Nested design
• Also called
– North Carolina Design 1
– Hierarchical design
• Two types of families
– Half sibs (male groups)
– Full-sibs (females/males)
[Diagram: males 1, 2, ..., m, each mated to a different set of females (females numbered 1, 2, ..., 8, ..., f)]
Nested design – one location
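Linear Model
• $Y_{ijk} = \mu + B_i + M_j + F_{k(j)} + e_{ijk}$

Source          df            MS    Expected Mean Square
Blocks          r-1           MSR
Males           m-1           MSM   $\sigma^2_e + r\sigma^2_{F/M} + rf\sigma^2_M$
Females/males   m(f-1)        MSF   $\sigma^2_e + r\sigma^2_{F/M}$
Error           (r-1)(mf-1)   MSE   $\sigma^2_e$

F tests: $F_{males} = MSM/MSF$ and $F_{F/M} = MSF/MSE$
Variance component estimates: $\hat\sigma^2_M = (MSM - MSF)/(rf)$ and $\hat\sigma^2_{F/M} = (MSF - MSE)/r$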
Might also have sets and multiple environments
See Bernardo, pg 164, for ANOVA with sets and environments
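Variance components from the nested design
$\sigma^2_M = Cov(\text{half sibs}) = \frac{1+F}{4}\sigma^2_A$, so $\sigma^2_A = \frac{4}{1+F}\sigma^2_M$
$\sigma^2_{F/M} = Cov(\text{full sibs}) - Cov(\text{half sibs}) = \frac{1+F}{4}\sigma^2_A + \frac{(1+F)^2}{4}\sigma^2_D$, so $\sigma^2_D = \frac{4}{(1+F)^2}(\sigma^2_{F/M} - \sigma^2_M)$
If the parents are not inbred (F = 0):
$\sigma^2_M = \frac{1}{4}\sigma^2_A$
$\sigma^2_{F/M} = \frac{1}{4}\sigma^2_A + \frac{1}{4}\sigma^2_D$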
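As a rough illustration of how the nested-design mean squares translate into variance components, the sketch below applies the method-of-moments estimators and the half-sib/full-sib relationships given for this design. The function name, argument names, and all numbers are hypothetical; F is the inbreeding coefficient of the parents.

```python
def nested_design_components(msm, msf, mse, r, f, inbreeding=0.0):
    """Method-of-moments estimates for a one-location nested design.

    msm, msf, mse: mean squares for males, females/males, and error.
    r: number of blocks; f: females per male; inbreeding: F of the parents.
    """
    var_m = (msm - msf) / (r * f)   # (MSM - MSF) / rf
    var_fm = (msf - mse) / r        # (MSF - MSE) / r
    # sigma2_M = (1+F)/4 * sigma2_A
    var_a = 4.0 / (1.0 + inbreeding) * var_m
    # sigma2_F/M - sigma2_M = (1+F)^2/4 * sigma2_D
    var_d = 4.0 / (1.0 + inbreeding) ** 2 * (var_fm - var_m)
    return {"var_m": var_m, "var_fm": var_fm, "var_a": var_a, "var_d": var_d}


# Made-up mean squares, only to show the arithmetic.
est = nested_design_components(msm=60.0, msf=30.0, mse=10.0, r=2, f=5)
print(est)
```

With non-inbred parents the male component is multiplied by 4 to estimate the additive variance, so any sampling error in the male component is multiplied by 4 as well.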
fyi
Expected Mean Squares in SAS
• Random statement generates expected mean squares
• Test option obtains appropriate F tests for the model specified
• In the example below, cultivars are fixed, all other effects are random
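/* Cultivar is fixed; Loc, Rep(Loc), and Loc*Cultivar are random */
proc glm;
  class Loc Rep Cultivar;
  model Yield = Loc Rep(Loc) Cultivar Loc*Cultivar;
  /* RANDOM prints the expected mean squares; TEST requests the matching F tests */
  random Loc Rep(Loc) Loc*Cultivar / test;
run;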
controversial (could be dropped)
Source         Type III Expected Mean Square
Loc            Var(Error) + 3 Var(Loc*Cultivar) + 7 Var(Rep(Loc)) + 21 Var(Loc)
Rep(Loc)       Var(Error) + 7 Var(Rep(Loc))
Cultivar       Var(Error) + 3 Var(Loc*Cultivar) + Q(Cultivar)
Loc*Cultivar   Var(Error) + 3 Var(Loc*Cultivar)

• Var(Loc) corresponds to $\sigma^2_{Loc}$
• Q(Cultivar) denotes the fixed effect
• PROC MIXED, PROC VARCOMP, and PROC GLIMMIX fit mixed models, support REML estimation, and give direct estimates of variance components
Combining Ability
• General combining ability (GCA) – the average of all F1 crosses from a line (or genotype), expressed as a deviation from the population mean
• The expected value of a cross is the sum of the combining abilities of its two parents
• Specific combining ability (SCA) – the deviation of a cross from its expected value
$X = \bar{X} + GCA_{P1} + GCA_{P2} + SCA_{P1P2}$, where X is the performance of the cross
$G_{P1 \times P2} = \mu_{P1 \times P2} = GCA_{P1} + GCA_{P2} + SCA_{P1P2}$
$\sigma^2_X = 2\sigma^2_{GCA} + \sigma^2_{SCA}$
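A small numeric sketch of these definitions may help: the expected value of a cross is the population mean plus the GCA of each parent, and SCA is what remains of the observed cross performance. The function names and all values below are hypothetical.

```python
def expected_cross_value(pop_mean, gca_p1, gca_p2):
    """Expected performance of P1 x P2 from the parents' GCA effects."""
    return pop_mean + gca_p1 + gca_p2


def specific_combining_ability(observed, pop_mean, gca_p1, gca_p2):
    """SCA = observed cross performance minus its expected value."""
    return observed - expected_cross_value(pop_mean, gca_p1, gca_p2)


# Hypothetical yields (t/ha): population mean and GCA deviations.
pop_mean = 8.0
gca_p1, gca_p2 = 0.5, -0.25
observed = 9.0

expected = expected_cross_value(pop_mean, gca_p1, gca_p2)
sca = specific_combining_ability(observed, pop_mean, gca_p1, gca_p2)
print(expected, sca)
```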
Estimation of combining ability
GCA
• polycross method - allow all lines to intermate naturally
• top crossing - a line is crossed to a random sample of plants
from a reference population
GCA and SCA
• Factorial design (NC Design II) – a group of ‘male’ parents is crossed to a group of ‘female’ parents
– requires m × f crosses (e.g. 5 × 5 = 25)
– can be applied to two heterotic populations
• Diallel – all possible crosses among a set of parents
– n(n-1)/2 possible crosses without parents or reciprocals (e.g. 10 × 9/2 = 45)
Variations on the Diallel
• Type of cross-classified design
• With or without the parents
• With or without reciprocal crosses
– bulk seed from both parents if maternal effects are not important
• Genotypes may be random or fixed
– For random model, need many parents to adequately sample the population
• Large number of crosses!
– Can be divided into sets
– Partial diallels can be conducted
• If parents are inbred, can make paired row crosses to obtain more seed
Hallauer, Carena, and Miranda (2010) pg 119-138
Griffing’s Methods (Diallels)
• Method 1
– all possible crosses, including selfs
• Method 2
– no reciprocals
• Method 3
– no parents
• Method 4
– no parents or reciprocals
– most common, because parents are often inbred and less vigorous
• For each Method, genotypes may be
– Model I = Fixed
– Model II = Random
Diallel crossing
Parent   A     B     C     D     …….   N     Mean
A        a+a   a+b   a+c   a+d         a+n   a
B              b+b   b+c   b+d         b+n   b
C                    c+c   c+d         c+n   c
D                          d+d         d+n   d
…..
N                                      n+n   n
Diallel analysis
Random model
• Usually does not include parents and reciprocals
• Can be divided into sets

Source    df                     MS     Expected Mean Square
Blocks    r-1
Crosses   [n(n-1)/2]-1           MS2    $\sigma^2_e + r\sigma^2_C$
  GCA     n-1                    MS21   $\sigma^2_e + r\sigma^2_{SCA} + r(n-2)\sigma^2_{GCA}$
  SCA     n(n-3)/2               MS22   $\sigma^2_e + r\sigma^2_{SCA}$
Error     (r-1){[n(n-1)/2]-1}    MS1    $\sigma^2_e$

$\hat\sigma^2_{GCA} = (MS21 - MS22)/[r(n-2)]$
$\hat\sigma^2_{SCA} = (MS22 - MS1)/r$
Griffing (1956) is the classic reference
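The sketch below shows one way to check the degrees of freedom and compute the GCA and SCA variance components from the mean squares of a diallel without parents or reciprocals. The function names and the mean squares are hypothetical; only the formulas come from the ANOVA for the random model.

```python
def diallel_df(n, r):
    """Degrees of freedom for a diallel with n parents, r blocks,
    no parents and no reciprocals."""
    crosses = n * (n - 1) // 2
    return {
        "blocks": r - 1,
        "crosses": crosses - 1,
        "gca": n - 1,
        "sca": n * (n - 3) // 2,
        "error": (r - 1) * (crosses - 1),
    }


def diallel_random_components(ms_gca, ms_sca, ms_error, r, n):
    """Method-of-moments estimates of the GCA and SCA variances."""
    var_gca = (ms_gca - ms_sca) / (r * (n - 2))  # (MS21 - MS22) / r(n-2)
    var_sca = (ms_sca - ms_error) / r            # (MS22 - MS1) / r
    return var_gca, var_sca


df = diallel_df(n=10, r=3)
# GCA and SCA partition the crosses degrees of freedom.
print(df, df["gca"] + df["sca"] == df["crosses"])

# Made-up mean squares, only to show the arithmetic.
var_gca, var_sca = diallel_random_components(100.0, 40.0, 16.0, r=3, n=10)
print(var_gca, var_sca)
```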
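Genetic variances from random model
$\sigma^2_{GCA} = Cov_{HS} = \frac{1+F}{4}\sigma^2_A$, so $\sigma^2_A = \frac{4}{1+F}\sigma^2_{GCA}$
$\sigma^2_{SCA} = Cov_{FS} - 2Cov_{HS} = \frac{(1+F)^2}{4}\sigma^2_D$, so $\sigma^2_D = \frac{4}{(1+F)^2}\sigma^2_{SCA}$
General form for variance of a variance component
$Var(\hat\sigma^2_g) = \frac{2}{k^2}\sum_g \frac{MS_g^2}{f_g + 2}$
k = coefficient of the variance component in the expected mean square
$f_g$ = df of the mean square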
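To make the general form concrete, here is a minimal sketch that evaluates the variance (and standard error) of an estimated GCA variance. It assumes the estimate was formed as (MS21 - MS22)/k with k = r(n-2); the mean squares, their degrees of freedom, and the function name are hypothetical.

```python
import math


def var_of_variance_component(mean_squares, dfs, k):
    """Var(sigma_hat^2) = (2 / k^2) * sum over g of MS_g^2 / (f_g + 2).

    mean_squares: the mean squares that enter the estimate.
    dfs: degrees of freedom f_g of each mean square.
    k: coefficient of the variance component in the expected mean square.
    """
    total = sum(ms ** 2 / (f + 2) for ms, f in zip(mean_squares, dfs))
    return 2.0 / k ** 2 * total


# Made-up diallel with n = 10 parents and r = 3 blocks:
# MS for GCA = 100 on 9 df, MS for SCA = 40 on 35 df, k = r(n - 2) = 24.
v = var_of_variance_component([100.0, 40.0], [9, 35], k=24)
se = math.sqrt(v)
print(v, se)
```

The standard error can then be compared with the estimate itself to judge whether the component is estimated with useful precision.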
Fixed model
• GCA effects
$\hat g_i = \frac{1}{n(n-2)}\left[nX_{i.} - 2X_{..}\right]$
$\sigma^2(\hat g_i) = \frac{n-1}{n(n-2)}\sigma^2_e$
• SCA effects
$\hat s_{ij} = X_{ij} - \frac{1}{n-2}(X_{i.} + X_{j.}) + \frac{2}{(n-1)(n-2)}X_{..}$
$\sigma^2(\hat s_{ij}) = \frac{n-3}{n-1}\sigma^2_e$
• Lattice designs are useful
• Advantage: first order effects (means) are estimated with greater precision than variances
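The fixed-model estimators can be checked numerically. The sketch below assumes a diallel without parents or reciprocals, takes $X_{i.}$ as the total of the crosses involving parent i and $X_{..}$ as the total over all crosses (each counted once), and uses $\hat s_{ij} = X_{ij} - (X_{i.} + X_{j.})/(n-2) + 2X_{..}/[(n-1)(n-2)]$. The function name and the data are hypothetical.

```python
def diallel_fixed_effects(x):
    """GCA and SCA effects from a symmetric n x n table of cross means.

    x[i][j] is the mean of cross i x j; the diagonal (parents) is ignored.
    """
    n = len(x)
    # X_i. = total of all crosses that involve parent i
    row_tot = [sum(x[i][j] for j in range(n) if j != i) for i in range(n)]
    # X_.. = total over all crosses, each cross counted once
    grand = sum(row_tot) / 2.0
    g = [(n * row_tot[i] - 2.0 * grand) / (n * (n - 2)) for i in range(n)]
    s = {}
    for i in range(n):
        for j in range(i + 1, n):
            s[(i, j)] = (x[i][j]
                         - (row_tot[i] + row_tot[j]) / (n - 2)
                         + 2.0 * grand / ((n - 1) * (n - 2)))
    return g, s


# Build made-up cross means from known effects with no SCA,
# so the estimators should return the GCA effects and zero SCA.
mu = 10.0
true_g = [1.0, -1.0, 2.0, -2.0]
n = len(true_g)
table = [[mu + true_g[i] + true_g[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]

g_hat, s_hat = diallel_fixed_effects(table)
print(g_hat)
print(s_hat)
```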
Diallel analysis with parents
Source                 df
Blocks                 r-1
Entries                [n(n+1)/2]-1
  Parents              n-1
  Parents vs crosses   1
  Crosses              [n(n-1)/2]-1
    GCA                n-1
    SCA                n(n-3)/2
Error                  (r-1){[n(n+1)/2]-1}

Gardner-Eberhart Analysis II
Source          df
Blocks          r-1
Entries         [n(n+1)/2]-1
  Varieties     n-1
  Heterosis     n(n-1)/2
    Average     1
    Variety     n-1
    Specific    n(n-3)/2
Error           (r-1){[n(n+1)/2]-1}

• Gardner-Eberhart partitioning of Sums of Squares is non-orthogonal
• Fit model sequentially
Factorial Mating Design
Diallel
                    Parents (males)
Parents (females)   1      2      3      4
1                   …..    X12    X13    X14
2                          …..    X23    X24
3                                 …..    X34
4                                        …..

Factorial (Design II)
                    Parents (males)
Parents (females)   1      2      3      4
5                   X15    X25    X35    X45
6                   X16    X26    X36    X46
7                   X17    X27    X37    X47
8                   X18    X28    X38    X48
Number of crosses
Parents   Diallel     Factorial
4         6           4
6         15          9
10        45          25
20        190         100
100       4950        2500
n         n(n-1)/2    $n^2/4$
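The counts in this comparison can be reproduced with a few lines. The sketch assumes the diallel excludes parents and reciprocals and that the factorial splits the n parents evenly into males and females (which gives n²/4 for even n); the function names are hypothetical.

```python
def diallel_crosses(n):
    """Crosses in a diallel without parents or reciprocals: n(n-1)/2."""
    return n * (n - 1) // 2


def factorial_crosses(n):
    """Crosses in a Design II with n parents split as evenly as possible
    into males and females; equals n^2/4 when n is even."""
    males = n // 2
    females = n - males
    return males * females


for n in (4, 6, 10, 20, 100):
    print(n, diallel_crosses(n), factorial_crosses(n))
```

For the same number of parents the factorial needs roughly half as many crosses as the diallel once n is large.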
General formula for covariance of relatives
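[Pedigree diagram: individual X has parents A and B; individual Y has parents C and D]
$Cov = r\sigma^2_A + u\sigma^2_D$
$r = 2\theta_{XY}$
$u = \theta_{AC}\theta_{BD} + \theta_{AD}\theta_{BC}$
Extended to include epistasis:
$Cov = r\sigma^2_A + u\sigma^2_D + r^2\sigma^2_{AA} + ru\sigma^2_{AD} + u^2\sigma^2_{DD} + ...$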
Epistatic Variance
• Often assumed to be absent, but could bias estimates of $\sigma^2_A$ and $\sigma^2_D$ upwards
• Expected to be smaller than $\sigma^2_A$ and $\sigma^2_D$, so larger experiments are needed for adequate precision
• Estimation requires more complex mating designs
• Coefficients are correlated with those for $\sigma^2_A$ and $\sigma^2_D$, which leads to multicollinearity problems
• For most crops, experimental estimates of epistatic variance have been small
Example of mating design to estimate epistatic variance
• Design I experiment from ‘Jarvis’ and ‘Indian Chief’ maize populations
$\sigma^2_{G_0} = 4\sigma^2_{f/m} = \sigma^2_A + \sigma^2_D + \frac{3}{4}\sigma^2_{AA} + \frac{1}{2}\sigma^2_{AD} + \frac{1}{4}\sigma^2_{DD} + ...$
• Obtained random inbred lines from each population, which were used as parents in a Design II experiment
$\sigma^2_{G_1} = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD} + ...$
• A comparison of these values ($\sigma^2_{G_1} - \sigma^2_{G_0}$) can be made to estimate epistatic variances
Eberhart et al., 1966
Precision of variance components
• Minimum of 50-100 progeny to adequately sample population (Bernardo’s advice, some would say more!)
• Large numbers of progeny do not guarantee precise estimates of variance
• Confidence intervals can be determined for estimates of variance (sets lower and upper bounds)
• Negative estimates of variance components can occur in practice, even though a true variance cannot be negative
– large error variance
– true value of the genetic variance is close to zero
– Report as zero? (may lead to bias when results are compiled across many experiments)
See Bernardo, pg 166, for further details on confidence intervals
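The sketch below illustrates how a negative estimate arises and why reporting it as zero can bias a compiled average upward. All numbers and names are hypothetical, and the four "experiments" are invented so that the untruncated estimates average exactly zero.

```python
def variance_component(ms_upper, ms_lower, k):
    """Method-of-moments estimate: (MS_upper - MS_lower) / k."""
    return (ms_upper - ms_lower) / k


# A mean square smaller than the one below it gives a negative estimate.
estimate = variance_component(ms_upper=9.0, ms_lower=10.0, k=2)
reported = max(estimate, 0.0)  # "report as zero"
print(estimate, reported)

# Compiling estimates across several (invented) experiments:
estimates = [-0.5, 0.25, 0.5, -0.25]
mean_raw = sum(estimates) / len(estimates)
mean_truncated = sum(max(e, 0.0) for e in estimates) / len(estimates)
print(mean_raw, mean_truncated)
```

Here the raw estimates average to zero, while the truncated ones average above zero, which is the upward bias mentioned in the last bullet.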
Resampling methods
• Confidence interval calculations assume that the underlying distribution is normal. They work best for balanced data.
• Resampling methods are useful when
– underlying distributions are unknown or are not normal
– we don’t know how to estimate the confidence interval
• Examples
– Bootstrap – resampling with replacement
– Jackknife – systematically delete data points
– Permutation test – data scrambling
• only works when there are two or more types of families
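A minimal bootstrap sketch, assuming the statistic of interest is simply the variance among family means and that families are the units being resampled. This is a percentile interval, one of several bootstrap intervals, and the variance among family means is not itself a pure genetic variance component. The function name, data, and settings are hypothetical.

```python
import random
import statistics


def bootstrap_variance_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the sample variance of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    stats = []
    for _ in range(n_boot):
        # Resample n observations with replacement.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistics.variance(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up family means.
family_means = [4.0, 5.5, 6.1, 3.9, 5.0, 7.2, 4.4, 6.6, 5.8, 4.9]
low, high = bootstrap_variance_ci(family_means)
print(statistics.variance(family_means), low, high)
```

With real data the resampling would normally be applied to whole families and the full variance component analysis repeated on each bootstrap sample.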