PBG 650 Advanced Plant Breeding
Module 8: Estimating Genetic Variances
– Nested design
– GCA, SCA
– Diallel

Nested design
• Also called
  – North Carolina Design 1
  – Hierarchical design
• Two types of families
  – Half sibs (male groups)
  – Full-sibs (females/males)

[Diagram: males 1, 2, ..., m, with females 1, 2, 3, 4, 5, 6, 7, 8, ..., f nested within males]

Nested design – one location
Linear Model
• $Y_{ijk} = \mu + B_i + M_j + F_{k(j)} + e_{ijk}$

Source          df            MS    Expected Mean Square
Blocks          r-1           MSR
Males           m-1           MSM   $\sigma^2_e + r\sigma^2_{F/M} + rf\sigma^2_M$
Females/males   m(f-1)        MSF   $\sigma^2_e + r\sigma^2_{F/M}$
Error           (r-1)(mf-1)   MSE   $\sigma^2_e$

F tests:
$$F_{males} = \frac{MSM}{MSF} \qquad F_{F/M} = \frac{MSF}{MSE}$$

Variance component estimates:
$$\sigma^2_M = \frac{MSM - MSF}{rf} \qquad \sigma^2_{F/M} = \frac{MSF - MSE}{r}$$

Might also have sets and multiple environments
See Bernardo, pg 164, for ANOVA with sets and environments

Variance components from the nested design
$$\sigma^2_M = Cov_{Half\,sibs} = \tfrac{1}{4}\sigma^2_A \quad \text{(if the parents are not inbred)}$$
$$\sigma^2_A = \frac{4}{1+F}\,\sigma^2_M$$
$$\sigma^2_{F/M} = Cov_{Full\,sibs} - Cov_{Half\,sibs} = \tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D \quad \text{(if the parents are not inbred)}$$
$$\sigma^2_D = \frac{4}{(1+F)^2}\left(\sigma^2_{F/M} - \sigma^2_M\right)$$

Expected Mean Squares in SAS (fyi)
• Random statement generates expected mean squares
• Test option obtains appropriate F tests for the model specified
• In the example below, cultivars are fixed, all other effects are random

  Proc GLM;
    Class Loc Rep Cultivar;
    Model Yield=Loc Rep(Loc) Cultivar Loc*Cultivar;
    Random Loc Rep(Loc) Loc*Cultivar/Test;
  Run;

Slide annotation on the output: controversial (could be dropped)

Source         Type III Expected Mean Square
Loc            Var(Error) + 3 Var(Loc*Cultivar) + 7 Var(Rep(Loc)) + 21 Var(Loc)
Rep(Loc)       Var(Error) + 7 Var(Rep(Loc))
Cultivar       Var(Error) + 3 Var(Loc*Cultivar) + Q(Cultivar)
Loc*Cultivar   Var(Error) + 3 Var(Loc*Cultivar)

• Q(Cultivar): fixed effect

PROC MIXED, PROC VARCOMP, and PROC GLIMMIX use Mixed Models and REML estimation and give direct estimates of variance components

Combining Ability
• General combining ability (GCA): the average of all F1 crosses from a line (or genotype), expressed as a deviation from the population mean
• The expected value of a cross is the sum of the combining ability of its two parents
• Specific combining ability (SCA): the deviation of a cross from its expected value

$$X = \bar{X} + GCA_{P1} + GCA_{P2} + SCA_{P1P2}$$
where X is the performance of the cross P1 x P2
$$G_{P1 \times P2} = GCA_{P1} + GCA_{P2} + SCA_{P1P2}$$
$$\sigma^2_X = 2\sigma^2_{GCA} + \sigma^2_{SCA}$$

Estimation of combining ability
GCA
• polycross method: allow all lines to intermate naturally
• top crossing: a line is crossed to a random sample of plants from a reference population
GCA and SCA
• Factorial design (NC Design II)
  – a group of 'male' parents is crossed to a group of 'female' parents
  – requires m x f crosses (e.g. 5 x 5 = 25)
  – can be applied to two heterotic populations
• Diallel
  – all possible crosses among a set of parents
  – n(n-1)/2 possible crosses without parents or reciprocals (e.g. 10 x 9/2 = 45)

Variations on the Diallel
• Type of cross-classified design
• With or without the parents
• With or without reciprocal crosses
  – bulk seed from both parents if maternal effects are not important
• Genotypes may be random or fixed
  – For random model, need many parents to adequately sample the population
• Large number of crosses!
  – Can be divided into sets
  – Partial diallels can be conducted
• If parents are inbred, can make paired row crosses to obtain more seed
Hallauer, Carena, and Miranda (2010) pg 119-138

Griffing's Methods (Diallels)
• Method 1 – all possible crosses, including selfs
• Method 2 – no reciprocals
• Method 3 – no parents
• Method 4 – no parents or reciprocals
  – most common, because parents are often inbred and less vigorous
• For each Method, genotypes may be
  Model I = Fixed
  Model II = Random

Diallel crossing

Parent   A     B     C     D     ...   N     Mean
A        a+a   a+b   a+c   a+d   ...   a+n   a
B              b+b   b+c   b+d   ...   b+n   b
C                    c+c   c+d   ...   c+n   c
D                          d+d   ...   d+n   d
...
N                                      n+n   n
Diallel analysis
Random model
• Usually does not include parents and reciprocals
• Can be divided into sets

Source    df                    MS     Expected Mean Square
Blocks    r-1
Crosses   [n(n-1)/2]-1          MS2    $\sigma^2_e + r\sigma^2_C$
  GCA     n-1                   MS21   $\sigma^2_e + r\sigma^2_{SCA} + r(n-2)\sigma^2_{GCA}$
  SCA     n(n-3)/2              MS22   $\sigma^2_e + r\sigma^2_{SCA}$
Error     (r-1){[n(n-1)/2]-1}   MS1    $\sigma^2_e$

$$\sigma^2_{GCA} = \frac{MS21 - MS22}{r(n-2)} \qquad \sigma^2_{SCA} = \frac{MS22 - MS1}{r}$$

Griffing (1956) is the classic reference

Genetic variances from random model
$$\sigma^2_{GCA} = Cov_{HS} = \frac{1+F}{4}\,\sigma^2_A \qquad \sigma^2_A = \frac{4}{1+F}\,\sigma^2_{GCA}$$
$$\sigma^2_{SCA} = Cov_{FS} - 2\,Cov_{HS} = \frac{(1+F)^2}{4}\,\sigma^2_D \qquad \sigma^2_D = \frac{4}{(1+F)^2}\,\sigma^2_{SCA}$$

General form for variance of a variance component
$$Var(\hat{\sigma}^2_g) = \frac{2}{k^2} \sum_g \frac{MS_g^2}{f_g + 2}$$
k = coefficient of the variance component in the expected mean square
$f_g$ = df of the mean square

Fixed model
• GCA effects
$$\hat{g}_i = \frac{1}{n(n-2)}\left(nX_{i.} - 2X_{..}\right) \qquad \sigma^2(\hat{g}_i) = \frac{n-1}{n(n-2)}\,\sigma^2_e$$
• SCA effects
$$\hat{s}_{ij} = X_{ij} - \frac{X_{i.} + X_{j.}}{n-2} + \frac{2X_{..}}{(n-1)(n-2)} \qquad \sigma^2(\hat{s}_{ij}) = \frac{n-3}{n-1}\,\sigma^2_e$$
• Lattice designs are useful
• Advantage: first order effects (means) are estimated with greater precision than variances

Diallel analysis with parents

Source               df
Blocks               r-1
Entries              [n(n+1)/2]-1
  Parents            n-1
  Parents vs crosses 1
  Crosses            [n(n-1)/2]-1
    GCA              n-1
    SCA              n(n-3)/2
Error                (r-1){[n(n+1)/2]-1}

Gardner-Eberhart Analysis II

Source        df
Blocks        r-1
Entries       [n(n+1)/2]-1
  Varieties   n-1
  Heterosis   n(n-1)/2
    Average   1
    Variety   n-1
    Specific  n(n-3)/2
Error         (r-1){[n(n+1)/2]-1}

• Gardner-Eberhart partitioning of Sums of Squares is non-orthogonal
• Fit model sequentially

Factorial Mating Design

Diallel
                     Parents (males)
Parents (females)    1     2     3     4
1                    ...   X12   X13   X14
2                          ...   X23   X24
3                                ...   X34
4                                      ...

Factorial (Design II)
                     Parents (males)
Parents (females)    1     2     3     4
5                    X15   X25   X35   X45
6                    X16   X26   X36   X46
7                    X17   X27   X37   X47
8                    X18   X28   X38   X48

Number of crosses
Parents   Diallel    Factorial
4         6          4
6         15         9
10        45         25
20        190        100
100       4950       2500
n         n(n-1)/2   n^2/4

General formula for covariance of relatives
[Pedigree diagram: A x B produce X; C x D produce Y]
$$Cov = r\sigma^2_A + u\sigma^2_D$$
$$r = 2\theta_{XY} \qquad u = \theta_{AC}\theta_{BD} + \theta_{AD}\theta_{BC}$$
Extended to include epistasis:
$$Cov = r\sigma^2_A + u\sigma^2_D + r^2\sigma^2_{AA} + ru\,\sigma^2_{AD} + u^2\sigma^2_{DD} + \ldots$$
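
The general covariance formula, Cov = r·σ²_A + u·σ²_D with r = 2θ_XY and u = θ_AC·θ_BD + θ_AD·θ_BC, can be checked numerically for the two family types used throughout this module. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard identity θ_XY = ¼(θ_AC + θ_AD + θ_BC + θ_BD) for X from A x B and Y from C x D, which the slides do not state.

```python
def covariance_coefficients(theta_ac, theta_ad, theta_bc, theta_bd):
    """Return (r, u) for relatives X (from A x B) and Y (from C x D)."""
    # Assumption: coancestry of X and Y is the mean of the four
    # parental coancestries.
    theta_xy = 0.25 * (theta_ac + theta_ad + theta_bc + theta_bd)
    r = 2.0 * theta_xy
    u = theta_ac * theta_bd + theta_ad * theta_bc
    return r, u


def covariance_of_relatives(r, u, var_a, var_d,
                            var_aa=0.0, var_ad=0.0, var_dd=0.0):
    """Cov = r*VA + u*VD + r^2*VAA + r*u*VAD + u^2*VDD (higher terms ignored)."""
    return (r * var_a + u * var_d
            + r ** 2 * var_aa + r * u * var_ad + u ** 2 * var_dd)


# Non-inbred half sibs: common parent A = C (coancestry with itself = 1/2),
# other parents B and D unrelated.
half_sib = covariance_coefficients(0.5, 0.0, 0.0, 0.0)

# Non-inbred full sibs: A = C and B = D, the two parents unrelated.
full_sib = covariance_coefficients(0.5, 0.0, 0.0, 0.5)

print("half sibs (r, u):", half_sib)
print("full sibs (r, u):", full_sib)
```

With non-inbred parents this gives r = 1/4, u = 0 for half sibs and r = 1/2, u = 1/4 for full sibs, which agrees with Cov(half sibs) = ¼σ²_A and Cov(full sibs) − Cov(half sibs) = ¼σ²_A + ¼σ²_D in the nested design.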
Epistatic Variance
• Often assumed to be absent, but could bias estimates of $\sigma^2_A$ and $\sigma^2_D$ upwards
• Expected to be smaller than $\sigma^2_A$ and $\sigma^2_D$, so larger experiments are needed for adequate precision
• Estimation requires more complex mating designs
• Coefficients are correlated with those for $\sigma^2_A$ and $\sigma^2_D$, which leads to multicollinearity problems
• For most crops, experimental estimates of epistatic variance have been small

Example of mating design to estimate epistatic variance
• Design I experiment from 'Jarvis' and 'Indian Chief' maize populations
$$\sigma^2_{G_0} = 4\sigma^2_{f/m} = \sigma^2_A + \sigma^2_D + \tfrac{3}{4}\sigma^2_{AA} + \tfrac{1}{2}\sigma^2_{AD} + \tfrac{1}{4}\sigma^2_{DD} + \ldots$$
• Obtained random inbred lines from each population, which were used as parents in a Design II experiment
$$\sigma^2_{G_1} = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD} + \ldots$$
• A comparison of these values can be made to estimate epistatic variances
$$\sigma^2_{G_1} - \sigma^2_{G_0} = \tfrac{1}{4}\sigma^2_{AA} + \tfrac{1}{2}\sigma^2_{AD} + \tfrac{3}{4}\sigma^2_{DD} + \ldots$$
Eberhart et al., 1966

Precision of variance components
• Minimum of 50-100 progeny to adequately sample population (Bernardo's advice, some would say more!)
• Large numbers of progeny do not guarantee precise estimates of variance
• Confidence intervals can be determined for estimates of variance (sets lower and upper bounds)
• Negative estimates of variance components can occur in practice, even though a negative variance is theoretically impossible
  – large error variance
  – true value of genetic variance is close to zero
  – Report as zero? (may lead to bias when results are compiled across many experiments)
See Bernardo, pg 166, for further details on confidence intervals

Resampling methods
• Confidence interval calculations assume that the underlying distribution is normal, and they work best for balanced data
• Resampling methods are useful when
  – underlying distributions are unknown or are not normal
  – we don't know how to estimate the confidence interval
• Examples
  – Bootstrap: resampling with replacement
  – Jackknife: systematically delete data points
  – Permutation test: data scrambling (only works when there are two or more types of families)
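
As a rough illustration of the bootstrap (resampling with replacement), the sketch below computes a percentile confidence interval for a variance estimate. Everything in it is an assumption made for illustration: the function name `bootstrap_ci`, the made-up family means, and the use of the plain variance among family means as a stand-in statistic. In a real analysis the statistic would be the variance component estimated from the ANOVA of each resampled data set.

```python
import random
import statistics


def bootstrap_ci(values, statistic, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for statistic(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample the observations with replacement, same sample size
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap estimates
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical half-sib family means (made-up numbers, illustration only)
family_means = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.2, 6.8, 4.9, 6.1]

point_estimate = statistics.variance(family_means)
lower, upper = bootstrap_ci(family_means, statistics.variance)
print("variance among family means:", point_estimate)
print("95% percentile bootstrap interval:", (lower, upper))
```

Because whole families are the resampling unit here, the interval reflects sampling of families only. A jackknife version would instead delete one family at a time and recompute the statistic.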