Genetic evaluation under parental uncertainty
Robert J. Tempelman
Michigan State University, East Lansing, MI
National Animal Breeding Seminar Series, December 6, 2004

Key papers from our lab:
– Cardoso, F.F., and R.J. Tempelman. 2003. Bayesian inference on genetic merit under uncertain paternity. Genetics, Selection, Evolution 35:469-487.
– Cardoso, F.F., and R.J. Tempelman. 2004. Genetic evaluation of beef cattle accounting for uncertain paternity. Livestock Production Science 89:109-120.

Multiple sires – The situation
Cows are mated with a group of bulls under pasture conditions.
Common in large beef cattle populations raised on extensive pasture:
– Accounts for up to 50% of calves in some herds under genetic evaluation in Brazil (~25-30% on average).
– Multiple-sire group sizes range from 2 to 12+ (breeding cow group sizes range from 50 to 300+).
Common in commercial U.S. herds:
– Potential bottleneck for genetic evaluations beyond the seedstock level (Pollak, 2003).
Who is the sire?

The tabular method for computing genetic relationships
Recall the basic tabular method for computing the numerator relationship matrix:
– Henderson, C.R. 1976. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32:69.
$A = \{a_{ij}\}$, where $a_{ij}$ is the genetic relationship between animals $i$ and $j$. Let the parents of $j$ be $s_j$ and $d_j$. Then

$$a_{ij} = 0.5\,a_{i,s_j} + 0.5\,a_{i,d_j}$$
$$a_{jj} = 1 + 0.5\,a_{s_j,d_j} = 1 + F_j$$

The average numerator relationship matrix (ANRM)
– Henderson, C.R. 1988. Use of an average numerator relationship matrix for multiple-sire joining. Journal of Animal Science 66:1614-1621.
As before, $a_{ij}$ is the genetic relationship between animals $i$ and $j$. Suppose the dam of $j$ is known to be $d_j$, whereas there are $v_j$ candidate sires $(s_1, s_2, \ldots, s_{v_j})$ with probabilities $(p_{s_1}, p_{s_2}, \ldots, p_{s_{v_j}})$ of being the true sire, where $\sum_{k=1}^{v_j} p_{s_k} = 1$. Then

$$a_{ij} = 0.5\,a_{i,d_j} + 0.5\left(p_{s_1} a_{i,s_1} + p_{s_2} a_{i,s_2} + \cdots + p_{s_{v_j}} a_{i,s_{v_j}}\right)$$
$$a_{jj} = 1 + 0.5\left(p_{s_1} a_{s_1,d_j} + p_{s_2} a_{s_2,d_j} + \cdots + p_{s_{v_j}} a_{s_{v_j},d_j}\right) = 1 + F_j$$

Pedigree file example from Henderson (1988)
0 = unknown. The sire probabilities could be determined using genetic markers.

Animal   Sires     Sire probabilities   Dam
1        0         1                    0
2        0         1                    0
3        1         1                    2
4        1         1                    2
5        3         1                    4
6        3         1                    0
7        3,5       0.6, 0.4             6
8        1,5       0.3, 0.7             4
9        1,4,5     0.3, 0.6, 0.1        6
10       1         1                    4

Numerator relationship matrix (upper triangle for animals 1 to 7; rest provided in Henderson, 1988):

$$A = \begin{bmatrix}
1 & 0 & 0.5 & 0.5 & 0.5 & 0.25 & 0.375 \\
  & 1 & 0.5 & 0.5 & 0.5 & 0.25 & 0.375 \\
  &   & 1 & 0.5 & 0.75 & 0.5 & 0.7 \\
  &   &   & 1 & 0.75 & 0.25 & 0.425 \\
  &   &   &   & 1.25 & 0.375 & 0.6625 \\
  &   &   &   &   & 1 & 0.725 \\
  &   &   &   &   &   & 1.225
\end{bmatrix}$$

Worked entries for animal 7 (candidate sires 3 and 5 with probabilities 0.6 and 0.4; dam 6):

$$\begin{aligned}
a_{17} &= 0.5\,a_{16} + 0.5\,(p_3 a_{13} + p_5 a_{15}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.5)] = 0.375 \\
a_{27} &= 0.5\,a_{26} + 0.5\,(p_3 a_{23} + p_5 a_{25}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.5)] = 0.375 \\
a_{37} &= 0.5\,a_{36} + 0.5\,(p_3 a_{33} + p_5 a_{35}) = 0.5(0.5) + 0.5\,[0.6(1.0) + 0.4(0.75)] = 0.7 \\
a_{47} &= 0.5\,a_{46} + 0.5\,(p_3 a_{43} + p_5 a_{45}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.75)] = 0.425 \\
a_{57} &= 0.5\,a_{56} + 0.5\,(p_3 a_{53} + p_5 a_{55}) = 0.5(0.375) + 0.5\,[0.6(0.75) + 0.4(1.25)] = 0.6625 \\
a_{67} &= 0.5\,a_{66} + 0.5\,(p_3 a_{63} + p_5 a_{65}) = 0.5(1.0) + 0.5\,[0.6(0.5) + 0.4(0.375)] = 0.725 \\
a_{77} &= 1 + 0.5\,(p_3 a_{36} + p_5 a_{56}) = 1 + 0.5\,[0.6(0.5) + 0.4(0.375)] = 1.225
\end{aligned}$$

Note: if the true sire of 7 is 3, $a_{77} = 1.25$; if it is 5, $a_{77} = 1.1875$.

How about inferring which candidate might be the correct sire?
Empirical Bayes strategy:
– Foulley, J.L., D. Gianola, and D. Planchenault. 1987. Sire evaluation with uncertain paternity. Genetics, Selection, Evolution 19:83-102.
– Sire model implementation.

Simple sire model
$y = X\beta + Zs + e$. For the records of animals 3 to 10 in the example pedigree, with the columns of $Z$ corresponding to sires 1, 3, 4 and 5:

$$\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & ? & 0 & ? \\
? & 0 & 0 & ? \\
? & 0 & ? & ? \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}$$

One possibility: substitute the sire probabilities for the unknown elements of Z.
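The substitution just described can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary `CANDIDATES`, the list `SIRES` and the helper `build_z` are hypothetical names introduced here, and the probabilities are copied from the Henderson (1988) example pedigree.

```python
import numpy as np

# Columns of Z in the order used on the slide: sires 1, 3, 4 and 5.
SIRES = [1, 3, 4, 5]

# Candidate sires and their probabilities for the animals with records (3 to 10),
# copied from the Henderson (1988) example pedigree.
CANDIDATES = {
    3: {1: 1.0},
    4: {1: 1.0},
    5: {3: 1.0},
    6: {3: 1.0},
    7: {3: 0.6, 5: 0.4},
    8: {1: 0.3, 5: 0.7},
    9: {1: 0.3, 4: 0.6, 5: 0.1},
    10: {1: 1.0},
}


def build_z(candidates, sires):
    """Return Z with sire probabilities in place of the usual 0/1 entries."""
    column = {sire: k for k, sire in enumerate(sires)}
    z = np.zeros((len(candidates), len(sires)))
    for row, animal in enumerate(sorted(candidates)):
        for sire, prob in candidates[animal].items():
            z[row, column[sire]] = prob
    return z


Z = build_z(CANDIDATES, SIRES)
print(Z)
```

Each row of `Z` sums to one, so a record with certain paternity keeps its usual single 1, while a record with uncertain paternity is spread across its candidate sires.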
$$\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0.6 & 0 & 0.4 \\
0.3 & 0 & 0 & 0.7 \\
0.3 & 0 & 0.6 & 0.1 \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}$$

Strategy of Foulley et al. (1987)

$$\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & \Pr(\text{sire}_7 = 3 \mid y) & 0 & \Pr(\text{sire}_7 = 5 \mid y) \\
\Pr(\text{sire}_8 = 1 \mid y) & 0 & 0 & \Pr(\text{sire}_8 = 5 \mid y) \\
\Pr(\text{sire}_9 = 1 \mid y) & 0 & \Pr(\text{sire}_9 = 4 \mid y) & \Pr(\text{sire}_9 = 5 \mid y) \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}$$

$\Pr(\text{sire}_i = j \mid y)$: posterior probabilities that use the provided sire probabilities as "prior" probabilities and the data $y$ to estimate the elements of Z.
– Computed iteratively.
Limitation: can only be used for sire models.

Inferring upon elements of the design matrix
Where else is this method currently used?
Segregation analysis
– Estimating allelic frequencies and genotypic effects for a biallelic locus WITHOUT molecular marker information.
– Prior probabilities based on Hardy-Weinberg equilibrium for the base population.
– Posterior probabilities based on data.
– Reference: Janss, L.L.G., R. Thompson, and J.A.M. Van Arendonk. 1995. Application of Gibbs sampling for inference in a mixed major gene-polygenic inheritance model in animal populations. Theoretical and Applied Genetics 91:1137-1147.

Another strategy (most commonly used)
Use phantom groups (Westell et al., 1988; Quaas, 1988).
Commonly used in genetic evaluation systems with incomplete ancestral pedigrees in order to mitigate bias due to genetic trend.
– Limitations (applied to multiple sires):
1. Assumes the number of candidate sires is effectively infinite within a group.
2. Assumes none of the phantom parents are related.
3. Potential confounding problems for small groups (Quaas, 1988).

The ineffectiveness of phantom grouping for genetic evaluations in multiple-sire pastures:
– Perez-Enciso, M., and R.L. Fernando. 1992. Genetic evaluation with uncertain parentage: A comparison of methods. Theoretical and Applied Genetics 84:173-179.
– Sullivan, P.G. 1995. Alternatives for genetic evaluation with uncertain paternity. Canadian Journal of Animal Science 75:31-36.
Findings (simulation studies):
– Greater selection response using Henderson's ANRM relative to phantom grouping.
– Excluding animals with uncertain paternity reduces expected selection response by as much as 37%.

Uncertain paternity objectives
1. To propose a hierarchical Bayes animal model for genetic evaluation of individuals having uncertain paternity.
2. To estimate the posterior probability of each bull in the group being the correct sire of the individual.
3. To compare the proposed method with Henderson's ANRM via
   a. a simulation study;
   b. an application to Hereford post-weaning gain (PWG) and weaning weight (WW) data.

Uncertain paternity hierarchical Bayes model: 1st stage
Data: $y$ (performance records)
Residual terms: $e$ (assumed to be normal)
Non-genetic effects: $\beta$ (contemporary groups, age of dam, age of calf, gender)
Animal genetic values: $a$

$$y = X\beta + Za + e; \quad e \sim N(0, I\sigma_e^2)$$

Uncertain paternity hierarchical Bayes model: 2nd stage
Non-genetic effects: $\beta \sim N(\beta_o, V_\beta)$
– Prior means based on literature information.
– Variance based on the reliability of prior information.
Animal genetic values: $a \mid s \sim N(0, A_s\sigma_a^2)$
– (Co)variances based on relationship ($A$), sire assignments ($s$) and genetic variance ($\sigma_a^2$).
Residual variance: $\sigma_e^2 \sim s_e^2\chi^{-2}_{\nu_e}$
– Prior knowledge based on literature information.

Uncertain paternity hierarchical Bayes model: 3rd stage
Sire assignments: $\Pr(s_j = k \mid \pi_j) = \pi_j^{(k)}$
– Probability for sire assignments ($\pi_j$); could be based on marker data.
Genetic variance: $\sigma_a^2 \sim s_a^2\chi^{-2}_{\nu_a}$
– Prior knowledge based on literature information.

Uncertain paternity hierarchical Bayes model: 4th stage
Specifying uncertainty for the probability of sire assignments with a Dirichlet prior:

$$p(\pi_j \mid \alpha_j) \propto \prod_{k=1}^{v_j} \left(\pi_j^{(k)}\right)^{\alpha_j^{(k)}}$$

e.g. How sure are you about the prior probabilities of 0.6 and 0.4 for Sires 3 and 5, respectively, being the correct sire?
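To make that question concrete, the sketch below compares Dirichlet priors that share the mean (0.6, 0.4) but differ in how concentrated they are around it. It is a minimal illustration, not the talk's implementation: the concentration values and the helper `dirichlet_mean_sd` are hypothetical, and NumPy's standard Dirichlet parameterization (density proportional to the product of $\pi_k^{\alpha_k - 1}$) is assumed, which may differ by one in the exponents from the form used on the slide.

```python
import numpy as np


def dirichlet_mean_sd(alpha):
    """Mean and standard deviation of each component of a Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    mean = alpha / total
    sd = np.sqrt(alpha * (total - alpha) / (total**2 * (total + 1.0)))
    return mean, sd


# Prior means for sires 3 and 5 being the true sire of animal 7.
PRIOR_MEAN = np.array([0.6, 0.4])

rng = np.random.default_rng(2004)
summary = {}
for concentration in (2.0, 20.0, 200.0):  # hypothetical levels of trust
    alpha = concentration * PRIOR_MEAN
    mean, sd = dirichlet_mean_sd(alpha)
    # Monte Carlo check of the analytical standard deviation.
    draws = rng.dirichlet(alpha, size=20000)
    summary[concentration] = (mean, sd, draws.std(axis=0))
    print(concentration, mean, sd.round(3), draws.std(axis=0).round(3))
```

All three priors keep the same means, but the standard deviation of each probability shrinks as the elements of alpha grow, so larger values encode greater trust in the supplied probabilities.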
– Assessment based on how much you trust the genotype-based probabilities.
– Could also model genotyping error rates explicitly (Rosa, G.J.M., B.S. Yandell, and D. Gianola. A Bayesian approach for constructing genetic maps when markers are miscoded. Genetics, Selection, Evolution 34:353-369).

Uncertain paternity joint posterior density
[Diagram: the joint posterior density combines the data and the four stages of the model.]
– 1st stage: data; non-genetic fixed effects, genetic effects and residual error.
– 2nd stage: prior means (literature information) and variances (reliability of priors) for the fixed effects; (co)variances of genetic effects (relationship, sire assignments and genetic variance); prior knowledge on the residual variance based on literature information.
– 3rd stage: prior probabilities for sire assignments; prior knowledge on the genetic variance based on literature information.
– 4th stage: reliability of the prior probabilities for sire assignments.
Inference by Markov chain Monte Carlo (MCMC).

Simulation study (Cardoso and Tempelman, 2003)
– Generation 0: base population; selection (20 sires & 100 dams) forms the breeding population; random mating (inbreeding avoided).
– Generation 1: offspring (500 animals); selection (5 sires & 25 dams) and selection (15 sires & 75 dams) form the next breeding population; random mating (inbreeding avoided).
– Generations 2, ..., 5: as above, with 500 offspring per generation shown on the slide and a final group of offspring (360 animals).
Totals: 80 sires, 400 dams, 2000 non-parents. Sires averaged 23.6 progeny; dams averaged 5.9 progeny.

Paternity assignment
Offspring are randomly assigned to a paternity condition:
– Certain paternity: probability 0.7.
– Uncertain paternity: probability 0.3, followed by assignment to a multiple-sire group:

Group size    2     3     4     6     8     10
Probability   0.2   0.3   0.2   0.1   0.1   0.1

Within the assigned group, one of the sires is picked to be the true sire (with equal or unequal probabilities).
Record (superscripts $s$ and $d$ denote sire and dam):

$$y_i = \tfrac{1}{2}a_i^{s} + \tfrac{1}{2}a_i^{d} + m_i^{d} + \phi_i + e_i$$

The symbol of the fourth term did not survive on the slide; it is written $\phi_i$ here and is presumably the Mendelian sampling deviation.

Simulated traits
Ten datasets generated for each of two types of traits:
– Trait 1 (WW): $h_a^2 = 0.3$, $r_{am} = -0.2$, $h_m^2 = 0.2$
– Trait 2 (PWG): $h_a^2 = 0.5$, $r_{am} = 0$, $h_m^2 = 0$
Naïve prior assignments: equal prior probabilities for each candidate sire (i.e. no information based on genetic markers available).

Posterior probabilities of sire assignments being equal to true sires

                          Multiple-sire group size
Trait     Animal category   2       3       4       6       8       10
Trait 1   Parents           0.525   0.349   0.269   0.183   0.127   0.110
Trait 1   Non-parents       0.517   0.345   0.268   0.178   0.134   0.105
Trait 2   Parents           0.521   0.352   0.280   0.188   0.138   0.111
Trait 2   Non-parents       0.540   0.360   0.289   0.191   0.143   0.111

Rank correlation of predicted genetic effects
ANRM = Henderson's ANRM; HIER = proposed model; TRUE = all sires known.
[Figure: rank correlations under ANRM, HIER and TRUE. Trait 1: parents additive, non-parents additive, parents maternal, non-parents maternal. Trait 2: parents additive, non-parents additive. Vertical axis: rank correlation.]
Sidenote: model fit criteria were clearly in favor of HIER over ANRM.

Uncertain paternity application to field data
Data set
– 3,402 post-weaning gain records on Hereford calves raised in southern Brazil (1991-1999).
– 4,703 animals.
– Paternity: 57% certain, 15% uncertain and 28% unknown (base animals).
– Group sizes 2, 3, 4, 5, 6, 10, 12 & 17.
Methods
– ANRM (average relationship).
– HIER (uncertain paternity hierarchical Bayes model).

Posterior inference for PWG genetic parameters under ANRM versus HIER models

Model   Parameter        Posterior median   95% credible set
ANRM    $h_a^2$          0.231              (0.153, 0.316)
ANRM    $\sigma_a^2$     73.8               (48.0, 103.6)
ANRM    $\sigma_e^2$     246.5              (221.5, 271.2)
ANRM    $\sigma_{cg}^2$  404.5              (334.3, 494.0)
HIER    $h_a^2$          0.244              (0.162, 0.336)
HIER    $\sigma_a^2$     78.2               (51.1, 111.2)
HIER    $\sigma_e^2$     242.9              (216.5, 268.2)
HIER    $\sigma_{cg}^2$  404.5              (333.9, 493.8)

Uncertain paternity results summary
– Model choice criteria (DIC and PBF) decisively favored HIER over ANRM.
– Very high rank correlations between genetic evaluations using ANRM versus HIER.
– Some non-trivial differences in posterior means of additive genetic value for some animals.

Uncertain paternity assessment of accuracy (PWG)
Standard deviation of additive genetic effects
[Figure: scatter plot of SD(a) under ANRM (kg, vertical axis) against SD(a) under HIER (kg, horizontal axis), both from 4.0 to 9.0. Fitted line: y = 0.6786x + 2.2914, R² = 0.741. Labeled points: a sire with 9 progeny and a sire with 50 progeny.]
i.e. accuracies are generally slightly overstated with Henderson's ANRM.

Conclusions
Uncertain paternity modeling complements genetic marker information (as priors).
– Reliability of prior information can be expressed (via the Dirichlet prior).
Little advantage over the use of Henderson's ANRM.
– However, accuracies of EPDs are overstated using ANRM.
– Power of inference may improve with better statistical assumptions (e.g. heterogeneous residual variances).

Implementation issues
Likely to require a non-MCMC approach to providing genetic evaluations.
Some hybrid with phantom grouping will likely be needed.
– Candidate sires are simply not known for some animals.
Bob Weaber's talk.
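
Numerical check of the ANRM example
As a check on the Henderson (1988) example worked through earlier in the talk, the ANRM recursion can be sketched in Python. This is an illustrative sketch of the recursion as written on the slides, not production code: the names `PEDIGREE` and `average_nrm` are hypothetical, and animals are assumed to be numbered so that parents precede their offspring.

```python
import numpy as np

# animal -> (candidate sires with probabilities, dam); 0 = unknown parent.
# Copied from the Henderson (1988) example pedigree.
PEDIGREE = {
    1: ({}, 0),
    2: ({}, 0),
    3: ({1: 1.0}, 2),
    4: ({1: 1.0}, 2),
    5: ({3: 1.0}, 4),
    6: ({3: 1.0}, 0),
    7: ({3: 0.6, 5: 0.4}, 6),
    8: ({1: 0.3, 5: 0.7}, 4),
    9: ({1: 0.3, 4: 0.6, 5: 0.1}, 6),
    10: ({1: 1.0}, 4),
}


def average_nrm(pedigree):
    """Average numerator relationship matrix via the tabular recursion."""
    n = len(pedigree)
    # Row and column 0 stand for the unknown parent and stay at zero.
    a = np.zeros((n + 1, n + 1))
    for j in range(1, n + 1):
        sires, dam = pedigree[j]
        for i in range(1, j):
            sire_part = sum(p * a[i, s] for s, p in sires.items())
            a[i, j] = a[j, i] = 0.5 * a[i, dam] + 0.5 * sire_part
        # Diagonal: 1 + F_j, averaging over the candidate sires.
        a[j, j] = 1.0 + 0.5 * sum(p * a[s, dam] for s, p in sires.items())
    return a[1:, 1:]


A = average_nrm(PEDIGREE)
print(np.round(A[:7, :7], 4))
```

The column for animal 7 in the printed block can be compared directly with the worked entries on the slide.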