Uncertain paternity

Genetic evaluation under parental uncertainty
Robert J. Tempelman
Michigan State University, East Lansing, MI
National Animal Breeding Seminar Series
December 6, 2004.
Key papers from our lab:

Cardoso, F.F., and R.J. Tempelman. 2003. Bayesian inference on genetic merit under uncertain paternity. Genetics, Selection, Evolution 35:469-487.
Cardoso, F.F., and R.J. Tempelman. 2004. Genetic evaluation of beef cattle accounting for uncertain paternity. Livestock Production Science 89:109-120.
Multiple sires – The situation


Cows are mated with a group of bulls under
pasture conditions
Common in large beef cattle populations
raised on extensive pasture conditions
– Accounts for up to 50% of calves in some herds
under genetic evaluation in Brazil (~25-30% on
average)
– Multiple-sire group sizes range from 2 to 12+ bulls
(breeding-cow group sizes range from 50 to 300+)

Common in commercial U.S. herds.
– Potential bottleneck for genetic evaluations beyond
the seedstock level (Pollak, 2003).
Multiple sires – The situation
Who is the sire?
The tabular method for computing
genetic relationships

Recall the basic tabular method for computing
the numerator relationship matrix:
– Henderson, C.R. 1976. A simple method for
computing the inverse of a numerator relationship
matrix used in prediction of breeding values.
Biometrics 32:69.

A = {a_ij}, where a_ij is the genetic relationship
between animals i and j. Let the parents of j be s_j
and d_j.
\[ a_{ij} = 0.5\,a_{i,s_j} + 0.5\,a_{i,d_j} \]
\[ a_{jj} = 1 + 0.5\,a_{s_j,d_j} = 1 + F_j \]
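The two recursions above can be sketched in a few lines of Python. This is a minimal illustration, not Henderson's implementation: the function name and the five-animal pedigree are hypothetical, and animals are assumed to be numbered so that parents precede their offspring.

```python
def numerator_relationship_matrix(pedigree):
    """Build A by the tabular method.

    pedigree: list of (sire, dam) tuples for animals 1..n, ordered so that
    parents precede offspring; 0 denotes an unknown parent.
    Returns an (n+1) x (n+1) list of lists; row/column 0 is a zero
    placeholder standing in for unknown parents.
    """
    n = len(pedigree)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        sire, dam = pedigree[j - 1]
        # Off-diagonals: average of the relationships with the two parents.
        for i in range(1, j):
            A[i][j] = A[j][i] = 0.5 * A[i][sire] + 0.5 * A[i][dam]
        # Diagonal: 1 + F_j, with F_j half the relationship between parents.
        A[j][j] = 1.0 + 0.5 * A[sire][dam]
    return A


# Hypothetical pedigree: 1 and 2 are founders, 3 and 4 are full sibs,
# and 5 is the offspring of 3 and 4.
pedigree = [(0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
A = numerator_relationship_matrix(pedigree)
inbreeding_5 = A[5][5] - 1.0  # F_5 for the offspring of the full-sib mating
```

Keeping a zero row and column at index 0 lets unknown parents drop out of both recursions without special cases.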
The average numerator relationship
matrix (ANRM)

Henderson, C.R. 1988. Use of an average
numerator relationship matrix for multiple-sire
joining. Journal of Animal Science 66:1614-1621.
– a_ij is the genetic relationship between animals i and j.
Suppose the dam of j is known to be d_j, whereas there are v_j
different candidate sires (s_1, s_2, …, s_{v_j}) with probabilities
(p_{s_1}, p_{s_2}, …, p_{s_{v_j}}) of being the true sire, where
\[ \sum_{k=1}^{v_j} p_{s_k} = 1 \]
\[ a_{ij} = 0.5\,a_{i,d_j} + 0.5\left( p_{s_1} a_{i,s_1} + p_{s_2} a_{i,s_2} + \cdots + p_{s_{v_j}} a_{i,s_{v_j}} \right) \]
\[ a_{jj} = 1 + 0.5\left( p_{s_1} a_{s_1,d_j} + p_{s_2} a_{s_2,d_j} + \cdots + p_{s_{v_j}} a_{s_{v_j},d_j} \right) = 1 + F_j \]
Pedigree file example from
Henderson (1988)
0 = unknown

Animal   Sires     Sire probabilities   Dam
1        0         1                    0
2        0         1                    0
3        1         1                    2
4        1         1                    2
5        3         1                    4
6        3         1                    0
7        3,5       0.6, 0.4             6
8        1,5       0.3, 0.7             4
9        1,4,5     0.3, 0.6, 0.1        6
10       1         1                    4

Sire probabilities could be determined using genetic markers.















Upper triangle of A for animals 1 to 7 (Henderson, 1988):

      1     2     3     4     5      6       7
1     1     0     0.5   0.5   0.5    0.25    0.375
2           1     0.5   0.5   0.5    0.25    0.375
3                 1     0.5   0.75   0.5     0.7
4                       1     0.75   0.25    0.425
5                             1.25   0.375   0.6625
6                                    1       0.725
7                                            1.225

Rest provided in Henderson (1988).
Numerator relationship matrix:
Animal   Sires     Sire probabilities   Dam
7        3,5       0.6, 0.4             6
8        1,5       0.3, 0.7             4
9        1,4,5     0.3, 0.6, 0.1        6
10       1         1                    4
\[
\begin{aligned}
a_{17} &= 0.5\,a_{16} + 0.5\,(p_3 a_{13} + p_5 a_{15}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.5)] = 0.375 \\
a_{27} &= 0.5\,a_{26} + 0.5\,(p_3 a_{23} + p_5 a_{25}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.5)] = 0.375 \\
a_{37} &= 0.5\,a_{36} + 0.5\,(p_3 a_{33} + p_5 a_{35}) = 0.5(0.5) + 0.5\,[0.6(1.0) + 0.4(0.75)] = 0.7 \\
a_{47} &= 0.5\,a_{46} + 0.5\,(p_3 a_{43} + p_5 a_{45}) = 0.5(0.25) + 0.5\,[0.6(0.5) + 0.4(0.75)] = 0.425 \\
a_{57} &= 0.5\,a_{56} + 0.5\,(p_3 a_{53} + p_5 a_{55}) = 0.5(0.375) + 0.5\,[0.6(0.75) + 0.4(1.25)] = 0.6625 \\
a_{67} &= 0.5\,a_{66} + 0.5\,(p_3 a_{63} + p_5 a_{65}) = 0.5(1.0) + 0.5\,[0.6(0.5) + 0.4(0.375)] = 0.725 \\
a_{77} &= 1 + 0.5\,(p_3 a_{36} + p_5 a_{56}) = 1 + 0.5\,[0.6(0.5) + 0.4(0.375)] = 1.225
\end{aligned}
\]
Note: if the true sire of 7 is 3, a_77 = 1.25; otherwise a_77 = 1.1875.
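The hand calculations for animal 7 can be checked with a short Python sketch of the average numerator relationship recursion. The function name and data layout are illustrative assumptions rather than anything taken from Henderson (1988); the pedigree itself is the ten-animal example from the slides.

```python
def average_relationship_matrix(pedigree):
    """Average numerator relationship matrix for multiple-sire pedigrees.

    pedigree: list of (candidate_sires, probabilities, dam) for animals 1..n,
    ordered so that parents precede offspring; 0 denotes an unknown parent.
    Row/column 0 of the result is a zero placeholder for unknown parents.
    """
    n = len(pedigree)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        sires, probs, dam = pedigree[j - 1]
        for i in range(1, j):
            # Probability-weighted average over the candidate sires.
            sire_part = sum(p * A[i][s] for s, p in zip(sires, probs))
            A[i][j] = A[j][i] = 0.5 * A[i][dam] + 0.5 * sire_part
        A[j][j] = 1.0 + 0.5 * sum(p * A[s][dam] for s, p in zip(sires, probs))
    return A


# Pedigree example from the slides (0 = unknown parent).
pedigree = [
    ((0,), (1.0,), 0),                  # animal 1
    ((0,), (1.0,), 0),                  # animal 2
    ((1,), (1.0,), 2),                  # animal 3
    ((1,), (1.0,), 2),                  # animal 4
    ((3,), (1.0,), 4),                  # animal 5
    ((3,), (1.0,), 0),                  # animal 6
    ((3, 5), (0.6, 0.4), 6),            # animal 7
    ((1, 5), (0.3, 0.7), 4),            # animal 8
    ((1, 4, 5), (0.3, 0.6, 0.1), 6),    # animal 9
    ((1,), (1.0,), 4),                  # animal 10
]
A = average_relationship_matrix(pedigree)
column_7 = [A[i][7] for i in range(1, 8)]  # a_17 ... a_77
```

With a single candidate sire of probability 1 the recursion reduces to the ordinary tabular method, so certain and uncertain paternity are handled by the same loop.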
How about inferring which candidate
might be the correct sire?

Empirical Bayes Strategy:
– Foulley, J.L., D. Gianola, and D.
Planchenault. 1987. Sire evaluation with
uncertain paternity. Genetics,
Selection, Evolution. 19: 83-102.

Sire model implementation.
Simple sire model
y = Xβ + Zs + e

Animal   Sires     Sire probabilities
1        0         1
2        0         1
3        1         1
4        1         1
5        3         1
6        3         1
7        3,5       0.6, 0.4
8        1,5       0.3, 0.7
9        1,4,5     0.3, 0.6, 0.1
10       1         1

\[
\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & ? & 0 & ? \\
? & 0 & 0 & ? \\
? & 0 & ? & ? \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}
\]
One possibility: Substitute sire
probabilities for elements of Z.
Animal   Sires     Sire probabilities
1        0         1
2        0         1
3        1         1
4        1         1
5        3         1
6        3         1
7        3,5       0.6, 0.4
8        1,5       0.3, 0.7
9        1,4,5     0.3, 0.6, 0.1
10       1         1

\[
\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0.6 & 0 & 0.4 \\
0.3 & 0 & 0 & 0.7 \\
0.3 & 0 & 0.6 & 0.1 \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}
\]
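As a rough sketch of this substitution, the Python below fills each row of Z with the sire probabilities for that record. The helper name `build_z` and the dictionary layout are hypothetical; the sire lists and probabilities are those of the example pedigree, with columns ordered s1, s3, s4, s5.

```python
def build_z(records, sire_ids):
    """Fill Z with sire probabilities.

    records: dict mapping animal -> (candidate sires, probabilities).
    sire_ids: ordered list of sires defining the columns of Z.
    Rows are returned in increasing order of animal number.
    """
    column = {sire: k for k, sire in enumerate(sire_ids)}
    Z = []
    for animal in sorted(records):
        sires, probs = records[animal]
        row = [0.0] * len(sire_ids)
        for sire, p in zip(sires, probs):
            row[column[sire]] += p
        Z.append(row)
    return Z


# Animals with records (3 to 10) from the example pedigree.
records = {
    3: ((1,), (1.0,)),
    4: ((1,), (1.0,)),
    5: ((3,), (1.0,)),
    6: ((3,), (1.0,)),
    7: ((3, 5), (0.6, 0.4)),
    8: ((1, 5), (0.3, 0.7)),
    9: ((1, 4, 5), (0.3, 0.6, 0.1)),
    10: ((1,), (1.0,)),
}
sire_ids = [1, 3, 4, 5]
Z = build_z(records, sire_ids)
```

Each row of Z sums to one, so a record with a certain sire is simply the special case with a single probability of 1.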
Strategy of Foulley et al. (1987)
\[
\begin{bmatrix} y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{bmatrix}
= X\beta +
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & \Pr(\text{sire}_7 = 3 \mid y) & 0 & \Pr(\text{sire}_7 = 5 \mid y) \\
\Pr(\text{sire}_8 = 1 \mid y) & 0 & 0 & \Pr(\text{sire}_8 = 5 \mid y) \\
\Pr(\text{sire}_9 = 1 \mid y) & 0 & \Pr(\text{sire}_9 = 4 \mid y) & \Pr(\text{sire}_9 = 5 \mid y) \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} s_1 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
+
\begin{bmatrix} e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \end{bmatrix}
\]
Pr(sire_i = 'j' | y): posterior probabilities, using the provided sire probabilities as
“prior” probabilities and y to estimate elements of Z.
– Computed iteratively.
Limitation: can only be used for sire models.
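To illustrate how prior sire probabilities and a record combine into posterior probabilities, here is a minimal single-record Bayes update in Python. It is a simplified sketch, not the iterative algorithm of Foulley et al. (1987): it assumes normal residuals with a common variance, treats the current sire effect estimates as known, and uses made-up numbers throughout.

```python
import math


def posterior_sire_probs(y_adj, priors, sire_effects, sigma_e2):
    """Posterior probability of each candidate sire for one record.

    y_adj: record already adjusted for the fixed effects (assumption).
    priors: dict sire -> prior probability.
    sire_effects: dict sire -> current estimate of the sire effect.
    sigma_e2: residual variance of the sire model.
    """
    weights = {}
    for sire, prior in priors.items():
        resid = y_adj - sire_effects[sire]
        # The normalizing constant of the normal density is the same for
        # every candidate, so it cancels and only the kernel is needed.
        weights[sire] = prior * math.exp(-0.5 * resid ** 2 / sigma_e2)
    total = sum(weights.values())
    return {sire: w / total for sire, w in weights.items()}


# Hypothetical values for animal 7 (candidate sires 3 and 5).
post = posterior_sire_probs(
    y_adj=4.0,
    priors={3: 0.6, 5: 0.4},
    sire_effects={3: -2.0, 5: 3.0},
    sigma_e2=25.0,
)
```

In an iterative scheme these posterior probabilities would replace the corresponding elements of Z, the sire effects would be re-estimated, and the two steps would alternate until the values stabilize.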
Inferring upon elements of design
matrix


Where else is this method currently used?
Segregation analysis
– Estimating allelic frequencies and genotypic effects for
a biallelic locus WITHOUT molecular marker
information.
– Prior probabilities based on HW equilibrium for base
population.
– Posterior probabilities based on data.
– Reference: Janss, L.L.G., R. Thompson, and J.A.M. Van
Arendonk. 1995. Application of Gibbs sampling for
inference in a mixed major gene-polygenic inheritance
model in animal populations. Theoretical and Applied
Genetics 91:1137-1147.
Another strategy (most commonly
used)

Use phantom groups (Westell et al.,
1988; Quaas et al., 1988).
Used commonly in genetic evaluation systems
having incomplete ancestral pedigrees in
order to mitigate bias due to genetic trend.
– Limitations (applied to multiple sires):
1. Assumes the number of candidate sires is
effectively infinite within a group.
2. None of the phantom parents are related.
3. Potential confounding problems for small
groups (Quaas, 1988).
The ineffectiveness of phantom
grouping for genetic evaluations in
multiple sire pastures:


Perez-Enciso, M. and R.L. Fernando. 1992.
Genetic evaluation with uncertain parentage: A
comparison of methods. Theoretical and Applied
Genetics 84:173-179.
Sullivan, P.G. 1995. Alternatives for genetic
evaluation with uncertain paternity. Canadian
Journal of Animal Science 75:31-36.
– Greater selection response using Henderson’s ANRM
relative to phantom grouping (simulation studies).
– Excluding animals with uncertain paternity reduces
expected selection response by as much as 37%.
Uncertain paternity objectives
1. To propose a hierarchical Bayes animal model for genetic evaluation of individuals having uncertain paternity.
2. To estimate posterior probabilities of each bull in the group being the correct sire of the individual.
3. To compare the proposed method with Henderson’s ANRM via:
   1. Simulation study
   2. Application to Hereford PWG and WW data.
Uncertain paternity hierarchical Bayes model
1st stage
Data: y (performance records)
Non-genetic effects: β (contemporary groups, age of dam, age of calf, gender)
Animal genetic values: a
Residual terms: e (assumed to be normal)
\[ y = X\beta + Za + e, \qquad e \sim N(0, I\sigma_e^2) \]
Uncertain paternity hierarchical Bayes model
2nd stage
Non-genetic effects:
\[ \beta \sim N(\beta_o, V_\beta) \]
– Prior means based on literature information
– Variance based on the reliability of prior information
Animal genetic values:
\[ a \mid s \sim N(0, A_s \sigma_a^2) \]
– (Co)variances based on relationship (A), sire assignments (s) and genetic variance (σ_a²)
Residual variance:
\[ \sigma_e^2 \sim s_e^2 \chi^{-2}_{\nu_e} \]
– Prior knowledge based on literature information
Uncertain paternity hierarchical Bayes model
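3rd stage
Sire assignments:
\[ \Pr\left(s_j = s_j^{(k)} \mid \pi_j\right) = \pi_j^{(k)}, \qquad k = 1, \ldots, v_j \]
– Probability for sire assignments (π_j)
– Could be based on marker data.
Genetic variance:
\[ \sigma_a^2 \sim s_a^2 \chi^{-2}_{\nu_a} \]
– Prior knowledge based on literature information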
Uncertain paternity -
hierarchical Bayes model
4th stage
Specifying uncertainty for probability of sire assignments (Dirichlet prior):
\[ p(\pi_j \mid \alpha_j) \propto \prod_{k=1}^{v_j} \left(\pi_j^{(k)}\right)^{\alpha_j^{(k)} - 1} \]
e.g. How sure are you about the prior probabilities of 0.6 and 0.4 for Sires 3 and 5,
respectively, being the correct sire?
Assessment based on how much you trust the genotype-based probabilities.
Could also model genotyping error rates explicitly (Rosa, G.J.M., B.S. Yandell, and
D. Gianola. A Bayesian approach for constructing genetic maps when markers are
miscoded. Genetics, Selection, Evolution 34:353-369).
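One way to see how the Dirichlet prior expresses trust in the 0.6 and 0.4 probabilities: two parameter vectors can share the same prior means yet differ greatly in spread. The Python sketch below uses the standard Dirichlet mean and variance formulas; the α values (6, 4) and (60, 40) are hypothetical choices for illustration only.

```python
import math


def dirichlet_mean_sd(alpha):
    """Prior mean and standard deviation of each Dirichlet component."""
    a0 = sum(alpha)
    means = [a / a0 for a in alpha]
    sds = [math.sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1.0))) for a in alpha]
    return means, sds


# Same prior means (0.6, 0.4) for Sires 3 and 5, different confidence.
weak_means, weak_sds = dirichlet_mean_sd([6.0, 4.0])        # modest trust
strong_means, strong_sds = dirichlet_mean_sd([60.0, 40.0])  # strong trust
```

Larger α values concentrate the prior around 0.6 and 0.4, so the data have less room to move the sire probabilities; smaller values let the records speak more.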
Uncertain paternity joint posterior density
Data
1st stage: non-genetic fixed effects, genetic effects, residual error
2nd stage:
– Non-genetic effects: prior means (literature information) and variance (reliability of priors)
– Genetic effects: (co)variances (relationship, sire assignments and genetic variances)
– Residual variance: prior knowledge based on literature information
3rd stage:
– Prior probability for sire assignments
– Genetic variance: prior knowledge based on literature information
4th stage: reliability of priors for the sire assignment probabilities
Inference by Markov chain Monte Carlo (MCMC)
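A single MCMC update of a sire assignment might look like the Python sketch below. It is an illustration under stated assumptions, not the sampler of Cardoso and Tempelman (2003): parents are assumed non-inbred, so the Mendelian sampling variance is taken as 0.5σ_a² for every candidate, and all function names and numerical values are hypothetical.

```python
import math
import random


def sire_assignment_probs(a_j, a_dam, candidate_a, pi, sigma_a2):
    """Conditional probability of each candidate sire given current values.

    a_j, a_dam: current additive genetic values of the animal and its dam.
    candidate_a: current additive genetic values of the candidate sires.
    pi: current sire assignment probabilities for the candidates.
    Assumes non-inbred parents (Mendelian sampling variance 0.5 * sigma_a2).
    """
    var_ms = 0.5 * sigma_a2
    weights = []
    for a_s, p in zip(candidate_a, pi):
        deviation = a_j - 0.5 * a_s - 0.5 * a_dam  # deviation from mid-parent
        weights.append(p * math.exp(-0.5 * deviation ** 2 / var_ms))
    total = sum(weights)
    return [w / total for w in weights]


def draw_sire(probs, rng):
    """Sample the index of one candidate sire from the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(1)
probs = sire_assignment_probs(
    a_j=2.0, a_dam=1.0, candidate_a=[3.0, -1.0], pi=[0.6, 0.4], sigma_a2=4.0
)
sampled = draw_sire(probs, rng)
```

A candidate whose mid-parent value with the dam lies closer to the animal's own genetic value receives more weight, which is how the records indirectly inform paternity.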
Simulation Study (Cardoso and Tempelman, 2003)
Generation 0: Base population
– Selection (20 sires & 100 dams) to form the breeding population
– Random mating (inbreeding avoided)
Generation 1: Offspring (500 animals)
– Selection (15 sires & 75 dams) from the offspring
– Selection (5 sires & 25 dams) from the previous breeding population
– New breeding population; random mating (inbreeding avoided)
Generations 2 to 4: as for generation 1, with offspring (500 animals) each generation
Generation 5: Offspring (360 animals)
Totals: 80 sires, 400 dams, 2000 non-parents.
Sires averaged 23.6 progeny; dams averaged 5.9 progeny.
Paternity assignment
Offspring: random assignment to paternity condition
– Certain: probability 0.7
– Uncertain: probability 0.3, followed by assignment to multiple-sire groups:

Group size     2     3     4     6     8     10
Probability    0.2   0.3   0.2   0.1   0.1   0.1

Within the assigned group, one of the sires is picked to be the true sire
(with equal or unequal probabilities).
Record (s = true sire, d = dam):
\[ y_i = \tfrac{1}{2} a_i^{s} + \tfrac{1}{2} a_i^{d} + m_i^{d} + \phi_i + e_i \]
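The paternity assignment scheme can be sketched in Python as follows. The probabilities (0.7 certain, 0.3 uncertain, and the group-size distribution) come from the slide; drawing candidates at random from a pool of 20 breeding sires and picking the true sire with equal probabilities are assumptions made for this illustration, and the function name is hypothetical.

```python
import random

GROUP_SIZES = [2, 3, 4, 6, 8, 10]
GROUP_PROBS = [0.2, 0.3, 0.2, 0.1, 0.1, 0.1]


def assign_paternity(sires, rng, p_certain=0.7):
    """Return (candidate group, true sire) for one offspring.

    sires: pool of available sires (assumed here: the current breeding sires).
    With probability p_certain the sire is known and the group has one member;
    otherwise a multiple-sire group is drawn and the true sire is picked
    from it with equal probabilities.
    """
    if rng.random() < p_certain:
        sire = rng.choice(sires)
        return [sire], sire
    size = rng.choices(GROUP_SIZES, weights=GROUP_PROBS, k=1)[0]
    group = rng.sample(sires, size)
    true_sire = rng.choice(group)
    return group, true_sire


rng = random.Random(2004)
breeding_sires = list(range(1, 21))  # 20 sires per generation
assignments = [assign_paternity(breeding_sires, rng) for _ in range(500)]
share_uncertain = sum(len(g) > 1 for g, _ in assignments) / len(assignments)
```

Unequal probabilities within a group could be handled by passing weights to `rng.choices` when the true sire is picked.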
Simulated traits:

Ten datasets generated from each of
two different types of traits:
– Trait 1 (WW): \( h_a^2 = 0.3,\ r_{am} = -0.2,\ h_m^2 = 0.2 \)
– Trait 2 (PWG): \( h_a^2 = 0.5,\ r_{am} = 0,\ h_m^2 = 0 \)
Naïve prior assignments:
equal prior probabilities for each
candidate sire (i.e. no
information based on
genetic markers available)
Posterior probabilities of sire
assignments being equal to true sires

                              Multiple-sire group size
Trait     Animal category     2       3       4       6       8       10
Trait 1   Parents             0.525   0.349   0.269   0.183   0.127   0.110
Trait 1   Non-parents         0.517   0.345   0.268   0.178   0.134   0.105
Trait 2   Parents             0.521   0.352   0.280   0.188   0.138   0.111
Trait 2   Non-parents         0.540   0.360   0.289   0.191   0.143   0.111
Rank correlation of predicted genetic
effects
[Figure: bar chart of rank correlations (vertical axis from 0.35 to 0.85) for three methods: ANRM = Henderson’s ANRM, HIER = proposed model, TRUE = all sires known. Trait 1 panels: parents additive, non-parents additive, parents maternal, non-parents maternal. Trait 2 panels: parents additive, non-parents additive. Bars are labeled with the letters a and b.]
Sidenote: Model fit criteria were clearly in favor of HIER over ANRM.
Uncertain paternity application to field data
Data set
– 3,402 post-weaning gain records on Hereford calves raised in southern Brazil (from 1991-1999)
– 4,703 animals
– Paternity: 57% certain, 15% uncertain and 28% unknown (base animals)
– Group sizes 2, 3, 4, 5, 6, 10, 12 and 17
Methods
– ANRM (average relationship)
– HIER (uncertain paternity hierarchical Bayes model)
Posterior inference for PWG genetic parameters under
ANRM versus HIER models

Model   Parameter   Posterior median   95% credible set
ANRM    h_a²        0.231              (0.153, 0.316)
ANRM    σ_a²        73.8               (48.0, 103.6)
ANRM    σ_e²        246.5              (221.5, 271.2)
ANRM    σ_cg²       404.5              (334.3, 494.0)
HIER    h_a²        0.244              (0.162, 0.336)
HIER    σ_a²        78.2               (51.1, 111.2)
HIER    σ_e²        242.9              (216.5, 268.2)
HIER    σ_cg²       404.5              (333.9, 493.8)
Uncertain paternity Results summary

Model choice criteria (DIC and PBF)
decisively favored HIER over ANRM

Very high rank correlations between
genetic evaluations using ANRM versus
HIER

Some non-trivial differences in
posterior means of additive genetic
values for some animals
Uncertain paternity assessment of accuracy (PWG)
Standard deviation of additive genetic effects
[Figure: scatter plot of SD(a) under ANRM (kg, vertical axis) against SD(a) under HIER (kg, horizontal axis), both axes from 4.0 to 9.0. Fitted line: y = 0.6786x + 2.2914, R² = 0.741. Two points are annotated: a sire with 9 progeny and a sire with 50 progeny.]
i.e. accuracies are generally slightly overstated with Henderson’s ANRM
Conclusions

Uncertain paternity modeling complements
genetic marker information (as priors)
– Reliability of prior information can be
expressed (via Dirichlet).
Little advantage over the use of Henderson’s
ANRM.
– However, accuracies of EPDs are overstated
using ANRM.
– Power of inference may improve with better
statistical assumptions (e.g. heterogeneous
residual variances).
Implementation issues
Will likely require a non-MCMC approach
to provide genetic evaluations.
Some hybrid with phantom grouping
will likely be needed.

– Candidate sires are simply not known
for some animals.

Bob Weaber’s talk.