Genotyping Strategies for Genomic Selection in Dairy
Cattle
J.A. Jiménez-Montero1*, O. Gonzalez-Recio2, R. Alenda1
1Departamento de Producción Animal, E.T.S. Agrónomos-Universidad
Politécnica, Ciudad Universitaria s/n, 28040 Madrid, Spain
2Departamento de Mejora Genética Animal, INIA, Carretera La Coruña km 7,
28040 Madrid, Spain
Abstract. Genotyping strategies must be designed to make genomic selection
economically feasible in dairy cattle. A simulation was performed to study different
genotyping strategies in a northern Spanish dairy cattle population. These strategies
were used to select 2%, 5% and 10% of the female population: random
(RND), divergent phenotypic (DPH) and EBV (DBV) selection, top phenotypic
(TopPH) and EBV (TopBV) selection, and divergent family EBV selection (DFM).
For comparison, a strategy genotyping sires (SDYD) was also used.
Traits with h2=0.1 and h2=0.25 were simulated. Genomic breeding values
were estimated using Bayesian Lasso. The highest predictive accuracies were found for
the DPH and DBV strategies, which outperformed the other strategies, including SDYD.
Strategies selecting only top animals might be unwise.
Keywords: genomic selection, genotyping strategies, female population,
Bayesian Lasso, accuracy.
1 Introduction
Genomic selection is one of the most promising tools to have appeared in recent
decades for increasing the rate of genetic gain. It has focused on predicting sires’
PTA or DYD because of the higher accuracy of proven bulls’ EBVs and the impact of
sires on breeding programs. Nonetheless, several facts suggest that
genomic selection may also be of interest in the female population. For instance,
economically important traits are measured in females, females represent the
largest proportion of the Holstein population, dominance and epistatic effects may be
captured and exploited in them, and the association between a cow’s genotype and her
‘adjusted phenotype’ is expected to be stronger than that between a sire’s genotype and
his progeny average. As genotyping is still too expensive for
commercial farms, making the most of the genotyping investment remains a major
challenge for applying genomic selection in the commercial population.
Several genotyping strategies may be considered. For instance, a reduced number
of informative SNPs may be chosen from a reference population, usually sires [1,2].
Imputation algorithms may also be used to obtain unobserved SNP genotypes from
low-density panels, as described in [3]. Selective phenotyping and genotyping of the
most informative animals [4, 5] is another interesting strategy that may be compatible
with the previous ones. This study focuses on the latter.
* Corresponding author: jajmontero@gmail.com
The objective of the present study was to identify which selective genotyping
strategy is most informative for increasing the predictive accuracy of EBV in future
generations.
2 Methods
Simulation.
QMSim software [6] was run according to the parameters shown in Table 1. First,
1040 historical generations were generated to produce a realistic level of LD. Then,
20,000 females and 300 males were selected as founders, followed by 10 generations
of selection. Animals in generations 11 to 14 were used as the reference population
(40,195), while the whole of generation 15 was genotyped as the test set. These
simulations try to mimic a Spanish dairy population.
Selective genotyping strategies.
Subsets of 2%, 5% and 10% of the reference population were selected as the training
set according to different strategies for both the 0.25 and 0.10 heritability traits,
as described next:
1. At random (RND).- Females drawn randomly from the whole reference
population.
2. Divergent phenotypic values (DPH).- Equal number of females in the α and
(1-α) percentiles of the ‘adjusted’ phenotypic distribution.
3. Divergent EBV values (DBV).- Females with their breeding values in the α and
(1-α) percentiles.
4. Highest phenotypic values (TopPH).- Top ranking cows for ‘adjusted’
phenotypic values.
5. Highest EBV values (TopBV).- Top ranking cows for breeding values.
6. Divergent family EBV (DFM).- Half-sib females sired only by the best and
worst bulls in EBV.
As a benchmark, all contemporary sires (996) from G-11 to G-14 were also
genotyped, in accordance with the most common current strategy (SiresDYD). It must
be pointed out that the phenotype used in this study may be interpreted as a
phenotype adjusted for environmental effects, that is, a combination of the cow’s
genetic value plus some residual.
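As an illustration only, the individual-based strategies above (RND, DPH/DBV and TopPH/TopBV) can be sketched as a selection over a vector of adjusted phenotypes or EBVs. The function name, its arguments and the equal split between the two tails are assumptions made for this sketch and are not taken from the simulation code used in the study; DFM is not covered because it additionally requires sire information.

```python
import numpy as np

def select_training_set(values, fraction, strategy, rng=None):
    """Return sorted indices of the females chosen for genotyping.

    values   : 1-D array of adjusted phenotypes (DPH, TopPH) or EBVs (DBV, TopBV)
    fraction : proportion of the reference population to genotype, e.g. 0.02
    strategy : "random", "divergent" or "top" (hypothetical labels)
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    k = int(round(fraction * n))        # number of animals to genotype
    order = np.argsort(values)          # ascending: lowest values first
    if strategy == "random":
        rng = np.random.default_rng() if rng is None else rng
        chosen = rng.choice(n, size=k, replace=False)
    elif strategy == "divergent":
        half = k // 2                   # equal number from each tail
        chosen = np.concatenate([order[:half], order[n - half:]])
    elif strategy == "top":
        chosen = order[n - k:]          # highest-ranking animals only
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.sort(chosen)
```

Calling the function with adjusted phenotypes gives DPH or TopPH, and calling it with EBVs gives DBV or TopBV.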
Genomic evaluation model.
Bayesian Lasso [7] was used to estimate SNP coefficients in the genotyped
population (according to the strategies above) and to predict the corresponding
genomic breeding values (GBV) in generation 15. Phenotypes were used as
dependent variables in all strategies except SiresDYD, in which DYD were used.
Pearson correlations between predicted GBV and true breeding values (TBV) were
calculated in generation 15, and confidence intervals were obtained by bootstrapping
[8].
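The accuracy measure and its bootstrap interval can be sketched as follows. This is a minimal example that assumes a percentile bootstrap over animals. The text does not state which type of bootstrap interval was used, and the function name, the number of replicates and the seed are illustrative choices.

```python
import numpy as np

def bootstrap_accuracy(gbv, tbv, n_boot=1000, alpha=0.05, seed=0):
    """Pearson correlation between GBV and TBV with a percentile bootstrap CI."""
    gbv = np.asarray(gbv, dtype=float)
    tbv = np.asarray(tbv, dtype=float)
    n = gbv.size
    rng = np.random.default_rng(seed)
    point = np.corrcoef(gbv, tbv)[0, 1]          # accuracy on the full test set
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)         # resample animals with replacement
        reps[b] = np.corrcoef(gbv[idx], tbv[idx])[0, 1]
    lower, upper = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return point, lower, upper
```

The pairs (GBV, TBV) are resampled together so that each bootstrap replicate keeps the association between prediction and true value for the same animal.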
Table 1: Parameters used in the simulations
Parameter                            Medium h2     Low h2
Heritability                         0.25          0.10
Phenotypic variance                  1.0           1.0
Historical population generations    1040          1040
Founders (males/females)             300/20,000    300/20,000
Mutation rate                        2.5e-5        2.5e-5
Selection generations                15            15
SNP platform                         10,000        10,000
Bias and Risk.
Bias of the genomic predictions was estimated as the average difference between TBV
and GBV in generation 15. Finally, the risk of the estimator was calculated as
$\mathrm{Risk} = \mathrm{bias}^2 + \sigma_e^2$. As a reference, both measurements
were calculated for pedigree index values and genomic predictions based on DYD.
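The two measures can be computed as in the sketch below. It assumes that $\sigma_e^2$ denotes the variance of the prediction errors (TBV minus GBV), which the text does not state explicitly; under that assumption the risk equals the mean squared prediction error. The function name is hypothetical.

```python
import numpy as np

def bias_and_risk(tbv, gbv):
    """Bias = mean(TBV - GBV); risk = bias^2 + variance of the prediction error."""
    err = np.asarray(tbv, dtype=float) - np.asarray(gbv, dtype=float)
    bias = err.mean()
    var_e = err.var()            # assumption: sigma_e^2 is the error variance (ddof=0)
    risk = bias ** 2 + var_e     # equals mean(err**2) with ddof=0
    return bias, risk
```

With this sign convention, a negative bias means that the predictions overestimate TBV on average.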
3 Results and Discussion
Table 2 shows some parameters describing the simulated population. A linkage
disequilibrium of 0.33 (±0.02), measured as r2 [9], per chromosome in the last
generation (test set) was obtained. This value was similar to that reported by
Sargolzaei et al. [10] in Holstein cattle in North America using a 10k SNP platform,
as simulated in this study. Average inbreeding coefficients of 0.04 and 0.07
were reached in the last generation in the medium and low heritability scenarios,
respectively.
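For reference, the r2 measure of LD between adjacent SNPs [9] can be sketched from phased haplotypes coded 0/1. This is an illustrative computation under that coding assumption and not the routine used by the simulation software; the function name is hypothetical.

```python
import numpy as np

def adjacent_r2(haplotypes):
    """r2 between each pair of adjacent SNPs.

    haplotypes : 2-D array (n_haplotypes x n_snps) of 0/1 allele codes
    Returns an array of length n_snps - 1; NaN where a SNP is monomorphic.
    """
    h = np.asarray(haplotypes, dtype=float)
    p = h.mean(axis=0)                               # allele frequencies
    p_ab = (h[:, :-1] * h[:, 1:]).mean(axis=0)       # joint frequency of the 1-1 haplotype
    d = p_ab - p[:-1] * p[1:]                        # disequilibrium coefficient D
    denom = p[:-1] * (1 - p[:-1]) * p[1:] * (1 - p[1:])
    r2 = np.full(d.shape, np.nan)
    ok = denom > 0
    r2[ok] = d[ok] ** 2 / denom[ok]
    return r2
```

Averaging the returned values within a chromosome gives a per-chromosome figure of the kind reported in Table 2.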
Table 2: Genetic parameters of interest in the test population
Parameter                                  h2=0.25      h2=0.10
Inbreeding                                 0.04±0.01    0.07±0.03
LD (r2*)                                   0.33±0.02    0.33±0.02
Pedigree Index accuracy (generation 15)    0.39±0.01    0.36±0.01
*Average linkage disequilibrium between adjacent SNPs in the 30 chromosomes
Predictive accuracies, measured as Pearson correlations between predicted
GBV and TBV, are shown in Table 3 for both the medium and low heritability scenarios.
The accuracies of the SiresDYD strategy were 0.44 and 0.71 for the low and medium
heritability traits, respectively. As expected, the predictive accuracy of all
strategies in the female population increased with the proportion of genotyped animals
(Figure 1). For the medium heritability trait, accuracy appeared to plateau at around
0.8. The divergent strategies, DPH (0.46 and 0.73) and DBV (0.49 and 0.73),
showed better predictive accuracy than SiresDYD with 2% of the population genotyped
at h2=0.25. DPH achieved the highest predictive accuracy (0.66 and 0.81) for both traits.
The selective genotyping strategies TopPH (0.00 and 0.19) and TopBV (-0.05
and -0.18) achieved lower predictive accuracy than the RND strategy (0.14 and 0.44)
with 2% of the population genotyped (for the low and medium heritability traits,
respectively). Genotyping top EBV animals for use as a reference population in
the prediction of GBV gave the poorest accuracy.
Table 3: Accuracy1 of genomic breeding values
                       Divergent values                 Top values           Random
h2     % genotyped     Phen     EBV      Fam. EBV       Phe       EBV
       (ref. pop.)     (DPH)    (DBV)    (DFM)          (TopPH)   (TopBV)    (RND)
0.25   2%              0.73     0.73     0.55           0.19      -0.18      0.44
       5%              0.79     0.75     0.69           0.37      -0.05      0.59
       10%             0.81     0.76     0.73           0.42       0.08      0.71
0.1    2%              0.46     0.49     0.34           0.00      -0.05      0.14
       5%              0.59     0.52     0.42           0.06      -0.09      0.28
       10%             0.66     0.53     0.52           0.14      -0.02      0.37
1All standard deviations ranged between 0.003 and 0.007
[Figure 1. Panels: a) Heritability = 0.1, b) Heritability = 0.25. Horizontal axis: % animals genotyped in the reference population (0% to 10%). Vertical axis ticks from -0.20 to 1.00. Series: RND, DPH, TopPH, DBV, TopBV, DFM, plus SiresDYD.]
Figure 1: Estimated accuracies for GBV in generation 15 when 2%, 5% and 10% of
females in the training population (G 11 – 14) were genotyped using different strategies.
Trait heritability: a) 0.10 and b) 0.25
DFM achieved an intermediate predictive accuracy (0.34 and 0.55), between the
individual divergent strategies (DPH and DBV) and RND. Its accuracy was similar to
that obtained with SiresDYD for the medium heritability trait and exceeded it for the
low heritability trait as the number of animals in the reference population increased.
Pedigree Index based on BLUP predictions showed lower bias and risk values (0.01
and 0.12) than EGV with the SiresDYD strategy (1.82 and 3.41), while the
female strategies (Table 4) showed intermediate results. The DFM and RND strategies
showed better properties than the other female strategies, whereas TopPH was the most
biased strategy and had the highest risk. Bias and risk approached 0 for all strategies
as the reference population increased, except for DPH and DBV, where there was no
clear trend.
The DPH, DBV, TopPH and TopBV strategies tended to overestimate TBV, while RND
and DFM, as well as SiresDYD, underestimated the true genetic values of
generation 15.
Table 4: Bias and risk of genomic predictions (h2=0.25)
                       Divergent values                 Top values           Random
       % genotyped     Phen     EBV      Fam. EBV       Phe       EBV
       (ref. pop.)     (DPH)    (DBV)    (DFM)          (TopPH)   (TopBV)    (RND)
BIAS   2%              -0.50    -0.29    0.27           -1.81     -1.27      0.30
       5%              -0.66    -0.40    0.13           -1.46     -0.94      0.20
       10%             -0.62    -0.39    0.05           -1.16     -0.67      0.12
RISK   2%               0.81     0.42    0.18            3.41      1.79      0.21
       5%               0.93     0.41    0.10            2.25      1.06      0.14
       10%              0.78     0.35    0.08            1.48      0.61      0.09
4 Conclusions
Predictive accuracy of GBV depends on the number of animals genotyped and the
selective genotyping strategy used. Divergent genotyping strategies in females may
increase the efficiency of breeding programs when combined with the current male
genotyping strategy. Nonetheless, some bias was detected in all of these strategies,
mainly for the Top strategies and SiresDYD, which deserves further research.
Future research should focus on the performance of male and female genotyping, as
well as on incorporating other strategies to reduce costs. Furthermore, a complete
economic analysis is needed to find the optimal size of the genotyped reference
population.
References
1. Van Raden, P.M., Van Tassell, C.P., Wiggans, G.R., Sonstegard, T.S., Schnabel, R.D.,
Taylor, J.F., Schenkel, F.S.: Reliability of genomic predictions for North American Holstein
bulls. J. Dairy Sci. 92, 16-24 (2009)
2. Weigel, K.A., de los Campos, G., González-Recio, O., Naya, H., Wu, X.L., Long, N.,
Rosa, G.J.M., Gianola, D.: Predictive ability of direct genomic values for lifetime net
merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J.
Dairy Sci. 92, 5248-5257 (2009)
3. Weigel, K.A., Van Tassell, C.P., O'Connell, J.R., Van Raden, P.M., Wiggans, G.R.:
Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using
reference panels and population-based imputation algorithms. Submitted (2010)
4. Sen, S., Johannes, F., Broman, K.W.: Selective genotyping and phenotyping strategies in a
complex trait context. Genetics 181, 1613-1626 (2009)
5. Spangler, M.L., Sapp, R.L., Bertrand, J.K., Mac Neil, M.D., Rekaya, R.: Different
methods of selecting animals for genotyping to maximize the amount of genetic information
known in the population. J. Anim. Sci. 86, 2471-2479 (2008)
6. Sargolzaei, M., Schenkel, F.S.: QMSim: a large-scale genome simulator for livestock.
Bioinformatics 25, 680-681 (2009)
7. De los Campos, G., Naya, H., Gianola, D., Crossa, J., Legarra, A., Manfredi, E., Weigel, K.,
Cotes, J.M.: Predicting quantitative traits with regression models for dense
molecular markers and pedigree. Genetics 182, 375-385 (2009)
8. Efron, B.: Bootstrap methods: another look at the jackknife. The Annals of Statistics 7
(1), 1-26 (1979)
9. Hill, W.G., Robertson, A.: Linkage disequilibrium in finite populations. Theor. Appl. Genet.
38, 226-231 (1968)
10. Sargolzaei, M., Schenkel, F.S., Jansen, G.B., Schaeffer, L.R.: Extent of linkage
disequilibrium in Holstein cattle in North America. J. Dairy Sci. 91, 2106-2117 (2008)
11. R Development Core Team: URL http://www.R-project.org (2009)