Genotyping Strategies for Genomic Selection in Dairy Cattle

J.A. Jiménez-Montero1*, O. Gonzalez-Recio2, R. Alenda1

1 Departamento de Producción Animal, E.T.S. Agrónomos-Universidad Politécnica, Ciudad Universitaria s/n, 28040 Madrid, Spain
2 Departamento de Mejora Genética Animal, INIA, Carretera La Coruña km 7, 28040 Madrid, Spain

Abstract. Genotyping strategies have to be designed to make genomic selection in dairy cattle economically feasible. A simulation was performed to study different genotyping strategies in a northern Spanish dairy cattle population. These strategies were used to select 2%, 5% and 10% of the female population: random (RND), divergent phenotypic (DPH) and divergent EBV (DBV) selection, top phenotypic (TopPH) and top EBV (TopBV) selection, and divergent family EBV selection (DFM). For the sake of comparison, a strategy genotyping sires (SiresDYD) was also used. Simulations were run for traits with h² = 0.10 and h² = 0.25. Genomic breeding values were estimated using Bayesian Lasso. The highest predictive accuracies were found for the DPH and DBV strategies, which outperformed the other strategies, including SiresDYD. Strategies selecting only top animals might be unwise.

Keywords: genomic selection, genotyping strategies, female population, Bayesian Lasso, accuracy.

1 Introduction

Genomic selection is one of the most promising tools for increasing the rate of genetic gain to have appeared in recent decades. It has focused on predicting sires' predicted transmitting ability (PTA) or daughter yield deviation (DYD), because of the higher accuracy of proven bulls' estimated breeding values (EBV) and the impact of sires in breeding programs. Nonetheless, several facts suggest that genomic selection may also be of interest in the female population.
For instance, economically important traits are measured in females; females make up the largest proportion of the Holstein population; dominance and epistatic effects may be captured and exploited in them; and the association between a cow's genotype and her 'adjusted phenotype' is expected to be stronger than that between a sire's genotype and his progeny average. Because current genotyping costs are still prohibitive for commercial farms, maximizing the return on genotyping investment remains a major challenge for applying genomic selection in the commercial population.

Several genotyping strategies may be considered. For instance, a reduced number of informative SNPs may be chosen from a reference population, usually sires [1,2]. Imputation algorithms may be used to obtain unobserved SNP genotypes from low-density panels, as described in [3]. Selective phenotyping and genotyping of the most informative animals [4, 5] is also an interesting strategy that may be compatible with the previous ones. This study focuses on the latter. The objective of the present study was to evaluate which selective genotyping strategy is most informative for increasing the predictive accuracy of EBV in future generations.

* Corresponding author: jajmontero@gmail.com

2 Methods

Simulation. QMSim software [6] was run according to the parameters shown in Table 1. First, 1040 historical generations were generated to produce a realistic level of linkage disequilibrium (LD). Then, 20,000 females and 300 males were selected as founders, followed by 10 generations of selection. Animals in generations 11 to 14 were used as the reference population (40,195), while the whole of generation 15 was genotyped as the test set. These simulations attempt to mimic a Spanish dairy population.

Selective genotyping strategies. The 2%, 5% and 10% of the reference population were selected as the training set according to different strategies, for both the 0.25 and 0.10 heritability traits, as described next:

1. 
At random (RND).- Females drawn randomly from the whole reference population.
2. Divergent phenotypic values (DPH).- Equal numbers of females in the α and (1-α) percentiles of the 'adjusted' phenotypic distribution.
3. Divergent EBV values (DBV).- Females with their breeding values in the α and (1-α) percentiles.
4. Highest phenotypic values (TopPH).- Top-ranking cows for 'adjusted' phenotypic values.
5. Highest EBV values (TopBV).- Top-ranking cows for breeding values.
6. Divergent family EBV (DFM).- Half-sib females sired only by the best and worst bulls for EBV.

As a benchmark, all contemporary sires (996) from G-11 to G-14 were also genotyped, in accordance with the most common current strategy (SiresDYD). Note that the phenotype used in this study may be interpreted as a phenotype adjusted for environmental effects, that is, a combination of the cow's genetic value plus some residual.

Genomic evaluation model. Bayesian Lasso [7] was used to estimate SNP coefficients in the genotyped population (according to the strategies above) and to predict the corresponding genomic breeding values (GBV) in generation 15. Phenotypes were used as dependent variables in all strategies except SiresDYD, in which DYD were used. Pearson correlations between predicted GBV and true breeding values (TBV) were calculated in generation 15, and confidence intervals were obtained by bootstrapping [8].

Table 1: Parameters used in the simulations

Parameter                            Medium heritability   Low heritability
Heritability                         0.25                  0.10
Phenotypic variance                  1.0                   1.0
Historical population generations    1040                  1040
Founders (males/females)             300 / 20,000          300 / 20,000
Mutation rate                        2.5e-5                2.5e-5
Selection generations                15                    15
SNP platform                         10,000                10,000

Bias and Risk. The bias of genomic predictions was estimated as the average difference between TBV and GBV in generation 15. Finally, the risk of the estimator was calculated as $\mathrm{Risk} = \mathrm{bias}^2 + \sigma_e^2$. As a reference, both measurements were also calculated for pedigree index values and for genomic predictions based on DYD.
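The individual-based selective genotyping strategies described in the Methods (RND, DPH/DBV and TopPH/TopBV) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `select_training_set`, its arguments and the even split between the two tails are assumptions made for illustration, and the family-based DFM strategy is left out because it needs pedigree information.

```python
import random


def select_training_set(values, fraction, strategy, seed=None):
    """Return sorted indices of the females chosen for genotyping.

    values   : adjusted phenotypes (DPH, TopPH) or EBVs (DBV, TopBV), one per female
    fraction : proportion of the reference population to genotype, e.g. 0.02
    strategy : "random", "divergent" or "top" (hypothetical labels)
    """
    n = len(values)
    n_select = round(fraction * n)
    # Indices of animals ranked from lowest to highest value.
    order = sorted(range(n), key=lambda i: values[i])

    if strategy == "random":
        # RND: draw animals without replacement from the whole population.
        chosen = random.Random(seed).sample(range(n), n_select)
    elif strategy == "divergent":
        # DPH / DBV: assumed here to take equal numbers from both tails.
        n_low = n_select // 2
        n_high = n_select - n_low
        chosen = order[:n_low] + order[n - n_high:]
    elif strategy == "top":
        # TopPH / TopBV: keep only the highest-ranking animals.
        chosen = order[n - n_select:]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(chosen)
```

For example, `select_training_set(phenotypes, 0.02, "divergent")` would return the indices of the lowest 1% and highest 1% of animals, whereas `"top"` would return only the highest 2%.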
3 Results and Discussion

Table 2 shows some parameters describing the simulated population. A linkage disequilibrium of 0.33 (±0.02), measured as r² [9], per chromosome was obtained in the last generation (test set). This value was similar to that reported by Sargolzaei et al. [10] for Holstein cattle in North America using a 10k SNP platform, as simulated in this study. The average inbreeding coefficient in the last generation was 0.04 and 0.07 in the medium and low heritability scenarios, respectively.

Table 2: Genetic parameters of interest in the test population (generation 15)

Parameter                  h² = 0.25    h² = 0.10
Inbreeding                 0.04±0.01    0.07±0.03
LD (r²*)                   0.33±0.02    0.33±0.02
Pedigree index accuracy    0.39±0.01    0.36±0.01

*Average linkage disequilibrium between adjacent SNPs in the 30 chromosomes

Predictive accuracies, measured as Pearson correlations between predicted GBV and TBV, are shown in Table 3 for both the medium and low heritability scenarios. The accuracies of the SiresDYD strategy were 0.44 and 0.71 for the low and medium heritability traits, respectively. As expected, for all strategies applied to the female population, predictive accuracy increased with the proportion of genotyped animals (Figure 1). For the medium heritability trait, accuracy appeared to reach a threshold of around 0.8. The divergent strategies, DPH (0.46 and 0.73) and DBV (0.49 and 0.73), showed better predictive accuracy than SiresDYD when 2% of the population was genotyped with h² = 0.25. DPH achieved the largest predictive accuracy (0.66 and 0.81) in both traits. The selective genotyping strategies TopPH (0.00 and 0.19) and TopBV (-0.05 and -0.18) achieved lower predictive accuracy than the RND strategy (0.14 and 0.44) when 2% of the population was genotyped (for the low and medium heritability traits, respectively). Genotyping top EBV animals for use as a reference population in the prediction of GBV achieved the poorest accuracy.
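The evaluation criteria used in this study (Pearson correlation between GBV and TBV, bootstrap confidence intervals, bias and risk) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, a percentile bootstrap is assumed because the text does not say which bootstrap variant was used, and the variance term in the risk is assumed to be the variance of the prediction error (TBV minus GBV).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(gbv, tbv, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the GBV-TBV correlation (assumed variant).

    Animals are resampled with replacement; very small samples may produce
    degenerate resamples with zero variance, which this sketch does not handle.
    """
    rng = random.Random(seed)
    n = len(gbv)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([gbv[i] for i in idx], [tbv[i] for i in idx]))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[math.floor(tail * (n_boot - 1))]
    upper = stats[math.ceil((1.0 - tail) * (n_boot - 1))]
    return lower, upper


def bias_and_risk(gbv, tbv):
    """Bias = mean(TBV - GBV); risk = bias^2 + variance of (TBV - GBV) (assumption)."""
    errors = [t - g for g, t in zip(gbv, tbv)]
    n = len(errors)
    bias = sum(errors) / n
    var_e = sum((e - bias) ** 2 for e in errors) / n
    return bias, bias ** 2 + var_e
```

Under these assumptions the risk equals the mean squared prediction error, and a negative bias means that GBV overestimates TBV on average.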
Table 3: Accuracy(1) of genomic breeding values, by percentage of the reference population genotyped

                     Divergent values                          Top values                  Random
h²     % genotyped   Phen (DPH)   EBV (DBV)   Fam. EBV (DFM)   Phe (TopPH)   EBV (TopBV)   RND
0.25   2%            0.73         0.73        0.55             0.19          -0.18         0.44
0.25   5%            0.79         0.75        0.69             0.37          -0.05         0.59
0.25   10%           0.81         0.76        0.73             0.42          0.08          0.71
0.10   2%            0.46         0.49        0.34             0.00          -0.05         0.14
0.10   5%            0.59         0.52        0.42             0.06          -0.09         0.28
0.10   10%           0.66         0.53        0.52             0.14          -0.02         0.37

(1) All standard deviations ranged between 0.003 and 0.007

Figure 1: Estimated accuracies for GBV in generation 15 when 2%, 5% and 10% of females in the training population (G 11 – 14) were genotyped using different strategies. Trait heritability: a) 0.10 and b) 0.25. [Plot not reproduced. x-axis: % animals genotyped in the reference population; y-axis: accuracy; series: RND, DPH, DBV, TopPH, TopBV, DFM and SiresDYD.]

DFM achieved a predictive accuracy (0.34 and 0.55) intermediate between the individual divergent strategies (DPH and DBV) and RND. Its accuracy was similar to that obtained with SiresDYD for the medium heritability trait, and it outperformed SiresDYD for the low heritability trait as the number of animals in the reference population increased.

The pedigree index based on BLUP predictions showed lower bias and risk values (0.01 and 0.12) than GBV with the SiresDYD strategy (1.82 and 3.41), while the female strategies (Table 4) showed intermediate results. The DFM and RND strategies showed better properties, whereas TopPH was the most biased strategy and had the highest risk. Bias and risk approached 0 for all strategies as the reference population increased, except for DPH and DBV, where there was no clear trend.
The DPH, DBV, TopPH and TopBV strategies tended to overestimate TBV, while RND and DFM, as well as SiresDYD, underestimated the true genetic values of generation 15.

Table 4: Bias and risk of genomic predictions (h² = 0.25), by percentage of the reference population genotyped

                       Divergent values                          Top values                  Random
Measure  % genotyped   Phen (DPH)   EBV (DBV)   Fam. EBV (DFM)   Phe (TopPH)   EBV (TopBV)   RND
Bias     2%            -0.50        -0.29       0.27             -1.81         -1.27         0.30
Bias     5%            -0.66        -0.40       0.13             -1.46         -0.94         0.20
Bias     10%           -0.62        -0.39       0.05             -1.16         -0.67         0.12
Risk     2%            0.81         0.42        0.18             3.41          1.79          0.21
Risk     5%            0.93         0.41        0.10             2.25          1.06          0.14
Risk     10%           0.78         0.35        0.08             1.48          0.61          0.09

4 Conclusions

The predictive accuracy of GBV depends on the number of animals genotyped and on the selective genotyping strategy used. Divergent genotyping strategies in females may increase the efficiency of breeding programs when combined with the current male genotyping strategy. Nonetheless, some bias was detected in all of these strategies, mainly in the Top strategies and SiresDYD, which deserves further research. Future research should focus on the performance of combined male and female genotyping, as well as on incorporating other strategies to reduce costs. Furthermore, a complete economic analysis is needed to find the optimal size of the genotyped reference population.

References

1. VanRaden, P.M., Van Tassell, C.P., Wiggans, G.R., Sonstegard, T.S., Schnabel, R.D., Taylor, J.F., Schenkel, F.S.: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92, 16-24 (2009)
2. Weigel, K.A., de los Campos, G., González-Recio, O., Naya, H., Wu, X.L., Long, N., Rosa, G.J.M., Gianola, D.: Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J. Dairy Sci. 92, 5248-5257 (2009)
3. Weigel, K.
A., Van Tassell, C.P., O'Connell, J.R., VanRaden, P.M., Wiggans, G.R.: Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using reference panels and population-based imputation algorithms. Submitted (2010)
4. Sen, S., Johannes, F., Broman, K.W.: Selective genotyping and phenotyping strategies in a complex trait context. Genetics 181, 1613-1626 (2009)
5. Spangler, M.L., Sapp, R.L., Bertrand, J.K., MacNeil, M.D., Rekaya, R.: Different methods of selecting animals for genotyping to maximize the amount of genetic information known in the population. J. Anim. Sci. 86, 2471-2479 (2008)
6. Sargolzaei, M., Schenkel, F.S.: QMSim: a large-scale genome simulator for livestock. Bioinformatics 25, 680-681 (2009)
7. De los Campos, G., Naya, H., Gianola, D., Crossa, J., Legarra, A., Manfredi, E., Weigel, K., Cotes, J.M.: Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182, 375-385 (2009)
8. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Stat. 7 (1), 1-26 (1979)
9. Hill, W.G., Robertson, A.: Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226-231 (1968)
10. Sargolzaei, M., Schenkel, F.S., Jansen, G.B., Schaeffer, L.R.: Extent of linkage disequilibrium in Holstein cattle in North America. J. Dairy Sci. 91, 2106-2117 (2008)
11. R Development Core Team: URL http://www.R-project.org (2009)