Characterization of single nucleotide polymorphisms (SNPs) in sheep and their variation as an evidence of selection

L. Pariset1, I. Cappuccio1, M.S. D’Andrea2, D. Marletta3, P. Ajmone Marsan4, A. Valentini1 and ECONOGENE Consortium5

1 Dipartimento di Produzioni Animali, Università della Tuscia, Viterbo, Italy;
2 Dipartimento di Scienze Animali Vegetali e dell'Ambiente, Università degli Studi del Molise, Campobasso, Italy;
3 Dipartimento di Scienze Agronomiche, Agrochimiche e delle Produzioni Animali, Università di Catania, Catania, Italy;
4 Istituto di Zootecnica, Università Cattolica del Sacro Cuore, Piacenza, Italy;
5 http://lasig.epfl.ch/projets/econogene/

Correspondence: Lorraine Pariset. Dipartimento di Produzioni Animali, Facoltà di Agraria, Università della Tuscia, via S. Camillo de Lellis, 01100 Viterbo, Italy - Tel. +39 0761 357447 - Fax: +39 0761 357434 - Email: pariset@unitus.it

SUMMARY
Single nucleotide polymorphisms (SNPs), already used in commercial applications such as traceability, paternity testing and selection for suitable genotypes, have recently also been applied in evolutionary, ecological and conservation studies. SNP discovery and characterization in sheep was performed in 8 different European breeds. Eleven SNPs were characterised and genotyped on about 1700 individuals belonging to 57 breeds, and basic population parameters were computed. A method for the identification of loci that are likely to be subject to selection (outliers) was applied, and three such loci were identified.

INTRODUCTION
Single nucleotide polymorphisms (SNPs) are spread across the genome and have the potential to become the marker of choice in ecological and conservation studies (Morin et al., 2004; Seddon et al., 2005). Moreover, some studies have evaluated the utility of SNPs for estimating parameters such as population history and for inferring relationships (Kuhner et al., 2000; Glaubitz et al., 2003).
SNPs are less variable than microsatellites, but offer a vast number of potentially available loci (Brumfield et al., 2003). Despite the difficulties in SNP identification, the use of these markers allows rapid, large-scale and cost-effective genotyping (Syvanen, 2001; Vitalis et al., 2001; Vignal et al., 2002; Schlötterer, 2004). Loci that show unusually low or high levels of genetic differentiation are often assumed to be subject to selection (Bowcock et al., 1991). A method to identify such loci was described by Beaumont and Nichols (1996).

Signatures of natural selection can be detected by quantifying the variation of SNP allele frequencies between populations with the statistic Fst (Lewontin and Krakauer, 1973; Beaumont and Nichols, 1996). Under selective neutrality, Fst is determined by genetic drift, which affects all loci across the genome in a similar fashion. Selection, on the other hand, is a locus-specific force that can cause Fst values to deviate at a selected gene and at closely linked genetic markers.

A list of genes involved in key metabolic pathways influencing production, disease resistance and morphological traits has been selected for SNP discovery under the EU Econogene project (http://lasig.epfl.ch/projets/econogene/). All these genes are candidates to have high adaptive value and to be under strong natural or artificial selection.

Here, we show the potential of SNPs as genotypic markers for assessing genetic differentiation and heterozygosity in sheep. We then applied a method to identify loci behaving differently from neutral models and potentially under natural or artificial selection.

MATERIAL AND METHODS

Samples
Blood samples were collected from a total of 1748 sheep, about one third of them males, belonging to 57 breeds, 52 of which are autochthonous, and DNA was extracted with conventional methods (tab. 1).
Whenever possible, in each breed 3 samples per farm were collected in 11 farms spread over the traditional rearing area of the breed.

For polymorphism screening and SNP characterization, a panel of 16 unrelated sheep belonging to 8 different European breeds was used, chosen to represent most of the variation across a wide geographic area (fig. 1).

SNP discovery
SNP discovery was carried out on the panel described above, and polymorphisms were searched for in exon and intron sequences by direct sequencing and DHPLC analyses. Eleven genes carrying at least one SNP were discovered from sixteen genes screened.

PCR amplifications
Pairs of PCR primers were designed on genomic sequences published in GenBank belonging either to Ovis aries or to the most closely related species available. Amplification conditions were optimised to produce single amplicons of the predicted length. Each polymerase chain reaction (PCR) was performed in a volume of 30 µl containing 30 ng of genomic DNA, 1.6 pmol of each primer (MWG Biotech), 200 µM dNTPs (Promega), 1X PCR buffer and 0.2 units of Taq DNA polymerase (Amersham Pharmacia Biotech) on a PCR Express cycler (Hybaid). After initial denaturation at 94°C for 5 min, 35 cycles of amplification were performed (denaturation at 94°C for 30 sec, annealing for 30 sec at the temperature optimised for each primer pair and extension at 72°C for 1 min), followed by a final extension step at 72°C for 10 min.

Sequence analysis
PCR products were purified through Sephadex G-50 (Amersham Pharmacia Biotech) to remove residual primers and dNTPs and used as templates for two sequencing reactions. Sequencing was performed on a CEQ8800 sequencer using the DTCS QuickStart Kit (Beckman Coulter) according to the manufacturer’s instructions.
The sequence data were analysed and aligned with BioEdit software (Hall, 1999) to ascertain polymorphisms and characterize SNPs.

DHPLC analysis
Seven gene fragments were subjected to DHPLC scanning on an automated HPLC instrument (Transgenomic WAVE™ DNA Fragment Analysis System) for SNP discovery. PCR products, loaded on a DNASep analytical column (Transgenomic), were eluted from the column with a binary gradient of 0.1 M triethylammonium acetate (TEAA, pH 7.0) and 0.1 M TEAA with 25% acetonitrile (pH 7.0) at a flow rate of 1.5 ml/min. Elution gradients and analysis temperatures were predicted automatically from the target sequences by WAVEMaker 4.1 software (Transgenomic) and, where necessary, adjusted manually according to the results; the eluted products were detected by UV absorbance at 260 nm. Polymorphic individuals were sequenced to confirm and characterize each SNP.

SNP analysis
Genotyping was performed by K Biosciences (www.Kbioscience.com), using Amplifluor™ (Serologicals™) and Taqman™ (Applied Biosystems™) chemistries. Generally, accuracy greater than 99% was achieved. Quality control checks were performed (water as negative control, inter-plate duplicate testing of a known DNA, intra-plate testing of a known DNA).

Data analysis
Heterozygosity and Fst were calculated with Fdist software and used to quantify the variation of SNP allele frequencies between populations in order to detect signatures of selection (Beaumont and Nichols, 1996). Population datasets were built by bootstrapping 200000 replications on the real data using the coalescent model. The upper and lower 95% quantiles of the conditional joint distribution of Fst versus mean heterozygosity were taken as confidence limits.

Significant differences in allele frequencies of outlier genes between sheep breeds with different productive aptitudes were assessed using Student's t-test.
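The outlier logic described under Data analysis can be illustrated with a minimal Python sketch. It is not the Fdist implementation: it assumes a plain Wright's Fst with equal population weights and no sample-size correction, it takes the neutral (heterozygosity, Fst) pairs as a ready-made input rather than simulating them under the coalescent, and it takes the 2.5% and 97.5% quantiles as the 95% limits. All function names, the `window` parameter and the input layout are hypothetical.

```python
from statistics import mean


def fst_and_heterozygosity(freqs):
    """Return (Fst, total heterozygosity) for one biallelic locus.

    freqs: frequency of one allele in each population (hypothetical input).
    Uses Wright's Fst = (Ht - Hs) / Ht with equal population weights and
    no sample-size correction, unlike the estimator used by Fdist.
    """
    p_bar = mean(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = mean([2 * p * (1 - p) for p in freqs])
    if h_total == 0:
        return 0.0, 0.0  # locus is monomorphic in every population
    return (h_total - h_within) / h_total, h_total


def quantile(sorted_values, q):
    """Quantile of pre-sorted data by linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_values) - 1)
    frac = pos - low
    return sorted_values[low] * (1 - frac) + sorted_values[high] * frac


def is_outlier(obs_het, obs_fst, simulated, window=0.05):
    """Flag a locus lying outside the 95% envelope of simulated Fst.

    simulated: (heterozygosity, Fst) pairs from a neutral model; here a
    placeholder for coalescent output. Only pairs whose heterozygosity is
    within `window` of the observed value are used, which approximates
    conditioning on heterozygosity; `window` is an arbitrary choice.
    """
    nearby = sorted(f for h, f in simulated if abs(h - obs_het) <= window)
    if not nearby:
        raise ValueError("no simulated loci near the observed heterozygosity")
    lower = quantile(nearby, 0.025)
    upper = quantile(nearby, 0.975)
    return obs_fst < lower or obs_fst > upper
```

In this sketch a locus is flagged when its observed Fst falls below the lower or above the upper limit for loci of similar heterozygosity, so both unusually low and unusually high differentiation count as outliers.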
RESULTS AND DISCUSSION
Using primers designed on sequences of Ovis aries or of the most closely related species available, 16 fragments of genes with different functions were identified as polymorphic by DHPLC analysis or direct sequencing in a panel of 8 sheep breeds.

Sequence data obtained with primers designed for this study were deposited in the GenBank nucleotide sequence database under the accession numbers AY292283 (GHR), AY292289 (GHRHR), AY737509 (IGF1), AY737517 (MYH1), AY737511 (TYRP1), AY787745 (ITGB1) and AY787747 (CTSB).

Twelve SNPs were identified on a total of 6686 bp sequenced, giving an average density of one SNP every 557 bases. Successful genotyping was performed on 11 SNPs; according to the alleles listed in table 2, 9 of these were transitions (4 G<->A and 5 T<->C) and 2 were T<->G transversions.

The frequencies of the alleles are reported in table 2. Frequencies < 0.5% were considered as monomorphic. Heterozygosity of the loci determined from SNP frequencies (tab. 3) ranged from 0.063 to 0.497, with a mean of 0.28.

The Fst among different sheep breeds determined from SNP frequencies (tab. 3) shows a low level of differentiation, although the mean heterozygosity of the loci is 0.28, a value not particularly low for this type of marker, and the sample included a large number of populations spanning a considerable geographic area. A possible explanation is that the ease of transportation, even in ancient times, shuffled the gene pool and did not permit stratification through genetic drift.

Three loci out of 11 (MYH1, MEG3 and CTSB) were found to lie outside the 95% confidence region of the conditional joint distribution of Fst and mean heterozygosity (fig. 2), and are therefore potentially under selection. All of them are related to meat production and can potentially produce a modification of morphology.
Since this trait is a major (and most effective) selection target in sheep, it is not unreasonable that we found such a large percentage of genes related to meat production to be under selection. However, Student's t-test revealed that the allele frequencies of these three genes are not significantly associated with the productive aptitude of the breeds.

A survey of the allele frequencies in each breed revealed that MYH1 behaves according to a neutral model, with frequencies ranging from 0.48 to 0.52, in all four Hungarian breeds analysed. Further studies will be necessary to assess what caused this difference in this particular area.

Detecting evidence for selection is challenging because the effect of selection on the distribution of genetic variation can be mimicked by population demographic history (Akey et al., 2002). The recent use of SNPs as genetic markers provides the opportunity to identify genome-wide signatures of selection, as shown in humans (Sunyaev et al., 2000; Fay et al., 2001). Identifying loci that have been targets of selection is a challenging area of research in animal genetics. By calculating Fst and mean heterozygosity as measures of genetic differentiation for each locus, we have identified three candidate genes in sheep whose distribution of genetic variation indicates that they have been targets of selection.

ACKNOWLEDGMENTS
This work has been partially supported by the EU Econogene contract QLK5-CT-2001-02461. The content of the publication does not necessarily represent the views of the Commission or its services.

REFERENCES
Akey, J. M., Zhang, G., Zhang, K., Jin, L., Shriver, M. D. (2002) Interrogating a high-density SNP map for signatures of natural selection. Genome Research 12, 1805–1814.
Beaumont, M. A., Nichols, R. A. (1996) Evaluating loci for use in the genetic analysis of population structure. Proc. Roy. Soc. Lond. B 263, 1619–1626.
Bowcock, A. M., Kidd, J.
R., Mountain, J. L., Hebert, J. M., Carotenuto, L., Kidd, K. K., Cavalli-Sforza, L. L. (1991) Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc. Natl. Acad. Sci. USA 88, 839–843.
Brumfield, R. T., Beerli, P., Nickerson, D. A., Edwards, S. V. (2003) The utility of single nucleotide polymorphisms in inferences of population history. Trends in Ecology and Evolution 18, 249–256.
Fay, J. C., Wyckoff, G. J., Wu, C. I. (2001) Positive and negative selection on the human genome. Genetics 158, 1227–1234.
Glaubitz, J. C., Rhodes, O. E. Jr, Dewoody, J. A. (2003) Prospects for inferring pairwise relationships with single nucleotide polymorphisms. Molecular Ecology 12, 1039–1047.
Hall, T. A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41, 95–98.
Kuhner, M. K., Beerli, P., Yamato, J., Felsenstein, J. (2000) Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156, 439–447.
Lewontin, R. C., Krakauer, J. (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175–195.
Morin, P. A., Luikart, G., Wayne, R. K., SNP workshop group (2004) SNPs in ecology, evolution and conservation. Trends in Ecology and Evolution 19, 208–216.
Schlötterer, C. (2004) The evolution of molecular markers – just a matter of fashion? Nature Reviews Genetics 5, 63–69.
Seddon, J. M., Parker, H. G., Ostrander, E. A., Ellegren, H. (2005) SNPs in ecological and conservation studies: a test in the Scandinavian wolf population. Molecular Ecology 14, 503–511.
Sunyaev, S. R., Lathe III, W. C., Ramensky, V. E., Bork, P. (2000) SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet. 16, 335–337.
Syvanen, A. C. (2001) Accessing genetic variation: genotyping single nucleotide polymorphisms.
Nature Reviews Genetics 2, 930–942.
Vignal, A., Milan, D., San Cristobal, M., Eggen, A. (2002) A review on SNP and other types of molecular markers and their use in animal genetics. Genetics Selection Evolution 34, 275–305.
Vitalis, R., Dawson, K., Boursot, P. (2001) Interpretation of variation across marker loci as evidence of selection. Genetics 158, 1811–1823.

Table 1. Breeds analysed, country of origin and number of genotyped samples.
* breeds included in the SNP detection panel.

Breed                             Origin         Animals analysed
Akkaraman*                        Turkey         17
Altamurana                        Italy          31
Anogeiano                         Greece         31
Bardhoka                          Albania        31
Bergamasca*                       Italy          31
Churra                            Portugal       30
Cikta                             Hungary        31
Colmenareña                       Spain          31
Cypriot fat-tailed sheep          Cyprus         32
Dagliç                            Turkey         31
Delle Langhe                      Italy          31
Exmoor Horn                       Great Britain  31
Gentile di Puglia                 Italy          31
German Grey Heath                 Germany        31
German Merino                     Germany        31
Heri                              Saudi-Arabia   29
Hungarian Merino                  Hungary        31
Hungarian Tsigaia                 Hungary        29
Kalarritiko                       Greece         31
Kamieniec                         Poland         31
Karagouniko*                      Greece         31
Karakul                           Romania        31
Karayaka                          Turkey         31
Kefallinias                       Greece         31
Kymi                              Greece         31
Laticauda                         Italy          31
Lesvos                            Greece         31
Manchega                          Spain          31
Morkaraman                        Turkey         31
Naemi                             Saudi-Arabia   29
Najdi                             Saudi-Arabia   31
Orino                             Greece         31
Ossimi                            Egypt          31
Piliorritiko                      Greece         31
Polish Merino                     Poland         31
Polish Mountain (Owza gorska)     Poland         31
Pomorska                          Poland         31
Racka                             Hungary        31
Rhönsheep*                        Germany        31
Romanian Merino                   Romania        31
Romanian Tsigaia                  Romania        31
Rubia del Molar*                  Spain          31
Ruda                              Albania        31
Scottish Blackface                Great Britain  31
Segureña                          Spain          31
Sfakia                            Greece         31
Shkodrane                         Albania        31
Skopelos                          Greece         31
Spanish Merino I                  Spain          31
Spanish Merino II                 Spain          32
Swaledale                         Great Britain  31
Thônes et Marthod                 France         31
Turcana*                          Romania        31
Welsh Mountain*                   Great Britain  31
White and Brown Mountain Sheep    Germany        31
Wrzosowka                         Poland         31
Zelazna*                          Poland         31

Table 2. Locus, gene name, characterization and allele frequencies for each genotyped SNP.
Locus      Gene name                                    Allele1  Allele2  %Ho1   %He    %Ho2
SERPINA3   alpha1-antichymotrypsin                      A        G        1.1    16.5   82.4
MEG3       callipyge                                    T        G        13.3   42.3   44.4
CTSB       cathepsin B                                  A        G        1.3    14.5   84.1
GHRHR      growth hormone releasing hormone receptor    G        T        5.5    31.8   62.7
GHR        growth hormone receptor                      G        A        30.2   46.0   23.8
IGF1       insulin like growth factor 1                 T        C        38.8   44.1   17.1
ITGB1      integrin B1                                  T        C        1.4    15.1   83.5
GDF8       myostatin                                    T        C        0.5    5.3    94.2
TYRP1      tyrosinase-related protein 1                 C        T        2.8    23.4   73.9
ZP2        zona pellucida 2                             T        C        4.8    27.7   67.5
MYH1       myosin                                       G        A        84.4   15.0   0.6

Table 3. Heterozygosity and Fst calculated for each locus.

Locus      Sample heterozygosity  Sample Fst  Test statistic  P (simulated Fst < sample Fst)
SERPINA3   0.170                  0.039       -1.407          0.094
MEG3       0.451                  0.030       -2.555          0.007
CTSB       0.169                  0.022       -2.647          0.005
GHRHR      0.338                  0.054       -0.736          0.256
GHR        0.497                  0.085       0.864           0.218
IGF1       0.466                  0.046       -1.303          0.113
ITGB1      0.166                  0.068       0.115           0.463
GDF8       0.063                  0.055       -0.181          0.441
TYRP1      0.254                  0.057       -0.480          0.339
ZP2        0.308                  0.087       0.873           0.215
MYH1       0.169                  0.196       3.353           0.001

Figure 1. Distribution of the 8 European breeds used for SNP characterization.

Figure 2. Plot of Fst against heterozygosity for the 11 SNPs analysed. ▲: upper and lower confidence limits of 95% quantiles. ▬: median of 200000 replications of expected Fst and heterozygosity using the coalescent model.
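As a worked illustration of how the values in table 2 relate to heterozygosity, the following Python sketch converts the genotype columns into an allele frequency and an expected heterozygosity, and classifies each substitution as a transition or a transversion. It assumes that %Ho1, %He and %Ho2 are the percentages of allele 1 homozygotes, heterozygotes and allele 2 homozygotes (each row sums to about 100), and it uses the Hardy-Weinberg expectation 2pq on pooled samples, which need not match the Fdist estimates in table 3 for every locus. The function and variable names are hypothetical.

```python
PURINES = {"A", "G"}

# (allele1, allele2, %Ho1, %He, %Ho2), copied from table 2
TABLE2 = {
    "SERPINA3": ("A", "G", 1.1, 16.5, 82.4),
    "MEG3": ("T", "G", 13.3, 42.3, 44.4),
    "CTSB": ("A", "G", 1.3, 14.5, 84.1),
    "GHRHR": ("G", "T", 5.5, 31.8, 62.7),
    "GHR": ("G", "A", 30.2, 46.0, 23.8),
    "IGF1": ("T", "C", 38.8, 44.1, 17.1),
    "ITGB1": ("T", "C", 1.4, 15.1, 83.5),
    "GDF8": ("T", "C", 0.5, 5.3, 94.2),
    "TYRP1": ("C", "T", 2.8, 23.4, 73.9),
    "ZP2": ("T", "C", 4.8, 27.7, 67.5),
    "MYH1": ("G", "A", 84.4, 15.0, 0.6),
}


def allele1_frequency(pct_hom1, pct_het):
    """Frequency of allele 1: all of its homozygotes plus half the heterozygotes."""
    return (pct_hom1 + pct_het / 2) / 100


def expected_heterozygosity(p):
    """Hardy-Weinberg expected heterozygosity 2pq for a biallelic locus."""
    return 2 * p * (1 - p)


def substitution_type(a, b):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition."""
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def summarise(table):
    """Map each locus to (allele 1 frequency, expected heterozygosity, type)."""
    summary = {}
    for locus, (a1, a2, hom1, het, _hom2) in table.items():
        p = allele1_frequency(hom1, het)
        summary[locus] = (p, expected_heterozygosity(p), substitution_type(a1, a2))
    return summary


if __name__ == "__main__":
    for locus, (p, he, kind) in summarise(TABLE2).items():
        print(f"{locus:9s} p={p:.3f} He={he:.3f} {kind}")
```

For MEG3, for example, the sketch gives an allele 1 frequency of (13.3 + 42.3/2)/100 = 0.3445 and an expected heterozygosity close to the value reported for that locus in table 3.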