Neutral and selective processes shape MHC gene diversity and expression in stocked brook charr populations (Salvelinus fontinalis)

Fabien C. Lamaze, Eric Normandeau, Scott A. Pavey, Gabriel Roy, Dany Garant, and Louis Bernatchez

APPENDIX / SUPPLEMENTARY MATERIAL

Table S1. Descriptive genetic statistics for the MHC IIβ locus. Included are the year of sampling; the population category (Cat.), either heavily stocked (HS), moderately stocked (MS), or non-stocked (NS) lakes (Marie et al. 2010), or reference population (Ref; Lamaze et al. 2013); population names and their location, either Portneuf Wildlife Reserve (PN) or Mastigouche Wildlife Reserve (MA) (see Fig. 1; Marie et al. 2010); number of individuals sampled (N); number of fish successfully genotyped (Ng); number of alleles (A); allelic richness corrected for the smallest population (Ar); unbiased expected heterozygosity (He; Nei 1978); observed heterozygosity (Ho); and FIS (Weir & Cockerham 1984). (*) indicates a significant departure from Hardy-Weinberg equilibrium. Averages are given ± their error term, as in the source table.

Year       Cat.  Population                     Location  N   Ng  A     Ar              He             Ho             FIS
2007-2008  HS    Amanites Lake (AMA)            PN        24  18  7     6.833           0.751          0.389          0.489*
2007-2008  HS    Belles-de-Jour Lake (BEL)      PN        48  44  16    10.849          0.78           0.455          0.420*
2007-2008  HS    Méthot Lake (MET)              PN        72  69  21    13.439          0.848          0.623          0.267*
2007-2008  HS    Average                                          14.7  10.374 ± 1.922  0.793 ± 0.029  0.489 ± 0.070  0.392 ± 0.066
2007-2008  MS    Arcand Lake (ARC)              PN        24  23  6     5.658           0.671          0.7391         -0.105*
2007-2008  MS    Rivard Lake (RIV)              PN        24  18  8     7.943           0.86           0.722          0.164
2007-2008  MS    Veillette Lake (VEI)           PN        24  19  6     5.781           0.711          0.736          -0.037
2007-2008  MS    Average                                          6.7   6.461 ± 0.742   0.747 ± 0.058  0.732 ± 0.005  0.007 ± 0.081
2007-2008  NS    Caribou Lake (CAR)             PN        24  21  6     5.586           0.56           0.476          0.153
2007-2008  NS    Main de fer Lake (MAI)         PN        24  24  6     5.895           0.635          0.5            0.216*
2007-2008  NS    Sorbier Lake (SOR)             PN        24  17  8     8.000           0.841          0.588          0.307*
2007-2008  NS    Average                                          7     6.827 ± 1.090   0.679 ± 0.084  0.521 ± 0.034  0.225 ± 0.045
2009       HS    Méthot Lake (MET9)             PN        43  43  17    12.025          0.861          0.651          0.246*
2009       MS    Petit St. Bernard Lake (BER9)  MA        47  47  13    10.099          0.801          0.511          0.365*
2007       Ref   Jacques Cartier Hatchery (JC)  -         48  41  16    10.777          0.756          0.561          0.259*

Table S2. Sequencing primers, real-time PCR primers, and Taqman MGB probe for each candidate gene. Product sizes are given for Salvelinus fontinalis partial cDNA sequences generated with the sequencing primers.

Gene          Oligonucleotide                       Sequence (5’–3’)          Product size (bp)
Actin         Sequencing primer, forward            AGATGAAATCGCCGCACTGGTT    278
Actin         Sequencing primer, reverse            CTCGTTGTAGAAGGTGTGATGCCA  278
Actin         Real-time PCR primer, forward         GCTGTCTTCCCCTCCATCGT
Actin         Real-time PCR primer, reverse         TCTCCCACGTAGCTGTCTTTCTG
Actin         Taqman MGB probe                      TCGTCCCAGGCATC
MHC Iα (UBA)  Sequencing primer, forward            CAGGTTTCTACCCCAGTGG       443
MHC Iα (UBA)  Sequencing primer, reverse            ACAACAACAGCAACGACGAG      443
MHC Iα (UBA)  Real-time PCR primer, forward         CATTGAGTGGCTGAAGAAGTA
MHC Iα (UBA)  Real-time PCR primer, reverse         TCTTCTGGAGCAGAGACACTG
MHC Iα (UBA)  Taqman MGB probe                      ACTATGGGAAGAGCACTC
MHC II (DAB)  Sequencing primer, forward (SP4501)   CCTGTATTTATGTTCTCCTTTC    350
MHC II (DAB)  Sequencing primer, reverse (SP4502)   TAAGTGTTGCTACGGAGCC       350
MHC II (DAB)  Real-time PCR primer, forward         GCGCCGTACTGGATAAGACAGTT
MHC II (DAB)  Real-time PCR primer, reverse         TCAGCATGGCAGGGTGTCT
MHC II (DAB)  Taqman MGB probe                      TGAGCTCAGTGACTCC

Table S3. Parasite abundance for 2008 and 2009. Included are the year; the sampling region, either Portneuf Wildlife Reserve or Mastigouche Wildlife Reserve (see Fig. 1; Marie et al. 2010); the lake; the number of individuals sampled (N); the stocking category of lakes (Cat.), either heavily stocked (HS), moderately stocked (MS), or non-stocked (NS) (Marie et al. 2010); and the mean number of parasites per fish ± 1 standard error (SE). (g) indicates that individuals were genotyped at the MHC IIβ locus.
(NA) indicates data not available, as only the stomach was screened in 2008 (see Materials and Methods).

Year 2008 2009 Echinorhynchus sp. (Acanthocephala) Mean ± SE Eubothrium sp. (Cestoda) Mean ± SE Crepidostomum sp. (Trematoda) Mean ± SE NA NA NA NA NA Sterliadochona sp. (Nematoda) Mean ± SE 9.7 ± 2.0 0.4 ± 0.4 2.6 ± 0.9 7.1 ± 2.2 324.6 ± 70.3 NA NA NA NA NA Region Mastigouche Lake Brochard (BRO) Hollis (HOL) Petit St-Bernard (BER) Chamberlain (CHA) Moyen (MOY) N Cat. 10 HS 10 HS 10 MS 10 MS 4 NS Portneuf Amanites (AMA) g Belles de Jour (BEL) g Méthot (MET) g Caribou (CAR) g Main de Fer (MAI) g 11 10 11 11 10 HS HS HS NS NS NA NA NA NA NA NA NA NA NA NA 277.6 ± 61.5 0.0 ± 0.0 0.0 ± 0.0 1.9 ± 1.7 93.5 ± 16.3 NA NA NA NA NA Mastigouche Portneuf Petit St-Bernard (BER9) g Méthot (MET9) g 47 43 MS HS 3.1 ± 2.6 0.0 ± 0.0 0.02 ± 0.02 2.6 ± 0.6 2.4 ± 0.6 0.07 ± 0.07 50.7 ± 13.9 249.9 ± 40.1 NA NA NA NA NA

Figure S1: Neighbor-net network of the 29 MHC II beta 1 domains observed in Salvelinus fontinalis. Blue indicates sequences with the deletion of an amino acid at position 59. Red indicates sequences with an amino acid insertion at position 60. “r” and “s” denote the alleles resistant (allele 21) and susceptible (allele 6) to Aeromonas salmonicida infection (Croisetière et al. 2008).

Figure S2: Amino acid alignment of the 29 MHC IIβ alleles of Salvelinus fontinalis. The nomenclature of alleles 6 and 21 follows Croisetière et al. (2008). (*) indicates sites under positive selection after controlling for recombination.

Figure S3: Allelic frequency distribution at the MHC IIβ gene. MHC IIβ allele frequency per lake for three heavily stocked (HS) populations (AMA, BEL, MET), three moderately stocked (MS) populations (ARC, RIV, VEI), three non-stocked (NS) populations (CAR, MAI, SOR), and one reference domestic strain (JC).
Figure S4: The simulated three-dimensional structure model of the beta 1 domain of Salvelinus fontinalis MHC class II. The tertiary structure prediction was based on the most frequent allele (allele 29), which shares 31% homology with the human sequence sp|P04440 (PDB hit: 3lqzB) in the Protein Data Bank (http://www.rcsb.org/pdb/explore/explore.do). Amino acid residues under significant positive selection in S. fontinalis that correspond to antigen binding sites or to the homodimerization patch in humans are highlighted in red or dark red, respectively. Residues shown in yellow and orange were under significant positive selection in S. fontinalis but do not correspond to antigen binding sites in humans. Residues in orange are conserved among the salmonid species investigated thus far. Residues in blue are human antigen binding sites that were not found to evolve under positive selection in S. fontinalis. The 85 amplified exon 2 codons are numbered only for sites under episodic selection; they correspond to codons 9-92 of the mature protein. In each residue label, the first and second numbers, separated by a slash, give the codon under positive selection in S. fontinalis and the corresponding codon number of the mature protein, respectively. Residue labels shown in the figure: 1/9, 4/12, 5/13, 17/24, 21/28, 24/31, 46/53, 48/55, 49/56, 53/60, 59/66, 71/78, 74/81, 77/84, 78/85, 82/89, 83/90.

Figure S5: Box plot presenting MHC IIβ expression in brook charr head kidney as a function of minisatellite repeat number in MHC IIβ intron 2 and temperature (8°C or 20°C). Grey and white boxes correspond to long and short minisatellite repeat motifs (32 bp), respectively. The distributions of the four groups were compared with the least significant difference test with a Bonferroni correction (α = 0.05). The black dot represents the mean value.
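As a rough illustration of the Bonferroni correction mentioned for Figure S5, the sketch below computes the per-comparison significance level for four groups. It assumes that all pairwise comparisons among the four groups (two repeat-length classes at two temperatures) were tested and that α = 0.05 is the family-wise level; neither point is stated in the caption. The group labels and the function name `bonferroni_alpha` are illustrative only.

```python
from itertools import combinations
from typing import Tuple

# Illustrative labels for the four groups in Figure S5
# (repeat-length class x rearing temperature); not taken from the source data.
groups = [
    ("short", "8C"),
    ("short", "20C"),
    ("long", "8C"),
    ("long", "20C"),
]


def bonferroni_alpha(n_groups: int, family_alpha: float = 0.05) -> Tuple[int, float]:
    """Return (number of pairwise comparisons, per-comparison alpha).

    Assumes every pair of groups is compared exactly once.
    """
    n_pairs = n_groups * (n_groups - 1) // 2
    return n_pairs, family_alpha / n_pairs


# Enumerate the pairs explicitly so the count can be checked by eye.
pairs = list(combinations(groups, 2))
n_pairs, alpha_per_test = bonferroni_alpha(len(groups))

print(f"{n_pairs} pairwise comparisons, per-comparison alpha = {alpha_per_test:.4f}")
```

Under these assumptions, a pairwise difference would be declared significant only if its unadjusted P-value fell below 0.05 divided by the number of comparisons.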
Appendix S6: Supplementary material and methods

MHC sequencing

The forward primer for the 454 amplicon preparation consisted of a nucleotide sequence containing (from 5’ to 3’) primer A, the key, the MID, and the SaCo_F primer, as described by the manufacturer (Roche). We also designed a degenerate reverse primer because a SNP located at the 3’ end of our reverse SaCo_R primer had been found in a previous study (Croisetière et al. 2008). The reverse primer for the 454 amplicon was composed, from 5’ to 3’, of the B primer and the SaCO_Rde primer (5’-AGCCCTGCTCACCTGTCTTR-3’), as described by the manufacturer (Roche).

Following amplification, samples were visualized on a 1% agarose gel. Reactions that yielded tight, strong bands at the expected size were purified using AMPure beads (Beckman Coulter Genomics) in a 96-well plate following the manufacturer’s instructions. Samples were quantified with Picogreen reagent (Invitrogen) on a Fluoroskan Ascent FL fluorometer (Thermo Labsystems) and then pooled in equimolar amounts, to a final DNA concentration of 30 ng/µl, in three libraries containing 151 samples each. The libraries were then sent to the Plateforme d’analyses biomoléculaires (Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada) for pyrosequencing on the 454 GS-FLX DNA Sequencer with the Titanium Chemistry (Roche), following the procedure described by the manufacturer. Each library was pyrosequenced on 1/8 of a plate.

As a first quality-control step for the 454 sequencing, 27 individuals were rerun, representing at least two individuals per population. To compare 454 sequencing with Sanger sequencing, we also Sanger sequenced MHC IIβ in 90 individuals from the 2009 populations, using the forward SP4501 and reverse SP4502 primers (Croisetière et al. 2008). PCR was carried out in a final volume of 12.5 μL containing 512ng of DNA using the GoTaq® Flexi DNA polymerase kit (Promega).
The PCR protocol consisted of an initial denaturation step at 95°C for 2 min, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 47°C for 30 s, and elongation at 72°C for 1 min. A final extension step at 72°C for 10 min and a cool-down step at 10°C were added. PCR products were first run on a 1% agarose gel to check for the presence of tight, strong bands at the expected molecular weight, and were then directly sequenced on an ABI 3100 (Applied Biosystems). Cloning was not used, as it is prone to introducing artifacts (e.g. Longeri et al. 2002). All chromatograms were visually inspected for quality control.

MHC genotyping

During phase one, the internal branch length threshold (the minimum branch length required to define a cluster) and the minimal proportion of reads required to define a cluster were set to 0.20 and 0.05, respectively. Putative individual consensus alleles were aligned with MUSCLE (Edgar 2004) against Sanger sequences of homozygous individuals (n = 10) from the 2009 samples and six Salvelinus fontinalis MHC class IIβ alleles (Croisetière et al. 2008). The Sanger sequences were assumed to be free of sequencing error, and gaps in the resulting alignment were removed from all sequences.

Phase two of the genotyping pipeline was then performed on the putative alleles of each individual with a minimum internal branch length of 0.08, which represents the difference between two alleles of 255 bp in length with a single SNP, and a minimum of two sequences to define a cluster. The result of this step represents the putative consensus alleles for all populations.

In the third step, we assigned one or two global alleles to each individual and excluded alleles that were likely artifactual. Cleaned, aligned sequences from each individual were BLASTed against the individual global consensus alleles produced by phase two of the genotyping pipeline.
This resulted in a count of the number of sequences that matched each consensus allele in each individual. For each allele, we then plotted the number of individuals with at least one BLAST hit to that allele against the number of sequences representing the allele in the individuals where it was found. By examining each allele in this fashion, we found that a natural break in this relationship could be determined visually. A minimal threshold number of sequences per individual was then established for assigning the allele to that individual. For most alleles, this threshold was 50 reads per individual, but for alleles 8, 9, 11, and 26 it was 15.

Finally, for genotype call confirmation and data quality control, two individuals per population were run twice and their genotypes compared. In addition, the individuals that were Sanger sequenced were used to confirm the 454 genotype status (homozygote or heterozygote) of each fish and to check allelic concordance for homozygous individuals.

Appendix Literature Cited

Croisetière S, Tarte P, Bernatchez L, Belhumeur P (2008) Identification of MHC class IIβ resistance/susceptibility alleles to Aeromonas salmonicida in brook charr (Salvelinus fontinalis). Molecular Immunology, 45, 3107–3116.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797.

Lamaze FC, Garant D, Bernatchez L (2013) Stocking impacts the expression of candidate genes and physiological condition in introgressed brook charr (Salvelinus fontinalis) populations. Evolutionary Applications, 6, 393–407.

Lamaze FC, Sauvage C, Marie A, Garant D, Bernatchez L (2012) Dynamics of introgressive hybridization assessed by SNP population genomics of coding genes in stocked brook charr (Salvelinus fontinalis). Molecular Ecology, 21, 2877–2895.

Longeri M, Zanotti M, Damiani G (2002) Recombinant DRB sequences produced by mismatch repair of heteroduplexes during cloning in Escherichia coli. European Journal of Immunogenetics, 29, 517–523.

Marie AD, Bernatchez L, Garant D (2010) Loss of genetic integrity correlates with stocking intensity in brook charr (Salvelinus fontinalis). Molecular Ecology, 19, 2025–2037.

Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89, 583–590.

Weir B, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.
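The read-count thresholds used in the third genotyping step of Appendix S6 (50 reads per individual for most alleles, 15 for alleles 8, 9, 11, and 26) can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names `call_genotype` and `zygosity`, the dictionary input format, the use of "at least" the threshold as the acceptance rule, and the "unresolved" label for zero or more than two passing alleles are all assumptions made for the example.

```python
from typing import Dict, List

# Thresholds as reported in Appendix S6; everything else here is hypothetical.
DEFAULT_MIN_READS = 50                            # most alleles
LOW_MIN_READS = {8: 15, 9: 15, 11: 15, 26: 15}    # alleles with the lower threshold


def call_genotype(read_counts: Dict[int, int]) -> List[int]:
    """Return the alleles retained for one individual.

    read_counts maps an allele number to the number of that individual's
    reads that matched the allele. An allele is kept when its read count
    reaches the allele-specific threshold (assumed to be inclusive).
    The result is ordered from most to fewest supporting reads.
    """
    passing = [
        allele
        for allele, n_reads in read_counts.items()
        if n_reads >= LOW_MIN_READS.get(allele, DEFAULT_MIN_READS)
    ]
    return sorted(passing, key=lambda allele: (-read_counts[allele], allele))


def zygosity(alleles: List[int]) -> str:
    """Label a call as homozygote, heterozygote, or unresolved (assumed label)."""
    if len(alleles) == 1:
        return "homozygote"
    if len(alleles) == 2:
        return "heterozygote"
    return "unresolved"


# Made-up read counts for one individual, for illustration only.
example_counts = {29: 310, 8: 22, 6: 12, 11: 9}
example_call = call_genotype(example_counts)
print(example_call, zygosity(example_call))
```

In this made-up example, allele 29 passes the 50-read threshold and allele 8 passes its 15-read threshold, whereas alleles 6 and 11 fall below theirs and are discarded as likely artifacts.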