Development of nineteen polymorphic microsatellite loci in the threatened polar bear (Ursus maritimus) using next generation sequencing

Supplementary Material, Conservation Genetics Resources

Jessica R. Brandt1*, Peter J. van Coeverden de Groot2, Kai Zhao1, Markus G. Dyck3, Peter T. Boag2 and Alfred L. Roca1

1 Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
2 Department of Biology, Queen’s University, Kingston, ON, K7L 3N6 Canada
3 Department of Environment, Government of Nunavut, Igloolik, NU, X0A 0L0 Canada
*Corresponding Author: jrbrandt@illinois.edu

Supplementary methods

Next-generation sequencing (NGS) platforms (e.g. Roche 454 FLX Titanium and Illumina Genome Analyzer) are being used increasingly for large-scale identification of genetic markers in species of interest (Csencsics et al. 2010; Saarinen & Austin 2010). NGS methods provide rapid and effective means for identification of hundreds to thousands of candidate polymorphic microsatellite loci in any species for which samples are available (Castoe et al. 2010; Lepais & Bacles 2011). This method of characterizing microsatellite markers allows for the rapid screening of large sequence databases for identification of loci that meet stringent parameters. Microsatellite markers designed to amplify very short target regions (< 200 bp) can be used on samples that are highly degraded or contain low quality DNA, typical of fecal samples, museum specimens, or confiscated materials (Ishida et al. 2012). These markers can be useful for obtaining data from non-invasively acquired samples, minimizing the stress and danger of handling live individuals during specimen collection, while increasing the proportion of a population from which data can be obtained.
Studies of polar bear population structure and genetic diversity have utilized microsatellite markers designed in the black bear (Ursus americanus), brown bear (Ursus arctos), Asiatic black bear (Ursus thibetanus), or a combination of these (Paetkau et al. 1995; Paetkau et al. 1997; Paetkau & Strobeck 1998; Paetkau et al. 1999; Cronin et al. 2006; Crompton et al. 2008; Cronin et al. 2009; Zeyl et al. 2009). While a recent study developed ten microsatellite loci in the polar bear (Poissant & Davis 2011), only two of the novel loci yielded amplicons shorter than 200 bp. We therefore sought to generate microsatellite loci with very short amplicon sizes, which would be useful for studies relying on non-invasive genotyping of polar bears.

One high quality polar bear DNA sample was subjected to shotgun sequencing using the Roche 454 GS FLX+ Titanium kit; a ¼ plate run was completed. The MSATCOMMANDER program (Faircloth 2008), used for identification of microsatellite repeats and initial primer design, was set to require a flanking region with a minimum length of 18 bp between the microsatellite array and the primer sequences. Primers were designed in MSATCOMMANDER through an interface with PRIMER3 software (Rozen & Skaletsky 2000) to meet the following criteria: amplification of a target product in the 75 to 150 bp size range (inclusive of the two primer lengths), optimal primer length of 20 bp (range 18 to 22 bp), optimal melting temperature of 60.0 °C (range of 58.0 °C to 62.0 °C), optimal GC content of 50%, inclusion of a GC clamp of at least 1 bp, low self or pair complementarity and a maximum end stability of 8.0 (Faircloth 2008). Once primers were designed, a number of quality checks were implemented before primer pairs were selected for testing.
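The amplicon size, primer length and GC clamp criteria listed above lend themselves to a simple programmatic check. The following is a minimal Python sketch, not the MSATCOMMANDER/PRIMER3 workflow used in this study: the function names and the two example primer sequences are hypothetical, and melting temperature, complementarity and end stability are deliberately left to PRIMER3 because they depend on a thermodynamic model.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def has_gc_clamp(primer, min_clamp=1):
    """True if the last `min_clamp` bases at the 3' end are all G or C."""
    return all(base in "GC" for base in primer.upper()[-min_clamp:])


def check_primer_pair(forward, reverse, amplicon_length):
    """Return a list of failed criteria (an empty list means all checks passed).

    Only the criteria that need no thermodynamic model are tested:
    amplicon of 75 to 150 bp, primers of 18 to 22 bp, 3' GC clamp of >= 1 bp.
    Primer lengths exclude any M13 tail.
    """
    problems = []
    if not 75 <= amplicon_length <= 150:
        problems.append("amplicon length outside 75-150 bp")
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if not 18 <= len(primer) <= 22:
            problems.append(f"{name} primer length outside 18-22 bp")
        if not has_gc_clamp(primer):
            problems.append(f"{name} primer lacks a 3' GC clamp")
    return problems


if __name__ == "__main__":
    # Hypothetical primer sequences, used for illustration only.
    fwd = "ACGTGTCCTAAGGTATGCTG"
    rev = "GTTCTCTGGTCATGGCAAGC"
    print(check_primer_pair(fwd, rev, amplicon_length=113))
    print(f"forward GC content: {gc_content(fwd):.1f}%")
```

GC content is reported rather than filtered because the criteria above give only an optimum (50%) and no accepted range.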
To prevent amplification of multiple non-target loci, two steps were taken to ensure the uniqueness of the primer sequences: 1) a Perl script was written to search each primer sequence against the entire 454 generated sequence database and 2) primer sequences were searched against the non-redundant BLAST database. Any primers showing evidence of being part of a repetitive element (e.g. LINEs or SINEs) were removed from further analysis.

DNA extracts from 10 adult male and 10 adult female individuals surveyed as part of the Government of Nunavut’s census of the polar bears of M’Clintock Channel, Nunavut, were used to assay polymorphism at the novel loci. PCR products were fluorescently labeled using M13-tailed primers (CACGACGTTGTAAAACGAC). Primer pairs were initially tested by PCR performed in a 10.76 µL reaction mixture that included: 0.2 µM of each forward and reverse primer (M13-tailed forward primers), 0.2 mM of each dNTP, 1x PCR buffer, 1.5 mM MgCl2, 0.5 units of Taq polymerase, 0.5 µL of a primer tailed with a 700 infrared dye, and 1.2 µL of template DNA. Two PCR algorithms were used: a) initial 94°C for 5 min; 35 cycles of 20 sec at 94°C, 30 sec at 55°C, 30 sec at 72°C; and a final extension at 72°C for 10 min; or b) initial 94°C for 5 min; 16 cycles of 20 sec at 94°C, 30 sec at an annealing temperature starting at 50°C and increasing by 0.5°C per cycle, 40 sec at 72°C; followed by 24 additional cycles with 58°C annealing and a final extension at 72°C for 7 min (detailed PCR mix and algorithms are listed below). PCR amplicons were separated on a 6.5% polyacrylamide gel using the Licor 4200 automatic sequencer and allele sizes were scored using Gene ImagIR v.4.05 (Scanalytics).

A total of 81 primer sets were examined for variability. Fifty-nine of the loci assessed contained dinucleotide repeats, of which 19 successfully amplified a product in the expected size range and 6 were polymorphic.
For an additional 22 loci with non-dinucleotide repeats, 14 successfully amplified a product in the expected size range and 13 were polymorphic. Of the 33 primer pairs that consistently amplified products in the correct size range, 19 were polymorphic. Microsatellite variability was assessed based on the number of alleles per locus and expected and observed heterozygosity calculated respectively by GENEPOP, v.4.0 (Raymond & Rousset 1995) and POPGENE, v. 1.32 (Yeh & Boyle 1997). Linkage disequilibrium between pairs of loci was calculated with FSTAT, v. 2.9.3.2 (Goudet 1995) and deviations from Hardy-Weinberg equilibrium were estimated in GENEPOP; statistical significance was evaluated after a Bonferroni correction (Rice 1989).

Details of the PCR setup and PCR algorithm

PCR Components                            Volume (µL)
Distilled water                           7.90
10X PCR Buffer (a)                        1.00
dNTP Mix (100 mM) (b)                     0.02
Forward Primer with M13 Tail (100 µM)     0.02
Reverse Primer (100 µM)                   0.02
Taq Polymerase (a)                        0.10
700 IR dye tailed primer                  0.50
Template DNA                              1.20
Total Volume                              10.76

(a) Ultrapure Taq DNA Polymerase with 10X ViBuffer S (Vivantis, PL1202)
(b) 25 mM of each dNTP (dATP, dCTP, dGTP, and dTTP) in a mix (Vivantis, NP2406)

PCR Algorithms

A. 55°C Anneal
5 min at 94°C
35 cycles of 20 sec at 94°C, 30 sec at 55°C, 30 sec at 72°C
10 min final extension at 72°C
Hold at 4°C

B. Touchup
5 min at 94°C
1 cycle of 20 sec at 94°C, 30 sec at 50.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 50.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 51.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 51.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 52.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 52.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 53.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 53.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 54.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 54.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 55.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 55.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 56.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 56.5°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 57.0°C, 40 sec at 72°C
1 cycle of 20 sec at 94°C, 30 sec at 57.5°C, 40 sec at 72°C
24 cycles of 20 sec at 94°C, 30 sec at 58.0°C, 40 sec at 72°C
7 min final extension at 72°C
Hold at 4°C

References for supplementary information

Castoe TA, Poole AW, Gu W, de Koning APJ, Daza JM, Smith EN, Pollock DD (2010) Rapid identification of thousands of copperhead snake (Agkistrodon contortrix) microsatellite loci from modest amounts of 454 shotgun genome sequence. Molecular Ecology Resources 10(2):341-347
Crompton AE, Obbard ME, Petersen SD, Wilson PJ (2008) Population genetic structure in polar bears (Ursus maritimus) from Hudson Bay, Canada: implications of future climate change. Biological Conservation 141(10):2528-2539
Cronin MA, Amstrup SC, Scribner KT (2006) Microsatellite DNA and mitochondrial DNA variation in polar bears (Ursus maritimus) from the Beaufort and Chukchi Seas, Alaska. Canadian Journal of Zoology-Revue Canadienne De Zoologie 84(5):655-660
Cronin MA, Amstrup SC, Talbot SL, Sage GK, Amstrup KS (2009) Genetic variation, relatedness, and effective population size of polar bears (Ursus maritimus) in the southern Beaufort Sea, Alaska. Journal of Heredity 100(6):681-690
Csencsics D, Brodbeck S, Holderegger R (2010) Cost-effective, species-specific microsatellite development for the endangered dwarf bulrush (Typha minima) using next-generation sequencing technology. Journal of Heredity 101(6):789-793
Faircloth BC (2008) MSATCOMMANDER: detection of microsatellite repeat arrays and automated, locus-specific primer design. Molecular Ecology Resources 8(1):92-94
Goudet J (1995) FSTAT (Version 1.2): A computer program to calculate F-statistics. Journal of Heredity 86(6):485-486
Ishida Y, Demeke Y, de Groot PJvC, Georgiadis NJ, Leggett KEA, Fox VE, Roca AL (2012) Short amplicon microsatellite markers for low quality elephant DNA. Conservation Genetics Resources 4(2):491-494
Lepais O, Bacles CFE (2011) Comparison of random and SSR-enriched shotgun pyrosequencing for microsatellite discovery and single multiplex PCR optimization in Acacia harpophylla F. Muell. Ex Benth. Molecular Ecology Resources 11(4):711-724
Paetkau D, Amstrup SC, Born EW, Calvert W, Derocher AE, Garner GW, Messier F, Stirling I, Taylor MK, Wiig O, Strobeck C (1999) Genetic structure of the world's polar bear populations. Molecular Ecology 8(10):1571-1584
Paetkau D, Calvert W, Stirling I, Strobeck C (1995) Microsatellite analysis of population structure in Canadian polar bears. Molecular Ecology 4(3):347-354
Paetkau D, Strobeck C (1998) Ecological genetic studies of bears using microsatellite analysis. Ursus 10:299-306
Paetkau D, Waits LP, Clarkson PL, Craighead L, Strobeck C (1997) An empirical evaluation of genetic distance statistics using microsatellite data from bear (Ursidae) populations. Genetics 147(4):1943-1957
Poissant J, Davis CS (2011) Isolation and characterization of ten polar bear (Ursus maritimus) microsatellite loci and cross-amplification in other Ursidae. Conservation Genetics Resources 3(4):637-639
Raymond M, Rousset F (1995) An exact test for population differentiation. Evolution 49(6):1280-1283
Rice WR (1989) Analyzing tables of statistical tests. Evolution 43(1):223-225
Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Methods in Molecular Biology 132:365-386
Saarinen EV, Austin JD (2010) When technology meets conservation: increased microsatellite marker production using 454 genome sequencing on the endangered Okaloosa darter (Etheostoma okaloosae). Journal of Heredity 101(6):784-788
Yeh FC, Boyle TJB (1997) Population genetic analysis of co-dominant and dominant markers and quantitative traits. Belgian Journal of Botany 129(2):157-157
Zeyl E, Aars J, Ehrich D, Wiig O (2009) Families in space: relatedness in the Barents Sea population of polar bears (Ursus maritimus). Molecular Ecology 18(4):735-749

[Figure S1. Panel A: y axis, number of loci; x axis, microsatellite motifs. Panel B: y axis, number of loci meeting primer design criteria; x axis, microsatellite motifs.]

Supplementary Figure S1. Number of identified microsatellite loci for each motif (panel “A”) and the number of loci of each motif which met the criteria for potential primer design, i.e., target sites for primers could be found near the repeat motif on both flanks (panel “B”). Note that the Y axis scales are different for the two panels. The dinucleotide motif AC was identified most frequently in the sequencing data; other dinucleotide motifs (AG and AT) were also commonly found. The most abundant motifs contained the largest overall proportion of loci that had suitable priming regions that were not part of a repetitive element.

Supplementary Table S1. Number and type of microsatellite motifs identified by next generation sequencing of polar bear DNA.

Repeat Motif: Dinucleotide Trinucleotide Tetranucleotide Pentanucleotide Hexanucleotide
No. Repeats: <10 10 >10 <10 10 >10 <10 10 >10 <10 10 >10 <10 10 >10
Sequenced Loci: 1396 459 2368 77 22 108 231 185 260 23 6 13 1 -
Loci with Primer Sites: 82 27 64 7 2 2 17 8 4 3 -
Compound or Interrupted Loci: 44 21 48 2 14 14 8 18 2 -

Supplementary Table S2. Full amplicon sequence at polar bear microsatellite loci.

Locus    Amplicon Sequence (including primer sequences)
Uma14    GAGTTCCTCTTCATGCTTCGGTTTTTTCCTTTCCTTTCCTTTCCATTTCCATTTCCATTCCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCAAATTTCCTTTCATTTCCTATTCCCAGGCCATGTGTGTGAAAG
Uma21    TCCCATCCATGTGTCCATCCATCCATCCACCTGGCCATCCATCCATCTTTGTGTCCGTCCATCCATCCATCCATCCATCCATCCATCCATCCATCCATCTTCTTAAGACCTTTTATGTGCCAGGACAGGTGCAGACAG
Uma35    TCATCAGCGTCACCTACACCACGGCTCTGTACGCGTCCCTGTCTCCGTGTGTGTGTGTGTGTGTGTGACAGGCCTCGCCCCCCGCCGCCCGTCTCTTGGGACGCTTCCCAAGCACGTGTCCTGGTTCCTGTGTCCACCTCTC
Uma40    ACTTACACCATGGGCTCTCCTGGTTCTGAGGCCTCAGACTGAAACTACATGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCATTGGTCCATATTCTGTTTCTCTGGACGCTGAATACA
Uma42    TACAGAACCCACAGTCCCAGACGAAGAAACGCTATCTCTTAGCGGATAGTGGCTCATTCGTTCGTTCGTTCGTTCATTCATTCATTCATTCATTCATTCATTCATTCAGTGTTTACTGAGTCCTACTTGAGGCCAATCGTTCTGTTCTG
Uma65    ACGAAATGTGTTACCCTGCAGTAGGGTCTGCCGTGCTTGTGAGTGAGTGAGTGAGTGAGTGAGTGAGTGAGTGCTGGCAAAGAGTGCGCTCACCCTTGACCACTTGTGGCTGCTCTGACT
Uma73    CTAGGTGGTCTCCCTCTGTGGCCCTGCATCCATTTTTCTTCCTATATGAATCATTAGCATTCATTCATTCATTCATTCATTCATTCATTCAACGAATATCCACTGAGCAGCTACTCTGTGCCAGGCACATAGAACAGCT
Uma78    GAAGAGCAGTCAAAGCCAGGGCAGAGCACGTGGGGTGGGGTGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGACACGTGGGGAAAAGAGCTTGGTGCAATCAGAGAACCGAAAGAAGGCC
Uma84    AGGAGGGCTTCTGAACTGTGGACCTCCCAATCATTCTGAGGTTTTTTTGTTTGTTTGTTTGTTTGTTTGTTTGTTTGTTTGGGGTTTTTAGGTCTGGAAGCCAAGTTGGACAACACTTAACAAGAACAATAGCTTCCTTGTTCTCGGG
Uma95    AGTACAGATCCCGGCACAAGGATGAGCTCAGTGAAGGCTGGTTGCTGAAATGATGGATGGATGGATGGATGGATGGATGGATGGATGGATGATGGATGGATGAGTGGATGGATGGATGGGGATGCCAAAGAATTCTGGCAGGGA
Uma101   TCCCAGACAAGAAAGCACAGAATTCTGTCGTACAGGAAAAAAAAAGAGAGAGAGAGAGAGAGAGACTTTTATAAATCACAGCTGTAGGAGAGTGTCTGCACATAGTAAAATTAGAAGCAGTGGGACATGGAAC
Uma102   TGAAATCAAGAGCCCGACACTTAACCAACTGCCCCAGTTTCCCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTCTTTTTTGAGTTTTAGAATGTCCAATCCTTACAAGCACGT
Uma127   CTGCTTTGCTGGTGGACTTGGTGCCGCTGCTGCTGCTACTGGTGCTGCTGTTGGGCGGGGGGGCTCGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGCTGCCGCTGCTGCTGCTGCTGATGCTGGGTGGAGGG
Uma168   GCCAGGCCTTTGAATTCTGGATGAATTCATTCAAGTACCCGGTATGATACCAATATCGTATCTGTCTCCCCAACACACACACACACACACACACACACTTATACACACACAATTGTGGCATGACACATGGATGCAGCACCAAC
Uma185   ACGTGTCCTAAGGTATGCTGGGAGGCAGGTGTGTGCGTGCTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGCACACTCATGTATGCACATAGCTTGCCATGACCAGAGAAC
Uma211   CTCCCTTCTTCCTCTGCCTGCCGCTCCTCTCCTTATGTGTGCTCTCTCTCTCTCTCTCTCTGTCAAATAAATGAATAAAATCTTTAAGGTGATTTCCCTGTCTGCATG
Uma218   AGGCCAAGGGTACTACATGCCCTGGACCTGCCTTAGTTCTGCAGGCCGTGTGTGTGTGTGTGTGTGTGCGTGTAGAACAGACTGTCCTACCACCAGTGTATGGTGTGGAAAGGAACAGGGGAAAGGGAAGGGAGACGGTGCTCTTA
Uma229   GTCTGGAGCAACACAGGATGTCAGAGGCTCCATTTTGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATTTTCATGATCACCATAGCAGGGGAGAGGAGATGTGGTGAATCA
Uma277   GTGTTCTGATTTCTCCACCTCCTCACTAGTACTTGTTGCTGTTATTATTATTATTATTATTATTATTATTATTATTATTATTATATGGAATGCTTCGTGAATTTATGTGTTGTCACGTTTGGGAGGGGCTGTGCTGATCTTCCCTA
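
To illustrate how a repeat array can be located within an amplicon such as those in Supplementary Table S2, the following Python sketch searches a sequence for perfect tandem repeats using a regular expression with a backreference. It is a simplified stand-in and not the algorithm implemented in MSATCOMMANDER: the function names, the default of at least four copies, and the ranking of arrays by total length are assumptions made for illustration, and interrupted or compound repeats are not handled.

```python
import re


def find_repeats(seq, min_unit=2, max_unit=6, min_copies=4):
    """Return (motif, copies, start) for perfect tandem repeats in `seq`.

    Motifs of `min_unit` to `max_unit` bases are considered; `start` is 0-based.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-base motif followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "ACAC" is reported as "AC", "TT" is a homopolymer).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((motif, len(match.group(0)) // k, match.start()))
    return hits


def longest_repeat(seq, **kwargs):
    """Return the repeat array covering the most bases, or None if none is found."""
    hits = find_repeats(seq, **kwargs)
    return max(hits, key=lambda h: len(h[0]) * h[1]) if hits else None


if __name__ == "__main__":
    # Uma211 amplicon from Supplementary Table S2, given as its three wrapped fragments.
    uma211 = (
        "CTCCCTTCTTCCTCTGCCTGCCGCTCCTCTCCTTATGTGTGCTCTCTCTC"
        "TCTCTCTCTCTGTCAAATAAATGAATAAAATCTTTAAGGTGATTTCCCTGT"
        "CTGCATG"
    )
    print(longest_repeat(uma211))
```

Because the regular expression reports the leftmost match, the motif may be printed in a rotated form (for example CA rather than AC) depending on the flanking bases.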