mec13027-sup-0001-SupInfo

Supporting Information

Reconstructing the Demographic History of Orang-utans using Approximate Bayesian Computation

Sample Collection

Our sample set comprised orang-utan samples used in previous genetic studies (Arora et al. 2010; Nater et al. 2011; Nater et al. 2013; Greminger et al. 2014). These were either faecal and hair samples collected non-invasively from wild populations or blood samples collected from rehabilitant orang-utans. The geographic provenance of samples from rehabilitant orang-utans was confirmed based on their mtDNA haplotypes, which have been shown to be a reliable indicator of the natal area in orang-utans (Arora et al. 2010; Nater et al. 2011). Sample details and DNA extraction procedures are described in the aforementioned studies. The collection and transport of samples were conducted in strict accordance with Indonesian, Malaysian and international regulations. Samples were transferred to Zurich under the Convention on International Trade in Endangered Species (CITES) (permits 09717/IV/SATS-LN/2010, 07279/IV/SATS-LN/2009, 00961/IV/SATS-LN/2007, 06968/IV/SATS-LN/2005, and 4872).

PCR Amplification, Sequencing and Genotyping

We complemented the data set of previously published autosomal (Arora et al. 2010; Nater et al. 2013; Greminger et al. 2014), mitochondrial (Nater et al. 2011), and Y-chromosomal (Nater et al. 2011; Nietlisbach et al. 2012) markers by generating sequence data for four non-coding autosomal regions and one non-coding X-chromosomal region (Supporting Table S2). The PCRs contained 10 ng genomic DNA, 0.16 µl Phire Hot Start DNA Polymerase, 1x Phire Reaction Buffer (both Finnzymes) containing 1.5 mM MgCl2, 0.1 mM dNTPs and 0.1 µM each of forward and reverse primer in a total volume of 8 µl.
PCR amplifications were performed in a Veriti Thermal Cycler (Applied Biosystems) with the following parameters: initial denaturation at 98°C for 30 seconds; 40 cycles of 98°C for 10 seconds, the primer-specific annealing temperature (Supporting Table S1) for 10 seconds, and 72°C for 40 seconds; and a final extension step at 72°C for 5 minutes. Cycle sequencing was performed with BigDye Terminator v3.1 chemistry on a 3730 DNA Analyzer (both Applied Biosystems). We used SEQUENCING ANALYSIS v5.3.1 (Applied Biosystems) for raw data analysis. The SEQMAN program of the LASERGENE 8 software package (DNASTAR) was used to trim and align the sequences. We used the program PHASE v.2.1 (Stephens et al. 2001) to infer haplotypes of autosomal and X-chromosomal sequences. Heterozygous positions with phasing probabilities of less than 0.95 were coded with IUPAC ambiguity codes.

Demographic History of Orang-Utans (Pongo spp.) – Supporting Information

Validation of Population Units

Good knowledge of the underlying population structure is essential for designing adequate demographic models, since cryptic population structure can lead to erroneous inference of population size changes (Stadler et al. 2009; Chikhi et al. 2010; Peter et al. 2010). As a first analysis step, we therefore investigated the geographical distribution of genetic diversity in our data set in order to identify distinct genetic clusters. We used the Bayesian clustering algorithm implemented in the software STRUCTURE v2.3.3 (Pritchard et al. 2000) to identify and visualise genetic structure in the autosomal microsatellite data set. We applied the admixture model with correlated allele frequencies and a burn-in of 3×10⁵ steps followed by 3×10⁶ Markov chain Monte Carlo (MCMC) steps, running the analysis with the number of clusters K ranging from 1 to 10.
We performed ten iterations per K and averaged the likelihood of the data Pr(D|K) over all iterations for each K to calculate the deltaK statistic (Evanno et al. 2005), which we used as the criterion to select the most probable number of clusters in the data set.

The STRUCTURE run on the autosomal microsatellite data set yielded the highest deltaK value for K = 2 (Supporting Figure S1), clearly separating Bornean and Sumatran individuals (Supporting Figure S2). Since STRUCTURE tends to find only the highest level of hierarchical genetic structure in a data set (Evanno et al. 2005), we repeated the analysis separately for each island. This resulted in two distinct clusters on Borneo and three on Sumatra (Supporting Figure S2). The two Bornean clusters separated individuals from south of the Kinabatangan River in Sabah (South Kinabatangan) and East Kalimantan from individuals from Central and West Kalimantan, Sarawak, and north of the Kinabatangan River (North Kinabatangan). Further runs incorporating only samples from the same higher-level cluster revealed a total of five distinct genetic clusters within Bornean orang-utans, separating all regions except Sarawak, which clustered together with West Kalimantan (Supporting Figure S2). On Sumatra, we detected no further hierarchical substructure. Thus, at the lowest level of hierarchical genetic structure, there are a total of eight distinct autosomal clusters (five on Borneo, three on Sumatra) among all sampled orang-utans.

Using previously published results from three mtDNA genes (Nater et al. 2011), two additional population units became apparent (separating North Aceh and Langkat on Sumatra, as well as West Kalimantan and Sarawak on Borneo). These cluster pairs were indistinguishable with autosomal microsatellite data alone, most likely due to frequent male-mediated gene flow between them.
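The deltaK criterion used above to select K can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `delta_k` and all log-likelihood values are hypothetical. Following the wording above, the statistic is computed from the run-averaged ln Pr(D|K); Evanno et al. (2005) define deltaK as the mean absolute second-order difference across runs divided by the standard deviation of L(K), which can give slightly different values.

```python
from statistics import mean, stdev


def delta_k(log_lik):
    """Return {K: deltaK} for every K that has both neighbours K-1 and K+1.

    log_lik maps each K to a list of ln Pr(D|K) values, one per replicate run.
    """
    avg = {k: mean(v) for k, v in log_lik.items()}
    sd = {k: stdev(v) for k, v in log_lik.items()}
    out = {}
    for k in sorted(log_lik):
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            # Absolute second-order change of the mean log-likelihood,
            # scaled by the between-run standard deviation at K.
            out[k] = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1]) / sd[k]
    return out


# Hypothetical log-likelihoods (three replicate runs per K), for illustration only.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -892.0, -888.0],
    4: [-885.0, -880.0, -890.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because deltaK needs both neighbouring values of K, it is undefined for the smallest and largest K tested; with K ranging from 1 to 10 it can be evaluated for K = 2 to 9.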
Conversely, the mtDNA genes alone did not resolve the significant population differentiation between North and South Kinabatangan found in the autosomal microsatellite data, probably due to a lack of diversity in the mtDNA genes. By combining markers with different inheritance patterns, we thus identified a total of ten distinct genetic clusters (four on Sumatra and six on Borneo), showing significant pairwise differentiation in either autosomal or mtDNA markers. These clusters should therefore be treated as separate panmictic population units in the demographic modelling.

Phylogenetic Analyses

We used a Bayesian MCMC approach implemented in BEAST v1.6.2 (Drummond & Rambaut 2007) to infer gene trees and mutation rates of the autosomal, X-chromosomal and mitochondrial loci, based on our sequence alignments. We applied a TrN+G substitution model (Tamura & Nei 1993) for the mitochondrial alignment and an HKY+G+I model (Hasegawa et al. 1985) for all autosomal and X-chromosomal alignments, as determined by jMODELTEST v0.1.1 (Posada 2008). We estimated locus-specific mutation rates under the relaxed molecular clock model with uncorrelated lognormally distributed branch rates (Drummond et al. 2006) and a prior distribution of node ages derived from a birth-death process (Yang & Rannala 2006; Gernhard 2008). Each gene tree was rooted with a human and a central chimpanzee sequence from GenBank (accession nos. GQ983109.1 and HM068590.1, respectively), and the calibration of the molecular clock was implemented as described in Nater et al. (2011). For the four autosomal loci and the single X-chromosomal locus, the BEAST runs resulted in mean mutation rates of 1.61–3.04×10⁻⁸ and 2.00×10⁻⁸ per site per generation, respectively (Supporting Table S4). As expected, the mitochondrial regions showed a mutation rate an order of magnitude higher than that of the nuclear loci (2.38×10⁻⁷ per site per generation).
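As a quick arithmetic check of the "order of magnitude" statement, the reported mean rates can be compared directly. The sketch below uses only the values quoted in the text; the variable names are illustrative.

```python
# Mean mutation rates per site per generation, taken from the text above.
NUCLEAR_RATES = {
    "autosomal (lowest)": 1.61e-8,
    "autosomal (highest)": 3.04e-8,
    "X-chromosomal": 2.00e-8,
}
MTDNA_RATE = 2.38e-7

# Ratio of the mitochondrial rate to each nuclear rate.
fold_difference = {name: MTDNA_RATE / rate for name, rate in NUCLEAR_RATES.items()}
for name, fold in fold_difference.items():
    print(f"mtDNA vs {name}: {fold:.1f}-fold")
```

The ratios fall between roughly 8 and 15, that is, about one order of magnitude.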
The phylogenetic trees of the five nuclear loci differed in topology from the mitochondrial tree (Supporting Figure S3). All autosomal regions showed incomplete lineage sorting and in some cases even haplotype sharing between Borneo and Sumatra. For the X-chromosomal region, all Bornean sequences formed a monophyletic group with a comparatively recent common ancestor, while the Sumatran sequences were paraphyletic. The Sumatran population south of Lake Toba, Batang Toru, did not form a distinct cluster in any of the five nuclear gene trees.

Supporting Table S1: Primers used for amplification and sequencing of four autosomal regions and one X-chromosomal region

Primer name | Primer type | Annealing temp. (PCR / sequencing) | Sequence (5'-3')
Chr2a_Region17_F | PCR / sequencing primer | 64°C / 53°C | AGTGCCCCGACACAAGTGATACAG
Chr2a_Region17_R | PCR / sequencing primer | 64°C / 53°C | GAGCAGGGCTTAGGCAAGGAGA
Chr2a_Region17_seq1 | Sequencing primer | 53°C | GTTTTGAAGCCATTAAGTTGCTGAT
Chr2a_Region17_seq2 | Sequencing primer | 53°C | GGTGGAAACATTTTCAAAACTCAGA
Chr9_Region16_F | PCR / sequencing primer | 64°C / 53°C | TTCATATGCAGGGCAAGAGAACAAG
Chr9_Region16_R | PCR / sequencing primer | 64°C / 53°C | CCCTGGTCATCATGCCTGCTATTAT
Chr9_Region16_seq1 | Sequencing primer | 53°C | AAGTTCACAGCCTTCCTCAAGAG
Chr12_Region1_F | PCR / sequencing primer | 64°C / 53°C | ATCCAAATGGCCAAACTCACCT
Chr12_Region1_R | PCR / sequencing primer | 64°C / 53°C | GCAACCCACATGCTCATCAATAG
Chr12_Region1_seq1 | Sequencing primer | 53°C | CCAGGGAGAGCCAGGGAACA
Chr19_Region7_F | PCR / sequencing primer | 64°C / 53°C | GGAGGGTTGATGACGTTTACTTACA
Chr19_Region7_R | PCR / sequencing primer | 64°C / 53°C | TGACACATGATTGATGCCACTCTC
Chr19_Region7_seq1 | Sequencing primer | 53°C | AGGATACAAGCCCTATTTTGCTGAA
Xq13.3_2_F | PCR / sequencing primer | 62°C / 53°C | CTCAGTAACTTGGCGAAACCTCAT
Xq13.3_2_R | PCR / sequencing primer | 62°C / 53°C | GCCCCCAACAGACTCCAGTGT
Xq13.3_2_seq1 | Sequencing primer | 53°C | TGCAGCAACTAACAGCATTCA
Xq13.3_3_F | PCR / sequencing primer | 62°C / 53°C | TAAGTGGGAGCTGAATGATAAGAAC
Xq13.3_3_R | PCR / sequencing primer | 62°C / 53°C | GACAGGGAAGATTGAGAGTGAAGAT
Xq13.3_3_seq1 | Sequencing primer | 53°C | TCCCATGAAACACTCTCCTAAACA
Xq13.3_4_F | PCR / sequencing primer | 62°C / 53°C | CCCCTCTGAACCCTGCTCCTA
Xq13.3_4_R | PCR / sequencing primer | 62°C / 53°C | CCCTGGACTTGTAGAAAAATCTGCT
Xq13.3_4_seq1 | Sequencing primer | 53°C | ATAATCATGTTCTTTGGAAGACCTG
Xq13.3_5_F | PCR / sequencing primer | 62°C / 53°C | AAATCTTCTTAACTGTTGGGCACTT
Xq13.3_5_R | PCR / sequencing primer | 62°C / 53°C | TTAACGTTAACGCCATCAGTCC
Xq13.3_5_seq1 | Sequencing primer | 53°C | GGCAATTGGGAAAGGATACTCA
Xq13.3_5_seq2 | Sequencing primer | 53°C | AGCCAGAGTCTTGGTTTGTCTCC

The naming of the regions corresponds to the names used in Fischer et al. (2006).

Supporting Table S2: List of sequence loci used in the ABC analysis.

Locus | Loc. (a) | Acc. nos. (b) | LBases (c) | All Pongo: NInd (d) | NSeg (e) | π (f) | D (g) | Borneo: NInd | NSeg | π | D | Sumatra: NInd | NSeg | π | D
chr2a_R17 | AUT | - | 2165 | 22 | 44 | 0.0051 | 0.29 | 10 | 24 | 0.0039 | 1.01 | 12 | 33 | 0.0047 | 0.62
chr9_R16 | AUT | - | 2101 | 22 | 35 | 0.0036 | -0.17 | 10 | 12 | 0.0015 | -0.36 | 12 | 29 | 0.0042 | 0.50
chr12_R1 | AUT | - | 1954 | 22 | 43 | 0.0054 | 0.20 | 10 | 20 | 0.0040 | 1.41 | 12 | 36 | 0.0060 | 0.80
chr19_R7 | AUT | - | 1937 | 22 | 36 | 0.0039 | -0.29 | 10 | 22 | 0.0037 | 0.65 | 12 | 29 | 0.0034 | -0.54
Xq13.3 | X | - | 8050 | 36 | 80 | 0.0023 | -0.13 | 18 | 6 | 0.0001 | -1.11 | 18 | 69 | 0.0023 | -0.30
16S | MT | HQ912716–HQ912723 | 346 | 118 | 16 | 0.0151 | 2.01 | 52 | 3 | 0.0010 | -0.99 | 66 | 9 | 0.0068 | 0.65
ND3 | MT | HQ912741–HQ912752 | 494 | 118 | 53 | 0.0385 | 2.86 | 52 | 8 | 0.0036 | 0.02 | 66 | 40 | 0.0219 | 0.93
CYTB | MT | HQ912724–HQ912740 | 515 | 118 | 73 | 0.0434 | 2.02 | 52 | 8 | 0.0016 | -1.42 | 66 | 62 | 0.0305 | 0.69

a, Genomic location of the locus (AUT = autosomal, X = X-chromosomal, MT = mitochondrial); b, GenBank accession numbers; c, sequence length in base pairs; d, number of sampled individuals; e, number of segregating sites; f, nucleotide diversity; g, Tajima's D (Tajima 1989).

Supporting Table S3: List of microsatellite loci used in the ABC analysis.
Locus | Loc. (a) | All Pongo: NInd (b) | NA (c) | HO (d) | HE (e) | Borneo: NInd | NA | HO | HE | Sumatra: NInd | NA | HO | HE
D1S550 | AUT | 233 | 11 | 0.70 | 0.82 | 124 | 8 | 0.60 | 0.70 | 109 | 9 | 0.81 | 0.80
D2S1326 | AUT | 236 | 7 | 0.54 | 0.71 | 126 | 7 | 0.64 | 0.72 | 110 | 5 | 0.43 | 0.40
D4S1627 | AUT | 230 | 7 | 0.57 | 0.71 | 122 | 7 | 0.63 | 0.68 | 108 | 5 | 0.49 | 0.66
D4S2408 | AUT | 228 | 6 | 0.57 | 0.73 | 118 | 6 | 0.60 | 0.62 | 110 | 4 | 0.55 | 0.65
D5S1457 | AUT | 232 | 9 | 0.61 | 0.78 | 122 | 9 | 0.71 | 0.79 | 110 | 6 | 0.50 | 0.71
D5S1470 | AUT | 231 | 10 | 0.69 | 0.79 | 122 | 9 | 0.67 | 0.75 | 109 | 8 | 0.71 | 0.79
D5S1505 | AUT | 231 | 9 | 0.65 | 0.78 | 122 | 8 | 0.72 | 0.82 | 109 | 7 | 0.58 | 0.71
D13S321 | AUT | 234 | 10 | 0.66 | 0.84 | 126 | 7 | 0.57 | 0.76 | 108 | 9 | 0.76 | 0.79
D13S765 | AUT | 233 | 8 | 0.61 | 0.80 | 123 | 6 | 0.55 | 0.60 | 110 | 7 | 0.68 | 0.68
D16S420 | AUT | 229 | 10 | 0.59 | 0.70 | 121 | 6 | 0.50 | 0.57 | 108 | 9 | 0.69 | 0.80
O4_6 | AUT | 235 | 8 | 0.67 | 0.81 | 125 | 7 | 0.62 | 0.68 | 110 | 4 | 0.73 | 0.74
O4_A1 | AUT | 231 | 7 | 0.66 | 0.78 | 121 | 4 | 0.61 | 0.69 | 110 | 6 | 0.71 | 0.78
O4_A5 | AUT | 233 | 8 | 0.56 | 0.77 | 124 | 8 | 0.53 | 0.61 | 109 | 6 | 0.60 | 0.62
O4_A7 | AUT | 234 | 5 | 0.49 | 0.57 | 125 | 5 | 0.36 | 0.41 | 109 | 5 | 0.63 | 0.67
O4_A8 | AUT | 237 | 3 | 0.15 | 0.15 | 127 | 2 | 0.01 | 0.01 | 110 | 3 | 0.31 | 0.29
O4_B3 | AUT | 231 | 5 | 0.32 | 0.51 | 121 | 3 | 0.03 | 0.03 | 110 | 3 | 0.65 | 0.60
O4_B5 | AUT | 234 | 10 | 0.68 | 0.83 | 125 | 9 | 0.65 | 0.68 | 109 | 7 | 0.72 | 0.74
O4_B6 | AUT | 228 | 12 | 0.56 | 0.82 | 119 | 9 | 0.54 | 0.86 | 109 | 10 | 0.59 | 0.58
O4_B17 | AUT | 226 | 12 | 0.62 | 0.82 | 119 | 12 | 0.67 | 0.79 | 107 | 6 | 0.55 | 0.64
O4_B20 | AUT | 227 | 5 | 0.35 | 0.54 | 121 | 3 | 0.45 | 0.48 | 106 | 4 | 0.25 | 0.28
O4_B24 | AUT | 237 | 4 | 0.27 | 0.47 | 127 | 1 | 0.00 | 0.00 | 110 | 4 | 0.57 | 0.66
O4_C9 | AUT | 225 | 6 | 0.51 | 0.58 | 116 | 6 | 0.48 | 0.58 | 109 | 5 | 0.53 | 0.57
O4_C13 | AUT | 226 | 7 | 0.56 | 0.66 | 118 | 7 | 0.54 | 0.72 | 108 | 4 | 0.58 | 0.55
O4_Chr5 | AUT | 231 | 11 | 0.78 | 0.86 | 122 | 7 | 0.77 | 0.78 | 109 | 9 | 0.80 | 0.82
O4_Chr7 | AUT | 228 | 25 | 0.85 | 0.94 | 122 | 23 | 0.86 | 0.91 | 106 | 20 | 0.83 | 0.93
DYS502.1 | Y | 129 | 2 | - | 0.49 | 53 | 1 | - | 0.00 | 76 | 1 | - | 0.00
DYS502.2 | Y | 129 | 3 | - | 0.03 | 53 | 3 | - | 0.07 | 76 | 1 | - | 0.00
DYS510 | Y | 129 | 7 | - | 0.68 | 53 | 3 | - | 0.42 | 76 | 5 | - | 0.37
DYS532 | Y | 129 | 2 | - | 0.49 | 53 | 1 | - | 0.00 | 76 | 1 | - | 0.00
DYS556 | Y | 129 | 3 | - | 0.28 | 53 | 3 | - | 0.54 | 76 | 1 | - | 0.00
DYS561 | Y | 129 | 5 | - | 0.56 | 53 | 5 | - | 0.49 | 76 | 1 | - | 0.00
DYS577 | Y | 129 | 2 | - | 0.08 | 53 | 2 | - | 0.17 | 76 | 1 | - | 0.00
DYS587 | Y | 129 | 8 | - | 0.68 | 53 | 7 | - | 0.84 | 76 | 3 | - | 0.30
DYS630 | Y | 129 | 11 | - | 0.87 | 53 | 8 | - | 0.81 | 76 | 5 | - | 0.76
DYS645 | Y | 129 | 2 | - | 0.49 | 53 | 1 | - | 0.00 | 76 | 1 | - | 0.00
Y6C2 | Y | 129 | 2 | - | 0.49 | 53 | 1 | - | 0.00 | 76 | 1 | - | 0.00

a, Genomic location of the locus (AUT = autosomal, Y = Y-chromosomal); b, number of sampled individuals; c, number of alleles; d, observed heterozygosity; e, expected heterozygosity.

Supporting Table S4: Mutation rate estimates of sequence loci

Locus | Mean substitution rate per site per generation | 95%-HPD lower (a) | 95%-HPD upper (a) | Kappa (b) | pInv (c) | Mean and SD of mutation rate per variable site per generation
Chr2a_Region17 | 1.61×10⁻⁸ | 1.06×10⁻⁸ | 2.21×10⁻⁸ | 3.85 | 0.73 | 6.02×10⁻⁸, 3.0×10⁻⁹
Chr9_Region16 | 3.04×10⁻⁸ | 2.00×10⁻⁸ | 4.16×10⁻⁸ | 3.66 | 0.78 | 14.00×10⁻⁸, 5.6×10⁻⁹
Chr12_Region1 | 1.84×10⁻⁸ | 1.24×10⁻⁸ | 2.49×10⁻⁸ | 5.14 | 0.51 | 3.77×10⁻⁸, 3.3×10⁻⁹
Chr19_Region7 | 2.24×10⁻⁸ | 1.04×10⁻⁸ | 3.35×10⁻⁸ | 5.98 | 0.70 | 7.42×10⁻⁸, 6.0×10⁻⁹
Xq13.3 | 2.00×10⁻⁸ | 1.48×10⁻⁸ | 2.55×10⁻⁸ | 3.59 | 0.80 | 10.15×10⁻⁸, 2.8×10⁻⁹
mtDNA | 2.38×10⁻⁷ | 1.78×10⁻⁷ | 3.06×10⁻⁷ | 18.82 | 0.66 | 6.93×10⁻⁷, 3.4×10⁻⁸

a, 95% highest posterior density boundaries; b, transition/transversion rate ratio; c, proportion of invariable sites
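The last column of Supporting Table S4 appears consistent with the mean substitution rate rescaled by the proportion of variable sites, that is, rate / (1 − pInv). This relation is an assumption made here for illustration and is not stated in the table; the helper name `rate_per_variable_site` is hypothetical. A minimal sketch comparing the implied and reported values:

```python
# Values transcribed from Supporting Table S4:
# (mean rate per site per generation, pInv, reported mean rate per variable site)
TABLE_S4 = {
    "Chr2a_Region17": (1.61e-8, 0.73, 6.02e-8),
    "Chr9_Region16": (3.04e-8, 0.78, 14.00e-8),
    "Chr12_Region1": (1.84e-8, 0.51, 3.77e-8),
    "Chr19_Region7": (2.24e-8, 0.70, 7.42e-8),
    "Xq13.3": (2.00e-8, 0.80, 10.15e-8),
    "mtDNA": (2.38e-7, 0.66, 6.93e-7),
}


def rate_per_variable_site(rate_per_site, p_inv):
    """Rescale a per-site rate to the variable fraction of sites (assumed relation)."""
    return rate_per_site / (1.0 - p_inv)


relative_diff = {}
for locus, (rate, p_inv, reported) in TABLE_S4.items():
    implied = rate_per_variable_site(rate, p_inv)
    # Relative deviation between the implied and the reported value.
    relative_diff[locus] = abs(implied - reported) / reported
    print(f"{locus}: implied {implied:.2e}, reported {reported:.2e}")
```

With the rounded table values, implied and reported rates agree to within about 2% for all six loci, which is compatible with rounding of pInv and the rates but does not prove that this is how the column was derived.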
Supporting Table S5: Parameterisation and parameter prior distributions for all 2-population models

Parameter (a) | Prior distribution | I2 (b) | IM2 (c) | IM2-GR (d) | IM2-BN-GR (e)
Log(NNOWBO) | uniform | 3, 5 | 3, 5 | 3, 5 | 3, 5
Log(NNOWSU) | uniform | 3, 5 | 3, 5 | 3, 5 | 3, 5
Log(NBNBO) | uniform | - | - | - | 2, 5
Log(NBNSU) | uniform | - | - | - | 2, 5
Log(NANCBO) | uniform | - | - | 3, 5 | 3, 5
Log(NANCSU) | uniform | - | - | 3, 5 | 3, 5
Log(NANCPO) | uniform | 3, 5 | 3, 5 | 3, 5 | 3, 5
Log(TSPLIT) | uniform | 4.2, 5 | 4.2, 5 | 4.2, 5 | 4.2, 5
Log(TMIGSTOP) | uniform | - | 2.5, 4.2 | 2.5, 4.2 | 2.5, 4.2
Log(TBNBO) | uniform | - | - | - | 2, 4.2
Log(TBNSU) | uniform | - | - | - | 2, 4.2
Log(mBO-SU) | uniform | - | -5, -3 | -5, -3 | -5, -3
Log(mSU-BO) | uniform | - | -5, -3 | -5, -3 | -5, -3
ALPHASTR-AUT | uniform | 8, 15 | 8, 15 | 8, 15 | 8, 15
ALPHASTR-Y | uniform | 8, 15 | 8, 15 | 8, 15 | 8, 15
Log(MUTRATESTR-AUT) | uniform | -5, -3 | -5, -3 | -5, -3 | -5, -3
MUTRATESTR-Y | normal | 2.0×10⁻³, 1.0×10⁻³ | 2.0×10⁻³, 1.0×10⁻³ | 2.0×10⁻³, 1.0×10⁻³ | 2.0×10⁻³, 1.0×10⁻³
MUTRATEChr2a | normal | 6.02×10⁻⁸, 3.0×10⁻⁹ | 6.02×10⁻⁸, 3.0×10⁻⁹ | 6.02×10⁻⁸, 3.0×10⁻⁹ | 6.02×10⁻⁸, 3.0×10⁻⁹
MUTRATEChr9 | normal | 14.00×10⁻⁸, 5.6×10⁻⁹ | 14.00×10⁻⁸, 5.6×10⁻⁹ | 14.00×10⁻⁸, 5.6×10⁻⁹ | 14.00×10⁻⁸, 5.6×10⁻⁹
MUTRATEChr12 | normal | 3.77×10⁻⁸, 3.3×10⁻⁹ | 3.77×10⁻⁸, 3.3×10⁻⁹ | 3.77×10⁻⁸, 3.3×10⁻⁹ | 3.77×10⁻⁸, 3.3×10⁻⁹
MUTRATEChr19 | normal | 7.42×10⁻⁸, 6.0×10⁻⁹ | 7.42×10⁻⁸, 6.0×10⁻⁹ | 7.42×10⁻⁸, 6.0×10⁻⁹ | 7.42×10⁻⁸, 6.0×10⁻⁹
MUTRATEXq13.3 | normal | 10.15×10⁻⁸, 2.8×10⁻⁹ | 10.15×10⁻⁸, 2.8×10⁻⁹ | 10.15×10⁻⁸, 2.8×10⁻⁹ | 10.15×10⁻⁸, 2.8×10⁻⁹
MUTRATEMTDNA | normal | 6.93×10⁻⁷, 3.4×10⁻⁸ | 6.93×10⁻⁷, 3.4×10⁻⁸ | 6.93×10⁻⁷, 3.4×10⁻⁸ | 6.93×10⁻⁷, 3.4×10⁻⁸

a, BO = Borneo, SU = Sumatra, PO = All Pongo, NNOW = current effective population size, NBN = effective population size during population bottleneck, NANC = ancestral effective population size, TSPLIT = population split time, TMIGSTOP = time since migration between Borneo and Sumatra stopped, TBN = time since population bottleneck, m = migration rate per individual per generation, ALPHA = shape parameter of gamma distribution of mutation rate, MUT = mean mutation rate per locus/site per generation; b, isolation model with two populations; c, isolation-with-migration model with two populations; d, isolation-with-migration model with two populations and exponential growth; e, isolation-with-migration model with two populations and bottleneck followed by exponential growth.

Supporting Table S6: Parameterisation and parameter prior distributions for all 10-population models

Parameter (a), each with a uniform prior distribution: Log(NNOWBO), Log(NNOWNT), Log(NNOWST), Log(NBNBO), Log(NTOBANT), Log(NTOBAST), Log(NSTRUCBO), Log(NSTRUCNT), Log(NANCBO), Log(NANCNT), Log(NANCST), Log(TSPLITBO), Log(TSPLITNT), Log(TMIGSTOP), Log(TBNDURBO), Log(TTOBASU), Log(TSTRUCBO), Log(TSTRUCNT), Log(TDECBO), Log(TDECSU), Log(mBO-ST), Log(mST-BO), Log(mNT-ST), Log(mST-NT), Log(mBO), Log(mNT)

IM10 (b) 3, 5 3, 5 3, 5 IM10BO-NT (c) IM10DECSU (d) IM10DECBO (e) IM10DECALL (f) IM10BNBODECSU (g) 3, 5 3, 5 3, 5 3, 5 2, 4 2, 4 2, 4 2, 4 2, 4 2, 4 2, 4 2, 4 3, 5 2, 4 2, 4 2, 4 3, 5 3, 5 3, 5 3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 3, 5 3, 5 3, 5 4.2, 5.2 4.2, 5.2 2.5, 4.2 3, 5 4.2, 5.2 4.2, 5.2 2.5, 4.2 3, 5 3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 2.5, 4.2 3.5, 4.8 2.5, 4.2 3.5, 4.8 2.5, 4.2 3.5, 4.8 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2
3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 2.5, 4.2 3.5, 4.8 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 2.5, 4.2 3.5, 4.8 1, 3.5 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 IM10BNBOTOBADECSU (h) 3, 5 2, 4 2, 4 2, 4 2, 4 2, 4 IM10BNBORECOLDECSU (i) 3, 5 2, 4 2, 4 2, 4 1, 2 1, 2 2.5, 4.2 3.5, 4.8 3, 5 3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 1, 3.6 3.4, 3.5 2.5, 4.2 3.5, 4.8 3, 5 3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 1, 3.6 3.4, 3.5 2.5, 4.2 3.5, 4.8 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 1, 3.5 -5, -3 -5, -3 -5, -3 -5, -3 -4, -2 -4, -2 3, 5 3, 5 3, 5 3, 5 4.2, 4.8 4.8, 5.2 2.5, 4.2 1, 3.6

Parameter (a) | Prior distribution | All eight models
ALPHASTR-AUT | uniform | 8, 15
ALPHASTR-Y | uniform | 8, 15
Log(MUTRATESTR-AUT) | uniform | -5, -3
MUTRATESTR-Y | normal | 2.0×10⁻³, 1.0×10⁻³
MUTRATEChr2a | normal | 6.02×10⁻⁸, 3.0×10⁻⁹
MUTRATEChr9 | normal | 14.00×10⁻⁸, 5.6×10⁻⁹
MUTRATEChr12 | normal | 3.77×10⁻⁸, 3.3×10⁻⁹
MUTRATEChr19 | normal | 7.42×10⁻⁸, 6.0×10⁻⁹
MUTRATEXq13.3 | normal | 10.15×10⁻⁸, 2.8×10⁻⁹
MUTRATEMTDNA | normal | 6.93×10⁻⁷, 3.4×10⁻⁸

a, BO = Borneo, SU = Sumatra, NT = Sumatra north of Lake Toba, ST = Sumatra south of Lake Toba, NNOW = current effective population size, NBN = effective population size during population bottleneck, NTOBA = effective population size during bottleneck associated with the Toba eruption, NSTRUC = effective population size before recent decline, NANC = ancestral effective population size, TSPLIT = population split time, TMIGSTOP = time since migration between Borneo and Sumatra stopped, TBNDUR = duration of population bottleneck, TTOBA = time of bottleneck associated with the Toba eruption, TSTRUC = time since establishment of population structure, TDEC = time since population decline, m = migration rate per individual per generation, ALPHA = shape parameter of gamma distribution of mutation rate, MUT = mean mutation rate per locus/site per generation; b, isolation-with-migration model with 10 populations, ST/NT split before ST/BO split; c, isolation-with-migration model with 10 populations, ST/BO split before ST/NT split; d, isolation-with-migration model with 10 populations and recent population decline on Sumatra; e, isolation-with-migration model with 10 populations and recent population decline on Borneo; f, isolation-with-migration model with 10 populations and recent population decline on Borneo and Sumatra; g, isolation-with-migration model with 10 populations, bottleneck on Borneo and recent population decline on Sumatra; h, isolation-with-migration model with 10 populations, bottleneck on Borneo and Sumatra, and recent population decline on Sumatra; i, isolation-with-migration model with 10 populations, bottleneck on Borneo, recolonisation and recent population decline of populations on Sumatra.
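To illustrate how prior distributions of the kind listed in Supporting Tables S5 and S6 translate into simulation input, the sketch below draws one parameter set. It is not the simulation code used in the study. Several points are assumptions: the dictionary keys and the function `draw_parameters` are hypothetical names, "Log" is taken to mean log10, the pair given for a uniform prior is read as (lower, upper) bounds, and the pair given for a normal prior is read as (mean, SD). Only a handful of parameters are included.

```python
import random

# Uniform priors on the log scale: (lower, upper). Base 10 is an assumption.
UNIFORM_LOG_PRIORS = {
    "NNOW_BO": (3.0, 5.0),
    "NNOW_SU": (3.0, 5.0),
    "m_BO_SU": (-5.0, -3.0),
}

# Normal priors on the natural scale: (mean, SD), read from the mutation rate rows.
NORMAL_PRIORS = {
    "MUTRATE_Chr2a": (6.02e-8, 3.0e-9),
    "MUTRATE_MTDNA": (6.93e-7, 3.4e-8),
}


def draw_parameters(rng):
    """Draw one parameter set from the priors and return it on the natural scale."""
    params = {}
    for name, (low, high) in UNIFORM_LOG_PRIORS.items():
        # Sample the exponent uniformly, then back-transform.
        params[name] = 10 ** rng.uniform(low, high)
    for name, (mu, sd) in NORMAL_PRIORS.items():
        params[name] = rng.gauss(mu, sd)
    return params


rng = random.Random(1)  # fixed seed so the draw is reproducible
sample = draw_parameters(rng)
for name, value in sample.items():
    print(f"{name}: {value:.3g}")
```

Sampling the exponent uniformly gives every order of magnitude within the bounds the same prior weight, which is the usual motivation for placing uniform priors on log-transformed population sizes and migration rates.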
Supporting Table S7: Summary statistics used for approximate Bayesian computation

Summary statistic | Data sets (number of statistics) (a) | Description
SX | Autosomal (6), X-chrom. (3), mtDNA (3) | Number of segregating sites per population
prSX | Autosomal (6), X-chrom. (3), mtDNA (3) | Number of private segregating sites per population
Spop | mtDNA (2) | Mean and standard deviation of the number of segregating sites over all populations
Stot | Autosomal (2), X-chrom. (1), mtDNA (1) | Total number of segregating sites over all populations
DX | Autosomal (6), X-chrom. (3), mtDNA (3) | Tajima's D (Tajima 1989), calculated for each population
Dpop | mtDNA (2) | Mean and standard deviation of Tajima's D over all populations
FSX | Autosomal (6), X-chrom. (3), mtDNA (3) | Fu's FS statistic (Fu 1997), calculated for each population
FSpop | mtDNA (2) | Mean and standard deviation of Fu's FS over all populations
πX | Autosomal (6), X-chrom. (3), mtDNA (3) | Average number of pairwise sequence differences within each population
πpop | mtDNA (2) | Mean and standard deviation of the average number of pairwise sequence differences within each population over all populations
ΦST_XY | Autosomal (6), X-chrom. (3), mtDNA (3) | Differentiation index between all pairs of populations and over all populations, calculated as ΦST (Excoffier et al. 1992)
πXY | Autosomal (6), X-chrom. (3), mtDNA (3) | Mean number of sequence differences between all pairs of populations
KX | mtDNA (3) | Number of haplotypes per population
Kpop | mtDNA (2) | Mean and standard deviation of the number of haplotypes over populations
Ktot | mtDNA (1) | Total number of haplotypes over all populations
HX | mtDNA (3) | Expected heterozygosity per population
Hpop | mtDNA (2) | Mean and standard deviation of expected heterozygosity over populations
Htot | mtDNA (1) | Total expected heterozygosity over all populations
KX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation of the number of alleles over all loci per population
Kpop | Autosomal-STR (2), Y-STR (2) | Mean and standard deviation of the mean number of alleles (autosomal) or haplotypes (Y-chromosomal) over populations
Ktot | Autosomal-STR (1), Y-STR (1) | Mean over all loci of the total number of alleles in all populations (autosomal) or total number of haplotypes in all populations (Y-chromosomal)
HX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation of the observed heterozygosity over all loci per population
Hpop | Autosomal-STR (2), Y-STR (2) | Mean and standard deviation of the mean observed heterozygosity over populations
Htot | Autosomal-STR (1), Y-STR (1) | Mean over all loci of the total observed heterozygosity in all populations
GWX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation of the Garza-Williamson index (Garza & Williamson 2001) over all loci per population (GWX = KX/(RX+1))
GWpop | Autosomal-STR (2), Y-STR (2) | Mean and standard deviation of the mean Garza-Williamson index over populations
GWtot | Autosomal-STR (1), Y-STR (1) | Mean Garza-Williamson index over all loci over all populations
NGWX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation of the modified Garza-Williamson index (Garza & Williamson 2001) over all loci per population (NGWX = KX/(Rtot+1))
NGWpop | Autosomal-STR (2), Y-STR (2) | Mean and standard deviation of the mean modified Garza-Williamson index over populations
RX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation of the allelic size range over all loci per population
Rpop | Autosomal-STR (2), Y-STR (2) | Mean and standard deviation of the mean allelic size range over populations
Rtot | Autosomal-STR (1), Y-STR (1) | Mean over all loci of the total allelic size range in all populations
FIS, FIT, FST | Autosomal-STR (6), Y-STR (2), mtDNA (2) | Mean over all loci of the global F-statistics, separately calculated for the Bornean and Sumatran meta-populations
FST | Autosomal (2), X-chrom. (1), Autosomal-STR (1), Y-STR (1), mtDNA (1) | Global differentiation index
FST_XY | Autosomal-STR (3), Y-STR (3) | Differentiation index between all pairs of populations, calculated as θW (Weir & Cockerham 1984)
πXY | Autosomal-STR (3), Y-STR (3) | Mean number of allelic differences between all pairs of populations
(δμ)²XY | Autosomal-STR (3), Y-STR (3) | Squared difference of mean within-population repeat size between all pairs of populations (Goldstein et al. 1995)
VarX | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation over loci of the allele size variance
ln(β)X | Autosomal-STR (6), Y-STR (6) | Mean and standard deviation over loci of the natural logarithm of the imbalance index β (Kimmel et al. 1998)

a, The number of statistics refers to the unpooled calculations in the ten-population setting, whereby, due to sample size restrictions, the summary statistics are calculated, if not otherwise indicated, over all the samples from the populations north of Lake Toba, south of Lake Toba, and Borneo, respectively. For autosomal sequence data, each summary statistic is represented as mean and standard deviation over all four loci.

Supporting Table S8: Model fits of all tested demographic models

Model (a) | Log10 MD (b) | p-value (c)
I2 (pooled data) | -24.51 | 0.001
IM2 (pooled data) | -22.88 | 0.003
IM2-GR (pooled data) | -23.42 | 0.001
IM2-BN-GR (pooled data) | -19.17 | 0.017
IM10 (pooled data) | -16.71 | 0.224
IM10 (full data) | -20.65 | 0.019
IM10BO-NT (full data) | -21.52 | 0.011
IM10-DECSU (full data) | -15.41 | 0.553
IM10-DECBO (full data) | -17.90 | 0.060
IM10-DECALL (full data) | -15.43 | 0.627
IM10-BNBO-DECSU (full data)* | -15.74 | 0.696
IM10-BNBO-TOBA-DECSU (full data) | -15.67 | 0.661
IM10-BNBO-RECOL-DECSU (full data) | -16.69 | 0.521

a, Values for the 2-population models correspond to a smaller set of pooled summary statistics than the 10-population models and are therefore not directly comparable. For comparison, the simplest 10-population model is shown with values for both the pooled and the full set of summary statistics; b, marginal density of the observed data under the inferred GLM; c, p-value of the observed data under the inferred GLM. * selected model
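The Garza-Williamson index and its modified form, given in Supporting Table S7 as GWX = KX/(RX+1) and NGWX = KX/(Rtot+1), can be sketched for a single locus as follows. The function name `garza_williamson` and the allele sizes are hypothetical, and allele sizes are assumed to be expressed in repeat units so that the range counts the possible allelic states.

```python
def garza_williamson(allele_sizes, size_range=None):
    """Return K / (R + 1) for one locus.

    allele_sizes: allele lengths in repeat units observed in one population.
    size_range: optional range R to use instead of the population's own range,
        e.g. the range over all populations for the modified index.
    """
    alleles = set(allele_sizes)
    k = len(alleles)  # K: number of distinct alleles
    r = (max(alleles) - min(alleles)) if size_range is None else size_range
    return k / (r + 1)


# Hypothetical allele sizes (in repeat units) at one locus.
population = [10, 11, 11, 13, 15]
all_populations = population + [8, 17]

gw = garza_williamson(population)
r_tot = max(all_populations) - min(all_populations)
ngw = garza_williamson(population, size_range=r_tot)
print(round(gw, 3), round(ngw, 3))
```

In this example the population carries four distinct alleles spanning a range of five repeat units, so the index is 4/6; using the total range of nine repeat units over all populations lowers the modified index to 4/10.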
Supporting Table S9: Accuracy of different point estimators in parameter estimation

Parameter | RMSE (mode) | RMSE (mean) | RMSE (median)
Log(NNOWBO) | 0.095 | 0.098 | 0.096
Log(NNOWNT) | 0.133 | 0.140 | 0.137
Log(NNOWST) | 0.188 | 0.208 | 0.200
Log(NBNBO) | 0.301 | 0.362 | 0.338
Log(NANCBO) | 0.318 | 0.398 | 0.371
Log(NSTRUCNT) | 0.293 | 0.354 | 0.329
Log(NANCNT) | 0.284 | 0.366 | 0.340
Log(NANCST) | 0.263 | 0.317 | 0.295
TBNENDBO | 1,660 | 1,884 | 1,799
TBNDURBO | 633 | 857 | 802
TSPLITBO | 6,971 | 9,439 | 8,784
TDECSU | 445 | 575 | 529
TSTRUCNT | 6,342 | 7,181 | 6,838
TSPLITNT | 10,528 | 11,667 | 11,139
TMIGSTOP | 2,531 | 3,040 | 2,842
Log(mBO-ST) | 0.327 | 0.421 | 0.392
Log(mST-BO) | 0.327 | 0.413 | 0.385
Log(mNT-ST) | 0.271 | 0.313 | 0.294
Log(mST-NT) | 0.285 | 0.359 | 0.332
Log(mBO) | 0.280 | 0.362 | 0.337
Log(mNT) | 0.290 | 0.348 | 0.326
ALPHASTR-AUT | 1.090 | 1.478 | 1.377
ALPHASTR-Y | 1.123 | 1.548 | 1.449
Log(MUTRATESTR) | 0.095 | 0.093 | 0.092

The accuracy is measured as the root mean squared error over 1,000 pseudo-observed data sets.

Supporting Table S10: Accuracy of parameter estimation under different tolerance levels

Tolerance | 0.01% | 0.05% | 0.10% | 0.20% | 0.50% | 1.00%
Log(NNOWBO) | 0.130 | 0.131 | 0.131 | 0.132 | 0.133 | 0.134
Log(NNOWNT) | 0.184 | 0.187 | 0.188 | 0.189 | 0.190 | 0.192
Log(NNOWST) | 0.263 | 0.267 | 0.269 | 0.270 | 0.271 | 0.272
Log(NBNBO) | 0.439 | 0.455 | 0.460 | 0.464 | 0.468 | 0.465
Log(NANCBO) | 0.471 | 0.490 | 0.496 | 0.500 | 0.505 | 0.507
Log(NSTRUCNT) | 0.423 | 0.440 | 0.445 | 0.449 | 0.454 | 0.461
Log(NANCNT) | 0.438 | 0.454 | 0.459 | 0.464 | 0.468 | 0.475
Log(NANCST) | 0.391 | 0.405 | 0.410 | 0.415 | 0.420 | 0.423
TBNENDBO | 27 | 27 | 28 | 28 | 28 | 28
TBNDURBO | 22 | 23 | 23 | 23 | 23 | 23
TSPLITBO | 73 | 76 | 77 | 77 | 78 | 78
TDECSU | 18 | 18 | 19 | 19 | 19 | 19
TSTRUCNT | 53 | 54 | 55 | 55 | 56 | 56
TSPLITNT | 67 | 69 | 69 | 69 | 70 | 68
TMIGSTOP | 40 | 42 | 43 | 43 | 43 | 43
Log(mBO-ST) | 0.489 | 0.508 | 0.514 | 0.518 | 0.521 | 0.526
Log(mST-BO) | 0.478 | 0.497 | 0.503 | 0.507 | 0.511 | 0.515
Log(mNT-ST) | 0.390 | 0.403 | 0.407 | 0.410 | 0.414 | 0.414
Log(mST-NT) | 0.427 | 0.443 | 0.447 | 0.452 | 0.456 | 0.461
Log(mBO) | 0.425 | 0.440 | 0.445 | 0.449 | 0.453 | 0.459
Log(mNT) | 0.413 | 0.428 | 0.433 | 0.436 | 0.440 | 0.443
ALPHASTR-AUT | 0.924 | 0.957 | 0.967 | 0.974 | 0.981 | 0.985
ALPHASTR-Y | 0.941 | 0.976 | 0.985 | 0.992 | 0.998 | 0.987
Log(MUTRATESTR) | 0.122 | 0.121 | 0.121 | 0.121 | 0.121 | 0.124

The accuracy is calculated by taking the average of the root mean integrated squared error (RMISE) (Wegmann et al. 2009) over 1,000 pseudo-observed data sets for each of six different tolerance levels (proportion of retained simulations). For the parameter estimation, we used the closest 10,000 simulations from the likelihood-free MCMC run over 10⁷ simulations with an MCMC tolerance level of 0.1, which is equal to a tolerance level of ~0.01% in a standard rejection sampling approach (Wegmann et al. 2009).

Supporting Figure S1: Pr(Data|K) and deltaK statistics for all STRUCTURE runs. The population structure analysis incorporated multiple levels of hierarchical structure, starting with all samples and subsequently reducing the data set to only samples assigned to the same cluster in the previous analysis.

Supporting Figure S2: Structure plot for 25 microsatellite markers used for the demographic modelling. The three rows of plots correspond to the three levels of hierarchical structure we identified in the complete data set. The geographical location of the ten sampling regions is shown in Figure 1 of the main text.

Supporting Figure S3: Gene trees based on sequence data of six different loci. The tips of black branches refer to Sumatran samples, light grey to Bornean samples, and dark grey to the human and chimpanzee outgroup.

Supporting Figure S4: Cross validation of the parameter estimation.
We drew 1,000 random parameter sets from the prior distributions of the model parameters and generated pseudo-observed data sets by simulating summary statistics under the selected model (IM10-BNBO-DECSU). We then performed the standard parameter estimation procedure with each data set. The histograms represent the number of times the known parameter values fall into each 10%-quantile of the estimated posterior distribution. For unbiased parameter estimates, the expectation is a uniform distribution over the entire prior space. A concentration of data points at the borders indicates too narrow posterior estimates, while a concentration of data points at the centre points toward too conservative posterior estimates. The p-value of the Kolmogorov-Smirnov test is given above each histogram.

Supporting Figure S5: First 16 principal components of the posterior predictive distribution for the selected model (IM10-BNBO-DECSU).

References

Arora N, Nater A, van Schaik CP, et al. (2010) Effects of Pleistocene glaciations and rivers on the population structure of Bornean orangutans (Pongo pygmaeus). Proceedings of the National Academy of Sciences 107, 21376-21381.
Chikhi L, Sousa VC, Luisi P, Goossens B, Beaumont MA (2010) The confounding effects of population structure, genetic diversity and the sampling scheme on the detection and quantification of population size changes. Genetics 186, 983-U347.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with confidence. PLoS Biology 4, 699-710.
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7, -.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14, 2611-2620.
Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes - application to human mitochondrial DNA restriction data. Genetics 131, 479-491.
Fischer A, Pollack J, Thalmann O, Nickel B, Pääbo S (2006) Demographic history and genetic differentiation in apes. Current Biology 16, 1133-1138.
Fu YX (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147, 915-925.
Garza JC, Williamson EG (2001) Detection of reduction in population size using data from microsatellite loci. Molecular Ecology 10, 305-318.
Gernhard T (2008) The conditioned reconstructed process. Journal of Theoretical Biology 253, 769-778.
Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW (1995) Genetic absolute dating based on microsatellites and the origin of modern humans. Proceedings of the National Academy of Sciences of the United States of America 92, 6723-6727.
Greminger MP, Stölting KN, Nater A, et al. (2014) Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms. BMC Genomics 15.
Hasegawa M, Kishino H, Yano TA (1985) Dating of the human ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160-174.
Kimmel M, Chakraborty R, King JP, et al. (1998) Signatures of population expansion in microsatellite repeat data. Genetics 148, 1921-1930.
Nater A, Nietlisbach P, Arora N, et al. (2011) Sex-biased dispersal and volcanic activities shaped phylogeographic patterns of extant orangutans (genus: Pongo). Molecular Biology and Evolution 28, 2275-2288.
Nater A, Arora N, Greminger MP, et al. (2013) Marked population structure and recent migration in the critically endangered Sumatran orangutan (Pongo abelii). Journal of Heredity 104, 2-13.
Nietlisbach P, Arora N, Nater A, et al. (2012) Heavily male-biased long-distance dispersal of orang-utans (genus: Pongo), as revealed by Y-chromosomal and mitochondrial genetic markers. Molecular Ecology 21, 3173-3186.
Peter BM, Wegmann D, Excoffier L (2010) Distinguishing between population bottleneck and population subdivision by a Bayesian model choice procedure. Molecular Ecology 19, 4648-4660.
Posada D (2008) jModelTest: phylogenetic model averaging. Molecular Biology and Evolution 25, 1253-1256.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Stadler T, Haubold B, Merino C, Stephan W, Pfaffelhuber P (2009) The impact of sampling schemes on the site frequency spectrum in nonequilibrium subdivided populations. Genetics 182, 205-216.
Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978-989.
Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585-595.
Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512-526.
Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182, 1207-1218.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38, 1358-1370.
Yang ZH, Rannala B (2006) Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology and Evolution 23, 212-226.
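
As an illustration of the cross-validation check described in the caption of Supporting Figure S4, the following minimal Python sketch draws parameter values from a prior, simulates pseudo-observed data, locates each known value within its posterior, bins the resulting quantiles into 10% classes, and tests them for uniformity with a Kolmogorov-Smirnov statistic. It is not the pipeline used in this study: it assumes a toy conjugate normal model whose posterior is known exactly in place of the ABC posterior, and all function names (posterior_quantile, decile_counts, ks_uniform, run_check) are hypothetical. The p-value relies on a standard asymptotic approximation of the Kolmogorov distribution.

```python
import math
import random
from statistics import NormalDist


def posterior_quantile(theta_true, y):
    """Quantile of the known parameter value within the posterior.

    Toy model (an assumption, not the orang-utan model):
    prior theta ~ N(0, 1), one observation y ~ N(theta, 1),
    hence the exact posterior is theta | y ~ N(y / 2, 1 / 2).
    """
    posterior = NormalDist(mu=y / 2.0, sigma=math.sqrt(0.5))
    return posterior.cdf(theta_true)


def decile_counts(quantiles):
    """Count how many quantiles fall into each of the ten 10% classes."""
    counts = [0] * 10
    for q in quantiles:
        counts[min(int(q * 10), 9)] += 1  # q == 1.0 goes into the last class
    return counts


def ks_uniform(quantiles):
    """One-sample KS statistic against U(0, 1) with an asymptotic p-value."""
    xs = sorted(quantiles)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


def run_check(n_sets=1000, seed=1):
    """Simulate n_sets pseudo-observed data sets and return their quantiles."""
    rng = random.Random(seed)
    quantiles = []
    for _ in range(n_sets):
        theta = rng.gauss(0.0, 1.0)  # known value drawn from the prior
        y = rng.gauss(theta, 1.0)    # pseudo-observed data
        quantiles.append(posterior_quantile(theta, y))
    return quantiles


if __name__ == "__main__":
    qs = run_check()
    print("counts per 10% class:", decile_counts(qs))
    print("KS statistic and p-value:", ks_uniform(qs))
```

Because the toy posterior is exact, the ten classes should be filled roughly evenly. Replacing it with a posterior that is too narrow would pile quantiles into the outer classes, and one that is too wide would pile them into the centre, which is the pattern the caption describes.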
