mec13027-sup-0001-SupInfo

Supporting Information
Reconstructing the Demographic History of Orang-utans
using Approximate Bayesian Computation
Sample Collection
Our sample set for this study included orang-utan samples used in previous genetic studies of
orang-utans (Arora et al. 2010; Nater et al. 2011; Nater et al. 2013; Greminger et al. 2014).
These samples were either faecal and hair samples non-invasively collected from wild
populations or blood samples collected from rehabilitant orang-utans. Geographic provenance of
samples from rehabilitant orang-utans was confirmed based on their mtDNA haplotypes, which
have been shown to be reliable indicators of the natal area in orang-utans (Arora et al. 2010;
Nater et al. 2011). Sample details and DNA extraction procedures are described in the
aforementioned studies. The collection and transport of samples were conducted in strict
accordance with Indonesian, Malaysian and international regulations. Samples were transferred
to Zurich under the Convention on International Trade in Endangered Species (CITES) (permits
09717/IV/SATS-LN/2010, 07279/IV/SATS-LN/2009, 00961/IV/SATS-LN/2007,
06968/IV/SATS-LN/2005, and 4872).
PCR Amplification, Sequencing and Genotyping
We complemented the data set of previously published autosomal (Arora et al. 2010; Nater et al.
2013; Greminger et al. 2014), mitochondrial (Nater et al. 2011), and Y-chromosomal (Nater et
al. 2011; Nietlisbach et al. 2012) markers by generating sequence data for four non-coding
autosomal regions and one non-coding X-chromosomal region (Supporting Table S2). The PCRs
contained 10 ng genomic DNA, 0.16 µl Phire Hot Start DNA Polymerase, 1× Phire Reaction
Buffer (both Finnzymes) containing 1.5 mM MgCl₂, 0.1 mM dNTPs and 0.1 µM each of forward
and reverse primer in a total volume of 8 µl. PCR amplifications were performed in a Veriti Thermal
Cycler (Applied Biosystems) with the following parameters: initial denaturation at 98°C for 30
seconds; 40 cycles of 98°C for 10 seconds, the primer-specific annealing temperature (Supporting
Table S1) for 10 seconds, and 72°C for 40 seconds; and a final extension step at 72°C
for 5 minutes. Cycle sequencing was performed with BigDye Terminator v3.1 chemistry on a
3730 DNA Analyzer (both Applied Biosystems). We used SEQUENCING ANALYSIS v5.3.1
(Applied Biosystems) for raw data analysis. The SEQMAN program of the LASERGENE 8
software package (DNASTAR) was used to trim and align the sequences. We used the program
PHASE v.2.1 (Stephens et al. 2001) to infer haplotypes of autosomal and X-chromosomal
sequences. Heterozygous positions with phasing probabilities of less than 0.95 were coded with
IUPAC ambiguity codes.
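The masking rule for poorly phased sites can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the per-site probability list and the input layout are assumptions made for the example, and only the 0.95 threshold comes from the text above.

```python
# IUPAC ambiguity codes for the six possible two-base heterozygotes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def mask_uncertain_phase(hap1, hap2, phase_prob, threshold=0.95):
    """Replace heterozygous sites phased with probability < threshold
    by the matching IUPAC code in both haplotypes (hypothetical helper)."""
    out1, out2 = [], []
    for a, b, p in zip(hap1, hap2, phase_prob):
        if a != b and p < threshold:
            code = IUPAC[frozenset((a, b))]
            out1.append(code)
            out2.append(code)
        else:
            # Homozygous or confidently phased sites are kept unchanged.
            out1.append(a)
            out2.append(b)
    return "".join(out1), "".join(out2)

masked = mask_uncertain_phase("ACGT", "GCTT", [0.99, 1.0, 0.80, 1.0])
```

In this toy call only the third site (G/T, phase probability 0.80) falls below the threshold and is recoded.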
Demographic History of Orang-Utans (Pongo spp.) – Supporting Information
1
Validation of Population Units
Good knowledge of the underlying population structure is essential in order to design adequate
demographic models, since cryptic population structure can lead to erroneous inference of
population size changes (Stadler et al. 2009; Chikhi et al. 2010; Peter et al. 2010). In a first
analysis step, we therefore investigated the geographical distribution of genetic diversity in our
data set in order to identify distinct genetic clusters.
We used the Bayesian clustering algorithm implemented in the software STRUCTURE v2.3.3
(Pritchard et al. 2000) to identify and visualise genetic structure in the autosomal microsatellite
data set. We applied the admixture model with correlated allele frequencies, a burn-in length of
3×10⁵ steps followed by 3×10⁶ Markov chain Monte Carlo (MCMC) steps, running the analysis
with the number of clusters K ranging from 1 to 10. We performed ten iterations per K and
averaged the likelihood of the data Pr(D|K) over all iterations for each K to calculate the deltaK
statistic (Evanno et al. 2005), which we used as a criterion to select the most probable number of
clusters in the data set.
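As an illustration of the deltaK criterion, the sketch below computes it from replicate log-likelihoods. It assumes the common variant in which the second-order difference is taken on the per-K mean of ln Pr(D|K) and divided by the standard deviation across replicates at K; the function name and the toy values are hypothetical, and the statistic is undefined for the smallest and largest K tested.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp maps K -> list of ln Pr(D|K) values, one per replicate run.
    Returns deltaK for every K that has both neighbours K-1 and K+1."""
    m = {k: mean(v) for k, v in lnp.items()}
    s = {k: stdev(v) for k, v in lnp.items()}
    out = {}
    for k in sorted(lnp):
        if k - 1 in m and k + 1 in m and s[k] > 0:
            # |L(K+1) - 2 L(K) + L(K-1)| scaled by the spread at K
            out[k] = abs(m[k + 1] - 2 * m[k] + m[k - 1]) / s[k]
    return out

# Toy replicate likelihoods (made up for the example).
toy = {1: [-1000.0, -1002.0], 2: [-800.0, -802.0], 3: [-790.0, -794.0]}
result = delta_k(toy)
```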
The STRUCTURE run analysing the autosomal microsatellite data set resulted in the highest
deltaK values for K=2 (Supporting Figure S1), clearly separating Bornean and Sumatran
individuals (Supporting Figure S2). Since STRUCTURE tends to find only the highest level of
hierarchical genetic structure in a data set (Evanno et al. 2005), we repeated the analysis
separately for each island. This resulted in two and three distinct clusters on Borneo and
Sumatra, respectively (Supporting Figure S2). The two Bornean clusters separated individuals
from south of the Kinabatangan River in Sabah (South Kinabatangan) and East Kalimantan from
individuals from Central and West Kalimantan, Sarawak, as well as north of the Kinabatangan
River (North Kinabatangan). Further runs incorporating only samples from the same higher-level
cluster revealed a total of five distinct genetic clusters within Bornean orang-utans, separating
nearly all regions except Sarawak, which clusters together with West Kalimantan (Supporting
Figure S2). In Sumatra, we detected no further hierarchical substructure. Thus, at the lowest
level of hierarchical genetic structure, there are a total of eight distinct autosomal clusters (five on
Borneo, three on Sumatra) among all sampled orang-utans.
Using previously published results from three mtDNA genes (Nater et al. 2011), two additional
population units became apparent (separating North Aceh and Langkat on Sumatra, as well as
West Kalimantan and Sarawak on Borneo). These cluster pairs were indistinguishable with
autosomal microsatellite data alone, most likely due to frequent male-mediated gene flow
between them. Conversely, the mtDNA genes alone did not resolve the significant population
differentiation between North and South Kinabatangan found in the autosomal microsatellite
data, probably due to lack of diversity in the mtDNA genes. Thus, by combining markers with
different inheritance patterns, we identified a total of ten distinct genetic clusters (four on
Sumatra and six on Borneo), showing significant pairwise differentiation in either autosomal or
mtDNA markers. These clusters should therefore be treated as separate panmictic population units in
the demographic modelling.
Phylogenetic Analyses
We used a Bayesian MCMC approach implemented in BEAST v1.6.2 (Drummond & Rambaut
2007) to infer gene trees and mutation rates of the autosomal, X-chromosomal and mitochondrial
loci, based on our sequence alignments. We applied a TrN+G substitution model (Tamura & Nei
1993) for the mitochondrial alignment and a HKY+G+I model (Hasegawa et al. 1985) for all
autosomal and X-chromosomal alignments, as determined by jMODELTEST v0.1.1 (Posada
2008). We estimated locus-specific mutation rates under the relaxed molecular clock model with
uncorrelated lognormal distributed branch rates (Drummond et al. 2006) and a prior distribution
of node ages derived from a birth-death process (Yang & Rannala 2006; Gernhard 2008). Each
gene tree was rooted with a human and a central chimpanzee sequence from GenBank
(accession nos. GQ983109.1 and HM068590.1, respectively), and the calibration of the
molecular clock was implemented as described in Nater et al. (2011).
For the four autosomal loci and the single X-chromosomal locus, the BEAST runs resulted in
mean mutation rates of 1.61–3.04×10⁻⁸ and 2.00×10⁻⁸ per site per generation, respectively
(Supporting Table S4). As expected, the mitochondrial regions showed a mutation rate an order
of magnitude higher than that of the nuclear loci (2.38×10⁻⁷ per site per generation). The
phylogenetic trees of the five nuclear loci revealed different topologies compared to the
mitochondrial tree (Supporting Figure S3). All autosomal regions showed incomplete lineage
sorting and in some cases even haplotype sharing between Borneo and Sumatra. For the
X-chromosomal region, all Bornean sequences formed a monophyletic group with a comparatively
recent common ancestor, while the Sumatran sequences were paraphyletic. The Sumatran
population south of Lake Toba, Batang Toru, did not form a distinct cluster for any of the five
nuclear gene trees.
Supporting Table S1: Primers used for amplification and sequencing of four autosomal regions and one X-chromosomal region
Primer name          Primer type              Annealing temp.     Sequence (5’-3’)
                                              (PCR / sequencing)
Chr2a_Region17_F     PCR / Sequencing primer  64°C / 53°C         AGTGCCCCGACACAAGTGATACAG
Chr2a_Region17_R     PCR / Sequencing primer  64°C / 53°C         GAGCAGGGCTTAGGCAAGGAGA
Chr2a_Region17_seq1  Sequencing primer        53°C                GTTTTGAAGCCATTAAGTTGCTGAT
Chr2a_Region17_seq2  Sequencing primer        53°C                GGTGGAAACATTTTCAAAACTCAGA
Chr9_Region16_F      PCR / Sequencing primer  64°C / 53°C         TTCATATGCAGGGCAAGAGAACAAG
Chr9_Region16_R      PCR / Sequencing primer  64°C / 53°C         CCCTGGTCATCATGCCTGCTATTAT
Chr9_Region16_seq1   Sequencing primer        53°C                AAGTTCACAGCCTTCCTCAAGAG
Chr12_Region1_F      PCR / Sequencing primer  64°C / 53°C         ATCCAAATGGCCAAACTCACCT
Chr12_Region1_R      PCR / Sequencing primer  64°C / 53°C         GCAACCCACATGCTCATCAATAG
Chr12_Region1_seq1   Sequencing primer        53°C                CCAGGGAGAGCCAGGGAACA
Chr19_Region7_F      PCR / Sequencing primer  64°C / 53°C         GGAGGGTTGATGACGTTTACTTACA
Chr19_Region7_R      PCR / Sequencing primer  64°C / 53°C         TGACACATGATTGATGCCACTCTC
Chr19_Region7_seq1   Sequencing primer        53°C                AGGATACAAGCCCTATTTTGCTGAA
Xq13.3_2_F           PCR / Sequencing primer  62°C / 53°C         CTCAGTAACTTGGCGAAACCTCAT
Xq13.3_2_R           PCR / Sequencing primer  62°C / 53°C         GCCCCCAACAGACTCCAGTGT
Xq13.3_2_seq1        Sequencing primer        53°C                TGCAGCAACTAACAGCATTCA
Xq13.3_3_F           PCR / Sequencing primer  62°C / 53°C         TAAGTGGGAGCTGAATGATAAGAAC
Xq13.3_3_R           PCR / Sequencing primer  62°C / 53°C         GACAGGGAAGATTGAGAGTGAAGAT
Xq13.3_3_seq1        Sequencing primer        53°C                TCCCATGAAACACTCTCCTAAACA
Xq13.3_4_F           PCR / Sequencing primer  62°C / 53°C         CCCCTCTGAACCCTGCTCCTA
Xq13.3_4_R           PCR / Sequencing primer  62°C / 53°C         CCCTGGACTTGTAGAAAAATCTGCT
Xq13.3_4_seq1        Sequencing primer        53°C                ATAATCATGTTCTTTGGAAGACCTG
Xq13.3_5_F           PCR / Sequencing primer  62°C / 53°C         AAATCTTCTTAACTGTTGGGCACTT
Xq13.3_5_R           PCR / Sequencing primer  62°C / 53°C         TTAACGTTAACGCCATCAGTCC
Xq13.3_5_seq1        Sequencing primer        53°C                GGCAATTGGGAAAGGATACTCA
Xq13.3_5_seq2        Sequencing primer        53°C                AGCCAGAGTCTTGGTTTGTCTCC
The naming of the regions corresponds to the names used in Fischer et al. (2006).
Supporting Table S2: List of sequence loci used in the ABC analysis.
                                                  All Pongo                          Borneo                      Sumatra
Locus      Loc.[a]  Acc. nos.[b]       LBases[c]  NInd[d]  NSeg[e]  π[f]    D[g]     NInd  NSeg  π       D       NInd  NSeg  π       D
chr2a_R17  AUT      -                  2165       22       44       0.0051  0.29     10    24    0.0039  1.01    12    33    0.0047  0.62
chr9_R16   AUT      -                  2101       22       35       0.0036  -0.17    10    12    0.0015  -0.36   12    29    0.0042  0.50
chr12_R1   AUT      -                  1954       22       43       0.0054  0.20     10    20    0.0040  1.41    12    36    0.0060  0.80
chr19_R7   AUT      -                  1937       22       36       0.0039  -0.29    10    22    0.0037  0.65    12    29    0.0034  -0.54
Xq13.3     X        -                  8050       36       80       0.0023  -0.13    18    6     0.0001  -1.11   18    69    0.0023  -0.30
16S        MT       HQ912716–HQ912723  346        118      16       0.0151  2.01     52    3     0.0010  -0.99   66    9     0.0068  0.65
ND3        MT       HQ912741–HQ912752  494        118      53       0.0385  2.86     52    8     0.0036  0.02    66    40    0.0219  0.93
CYTB       MT       HQ912724–HQ912740  515        118      73       0.0434  2.02     52    8     0.0016  -1.42   66    62    0.0305  0.69
a, Genomic location of the locus (AUT = autosomal, X = X-chromosomal, MT = mitochondrial); b, GenBank accession numbers; c, sequence length in base
pairs; d, number of sampled individuals; e, number of segregating sites; f, nucleotide diversity; g, Tajima’s D (Tajima 1989).
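For readers who want to reproduce the kind of per-locus statistics listed in Supporting Table S2, the sketch below computes the number of segregating sites, nucleotide diversity and Tajima's D (Tajima 1989) from a set of aligned sequences. It is a simplified illustration: the function name and toy alignment are hypothetical, gaps and ambiguity codes are not handled, and it is not the software used in the study.

```python
from itertools import combinations
from math import sqrt

def summary_stats(seqs):
    """Return (segregating sites S, nucleotide diversity pi, Tajima's D)
    for a list of equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    seg = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    pairs = list(combinations(seqs, 2))
    # Mean number of pairwise differences.
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    pi = k / length
    if seg == 0:
        return seg, pi, float("nan")
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return seg, pi, d

# Toy alignment of four sequences (made up for the example).
seg, pi, d = summary_stats(["AAAA", "AAAT", "AATT", "ATTT"])
```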
Supporting Table S3: List of microsatellite loci used in the ABC analysis.
                   All Pongo                       Borneo                  Sumatra
Locus     Loc.[a]  NInd[b]  NA[c]  HO[d]  HE[e]    NInd  NA  HO    HE      NInd  NA  HO    HE
D1S550    AUT      233      11     0.70   0.82     124   8   0.60  0.70    109   9   0.81  0.80
D2S1326   AUT      236      7      0.54   0.71     126   7   0.64  0.72    110   5   0.43  0.40
D4S1627   AUT      230      7      0.57   0.71     122   7   0.63  0.68    108   5   0.49  0.66
D4S2408   AUT      228      6      0.57   0.73     118   6   0.60  0.62    110   4   0.55  0.65
D5S1457   AUT      232      9      0.61   0.78     122   9   0.71  0.79    110   6   0.50  0.71
D5S1470   AUT      231      10     0.69   0.79     122   9   0.67  0.75    109   8   0.71  0.79
D5S1505   AUT      231      9      0.65   0.78     122   8   0.72  0.82    109   7   0.58  0.71
D13S321   AUT      234      10     0.66   0.84     126   7   0.57  0.76    108   9   0.76  0.79
D13S765   AUT      233      8      0.61   0.80     123   6   0.55  0.60    110   7   0.68  0.68
D16S420   AUT      229      10     0.59   0.70     121   6   0.50  0.57    108   9   0.69  0.80
O4_6      AUT      235      8      0.67   0.81     125   7   0.62  0.68    110   4   0.73  0.74
O4_A1     AUT      231      7      0.66   0.78     121   4   0.61  0.69    110   6   0.71  0.78
O4_A5     AUT      233      8      0.56   0.77     124   8   0.53  0.61    109   6   0.60  0.62
O4_A7     AUT      234      5      0.49   0.57     125   5   0.36  0.41    109   5   0.63  0.67
O4_A8     AUT      237      3      0.15   0.15     127   2   0.01  0.01    110   3   0.31  0.29
O4_B3     AUT      231      5      0.32   0.51     121   3   0.03  0.03    110   3   0.65  0.60
O4_B5     AUT      234      10     0.68   0.83     125   9   0.65  0.68    109   7   0.72  0.74
O4_B6     AUT      228      12     0.56   0.82     119   9   0.54  0.86    109   10  0.59  0.58
O4_B17    AUT      226      12     0.62   0.82     119   12  0.67  0.79    107   6   0.55  0.64
O4_B20    AUT      227      5      0.35   0.54     121   3   0.45  0.48    106   4   0.25  0.28
O4_B24    AUT      237      4      0.27   0.47     127   1   0.00  0.00    110   4   0.57  0.66
O4_C9     AUT      225      6      0.51   0.58     116   6   0.48  0.58    109   5   0.53  0.57
O4_C13    AUT      226      7      0.56   0.66     118   7   0.54  0.72    108   4   0.58  0.55
O4_Chr5   AUT      231      11     0.78   0.86     122   7   0.77  0.78    109   9   0.80  0.82
O4_Chr7   AUT      228      25     0.85   0.94     122   23  0.86  0.91    106   20  0.83  0.93
DYS502.1  Y        129      2      -      0.49     53    1   -     0.00    76    1   -     0.00
DYS502.2  Y        129      3      -      0.03     53    3   -     0.07    76    1   -     0.00
DYS510    Y        129      7      -      0.68     53    3   -     0.42    76    5   -     0.37
DYS532    Y        129      2      -      0.49     53    1   -     0.00    76    1   -     0.00
DYS556    Y        129      3      -      0.28     53    3   -     0.54    76    1   -     0.00
DYS561    Y        129      5      -      0.56     53    5   -     0.49    76    1   -     0.00
DYS577    Y        129      2      -      0.08     53    2   -     0.17    76    1   -     0.00
DYS587    Y        129      8      -      0.68     53    7   -     0.84    76    3   -     0.30
DYS630    Y        129      11     -      0.87     53    8   -     0.81    76    5   -     0.76
DYS645    Y        129      2      -      0.49     53    1   -     0.00    76    1   -     0.00
Y6C2      Y        129      2      -      0.49     53    1   -     0.00    76    1   -     0.00
a, Genomic location of the locus (AUT = autosomal, Y = Y-chromosomal); b, number of sampled individuals; c, number
of alleles; d, observed heterozygosity; e, expected heterozygosity.
Supporting Table S4: Mutation rate estimates of sequence loci
Locus           Mean substitution rate   95%-HPD[a]  95%-HPD[a]  Kappa[b]  pInv[c]  Mean and SD of mutation rate
                per site per generation  lower       upper                          per variable site per generation
Chr2a_Region17  1.61×10⁻⁸                1.06×10⁻⁸   2.21×10⁻⁸   3.85      0.73     6.02×10⁻⁸, 3.0×10⁻⁹
Chr9_Region16   3.04×10⁻⁸                2.00×10⁻⁸   4.16×10⁻⁸   3.66      0.78     14.00×10⁻⁸, 5.6×10⁻⁹
Chr12_Region1   1.84×10⁻⁸                1.24×10⁻⁸   2.49×10⁻⁸   5.14      0.51     3.77×10⁻⁸, 3.3×10⁻⁹
Chr19_Region7   2.24×10⁻⁸                1.04×10⁻⁸   3.35×10⁻⁸   5.98      0.70     7.42×10⁻⁸, 6.0×10⁻⁹
Xq13.3          2.00×10⁻⁸                1.48×10⁻⁸   2.55×10⁻⁸   3.59      0.80     10.15×10⁻⁸, 2.8×10⁻⁹
mtDNA           2.38×10⁻⁷                1.78×10⁻⁷   3.06×10⁻⁷   18.82     0.66     6.93×10⁻⁷, 3.4×10⁻⁸
a, 95% highest posterior density boundaries; b, transition/transversion rate ratio; c, proportion of invariable sites
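The table does not state how the per-variable-site rates were obtained. One reading, which is an assumption and not confirmed by the source, is that the mean rate per site was divided by the proportion of variable sites (1 − pInv). The sketch below checks that this reading agrees with the tabulated means to within a few percent; the dictionary and function names are hypothetical.

```python
# locus: (mean rate per site, pInv, reported mean rate per variable site),
# values copied from Supporting Table S4.
TABLE_S4 = {
    "Chr2a_Region17": (1.61e-8, 0.73, 6.02e-8),
    "Chr9_Region16": (3.04e-8, 0.78, 14.00e-8),
    "Chr12_Region1": (1.84e-8, 0.51, 3.77e-8),
    "Chr19_Region7": (2.24e-8, 0.70, 7.42e-8),
    "Xq13.3": (2.00e-8, 0.80, 10.15e-8),
    "mtDNA": (2.38e-7, 0.66, 6.93e-7),
}

def rate_per_variable_site(rate, p_inv):
    """Assumed conversion: spread the per-site rate over variable sites only."""
    return rate / (1.0 - p_inv)

# Relative deviation between the assumed conversion and the reported value.
rel_dev = {
    locus: abs(rate_per_variable_site(rate, p_inv) - reported) / reported
    for locus, (rate, p_inv, reported) in TABLE_S4.items()
}
```

Small deviations are expected because the tabulated inputs are rounded.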
Supporting Table S5: Parameterisation and parameter prior distributions for all 2-population models
Parameter[a]           Prior distribution  I2[b]                 IM2[c]                IM2-GR[d]             IM2-BN-GR[e]
Log(NNOWBO)            uniform             3, 5                  3, 5                  3, 5                  3, 5
Log(NNOWSU)            uniform             3, 5                  3, 5                  3, 5                  3, 5
Log(NBNBO)             uniform             -                     -                     -                     2, 5
Log(NBNSU)             uniform             -                     -                     -                     2, 5
Log(NANCBO)            uniform             -                     -                     3, 5                  3, 5
Log(NANCSU)            uniform             -                     -                     3, 5                  3, 5
Log(NANCPO)            uniform             3, 5                  3, 5                  3, 5                  3, 5
Log(TSPLIT)            uniform             4.2, 5                4.2, 5                4.2, 5                4.2, 5
Log(TMIGSTOP)          uniform             -                     2.5, 4.2              2.5, 4.2              2.5, 4.2
Log(TBNBO)             uniform             -                     -                     -                     2, 4.2
Log(TBNSU)             uniform             -                     -                     -                     2, 4.2
Log(mBO-SU)            uniform             -                     -5, -3                -5, -3                -5, -3
Log(mSU-BO)            uniform             -                     -5, -3                -5, -3                -5, -3
ALPHASTR-AUT           uniform             8, 15                 8, 15                 8, 15                 8, 15
ALPHASTR-Y             uniform             8, 15                 8, 15                 8, 15                 8, 15
Log(MUTRATESTR-AUT)    uniform             -5, -3                -5, -3                -5, -3                -5, -3
MUTRATESTR-Y           normal              2.0×10⁻³, 1.0×10⁻³    2.0×10⁻³, 1.0×10⁻³    2.0×10⁻³, 1.0×10⁻³    2.0×10⁻³, 1.0×10⁻³
MUTRATEChr2a           normal              6.02×10⁻⁸, 3.0×10⁻⁹   6.02×10⁻⁸, 3.0×10⁻⁹   6.02×10⁻⁸, 3.0×10⁻⁹   6.02×10⁻⁸, 3.0×10⁻⁹
MUTRATEChr9            normal              14.00×10⁻⁸, 5.6×10⁻⁹  14.00×10⁻⁸, 5.6×10⁻⁹  14.00×10⁻⁸, 5.6×10⁻⁹  14.00×10⁻⁸, 5.6×10⁻⁹
MUTRATEChr12           normal              3.77×10⁻⁸, 3.3×10⁻⁹   3.77×10⁻⁸, 3.3×10⁻⁹   3.77×10⁻⁸, 3.3×10⁻⁹   3.77×10⁻⁸, 3.3×10⁻⁹
MUTRATEChr19           normal              7.42×10⁻⁸, 6.0×10⁻⁹   7.42×10⁻⁸, 6.0×10⁻⁹   7.42×10⁻⁸, 6.0×10⁻⁹   7.42×10⁻⁸, 6.0×10⁻⁹
MUTRATEXq13.3          normal              10.15×10⁻⁸, 2.8×10⁻⁹  10.15×10⁻⁸, 2.8×10⁻⁹  10.15×10⁻⁸, 2.8×10⁻⁹  10.15×10⁻⁸, 2.8×10⁻⁹
MUTRATEMTDNA           normal              6.93×10⁻⁷, 3.4×10⁻⁸   6.93×10⁻⁷, 3.4×10⁻⁸   6.93×10⁻⁷, 3.4×10⁻⁸   6.93×10⁻⁷, 3.4×10⁻⁸
a, BO = Borneo, SU = Sumatra, PO = all Pongo, NNOW = current effective population size, NBN = effective population size during population bottleneck, NANC
= ancestral effective population size, TSPLIT = population split time, TMIGSTOP = time since migration between Borneo and Sumatra stopped, TBN = time since
population bottleneck, m = migration rate per individual per generation, ALPHA = shape parameter of gamma distribution of mutation rate, MUT = mean
mutation rate per locus/site per generation; b, isolation model with two populations; c, isolation-with-migration model with two populations; d,
isolation-with-migration model with two populations and exponential growth; e, isolation-with-migration model with two populations and bottleneck
followed by exponential growth.
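Drawing one parameter set from priors of this kind can be sketched as follows. The example uses a few priors shared by all 2-population models in Supporting Table S5 and assumes that Log denotes the base-10 logarithm, that uniform priors are given as (lower, upper) bounds and normal priors as (mean, SD); the dictionary and function names are hypothetical and this is not the authors' simulation code.

```python
import random

# Uniform priors on the (assumed) log10 scale: name -> (lower, upper).
UNIFORM_LOG10 = {
    "NNOWBO": (3, 5),
    "NNOWSU": (3, 5),
    "NANCPO": (3, 5),
    "TSPLIT": (4.2, 5),
}
# Normal priors on the natural scale: name -> (mean, SD).
NORMAL = {"MUTRATE_MTDNA": (6.93e-7, 3.4e-8)}

def draw_parameters(rng):
    """Draw one parameter set; log-scale draws are back-transformed."""
    draw = {name: 10 ** rng.uniform(lo, hi)
            for name, (lo, hi) in UNIFORM_LOG10.items()}
    for name, (mu, sd) in NORMAL.items():
        draw[name] = rng.gauss(mu, sd)
    return draw

params = draw_parameters(random.Random(1))
```

Sampling on the log scale gives equal prior weight to each order of magnitude, which is a common choice when plausible population sizes span several orders of magnitude.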
Supporting Table S6: Parameterisation and parameter prior distributions for all 10-population models
Parametera
Log(NNOWBO)
Log(NNOWNT)
Log(NNOWST)
Log(NBNBO)
Log(NTOBANT)
Log(NTOBAST)
Log(NSTRUCBO)
Log(NSTRUCNT)
Log(NANCBO)
Log(NANCNT)
Log(NANCST)
Log(TSPLITBO)
Log(TSPLITNT)
Log(TMIGSTOP)
Log(TBNDURBO)
Log(TTOBASU)
Log(TSTRUCBO)
Log(TSTRUCNT)
Log(TDECBO)
Log(TDECSU)
Log(mBO-ST)
Log(mST-BO)
Log(mNT-ST)
Log(mST-NT)
Log(mBO)
Log(mNT)
Prior
distribution
IM10b
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
uniform
3, 5
3, 5
3, 5
IM10BOc
NT
IM10DECSUd
IM10DECBOe
IM10DECALLf
IM10BNBODECSUg
3, 5
3, 5
3, 5
3, 5
2, 4
2, 4
2, 4
2, 4
2, 4
2, 4
2, 4
2, 4
3, 5
2,4
2,4
2,4
3, 5
3, 5
3, 5
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
3, 5
3, 5
3, 5
4.2, 5.2
4.2, 5.2
2.5, 4.2
3, 5
4.2, 5.2
4.2, 5.2
2.5, 4.2
3, 5
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
2.5, 4.2
3.5, 4.8
2.5, 4.2
3.5, 4.8
2.5, 4.2
3.5, 4.8
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
2.5, 4.2
3.5, 4.8
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
2.5, 4.2
3.5, 4.8
1, 3.5
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
IM10BNBOTOBADECSUh
3, 5
2, 4
2, 4
2, 4
2, 4
2, 4
IM10BNBORECOLDECSUi
3, 5
2, 4
2, 4
2, 4
1, 2
1, 2
2.5, 4.2
3.5, 4.8
3, 5
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
1, 3.6
3.4, 3.5
2.5, 4.2
3.5, 4.8
3, 5
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
1, 3.6
3.4, 3.5
2.5, 4.2
3.5, 4.8
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
1, 3.5
-5, -3
-5, -3
-5, -3
-5, -3
-4, -2
-4, -2
3, 5
3, 5
3, 5
3, 5
4.2, 4.8
4.8, 5.2
2.5, 4.2
1, 3.6
Parameter              Prior distribution  All eight 10-population models
ALPHASTR-AUT           uniform             8, 15
ALPHASTR-Y             uniform             8, 15
Log(MUTRATESTR-AUT)    uniform             -5, -3
MUTRATESTR-Y           normal              2.0×10⁻³, 1.0×10⁻³
MUTRATEChr2a           normal              6.02×10⁻⁸, 3.0×10⁻⁹
MUTRATEChr9            normal              14.00×10⁻⁸, 5.6×10⁻⁹
MUTRATEChr12           normal              3.77×10⁻⁸, 3.3×10⁻⁹
MUTRATEChr19           normal              7.42×10⁻⁸, 6.0×10⁻⁹
MUTRATEXq13.3          normal              10.15×10⁻⁸, 2.8×10⁻⁹
MUTRATEMTDNA           normal              6.93×10⁻⁷, 3.4×10⁻⁸
a, BO = Borneo, SU = Sumatra, NT = Sumatra north of Lake Toba, ST = Sumatra south of Lake Toba, NNOW = current effective population size, NBN =
effective population size during population bottleneck, NTOBA = effective population size during bottleneck associated with the Toba eruption, NSTRUC =
effective population size before recent decline, NANC = ancestral effective population size, TSPLIT = population split time, TMIGSTOP = time since migration
between Borneo and Sumatra stopped, TBNDUR = duration of population bottleneck, TTOBA = time of bottleneck associated with the Toba eruption, TSTRUC =
time since establishment of population structure, TDEC = time since population decline, m = migration rate per individual per generation, ALPHA = shape
parameter of gamma distribution of mutation rate, MUT = mean mutation rate per locus/site per generation; b, isolation-with-migration model with 10
populations, ST/NT split before ST/BO split; c, isolation-with-migration model with 10 populations, ST/BO split before ST/NT split; d,
isolation-with-migration model with 10 populations and recent population decline on Sumatra; e, isolation-with-migration model with 10 populations and recent population
decline on Borneo; f, isolation-with-migration model with 10 populations and recent population decline on Borneo and Sumatra; g, isolation-with-migration
model with 10 populations, bottleneck on Borneo and recent population decline on Sumatra; h, isolation-with-migration model with 10 populations, bottleneck
on Borneo and Sumatra, and recent population decline on Sumatra; i, isolation-with-migration model with 10 populations, bottleneck on Borneo,
recolonisation and recent population decline of populations on Sumatra.
Supporting Table S7: Summary statistics used for approximate Bayesian computation
Summary statistic  Data sets (number of statistics)[a]                                  Description
SX                 Autosomal (6), X-chrom. (3), mtDNA (3)                               Number of segregating sites per population
prSX               Autosomal (6), X-chrom. (3), mtDNA (3)                               Number of private segregating sites per population
Spop               mtDNA (2)                                                            Mean and standard deviation of the number of segregating sites over all populations
Stot               Autosomal (2), X-chrom. (1), mtDNA (1)                               Total number of segregating sites over all populations
DX                 Autosomal (6), X-chrom. (3), mtDNA (3)                               Tajima’s D (Tajima 1989), calculated for each population
Dpop               mtDNA (2)                                                            Mean and standard deviation of Tajima's D over all populations
FSX                Autosomal (6), X-chrom. (3), mtDNA (3)                               Fu's FS statistic (Fu 1997), calculated for each population
FSpop              mtDNA (2)                                                            Mean and standard deviation of Fu's FS over all populations
πX                 Autosomal (6), X-chrom. (3), mtDNA (3)                               Average number of pairwise sequence differences within each population
πpop               mtDNA (2)                                                            Mean and standard deviation of the average number of pairwise sequence differences within each population over all populations
ΦST_XY             Autosomal (6), X-chrom. (3), mtDNA (3)                               Differentiation index between all pairs of populations and over all populations, calculated as ΦST (Excoffier et al. 1992)
πXY                Autosomal (6), X-chrom. (3), mtDNA (3)                               Mean number of sequence differences between all pairs of populations
KX                 mtDNA (3)                                                            Number of haplotypes per population
Kpop               mtDNA (2)                                                            Mean and standard deviation of the number of haplotypes over populations
Ktot               mtDNA (1)                                                            Total number of haplotypes over all populations
HX                 mtDNA (3)                                                            Expected heterozygosity per population
Hpop               mtDNA (2)                                                            Mean and standard deviation of expected heterozygosity over populations
Htot               mtDNA (1)                                                            Total expected heterozygosity over all populations
KX                 Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation of the number of alleles over all loci per population
Kpop               Autosomal-STR (2), Y-STR (2)                                         Mean and standard deviation of the mean number of alleles (autosomal) or haplotypes (Y-chromosomal) over populations
Ktot               Autosomal-STR (1), Y-STR (1)                                         Mean over all loci of the total number of alleles in all populations (autosomal) or total number of haplotypes in all populations (Y-chromosomal)
HX                 Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation of the observed heterozygosity over all loci per population
Hpop               Autosomal-STR (2), Y-STR (2)                                         Mean and standard deviation of the mean observed heterozygosity over populations
Htot               Autosomal-STR (1), Y-STR (1)                                         Mean over all loci of the total observed heterozygosity in all populations
GWX                Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation of the Garza-Williamson index (Garza & Williamson 2001) over all loci per population (GWX = KX/(RX+1))
GWpop              Autosomal-STR (2), Y-STR (2)                                         Mean and standard deviation of the mean Garza-Williamson index over populations
GWtot              Autosomal-STR (1), Y-STR (1)                                         Mean Garza-Williamson index over all loci over all populations
NGWX               Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation of the modified Garza-Williamson index (Garza & Williamson 2001) over all loci per population (NGWX = KX/(Rtot+1))
NGWpop             Autosomal-STR (2), Y-STR (2)                                         Mean and standard deviation of the mean modified Garza-Williamson index over populations
RX                 Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation of the allelic size range over all loci per population
Rpop               Autosomal-STR (2), Y-STR (2)                                         Mean and standard deviation of the mean allelic size range over populations
Rtot               Autosomal-STR (1), Y-STR (1)                                         Mean over all loci of the total allelic size range in all populations
FIS, FIT, FST      Autosomal-STR (6), Y-STR (2), mtDNA (2)                              Mean over all loci of the global F-statistics, separately calculated for the Bornean and Sumatran meta-populations
FST                Autosomal (2), X-chrom. (1), Autosomal-STR (1), Y-STR (1), mtDNA (1)  Global differentiation index
FST_XY             Autosomal-STR (3), Y-STR (3)                                         Differentiation index between all pairs of populations, calculated as θW (Weir & Cockerham 1984)
πXY                Autosomal-STR (3), Y-STR (3)                                         Mean number of allelic differences between all pairs of populations
(δμ)²XY            Autosomal-STR (3), Y-STR (3)                                         Squared difference of mean within-population repeat size between all pairs of populations (Goldstein et al. 1995)
VarX               Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation over loci of the allele size variance
ln(β)X             Autosomal-STR (6), Y-STR (6)                                         Mean and standard deviation over loci of the natural logarithm of the imbalance index β (Kimmel et al. 1998)
a, The number of statistics refers to the unpooled calculations in the ten-population setting, whereby due to sample size restrictions the summary statistics, if
not otherwise indicated, are calculated over all the samples from the populations north of Lake Toba, south of Lake Toba, and Borneo, respectively. For
autosomal sequence data, each summary statistic is represented as mean and standard deviation over all four loci.
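The two Garza-Williamson variants defined in Supporting Table S7, GWX = KX/(RX+1) and NGWX = KX/(Rtot+1), can be illustrated for a single locus as below. The function name, its arguments and the toy allele sizes are hypothetical; allele sizes are assumed to be expressed in repeat units.

```python
def garza_williamson(alleles, total_range=None):
    """Garza-Williamson index for one locus in one population.

    alleles: observed allele sizes in repeat units.
    total_range: if given, the allelic size range over all populations
    (Rtot) is used instead of the within-population range, which yields
    the modified index NGWX.
    """
    k = len(set(alleles))  # number of distinct alleles, KX
    r = max(alleles) - min(alleles) if total_range is None else total_range
    return k / (r + 1)

toy_alleles = [10, 11, 11, 13, 15]
gw = garza_williamson(toy_alleles)                  # KX = 4, RX = 5
ngw = garza_williamson(toy_alleles, total_range=9)  # KX = 4, Rtot = 9
```

Values well below 1 indicate gaps in the allele size distribution, which is why the index is commonly used as a bottleneck signal.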
Supporting Table S8: Model fits of all tested demographic models
Model[a]                           Log10 MD[b]  p-value[c]
I2 (pooled data)                   -24.51       0.001
IM2 (pooled data)                  -22.88       0.003
IM2-GR (pooled data)               -23.42       0.001
IM2-BN-GR (pooled data)            -19.17       0.017
IM10 (pooled data)                 -16.71       0.224
IM10 (full data)                   -20.65       0.019
IM10BO-NT (full data)              -21.52       0.011
IM10-DECSU (full data)             -15.41       0.553
IM10-DECBO (full data)             -17.90       0.060
IM10-DECALL (full data)            -15.43       0.627
IM10-BNBO-DECSU (full data)*       -15.74       0.696
IM10-BNBO-TOBA-DECSU (full data)   -15.67       0.661
IM10-BNBO-RECOL-DECSU (full data)  -16.69       0.521
a, Values for the 2-population models correspond to a smaller set of pooled summary statistics as
compared to the 10-population models and are therefore not directly comparable. For comparison, the
simplest 10-population model is shown with values for both the pooled and the full set of summary
statistics; b, marginal density of the observed data under the inferred GLM; c, p-value of the observed data
under the inferred GLM.
* selected model
Supporting Table S9: Accuracy of different point estimators in parameter estimation
Parameter         RMSE (mode)  RMSE (mean)  RMSE (median)
Log(NNOWBO)       0.095        0.098        0.096
Log(NNOWNT)       0.133        0.140        0.137
Log(NNOWST)       0.188        0.208        0.200
Log(NBNBO)        0.301        0.362        0.338
Log(NANCBO)       0.318        0.398        0.371
Log(NSTRUCNT)     0.293        0.354        0.329
Log(NANCNT)       0.284        0.366        0.340
Log(NANCST)       0.263        0.317        0.295
TBNENDBO          1,660        1,884        1,799
TBNDURBO          633          857          802
TSPLITBO          6,971        9,439        8,784
TDECSU            445          575          529
TSTRUCNT          6,342        7,181        6,838
TSPLITNT          10,528       11,667       11,139
TMIGSTOP          2,531        3,040        2,842
Log(mBO-ST)       0.327        0.421        0.392
Log(mST-BO)       0.327        0.413        0.385
Log(mNT-ST)       0.271        0.313        0.294
Log(mST-NT)       0.285        0.359        0.332
Log(mBO)          0.280        0.362        0.337
Log(mNT)          0.290        0.348        0.326
ALPHASTR-AUT      1.090        1.478        1.377
ALPHASTR-Y        1.123        1.548        1.449
Log(MUTRATESTR)   0.095        0.093        0.092
The accuracy is measured as the root mean squared error over 1,000 pseudo-observed data sets.
Supporting Table S10: Accuracy of parameter estimation under different tolerance levels
Tolerance         0.01%   0.05%   0.10%   0.20%   0.50%   1.00%
Log(NNOWBO)       0.130   0.131   0.131   0.132   0.133   0.134
Log(NNOWNT)       0.184   0.187   0.188   0.189   0.190   0.192
Log(NNOWST)       0.263   0.267   0.269   0.270   0.271   0.272
Log(NBNBO)        0.439   0.455   0.460   0.464   0.468   0.465
Log(NANCBO)       0.471   0.490   0.496   0.500   0.505   0.507
Log(NSTRUCNT)     0.423   0.440   0.445   0.449   0.454   0.461
Log(NANCNT)       0.438   0.454   0.459   0.464   0.468   0.475
Log(NANCST)       0.391   0.405   0.410   0.415   0.420   0.423
TBNENDBO          27      27      28      28      28      28
TBNDURBO          22      23      23      23      23      23
TSPLITBO          73      76      77      77      78      78
TDECSU            18      18      19      19      19      19
TSTRUCNT          53      54      55      55      56      56
TSPLITNT          67      69      69      69      70      68
TMIGSTOP          40      42      43      43      43      43
Log(mBO-ST)       0.489   0.508   0.514   0.518   0.521   0.526
Log(mST-BO)       0.478   0.497   0.503   0.507   0.511   0.515
Log(mNT-ST)       0.390   0.403   0.407   0.410   0.414   0.414
Log(mST-NT)       0.427   0.443   0.447   0.452   0.456   0.461
Log(mBO)          0.425   0.440   0.445   0.449   0.453   0.459
Log(mNT)          0.413   0.428   0.433   0.436   0.440   0.443
ALPHASTR-AUT      0.924   0.957   0.967   0.974   0.981   0.985
ALPHASTR-Y        0.941   0.976   0.985   0.992   0.998   0.987
Log(MUTRATESTR)   0.122   0.121   0.121   0.121   0.121   0.124
The accuracy is calculated by taking the average of the root mean integrated squared error (RMISE)
(Wegmann et al. 2009) over 1,000 pseudo-observed data sets for each of six different tolerance levels
(proportion of retained simulations). For the parameter estimation, we used the closest 10,000 simulations
from the likelihood-free MCMC run over 10⁷ simulations with an MCMC tolerance level of 0.1, which is
equal to a tolerance level of ~0.01% in a standard rejection sampling approach (Wegmann et al. 2009).
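To make the notion of a tolerance level concrete, the sketch below implements plain rejection sampling, in which the tolerance is the proportion of simulations retained. It is not the likelihood-free MCMC with GLM adjustment used in the study; the toy model, prior, function names and all numerical settings are assumptions chosen for illustration.

```python
import random
from math import sqrt

def rejection_abc(observed, simulate, draw_prior, n_sims, tolerance, rng):
    """Keep the fraction `tolerance` of simulations whose summary
    statistics are closest (Euclidean distance) to the observed ones."""
    sims = []
    for _ in range(n_sims):
        theta = draw_prior(rng)
        stats = simulate(theta, rng)
        dist = sqrt(sum((s - o) ** 2 for s, o in zip(stats, observed)))
        sims.append((dist, theta))
    sims.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(round(n_sims * tolerance)))
    return [theta for _, theta in sims[:n_keep]]

# Toy problem: one parameter, one noisy summary statistic.
def toy_prior(rng):
    return rng.uniform(0.0, 10.0)

def toy_simulate(theta, rng):
    return [theta + rng.gauss(0.0, 0.5)]

retained = rejection_abc([4.0], toy_simulate, toy_prior,
                         n_sims=10_000, tolerance=0.01,
                         rng=random.Random(42))
posterior_mean = sum(retained) / len(retained)
```

With 10,000 simulations and a tolerance of 1%, 100 parameter values are retained as an approximate posterior sample. Smaller tolerances retain simulations closer to the observed data at the cost of fewer retained points.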
Supporting Figure S1: Pr(Data|K) and deltaK statistics for all STRUCTURE runs. The population structure analysis incorporated multiple levels of
hierarchical structure, starting with all samples and subsequently reducing the data set to only samples assigned to the same cluster in the previous analysis.
Supporting Figure S2: Structure plot for 25 microsatellite markers used for the demographic modelling.
The three rows of plots correspond to the three levels of hierarchical structure we identified in the
complete data set. The geographical location of the ten sampling regions is shown in Figure 1 of the main
text.
Supporting Figure S3: Gene trees based on sequence data of six different loci. The tips of black
branches refer to Sumatran samples, light grey to Bornean samples, and dark grey to the human and
chimpanzee outgroup.
Supporting Figure S4: Cross validation of the parameter estimation. We drew 1,000 random parameter
sets from the prior distributions of the model parameters and generated pseudo-observed data sets by
simulating summary statistics under the selected model (IM10-BNBO-DECSU). We then performed the
standard parameter estimation procedure with each dataset. The histograms represent the number of times
the known parameter values fall into each 10%-quantile of the estimated posterior distribution. For
unbiased parameter estimates, the expectation is a uniform distribution over the entire prior space. A
concentration of data points at the borders indicates posterior estimates that are too narrow, while a concentration
of data points at the centre points toward posterior estimates that are too conservative. The p-value of the
Kolmogorov-Smirnov test is given above each histogram.
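The cross-validation logic in this caption can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figure: the conjugate normal toy model, the function names, and the `posterior_sd_scale` knob (used to mimic posteriors that are too narrow) are hypothetical, and the p-value comes from the asymptotic Kolmogorov distribution rather than from any particular statistics package.

```python
import math
import random


def posterior_quantile(posterior_sample, true_value):
    """Fraction of posterior draws that lie below the known parameter value."""
    below = sum(1 for theta in posterior_sample if theta < true_value)
    return below / len(posterior_sample)


def decile_counts(quantiles):
    """Count how many quantiles fall into each of the ten 10% bins."""
    counts = [0] * 10
    for q in quantiles:
        counts[min(int(q * 10), 9)] += 1
    return counts


def ks_uniform(quantiles):
    """One-sample KS statistic against U(0, 1) with an asymptotic p-value."""
    xs = sorted(quantiles)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges slowly here; p is 1 for practical purposes
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


def simulate_quantiles(n_sets, posterior_sd_scale, rng):
    """Toy model: theta ~ N(0, 1), y ~ N(theta, 1), exact posterior N(y/2, 1/2)."""
    quantiles = []
    for _ in range(n_sets):
        true_theta = rng.gauss(0.0, 1.0)
        y = true_theta + rng.gauss(0.0, 1.0)
        post_sd = math.sqrt(0.5) * posterior_sd_scale  # scale < 1 mimics overconfidence
        draws = [rng.gauss(y / 2.0, post_sd) for _ in range(200)]
        quantiles.append(posterior_quantile(draws, true_theta))
    return quantiles


if __name__ == "__main__":
    rng = random.Random(7)
    for scale in (1.0, 0.3):
        q = simulate_quantiles(1000, scale, rng)
        print(scale, decile_counts(q), ks_uniform(q))
```

With `posterior_sd_scale` below 1 the known values pile up in the outermost bins and the test rejects uniformity, which corresponds to the "too narrow" pattern described in the caption.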
Supporting Figure S5: First 16 principal components of the posterior predictive distribution for the selected model (IM10-BNBO-DECSU).
References
Arora N, Nater A, van Schaik CP, et al. (2010) Effects of Pleistocene glaciations and rivers
on the population structure of Bornean orangutans (Pongo pygmaeus). Proceedings of the
National Academy of Sciences 107, 21376-21381.
Chikhi L, Sousa VC, Luisi P, Goossens B, Beaumont MA (2010) The confounding effects of
population structure, genetic diversity and the sampling scheme on the detection and
quantification of population size changes. Genetics 186, 983-U347.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating
with confidence. PLoS Biology 4, 699-710.
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling
trees. BMC Evolutionary Biology 7, -.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using
the software STRUCTURE: a simulation study. Molecular Ecology 14, 2611-2620.
Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from
metric distances among DNA haplotypes - application to human mitochondrial DNA
restriction data. Genetics 131, 479-491.
Fischer A, Pollack J, Thalmann O, Nickel B, Pääbo S (2006) Demographic history and
genetic differentiation in apes. Current Biology 16, 1133-1138.
Fu YX (1997) Statistical tests of neutrality of mutations against population growth,
hitchhiking and background selection. Genetics 147, 915-925.
Garza JC, Williamson EG (2001) Detection of reduction in population size using data from
microsatellite loci. Molecular Ecology 10, 305-318.
Gernhard T (2008) The conditioned reconstructed process. Journal of Theoretical Biology 253, 769-778.
Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW (1995) Genetic absolute dating
based on microsatellites and the origin of modern humans. Proceedings of the National
Academy of Sciences of the United States of America 92, 6723-6727.
Greminger MP, Stölting KN, Nater A, et al. (2014) Generation of SNP datasets for orangutan
population genomics using improved reduced-representation sequencing and direct
comparisons of SNP calling algorithms. BMC Genomics 15.
Hasegawa M, Kishino H, Yano TA (1985) Dating of the human ape splitting by a molecular
clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160-174.
Kimmel M, Chakraborty R, King JP, et al. (1998) Signatures of population expansion in
microsatellite repeat data. Genetics 148, 1921-1930.
Nater A, Nietlisbach P, Arora N, et al. (2011) Sex-biased dispersal and volcanic activities
shaped phylogeographic patterns of extant orangutans (genus: Pongo). Molecular Biology
and Evolution 28, 2275-2288.
Nater A, Arora N, Greminger MP, et al. (2013) Marked population structure and recent
migration in the critically endangered Sumatran orangutan (Pongo abelii). Journal of
Heredity 104, 2-13.
Nietlisbach P, Arora N, Nater A, et al. (2012) Heavily male-biased long-distance dispersal of
orang-utans (genus: Pongo), as revealed by Y-chromosomal and mitochondrial genetic
markers. Molecular Ecology 21, 3173-3186.
Peter BM, Wegmann D, Excoffier L (2010) Distinguishing between population bottleneck
and population subdivision by a Bayesian model choice procedure. Molecular Ecology 19,
4648-4660.
Posada D (2008) jModelTest: phylogenetic model averaging. Molecular Biology and
Evolution 25, 1253-1256.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using
multilocus genotype data. Genetics 155, 945-959.
Stadler T, Haubold B, Merino C, Stephan W, Pfaffelhuber P (2009) The Impact of Sampling
Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations.
Genetics 182, 205-216.
Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype
reconstruction from population data. American Journal of Human Genetics 68, 978-989.
Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics 123, 585-595.
Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control
region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and
Evolution 10, 512-526.
Wegmann D, Leuenberger C, Excoffier L (2009) Efficient Approximate Bayesian
computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182,
1207-1218.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population
structure. Evolution 38, 1358-1370.
Yang ZH, Rannala B (2006) Bayesian estimation of species divergence times under a
molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology
and Evolution 23, 212-226.