mec12576-sup-0001-FigS1-S8-TableS1-S7

Pleistocene Chinese cave hyenas and the recent Eurasian history of
the spotted hyena, Crocuta crocuta
Supplementary information
Materials and Methods
Samples
The six Lingxian Cave samples were collected directly from the excavation site in November 2008.
LXD-7 (CADG12) and LXD-9 (CADG14) were partial limb bones; LXD-8 (CADG13) and LXD-11
(CADG16) were partial rib bones; LXD-10 (CADG15) and LXD-12 (CADG19) were canine teeth.
These samples were associated with a faunal assemblage typical of the “Crocuta-Cervus Fauna” found
in Northern China throughout the Late Pleistocene. This included species such as deer (Cervus
canadensis), wolf (Canis lupus), fox (Vulpes corsac), and rhinoceros (Dicerorhinus mercki).
The Tonghe Bridge tooth (HS-29; CADG20) was collected directly from the excavation site in 2003.
This sample was found together with bison (Bison sp.), horse (Equus sp.), mammoth (Mammuthus sp.)
and woolly rhino (Coelodonta sp.), all of which belong to the “Coelodonta-Mammuthus Fauna” in
Northeastern China (Chow 1959; Zhang 2009). The CADG samples are kept at the State Key
Laboratory of Biogeology and Environmental Geology, China University of Geosciences.
The three Da’an Cave teeth are accessioned at the Research Center for Chinese Frontier Archaeology,
Jilin University (DARD03:0337, DARD03:0428 and DARD03:0360-2; see Fig. S6). These samples
were associated with bear (Ursus sp.), deer (Cervus (Pseudaxis) grayi Zdansky), horse (Elaphus sp.),
donkey (Equus spp.), badger (Meles sp.), rhino (Dicerorhinus), and antelope (Gazella sp.).
DARD03:0337 was sent for AMS dating at the Quaternary Geology & Archaeological Chronology
Laboratory at Peking University (lab number BA121709).
Alignment and Phylogenetic Analyses
BEAST analysis. Additional analyses using an uncorrelated lognormal relaxed clock (Drummond et al.
2006) or a random local clock (Drummond & Suchard 2010) were run to account for potential rate
variation. Neither rejected the strict clock assumption, and both gave similar results, with no
significant differences between divergence dates as judged by the 95% HPD intervals. A Bayesian skyline
analysis was performed to account for potential demographic variation, but in a Bayes factor
comparison (Suchard et al. 2001) this model showed no significant improvement over the
assumption of a constant population size.
Date randomisation test. All dates associated with the sequences were randomized, and the
phylogenetic analysis in BEAST was then repeated as described above. If the structure and spread of the
ancient sequences in the tree contain enough temporal information to calibrate the analysis, the mean
rate inferred using the correct date-sequence associations should lie outside the 95% HPD of the
rate estimates obtained from the randomized data sets. Rate estimates from ten randomized
replicates are compared with those from the non-randomized data in Fig. S4.
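The logic of the test can be illustrated with a minimal Python sketch. It assumes that the randomization shuffles the dates among the sequences (the usual form of this test) and that each BEAST replicate has already been summarized as a 95% HPD interval for the rate; the function names are hypothetical, and running BEAST itself is outside the sketch.

```python
import random

def randomize_tip_dates(tip_dates, seed=None):
    """Return a copy of {sequence_id: date} with the dates shuffled among the sequences."""
    rng = random.Random(seed)
    ids = list(tip_dates)
    dates = [tip_dates[i] for i in ids]
    rng.shuffle(dates)
    return dict(zip(ids, dates))

def has_temporal_signal(original_mean_rate, randomized_hpds):
    """True if the original mean rate lies outside every randomized-date 95% HPD interval."""
    return all(not (low <= original_mean_rate <= high) for low, high in randomized_hpds)

# Radiocarbon ages (years BP) taken from Table S2 for three of the dated sequences.
dates = {"DARD-1": 35_520, "C.crocuta_Austria1": 38_060, "C.crocuta_France": 40_700}
replicates = [randomize_tip_dates(dates, seed=k) for k in range(10)]  # inputs for ten BEAST reruns
```

`has_temporal_signal` would then be applied to the mean rate from the original run and to the ten HPD intervals read from the replicate BEAST logs.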
Methods:
Sequence replication
DNA was extracted from the tooth samples with a Qiagen Blood & Tissue Kit (Valencia, California) following a
modified protocol with added EDTA and Proteinase K (Thomson et al., in preparation). PCR
amplifications were carried out in 25 μL volumes containing 1× PCR buffer, 2.5 mM MgSO4, 1
mg/mL rabbit serum albumin (RSA), 0.2 μM of each primer, 0.25 mM dNTPs, 1 U Platinum
HiFi Taq (Invitrogen) and 2 μL of ancient DNA extract. Cycling conditions were: 94 ºC for 2 min; 50
cycles of 94 ºC for 15 s, 54 ºC for 30 s and 68 ºC for 20 s; and a final 68 ºC for 10 min. PCR products were
visualized under UV light on a 3.5% agarose gel stained with ethidium bromide. Successful
amplifications were purified with Ampure (Agencourt) according to the manufacturer’s instructions, and
both strands were sequenced directly using BigDye chemistry on an ABI 3130XL Genetic Analyzer
(Applied Biosystems).
Bayes Factor calculation
The Bayes factor of the two divergence models given the fossil dates (earliest appearance of C. crocuta
in China and the estimated time of arrival of the cave hyena in western Eurasia) was calculated empirically
from the BEAST estimates of the two basal nodes using the following formula:

$$BF_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)} = \frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1 \mid T_{11}, T_{21})}{P(M_2 \mid T_{12}, T_{22})}$$

where BF12 is the Bayes factor of model 1 compared with model 2, M1 is the tip-calibrated model, M2 is
the fossil-calibrated model, D is the paleontological data, Tij is the estimate of paleontological date
i under model j, and equal prior probabilities are assumed for M1 and M2.
Each of the numerator and denominator (i.e. the probability of each model given the data) was
calculated empirically from the area under the curve of the posterior distribution of each basal node,
such that:

$$P(M_j \mid D) = \frac{\text{area}_1 + \text{area}_2}{\text{total area}}$$

where area 1 is the area under the curve of the posterior distribution of basal node 1 (older node) less
than paleontological date 1 (230,000-400,000 years BP), area 2 is the area under the curve of the
posterior distribution of basal node 2 (younger node) less than paleontological date 2 (<300,000 years
BP), and the total area is the sum of the total areas under each posterior distribution.
These calculations were performed in R (R development core team 2011) using the approxfun function
to describe the posterior distribution of the two basal nodes from the BEAST analyses, and the
integrate function to estimate the area under the curves.
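A Python analogue of the approxfun/integrate calculation might look like the sketch below. It is an illustration only, not the R code used here: each posterior is assumed to be supplied as (age, density) points, for example from a histogram or kernel density of the BEAST node-age samples; the interpolation is set to zero outside the sampled range; and a fixed-grid trapezoidal rule stands in for R's adaptive `integrate`. How each paleontological date is turned into an integration interval is left to the caller, and all function names are hypothetical.

```python
from bisect import bisect_left

def approx_density(xs, ys):
    """Piecewise-linear curve through (xs, ys), like R's approxfun() but 0 (not NA) outside the range.

    xs must be strictly increasing (node ages), ys the posterior density at those ages.
    """
    def f(x):
        if x < xs[0] or x > xs[-1]:
            return 0.0
        i = bisect_left(xs, x)
        if xs[i] == x:
            return ys[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return f

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule over [a, b]; returns 0 for an empty interval."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def model_probability(nodes, intervals):
    """Summed area of each node density inside its fossil-date interval, over the summed total area.

    nodes: list of (xs, ys) density curves, one per basal node.
    intervals: list of (lower, upper) bounds on the corresponding paleontological date.
    """
    inside = total = 0.0
    for (xs, ys), (lower, upper) in zip(nodes, intervals):
        f = approx_density(xs, ys)
        total += trapezoid(f, xs[0], xs[-1])
        inside += trapezoid(f, max(lower, xs[0]), min(upper, xs[-1]))
    return inside / total

def bayes_factor(nodes_m1, nodes_m2, intervals):
    """BF12 = P(M1 | D) / P(M2 | D), with equal prior probabilities for M1 and M2."""
    return model_probability(nodes_m1, intervals) / model_probability(nodes_m2, intervals)
```

With toy triangular densities, a model whose node densities lie entirely inside the intervals scores twice as high as one whose densities lie only half inside, giving a Bayes factor of 2.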
Hypothesis testing.
Hypothesis testing of each model against the paleontological fossil dates
was also used to evaluate the statistical support for each model separately. In each case, the null
hypothesis was that the paleontological fossil dates stem from the same population history as the
posterior distribution of dates for each node (calculated in BEAST using the appropriate calibration
model). The alternative hypothesis was that the paleontological fossil dates do not stem from the same
population history as the posterior distribution of dates for each node. The posterior distributions for
the tip calibration model were transformed using the natural logarithm as both nodes were positively
skewed (i.e. not normally distributed). Each posterior distribution was then scaled to a mean of zero
and standard deviation of 1; the z-score was calculated to estimate the number of standard deviations
the paleontological fossil dates were away from the mean of the posterior distribution; and the z-score
was translated into a p-value (Table S4) using R (R development core team 2011).
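A minimal Python version of this test is sketched below. The text does not state whether the p-value was one- or two-sided, so two-sided is assumed as the default. A fossil date given as a range must first be reduced to a single value chosen by the analyst. `fossil_date_p_value` is a hypothetical helper, not the R code behind Table S4.

```python
import math
from statistics import NormalDist, mean, stdev

def fossil_date_p_value(node_ages, fossil_date, log_transform=False, two_sided=True):
    """Z-test of a fossil date against posterior samples of a node age.

    With log_transform=True both the samples and the fossil date are log-transformed first,
    as was done for the positively skewed tip-calibrated posteriors.
    """
    values = [math.log(a) for a in node_ages] if log_transform else list(node_ages)
    x = math.log(fossil_date) if log_transform else fossil_date
    z = (x - mean(values)) / stdev(values)  # standard deviations from the posterior mean
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail if two_sided else upper_tail
```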
Serial Coalescent Simulations.
Bayesian Serial Simcoal (BayeSSC; Anderson et al. 2004) was
used to simulate datasets under two divergence models, one proposed by Rohland et al. (2005) using
fossil calibrations and the other proposed in this study using tip-dated ancient DNA samples, as well as under a panmictic model (H0; Table S5). Four
different mutation rate estimates were used in the simulations: two constant mutation rates calculated
from BEAST analyses (using the fossil calibrations of Rohland et al. (2005) and the tip-date
calibrations of this study), and two time-dependent mutation rates (using linear and exponential decay
equations). Summary statistics from the twelve simulated datasets (three models by four mutation
rates) were compared with those of the genetic dataset generated in this study, using Akaike Information
Criterion (AIC) values to compare the likelihood of each model (Fig. S7 and Table S5).
Summary Statistics.
For each simulated dataset we calculated 30 summary statistics, namely two
measures of population differentiation between each pair of clades (FST and private alleles between
each of clades A1, A2, B, C and D). All of the observed summary statistics from the dataset generated
in the present study were calculated using the SCStat.exe program (Anderson, pers. comm.) to ensure
comparable methods of generating FST; BayeSSC calculates FST using the formula of Hudson et al.
(1992), whereas Arlequin uses the standard formula of Wright (1951).
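For two clades, the estimator of Hudson et al. (1992) can be written as FST = 1 - Hw/Hb. Here Hw is the mean number of pairwise differences between sequences from the same clade, and Hb is the mean between clades. The Python sketch below computes it from aligned sequences. It averages Hw over the two clades with equal weight and ignores gaps and missing data. It is not the SCStat.exe implementation, and the function names are hypothetical.

```python
from itertools import combinations, product

def n_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def mean_within(clade):
    """Mean pairwise differences among sequences of one clade (needs at least two sequences)."""
    pairs = list(combinations(clade, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)

def mean_between(clade1, clade2):
    """Mean pairwise differences between sequences drawn from two different clades."""
    return sum(n_differences(a, b) for a, b in product(clade1, clade2)) / (len(clade1) * len(clade2))

def hudson_fst(clade1, clade2):
    """FST = 1 - Hw / Hb, with Hw averaged over the two clades with equal weight."""
    h_w = (mean_within(clade1) + mean_within(clade2)) / 2
    return 1.0 - h_w / mean_between(clade1, clade2)
```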
Simulation parameters. BayeSSC was used to generate 10,000,000 simulations under each of the
eight models with divergence events, with 500,000 simulations generated for each of the four
panmictic models. The panmictic models differed from the other eight in that no divergence events
were modelled, while the divergence models differed in terms of the prior distributions on divergence
order/times between clades. The Rohland et al. (2005) model proposed that the node containing all
clade A1 (Eurasian Pleistocene and modern samples) samples within clade A occurred 0.36 million
years ago (Ma), clades A-C diverged 1.3 – 1.5 Ma, and all other clades diverged from clade D 3.48 Ma
BP (2.25 – 5.09 Ma BP). Using the hyena generation time of 5.7 years (Watts et al. 2011), each of
these events was converted from years to generations and given a normal distribution as prior. The
Rohland et al. (2005) divergence model was given a divergence prior for clades A1/A2 of 63,158 ±
8,500 generations (mean ± standard deviation), a divergence prior for clades A/B and A/C of 245,614 ±
45,000 and 245,615 ± 45,000 generations respectively, and a divergence prior for all other clades from
clade D of 614,035 ± 95,000 generations ago. In contrast, the BEAST analysis using the tip calibration
dates in the present study estimates that clades A1/A2 diverged 15,614 ± 1,200 generations ago, clades
B/C diverged 28,596 ± 2,000 generations ago, clades A/B/C diverged 39,298 ± 2,000 generations ago,
and clades A/B/C/D diverged 75,439 ± 10,000 generations ago. Each of the prior standard deviations
was narrowed so that no divergence event overlapped with any other, ensuring that the lineages
coalesced. Parameters common to all divergence models include modern effective population
size (1000 per clade), growth rate (constant population size), and migration (no migration). Parameters
common to all panmictic models include modern effective population size (5000), while the growth
rate ranged from an ancient effective population size of 1000 to a modern effective population size of
5000 over a similar timescale to the divergence of the basal node for each mutation rate/model (2 Ma
for internal calibrated model; 2.5 Ma for external calibrated model; 2 Ma for time dependent linear
decay; and 6 Ma for time dependent exponential decay; see Fig. S7).
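The conversion of the prior means to generations, and a draw from the resulting normal priors, can be sketched as follows. The 5.7-year generation time and the tip-calibrated priors are taken from the text. The order check in `draw_divergence_times` is a hypothetical illustration of why the prior standard deviations were narrowed, not a BayeSSC feature.

```python
import random

GENERATION_TIME_YEARS = 5.7  # Watts et al. (2011)

def years_to_generations(years, generation_time=GENERATION_TIME_YEARS):
    return years / generation_time

# Tip-calibrated priors (mean, sd) in generations, youngest to oldest:
# A1/A2, B/C, A/B/C and A/B/C/D.
TIP_PRIORS = [(15_614, 1_200), (28_596, 2_000), (39_298, 2_000), (75_439, 10_000)]

def draw_divergence_times(priors, rng):
    """Draw one time per divergence event; return None if the youngest-to-oldest order is broken."""
    times = [rng.gauss(mu, sd) for mu, sd in priors]
    in_order = all(t_young < t_old for t_young, t_old in zip(times, times[1:]))
    return times if in_order else None

print(round(years_to_generations(360_000)))    # Rohland et al. A1 node: 63158 generations
print(round(years_to_generations(1_400_000)))  # midpoint of 1.3-1.5 Ma: 245614 generations
```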
Mutation Parameters.
The simulations were based on the current dataset of 366 bp of
mitochondrial DNA sequence data, with a transition/transversion ratio of 0.956522. A gamma shape
parameter of 483 with 4 rate categories was used for the simulations using the constant mutation rate
from tip dates, as well as for all the time-dependent mutation rate classes (as calculated from the tip-dated
BEAST analysis). The simulations for the fossil-calibrated constant mutation rate used a gamma shape
parameter of 325 with 4 rate categories (as calculated from the fossil-calibrated BEAST analysis). The
constant mutation rates were converted from those estimated in the BEAST analysis (in
substitutions/site/year) to the units used in BayeSSC (substitutions/sequence/generation)
using the sequence length of 366 base pairs and the generation time of 5.7 years (Watts et al. 2011).
Equations were written for each of the time-dependent mutation rates using the constant mutation rates
above as point estimates (Table S7 and Fig. S8).
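As a check on the unit conversion, a per-site, per-year rate can be multiplied by the 366 bp sequence length and the 5.7-year generation time. Doing so for the per-site rates quoted in the Fig. S8 caption reproduces the per-sequence, per-generation rates listed in Table S7. A minimal sketch, with a hypothetical function name:

```python
SEQUENCE_LENGTH_BP = 366
GENERATION_TIME_YEARS = 5.7  # Watts et al. (2011)

def to_bayessc_units(rate_per_site_per_year,
                     length_bp=SEQUENCE_LENGTH_BP,
                     generation_time=GENERATION_TIME_YEARS):
    """Convert substitutions/site/year to substitutions/sequence/generation."""
    return rate_per_site_per_year * length_bp * generation_time

tip_rate = to_bayessc_units(8.8e-8)      # about 1.836e-4 (tip-dated BEAST rate)
fossil_rate = to_bayessc_units(9.34e-9)  # about 0.195e-4 (fossil-calibrated BEAST rate)
```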
Approximate Bayesian Computation.
For all the observed measures of population
differentiation, we retained the closest 0.1% of the simulations using the reject function (available at
http://www.stanford.edu/group/hadlylab/ssc/eval.r), and further estimated the maximum likelihood
estimator (MLE) for each parameter for the divergence models. The MLEs were input into BayeSSC as
priors for a second round of 10,000 simulations in order to generate AIC values using the aic.ssc
function (also available at http://www.stanford.edu/group/hadlylab/ssc/eval.r). The panmictic models
had the closest 0.1% of the simulations retained using the reject function, with AIC values generated
directly from the 500,000 simulations using the aic.ssc function, as the parameters did not have a prior
distribution to be estimated (i.e. the parameters were either constant (growth rate) or time dependent
(mutation rate)). The aic.ssc function was chosen to evaluate the different analyses because AIC values allow
models to be compared while accounting for their differing numbers of parameters (i.e., the time-dependent mutation rate class of models had one additional prior over the constant rate class of
models). The AIC values were converted to AICc (second-order AIC values, which are more
appropriate for small sample sizes), and delta AIC values and Akaike weights were calculated to provide
measures of the strength of evidence for each model. The best or preferred model is the one with the lowest
AIC value. Delta AIC values were calculated between the best model and each of the other models as a
measure of the strength of support for the best model: delta AIC values <2 indicate substantial support for
the other model, values of 3-7 indicate that the other model has considerably less support, and
values >10 indicate that the other model is very unlikely. Akaike weights provide an additional
measure of strength for each model and represent the relative support that each model has among all
candidate models tested.
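A generic version of the rejection step is sketched below in Python. It is an assumption, not a transcription of the `reject` function in eval.r. Statistics are scaled by their standard deviation across simulations and ranked by Euclidean distance to the observed values, and the indices of the closest fraction (0.1% by default) are returned.

```python
import math
from statistics import pstdev

def reject(simulated, observed, keep_fraction=0.001):
    """Indices of the simulations whose summary statistics lie closest to the observed ones.

    simulated: list of rows, one per simulation, each a list of summary statistics.
    observed: the observed summary statistics, in the same order.
    """
    n_stats = len(observed)
    scales = []
    for j in range(n_stats):
        sd = pstdev([row[j] for row in simulated])
        scales.append(sd if sd > 0 else 1.0)  # avoid dividing by zero for constant statistics

    def distance(row):
        return math.sqrt(sum(((row[j] - observed[j]) / scales[j]) ** 2 for j in range(n_stats)))

    n_keep = max(1, round(keep_fraction * len(simulated)))
    ranked = sorted(range(len(simulated)), key=lambda i: distance(simulated[i]))
    return ranked[:n_keep]
```

The parameter values of the retained simulations would then be summarized (for example by their MLE) before the second round of simulations.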
Results and discussion.
Model comparison.
The Bayes Factor of M1 (using tip calibrations) compared to M2 (using fossil calibrations) was
calculated as 872 (BF <1 supports M2; 1 < BF < 3 provides support for M1 that is barely worth
mentioning; 3 < BF < 10 provides substantial support for M1; 10 < BF < 30 provides strong support for
M1; 30 < BF < 100 provides very strong support for M1; and BF > 100 provides decisive support for
M1). Therefore, we find decisive evidence that the tip calibration model is more strongly supported by
the paleontological data under consideration than the fossil calibration model of Rohland et al. (2005).
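The verbal scale above amounts to a simple lookup on BF12, sketched below in Python. The text uses strict inequalities, so the treatment of values falling exactly on a boundary is an assumption: they are assigned to the higher category.

```python
def interpret_bayes_factor(bf12):
    """Verbal strength of support for M1 over M2, following the scale quoted in the text."""
    if bf12 < 1:
        return "supports M2"
    for upper, label in ((3, "barely worth mentioning"), (10, "substantial"),
                         (30, "strong"), (100, "very strong")):
        if bf12 < upper:
            return label
    return "decisive"

print(interpret_bayes_factor(872))  # decisive
```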
Hypothesis testing.
Evaluating the external (fossil-dated) calibration model of Rohland et al. (2005) against the fossil date for basal node 1
(older node, 400-230 thousand years ago (ka)) yielded a p-value of 0.003, and against that for basal node 2
(younger node, <300 ka) a p-value of 0.021. Both results are significant at the 5% level (i.e.
there is evidence against the null hypothesis that the paleontological fossil dates stem from the same
population history as the fossil-calibrated nodes). In contrast, comparing the internal (tip-dated) calibration model
proposed in this study with the fossil date for basal node 1 yielded a p-value of 0.653, and with that for basal
node 2 a p-value of 0.337, indicating no evidence against the null hypothesis that the paleontological
fossil dates stem from the same population history as the tip-calibrated nodes.
Serial Coalescent Simulations
The model with the lowest AIC/AICc values was the model of population differentiation proposed in
the present study, with a time-dependent mutation rate that followed a linear decay. The preferred
model proposes that clade D diverged from clades A/B/C 225 ka BP, which fits the fossil record
in China, where the earliest C. crocuta ultima specimen dates to 400-230 ka BP
(Turner 1990; Qiu et al. 2004; Zhou et al. 2000). Clades B/C diverged from clade A 186 ka BP, clade
B diverged from clade C 127 ka BP, and clades A1 and A2 diverged from each other only 64 ka BP.
The combination of this model and the time-dependent linear decay mutation rate has an Akaike weight of 0.9993,
i.e. a 99.93% probability of being the best model among the twelve candidate model/rate combinations considered (Table S5).
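The Akaike weight quoted above follows from the AICc column of Table S5 by standard formulas. The minimal Python sketch below uses hypothetical helper names. The sample size n in the AICc correction must be supplied by the caller. Models F and I, whose scores were not finite, are omitted from the inputs.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """Second-order AIC; n is the sample size and must exceed k + 1."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)

def akaike_weights(scores):
    """Delta values and Akaike weights for a list of AIC or AICc scores (lower is better)."""
    best = min(scores)
    deltas = [s - best for s in scores]
    relative = [math.exp(-d / 2) for d in deltas]  # relative likelihood of each model
    total = sum(relative)
    return deltas, [r / total for r in relative]

# AICc values of the ten models with finite scores in Table S5 (A-E, G, H, J-L).
table_s5_aicc = [676.31, 567.26, 535.18, 1059.64, 864.41, 675.76, 518.50, 687.19, 536.05, 534.92]
deltas, weights = akaike_weights(table_s5_aicc)
```

With these inputs the weight of model H comes out at about 0.9993, the value reported in Table S5.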
[Map: Russia, Mongolia and China]
Fig. S1. Geographic distribution of fossil spotted hyenas in Far East Asia, showing Pleistocene
hyena fossil sites in China. Empty red triangles: Late Pleistocene C. crocuta ultima hyenas; empty
green triangles: Mid Pleistocene C. crocuta ultima hyenas; filled red triangles: samples sequenced in
this study; filled red circle: samples reported in Rohland et al. (2005).
[Figure: (a) position of the 713 bp contig (nucleotides 38-750) within the 1140 bp cyt b gene; (b) primer binding sites (CrF1-CrF4, CrR1-CrR4, Cb1H-Cb5H, Cb1L-Cb5L); (c) nine overlapping PCR fragments with their lengths and overlaps]
Fig. S2. Schematic view of the 713 bp contigs of the cyt b gene for the Pleistocene samples using
nine overlapping PCR fragments. (a) Beginning and ending nucleotide positions of the 713 bp
contigs in the 1140 bp complete cyt b gene; (b) primer binding areas of nine primer pairs; (c) nine
overlapping fragments; numbers below fragments show the length of the amplification products without
primers, and numbers above fragments show overlaps between individual fragments.
Fig. S3. Phylogenetic tree for fossil and extant spotted hyenas from the 366 bp dataset, using the
striped hyena as an outgroup and calibration of 9-9.5 Ma for the divergence between Crocuta lineage
and Hyaena/Parahyaena lineage, in addition to the tip calibrations from the dates of ancient samples.
[Figure: (a) rate (s/s/y, log scale) for the original analysis and ten date-randomized replicates; (b) tMRCA (years) from BEAST runs with prior only and with data; 95% HPD intervals shown: 5.12E4-1.78E7 and 1.33E5-8.47E5]
Fig. S4. Tests of the temporal signal in the datasets. (a) Date randomization test. The red
circle and dotted line represent the mean rate calculated in the phylogenetic analysis of the 366 bp
alignment of spotted hyenas using the radiocarbon dates associated with the ancient sequences as
calibration. The grey lines represent the 95% HPD of rates calculated for ten replicates of the same
analysis with randomized dates. Because none of these intervals overlaps the original mean rate, the
radiocarbon dates used in this study are informative enough to calibrate the timed
phylogeny. (b) BEAST run with priors only. Comparison of the tMRCA of the spotted hyena calculated
by BEAST with and without the data, to investigate the influence of the priors on the results. This
comparison shows that the tMRCA inferred in this study is not driven by the priors alone, but results from
the phylogenetic signal in the genetic data combined with the calibration dates.
Fig. S5. Comparison of rate (a) and date (b) estimates with 2 calibration dates for the Chinese
samples. Marginal posterior densities of the inferred molecular rates and the two basal nodes of the
spotted hyena phylogeny from BEAST analyses, using either the 35.52 ka direct AMS date
of DARD-1 or the 34.37 ka proxy date from the associated deer bone as a calibration for all three
Chinese samples (DARD-1, 2 and 3).
Fig. S6. Photos of three Da’an Cave specimens. DARD03:0337, DARD03:0428 and DARD03:0360 are
the excavation numbers of the samples, which have been named DARD-1, DARD-2 and DARD-3 in
Table S2, respectively. The photos were taken before the tips were removed for DNA extraction.
Fig. S7. Posterior maximum likelihood estimators output from BayeSSC. The AIC values calculated for each of the twelve model/rate combinations are noted, with the
best model indicated in grey and its AIC value marked with an asterisk (Model H). The branching order of some populations in Models K and L is reversed because of the way in which
divergence events are described in BayeSSC (i.e. 100% of population 1 migrates into population 2 backwards in time).
Fig. S8. Graph of time-dependent mutation rate, showing (a) linear and (b) exponential decay
relationships between the two calibration points. The youngest calibration point represents the tip dates
from radiocarbon-dated ancient bone samples (11 samples, which average 7,522 generations ago, or
42,877 years BP), and has a BEAST-calculated mutation rate of 1.83 × 10^-4 subs/seq/gen (or 8.8 × 10^-8
subs/site/yr). The oldest calibration point uses fossil dates to estimate the basal node of the phylogenetic
tree at 1,754,385 generations ago (approximately 10 Ma BP), with a BEAST-calculated mutation
rate of 0.19 × 10^-4 subs/seq/gen (or 9.34 × 10^-9 subs/site/yr). The equation for the linear relationship is
$\mu(T) = 1.83\times10^{-4} - (9.39\times10^{-11} \times T)$ and the equation for the exponential decay relationship is
$\mu(T) = 1.85\times10^{-4} \times 0.999998^{T}$, where T is the number of generations back in time.
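The two decay equations can be evaluated directly. The Python sketch below uses the coefficients exactly as printed, with T in generations before present and rates in substitutions/sequence/generation; the function names are hypothetical.

```python
GENERATION_TIME_YEARS = 5.7  # Watts et al. (2011)

def linear_rate(t):
    """Linear decay: 1.83e-4 - 9.39e-11 * T."""
    return 1.83e-4 - 9.39e-11 * t

def exponential_rate(t):
    """Exponential decay: 1.85e-4 * 0.999998 ** T."""
    return 1.85e-4 * 0.999998 ** t

def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    return generations * generation_time

for t in (0, 7_522, 1_754_385):
    print(t, round(generations_to_years(t)), f"{linear_rate(t):.3e}", f"{exponential_rate(t):.3e}")
```

The coefficients are printed to only a few significant figures, so evaluating them at T = 1,754,385 does not reproduce the 0.195 × 10^-4 calibration rate. The linear form gives roughly 0.18 × 10^-4 and the exponential form roughly 0.055 × 10^-4. The exponential form is especially sensitive to the rounding of its base.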
Table S1. PCR primers for Crocuta crocuta ultima mitochondrial cytochrome b gene
Table S2. Details on sequences used in this study
No. in this study | GenBank accession no. | Length | Species | C14 age (ka) | Location | Reference
DARD-1 (DARD03:0337) | KC117379 | 713 bp | Crocuta crocuta ultima | 35.52±0.23 | China | This study
DARD-2 (DARD03:0428) | KC117380 | 713 bp | Crocuta crocuta ultima | ~35.52±0.23 | China | This study
DARD-3 (DARD03:0360-2) | KC117381 | 713 bp | Crocuta crocuta ultima | ~35.52±0.23 | China | This study
C.crocuta_Belgium | DQ157554 | 366 bp | Crocuta crocuta spelaea | N/A | Belgium | 1
C.crocuta_Russia | DQ157555 | 366 bp | Crocuta crocuta spelaea | 48.65+2.38/-1.84 | Russia | 1
C.crocuta_Austria1 | AJ809318 | 366 bp | Crocuta crocuta spelaea | 38.06±0.85 | Teufelsucke, Austria | 2
C.crocuta_Austria2 | AJ809320 | 366 bp | Crocuta crocuta spelaea | 38.68±0.92 | Winden, Austria | 2
C.crocuta_Czech Rep | AJ809321 | 366 bp | Crocuta crocuta spelaea | 46.0±2.1 | Czech Rep. | 2
C.crocuta_North Sea | AJ809323 | 366 bp | Crocuta crocuta spelaea | N/A | The Netherlands | 2
C.crocuta_Romania | AJ809324 | 366 bp | Crocuta crocuta spelaea | 41.8+1.4/-1.2 | Romania | 2
C.crocuta_France | AJ809325 | 366 bp | Crocuta crocuta spelaea | 40.7±0.9 | France | 2
C.crocuta_Ukraine | AJ809326 | 366 bp | Crocuta crocuta spelaea | 41.3±1.2 | Ukraine | 2
C.crocuta_Altai | AJ809327 | 366 bp | Crocuta crocuta spelaea | 42.3+0.94/-0.84 | Russia | 2
C.crocuta_Slovakia | AJ809328 | 366 bp | Crocuta crocuta spelaea | 51.2+4.9/-3.0 | Slovakia | 2
C.crocuta_Germany1 | AJ809329 | 366 bp | Crocuta crocuta spelaea | N/A | Germany | 2
C.crocuta_Germany2 | AJ809330 | 366 bp | Crocuta crocuta spelaea | N/A | Germany | 2
C.crocuta_Hungary | AJ809331 | 366 bp | Crocuta crocuta spelaea | 41.8±1.3 | Hungary | 2
C.crocuta_Senegal | DQ157556 | 366 bp | Crocuta crocuta | modern | Senegal | 1
C.crocuta_Ethiopia | DQ157557 | 366 bp | Crocuta crocuta | modern | Ethiopia | 1
C.crocuta_Cameroon | DQ157558 | 366 bp | Crocuta crocuta | modern | Cameroon | 1
C.crocuta_Togo | DQ157559 | 366 bp | Crocuta crocuta | modern | Togo | 1
C.crocuta_Tanzania | DQ157560 | 366 bp | Crocuta crocuta | modern | Tanzania | 1
C.crocuta_Rwanda | DQ157562 | 366 bp | Crocuta crocuta | modern | NE-Rwanda | 1
C.crocuta_Eritrea | DQ157564 | 366 bp | Crocuta crocuta | modern | Eritrea | 1
C.crocuta_Sudan | DQ157565 | 366 bp | Crocuta crocuta | modern | Sudan | 1
C.crocuta_Zimbabwe | DQ157566 | 366 bp | Crocuta crocuta | modern | Zimbabwe | 1
C.crocuta_Namibia | DQ157568 | 366 bp | Crocuta crocuta | modern | Namibia | 1
C.crocuta_South Africa | DQ157569 | 366 bp | Crocuta crocuta | modern | South Africa | 1
C.crocuta_Angola | DQ157570 | 366 bp | Crocuta crocuta | modern | Angola | 1
C.crocuta_Somalia | DQ157571 | 366 bp | Crocuta crocuta | modern | Somalia | 1
C.crocuta_Kenya | DQ157572 | 366 bp | Crocuta crocuta | modern | Kenya | 1
C.crocuta_Uganda | DQ157574 | 366 bp | Crocuta crocuta | modern | Uganda | 1
C.crocuta_zoo1 | AY048786 | 1137 bp | Crocuta crocuta | modern | N/A | 3
C.crocuta_zoo2 | AF511064 | 1140 bp | Crocuta crocuta | modern | N/A | 4
C.crocuta_zoo3 | AY170114 | 1140 bp | Crocuta crocuta | modern | N/A | 5
C.crocuta_zoo4 | AY928676 | 1140 bp | Crocuta crocuta | modern | N/A | 6
H.hyaena | AY928678 | 1140 bp | Hyaena hyaena | modern | N/A | 6
H.hyaena | AY048787 | 1137 bp | Hyaena hyaena | modern | N/A | 3
P.brunnea | AY048790 | 1137 bp | Parahyaena brunnea | modern | N/A | 3
P.brunnea | AY928677 | 1140 bp | Parahyaena brunnea | modern | N/A | 6
P.cristatus | AY048791 | 1137 bp | Proteles cristatus | modern | N/A | 3
P.cristatus | AY048792 | 1137 bp | Proteles cristatus | modern | N/A | 3
P.cristatus | AY928675 | 1140 bp | Proteles cristatus | modern | N/A | 6
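Table S3. Variations in the newly obtained ancient sequences compared to living spotted hyenas
The new sequences were compared with four spotted hyena cyt b sequences (Accession Nos: AY048786, AY928676,
AY170114 and AF511064) available in GenBank. The large majority of the variable positions in the Pleistocene fossil
spotted hyenas were transitions, with only two transversions (A→T at nucleotide position 258 and
T→A at nucleotide position 712). Moreover, 19.4%, 10.2% and 69.4% of the polymorphic sites were
found at 1st, 2nd and 3rd codon positions, respectively.
Dots indicate the same nucleotide as in AY048786.
Position | AY048786 | AY928676 | AY170114 | AF511064 | DARD-1 | DARD-2 | DARD-3
40 | A | . | . | G | G | G | G
46 | A | . | . | G | . | . | .
84 | A | G | G | . | . | . | .
90 | A | G | G | . | . | . | .
96 | T | . | . | . | C | C | C
102 | G | . | . | . | A | A | A
109 | C | T | T | T | . | . | .
124 | G | A | A | A | A | A | A
144 | C | T | T | T | . | . | .
153 | A | . | . | . | G | G | G
179 | C | . | . | . | T | T | T
204 | C | . | . | . | T | T | T
219 | T | A | A | A | . | . | .
243 | C | . | . | . | T | T | T
245 | T | . | . | . | C | C | C
258 | A | . | . | . | T | T | T
261 | T | . | . | . | C | C | C
264 | C | . | . | . | T | T | T
291 | C | T | T | T | T | T | T
321 | T | C | C | C | C | C | C
336 | G | . | . | . | A | A | A
342 | T | C | C | C | C | C | C
358 | C | . | . | . | T | T | T
387 | G | A | A | A | . | . | .
390 | T | . | . | . | C | C | C
393 | T | . | . | . | C | C | C
396 | C | . | . | . | T | T | T
402 | G | A | A | A | A | A | A
426 | T | . | . | . | C | C | C
438 | T | . | . | C | . | C | .
468 | C | . | . | . | T | T | T
474 | C | T | T | T | . | . | .
478 | T | C | C | C | C | C | C
495 | A | . | . | . | G | G | G
503 | T | . | . | C | . | . | .
513 | T | . | . | . | C | C | C
565 | G | A | A | A | . | . | .
574 | C | . | . | G | . | . | .
577 | G | . | . | . | A | A | A
578 | T | C | C | C | C | C | .
648 | C | . | . | . | T | T | T
657 | A | . | . | . | G | G | T
669 | C | . | . | . | T | T | T
705 | G | . | . | . | A | A | A
707 | T | . | . | . | C | C | C
709 | C | T | T | T | . | . | .
712 | T | A | A | A | A | A | A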
Table S4. Probability of observing the paleontological fossil dates under the null hypothesis that the
posterior distributions of the basal nodes come from either the external (Miocene fossil-dated)
calibration model or the internal (AMS radiocarbon tip-dated) calibration model.
Paleontological fossil date | Model | P-value
Old fossil date (400-230 kya) | Fossil calibration | 0.003
Young fossil date (<300 kya) | Fossil calibration | 0.021
Old fossil date (400-230 kya) | Tip calibration | 0.653
Young fossil date (<300 kya) | Tip calibration | 0.337
Table S5. Akaike Information Criterion (AIC) values for the twelve different model/mutation
rate combinations.
Method of rate calc. | Model | P# | Ln Li | Li | AIC | AICc | Delta AICc | Relative model likelihoods | Akaike weights (wi)
Tip dates | H0 (A) | 1 | -337.09 | 4.01 × 10^-147 | 338.09 | 676.31 | 157.80 | 5.41 × 10^-35 | 0.0000
Tip dates | H1 (B) | 4 | -278.94 | 7.19 × 10^-122 | 282.94 | 567.26 | 48.76 | 2.58 × 10^-11 | 0.0000
Tip dates | H2 (C) | 4 | -262.90 | 6.67 × 10^-115 | 266.90 | 535.18 | 16.68 | 2.39 × 10^-4 | 0.0002
Fossil dates | H0 (D) | 1 | -528.76 | 2.31 × 10^-230 | 529.76 | 1059.64 | 541.14 | 3.12 × 10^-118 | 0.0000
Fossil dates | H1 (E) | 4 | -427.52 | 2.15 × 10^-186 | 431.52 | 864.41 | 345.91 | 7.70 × 10^-76 | 0.0000
Fossil dates | H2 (F) | 4 | Infinity | | | | | |
TD linear | H0 (G) | 2 | -335.69 | 1.63 × 10^-146 | 337.69 | 675.76 | 157.26 | 7.10 × 10^-35 | 0.0000
TD linear | H1* (H) | 5 | -253.18 | 1.11 × 10^-110 | 258.18 | 518.50 | 0.00 | 1 | 0.9993
TD linear | H2 (I) | 5 | Infinity | | | | | |
TD expo | H0 (J) | 2 | -341.40 | 5.40 × 10^-149 | 343.40 | 687.19 | 168.68 | 2.35 × 10^-37 | 0.0000
TD expo | H1 (K) | 5 | -261.95 | 1.72 × 10^-114 | 266.95 | 536.05 | 17.54 | 1.55 × 10^-4 | 0.0002
TD expo | H2 (L) | 5 | -261.39 | 3.02 × 10^-114 | 266.39 | 534.92 | 16.42 | 2.72 × 10^-4 | 0.0003
* Line highlighted in grey represents the model with the lowest AIC value (best model);
TD – time-dependent mutation rate;
Expo – exponential decay equation describing the decreasing mutation rate with increasing time before
the present day;
H0 – panmixia model rejected by Rohland et al. (2005) and shown in Fig. S7A, D, G, J;
H1 – model proposed in the present study and shown in Fig. S7B, E, H, K;
H2 – model proposed in Rohland et al. (2005) and shown in Fig. S7C, F, I, L;
P# – number of parameters estimated in BayeSSC;
Li – likelihood of the model;
Ln Li – natural logarithm of the likelihood of the model;
AIC – Akaike Information Criterion;
AICc – second-order Akaike Information Criterion;
Delta AICc – difference between the AICc of each model and that of the best model.
Table S6. Maximum Likelihood Estimators (MLE) with lower (2.5%) and upper (97.5%) confidence bounds for divergence times from the Bayesian Serial Simcoal
(BayeSSC) analysis (for divergence models only – H1 and H2). Each cell gives 2.5% / MLE / 97.5%.
Mutation rate | Split between A1 & A2 | Split between B & C | Split between A2 & C | Split between C & D
Demographic model proposed in this paper – H1
Constant mutation rate: based on tip dates | 59,577 / 72,739 / 115,715 | 115,287 / 129,506 / 204,436 | 185,441 / 186,272 / 257,743 | 261,527 / 565,921 / 609,963
Constant mutation rate: based on fossil dates | 63,024 / 62,699 / 114,357 | 121,448 / 121,427 / 204,663 | 185,606 / 184,859 / 259,154 | 280,652 / 639,463 / 640,747
Time-dependent mutation rate: linear decay | 63,226 / 63,774 / 111,345 | 123,206 / 127,599 / 202,253 | 183,982 / 186,254 / 260,668 | 226,888 / 225,792 / 663,071
Time-dependent mutation rate: exponential decay | 64,644 / 67,990 / 111,116 | 124,287 / 209,426 / 209,497 | 182,570 / 189,765 / 266,983 | 232,302 / 651,151 / 645,110
Demographic model of Rohland et al. (2005) – H2
Constant mutation rate: based on tip dates | 161,025 / 182,730 / 476,673 | 397,740 / 431,755 / 2,267,223 | 199,416 / 442,537 / 1,850,296 | 1,503,967 / 1,613,764 / 5,233,553
Constant mutation rate: based on fossil dates | 194,145 / 196,106 / 526,087 | 558,036 / 609,659 / 2,178,957 | 471,872 / 648,565 / 2,241,066 | 1,816,061 / 1,982,040 / 5,323,330
Time-dependent mutation rate: linear decay | 143,180 / 177,202 / 445,294 | 282,382 / 365,218 / 2,082,622 | 247,830 / 400,721 / 1,801,559 | 1,443,440 / 1,543,733 / 5,208,712
Time-dependent mutation rate: exponential decay | 159,092 / 172,986 / 452,070 | 339,612 / 469,850 / 2,238,376 | 286,378 / 439,309 / 2,013,735 | 1,350,090 / 5,284,195 / 5,305,761
Line highlighted in grey represents the model and mutation rate with the lowest AIC (i.e. the most likely of the models).
Values in italics represent a reversal in branching order from the other versions of this model, resulting from the way BayeSSC requires divergence events to be described (i.e.
100% of individuals from population C migrating backwards in time to population A2 prior to 100% of individuals from population C migrating backwards in time to
population B).
Table S7. Mutation rates used in the BayeSSC analysis, and the method used to calculate the time
dependent mutation rates.
Method of rate calculation | Mutation rate description | Rate at 7,522 generations | Rate at 1,754,385 generations
Tip dates (this study) | Constant | 1.836 × 10^-4 | 1.836 × 10^-4
Fossil calibrations (4) | Constant | 0.195 × 10^-4 | 0.195 × 10^-4
Time dependent – linear | $1.83\times10^{-4} - (9.39\times10^{-11} \times [T])$ | 1.836 × 10^-4 | 0.195 × 10^-4
Time dependent – exponential | $1.85\times10^{-4} \times 0.999998^{[T]}$ | 1.836 × 10^-4 | 0.195 × 10^-4
[T] – Number of generations back in time as lineages coalesce, i.e. for each generation simulated from
the present back in time, the value of [T] increases by 1.
The rate at 7,522 generations (42,877 years BP) equates to the average of the tip dates across the tree.
The rate at 1,754,385 generations (10 Ma BP) equates to the fossil calibration of the basal node used
in Rohland et al. (2005).
References
Anderson CNK, Ramakrishnan U, Chan YL, Hadly EA (2004). Serial SimCoal: A population genetics
model for data from multiple populations and points in time. Bioinformatics, 21, 1733-1734.
Chow MC (1959) Age of the mammalian fossil assemblages. In: Pleistocene Mammalian Fossils from
the Northeastern Provinces (ed. The Paleomammalogy Group of IVPP). Memoirs of Institute
of Vertebrate Paleontology and Paleoanthropology Academia Sinica, No. 3 pp. 9-10 (in
Chinese with English summary).
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with
confidence. PLoS Biology, 4, 699-710.
Drummond AJ, Suchard MA (2010) Bayesian random local clocks, or one rate to rule them all. BMC
Biology, 8, 114.
Hudson RR, Slatkin M, Maddison WP (1992) Estimation of levels of gene flow from DNA sequence
data. Genetics, 132, 583-589.
Qiu ZX, Deng T, Wang BY (2004) Early Pleistocene mammalian fauna from Longdan, Dongxiang,
Gansu, China. Palaeontologia Sinica, Series C, 27, 1-198.
R development core team (2011) R: A language and environment for statistical computing. (R
foundation for statistical computing, Retrieved from http://www.r-project.org).
Rohland N, Pollack JL, Nagel D et al. (2005) The population history of extant and extinct hyenas.
Molecular Biology and Evolution, 22, 2435-2443.
Suchard MA, Weiss RE, Sinsheimer JS (2001) Bayesian selection of continuous-time Markov Chain
Evolutionary models. Molecular Biology and Evolution, 18, 1001-1013.
Turner A (1990) The evolution of the guild of larger terrestrial carnivores during the Plio-Pleistocene
in Africa. Geobios, 23, 349-368.
Watts HE, Scribner KT, Garcia HA, Holekamp KE (2011) Genetic diversity and structure in two
spotted hyena populations reflects social organization and male dispersal. Journal of Zoology,
285, 281-291.
Wright S (1951) The genetical structure of populations. Annals of Eugenics, 15, 323-354.
Zhang HC (2009) A Review of the Study of Environmental Changes and Extinction of the
Mammuthus-Coelodonta Fauna during the Middle-late Late Pleistocene in NE China.
Advances in Earth Science, 24, 49-60.
Zhou C, Lui Z, Wang Y (2000) Climatic cycles investigated by sediment analysis in Peking Man’s
Cave, Zhoukoudian, China. Journal of Archaeological Science, 27, 101-109.