Supplemental Methods:

1. Recombination tests.
To test for recombination in our alignment, we used the GARD method [1] with the
HKY85+Γ model of nucleotide substitution [2]. No evidence of recombination was found.
However, on visual inspection of the sequences, two genes, arcC and yqiL, showed unusual
patterns of conservation at fourfold degenerate sites, which might indicate undetected
recombination (see Figure S1). Accordingly, we excluded these genes from our analyses.
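As an illustration of how fourfold degenerate sites can be picked out of an in-frame alignment, the following minimal sketch may be useful. It is not the procedure used in the study (which the text does not specify); the function names and the toy sequences are hypothetical, and the alignment is assumed to start at codon position 1 and to contain no gaps.

```python
# First two codon bases for which any third base encodes the same amino acid
# (true of both the standard and the bacterial genetic code, table 11).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_site_indices(aligned_seqs):
    """Return 0-based indices of third codon positions that are fourfold
    degenerate in every sequence (alignment assumed to be in frame)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for start in range(0, length - length % 3, 3):
        prefixes = {s[start:start + 2].upper() for s in aligned_seqs}
        if prefixes <= FOURFOLD_PREFIXES:
            sites.append(start + 2)
    return sites


def extract_sites(seq, sites):
    """Keep only the listed alignment columns of one sequence."""
    return "".join(seq[i] for i in sites)


# Hypothetical toy alignment: codons GCT/GCC (Ala), ATG (Met), GGA/GGC (Gly).
toy = ["GCTATGGGA", "GCCATGGGC"]
toy_sites = fourfold_site_indices(toy)
toy_columns = [extract_sites(s, toy_sites) for s in toy]
```

Requiring the codon prefix to be fourfold degenerate in every sequence is a conservative design choice: a column is dropped as soon as one strain carries a substitution at codon position 1 or 2.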
2. Estimation of substitution rate.
Obtaining external temporal information to calibrate bacterial phylogenies is difficult [3].
Under selective neutrality, mutation rates could be used for calibration, but in practice,
laboratory estimates of the per-generation mutation rate relate poorly to evolutionary rates in
the wild [3]. Other approaches use indirect evidence from specific host-associations, but this
is useful only when the bacterium-host associations are long-lived and stable, such as in the
case of endosymbionts [3, 4]. An alternative, widely used in virology, is to exploit known
dates of isolation to infer substitution rates from dated phylogenetic tips [5-7]. Such
approaches have been shown to be viable for S. aureus, but only with deep sequencing of
complete genomes [8-11], and the rates thus obtained are not expected to apply to sites
subject to purifying selection. This is because weak purifying selection (under which
mutations have substantial sojourn times but are unlikely to reach population-wide fixation)
can reduce rate estimates over longer periods, but not over the short time periods typical of
serial sampling schemes [12].
Accordingly, to apply a temporal scale to our phylogeny, we estimated a rate from a
reanalysis of the data of Harris et al. [8], from the most widely sampled sequence type, ST239.
We then applied this result as a prior to the third positions only of our alignment.
Specifically, using the data of Harris et al. [8], we generated a new alignment by mapping
polymorphic sites from 62 strains to the core genome of the reference strain TW20. Using the
TW20 annotation in GenBank, we discarded sites that were intergenic, appeared in
overlapping reading frames, or in any of the 34 non-annotated genes. The final alignment of
~2.5 Mb contained 3348 polymorphic sites. A dated phylogeny was estimated with BEAST
[13]. Here, as for the analyses described in the main text, we ran two MCMC chains, which
were terminated after checking for convergence, and discarded the burn-in. We used the dates of
isolation provided by Harris et al. [8], and enforced a strict molecular clock (reflecting the shorter
timescale and relative paucity of substitutions). The posterior distribution of rates that was
obtained is shown in Figure S2, and this was used to specify a prior on the rates at third codon
positions for our main analysis. Given the near normality of the posterior shown in Figure S2,
we chose a normal prior distribution, whose mean and variance match those shown in Figure
S2. The prior was applied to the mean rate across the tree (i.e., to the mean of the lognormal
distribution of variable rates [14]), and so our model allowed for further rate variation across
the phylogeny.
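The moment matching described above can be sketched as follows. This is a hedged illustration rather than the study's code: the function names are hypothetical, the trace is synthetic (not the posterior of Figure S2), and the second helper assumes that the ST239 total-rate interval in Table S2 summarises the same, approximately normal, posterior.

```python
import random
import statistics


def normal_prior_from_samples(samples):
    """Moment-match a normal distribution to posterior samples."""
    return statistics.fmean(samples), statistics.stdev(samples)


def normal_sd_from_interval(lower, upper, z=1.959964):
    """Standard deviation of a normal whose central 95% interval is (lower, upper)."""
    return (upper - lower) / (2.0 * z)


# Synthetic stand-in for an MCMC trace of the rate (NOT the study's samples).
random.seed(1)
trace = [random.gauss(2.25, 0.135) for _ in range(20000)]
prior_mean, prior_sd = normal_prior_from_samples(trace)

# Spread implied by the ST239 total-rate interval reported in Table S2.
sd_from_table = normal_sd_from_interval(1.99, 2.52)
```

In practice the samples would come from the post-burn-in log of the ST239 analysis, and the resulting mean and standard deviation would parameterise the normal prior on the mean third-position rate.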
Note that our rate estimate was consistent with, but more precise than, previously published
estimates from other strain types ([9, 10]; see Table S2). To demonstrate the importance of
applying the prior solely to third positions, we repeated our analysis of ST239, estimating
a separate rate for third codon positions. As shown in Table S2, for ST239, third position rates
did not differ significantly from those obtained from codon positions 1 and 2, which is
consistent with the inability of purifying selection to act over the 20-year period represented
by these data. However, the rate estimates from our global data set (with the prior from ST239
applied to third positions) show a much lower rate of evolution at the first two codon positions.
This is consistent with purifying selection acting predominantly on amino-acid changing
substitutions [12], but only having an effect over the longer timescales spanned by our global
data set.
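A simple consistency check on the values in Table S2 is sketched below. It assumes, as our own simplification, that the total rate is the site-weighted average of the codon-position rates, with two thirds of sites at positions 1 and 2. Posterior medians need not combine exactly in this way, so the check is indicative only; the function name is hypothetical.

```python
def combined_rate(rate_pos12, rate_pos3):
    """Site-averaged rate when two thirds of sites are codon positions 1 and 2."""
    return (2.0 * rate_pos12 + rate_pos3) / 3.0


# Medians from Table S2 (substitutions/site/million years).
st239_total = combined_rate(2.13, 2.49)
global_total = combined_rate(0.39, 2.43)

# Ratio of third-position to first/second-position rates in each data set.
st239_ratio = 2.49 / 2.13
global_ratio = 2.43 / 0.39
```

Rounded to two decimal places, the combined values reproduce the reported total rates for ST239 and for the global data set, while the ratio of third-position to first- and second-position rates is close to one for ST239 and roughly six for the global data set.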
3. Tests of sequence saturation.
Given the depth of the phylogeny in which we are interested, it was important to test for
saturation at rapidly evolving third codon positions. Accordingly, we applied the test of Xia et
al. [15, 16] and plotted transition and transversion changes against genetic distance [17], as
shown in Figure S3. Neither test showed any evidence of saturation, suggesting that S. aureus
MLST genes evolve sufficiently slowly for our analyses.
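The counts underlying a saturation plot of this kind can be obtained as in the following sketch. It covers only the counting of transition and transversion differences between one pair of aligned sequences; it does not implement the index of Xia et al. [15, 16], and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def third_positions(seq):
    """Third codon positions of an in-frame coding sequence."""
    return seq[2::3]


def count_ts_tv(seq_a, seq_b):
    """Count transition and transversion differences between two aligned
    sequences, skipping columns with gaps or ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions
```

Applying `count_ts_tv` to the third positions of every sequence pair, and plotting both counts against the pairwise genetic distance, gives a plot of the type shown in Figure S3.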
4. Reconstruction of ancestral host states.
To model host switching events across our phylogeny, we modified the phylogeographic
method of Lemey et al. [18], replacing locations with host states. In other words, we modelled the
probabilities of transfer between each pair of host types via a continuous-time Markov chain
with a non-reversible infinitesimal rate matrix [18].
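To make the role of the non-reversible rate matrix concrete, the sketch below computes host-to-host transition probabilities along a branch as P(t) = exp(Qt). It is an illustration under stated assumptions, not the BEAST implementation: the rate values are invented, the function names are hypothetical, and the matrix exponential uses a simple scaling-and-squaring Taylor scheme suitable only for small matrices.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, squarings=10, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2.0 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probabilities(rates, t):
    """rates[i][j] (i != j) is the rate from host i to host j; diagonal
    entries are ignored and set so that each row of Q sums to zero."""
    n = len(rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = rates[i][j]
        q[i][i] = -sum(q[i])
    return mat_exp([[x * t for x in row] for row in q])


# Illustrative (invented) rates between human, bovid and avian hosts.
# The matrix is deliberately asymmetric: rate(i -> j) != rate(j -> i).
hosts = ["human", "bovid", "avian"]
example_rates = [[0.0, 0.8, 0.3],
                 [0.1, 0.0, 0.05],
                 [0.02, 0.01, 0.0]]
p = transition_probabilities(example_rates, 1.0)
```

Because the rate matrix is not constrained to be symmetric, the probability of a human-to-bovid change over a branch can differ from that of a bovid-to-human change, which is what allows the direction of host switches to be inferred.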
To allow us to fix the state of the root node (which prior information indicates was a human-associated strain), we modified published methods [19] to decouple the stationary distribution
of the rate matrix and the prior distribution over the unobserved root host state. While we
chose to incorporate our prior information in this way, for the present data, it made little
difference to the results obtained: the root node had a human host state with 95% posterior
probability in unconstrained runs.
To estimate the number of host switching events, we computed posterior expectations directly
using Markov Jumps methods [20, 21], avoiding the computational cost and high Monte Carlo
error of traditional rejection-sampling-based approaches. An example XML file showing the
syntax for implementing the new methods is provided as XML S1.
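The idea of computing expected jump counts analytically, rather than by simulation, can be sketched for a single branch with a known rate matrix. The example below uses a standard block-matrix identity for the integral of exp(Q(t-s)) Q_L exp(Qs) over s in [0, t], where Q_L holds only the rates of the labelled transitions. This is an assumption-laden illustration: it is not necessarily the algorithm of [20, 21] or of BEAST, the function names are hypothetical, and the matrix exponential is a simple scheme for small matrices.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, squarings=10, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2.0 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def expected_jumps(q, labeled, t):
    """For generator q (rows sum to zero) and a set of labelled transitions
    {(i, j), ...}, return (P, J) where P[i][j] = P(X_t = j | X_0 = i) and
    J[i][j] = E[N_t * 1{X_t = j} | X_0 = i], N_t being the number of
    labelled jumps in [0, t]."""
    n = len(q)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            big[i][j] = q[i][j] * t          # upper-left block: Q t
            big[n + i][n + j] = q[i][j] * t  # lower-right block: Q t
            if i != j and (i, j) in labeled:
                big[i][n + j] = q[i][j] * t  # upper-right block: Q_L t
    e = mat_exp(big)
    p = [row[:n] for row in e[:n]]
    jumps = [row[n:] for row in e[:n]]
    return p, jumps


def conditional_jumps(q, labeled, t, start, end):
    """Expected number of labelled jumps given both branch endpoints."""
    p, jumps = expected_jumps(q, labeled, t)
    return jumps[start][end] / p[start][end]
```

For a symmetric two-state chain with unit rates, counting all jumps over a branch of length 2 gives an expectation of 2 from either starting state, as expected for a process that leaves every state at rate 1. In a full analysis such branch-wise expectations would be combined across the tree and averaged over the posterior sample.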
5. Comparison with other approaches of ancestral state reconstruction.
To compare our new approach to other, well-used methods, we used the MCC phylogeny
produced by BEAST (as shown in Figure 1) with the package Mesquite 2.75 [22]. This
package assumes that the phylogeny and branch lengths are known without error, but
implements multiple methods of reconstructing the ancestral host states across the tree. We
first used Mesquite’s method of maximum likelihood reconstruction, under its Mk1 model, a
three-state generalisation of the Jukes-Cantor model of molecular evolution [22]. This model
is much simpler than the one implemented in BEAST, in that the rates of transition are all
assumed to be exactly equal (e.g., the human-to-bovid rate is assumed to equal the bovid-to-human and the bovid-to-avian rates etc.). Nevertheless, as Figure S4 shows, the maximum
likelihood ancestral state reconstructions exactly match those obtained with our method
(Table 1; Figure 1). We also used Mesquite’s parsimony reconstruction, under which the
number of character state changes is minimised across the tree (with no explicit modelling of
rates of change between the states). The results again agreed exactly with those shown in
Table 1 and Figure S4, and so are not shown.
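The parsimony criterion mentioned above can be illustrated with a minimal sketch of Fitch's algorithm for an unordered character on a rooted binary tree. This is not Mesquite's implementation; the tree is a hypothetical toy example, and the function name is our own.

```python
def fitch(tree):
    """Return (candidate_states, minimum_changes) for a rooted binary tree.
    A leaf is a host label (str); an internal node is a (left, right) tuple."""
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch(tree[0])
    right_states, right_changes = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # Children can share a state, so no change is needed at this node.
        return shared, left_changes + right_changes
    # Children disagree: keep both possibilities and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical toy tree with hosts at the tips.
toy_tree = ((("human", "human"), "bovid"), ("human", "avian"))
root_states, n_changes = fitch(toy_tree)
```

For this toy tree the root is reconstructed as human with two host changes. Unlike the likelihood and Bayesian approaches, the method uses no branch lengths and no model of rates of change between states.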
Legends for Supplemental Figures and Tables
Table S1: The strains and host types in the global S. aureus data set, as obtained from the
MLST database [23].
Table S2: Estimates of the rate of nucleotide substitution (substitutions/site/million years).
Rate estimates for the “global” data set are parameter estimates of the mean of the lognormal
distribution used to model rate variation across the tree.
Figure S1: Possible evidence of recombination between S. aureus strains for the MLST
genes arcC and yqiL. Alignments shown include only four-fold degenerate sites. These genes
were excluded from the analyses in the main text.
Figure S2: Posterior density of the rate of nucleotide substitution obtained from a reanalysis
of the data of Harris et al. [8], comprising whole genome sequences of ST239.
Figure S3: The number of transition and transversion differences at third codon positions
plotted against the proportional Euclidean distance between all sequence pairs in our global
data set. The transition/transversion ratio continues to increase with genetic distance (even for
comparisons involving the outgroup). A decrease in this ratio towards or below unity would be
evidence of sequence saturation; no such decrease is seen in our data.
Figure S4: Reconstruction of ancestral host states using the maximum likelihood approach of
Mesquite [22], and the topology and branch lengths of our MCC tree (as shown in
Figure 1). Ancestral host reconstructions were obtained with a likelihood model in which all
rates of transition between host types were assumed equal. The maximum likelihood
solutions shown agreed exactly with parsimony-based reconstructions (not shown) and with
our own Bayesian estimates (Table 1; Figure 1).
XML S1: Example XML code of the model implemented in the study.
Table S2.
Strains  Ref.                Total rate            Codon positions 1 & 2  Codon position 3
                             Median  95% CI        Median  95% CI         Median  95% CI
ST239    [8] and this study  2.25    (1.99, 2.52)  2.13    (1.88, 2.39)   2.49    (2.18, 2.81)
ST225    [10]                2.00    (1.20, 2.90)  -       -              -       -
ST5      [9]                 4.90    (1.80, 8.70)  -       -              -       -
Global   This study          1.07    (0.92, 1.23)  0.39    (0.29, 0.51)   2.43    (2.12, 2.74)
REFERENCES
1. Kosakovsky Pond S.L., Posada D., Gravenor M.B., Woelk C.H., Frost S.D.W. 2006
GARD: a genetic algorithm for recombination detection. Bioinformatics 22(24), 3096-98.
2. Hasegawa M., Kishino H., et al. 1985 Dating the human-ape splitting by a
molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160-74.
3. Ochman H., Elwyn S., et al. 1999 Calibrating bacterial evolution. Proceedings
of the National Academy of Sciences 96(22), 12638-43.
(doi:10.1073/pnas.96.22.12638).
4. Ochman H., Wilson A.C. 1987 Evolution in bacteria: Evidence for a universal
substitution rate in cellular genomes. Journal of Molecular Evolution 26(1), 74-86.
(doi:10.1007/bf02111283).
5. Drummond A., Nicholls G., et al. 2002 Estimating mutation parameters,
population history and genealogy simultaneously from temporally spaced sequence
data. Genetics 161, 1307-20.
6. Drummond A., Rambaut A. 2007 BEAST: Bayesian evolutionary analysis by
sampling trees. BMC Evolutionary Biology 7(1), 214.
7. Rambaut A. 2000 Estimating the rate of molecular evolution: incorporating
non-contemporaneous sequences into maximum likelihood phylogenies.
Bioinformatics 16(4), 395-99. (doi:10.1093/bioinformatics/16.4.395).
8. Harris S.R., Feil E.J., et al. 2010 Evolution of MRSA During Hospital
Transmission and Intercontinental Spread. Science 327(5964), 469-74.
9. Lowder B.V., Guinane C.M., et al. 2009 Recent human-to-poultry host jump,
adaptation, and pandemic spread of Staphylococcus aureus. Proceedings of the
National Academy of Sciences 106(46), 19545-50.
10. Nübel U., Dordel J., et al. 2010 A Timescale for Evolution, Population
Expansion, and Spatial Spread of an Emerging Clone of Methicillin-Resistant
Staphylococcus aureus. PLoS Pathogens 6(4), e1000855.
11. Smyth D.S., McDougal L.K., et al. 2010 Population Structure of a Hybrid
Clonal Group of Methicillin-Resistant Staphylococcus aureus, ST239-MRSA-III. PLoS
ONE 5(1), e8582.
12. Hasegawa M., Cao Y., et al. 1998 Preponderance of slightly deleterious
polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much
higher within species than between species. Molecular Biology and Evolution 15(11),
1499-505.
13. Drummond A.J., Suchard M.A., et al. 2012 Bayesian phylogenetics with
BEAUti and the BEAST 1.7. Molecular Biology and Evolution.
(doi:10.1093/molbev/mss075).
14. Drummond A., Ho S., et al. 2006 Relaxed phylogenetics and dating with
confidence. PLoS Biology 4, e88.
15. Xia X., Lemey P. 2009 Assessing substitution saturation with DAMBE. In The
Phylogenetic handbook: A practical approach to DNA and Protein Phylogeny. 2nd
Edition (eds. Lemey P., Salemi M., Vandamme A.-M.). Cambridge, Cambridge
University Press.
16. Xia X., Xie Z., et al. 2003 An index of substitution saturation and its
application. Mol Phylogenet Evol 26(1), 1-7.
17. Brown W., Prager E., et al. 1982 Mitochondrial DNA sequences of primates:
tempo and mode of evolution. J Mol Evol 18(4), 225-39.
18. Lemey P., Rambaut A., et al. 2009 Bayesian Phylogeography Finds Its Roots.
PLoS Computational Biology 5(9), e1000520.
19. Edwards C.J., Suchard M.A., et al. 2011 Ancient Hybridization and
an Irish Origin for the Modern Polar Bear Matriline. Current Biology 21(15), 1251-58.
(doi:10.1016/j.cub.2011.05.058).
20. Minin V., Suchard M. 2008 Counting labeled transitions in continuous-time
Markov models of evolution. Journal of Mathematical Biology 56(3), 391-412.
(doi:10.1007/s00285-007-0120-8).
21. O'Brien J.D., Minin V.N., et al. 2009 Learning to Count: Robust Estimates for
Labeled Distances between Molecular Sequences. Molecular Biology and Evolution
26(4), 801-14. (doi:10.1093/molbev/msp003).
22. Maddison W.P., Maddison D.R. 2011 Mesquite: a modular system for
evolutionary analysis. Version 2.75.
23. Enright M.C., Day N.P.J., et al. 2000 Multilocus Sequence Typing for
Characterization of Methicillin-Resistant and Methicillin-Susceptible Clones of
Staphylococcus aureus. Journal of Clinical Microbiology 38(3), 1008-15.