Supplemental Methods

1. Recombination tests

To test for recombination in our alignment, we used the GARD method [1] with the HKY85+Γ model of nucleotide substitution [2]. No evidence of recombination was found, but when the sequences were examined by eye, two genes, arcC and yqiL, showed unusual patterns of conservation at fourfold degenerate sites, which might indicate undetected recombination (see Figure S1). Accordingly, we excluded these genes from our analyses.

2. Estimation of substitution rate

Obtaining external temporal information to calibrate bacterial phylogenies is difficult [3]. Under selective neutrality, mutation rates could be used for calibration, but in practice, laboratory estimates of the per-generation mutation rate relate poorly to evolutionary rates in the wild [3]. Other approaches use indirect evidence from specific host associations, but this is useful only when the bacterium-host associations are long-lived and stable, as in the case of endosymbionts [3, 4].

An alternative, widely used in virology, is to exploit known dates of isolation to infer substitution rates from dated phylogenetic tips [5-7]. Such approaches have been shown to be viable for S. aureus, but only with deep sequencing of complete genomes [8-11], and the rates thus obtained are not expected to apply to sites subject to purifying selection. This is because weak purifying selection (under which mutations have substantial sojourn times but are unlikely to reach population-wide fixation) can reduce rate estimates over longer periods, but not over the short time periods typical of serial sampling schemes [12]. Accordingly, to apply a temporal scale to our phylogeny, we estimated a rate from a reanalysis of the data of Harris et al. [8] for the most widely sampled sequence type, ST239. We then applied this result as a prior to the third codon positions of our alignment only.

Specifically, using the data of Harris et al. [8], we generated a new alignment by mapping polymorphic sites from 62 strains to the core genome of the reference strain TW20. Using the TW20 annotation in GenBank, we discarded sites that were intergenic, that appeared in overlapping reading frames, or that fell in any of the 34 non-annotated genes. The final alignment of ~2.5 Mb contained 3348 polymorphic sites. A dated phylogeny was estimated with BEAST [13]. Here, as for the analyses described in the main text, we used two MCMC runs, which were terminated after checking for convergence, and discarded burn-in. We used the dates of isolation provided by Harris et al. [8], and enforced a strict molecular clock (reflecting the shorter timescale and relative paucity of substitutions).

The posterior distribution of rates that was obtained is shown in Figure S2, and this was used to specify a prior on the rates at third codon positions for our main analysis. Given the near normality of this posterior, we chose a normal prior distribution whose mean and variance match those of the posterior shown in Figure S2. The prior was applied to the mean rate across the tree (i.e., to the mean of the lognormal distribution of variable rates [14]), and so our model allowed for further rate variation across the phylogeny. Note that our rate estimate was consistent with, but more precise than, previously published estimates from other sequence types ([9, 10]; see Table S2).

To demonstrate the importance of applying the prior solely to third positions, we repeated our analysis of ST239, estimating a separate rate for third codon positions. As shown in Table S2, for ST239, third-position rates did not differ significantly from those obtained from codon positions 1 and 2, which is consistent with the inability of purifying selection to act over the 20-year period represented by these data. However, the rate estimates from our global data set (with the prior from ST239 applied to third sites) show a much lower rate of evolution at the first two codon positions.
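
As a rough illustration of the contrast just described, the following Python sketch takes the medians and 95% intervals reported in Table S2 and computes, for each data set, the ratio of the rate at codon positions 1 and 2 to the rate at codon position 3, together with a check of whether the two intervals overlap. Interval overlap is used here only as a crude heuristic and is an assumption of this sketch, not the comparison performed in the analysis itself; the names RATES, intervals_overlap and summarise are hypothetical.

```python
# Medians and 95% intervals copied from Table S2
# (substitutions/site/million years), stored as (median, lower, upper).
RATES = {
    "ST239": {"pos12": (2.13, 1.88, 2.39), "pos3": (2.49, 2.18, 2.81)},
    "Global": {"pos12": (0.39, 0.29, 0.51), "pos3": (2.43, 2.12, 2.74)},
}


def intervals_overlap(a, b):
    """Return True if two (median, lower, upper) summaries have overlapping intervals.

    Assumption of this sketch: overlap is a crude stand-in for
    "does not differ significantly"; it is not a formal test.
    """
    return a[1] <= b[2] and b[1] <= a[2]


def summarise(rates):
    """Map each data set to (ratio of medians pos1+2 / pos3, intervals overlap?)."""
    summary = {}
    for name, r in rates.items():
        ratio = r["pos12"][0] / r["pos3"][0]
        summary[name] = (round(ratio, 2), intervals_overlap(r["pos12"], r["pos3"]))
    return summary


SUMMARY = summarise(RATES)
for name, (ratio, overlap) in SUMMARY.items():
    print(f"{name}: pos1+2 / pos3 = {ratio:.2f}, intervals overlap: {overlap}")
```

With these values the ratio is about 0.86 for ST239, with overlapping intervals, and about 0.16 for the global data set, with non-overlapping intervals, so the rate at the first two codon positions is depressed only in the global data set.
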
This is consistent with purifying selection acting predominantly on amino-acid-changing substitutions [12], but having an effect only over the longer timescales spanned by our global data set.

3. Tests of sequence saturation

Given the depth of the phylogeny in which we are interested, it was important to test for saturation at rapidly evolving third codon positions. Accordingly, we applied the test of Xia et al. [15, 16] and plotted transition and transversion changes against genetic distance [17], as shown in Figure S3. Neither test showed any evidence of saturation, suggesting that S. aureus MLST genes evolve sufficiently slowly for our analyses.

4. Reconstruction of ancestral host states

To model host switching events across our phylogeny, we modified the phylogeographic method of Lemey et al. [18], replacing locations with host states. In other words, we modelled the probabilities of transfer between each pair of host types via a continuous-time Markov chain with a non-reversible infinitesimal rate matrix [18]. To allow us to fix the state of the root node (which prior information indicates was a human-associated strain), we modified published methods [19] to decouple the stationary distribution of the rate matrix from the prior distribution over the unobserved root host state. While we chose to incorporate our prior information in this way, for the present data it made little difference to the results obtained: the root node had a human host state with 95% posterior probability in unconstrained runs. To estimate the number of host switching events, we computed posterior expectations directly using Markov jumps methods [20, 21], avoiding the computational cost and high Monte Carlo error of traditional rejection-sampling-based approaches. An example XML file showing the syntax for implementing the new methods is provided as XML S1.

5. Comparison with other approaches to ancestral state reconstruction

To compare our new approach to other, widely used methods, we used the MCC phylogeny produced by BEAST (as shown in Figure 1) with the package Mesquite 2.75 [22]. This package assumes that the phylogeny and branch lengths are known without error, but implements multiple methods of reconstructing the ancestral host states across the tree. We first used Mesquite’s method of maximum likelihood reconstruction, under its Mk1 model, a three-state generalisation of the Jukes-Cantor model of molecular evolution [22]. This model is much simpler than the one implemented in BEAST, in that the rates of transition are all assumed to be exactly equal (e.g., the human-to-bovid rate is assumed to equal the bovid-to-human and the bovid-to-avian rates, etc.). Nevertheless, as Figure S4 shows, the maximum likelihood ancestral state reconstructions exactly match those obtained with our method (Table 1; Figure 1). We also used Mesquite’s parsimony reconstruction, under which the number of character state changes is minimised across the tree (with no explicit modelling of rates of change between the states). The results again agreed exactly with those shown in Table 1 and Figure S4, and so are not shown.

Legends for Supplemental Figures and Tables

Table S1: The strains and host types in the global S. aureus data set, as obtained from the MLST database [23].

Table S2: Estimates of the rate of nucleotide substitution (substitutions/site/million years). Rate estimates for the “global” data set are parameter estimates of the mean of the lognormal distribution used to model rate variation across the tree.

Figure S1: Possible evidence of recombination between S. aureus strains for the MLST genes arcC and yqiL. Alignments shown include only fourfold degenerate sites. These genes were excluded from the analyses in the main text.

Figure S2: Posterior density of the rate of nucleotide substitution obtained from a reanalysis of the data of Harris et al. [8], comprising whole genome sequences of ST239.

Figure S3: The number of transition and transversion differences at third codon positions plotted against the proportional Euclidean distance between all pairs of sequences in our global data set. The transition/transversion ratio continues to increase with genetic distance (even for comparisons involving the outgroup). A decrease in this ratio towards or below unity would be evidence of sequence saturation; no such decrease is detected in our data.

Figure S4: Reconstruction of ancestral host states using the maximum likelihood approach of Mesquite [22], and the topology and branch lengths of our MCC phylogeny (as shown in Figure 1). Ancestral host reconstructions were obtained with a likelihood model in which all transition rates between host types were equal. The maximum likelihood solutions shown agreed exactly with parsimony-based reconstructions (not shown) and with our own Bayesian estimates (Table 1; Figure 1).

XML S1: Example XML code of the model implemented in the study.

Table S2 (rates in substitutions/site/million years).

Strains   Ref.                  Total rate            Codon positions 1 & 2   Codon position 3
                                Median (95% CI)       Median (95% CI)         Median (95% CI)
ST239     [8] and this study    2.25 (1.99, 2.52)     2.13 (1.88, 2.39)       2.49 (2.18, 2.81)
ST225     [10]                  2.00 (1.20, 2.90)     -                       -
ST5       [9]                   4.90 (1.80, 8.70)     -                       -
Global    This study            1.07 (0.92, 1.23)     0.39 (0.29, 0.51)       2.43 (2.12, 2.74)

REFERENCES

1. Kosakovsky Pond S.L., Posada D., Gravenor M.B., Woelk C.H., Frost S.D.W. 2006 GARD: a genetic algorithm for recombination detection. Bioinformatics 22(24), 3096-98.
2. Hasegawa M., Kishino H., et al. 1985 Dating the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160-74.
3. Ochman H., Elwyn S., et al. 1999 Calibrating bacterial evolution. Proceedings of the National Academy of Sciences 96(22), 12638-43. (doi:10.1073/pnas.96.22.12638).
4. Ochman H., Wilson A.C. 1987 Evolution in bacteria: Evidence for a universal substitution rate in cellular genomes. Journal of Molecular Evolution 26(1), 74-86. (doi:10.1007/bf02111283).
5. Drummond A., Nicholls G., et al. 2002 Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161, 1307-20.
6. Drummond A., Rambaut A. 2007 BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7(1), 214.
7. Rambaut A. 2000 Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16(4), 395-99. (doi:10.1093/bioinformatics/16.4.395).
8. Harris S.R., Feil E.J., et al. 2010 Evolution of MRSA During Hospital Transmission and Intercontinental Spread. Science 327(5964), 469-74.
9. Lowder B.V., Guinane C.M., et al. 2009 Recent human-to-poultry host jump, adaptation, and pandemic spread of Staphylococcus aureus. Proceedings of the National Academy of Sciences 106(46), 19545-50.
10. Nübel U., Dordel J., et al. 2010 A Timescale for Evolution, Population Expansion, and Spatial Spread of an Emerging Clone of Methicillin-Resistant Staphylococcus aureus. PLoS Pathogens 6(4), e1000855.
11. Smyth D.S., McDougal L.K., et al. 2010 Population Structure of a Hybrid Clonal Group of Methicillin-Resistant Staphylococcus aureus, ST239-MRSA-III. PLoS ONE 5(1), e8582.
12. Hasegawa M., Cao Y., et al. 1998 Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Molecular Biology and Evolution 15(11), 1499-505.
13. Drummond A.J., Suchard M.A., et al. 2012 Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution. (doi:10.1093/molbev/mss075).
14. Drummond A., Ho S., et al. 2006 Relaxed phylogenetics and dating with confidence. PLoS Biology 4, e88.
15. Xia X., Lemey P. 2009 Assessing substitution saturation with DAMBE. In The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny. 2nd Edition (eds. Lemey P., Salemi M., Vandamme A.-M.). Cambridge, Cambridge University Press.
16. Xia X., Xie Z., et al. 2003 An index of substitution saturation and its application. Molecular Phylogenetics and Evolution 26(1), 1-7.
17. Brown W., Prager E., et al. 1982 Mitochondrial DNA sequences of primates: tempo and mode of evolution. Journal of Molecular Evolution 18(4), 225-39.
18. Lemey P., Rambaut A., et al. 2009 Bayesian Phylogeography Finds Its Roots. PLoS Computational Biology 5(9), e1000520.
19. Edwards C.J., Suchard M.A., et al. 2011 Ancient Hybridization and an Irish Origin for the Modern Polar Bear Matriline. Current Biology 21(15), 1251-58. (doi:10.1016/j.cub.2011.05.058).
20. Minin V., Suchard M. 2008 Counting labeled transitions in continuous-time Markov models of evolution. Journal of Mathematical Biology 56(3), 391-412. (doi:10.1007/s00285-007-0120-8).
21. O'Brien J.D., Minin V.N., et al. 2009 Learning to Count: Robust Estimates for Labeled Distances between Molecular Sequences. Molecular Biology and Evolution 26(4), 801-14. (doi:10.1093/molbev/msp003).
22. Maddison W.P., Maddison D.R. 2011 Mesquite: a modular system for evolutionary analysis. Version 2.75.
23. Enright M.C., Day N.P.J., et al. 2000 Multilocus Sequence Typing for Characterization of Methicillin-Resistant and Methicillin-Susceptible Clones of Staphylococcus aureus. Journal of Clinical Microbiology 38(3), 1008-15.
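
Supplementary illustration for Section 5. The equal-rates assumption of the Mk1 model can be made concrete with a short Python sketch of the transition probabilities it implies along a single branch. This is a minimal sketch under stated assumptions, not Mesquite's or BEAST's implementation: the function name equal_rates_transition_matrix is hypothetical, the host labels are taken from the example given in Section 5, and the rate and branch length are arbitrary values chosen only for illustration. It uses the standard closed form for a continuous-time Markov chain in which every state-to-state rate equals the same value q, namely P_ii(t) = 1/k + ((k-1)/k) exp(-kqt) and P_ij(t) = (1/k)(1 - exp(-kqt)) for k states.

```python
import math

# Host labels as used in the example of Section 5.
HOSTS = ("human", "bovid", "avian")


def equal_rates_transition_matrix(n_states, rate, time):
    """Return P(time) for a chain in which every state-to-state rate equals `rate`.

    Closed form for the equal-rates rate matrix (off-diagonal entries `rate`,
    diagonal entries -(n_states - 1) * rate).
    """
    decay = math.exp(-n_states * rate * time)
    stay = 1.0 / n_states + (n_states - 1.0) / n_states * decay
    move = (1.0 - decay) / n_states
    return [[stay if i == j else move for j in range(n_states)]
            for i in range(n_states)]


# Hypothetical rate and branch length, chosen only for illustration.
P = equal_rates_transition_matrix(len(HOSTS), rate=0.2, time=1.5)
for host, row in zip(HOSTS, P):
    print(host, [round(p, 3) for p in row])
```

Because a single rate governs every switch, the matrix is symmetric: a human-to-bovid switch is exactly as probable as a bovid-to-human or bovid-to-avian switch over the same branch. The non-reversible rate matrix described in Section 4 relaxes this by allowing a separate rate for each ordered pair of host types.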