Electronic-Supplementary-Text - Proceedings of the Royal Society B

Supplementary Text
Pretreatment of the samples and DNA extraction method
All the reagents were prepared in a laboratory specifically dedicated to work with
ancient DNA under strict sterility conditions. Before starting the extraction process,
the samples were briefly cleaned with a 5% hypochlorite solution while handling
them as little as possible. The entire process was conducted inside a laminar flow hood.
Each sample was obtained using a different set of dental instruments that included
forceps and a high-speed dental diamond bur fitted to a 220 V micromotor. The
powder was extracted directly from the lesion itself and placed in a 15 ml polypropylene
tube, where it was incubated overnight at 37 °C in 5 ml of extraction buffer (250 µl of
1 M Tris-HCl (pH 8.0-8.5), 250 µl of 10% SDS, 250 µl of sterile deionised water and
4.25 ml of 0.5 M EDTA) and 50 µl of 0.01 g/ml proteinase K. After incubation, the DNA was
extracted with a standard phenol-chloroform extraction protocol and the
aqueous phase was concentrated using a Centricon-30 filter column (Millipore) up to a
30 ml volume [13]. An extraction blank was prepared in each extraction process to
check for contamination.
Sequence used as reference and other modern sequences
The Streptococcus mutans (S. mutans) D49430.1 strain [1] was used as the reference. The
numbering starts from the first nucleotide of the S. mutans
dexA gene for dextranase, complete cds, accession number D49430.1. The other modern
sequences used are UA159 [2] and GS5 (USA) [3], NN2025 and LJ23 (Japan) [4,5],
ATCC 25175 (England) [6], 5DC8 (England), KK21, KK23 and AC4446 (Germany)
and NCTC11060 (Denmark) [7].
Ancient sequences obtained
All the full-length ancient sequences retrieved from caries were newly obtained in this
work. Some aspects of the ancient human populations from these sites have been described
elsewhere: M1 (Catalonia) [8], CR1 (Majorca) [9], V1 (Universitat Autònoma de Barcelona
(UAB) archaeological collection) (Catalonia, present study), SP1 and SP2 (Catalonia)
[10], T1, T2, LO1 and LO2 (Mexico) [11] and U1 (UAB archaeological collection)
(Catalonia, present study).
Determination of human mitochondrial haplogroups
The mitochondrial haplogroups of two individuals, one of European and one of
American origin, were determined by combining the information obtained by
sequencing the second half of Hypervariable Region I (nucleotide positions
16210 to 16400) and by generating restriction fragment length polymorphisms.
Detailed data
Branch-site test of positive selection
The branch-site test of positive selection [14,15] was applied. This compares the
modified model A with the corresponding null model with ω fixed at 1 (fix_omega = 1
and omega = 1). The χ²₁ distribution, with critical values 3.84 and 5.99, was used to guard against
violations of model assumptions, as recommended in [16]. To calculate the p value
based on the mixture distribution (a 50:50 mixture of a point mass at 0 and χ²₁), p was
first calculated from 2∆l using χ²₁, and the obtained
value was then divided by 2. As no orthologous sequences from a closely related species
could be found by performing a BLAST search [17] (see the electronic supplementary
material, table S8), the most ancient sample studied, M1, was taken as the background
branch when applying the branch-site test of positive selection to the ancient sequence
data set only, or to the whole sequence data set. When applying the test to the modern
sequence data only, strain NN2025 (Japan) was used as the background branch because
it was in the principal node, and Japan was the only country with representatives in the
two principal nodes of the network.
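The p value calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names and the two log-likelihood values are hypothetical, and the χ²₁ tail probability is computed with the closed form erfc(sqrt(x/2)) so that only the Python standard library is needed.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    if x <= 0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_p_values(lnl_null, lnl_alt):
    """Return (2*delta_l, chi2_1 p value, mixture p value).

    lnl_null: log-likelihood of the null model (omega fixed at 1).
    lnl_alt:  log-likelihood of the modified model A.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_chi2 = chi2_sf_1df(stat)
    # Under the 50:50 mixture of a point mass at 0 and chi2_1,
    # the p value is half of the chi2_1 p value when the statistic is > 0.
    p_mixture = p_chi2 / 2.0 if stat > 0 else 1.0
    return stat, p_chi2, p_mixture


# Hypothetical log-likelihoods, for illustration only.
stat, p_chi2, p_mixture = branch_site_p_values(lnl_null=-120.0, lnl_alt=-117.5)
print(f"2*delta_l = {stat:.3f}, chi2_1 p = {p_chi2:.4f}, mixture p = {p_mixture:.4f}")
```

Reporting both values makes it easy to see whether a result is significant only under the mixture distribution or also under the more conservative χ²₁ criterion.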
Likelihood-ratio tests using the site models
Three of the tests supported by the PAML package were carried out. The first test compared
the one-ratio model (M0) with the discrete model (M3), which tests whether ω can vary
among sites. The two likelihood-ratio tests (LRTs) used to check for positive selection
were M1a vs M2a and M7 vs M8 [18]. In these tests, a null model that does not allow
ω>1 in the class distribution of this value (M1a and M7) is compared with an alternative
model that does (M2a and M8, respectively). These are the two best LRTs used so far to
test for positive selection [19]. Twice the log-likelihood difference (2∆l) between the
values obtained under these models can be compared to a χ² distribution with 4 degrees
of freedom (df) in M0 vs M3, and with 2 df in M1a vs M2a and M7 vs M8 [20]. The
F3x4 model of codon frequencies (the equilibrium codon frequencies are calculated
from the average nucleotide frequencies at the three codon positions) was used to
accommodate biased codon usage. Under the conditions set by positive selection
models M2a and M8, the most likely site category (with the associated dN/dS ratio) at
each codon (amino acid) site was inferred. In all cases, branch lengths were fixed at
their maximum-likelihood estimates (MLEs) under M0 (one-ratio) to save
computation, as several previous studies have shown that tests of positive selection and
detection of specific sites under its action are insensitive to minor errors in the tree
topology or to different estimates of branch lengths [21-23]. After detecting that sites
under positive selection were present using LRTs, we applied a procedure known as
Bayes Empirical Bayes (BEB) [24] to calculate the posterior probabilities that each site
belonged to the class ω>1. BEB appears to avoid the high false-positive rates of the
naïve empirical Bayes (NEB) approach in small non-informative data sets, as it better
accommodates uncertainties in the MLE of parameters in the ω distribution [15].
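The F3x4 codon-frequency model mentioned above can be illustrated with a short sketch. It is not PAML's implementation: the function names and the two toy sequences are hypothetical, nucleotide counts are simply pooled over all sequences, and the stop codons TAA, TAG and TGA are assumed (they are the same in the standard and bacterial genetic codes).

```python
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed genetic code


def position_frequencies(codon_seqs):
    """Nucleotide frequencies at codon positions 1, 2 and 3, pooled over sequences."""
    counts = [dict.fromkeys(BASES, 0) for _ in range(3)]
    for seq in codon_seqs:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3].upper()
            if set(codon) <= set(BASES):  # skip codons with gaps or ambiguity codes
                for pos, base in enumerate(codon):
                    counts[pos][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: c[b] / total for b in BASES})
    return freqs


def f3x4(codon_seqs):
    """Equilibrium frequencies of the 61 sense codons under F3x4."""
    f1, f2, f3 = position_frequencies(codon_seqs)
    raw = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            # Product of the position-specific nucleotide frequencies.
            raw[codon] = f1[b1] * f2[b2] * f3[b3]
    norm = sum(raw.values())  # renormalise after dropping stop codons
    return {codon: value / norm for codon, value in raw.items()}


# Toy in-frame sequences, for illustration only.
pi = f3x4(["ATGGCTAAA", "ATGGCCAAG"])
print(len(pi), round(sum(pi.values()), 6), round(pi["ATG"], 4))
```

Because F3x4 uses only nine free parameters (three per codon position), it can capture biased codon usage without estimating all 61 codon frequencies from a short alignment.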
Different tests of codon selection (twice the log-likelihood difference was calculated and
compared with the corresponding χ² distribution).
For ancient sequences
M0 vs M3: 2 × (-109.230 - (-117.484)) = 16.632, p < 0.05 for χ²₄
M1a vs M2a: 2 × (-109.230 - (-118.530)) = 18.6, p < 0.01 for χ²₂
M7 vs M8: 2 × (-109.230 - (-118.757)) = 19.054, p < 0.01 for χ²₂
For modern sequences
M0 vs M3: 2 × (-129.751 - (-134.917)) = 10.332, p < 0.05 for χ²₄
M1a vs M2a: 2 × (-129.752 - (-135.555)) = 11.506, p < 0.01 for χ²₂
M7 vs M8: 2 × (-130.253 - (-135.816)) = 10.586, p < 0.01 for χ²₂
For all the sequences
M0 vs M3: 2 × (-153.352 - (-174.916)) = 43.128, p < 0.05 for χ²₄
M1a vs M2a: 2 × (-153.352 - (-172.471)) = 38.238, p < 0.01 for χ²₂
M7 vs M8: 2 × (-154.681 - (-172.478)) = 35.594, p < 0.01 for χ²₂
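The likelihood-ratio calculations listed above can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, the log-likelihoods are those reported for the complete data set, and the χ² tail probability uses the closed form that holds for even degrees of freedom (which covers the 2 df and 4 df tests used here).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta_l, p value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Log-likelihoods reported for all the sequences: (null, alternative, df).
tests = {
    "M0 vs M3": (-174.916, -153.352, 4),
    "M1a vs M2a": (-172.471, -153.352, 2),
    "M7 vs M8": (-172.478, -154.681, 2),
}
for name, (lnl_null, lnl_alt, df) in tests.items():
    stat, p = likelihood_ratio_test(lnl_null, lnl_alt, df)
    print(f"{name}: 2*delta_l = {stat:.3f}, df = {df}, p = {p:.2e}")
```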
Codon-by-codon analysis of natural selection
For each codon, the numbers of sites estimated to be synonymous (S) and nonsynonymous (N)
were calculated. These estimates were produced using the joint
maximum-likelihood reconstructions of ancestral states under a Muse-Gaut model [25]
of codon substitution and all the models of nucleotide substitution provided by the
HyPhy software package [26]. To estimate MLE values, a tree topology was
automatically computed. The test statistic dN - dS was calculated, where dS is the
number of synonymous substitutions per site (s/S) and dN is the number of
nonsynonymous substitutions per site (n/N). A positive value of the test statistic indicates
an overabundance of nonsynonymous substitutions. The normalized dN - dS
statistic was obtained using the total number of substitutions in the tree (measured in
expected substitutions per site), so that the two data sets could be compared.
Maximum-likelihood computations of dN and dS were performed using the HyPhy
software package, as done in [27].
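The per-codon statistic defined above reduces to simple arithmetic once the counts are available. The following sketch uses hypothetical function names and made-up counts; it also assumes that normalization means dividing dN - dS by the total tree length, which is a common convention but is not spelled out in the text.

```python
def dn_minus_ds(n_subs, s_subs, n_sites, s_sites):
    """Return dN - dS for one codon.

    n_subs, s_subs:   inferred nonsynonymous (n) and synonymous (s) substitutions.
    n_sites, s_sites: estimated nonsynonymous (N) and synonymous (S) sites.
    """
    if n_sites <= 0 or s_sites <= 0:
        raise ValueError("N and S must be positive for dN and dS to be defined")
    dn = n_subs / n_sites  # dN = n/N
    ds = s_subs / s_sites  # dS = s/S
    return dn - ds


def normalized_dn_minus_ds(stat, tree_length):
    """Scale dN - dS by the total tree length (expected substitutions per site).

    Assumption: normalization is a division by the tree length.
    """
    if tree_length <= 0:
        raise ValueError("tree length must be positive")
    return stat / tree_length


# Made-up counts for a single codon, for illustration only.
stat = dn_minus_ds(n_subs=2.0, s_subs=0.0, n_sites=2.5, s_sites=0.5)
print(stat, normalized_dn_minus_ds(stat, tree_length=0.4))
```

A positive value, as in this toy example, corresponds to an excess of nonsynonymous substitutions at that codon.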
BLAST search
A search for short, nearly exact matches (7 to 20 bp in length) was carried out using
the BLAST program [17] in the Bacillus/Lactobacillus/Streptococcus group (taxid:
91061), together with a search for somewhat similar sequences (blastn) of 20 bp or longer.
Only S. mutans dextranase gave significant results (E value lower than 0.1)
(http://blast.ncbi.nlm.nih.gov/Blast.cgi, last accessed 15/4/13).
References
1 Igarashi, T., Yamamoto, A., Goto, N. 1995 Sequence analysis of the Streptococcus
mutans Ingbritt dexA gene encoding extracellular dextranase. Microbiology and
Immunology 39(11), 853-60.
2 Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B.,
Primeaux, C., Tian, R., Kenton, S., Jia, H., et al. 2002 Genome sequence of
Streptococcus mutans UA159, a cariogenic dental pathogen. Proceedings of the
National Academy of Sciences USA 99(22), 14434–14439.
3 Biswas, S., Biswas, I. 2012 Determination of Complete Genome Sequence of the
Streptococcus mutans GS-5, a serotype c Strain. Journal of Bacteriology 194(17), 4787-4788.
4 Maruyama, F., Kobata, M., Kurokawa, K., Nishida, K., Sakurai, A., Nakano,
K., Nomura, R., Kawabata, S., Ooshima, T., Nakai, K., et al. 2009 Comparative
genomic analyses of Streptococcus mutans provide insights into chromosomal shuffling
and species-specific content. BMC Genomics 10, 358.
5 Aikawa, C., Furukawa, N., Watanabe, T., Minegishi, K., Furukawa, A., Eishi,
Y., Oshima, K., Kurokawa, K., Hattori, M., Nakano, K., et al. 2012 Complete Genome
Sequence of the Serotype k Streptococcus mutans Strain LJ23. Journal of Bacteriology
194(10), 2754-2755.
6 Kim, Y. M., Shimizu, R., Nakai, H., Mori, H., Okuyama, M., Kang, M. S., Fujimoto,
Z., Funane, K., Kim, D., Kimura, A. 2011 Truncation of N- and C-terminal regions of
Streptococcus mutans dextranase enhances catalytic activity. Applied Microbiology and
Biotechnology 91(2), 329-339.
7 Song, L., Sudhakar, P., Wang, W., Conrads, G., Brock, A., Sun, J., Wagner-Döbler, I.,
Zeng, A. P. 2012 A genome-wide study of two-component signal transduction systems
in eight newly sequenced mutans streptococci strains. BMC Genomics 13, 128.
8 Simón, M., Jordana, X., Armentano, N., Santos, C., Díaz, N., Solórzano, E., López, J.
B., González-Ruiz, M., Malgosa, A. 2011 The Presence of Nuclear Families in
Prehistoric Collective Burials Revisited. The Bronze Age Burial of Montanissell Cave
(Spain) in the Light of aDNA. American Journal of Physical Anthropology 146(3),
406-413.
9 Díaz, N. 2009 Bahía de Alcudia, Mallorca: un crisol genético en el Mediterráneo.
PhD Thesis. Barcelona: Universitat Autònoma de Barcelona.
10 Jordana, X. 2007 Caracterització i evolució d'una comunitat medieval catalana.
Estudi bioantropològic de les inhumacions de les esglésies de Sant Pere. PhD Thesis.
Barcelona: Universitat Autònoma de Barcelona.
11 Solórzano, E. 2006 De la Mesoamérica Prehispánica a la Colonial: La huella del
DNA antiguo. PhD Thesis. Barcelona: Universitat Autònoma de Barcelona.
12 van Oven, M., Kayser, M. 2009 Updated comprehensive phylogenetic tree of global
human mitochondrial DNA variation. Human Mutation 30(2), E386-E394.
13 Malgosa, A., Montiel, R., Díaz, N., Solórzano, E., Smerling, A., Isidro, A., García, C.
& Simon, M. 2005. Ancient DNA: a modern look at the infections of the past.
Recent research developments in microbiology (ed. S.G. Pandalai), pp. 213-236.
Trivandrum, India: Research Signpost.
14 Yang, Z., Wong, W. S. W., Nielsen, R. 2005 Bayes Empirical Bayes Inference of
Amino Acid Sites Under Positive Selection. Molecular Biology and Evolution 22(4),
1107-1118.
15 Zhang, J., Nielsen, R., Yang, Z. 2005 Evaluation of an improved branch-site
likelihood method for detecting positive selection at the molecular level. Molecular
Biology and Evolution 22, 2472-2479.
16 Yang, Z. 2007 PAML 4: a program package for phylogenetic analysis by maximum
likelihood. Molecular Biology and Evolution 24, 1586-1591.
17 Altschul, S., Madden, T., Schäffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.
J. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Research 25, 3389-3402.
18 Yang, Z. 1998 Likelihood Ratio Tests for Detecting Positive Selection and
Application to Primate Lysozyme Evolution. Molecular Biology and Evolution 15(5),
568-573.
19 Anisimova, M., Bielawski, J., Dunn, K., Yang, Z. 2007 Phylogenomic analysis of
natural selection pressure in Streptococcus genomes. BMC Evolutionary Biology 7, 154.
20 Yang, Z., Nielsen, R. 2002 Codon-substitution models for detecting molecular
adaptation at individual sites along specific lineages. Molecular Biology and Evolution
19(6), 908-17.
21 Suzuki, Y., Gojobori, T. 1999 A Method for Detecting Positive Selection at Single
Amino Acid Sites. Molecular Biology and Evolution 16(10), 1315–1328.
22 Yang, Z., Nielsen, R., Goldman, N., Pedersen, A. M. 2000 Codon-substitution
models for heterogeneous selection pressure at amino acid sites. Genetics 155(1), 431-449.
23 Swanson, W. J., Yang, Z., Wolfner, M. F., Aquadro, C. F. 2001 Positive Darwinian
selection drives the evolution of several female reproductive proteins in mammals.
Proceedings of the National Academy of Sciences USA 98, 2509–2514.
24 Deely, J. J., Lindley, D. V. 1981 Bayes Empirical Bayes. Journal of the American
Statistical Association 76, 833-841.
25 Muse, S. V., Gaut, B. S. 1994 A likelihood approach for comparing synonymous and
nonsynonymous nucleotide substitution rates, with application to the chloroplast
genome. Molecular Biology and Evolution 11, 715-724.
26 Pond, S. L. K., Frost, S. D. W., Muse, S. V. 2005 HyPhy: hypothesis testing using
phylogenies. Bioinformatics 21, 676-679.
27 Pond, S. L. K., Frost, S. D. W. 2005 Not So Different After All: A Comparison of
Methods for Detecting Amino Acid Sites Under Selection. Molecular Biology and
Evolution 22, 1208-1222.