Supplementary information

Sample

We analyzed a bone from a mammoth from Berelekh, Yakutia, Russia (71°N, 145°E). The sample was dated by accelerator mass spectrometry (AMS) at the Leibniz Labor für Altersbestimmung und Isotopenforschung, Christian-Albrechts-Universität Kiel, Germany (Table S1). The collagen fraction contained more than the minimum of 1 mg carbon recommended for AMS dating, and the δ13C value is in the normal range for organic bone samples. The calibrated age (two-sigma range, 95% probability) lies between 11,900 and 13,400 years.

Table S1.

Dating number | Weight (g) | Carbon (mg) | Collagen (%) | Radiocarbon age | δ13C (‰)
KIA 25289 | 0.485 | 4.1 | 24.0 | 12,170 ± 50 BP | -20.50 ± 0.13

Primer design, DNA extraction and amplification

Leipzig

Based on published sequences of African (Loxodonta africana) and Asian (Elephas maximus) elephant as well as dugong (Dugong dugon), we designed 46 primer pairs that, assuming homology with the extant elephant species, are expected to amplify DNA fragments of 291 to 580 bp (including primers) that together cover the entire mtDNA of the woolly mammoth (Mammuthus primigenius; Fig. 1, Table S3). To avoid amplification of short products spanning the overlaps between adjacent fragments during the multiplex step, we divided the primer pairs into two sets such that the amplification products within a set do not overlap. DNA was extracted from 747 mg of bone material1 and eluted in a final volume of 70 µl 1x TE. Two multiplex PCRs were initiated using DNA extract corresponding to approximately 15 mg of bone. The final reagent concentrations in the amplification reactions were 1x AmpliTaq Gold buffer, 4 mM MgCl2, 200 µM of each dNTP, 1 µM of each of the 46 primers (23 primer pairs, see Table S3) and 2 U of AmpliTaq Gold (Applied Biosystems). PCRs were started with 9 min at 94°C to activate the polymerase, followed by 27 cycles of 94°C for 20 s, 52°C for 30 s and 72°C for 1 min.
This multiplex amplification was diluted 40-fold, and 5 µl of the dilution was used as template in each of 23 single amplification reactions. The specificity of these secondary amplifications was improved by the use of 'nested' primers internal to those used in the primary multiplex amplification (see Table S3), which resulted in final amplification products of 291 to 567 bp (including primers, see Table S3). Reagent concentrations were as above, except that a single primer pair was used, at 1.5 µM for each primer. The temperature profile was identical to the one above except that 33 instead of 27 cycles were used. To avoid amplification of contaminating modern human DNA, which is ubiquitous in the environment2-5, at least one primer per amplified fragment carried a mismatch to the corresponding human mtDNA sequence at its 3′ end6. Amplification products were visualized on 2.5% agarose gels. Products of the correct size were cloned using the TOPO TA cloning kit (Invitrogen, The Netherlands), and a minimum of three clones was sequenced on an ABI3730 capillary sequencer (Applied Biosystems). Where primer dimers were observed, products of the correct length were isolated from the gel and purified using the QIAquick gel extraction kit (Qiagen, Germany) before cloning. Amplification and extraction controls were negative throughout all experiments. Of five PCRs that initially failed, four were successful in later attempts, suggesting that the initial failures were due to the absence of even a single template molecule in those reactions. The fifth PCR failed repeatedly; inspection of the sequences of the overlapping amplicons revealed differences between the mammoth and elephant sequences sufficient to account for the failure. Altogether, we sequenced 842 clones from 121 PCR products obtained from 14 initial multiplex amplifications. The complete mammoth mtDNA sequence has been deposited in GenBank (accession number DQ188829).
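Two steps of the Leipzig protocol lend themselves to small sketches. The first part below illustrates dividing overlapping, tiled fragments into two sets whose products never overlap; it assumes (hypothetically) that each fragment overlaps only its immediate neighbours, so alternating the fragments along the genome suffices, and the coordinates are invented for illustration. The second part illustrates the reasoning behind the sporadic PCR failures: if the number of template molecules per reaction follows a Poisson distribution with mean λ (an assumption; λ was not measured), a reaction receives no template at all with probability exp(-λ). The function names are hypothetical.

```python
import math
from typing import List, Tuple

# (name, start, end): hypothetical amplicon coordinates, 1-based and inclusive
Amplicon = Tuple[str, int, int]


def overlaps(a: Amplicon, b: Amplicon) -> bool:
    """True if two amplicons share at least one base."""
    return a[1] <= b[2] and b[1] <= a[2]


def split_into_two_sets(amplicons: List[Amplicon]) -> Tuple[List[Amplicon], List[Amplicon]]:
    """Alternate tiled amplicons between two multiplex primer sets.

    Assumes each amplicon overlaps only its immediate neighbours once sorted by
    start position; raises ValueError if two products within a set still overlap.
    """
    ordered = sorted(amplicons, key=lambda a: a[1])
    sets = (ordered[0::2], ordered[1::2])
    for group in sets:
        # For intervals sorted by start, checking adjacent pairs is sufficient.
        for first, second in zip(group, group[1:]):
            if overlaps(first, second):
                raise ValueError(f"{first[0]} and {second[0]} overlap within one set")
    return sets


def p_no_template(mean_copies: float) -> float:
    """Poisson probability that a PCR receives zero template molecules."""
    return math.exp(-mean_copies)


def p_repeated_failure(mean_copies: float, attempts: int) -> float:
    """Probability that all `attempts` independent PCRs receive zero template."""
    return p_no_template(mean_copies) ** attempts


# Hypothetical tiling of four fragments, each overlapping its neighbours.
tiles = [("F1", 1, 400), ("F2", 350, 750), ("F3", 700, 1100), ("F4", 1050, 1450)]
set_1, set_2 = split_into_two_sets(tiles)
print([a[0] for a in set_1], [a[0] for a in set_2])  # ['F1', 'F3'] ['F2', 'F4']

for lam in (0.5, 1.0, 3.0):  # hypothetical mean template copies per reaction
    print(f"lambda={lam}: P(no template)={p_no_template(lam):.3f}, "
          f"P(three failures)={p_repeated_failure(lam, 3):.4f}")
```

Under this model an isolated failure is unsurprising when λ is around one, whereas a fragment that fails in every attempt at moderate λ points to a systematic cause, such as the sequence differences invoked for the fifth PCR.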
We observed consistent substitutions between the products derived from one primary amplification and those from the second in 17 amplifications representing 13 of the 46 fragments, indicating that these amplifications started from a single template molecule7. All 39 such substitutions were C to T or G to A substitutions. This type of substitution occurs frequently in ancient DNA8-10 and, where tested, has been shown to result from deamination of cytosine in the ancient template molecules9. We amplified these fragments, together with fragments that had failed to amplify in either of the first two amplifications, in additional multiplex reactions. We failed to obtain a second amplification for one fragment (primer pair A23). In the single amplification we obtained, this fragment contains 55 copies of a 6 bp motif. Varying copy numbers of this repetitive motif are also found in African and Asian elephant and dugong. As such repeats are known to vary in copy number both between and within individuals11, this fragment was excluded from all further analyses.

Besides requiring only small amounts of bone to sequence the complete mtDNA genome of an extinct species, the nested primer approach used here has two further advantages. First, nested primers improve the specificity of the final amplification products. Second, the risk of contamination with previously amplified PCR products is greatly reduced: the second-stage PCR products, which are amplified to very high levels, do not carry the external priming sites and therefore cannot contaminate later primary PCRs.

It has recently been claimed that ancient DNA extracts may contain mutagenic substances that affect particular positions in DNA sequences12, 13. We tested whether our mammoth DNA extract contained such mutagenic factors.
To this end, we amplified 274 bp of the mtDNA control region from 50 ng of chimpanzee DNA in the presence of water, an extraction blank or the mammoth extract, as previously described14. The PCR products were cloned and a minimum of six clones was sequenced. Consistent with previous results14, 15, we did not find any evidence for a mutagenic factor in our extract (data not shown).

Cambridge

Bone (0.5 g) was extracted using the same protocol as in Leipzig. Instead of an extraction control without bone, a second extraction using cave bear bone was carried out in parallel to monitor for possible carrier effects16. Amplification was performed using the same PCR conditions as in Leipzig, but with a reduced number of primer pairs. Both water controls and the cave bear extract were negative for mammoth-specific products, although the cave bear extract yielded products of lengths other than the expected ones with some primer pairs. For eight different fragments distributed over the whole mitochondrial genome (12S, ND2, t-RNA Ser, t-RNA Asp, COX1, COX2, COX3, ND4, Cytb, D-Loop), between two and four amplification products originating from independent primary PCRs were sequenced in both directions. A total of 3,824 bp were amplified and sequenced. Two amplification products of a single fragment each carried an A at one position. At one of these positions, three additional amplification products of the same fragment carried a G; at the other, two additional amplification products carried a G and a third was heterogeneous for G/A. As described above, such consistent substitutions between amplifications most likely result from cytosine deamination in the template DNA. The consensus sequences for all fragments were identical to the corresponding sequences determined in Leipzig.

London

DNA was extracted from 0.2 g of bone powder using the method described in Leonard et al.17, modified by using a Spex freezer mill to powder the bone.
For amplification, 1 µl of mammoth DNA extract was used in an antibody-mediated hot-start PCR with a total volume of 25 µl. The final reagent concentrations in the amplification reactions were 1x Platinum Taq HiFi buffer, 2 mM MgSO4, 250 µM of each dNTP, 1 µM of each primer, 1 mg/ml bovine serum albumin and 1 U of Platinum Taq HiFi (Invitrogen). The PCR comprised an initial denaturation of 5 min at 94°C, followed by 44 cycles of 52°C for 1 min, 68°C for 1 min and 94°C for 1 min. Amplification products were visualized on 2.5% agarose gels, and the resulting PCR products were sequenced directly from both strands on an ABI3700 capillary sequencer (Applied Biosystems). Amplification and extraction controls were negative throughout all experiments. The primers were designed using the mitochondrial sequences of Asian and African elephant for the Cytb, t-RNA Thr, t-RNA Pro and D-Loop regions:

Primer | Sequence
Mammoth_14125_F | ATCTGAAAAACCATCGTTGTATTTC
Mammoth_14387_R | GATGCTCCGTTTGAGTGTAGTTG
Mammoth_14232_F | CTACCCCATCCAACATCTCAAC
Mammoth_14585_R | TAAGGGATTGCTGAGAAAAGGTTAGT
Mammoth_15038_F | GGCGTCCTAGCCCTACTCCTATCAAT
Mammoth_15399_R | TTGTTTGCAGGGAATAGTTTAAGAAG
Mammoth_15178_F | TGAATTGGCAGCCAACCAGTAGAA
Mammoth_15530_R | TATAAGCATGGGGTAAATAATGTGATG
Mammoth_15393_F | CCTCGCTATCAATACCCAAAACTG
Mammoth_15780_R | CGAGAAGAGGGACACGAAGATG

The sequences determined are identical to the corresponding sequences determined in Leipzig except at one position, where a single amplification in London carried a T whereas three independent primary amplifications in Leipzig yielded a C. The overall consensus nucleotide at this position is thus C, by a ratio of 3:1 primary amplifications. Moreover, the minority nucleotide (T) can be explained by cytosine deamination in the template DNA. We therefore conclude that the majority nucleotide (C) represents the correct sequence.
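The replication rule applied in Leipzig and Cambridge (each position determined from two independent PCRs, a third PCR in case of discrepancy, and the nucleotide observed twice taken as correct), together with the check that a minority base is explicable by cytosine deamination, can be sketched per alignment position as follows. This is an illustrative sketch, not the authors' software, and the function names are hypothetical. The deamination check asks whether the minority base is a T opposite a majority C, or an A opposite a majority G (damage on the complementary strand).

```python
from collections import Counter
from typing import Optional, Sequence

# (majority base, minority base) pairs compatible with cytosine deamination:
# a C read as T on this strand, or a G read as A when the deaminated cytosine
# was on the complementary strand.
DEAMINATION_PAIRS = {("C", "T"), ("G", "A")}


def call_position(bases: Sequence[str]) -> Optional[str]:
    """Apply the two-plus-one replication rule to one position.

    `bases` lists the nucleotide seen in each independent PCR, in the order the
    PCRs were done. Returns the accepted nucleotide, or None if another PCR is
    needed or no nucleotide is observed twice among the first three.
    """
    if len(bases) < 2:
        return None
    if bases[0] == bases[1]:
        return bases[0]
    if len(bases) < 3:
        return None  # discrepancy between the first two PCRs: do a third
    base, count = Counter(bases[:3]).most_common(1)[0]
    return base if count >= 2 else None


def explained_by_deamination(majority: str, minority: str) -> bool:
    """True if the minority base is what cytosine deamination would produce."""
    return (majority.upper(), minority.upper()) in DEAMINATION_PAIRS


# The London/Leipzig case described above: three PCRs gave C, one gave T
# (the order of the observations is chosen only for illustration).
observations = ["C", "T", "C", "C"]
consensus = call_position(observations)
minority = [b for b in observations if b != consensus]
print(consensus, all(explained_by_deamination(consensus, b) for b in minority))
```

The sketch only encodes the decision rule; it does not attempt to reproduce the error probability estimated for that rule.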
Combining the data from all three laboratories, we observed 42 consistent changes in 20 amplification products representing 14 different fragments. Given a previously estimated damage rate of 2% for cytosine deamination9, and following an approach in which each position is determined from two independent PCRs and, in case of a discrepancy, a third PCR is performed and the nucleotide observed twice is taken to represent the correct sequence, the chance of incorrectly determining a position is 0.012%9. It is therefore not surprising that the consensus sequences from Leipzig and Cambridge, where this approach was followed, are identical. In contrast, the sequences from London, which were determined from single PCRs only, differ from the Leipzig consensus at one position. Given that both strands of the amplified fragments together contain 456 cytosines, this result is not surprising for ancient DNA analyses.

Analyses

The entire mitochondrial genome of the mammoth is 16,770 bp long and, like those of its extant relatives, carries 22 tRNA genes, 13 protein-coding genes and two rRNA genes. All genes inferred to encode proteins show open reading frames of the expected length, and all tRNAs show the expected anticodon sequence when folded into their two-dimensional structures; both observations argue against the amplification of nuclear insertions. The total length of the mtDNA genome is likely to vary among, and perhaps even within, individual mammoths, since the control region contains a repeat motif that is also found in African and Asian elephants and dugong. Such repeats are known to vary in copy number both between and within individuals11 and may also induce in vitro recombination during PCR18. Hence, the number of repeat units reported here cannot be taken as representative without further study. The mtDNA genomes of African (accession number NC_000934) and Asian (accession number NC_005129) elephant are 16,866 and 16,831 bp long, respectively.
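Because the reported genome length depends on how many units of the control-region repeat were read, it can help to count the longest uninterrupted run of a motif directly from a sequence. The sketch below does this by exact matching in every frame of the motif; the function name, motif and sequence are hypothetical placeholders, not the mammoth repeat, and real data would also need to tolerate variant copies.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive exact copies of `motif` in `seq`.

    Every offset 0..len(motif)-1 is scanned in steps of len(motif), so a run
    starting at any position is found in exactly one of these frames.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must not be empty")
    best = 0
    for offset in range(k):
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Hypothetical example: five copies of a made-up 6 bp motif flanked by other bases.
example = "GGT" + "ACGTCA" * 5 + "TTG"
copies = longest_tandem_run(example, "ACGTCA")
print(copies, "copies =", copies * 6, "bp of repeat")
```

Each additional or missing repeat unit changes the total genome length by the motif length, which is why copy-number differences alone can make otherwise identical genomes differ in length.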
We estimated phylogenetic trees using maximum parsimony, maximum likelihood, neighbor-joining and Bayesian tree-building methods, with hyrax and dugong as outgroups either alone or in combination, using 1000 bootstrap replicates and 3 million chains in the Bayesian analyses. However, we could not resolve the phylogeny of mammoth, African and Asian elephant unambiguously (Table S2). As two different tests did not reject the assumption of a molecular clock for these three species, we restricted the analyses to the three Elephantidae species. To test whether the phylogenetic signal in the data is strong enough to warrant a resolved sequence tree, we proceeded as follows. The likelihood of the data was assessed under two alternative models: (i) a simple model with a single free parameter, corresponding to a star-like tree topology, and (ii) a more complex model with two free parameters, corresponding to the resolved tree topology. We then applied a likelihood ratio test to infer whether the more complex model explains the data significantly better than the simpler model. This test assumes that the test statistic $2\,(\log L_{\mathrm{resolved}} - \log L_{\mathrm{star}})$ is approximately $\chi^2$-distributed with one degree of freedom. A rejection of the simpler model indicates that the information in the data is sufficient for a meaningful reconstruction of the phylogenetic relationships of the taxa under study. To infer the posterior probability with which each of the three possible resolved sequence trees is supported by the data, we calculated the likelihood of the data separately for each of the three topologies. Posterior probabilities were then calculated as $P(T \mid D) = P(D \mid T)\,P(T) / P(D)$, where $P(D \mid T)$ denotes the likelihood of the data given the tree $T$, $P(T)$ is the prior probability of the tree (set to 1/3 for each of the three trees) and $P(D) = \sum_{i=1}^{3} P(D \mid T_i)\,P(T_i)$ is the total probability of the data over all three possible hypotheses.
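Both likelihood calculations above can be sketched in a few lines. This is an illustrative Python sketch, not the post_prob.pl script used for the analyses, and the function names are hypothetical; it assumes the log likelihoods have already been computed by a phylogenetics program. For one degree of freedom, the upper-tail χ² probability equals erfc(√(x/2)), so no statistics library is needed. With equal priors of 1/3 the prior cancels, and subtracting the largest log likelihood before exponentiating avoids numerical underflow. The star-tree log likelihood in the example is hypothetical, whereas the three resolved-tree values are those reported in the next paragraph.

```python
import math
from typing import Dict, Tuple


def likelihood_ratio_test(loglik_star: float, loglik_resolved: float) -> Tuple[float, float]:
    """Test a resolved three-taxon clock tree (2 free parameters) against a star tree (1).

    Returns the statistic 2 * (log L_resolved - log L_star) and its p-value under a
    chi-square distribution with one degree of freedom.
    """
    statistic = max(2.0 * (loglik_resolved - loglik_star), 0.0)  # nested models: clip noise
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-square(1) upper tail
    return statistic, p_value


def posterior_probabilities(logliks: Dict[str, float]) -> Dict[str, float]:
    """P(T|D) = P(D|T) P(T) / P(D) for each topology, with equal priors P(T)."""
    best = max(logliks.values())
    # exp(log L - best) is proportional to P(D|T); the common factor cancels.
    weights = {tree: math.exp(ll - best) for tree, ll in logliks.items()}
    total = sum(weights.values())  # proportional to P(D)
    return {tree: w / total for tree, w in weights.items()}


logliks = {
    "((Asian elephant, mammoth), African elephant)": -26076.00,
    "((African elephant, mammoth), Asian elephant)": -26079.4,
    "((African elephant, Asian elephant), mammoth)": -26078.8,
}
for tree, prob in posterior_probabilities(logliks).items():
    print(f"{tree}: {prob:.3f}")

# Hypothetical star-tree log likelihood, for illustration only.
stat, p = likelihood_ratio_test(-26080.0, max(logliks.values()))
print(f"LRT statistic {stat:.2f}, p = {p:.4f}")
```

With the three reported log likelihoods, this sketch prints posterior probabilities of about 0.914, 0.031 and 0.056 for the three topologies, in the order listed.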
Our analysis obtained the following log likelihoods: ((Asian elephant, mammoth), African elephant): -26076.00; ((African elephant, mammoth), Asian elephant): -26079.4; and ((African elephant, Asian elephant), mammoth): -26078.8. Posterior probabilities were calculated from these log likelihoods using the Perl script post_prob.pl.

The short internal branch of the phylogenetic tree for mammoth, African and Asian elephant makes it likely that polymorphisms of the ancestral species persisted through the interval between the two speciation events19. This situation occurs, for example, for nuclear markers in humans, chimpanzees and gorillas20. As the internal branch for mammoth, African and Asian elephant is even shorter, relative to the overall length of the tree, than that for the three primate species, the problem of incomplete lineage sorting is likely to be even more severe for mammoth, African and Asian elephant than for humans and African great apes. Thus, despite the smaller effective population size of mitochondrial DNA compared to nuclear sequences, and contrary to the case of humans and African great apes, the phylogeny obtained for the mtDNA genomes of mammoth, African and Asian elephant may not represent the species phylogeny. However, as no comprehensive data are available on either the generation time or the effective population size of mammoth and the living elephants, it is not possible to estimate the probability that the mtDNA sequence phylogeny differs from the species phylogeny.

Finally, the comparison between the mammoth, African and Asian elephant tree and the human, chimpanzee and gorilla (HCG) tree (Fig. 3b) reveals striking differences. The HCG mtDNA divergence is more than twice as great, and the internal branch is more than twice as long relative to the total number of changes in the tree (17.6% vs. 7.3%). Both the fossil record21 and molecular estimates22 indicate that humans and chimpanzees diverged roughly six million years ago.
Evidently, either the substitution rates on the HCG and Elephantidae trees differ by more than a factor of two, or the common ancestor of the Elephantidae lived much more recently than six million years ago. However, the fossil record indicates a common ancestor around six million years ago for the Elephantidae as well23. Thus, the fact that the overall length of the phylogenetic tree for mammoth, African and Asian elephant is only about half that of the tree for human, chimpanzee and gorilla (Fig. 3a) indicates a slower rate of nucleotide substitution in the mitochondrial DNA of mammoth, African and Asian elephant. The temporally close divergences around six million years ago thus inferred for both groups raise the possibility that both population divergences were triggered by the same cause. Further analyses are necessary to confirm or reject this hypothesis.

Table S2. Results using various tree-building methods and outgroups. NJ: neighbor joining; MP: maximum parsimony; ML: maximum likelihood. For each combination of tree-building method and outgroup, the bootstrap value (or posterior probability for the Bayesian trees) and the sister-group relationship are shown (M-L: mammoth and African elephant; M-E: mammoth and Asian elephant).

Outgroup | NJ | MP | ML | Bayesian
Dugong | 73 / M-L | 62 / M-E | 56 / M-L | 97 / M-L
Hyrax | 83 / M-E | 93 / M-E | 79 / M-E | 91 / M-E
Both | 87 / M-E | 90 / M-E | 54 / M-L | 100 / M-E

Adding the published nuclear DNA sequences shared between mammoth, Asian and African elephants and the outgroup species does not change the tree topologies inferred from the mtDNA sequences alone. In all cases, the bootstrap values and posterior probabilities did not differ significantly from those obtained for the mtDNA-only alignment; over multiple runs, scores did not vary by more than ±3 from the values of the mtDNA-only analyses.

References

1. Hofreiter, M. et al. Evidence for reproductive isolation between cave bear populations. Curr Biol 14, 40-3 (2004).
2. Kolman, C. J. & Tuross, N. Ancient DNA analysis of human populations. Am J Phys Anthropol 111, 5-23 (2000).
3. Hofreiter, M., Serre, D., Poinar, H. N., Kuch, M. & Pääbo, S. Ancient DNA. Nat Rev Genet 2, 353-9 (2001).
4. Wandeler, P., Smith, S., Morin, P. A., Pettifor, R. A. & Funk, S. M. Patterns of nuclear DNA degeneration over time: a case study in historic teeth samples. Mol Ecol 12, 1087-93 (2003).
5. Serre, D. et al. No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol 2, 313-317 (2004).
6. Kwok, S. et al. Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies. Nucleic Acids Res 18, 999-1005 (1990).
7. Handt, O., Krings, M., Ward, R. H. & Pääbo, S. The retrieval of ancient human DNA sequences. Am J Hum Genet 59, 368-76 (1996).
8. Hansen, A., Willerslev, E., Wiuf, C., Mourier, T. & Arctander, P. Statistical evidence for miscoding lesions in ancient DNA templates. Mol Biol Evol 18, 262-5 (2001).
9. Hofreiter, M., Jaenicke, V., Serre, D., von Haeseler, A. & Pääbo, S. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29, 4793-9 (2001).
10. Gilbert, M. T. et al. Characterization of genetic miscoding lesions caused by postmortem damage. Am J Hum Genet 72, 48-61 (2003).
11. Lunt, D. H., Whipple, L. E. & Hyman, B. C. Mitochondrial DNA variable number tandem repeats (VNTRs): utility and problems in molecular ecology. Mol Ecol 7, 1441-55 (1998).
12. Pusch, C. M. & Bachmann, L. Spiking of contemporary human template DNA with ancient DNA extracts induces mutations under PCR and generates nonauthentic mitochondrial sequences. Mol Biol Evol 21, 957-964 (2004).
13. Pusch, C. M. et al. PCR-induced sequence alterations hamper the typing of prehistoric bone samples for diagnostic achondroplasia mutations. Mol Biol Evol 21, 2005-11 (2004).
14. Serre, D., Hofreiter, M. & Pääbo, S. Mutations induced by ancient DNA extracts? Mol Biol Evol 21, 1463-1467 (2004).
15. Hofreiter, M. et al. Lack of phylogeography in European mammals before the last glaciation. Proc Natl Acad Sci U S A 101, 12963-8 (2004).
16. Handt, O., Höss, M., Krings, M. & Pääbo, S. Ancient DNA: methodological challenges. Experientia 50, 524-9 (1994).
17. Leonard, J. A., Wayne, R. K. & Cooper, A. Population genetics of ice age brown bears. Proc Natl Acad Sci U S A 97, 1651-4 (2000).
18. Shinde, D., Lai, Y., Sun, F. & Arnheim, N. Taq DNA polymerase slippage mutation rates measured by PCR and quasi-likelihood analysis: (CA/GT)n and (A/T)n microsatellites. Nucleic Acids Res 31, 974-80 (2003).
19. Nei, M. in Evolutionary perspectives and the new genetics (eds. Gershowitz, H., Rucknagel, D. L. & Tashian, R. E.) 133-147 (Alan R. Liss, Inc., New York, 1986).
20. Chen, F. C., Vallender, E. J., Wang, H., Tzeng, C. S. & Li, W. H. Genomic divergence between human and chimpanzee estimated from large-scale alignments of genomic sequences. J Hered 92, 481-9 (2001).
21. Brunet, M. et al. A new hominid from the Upper Miocene of Chad, Central Africa. Nature 418, 145-51 (2002).
22. Glazko, G. V. & Nei, M. Estimation of divergence times for major lineages of primate species. Mol Biol Evol 20, 424-34 (2003).
23. Tassy, P. Elephantoidea from Lothagam. in Lothagam: The Dawn of Humanity in Eastern Africa (eds. Leakey, M. G. & Harris, J. M.) (Columbia University Press, New York, 2003).