Supplementary Information

Ruling Out DNA Contamination or Mix-up

DNA contamination or misidentification (sample mix-up) is always a serious concern in studies of HGT. Such studies look for genes that are anomalously placed in molecular phylogenies or anomalously distributed across taxa, and misplaced DNA produces exactly these effects. Indeed, some published claims of HGT in plants have turned out to reflect contamination1,2. DNA contamination or mix-up can be definitively ruled out in all five cases of HGT reported in this study because the results are fully reproducible:

Sanguinaria rps11: We isolated (by direct sequencing of PCR products) the same chimeric rps11 gene, and only this gene, from DNA prepared from three different sources of Sanguinaria material. The third set of DNA preparations and PCR reactions was carried out in a different laboratory. As described in the main text, Sanguinaria rps11 genomic and cDNA sequences are nearly identical, differing only at sites of mitochondrial-specific RNA editing, which further rules out artifact.

Amborella atp1: A eudicot-like atp1 gene of identical sequence was isolated by three independent groups, each working from a different preparation of Amborella DNA (refs. 18 and 19 of main text and this study). Our unpublished data show that Amborella atp1 genomic and cDNA sequences are nearly identical, differing only at sites of mitochondrial-specific RNA editing, which further rules out artifact.

Actinidia rps2: A monophyletic lineage of highly similar rps2 genes of monocot origin was isolated from each of the six Actinidia species sampled.

Caprifoliaceae rps11: A monophyletic lineage of highly similar rps11 genes of Ranunculalean origin was isolated from five genera of Caprifoliaceae (Fig. 2b of main text).

Betulaceae rps11: A monophyletic lineage of highly similar rps11 genes of unresolved phylogenetic position but likely horizontal origin (see main text) was isolated from three genera of Betulaceae (Fig. 2b of main text).
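The RNA-editing argument used above for Sanguinaria rps11 and Amborella atp1 amounts to checking that every genomic/cDNA mismatch is of the editing type. The sketch below illustrates that check only. It assumes the two sequences have already been aligned on the same strand without gaps, and it considers only C-to-U editing (read as C-to-T in cDNA), the predominant editing type in angiosperm mitochondria. The function names and toy sequences are hypothetical and are not part of the original analysis.

```python
from typing import List, Tuple

# Hypothetical helper names for illustration; not taken from the original study.


def aligned_differences(genomic: str, cdna: str) -> List[Tuple[int, str, str]]:
    """Return (position, genomic base, cDNA base) for every mismatch.

    Assumes both sequences are already aligned on the same strand, with no gaps.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i, g, c)
        for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
        if g != c
    ]


def differs_only_by_c_to_u_editing(genomic: str, cdna: str) -> bool:
    """True if every mismatch is a genomic C read as T in the cDNA,
    i.e. consistent with C-to-U RNA editing (identical sequences also pass)."""
    return all(g == "C" and c == "T" for _, g, c in aligned_differences(genomic, cdna))


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from this study.
    genomic = "ATGCCATCGTTC"
    cdna = "ATGCTATTGTTC"
    print(aligned_differences(genomic, cdna))
    print(differs_only_by_c_to_u_editing(genomic, cdna))
```

A single mismatch of any other kind (for example C to G) makes the check fail, which is the situation one would expect if the cDNA came from a contaminating or mixed-up template rather than from the same gene.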
Mitochondrial Provenance of Horizontally Acquired Plant Genes

Three lines of evidence lead us to conclude that, in all five cases of HGT reported in this study, the horizontally acquired gene is located in the mitochondrial genome of the recipient group. First, in every case these genes show the very low divergence typical of genes that reside in angiosperm mitochondrial genomes and that have never resided in the far more mutagenic environment of angiosperm nuclear genomes (see main text, refs. 14-16 of main text, and Supplemental Figure 1). Note that, as discussed in part in the main text, the same logic leads to the conclusion that in all five cases the donor genome was the mitochondrial genome and not the nuclear genome. Second, in four of the five cases (all but Amborella atp1), Southern hybridization experiments (ref. 13 of main text) show that at least one member of the relevant plant lineage contains a hybridizing sequence at the intensity expected for a mitochondrial gene, as assayed under highly controlled conditions [note that all DNAs were prepared from green leaves, which always contain substantially fewer copies of mitochondrial DNA per cell (typically hundreds) than of chloroplast DNA (thousands), but more than of nuclear DNA (two copies)3]. Third, the finding that genomic and cDNA sequences for both Sanguinaria rps11 and Amborella atp1 differ only at sites of mitochondrial-specific RNA editing firmly establishes that both genes are located in the mitochondrial genome (expression of the other three genes was not assayed).

In terms of where these horizontally acquired genes were inserted into the mitochondrial genome, the chimeric rps11 gene of Sanguinaria clearly reflects homologous integration of horizontally acquired monocot rps11 sequences into the endogenous rps11 locus.
We have no relevant information for the other four transfers, but predict that these foreign sequences were inserted at random sites in the genome. This prediction rests on the loose organization of angiosperm mitochondrial genomes, which consist almost entirely of single-gene islands floating in a large sea of intergenic spacer, and on the observation that virtually all known gene duplicates in plant mitochondrial genomes are unlinked4,5.

The Tip of an Iceberg of Mitochondrial HGT in Plants

The five cases of plant mitochondrial HGT reported here must be the tip of a large iceberg of mitochondrial HGT in plants, considering 1) the discovery of relatively many cases from such limited sampling (3 genes and a total of about 120 sequences); 2) the likelihood that many cases of HGT will be missed owing to the relatively poor resolution of plant mitochondrial single-gene trees, which reflects the very low mutation rate in plant mitochondrial genomes (refs. 14 and 15 of main text) and is exacerbated by the short length of genes such as rps2 and rps11 (see Fig. 2 of main text); 3) the potential for chimeric HGT to further muddy the phylogenetic waters; and 4) the inability of Southern hybridization (used to detect recapture HGT) and PCR (used to isolate mitochondrial genes) to detect long-distance HGT events, e.g., from a fungus.

Parametric Bootstrapping

To test further whether the inferred cases of horizontal transfer can be explained by vertical transmission, we performed parametric bootstrapping6 using 100 simulated sequence replicates. The simulated sequences were generated with Seq-Gen7 under an HKY85 substitution model with gamma-distributed rate variation, using as input a maximum likelihood tree from PAUP8 constrained to the vertical-transmission topology being tested. The model parameters were first estimated from the original data with Tree-Puzzle9. The simulated data matrices were analyzed by parsimony in PAUP.
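The text does not spell out the test statistic, so the following is a sketch under an assumption rather than the procedure actually used. In one common framing of a parsimony-based parametric bootstrap, the statistic for each data set is the number of extra steps the best tree constrained to the vertical-transmission hypothesis needs compared with the best unconstrained tree. The p-value is then the fraction of replicates simulated on the constrained tree whose statistic is at least as large as the observed one. With 100 replicates, none reaching the observed value corresponds to p < 0.01. All function names and numbers below are hypothetical.

```python
from typing import Sequence


def length_difference(constrained_length: int, unconstrained_length: int) -> int:
    """Extra parsimony steps needed when the tree is forced to fit the
    vertical-transmission constraint (assumed test statistic)."""
    return constrained_length - unconstrained_length


def parametric_bootstrap_p(observed: float, simulated: Sequence[float]) -> float:
    """Fraction of replicates simulated on the constrained (null) tree whose
    statistic is at least as large as the observed statistic."""
    if len(simulated) == 0:
        raise ValueError("at least one simulated replicate is required")
    as_extreme = sum(1 for d in simulated if d >= observed)
    return as_extreme / len(simulated)


def report(observed: float, simulated: Sequence[float]) -> str:
    """Format the result; with n replicates and no exceedances, only p < 1/n can be claimed."""
    p = parametric_bootstrap_p(observed, simulated)
    n = len(simulated)
    return f"p < {1 / n:g}" if p == 0 else f"p = {p:g}"


if __name__ == "__main__":
    # Illustrative numbers only, not data from this study.
    obs = length_difference(constrained_length=412, unconstrained_length=398)
    sims = [0] * 60 + [1] * 25 + [2] * 10 + [4] * 5  # 100 hypothetical replicates
    print(obs, report(obs, sims))
```

Because the resolution of the p-value is limited to 1/n, 100 replicates can support p < 0.01 but nothing smaller; more replicates would be needed to claim a lower value.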
Parametric bootstrapping rejected (p < 0.01) the vertical-transmission hypotheses that group 1) rps2 from Actinidia with Grevillea/Platanus; 2) rps11 from Caprifoliaceae and Betulaceae with Trochodendrales; 3) rps11 from Caprifoliaceae with Trochodendrales (with Betulaceae excluded from the analysis); 4) the 3' half of rps11 from Sanguinaria with other Papaveraceae; and 5) the two Amborella atp1 sequences together. Parametric bootstrapping did not, however, reject the hypothesis that the Betulaceae and Trochodendrales rps11 sequences form a monophyletic group.

Supplemental References

1. Hudson, A. Fungal cytochrome c genes from plants. J. Mol. Evol. 41, 1170-1171 (1995).
2. Doolittle, R. F. The case for gene transfers between very distantly related organisms. In Syvanen, M. & Kado, C. I. (eds.) Horizontal Gene Transfer, Chapman & Hall, London, New York, pp. 311-320 (1998).
3. Bendich, A. J. Why do chloroplasts and mitochondria contain so many copies of their genome? BioEssays 6, 279-282 (1987).
4. Lonsdale, D. M. The plant mitochondrial genome. In Stumpf, P. K. & Conn, E. E. (eds.) The Biochemistry of Plants, Academic Press, San Diego, pp. 229-295 (1989).
5. Wolstenholme, D. R. & Fauron, C. M.-R. Mitochondrial genome organization. In Levings, C. S., III & Vasil, I. K. (eds.) The Molecular Biology of Plant Mitochondria, Kluwer, Netherlands, pp. 1-59 (1995).
6. Huelsenbeck, J. P., Hillis, D. M. & Jones, R. Parametric bootstrapping in molecular phylogenetics: applications and performance. In Ferraris, J. D. & Palumbi, S. R. (eds.) Molecular Zoology: Advances, Strategies, and Protocols, Wiley, New York, pp. 19-45 (1995).
7. Rambaut, A. & Grassly, N. C. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235-238 (1997).
8. Swofford, D. L. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Sinauer Associates, Sunderland, Massachusetts (1988).
9. Strimmer, K. & von Haeseler, A. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, 964-969 (1996).

Table 1. PCR primers used in this study

Gene   Primer   Orientation  Sequence                     Comment
rps2   INV.R    R            ATAATAACTACACAATCTGGT        Inverse PCR
rps2   INV.F    F            ATACCTATTGCATCTTCAGT         Inverse PCR
rps2   rps2.f2  F            AAGACACTRATTTGTTTACGAA
rps2   ub41     F            ATGACAATCCWTTCTATDGT
rps2   rps2.r3  R            AYGGGATAAGTKATTMKTTTAT
rps2   rps2.r4  R            TCMAGAATSMCTGTTTTSRT
rps2   ub39     R            AACTGTATAGGATCATTC
rps11  ub3      F            GAGCGCGTAGAGCAACAAGT
rps11  ub6      F            ATGCCCCAGGAAAAAACAAC
rps11  ub14     F            ATGCCCCAGGAAAAAACGG
rps11  ub4      R            GGAAGTTGGGTCACATCGTGG
rps11  ub5      R            CTTTGGGAGRCGGCANCCATTATG
rps11  ub8      R            TCCGAGATGCTCTGTACGAAGTTCATG
rps11  ub1      R            CTTATTGTGGATCGGTGGTAAATG
rps11  ub17     F            CTAGCGCGCGTACTCTTCTTCTG
rps11  ub18     F            GTTATGACTCGATGACTAAG
rps11  ub32     F            CGAATCTACAGATCTMAA
rps11  ub33     F            AGAAGCGTTATGACTCGATGAC
atp1   ub25     F            TCGGTCGAGTGGTCTCAGTTG
atp1   ub26     F            GGAGATGGGATTGCACGTG
atp1   ub27     F            GAGAATGTAGGAAAAAAAG
atp1   ub28     R            TCGATACTTCTGTCAGCCTT
atp1   ub29     R            CAGCCTTGCACCTCTATTGA
atp1   ub30     R            AAGCCTAGCACCTCTATTTG

Orientation: F, forward; R, reverse. Additional primer uses: for upstream analysis (six primers), HGT copy specific (three primers), native copy specific (one primer).

Supplemental Figure 1. Much greater divergence of nuclear than mitochondrial genes in plants. Maximum likelihood trees of rps2 (a) and rps11 (b). Both scale bars correspond to 0.05 nucleotide substitutions per site. The trees contain the same sets of mitochondrial sequences as Figs. 2a and 2b of main text, respectively, with 5 and 8 nuclear sequences, respectively, added to the analyses. In each case, the nuclear sequences, all of which are from eudicots, form a monophyletic group, consistent with the hypothesis that they result from a single shared mitochondrion-to-nucleus transfer event early in eudicot evolution (cf. the timing of these genes' loss from the mitochondrial genome; see Fig. 1 of main text).
However, in neither gene tree is the nuclear clade placed in the position expected (marked with an asterisk) for a gene transfer event coinciding with the loss of the mitochondrial gene early in eudicot evolution. This is entirely unsurprising given the highly divergent nature of the nuclear sequences relative to the mitochondrial sequences and the resulting potential for long-branch-attraction artifacts. Note especially the position of the rps11 nuclear clade: in the tree shown, it attaches to the long stem branch leading to monocots, and under other analytical conditions it attaches to the long outgroup branch. These alternative attachments of the clade to the two longest branches of mitochondrial sequences (compare also Fig. 2b of main text), neither of which makes biological sense, are almost certainly long-branch-attraction artifacts.

Supplemental Figure 2. Nucleotide alignment used for the rps2 analysis shown in Fig. 2a of main text.

Supplemental Figure 3. Nucleotide alignment used for the rps11 analysis shown in Figs. 2b, 2d and 2e of main text.

Supplemental Figure 4. Nucleotide alignment used for the atp1 analysis shown in Fig. 2f of main text.

Supplemental Figure 5. Nucleotide alignment used for the rps11 upstream sequence analysis shown in Fig. 2c of main text.
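Several primers in Table 1 are degenerate and use IUPAC ambiguity codes (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C, D = A/G/T, N = any base). The sketch below, which is not part of the original study, counts how many distinct oligonucleotides each degenerate primer represents, expands them, and reverse-complements a primer so that a reverse-orientation (R) primer can be matched against a forward-strand reference. It assumes, as is conventional but not stated in the table, that all primer sequences are written 5' to 3'. The function names are hypothetical.

```python
from itertools import product
from math import prod
from typing import List

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each code (e.g. R = A/G pairs with Y = C/T; S and W are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> List[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer.upper()))]


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Degenerate primers copied from Table 1.
    primers = {
        "ub41": "ATGACAATCCWTTCTATDGT",
        "rps2.r3": "AYGGGATAAGTKATTMKTTTAT",
        "rps2.r4": "TCMAGAATSMCTGTTTTSRT",
        "ub5": "CTTTGGGAGRCGGCANCCATTATG",
    }
    for name, seq in primers.items():
        print(name, degeneracy(seq), reverse_complement(seq))
```

For these four primers the degeneracies are 6, 16, 32 and 8, respectively; a high value such as 32 for rps2.r4 means that only a small fraction of the primer pool matches any particular template exactly.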