EMI 1361: High Intraspecific Recombination Rate in a Native Population of Candidatus Pelagibacter ubique (SAR11)

Supplementary Material

Two additional tests were performed to assess recombination in nine strains of Candidatus Pelagibacter ubique (SAR11). The first determines a standardized index of association between alleles (Haubold and Hudson, 2000). The program LIAN version 3.5 (Haubold and Hudson, 2000) calculates the standardized index of association, which tests the null hypothesis of linkage equilibrium (statistical independence of alleles at all loci) for multilocus data.

The second test is the homoplasy test of Maynard Smith and Smith (1998) as implemented in the program START (Jolley et al., 2001). According to the START manual, the homoplasy test aims to measure the importance of recombination between members of a population. It is valid only where sequences differ at ~5% of nucleotides or fewer. The test determines whether there is a statistically significant excess of homoplasies (shared similarities found in different branches of a phylogenetic tree that were not inherited directly from an ancestor) in the dataset, compared with an estimate of the number of homoplasies expected by mutation in the absence of recombination. An excess of homoplasies is likely to have been brought about by recombination. The test requires at least six sequences containing at least ten 'informative sites' (sites at which the rarer of two alternative bases is present at least twice). A 'homoplasy ratio' is calculated, which should range from zero for a clonal population to one for a population under free recombination.

Methods

Standardized index of association: The standardized index of association was determined using the online program LIAN version 3.5 (Haubold and Hudson, 2000), which tests the null hypothesis of linkage equilibrium (statistical independence of alleles at all loci) for multilocus data.
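As a rough illustration of the statistic that LIAN reports, the calculation can be sketched in a few lines of Python. The sketch assumes the definition IAS = (VD/Ve - 1)/(l - 1) for l loci (Haubold and Hudson, 2000), takes VD as the sample variance of pairwise mismatch counts (an assumption about LIAN's estimator), and uses the nine seven-site profiles from the input table in this Methods section. The function and variable names are illustrative and are not part of LIAN, and the Monte Carlo significance test is not reproduced.

```python
from collections import Counter
from itertools import combinations
from statistics import variance

# One polymorphic site per gene, in the order
# HSP60, ATPDH, ACoAAT, recA, OR, DpolIII, Rpol (from the input table).
PROFILES = {
    "HTCC1002": "TAAGATG",
    "HTCC1013": "TAAAGCA",
    "HTCC1016": "TCAGATG",
    "HTCC1025": "CCGAGCA",
    "HTCC1040": "CCGGGCA",
    "HTCC1051": "TAAAACG",
    "HTCC1057": "CCGGACA",
    "HTCC1061": "CAAGGTG",
    "HTCC1062": "TCGAATG",
}


def gene_diversity(column):
    """Unbiased gene diversity h = n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(column)
    sum_sq = sum((count / n) ** 2 for count in Counter(column).values())
    return n / (n - 1) * (1 - sum_sq)


def standardized_index_of_association(profiles):
    """Return (V_D, V_e, I_A^S) for a list of equal-length allele profiles."""
    loci = list(zip(*profiles))  # one tuple of alleles per locus
    # Number of loci at which each pair of strains differs.
    distances = [
        sum(a != b for a, b in zip(p, q)) for p, q in combinations(profiles, 2)
    ]
    v_d = variance(distances)  # observed variance (sample variance assumed)
    # Variance expected under linkage equilibrium.
    v_e = sum(h * (1 - h) for h in map(gene_diversity, loci))
    i_a_s = (v_d / v_e - 1) / (len(loci) - 1)
    return v_d, v_e, i_a_s


v_d, v_e, i_a_s = standardized_index_of_association(list(PROFILES.values()))
print(f"V_D = {v_d:.2f}, V_e = {v_e:.2f}, I_A^S = {i_a_s:.4f}")
# prints: V_D = 2.39, V_e = 1.73, I_A^S = 0.0635
```

A value of IAS near zero is what linkage equilibrium predicts; judging whether the observed value differs from zero still requires the resampling step that LIAN performs.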
As in Whitaker et al. (2005), potential bias from single nucleotide substitutions was avoided by using a single polymorphic site from each gene, chosen so that the variant was present in approximately half of the sequences. With positions counted from the A of the ATG start codon, the sites used were: HSP60 – 1596, ATPDH – 600, ACoAAT – 462, recA – 123, OR – 543, DpolIII – 939, and Rpol – 534. These positions generate the following input table:

HTCC1002  T A A G A T G
HTCC1013  T A A A G C A
HTCC1016  T C A G A T G
HTCC1025  C C G A G C A
HTCC1040  C C G G G C A
HTCC1051  T A A A A C G
HTCC1057  C C G G A C A
HTCC1061  C A A G G T G
HTCC1062  T C G A A T G

Homoplasy test: The program START, version 1.0.8 (Jolley et al., 2001), was used. Aligned sequences were analyzed with an Se value of 0.6S, the conservative default setting.

Results

LIAN analysis:

VD = 2.39
Ve = 1.73
IAS = 0.0635
Monte Carlo (1000 to 100,000): Var(VD) = 0.1800; P = 0.079-0.093

Homoplasy results:

Table S1. Results of the homoplasy test. P – ratio of expected homoplasies to true homoplasies after 1000 trials; ND – not determined (number of informative sites is less than 10).

Gene               HSP60    ATPDH    ACoAAT   recA     OR       DpolIIIα  Rpolβ
Variable sites     49       45       45       27       76       14        34
Informative sites  17       16       32       21       46       4         20
P                  0.008    0.889    0.105    0.000    0.894    ND        0.016
Homoplasy ratio    0.638    -0.574   0.208    0.399    -0.338   ND        0.292

Table S2. Sequences homologous to Candidatus Pelagibacter ubique pil genes. Designated HTCC1062 genes were compared with the non-redundant GenBank database using BLAST. Sequence fragments with an e-value less than or equal to 1 were sorted by the classification of the matching sequence.
                                       Proteobacteria
Gene*  Annotated gene name   Euk    Alpha  Beta  Gamma  Delt/Eps  Firmicute  Cyano  Oth Bac
0053   pilin                 0      0      1     6      2         0          0      0
0054   pilin                 1      0      40    45     2         2          5      1
0058   pilC                  0      0      20    52     11        4          7      8
0060   pilQ                  0      0      35    55     9         0          0      0
0063   pilMN (M portion)     0      0      0     0      1         0          0      0
0063   pilMN (N portion)     359    0      0     0      3         42         0      0
0065   pilD                  0      0      26    53     9         4          3      2
0074   pilT                  0      0      20    39     8         12         11     8

* Genes from HTCC1062 genome annotation (NC_007205, GenBank); Euk – Eukaryote, Cyano – Cyanobacteria, Oth Bac – Other Bacteria, Delt/Eps – Delta/Epsilon.

Discussion

Recombination is supported by both tests. In the first test, Monte Carlo simulations indicate no significant difference from the values expected under the null hypothesis of linkage equilibrium (free recombination). In the second test, significant homoplasy is detected in three of the six genes tested. The homoplasy test (Maynard Smith and Smith, 1998) is designed for sequences that are more than 95% similar, which is the case for these gene sequences. A more distantly related sequence can be used as an outgroup to better estimate Se, but START version 1.0.8 (Jolley et al., 2001) cannot specify an outgroup sequence to make this calculation. Using a less conservative Se value of 0.7S, one more gene, ACoAAT, is calculated to have a significant homoplasy ratio (0.249, P = 0.027).

Table S2 shows that even under very permissive conditions, no genes with similarity to the pil genes found in HTCC1062 are detected in any known Alphaproteobacteria. Assuming that this type II secretion/type IV pilus assembly system is involved in DNA uptake, it is possible that the recipient SAR11 cell recognizes the donor DNA. However, an exhaustive search for an uptake signal sequence or a site-specific recombination sequence yielded no statistically significant candidates. Smith et al. (1995, 1999) found that known uptake signal sequences are palindromic 9- or 10-mers that occur at frequencies hundreds of times above those expected by chance.
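To make "above that expected by chance" concrete, the following minimal sketch compares the observed count of a short word with the count expected from base composition alone. The null model here (independent bases drawn at the sequence's own frequencies) is an assumption for illustration; the model used in the searches described here is not specified in this supplement. All function names and the toy sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def observed_count(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


def expected_count(seq, word):
    """Expected count if bases were independent with the sequence's base frequencies."""
    n = len(seq)
    freq = {base: count / n for base, count in Counter(seq).items()}
    prob = 1.0
    for base in word:
        prob *= freq.get(base, 0.0)
    # Number of start positions at which the word could occur.
    return prob * (n - len(word) + 1)


def fold_enrichment(seq, word):
    """Observed/expected ratio for word, counted on both strands."""
    rc = reverse_complement(word)
    observed = observed_count(seq, word) + observed_count(seq, rc)
    expected = expected_count(seq, word) + expected_count(seq, rc)
    return observed / expected if expected > 0 else float("nan")


# Hypothetical toy sequence; a real analysis would read a genome sequence instead.
toy = "ACGT" * 25
print(fold_enrichment(toy, "ACG"))
```

In an AT-rich genome, a composition-aware expectation of this kind matters, because A/T-rich words are expected to be common even without any selection for them.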
The palindromic 9-mers “ATTTTTTTT” and “AAAAAAAAT” were found frequently in two Candidatus Pelagibacter strains, but at only five times above the expected frequency. Moreover, the locations of the palindromes in Candidatus Pelagibacter showed that they were not paired to form hairpins, nor did they show any preference for location inside genes, at gene boundaries, or in intergenic spaces, making their role in site-specific recombination doubtful. The 29 bp contexts upstream and downstream of the palindromic 9-mers showed a higher than normal AT content (70%-80%) but no conservation of individual positions, as was found in the uptake signal sequence of Haemophilus influenzae (Smith et al., 1999). Elucidating the mechanism of DNA transfer is a topic for further research.

References

Haubold, B., and Hudson, R.R. (2000) LIAN 3.0: detecting linkage disequilibrium in multilocus data. Linkage Analysis. Bioinformatics 16: 847-848.

Jolley, K.A., Feil, E.J., Chan, M.S., and Maiden, M.C. (2001) Sequence type analysis and recombinational tests (START). Bioinformatics 17: 1230-1231.

Maynard Smith, J., and Smith, N.H. (1998) Detecting recombination from gene trees. Mol Biol Evol 15: 590-599.

Smith, H.O., Gwinn, M.L., and Salzberg, S.L. (1999) DNA uptake signal sequences in naturally transformable bacteria. Res Microbiol 150: 603-616.

Smith, H.O., Tomb, J.F., Dougherty, B.A., Fleischmann, R.D., and Venter, J.C. (1995) Frequency and distribution of DNA uptake signal sequences in the Haemophilus influenzae Rd genome. Science 269: 538-540.

Whitaker, R.J., Grogan, D.W., and Taylor, J.W. (2005) Recombination shapes the natural population structure of the hyperthermophilic archaeon Sulfolobus islandicus. Mol Biol Evol 22: 2354-2361.