Recombination in SAR11 Supporting Online Material

EMI 1361: High Intraspecific Recombination Rate in a Native Population of
Candidatus Pelagibacter Ubique (SAR11)
Supplementary Material
Two additional tests were performed to assess recombination in nine strains of Candidatus
Pelagibacter ubique (SAR11). The first determines a standardized index of association
between alleles using the program LIAN version 3.5 (Haubold and Hudson, 2000), which
tests the null hypothesis of linkage equilibrium (statistical independence of alleles at all
loci) for multilocus data. The second is Maynard Smith and Smith’s Homoplasy Test
(Maynard Smith and Smith, 1998) as implemented in the program START (Jolley et al., 2001).
According to the START manual, the Homoplasy Test aims to measure the importance of recombination
between members of a population. It is only valid where sequences differ by ~5% of
nucleotides or less. The test tries to determine if there is a statistically significant excess
of homoplasies (shared similarities found in different branches of a phylogenetic tree not
inherited directly from an ancestor) derived from the dataset, compared to an estimate of
the number of homoplasies expected by mutation in the absence of recombination. An
excess of homoplasies is likely to have been brought about by recombination. The test
requires at least six sequences containing at least ten 'informative sites' (sites at which the
rarer of two alternative bases is present at least twice). A 'homoplasy ratio' is calculated
which should range from zero, for a clonal population, to one, for a population under free
recombination.
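The informative-site criterion quoted above can be illustrated with a short sketch. It is not taken from START: the function name and the toy alignment are hypothetical, and the handling of sites with three or more bases (counted as variable but not informative) is an assumption.

```python
from collections import Counter


def count_sites(alignment):
    """Return (variable, informative) site counts for aligned sequences.

    A site is variable if more than one base occurs there. Following the
    definition quoted above, a site is informative if exactly two bases
    occur and the rarer one is present at least twice. Sites with three or
    more bases are counted as variable only (an assumption of this sketch).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            informative += 1
    return variable, informative


# Hypothetical toy alignment of six sequences (not SAR11 data).
toy = [
    "ACGTAC",
    "ACGTAT",
    "ACATAT",
    "ACATGC",
    "TCGTAC",
    "ACGTAC",
]
variable_sites, informative_sites = count_sites(toy)

# Minimum requirements stated in the text: six sequences, ten informative sites.
meets_minimum = len(toy) >= 6 and informative_sites >= 10
print(variable_sites, informative_sites, meets_minimum)
```

In the toy alignment the singleton differences count as variable but not informative, which is why the informative-site count can be much lower than the variable-site count.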
Methods
Standardized index of association:
The standardized index of association was determined using the online program LIAN
version 3.5 (Haubold and Hudson, 2000), which tests the null hypothesis of linkage
equilibrium (statistical independence of alleles at all loci) for multilocus data. Following
Whitaker et al. (2005), potential bias from single nucleotide substitutions was avoided by
using a single polymorphic site from each gene, chosen so that each of its two alleles was
present in approximately half of the sequences. With counts starting from the A in the ATG
start codon, the positions used were: HSP60 – 1596, ATPDH – 600, ACoAAT – 462, recA – 123,
OR – 543, DpolIIIα – 939, and Rpolβ – 534. These positions generate the following input
table (one row per strain, one column per gene):
HTCC1002 T A A G A T G
HTCC1013 T A A A G C A
HTCC1016 T C A G A T G
HTCC1025 C C G A G C A
HTCC1040 C C G G G C A
HTCC1051 T A A A A C G
HTCC1057 C C G G A C A
HTCC1061 C A A G G T G
HTCC1062 T C G A A T G
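As a rough check on how the index is obtained from this table, the sketch below recomputes V_D, V_e, and the standardized index of association using the definitions in Haubold and Hudson (2000): V_D is the variance of pairwise distances between strains, V_e is the variance expected under linkage equilibrium, and I_A^S = (V_D/V_e - 1)/(l - 1) for l loci. This is not LIAN's code. The function names are hypothetical, and the N - 1 variance denominator is an assumption, chosen because it reproduces the V_D reported under Results.

```python
from collections import Counter
from itertools import combinations

# Input table from the text: one polymorphic site per gene, nine strains.
PROFILES = {
    "HTCC1002": "TAAGATG",
    "HTCC1013": "TAAAGCA",
    "HTCC1016": "TCAGATG",
    "HTCC1025": "CCGAGCA",
    "HTCC1040": "CCGGGCA",
    "HTCC1051": "TAAAACG",
    "HTCC1057": "CCGGACA",
    "HTCC1061": "CAAGGTG",
    "HTCC1062": "TCGAATG",
}


def pairwise_distances(profiles):
    """Number of loci at which each pair of strains differs."""
    return [sum(a != b for a, b in zip(p, q))
            for p, q in combinations(profiles, 2)]


def observed_variance(distances):
    """V_D: variance of the pairwise distances (N - 1 denominator, assumed)."""
    n = len(distances)
    mean = sum(distances) / n
    return sum((d - mean) ** 2 for d in distances) / (n - 1)


def expected_variance(profiles):
    """V_e = sum over loci of h_j * (1 - h_j), h_j being the gene diversity."""
    n = len(profiles)
    v_e = 0.0
    for column in zip(*profiles):
        freqs = [count / n for count in Counter(column).values()]
        h = n / (n - 1) * (1 - sum(f * f for f in freqs))
        v_e += h * (1 - h)
    return v_e


def standardized_index(profiles):
    """Return (V_D, V_e, I_A^S) for a list of equal-length allele profiles."""
    loci = len(profiles[0])
    v_d = observed_variance(pairwise_distances(profiles))
    v_e = expected_variance(profiles)
    return v_d, v_e, (v_d / v_e - 1) / (loci - 1)


v_d, v_e, i_as = standardized_index(list(PROFILES.values()))
print(f"V_D = {v_d:.2f}, V_e = {v_e:.2f}, I_A^S = {i_as:.4f}")
```

Dividing by l - 1 is what makes the index independent of the number of loci, so values from data sets with different numbers of genes can be compared.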
Homoplasy Test:
The program START, version 1.0.8 (Jolley et al., 2001), was used. Aligned sequences were
analyzed with an Se value of 0.6S, the conservative default setting.
Results
LIAN analysis:
V_D = 2.39
V_e = 1.73
I_A^S = 0.0635
Monte Carlo (1000 to 100,000 iterations): Var(V_D) = 0.1800; P = 0.079-0.093
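A Monte Carlo test of linkage equilibrium of the kind reported above can be sketched as follows. This is an illustrative permutation scheme, not LIAN's code. Alleles are shuffled among strains independently at each locus (assumed here to mirror resampling without replacement), and P is estimated as the fraction of resampled data sets whose V_D is at least as large as the observed one. The function names, the seed, and the number of resamplings are hypothetical choices.

```python
import random
from itertools import combinations

# Allele profiles from the input table in Methods (one character per locus).
PROFILES = [
    "TAAGATG",  # HTCC1002
    "TAAAGCA",  # HTCC1013
    "TCAGATG",  # HTCC1016
    "CCGAGCA",  # HTCC1025
    "CCGGGCA",  # HTCC1040
    "TAAAACG",  # HTCC1051
    "CCGGACA",  # HTCC1057
    "CAAGGTG",  # HTCC1061
    "TCGAATG",  # HTCC1062
]


def variance_of_distances(profiles):
    """V_D: variance of pairwise distances (N - 1 denominator, assumed)."""
    dists = [sum(a != b for a, b in zip(p, q))
             for p, q in combinations(profiles, 2)]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / (len(dists) - 1)


def monte_carlo_p(profiles, resamplings=2000, seed=1):
    """One-sided P: share of shuffled data sets with V_D >= observed V_D."""
    rng = random.Random(seed)
    observed = variance_of_distances(profiles)
    columns = [list(col) for col in zip(*profiles)]
    hits = 0
    for _ in range(resamplings):
        # Shuffling each locus separately keeps allele frequencies fixed
        # while breaking any association between loci.
        for col in columns:
            rng.shuffle(col)
        shuffled = ["".join(row) for row in zip(*columns)]
        # Small tolerance so that ties are not lost to floating-point error.
        if variance_of_distances(shuffled) >= observed - 1e-12:
            hits += 1
    return hits / resamplings


p_value = monte_carlo_p(PROFILES)
print(f"P = {p_value:.3f}")
```

Because the estimate comes from random resampling, it varies with the seed and the number of resamplings, which is why a Monte Carlo P is naturally reported as a range across runs.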
Homoplasy results:
Table S1. Results of the Homoplasy Test. P – ratio of expected homoplasies to true
homoplasies after 1000 trials; ND – not determined (number of informative sites is less
than 10).
Gene       Variable Sites   Informative Sites   P       Homoplasy ratio
HSP60      49               17                  0.008    0.638
ATPDH      45               16                  0.889   -0.574
ACoAAT     45               32                  0.105    0.208
recA       27               21                  0.000    0.399
OR         76               46                  0.894   -0.338
DpolIIIα   14               4                   ND       ND
Rpolβ      34               20                  0.016    0.292
Table S2. Sequences homologous to Candidatus Pelagibacter ubique pil genes.
Designated HTCC1062 genes were compared with the non-redundant GenBank database
using BLAST. Sequence fragments with an e-value greater than or equal to 1 were sorted
by the classification of the matching sequence.
                                   Proteobacteria
Gene*  Annotated gene name   Euk   Alpha  Beta  Gamma  Delt/Eps   Firmicute  Cyano  Oth Bac
0053   pilin                 0     0      1     6      2          0          0      0
0054   pilin                 1     0      40    45     2          2          5      1
0058   pilC                  0     0      20    52     11         4          7      8
0060   pilQ                  0     0      35    55     9          0          0      0
0063   pilMN (M portion)     0     0      0     0      1          0          0      0
0063   pilMN (N portion)     359   0      0     0      3          42         0      0
0065   pilD                  0     0      26    53     9          4          3      2
0074   pilT                  0     0      20    39     8          12         11     8
* Genes from HTCC1062 genome annotation (NC_007205, GenBank); Euk – Eukaryote,
Cyano – Cyanobacteria, Oth Bac – Other Bacteria, Delt/Eps – Delta/Epsilon
Discussion
Recombination is supported by both tests. In the first test, Monte Carlo simulations
indicate that there is no significant difference from values expected under the null
hypothesis of free recombination. In the second test, significant homoplasy is detected in
three of six genes tested. The homoplasy test (Maynard Smith and Smith, 1998) is
designed to be used when sequences are more than 95% similar, which is the case for
these gene sequences. A more distantly related sequence can be used as an outgroup to
better estimate Se, but START version 1.0.8 (Jolley et al., 2001) cannot specify an
outgroup sequence and make this calculation. Using a less conservative Se
value of 0.7S, one more gene, ACoAAT, is calculated to have a significant homoplasy
ratio (0.249, P=0.027).
Table S2 shows that even under very permissive conditions, no genes with similarity to
the pil genes found in HTCC1062 are detected in any known Alphaproteobacteria.
Assuming that this type II secretion/type IV pilus assembly is involved in DNA uptake, it
is possible that the recipient SAR11 cell recognizes the donor DNA. However, an
exhaustive search for an uptake signal sequence or a site-specific recombination sequence
yielded no statistically significant candidates. Smith et al. (1995, 1999) found that
known uptake signal sequences are palindromic 9- or 10-mers that occur at frequencies
hundreds of times above that expected by chance. The palindromic 9-mers “ATTTTTTTT”
and “AAAAAAAAT” were found frequently in two Candidatus Pelagibacter strains, but
only five times above the expected frequency. Moreover, the locations of the palindromes
in Candidatus Pelagibacter showed that they were not paired to form hairpins, nor did
they show any preference for location inside genes, at gene boundaries, or in intergenic
spaces, making their role in site-specific recombination doubtful. The 29 bp context
upstream and downstream of the palindromic 9-mers showed a higher than normal AT
content (70%-80%), but no conservation of individual positions, as was found in the
uptake signal sequence of Haemophilus influenzae (Smith et al., 1999). Elucidating
the mechanism of DNA transfer is a topic of further research.
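The over-representation comparison described above (observed occurrences of a 9-mer relative to the number expected by chance) can be sketched as below. This is not the search procedure used in the study. The expectation assumes independent bases drawn at the sequence's own base frequencies, both strands are covered by also searching for the reverse complement, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, kmer):
    """Count occurrences of kmer in seq, allowing overlaps."""
    return sum(seq.startswith(kmer, i)
               for i in range(len(seq) - len(kmer) + 1))


def expected_count(seq, kmer):
    """Expected occurrences on one strand under an independent-base model."""
    freqs = Counter(seq)
    prob = 1.0
    for base in kmer:
        prob *= freqs[base] / len(seq)
    return prob * (len(seq) - len(kmer) + 1)


def enrichment(seq, kmer):
    """Observed / expected count of kmer, summed over both strands."""
    rc = reverse_complement(kmer)
    observed = count_overlapping(seq, kmer)
    expected = expected_count(seq, kmer)
    if rc != kmer:
        # Occurrences on the opposite strand appear as the reverse complement.
        observed += count_overlapping(seq, rc)
        expected += expected_count(seq, rc)
    return observed / expected if expected else float("nan")


# Hypothetical toy sequence (not SAR11 data) containing one copy of the
# 9-mer on each strand.
toy = "ATTTTTTTT" + "GCGCATAT" * 5 + "AAAAAAAAT" + "CGTACGTA" * 5
ratio = enrichment(toy, "ATTTTTTTT")
print(f"observed/expected = {ratio:.1f}")
```

In an AT-rich genome the expected count of an AT-only 9-mer is itself high, so a composition-aware expectation like this one matters when judging whether a fivefold excess is notable.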
References
Haubold, B., and Hudson, R.R. (2000) LIAN 3.0: detecting linkage disequilibrium in
multilocus data. Linkage Analysis. Bioinformatics 16: 847-848.
Jolley, K.A., Feil, E.J., Chan, M.S., and Maiden, M.C. (2001) Sequence type analysis and
recombinational tests (START). Bioinformatics 17: 1230-1231.
Maynard Smith, J., and Smith, N.H. (1998) Detecting recombination from gene trees.
Mol Biol Evol 15: 590-599.
Smith, H.O., Gwinn, M.L., and Salzberg, S.L. (1999) DNA uptake signal sequences in
naturally transformable bacteria. Res Microbiol 150: 603-616.
Smith, H.O., Tomb, J.F., Dougherty, B.A., Fleischmann, R.D., and Venter, J.C. (1995)
Frequency and distribution of DNA uptake signal sequences in the Haemophilus
influenzae Rd genome. Science 269: 538-540.
Whitaker, R.J., Grogan, D.W., and Taylor, J.W. (2005) Recombination shapes the natural
population structure of the hyperthermophilic archaeon Sulfolobus islandicus. Mol Biol
Evol 22: 2354-2361.