EMI_2239_sm_suppl_info

Supplementary Table 1. Media formulations used to cultivate Carpediemonas-like
organisms.
802SW
Boil 5 g of cerophyll in 1 l of seawater for 5 min. Filter the medium and
autoclave. Add 10-12 ml per 15 ml tube.
NM
In a 15 ml tube combine: 1 sterile rice grain, 0.5 ml of modified ATCC
medium 1171 (prepared with heat-inactivated horse serum) and 10 ml
of sterile seawater.
SW1773
In a 15 ml tube mix: 9 ml of 802SW (see above) and 3 ml of sterile ATCC
medium 1171.
T/S
Mix: 485 ml of sterile seawater, 485 ml of sterile modified TYSGM-9
medium (prepared without serum; see below) and 30 ml of heat-inactivated
horse serum. Add 10-12 ml per 15 ml tube.
3%LB
In a 15 ml tube mix: 300 µl of LB medium and 10 ml of sterile seawater.
802SW/horse serum
Prepare a horse serum slant: add 3 ml of horse serum to a 15 ml tube.
Incubate the tubes on their side at 80°C for 2 hours. The horse serum solidifies
and forms a slanted surface at the bottom of the tube. Repeat twice:
incubate the horse serum slants overnight at room temperature,
then at 80°C for 2 hours. Add 3-4 ml of 802SW medium over the
horse serum slant.
Modified TYSGM-9 medium
In 485 ml of distilled water dissolve 1 g of tryptone, 0.5 g of yeast
extract, 1.4 g of K2HPO4, 0.2 g of KH2PO4 and 3.75 g of NaCl.
Autoclave. Add 15 ml of heat-inactivated bovine or horse serum.
Pre-inoculation
Media for isolates PCE, PCS, NC and GSML were pre-inoculated with
Klebsiella sp.
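The mixing ratios in the table can be checked with a little arithmetic. The following is a minimal sketch, not part of the original protocol: it assumes volumes are simply additive, ignores the volume of the rice grain in NM, and uses a hypothetical helper name (`percent_v_v`).

```python
def percent_v_v(component_ml: float, total_ml: float) -> float:
    """Volume fraction of one component, as a percentage of the final volume."""
    return 100.0 * component_ml / total_ml


# T/S: 485 ml seawater + 485 ml modified TYSGM-9 + 30 ml horse serum
ts_total_ml = 485 + 485 + 30
ts_serum_pct = percent_v_v(30, ts_total_ml)

# 3%LB: 300 µl (0.3 ml) of LB medium in 10 ml of seawater
lb_pct = percent_v_v(0.3, 10 + 0.3)

# SW1773: 9 ml of 802SW + 3 ml of ATCC medium 1171
sw1773_1171_pct = percent_v_v(3, 9 + 3)

# NM: 0.5 ml of modified ATCC medium 1171 + 10 ml of seawater
nm_1171_pct = percent_v_v(0.5, 10 + 0.5)

for name, value in [
    ("T/S horse serum", ts_serum_pct),
    ("3%LB LB medium", lb_pct),
    ("SW1773 medium 1171", sw1773_1171_pct),
    ("NM medium 1171", nm_1171_pct),
]:
    print(f"{name}: {value:.2f}% v/v")
```

Under these assumptions T/S comes to 1 l with 3% serum, and the medium named 3%LB contains slightly under 3% LB by volume.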
Analyses of 454 data
We searched for sequences from Carpediemonas-like organisms (CLOs) in two
environmental PCR datasets from anoxic marine material (Stoeck et al., 2009, with
~250,000 454 reads, and Stoeck et al., 2010, with ~660,000 454 reads). The first includes
sequences derived from material from the Framvaren Fjord and the Cariaco Basin, the
second from the Framvaren Fjord only.
The sequences in these datasets are short (~150 bp), and mostly encompass a
variable region of the SSU rRNA gene. We were concerned that simple
BLAST analyses would not be a sensitive method for identifying CLO sequences, as
CLO sequences are quite divergent from each other. We therefore analyzed all 454 reads
one by one, using a combination of phylogenetic methods and similarity searches. The
workflow was as follows:
1. Each 454 read was added to a reference alignment with similar taxon sampling to
the one presented in the main paper, and aligned using the program MAFFT
(Katoh et al., 2005) with the fastest possible set-up (‘mafft –intree 1 infile
outfile’).
2. Each alignment was then analyzed with the phylogenetic analysis program
RAxML 7.0.4 (Stamatakis, 2006) using the '-f p' option. The program used
maximum parsimony to place the new sequence within a fixed reference tree (the
new sequence is not present in the reference tree).
3. The program PHAT (part of the PhyloGenie package; Frickey and Lupas, 2004)
was then used to select the trees in which the new sequence branched within, or
sister to, Fornicata, i.e., sequences from possible CLOs or diplomonads. After this
we were left with ~33,000 potential Fornicata sequences. However, as Fornicata
sequences are long-branching, this set was presumed to include many divergent
sequences from organisms unrelated to Fornicata, in addition to genuine Fornicata
sequences.
4. The potential Fornicata sequences were then extracted into a FASTA file and the
program BLASTCLUST (from the NCBI BLAST suite) was used to cluster the
sequences that were nearly identical (similarity set to 0.95). This step grouped the
~33,000 sequences into 1490 clusters.
5. Sequences representing each cluster were analyzed by BLAST and all sequences
with obvious high similarity (>90%) to organisms other than CLOs were
discarded. Around 90% of the sequences were excluded by this step.
6. Sequences representing the remaining clusters (150) were re-aligned to the dataset
from the main paper with the program MAFFT (einsi setting). For each test sequence
a maximum likelihood phylogenetic tree was constructed using the program
RAxML 7.0.4 (with the GTRGAMMAI model).
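The bookkeeping in steps 4 and 5 can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the BLASTCLUST or BLAST implementation: identity is computed naively on ungapped strings, clustering is plain single linkage at a 0.95 threshold, and the names `identity`, `single_linkage_clusters`, `BestHit` and `keep_representative` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy identity: matching positions over the shorter sequence (no gaps)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def single_linkage_clusters(seqs: dict, threshold: float = 0.95) -> list:
    """Group reads so that any pair at or above `threshold` ends up together."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, ties broken by first member name.
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


@dataclass
class BestHit:
    """Hypothetical summary of the best BLAST hit for a cluster representative."""
    taxon_is_clo: bool
    percent_identity: float


def keep_representative(hit: BestHit, cutoff: float = 90.0) -> bool:
    """Step 5: drop clusters that are >cutoff% similar to a non-CLO organism."""
    return hit.taxon_is_clo or hit.percent_identity <= cutoff


reads = {
    "r1": "ACGTACGTACGTACGTACGT",
    "r2": "ACGTACGTACGTACGTACGA",  # one mismatch against r1 (19 of 20 match)
    "r3": "TTTTGGGGCCCCAAAATTTT",
}
clusters = single_linkage_clusters(reads)
print(clusters)
print(keep_representative(BestHit(taxon_is_clo=False, percent_identity=97.0)))
```

The reported numbers are consistent with this filter: discarding around 90% of 1490 clusters leaves roughly 149, in line with the 150 clusters carried into step 6.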
Results: We identified 22 reads closely related to clade CL6 (only from Stoeck et
al., 2009, Framvaren Fjord), and 8 reads closely related to clade CL1 (only from Stoeck
et al., 2009, Framvaren Fjord). A further 11 sequences appear to be from diplomonads (8
from Framvaren Fjord and 3 from Cariaco Basin, from Stoeck et al., 2009).
We also identified one sequence that branches amongst CLOs in the ML
phylogeny, but is not closely related to any of the known sequences from CL1-6. It is
possible that this sequence represents an additional CLO lineage, but more likely that it
represents an unrelated sequence that is misplaced in this phylogeny. The 454 sequences
are simply too short to make a definitive statement about the position of this sequence.
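For scale, the read counts reported above can be turned into a rough recovery rate. This is a minimal sketch that treats the approximate dataset sizes (~250,000 and ~660,000 reads) as exact and counts only the CL6 and CL1 reads; the diplomonad reads and the single unplaced sequence are left out, and the variable names are illustrative only.

```python
# Approximate dataset sizes and CLO read counts as reported in the text.
datasets = {
    "Stoeck et al., 2009": {"reads": 250_000, "CL6": 22, "CL1": 8},
    "Stoeck et al., 2010": {"reads": 660_000, "CL6": 0, "CL1": 0},
}


def clo_reads(counts: dict) -> int:
    """Total reads assigned to a CLO clade (CL6 + CL1)."""
    return counts["CL6"] + counts["CL1"]


def clo_per_10k(counts: dict) -> float:
    """CLO reads per 10,000 reads in the dataset."""
    return 10_000 * clo_reads(counts) / counts["reads"]


for name, counts in datasets.items():
    print(f"{name}: {clo_reads(counts)} CLO reads, "
          f"{clo_per_10k(counts):.2f} per 10,000 reads")
```

Under these assumptions the 2009 dataset yields about 1.2 CLO reads per 10,000 reads, and the 2010 dataset none.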
Discussion: Analysis of the 454 sequencing data allowed us to recover sequences from two
CLO clades from environments that did not yield any CLO sequences in previous studies
that employed clone libraries and Sanger sequencing. It is possible that this was due to
the much deeper sampling available with 454 sequencing. Interestingly, we did not
recover any CLO sequences in the larger of the two 454 datasets examined (Stoeck et al.,
2010). Overall, this suggests that shallow sequence coverage is not the only reason for the
limited recovery of CLOs by previous environmental studies. It supports the idea that
CLOs are often extremely rare to nonexistent in suboxic marine systems, or that there is
some strong bias against their sequences in PCR studies.
A downside of 454 sequencing at present is the limited length of the reads, which
makes it difficult to place some sequences on the tree, especially if the 454 sequence is
not very similar to any available near-full-length sequence. It is quite possible that
sequences from novel CLO lineages may have been missed by our analyses for this
reason. Until longer sequences become available, we do not expect analysis of such
datasets to be a particularly effective way of identifying additional major lineages within
groups with divergent rRNA genes, such as Fornicata.
References
Frickey, T., and Lupas, A.N. (2004) PhyloGenie: automated phylome generation and
analysis. Nucleic Acids Res 32: 5231-5238.
Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005) MAFFT version 5: improvement in
accuracy of multiple sequence alignment. Nucleic Acids Res 33: 511-518.
Stamatakis, A. (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
with thousands of taxa and mixed models. Bioinformatics 22: 2688-2690.
Stoeck, T., Bass, D., Nebel, M., Christen, R., Jones, M.D.M., Breiner, H.W., and
Richards, T.A. (2010) Multiple marker parallel tag environmental DNA sequencing
reveals a highly complex eukaryotic community in marine anoxic water. Mol Ecol 19: in
press.
Stoeck, T., Behnke, A., Christen, R., Amaral-Zettler, L., Rodriguez-Mora, M.J.,
Chistoserdov, A. et al. (2009) Massively parallel tag sequencing reveals the complexity
of anaerobic marine protistan communities. BMC Biol 7: 1-20.